KOI8-RU
| Languages | Belarusian, Ukrainian, Russian, Bulgarian |
|---|---|
| Classification | 8-bit KOI, extended ASCII |
| Extends | KOI8-B |
| Based on | KOI8-U, KOI8-R |
| Other related encodings | KOI8-E, KOI8-F |
KOI8-RU is an 8-bit character encoding designed to cover Russian, Ukrainian, and Belarusian, all of which use a Cyrillic alphabet. It is closely related to KOI8-R, which covers Russian and Bulgarian, but replaces ten box-drawing characters with five Ukrainian and Belarusian letters, Ґ, Є, І, Ї, and Ў, each in upper and lower case. It is even more closely related to KOI8-U, which does not include Ў but otherwise makes the same letter replacements. The additional letter allocations are matched by KOI8-E, except for Ґ, which is added to KOI8-F.
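The relationship between the three encodings can be checked with a short sketch. It assumes Python's built-in `koi8_r` and `koi8_u` codecs; the `KOI8_RU_LETTERS` dictionary and the `compare` helper are illustrative names, with the byte values and letters copied from the character set table in this article.

```python
# Bytes that KOI8-RU reassigns from box-drawing characters to letters.
# Values are taken from the character set table in this article.
KOI8_RU_LETTERS = {
    0xA4: "\u0454", 0xA6: "\u0456", 0xA7: "\u0457", 0xAD: "\u0491", 0xAE: "\u045E",
    0xB4: "\u0404", 0xB6: "\u0406", 0xB7: "\u0407", 0xBD: "\u0490", 0xBE: "\u040E",
}

def compare(byte):
    """Return the (KOI8-R, KOI8-U, KOI8-RU) characters for one byte value."""
    raw = bytes([byte])
    return raw.decode("koi8_r"), raw.decode("koi8_u"), KOI8_RU_LETTERS[byte]

for byte in sorted(KOI8_RU_LETTERS):
    r, u, ru = compare(byte)
    print(f"0x{byte:02X}  KOI8-R {r}  KOI8-U {u}  KOI8-RU {ru}")
```

In the printed rows, the KOI8-R column should show only box-drawing characters, and the KOI8-U column should agree with KOI8-RU everywhere except at 0xAE and 0xBE, the two positions of Ў.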
IBM assigns KOI8-RU code page/CCSID 1167.[1][2]
KOI8 remains much more commonly used than ISO 8859-5, which never really caught on.[citation needed] Another common Cyrillic character encoding is Windows-1251. Both may eventually give way to Unicode.
KOI8 stands for Kod obmena informatsiey, 8 bit (Russian: Код обмена информацией, 8 бит), which means "Code for Information Exchange, 8 bit".
The KOI8 character sets place the Russian Cyrillic letters in pseudo-Roman order rather than the natural Cyrillic alphabetical order used in ISO 8859-5. Although this may seem unnatural, it means that if the eighth bit is stripped, the text can still be read (or at least deciphered) in case-reversed transliteration on an ordinary ASCII terminal. For instance, "Код Обмена Информацией" (the Russian meaning of the "KOI" acronym) in KOI8-RU becomes kOD oBMENA iNFORMACIEJ if the 8th bit is stripped.
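The bit-stripping behaviour can be reproduced with a minimal sketch. It assumes Python's built-in `koi8_r` codec as a stand-in encoder, which is safe for this phrase because the Russian letters occupy the same positions in KOI8-R and KOI8-RU; `strip_eighth_bit` is an illustrative helper name.

```python
def strip_eighth_bit(data: bytes) -> str:
    """Clear the high bit of every byte and read the result as ASCII."""
    return bytes(b & 0x7F for b in data).decode("ascii")

# Russian letters sit at the same byte positions in KOI8-R and KOI8-RU,
# so the built-in "koi8_r" codec is used here to produce the bytes.
encoded = "Код Обмена Информацией".encode("koi8_r")
print(strip_eighth_bit(encoded))  # kOD oBMENA iNFORMACIEJ
```

The case reversal happens because lower-case Cyrillic letters occupy 0xC0 to 0xDF, which map onto ASCII upper case, while upper-case Cyrillic letters occupy 0xE0 to 0xFF, which map onto ASCII lower case.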
Character set
The following table shows the KOI8-RU encoding. Each character is shown with its equivalent Unicode code point.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | | | | | | | | | | | | | | | | |
| 1x | | | | | | | | | | | | | | | | |
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
| 8x | ─ 2500 | │ 2502 | ┌ 250C | ┐ 2510 | └ 2514 | ┘ 2518 | ├ 251C | ┤ 2524 | ┬ 252C | ┴ 2534 | ┼ 253C | ▀ 2580 | ▄ 2584 | █ 2588 | ▌ 258C | ▐ 2590 |
| 9x | ░ 2591 | ▒ 2592 | ▓ 2593 | “[a] 201C | ■ 25A0 | ∙ 2219 | ” 201D | —[a] 2014 | № 2116 | ™[a] 2122 | NBSP 00A0 | » 00BB | ® 00AE | « 00AB | · 00B7 | ¤ 00A4 |
| Ax | ═ 2550 | ║ 2551 | ╒ 2552 | ё 0451 | є[b][c] 0454 | ╔ 2554 | і[b][c] 0456 | ї[b][c] 0457 | ╗ 2557 | ╘ 2558 | ╙ 2559 | ╚ 255A | ╛ 255B | ґ[b] 0491 | ў[c] 045E | ╞ 255E |
| Bx | ╟ 255F | ╠ 2560 | ╡ 2561 | Ё 0401 | Є[b][c] 0404 | ╣ 2563 | І[b][c] 0406 | Ї[b][c] 0407 | ╦ 2566 | ╧ 2567 | ╨ 2568 | ╩ 2569 | ╪ 256A | Ґ[b] 0490 | Ў[c] 040E | © 00A9 |
| Cx | ю 044E | а 0430 | б 0431 | ц 0446 | д 0434 | е 0435 | ф 0444 | г 0433 | х 0445 | и 0438 | й 0439 | к 043A | л 043B | м 043C | н 043D | о 043E |
| Dx | п 043F | я 044F | р 0440 | с 0441 | т 0442 | у 0443 | ж 0436 | в 0432 | ь 044C | ы 044B | з 0437 | ш 0448 | э 044D | щ 0449 | ч 0447 | ъ 044A |
| Ex | Ю 042E | А 0410 | Б 0411 | Ц 0426 | Д 0414 | Е 0415 | Ф 0424 | Г 0413 | Х 0425 | И 0418 | Й 0419 | К 041A | Л 041B | М 041C | Н 041D | О 041E |
| Fx | П 041F | Я 042F | Р 0420 | С 0421 | Т 0422 | У 0423 | Ж 0416 | В 0412 | Ь 042C | Ы 042B | З 0417 | Ш 0428 | Э 042D | Щ 0429 | Ч 0427 | Ъ 042A |
- [a] Changed relative to KOI8-R to match Windows-1251.
- [b] Changed relative to KOI8-R to match KOI8-U.
- [c] Changed relative to KOI8-R to match KOI8-E.
Although RFC 2319 says that character 0x95 should be U+2219 (∙), it may also be U+2022 (•) to match the bullet character in Windows-1251.
Some references have a typo and incorrectly state that character 0xB4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).
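The character set table can be turned into a small table-driven codec. The following is a minimal sketch rather than a reference implementation: the names `HIGH_HALF`, `decode_koi8_ru`, and `encode_koi8_ru` are illustrative, the code points are copied from the table in this article, 0x95 is mapped to U+2219 as RFC 2319 states, and 0xB4 uses the corrected U+0404.

```python
# Code points for bytes 0x80-0xFF, one table row per line (8x through Fx).
HIGH_HALF = """
2500 2502 250C 2510 2514 2518 251C 2524 252C 2534 253C 2580 2584 2588 258C 2590
2591 2592 2593 201C 25A0 2219 201D 2014 2116 2122 00A0 00BB 00AE 00AB 00B7 00A4
2550 2551 2552 0451 0454 2554 0456 0457 2557 2558 2559 255A 255B 0491 045E 255E
255F 2560 2561 0401 0404 2563 0406 0407 2566 2567 2568 2569 256A 0490 040E 00A9
044E 0430 0431 0446 0434 0435 0444 0433 0445 0438 0439 043A 043B 043C 043D 043E
043F 044F 0440 0441 0442 0443 0436 0432 044C 044B 0437 0448 044D 0449 0447 044A
042E 0410 0411 0426 0414 0415 0424 0413 0425 0418 0419 041A 041B 041C 041D 041E
041F 042F 0420 0421 0422 0423 0416 0412 042C 042B 0417 0428 042D 0429 0427 042A
"""

# Bytes 0x00-0x7F are plain ASCII; the upper half comes from the table.
DECODE_TABLE = [chr(i) for i in range(128)]
DECODE_TABLE += [chr(int(cp, 16)) for cp in HIGH_HALF.split()]
assert len(DECODE_TABLE) == 256

# Reverse lookup: character -> byte value.
ENCODE_TABLE = {ch: byte for byte, ch in enumerate(DECODE_TABLE)}

def decode_koi8_ru(data: bytes) -> str:
    """Map each byte to its character via the table."""
    return "".join(DECODE_TABLE[b] for b in data)

def encode_koi8_ru(text: str) -> bytes:
    """Map each character to its byte; raises KeyError if unmappable."""
    return bytes(ENCODE_TABLE[ch] for ch in text)

# Round trip with letters from all three target languages.
sample = "Їжак і ґанок, ўсё"
assert decode_koi8_ru(encode_koi8_ru(sample)) == sample
```

A variant that should agree with Windows-1251 on the bullet would replace the `2219` entry in the second row with `2022`.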
References
- "Code page 1167 information document". Archived from the original on 2017-01-16.
- "CCSID 1167 information document". Archived from the original on 2016-03-27.
- Leisher, Mark (1999-12-20), KOI8-RU Belarusian/Ukrainian Cyrillic to Unicode 2.1 mapping table, KOI8RU.TXT, archived from the original on 2020-07-28, retrieved 2020-04-29
- Code Page CPGID 01167 (PDF), IBM
- Code Page CPGID 01167 (txt), IBM
External links
- Nechayev, Valentin (2013) [2001]. "Review of 8-bit Cyrillic encodings universe". Archived from the original on 2016-12-05. Retrieved 2016-12-05.