KOI8-RU

From Wikipedia, the free encyclopedia
8-bit Character encoding
KOI8-RU
Languages: Belarusian, Ukrainian, Russian, Bulgarian
Classification: 8-bit KOI, extended ASCII
Extends: KOI8-B
Based on: KOI8-U, KOI8-R
Other related encodings: KOI8-E, KOI8-F

KOI8-RU is an 8-bit character encoding designed to cover Russian, Ukrainian, and Belarusian, all of which use a Cyrillic alphabet. It is closely related to KOI8-R, which covers Russian and Bulgarian, but replaces ten box-drawing characters with five Ukrainian and Belarusian letters (Ґ, Є, І, Ї, and Ў), each in upper and lower case. It is even more closely related to KOI8-U, which omits Ў but otherwise makes the same letter replacements. The additional letter allocations match those of KOI8-E, except for Ґ, which is added in KOI8-F.
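The relationship between these encodings can be inspected with Python's built-in `koi8_r` and `koi8_u` codecs. The sketch below is illustrative only: the list of ten byte positions is taken from the character table in this article, and the name `compare_positions` is hypothetical. It shows that KOI8-R holds box-drawing characters at all ten positions, while KOI8-U holds letters at eight of them and leaves 0xAE and 0xBE (where KOI8-RU places the two cases of Ў) as box-drawing characters.

```python
# The ten byte positions that KOI8-RU reassigns to letters,
# read off the character table in this article (an assumption of this sketch).
POSITIONS = bytes([0xA4, 0xA6, 0xA7, 0xAD, 0xAE, 0xB4, 0xB6, 0xB7, 0xBD, 0xBE])


def compare_positions(positions: bytes = POSITIONS):
    """Return (byte, KOI8-R character, KOI8-U character) for each position."""
    return [
        (b, bytes([b]).decode("koi8_r"), bytes([b]).decode("koi8_u"))
        for b in positions
    ]


for byte, in_r, in_u in compare_positions():
    print(f"0x{byte:02X}  KOI8-R: {in_r}  KOI8-U: {in_u}")
```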

IBM assigns KOI8-RU code page/CCSID 1167.[1][2]

KOI8 remains much more commonly used than ISO 8859-5, which never saw wide adoption.[citation needed] Another common Cyrillic character encoding is Windows-1251. Both may eventually give way to Unicode.

KOI8 stands for Kod obmena informatsiey, 8 bit (Russian: Код обмена информацией, 8 бит), which means "Code for Information Exchange, 8 bit".

The KOI8 character sets arrange the Russian Cyrillic letters in pseudo-Roman order rather than the natural Cyrillic alphabetical order used by ISO 8859-5. Although this may seem unnatural, it means that if the eighth bit is stripped, the text can still be read (or at least deciphered) in case-reversed transliteration on an ordinary ASCII terminal. For instance, "Код Обмена Информацией" (the Russian expansion of the "KOI" acronym) in KOI8-RU becomes kOD oBMENA iNFORMACIEJ if the 8th bit is stripped.
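The bit-stripping property can be reproduced in a few lines. This is a minimal sketch, not a reference implementation: Python's standard library does not appear to ship a KOI8-RU codec, so the example assumes the built-in `koi8_r` codec, which places the Russian letters at the same byte values. The function name `strip_high_bit` is hypothetical.

```python
def strip_high_bit(text: str, codec: str = "koi8_r") -> str:
    """Encode text, clear bit 7 of every byte, and read the result as ASCII."""
    encoded = text.encode(codec)
    return bytes(b & 0x7F for b in encoded).decode("ascii")


print(strip_high_bit("Код Обмена Информацией"))  # kOD oBMENA iNFORMACIEJ
```

The case reversal comes from the layout: lowercase Cyrillic letters occupy rows Cx and Dx, which map onto the uppercase ASCII rows 4x and 5x once the high bit is cleared, while uppercase Cyrillic (rows Ex and Fx) lands on the lowercase ASCII rows 6x and 7x.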

Character set

The following table shows the KOI8-RU encoding. Each character is shown with its equivalent Unicode code point.

KOI8-RU[3][4][5]
    0       1       2       3       4       5       6       7       8       9       A       B       C       D       E       F
0x
1x
2x  SP      !       "       #       $       %       &       '       (       )       *       +       ,       -       .       /
3x  0       1       2       3       4       5       6       7       8       9       :       ;       <       =       >       ?
4x  @       A       B       C       D       E       F       G       H       I       J       K       L       M       N       O
5x  P       Q       R       S       T       U       V       W       X       Y       Z       [       \       ]       ^       _
6x  `       a       b       c       d       e       f       g       h       i       j       k       l       m       n       o
7x  p       q       r       s       t       u       v       w       x       y       z       {       |       }       ~
8x  ─       │       ┌       ┐       └       ┘       ├       ┤       ┬       ┴       ┼       ▀       ▄       █       ▌       ▐
    2500    2502    250C    2510    2514    2518    251C    2524    252C    2534    253C    2580    2584    2588    258C    2590
9x  ░       ▒       ▓       “[a]    ■       ∙       ”       —[a]    №       ™[a]    NBSP    »       ®       «       ·       ¤
    2591    2592    2593    201C    25A0    2219    201D    2014    2116    2122    00A0    00BB    00AE    00AB    00B7    00A4
Ax  ═       ║       ╒       ё       є[b][c] ╔       і[b][c] ї[b][c] ╗       ╘       ╙       ╚       ╛       ґ[b]    ў[c]    ╞
    2550    2551    2552    0451    0454    2554    0456    0457    2557    2558    2559    255A    255B    0491    045E    255E
Bx  ╟       ╠       ╡       Ё       Є[b][c] ╣       І[b][c] Ї[b][c] ╦       ╧       ╨       ╩       ╪       Ґ[b]    Ў[c]    ©
    255F    2560    2561    0401    0404    2563    0406    0407    2566    2567    2568    2569    256A    0490    040E    00A9
Cx  ю       а       б       ц       д       е       ф       г       х       и       й       к       л       м       н       о
    044E    0430    0431    0446    0434    0435    0444    0433    0445    0438    0439    043A    043B    043C    043D    043E
Dx  п       я       р       с       т       у       ж       в       ь       ы       з       ш       э       щ       ч       ъ
    043F    044F    0440    0441    0442    0443    0436    0432    044C    044B    0437    0448    044D    0449    0447    044A
Ex  Ю       А       Б       Ц       Д       Е       Ф       Г       Х       И       Й       К       Л       М       Н       О
    042E    0410    0411    0426    0414    0415    0424    0413    0425    0418    0419    041A    041B    041C    041D    041E
Fx  П       Я       Р       С       Т       У       Ж       В       Ь       Ы       З       Ш       Э       Щ       Ч       Ъ
    041F    042F    0420    0421    0422    0423    0416    0412    042C    042B    0417    0428    042D    0429    0427    042A
Differences from KOI8-R
  a. Changed relative to KOI8-R to match Windows-1251.
  b. Changed relative to KOI8-R to match KOI8-U.
  c. Changed relative to KOI8-R to match KOI8-E.

Although RFC 2319 says that character 0x95 should be U+2219 (∙), it may also be U+2022 (•) to match the bullet character in Windows-1251.

Some references have a typo and incorrectly state that character 0xB4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).
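A decoder for the table above can be sketched by layering overrides on top of Python's built-in `koi8_r` codec. This is an illustration under stated assumptions rather than a definitive implementation: the override table was derived by comparing this article's table with the KOI8-R mapping, the names `KOI8_RU_OVERRIDES` and `decode_koi8_ru` are hypothetical, and 0x95 is left as U+2219 (inherited from KOI8-R) rather than U+2022. Note that 0xB4 maps to U+0404, not the U+0403 given in some references.

```python
# Byte positions where the KOI8-RU table differs from KOI8-R
# (derived from this article's table; an assumption of this sketch).
KOI8_RU_OVERRIDES = {
    # Punctuation and symbols in row 9x
    0x93: "\u201C", 0x96: "\u201D", 0x97: "\u2014", 0x98: "\u2116",
    0x99: "\u2122", 0x9B: "\u00BB", 0x9C: "\u00AE", 0x9D: "\u00AB",
    0x9F: "\u00A4",
    # Lowercase Ukrainian and Belarusian letters in row Ax
    0xA4: "\u0454", 0xA6: "\u0456", 0xA7: "\u0457", 0xAD: "\u0491",
    0xAE: "\u045E",
    # Uppercase Ukrainian and Belarusian letters in row Bx
    0xB4: "\u0404", 0xB6: "\u0406", 0xB7: "\u0407", 0xBD: "\u0490",
    0xBE: "\u040E",
}


def decode_koi8_ru(data: bytes) -> str:
    """Decode bytes as KOI8-RU: use an override if present, else KOI8-R."""
    chars = []
    for b in data:
        if b in KOI8_RU_OVERRIDES:
            chars.append(KOI8_RU_OVERRIDES[b])
        else:
            chars.append(bytes([b]).decode("koi8_r"))
    return "".join(chars)


print(decode_koi8_ru(bytes([0xA6, 0xAE, 0xB7])))
```

Decoding one byte at a time is slow for large inputs; it is chosen here only to keep the override logic visible.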

References

  1. "Code page 1167 information document". Archived from the original on 2017-01-16.
  2. "CCSID 1167 information document". Archived from the original on 2016-03-27.
  3. Leisher, Mark (1999-12-20), KOI8-RU Belarusian/Ukrainian Cyrillic to Unicode 2.1 mapping table, KOI8RU.TXT, archived from the original on 2020-07-28, retrieved 2020-04-29
  4. Code Page CPGID 01167 (PDF), IBM
  5. Code Page CPGID 01167 (txt), IBM