Arabic script in Unicode

From Wikipedia, the free encyclopedia

Many scripts in Unicode, such as Arabic, have special orthographic rules that require certain sequences of letterforms to be joined into special ligature forms. In English, the common ampersand (&) developed from a ligature in which the handwritten Latin letters e and t (spelling et, Latin for and) were combined.[1]

As of Unicode 17.0, the Arabic script is contained in the following blocks:[2]

The basic Arabic range encodes the standard letters, the most common diacritics, and the Arabic-Indic digits, but does not encode contextual forms; U+0621-U+0652 is directly based on ISO 8859-6. The Arabic Supplement range encodes letter variants mostly used for writing African (non-Arabic) languages. The Arabic Extended-B and Arabic Extended-A ranges encode additional Qur'anic annotations and letter variants used for various non-Arabic languages. The Arabic Presentation Forms-A range encodes contextual forms and ligatures of letter variants needed for Persian, Urdu, Sindhi and Central Asian languages. The Arabic Presentation Forms-B range encodes spacing forms of Arabic diacritics and more contextual letter forms. The presentation forms are present only for compatibility with older standards and are not currently needed for coding text.[3] The Arabic Mathematical Alphabetic Symbols block encodes characters used in Arabic mathematical expressions. The Indic Siyaq Numbers block contains a specialized subset of Arabic script that was in use for accounting in India under the Mughal Empire by the 17th century and remained so through the middle of the 20th century.[4][5] The Ottoman Siyaq Numbers block contains a specialized subset of Arabic script, also known as Siyakat numbers, used for accounting in Ottoman Turkish documents.[5]
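As an illustration of how these blocks partition the code space, the following Python sketch maps a character to the Arabic-script block that contains it. The ranges are read off the row labels of the code charts later in this article (for example, rows U+075x to U+077x for Arabic Supplement); the names `ARABIC_BLOCKS` and `arabic_block` are hypothetical helpers for this example, not part of any library.

```python
from bisect import bisect_right

# (first, last, block name); ranges taken from the code chart row labels
# in this article. This table is an illustrative assumption, not an API.
ARABIC_BLOCKS = [
    (0x0600, 0x06FF, "Arabic"),
    (0x0750, 0x077F, "Arabic Supplement"),
    (0x0870, 0x089F, "Arabic Extended-B"),
    (0x08A0, 0x08FF, "Arabic Extended-A"),
    (0xFB50, 0xFDFF, "Arabic Presentation Forms-A"),
    (0xFE70, 0xFEFF, "Arabic Presentation Forms-B"),
    (0x10E60, 0x10E7F, "Rumi Numeral Symbols"),
    (0x10EC0, 0x10EFF, "Arabic Extended-C"),
    (0x1EC70, 0x1ECBF, "Indic Siyaq Numbers"),
    (0x1ED00, 0x1ED4F, "Ottoman Siyaq Numbers"),
    (0x1EE00, 0x1EEFF, "Arabic Mathematical Alphabetic Symbols"),
]
_STARTS = [first for first, _, _ in ARABIC_BLOCKS]


def arabic_block(ch):
    """Return the name of the Arabic-script block containing ch, or None."""
    cp = ord(ch)
    # Find the last block whose first code point is <= cp.
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0:
        _, last, name = ARABIC_BLOCKS[i]
        if cp <= last:
            return name
    return None
```

A lookup such as `arabic_block("\u0628")` falls in the basic Arabic block, while a presentation form such as `"\uFE91"` falls in Arabic Presentation Forms-B. Note that a block can contain unassigned code points, so block membership alone does not mean a character is assigned.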

Contextual forms

Below is a demonstration for the basic alphabet used in Modern Standard Arabic, illustrating how Arabic letters are expected to appear in different contexts. Codepoints listed as contextual forms "should not be used in general interchange".[3] Unicode has other methods of encoding the difference if necessary, such as the zero-width joiner.

General Unicode  Letter  Isolated  Final (End)  Medial (Middle)  Initial (Beginning)  Name
0627  ا  FE8D  FE8E  --  --  ʾalif
0628  ب  FE8F  FE90  FE92  FE91  bāʾ
062A  ت  FE95  FE96  FE98  FE97  tāʾ
062B  ث  FE99  FE9A  FE9C  FE9B  ṯāʾ
062C  ج  FE9D  FE9E  FEA0  FE9F  ǧīm
062D  ح  FEA1  FEA2  FEA4  FEA3  ḥāʾ
062E  خ  FEA5  FEA6  FEA8  FEA7  ḫāʾ
062F  د  FEA9  FEAA  --  --  dāl
0630  ذ  FEAB  FEAC  --  --  ḏāl
0631  ر  FEAD  FEAE  --  --  rāʾ
0632  ز  FEAF  FEB0  --  --  zayn/zāy
0633  س  FEB1  FEB2  FEB4  FEB3  sīn
0634  ش  FEB5  FEB6  FEB8  FEB7  šīn
0635  ص  FEB9  FEBA  FEBC  FEBB  ṣād
0636  ض  FEBD  FEBE  FEC0  FEBF  ḍād
0637  ط  FEC1  FEC2  FEC4  FEC3  ṭāʾ
0638  ظ  FEC5  FEC6  FEC8  FEC7  ẓāʾ
0639  ع  FEC9  FECA  FECC  FECB  ʿayn
063A  غ  FECD  FECE  FED0  FECF  ġayn
0641  ف  FED1  FED2  FED4  FED3  fāʾ
0642  ق  FED5  FED6  FED8  FED7  qāf
0643  ك  FED9  FEDA  FEDC  FEDB  kāf
0644  ل  FEDD  FEDE  FEE0  FEDF  lām
0645  م  FEE1  FEE2  FEE4  FEE3  mīm
0646  ن  FEE5  FEE6  FEE8  FEE7  nūn
0647  ه  FEE9  FEEA  FEEC  FEEB  hāʾ
0648  و  FEED  FEEE  --  --  wāw
064A  ي  FEF1  FEF2  FEF4  FEF3  yāʾ
0622  آ  FE81  FE82  --  --  ʾalif maddah
0629  ة  FE93  FE94  --  --  tāʾ marbūṭah
0649  ى  FEEF  FEF0  --  --  ʾalif maqṣūrah
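The relationship between the general code points and the contextual forms can be explored with Python's standard `unicodedata` module. The sketch below is illustrative only: `request_form`, `presentation_form_info` and `to_general` are hypothetical helper names. It assumes the usual convention that a zero-width joiner (U+200D) placed next to a letter asks the renderer to join on that side, and that the presentation forms carry compatibility decompositions tagged `<isolated>`, `<final>`, `<medial>` or `<initial>` in the Unicode Character Database shipped with Python.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER


def request_form(letter, form):
    """Wrap a general-range letter in ZWJ to request one positional form."""
    if form == "isolated":
        return letter
    if form == "initial":   # joins only to the following letter
        return letter + ZWJ
    if form == "medial":    # joins on both sides
        return ZWJ + letter + ZWJ
    if form == "final":     # joins only to the preceding letter
        return ZWJ + letter
    raise ValueError(f"unknown form: {form!r}")


def presentation_form_info(ch):
    """Return (form tag, general-range text) for a presentation form, else None."""
    decomp = unicodedata.decomposition(ch)  # e.g. "<initial> 0628"
    if not decomp.startswith("<"):
        return None
    tag, _, rest = decomp.partition("> ")
    return tag[1:], "".join(chr(int(h, 16)) for h in rest.split())


def to_general(text):
    """Fold presentation forms back to general code points via NFKC."""
    return unicodedata.normalize("NFKC", text)
```

Whether a ZWJ sequence is actually drawn in the requested form depends on the font and rendering engine, and letters that join only to the preceding letter (such as ʾalif or dāl) have no initial or medial shape. NFKC also folds many unrelated compatibility characters, so `to_general` is best applied only to text where that is acceptable.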

Punctuation and ornaments

Only the Arabic question mark ⟨؟⟩ and the Arabic comma ⟨،⟩ are used in regular Arabic script typing, and the Arabic comma is often replaced by the Latin script comma ⟨,⟩, which is also used as the decimal separator when the Eastern Arabic numerals are used (e.g. ⟨100.6⟩ compared to ⟨١٠٠,٦⟩).

  • U+060C ، ARABIC COMMA
  • U+060D ؍ ARABIC DATE SEPARATOR
  • U+060E ؎ ARABIC POETIC VERSE SIGN
  • U+060F ؏ ARABIC SIGN MISRA
  • U+061B ؛ ARABIC SEMICOLON
  • U+061E ؞ ARABIC TRIPLE DOT PUNCTUATION MARK
  • U+061F ؟ ARABIC QUESTION MARK
  • U+066D ٭ ARABIC FIVE POINTED STAR
  • U+06D4 ۔ ARABIC FULL STOP
  • U+06DD ۝ ARABIC END OF AYAH
  • U+06DE ۞ ARABIC START OF RUB EL HIZB
  • U+06E9 ۩ ARABIC PLACE OF SAJDAH
  • U+06FD ۽ ARABIC SIGN SINDHI AMPERSAND
  • U+FD3E ﴾ ORNATE LEFT PARENTHESIS
  • U+FD3F ﴿ ORNATE RIGHT PARENTHESIS
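Because Latin punctuation is often typed in place of the Arabic marks, text-cleaning code sometimes maps one set to the other. The following is a deliberately naive Python sketch of such a mapping for the three marks that have direct counterparts in the list above; `LATIN_TO_ARABIC_PUNCT` and `arabize_punctuation` are hypothetical names chosen for this example.

```python
import unicodedata

# Latin punctuation -> Arabic counterpart (code points from the list above).
LATIN_TO_ARABIC_PUNCT = str.maketrans({
    ",": "\u060C",  # ARABIC COMMA
    ";": "\u061B",  # ARABIC SEMICOLON
    "?": "\u061F",  # ARABIC QUESTION MARK
})


def arabize_punctuation(text):
    """Replace Latin comma, semicolon and question mark with Arabic ones."""
    return text.translate(LATIN_TO_ARABIC_PUNCT)


def describe(ch):
    """Format a character as 'U+XXXX NAME' using the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"
```

This sketch replaces every occurrence blindly, so it would also change a Latin comma used as a decimal separator inside a number; a real converter would need to take context into account.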

Word ligatures

Arabic Presentation Forms-A has a few characters defined as "word ligatures" for terms frequently used in formulaic expressions in Arabic. They are rarely used outside professional liturgical typing. The Rial grapheme is likewise normally written out in full rather than with the ligature.

  • U+FDF0 ARABIC LIGATURE SALLA USED AS KORANIC STOP SIGN ISOLATED FORM (صلى, stylized as صلے)
  • U+FDF1 ARABIC LIGATURE QALA USED AS KORANIC STOP SIGN ISOLATED FORM (قلى, stylized as قلے)
  • U+FDF2 ARABIC LIGATURE ALLAH ISOLATED FORM (اللّه)
  • U+FDF3 ARABIC LIGATURE AKBAR ISOLATED FORM (اكبر), as in the phrase الله اكبر Allāhu akbar
  • U+FDF4 ARABIC LIGATURE MOHAMMAD ISOLATED FORM (محمد)
  • U+FDF5 ARABIC LIGATURE SALAM ISOLATED FORM (صلعم, the abbreviation for صلى الله عليه وسلم "peace be upon him")
  • U+FDF6 ARABIC LIGATURE RASOUL ISOLATED FORM (رسول)
  • U+FDF7 ARABIC LIGATURE ALAYHE ISOLATED FORM (عليه)
  • U+FDF8 ARABIC LIGATURE WASALLAM ISOLATED FORM (وسلم)
  • U+FDF9 ARABIC LIGATURE SALLA ISOLATED FORM (صلى)
  • U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM (صلى الله عليه وسلم "peace be upon him")
  • U+FDFB ARABIC LIGATURE JALLAJALALOUHOU (جل جلاله)
  • U+FDFC RIAL SIGN (ریال)
  • U+FDFD ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM (بسم الله الرحمن الرحيم bism-i llāh-i r-raḥmān-i r-raḥīm)
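Since these word ligatures are compatibility characters, text that contains them can be expanded to ordinary letters before searching or indexing. The Python sketch below shows one way to do that with compatibility normalization. It assumes that U+FDF0 through U+FDFC carry compatibility decompositions in the Unicode data bundled with Python, and that U+FDFD has none and therefore stays unchanged; `expand_word_ligatures` is a hypothetical helper name.

```python
import unicodedata

# Code point range of the word ligatures listed above that are assumed
# to have compatibility decompositions (U+FDFD is assumed to have none).
FIRST_WORD_LIGATURE = 0xFDF0
LAST_WORD_LIGATURE = 0xFDFC


def expand_word_ligatures(text):
    """Expand word ligatures to their letter sequences, leaving other text alone."""
    out = []
    for ch in text:
        if FIRST_WORD_LIGATURE <= ord(ch) <= LAST_WORD_LIGATURE:
            # NFKC on the single character applies its compatibility mapping.
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)
```

Restricting the expansion to this range, rather than normalizing the whole string, avoids altering unrelated compatibility characters elsewhere in the text.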

Code blocks

See also: Help:IPA/Arabic

Arabic

Character table

Code Result Unicode name
U+0600 Arabic Number Sign
U+0601 Arabic Sign Sanah
U+0602 Arabic Footnote Marker
U+0603 Arabic Sign Safha
U+0604 Arabic Sign Samvat

used for writing Samvat era dates in Urdu

U+0605 Arabic Number Mark Above

may be used with Coptic Epact numbers

U+0606 Arabic-Indic Cube Root

- U+221B Cube Root

U+0607 Arabic-Indic Fourth Root

- U+221C Fourth Root

U+0608 Arabic Ray
U+0609 Arabic-Indic Per Mille Sign

- U+2030 ‰ Per Mille Sign

U+060A Arabic-Indic Per Ten Thousand Sign

- U+2031 ‱ Per Ten Thousand Sign

U+060B Afghani Sign
U+060C , Arabic Comma

also used with Thaana and Syriac in modern text

- U+002C, Comma

- U+2E32 , Turned Comma

- U+2E41 , Reversed Comma

U+060D Arabic Date Separator
U+060E Arabic Poetic Verse Sign
U+060F Arabic Sign Misra
U+0610 Arabic Sign Sallallahou Alayhe Wassallam

represents sallallahu alayhe wasallam "may God's peace and blessings be upon him"

U+0611 Arabic Sign Alayhe Assallam

represents alayhe assalam "upon him be peace"

U+0612 Arabic Sign Rahmatullah Alayhe

represents rahmatullah alayhe "may God have mercy upon him"

U+0613 Arabic Sign Radi Allahou Anhu

represents radi allahu 'anhu "may God be pleased with him"

U+0614 Arabic Sign Takhallus

sign placed over the name or nom-de-plume of a poet, or in some writings used to mark all proper names

U+0615 Arabic Small High Tah

marks a recommended pause position in some Qurans published in Iran and Pakistan; should not be confused with the small TAH sign used as a diacritic for some letters such as U+0679

U+0616 Arabic Small High Ligature Alef With Lam With Yeh

= Arabic small high ligature alef with yeh barree; early Persian

U+0617 Arabic Small High Zain
U+0618 Arabic Small Fatha

should not be confused with 064E Fatha

U+0619 Arabic Small Damma

should not be confused with 064F Damma

U+061A Arabic Small Kasra

should not be confused with 0650 Kasra

U+061B ; Arabic Semicolon

also used with Thaana and Syriac in modern text

- U+003B ; Semicolon

- U+204F ⁏ Reversed Semicolon

- U+2E35 ⸵ Turned Semicolon

U+061C Arabic Letter Mark (Alm)
U+061D Arabic End Of Text Mark
U+061E Arabic Triple Dot Punctuation Mark
U+061F ? Arabic Question Mark

also used with Thaana and Syriac in modern text

- U+003F ? Question Mark

- U+2E2E ⸮ Reversed Question Mark

U+0620 Arabic Letter Kashmiri Yeh
U+0621 Arabic Letter Hamza

- U+02BE ` Modifier Letter Right Half Ring

U+0622 a Arabic Letter Alef With Madda Above

a U+0627 U+0653

U+0623 ' Arabic Letter Alef With Hamza Above

' U+0627 U+0654

U+0624 w' Arabic Letter Waw With Hamza Above

w' U+0648 U+0654

U+0625 Arabic Letter Alef With Hamza Below

U+0627 U+0655

U+0626 y' Arabic Letter Yeh With Hamza Above

in Kyrgyz the hamza is consistently positioned to the top right in isolated and final forms

y' U+064A U+0654

U+0627 Arabic Letter Alef
U+0628 b Arabic Letter Beh
U+0629 @ Arabic Letter Teh Marbuta
U+062A t Arabic Letter Teh
U+062B th Arabic Letter Theh
U+062C j Arabic Letter Jeem
U+062D H Arabic Letter Hah
U+062E kh Arabic Letter Khah
U+062F d Arabic Letter Dal
U+0630 dh Arabic Letter Thal
U+0631 r Arabic Letter Reh
U+0632 z Arabic Letter Zain
U+0633 s Arabic Letter Seen
U+0634 sh Arabic Letter Sheen
U+0635 S Arabic Letter Sad
U+0636 D Arabic Letter Dad
U+0637 T Arabic Letter Tah
U+0638 Z Arabic Letter Zah
U+0639 ` Arabic Letter Ain

- U+01B9 ƹ Latin Small Letter Ezh Reversed

- U+02BF ʿ Modifier Letter Left Half Ring

U+063A G Arabic Letter Ghain
U+063B Arabic Letter Keheh With Two Dots Above
U+063C Arabic Letter Keheh With Three Dots Below
U+063D Arabic Letter Farsi Yeh With Inverted V

Azerbaijani

U+063E Arabic Letter Farsi Yeh With Two Dots Above
U+063F Arabic Letter Farsi Yeh With Three Dots Above
U+0640 Arabic Tatweel

= kashida; inserted to stretch characters or to carry tashkil with no base letter; also used with Adlam, Hanifi Rohingya, Mandaic, Manichaean, Psalter Pahlavi, Sogdian, and Syriac

U+0641 f Arabic Letter Feh
U+0642 q Arabic Letter Qaf
U+0643 k Arabic Letter Kaf
U+0644 l Arabic Letter Lam
U+0645 m Arabic Letter Meem

Sindhi uses a shape with a short tail

U+0646 n Arabic Letter Noon
U+0647 h Arabic Letter Heh
U+0648 w Arabic Letter Waw
U+0649 ~ Arabic Letter Alef Maksura

represents YEH-shaped dual-joining letter with no dots in any positional form; not intended for use in combination with U+0654

- U+0626 y' Arabic Letter Yeh With Hamza Above

U+064A y Arabic Letter Yeh

loses its dots when used in combination with U+0654; retains its dots when used in combination with other combining marks

- U+08A8 Arabic Letter Yeh With Two Dots Below And Hamza Above

U+064B an Arabic Fathatan
U+064C un Arabic Dammatan

a common alternative form is written as two intertwined dammas, one of which is turned 180 degrees

U+064D in Arabic Kasratan
U+064E a Arabic Fatha
U+064F u Arabic Damma
U+0650 i Arabic Kasra
U+0651 W Arabic Shadda
U+0652 Arabic Sukun

marks absence of a vowel after the base consonant; used in some Qurans to mark a long vowel as ignored; can have a variety of shapes, including a circular one and a shape that looks like U+06E1

- U+06E1 Arabic Small High Dotless Head Of Khah

U+0653 Arabic Maddah Above

used for madd jaa'iz in South Asian and Indonesian orthographies

- U+089C Arabic Madda Waajib

- U+089E Arabic Doubled Madda

- U+089F Arabic Half Madda Over Madda

U+0654 ' Arabic Hamza Above

restricted to hamza and ezafe semantics; is not used as a diacritic to form new letters

U+0655 ' Arabic Hamza Below
U+0656 Arabic Subscript Alef
U+0657 Arabic Inverted Damma

Kashmiri, Urdu, Swahili, Somali

U+0658 Arabic Mark Noon Ghunna

Baluchi; indicates nasalization in Urdu

U+0659 Arabic Zwarakay

Pashto

U+065A Arabic Vowel Sign Small V Above

African languages

U+065B Arabic Vowel Sign Inverted Small V Above

African languages

U+065C Arabic Vowel Sign Dot Below

African languages also used in Quranic text in African and other orthographies

U+065D Arabic Reversed Damma

African languages

U+065E Arabic Fatha With Two Dots

Kalami

U+065F Arabic Wavy Hamza Below

Kashmiri

U+0660 0 Arabic-Indic Digit Zero
U+0661 1 Arabic-Indic Digit One
U+0662 2 Arabic-Indic Digit Two
U+0663 3 Arabic-Indic Digit Three
U+0664 4 Arabic-Indic Digit Four
U+0665 5 Arabic-Indic Digit Five
U+0666 6 Arabic-Indic Digit Six
U+0667 7 Arabic-Indic Digit Seven
U+0668 8 Arabic-Indic Digit Eight
U+0669 9 Arabic-Indic Digit Nine
U+066A % Arabic Percent Sign

- U+0025 % Percent Sign

U+066B . Arabic Decimal Separator

the ordinary comma is most commonly used instead

- U+002C, Comma

U+066C , Arabic Thousands Separator

the Arabic comma is most commonly used instead

- U+060C , Arabic Comma

- U+0027 ' Apostrophe

- U+2019 ' Right Single Quotation Mark

U+066D * Arabic Five Pointed Star

appearance rather variable

- U+002A * Asterisk

U+066E Arabic Letter Dotless Beh
U+066F Arabic Letter Dotless Qaf
U+0670 Arabic Letter Superscript Alef
U+0671 ' Arabic Letter Alef Wasla

Quranic Arabic

U+0672 ' Arabic Letter Alef With Wavy Hamza Above

Baluchi, Kashmiri

U+0673 ' Arabic Letter Alef With Wavy Hamza Below (deprecated)[6]

Kashmiri; this character is deprecated and its use is strongly discouraged; the sequence U+0627 U+065F is the preferred way of encoding this character.

U+0674 Arabic Letter High Hamza

Kazakh, Jawi; forms digraphs

U+0675 ' Arabic Letter High Hamza Alef

preferred spelling is U+0674 U+0627

U+0676 'w Arabic Letter High Hamza Waw

preferred spelling is U+0674 U+0648

U+0677 'u Arabic Letter U With Hamza Above

preferred spelling is U+0674 U+06C7

U+0678 'y Arabic Letter High Hamza Yeh

preferred spelling is U+0674 U+06CC

U+0679 tt Arabic Letter Tteh

Urdu

U+067A tth Arabic Letter Tteheh

Sindhi

U+067B b Arabic Letter Beeh

Sindhi

U+067C t Arabic Letter Teh With Ring

Pashto

U+067D T Arabic Letter Teh With Three Dots Above Downwards

Sindhi

U+067E p Arabic Letter Peh

Persian, Urdu, ...

U+067F th Arabic Letter Teheh

Sindhi

U+0680 bh Arabic Letter Beheh

Sindhi

U+0681 'h Arabic Letter Hah With Hamza Above

Pashto, Sarikoli; represents the phoneme /dz/

U+0682 H Arabic Letter Hah With Two Dots Vertical Above

not used in modern Pashto

U+0683 ny Arabic Letter Nyeh

Sindhi

U+0684 dy Arabic Letter Dyeh

Sindhi, historically Bosnian

U+0685 H Arabic Letter Hah With Three Dots Above

Pashto, Khwarazmian, Sarikoli; represents the phoneme /ts/ in Pashto

U+0686 ch Arabic Letter Tcheh

Persian, Urdu, ...

U+0687 cch Arabic Letter Tcheheh

Sindhi

U+0688 dd Arabic Letter Ddal

Urdu

U+0689 D Arabic Letter Dal With Ring

Pashto

U+068A D Arabic Letter Dal With Dot Below

Sindhi, early Persian, Pegon, Malagasy

U+068B Dt Arabic Letter Dal With Dot Below And Small Tah

Lahnda

U+068C dh Arabic Letter Dahal

Sindhi

U+068D ddh Arabic Letter Ddahal

Sindhi

U+068E d Arabic Letter Dul

older shape for DUL, now obsolete in Sindhi; Burushaski

U+068F D Arabic Letter Dal With Three Dots Above Downwards

Sindhi; current shape used for DUL

U+0690 D Arabic Letter Dal With Four Dots Above

Old Urdu, not in current use

U+0691 rr Arabic Letter Rreh

Urdu

U+0692 R Arabic Letter Reh With Small V

Kurdish

U+0693 R Arabic Letter Reh With Ring

Pashto

U+0694 R Arabic Letter Reh With Dot Below

Kurdish, early Persian

U+0695 R Arabic Letter Reh With Small V Below

Kurdish

U+0696 R Arabic Letter Reh With Dot Below And Dot Above

Pashto

U+0697 R Arabic Letter Reh With Two Dots Above

Dargwa

U+0698 j Arabic Letter Jeh

Persian, Urdu, ...

U+0699 R Arabic Letter Reh With Four Dots Above

Sindhi

U+069A S Arabic Letter Seen With Dot Below And Dot Above

Pashto

U+069B S Arabic Letter Seen With Three Dots Below

early Persian

U+069C S Arabic Letter Seen With Three Dots Below And Three Dots Above

Moroccan Arabic

U+069D S Arabic Letter Sad With Two Dots Below

Turkic

U+069E S Arabic Letter Sad With Three Dots Above

Berber, Burushaski

U+069F T Arabic Letter Tah With Three Dots Above

Old Hausa

U+06A0 GH Arabic Letter Ain With Three Dots Above

Jawi

U+06A1 F Arabic Letter Dotless Feh

Adighe

U+06A2 F Arabic Letter Feh With Dot Moved Below

Maghrib Arabic

U+06A3 F Arabic Letter Feh With Dot Below

Ingush

U+06A4 v Arabic Letter Veh

Middle Eastern Arabic for foreign words; Kurdish, Khwarazmian, early Persian, Jawi

U+06A5 f Arabic Letter Feh With Three Dots Below

North African Arabic for foreign words

U+06A6 ph Arabic Letter Peheh

Sindhi

U+06A7 Q Arabic Letter Qaf With Dot Above

Maghrib Arabic, Uyghur

U+06A8 Q Arabic Letter Qaf With Three Dots Above

Tunisian and Algerian Arabic

U+06A9 kh Arabic Letter Keheh

= kaf mashkula; Persian, Urdu, Sindhi, ...

U+06AA k Arabic Letter Swash Kaf

represents a letter distinct from Arabic KAF (0643) in Sindhi

U+06AB K Arabic Letter Kaf With Ring

Pashto; may appear like an Arabic KAF (U+0643) with a ring below the base

U+06AC K Arabic Letter Kaf With Dot Above

use for the Jawi gaf is not recommended, although it may be found in some existing text data; recommended character for Jawi gaf is U+0762

- U+0762 Arabic Letter Keheh With Dot Above

U+06AD ng Arabic Letter Ng

Uyghur, Kazakh, Moroccan Arabic, early Jawi, early Persian, ...

U+06AE K Arabic Letter Kaf With Three Dots Below

Berber, early Persian; Pegon alternative for U+08B4

U+06AF g Arabic Letter Gaf

Persian, Urdu, ...

U+06B0 G Arabic Letter Gaf With Ring

Lahnda

U+06B1 N Arabic Letter Ngoeh

Sindhi

U+06B2 G Arabic Letter Gaf With Two Dots Below

not used in Sindhi

U+06B3 G Arabic Letter Gueh

Sindhi, Saraiki

U+06B4 G Arabic Letter Gaf With Three Dots Above

not used in Sindhi, Karakalpak

U+06B5 L Arabic Letter Lam With Small V

Kurdish, historically Bosnian

U+06B6 L Arabic Letter Lam With Dot Above

Kurdish

U+06B7 L Arabic Letter Lam With Three Dots Above

Kurdish

U+06B8 L Arabic Letter Lam With Three Dots Below

Avar, Soqotri

U+06B9 N Arabic Letter Noon With Dot Below
U+06BA N Arabic Letter Noon Ghunna

Urdu, archaic Arabic; dotless in all four contextual forms

U+06BB N Arabic Letter Rnoon

Sindhi; dotless in all four contextual forms

U+06BC N Arabic Letter Noon With Ring

Pashto

U+06BD N Arabic Letter Noon With Three Dots Above

Jawi

U+06BE h Arabic Letter Heh Doachashmee

forms aspirate digraphs in Urdu and other languages of South Asia; represents the glottal fricative /h/ in Uyghur

U+06BF Ch Arabic Letter Tcheh With Dot Above
U+06C0 hy Arabic Letter Heh With Yeh Above

= Arabic letter hamzah on ha (1.0); for ezafe, use U+0654 over the language-appropriate base letter; actually a ligature, not an independent letter

hy U+06D5 U+0654

U+06C1 h Arabic Letter Heh Goal

Urdu

U+06C2 H Arabic Letter Heh Goal With Hamza Above

Urdu; actually a ligature, not an independent letter

H U+06C1 U+0654

U+06C3 @ Arabic Letter Teh Marbuta Goal

Urdu

U+06C4 W Arabic Letter Waw With Ring

Kashmiri

U+06C5 oe Arabic Letter Kirghiz Oe

Kyrgyz; a glyph variant occurs which replaces the looped tail with a horizontal bar through the tail

U+06C6 oe Arabic Letter Oe

Uyghur, Kurdish, Kazakh, Azerbaijani, historically Bosnian

U+06C7 u Arabic Letter U

Azerbaijani, Kazakh, Kyrgyz, Uyghur

U+06C8 yu Arabic Letter Yu

Uyghur

U+06C9 yu Arabic Letter Kirghiz Yu

Kazakh, Kyrgyz, historically Bosnian

U+06CA W Arabic Letter Waw With Two Dots Above

Kurdish

U+06CB v Arabic Letter Ve

Uyghur, Kazakh

U+06CC y Arabic Letter Farsi Yeh

Arabic, Persian, Urdu, Kashmiri, ...; initial and medial forms of this letter have dots

- U+0649 ~ Arabic Letter Alef Maksura

- U+064A y Arabic Letter Yeh

U+06CD Y Arabic Letter Yeh With Tail

Pashto, Sindhi

U+06CE Y Arabic Letter Yeh With Small V

Kurdish

U+06CF W Arabic Letter Waw With Dot Above

Jawi

U+06D0 Arabic Letter E

Pashto, Uyghur; used as the letter bbeh in Sindhi

U+06D1 Arabic Letter Yeh With Three Dots Below

Mende languages, Hausa

U+06D2 y Arabic Letter Yeh Barree

Urdu

U+06D3 y' Arabic Letter Yeh Barree With Hamza Above

Urdu

U+06D4 . Arabic Full Stop

Urdu

U+06D5 ae Arabic Letter Ae

Uyghur, Kazakh, Kyrgyz

U+06D6 Arabic Small High Ligature Sad With Lam With Alef Maksura
U+06D7 Arabic Small High Ligature Qaf With Lam With Alef Maksura
U+06D8 Arabic Small High Meem Initial Form
U+06D9 Arabic Small High Lam Alef
U+06DA Arabic Small High Jeem
U+06DB Arabic Small High Three Dots
U+06DC Arabic Small High Seen
U+06DD @ Arabic End of Ayah
U+06DE # Arabic Start of Rub El Hizb
U+06DF Arabic Small High Rounded Zero

smaller than the typical circular shape used for 0652

U+06E0 Arabic Small High Upright Rectangular Zero

the term "rectangular zero" is a translation of the Arabic name of this sign

U+06E1 Arabic Small High Dotless Head Of Khah

= Arabic jazm; presentation form of U+0652, using font technology to select the variant is preferred; used in some Qurans to mark absence of a vowel

- U+0652 Arabic Sukun

U+06E2 Arabic Small High Meem Isolated Form
U+06E3 Arabic Small Low Seen
U+06E4 Arabic Small High Madda

typically used with 06E5, 06E6, 06E7, and 08F3

U+06E5 Arabic Small Waw

- U+08D3 Arabic Small Low Waw

- U+08F3 Arabic Small High Waw

U+06E6 Arabic Small Yeh
U+06E7 Arabic Small High Yeh
U+06E8 Arabic Small High Noon
U+06E9 ^ Arabic Place Of Sajdah

there is a range of acceptable glyphs for this character

U+06EA Arabic Empty Centre Low Stop
U+06EB Arabic Empty Centre High Stop
U+06EC Arabic Rounded High Stop With Filled Centre

also used in Quranic text in African and other orthographies to represent wasla, ikhtilas, etc.

U+06ED Arabic Small Low Meem
U+06EE Arabic Letter Dal With Inverted V
U+06EF Arabic Letter Reh With Inverted V

also used in early Persian

U+06F0 0 Extended Arabic-Indic Digit Zero
U+06F1 1 Extended Arabic-Indic Digit One
U+06F2 2 Extended Arabic-Indic Digit Two
U+06F3 3 Extended Arabic-Indic Digit Three
U+06F4 4 Extended Arabic-Indic Digit Four

Persian has a different glyph than Sindhi and Urdu

U+06F5 5 Extended Arabic-Indic Digit Five

Persian, Sindhi, and Urdu share glyph different from Arabic

U+06F6 6 Extended Arabic-Indic Digit Six

Persian, Sindhi, and Urdu have glyphs different from Arabic

U+06F7 7 Extended Arabic-Indic Digit Seven

Urdu and Sindhi have glyphs different from Arabic

U+06F8 8 Extended Arabic-Indic Digit Eight
U+06F9 9 Extended Arabic-Indic Digit Nine
U+06FA Sh Arabic Letter Sheen With Dot Below
U+06FB D Arabic Letter Dad With Dot Below
U+06FC Gh Arabic Letter Ghain With Dot Below
U+06FD & Arabic Sign Sindhi Ampersand
U+06FE +m Arabic Sign Sindhi Postposition Men
U+06FF Arabic Letter Heh With Inverted V
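Several rows in the table above pair a precomposed letter with a two-code-point sequence (for example, U+0622 with U+0627 U+0653). The Python sketch below checks, using the standard `unicodedata` module, that these pairs are canonical equivalents: NFD splits the precomposed letter into the sequence and NFC recombines it. `PRECOMPOSED` and `is_canonical_pair` are hypothetical names for this example, and the result depends on the Unicode data bundled with the Python build.

```python
import unicodedata

# Precomposed letter -> sequence listed beside it in the table above.
PRECOMPOSED = {
    "\u0622": "\u0627\u0653",  # alef with madda above
    "\u0623": "\u0627\u0654",  # alef with hamza above
    "\u0624": "\u0648\u0654",  # waw with hamza above
    "\u0625": "\u0627\u0655",  # alef with hamza below
    "\u0626": "\u064A\u0654",  # yeh with hamza above
    "\u06C0": "\u06D5\u0654",  # heh with yeh above
    "\u06C2": "\u06C1\u0654",  # heh goal with hamza above
}


def is_canonical_pair(composed, sequence):
    """True if NFD(composed) == sequence and NFC(sequence) == composed."""
    return (
        unicodedata.normalize("NFD", composed) == sequence
        and unicodedata.normalize("NFC", sequence) == composed
    )


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

In practice this means that comparing Arabic strings for equality is more reliable after normalizing both sides to the same form, as `same_text` does.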

Compact table

Arabic[1][2]
Official Unicode Consortium code chart (PDF)
[Code chart grid: rows U+060x to U+06Fx, columns 0 to F]
Notes
1.^ As of Unicode version 17.0
2.^ Unicode code point U+0673 is deprecated as of Unicode version 6.0
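Because the deprecated U+0673 is not canonically equivalent to its preferred sequence, normalization will not rewrite it; any migration has to be done explicitly. The following Python sketch shows one possible way, using the replacement sequence U+0627 U+065F given in the character table. `DEPRECATED_REPLACEMENTS` and `replace_deprecated` are hypothetical names for this example.

```python
import unicodedata

# Deprecated character -> preferred sequence, as stated in the character table.
DEPRECATED_REPLACEMENTS = str.maketrans({
    "\u0673": "\u0627\u065F",  # alef + wavy hamza below
})


def replace_deprecated(text):
    """Rewrite the deprecated U+0673 as the sequence U+0627 U+065F."""
    return text.translate(DEPRECATED_REPLACEMENTS)


def survives_normalization(ch):
    """True if neither NFC nor NFKC changes the character."""
    return (
        unicodedata.normalize("NFC", ch) == ch
        and unicodedata.normalize("NFKC", ch) == ch
    )
```

The second helper illustrates why the explicit mapping is needed: under the stated assumption that U+0673 has no decomposition in the bundled Unicode data, both normalization forms leave it untouched.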

Arabic Supplement

Arabic Supplement[1]
Official Unicode Consortium code chart (PDF)
[Code chart grid: rows U+075x to U+077x, columns 0 to F]
Notes
1.^ As of Unicode version 17.0

Arabic Extended-B

Arabic Extended-B[1][2]
Official Unicode Consortium code chart (PDF)
[Code chart grid: rows U+087x to U+089x, columns 0 to F]
Notes
1.^ As of Unicode version 17.0
2.^ Grey areas indicate non-assigned code points

Arabic Extended-A

Arabic Extended-A[1]
Official Unicode Consortium code chart (PDF)
[Code chart grid: rows U+08Ax to U+08Fx, columns 0 to F]
Notes
1.^ As of Unicode version 17.0

Arabic Presentation Forms-A

Most of these characters are ligatures that can be composed from characters in the preceding charts; the exceptions are the bracket-like graphemes. Some are ligatures of common liturgical phrases.

Arabic Presentation Forms-A[1][2]
Official Unicode Consortium code chart (PDF)
[Code chart grid: rows U+FB5x to U+FDFx, columns 0 to F]
Notes
1.^ As of Unicode version 17.0
2.^ Black areas indicate noncharacters (code points that are guaranteed never to be assigned as encoded characters in the Unicode Standard)
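The noncharacters mentioned in the note can be detected programmatically. The Python sketch below assumes the set of noncharacters defined by the Unicode Standard: the contiguous range U+FDD0 to U+FDEF inside this block, plus the last two code points of every plane. `is_noncharacter` is a hypothetical helper name.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF


def is_noncharacter(cp):
    """True if the code point is one of the 66 Unicode noncharacters."""
    if not 0 <= cp <= MAX_CODE_POINT:
        return False
    # The 32 noncharacters inside Arabic Presentation Forms-A.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of each plane (U+xxFFFE and U+xxFFFF).
    return (cp & 0xFFFE) == 0xFFFE


def count_noncharacters():
    """Count noncharacters across the whole code space."""
    return sum(1 for cp in range(MAX_CODE_POINT + 1) if is_noncharacter(cp))
```

Python reports these code points with general category `Cn` (unassigned), so `unicodedata.category(chr(0xFDD0))` cannot by itself distinguish a noncharacter from a reserved but assignable code point; an explicit range test like the one above is needed.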

Arabic Presentation Forms-B

These can all be created from the basic chart's characters.

Arabic Presentation Forms-B[1][2]
Official Unicode Consortium code chart (PDF)
[Code chart grid: rows U+FE7x to U+FEFx, columns 0 to F; the last cell, U+FEFF, is ZERO WIDTH NO-BREAK SPACE (ZWNBSP)]
Notes
1.^ As of Unicode version 17.0
2.^ Grey areas indicate non-assigned code points

Rumi Numeral Symbols

Rumi Numeral Symbols[1][2]
Official Unicode Consortium code chart (PDF)
[Code chart grid: rows U+10E6x to U+10E7x, columns 0 to F]
Notes
1.^ As of Unicode version 17.0
2.^ Grey area indicates non-assigned code point

Arabic Extended-C

Arabic Extended-C[1][2]
Official Unicode Consortium code chart (PDF)
[Code chart grid: rows U+10ECx to U+10EFx, columns 0 to F]
Notes
1.^ As of Unicode version 17.0
2.^ Grey areas indicate non-assigned code points

Indic Siyaq Numbers

Indic Siyaq Numbers[1][2]
Official Unicode Consortium code chart (PDF)
[Code chart grid: rows U+1EC7x to U+1ECBx, columns 0 to F]
Notes
1.^ As of Unicode version 17.0
2.^ Grey areas indicate non-assigned code points

Ottoman Siyaq Numbers

Ottoman Siyaq Numbers[1][2]
Official Unicode Consortium code chart (PDF)
[Code chart grid: rows U+1ED0x to U+1ED4x, columns 0 to F]
Notes
1.^ As of Unicode version 17.0
2.^ Grey areas indicate non-assigned code points

Arabic Mathematical Alphabetic Symbols

Arabic Mathematical Alphabetic Symbols[1][2]
Official Unicode Consortium code chart (PDF)
[Code chart grid: rows U+1EE0x to U+1EEFx, columns 0 to F]
Notes
1.^ As of Unicode version 17.0
2.^ Grey areas indicate non-assigned code points

References

  1. ^ "What is the origin of the ampersand (&)?"
  2. ^ "UAX #24: Script data file". Unicode Character Database. The Unicode Consortium.
  3. ^ a b "Arabic, Arabic Presentation Forms-B". The Unicode Standard. The Unicode Consortium. September 2025.
  4. ^ Pandey, Anshuman (2015-11-05). "L2/15-121R2: Proposal to Encode Indic Siyaq Numbers" (PDF).
  5. ^ a b "Chapter 22: Symbols". Unicode, Inc. September 2024.
  6. ^ Deprecated as of Unicode version 6.0 (UCD Change History). "The particular combination of an alef with this vowel mark should be written with the sequence [U+0627 U+065F], rather than with the character U+0673 ARABIC LETTER ALEF WITH WAVY HAMZA BELOW, which has been deprecated and which is not canonically equivalent." "Section 9.2: Arabic, Additional Vowel Marks". The Unicode Standard. The Unicode Consortium. September 2025.