unicode-org/unihan-database

Unihan Database

This repository is for expert review of draft changes, removals, and additions to the Unihan database.

Each provisional Unihan database property currently being worked on has its own data file. At the moment, these are:

Additional files included are:

  • AlternateRadicals.txt
  • CantoneseLookup.txt

AlternateRadicals.txt lists characters that could reasonably be looked up under more than one radical-stroke value in a radical-stroke index such as Unicode's. Instances where the radical is the same and the stroke counts differ only slightly are excluded. For easier editing, the characters for the radicals are generally shown, e.g.

U+61D5 Yan 61.14 27.16

In all cases, the first value should be considered the standard value as defined in UAX #38.

Simplified radicals are not indicated.
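As a rough illustration of how such an entry could be read, the following Python sketch pulls the radical-stroke values out of one line and treats the first as the standard value. The function name `parse_alternate_radicals` and the regular expression are hypothetical and not part of this repository; the sketch assumes each value has the form radical.strokes and simply ignores any other tokens on the line, such as the ideograph or radical characters.

```python
import re
from typing import NamedTuple

# Assumption: every radical-stroke value looks like "61.14" (radical.strokes).
RS_PATTERN = re.compile(r"\b(\d{1,3})\.(\d{1,2})\b")


class RadicalStroke(NamedTuple):
    radical: int
    strokes: int


def parse_alternate_radicals(line: str) -> tuple[int, list[RadicalStroke]]:
    """Return the code point and its radical-stroke values, standard value first."""
    tokens = line.split()
    if not tokens or not tokens[0].startswith("U+"):
        raise ValueError(f"line does not start with a code point: {line!r}")
    code_point = int(tokens[0][2:], 16)
    # Collect every radical.strokes token in the order it appears on the line.
    values = [RadicalStroke(int(r), int(s)) for r, s in RS_PATTERN.findall(line)]
    if not values:
        raise ValueError(f"no radical-stroke values found: {line!r}")
    return code_point, values


# Build the sample line from the code point so the ideograph is always consistent.
sample = f"U+61D5 {chr(0x61D5)} 61.14 27.16"
code_point, values = parse_alternate_radicals(sample)
standard, alternates = values[0], values[1:]
```

Here `standard` holds the first value (radical 61, 14 strokes) and `alternates` holds the remaining lookup candidates.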

CantoneseLookup.txt is an aid to editors of the kCantonese property, and includes ideographs for which a Cantonese reading is known to exist, but whose Cantonese reading has not been confirmed by an authoritative source.

Changes to properties that are not provisional require UTC approval. To request such a change, prepare and submit a proposal or submit feedback via the Contact Form; do not submit a pull request or create a new issue in this repository.

The format for the data files in this repository is almost exactly as in the Unihan database, using tabs to delimit the three fields, but with the actual ideograph following the code point in the first column to ease review. For example:

U+4E95 井	kCantonese	zeng2
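The format described above can be sketched in Python as follows. This is a minimal illustration, not tooling from this repository: the names `UnihanRecord` and `parse_review_line` are hypothetical, and the sketch assumes a single space separates the code point from the ideograph within the first tab-delimited field.

```python
from typing import NamedTuple


class UnihanRecord(NamedTuple):
    code_point: int
    ideograph: str
    prop: str
    value: str


def parse_review_line(line: str) -> UnihanRecord:
    """Parse one tab-delimited line whose first field is 'U+XXXX <ideograph>'."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-delimited fields, got {len(fields)}")
    first, prop, value = fields
    # Assumption: one space between the code point and the ideograph.
    code_token, _, ideograph = first.partition(" ")
    if not code_token.startswith("U+") or not ideograph:
        raise ValueError(f"malformed first field: {first!r}")
    code_point = int(code_token[2:], 16)
    # The ideograph is there to ease review, so check it matches the code point.
    if chr(code_point) != ideograph:
        raise ValueError(f"{code_token} does not match {ideograph!r}")
    return UnihanRecord(code_point, ideograph, prop, value)


record = parse_review_line("U+4E95 \u4e95\tkCantonese\tzeng2")
```

Checking the ideograph against the code point is a design choice of this sketch; it catches lines where one of the two was edited without the other.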

Please use the #unihan channel in the Unicode Consortium's Slack organization for general discussions, or for requesting that other property data files be added to this repository.

Please use the #cantonese channel in the Unicode Consortium's Slack organization for discussions regarding kCantonese property values.

Copyright & Licenses

Copyright (c) 2021-2025 Unicode, Inc. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.

A CLA is required to contribute to this project - please refer to the CONTRIBUTING.md file (or start a Pull Request) for more information.

The contents of this repository are governed by the Unicode Terms of Use and are released under LICENSE.
