
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems.

Code Points in SC and TC

Characters that are written the same way in Simplified Chinese (SC) and Traditional Chinese (TC) share a single unified code point, while characters that are written differently are assigned separate code points. For example:

中华 / 中華 - 中: U+4E2D, 华: U+534E, 華: U+83EF. Note: 中 has only one code point.
汉字 / 漢字 - 汉: U+6C49, 漢: U+6F22, 字: U+5B57. Note: 字 has only one code point.
学习 / 學習 - 学: U+5B66, 學: U+5B78, 习: U+4E60, 習: U+7FD2.
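
The code points listed above can be checked with a few lines of Python. This is only a minimal sketch; the helper name `code_points` is made up for illustration and is not part of any library.

```python
# Hypothetical helper: list the code point of every character in a string.
def code_points(text):
    """Return a list of 'U+XXXX' strings, one per character in text."""
    return [f"U+{ord(ch):04X}" for ch in text]

# SC / TC word pairs taken from the examples above.
pairs = [("中华", "中華"), ("汉字", "漢字"), ("学习", "學習")]

for sc, tc in pairs:
    # Characters shared by both forms (such as 中 and 字) show the same code point.
    print(sc, code_points(sc), "/", tc, code_points(tc))
```

Running it should show identical values for the shared characters and different values where the SC and TC forms differ.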

Heteronyms (多音字)

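A character with more than one pronunciation has only one code point in Unicode. For example, whether you type dan1 or shan4 in a Chinese input method, the resulting 单 has the same code point, U+5355.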

Other examples:

  • 朝 : chao2 / zhao1
  • 单 : dan1 / shan4
  • 仇 : chou2 / qiu2
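
As a rough illustration, the sketch below pairs each character from the list with its readings and prints its single code point. The `readings` table and the `code_point` helper are assumptions made for this example; Python's `ord` knows nothing about pronunciation, which is the point being shown.

```python
# Illustrative table (not from any library): character -> pinyin readings listed above.
readings = {
    "朝": ["chao2", "zhao1"],
    "单": ["dan1", "shan4"],
    "仇": ["chou2", "qiu2"],
}

def code_point(ch):
    """Return the 'U+XXXX' form of a single character."""
    return f"U+{ord(ch):04X}"

for ch, pinyin in readings.items():
    # One character, one code point, however many readings it has.
    print(ch, code_point(ch), "readings:", ", ".join(pinyin))
```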

Online Tools

  1. Escaped Unicode, Decimal NCRs, Hexadecimal NCRs, UTF-8 Converter -- recommended.
  2. A visual Unicode database -- for lookup or browsing.
  3. Unicode character inspector -- for lookup; it also reports duplicated characters in the input.
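
The notations handled by the first tool can also be produced locally. The following is a minimal sketch using only the Python standard library; the function name `representations` and its dictionary keys are hypothetical, and the output format is not meant to match any particular tool.

```python
def representations(ch):
    """Return common textual forms of one character (keys are illustrative)."""
    cp = ord(ch)
    return {
        # Escaped Unicode, e.g. a backslash-u sequence for BMP characters
        "escaped": ch.encode("unicode_escape").decode("ascii"),
        # Decimal numeric character reference (NCR)
        "decimal_ncr": f"&#{cp};",
        # Hexadecimal numeric character reference
        "hex_ncr": f"&#x{cp:X};",
        # UTF-8 bytes as space-separated hex pairs
        "utf8_hex": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
    }

for name, value in representations("中").items():
    print(f"{name}: {value}")
```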

TBD - To be done

See Also

Last modified on May 22, 2015, 1:35:06 PM