The HZ encoding was invented to facilitate the use of Chinese characters in e-mail, which at that time allowed only 7-bit characters. Therefore, in lieu of standard ISO 2022 escape sequences or 8-bit characters, the HZ code uses only printable, 7-bit characters to represent Chinese characters.
It was also popular on USENET, which in the late 1980s and early 1990s generally did not allow transmission of 8-bit characters or escape characters.
Structure and use
In the HZ encoding system, the character sequences "~{" and "~}" act as escape sequences; anything between them is interpreted as Chinese encoded in GB2312. Outside the escape sequences, characters are assumed to be ASCII.
An example will help illustrate the relationship between GB2312, EUC-CN, and the HZ code:
{| border=1 cellpadding=4 style="border-collapse: collapse;"
|+ Various forms of the GB2312 code for the character "一"
|---
! Form || Code || With escape sequences || Remarks
|---
| Kuten / Qūwèi / 区位 form || 5027 || — || Zone 50, point 27
|---
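| ISO 2022 form || 52<sub>16</sub> 3B<sub>16</sub> || 0E<sub>16</sub> 52<sub>16</sub> 3B<sub>16</sub> 0F<sub>16</sub> || 50 + 32 = 82 = 52<sub>16</sub>
|---
| EUC-CN form || D2<sub>16</sub> BB<sub>16</sub> || D2<sub>16</sub> BB<sub>16</sub> || 52<sub>16</sub> ∨ 80<sub>16</sub> = D2<sub>16</sub>
|---
| HZ form || 52<sub>16</sub> 3B<sub>16</sub> || 7E<sub>16</sub> 7B<sub>16</sub> 52<sub>16</sub> 3B<sub>16</sub> 7E<sub>16</sub> 7D<sub>16</sub> || Appears as ~{R;~} without HZ decoder
|---
| HZ form || D2<sub>16</sub> BB<sub>16</sub> || 7E<sub>16</sub> 7B<sub>16</sub> D2<sub>16</sub> BB<sub>16</sub> 7E<sub>16</sub> 7D<sub>16</sub> || EUC form acceptable to at least some decoders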
|}
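The arithmetic in the table can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `kuten_to_forms` and the dictionary keys are made up for this example, and the final check relies on Python's built-in `gb2312` codec.

```python
def kuten_to_forms(zone: int, point: int) -> dict:
    """Derive the byte forms of a GB2312 character from its zone and point."""
    # ISO 2022 form: add 32 (0x20) to both the zone and the point.
    iso2022 = bytes([zone + 32, point + 32])
    # EUC-CN form: set the high bit (OR with 0x80) on both bytes.
    euc_cn = bytes(b | 0x80 for b in iso2022)
    # HZ form: the 7-bit bytes wrapped in "~{" and "~}".
    hz = b"~{" + iso2022 + b"~}"
    return {"iso2022": iso2022, "euc_cn": euc_cn, "hz": hz}


forms = kuten_to_forms(50, 27)      # zone 50, point 27
print(forms["iso2022"].hex(" "))    # 52 3b
print(forms["euc_cn"].hex(" "))     # d2 bb
print(forms["hz"])                  # b'~{R;~}'
print(forms["euc_cn"].decode("gb2312"))
```

The last line decodes the EUC-CN bytes back to the character shown in the table caption.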
HZ was originally designed to be used purely as a 7-bit code. However, when circumstances allow, the escape sequences "~{" and "~}" sometimes surround characters represented in EUC-CN; this alternative use allows the Chinese text to be read either with HZ decoder software or on a system that understands EUC-CN.
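The two variants can be compared with a small encoder sketch. It is not taken from any particular HZ implementation: the name `hz_encode` and its `eight_bit` flag are hypothetical, and the GB2312 bytes come from Python's built-in `gb2312` codec. A literal "~" in the ASCII text is written as "~~".

```python
def hz_encode(text: str, eight_bit: bool = False) -> bytes:
    """Encode text as HZ; with eight_bit=True, keep EUC-CN bytes inside the escapes."""
    out = bytearray()
    in_gb = False  # True while we are between "~{" and "~}"
    for ch in text:
        if ord(ch) < 0x80:
            if in_gb:
                out += b"~}"          # leave GB mode before emitting ASCII
                in_gb = False
            # A literal tilde is doubled so it is not mistaken for an escape.
            out += b"~~" if ch == "~" else ch.encode("ascii")
        else:
            # Raises UnicodeEncodeError if the character is not in GB2312.
            euc = ch.encode("gb2312")
            if not in_gb:
                out += b"~{"          # enter GB mode
                in_gb = True
            # Pure 7-bit HZ clears the high bit; the 8-bit variant keeps it.
            out += euc if eight_bit else bytes(b & 0x7F for b in euc)
    if in_gb:
        out += b"~}"                  # always finish in ASCII mode
    return bytes(out)


print(hz_encode("一"))                   # b'~{R;~}'
print(hz_encode("一", eight_bit=True))   # b'~{\xd2\xbb~}'
```

With `eight_bit=True` the bytes between the escapes are ordinary EUC-CN, which is why such text stays legible on a system that understands EUC-CN but has no HZ decoder.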
Additionally, the specification defines that
* the sequence "~~" is to be treated as encoding a single ASCII "~"
* the character "~" followed by a newline is a line-continuation marker, and both characters are to be discarded.
However, not all HZ decoders follow these two rules.
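A decoder that does follow both rules might look like the sketch below. It is a minimal illustration rather than a reference implementation: `hz_decode` is a hypothetical name, error handling is deliberately thin, and the GB2312 lookup is delegated to Python's built-in `gb2312` codec. Setting the high bit on every byte inside the escapes lets the same code accept both the 7-bit form and the EUC-CN form.

```python
def hz_decode(data: bytes) -> str:
    """Decode HZ-encoded bytes, honouring the "~~" and "~<newline>" rules."""
    out = []
    i, n = 0, len(data)
    in_gb = False  # True while we are between "~{" and "~}"
    while i < n:
        b = data[i]
        nxt = data[i + 1] if i + 1 < n else None
        if not in_gb:
            if b == 0x7E and nxt == 0x7B:      # "~{" enters GB mode
                in_gb = True
                i += 2
            elif b == 0x7E and nxt == 0x7E:    # "~~" is a single "~"
                out.append("~")
                i += 2
            elif b == 0x7E and nxt == 0x0A:    # "~" + newline is discarded
                i += 2
            else:
                out.append(chr(b))             # assumed to be ASCII; not validated
                i += 1
        else:
            if b == 0x7E and nxt == 0x7D:      # "~}" returns to ASCII mode
                in_gb = False
                i += 2
                continue
            if nxt is None:
                raise ValueError("truncated two-byte character")
            # OR with 0x80 maps the 7-bit form to EUC-CN and leaves EUC-CN unchanged.
            euc = bytes([b | 0x80, nxt | 0x80])
            out.append(euc.decode("gb2312"))
            i += 2
    return "".join(out)


print(hz_decode(b"Hi ~{R;~}!"))      # 7-bit form
print(hz_decode(b"~{\xd2\xbb~}"))    # EUC-CN form inside the escapes
print(hz_decode(b"a~~b, c~\nd"))     # a~b, cd
```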
HZ decoders
The first HZ decoder was written in 1989 by the code's inventor for the Unix operating system.
The hztty program, also for Unix, was among the first and most popular HZ decoders. It deviates from the specification in that it displays the escape sequences "~{" and "~}", and it does not treat "~~" or "~" followed by a newline specially. This was probably to allow software that assumes one character occupies one screen position to function correctly without modification.
Support on Microsoft Windows came later, and a number of third-party "Chinese systems" support HZ. These systems may provide an option to hide the escape sequences.