Chinese language

Saturday, October 4, 2008

Han unification

Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the so-called CJK languages into a single set of unified characters. Han characters are a common feature of written Chinese, Japanese, Korean, and Cantonese (in Hong Kong), and, at least historically, of other East and Southeast Asian languages.

Modern Chinese, Korean, and Japanese typefaces typically use regional or historical variants of a given Han character. In the formulation of Unicode, an attempt was made to unify these variants by considering them different glyphs representing the same "grapheme", or orthographic unit. Hence the name "Han unification". The resulting character repertoire is sometimes contracted to Unihan.

Unihan can also refer to the Unihan Database maintained by the Unicode Consortium. It provides information about all of the unified Han characters encoded in the Unicode standard, including mappings to various national and industry standards, indices into standard dictionaries, encoded variants, pronunciations in various languages, and an English definition. The database is available to the public as text files and via an interactive website. The website also includes representative glyphs and definitions for compound words drawn from the free Japanese EDICT and Chinese CEDICT dictionary projects.

Rationale and controversy

Rules for Han unification are given in the East Asian Scripts chapter of the various versions of the Unicode Standard. The Ideographic Rapporteur Group, made up of experts from the Chinese-speaking countries, North and South Korea, Japan, Vietnam, and other countries, is responsible for the process.

One possible rationale is the desire to limit the size of the full Unicode character set, where CJK characters represented as discrete ideograms may approach or exceed 100,000. An article on IBM developerWorks attempts to illustrate part of the motivation for Han unification:

In fact, the three ideographs for "one" are encoded separately in Unicode, as they are not considered national variants. The first and second are used on financial instruments to prevent tampering, while the third is the common form in all three countries.

However, Han unification has also caused considerable controversy, particularly among the Japanese public, who, with the nation's literati, have a history of protesting the culling of historically and culturally significant variants.

Since the Unihan standard encodes "graphemes", not "glyphs", the graphical artifacts produced by Unicode have been considered temporary technical hurdles and, at most, cosmetic. However, again particularly in Japan, the inability to specify a particular variant is considered a significant obstacle to the use of Unicode in scholarly work. This is due in part to the way Chinese characters were historically incorporated into Japanese writing systems. For example, the unification of "grass" means that a historical text cannot be encoded so as to preserve its peculiar orthography. Instead, the scholar would have to locate the desired glyph in a specific typeface in order to convey the text as written, which defeats the purpose of a unified character set.

Small differences in graphical representation are also problematic when they affect legibility or belong to the wrong cultural tradition. They can make some Unicode fonts unusable for texts involving multiple "Unihan languages", and they can cause names or other orthographically sensitive terminology to be displayed incorrectly. This may be considered primarily a graphical representation or rendering problem, to be overcome by more artful fonts. Even so, the widespread use of Unicode would make it difficult to preserve such distinctions. The problem of one character representing semantically different concepts is also present in the Latin part of Unicode: the character for an apostrophe is the same as the character for a right single quotation mark (’). On the other hand, it is sometimes pointed out that the capital Latin letter "A" is not unified with the Greek letter "Α". This is, of course, desirable for reasons of compatibility, and it involves a much smaller alphabetic character set.

While the unification aspect of Unicode is controversial in some quarters for the reasons given above, Unicode itself does now encode a vast number of seldom-used characters of a more-or-less antiquarian nature.

Some of the controversy stems from the fact that the decision to perform Han unification was made by the initial Unicode Consortium. At the time, the consortium consisted of North American companies and organizations and included no East Asian government representatives. The initial design goal was to create a 16-bit standard, so Han unification was a critical step for avoiding tens of thousands of duplicated characters. The 16-bit requirement was later abandoned, which makes the size of the character set less of an issue today.

The controversy later extended to the internationally representative ISO: the initial CJK-JRG group favored a proposal for a non-unified character set, "which was thrown out in favor of unification with the Unicode Consortium's unified character set by the votes of American and European ISO members". Endorsing the Unicode Han unification was a necessary step for the heated ISO 10646/Unicode merger.

Much of the controversy surrounding Han unification rests on the distinction between glyphs, as defined in Unicode, and the related but distinct idea of graphemes. Unicode encodes abstract characters. These are distinct both from glyphs, which are particular visual representations of a character in a specific typeface, and from graphemes, the "basic units of writing" in a given language. One character may be represented by many distinct glyphs; for example, a "g" or an "a" may be drawn with either one loop or two. In Dutch, "ij" is sometimes considered a single letter, and thus arguably a grapheme. For example, the initial "IJ" in "IJsselmeer" is capitalized as a unit. The same holds for "ch" in some Spanish-speaking countries and "lj" in Croatian. Unicode's Source Separation rule requires that graphemes present in national character code standards be added to Unicode, even where they can be composed of characters already available. The national character code standards for the CJK languages are considerably more involved, given the technological limitations under which they evolved. For that reason, the official CJK participants in Han unification may well have been amenable to reform.

Unlike European versions, CJK Unicode fonts have large but irregular patterns of overlap because of Han unification, which makes language-specific fonts necessary. Unfortunately, language-specific fonts also make it difficult to access a variant that, as with the "grass" example, appears more typically in another language's style. Unihan proponents tend to favor markup languages for defining language strings. However, markup would not guarantee a specific variant in such a case; it would only select the language-specific font that is more likely to depict the character as that variant.

Chinese users seem to have fewer objections to Han unification, largely because Unicode did not attempt to unify Simplified Chinese characters, as used in mainland China and Singapore, with Traditional Chinese characters. Traditional characters are used in Hong Kong and Taiwan and, with some differences, are more familiar to Korean and Japanese users. Unicode is seen as neutral on this politically charged issue and has encoded Simplified and Traditional Chinese glyphs separately. It is also noted that Unicode's Han unification rules require Traditional and Simplified characters to be encoded separately, because pre-existing PRC character sets distinguish them. Furthermore, as with other variants, the relationship between Traditional and Simplified characters is not one-to-one.
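The claim that Traditional and Simplified characters do not map one-to-one can be shown with a minimal sketch. It uses a hypothetical, hand-picked two-entry mapping, not a real conversion table. 發 ("to emit, to develop") and 髮 ("hair") are separate Traditional characters that both simplify to 发. Mapping back from the Simplified form therefore yields two candidates. The names `TRAD_TO_SIMP` and `traditional_candidates` are illustrative only.

```python
# Hypothetical two-entry mapping, hand-picked for illustration only;
# real Traditional/Simplified conversion tables are far larger.
TRAD_TO_SIMP = {
    "發": "发",  # 'to emit, to develop'
    "髮": "发",  # 'hair'
}

def traditional_candidates(simplified: str) -> list:
    """Return every Traditional form mapping to `simplified`, in code point order."""
    return sorted(t for t, s in TRAD_TO_SIMP.items() if s == simplified)

# Three distinct code points, yet the Simplified form has two Traditional sources.
print([f"U+{ord(c):04X}" for c in "發髮发"])
print(traditional_candidates("发"))
```

A converter going from Simplified to Traditional has to choose between the candidates using context. This is one reason the two systems are kept as separate code points rather than treated as glyph variants.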

Specialist character sets developed to address, or regarded by some as not suffering from, these perceived deficiencies include:
*ISO/IEC 2022
*CNS character set
*CCCII character set
*UTF-2000
*Mojikyo
* and its successor HKSCS

However, none of these alternative standards has been as widely adopted as Unicode. Unicode is now the base character set for many new standards and protocols, and it is built into the architecture of operating systems, programming languages, libraries, font formats, and so on.

Examples of language dependent characters

In each row of the following table, the same character is repeated in all five columns. However, each column is marked as being in a different language: Chinese (generic), Simplified Chinese, Traditional Chinese, Japanese, or Korean. The web browser should select, for each character, a glyph suitable to the specified language. This fallback glyph selection only works if you have CJK fonts installed on your system and the font selected to display this article does not include glyphs for these characters. Note also that Unicode includes non-graphical language tag characters in the range U+E0000–U+E007F for plain-text language tagging.
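The plain-text language tags mentioned above can be sketched in a few lines. This assumes the Tags block layout: U+E0001 LANGUAGE TAG is followed by tag characters that mirror printable ASCII (code point = 0xE0000 + the ASCII value). The function names are hypothetical. U+E0001 has since been deprecated, and most renderers ignore these characters, so treat this as an illustration of the encoding rather than a practical way to select glyphs.

```python
# Sketch: build a plain-text language tag from the Unicode Tags block.
# Assumption: U+E0001 LANGUAGE TAG, then tag characters equal to
# 0xE0000 + the ASCII code of each letter of the language code.
LANGUAGE_TAG = 0xE0001
TAG_OFFSET = 0xE0000
TAGS_BLOCK_END = 0xE007F  # last code point of the Tags block (CANCEL TAG)

def tag_language(text: str, lang: str) -> str:
    """Prefix `text` with an invisible language tag such as 'ja' or 'zh-Hant'."""
    if not all(0x20 < ord(c) < 0x7F for c in lang):
        raise ValueError("language codes must be printable ASCII")
    tag = chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)
    return tag + text

def strip_tags(text: str) -> str:
    """Remove every character in the Tags block, leaving only visible text."""
    return "".join(c for c in text if not TAG_OFFSET <= ord(c) <= TAGS_BLOCK_END)

tagged = tag_language("高", "ja")
print([f"U+{ord(c):04X}" for c in tagged])
print(strip_tags(tagged))
```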

Examples of some non-unified Han ideographs

For some glyphs, Unicode has encoded variant characters, which makes it unnecessary to switch between fonts or language tags. In the following table, the separate rows in each group contain the equivalent characters at different code points. For characters such as 入, the only way to display the two variants is to change the font, as described in the previous table. For 內, however, there is an alternate character, 内, as illustrated below. For some characters, like 兌/兑, either method can be used to display the different glyphs.

{| border style="font-size: xx-large; line-height: normal; text-align: center; border-collapse: collapse"
|- style="text-align: left; vertical-align: bottom" lang="zh" xml:lang="zh"
|style="font-size: medium; font-family: sans-serif;" |Code
|valign="bottom" style="font-size: medium; font-family: sans-serif;"|Chinese (generic)
|valign="bottom" style="font-size: medium"|Chinese (Simplified)
|style="font-size: medium"|Chinese (Traditional)
|style="font-size: medium"|Japanese
|style="font-size: medium"|Korean

|-
|style="font-size: medium; font-family: sans-serif;" |U+9AD8
| lang="zh" xml:lang="zh"|高
| lang="zh-Hans" xml:lang="zh-Hans"|高
|style="vertical-align: middle" lang="zh-Hant" xml:lang="zh-Hant"|高
|style="vertical-align: middle" lang="ja" xml:lang="ja"|高
|style="vertical-align: middle; height: 1.5em;" lang="ko" xml:lang="ko"|高
|-
|style="font-size: medium; font-family: sans-serif;" |U+9AD9
| lang="zh" xml:lang="zh"|髙
| lang="zh-Hans" xml:lang="zh-Hans"|髙
|style="vertical-align: middle" lang="zh-Hant" xml:lang="zh-Hant"|髙
|style="vertical-align: middle" lang="ja" xml:lang="ja"|髙
|style="vertical-align: middle; height: 1.5em;" lang="ko" xml:lang="ko"|髙
|}
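The variant pairs discussed above can be inspected with Python's standard `unicodedata` module: 高/髙 from the table, plus 內/内 and 兌/兑 from the text. Each member of a pair is a separate code point with its own character name, so a program can choose between them directly. By contrast, 入 has a single code point, and its variants differ only by font. A minimal sketch; `VARIANT_PAIRS` and `describe` are illustrative names:

```python
import unicodedata

# Variant pairs mentioned in the text; each member has its own code point.
VARIANT_PAIRS = [("高", "髙"), ("內", "内"), ("兌", "兑")]

def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' using the Unicode character database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for left, right in VARIANT_PAIRS:
    # Distinct code points, so no font or language switch is needed to pick one.
    print(describe(left), "|", describe(right))
```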

Unicode ranges

Ideographic characters assigned by Unicode appear in the following blocks:

*CJK Unified Ideographs
*CJK Unified Ideographs Extension A
*CJK Unified Ideographs Extension B
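Checking which of these blocks a character belongs to is a simple range test. The sketch below hard-codes the published block boundaries: Extension A U+3400–U+4DBF, the main block U+4E00–U+9FFF, and Extension B U+20000–U+2A6DF. Block membership does not mean a code point is assigned in a particular Unicode version, and later versions add further extension blocks not covered here. The function name is illustrative.

```python
from typing import Optional

# Block boundaries for the three ideograph blocks listed above.
CJK_IDEOGRAPH_BLOCKS = [
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
]

def ideograph_block(ch: str) -> Optional[str]:
    """Return the name of the block containing `ch`, or None if outside all three.

    Block membership does not guarantee that the code point is assigned
    in a given version of Unicode.
    """
    cp = ord(ch)
    for first, last, name in CJK_IDEOGRAPH_BLOCKS:
        if first <= cp <= last:
            return name
    return None

print(ideograph_block("高"))
```

Extension B lies outside the Basic Multilingual Plane. This is one place where the abandoned 16-bit design goal mentioned earlier shows up: such characters need surrogate pairs in UTF-16.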

Unicode includes support of CJKV radicals, strokes, punctuation, marks and symbols in the following blocks:
*CJK Radicals Supplement
*CJK Symbols and Punctuation
*CJK Strokes
*Ideographic Description Characters

Additional compatibility characters appear in these blocks:
*Kangxi Radicals
*Enclosed CJK Letters and Months
*CJK Compatibility
*CJK Compatibility Ideographs
*CJK Compatibility Ideographs Supplement
*CJK Compatibility Forms

These compatibility characters are included for compatibility with legacy text-handling systems and other legacy character sets. They include forms of characters for vertical text layout, as well as rich-text characters that Unicode recommends handling through other means.
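Many compatibility characters map back to unified ideographs through Unicode normalization. The sketch below uses Python's standard `unicodedata` module. The two sample code points are chosen for illustration: U+F900 from CJK Compatibility Ideographs and U+2F00 from Kangxi Radicals. It shows that the compatibility ideograph has a canonical mapping, applied even by NFC. The Kangxi radical has only a compatibility mapping, which NFKC applies. Normalizing can therefore erase distinctions that a source text deliberately made.

```python
import unicodedata

def fold_compat(text: str) -> str:
    """Apply NFKC, which folds both canonical and compatibility CJK mappings."""
    return unicodedata.normalize("NFKC", text)

KANGXI_ONE = "\u2F00"   # KANGXI RADICAL ONE (compatibility character)
COMPAT_F900 = "\uF900"  # CJK COMPATIBILITY IDEOGRAPH-F900

# A canonical (NFC) pass already replaces the compatibility ideograph ...
print(unicodedata.normalize("NFC", COMPAT_F900) == "\u8C48")  # True
# ... but leaves the Kangxi radical alone, because its mapping is compatibility-only.
print(unicodedata.normalize("NFC", KANGXI_ONE) == KANGXI_ONE)  # True
print(fold_compat(KANGXI_ONE) == "\u4E00")                    # True
```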

Unihan database files

The Unihan project has always made a considerable effort to make its database publicly available.

A Unihan.zip file, provided on unicode.org, contains all the data the Unihan team has collected.

The libUnihan project provides a normalized SQLite Unihan database and a corresponding C library. All tables in this database are in fifth normal form.

The 69 MB libUnihan is released under the LGPL, while its database, UnihanDb, is released under the MIT License.
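The text data in Unihan.zip can be read with a few lines of code. This minimal sketch assumes each data line has the tab-separated form `U+XXXX<TAB>kProperty<TAB>value` and that blank lines and lines starting with `#` are comments. It is not an official parser, and the sample lines are illustrative rather than copied from a particular release.

```python
from collections import defaultdict

def parse_unihan(lines):
    """Group Unihan records by character.

    Assumes 'U+XXXX<TAB>kProperty<TAB>value' data lines; blank lines and
    lines beginning with '#' are treated as comments and skipped.
    """
    table = defaultdict(dict)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        code, prop, value = line.split("\t", 2)
        ch = chr(int(code[2:], 16))  # drop the 'U+' prefix, parse the hex value
        table[ch][prop] = value
    return dict(table)

sample = [
    "# illustrative lines, not copied from a real release\n",
    "U+4E00\tkDefinition\tone; a, an; alone\n",
    "U+4E00\tkMandarin\tyī\n",
]
print(parse_unihan(sample)["一"])
```

In practice the lines would come from iterating over an opened file, e.g. `open(path, encoding="utf-8")`. The file name depends on the release.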

HZ (character encoding)

The HZ character encoding is an encoding of GB2312 that was formerly in common use in email and USENET postings. It was designed in 1989 by Fung Fung Lee of Stanford University and codified in 1995 as RFC 1843.

The HZ encoding was invented to facilitate the use of Chinese characters in e-mail, which at that time allowed only 7-bit characters. Therefore, instead of standard ISO 2022 escape sequences or 8-bit characters, the HZ code uses only printable 7-bit characters to represent Chinese characters.

It was also popular on USENET networks, which in the late 1980s and early 1990s generally did not allow transmission of 8-bit characters or escape characters.

Structure and use

In the HZ encoding system, the character sequences "~{" and "~}" act as escape sequences: "~{" switches into GB2312 mode and "~}" switches back. Anything between them is interpreted as Chinese encoded in GB2312. Outside the escape sequences, characters are assumed to be ASCII.

An example will help illustrate the relationship between GB2312, its ISO 2022 and EUC-CN forms, and the HZ code:

{| border=1 cellpadding=4 style="border-collapse: collapse;"
|+ Various forms of the GB2312 code for the character "一"
|---
! Form || Code || With escape sequences || Remarks
|---
| Kuten / Qūwèi / 区位 form || 5027 || — || Zone 50, point 27
|---
| ISO 2022 form || 52₁₆ 3B₁₆ || 0E₁₆ 52₁₆ 3B₁₆ 0F₁₆ || 50 + 32 = 82 = 52₁₆
|---
| EUC-CN form || D2₁₆ BB₁₆ || D2₁₆ BB₁₆ || 52₁₆ ∨ 80₁₆ = D2₁₆
|---
| HZ form || 52₁₆ 3B₁₆ || 7E₁₆ 7B₁₆ 52₁₆ 3B₁₆ 7E₁₆ 7D₁₆ || Appears as ~{R;~} without HZ decoder
|---
| HZ form || D2₁₆ BB₁₆ || 7E₁₆ 7B₁₆ D2₁₆ BB₁₆ 7E₁₆ 7D₁₆ || EUC form acceptable to at least some decoders
|}
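The arithmetic in the table can be written out as a small function. It adds 32 (0x20) to the zone and point numbers to get the 7-bit ISO 2022 bytes, sets the high bit to get EUC-CN, and wraps the 7-bit bytes in "~{" … "~}" to get HZ. A minimal sketch; `gb2312_forms` and its dictionary keys are illustrative names:

```python
def gb2312_forms(zone: int, point: int) -> dict:
    """Derive the byte forms in the table from a qūwèi (zone, point) pair."""
    if not (1 <= zone <= 94 and 1 <= point <= 94):
        raise ValueError("zone and point must be in 1..94")
    iso = bytes([zone + 0x20, point + 0x20])  # e.g. 50 + 32 = 82 = 0x52
    return {
        "iso2022": iso,
        "iso2022_shifted": b"\x0e" + iso + b"\x0f",  # wrapped in SO ... SI
        "euc_cn": bytes(b | 0x80 for b in iso),      # set the high bit of each byte
        "hz": b"~{" + iso + b"~}",                   # 7-bit bytes inside HZ escapes
    }

forms = gb2312_forms(50, 27)  # the character 一
print(forms)
print(forms["euc_cn"].decode("gb2312"))
```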

HZ was originally designed to be used purely as a 7-bit code. However, when circumstances allow, the escape sequences "~{" and "~}" sometimes surround characters represented in EUC-CN. This alternative use lets the Chinese text be read either with HZ decoder software or with a system that understands EUC-CN.

Additionally, the specification defines that
* the sequence "~~" is to be treated as encoding a single ASCII "~"
* the character "~" followed by a newline is to be discarded.
However, not all HZ decoders follow these two rules.
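The decoding rules above fit in a short state machine. The sketch below is a hypothetical `hz_decode`, not a reference implementation. It tracks whether it is inside "~{" … "~}" and applies the "~~" and "~"-newline rules in ASCII mode. It also accepts the EUC-CN variant from the last table row by always setting the high bit before decoding a GB2312 pair. It does not check that GB mode is closed before a line ends. Python's standard library already ships an `hz` codec; this version only makes the rules visible.

```python
def hz_decode(data: bytes) -> str:
    """Decode HZ bytes to text (minimal sketch of the RFC 1843 rules)."""
    out = []
    i, n = 0, len(data)
    gb_mode = False  # True between "~{" and "~}"
    while i < n:
        b = data[i]
        if b == 0x7E and i + 1 < n:  # "~"
            nxt = data[i + 1]
            if nxt == 0x7B:  # "~{": switch to GB2312
                gb_mode, i = True, i + 2
                continue
            if nxt == 0x7D:  # "~}": switch back to ASCII
                gb_mode, i = False, i + 2
                continue
            if not gb_mode and nxt == 0x7E:  # "~~": a literal "~"
                out.append("~")
                i += 2
                continue
            if not gb_mode and nxt == 0x0A:  # "~" + newline: discarded
                i += 2
                continue
        if gb_mode:
            if i + 1 >= n:
                raise ValueError("truncated GB2312 byte pair")
            # Setting the high bit turns a 7-bit pair (52 3B) into EUC-CN (D2 BB);
            # pairs that are already EUC-CN pass through unchanged.
            pair = bytes([data[i] | 0x80, data[i + 1] | 0x80])
            out.append(pair.decode("gb2312"))
            i += 2
        else:
            out.append(chr(b))  # ASCII passes through unchanged
            i += 1
    return "".join(out)

print(hz_decode(b"x~{R;~}y"))
```

A decoder that behaves like hztty, described below, would skip the "~~" and "~"-newline branches.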

HZ decoders

The first HZ decoder was written in 1989 by the code's inventor for the Unix operating system.

The hztty program, also for the Unix operating system, was among the first and most popular HZ decoders. It deviates from the specification in two ways: it displays the escape sequences, and it does not treat "~~" or "~" followed by a newline specially. This was probably done so that software assuming each character occupies one screen position would work correctly without modification.

Support on Microsoft Windows came later, and a number of third-party "Chinese systems" support HZ. These systems may provide an option to hide the escape sequences.

Guwen

Gǔwén literally means ''ancient writing''. Historically the term has been used in several different ways.

The first usage, which is common, refers to the ''most'' ancient forms of Chinese writing, namely the writing of the Shāng and early Zhōu dynasties, such as that found on oracle bones, bronzes, or pottery. This usage can be found at least as early as Xu Shen's Han dynasty etymological dictionary Shuowen Jiezi.

The second usage, also well known, refers to variant forms in Shuowen that Xu Shen mistook as ancient. These forms were actually used in the eastern areas during the Warring States period, as exemplified by copies of the Zuo Zhuan and the 'books from within the walls' available to Xu Shen when he compiled Shuowen. Xu believed them to be significantly earlier than seal script and so also called them guwen. Xu thus used the term guwen for two different groups of scripts: those that were truly ancient, and those he mistook as ancient. It took the work of later scholars like Wang Guowei to separate and clarify Xu's ambiguous usage of the term.

The third usage is for scripts that are no longer legible to the average modern reader. These include the scripts referred to in meaning one above, the Stone Drums of Qin of the late Spring and Autumn period, other writing of the later Zhōu period preserved on stone, and mid to late Zhōu bronzes. They also include the Eastern Warring States writing of meaning two above and the late Zhōu to Qín seal script. Qiú Xīguī uses the term "ancient stage" of Chinese script in this manner. Under this usage, the Qín seal script and all its predecessors are 'ancient'. They contrast with the clerical script of the late Warring States through Qín and Hàn, and with the regular script, both of which are legible to the modern reader of Chinese.

Additional Reading

*Chén Zhāoróng ''Research on the Qín Lineage of Writing: An Examination from the Perspective of the History of Chinese Writing''. Academia Sinica, Institute of History and Philology Monograph. ISBN 957-671-995-X.
*Qiú Xīguī ''Chinese Writing'' . Translation of 文字學概要 by Gilbert L. Mattos and Jerry Norman. Early China Special Monograph Series No. 4. Berkeley: The Society for the Study of Early China and the Institute of East Asian Studies, University of California, Berkeley. ISBN 1-55729-071-7.

Gan Chinese

Gàn, alternatively Jiangxihua, is one of the major divisions of spoken Chinese, a member of the Sino-Tibetan family of languages. Gan speakers are concentrated in, and typical of, Jiangxi Province, as well as northwestern Fujian and some parts of Anhui and Hubei in mainland China.

Different dialects of Gan exist, and the representative dialect is the Nanchang dialect.

The name "Gàn" comes from the shortened name of Jiangxi Province.

Classification

The classification of Gan is a subject of ongoing debate. As with all other varieties of Chinese, there is considerable dispute over whether Gan is a language or a dialect. Opinions can be broadly divided into three viewpoints:

*The first viewpoint, supported by scholars in mainland China, considers Gan to be a dialect of Chinese. In fact, Gan, with , was carved out of the region of the language until 1937. Some Gan speakers consider Gan a dialect, mostly because of political factors or national sentiment, and also because Gan has more similarities with , than with Cantonese or Min.

*The second viewpoint considers Gan to be the same language as Hakka, called “Gan-Hakka”, or to form a group of languages with Hakka and Cantonese, because the three share many similarities.

*The third viewpoint considers Gan to be an independent language. Gan is not mutually intelligible with other Chinese languages, and linguistically, varieties that are not mutually intelligible should be classified as separate languages.

Please see Identification of the varieties of Chinese for the issues surrounding this dispute.

Name

* Gan: the formal name.

* Jiangxinese: the most common name. However, several languages are spoken in Jiangxi, and many Gan speakers live outside Jiangxi, so this name is not very precise.

* Xi: ancient name. Now it is seldom used.

* Gan dialect: the name used by scholars in mainland China; “Gan” alone is also used.

* Right-river language: a name used in ancient China, because most Gan speakers live south of the Yangtze River.

Relation with other Chinese languages

In ancient times, Jiangxi belonged to the same political divisions as its neighboring provinces. Large-scale immigration into Jiangxi naturally produced some similarities with the surrounding languages; Gan and Hakka are the most similar.

Geographical distribution

Region

Most Gan speakers live in the middle and lower reaches of the Gan River, the drainage area of the Fu River, and the region around Poyang Lake. There are also many Gan speakers in eastern Hunan, eastern Hubei, southern Anhui, northwest Fujian, and elsewhere.

According to the 《Diagram of Divisions in the People’s Republic of China》, Gan is spoken by approximately 48,000,000 people: 29,000,000 in Jiangxi, 4,500,000 in Anhui, 5,300,000 in Hubei, 9,000,000 in Hunan, and 270,000 in Fujian.

History

Ancient Ages

During the Qin Dynasty, a large number of troops were sent to southern China to conquer the Baiyue territories in Fujian and Guangdong. As a result, numerous Han Chinese immigrated to Jiangxi in the following years.
In the early years of the Han Dynasty, Nanchang was established as the capital of the Yuzhang Commandery, which had 18 counties. The population of the Yuzhang Commandery grew from 350,000 to 1,670,000, a net increase of 1,320,000. Yuzhang ranked fourth in population among the more than 100 commanderies of China at the time. As the largest commandery of Yangzhou, it accounted for two fifths of that province's population, and Gan gradually took shape during this period.

Middle Ages

As a result of continuous warfare in the region of central China, the first large-scale immigration in the history of China took place. Large numbers of people in central China relocated to southern China in order to escape the bloodshed and at this time, Jiangxi played a role as a transfer station. Also, during this period, ancient Gan began to be exposed to the northern Mandarin dialects. After centuries of rule by the Southern Dynasties, Gan still retained many original characteristics despite having absorbed some elements of Guan-Hua.
Up until the Tang Dynasty, there was little difference between old Gan and the contemporary Gan of that era. Beginning in the Five Dynasties period, however, inhabitants of the central and northern parts of Jiangxi began to migrate to eastern Hunan, eastern Hubei, southern Anhui, and northwest Fujian. Over hundreds of years of migration, Gan spread to its current areas of distribution.

Recent History

evolved into a language based on , owing largely to political factors. At the same time, the differences between Gan and Guan-hua continued to become more pronounced. However, because Jiangxi borders on Jianghuai, a Guan-hua, Xiang, and Hakka speaking region, Gan proper has also been influenced by these surrounding languages, especially in its border regions.

Modern Times

After 1949, classified as a “dialect” in mainland China, Gan faced a critical period. The impact of Putonghua (Standard Mandarin), promoted through official government language campaigns, is quite evident today. Many young people cannot master Gan expressions, and some can no longer speak Gan at all.

Recently, however, as a result of increased interest in protecting the local language, Gan now has begun to appear in various regional media, and there are also newscasts and television programs broadcast in the Gan language.


Dialects

According to 《Atlas of Chinese languages》, there are 9 dialects in Gan.

Note: a name marked with * means that Gan is only partly spoken in that city.

Phonetics

As in other tonal Chinese languages, tones in Gan distinguish the meanings of words, and a syllable's tone may change in some contexts.

Gan preserves the eight tones of ancient Chinese, which arise from the four tone categories (level, rising, departing, and entering), each split into two registers. Some Gan dialects retain all eight.

Tone

Gan has 19 syllable onsets, 65 syllable rimes, and 5 tones, not counting the two checked tones described next.

The 6th and 7th tones are the same as the 4th and 5th tones, except that the syllable ends in a stop consonant.

Vowels

Gan has six vowels: i, y, e, a, o, u.

Initials

In each cell below, the first line indicates IPA, the second indicates pinyin.

Note: Ø denotes the null initial, i.e. an initial with no consonant sound.

Finals

opening finals:

nasal finals:

entering finals:

independent finals:

consonantal finals

Example

Grammar

In Gan, there are nine principal grammatical aspects: initial (起始), progressive (進行), tentative (嘗試), durative (持續), experiential (經歷), continuative (繼續), repeating (重行), perfect (已然), and complete (完成).

The grammar of Gan is similar to that of other southern Chinese languages. Subject-verb-object order is most typical, but subject-object-verb order or the passive voice is possible with particles. Take the simple sentence "I hold you." The words involved are ngo (I), tsot dok (hold), and ň (you).
* Subject verb object: The sentence in the typical order would be ngo tsot dok ň.
* Subject lat object verb: Another sentence of roughly equivalent meaning is ngo lat ň tsot dok, with the slight connotation of "I take you and hold" or "I get to you and hold."
* Object den subject verb: ň den ngo tsot dok means the same thing but in the passive voice, with the connotation of "You allow yourself to be held by me" or "You make yourself available for my holding."

Vocabulary

Gan retains a number of archaic words and expressions from ancient Chinese that are now seldom or no longer used in Mandarin. For example, the noun 'clothes' is 衣裳 in Gan but 衣服 in Mandarin, and the verb 'sleep' is 睏覺 in Gan but 睡覺 in Mandarin. Also, to describe something dirty, Gan speakers use 下里巴人, a reference to a song from the Chu (楚國) region dating to China's Spring and Autumn Period.

Additionally, Gan has numerous interjections (e.g. 哈, 噻, 啵), which add emphasis to sentences and help express different feelings.

Writing system

Gan is written with Chinese characters, though it does not have a strong written tradition. There are also some romanization schemes, but none is widely used. Gan speakers usually use Vernacular Chinese as the written form, which is used by all Chinese speakers.

Note

Fuzhou dialect

Foochowese, also known as Fuzhou dialect, Foochow dialect, Foochow, Fuzhounese, or Fuzhouhua, is considered the standard dialect of Min Dong, a branch of Min Chinese spoken mainly in the eastern part of Fujian Province. Native speakers also call it by a name meaning "the language spoken in everyday life".

Although traditionally called a dialect, Foochowese is a separate language by linguistic standards, because it is not mutually intelligible with other Min varieties, let alone other Chinese languages. Whether Foochowese is a ''dialect'' or a ''language'' is therefore disputed.

Centered in Fuzhou City, Foochowese mainly covers eleven cities and counties, including Fuzhou, Changle, and Fuqing. Foochowese is also the second local language in some cities and counties of northern and central Fujian, such as Nanping, Shaowu, and Sanming.

Foochowese is also widely spoken in some regions abroad, especially in Southeast Asian countries like Malaysia and Indonesia. The city of Sibu in Malaysia is called "New Fuzhou" because of the influx of immigrants there in the early 1900s. Similarly, the language has spread to the USA, the UK, and Japan through immigration in recent decades.

History

Formation

After Han China's occupation of Minyue in 110 BC, the Han began to rule what is now Fujian Province. Having lost their independence, the aboriginal Minyue people, a branch of the Baiyue, were gradually assimilated into Chinese culture. The Chinese and Ancient Chu languages brought by the mass influx of Han immigrants from the north gradually mixed with the local Minyue language. They finally developed into the Ancient Min language, from which Foochowese evolved.

Foochowese came into being during the period somewhere between late Tang Dynasty and "Five Dynasties and Ten Kingdoms", and has been considered by most as a Chinese dialect ever since. However, it is also worth noting that its substratum is constituted by large quantities of well-preserved Minyue vocabulary. In this sense, Foochowese is a ''de facto'' mixed language of Ancient Chinese and Minyue language.

The famous book Qī Lín Bāyīn, compiled in the 17th century, is the first and most comprehensive rime book to provide a systematic guide to character readings for people speaking or learning Foochowese. It once served to standardize the language and is still widely cited as an authoritative reference in modern academic research on Chinese phonology.

Studies by early Western missionaries

In 1842, Fuzhou was opened to Westerners as a treaty port after the signing of the Treaty of Nanjing. Because of the language barrier, however, establishing the first Christian missionary base in the city was not easy. To convert the people of Fuzhou, the missionaries found it necessary to make a careful study of Foochowese. Their most notable works are listed below:

:* 1856, M. C. White:
:* 1870, R. S. Maclay & C. C. Baldwin: An alphabetic dictionary of the Chinese language in the Foochow dialect
:* 1871, C. C. Baldwin: Manual of the Foochow dialect
:* 1891, T. B. Adam: An English-Chinese dictionary of the Foochow dialect
:* 1893, Charles Hartwell:
:* 1898, R. S. Maclay & C. C. Baldwin: An alphabetic dictionary of the Chinese language of the Foochow dialect, 2nd edition
:* 1906, The Foochow translation of the complete Bible
:* 1923, T. B. Adam & L. P. Peet: An English-Chinese dictionary of the Foochow dialect, 2nd edition
:* 1929, R. S. Maclay & C. C. Baldwin:

Status quo

By the end of the Qing Dynasty, Fuzhou society had been largely . But for decades the Chinese government has discouraged the use of the colloquial language in school education and in the media, so the number of speakers has been greatly reduced. It is reported that fewer than half of the children and young people in Fuzhou are able to speak the language.

Nevertheless, it should be noted that Foochowese is currently widely spoken among some native speakers as an "endearing" language. Speaking Foochowese in Fuzhou often allows mutual speakers a certain level of familiarity. Even though Mandarin Chinese is more often heard in casual conversations on the city streets, the careful observer will notice that in more communal settings, such as small neighborhoods in the city or the surrounding countryside, Foochowese is often the dominant language.

In Mainland China, Foochowese has been officially listed as Intangible Cultural Heritage and its promotion work is being systematically carried out. In Matsu, Taiwan, the teaching of Foochowese has been successfully introduced into elementary schools, alongside the Taiwanese localization movement.

Grammar

:''This section is about Standard Foochowese only. See for a discussion of other dialects.''

Phonetics

Like other Chinese languages, Foochowese is a tonal language, with extremely extensive sandhi rules affecting its initials, rimes, and tones. These complex rules make Foochowese one of the most difficult Chinese languages.

Tones

There are seven original tones in Foochowese, which preserves the tonal system of Ancient Chinese:

The sample characters are taken from the Qī Lín Bāyīn.

In ''Qī Lín Bāyīn'', Foochowese is described as having eight tones, which explains the book's title (bāyīn means "eight sounds"). That name is somewhat misleading, however, because Ĭng-siōng and Iòng-siōng are identical in tone contour, so only seven tones exist.

Ĭng-ĭk and Iòng-ĭk characters end in either /k/ or a glottal stop /ʔ/.

Besides the seven tones listed above, two new tonal values, "21" and "35", also occur in connected speech.

Tonal sandhi

The rules of tonal sandhi in Foochowese are complicated, even compared with those of other Chinese dialects. When two or more characters combine into a word, the tonal value of the last character remains stable, but those of the preceding characters change in most cases. For example, "獨", "立", and "日" are Iòng-ĭk characters with the same tonal value "5". When they combine into the phrase "獨立日" (independence day), "獨" changes its tonal value to "21" and "立" changes its value to "33". The phrase as a whole is therefore pronounced with the tonal values 21, 33, and 5.

The two-character tonal sandhi rules are shown in the table below:

Ĭng-ĭk-gák are Ĭng-ĭk characters ending in a glottal stop, and Ĭng-ĭk-ék are those ending in /k/.

However, the tonal sandhi rules of more than two characters are much more complicated than can be conveniently displayed in a single table.

Initials

There are seventeen initials in all:

The Chinese characters in the brackets are also sample characters from ''Qī Lín Bāyīn''.

Most Chinese linguists describe Foochowese as possessing a null onset. Phonetically, however, any character with a null onset begins with a glottal stop [ʔ].

Some speakers find it difficult to distinguish between the initials [n] and [l].

Foochowese has no labiodental consonants such as [f] or [v]; this is one of the most conspicuous characteristics shared by all branches of the Min group.

[β] and [ʒ] exist in connected speech only.

Initial assimilation

In Foochowese, there are various kinds of initial assimilation, all of which are progressive. When two or more characters combine into a phrase, the initial of the first character stays unchanged, while the initials of the following characters, in most cases, change to match the preceding phoneme, i.e., the final sound of the preceding character.
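Because the assimilation is progressive, each initial depends only on the final sound of the syllable before it, which makes it easy to sketch as a lookup. The Python below is only a sketch under stated assumptions: the rule table `ASSIMILATION_RULES` is not given in the text above. It is a small, partial rule set assumed from common descriptions of Fuzhou phonology (after a vowel: p → β, t → l, k → null initial; after [ŋ]: p → m, t → n, k → ŋ; after a stop coda: no change), and syllables are assumed to be written as hypothetical (initial, final) pairs in IPA.

```python
# ASSUMED, partial rule set (not taken from the text): coda class of the
# preceding syllable -> {original initial: assimilated initial}.
ASSIMILATION_RULES: dict[str, dict[str, str]] = {
    "vowel": {"p": "β", "t": "l", "k": ""},  # "" stands for a null initial
    "nasal": {"p": "m", "t": "n", "k": "ŋ"},
    "stop": {},  # assumed: initials after a stop coda stay unchanged
}


def coda_class(final: str) -> str:
    """Classify a final (IPA spelling assumed) by its last sound."""
    if final.endswith("ŋ"):
        return "nasal"
    if final.endswith(("ʔ", "k")):
        return "stop"
    return "vowel"


def assimilate(syllables: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Apply progressive initial assimilation to (initial, final) pairs."""
    if not syllables:
        return []
    result = [syllables[0]]  # the first character's initial never changes
    for (_, prev_final), (initial, final) in zip(syllables, syllables[1:]):
        rules = ASSIMILATION_RULES[coda_class(prev_final)]
        result.append((rules.get(initial, initial), final))
    return result


print(assimilate([("t", "iŋ"), ("k", "a")]))  # [('t', 'iŋ'), ('ŋ', 'a')]
```

Initials missing from the assumed table pass through unchanged, so the sketch degrades gracefully rather than guessing.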

Rimes

The table below shows the eleven vowel phonemes of Foochowese.

In Foochowese, the nasal codas [m], [n] and [ŋ] have all merged as [ŋ], and the stop codas [p], [t] and [k] have all merged as [ʔ]. Eleven vowel phonemes, together with the codas [ŋ] and [ʔ], are organized into forty-six rimes.

As mentioned above, there are theoretically two different entering-tone codas in Foochowese: [k] and [ʔ]. For most Foochowese speakers, however, the two are distinguishable only in certain phonetic environments. Therefore, most Chinese linguists consider the codas [k] and [ʔ] to have merged.

Close/Open rimes

All rimes come in pairs in the table above: the one on the left is a close rime, while the other is an open rime. Close and open rimes are closely tied to the tones. As single characters, the Ĭng-bìng, Siōng-siăng, Iòng-bìng and Iòng-ĭk tones have close rimes, while the Ĭng-ké̤ṳ, Ĭng-ĭk and Iòng-ké̤ṳ tones have open rimes. In connected speech, an open rime shifts to its close counterpart when the character undergoes tonal sandhi.

For instance, "福" is a Ĭng-ĭk character and "州" a Ĭng-bìng character. When these two characters combine into the word "福州" (Fuzhou), "福" changes its tonal value from "24" to "21" and, simultaneously, shifts from its open rime to the corresponding close rime. In the word "中國" (China), by contrast, "中" is a Ĭng-bìng character, so its close rime never changes, though its tonal value does change from "55" to "53" through tonal sandhi.
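The rule above can be expressed as a small function that decides, for each character in a word, whether its rime surfaces in the close or the open form. This Python sketch uses hypothetical pinyin-style labels for the tone categories (yinping for Ĭng-bìng, shangsheng for Siōng-siăng, yinqu for Ĭng-ké̤ṳ, yinru for Ĭng-ĭk, yangping for Iòng-bìng, yangqu for Iòng-ké̤ṳ, yangru for Iòng-ĭk), and the function name `rime_forms` is likewise illustrative. It also assumes that every non-final character undergoes tonal sandhi, which the text says holds only "in most cases".

```python
# Tone categories whose citation forms use close rimes (per the text);
# the remaining three use open rimes. Labels are pinyin-style names.
CLOSE_RIME_TONES = {"yinping", "shangsheng", "yangping", "yangru"}
OPEN_RIME_TONES = {"yinqu", "yinru", "yangqu"}


def rime_forms(tones: list[str]) -> list[str]:
    """Return "close" or "open" for each character of a word, in order."""
    forms: list[str] = []
    last = len(tones) - 1
    for i, tone in enumerate(tones):
        if tone in CLOSE_RIME_TONES:
            forms.append("close")  # close rimes never change
        elif tone in OPEN_RIME_TONES:
            # Assumption: every non-final character undergoes sandhi, so
            # an open rime keeps its open form only in final position.
            forms.append("open" if i == last else "close")
        else:
            raise ValueError(f"unknown tone category: {tone!r}")
    return forms


print(rime_forms(["yinru", "yinping"]))  # 福州 -> ['close', 'close']
```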

The phenomenon of close/open rimes is unique to Foochowese, and it makes the language especially intricate and hard to understand even for speakers of other Chinese languages.

Phonological features

Vocabulary

Most words in Foochowese have cognates in other Chinese languages, so a non-Fuzhou speaker would find Foochowese much easier to understand when written in Chinese characters than when spoken in conversation. It should be noted, however, that false friends do exist: for example, "莫細膩" means "don't be too polite" or "make yourself at home", "我對手汝洗碗" means "I help you wash dishes", and "伊共伊老媽嚟冤家" means "he and his wife are quarreling". Knowledge of Mandarin vocabulary alone does not help one catch the meaning of these sentences.

Most Foochowese vocabulary dates back more than 1,200 years. Some everyday words have even been preserved as they were in the Tang Dynasty, as illustrated by a poem by the famous Tang poet Gu Kuang. In his poem ''Jiǎn'' (囝), Gu Kuang explicitly noted:

In Foochowese, "囝" (child) and "郎罷" (father) are still in use today, without the slightest change.

Words from Ancient Chinese

Quite a few words from Ancient Chinese have retained their original meanings in Foochowese for thousands of years, while their counterparts in Mandarin Chinese have either fallen out of daily use or shifted to different meanings.

This table shows some Foochowese words from Classical Chinese, contrasted with Mandarin Chinese:

:¹ "看" is also used as the verb "to look" in Foochowese.
:² "養" in Foochowese means "to give birth to".

The following table shows some words used in both Foochowese and Mandarin Chinese whose meanings have changed in Mandarin:

Words from Minyue language

Some everyday words shared by all Min languages came from the ancient Minyue language, such as the following:

The literary and colloquial readings

Having both literary and colloquial readings is a feature commonly found in Chinese dialects throughout China. Literary readings are mainly used in formal phrases and written language, while colloquial readings are used mainly in everyday phrases and spoken language.

This table displays some widely used characters in Foochowese which have both literary and colloquial readings:

Character | Literary reading | Example | Meaning | Colloquial reading | Example | Meaning
百 | báik | 百科 báik-kuŏ | encyclopedia | báh | 百姓 báh-sáng | common people
飛 | hĭ | 飛機 hĭ-gĭ | aeroplane | buŏi | 飛鳥 buŏi-cēu | flying birds
寒 | hàng | 寒食 Hàng-sĭk | Cold Food Festival | gàng | 天寒 tiĕng gàng | cold, freezing
廈 | hâ | 大廈 dâi-hâ | mansion | â | 廈門 | Amoy

Loan words from English

The First Opium War, also known as the First Anglo-Chinese War, ended in 1842 with the signing of the Treaty of Nanjing, which forced the Qing government to open Fuzhou to foreign traders and missionaries. Since then, quite a number of churches and Western-style schools have been established there. Consequently, some English words entered Foochowese, though without fixed written forms in Chinese characters. The most frequently used words are listed below:
* noun, meaning "an article of dress", from the word "coat";
* noun, meaning "a meshwork barrier in tennis or badminton", from the word "net";
* noun, meaning "oil paint", from the word "paint";
* noun, meaning "a small sum of money", from the word "penny";
* noun, meaning "money", from the word "take";
* noun, meaning "girl" in a humorous way, from the word "girl";
* verb, meaning "to shoot", from the word "shoot";
* verb, meaning "to pause", from the word "again";
* meaning "Southeast Asian", from the word "Malacca".

Other features of Foochowese grammar

Examples

Some common phrases in Foochowese:
* Foochowese: 福州話
* Hello: 汝好
* Good-bye: 再見
* Please: 請; 起動
* Thank you: 謝謝; 起動 / Kī-dâe̤ng /
* Sorry: 對不住
* This: 嚽; 啫; 茲
* That: 噲; 嘻; 許
* How much?: 偌
* Yes: 正是; 無綻; 著
* No: 伓是; 綻; 賣著
* I don't understand: 我賣會意
* What's his name?: 伊名什乇？
* Where's the hotel?: 賓館洽底所？
* How do I get to the school?: 去學校怎樣行？
* Do you speak Foochowese?: 汝會講福州話賣？
* Do you speak English?: 汝會講英語賣？

Regional variations

Writing system

Chinese characters

Most Foochowese words stem from Ancient Chinese and can therefore be written in Chinese characters. Many books published in the Qing Dynasty were written in this traditional way, such as ''Mǐndū Biéjì'' and the Foochowese Bible. However, Chinese characters have many shortcomings as a writing system for Foochowese.

Firstly, a great number of words are unique to Foochowese, so they can only be written in informal ways. For instance, one common negative word has no standard written form. Some write it with characters that share its pronunciation but have totally unrelated meanings, while others prefer a newly created character combining two existing components, but this character is not included in most fonts.

Secondly, Foochowese has been excluded from the educational system for many decades. As a result, many speakers, if not all, take it for granted that Foochowese has no formal writing system, and when they have to write it, they tend to misuse characters with a similar Mandarin Chinese pronunciation. For example, the Foochowese expression meaning "okay" is frequently written with Mandarin characters that sound almost the same.

Foochow Romanized

Foochow Romanized, also known as Bàng-uâ-cê or Hók-ciŭ-uâ Lò̤-mā-cê, is an orthography for Foochowese adopted in the mid-19th century by Western missionaries. It varied at different times and became standardized several decades later. Foochow Romanized was mainly used within church circles and was taught in some mission schools in Fuzhou.

Mǐnqiāng Kuàizì

Mǐnqiāng Kuàizì, literally "Fujian Colloquial Fast Characters", is a qieyin (phonetic spelling) system for Foochowese designed by the Chinese scholar and calligrapher Li Jiesan in 1896.

Literary and art forms

Books and other sources

* Cathryn Donohue, University of Nevada, Reno
* Chen, Leo & Norman, Jerry: ''An Introduction to the Foochow Dialect'', San Francisco State Coll., CA, 1965.

Foochow Romanized

Foochow Romanized, a.k.a. Bàng-uâ-cê or Hók-ciŭ-uâ Lò̤-mā-cê, is an orthography for the Fuzhou dialect adopted in the mid-19th century by Western missionaries. It varied at different times and became standardized several decades later. Foochow Romanized was mainly used within church circles and was taught in some mission schools in Fuzhou. Unlike its counterpart Pe̍h-ōe-jī for Southern Min, however, Foochow Romanized was by no means universally understood by Christians, even at its height.

History of Foochow Romanized

After Fuzhou became one of the five Chinese treaty ports opened by the Treaty of Nanjing at the end of the First Opium War, many Western missionaries arrived in the city. Faced with widespread illiteracy, they developed romanization schemes for the Fuzhou dialect.

The first attempt at romanizing the Fuzhou dialect was made by M. C. White, who borrowed a system of orthography known as the System of Sir William Jones. In this system, 14 initials were designed according to their phonetic values. P, T, K and CH stand for unaspirated initials, while the Greek spiritus lenis was affixed to these letters to represent their aspirated counterparts. Besides the five default vowels of the Latin alphabet, four additional vowel letters with diacritics were introduced. This system is described at length in White's linguistic work.

Subsequent missionaries, including Robert S. Maclay of the American Methodist Episcopal Mission, R. W. Stewart of the Church of England and Charles Hartwell of the American Board Mission, further modified White's system in several ways. The most significant change was made in the scheme of plosive consonants: the spiritus lenis of the aspirated initials was removed entirely, and the letters B, D and G were introduced to represent the unaspirated plosives. As for the vowels, White's four additional vowels were replaced by A̤, E̤, O̤ and Ṳ; and because these diacritics were placed beneath the vowels, tone marks could be added above them.
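The plosive change described above amounts to a simple remapping of initials: White's bare P, T, K become B, D, G, and his marked (aspirated) P, T, K lose the mark. Below is a minimal Python sketch of that remapping, limited to P, T and K because the text does not say how CH was treated. The apostrophe used as `ASPIRATION_MARK` is a hypothetical ASCII stand-in for White's spiritus lenis, and `revise_initial` is an illustrative name, not part of any existing tool.

```python
# Hypothetical ASCII stand-in for White's spiritus lenis (aspiration mark).
ASPIRATION_MARK = "'"

# White's unaspirated plosive letters -> letters in the revised scheme.
UNASPIRATED = {"p": "b", "t": "d", "k": "g"}


def revise_initial(white_initial: str) -> str:
    """Map a White-style plosive initial to the revised scheme."""
    base = white_initial.lower()
    if base.endswith(ASPIRATION_MARK):
        stem = base[:-1]
        if stem in UNASPIRATED:
            return stem  # aspirated: drop the mark, keep P/T/K
    elif base in UNASPIRATED:
        return UNASPIRATED[base]  # unaspirated: switch to B/D/G
    raise ValueError(f"not a plosive initial covered here: {white_initial!r}")


print(revise_initial("p"), revise_initial("t'"))  # b t
```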

Scheme

The sample characters are taken from ''Qī Lín Bāyīn'', a renowned phonology book about the Fuzhou dialect written in the Qing Dynasty. The pronunciations are recorded in standard phonetic symbols.

Initials

Rimes

Rimes without coda

Rimes with coda [ŋ]

Rimes with codas [k] and [ʔ]

Tones

Note that Foochow Romanized uses the breve (˘), not the caron (ˇ), to indicate the Yīnpíng and Yángrù tones of the Fuzhou dialect.
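Since each tone is signalled by a diacritic, combined with a final -k or -h for the entering tones, a syllable's tone category can be read off mechanically. The Python sketch below decomposes a syllable with `unicodedata.normalize("NFD", ...)` and inspects the combining mark. Only the breve's double role (Yīnpíng on open syllables, Yángrù on syllables ending in -k or -h) is stated in the text; the mappings for the macron, acute, grave and circumflex are assumptions drawn from common descriptions of Foochow Romanized, checked only against romanized forms quoted in this post such as ''Hók-ciŭ'' and ''Bàng-uâ-cê''. The tone labels are hypothetical pinyin-style names.

```python
import unicodedata

# Combining tone marks -> tone for syllables WITHOUT a -k/-h coda.
# Only the breve is confirmed by the text; the rest are assumptions.
NON_ENTERING_TONES = {
    "\u0306": "yinping",     # combining breve
    "\u0304": "shangsheng",  # combining macron
    "\u0301": "yinqu",       # combining acute
    "\u0300": "yangping",    # combining grave
    "\u0302": "yangqu",      # combining circumflex
}
# Entering-tone syllables (ending in -k or -h): acute vs. breve.
ENTERING_TONES = {"\u0301": "yinru", "\u0306": "yangru"}


def tone_of(syllable: str) -> str:
    """Guess the tone category of one Foochow Romanized syllable."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    # Vowel-quality marks such as U+0324 (diaeresis below) are ignored.
    marks = [ch for ch in decomposed if ch in NON_ENTERING_TONES]
    if len(marks) != 1:
        raise ValueError(f"expected exactly one tone mark in {syllable!r}")
    mark = marks[0]
    if decomposed.endswith(("k", "h")):  # entering-tone coda
        if mark not in ENTERING_TONES:
            raise ValueError(f"unexpected mark on entering syllable {syllable!r}")
        return ENTERING_TONES[mark]
    return NON_ENTERING_TONES[mark]


print([tone_of(s) for s in ["Hók", "ciŭ"]])  # ['yinru', 'yinping']
```

Normalizing to NFD first means precomposed letters such as ŭ and decomposed sequences are handled the same way.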

Sample text

Fanqie

In Chinese phonology, fanqie is a method of indicating the pronunciation of a character by using two other characters.

The Origin

Before ''fanqie'' was widely adopted, the ''du ruo'' ("read as") method was used in works such as the ''Erya''. The introduction of Buddhism around the first century brought knowledge of Sanskrit, whose phonetics may have inspired the idea of ''fanqie''.

Sun Yan, who lived in the state of ''Wei'' during the Three Kingdoms period, is generally considered the first to adopt ''fanqie'', in his ''Erya Yinyi''.

In the original ''fanqie'', a character's pronunciation is represented by two other characters. The initial consonant is represented by that of the first of the two characters, and the final and the tone are represented by those of the second. The representation of tone changed notably later.

In 601 AD, during the Sui Dynasty, the ''Qieyun'', a Chinese rhyme dictionary using ''fanqie'', was published.

Modern form

In Middle Chinese, the tone was represented by the rhyme character. However, owing to sound changes that have occurred since then, a more complicated rule is used today:
# The yin-yang classification, which arose in some tones due to distinctions in the onset, is determined by the onset character.
# The ping-shang-qu-ru classification, which is kept from Middle Chinese, is determined by the rhyme character.
Thus:
: (onset and yin/yang register of the onset character) + (rhyme and ping/shang/qu/ru tone class of the rhyme character) = pronunciation of the character

For example, the character 東 is represented by 德紅切. The third character 切 indicates that this is a fanqie spelling, while the first two characters indicate the onset and rhyme respectively. Thus the pronunciation of 東 is given as the onset of 德 ''dé'' with the rhyme of 紅 ''hóng'', yielding ''dong''. Also, 德 has a yin ru tone and 紅 has a yang ping tone, so the tone of 東 is yin ping.
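The modern rule can be written as a small function that takes the onset and yin/yang register from the first character and the rhyme and ping/shang/qu/ru class from the second. This Python sketch is a simplification: the `Reading` type and the pinyin-based split of each syllable into onset and rhyme are assumptions made for illustration, not a standard representation. It reproduces the 德紅切 example for 東.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """A simplified syllable: onset, rhyme, register and tone class."""
    onset: str       # initial consonant ("" for a null onset)
    rhyme: str       # everything after the onset
    register: str    # "yin" or "yang"
    tone_class: str  # "ping", "shang", "qu" or "ru"


def fanqie(upper: Reading, lower: Reading) -> Reading:
    """Combine two readings according to the modern fanqie rule."""
    return Reading(
        onset=upper.onset,            # onset from the first character
        rhyme=lower.rhyme,            # rhyme from the second character
        register=upper.register,      # yin/yang from the first character
        tone_class=lower.tone_class,  # ping/shang/qu/ru from the second
    )


DE = Reading("d", "e", "yin", "ru")         # 德, with its historical yin ru tone
HONG = Reading("h", "ong", "yang", "ping")  # 紅
print(fanqie(DE, HONG))
# Reading(onset='d', rhyme='ong', register='yin', tone_class='ping')
```

Note that 德 is entered with its historical yin register; using its modern Mandarin yang tone instead would give the wrong result, which is exactly the kind of drift discussed under Language change.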

Gari Ledyard has given this informative example of how an English equivalent to fanqie might look:
:To show the pronunciation of an unknown character, one "cut" the initial consonant from a second character and the rhyme from a third, and combined them to show the reading of the first. To use an English example, one could indicate the pronunciation of the word ''sough'' by "cutting" ''sun'' and ''now'', or "cut" ''sun'' and ''cuff'' to show the alternate pronunciation. This method was a bit circular in that it required knowledge of the pronunciations of the characters that were "cut," but it proved to be a workable system and lasted well into the twentieth century.
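Ledyard's English analogy can be mimicked on spellings with a few lines of Python. This is a toy sketch: it splits a word at its first vowel letter (a, e, i, o, u), which only approximates splitting a syllable's sound into onset and rhyme, and the names `split_onset` and `cut` are hypothetical.

```python
VOWELS = "aeiou"


def split_onset(word: str) -> tuple[str, str]:
    """Split a spelling at its first vowel letter: (onset, rhyme)."""
    for i, letter in enumerate(word):
        if letter in VOWELS:
            return word[:i], word[i:]
    return word, ""  # no vowel letter: treat the whole word as onset


def cut(upper: str, lower: str) -> str:
    """Join the onset of `upper` with the rhyme of `lower`."""
    onset, _ = split_onset(upper)
    _, rhyme = split_onset(lower)
    return onset + rhyme


print(cut("sun", "now"), cut("sun", "cuff"))  # sow suff
```

The outputs "sow" and "suff" spell the two pronunciations of ''sough'' that Ledyard describes; the same circularity applies, since the reader must already know how ''sun'', ''now'' and ''cuff'' sound.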

Language change

Owing to the development of the Chinese language over the last millennium and a half, fanqie spellings are not always accurate for modern Mandarin pronunciations; for example, the modern Mandarin pronunciation of 德 is in a yang tone. However, fanqie is still rather accurate for southern Chinese varieties such as Hakka, which have preserved many elements of Middle Chinese.
