Chinese language: October 2008

Saturday, October 4, 2008

Han unification

Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the so-called CJK languages into a single set of unified characters. Han characters are a common feature of written Chinese, Japanese, and Korean and, at least historically, of other East and Southeast Asian languages.

Modern Chinese, Korean, and Japanese typefaces typically use regional or historical variants of a given Han character. In the formulation of Unicode, an attempt was made to unify these variants by considering them different glyphs representing the same "grapheme", or unit of writing; hence "Han unification", with the resulting character repertoire sometimes contracted to Unihan.

Unihan can also refer to the Unihan Database maintained by the Unicode Consortium, which provides information about all of the unified Han characters encoded in the Unicode standard, including mappings to various national and industry standards, indices into standard dictionaries, encoded variants, pronunciations in various languages, and an English definition. The database is available to the public both as downloadable files and via an online lookup interface. The latter also includes representative glyphs and definitions for compound words drawn from the free Japanese EDICT and Chinese CEDICT dictionary projects.

Rationale and controversy

Rules for Han unification are given in the East Asian Scripts chapter of the various versions of the Unicode Standard. The Ideographic Rapporteur Group, made up of experts from the Chinese-speaking countries, North and South Korea, Japan, Vietnam, and other countries, is responsible for the process.

One possible rationale is the desire to limit the size of the full Unicode character set, where CJK characters as represented by discrete ideograms may approach or exceed 100,000. An article on IBM DeveloperWorks attempts to illustrate part of the motivation for Han unification:

In fact, the three ideographs for "one" are encoded separately in Unicode, as they are not considered national variants. The first and second are used on financial instruments to prevent tampering, while the third is the common form in all three countries.

However, Han unification has also caused considerable controversy, particularly among the Japanese public, who, with the nation's literati, have a history of protesting the culling of historically and culturally significant variants.

Since the Unihan standard encodes "graphemes", not "glyphs", the graphical artifacts produced by Unicode have been considered temporary technical hurdles and, at most, cosmetic. However, particularly in Japan, due in part to the way in which Chinese characters were historically incorporated into Japanese writing systems, the inability to specify a particular variant is considered a significant obstacle to the use of Unicode in scholarly work. For example, the unification of "grass" means that a historical text cannot be encoded so as to preserve its peculiar orthography. Instead, the scholar would be required to locate the desired glyph in a specific typeface in order to convey the text as written, defeating the purpose of a unified character set.

Small differences in graphical representation are also problematic when they affect legibility or belong to the wrong cultural tradition. Besides making some Unicode fonts unusable for texts involving multiple "Unihan languages", names or other orthographically sensitive terminology might be displayed incorrectly. While this may be considered primarily a graphical representation or rendering problem to be overcome by more artful fonts, the widespread use of Unicode would make it difficult to preserve such distinctions. The problem of one character representing semantically different concepts is also present in the Latin part of Unicode. The Unicode character for an apostrophe is the same as the character for a right single quote: ’. On the other hand, it is sometimes pointed out that the capital Latin letter "A" is not unified with the Greek letter "Α". This is, of course, desirable for reasons of compatibility, and deals with a much smaller alphabetic character set.

While the unification aspect of Unicode is controversial in some quarters for the reasons given above, Unicode itself does now encode a vast number of seldom-used characters of a more-or-less antiquarian nature.

Some of the controversy stems from the fact that the decision to perform Han unification was made by the initial Unicode Consortium, which at the time was a consortium of North American companies and organizations but included no East Asian government representatives. The initial design goal was to create a 16-bit standard, and Han unification was therefore a critical step for avoiding tens of thousands of character duplications. This 16-bit requirement was later abandoned, making the size of the character set less of an issue today.

The controversy later extended to the internationally representative ISO: the initial CJK-JRG group favored a proposal for a non-unified character set, "which was thrown out in favor of unification with the Unicode Consortium's unified character set by the votes of American and European ISO members". Endorsing the Unicode Han unification was a necessary step for the heated ISO 10646/Unicode merger.

Much of the controversy surrounding Han unification is based on the distinction between glyphs, as defined in Unicode, and the related but distinct idea of graphemes. Unicode defines abstract characters, as opposed to glyphs, which are particular visual representations of a character in a specific typeface, or graphemes, the "basic units of writing" in a given language. One character may be represented by many distinct glyphs, for example a "g" or an "a", both of which may have one loop or two. In Dutch, "ij" is sometimes considered a single letter, and thus arguably a grapheme. For example, the first letter in "IJsselmeer" is capitalized. The same applies to "ch" in some Spanish-speaking countries and "lj" in Croatian. Graphemes present in national character code standards have been added to Unicode, as required by Unicode's Source Separation rule, even where they can be composed of characters already available. The national character code standards existing in CJK languages are considerably more involved, given the technological limitations under which they evolved, and so the official CJK participants in Han unification may well have been amenable to reform.

Unlike European versions, CJK Unicode fonts, due to Han unification, have large but irregular patterns of overlap, requiring language-specific fonts. Unfortunately, language-specific fonts also make it difficult to access a variant which, as with the "grass" example, happens to appear more typically in another language style. Unihan proponents tend to favor markup languages for defining language strings, but this would not ensure the use of a specific variant in the case given, only the language-specific font more likely to depict a character as that variant.

Chinese users seem to have fewer objections to Han unification, largely because Unicode did not attempt to unify Simplified Chinese characters with Traditional Chinese characters, as used in Hong Kong and Taiwan and, with some differences, more familiar to Korean and Japanese users. Unicode is seen as neutral with regard to this politically charged issue, and has encoded Simplified and Traditional Chinese glyphs separately. It is also noted that Traditional and Simplified characters should be encoded separately according to Unicode Han unification rules, because they are distinguished in pre-existing PRC character sets. Furthermore, as with other variants, the relationship between Traditional and Simplified characters is not one-to-one.

Specialist character sets developed to address, or regarded by some as not suffering from, these perceived deficiencies include:
*ISO/IEC 2022
*CNS character set
*CCCII character set
*UTF-2000
*Mojikyo
*GCCS and its successor HKSCS

However, none of these alternative standards has been as widely adopted as Unicode, which is now the base character set for many new standards and protocols, and is built into the architecture of operating systems, programming languages, libraries, font formats, and so on.

Examples of language dependent characters

In each row of the following table, the same character is repeated in all five columns. However, each column is marked as being in a different language. The rendering software should select, for each character, a glyph suitable to the specified language. This only works for fallback glyph selection if you have CJK fonts installed on your system and the font selected to display this article does not include glyphs for these characters. Note also that Unicode includes non-graphical language tag characters in the range U+E0000 to U+E007F for plain text language tagging.

Examples of some non-unified Han ideographs

For some glyphs, Unicode has encoded variant characters, making it unnecessary to switch between fonts or language tags. In the following table, the separate rows in each group contain the equivalent character at different Unicode code points. Note that for characters such as 入, the only way to display the two variants is to change font as described in the previous table. However, for 內, there is an alternate character 内, as illustrated below. For some characters, like 兌/兑, either method can be used to display the different glyphs.

{| border style="font-size: xx-large; line-height: normal; text-align: center; border-collapse: collapse"
|- style="text-align: left; vertical-align: bottom" lang="zh" xml:lang="zh"
|style="font-size: medium; font-family: sans-serif;" |Code
|valign="bottom" style="font-size: medium; font-family: sans-serif;"|Chinese (generic)
|valign="bottom" style="font-size: medium"|Chinese (Simplified)
|style="font-size: medium"|Chinese (Traditional)
|style="font-size: medium"|Japanese
|style="font-size: medium"|Korean
|-
|style="font-size: medium; font-family: sans-serif;" |U+9AD8
| lang="zh" xml:lang="zh"|高
| lang="zh-Hans" xml:lang="zh-Hans"|高
|style="vertical-align: middle" lang="zh-Hant" xml:lang="zh-Hant"|高
|style="vertical-align: middle" lang="ja" xml:lang="ja"|高
|style="vertical-align: middle; height: 1.5em;" lang="ko" xml:lang="ko"|高
|-
|style="font-size: medium; font-family: sans-serif;" |U+9AD9
| lang="zh" xml:lang="zh"|髙
| lang="zh-Hans" xml:lang="zh-Hans"|髙
|style="vertical-align: middle" lang="zh-Hant" xml:lang="zh-Hant"|髙
|style="vertical-align: middle" lang="ja" xml:lang="ja"|髙
|style="vertical-align: middle; height: 1.5em;" lang="ko" xml:lang="ko"|髙
|}
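
Whether two look-alike forms are unified or separately encoded can be checked by inspecting their code points. The following is a minimal Python sketch, not part of the original article; the helper name `code_point` is purely illustrative, and the character pairs are the ones mentioned in the text above.

```python
# Pairs of variant forms mentioned in the text; each member of a pair
# is a separately encoded character rather than a font-level variant.
PAIRS = [("高", "髙"), ("內", "内"), ("兌", "兑")]


def code_point(ch: str) -> str:
    """Return the code point of a single character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


for first, second in PAIRS:
    status = "distinct code points" if first != second else "same code point"
    print(first, code_point(first), second, code_point(second), status)
```

A character such as 入, by contrast, has only one code point, so any difference in its appearance has to come from the font or from language tagging.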

Unicode ranges

Ideographic characters assigned by Unicode appear in the following blocks:

*CJK Unified Ideographs
*CJK Unified Ideographs Extension A
*CJK Unified Ideographs Extension B
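
As a rough illustration of how these three blocks can be told apart programmatically, here is a small Python sketch. The block boundaries are taken from the Unicode code charts rather than from this article, only the three blocks listed above are covered, and the names `IDEOGRAPH_BLOCKS` and `ideograph_block` are hypothetical.

```python
# Code point ranges for the three blocks listed above (inclusive bounds),
# as published in the Unicode code charts.
IDEOGRAPH_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
]


def ideograph_block(ch: str):
    """Return the name of the block containing ch, or None if it is in none of them."""
    cp = ord(ch)
    for name, start, end in IDEOGRAPH_BLOCKS:
        if start <= cp <= end:
            return name
    return None


print(ideograph_block("高"))
print(ideograph_block("A"))
```

Extension B lies outside the Basic Multilingual Plane, which is one reason the original 16-bit design goal could not be kept.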

Unicode includes support of CJKV radicals, strokes, punctuation, marks and symbols in the following blocks:
*CJK Radicals Supplement
*CJK Symbols and Punctuation
*CJK Strokes
*Ideographic Description Characters

Additional compatibility characters appear in these blocks:
*Kangxi Radicals
*Enclosed CJK Letters and Months
*CJK Compatibility
*CJK Compatibility Ideographs
*CJK Compatibility Ideographs Supplement
*CJK Compatibility Forms

These compatibility characters are included for compatibility with legacy text handling systems and other legacy character sets. They include forms of characters for vertical text layout and rich text characters that Unicode recommends handling through other means.

Unihan database files

The Unihan project has always made a considerable effort to make its database publicly available.

A Unihan.zip file is provided on unicode.org; it contains all the data the Unihan team has collected.

The libUnihan project provides a normalized SQLite Unihan database and a corresponding C library. All tables in this database are in fifth normal form.

libUnihan, about 69 MB in size, is released under the LGPL, while its database, UnihanDb, is released under the MIT License.

HZ (character encoding)

The HZ character encoding is an encoding of GB2312 that was formerly commonly used in email and USENET postings. It was designed in 1989 by Fung Fung Lee of Stanford University, and subsequently codified in 1995 into RFC 1843.

The HZ encoding was invented to facilitate the use of Chinese characters through e-mail, which at that time only allowed 7-bit characters. Therefore, in lieu of standard ISO 2022 escape sequences or 8-bit characters, the HZ code uses only printable, 7-bit characters to represent Chinese characters.

It was also popular in USENET networks, which in the late 1980s and early 1990s, generally did not allow transmission of 8-bit characters or escape characters.

Structure and use

In the HZ encoding system, the character sequences "~{" and "~}" act as escape sequences; anything between them is interpreted as Chinese encoded in GB2312. Outside the escape sequences, characters are assumed to be ASCII.

An example will help illustrate the relationship between GB2312, EUC-CN, and the HZ code:

{| border=1 cellpadding=4 style="border-collapse: collapse;"
|+ Various forms of the GB2312 code for the character "一"
|---
! Form || Code || With escape sequences || Remarks
|---
| Kuten / Qūwèi / 区位 form || 5027 || — || Zone 50, point 27
|---
| ISO 2022 form || 52₁₆ 3B₁₆ || 0E₁₆ 52₁₆ 3B₁₆ 0F₁₆ || 50 + 32 = 82 = 52₁₆
|---
| EUC-CN form || D2₁₆ BB₁₆ || D2₁₆ BB₁₆ || 52₁₆ ∨ 80₁₆ = D2₁₆
|---
| HZ form || 52₁₆ 3B₁₆ || 7E₁₆ 7B₁₆ 52₁₆ 3B₁₆ 7E₁₆ 7D₁₆ || Appears as ~{R;~} without HZ decoder
|---
| HZ form || D2₁₆ BB₁₆ || 7E₁₆ 7B₁₆ D2₁₆ BB₁₆ 7E₁₆ 7D₁₆ || EUC form acceptable to at least some decoders
|}
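
The arithmetic in the table can be reproduced in a few lines of Python. This is only a sketch of the relationships shown above, not code from the HZ specification, and the function name `kuten_to_forms` is made up for illustration.

```python
def kuten_to_forms(zone: int, point: int):
    """Derive the ISO 2022, EUC-CN and HZ byte forms from a GB2312 zone/point pair."""
    # ISO 2022 form: add 32 (0x20) to both the zone and the point.
    iso2022 = bytes([zone + 0x20, point + 0x20])
    # EUC-CN form: set the high bit of each ISO 2022 byte (bitwise OR with 0x80).
    euc_cn = bytes(b | 0x80 for b in iso2022)
    # HZ form: the 7-bit bytes wrapped in the "~{" and "~}" escape sequences.
    hz = b"~{" + iso2022 + b"~}"
    return iso2022, euc_cn, hz


iso2022, euc_cn, hz = kuten_to_forms(50, 27)
print(iso2022.hex(), euc_cn.hex(), hz)
print(euc_cn.decode("gb2312"))
```

The final line relies on Python's standard `gb2312` codec to confirm that the EUC-CN bytes decode to the character in the table's caption.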

HZ was originally designed to be used purely as a 7-bit code. However, when situations allow, the escape sequences "~{" and "~}" sometimes surround characters represented in EUC-CN; this alternative use allows Chinese to be readable either with the help of HZ decoder software, or with a system that understands EUC-CN.

Additionally, the specification defines that
* the sequence "~~" is to be treated as encoding a single ASCII "~"
* the character "~" followed by a newline is to be discarded.
However, not all HZ decoders follow these two rules.
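
The rules described in this section can be sketched as a small decoder. The following Python function is an illustrative sketch only, not the reference implementation from RFC 1843 or the behavior of any particular decoder; the name `hz_decode` is hypothetical, and the function assumes well-formed input. It accepts both the 7-bit form and the EUC-CN form between the escape sequences by forcing the high bit on before decoding.

```python
def hz_decode(data: bytes) -> str:
    """Decode HZ-encoded bytes into a string (sketch; assumes well-formed input)."""
    out = []
    gb_mode = False  # True while between "~{" and "~}"
    i = 0
    while i < len(data):
        byte = data[i]
        nxt = data[i + 1] if i + 1 < len(data) else None
        if byte == 0x7E and nxt is not None:  # "~" starts a possible escape
            if gb_mode and nxt == 0x7D:  # "~}" leaves GB2312 mode
                gb_mode = False
                i += 2
                continue
            if not gb_mode and nxt == 0x7B:  # "~{" enters GB2312 mode
                gb_mode = True
                i += 2
                continue
            if not gb_mode and nxt == 0x7E:  # "~~" encodes a single "~"
                out.append("~")
                i += 2
                continue
            if not gb_mode and nxt == 0x0A:  # "~" plus newline is discarded
                i += 2
                continue
        if gb_mode:
            if nxt is None:
                raise ValueError("truncated two-byte character")
            # Setting the high bit maps the 7-bit form onto EUC-CN;
            # bytes already in EUC-CN form are left unchanged.
            pair = bytes([byte | 0x80, nxt | 0x80])
            out.append(pair.decode("gb2312"))
            i += 2
        else:
            out.append(chr(byte))
            i += 1
    return "".join(out)


print(hz_decode(b"Hi ~{R;~}!"))
```

Python's standard library also ships an `hz` codec, so `data.decode("hz")` is the practical choice; the hand-written version is shown only to make the escape handling explicit.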

HZ decoders

The first HZ decoder was written in 1989 by the code's inventor for the Unix operating system.

The hztty program, also for the Unix operating system, was among the first and most popular HZ decoders. It deviates from the specification in that it displays the escape sequences, and it does not treat "~~" or "~" followed by a newline specially. This was probably to allow software which assumes one character to occupy one screen position to function correctly without modification.

Support on Microsoft Windows came later, and a number of third-party "Chinese systems" support HZ. These systems may provide an option to hide the escape sequences.

Guwen

Gǔwén literally means ''ancient script''. Historically the term has been used in several different ways.

The first usage, which is common, refers to the ''most'' ancient forms of Chinese writing, namely the writing of the Shang and early Zhou dynasties, such as that found on oracle bones, bronzes, or pottery. This usage can be found at least as early as Xu Shen's Han dynasty etymological dictionary Shuowen Jiezi.

The second usage, also well known, refers to variant forms in Shuowen which Xu Shen mistook as being ancient, but which were actually used in the eastern areas during the Warring States period, as exemplified by copies of the Zuo Zhuan and 'books from within the walls' which were available to Xu Shen at the time of Shuowen's compilation. Xu mistook these as being significantly earlier than seal script, and thus also called them guwen. That is, Xu used the term guwen to refer to two different groups of scripts: those which were truly ancient, and those which he mistook as being ancient. It took the work of later scholars like Wang Guowei to separate and clarify Xu's ambiguous usage of the term.

The third usage is for scripts which are no longer legible to the average modern reader, including those referred to in meaning one above as well as the Stone Drums of Qin of the late Spring and Autumn period, other writing of the later Zhōu period preserved on stone, mid to late Zhou bronzes, the Eastern Warring States writing in meaning two above, and the late Zhōu to Qín seal script. The term "ancient stage" of Chinese script has been used in this manner, such that the Qín seal script and all its aforementioned predecessors are 'ancient', in contrast to the clerical script of the late Warring States through Qín and Hàn and the later standard script, both of which are legible to the modern reader of Chinese.

Additional Reading

*Chén Zhāoróng. ''Research on the Qín Lineage of Writing: An Examination from the Perspective of the History of Chinese Writing''. Academia Sinica, Institute of History and Philology Monograph. ISBN 957-671-995-X.
*Qiú Xīguī. ''Chinese Writing''. Translation of 文字學概要 by Gilbert L. Mattos and Jerry Norman. Early China Special Monograph Series No. 4. Berkeley: The Society for the Study of Early China and the Institute of East Asian Studies, University of California, Berkeley. ISBN 1-55729-071-7.

Gan Chinese

Gàn, alternatively Jiangxihua, is one of the major divisions of spoken Chinese, a member of the Sino-Tibetan family of languages. Gan speakers are concentrated in and typical of Jiangxi Province, and are also found in the northwest of Fujian and in some parts of Anhui and Hubei in mainland China.

Different dialects of Gan exist, and the representative dialect is the Nanchang dialect.

The name "Gàn" comes from the shortened name of Jiangxi Province.

Classification

The classification of Gan is a subject of ongoing debate. As with all other varieties of Chinese, there is a large amount of dispute as to whether Gan is a language or a dialect. The positions can generally be divided into three viewpoints:

*The first viewpoint considers Gan to be a dialect of Chinese, which is supported by the scholars in mainland China. Actually Gan, with , were carved out of the region of the language until 1937, and there are some Gan speakers that think Gan to be a dialect, mostly owing to the political factors or national emotion, also because Gan has more similarities with , than compared with Cantonese or Min.

*The second viewpoint considers Gan to be the same language as Hakka, called "Gan-Hakka", or to form a group of languages with Hakka and Cantonese, because there are quite a few similarities among the three.

*The third viewpoint considers Gan to be an independent language, because Gan is not mutually intelligible with other Chinese languages and, linguistically, varieties are divided into separate languages on the basis of intelligibility.

Please see Identification of the varieties of Chinese for the issues surrounding this dispute.

Name

* Gan: the formal name.

* Jiangxinese: the most common name. However, several languages are spoken in Jiangxi, and there are also many Gan speakers outside Jiangxi, so this name is not very exact.

* Xi: ancient name. Now it is seldom used.

* Gan dialect: the name used by the scholars in mainland China. And “Gan” is also used.

* Right-river language: a name used in ancient China, because most Gan speakers live south of the Yangtze River.

Relation with other Chinese languages

In ancient times, Jiangxi belonged to the same political division as its neighboring provinces. The large numbers of people who immigrated into Jiangxi naturally brought about some similarities with the surrounding languages; Gan and Hakka are the most similar.

Geographical distribution

Region

Most Gan speakers live in the middle and lower reaches of the Gan River, the drainage area of the Fu River, and the Poyang Lake region. There are also many Gan speakers living in eastern Hunan, eastern Hubei, southern Anhui, northwest Fujian, and elsewhere.

According to the ''Diagram of Divisions in the People's Republic of China'', Gan is spoken by approximately 48,000,000 people: 29,000,000 in Jiangxi, 4,500,000 in Anhui, 5,300,000 in Hubei, 9,000,000 in Hunan, and 270,000 in Fujian.

History

Ancient Ages

During the Qin Dynasty, a large number of troops were sent to southern China in order to conquer the Baiyue territories in Fujian and Guangdong. As a result, numerous Han Chinese immigrated to Jiangxi in the years following.
In the early years of the Han Dynasty, Nanchang was established as the capital of the Yuzhang Commandery, along with 18 counties. The population of the Yuzhang Commandery increased to 1,670,000 from 350,000, a net growth of 1,320,000. The Yuzhang Commandery ranked fourth in population among the more than 100 contemporary commanderies of China. As the largest commandery of Yangzhou, Yuzhang accounted for two fifths of the population, and Gan gradually took shape during this period.

Middle Ages

As a result of continuous warfare in the region of central China, the first large-scale immigration in the history of China took place. Large numbers of people in central China relocated to southern China in order to escape the bloodshed and at this time, Jiangxi played a role as a transfer station. Also, during this period, ancient Gan began to be exposed to the northern Mandarin dialects. After centuries of rule by the Southern Dynasties, Gan still retained many original characteristics despite having absorbed some elements of Guan-Hua.
Up until the Tang Dynasty, there was little difference between old Gan and the contemporary Gan of that era. Beginning in the Five Dynasties period, however, inhabitants of the central and northern parts of Jiangxi began to migrate to eastern Hunan, eastern Hubei, southern Anhui and northwest Fujian. During this period, following hundreds of years of migration, Gan spread to its current areas of distribution.

Recent History

evolved into a language based on , owing largely to political factors. At the same time, the differences between Gan and Guan-hua continued to become more pronounced. However, because Jiangxi borders on Jianghuai, a Guan-hua, Xiang, and Hakka speaking region, Gan proper has also been influenced by these surrounding languages, especially in its border regions.

Modern Times

After 1949, as a "dialect" in Mainland China, Gan faced a critical period. The impact of Mandarin is quite evident today as a result of official governmental linguistic campaigns. Currently, many youths are unable to master Gan expressions, and some are no longer able to speak Gan at all.

Recently, however, as a result of increased interest in protecting the local language, Gan now has begun to appear in various regional media, and there are also newscasts and television programs broadcast in the Gan language.


Dialects

According to the ''Atlas of Chinese languages'', there are 9 dialects of Gan.

Note: a name marked with * means that Gan is only partly spoken in that city.

Phonetics

Like other Chinese languages, Gan is tonal: tones serve to distinguish the meanings of words, and they may change in some contexts.

The tone categories of ancient Chinese (level, rising, departing, and entering) have been preserved in Gan, and some dialects of Gan have preserved all eight tones.

Tone

Gan has 19 syllable onsets, 65 syllable rimes, and 5 tones.

The 6th and 7th tones are the same as the 4th and 5th tones, except that the syllable ends in a stop consonant.

Vowels

Gan has 6 vowels: i, y, e, a, o, u.

Initials

In each cell below, the first line indicates IPA, the second indicates pinyin.

Note: ? denotes an initial without sound (a null initial).

Finals

opening finals:

nasal finals:

entering finals:

independent finals:

consonantal finals

Example

Grammar

In Gan, there are 9 principal grammatical tenses – initial （起始）, progressive （進行）, experimental （嘗試）, durative （持續）, processive （經歷）, continuative （繼續）, repeating （重行）, perfect （已然）, complete （完成）.

The grammar of Gan is similar to that of southern Chinese languages. The sequence "subject verb object" is most typical, but "subject object verb" or the passive voice is possible with particles. Take a simple sentence for example: "I hold you." The words involved are: ngo ("I"), tsot dok ("hold"), ň ("you").
* Subject verb object: The sentence in the typical sequence would be: ngo tsot dok ň.
* Subject lat object verb: Another sentence of roughly equivalent meaning is ngo lat ň tsot dok, with the slight connotation of "I take you and hold" or "I get to you and hold."
* Object den subject verb: Then, ň den ngo tsot dok means the same thing but in the passive voice, with the connotation of "You allow yourself to be held by me" or "You make yourself available for my holding."

Vocabulary

In Gan, there are a number of archaic words and expressions originally found in ancient Chinese which are now seldom or no longer used in Mandarin. For example, the noun "clothes" is ‘衣裳’ in Gan but ‘衣服’ in Mandarin, and the verb "sleep" is ‘睏覺’ in Gan but ‘睡覺’ in Mandarin. Also, to describe something dirty, Gan speakers use ‘下里巴人’, which is a reference to a song from the Chu (楚國) region dating to China's Spring and Autumn Period.

Additionally, there are numerous interjections in Gan (e.g. 哈, 噻, 啵), which can strengthen sentences considerably and better express different feelings.

Writing system

Gan is written with Chinese characters, though it does not have a strong written tradition. There are also some romanization schemes, but none is widely used. Gan speakers usually use Vernacular Chinese as the written form, which is used by all Chinese speakers.


Fuzhou dialect

Foochowese, also known as Fuzhou dialect, Foochow dialect, Foochow, Fuzhounese, or Fuzhouhua, is considered the standard dialect of Min Dong, which is a branch of Min Chinese mainly spoken in the eastern part of Fujian Province. Native speakers also call it by a name meaning "the language spoken in everyday life".

Although traditionally called a dialect, Foochowese is actually a separate language according to linguistic standards, because it is not mutually intelligible with other Min varieties, let alone other Chinese languages. Whether Foochowese is a ''dialect'' or a ''language'' therefore remains highly disputed.

Centered in Fuzhou City, Foochowese mainly covers eleven cities and counties, including Fuzhou, Changle, and Fuqing. Foochowese is also the second local language in northern and middle Fujian cities and counties such as Nanping, Shaowu, and Sanming.

Foochowese is also widely spoken in some regions abroad, especially in Southeast Asian countries like Malaysia and Indonesia. The city of Sibu in Malaysia is called "New Fuzhou" due to the influx of immigrants there in the early 1900s. Similarly, the language has spread to the USA, UK and Japan as a result of immigration in recent decades.

History

Formation

After Han China's occupation of Minyue in 110 BC, the Han people began their rule over what is today Fujian Province. Having lost their nationhood, the aboriginal Minyue people, a branch of the Baiyue, were gradually assimilated into Chinese culture. The languages brought by the mass influx of Han immigrants from the north, among them Ancient Chu, gradually mixed with the local Minyue language and finally developed into the Ancient Min language, from which Foochowese evolved.

Foochowese came into being sometime between the late Tang Dynasty and the Five Dynasties and Ten Kingdoms period, and has been considered by most to be a Chinese dialect ever since. However, it is also worth noting that its substratum consists of a large quantity of well-preserved Minyue vocabulary. In this sense, Foochowese is a ''de facto'' mixed language of Ancient Chinese and the Minyue language.

The famous book Qī Lín Bāyīn, which was compiled in the 17th century, is the first and most comprehensive rime book to provide a systematic guide to character readings for people speaking or learning Foochowese. It once served to standardize the language and is still widely quoted as an authoritative reference in modern academic research on Chinese phonology.

Studies by early Western missionaries

In 1842, Fuzhou was opened to Westerners as a treaty port after the signing of the Treaty of Nanjing. Because of the language barrier, however, the first Christian missionary base in the city was not established without difficulty. In order to convert the people of Fuzhou, the missionaries found it necessary to make a careful study of Foochowese. Their most notable works are listed below:

:* 1856, M. C. White:
:* 1870, R. S. Maclay & C. C. Baldwin: An alphabetic dictionary of the Chinese language in the Foochow dialect
:* 1871, C. C. Baldwin: Manual of the Foochow dialect
:* 1891, T. B. Adam: An English-Chinese dictionary of the Foochow dialect
:* 1893, Charles Hartwell:
:* 1898, R. S. Maclay & C. C. Baldwin: An alphabetic dictionary of the Chinese language of the Foochow dialect, 2nd edition
:* 1906, The Foochow translation of the complete Bible
:* 1923, T. B. Adam & L. P. Peet: An English-Chinese dictionary of the Foochow dialect, 2nd edition
:* 1929, R. S. Maclay & C. C. Baldwin :

Status quo

By the end of the Qing Dynasty, Fuzhou society had been largely monolingual. For decades, however, the Chinese government has discouraged the use of the colloquial in school education and in the media, so the number of Mandarin speakers has greatly increased. Reportedly, fewer than half of the children and young people in Fuzhou are able to speak Foochowese.

Nevertheless, it should be noted that Foochowese is currently widely spoken among some native speakers as an "endearing" language. Speaking Foochowese in Fuzhou often allows mutual speakers a certain level of familiarity. Even though Mandarin Chinese is more often heard in casual conversations on the city streets, the careful observer will notice that in more communal settings, such as small neighborhoods in the city or the surrounding countryside, Foochowese is often the dominant language.

In Mainland China, Foochowese has been officially listed as Intangible Cultural Heritage and its promotion work is being systematically carried out. In Matsu, Taiwan, the teaching of Foochowese has been successfully introduced into elementary schools, alongside the Taiwanese localization movement.

Grammar

:''This section is about Standard Foochowese only.''

Phonetics

Foochowese is a tonal language with extremely extensive sandhi rules affecting the initials, the rimes, and the tones. These complicated rules make Foochowese one of the most difficult Chinese languages.

Tones

There are seven original tones in Foochowese, which preserves the tonal system of Ancient Chinese:

The sample characters are taken from the Qī Lín Bāyīn.

In ''Qī Lín Bāyīn'', Foochowese is described as having eight tones, which explains how the book got its title. That name, however, is somewhat misleading, because Ĭng-siōng and Iòng-siōng are identical in tone contour; therefore, only seven tones exist.

Ĭng-ĭk and Iòng-ĭk characters end in a stop coda, which may be a glottal stop.

Besides the seven tones listed above, two new tonal values, "21" and "35", also occur in connected speech.

Tonal sandhi

The rules of tonal sandhi in Foochowese are complicated, even compared with those of other Chinese dialects. When two or more characters combine into a word, the tonal value of the last character remains stable, but those of the preceding characters change in most cases. For example, "獨", "立" and "日" are all Iòng-ĭk characters with the same tonal value "5". When they are combined as the phrase "獨立日", "獨" changes its tonal value to "21" and "立" changes its tonal value to "33".

The two-character tonal sandhi rules are shown in the table below:

Ĭng-ĭk-gák are Ĭng-ĭk characters ending in a glottal stop, and Ĭng-ĭk-ék are Ĭng-ĭk characters ending in the other stop coda.

However, the tonal sandhi rules of more than two characters are much more complicated than can be conveniently displayed in a single table.

Initials

There are seventeen initials in all:

The Chinese characters in the brackets are also sample characters from ''Qī Lín Bāyīn''.

Most Chinese linguists argue that Foochowese should be described as possessing a null onset. In fact, any character that has a null onset begins with a glottal stop [ʔ].

Some speakers find it difficult to distinguish between the initials [n] and [l].

No labiodental phonemes, such as [f] or [v], exist in Foochowese, which is one of the most conspicuous characteristics shared by all branches of the Min family.

[β] and [ʒ] exist in connected speech only.

Initial assimilation

In Foochowese, there are various kinds of initial assimilation, all of which are progressive. When two or more characters combine into a phrase, the initial of the first character stays unchanged, while those of the following characters, in most cases, change to match the preceding phoneme, i.e., the coda of the preceding character.

Rimes

The table below shows the eleven vowel phonemes of Foochowese.

In Foochowese the codas [m], [n], and [ŋ] have all merged as [ŋ], and [p], [t], and [k] have all merged as [ʔ]. The eleven vowel phonemes, together with the codas [ŋ] and [ʔ], are organized into forty-six rimes.

As mentioned above, there are theoretically two different entering-tone codas in Foochowese: [k] and [ʔ]. For most Foochowese speakers, however, the two are distinguishable only in tonal sandhi or initial assimilation. Therefore, most Chinese linguists think that the codas [k] and [ʔ] have merged.

Close/Open rimes

All rimes come in pairs in the above table: the one on the left represents a close rime, while the other represents an open rime. The close and open rimes are closely related to the tones. As single characters, the tones Ĭng-bìng, Siōng-siăng, Iòng-bìng and Iòng-ĭk have close rimes, while Ĭng-ké̤ṳ, Ĭng-ĭk and Iòng-ké̤ṳ have open rimes. In connected speech, an open rime shifts to its close counterpart in the tonal sandhi.

For instance, "福" is an Ĭng-ĭk character and "州" an Ĭng-bìng character. When these two characters combine into the word "福州", "福" changes its tonal value from "24" to "21" and, simultaneously, shifts from its open rime to the close counterpart. In the word "中國", by contrast, "中" is an Ĭng-bìng character, so its close rime never changes, though its tonal value does change from "55" to "53" in the tonal sandhi.

The phenomenon of close/open rimes is unique to Foochowese, and this feature makes it especially intricate and hardly intelligible even to speakers of other Min varieties.

Phonological features

Vocabulary

Most words in Foochowese have cognates in other Chinese languages, so a non-Fuzhou speaker would find it much easier to understand Foochowese written in Chinese characters than spoken in conversation. However, false friends do exist: for example, "莫細膩" means "don't be too polite" or "make yourself at home", "我對手汝洗碗" means "I help you wash dishes", and "伊共伊老媽嚟冤家" means "he and his wife are quarreling". Knowledge of Mandarin vocabulary alone does not help one catch the meaning of these sentences.

The majority of Foochowese vocabulary dates back more than 1,200 years. Some everyday words are even preserved as they were in the Tang Dynasty, as illustrated by a poem by the famous Chinese poet Gu Kuang. In his poem ''Jiǎn'', Gu Kuang explicitly noted:

In Foochowese, "囝" and "郎罷" are still in use today, without the slightest change.

Words from Ancient Chinese

Quite a few words from Ancient Chinese have retained the original meanings for thousands of years, while their counterparts in Mandarin Chinese have either fallen out of daily use or varied to different meanings.

This table shows some Foochowese words from Classical Chinese, as contrasted to Mandarin Chinese:

:¹ "看" is also used as the verb "to look" in Foochowese.
:² "養" in Foochowese means "give birth to".

And this table shows some words that are both used in Foochowese and Mandarin Chinese, while the meanings in Mandarin Chinese have altered:

Words from Minyue language

Some everyday words shared by all Min languages came from the ancient Minyue language, such as the following:

The literary and colloquial readings

Literary and colloquial readings are a feature commonly found in Chinese dialects throughout China. The literary readings are mainly used in formal phrases and written language, while the colloquial ones are mostly used in informal phrases and spoken language.

This table displays some widely used characters in Foochowese which have both literary and colloquial readings:

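{| class="wikitable"
! Character !! Literary reading !! Phrase !! Meaning !! Colloquial reading !! Phrase !! Meaning
|-
| 江
|-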
| 百
| báik
| 百科 báik-kuŏ
| encyclopedic
| báh
| 百姓 báh-sáng
| common people
|-
| 飛
| hĭ
| 飛機 hĭ-gĭ
| aeroplane
| buŏi
| 飛鳥 buŏi-cēu
| flying birds
|-
| 寒
| hàng
| 寒食 Hàng-sĭk
| Cold Food Festival
| gàng
| 天寒 tiĕng gàng
| cold, freezing
|-
| 廈
| hâ
| 大廈 dâi-hâ
| mansion
| â
| 廈門
| Amoy
|}

Loan words from English

The First Opium War, also known as the First Anglo-Chinese War, ended in 1842 with the signing of the Treaty of Nanjing, which forced the Qing government to open Fuzhou to all traders and missionaries. Since then, quite a number of churches and Western-style schools have been established. Consequently, some English words entered Foochowese, but without fixed written forms in Chinese characters. The most frequently used words are listed below:
* noun, meaning "an article of dress", from the word "coat";
* noun, meaning "a meshwork barrier in tennis or badminton", from the word "net";
* noun, meaning "oil paint", from the word "paint";
* noun, meaning "a small sum of money", from the word "penny";
* noun, meaning "money", from the word "take";
* noun, meaning "girl" in a humorous way, from the word "girl";
* verb, meaning "to shoot", from the word "shoot";
* verb, meaning "to pause", from the word "again";
* meaning "Southeast Asian", from the word "Malacca".

Other features of Foochowese grammar

Examples

Some common phrases in Foochowese:
* Foochowese: 福州話
* Hello: 汝好
* Good-bye: 再見
* Please: 請; 起動
* Thank you: 謝謝; 起動 (Kī-dâe̤ng)
* Sorry: 對不住
* This: 嚽; 啫; 茲
* That: 噲; 嘻; 許
* How much?: 偌
* Yes: 正是; 無綻; 著
* No: 伓是; 綻; 賣著
* I don't understand: 我賣會意
* What's his name?: 伊名什乇？
* Where's the hotel?: 賓館洽底所？
* How can I go to the school?: 去學校怎樣行？
* Do you speak Foochowese?: 汝會講福州話賣？
* Do you speak English?: 汝會講英語賣？
Regional variations

Writing system

Chinese characters

Most Foochowese words stem from Ancient Chinese and can therefore be written in Chinese characters. Many books published in the Qing Dynasty were written in this traditional way, such as Mǐndū Biéjì and the Bible in Foochowese. However, Chinese characters have many shortcomings as a writing system for Foochowese.

Firstly, a great number of characters are unique to Foochowese, so they can only be written in informal ways. For instance, the character "", a negative word, has no common form. Some write it as "" or "", both of which share its pronunciation but have totally unrelated meanings; others prefer a newly created character combining "" and "", but this character is not included in most fonts.

Secondly, Foochowese has been excluded from the educational system for many decades. As a result, many speakers, if not all, take for granted that Foochowese does not have a formal writing system, and when they have to write it, they tend to misuse characters with a similar Mandarin Chinese pronunciation. For example, " ", meaning "okay", is frequently written as "" because the two are pronounced almost the same way.

Foochow Romanized

Foochow Romanized, also known as Bàng-uâ-cê or Hók-ciŭ-uâ Lò̤-mā-cê, is a romanized orthography for Foochowese adopted in the middle of the 19th century by Western missionaries. It varied at different times and became standardized several decades later. Foochow Romanized was mainly used within Church circles and was taught in some mission schools in Fuzhou.

Mǐnqiāng Kuàizì

Mǐnqiāng Kuàizì , literally meaning "Fujian Colloquial Fast Characters", is a Qieyin System for Foochowese designed by Chinese scholar and calligrapher Li Jiesan in 1896.

Literary and art forms

Books and other sources

* Cathryn Donohue: , University of Nevada, Reno
* Chen, Leo & Norman, Jerry: ''An Introduction to the Foochow Dialect'', San Francisco State Coll., CA, 1965.

Foochow Romanized

Foochow Romanized, also known as Bàng-uâ-cê or Hók-ciŭ-uâ Lò̤-mā-cê, is a romanized orthography for the Fuzhou dialect adopted in the middle of the 19th century by Western missionaries. It varied at different times and became standardized several decades later. Foochow Romanized was mainly used within Church circles and was taught in some mission schools in Fuzhou. Unlike its counterpart Pe̍h-ōe-jī for the Southern Min language, however, Foochow Romanized was by no means universally understood by Christians, even in its prime.

History of Foochow Romanized

After Fuzhou became one of the five Chinese treaty ports opened by the Treaty of Nanjing at the end of the First Opium War, many Western missionaries arrived in the city. Faced with widespread illiteracy, they developed romanization schemes for the Fuzhou dialect.

The first attempt at romanizing the Fuzhou dialect was made by M. C. White, who borrowed a system of orthography known as the System of Sir William Jones. In this system, 14 initials were designed exactly according to their voicing and aspiration. P, T, K and CH stand for [p], [t], [k] and [ts], while the Greek spiritus lenis "᾿" was affixed to these initials to represent their aspirated counterparts. Besides the default five vowels of the Latin alphabet, four diacritic-marked letters, È, Ë, Ò and Ü, were also introduced, representing [ɛ], [ø], [ɔ] and [y], respectively. This system is described at length in White's linguistic work.

Subsequent missionaries, including Robert S. Maclay from the American Methodist Episcopal Mission, R. W. Stewart from the Church of England and Charles Hartwell from the American Board Mission, further modified White's system in several ways. The most significant change was made in the scheme of plosive consonants: the spiritus lenis "᾿" of the aspirated initials was removed, and the letters B, D and G were introduced to represent [p], [t] and [k]. As for vowels, È, Ë, Ò and Ü were replaced by A̤, E̤, O̤ and Ṳ; and since the diacritical marks were all shifted to underneath the vowels, tonal marks were thus invented.

Scheme

The sample characters are taken from Qī Lín Bāyīn, a renowned phonology book about the Fuzhou dialect written in the Qing Dynasty. The pronunciations are recorded in standard IPA symbols.

Initials

Rimes

Rimes without codas

Rimes with coda [ʔ]

Rimes with codas [ŋ] and [k]

Tones

Note that Foochow Romanized uses the breve, not the caron, to indicate the Yīnpíng and Yángrù tones of the Fuzhou dialect.

Sample text

Fanqie

In Chinese phonology, fanqie is a method of indicating the pronunciation of a character by using two other characters.

The Origin

Before ''fanqie'' was widely adopted, the ''du ruo'' method was used in works such as the Erya. The introduction of Buddhism around the first century brought Sanskrit, whose phonetic knowledge might have inspired the idea of ''fanqie''.

Sun Yan is generally considered to be the first to adopt ''fanqie'', in ''Erya Yinyi''. He lived in the state of Wei during the Three Kingdoms period.

In the original ''fanqie'', a character's pronunciation is represented by two other characters. The onset (initial consonant) is represented by that of the first of the two characters; the final and the tone are represented by those of the second. The representation of tone notably changed later.

In 601 AD, during the Sui Dynasty, the Qieyun, a Chinese rhyme dictionary using ''fanqie'', was published.

Modern form

In Middle Chinese, the tone was represented by the rhyme character. However, owing to sound changes that have occurred since then, a more complicated rule is used today:
# The yin-yang classification, which arose in some tones due to distinctions in the onset, is determined by the onset character.
# The ping-shang-qu-ru classification, which is kept from Middle Chinese, is determined by the rhyme character.
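Thus
: yin-yang register (from the onset character) + ping-shang-qu-ru category (from the rhyme character) = tone of the character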

For example, the character 東 is represented by 德紅切. The third character 切 indicates that this is a fanqie spelling, while the first two characters indicate the onset and rhyme respectively. Thus the pronunciation of 東 is given as the onset of 德 ''dé'' with the rhyme of 紅 ''hóng'', yielding ''dong''. Also, 德 has a yin ru tone and 紅 has a yang ping tone, so the tone of 東 is yin ping.
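
The modern rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `READINGS` table and the `fanqie` function are hypothetical names, and the split of each reading into onset, rhyme, yin-yang register and tone category is entered by hand for the two speller characters of the 德紅切 example.

```python
# Hypothetical hand-made table: character -> (onset, rhyme, register, category).
# Only the two speller characters from the example above are included.
READINGS = {
    "德": ("d", "e", "yin", "ru"),
    "紅": ("h", "ong", "yang", "ping"),
}


def fanqie(onset_char, rhyme_char, readings=READINGS):
    """Combine two speller characters following the modern fanqie rule."""
    # The onset and the yin/yang register come from the first speller.
    onset, _, register, _ = readings[onset_char]
    # The rhyme and the ping/shang/qu/ru category come from the second speller.
    _, rhyme, _, category = readings[rhyme_char]
    return onset + rhyme, f"{register} {category}"


print(fanqie("德", "紅"))
```

Calling `fanqie("德", "紅")` combines the onset ''d'' with the rhyme ''ong'' and pairs the yin register of 德 with the ping category of 紅, matching the reading derived in the example.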

Gari Ledyard has given this informative example of how an English equivalent to fanqie might look:
:To show the pronunciation of an unknown character, one "cut" the initial consonant from a second character and the rhyme from a third, and combined them to show the reading of the first. To use an English example, one could indicate the pronunciation of the word ''sough'' by "cutting" ''sun'' and ''now'', or "cut" ''sun'' and ''cuff'' to show the alternate pronunciation. This method was a bit circular in that it required knowledge of the pronunciations of the characters that were "cut," but it proved to be a workable system and lasted well into the twentieth century.

Language change

Owing to the development of the Chinese language over the last millennium and a half, the fanqie spellings are not always accurate for Modern Chinese pronunciation; for example, the modern Mandarin pronunciation of 德 is in a yang tone. However, they are still rather accurate for southern Chinese varieties such as Hakka, which have preserved many elements of Middle Chinese.

Zhuyin table

This bopomofo table is a complete listing of all syllables used in Standard Mandarin. Each syllable in a cell is composed of an initial and a final.

Finals are grouped into subsets ㄚ, ㄧ, ㄨ and ㄩ.

ㄧ, ㄨ and ㄩ groupings indicate a combination of those finals with finals from Group ㄚ.

An empty cell indicates that the corresponding syllable does not exist in Standard Mandarin.

Please note that this table indicates possible combinations of initials and finals in Standard Mandarin, but does not indicate tones, which are equally important to the proper pronunciation of Chinese. Although some initial-final combinations have syllables using each of the 5 different tones, most do not. Some utilize only one tone.

Equivalent Hanyu Pinyin initials and finals are listed next to their respective bopomofo initial and final. Bopomofo entries in this page can also be compared to syllables using the Pinyin phonetic system in the Pinyin table page.

There are discrepancies between the bopomofo table and the pinyin table due to a few minor standardization differences between the mainland standard ''putonghua'' and the Taiwanese standard ''guoyu''. For example, the variant sounds 挼, 扽, 忒 are not used in ''guoyu''. Likewise, the variant sound 孿 is not recognized in ''putonghua'', or it is folded into another syllable.

Zhuang logogram

Zhuang logograms, or sawndip, are logograms derived from Han characters and used by the Zhuang in Guangxi, China. In Mandarin Chinese, the script is called Gǔ Zhuàngzì or Fāngkuài Zhuàngzì, meaning ''old Zhuang'' or ''square-shaped Zhuang''.

History

Sawndip is a Zhuang word that means "immature character". Although it is not clear when the script was created, the oldest known record of these logograms is a stela erected in 689, during the Tang dynasty. These logograms were in use earlier than the Vietnamese Chu Nom.

They have been used for over 1300 years by Zhuang singers and shamans to record poems and scriptures. Though a romanized script for the Zhuang language was created in 1957 as the official script, sawndip continues to be used to this day.

Published in 1989, the Sawndip Sawdenj includes characters written in manuscripts dated before the end of the Qing Dynasty.

Some logograms are used as a part of Han characters for Guangxi place names, such as ''bya'' for mountain or ''ndoeng'' for forest and are encoded as Unicode ideograms. However, many thousands of Zhuang logograms have yet to be encoded in Unicode.

Xiao'erjing

Xiao'erjing or Xiao'erjin or, in its shortened form, Xiaojing, is the practice of writing Sinitic languages such as Mandarin or the Dungan language in the Arabic script. It is used on occasion by many who adhere to the Islamic faith in China, and formerly by their Dungan descendants in Central Asia. Soviet writing reforms forced the Dungan to replace Xiao'erjing with a Latin orthography and later a Cyrillic one, which they continue to use today.

Xiao'erjing is written from right to left, as with other writing systems based on the Arabic alphabet. The Xiao'erjing writing system is similar to the present writing system of the Uyghur language in that all the vowels are explicitly marked at all times. This is in contrast to the practice of omitting the short vowels in the majority of the languages for which the Arabic script has been adopted. This is possibly due to the overarching importance of the vowel in a Chinese syllable.

Nomenclature

Xiao'erjing does not have a single standard name. In Shanxi, Hebei, Henan, Shandong, eastern Shaanxi, Beijing and Tianjin, the script is referred to as "Xiǎo'érjīng", which when shortened becomes "Xiǎojīng" or "Xiāojīng". In Ningxia, Gansu, Inner Mongolia, Qinghai and western Shaanxi, the script is referred to as "Xiǎo'érjǐn". The Dongxiang people refer to it as the "Dongxiang script" or the "Huihui script"; the Salar refer to it as the "Salar script"; the Dungan of Central Asia used a variation of Xiao'erjing called the "Hui script" before abandoning the Arabic script for Latin and Cyrillic.

Origins

Since the arrival of Islam during the Tang Dynasty, many Arabic- or Persian-speaking people migrated into China. Centuries later, these peoples assimilated with the native Han Chinese, forming the Hui ethnicity of today. Many Chinese Muslim students attended madrasas to study Classical Arabic and the Qur'an. Because these students had only a very basic understanding of Chinese characters but a better command of the spoken tongue once assimilated, they started using the Arabic alphabet for Chinese. This was often done by writing notes in Chinese to aid in the memorisation of surahs. The method was also used to write Chinese translations of Arabic vocabulary learnt in the madrasas. Thus, a system of writing the Chinese language in the Arabic script gradually developed and was standardised to some extent. Currently, the oldest known artefact showing signs of Xiao'erjing is a stone stele in the courtyard of a mosque in Xi'an in the province of Shaanxi. The stele shows inscribed Qur'anic verses in Arabic as well as a short note of the names of the inscribers in Xiao'erjing. The stele was made in the year AH 740 of the Islamic calendar.

Usage

Xiao'erjing can be divided into two sets, the "Mosque system" and the "Daily system". The "Mosque system" is the system used by pupils and imams in mosques and madrasahs. It contains much Arabic and Persian religious lexicon and makes no use of Chinese characters. This system is relatively standardised and could be considered a true writing system. The "Daily system" is the system used by the less educated for letters and correspondence on a personal level. Simple Chinese characters are often mixed in with the Arabic alphabet, and because the texts mostly discuss non-religious matters, they contain relatively few Arabic and Persian loans. This practice can differ drastically from person to person. The system would be devised by the writers themselves, based on their own understanding of the Arabic and Persian alphabets and mapped to their own dialectal pronunciation. Often, only the sender and the receiver of a letter can completely understand what is written, while others find it very difficult to read.

Modern usage

In recent years, the usage of Xiao'erjing has been nearing extinction due to the growing economy of the People's Republic of China and the improvement of the education of Chinese characters in rural areas of China. Chinese characters along with Hanyu Pinyin have since replaced Xiao'erjing. Since the mid 1980s, there has been much scholarly work done within and outside China concerning Xiao'erjing. On-location research has been conducted and the users of Xiao'erjing have been interviewed. Written and printed materials in Xiao'erjing have also been collected by researchers, the collection at Nanjing University being the most comprehensive.

Alphabet

Xiao'erjing has 36 letters, 4 of which are used to represent vowel sounds. The 36 letters consist of 28 letters borrowed from Arabic, 4 letters borrowed from Persian along with 2 modified letters, and 4 extra letters unique to Xiao'erjing.

Initials and consonants


Finals and vowels

Chinese


Vowels in Arabic and Persian loans follow their respective orthographies, namely, only the long vowels are represented and the short vowels are omitted.
Although the sukuun can be omitted when representing Arabic and Persian loans, it cannot be omitted when representing Chinese. The exception being that of oft-used monosyllabic words which can have the sukuun omitted from writing. For example, when emphasised, "的" and "和" are written as "??" and " ??"; when unemphasised, they can be written with the sukuuns as "??" and " ??", or without the sukuuns as "?" and " ?".
Similarly, the sukuun can also represent the Chinese -ng final. This is sometimes replaced by the fatHatan, the kasratan, or the dammatan.
In polysyllabic words, the final 'alif that represents the long vowel -ā can be omitted and replaced by a fatHah representing the short vowel -ă.
Xiao'erjing is similar to Hanyu Pinyin in that the syllables of a word are written together, while a space is inserted between words.
When representing Chinese words, the shaddah sign represents a doubling of the entire syllable on which it rests. It has the same function as the Chinese iteration mark "々".
Arabic punctuation marks can be used with Xiao'erjing, as can Chinese punctuation marks; the two can also be mixed.
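
The doubling behaviour described for the shaddah and for "々" can be illustrated with a small Python sketch. It handles only the "々" case, where one character stands for one syllable; the function name is a hypothetical one chosen for this illustration, and expanding a shaddah in Xiao'erjing text would additionally require splitting the text into syllables, which is not attempted here.

```python
ITERATION_MARK = "々"


def expand_iteration_marks(text):
    """Replace each iteration mark with a copy of the character before it."""
    out = []
    for ch in text:
        if ch == ITERATION_MARK and out:
            out.append(out[-1])  # repeat the preceding character
        else:
            out.append(ch)  # ordinary character (or a mark with nothing before it)
    return "".join(out)


print(expand_iteration_marks("人々"))
```

A mark at the very start of the text has nothing to repeat, so the sketch leaves it unchanged.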

Example

Article 1 of the Universal Declaration of Human Rights in Xiao'erjing, simplified and traditional Chinese characters, Hanyu Pinyin and English:
*Xiao'erjing:
*:
*:
*: "Rénrén shēng ér zìyóu, zài zūnyán hé quánlì shàng yílǜ píngděng. Tāmen fù yǒu lǐxìng hé liángxīn, bìng yīng yǐ xiōngdiguānxì de jīngshén hùxiāng duìdài."
*: "All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood."

Wu Chinese

Wu is one of the major divisions of the Chinese language. It is spoken in most of Zhejiang province, the municipality of Shanghai, southern Jiangsu province, as well as smaller parts of Anhui, Jiangxi, and Fujian provinces. The traditional prestige dialect of Wu is the Suzhou dialect, though due to its large population, the Shanghai dialect is today sometimes considered the prestige dialect.

As of 1991, there were at least 77 million speakers of Wu Chinese, making it the second most widely spoken Chinese language after Mandarin, which has 800 million speakers, and the 10th most widely spoken language in the world.

Among speakers of other Chinese languages, Wu is often subjectively judged to be soft, light, and flowing. There is even a special term used to describe these qualities of Wu speech. The actual source of this impression is harder to place. It is likely a combination of many factors. Among speakers of Wu, for example, Shanghainese is considered softer and mellower than the variant spoken in Ningbo, although some Wu speakers still insist that the old standard Suzhou dialect is more pleasant and beautiful than the dialects of Shanghai and Ningbo.

As with other varieties of Chinese, there is debate as to whether Wu is a language or a dialect. By the standard of mutual intelligibility, Wu is a separate language; however, socially it is considered to be a regional form of the Chinese language. See Identification of the varieties of Chinese for the issues surrounding this dispute. In terms of written communication, there is a great but not complete degree of mutual intelligibility between Wu and Mandarin within the People's Republic of China, as both are written in the current Vernacular Chinese, which uses Simplified Chinese characters as well as grammar and vocabulary centred on Standard Mandarin with a few allowances for "regional variation".

History

The modern Wu language can be traced back to the ancient Wu and Yue peoples centred around what is now southern Jiangsu and northern Zhejiang. The Japanese pronunciation of Chinese characters is from the same region of China where Wu is spoken today.

Like most other branches of Chinese, Wu descends from Middle Chinese. Although Wu represents the earliest split from the rest of these branches, and thus keeps many ancient characteristics, it was influenced by northern Chinese throughout its development. This was due to its geographical closeness to north China and also to the high rate of education in this region. During the time between Ming Dynasty and early Republican era, the main characteristics of modern Wu were formed. The Suzhou dialect became the most influential, and many dialectologists use it in citing examples of Wu.

After the Taiping Revolution at the end of the Qing dynasty, in which most of the other Wu-speaking regions were largely destroyed, Shanghai became an important city with immigration from other Wu-speaking regions. This greatly affected the language of Shanghai, making it a language island compared to the surrounding area. In the first half of the 20th century, before Mandarin was strongly promoted in the Wu area, ''Shanghainese'' played the role of a regional ''lingua franca'' and gradually replaced the influence of the Suzhou dialect.

After the founding of the People's Republic of China, the strong promotion of Mandarin in the Wu-speaking region influenced the development of the language. Wu was gradually excluded from most modern media and schools. Public organisations are required to use Mandarin. With the influx of a migrant non-Wu-speaking population and the near total ''mandarinisation'' of public media and organisations, as well as the radical Mandarin promotion measures, more and more children of Wu descent can no longer speak Wu, even within their families. Instead, Mandarin has become their mother tongue.

Many people have noticed this trend and thus call for the protection of this language. More and more TV programs in Wu appear although they are mostly comedies rather than formal programs.

Roughly speaking, modern Wu is a leftover of the Chinese dialects – starting from 1500 BC with Wu's position relative to other dialects.

Dialects

Many Wu dialects are diverse and not mutually intelligible with each other. However, speakers of all Wu dialects, including Oujiang, can understand the Taihu dialect, while Taihu speakers find the other dialects unintelligible or intelligible only to a small extent.

According to Yan, Wu is divided into six dialect areas:

*Taihu: Spoken over much of the southern part of Jiangsu province, including Suzhou, Wuxi, Changzhou, the southern part of Nantong, Jingjiang and Danyang; the municipality of Shanghai; and the northern part of Zhejiang province, including Hangzhou, Shaoxing, Ningbo, Huzhou, and Jiaxing. This group makes up the largest population among all Wu speakers. The subdialects of this region are, to a large degree, mutually intelligible.
**Shanghainese
**Suzhou dialect
**Hangzhou dialect
**Ningbo dialect
**Wuxi dialect
**Changzhou dialect
*Taizhou: Spoken in and around Taizhou, Zhejiang province. Among the southern dialects, Taizhou Wu is the closest to Taihu Wu (also known as North Wu), and its speakers can communicate with speakers of Taihu Wu.
*Oujiang/Dong'ou: Spoken in and around Wenzhou, Zhejiang province. This dialect is the most distinctive among the Wu dialects and is mutually unintelligible with the rest. Some dialectologists even treat it as separate from the rest of Wu.
**Wenzhou dialect
*Wuzhou: Spoken in and around Jinhua, Zhejiang province. Like the Taizhou Wu dialect, it is mutually intelligible with the Taihu Wu dialect at least to some degree.
*Chuqu: Spoken in and around Lishui and Quzhou in Zhejiang as well as in Shangrao County and Yushan County in Jiangxi province.
*Xuanzhou: Spoken in and around Xuancheng, Anhui province. This part of Wu has become less widely spoken since the Taiping Revolution and is slowly being replaced by the Mandarin dialect of immigrants from north of the Yangtze River.

Phonology

According to Yan, the Wu dialects are notable among Chinese languages in having kept the "muddy" (voiced) plosives and fricatives of Middle Chinese, thus maintaining the three-way contrast of Middle Chinese stops and affricates. In tone, the Wu dialects may have as few as two word tones or as many as eight or more syllable tones.

See Suzhou dialect, Hangzhou dialect, and Shanghai dialect for examples of Wu phonology.

Grammar

The Wu pronoun system is complex when it comes to personal and demonstrative pronouns. For example, the first person plural pronoun differs depending on whether it is inclusive or exclusive. Wu employs six demonstratives, three of which are used to refer to close objects, and three of which are used for more distant objects.

In terms of word order, Wu uses SVO, but unlike Mandarin, it can also use SOV.

In terms of phonology, tone sandhi is extremely complex, and helps parse multisyllabic words and idiomatic phrases. In some cases, indirect objects are distinguished from direct objects by a voiced/voiceless distinction.

Vocabulary


Written Chinese

Written Chinese comprises the written symbols used to represent spoken Chinese and the rules about how they are arranged and punctuated. These symbols are commonly known as Chinese characters. Chinese characters do not constitute an alphabet or a compact syllabary. Rather, the writing system is roughly logosyllabic; that is, each character generally represents either a complete one-syllable word or a single-syllable part of a word. The characters themselves are glyphs whose components may depict objects or represent abstract notions. Occasionally, a character consists of only one component; more commonly, two or more components are combined, using a variety of different principles, to form more complex characters. The best known exposition of Chinese character composition is the Shuowen Jiezi, compiled by Xu Shen around 120 CE. Since Xu Shen did not have access to Chinese characters in their earliest forms, his analysis cannot always be taken as authoritative. Nonetheless, no later work has supplanted the Shuowen Jiezi in terms of breadth, and it is still relevant to etymological research today.

According to the Shuowen Jiezi, Chinese characters are developed on six basic principles. The first two principles produce simple characters, known as 文 wén. Some phonetic complexes were originally simple pictographs that were later augmented by the addition of a semantic root. An example is 炷 zhù "candle", which was originally a pictograph 主, a character that is now pronounced zhǔ and means "host". The character 火 huǒ "fire" was added to indicate that the meaning is fire-related.

The last two principles do not produce new written forms; instead, they transfer new meanings to existing forms.

Layout

Chinese characters conform to a roughly square frame and are not usually linked to one another, so they can be written in any direction in a square grid. Traditionally, Chinese is written in vertical columns from top to bottom; the first column is on the right side of the page, and the text runs toward the left. Text written in Classical Chinese also uses little or no punctuation. In such cases, sentence and phrase breaks are determined by context and rhythm.
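
The traditional layout described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `vertical_layout` and the fixed number of rows per column are invented here, and real typesetting involves far more than this.

```python
def vertical_layout(text, rows):
    """Arrange text in top-to-bottom columns, with the first column on the right."""
    pad = "\u3000"  # ideographic (full-width) space keeps the grid square
    # Split the text into columns of at most `rows` characters each.
    columns = [text[i:i + rows] for i in range(0, len(text), rows)]
    lines = []
    for r in range(rows):
        # Walk the columns right to left so the first column ends up on the right.
        line = "".join(col[r] if r < len(col) else pad for col in reversed(columns))
        lines.append(line)
    return lines


for line in vertical_layout("甲骨文金文篆書", 3):
    print(line)
```

Reading the printed grid down the rightmost column and then moving left recovers the original string.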

In modern times, the familiar Western layout of horizontal rows from left to right, read from the top of the page to the bottom, has become more popular, especially in the People's Republic of China; the government there mandated left-to-right writing in 1955. The Republic of China followed suit in 2004. Punctuation has also become more prevalent, whether the text is written in columns or rows. The punctuation marks are clearly influenced by their Western counterparts, although some marks are particular to Chinese: for example, the double and single quotation marks; the hollow period, which is otherwise used just like an ordinary full stop; and a special kind of comma called an ''enumeration comma'', which is used to separate items in a list, as opposed to clauses in a sentence.

Signs are often a particularly challenging aspect of written Chinese layout, since they can be written either left to right or right to left, as well as from top to bottom. It is not unusual to encounter all three orientations on signs on neighboring stores.

Evolution

In 2003, tentative evidence for an early form of Chinese writing was found at Jiahu, an archaeological site in the Henan province of China. Some symbols were found that bear striking resemblance to certain modern characters, such as 目 mù "eye". Since the Jiahu site dates from about 6600 BCE, it predates the earliest confirmed Chinese writing, from the latter half of the second millennium BCE, by more than 5,000 years. The nature of this finding (whether it represents true writing or simply proto-writing) is still disputed. Critics contend that if the Jiahu finding really represented a direct ancestor of modern Chinese writing, it would indicate that Chinese writing remained relatively static for some five millennia, at a time when China was sparsely populated.

The first ''indisputable'' examples of Chinese writing, dating back to the Shāng Dynasty in the latter half of the second millennium BCE, are the oracle bones, originally used for divination. Characters were inscribed on the bones in order to frame a query; the bones were then heated over a fire, and the resulting cracks were interpreted to determine the answer to the query. Such characters are called 甲骨文 jiǎgǔwén "shell-bone script" or oracle bone script.

After the Shāng Dynasty, Chinese writing evolved into the form found on bronzeware made during the Western Zhou Dynasty and the Spring and Autumn Period, a kind of writing called 金文 jīnwén "metal script". Jinwen characters are more regular and angular than the embellished oracle bone script. Later, in the Warring States Period, the script became still more regular, and settled on a form, called 六國文字/六国文字 liùguó wénzì "script of the six states", that Xu Shen used as source material in the Shuowen Jiezi. These characters were later embellished and stylized to yield the 篆書/篆书 zhuànshū seal script, which represents the oldest form of Chinese characters surviving to modern use. They are used principally for signature seals, which are often used in place of a signature on Chinese documents and artwork. During the Qin dynasty, Qin Shi Huang promulgated the seal script as the standard throughout the newly unified empire.

Seal script, in turn, evolved into the other surviving writing styles. Clerical script developed first, after the seal script. In general, clerical script characters are "flat" in appearance, being wider than the seal script, which tends to be taller than it is wide. Compared with the seal script, clerical script characters are strikingly rectilinear. In running script, a semi-cursive form, the character parts begin to run into each other, although the characters themselves generally remain separate. There are some conventions in which characters deviate from their canonical forms in a consistent manner. Running script eventually evolved into grass script, a fully cursive form, in which the characters are often entirely unrecognizable by their canonical forms. Grass script gives the impression of anarchy in its appearance, and there is indeed considerable freedom on the part of the calligrapher, but this freedom is circumscribed by conventional "abbreviations" in the forms of the characters. Regular script, a non-cursive form, is the most widely recognized script. In regular script, each stroke of each character is clearly drawn out from the others. Even though both the running and grass scripts appear to be derived as semi-cursive and cursive variants of regular script, it is in fact the regular script that was the last to develop.

Regular script is considered the archetype for Chinese writing, and forms the basis for most printed forms. In addition, regular script imposes a stroke order, which must be followed in order for the characters to be written correctly. Thus, for instance, the character 木 mù "wood" must be written starting with the horizontal stroke, drawn from left to right; next, the vertical stroke, from top to bottom, with a small hook toward the upper left at the end; next, the left diagonal stroke, from top to bottom; and lastly the right diagonal stroke, from top to bottom.

Simplified and traditional Chinese

In the 20th century, written Chinese divided into two canonical forms, called simplified Chinese (簡體字/简体字 jiǎntǐzì) and traditional Chinese (繁體字/繁体字 fántǐzì). Simplified Chinese was developed in mainland China in order to make the characters faster to write and easier to memorize. The People's Republic of China has claimed that both goals have been achieved, but some external observers disagree. Little systematic study has been conducted on how simplified Chinese has affected the way Chinese people become literate; the only studies conducted before it was standardized in mainland China seem to have been statistical ones regarding how many strokes were saved on average in samples of running text.

The simplified forms have also been criticized for being inconsistent. For instance, traditional 讓 ràng "allow" is simplified to 让, in which the phonetic on the right side is reduced from 17 strokes to just three. However, the same phonetic is used in its full form, even in simplified Chinese, in such characters as 壤 rǎng "soil" and 齉 nàng "snuffle"; these forms remained uncontracted because they were relatively uncommon and would therefore represent a negligible stroke reduction. On the other hand, some simplified forms are simply calligraphic abbreviations of long standing, as for example 万 wàn "ten thousand", for which the traditional Chinese form is 萬.
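
As a small illustration of the character-by-character relationship between the two forms, the sketch below converts traditional text to simplified text using only pairs quoted in this article. The table `TRAD_TO_SIMP` and the function name are assumptions made for the example; a real converter needs a complete mapping and has to deal with cases where the correspondence is not one-to-one.

```python
# Traditional -> simplified pairs taken from characters quoted in this article.
# Illustrative only: a complete table would be far larger.
TRAD_TO_SIMP = str.maketrans({
    "簡": "简", "體": "体", "書": "书", "國": "国",
    "話": "话", "韻": "韵", "讓": "让", "萬": "万",
})


def to_simplified(text):
    """Replace each traditional character that has an entry in the table."""
    return text.translate(TRAD_TO_SIMP)


print(to_simplified("簡體字"))  # characters without an entry, such as 字, pass through
print(to_simplified("讓"))
print(to_simplified("壤"))      # left in its full form, as described above
```

Characters such as 壤, which keep the full phonetic even in simplified Chinese, simply have no entry in the table and are returned unchanged.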

Simplified Chinese is standard in the People's Republic of China, Singapore, and Malaysia. Traditional Chinese is retained in Hong Kong, Taiwan, Macau and overseas Chinese communities. Throughout this article, Chinese text is given in both simplified and traditional forms when they differ, with the traditional forms being given first.

Function

At the inception of written Chinese, spoken Chinese was monosyllabic; that is, Chinese words expressing independent concepts were usually one syllable. Each written character corresponded to one monosyllabic word. The spoken language has since become polysyllabic, but because modern polysyllabic words are usually composed of older monosyllabic words, Chinese characters have always been used to represent individual Chinese syllables.

For over two thousand years, the prevailing written standard was a vocabulary and syntax rooted in Chinese as spoken around the time of Confucius, called Classical Chinese, or 文言文 wényánwén. Over the centuries, Classical Chinese gradually acquired some of its grammar and character senses from the various dialects. This accretion was generally slow and minor, however; by the 20th century, Classical Chinese was distinctly different from any contemporary dialect, and had to be learned separately. Once learned, however, it was a common medium for communication between people speaking different dialects, many of which were mutually unintelligible by the end of the first millennium CE. A Mandarin speaker might say yī, a Cantonese speaker yat, and a Hokkien speaker tsit, but all three will understand the character 一 "one".

Chinese dialects vary not only by pronunciation, but also, to a lesser extent, by vocabulary and grammar. The modern written standard, which is tied most closely to Mandarin, is called Vernacular Chinese, or 白話/白话 báihuà. Despite its ties to the dominant Mandarin dialect, Vernacular Chinese also permits some communication between people of different dialects, limited by the fact that Vernacular Chinese expressions are often ungrammatical or unidiomatic in non-Mandarin dialects. This role may not differ substantially from the role of other lingua francas, such as Latin: for those trained in written Chinese, it serves as a common medium; for those untrained in it, the graphic nature of the characters is in general no aid to common understanding. In this regard, Chinese characters may be considered a large and inefficient phonetic script. However, Ghil'ad Zuckermann’s exploration of phono-semantic matching in Standard Mandarin concludes that the Chinese writing system is multifunctional, conveying both semantic and phonetic content.

The variation in vocabulary among dialects has also led to the informal use of "dialectal characters", as well as standard characters that are nevertheless considered archaic by today's standards. Cantonese is unique among non-Mandarin regional languages in having a written colloquial standard, used in Hong Kong and overseas, with a large number of unofficial characters for words particular to this dialect. Written colloquial Cantonese has become quite popular in online chat rooms and instant messaging, although for formal written communications Cantonese speakers still normally use Vernacular Chinese.

Other languages

Chinese characters were first introduced into Japanese sometime in the first half of the first millennium CE, probably from Chinese products imported into Japan through Korea. At the time, Japanese had no native writing system, and Chinese characters were used for the most part to represent Japanese words with the corresponding meanings, rather than similar pronunciations. A notable exception to this rule was the system of man'yōgana, which used a small set of Chinese characters to help indicate pronunciation. The man'yōgana later developed into the phonetic syllabaries, hiragana and katakana.

Chinese characters are called hànzì in Chinese, after the Han Dynasty of China; in Japanese, this was pronounced kanji. In modern written Japanese, kanji are used for nouns, verb stems, and adjective stems, while hiragana are used for prefixes and suffixes; katakana are used exclusively for sound symbols, and for loans from other languages. The jōyō kanji, a list of kanji for common use standardized by the Japanese government, contains 1,945 characters, about half the number of characters commanded by literate Chinese.

The role of Chinese characters in Korean and Vietnamese is much more limited. At one time, many Chinese characters were introduced into Korean for their meaning, just as in Japanese.

Literacy

Because the majority of modern Chinese words contain more than one character, there are at least two measuring sticks for Chinese literacy: the number of characters known, and the number of words known. John DeFrancis, in the introduction to his ''Advanced Chinese Reader'', estimates that a typical Chinese college graduate recognizes 4,000 to 5,000 characters, and 40,000 to 60,000 words. Jerry Norman, in ''Chinese'', places the number of characters somewhat lower, at 3,000 to 4,000.

These counts are complicated by the tangled development of Chinese characters. In many cases, a single character came to be written in multiple ways, as with English "color/colour". This development was stemmed to an extent by the standardization of the seal script during the Qin dynasty, but soon started again. Although the Shuowen Jiezi lists 10,516 characters (9,353 of them unique plus 1,163 graphic variants), the 集韻/集韵 Jíyùn of the Northern Song Dynasty, compiled less than 1,000 years later in 1039, contains no fewer than 53,525 characters, most of them graphic variants.

Dictionaries

Chinese is not based on an alphabet or syllabary, so Chinese dictionaries cannot be alphabetized or otherwise lexically ordered, as English dictionaries are. The need to arrange Chinese characters in order to permit efficient lookup has given rise to a considerable variety of ways to organize and index the characters.

A traditional mechanism is the method of radicals, which uses a set of character roots. These roots, or radicals, generally but imperfectly align with the parts used to compose characters by means of logical aggregation and phonetic complex. A canonical set of 214 radicals was developed during the rule of the Kangxi Emperor; these are sometimes called the Kangxi radicals. The radicals are ordered first by stroke count; within a given stroke count, the radicals also have a prescribed order.

Every Chinese character falls under the heading of exactly one of these 214 radicals.

Because the method of radicals is applied only to the written character, one need not know how to pronounce a character before looking it up; the entry, once located, usually gives the pronunciation. However, it is not always easy to identify which of the various roots of a character is the proper radical. Accordingly, dictionaries often include a list of hard-to-locate characters, indexed by total stroke count, near the beginning of the dictionary. Some dictionaries include almost one-seventh of all characters in this list. This index points to the page in the main dictionary where the desired character can be found. Other methods use only the structure of the characters, such as the four-corner method, in which characters are indexed according to the kinds of strokes located nearest the four corners, or the Cangjie method, in which characters are broken down into a set of 24 basic components. Neither the four-corner method nor the Cangjie method requires the user to identify the proper radical, although many strokes or components have alternate forms, which must be memorized in order to use these methods effectively.
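
The ordering used by radical-indexed dictionaries can be sketched as a two-part sort key: the number of the radical, then the number of strokes beyond the radical. The table `ENTRIES` below is a hand-written, hypothetical mini-index covering a few characters, not data from any particular dictionary, and the function name is invented for the example.

```python
# Hypothetical mini-index: character -> (Kangxi radical number, strokes beyond the radical).
ENTRIES = {
    "一": (1, 0),    # radical 1 (一) itself
    "木": (75, 0),   # radical 75 (木) itself
    "林": (75, 4),   # radical 75 (木) plus 4 strokes
    "森": (75, 8),   # radical 75 (木) plus 8 strokes
    "火": (86, 0),   # radical 86 (火) itself
    "炷": (86, 5),   # radical 86 (火) plus 5 strokes
}


def dictionary_order(chars):
    """Sort by radical number first, then by strokes beyond the radical."""
    return sorted(chars, key=lambda c: ENTRIES[c])


print(dictionary_order(["炷", "森", "火", "一", "林", "木"]))
```

Because the radicals themselves are numbered in order of stroke count, sorting on the radical number already reflects the "stroke count first" ordering of the radical chart.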

Transliteration and romanization

Chinese characters do not unambiguously indicate their pronunciation, even for any single dialect. It is therefore useful to be able to transliterate a dialect of Chinese into the Latin alphabet, for those who cannot read Chinese characters. However, transliteration was not always considered merely a way to record the sounds of any particular dialect of Chinese; it was once also considered a potential replacement for the Chinese characters. This was first prominently proposed during the May Fourth Movement, and it gained further support with the victory of the Communists in 1949. Immediately afterward, the mainland government began two parallel programs relating to written Chinese. One was the development of an alphabetic script for Mandarin, which was spoken by about two-thirds of the Chinese population; the other was the simplification of the traditional characters—a process that would eventually lead to simplified Chinese. The latter was not viewed as an impediment to the former; rather, it would ease the transition toward the exclusive use of an alphabetic script.

By 1958, however, priority was given officially to simplified Chinese; a phonetic script, hanyu pinyin, had been developed, but its deployment to the exclusion of simplified characters was pushed off to some distant future date. The association between pinyin and Mandarin, as opposed to other dialects, may have contributed to this deferment. It seems unlikely that pinyin will supplant Chinese characters anytime soon as the ''sole'' means of representing Chinese.

Pinyin uses the Latin alphabet, along with a few diacritical marks, to represent the sounds of Mandarin in standard pronunciation. For the most part, pinyin uses vowel and consonant letters as they are used in Romance languages. However, although 'b' and 'p', for instance, represent the voiced/unvoiced distinction in some languages, they represent the unaspirated/aspirated distinction in Mandarin; Mandarin has few voiced consonants. All transliterations in this article use the pinyin system.
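
Since the tone diacritics are ordinary Unicode combining marks, a tone-marked syllable can be converted to the common tone-number notation with the standard `unicodedata` module. The sketch below is a minimal illustration that handles one syllable at a time; the function name `to_numbered` is an assumption for the example, and splitting running text into syllables is not attempted.

```python
import unicodedata

# Combining marks used for the four Mandarin tones in pinyin.
TONE_MARKS = {
    "\u0304": "1",  # macron, as in yī
    "\u0301": "2",  # acute accent, as in wén
    "\u030c": "3",  # caron, as in huǒ
    "\u0300": "4",  # grave accent, as in mù
}


def to_numbered(syllable):
    """Convert one tone-marked pinyin syllable to tone-number notation."""
    tone = ""
    kept = []
    # NFD splits a letter such as ǒ into "o" plus a combining caron.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)
    # NFC recombines whatever is left, e.g. u + diaeresis back into ü.
    return unicodedata.normalize("NFC", "".join(kept)) + tone


for s in ["yī", "wén", "huǒ", "mù"]:
    print(s, "->", to_numbered(s))
```

A syllable with no tone mark is returned unchanged, which matches the usual convention of leaving the neutral tone unnumbered.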

Written Cantonese

Written Cantonese refers to the written language used to write colloquial standard Cantonese using Chinese characters.

Cantonese is usually referred to as a spoken variant, and not as a written variant. Spoken vernacular Cantonese is different from Written Standard Chinese, which is essentially formal Standard Mandarin in written form. Written Chinese spoken word for word in Cantonese sounds overly formal and distant. As a result, an informal locally written script which matched the spoken features of language developed over time. This resulted in the formation of additional Chinese characters to complement the existing characters. Many of these represent phonological sounds not present in Mandarin. A good source for well documented written Cantonese words can be found in the scripts for Cantonese opera.

With the advent of the computer and standardization of character sets specifically for Cantonese, many printed materials in predominantly Cantonese spoken areas of the world are written to cater to their population with these written Cantonese characters. As a result, mainstream media such as newspapers and magazines have become progressively less conservative and more colloquial in their dissemination of ideas. Generally speaking, some of the older generation of Cantonese speakers regard this trend as a step backwards and away from tradition. This tension between the ''old'' and ''new'' is a reflection of a transition that is taking place in the Cantonese-speaking population.

History

Before the 20th century, the standard written language of China was Classical Chinese, which has grammar and vocabulary based on the Chinese used in ancient China, Old Chinese. However, while this written standard remained essentially static for over two thousand years, the actual spoken language diverged further and further away. Some writings based on local vernacular speech did exist but these were rare. In the early 20th century, Chinese reformers like Hu Shi saw the need for language reform and championed the development of a vernacular that allowed modern Chinese to write the language the same way they speak. The vernacular language movement took hold, and the written language was standardized as Vernacular Chinese. For unity's sake, and despite the variation in colloquial speech throughout China, the Mandarin dialects were chosen as the basis for the new standard because of their number of speakers.

The standardization and adoption of Vernacular Chinese as standard written Chinese pre-empted the development and standardization of other vernaculars based on other Chinese varieties. No matter what dialect one spoke, one still wrote in standard written Chinese for everyday writing. However, Cantonese is unique among the non-Mandarin spoken varieties in having a widely used colloquial written form. Because of Cantonese-speaking Hong Kong’s isolation from the rest of China while under British rule, Cantonese is also unique in having a large number of speakers who do not speak Mandarin, a fact which has prompted the creation of a standard for Written Cantonese to facilitate written communication between Cantonese speakers without the need for translation. Even so, this kind of writing is considered by some people to be informal, non-standard and unprofessional. Cantonese speakers have to use standard written Chinese in most formal written communications, since written Cantonese contains many unique characters and grammatical structures that may be unfamiliar or even unintelligible to speakers of other Chinese varieties.

Historically, written Cantonese has been used in Hong Kong for legal proceedings in order to write down the exact spoken testimony of a witness, instead of paraphrasing spoken Cantonese into standard written Chinese. However, its popularity and usage have been rising in the last two decades, the late Wong Jim being one of the pioneers of its use as an effective written language. Written colloquial Cantonese has become quite popular in certain tabloids, online chat rooms, and instant messaging. Some tabloids like ''Apple Daily'' write colloquial Cantonese; papers may contain editorials that contain Cantonese; and Cantonese-specific characters can be increasingly seen on advertisements and billboards. Written Cantonese remains limited outside of Hong Kong, even in other Cantonese-speaking areas such as Guangdong, where the use of colloquial writing is discouraged. Despite the relative popularity of written Cantonese in Hong Kong, some disdain it, believing that becoming too accustomed to writing in such a way would affect a person's ability to use standard written Chinese in situations that demand it.

Cantonese characters

Written Cantonese contains many characters not used in standard written Chinese in order to transcribe words not present in the standard lexicon. Despite attempts by the government of Hong Kong in the 1990s to standardize this character set, culminating in the release of the Hong Kong Supplementary Character Set (HKSCS) for use in electronic communication, there is still significant disagreement about which characters are ‘correct’ in written Cantonese.

Synonyms

Some characters used to represent words in Cantonese are simply synonyms of words used in standard written Chinese. The most common are the characters for the verb "to be" and for "not", each of which is simply replaced by a Cantonese counterpart. The third-person pronoun, the plural pronoun marker, and the possessive particle are likewise replaced by Cantonese counterparts.

Cognates

There are certain words that share a common root with words in standard written Chinese. However, because they have diverged in pronunciation, tone, and/or meaning, they are often written using a different character. One example is the pair lai2 and lei4, both meaning "to come." Both share the same meaning and usage, but because the colloquial pronunciation differs from the literary pronunciation, they are represented using two different characters. Some people argue that representing the colloquial pronunciation with a different character is superfluous, and encourage using the same character for both forms since they are cognates.

Native words

Some words are native to Cantonese and have no equivalents in Standard Chinese. In other cases, Cantonese words, with their corresponding characters, did exist in Standard Chinese in ancient times but have since become obsolete there; only the words, not the characters, survive in Cantonese.

Today those characters can mainly be found in ancient rime dictionaries such as ''Guangyun''. Some scholars have made "archaeological" efforts to find out what the "original characters" are. Often, however, these efforts are of little use to the modern Cantonese writer, since the characters so discovered are not included in the character sets available to computer users.

On the other hand, some of the characters have only been corrupted in Cantonese, and have not disappeared from Standard Chinese. For instance, it has been suggested that the common word leng3, as usually written in Cantonese, should rather be written with a different character.

Loanwords

These are characters created to represent loanwords borrowed into Cantonese.

Examples:

* Elevator
: lip1, written with a character composed of a radical and a phonetic component which by itself means "to stand."

Particles

Cantonese makes use of particles in speech. Some are added to the end of a sentence while others are suffixed to verbs to indicate aspect. There are many such particles; here are a few.

* "mē": placed at the end of a sentence to indicate disbelief
* "nē": placed at the end of a sentence to indicate a question
* "meih": placed at the end of a sentence to ask whether an action has been done yet
* "háh": placed after a verb to indicate "a little bit", e.g. "eat a little bit"; "há": used on its own to show uncertainty or disbelief
* "gán": placed after a verb to indicate the progressive, e.g. "I am eating"
* "jó": placed after a verb to indicate a completed action, e.g. "I finished eating"
* "màaih": placed after a verb to indicate a future tense, e.g. "I will finish eating"
* "wà": wow!

Cantonese words

In Chinese, a distinction is made between single-syllable characters, each of which may represent a word or a morpheme, and multi-syllabic words. A syllable is generally written with a single character, while a word may be composed of two or more characters, which are not necessarily related in meaning. Thus, some Cantonese words use existing characters to form words which do not exist, or which have a different meaning, in standard Chinese.

Loanwords

Some Cantonese loanwords are not written with new characters and simply use the pronunciations of existing Chinese characters; because many loanwords originated in Hong Kong or among overseas Chinese, they often use different characters and pronunciations from the Mandarin Chinese equivalents.

Cantonese character formation

Cantonese characters, as with regular Chinese characters, are formed in one of several ways:

Borrowings

Some characters already exist in standard Chinese, but are simply reborrowed into Cantonese with new meanings. Most of these tend to be archaic or rarely used characters. An example is the character 子, which means "child". The Cantonese word for child is represented by 仔, which has the original meaning of "young animal".

Marked phonetic loans

Many characters used in colloquial Cantonese writing are formed by putting a mouth radical on the left-hand side of another, better-known character, usually a standard Chinese character. This indicates that the new character sounds like the standard character but is used only phonetically in the Cantonese context. Characters of this kind that are commonly used in Cantonese writing include, among others, those read as:
* lek7
* haa4
* ngaak7
* gam2
* gam5
* zo2
* me1
* sai3
* dei6
* ni1
* m4
* ngaam1
* di1
* yuk7
* dou6
* hai2 (at, in, during)
* go2
* ge3
* maak1
* ye5
* saai1
* lei4

There is evidence that the mouth radical in such characters can, over time, be replaced by a signific, a component which indicates the meaning of the character. The new character then combines a meaning component with a phonetic one. Older dictionaries record forms written with the mouth radical for some characters that are now written with a signific.
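
The mouth-radical pattern described above can be made concrete with a small lookup table. The decompositions below are written by hand for four characters that appear in the example sentence in the Colloquial usage section, plus one ordinary character for contrast; the table `DECOMPOSITION` and the function `phonetic_base` are assumptions made for this sketch and do not come from any standard library or database.

```python
# Hand-written decompositions: character -> (left part, right part). Illustrative only.
DECOMPOSITION = {
    "喺": ("口", "係"),
    "嗰": ("口", "個"),
    "啲": ("口", "的"),
    "嘢": ("口", "野"),
    "林": ("木", "木"),  # not a mouth-radical character, included for contrast
}


def phonetic_base(char):
    """Return the right-hand character if `char` is mouth radical + character, else None."""
    left, right = DECOMPOSITION.get(char, (None, None))
    return right if left == "口" else None


for c in "喺嗰啲嘢林":
    print(c, "->", phonetic_base(c))
```

Characters missing from the table are treated the same as characters that do not follow the pattern, so the function returns `None` for both.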

The development of new Cantonese characters is interesting linguistically, because they have never been subject to government standardization, in contrast to Standard Chinese, which has been regulated for over 2000 years. Therefore, a better understanding can be gained of the linguistics of how Chinese writing evolves, and how the script is modelled perceptually by the Chinese reader.

Derived characters

Other common characters are unique to Cantonese or have deviated from their Mandarin usage.

The words represented by these characters are sometimes cognates with pre-existing Chinese words. However, their colloquial Cantonese pronunciations have diverged from formal Cantonese pronunciations. For example, formal written Chinese and spoken Cantonese use different characters for "without". The two words have the same usage, meaning, and pronunciation, differing only by tone: one character represents the spoken Cantonese form, while the other represents the word used in Mandarin and formal Chinese writing. The formal character is nevertheless still used in some spoken Cantonese expressions. Another example is the pair of forms meaning "to come": one is used in formal writing, and the other is the spoken Cantonese form.

Colloquial usage

Because not all Cantonese characters can be found in current encoding systems, or because users simply do not know how to enter such characters on a computer, very informal written Cantonese tends to use extremely simple romanization, symbols, homophones, and Chinese characters that have different meanings in Mandarin to compose a message.
For example, "你喺嗰喥好喇, 千祈咪搞佢啲嘢。" is often written in easier form as "你o係果度好喇, 千祈咪搞佢D野。"
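
The substitutions in the example above can be read off character by character and expressed as a simple lookup. The table below contains only the five replacements visible in that sentence pair; the names `EASY_FORMS` and `to_easy_form` are invented for this sketch, and actual usage varies from writer to writer.

```python
# Substitutions read off the example pair above: each hard-to-type Cantonese
# character is swapped for a homophone, a Latin letter, or "o" plus the base character.
EASY_FORMS = {
    "喺": "o係",
    "嗰": "果",
    "喥": "度",
    "啲": "D",
    "嘢": "野",
}


def to_easy_form(text):
    """Rewrite a sentence using the easier-to-type substitutes."""
    return "".join(EASY_FORMS.get(ch, ch) for ch in text)


print(to_easy_form("你喺嗰喥好喇, 千祈咪搞佢啲嘢。"))
```

Applied to the first sentence of the example, this reproduces the second, easier form.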