Dear Michael,
My name is Svay Leng. I am Cambodian and also a member of KPP.
You mentioned that we are bringing politics into the Khmer script problem.
I would like to say that it is not a political problem; it is really
a cultural problem, and we want to solve it without politics
as far as that is possible. The Khmer language is part of our culture.
Cambodian people know that our language came from Pali or
[Sanskrit], but our remote ancestors changed it step by step to
meet the needs of Cambodian people over more than one thousand years. The
continuity in the transition of a culture is very
important. We can't deny our existing culture and introduce the
Virama model to our Khmer script. Technically speaking, we can
use the existing UNICODE table without COENG, and we don't need
your proof of the representation with this Virama model. Nobody
in Cambodia (except perhaps a very few people) says that KA and
COENG KA are the same. In Latin I can write your name in minuscule,
like michael everson, and I get the same pronunciation, but
it is very impolite; and in this [L]atin script we have "A"
and also "a" in the UNICODE table even though we get
the same pronunciation. But in Khmer, a consonant can have a different
pronunciation from its COENG form; for example, (july=KA+KA+COENG KA+DA+SRAK
AA) and (KA+KA+KA+DA+SRAK AA) do not have the same
pronunciation. If you say they are the same, we can replace one with the
other, but in reality nobody can recognize the result as "july". You
are a very good linguist and have made many contributions in the IT
field to achieve UNICODE for many countries, so I
think you can understand this very well. Frankly speaking, technologies
should not be used to change a culture if that culture is not
against public order, public morals, or other cultures in the
world. We need to talk to find the best solution to make Khmer
people happy, and also you and Bauhahn, by looking only at the
UNICODE Khmer script table, not at the representation, because
it is easy to use technology to achieve it. Respect for and understanding
of each other's cultures are the best way to bring about
peace and coexistence in the world.
Regards,
Svay Leng
**********
Valuable contributions to the discussion of Khmer encoding were suggested in N2380R. In particular I was grateful to see it subscribed to a phonetic encoding: that is a significant step forward from earlier discussions (when glyph-based encoding was tenaciously held). Therefore in the report of N2394 it is written:
"Recommendation to Cambodian delegate:
Keeping the principle in mind, the ad-hoc recommends the Cambodian
delegate to take following actions.
1. Communicate with the author of n2385 to clarify the comments
each other.
2. Provide a new proposal of addition of new characters.
3. Propose draft text of additional note (such as annotation)
for unreasonably coded characters to avoid a misuse of the characters
by the users."
Subsequently the document N2406 has been offered. It also offers some additional valuable insights...but does not respond in the spirit or letter of the N2394 recommendation.
Many issues have been raised, and each should be answered in time, in an orderly way and in a spirit of cooperation.
Obviously the most fundamental issue is the use of 17D2 (COENG). There is a discussion of that on page 6 and following of N2385 (and page 8 of N2406).
(a) It is not insignificant that the use of COENG in one stroke makes unnecessary the addition of about (5 * 16 = ) 80 characters (or spaces reserved for potential future subscript characters) proposed in N2380R. Hence in effect it is one character versus 80 ligatures.
(b) The existing Khmer block has only 25 unused slots. It will take a very big shoehorn to squeeze 80 characters into 25 slots (and this is not even taking into consideration ten minority script characters, ten divination lore numbers and other miscellaneous numbers that might be added in addition).
(c) Given the limited number of characters easily accessible
from a keyboard, an implementation something like COENG would
have to be improvised to accommodate such an unwieldy group of
characters. Under the COENG model of encoding (along with frequently
used non-Khmer characters) there are already about 150 characters
which need to be typed from a Khmer keyboard. Eliminating COENG
would result in the addition of more than 100 characters.
(d) I am presently undertaking an interesting implementation
of a Khmer font which has an optional feature that would facilitate
transliteration of the Khmer script into Latin script. In this
context, the only difference between base characters and subscript
characters lies in figuring out which one is the last in the
cluster (in order to attach a vowel to it [and at this point the
inherent vowels need to become explicit!]).
(e) The linguists committee (upon which much of the existing Khmer Unicode encoding was based) was not composed of implementation experts; however, they were not offended by the COENG model.
More could be added but unfortunately I must close for now.
Sincerely,
Maurice
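
As an illustration of the sequences discussed in the two messages above, here is a minimal Python sketch of how the "july" example is spelled under the COENG model, together with the arithmetic from points (a) and (b). The code point values are those of the Unicode Khmer block; the variable and function names are chosen only for this illustration, and the counts (80 proposed subscripts, 25 unused slots) are simply the figures quoted in the message, not independently verified.

```python
import unicodedata

# Code points from the Unicode Khmer block.
KA = "\u1780"      # KHMER LETTER KA
DA = "\u178A"      # KHMER LETTER DA
AA = "\u17B6"      # KHMER VOWEL SIGN AA (SRAK AA in the letter)
COENG = "\u17D2"   # KHMER SIGN COENG

# "July" as spelled in the first letter: KA + KA + COENG KA + DA + SRAK AA.
july = KA + KA + COENG + KA + DA + AA
# The same letters with the subscript replaced by a plain base KA.
no_subscript = KA + KA + KA + DA + AA


def describe(text):
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


# Figures quoted in points (a) and (b) of the message above.
subscripts_proposed = 5 * 16   # about 80 separately encoded subscripts
unused_slots = 25              # free positions cited for the Khmer block
shortfall = subscripts_proposed - unused_slots

if __name__ == "__main__":
    print(describe(july))
    print("same encoded text:", july == no_subscript)
    print("characters that would not fit:", shortfall)
```

Under this model the two spellings are different encoded strings, because the COENG code point is present in one and absent in the other; whether that difference matches how readers perceive the script is the question under debate in this thread.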
********
Dear Svay Leng:
>Cambodian people know that our language was come from Bali or
>[Sanskrit] but our remote ancestor had change step by step to meet
>Cambodian people needs for more than one thousand year. The continuity
>in the transition of a culture is very important. We can't deny our
>existing culture and introduce the Virama model to our Khmer script.
>Technically speaking, we can use the existing UNICODE table without
>Coeng and we don't need your proof with the representation with this
>Virama model. Nobody in Cambodia (may be except a very few people )
>say that KA and COENG KA are the same.
I very much appreciate your concern for cultural appropriateness of technologies. I certainly know there are cases in which one might be inclined to impose a virama model to scripts of the Brahmic family simply because they are from that family, but for which that may not make sense in relation to the way that that particular script actually works or the way that it is perceived within the primary culture in which it is used.
I do not personally have an opinion for or against one or the
other approach to implementation at this point as far as COENG
is concerned. (I have some opinions with respect to the representation
of vowels, but that is a different matter.) Before we go very
far in judging cultural validity, though, I wonder if it might
be helpful to step back and consider a larger perspective. What
I have in mind is that we perhaps need to distinguish between
two things:
1) the way users will perceive an implementation, which is based on their cultural models and their experience in using the implementation; and
2) the technical details regarding how an implementation actually works and produces the user experience that it does.
In this regard, you have said that Khmer users would not perceive a common identity between KA and COENG KA. Thus, I gather, you are suggesting that they should have distinct and comparable encodings, and that the current implementation in Unicode violates this.
Without suggesting what users should or shouldn't perceive
or hold as culturally valid, I'd like to ask the question as to
whether it is possible that implementations might be able to hide
the technical details of how the implementation is being accomplished?
For instance, I can easily envision overall implementations based
on the current definitions in Unicode in which users are not at
all aware that KA and COENG KA do not have distinct and comparable
encodings.
There seems to me to be a slight [analogy] with Latin case
pairs. There is a measure to which English speakers do view "a"
and "A" as being the same. Our history with type has
reinforced a distinction, but from typewriters through current
computer implementations both are typed using the same key on
the keyboard. Now, in terms of the encoding implementation, it
just happens that these are encoded as distinct characters of
comparable status. Note that it would have been possible to develop
Unicode and related implementations on another basis, one in which
"a" was represented as a variant of "A", or
vice versa. For example, imagine that "a" is encoded
as LATIN LETTER A and "A" is encoded as a sequence <
LATIN LETTER A, UPPER CASE MODIFIER >. Technically, this would
have been entirely possible. What is crucial to note, however,
is that users would not necessarily have to be aware of any difference
whatsoever. For instance, it would be possible to place two systems
side by side, one implemented one way (two comparable characters)
and another implemented another way (a basic character and a casing
modifier character), and have these two systems implemented in such
a way that users could not distinguish them based on the user experience.
So, what I am asking is this:
While it may be true that the encoding implementation of Khmer
script does not closely follow the cultural perceptions that Khmer
people have of the script, might it be possible that this inconsistency
could be masked from users so that they are not aware of it?
This would be somewhat comparable to the implementation of
Latin script not directly reflecting a relationship between upper
and lower case pairs that does exist. It would not be intended
to suggest that the encoded implementation is how the script
should be culturally perceived. It would be merely to facilitate
the quickest path to see successful implementation of Khmer script
in commercial and other software, something which might be of
more immediate benefit to users (particularly keeping in mind
that various font implementations that could easily have hidden
this inconsistency from users were in process of development at
the time when these issues arose).
I realise in asking this question that there may be factors
I am not considering, as I am neither a member of the Khmer community
nor even thoroughly acquainted with the details of the script.
It is for this reason that I do not assert an answer one way or
another but rather present this to you as a question. I raise
this in case it may present a possibility for finding some solution
to this concern.
Kind regards,
- Peter
---------------------------------------------------------------------------
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>
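
The thought experiment in the message above, two systems that store Latin case differently yet look the same to the user, can be sketched as follows. This is only an illustration under stated assumptions: no UPPER CASE MODIFIER character exists in Unicode, so a Private Use Area code point (U+E000) stands in for it, all function names are invented for this sketch, and the input is assumed to be plain ASCII letters.

```python
# Hypothetical stand-in for "UPPER CASE MODIFIER"; U+E000 is a Private Use
# Area code point, not a real casing character.
UPPER_CASE_MODIFIER = "\uE000"


def store_separate(typed):
    """System 1: 'a' and 'A' are stored as distinct, comparable characters."""
    return typed


def store_with_modifier(typed):
    """System 2: an upper-case letter is stored as <letter, modifier>."""
    out = []
    for ch in typed:
        if ch.isupper():
            out.append(ch.lower() + UPPER_CASE_MODIFIER)
        else:
            out.append(ch)
    return "".join(out)


def display_separate(stored):
    """System 1 shows the stored text as it is."""
    return stored


def display_with_modifier(stored):
    """System 2 folds each modifier back into the preceding letter."""
    out = []
    for ch in stored:
        if ch == UPPER_CASE_MODIFIER and out:
            out[-1] = out[-1].upper()
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    typed = "Michael Everson"
    shown_1 = display_separate(store_separate(typed))
    shown_2 = display_with_modifier(store_with_modifier(typed))
    print(shown_1 == shown_2)                   # same user experience
    print(len(store_separate(typed)),           # different stored lengths
          len(store_with_modifier(typed)))
```

The stored forms differ in length and content, while the displayed text is the same in both systems, which is the sense in which an encoding detail could be hidden from users.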
***********
Maurice,
You are well aware that I understand your approach and have even, with your help, implemented Khmer Unicode support. Please see my comments on your thoughts below:
A) Having one character used with each base character instead of "ligatures" doesn't automatically make it the correct manner in which to handle Khmer text.
B) There is room available in the BMP of Unicode for us to be able to do the right thing. We could get an Extended Khmer block added. This is not just about saving the number of characters encoded. It is about handling the Khmer language correctly.
C) This is not about keyboarding. It is just as easy to use your dead key approach to enter a subscript character representation as it is to enter the COENG + character with the same keyboard.
D) I would say that the ability to convert the Khmer to Latin
is fascinating, but is not a factor that needs to be considered
for correctly encoding Khmer into Unicode. It will be possible
to take any correct encoding of a language and deduce rules for
some type of morphological transformation to another language.
The same rules will apply to your transformation if the subscript
letters are encoded.
Can you please explain why the COENG model (based on virama)
is so critical for implementing "correct" Unicode use
of subscripts? If you remove constraints on the number of characters
and any data entry issues, what is the compelling reason to encode
Khmer in this manner? Frankly, I see no difference between the results
of the COENG model and those of representing each of the subscript
forms as an individual character...except that encoding the subscript
forms is more efficient and intuitive to use.
Personally, I believe that resolving the COENG issue will resolve over 80% of the problems that the Cambodian [delegation] has with the current Khmer Unicode implementation. It would be great if we can tackle this issue and bring it to some resolution.
Regards to all,
Paul
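
Point C above, that a dead-key keyboard serves either encoding equally well, can be sketched as below. This is a hedged illustration, not a description of any real keyboard driver: the `DEAD_KEY` token and all function names are invented, and because no separately encoded subscript characters exist, Private Use Area code points are used as hypothetical stand-ins for them.

```python
COENG = "\u17D2"                              # KHMER SIGN COENG
KA, DA, AA = "\u1780", "\u178A", "\u17B6"     # KA, DA, VOWEL SIGN AA
DEAD_KEY = "<subscript>"                      # hypothetical key token


def hypothetical_subscript(consonant):
    """Map a Khmer consonant to a made-up Private Use code point that
    stands in for a separately encoded subscript form."""
    return chr(0xE000 + ord(consonant) - 0x1780)


def type_coeng_model(keys):
    """COENG model: the dead key simply emits COENG before the consonant."""
    return "".join(COENG if key == DEAD_KEY else key for key in keys)


def type_subscript_model(keys):
    """Subscript model: the dead key turns the next key into its
    (hypothetical) subscript character."""
    out = []
    pending = False
    for key in keys:
        if key == DEAD_KEY:
            pending = True
        elif pending:
            out.append(hypothetical_subscript(key))
            pending = False
        else:
            out.append(key)
    return "".join(out)


if __name__ == "__main__":
    # The same six keystrokes for "july" in both models.
    keys = [KA, KA, DEAD_KEY, KA, DA, AA]
    print(len(type_coeng_model(keys)))      # code points stored with COENG
    print(len(type_subscript_model(keys)))  # code points with subscripts
```

The typist presses the same keys in both cases; only the stored text differs, which is why the keyboarding argument does not by itself favour either encoding.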
***********
On 11/12/2001 06:51:08 AM Paul Nelson wrote:
>C) This is not about keyboarding. It is just as easy to use your dead
>key approach to enter a subscript character representation as it is to
>enter the COENG + character with the same keyboard.
>
>D) I would say that the ability to convert the Khmer to Latin is
>fascinating, but is not a factor that needs to be considered for
>correctly encoding Khmer into Unicode. It will be possible to take any
>correct encoding of a language and deduce rules for some type of
>morphological transformation to another language. The same rules will
>apply to your transformation if the subscript letters are encoded.
I definitely agree strongly with both of these points.
- Peter