Re: utf16 branch
[I find myself repeating the same claims, so I'll try to stop posting
after this.]
|> In deciding to support unicode (or some other character set) you
|> should be aware that you have no control over what data the user
|> is expected to process, and if the user makes no use of the new
|> facility, there should be no penalty. However there is a penalty
|> in the current approach which may be unacceptable for the original
|> users.
|>
|
|Users have, and will always have, the option to read character data
|as raw 8-bit bytes and decode them into strings or byte arrays any
|way they see fit.
The whole point of using 8 bit mechanisms is to avoid an additional
encoding/decoding and/or copy step.
A lot of the CMUCL code I've seen assumes an 8-bit-wide base-char and
takes advantage of the current CMUCL implementation, which is now
slated to change.
Now converting to string will double the storage.
I will not be able to use mmap-backed strings anymore.
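
To make the storage point concrete, here is a rough sketch. It is
Python, not CMUCL code, and the encodings are stand-ins I chose for
illustration: Latin-1 bytes play the role of 8-bit base-char data, and
UTF-16 plays the role of a 16-bit string representation. The sample
request line is made up.

```python
# Illustrative only: Latin-1 bytes stand in for an 8-bit base-char
# string, UTF-16 code units for a widened 16-bit representation.
data = b"GET /index.html HTTP/1.0\r\n"

# 8-bit case: a view over the existing buffer, no copy made.
narrow = memoryview(data)

# 16-bit case: every character is decoded and re-encoded into a
# freshly allocated buffer; the original bytes cannot be used in place.
wide = data.decode("latin-1").encode("utf-16-le")

narrow_bytes = narrow.nbytes
wide_bytes = len(wide)
print(narrow_bytes, wide_bytes)
```

The narrow view shares the original buffer, which is what an
mmap-backed string relies on; the wide form is a separate copy at twice
the size.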
| Moreover, you can and will continue to be able to create your own
|character stream classes which encapsulate an 8-bit byte stream and
|circumvent the default behavior of the default character stream.
|Users who
I will not be able to subclass fd-stream to provide efficient bivalent
streams like http or https streams.
I will not be able to call run-program and use `dup(2)'ed bivalent
input and output streams anymore, or expect the same speed from it.
All applications where CMUCL sits as a pipe stand to lose.
|care about speed and perform no or only light character processing
|can leave data in 8-bit arrays and pass them to C without any
|overhead if that is the format their C code expects.
Not if my data has to come from a lisp string. I have to do the
conversion in this case.
|Earlier you had stated that one of your reasons for generally
|disliking Unicode was that your character set of choice, ISCII, was
|not fully accommodated by Unicode.
When I referred to unicode as a bureaucracy I was talking more of the
regimes it imposes on the programmer.
But my primary concern is with the eager elimination of support for 8
bit strings, and all the advantages it entails, even though the
advantages may sound alien to many.
Especially when I think there is a reasonable implementation strategy
where unicode support can be added without changing the base-char
implementation. [This would have been the path followed if there had
been an application to start with.]
There has been a UNICODE branch in CMUCL since the early 2000s, using
UTF-8. I'm surprised that none of the people claiming a unicode CMUCL
requirement checked it out.