Helmut Eller | 29 Oct 2011 13:18

string-to-octets

I'm wondering how STRING-TO-OCTETS can be used with a fixed sized byte
buffer.  For example we want to write a long string to a byte stream
using a fixed size byte buffer.  STRING-TO-OCTETS seems to return the
buffer and the number of bytes written.  E.g.

(let ((buffer (make-array 20 :element-type '(unsigned-byte 8)))
      (string (make-string 100  :initial-element #\a)))
  (stream:string-to-octets string :external-format :utf32 :buffer buffer))

Returns a new vector (non-eq to buffer) and 404.

Shouldn't one return value also indicate how many characters were
converted so that the conversion can be continued at that character
offset without allocating a fresh buffer?

Helmut

_______________________________________________
cmucl-imp mailing list
cmucl-imp <at> cmucl.cons.org
http://lists.zs64.net/mailman/listinfo/cmucl-imp

Raymond Toy | 29 Oct 2011 17:31

Re: string-to-octets

On 10/29/11 4:18 AM, Helmut Eller wrote:
> I'm wondering how STRING-TO-OCTETS can be used with a fixed sized byte
> buffer.  For example we want to write a long string to a byte stream
> using a fixed size byte buffer.  STRING-TO-OCTETS seems to return the
> buffer and the number of bytes written.  E.g.
>
> (let ((buffer (make-array 20 :element-type '(unsigned-byte 8)))
>       (string (make-string 100  :initial-element #\a)))
>   (stream:string-to-octets string :external-format :utf32 :buffer buffer))
>
> Returns a new vector (non-eq to buffer) and 404.
>
> Shouldn't one return value also indicate how many characters were
> converted so that the conversion can be continued at that character
> offset without allocating a fresh buffer?
>
Good question.  It seems that string-to-octets grew the buffer and the
new buffer has all of the converted characters and the original contains
only the portion that fit.

I'll have to look through the history to see why it is this way, but I
think you're right.  If a buffer is supplied, string-to-octets shouldn't
grow the buffer;  it should stop when the buffer is full.  Although, I
can see why it needs to:  the code doesn't know how many octets are
needed for a character until the character is converted, and by then we
may have exceeded the buffer, so the buffer gets a partially converted
character.
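
One way to avoid the partially converted character is to work out the
encoded length before writing anything, and to stop when it does not
fit.  A minimal sketch in portable Common Lisp, for UTF-8 only.  The
names utf8-octet-count and fits-in-buffer-p are hypothetical helpers,
not part of CMUCL's external-format code, and the sketch assumes each
Lisp character is a complete code point (no surrogate pairs).

```lisp
;; Hypothetical helpers, for illustration only (not CMUCL internals).
(defun utf8-octet-count (code)
  "Number of octets UTF-8 needs to encode the code point CODE."
  (cond ((< code #x80) 1)
        ((< code #x800) 2)
        ((< code #x10000) 3)
        (t 4)))

(defun fits-in-buffer-p (char buffer position)
  "True if the UTF-8 encoding of CHAR fits in BUFFER starting at POSITION."
  (<= (+ position (utf8-octet-count (char-code char)))
      (length buffer)))
```

For example, a 2-octet character does not fit at position 18 of a
19-octet buffer, so an encoder using this check would stop there
instead of leaving a dangling lead octet.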

If you were going the other way (octets-to-string), there's
octets-to-string-counted that tells you how many octets were consumed

Raymond Toy | 30 Oct 2011 04:28

Re: string-to-octets

On 10/29/11 4:18 AM, Helmut Eller wrote:
> I'm wondering how STRING-TO-OCTETS can be used with a fixed sized byte
> buffer.  For example we want to write a long string to a byte stream
> using a fixed size byte buffer.  STRING-TO-OCTETS seems to return the
> buffer and the number of bytes written.  E.g.
>
> (let ((buffer (make-array 20 :element-type '(unsigned-byte 8)))
>       (string (make-string 100  :initial-element #\a)))
>   (stream:string-to-octets string :external-format :utf32 :buffer buffer))
>
>
Is this better:

(let ((buffer (make-array 19 :element-type '(unsigned-byte 8)))
      (string (make-string 100 :initial-element #\u+3b2)))
  (multiple-value-bind (b p i last)
      (stream:string-to-octets string :external-format :utf8 :buffer buffer)
    (values b p i last)))

#(206 178 206 178 206 178 206 178 206 178 206 178 206 178 206 178 206
178 206)
19
9
18

9 is the number of characters converted, 18 is the index+1 in the buffer
where the last valid octet was placed.  The last octet is the first
octet of the 2-octet utf8 encoding for #\u+3b2.

Helmut Eller | 30 Oct 2011 08:04

Re: string-to-octets

* Raymond Toy [2011-10-30 03:28] writes:

> On 10/29/11 4:18 AM, Helmut Eller wrote:
>> I'm wondering how STRING-TO-OCTETS can be used with a fixed sized byte
>> buffer.  For example we want to write a long string to a byte stream
>> using a fixed size byte buffer.  STRING-TO-OCTETS seems to return the
>> buffer and the number of bytes written.  E.g.
>>
>> (let ((buffer (make-array 20 :element-type '(unsigned-byte 8)))
>>       (string (make-string 100  :initial-element #\a)))
>>   (stream:string-to-octets string :external-format :utf32 :buffer buffer))
>>
>>
> Is this better:
>
> (let ((buffer (make-array 19 :element-type '(unsigned-byte 8)))
>            (string (make-string 100  :initial-element #\u+3b2)))
>        (multiple-value-bind (b p i last)
>            (stream:string-to-octets string :external-format :utf8
> :buffer buffer)
>          (values b p i last)))
>
> #(206 178 206 178 206 178 206 178 206 178 206 178 206 178 206 178 206
> 178 206)
> 19
> 9
> 18
>
> 9 is the number of characters converted, 18 is the index+1 in the buffer
> where the last valid octet was placed.  The last octet is the first
> octet of the 2-octet utf8 encoding for #\u+3b2.

Raymond Toy | 30 Oct 2011 17:30

Re: string-to-octets

On 10/30/11 12:04 AM, Helmut Eller wrote:
> * Raymond Toy [2011-10-30 03:28] writes:
>
>> On 10/29/11 4:18 AM, Helmut Eller wrote:
>>> I'm wondering how STRING-TO-OCTETS can be used with a fixed sized byte
>>> buffer.  For example we want to write a long string to a byte stream
>>> using a fixed size byte buffer.  STRING-TO-OCTETS seems to return the
>>> buffer and the number of bytes written.  E.g.
>>>
>>> (let ((buffer (make-array 20 :element-type '(unsigned-byte 8)))
>>>       (string (make-string 100  :initial-element #\a)))
>>>   (stream:string-to-octets string :external-format :utf32 :buffer buffer))
>>>
>>>
>> Is this better:
>>
>> (let ((buffer (make-array 19 :element-type '(unsigned-byte 8)))
>>            (string (make-string 100  :initial-element #\u+3b2)))
>>        (multiple-value-bind (b p i last)
>>            (stream:string-to-octets string :external-format :utf8
>> :buffer buffer)
>>          (values b p i last)))
>>
>> #(206 178 206 178 206 178 206 178 206 178 206 178 206 178 206 178 206
>> 178 206)
>> 19
>> 9
>> 18
>>
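>> 9 is the number of characters converted, 18 is the index+1 in the buffer
>> where the last valid octet was placed.  The last octet is the first
>> octet of the 2-octet utf8 encoding for #\u+3b2.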

Helmut Eller | 30 Oct 2011 19:28

Re: string-to-octets

* Raymond Toy [2011-10-30 16:30] writes:

>> I'm not sure that I understand the purpose of the 18.  Is this something
>> that is needed later?
>
> Yeah, that was kind of messy.  This is what I currently have:
>
> * (let ((buffer (make-array 20 :element-type '(unsigned-byte 8)))
>         (string (make-string 100 :initial-element #\u+f012)))
>     (stream:string-to-octets string :external-format :utf8 :buffer buffer))
>
> #(239 128 146 239 128 146 239 128 146 239 128 146 239 128 146 239 128
>   146 239 128)
> 18
> 6
>
> So 18 is the number of valid octets actually written.  (The last two
> octets form an incomplete conversion.)   The 6 is the number of
> characters consumed to produce those 18 octets.
>
> For the case where no buffer is specified, a new buffer is created and
> the second return value is the number of octets written (same as the
> buffer length), and the third value is the number of characters (length
> of the string).
>
> Is that better?

Yes, looks good.
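
With the number of characters consumed available as a return value, a
long string can be pushed through one fixed buffer in a loop.  A rough
sketch in portable Common Lisp: encode-utf8-into is a hypothetical
stand-in that returns the same three values (buffer, octets written,
characters consumed), not stream:string-to-octets itself, and
write-string-in-chunks and its SINK argument are likewise made up for
the example.  How the rest of the string is handed to the real
function is left open here.  The sketch assumes each Lisp character
is a complete code point.

```lisp
;; Hypothetical stand-in for the encoder (UTF-8 only, not CMUCL's code).
;; Unlike the behaviour shown above, it never writes a partial character.
(defun encode-utf8-into (string buffer &key (start 0))
  "Encode STRING from index START into BUFFER.  Returns BUFFER, the
number of octets written and the number of characters consumed."
  (let ((pos 0) (consumed 0))
    (loop for i from start below (length string)
          for code = (char-code (char string i))
          for n = (cond ((< code #x80) 1) ((< code #x800) 2)
                        ((< code #x10000) 3) (t 4))
          while (<= (+ pos n) (length buffer))
          do ;; Lead octet: length marker plus the top payload bits.
             (setf (aref buffer pos)
                   (logior (aref #(0 0 #xC0 #xE0 #xF0) n)
                           (ash code (* -6 (1- n)))))
             ;; Continuation octets carry 6 payload bits each.
             (loop for k from 1 below n
                   do (setf (aref buffer (+ pos k))
                            (logior #x80 (ldb (byte 6 (* 6 (- n 1 k))) code))))
             (incf pos n)
             (incf consumed))
    (values buffer pos consumed)))

(defun write-string-in-chunks (string buffer sink)
  "Encode all of STRING through BUFFER, calling SINK with BUFFER and
the number of valid octets after each chunk."
  (let ((start 0))
    (loop while (< start (length string))
          do (multiple-value-bind (buf octets chars)
                 (encode-utf8-into string buffer :start start)
               (when (zerop chars)
                 (error "Buffer too small to hold one character."))
               (funcall sink buf octets)
               ;; Resume at the first character not yet converted.
               (incf start chars)))))
```

A SINK that writes to a byte stream could be as simple as
(lambda (buf n) (write-sequence buf stream :end n)), so no fresh
buffer is allocated between chunks.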

Helmut Eller | 30 Oct 2011 23:40

Re: string-to-octets

* Helmut Eller [2011-10-30 18:28] writes:

> * Raymond Toy [2011-10-30 16:30] writes:
>
>>> I'm not sure that I understand the purpose of the 18.  Is this something
>>> that is needed later?
>>
>> Yeah, that was kind of messy.  This is what I currently have:
>>
>> * (let ((buffer (make-array 20 :element-type '(unsigned-byte 8)))
>>         (string (make-string 100 :initial-element #\u+f012)))
>>     (stream:string-to-octets string :external-format :utf8 :buffer buffer))
>>
>> #(239 128 146 239 128 146 239 128 146 239 128 146 239 128 146 239 128
>>   146 239 128)
>> 18
>> 6
>>
>> So 18 is the number of valid octets actually written.  (The last two
>> octets form an incomplete conversion.)   The 6 is the number of
>> characters consumed to produce those 18 octets.
>>
>> For the case where no buffer is specified, a new buffer is created and
>> the second return value is the number of octets written (same as the
>> buffer length), and the third value is the number of characters (length
>> of the string).
>>
>> Is that better?

Raymond Toy | 31 Oct 2011 00:06

Re: string-to-octets

On 10/30/11 3:40 PM, Helmut Eller wrote:
> * Helmut Eller [2011-10-30 18:28] writes:
>
>> * Raymond Toy [2011-10-30 16:30] writes:
>>
>>> Is that better?
>> Yes, looks good.
> BTW, it would also be useful to have a parameter to specify a start
> position in buffer.  With that it would be possible to buffer up
> multiple small chunks.
That could be solved by passing in displaced arrays.  But you can't
because string-to-octets wants a simple-array.   That should probably be
changed.

I'll look into adding a :buffer-start parameter.

Ray
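
The displaced-array point can be checked with standard Common Lisp
alone.  A small sketch; displaced-tail-demo is a hypothetical name
used only for this example.  Whether a displaced array counts as a
simple-array is implementation-dependent according to the standard,
so the third value is only expected to be NIL, which is the case that
makes a function declared to take a simple-array reject it.

```lisp
;; Illustration only: a window onto the second half of a byte buffer.
(defun displaced-tail-demo ()
  (let* ((buffer (make-array 20 :element-type '(unsigned-byte 8)
                                :initial-element 0))
         (tail (make-array 10 :element-type '(unsigned-byte 8)
                              :displaced-to buffer
                              :displaced-index-offset 10)))
    ;; Writing through the displaced array changes the original buffer.
    (setf (aref tail 0) 255)
    (values (aref buffer 10)
            (typep buffer '(simple-array (unsigned-byte 8) (*)))
            (typep tail '(simple-array (unsigned-byte 8) (*))))))
```

The first value shows that TAIL shares storage with BUFFER starting
at index 10, which is the effect a start position in the buffer would
give without needing a displaced array.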
