Peter Schuller | 12 Jul 2007 10:20
Gravatar

using tablewriter with binary data

Hello,

is there a way to successfully use tablewriter to insert binary data?

The problem I am seeing is that applying escape_binary() to the tuple
value yields an extra level of escaping (which breaks), and either not
escaping at all or using sqlesc() breaks as expected as soon as a null
byte is encountered.

I looked at the source and the tablewriter seems to do it's own
escaping of a select few characters (tabs, whitespace, backslash, a
few more), but I see no way to make this either generic (handling
anything) or optional.

Any input would be appreciated.

Thanks!

--

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@...>'
Key retrieval: Send an E-Mail to getpgpkey@...
E-Mail: peter.schuller@... Web: http://www.scode.org

_______________________________________________
Libpqxx-general mailing list
Libpqxx-general@...
(Continue reading)

Peter Schuller | 12 Jul 2007 10:54
Gravatar

Re: using tablewriter with binary data

> I looked at the source and the tablewriter seems to do it's own
> escaping of a select few characters (tabs, whitespace, backslash, a
> few more), but I see no way to make this either generic (handling
> anything) or optional.

So I don't know what I was on; I misread it. It seems to also escape
all high bit bytes - but not NULL bytes. I see no mention of NULL
bytes (only NULL column values) in the PostgreSQL documentation for
COPY, but presume that escaping them as \x00 or \000 would be
supported by the server.

Am I missing something or is this a change that needs to be made
before tablewriter can support binary data?

Thanks,

--

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@...>'
Key retrieval: Send an E-Mail to getpgpkey@...
E-Mail: peter.schuller@... Web: http://www.scode.org

_______________________________________________
Libpqxx-general mailing list
Libpqxx-general@...
http://gborg.postgresql.org/mailman/listinfo/libpqxx-general
(Continue reading)

Jeroen T. Vermeulen | 12 Jul 2007 16:07
Picon
Picon
Favicon

Re: using tablewriter with binary data

On Thu, July 12, 2007 15:54, Peter Schuller wrote:

> So I don't know what I was on; I misread it. It seems to also escape
> all high bit bytes - but not NULL bytes. I see no mention of NULL
> bytes (only NULL column values) in the PostgreSQL documentation for
> COPY, but presume that escaping them as \x00 or \000 would be
> supported by the server.

Oops, you're right.  I've fixed this in revision [1265].  Thanks for a
thorough report.

There is a snapshot at http://pqxx.org/~jtv/tmp/pqxx/ but note that it
still has some problems, particularly with cursors.

Jeroen
Peter Schuller | 13 Jul 2007 12:55
Gravatar

Re: using tablewriter with binary data

> Oops, you're right.  I've fixed this in revision [1265].  Thanks for a
> thorough report.

Thanks! However, an issue remains.

In this case I am writing to a table with one column being a bytea
(rather than a varchar). The r1265 change unfortunately does not solve
the problem, since there seem to be two levels of unescaping involved.

Looking at the output of pg_dump, binary values have their backslash
escaped aswell (so \\000 instead of \000). I will admit that I find
the PostgreSQL's escaping to be quite strange and I never understood
why there is not simply a single escaping scheme regardless of
binary/non-binary data. In this particular case I suppose one level of
unescaping is being applied by the COPY receiver, and the second level
further into the backend where newlines and such need not be escaped,
but binary data does.

In order to make the table writer work in the case of inserting binary
data into a byteaa column I had to further make this modification:

-      R += '\\';
+      R += "\\\\";

Note that one specifically should not do this in the escaped character
case (i.e., newlines are \n, not \\n), which is consistent with the
above mentioned theory on escaping.

This solves the problem in the case of inserting binary data into a
bytea column, and still seems to work in the case of varchar:s (i.e.,
(Continue reading)

Jeroen T. Vermeulen | 13 Jul 2007 14:00
Picon
Picon
Favicon

Re: using tablewriter with binary data

On Fri, July 13, 2007 17:55, Peter Schuller wrote:

> In order to make the table writer work in the case of inserting binary
> data into a byteaa column I had to further make this modification:
>
> -      R += '\\';
> +      R += "\\\\";
>
> Note that one specifically should not do this in the escaped character
> case (i.e., newlines are \n, not \\n), which is consistent with the
> above mentioned theory on escaping.

Okay, I've made the change.  As it happens, I believe this very issue came
up in the pgsql-hackers list in the past few days, but I did not have time
to read the posts.

One thing was not 100% clear to me from your message: is a nul byte "\000"
or "\\000"?

Jeroen
Peter Schuller | 13 Jul 2007 15:05
Gravatar

Re: using tablewriter with binary data

> Okay, I've made the change.  As it happens, I believe this very issue came
> up in the pgsql-hackers list in the past few days, but I did not have time
> to read the posts.

Interesting; I will have a look.

> One thing was not 100% clear to me from your message: is a nul byte "\000"
> or "\\000"?

The representation of the NULL byte in the output of pg_dump was the
string composed of the characters with ASCII values 0x5c, 0x5c, 0x30,
0x30, 0x30, which visually prints (without any escaping) as:

  \\000

--

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@...>'
Key retrieval: Send an E-Mail to getpgpkey@...
E-Mail: peter.schuller@... Web: http://www.scode.org

_______________________________________________
Libpqxx-general mailing list
Libpqxx-general@...
http://gborg.postgresql.org/mailman/listinfo/libpqxx-general
Jeroen T. Vermeulen | 13 Jul 2007 19:47
Picon
Picon
Favicon

Re: using tablewriter with binary data

On Fri, July 13, 2007 20:05, Peter Schuller wrote:

>> One thing was not 100% clear to me from your message: is a nul byte
>> "\000"
>> or "\\000"?
>
> The representation of the NULL byte in the output of pg_dump was the
> string composed of the characters with ASCII values 0x5c, 0x5c, 0x30,
> 0x30, 0x30, which visually prints (without any escaping) as:
>
>   \\000

Okay, then what we have in the repository now should work.

Jeroen
Peter Schuller | 16 Jul 2007 13:03
Gravatar

Re: using tablewriter with binary data

> Okay, then what we have in the repository now should work.

Alright, this time I have tested it more properly with all possible
byte values. The current version breaks for at least backslashes.

I would suggest completely eliminating the special casing of certain
characters - that is, never even using escapechar(). Since the data
being worked on is never likely to be read by humans - except when
debugging things like this - readability does not seem that important
compared to eliminating special cases for ranges.

I have tested a version of the table writer that does octal escaping
for all unprintable charcters, and it works for all possible byte
values and for both the BYTEA type and VARCHAR volumn type (all 2x256
permutations tested).

--

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@...>'
Key retrieval: Send an E-Mail to getpgpkey@...
E-Mail: peter.schuller@... Web: http://www.scode.org

_______________________________________________
Libpqxx-general mailing list
Libpqxx-general@...
http://gborg.postgresql.org/mailman/listinfo/libpqxx-general
(Continue reading)

Peter Schuller | 16 Jul 2007 13:32
Gravatar

Re: using tablewriter with binary data

> Alright, this time I have tested it more properly with all possible
> byte values. The current version breaks for at least backslashes.

Apologies for the storm of E-Mail:s. Just to clarify; the error here
was:

   ERROR:  invalid input syntax for type bytea

Thus, BYTEA specific. I realize the handling of backslashes was
probably tested and never broken for VARCHAR:s. The intent with my
patch was to make it work with both.

--

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@...>'
Key retrieval: Send an E-Mail to getpgpkey@...
E-Mail: peter.schuller@... Web: http://www.scode.org

_______________________________________________
Libpqxx-general mailing list
Libpqxx-general@...
http://gborg.postgresql.org/mailman/listinfo/libpqxx-general
Peter Schuller | 16 Jul 2007 13:47
Gravatar

Re: using tablewriter with binary data

> Apologies for the storm of E-Mail:s. Just to clarify; the error here
> was:

Fourth E-Mail... this is embarrassing. At least one issue remains
w.r.t. the interpretation of quoted values. I will report back when
properly tested/investigated. Do not apply the patch...

--

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@...>'
Key retrieval: Send an E-Mail to getpgpkey@...
E-Mail: peter.schuller@... Web: http://www.scode.org

_______________________________________________
Libpqxx-general mailing list
Libpqxx-general@...
http://gborg.postgresql.org/mailman/listinfo/libpqxx-general
Jeroen T. Vermeulen | 16 Jul 2007 20:10
Picon
Picon
Favicon

Re: using tablewriter with binary data

On Mon, July 16, 2007 18:47, Peter Schuller wrote:
>> Apologies for the storm of E-Mail:s. Just to clarify; the error here
>> was:
>
> Fourth E-Mail... this is embarrassing. At least one issue remains
> w.r.t. the interpretation of quoted values. I will report back when
> properly tested/investigated. Do not apply the patch...

Just in time.  Anxiously awaiting further news.  :-)

Jeroen
Peter Schuller | 16 Jul 2007 13:28
Gravatar

Re: using tablewriter with binary data

> I would suggest completely eliminating the special casing of certain

Patch against trunk:

   http://distfiles.scode.org/pqxx/tablewriter_octalescaping.diff

--

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@...>'
Key retrieval: Send an E-Mail to getpgpkey@...
E-Mail: peter.schuller@... Web: http://www.scode.org

_______________________________________________
Libpqxx-general mailing list
Libpqxx-general@...
http://gborg.postgresql.org/mailman/listinfo/libpqxx-general

Gmane