Pieter van der Eems | 3 Aug 2012 09:33

function_clause error in HTTP request

Hello,

We have an iOS client using TouchDB which syncs with CouchDB (1.2) on
our server.
The iOS client is a black box for me so I can't do any debugging there.

We have a problem that some documents created on TouchDB are not being
synced with the server.
(Most documents are synced without any trouble.)

I've set the log level to 'debug' to get as much information from the
server as I can.
The log shows this when TouchDB tries to sync the issue to CouchDB:

[Fri, 03 Aug 2012 07:12:52 GMT] [error] [<0.2639.144>] function_clause error in HTTP request

Full debug pasted on pastebin: http://pastebin.com/QefqXZsN

Any ideas as to what is causing this?

Greetings,
Pieter.

Jens Alfke | 3 Aug 2012 19:24

Re: function_clause error in HTTP request


On Aug 3, 2012, at 12:33 AM, Pieter van der Eems <p.van.der.eems <at> interactiveblueprints.nl> wrote:

> We have an iOS client using TouchDB which syncs with CouchDB (1.2) on
> our server.
> The iOS client is a black box for me so I can't do any debugging there.

The HTTP headers in the logs show that it's TouchDB 0.8; the latest release is 0.93. If this is an app in the
wild it's understandable that it's not running the latest release, but there have been several multipart
upload bugs fixed since then (where "then" = June 20) that _might_ be causing this. 

IIRC, most of those bugs involved HTTP auth; is that involved here? The headers in the pastie don't include
"Authorization:" but that might have been stripped from the log for security purposes.

When I was debugging these, it was the CouchDB side that was the black box :) so I'm curious to know what the
CouchDB experts make of the logs you captured, i.e. what they imply might have been wrong with the
formatting of the multipart data.

	 [Fri, 03 Aug 2012 07:12:52 GMT] [error] [<0.2639.144>] function_clause error in HTTP request
	 [Fri, 03 Aug 2012 07:12:52 GMT] [info] [<0.2639.144>] Stacktrace: [{couch_db,write_streamed_attachment,
	                                     [<0.4031.144>,
	                                      #Fun<couch_doc.16.119974875>,-1576],
	                                     [{file,
	                                       "/opt/build-couchdb/dependencies/couchdb/src/couchdb/couch_db.erl"},
	                                      {line,1031}]},

In the CouchDB sources I have on hand (which are not up-to-date with 1.2) write_streamed_attachment looks like
> write_streamed_attachment(_Stream, _F, 0) ->
>     ok;
> write_streamed_attachment(Stream, F, LenLeft) when LenLeft > 0 ->
[...]
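For readers less familiar with Erlang, the failure mode can be modeled in Python. This is only a rough sketch, not CouchDB's code: the function name mirrors the one in the stack trace, FunctionClauseError is a hypothetical stand-in for Erlang's function_clause error, and because the body of the second clause is cut off in the quote, the subtraction shown here is an assumption. The two quoted clauses cover a length of 0 and lengths greater than 0, so a negative remaining length such as the -1576 in the trace matches neither.

```python
class FunctionClauseError(Exception):
    """Hypothetical stand-in for Erlang's function_clause error."""


def write_streamed_attachment(chunks, len_left):
    """Model of the two quoted clauses: len_left == 0 and len_left > 0."""
    if len_left == 0:
        return "ok"
    if len_left > 0:
        chunk = next(chunks)
        # Assumption: the remaining length is reduced by the chunk size,
        # so it goes negative when more data arrives than was declared.
        return write_streamed_attachment(chunks, len_left - len(chunk))
    # No clause matches a negative length.
    raise FunctionClauseError(f"no clause matches len_left={len_left}")
```

In this model, a 6-byte chunk arriving against a declared length of 4 leaves -2 and raises the error.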

Paul Davis | 3 Aug 2012 19:36

Re: function_clause error in HTTP request

On Fri, Aug 3, 2012 at 12:24 PM, Jens Alfke <jens@...> wrote:
>
> On Aug 3, 2012, at 12:33 AM, Pieter van der Eems <p.van.der.eems <at> interactiveblueprints.nl> wrote:
>
>> We have an iOS client using TouchDB which syncs with CouchDB (1.2) on
>> our server.
>> The iOS client is a black box for me so I can't do any debugging there.
>
> The HTTP headers in the logs show that it's TouchDB 0.8; the latest release is 0.93. If this is an app in the
> wild it's understandable that it's not running the latest release, but there have been several multipart
> upload bugs fixed since then (where "then" = June 20) that _might_ be causing this.
>
> IIRC, most of those bugs involved HTTP auth; is that involved here? The headers in the pastie don't include
> "Authorization:" but that might have been stripped from the log for security purposes.
>
> When I was debugging these, it was the CouchDB side that was the black box :) so I'm curious to know what the
> CouchDB experts make of the logs you captured, i.e. what they imply might have been wrong with the
> formatting of the multipart data.
>
>          [Fri, 03 Aug 2012 07:12:52 GMT] [error] [<0.2639.144>] function_clause error in HTTP request
>          [Fri, 03 Aug 2012 07:12:52 GMT] [info] [<0.2639.144>] Stacktrace: [{couch_db,write_streamed_attachment,
>                                              [<0.4031.144>,
>                                               #Fun<couch_doc.16.119974875>,-1576],
>                                              [{file,
>                                                "/opt/build-couchdb/dependencies/couchdb/src/couchdb/couch_db.erl"},
>                                               {line,1031}]},
>
> In the CouchDB sources I have on hand (which are not up-to-date with 1.2) write_streamed_attachment
> looks like
>> write_streamed_attachment(_Stream, _F, 0) ->
[...]

Jens Alfke | 3 Aug 2012 20:14

Re: function_clause error in HTTP request


On Aug 3, 2012, at 10:36 AM, Paul Davis <paul.joseph.davis@...> wrote:

> I think it was when CouchDB
> receives an attachment that's gzipped it doesn't bother doing an
> "gunzip > /dev/null" type operation to get the identity length and
> then when it sends the attachment to something that doesn't understand
> gzip compression there's a mismatch in what lengths are expected.

Thanks for the info — this is relevant to my interests. When TouchDB PUTs a gzipped attachment body it
marks it as encoded in the JSON _attachments dict; so CouchDB shouldn't be needing to decompress it right
then, since it can store it in the gzipped form (unless it wants to validate the data integrity first?)

Also, the Content-Length of the attachment MIME body is the gzipped length, not the uncompressed length.
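The difference between the two lengths can be illustrated with a short Python sketch. The attachment content is made up, and the stub field names other than "follows" are assumptions for illustration, not something confirmed in this thread.

```python
import gzip

identity_body = b"hello attachment " * 100  # made-up attachment content
encoded_body = gzip.compress(identity_body)

identity_length = len(identity_body)  # length after decoding
encoded_length = len(encoded_body)    # length of the bytes actually sent

# Hypothetical attachment stub for the JSON part; the "encoding",
# "length" and "encoded_length" field names are assumptions.
stub = {
    "content_type": "text/plain",
    "follows": True,
    "encoding": "gzip",
    "length": identity_length,
    "encoded_length": encoded_length,
}

# The MIME part's Content-Length carries the gzipped length.
part_content_length = len(encoded_body)
```

A receiver that compares the bytes on the wire against the identity length rather than the encoded one would see a mismatch.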

> Or something along those lines.

Anyone got a bug number I could look at? I'd really appreciate it!

—Jens

Jens Alfke | 6 Aug 2012 20:09

Re: function_clause error in HTTP request


On Aug 3, 2012, at 10:36 AM, Paul Davis <paul.joseph.davis@...> wrote:

> I've seen errors like this before. IIRC, the underlying issue is that
> there's a bug in the attachment handling related to gzipped vs
> identity content lengths. Specifically I think it was when CouchDB
> receives an attachment that's gzipped it doesn't bother doing an
> "gunzip > /dev/null" type operation to get the identity length and
> then when it sends the attachment to something that doesn't understand
> gzip compression there's a mismatch in what lengths are expected.

I searched for this in Jira and found COUCHDB-1177
<https://issues.apache.org/jira/browse/COUCHDB-1177> which has the exact same backtrace,
although nothing in its description mentions attachments or gzip. 

It was fixed last May, but there is a later comment from December saying it still occurs with 1.1.1.
I added a new comment referencing the issue replicating with TouchDB.

Anything I can do here? As it stands this is a serious compatibility issue for TouchDB and I want to fix it
before the next beta. Would disabling transmitting gzipped attachments help?

—Jens

Pieter van der Eems | 6 Aug 2012 20:31

Re: function_clause error in HTTP request

2012/8/6 Jens Alfke <jens@...>:
>
> On Aug 3, 2012, at 10:36 AM, Paul Davis <paul.joseph.davis@...> wrote:
>
>> I've seen errors like this before. IIRC, the underlying issue is that
>> there's a bug in the attachment handling related to gzipped vs
>> identity content lengths. Specifically I think it was when CouchDB
>> receives an attachment that's gzipped it doesn't bother doing an
>> "gunzip > /dev/null" type operation to get the identity length and
>> then when it sends the attachment to something that doesn't understand
>> gzip compression there's a mismatch in what lengths are expected.
>
> I searched for this in Jira and found COUCHDB-1177
> <https://issues.apache.org/jira/browse/COUCHDB-1177> which has the exact same backtrace,
> although nothing in its description mentions attachments or gzip.

We checked the iOS/TouchDB side and no attachments were gzipped.
I did find a similar bug there:
https://github.com/couchbaselabs/TouchDB-iOS/issues/116

Pieter

Pieter van der Eems | 7 Aug 2012 12:27

Re: function_clause error in HTTP request

Jens looked at this problem from the TouchDB side and concluded that
it is a bug on the CouchDB side. I would have updated the Jira issue
that looks similar to our problem
(https://issues.apache.org/jira/browse/COUCHDB-1177) but Jira is down
at the moment.

Is this list the correct place to discuss bugs or should I join the
CouchDB-Dev mailing-list?

Pieter.


Jens Alfke | 8 Aug 2012 19:07

Re: function_clause error in HTTP request

I've figured this out, thanks to Robert Newson looking at a TCP dump Pieter van der Eems sent him. It turns out
to be an issue with CouchDB that I already knew about but had forgotten would bite in this particular
circumstance. Specifically, CouchDB isn't associating the MIME bodies with the attachments
correctly; it gets them mixed up. As a result it gets confused about the lengths and blows up.

The issue is with CouchDB's multipart support, specifically the way in which it matches MIME bodies to
attachment names. The IMHO correct way to do this would be to look at the filename in the
Content-Disposition header, and this is in fact what TouchDB generates:
	Content-Disposition: attachment; filename="20120808-092628.png"
But CouchDB ignores this header. Instead it assumes that the order in which the MIME bodies appear matches
the order in which the attachment objects appear in the _attachments object.
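Matching by name rather than by position could look roughly like the following Python sketch. It uses the standard library's email parser to read the filename from the header shown above; the helper names filename_of and match_by_name are hypothetical, and nothing here reflects how CouchDB or TouchDB is actually implemented.

```python
from email import message_from_string


def filename_of(part_headers):
    """Return the filename from a part's Content-Disposition header, if any."""
    return message_from_string(part_headers).get_filename()


def match_by_name(parts):
    """Map attachment name -> body using headers rather than position.

    `parts` is a list of (header_text, body_bytes) pairs.
    """
    return {filename_of(headers): body for headers, body in parts}


headers = 'Content-Disposition: attachment; filename="20120808-092628.png"\n\n'
name = filename_of(headers)
```

With this approach the order of the MIME bodies no longer matters, because each body names the attachment it belongs to.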

The problem with this is that JSON objects (dictionaries) are _not_ ordered collections. I know that
Erlang's implementation of them (as linked lists of key/value pairs) happens to be ordered, and I think
some JavaScript implementations have the side effect of preserving order; but in many languages these
are implemented as hash tables and genuinely unordered.

So when TouchDB serializes the NSDictionary object representing the attachments, it has _no idea_ in what
order the JSON encoder will write the keys. This means it can't comply with CouchDB's ordering
requirement because it doesn't know in what order to write out the attachments. I believe I am going
to have to work around this by using a custom JSON encoder that I can make write out dictionary entries in a
known (sorted?) order.
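A small Python model shows how purely positional pairing can lead to a negative remaining length like the one in the stack trace. The attachment names, the sizes, and the leftover_lengths helper are all made up for illustration, and the subtraction is an assumption about how the receiver tracks progress.

```python
def leftover_lengths(declared, bodies_in_wire_order):
    """Pair declared lengths with bodies purely by position.

    Returns (name, declared_length - actual_length) for each pair.
    """
    return [
        (name, length - len(body))
        for (name, length), body in zip(declared, bodies_in_wire_order)
    ]


# Order of the keys as they appear in the JSON _attachments object.
declared = [("a.png", 10), ("b.png", 4)]

# The sender wrote the bodies in a different order: b.png first.
wire = [b"bbbb", b"aaaaaaaaaa"]

result = leftover_lengths(declared, wire)
```

Here the first pairing comes up 6 bytes short and the second overshoots by 6, giving a negative leftover for b.png.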

I've filed this as COUCHDB-1521. As I said, I can work around it, but I really think this should be fixed as
it's a hurdle for interoperability.

(Ironically I ran into the flip side of this issue last year and filed a bug on it (COUCHDB-1368): when
_receiving_ a multipart body from CouchDB, it's difficult to match attachments with their MIME bodies
because CouchDB doesn't put any headers into the MIME bodies to indicate filenames; the only clue is the
ordering of the entries in the _attachments dictionary, and that ordering is lost when Cocoa's JSON
[...]

Robert Newson | 8 Aug 2012 19:22

Re: function_clause error in HTTP request

Ah! That's great. Yes, the very peculiar "follows":true API assumes the ordering (which is fine for the
Erlang replicator because its to/from JSON conversion does not reorder objects).

So, that closes the last gap of comprehension. The request *is* malformed but in a way that's hard to justify.

Having read much of the attachment streaming code and the multipart parsing code, and the manner that they
connect, the fix isn't going to be easy but it feels necessary.

Each MP part can have http headers, including Content-Length, which points to a way forward.

B.


Robert Newson | 8 Aug 2012 19:24

Re: function_clause error in HTTP request

Matching on name is a simpler approach, though I think the Content-Disposition header is not currently considered.

I'd rather have a think about a more straightforward handling of multipart/related for a bit.


Jens Alfke | 8 Aug 2012 19:26
Re: function_clause error in HTTP request

Robert,

> Having read much of the attachment streaming code and the multipart parsing code, and the manner that they
> connect, the fix isn't going to be easy but it feels necessary.
> Each MP part can have http headers, including Content-Length, which points to a way forward.

Great. Actually the workaround on my side is going to be easier than I thought, because I just realized I've
already written a custom JSON encoder, for the purpose of generating canonical JSON (necessary for
getting consistent revision IDs for the same document bodies.) And a big part of what that encoder does is
write dictionary keys in sorted order. So I can use that to write the main document body, and then make sure
to write the attachment bodies sorted in the same order by filename. (Pieter, I think I can have this pushed
out pretty soon, probably later today.)
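The workaround described above can be sketched in Python, which is not the language TouchDB is written in. The idea is to encode the document with sorted keys and then emit the attachment bodies sorted by the same rule. The function name encode_doc_and_order_parts is hypothetical, and the sketch assumes every name in _attachments has a body to send.

```python
import json


def encode_doc_and_order_parts(doc, bodies):
    """Encode `doc` with sorted keys and order the bodies to match.

    `bodies` maps attachment name -> bytes.
    """
    # sort_keys=True also sorts the keys of nested objects such as
    # _attachments, so the key order in the JSON text is predictable.
    body_json = json.dumps(doc, sort_keys=True)
    ordered = [(name, bodies[name]) for name in sorted(doc["_attachments"])]
    return body_json, ordered


doc = {
    "_id": "example",
    "_attachments": {
        "b.png": {"follows": True, "length": 4},
        "a.png": {"follows": True, "length": 2},
    },
}
bodies = {"b.png": b"bbbb", "a.png": b"aa"}
body_json, ordered = encode_doc_and_order_parts(doc, bodies)
```

Because both orderings come from the same sort, a receiver that pairs bodies with _attachments entries by position sees them line up.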

—Jens

Robert Newson | 8 Aug 2012 19:39

Re: function_clause error in HTTP request

That's good news, but CouchDB should remove this ambiguity. Perhaps the quickest thing is to add a
"follows_order":[] in the JSON blob which, since the value is an array, can be unambiguously evaluated.

I'm keener on this idea, though: the JSON blob in a multipart/related PUT does not need to mention new
attachments at all (though it does need stubs to preserve existing attachments). All non-first parts are
explicitly intended to be added to the document. Those parts can have standard headers that tell us the
attachment's name and expected length.
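The first suggestion, an explicit ordering array, might be handled along these lines. This Python sketch is purely illustrative: "follows_order" is only a proposal in this message, not an existing CouchDB field, and pair_parts is a hypothetical helper.

```python
def pair_parts(doc, bodies_in_wire_order):
    """Pair MIME bodies with attachment names using an explicit array.

    `doc["follows_order"]` is the proposed (hypothetical) list of
    attachment names in the order their bodies were written.
    """
    names = doc["follows_order"]
    if len(names) != len(bodies_in_wire_order):
        raise ValueError("number of bodies does not match follows_order")
    return dict(zip(names, bodies_in_wire_order))


doc = {
    "_attachments": {
        "a.png": {"follows": True, "length": 2},
        "b.png": {"follows": True, "length": 4},
    },
    "follows_order": ["b.png", "a.png"],
}
paired = pair_parts(doc, [b"bbbb", b"aa"])
```

Since a JSON array keeps its order through any encoder, the pairing no longer depends on how the object keys happen to be serialized.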

The further proposal, mentioned at the Boston Summit, would remove all the magical _-prefix values from
the document, which would motivate us to unpick this tangle even more completely.

B.


Jens Alfke | 8 Aug 2012 19:55

Re: function_clause error in HTTP request


On Aug 8, 2012, at 10:39 AM, Robert Newson <rnewson@...> wrote:

> All non-first parts are explicitly intended to be added to the document. Those parts can have standard
> headers that tell us the attachments name and expected length.

That sounds good. I'm already adding some code to add the standard headers to each part — I started with
Content-Length because it will help make any future issues like this one easier to diagnose, then I
decided I might as well add Content-Type and Content-Encoding since I've already got them in the
metadata. The only thing missing then is the decoded length, which I don't think there's a standard MIME
header for.
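As a rough illustration of what such a part could look like on the wire, here is a Python sketch that writes the headers mentioned above in front of a body. The build_part helper is hypothetical and the example values are made up; it does not show TouchDB's actual code.

```python
def build_part(name, content_type, body, encoding=None):
    """Serialize one MIME part: headers, a blank line, then the body."""
    lines = [
        f'Content-Disposition: attachment; filename="{name}"',
        f"Content-Type: {content_type}",
        # Length of the bytes as sent, i.e. the encoded length if gzipped.
        f"Content-Length: {len(body)}",
    ]
    if encoding:
        lines.append(f"Content-Encoding: {encoding}")
    return "\r\n".join(lines).encode("ascii") + b"\r\n\r\n" + body


part = build_part("20120808-092628.png", "image/png", b"1234")
```

The decoded length has no header of its own here, which matches the gap noted above.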

—Jens

Pieter van der Eems | 8 Aug 2012 21:45

Re: function_clause error in HTTP request

Gentlemen,
I must admit you have lost me in the last few exchanges, but I'm very
happy to hear that you are fixing this :-)

The release of our app and backend system is relying on a correctly
functioning TouchDB <-> CouchDB replication. I'm happy we can tell our
customer that his document synchronisation will be better soon.

Thank you very much and I hope to be testing the new code soon.

Pieter
