Michael Orlitzky | 19 Aug 02:52 2012

Network.Curl cookie jar madness

I'm one bug away from a working program and need some help. I wrote a
little utility that logs into LWN.net, retrieves an article, and creates
an epub out of it. Full code here:

  git clone http://michael.orlitzky.com/git/lwn-epub.git

This is the code that gets the login cookie:

  cj <- make_cookie_jar
  li_result <- log_in cj uname pword

  case li_result of
    Left err -> do
      let msg = "Failed to log in. " ++ err
      hPutStrLn stderr msg
    Right response_body -> do
      hPutStrLn stderr response_body

  return $ cfg { C.cookie_jar = Just cj }

Curl is making the request, but if I remove the (hPutStrLn stderr
response_body), it doesn't work! What's even more insane is, this works:

  hPutStrLn stderr response_body

and this doesn't:

  hPutStrLn stdout response_body

whaaaaaaatttttttt? I really don't want to dump the response body to
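stderr, but I can't even begin to imagine what's going on here. Has
anyone got Network.Curl working with a cookie jar?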

Iustin Pop | 19 Aug 03:00 2012

Re: Network.Curl cookie jar madness

On Sat, Aug 18, 2012 at 08:52:00PM -0400, Michael Orlitzky wrote:
> I'm one bug away from a working program and need some help. I wrote a
> little utility that logs into LWN.net, retrieves an article, and creates
> an epub out of it. Full code here:
> 
>   git clone http://michael.orlitzky.com/git/lwn-epub.git
> 
> This is the code that gets the login cookie:
> 
>   cj <- make_cookie_jar
>   li_result <- log_in cj uname pword
> 
>   case li_result of
>     Left err -> do
>       let msg = "Failed to log in. " ++ err
>       hPutStrLn stderr msg
>     Right response_body -> do
>       hPutStrLn stderr response_body
> 
>   return $ cfg { C.cookie_jar = Just cj }
> 
> Curl is making the request, but if I remove the (hPutStrLn stderr
> response_body), it doesn't work! What's even more insane is, this works:
> 
>   hPutStrLn stderr response_body
> 
> and this doesn't:
> 
>   hPutStrLn stdout response_body
> 

Michael Orlitzky | 19 Aug 03:50 2012

Re: Network.Curl cookie jar madness

On 08/18/2012 09:00 PM, Iustin Pop wrote:
> On Sat, Aug 18, 2012 at 08:52:00PM -0400, Michael Orlitzky wrote:
>>
>> Curl is making the request, but if I remove the (hPutStrLn stderr
>> response_body), it doesn't work! What's even more insane is, this works:
>>
>>   hPutStrLn stderr response_body
>>
>> and this doesn't:
>>
>>   hPutStrLn stdout response_body
>>
>> whaaaaaaatttttttt? I really don't want to dump the response body to
>> stderr, but I can't even begin to imagine what's going on here. Has
>> anyone got Network.Curl working with a cookie jar?
> 
> Is this perchance due to laziness? And the fact that stderr is not
> buffered by default, so all output is forced right then (forcing the
> evaluation), whereas stdout is buffered, so the output might only be
> made later (or even after you do an hFlush).
> 
> I'd try to make sure that response_body is fully evaluated before
> returning from the function.
> 
> Or I might be totally wrong, in which case sorry :)
> 

I thought so at first, but I've tried every trick I know to avoid it. If
I add an hFlush to the stdout version, it still fails. If I deepseq the
response_body (it's just a string, after all), it still fails.
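
A minimal sketch of the two checks described above: forcing the whole
response string, then flushing stdout explicitly. It uses only base, so
`evaluate (length s)` stands in for deepseq (walking a String to its end
forces every cons cell, though not each character). The name
`forceString` and the sample body are hypothetical, not code from
lwn-epub. If the login still fails after both steps, laziness and
buffering are unlikely to be the cause.

```haskell
import Control.Exception (evaluate)
import System.IO (hFlush, hPutStrLn, stdout)

-- Hypothetical helper: force the full spine of a String before going on.
-- This is a base-only stand-in for deepseq, not the author's code.
forceString :: String -> IO String
forceString s = do
  _ <- evaluate (length s)  -- traverses the entire list
  return s

main :: IO ()
main = do
  -- Stand-in for the response body returned by the login request.
  body <- forceString (concatMap show [1 .. 5 :: Int])
  hPutStrLn stdout body
  hFlush stdout             -- rule out stdout buffering as well
```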

Michael Orlitzky | 19 Aug 06:45 2012

Re: Network.Curl cookie jar madness

On 08/18/2012 08:52 PM, Michael Orlitzky wrote:
> I'm one bug away from a working program and need some help. I wrote a
> little utility that logs into LWN.net, retrieves an article, and creates
> an epub out of it.

I've created two pages where anyone can test this. The first just takes
any username and password via post and sets a session variable. The
second prints "Success." if the session variable is set, and "Failure."
if it isn't. The bash script,

  #!/bin/bash

  COOKIE_JAR='/tmp/network-curl-test-bash.txt'
  POST_DATA='username=foo&password=bar'
  URL1='http://michael.orlitzky.com/tmp/network-curl-test1.php'
  URL2='http://michael.orlitzky.com/tmp/network-curl-test2.php'

  echo 'Logging in...'
  curl --cookie-jar "${COOKIE_JAR}" \
       --data "${POST_DATA}" \
       "${URL1}"

  echo 'Retrieving second page...'
  curl --cookie "${COOKIE_JAR}" \
       "${URL2}"

works:

  $ ./bash-test.sh
  Logging in...

Iustin Pop | 19 Aug 18:06 2012

Re: Network.Curl cookie jar madness

On Sun, Aug 19, 2012 at 12:45:47AM -0400, Michael Orlitzky wrote:
> On 08/18/2012 08:52 PM, Michael Orlitzky wrote:
> > I'm one bug away from a working program and need some help. I wrote a
> > little utility that logs into LWN.net, retrieves an article, and creates
> > an epub out of it.
> 
> I've created two pages where anyone can test this. The first just takes
> any username and password via post and sets a session variable. The
> second prints "Success." if the session variable is set, and "Failure."
> if it isn't. The bash script,

[…]

> The attached haskell program using Network.Curl, doesn't:
> 
>   $ runghc haskell-test.hs
>   Logged in...
>   Failure.
> 
> Any help is appreciated =)

So, take this with a grain of salt: I've been bitten by curl (the
haskell bindings, I mean) before, and I don't hold the quality of the
library in great regard.

The libcurl documentation says: "When you set a file name with
CURLOPT_COOKIEJAR, that file name will be created and all received
cookies will be stored in it when curl_easy_cleanup(3) is called" (i.e.
at the end of a curl handle session). But even though the curl bindings
seem to run easy_cleanup on handles (initialize → mkCurl →

Iustin Pop | 19 Aug 18:58 2012

Re: Network.Curl cookie jar madness

On Sun, Aug 19, 2012 at 06:06:53PM +0200, Iustin Pop wrote:
> On Sun, Aug 19, 2012 at 12:45:47AM -0400, Michael Orlitzky wrote:
> > On 08/18/2012 08:52 PM, Michael Orlitzky wrote:
> > > I'm one bug away from a working program and need some help. I wrote a
> > > little utility that logs into LWN.net, retrieves an article, and creates
> > > an epub out of it.
> > 
> > I've created two pages where anyone can test this. The first just takes
> > any username and password via post and sets a session variable. The
> > second prints "Success." if the session variable is set, and "Failure."
> > if it isn't. The bash script,
> 
> […]
> 
> > The attached haskell program using Network.Curl, doesn't:
> > 
> >   $ runghc haskell-test.hs
> >   Logged in...
> >   Failure.
> > 
> > Any help is appreciated =)
> 
> So, take this with a grain of salt: I've been bitten by curl (the
> haskell bindings, I mean) before, and I don't hold the quality of the
> library in great regard.
> 
> The libcurl documentation says: "When you set a file name with
> CURLOPT_COOKIEJAR, that file name will be created and all received
> cookies will be stored in it when curl_easy_cleanup(3) is called" (i.e.
> at the end of a curl handle session). But even though the curl bindings

Michael Orlitzky | 19 Aug 20:54 2012

Re: Network.Curl cookie jar madness

On 08/19/2012 12:58 PM, Iustin Pop wrote:
> 
> On more investigation, this seems to be due to the somewhat careless use
> of Foreign.Concurrent; from the docs:
> 
>   “The finalizer will be executed after the last reference to the
>   foreign object is dropped. There is no guarantee of promptness, and in
>   fact there is no guarantee that the finalizer will eventually run at
>   all.”
> 
> Also, see http://hackage.haskell.org/trac/ghc/ticket/1364.
> 
> So it seems that the intended way of cleaning up curl handles is all
> fine and dandy if one doesn't require timely cleanup; in most cases,
> this is not needed, but for cookies it is broken.
> 
> I don't know what the proper solution is; either way, it seems that
> there should be a way to force the cleanup to be run, via
> finalizeForeignPtr, or requiring full manual handling of curl handles
> (instead of via finalizers).
> 
> Gah, native libs++.
> 

Wow, thanks for the in-depth analysis. I'll just switch to
Network.Browser or its conduit counterpart.
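
The finalizer behaviour quoted above can be reproduced with base alone.
This is a sketch under stated assumptions, not the Network.Curl code: a
malloc'd buffer stands in for the curl handle, and an IORef flag stands
in for "the cookie jar has been written". `finalizerDemo` is a
hypothetical name. It shows that a finalizer attached with
Foreign.Concurrent.newForeignPtr does not run while the pointer is still
referenced, and that finalizeForeignPtr runs it immediately.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)
import Foreign.Concurrent (newForeignPtr)
import Foreign.ForeignPtr (finalizeForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr)

-- Returns (flag before explicit finalize, flag after explicit finalize).
finalizerDemo :: IO (Bool, Bool)
finalizerDemo = do
  flushed <- newIORef False
  raw <- mallocBytes 8 :: IO (Ptr ())
  -- Stand-in for a curl handle; the finalizer plays the role of the
  -- cleanup step that would write the cookie jar.
  handle <- newForeignPtr raw (writeIORef flushed True >> free raw)
  withForeignPtr handle (\_ -> return ())  -- "use" the handle
  before <- readIORef flushed              -- handle is still referenced
  finalizeForeignPtr handle                -- force the cleanup now
  after <- readIORef flushed
  return (before, after)

main :: IO ()
main = finalizerDemo >>= print
```

Running an explicit finalize at a known point is one of the two options
mentioned in the quoted message; the other is to skip finalizers and
manage the handle's lifetime by hand.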

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org

Brandon Allbery | 19 Aug 17:28 2012

Re: Network.Curl cookie jar madness

On Sat, Aug 18, 2012 at 8:52 PM, Michael Orlitzky <michael <at> orlitzky.com> wrote:
> Curl is making the request, but if I remove the (hPutStrLn stderr
> response_body), it doesn't work! What's even more insane is, this works:
>
>   hPutStrLn stderr response_body
>
> and this doesn't:
>
>   hPutStrLn stdout response_body
>
> whaaaaaaatttttttt? I really don't want to dump the response body to

At a guess, this is laziness and buffering interacting:  stderr is usually unbuffered since it's error or log output that one usually wants to see immediately; stdout is usually line buffered unless redirected, in which case it's block buffered.

The real issue is that you (or perhaps Curl) are being too lazy and not running the log_in until the result is actually needed; hPutStrLn is forcing it, but incompletely when it's buffered.  (Which strikes me as weird unless Curl is using unsafeInterleaveIO somewhere....)  You will need to force it or hold the handle open until the content is fully evaluated; if it's in the half-closed state that hGetContents sets, it's usually best to not close the handle explicitly but let the implicit lazy close do it.
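
The buffering difference described above can be checked directly. A
small sketch using only System.IO; `bufferModes` is a hypothetical
helper name. The mode reported for stdout depends on whether it is
attached to a terminal or redirected, so no particular output is
assumed for it here.

```haskell
import System.IO (BufferMode (..), hGetBuffering, stderr, stdout)

-- Report the current buffering mode of stdout and stderr, in that order.
bufferModes :: IO (BufferMode, BufferMode)
bufferModes = do
  out <- hGetBuffering stdout
  err <- hGetBuffering stderr
  return (out, err)

main :: IO ()
main = do
  (out, err) <- bufferModes
  putStrLn ("stdout: " ++ show out)
  putStrLn ("stderr: " ++ show err)
```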

--
brandon s allbery                                      allbery.b <at> gmail.com
wandering unix systems administrator (available)     (412) 475-9364 vm/sms

