George Rizkalla | 24 Jul 2012 17:01

HTTP Pipelining Contributions

Hi all,

We are currently looking at contributing to libcurl's pipelining
implementation, and we were hoping to get your feedback on some areas we'd
like to help with.

Joe Mason will be contributing the bulk of the code changes (after he
wraps up the authentication changes he has been discussing on this list).
In the interim, I was hoping to run some of our design ideas by you.

The proposed algorithm involves balancing HTTP requests over multiple TCP
sockets, while avoiding use of HTTP pipelining in instances where we
believe errors are likely to occur, or where it is likely that there would
be a performance hit if pipelining is used.

Essentially, there are three areas we wish to address:

1. Controlling the maximum number of sockets in use
2. Controlling maximum pipeline length, and protocol behaviour when the
limit is reached
3. Providing the ability to blacklist sites or proxies that are known to
not support pipelining

1.  MAX SOCKETS CHANGES
While CURLMOPT_MAXCONNECTS imposes a limit on the number of sockets
persisted, it does not regulate the maximum number of sockets that are in
use at a given point in time.  It is proposed that CURLMOPT_MAXCONNECTS be
aliased/renamed CURLMOPT_MAXCONNECTS_SOFT, and that a new option,
CURLMOPT_MAXCONNECTS_HARD, be introduced.  The latter option would
regulate the maximum number of open sockets at any given point in time.
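
To make the hard limit concrete, here is a minimal sketch in C of the queuing it implies. It is not libcurl code: the `throttle` struct and its functions are hypothetical names invented for illustration, and the fixed-size ring buffer is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAPACITY 64

/* Hypothetical throttle: at most max_active requests hold a socket,
 * the rest wait in a FIFO ring buffer. None of this is libcurl API. */
struct throttle {
  size_t max_active;            /* hard limit on sockets in use */
  size_t active;                /* sockets currently in use */
  int pending[QUEUE_CAPACITY];  /* request ids waiting for a socket */
  size_t head;                  /* index of the oldest pending request */
  size_t count;                 /* number of pending requests */
};

static void throttle_init(struct throttle *t, size_t max_active)
{
  t->max_active = max_active;
  t->active = 0;
  t->head = 0;
  t->count = 0;
}

/* Returns 1 if the request may start now, 0 if it was queued,
 * -1 if the pending queue is full. */
static int throttle_submit(struct throttle *t, int request_id)
{
  if(t->active < t->max_active) {
    t->active++;
    return 1;
  }
  if(t->count == QUEUE_CAPACITY)
    return -1;
  t->pending[(t->head + t->count) % QUEUE_CAPACITY] = request_id;
  t->count++;
  return 0;
}

/* Called when a running request finishes. Returns the id of the queued
 * request that takes over the freed socket, or -1 if none was waiting. */
static int throttle_complete(struct throttle *t)
{
  int next;
  assert(t->active > 0);
  if(t->count == 0) {
    t->active--;
    return -1;
  }
  next = t->pending[t->head];
  t->head = (t->head + 1) % QUEUE_CAPACITY;
  t->count--;
  return next;  /* active count is unchanged: the slot is handed over */
}
```

With a limit of two, the third and fourth submissions are queued and are started in submission order as earlier requests complete.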

Vladimir Grishchenko | 24 Jul 2012 19:50

RE: HTTP Pipelining Contributions


>
> 1. MAX SOCKETS CHANGES
> While CURLMOPT_MAXCONNECTS imposes a limit on the number of sockets
> persisted, it does not regulate the maximum number of sockets that are in
> use at a given point in time. It is proposed that CURLMOPT_MAXCONNECTS be
> aliased/renamed CURLMOPT_MAXCONNECTS_SOFT, and that a new option,
> CURLMOPT_MAXCONNECTS_HARD would be introduced. The latter option would
> regulate the maximum number of open sockets at any given point in time.
> This will be necessary to eliminate the need for queuing requests at the
> application layer when the application wishes to throttle the number of
> underlying sockets.
>

Assuming this new option (CURLMOPT_MAXCONNECTS_HARD) will be supported when not using pipelining, will
the client be able to prioritize queued requests, or is it going to be a FIFO implementation?

Thanks,
Vladimir
 		 	   		  
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html

George Rizkalla | 24 Jul 2012 20:33

Re: HTTP Pipelining Contributions

On 7/24/12 1:50 PM, Vladimir Grishchenko wrote:
>
>Assuming this new option (CURLMOPT_MAXCONNECTS_HARD) will be supported
>when not using pipelining, will the client be able to prioritize queued
>requests, or is it going to be a FIFO implementation?

I think a prioritized queue would be nice to have, but we would more
likely be focused on getting a FIFO implementation committed (in addition
to pipelining enhancements) before looking at adding prioritization.

---------------------------------------------------------------------
This transmission (including any attachments) may contain confidential information, privileged
material (including material protected by the solicitor-client or other applicable privileges), or
constitute non-public information. Any use of this information by anyone other than the intended
recipient is prohibited. If you have received this transmission in error, please immediately reply to
the sender and delete this information from your system. Use, dissemination, distribution, or
reproduction of this transmission by unintended recipients is not authorized and may be unlawful.


Joe Mason | 25 Jul 2012 16:40

RE: HTTP Pipelining Contributions

> From: curl-library-bounces <at> cool.haxx.se [curl-library-bounces <at> cool.haxx.se] on
>  behalf of Vladimir Grishchenko [vladgri <at> hotmail.com]
> 
> Assuming this new option (CURLMOPT_MAXCONNECTS_HARD) will be supported when
> not using pipelining, will the client be able to prioritize queued requests, or
> is it going to be a FIFO implementation?

FIFO to start, but we should consider expandability.

Can you suggest an interface that would let the client prioritize queued requests?

I can think of a couple off the top of my head:

1. Allow the client to provide a sort function for requests.  If one is set, then requests are stored in a
priority queue using that sort function. If not, use FIFO.

This has the advantage of being simple and expandable, but needs a call through a function pointer for every request.

2. Add an algorithm parameter (initially defining only FIFO), and have curl implement other algorithms internally.

This can be optimized well, but is a lot harder to extend and could cause bloat inside curl.

(In both of these cases, we can default to FIFO if the new option is not set, so the option can be added after the
initial work is done.)
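
As a rough illustration of option 1, here is a sketch in C of a pending queue that uses a client-supplied ordering function when one is set and plain FIFO otherwise. Every name here (`request`, `request_before_fn`, `pending_queue`, and so on) is hypothetical and not part of libcurl. It also assumes a small sorted array rather than a real heap, so insertion costs a number of callback calls proportional to the queue length.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 32

struct request {
  int id;
  int priority;  /* meaning is defined entirely by the application */
};

/* Hypothetical callback: returns nonzero if a should run before b. */
typedef int (*request_before_fn)(const struct request *a,
                                 const struct request *b);

struct pending_queue {
  struct request items[MAX_PENDING];  /* items[0] runs next */
  size_t count;
  request_before_fn before;           /* NULL means plain FIFO */
};

static void queue_init(struct pending_queue *q, request_before_fn before)
{
  q->count = 0;
  q->before = before;
}

/* Inserts r behind every entry it does not strictly precede, so entries
 * that compare equal keep their arrival order. Returns 0 if full. */
static int queue_push(struct pending_queue *q, struct request r)
{
  size_t pos;
  size_t i;
  if(q->count == MAX_PENDING)
    return 0;
  pos = q->count;
  if(q->before) {
    /* walk back past every entry the new request should overtake */
    while(pos > 0 && q->before(&r, &q->items[pos - 1]))
      pos--;
  }
  for(i = q->count; i > pos; i--)
    q->items[i] = q->items[i - 1];
  q->items[pos] = r;
  q->count++;
  return 1;
}

/* Removes the next request to run. Returns 0 if the queue is empty. */
static int queue_pop(struct pending_queue *q, struct request *out)
{
  size_t i;
  if(q->count == 0)
    return 0;
  *out = q->items[0];
  for(i = 1; i < q->count; i++)
    q->items[i - 1] = q->items[i];
  q->count--;
  return 1;
}

/* Example comparator: a larger priority value runs first. */
static int higher_priority_first(const struct request *a,
                                 const struct request *b)
{
  return a->priority > b->priority;
}
```

Passing NULL to `queue_init` gives the FIFO default, which is why the option could be added after the initial work without changing existing behaviour.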

Joe

Vladimir Grishchenko | 25 Jul 2012 18:27

Re: HTTP Pipelining Contributions

> FIFO to start, but we should consider expandability.
>
> Can you suggest an interface that would let the client prioritize queued 
> requests?
>
> I can think of a couple off the top of my head:
>
> 1. Allow the client to provide a sort function for requests.  If one is 
> set, then requests are stored in a priority queue using that sort 
> function. If not, use FIFO.
>
> This has the advantage of being simple and expandable, but needs a call 
> through a function pointer for every request.
>
> 2. Add an algorithm parameter (initially defining only FIFO), and have 
> curl implement other algorithms internally.
>
> This can be optimized well, but is a lot harder to extend and could cause 
> bloat inside curl.
>
> (In both of these cases, we can default to FIFO if the new option is not 
> set, so the option can be added after the initial work is done.)
>
> Joe

I was thinking along the lines of a new int "easy" option for expressing
priorities, so that clients can set whatever priority they want. Internally,
libcurl would not attempt to interpret the value but would simply maintain
a queue sorted by handle priority. If the client chooses not to set
priorities then all handles will have the same

Daniel Stenberg | 25 Jul 2012 00:13

Re: HTTP Pipelining Contributions

On Tue, 24 Jul 2012, George Rizkalla wrote:

> We are currently looking at contributing to libcurl's pipelining 
> implementation, and we were hoping to get your feedback on some areas we'd 
> like to help with.

Awesome. You are most welcome here and I'm sure I'm not the only one who will 
appreciate this!

Let me give you a word of advice right away: work on making sure we can
exercise most of the pipelining code and features in the test suite.
Pipelining is tricky by default. If we can't verify functionality and
track down problems with the test suite, things tend to fade and break
while we're not paying attention. I believe the existing pipelining support
suffers a bit from this.

> 1.  MAX SOCKETS CHANGES

> While CURLMOPT_MAXCONNECTS imposes a limit on the number of sockets 
> persisted, it does not regulate the maximum number of sockets that are in 
> use at a given point in time.  It is proposed that CURLMOPT_MAXCONNECTS be 
> aliased/renamed CURLMOPT_MAXCONNECTS_SOFT, and that a new option, 
> CURLMOPT_MAXCONNECTS_HARD would be introduced.  The latter option would 
> regulate the maximum number of open sockets at any given point in time. This 
> will be necessary to eliminate the need for queuing requests at the 
> application layer when the application wishes to throttle the number of 
> underlying sockets.

I know this has been requested by others in the past - exactly to prevent the 
application from having to do the queuing. I suspect you want to do this by not

Joe Mason | 25 Jul 2012 17:00

RE: HTTP Pipelining Contributions

> From: curl-library-bounces <at> cool.haxx.se [curl-library-bounces <at> cool.haxx.se] on
>  behalf of Daniel Stenberg [daniel <at> haxx.se]
> 
> On Tue, 24 Jul 2012, George Rizkalla wrote:
> 
> > We are currently looking at contributing to libcurl's pipelining
> > implementation, and we were hoping to get your feedback on some areas we'd
> > like to help with.
> 
> Awesome. You are most welcome here and I'm sure I'm not the only one who will
> appreciate this!
> 
> Let me give you a word of advice right away: work on making sure we can
> exercise most of the pipelining code and features in the test suite.
> Pipelining is tricky by default. If we can't verify functionality and
> track down problems with the test suite, things tend to fade and break
> while we're not paying attention. I believe the existing pipelining support
> suffers a bit from this.

I'm working on unit tests for the auth callbacks right now, and learning how the test suite works as I go, so I
should be in good shape to make use of it for pipelining when the time comes.

> This sounds fine too. I'm a bit doubtful about the necessity for the
> CURLMOPT_*_PENALTY_SIZE options though. They seem like options that simply
> nobody would touch if we would claim the defaults are somewhat sensible. Or do
> you see any use-case in your world where you will actually change these
> according to some scheme or acquired knowledge in the application?

We're going to want to do a lot of performance profiling to find the optimal default (probably refining it
between releases), and it would be convenient if we could just update one setting in the app with the

Fabian Keil | 25 Jul 2012 17:44

Re: HTTP Pipelining Contributions

Joe Mason <jmason <at> rim.com> wrote:

> > From: curl-library-bounces <at> cool.haxx.se [curl-library-bounces <at> cool.haxx.se] on
> >  behalf of Daniel Stenberg [daniel <at> haxx.se]

> > > If the first response to a request on a socket is marked as HTTP/1.0, or
> > > an older IIS server version is used, or the site is black-listed (see
> > > below), the socket should be characterized as CAN_PIPELINE = false.
> > 
> > This makes the blacklist basically always just grow. Will there be any means
> > of purging old entries or don't you see this as a problem?
> 
> Well, it means that socket can't be used for pipelining for the rest of
> its life, but it's not like the server characteristics are going to change
> while we're connected to it, so that's unavoidable.
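
The per-socket rule quoted above can be sketched in C roughly as follows. This is only an illustration: `can_pipeline` and its parameters are hypothetical, the thread does not say which IIS versions count as "older" (so the cut-off is left to the caller as `min_iis_major`), and treating anything other than an HTTP/1.1 status line as not pipelinable is a conservative assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the connection may be used for pipelining, 0 otherwise.
 * status_line:      first line of the first response on the socket
 * server_header:    value of the Server: header, or NULL if absent
 * host_blacklisted: nonzero if the application blacklisted the site
 * min_iis_major:    lowest IIS major version the caller trusts
 *                   (assumption: the thread names no specific version) */
static int can_pipeline(const char *status_line,
                        const char *server_header,
                        int host_blacklisted,
                        long min_iis_major)
{
  static const char iis_prefix[] = "Microsoft-IIS/";

  if(host_blacklisted)
    return 0;

  /* Only an explicit HTTP/1.1 response qualifies; HTTP/1.0 does not. */
  if(!status_line || strncmp(status_line, "HTTP/1.1 ", 9) != 0)
    return 0;

  if(server_header &&
     strncmp(server_header, iis_prefix, sizeof(iis_prefix) - 1) == 0) {
    /* parse the major version that follows "Microsoft-IIS/" */
    long major = strtol(server_header + sizeof(iis_prefix) - 1, NULL, 10);
    if(major < min_iis_major)
      return 0;
  }
  return 1;
}
```

Because the result depends only on the first response and on static configuration, it can be computed once and stored with the connection for the rest of its life.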

How do you intend to deal with HTTP proxies?

If an HTTP proxy is used, the socket isn't directly connected to the
server and may be used for requests to different servers. curl already
does that.

It's also possible that the server supports pipelining while the
proxy does not (or only poorly).

Fabian

George Rizkalla | 25 Jul 2012 19:19

Re: HTTP Pipelining Contributions

On 7/25/12 11:44 AM, Fabian Keil wrote:
>
>How do you intend to deal with HTTP proxies?
>
>If an HTTP proxy is used, the socket isn't directly connected to the
>server and may be used for requests to different servers. curl already
>does that.
>
>It's also possible that the server supports pipelining while the
>proxy does not (or only poorly).

The proxied case is a bit trickier to get right.  Nottingham suggests some
methods for faulty proxy detection:
http://tools.ietf.org/html/draft-nottingham-http-pipeline-01#section-5 .

Joe and I had some discussion about where we believe the best place for
implementing this sort of faulty proxy detection is.  Our inclination was
to keep this out of the protocol stack itself (although we're certainly
open to suggestions!).

If a client is going through a faulty proxy, there are two cases to be
dealt with:

(1) The client is going through a transparent faulty proxy (i.e., we don't
have an identifiable host)
(2) The client is going through a non-transparent faulty proxy (i.e., we
have an identifiable host)

For (1), the application might simply disable pipelining entirely.


Fabian Keil | 26 Jul 2012 12:13

Re: HTTP Pipelining Contributions

George Rizkalla <grizkalla <at> rim.com> wrote:

> On 7/25/12 11:44 AM, Fabian Keil wrote:
> >
> >How do you intend to deal with HTTP proxies?
> >
> >If an HTTP proxy is used, the socket isn't directly connected to the
> >server and may be used for requests to different servers. curl already
> >does that.
> >
> >It's also possible that the server supports pipelining while the
> >proxy does not (or only poorly).
> 
> The proxied case is a bit trickier to get right.  Nottingham suggests some
> methods for faulty proxy detection:
> http://tools.ietf.org/html/draft-nottingham-http-pipeline-01#section-5 .

Which could be considered "phoning home", even if the intentions are good.

> Joe and I had some discussion about where we believe the best place for
> implementing this sort of faulty proxy detection is.  Our inclination was
> to keep this out of the protocol stack itself (although we're certainly
> open to suggestions!).

I agree that this doesn't really belong in libcurl. I'd be fine either
way, though, as long as it's not done without being explicitly requested.

> If a client is going through a faulty proxy, there are two cases to be
> dealt with:
> 

George Rizkalla | 26 Jul 2012 21:46

Re: HTTP Pipelining Contributions

On 7/26/12 6:13 AM, Fabian Keil wrote:

>Or apply the standard host blacklist to the proxy as well (unless that's
>already what you are referring to). Do you see any advantages in having
>two separate blacklists?

I don't feel too strongly about either option :).

It seems like we might get some minor performance benefits out of two
lists.  Presuming that the proxy blacklist would be smaller (which seems
like a reasonable assumption), we may end up shortcutting the longer list
in the case that we are using a known, faulty proxy.  In the case that a
proxy is not being used, we would potentially be parsing through a shorter
standard host list...

That said, I'm not sure how long these blacklists would typically get, so
it might not be worth the added interface complexity.  What do you think?
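
For comparison, here is a sketch in C of the single-list variant, where the same blacklist is consulted for the proxy (if any) and for the origin host. The function names are hypothetical, and the linear scan and the ASCII case-insensitive match are assumptions of this sketch rather than anything decided in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* ASCII case-insensitive comparison of two host names. */
static int host_equal(const char *a, const char *b)
{
  while(*a && *b) {
    if(tolower((unsigned char)*a) != tolower((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b;  /* equal only if both strings ended together */
}

/* Linear scan: fine for short lists, which is the open question here. */
static int list_contains(const char *const *list, size_t n, const char *name)
{
  size_t i;
  for(i = 0; i < n; i++) {
    if(host_equal(list[i], name))
      return 1;
  }
  return 0;
}

/* proxy may be NULL when the request goes directly to the server.
 * Returns 1 if pipelining should be avoided for this request. */
static int pipelining_blacklisted(const char *const *list, size_t n,
                                  const char *proxy, const char *host)
{
  if(proxy && list_contains(list, n, proxy))
    return 1;  /* a bad proxy rules out pipelining whatever the server is */
  return list_contains(list, n, host);
}
```

With one list the interface stays at a single option, at the cost of scanning proxy entries on direct requests as well.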


Fabian Keil | 27 Jul 2012 13:49

Re: HTTP Pipelining Contributions

George Rizkalla <grizkalla <at> rim.com> wrote:

> On 7/26/12 6:13 AM, Fabian Keil wrote:
> 
> >Or apply the standard host blacklist to the proxy as well (unless that's
> >already what you are referring to). Do you see any advantages in having
> >two separate blacklists?
> 
> I don't feel too strongly about either option :).

Me neither, I was just wondering if I was overlooking a use case.

> It seems like we might get some minor performance benefits out of two
> lists.  Presuming that the proxy blacklist would be smaller (which seems
> like a reasonable assumption), we may end up shortcutting the longer list
> in the case that we are using a known, faulty proxy.

The application could simply add the blacklisted proxies at the
beginning of the list, though, in which case the rest of the list
could be skipped as well.

>                                                       In the case that a
> proxy is not being used, we would potentially be parsing through a shorter
> standard host list...
>
> That said, I'm not sure how long these blacklists would typically get, so
> it might not be worth the added interface complexity.  What do you think?

If the theoretical performance gain is the only thing you get

George Rizkalla | 25 Jul 2012 18:45

Re: HTTP Pipelining Contributions

On 7/24/12 6:13 PM, Daniel Stenberg wrote:
>
>Awesome. You are most welcome here and I'm sure I'm not the only one who
>will appreciate this!

Thanks!  We're certainly looking forward to the opportunity to work on
this :).

>
>> 1.  MAX SOCKETS CHANGES
>
>I know this has been requested by others in the past - exactly to prevent
>the application from having to do the queuing. I suspect you want to do
>this by not altering the API and simply not starting new handles if
>there aren't any available connections?

Exactly.

>
>> 2.  HTTP PIPELINING CHANGES
>>
>> It is possible to determine whether a socket should be penalized by
>> either:
>>
>> a) A Content-Length header specifying a content-length greater than the
>> proposed new curl option CURLMOPT_CONTENT_LENGTH_PENALTY_SIZE

Dan Fandrich | 25 Jul 2012 23:30

Re: HTTP Pipelining Contributions

On Tue, Jul 24, 2012 at 03:01:26PM +0000, George Rizkalla wrote:
> The proposed algorithm involves balancing HTTP requests over multiple TCP
> sockets, while avoiding use of HTTP pipelining in instances where we
> believe errors are likely to occur, or where it is likely that there would
> be a performance hit if pipelining is used.

This set of proposals sounds really useful, but I wonder if it's
better kept in the application rather than in libcurl. Most of the
suggestions can be done today without changing (and complicating)
libcurl, using its existing interfaces and callbacks. And some of them
only make sense with additional data that only the application can
provide, which would require numerous additional options and interfaces
to bring into libcurl.

Really what this calls for is a layer sitting above libcurl that takes
care of queuing handles into appropriate pipelines, creating new ones
as necessary and optimizing the balance of requests to sockets as
appropriate.  The low-level requests can already be performed by
libcurl--this would really be a "value added" layer that would only be
used by applications that need it.

I guess what I've had in the back of my mind for a while is a new
library, let's call it "libcurlapp", sitting on top of libcurl. This
could be a place for those numerous suggestions over the years for
features that don't quite make sense in a low-level library like
libcurl, but would still be really useful for some applications.
Things like sophisticated proxy selection, memory and disk caching
of results, handle pools, Metalink support, and now pipeline
optimization. Those sort of features would IMHO be well-suited for a
libcurlapp, even if not in libcurl itself.

Joe Mason | 26 Jul 2012 19:30

RE: HTTP Pipelining Contributions

> From: curl-library-bounces <at> cool.haxx.se [curl-library-bounces <at> cool.haxx.se] on
>  behalf of Dan Fandrich [dan <at> coneharvesters.com]
> Sent: Wednesday, July 25, 2012 5:30 PM
> To: curl-library <at> cool.haxx.se
> Subject: Re: HTTP Pipelining Contributions
> 
> This set of proposals sounds really useful, but I wonder if it's
> better kept in the application rather than in libcurl. Most of the
> suggestions can be done today without changing (and complicating)
> libcurl, using its existing interfaces and callbacks. And some of them
> only make sense with additional data that only the application can
> provide, which would require numerous additional options and interfaces
> to bring into libcurl.

Can they?  curl today doesn't expose the number of connections and the mapping of curl handles to
connections.  The only options I can find today that can be used to control how requests are assigned to
connections are CURLOPT_FRESH_CONNECT and CURLOPT_FORBID_REUSE, which force requests to use new
connections (and disable both pipelining and regular connection reuse).

I don't see any way to implement this proposal outside curl without adding functions to assign requests to
connections explicitly, and I thought that Daniel was strongly against that.

> Really what this calls for is a layer sitting above libcurl that takes
> care of queuing handles into appropriate pipelines, creating new ones
> as necessary and optimizing the balance of requests to sockets as
> appropriate.  The low-level requests can already be performed by
> libcurl--this would really be a "value added" layer that would only be
> used by applications that need it.


Dan Fandrich | 27 Jul 2012 23:43

Re: HTTP Pipelining Contributions

On Thu, Jul 26, 2012 at 05:30:26PM +0000, Joe Mason wrote:
> Can they? curl today doesn't expose the number of connections and the
> mapping of curl handles to connections.

For pipelined connections, this is implicitly controlled by the
application as it adds pipelined easy handles to a multi handle.

> The only options I can find today that can be used to control how
> requests are assigned to connections are CURLOPT_FRESH_CONNECT and
> CURLOPT_FORBID_REUSE, which force requests to use new connections (and
> disable both pipelining and regular connection reuse).
> 
> I don't see any way to implement this proposal outside curl without
> adding functions to assign requests to connections explicitly, and I
> thought that Daniel was strongly against that.

I'll admit I'm not completely familiar with the existing pipelining
code, but my understanding is that libcurl will pipeline all it can on a
single connection within a multi handle. An application that wants two
connections would use two multi handles. Within each handle, the
application can control (and in some cases, even reorder) requests and
the pipeline depth by controlling when the easy handles are added to the
multi handle. This could become a bit hairy, which is why I suggest
delegating it to a libcurlapp which would only have to be written once,
not for each app.
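
To sketch what such a layer would have to decide, assume (as described above, and without having verified it against the current code) that each multi handle maps to one pipelined connection. The hypothetical helper below, written in C, picks the multi handle with the shortest pipeline for the next easy handle, or reports that the request should be held back; `pick_pipeline` and its arguments are illustrative names only, not libcurl API.

```c
#include <assert.h>
#include <stddef.h>

/* depth[i] is the number of easy handles currently added to multi
 * handle i. Returns the index of the least-loaded pipeline that is
 * still below max_depth, or -1 if every pipeline is full and the
 * request has to wait in the application-level queue. */
static int pick_pipeline(const size_t *depth, size_t n_pipelines,
                         size_t max_depth)
{
  int best = -1;
  size_t i;
  for(i = 0; i < n_pipelines; i++) {
    if(depth[i] >= max_depth)
      continue;  /* this pipeline is already at the limit */
    if(best < 0 || depth[i] < depth[(size_t)best])
      best = (int)i;  /* ties keep the lowest index */
  }
  return best;
}
```

The application would then call curl_multi_add_handle() on the chosen multi handle and adjust its own depth counters as transfers start and finish.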

> What would the interface to curl look like?

I'm thinking of something along the lines of the multi interface, where

