Daniel Stenberg | 1 Sep 2006 22:52

Re: does curl_multi handle can be accessed from 2 threads?

On Fri, 1 Sep 2006, Tom Jerry wrote:

> So will it help if I lock the multi handle before accessing it on both 
> threads?

lock? It will work if you have a mutex around your use of the multi handle, 
sure.

-- 
  Commercial curl and libcurl Technical Support: http://haxx.se/curl.html

Christian Grade | 5 Sep 2006 14:29

Re: [SAD TRUTH] does curl_multi handle can be accessed from 2 threads?


Hi,

the truth is: the multi-interface isn't designed in a way that it *would 
be sanely thread-safe* or *could be made sanely thread-safe easily*. If 
one puts locks for the multi-interface *outside* the interface instead 
of locking *in* it, the multi-interface is ever-lastingly locked during 
transfer, whereas I claim it should be able to potentially permanently 
download something and at the same time take in new transfer requests. 
Locks would have to be incorporated *within* the multi-interface, 
*within* the data-structures used for transfers. You can't lock up the 
multi-interface wrapping it. It's crazy because you perm-lock pretty 
much everything you want to retrieve data from, because you lock 
perm-traversals. Yet, there's only one shaky data-structure for handling 
file transfers, and there's no possibility of differentiation 
(data-structure-wise) between running downloads and finished downloads. 
Having a look in the code of "multi.c", there's a funny remark at status 
flag 'dl finished' (or other) which states something like: "we should 
detach the node here and put it elsewhere"; it's a *list* (guarantor for 
high performance), yes. So you can't just even adapt "multi.c" to fit 
your needs, putting in some OMP code or other; no, you have to 
practically rewrite "multi.c" to make your application thread-safe, 
performant and usable on the one hand and in order to let it run in some 
useful and sane concurrent mode for your application on the other hand. 
You also can't just put locks in there since the authors have been 
*wisely* mixing the logics of data structure traversal, data 
manipulation and net transfers: a tight-fitting universal 
built-in-and-around.

Short: If you wrap up the multi-interface for multi-threading purposes, 
(Continue reading)

Richard Atterer | 5 Sep 2006 17:17

Re: [SAD TRUTH] does curl_multi handle can be accessed from 2 threads?

On Tue, Sep 05, 2006 at 02:29:26PM +0200, Christian Grade wrote:
> If one puts locks for the multi-interface *outside* the interface instead 
> of locking *in* it, the multi-interface is ever-lastingly locked during 
> transfer, whereas I claim it should be able to potentially permanently 
> download something and at the same time take in new transfer requests.

I think it is possible to do this; temporarily release the lock inside your 
registered callback function which is called whenever new data arrives. If 
you have a wait()/notify() mechanism (a monitor), you'd call notify() from 
the callback and your main thread would be able to do whatever it likes 
with libcurl.

ISTR you can set a timeout to avoid the case that all curl-multi file 
descriptors are waiting for data. Is it curl_easy_setopt(CURLOPT_TIMEOUT)? 
Then you can call curl_multi_perform() in a loop and also release the lock 
between these calls.

Cheers,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer     |  GnuPG key: 888354F7
  | \/¯|  http://atterer.net  |  08A9 7B7D 3D13 3EF2 3D25  D157 79E6 F6DC 8883 54F7
  ¯ '` ¯

Richard Atterer | 5 Sep 2006 17:28

Re: [SAD TRUTH] does curl_multi handle can be accessed from 2 threads?

On Tue, Sep 05, 2006 at 05:17:50PM +0200, Richard Atterer wrote:
> ISTR you can set a timeout to avoid the case that all curl-multi file 
> descriptors are waiting for data. Is it 
> curl_easy_setopt(CURLOPT_TIMEOUT)? Then you can call curl_multi_perform() 
> in a loop and also release the lock between these calls.

Er, nonsense... since you call select() yourself, you can leave the lock 
unlocked during the call, and only take it immediately before you call 
curl_multi_perform(). Actually, it's probably not necessary to release the 
lock from within the callback because the data will be processed pretty 
quickly by libcurl.

  Richard

-- 
Richard Atterer
Raum 503, Amalienstraße 17 5.OG, Fon 089/2180-4654, Fax 089/2180-994654,
Medieninformatik, LMU München: http://www.medien.ifi.lmu.de

Daniel Stenberg | 5 Sep 2006 16:34

Re: [SAD TRUTH] does curl_multi handle can be accessed from 2 threads?

On Tue, 5 Sep 2006, Christian Grade wrote:

> the truth is: the multi-interface isn't designed in a way that it *would be 
> sanely thread-safe* or *could be made sanely thread-safe easily*.

The multi interface is thread-safe, but only from one thread at a time. I've 
explained this before.

> If one puts locks for the multi-interface *outside* the interface instead of 
> locking *in* it, the multi-interface is ever-lastingly locked during 
> transfer,

I don't understand what you're saying. If you call a function in a thread, 
then yes that function will be using that thread the whole time.

> whereas I claim it should be able to potentially permanently download 
> something and at the same time take in new transfer requests.

That's indeed possible with the multi interface. And it is being done by 
applications already.

> Locks would have to be incorporated *within* the multi-interface, *within* 
> the data-structures used for transfers.

If we wanted to allow multiple threads to use the same multi handle, yes. 
But I haven't written it to allow that, and I have no plans to do so either. 
What would the point be?

> You can't lock up the multi-interface wrapping it. It's crazy because you 
> perm-lock pretty much everything you want to retrieve data from, because you 
(Continue reading)

Christian Grade | 6 Sep 2006 15:12

Re: [SAD TRUTH] does curl_multi handle can be accessed from 2 threads?


You actually end up with two parallel bookkeeping mechanisms for the transfer
data at two code abstraction levels: one in "multi.c", one in your own
wrapper, where one level is redundant. This is undesirable per se. Things
become even more complicated if you need to associate additional data with an
hEasy. An 'easy node' should have at least a void* where you could pass your
own struct to. The other way round doesn't work for me: handling nodes with
an hEasy member outside the multi-interface; this counteracts the
multi-interface purpose. On top of that, I think transfers should rather be
identified by a hashed key (via url string) only.

I don't understand why you don't understand my complaints.
I'm pondering what I might be getting wrong here; also taking Richard
Atterer's follow-up into account, which I admittedly don't understand in my
context of perception.

Regards
Chr. Grade

Daniel Stenberg wrote:
> On Tue, 5 Sep 2006, Christian Grade wrote:
>
>> the truth is: the multi-interface isn't designed in a way that it 
(Continue reading)

Daniel Stenberg | 6 Sep 2006 16:14

Re: [SAD TRUTH] does curl_multi handle can be accessed from 2 threads?

On Wed, 6 Sep 2006, Christian Grade wrote:

(Please don't top-post)

> You actually end up with two parallel bookkeeping mechanisms for the 
> transfer data at two code abstraction levels: one in "multi.c", one in your 
> own wrapper, where one level is redundant. This is undesirable per se.

What "bookkeeping mechanisms" are you talking about that you need to do that 
the multi interface already does? Or rather, in what way can the interface 
help you do that bookkeeping?

> Things become even more complicated if you need to associate additional data 
> with an hEasy. An 'easy node' should have at least a void* where you could 
> pass your own struct to.

You mean like CURLOPT_PRIVATE ?

> On top of that, I think transfers should rather be identified by a hashed 
> key (via url string) only.

Why? What would the gain be? I don't see what the difference would be to your 
app. You still need to create an entity and tell what URL to transfer.
So, please elaborate with more details how that would work.

> I don't understand why you don't understand my complaints.

And I don't understand why you don't understand that I don't understand...

So perhaps we can go back to this point:
(Continue reading)

Christian Grade | 7 Sep 2006 21:13

Re: [SAD TRUTH] does curl_multi handle can be accessed from 2 threads?


My requirement is/was to have a libcurl module which consumes data
(remote paths/urls) from [two] producing threads. At the same time,
the libcurl module is/was supposed to produce data (mainly files)
for [three] consuming threads. Modelling the logic for the
*data-structure-wise separation* of the *transfer status*, I found I'd
better adapt "multi.c" by building in some locking mechanisms and
by replacing the "multi.c" list with a *lock-free* alternative:
one thread transferring, one thread retrieving data about transfers.
I didn't get far with this.
When the necessity arose to even adapt "url.c" (monitored strings)
and as I didn't find out how to retrieve a stack of redirection urls
and when I heard the multi-interface will undergo a major overhaul
soon anyway, I took a break from fiddling with it, pondering about
alternatives.

I must have overlooked toUpper( "curlopt_private" ). Keyword 'private'
(suggesting "better keep hands off") and 'char*' (suggesting "it's a
string") didn't seem to catch my attention. Now the existence of this
option fulfills the *should-at-least-have* part.

I see three transfer scenarios (downstreaming):
[1] Module supplied with url, file name, optional offset (resumption)
[2] Module supplied with url, continuous buffer, salvation function
[3] Module supplied with url, chunked buffer, what-if/process function

This introduces new data members which one has to care for:
book-keeping. So, one could associate these with 'easy handles'
but it would be more performant, less tedious to implement if
these were in the multi-interface already. One would have two lists
(Continue reading)

Daniel Stenberg | 7 Sep 2006 23:05

Re: [SAD TRUTH] does curl_multi handle can be accessed from 2 threads?

On Thu, 7 Sep 2006, Christian Grade wrote:

> My requirement is/was to have a libcurl module which consumes data (remote 
> paths/urls) from [two] producing threads. At the same time, the libcurl 
> module is/was supposed to produce data (mainly files) for [three] consuming 
> threads. Modelling the logic for the *data-structure-wise separation* of 
> the *transfer status*, I found I'd better adapt "multi.c" by building in 
> some locking mechanisms and by replacing the "multi.c" list with a 
> *lock-free* alternative: one thread transferring, one thread retrieving data 
> about transfers. I didn't get far with this.

I think it sounds totally crazy. To me, it sounds like you had made up your 
mind about how this would be done before you ever saw libcurl or read its 
documentation. And since then you've tried to squeeze libcurl into working 
with this design.

When using the multi interface to transfer multiple files, it doesn't make any 
sense to split it up into multiple threads. If using many threads is your 
game, then I suggest you instead simply use individual easy transfers in each 
thread.

> When the necessity arose to even adapt "url.c" (monitored strings) and as I 
> didn't find out how to retrieve a stack of redirection urls and when I heard 
> the multi-interface will undergo a major overhaul soon anyway, I took a 
> break from fiddling with it, pondering about alternatives.

Well, the multi interface API is not about to change, but the internals are 
gonna be somewhat changed within a few days when I commit the HTTP pipelining 
support. Further, "a stack of redirection urls" is not a problem to the multi 
interface. Not now and not tomorrow.
(Continue reading)

Christian Grade | 9 Sep 2006 19:30

Re: [SAD TRUTH] does curl_multi handle can be accessed from 2 threads?


Spawning one thread for each transfer was the way I
went for some time. Unfortunately an UltraSPARC environment
is not the target platform aimed at, and I seek perfection.

I don't know anything about the impact of an implementation
of http pipelining when it comes to reacting to unexpected,
undesirable server responses. I quoted the one responsible
on "major overhaul". It sounded as if everything is going
to change, mankind being on the verge of extinction. Well,
so it's just a bit more than refactoring.

There *might* be a good reason not to take the approach
of using keys instead of handles themselves but I currently
don't see why not to make use of this abstraction. Identifying
data only by an 'easy handle' seems a shortsighted approach
since all transfer-relevant data continues to exist in an
application, but this does not hold true for the association
between 'socket' and a short-termed 'currently transferring'.
And in a multi-threaded application, everything is modeled
around shared entities.
It's so obvious: if a transfer is enqueued, waiting for actual
transfer, thus no handle associated with it yet, only thing
you can grab entities with is a key.
A consequence is the need to make those entities local
to 'a' or 'the' multi-interface. This might help to
understand why I consider the maximum number of
current 'easy handles' in use pretty constant.

In my case, I don't see the scenario "I need to stream three
(Continue reading)

Daniel Stenberg | 9 Sep 2006 20:32

Re: [SAD TRUTH] does curl_multi handle can be accessed from 2 threads?

On Sat, 9 Sep 2006, Christian Grade wrote:

> I don't know anything about the impact of an implementation of http 
> pipelining when it comes to reacting to unexpected, undesirable server 
> responses.

It doesn't change that much, but of course some requests may need to get 
re-sent when in a non-piped situation they might not have had to.

> I quoted the one responsible on "major overhaul". It sounded as if 
> everything is going to change, mankind being on the verge of extinction. 
> Well, so it's just a bit more than refactoring.

That one was me and the added pipelining support has changed lots of 
internals. If you check my commit from a few days ago you should see that. The 
unified patch was some 250K.

This change is now in CVS HEAD.

> There *might* be a good reason not to take the approach of using keys 
> instead of handles themselves but I currently don't see why not to make use 
> of this abstraction.

Can you please give me a real use-case example where such an API would be used 
and where it would actually add something useful to the application using 
libcurl?

> Identifying data only by an 'easy handle' seems a shortsighted approach 
> since all transfer-relevant data continues to exist in an application, but 
> this does not hold true for the association between 'socket' and a 
(Continue reading)

Richard Atterer | 8 Sep 2006 15:49

Re: [SAD TRUTH] does curl_multi handle can be accessed from 2 threads?

On Thu, Sep 07, 2006 at 11:05:51PM +0200, Daniel Stenberg wrote:
> When using the multi interface to transfer multiple files, it doesn't 
> make any sense to split it up into multiple threads. If using many 
> threads is your game, then I suggest you instead simply use individual easy 
> transfers in each thread.

I agree that this may be the basic misunderstanding. Because of the use of 
select(), libcurl can perform a large number of concurrent downloads in one 
thread.

Cheers,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer     |  GnuPG key: 888354F7
  | \/¯|  http://atterer.net  |  08A9 7B7D 3D13 3EF2 3D25  D157 79E6 F6DC 8883 54F7
  ¯ '` ¯

Tom Jerry | 2 Sep 2006 10:58

Re: does curl_multi handle can be accessed from 2 threads?

Yes, that's what I meant, a mutex.
I will try that. Thanks.


On 9/1/06, Daniel Stenberg < daniel <at> haxx.se> wrote:
> On Fri, 1 Sep 2006, Tom Jerry wrote:
>
> > So will it help if I lock the multi handle before accessing it on both
> > threads?
>
> lock? It will work if you have a mutex around your use of the multi handle,
> sure.
>
> --
>   Commercial curl and libcurl Technical Support: http://haxx.se/curl.html

