Dominic LoBue | 9 Mar 18:48 2010

Massive memory leak

Hello,

I am seeing a massive memory leak in xappy, but I'm having trouble
pinning it down exactly.

Some background: I'm using xappy to index my email and store useful
headers for quick access.

In the program I'm developing, I open a new search connection,
perform a query, copy all the header information into a custom
container class, and then close the search connection. I have found
that as I keep performing these operations, my program uses more and
more RAM and never releases any of it.

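Here's a really simple example that makes the problem obvious:

    import pdb
    import xappy
    from overwatch import xapidx          # path to my xapian index
    from databasics import msg_factory    # builds a named tuple of headers

    sconn = xappy.SearchConnection(xapidx)
    # Fetch every matching document, newest first by the 'sent' field.
    r = sconn.search(sconn.query_all(), 0, 99999999, checkatleast=-1,
                     sortby='-sent')
    r = [msg_factory(doc) for doc in r]   # copy headers out of each result
    del r
    del sconn
    pdb.set_trace()                       # check memory use here with `ps aux`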

msg_factory is just a factory function that returns a named tuple
containing all the header information from the ProcessedDocument it
is given.
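For anyone trying to reproduce this, here is a minimal sketch of what a factory like msg_factory might look like. It is hypothetical: the field names (subject, sender, sent) and the Headers tuple are assumptions, and it assumes each result exposes an `id` and a `data` dict mapping field names to lists of strings, which is how xappy's ProcessedDocument stores field data.

```python
from collections import namedtuple

# Hypothetical header fields; the real msg_factory may store different ones.
HEADER_FIELDS = ("subject", "sender", "sent")

Headers = namedtuple("Headers", ("docid",) + HEADER_FIELDS)


def msg_factory(result):
    """Copy header values out of a search result into a plain named tuple.

    Assumes `result.id` is the document id and `result.data` is a dict
    mapping field names to lists of strings (missing fields become None).
    """
    values = []
    for field in HEADER_FIELDS:
        field_values = result.data.get(field)
        values.append(field_values[0] if field_values else None)
    return Headers(result.id, *values)
```

Because the tuple holds only plain strings, it keeps no reference back to the search connection or the result objects.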

Re: Massive memory leak

On Mar 9, 5:48 pm, Dominic LoBue <dom.lo...@...> wrote:

Dominic LoBue | 10 Mar 05:15 2010

Re: Re: Massive memory leak

On Tue, Mar 9, 2010 at 2:53 PM, boulton.rj@...
<boulton.rj@...> wrote:
> On Mar 9, 5:48 pm, Dominic LoBue <dom.lo...@...> wrote:

Richard Boulton | 10 Mar 10:56 2010

Re: Re: Massive memory leak

Your test code appears to be calling cont.thread(to_join) in the loop,
which, from your description, sounds like something that stores the
results somewhere.  You've not provided the code for thread_container()
and msg_factory(), so I don't know what they're doing, but my guess
would be that the cont.thread(to_join) call is causing more data to be
stored each time, which is why the memory usage increases each time
you go around the loop.  Try taking that call out and see how the
memory usage behaves.

It's unlikely that there's a memory leak in xapian core (the C++ code)
- that code is very thoroughly tested for this kind of thing.

It's plausible there's a leak in the xapian bindings, but you're not
doing anything unusual and I'd expect to have found it by now.  The
same goes for xappy - I'm using it in production environments in
long-running servers, and haven't observed any leak-like behaviour.
So the most likely explanation is that something in your code is
causing data to be retained.  The second most likely is that you're
doing something with xappy which my long-running servers don't do -
but I can't see offhand what that would be.

-- 
Richard

--

-- 
You received this message because you are subscribed to the Google Groups "xappy-discuss" group.
To post to this group, send email to xappy-discuss@...
To unsubscribe from this group, send email to xappy-discuss+unsubscribe@...
For more options, visit this group at http://groups.google.com/group/xappy-discuss?hl=en.


Dominic LoBue | 10 Mar 13:01 2010

Re: Re: Massive memory leak

On Wed, Mar 10, 2010 at 1:56 AM, Richard Boulton <richard@...> wrote:
> Your test code appears to be calling cont.thread(to_join) in the loop, which
> sounds from your description like something which will store the results in
> something.  You've not provided the code for thread_container() and
> msg_factory() so I don't know what they're doing, but my guess would be that
> the cont.thread(to_join) call is causing more data to be stored each
> time, which is why the
> memory usage increases each time you go around the loop.  Try taking
> that call out and see how the memory usage behaves.
>
> It's unlikely that there's a memory leak in xapian core (the C++ code)
> - that code is very thoroughly tested for this kind of thing.
>
> It's plausible there's a leak in the xapian bindings, but you're not
> doing anything unusual and I'd expect to have found it by now.  The
> same goes for xappy - I'm using it in production environments in
> long-running servers, and not observing any memory leak type of
> behaviour.  So, most likely is that something in your code is causing
> data to be retained.  Second most likely is that you're doing
> something with xappy which my long-running servers don't do - but I
> can't see offhand what that would be.

Richard Boulton | 10 Mar 13:29 2010

Re: Re: Massive memory leak

On 10 March 2010 12:01, Dominic LoBue <dom.lobue@...> wrote:
>   To save you some time though, the thread method for
> thread_container filters out anything it already has.

I've just looked at your code briefly: I don't think it does.  It
seems to call a "join" method on each result, which merges the result
with the existing one if the id is already present.  (A style point
about your code - you don't need to prefix all the local variables you
use in functions with __ - in fact, you shouldn't do so.  Variables
defined inside functions are local to that function anyway.)
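To illustrate the style point: names assigned inside a function are local without any prefix, and a double-underscore prefix only has a special effect (name mangling) on attribute names written inside a class body. A small sketch; count_labels and Container are made-up names for illustration.

```python
def count_labels(messages):
    # `total` is local to this function; no prefix is needed to keep it private.
    total = 0
    for labels in messages:
        total += len(labels)
    return total


class Container:
    def __init__(self):
        # Inside a class body, `self.__items` is name-mangled to
        # `self._Container__items`; that is the only effect of the prefix.
        self.__items = []

    def add(self, item):
        self.__items.append(item)
        return len(self.__items)
```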

> I did as you suggested and took the cont.thread(to_join) out of the
> loop. Here are the results:
>    VSZ     RSS
> 332756  166708
> 426628  260720
> 426628  260856
> 492164  276712

That doesn't look like convincing evidence of a leak to me: the memory
only went up significantly after the first query.  The fluctuation
after that could easily be due to the garbage collector.
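Working through the readings quoted above (in KiB, which is how `ps` reports VSZ and RSS) makes the point concrete: nearly all of the growth happens between the first and second readings. A short calculation using only those figures:

```python
# VSZ and RSS readings (KiB) quoted above, in the order they were taken.
readings = [
    (332756, 166708),
    (426628, 260720),
    (426628, 260856),
    (492164, 276712),
]


def deltas(values):
    """Return the change between each pair of consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


rss_deltas = deltas([rss for _, rss in readings])
vsz_deltas = deltas([vsz for vsz, _ in readings])
print("RSS deltas (KiB):", rss_deltas)  # [94012, 136, 15856]
print("VSZ deltas (KiB):", vsz_deltas)  # [93872, 0, 65536]
```

After the first jump of roughly 92 MiB, RSS grows by only 136 KiB and then by about 15.5 MiB, rather than by a similar amount on every run.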

-- 
Richard


Dominic LoBue | 10 Mar 16:12 2010

Re: Re: Massive memory leak

On Wed, Mar 10, 2010 at 4:29 AM, Richard Boulton <richard@...> wrote:
> On 10 March 2010 12:01, Dominic LoBue <dom.lobue@...> wrote:
>>   To save you some time though, the thread method for
>> thread_container filters out anything it already has.
>
> I've just looked at your code briefly: I don't think it does.  It
> seems to call a "join" method on each result which merges the result
> with existing results if the id is already found.

Right. The thread_container puts all conversation objects into a list
(for sorting by the date of the most recently received email), and
adds a mapping of thread id to the conversation object to a
WeakValueDictionary for quick access.

When the thread_container is told to thread a conversation object into
itself, it first tries to merge the new conversation object into an
already existing conversation object with the same thread id. If one
isn't found, it's a new conversation and is just appended.

If a conversation with the same thread id is found, I merge the new
conversation into the existing one. First I update the attributes
nique_terms and labels (which are sets) with the corresponding
attributes from the new conversation. Since they are sets, anything
they already contain is not added again.

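Lastly I do this:
        # needs, at module level: from bisect import insort_right
        def do_insort(x):
            # insert x into self.messages, keeping the list sorted
            insort_right(self.messages, x)
            self.muuids.extend(x.muuid)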
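A simplified, hypothetical sketch of the structure described above, to make the merge step concrete. The names (ThreadContainer, Conversation, thread_id, latest, terms, labels) are assumptions, not the actual code: it keeps strong references in a list sorted by the newest message date, a weakref.WeakValueDictionary from thread id to conversation, and merges the set attributes with set.update.

```python
import weakref
from bisect import insort_right


class Conversation:
    """Hypothetical conversation: a thread id, a newest date, and two sets."""

    def __init__(self, thread_id, latest, terms=(), labels=()):
        self.thread_id = thread_id
        self.latest = latest          # date of the most recently received email
        self.terms = set(terms)
        self.labels = set(labels)

    def __lt__(self, other):
        # Order conversations by the date of their newest message.
        return self.latest < other.latest

    def merge(self, other):
        # Sets ignore values they already contain, so repeats are not added.
        self.terms.update(other.terms)
        self.labels.update(other.labels)
        self.latest = max(self.latest, other.latest)


class ThreadContainer:
    def __init__(self):
        self.conversations = []                         # strong refs, sorted by date
        self.by_thread = weakref.WeakValueDictionary()  # thread id -> conversation

    def thread(self, conv):
        existing = self.by_thread.get(conv.thread_id)
        if existing is None:
            # New conversation: insert it in date order and index it.
            insort_right(self.conversations, conv)
            self.by_thread[conv.thread_id] = conv
        else:
            # Known thread: merge, then restore the sort order, since
            # `latest` may have changed.
            existing.merge(conv)
            self.conversations.sort()
        return len(self.conversations)
```

The WeakValueDictionary holds only weak references, so it does not keep conversations alive by itself; the list does, which means anything that keeps the container alive also keeps every conversation in it alive.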


Richard Boulton | 10 Mar 16:29 2010

Re: Re: Massive memory leak

On 10 March 2010 15:12, Dominic LoBue <dom.lobue@...> wrote:
> That's great, but still leaves me with xapian eating up gobs of ram,
> which is a big concern since my program is a desktop application.
>
> Is there any way to coax xapian to release memory it allocated for old
> search results/searchconnects/whatever?

I suppose you could try telling the garbage collector to do a run:

    import gc
    gc.collect()

Xapian, the Xapian bindings, and Xappy do not cache previously
calculated results, so if your code no longer holds references to
them, they should be available for garbage collection.

Note that Python doesn't necessarily return memory to the operating
system just because the Python objects which were using it have been
unlinked: Python has its own memory allocator, which may well keep
memory allocated even though it is no longer in use by any Python
objects in the process.
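One way to watch this from inside the process, rather than with `ps aux`, is to read the resident set size before and after dropping references and calling gc.collect(). A minimal sketch, assuming Linux (it reads VmRSS from /proc/self/status and returns None elsewhere); current_rss_kib and rss_around are hypothetical helper names. Whether RSS actually falls afterwards depends on the allocator, which is exactly the point above.

```python
import gc


def current_rss_kib():
    """Return this process's resident set size in KiB, or None if unavailable.

    Linux-only: parses the VmRSS line of /proc/self/status.
    """
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # value is reported in kB
    except OSError:
        pass
    return None


def rss_around(make_garbage):
    """Measure RSS before, while holding make_garbage()'s result, and after freeing it."""
    before = current_rss_kib()
    data = make_garbage()
    during = current_rss_kib()
    del data
    gc.collect()
    after = current_rss_kib()
    return before, during, after
```

If `after` stays close to `during`, that is consistent with the allocator holding on to freed memory, though on its own it does not distinguish that from a leak.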

One approach, if calling gc.collect() directly doesn't help, could be
to try to allocate fewer objects at a time: your code looks like it
makes various large sets of things; you could try changing it to
compute only those results needed for display.
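A minimal sketch of that idea: instead of requesting ranks 0 to 99999999 in one call, fetch results a page at a time and convert each page only when it is needed. `iter_pages` and `page_size` are hypothetical names; the only assumptions about xappy are the `search(query, startrank, endrank, ...)` call shape used in the original script and that `endrank` is exclusive.

```python
def iter_pages(search, page_size=100):
    """Yield lists of results, page_size at a time, until a short page is returned.

    `search(start, end)` should return the results ranked start..end-1, for
    example: lambda start, end: sconn.search(query, start, end) with xappy.
    """
    start = 0
    while True:
        page = list(search(start, start + page_size))
        if page:
            yield page
        if len(page) < page_size:
            break
        start += page_size
```

Each page can then be passed through msg_factory and dropped once displayed, so peak memory should track the page size rather than the size of the whole mailbox.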

--

-- 
Richard


Richard Boulton | 10 Mar 16:56 2010

Re: Re: Massive memory leak

You might also find it interesting to look at the output of
len(gc.get_objects()) - this displays the number of objects known to
the garbage collector.  I've just run various tests using this to
check if objects are being leaked by xappy when running a query in a
loop, and have not seen any cases where the count rises each time the
loop is called.
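A sketch of that kind of check: call a function repeatedly, run a collection, and record len(gc.get_objects()) each time. A count that climbs on every iteration suggests objects are being retained; a flat count suggests they are not. `object_counts` is a hypothetical helper, and the function passed in (for example, one that opens a SearchConnection, runs the query and closes it) is up to you.

```python
import gc


def object_counts(fn, iterations=5):
    """Call fn() repeatedly and record how many objects the collector tracks.

    A steadily rising list points at objects being kept alive between calls.
    """
    counts = []
    for _ in range(iterations):
        fn()
        gc.collect()  # clear out unreachable cycles before counting
        counts.append(len(gc.get_objects()))
    return counts
```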

-- 
Richard


Dominic LoBue | 12 Mar 18:06 2010

Re: Re: Massive memory leak

On Wed, Mar 10, 2010 at 7:56 AM, Richard Boulton <richard@...> wrote:
> You might also find it interesting to look at the output of
> len(gc.get_objects()) - this displays the number of objects known to
> the garbage collector.  I've just run various tests using this to
> check if objects are being leaked by xappy when running a query in a
> loop, and have not seen any cases where the count rises each time the
> loop is called.

Richard,

I wanted to thank you for all your help in diagnosing this problem the
other day.

In case anybody is curious about what the cause of my massive use of
memory was, I believe it is memory fragmentation.

I just read that when you assign a string to a new namespace, it
creates a new object in memory instead of reusing the original one. So
when I copy all the results from xapian into my data containers at
once, I'm also fragmenting my RAM all to hell.

No solution yet, but at least I (think I) know what the problem is now.
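In CPython, binding an existing string to another name does not copy it; both names refer to the same object. What does produce separate objects is building equal strings independently, as happens when each search result carries its own copy of a header value. sys.intern() can make equal strings share one object. A small sketch (shared_vs_separate is a made-up name for illustration):

```python
import sys


def shared_vs_separate():
    a = "".join(["in", "box"])   # built at runtime, so not automatically interned
    b = a                        # binding a new name: same object, no copy
    c = "".join(["in", "box"])   # built again: equal value, separate object
    return a, b, c


a, b, c = shared_vs_separate()
print(b is a)                          # True
print(c == a, c is a)                  # True False
print(sys.intern(a) is sys.intern(c))  # True: interning maps both to one object
```

So extra string objects come from each result holding its own copies of the header values, not from assigning them to new names; interning frequently repeated values such as labels is one way to cut down how many of them exist.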

Kapil Thangavelu | 10 Mar 00:14 2010

Re: Massive memory leak

I'd try something like meliae from pdb to dump your reference counts; that should give you some insight into whether you're generating object cycles that are the source of the leaks.
http://jam-bazaar.blogspot.com/2009/11/memory-debugging-with-meliae.html

https://launchpad.net/meliae

cheers,

kapil

On Tue, Mar 9, 2010 at 12:48 PM, Dominic LoBue <dom.lobue <at> gmail.com> wrote:
> Hello,
>
> I am seeing a massive memory leak in xappy, but I'm having trouble
> pinning it down exactly.
>
> Some background: I'm using xappy to index my email and store useful
> headers for quick access.
>
> In the program I'm developing I will open a new search connection,
> perform a query, copy all the header information into a custom
> container class, and then close the search connection. I have found
> that as I keep performing these operations my program continues to use
> more and more ram, and never releases anything.
>
> Here's a really simple example that makes the problem obvious:
> import pdb
> import xappy
> from overwatch import xapidx
> from databasics import msg_factory
> sconn = xappy.SearchConnection(xapidx)
> r = sconn.search(sconn.query_all(), 0, 99999999, checkatleast= -1,
> sortby= '-sent')
> r = map(msg_factory, r)
> del r
> del sconn
> pdb.set_trace()
>
> msg_factory is just a factory function that returns a named tuple that
> contains all the header information contained in the ProcessedDocument
> it gets.
>
> Running that script on my machine and running `ps aux` when it starts
> pdb I see that the script is using 128568k, or ~125megs of ram. Now,
> correct me if I'm wrong, but since I've deleted all objects, shouldn't
> the only things using up memory still be the python interpreter, and
> everything I imported?
>
> I'm using the latest xappy from trunk and xapian 1.0.17.
>
> Any idea how to fix this?
>
> Dominic

Dominic LoBue | 10 Mar 05:23 2010

Re: Massive memory leak

On Tue, Mar 9, 2010 at 3:14 PM, Kapil Thangavelu <kapilt@...> wrote:
> i'd try something like meliae from pdb to dump your reference counts, that
> should give your some insight if your generating objects cycles that are the
> source of the leaks.
> http://jam-bazaar.blogspot.com/2009/11/memory-debugging-with-meliae.html
> https://launchpad.net/meliae
>
> cheers,
> kapil

Kapil,

Funny you should mention meliae, because I've been using it and
objgraph in order to try and debug this problem.

What I found is that while `ps aux` reports Python using more than
120 MB of RAM, meliae only reports Python using 44 MB.

From my testing I was able to rule out everything except xappy/xapian.
I figure either something in xapian is not being destroyed by xappy,
or there's a memory leak in xapian or in the Python SWIG bindings for
xapian. Since the answer is out of my league, I thought I'd start with
xappy first and work my way down.

-- 
Dominic LoBue
