Joshua Udall | 29 Aug 08:45

DB_File and assembly IO

Bioperl -

I'm trying to read/parse a single cap3 ace file with several thousand
contigs.  I get a DB_File error at Contig247.  Here's the error:

------------- EXCEPTION -------------
MSG: Unable to tie DB_File handle
STACK Bio::SeqFeature::Collection::new /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195
STACK Bio::Assembly::Contig::new /Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256
STACK Bio::Assembly::IO::ace::next_assembly /Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148
STACK toplevel /Users/jaudall/bin/read_ace.pl:214
-------------------------------------

Looking at Collection::new, the error is raised by the middle statement, the throw() right after the tie:

  $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File',
      $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE;  # or die "Cannot open file: $!\n";
  $self->{'_btree'} || $self->throw("Unable to tie DB_File handle");
  return $self;

If I uncomment the "or die" statement (which reports $!) that I inserted, I get this:

'Cannot open file tree: Too many open files'

Apparently the Collection constructor is creating a new index file for each
contig, and the handles for each are sticking around?  That confuses me because

Florent Angly | 29 Aug 10:40

Re: DB_File and assembly IO

Hi Joshua,

I don't know the specifics of DB_File, but the 'Cannot open file tree: 
Too many open files' is pretty explicit.
If you're on Unix/Linux you can check the files that are open by your 
program by typing:
    lsof | grep name_of_program
There is probably a filehandle that is not closed somewhere in your code
or in the BioPerl code.
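You can also watch the descriptor count from inside the script itself (or run 'lsof -p PID' to restrict lsof to one process). A minimal sketch, assuming /dev/fd lists the calling process's own descriptors, which holds on Linux and Mac OS X but not on Win32; count_open_fds is a hypothetical helper, not part of BioPerl. Calling it every few hundred contigs in a parsing loop would show whether the count keeps climbing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count this process's open file descriptors by
# listing /dev/fd (Linux, Mac OS X). The directory handle used for the
# listing is itself a descriptor, so it is subtracted.
sub count_open_fds {
    opendir(my $dh, '/dev/fd') or die "Cannot read /dev/fd: $!\n";
    my @fds = grep { /^\d+$/ } readdir($dh);
    closedir($dh);
    return scalar(@fds) - 1;
}

my $before = count_open_fds();

# Open a few handles and keep them alive, as a leaking object would.
my @kept;
for (1 .. 5) {
    open(my $fh, '<', '/dev/null') or die "Cannot open /dev/null: $!\n";
    push @kept, $fh;
}

my $after = count_open_fds();
printf "open fds: %d before, %d after\n", $before, $after;
```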
Best,

Florent


Chris Fields | 29 Aug 16:30

Re: DB_File and assembly IO

This is a known problem with Bio::Assembly and stems from having a
DB_File tied (opened) for each Bio::Assembly::Contig (via a retained
Bio::SeqFeature::Collection).  You can raise the limit on open
filehandles on UNIX'y flavors using ulimit (see following link), but
I'm not sure about Win32.

http://bugzilla.open-bio.org/show_bug.cgi?id=2320
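Before a long parse it may help to check the current limit from the script. A minimal sketch, assuming a POSIX sh is available (so not Win32); open_file_limit and the $expected_contigs figure are hypothetical, and as far as I know core Perl has no getrlimit() wrapper (BSD::Resource on CPAN provides one), so this simply asks the shell for 'ulimit -n'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the soft open-file limit as printed by
# 'ulimit -n' in a POSIX shell. Not usable on Win32.
sub open_file_limit {
    open(my $sh, '-|', 'sh', '-c', 'ulimit -n')
        or die "Cannot run sh: $!\n";
    my $limit = <$sh>;
    close($sh);
    $limit = '' unless defined $limit;
    chomp $limit;
    return $limit;    # a number, or the string 'unlimited'
}

# Hypothetical guard: each retained contig currently keeps one DB_File
# handle open, so warn when the expected contig count exceeds the limit.
my $expected_contigs = 5000;    # assumption: "several thousand" contigs
my $limit = open_file_limit();
if ($limit =~ /^\d+$/ && $limit < $expected_contigs) {
    warn "ulimit -n is $limit; raise it (up to the hard limit) before "
       . "parsing $expected_contigs contigs\n";
}
print "open-file limit: $limit\n";
```

A limit raised with 'ulimit -n' applies only to that shell and the processes it starts, so raise it in the same session that runs the parse.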

The general bug is reproducible using the following simple script.  If
needed, raise the upper bound of the loop range so it exceeds your
open-file limit (shown by 'ulimit -n'); on Mac OS X 10.5 it is set to 2560.

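---------------------------
use strict;
use warnings;
use Bio::Assembly::Contig;

# Each Contig holds its own Bio::SeqFeature::Collection with a tied
# DB_File; keeping every Contig alive keeps every handle open.
my @contigs;
push @contigs, Bio::Assembly::Contig->new() for (1..10000);
---------------------------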
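The same failure mode can be reproduced without BioPerl or Berkeley DB, which makes it easier to see that the limit is per process and that releasing objects releases their handles. A sketch under stated assumptions: LeakyContig is a hypothetical class (not Bio::Assembly::Contig) that, like a Contig holding its Collection, opens a handle in its constructor and keeps it for its lifetime; /dev/null is assumed to exist, so this is not for Win32.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a Contig: the constructor opens a handle and
# the object keeps it, much as the retained Collection keeps its DB_File tie.
package LeakyContig;
sub new {
    my ($class) = @_;
    open(my $fh, '<', '/dev/null') or die "Cannot open: $!\n";
    return bless({ fh => $fh }, $class);
}

package main;

# Keeping every object alive keeps every handle open; stop at the first
# failure instead of running to the end of the range.
my @contigs;
my $error = '';
for (1 .. 100_000) {
    my $contig = eval { LeakyContig->new() };
    if (!$contig) { $error = $@; last }
    push @contigs, $contig;
}
my $opened = scalar @contigs;
print "opened $opened handles", ($error ? " before: $error" : "\n");

# Dropping the objects closes the handles: Perl closes a lexical
# filehandle when its last reference goes away.
@contigs = ();
```

With a typical limit of 1024 or 2560 the loop stops with "Too many open files" well before 100,000; once @contigs is emptied, new opens succeed again.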

I'll open a bug report on this for tracking (for release 1.7, along  
with any other Bio::Assembly issues).  That doesn't mean it won't get  
fixed sooner, just that we aren't under pressure with the next  
release, which already has a full plate.  IMO there doesn't need to be
one SF::Collection per contig; one instance should do for the entire
assembly, with the same SF::Collection passed in to each contig and
each contig distinguished by the SeqFeature seq_id.  It would also be
nice to change that to allow other Bio::SeqFeature::CollectionI
implementations (e.g. Bio::DB::SeqFeature::Store and the like).
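The one-collection-per-assembly idea can be sketched in plain Perl. This is only an illustration of the design, assuming a plain hash in place of the single DB_File tie so it runs without Berkeley DB; SharedIndex and SketchContig are hypothetical names, not BioPerl classes, and a real fix would hand one Bio::SeqFeature::Collection to each Bio::Assembly::Contig instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical shared index: ONE per assembly, features keyed by seq_id.
# A plain hash stands in for the single tied DB_File.
package SharedIndex;
sub new { my ($class) = @_; return bless({ by_seq_id => {} }, $class) }
sub add_feature {
    my ($self, $seq_id, $feature) = @_;
    push @{ $self->{by_seq_id}{$seq_id} }, $feature;
}
sub features_for {
    my ($self, $seq_id) = @_;
    return @{ $self->{by_seq_id}{$seq_id} || [] };
}

# Hypothetical contig: stores a reference to the shared index, not its own.
package SketchContig;
sub new {
    my ($class, %args) = @_;
    return bless({ id => $args{-id}, index => $args{-index} }, $class);
}
sub id          { return $_[0]{id} }
sub add_feature { my ($self, $f) = @_; $self->{index}->add_feature($self->{id}, $f) }
sub features    { my ($self) = @_; return $self->{index}->features_for($self->{id}) }

package main;

my $index   = SharedIndex->new();    # created once for the whole assembly
my @contigs = map { SketchContig->new(-id => "Contig$_", -index => $index) } 1 .. 10_000;

$contigs[246]->add_feature('read_a');
$contigs[246]->add_feature('read_b');
$contigs[0]->add_feature('read_c');

my @f247 = $contigs[246]->features;
printf "%s has %d features\n", $contigs[246]->id, scalar @f247;    # Contig247 has 2 features
```

With 10,000 contigs this holds one index rather than 10,000 tied files, so the number of open handles no longer grows with the number of contigs.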