Nick Loman | 2 Feb 2009 12:54

Problems importing GenBank Files with complex LOCATION tags

Hi there,

I'm attempting to import the whole of RefSeq into a BioSQL schema using 
the Biopython loader. However, I am encountering problems with records in 
the CON division, such as NW_002063152. I am using a stock Biopython 1.49 
install.

The problem occurs when parsing complex CONTIG location tags, such as 
the following (spacing adjusted for readability):

CONTIG
    join(NZ_ABJI01000250.1:1..6235,gap(unk100),
    NZ_ABJI01000251.1:1..2827,gap(1420),NZ_ABJI01000252.1:1..1802,
    gap(unk100),NZ_ABJI01000253.1:1..2460,gap(unk100),
    NZ_ABJI01000254.1:1..12092,gap(639),NZ_ABJI01000255.1:1..1192,
    gap(unk100),NZ_ABJI01000256.1:1..5498,gap(unk100),
    NZ_ABJI01000257.1:1..20442,gap(unk100),NZ_ABJI01000258.1:1..2364,
    gap(511),NZ_ABJI01000259.1:1..17405,gap(unk100),
    NZ_ABJI01000260.1:1..2462,gap(570),NZ_ABJI01000261.1:1..3348,
    gap(410),NZ_ABJI01000262.1:1..815,gap(196),
    NZ_ABJI01000263.1:1..589)
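
To make the structure of such an expression concrete, here is a minimal sketch that splits a CONTIG join(...) into its segments and gaps. It is not Biopython code: parse_contig and both regular expressions are hypothetical names introduced for illustration, and the sketch assumes only the two element forms shown above (accession.version:start..end and gap(N) or gap(unkN)). Other forms such as complement(...) are rejected rather than handled.

```python
import re

# Hypothetical patterns for the two element forms seen in the example above.
SEGMENT = re.compile(r"^([A-Za-z0-9_]+\.\d+):(\d+)\.\.(\d+)$")
GAP = re.compile(r"^gap\((unk)?(\d+)\)$")


def parse_contig(expr):
    """Split a CONTIG join(...) expression into segment and gap tuples.

    Segments become ("segment", accession, start, end).
    Gaps become ("gap", length_is_unknown, length).
    """
    # CONTIG expressions wrap over several lines; drop all whitespace first.
    text = "".join(expr.split())
    if not (text.startswith("join(") and text.endswith(")")):
        raise ValueError("expected a join(...) expression")

    parts = []
    # Neither element form contains a comma, so a plain split is enough here.
    for item in text[len("join("):-1].split(","):
        match = SEGMENT.match(item)
        if match:
            accession, start, end = match.groups()
            parts.append(("segment", accession, int(start), int(end)))
            continue
        match = GAP.match(item)
        if match:
            unknown, length = match.groups()
            parts.append(("gap", unknown is not None, int(length)))
            continue
        raise ValueError("unsupported CONTIG element: %r" % item)
    return parts
```

Calling parse_contig on the expression above would give one tuple per element, in order, which is enough to see where a location parser that only expects start..end ranges would stumble on the gap(...) entries.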

I have worked around the problem by rewriting the records during my 
import to produce a blank ORIGIN definition, which at least gets the 
sequence features imported.
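
The message does not show the rewriting code, so the following is only a sketch of one way such a workaround might look. It assumes the standard GenBank flat-file layout (the CONTIG keyword at the start of a line, continuation lines indented) and that CON records carry no ORIGIN block of their own. blank_contig is a hypothetical helper name, not a Biopython function.

```python
def blank_contig(lines):
    """Yield GenBank lines with each CONTIG block replaced by a bare ORIGIN line.

    Assumes CONTIG starts in column 0 and its continuation lines begin
    with whitespace, as in the standard flat-file layout.
    """
    skipping = False
    for line in lines:
        if line.startswith("CONTIG"):
            # Swap the whole CONTIG block for an empty ORIGIN section.
            skipping = True
            yield "ORIGIN\n"
            continue
        if skipping and line[:1] in (" ", "\t"):
            # Indented continuation of the CONTIG expression: drop it.
            continue
        skipping = False
        yield line
```

The filtered lines could then be written to a temporary file, or wrapped in a file-like object, before being handed to the parser.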

I realise complex location parsing has been discussed before on this 
list - would the authors expect this to parse correctly, or is it out of 
the scope of the current code?

Bruce Southey | 2 Feb 2009 15:39

Re: Problems importing GenBank Files with complex LOCATION tags

Hi,
I guess this pertains to Bugs 2681 and 2745. Please see Peter's 
comments and suggested patch on Bug 2745.

http://bugzilla.open-bio.org/show_bug.cgi?id=2681
http://bugzilla.open-bio.org/show_bug.cgi?id=2745

Any comments or thoughts on these would be appreciated!

Thanks
Bruce
