22 Jul 2012 16:19
Replacing SeqFeature sub_features with compound feature locations
Peter Cock <p.j.a.cock <at> googlemail.com>
2012-07-22 14:19:31 GMT
Dear all,

One of the 'warts' in the current SeqRecord/SeqFeature object model is how non-trivial features are stored, in particular joins (in the terminology of GenBank/EMBL). Previous discussions include:

http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005830.html
...
http://lists.open-bio.org/pipermail/biopython-dev/2011-September/009183.html
http://lists.open-bio.org/pipermail/biopython-dev/2011-October/009221.html

Consider a single gene like this from NC_000932 in our test suite:

complement(join(97999..98793,69611..69724))

Currently that becomes three SeqFeature objects: a parent object present in the SeqRecord's feature list, and two child objects (one for each exon) within that parent feature's sub_features list. The parent feature gets a location which summarises the span, so start 69611-1 (Pythonic counting), end 98793, and strand -1.

This use of the sub_features property has been present in Biopython for a very long time, and prevents us from using it for nesting features based on the parent/child relationship models used in GFF (e.g. gene and CDS, or gene, mRNA, CDS, and exon). As Brad and I had discussed, a new separate mechanism might be added for explicit parent/child relationships ...
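To make the coordinate arithmetic concrete, here is a minimal pure-Python sketch of how the example join breaks down into two exon parts plus a summarising span. It deliberately does not use Biopython: the helper names parse_simple_join and summarise_span are hypothetical, invented for this illustration, and the parser only handles plain start..end ranges with optional join(...) and complement(...) wrappers. The span is assumed to run from the lowest start to the highest end of the parts.

```python
import re


def parse_simple_join(location):
    """Split a GenBank/EMBL style location into (start, end, strand) parts.

    Hypothetical helper for illustration only, not Biopython's API.
    Starts are converted to Pythonic 0-based counting (start - 1),
    ends are left as-is, so each part is a half-open interval.
    """
    strand = +1
    text = location.strip()
    if text.startswith("complement(") and text.endswith(")"):
        strand = -1
        text = text[len("complement("):-1]
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    parts = []
    for piece in text.split(","):
        match = re.fullmatch(r"(\d+)\.\.(\d+)", piece.strip())
        if match is None:
            raise ValueError(f"Unsupported location piece: {piece!r}")
        start, end = int(match.group(1)), int(match.group(2))
        parts.append((start - 1, end, strand))
    return parts


def summarise_span(parts):
    """Return (start, end, strand) covering all parts (assumed definition)."""
    start = min(part[0] for part in parts)
    end = max(part[1] for part in parts)
    strands = {part[2] for part in parts}
    # A mixed-strand feature has no single strand to report.
    strand = strands.pop() if len(strands) == 1 else None
    return (start, end, strand)


parts = parse_simple_join("complement(join(97999..98793,69611..69724))")
span = summarise_span(parts)
print(parts)  # the two exon parts, in the order written in the file
print(span)   # the summarising parent span
```

In the current model the two entries of parts would correspond to the child features in sub_features, and span to the parent feature's own location.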