Jason Stajich | 4 Dec 2005 22:49
Re: Bio:Seq $seq_obj->accession_number not returning accession number?

Sam -
Yeah what Barry said.

It doesn't get set when reading fasta files - see Hilmar's link below
for more info - all the info is in the display id, available in
$seq->display_id

my ($gi,$acc,$locus);
(undef,$gi,undef,$acc,$locus) = split(/\|/,$seq->display_id);
$seq->accession_number($acc);

I thought there was a function already to do this for you, but I
guess not.  There is something in the Search::Hit objects to parse the
accession number, so maybe we can consolidate this if someone volunteers
to do it.
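
As a rough sketch of what such a helper might look like (this is not an
existing BioPerl function; the name parse_ncbi_id and the returned hash
keys are made up for illustration), the split can be wrapped with a check
that the id really follows the NCBI "gi|...|db|acc.ver|locus" layout. It
also separates the version from the accession, since the plain split
leaves $acc as "AF410462.1":

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split an NCBI-style FASTA id
# such as "gi|17066572|gb|AF410462.1|AF410462" into its fields.
sub parse_ncbi_id {
    my ($id) = @_;
    my @f = split /\|/, $id;

    # Only trust the layout when it starts with "gi" and has at least 4 fields.
    return unless @f >= 4 && $f[0] eq 'gi';

    my ($gi, $db, $acc_ver, $locus) = @f[1 .. 4];

    # "AF410462.1" -> accession "AF410462", version "1" (version may be absent).
    my ($acc, $version) = split /\./, $acc_ver, 2;

    return {
        gi        => $gi,
        db        => $db,
        accession => $acc,
        version   => $version,
        locus     => $locus,
    };
}

my $parsed = parse_ncbi_id('gi|17066572|gb|AF410462.1|AF410462');
print "$parsed->{accession} (version $parsed->{version})\n" if $parsed;
```

With a sequence object you would pass $seq->display_id to the helper and,
if it returns something, call $seq->accession_number($parsed->{accession}).
Ids that don't match the layout return nothing, so the accession is left
untouched rather than set to garbage.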

See also Hilmar's response about this:
http://bioperl.org/pipermail/bioperl-l/2005-August/019579.html

I've added it as a Q&A to the new wiki FAQ which we'll roll out soon.

-jason

On Dec 4, 2005, at 4:23 PM, Barry Moore wrote:

> Sam-
>
> The fasta parser makes no attempt to parse the fasta header since there
> is no standard format for what should be in a fasta header.  Parse the
> accession out of the primary_id field with a regular expression in

Barry Moore | 4 Dec 2005 22:23
RE: Bio:Seq $seq_obj->accession_number not returning accession number?

Sam-

The fasta parser makes no attempt to parse the fasta header since there
is no standard format for what should be in a fasta header.  Parse the
accession out of the primary_id field with a regular expression in your
script or use GenBank or ENSEMBL format sequences to get all the goodies
parsed for you.  Google on "accession fasta parse site:bioperl.org" to
read other posts on this topic.
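
A minimal sketch of the regular-expression route, assuming NCBI-style ids
where the accession follows a gb, emb, dbj or ref tag (the function name
accession_from_id is hypothetical, and other header conventions would need
a different pattern):

```perl
use strict;
use warnings;

# Hypothetical helper: pull an accession out of an id with a regular expression.
# Assumes the NCBI "db|accession.version|" convention for gb/emb/dbj/ref entries.
sub accession_from_id {
    my ($id) = @_;

    # Capture letters/underscores followed by digits; drop an optional ".version".
    return $1 if $id =~ /\b(?:gb|emb|dbj|ref)\|([A-Za-z_]+\d+)(?:\.\d+)?\|/;

    return;    # no recognisable accession in this id
}

my $acc = accession_from_id('gi|17066572|gb|AF410462.1|AF410462');
print "$acc\n" if defined $acc;
```

Because the pattern is anchored on the database tag rather than on field
position, it does not depend on the id starting with "gi|", but it will
silently return nothing for headers written in some other style.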

Barry

-----Original Message-----
From: bioperl-l-bounces <at> portal.open-bio.org
[mailto:bioperl-l-bounces <at> portal.open-bio.org] On Behalf Of Sam
Al-Droubi
Sent: Sunday, December 04, 2005 1:18 PM
To: BioPerl list BioPerl list
Subject: [Bioperl-l] Bio:Seq $seq_obj->accession_number not
returning accession number?

The fasta format for this sequence AF410462 from NCBI looks like this

 
>gi|17066572|gb|AF410462.1|AF410462 Mus musculus PEM homeobox (Pem) gene, promoter region and partial cds
ATGCGTGTGGGCATGCGCTCATGCCCACTTGCTTGAGCACATGTGTGCTCACATGGACGTTAGAGGCAAC
TTTCAGGAGTTATTTTTTTCCCTTCTAACTTGAGTTCCTGGACCTCAGACTTGTATAATAGGTACTTTCC
CAACTTAAGTCTTACTGGCTCCAGGGTATCTGGTATACTCTTCTAGCCTCCAAGGGCAGCCACTCATGCT
TCTTCAGGTGTGAAGAGGTGAGCCAGATACAACGGTGGGAGGCAGTGTGCCCTCAGTGTGTAGACTCTTT
ATGCCCTTGGGGATTAGCGCCTCTAGCTGCCAGTCGGGTCTCTGGGTCCCTCCTGCTAAGGCCACTCTCG