江文恺 | 15 Sep 22:00

HSP tiling problem


Dear all,
I am trying to use BioPerl's SearchIO module to parse BLAST output.
The output contains many HSPs from the same hit, and each HSP comes with a count of identical residues
and an alignment length. What I want is the global identity and alignment length between the query
sequence and the hit sequence, which might be obtained by tiling the HSPs and building an HSP contig.

The BioPerl BLASTutil module contains the "hsp_tiling", "length_aln" and "frac_identical" methods, which would serve my
purpose, but reading through the mailing list, I found that some users said the method used by BioPerl is not precise in
many cases; they suggest using WUBLAST instead.

But I don't know which WUBLAST option I should use. I used the "links" option, but the output still gives me lots
of HSPs. Are these HSPs built from smaller HSPs?

Best regards!
Wenkai
Chinese Academy of Sciences
_______________________________________________
Bioperl-l mailing list
Bioperl-l <at> lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Jason Stajich | 15 Sep 22:27

Re: HSP tiling problem

The -links option gives you a logical ordering of non-overlapping HSPs  
so it is the tiling you desire. You just process each HSP from the  
list provided by -links. There may be several alternative paths so you  
can compute a %id for all of them or just take the longest one, etc.
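
As a rough illustration of "take the longest one", here is a minimal Python sketch. It assumes the HSPs have already been parsed and grouped into their alternative paths; the Hsp tuple and the function names are hypothetical and are not part of BioPerl or WU-BLAST.

```python
from typing import NamedTuple, Sequence


class Hsp(NamedTuple):
    """Hypothetical record holding the two per-HSP numbers needed here."""
    identical: int   # count of identical positions in the HSP
    aln_length: int  # alignment length of the HSP


def path_length(path: Sequence[Hsp]) -> int:
    """Total aligned length over all HSPs in one path."""
    return sum(h.aln_length for h in path)


def path_percent_identity(path: Sequence[Hsp]) -> float:
    """%id of one path: summed identities over summed aligned length."""
    return 100.0 * sum(h.identical for h in path) / path_length(path)


def longest_path(paths: Sequence[Sequence[Hsp]]) -> Sequence[Hsp]:
    """Pick the path with the largest total aligned length.

    Raises ValueError if `paths` is empty.
    """
    return max(paths, key=path_length)


if __name__ == "__main__":
    # Made-up numbers: two alternative paths for the same hit.
    paths = [
        [Hsp(80, 100), Hsp(45, 50)],
        [Hsp(190, 200)],
    ]
    best = longest_path(paths)
    print(path_length(best), path_percent_identity(best))
```

Reporting path_percent_identity for every path instead of only the longest one is the other alternative mentioned above.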

You need to just decide how you want to compute global identity - an  
average of the HSP identities or just sum up the number of identical  
bases across them all divided by the total length of sequence that is  
aligned.  All those pieces of information are available and described  
in the SearchIO HOWTO on the website.
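
The two ways of computing a global identity can give different answers, so a small Python sketch may help. It assumes each HSP has been reduced to its identity count and alignment length; the Hsp tuple and the function names are hypothetical and are not the SearchIO API.

```python
from typing import NamedTuple, Sequence


class Hsp(NamedTuple):
    """Hypothetical record holding the two per-HSP numbers needed here."""
    identical: int   # count of identical positions in the HSP
    aln_length: int  # alignment length of the HSP


def mean_hsp_identity(hsps: Sequence[Hsp]) -> float:
    """Unweighted average of the per-HSP fraction identical."""
    return sum(h.identical / h.aln_length for h in hsps) / len(hsps)


def pooled_identity(hsps: Sequence[Hsp]) -> float:
    """Sum of identical positions divided by the total aligned length."""
    return sum(h.identical for h in hsps) / sum(h.aln_length for h in hsps)


if __name__ == "__main__":
    # Made-up numbers: one long, well-conserved HSP and one short, weak HSP.
    hsps = [Hsp(90, 100), Hsp(10, 20)]
    print(mean_hsp_identity(hsps))  # each HSP counts equally
    print(pooled_identity(hsps))    # each aligned position counts equally
```

The plain average lets a short HSP weigh as much as a long one, while the pooled figure weights every aligned position equally. Both functions assume the HSPs do not overlap, as in a single -links path.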

Or, requiring less coding, re-align the sequences with an aligner like  
SSEARCH to get a single alignment and a single %id/%sim number for the  
sequence pair.

-jason

On Sep 15, 2008, at 1:00 PM, 江文恺 wrote:

>
> Dear all,
> I am trying to use BioPerl's SearchIO module to parse BLAST output.
> The output contains many HSPs from the same hit, and each HSP
> comes with a count of identical residues and an alignment length.
> What I want is the global identity and alignment length between
> the query sequence and the hit sequence, which might be obtained by
> tiling the HSPs and building an HSP contig.
>
> The BioPerl BLASTutil module contains the "hsp_tiling", "length_aln" and
> "frac_identical" methods, which would serve my purpose, but I read through the [...]