Kevin Brown | 2 Aug 2006 00:43

Re: Getting sequences by base pair locations

Perl's WWW::Mechanize module is a great way to submit web forms repeatedly.
I use it for things like MHC epitope prediction sites, and also to grab
journal articles matching certain keywords.

http://www.perl.com/pub/a/2003/01/22/mechanize.html
http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize.pm 
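A minimal sketch of the repeated-submission pattern with WWW::Mechanize follows. The form URL (http://example.org/predict) and the field name "sequence" are hypothetical placeholders: substitute the real form's address and the name= attribute of its input box. It also assumes the form you want is the first one on the page (form_number => 1).

```perl
use strict;
use warnings;

# HYPOTHETICAL endpoint and field name: replace with the real form's
# URL and the name= attribute of its input box.
my $FORM_URL   = 'http://example.org/predict';
my $FIELD_NAME = 'sequence';

# Fetch the page, fill in one field of the first form, submit it,
# and return the HTML of the result page.
sub submit_query {
    my ($mech, $url, $field, $query) = @_;
    $mech->get($url);
    $mech->submit_form(
        form_number => 1,                  # assumes the first form on the page
        fields      => { $field => $query },
    );
    return $mech->content;
}

# Only run when executed as a script with one or more queries.
if (!caller && @ARGV) {
    require WWW::Mechanize;
    # autocheck => 1 makes get/submit_form die on HTTP errors.
    my $mech = WWW::Mechanize->new(autocheck => 1);
    for my $query (@ARGV) {
        my $html = submit_query($mech, $FORM_URL, $FIELD_NAME, $query);
        print "=== $query ===\n$html\n";
        sleep 1;    # pause between requests to be polite to the server
    }
}
```

Usage, with a hypothetical script name: `perl submit.pl QUERY1 QUERY2 > results.html`. Each query on the command line becomes one form submission.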

> -----Original Message-----
> From: bioperl-l-bounces <at> lists.open-bio.org 
> [mailto:bioperl-l-bounces <at> lists.open-bio.org] On Behalf Of 
> Cook, Malcolm
> Sent: Tuesday, August 01, 2006 8:12 AM
> To: Yuval Itan; bioperl-l <at> lists.open-bio.org
> Subject: Re: [Bioperl-l] Getting sequences by base pair locations
> 
> Yuval,
> 
> Glad to help.  Given that you are not running the BLAT suite locally but
> at UCSC, you should try this approach:
> 
> Upload/paste your BLAT results (in BLAT's native output format, PSL) as
> a custom track in the Genome Browser, named, say, myhumanhits
> (i.e. just give the BLAT results a new first line like:
> `track name="myhumanhits" description="myhumanhits from my favorite human genes" visibility=2`),
> then go to the Table Browser and configure it:
> 	group = 'custom tracks'
> 	track = 'myhumanhits'
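Adding the track line can also be scripted. The sketch below assumes your PSL results are already in a local file; it writes the custom-track header line quoted in the message followed by the unchanged PSL lines to standard output. The upload itself still happens through the Genome Browser's custom-track page, and the subroutine names are hypothetical.

```perl
use strict;
use warnings;

# Build the custom-track header line; the attributes (name, description,
# visibility) are the ones used in the example track line in the message.
sub track_line {
    my (%opt) = @_;
    return sprintf qq{track name="%s" description="%s" visibility=%d\n},
        $opt{name}, $opt{description}, $opt{visibility};
}

# Copy PSL lines from $in to $out with the track line prepended.
sub add_track_header {
    my ($in, $out, %opt) = @_;
    print {$out} track_line(%opt);
    print {$out} $_ while <$in>;    # pass the PSL records through unchanged
}

# Only run when executed as a script with one PSL file argument.
if (!caller && @ARGV == 1) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    add_track_header($in, \*STDOUT,
        name        => 'myhumanhits',
        description => 'myhumanhits from my favorite human genes',
        visibility  => 2,
    );
}
```

Usage, with hypothetical file names: `perl add_track.pl hits.psl > hits_track.psl`, then upload hits_track.psl as the custom track.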

Amir Karger | 10 Apr 2006 19:06

Re: Getting sequences by ID

Yuval, if this is all you're doing, it may not be worth using Bioperl.
Here's a Perl one-liner that does it: cut and paste the whole thing onto the
command line and replace the filenames (id_list, input.fasta, and
found.fasta at the end) with your own.

perl -e '($id,$fasta) = @ARGV; open(ID,$id); while (<ID>) {s/\r?\n//;
/^>?(\S+)/; $ids{$1}++;} $num_ids = keys %ids; open(F, $fasta); $s_read =
$s_wrote = $print_it = 0; while (<F>) { if (/^>(\S+)/) {$s_read++; if
($ids{$1}) {$s_wrote++; $print_it = 1; delete $ids{$1}} else {$print_it =
0}}; if ($print_it) {print $_}};
END {warn "Searched $s_read FASTA records.\nFound $s_wrote IDs out of $num_ids in the ID list.\n"}' id_list input.fasta > found.fasta
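If you need this more than once, the same logic can live in a short script. This is a sketch, assuming (as the one-liner does) that each ID is the first whitespace-delimited word on its line, optionally preceded by ">". It differs only in splitting the work into two subroutines, read_ids and filter_fasta (hypothetical names), and in filtering against a copy of the ID hash.

```perl
use strict;
use warnings;

# Read IDs (first word of each line, optional leading '>') into a hash ref.
sub read_ids {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                      # handle DOS or Unix line endings
        $ids{$1} = 1 if $line =~ /^>?(\S+)/;
    }
    return \%ids;
}

# Copy FASTA records whose header ID is in %$ids from $in to $out.
# Each ID is written at most once. Returns (records read, records written).
sub filter_fasta {
    my ($ids, $in, $out) = @_;
    my %want = %$ids;                              # copy: caller's hash stays intact
    my ($read, $wrote, $print_it) = (0, 0, 0);
    while (my $line = <$in>) {
        if ($line =~ /^>(\S+)/) {
            $read++;
            if (delete $want{$1}) {                # true only the first time an ID is seen
                $wrote++;
                $print_it = 1;
            } else {
                $print_it = 0;
            }
        }
        print {$out} $line if $print_it;           # header and sequence lines
    }
    return ($read, $wrote);
}

# Only run when executed as a script: ID file, then FASTA file.
if (!caller && @ARGV == 2) {
    my ($id_file, $fasta_file) = @ARGV;
    open my $id_fh, '<', $id_file    or die "Cannot open $id_file: $!";
    my $ids = read_ids($id_fh);
    open my $fa_fh, '<', $fasta_file or die "Cannot open $fasta_file: $!";
    my ($read, $wrote) = filter_fasta($ids, $fa_fh, \*STDOUT);
    warn "Searched $read FASTA records.\n"
       . "Found $wrote IDs out of " . scalar(keys %$ids) . " in the ID list.\n";
}
```

Usage, with a hypothetical script name: `perl choose_fastas.pl id_list input.fasta > found.fasta`.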

http://www.cgr.harvard.edu/cbg/scriptome/UNIX/Tools/Choose.html#choose_a_set_of_fasta_sequences_from_a_file__choose_fastas_from_list_

(There's a teensy-bit different version for Windows Perl too.)

You can email me off-list for more info.

- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626

> -----Original Message-----
> From: Torsten Seemann [mailto:torsten.seemann <at> infotech.monash.edu.au] 
> Sent: Wednesday, April 05, 2006 6:14 PM
> To: Yuval Itan


Gmane