Felipe Figueiredo | 11 Sep 10:38

difference in opening file from @ARGV and STDIN?

I'm not sure whether this is related to BioPerl (Bio::AlignIO) or is a
general Perl error on my part, but I find it strange that the following
code gives different results depending on how I input the alignment:

--- test.pl ---
#!/usr/bin/perl

use warnings;
use strict;

use Bio::AlignIO;

my $file;
if (@ARGV) {
    $file = shift @ARGV;
}
else {
    $file = "-";
}

my $align = Bio::AlignIO->new(-file=>$file)->next_aln;

printf "Sequences: %s\n",$align->no_sequences;
--- test.pl ---

If I run this using a file containing 4 sequences, the following
happens:

--- run tests ---
$ ./test.pl  exemplo-alinhamento.fasta

Dave Messina | 11 Sep 13:21

Re: difference in opening file from @ARGV and STDIN?

Hi Felipe,

Specifying STDIN via a '-' argument to the -file parameter is not valid.
While that is a convention with some UNIX tools, it's not, as far as I know,
something you should be able to count on.

In BioPerl, one can specify STDIN by passing the \*STDIN filehandle glob to
the -fh parameter (NOT to -file).

In other words,

my $align = Bio::AlignIO->new(-fh => \*STDIN)->next_aln;

That is a convention in BioPerl, so the -file and -fh parameters should work
the same way in AlignIO, SearchIO, SeqIO, etc.
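Applied to the script in the original question, the advice amounts to picking the constructor arguments up front. A minimal sketch, assuming that both a missing argument and a literal '-' should mean standard input; `io_args` is a hypothetical helper, not a BioPerl function, and the sketch itself runs without BioPerl installed:

```perl
use strict;
use warnings;

# Hypothetical helper: build the arguments for a BioPerl IO constructor
# (Bio::AlignIO, Bio::SeqIO, ...). A real filename goes to -file; no
# argument, or the UNIX-style '-', becomes the STDIN glob passed via -fh.
sub io_args {
    my ($file) = @_;
    return defined($file) && $file ne '-'
        ? (-file => $file)
        : (-fh   => \*STDIN);
}

# In test.pl this would replace the if/else block (requires BioPerl):
#   my $align = Bio::AlignIO->new(io_args(shift @ARGV),
#                                 -format => 'fasta')->next_aln;
# (-format => 'fasta' is an assumption for this example; when reading
# from STDIN there is no file extension to guess the format from.)
```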

Take a look at the beginners' HOWTO for some examples.
http://www.bioperl.org/wiki/HOWTO:Beginners

Dave

Cook, Malcolm | 11 Sep 20:01

Re: difference in opening file from @ARGV and STDIN?

Felipe and Dave,

I find that the following works generically for SeqIO and AlignIO (at least)...

# After processing all options using a Getopt module,
# any remaining arguments should name files to process...
@ARGV = ('-') unless @ARGV;     # Default to standard input
my %inopt;
$inopt{-fh} ||=  \*ARGV;        # read every file named in @ARGV, in turn
my $AlignIO =  Bio::AlignIO->new(
                         %inopt
                        )  or die "calling Bio::AlignIO->new with: " . join(' ', %inopt);
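Malcolm's idiom works because Perl's ARGV magic applies to any read on the ARGV glob, including a read through a reference such as the one handed to -fh. A core-Perl sketch of that effect: `read_all_lines` is a hypothetical stand-in for a parser that, like a BioPerl reader given -fh, only ever sees a filehandle, and two temporary files play the role of command-line arguments:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for a BioPerl parser: it is handed only a
# filehandle and reads from it line by line until end of input.
sub read_all_lines {
    my ($fh) = @_;
    my @out;
    while (defined(my $line = readline($fh))) {
        chomp $line;
        push @out, $line;
    }
    return @out;
}

# Two small FASTA files stand in for names given on the command line.
my @files;
for my $record (">seq1\natgc\n", ">seq2\nggcc\n") {
    my ($tmp, $name) = tempfile(UNLINK => 1);
    print {$tmp} $record;
    close $tmp;
    push @files, $name;
}

@ARGV = @files;                       # in a real script, @ARGV comes from the shell
@ARGV = ('-') unless @ARGV;           # Malcolm's default: standard input
my @lines = read_all_lines(\*ARGV);   # one handle, every file in turn
print "$_\n" for @lines;
```

Because the parser never sees file names, the same handle covers one file, several files, or standard input (when @ARGV is empty or holds '-').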

--Malcolm

-----Original Message-----
From: bioperl-l-bounces <at> lists.open-bio.org [mailto:bioperl-l-bounces <at> lists.open-bio.org] On
Behalf Of Dave Messina
Sent: Thursday, September 11, 2008 6:22 AM
To: Felipe Figueiredo
Cc: bioperl-l <at> lists.open-bio.org
Subject: Re: [Bioperl-l] difference in opening file from @ARGV and STDIN?


Dave Messina | 11 Sep 20:47

Re: difference in opening file from @ARGV and STDIN?

Thanks, Malcolm.
So then, '-' as STDIN does work?

D

Cook, Malcolm | 11 Sep 22:19

Re: difference in opening file from @ARGV and STDIN?

Not exactly the way I would put it.

Look at the difference between the first command and the second in the following transcript:

> echo -e ">asdf\natgc\n" | perl -MBio::SeqIO -e 'my $s = Bio::SeqIO->new(-format => qw{fasta}, -fh => \*ARGV); print $ARGV[0] . qq{ has } . $s->next_seq()->seq . qq{\n}' -- '-'
- has atgc
> echo -e ">asdf\natgc\n" | perl -MBio::SeqIO -e 'my $s = Bio::SeqIO->new(-format => qw{fasta}, -fh => \*ARGV); print $ARGV[0] . qq{ has } . $s->next_seq()->seq . qq{\n}' -- 'NoSuchFile'
Can't open NoSuchFile: No such file or directory at /home/mec/cvs/bioperl-live/Bio/Root/IO.pm line 458.
Can't call method "seq" on an undefined value at -e line 1.

The only difference is that @ARGV is the singleton list containing '-' in the first call, and the singleton
list containing 'NoSuchFile' in the second.

If you passed in a list of multiple files that actually do exist, it should work fine.

It is really a matter of ARGV processing magic.
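The failure in the second run is consistent with that magic: when a name in @ARGV cannot be opened, Perl emits a "Can't open ..." warning (reported at the line that performs the read, here inside Bio/Root/IO.pm) and moves on to the next name. With no names left, the read hits end of input, next_seq returns undef, and calling ->seq on it dies. A core-Perl sketch of the warn-and-skip behaviour, with no BioPerl involved; the temporary file and the made-up missing name are assumptions of the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# One real input file, plus a name that is guaranteed not to exist.
my ($tmp, $real) = tempfile(UNLINK => 1);
print {$tmp} ">asdf\natgc\n";
close $tmp;
my $missing = "$real.no-such-file";

# Capture the "Can't open ..." warning that the ARGV magic emits.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# The unopenable name is reported and skipped; reading continues.
@ARGV = ($missing, $real);
my @lines;
while (my $line = <ARGV>) {
    chomp $line;
    push @lines, $line;
}
print "warning: $_" for @warnings;
print "read: @lines\n";
```

If the missing name were the only argument, as in the transcript, the loop body would simply never run.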

From http://perldoc.perl.org/perlop.html:

The null filehandle <> is special: it can be used to emulate the behavior of sed and awk. Input from <> comes
either from standard input, or from each file listed on the command line. Here's how it works: the first
time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when
opened gives you standard input. The @ARGV array is then processed as a list of filenames. The loop

    while (<>) {
        ...                     # code for each line
    }
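perlop goes on to spell this loop out as Perl-like pseudocode that shifts each name off @ARGV and opens it in turn. A hand-written approximation, as a sketch only: `for_each_line` is a hypothetical name, it uses three-argument open and treats '-' as standard input through an explicit check, and it ignores details the real `<>` takes care of, such as $ARGV, $. and in-place editing:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical approximation of while (<>) { ... }: call $code with each
# line and the name it came from. '-' (or an empty list) means STDIN.
sub for_each_line {
    my ($code, @names) = @_;
    @names = ('-') unless @names;            # default to standard input
    for my $name (@names) {
        my $fh;
        if ($name eq '-') {
            $fh = \*STDIN;
        }
        elsif (!open($fh, '<', $name)) {
            warn "Can't open $name: $!\n";    # like <>: warn, then skip
            next;
        }
        while (my $line = <$fh>) {
            $code->($line, $name);
        }
        close $fh unless $name eq '-';
    }
    return;
}

# Usage: two temporary files stand in for command-line arguments.
my @names;
for my $text ("one\n", "two\nthree\n") {
    my ($tmp, $name) = tempfile(UNLINK => 1);
    print {$tmp} $text;
    close $tmp;
    push @names, $name;
}

my @seen;
for_each_line(sub { my ($line) = @_; chomp $line; push @seen, $line; }, @names);
print "$_\n" for @seen;
```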


Dave Messina | 11 Sep 23:15

Re: difference in opening file from @ARGV and STDIN?

Cool, thanks for the explanation Malcolm!

At the risk of belaboring this point and your patience, one thing still
confuses me, though:

> and if [@ARGV] is empty, $ARGV[0] is set to "-"

If $ARGV[0] is set (by Perl's ARGV processing magic) to '-', then why in
your earlier example do you manually set $ARGV[0] to '-' instead of simply
leaving @ARGV empty?

@ARGV = ('-') unless @ARGV;

If I run your example and omit '-' as an argument, it still works:

> echo -e ">asdf\natgc\n" | perl -MBio::SeqIO -e 'my $s = Bio::SeqIO->new(-format => qw{fasta}, -fh => \*ARGV); print $ARGV[0] . qq{ has } . $s->next_seq()->seq . qq{\n}'
 has atgc

Dave
