$seq = Bio::PreSeq->new;
$seq = Bio::PreSeq->new($filename);
$seq = Bio::PreSeq->new(-seq=>'ACTGTGGCGTCAACTG');
$seq = Bio::PreSeq->new(-seq=>$sequence_string);
$seq = Bio::PreSeq->new(-seq=>@character_list);
$seq = Bio::PreSeq->new(-file=>'seqfile.aa',
-desc=>'Sample Bio::PreSeq sequence',
-numbering=>'1',
-type=>'Amino',
-ffmt=>'Fasta');
$seq = Bio::PreSeq->new($file,$seq,$id,$desc,$names,
$numbering,$type,$ffmt,$descffmt);
$seq->[METHOD];
$result = $seq->[METHOD];

Accessors
--------------------------------------------------------
There are a wide variety of methods designed to give easy and flexible access to the contents of sequence objects. The following accessors can be invoked upon a sequence object:

ary()       - access sequence (or slice of sequence) as an array
str()       - access sequence (or slice of sequence) as a string
getseq()    - access sequence (or slice) as string or array
seq_len()   - access sequence length
id()        - access/change object id
desc()      - access/change object description
names()     - access/change object names
numbering() - access/change sequence numbering offset
origin()    - access/change sequence origin
type()      - access/change sequence type
ffmt()      - access/change default output format
descffmt()  - access/change description format
setseq()    - change sequence

Methods
--------------------------------------------------------
The following methods can be invoked upon a sequence object:

copy()        - returns an exact copy of an object
alphabet_ok() - check sequence against genetic alphabet
layout()      - sequence formatter for output
revcom()      - reverse complement of sequence
complement()  - complement of sequence
reverse()     - reverse of sequence
Dna_to_Rna()  - translate Dna seq to Rna
Rna_to_Dna()  - translate Rna seq to Dna
translate()   - protein translation of Dna/Rna sequence
Bio::PreSeq is the precursor to what will eventually become Bio::Seq when things are fully stable.

=head2 Sequence Types
Currently the following sequence types are recognized:
Dna Rna Amino
This module uses the standard extended single-letter genetic alphabets to represent nucleotide and amino acid sequences. In addition to the standard alphabet, the following symbols are also acceptable in a biosequence:

 ?  (a missing nucleotide or amino acid)
 -  (gap in sequence)

(includes symbols for nucleotide ambiguity)
------------------------------------------
Symbol   Meaning             Nucleic Acid
------------------------------------------
 A       A                   Adenine
 C       C                   Cytosine
 G       G                   Guanine
 T       T                   Thymine
 U       U                   Uracil
 M       A or C
 R       A or G
 W       A or T
 S       C or G
 Y       C or T
 K       G or T
 V       A or C or G
 H       A or C or T
 D       A or G or T
 B       C or G or T
 X       G or A or T or C
 N       G or A or T or C
------------------------------------------
Symbol   Meaning
------------------------------------------
 A       Alanine
 B       Aspartic Acid, Asparagine
 C       Cysteine
 D       Aspartic Acid
 E       Glutamic Acid
 F       Phenylalanine
 G       Glycine
 H       Histidine
 I       Isoleucine
 K       Lysine
 L       Leucine
 M       Methionine
 N       Asparagine
 P       Proline
 Q       Glutamine
 R       Arginine
 S       Serine
 T       Threonine
 V       Valine
 W       Tryptophan
 X       Unknown
 Y       Tyrosine
 Z       Glutamic Acid, Glutamine
 *       Terminator
In addition to ``raw'' sequence files, PreSeq.pm is currently only able to read in Fasta and GCG formatted single sequence files. Support for additional formats is forthcoming.
PreSeq.pm has the ability to make use of D.G. Gilbert's ReadSeq program when reading in sequence files. ReadSeq has the ability to read and interconvert between many different biological sequence formats.
When readseq is present and PreSeq.pm has been properly configured to use it, ReadSeq will be invoked when internal parsing code fails to recognize the sequence.
Formats which readseq currently understands:
- IG/Stanford
- GenBank/GB
- NBRF
- EMBL
- GCG
- DnaStrider
- Fitch format
- Pearson/Fasta
- Zuker format
- Olsen format
- Phylip3.2
- Phylip
- Plain/Raw
* MSF
* PAUP's multiple sequence (NEXUS) format
* PIR/CODATA format used by PIR
* ASN.1 format used by NCBI
Note: Formats indicated with a '*' allow for multiple
sequences to be contained within one file. At this
time, the behaviour of PreSeq.pm with regard to these
multiple-sequence files has not been specified.
Readseq is freely distributed and is available in shell archive (.shar) form via FTP from ftp.bio.indiana.edu (129.79.224.25) in the molbio/readseq directory. (URL) ftp://ftp.bio.indiana.edu/molbio/readseq/
If ReadSeq is not available or PreSeq.pm is not configured to use it, internal parsing mechanisms will be used.
Currently supported filetypes for input: Raw, Fasta
PreSeq.pm is one part of the larger Bio::Perl project. Bio::Perl will eventually encompass a range of molecular-biology related perl modules and object-oriented classes.
This distribution should install just like any other perl module:
perl Makefile.PL   # makes a system-specific makefile
make               # makes the distribution
make test          # runs the test code
make install       # [may need root access for system install]
Makefile.PL will ask if you want the modules to be configured so that they may use the ReadSeq sequence conversion program. If you do not have ReadSeq installed or do not wish it to be used, simply answer 'no' to the question. If you do want ReadSeq support enabled, you will have to provide a fully qualified pathname at this time. Makefile.PL will then auto-configure the modules using a series of in-place edits.
Perl checks all the directories listed in the @INC array when looking for
modules. All of the perl modules that are part of the standard distribution
can be found in /usr/local/lib/perl5/ [your system paths may vary slightly].
There should also be a directory such as ``/usr/local/lib/perl5/site_perl/'';
this is where PreSeq.pm belongs. User-installed perl modules that are not
part of the standard perl distribution should be kept in the site_perl/
directory. This separation protects site-specific modules from being
inadvertently altered when installing new patches or versions of perl. Once
in this location, PreSeq.pm can be accessed by invoking ``use PreSeq;'' in
your perl script.
If PreSeq.pm is part of a larger Bio::Perl distribution, the individual modules making up the distribution should be placed within their own ``Bio/'' subdirectory off of the main perl5/site_perl/ location. PreSeq.pm in this case would be found in the path Bio/PreSeq.pm. To use PreSeq.pm in your perl script, invoke ``use Bio::PreSeq;''
If you lack permission or are unable to access the perl distribution directories, ask your system administrator to place the files there for you, or keep PreSeq.pm in the same location as the perl script you are writing. As a last resort when looking for a module, perl will always check the current directory.
You can also explicitly tell perl where to look for PreSeq.pm by including
the following code in your script (set the value of
$INSTALL_PATH to whatever is appropriate on your local
system):
BEGIN {
    use vars qw($INSTALL_PATH);
    $INSTALL_PATH = "/usr2/users/dag/bioperl/dist/Perl";
}
use lib "$INSTALL_PATH/Bio/PreSeq";
use PreSeq;
o From the perspective of novice or occasional perl users, objects are useful because they can offer direct and simple ways to do things that in reality may be somewhat complex or arcane. Users interact with and manipulate objects via specific, documented methods and never have to worry about what is going on "behind the scenes." Many perl programmers have devoted significant amounts of time and effort to creating easy-to-use "wrappers" around complex or abstract tasks. Visit the CPAN Module list at (URL) http://www.perl.com/perl/CPAN/CPAN.html to see the fruits of their labor.

o From the perspective of a perl power-user, object-oriented programming allows programmers to write code that is easily scalable and reusable. This allows powerful applications to be built rapidly and with a minimum of waste or repeated effort.
use PreSeq;
new() function.
The proper syntax for accessing the new() function in
PreSeq.pm is as follows:
$myseq = Bio::PreSeq->new;
Of course, objects are only useful if they have something in them, so you would probably want to pass along some additional information or arguments to the constructor. The foundation of any biosequence object is of course the sequence itself.
You can address new() with a sequence directly:
$myseq = Bio::PreSeq->new(-seq=>'AACTGGCGTTCGTG');
Or you can pass in a string or a list:
$myseq = Bio::PreSeq->new(-seq=>$sequence_string);
$myseq = Bio::PreSeq->new(-seq=>@sequence_list);
It is also possible to create a new sequence object based on a sequence contained in a file. You can tell the constructor where to find the sequence file by passing in the 'file' parameter:
$myseq = Bio::PreSeq->new(-file=>'seqfile.gcg');
Because there are so many different conventions or formats for storing sequence information in files, it would be polite (although not absolutely necessary) to tell the constructor what format the sequence file is in. We can provide that information via the file-format or 'ffmt' field. To create a sequence object based upon a GCG-formatted sequence file:
$myseq = Bio::PreSeq->new(-file=>'seqfile.gcg',-ffmt=>'GCG');
We've already introduced three different object attributes or arguments
that can be passed to the new() object constructor ('seq','file' and
'ffmt') so now would be a good time to introduce them all:
BioSeq Constructor Arguments
file: The ``file'' argument should be a string value containing path and filename information for a sequence file that is to be read into an object.
seq: The ``seq'' argument is for passing in sequence directly instead of reading in a sequence file. The sequence should consist of RAW info (no whitespace, newlines or formatting) and can be passed in as either an array/list or string.
id: The ``id'' argument should be a ONE-WORD string value giving a short name for the sequence.
desc: The ``desc'' argument should be a string containing a description of the sequence. This field is not limited to one word.
names:
The ``names'' argument should be a hash or reference to a hash that
contains any number of user generated key-value pairs. Various bits of
identifying information can be stored here including name(s),
database locations, accession numbers, URL's, etc.
type: The ``type'' argument should be a string value describing the sequence type, e.g. ``Dna'', ``Rna'' or ``Amino''.
origin: The ``origin'' argument should be a string value describing sequence origin info.
numbering: The ``numbering'' argument should be an integer value containing the sequence numbering offset value. By default all sequences are numbered starting with 1.
ffmt: The ``ffmt'' argument should be a string describing sequence file-format. If a sequence is being read from a file via the ``file'' argument, ``ffmt'' is used to invoke the proper parsing code. ``ffmt'' is also the default format for sequence output when the layout method is called. See elsewhere in this documentation for info regarding recognized sequence file-formats.
If most of these arguments were used at once to create a sequence object, it would look something like this:
#Set up the name hash
%names = ( 'CloneID'  => 'DB1',
           'Isolate'  => '5',
           'Tissue'   => 'Xenopus',
           'Location' => '/usr2/users/dag/bioperl/sample.tfa' );
$name_ref = \%names;
#Create the object
$myseq = Bio::PreSeq->new(-file=>'sample.tfa',
-names=>$name_ref,
-type=>'Dna',
-origin=>'Xenopus mesoderm',
-numbering=>'1',
-desc=>'Sample Bio::PreSeq sequence',
-ffmt=>'Fasta');
For each defined way to access information from a biosequence object, there is a corresponding "method" that is invoked. What follows is a brief description of each accessor method. For more detailed information see the individual annotations for each method near the end of this document.
Sequence
The sequence can be accessed in several ways via the getseq() method. Depending on how it is invoked, it can return either a string or a list value. Both examples are appropriate:
@sequence_list = $myseq->getseq;
$sequence_string = $myseq->getseq;
Sequence "slices" can be accessed by passing start and stop integer position arguments to getseq():
@slice = $myseq->getseq($start,$stop);
@slice = $myseq->getseq(1,50);
@slice = $myseq->getseq(100);
If no stop value is passed in, getseq() will return a slice from the start position to the end of the sequence. Slices are returned in the context of the object "numbering" attribute, not absolute position, so be aware of the object's numbering scheme.
Sequences can also be accessed with the ary() and str() methods. The ary() method will always return a list value and str() will always return a string. Otherwise they are functionally identical to the getseq() method.
$sequence = $myseq->str;
@sequence = $myseq->ary;
@slice = $myseq->ary($start,$stop);
$slice = $myseq->str($start,$stop);
Sequence length
The sequence length can be accessed by
$len = $myseq->seq_len;
Sequence ID
The ID field can be accessed by
$ID = $myseq->id;
Description
The object description field can be accessed by
$description = $myseq->desc;
Names
The associative array (hash) that contains flexible information regarding alternative sequence names, database locations, accession numbers, etc. can be accessed by
%name_hash = $myseq->names;
Sequence numbering
The default numbering offset for the sequence can be accessed by
$numbering = $myseq->numbering;
Sequence Origin
The object origin field can be accessed by
$seq_origin = $myseq->origin;
File input format / default output format
The object format field can be accessed by
$format = $myseq->ffmt;
In the previous section it was shown how object attributes and values could
be retrieved from a sequence object by calling upon various methods. Many
of the above methods will also allow the user to CHANGE object attributes
by passing in additional arguments. Detailed information on each method can
be found in the Appendix.
Changing the sequence
The sequence information for an object can be changed by passing a string
or list value to the setseq() method. Here are some ways that sequence
information can be changed:
$myseq->setseq($new_sequence_string);
$myseq->setseq(@new_sequence_list);
$myseq->setseq("aaccttgcctgc");
The setseq() method checks sequence elements and warns if it finds
non-standard characters. Because of this, arbitrary sequence compositions
are not supported at this time. This method is considered slightly
'insecure' because the 'id','desc' and 'type' fields are not updated
along with the sequence. If necessary, the user must make the appropriate
changes to these fields whenever sequence information is updated or changed.
Changing the sequence ID
The ID field can be changed by passing in a new ID argument
$myseq->id($new_id);
Changing the object description
The object description field can be changed by passing in a new argument
$myseq->desc($new_desc);
Changing the object names hash
The associative array (hash) that contains flexible information regarding
alternative sequence names, database locations, accession numbers, etc. can
be changed by passing in a reference to a new hash.
$hash_ref = \%name_hash;
$myseq->names($hash_ref);
Changing the sequence numbering offset
The default numbering offset for the sequence can be changed by passing in
a new value
$myseq->numbering(1);
$myseq->numbering($new_value);
Sequence Origin
The object origin field can be changed by passing in a new string value
$myseq->origin("mitochondrial");
$myseq->origin($origin_string);
File input format / default output format
The object format field can be changed by passing in a new value
$myseq->ffmt("GCG");
Creating, accessing and changing biosequence objects and fields is all well and good, but eventually you are going to want to actually do some work.
Included with PreSeq.pm are some commonly used utility methods for
manipulating sequence data. So far PreSeq.pm contains methods for:
Copying a biosequence object
$new_obj = $myseq->copy;
Reversing a sequence
$reversed_seq = $myseq->reverse;
Complementing a sequence
The 2nd strand, or "complement" of a biosequence can be obtained by calling
upon the complement method.
$comp_seq = $myseq->complement;
Reverse complementing a sequence
$rev_comp = $myseq->revcom;
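As an illustration of what complementing and reverse complementing involve, here is a minimal standalone sketch. It is not the code used inside PreSeq.pm; the function names complement_str() and revcom_str() are hypothetical, and the mapping assumes a Dna sequence written in the nucleotide alphabet listed earlier in this document (ambiguity symbols are mapped to their complementary ambiguity symbols).

```perl
use strict;
use warnings;

# Hypothetical helpers, NOT part of PreSeq.pm.
# Complement a Dna string, including ambiguity symbols
# (M<->K, R<->Y, V<->B, H<->D; W, S and N are their own complements).
sub complement_str {
    my ($seq) = @_;
    (my $comp = $seq) =~
        tr/ACGTMRWSYKVHDBNacgtmrwsykvhdbn/TGCAKYWSRMBDHVNtgcakywsrmbdhvn/;
    return $comp;
}

# Reverse complement: complement first, then reverse the string.
sub revcom_str {
    my ($seq) = @_;
    return scalar reverse complement_str($seq);
}

print complement_str('AACTGGCGTTCGTG'), "\n";
print revcom_str('AACTGGCGTTCGTG'), "\n";
```

Characters outside the mapping (for example '?' and '-') pass through unchanged, which seems a reasonable default for gaps and missing residues.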
Translating Dna to Rna
$rna_seq = $myseq->Dna_to_Rna;
Translating Rna to Dna
$dna_seq = $myseq->Rna_to_Dna;
Translating Dna or Rna to protein
$peptide_seq = $myseq->translate;
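To make the idea of protein translation concrete, the following standalone sketch translates a nucleotide string codon by codon using the standard genetic code. It is only an illustration under stated assumptions: translate_str() is a hypothetical name, translation always starts at the first base, incomplete trailing codons are ignored, and codons containing ambiguity symbols are reported as 'X' (Unknown). The behaviour of the module's own translate() method may differ.

```perl
use strict;
use warnings;

# Build the standard genetic code from the conventional string of
# amino acids, ordered by first, second and third base in T, C, A, G order.
my @bases = qw(T C A G);
my $amino = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($amino, $i++, 1);
        }
    }
}

# Hypothetical helper, NOT part of PreSeq.pm.
sub translate_str {
    my ($seq) = @_;
    (my $dna = uc $seq) =~ tr/U/T/;          # treat Rna as Dna
    my $protein = '';
    for (my $pos = 0; $pos + 3 <= length($dna); $pos += 3) {
        my $codon = substr($dna, $pos, 3);
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

print translate_str('ATGGCCTAA'), "\n";
```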
Checking the sequence alphabet
To check if any nonstandard characters are present in a biosequence, an alphabet_ok() method is provided. The method returns "1" if everything is OK, otherwise it returns a "0".
if($myseq->alphabet_ok) { print "OK!!\n"; }
else { print "Not OK! \n"; }
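For readers curious about what such a check involves, here is a minimal standalone sketch. It is an assumption-laden illustration, not the module's implementation: the name alphabet_is_ok() and the %alphabet hash are hypothetical, and the allowed symbols are simply copied from the alphabet tables earlier in this document, plus '?' and '-'.

```perl
use strict;
use warnings;

# Hypothetical alphabets, taken from the symbol tables in this document,
# plus the '?' (missing) and '-' (gap) symbols.
my %alphabet = (
    Dna   => 'ACGTMRWSYKVHDBXN?-',
    Rna   => 'ACGUMRWSYKVHDBXN?-',
    Amino => 'ABCDEFGHIKLMNPQRSTVWXYZ*?-',
);

# Hypothetical helper, NOT part of PreSeq.pm.
# Returns 1 if every character of $seq is in the alphabet for $type,
# otherwise 0. Case is ignored; an unknown type returns 0.
sub alphabet_is_ok {
    my ($seq, $type) = @_;
    my $allowed = $alphabet{$type};
    return 0 unless defined $allowed;
    my $class = quotemeta $allowed;      # escape '?', '-' and '*'
    return uc($seq) =~ /^[$class]*$/ ? 1 : 0;
}

print alphabet_is_ok('ACTGTGGCGTCAACTG', 'Dna'), "\n";
print alphabet_is_ok('ACTG1', 'Dna'), "\n";
```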
There are several methods for outputting formatted sequences. For your
convenience, a "meta-output" method called layout() also exists.
If layout() is called without any arguments, it calls upon the output
methods as defined by the "ffmt" field.
print $myseq->layout;
The "ffmt" field is mainly used to describe the format of a sequence
being read in from a file. It is also used as the default format for
all sequence output. If these differ (ie; the format that the
sequence was read in is not desired as a default output style) then
"ffmt" should be set manually via the ffmt() accessor method. Of course,
after reading the sequence in you are free to change "ffmt" at will.
layout() can also be called with specific formats:
$gcg_formatted_seq = $myseq->layout("GCG");
$fasta_seq = $myseq->layout("Fasta");
Calling output methods directly
Many output methods accept unique named parameters/arguments that allow a
greater degree of control over output format and style. To take advantage
of these abilities, the formatting methods must be called directly. See the
appendix notes describing each output format for detailed information.
print $myseq->out_GCG(-date=>"10 May 1996",
-caps=>"up");
Most output methods will return either a string or list value depending
on how they are invoked; check the detailed method documentation in
the Appendix to be sure.
@formatted_seqlist = $myseq->out_genbank(-id=>'New ID',
-def=>'User defined definition',
-acc=>'User defined accession');
$formatted_seqstring = $myseq->out_genbank(-id=>'New ID',
-def=>'User defined definition',
-acc=>'User defined accession');
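The "string or list depending on how it is invoked" behaviour is typically built on Perl's wantarray function. The sketch below shows the general technique with a hypothetical formatter, format_lines(), that is not part of PreSeq.pm: in list context it returns one element per output line, in scalar context it returns the same lines joined into a single string.

```perl
use strict;
use warnings;

# Hypothetical formatter, NOT part of PreSeq.pm.
# Splits $seq into lines of $width characters.
sub format_lines {
    my ($seq, $width) = @_;
    my @lines = map { "$_\n" } unpack("(A$width)*", $seq);
    # wantarray is true when the caller wants a list.
    return wantarray ? @lines : join('', @lines);
}

my @list   = format_lines('ACTGTGGCGTCAACTG', 10);   # list context
my $string = format_lines('ACTGTGGCGTCAACTG', 10);   # scalar context

print scalar(@list), " lines\n";
print $string;
```

One consequence of this design is that `print $obj->method` calls the method in list context, since print takes a list; the output is the same either way because the lines already carry their newlines.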
[to be completed]
[to be completed]
Title : new
Usage : $mySeq = Bio::PreSeq->new($file,$seq,$id,$desc,$names,
$numbering,$type,$ffmt,$descffmt);
: - or -
: $mySeq = Bio::PreSeq->new(-file=>$file,
-seq=>$seq,
-id=>$id,
-desc=>$desc,
-names=>$names,
-numbering=>$numbering,
-type=>$type,
-origin=>$origin,
-ffmt=>$ffmt,
-descffmt=>$descffmt);
Function : The constructor for this class, returns a new object.
Example : See usage
Returns : Bio::PreSeq object
Argument : $file: file from which the sequence data can be read; all
the other arguments will overwrite the data read in.
"_nofile" is recommended if no file is given.
$seq: String or array of characters
$id: String describing the ID the user wishes to assign.
$desc: String giving a description of the sequence
$names: A reference to a hash which stores {loc,name}
pairs of other database locations and corresponding names
where the sequence is located.
$numbering: The offset of the sequence, as an integer
$type: The type of the sequence, see type()
$origin: The sequence origin
$ffmt: Sequence format, see ffmt()
$descffmt: format of $desc, see descffmt()
Title : _initialize
Usage : n/a (internal function)
Function : Assigns initial parameters to a blessed object.
Example :
Returns :
Argument : As Bio::PreSeq->new, allows for named or listed parameters.
See ->new for the legal types of these values.
Title : _rearrange
Usage : n/a (internal function)
Function : Rearranges named parameters to requested order.
Example : $self->_rearrange([SEQUENCE,ID,DESC],@p);
Returns : @params - an array of parameters in the requested order.
Argument : $order : a reference to an array which describes the desired
order of the named parameters.
@param : an array of parameters, either as a list (in
which case the function simply returns the list),
or as an associative array (in which case the
function sorts the values according to @{$order}
and returns that new array).
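The rearranging step described above can be sketched as follows. This is a standalone illustration, not the module's internal code: rearrange_params() is a hypothetical name, and the sketch assumes that a call is "named" whenever its first argument begins with a dash, which would misfire if a positional first argument ever started with one.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the rearranging step, NOT the module's code.
# $order is a reference to an array of parameter names in the wanted order;
# @param is either a plain positional list or a list of -name => value pairs.
sub rearrange_params {
    my ($order, @param) = @_;

    # Positional call: nothing to do, return the list unchanged.
    return @param
        unless @param && defined $param[0] && $param[0] =~ /^-/;

    # Named call: strip the leading dash and ignore case of the names.
    my %named = @param;
    my %by_name;
    for my $key (keys %named) {
        (my $name = uc $key) =~ s/^-//;
        $by_name{$name} = $named{$key};
    }
    # Names that were not supplied come back as undef.
    return map { $by_name{ uc $_ } } @$order;
}

my ($seq, $id, $desc) = rearrange_params(
    [qw(SEQ ID DESC)],
    -desc => 'Sample sequence', -seq => 'ACTG', -id => 'sample',
);
print "$id: $seq ($desc)\n";
```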
Title : _seq()
Usage : n/a, internal function
Function : called by new() to set sequence field. Checks
: alphabet before setting.
:
Returns : n/a
Argument : sequence string
Title : _monomer()
Usage : n/a, internal function
Function : Returns the internal monomer that represents
: sequence type.
:
: Sequence type is treated internally as a monomer
: defined by the %SeqAlph hash. The type field
: is a list of format [monomer,origin]. For any
: output outside the module, the monomer is resolved
: back into string form via the %TypeSeq hash.
:
Returns : original type setting [as monomer]
Argument : none
Title : _file_read()
Usage : n/a (Internal Function)
Function : _file_read is called whenever the constructor is called
: with the name of a sequence to be read from disk.
:
Example : n/a, only called upon by _initialize()
Returns :
Argument :
Title : str
Usage : str([$start,[$end]])
Function : Returns the sequence of the object as a string, or a slice
of the sequence if $start/$end are defined. If $start is
defined and $end isn't, the slice is from $start to the
end of the sequence.
Example : $slice = $myObject->str(3,9);
Returns : string scalar
Argument : $start,$end (both integers). They are interpreted w.r.t. the
specific numeration of the sequence!! ($self->{numbering})
Title : getseq
Usage : getseq([$start,[$end]])
Function : Returns the sequence of the object as an array or a char
string, depending on the value of wantarray. Will return a slice
of the sequence if $start/$end are defined. If $start is
defined and $end isn't, the slice is from $start to the
end of the sequence.
Example : @slice = $myObject->getseq(3,9);
Returns : regular array of characters, or a scalar string
Argument : $start,$end (both integers). They are interpreted w.r.t. the
specific numeration of the sequence!! ($self->{numbering})
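Because $start and $end are interpreted relative to the object's numbering, a small worked example may help. The sketch below is a standalone illustration that assumes "numbering" is the number assigned to the first residue; slice_str() is a hypothetical helper, not a PreSeq.pm method, and its range checking is an assumption about sensible behaviour rather than a description of the module.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of PreSeq.pm.
# Slice $seq from $start to $end, where both positions are expressed in
# a numbering scheme whose first residue is numbered $numbering.
sub slice_str {
    my ($seq, $numbering, $start, $end) = @_;
    my $last = $numbering + length($seq) - 1;
    $start = $numbering unless defined $start;
    $end   = $last      unless defined $end;   # no $end: run to the end
    die "slice out of range\n"
        if $start < $numbering || $end > $last || $start > $end;
    return substr($seq, $start - $numbering, $end - $start + 1);
}

my $seq = 'ACTGTGGCGTCAACTG';
print slice_str($seq, 1,   3,   9),   "\n";   # numbering starts at 1
print slice_str($seq, 101, 103, 109), "\n";   # same residues, numbering 101
print slice_str($seq, 1,   12),       "\n";   # from 12 to the end
```

The first two calls select the same residues: only the labels change when the numbering offset changes.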
Title : id()
Usage : $seq_id = $myseq->id;
: $myseq->id($id_string);
:
Function : Sets field if an ID argument string is
: passed in. If no arguments, returns ID value for
: object.
:
Returns : original ID value
Argument : ID string
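The combined get/set style used by id() and the other accessors can be sketched as below. This is one possible reading of "returns original value", namely that a call with an argument stores the new value and returns the value held before the call. The class My::Record is hypothetical and exists only for this illustration; the module's accessors may behave differently.

```perl
use strict;
use warnings;

package My::Record;   # hypothetical class, NOT part of PreSeq.pm

sub new {
    my ($class, %args) = @_;
    return bless { id => $args{id} }, $class;
}

# Combined getter/setter: always returns the value held before the call,
# and stores a new value only when an argument is passed in.
sub id {
    my ($self, @args) = @_;
    my $old = $self->{id};
    $self->{id} = $args[0] if @args;
    return $old;
}

package main;

my $rec = My::Record->new(id => 'sample');
print $rec->id, "\n";                      # read the current ID
my $previous = $rec->id('new_id');         # change it, keep the old one
print "$previous -> ", $rec->id, "\n";
```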
Title : desc()
Usage : $description = $myseq->desc;
: $myseq->desc($desc_string);
:
Function : Sets field if an argument string is
: passed in. If no arguments, returns original value for
: object description field.
:
Returns : original value for description
Argument : description string
Title : names()
Usage : %names = $myseq->names;
: $myseq->names($hash_ref);
:
Function : Sets field if a name hash reference is
: passed in. If no arguments, returns original
: names hash.
:
Returns : hash reference (associative array)
Argument : reference to a hash (associative array)
Title : numbering()
Usage : $num_start = $myseq->numbering;
: $myseq->numbering($value);
:
Function : Sets field if an argument is
: passed in. If no arguments, returns original value.
:
Returns : original value
Argument : new value
Title : origin()
Usage : $myseq->origin($value)
Function : Sets the origin field which is actually the second
: field of the Type list. The {type} field is a 2 value list
: with a format of ["Monomer","Origin"]
:
Returns : Original value
Argument : string
Title : type()
Usage : $myseq->type($value)
Function : Sets the type field which is the first
: field of the Type list. The {type} field is a 2 value list
: with a format of ["Monomer","Origin"]
:
Returns : Original value
Argument : string containing a valid sequence type
Title : ffmt()
Usage : $format = $myseq->ffmt;
: $myseq->ffmt("Fasta");
:
Function : The file format field is used by the internal
: sequence parsing code when trying to read
: in a sequence file. It is also what is used
: as a default output format if the layout
: method is called without an argument.
:
: If a sequence object is created without
: reading in a file, or if the file is read
: in with the use of the ReadSeq package then
: the ffmt field can be set to indicate any default
: output-format preference.
:
: If a sequence is read from a file and parsed
: by internal code (ReadSeq not used) then the ffmt
: field should describe the format of the sequence
: file. The ffmt field is used to send the sequence
: to the correct internal parsing code.
:
Returns : original ffmt value
Argument : recognized ffmt string value (see list of recognized
: formats)
Title : descffmt()
Usage : $desc = $myseq->descffmt;
: $myseq->descffmt($new_value);
Function :
:
Returns : original value
Title : setseq()
Usage : $self->setseq($new_sequence);
Function : Changes the sequence inside a bioseq object
:
Returns :
Argument : sequence string
Title : parse
Usage : parse($ent,[$ffmt]);
Function : Invokes the proper parsing code depending on
: the value of the object 'ffmt' field.
Example : $self->parse;
Returns : n/a
Argument : the prospective sequence to be parsed,
: and optionally its format so that it doesn't need to
: be estimated
Title : parse_unknown
Usage : parse_unknown($ent);
Function : tries to figure out the format of $ent and then
: calls the appropriate function to parse it into $self->{seq}.
Example : $self->parse_unknown;
Returns : n/a
Argument : $ent : the rough multi-line string to be parsed
Title : parse_bad
Usage : parse_bad;
Function : complains of un-parsable sequence, last-ditch attempt via
: Parse.pm if sequence is being read from a file.
:
Example : $self->parse_bad;
Returns : n/a
Argument : n/a
Title : parse_raw
Usage : parse_raw;
Function : parses $ent into the $self->{seq} field, using Raw
: file format.
Example : $self->parse_raw;
Returns : n/a
Argument : n/a
Title : parse_fasta
Usage : parse_fasta;
Function : parses $ent into the "seq" field, using Fasta
: file format.
:
To-do : use benchmark module to find best/fastest parse
: method
:
Example : $self->parse_fasta;
Returns : n/a
Argument : n/a
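For orientation, parsing a single-sequence Fasta entry amounts to splitting off the '>' header line and joining the remaining lines. The sketch below illustrates that idea on a plain string; parse_fasta_str() is a hypothetical function, not the module's parse_fasta method, and it assumes the header's first word is the ID and the rest of the header is the description.

```perl
use strict;
use warnings;

# Hypothetical parser, NOT part of PreSeq.pm.
# Takes a single-sequence Fasta string, returns (id, description, sequence).
sub parse_fasta_str {
    my ($ent) = @_;
    my ($header, @lines) = split /\r?\n/, $ent;
    die "not Fasta: missing '>' header\n"
        unless defined $header && $header =~ /^>\s*(\S*)\s*(.*)$/;
    my ($id, $desc) = ($1, $2);
    my $seq = join '', @lines;
    $seq =~ s/\s+//g;                 # drop any remaining whitespace
    return ($id, $desc, $seq);
}

my ($id, $desc, $seq) =
    parse_fasta_str(">sample Sample sequence\nACTGTG\nGCGTCA\n");
print "$id | $desc | $seq\n";
```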
Title : parse_gcg
Usage : used by internal code
Function : Parses the sequence out of a gcg-format string and
: sets the object sequence field accordingly. This is
: a simple, inefficient method for grabbing JUST the
: sequence.
:
To-do : - parse out more info than just sequence
: - implement alphabet checking
: - better regular expressions/efficiency
: - carp on unexpected / wrong-format situations
:
Version : .01 / 16 Jan 1997
Returns : 1
Argument : gcg-formatted sequence string
layout()
Title : layout()
Usage : layout([$format]);
Function : Returns the sequence in whichever format the user specifies,
or in the "ffmt" field if the user does not specify a format.
Example : $fastaFormattedSeq = $myObj->layout("Fasta");
Returns : varies
Argument : $format (one of the formats as defined in $SeqForm).
Title : out_bad()
Usage : out_bad;
Function : Croaks if we don't know the output format.
Example : $self->out_bad;
Returns : n/a
Argument : n/a
Title : out_raw
Usage : out_raw;
Function : Returns the sequence in Raw format.
Example : $self->out_raw;
Returns : string sequence, in raw format
Argument : n/a
Title : out_fasta
Usage : out_fasta;
Function : Returns the sequence as a string in FASTA format.
Example : $self->out_fasta;
:
To-do : benchmark code / find fastest method
:
Returns : string sequence in Fasta format
Argument : n/a
Title : dump
Usage : @results = $mySeq->dump; -or-
: $results = $mySeq->dump;
:
Function : Returns a formatted array or string (depending on how it
: is invoked) containing the contents of a
: Bio::PreSeq object. Useful for debugging
:
: ***This is used by Chris Dagdigian for debugging ***
: ***Probably should be removed before distribution***
:
Example : @results = $mySeq->dump;
: foreach(@results){print;}
: -or-
: print $myseq->dump;
:
Returns : Array or string depending on value of wantarray
Argument : n/a
Title : out_primer()
Usage : $formatted_seq = $myseq->out_primer;
: @formatted_seq = $myseq->out_primer;
:
: print $myseq->out_primer(-id=>'New ID',
: -header=>'This is my header');
:
Function : outputs a sequence in primer format
:
Note : Not a supported output type - (can't be invoked via layout)
: Use at your own risk :)
:
Example : see usage
:
Revision : 0.01 / 20 Dec 1996
Returns : string or list, depending on how it is invoked
Argument : named list parameters for "id" and "header" are allowed
Title : out_pir()
Usage : $formatted_seq = $myseq->layout("PIR");
: $formatted_seq = $myseq->out_pir;
: @formatted_seq = $myseq->out_pir;
:
: print $myseq->out_pir(-title=>'New TITLE',
: -entry=>'New ENTRY',
: -acc=>'User defined accession',
: -date=>'User defined date',
: -reference=>'User defined ref info');
:
Function : Returns a string or an array depending on how it
: is invoked. Can be easily accessed via the layout()
: method, or if more output control is desired it can
: be called directly with the following named parameters:
:
: -entry PIR entry
: -title PIR title
: -acc user defined accession number
: -reference user defined reference
: -date user defined date/time info
:
: All named parameters will take precedence over any
: default behavior. When there are no user arguments,
: the default output is as follows:
:
: PIR 'ENTRY' = sequence object "id" field
: PIR 'TITLE' = sequence object "desc" field
: PIR 'DATE' = current date/time
: PIR 'ACC' = not used in default output
: PIR 'REFERENCE' = not used in default output
:
Note : Not tested stringently.
:
WARNING : Does not deal with numbering issue
:
To-do : - Allow user to pass in hash of additional fields/values
: - Deal with numbering issue
:
Example : see usage
:
Revision : 0.02 / 12 Jan 1997
Returns : string or list, depending on how it is invoked
Argument : named list parameters are allowed, see above
Title : out_genbank()
Usage : $formatted_seq = $myseq->out_genbank;
: @formatted_seq = $myseq->out_genbank;
: print $myseq->out_genbank(-id=>'New ID',
: -def=>'User defined definition',
: -acc=>'User defined accession',
: -origin=>'User defined origin info',
: -spacing=>'single',
: -caps=>'up',
: -date=>'DATE GOES HERE',
: -type=>'mRna');
:
Function : Returns a GenBank formatted sequence array or string
: depending on the value of wantarray when invoked via layout().
: If more control is desired over output format, out_genbank()
: can be addressed directly with the following named parameters:
:
: def - Sequence definition information
: acc - Sequence accession number
: origin - Sequence origin information
: id - short name
: date - new date info
: type - sequence type (Dna, mRna, Amino, etc.)
: spacing - "single" or "double" sequence line spacing
: caps - "up" or "down" sequence capitalization
:
: When invoked via layout() or called directly with no
: arguments, the following default behaviours apply:
: DATE = Current date and time
: DEFINITION = object's description field
: ID = object's ID field
: SPACING = single
:
: All named parameters must be strings. Passed in parameters will
: always take precedence over any fields with default settings.
:
Note : Format not stringently tested for accuracy. Sequence is numbered
: according to the integer specified in the object 'numbering' field
: but the implementation has not been robustly tested.
:
To-do : - allow user hash reference for additional format fields
:
Example : see usage
:
Revision : 0.02 / 12 Jan 1997
Returns : string or list, depending on how it is invoked
Argument : named list parameters are allowed, see above
Title : out_GCG
Usage : $formatted_seq = $mySeq->layout("GCG");
: @formatted_seq = $mySeq->layout("GCG");
:
: print $myseq->out_GCG(-id=>'New ID',
: -spacing=>'single',
: -caps=>'up',
: -date=>'DATE GOES HERE',
: -header=>'This is a user submitted header',
: -type=>'n');
:
Function : Returns a GCG formatted sequence array or string
: depending on the value of wantarray when invoked via layout().
: If more control is desired over output format, out_GCG()
: can be addressed directly with the following named parameters:
:
: header - first line(s) of formatted sequence
: id - short name that appears before 'Length:' field
: date - overwrite default date info
: type - can be "N" or "P", for nucleotide/protein
: spacing - "single" or "double" sequence line spacing
: caps - "up" or "down" sequence capitalization
:
: When invoked via layout() or called directly with no
: arguments, the following default behaviours apply:
: DATE = Current date and time
: DEFINITION = object's description field
: ID = object's ID field
: SPACING = single
:
: All named parameters must be strings. Passed in parameters will
: always take precedence over any fields with default settings.
:
Example :
Output :
:Sample Bio::PreSeq sequence
: sample Length: 240 Wed Nov 27 13:24:28 EST 1996 Type: N Check: 5371 ..
:
: 1 aaaacctatg gggtgggctc tcaagctgag accctgtgtg cacagccctc
: 51 tggctggtgg cagtggagac gggatnnnat gacaagcctg ggggacatga
: 101 ccccagagaa ggaacgggaa caggatgagt gagaggaggt tctaaattat
: 151 ccattagcac aggctgccag tggtccttgc ataaatgtat agagcacaca
: 201 ggtgggggga aagggagaga gagaagaagc cagggtataa
:
:
Note : GCG formatted sequences contain a "Type:" field.
: If Type cannot be internally determined and no
: Type name-parameter is passed in then the Type:
: field is not printed.
:
Warning : Unconventional numbering offsets may not
: be robustly handled
:
Revision : 0.06 / 12 Jan 1997
Source : Found guts of this code on bionet.gcg, unknown author
Returns : Array or String
Argument : named list parameters are allowed, see above
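The sequence body shown in the sample output (numbered lines of 50 residues in blocks of 10) and the "Check:" value can be sketched as follows. This is a minimal standalone sketch, not the out_GCG() source: gcg_seq_lines() and gcg_checksum() are hypothetical names, the column widths are illustrative, and the checksum is the routine commonly used for GCG files, which this module is only assumed to follow.

```perl
use strict;
use warnings;

# Split a sequence into numbered lines of 50 residues, each shown as
# space-separated blocks of 10. $start is the number of the first residue.
sub gcg_seq_lines {
    my ($seq, $start) = @_;
    $start = 1 unless defined $start;
    my @lines;
    for (my $pos = 0; $pos < length($seq); $pos += 50) {
        my $chunk  = substr($seq, $pos, 50);
        my @blocks = $chunk =~ /(.{1,10})/g;
        push @lines, sprintf("%8d  %s", $start + $pos, join(' ', @blocks));
    }
    return @lines;
}

# Commonly used GCG checksum (assumed here): a position weight cycling
# through 1..57 multiplies the character code of each upper-cased
# residue; the sum is taken modulo 10000.
sub gcg_checksum {
    my ($seq) = @_;
    my ($index, $check) = (0, 0);
    for my $char (split //, uc $seq) {
        $index++;
        $check += $index * ord($char);
        $index = 0 if $index == 57;
    }
    return $check % 10000;
}

print "$_\n" for gcg_seq_lines('acgt' x 15);
```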
Title : out_nbrf()
Usage : $self->layout("NBRF") or $self->out_nbrf
:
Function : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
:
: If the ReadSeq wrapper Parse.pm appears
: to be configured properly it is used
: to generate the output.
:
: If Parse.pm cannot be used then this code
: carps out with an error message.
:
To-do : write internal output code
:
Version : 1.0 / 16 MAR 1997
Example : see Usage
Returns : FORMATTED STRING (wantarray is not used here!)
Argument :
Title : out_ig()
Usage : $self->layout("IG") or $self->out_ig
:
Function : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
:
: If the ReadSeq wrapper Parse.pm appears
: to be configured properly it is used
: to generate the output.
:
: If Parse.pm cannot be used then this code
: carps out with an error message.
:
To-do : write internal output code
:
Version : 1.0 / 16 MAR 1997
Example : see Usage
Returns : FORMATTED STRING (wantarray is not used here!)
Argument :
Title : out_strider()
Usage : $self->layout("Strider") or $self->out_strider
:
Function : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
:
: If the ReadSeq wrapper Parse.pm appears
: to be configured properly it is used
: to generate the output.
:
: If Parse.pm cannot be used then this code
: carps out with an error message.
:
To-do : write internal output code
:
Version : 1.0 / 16 MAR 1997
Example : see Usage
Returns : FORMATTED STRING (wantarray is not used here!)
Argument :
Title : out_zuker()
Usage : $self->layout("Zuker") or $self->out_zuker
:
Function : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
:
: If the ReadSeq wrapper Parse.pm appears
: to be configured properly it is used
: to generate the output.
:
: If Parse.pm cannot be used then this code
: carps out with an error message.
:
To-do : write internal output code
:
Version : 1.0 / 16 MAR 1997
Example : see Usage
Returns : FORMATTED STRING (wantarray is not used here!)
Argument :
Title : out_msf()
Usage : $self->layout("MSF") or $self->out_msf
:
Function : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
:
: If the ReadSeq wrapper Parse.pm appears
: to be configured properly it is used
: to generate the output.
:
: If Parse.pm cannot be used then this code
: carps out with an error message.
:
To-do : write internal output code
:
Version : 1.0 / 16 MAR 1997
Example : see Usage
Returns : FORMATTED STRING (wantarray is not used here!)
Argument :
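The fallback behaviour shared by the out_nbrf(), out_ig(), out_strider(), out_zuker() and out_msf() entries (use an external wrapper if it loads, otherwise carp) follows a common Perl pattern. The sketch below is standalone and generic: format_via_optional_module() is a hypothetical name, and the module and routine passed to it are stand-ins, not the actual Parse.pm interface.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical sketch: try to load an optional module at run time and
# call one of its routines. If the module or routine is unavailable,
# carp and return nothing. The result is always a scalar (wantarray is
# not consulted).
sub format_via_optional_module {
    my ($module, $routine, @args) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    unless (eval { require $file; 1 }) {
        carp "Cannot load $module; format not available";
        return;
    }
    my $code = $module->can($routine);
    unless ($code) {
        carp "$module has no routine named $routine";
        return;
    }
    return scalar $code->(@args);
}

# File::Basename is used only because it ships with Perl.
my $name = format_via_optional_module('File::Basename', 'basename', '/a/b/c.txt');
```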
Title : copy
Usage : $copyOfObj = $mySeq->copy;
Function : Returns an identical copy of the object.
Example :
Returns : Bio::PreSeq object ref.
Argument : n/a
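Because the object is a reference to a hash that itself holds references (the names hash and the type list), an independent copy has to duplicate the nested data as well. A minimal sketch of such a deep copy, assuming only hashes and arrays need to be followed; deep_copy() and the class name My::Seq are hypothetical and do not describe how copy() is implemented:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Recursively copy hashes and arrays so the copy shares no nested data
# with the original. Blessed references keep their class; other values
# are returned unchanged.
sub deep_copy {
    my ($data) = @_;
    my $type = reftype($data) || '';
    my $copy;
    if ($type eq 'HASH') {
        my %h;
        $h{$_} = deep_copy($data->{$_}) for keys %$data;
        $copy = \%h;
    }
    elsif ($type eq 'ARRAY') {
        $copy = [ map { deep_copy($_) } @$data ];
    }
    else {
        return $data;
    }
    my $class = blessed($data);
    return defined $class ? bless($copy, $class) : $copy;
}

my $orig = bless { seq => 'ACGT', type => [ 'Dna', 'mitochondrial' ] }, 'My::Seq';
my $copy = deep_copy($orig);
$copy->{type}[0] = 'Rna';     # does not affect $orig
```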
Title : complement
Usage : $complemented_seq = $mySeq->complement;
Function : Returns a char string containing
: the complementary sequence (e.g., the other strand)
: of the original sequence. The translation method
: is identical to revcom() but the nucleotide order
: is not reversed.
:
Example : $complemented_seq = $mySeq->complement;
:
Source : Guts from Jong's <jong@mrc-lmb.cam.ac.uk>
: library of molbio perl routines
Note :
: The letter codes and complement translations
: are those proposed by IUB (Nomenclature Committee,
: 1985, Eur. J. Biochem. 150; 1-5) and are also
: used by the GCG package. The IUB/GCG letter codes
: for nucleotide ambiguity are compatible with
: EMBL, GenBank and PIR database formats but are
: *NOT* compatible with Staden/Sanger ambiguity
: symbols. Staden/Sanger use different symbols to
: represent uncertainty and frame ambiguity.
:
: Currently Staden/Sanger are not recognized
: sequence types.
:
: GCG Documentation on sequence symbols:
URL : http://www.neb.com/gcgdoc/GCGdoc/Appendices/appendix_iii.html
:
:
Translation :
: GCG/IUB   Meaning               Complement
: ------------------------------------------
: A         A                     T
: C         C                     G
: G         G                     C
: T         T                     A
: U         U                     A
: M         A or C                K
: R         A or G                Y
: W         A or T                W
: S         C or G                S
: Y         C or T                R
: K         G or T                M
: V         A or C or G           B
: H         A or C or T           D
: D         A or G or T           H
: B         C or G or T           V
: X         G or A or T or C      X
: N         G or A or T or C      N
: ------------------------------------------
:
Revision : 0.01 / 6 Dec 1996
Returns : char string
Argument : n/a
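The translation table above maps directly onto Perl's tr/// operator. The following standalone sketch shows one way to apply it; complement_iub() and revcom_iub() are hypothetical names, and preserving lower case is an assumption rather than documented behaviour of complement().

```perl
use strict;
use warnings;

# Complement a nucleotide string using the IUB/GCG table. Characters
# not in the table (for example the gap symbol "-") pass through
# unchanged.
sub complement_iub {
    my ($seq) = @_;
    (my $comp = $seq) =~
        tr/ACGTUMRWSYKVHDBXNacgtumrwsykvhdbxn/TGCAAKYWSRMBDHVXNtgcaakywsrmbdhvxn/;
    return $comp;
}

# Reverse complement: the same table with the order reversed.
sub revcom_iub {
    my ($seq) = @_;
    return scalar reverse complement_iub($seq);
}

print complement_iub('ACGTMRN'), "\n";
print revcom_iub('AACG'), "\n";
```

Note that U is complemented to A but A is always complemented to T, so complementing an Rna sequence with this table yields Dna symbols.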
Title : reverse
Usage : $reversed_seq = $mySeq->reverse;
Function : Returns a char string containing the
: reverse of the object sequence
:
Example : $reversed_seq = $mySeq->reverse;
:
Revision : 0.01 / 6 Dec 1996
Returns : char string
Argument : n/a
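When reversing a sequence string in plain Perl, the built-in reverse only reverses characters in scalar context; in list context it reverses the order of its arguments instead. A short standalone sketch (reverse_seq() is a hypothetical name, not the module's code):

```perl
use strict;
use warnings;

# Reverse the characters of a sequence string. "scalar" forces string
# reversal even when the caller is in list context.
sub reverse_seq {
    my ($seq) = @_;
    return scalar reverse $seq;
}

print reverse_seq('AACG'), "\n";
```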
Title : Dna_to_Rna
Usage : $translated_seq = $mySeq->Dna_to_Rna;
Function : Returns a char string containing the
: Rna translation of the Dna nucleotide sequence
: (Replaces T with U)
:
Example : $translated_seq = $mySeq->Dna_to_Rna;
:
Source : modified from Jong's <jong@mrc-lmb.cam.ac.uk>
: library of molbio perl routines
:
Revision : 0.01 / 6 Dec 1996
Returns : char string
Argument : n/a
Title : Rna_to_Dna
Usage : $translated_seq = $mySeq->Rna_to_Dna;
Function : Returns a char string containing the
: Dna translation of the Rna nucleotide sequence
: (Replaces U with T)
:
Example : $translated_seq = $mySeq->Rna_to_Dna;
:
Revision : 0.01 / 16 MAR 1997
Returns : char string
Argument : n/a
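Both conversions amount to a single-character substitution, which can be sketched with tr///. The helper names below are hypothetical, and handling lower-case letters is an assumption, since the descriptions above only mention T and U.

```perl
use strict;
use warnings;

# Dna -> Rna: replace T with U (both cases), leaving the input intact.
sub dna_to_rna {
    my ($seq) = @_;
    (my $rna = $seq) =~ tr/Tt/Uu/;
    return $rna;
}

# Rna -> Dna: replace U with T (both cases).
sub rna_to_dna {
    my ($seq) = @_;
    (my $dna = $seq) =~ tr/Uu/Tt/;
    return $dna;
}

print dna_to_rna('GATTACA'), "\n";
```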
Title : translate
Usage : $translation = $mySeq->translate;
Function : Returns a char string containing the single-letter
: protein translation of a Dna/Rna sequence
:
: "*" is the default symbol for a stop codon
: "X" is the default symbol for an unknown codon
:
Example : $translation = $mySeq->translate;
: -or- with user defined stop/unknown codon symbols:
: $translation = $mySeq->translate($stop_symbol,$unknown_symbol);
:
Source : modified from Jong's <jong@mrc-lmb.cam.ac.uk>
: library of molbio perl routines
:
To-do : - allow named parameters (just like new and out_GCG )
: - allow "frame" parameter to pick translation frame
:
Revision : 0.01 / 6 Dec 1996
Returns : char string
Argument : optional stop and unknown codon symbols, see Example
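A translation with configurable stop and unknown symbols can be sketched as a codon-table lookup. The code below is a standalone illustration using the standard genetic code and reading frame 1 only. translate_frame1() is a hypothetical name, and details such as ignoring a trailing partial codon are assumptions, not documented behaviour of translate().

```perl
use strict;
use warnings;

# Build the standard genetic code table. Amino acids are listed in the
# conventional T, C, A, G order for first, second and third base.
my %CODON;
{
    my @base = qw(T C A G);
    my @aa   = split //,
        'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    for my $i (0 .. 3) {
        for my $j (0 .. 3) {
            for my $k (0 .. 3) {
                $CODON{"$base[$i]$base[$j]$base[$k]"} = $aa[ 16 * $i + 4 * $j + $k ];
            }
        }
    }
}

# Translate a Dna or Rna string in frame 1. Codons missing from the
# table (for example those containing N) yield $unknown; a trailing
# partial codon is ignored.
sub translate_frame1 {
    my ($seq, $stop, $unknown) = @_;
    $stop    = '*' unless defined $stop;
    $unknown = 'X' unless defined $unknown;
    (my $dna = uc $seq) =~ tr/U/T/;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $aa = $CODON{ substr($dna, $i, 3) };
        $protein .= !defined $aa ? $unknown
                  : $aa eq '*'   ? $stop
                  :                $aa;
    }
    return $protein;
}

print translate_frame1('ATGGCCTAA'), "\n";
```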
Title : version()
Usage : $mySeq->version;
Function : prints Bio::PreSeq current version number
The sequence object is merely a reference to a hash containing all or some of the following fields...
Field      Value
--------------------------------------------------------------
seq        the sequence
id         a short identifier for the sequence
desc       a description of the sequence, in descffmt file-format
names      a hash of identifiers that relate to the sequence;
           these could be Database ID's, Accession #'s, URL's,
           pathnames, etc. Currently there is no set format
           for the names hash and no formal definition of databases
           or names
numbering  numeration scheme, currently is the starting numeration
           or offset for the sequence
type       the sequence type. Is actually a 2 value list of format
           ["monomer","origin"] where monomer is one of the
           recognized sequence types and origin is a string
           description of the sequence's origin (mitochondrial, etc)
ffmt       file-format for the sequence
descffmt   file-format of the description string
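The field layout above can be pictured as a blessed hash reference. The sketch below is illustrative only: My::SeqSketch is a hypothetical class, not Bio::PreSeq, and the default values are placeholders chosen for the example.

```perl
use strict;
use warnings;

{
    package My::SeqSketch;    # hypothetical class name

    # Build a hash-based object with a few of the fields listed above.
    # Fields supplied by the caller override the placeholder defaults.
    sub new {
        my ($class, %field) = @_;
        my $self = {
            seq       => '',
            names     => {},
            numbering => 1,
            type      => [ undef, undef ],    # [monomer, origin]
            %field,
        };
        return bless $self, $class;
    }

    sub seq_len { return length $_[0]{seq} }
    sub monomer { return $_[0]{type}[0] }
}

my $obj = My::SeqSketch->new(
    seq  => 'ACTGTGGCGTCAACTG',
    type => [ 'Dna', 'mitochondrial' ],
);
print $obj->seq_len, "\n";
```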
UnivAln.pm - The biosequence alignment object
Parse.pm - The perl interface to ReadSeq
BioPerl Project Page (URL) http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/
Copyright (c) 1996 Georg Fuellen, Richard Resnick, Steven E. Brenner, Chris Dagdigian and others. All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.