The Seq object is Biopython's core sequence representation. It behaves very much like a Python string, with the addition of an alphabet (allowing, for example, explicit declaration of a protein sequence) and some key biologically relevant methods, such as reverse complementation, transcription and translation.
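A minimal sketch of this string-like behaviour and of a few of the biological methods. It assumes a recent Biopython release, in which a Seq is built from a plain string, so the alphabet argument is left out. The sequence itself is a made-up illustration.

```python
from Bio.Seq import Seq

# Hypothetical short coding sequence, used only for illustration.
gene = Seq("ATGGCCATTGTAATGGGCCGCTGA")

# String-like behaviour: length, slicing and substring search.
print(len(gene))
print(gene[:3])
print(gene.find("GGC"))

# Biologically relevant methods.
revcomp = gene.reverse_complement()
mrna = gene.transcribe()
protein = gene.translate(to_stop=True)  # stop at the first stop codon
print(revcomp)
print(mrna)
print(protein)
```

Each method returns a new Seq object and leaves the original sequence unchanged.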
Sequence annotation is represented using SeqRecord objects, which augment a Seq object with properties such as the record name, identifier and description, and with space for additional key/value terms. The SeqRecord can also hold a list of SeqFeature objects, which describe sub-features of the sequence with their location and their own annotation.
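The relationship between these objects can be sketched as follows, assuming a recent Biopython release. The identifiers, the annotation and the feature coordinates are hypothetical values chosen for illustration.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# A SeqRecord wraps a Seq and adds identifier, name and description.
record = SeqRecord(
    Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"),
    id="demo1",
    name="demo",
    description="hypothetical example record",
)

# Free-form key/value annotation for the whole record.
record.annotations["source"] = "synthetic"

# A sub-feature with its own location and qualifiers (made-up coordinates).
cds = SeqFeature(
    FeatureLocation(0, 24, strand=1),
    type="CDS",
    qualifiers={"gene": ["demoA"]},
)
record.features.append(cds)

# Use the feature's location to pull out the matching part of the sequence.
cds_seq = cds.extract(record.seq)
print(record.id, record.description)
print(cds.type, cds.location, cds_seq.translate(to_stop=True))
```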
The Bio.SeqIO module provides a simple interface for reading and writing biological sequence files in various formats (Table 1); regardless of the file format, the information is held as SeqRecord objects.
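As a rough illustration of this format-independent interface, the sketch below parses FASTA text from an in-memory handle and writes the same records out in a simple tab-separated format. The sequence data are invented, and the format names "fasta" and "tab" are assumed to be available in the installed Biopython release.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical FASTA content; a real script would open a file instead.
fasta_text = (
    ">seq1 first demo sequence\n"
    "ACGTACGT\n"
    ">seq2 second demo sequence\n"
    "GGGTTTAAA\n"
)

# Every record arrives as a SeqRecord, whatever the input format.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
for rec in records:
    print(rec.id, len(rec.seq))

# Writing to another format only needs a different format name.
out = StringIO()
count = SeqIO.write(records, out, "tab")
print(count, "records written")
print(out.getvalue())
```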
Bio.SeqIO interprets multiple sequence alignment file formats as collections of equal length (gapped) sequences. Alternatively, Bio.AlignIO works directly with alignments, including files holding more than one alignment (e.g. re-sampled alignments for bootstrapping, or multiple pairwise alignments). Related module Bio.Nexus, developed for Kauff et al. (2007), supports phylogenetic tools using the NEXUS interface (Maddison et al., 1997) or the Newick standard tree format.
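Reading a file that holds more than one alignment with Bio.AlignIO might look like the sketch below. It assumes a recent Biopython release and uses two invented PHYLIP replicates, of the kind a resampling tool might write one after the other. The helper phylip_block is hypothetical and exists only to build the demonstration input.

```python
from io import StringIO

from Bio import AlignIO


def phylip_block(rows):
    """Format (name, sequence) pairs as one strict PHYLIP alignment."""
    header = f" {len(rows)} {len(rows[0][1])}\n"
    # Strict PHYLIP pads each name to exactly 10 characters.
    return header + "".join(f"{name:<10}{seq}\n" for name, seq in rows)


# Two hypothetical replicates concatenated in a single "file".
phylip_text = phylip_block(
    [("Alpha", "ACGT-A"), ("Beta", "ACGTTA")]
) + phylip_block(
    [("Alpha", "ACGTCA"), ("Beta", "AC-TTA")]
)

# AlignIO.parse yields one alignment object per alignment in the input.
alignments = list(AlignIO.parse(StringIO(phylip_text), "phylip"))
for alignment in alignments:
    # Number of rows, number of columns, and the fifth column as a string.
    print(len(alignment), alignment.get_alignment_length(), alignment[:, 4])
```

For input known to contain exactly one alignment, AlignIO.read returns that alignment directly instead of an iterator.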
| Table 1. Selected Bio.SeqIO or Bio.AlignIO file formats |
Modules for a number of online databases are included, such as the NCBI Entrez Utilities, ExPASy, InterPro, KEGG and SCOP. Bio.Blast can call the NCBI's online BLAST server or a local standalone installation, and includes a parser for the BLAST XML output. Biopython also has wrapper code for other command line tools, such as ClustalW and EMBOSS.
The Bio.PDB module provides a PDB file parser and functionality related to macromolecular structure (Hamelryck and Manderick, 2003). Module Bio.Motif provides support for sequence motif analysis (searching, comparing and de novo learning). Biopython's graphical output capabilities were significantly extended by the recent inclusion of GenomeDiagram (Pritchard et al., 2006).
Biopython contains modules for supervised statistical learning, such as Bayesian methods and Markov models, as well as unsupervised learning, such as clustering (De Hoon et al., 2004).
The population genetics module provides wrappers for GENEPOP (Rousset, 2007), coalescent simulation via SIMCOAL2 (Laval and Excoffier, 2004) and selection detection based on a well-evaluated Fst-outlier detection method (Beaumont and Nichols, 1996).
BioSQL (www.biosql.org) is another OBF-supported initiative, a joint collaboration between BioPerl, Biopython, BioJava and BioRuby to support loading and retrieving annotated sequences to and from an SQL database using a standard schema. Each project provides an object-relational mapping (ORM) between the shared schema and its own object model (a SeqRecord in Biopython). As an example, xBASE (Chaudhuri and Pallen, 2006) uses BioSQL with both BioPerl and Biopython.