The Seq object is Biopython's core sequence representation. It behaves very much like a Python string, with the addition of an alphabet (allowing, for example, the explicit declaration of a protein sequence) and some key biologically relevant methods.
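The idea of a string-like object that carries extra biological methods can be sketched in a few lines of plain Python. This is only an illustration of the design, not Biopython's Seq class: the name `MiniSeq` and its methods are hypothetical, it handles unambiguous DNA letters only, and it carries no alphabet.

```python
class MiniSeq(str):
    """Hypothetical string-like DNA sequence (illustration only, not Bio.Seq)."""

    # Translation table for the four unambiguous DNA letters, both cases.
    _COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

    def complement(self):
        """Return the complementary strand, read in the same direction."""
        return MiniSeq(str.translate(self, self._COMPLEMENT))

    def reverse_complement(self):
        """Return the complementary strand, read in the opposite direction."""
        return MiniSeq(str.translate(self, self._COMPLEMENT)[::-1])


dna = MiniSeq("ATGGCC")

# Ordinary string behaviour is inherited unchanged.
print(len(dna), dna.count("G"), dna.startswith("ATG"))

# Sequence-specific behaviour is added on top.
print(dna.reverse_complement())  # GGCCAT
```

Subclassing `str` keeps the sketch short; a real sequence class may instead wrap its data so that slicing and concatenation also return sequence objects.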
Sequence annotation is represented using SeqRecord objects which augment a Seq object with properties such as the record name, identifier and description and space for additional key/value terms. The SeqRecord can also hold a list of SeqFeature objects which describe sub-features of the sequence with their location and their own annotation.
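The layering described here (a sequence, record-level properties, free-form key/value annotation, and a list of located sub-features with their own annotation) can be pictured with a small data structure. The sketch below is an assumption-laden illustration: `MiniRecord`, `MiniFeature` and all field names are hypothetical and are not the SeqRecord or SeqFeature API.

```python
from dataclasses import dataclass, field


@dataclass
class MiniFeature:
    """Hypothetical sub-feature: a typed region with its own annotation."""
    type: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    qualifiers: dict = field(default_factory=dict)

    def extract(self, seq: str) -> str:
        """Return the part of the parent sequence covered by this feature."""
        return seq[self.start:self.end]


@dataclass
class MiniRecord:
    """Hypothetical annotated sequence record."""
    seq: str
    id: str
    name: str = ""
    description: str = ""
    annotations: dict = field(default_factory=dict)  # extra key/value terms
    features: list = field(default_factory=list)     # list of MiniFeature


record = MiniRecord(
    seq="GGATGGCCTAACC",
    id="example1",
    description="made-up record for illustration",
    annotations={"organism": "unknown"},
)
record.features.append(MiniFeature("CDS", 2, 11, {"gene": "demo"}))

cds = record.features[0].extract(record.seq)
print(cds)  # ATGGCCTAA
```

Keeping feature locations as offsets into the parent sequence means the sequence is stored once, and each feature can recover its own subsequence on demand.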
The Bio.SeqIO module provides a simple interface for reading and writing biological sequence files in various formats, where regardless of the file format, the information is held as SeqRecord objects. Bio.SeqIO interprets multiple sequence alignment file formats as collections of equal length (gapped) sequences. Alternatively, Bio.AlignIO works directly with alignments, including files holding more than one alignment (e.g. re-sampled alignments for bootstrapping, or multiple pairwise alignments). The related module Bio.Nexus, developed for Kauff et al., supports phylogenetic tools using the NEXUS interface (Maddison et al.) or the Newick standard tree format.
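The notion of reading a file into uniform records, and then treating a set of equal length gapped sequences as an alignment, can be sketched with the standard library alone. The functions `parse_fasta` and `is_alignment` below are hypothetical helpers written for this illustration; they handle only simple FASTA text and are not how Bio.SeqIO or Bio.AlignIO are implemented.

```python
import io


def _split_title(title):
    """Split a FASTA title line into (identifier, full description)."""
    words = title.split(None, 1)
    return (words[0] if words else ""), title


def parse_fasta(handle):
    """Yield (identifier, description, sequence) tuples from FASTA text."""
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if title is not None:
                identifier, description = _split_title(title)
                yield identifier, description, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)  # sequence may span several lines
    if title is not None:
        identifier, description = _split_title(title)
        yield identifier, description, "".join(chunks)


def is_alignment(records):
    """True if all sequences (gaps included) share one length."""
    return len({len(seq) for _, _, seq in records}) <= 1


EXAMPLE = """>seq1 first example
ACGT-A
CG
>seq2 second example
ACGTTACG
"""

records = list(parse_fasta(io.StringIO(EXAMPLE)))
print([identifier for identifier, _, _ in records])  # ['seq1', 'seq2']
print(is_alignment(records))  # True
```

Because the parser is a generator, records are produced one at a time, so large files need not be held in memory unless the caller builds a list as done here.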
Selected Bio.SeqIO or Bio.AlignIO file formats
Modules for a number of online databases are included, such as the NCBI Entrez Utilities, ExPASy, InterPro, KEGG and SCOP. Bio.Blast can call the NCBI's online BLAST server or a local standalone installation, and includes a parser for their XML output. Biopython has wrapper code for other command line tools too, such as ClustalW and EMBOSS. The Bio.PDB module provides a PDB file parser, and functionality related to macromolecular structure (Hamelryck and Manderick, 2003). The Bio.Motif module provides support for sequence motif analysis (searching, comparing and de novo learning). Biopython's graphical output capabilities were recently significantly extended by the inclusion of GenomeDiagram (Pritchard et al.).
Biopython contains modules for supervised statistical learning, such as Bayesian methods and Markov models, as well as unsupervised learning, such as clustering (De Hoon et al.).
The population genetics module provides wrappers for GENEPOP (Rousset, 2007), coalescent simulation via SIMCOAL2 (Laval and Excoffier, 2004) and selection detection based on a well-evaluated Fst-outlier detection method (Beaumont and Nichols, 1996).
BioSQL is another OBF-supported initiative, a joint collaboration between BioPerl, Biopython, BioJava and BioRuby to support loading and retrieving annotated sequences to and from an SQL database using a standard schema. Each project provides an object-relational mapping (ORM) between the shared schema and its own object model (a SeqRecord in Biopython). As an example, xBASE (Chaudhuri and Pallen, 2006) uses BioSQL with both BioPerl and Biopython.