Cytochrome P450 proteins (CYPs) are found in all domains of life and represent one of the largest protein families. Their existence predates the emergence of oxygen-metabolizing life forms. CYPs are defined by the absorption of light at 450 nm by the heme cofactor, and they oxidize a very diverse array of metabolic intermediates and environmental compounds. CYPs participate in a large number of primary, secondary, and xenobiotic metabolic reactions.
The evolution of CYPs has been intimately intertwined with organismal adaptation to new ecological niches, owing to the roles of CYPs in producing metabolites critical for specific processes such as pathogenesis, the utilization of specific substrates, or the detoxification of xenobiotics. Based on their roles in synthesizing or neutralizing toxic metabolites, many CYPs are hypothesized to have evolved through the chemical warfare waged among plants, animals, insects, and microbes. In fungi, several CYPs have been implicated in pathogen virulence because they neutralize antifungal compounds produced by hosts. Expansions and diversifications of several CYP families have been associated with the evolution of fungal pathogenicity. Accordingly, functional and evolutionary analyses of CYPs have been useful in understanding the ecological specialization and functional diversification of individual fungal taxa.
The extraordinary functional and evolutionary diversity of fungal CYPomes presents a major hurdle to CYP classification. Fungal CYPs share little sequence similarity, except for a few conserved residues that are characteristic of CYPs. The most conserved region is the binding domain for the heme cofactor. Substrate-binding regions are much more variable but may possess a signature motif. This motif is often found in conjunction with one or more binding domains, such as those for cytochrome b5 and ferredoxin, and with binding sites for the NADPH cytochrome P450 reductase, which contains FAD (flavin adenine dinucleotide) and FMN (flavin mononucleotide).
Another challenge in developing a comprehensive CYP classification system is the rapidly increasing number of sequenced fungal genomes. Currently, more than 250 genomes are present in the public domain, and this number is predicted to increase rapidly. The rapid influx of genome sequences calls for robust computational tools that can effectively support large-scale comparative analyses of genomes and specific gene families.
The first nomenclature/grouping schema for CYPs, proposed by Nebert et al. in 1987, was based on amino acid sequence similarity. According to this schema, any two CYPs with sequence identity greater than 40% belong to a single CYP family, and any two CYPs with sequence identity greater than 55% belong to the same subfamily. Manually curated databases of CYPs in multiple kingdoms based on this approach (hereafter referred to as Nelson’s P450 databases) have been maintained online. These databases also serve as a central repository of CYP nomenclature. Unfortunately, this schema cannot be used efficiently to curate and classify the rapidly increasing number of CYPs uncovered through genome sequencing.
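The identity thresholds of this schema can be written as a simple pairwise rule. The following is a minimal sketch for illustration only: the function and constant names are hypothetical, and it assumes that a pairwise percent identity has already been computed by an alignment tool of the reader's choice.

```python
# Thresholds taken from the schema described above (percent amino acid identity).
FAMILY_THRESHOLD = 40.0
SUBFAMILY_THRESHOLD = 55.0


def classify_pair(percent_identity: float) -> str:
    """Return the closest shared rank for two CYPs given their percent identity.

    Hypothetical helper: the thresholds are strict ("greater than"), as in the text.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must lie between 0 and 100")
    if percent_identity > SUBFAMILY_THRESHOLD:
        return "same subfamily"
    if percent_identity > FAMILY_THRESHOLD:
        return "same family"
    return "different families"


# Made-up identity values, for illustration only.
for identity in (62.0, 47.5, 31.0):
    print(identity, classify_pair(identity))
```

Because the rule is defined on pairs, applying it to thousands of sequences requires comparing each new CYP against existing family members, which is part of why manual curation scales poorly.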
The clan system approach was developed to support higher-level grouping of families identified via the sequence similarity-based schema. This approach places all CYP families with a monophyletic origin into a single clan and has been successfully applied to classify CYP families in Metazoa and four fungal species. For example, if new CYPs have equal identity to two or more CYP families, they can be tentatively assigned to the clan to which these families belong. Since the introduction of the “clan concept” in 1998 to classify metazoan CYPs, additional clans in vertebrates (9), plants (11), bivalves (4), and fungi (115) have been identified. However, the clan classification system has become problematic for classifying the pan-fungal CYPome, because the number of fungal CYPs is too large to conduct phylogenetic analyses efficiently. Automated clustering based on sequence similarity remains the gold standard for the rapid classification of large protein sets. This approach does not require any prior knowledge and allows for rapid clustering of large protein families such as CYPs.
In 2008, we employed an automated clustering approach to build the Fungal Cytochrome P450 Database (FCPD). Since then, the number of sequenced fungal genomes has increased substantially, necessitating an improvement of our classification system. Additionally, the original FCPD classification generated several mega clusters, underscoring the need to optimize clustering parameters.
Here we present FCPD release 1.2, with an improved CYP classification pipeline based on a modified TRIBE-MCL algorithm. The pipeline allowed a larger number of CYP families to be merged into existing clans and supported the discovery of potential new clans. To aid functional annotation, putative functional roles were assigned to over 150 clusters based on their similarity to functionally characterized fungal CYPs. The families and clans are accessible through FCPD, which offers global viewing and analysis of fungal CYPs.
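To give a sense of how similarity-based clustering of this kind operates, the sketch below implements the basic Markov clustering (MCL) iteration on which TRIBE-MCL builds: a similarity graph is turned into a column-stochastic matrix, and alternating expansion (matrix squaring) and inflation (element-wise powering followed by renormalization) steps separate densely connected groups. This is a plain, unoptimized illustration and not the modified pipeline used for FCPD. The function names, the inflation value of 2.0, and the toy similarity matrix are all assumptions made for the example.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: element-wise power, then column renormalization."""
    return normalize_columns([[x ** power for x in row] for row in m])


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster items from a symmetric similarity matrix with basic MCL."""
    n = len(similarity)
    # Add self-loops so that every column has non-zero mass.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Each row that retains mass is an attractor; its non-zero columns
    # are the members of one cluster. Duplicate clusters are merged.
    found = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            found.add(members)
    return sorted(sorted(c) for c in found)


# Toy similarity matrix (made-up values): two tight groups of three
# sequences joined by one weak link between items 2 and 3.
toy = [
    [0.0, 0.9, 0.9, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.9, 0.0, 0.0, 0.0],
    [0.9, 0.9, 0.0, 0.05, 0.0, 0.0],
    [0.0, 0.0, 0.05, 0.0, 0.9, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.9, 0.0],
]
clusters = mcl(toy)
print(clusters)
```

In general, a higher inflation value yields smaller, tighter clusters and a lower value yields larger ones, so this parameter is a natural candidate for tuning when a classification produces overly large clusters.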