Cytochrome P450 monooxygenases (CYPs) are a ubiquitous protein family found in all eukaryotes, in most prokaryotes and in Archaea. These heme-containing enzymes catalyze the monooxygenation of a large variety of substrates [1]. CYPs play an essential role in drug metabolism and are therefore a focus of the pharmaceutical industry [2]. They are also of great interest in biotechnology as versatile biocatalysts for synthetic applications [3]. A profound knowledge of the factors mediating the selectivity and activity of these proteins is a prerequisite for developing CYPs with improved properties. Deeper insights into the relationships between sequence, structure and function are therefore of great interest.
According to Nelson's classification [4], CYPs are grouped into homologous families and superfamilies, predominantly based on sequence similarity. The sequence identity between proteins from different superfamilies is extremely low and can be less than 20% [5]. Only three amino acids are totally conserved: the glutamic acid and the arginine of the ExxR motif, which is involved in stabilizing the core and in heme binding [6], and the heme-binding cysteine. However, the increasing number of crystal structures shows that, despite this unusual sequence variability, the overall structure is highly conserved: CYPs consist of structurally conserved modules that are essential for structure and function, and of variable regions that mediate their individual biochemical properties. The conserved secondary structure elements, named αA-L and β1-5, have been identified in all CYP structures and make up the so-called CYP fold [7-9].
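The notions of sequence identity and of totally conserved positions used above can be computed directly from an alignment. The minimal sketch below uses hypothetical helper names (`percent_identity`, `fully_conserved_columns`) and toy aligned fragments that are not real CYP sequences. Identity is computed over gap-free columns only; this is one of several conventions in use, so reported identity values can differ between tools.

```python
from typing import List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two rows of a pairwise alignment ('-' marks a gap).

    Columns with a gap in either row are skipped; identity is computed over
    the remaining gap-free columns (one of several common conventions).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def fully_conserved_columns(alignment: List[str]) -> List[Tuple[int, str]]:
    """Return (1-based column, residue) for every gap-free column whose
    residue is identical in all aligned sequences."""
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    conserved = []
    for col in range(width):
        residues = {seq[col] for seq in alignment}
        if len(residues) == 1 and "-" not in residues:
            conserved.append((col + 1, residues.pop()))
    return conserved


# Hypothetical toy fragments (not real CYP sequences), constructed so that an
# ExxR-like E..R pair and a cysteine are the only fully conserved columns.
toy_alignment = [
    "LVEESRFGHCIA",
    "MIEQTRWGNCLS",
    "FAEKARY-QCVT",
]

print(fully_conserved_columns(toy_alignment))  # [(3, 'E'), (6, 'R'), (10, 'C')]
print(round(percent_identity(toy_alignment[0], toy_alignment[1]), 1))  # 33.3
print(round(percent_identity(toy_alignment[0], toy_alignment[2]), 1))  # 27.3
```

Column 8 is excluded from the conserved set because one sequence has a gap there, even though the other two share a glycine.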
Most CYPs depend on a reductase to supply electrons, present either as a separate protein or as part of a fusion protein. Depending on the nature of their electron transfer partner, CYPs are assigned to different classes. Although no consensus has been reached on the definition of this classification, several schemes have been proposed that subdivide CYPs into up to nine classes [10-12]. The most general scheme, which was applied in this work, discriminates between two major classes of CYPs [13]: class I, which comprises mitochondrial and bacterial CYPs, and class II, which comprises CYPs interacting with a cytochrome P450 reductase-type (CPR-type) FMN/FAD reductase. This scheme is a simplification of the widely accepted classification by Kelly et al., as described in [1]. In addition, some CYPs are known that do not need a reductase for their reaction [14]. Fusion proteins such as the self-sufficient class II CYP102A1 from Bacillus megaterium (P450 BM-3), which contains both a heme domain and a reductase, as well as CYPs that do not require any reductase interaction, are very rare in nature [15]. Therefore, for most CYPs, interaction with the appropriate redox partner is a prerequisite for the reaction to occur. Many different CYP isoenzymes interact with a single reductase, and CYPs of the same class are assumed to have comparable reductase interaction sites [16]. Favorable electrostatic interactions between CYPs and their electron transfer partners are expected [17]. A crystal structure of a CYP-reductase complex is not yet available. Although the kinetics of P450 reduction cannot be generalized across different P450 systems, and concepts regarding a rate-limiting step are not universal [18], electron transfer from the reductase to the heme domain is often slow and is one of the rate-limiting steps in many CYP systems [19]. However, the interactions between the components of the electron transfer systems remain unclear. A deeper understanding of the factors that determine reductase interaction, gained by analyzing the reductase interaction sites of CYPs, will help to improve these interactions and thus lead to optimized enzymes for biocatalytic applications [20].
Previous analyses of structure conservation in CYPs showed that all CYPs share a well-conserved heme-binding structural core formed by αD, αE, αI and αL together with αJ and αK [21]. The β-bulge region containing the thiolate heme ligand is referred to as the Cys-pocket. Between αK and the Cys-pocket lies a structurally conserved region, the so-called 'meander' loop, which spans 7-10 amino acid residues and is thought to play a role in heme binding and in stabilizing the tertiary structure. The proposed reductase interaction face of CYPs mainly comprises αJ/αJ' and the insertion following the meander loop [6]. Since all CYPs are structurally highly similar but differ in substrate specificity and in their electron transfer partners, their different biochemical properties are mediated by the variable regions, which differ in both sequence and structure [8].
Six regions involved in substrate recognition and binding, and hence in determining substrate specificity, have been described as substrate recognition sites (SRSs) [22]. SRS1 lies in the highly variable loop region between αB and αC (BC-loop), SRS2 at the C-terminal end of αF, SRS3 and SRS4 in the N-terminal regions of αG and αI, respectively, SRS5 in β1-4 and SRS6 in β4-1. Substrate access to the binding pocket is controlled by flexible regions of the entrance channel, such as αF and αG, which undergo strong conformational changes upon substrate binding [23,24]. In contrast, the regions directly flanking the binding pocket and thus restricting substrate access to the heme, namely αI, the BC-loop and SRS5, were observed to remain rigid in simulations [25,26]. A systematic analysis of SRS5 in more than 6300 sequences identified individual substrate- and heme-interacting residues in this region, including a hotspot for regio- and stereoselectivity [27]. This SRS5 residue and one position in the BC-loop (F87) had previously been reported as key residues determining activity, regio- and stereoselectivity in CYP102A1 [28-30]. Combinations of substitutions at these two positions were used to design a minimal mutant library with improved selectivity [31]. Because the BC-loop is highly variable, identifying the position corresponding to F87 of CYP102A1 in other CYPs remains a challenge for sequences without structural information.
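Locating the counterpart of F87 in another CYP amounts to translating a residue number through an alignment. The minimal sketch below, with a hypothetical function `map_position` and toy alignment rows, assumes that a reliable pairwise alignment (for example, a structure-based one) is already available. In the variable BC-loop, obtaining that alignment is exactly the hard part, and a purely sequence-based alignment may place gaps differently.

```python
from typing import Optional


def map_position(ref_aln: str, tgt_aln: str, ref_pos: int) -> Optional[int]:
    """Map a 1-based residue number of the reference sequence onto the target.

    ref_aln and tgt_aln are the two rows of a pairwise alignment ('-' = gap).
    Returns the 1-based residue number of the target residue aligned to
    reference residue ref_pos, or None if the target has a gap there.
    """
    if len(ref_aln) != len(tgt_aln):
        raise ValueError("alignment rows must have equal length")
    if ref_pos < 1:
        raise ValueError("ref_pos is 1-based")
    ref_i = tgt_i = 0
    for r, t in zip(ref_aln, tgt_aln):
        if t != "-":
            tgt_i += 1
        if r == "-":
            continue  # insertion in the target relative to the reference
        ref_i += 1
        if ref_i == ref_pos:
            return tgt_i if t != "-" else None
    raise ValueError(f"reference has fewer than {ref_pos} residues")


# Hypothetical toy alignment. With real data, ref_aln would be the CYP102A1 row
# and ref_pos would be 87, using the same residue numbering convention as the
# literature (e.g. whether the initial methionine is counted).
ref_aln = "MKF--LGNF"
tgt_aln = "M-FPALG-W"
print(map_position(ref_aln, tgt_aln, 3))  # 2
print(map_position(ref_aln, tgt_aln, 2))  # None (gap in the target)
```

Returning None rather than a neighbouring residue makes it explicit that the alignment offers no counterpart at that position.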
The Cytochrome P450 Engineering Database (CYPED) [32] was designed as a tool for the comprehensive comparison of protein sequences and structures within the vast and diverse CYP family, so that newly gained insights can be transferred among CYP sequences. Its current version 2.02 contains 8614 sequences [33]. The highly similar structures have been compared in detail to identify the common core and to assign the variable regions. For this purpose, a structural alignment was used as the basis for generating a reliable structure profile. With this profile, all structurally conserved regions (SCRs) could be predicted and annotated in all CYPED protein sequence entries, allowing structural navigation even in sequences that lack structural information. In addition, the CYPED website provides an interface for predicting the SCRs of any user-specified CYP sequence.
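The idea of using a profile derived from a structural alignment to locate a conserved region in a sequence without structural information can be illustrated, under strong simplifications, with a log-odds position-specific scoring matrix (PSSM). The sketch below builds such a matrix from a hypothetical ungapped aligned block and slides it along a query. It is a generic stand-in, not the profile method used by CYPED; dedicated profile tools such as HMMER use profile hidden Markov models, which also model insertions and deletions. The function names, the uniform background and the pseudocount are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log2-odds PSSM from an ungapped aligned block (hypothetical helper).

    Uses a uniform background (1/20) and adds `pseudocount` to every residue
    count so that unseen residues receive a finite, negative score.
    """
    width = len(block[0])
    if any(len(seq) != width for seq in block):
        raise ValueError("block rows must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in block:
            counts[seq[col]] += 1  # raises KeyError for gaps or non-standard letters
        total = sum(counts.values())
        pssm.append({aa: math.log2(counts[aa] / total / background) for aa in AMINO_ACIDS})
    return pssm


def best_window(pssm: List[Dict[str, float]], query: str) -> Tuple[int, float]:
    """Slide the profile along the query; return (1-based start, score) of the best window."""
    width = len(pssm)
    if len(query) < width:
        raise ValueError("query is shorter than the profile")
    best_start, best_score = 0, float("-inf")
    for start in range(len(query) - width + 1):
        score = sum(pssm[i][query[start + i]] for i in range(width))
        if score > best_score:
            best_start, best_score = start, score
    return best_start + 1, best_score


# Hypothetical toy block standing in for one structurally aligned SCR.
scr_block = ["FGNGPRACIG", "FGAGRHICLG", "FSNGKRNCIG"]
query = "MKLAVEEFGYGKRSCAGLTW"
start, score = best_window(build_pssm(scr_block), query)
print(start)  # 8, i.e. the window "FGYGKRSCAG"
```

A fixed-width PSSM cannot place a region whose length varies between sequences, which is one reason profile HMMs are preferred for real annotation tasks.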