The standard approach to determining unique or shared genetic factors across populations is to identify risk alleles in one population and investigate replication in others. However, since populations differ in DNA sequence information, allele frequencies, effect sizes, and linkage disequilibrium patterns, SNP associations identified using a uniform stringent threshold on p values may not be reproducible across populations. Here, we developed rank-based methods to investigate shared or population-specific loci and pathways for childhood asthma across individuals of diverse ancestry. We performed genome-wide association studies on 859,790 SNPs genotyped in 527 affected offspring trios of European, African, and Hispanic ancestry using publicly available asthma data in the database of Genotypes and Phenotypes (dbGaP).
Rank-based analyses showed that there are shared genetic factors for asthma across populations, more at the gene and pathway levels than at the SNP level. Although the top 1,000 SNPs were not shared, 11 genes (RYR2, PDE4D, CSMD1, CDH13, ROBO2, RBFOX1, PTPRD, NPAS3, PDE1C, SEMA5A, and CTNNA2) mapped by these SNPs were shared across populations. Ryanodine receptor 2 (RYR2, a statin response-related gene) showed the strongest association in European (p value = 2.55 × 10−7) and was replicated in African (2.57 × 10−4) and Hispanic (1.18 × 10−3) Americans. Imputation analyses based on the 1000 Genomes Project uncovered additional RYR2 variants associated with asthma. Network and functional ontology analyses revealed that RYR2 is an integral part of dermatological or allergic disorder biological networks, specifically in the functional classes involving inflammatory, eosinophilic, and respiratory diseases.
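The gene-level comparison described above can be illustrated with a minimal Python sketch. It assumes per-population SNP p values and a SNP-to-gene map are already available. The SNP identifiers, the p values, the placeholder genes GENE_X and GENE_Y, and the helper names are all hypothetical, and the cutoff `k` stands in for the top 1,000 SNPs used in the study. It is not the authors' pipeline.

```python
def top_snps(pvalues, k):
    """Return the k SNP ids with the smallest p values (rank-based cutoff)."""
    return set(sorted(pvalues, key=pvalues.get)[:k])


def top_genes(pvalues, snp_to_gene, k):
    """Map the top-k SNPs to genes; SNPs without a gene annotation are skipped."""
    return {snp_to_gene[s] for s in top_snps(pvalues, k) if s in snp_to_gene}


# Toy annotation: three different SNPs fall in the same gene (hypothetical ids).
snp_to_gene = {"s1": "RYR2", "s2": "RYR2", "s3": "RYR2",
               "s4": "PDE4D", "s5": "GENE_X", "s6": "GENE_Y"}

# Toy p values per population (illustrative only, not study results).
populations = {
    "European": {"s1": 1e-6, "s4": 1e-4, "s2": 0.2, "s3": 0.5, "s5": 0.3, "s6": 0.9},
    "African":  {"s2": 1e-4, "s5": 1e-3, "s1": 0.4, "s3": 0.6, "s4": 0.7, "s6": 0.8},
    "Hispanic": {"s3": 1e-3, "s6": 2e-3, "s1": 0.5, "s2": 0.6, "s4": 0.4, "s5": 0.9},
}

k = 2  # rank cutoff; the study used the top 1,000 SNPs
shared_snps = set.intersection(*(top_snps(p, k) for p in populations.values()))
shared_genes = set.intersection(
    *(top_genes(p, snp_to_gene, k) for p in populations.values())
)

print("shared top SNPs:", shared_snps)
print("shared top genes:", shared_genes)
```

In this toy setting no top-ranked SNP is shared, yet one gene is, which mirrors the pattern reported above of more sharing at the gene level than at the SNP level.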
Our rank-based genome-wide analysis revealed for the first time an association of RYR2 variants with asthma and replicated the previously discovered asthma gene PDE4D across human populations. The replication of top-ranked asthma genes across populations suggests that such loci are less likely to be false positives and could indicate true associations. Variants that are associated with asthma across populations could be used to identify individuals who are at high risk for asthma regardless of genetic ancestry.
Asthma; GWAS; Ancestry; Trans-ancestral analysis; Rank analysis; Imputation; dbGaP; 1000 Genomes project; Networks/pathways; RYR2
A genome-wide association study of serum uric acid levels was performed in a relatively isolated population of European descent from an island off the Adriatic coast of Croatia. The study sample included 532 unrelated and 768 related individuals from 235 pedigrees. Inflation due to relatedness was controlled by using genomic control. Genetic association was assessed with 2,241,249 SNPs in 1300 samples after adjusting for age and gender. Our study replicated four previously reported serum uric acid loci (SLC2A9, ABCG2, RREB1, and SLC22A12). The strongest association was found with a SNP in SLC2A9 (rs13129697, P=2.33×10−19), which exhibited significant gender-specific effects: 35.76 μmol/L (P=2.11×10−19) in females and 19.58 μmol/L (P=5.40×10−5) in males. Within this region of high linkage disequilibrium, we also detected a strong association with a non-synonymous SNP, rs16890979 (P=2.24×10−17), a putative causal variant for serum uric acid variation. In addition, we identified several novel loci suggestive of association with uric acid levels (SEMA5A, TMEM18, SLC28A2, and ODZ2), although the P-values (P<5×10−6) did not reach the threshold of genome-wide significance. Together, these findings provide further confirmation of previously reported uric acid-related genetic variants and highlight suggestive new loci for additional investigation.
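Genomic control, mentioned above as the correction for inflation due to relatedness, can be sketched as follows. This is a minimal illustration of the usual median-based form, assuming 1-degree-of-freedom chi-square test statistics. The input values are toy numbers, the function name is hypothetical, and capping the inflation factor at 1 is a common convention rather than something stated in the abstract.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (lambda, adjusted statistics) for 1-df chi-square test statistics."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # convention: never inflate statistics when lambda < 1
    return lam, [x / lam for x in chi2_stats]


# Toy statistics whose median is twice the expected null median.
toy_stats = [0.2, 0.9098728, 3.0]
lam, adjusted = genomic_control(toy_stats)
print("inflation factor:", lam)
print("adjusted statistics:", adjusted)
```

Dividing every statistic by the same factor keeps the SNP ranking unchanged while deflating the p values uniformly.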
Serum uric acid; genome-wide association; Adriatic island population
Genome-wide association studies (GWAS) have identified many common variants associated with complex traits in human populations. Thus far, most reported variants have relatively small effects and explain only a small proportion of phenotypic variance, leading to the issues of ‘missing’ heritability and its explanation. Using height as an example, we examined two possible sources of missing heritability: first, variants with smaller effects whose associations with height failed to reach genome-wide significance and second, allelic heterogeneity due to the effects of multiple variants at a single locus. Using a novel analytical approach we examined allelic heterogeneity of height-associated loci selected from SNPs of different significance levels based on the summary data of the GIANT (stage 1) studies. In a sample of 1,304 individuals collected from an island population off the Adriatic coast of Croatia, we assessed the extent of height variance explained by incorporating the effects of less significant height loci and multiple effective SNPs at the same loci. Our results indicate that approximately half of the 118 loci that achieved stringent genome-wide significance (p-value<5×10−8) showed evidence of allelic heterogeneity. Additionally, including less significant loci (i.e., p-value<5×10−4) and accounting for effects of allelic heterogeneity substantially improved the variance explained in height.
The relationship between sequence polymorphisms and human disease has been studied mostly in terms of effects of single nucleotide polymorphisms (SNPs) leading to single amino acid substitutions that change protein structure and function. However, less attention has been paid to more drastic sequence polymorphisms which cause premature termination of a protein’s sequence or large changes, insertions, or deletions in the sequence. We have analyzed a large set (n = 512) of insertions and deletions (indels) and single nucleotide polymorphisms causing premature termination of translation in disease-related genes. Prediction of protein-destabilization effects was performed by graphical presentation of the locations of polymorphisms in the protein structure, using the Genomes TO Protein (GTOP) database, and manual annotation with a set of specific criteria. Protein-destabilization was predicted for 44.4% of the nonsense SNPs, 32.4% of the frameshifting indels, and 9.1% of the non-frameshifting indels. A prediction of nonsense-mediated decay allowed us to infer which truncated proteins would actually be translated as defective proteins. These cases included the proteins linked to diseases inherited dominantly, suggesting a relation between these diseases and toxic aggregation. Our approach would be useful in identifying potentially aggregation-inducing polymorphisms that may have pathological effects.
Twenty-two single-nucleotide polymorphisms (SNPs) in 10 gene regions previously identified in obesity and type 2 diabetes (T2D) genome-wide association studies (GWAS) were evaluated for association with metabolic traits in a sample from an island population of European descent. We performed a population-based study using 18 anthropometric and biochemical traits considered as continuous variables in a sample of 843 unrelated subjects (360 men and 483 women) aged 18–80 years old from the island of Hvar on the eastern Adriatic coast of Croatia. All eight GWAS SNPs in FTO were significantly associated with weight, body mass index, waist circumference and hip circumference; 20 of the 32 nominal P-values remained significant after permutation testing to correct for multiple comparisons. The strongest associations were found between the two TCF7L2 GWAS SNPs and fasting plasma glucose and HbA1c levels; all four P-values remained significant after permutation tests. Nominally significant associations were found between several SNPs and other metabolic traits; however, the significance did not hold after permutation tests. Although the sample size was modest, our study strongly replicated the association of FTO variants with obesity-related measures and TCF7L2 variants with T2D-related traits. The estimated effect sizes of these variants were larger than or comparable to those in published studies. This is likely attributable to the homogeneous genetic background of the relatively isolated study population.
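Permutation testing for multiple comparisons can be sketched in Python as below. The abstract does not say which permutation scheme was used, so this example assumes a max-statistic (single-step) procedure with absolute Pearson correlation as the test statistic. The genotypes, trait values, and function names are hypothetical toy inputs.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def max_stat_adjusted_pvalues(genotypes, trait, n_perm=200, seed=0):
    """Adjust for testing several SNPs by comparing each observed statistic
    with the permutation distribution of the maximum statistic over all SNPs."""
    rng = random.Random(seed)
    observed = [abs_corr(g, trait) for g in genotypes]
    exceed = [0] * len(genotypes)
    shuffled = list(trait)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-trait link
        max_stat = max(abs_corr(g, shuffled) for g in genotypes)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # Add-one correction keeps estimated p values strictly above zero.
    return [(c + 1) / (n_perm + 1) for c in exceed]


# Toy data: SNP 0 drives the trait, SNP 1 is essentially unrelated.
snp0 = [i % 3 for i in range(30)]
snp1 = [i // 10 for i in range(30)]
trait = [10 + 2 * g + 0.1 * ((7 * i) % 5) for i, g in enumerate(snp0)]

adjusted = max_stat_adjusted_pvalues([snp0, snp1], trait)
print(adjusted)
```

Shuffling the trait rather than the genotypes preserves the correlation structure among SNPs, which is the main reason permutation is preferred over a Bonferroni correction when markers are in linkage disequilibrium.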
genetic association; obesity; type 2 diabetes; FTO; TCF7L2; isolated population
Please see related article: http://www.investigativegenetics.com/content/3/1/2
Guidelines; human identity; legal proceedings; prior odds; subjectivity
Human height is a classical example of a polygenic quantitative trait. Recent large-scale genome-wide association studies (GWAS) have identified more than 200 height-associated loci, though these variants explain only 2–10% of overall variability of normal height. The objective of this study was to investigate the variance explained by these loci in a relatively isolated population of European descent with limited admixture and homogeneous genetic background from the Adriatic coast of Croatia.
In a sample of 1304 individuals from the island population of Hvar, Croatia, we performed genome-wide SNP typing and assessed the variance explained by genetic scores constructed from different panels of height-associated SNPs extracted from five published studies. The combined information of the 180 SNPs reported by Lango Allen et al. explained 7.94% of phenotypic variation in our sample. Genetic scores based on 20–50 SNPs reported by the remaining individual GWA studies explained 3–5% of height variance. These percentages of variance explained were within ranges comparable to the original studies, and heterogeneity tests did not detect significant differences in effect size estimates between our study and the original reports when the estimates were obtained from populations of European descent.
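A genetic score and the variance it explains can be sketched in Python as follows. The sketch assumes an additive score, that is, allele counts weighted by published per-allele effect sizes, and takes the squared correlation between score and trait as the proportion of variance explained. The effect sizes, genotypes, heights, and function names are hypothetical toy values, and the abstract does not specify how the authors constructed their scores.

```python
def genetic_score(allele_counts, effects):
    """Weighted sum of effect-allele counts (0, 1 or 2) over a SNP panel."""
    return sum(c * b for c, b in zip(allele_counts, effects))


def r_squared(x, y):
    """Proportion of variance in y explained by a simple linear regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-allele effects (cm) for a three-SNP panel.
effects = [0.5, 0.3, 0.2]

# Toy genotypes (effect-allele counts per SNP) and heights (cm) for six people.
genotypes = [[2, 1, 0], [0, 0, 1], [1, 2, 2], [2, 2, 1], [0, 1, 0], [1, 0, 2]]
heights = [172.0, 168.5, 171.0, 174.5, 169.0, 170.5]

scores = [genetic_score(g, effects) for g in genotypes]
r2 = r_squared(scores, heights)
print("scores:", scores)
print("variance explained:", r2)
```

With only six toy individuals the variance explained is far higher than the single-digit percentages reported above. The point of the example is the calculation, not the magnitude.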
We have evaluated the portability of height-associated loci and the overall fitting of estimated effect sizes reported in large cohorts to an isolated population. We found proportions of explained height variability were comparable to multiple reference GWAS in cohorts of European descent. These results indicate similar genetic architecture and comparable effect sizes of height loci among populations of European descent.
Identification of missing persons from mass disasters is based on evaluation of a number of variables and observations regarding the combination of features derived from these variables. DNA typing now is playing a more prominent role in the identification of human remains, and particularly so for highly decomposed and fragmented remains. The strength of genetic associations, by either direct or kinship analyses, is often quantified by calculating a likelihood ratio. The likelihood ratio can be multiplied by prior odds based on nongenetic evidence to calculate the posterior odds, that is, by applying Bayes' Theorem, to arrive at a probability of identity. For the identification of human remains, the path creating the set and intersection of variables that contribute to the prior odds needs to be appreciated and well defined. Other than considering the total number of missing persons, the forensic DNA community has been silent on specifying the elements of prior odds computations. The variables include the number of missing individuals, eyewitness accounts, anthropological features, demographics and other identifying characteristics. The assumptions, supporting data and reasoning that are used to establish a prior probability that will be combined with the genetic data need to be considered and justified. Otherwise, data may be unintentionally or intentionally manipulated to achieve a probability of identity that cannot be supported and can thus misrepresent the uncertainty with associations. The forensic DNA community needs to develop guidelines for objectively computing prior odds.
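The dependence of the probability of identity on the prior odds can be made concrete with a short Python sketch. It applies the odds form of Bayes' Theorem described above: posterior odds equal the likelihood ratio times the prior odds. The likelihood ratio of one million and the flat prior of 1/N over N missing persons are illustrative assumptions, not recommended values.

```python
def posterior_probability(likelihood_ratio, prior_probability):
    """Combine a likelihood ratio with a prior probability via Bayes' Theorem."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


lr = 1_000_000  # hypothetical DNA likelihood ratio

# Flat prior of 1/N, where N is the number of missing persons (an assumption).
p_small_event = posterior_probability(lr, 1 / 100)
p_large_event = posterior_probability(lr, 1 / 10_000)

print("100 missing persons:", p_small_event)
print("10,000 missing persons:", p_large_event)
```

The same genetic evidence yields a noticeably lower probability of identity when the prior is spread over more candidates, which is why the assumptions behind the prior odds need to be stated and justified.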
The human CYP1A1_CYP1A2 locus comprises the CYP1A1 (5,988 bp) and CYP1A2 (7,759 bp) transcribed regions, oriented head-to-head, sharing a bidirectional promoter of 23,306 bp. The older CYP1A1 gene appears more conserved and responsible for critical life function(s), whereas the younger CYP1A2 gene might have evolved more rapidly due to environmental (dietary) pressures. A population genetics study might confirm this premise. We combined 60 CYP1A1_CYP1A2 SNPs found in the present study (eight New Guinea Highlanders, eight Samoans, four Dogrib, four Teribe, four Pehuenche, one Caucasian) with those found in a previous study (six West Africans, four Han Chinese, six Germans, four Samoans, and four Dogrib)—yielding a total of 106 SNPs in 106 chromosomes. Resequencing of Oceanians plus Amerindians in the present study yielded 21 New World SNPs (~20%), of which 17 are not previously reported in any SNP database. Various tests revealed selective pressures for both genes and both haploblocks; unfortunately, differences in rates of evolution between the two genes were undetectable. Fay & Wu's H test revealed a “hitchhiking event” centered around four SNPs in the CYP1A1 3'-UTR; a study in silico identified different microRNA-binding patterns in the hitchhiked region, when the mutations were present compared with the mutations absent.
CYP1A1; CYP1A2; aryl hydrocarbon receptor; AHR; microRNA
DNA typing is an important tool in missing-person identification, especially in mass-fatality disasters. Identification by comparing a DNA profile from unidentified human remains with that of a direct (from the person) or indirect (for example, from a biological relative) reference sample and ranking the pairwise likelihood ratios (LRs) is straightforward and well defined. However, for indirect comparison cases in which several members from a family can serve as reference samples, the full power of kinship analysis is not entirely exploited. Because biologically related family members are not genetically independent, more information and thus greater power can be attained by simultaneous use of all pedigree members in most cases, although distant relationships may reduce the power. In this study, an improvement was made on the method for missing-person identification for autosomal and lineage-based markers, by considering jointly the DNA profile data of all available family reference samples. The missing person is evaluated by a pedigree LR of the probability of DNA evidence under alternative hypotheses (for example, that the missing person is unrelated or that they belong to this pedigree with a specified biological relationship) and can be ranked for all pedigrees within a database. Pedigree LRs are adjusted for population substructure according to the recommendations of the second National Research Council (NRCII) Report. A realistic mutation model was also incorporated to accommodate the possibility of false exclusion. The results show that the effect of mutation on the pedigree LR is moderate, but LRs can be significantly decreased by the effect of population substructure. Finally, Y chromosome and mitochondrial DNA were integrated into the analysis to increase the power of identification. A program titled MPKin was developed, combining the aforementioned features to facilitate genetic analysis for identifying missing persons.
The computational complexity of the algorithms is explained, and several ways to reduce the complexity are introduced.
Young age at portoenterostomy has been linked to improved outcome in biliary atresia, but pre-existing biological factors may influence the rate of disease progression. In this study, we aimed to determine whether molecular profiling of the liver identifies stages of disease at diagnosis.
We examined liver biopsies from 47 infants with biliary atresia enrolled in a prospective observational study. Biopsies were scored for inflammation and fibrosis, used for gene expression profiles, and tested for association with indicators of disease severity, response to surgery, and survival at 2 years.
Fourteen of 47 livers displayed predominant histological features of inflammation (N = 9) or fibrosis (N = 5), with the remainder showing similar levels of both simultaneously. By differential profiling of gene expression, the 14 livers had a unique molecular signature containing 150 gene probes. Applying prediction analysis models, the probes classified 29 of the remaining 33 livers into inflammation or fibrosis. Molecular classification into the two groups was validated by the findings of increased hepatic population of lymphocyte subsets or tissue accumulation of matrix substrates. The groups had no association with traditional markers of liver injury or function, response to surgery, or complications of cirrhosis. However, infants with an inflammation signature were younger, while those with a fibrosis signature had decreased transplant-free survival.
Molecular profiling at diagnosis of biliary atresia uncovers a signature of inflammation or fibrosis in most livers. This signature may relate to staging of disease at diagnosis and has implications for clinical outcomes.
Multiple studies have provided compelling evidence that the FTO gene variants are associated with obesity measures. The objective of the study was to investigate whether FTO variants are associated with a broad range of obesity related anthropometric traits in an island population.
We examined genetic association between 29 FTO SNPs and a comprehensive set of anthropometric traits in 843 unrelated individuals from an island population on the eastern Adriatic coast of Croatia. The traits include 11 anthropometrics (height, weight, waist circumference, hip circumference, bicondylar upper arm width, upper arm circumference, and biceps, triceps, subscapular, suprailiac and abdominal skin-fold thicknesses) and two derived measures (BMI and WHR). Using single locus score tests, 15 common SNPs were found to be significantly associated with “body fatness” measures such as weight, BMI, hip and waist circumferences with P-values ranging from 0.0004 to 0.01. Similar but less significant associations were also observed between these markers and bicondylar upper arm width and upper arm circumference. Most of these significant findings could be explained by a mediating effect of “body fatness”. However, one unique association signal between upper arm width and rs16952517 (P-value = 0.00156) could not be explained by this mediating effect. In addition, using a principal component analysis and conditional association tests adjusted for “body fatness”, two novel association signals were identified between upper arm circumference and rs11075986 (P-value = 0.00211) and rs16945088 (P-value = 0.00203).
The current study confirmed the association of common variants of the FTO gene with “body fatness” measures in an isolated island population. We also observed evidence of pleiotropic effects of the FTO gene on fat-free mass, such as frame size and muscle mass assessed by bicondylar upper arm width and upper arm circumference, respectively. These pleiotropic effects might be influenced by variants that are different from the ones associated with “body fatness”.
Eczema is very common and increasing in prevalence. Prospective studies investigating environmental and genetic risk factors for eczema in a birth cohort are lacking. We evaluated risk factors that may promote development of childhood eczema in the Cincinnati Childhood Allergy and Air Pollution Study (CCAAPS) birth cohort (n = 762) of infants with at least one atopic parent. Objective environmental exposure data were available for each participant. At annual physical examinations, children underwent skin prick tests (SPTs), eczema was diagnosed by a clinician, and DNA was collected. Among Caucasian children, 39% developed eczema by age 3. Children with a pet dog were significantly less likely to have eczema at age one (odds ratio (OR) 0.62, 95% confidence interval (CI): 0.40–0.97) or at both ages 2 and 3 (OR = 0.54, 95% CI: 0.30–0.97). This finding was most significant among children carrying the CD14–159C/T CC genotype. Carriers of the CD14–159C/T and IL4Rα I75V single-nucleotide polymorphisms (SNPs) had an increased risk of eczema at ages 2 and 3 (OR 3.44, 95% CI: 1.56–7.57), especially among children who were SPT+. These results provide new insights into the pathogenesis of eczema in high-risk children and support a protective role for early exposure to dogs, especially among those carrying the CD14–159C/T SNP. The results also demonstrate increased susceptibility to eczema among children carrying the combination of CD14 and IL4Rα SNPs.
Sixteen candidate polymorphisms (13 SNPs and 3 microsatellites) in nine genes from four DNA repair pathways were examined in 83 subjects, comprising 23 survivors of childhood cancer, their 23 partners, and 37 offspring, all of whom had previously been studied for G2 chromosomal radiosensitivity. Genotype at the Asp148Glu SNP site in the APEX gene of the base excision repair (BER) pathway was associated with childhood cancer in survivors (P = 0.001, significant even after multiple test adjustment), due to the enhanced frequency of the APEX Asp148 allele among survivors in comparison to that of their partners. Analysis of variance (ANOVA) of G2 radiosensitivity in the pooled sample, as well as family-based association test (FBAT) of the family-wise data, showed sporadic suggestions of associations between G2 radiosensitivity and polymorphisms at two sites (the Thr241-Met SNP site in the XRCC3 gene of the homologous recombinational pathway by ANOVA, and the Ser326Cys site in the hOGG1 gene of the BER pathway by FBAT analysis), but neither of these remained significant after multiple-test adjustment. This pilot study provides an intriguing indication that DNA repair gene polymorphisms may underlie cancer susceptibility and variation in radiosensitivity. Environ. Mol. Mutagen. 48:48–57, 2007.
chromosomal radiosensitivity; DNA repair genes; cancer susceptibility
Here, we constructed a phylogenetic tree of 17 bacterial phyla covering eubacteria and archaea by using a new method and 102 carefully selected orthologs from their genomes. One of the serious disturbing factors in phylogeny construction is the existence of out-paralogs that cannot easily be identified and discarded. In our method, out-paralogs are detected and removed by constructing a phylogenetic tree of the genes in question and examining the clustered genes in the tree. We also developed a method for comparing two tree topologies or shapes, ComTree. Applying ComTree to the constructed tree, we computed the relative number of orthologs that support a node of the tree. This number is called the Positive Ortholog Ratio (POR), which is conceptually and methodologically different from the frequently used bootstrap value. Our study concretely shows drawbacks of the bootstrap test. Our result of bacterial phylogeny analysis is consistent with previous ones showing that hyperthermophilic bacteria such as Thermotogae and Aquificae diverged earlier than the others in the eubacterial phylogeny studied. Our results are consistent whether thermophilic or mesophilic archaea are employed for determining the root of the tree. The earliest divergence of hyperthermophilic eubacteria is supported by genes involved in fundamental metabolic processes such as glycolysis and nucleotide and amino acid syntheses.
Bacterial phylogeny; Concatenated tree; Out-paralog; Ortholog database; Thermophilic eubacteria; Tree evaluation
Arab forces conquered the Indus Delta region in 711 A.D. and, although a Muslim state was established there, their influence was barely felt in the rest of South Asia at that time. By the end of the tenth century, Central Asian Muslims moved into India from the northwest and expanded throughout the subcontinent. Muslims are now the largest religious minority in India, comprising more than 138 million people in a predominantly Hindu population of over one billion. It is unclear whether the Muslim expansion in India was a purely cultural phenomenon or had a genetic impact on the local population. To address this question from a male perspective, we typed eight microsatellite loci and 16 binary markers from the Y chromosome in 246 Muslims from Andhra Pradesh, and compared them to published data on 4,204 males from China, Central Asia, other parts of India, Sri Lanka, Pakistan, Iran, the Middle East, Turkey, Egypt and Morocco. We find that the Muslim populations in general are genetically closer to their non-Muslim geographical neighbors than to other Muslims in India, and that there is a highly significant correlation between genetics and geography (but not religion). Our findings indicate that, despite the documented practice of marriage between Muslim men and Hindu women, Islamization in India did not involve large-scale replacement of Hindu Y chromosomes. The Muslim expansion in India was predominantly a cultural change and was not accompanied by significant gene flow, as seen in other places, such as China and Central Asia.
Y-chromosomal polymorphism; India; Muslim; Hindu
A great amount of data has been accumulated on genetic variations in the human genome, but we still do not know much about how the genetic variations affect gene function. In particular, little is known about the distribution of nonsense polymorphisms in human genes despite their drastic effects on gene products.
To detect polymorphisms affecting gene function, we analyzed all publicly available polymorphisms in a database for single nucleotide polymorphisms (dbSNP build 125) located in the exons of 36,712 known and predicted protein-coding genes that were defined in an annotation project of all human genes and transcripts (H-InvDB ver3.8). We found a total of 252,555 single nucleotide polymorphisms (SNPs) and 8,479 insertions and deletions in the representative transcripts in these genes. The SNPs located in ORFs include 40,484 synonymous and 53,754 nonsynonymous SNPs, and 1,258 SNPs that were predicted to be nonsense SNPs or read-through SNPs. We estimated the density of nonsense SNPs to be 0.85×10−3 per site, which is lower than that of nonsynonymous SNPs (2.1×10−3 per site). On average, nonsense SNPs were located 250 codons upstream of the original termination codon, with the substitution occurring most frequently at the first codon position. Of the nonsense SNPs, 581 were predicted to cause nonsense-mediated decay (NMD) of transcripts that would prevent translation. We found that nonsense SNPs causing NMD were more common in genes involving kinase activity and transport. The remaining 602 nonsense SNPs are predicted to produce truncated polypeptides, with an average truncation of 75 amino acids. In addition, 110 read-through SNPs at termination codons were detected.
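The four SNP classes counted above (synonymous, nonsynonymous, nonsense, and read-through) can be distinguished with a short Python sketch. It assumes the standard genetic code and a single in-frame reference codon per SNP. The function name and example codons are hypothetical, and the sketch does not reproduce the H-InvDB annotation pipeline or the NMD prediction.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_coding_snp(ref_codon, position, alt_base):
    """Classify a substitution at codon position 0, 1 or 2 of ref_codon."""
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"      # includes stop-to-stop changes
    if alt_aa == "*":
        return "nonsense"        # sense codon becomes a stop codon
    if ref_aa == "*":
        return "read-through"    # stop codon becomes a sense codon
    return "nonsynonymous"


print(classify_coding_snp("CAG", 0, "T"))  # CAG (Gln) -> TAG (stop)
print(classify_coding_snp("TAA", 0, "C"))  # TAA (stop) -> CAA (Gln)
print(classify_coding_snp("CTT", 2, "C"))  # CTT (Leu) -> CTC (Leu)
print(classify_coding_snp("GAT", 2, "A"))  # GAT (Asp) -> GAA (Glu)
```

The first example is a first-position substitution, the position at which the abstract reports nonsense substitutions to be most frequent.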
Our comprehensive exploration of nonsense polymorphisms showed that nonsense SNPs exist at a lower density than nonsynonymous SNPs, suggesting that nonsense mutations have more severe effects than amino acid changes. The correspondence of nonsense SNPs to known pathological variants suggests that phenotypic effects of nonsense SNPs have been reported for only a small fraction of nonsense SNPs, and that nonsense SNPs causing NMD are more likely to be involved in phenotypic variations. These nonsense SNPs may include pathological variants that have not yet been reported. These data are available from Transcript View of H-InvDB and VarySysDB (http://h-invitational.jp/varygene/).
We have previously reported that family history of ICH was associated with both lobar and non-lobar ICH. We sought to further examine this finding by analyzing differences by age and Apolipoprotein E genotype.
All cases of hemorrhagic stroke in the Greater Cincinnati area are identified through retrospective screening and a subset is invited to undergo a direct interview and genetic testing. Interviewed subjects are matched to two controls by age, race and gender. Conditional stepwise logistic regression modeling was used to determine if having a first-degree relative with an ICH (FHICH) was an independent risk factor for ICH.
Between 5/97 and 12/02, we recruited 333 cases of ICH. FHICH was found to be an independent risk factor for both lobar ICH (OR=3.9; p=0.04) and non-lobar ICH (OR=5.4; p=0.01) after controlling for the presence of numerous variables. Among non-lobar ICH cases the risk appeared to be predominantly in those <70 years of age. The presence of apolipoprotein E4 was associated with lobar ICH ≥70 years of age but not <70 years of age.
Family history of ICH appears to be a significant risk factor for non-lobar ICH <70 years of age. Presence of Apolipoprotein E4 appears to be a risk factor for lobar ICH ≥70 years of age but not <70 years of age. Family history of ICH is a risk factor for lobar ICH after controlling for the presence of Apo E4.
Genetics; Stroke; Intracerebral Hemorrhage; Family History; Apolipoprotein E
Apolipoprotein E (APOE) and elastin (ELN) are plausible candidate genes involved in the pathogenesis of stroke. We tested for association of variants in APOE and ELN with subarachnoid hemorrhage (SAH) in a population-based study. We genotyped 12 single nucleotide polymorphisms (SNPs) on APOE and 10 SNPs on ELN in a sample of 309 Caucasian individuals, of whom 107 are SAH cases and 202 are age-, race-, and gender-matched controls from the Greater Cincinnati/Northern Kentucky region. Associations were tested at genotype, allele, and haplotype levels. A genomic control analysis was performed to check for spurious associations resulting from population substructure.
At the APOE locus, no individual SNP was associated with SAH after correction for multiple comparisons. Haplotype analysis revealed significant association of the major haplotype (Hap1) in APOE with SAH (p = 0.001). The association stemmed from both the 5' promoter and the 3' region of the APOE gene. APOE ε2 and ε4 were not significantly associated with SAH. No association was observed for ELN at genotype, allele, or haplotype level, and our study failed to confirm previous reports of ELN association with aneurysmal SAH.
This study suggests a role of the APOE gene in the etiology of aneurysmal SAH.
Phanerochaete chrysosporium, the model white rot basidiomycetous fungus, has the extraordinary ability to mineralize (to CO2) lignin and detoxify a variety of chemical pollutants. Its cytochrome P450 monooxygenases have recently been implicated in several of these biotransformations. Our initial P450 cloning efforts in P. chrysosporium and its subsequent whole genome sequencing have revealed an extraordinary P450 repertoire ("P450ome") containing at least 150 P450 genes with as yet unknown function. In order to understand the functional diversity and the evolutionary mechanisms and significance of these hemeproteins, here we report a genome-wide structural and evolutionary analysis of the P450ome of this fungus.
Our analysis showed that P. chrysosporium P450ome could be classified into 12 families and 23 sub-families and is characterized by the presence of multigene families. A genome-level structural analysis revealed 16 organizationally homogeneous and heterogeneous clusters of tandem P450 genes. Analysis of our cloned cDNAs revealed structurally conserved characteristics (intron numbers and locations, and functional domains) among members of the two representative multigene P450 families CYP63 and CYP505 (P450foxy). Considering the unusually complex structural features of the P450 genes in this genome, including microexons (2–10 aa) and frequent small introns (45–55 bp), alternative splicing, as experimentally observed for CYP63, may be a more widespread event in the P450ome of this fungus. Clan-level phylogenetic comparison revealed that P. chrysosporium P450 families fall under 11 fungal clans and the majority of these multigene families appear to have evolved locally in this genome from their respective progenitor genes, as a result of extensive gene duplications and rearrangements.
The P. chrysosporium P450ome, the largest known to date among fungi, is characterized by tandem gene clusters and multigene families. This enormous P450 gene diversity has evolved by extensive gene duplications and intragenomic recombinations of the progenitor genes, presumably to meet the exceptionally high metabolic demand of this biodegradative group of basidiomycetous fungi in ecological niches. In this context, alternative splicing appears to further contribute to the evolution of functional diversity of the P450ome in this fungus. The evolved P450 diversity is consistent with the known vast biotransformation potential of P. chrysosporium. The presented analysis will help design future P450 functional studies to understand the underlying mechanisms of secondary metabolism and oxidative biotransformation pathways in this model white rot fungus.