1.  A type I interferon transcriptional signature precedes autoimmunity in children genetically at-risk of type 1 diabetes 
Diabetes  2014;63(7):2538-2550.
Diagnosis of the autoimmune disease type 1 diabetes (T1D) is preceded by the appearance of circulating autoantibodies to pancreatic islets. However, almost nothing is known about events leading to this islet autoimmunity. Previous epidemiological and genetic data have associated viral infections and anti-viral type I interferon (IFN) immune response genes with T1D. Here, we first used DNA microarray analysis to identify IFN-β inducible genes in vitro and then used this set of genes to define an IFN-inducible transcriptional signature in peripheral blood mononuclear cells from a group of active systemic lupus erythematosus patients (N=25). Using this predefined set of 225 IFN signature genes, we investigated expression of the signature in cohorts of healthy controls (N=87), T1D patients (N=64) and a large longitudinal birth cohort of children genetically predisposed to T1D (N=109; 454 microarrayed samples). Expression of the IFN signature was increased in genetically-predisposed children prior to the development of autoantibodies (P=0.0012), but not in established T1D patients. Upregulation of IFN-inducible genes was transient, temporally associated with a recent history of upper respiratory tract infections (P=0.0064) and marked by increased expression of SIGLEC-1 (CD169), a lectin-like receptor expressed on CD14+ monocytes. DNA variation in IFN-inducible genes altered T1D risk (P=0.007), as exemplified by IFIH1, one of the genes in our IFN signature and for which increased expression is a known disease risk factor. These findings identify transient increased expression of type I IFN genes in pre-clinical diabetes as a risk factor for autoimmunity in children with a genetic predisposition to T1D.
PMCID: PMC4066333  PMID: 24561305
2.  Postthymic Expansion in Human CD4 Naive T Cells Defined by Expression of Functional High-Affinity IL-2 Receptors
As the thymus involutes with age, the maintenance of peripheral naive T cells in humans becomes strongly dependent on peripheral cell division. However, mechanisms that orchestrate homeostatic division remain unclear. In this study we present evidence that the frequency of naive CD4 T cells that express CD25 (IL-2 receptor α-chain) increases with age on subsets of both CD31+ and CD31− naive CD4 T cells. Analyses of TCR excision circles from sorted subsets indicate that CD25+ naive CD4 T cells have undergone more rounds of homeostatic proliferation than their CD25− counterparts in both the CD31+ and CD31− subsets, indicating that CD25 is a marker of naive CD4 T cells that have preferentially responded to survival signals from self-Ags or cytokines. CD25 expression on CD25− naive CD4 T cells can be induced by IL-7 in vitro in the absence of TCR activation. Although CD25+ naive T cells respond to lower concentrations of IL-2 as compared with their CD25− counterparts, IL-2 responsiveness is further increased in CD31− naive T cells by their expression of the signaling IL-2 receptor β-chain CD122, forming with common γ-chain functional high-affinity IL-2 receptors. CD25 plays a role during activation: CD25+ naive T cells stimulated in an APC-dependent manner were shown to produce increased levels of IL-2 as compared with their CD25− counterparts. This study establishes CD25+ naive CD4 T cells, which are further delineated by CD31 expression, as a major functionally distinct immune cell subset in humans that warrants further characterization in health and disease.
PMCID: PMC3614027  PMID: 23418630
3.  Genomic clustering and co-regulation of transcriptional networks in the pathogenic fungus Fusarium graminearum 
BMC Systems Biology  2013;7:52.
Genes for the production of a broad range of fungal secondary metabolites are frequently colinear. The prevalence of such gene clusters was systematically examined across the genome of the cereal pathogen Fusarium graminearum. The topological structure of transcriptional networks was also examined to investigate control mechanisms for mycotoxin biosynthesis and other processes.
The genes associated with transcriptional processes were identified, and the genomic location of transcription-associated proteins (TAPs) analyzed in conjunction with the locations of genes exhibiting similar expression patterns. Highly conserved TAPs reside in regions of chromosomes with very low or no recombination, contrasting with putative regulator genes. Co-expression group profiles were used to define positionally clustered genes and a number of members of these clusters encode proteins participating in secondary metabolism. Gene expression profiles suggest there is an abundance of condition-specific transcriptional regulation. Analysis of the promoter regions of co-expressed genes showed enrichment for conserved DNA-sequence motifs. Potential global transcription factors recognising these motifs contain distinct sets of DNA-binding domains (DBDs) from those present in local regulators.
Proteins associated with basal transcriptional functions are encoded by genes enriched in regions of the genome with low recombination. Systematic searches revealed dispersed and compact clusters of co-expressed genes, often containing a transcription factor, and typically containing genes involved in biosynthetic pathways. Transcriptional networks exhibit a layered structure in which the position in the hierarchy of a regulator is closely linked to the DBD structural class.
PMCID: PMC3703260  PMID: 23805903
Transcriptional networks; DNA-binding domains; mycotoxin biosynthesis; filamentous fungi; gene clusters
4.  An automated graphics tool for comparative genomics: the Coulson plot generator 
BMC Bioinformatics  2013;14:141.
Comparative analysis is an essential component of biology. When applied to genomics, for example, analysis may require comparisons between the predicted presence and absence of genes in a group of genomes under consideration. Frequently, genes can be grouped into small categories based on functional criteria, for example membership of a multimeric complex, participation in a metabolic or signaling pathway or shared sequence features and/or paralogy. These patterns of retention and loss are highly informative for the prediction of function, and hence possible biological context, and can provide great insights into the evolutionary history of cellular functions. However, a standard spreadsheet is a poor visual format from which to extract patterns within a dataset.
We devised the Coulson Plot, a new graphical representation that exploits a matrix of pie charts to display comparative genomics data. Each pie is used to describe a complex or process from a separate taxon, and is divided into sectors corresponding to the number of proteins (subunits) in a complex/process. The predicted presence or absence of proteins in each complex is delineated by occupancy of a given sector; this format is visually highly accessible and makes pattern recognition rapid and reliable. A key to the identity of each subunit, plus hierarchical naming of taxa and coloring are included. A Java-based application, the Coulson plot generator (CPG), automates graphic production, taking a tab- or comma-delimited text file as input and generating an editable portable document format (PDF) or SVG file.
CPG software may be used to rapidly convert spreadsheet data to a graphical matrix pie chart format. The representation essentially retains all of the information from the spreadsheet but presents a graphically rich format making comparisons and identification of patterns significantly clearer. While the Coulson plot format is highly useful in comparative genomics, its original purpose, the software can be used to visualize any dataset where entity occupancy is compared between different classes.
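The sector layout described above can be sketched in a few lines of Python. This is a minimal illustration only, not the CPG implementation: the input layout (a header row of subunit names, one row per taxon, `1`/`0` for predicted presence/absence), the function names `parse_presence_table` and `pie_sectors`, and the example taxa and subunits are all assumptions made for this sketch. It computes, for each taxon, one equal-width sector per subunit and marks the sector as filled when the subunit is predicted to be present.

```python
import csv
import io


def parse_presence_table(text, delimiter="\t"):
    """Parse a delimited table (hypothetical layout, not the CPG file format).

    Header row: a label column followed by subunit names.
    Each later row: a taxon name followed by "1" (present) or "0" (absent).
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    subunits = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        taxon, flags = row[0], row[1:]
        table[taxon] = {
            name: flag.strip() == "1" for name, flag in zip(subunits, flags)
        }
    return subunits, table


def pie_sectors(subunits, presence):
    """Return one (name, start_deg, end_deg, filled) tuple per subunit.

    Every subunit gets an equal share of the 360-degree pie; `filled`
    is True when the subunit is predicted to be present in the taxon.
    """
    width = 360.0 / len(subunits)
    return [
        (name, i * width, (i + 1) * width, presence.get(name, False))
        for i, name in enumerate(subunits)
    ]


# Made-up example data: two taxa, one four-subunit complex.
EXAMPLE = (
    "taxon\tSubA\tSubB\tSubC\tSubD\n"
    "Taxon1\t1\t1\t0\t1\n"
    "Taxon2\t1\t0\t0\t0\n"
)

subunits, table = parse_presence_table(EXAMPLE)
for taxon, presence in table.items():
    filled = [name for name, _, _, on in pie_sectors(subunits, presence) if on]
    print(taxon, filled)
```

Each tuple holds what a drawing routine would need to render one sector of one pie; laying the pies out as a taxon-by-complex matrix and writing the graphic file are left out of the sketch.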
CPG software is available at sourceforge and
PMCID: PMC3668160  PMID: 23621955
5.  Functional IL6R 358Ala Allele Impairs Classical IL-6 Receptor Signaling and Influences Risk of Diverse Inflammatory Diseases 
PLoS Genetics  2013;9(4):e1003444.
Inflammation, which is directly regulated by interleukin-6 (IL-6) signaling, is implicated in the etiology of several chronic diseases. Although a common, non-synonymous variant in the IL-6 receptor gene (IL6R Asp358Ala; rs2228145 A>C) is associated with the risk of several common diseases, with the 358Ala allele conferring protection from coronary heart disease (CHD), rheumatoid arthritis (RA), atrial fibrillation (AF), abdominal aortic aneurysm (AAA), and increased susceptibility to asthma, the variant's effect on IL-6 signaling is not known. Here we provide evidence for the association of this non-synonymous variant with the risk of type 1 diabetes (T1D) in two independent populations and confirm that rs2228145 is the major determinant of the concentration of circulating soluble IL-6R (sIL-6R) levels (34.6% increase in sIL-6R per copy of the minor allele 358Ala; rs2228145 [C]). To further investigate the molecular mechanism of this variant, we analyzed expression of IL-6R in peripheral blood mononuclear cells (PBMCs) in 128 volunteers from the Cambridge BioResource. We demonstrate that, although 358Ala increases transcription of the soluble IL6R isoform (P = 8.3×10^−22) and not the membrane-bound isoform, 358Ala reduces surface expression of IL-6R on CD4+ T cells and monocytes (up to 28% reduction per allele; P ≤ 5.6×10^−22). Importantly, reduced expression of membrane-bound IL-6R resulted in impaired IL-6 responsiveness, as measured by decreased phosphorylation of the transcription factors STAT3 and STAT1 following stimulation with IL-6 (P ≤ 5.2×10^−7). Our findings elucidate the regulation of IL-6 signaling by IL-6R, which is causally relevant to several complex diseases, identify mechanisms for new approaches to target the IL-6/IL-6R axis, and anticipate differences in treatment response to IL-6 therapies based on this common IL6R variant.
Author Summary
Interleukin-6 (IL-6) is a complex cytokine, which plays a critical role in the regulation of inflammatory responses. Genetic variation in the IL-6 receptor gene is associated with the risk of several human diseases with an inflammatory component, including coronary heart disease, rheumatoid arthritis, and asthma. A common non-synonymous single nucleotide polymorphism in this gene (Asp358Ala) has been suggested to be the causal variant in this region by affecting the circulatory concentrations of soluble IL-6R (sIL-6R). In this study we extend the genetic association of this variant to type 1 diabetes and provide evidence that this variant exerts its functional mechanism by regulating the balance between sIL-6R (generated through cleavage of the surface receptor and by alternative splicing of a soluble IL6R isoform) and membrane-bound IL-6R. These data show for the first time that the minor allele of this non-synonymous variant (Ala358) directly controls the surface levels of IL-6R on individual immune cells and that these differences in protein levels translate into a functional impairment in IL-6R signaling. These findings may have implications for clinical trials targeting inflammatory mechanisms involving IL-6R signaling and may provide tools for identifying patients with specific benefit from therapeutic intervention in the IL-6R signaling pathway.
PMCID: PMC3617094  PMID: 23593036
6.  Long-range DNA looping and gene expression analyses identify DEXI as an autoimmune disease candidate gene 
Human Molecular Genetics  2011;21(2):322-333.
The chromosome 16p13 region has been associated with several autoimmune diseases, including type 1 diabetes (T1D) and multiple sclerosis (MS). CLEC16A has been reported as the most likely candidate gene in the region, since it contains the most disease-associated single-nucleotide polymorphisms (SNPs), as well as an immunoreceptor tyrosine-based activation motif. However, here we report that intron 19 of CLEC16A, containing the most autoimmune disease-associated SNPs, appears to behave as a regulatory sequence, affecting the expression of a neighbouring gene, DEXI. The CLEC16A alleles that are protective from T1D and MS are associated with increased expression of DEXI, and no other genes in the region, in two independent monocyte gene expression data sets. Critically, using chromosome conformation capture (3C), we identified physical proximity between the DEXI promoter region and intron 19 of CLEC16A, separated by a loop of >150 kb. In reciprocal experiments, a 20 kb fragment of intron 19 of CLEC16A, containing SNPs associated with T1D and MS, as well as with DEXI expression, interacted with the promoter region of DEXI but not with candidate DNA fragments containing other potential causal genes in the region, including CLEC16A. Intron 19 of CLEC16A is highly enriched for transcription-factor-binding events and markers associated with enhancer activity. Taken together, these data indicate that although the causal variants in the 16p13 region lie within CLEC16A, DEXI is an unappreciated autoimmune disease candidate gene, and illustrate the power of the 3C approach in progressing from genome-wide association studies results to candidate causal genes.
PMCID: PMC3276289  PMID: 21989056
7.  Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium 
Nature  2010;464(7287):367-373.
Fusarium species are among the most important phytopathogenic and toxigenic fungi. To understand the molecular underpinnings of pathogenicity in the genus Fusarium, we compared the genomes of three phenotypically diverse species: Fusarium graminearum, Fusarium verticillioides and Fusarium oxysporum f. sp. lycopersici. Our analysis revealed lineage-specific (LS) genomic regions in F. oxysporum that include four entire chromosomes and account for more than one-quarter of the genome. LS regions are rich in transposons and genes with distinct evolutionary profiles but related to pathogenicity, indicative of horizontal acquisition. Experimentally, we demonstrate the transfer of two LS chromosomes between strains of F. oxysporum, converting a non-pathogenic strain into a pathogen. Transfer of LS chromosomes between otherwise genetically isolated strains explains the polyphyletic origin of host specificity and the emergence of new pathogenic lineages in F. oxysporum. These findings put the evolution of fungal pathogenicity into a new perspective.
PMCID: PMC3048781  PMID: 20237561
8.  T1DBase: update 2011, organization and presentation of large-scale data sets for type 1 diabetes research 
Nucleic Acids Research  2010;39(Database issue):D997-D1001.
T1DBase is a web platform that supports the type 1 diabetes (T1D) community. It integrates genetic, genomic and expression data relevant to T1D research across mouse, rat and human and presents this to the user as a set of web pages and tools. This update describes the incorporation of new data sets, tools and curation efforts as well as a new website design to simplify site use. New data sets include curated summary data from four genome-wide association studies relevant to T1D; HaemAtlas, a data set and tool to query gene expression levels in haematopoietic cells; and a manually curated table of human T1D susceptibility loci, incorporating genetic overlap with other related diseases. These developments will continue to support T1D research and allow easy access to large and complex T1D relevant data sets.
PMCID: PMC3013780  PMID: 20937630
9.  Integrating sequence, evolution and functional genomics in regulatory genomics 
Genome Biology  2009;10(1):202.
Finding transcription factor binding sites in regulatory regions of the genome
With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome.
PMCID: PMC2687781  PMID: 19226437
10.  The genome of the blood fluke Schistosoma mansoni 
Nature  2009;460(7253):352-358.
Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. We report here analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and novel families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, while the identification of membrane receptors, ion channels and more than 300 proteases provides new insights into the biology of the life cycle and novel targets. Bioinformatics approaches have identified metabolic chokepoints while a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.
PMCID: PMC2756445  PMID: 19606141
11.  Comparative genomics of the neglected human malaria parasite Plasmodium vivax 
Nature  2008;455(7214):757-763.
The human malaria parasite Plasmodium vivax is responsible for 25-40% of the ~515 million annual cases of malaria worldwide. Although seldom fatal, the parasite elicits severe and incapacitating clinical symptoms and often relapses months after a primary infection has cleared. Despite its importance as a major human pathogen, P. vivax is little studied because it cannot be propagated in the laboratory except in non-human primates. We determined the genome sequence of P. vivax in order to shed light on its distinctive biologic features, and as a means to drive development of new drugs and vaccines. Here we describe the synteny and isochore structure of P. vivax chromosomes, and show that the parasite resembles other malaria parasites in gene content and metabolic potential, but possesses novel gene families and potential alternate invasion pathways not recognized previously. Completion of the P. vivax genome provides the scientific community with a valuable resource that can be used to advance scientific investigation into this neglected species.
PMCID: PMC2651158  PMID: 18843361
12.  ArrayExpress update—from an archive of functional genomics experiments to the atlas of gene expression 
Nucleic Acids Research  2008;37(Database issue):D868-D872.
ArrayExpress consists of three components: the ArrayExpress Repository—a public archive of functional genomics experiments and supporting data, the ArrayExpress Warehouse—a database of gene expression profiles and other bio-measurements and the ArrayExpress Atlas—a new summary database and meta-analytical tool of ranked gene expression across multiple experiments and different biological conditions. The Repository contains data from over 6000 experiments comprising approximately 200 000 assays, and the database doubles in size every 15 months. The majority of the data are array based, but other data types are included, most recently—ultra high-throughput sequencing transcriptomics and epigenetic data. The Warehouse and Atlas allow users to query for differentially expressed genes by gene names and properties, experimental conditions and sample properties, or a combination of both. In this update, we describe the ArrayExpress developments over the last two years.
PMCID: PMC2686529  PMID: 19015125
13.  Developmentally regulated expression, alternative splicing and distinct sub-groupings in members of the Schistosoma mansoni venom allergen-like (SmVAL) gene family 
BMC Genomics  2008;9:89.
The Sperm-coating protein/Tpx-1/Ag5/PR-1/Sc7 (SCP/TAPS) domain is found across phyla and is a major structural feature of insect allergens, mammalian sperm proteins and parasitic nematode secreted molecules. Proteins containing this domain are implicated in diverse biological activities and may be important for chronic host/parasite interactions.
We report the first description of an SCP/TAPS gene family (Schistosoma mansoni venom allergen-like (SmVALs)) in the medically important Platyhelminthes (class Trematoda) and describe individual members' phylogenetic relationships, genomic organization and life cycle expression profiles. Twenty-eight SmVALs with complete SCP/TAPS domains were identified and comparison of their predicted protein features and gene structures indicated the presence of two distinct sub-families (group 1 & group 2). Phylogenetic analysis demonstrated that this group 1/group 2 split is zoologically widespread as it exists across the metazoan sub-kingdom. Chromosomal localisation and PCR analysis, coupled to inspection of the current S. mansoni genomic assembly, revealed that many of the SmVAL genes are spatially linked throughout the genome. Quantitative lifecycle expression profiling demonstrated distinct SmVAL expression patterns, including transcripts specifically associated with lifestages involved in definitive host invasion, transcripts restricted to lifestages involved in the invasion of the intermediate host and transcripts ubiquitously expressed. Analysis of SmVAL6 transcript diversity demonstrated statistically significant, developmentally regulated, alternative splicing.
Our results highlight the existence of two distinct SCP/TAPS protein types within the Platyhelminthes and across taxa. The extensive lifecycle expression analysis indicates several SmVAL transcripts are upregulated in infective stages of the parasite, suggesting that these particular protein products may be linked to the establishment of chronic host/parasite interactions.
PMCID: PMC2270263  PMID: 18294395
14.  The Genome of the Kinetoplastid Parasite, Leishmania major 
Ivens, Alasdair C. | Peacock, Christopher S. | Worthey, Elizabeth A. | Murphy, Lee | Aggarwal, Gautam | Berriman, Matthew | Sisk, Ellen | Rajandream, Marie-Adele | Adlem, Ellen | Aert, Rita | Anupama, Atashi | Apostolou, Zina | Attipoe, Philip | Bason, Nathalie | Bauser, Christopher | Beck, Alfred | Beverley, Stephen M. | Bianchettin, Gabriella | Borzym, Katja | Bothe, Gordana | Bruschi, Carlo V. | Collins, Matt | Cadag, Eithon | Ciarloni, Laura | Clayton, Christine | Coulson, Richard M. R. | Cronin, Ann | Cruz, Angela K. | Davies, Robert M. | Gaudenzi, Javier De | Dobson, Deborah E. | Duesterhoeft, Andreas | Fazelina, Gholam | Fosker, Nigel | Frasch, Alberto Carlos | Fraser, Audrey | Fuchs, Monika | Gabel, Claudia | Goble, Arlette | Goffeau, André | Harris, David | Hertz-Fowler, Christiane | Hilbert, Helmut | Horn, David | Huang, Yiting | Klages, Sven | Knights, Andrew | Kube, Michael | Larke, Natasha | Litvin, Lyudmila | Lord, Angela | Louie, Tin | Marra, Marco | Masuy, David | Matthews, Keith | Michaeli, Shulamit | Mottram, Jeremy C. | Müller-Auer, Silke | Munden, Heather | Nelson, Siri | Norbertczak, Halina | Oliver, Karen | O'Neil, Susan | Pentony, Martin | Pohl, Thomas M. | Price, Claire | Purnelle, Bénédicte | Quail, Michael A. | Rabbinowitsch, Ester | Reinhardt, Richard | Rieger, Michael | Rinta, Joel | Robben, Johan | Robertson, Laura | Ruiz, Jeronimo C. | Rutter, Simon | Saunders, David | Schäfer, Melanie | Schein, Jacquie | Schwartz, David C. | Seeger, Kathy | Seyler, Amber | Sharp, Sarah | Shin, Heesun | Sivam, Dhileep | Squares, Rob | Squares, Steve | Tosato, Valentina | Vogt, Christy | Volckaert, Guido | Wambutt, Rolf | Warren, Tim | Wedler, Holger | Woodward, John | Zhou, Shiguo | Zimmermann, Wolfgang | Smith, Deborah F. | Blackwell, Jenefer M. | Stuart, Kenneth D. | Barrell, Bart | Myler, Peter J.
Science (New York, N.Y.)  2005;309(5733):436-442.
PMCID: PMC1470643  PMID: 16020728
15.  Control systems for membrane fusion in the ancestral eukaryote; evolution of tethering complexes and SM proteins 
In membrane trafficking, the mechanisms ensuring vesicle fusion specificity remain to be fully elucidated. Early models proposed that specificity was encoded entirely by SNARE proteins; more recent models include contributions from Rab proteins, Syntaxin-binding (SM) proteins and tethering factors. Most information on membrane trafficking derives from an evolutionarily narrow sampling of model organisms. However, considering factors from a wider diversity of eukaryotes can provide both functional information on core systems and insight into the evolutionary history of the trafficking machinery. For example, the major Qa/syntaxin SNARE families are present in most eukaryotic genomes and likely each evolved via gene duplication from a single ancestral syntaxin before the existing eukaryotic groups diversified. This pattern is also likely for Rabs and various other components of the membrane trafficking machinery.
We performed comparative genomic and phylogenetic analyses, when relevant, on the SM proteins and components of the tethering complexes, both thought to contribute to vesicle fusion specificity. Despite evidence suggestive of secondary losses amongst many lineages, the tethering complexes are well represented across the eukaryotes, suggesting an origin predating the radiation of eukaryotic lineages. Further, whilst we detect distant sequence relations between GARP, COG, exocyst and DSL1 components, these similarities most likely reflect convergent evolution of similar secondary structural elements. No similarity is found between the TRAPP and HOPS complexes and the other tethering factors. Overall, our data favour independent origins for the various tethering complexes. The taxa examined possess at least one homologue of each of the four SM protein families; since the four monophyletic families each encompass a wide diversity of eukaryotes, the SM protein families very likely evolved before the last common eukaryotic ancestor (LCEA).
These data further support a highly complex LCEA and indicate that the basic architecture of the trafficking system is remarkably conserved and ancient, with the SM proteins and tethering factors having originated very early in eukaryotic evolution. However, the independent origin of the tethering complexes suggests a novel pattern for increasing complexity in the membrane trafficking system, in addition to the pattern of paralogous machinery elaboration seen thus far.
PMCID: PMC1810245  PMID: 17319956
16.  Lineage-specific partitions in archaeal transcription 
Archaea  2006;2(2):117-125.
The phylogenetic distribution of the components comprising the transcriptional machinery in the crenarchaeal and euryarchaeal lineages of the Archaea was analyzed in a systematic manner by genome-wide profiling of transcription complements in fifteen complete archaeal genome sequences. Initially, a reference set of transcription-associated proteins (TAPs) consisting of sequences functioning in all aspects of the transcriptional process, and originating from the three domains of life, was used to query the genomes. TAP-families were detected by sequence clustering of the TAPs and their archaeal homologues, and through extensive database searching, these families were assigned a function. The phylogenetic origins of archaeal genes matching hidden Markov model profiles of protein domains associated with transcription, and those encoding the TAP-homologues, showed there is extensive lineage-specificity of proteins that function as regulators of transcription: most of these sequences are present solely in the Euryarchaeota, with nearly all of them homologous to bacterial DNA-binding proteins. Strikingly, the hidden Markov model profile searches revealed that archaeal chromatin and histone-modifying enzymes also display extensive taxon-restrictedness, both across and within the two phyla.
PMCID: PMC2686387  PMID: 17350932
genome profiling; protein families; sequence clustering; transcription-associated proteins
17.  The phylogenetic diversity of eukaryotic transcription 
Nucleic Acids Research  2003;31(2):653-660.
Eukaryotic transcription is a highly regulated process involving interactions between large numbers of proteins. To analyse the phylogenetic distribution of the components of this process, six crown eukaryote group genomes were queried with a reference set of transcription-associated (TA) proteins. On average, one in 10 proteins encoded by these genomes were found to be homologous to sequences in the reference set. Analysis of families identified using an accurate sequence clustering algorithm and containing both TA proteins and eukaryotic sequences showed that in two-thirds of the families the homologues originate from a single kingdom. Furthermore, in only 15% of the fungal-specific clusters are the homologues present in both budding and fission yeast, as compared with the metazoan-specific clusters where 53% of the homologues originate from two or more species. Families whose members comprise general transcription factor or RNA polymerase subunits exhibit a low degree of taxon specificity, suggesting that the transcription initiation complex is highly conserved. This contrasts with transcriptional regulator families, which are primarily taxon-specific, indicating that proteins controlling gene activation exhibit considerable sequence diversity across the eukaryotic domain.
PMCID: PMC140520  PMID: 12527774

