1.  Utility and Limitations of Using Gene Expression Data to Identify Functional Associations 
PLoS Computational Biology  2016;12(12):e1005244.
Gene co-expression has been widely used to hypothesize gene function through guilt-by-association. However, it is not clear to what degree co-expression is informative, whether it can be applied to genes involved in different biological processes, and how the type of dataset impacts inferences about gene functions. Here our goal is to assess the utility and limitations of using co-expression as a criterion to recover functional associations between genes. By determining the percentage of gene pairs in a metabolic pathway with significant expression correlation, we found that many genes in the same pathway do not have similar transcript profiles and that the choice of dataset, annotation quality, gene function, expression similarity measure, and clustering approach significantly impact the ability to recover functional associations between genes, using Arabidopsis thaliana as an example. Some datasets are more informative in capturing coordinated expression profiles and larger data sets are not always better. In addition, to recover the maximum number of known pathways and identify candidate genes with similar functions, it is important to explore rather exhaustively multiple dataset combinations, similarity measures, clustering algorithms and parameters. Finally, we validated the biological relevance of co-expression cluster memberships with an independent phenomics dataset and found that genes that consistently cluster with leucine degradation genes tend to have similar leucine levels in mutants. This study provides a framework for obtaining gene functional associations by maximizing the information that can be obtained from gene expression datasets.
Author Summary
There remain genes with no known function even in the most well studied, model species. One common way to hypothesize gene function is based on the assumption that genes with similar expression profiles tend to have similar functions. However, using datasets and biological pathway information from the model plant Arabidopsis thaliana as an example, we discovered that, although genes in the same pathways are functionally related, genes in only a subset of the pathways have highly similar expression patterns. In addition, our ability to hypothesize gene functions based on expression is significantly impacted by how the dataset is processed and combined as well as the methodology used to identify genes with similar expression. Therefore, multiple datasets and methods should be tested to maximize the functional information that we can get based on similarity in gene expression.
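As an illustration of the measure described in this abstract (the percentage of gene pairs in a pathway whose expression profiles are correlated), the following is a minimal sketch and not the authors' pipeline. It assumes Pearson correlation as the similarity measure and a fixed cutoff of 0.8 in place of a significance test; the function names, the cutoff, and the toy expression values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def coexpressed_fraction(profiles, threshold=0.8):
    """Fraction of gene pairs in one pathway with correlation >= threshold.

    `threshold` is an assumed stand-in for a significance criterion.
    """
    pairs = list(combinations(sorted(profiles), 2))
    if not pairs:
        return 0.0
    hits = sum(
        1 for a, b in pairs if pearson(profiles[a], profiles[b]) >= threshold
    )
    return hits / len(pairs)


# Hypothetical pathway: geneA and geneB rise together, geneC falls.
toy_pathway = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

print(coexpressed_fraction(toy_pathway))
```

In this toy pathway only one of the three gene pairs passes the cutoff, which mirrors the abstract's point that membership in the same pathway does not guarantee similar transcript profiles.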
doi:10.1371/journal.pcbi.1005244
PMCID: PMC5147789  PMID: 27935950
2.  A novel method for identifying polymorphic transposable elements via scanning of high-throughput short reads 
Identification of polymorphic transposable elements (TEs) is important because TE polymorphism creates genetic diversity and influences the function of genes in the host genome. However, de novo scanning of polymorphic TEs remains a challenge. Here, we report a novel computational method, called PTEMD (polymorphic TEs and their movement detection), for de novo discovery of genome-wide polymorphic TEs. PTEMD searches for highly identical sequences using read-supported breakpoint evidence. Using PTEMD, we identified 14 polymorphic TE families (905 sequences) in rice blast fungus Magnaporthe oryzae, and 68 (10,618 sequences) in maize. We validated one polymorphic TE family experimentally, MoTE-1; all MoTE-1 family members are located in different genomic loci in the three tested isolates. We found that 57.1% (8 of 14) of the PTEMD-detected polymorphic TE families in M. oryzae are active. Furthermore, our data indicate that there are more polymorphic DNA transposons than polymorphic retrotransposons in maize, despite the fact that retrotransposons occupy the largest fraction of genomic mass. We demonstrated that PTEMD is an effective tool for identifying polymorphic TEs in M. oryzae and maize genomes. PTEMD and the genome-wide polymorphic TEs in M. oryzae and maize are publicly available at http://www.kanglab.cn/blast/PTEMD_V1.02.htm.
doi:10.1093/dnares/dsw011
PMCID: PMC4909310  PMID: 27098848
polymorphic transposon; high-throughput sequencing; rice blast fungus; maize
3.  Evolutionary Relationships and Functional Diversity of Plant Sulfate Transporters 
Sulfate is an essential nutrient cycled in nature. Ion transporters that specifically facilitate the transport of sulfate across the membranes are found ubiquitously in living organisms. The phylogenetic analysis of known sulfate transporters and their homologous proteins from eukaryotic organisms indicates two evolutionarily distinct groups of sulfate transport systems. One major group named Tribe 1 represents yeast and fungal SUL, plant SULTR, and animal SLC26 families. The evolutionary origin of SULTR family members in land plants and green algae is suggested to be common with yeast and fungal SUL and animal anion exchangers (SLC26). The lineage of plant SULTR family is expanded into four subfamilies (SULTR1–SULTR4) in land plant species. By contrast, the putative SULTR homologs from Chlorophyte green algae are in two separate lineages; one with the subfamily of plant tonoplast-localized sulfate transporters (SULTR4), and the other diverged before the appearance of lineages for SUL, SULTR, and SLC26. There also was a group of yet undefined members of putative sulfate transporters in yeast and fungi divergent from these major lineages in Tribe 1. The other distinct group is Tribe 2, primarily composed of animal sodium-dependent sulfate/carboxylate transporters (SLC13) and plant tonoplast-localized dicarboxylate transporters (TDT). The putative sulfur-sensing protein (SAC1) and SAC1-like transporters (SLT) of Chlorophyte green algae, bryophyte, and lycophyte show low degrees of sequence similarity with SLC13 and TDT. However, the phylogenetic relationship between SAC1/SLT and the other two families, SLC13 and TDT in Tribe 2, is not clearly supported. In addition, the SAC1/SLT family is absent in the angiosperm species analyzed. The present study suggests distinct evolutionary trajectories of sulfate transport systems for land plants and green algae.
doi:10.3389/fpls.2011.00119
PMCID: PMC3355512  PMID: 22629272
evolution; plant; sulfate; transporter
4.  Correction: Global Analysis of Genetic, Epigenetic and Transcriptional Polymorphisms in Arabidopsis thaliana Using Whole Genome Tiling Arrays 
PLoS Genetics  2008;4(6):10.1371/annotation/e21d3565-fec6-44d9-8fab-83da49c7c0b8.
doi:10.1371/annotation/e21d3565-fec6-44d9-8fab-83da49c7c0b8
PMCID: PMC2645276
5.  Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast 
PLoS Computational Biology  2015;11(8):e1004418.
Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as a model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a “TF-agnostic” predictive model based on three DNA “intrinsic properties” (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.
Author Summary
Identification of transcription factor binding sites based on sequence motifs is typically accompanied by a high false positive rate. Increasing evidence suggests that there are many other factors besides DNA sequence that may affect the binding and interaction of TFs with DNA. Through the integration of sequence motif, chromatin state, and DNA structure properties, we show that TF binding can be better predicted. Moreover, considering chromatin state and DNA structure properties simultaneously yields a significant improvement. While the binding of some TFs can be readily predicted using either chromatin state information or DNA structure, other TFs need both. Thus, our findings provide insights on how different histone modifications and DNA structure properties may influence the binding of a particular TF and thus how TFs regulate gene expression. These features are referred to as sequence “intrinsic properties” because they can be predicted from sequences alone. These intrinsic properties can be used to build a TF binding prediction model that has a similar performance to considering all features. Moreover, the intrinsic property model allows TFBS predictions not only across TFs, but also across DNA-binding domain families that are present in most eukaryotes, suggesting that the model likely can be used across species.
doi:10.1371/journal.pcbi.1004418
PMCID: PMC4546298  PMID: 26291518
6.  Retained duplicate genes in green alga Chlamydomonas reinhardtii tend to be stress responsive and experience frequent response gains 
BMC Genomics  2015;16(1):149.
Background
Green algae belong to a group of photosynthetic organisms that occupy diverse habitats, are closely related to land plants, and have been studied as sources of food and biofuel. Although multiple green algal genomes are available, a global comparative study of algal gene families has not been carried out. To investigate how gene families and gene expression have evolved, particularly in the context of stress responses, which have been shown to correlate with gene family expansion in multiple eukaryotes, we characterized the expansion patterns of gene families in nine green algal species, and examined evolution of stress response among gene duplicates in Chlamydomonas reinhardtii.
Results
Substantial variation in domain family sizes exists among green algal species. Lineage-specific expansion of families occurred throughout the green algal lineage but inferred gene losses occurred more often than gene gains, suggesting a continuous reduction of algal gene repertoire. Retained duplicates tend to be involved in stress response, similar to land plant species. However, stress responsive genes tend to be pseudogenized as well. When comparing ancestral and extant gene stress response state, we found that response gains occur in 13% of duplicate gene branches, much higher than 6% in Arabidopsis thaliana.
Conclusion
The frequent gains of stress response among green algal duplicates potentially reflect a high rate of innovation, resulting in a species-specific gene repertoire that contributed to adaptive response to stress. This could be further explored towards deciphering the mechanism of stress response, and identifying suitable green algal species for oil production.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1335-5) contains supplementary material, which is available to authorized users.
doi:10.1186/s12864-015-1335-5
PMCID: PMC4364661  PMID: 25880851
7.  Prevalence, Evolution, and cis-Regulation of Diel Transcription in Chlamydomonas reinhardtii 
G3: Genes|Genomes|Genetics  2014;4(12):2461-2471.
Endogenous (circadian) and exogenous (e.g., diel) biological rhythms are a prominent feature of many living systems. In green algal species, knowledge of the extent of diel rhythmicity of genome-wide gene expression, its evolution, and its cis-regulatory mechanism is limited. In this study, we identified cyclically expressed genes under diel conditions in Chlamydomonas reinhardtii and found that ~50% of the 17,114 annotated genes exhibited cyclic expression. These cyclic expression patterns indicate a clear succession of biological processes during the course of a day. Among 237 functional categories enriched in cyclically expressed genes, >90% were phase-specific, including photosynthesis, cell division, and motility-related processes. By contrasting cyclic expression between C. reinhardtii and Arabidopsis thaliana putative orthologs, we found significant but weak conservation in cyclic gene expression patterns. On the other hand, within C. reinhardtii cyclic expression was preferentially maintained between duplicates, and the evolution of phase between paralogs is limited to relatively minor time shifts. Finally, to better understand the cis-regulatory basis of diel expression, putative cis-regulatory elements were identified that could predict the expression phase of a subset of the cyclic transcriptome. Our findings demonstrate both the prevalence of cycling genes and the complex regulatory circuitry required to control cyclic expression in a green algal model, highlighting the need to consider diel expression in studying algal molecular networks and in future biotechnological applications.
doi:10.1534/g3.114.015032
PMCID: PMC4267941  PMID: 25354782
green algae; diel expression; transcriptomics; evolution; gene regulation; cis-regulatory element
8.  Diversity, classification and function of the plant protein kinase superfamily 
Eukaryotic protein kinases belong to a large superfamily with hundreds to thousands of copies and are components of essentially all cellular functions. The goals of this study are to classify protein kinases from 25 plant species and to assess their evolutionary history in conjunction with consideration of their molecular functions. The protein kinase superfamily has expanded in the flowering plant lineage, in part through recent duplications. As a result, the flowering plant protein kinase repertoire, or kinome, is in general significantly larger than that of other eukaryotes, ranging in size from 600 to 2500 members. This large variation in kinome size is mainly due to the expansion and contraction of a few families, particularly the receptor-like kinase/Pelle family. A number of protein kinases reside in highly conserved, low copy number families and often play broadly conserved regulatory roles in metabolism and cell division, although functions of plant homologues have often diverged from their metazoan counterparts. Members of expanded plant kinase families often have roles in plant-specific processes and some may have contributed to adaptive evolution. Nonetheless, non-adaptive explanations, such as kinase duplicate subfunctionalization and insufficient time for pseudogenization, may also contribute to the large number of seemingly functional protein kinases in plants.
doi:10.1098/rstb.2012.0003
PMCID: PMC3415837  PMID: 22889912
plant protein kinase; gene family evolution; lineage-specific expansion; comparative genomics
9.  Genome, Functional Gene Annotation, and Nuclear Transformation of the Heterokont Oleaginous Alga Nannochloropsis oceanica CCMP1779 
PLoS Genetics  2012;8(11):e1003064.
Unicellular marine algae have promise for providing sustainable and scalable biofuel feedstocks, although no single species has emerged as a preferred organism. Moreover, adequate molecular and genetic resources prerequisite for the rational engineering of marine algal feedstocks are lacking for most candidate species. Heterokonts of the genus Nannochloropsis naturally have high cellular oil content and are already in use for industrial production of high-value lipid products. First success in applying reverse genetics by targeted gene replacement makes Nannochloropsis oceanica an attractive model to investigate the cell and molecular biology and biochemistry of this fascinating organism group. Here we present the assembly of the 28.7 Mb genome of N. oceanica CCMP1779. RNA sequencing data from nitrogen-replete and nitrogen-depleted growth conditions support a total of 11,973 genes, of which in addition to automatic annotation some were manually inspected to predict the biochemical repertoire for this organism. Among others, more than 100 genes putatively related to lipid metabolism, 114 predicted transcription factors, and 109 transcriptional regulators were annotated. Comparison of the N. oceanica CCMP1779 gene repertoire with the recently published N. gaditana genome identified 2,649 genes likely specific to N. oceanica CCMP1779. Many of these N. oceanica–specific genes have putative orthologs in other species or are supported by transcriptional evidence. However, because similarity-based annotations are limited, functions of most of these species-specific genes remain unknown. Aside from the genome sequence and its analysis, protocols for the transformation of N. oceanica CCMP1779 are provided. 
The availability of genomic and transcriptomic data for Nannochloropsis oceanica CCMP1779, along with efficient transformation protocols, provides a blueprint for future detailed gene functional analysis and genetic engineering of Nannochloropsis species by a growing academic community focused on this genus.
Author Summary
Algae are a highly diverse group of organisms that have become the focus of renewed interest due to their potential for producing biofuel feedstocks, nutraceuticals, and biomaterials. Their high photosynthetic yields and ability to grow in areas unsuitable for agriculture provide a potential sustainable alternative to using traditional agricultural crops for biofuels. Because none of the algae currently in use have a history of domestication, and bioengineering of algae is still in its infancy, there is a need to develop algal strains adapted to cultivation for industrial large-scale production of desired compounds. Model organisms ranging from mice to baker's yeast have been instrumental in providing insights into fundamental biological structures and functions. The algal field needs versatile models to develop a fundamental understanding of photosynthetic production of biomass and valuable compounds in unicellular, marine, oleaginous algal species. To contribute to the development of such an algal model system for basic discovery, we sequenced the genome and two sets of transcriptomes of N. oceanica CCMP1779, assembled the genomic sequence, identified putative genes, and began to interpret the function of selected genes. This species was chosen because it is readily transformable with foreign DNA and grows well in culture.
doi:10.1371/journal.pgen.1003064
PMCID: PMC3499364  PMID: 23166516
10.  Alternative Splicing of a Multi-Drug Transporter from Pseudoperonospora cubensis Generates an RXLR Effector Protein That Elicits a Rapid Cell Death 
PLoS ONE  2012;7(4):e34701.
Pseudoperonospora cubensis, an obligate oomycete pathogen, is the causal agent of cucurbit downy mildew, a foliar disease of global economic importance. Similar to other oomycete plant pathogens, Ps. cubensis has a suite of RXLR and RXLR-like effector proteins, which likely function as virulence or avirulence determinants during the course of host infection. Using in silico analyses, we identified 271 candidate effector proteins within the Ps. cubensis genome with variable RXLR motifs. In extending this analysis, we present the functional characterization of one Ps. cubensis effector protein, RXLR protein 1 (PscRXLR1), and its closest Phytophthora infestans ortholog, PITG_17484, a member of the Drug/Metabolite Transporter (DMT) superfamily. To assess if such effector-non-effector pairs are common among oomycete plant pathogens, we examined the relationship(s) among putative ortholog pairs in Ps. cubensis and P. infestans. Of 271 predicted Ps. cubensis effector proteins, only 109 (41%) had a putative ortholog in P. infestans and evolutionary rate analysis of these orthologs shows that they are evolving significantly faster than most other genes. We found that PscRXLR1 was up-regulated during the early stages of infection of plants, and, moreover, that heterologous expression of PscRXLR1 in Nicotiana benthamiana elicits a rapid necrosis. More interestingly, we also demonstrate that PscRXLR1 arises as a product of alternative splicing, making this the first example of an alternative splicing event in plant pathogenic oomycetes transforming a non-effector gene to a functional effector protein. Taken together, these data suggest a role for PscRXLR1 in pathogenicity, and, in total, our data provide a basis for comparative analysis of candidate effector proteins and their non-effector orthologs as a means of understanding function and evolutionary history of pathogen effectors.
doi:10.1371/journal.pone.0034701
PMCID: PMC3320632  PMID: 22496844
11.  Genomic Transition to Pathogenicity in Chytrid Fungi 
PLoS Pathogens  2011;7(11):e1002338.
Understanding the molecular mechanisms of pathogen emergence is central to mitigating the impacts of novel infectious disease agents. The chytrid fungus Batrachochytrium dendrobatidis (Bd) is an emerging pathogen of amphibians that has been implicated in amphibian declines worldwide. Bd is the only member of its clade known to attack vertebrates. However, little is known about the molecular determinants of - or evolutionary transition to - pathogenicity in Bd. Here we sequence the genome of Bd's closest known relative - a non-pathogenic chytrid Homolaphlyctis polyrhiza (Hp). We first describe the genome of Hp, which is comparable to other chytrid genomes in size and number of predicted proteins. We then compare the genomes of Hp, Bd, and 19 additional fungal genomes to identify unique or recent evolutionary elements in the Bd genome. We identified 1,974 Bd-specific genes, a gene set that is enriched for protease, lipase, and microbial effector Gene Ontology terms. We describe significant lineage-specific expansions in three Bd protease families (metallo-, serine-type, and aspartyl proteases). We show that these protease gene family expansions occurred after the divergence of Bd and Hp from their common ancestor and thus are localized to the Bd branch. Finally, we demonstrate that the timing of the protease gene family expansions predates the emergence of Bd as a globally important amphibian pathogen.
Author Summary
The chytrid fungus Batrachochytrium dendrobatidis (Bd) is an emerging pathogen that has been implicated in decimating amphibian populations around the world. Bd is the only member of an ancient group of fungi (called the Chytridiomycota) that is known to attack vertebrates. The question of how an amphibian-killing fungus evolved from non-pathogenic ancestors is vital to protecting the world's remaining amphibians from Bd. We sequenced the genome of Bd's closest known relative - a non-pathogenic chytrid named Homolaphlyctis polyrhiza (Hp). We compared the genomes of Bd, Hp and 18 additional fungi to identify what makes Bd unique. We identified a large number of Bd-specific genes, a gene set that contains a number of possible pathogenicity factors. In particular, we describe a large number of protease genes in the Bd genome and show that these genes were duplicated after the divergence of Bd and Hp from their common ancestor. Studying Bd's pathogenesis in an evolutionary context provides new evidence for the role of protease genes in Bd's ability to kill amphibians.
doi:10.1371/journal.ppat.1002338
PMCID: PMC3207900  PMID: 22072962
12.  A comparison of the low temperature transcriptomes and CBF regulons of three plant species that differ in freezing tolerance: Solanum commersonii, Solanum tuberosum, and Arabidopsis thaliana 
Journal of Experimental Botany  2011;62(11):3807-3819.
Solanum commersonii and Solanum tuberosum are closely related plant species that differ in their abilities to cold acclimate; whereas S. commersonii increases in freezing tolerance in response to low temperature, S. tuberosum does not. In Arabidopsis thaliana, cold-regulated genes have been shown to contribute to freezing tolerance, including those that comprise the CBF regulon, genes that are controlled by the CBF transcription factors. The low temperature transcriptomes and CBF regulons of S. commersonii and S. tuberosum were therefore compared to determine whether there might be differences that contribute to their differences in ability to cold acclimate. The results indicated that both plants alter gene expression in response to low temperature to similar degrees with similar kinetics and that both plants have CBF regulons composed of hundreds of genes. However, there were considerable differences in the sets of genes that comprised the low temperature transcriptomes and CBF regulons of the two species. Thus differences in cold regulatory programmes may contribute to the differences in freezing tolerance of these two species. However, 53 groups of putative orthologous genes that are cold-regulated in S. commersonii, S. tuberosum, and A. thaliana were identified. Given that the evolutionary distance between the two Solanum species and A. thaliana is 112–156 million years, it seems likely that these conserved cold-regulated genes—many of which encode transcription factors and proteins of unknown function—have fundamental roles in plant growth and development at low temperature.
doi:10.1093/jxb/err066
PMCID: PMC3134341  PMID: 21511909
Arabidopsis; CBF regulon; freezing tolerance; low temperature transcriptome; Solanum species
13.  Phylogenetic Comparison of F-Box (FBX) Gene Superfamily within the Plant Kingdom Reveals Divergent Evolutionary Histories Indicative of Genomic Drift 
PLoS ONE  2011;6(1):e16219.
The emergence of multigene families has been hypothesized as a major contributor to the evolution of complex traits and speciation. To help understand how such multigene families arose and diverged during plant evolution, we examined the phylogenetic relationships of F-Box (FBX) genes, one of the largest and most polymorphic superfamilies known in the plant kingdom. FBX proteins comprise the target recognition subunit of SCF-type ubiquitin-protein ligases, where they individually recruit specific substrates for ubiquitylation. Through the extensive analysis of 10,811 FBX loci from 18 plant species, ranging from the alga Chlamydomonas reinhardtii to numerous monocots and eudicots, we discovered strikingly diverse evolutionary histories. The number of FBX loci varies widely and appears independent of the growth habit and life cycle of land plants, with as few as 198 predicted for Carica papaya to as many as 1350 predicted for Arabidopsis lyrata. This number differs substantially even among closely related species, with evidence for extensive gains/losses. Despite this extraordinary inter-species variation, one subset of FBX genes was conserved among most species examined. Together with evidence of strong purifying selection and expression, the ligases synthesized from these conserved loci likely direct essential ubiquitylation events. Another subset was much more lineage specific, showed more relaxed purifying selection, and was enriched in loci with little or no evidence of expression, suggesting that they either control more limited, species-specific processes or arose from genomic drift and thus may provide reservoirs for evolutionary innovation. Numerous FBX loci were also predicted to be pseudogenes with their numbers tightly correlated with the total number of FBX genes in each species.
Taken together, it appears that the FBX superfamily has independently undergone substantial birth/death in many plant lineages, with its size and rapid evolution potentially reflecting a central role for ubiquitylation in driving plant fitness.
doi:10.1371/journal.pone.0016219
PMCID: PMC3030570  PMID: 21297981
14.  Comparative analyses reveal distinct sets of lineage-specific genes within Arabidopsis thaliana 
Background
The availability of genome and transcriptome sequences for a number of species permits the identification and characterization of conserved as well as divergent genes such as lineage-specific genes which have no detectable sequence similarity to genes from other lineages. While genes conserved among taxa provide insight into the core processes among species, lineage-specific genes provide insights into evolutionary processes and biological functions that are likely clade or species specific.
Results
Comparative analyses using the Arabidopsis thaliana genome and sequences from 178 other species within the Plant Kingdom enabled the identification of 24,624 A. thaliana genes (91.7%) that were termed Evolutionary Conserved (EC) as defined by sequence similarity to a database entry as well as two sets of lineage-specific genes within A. thaliana. One of the A. thaliana lineage-specific gene sets shares sequence similarity only with sequences from species within the Brassicaceae family and is termed Conserved Brassicaceae-Specific Genes (914, 3.4%, CBSG). The other set of A. thaliana lineage-specific genes, the Arabidopsis Lineage-Specific Genes (1,324, 4.9%, ALSG), lacks sequence similarity to any sequence outside A. thaliana. While many CBSGs (76.7%) and ALSGs (52.9%) are transcribed, the majority of the CBSGs (76.1%) and ALSGs (94.4%) have no annotated function. Co-expression analysis indicated significant enrichment of the CBSGs and ALSGs in multiple functional categories suggesting their involvement in a wide range of biological functions. Subcellular localization prediction revealed that the CBSGs were significantly enriched in proteins targeted to the secretory pathway (412, 45.1%). Among the 107 putatively secreted CBSGs with known functions, 67 encode a putative pollen coat protein or cysteine-rich protein with sequence similarity to the S-locus cysteine-rich protein that is the pollen determinant controlling allele specific pollen rejection in self-incompatible Brassicaceae species. Overall, the ALSGs and CBSGs were more highly methylated in floral tissue compared to the ECs. Single Nucleotide Polymorphism (SNP) analysis showed an elevated ratio of non-synonymous to synonymous SNPs within the ALSGs (1.99) and CBSGs (1.65) relative to the EC set (0.92), mainly caused by an elevated number of non-synonymous SNPs, indicating that they are fast-evolving at the protein sequence level.
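The gene-set percentages reported in these Results can be reproduced with a short arithmetic check. This is only a sketch: it assumes the three sets (EC, CBSG, ALSG) are disjoint and together make up the whole gene set analyzed, so the total of 26,862 is derived here rather than stated in the abstract, and the helper name is hypothetical.

```python
# Gene counts quoted in the abstract.
gene_sets = {"EC": 24624, "CBSG": 914, "ALSG": 1324}

# Derived total, assuming the three sets are disjoint and exhaustive.
total = sum(gene_sets.values())


def percent_of_total(counts):
    """Share of each set in the combined total, rounded to one decimal."""
    whole = sum(counts.values())
    return {name: round(100 * n / whole, 1) for name, n in counts.items()}


print(total)
print(percent_of_total(gene_sets))

# Secreted CBSGs as a share of all CBSGs (412 of 914 in the abstract).
secreted_cbsg_percent = round(100 * 412 / gene_sets["CBSG"], 1)
print(secreted_cbsg_percent)
```

Under that assumption the computed shares agree with the 91.7%, 3.4%, 4.9%, and 45.1% figures given in the text.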
Conclusions
Our analyses suggest that while a significant fraction of the A. thaliana proteome is conserved within the Plant Kingdom, evolutionarily distinct sets of genes that may function in defining biological processes unique to these lineages have arisen within the Brassicaceae and A. thaliana.
doi:10.1186/1471-2148-10-41
PMCID: PMC2829037  PMID: 20152032
15.  Evolution of Stress-Regulated Gene Expression in Duplicate Genes of Arabidopsis thaliana 
PLoS Genetics  2009;5(7):e1000581.
Due to the selection pressure imposed by highly variable environmental conditions, stress sensing and regulatory response mechanisms in plants are expected to evolve rapidly. One potential source of innovation in plant stress response mechanisms is gene duplication. In this study, we examined the evolution of stress-regulated gene expression among duplicated genes in the model plant Arabidopsis thaliana. Key to this analysis was reconstructing the putative ancestral stress regulation pattern. By comparing the expression patterns of duplicated genes with the patterns of their ancestors, we found that duplicated genes likely lost and gained stress responses at a rapid rate initially, but the rate is close to zero when the synonymous substitution rate (a proxy for time) is >∼0.8. When considering duplicated gene pairs, we found that partitioning of putative ancestral stress responses occurred more frequently compared to cases of parallel retention and loss. Furthermore, the pattern of stress response partitioning was extremely asymmetric. An analysis of putative cis-acting DNA regulatory elements in the promoters of the duplicated stress-regulated genes indicated that the asymmetric partitioning of ancestral stress responses is likely due, at least in part, to differential loss of DNA regulatory elements; the duplicated genes losing most of their stress responses were those that had lost more of the putative cis-acting elements. Finally, duplicate genes that lost most or all of the ancestral responses are more likely to have gained responses to other stresses. Therefore, the retention of duplicates that inherit few or no functions seems to be coupled to neofunctionalization. Taken together, our findings provide new insight into the patterns of evolutionary changes in gene stress responses after duplication and lay the foundation for testing the adaptive significance of stress regulatory changes under highly variable biotic and abiotic environments.
Author Summary
Plants have developed a multitude of response mechanisms to survive stressful environments. Since the environment is highly variable, these stress response mechanisms are expected to undergo frequent innovation. Duplicate genes represent a potential source for such innovation. In this paper, we explored the evolutionary changes in stress responses at the transcriptional level among duplicated genes in the model plant Arabidopsis thaliana. We found that after gene duplication, ancestral stress responses tend to be retained by only one of the gene duplicates (partitioning). In addition, the pattern of partitioning of multiple stress responses is extremely asymmetric, where one duplicate tends to inherit most or all of the ancestral stress responses. We present evidence that the asymmetric loss of stress responses is correlated with the asymmetric loss of putative transcription factor binding sites. Interestingly, those duplicate genes inheriting few or no ancestral responses tend to have gained new stress responses, providing support for the model that gene duplicates are a source of innovation. Our findings provide important insight into the mechanisms of gene function evolution and lay the foundation for experimental studies to determine the significance of gain of stress responses in plant adaptation.
doi:10.1371/journal.pgen.1000581
PMCID: PMC2709438  PMID: 19649161
16.  Two-Component Signaling Elements and Histidyl-Aspartyl Phosphorelays 
Two-component systems are an evolutionarily ancient means of signal transduction. These systems comprise a number of distinct elements, namely histidine kinases, response regulators, and, in the case of multi-step phosphorelays, histidine-containing phosphotransfer proteins (HPts). Arabidopsis makes use of a two-component signaling system to mediate the response to the plant hormone cytokinin. Two-component signaling elements have also been implicated in plant responses to ethylene, abiotic stresses, and red light, and in regulating various aspects of plant growth and development. Here we present an overview of the two-component signaling elements found in Arabidopsis, including functional and phylogenetic information on both bona fide and divergent elements.
doi:10.1199/tab.0112
PMCID: PMC3243373  PMID: 22303237
17.  Patterns of expansion and expression divergence in the plant polygalacturonase gene family 
Genome Biology  2006;7(9):R87.
Analysis of Arabidopsis and rice polygalacturonases suggests that polygalacturonase duplicates underwent rapid expression divergence and that the mechanisms of duplication affect the divergence rate.
Background
Polygalacturonases (PGs) belong to a large gene family in plants and are believed to be responsible for various cell separation processes. PG activities have been shown to be associated with a wide range of plant developmental programs such as seed germination, organ abscission, pod and anther dehiscence, pollen grain maturation, fruit softening and decay, xylem cell formation, and pollen tube growth, thus illustrating divergent roles for members of this gene family. A close look at phylogenetic relationships among Arabidopsis and rice PGs accompanied by analysis of expression data provides an opportunity to address key questions on the evolution and functions of duplicate genes.
Results
We found that both tandem and whole-genome duplications contributed significantly to the expansion of this gene family but were associated with substantial gene losses. In addition, there were at least 21 PGs in the common ancestor of Arabidopsis and rice. We have also determined the relationships between Arabidopsis and rice PGs and their expression patterns in Arabidopsis to provide insights into the functional divergence between members of this gene family. By evaluating expression in five Arabidopsis tissues and during five stages of abscission, we found overlapping but distinct expression patterns for most of the different PGs.
Conclusion
Expression data suggest specialized roles or subfunctionalization for each PG gene member. PGs derived from whole genome duplication tend to have more similar expression patterns than those derived from tandem duplications. Our findings suggest that PG duplicates underwent rapid expression divergence and that the mechanisms of duplication affect the divergence rate.
doi:10.1186/gb-2006-7-9-r87
PMCID: PMC1794546  PMID: 17010199
