1.  Genome-Wide Computational Prediction and Analysis of Core Promoter Elements across Plant Monocots and Dicots 
PLoS ONE  2013;8(10):e79011.
Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the computational prediction of CPEs across eight plant genomes to help better understand the transcription initiation complex assembly. The distribution of thirteen known CPEs across four monocots (Brachypodium distachyon, Oryza sativa ssp. japonica, Sorghum bicolor, Zea mays) and four dicots (Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Glycine max) reveals the structural organization of the core promoter in relation to the TATA-box as well as with respect to other CPEs. The distribution of known CPE motifs with respect to transcription start site (TSS) exhibited positional conservation within monocots and dicots with slight differences across all eight genomes. Further, a more refined subset of annotated genes based on orthologs of the model monocot (O. sativa ssp. japonica) and dicot (A. thaliana) genomes supported the positional distribution of these thirteen known CPEs. DNA free energy profiles provided evidence that the structural properties of promoter regions are distinctly different from those of the non-regulatory genome sequence. The profiles also showed that monocot core promoters have lower DNA free energy than dicot core promoters. The comparison of monocot and dicot promoter sequences highlights both the similarities and differences in the core promoter architecture irrespective of the species-specific nucleotide bias. This study will be useful for future work related to genome annotation projects and can inspire research efforts aimed to better understand regulatory mechanisms of transcription.
PMCID: PMC3812177  PMID: 24205361
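As a rough illustration of what locating a core promoter element relative to the TSS can look like in practice, the following minimal Python sketch scans a promoter sequence for a degenerate consensus and reports match positions relative to a known TSS index. The consensus `TATAWAW`, the function name `motif_positions`, and the toy sequence are assumptions made for illustration only; they are not the thirteen CPE models or the prediction pipeline used in the study.

```python
import re

# IUPAC codes needed for simple degenerate consensus strings.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

def motif_positions(promoter, tss_index, consensus="TATAWAW"):
    """Return start positions of consensus matches relative to the TSS.

    promoter  : DNA string containing the TSS
    tss_index : 0-based index of the TSS within `promoter`
    consensus : IUPAC consensus (hypothetical default, for illustration)
    """
    pattern = "".join(IUPAC[base] for base in consensus)
    # A lookahead lets overlapping matches be reported as well.
    regex = re.compile(f"(?=({pattern}))")
    return [m.start() - tss_index for m in regex.finditer(promoter.upper())]

# Toy promoter: a TATA-like motif placed 30 bp upstream of the TSS.
toy_promoter = "GCGC" + "TATAAAT" + "G" * 23 + "ACGT"
toy_tss = 34  # index of the "A" in the trailing "ACGT"
print(motif_positions(toy_promoter, toy_tss))
```

Tallying such relative positions over all annotated genes in a genome would give a positional distribution of the kind the abstract describes, although a real analysis would also need a background model to separate true elements from chance matches.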
2.  Diversity, classification and function of the plant protein kinase superfamily 
Eukaryotic protein kinases belong to a large superfamily with hundreds to thousands of copies and are components of essentially all cellular functions. The goals of this study are to classify protein kinases from 25 plant species and to assess their evolutionary history in conjunction with consideration of their molecular functions. The protein kinase superfamily has expanded in the flowering plant lineage, in part through recent duplications. As a result, the flowering plant protein kinase repertoire, or kinome, is in general significantly larger than that of other eukaryotes, ranging in size from 600 to 2500 members. This large variation in kinome size is mainly due to the expansion and contraction of a few families, particularly the receptor-like kinase/Pelle family. A number of protein kinases reside in highly conserved, low copy number families and often play broadly conserved regulatory roles in metabolism and cell division, although functions of plant homologues have often diverged from their metazoan counterparts. Members of expanded plant kinase families often have roles in plant-specific processes and some may have contributed to adaptive evolution. Nonetheless, non-adaptive explanations, such as kinase duplicate subfunctionalization and insufficient time for pseudogenization, may also contribute to the large number of seemingly functional protein kinases in plants.
PMCID: PMC3415837  PMID: 22889912
plant protein kinase; gene family evolution; lineage-specific expansion; comparative genomics
3.  Alternative Splicing of a Multi-Drug Transporter from Pseudoperonospora cubensis Generates an RXLR Effector Protein That Elicits a Rapid Cell Death 
PLoS ONE  2012;7(4):e34701.
Pseudoperonospora cubensis, an obligate oomycete pathogen, is the causal agent of cucurbit downy mildew, a foliar disease of global economic importance. Similar to other oomycete plant pathogens, Ps. cubensis has a suite of RXLR and RXLR-like effector proteins, which likely function as virulence or avirulence determinants during the course of host infection. Using in silico analyses, we identified 271 candidate effector proteins within the Ps. cubensis genome with variable RXLR motifs. In extending this analysis, we present the functional characterization of one Ps. cubensis effector protein, RXLR protein 1 (PscRXLR1), and its closest Phytophthora infestans ortholog, PITG_17484, a member of the Drug/Metabolite Transporter (DMT) superfamily. To assess if such effector-non-effector pairs are common among oomycete plant pathogens, we examined the relationship(s) among putative ortholog pairs in Ps. cubensis and P. infestans. Of 271 predicted Ps. cubensis effector proteins, only 109 (41%) had a putative ortholog in P. infestans and evolutionary rate analysis of these orthologs shows that they are evolving significantly faster than most other genes. We found that PscRXLR1 was up-regulated during the early stages of infection of plants, and, moreover, that heterologous expression of PscRXLR1 in Nicotiana benthamiana elicits a rapid necrosis. More interestingly, we also demonstrate that PscRXLR1 arises as a product of alternative splicing, making this the first example of an alternative splicing event in plant pathogenic oomycetes transforming a non-effector gene to a functional effector protein. Taken together, these data suggest a role for PscRXLR1 in pathogenicity, and, in total, our data provide a basis for comparative analysis of candidate effector proteins and their non-effector orthologs as a means of understanding function and evolutionary history of pathogen effectors.
PMCID: PMC3320632  PMID: 22496844
4.  Genomic Transition to Pathogenicity in Chytrid Fungi 
PLoS Pathogens  2011;7(11):e1002338.
Understanding the molecular mechanisms of pathogen emergence is central to mitigating the impacts of novel infectious disease agents. The chytrid fungus Batrachochytrium dendrobatidis (Bd) is an emerging pathogen of amphibians that has been implicated in amphibian declines worldwide. Bd is the only member of its clade known to attack vertebrates. However, little is known about the molecular determinants of - or evolutionary transition to - pathogenicity in Bd. Here we sequence the genome of Bd's closest known relative - a non-pathogenic chytrid Homolaphlyctis polyrhiza (Hp). We first describe the genome of Hp, which is comparable to other chytrid genomes in size and number of predicted proteins. We then compare the genomes of Hp, Bd, and 19 additional fungal genomes to identify unique or recent evolutionary elements in the Bd genome. We identified 1,974 Bd-specific genes, a gene set that is enriched for protease, lipase, and microbial effector Gene Ontology terms. We describe significant lineage-specific expansions in three Bd protease families (metallo-, serine-type, and aspartyl proteases). We show that these protease gene family expansions occurred after the divergence of Bd and Hp from their common ancestor and thus are localized to the Bd branch. Finally, we demonstrate that the timing of the protease gene family expansions predates the emergence of Bd as a globally important amphibian pathogen.
Author Summary
The chytrid fungus Batrachochytrium dendrobatidis (Bd) is an emerging pathogen that has been implicated in decimating amphibian populations around the world. Bd is the only member of an ancient group of fungi (called the Chytridiomycota) that is known to attack vertebrates. The question of how an amphibian-killing fungus evolved from non-pathogenic ancestors is vital to protecting the world's remaining amphibians from Bd. We sequenced the genome of Bd's closest known relative - a non-pathogenic chytrid named Homolaphlyctis polyrhiza (Hp). We compared the genomes of Bd, Hp and 19 additional fungi to identify what makes Bd unique. We identified a large number of Bd-specific genes, a gene set that contains a number of possible pathogenicity factors. In particular, we describe a large number of protease genes in the Bd genome and show that these genes were duplicated after the divergence of Bd and Hp from their common ancestor. Studying Bd's pathogenesis in an evolutionary context provides new evidence for the role of protease genes in Bd's ability to kill amphibians.
PMCID: PMC3207900  PMID: 22072962
5.  A comparison of the low temperature transcriptomes and CBF regulons of three plant species that differ in freezing tolerance: Solanum commersonii, Solanum tuberosum, and Arabidopsis thaliana 
Journal of Experimental Botany  2011;62(11):3807-3819.
Solanum commersonii and Solanum tuberosum are closely related plant species that differ in their abilities to cold acclimate; whereas S. commersonii increases in freezing tolerance in response to low temperature, S. tuberosum does not. In Arabidopsis thaliana, cold-regulated genes have been shown to contribute to freezing tolerance, including those that comprise the CBF regulon, genes that are controlled by the CBF transcription factors. The low temperature transcriptomes and CBF regulons of S. commersonii and S. tuberosum were therefore compared to determine whether there might be differences that contribute to their differences in ability to cold acclimate. The results indicated that both plants alter gene expression in response to low temperature to similar degrees with similar kinetics and that both plants have CBF regulons composed of hundreds of genes. However, there were considerable differences in the sets of genes that comprised the low temperature transcriptomes and CBF regulons of the two species. Thus differences in cold regulatory programmes may contribute to the differences in freezing tolerance of these two species. However, 53 groups of putative orthologous genes that are cold-regulated in S. commersonii, S. tuberosum, and A. thaliana were identified. Given that the evolutionary distance between the two Solanum species and A. thaliana is 112–156 million years, it seems likely that these conserved cold-regulated genes—many of which encode transcription factors and proteins of unknown function—have fundamental roles in plant growth and development at low temperature.
PMCID: PMC3134341  PMID: 21511909
Arabidopsis; CBF regulon; freezing tolerance; low temperature transcriptome; Solanum species
6.  Evolutionary Relationships and Functional Diversity of Plant Sulfate Transporters 
Sulfate is an essential nutrient cycled in nature. Ion transporters that specifically facilitate the transport of sulfate across the membranes are found ubiquitously in living organisms. The phylogenetic analysis of known sulfate transporters and their homologous proteins from eukaryotic organisms indicates two evolutionarily distinct groups of sulfate transport systems. One major group named Tribe 1 represents yeast and fungal SUL, plant SULTR, and animal SLC26 families. The evolutionary origin of SULTR family members in land plants and green algae is suggested to be common with yeast and fungal SUL and animal anion exchangers (SLC26). The lineage of plant SULTR family is expanded into four subfamilies (SULTR1–SULTR4) in land plant species. By contrast, the putative SULTR homologs from Chlorophyte green algae are in two separate lineages; one with the subfamily of plant tonoplast-localized sulfate transporters (SULTR4), and the other diverged before the appearance of lineages for SUL, SULTR, and SLC26. There was also a group of as yet undefined putative sulfate transporters in yeast and fungi divergent from these major lineages in Tribe 1. The other distinct group is Tribe 2, primarily composed of animal sodium-dependent sulfate/carboxylate transporters (SLC13) and plant tonoplast-localized dicarboxylate transporters (TDT). The putative sulfur-sensing protein (SAC1) and SAC1-like transporters (SLT) of Chlorophyte green algae, bryophyte, and lycophyte show a low degree of sequence similarity with SLC13 and TDT. However, the phylogenetic relationship between SAC1/SLT and the other two families, SLC13 and TDT in Tribe 2, is not clearly supported. In addition, the SAC1/SLT family is absent in the angiosperm species analyzed. The present study suggests distinct evolutionary trajectories of sulfate transport systems for land plants and green algae.
PMCID: PMC3355512  PMID: 22629272
evolution; plant; sulfate; transporter
7.  Comparative Genome Analysis Reveals an Absence of Leucine-Rich Repeat Pattern-Recognition Receptor Proteins in the Kingdom Fungi 
PLoS ONE  2010;5(9):e12725.
In plants and animals innate immunity is the first line of defence against attack by microbial pathogens. Specific molecular features of bacteria and fungi are recognised by pattern recognition receptors that have extracellular domains containing leucine rich repeats. Recognition of microbes by these receptors induces defence responses that protect hosts against potential microbial attack.
Methodology/Principal Findings
A survey of genome sequences from 101 species, representing a broad cross-section of the eukaryotic phylogenetic tree, reveals an absence of leucine rich repeat-domain containing receptors in the fungal kingdom. Uniquely, however, fungi possess adenylate cyclases that contain distinct leucine rich repeat-domains, which have been demonstrated to act as an alternative means of perceiving the presence of bacteria by at least one fungal species. Interestingly, the morphologically similar osmotrophic oomycetes, which are taxonomically distant members of the stramenopiles, possess pattern recognition receptors with similar domain structures to those found in plants.
The absence of pattern recognition receptors suggests that fungi may possess novel classes of pattern-recognition receptor, such as the modified adenylate cyclase, or instead rely on secretion of anti-microbial secondary metabolites for protection from microbial attack. The absence of pattern recognition receptors in fungi, coupled with their abundance in oomycetes, suggests this may be a unique characteristic of the fungal kingdom rather than a consequence of the osmotrophic growth form.
PMCID: PMC2939053  PMID: 20856863
8.  Diversification and Specialization of Plant RBR Ubiquitin Ligases 
PLoS ONE  2010;5(7):e11579.
RBR ubiquitin ligases are components of the ubiquitin-proteasome system present in all eukaryotes. They are characterized by having the RBR (RING – IBR – RING) supradomain. In this study, the patterns of emergence of RBR genes in plants are described.
Methodology/Principal Findings
Phylogenetic and structural data confirm that just four RBR subfamilies (Ariadne, ARA54, Plant I/Helicase and Plant II) exist in viridiplantae. All of them originated before the split that separated green algae from the rest of plants. Multiple genes of two of these subfamilies (Ariadne and Plant II) appeared in early plant evolution. It is deduced that the common ancestor of all plants contained at least five RBR genes and the available data suggest that this number has been increasing slowly along streptophyta evolution, although losses, especially of Helicase RBR genes, have also occurred in several lineages. Some higher plants (e.g., Arabidopsis thaliana, Oryza sativa) contain a very large number of RBR genes and many of them were recently generated by tandem duplications. Microarray data indicate that most of these new genes have low-level and sometimes specific expression patterns. By contrast, and as occurs in animals, a small set of older genes are broadly expressed at higher levels.
The available data suggest that the dynamics of appearance and conservation of RBR genes are quite different in plants from what has been described in animals. In animals, many structurally diverse RBR subfamilies emerged abruptly in early animal history, followed by losses of multiple genes in particular lineages. These patterns are not observed in plants. It is also shown that while both plants and animals contain a small, similar set of essential RBR genes, the rest evolve differently. The functional implications of these results are discussed.
PMCID: PMC2904391  PMID: 20644651
9.  Comparative analyses reveal distinct sets of lineage-specific genes within Arabidopsis thaliana 
The availability of genome and transcriptome sequences for a number of species permits the identification and characterization of conserved as well as divergent genes such as lineage-specific genes which have no detectable sequence similarity to genes from other lineages. While genes conserved among taxa provide insight into the core processes among species, lineage-specific genes provide insights into evolutionary processes and biological functions that are likely clade or species specific.
Comparative analyses using the Arabidopsis thaliana genome and sequences from 178 other species within the Plant Kingdom enabled the identification of 24,624 A. thaliana genes (91.7%) that were termed Evolutionary Conserved (EC) as defined by sequence similarity to a database entry as well as two sets of lineage-specific genes within A. thaliana. One of the A. thaliana lineage-specific gene sets shares sequence similarity only with sequences from species within the Brassicaceae family and is termed Conserved Brassicaceae-Specific Genes (914, 3.4%, CBSG). The other set of A. thaliana lineage-specific genes, the Arabidopsis Lineage-Specific Genes (1,324, 4.9%, ALSG), lacks sequence similarity to any sequence outside A. thaliana. While many CBSGs (76.7%) and ALSGs (52.9%) are transcribed, the majority of the CBSGs (76.1%) and ALSGs (94.4%) have no annotated function. Co-expression analysis indicated significant enrichment of the CBSGs and ALSGs in multiple functional categories suggesting their involvement in a wide range of biological functions. Subcellular localization prediction revealed that the CBSGs were significantly enriched in proteins targeted to the secretory pathway (412, 45.1%). Among the 107 putatively secreted CBSGs with known functions, 67 encode a putative pollen coat protein or cysteine-rich protein with sequence similarity to the S-locus cysteine-rich protein that is the pollen determinant controlling allele specific pollen rejection in self-incompatible Brassicaceae species. Overall, the ALSGs and CBSGs were more highly methylated in floral tissue compared to the ECs. Single Nucleotide Polymorphism (SNP) analysis showed an elevated ratio of non-synonymous to synonymous SNPs within the ALSGs (1.99) and CBSGs (1.65) relative to the EC set (0.92), mainly caused by an elevated number of non-synonymous SNPs, indicating that they are fast-evolving at the protein sequence level.
Our analyses suggest that while a significant fraction of the A. thaliana proteome is conserved within the Plant Kingdom, evolutionarily distinct sets of genes that may function in defining biological processes unique to these lineages have arisen within the Brassicaceae and A. thaliana.
PMCID: PMC2829037  PMID: 20152032
10.  Evolution of Stress-Regulated Gene Expression in Duplicate Genes of Arabidopsis thaliana 
PLoS Genetics  2009;5(7):e1000581.
Due to the selection pressure imposed by highly variable environmental conditions, stress sensing and regulatory response mechanisms in plants are expected to evolve rapidly. One potential source of innovation in plant stress response mechanisms is gene duplication. In this study, we examined the evolution of stress-regulated gene expression among duplicated genes in the model plant Arabidopsis thaliana. Key to this analysis was reconstructing the putative ancestral stress regulation pattern. By comparing the expression patterns of duplicated genes with the patterns of their ancestors, we found that duplicated genes likely lost and gained stress responses at a rapid rate initially, but that the rate is close to zero when the synonymous substitution rate (a proxy for time) is >∼0.8. When considering duplicated gene pairs, we found that partitioning of putative ancestral stress responses occurred more frequently compared to cases of parallel retention and loss. Furthermore, the pattern of stress response partitioning was extremely asymmetric. An analysis of putative cis-acting DNA regulatory elements in the promoters of the duplicated stress-regulated genes indicated that the asymmetric partitioning of ancestral stress responses is likely due, at least in part, to differential loss of DNA regulatory elements; the duplicated genes losing most of their stress responses were those that had lost more of the putative cis-acting elements. Finally, duplicate genes that lost most or all of the ancestral responses are more likely to have gained responses to other stresses. Therefore, the retention of duplicates that inherit few or no functions seems to be coupled to neofunctionalization. Taken together, our findings provide new insight into the patterns of evolutionary changes in gene stress responses after duplication and lay the foundation for testing the adaptive significance of stress regulatory changes under highly variable biotic and abiotic environments.
Author Summary
Plants have developed a multitude of response mechanisms to survive stressful environments. Since the environment is highly variable, these stress response mechanisms are expected to undergo frequent innovation. Duplicate genes represent a potential source for such innovation. In this paper, we explored the evolutionary changes in stress responses at the transcriptional level among duplicated genes in the model plant Arabidopsis thaliana. We found that after gene duplication, ancestral stress responses tend to be retained by only one of the gene duplicates (partitioning). In addition, the pattern of partitioning of multiple stress responses is extremely asymmetric, where one duplicate tends to inherit most or all of the ancestral stress responses. We present evidence that the asymmetric loss of stress responses is correlated with the asymmetric loss of putative transcription factor binding sites. Interestingly, those duplicate genes inheriting few or no ancestral responses tend to have gained new stress responses, providing support for the model that gene duplicates are a source of innovation. Our findings provide important insight into the mechanisms of gene function evolution and lay the foundation for experimental studies to determine the significance of gain of stress responses in plant adaptation.
PMCID: PMC2709438  PMID: 19649161
11.  Identification of Coevolving Residues and Coevolution Potentials Emphasizing Structure, Bond Formation and Catalytic Coordination in Protein Evolution 
PLoS ONE  2009;4(3):e4762.
The structure and function of a protein are dependent on coordinated interactions between its residues. The selective pressures associated with a mutation at one site should therefore depend on the amino acid identity of interacting sites. Mutual information has previously been applied to multiple sequence alignments as a means of detecting coevolutionary interactions. Here, we introduce a refinement of the mutual information method that: 1) removes a significant, non-coevolutionary bias and 2) accounts for heteroscedasticity. Using a large, non-overlapping database of protein alignments, we demonstrate that predicted coevolving residue-pairs tend to lie in close physical proximity. We introduce coevolution potentials as a novel measure of the propensity for the 20 amino acids to pair amongst predicted coevolutionary interactions. Ionic, hydrogen, and disulfide bond-forming pairs exhibited the highest potentials. Finally, we demonstrate that pairs of catalytic residues have a significantly increased likelihood to be identified as coevolving. These correlations to distinct protein features verify the accuracy of our algorithm and are consistent with a model of coevolution in which selective pressures towards preserving residue interactions act to shape the mutational landscape of a protein by restricting the set of admissible neutral mutations.
PMCID: PMC2651771  PMID: 19274093
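For readers unfamiliar with the baseline that entry 11 builds on, the sketch below computes plain mutual information between two columns of a multiple sequence alignment. It shows only the standard, uncorrected statistic; the bias removal and heteroscedasticity correction introduced in the paper are not reproduced here. The function name `column_mutual_information` and the choice to skip gapped positions are assumptions made for this example.

```python
from collections import Counter
from math import log2

def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a, col_b : equal-length strings, one residue per sequence.
    Positions where either column has a gap ('-') are skipped
    (an illustrative choice, not taken from the paper).
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same alignment")
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                 # counts of residue pairs
    freq_a = Counter(a for a, _ in pairs)  # marginal counts, column A
    freq_b = Counter(b for _, b in pairs)  # marginal counts, column B
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) written with raw counts
        mi += p_ab * log2(count * n / (freq_a[a] * freq_b[b]))
    return mi

# Perfectly covarying columns versus independent columns.
print(column_mutual_information("AAKK", "DDEE"))
print(column_mutual_information("AAKK", "DEDE"))
```

Raw mutual information of this kind is known to be inflated by shared phylogeny and by column entropy, which is the sort of non-coevolutionary signal a refined method has to remove before scores can be compared across residue pairs.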
12.  Predicting Quantitative Genetic Interactions by Means of Sequential Matrix Approximation 
PLoS ONE  2008;3(9):e3284.
Despite the emerging experimental techniques for perturbing multiple genes and measuring their quantitative phenotypic effects, genetic interactions have remained extremely difficult to predict on a large scale. Using a recent high-resolution screen of genetic interactions in yeast as a case study, we investigated whether the extraction of pertinent information encoded in the quantitative phenotypic measurements could be improved by computational means. By taking advantage of the observation that most gene pairs in the genetic interaction screens have no significant interactions with each other, we developed a sequential approximation procedure which ranks the mutation pairs in order of evidence for a genetic interaction. The sequential approximations can efficiently remove background variation in the double-mutation screens and give increasingly accurate estimates of the single-mutant fitness measurements. Interestingly, these estimates not only provide predictions for genetic interactions which are consistent with those obtained using the measured fitness, but they can even significantly improve the accuracy with which one can distinguish functionally-related gene pairs from the non-interacting pairs. The computational approach, in general, enables an efficient exploration and classification of genetic interactions in other studies and systems as well.
PMCID: PMC2538561  PMID: 18818762
13.  Two-Component Signaling Elements and Histidyl-Aspartyl Phosphorelays
Two-component systems are an evolutionarily ancient means for signal transduction. These systems are comprised of a number of distinct elements, namely histidine kinases, response regulators, and in the case of multi-step phosphorelays, histidine-containing phosphotransfer proteins (HPts). Arabidopsis makes use of a two-component signaling system to mediate the response to the plant hormone cytokinin. Two-component signaling elements have also been implicated in plant responses to ethylene, abiotic stresses, and red light, and in regulating various aspects of plant growth and development. Here we present an overview of the two-component signaling elements found in Arabidopsis, including functional and phylogenetic information on both bona-fide and divergent elements.
PMCID: PMC3243373  PMID: 22303237
14.  Correction: Global Analysis of Genetic, Epigenetic and Transcriptional Polymorphisms in Arabidopsis thaliana Using Whole Genome Tiling Arrays 
PLoS Genetics  2008;4(6):10.1371/annotation/e21d3565-fec6-44d9-8fab-83da49c7c0b8.
PMCID: PMC2645276
15.  Super-Genotype: Global Monoclonality Defies the Odds of Nature 
PLoS ONE  2007;2(7):e590.
The ability to respond to natural selection under novel conditions is critical for the establishment and persistence of introduced alien species and their ability to become invasive. Here we correlated neutral and quantitative genetic diversity of the weed Pennisetum setaceum Forsk. Chiov. (Poaceae) with differing global (North American and African) patterns of invasiveness and compared this diversity to native range populations. Numerous molecular markers indicate complete monoclonality within and among all of these areas (FST = 0.0), a finding supported by extremely low quantitative trait variance (QST = 0.00065–0.00952). The results support the general-purpose-genotype hypothesis, in which a single genotype can tolerate all environmental variation. However, a single global genotype and widespread invasiveness under numerous environmental conditions suggest a super-genotype. The super-genotype described here likely evolved high levels of plasticity in response to fluctuating environmental conditions during the Early to Mid Holocene. During the Late Holocene, when environmental conditions were predominantly constant but extremely inclement, strong selection resulted in only a few surviving genotypes.
PMCID: PMC1895887  PMID: 17611622
16.  A Two-Locus Global DNA Barcode for Land Plants: The Coding rbcL Gene Complements the Non-Coding trnH-psbA Spacer Region 
PLoS ONE  2007;2(6):e508.
A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level.
Methodology/Principal Findings
Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species.
A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.
PMCID: PMC1876818  PMID: 17551588
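The per-genus discrimination percentages reported in entry 16 can be pictured with a small calculation. The Python sketch below assumes each genus contributes one congeneric species pair with pre-aligned sequences per locus, and treats a pair as discriminated when the sequences differ at any of the chosen loci. That criterion, the function name `discrimination_rate`, and the toy sequences are illustrative assumptions; the study's own divergence measures and data are not reproduced.

```python
def discrimination_rate(pairs_by_genus, loci):
    """Fraction of genera whose species pair differs at any given locus.

    pairs_by_genus : {genus: {locus: (seq_species_1, seq_species_2)}}
    loci           : iterable of locus names to combine
    """
    loci = list(loci)
    discriminated = sum(
        1
        for locus_seqs in pairs_by_genus.values()
        if any(locus_seqs[locus][0] != locus_seqs[locus][1] for locus in loci)
    )
    return discriminated / len(pairs_by_genus)

# Hypothetical toy data: four genera, two loci, two species per genus.
TOY_PAIRS = {
    "GenusA": {"rbcL": ("ACGT", "ACGA"), "trnH-psbA": ("TTAA", "TTAA")},
    "GenusB": {"rbcL": ("ACGT", "ACGT"), "trnH-psbA": ("TTAA", "TTGA")},
    "GenusC": {"rbcL": ("ACGT", "ACCT"), "trnH-psbA": ("TTAA", "TAAA")},
    "GenusD": {"rbcL": ("ACGT", "ACGT"), "trnH-psbA": ("TTAA", "TTAA")},
}

for combo in (["rbcL"], ["trnH-psbA"], ["rbcL", "trnH-psbA"]):
    print(combo, discrimination_rate(TOY_PAIRS, combo))
```

In the toy data each locus alone separates half of the genera while the pair separates three quarters, which mirrors the general reason a two-locus barcode can outperform any single locus.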
17.  Patterns of expansion and expression divergence in the plant polygalacturonase gene family 
Genome Biology  2006;7(9):R87.
Analysis of Arabidopsis and rice polygalacturonases suggests that polygalacturonase duplicates underwent rapid expression divergence and that the mechanisms of duplication affect the divergence rate.
Polygalacturonases (PGs) belong to a large gene family in plants and are believed to be responsible for various cell separation processes. PG activities have been shown to be associated with a wide range of plant developmental programs such as seed germination, organ abscission, pod and anther dehiscence, pollen grain maturation, fruit softening and decay, xylem cell formation, and pollen tube growth, thus illustrating divergent roles for members of this gene family. A close look at phylogenetic relationships among Arabidopsis and rice PGs accompanied by analysis of expression data provides an opportunity to address key questions on the evolution and functions of duplicate genes.
We found that both tandem and whole-genome duplications contribute significantly to the expansion of this gene family but are associated with substantial gene losses. In addition, there are at least 21 PGs in the common ancestor of Arabidopsis and rice. We have also determined the relationships between Arabidopsis and rice PGs and their expression patterns in Arabidopsis to provide insights into the functional divergence between members of this gene family. By evaluating expression in five Arabidopsis tissues and during five stages of abscission, we found overlapping but distinct expression patterns for most of the different PGs.
Expression data suggest specialized roles or subfunctionalization for each PG gene member. PGs derived from whole genome duplication tend to have more similar expression patterns than those derived from tandem duplications. Our findings suggest that PG duplicates underwent rapid expression divergence and that the mechanisms of duplication affect the divergence rate.
PMCID: PMC1794546  PMID: 17010199
18.  Using Natural Selection to Explore the Adaptive Potential of Chlamydomonas reinhardtii 
PLoS ONE  2014;9(3):e92533.
Improving feedstock is critical to facilitate the commercial utilization of algae, in particular in open pond systems where, due to the presence of competitors and pests, high algal growth rates and stress tolerance are beneficial. Here we raised laboratory cultures of the model alga Chlamydomonas reinhardtii under serial dilution to explore the potential of crop improvement using natural selection. The alga was evolved for 1,880 generations in liquid medium under continuous light (EL population). At the end of the experiment, EL cells had a growth rate that was 35% greater than the progenitor population (PL). The removal of acetate from the medium demonstrated that EL growth enhancement largely relied on efficient usage of this organic carbon source. Genome re-sequencing uncovered 1,937 polymorphic DNA regions in the EL population with 149 single nucleotide polymorphisms resulting in amino acid substitutions. Transcriptome analysis showed, in the EL population, significant up regulation of genes involved in protein synthesis, the cell cycle and cellular respiration, whereas the DNA repair pathway and photosynthesis were down regulated. Like other algae, EL cells accumulated neutral lipids under nitrogen depletion. Our work demonstrates transcriptome and genome-wide impacts of natural selection on algal cells and points to a useful strategy for strain improvement.
PMCID: PMC3962425  PMID: 24658261
19.  A Modified ABCDE Model of Flowering in Orchids Based on Gene Expression Profiling Studies of the Moth Orchid Phalaenopsis aphrodite 
PLoS ONE  2013;8(11):e80462.
Previously we developed genomic resources for orchids, including transcriptomic analyses using next-generation sequencing techniques and construction of a web-based orchid genomic database. Here, we report a modified molecular model of flower development in the Orchidaceae based on functional analysis of gene expression profiles in Phalaenopsis aphrodite (a moth orchid) that revealed novel roles for the transcription factors involved in floral organ pattern formation. Phalaenopsis orchid floral organ-specific genes were identified by microarray analysis. Several critical transcription factors, including AP3, PI, AP1 and AGL6, displayed distinct spatial distribution patterns. Phylogenetic analysis of orchid MADS box genes was conducted to infer the evolutionary relationship among floral organ-specific genes. The results suggest that duplication of MADS box genes in orchids may have resulted in their gaining novel functions during evolution. Based on these analyses, a modified model of orchid flowering was proposed. Comparison of the expression profiles of flowers of a peloric mutant and wild-type Phalaenopsis orchid further identified genes associated with lip morphology and peloric effects. Large scale investigation of gene expression profiles revealed that class A and B homeotic genes from the ABCDE model of flower development have novel functions in the Phalaenopsis orchid due to evolutionary diversification, and display differential expression patterns.
PMCID: PMC3827201  PMID: 24265826
20.  Ethylene Response Factor 6 Is a Regulator of Reactive Oxygen Species Signaling in Arabidopsis 
PLoS ONE  2013;8(8):e70289.
Reactive oxygen species (ROS) are produced in plant cells in response to diverse biotic and abiotic stresses as well as during normal growth and development. Although a large number of transcription factor (TF) genes are up- or down-regulated by ROS, currently very little is known about the functions of these TFs during oxidative stress. In this work, we examined the role of ERF6 (ETHYLENE RESPONSE FACTOR6), an AP2/ERF domain-containing TF, during oxidative stress responses in Arabidopsis. Mutant analyses showed that NADPH oxidase (RbohD) and calcium signaling are required for ROS-responsive expression of ERF6. erf6 insertion mutant plants showed reduced growth and increased H2O2 and anthocyanin levels. Expression analyses of selected ROS-responsive genes during oxidative stress identified several differentially expressed genes in the erf6 mutant. In particular, a number of ROS responsive genes, such as ZAT12, HSFs, WRKYs, MAPKs, RBOHs, DHAR1, APX4, and CAT1 were more strongly induced by H2O2 in erf6 plants than in wild-type. In contrast, MDAR3, CAT3, VTC2 and EX1 showed reduced expression levels in the erf6 mutant. Taken together, our results indicate that ERF6 plays an important role as a positive antioxidant regulator during plant growth and in response to biotic and abiotic stresses.
PMCID: PMC3734174  PMID: 23940555
21.  Genome-Wide Survey of Cold Stress Regulated Alternative Splicing in Arabidopsis thaliana with Tiling Microarray 
PLoS ONE  2013;8(6):e66511.
Alternative splicing plays a major role in expanding the potential informational content of eukaryotic genomes. It is an important post-transcriptional regulatory mechanism that can increase protein diversity and affect mRNA stability. Alternative splicing is often regulated in a tissue-specific and stress-responsive manner. Cold stress, which adversely affects plant growth and development, regulates the transcription and splicing of plant splicing factors. This can affect the pre-mRNA processing of many genes. To identify cold regulated alternative splicing we applied Affymetrix Arabidopsis tiling arrays to survey the transcriptome under cold treatment conditions. A novel algorithm was used for detection of statistically relevant changes in intron expression within a transcript between control and cold growth conditions. A reverse transcription polymerase chain reaction (RT-PCR) analysis of a number of randomly selected genes confirmed the changes in splicing patterns under cold stress predicted by tiling array. Our analysis revealed new types of cold responsive genes. While their expression level remains relatively unchanged under cold stress their splicing pattern shows detectable changes in the relative abundance of isoforms. The majority of cold regulated alternative splicing introduced a premature termination codon (PTC) into the transcripts creating potential targets for degradation by the nonsense mediated mRNA decay (NMD) process. A number of these genes were analyzed in NMD-defective mutants by RT-PCR and shown to evade NMD. This may result in new and truncated proteins with altered functions or dominant negative effects. The results indicate that cold affects both quantitative and qualitative aspects of gene expression.
PMCID: PMC3679080  PMID: 23776682
22.  Transcriptome Exploration in Leymus chinensis under Saline-Alkaline Treatment Using 454 Pyrosequencing 
PLoS ONE  2013;8(1):e53632.
Leymus chinensis (Trin.) Tzvel. is a highly saline-alkaline-tolerant forage grass of the Gramineae family that also plays an important role in protecting the natural environment. To date, little is known about the saline-alkaline tolerance of L. chinensis at the molecular level. To better understand the molecular mechanism of saline-alkaline tolerance in L. chinensis, 454 pyrosequencing was used for a transcriptome study.
We used Roche-454 massively parallel pyrosequencing technology to sequence two cDNA libraries built from control and saline-alkaline-treated samples (optimal stress concentration: Hoagland solution with 100 mM NaCl and 200 mM NaHCO3). A total of 363,734 reads in the control group and 526,267 reads in the treatment group, with average lengths of 489 bp and 493 bp, respectively, were obtained. The reads were assembled into 104,105 unigenes with the MIRA sequence assembly software, of which 73,665 unigenes were in the control group, 88,016 in the treatment group, and 57,576 in both groups. According to the comparative expression analysis between the two groups with the threshold of “log2 Ratio ≥1”, 36,497 up-regulated unigenes and 18,218 down-regulated unigenes were predicted to be differentially expressed. After gene annotation and pathway enrichment analysis, most of them were found to be involved in stress and tolerance functions, signal transduction, energy production and conversion, and inorganic ion transport. Furthermore, 16 of these differentially expressed genes were selected for real-time PCR validation, and the results were consistent with those of 454 pyrosequencing.
This work is the first study of the transcriptome of L. chinensis under saline-alkaline treatment based on the 454-FLX massively parallel DNA sequencing platform. It also deepens the understanding of the molecular mechanisms of saline-alkaline tolerance in L. chinensis and provides a database for future studies.
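The unigene counts reported in this abstract can be cross-checked with simple inclusion-exclusion arithmetic. The following Python sketch is illustrative only: the variable names are hypothetical, and the numbers are copied from the abstract rather than recomputed from the sequencing data.

```python
# Counts copied from the abstract; all variable names are illustrative.
control_unigenes = 73_665      # unigenes found in the control library
treatment_unigenes = 88_016    # unigenes found in the treated library
shared_unigenes = 57_576       # unigenes found in both libraries
total_unigenes = 104_105       # total assembled unigenes reported

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
union_size = control_unigenes + treatment_unigenes - shared_unigenes

# Unigenes specific to one library
control_only = control_unigenes - shared_unigenes
treatment_only = treatment_unigenes - shared_unigenes

# Unigenes passing the reported differential-expression threshold
up_regulated = 36_497
down_regulated = 18_218
differentially_expressed = up_regulated + down_regulated

print("union matches reported total:", union_size == total_unigenes)
print("control-only:", control_only)
print("treatment-only:", treatment_only)
print("differentially expressed:", differentially_expressed)
```

The union of the two libraries equals the reported total, so the three per-group counts are mutually consistent.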
PMCID: PMC3554714  PMID: 23365637
23.  Differential Gene Expression in Soybean Leaf Tissues at Late Developmental Stages under Drought Stress Revealed by Genome-Wide Transcriptome Analysis 
PLoS ONE  2012;7(11):e49522.
The availability of the complete genome sequence of soybean has allowed the research community to design the 66 K Affymetrix Soybean Array GeneChip for genome-wide expression profiling of soybean. In this study, we carried out microarray analysis of leaf tissues of soybean plants that were subjected to drought stress from the late vegetative V6 and the full bloom reproductive R2 stages. Our data analyses showed that out of 46093 soybean genes, which were predicted with high confidence among approximately 66000 putative genes, 41059 genes could be assigned a known function. Using the criteria of a ratio change ≥ 2 and a q-value < 0.05, we identified 1458 and 1818 upregulated and 1582 and 1688 downregulated genes in drought-stressed V6 and R2 leaves, respectively. These datasets were classified into the 19 most abundant biological categories with similar proportions. Only 612 upregulated and 463 downregulated genes overlapped between the two stages, suggesting that both conserved and non-conserved pathways might be involved in the regulation of drought response at different stages of plant development. A comparative expression analysis using our datasets and those of drought-stressed Arabidopsis leaves revealed the existence of both conserved and species-specific mechanisms that regulate drought responses. Many upregulated genes encode either regulatory proteins, such as transcription factors (including those with high homology to Arabidopsis DREB, NAC, AREB and ZAT/STZ transcription factors), kinases and two-component system members, or functional proteins, e.g. late embryogenesis-abundant proteins, glycosyltransferases, glycoside hydrolases, defensins and glyoxalase I family proteins. A detailed analysis of the GmNAC family and the hormone-related gene category showed that expression of many GmNAC and hormone-related genes was altered by drought in V6 and/or R2 leaves.
Additionally, the downregulation of many photosynthesis-related genes, which contribute to growth retardation under drought stress, may serve as an adaptive mechanism for plant survival. This study has identified excellent drought-responsive candidate genes for in-depth characterization and future development of improved drought-tolerant transgenic soybeans.
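The selection criteria quoted in this abstract (a ratio change of at least 2 and a q-value below 0.05) can be sketched as a small filter. This is a minimal Python illustration under stated assumptions, not the authors' pipeline: the `ProbeResult` type, the `classify` function, and the toy values are all hypothetical, and treating a ratio of 1/2 or less as downregulation is an assumption about how the two-fold criterion was applied in both directions.

```python
from typing import NamedTuple


class ProbeResult(NamedTuple):
    """Hypothetical record for one gene; not a structure from the study."""
    gene_id: str
    ratio: float    # stressed / control expression ratio (toy values)
    q_value: float  # multiple-testing-adjusted significance (toy values)


def classify(result: ProbeResult, min_fold: float = 2.0, max_q: float = 0.05) -> str:
    """Label a gene 'up', 'down', or 'unchanged' under the two criteria."""
    if result.q_value >= max_q:
        return "unchanged"          # fails the q-value < 0.05 criterion
    if result.ratio >= min_fold:
        return "up"                 # at least two-fold higher under stress
    if result.ratio <= 1.0 / min_fold:
        return "down"               # assumed: at least two-fold lower
    return "unchanged"              # significant but below two-fold


# Toy data invented for illustration only.
toy_results = [
    ProbeResult("gene_a", 3.2, 0.01),
    ProbeResult("gene_b", 0.4, 0.03),
    ProbeResult("gene_c", 2.5, 0.20),
    ProbeResult("gene_d", 1.3, 0.001),
]

calls = {r.gene_id: classify(r) for r in toy_results}
print(calls)
```

Requiring both criteria at once means that a large ratio with weak statistical support (gene_c) and a well-supported but small change (gene_d) are both excluded.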
PMCID: PMC3505142  PMID: 23189148
24.  Genome-Wide Identification and Characterization of R2R3MYB Family in Cucumis sativus 
PLoS ONE  2012;7(10):e47576.
The R2R3MYB proteins comprise one of the largest families of transcription factors in plants. Although genome-wide analysis of this family has been carried out in some species, little is known about R2R3MYB genes in cucumber (Cucumis sativus L.).
Principal Findings
This study identified 55 R2R3MYB genes in the latest cucumber genome. The CsR2R3MYB family contains the smallest number of identified genes among the species studied so far, owing to the absence of recent gene duplication events. These results were also supported by genome distribution and gene duplication analysis. Phylogenetic analysis showed that the genes could be classified into 11 subgroups. The evolutionary relationships and the intron-exon organizations, which showed similarities with Arabidopsis, Vitis and Glycine R2R3MYB proteins, were also analyzed and suggested strong gene conservation but also expansions of particular functional genes during the evolution of the plant species. In addition, we found that 8 out of 55 (∼14.54%) cucumber R2R3MYB genes underwent alternative splicing events, producing a variety of transcripts from a single gene, which illustrates the extremely high complexity of transcriptome regulation. Tissue-specific expression profiles showed that 50 cucumber R2R3MYB genes were expressed in at least one of the tissues and the other 5 genes showed very low expression in all tissues tested, which suggests that cucumber R2R3MYB genes take part in many cellular processes. The transcript abundance analysis under abiotic stress conditions (NaCl, ABA and low temperature treatments) identified a group of R2R3MYB genes that responded to one or more treatments.
This study has produced a comparative genomics analysis of the cucumber R2R3MYB gene family and has provided the first steps towards the selection of CsR2R3MYB genes for cloning and functional dissection that can be used in further studies to uncover their roles in cucumber growth and development.
PMCID: PMC3479133  PMID: 23110079
25.  Comparative Transcriptome Profiling of Chilling Stress Responsiveness in Two Contrasting Rice Genotypes 
PLoS ONE  2012;7(8):e43274.
Rice is sensitive to chilling stress, especially at the seedling stage. To elucidate the molecular genetic mechanisms of chilling tolerance in rice, comprehensive gene expression profiles of two rice genotypes (chilling-tolerant LTH and chilling-sensitive IR29) with contrasting responses to chilling stress were comparatively analyzed. Results revealed differential constitutive gene expression prior to stress and distinct global transcription reprogramming between the two rice genotypes under time-series chilling stress and subsequent recovery conditions. A set of genes with higher basal expression was identified in chilling-tolerant LTH compared with chilling-sensitive IR29, indicating their possible role in intrinsic tolerance to chilling stress. Under chilling stress, the major effect on gene expression was up-regulation in the chilling-tolerant genotype and strong repression in the chilling-sensitive genotype. Early responses to chilling stress in both genotypes featured commonly up-regulated genes related to transcription regulation and signal transduction, while functional categories for late-phase chilling-regulated genes were diverse, with a wide range of functional adaptations to continuous stress. Following the cessation of chilling treatments, there was quick and efficient reversion of gene expression in the chilling-tolerant genotype, while the chilling-sensitive genotype displayed considerably slower recovery at the transcriptional level. In addition, the detection of differentially regulated TF genes and enriched cis-elements demonstrated that multiple regulatory pathways, including the CBF and MYBS3 regulons, were involved in chilling stress tolerance. A number of the chilling-regulated genes identified in this study were co-localized with previously fine-mapped cold-tolerance-related QTLs, providing candidates for gene cloning and elucidation of the molecular mechanisms responsible for chilling tolerance in rice.
PMCID: PMC3422246  PMID: 22912843

