Since its initial identification as an HIV-1-inducible gene in 2002, astrocyte elevated gene-1 (AEG-1), subsequently cloned as metadherin (MTDH) and lysine-rich CEACAM1 coisolated (LYRIC), has emerged over the past 10 years as an important oncogene providing a valuable prognostic marker in patients with various cancers. Recent studies demonstrate that AEG-1/MTDH/LYRIC is a pleiotropic protein that can localize in the cell membrane, cytoplasm, endoplasmic reticulum (ER), nucleus, and nucleolus, and contributes to diverse signaling pathways such as PI3K–AKT, NF-κB, MAPK, and Wnt. In addition to tumorigenesis, this multifunctional protein is implicated in various physiological and pathological processes including development, neurodegeneration, and inflammation. The present review focuses on the discovery of AEG-1/MTDH/LYRIC and conceptualizes areas of future direction for this intriguing gene. We begin by describing how AEG-1, MTDH, and LYRIC were initially identified by different research groups and then discuss AEG-1 structure, functions, localization, and evolution. We conclude with a discussion of the expression profile of AEG-1/MTDH/LYRIC in the context of cancer, neurological disorders, inflammation, and embryogenesis, and discuss how AEG-1/MTDH/LYRIC is regulated. This introductory discussion of AEG-1/MTDH/LYRIC will serve as the basis for the detailed discussions in other chapters of the unique properties of this intriguing molecule.
Oil palm is the most productive oil-bearing crop. Planted on only 5% of the total vegetable oil acreage, palm oil accounts for 33% of vegetable oil, and 45% of edible oil worldwide, but increased cultivation competes with dwindling rainforest reserves. We report the 1.8 gigabase (Gb) genome sequence of the African oil palm Elaeis guineensis, the predominant source of worldwide oil production. 1.535 Gb of assembled sequence and transcriptome data from 30 tissue types were used to predict at least 34,802 genes, including oil biosynthesis genes and homologues of WRINKLED1 (WRI1), and other transcriptional regulators [1], which are highly expressed in the kernel. We also report the draft sequence of the South American oil palm Elaeis oleifera, which has the same number of chromosomes (2n=32) and produces fertile interspecific hybrids with E. guineensis [2], but appears to have diverged in the New World. Segmental duplications of chromosome arms define the palaeotetraploid origin of palm trees. The oil palm sequence enables the discovery of genes for important traits as well as somaclonal epigenetic alterations which restrict the use of clones in commercial plantings [3], and thus helps achieve sustainability for biofuels and edible oils, reducing the rainforest footprint of this tropical plantation crop.
The recent availability of sequenced genomes from a broad array of chordates (cephalochordates, urochordates and vertebrates) has allowed us to systematically analyze the evolution of uroplakins: tetraspanins (UPK1a and UPK1b families) and their respective partner proteins (UPK2 and UPK3 families).
We report here: (1) the origin of uroplakins in the common ancestor of vertebrates, (2) the appearance of several residues that have statistically significantly positive dN/dS ratios in the duplicated paralogs of uroplakin genes, and (3) the existence of strong coevolutionary relationships between UPK1a/1b tetraspanins and their respective UPK2/UPK3-related partner proteins. Moreover, we report the existence of three new UPK2/3 family members we named UPK2b, 3c and 3d, which will help clarify the evolutionary relationships between fish, amphibian and mammalian uroplakins that may perform divergent functions specific to these different and physiologically distinct groups of vertebrates.
Since our analyses cover species of all major chordate groups, this work provides a clear overall picture of how the uroplakin families and their partner proteins have evolved in parallel. We also highlight several novel features of uroplakin evolution, including the appearance of UPK2b and 3d in fish and UPK3c in the common ancestor of reptiles and mammals. Additional studies of these novel uroplakins should lead to new insights into uroplakin structure and function.
The species Alphapapillomavirus 7 (alpha-7) contains human papillomavirus genotypes that account for 15% of invasive cervical cancers and are disproportionately associated with adenocarcinoma of the cervix. Complete genome analyses enable identification and nomenclature of variant lineages and sublineages.
The URR/E6 region was sequenced to screen for novel variants of HPV18, 39, 45, 59, 68, 70, 85 and 97 from 1147 cervical samples obtained from multiple geographic regions that had previously been shown to contain an alpha-7 HPV isolate. To study viral heterogeneity, the complete 8 kb genomes of 128 isolates, including 109 sequenced for this analysis, were annotated and analyzed. Viral evolution was characterized by constructing phylogenetic trees using maximum-likelihood and Bayesian algorithms. Global and pairwise alignments were used to calculate total and ORF/region nucleotide differences; lineages and sublineages were assigned using an alphanumeric system. The prototype genome was assigned to the A lineage or A1 sublineage.
The genomic diversity of alpha-7 HPV types ranged from 1.1% to 6.7% nucleotide sequence differences; the extent of genome-genome pairwise intratype heterogeneity was 1.1% for HPV39, 1.3% for HPV59, 1.5% for HPV45, 1.6% for HPV70, 2.1% for HPV18, and 6.7% for HPV68. ME180 (previously a subtype of HPV68) was designated as the representative genome for HPV68 sublineage C1. Each ORF/region differed in sequence diversity, from most variable to least variable: noncoding region 1 (NCR1) / noncoding region 2 (NCR2) > upstream regulatory region (URR) > E6 / E7 > E2 / L2 > E1 / L1.
These data provide estimates of the maximum viral genomic heterogeneity of alpha-7 HPV type variants. The proposed taxonomic system facilitates the comparison of variants across epidemiological and molecular studies. Sequence diversity, geographic distribution and phylogenetic topology of this clinically important group of HPVs suggest an independent evolutionary history for each type.
Genomic and transcriptomic sequence data are essential tools for tackling ecological problems. Using an approach that combines next-generation sequencing, de novo transcriptome assembly, gene annotation and synthetic gene construction, we identify and cluster the protein families from Favia corals from the northern Red Sea.
We obtained 80 million 75 bp paired-end cDNA reads from two Favia adult samples collected at 65 m (Fav1, Fav2) on the Illumina GA platform, and generated two de novo assemblies using ABySS and CAP3. After removing redundancy and filtering out low-quality reads, our transcriptome datasets contained 58,268 (Fav1) and 62,469 (Fav2) contigs longer than 100 bp, with N50 values of 1,665 bp and 1,439 bp, respectively. Using the proteome of the sea anemone Nematostella vectensis as a reference, we were able to annotate almost 20% of each dataset using reciprocal homology searches. Homologous clustering of these annotated transcripts allowed us to divide them into 7,186 (Fav1) and 6,862 (Fav2) homologous transcript clusters (E-value ≤ 2e-30). Functional annotation categories were assigned to homologous clusters using the functional annotation of Nematostella vectensis. General annotation of the assembled transcripts was improved by 1–3% using the Acropora digitifera proteome. In addition, we screened these transcript isoform clusters for fluorescent protein (FP) homologs and identified seven potential FP homologs in Fav1, and four in Fav2. These transcripts were validated as bona fide FP transcripts by robust fluorescence upon heterologous expression. Annotation of the assembled contigs revealed that 1.34% and 1.61% (in Fav1 and Fav2, respectively) of the total assembled contigs likely originated from the corals’ algal symbiont, Symbiodinium spp.
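The N50 values quoted above summarize assembly contiguity. As a minimal sketch, assuming the common definition (the length of the contig at which the running total of length-sorted contigs first reaches half of the total assembly length), N50 can be computed as follows. The function name `n50` and the toy contig lengths are illustrative and are not taken from the Favia datasets.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: sort contigs from longest to shortest and return
    the length of the contig at which the cumulative length first reaches
    at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths in bp, not the Favia contigs).
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))
```

In practice a minimum-length filter, such as the 100 bp cutoff mentioned above, would be applied to the contig list before the statistic is computed.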
Here we present a study to identify the homologous transcript isoform clusters from the transcriptome of Favia corals using a distantly related reference proteome. Furthermore, the symbiont-derived transcripts were isolated from the datasets and their contribution quantified. This is the first annotated transcriptome of the genus Favia, representing a major increase in the genomic resources available for this important family of corals.
K-mer; Contig; Open reading frame; Fluorescent protein; Blast; Clustering; High-throughput sequencing; Illumina paired-end; Coral
Population-level studies of parasites have the potential to elucidate patterns of host movement and cross-species interactions that are not evident from host genealogy alone. Bat flies are obligate and generally host-specific blood-feeding parasites of bats. Old-World flies in the family Nycteribiidae are entirely wingless and depend on their hosts for long-distance dispersal; their population genetics has been unstudied to date.
We collected a total of 125 bat flies from three Pteropus species (Pteropus vampyrus, P. hypomelanus, and P. lylei) from eight localities in Malaysia, Cambodia, and Vietnam. We identified specimens morphologically and then sequenced three mitochondrial DNA gene fragments (CoI, CoII, cytB; 1744 basepairs total) from a subset of 45 bat flies. We measured genetic diversity, molecular variance, and population genetic subdivision (FST), and used phylogenetic and haplotype network analyses to quantify parasite genetic structure across host species and localities.
All flies were identified as Cyclopodia horsfieldi with the exception of two individuals of Eucampsipoda sundaica. Low levels of population genetic structure were detected between populations of Cyclopodia horsfieldi from across a wide geographic range (~1000 km), and tests for isolation by distance were rejected. AMOVA results support a lack of geographic and host-specific population structure, with molecular variance primarily partitioned within populations. Pairwise FST values from flies collected from island populations of Pteropus hypomelanus in East and West Peninsular Malaysia supported predictions based on previous studies of host genetic structure.
The lack of population genetic structure and morphological variation observed in Cyclopodia horsfieldi is most likely due to frequent contact between flying fox species and subsequent high levels of parasite gene flow. Specifically, we suggest that Pteropus vampyrus may facilitate movement of bat flies between the three Pteropus species in the region. We demonstrate the utility of parasite genetics as an additional layer of information to measure host movement and interspecific host contact. These approaches may have wide implications for understanding zoonotic, epizootic, and enzootic disease dynamics. Bat flies may play a role as vectors of disease in bats, and their competence as vectors of bacterial and/or viral pathogens is in need of further investigation.
Bartonella; Connectivity; Diptera; Flying fox; Ectoparasite; Emerging infectious disease; Gene flow; Nipah virus; Nycteribiidae; Pathogens; Phylogeography
The enigmatic animal phylum Placozoa holds a key position in the metazoan Tree of Life. A simple bauplan makes it appear to be the most basal metazoan known and genetic evidence also points to a position close to the last common metazoan ancestor. Trichoplax adhaerens is the only formally described species in the phylum to date, making the Placozoa the only monotypic phylum in the animal kingdom. However, recent molecular genetic as well as morphological studies have identified a high level of diversity, and hence a potential high level of taxonomic diversity, within this phylum. Different taxa, possibly at different taxonomic levels, are awaiting description. In this review we firstly summarize knowledge on the morphology, phylogenetic position and ecology of the Placozoa. Secondly, we give an overview of placozoan morphological and genetic diversity and finally present an updated distribution of placozoan populations. We conclude that there is great potential and need to erect new taxa and to establish a firm system for this taxonomic tabula rasa.
Glucosyltransferases (Gtfs) catalyze the synthesis of glucans from sucrose and are produced by several species of lactic-acid bacteria. The oral bacterium Streptococcus mutans produces large amounts of glucans through the action of three Gtfs. GtfD produces water-soluble glucan (WSG), GtfB synthesizes water-insoluble glucans (WIG) and GtfC produces mainly WIG but also WSG. These enzymes, especially those synthesizing WIG, are of particular interest because of their role in the formation of dental plaque, an environment where S. mutans can thrive and produce lactic acid, promoting the formation of dental caries. We sequenced the gtfB, gtfC and gtfD genes from several mutans streptococcal strains isolated from the oral cavity of humans and searched for their homologues in strains isolated from chimpanzees and macaque monkeys. The sequence data were analyzed in conjunction with the available Gtf sequences from other bacteria in the genera Streptococcus, Lactobacillus and Leuconostoc to gain insights into the evolutionary history of this family of enzymes, with a particular emphasis on S. mutans Gtfs. Our analyses indicate that streptococcal Gtfs arose from a common ancestral progenitor gene, and that they expanded to form two clades according to the type of glucan they synthesize. We also show that the clade of streptococcal Gtfs synthesizing WIG appeared shortly after the divergence of viviparous, dentate mammals, which potentially contributed to the formation of dental plaque and the establishment of several streptococci in the oral cavity. The two S. mutans Gtfs capable of WIG synthesis, GtfB and GtfC, are likely the product of a gene duplication event. We dated this event to coincide with the divergence of the genomes of ancestral early primates. Thus, the acquisition and diversification of S. mutans Gtfs predates modern humans and is unrelated to the increase in dietary sucrose consumption.
The evolution of the diverse insect lineages is one of the most fascinating issues in evolutionary biology. Despite extensive research in this area, the resolution of insect phylogeny, especially of interordinal relationships, remains a great challenge. One of the challenges for insect systematics is the radiation of the polyneopteran lineages, with several contradictory and/or unresolved relationships. Here, we provide the first transcriptomic data for three enigmatic polyneopteran orders (Dermaptera, Plecoptera, and Zoraptera) to clarify one of the most debated issues in higher-level insect systematics. We applied different approaches to generate three data sets comprising 78 species and 1,579 clusters of orthologous genes. Using these three matrices, we explored several key mechanistic problems of phylogenetic reconstruction including missing data, matrix selection, gene and taxa number/choice, and the biological function of the genes. Based on the first phylogenomic approach including these three ambiguous polyneopteran orders, we provide here conclusive support for monophyletic Polyneoptera, contesting the hypothesis of Zoraptera + Paraneoptera and Plecoptera + remaining Neoptera. In addition, we employ various approaches to evaluate data quality and highlight problematic nodes within the Insect Tree that still exist despite our phylogenomic approach. We further show how the support for these nodes or alternative hypotheses might depend on the taxon- and/or gene-sampling.
polyneoptera; zoraptera; dermaptera; plecoptera; data quality
Aggregatibacter actinomycetemcomitans is implicated in localized aggressive periodontitis. We report the first genome sequence of an A. actinomycetemcomitans strain isolated from an Old World primate.
Alpha human papillomaviruses (HPVs) are among the most common sexually transmitted agents of which a subset causes cervical neoplasia and cancer in humans. Alpha-PVs have also been identified in non-human primates although few studies have systematically characterized such mucosal PVs. We cloned and characterized 10 distinct types of PVs from exfoliated cervicovaginal cells from different populations of female cynomolgus macaques (Macaca fascicularis) originating from China and Indonesia. These include 5 novel genotypes and 5 previously identified genotypes found in rhesus (Macaca mulatta) (RhPV-1, RhPV-a, RhPV-b and RhPV-d) and cynomolgus macaques (MfPV-a). Type-specific primers were designed to amplify the complete PV genomes using an overlapping PCR method. Four MfPVs were associated with cervical intraepithelial neoplasia (CIN). The most prevalent virus type was MfPV-3 (formerly RhPV-d), which was identified in 60% of animals with CIN. In addition, the complete genomes of variants of MfPV-3 and RhPV-1 were characterized. These variants are 97.1% and 97.7% similar across the L1 nucleotide sequences with the prototype genomes, respectively. Sequence comparisons and phylogenetic analyses indicate that these novel MfPVs cluster together within the alpha PV α12 species closely related to the α9 (e.g., HPV16) and α11 species (e.g., HPV34), and all share a most recent common ancestor. Our data expand the molecular diversity of non-human primate PVs and suggest the recent expansion of alpha PV species groups. Moreover, identification of an overlapping set of MfPVs in rhesus and cynomolgus macaques indicates that non-human primate alpha PVs might not be strictly species specific and that “subtypes” may represent recent divergence of host species or past interspecies infection.
alpha papillomavirus; Macaca fascicularis; novel PVs; genomic diversity; evolution
The international wildlife trade is a key threat to biodiversity. Temporal genetic marketplace monitoring can determine if wildlife trade regulation efforts such as the Convention on International Trade in Endangered Species (CITES) are succeeding. Protected under CITES effective 1997, sturgeons and paddlefishes, the producers of black caviar, are flagship CITES species.
We test whether CITES has limited the amount of fraudulent black caviar reaching the marketplace. Using mitochondrial DNA-based methods, we compare mislabeling in caviar and meat purchased in the New York City area before and after CITES listing. Our recent sampling of this market reveals a decrease in mislabeled caviar (2006–2008; 10%; n = 90) compared to pre-CITES implementation (1995–1996; 19%; n = 95). Mislabeled caviar was found only in online purchases (n = 49 online/41 retail).
Stricter controls on importing and exporting as per CITES policies may be having a positive conservation effect by limiting the amount of fraudulent caviar reaching the marketplace. Sturgeons and paddlefishes remain a conservation priority, however, due to continued overfishing and habitat degradation. Other marine and aquatic species stand to benefit from the international trade regulation that can result from CITES listing.
Human DOR/TP53INP2 displays a unique bifunctional role as a modulator of autophagy and gene transcription. However, the domains or regions of DOR that participate in those functions have not been identified. Here we have performed structure/function analyses of DOR guided by identification of conserved regions in the DOR gene family by phylogenetic reconstructions. We show that DOR is present in metazoan species. Invertebrates harbor only one gene, DOR/Tp53inp2, and in the common ancestor of vertebrates Tp53inp1 may have arisen by gene duplication. In keeping with these data, we show that human TP53INP1 regulates autophagy and that different DOR/TP53INP2 and TP53INP1 proteins display transcriptional activity. The use of molecular evolutionary information has been instrumental to determine the regions that participate in DOR functions. DOR and TP53INP1 proteins share two highly conserved regions (region 1, aa residues 28–42; region 2, 66–112 in human DOR). Mutation of conserved hydrophobic residues in region 1 of DOR (that are part of a nuclear export signal, NES) reduces transcriptional activity, and blocks nuclear exit and autophagic activity under autophagy-activated conditions. We also identify a functional and conserved LC3-interacting motif (LIR) in region 1 of DOR and TP53INP1 proteins. Mutation of conserved acidic residues in region 2 of DOR reduces transcriptional activity, impairs nuclear exit in response to autophagy activation, and disrupts autophagy. Taken together, our data reveal DOR and TP53INP1 as dual regulators of transcription and autophagy, and identify two conserved regions in the DOR family that concentrate multiple functions crucial for autophagy and transcription.
A novel result of the current research is the development and implementation of a unique functional phylogenomic approach that explores the genomic origins of seed plant diversification. We first use 22,833 sets of orthologs from the nuclear genomes of 101 genera across land plants to reconstruct their phylogenetic relationships. One of the more salient results is the resolution of some enigmatic relationships in seed plant phylogeny, such as the placement of Gnetales as sister to the rest of the gymnosperms. In using this novel phylogenomic approach, we were also able to identify overrepresented functional gene ontology categories in genes that provide positive branch support for major nodes prompting new hypotheses for genes associated with the diversification of angiosperms. For example, RNA interference (RNAi) has played a significant role in the divergence of monocots from other angiosperms, which has experimental support in Arabidopsis and rice. This analysis also implied that the second largest subunit of RNA polymerase IV and V (NRPD2) played a prominent role in the divergence of gymnosperms. This hypothesis is supported by the lack of 24nt siRNA in conifers, the maternal control of small RNA in the seeds of flowering plants, and the emergence of double fertilization in angiosperms. Our approach takes advantage of genomic data to define orthologs, reconstruct relationships, and narrow down candidate genes involved in plant evolution within a phylogenomic view of species' diversification.
Understanding the genetic and genomic basis of plant diversification has been a major goal of evolutionary biologists since Darwin first pondered his “abominable mystery,” the rapid diversification of the angiosperms in the fossil record. We develop and deploy a functional phylogenomic approach that helps identify genes and biological processes putatively involved in species diversification. We assembled a matrix of 22,833 orthologs from 150 species to reconstruct seed plant phylogenetic relationships and to identify gene sets with a unique evolutionary signal. Our analysis of overrepresented biological processes in these sets narrowed down possible genetic mechanisms underlying plant adaptation and diversification. The phylogenetic relationships we uncovered support the hypothesis that gnetophytes are closely related to the rest of the gymnosperms at the base of the living seed plants. We also found that genes involved in post-transcriptional silencing via RNA interference (RNAi)—increasingly important in understanding plant evolution—are significantly represented early in angiosperm and gymnosperm divergence, with an apparent loss of specific classes of small interfering RNAs (siRNA) in gymnosperms. Our functional phylogenomic approach can be applied to any taxa with available sequences to enhance our knowledge of the evolutionary processes underlying biodiversity in general.
Recent whole-genome approaches to microbial phylogeny have emphasized partitioning genes into functional classes, often focusing on differences between a stable core of genes and a variable shell. To rigorously address the effects of partitioning and combining genes in genome-level analyses, we developed a novel technique called Random Addition Concatenation Analysis (RADICAL). RADICAL operates by sequentially concatenating randomly chosen gene partitions starting with a single-gene partition and ending with the entire genomic data set. A phylogenetic tree is built for every successive addition, and the entire process is repeated creating multiple random concatenation paths. The result is a library of trees representing a large variety of differently sized random gene partitions. This library can then be mined to identify unique topologies, assess overall agreement, and measure support for different trees. To evaluate RADICAL, we used 682 orthologous genes across 13 cyanobacterial genomes. Despite previous assertions of substantial differences between a core and a shell set of genes for this data set, RADICAL reveals the two partitions contain congruent phylogenetic signal. Substantial disagreement within the data set is limited to a few nodes and genes involved in metabolism, a functional group that is distributed evenly between the core and the shell partitions. We highlight numerous examples where RADICAL reveals aspects of phylogenetic behavior not evident by examining individual gene trees or a “total evidence” tree. Our method also demonstrates that most emergent phylogenetic signal appears early in the concatenation process. The software is freely available at http://desalle.amnh.org.
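The random-addition procedure described above can be illustrated with a short sketch. This is not the RADICAL software itself: the function names, the data layout (gene name mapped to per-taxon aligned sequences), and the `placeholder_tree` stand-in for a real tree-building step are all assumptions made for illustration.

```python
import random


def concatenate(genes, names):
    """Join the per-taxon sequences of the named genes, in the given order."""
    taxa = genes[names[0]].keys()
    return {taxon: "".join(genes[name][taxon] for name in names) for taxon in taxa}


def radical_paths(genes, build_tree, n_paths=3, seed=0):
    """Build a library of trees along random concatenation paths.

    Each path shuffles the gene order, then grows the matrix one gene at a
    time (from a single gene to the full data set), calling build_tree on
    every intermediate matrix. Returns (path, step, tree) tuples.
    """
    rng = random.Random(seed)
    library = []
    for path in range(n_paths):
        order = list(genes)
        rng.shuffle(order)
        for step in range(1, len(order) + 1):
            matrix = concatenate(genes, order[:step])
            library.append((path, step, build_tree(matrix)))
    return library


def placeholder_tree(matrix):
    """Hypothetical stand-in for tree inference: returns the alignment length."""
    return len(next(iter(matrix.values())))


# Toy alignments for two taxa and three genes (hypothetical data).
toy_genes = {
    "geneA": {"t1": "ACGT", "t2": "ACGA"},
    "geneB": {"t1": "GG", "t2": "GC"},
    "geneC": {"t1": "TTT", "t2": "TTA"},
}
library = radical_paths(toy_genes, placeholder_tree, n_paths=2)
print(len(library))
```

With a real tree-building function substituted for the placeholder, the returned library could then be mined for unique topologies and for the concatenation step at which a given node first appears.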
cyanobacteria; concatenation; core; shell; emergent phylogenetic support
The family Pteropodidae comprises bats commonly known as megabats or Old World fruit bats. Molecular phylogenetic studies of pteropodids have provided considerable insight into intrafamilial relationships, but these studies have included only a fraction of the extant diversity (a maximum of 26 out of the 46 currently recognized genera) and have failed to resolve deep relationships among internal clades. Here we readdress the systematics of pteropodids by applying a strategy to try to resolve ancient relationships within Pteropodidae, while providing further insight into subgroup membership, by 1) increasing the taxonomic sample to 42 genera; 2) increasing the number of characters (to >8,000 bp) and nuclear genomic representation; 3) minimizing missing data; 4) controlling for sequence bias; and 5) using appropriate data partitioning and models of sequence evolution.
Our analyses recovered six principal clades and one additional independent lineage (consisting of a single genus) within Pteropodidae. Reciprocal monophyly of these groups was highly supported and generally congruent among the different methods and datasets used. Likewise, most relationships within these principal clades were well resolved and statistically supported. Relationships among the 7 principal groups, however, were poorly supported in all analyses. This result could not be explained by any detectable systematic bias in the data or incongruence among loci. The SOWH test confirmed that basal branches' lengths were not different from zero, which points to closely-spaced cladogenesis as the most likely explanation for the poor resolution of the deep pteropodid relationships. Simulations suggest that an increase in the amount of sequence data is likely to solve this problem.
The phylogenetic hypothesis generated here provides a robust framework for a revised cladistic classification of Pteropodidae into subfamilies and tribes and will greatly contribute to the understanding of character evolution and biogeography of pteropodids. The inability of our data to resolve the deepest relationships of the major pteropodid lineages suggests an explosive diversification soon after origin of the crown pteropodids. Several characteristics of pteropodids are consistent with this conclusion, including high species diversity, great morphological diversity, and presence of key innovations in relation to their sister group.
In the eight years since phylogenomics was introduced as the intersection of genomics and phylogenetics, the field has provided fundamental insights into gene function, genome history and organismal relationships. The utility of phylogenomics is growing with the increase in the number and diversity of taxa for which whole genome and large transcriptome sequence sets are being generated. We assert that the synergy between genomic and phylogenetic perspectives in comparative biology would be enhanced by the development and refinement of minimal reporting standards for phylogenetic analyses. Encouraged by the development of the Minimum Information About a Microarray Experiment (MIAME) standard, we propose a similar roadmap for the development of a Minimal Information About a Phylogenetic Analysis (MIAPA) standard. Key in the successful development and implementation of such a standard will be broad participation by developers of phylogenetic analysis software, phylogenetic database developers, practitioners of phylogenomics, and journal editors.
The innate immune system responds within minutes of infection to produce type I interferons and pro-inflammatory cytokines. Interferons induce the synthesis of cell proteins with antiviral activity, and also shape the adaptive immune response by priming T cells. Despite the discovery of interferons over 50 years ago, only recently have we begun to understand how cells sense the presence of a virus infection. Two families of pattern recognition receptors have been shown to distinguish unique molecules present in pathogens, such as bacterial and fungal cell wall components, viral RNA and DNA, and lipoproteins. The first family includes the membrane-bound toll-like receptors (TLRs). Studies of the signaling pathways that lead from pattern recognition to cytokine induction have revealed extensive and overlapping cascades that involve protein-protein interactions and phosphorylation, and culminate in activation of transcription proteins that control the transcription of genes encoding interferons and other cytokines. A second family of pattern recognition receptors has recently been identified, which comprises the cytoplasmic sensors of viral nucleic acids, including MDA-5, RIG-I, and LGP2. In this review we summarize the discovery of these cytoplasmic sensors, how they recognize nucleic acids, the signaling pathways leading to cytokine synthesis, and viral countermeasures that have evolved to antagonize the functions of these proteins. We also consider the function of these cytoplasmic sensors in apoptosis, development and differentiation, and diabetes.
Antiviral innate immunity; MDA-5; RIG-I; domain grafting; cell signaling; apoptosis; viral pathogenesis
Human Papillomavirus type 16 (HPV16) causes over half of all cervical cancer and some HPV16 variants are more oncogenic than others. The genetic basis for the extraordinary oncogenic properties of HPV16 compared to other HPVs is unknown. In addition, we neither know which nucleotides vary across and within HPV types and lineages, nor which of the single nucleotide polymorphisms (SNPs) determine oncogenicity.
A reference set of 62 HPV16 complete genome sequences was established and used to examine patterns of evolutionary relatedness amongst variants using a pairwise identity heatmap and HPV16 phylogeny. A BLAST-based algorithm was developed to impute complete genome data from partial sequence information using the reference database. To interrogate the oncogenic risk of determined and imputed HPV16 SNPs, odds-ratios for each SNP were calculated in a case-control viral genome-wide association study (VWAS) using biopsy confirmed high-grade cervix neoplasia and self-limited HPV16 infections from Guanacaste, Costa Rica.
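As a sketch of the per-SNP calculation described above, the odds ratio for a single SNP can be obtained from a 2×2 table of cases and controls with and without the variant allele. The function name, the argument names, and the toy counts are hypothetical. Adding 0.5 to every cell when a cell is zero (the Haldane-Anscombe correction) is an assumption introduced here and is not stated in the source.

```python
def odds_ratio(case_with, case_without, control_with, control_without,
               correction=0.5):
    """Odds ratio for one SNP from a 2x2 case-control table.

    If any cell is zero, `correction` is added to every cell so the ratio
    stays finite (an assumption of this sketch, not the study's method).
    """
    cells = [case_with, case_without, control_with, control_without]
    if any(count == 0 for count in cells):
        cells = [count + correction for count in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)


# Toy counts (hypothetical, not the Guanacaste data):
# 30 of 40 cases and 20 of 60 controls carry the variant allele.
print(odds_ratio(30, 10, 20, 40))
```

Repeating this calculation for every determined or imputed SNP yields the per-SNP odds ratios that a viral genome-wide association study would then compare.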
HPV16 variants display evolutionarily stable lineages that contain conserved diagnostic SNPs. The imputation algorithm indicated that an average of 97.5±1.03% of SNPs could be accurately imputed. The VWAS revealed specific HPV16 viral SNPs associated with variant lineages and elevated odds ratios; however, individual causal SNPs could not be distinguished with certainty due to the nature of HPV evolution.
Conserved and lineage-specific SNPs can be imputed with a high degree of accuracy from limited viral polymorphic data due to the lack of recombination and the stochastic mechanism of variation accumulation in the HPV genome. However, determining the role of novel variants or non-lineage-specific SNPs by VWAS will require direct sequence analysis. The investigation of patterns of genetic variation and the identification of diagnostic SNPs for lineages of HPV16 variants provide a valuable resource for future studies of HPV16 pathogenicity.
Human papillomavirus 16 (HPV16) species group (alpha-9) of the Alphapapillomavirus genus contains HPV16, HPV31, HPV33, HPV35, HPV52, HPV58 and HPV67. These HPVs account for 75% of invasive cervical cancers worldwide. Viral variants of these HPVs differ in evolutionary history and pathogenicity. Moreover, a comprehensive nomenclature system for HPV variants is lacking, limiting comparisons between studies.
DNA from cervical samples previously characterized for HPV type was obtained from multiple geographic regions to screen for novel variants. The complete 8 kb genomes of 120 variants representing the major and minor lineages of the HPV16-related alpha-9 HPV types were sequenced to capture maximum viral heterogeneity. Viral evolution was characterized by constructing phylogenetic trees based on complete genomes using multiple algorithms. Maximal and viral region-specific divergence was calculated by global and pairwise alignments. Variant lineages were classified and named using an alphanumeric system; the prototype genome was assigned to the A lineage for all types.
The range of genome-genome sequence heterogeneity varied from 0.6% for HPV35 to 2.2% for HPV52 and included 1.4% for HPV31, 1.1% for HPV33, 1.7% for HPV58 and 1.1% for HPV67. Nucleotide differences of approximately 1.0%–10.0% and 0.5%–1.0% of the complete genomes were used to define variant lineages and sublineages, respectively. Each gene/region differs in sequence diversity, from most variable to least variable: noncoding region 1 (NCR1)/noncoding region 2 (NCR2) > upstream regulatory region (URR) > E6/E7 > E2/L2 > E1/L1.
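The thresholds above can be illustrated with a short sketch that measures the percent difference between two aligned genomes and maps it to a lineage or sublineage call. The function names, the treatment of gap columns, and the assignment of the shared 1.0% boundary to the lineage range are assumptions made for illustration, not the authors' published procedure.

```python
def percent_difference(seq_a, seq_b):
    """Percent of differing columns between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: columns that are gaps in both sequences are skipped;
        # a gap opposite a base counts as a difference.
        if base_a == "-" and base_b == "-":
            continue
        compared += 1
        if base_a != base_b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * differing / compared


def classify_variant_pair(diff_percent):
    """Map a whole-genome percent difference to a variant category."""
    if diff_percent > 10.0:
        return "above lineage range"
    # Assumption: exactly 1.0% is treated as a lineage-level difference.
    if diff_percent >= 1.0:
        return "lineage"
    if diff_percent >= 0.5:
        return "sublineage"
    return "below sublineage threshold"


# Toy example on a short, hypothetical alignment.
toy_difference = percent_difference("ACGTACGT", "ACGTACGA")
print(toy_difference, classify_variant_pair(1.5))
```

In practice the comparison would be run on complete genome alignments of roughly 8 kb, where a 1.0% difference corresponds to about 80 nucleotide positions.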
These data define maximum viral genomic heterogeneity of HPV16-related alpha-9 HPV variants. The proposed nomenclature system facilitates the comparison of variants across epidemiological studies. Sequence diversity and phylogenies of this clinically important group of HPVs provide the basis for further studies of discrete viral evolution, epidemiology, pathogenesis and preventative/therapeutic interventions.
HPV types differ profoundly in cervical carcinogenicity. For the most carcinogenic type, HPV16, variant lineages representing further evolutionary divergence also differ in cancer risk. Variants of the remaining 10-15 carcinogenic HPV types have not been well-studied.
In the first prospective, population-based study of HPV variants, we explored whether, on average, the oldest evolutionary branches within each carcinogenic type predicted different risks of ≥2-year viral persistence and/or precancer and cancer (CIN3+). We examined the natural history of HPV variants in the 7-year, 10,049-woman Guanacaste Cohort Study, using a nested case-control design. Infections were assigned to a variant lineage determined by phylogenetic parsimony methods based on URR/E6 sequences. We used Fisher's combination test to evaluate the significance of the risk associations, cumulating evidence across types.
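Fisher's combination test pools independent p-values through the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below is one way to compute it, using the closed-form chi-square tail for even degrees of freedom so that no external library is needed. The function name and the example p-values are hypothetical and are not taken from the study.

```python
import math


def fisher_combined_p(p_values):
    """Return (statistic, combined p-value) using Fisher's method."""
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # Chi-square survival function for 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical per-type p-values, one per HPV type.
statistic, combined = fisher_combined_p([0.04, 0.20, 0.35, 0.08])
print(statistic, combined)
```

The method assumes the per-type tests are independent; a small combined p-value indicates that the types jointly show more association than expected by chance, even if few individual types are significant.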
Globally, for HPV types including HPV16, the p-value was 0.01 for persistence and 0.07 for CIN3+. Excluding HPV16, the p-values were 0.04 and 0.37, respectively. For HPV16, non-European viral variants were significantly more likely than European variants to cause persistence (OR = 2.6, p = 0.01) and CIN3+ (OR = 2.4, p = 0.004). HPV35 and HPV51 variant lineages also predicted CIN3+.
HPV variants generally differ in risk of persistence. For some HPV types, especially HPV16, variant lineages differ in risk of CIN3+. The findings indicate that continued evolution of HPV types has led to even finer genetic discrimination linked to HPV natural history and cervical cancer risk. Larger viral genomic studies are warranted, especially to identify the genetic basis for HPV16's unique carcinogenicity.
HPV; variants; evolution; cervix; cancer
Over the past decade, fluorescent proteins (FPs) have become ubiquitous tools in biological research. Yet, little is known about the natural function or evolution of this superfamily of proteins that originate from marine organisms. Using molecular phylogenetic analyses of 102 naturally occurring cyan fluorescent proteins, green fluorescent proteins, red fluorescent proteins, as well as the nonfluorescent (purple-blue) protein sequences (including new FPs from Lizard Island, Australia) derived from organisms with known geographic origin, we show that FPs consist of two distinct and novel regions that have evolved under opposite and sharply divergent evolutionary pressures. A central region is highly conserved, and although it contains the residues that form the chromophore, its evolution does not track with fluorescence color and is independent of the rest of the protein. By contrast, the regions enclosing this central region are under strong positive selection pressure to vary their sequence and yet segregate well with fluorescence color emission. We did not find a significant correlation between geographic location of the organism from which the FP was isolated and molecular evolution of the protein. These results define for the first time two distinct regions based on evolution for this highly compact protein. The findings have implications for more sophisticated bioengineering of this molecule as well as studies directed toward understanding the natural function of FPs.
fluorescent protein; molecular evolution; positive selection; conserved region
We use measures of congruence on a combined expressed sequence tag and genome phylogeny to identify proteins that have potential significance in the evolution of seed plants. Relevant proteins are identified based on the direction of partitioned branch and hidden support on the hypothesis obtained on a 16-species tree, constructed from 2,557 concatenated orthologous genes. We provide a general method for detecting genes or groups of genes that may be under selection in directions that are in agreement with the phylogenetic pattern. Gene partitioning methods and estimates of the degree and direction of support of individual gene partitions to the overall data set are used. Using this approach, we correlate positive branch support of specific genes for key branches in the seed plant phylogeny. In addition to basic metabolic functions, such as photosynthesis or hormones, genes involved in posttranscriptional regulation by small RNAs were significantly overrepresented in key nodes of the phylogeny of seed plants. Two genes in our matrix are of critical importance as they are involved in RNA-dependent regulation, essential during embryo and leaf development. These are Argonaute and the RNA-dependent RNA polymerase 6 found to be overrepresented in the angiosperm clade. We use these genes as examples of our phylogenomics approach and show that identifying partitions or genes in this way provides a platform to explain some of the more interesting organismal differences among species, and in particular, in the evolution of plants.
phylogenomics; orthologs; partition metrics; gene ontology; micro-RNAs; small interfering RNAs