1.  Amplification, Mutation, and Sequencing of a Six-Letter Synthetic Genetic System 
Journal of the American Chemical Society  2011;133(38):15105-15112.
The next goals in the development of a synthetic biology that uses artificial genetic systems will require chemistry-biology combinations that allow the amplification of DNA containing any number of sequential and non-sequential non-standard nucleotides. This amplification must ensure that the non-standard nucleotides are not unidirectionally lost during PCR amplification (unidirectional loss would cause the artificial system to revert to an all-natural genetic system). Further, technology is needed to sequence artificial genetic DNA molecules. The work reported here meets all three of these goals for a six-letter artificially expanded genetic information system (AEGIS) that comprises four standard nucleotides (G, A, C, and T) and two additional non-standard nucleotides (Z and P). We report polymerases and PCR conditions that amplify a wide range of GACTZP DNA sequences having multiple consecutive unnatural synthetic genetic components with low (0.2% per theoretical cycle) levels of mutation. We demonstrate that residual mutation processes both introduce and remove unnatural nucleotides, allowing the artificial genetic system to evolve as such, rather than revert to a wholly natural system. We then show that mechanisms for these residual mutation processes can be exploited in a strategy to sequence “six-letter” GACTZP DNA. None of these has yet been reported for any other synthetic genetic system.
doi:10.1021/ja204910n
PMCID: PMC3427765  PMID: 21842904
2.  The Natural History of Class I Primate Alcohol Dehydrogenases Includes Gene Duplication, Gene Loss, and Gene Conversion 
PLoS ONE  2012;7(7):e41175.
Background
Gene duplication is a source of molecular innovation throughout evolution. However, even with massive amounts of genome sequence data, correlating gene duplication with speciation and other events in natural history can be difficult. This is especially true in its most interesting cases, where rapid and multiple duplications are likely to reflect adaptation to rapidly changing environments and life styles. This may be so for Class I of alcohol dehydrogenases (ADH1s), where multiple duplications occurred in primate lineages in Old and New World monkeys (OWMs and NWMs) and hominoids.
Methodology/Principal Findings
To build a preferred model for the natural history of ADH1s, we determined the sequences of nine new ADH1 genes, finding for the first time multiple paralogs in various prosimians (lemurs, strepsirhines). Database mining then identified novel ADH1 paralogs in both macaque (an OWM) and marmoset (a NWM). These were used with the previously identified human paralogs to resolve controversies relating to dates of duplication and gene conversion in the ADH1 family. Central to these controversies are differences in the topologies of trees generated from exonic (coding) sequences and intronic sequences.
Conclusions/Significance
We provide evidence that gene conversions are the primary source of difference, using molecular clock dating of duplications and analyses of microinsertions and deletions (micro-indels). The tree topology inferred from intron sequences appears to more correctly represent the natural history of ADH1s, with the ADH1 paralogs in platyrrhines (NWMs) and catarrhines (OWMs and hominoids) having arisen by duplications shortly predating the divergence of OWMs and NWMs. We also conclude that paralogs in lemurs arose independently. Finally, we identify errors in database interpretation as the source of controversies concerning gene conversion. These analyses provide a model for the natural history of ADH1s that posits four ADH1 paralogs in the ancestor of Catarrhine and Platyrrhine primates, followed by the loss of an ADH1 paralog in the human lineage.
doi:10.1371/journal.pone.0041175
PMCID: PMC3409193  PMID: 22859968
3.  Planetary Organic Chemistry and the Origins of Biomolecules 
Organic chemistry on a planetary scale is likely to have transformed carbon dioxide and reduced carbon species delivered to an accreting Earth. According to various models for the origin of life on Earth, biological molecules that jump-started Darwinian evolution arose via this planetary chemistry. The grandest of these models assumes that ribonucleic acid (RNA) arose prebiotically, together with components for compartments that held it and a primitive metabolism that nourished it. Unfortunately, it has been challenging to identify possible prebiotic chemistry that might have created RNA. Organic molecules, given energy, have a well-known propensity to form multiple products, sometimes referred to collectively as “tar” or “tholin.” These mixtures appear to be unsuited to support Darwinian processes, and certainly have never been observed to spontaneously yield a homochiral genetic polymer. To date, proposed solutions to this challenge either involve too much direct human intervention to satisfy many in the community, or generate molecules that are unreactive “dead ends” under standard conditions of temperature and pressure. Carbohydrates, organic species having carbon, hydrogen, and oxygen atoms in a ratio of 1:2:1 and an aldehyde or ketone group, conspicuously embody this challenge. They are components of RNA and their reactivity can support interesting spontaneous chemistry as part of a “carbohydrate world,” but they also easily form mixtures, polymers and tars. We describe here the latest thoughts on this challenge, focusing on how it might be resolved using minerals containing borate, silicate, and molybdate, inter alia.
Borates, silicates, and other minerals may have promoted prebiotic chemical reactions in which organic molecules produced RNA, rather than “dead end” polymers and tars.
doi:10.1101/cshperspect.a003467
PMCID: PMC2890202  PMID: 20504964
4.  Experimental Evolution of a Facultative Thermophile from a Mesophilic Ancestor 
Experimental evolution via continuous culture is a powerful approach to the alteration of complex phenotypes, such as optimal/maximal growth temperatures. The benefit of this approach is that phenotypic selection is tied to growth rate, allowing the production of optimized strains. Herein, we demonstrate the use of a recently described long-term culture apparatus called the Evolugator for the generation of a thermophilic descendant from a mesophilic ancestor (Escherichia coli MG1655). In addition, we used whole-genome sequencing of sequentially isolated strains throughout the thermal adaptation process to characterize the evolutionary history of the resultant genotype, identifying 31 genetic alterations that may contribute to thermotolerance, although some of these mutations may be adaptive for off-target environmental parameters, such as rich medium. We undertook preliminary phenotypic analysis of mutations identified in the glpF and fabA genes. Deletion of glpF in a mesophilic wild-type background conferred significantly improved growth rates in the 43-to-48°C temperature range and altered optimal growth temperature from 37°C to 43°C. In addition, transforming our evolved thermotolerant strain (EVG1064) with a wild-type allele of glpF reduced fitness at high temperatures. On the other hand, the mutation in fabA predictably increased the degree of saturation in membrane lipids, which is a known adaptation to elevated temperature. However, transforming EVG1064 with a wild-type fabA allele had only modest effects on fitness at intermediate temperatures. The Evolugator is fully automated and demonstrates the potential to accelerate the selection for complex traits by experimental evolution and significantly decrease development time for new industrial strains.
doi:10.1128/AEM.05773-11
PMCID: PMC3255606  PMID: 22020511
5.  Expanded Genetic Alphabets in the Polymerase Chain Reaction 
Cleaning up polymerase chain reactions: Artificially expanded genetic information systems (AEGIS) add extra nucleotide "letters" to DNA alphabets; oligonucleotides containing AEGIS nucleotides do not bind to natural DNA. This "orthogonality" is exploited here by placing two AEGIS nucleotides (P and Z) in external tags for primers targeting three cancer genes in a nested PCR architecture. AEGIS tags support multiplexed PCR with fewer primer dimers and off-target amplicons than multiplexed PCR without AEGIS components.
doi:10.1002/anie.200905173
PMCID: PMC3155763  PMID: 19946925
PCR; DNA replication; genetic alphabets; polymerases; nucleobases
6.  Recognition of an expanded genetic alphabet by type-II restriction endonucleases and their application to analyze polymerase fidelity 
Nucleic Acids Research  2011;39(9):3949-3961.
To explore the possibility of using restriction enzymes in a synthetic biology based on artificially expanded genetic information systems (AEGIS), 24 type-II restriction endonucleases (REases) were challenged to digest DNA duplexes containing recognition sites where individual Cs and Gs were replaced by the AEGIS nucleotides Z and P [respectively, 6-amino-5-nitro-3-(1′-β-d-2′-deoxyribofuranosyl)-2(1H)-pyridone and 2-amino-8-(1′-β-d-2′-deoxyribofuranosyl)-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one]. These AEGIS nucleotides implement complementary hydrogen bond donor–donor–acceptor and acceptor–acceptor–donor patterns. Results allowed us to classify type-II REases into five groups based on their performance, and to infer some specifics of their interactions with functional groups in the major and minor grooves of the target DNA. For three enzymes among these 24 where crystal structures are available (BcnI, EcoO109I and NotI), these interactions were modeled. Further, we applied a type-II REase to quantitate the fidelity of polymerases challenged to maintain C:G, T:A and Z:P pairs in a DNA duplex through repetitive PCR cycles. This work thus adds tools that are able to manipulate this expanded genetic alphabet in vitro, provides some structural insights into the working of restriction enzymes, and offers some preliminary data needed to take the next step in synthetic biology to use an artificial genetic system inside of living bacterial cells.
doi:10.1093/nar/gkq1274
PMCID: PMC3089450  PMID: 21245035
7.  Defining Life 
Astrobiology  2010;10(10):1021-1030.
Abstract
Any definition is intricately connected to a theory that gives it meaning. Accordingly, this article discusses various definitions of life held in the astrobiology community by considering their connected “theories of life.” These include certain “list” definitions and a popular definition that holds that life is a “self-sustaining chemical system capable of Darwinian evolution.” We then act as “anthropologists,” studying what scientists do to determine which definition-theories of life they constructively hold as they design missions to seek non-terran life. We also look at how constructive beliefs about biosignatures change as observational data accumulate. And we consider how a definition centered on Darwinian evolution might itself be forced to change as supra-Darwinian species emerge, including in our descendents, and consider the chances of our encountering supra-Darwinian species in our exploration of the Cosmos. Last, we ask what chemical structures might support Darwinian evolution universally; these structures might be universal biosignatures. Key Words: Evolution—Life—Life detection—Biosignatures. Astrobiology 10, 1021–1030.
doi:10.1089/ast.2010.0524
PMCID: PMC3005285  PMID: 21162682
8.  Q&A: Life, synthetic biology and risk 
BMC Biology  2010;8:77.
doi:10.1186/1741-7007-8-77
PMCID: PMC2885331  PMID: 20594289
9.  Design of a novel molecular beacon: modification of the stem with artificially genetic alphabet 
A molecular beacon that incorporates components of an artificially expanded genetic information system (AEGIS) in its stem is shown not to be opened by unwanted stem invasion by adventitious standard DNA; this should improve the “darkness” of the beacon in real-world applications.
doi:10.1039/b811159f
PMCID: PMC2763601  PMID: 18956044
10.  Lessons from comparative physiology: could uric acid represent a physiologic alarm signal gone awry in western society? 
Uric acid has historically been viewed as a purine metabolic waste product excreted by the kidney and gut that is relatively unimportant other than its penchant to crystallize in joints to cause the disease gout. In recent years, however, there has been the realization that uric acid is not biologically inert but may have a wide range of actions, including being both a pro- and anti-oxidant, a neurostimulant, and an inducer of inflammation and activator of the innate immune response. In this paper, we present the hypothesis that uric acid has a key role in the foraging response associated with starvation and fasting. We further suggest that there is a complex interplay between fructose, uric acid and vitamin C, with fructose and uric acid stimulating the foraging response and vitamin C countering this response. Finally, we suggest that the mutations in ascorbate synthesis and uricase that characterized early primate evolution were likely in response to the need to stimulate the foraging “survival” response and might have inadvertently had a role in accelerating the development of bipedal locomotion and intellectual development. Unfortunately, due to marked changes in the diet, resulting in dramatic increases in fructose- and purine-rich foods, these identical genotypic changes may be largely responsible for the epidemic of obesity, diabetes and cardiovascular disease in today’s society.
doi:10.1007/s00360-008-0291-7
PMCID: PMC2684327  PMID: 18649082
Uric acid; Fructose; Foraging; Metabolic syndrome; Obesity; Fasting; Hibernation
11.  The potential and challenges of nanopore sequencing 
Nature biotechnology  2008;26(10):1146-1153.
A nanopore-based device provides single-molecule detection and analytical capabilities that are achieved by electrophoretically driving molecules in solution through a nano-scale pore. The nanopore provides a highly confined space within which single nucleic acid polymers can be analyzed at high throughput by one of a variety of means, and the perfect processivity that can be enforced in a narrow pore ensures that the native order of the nucleobases in a polynucleotide is reflected in the sequence of signals that is detected. Kilobase length polymers (single-stranded genomic DNA or RNA) or small molecules (e.g., nucleosides) can be identified and characterized without amplification or labeling, a unique analytical capability that makes inexpensive, rapid DNA sequencing a possibility. Further research and development to overcome current challenges to nanopore identification of each successive nucleotide in a DNA strand offers the prospect of ‘third generation’ instruments that will sequence a diploid mammalian genome for ~$1,000 in ~24 h.
doi:10.1038/nbt.1495
PMCID: PMC2683588  PMID: 18846088
12.  The Planetary Biology of Ascorbate and Uric acid and their Relationship with the Epidemic of Obesity and Cardiovascular Disease 
Medical hypotheses  2008;71(1):22-31.
Humans have relatively low plasma ascorbate levels and high serum uric acid levels compared to most mammals due to the presence of genetic mutations in L-gulonolactone oxidase and uricase, respectively. We review the major hypotheses for why these mutations may have occurred. In particular, we suggest that both mutations may have provided a survival advantage to early primates by helping maintain blood pressure during periods of dietary change and environmental stress. We further propose that these mutations have the inadvertent disadvantage of increasing our risk for hypertension and cardiovascular disease in today’s society characterized by Western diet and increasing physical inactivity. Finally, we suggest that a “planetary biology” approach in which genetic changes are analyzed in relation to their biologic action and historical context may provide the ideal approach towards understanding the biology of the past, present and future.
doi:10.1016/j.mehy.2008.01.017
PMCID: PMC2495042  PMID: 18331782
13.  Multiplexed Genetic Analysis Using an Expanded Genetic Alphabet 
Clinical chemistry  2004;50(11):2019-2027.
Background
All states require some kind of testing for newborns, but the policies are far from standardized. In some states, newborn screening may include genetic tests for a wide range of targets, but the costs and complexities of the newer genetic tests inhibit expansion of newborn screening. We describe the development and technical evaluation of a multiplex platform that may foster increased newborn genetic screening.
Methods
MultiCode® PLx involves three major steps: PCR, target-specific extension, and liquid chip decoding. Each step is performed in the same reaction vessel, and the test is completed in ~3 h. For site-specific labeling and room-temperature decoding, we use an additional base pair constructed from isoguanosine and isocytidine. We used the method to test for mutations within the cystic fibrosis transmembrane conductance regulator (CFTR) gene. The developed test was performed manually and by automated liquid handling. Initially, 225 samples with a range of genotypes were tested retrospectively with the method. A prospective study used samples from >400 newborns.
Results
In the retrospective study, 99.1% of samples were correctly genotyped with no incorrect calls made. In the prospective study, 95% of the samples were correctly genotyped for all targets, and there were no incorrect calls.
Conclusions
The unique genetic multiplexing platform was successfully able to test for 31 targets within the CFTR gene and provides accurate genotype assignments in a clinical setting.
doi:10.1373/clinchem.2004.034330
PMCID: PMC1592527  PMID: 15319316
14.  Enzymatic incorporation of a third nucleobase pair 
Nucleic Acids Research  2007;35(13):4238-4249.
DNA polymerases are identified that copy a non-standard nucleotide pair joined by a hydrogen bonding pattern different from the patterns joining the dA:T and dG:dC pairs. 6-Amino-5-nitro-3-(1′-β-d-2′-deoxyribofuranosyl)-2(1H)-pyridone (dZ) implements the non-standard ‘small’ donor–donor–acceptor (pyDDA) hydrogen bonding pattern. 2-Amino-8-(1′-β-D-2′-deoxyribofuranosyl)-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one (dP) implements the ‘large’ acceptor–acceptor–donor (puAAD) pattern. These nucleobases were designed to present electron density to the minor groove, density hypothesized to help determine specificity for polymerases. Consistent with this hypothesis, both dZTP and dPTP are accepted by many polymerases from both Families A and B. Further, the dZ:dP pair participates in PCR reactions catalyzed by Taq, Vent (exo−) and Deep Vent (exo−) polymerases, with 94.4%, 97.5% and 97.5%, respectively, retention per round. The dZ:dP pair appears to be lost principally via transition to a dC:dG pair. This is consistent with a mechanistic hypothesis that deprotonated dZ (presenting a pyDAA pattern) complements dG (presenting a puADD pattern), while protonated dC (presenting a pyDDA pattern) complements dP (presenting a puAAD pattern). This hypothesis, grounded in the Watson–Crick model for nucleobase pairing, was confirmed by studies of the pH-dependence of mismatching. The dZ:dP pair and these polymerases should be useful in dynamic architectures for sequencing, molecular-, systems- and synthetic-biology.
doi:10.1093/nar/gkm395
PMCID: PMC1934989  PMID: 17576683
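The per-round retention figures quoted in the abstract above compound over a PCR run. The following is a minimal sketch of that arithmetic, assuming each round is independent and retains the dZ:dP pair with the same probability; the function name and the choice of 20 rounds are illustrative assumptions, not taken from the paper.

```python
def retained_fraction(per_round_retention: float, rounds: int) -> float:
    """Fraction of non-standard pairs still present after `rounds` rounds.

    Assumes (for illustration only) that every round retains the pair with
    the same, independent probability `per_round_retention`.
    """
    if not 0.0 <= per_round_retention <= 1.0:
        raise ValueError("per_round_retention must lie in [0, 1]")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return per_round_retention ** rounds


# Per-round retention values as quoted in the abstract; 20 rounds is arbitrary.
QUOTED_RETENTION = {
    "Taq": 0.944,
    "Vent (exo-)": 0.975,
    "Deep Vent (exo-)": 0.975,
}

for polymerase, retention in QUOTED_RETENTION.items():
    fraction = retained_fraction(retention, 20)
    print(f"{polymerase}: {fraction:.3f} of dZ:dP pairs retained after 20 rounds")
```

Under this simple model, a difference of a few percentage points per round becomes a large difference in the final product, which is why per-round retention is the figure of merit reported.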
15.  Nucleoside alpha-thiotriphosphates, polymerases and the exonuclease III analysis of oligonucleotides containing phosphorothioate linkages 
Nucleic Acids Research  2007;35(9):3118-3127.
The use of DNA polymerases to incorporate phosphorothioate linkages into DNA, and the use of exonuclease III to determine where those linkages have been incorporated, are re-examined in this work. The results presented here show that exonuclease III degrades single-stranded DNA as a substrate and digests through phosphorothioate linkages having one absolute stereochemistry, assigned (assuming inversion in the polymerase reaction) as S, but not the other absolute stereochemistry. This contrasts with a general view in the literature that exonuclease III favors double-stranded nucleic acid as a substrate and stops completely at phosphorothioate linkages. Furthermore, not all DNA polymerases appear to accept exclusively the (R) stereoisomer of nucleoside alpha-thiotriphosphates [and not the (S) diastereomer], a conclusion inferred two decades ago by examination of five Family-A polymerases and a reverse transcriptase. This suggests that caution is appropriate when extrapolating the detailed behavior of one polymerase from the behaviors of other polymerases. Furthermore, these results provide constraints on how exonuclease III–thiotriphosphate–polymerase combinations can be used to analyze the behavior of the components of a synthetic biology.
doi:10.1093/nar/gkm168
PMCID: PMC1888802  PMID: 17452363
16.  Artificially expanded genetic information system: a new base pair with an alternative hydrogen bonding pattern 
Nucleic Acids Research  2006;34(21):6095-6101.
To support efforts to develop a ‘synthetic biology’ based on an artificially expanded genetic information system (AEGIS), we have developed a route to two components of a non-standard nucleobase pair, the pyrimidine analog 6-amino-5-nitro-3-(1′-β-D-2′-deoxyribofuranosyl)-2(1H)-pyridone (dZ) and its Watson–Crick complement, the purine analog 2-amino-8-(1′-β-D-2′-deoxyribofuranosyl)-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one (dP). These implement the pyDDA:puAAD hydrogen bonding pattern (where ‘py’ indicates a pyrimidine analog and ‘pu’ indicates a purine analog, while A and D indicate the hydrogen bonding patterns of acceptor and donor groups presented to the complementary nucleobases, from the major to the minor groove). Also described is the synthesis of the triphosphates and protected phosphoramidites of these two nucleosides. We also describe the use of the protected phosphoramidites to synthesize DNA oligonucleotides containing these AEGIS components, verify the absence of epimerization of dZ in those oligonucleotides, and report some hybridization properties of the dZ:dP nucleobase pair, which is rather strong, and the ability of each to effectively discriminate against mismatches in short duplex DNA.
doi:10.1093/nar/gkl633
PMCID: PMC1635279  PMID: 17074747
17.  Dynamic assembly of primers on nucleic acid templates 
Nucleic Acids Research  2006;34(17):4702-4710.
A strategy is presented that uses dynamic equilibria to assemble in situ composite DNA polymerase primers, having lengths of 14 or 16 nt, from DNA fragments that are 6 or 8 nt in length. In this implementation, the fragments are transiently joined under conditions of dynamic equilibrium by an imine linker, which has a dissociation constant of ∼1 μM. If a polymerase is able to extend the composite, but not the fragments, it is possible to prime the synthesis of a target DNA molecule under conditions where two useful specificities are combined: (i) single nucleotide discrimination that is characteristic of short oligonucleotide duplexes (four to six nucleobase pairs in length), which effectively excludes single mismatches, and (ii) an overall specificity of priming that is characteristic of long (14 to 16mers) oligonucleotides, potentially unique within a genome. We report here the screening of a series of polymerases that combine an ability not to accept short primer fragments with an ability to accept the long composite primer held together by an unnatural imine linkage. Several polymerases were found that achieve this combination, permitting the implementation of the dynamic combinatorial chemical strategy.
doi:10.1093/nar/gkl625
PMCID: PMC1635275  PMID: 16963776
18.  Analysis of transitions at two-fold redundant sites in mammalian genomes. Transition redundant approach-to-equilibrium (TREx) distance metrics 
Background
The exchange of nucleotides at synonymous sites in a gene encoding a protein is believed to have little impact on the fitness of a host organism. This should be especially true for synonymous transitions, where a pyrimidine nucleotide is replaced by another pyrimidine, or a purine is replaced by another purine. This suggests that transition redundant exchange (TREx) processes at the third position of conserved two-fold codon systems might offer the best approximation for a neutral molecular clock, serving to examine, within coding regions, theories that require neutrality, determine whether transition rate constants differ within genes in a single lineage, and correlate dates of events recorded in genomes with dates in the geological and paleontological records. To date, TREx analysis of the yeast genome has recognized correlated duplications that established new metabolic strategies in fungi, and supported analyses of functional change in aromatases in pigs. TREx dating has limitations, however. Multiple transitions at synonymous sites may cause equilibration and loss of information. Further, to be useful to correlate events in the genomic record, different genes within a genome must suffer transitions at similar rates.
Results
A formalism to analyze divergence at two-fold redundant codon systems is presented. This formalism exploits two-state approach-to-equilibrium kinetics from chemistry. This formalism captures, in a single equation, the possibility of multiple substitutions at individual sites, avoiding any need to "correct" for these. The formalism also connects specific rate constants for transitions to specific approximations in an underlying evolutionary model, including assumptions that transition rate constants are invariant at different sites, in different genes, in different lineages, and at different times. Therefore, the formalism supports analyses that evaluate these approximations.
Transitions at synonymous sites within two-fold redundant coding systems were examined in the mouse, rat, and human genomes. The key metric (f2), the fraction of those sites that holds the same nucleotide, was measured for putative ortholog pairs. A transition redundant exchange (TREx) distance was calculated from f2 for these pairs. Pyrimidine-pyrimidine transitions at these sites occur approximately 14% faster than purine-purine transitions in various lineages. Transition rate constants were similar in different genes within the same lineages; within a set of orthologs, the f2 distribution is only modestly overdispersed. No correlation between disparity and overdispersion is observed. In rodents, however, evidence was found for greater conservation of TREx sites in genes on the X chromosome, accounting for a small part of the overdispersion.
Conclusion
The TREx metric is useful to analyze the history of transition rate constants within these mammals over the past 100 million years. The TREx metric estimates the extent to which silent nucleotide substitutions accumulate in different genes, on different chromosomes, with different compositions, in different lineages, and at different times.
doi:10.1186/1471-2148-6-25
PMCID: PMC1435776  PMID: 16545144
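The abstract above describes converting f2 into a distance using two-state approach-to-equilibrium kinetics, but does not give the equation. The sketch below uses the generic form of such kinetics, f2 = f_eq + (1 - f_eq) * exp(-k*t), and solves it for k*t. Both the functional form and the default equilibrium value f_eq = 0.5 (equal use of the two alternative nucleotides) are assumptions made here for illustration; the paper's own equation and equilibrium value may differ.

```python
import math


def trex_distance(f2: float, f2_eq: float = 0.5) -> float:
    """Return k*t for a generic two-state approach-to-equilibrium model.

    Model (an assumption, not quoted from the paper):
        f2 = f2_eq + (1 - f2_eq) * exp(-k * t)
    where f2 is the fraction of two-fold redundant sites holding the same
    nucleotide in both sequences, and f2_eq is its value at equilibrium.
    """
    if not 0.0 <= f2_eq < 1.0:
        raise ValueError("f2_eq must lie in [0, 1)")
    if not f2_eq < f2 <= 1.0:
        # At or below equilibrium the sites carry no recoverable timing signal.
        raise ValueError("f2 must be greater than f2_eq and at most 1")
    return -math.log((f2 - f2_eq) / (1.0 - f2_eq))


# Identical sequences give zero distance; distance grows as f2 nears f2_eq.
for observed in (1.0, 0.9, 0.75, 0.6):
    print(f"f2 = {observed:.2f} -> k*t = {trex_distance(observed):.3f}")
```

Because the exponential already accounts for sites that have changed more than once, no separate multiple-hit correction is applied, which mirrors the point the abstract makes about capturing multiple substitutions in a single equation. The distance diverges as f2 approaches f_eq, reflecting the equilibration and loss of information that the abstract lists as a limitation.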
19.  Application of DETECTER, an evolutionary genomic tool to analyze genetic variation, to the cystic fibrosis gene family 
BMC Genomics  2006;7:44.
Background
The medical community requires computational tools that distinguish missense genetic differences having phenotypic impact within the vast number of sense mutations that do not. Tools that do this will become increasingly important for those seeking to use human genome sequence data to predict disease, make prognoses, and customize therapy to individual patients.
Results
An approach, termed DETECTER, is proposed to identify sites in a protein sequence where amino acid replacements are likely to have a significant effect on phenotype, including causing genetic disease. This approach uses a model-dependent tool to estimate the normalized replacement rate at individual sites in a protein sequence, based on a history of those sites extracted from an evolutionary analysis of the corresponding protein family. This tool identifies sites that have higher-than-average, average, or lower-than-average rates of change in the lineage leading to the sequence in the population of interest. The rates are then combined with sequence data to determine the likelihoods that particular amino acids were present at individual sites in the evolutionary history of the gene family. These likelihoods are used to predict whether any specific amino acid replacements, if introduced at the site in a modern human population, would have a significant impact on fitness. The DETECTER tool is used to analyze the cystic fibrosis transmembrane conductance regulator (CFTR) gene family.
Conclusion
In this system, DETECTER retrodicts amino acid replacements associated with the cystic fibrosis disease with greater accuracy than alternative approaches. While this result validates this approach for this particular family of proteins only, the approach may be applicable to the analysis of polymorphisms generally, including SNPs in a human population.
doi:10.1186/1471-2164-7-44
PMCID: PMC1420294  PMID: 16522197
20.  Integrating protein structures and precomputed genealogies in the Magnum database: Examples with cellular retinoid binding proteins 
BMC Bioinformatics  2006;7:89.
Background
When accurate models for the divergent evolution of protein sequences are integrated with complementary biological information, such as folded protein structures, analyses of the combined data often lead to new hypotheses about molecular physiology. This represents an excellent example of how bioinformatics can be used to guide experimental research. However, progress in this direction has been slowed by the lack of a publicly available resource suitable for general use.
Results
The precomputed Magnum database offers a solution to this problem for ca. 1,800 full-length protein families with at least one crystal structure. The Magnum deliverables include 1) multiple sequence alignments, 2) mapping of alignment sites to crystal structure sites, 3) phylogenetic trees, 4) inferred ancestral sequences at internal tree nodes, and 5) amino acid replacements along tree branches. Comprehensive evaluations revealed that the automated procedures used to construct Magnum produced accurate models of how proteins divergently evolve, or genealogies, and correctly integrated these with the structural data. To demonstrate Magnum's capabilities, we asked for amino acid replacements requiring three nucleotide substitutions, located at internal protein structure sites, and occurring on short phylogenetic tree branches. In the cellular retinoid binding protein family a site that potentially modulates ligand binding affinity was discovered. Recruitment of cellular retinol binding protein to function as a lens crystallin in the diurnal gecko afforded another opportunity to showcase the predictive value of a browsable database containing branch replacement patterns integrated with protein structures.
Conclusion
We integrated two areas of protein science, evolution and structure, on a large scale and created a precomputed database, known as Magnum, which is the first freely available resource of its kind. Magnum provides evolutionary and structural bioinformatics resources that are useful for identifying experimentally testable hypotheses about the molecular basis of protein behaviors and functions, as illustrated with the examples from the cellular retinoid binding proteins.
doi:10.1186/1471-2105-7-89
PMCID: PMC1475641  PMID: 16504077
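The Magnum example above asks for amino acid replacements that require three nucleotide substitutions. One way to make that criterion concrete is to compute, for a pair of amino acids, the smallest number of nucleotide differences between any of their codons. The sketch below does this with the standard genetic code; it is an illustration of the criterion only, and the function names are hypothetical rather than part of the Magnum database or its query interface.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code, '*' = stop) to its list of codons.
CODONS_BY_AA = {}
for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_BY_AA.setdefault(aa, []).append("".join(bases))


def min_substitutions(aa_from: str, aa_to: str) -> int:
    """Fewest nucleotide changes that turn some codon of aa_from into aa_to."""
    return min(
        sum(x != y for x, y in zip(codon_a, codon_b))
        for codon_a in CODONS_BY_AA[aa_from]
        for codon_b in CODONS_BY_AA[aa_to]
    )


def triple_substitution_pairs():
    """Unordered amino acid pairs whose interconversion needs all three bases changed."""
    residues = sorted(aa for aa in CODONS_BY_AA if aa != "*")
    return [
        (a, b)
        for i, a in enumerate(residues)
        for b in residues[i + 1:]
        if min_substitutions(a, b) == 3
    ]


print(triple_substitution_pairs())
```

For example, `min_substitutions("M", "Y")` is 3, because ATG differs from both TAT and TAC at every position, whereas `min_substitutions("M", "I")` is 1 (ATG to ATA).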
21.  The use of thymidine analogs to improve the replication of an extra DNA base pair: a synthetic biological system 
Nucleic Acids Research  2005;33(17):5640-5646.
Synthetic biology based on a six-letter genetic alphabet that includes the two non-standard nucleobases isoguanine (isoG) and isocytosine (isoC), as well as the standard A, T, G and C, is known to suffer as a consequence of a minor tautomeric form of isoguanine that pairs with thymine, and therefore leads to infidelity during repeated cycles of the PCR. Reported here is a solution to this problem. The solution replaces thymidine triphosphate by 2-thiothymidine triphosphate (2-thioTTP). Because of the bulk and hydrogen bonding properties of the thione unit in 2-thioT, 2-thioT does not mispair effectively with the minor tautomer of isoG. To test whether this might allow PCR amplification of a six-letter artificially expanded genetic information system, we examined the relative rates of misincorporation of 2-thioTTP and TTP opposite isoG using affinity electrophoresis. The concentrations of isoCTP and 2-thioTTP were optimal to best support PCR amplification using thermostable polymerases of a six-letter alphabet that includes the isoC–isoG pair. The fidelity-per-round of amplification was found to be ∼98% in trial PCRs with this six-letter DNA alphabet. The analogous PCR employing TTP had a fidelity-per-round of only ∼93%. Thus, the A, 2-thioT, G, C, isoC, isoG alphabet is an artificial genetic system capable of Darwinian evolution.
doi:10.1093/nar/gki873
PMCID: PMC1236980  PMID: 16192575
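The per-round fidelities quoted in entry 21 (∼98% with 2-thioTTP, ∼93% with TTP) compound over repeated PCR rounds. The sketch below makes that arithmetic explicit. It assumes that "fidelity-per-round" means the fraction of non-standard pairs retained in each round, and that rounds are independent with constant fidelity. The function name and the choice of 20 rounds are illustrative, not from the paper.

```python
def retained_fraction(fidelity_per_round, rounds):
    """Fraction of non-standard base pairs expected to survive after
    `rounds` rounds, assuming a constant, independent per-round fidelity."""
    return fidelity_per_round ** rounds


# Per-round fidelities as quoted in the abstract; 20 rounds is an assumption.
ROUNDS = 20
for label, fidelity in (("2-thioTTP", 0.98), ("TTP", 0.93)):
    kept = retained_fraction(fidelity, ROUNDS)
    print(f"{label}: {kept:.1%} retained after {ROUNDS} rounds")
```

Under these assumptions, about two thirds of the non-standard pairs survive 20 rounds at 98% fidelity, against less than a quarter at 93%.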
22.  Phylogenomic approaches to common problems encountered in the analysis of low copy repeats: The sulfotransferase 1A gene family example 
Background
Blocks of duplicated genomic DNA sequence longer than 1000 base pairs are known as low copy repeats (LCRs). Identified by their sequence similarity, LCRs are abundant in the human genome, and are interesting because they may represent recent adaptive events, or potential future adaptive opportunities within the human lineage. Sequence analysis tools are needed, however, to decide whether these interpretations are likely, whether a particular set of LCRs represents nearly neutral drift creating junk DNA, or whether the appearance of LCRs reflects assembly error. Here we investigate an LCR family containing the sulfotransferase (SULT) 1A genes involved in drug metabolism, cancer, hormone regulation, and neurotransmitter biology as a first step for defining the problems that those tools must manage.
Results
Sequence analysis here identified a fourth sulfotransferase gene, which may be transcriptionally active, located on human chromosome 16. Four regions of genomic sequence containing the four human SULT1A paralogs defined a new LCR family. The stem hominoid SULT1A progenitor locus was identified by comparative genomics involving complete human and rodent genomes, and a draft chimpanzee genome. SULT1A expansion in hominoid genomes was followed by positive selection acting on specific protein sites. This episode of adaptive evolution appears to be responsible for the dopamine sulfonation function of some SULT enzymes. Each of the conclusions that this bioinformatic analysis generated using data that has uncertain reliability (such as that from the chimpanzee genome sequencing project) has been confirmed experimentally or by a "finished" chromosome 16 assembly, both of which were published after the submission of this manuscript.
Conclusion
SULT1A genes expanded from one to four copies in hominoids during intra-chromosomal LCR duplications, including (apparently) one after the divergence of chimpanzees and humans. Thus, LCRs may provide a means for amplifying genes (and other genetic elements) that are adaptively useful. Being located on and among LCRs, however, could make the human SULT1A genes susceptible to further duplications or deletions resulting in 'genomic diseases' for some individuals. Pharmacogenomic studies of SULT1A single nucleotide polymorphisms, therefore, should also consider examining SULT1A copy number variability when searching for genotype-phenotype associations. The latest duplication is, however, only a substantiated hypothesis; an alternative explanation, disfavored by the majority of evidence, is that the duplication is an artifact of incorrect genome assembly.
doi:10.1186/1471-2148-5-22
PMCID: PMC555591  PMID: 15752422
23.  The planetary biology of cytochrome P450 aromatases 
BMC Biology  2004;2:19.
Background
Joining a model for the molecular evolution of a protein family to the paleontological and geological records (geobiology), and then to the chemical structures of substrates, products, and protein folds, is emerging as a broad strategy for generating hypotheses concerning function in a post-genomic world. This strategy expands systems biology to a planetary context, necessary for a notion of fitness to underlie (as it must) any discussion of function within a biomolecular system.
Results
Here, we report an example of such an expansion, where tools from planetary biology were used to analyze three genes from the pig Sus scrofa that encode cytochrome P450 aromatases, enzymes that convert androgens into estrogens. The evolutionary history of the vertebrate aromatase gene family was reconstructed. Transition redundant exchange silent substitution metrics were used to interpolate dates for the divergence of family members, the paleontological record was consulted to identify changes in physiology that correlated in time with the change in molecular behavior, and new aromatase sequences from peccary were obtained. Metrics that detect changing function in proteins were then applied, including KA/KS values and those that exploit structural biology. These identified specific amino acid replacements that were associated with changing substrate and product specificity during the time of presumed adaptive change. The combined analysis suggests that aromatase paralogs arose in pigs as a result of selection for Suoidea with larger litters than their ancestors, and permitted the Suoidea to survive the global climatic trauma that began in the Eocene.
Conclusions
This combination of bioinformatics analysis, molecular evolution, paleontology, cladistics, global climatology, structural biology, and organic chemistry serves as a paradigm in planetary biology. As the geological, paleontological, and genomic records improve, this approach should become widely useful to make systems biology statements about high-level function for biomolecular systems.
doi:10.1186/1741-7007-2-19
PMCID: PMC515309  PMID: 15315709
24.  Probing minor groove recognition contacts by DNA polymerases and reverse transcriptases using 3-deaza-2′-deoxyadenosine 
Nucleic Acids Research  2004;32(7):2241-2250.
Standard nucleobases all present electron density as an unshared pair of electrons to the minor groove of the double helix. Many heterocycles supporting artificial genetic systems lack this electron pair. To determine how different DNA polymerases use the pair as a substrate specificity determinant, three Family A polymerases, three Family B polymerases and three reverse transcriptases were examined for their ability to handle 3-deaza-2′-deoxyadenosine (c3dA), an analog of 2′-deoxyadenosine lacking the minor groove electron pair. Different polymerases differed widely in their interaction with c3dA. Most notably, Family A and Family B polymerases differed in their use of this interaction to exploit their exonuclease activities. Significant differences were also found within polymerase families. This plasticity in polymerase behavior is encouraging to those wishing to develop a synthetic biology based on artificial genetic systems. The differences also suggest either that Family A and Family B polymerases do not share a common ancestor, that minor groove contact was not used by that ancestor functionally or that this contact was not sufficiently critical to fitness to have been conserved as the polymerase families diverged. Each interpretation is significant for understanding the planetary biology of polymerases.
doi:10.1093/nar/gkh542
PMCID: PMC407825  PMID: 15107492
25.  PCR amplification of DNA containing non-standard base pairs by variants of reverse transcriptase from Human Immunodeficiency Virus-1 
Nucleic Acids Research  2004;32(2):728-735.
As the next step towards generating a synthetic biology from artificial genetic information systems, we have examined variants of HIV reverse transcriptase (RT) for their ability to synthesize duplex DNA incorporating the non-standard base pair between 2,4-diaminopyrimidine (pyDAD), a pyrimidine presenting a hydrogen bond ‘donor–acceptor–donor’ pattern to the complementary base, and xanthine (puADA), a purine presenting a hydrogen bond ‘acceptor–donor–acceptor’ pattern. This base pair fits the Watson–Crick geometry, but is joined by a pattern of hydrogen bond donor and acceptor groups different from those joining the GC and AT pairs. A variant of HIV-RT in which Tyr 188 is replaced by Leu has emerged from experiments in which HIV was challenged to grow in the presence of drugs targeted against the RT, such as L-697639, TIBO and nevirapine. These drugs bind at a site near, but not in, the active site. This variant accepts the pyDAD-puADA base pair significantly better than wild type HIV-RT, and we used this as a starting point. A second mutation, E478Q, was introduced into the Y188L variant, in the event that the residual nuclease activity observed is due to the RT, and not a contaminant. The doubly mutated RT incorporated the non-standard pair with sufficient fidelity that the variant could be used to amplify oligonucleotides containing pyDAD and puADA through several rounds of a polymerase chain reaction (PCR) without losing the non-standard base pair. This is the first time that DNA containing non-standard base pairs with alternative hydrogen bonding patterns has been amplified by a full PCR. This work also illustrates a research strategy that combines in clinico pre-evolution of proteins with subsequent rational design to obtain an enzyme that meets a particular technological specification.
doi:10.1093/nar/gkh241
PMCID: PMC373358  PMID: 14757837
