1.  Deep Resequencing and Association Analysis of Schizophrenia Candidate Genes 
Molecular psychiatry  2012;18(2):138-140.
doi:10.1038/mp.2012.28
PMCID: PMC3577417  PMID: 22472875
schizophrenia; sequencing; SNV; genetic; association; mutation; DISC1
2.  A framework for human microbiome research 
Methé, Barbara A. | Nelson, Karen E. | Pop, Mihai | Creasy, Heather H. | Giglio, Michelle G. | Huttenhower, Curtis | Gevers, Dirk | Petrosino, Joseph F. | Abubucker, Sahar | Badger, Jonathan H. | Chinwalla, Asif T. | Earl, Ashlee M. | FitzGerald, Michael G. | Fulton, Robert S. | Hallsworth-Pepin, Kymberlie | Lobos, Elizabeth A. | Madupu, Ramana | Magrini, Vincent | Martin, John C. | Mitreva, Makedonka | Muzny, Donna M. | Sodergren, Erica J. | Versalovic, James | Wollam, Aye M. | Worley, Kim C. | Wortman, Jennifer R. | Young, Sarah K. | Zeng, Qiandong | Aagaard, Kjersti M. | Abolude, Olukemi O. | Allen-Vercoe, Emma | Alm, Eric J. | Alvarado, Lucia | Andersen, Gary L. | Anderson, Scott | Appelbaum, Elizabeth | Arachchi, Harindra M. | Armitage, Gary | Arze, Cesar A. | Ayvaz, Tulin | Baker, Carl C. | Begg, Lisa | Belachew, Tsegahiwot | Bhonagiri, Veena | Bihan, Monika | Blaser, Martin J. | Bloom, Toby | Vivien Bonazzi, J. | Brooks, Paul | Buck, Gregory A. | Buhay, Christian J. | Busam, Dana A. | Campbell, Joseph L. | Canon, Shane R. | Cantarel, Brandi L. | Chain, Patrick S. | Chen, I-Min A. | Chen, Lei | Chhibba, Shaila | Chu, Ken | Ciulla, Dawn M. | Clemente, Jose C. | Clifton, Sandra W. | Conlan, Sean | Crabtree, Jonathan | Cutting, Mary A. | Davidovics, Noam J. | Davis, Catherine C. | DeSantis, Todd Z. | Deal, Carolyn | Delehaunty, Kimberley D. | Dewhirst, Floyd E. | Deych, Elena | Ding, Yan | Dooling, David J. | Dugan, Shannon P. | Dunne, Wm. Michael | Durkin, A. Scott | Edgar, Robert C. | Erlich, Rachel L. | Farmer, Candace N. | Farrell, Ruth M. | Faust, Karoline | Feldgarden, Michael | Felix, Victor M. | Fisher, Sheila | Fodor, Anthony A. | Forney, Larry | Foster, Leslie | Di Francesco, Valentina | Friedman, Jonathan | Friedrich, Dennis C. | Fronick, Catrina C. | Fulton, Lucinda L. | Gao, Hongyu | Garcia, Nathalia | Giannoukos, Georgia | Giblin, Christina | Giovanni, Maria Y. | Goldberg, Jonathan M. 
| Goll, Johannes | Gonzalez, Antonio | Griggs, Allison | Gujja, Sharvari | Haas, Brian J. | Hamilton, Holli A. | Harris, Emily L. | Hepburn, Theresa A. | Herter, Brandi | Hoffmann, Diane E. | Holder, Michael E. | Howarth, Clinton | Huang, Katherine H. | Huse, Susan M. | Izard, Jacques | Jansson, Janet K. | Jiang, Huaiyang | Jordan, Catherine | Joshi, Vandita | Katancik, James A. | Keitel, Wendy A. | Kelley, Scott T. | Kells, Cristyn | Kinder-Haake, Susan | King, Nicholas B. | Knight, Rob | Knights, Dan | Kong, Heidi H. | Koren, Omry | Koren, Sergey | Kota, Karthik C. | Kovar, Christie L. | Kyrpides, Nikos C. | La Rosa, Patricio S. | Lee, Sandra L. | Lemon, Katherine P. | Lennon, Niall | Lewis, Cecil M. | Lewis, Lora | Ley, Ruth E. | Li, Kelvin | Liolios, Konstantinos | Liu, Bo | Liu, Yue | Lo, Chien-Chi | Lozupone, Catherine A. | Lunsford, R. Dwayne | Madden, Tessa | Mahurkar, Anup A. | Mannon, Peter J. | Mardis, Elaine R. | Markowitz, Victor M. | Mavrommatis, Konstantinos | McCorrison, Jamison M. | McDonald, Daniel | McEwen, Jean | McGuire, Amy L. | McInnes, Pamela | Mehta, Teena | Mihindukulasuriya, Kathie A. | Miller, Jason R. | Minx, Patrick J. | Newsham, Irene | Nusbaum, Chad | O’Laughlin, Michelle | Orvis, Joshua | Pagani, Ioanna | Palaniappan, Krishna | Patel, Shital M. | Pearson, Matthew | Peterson, Jane | Podar, Mircea | Pohl, Craig | Pollard, Katherine S. | Priest, Margaret E. | Proctor, Lita M. | Qin, Xiang | Raes, Jeroen | Ravel, Jacques | Reid, Jeffrey G. | Rho, Mina | Rhodes, Rosamond | Riehle, Kevin P. | Rivera, Maria C. | Rodriguez-Mueller, Beltran | Rogers, Yu-Hui | Ross, Matthew C. | Russ, Carsten | Sanka, Ravi K. | Pamela Sankar, J. | Sathirapongsasuti, Fah | Schloss, Jeffery A. | Schloss, Patrick D. | Schmidt, Thomas M. | Scholz, Matthew | Schriml, Lynn | Schubert, Alyxandria M. | Segata, Nicola | Segre, Julia A. | Shannon, William D. | Sharp, Richard R. | Sharpton, Thomas J. | Shenoy, Narmada | Sheth, Nihar U. | Simone, Gina A. 
| Singh, Indresh | Smillie, Chris S. | Sobel, Jack D. | Sommer, Daniel D. | Spicer, Paul | Sutton, Granger G. | Sykes, Sean M. | Tabbaa, Diana G. | Thiagarajan, Mathangi | Tomlinson, Chad M. | Torralba, Manolito | Treangen, Todd J. | Truty, Rebecca M. | Vishnivetskaya, Tatiana A. | Walker, Jason | Wang, Lu | Wang, Zhengyuan | Ward, Doyle V. | Warren, Wesley | Watson, Mark A. | Wellington, Christopher | Wetterstrand, Kris A. | White, James R. | Wilczek-Boney, Katarzyna | Wu, Yuan Qing | Wylie, Kristine M. | Wylie, Todd | Yandava, Chandri | Ye, Liang | Ye, Yuzhen | Yooseph, Shibu | Youmans, Bonnie P. | Zhang, Lan | Zhou, Yanjiao | Zhu, Yiming | Zoloth, Laurie | Zucker, Jeremy D. | Birren, Bruce W. | Gibbs, Richard A. | Highlander, Sarah K. | Weinstock, George M. | Wilson, Richard K. | White, Owen
Nature  2012;486(7402):215-221.
A variety of microbial communities and their genes (microbiome) exist throughout the human body, playing fundamental roles in human health and disease. The NIH-funded Human Microbiome Project (HMP) Consortium has established a population-scale framework that catalyzed significant development of metagenomic protocols, resulting in a broad range of quality-controlled resources and data, including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 to 18 body sites up to three times, which to date have generated 5,177 microbial taxonomic profiles from 16S rRNA genes and over 3.5 Tb of metagenomic sequence. In parallel, approximately 800 human-associated reference genomes have been sequenced. Collectively, these data represent the largest resource to date describing the abundance and variety of the human microbiome, while providing a platform for current and future studies.
doi:10.1038/nature11209
PMCID: PMC3377744  PMID: 22699610
3.  Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology 
PLoS ONE  2012;7(11):e47768.
Many genomes have been sequenced to high-quality draft status using Sanger capillary electrophoresis and/or newer short-read sequence data and whole genome assembly techniques. However, even the best draft genomes contain gaps and other imperfections due to limitations in the input data and the techniques used to build draft assemblies. Sequencing biases, repetitive genomic features, genomic polymorphism, and other complicating factors all come together to make some regions difficult or impossible to assemble. Traditionally, draft genomes were upgraded to “phase 3 finished” status using time-consuming and expensive Sanger-based manual finishing processes. For more facile assembly and automated finishing of draft genomes, we present here an automated approach to finishing using long-reads from the Pacific Biosciences RS (PacBio) platform. Our algorithm and associated software tool, PBJelly (publicly available at https://sourceforge.net/projects/pb-jelly/), automates the finishing process using long sequence reads in a reference-guided assembly process. PBJelly also provides “lift-over” co-ordinate tables to easily port existing annotations to the upgraded assembly. Using PBJelly and long PacBio reads, we upgraded the draft genome sequences of a simulated Drosophila melanogaster, the version 2 draft Drosophila pseudoobscura, an assembly of the Assemblathon 2.0 budgerigar dataset, and a preliminary assembly of the Sooty mangabey. With 24× mapped coverage of PacBio long-reads, we addressed 99% of gaps and were able to close 69% and improve 12% of all gaps in D. pseudoobscura. With 4× mapped coverage of PacBio long-reads, we saw reads address 63% of gaps in our budgerigar assembly, of which 32% were closed and 63% improved. With 6.8× mapped coverage of mangabey PacBio long-reads, we addressed 97% of gaps and closed 66% of addressed gaps and improved 19%. The accuracy of gap closure was validated by comparison to Sanger sequencing on gaps from the original D. pseudoobscura draft assembly and shown to be dependent on initial reference quality.
doi:10.1371/journal.pone.0047768
PMCID: PMC3504050  PMID: 23185243
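The PBJelly abstract above reports D. pseudoobscura results as shares of all gaps, but the budgerigar and mangabey results as shares of addressed gaps, so the figures are not directly comparable. The short sketch below converts the latter onto a common "share of all gaps" basis by multiplying the two fractions. It assumes that the mangabey "improved 19%" figure is relative to addressed gaps, like the "closed 66%" figure; the helper name `share_of_all_gaps` is hypothetical and not part of PBJelly.

```python
def share_of_all_gaps(addressed_frac: float, frac_of_addressed: float) -> float:
    """Convert a fraction reported relative to addressed gaps into a fraction of all gaps."""
    return addressed_frac * frac_of_addressed


# Figures as reported in the abstract. Assumption: the mangabey "improved 19%"
# is measured against addressed gaps, like its "closed 66%" figure.
reported = {
    "budgerigar": {"addressed": 0.63, "closed": 0.32, "improved": 0.63},
    "mangabey": {"addressed": 0.97, "closed": 0.66, "improved": 0.19},
}

for assembly, r in reported.items():
    closed = share_of_all_gaps(r["addressed"], r["closed"])
    improved = share_of_all_gaps(r["addressed"], r["improved"])
    print(f"{assembly}: {closed:.1%} of all gaps closed, {improved:.1%} improved")
```

On this basis, roughly 20% of all budgerigar gaps and roughly 64% of all mangabey gaps were closed, which puts the mangabey result close to the 69% reported for D. pseudoobscura.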
4.  Whole Genome Sequence of Treponema pallidum ssp. pallidum, Strain Mexico A, Suggests Recombination between Yaws and Syphilis Strains 
Background
Treponema pallidum ssp. pallidum (TPA), the causative agent of syphilis, and Treponema pallidum ssp. pertenue (TPE), the causative agent of yaws, are closely related spirochetes causing diseases with distinct clinical manifestations. The TPA Mexico A strain was isolated in 1953 from a male patient with primary syphilis living in Mexico. Attempts to cultivate the TPA Mexico A strain under in vitro conditions have revealed lower growth potential compared to other tested TPA strains.
Methodology/Principal Findings
The complete genome sequence of the TPA Mexico A strain was determined using the Illumina sequencing technique. The genome sequence assembly was verified using the whole genome fingerprinting technique and the final sequence was annotated. The genome size of the Mexico A strain was determined to be 1,140,038 bp with 1,035 predicted ORFs. The Mexico A genome sequence was compared to the whole genome sequences of three TPA (Nichols, SS14 and Chicago) and three TPE (CDC-2, Samoa D and Gauthier) strains. No large rearrangements in the Mexico A genome were found and the identified nucleotide changes occurred most frequently in genes encoding putative virulence factors. Nevertheless, the genome of the Mexico A strain revealed two genes (TPAMA_0326 (tp92) and TPAMA_0488 (mcp2-1)) which combine TPA- and TPE-specific nucleotide sequences. Both genes were found to be under positive selection within TPA strains and also between TPA and TPE strains.
Conclusions/Significance
The observed mosaic character of the TPAMA_0326 and TPAMA_0488 loci is likely a result of inter-strain recombination between TPA and TPE strains during simultaneous infection of a single host, suggesting horizontal gene transfer between treponemal subspecies.
Author Summary
Treponema pallidum is a Gram-negative spirochete that causes diseases with distinct clinical manifestations and uses different transmission strategies. While syphilis (caused by subspecies pallidum) is a worldwide venereal and congenital disease, yaws (caused by subspecies pertenue) is a tropical disease transmitted by direct skin contact. Currently the genetic basis and evolution of these diseases remain unknown.
In this study, we describe a high quality whole genome sequence of T. pallidum ssp. pallidum strain Mexico A, determined using the “next generation” sequencing technique (Illumina). Although the genome of this strain contains no large rearrangements in comparison with other treponemal genomes, we found two genes which combined sequences from both subspecies pallidum and pertenue. The observed mosaic character of these two genes is likely a result of inter-strain recombination between pallidum and pertenue during simultaneous infection of a single host.
doi:10.1371/journal.pntd.0001832
PMCID: PMC3447947  PMID: 23029591
5.  Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities 
Genome Biology  2011;12(7):R68.
Background
Enrichment of loci by DNA hybridization-capture, followed by high-throughput sequencing, is an important tool in modern genetics. Currently, the most common targets for enrichment are the protein coding exons represented by the consensus coding DNA sequence (CCDS). The CCDS, however, excludes many actual or computationally predicted coding exons present in other databases, such as RefSeq and Vega, and non-coding functional elements such as untranslated and regulatory regions. The number of variants per base pair (variant density) and our ability to interrogate regions outside of the CCDS regions is consequently less well understood.
Results
We examine capture sequence data from outside of the CCDS regions and find that extremes of GC content that are present in different subregions of the genome can reduce the local capture sequence coverage to less than 50% relative to the CCDS. This effect is due to biases inherent in both the Illumina and SOLiD sequencing platforms that are exacerbated by the capture process. Interestingly, for two subregion types, microRNA and predicted exons, the capture process yields higher than expected coverage when compared to whole genome sequencing. Lastly, we examine the variation present in non-CCDS regions and find that predicted exons, as well as exonic regions specific to RefSeq and Vega, show much higher variant densities than the CCDS.
Conclusions
We show that regions outside of the CCDS perform less efficiently in capture sequence experiments. Further, we show that the variant density in computationally predicted exons is more than 2.5-times higher than that observed in the CCDS.
doi:10.1186/gb-2011-12-7-r68
PMCID: PMC3218830  PMID: 21787409
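The targeted-enrichment abstract above defines variant density as the number of variants per base pair and compares region types against the CCDS. A minimal sketch of that comparison follows. The counts are invented purely to illustrate the arithmetic, and normalizing by the number of bases actually interrogated (rather than by nominal region length) is an assumption motivated by the coverage differences the abstract describes; `RegionSet` and `relative_density` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RegionSet:
    name: str
    variant_count: int  # variants called within this region set
    covered_bp: int     # bases interrogated with adequate coverage (assumed denominator)

    @property
    def density(self) -> float:
        """Variant density: variants per base pair."""
        return self.variant_count / self.covered_bp


def relative_density(region: RegionSet, baseline: RegionSet) -> float:
    """How many times denser in variants `region` is than `baseline`."""
    return region.density / baseline.density


# Hypothetical counts, chosen only to illustrate the calculation.
ccds = RegionSet("CCDS", variant_count=20_000, covered_bp=30_000_000)
predicted = RegionSet("predicted exons", variant_count=1_500, covered_bp=900_000)
print(f"{predicted.name}: {relative_density(predicted, ccds):.2f}x the CCDS density")
```

Dividing by interrogated bases keeps poorly captured, GC-extreme regions from looking artificially variant-poor simply because fewer of their bases were sequenced.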
6.  Whole-Genome Sequencing for Optimized Patient Management 
Science Translational Medicine  2011;3(87):87re3.
Whole-genome sequencing of patient DNA can facilitate diagnosis of a disease, but its potential for guiding treatment has been under-realized. We interrogated the complete genome sequences of a 14-year-old fraternal twin pair diagnosed with dopa (3,4-dihydroxyphenylalanine)–responsive dystonia (DRD; Mendelian Inheritance in Man #128230). DRD is a genetically heterogeneous and clinically complex movement disorder that is usually treated with L-dopa, a precursor of the neurotransmitter dopamine. Whole-genome sequencing identified compound heterozygous mutations in the SPR gene encoding sepiapterin reductase. Disruption of SPR causes a decrease in tetrahydrobiopterin, a cofactor required for the hydroxylase enzymes that synthesize the neurotransmitters dopamine and serotonin. Supplementation of L-dopa therapy with 5-hydroxytryptophan, a serotonin precursor, resulted in clinical improvements in both twins.
doi:10.1126/scitranslmed.3002243
PMCID: PMC3314311  PMID: 21677200
7.  Identification of Genetic Susceptibility to Childhood Cancer through Analysis of Genes in Parallel 
Cancer genetics  2011;204(1):19-25.
Clinical cancer genetic susceptibility analysis typically proceeds sequentially, beginning with the most likely causative gene. The process is time consuming and the yield is low, particularly for families with unusual patterns of cancer. We determined the results of parallel mutation analysis of a large cancer-associated gene panel. We performed deletion analysis and sequenced the coding regions of 45 genes (8 oncogenes and 37 tumor suppressor or DNA repair genes) in 48 childhood cancer patients who also (1) were diagnosed with a second malignancy under age 30, (2) have a sibling diagnosed with cancer under age 30 and/or (3) have a major congenital anomaly or developmental delay. Deleterious mutations were identified in 6 of 48 (13%) families, 4 of which met the sibling criteria. Mutations were identified in genes previously implicated in both dominant and recessive childhood syndromes including SMARCB1, PMS2, and TP53. No pathogenic deletions were identified. This approach has provided efficient identification of childhood cancer susceptibility mutations and will have greater utility as additional cancer susceptibility genes are identified. Integrating parallel analysis of large gene panels into clinical testing will speed results and increase diagnostic yield. The failure to detect mutations in 87% of families highlights that a number of childhood cancer susceptibility genes remain to be discovered.
doi:10.1016/j.cancergencyto.2010.11.001
PMCID: PMC3075924  PMID: 21356188
Cancer susceptibility; tumor suppressor genes; oncogenes; mutation analysis; rhabdoid tumors
8.  TTC21B contributes both causal and modifying alleles across the ciliopathy spectrum 
Nature genetics  2011;43(3):189-196.
Ciliary dysfunction leads to a broad range of overlapping phenotypes, collectively termed ciliopathies. This grouping is underscored by genetic overlap, where causal genes can also contribute modifying alleles to clinically distinct disorders. Here we show that mutations in TTC21B/IFT139, encoding a retrograde intraflagellar transport (IFT) protein, cause both isolated nephronophthisis (NPHP) and syndromic Jeune Asphyxiating Thoracic Dystrophy (JATD). Moreover, although systematic medical resequencing of a large, clinically diverse ciliopathy cohort and matched controls showed a similar frequency of rare changes, in vivo and in vitro evaluations unmasked a significant enrichment of pathogenic alleles in cases, suggesting that TTC21B contributes pathogenic alleles to ∼5% of ciliopathy patients. Our data illustrate how genetic lesions can be both causally associated with diverse ciliopathies and interact in trans with other disease-causing genes, and highlight how saturated resequencing followed by functional analysis of all variants informs the genetic architecture of disorders.
doi:10.1038/ng.756
PMCID: PMC3071301  PMID: 21258341
9.  Exome Sequencing of Head and Neck Squamous Cell Carcinoma Reveals Inactivating Mutations in NOTCH1 
Science (New York, N.Y.)  2011;333(6046):1154-1157.
Head and neck squamous cell carcinoma (HNSCC) is the sixth most common cancer worldwide. To explore the genetic origins of this cancer, we used whole exome sequencing and gene copy number analyses to study 32 primary tumors. Tumors from patients with a history of tobacco use had more mutations than did tumors from patients who did not use tobacco, and tumors that were negative for human papilloma virus (HPV) had more mutations than did HPV-positive tumors. Six of the genes that were mutated in multiple tumors were assessed in up to 88 additional HNSCCs. In addition to previously described mutations in TP53, CDKN2A, PIK3CA and HRAS, we identified mutations in FBXW7 and NOTCH1. Interestingly, nearly 40% of the 28 mutations identified in NOTCH1 were predicted to truncate the gene product, suggesting that NOTCH1 may function as a tumor suppressor gene rather than an oncogene in this tumor type.
doi:10.1126/science.1206923
PMCID: PMC3162986  PMID: 21798897
10.  Comparative and demographic analysis of orangutan genomes 
Locke, Devin P. | Hillier, LaDeana W. | Warren, Wesley C. | Worley, Kim C. | Nazareth, Lynne V. | Muzny, Donna M. | Yang, Shiaw-Pyng | Wang, Zhengyuan | Chinwalla, Asif T. | Minx, Pat | Mitreva, Makedonka | Cook, Lisa | Delehaunty, Kim D. | Fronick, Catrina | Schmidt, Heather | Fulton, Lucinda A. | Fulton, Robert S. | Nelson, Joanne O. | Magrini, Vincent | Pohl, Craig | Graves, Tina A. | Markovic, Chris | Cree, Andy | Dinh, Huyen H. | Hume, Jennifer | Kovar, Christie L. | Fowler, Gerald R. | Lunter, Gerton | Meader, Stephen | Heger, Andreas | Ponting, Chris P. | Marques-Bonet, Tomas | Alkan, Can | Chen, Lin | Cheng, Ze | Kidd, Jeffrey M. | Eichler, Evan E. | White, Simon | Searle, Stephen | Vilella, Albert J. | Chen, Yuan | Flicek, Paul | Ma, Jian | Raney, Brian | Suh, Bernard | Burhans, Richard | Herrero, Javier | Haussler, David | Faria, Rui | Fernando, Olga | Darré, Fleur | Farré, Domènec | Gazave, Elodie | Oliva, Meritxell | Navarro, Arcadi | Roberto, Roberta | Capozzi, Oronzo | Archidiacono, Nicoletta | Valle, Giuliano Della | Purgato, Stefania | Rocchi, Mariano | Konkel, Miriam K. | Walker, Jerilyn A. | Ullmer, Brygg | Batzer, Mark A. | Smit, Arian F. A. | Hubley, Robert | Casola, Claudio | Schrider, Daniel R. | Hahn, Matthew W. | Quesada, Victor | Puente, Xose S. | Ordoñez, Gonzalo R. | López-Otín, Carlos | Vinar, Tomas | Brejova, Brona | Ratan, Aakrosh | Harris, Robert S. | Miller, Webb | Kosiol, Carolin | Lawson, Heather A. | Taliwal, Vikas | Martins, André L. | Siepel, Adam | RoyChoudhury, Arindam | Ma, Xin | Degenhardt, Jeremiah | Bustamante, Carlos D. | Gutenkunst, Ryan N. | Mailund, Thomas | Dutheil, Julien Y. | Hobolth, Asger | Schierup, Mikkel H. | Chemnick, Leona | Ryder, Oliver A. | Yoshinaga, Yuko | de Jong, Pieter J. | Weinstock, George M. | Rogers, Jeffrey | Mardis, Elaine R. | Gibbs, Richard A. | Wilson, Richard K.
Nature  2011;469(7331):529-533.
“Orangutan” is derived from the Malay term “man of the forest” and aptly describes the Southeast Asian great apes native to Sumatra and Borneo. The orangutan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orangutan draft genome assembly and short read sequence data from five Sumatran and five Bornean orangutan genomes. Our analyses reveal that, compared to other primates, the orangutan genome has many unique features. Structural evolution of the orangutan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe the first primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orangutan genome structure. Orangutans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400k years ago (ya), is more recent than most previous studies and underscores the complexity of the orangutan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (Ne) expanded exponentially relative to the ancestral Ne after the split, while Bornean Ne declined over the same period.
Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.
doi:10.1038/nature09687
PMCID: PMC3060778  PMID: 21270892
11.  Characterization of single-nucleotide variation in Indian-origin rhesus macaques (Macaca mulatta) 
BMC Genomics  2011;12:311.
Background
Rhesus macaques are the most widely utilized nonhuman primate model in biomedical research. Previous efforts have validated fewer than 900 single nucleotide polymorphisms (SNPs) in this species, which limits opportunities for genetic studies related to health and disease. Extensive information about SNPs and other genetic variation in rhesus macaques would facilitate valuable genetic analyses, as well as provide markers for genome-wide linkage analysis and the genetic management of captive breeding colonies.
Results
We used the available rhesus macaque draft genome sequence, new sequence data from unrelated individuals and existing published sequence data to create a genome-wide SNP resource for Indian-origin rhesus monkeys. The original reference animal and two additional Indian-origin individuals were resequenced to low coverage using SOLiD™ sequencing. We then used three strategies to validate SNPs: comparison of potential SNPs found in the same individual using two different sequencing chemistries, and comparison of potential SNPs in different individuals identified with either the same or different sequencing chemistries. Our approach validated approximately 3 million SNPs distributed across the genome. Preliminary analysis of SNP annotations suggests that a substantial number of these macaque SNPs may have functional effects. More than 700 non-synonymous SNPs were scored by Polyphen-2 as either possibly or probably damaging to protein function and these variants now constitute potential models for studying functional genetic variation relevant to human physiology and disease.
Conclusions
Resequencing of a small number of animals identified more than 3 million SNPs. This provides a significant new information resource for rhesus macaques, an important research animal. The data also suggest that overall genetic variation is high in this species. We identified many potentially damaging non-synonymous coding SNPs, providing new opportunities to identify rhesus models for human disease.
doi:10.1186/1471-2164-12-311
PMCID: PMC3141668  PMID: 21668978
single nucleotide polymorphism; common variants; SOLiD™; genetic variation; rhesus macaque
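The rhesus macaque abstract above validates SNPs by requiring that a candidate be observed in independent call sets: the same individual sequenced with two chemistries, or different individuals sequenced with the same or different chemistries. A minimal sketch of that idea, assuming each call set is keyed by (individual, chemistry) and that support from any two distinct call sets counts as validation, is shown below. This is an illustration of the concordance principle only, not the authors' pipeline; the function name, animal labels, chemistry labels and SNP coordinates are all hypothetical, and real workflows would also apply quality and coverage filters.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Snp = Tuple[str, int, str]      # (chromosome, position, alternate allele)
CallSetKey = Tuple[str, str]    # (individual, sequencing chemistry)


def validated_snps(call_sets: Dict[CallSetKey, Set[Snp]]) -> Set[Snp]:
    """Return SNPs supported by at least two distinct call sets.

    Two distinct keys differ in individual, chemistry, or both, which covers
    the three comparison strategies described in the abstract.
    """
    support: Dict[Snp, Set[CallSetKey]] = defaultdict(set)
    for key, snps in call_sets.items():
        for snp in snps:
            support[snp].add(key)
    return {snp for snp, keys in support.items() if len(keys) >= 2}


# Hypothetical call sets, for illustration only.
calls = {
    ("reference_animal", "chemistry_A"): {("chr1", 1000, "T"), ("chr1", 2000, "G")},
    ("reference_animal", "chemistry_B"): {("chr1", 1000, "T")},
    ("animal_2", "chemistry_B"): {("chr1", 2000, "G"), ("chr2", 500, "A")},
}
print(sorted(validated_snps(calls)))
```

Here chr1:1000 is validated by the same animal under two chemistries, chr1:2000 by two animals under different chemistries, and chr2:500 remains unvalidated because only one call set supports it.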
12.  Complete Genome Sequence of Treponema paraluiscuniculi, Strain Cuniculi A: The Loss of Infectivity to Humans Is Associated with Genome Decay 
PLoS ONE  2011;6(5):e20415.
Treponema paraluiscuniculi is the causative agent of rabbit venereal spirochetosis. It is not infectious to humans, although its genome structure is very closely related to other pathogenic Treponema species including Treponema pallidum subspecies pallidum, the etiological agent of syphilis. In this study, the genome sequence of Treponema paraluiscuniculi, strain Cuniculi A, was determined by a combination of several high-throughput sequencing strategies. Whereas the overall size (1,133,390 bp), arrangement, and gene content of the Cuniculi A genome closely resembled those of the T. pallidum genome, the T. paraluiscuniculi genome contained a markedly higher number of pseudogenes and gene fragments (51). In addition to pseudogenes, 33 divergent genes were also found in the T. paraluiscuniculi genome. A set of 32 (out of 84) affected genes encoded proteins of known or predicted function in the Nichols genome. These proteins included virulence factors, gene regulators and components of DNA repair and recombination. The majority (52 or 61.9%) of the Cuniculi A pseudogenes and divergent genes were of unknown function. Our results indicate that T. paraluiscuniculi has evolved from a T. pallidum-like ancestor and adapted to a specialized host-associated niche (rabbits) during loss of infectivity to humans. The genes that are inactivated or altered in T. paraluiscuniculi are candidates for virulence factors important in the infectivity and pathogenesis of T. pallidum subspecies.
doi:10.1371/journal.pone.0020415
PMCID: PMC3105029  PMID: 21655244
13.  Genetic diversity in India and the inference of Eurasian population expansion 
Genome Biology  2010;11(11):R113.
Background
Genetic studies of populations from the Indian subcontinent are of great interest because of India's large population size, complex demographic history, and unique social structure. Despite recent large-scale efforts in discovering human genetic variation, India's vast reservoir of genetic diversity remains largely unexplored.
Results
To analyze an unbiased sample of genetic diversity in India and to investigate human migration history in Eurasia, we resequenced one 100-kb ENCODE region in 92 samples collected from three castes and one tribal group from the state of Andhra Pradesh in south India. Analyses of the four Indian populations, along with eight HapMap populations (692 samples), showed that 30% of all SNPs in the south Indian populations are not seen in HapMap populations. Several Indian populations, such as the Yadava, Mala/Madiga, and Irula, have nucleotide diversity levels as high as those of HapMap African populations. Using unbiased allele-frequency spectra, we investigated the expansion of human populations into Eurasia. The divergence time estimates among the major population groups suggest that Eurasian populations in this study diverged from Africans during the same time frame (approximately 90 to 110 thousand years ago). The divergence among different Eurasian populations occurred more than 40,000 years after their divergence with Africans.
Conclusions
Our results show that Indian populations harbor large amounts of genetic variation that have not been surveyed adequately by public SNP discovery efforts. Our data also support a delayed expansion hypothesis in which an ancestral Eurasian founding population remained isolated long after the out-of-Africa diaspora, before expanding throughout Eurasia.
doi:10.1186/gb-2010-11-11-r113
PMCID: PMC3156952  PMID: 21106085
14.  A Collagen-Binding Adhesin, Acb, and Ten Other Putative MSCRAMM and Pilus Family Proteins of Streptococcus gallolyticus subsp. gallolyticus (Streptococcus bovis Group, Biotype I)
Journal of Bacteriology  2009;191(21):6643-6653.
Members of the Streptococcus bovis group are important causes of endocarditis. However, factors associated with their pathogenicity, such as adhesins, remain uncharacterized. We recently demonstrated that endocarditis-derived Streptococcus gallolyticus subsp. gallolyticus isolates frequently adhere to extracellular matrix (ECM) proteins. Here, we generated a draft genome sequence of an ECM protein-adherent S. gallolyticus subsp. gallolyticus strain and found, by genome-wide analyses, 11 predicted LPXTG-type cell wall-anchored proteins with characteristics of MSCRAMMs, including a modular architecture of domains predicted to adopt immunoglobulin (Ig)-like folding. A recombinant segment of one of these, Acb, showed high-affinity binding to immobilized collagen, and cell surface expression of Acb correlated with the presence of acb and collagen adherence of isolates. Three of the 11 proteins have similarities to major pilus subunits and are organized in separate clusters, each including a second Ig-fold-containing MSCRAMM and a class C sortase, suggesting that the sequenced strain encodes three distinct types of pili. Reverse transcription-PCR demonstrated that all three genes of one cluster, acb-sbs7-srtC1, are cotranscribed, consistent with pilus operons of other gram-positive bacteria. Further analysis detected expression of all 11 genes in cells grown to mid to late exponential growth phases. Wide distribution of 9 of the 11 genes was observed among S. gallolyticus subsp. gallolyticus isolates with fewer genes present in other S. bovis group species/subspecies. The high prevalence of genes encoding putative MSCRAMMs and pili, including a collagen-binding MSCRAMM, among S. gallolyticus subsp. gallolyticus isolates may play an important role in the predominance of this subspecies in S. bovis endocarditis.
doi:10.1128/JB.00909-09
PMCID: PMC2795296  PMID: 19717590
15.  A common allele in RPGRIP1L is a modifier of retinal degeneration in ciliopathies 
Nature genetics  2009;41(6):739-745.
Despite rapid advances in disease gene identification, the predictive power of the genotype remains limited, in part due to poorly understood effects of second-site modifiers. Here we demonstrate that a polymorphic coding variant of RPGRIP1L (retinitis pigmentosa GTPase regulator-interacting protein-1 like), a ciliary gene mutated in Meckel-Gruber (MKS) and Joubert (JBTS) syndromes, is associated with the development of retinal degeneration in patients with ciliopathies caused by mutations in other genes. As part of our resequencing efforts of the ciliary proteome, we identified several putative loss of function RPGRIP1L mutations, including one common variant, A229T. Multiple genetic lines of evidence showed this allele to be associated with photoreceptor loss in ciliopathies. Moreover, we show that RPGRIP1L interacts biochemically with RPGR, loss of which causes retinal degeneration, and that the 229T-encoded protein significantly compromises this interaction. Our data represent an example of modification of a discrete phenotype of syndromic disease and highlight the importance of a multifaceted approach for the discovery of modifier alleles of intermediate frequency and effect.
doi:10.1038/ng.366
PMCID: PMC2783476  PMID: 19430481
16.  Common and Rare Variants of DAOA in Bipolar Disorder 
The d-amino acid oxidase activator (DAOA, previously known as G72) gene, mapped on 13q33, has been reported to be genetically associated with bipolar disorder (BP) in several populations. The consistency of associated variants is unclear and rare variants in exons of the DAOA gene have not been investigated in psychiatric diseases. We employed a conditional linkage method - STatistical Explanation for Positional Cloning (STEPC) to evaluate whether any associated single nucleotide polymorphisms (SNPs) account for the evidence of linkage in a pedigree series that previously has been linked to marker D13S779 at 13q33. We also performed an association study in a sample of 376 Caucasian BP parent-proband trios by genotyping 38 common SNPs in the gene region. In addition, we resequenced coding regions and flanking intronic sequences of DAOA in 555 Caucasian unrelated BP patients and 564 mentally healthy controls, to identify putative functional rare variants that may contribute to disease. One SNP rs1935058 could “explain” the linkage signal in the family sample set (P = 0.055) using STEPC analysis. No significant allelic association was detected in an association study by genotyping 38 common SNPs in 376 Caucasian BP trios. Resequencing identified 53 SNPs, of which 46 were novel SNPs. There was no significant excess of rare variants in cases relative to controls. Our results suggest that DAOA does not have a major effect on BP susceptibility. However, DAOA may contribute to bipolar susceptibility in some specific families as evidenced by the STEPC analysis.
doi:10.1002/ajmg.b.30925
PMCID: PMC2753761  PMID: 19194963
G72; DAOA; bipolar disorder; resequencing; single nucleotide polymorphism; genetic association
17.  Somatic mutations affect key pathways in lung adenocarcinoma 
Ding, Li | Getz, Gad | Wheeler, David A. | Mardis, Elaine R. | McLellan, Michael D. | Cibulskis, Kristian | Sougnez, Carrie | Greulich, Heidi | Muzny, Donna M. | Morgan, Margaret B. | Fulton, Lucinda | Fulton, Robert S. | Zhang, Qunyuan | Wendl, Michael C. | Lawrence, Michael S. | Larson, David E. | Chen, Ken | Dooling, David J. | Sabo, Aniko | Hawes, Alicia C. | Shen, Hua | Jhangiani, Shalini N. | Lewis, Lora R. | Hall, Otis | Zhu, Yiming | Mathew, Tittu | Ren, Yanru | Yao, Jiqiang | Scherer, Steven E. | Clerc, Kerstin | Metcalf, Ginger A. | Ng, Brian | Milosavljevic, Aleksandar | Gonzalez-Garay, Manuel L. | Osborne, John R. | Meyer, Rick | Shi, Xiaoqi | Tang, Yuzhu | Koboldt, Daniel C. | Lin, Ling | Abbott, Rachel | Miner, Tracie L. | Pohl, Craig | Fewell, Ginger | Haipek, Carrie | Schmidt, Heather | Dunford-Shore, Brian H. | Kraja, Aldi | Crosby, Seth D. | Sawyer, Christopher S. | Vickery, Tammi | Sander, Sacha | Robinson, Jody | Winckler, Wendy | Baldwin, Jennifer | Chirieac, Lucian R. | Dutt, Amit | Fennell, Tim | Hanna, Megan | Johnson, Bruce E. | Onofrio, Robert C. | Thomas, Roman K. | Tonon, Giovanni | Weir, Barbara A. | Zhao, Xiaojun | Ziaugra, Liuda | Zody, Michael C. | Giordano, Thomas | Orringer, Mark B. | Roth, Jack A. | Spitz, Margaret R. | Wistuba, Ignacio I. | Ozenberger, Bradley | Good, Peter J. | Chang, Andrew C. | Beer, David G. | Watson, Mark A. | Ladanyi, Marc | Broderick, Stephen | Yoshizawa, Akihiko | Travis, William D. | Pao, William | Province, Michael A. | Weinstock, George M. | Varmus, Harold E. | Gabriel, Stacey B. | Lander, Eric S. | Gibbs, Richard A. | Meyerson, Matthew | Wilson, Richard K.
Nature  2008;455(7216):1069-1075.
Determining the genetic basis of cancer requires comprehensive analyses of large collections of histopathologically well-classified primary tumours. Here we report the results of a collaborative study to discover somatic mutations in 188 human lung adenocarcinomas. DNA sequencing of 623 genes with known or potential relationships to cancer revealed more than 1,000 somatic mutations across the samples. Our analysis identified 26 genes that are mutated at significantly high frequencies and thus are probably involved in carcinogenesis. The frequently mutated genes include tyrosine kinases, among them the EGFR homologue ERBB4; multiple ephrin receptor genes, notably EPHA3; vascular endothelial growth factor receptor KDR; and NTRK genes. These data provide evidence of somatic mutations in primary lung adenocarcinoma for several tumour suppressor genes involved in other cancers—including NF1, APC, RB1 and ATM—and for sequence changes in PTPRD as well as the frequently deleted gene LRP1B. The observed mutational profiles correlate with clinical features, smoking status and DNA repair defects. These results are reinforced by data integration including single nucleotide polymorphism array and gene expression array. Our findings shed further light on several important signalling pathways involved in lung adenocarcinoma, and suggest new molecular targets for treatment.
doi:10.1038/nature07423
PMCID: PMC2694412  PMID: 18948947
18.  Pulmonary alveolar proteinosis caused by deletion of the GM-CSFRα gene in the X chromosome pseudoautosomal region 1 
The Journal of Experimental Medicine  2008;205(12):2711-2716.
Pulmonary alveolar proteinosis (PAP) is a rare lung disorder in which surfactant-derived lipoproteins accumulate excessively within pulmonary alveoli, causing severe respiratory distress. The importance of granulocyte/macrophage colony-stimulating factor (GM-CSF) in the pathogenesis of PAP has been confirmed in humans and mice, wherein GM-CSF signaling is required for pulmonary alveolar macrophage catabolism of surfactant. PAP is caused by disruption of GM-CSF signaling in these cells, and is usually caused by neutralizing autoantibodies to GM-CSF or is secondary to other underlying diseases. Rarely, genetic defects in surfactant proteins or the common β chain for the GM-CSF receptor (GM-CSFR) are causal. Using a combination of cellular, molecular, and genomic approaches, we provide the first evidence that PAP can result from a genetic deficiency of the GM-CSFR α chain, encoded in the X-chromosome pseudoautosomal region 1.
doi:10.1084/jem.20080759
PMCID: PMC2585851  PMID: 18955567
19.  The Complete Genome Sequence of Escherichia coli DH10B: Insights into the Biology of a Laboratory Workhorse
Journal of Bacteriology  2008;190(7):2597-2606.
Escherichia coli DH10B was designed for the propagation of large insert DNA library clones. It is used extensively, taking advantage of properties such as high DNA transformation efficiency and maintenance of large plasmids. The strain was constructed by serial genetic recombination steps, but the underlying sequence changes remained unverified. We report the complete genomic sequence of DH10B by using reads accumulated from the bovine sequencing project at Baylor College of Medicine and assembled with DNAStar's SeqMan genome assembler. The DH10B genome is largely colinear with that of the wild-type K-12 strain MG1655, although it is substantially more complex than previously appreciated, allowing DH10B biology to be further explored. The 226 mutated genes in DH10B relative to MG1655 are mostly attributable to the extensive genetic manipulations the strain has undergone. However, we demonstrate that DH10B has a 13.5-fold higher mutation rate than MG1655, resulting from a dramatic increase in insertion sequence (IS) transposition, especially IS150. IS elements appear to have remodeled genome architecture, providing homologous recombination sites for a 113,260-bp tandem duplication and an inversion. DH10B requires leucine for growth on minimal medium due to the deletion of leuLABCD and harbors both the relA1 and spoT1 alleles causing both sensitivity to nutritional downshifts and slightly lower growth rates relative to the wild type. Finally, while the sequence confirms most of the reported alleles, the sequence of deoR is wild type, necessitating reexamination of the assumed basis for the high transformability of DH10B.
doi:10.1128/JB.01695-07
PMCID: PMC2293198  PMID: 18245285
20.  The Genome Sequence of Mannheimia haemolytica A1: Insights into Virulence, Natural Competence, and Pasteurellaceae Phylogeny
Journal of Bacteriology  2006;188(20):7257-7266.
The draft genome sequence of Mannheimia haemolytica A1, the causative agent of bovine respiratory disease complex (BRDC), is presented. Strain ATCC BAA-410, isolated from the lung of a calf with BRDC, was the DNA source. The annotated genome includes 2,839 coding sequences, 1,966 of which were assigned a function and 436 of which are unique to M. haemolytica. Through genome annotation many features of interest were identified, including bacteriophages and genes related to virulence, natural competence, and transcriptional regulation. In addition to previously described virulence factors, M. haemolytica encodes adhesins, including the filamentous hemagglutinin FhaB and two trimeric autotransporter adhesins. Two dual-function immunoglobulin-protease/adhesins are also present, as is a third immunoglobulin protease. Genes related to iron acquisition and drug resistance were identified and are likely important for survival in the host and virulence. Analysis of the genome indicates that M. haemolytica is naturally competent, as genes for natural competence and DNA uptake signal sequences (USS) are present. Comparison of competence loci and USS in other species in the family Pasteurellaceae indicates that M. haemolytica, Actinobacillus pleuropneumoniae, and Haemophilus ducreyi form a lineage distinct from other Pasteurellaceae. This observation was supported by a phylogenetic analysis using sequences of predicted housekeeping genes.
doi:10.1128/JB.00675-06
PMCID: PMC1636238  PMID: 17015664
21.  Finishing a whole-genome shotgun: Release 3 of the Drosophila melanogaster euchromatic genome sequence 
Genome Biology  2002;3(12):research0079.1-79.14.
The Drosophila melanogaster genome was the first metazoan genome to be sequenced by whole-genome shotgun. Now, the sequence has been finished in a process designed to close gaps, improve sequence quality and validate the assembly.
Background
The Drosophila melanogaster genome was the first metazoan genome to have been sequenced by the whole-genome shotgun (WGS) method. Two issues relating to this achievement were widely debated in the genomics community: how correct is the sequence with respect to base-pair (bp) accuracy and frequency of assembly errors? And, how difficult is it to bring a WGS sequence to the accepted standard for finished sequence? We are now in a position to answer these questions.
Results
Our finishing process was designed to close gaps, improve sequence quality and validate the assembly. Sequence traces derived from the WGS and draft sequencing of individual bacterial artificial chromosomes (BACs) were assembled into BAC-sized segments. These segments were brought to high quality, and then joined to constitute the sequence of each chromosome arm. Overall assembly was verified by comparison to a physical map of fingerprinted BAC clones. In the current version of the 116.9 Mb euchromatic genome, called Release 3, the six euchromatic chromosome arms are represented by 13 scaffolds with a total of 37 sequence gaps. We compared Release 3 to Release 2; in autosomal regions of unique sequence, the error rate of Release 2 was one in 20,000 bp.
Conclusions
The WGS strategy can efficiently produce a high-quality sequence of a metazoan genome while generating the reagents required for sequence finishing. However, the initial method of repeat assembly was flawed. The sequence we report here, Release 3, is a reliable resource for molecular genetic experimentation and computational analysis.
doi:10.1186/gb-2002-3-12-research0079
PMCID: PMC151181  PMID: 12537568
22.  Glass bead purification of plasmid template DNA for high throughput sequencing of mammalian genomes 
Nucleic Acids Research  2002;30(7):e32.
To meet the new challenge of generating the draft sequences of mammalian genomes, we describe the development of a novel high throughput 96-well method for the purification of plasmid DNA template using size-fractionated, acid-washed glass beads. Unlike most previously described approaches, the current method has been designed and optimized to facilitate the direct binding of alcohol-precipitated plasmid DNA to glass beads from alkaline lysed bacterial cells containing the insoluble cellular aggregate material. Eliminating the tedious step of separating the cleared lysate significantly simplifies the method and improves throughput and reliability. During a 4 month period of 96-capillary DNA sequencing of the Rattus norvegicus genome at the Baylor College of Medicine Human Genome Sequencing Center, the average success rate and read length derived from >1,800,000 plasmid DNA templates prepared by the direct lysis/glass bead method were 82.2% and 516 bases, respectively. The cost of this direct lysis/glass bead method in September 2001 was ∼10 cents per clone, which is a significant cost saving in high throughput genomic sequencing efforts.
PMCID: PMC101856  PMID: 11917038
23.  Deep resequencing reveals excess rare recent variants consistent with explosive population growth 
Nature Communications  2010;1:131.
Accurately determining the distribution of rare variants is an important goal of human genetics, but resequencing of a sample large enough for this purpose has been unfeasible until now. Here, we applied Sanger sequencing of genomic PCR amplicons to resequence the diabetes-associated genes KCNJ11 and HHEX in 13,715 people (10,422 European Americans and 3,293 African Americans) and validated amplicons potentially harbouring rare variants using 454 pyrosequencing. We observed far more variation (expected variant-site count ∼578) than would have been predicted on the basis of earlier surveys, which could only capture the distribution of common variants. By comparison with earlier estimates based on common variants, our model shows a clear genetic signal of accelerating population growth, suggesting that humanity harbours a myriad of rare, deleterious variants, and that disease risk and the burden of disease in contemporary populations may be heavily influenced by the distribution of rare variants.
To fully catalogue rare genetic variation in humans, many samples need to be examined. In this study, Coventry et al. resequenced two genes, KCNJ11 and HHEX, in 13,715 humans, and concluded that most of the sequence variation arose recently and that variation is greater than expected.
doi:10.1038/ncomms1130
PMCID: PMC3060603  PMID: 21119644
24.  Strict evolutionary conservation followed rapid gene loss on human and rhesus Y chromosomes 
Nature  2012;483(7387):82-86.
The human X and Y chromosomes evolved from an ordinary pair of autosomes during the past 200–300 million years [1–3]. Due to genetic decay, the human MSY (male-specific region of Y chromosome) retains only three percent of the ancestral autosomes’ genes [4,5]. This evolutionary decay was driven by a series of five “stratification” events. Each event suppressed X-Y crossing over within a chromosome segment or “stratum”, incorporated that segment into the MSY, and subjected its genes to the erosive forces that attend the absence of crossing over [2,6]. The last of these events occurred 30 million years ago (mya), or 5 million years before the human and Old World monkey (OWM) lineages diverged. Although speculation abounds regarding ongoing decay and looming extinction of the human Y chromosome [7–10], remarkably little is known about how many MSY genes were lost in the human lineage in the 25 million years that have followed its separation from the OWM lineage. To explore this question, we sequenced the MSY of the rhesus macaque, an OWM, and compared it to the human MSY. We discovered that, during the last 25 million years, MSY gene loss in the human lineage was limited to the youngest stratum (stratum 5), which comprises three percent of the human MSY. Within the older strata, which collectively comprise the bulk of the human MSY, gene loss evidently ceased more than 25 mya. Likewise, the rhesus MSY has not lost any older genes (from strata 1–4) during the past 25 million years, despite major structural differences from the human MSY. The rhesus MSY is simpler, with few amplified gene families or palindromes that might enable intrachromosomal recombination and repair. We present an empirical reconstruction of human MSY evolution in which each stratum transitioned from rapid, exponential loss of ancestral genes to strict conservation through purifying selection.
doi:10.1038/nature10843
PMCID: PMC3292678  PMID: 22367542
25.  Complete genome sequence of Enterococcus faecium strain TX16 and comparative genomic analysis of Enterococcus faecium genomes 
BMC Microbiology  2012;12:135.
Background
Enterococci are among the leading causes of hospital-acquired infections in the United States and Europe, with Enterococcus faecalis and Enterococcus faecium being the two most common species isolated from enterococcal infections. In the last decade, the proportion of enterococcal infections caused by E. faecium has steadily increased compared to other Enterococcus species. Although the underlying mechanism for the gradual replacement of E. faecalis by E. faecium in the hospital environment is not yet understood, many studies using genotyping and phylogenetic analysis have shown the emergence of a globally dispersed polyclonal subcluster of E. faecium strains in clinical environments. Systematic study of the molecular epidemiology and pathogenesis of E. faecium has been hindered by the lack of closed, complete E. faecium genomes that can be used as references.
Results
In this study, we report the complete genome sequence of the E. faecium strain TX16, also known as DO, which belongs to multilocus sequence type (ST) 18, and was the first E. faecium strain ever sequenced. Whole genome comparison of the TX16 genome with 21 E. faecium draft genomes confirmed that most clinical, outbreak, and hospital-associated (HA) strains (including STs 16, 17, 18, and 78), in addition to strains of non-hospital origin, group in the same clade (referred to as the HA clade) and, by phylogenetic and gene content similarity analyses, are considerably more closely related evolutionarily to each other than to isolates in the community-associated (CA) clade, with approximately a 3–4% average nucleotide sequence difference between the two clades at the core genome level. Our study also revealed that many genomic loci in the TX16 genome are unique to the HA clade. A total of 380 ORFs in TX16 are HA-clade specific, and antibiotic resistance genes are enriched in HA-clade strains. Mobile elements such as IS16 and transposons were also found almost exclusively in HA strains, as previously reported.
Conclusions
Our findings, along with other studies, show that HA clonal lineages harbor specific genetic elements as well as sequence differences in the core genome which may confer selection advantages over the more heterogeneous CA E. faecium isolates. Which of these differences are important for the success of specific E. faecium lineages in the hospital environment remains to be determined.
doi:10.1186/1471-2180-12-135
PMCID: PMC3433357  PMID: 22769602
