1.  The Common Marmoset Genome Provides Insight into Primate Biology and Evolution 
Worley, Kim C. | Warren, Wesley C. | Rogers, Jeffrey | Locke, Devin | Muzny, Donna M. | Mardis, Elaine R. | Weinstock, George M. | Tardif, Suzette D. | Aagaard, Kjersti M. | Archidiacono, Nicoletta | Rayan, Nirmala Arul | Batzer, Mark A. | Beal, Kathryn | Brejova, Brona | Capozzi, Oronzo | Capuano, Saverio B. | Casola, Claudio | Chandrabose, Mimi M. | Cree, Andrew | Dao, Marvin Diep | de Jong, Pieter J. | del Rosario, Ricardo Cruz-Herrera | Delehaunty, Kim D. | Dinh, Huyen H. | Eichler, Evan | Fitzgerald, Stephen | Flicek, Paul | Fontenot, Catherine C. | Fowler, R. Gerald | Fronick, Catrina | Fulton, Lucinda A. | Fulton, Robert S. | Gabisi, Ramatu Ayiesha | Gerlach, Daniel | Graves, Tina A. | Gunaratne, Preethi H. | Hahn, Matthew W. | Haig, David | Han, Yi | Harris, R. Alan | Herrero, Javier M. | Hillier, LaDeana W. | Hubley, Robert | Hughes, Jennifer F. | Hume, Jennifer | Jhangiani, Shalini N. | Jorde, Lynn B. | Joshi, Vandita | Karakor, Emre | Konkel, Miriam K. | Kosiol, Carolin | Kovar, Christie L. | Kriventseva, Evgenia V. | Lee, Sandra L. | Lewis, Lora R. | Liu, Yih-shin | Lopez, John | Lopez-Otin, Carlos | Lorente-Galdos, Belen | Mansfield, Keith G. | Marques-Bonet, Tomas | Minx, Patrick | Misceo, Doriana | Moncrieff, J. Scott | Morgan, Margaret B. | Muthuswamy, Raveendran | Nazareth, Lynne V. | Newsham, Irene | Nguyen, Ngoc Bich | Okwuonu, Geoffrey O. | Prabhakar, Shyam | Perales, Lora | Pu, Ling-Ling | Puente, Xose S. | Quesada, Victor | Ranck, Megan C. | Raney, Brian J. | Deiros, David Rio | Rocchi, Mariano | Rodriguez, David | Ross, Corinna | Ruffier, Magali | Ruiz, San Juana | Sajjadian, S. | Santibanez, Jireh | Schrider, Daniel R. | Searle, Steve | Skaletsky, Helen | Soibam, Benjamin | Smit, Arian F. A. | Tennakoon, Jayantha B. | Tomaska, Lubomir | Ullmer, Brygg | Vejnar, Charles E. | Ventura, Mario | Vilella, Albert J. | Vinar, Tomas | Vogel, Jan-Hinnerk | Walker, Jerilyn A. | Wang, Qing | Warner, Crystal M. | Wildman, Derek E. | Witherspoon, David J. 
| Wright, Rita A. | Wu, Yuanqing | Xiao, Weimin | Xing, Jinchuan | Zdobnov, Evgeny M. | Zhu, Baoli | Gibbs, Richard A. | Wilson, Richard K.
Nature genetics  2014;46(8):850-857.
A first analysis of the genome sequence of the common marmoset (Callithrix jacchus), assembled using traditional Sanger methods and Ensembl annotation, has permitted genomic comparison with apes and Old World monkeys and the identification of specific molecular features that may contribute to the unique biology of this diminutive primate. The common marmoset has a rapid reproductive capacity partly due to the prevalence of dizygotic twins. Remarkably, these twins share placental circulation and exchange hematopoietic stem cells in utero, resulting in adults that are hematopoietic chimeras.
We observed positive selection or non-synonymous substitutions for genes encoding growth hormone / insulin-like growth factor (growth pathways), respiratory complex I (metabolic pathways), immunobiology, and proteases (reproductive and immunity pathways). In addition, both protein-coding and microRNA genes related to reproduction exhibit rapid sequence evolution. This New World monkey genome sequence enables significantly increased power for comparative analyses among available primate genomes and facilitates biomedical research application.
doi:10.1038/ng.3042
PMCID: PMC4138798  PMID: 25038751
2.  Genome Sequence of Enterovirus D68 from St. Louis, Missouri, USA 
Emerging Infectious Diseases  2015;21(1):184-186.
doi:10.3201/eid2101.141605
PMCID: PMC4285240  PMID: 25532062
enterovirus D68; enterovirus; genomic; outbreak; respiratory; asthma; pediatric; viruses; St. Louis
3.  Using VarScan 2 for Germline Variant Calling and Somatic Mutation Detection 
The identification of small sequence variants remains a challenging but critical step in the analysis of next-generation sequencing data. Our variant calling tool, VarScan 2, employs heuristic and statistic thresholds based on user-defined criteria to call variants using SAMtools mpileup data as input. Here, we provide guidelines for generating that input, and describe protocols for using VarScan 2 to (1) identify germline variants in individual samples; (2) call somatic mutations, copy number alterations, and LOH events in tumor-normal pairs; and (3) identify germline variants, de novo mutations, and Mendelian inheritance errors in family trios. Further, we describe a strategy for variant filtering that removes likely false positives associated with common sequencing- and alignment-related artifacts.
doi:10.1002/0471250953.bi1504s44
PMCID: PMC4278659  PMID: 25553206
variant calling; mutation detection; trio calling; snvs; indels; varscan 2; next-generation sequencing
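The VarScan 2 abstract above describes calling variants by applying user-defined heuristic thresholds to read counts. The following Python sketch illustrates that general idea only; it is not VarScan 2's code, and the `PileupSite` type, the `passes_thresholds` function, and the threshold values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PileupSite:
    """Read counts at one position (hypothetical structure for illustration)."""
    chrom: str
    pos: int
    ref_reads: int  # reads supporting the reference allele
    var_reads: int  # reads supporting the variant allele


def passes_thresholds(site, min_coverage=8, min_var_reads=2, min_var_freq=0.20):
    """Return True if the site meets all user-defined thresholds.

    The default values are illustrative assumptions, not documented
    VarScan 2 defaults.
    """
    depth = site.ref_reads + site.var_reads
    if depth < min_coverage:
        return False
    if site.var_reads < min_var_reads:
        return False
    return site.var_reads / depth >= min_var_freq


sites = [
    PileupSite("chr1", 1000, ref_reads=30, var_reads=12),  # well supported
    PileupSite("chr1", 2000, ref_reads=5, var_reads=1),    # too shallow
    PileupSite("chr2", 500, ref_reads=95, var_reads=5),    # low variant frequency
]
called = [s for s in sites if passes_thresholds(s)]
print([(s.chrom, s.pos) for s in called])
```

Only the first site passes here; the second fails on coverage and the third on variant allele frequency. A real caller would also weigh base quality, strand bias, and a statistical test, which this sketch omits.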
4.  Targeting Oxidative Stress in Embryonal Rhabdomyosarcoma 
Cancer cell  2013;24(6):710-724.
SUMMARY
Rhabdomyosarcoma is a soft-tissue sarcoma with molecular and cellular features of developing skeletal muscle. Rhabdomyosarcoma has two major histological subtypes, embryonal and alveolar, each with distinct clinical, molecular, and genetic features. Genomic analyses show that embryonal tumors have more structural and copy number variations than alveolar tumors. Mutations in the RAS/NF1 pathway are significantly associated with intermediate- and high-risk embryonal rhabdomyosarcomas (ERMS). In contrast, alveolar rhabdomyosarcomas (ARMS) have fewer genetic lesions overall and no known recurrently mutated cancer consensus genes. To identify therapeutics for ERMS, we developed and characterized orthotopic xenografts of tumors that were sequenced in our study. High-throughput screening of primary cultures derived from those xenografts identified oxidative stress as a pathway of therapeutic relevance for ERMS.
doi:10.1016/j.ccr.2013.11.002
PMCID: PMC3904731  PMID: 24332040
5.  The DNA Double-Strand Break Response Is Abnormal in Myeloblasts From Patients With Therapy-Related Acute Myeloid Leukemia 
Leukemia  2013;28(6):1242-1251.
The complex chromosomal aberrations found in therapy-related acute myeloid leukemia (t-AML) suggest that the DNA double-strand break (DSB) response may be altered. In this study we examined the DNA DSB response of primary bone marrow cells from t-AML patients and performed next-generation sequencing of 37 canonical homologous recombination (HR) and non-homologous end-joining (NHEJ) DNA repair genes, and a subset of DNA damage response genes, using tumor and paired normal DNA obtained from t-AML patients. Our results suggest that the majority of t-AML patients (11 of 15) have tumor cell-intrinsic, functional dysregulation of their DSB response. Distinct patterns of abnormal DNA damage response in myeloblasts correlated with acquired genetic alterations in TP53 and the presence of inferred chromothripsis. Furthermore, the presence of trisomy 8 in tumor cells was associated with persistently elevated levels of DSBs. Although tumor-acquired point mutations or small indels in canonical HR and NHEJ genes do not appear to be a dominant means by which t-AML leukemogenesis occurs, our functional studies suggest that an abnormal response to DNA damage is a common finding in t-AML.
doi:10.1038/leu.2013.368
PMCID: PMC4047198  PMID: 24304937
therapy-related AML; DNA damage; DNA repair; Trisomy 8
6.  Clonal Architectures and Driver Mutations in Metastatic Melanomas 
PLoS ONE  2014;9(11):e111153.
To reveal the clonal architecture of melanoma and associated driver mutations, whole genome sequencing (WGS) and targeted extension sequencing were used to characterize 124 melanoma cases. Significantly mutated gene analysis using 13 WGS cases and 15 additional paired extension cases identified known melanoma genes such as BRAF, NRAS, and CDKN2A, as well as a novel gene, EPHA3, previously implicated in other cancer types. Extension studies using tumors from another 96 patients discovered a large number of truncation mutations in tumor suppressors (TP53 and RB1), protein phosphatases (e.g., PTEN, PTPRB, PTPRD, and PTPRT), as well as chromatin remodeling genes (e.g., ASXL3, MLL2, and ARID2). Deep sequencing of mutations revealed subclones in the majority of metastatic tumors from 13 WGS cases. Validated mutations from 12 out of 13 WGS patients exhibited a predominant UV signature characterized by a high frequency of C->T transitions occurring at the 3′ base of dipyrimidine sequences, while one patient (MEL9) with a hypermutator phenotype lacked this signature. Strikingly, a subclonal mutation signature analysis revealed that the founding clone in MEL9 exhibited a UV signature but the secondary clone did not, suggesting different mutational mechanisms for two clonal populations from the same tumor. Further analysis of four metastases from different geographic locations in 2 melanoma cases revealed phylogenetic relationships and highlighted the genetic alterations responsible for differential drug resistance among metastatic tumors. Our study suggests that clonal evaluation is crucial for understanding tumor etiology and drug resistance in melanoma.
doi:10.1371/journal.pone.0111153
PMCID: PMC4230926  PMID: 25393105
7.  The genomic landscape of diffuse intrinsic pontine glioma and pediatric non-brainstem high-grade glioma 
Nature genetics  2014;46(5):444-450.
Pediatric high-grade glioma (HGG) is a devastating disease with a two-year survival of less than 20% (ref. 1). We analyzed 127 pediatric HGGs, including diffuse intrinsic pontine gliomas (DIPGs) and non-brainstem HGGs (NBS-HGGs) by whole genome, whole exome, and/or transcriptome sequencing. We identified recurrent somatic mutations in ACVR1 exclusively in DIPG (32%), in addition to the previously reported frequent somatic mutations in histone H3, TP53 and ATRX in both DIPG and NBS-HGGs (refs. 2-5). Structural variants generating fusion genes were found in 47% of DIPGs and NBS-HGGs, with recurrent fusions involving the neurotrophin receptor genes NTRK1, 2, or 3 in 40% of NBS-HGGs in infants. Mutations targeting receptor tyrosine kinase/RAS/PI3K signaling, histone modification or chromatin remodeling, and cell cycle regulation were found in 68%, 73% and 59%, respectively, of pediatric HGGs, including DIPGs and NBS-HGGs. This comprehensive analysis provides insights into the unique and shared pathways driving pediatric HGG within and outside the brainstem.
doi:10.1038/ng.2938
PMCID: PMC4056452  PMID: 24705251
8.  Mammalian Y chromosomes retain widely expressed dosage-sensitive regulators 
Nature  2014;508(7497):494-499.
The human X and Y chromosomes evolved from an ordinary pair of autosomes, but millions of years ago genetic decay ravaged the Y chromosome, and only three percent of its ancestral genes survived. We reconstructed the evolution of the Y chromosome across eight mammals to identify biases in gene content and the selective pressures that preserved the surviving ancestral genes. Our findings indicate that survival was non-random, and in two cases, convergent across placental and marsupial mammals. We conclude that the Y chromosome's gene content became specialized through selection to maintain the ancestral dosage of homologous X-Y gene pairs that function as broadly expressed regulators of transcription, translation and protein stability. We propose that beyond its roles in testis determination and spermatogenesis, the Y chromosome is essential for male viability, and plays unappreciated roles in Turner syndrome and in phenotypic differences between the sexes in health and disease.
doi:10.1038/nature13206
PMCID: PMC4139287  PMID: 24759411
9.  Genome of the house fly, Musca domestica L., a global vector of diseases with adaptations to a septic environment 
Genome Biology  2014;15(10):466.
Background
Adult house flies, Musca domestica L., are mechanical vectors of more than 100 devastating diseases that have severe consequences for human and animal health. House fly larvae play a vital role as decomposers of animal wastes, and thus live in intimate association with many animal pathogens.
Results
We have sequenced and analyzed the genome of the house fly using DNA from female flies. The sequenced genome is 691 Mb. Compared with Drosophila melanogaster, the genome contains a rich resource of shared and novel protein coding genes, a significantly higher amount of repetitive elements, and substantial increases in copy number and diversity of both the recognition and effector components of the immune system, consistent with life in a pathogen-rich environment. There are 146 P450 genes, plus 11 pseudogenes, in M. domestica, representing a significant increase relative to D. melanogaster and suggesting the presence of enhanced detoxification in house flies. Relative to D. melanogaster, M. domestica has also evolved an expanded repertoire of chemoreceptors and odorant binding proteins, many associated with gustation.
Conclusions
This represents the first genome sequence of an insect that lives in intimate association with abundant animal pathogens. The house fly genome provides a rich resource for enabling work on innovative methods of insect control, for understanding the mechanisms of insecticide resistance, genetic adaptation to high pathogen loads, and for exploring the basic biology of this important pest. The genome of this species will also serve as a close out-group to Drosophila in comparative genomic studies.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0466-3) contains supplementary material, which is available to authorized users.
doi:10.1186/s13059-014-0466-3
PMCID: PMC4195910  PMID: 25315136
10.  The Next-Generation Sequencing Revolution and Its Impact on Genomics 
Cell  2013;155(1):27-38.
Genomics is a relatively new scientific discipline, having DNA sequencing as its core technology. As technology has improved the cost and scale of genome characterization over sequencing’s 40-year history, the scope of inquiry has commensurately broadened. Massively parallel sequencing has proven revolutionary, shifting the paradigm of genomics to address biological questions at a genome-wide scale. Sequencing now empowers clinical diagnostics and other aspects of medical care, including disease risk, therapeutic identification, and prenatal testing. This Review explores the current state of genomics in the massively parallel sequencing era.
doi:10.1016/j.cell.2013.09.006
PMCID: PMC3969849  PMID: 24074859
11.  C11orf95-RELA fusions drive oncogenic NF-κB signaling in ependymoma 
Nature  2014;506(7489):451-455.
The nuclear factor-κB (NF-κB) family of transcriptional regulators are central mediators of the cellular inflammatory response. Although constitutive NF-κB signaling is present in most human tumours, mutations in pathway members are rare, complicating efforts to understand and block aberrant NF-κB activity in cancer. Here, we show that more than two thirds of supratentorial ependymomas contain oncogenic fusions between RELA, the principal effector of canonical NF-κB signalling, and an uncharacterized gene, C11orf95. In each case, C11orf95-RELA fusions resulted from chromothripsis involving chromosome 11q13.1. C11orf95-RELA fusion proteins translocated spontaneously to the nucleus to activate NF-κB target genes, and rapidly transformed neural stem cells—the cell of origin of ependymoma—to form these tumours in mice. Our data identify the first highly recurrent genetic alteration of RELA in human cancer, and the C11orf95-RELA fusion protein as a potential therapeutic target in supratentorial ependymoma.
doi:10.1038/nature13109
PMCID: PMC4050669  PMID: 24553141
12.  SciClone: Inferring Clonal Architecture and Tracking the Spatial and Temporal Patterns of Tumor Evolution 
PLoS Computational Biology  2014;10(8):e1003665.
The sensitivity of massively-parallel sequencing has confirmed that most cancers are oligoclonal, with subpopulations of neoplastic cells harboring distinct mutations. A fine resolution view of this clonal architecture provides insight into tumor heterogeneity, evolution, and treatment response, all of which may have clinical implications. Single tumor analysis already contributes to understanding these phenomena. However, cryptic subclones are frequently revealed by additional patient samples (e.g., collected at relapse or following treatment), indicating that accurately characterizing a tumor requires analyzing multiple samples from the same patient. To address this need, we present SciClone, a computational method that identifies the number and genetic composition of subclones by analyzing the variant allele frequencies of somatic mutations. We use it to detect subclones in acute myeloid leukemia and breast cancer samples that, though present at disease onset, are not evident from a single primary tumor sample. By doing so, we can track tumor evolution and identify the spatial origins of cells resisting therapy.
Author Summary
Sequencing the genomic DNA of cancers has revealed that tumors are not homogeneous. As a tumor grows, new mutations accumulate in individual cells, and as these cells replicate, the mutations are passed on to their offspring, which comprise only a portion of the tumor when it is sampled. We present a method for identifying the fraction of cells containing specific mutations, clustering them into subclonal populations, and tracking the changes in these subclones. This allows us to follow the clonal evolution of cancers as they respond to chemotherapy or develop therapy resistance, processes which may radically alter the subclonal composition of a tumor. It also gives us insight into the spatial organization of tumors, and we show that multiple biopsies from a single breast cancer may harbor different subclones that respond differently to treatment. Finally, we show that sequencing multiple samples from a patient's tumor is often critical, as it reveals cryptic subclones that cannot be discerned from only one sample. This is the first tool that can efficiently leverage multiple samples to identify these as distinct subpopulations of cells, thus contributing to understanding the biology of the tumor and influencing clinical decisions about therapy.
doi:10.1371/journal.pcbi.1003665
PMCID: PMC4125065  PMID: 25102416
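The SciClone abstract above rests on variant allele frequencies (VAFs) of somatic mutations. The Python sketch below shows how VAFs might be computed from read counts and grouped into candidate subclones. The gap-based grouping is a deliberately simple stand-in chosen for illustration and is not SciClone's method; the function names, the `max_gap` value, and the read counts are all hypothetical.

```python
def vaf(ref_reads, var_reads):
    """Variant allele frequency: fraction of reads carrying the variant."""
    return var_reads / (ref_reads + var_reads)


def group_by_vaf(vafs, max_gap=0.05):
    """Group VAFs in sorted order, starting a new group whenever the gap
    to the previous value exceeds max_gap (an illustrative heuristic)."""
    groups = []
    for v in sorted(vafs):
        if groups and v - groups[-1][-1] <= max_gap:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


# Hypothetical (ref_reads, var_reads) counts for six somatic mutations.
counts = [(50, 50), (52, 48), (55, 45), (80, 20), (78, 22), (82, 18)]
vafs = [vaf(r, v) for r, v in counts]
clusters = group_by_vaf(vafs)

# Assuming heterozygous mutations in copy-number-neutral regions, the
# fraction of cells carrying a mutation is roughly twice its VAF.
cell_fractions = [min(1.0, 2 * sum(g) / len(g)) for g in clusters]
for group, fraction in zip(clusters, cell_fractions):
    print(f"{len(group)} mutations, estimated cell fraction {fraction:.2f}")
```

With these made-up counts the mutations fall into two groups, one near a VAF of 0.2 and one near 0.48. The doubling step holds only under the stated heterozygous, copy-neutral assumption.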
13.  The landscape of somatic mutations in epigenetic regulators across 1000 pediatric cancer genomes 
Nature communications  2014;5:3630.
Here we sequence 633 genes, encoding the majority of known epigenetic regulatory proteins, in over 1000 pediatric tumors to define the landscape of somatic mutations in epigenetic regulators in pediatric cancer. Our results demonstrate a marked variation in the frequency of gene mutations across 21 different pediatric cancer subtypes, with the highest frequency of mutations detected in high-grade gliomas, T-lineage acute lymphoblastic leukemia, medulloblastoma, and a paucity of mutations in low-grade glioma, and retinoblastoma. The most frequently mutated genes are H3F3A, PHF6, ATRX, KDM6A, SMARCA4, ASXL2, CREBBP, EZH2, MLL2, USP7, ASXL1, NSD2, SETD2, SMC1A, and ZMYM3. Importantly, we identify novel loss-of-function mutations in the ubiquitin-specific-processing protease 7 (USP7) in pediatric leukemia, which result in a decrease in deubiquitination activity. Collectively, our results help to define the landscape of mutations in epigenetic regulatory genes in pediatric cancer and yield a valuable new database for investigating the role of epigenetic dysregulations in cancer.
doi:10.1038/ncomms4630
PMCID: PMC4119022  PMID: 24710217
14.  Integrated Analysis of Germline and Somatic Variants in Ovarian Cancer 
Nature communications  2014;5:3156.
We report the first large-scale exome-wide analysis of the combined germline-somatic landscape in ovarian cancer. Here we analyze germline and somatic alterations in 429 ovarian carcinoma cases and 557 controls. We identify 3,635 high confidence, rare truncation and 22,953 missense variants with predicted functional impact. We find germline truncation variants and large deletions across Fanconi pathway genes in 20% of cases. Enrichment of rare truncations is shown in BRCA1, BRCA2, and PALB2. Additionally, we observe germline truncation variants in genes not previously associated with ovarian cancer susceptibility (NF1, MAP3K4, CDKN2B, and MLL3). Evidence for loss of heterozygosity was found in 100% and 76% of cases with germline BRCA1 and BRCA2 truncations respectively. Germline-somatic interaction analysis combined with extensive bioinformatics annotation identifies 237 candidate functional germline truncation and missense variants, including 2 pathogenic BRCA1 and 1 TP53 deleterious variants. Finally, integrated analyses of germline and somatic variants identify significantly altered pathways, including the Fanconi, MAPK, and MLL pathways.
doi:10.1038/ncomms4156
PMCID: PMC4025965  PMID: 24448499
15.  Recurrent Somatic Structural Variations Contribute to Tumorigenesis in Pediatric Osteosarcoma 
Cell reports  2014;7(1):104-112.
Osteosarcoma is a neoplasm of mesenchymal origin with features of osteogenic differentiation. Patients with recurrent or metastatic disease have a very poor prognosis. To define the landscape of somatic mutations in pediatric osteosarcoma, we performed whole-genome sequencing of DNA from 20 osteosarcoma tumor samples and matched normal tissue (obtained from 19 patients) in the discovery cohort as well as 14 samples from 13 patients in the validation cohort. Our results demonstrate that pediatric osteosarcoma is characterized by multiple somatic chromosomal lesions, including structural variations (SVs) and copy number alterations (CNAs). Moreover, single nucleotide variations (SNVs) exhibit a pattern of localized hypermutation called “kataegis” in 50% of the tumors. Despite these regions of kataegis across the osteosarcoma genomes, we detected relatively few recurrent SNVs, and only when SVs were included did we identify the major pathways that are mutated in osteosarcoma. We identified p53 pathway lesions in all 19 patients’ tumors in the discovery cohort, 9 of which were translocations in the first intron of the TP53 gene, leading to gene inactivation. This mechanism of p53 gene inactivation is unique to osteosarcoma among pediatric cancers. In an additional cohort of 32 patients, TP53 gene alterations were identified in 29 of those tumors. Beyond TP53, the RB1, ATRX and DLG2 genes showed recurrent somatic alterations (SNVs and/or SVs) in 29–53% of the tumors. These data highlight the power of whole-genome sequencing in identifying recurrent somatic alterations in cancer genomes that may be missed using other methods.
doi:10.1016/j.celrep.2014.03.003
PMCID: PMC4096827  PMID: 24703847
16.  Clonal Architecture of Secondary Acute Myeloid Leukemia Defined by Single-Cell Sequencing 
PLoS Genetics  2014;10(7):e1004462.
Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions—the population frequency of individual clones, their genetic composition, and their evolutionary relationships—which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity. Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.
Author Summary
Human cancers are genetically diverse populations of cells that evolve over the course of their natural history or in response to the selective pressure of therapy. In theory, it is possible to infer how this variation is structured into related populations of cells based on the frequency of individual mutations in bulk samples, but the accuracy of these models has not been evaluated across a large number of variants in individual cells. Here, we report a strategy for analyzing hundreds of variants within a single cell, and we apply this method to assess models of tumor clonality derived from bulk samples in three cases of leukemia. The data largely support the predicted population structure, though they suggest specific refinements. This type of approach not only illustrates the biological complexity of human cancer, but it also has the potential to inform patient management. That is, precise knowledge of which variants are present in which populations of cells may allow physicians to more effectively target combinations of mutations and predict how patients will respond to therapy.
doi:10.1371/journal.pgen.1004462
PMCID: PMC4091781  PMID: 25010716
17.  Elephant shark genome provides unique insights into gnathostome evolution 
Nature  2014;505(7482):174-179.
The emergence of jawed vertebrates (gnathostomes) from jawless vertebrates was accompanied by major morphological and physiological innovations, such as hinged jaws, paired fins and immunoglobulin-based adaptive immunity. Gnathostomes subsequently diverged into two groups, the cartilaginous fishes and the bony vertebrates. Here we report the whole-genome analysis of a cartilaginous fish, the elephant shark (Callorhinchus milii). We find that the C. milii genome is the slowest evolving of all known vertebrates, including the ‘living fossil’ coelacanth, and features extensive synteny conservation with tetrapod genomes, making it a good model for comparative analyses of gnathostome genomes. Our functional studies suggest that the lack of genes encoding secreted calcium-binding phosphoproteins in cartilaginous fishes explains the absence of bone in their endoskeleton. Furthermore, the adaptive immune system of cartilaginous fishes is unusual: it lacks the canonical CD4 co-receptor and most transcription factors, cytokines and cytokine receptors related to the CD4 lineage, despite the presence of polymorphic major histocompatibility complex class II molecules. It thus presents a new model for understanding the origin of adaptive immunity.
doi:10.1038/nature12826
PMCID: PMC3964593  PMID: 24402279
18.  Ancestry Estimation and Control of Population Stratification for Sequence-based Association Studies 
Nature genetics  2014;46(4):409-415.
Knowledge of individual ancestry is important for genetic association studies where population structure leads to false positive signals. Estimating individual ancestry with targeted sequence data, which constitutes the bulk of current sequence datasets, is challenging. Here, we propose a new method for accurate estimation of genetic ancestry. Our method skips genotype calling and directly analyzes sequence reads. We validate the method using simulated and empirical data and show that the method can accurately infer worldwide continental ancestry with whole genome shotgun coverage as low as 0.001X. For estimates of fine-scale ancestry within Europe, the method performs well with coverage of 0.1X. At an even finer scale, the method improves discrimination between exome-sequenced participants originating from different provinces within Finland. Finally, we show that our method can be used to improve case-control matching in genetic association studies and reduce the risk of spurious findings due to population structure.
doi:10.1038/ng.2924
PMCID: PMC4084909  PMID: 24633160
19.  Draft Genome Sequence of Acetobacter aceti Strain 1023, a Vinegar Factory Isolate 
Genome Announcements  2014;2(3):e00550-14.
The genome sequence of Acetobacter aceti 1023, an acetic acid bacterium adapted to traditional vinegar fermentation, comprises 3.0 Mb (chromosome plus plasmids). A. aceti 1023 is closely related to the cocoa fermenter Acetobacter pasteurianus 386B but possesses many additional insertion sequence elements.
doi:10.1128/genomeA.00550-14
PMCID: PMC4047455  PMID: 24903876
20.  DGIdb - Mining the druggable genome 
Nature methods  2013;10(12):10.1038/nmeth.2689.
The Drug-Gene Interaction database (DGIdb) mines existing resources that generate hypotheses about how mutated genes might be targeted therapeutically or prioritized for drug development. It provides an interface for searching lists of genes against a compendium of drug-gene interactions and potentially druggable genes. DGIdb can be accessed at dgidb.org.
doi:10.1038/nmeth.2689
PMCID: PMC3851581  PMID: 24122041
21.  Finished sequence and assembly of the DUF1220-rich 1q21 region using a haploid human genome 
BMC Genomics  2014;15(1):387.
Background
Although the reference human genome sequence was declared finished in 2003, some regions of the genome remain incomplete due to their complex architecture. One such region, 1q21.1-q21.2, is of increasing interest due to its relevance to human disease and evolution. Elucidation of the exact variants behind these associations has been hampered by the repetitive nature of the region and its incomplete assembly. This region also contains 238 of the 270 human DUF1220 protein domains, which are implicated in human brain evolution and neurodevelopment. Additionally, examinations of this protein domain have been challenging due to the incomplete 1q21 build. To address these problems, a single-haplotype hydatidiform mole BAC library (CHORI-17) was used to produce the first complete sequence of the 1q21.1-q21.2 region.
Results
We found and addressed several inaccuracies in the GRCh37 sequence of the 1q21 region on large and small scales, including genomic rearrangements and inversions, and incorrect gene copy number estimates and assemblies. The DUF1220-encoding NBPF genes required the most corrections, with 3 genes removed, 2 genes reassigned to the 1p11.2 region, 8 genes requiring assembly corrections for DUF1220 domains (~91 DUF1220 domains were misassigned), and multiple instances of nucleotide changes that reassigned the domain to a different DUF1220 subtype. These corrections resulted in an overall increase in DUF1220 copy number, yielding a haploid total of 289 copies. Approximately 20 of these new DUF1220 copies were the result of a segmental duplication from 1q21.2 to 1p11.2 that included two NBPF genes. Interestingly, this duplication may have been the catalyst for the evolutionarily important human lineage-specific chromosome 1 pericentric inversion.
Conclusions
Through the hydatidiform mole genome sequencing effort, the 1q21.1-q21.2 region is complete and misassemblies involving inter- and intra-region duplications have been resolved. The availability of this single haploid sequence path will aid in the investigation of many genetic diseases linked to 1q21, including several associated with DUF1220 copy number variations. Finally, the corrected sequence identified a recent segmental duplication that added 20 additional DUF1220 copies to the human genome, and may have facilitated the chromosome 1 pericentric inversion that is among the most notable human-specific genomic landmarks.
doi:10.1186/1471-2164-15-387
PMCID: PMC4053653  PMID: 24885025
1q21; DUF1220 domain; Hydatidiform mole
22.  Identification of a Rare Coding Variant in Complement 3 Associated with Age-related Macular Degeneration 
Nature genetics  2013;45(11):10.1038/ng.2758.
Macular degeneration is a common cause of blindness in the elderly. To identify rare coding variants associated with a large increase in risk of age-related macular degeneration (AMD), we sequenced 2,335 cases and 789 controls in 10 candidate loci (57 genes). To increase power, we augmented our control set with ancestry-matched exome-sequenced controls. An analysis of coding variation in 2,268 AMD cases and 2,268 ancestry-matched controls revealed two large-effect rare variants: the previously described R1210C in the CFH gene (f_case = 0.51%, f_control = 0.02%, OR = 23.11) and the newly identified K155Q in the C3 gene (f_case = 1.06%, f_control = 0.39%, OR = 2.68). The variants suggest decreased inhibition of C3 by Factor H, resulting in increased activation of the alternative complement pathway, as a key component of disease biology.
doi:10.1038/ng.2758
PMCID: PMC3812337  PMID: 24036949
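As a rough illustration of how the odds ratios quoted in this abstract relate to the case and control frequencies, the sketch below computes a crude odds ratio directly from the rounded percentages. The function name `odds_ratio` and the use of a simple unadjusted odds formula are assumptions made here for illustration; the abstract does not state how the reported values were estimated, and they were presumably derived from the underlying counts rather than from rounded frequencies.

```python
def odds_ratio(f_case: float, f_control: float) -> float:
    """Crude (unadjusted) odds ratio from two frequencies given as fractions.

    Hypothetical helper for illustration only; not the study's method.
    """
    if not (0 < f_case < 1 and 0 < f_control < 1):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    return (f_case / (1 - f_case)) / (f_control / (1 - f_control))


# Rounded frequencies quoted in the abstract, converted from percent.
variants = {
    "CFH R1210C": (0.0051, 0.0002),
    "C3 K155Q": (0.0106, 0.0039),
}

for name, (f_case, f_control) in variants.items():
    print(f"{name}: crude OR = {odds_ratio(f_case, f_control):.2f}")
```

With these rounded inputs the crude values come out near 25.6 and 2.7, the same order of magnitude as the reported 23.11 and 2.68 but not identical, which is expected when working from percentages rounded to two decimal places.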
23.  A Novel Retinoblastoma Therapy from Genomic and Epigenetic Analyses 
Nature  2012;481(7381):329-334.
SUMMARY
Retinoblastoma is an aggressive childhood cancer of the developing retina that is initiated by the biallelic loss of the RB1 gene. To identify the mutations that cooperate with RB1 loss, we performed whole-genome sequencing of retinoblastomas. The overall mutational rate was very low; RB1 was the only known cancer gene mutated. We then evaluated RB1’s role in genome stability and considered nongenetic mechanisms of cancer pathway deregulation. Here we show that the retinoblastoma genome is stable, but multiple cancer pathways can be epigenetically deregulated. For example, the proto-oncogene SYK is upregulated in retinoblastoma and is required for tumor cell survival. Targeting SYK with a small-molecule inhibitor induced retinoblastoma tumor cell death in vitro and in vivo. Thus, RB1 inactivation may allow preneoplastic cells to acquire multiple hallmarks of cancer through epigenetic mechanisms, resulting directly or indirectly from RB1 loss. These data provide novel targets for chemotherapeutic interventions of retinoblastoma.
doi:10.1038/nature10733
PMCID: PMC3289956  PMID: 22237022
24.  The characterization of the Phlebotomus papatasi transcriptome 
Insect molecular biology  2013;22(2):211-232.
As important vectors of human disease, phlebotomine sand flies are of global significance to human health, transmitting several emerging and re-emerging infectious diseases. The most devastating of the sand fly-transmitted infections are the leishmaniases, causing significant mortality and morbidity in both the Old and New World. Here we present the first global transcriptome analysis of the Old World vector of cutaneous leishmaniasis, Phlebotomus papatasi (Scopoli), and compare this transcriptome to that of the New World vector of visceral leishmaniasis, Lutzomyia longipalpis. A normalized cDNA library was constructed using pooled mRNA from Phlebotomus papatasi larvae, pupae, sugar-fed adult males and females, blood-fed adult females, and adult females fed blood infected with Leishmania major. A total of 47,615 generated sequences were cleaned and assembled into 17,120 unique transcripts. Of the assembled sequences, 50% (8,837 sequences) were classified using Gene Ontology (GO) terms. This collection of transcripts is comprehensive, as demonstrated by the high number of different GO categories. An in-depth analysis has revealed 245 sequences with putative homology to proteins involved in blood and sugar digestion, immune response and peritrophic matrix formation. Twelve of the novel genes, including one trypsin, two peptidoglycan recognition proteins (PGRP) and nine chymotrypsins, have a higher expression level during larval stages. Two novel chymotrypsins and one novel PGRP are abundantly expressed upon blood feeding. This study will greatly improve the available genomic resources for Ph. papatasi and will provide essential information for annotation of the full genome.
doi:10.1111/imb.12015
PMCID: PMC3594503  PMID: 23398403
25.  The western painted turtle genome, a model for the evolution of extreme physiological adaptations in a slowly evolving lineage 
Genome Biology  2013;14(3):R28.
Background
We describe the genome of the western painted turtle, Chrysemys picta bellii, one of the most widespread, abundant, and well-studied turtles. We place the genome into a comparative evolutionary context, and focus on genomic features associated with tooth loss, immune function, longevity, sex differentiation and determination, and the species' physiological capacities to withstand extreme anoxia and tissue freezing.
Results
Our phylogenetic analyses confirm that turtles are the sister group to living archosaurs, and demonstrate an extraordinarily slow rate of sequence evolution in the painted turtle. The ability of the painted turtle to withstand complete anoxia and partial freezing appears to be associated with common vertebrate gene networks, and we identify candidate genes for future functional analyses. Tooth loss shares a common pattern of pseudogenization and degradation of tooth-specific genes with birds, although the rate of accumulation of mutations is much slower in the painted turtle. Genes associated with sex differentiation generally reflect phylogeny rather than convergence in sex determination functionality. Among gene families that demonstrate exceptional expansions or show signatures of strong natural selection, immune function and musculoskeletal patterning genes are consistently over-represented.
Conclusions
Our comparative genomic analyses indicate that common vertebrate regulatory networks, some of which have analogs in human diseases, are often involved in the western painted turtle's extraordinary physiological capacities. As these regulatory pathways are analyzed at the functional level, the painted turtle may offer important insights into the management of a number of human health disorders.
doi:10.1186/gb-2013-14-3-r28
PMCID: PMC4054807  PMID: 23537068
Amniote phylogeny; anoxia tolerance; chelonian; freeze tolerance; genomics; longevity; phylogenomics; physiology; turtle; evolutionary rates
