We describe an approach for targeted genome resequencing, called oligonucleotide-selective sequencing (OS-Seq), in which we modify the immobilized lawn of oligonucleotide primers of a next-generation DNA sequencer to function as both a capture and sequencing substrate. We apply OS-Seq to resequence the exons of either 10 or 344 cancer genes from human DNA samples. In our assessment of capture performance, >87% of the captured sequence originated from the intended target region, with sequencing coverage falling within a tenfold range for the majority of targets. Single nucleotide variants (SNVs) called from OS-Seq data agreed with >95% of variants obtained from whole-genome sequencing of the same individual. We also demonstrate mutation discovery from a colorectal cancer tumor sample matched with normal tissue. Overall, we show the robust performance and utility of OS-Seq for the resequencing analysis of human germline and cancer genomes.
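The two capture metrics quoted above can be made concrete with a short Python sketch. It assumes per-target coverages and base counts have already been computed from the alignments, and it reads "within a tenfold range" as the largest share of targets whose coverages all fit between some value c and 10c; that interpretation and the function names are illustrative, not the authors' definitions.

```python
from bisect import bisect_right


def on_target_fraction(on_target_bases, total_bases):
    """Fraction of captured (aligned) bases that fall inside the target region."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return on_target_bases / total_bases


def max_fraction_in_tenfold_window(coverages):
    """Largest fraction of targets whose coverage fits in a single [c, 10c] window.

    Each positive coverage is tried as the window's lower bound; bisect_right
    counts how many sorted values are <= 10 times that bound. Zero-coverage
    targets never fit in such a window but still count in the denominator.
    """
    if not coverages:
        return 0.0
    values = sorted(c for c in coverages if c > 0)
    best = 0
    for i, low in enumerate(values):
        j = bisect_right(values, 10 * low)
        best = max(best, j - i)
    return best / len(coverages)
```

For hypothetical coverages of 1, 5, 10, 50 and 200, the best window is [1, 10], which holds three of the five targets (0.6).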
Human induced pluripotent stem cells (hiPSCs) [1–3] are useful in disease modeling and drug discovery, and they promise to provide a new generation of cell-based therapeutics. To date there has been no systematic evaluation of the most widely used techniques for generating integration-free hiPSCs. Here we compare Sendai-viral (SeV) [4], episomal (Epi) [5] and mRNA transfection (mRNA) [6] methods using a number of criteria. All methods generated high-quality hiPSCs, but significant differences existed in aneuploidy rates, reprogramming efficiency, reliability and workload. We discuss the advantages and shortcomings of each approach, and present and review the results of a survey of a large number of human reprogramming laboratories on their independent experiences and preferences. Our analysis provides a valuable resource to inform the use of specific reprogramming methods for different laboratories and different applications, including clinical translation.
The discovery of an efficient mechanism of homologous recombination between two linear DNA substrates enables direct cloning of large genomic sequences.
Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics.
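As a minimal illustration of covariation mining, the Python sketch below scores column pairs of a toy multiple sequence alignment by mutual information (MI) with the average product correction (APC). This is a simpler, local measure swapped in for illustration; the global statistical methods mentioned above go further by separating direct from indirect couplings. The alignment and function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    freq_i = Counter(seq[i] for seq in msa)
    freq_j = Counter(seq[j] for seq in msa)
    freq_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def apc_scores(msa):
    """MI for every column pair with the average product correction subtracted."""
    length = len(msa[0])
    pairs = list(combinations(range(length), 2))
    mi = {pair: column_mi(msa, *pair) for pair in pairs}
    # Mean MI of each column with all other columns, and mean over all pairs.
    col_mean = [
        sum(v for pair, v in mi.items() if k in pair) / (length - 1)
        for k in range(length)
    ]
    overall = sum(mi.values()) / len(mi)
    if overall == 0:
        return mi
    return {
        (a, b): v - col_mean[a] * col_mean[b] / overall
        for (a, b), v in mi.items()
    }
```

On real protein families the highest-scoring pairs would be candidate contacts; sequence weighting and pseudocounts, omitted here, matter in practice.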
Nanopore sequencing of DNA is a single-molecule technique that may achieve long reads, low cost and high speed with minimal sample preparation and instrumentation. Here, we build on recent progress with respect to nanopore resolution and DNA control to interpret the procession of ion current levels observed during the translocation of DNA through the pore MspA. As approximately four nucleotides affect the ion current of each level, we measured the ion current corresponding to all 256 four-nucleotide combinations (quadromers). This quadromer map is highly predictive of ion current levels of previously unmeasured sequences derived from the bacteriophage phi X 174 genome. Furthermore, we show nanopore sequencing reads of phi X 174 up to 4,500 bases in length that can be unambiguously aligned to the phi X 174 reference genome, and demonstrate proof-of-concept utility with respect to hybrid genome assembly and polymorphism detection. This work provides the foundation for nanopore sequencing of long, complex, natural DNA strands.
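How a quadromer map predicts a level sequence can be sketched in Python: slide a four-nucleotide window along the strand and look up one current value per window. The assumption that each step advances exactly one nucleotide in the order written, and the placeholder current values, are illustrative only; they are not the measured MspA map.

```python
from itertools import product

K = 4  # roughly four nucleotides influence each ion current level


def all_quadromers():
    """The 256 four-nucleotide combinations, in lexicographic ACGT order."""
    return ["".join(q) for q in product("ACGT", repeat=K)]


def predict_levels(sequence, quadromer_map):
    """Predicted current level for each successive 4-nt window of `sequence`."""
    seq = sequence.upper()
    return [quadromer_map[seq[i:i + K]] for i in range(len(seq) - K + 1)]


# Hypothetical placeholder values (not measured currents), for illustration only.
toy_map = {q: float(idx) for idx, q in enumerate(all_quadromers())}
print(predict_levels("AAAAC", toy_map))  # [0.0, 1.0]
```

With a measured map in place of `toy_map`, the predicted levels could be compared against an observed level series for a previously unmeasured sequence.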
Bacterial type II CRISPR-Cas9 systems have been widely adapted for RNA-guided genome editing and transcription regulation in eukaryotic cells, yet their in vivo target specificity is poorly understood. Here we mapped genome-wide binding sites of a catalytically inactive Cas9 (dCas9) from Streptococcus pyogenes loaded with single guide RNAs (sgRNAs) in mouse embryonic stem cells (mESCs). Each of the four sgRNAs tested targets dCas9 to tens to thousands of genomic sites, characterized by a 5-nucleotide seed region in the sgRNA, in addition to an NGG protospacer adjacent motif (PAM). Chromatin inaccessibility prevents dCas9 binding to other sites with matching seed sequences, and consequently 70% of off-target sites are associated with genes. Targeted sequencing of 295 dCas9 binding sites in mESCs transfected with catalytically active Cas9 identified only one site mutated above background. We propose a two-state model for Cas9 binding and cleavage, in which a seed match triggers binding but extensive pairing with target DNA is required for cleavage.
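A minimal Python sketch of the seed-plus-PAM rule scans both strands for the sgRNA seed immediately followed by NGG. It assumes the 5-nucleotide seed is the PAM-proximal (3′) end of a 20-nt spacer and ignores chromatin accessibility, which the abstract identifies as a major filter on real binding; the spacer and genome strings are made up.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def seed_pam_sites(genome, spacer, seed_len=5):
    """Sites where the seed (last `seed_len` nt of the spacer) is followed by NGG.

    Returns (strand, start) tuples; start is the 0-based plus-strand coordinate
    of the leftmost base of the seed+PAM match, whichever strand it lies on.
    """
    g = genome.upper()
    seed = spacer.upper()[-seed_len:]
    # Lookahead so that overlapping matches are all reported.
    pattern = re.compile(f"(?=({re.escape(seed)}[ACGT]GG))")
    match_len = seed_len + 3
    hits = [("+", m.start()) for m in pattern.finditer(g)]
    rc = reverse_complement(g)
    hits += [("-", len(g) - m.start() - match_len) for m in pattern.finditer(rc)]
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))
```

Such a scan over-predicts binding by design: every hit is a candidate that accessibility data would then have to confirm.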
Molecular profiling of tumors promises to advance the clinical management of cancer, but the benefits of integrating molecular data with traditional clinical variables have not been systematically studied. Here we retrospectively predict patient survival using diverse molecular data (somatic copy-number alteration, DNA methylation and mRNA, miRNA and protein expression) from 953 samples of four cancer types from The Cancer Genome Atlas project. We found that incorporating molecular data with clinical variables yielded statistically significant improvements in prediction (FDR < 0.05) for three cancers, but the quantitative gains were limited (2.2–23.9%). Additional analyses revealed little predictive power across tumor types except for one case. In clinically relevant genes, we identified 10,281 somatic alterations across 12 cancer types in 2,928 of 3,277 patients (89.4%), many of which would not be revealed in single-tumor analyses. Our study provides a starting point and resources, including an open-access model evaluation platform, for building reliable prognostic and therapeutic strategies that incorporate molecular data.
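The abstract does not name its accuracy metric; a common choice for evaluating survival predictions is Harrell's concordance index (C-index), sketched below in Python as an assumption. A pair of patients is comparable when the one with the shorter observed time had an event, and concordant when that patient also received the higher risk score. Ties in time are skipped for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk.

    events[i] is truthy if patient i had the event (not censored).
    Tied risk scores count as half-concordant; tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # a = patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a gain from adding molecular features would appear as an increase over a clinical-only model.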
The domestication of citrus is poorly understood. Cultivated types are selections from, or hybrids of, wild progenitor species, whose identities and contributions remain controversial. By comparative analysis of a collection of citrus genomes, including a high quality haploid reference, we show that cultivated types were derived from two progenitor species. Whereas cultivated pummelos represent selections from a single progenitor species, C. maxima, cultivated mandarins are introgressions of C. maxima into the ancestral mandarin species, C. reticulata. The most widely cultivated citrus, sweet orange, is the offspring of previously admixed individuals, but sour orange is an F1 hybrid of pure C. maxima and C. reticulata parents, implying that wild mandarins were part of the early breeding germplasm. A wild “mandarin” from China exhibited substantial divergence from C. reticulata, suggesting the possibility of other unrecognized wild citrus species. Understanding citrus phylogeny through genome analysis clarifies taxonomic relationships and enables sequence-directed genetic improvement.
Systematic modification of the backbone of bioactive polypeptides through β-amino acid residue incorporation could provide a strategy for generating molecules with improved drug properties, but such alterations can result in lower receptor affinity and potency. Using an agonist of parathyroid hormone receptor-1 (PTHR1), a G protein–coupled receptor in the B-family, we present an approach for α→β residue replacement that enables both high activity and improved pharmacokinetic properties in vivo.
Adoptive T-cell therapy can target and kill widespread malignant cells, thereby inducing durable clinical responses in melanoma and selected other malignancies. However, many commonly targeted tumor antigens are also expressed by healthy tissues, and T cells do not distinguish between benign and malignant tissues if both express the target antigen. As such, autoimmune toxicity from T-cell-mediated destruction of normal tissue has limited the development and adoption of this otherwise promising type of cancer therapy. A review of the unique biology of T-cell therapy and of recent clinical experience compels a reassessment of target antigens that traditionally have been viewed from the perspective of weaker immunotherapeutic modalities. In selecting target antigens for adoptive T-cell therapy, expression by tumors and not by essential healthy tissues is of paramount importance. The risk of autoimmune adverse events can be further mitigated by generating antigen receptors using strategies that reduce the chance of cross-reactivity against epitopes in unintended targets. In general, a circumspect approach to target selection and thoughtful preclinical and clinical studies are pivotal to the ongoing advancement of these promising treatments.
Research on converting one cell type to another will be aided by systematic mapping of the gene-regulatory networks in mammalian cells.
Genome editing by Cas9, which cleaves double-stranded DNA at a sequence programmed by a short single-guide RNA (sgRNA), can result in off-target DNA modification that may be detrimental in some applications. To improve DNA cleavage specificity, we generated fusions of catalytically inactive Cas9 and FokI nuclease (fCas9). DNA cleavage by fCas9 requires association of two fCas9 monomers that simultaneously bind target sites ~15 or 25 base pairs apart. In human cells, fCas9 modified target DNA sites with >140-fold higher specificity than wild-type Cas9 and with an efficiency similar to that of paired Cas9 ‘nickases’, recently engineered variants that cleave only one DNA strand per monomer. The specificity of fCas9 was at least 4-fold higher than that of paired nickases at loci with highly similar off-target sites. Target sites that conform to the substrate requirements of fCas9 occur on average every 34 bp in the human genome, suggesting the broad versatility of this approach for highly specific genome-wide editing.
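To make "target sites that conform to the substrate requirements of fCas9" concrete, the Python sketch below enumerates candidate paired sites under stated assumptions: two 20-nt protospacers in a PAM-out arrangement (each NGG PAM on the outer side of its protospacer), separated by a gap of 15 or 25 bp. The orientation, the exact gap definition and the function name are hypothetical simplifications, so the sketch is not expected to reproduce the reported figure of one site every 34 bp.

```python
def fcas9_candidate_sites(genome, gaps=(15, 25), protospacer_len=20):
    """Hypothetical scan for PAM-out protospacer pairs separated by `gaps` bp.

    On the plus strand such a pair reads CCN + protospacer + gap + protospacer + NGG:
    the left site lies on the minus strand (its NGG PAM appears as CCN) and the
    right site on the plus strand. The pattern is its own reverse complement,
    so scanning one strand suffices. Returns sorted (start, gap) tuples.
    """
    g = genome.upper()
    sites = []
    for gap in gaps:
        total = 3 + protospacer_len + gap + protospacer_len + 3
        for start in range(len(g) - total + 1):
            left_pam = g[start:start + 2] == "CC"
            right_pam = g[start + total - 2:start + total] == "GG"
            if left_pam and right_pam:
                sites.append((start, gap))
    return sorted(sites)
```

The average spacing between candidate sites in a sequence could then be estimated as its length divided by the number of distinct start positions.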
Combining genotyping and the data locked in medical records yields a large number of known genotype-phenotype associations.
Monomeric CRISPR-Cas9 nucleases are widely used for targeted genome editing but can induce unwanted off-target mutations with high frequencies. Here we describe dimeric RNA-guided FokI nucleases (RFNs) that recognize extended sequences and can edit endogenous genes with high efficiencies in human cells. The cleavage activity of an RFN depends strictly on the binding of two guide RNAs (gRNAs) to DNA with a defined spacing and orientation, and RFNs therefore show improved specificities relative to wild-type Cas9 monomers. Importantly, direct comparisons show that RFNs guided by a single gRNA generally induce lower levels of unwanted mutations than matched monomeric Cas9 nickases. In addition, we describe a simple method for expressing multiple gRNAs bearing any 5′ end nucleotide, which gives dimeric RFNs a broad targeting range. RFNs combine the ease of RNA-based targeting with the specificity enhancement inherent to dimerization and are likely to be useful in applications that require highly precise genome editing.
Efforts to derive hematopoietic stem cells (HSCs) from human pluripotent stem cells (hPSCs) are complicated by the fact that embryonic hematopoiesis consists of two programs, primitive and definitive, that differ in developmental potential. As only definitive hematopoiesis generates HSCs, understanding how this program develops is essential for being able to produce this cell population in vitro. Here we show that both hematopoietic programs transition through hemogenic endothelial intermediates and develop from KDR+CD34−CD144− progenitors that are distinguished by CD235a expression. Generation of primitive progenitors (KDR+CD235a+) depends on stage-specific Activin-nodal signaling and inhibition of the Wnt-β-catenin pathway, whereas specification of definitive progenitors (KDR+CD235a−) requires Wnt-β-catenin signaling during this same time frame. Together, these findings establish simple selective differentiation strategies for the generation of primitive or definitive hematopoietic progenitors via Wnt-β-catenin manipulation, and in doing so provide access to enriched populations for future studies on hPSC-derived hematopoietic development.
embryonic stem cell; pluripotent; hematopoiesis; KDR; CD235a; hemogenic endothelium; definitive hematopoiesis; Wnt
High-throughput RNA sequencing (RNA-seq) dramatically expands the potential for novel genomics discoveries, but the wide variety of platforms, protocols and performance has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We tested replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (polyA-selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies’ PGM and Proton, Pacific Biosciences RS and Roche’s 454). The results show high intra-platform and inter-platform concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms. These data also demonstrate that ribosomal RNA depletion can both enable effective analysis of degraded RNA samples and be readily compared to polyA-enriched fractions. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq.
Rapid advances in high-throughput sequencing facilitate variant discovery and genotyping, but linking variants into a single haplotype remains challenging. Here we demonstrate HaploSeq, an approach for assembling chromosome-scale haplotypes by exploiting the existence of ‘chromosome territories’. We use proximity ligation and sequencing to show that alleles on homologous chromosomes occupy distinct territories, and therefore this experimental protocol preferentially recovers physically linked DNA variants on a homolog. Computational analysis of such data sets allows for accurate (~99.5%) reconstruction of chromosome-spanning haplotypes for ~95% of alleles in hybrid mouse cells with 30× sequencing coverage. To resolve haplotypes for a human genome, which has a low density of variants, we coupled HaploSeq with local conditional phasing to obtain haplotypes for ~81% of alleles with ~98% accuracy from just 17× sequencing. Whereas methods based on proximity ligation were originally designed to investigate spatial organization of genomes, our results lend support to their use as a general tool for haplotyping.
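A toy version of phasing from proximity-ligation links is sketched in Python below: read pairs vote on whether two heterozygous alleles sit on the same homolog, and a union-find structure with parity bits greedily merges the most strongly supported pairs into haplotype blocks. This is an illustrative stand-in, not HaploSeq's computational method, and it omits the local conditional phasing step; all names are hypothetical.

```python
from collections import defaultdict


def phase_variants(links):
    """Toy phasing of heterozygous variants from proximity-ligation read pairs.

    `links` holds (variant_a, variant_b, cis) tuples: cis=True when a read pair
    joins ref with ref (or alt with alt), i.e. the alleles share a homolog;
    cis=False when it joins ref with alt. Returns {variant: (block_id, phase)};
    within a block, equal phase bits mean the ref alleles share a homolog.
    """
    votes = defaultdict(int)
    for a, b, cis in links:
        votes[tuple(sorted((a, b)))] += 1 if cis else -1

    parent, parity = {}, {}
    for pair in votes:
        for v in pair:
            parent.setdefault(v, v)
            parity.setdefault(v, 0)

    def find(v):
        """Return (root, parity of v relative to root), compressing the path."""
        if parent[v] == v:
            return v, 0
        root, p = find(parent[v])
        parity[v] ^= p
        parent[v] = root
        return root, parity[v]

    # Greedy: merge strongest net votes first, so weaker conflicting links lose.
    for (a, b), net in sorted(votes.items(), key=lambda kv: -abs(kv[1])):
        if net == 0:
            continue
        want = 0 if net > 0 else 1
        root_a, pa = find(a)
        root_b, pb = find(b)
        if root_a != root_b:
            parent[root_b] = root_a
            parity[root_b] = pa ^ pb ^ want

    return {v: find(v) for v in parent}
```

Because contacts within a chromosome territory are enriched, cis votes should dominate for true linkages even at long range, which is what lets a greedy merge of this kind span large distances on idealized data.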
RNA-seq facilitates unbiased genome-wide gene-expression profiling. However, its concordance with the well-established microarray platform must be rigorously assessed for confident use in clinical and regulatory applications. Here we use a comprehensive study design to generate Illumina RNA-seq and Affymetrix microarray data from the same set of liver samples of rats under varying degrees of perturbation by 27 chemicals representing multiple modes of action (MOA). The cross-platform concordance in terms of differentially expressed genes (DEGs) or enriched pathways is highly correlated with treatment effect size, gene-expression abundance and the biological complexity of the MOA. RNA-seq outperforms microarray (90% versus 76%) in DEG verification by quantitative PCR, and the main gain is its improved accuracy for lowly expressed genes. Nonetheless, predictive classifiers derived from both platforms performed similarly. Therefore, the endpoint studied and its biological complexity, transcript abundance, and intended application are important factors in transcriptomic research and for decision-making.
We introduce Sailfish, a computational method for quantifying the abundance of previously annotated RNA isoforms from RNA-seq data. Because Sailfish entirely avoids mapping reads, a time-consuming step in all current methods, it provides quantification estimates much faster than do existing approaches (typically 20 times faster) without loss of accuracy. By facilitating frequent reanalysis of data and reducing the need to optimize parameters, Sailfish exemplifies the potential of lightweight algorithms for efficiently processing sequencing reads.
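The core idea of mapping-free quantification, splitting k-mer counts among compatible transcripts by expectation-maximization (EM), can be sketched in Python. This is a simplified toy under stated assumptions: a plain dictionary index, each k-mer treated as occurring once per transcript, and no bias or effective-length corrections; it is not Sailfish's implementation.

```python
from collections import Counter


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_em(transcripts, reads, k=5, iterations=200):
    """Toy alignment-free isoform quantification by EM over k-mer counts.

    Each read k-mer found in the index is split among the transcripts containing
    it in proportion to alpha_t / n_t (alpha_t: current share of k-mers from
    transcript t; n_t: its number of k-mer positions). Returns relative
    transcript abundances (proportional to alpha_t / n_t), summing to 1.
    """
    index = {}
    for name, seq in transcripts.items():
        if len(seq) < k:
            raise ValueError(f"transcript {name} is shorter than k")
        for km in set(kmers(seq, k)):
            index.setdefault(km, set()).add(name)
    n_kmers = {name: len(seq) - k + 1 for name, seq in transcripts.items()}
    # Reads are never aligned; only exact k-mer lookups in the index are used.
    counts = Counter(km for read in reads for km in kmers(read, k) if km in index)
    if not counts:
        raise ValueError("no read k-mers match the transcript index")

    alpha = {name: 1 / len(transcripts) for name in transcripts}
    for _ in range(iterations):
        assigned = dict.fromkeys(transcripts, 0.0)
        for km, c in counts.items():  # E-step: split each k-mer's count
            weights = {t: alpha[t] / n_kmers[t] for t in index[km]}
            total = sum(weights.values())
            if total == 0:
                continue
            for t, w in weights.items():
                assigned[t] += c * w / total
        norm = sum(assigned.values())
        alpha = {t: a / norm for t, a in assigned.items()}  # M-step
    tau = {t: alpha[t] / n_kmers[t] for t in transcripts}
    z = sum(tau.values())
    return {t: v / z for t, v in tau.items()}
```

Skipping alignment reduces each read to a handful of hash lookups, which is where the speed advantage of this family of methods comes from.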
Comprehensive analyses of cancer genomes promise to inform prognoses and precise cancer treatments. A major barrier, however, is inaccessibility of metastatic tissue. A potential solution is to characterize circulating tumor cells (CTCs), but this requires overcoming the challenges of isolating rare cells and sequencing low-input material. Here we report an integrated process to isolate, qualify and sequence whole exomes of CTCs with high fidelity, using a census-based sequencing strategy. Power calculations suggest that mapping of >99.995% of the standard exome is possible in CTCs. We validated our process in two prostate cancer patients including one for whom we sequenced CTCs, a lymph node metastasis and nine cores of the primary tumor. Fifty-one of 73 CTC mutations (70%) were observed in matched tissue. Moreover, we identified 10 early-trunk and 56 metastatic-trunk mutations in the non-CTC tumor samples and found 90% and 73% of these, respectively, in CTC exomes. This study establishes a foundation for CTC genomics in the clinic.
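Reading "census-based" as accepting a mutation only when it recurs across independently amplified CTC libraries (an assumption; the threshold below is hypothetical), the filtering and the concordance check against matched tissue reduce to simple set operations, sketched in Python:

```python
from collections import Counter


def census_calls(library_calls, min_libraries=2):
    """Mutations reported in at least `min_libraries` independent CTC libraries."""
    counts = Counter(m for calls in library_calls for m in set(calls))
    return {m for m, n in counts.items() if n >= min_libraries}


def fraction_confirmed(calls, reference_calls):
    """Share of `calls` also present in a matched reference (e.g. tissue) set."""
    calls = set(calls)
    if not calls:
        return 0.0
    return len(calls & set(reference_calls)) / len(calls)
```

For example, 51 confirmed calls out of 73 give a fraction of about 0.70, the 70% reported above.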
Identifying the proteins synthesized in defined cells at specific times in an animal will facilitate the study of cellular functions and dynamic processes. Here we introduce stochastic orthogonal recoding of translation with chemoselective modification (SORT-M) to address this challenge. SORT-M involves modifying cells to express an orthogonal aminoacyl-tRNA synthetase/tRNA pair to enable the incorporation of chemically modifiable analogs of amino acids at diverse sense codons in cells in rich media. We apply SORT-M to Drosophila melanogaster fed standard food to label and image proteins in specific tissues at precise developmental stages with diverse chemistries, including cyclopropene-tetrazine inverse electron demand Diels-Alder cycloaddition reactions. We also use SORT-M to identify proteins synthesized in germ cells of the fly ovary without dissection. SORT-M will facilitate the definition of proteins synthesized in specific sets of cells to study development, learning and memory in flies, and may be extended to other animals.