To describe the clinical phenotype and identify the molecular basis of disease in a consanguineous family of Palestinian origin with autosomal recessive retinal degeneration.
Eight family members were evaluated with visual acuity and perimetry tests, color fundus photographs, full-field electroretinography, and optical coherence tomography. Cone photoreceptors surrounding the fovea were imaged in 2 members using adaptive optics scanning laser ophthalmoscopy. The exome was captured using probes and sequenced, and reads were mapped to the hg19 reference. Variant calling and annotation were performed using published protocols. Variant confirmation and segregation analysis were performed using dideoxy sequencing.
Analysis detected 24 037 single-nucleotide variants in one affected family member, of which 3622 were rare and potentially damaging to the encoded proteins. Further analysis revealed a novel homozygous nonsense change, c.1381C>T (p.Gln461X), in exon 13 of the CDHR1 gene, which segregated with retinal degeneration in this family. Affected members had night blindness beginning in adolescence, progressive loss of visual acuity and visual field, unmeasurable electroretinographic responses, and macular outer retinal loss, although residual cones with increased cone spacing were observed in the youngest individual.
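The narrowing from thousands of rare, potentially damaging variants to a single segregating homozygous change can be sketched as two filters. This is a minimal illustration, not the published protocol: the `Variant` record, the `DAMAGING` consequence labels, and the 1% allele-frequency cutoff are hypothetical assumptions, and in the study segregation was confirmed by dideoxy sequencing rather than computed from exome genotypes.

```python
from dataclasses import dataclass, field

# Hypothetical consequence labels treated as "potentially damaging"; a real
# pipeline would take these from its variant annotation tool.
DAMAGING = {"stop_gained", "frameshift", "splice_site", "missense"}

@dataclass
class Variant:
    gene: str
    hgvs: str
    consequence: str
    pop_af: float                                   # population allele frequency
    genotypes: dict = field(default_factory=dict)   # sample -> "0/0", "0/1" or "1/1"

def is_candidate(v, proband, max_af=0.01):
    """Rare, potentially damaging and homozygous for the alternate allele in the proband."""
    return (v.pop_af <= max_af
            and v.consequence in DAMAGING
            and v.genotypes.get(proband) == "1/1")

def segregates(v, affected, unaffected):
    """Recessive segregation: every affected member is 1/1 and no unaffected
    member is 1/1 (heterozygous carriers are allowed)."""
    return (all(v.genotypes.get(s) == "1/1" for s in affected)
            and all(v.genotypes.get(s) != "1/1" for s in unaffected))

def filter_variants(variants, proband, affected, unaffected, max_af=0.01):
    """Keep candidates in the proband that also segregate in the family."""
    return [v for v in variants
            if is_candidate(v, proband, max_af) and segregates(v, affected, unaffected)]
```

A variant that is common, or homozygous in an unaffected relative, or heterozygous in an affected one, drops out; only changes consistent with autosomal recessive inheritance remain.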
Exome analysis revealed a novel CDHR1 nonsense mutation segregating with progressive retinal degeneration causing severe central vision loss by the fourth decade of life. High-resolution retinal imaging revealed outer retinal changes suggesting that CDHR1 is important for normal photoreceptor structure and survival.
Exome sequencing is a powerful technique that may identify causative genetic variants in families with autosomal recessive retinal degeneration.
Metabolomic profiling offers direct insights into the chemical environment and metabolic pathway activities at sites of human disease. During infection, this environment may receive important contributions from both host and pathogen. Here we apply an untargeted metabolomics approach to identify compounds associated with an E. coli urinary tract infection population. Correlative and structural data from minimally processed samples were obtained using an optimized LC-MS platform capable of resolving ∼2300 molecular features. Principal components analysis readily distinguished patient groups, and multiple supervised chemometric analyses resolved robust metabolomic shifts between groups. These analyses revealed nine compounds whose provisional structures suggest candidate infection-associated endocrine, catabolic, and lipid pathways. Several of these metabolite signatures may derive from microbial processing of host metabolites. Overall, this study highlights the ability of metabolomic approaches to directly identify compounds encountered by, and produced from, bacterial pathogens within human hosts.
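Principal components analysis separates groups without using their labels. The sketch below assumes a samples-by-features matrix of LC-MS intensities and projects each sample onto the first principal component; it uses power iteration on the centred data as a dependency-free stand-in for the SVD routine a statistics package would use. The function name is hypothetical, and the sign of the component is arbitrary.

```python
import math
import random

def first_pc_scores(X, n_iter=200, seed=0):
    """Project samples (rows of X) onto the first principal component.

    Power iteration repeatedly applies X^T X to a vector, which converges to
    the leading eigenvector of the covariance matrix without forming it.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]   # centre each feature
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(p)]
    for _ in range(n_iter):
        Xv = [sum(r[j] * v[j] for j in range(p)) for r in Xc]            # X v
        w = [sum(Xc[i][j] * Xv[i] for i in range(n)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in Xc]
```

If two patient groups differ mainly along a few high-variance features, their scores fall on opposite sides of zero, which is the kind of separation a PCA score plot shows.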
integrated approach; LC-MS; untargeted urine metabolomics; urinary tract infection; Escherichia coli; infectious diseases
Correct diagnosis is pivotal to understanding and treating neurological disease. Herein, we report the diagnostic work-up utilizing exome sequencing and the characterization of clinical features and brain MRI in two siblings with a complex, adult-onset phenotype, including peripheral neuropathy, epilepsy, relapsing encephalopathy, bilateral thalamic lesions, type 2 diabetes mellitus, cataract, pigmentary retinopathy and tremor.
We applied clinical and genealogical investigations, homozygosity mapping and exome sequencing to establish the diagnosis and MRI to characterize the cerebral lesions.
A recessive genetic defect was suspected in two siblings of healthy but consanguineous parents. Homozygosity mapping revealed three shared homozygous regions, and exome sequencing revealed a novel homozygous c.367G>A (p.Asp123Asn) mutation in the α-methylacyl-coA racemase (AMACR) gene in both patients. The genetic diagnosis of α-methylacyl-coA racemase deficiency was confirmed by demonstrating markedly increased pristanic acid levels in blood (169 μmol/L, normal <1.5 μmol/L). MRI studies showed characteristic degeneration of cerebellar afferents and efferents, including the dentatothalamic tract, and thalamic lesions in both patients.
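Homozygosity mapping looks for long stretches of consecutive homozygous genotypes in each affected sibling and keeps the stretches they share. A simplified sketch, assuming position-sorted homozygous/heterozygous calls for one chromosome per patient and an arbitrary minimum run length; the function names are hypothetical, and dedicated tools additionally tolerate genotyping errors and measure runs in physical or genetic length.

```python
def homozygous_runs(calls, min_snps=5):
    """calls: list of (position, is_homozygous) sorted by position.
    Returns (start, end) for runs of >= min_snps consecutive homozygous calls."""
    runs, start, last, count = [], None, None, 0
    for pos, hom in calls:
        if hom:
            if count == 0:
                start = pos
            count += 1
            last = pos
        else:
            if count >= min_snps:
                runs.append((start, last))
            count = 0
    if count >= min_snps:            # close a run that reaches the chromosome end
        runs.append((start, last))
    return runs

def intersect(runs_a, runs_b):
    """Overlaps between two sorted lists of closed intervals."""
    out, i, j = [], 0, 0
    while i < len(runs_a) and j < len(runs_b):
        lo = max(runs_a[i][0], runs_b[j][0])
        hi = min(runs_a[i][1], runs_b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        if runs_a[i][1] < runs_b[j][1]:
            i += 1
        else:
            j += 1
    return out

def shared_regions(per_patient_calls, min_snps=5):
    """Homozygous intervals common to all patients."""
    shared = None
    for calls in per_patient_calls:
        runs = homozygous_runs(calls, min_snps)
        shared = runs if shared is None else intersect(shared, runs)
    return shared or []
```

Candidate variants from exome sequencing can then be restricted to the shared intervals, which is how the two approaches complement each other.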
Metabolic diseases presenting late are diagnostically challenging. We show that, when appropriately applied, homozygosity mapping and exome sequencing can be decisive for establishing diagnoses such as late-onset α-methylacyl-coA racemase deficiency, an autosomal recessive peroxisomal disorder with accumulation of pristanic acid. Our study also highlights radiological features that may assist in diagnosis. Early diagnosis is important, as patients with this disorder may benefit from restricted dietary intake of phytanic and pristanic acid.
AMACR gene; Seizures; Next generation sequencing; Ataxia; Peroxisomal disorders; Metabolic disorders; Tremor; Peripheral neuropathy; Pigmentary retinopathy
Metabolism is vital to every aspect of cell function, yet the metabolome of induced pluripotent stem cells (iPSCs) remains largely unexplored. Here we report, using an untargeted metabolomics approach, that human iPSCs share a pluripotent metabolomic signature with embryonic stem cells (ESCs) that is distinct from their parental cells, and that is characterized by changes in metabolites involved in cellular respiration. Examination of cellular bioenergetics corroborated our metabolomic analysis and demonstrated that somatic cells convert from an oxidative state to a glycolytic state in pluripotency. Interestingly, the bioenergetics of various somatic cells correlated with their reprogramming efficiencies. We further identified metabolites that differ between iPSCs and ESCs, which revealed novel metabolic pathways that play a critical role in regulating somatic cell reprogramming. Our findings are the first to globally analyze the metabolome of iPSCs, and provide mechanistic insight into a new layer of regulation involved in inducing pluripotency and in evaluating iPSC and ESC equivalence.
reprogramming; iPS cells; metabolome; stem cells; metabolism
Mendelian phenotypes in humans range from benign variants to lethal disorders. Embryonic lethal phenotypes, long recognized in mice, have remained largely unknown in humans because of the difficulty of arriving at a molecular diagnosis. The purpose of this study was to test whether next-generation sequencing can reveal the underlying etiology of recurrent fetal loss.
We hypothesized that exome sequencing combined with autozygome analysis can reveal the underlying mutation in a family in which recurrent fetal loss was likely to be autosomal recessive in origin.
A novel mutation in CHRNA1 was identified. Mutations in this gene are a known cause of multiple pterygium and fetal akinesia syndromes.
This is the first report of exome sequencing to identify the cause of recurrent fetal loss and reveal the diagnosis of a lethal human phenotype. Our results should inspire a systematic examination of the extent of “unborn” Mendelian phenotypes in humans using next-generation sequencing.
embryonic lethal; exome sequencing; hydrops fetalis; recurrent abortion
The Ctf18-replication factor C complex, which includes Dscc1 (DNA replication and sister chromatid cohesion 1), is implicated in sister chromatid cohesion, DNA replication, and genome stability in S. cerevisiae and C. elegans. We previously performed gene expression profiling in primary colorectal cancer cells in order to identify novel molecular targets for the treatment of colorectal cancer. A feature of the cancer-associated transcriptional signature revealed by this effort is the elevated expression of the proto-oncogene DSCC1. Here, we have interrogated the molecular basis for the aberrant expression of human DSCC1 in colorectal cancer and its ability to promote survival of cancer cells. Quantitative PCR and immunohistochemical analyses confirmed that the expression level of DSCC1 is elevated in 60–70% of colorectal tumors compared to their matched noncancerous colonic mucosa. An in silico evaluation of the presumptive DSCC1 promoter region for consensus DNA transcriptional regulatory elements revealed a potential role for the E2F family of DNA-binding proteins in controlling DSCC1 expression. RNAi-mediated knockdown of E2F1 reduced expression of DSCC1 in colorectal cancer cells. Gain- and loss-of-function experiments demonstrated that DSCC1 is involved in the viability of cancer cells in response to genotoxic stimuli. We reveal that E2F-dependent expression of DSCC1 confers anti-apoptotic properties in colorectal cancer cells, and that its suppression may be a useful option for the treatment of colorectal cancer.
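An in silico promoter evaluation of this kind amounts to scanning a sequence for consensus motifs on both strands. The sketch assumes the commonly cited E2F consensus TTTSSCGC (IUPAC S = C or G) as the motif; the abstract does not name the motif or tool used, and the function names are hypothetical.

```python
import re

# IUPAC codes needed for the motifs used here (extend as required).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "[CG]", "W": "[AT]",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def motif_regex(motif):
    # The lookahead lets overlapping matches be reported.
    return re.compile("(?=(" + "".join(IUPAC[c] for c in motif) + "))")

def scan(seq, motif="TTTSSCGC"):
    """Return (start, strand) for each motif match on either strand.
    Starts are 0-based plus-strand coordinates of the matched window."""
    seq = seq.upper()
    pat, k, n = motif_regex(motif), len(motif), len(seq)
    plus = [(m.start(), "+") for m in pat.finditer(seq)]
    minus = [(n - m.start() - k, "-") for m in pat.finditer(revcomp(seq))]
    return sorted(plus + minus)
```

A consensus match only nominates a candidate site; functional evidence such as the E2F1 knockdown described above is still needed.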
Dilated cardiomyopathy (DCM) is a heritable, genetically heterogeneous disorder, typically exhibiting autosomal dominant inheritance. Genomic strategies enable discovery of novel, unsuspected molecular underpinnings of familial DCM. We performed genome-wide mapping and exome sequencing in a unique family wherein DCM segregated as an autosomal recessive (AR) trait.
Methods and Results
Echocardiography in 17 adult descendants of first cousins revealed DCM in two female siblings and idiopathic left ventricular enlargement in their brother. Genotyping and linkage analysis mapped an AR DCM locus to chromosome 7q21, which was validated and refined by high-density homozygosity mapping. Exome sequencing of the affected sisters was then employed as a complementary strategy for mutation discovery. An iterative bioinformatics process was used to filter >40,000 genetic variants, revealing a single shared homozygous missense mutation localized to the 7q21 critical region. The mutation, absent in HapMap, 1000 Genomes and 474 ethnically matched controls, altered a conserved residue of GATAD1, encoding GATA zinc finger domain-containing protein 1. Thirteen relatives were heterozygous mutation carriers with no evidence of myocardial disease, even at advanced ages. Immunohistochemistry demonstrated nuclear localization of GATAD1 in left ventricular myocytes, yet subcellular expression and nuclear morphology were aberrant in the proband.
Linkage analysis and exome sequencing were used as synergistic genomic strategies to identify GATAD1 as a gene for AR DCM. GATAD1 binds to a histone modification site that regulates gene expression. Consistent with murine DCM caused by genetic disruption of histone deacetylases, our data implicate an inherited basis for epigenetic dysregulation in human heart failure.
Cardiomyopathy; Genetics; Genomics; Epigenetics; Next generation sequencing
Although acute lymphocytic leukemia (ALL) is the most common childhood cancer, genetic predisposition to ALL remains poorly understood. Whole-exome sequencing was performed in an extended kindred in which five individuals had been diagnosed with leukemia. Analysis revealed a nonsense variant of TP53 that had previously been reported in families with sarcomas and other typical Li-Fraumeni syndrome-associated cancers, but never in a familial leukemia kindred. This unexpected finding enabled identification of an appropriate sibling bone marrow donor and illustrates that exome sequencing can reveal atypical clinical presentations of even well-studied genes.
exome sequencing; acute lymphocytic leukemia; genetic predisposition to disease; genetic testing
Lung adenocarcinoma, the most common subtype of non-small cell lung cancer, is responsible for over 500,000 deaths per year worldwide. Here, we report exome and genome sequences of 183 lung adenocarcinoma tumor/normal DNA pairs. These analyses revealed a mean exonic somatic mutation rate of 12.0 events/megabase and identified the majority of genes previously reported as significantly mutated in lung adenocarcinoma. In addition, we identified statistically recurrent somatic mutations in the splicing factor gene U2AF1 and truncating mutations affecting RBM10 and ARID1A. Analysis of nucleotide context-specific mutation signatures grouped the sample set into distinct clusters that correlated with smoking history and alterations of reported lung adenocarcinoma genes. Whole genome sequence analysis revealed frequent structural rearrangements, including in-frame exonic alterations within EGFR and SIK2 kinases. The candidate genes identified in this study are attractive targets for biological characterization and therapeutic targeting of lung adenocarcinoma.
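The mutation rate quoted above is a count normalized by the territory in which mutations could be called. As a worked example, assuming a hypothetical callable exonic territory of 30 Mb (the abstract does not state the value used), a rate of 12.0 events/Mb corresponds to about 12.0 × 30 = 360 exonic somatic mutations per tumor:

```latex
r \;=\; \frac{N_{\text{somatic}}}{L_{\text{callable}} / 10^{6}}
\qquad\Longrightarrow\qquad
N_{\text{somatic}} \;=\; r \cdot \frac{L_{\text{callable}}}{10^{6}}
\;=\; 12.0 \times 30 \;=\; 360
```

Here $L_{\text{callable}}$ is measured in bases, so dividing by $10^{6}$ expresses it in megabases.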
Exome sequencing is emerging as a popular approach to study the effect of rare coding variants on complex phenotypes. The promise of exome sequencing is grounded in theoretical population genetics and in empirical successes of candidate gene sequencing studies. Many projects aimed at common diseases are underway, and their results are eagerly anticipated. In this Perspective, using exome sequencing data from 438 individuals, we discuss several aspects of exome sequencing studies that we view as particularly important. We review processing and quality control of raw sequence data, evaluate the statistical properties of exome sequencing studies, discuss rare variant burden tests to detect association to phenotypes, and demonstrate the importance of accounting for population stratification in the analysis of rare variants. We conclude that enthusiasm for exome sequencing studies of complex traits should be combined with the caution that thousands of samples may be required to reach sufficient statistical power.
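A rare variant burden test collapses the qualifying rare variants in a gene into a single carrier indicator per individual and compares carrier frequencies between cases and controls. The sketch below uses Fisher's exact test as one common choice; it assumes genotypes have already been filtered by frequency and predicted effect, the function names are hypothetical, and it does not correct for population stratification, which the Perspective identifies as important.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    sums hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # probability that x of the col1 carriers are cases
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)))

def burden_test(case_counts, control_counts):
    """Collapsing burden test for one gene.
    Each list gives, per individual, the number of qualifying rare alleles."""
    case_carriers = sum(1 for g in case_counts if g > 0)
    ctrl_carriers = sum(1 for g in control_counts if g > 0)
    return fisher_exact_two_sided(case_carriers, len(case_counts) - case_carriers,
                                  ctrl_carriers, len(control_counts) - ctrl_carriers)
```

Because individual rare variants are seen in very few people, collapsing them per gene is what gives such tests any power at all, and even then large samples are usually needed.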
Metabolomics has become increasingly popular in the study of disease phenotypes and molecular pathophysiology. One branch of metabolomics that encompasses the high-throughput screening of cellular metabolism is metabolic profiling. In the present study, different tumour cell lines from colorectal carcinoma and breast adenocarcinoma were exposed to hypoxic and normoxic conditions, and their metabolic profiles were compared to reveal the potential metabolic effects of hypoxia on the biochemistry of the tumour cells that may contribute to their survival in oxygen-compromised environments. To analyse the complex interactions between metabolites beyond routine univariate and multivariate data analysis methods, correlation analysis has been integrated with a human metabolic reconstruction to reveal connections between pathways that are associated with normoxic or hypoxic oxygen environments.
Correlation analysis has revealed statistically significant connections between metabolites, where differences in correlations between cells exposed to different oxygen levels have been highlighted as markers of hypoxic metabolism in cancer. Network mapping onto reconstructed human metabolic models is a novel addition to correlation analysis. Correlated metabolites have been mapped onto the Edinburgh human metabolic network (EHMN) with the aim of interlinking metabolites found to be regulated in a similar fashion in response to oxygen. This revealed novel pathways within the metabolic network that may be key to tumour cell survival at low oxygen. Results show that the metabolic responses to lowering oxygen availability can be conserved or specific to a particular cell line. Network-based correlation analysis identified conserved metabolites including malate, pyruvate, 2-oxoglutarate, glutamate and fructose-6-phosphate. In this way, the method revealed metabolites that had not previously been linked, or were less well recognised, with respect to hypoxia. Lactate fermentation is one of the key themes discussed in the field of hypoxia; however, malate, pyruvate, 2-oxoglutarate, glutamate and fructose-6-phosphate, which are connected by a single pathway, may provide a more significant marker of hypoxia in cancer.
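The core of differential correlation analysis is to compute metabolite-metabolite correlations separately under each oxygen condition and flag pairs whose correlation changes. A minimal sketch, assuming replicate measurements per condition and a fixed change threshold in place of a formal statistical test; the function names are hypothetical, and the subsequent mapping of flagged pairs onto EHMN is not shown.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation (undefined if either series has zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def differential_correlations(normoxic, hypoxic, threshold=0.5):
    """normoxic / hypoxic: dict metabolite -> levels across replicates.
    Returns (m1, m2, r_normoxic, r_hypoxic) for pairs whose correlation
    changes by more than `threshold` between conditions."""
    out = []
    for m1, m2 in combinations(sorted(normoxic), 2):
        r_n = pearson(normoxic[m1], normoxic[m2])
        r_h = pearson(hypoxic[m1], hypoxic[m2])
        if abs(r_h - r_n) > threshold:
            out.append((m1, m2, round(r_n, 2), round(r_h, 2)))
    return out
```

Pairs that switch from anti-correlated to correlated, for example, become candidate edges to map onto a metabolic network reconstruction.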
Metabolic networks generated for each cell line were compared to identify conserved metabolite pathway responses to low oxygen environments. Furthermore, we believe this methodology will have general application within metabolomics.
Metabolomics; Correlation analysis; Network analysis; Cancer; Hypoxia
The incidence of esophageal adenocarcinoma (EAC) has risen 600% over the last 30 years. With a five-year survival rate of 15%, identification of new therapeutic targets for EAC is greatly important. We analyze the mutation spectra from whole exome sequencing of 149 EAC tumor/normal pairs, 15 of which have also been subjected to whole genome sequencing. We identify a mutational signature defined by a high prevalence of A to C transversions at AA dinucleotides. Statistical analysis of exome data identified 26 significantly mutated genes. Of these genes, four (TP53, CDKN2A, SMAD4, and PIK3CA) have been previously implicated in EAC. The novel significantly mutated genes include chromatin modifying factors and candidate contributors: SPG20, TLR4, ELMO1, and DOCK2. Functional analyses of EAC-derived mutations in ELMO1 reveal increased cellular invasion. We therefore propose that activation of the RAC1 pathway may contribute to EAC tumorigenesis.
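Quantifying a context-specific signature means classifying each single-nucleotide substitution by its flanking reference bases. The sketch assumes that "A to C at AA dinucleotides" means the mutated A is immediately followed (3') by another A; the same event recorded against the plus strand then also appears as a T>G with a T immediately 5'. This reading of the context and the function names are assumptions for illustration.

```python
def is_ac_at_aa(ref_seq, pos, alt):
    """ref_seq: plus-strand reference; pos: 0-based index of the mutated base.
    True for A>C with a 3' A, or for the reverse-complement event T>G with a
    5' T (an A>C at an AA dinucleotide on the minus strand)."""
    ref, alt = ref_seq[pos].upper(), alt.upper()
    if ref == "A" and alt == "C":
        return pos + 1 < len(ref_seq) and ref_seq[pos + 1].upper() == "A"
    if ref == "T" and alt == "G":
        return pos > 0 and ref_seq[pos - 1].upper() == "T"
    return False

def ac_at_aa_fraction(ref_seq, mutations):
    """mutations: list of (pos, alt). Fraction matching the signature."""
    if not mutations:
        return 0.0
    hits = sum(is_ac_at_aa(ref_seq, p, a) for p, a in mutations)
    return hits / len(mutations)
```

Comparing this fraction across tumors, or against the fraction expected from sequence composition alone, is one simple way to see whether a sample is enriched for the signature.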
Large-scale genomic analysis such as whole-exome and whole-genome sequencing is becoming increasingly prevalent in the research arena. Clinically, many potential uses of this technology have been proposed. One such application is the extension or augmentation of newborn screening. In order to explore this application, we examined data from 3 children with normal newborn screens who underwent whole-exome sequencing as part of research participation. We analyzed sequence information for 151 selected genes associated with conditions ascertained by newborn screening. We compared findings with publicly available databases and results from over 500 individuals who underwent whole-exome sequencing at the same facility. Novel variants were confirmed through bidirectional dideoxynucleotide sequencing. High-density microarray analysis (Illumina Omni1-Quad) was also performed to detect potential copy number variations affecting these genes. We detected an average of 87 genetic variants per individual. After excluding artifacts, 96% of the variants had been reported in public databases and had no evidence of pathogenicity. No variants were identified that would predict disease in the tested individuals, which is in accordance with their normal newborn screens. However, we identified 6 previously reported variants and 2 novel variants that, according to published literature, could result in affected offspring if the reproductive partner were also a mutation carrier; other specific molecular findings highlight additional means by which genomic testing could augment newborn screening.
Exome sequencing; Genomic sequencing; Newborn screening; Whole-exome sequencing
To identify potential tumor suppressor genes, genome-wide data from exome and transcriptome sequencing were combined to search for genes with loss of heterozygosity and allele-specific expression. The analysis was conducted on the breast cancer cell line HCC1954, and a lymphoblast cell line from the same individual, HCC1954BL.
By comparing exome sequences from the two cell lines, we identified loss of heterozygosity events at 403 genes in HCC1954 and at one gene in HCC1954BL. The combination of exome and transcriptome sequence data also revealed 86 and 50 genes with allele-specific expression events in HCC1954 and HCC1954BL, which comprise 5.4% and 2.6% of genes surveyed, respectively. Many of the genes identified by loss of heterozygosity and allele-specific expression are known or putative tumor suppressor genes, such as BRCA1, MSH3 and SETX, which participate in DNA repair pathways.
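Both analyses reduce to simple per-site rules. Loss of heterozygosity is a site heterozygous in the matched normal line but homozygous in the tumor line; allele-specific expression is a DNA-heterozygous site whose RNA reads depart from the expected 50:50 allele ratio. The sketch uses an exact two-sided binomial test for the latter; the genotype encoding, the significance cutoff, and the function names are assumptions, not the authors' pipeline.

```python
from math import comb

def is_het(gt):
    """'0/1' -> True, '1/1' or '0/0' -> False."""
    a, b = gt.split("/")
    return a != b

def loh_sites(normal_gt, tumor_gt):
    """Sites heterozygous in the matched normal and homozygous in the tumor."""
    return sorted(s for s, g in normal_gt.items()
                  if is_het(g) and s in tumor_gt and not is_het(tumor_gt[s]))

def binom_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value: sum of outcome probabilities no larger than P(k)."""
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    return min(1.0, sum(q for q in probs if q <= probs[k] * (1 + 1e-7)))

def allele_specific(ref_reads, alt_reads, alpha=0.01):
    """Flag allelic imbalance in RNA reads at a DNA-heterozygous site."""
    return binom_two_sided(alt_reads, ref_reads + alt_reads) < alpha
```

Genes with both kinds of event in the tumor line, but not in the matched lymphoblast line, are natural tumor suppressor candidates.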
Our results demonstrate that the combined application of high throughput sequencing to exome and allele-specific transcriptome analysis can reveal genes with known tumor suppressor characteristics, and a shortlist of novel candidates for the study of tumor suppressor activities.
To identify the causative gene in an autosomal dominant limb-girdle muscular dystrophy (LGMD) with skeletal muscle vacuoles.
Exome sequencing was used to identify candidate mutations in the studied pedigree. Genome-wide linkage was then used to narrow the list of candidates to a single disease-associated mutation. Additional pedigrees with dominant or sporadic myopathy were screened for mutations in the same gene (DNAJB6) using exome sequencing. Skeletal muscle from affected patients was evaluated with histochemistry and immunohistochemical stains for dystrophy-related proteins, SMI-31, TDP43, and DNAJB6.
Exome analysis in three affected individuals from a family with dominant limb-girdle muscular dystrophy and vacuolar pathology identified novel candidate mutations in 22 genes. Linkage analysis excluded all variants except a Phe93Leu mutation in the G/F domain of the DNAJB6 gene, which resides within the LGMD 1E locus at 7q36. Analysis of exome sequencing data from other pedigrees with dominant myopathy identified a second G/F domain mutation (Pro96Arg) in DNAJB6. Affected muscle showed mild dystrophic changes, vacuoles, and abnormal aggregation of proteins, including TDP-43 and DNAJB6 itself.
Mutations within the G/F domain of DNAJB6 are a novel cause of dominantly inherited myopathy. DNAJB6 is a member of the HSP40/DNAJ family of molecular co-chaperones tasked with protecting client proteins from irreversible aggregation during protein synthesis or during times of cellular stress. The abnormal accumulation of several proteins in patient muscle, including DNAJB6 itself, suggests that DNAJB6 function is compromised by the identified G/F domain mutations.
Metabolomics is the methodology that identifies and measures global pools of small molecules (of less than about 1,000 Da) in a biological sample, which are collectively called the metabolome. Metabolomics can therefore reveal the metabolic outcome of a genetic or environmental perturbation of a metabolic regulatory network, and thus provide insights into the structure and regulation of that network. Because of the chemical complexity of the metabolome and the limitations of individual analytical platforms, it is currently difficult to capture the complete metabolome of an organism or tissue, in contrast to its genome or transcriptome. This paper describes the analysis of Arabidopsis metabolomics data sets acquired by a consortium that includes five analytical laboratories, bioinformaticists, and biostatisticians, and that aims to develop and validate metabolomics as a hypothesis-generating functional genomics tool. The consortium is determining the metabolomes of Arabidopsis T-DNA mutant stocks grown in a standardized controlled environment optimized to minimize environmental impacts on the metabolomes. Metabolomics data were generated with seven analytical platforms, and the combined data are being provided to the research community to formulate initial hypotheses about genes of unknown function (GUFs). A public database (www.PlantMetabolomics.org) has been developed to give the scientific community access to the data along with tools for its interactive analysis. Exemplary datasets are discussed to validate the approach; they illustrate how initial hypotheses generated from the consortium-produced metabolomics data can be integrated with prior knowledge to yield testable hypotheses about the functions of GUFs.
Arabidopsis; metabolomics; gene annotation; functional genomics; database
Structural rearrangements form a major class of somatic variation in cancer genomes. Local chromosome shattering, termed chromothripsis, is a mechanism proposed to be the cause of clustered chromosomal rearrangements and was recently described to occur in a small percentage of tumors. The significance of these clusters for tumor development or metastatic spread is largely unclear.
We used genome-wide long mate-pair sequencing and SNP array profiling to reveal that chromothripsis is a widespread phenomenon in primary colorectal cancer and metastases. We find large and small chromothripsis events in nearly every colorectal tumor sample and show that several breakpoints of chromothripsis clusters and isolated rearrangements affect cancer genes, including NOTCH2, EXO1 and MLL3. We complemented the structural variation studies by sequencing the coding regions of a cancer exome in all colorectal tumor samples and found somatic mutations in 24 genes, including APC, KRAS, SMAD4 and PIK3CA. A pairwise comparison of somatic variations in primary and metastatic samples indicated that many chromothripsis clusters, isolated rearrangements and point mutations are exclusively present in either the primary tumor or the metastasis and may affect cancer genes in a lesion-specific manner.
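The pairwise primary-versus-metastasis comparison is, at its core, a set comparison of somatic events followed by a lookup of which lesion-specific events hit cancer genes. A minimal sketch, assuming each event is encoded as a hashable tuple whose first element is the affected gene; the event labels, the gene `GENE_X`, and the function names are hypothetical.

```python
def compare_lesions(primary, metastasis):
    """primary / metastasis: sets of somatic events, each a tuple whose first
    element is the affected gene, e.g. ("KRAS", "event2")."""
    return {
        "shared": primary & metastasis,
        "primary_only": primary - metastasis,
        "metastasis_only": metastasis - primary,
    }

def lesion_specific_cancer_hits(primary, metastasis, cancer_genes):
    """Events private to one lesion that fall in the given cancer genes."""
    groups = compare_lesions(primary, metastasis)
    return {name: sorted(e for e in events if e[0] in cancer_genes)
            for name, events in groups.items() if name != "shared"}
```

The same comparison applies whether the events are point mutations or rearrangement breakpoints, provided both lesions are encoded consistently.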
We conclude that chromothripsis is a prevalent mechanism driving structural rearrangements in colorectal cancer and show that a complex interplay between point mutations, simple copy number changes and chromothripsis events drives colorectal tumor development and metastasis.
Genomic technologies, such as whole-exome sequencing, are a powerful tool in genetic research. Such testing yields a great deal of incidental medical information, that is, medical information not related to the primary research target. We describe the management of incidental medical information derived from whole-exome sequencing in the research context. We performed whole-exome sequencing on a monozygotic twin pair in which only 1 child was affected with congenital anomalies and applied an institutional review board–approved algorithm to determine what genetic information would be returned. Whole-exome sequencing identified 79 525 genetic variants in the twins. Here, we focus on novel variants. After filtering artifacts and excluding known single nucleotide polymorphisms and variants not predicted to be pathogenic, the twins had 32 novel variants in 32 genes that were judged likely to be associated with human disease. Eighteen of these novel variants were associated with recessive disease and 18 were associated with dominantly manifesting conditions (variants in some genes were potentially associated with both recessive and dominant conditions), but only 1 variant ultimately met our institutional review board–approved criteria for return of information to the research participants.
whole-exome sequencing; incidental medical information
Exome sequencing of human breast cancers has revealed a substantial number of candidate cancer genes with recurring but infrequent somatic mutations. To determine their mutation prevalence more accurately, we performed a mutation analysis of 36 novel candidate cancer genes in 96 human breast cancers. Somatic mutations with potential impact on protein function were observed in the genes ADAM12, CENTB1, CENTG1, DIP2C, GLI1, GRIN2D, HDLBP, IKBKB, KPNA5, NFKB1, NOTCH1, and OTOF. These findings strengthen the evidence for involvement of the Notch, Hedgehog, NF-κB, and PIK3CA pathways in breast cancer development, and point to novel processes that are likely involved.
Like other solid tumors, colorectal cancer (CRC) is a genomic disorder in which various types of genomic alterations, such as point mutations, genomic rearrangements, gene fusions, or chromosomal copy number alterations, can contribute to the initiation and progression of the disease. The advent of a new DNA sequencing technology known as next-generation sequencing (NGS) has revolutionized the speed and throughput of cataloguing such cancer-related genomic alterations. Now the challenge is how to exploit this advanced technology to better understand the underlying molecular mechanism of colorectal carcinogenesis and to identify clinically relevant genetic biomarkers for diagnosis and personalized therapeutics. In this review, we will introduce NGS-based cancer genomics studies focusing on those of CRC, including a recent large-scale report from the Cancer Genome Atlas. We will mainly discuss how NGS-based exome-, whole genome- and methylome-sequencing have extended our understanding of colorectal carcinogenesis. We will also introduce the unique genomic features of CRC discovered by NGS technologies, such as the relationship with bacterial pathogens and the massive genomic rearrangements of chromothripsis. Finally, we will discuss the necessary steps prior to development of a clinical application of NGS-related findings for the advanced management of patients with CRC.
Next-generation sequencing; Cancer genomics; Colorectal cancers; Personalized medicine; The cancer genome atlas
Lung cancer has become the top killer among malignant tumors in China and is significantly associated with somatic genetic alterations. We performed exome sequencing of 14 non–small cell lung carcinomas (NSCLCs) with matched adjacent normal lung tissues from Chinese patients. In addition to the lung cancer–related genes (TP53, EGFR, KRAS, PIK3CA, and ROS1), this study revealed "novel" genes not previously implicated in NSCLC. Notably, matrix-remodeling associated 5 was the second most frequently mutated gene in NSCLC (after TP53). Subsequent Sanger sequencing of matrix-remodeling associated 5 in an additional sample set consisting of 52 paired tumor-normal DNA samples revealed that 15% of Chinese NSCLCs contained somatic mutations in matrix-remodeling associated 5. These findings, together with the results from pathway analysis, strongly indicate that altered extracellular matrix remodeling may be involved in the etiology of NSCLC.
Metabolite profiles can be used for identifying molecular signatures and mechanisms underlying diseases since they reflect the outcome of complex upstream genomic, transcriptomic, proteomic and environmental events. The scarcity of publicly accessible large scale metabolome datasets related to human disease has been a major obstacle for assessing the potential of metabolites as biomarkers as well as understanding the molecular events underlying disease-related metabolic changes. The availability of metabolite and gene expression profiles for the NCI-60 cell lines offers the possibility of identifying significant metabolome and transcriptome features and discovering unique molecular processes related to different cancer types.
We utilized a combination of analytical methods in the R statistical package to evaluate metabolic features associated with cancer cell lines from different tissue origins, identify metabolite-gene correlations and detect outlier cell lines based on metabolome and transcriptome data. Statistical analysis results were integrated with metabolic pathway annotations as well as the COSMIC and Tumorscape databases to explore associated molecular mechanisms.
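Outlier detection on a single feature across cell lines can be illustrated with the modified z-score of Iglewicz and Hoaglin, which uses the median and median absolute deviation (MAD) so that the outliers themselves do not inflate the spread estimate. This is a swapped-in technique for illustration in Python rather than the authors' R analysis; the 3.5 cutoff is the conventional one, and the function name and data are hypothetical.

```python
from statistics import median

def robust_outliers(values, k=3.5):
    """values: dict cell line -> level of one metabolite or transcript.
    Flags lines whose modified z-score 0.6745 * (x - median) / MAD exceeds k."""
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:              # no spread: the score is undefined, flag nothing
        return []
    return sorted(name for name, v in values.items()
                  if abs(0.6745 * (v - med) / mad) > k)
```

Running this per feature, and then asking whether the same cell line is an outlier for a metabolite and for a gene in its pathway, is one simple way to surface abnormal gene-metabolite relationships.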
Our analysis reveals that although the NCI-60 metabolome dataset is quite noisy compared with microarray-based transcriptome data, it does contain tissue origin-specific signatures. We also identified biologically meaningful gene-metabolite associations. Most remarkably, several abnormal gene-metabolite relationships identified by our approach can be directly linked to known gene mutations and copy number variations in the corresponding cell lines.
Our results suggest that integrative metabolome and transcriptome analysis is a powerful method for understanding the molecular machinery underlying various pathophysiological processes. We expect that the availability of large-scale metabolome data in the coming years will significantly promote the discovery of novel biomarkers, which will in turn improve the understanding of the molecular mechanisms underlying diseases.
Lung cancer is the leading cause of cancer-related death, with non-small cell lung cancer (NSCLC) being the predominant form of the disease. Most lung cancer is caused by the accumulation of genomic alterations due to tobacco exposure. To uncover its mutational landscape, we performed whole-exome sequencing in 31 NSCLCs and their matched normal tissue samples. We identified both common and unique mutation spectra and pathway activation in lung adenocarcinomas and squamous cell carcinomas, two major histologies in NSCLC. In addition to identifying previously known lung cancer genes (TP53, KRAS, EGFR, CDKN2A and RB1), the analysis revealed many genes not previously implicated in this malignancy. Notably, a novel gene CSMD3 was identified as the second most frequently mutated gene (next to TP53) in lung cancer. We further demonstrated that loss of CSMD3 results in increased proliferation of airway epithelial cells. The study provides unprecedented insights into mutational processes, cellular pathways and gene networks associated with lung cancer. Of potential immediate clinical relevance, several highly mutated genes identified in our study are promising druggable targets in cancer therapy including ALK, CTNNA3, DCC, MLL3, PCDH11X, PIK3C2B, PIK3CG and ROCK2.
The joint sequencing of related genomes has become an important means to discover rare variants. Normal-tumor genome pairs are routinely sequenced together to find somatic mutations and their associations with different cancers. Parental and sibling genomes reveal de novo germline mutations and inheritance patterns related to Mendelian diseases.
Acute lymphoblastic leukemia (ALL) is the most common paediatric cancer and the leading cause of cancer-related death among children. With the aim of uncovering the full spectrum of germline and somatic genetic alterations in childhood ALL genomes, we conducted whole-exome re-sequencing on a unique cohort of over 120 exomes of childhood ALL quartets, each comprising a patient's tumor and matched-normal material, and DNA from both parents. We developed a general probabilistic model for such quartet sequencing reads mapped to the reference human genome. The model is used to infer joint genotypes at homologous loci across a normal-tumor genome pair and two parental genomes.
We describe the algorithms and data structures for genotype inference and model parameter training. We implemented the methods in an open-source software package (QUADGT) that uses the standard file formats of the 1000 Genomes Project. Our method's utility is illustrated on quartets from the ALL cohort.
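The idea of inferring joint genotypes across a quartet can be sketched by enumerating all genotype combinations at one biallelic site and weighting each by population, inheritance, somatic-change and read-evidence terms. This is a deliberately simplified, hypothetical model and not QUADGT's actual model or API: it assumes Hardy-Weinberg priors for the parents, Mendelian transmission with no germline de novo mutation, a uniform somatic change probability `s`, and a symmetric per-read error rate `err`.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of alternate alleles at a biallelic site

def read_lik(g, ref, alt, err=0.01):
    """P(reads | genotype), up to a binomial coefficient that cancels."""
    q = (g / 2) * (1 - err) + (1 - g / 2) * err   # P(a read shows the alt allele)
    return q ** alt * (1 - q) ** ref

def hwe_prior(g, f):
    return ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)[g]

def mendel(gc, gm, gf):
    """P(child genotype gc | mother gm, father gf)."""
    pm, pf = gm / 2, gf / 2                       # P(each parent transmits alt)
    return ((1 - pm) * (1 - pf), pm * (1 - pf) + (1 - pm) * pf, pm * pf)[gc]

def somatic(gt, gn, s):
    """P(tumor genotype gt | normal genotype gn); change spread evenly."""
    return 1 - s if gt == gn else s / 2

def quartet_posterior(reads, f=0.01, s=1e-6, err=0.01):
    """reads: dict with keys 'mother', 'father', 'normal', 'tumor', each a
    (ref_count, alt_count) pair. Returns the posterior over joint genotypes
    (mother, father, normal, tumor)."""
    post = {}
    for gm, gf, gn, gt in product(GENOTYPES, repeat=4):
        p = hwe_prior(gm, f) * hwe_prior(gf, f) * mendel(gn, gm, gf) * somatic(gt, gn, s)
        for who, g in zip(("mother", "father", "normal", "tumor"), (gm, gf, gn, gt)):
            p *= read_lik(g, *reads[who], err=err)
        post[(gm, gf, gn, gt)] = p
    z = sum(post.values())
    return {k: v / z for k, v in post.items()}
```

Because the tumor genotype is conditioned on the normal genotype, and the normal on both parents, strong parental evidence can rescue a weakly covered normal sample, and a low somatic prior means a tumor-only change must be well supported by reads before it is called.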