1.  Global diversity, population stratification, and selection of human copy number variation 
Science (New York, N.Y.)  2015;349(6253):aab3761.
In order to explore the diversity and selective signatures of duplication and deletion human copy number variants (CNVs), we sequenced 236 individuals from 125 distinct human populations. We observed that duplications exhibit fundamentally different population genetic and selective signatures than deletions and are more likely to be stratified between human populations. Through reconstruction of the ancestral human genome, we identify megabases of DNA lost in different human lineages and pinpoint large duplications that introgressed from the extinct Denisova lineage now found at high frequency exclusively in Oceanic populations. We find that the proportion of CNV base pairs to single nucleotide variant base pairs is greater among non-Africans than it is among African populations, but we conclude that this difference is likely due to unique aspects of non-African population history as opposed to differences in CNV load.
PMCID: PMC4568308  PMID: 26249230
2.  Disruptive CHD8 mutations define a subtype of autism early in development 
Cell  2014;158(2):263-276.
Autism spectrum disorder (ASD) is a heterogeneous disease where efforts to define subtypes behaviorally have met with limited success. Hypothesizing that genetically based subtype identification may prove more productive, we resequenced the ASD-associated gene CHD8 in 3,730 children with developmental delay or ASD. We identified a total of 15 independent mutations; no truncating events were identified in 8,792 controls, including 2,289 unaffected siblings. In addition to a high likelihood of an ASD diagnosis among patients bearing CHD8 mutations, characteristics enriched in this group included macrocephaly, distinct faces, and gastrointestinal complaints. chd8 disruption in zebrafish recapitulates features of the human phenotype, including increased head size as a result of expansion of the forebrain/midbrain and impairment of gastrointestinal motility due to a reduction in post-mitotic enteric neurons. Our findings indicate that CHD8 disruptions define a distinct ASD subtype and reveal unexpected comorbidities between brain development and enteric innervation.
PMCID: PMC4136921  PMID: 24998929
Autism spectrum disorder; autism subtypes; dysmorphology; macrocephaly; gastrointestinal defect; zebrafish modeling; enteric neurons; forebrain/midbrain expansion
3.  Refining analyses of copy number variation identifies specific genes associated with developmental delay 
Nature genetics  2014;46(10):1063-1071.
Copy number variants (CNVs) are associated with many neurocognitive disorders; however, these events are typically large and the underlying causative gene is unclear. We created an expanded CNV morbidity map from 29,085 children with developmental delay versus 19,584 healthy controls, identifying 70 significant CNVs. We resequenced 26 candidate genes in 4,716 additional cases with developmental delay or autism and 2,193 controls. An integrated analysis of CNV and single-nucleotide variant (SNV) data pinpointed ten genes enriched for putative loss of function. Patient follow-up on a subset identified new clinical subtypes of pediatric disease and the genes responsible for disease-associated CNVs. This includes haploinsufficiency of SETBP1 associated with intellectual disability and loss of expressive language and truncations of ZMYND11 in patients with autism, aggression and complex neuropsychiatric features. This combined CNV and SNV approach facilitates the rapid discovery of new syndromes and neuropsychiatric disease genes despite extensive genetic heterogeneity.
PMCID: PMC4177294  PMID: 25217958
4.  The Transcriptional Regulator ADNP Links the BAF (SWI/SNF) Complexes With Autism 
Mutations in ADNP were recently identified as a frequent cause of syndromic autism, characterized by deficits in social communication and interaction and restricted, repetitive behavioral patterns. Based on its functional domains, ADNP is a presumed transcription factor. The gene interacts closely with the SWI/SNF complex by direct and experimentally verified binding of its C-terminus to three of its core components. A detailed and systematic clinical assessment of the symptoms observed in our patients allows a detailed comparison with the symptoms observed in other SWI/SNF disorders. While the mutations in the first 10 patients identified suggested a gain-of-function mechanism, an 11th patient reported here is predicted to be haploinsufficient. The latter observation may raise hope for therapy, as addition of NAP, a neuroprotective octapeptide named after the first three amino acids of the sequence NAPVSPIQ, has been reported by others to ameliorate some of the cognitive abnormalities observed in a knockout mouse model. It is concluded that detailed clinical and molecular studies on larger cohorts of patients are necessary to gain better insight into the genotype-phenotype correlation and the mutational mechanism.
PMCID: PMC4195434  PMID: 25169753
autism; SWI/SNF; BAF complexes; ADNP
5.  A SWI/SNF related autism syndrome caused by de novo mutations in ADNP 
Nature genetics  2014;46(4):380-384.
Despite a high heritability, a genetic diagnosis can only be established in a minority of patients with autism spectrum disorder (ASD), characterized by persistent deficits in social communication and interaction and restricted, repetitive patterns of behavior, interests or activities [1]. Known genetic causes include chromosomal aberrations, such as the duplication of the 15q11-13 region, and monogenic causes, such as the Rett and Fragile X syndromes. The genetic heterogeneity within ASD is striking, with even the most frequent causes responsible for only 1% of cases at the most. Even with the recent developments in next generation sequencing, for the large majority of cases no molecular diagnosis can be established [2-7]. Here, we report 10 patients with ASD and other shared clinical characteristics, including intellectual disability and facial dysmorphisms caused by a mutation in ADNP, a transcription factor involved in the SWI/SNF remodeling complex. We estimate this gene to be mutated in at least 0.17% of ASD cases, making it one of the most frequent ASD genes known to date.
PMCID: PMC3990853  PMID: 24531329
6.  EZH2 promotes E2F driven SCLC tumorigenesis through modulation of apoptosis and cell cycle regulation 
While EZH2 has been associated with both non small cell and small cell lung cancers, current observations suggest different mechanisms of EZH2 activation and overexpression in these lung cancer types. Globally, small cell lung cancer (SCLC) kills 200,000 people yearly. New clinical approaches for SCLC treatment are required to improve the poor survival rate. Given the therapeutic potential of EZH2 as a target, we sought to delineate the downstream consequences of EZH2 disruption to identify the cellular mechanisms by which EZH2 promotes tumorigenesis in SCLC.
We generated cells with stable expression of shRNA targeting EZH2 and corresponding controls (pLKO.1) and determined the consequences of EZH2 knockdown on the cell cycle and apoptosis by means of propidium iodide staining and fluorescence activated cell sorting, western blot, qRT-PCR as well as cell viability assessment using MTT assays.
We discovered that EZH2 inhibition 1) increased apoptotic activity by up-regulating the pro-apoptotic factors Puma and Bad, 2) decreased the fraction of cells in S or G2/M phases, and 3) elevated p21 protein levels, implicating EZH2 in cell death and cell cycle control in SCLC.
Our findings present evidence for the role of EZH2 in the regulation of cell cycle and apoptosis, providing a biological mechanism to explain the tumorigenicity of EZH2 in SCLC. Our work points to the great potential of EZH2 as a therapeutic target in SCLC.
PMCID: PMC3713495  PMID: 23857401
SCLC; EZH2; oncogene; RB1; E2F
7.  The Genetic Variability and Commonality of Neurodevelopmental Disease 
Despite detailed clinical definition and refinement of neurodevelopmental disorders and neuropsychiatric conditions, the underlying genetic etiology has proved elusive. Recent genetic studies have revealed some common themes: considerable locus heterogeneity, variable expressivity for the same mutation, and a role for multiple disruptive events in the same individual affecting genes in common pathways. Recurrent copy number variation (CNV), in particular, has emphasized the importance of either de novo or essentially private mutations creating imbalances for multiple genes. CNVs have foreshadowed a model where the distinction between milder neuropsychiatric conditions and those of severe developmental impairment may be a consequence of increased mutational burden affecting more genes.
PMCID: PMC4114147  PMID: 22499536
copy number variants; variable penetrance; genomic disorders; autism; schizophrenia; intellectual disability
8.  Genome structural variation discovery and genotyping 
Nature reviews. Genetics  2011;12(5):363-376.
Comparisons of human genomes show that more base pairs are altered as a result of structural variation — including copy number variation — than as a result of point mutations. Here we review advances and challenges in the discovery and genotyping of structural variation. The recent application of massively parallel sequencing methods has complemented microarray-based methods and has led to an exponential increase in the discovery of smaller structural-variation events. Some global discovery biases remain, but the integration of experimental and computational approaches is proving fruitful for accurate characterization of the copy, content and structure of variable regions. We argue that the long-term goal should be routine, cost-effective and high quality de novo assembly of human genomes to comprehensively assess all classes of structural variation.
PMCID: PMC4108431  PMID: 21358748
9.  Estimates of penetrance for recurrent pathogenic copy-number variations 
Although an increasing number of copy-number variations are being identified as susceptibility loci for a variety of pediatric diseases, the penetrance of these copy-number variations remains mostly unknown. This poses challenges for counseling, both for recurrence risks and prenatal diagnosis. We sought to provide empiric estimates for penetrance for some of these recurrent, disease-susceptibility loci.
We conducted a Bayesian analysis, based on the copy-number variation frequencies in control populations (n = 22,246) and in our database of >48,000 postnatal microarray-based comparative genomic hybridization samples. The background risk for congenital anomalies/developmental delay/intellectual disability was assumed to be ~5%. Copy-number variations studied were 1q21.1 proximal duplications, 1q21.1 distal deletions and duplications, 15q11.2 deletions, 16p13.11 deletions, 16p12.1 deletions, 16p11.2 proximal and distal deletions and duplications, 17q12 deletions and duplications, and 22q11.21 duplications.
Estimates for the risk of an abnormal phenotype ranged from 10.4% for 15q11.2 deletions to 62.4% for distal 16p11.2 deletions.
This model can be used to provide more precise estimates for the chance of an abnormal phenotype for many copy-number variations encountered in the prenatal setting. By providing the penetrance, additional, critical information can be given to prospective parents in the genetic counseling session.
PMCID: PMC3664238  PMID: 23258348
copy-number variation; genomic disorder; microarray; penetrance; prenatal diagnosis
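The Bayesian estimate described in entry 9 can be illustrated with a minimal sketch. It assumes the simplest form of Bayes' rule, combining the frequency of a copy-number variation in affected and control populations with the ~5% background risk stated in the abstract. The function name and the example frequencies are hypothetical and are not taken from the paper; the authors' actual model may differ in detail.

```python
def penetrance(freq_cases: float, freq_controls: float,
               background_risk: float = 0.05) -> float:
    """Estimate P(abnormal phenotype | CNV carrier) with Bayes' rule.

    freq_cases      -- P(CNV | affected), CNV frequency in the case cohort
    freq_controls   -- P(CNV | unaffected), CNV frequency in controls
    background_risk -- P(affected), prior risk in the general population
                       (the abstract assumes about 5%)
    """
    for name, value in (("freq_cases", freq_cases),
                        ("freq_controls", freq_controls),
                        ("background_risk", background_risk)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")

    # Numerator: P(CNV | affected) * P(affected)
    numerator = freq_cases * background_risk
    # Denominator: total probability of carrying the CNV
    denominator = numerator + freq_controls * (1.0 - background_risk)
    if denominator == 0.0:
        raise ValueError("CNV not observed in cases or controls")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical frequencies, chosen only to show the arithmetic.
    estimate = penetrance(freq_cases=0.004, freq_controls=0.001)
    print(f"Estimated penetrance: {estimate:.1%}")
```

When a variation is equally common in cases and controls, the estimate falls back to the background risk, which is a useful sanity check on any input frequencies.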
10.  Genomic Deregulation of the E2F/Rb Pathway Leads to Activation of the Oncogene EZH2 in Small Cell Lung Cancer 
PLoS ONE  2013;8(8):e71670.
Small cell lung cancer (SCLC) is a highly aggressive lung neoplasm with extremely poor clinical outcomes and no approved targeted treatments. To elucidate the mechanisms responsible for driving the SCLC phenotype in hopes of revealing novel therapeutic targets, we studied copy number and methylation profiles of SCLC. We found that disruption of the E2F/Rb pathway was a prominent feature, deregulated in 96% of the SCLC samples investigated, and was strongly associated with increased expression of EZH2, an oncogene and core member of the polycomb repressive complex 2 (PRC2). Through its catalytic role in the PRC2 complex, EZH2 normally functions to epigenetically silence genes during development; however, it aberrantly silences genes in human cancers. We provide evidence to support that EZH2 is functionally active in SCLC tumours, exerts pro-tumourigenic functions in vitro, and is associated with aberrant methylation profiles of PRC2 target genes indicative of a “stem-cell like” hypermethylator profile in SCLC tumours. Furthermore, lentiviral-mediated knockdown of EZH2 demonstrated a significant reduction in the growth of SCLC cell lines, suggesting EZH2 has a key role in driving SCLC biology. In conclusion, our data confirm the role of EZH2 as a critical oncogene in SCLC, and lend support to the prioritization of EZH2 as a potential therapeutic target in clinical disease.
PMCID: PMC3744458  PMID: 23967231
12.  Phenotypic Heterogeneity of Genomic Disorders and Rare Copy-Number Variants 
The New England journal of medicine  2012;367(14):1321-1331.
Some copy-number variants are associated with genomic disorders with extreme phenotypic heterogeneity. The cause of this variation is unknown, which presents challenges in genetic diagnosis, counseling, and management.
We analyzed the genomes of 2312 children known to carry a copy-number variant associated with intellectual disability and congenital abnormalities, using array comparative genomic hybridization.
Among the affected children, 10.1% carried a second large copy-number variant in addition to the primary genetic lesion. We identified seven genomic disorders, each defined by a specific copy-number variant, in which the affected children were more likely to carry multiple copy-number variants than were controls. We found that syndromic disorders could be distinguished from those with extreme phenotypic heterogeneity on the basis of the total number of copy-number variants and whether the variants are inherited or de novo. Children who carried two large copy-number variants of unknown clinical significance were eight times as likely to have developmental delay as were controls (odds ratio, 8.16; 95% confidence interval, 5.33 to 13.07; P = 2.11×10⁻³⁸). Among affected children, inherited copy-number variants tended to co-occur with a second-site large copy-number variant (Spearman correlation coefficient, 0.66; P<0.001). Boys were more likely than girls to have disorders of phenotypic heterogeneity (P<0.001), and mothers were more likely than fathers to transmit second-site copy-number variants to their offspring (P = 0.02).
Multiple, large copy-number variants, including those of unknown pathogenic significance, compound to result in a severe clinical presentation, and secondary copy-number variants are preferentially transmitted from maternal carriers. (Funded by the Simons Foundation Autism Research Initiative and the National Institutes of Health.)
PMCID: PMC3494411  PMID: 22970919
13.  A Genetic Model for Neurodevelopmental Disease 
Current opinion in neurobiology  2012;22(5):829-836.
The genetic basis of neurodevelopmental and neuropsychiatric diseases has been advanced by the discovery of large and recurrent copy number variants significantly enriched in cases when compared to controls. The pattern of this variation strongly implies that rare variants contribute significantly to neurological disease; that different genes will be responsible for similar diseases in different families; and that the same “primary” genetic lesions can result in a different disease outcome depending potentially on the genetic background. Next-generation sequencing technologies are beginning to broaden the spectrum of disease-causing variation and provide specificity by pinpointing both genes and pathways for future diagnostics and therapeutics.
PMCID: PMC3437230  PMID: 22560351
14.  Germline DNA Copy Number Aberrations Identified as Potential Prognostic Factors for Breast Cancer Recurrence 
PLoS ONE  2013;8(1):e53850.
Breast cancer recurrence (BCR) is a common treatment outcome despite curative-intent primary treatment of non-metastatic breast cancer. Currently used prognostic and predictive factors utilize tumor-based markers, and are not optimal determinants of risk of BCR. Germline-based copy number aberrations (CNAs) have not been evaluated as determinants of predisposition to experience BCR. In this study, we accessed germline DNA from 369 female breast cancer subjects who received curative-intent primary treatment following diagnosis. Of these, 155 experienced BCR and 214 did not, after a median duration of follow up after breast cancer diagnosis of 6.35 years (range = 0.60–21.78) and 8.60 years (range = 3.08–13.57), respectively. Whole genome CNA genotyping was performed on the Affymetrix SNP array 6.0 platform. CNAs were identified using the SNP-Fast Adaptive States Segmentation Technique 2 algorithm implemented in Nexus Copy Number 6.0. Six samples were removed due to poor quality scores, leaving 363 samples for further analysis. We identified 18,561 CNAs with ≥1 kb as a predefined cut-off for observed aberrations. Univariate survival analyses (log-rank tests) identified seven CNAs (two copy number gains and five copy-neutral losses of heterozygosity, CN-LOHs) showing significant differences (P<2.01×10⁻⁵) in recurrence-free survival (RFS) probabilities with and without CNAs. We also observed three additional but distinct CN-LOHs showing significant differences in RFS probabilities (P<2.86×10⁻⁵) when analyses were restricted to stratified cases (luminal A, n = 208) only. After adjusting for tumor stage and grade in multivariate analyses (Cox proportional hazards models), all the CNAs remained strongly associated with the phenotype of BCR. Of these, we confirmed three CNAs at 17q11.2, 11q13.1 and 6q24.1 in representative samples using independent genotyping platforms.
Our results suggest further investigations on the potential use of germline DNA variations as prognostic markers in cancer-associated phenotypes.
PMCID: PMC3547038  PMID: 23342018
15.  Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations 
Nature  2012;485(7397):246-250.
It is well established that autism spectrum disorders (ASD) have a strong genetic component. However, for at least 70% of cases, the underlying genetic cause is unknown [1]. Under the hypothesis that de novo mutations underlie a substantial fraction of the risk for developing ASD in families with no previous history of ASD or related phenotypes (so-called sporadic or simplex families [2,3]), we sequenced all coding regions of the genome, i.e. the exome, for parent-child trios exhibiting sporadic ASD, including 189 new trios and 20 previously reported [4]. We also sequenced the exomes of 50 unaffected siblings corresponding to these new (n = 31) and previously reported trios (n = 19) [4], for a total of 677 individual exomes from 209 families. Here we show de novo point mutations are overwhelmingly paternal in origin (4:1 bias) and positively correlated with paternal age, consistent with the modest increased risk for children of older fathers to develop ASD [5]. Moreover, 39% (49/126) of the most severe or disruptive de novo mutations map to a highly interconnected beta-catenin/chromatin remodeling protein network ranked significantly for autism candidate genes. In proband exomes, recurrent protein-altering mutations were observed in two genes, CHD8 and NTNG1. Mutation screening of six candidate genes in 1,703 ASD probands identified additional de novo, protein-altering mutations in GRIN2B, LAMC3, and SCN1A. Combined with copy number variant (CNV) data, these results suggest extreme locus heterogeneity but also provide a target for future discovery, diagnostics, and therapeutics.
PMCID: PMC3350576  PMID: 22495309
16.  Integrating the multiple dimensions of genomic and epigenomic landscapes of cancer 
Cancer metastasis reviews  2010;29(1):73-93.
Advances in high-throughput, genome-wide profiling technologies have allowed for an unprecedented view of the cancer genome landscape. Specifically, high-density microarrays and sequencing-based strategies have been widely utilized to identify genetic (such as gene dosage, allelic status, and mutations in gene sequence) and epigenetic (such as DNA methylation, histone modification, and micro-RNA) aberrations in cancer. Although the application of these profiling technologies in unidimensional analyses has been instrumental in cancer gene discovery, genes affected by low-frequency events are often overlooked. The integrative approach of analyzing parallel dimensions has enabled the identification of (a) genes that are often disrupted by multiple mechanisms but at low frequencies by any one mechanism and (b) pathways that are often disrupted at multiple components but at low frequencies at individual components. These benefits of using an integrative approach illustrate the concept that the whole is greater than the sum of its parts. As efforts have now turned toward parallel and integrative multidimensional approaches for studying the cancer genome landscape in hopes of obtaining a more insightful understanding of the key genes and pathways driving cancer cells, this review describes key findings disseminating from such high-throughput, integrative analyses, including contributions to our understanding of causative genetic events in cancer cell biology.
PMCID: PMC3415277  PMID: 20108112
Integrative analysis; Cancer genome; Sequencing; Microarray
17.  Divergent Genomic and Epigenomic Landscapes of Lung Cancer Subtypes Underscore the Selection of Different Oncogenic Pathways during Tumor Development 
PLoS ONE  2012;7(5):e37775.
For therapeutic purposes, non-small cell lung cancer (NSCLC) has traditionally been regarded as a single disease. However, recent evidence suggests that the two major subtypes of NSCLC, adenocarcinoma (AC) and squamous cell carcinoma (SqCC), respond differently to both molecular targeted and new generation chemotherapies. Therefore, identifying the molecular differences between these tumor types may impact novel treatment strategy. We performed the first large-scale analysis of 261 primary NSCLC tumors (169 AC and 92 SqCC), integrating genome-wide DNA copy number, methylation and gene expression profiles to identify subtype-specific molecular alterations relevant to new agent design and choice of therapy. Comparison of AC and SqCC genomic and epigenomic landscapes revealed 778 altered genes with corresponding expression changes that are selected during tumor development in a subtype-specific manner. Analysis of >200 additional NSCLCs confirmed that these genes are responsible for driving the differential development and resulting phenotypes of AC and SqCC. Importantly, we identified key oncogenic pathways disrupted in each subtype that likely serve as the basis for their differential tumor biology and clinical outcomes. Downregulation of HNF4α target genes was the most common pathway specific to AC, while SqCC demonstrated disruption of numerous histone modifying enzymes as well as the transcription factor E2F1. In silico screening of candidate therapeutic compounds using subtype-specific pathway components identified HDAC and PI3K inhibitors as potential treatments tailored to lung SqCC. Together, our findings suggest that AC and SqCC develop through distinct pathogenetic pathways that have significant implications for our approach to the clinical management of NSCLC.
PMCID: PMC3357406  PMID: 22629454
18.  A Copy Number Variation Morbidity Map of Developmental Delay 
Nature genetics  2011;43(9):838-846.
To understand the genetic heterogeneity underlying developmental delay, we compare copy-number variants (CNVs) in 15,767 children with intellectual disability and various congenital defects to 8,329 adult controls. We estimate that ~14.2% of disease in these individuals is due to large CNVs > 400 kbp. We find greater CNV enrichment in patients with craniofacial anomalies and cardiovascular defects than epilepsy or autism. We identify 59 pathogenic CNVs including 14 novel or previously weakly supported candidates. We refine the critical interval for several genomic disorders such as the 17q21.31 microdeletion syndrome and identify 940 candidate dosage-sensitive genes. We also develop methods to opportunistically discover small, disruptive CNVs within the large and growing diagnostic array datasets. This evolving CNV morbidity map combined with exome/genome sequencing will be critical for deciphering the genetic basis of developmental delay, intellectual disability, and autism spectrum disorders.
PMCID: PMC3171215  PMID: 21841781
19.  Relative Burden of Large CNVs on a Range of Neurodevelopmental Phenotypes 
PLoS Genetics  2011;7(11):e1002334.
While numerous studies have implicated copy number variants (CNVs) in a range of neurological phenotypes, the impact relative to disease severity has been difficult to ascertain due to small sample sizes, lack of phenotypic details, and heterogeneity in platforms used for discovery. Using a customized microarray enriched for genomic hotspots, we assayed for large CNVs among 1,227 individuals with various neurological deficits including dyslexia (376), sporadic autism (350), and intellectual disability (ID) (501), as well as 337 controls. We show that the frequency of large CNVs (>1 Mbp) is significantly greater for ID-associated phenotypes compared to autism (p = 9.58×10⁻¹¹, odds ratio = 4.59), dyslexia (p = 3.81×10⁻¹⁸, odds ratio = 14.45), or controls (p = 2.75×10⁻¹⁷, odds ratio = 13.71). There is a striking difference in the frequency of rare CNVs (>50 kbp) in autism (10%, p = 2.4×10⁻⁶, odds ratio = 6) or ID (16%, p = 3.55×10⁻¹², odds ratio = 10) compared to dyslexia (2%) with essentially no difference in large CNV burden among dyslexia patients compared to controls. Rare CNVs were more likely to arise de novo (64%) in ID when compared to autism (40%) or dyslexia (0%). We observed a significantly increased large CNV burden in individuals with ID and multiple congenital anomalies (MCA) compared to ID alone (p = 0.001, odds ratio = 2.54). Our data suggest that large CNV burden positively correlates with the severity of childhood disability: ID with MCA being most severely affected and dyslexics being indistinguishable from controls. When autism without ID was considered separately, the increase in CNV burden was modest compared to controls (p = 0.07, odds ratio = 2.33).
Author Summary
Deletions and duplications, termed copy number variants (CNVs), have been implicated in a variety of neurodevelopmental disorders including intellectual disability (ID), autism, and schizophrenia. Our understanding of the relevance of large, rare CNVs in a range of neurodevelopmental phenotypes, varying in severity and prevalence, has been difficult because these studies were restricted to the analysis of one disorder at a time using different CNV detection platforms, insufficient sample sizes, and a lack of detailed clinical information. We tested 1,227 individuals with different neurological diseases including dyslexia, autism, and ID using the same CNV detection platform. We observed striking differences in CNV burden and inheritance characteristics among these cohorts and show that ID is the primary correlate of large CNV burden. This correlation is well illustrated by a comparison of autism patients with and without ID—where the latter show only modest increases in large CNV burden compared to controls. We also find significant depletion in the frequency of large CNVs in dyslexia compared to the other cohorts. Further studies on larger sets of individuals using high-resolution arrays and next-generation sequencing are warranted for a detailed understanding of the relative contribution of genetic variants to neurodevelopmental disorders.
PMCID: PMC3213131  PMID: 22102821
20.  Comparison of genome-wide array genomic hybridization platforms for the detection of copy number variants in idiopathic mental retardation 
BMC Medical Genomics  2011;4:25.
Clinical laboratories are adopting array genomic hybridization as a standard clinical test. A number of whole genome array genomic hybridization platforms are available, but little is known about their comparative performance in a clinical context.
We studied 30 children with idiopathic MR and both unaffected parents of each child using Affymetrix 500 K GeneChip SNP arrays, Agilent Human Genome 244 K oligonucleotide arrays and NimbleGen 385 K Whole-Genome oligonucleotide arrays. We also determined whether CNVs called on these platforms were detected by Illumina Hap550 beadchips or SMRT 32 K BAC whole genome tiling arrays and tested 15 of the 30 trios on Affymetrix 6.0 SNP arrays.
The Affymetrix 500 K, Agilent and NimbleGen platforms identified 3061 autosomal and 117 X chromosomal CNVs in the 30 trios. 147 of these CNVs appeared to be de novo, but only 34 (22%) were found on more than one platform. Performing genotype-phenotype correlations, we identified 7 most likely pathogenic and 2 possibly pathogenic CNVs for MR. All 9 of these putatively pathogenic CNVs were detected by the Affymetrix 500 K, Agilent, NimbleGen and the Illumina arrays, and 5 were found by the SMRT BAC array. Both putatively pathogenic CNVs identified in the 15 trios tested with the Affymetrix 6.0 were identified by this platform.
Our findings demonstrate that different results are obtained with different platforms and illustrate the trade-off that exists between sensitivity and specificity. The large number of apparently false positive CNV calls on each of the platforms supports the need for validating clinically important findings with a different technology.
PMCID: PMC3076225  PMID: 21439053
21.  A sequence-based approach to identify reference genes for gene expression analysis 
BMC Medical Genomics  2010;3:32.
An important consideration when analyzing both microarray and quantitative PCR expression data is the selection of appropriate genes as endogenous controls or reference genes. This step is especially critical when identifying genes differentially expressed between datasets. Moreover, reference genes suitable in one context (e.g. lung cancer) may not be suitable in another (e.g. breast cancer). Currently, the main approach to identify reference genes involves the mining of expression microarray data for highly expressed and relatively constant transcripts across a sample set. A caveat here is the requirement for transcript normalization prior to analysis, and measurements obtained are relative, not absolute. Alternatively, as sequencing-based technologies provide digital quantitative output, absolute quantification ensues, and reference gene identification becomes more accurate.
Serial analysis of gene expression (SAGE) profiles of non-malignant and malignant lung samples were compared using a permutation test to identify the most stably expressed genes across all samples. Subsequently, the specificity of the reference genes was evaluated across multiple tissue types, their constancy of expression was assessed using quantitative RT-PCR (qPCR), and their impact on differential expression analysis of microarray data was evaluated.
We show that (i) conventional reference genes such as ACTB and GAPDH are highly variable between cancerous and non-cancerous samples, (ii) reference genes identified for lung cancer do not perform well for other cancer types (breast and brain), (iii) reference genes identified through SAGE show low variability using qPCR in a different cohort of samples, and (iv) normalization of a lung cancer gene expression microarray dataset with or without our reference genes yields different results for differential gene expression and subsequent analyses. Specifically, key established pathways in lung cancer exhibit higher statistical significance using a dataset normalized with our reference genes relative to normalization without using our reference genes.
Our analyses found NDUFA1, RPL19, RAB5C, and RPS18 to occupy the top ranking positions among 15 suitable reference genes optimal for normalization of lung tissue expression data. Significantly, the approach used in this study can be applied to data generated using new generation sequencing platforms for the identification of reference genes optimal within diverse contexts.
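The idea of ranking candidate reference genes by how constant their expression is across samples can be illustrated with a minimal sketch. It does not reproduce the permutation test used in the study; it substitutes a simpler coefficient-of-variation ranking, and the function name, the `min_mean` threshold, and the example gene labels and values are all hypothetical.

```python
from statistics import mean, pstdev


def rank_by_stability(counts, min_mean=1.0):
    """Rank genes from most to least stable across samples.

    counts: dict mapping gene name -> list of expression values, one per sample.
    min_mean: hypothetical cutoff; genes with a lower mean are skipped because
              a reference gene should be robustly expressed.
    Returns a list of (gene, coefficient_of_variation), lowest CV first.
    """
    ranked = []
    for gene, values in counts.items():
        m = mean(values)
        if m < min_mean:
            continue  # too weakly expressed to serve as a reference
        cv = pstdev(values) / m  # spread relative to the mean
        ranked.append((gene, cv))
    ranked.sort(key=lambda item: item[1])
    return ranked


# Made-up values for illustration only (not data from the study).
example = {
    "GENE_A": [100, 102, 98, 101],   # nearly constant
    "GENE_B": [50, 200, 10, 120],    # highly variable
    "GENE_C": [0.1, 0.2, 0.1, 0.0],  # below the expression cutoff
}
ranking = rank_by_stability(example)
```

A low coefficient of variation is only one possible stability criterion; a permutation-based test, as described in the abstract, additionally assesses whether the observed constancy could arise by chance.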
PMCID: PMC2928167  PMID: 20682026
22.  Integrative Genomic Analyses Identify BRF2 as a Novel Lineage-Specific Oncogene in Lung Squamous Cell Carcinoma 
PLoS Medicine  2010;7(7):e1000315.
William Lockwood and colleagues show that the focal amplification of a gene, BRF2, on Chromosome 8p12 plays a key role in squamous cell carcinoma of the lung.
Traditionally, non-small cell lung cancer is treated as a single disease entity in terms of systemic therapy. Emerging evidence suggests the major subtypes—adenocarcinoma (AC) and squamous cell carcinoma (SqCC)—respond differently to therapy. Identification of the molecular differences between these tumor types will have a significant impact in designing novel therapies that can improve the treatment outcome.
Methods and Findings
We used an integrative genomics approach, combining high-resolution comparative genomic hybridization and gene expression microarray profiles, to compare AC and SqCC tumors in order to uncover alterations at the DNA level, with corresponding gene transcription changes, which are selected for during development of lung cancer subtypes. Through the analysis of multiple independent cohorts of clinical tumor samples (>330), normal lung tissues and bronchial epithelial cells obtained by bronchial brushing in smokers without lung cancer, we identified the overexpression of BRF2, a gene on Chromosome 8p12, which is specific for development of SqCC of lung. Genetic activation of BRF2, which encodes an RNA polymerase III (Pol III) transcription initiation factor, was found to be associated with increased expression of small nuclear RNAs (snRNAs) that are involved in processes essential for cell growth, such as RNA splicing. Ectopic expression of BRF2 in human bronchial epithelial cells induced a transformed phenotype and demonstrated downstream oncogenic effects, whereas RNA interference (RNAi)-mediated knockdown suppressed growth and colony formation of SqCC cells overexpressing BRF2, but not AC cells. Frequent activation of BRF2 in >35% preinvasive bronchial carcinoma in situ, as well as in dysplastic lesions, provides evidence that BRF2 expression is an early event in cancer development of this cell lineage.
This is the first study, to our knowledge, to show that the focal amplification of a gene in Chromosome 8p12 plays a key role in squamous cell lineage specificity of the disease. Our data suggest that genetic activation of BRF2 represents a unique mechanism of SqCC lung tumorigenesis through the increase of Pol III-mediated transcription. It can serve as a marker for lung SqCC and may provide a novel target for therapy.
Please see later in the article for the Editors' Summary
Editors' Summary
Lung cancer is the commonest cause of cancer-related death. Every year, 1.3 million people die from this disease, which is mainly caused by smoking. Most cases of lung cancer are “non-small cell lung cancers” (NSCLCs). Like all cancers, NSCLC starts when cells begin to divide uncontrollably and to move round the body (metastasize) because of changes (mutations) in their genes. These mutations are often in “oncogenes,” genes that, when activated, encourage cell division. Oncogenes can be activated by mutations that alter the properties of the proteins they encode or by mutations that increase the amount of protein made from them, such as gene amplification (an increase in the number of copies of a gene). If NSCLC is diagnosed before it has spread from the lungs (stage I disease), it can be surgically removed and many patients with stage I NSCLC survive for more than 5 years after their diagnosis. Unfortunately, in more than half of patients, NSCLC has metastasized before it is diagnosed. This stage IV NSCLC can be treated with chemotherapy (toxic chemicals that kill fast-growing cancer cells) but only 2% of patients with stage IV lung cancer are alive 5 years after diagnosis.
Why Was This Study Done?
Traditionally, NSCLC has been regarded as a single disease in terms of treatment. However, emerging evidence suggests that the two major subtypes of NSCLC—adenocarcinoma and squamous cell carcinoma (SqCC)—respond differently to chemotherapy. Adenocarcinoma and SqCC start in different types of lung cell and experts think that for each cell type in the body, specific combinations of mutations interact with the cell type's own unique characteristics to provide the growth and survival advantage needed for cancer development. If this is true, then identifying the molecular differences between adenocarcinoma and SqCC could provide targets for more effective therapies for these major subtypes of NSCLC. Amplification of a chromosome region called 8p12 is very common in NSCLC, which suggests that an oncogene that drives lung cancer development is present in this chromosome region. In this study, the researchers investigate this possibility by looking for an amplified gene in the 8p12 chromosome region that makes increased amounts of protein in lung SqCC but not in lung adenocarcinoma.
What Did the Researchers Do and Find?
The researchers used a technique called comparative genomic hybridization to show that focal regions of Chromosome 8p are amplified in about 40% of lung SqCCs, but that DNA loss in this region is the most common alteration in lung adenocarcinomas. Ten genes in the 8p12 chromosome region were expressed at higher levels in the SqCC samples that they examined than in adenocarcinoma samples, they report, and overexpression of five of these genes correlated with amplification of the 8p12 region in the SqCC samples. Only one of the genes—BRF2—was more highly expressed in squamous carcinoma cells than in normal bronchial epithelial cells (the cell type that lines the tubes that take air into the lungs and from which SqCC develops). Artificially induced expression of BRF2 in bronchial epithelial cells made these normal cells behave like tumor cells, whereas reduction of BRF2 expression in squamous carcinoma cells made them behave more like normal bronchial epithelial cells. Finally, BRF2 was frequently activated in two early stages of squamous cell carcinoma—bronchial carcinoma in situ and dysplastic lesions.
What Do These Findings Mean?
Together, these findings show that the focal amplification of chromosome region 8p12 plays a role in the development of lung SqCC but not in the development of lung adenocarcinoma, the other major subtype of NSCLC. These findings identify BRF2 (which encodes an RNA polymerase III transcription initiation factor, a protein that is required for the synthesis of RNA molecules that help to control cell growth) as a lung SqCC-specific oncogene and uncover a unique mechanism for lung SqCC development. Most importantly, these findings suggest that genetic activation of BRF2 could be used as a marker for lung SqCC, which might facilitate the early detection of this type of NSCLC, and that BRF2 might provide a new target for therapy.
Additional Information
Please access these Web sites via the online version of this summary at
The US National Cancer Institute provides detailed information for patients and professionals about all aspects of lung cancer, including information on non-small cell carcinoma (in English and Spanish)
Cancer Research UK also provides information about lung cancer and information on how cancer starts
MedlinePlus has links to other resources about lung cancer (in English and Spanish)
PMCID: PMC2910599  PMID: 20668658
23.  FACADE: a fast and sensitive algorithm for the segmentation and calling of high resolution array CGH data 
Nucleic Acids Research  2010;38(15):e157.
The availability of high resolution array comparative genomic hybridization (CGH) platforms has led to increasing complexities in data analysis. Specifically, defining contiguous regions of alterations or segmentation can be computationally intensive and popular algorithms can take hours to days for the processing of arrays comprised of hundreds of thousands to millions of elements. Additionally, tumors tend to demonstrate subtle copy number alterations due to heterogeneity, ploidy and hybridization effects. Thus, there is a need for fast, sensitive array CGH segmentation and alteration calling algorithms. Here, we describe Fast Algorithm for Calling After Detection of Edges (FACADE), a highly sensitive and easy to use algorithm designed to rapidly segment and call high resolution array data.
PMCID: PMC2926625  PMID: 20551132
24.  An integrative multi-dimensional genetic and epigenetic strategy to identify aberrant genes and pathways in cancer 
BMC Systems Biology  2010;4:67.
Genomics has substantially changed our approach to cancer research. Gene expression profiling, for example, has been utilized to delineate subtypes of cancer, and facilitated derivation of predictive and prognostic signatures. The emergence of technologies for the high resolution and genome-wide description of genetic and epigenetic features has enabled the identification of a multitude of causal DNA events in tumors. This has afforded the potential for large scale integration of genome and transcriptome data generated from a variety of technology platforms to acquire a better understanding of cancer.
Here we show how multi-dimensional genomics data analysis would enable the deciphering of mechanisms that disrupt regulatory/signaling cascades and downstream effects. Since not all gene expression changes observed in a tumor are causal to cancer development, we demonstrate an approach based on multiple concerted disruption (MCD) analysis of genes that facilitates the rational deduction of aberrant genes and pathways, which otherwise would be overlooked in single genomic dimension investigations.
Notably, this is the first comprehensive study of breast cancer cells by parallel integrative genome wide analyses of DNA copy number, LOH, and DNA methylation status to interpret changes in gene expression pattern. Our findings demonstrate the power of a multi-dimensional approach to elucidate events which would escape conventional single dimensional analysis and as such, reduce the cohort sample size for cancer gene discovery.
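The notion of flagging genes whose expression change is backed by several concurrent DNA-level events can be sketched as follows. This is an illustrative simplification, not the authors' MCD implementation: the call labels ("gain", "hypo", and so on), the rule that LOH supports either direction, the `min_events` threshold, and the gene names are all assumptions made for the example.

```python
# Assumed mapping: which DNA-level state is consistent with each expression change.
SUPPORTS = {
    "up": {"copy_number": "gain", "methylation": "hypo"},
    "down": {"copy_number": "loss", "methylation": "hyper"},
}


def concerted_genes(calls, min_events=2):
    """Return genes whose expression change is supported by >= min_events
    concurrent DNA-level events.

    calls: dict mapping gene -> dict with keys 'expression' ('up'/'down'/None),
           'copy_number', 'methylation', and 'loh' (bool).
    """
    hits = {}
    for gene, call in calls.items():
        direction = call.get("expression")
        if direction not in SUPPORTS:
            continue  # no expression change to explain
        events = [
            dim for dim, state in SUPPORTS[direction].items()
            if call.get(dim) == state
        ]
        if call.get("loh"):
            events.append("loh")  # assumption: LOH counts for either direction
        if len(events) >= min_events:
            hits[gene] = events
    return hits


# Hypothetical per-gene calls for illustration only.
calls = {
    "GENE_X": {"expression": "up", "copy_number": "gain", "methylation": "hypo", "loh": False},
    "GENE_Y": {"expression": "down", "copy_number": "loss", "methylation": None, "loh": False},
    "GENE_Z": {"expression": None, "copy_number": "gain", "methylation": "hypo", "loh": True},
}
result = concerted_genes(calls)
```

Here only GENE_X is retained: GENE_Y has a single supporting event, and GENE_Z shows DNA-level alterations without a corresponding expression change.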
PMCID: PMC2880289  PMID: 20478067
25.  Transcriptome Profiles of Carcinoma-in-Situ and Invasive Non-Small Cell Lung Cancer as Revealed by SAGE 
PLoS ONE  2010;5(2):e9162.
Non-small cell lung cancer (NSCLC) presents as a progressive disease spanning precancerous, preinvasive, locally invasive, and metastatic lesions. Identification of biological pathways reflective of these progressive stages, and aberrantly expressed genes associated with these pathways, would conceivably enhance therapeutic approaches to this devastating disease.
Methodology/Principal Findings
Through the construction and analysis of SAGE libraries, we have determined transcriptome profiles for preinvasive carcinoma-in-situ (CIS) and invasive squamous cell carcinoma (SCC) of the lung, and compared these with expression profiles generated from both bronchial epithelium, and precancerous metaplastic and dysplastic lesions using Ingenuity Pathway Analysis. Expression of genes associated with epidermal development, and loss of expression of genes associated with mucociliary biology, are predominant features of CIS, largely shared with precancerous lesions. Additionally, expression of genes associated with xenobiotic metabolism/detoxification is a notable feature of CIS, and is largely maintained in invasive cancer. Genes related to tissue fibrosis and acute phase immune response are characteristic of the invasive SCC phenotype. Moreover, the data presented here suggest that tissue remodeling/fibrosis is initiated at the early stages of CIS. Additionally, this study indicates that alteration in copy-number status represents a plausible mechanism for differential gene expression in CIS and invasive SCC.
This study is the first report of large-scale expression profiling of CIS of the lung. Unbiased expression profiling of these preinvasive and invasive lesions provides a platform for further investigations into the molecular genetic events relevant to early stages of squamous NSCLC development. Additionally, up-regulated genes detected at extreme differences between CIS and invasive cancer may have potential to serve as biomarkers for early detection.
PMCID: PMC2820080  PMID: 20161782
