1.  Estimates of penetrance for recurrent pathogenic copy-number variations 
Purpose
Although an increasing number of copy-number variations are being identified as susceptibility loci for a variety of pediatric diseases, the penetrance of these copy-number variations remains mostly unknown. This poses challenges for counseling, both for recurrence risks and prenatal diagnosis. We sought to provide empiric estimates for penetrance for some of these recurrent, disease-susceptibility loci.
Methods
We conducted a Bayesian analysis, based on the copy-number variation frequencies in control populations (n = 22,246) and in our database of >48,000 postnatal microarray-based comparative genomic hybridization samples. The background risk for congenital anomalies/developmental delay/intellectual disability was assumed to be ~5%. Copy-number variations studied were 1q21.1 proximal duplications, 1q21.1 distal deletions and duplications, 15q11.2 deletions, 16p13.11 deletions, 16p12.1 deletions, 16p11.2 proximal and distal deletions and duplications, 17q12 deletions and duplications, and 22q11.21 duplications.
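The Bayesian step can be sketched under stated assumptions. This is a minimal illustration, not the authors' implementation. It assumes three things: the clinical array database estimates P(CNV | abnormal phenotype), the control population estimates P(CNV | unaffected), and the ~5% background risk serves as the prior. The function name and the carrier counts are hypothetical; only the cohort sizes echo the abstract.

```python
def penetrance(case_carriers: int, n_cases: int,
               control_carriers: int, n_controls: int,
               background_risk: float = 0.05) -> float:
    """Estimate P(abnormal phenotype | CNV carrier) with Bayes' theorem.

    Assumes the case cohort estimates P(CNV | affected), the control cohort
    estimates P(CNV | unaffected), and background_risk is the prior P(affected).
    """
    f_case = case_carriers / n_cases            # P(CNV | affected)
    f_control = control_carriers / n_controls   # P(CNV | unaffected)
    numerator = f_case * background_risk
    denominator = numerator + f_control * (1.0 - background_risk)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical carrier counts; cohort sizes are rounded from the abstract.
    estimate = penetrance(case_carriers=100, n_cases=48_000,
                          control_carriers=10, n_controls=22_246)
    print(f"estimated penetrance: {estimate:.1%}")
```

If a CNV is equally frequent in cases and controls, the estimate collapses to the 5% prior. This is a useful sanity check on any implementation.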
Results
Estimates for the risk of an abnormal phenotype ranged from 10.4% for 15q11.2 deletions to 62.4% for distal 16p11.2 deletions.
Conclusion
This model can be used to provide more precise estimates of the chance of an abnormal phenotype for many copy-number variations encountered in the prenatal setting. Providing the penetrance gives prospective parents additional, critical information in the genetic counseling session.
doi:10.1038/gim.2012.164
PMCID: PMC3664238  PMID: 23258348
copy-number variation; genomic disorder; microarray; penetrance; prenatal diagnosis
2.  Genomic Deregulation of the E2F/Rb Pathway Leads to Activation of the Oncogene EZH2 in Small Cell Lung Cancer 
PLoS ONE  2013;8(8):e71670.
Small cell lung cancer (SCLC) is a highly aggressive lung neoplasm with extremely poor clinical outcomes and no approved targeted treatments. To elucidate the mechanisms responsible for driving the SCLC phenotype in hopes of revealing novel therapeutic targets, we studied copy number and methylation profiles of SCLC. We found that disruption of the E2F/Rb pathway was a prominent feature, deregulated in 96% of the SCLC samples investigated, and was strongly associated with increased expression of EZH2, an oncogene and core member of the polycomb repressive complex 2 (PRC2). Through its catalytic role in the PRC2 complex, EZH2 normally functions to epigenetically silence genes during development; however, it aberrantly silences genes in human cancers. We provide evidence that EZH2 is functionally active in SCLC tumours, exerts pro-tumourigenic functions in vitro, and is associated with aberrant methylation profiles of PRC2 target genes indicative of a “stem-cell like” hypermethylator profile in SCLC tumours. Furthermore, lentiviral-mediated knockdown of EZH2 significantly reduced the growth of SCLC cell lines, suggesting EZH2 has a key role in driving SCLC biology. In conclusion, our data confirm the role of EZH2 as a critical oncogene in SCLC, and lend support to the prioritization of EZH2 as a potential therapeutic target in clinical disease.
doi:10.1371/journal.pone.0071670
PMCID: PMC3744458  PMID: 23967231
4.  Phenotypic Heterogeneity of Genomic Disorders and Rare Copy-Number Variants 
The New England journal of medicine  2012;367(14):1321-1331.
BACKGROUND
Some copy-number variants are associated with genomic disorders with extreme phenotypic heterogeneity. The cause of this variation is unknown, which presents challenges in genetic diagnosis, counseling, and management.
METHODS
We analyzed the genomes of 2312 children known to carry a copy-number variant associated with intellectual disability and congenital abnormalities, using array comparative genomic hybridization.
RESULTS
Among the affected children, 10.1% carried a second large copy-number variant in addition to the primary genetic lesion. We identified seven genomic disorders, each defined by a specific copy-number variant, in which the affected children were more likely to carry multiple copy-number variants than were controls. We found that syndromic disorders could be distinguished from those with extreme phenotypic heterogeneity on the basis of the total number of copy-number variants and whether the variants are inherited or de novo. Children who carried two large copy-number variants of unknown clinical significance were eight times as likely to have developmental delay as were controls (odds ratio, 8.16; 95% confidence interval, 5.33 to 13.07; P = 2.11×10⁻³⁸). Among affected children, inherited copy-number variants tended to co-occur with a second-site large copy-number variant (Spearman correlation coefficient, 0.66; P<0.001). Boys were more likely than girls to have disorders of phenotypic heterogeneity (P<0.001), and mothers were more likely than fathers to transmit second-site copy-number variants to their offspring (P = 0.02).
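A 2×2 table underlies odds ratios such as the 8.16 reported above. The sketch below assumes a Wald interval on the log-odds scale. The abstract does not say which interval method the authors used, and the counts in the usage lines are hypothetical.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Wald confidence interval for a 2x2 table.

    a = affected carriers,   b = affected non-carriers,
    c = control carriers,    d = control non-carriers.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: consider a continuity correction or an exact method")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    or_, lo, hi = odds_ratio_wald_ci(a=40, b=960, c=10, d=1990)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```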
CONCLUSIONS
Multiple, large copy-number variants, including those of unknown pathogenic significance, compound to result in a severe clinical presentation, and secondary copy-number variants are preferentially transmitted from maternal carriers. (Funded by the Simons Foundation Autism Research Initiative and the National Institutes of Health.)
doi:10.1056/NEJMoa1200395
PMCID: PMC3494411  PMID: 22970919
5.  A Genetic Model for Neurodevelopmental Disease 
Current opinion in neurobiology  2012;22(5):829-836.
The genetic basis of neurodevelopmental and neuropsychiatric diseases has been advanced by the discovery of large and recurrent copy number variants significantly enriched in cases when compared to controls. The pattern of this variation strongly implies that rare variants contribute significantly to neurological disease; that different genes will be responsible for similar diseases in different families; and that the same “primary” genetic lesions can result in a different disease outcome depending potentially on the genetic background. Next-generation sequencing technologies are beginning to broaden the spectrum of disease-causing variation and provide specificity by pinpointing both genes and pathways for future diagnostics and therapeutics.
doi:10.1016/j.conb.2012.04.007
PMCID: PMC3437230  PMID: 22560351
6.  Germline DNA Copy Number Aberrations Identified as Potential Prognostic Factors for Breast Cancer Recurrence 
PLoS ONE  2013;8(1):e53850.
Breast cancer recurrence (BCR) is a common treatment outcome despite curative-intent primary treatment of non-metastatic breast cancer. Currently used prognostic and predictive factors utilize tumor-based markers, and are not optimal determinants of risk of BCR. Germline-based copy number aberrations (CNAs) have not been evaluated as determinants of predisposition to experience BCR. In this study, we accessed germline DNA from 369 female breast cancer subjects who received curative-intent primary treatment following diagnosis. Of these, 155 experienced BCR and 214 did not, after a median duration of follow-up after breast cancer diagnosis of 6.35 years (range = 0.60–21.78) and 8.60 years (range = 3.08–13.57), respectively. Whole genome CNA genotyping was performed on the Affymetrix SNP array 6.0 platform. CNAs were identified using the SNP-Fast Adaptive States Segmentation Technique 2 algorithm implemented in Nexus Copy Number 6.0. Six samples were removed due to poor quality scores, leaving 363 samples for further analysis. We identified 18,561 CNAs using a predefined minimum size of 1 kb for observed aberrations. Univariate survival analyses (log-rank tests) identified seven CNAs (two copy number gains and five copy-neutral losses of heterozygosity, CN-LOHs) showing significant differences (P<2.01×10⁻⁵) in recurrence-free survival (RFS) probabilities between subjects with and without the CNAs. We also observed three additional, distinct CN-LOHs showing significant differences in RFS probabilities (P<2.86×10⁻⁵) when analyses were restricted to luminal A cases (n = 208). After adjusting for tumor stage and grade in multivariate analyses (Cox proportional hazards models), all the CNAs remained strongly associated with the phenotype of BCR. Of these, we confirmed three CNAs at 17q11.2, 11q13.1 and 6q24.1 in representative samples using independent genotyping platforms.
Our results support further investigation of germline DNA variations as potential prognostic markers in cancer-associated phenotypes.
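The univariate log-rank comparisons described above can be sketched in plain Python. This is an illustrative implementation of the standard two-group log-rank statistic, not the software used in the study, and the toy cohort is hypothetical. In practice an established survival library, such as R's survival package, would normally be used.

```python
import math
from typing import Sequence, Tuple


def log_rank_test(times: Sequence[float], events: Sequence[int],
                  groups: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test.

    times:  follow-up time per subject (e.g. years to recurrence or censoring)
    events: 1 if recurrence was observed, 0 if censored
    groups: 1 for CNA carriers, 0 for non-carriers
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    subjects = list(zip(times, events, groups))
    o_minus_e = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in sorted({time for time, event, _ in subjects if event == 1}):
        at_risk = [s for s in subjects if s[0] >= t]            # still followed at t
        n = len(at_risk)
        n1 = sum(1 for s in at_risk if s[2] == 1)
        d = sum(1 for s in at_risk if s[0] == t and s[1] == 1)  # events at t
        d1 = sum(1 for s in at_risk if s[0] == t and s[1] == 1 and s[2] == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times; both groups must be at risk")
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical cohort in which carriers tend to recur earlier.
    times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
    events = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
    groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    print(log_rank_test(times, events, groups))
```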
doi:10.1371/journal.pone.0053850
PMCID: PMC3547038  PMID: 23342018
7.  Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations 
Nature  2012;485(7397):246-250.
It is well established that autism spectrum disorders (ASD) have a strong genetic component. However, for at least 70% of cases, the underlying genetic cause is unknown[1]. Under the hypothesis that de novo mutations underlie a substantial fraction of the risk for developing ASD in families with no previous history of ASD or related phenotypes (so-called sporadic or simplex families[2,3]), we sequenced all coding regions of the genome, i.e. the exome, for parent-child trios exhibiting sporadic ASD, including 189 new trios and 20 previously reported[4]. We also sequenced the exomes of 50 unaffected siblings corresponding to these new (n = 31) and previously reported trios (n = 19)[4], for a total of 677 individual exomes from 209 families. Here we show that de novo point mutations are overwhelmingly paternal in origin (4:1 bias) and positively correlated with paternal age, consistent with the modestly increased risk for children of older fathers to develop ASD[5]. Moreover, 39% (49/126) of the most severe or disruptive de novo mutations map to a highly interconnected beta-catenin/chromatin remodeling protein network ranked significantly for autism candidate genes. In proband exomes, recurrent protein-altering mutations were observed in two genes, CHD8 and NTNG1. Mutation screening of six candidate genes in 1,703 ASD probands identified additional de novo, protein-altering mutations in GRIN2B, LAMC3, and SCN1A. Combined with copy number variant (CNV) data, these results suggest extreme locus heterogeneity but also provide a target for future discovery, diagnostics, and therapeutics.
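The 4:1 paternal bias can be framed as a binomial question: if paternal and maternal origin were equally likely, how surprising is the observed split? A minimal sketch of an exact two-sided binomial test follows, assuming a null of p = 0.5. The 40/10 split in the usage lines is hypothetical. It is not the study's phasing data, and this is not necessarily the test the authors used.

```python
import math


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test p-value.

    Sums the probabilities of every outcome that is no more likely than the
    observed count k (the convention used by R's binom.test).
    """
    probs = [math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A tiny relative tolerance keeps outcomes tied with the observed one.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


if __name__ == "__main__":
    # Hypothetical: 40 paternal vs 10 maternal among 50 phased de novo mutations.
    print(binomial_two_sided_p(k=40, n=50))
```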
doi:10.1038/nature10989
PMCID: PMC3350576  PMID: 22495309
8.  A Copy Number Variation Morbidity Map of Developmental Delay 
Nature genetics  2011;43(9):838-846.
To understand the genetic heterogeneity underlying developmental delay, we compare copy-number variants (CNVs) in 15,767 children with intellectual disability and various congenital defects to 8,329 adult controls. We estimate that ~14.2% of disease in these individuals is due to large CNVs > 400 kbp. We find greater CNV enrichment in patients with craniofacial anomalies and cardiovascular defects than in patients with epilepsy or autism. We identify 59 pathogenic CNVs including 14 novel or previously weakly supported candidates. We refine the critical interval for several genomic disorders such as the 17q21.31 microdeletion syndrome and identify 940 candidate dosage-sensitive genes. We also develop methods to opportunistically discover small, disruptive CNVs within the large and growing diagnostic array datasets. This evolving CNV morbidity map combined with exome/genome sequencing will be critical for deciphering the genetic basis of developmental delay, intellectual disability, and autism spectrum disorders.
doi:10.1038/ng.909
PMCID: PMC3171215  PMID: 21841781
9.  Relative Burden of Large CNVs on a Range of Neurodevelopmental Phenotypes 
PLoS Genetics  2011;7(11):e1002334.
While numerous studies have implicated copy number variants (CNVs) in a range of neurological phenotypes, the impact relative to disease severity has been difficult to ascertain due to small sample sizes, lack of phenotypic details, and heterogeneity in platforms used for discovery. Using a customized microarray enriched for genomic hotspots, we assayed for large CNVs among 1,227 individuals with various neurological deficits including dyslexia (376), sporadic autism (350), and intellectual disability (ID) (501), as well as 337 controls. We show that the frequency of large CNVs (>1 Mbp) is significantly greater for ID-associated phenotypes compared to autism (p = 9.58×10⁻¹¹, odds ratio = 4.59), dyslexia (p = 3.81×10⁻¹⁸, odds ratio = 14.45), or controls (p = 2.75×10⁻¹⁷, odds ratio = 13.71). There is a striking difference in the frequency of rare CNVs (>50 kbp) in autism (10%, p = 2.4×10⁻⁶, odds ratio = 6) or ID (16%, p = 3.55×10⁻¹², odds ratio = 10) compared to dyslexia (2%) with essentially no difference in large CNV burden among dyslexia patients compared to controls. Rare CNVs were more likely to arise de novo (64%) in ID when compared to autism (40%) or dyslexia (0%). We observed a significantly increased large CNV burden in individuals with ID and multiple congenital anomalies (MCA) compared to ID alone (p = 0.001, odds ratio = 2.54). Our data suggest that large CNV burden positively correlates with the severity of childhood disability: ID with MCA being most severely affected and dyslexics being indistinguishable from controls. When autism without ID was considered separately, the increase in CNV burden was modest compared to controls (p = 0.07, odds ratio = 2.33).
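Comparisons of rare-CNV frequency between cohorts, such as autism versus dyslexia, reduce to 2×2 tables. A minimal sketch of a two-sided Fisher's exact test follows. It assumes this is an appropriate test for such counts; the abstract does not name the test used. The counts in the usage lines are hypothetical.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are cohorts (e.g. autism vs dyslexia); columns are
    "carries a rare CNV" vs "does not". With all margins fixed, the
    top-left cell follows a hypergeometric distribution.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    support = range(max(0, row1 - (n - col1)), min(row1, col1) + 1)
    probs = [prob(x) for x in support]
    observed = prob(a)
    # Sum every table no more probable than the observed one.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


if __name__ == "__main__":
    # Hypothetical counts: 35/350 carriers in one cohort vs 8/376 in another.
    print(fisher_exact_two_sided(35, 315, 8, 368))
```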
Author Summary
Deletions and duplications, termed copy number variants (CNVs), have been implicated in a variety of neurodevelopmental disorders including intellectual disability (ID), autism, and schizophrenia. Understanding the relevance of large, rare CNVs across neurodevelopmental phenotypes that vary in severity and prevalence has been difficult because previous studies analyzed one disorder at a time, used different CNV detection platforms, had insufficient sample sizes, and lacked detailed clinical information. We tested 1,227 individuals with different neurological diseases including dyslexia, autism, and ID using the same CNV detection platform. We observed striking differences in CNV burden and inheritance characteristics among these cohorts and show that ID is the primary correlate of large CNV burden. This correlation is well illustrated by a comparison of autism patients with and without ID, where the latter show only modest increases in large CNV burden compared to controls. We also find significant depletion in the frequency of large CNVs in dyslexia compared to the other cohorts. Further studies on larger sets of individuals using high-resolution arrays and next-generation sequencing are warranted for a detailed understanding of the relative contribution of genetic variants to neurodevelopmental disorders.
doi:10.1371/journal.pgen.1002334
PMCID: PMC3213131  PMID: 22102821
10.  Comparison of genome-wide array genomic hybridization platforms for the detection of copy number variants in idiopathic mental retardation 
BMC Medical Genomics  2011;4:25.
Background
Clinical laboratories are adopting array genomic hybridization as a standard clinical test. A number of whole genome array genomic hybridization platforms are available, but little is known about their comparative performance in a clinical context.
Methods
We studied 30 children with idiopathic MR and both unaffected parents of each child using Affymetrix 500 K GeneChip SNP arrays, Agilent Human Genome 244 K oligonucleotide arrays and NimbleGen 385 K Whole-Genome oligonucleotide arrays. We also determined whether CNVs called on these platforms were detected by Illumina Hap550 beadchips or SMRT 32 K BAC whole genome tiling arrays and tested 15 of the 30 trios on Affymetrix 6.0 SNP arrays.
Results
The Affymetrix 500 K, Agilent and NimbleGen platforms identified 3061 autosomal and 117 X chromosomal CNVs in the 30 trios. 147 of these CNVs appeared to be de novo, but only 34 (22%) were found on more than one platform. Performing genotype-phenotype correlations, we identified 7 most likely pathogenic and 2 possibly pathogenic CNVs for MR. All 9 of these putatively pathogenic CNVs were detected by the Affymetrix 500 K, Agilent, NimbleGen and the Illumina arrays, and 5 were found by the SMRT BAC array. Both putatively pathogenic CNVs identified in the 15 trios tested with the Affymetrix 6.0 were identified by this platform.
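Deciding whether calls from two platforms describe the same CNV requires a matching rule. A minimal sketch follows. It assumes a 50% reciprocal-overlap criterion and the same copy-number direction, a common convention but not necessarily the rule used in this study. The class, the function names and the coordinates are hypothetical.

```python
from typing import List, NamedTuple


class CNVCall(NamedTuple):
    chrom: str
    start: int  # 0-based start
    end: int    # exclusive end
    kind: str   # "del" or "dup"


def reciprocal_overlap(a: CNVCall, b: CNVCall) -> float:
    """Smaller of the two overlap fractions; 0.0 if the calls do not intersect."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def replicated(calls_a: List[CNVCall], calls_b: List[CNVCall],
               threshold: float = 0.5) -> List[CNVCall]:
    """Calls from platform A matched by a same-type call on platform B."""
    return [a for a in calls_a
            if any(a.kind == b.kind and reciprocal_overlap(a, b) >= threshold
                   for b in calls_b)]


if __name__ == "__main__":
    # Hypothetical calls for one sample on two platforms.
    platform_a = [CNVCall("chr15", 1_000_000, 1_500_000, "del")]
    platform_b = [CNVCall("chr15", 1_100_000, 1_550_000, "del")]
    print(replicated(platform_a, platform_b))
```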
Conclusions
Our findings demonstrate that different results are obtained with different platforms and illustrate the trade-off that exists between sensitivity and specificity. The large number of apparently false positive CNV calls on each of the platforms supports the need for validating clinically important findings with a different technology.
doi:10.1186/1755-8794-4-25
PMCID: PMC3076225  PMID: 21439053
11.  A sequence-based approach to identify reference genes for gene expression analysis 
BMC Medical Genomics  2010;3:32.
Background
An important consideration when analyzing both microarray and quantitative PCR expression data is the selection of appropriate genes as endogenous controls or reference genes. This step is especially critical when identifying genes differentially expressed between datasets. Moreover, reference genes suitable in one context (e.g. lung cancer) may not be suitable in another (e.g. breast cancer). Currently, the main approach to identify reference genes involves the mining of expression microarray data for highly expressed and relatively constant transcripts across a sample set. A caveat here is the requirement for transcript normalization prior to analysis, and measurements obtained are relative, not absolute. Alternatively, as sequencing-based technologies provide digital quantitative output, absolute quantification ensues, and reference gene identification becomes more accurate.
Methods
Serial analysis of gene expression (SAGE) profiles of non-malignant and malignant lung samples were compared using a permutation test to identify the most stably expressed genes across all samples. Subsequently, the specificity of the reference genes was evaluated across multiple tissue types, their constancy of expression was assessed using quantitative RT-PCR (qPCR), and their impact on differential expression analysis of microarray data was evaluated.
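The goal of the SAGE step is to rank genes by how stable their digital tag counts are across samples. The sketch below ranks genes by the coefficient of variation of counts per million. This is a simpler stand-in for the permutation test described above, not that test. The function names, the 50 CPM abundance filter and the toy libraries are hypothetical.

```python
import statistics
from typing import Dict, List


def counts_per_million(library: Dict[str, int]) -> Dict[str, float]:
    """Scale raw tag counts by library size."""
    total = sum(library.values())
    return {gene: 1e6 * count / total for gene, count in library.items()}


def rank_by_stability(libraries: List[Dict[str, int]],
                      min_mean_cpm: float = 50.0) -> List[str]:
    """Genes ordered from most to least stable (lowest coefficient of variation).

    Genes whose mean abundance falls below min_mean_cpm are skipped, since
    low tag counts make variation estimates noisy.
    """
    normalized = [counts_per_million(lib) for lib in libraries]
    genes = set().union(*normalized)
    scored = []
    for gene in genes:
        values = [lib.get(gene, 0.0) for lib in normalized]  # absent tag = 0
        mean = statistics.fmean(values)
        if mean < min_mean_cpm:
            continue
        scored.append((statistics.stdev(values) / mean, gene))
    return [gene for _, gene in sorted(scored)]


if __name__ == "__main__":
    # Hypothetical tag counts from three libraries.
    libs = [{"A": 100, "B": 100, "C": 800},
            {"A": 100, "B": 300, "C": 600},
            {"A": 100, "B": 50, "C": 850}]
    print(rank_by_stability(libs))
```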
Results
We show that (i) conventional reference genes such as ACTB and GAPDH are highly variable between cancerous and non-cancerous samples, (ii) reference genes identified for lung cancer do not perform well for other cancer types (breast and brain), (iii) reference genes identified through SAGE show low variability using qPCR in a different cohort of samples, and (iv) normalization of a lung cancer gene expression microarray dataset with or without our reference genes yields different results for differential gene expression and subsequent analyses. Specifically, key established pathways in lung cancer exhibit higher statistical significance using a dataset normalized with our reference genes relative to normalization without using our reference genes.
Conclusions
Our analyses found NDUFA1, RPL19, RAB5C, and RPS18 to occupy the top ranking positions among 15 suitable reference genes optimal for normalization of lung tissue expression data. Notably, the approach used in this study can be applied to data generated using new generation sequencing platforms for the identification of reference genes optimal within diverse contexts.
doi:10.1186/1755-8794-3-32
PMCID: PMC2928167  PMID: 20682026
12.  Integrative Genomic Analyses Identify BRF2 as a Novel Lineage-Specific Oncogene in Lung Squamous Cell Carcinoma 
PLoS Medicine  2010;7(7):e1000315.
William Lockwood and colleagues show that the focal amplification of a gene, BRF2, on Chromosome 8p12 plays a key role in squamous cell carcinoma of the lung.
Background
Traditionally, non-small cell lung cancer is treated as a single disease entity in terms of systemic therapy. Emerging evidence suggests the major subtypes—adenocarcinoma (AC) and squamous cell carcinoma (SqCC)—respond differently to therapy. Identification of the molecular differences between these tumor types will have a significant impact in designing novel therapies that can improve the treatment outcome.
Methods and Findings
We used an integrative genomics approach, combining high-resolution comparative genomic hybridization and gene expression microarray profiles, to compare AC and SqCC tumors in order to uncover alterations at the DNA level, with corresponding gene transcription changes, which are selected for during development of lung cancer subtypes. Through the analysis of multiple independent cohorts of clinical tumor samples (>330), normal lung tissues and bronchial epithelial cells obtained by bronchial brushing in smokers without lung cancer, we identified the overexpression of BRF2, a gene on Chromosome 8p12, which is specific for development of SqCC of lung. Genetic activation of BRF2, which encodes an RNA polymerase III (Pol III) transcription initiation factor, was found to be associated with increased expression of small nuclear RNAs (snRNAs) that are involved in processes essential for cell growth, such as RNA splicing. Ectopic expression of BRF2 in human bronchial epithelial cells induced a transformed phenotype and demonstrated downstream oncogenic effects, whereas RNA interference (RNAi)-mediated knockdown suppressed growth and colony formation of SqCC cells overexpressing BRF2, but not AC cells. Frequent activation of BRF2 in >35% of preinvasive bronchial carcinomas in situ, as well as in dysplastic lesions, provides evidence that BRF2 expression is an early event in cancer development of this cell lineage.
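Linking amplification to overexpression amounts to correlating copy number with transcript level across tumors, one gene at a time. A minimal sketch using Spearman's rank correlation follows. This is one reasonable choice; the abstract does not state which statistic was used. The function names and the values in the usage lines are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical log2 copy-number ratios and expression values for one gene.
    copy_number = [0.05, 0.40, 0.85, 0.10, 1.20, 0.60]
    expression = [5.1, 6.0, 7.9, 5.4, 9.2, 6.3]
    print(f"rho = {spearman_rho(copy_number, expression):.2f}")
```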
Conclusions
This is the first study, to our knowledge, to show that the focal amplification of a gene in Chromosome 8p12 plays a key role in squamous cell lineage specificity of the disease. Our data suggest that genetic activation of BRF2 represents a unique mechanism of SqCC lung tumorigenesis through the increase of Pol III-mediated transcription. It can serve as a marker for lung SqCC and may provide a novel target for therapy.
Editors' Summary
Background
Lung cancer is the commonest cause of cancer-related death. Every year, 1.3 million people die from this disease, which is mainly caused by smoking. Most cases of lung cancer are “non-small cell lung cancers” (NSCLCs). Like all cancers, NSCLC starts when cells begin to divide uncontrollably and to move round the body (metastasize) because of changes (mutations) in their genes. These mutations are often in “oncogenes,” genes that, when activated, encourage cell division. Oncogenes can be activated by mutations that alter the properties of the proteins they encode or by mutations that increase the amount of protein made from them, such as gene amplification (an increase in the number of copies of a gene). If NSCLC is diagnosed before it has spread from the lungs (stage I disease), it can be surgically removed and many patients with stage I NSCLC survive for more than 5 years after their diagnosis. Unfortunately, in more than half of patients, NSCLC has metastasized before it is diagnosed. This stage IV NSCLC can be treated with chemotherapy (toxic chemicals that kill fast-growing cancer cells) but only 2% of patients with stage IV lung cancer are alive 5 years after diagnosis.
Why Was This Study Done?
Traditionally, NSCLC has been regarded as a single disease in terms of treatment. However, emerging evidence suggests that the two major subtypes of NSCLC—adenocarcinoma and squamous cell carcinoma (SqCC)—respond differently to chemotherapy. Adenocarcinoma and SqCC start in different types of lung cell and experts think that for each cell type in the body, specific combinations of mutations interact with the cell type's own unique characteristics to provide the growth and survival advantage needed for cancer development. If this is true, then identifying the molecular differences between adenocarcinoma and SqCC could provide targets for more effective therapies for these major subtypes of NSCLC. Amplification of a chromosome region called 8p12 is very common in NSCLC, which suggests that an oncogene that drives lung cancer development is present in this chromosome region. In this study, the researchers investigate this possibility by looking for an amplified gene in the 8p12 chromosome region that makes increased amounts of protein in lung SqCC but not in lung adenocarcinoma.
What Did the Researchers Do and Find?
The researchers used a technique called comparative genomic hybridization to show that focal regions of Chromosome 8p are amplified in about 40% of lung SqCCs, but that DNA loss in this region is the most common alteration in lung adenocarcinomas. They report that ten genes in the 8p12 chromosome region were expressed at higher levels in the SqCC samples they examined than in adenocarcinoma samples, and that overexpression of five of these genes correlated with amplification of the 8p12 region in the SqCC samples. Only one of the genes, BRF2, was more highly expressed in squamous carcinoma cells than in normal bronchial epithelial cells (the cell type that lines the tubes that take air into the lungs and from which SqCC develops). Artificially induced expression of BRF2 in bronchial epithelial cells made these normal cells behave like tumor cells, whereas reduction of BRF2 expression in squamous carcinoma cells made them behave more like normal bronchial epithelial cells. Finally, BRF2 was frequently activated in two early stages of squamous cell carcinoma: bronchial carcinoma in situ and dysplastic lesions.
What Do These Findings Mean?
Together, these findings show that the focal amplification of chromosome region 8p12 plays a role in the development of lung SqCC but not in the development of lung adenocarcinoma, the other major subtype of NSCLC. These findings identify BRF2 (which encodes an RNA polymerase III transcription initiation factor, a protein that is required for the synthesis of RNA molecules that help to control cell growth) as a lung SqCC-specific oncogene and uncover a unique mechanism for lung SqCC development. Most importantly, these findings suggest that genetic activation of BRF2 could be used as a marker for lung SqCC, which might facilitate the early detection of this type of NSCLC, and that BRF2 might provide a new target for therapy.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000315.
The US National Cancer Institute provides detailed information for patients and professionals about all aspects of lung cancer, including information on non-small cell carcinoma (in English and Spanish)
Cancer Research UK also provides information about lung cancer and information on how cancer starts
MedlinePlus has links to other resources about lung cancer (in English and Spanish)
doi:10.1371/journal.pmed.1000315
PMCID: PMC2910599  PMID: 20668658
13.  FACADE: a fast and sensitive algorithm for the segmentation and calling of high resolution array CGH data 
Nucleic Acids Research  2010;38(15):e157.
The availability of high resolution array comparative genomic hybridization (CGH) platforms has led to increasing complexities in data analysis. Specifically, defining contiguous regions of alterations, or segmentation, can be computationally intensive, and popular algorithms can take hours to days to process arrays comprising hundreds of thousands to millions of elements. Additionally, tumors tend to demonstrate subtle copy number alterations due to heterogeneity, ploidy and hybridization effects. Thus, there is a need for fast, sensitive array CGH segmentation and alteration calling algorithms. Here, we describe Fast Algorithm for Calling After Detection of Edges (FACADE), a highly sensitive and easy to use algorithm designed to rapidly segment and call high resolution array data.
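FACADE's own algorithm is not described in this abstract, so the sketch below is only a generic illustration of the edge-detection idea its name refers to. It finds probes where the local mean log2 ratio shifts sharply, cuts segments there, and calls each segment by its mean. The function names, window size and thresholds are hypothetical. A production caller would also need to handle noise, ploidy and tumor heterogeneity.

```python
import statistics
from typing import List, Sequence, Tuple


def find_edges(log2_ratios: Sequence[float], window: int = 5,
               min_jump: float = 0.3) -> List[int]:
    """Indices where a new segment appears to start.

    At each probe, compare the mean of the `window` probes to the left with the
    mean of the `window` probes to the right; keep well-separated local maxima
    of that jump profile that reach min_jump.
    """
    n = len(log2_ratios)
    jumps = [0.0] * n
    for i in range(window, n - window + 1):
        left = statistics.fmean(log2_ratios[i - window:i])
        right = statistics.fmean(log2_ratios[i:i + window])
        jumps[i] = abs(right - left)
    edges: List[int] = []
    for i in range(window, n - window + 1):
        neighborhood = jumps[max(0, i - window):i + window + 1]
        if jumps[i] >= min_jump and jumps[i] == max(neighborhood):
            if not edges or i - edges[-1] > window:
                edges.append(i)
    return edges


def segment_and_call(log2_ratios: Sequence[float], window: int = 5,
                     min_jump: float = 0.3, gain: float = 0.2,
                     loss: float = -0.2) -> List[Tuple[int, int, float, str]]:
    """Split probes at detected edges and call each segment by its mean."""
    bounds = [0] + find_edges(log2_ratios, window, min_jump) + [len(log2_ratios)]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        mean = statistics.fmean(log2_ratios[start:end])
        call = "gain" if mean > gain else "loss" if mean < loss else "neutral"
        segments.append((start, end, mean, call))
    return segments


if __name__ == "__main__":
    # Hypothetical noiseless profile with one gained region.
    profile = [0.0] * 20 + [0.6] * 20 + [0.0] * 20
    for segment in segment_and_call(profile):
        print(segment)
```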
doi:10.1093/nar/gkq548
PMCID: PMC2926625  PMID: 20551132
14.  An integrative multi-dimensional genetic and epigenetic strategy to identify aberrant genes and pathways in cancer 
BMC Systems Biology  2010;4:67.
Background
Genomics has substantially changed our approach to cancer research. Gene expression profiling, for example, has been utilized to delineate subtypes of cancer, and facilitated derivation of predictive and prognostic signatures. The emergence of technologies for the high resolution and genome-wide description of genetic and epigenetic features has enabled the identification of a multitude of causal DNA events in tumors. This has afforded the potential for large scale integration of genome and transcriptome data generated from a variety of technology platforms to acquire a better understanding of cancer.
Results
Here we show how multi-dimensional genomics data analysis can enable the deciphering of mechanisms that disrupt regulatory/signaling cascades and downstream effects. Since not all gene expression changes observed in a tumor are causal to cancer development, we demonstrate an approach based on multiple concerted disruption (MCD) analysis of genes that facilitates the rational deduction of aberrant genes and pathways, which otherwise would be overlooked in single genomic dimension investigations.
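The multiple concerted disruption idea can be sketched as counting, per gene, how many DNA-level dimensions change in a direction that could explain its expression change. The encoding, the class and function names, and the two-hit rule below are hypothetical simplifications for illustration. The abstract does not define the published MCD criteria.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GeneProfile:
    gene: str
    copy_number: int   # -1 loss, 0 neutral, +1 gain
    loh: bool          # loss of heterozygosity at the locus
    methylation: int   # -1 hypomethylated, 0 unchanged, +1 hypermethylated
    expression: int    # -1 down, 0 unchanged, +1 up (tumor vs reference)


def concordant_hits(p: GeneProfile) -> int:
    """Number of DNA-level dimensions whose change could explain the expression change."""
    if p.expression == 0:
        return 0
    hits = 0
    if p.copy_number == p.expression:    # gain with up, loss with down
        hits += 1
    if p.loh and p.expression < 0:       # LOH counted only with reduced expression
        hits += 1
    if p.methylation == -p.expression:   # hypermethylation with down, hypo with up
        hits += 1
    return hits


def multi_hit_genes(profiles: Iterable[GeneProfile], min_hits: int = 2) -> List[str]:
    """Genes disrupted concordantly in at least min_hits dimensions."""
    return [p.gene for p in profiles if concordant_hits(p) >= min_hits]


if __name__ == "__main__":
    # Hypothetical profiles for four genes.
    profiles = [GeneProfile("A", -1, True, 1, -1), GeneProfile("B", 1, False, 0, 1),
                GeneProfile("C", 0, False, 1, 0), GeneProfile("D", 1, False, -1, 1)]
    print(multi_hit_genes(profiles))
```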
Conclusions
Notably, this is the first comprehensive study of breast cancer cells by parallel integrative genome wide analyses of DNA copy number, LOH, and DNA methylation status to interpret changes in gene expression pattern. Our findings demonstrate the power of a multi-dimensional approach to elucidate events which would escape conventional single dimensional analysis and, as such, to reduce the cohort sample size needed for cancer gene discovery.
doi:10.1186/1752-0509-4-67
PMCID: PMC2880289  PMID: 20478067
15.  Transcriptome Profiles of Carcinoma-in-Situ and Invasive Non-Small Cell Lung Cancer as Revealed by SAGE 
PLoS ONE  2010;5(2):e9162.
Background
Non-small cell lung cancer (NSCLC) presents as a progressive disease spanning precancerous, preinvasive, locally invasive, and metastatic lesions. Identification of biological pathways reflective of these progressive stages, and aberrantly expressed genes associated with these pathways, would conceivably enhance therapeutic approaches to this devastating disease.
Methodology/Principal Findings
Through the construction and analysis of SAGE libraries, we have determined transcriptome profiles for preinvasive carcinoma-in-situ (CIS) and invasive squamous cell carcinoma (SCC) of the lung, and compared these with expression profiles generated from both bronchial epithelium and precancerous metaplastic and dysplastic lesions using Ingenuity Pathway Analysis. Expression of genes associated with epidermal development, and loss of expression of genes associated with mucociliary biology, are predominant features of CIS, largely shared with precancerous lesions. Additionally, expression of genes associated with xenobiotic metabolism/detoxification is a notable feature of CIS, and is largely maintained in invasive cancer. Genes related to tissue fibrosis and acute phase immune response are characteristic of the invasive SCC phenotype. Moreover, the data presented here suggest that tissue remodeling/fibrosis is initiated at the early stages of CIS. Additionally, this study indicates that alteration in copy-number status represents a plausible mechanism for differential gene expression in CIS and invasive SCC.
Conclusions/Significance
This study is the first report of large-scale expression profiling of CIS of the lung. Unbiased expression profiling of these preinvasive and invasive lesions provides a platform for further investigations into the molecular genetic events relevant to early stages of squamous NSCLC development. Additionally, up-regulated genes showing the most extreme expression differences between CIS and invasive cancer may have potential to serve as biomarkers for early detection.
doi:10.1371/journal.pone.0009162
PMCID: PMC2820080  PMID: 20161782
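The abstract above proposes copy-number alteration as a plausible mechanism for differential expression in CIS and invasive SCC. One simple way to ask whether expression tracks DNA dosage is a Pearson correlation across genes. The sketch below assumes per-gene copy-number log2 ratios and expression log2 fold-changes are already available; the values are hypothetical and are not data from the study, and this is not presented as the authors' method.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between paired per-gene measurements."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical genes: copy-number log2 ratio vs expression log2 fold-change.
cn_log2 = [-0.8, -0.3, 0.0, 0.4, 0.9]
expr_log2fc = [-1.5, -0.4, 0.1, 0.6, 1.7]
print(round(pearson(cn_log2, expr_log2fc), 3))
```

A high correlation only shows that dosage and expression move together across the genes examined; it does not by itself establish that copy-number change causes the expression difference.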
16.  SIGMA2: A system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes 
BMC Bioinformatics  2008;9:422.
Background
High throughput microarray technologies have afforded the investigation of genomes, epigenomes, and transcriptomes at unprecedented resolution. However, software packages to handle, analyze, and visualize data from these multiple 'omics disciplines have not been adequately developed.
Results
Here, we present SIGMA2, a system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes. Multi-dimensional datasets can be simultaneously visualized and analyzed with respect to each dimension, allowing combinatorial integration of the different assays belonging to the different 'omics.
Conclusion
The identification of genes altered at multiple levels such as copy number, loss of heterozygosity (LOH), DNA methylation and the detection of consequential changes in gene expression can be concertedly performed, establishing SIGMA2 as a novel tool to facilitate the high throughput systems biology analysis of cancer.
doi:10.1186/1471-2105-9-422
PMCID: PMC2571113  PMID: 18840289
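SIGMA2 is described as identifying genes altered at several levels (copy number, LOH, DNA methylation) together with consequential expression changes. The sketch below illustrates that idea only; it is not SIGMA2's code or API. It assumes one hypothetical record per gene and hypothetical thresholds, and reports which DNA-level events agree with the direction of the expression change (for example, copy loss, LOH, or promoter hypermethylation alongside under-expression).

```python
from dataclasses import dataclass


@dataclass
class GeneProfile:
    gene: str
    cn_log2: float      # copy-number log2 ratio, tumour vs reference
    loh: bool           # loss of heterozygosity called at the locus
    meth_delta: float   # promoter methylation change (beta), tumour minus normal
    expr_log2fc: float  # expression log2 fold-change, tumour vs normal


# Hypothetical cut-offs, chosen only for illustration.
CN_CUTOFF = 0.3
METH_CUTOFF = 0.2
EXPR_CUTOFF = 1.0


def supporting_mechanisms(p: GeneProfile) -> list[str]:
    """Return DNA-level events concordant with the direction of expression change."""
    events = []
    if p.expr_log2fc <= -EXPR_CUTOFF:      # under-expressed gene
        if p.cn_log2 <= -CN_CUTOFF:
            events.append("copy loss")
        if p.loh:
            events.append("LOH")
        if p.meth_delta >= METH_CUTOFF:
            events.append("hypermethylation")
    elif p.expr_log2fc >= EXPR_CUTOFF:     # over-expressed gene
        if p.cn_log2 >= CN_CUTOFF:
            events.append("copy gain")
        if p.meth_delta <= -METH_CUTOFF:
            events.append("hypomethylation")
    return events


# Hypothetical gene records.
profiles = [
    GeneProfile("GENE_A", -0.6, True, 0.35, -2.1),
    GeneProfile("GENE_B", 0.5, False, -0.05, 1.8),
    GeneProfile("GENE_C", 0.0, False, 0.0, 0.2),
]
for p in profiles:
    print(p.gene, supporting_mechanisms(p))
```

Genes with several concordant events (GENE_A here) are the kind of multi-level hits such integrative tools are meant to surface; genes without an expression change are left unscored.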
17.  MD-SeeGH: a platform for integrative analysis of multi-dimensional genomic data 
BMC Bioinformatics  2008;9:243.
Background
Recent advances in global genomic profiling methodologies have enabled multi-dimensional characterization of biological systems. Complete analysis of these genomic profiles requires an in-depth look at parallel profiles of segmental DNA copy number status, DNA methylation state, single nucleotide polymorphisms, and gene expression. Because these data types differ, it is difficult to conduct parallel analysis of multiple datasets from diverse platforms.
Results
To address this issue, we have developed an integrative genomic analysis platform MD-SeeGH, a software tool that allows users to rapidly and directly analyze genomic datasets spanning multiple genomic experiments. With MD-SeeGH, users have the flexibility to easily update datasets in accordance with new genomic builds, make a quality assessment of data using the filtering features, and identify genetic alterations within single or across multiple experiments. Multiple sample analysis in MD-SeeGH allows users to compare profiles from many experiments alongside tracks containing detailed localized gene information, microRNA, CpG islands, and copy number variations.
Conclusion
MD-SeeGH is a new platform for the integrative analysis of diverse microarray data, facilitating multiple profile analyses and group comparisons.
doi:10.1186/1471-2105-9-243
PMCID: PMC2408605  PMID: 18492270
18.  SIGMA: A System for Integrative Genomic Microarray Analysis of Cancer Genomes 
BMC Genomics  2006;7:324.
Background
The prevalence of high-resolution genome profiling has created a need for the integrative analysis of information generated from multiple methodologies and platforms. Although most data in the public domain are gene expression profiles, for which analysis software is available, the growing number of array CGH studies has enabled integration of high-throughput genomic and gene expression datasets. However, tools for direct mining and analysis of array CGH data are limited. Hence, there is a great need for analytical and display software tailored to cross-platform integrative analysis of cancer genomes.
Results
We have created a user-friendly Java application to facilitate sophisticated visualization and analysis, such as cross-tumor and cross-platform comparisons. To demonstrate the utility of this software, we assembled array CGH data from Affymetrix SNP chip, Stanford cDNA array, and whole-genome tiling-path array platforms for cross-comparison. This cancer genome database contains 267 profiles from commonly used cancer cell lines representing 14 different tissue types.
Conclusion
In this study we have developed an application for the visualization and analysis of data from high resolution array CGH platforms that can be adapted for analysis of multiple types of high throughput genomic datasets. Furthermore, we invite researchers using array CGH technology to deposit both their raw and processed data, as this will be a continually expanding database of cancer genomes. This publicly available resource, the System for Integrative Genomic Microarray Analysis (SIGMA) of cancer genomes, can be accessed at .
doi:10.1186/1471-2164-7-324
PMCID: PMC1764892  PMID: 17192189
19.  Large fragment Bst DNA polymerase for whole genome amplification of DNA from formalin-fixed paraffin-embedded tissues 
BMC Genomics  2006;7:312.
Background
Formalin-fixed paraffin-embedded (FFPE) tissues represent the largest source of archival biological material available for genomic studies of human cancer. Therefore, it is desirable to develop methods that enable whole genome amplification (WGA) using DNA extracted from FFPE tissues. Multiple-strand Displacement Amplification (MDA) is an isothermal method for WGA that uses the large fragment of Bst DNA polymerase. To date, MDA has been feasible only for genomic DNA isolated from fresh or snap-frozen tissue, and yields a representational distortion of less than threefold.
Results
We amplified genomic DNA of five FFPE samples of normal human lung tissue with the large fragment of Bst DNA polymerase. Using quantitative PCR, the copy number of 7 genes was evaluated in both amplified and original DNA samples. Four neuroblastoma xenograft samples derived from cell lines with known N-myc gene copy number were also evaluated, as were 7 samples of non-small cell lung cancer (NSCLC) tumors with known Skp2 gene amplification. In addition, we compared the array comparative genomic hybridization (CGH)-based genome profiles of two NSCLC samples before and after Bst MDA. A median 990-fold amplification of DNA was achieved. The DNA amplification products had a very high molecular weight (> 23 kb). When the gene content of the amplified samples was compared to that of the original samples, the representational distortion was limited to threefold. Array CGH genome profiles of amplified and non-amplified FFPE DNA were similar.
Conclusion
Large fragment Bst DNA polymerase is suitable for WGA of DNA extracted from FFPE tissues, with an expected maximal representational distortion of threefold. Amplified DNA may be used for the detection of gene copy number changes by quantitative real-time PCR and genome profiling by array CGH.
doi:10.1186/1471-2164-7-312
PMCID: PMC1764024  PMID: 17156491
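The Bst study compares gene copy number in amplified and original DNA by quantitative PCR and reports representational distortion of at most threefold. A common way to express such a comparison is the 2^-ΔΔCt calculation, normalizing each target gene to a reference locus. The sketch below assumes that approach and roughly 100% PCR efficiency (each cycle doubles the template); the Ct values are hypothetical, and this is not necessarily the calculation the authors used.

```python
def relative_representation(ct_target_amp: float, ct_ref_amp: float,
                            ct_target_orig: float, ct_ref_orig: float) -> float:
    """Representation of a target locus in amplified vs original DNA (2^-ddCt).

    Assumes ~100% PCR efficiency, so a one-cycle Ct shift means a twofold change.
    """
    delta_amp = ct_target_amp - ct_ref_amp      # target normalized to reference, amplified DNA
    delta_orig = ct_target_orig - ct_ref_orig   # same normalization, original DNA
    return 2.0 ** -(delta_amp - delta_orig)


def within_distortion(ratio: float, max_fold: float = 3.0) -> bool:
    """True if the locus is neither over- nor under-represented beyond max_fold."""
    return 1.0 / max_fold <= ratio <= max_fold


# Hypothetical Ct values for one target gene and one reference gene.
ratio = relative_representation(25.0, 24.0, 26.0, 26.0)
print(ratio, within_distortion(ratio))
```

Here the target is half as abundant, relative to the reference, after amplification, which still falls inside a threefold window in either direction.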
20.  SeeGH – A software tool for visualization of whole genome array comparative genomic hybridization data 
BMC Bioinformatics  2004;5:13.
Background
Array comparative genomic hybridization (CGH) is a technique that detects copy number differences in DNA segments. Complete sequencing of the human genome and the development of an array representing a tiling set of tens of thousands of DNA segments spanning the entire human genome have made high-resolution copy number analysis throughout the genome possible. Because array CGH provides a signal ratio for each DNA segment, visualization requires reassembling the individual data points into chromosome profiles.
Results
We have developed a visualization tool for displaying whole genome array CGH data in the context of chromosomal location. SeeGH is an application that translates spot signal ratio data from array CGH experiments to displays of high resolution chromosome profiles. Data is imported from a simple tab-delimited text file obtained from standard microarray image analysis software. SeeGH processes the signal ratio data and graphically displays it in a conventional CGH karyotype diagram with the added features of magnification and DNA segment annotation. In this process, SeeGH imports the data into a database, calculates the average ratio and standard deviation for each replicate spot, and links them to chromosome regions for graphical display. Once the data is displayed, users can hide or flag DNA segments based on user-defined criteria, and retrieve annotation information such as clone name, NCBI sequence accession number, ratio, base pair position on the chromosome, and standard deviation.
Conclusions
SeeGH represents a novel software tool used to view and analyze array CGH data. The software gives users the ability to view the data in an overall genomic view as well as magnify specific chromosomal regions facilitating the precise localization of genetic alterations. SeeGH is easily installed and runs on Microsoft Windows 2000 or later environments.
doi:10.1186/1471-2105-5-13
PMCID: PMC373529  PMID: 15040819
Array comparative genomic hybridization; aCGH
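The SeeGH abstract describes importing a tab-delimited file, averaging each replicate spot's signal ratio, computing its standard deviation, and letting users flag segments by their own criteria. The sketch below reproduces only that summarizing step and is not SeeGH's code. The column names `clone` and `ratio`, the clone identifiers, and the SD cut-off used for flagging are all hypothetical.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize_replicates(
    tsv_text: str, max_sd: float = 0.2
) -> dict[str, tuple[float, float, bool]]:
    """Average replicate spot ratios per clone and flag noisy clones.

    Expects a tab-delimited table with hypothetical 'clone' and 'ratio' columns.
    Returns {clone: (mean_ratio, sd, flagged)}; a clone is flagged when its
    replicate SD exceeds max_sd (an illustrative, user-chosen criterion).
    """
    ratios_by_clone: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        ratios_by_clone[row["clone"]].append(float(row["ratio"]))
    summary = {}
    for clone, ratios in ratios_by_clone.items():
        mean = statistics.mean(ratios)
        sd = statistics.stdev(ratios) if len(ratios) > 1 else 0.0
        summary[clone] = (mean, sd, sd > max_sd)
    return summary


# Hypothetical export: two clones, each spotted in duplicate.
example = "clone\tratio\nclone_A\t1.0\nclone_A\t1.2\nclone_B\t0.5\nclone_B\t1.5\n"
for clone, (mean, sd, flagged) in summarize_replicates(example).items():
    print(f"{clone}\tmean={mean:.2f}\tsd={sd:.2f}\tflagged={flagged}")
```

Keeping the flag as a separate field, rather than dropping noisy clones, mirrors the abstract's description of hiding or flagging segments at the user's discretion.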
21.  Estimates of penetrance for recurrent pathogenic copy-number variations 
Genetics in Medicine  2012;15(6):478-481.
Purpose:
Although an increasing number of copy-number variations are being identified as susceptibility loci for a variety of pediatric diseases, the penetrance of these copy-number variations remains mostly unknown. This poses challenges for counseling, both for recurrence risks and prenatal diagnosis. We sought to provide empiric estimates for penetrance for some of these recurrent, disease-susceptibility loci.
Methods:
We conducted a Bayesian analysis, based on the copy-number variation frequencies in control populations (n = 22,246) and in our database of >48,000 postnatal microarray-based comparative genomic hybridization samples. The background risk for congenital anomalies/developmental delay/intellectual disability was assumed to be ~5%. Copy-number variations studied were 1q21.1 proximal duplications, 1q21.1 distal deletions and duplications, 15q11.2 deletions, 16p13.11 deletions, 16p12.1 deletions, 16p11.2 proximal and distal deletions and duplications, 17q12 deletions and duplications, and 22q11.21 duplications.
Results:
Estimates for the risk of an abnormal phenotype ranged from 10.4% for 15q11.2 deletions to 62.4% for distal 16p11.2 deletions.
Conclusion:
This model can be used to provide more precise estimates of the chance of an abnormal phenotype for many copy-number variations encountered in the prenatal setting. Knowing the penetrance gives prospective parents additional, critical information in the genetic counseling session.
doi:10.1038/gim.2012.164
PMCID: PMC3664238  PMID: 23258348
copy-number variation; genomic disorder; microarray; penetrance; prenatal diagnosis
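The penetrance study describes a Bayesian analysis combining CNV frequencies in a case database and in controls with an assumed ~5% background risk. A minimal sketch of a Bayes'-theorem estimate of that kind is shown below, assuming the control carrier frequency stands in for the CNV frequency among unaffected individuals; this is an illustrative simplification, not necessarily the authors' exact model, and the carrier counts in the example are hypothetical.

```python
def penetrance(case_carriers: int, case_total: int,
               control_carriers: int, control_total: int,
               background_risk: float = 0.05) -> float:
    """Estimate P(abnormal phenotype | CNV) with Bayes' theorem.

    Assumes the control carrier frequency approximates P(CNV | unaffected).
    """
    if not 0.0 < background_risk < 1.0:
        raise ValueError("background_risk must lie strictly between 0 and 1")
    p_cnv_given_affected = case_carriers / case_total
    p_cnv_given_unaffected = control_carriers / control_total
    numerator = p_cnv_given_affected * background_risk
    denominator = numerator + p_cnv_given_unaffected * (1.0 - background_risk)
    return numerator / denominator


# Hypothetical counts: 48 carriers in 48,000 cases, 10 carriers in 20,000 controls.
p = penetrance(48, 48_000, 10, 20_000)
print(round(p, 4))  # 0.0952
```

A useful sanity check: when a CNV is equally frequent in cases and controls, the estimate collapses to the background risk, so only enrichment in the case database raises it.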
22.  Integrating the multiple dimensions of genomic and epigenomic landscapes of cancer 
Cancer Metastasis Reviews  2010;29(1):73-93.
Advances in high-throughput, genome-wide profiling technologies have allowed for an unprecedented view of the cancer genome landscape. Specifically, high-density microarrays and sequencing-based strategies have been widely utilized to identify genetic (such as gene dosage, allelic status, and mutations in gene sequence) and epigenetic (such as DNA methylation, histone modification, and micro-RNA) aberrations in cancer. Although the application of these profiling technologies in unidimensional analyses has been instrumental in cancer gene discovery, genes affected by low-frequency events are often overlooked. The integrative approach of analyzing parallel dimensions has enabled the identification of (a) genes that are often disrupted by multiple mechanisms but at low frequencies by any one mechanism and (b) pathways that are often disrupted at multiple components but at low frequencies at individual components. These benefits of using an integrative approach illustrate the concept that the whole is greater than the sum of its parts. As efforts have now turned toward parallel and integrative multidimensional approaches for studying the cancer genome landscape in hopes of obtaining a more insightful understanding of the key genes and pathways driving cancer cells, this review describes key findings stemming from such high-throughput, integrative analyses, including contributions to our understanding of causative genetic events in cancer cell biology.
doi:10.1007/s10555-010-9199-2
PMCID: PMC3415277  PMID: 20108112
Integrative analysis; Cancer genome; Sequencing; Microarray
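The review's point that a gene may be disrupted by several mechanisms, each at low frequency, can be made concrete by comparing per-mechanism frequencies with the frequency of samples affected by any mechanism. The sketch below uses hypothetical sample and gene identifiers and a hypothetical event table; it illustrates the counting only and is not a method taken from the review.

```python
def disruption_frequencies(
    events: dict[str, set[tuple[int, str]]], gene: str, n_samples: int
) -> tuple[dict[str, float], float]:
    """Per-mechanism and combined disruption frequency for one gene.

    events maps a mechanism name to the set of (sample_id, gene) pairs it alters.
    The combined frequency counts each sample once, whatever the mechanism.
    """
    per_mechanism = {}
    any_samples: set[int] = set()
    for mechanism, pairs in events.items():
        samples = {s for s, g in pairs if g == gene}
        per_mechanism[mechanism] = len(samples) / n_samples
        any_samples |= samples
    return per_mechanism, len(any_samples) / n_samples


# Hypothetical cohort of 10 tumours; sample 2 is hit by two mechanisms.
events = {
    "copy loss": {(1, "GENE_X"), (2, "GENE_X")},
    "methylation": {(2, "GENE_X"), (3, "GENE_X"), (4, "GENE_X")},
    "mutation": {(5, "GENE_X"), (6, "GENE_Y")},
}
per_mech, combined = disruption_frequencies(events, "GENE_X", 10)
print(per_mech, combined)
```

No single mechanism reaches more than 30% of samples here, yet half of the cohort has GENE_X disrupted by at least one of them, which is the pattern a unidimensional screen would tend to miss.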
23.  Divergent Genomic and Epigenomic Landscapes of Lung Cancer Subtypes Underscore the Selection of Different Oncogenic Pathways during Tumor Development 
PLoS ONE  2012;7(5):e37775.
For therapeutic purposes, non-small cell lung cancer (NSCLC) has traditionally been regarded as a single disease. However, recent evidence suggests that the two major subtypes of NSCLC, adenocarcinoma (AC) and squamous cell carcinoma (SqCC), respond differently to both molecular targeted and new generation chemotherapies. Therefore, identifying the molecular differences between these tumor types may inform novel treatment strategies. We performed the first large-scale analysis of 261 primary NSCLC tumors (169 AC and 92 SqCC), integrating genome-wide DNA copy number, methylation and gene expression profiles to identify subtype-specific molecular alterations relevant to new agent design and choice of therapy. Comparison of AC and SqCC genomic and epigenomic landscapes revealed 778 altered genes with corresponding expression changes that are selected during tumor development in a subtype-specific manner. Analysis of >200 additional NSCLCs confirmed that these genes are responsible for driving the differential development and resulting phenotypes of AC and SqCC. Importantly, we identified key oncogenic pathways disrupted in each subtype that likely serve as the basis for their differential tumor biology and clinical outcomes. Downregulation of HNF4α target genes was the most common pathway specific to AC, while SqCC demonstrated disruption of numerous histone modifying enzymes as well as the transcription factor E2F1. In silico screening of candidate therapeutic compounds using subtype-specific pathway components identified HDAC and PI3K inhibitors as potential treatments tailored to lung SqCC. Together, our findings suggest that AC and SqCC develop through distinct pathogenetic pathways that have significant implications for our approach to the clinical management of NSCLC.
doi:10.1371/journal.pone.0037775
PMCID: PMC3357406  PMID: 22629454