PMCC

Results 1-11 (11)
 

1.  Investigating and Correcting Plasma DNA Sequencing Coverage Bias to Enhance Aneuploidy Discovery 
PLoS ONE  2014;9(1):e86993.
Pregnant women carry a mixture of cell-free DNA fragments from self and fetus (non-self) in their circulation. In recent years multiple independent studies have demonstrated the ability to detect fetal trisomies such as trisomy 21, the cause of Down syndrome, by Next-Generation Sequencing of maternal plasma. The current clinical tests based on this approach show very high sensitivity and specificity, although as yet they have not become the standard diagnostic test. Here we describe improvements to the analysis of the sequencing data by reducing GC bias and better handling of the genomic repeats. We show substantial improvements in the sensitivity of the standard trisomy 21 statistical tests, which we measure by artificially reducing read coverage. We also explore the bias stemming from the natural cleavage of plasma DNA by examining DNA motifs and position specific base distributions. We propose a model to correct this fragmentation bias and observe that incorporating this bias does not lead to any further improvements in the detection of fetal trisomy. The improved bias corrections that we demonstrate in this work can be readily adopted into existing fetal trisomy detection protocols and should also lead to improvements in sub-chromosomal copy number variation detection.
doi:10.1371/journal.pone.0086993
PMCID: PMC3906086  PMID: 24489824
2.  Variable hearing impairment in a DFNB2 family with a novel MYO7A missense mutation 
Clinical genetics  2010;77(6):563-571.
Myosin VIIA mutations have been associated with non-syndromic hearing loss (DFNB2; DFNA11) and Usher syndrome type 1B (USH1B). We report clinical and genetic analyses of a consanguineous Iranian family segregating autosomal recessive non-syndromic hearing loss (ARNSHL). The hearing impairment was mapped to the DFNB2 locus using Affymetrix 50K GeneChips; direct sequencing of the MYO7A gene was completed. The Iranian family (L-1419) was shown to segregate a novel homozygous missense mutation (c.1184G>A) that results in a p.R395H amino acid substitution in the motor domain of the myosin VIIA protein. Since one affected family member had significantly less severe hearing loss, we used a candidate approach to search for a genetic modifier. This novel MYO7A mutation is the first reported to cause DFNB2 in the Iranian population, and this DFNB2 family is the first to be associated with a potential modifier. The absence of vestibular and retinal defects, and less severe low frequency hearing loss, is consistent with the phenotype of a recently reported Pakistani DFNB2 family. Thus, we conclude this family has non-syndromic hearing loss (DFNB2) rather than Usher syndrome type 1B (USH1B), providing further evidence that these two diseases represent discrete disorders.
doi:10.1111/j.1399-0004.2009.01344.x
PMCID: PMC2891191  PMID: 20132242
DFNB2; genetic modifier; MYO7A gene; missense mutation; motor domain; myosin VIIA protein; USH1B
3.  The cost of reducing starting RNA quantity for Illumina BeadArrays: A bead-level dilution experiment 
BMC Genomics  2010;11:540.
Background
The demands of microarray expression technologies for quantities of RNA place a limit on the questions they can address. As a consequence, the RNA requirements have reduced over time as technologies have improved. In this paper we investigate the costs of reducing the starting quantity of RNA for the Illumina BeadArray platform. This we do via a dilution data set generated from two reference RNA sources that have become the standard for investigations into microarray and sequencing technologies.
Results
We find that the starting quantity of RNA has an effect on observed intensities despite the fact that the quantity of cRNA being hybridized remains constant. We see a loss of sensitivity when using lower quantities of RNA, but no great rise in the false positive rate. Even with 10 ng of starting RNA, the positive results are reliable although many differentially expressed genes are missed. We see that there is some scope for combining data from samples that have contributed differing quantities of RNA, but note also that sample sizes should increase to compensate for the loss of signal-to-noise when using low quantities of starting RNA.
Conclusions
The BeadArray platform maintains a low false discovery rate even when small amounts of starting RNA are used. In contrast, the sensitivity of the platform drops off noticeably over the same range. Thus, those conducting experiments should not opt for low quantities of starting RNA without consideration of the costs of doing so. The implications for experimental design, and the integration of data from different starting quantities, are complex.
doi:10.1186/1471-2164-11-540
PMCID: PMC3091689  PMID: 20925945
4.  A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis 
Nature biotechnology  2008;26(7):779-785.
DNA methylation is an indispensable epigenetic modification of mammalian genomes. Consequently there is great interest in strategies for genome-wide/whole-genome DNA methylation analysis, and immunoprecipitation-based methods have proven to be a powerful option. Such methods are rapidly shifting the bottleneck from data generation to data analysis, necessitating the development of better analytical tools. Until now, a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling has been the inability to estimate absolute methylation levels. Here we report the development of a novel cross-platform algorithm – Bayesian Tool for Methylation Analysis (Batman) – for analyzing Methylated DNA Immunoprecipitation (MeDIP) profiles generated using arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). The latter is an approach we have developed to elucidate the first high-resolution whole-genome DNA methylation profile (DNA methylome) of any mammalian genome. MeDIP-seq/MeDIP-chip combined with Batman represent robust, quantitative, and cost-effective functional genomic strategies for elucidating the function of DNA methylation.
doi:10.1038/nbt1414
PMCID: PMC2644410  PMID: 18612301
5.  Tissue-specific splicing factor gene expression signatures 
Nucleic Acids Research  2008;36(15):4823-4832.
The alternative splicing code that controls and coordinates the transcriptome in complex multicellular organisms remains poorly understood. It has long been argued that regulation of alternative splicing relies on combinatorial interactions between multiple proteins, and that tissue-specific splicing decisions most likely result from differences in the concentration and/or activity of these proteins. However, large-scale data to systematically address this issue have just recently started to become available. Here we show that splicing factor gene expression signatures can be identified that reflect cell type and tissue-specific patterns of alternative splicing. We used a computational approach to analyze microarray-based gene expression profiles of splicing factors from mouse, chimpanzee and human tissues. Our results show that brain and testis, the two tissues with highest levels of alternative splicing events, have the largest number of splicing factor genes that are most highly differentially expressed. We further identified SR protein kinases and small nuclear ribonucleoprotein particle (snRNP) proteins among the splicing factor genes that are most highly differentially expressed in a particular tissue. These results indicate the power of generating signature-based predictions as an initial computational approach into a global view of tissue-specific alternative splicing regulation.
doi:10.1093/nar/gkn463
PMCID: PMC2528195  PMID: 18653532
6.  Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization 
Genome Biology  2007;8(10):R228.
Datasets used for detecting copy number variation (CNV) are shown to be affected by a technical artifact. A novel CNV calling algorithm is presented which removes this artifact and identifies regions of CNV better than existing methods.
Background
Large-scale high throughput studies using microarray technology have established that copy number variation (CNV) throughout the genome is more frequent than previously thought. Such variation is known to play an important role in the presence and development of phenotypes such as HIV-1 infection and Alzheimer's disease. However, methods for analyzing the complex data produced and identifying regions of CNV are still being refined.
Results
We describe the presence of a genome-wide technical artifact, spatial autocorrelation or 'wave', which occurs in a large dataset used to determine the location of CNV across the genome. By removing this artifact we are able to obtain both a more biologically meaningful clustering of the data and an increase in the number of CNVs identified by current calling methods without a major increase in the number of false positives detected. Moreover, removing this artifact is critical for the development of a novel model-based CNV calling algorithm - CNVmix - that uses cross-sample information to identify regions of the genome where CNVs occur. For regions of CNV that are identified by both CNVmix and current methods, we demonstrate that CNVmix is better able to categorize samples into groups that represent copy number gains or losses.
Conclusion
Removing artifactual 'waves' (which appear to be a general feature of array comparative genomic hybridization (aCGH) datasets) and using cross-sample information when identifying CNVs enables more biological information to be extracted from aCGH experiments designed to investigate copy number variation in normal individuals.
doi:10.1186/gb-2007-8-10-r228
PMCID: PMC2246302  PMID: 17961237
7.  MicroRNA expression profiling of human breast cancer identifies new markers of tumor subtype 
Genome Biology  2007;8(10):R214.
Integrated analysis of miRNA expression and genomic changes in human breast tumors allows the classification of tumor subtypes.
Background
MicroRNAs (miRNAs), a class of short non-coding RNAs found in many plants and animals, often act post-transcriptionally to inhibit gene expression.
Results
Here we report the analysis of miRNA expression in 93 primary human breast tumors, using a bead-based flow cytometric miRNA expression profiling method. Of 309 human miRNAs assayed, we identify 133 miRNAs expressed in human breast and breast tumors. We used mRNA expression profiling to classify the breast tumors as luminal A, luminal B, basal-like, HER2+ and normal-like. A number of miRNAs are differentially expressed between these molecular tumor subtypes and individual miRNAs are associated with clinicopathological factors. Furthermore, we find that miRNAs could classify basal versus luminal tumor subtypes in an independent data set. In some cases, changes in miRNA expression correlate with genomic loss or gain; in others, changes in miRNA expression are likely due to changes in primary transcription and/or miRNA biogenesis. Finally, the expression of DICER1 and AGO2 is correlated with tumor subtype and may explain some of the changes in miRNA expression observed.
Conclusion
This study represents the first integrated analysis of miRNA expression, mRNA expression and genomic changes in human breast cancer and may serve as a basis for functional studies of the role of miRNAs in the etiology of breast cancer. Furthermore, we demonstrate that bead-based flow cytometric miRNA expression profiling might be a suitable platform to classify breast cancer into prognostic molecular subtypes.
doi:10.1186/gb-2007-8-10-r214
PMCID: PMC2246288  PMID: 17922911
8.  High-resolution aCGH and expression profiling identifies a novel genomic subtype of ER negative breast cancer 
Genome Biology  2007;8(10):R215.
High resolution array-CGH and expression profiling identifies a novel genomic subtype of ER negative breast cancer, and provides a genome-wide list of common copy number alterations associated with aberrant expression and poor prognosis.
Background
The characterization of copy number alteration patterns in breast cancer requires high-resolution genome-wide profiling of a large panel of tumor specimens. To date, most genome-wide array comparative genomic hybridization studies have used tumor panels of relatively large tumor size and high Nottingham Prognostic Index (NPI) that are not as representative of breast cancer demographics.
Results
We performed an oligo-array-based high-resolution analysis of copy number alterations in 171 primary breast tumors of relatively small size and low NPI, which was therefore more representative of breast cancer demographics. Hierarchical clustering over the common regions of alteration identified a novel subtype of high-grade estrogen receptor (ER)-negative breast cancer, characterized by a low genomic instability index. We were able to validate the existence of this genomic subtype in one external breast cancer cohort. Using matched array expression data we also identified the genomic regions showing the strongest coordinate expression changes ('hotspots'). We show that several of these hotspots are located in the phosphatome, kinome and chromatinome, and harbor members of the 122-breast cancer CAN-list. Furthermore, we identify frequently amplified hotspots on 8q22.3 (EDD1, WDSOF1), 8q24.11-13 (THRAP6, DCC1, SQLE, SPG8) and 11q14.1 (NDUFC2, ALG8, USP35) associated with significantly worse prognosis. Amplification of any of these regions identified 37 samples with significantly worse overall survival (hazard ratio (HR) = 2.3 (1.3-1.4) p = 0.003) and time to distant metastasis (HR = 2.6 (1.4-5.1) p = 0.004) independently of NPI.
Conclusion
We present strong evidence for the existence of a novel subtype of high-grade ER-negative tumors that is characterized by a low genomic instability index. We also provide a genome-wide list of common copy number alteration regions in breast cancer that show strong coordinate aberrant expression, and further identify novel frequently amplified regions that correlate with poor prognosis. Many of the genes associated with these regions represent likely novel oncogenes or tumor suppressors.
doi:10.1186/gb-2007-8-10-r215
PMCID: PMC2246289  PMID: 17925008
9.  Missing channels in two-colour microarray experiments: Combining single-channel and two-channel data 
BMC Bioinformatics  2007;8:26.
Background
There are mechanisms, notably ozone degradation, that can damage a single channel of two-channel microarray experiments. Resulting analyses therefore often choose between the unacceptable inclusion of poor quality data or the unpalatable exclusion of some (possibly a lot of) good quality data along with the bad. Two such approaches would be a single channel analysis using some of the data from all of the arrays, and an analysis of all of the data, but only from unaffected arrays. In this paper we examine a 'combined' approach to the analysis of such affected experiments that uses all of the unaffected data.
Results
A simulation experiment shows that while a single channel analysis performs relatively well when the majority of arrays are affected, and excluding affected arrays performs relatively well when few arrays are affected (as would be expected in both cases), the combined approach out-performs both. There are benefits to actively estimating the key parameter of the approach, but whether these compensate for the increased computational cost and complexity over just setting that parameter to take a fixed value is not clear. Inclusion of ozone-affected data results in poor performance, with a clear spatial effect in the damage being apparent.
Conclusion
There is no need to exclude unaffected data in order to remove those which are damaged. The combined approach discussed here is shown to out-perform more usual approaches, although it seems that if the damage is limited to very few arrays, or extends to very nearly all, then the benefits will be limited. In other circumstances though, large improvements in performance can be achieved by adopting such an approach.
doi:10.1186/1471-2105-8-26
PMCID: PMC1797192  PMID: 17254358
10.  MMASS: an optimized array-based method for assessing CpG island methylation 
Nucleic Acids Research  2006;34(20):e136.
We describe an optimized microarray method for identifying genome-wide CpG island methylation called microarray-based methylation assessment of single samples (MMASS) which directly compares methylated to unmethylated sequences within a single sample. To improve previous methods we used bioinformatic analysis to predict an optimized combination of methylation-sensitive enzymes that had the highest utility for CpG-island probes and different methods to produce unmethylated representations of test DNA for more sensitive detection of differential methylation by hybridization. Subtraction or methylation-dependent digestion with McrBC was used with optimized (MMASS-v2) or previously described (MMASS-v1, MMASS-sub) methylation-sensitive enzyme combinations and compared with a published McrBC method. Comparison was performed using DNA from the cell line HCT116. We show that the distribution of methylation microarray data is inherently skewed and requires exogenous spiked controls for normalization and that analysis of digestion of methylated and unmethylated control sequences together with linear fit models of replicate data showed superior statistical power for the MMASS-v2 method. Comparison with previous methylation data for HCT116 and validation of CpG islands from PXMP4, SFRP2, DCC, RARB and TSEN2 confirmed the accuracy of MMASS-v2 results. The MMASS-v2 method offers improved sensitivity and statistical power for high-throughput microarray identification of differential methylation.
doi:10.1093/nar/gkl551
PMCID: PMC1635254  PMID: 17041235
11.  Cell Cycle Genes Are the Evolutionarily Conserved Targets of the E2F4 Transcription Factor 
PLoS ONE  2007;2(10):e1061.
Maintaining quiescent cells in G0 phase is achieved in part through the multiprotein subunit complex known as DREAM, and in human cell lines the transcription factor E2F4 directs this complex to its cell cycle targets. We found that E2F4 binds a highly overlapping set of human genes among three diverse primary tissues and an asynchronous cell line, which suggests that tissue-specific binding partners and chromatin structure have minimal influence on E2F4 targeting. To investigate the conservation of these transcription factor binding events, we identified the mouse genes bound by E2f4 in seven primary mouse tissues and a cell line. E2f4 bound a set of mouse genes that was common among mouse tissues, but largely distinct from the genes bound in human. The evolutionarily conserved set of E2F4 bound genes is highly enriched for functionally relevant regulatory interactions important for maintaining cellular quiescence. In contrast, we found minimal mRNA expression perturbations in this core set of E2f4 bound genes in the liver, kidney, and testes of E2f4 null mice. Thus, the regulatory mechanisms maintaining quiescence are robust even to complete loss of conserved transcription factor binding events.
doi:10.1371/journal.pone.0001061
PMCID: PMC2020443  PMID: 17957245
