1.  G-CSF Receptor Positive Neuroblastoma Subpopulations are Enriched in Chemotherapy-Resistant or Relapsed Tumors and are Highly Tumorigenic
Cancer research  2013;73(13):4134-4146.
Neuroblastoma is a neural crest-derived embryonal malignancy which accounts for 13% of all pediatric cancer mortality, primarily due to tumor recurrence. Therapy-resistant cancer stem cells are implicated in tumor relapse, but definitive phenotypic evidence of the existence of these cells has been lacking. In this study, we define a highly tumorigenic subpopulation in neuroblastoma with stem cell characteristics, based on the expression of CD114, which encodes the receptor for granulocyte colony-stimulating factor (G-CSF). CD114+ cells isolated from a primary tumor and the NGP cell line by flow cytometry were highly tumorigenic and capable of both self-renewal and differentiation to progeny cells. CD114+ cells closely resembled embryonic and induced pluripotent stem cells with respect to their profiles of cell cycle, microRNA and gene expression. In addition, they reflect a primitive undifferentiated neuroectodermal/neural crest phenotype, revealing a developmental hierarchy within neuroblastoma tumors. We detected this de-differentiated neural crest subpopulation in all established neuroblastoma cell lines, xenograft tumors, and primary tumor specimens analyzed. Ligand activation of CD114 by the addition of exogenous G-CSF to CD114+ cells confirmed intact STAT3 upregulation, characteristic of G-CSF receptor signaling. Together, our data describe a novel, distinct subpopulation within neuroblastoma with enhanced tumorigenicity and a stem cell-like phenotype, further elucidating the complex heterogeneity of solid tumors such as neuroblastoma. We propose that this subpopulation may represent an additional target for novel therapeutic approaches to this aggressive pediatric malignancy.
PMCID: PMC4298227  PMID: 23687340
2.  An epigenomic approach to therapy for tamoxifen-resistant breast cancer 
Cell Research  2014;24(7):809-819.
Tamoxifen has been a frontline treatment for estrogen receptor alpha (ERα)-positive breast tumors in premenopausal women. However, resistance to tamoxifen occurs in many patients. ER still plays a critical role in the growth of breast cancer cells with acquired tamoxifen resistance, suggesting that ERα remains a valid target for treatment of tamoxifen-resistant (Tam-R) breast cancer. In an effort to identify novel regulators of ERα signaling, through a small-scale siRNA screen against histone methyl modifiers, we found WHSC1, a histone H3K36 methyltransferase, as a positive regulator of ERα signaling in breast cancer cells. We demonstrated that WHSC1 is recruited to the ERα gene by the BET protein BRD3/4, and facilitates ERα gene expression. The small-molecule BET protein inhibitor JQ1 potently suppressed the classic ERα signaling pathway and the growth of Tam-R breast cancer cells in culture. Using a Tam-R breast cancer xenograft mouse model, we demonstrated in vivo anti-breast cancer activity by JQ1 and a strong long-lasting effect of combination therapy with JQ1 and the ER degrader fulvestrant. Taken together, we provide evidence that the epigenomic proteins BRD3/4 and WHSC1 are essential regulators of estrogen receptor signaling and are novel therapeutic targets for treatment of Tam-R breast cancer.
PMCID: PMC4085766  PMID: 24874954
epigenomic; tamoxifen; breast cancer
3.  Analysis of interactions between the epigenome and structural mutability of the genome using Genboree workbench tools 
BMC Bioinformatics  2014;15(Suppl 7):S2.
Interactions between the epigenome and structural genomic variation are potentially bi-directional. In one direction, structural variants may cause epigenomic changes in cis. In the other direction, specific local epigenomic states such as DNA hypomethylation associate with local genomic instability.
To study these interactions, we have developed several tools and exposed them to the scientific community using the Software-as-a-Service model via the Genboree Workbench. One key tool is Breakout, an algorithm for fast and accurate detection of structural variants from mate pair sequencing data.
By applying Breakout and other Genboree Workbench tools we map breakpoints in breast and prostate cancer cell lines and tumors, discriminate between polymorphic breakpoints of germline origin and those of somatic origin, and analyze both types of breakpoints in the context of the Human Epigenome Atlas, ENCODE databases, and other sources of epigenomic profiles. We confirm previous findings that genomic instability in human germline associates with hypomethylation of DNA, binding sites of Suz12, a key member of the PRC2 Polycomb complex, and with PRC2-associated histone marks H3K27me3 and H3K9me3. Breakpoints in germline and in breast cancer associate with distal regulatory elements of active gene transcription. Breast cancer cell lines and tumors show distinct patterns of structural mutability depending on their ER, PR, or HER2 status.
The patterns of association that we detected suggest that cell-type specific epigenomes may determine cell-type specific patterns of selective structural mutability of the genome.
PMCID: PMC4110728  PMID: 25080362
4.  mtDNA haplogroup and single nucleotide polymorphisms structure human microbiome communities 
BMC Genomics  2014;15:257.
Although our microbial community and genomes (the human microbiome) outnumber our genome by several orders of magnitude, the extent to which the human host genetic complement informs the microbiota composition is not clear. The Human Microbiome Project (HMP) Consortium established a unique population-scale framework with which to characterize the relationship of microbial community structure with their human hosts. A wide variety of taxa and metabolic pathways have been shown to be differentially distributed by virtue of race/ethnicity in the HMP. Given that mtDNA haplogroups are maternally derived ancestral genomic markers and that mitochondria generate cellular ATP, characterizing the relationship between human mtDNA genomic variants and microbiome profiles is of potentially marked biologic and clinical interest.
We leveraged sequencing data from the HMP to investigate the association between microbiome community structure and host mtDNA variants. We accurately identified 15 haplogroups and 631 mtDNA nucleotide polymorphisms (mean sequencing depth of 280X on the mitochondrial genome) from 89 individuals participating in the HMP. Microbiome taxonomy profiles generated by 16S rRNA (V3-V5 region) sequencing and metabolic profiles generated by whole genome shotgun sequencing, from various body sites, were treated as traits in association analyses with haplogroups and host clinical metadata through linear regression. The mtSNPs of individuals with European haplogroups were associated with microbiome profiles using PLINK quantitative trait associations with permutation and adjusted for multiple comparisons. We observe that among 139 stool and 59 vaginal posterior fornix samples, several haplogroups show significant association with specific microbiota (q-value < 0.05) as well as their aggregate community structure (Chi-square with Monte Carlo, p < 0.005), which confirmed and expanded previous research on the association of race and ethnicity with microbiome profile. Our results further indicate that mtDNA variations may render different microbiome profiles, possibly through an inflammatory response to different levels of reactive oxygen species activity.
These data provide initial evidence for the association between the host ancestral genome and the structure of its microbiome.
PMCID: PMC4234434  PMID: 24694284
HMP; Mitochondrial DNA haplogroup; Association; Microbiome; mtDNA SNP
5.  CDKN2D-WDFY2 Is a Cancer-Specific Fusion Gene Recurrent in High-Grade Serous Ovarian Carcinoma 
PLoS Genetics  2014;10(3):e1004216.
Ovarian cancer is the fifth leading cause of cancer death in women. Almost 70% of ovarian cancer deaths are due to the high-grade serous subtype, which is typically detected only after it has metastasized. Characterization of high-grade serous cancer is further complicated by the significant heterogeneity and genome instability displayed by this cancer. Other than mutations in TP53, which are common to many cancers, highly recurrent recombinant events specific to this cancer have yet to be identified. Using high-throughput transcriptome sequencing of seven patient samples combined with experimental validation at DNA, RNA and protein levels, we identified a cancer-specific and inter-chromosomal fusion gene, CDKN2D-WDFY2, that occurs at a frequency of 20% among sixty high-grade serous cancer samples but is absent in non-cancerous ovary and fallopian tube samples. This is the most frequent recombinant event identified so far in high-grade serous cancer, implying a major cellular lineage in this highly heterogeneous cancer. In addition, the same fusion transcript was also detected in OV-90, an established high-grade serous type cell line. The genomic breakpoint was identified in intron 1 of CDKN2D and intron 2 of WDFY2 in patient tumor, providing direct evidence that this is a fusion gene. The parental gene, CDKN2D, is a cell-cycle modulator that is also involved in DNA repair, while WDFY2 is known to modulate AKT interactions with its substrates. Transfection of the cloned fusion construct led to loss of wildtype CDKN2D and wildtype WDFY2 protein expression, and a gain of a short WDFY2 protein isoform that is presumably under the control of the CDKN2D promoter. The expression of short WDFY2 protein in transfected cells appears to alter the PI3K/AKT pathway that is known to play a role in oncogenesis. The CDKN2D-WDFY2 fusion could be an important molecular signature for understanding and classifying sub-lineages among heterogeneous high-grade serous ovarian carcinomas.
Author Summary
High-grade serous carcinoma (HG-SC) is the most common subtype of ovarian cancer observed in women. This subtype of ovarian cancer is typically detected at advanced stages due to lack of effective early screening tools. Recurrent cancer-specific gene fusions resulting from chromosomal translocations have the potential to serve as effective screening tools as well as therapeutic targets. Here we identified CDKN2D-WDFY2 as a cancer-specific fusion gene present in 20% of HG-SC tumors, by far the most frequent gene recombinant event found in this highly heterogeneous disease. We also presented evidence that the expression of this fusion may affect the PI3K/AKT pathway that is important for cancer progression. Thus CDKN2D-WDFY2 could very well represent a major cellular lineage important for detecting and classifying heterogeneous ovarian carcinomas, and could provide insight into the underlying mechanism of this deadly disease. This is critical, given that ovarian cancer kills 140,200 women worldwide each year, and few ovarian cancer-specific molecular alterations are currently available for targeting and screening.
PMCID: PMC3967933  PMID: 24675677
6.  Comparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing 
Nucleic Acids Research  2014;42(6):e43.
Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitative accuracy has been reported. We sequenced bisulfite-converted DNA from two tissues from each of two healthy human adults and systematically compared five widely used Bisulfite-seq mapping algorithms: Bismark, BSMAP, Pash, BatMeth and BS Seeker. We evaluated their computational speed and genomic coverage and verified their percentage methylation estimates. With the exception of BatMeth, all mappers covered >70% of CpG sites genome-wide and yielded highly concordant estimates of percentage methylation (r² ≥ 0.95). Fourfold variation in mapping time was found between BSMAP (fastest) and Pash (slowest). In each library, 8–12% of genomic regions covered by Bismark and Pash were not covered by BSMAP. An experiment using simulated reads confirmed that Pash has an exceptional ability to uniquely map reads in genomic regions of structural variation. Independent verification by bisulfite pyrosequencing generally confirmed the percentage methylation estimates by the mappers. Of these algorithms, Bismark provides an attractive combination of processing speed, genomic coverage and quantitative accuracy, whereas Pash offers considerably higher genomic coverage.
PMCID: PMC3973287  PMID: 24391148
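As a rough illustration of the two quantities this abstract compares across mappers, the sketch below computes percentage methylation at each CpG site from read counts and the squared Pearson correlation (r²) between two sets of estimates. It is a minimal sketch, not the authors' pipeline: the function names, the count format, and the read counts are all hypothetical.

```python
def percent_methylation(methylated, unmethylated):
    """Percentage of reads covering a CpG site that support methylation."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("site has no read coverage")
    return 100.0 * methylated / total


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return (cov * cov) / (var_x * var_y)


# Hypothetical (methylated, unmethylated) read counts at the same four
# CpG sites, as reported by two different mappers. Not data from the study.
mapper_a_counts = [(9, 1), (2, 8), (5, 5), (0, 10)]
mapper_b_counts = [(18, 2), (3, 7), (6, 6), (1, 9)]

mapper_a = [percent_methylation(m, u) for m, u in mapper_a_counts]
mapper_b = [percent_methylation(m, u) for m, u in mapper_b_counts]

print(f"r^2 between mappers: {r_squared(mapper_a, mapper_b):.3f}")
```

Only sites covered by both mappers can enter such a comparison, which is one reason the abstract reports genomic coverage separately from concordance.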
7.  Chromatin Changes in Dicer-Deficient Mouse Embryonic Stem Cells in Response to Retinoic Acid Induced Differentiation 
PLoS ONE  2013;8(9):e74556.
Loss of Dicer, an enzyme critical for microRNA biogenesis, results in lethality due to a block in mouse embryonic stem cell (mES) differentiation. Using ChIP-Seq we found increased H3K9me2 at over 900 CpG islands in the Dicer-/- ES epigenome. Gene ontology analysis revealed promoters of chromatin regulators to be among the most impacted by increased CpG island H3K9me2 in ES (Dicer-/-). We therefore extended the study to include H3K4me3 and H3K27me3 marks for selected genes. We found that the ES (Dicer-/-) mutant epigenome was characterized by a shift in the overall balance between transcriptionally favorable (H3K4me3) and unfavorable (H3K27me3) marks at key genes regulating ES cell differentiation. Pluripotency genes Oct4, Sox2 and Nanog were not impacted in relation to patterns of H3K27me3 and H3K4me3 and showed no changes in the rates of transcript down-regulation in response to RA. The most striking changes were observed in genes regulating differentiation and the transition from self-renewal to differentiation. An increase in H3K4me3 at the promoter of Lin28b was associated with the down-regulation of this gene at a lower rate in Dicer-/- ES as compared to wild type ES. An increase in H3K27me3 in the promoters of differentiation genes Hoxa1 and Cdx2 in Dicer-/- ES cells was coincident with an inability to up-regulate these genes at the same rate as ES upon retinoic acid (RA)-induced differentiation. We found that siRNAs targeting Ezh2 and post-transcriptional silencing of Ezh2 by let-7g rescued this effect, suggesting that Ezh2 up-regulation is in part responsible for increased H3K27me3 and decreased rates of up-regulation of differentiation genes in Dicer-/- ES.
PMCID: PMC3767645  PMID: 24040281
8.  Expression and phosphorylation of the AS160_v2 splice variant supports GLUT4 activation and the Warburg effect in multiple myeloma 
Cancer & Metabolism  2013;1:14.
Multiple myeloma (MM) is a fatal plasma cell malignancy exhibiting enhanced glucose consumption associated with an aerobic glycolytic phenotype (i.e., the Warburg effect). We have previously demonstrated that myeloma cells exhibit constitutive plasma membrane (PM) localization of GLUT4, consistent with the dependence of MM cells on this transporter for maintenance of glucose consumption rates, proliferative capacity, and viability. The purpose of this study was to investigate the molecular basis of constitutive GLUT4 plasma membrane localization in MM cells.
We have elucidated a novel mechanism through which myeloma cells achieve constitutive GLUT4 activation involving elevated expression of the Rab-GTPase activating protein AS160_v2 splice variant to promote the Warburg effect. AS160_v2-positive MM cell lines display constitutive Thr642 phosphorylation, known to be required for inactivation of AS160 Rab-GAP activity. Importantly, we show that enforced expression of AS160_v2 is required for GLUT4 PM translocation and activation in these select MM lines. Furthermore, we demonstrate that ectopic expression of a full-length, phospho-deficient AS160 mutant is sufficient to impair constitutive GLUT4 cell surface residence, which is characteristic of MM cells.
This is the first study to tie AS160 de-regulation to increased glucose consumption rates and the Warburg effect in cancer. Future studies investigating connections between the insulin/IGF-1/AS160_v2/GLUT4 axis and FDG-PET positivity in myeloma patients are warranted and could provide rationale for therapeutically targeting this pathway in MM patients with advanced disease.
PMCID: PMC4178207  PMID: 24280290
10.  The Repertoire and Features of Human Platelet microRNAs 
PLoS ONE  2012;7(12):e50746.
Playing a central role in the maintenance of hemostasis as well as in thrombotic disorders, platelets contain a relatively diverse messenger RNA (mRNA) transcriptome as well as functional mRNA-regulatory microRNAs, suggesting that platelet mRNAs may be regulated by microRNAs. Here, we elucidated the complete repertoire and features of human platelet microRNAs by high-throughput sequencing. More than 492 different mature microRNAs were detected in human platelets, whereas the list of known human microRNAs was expanded further by the discovery of 40 novel microRNA sequences. As in nucleated cells, platelet microRNAs bear signs of post-transcriptional modifications, mainly terminal adenylation and uridylation. In vitro enzymatic assays demonstrated the ability of human platelets to uridylate microRNAs, which correlated with the presence of the uridyltransferase enzyme TUT4. We also detected numerous microRNA isoforms (isomiRs) resulting from imprecise Drosha and/or Dicer processing, in some cases more frequently than the reference microRNA sequence, including 5′ shifted isomiRs with redirected mRNA targeting abilities. This study unveils the existence of a relatively diverse and complex microRNA repertoire in human platelets, and represents a mandatory step towards elucidating the intraplatelet and extraplatelet role, function and importance of platelet microRNAs.
PMCID: PMC3514217  PMID: 23226537
Gastroenterology  2011;141(5):1782-1791.
The intestinal microbiomes of healthy children and pediatric patients with irritable bowel syndrome (IBS) are not well defined. Studies in adults have indicated that the gastrointestinal microbiota could be involved in IBS.
We analyzed 71 samples from 22 children with IBS (pediatric Rome III criteria) and 22 healthy children, ages 7–12 years, by 16S rRNA gene sequencing, with an average of 54,287 reads/stool sample (average 454 read length = 503 bases). Data were analyzed using phylogenetic-based clustering (Unifrac), or an operational taxonomic unit (OTU) approach using a supervised machine learning tool (randomForest). Most samples were also hybridized to a microarray that can detect 8,741 bacterial taxa (16S rRNA PhyloChip).
Microbiomes associated with pediatric IBS were characterized by a significantly greater percentage of the class Gammaproteobacteria (0.07% vs 0.89% of total bacteria; P <.05); one prominent component of this group was Haemophilus parainfluenzae. Differences highlighted by 454 sequencing were confirmed by high-resolution PhyloChip analysis. Using supervised learning techniques, we were able to classify different subtypes of IBS with a success rate of 98.5%, using limited sets of discriminant bacterial species. A novel Ruminococcus-like microbe was associated with IBS, indicating the potential utility of microbe discovery for gastrointestinal disorders. A greater frequency of pain correlated with an increased abundance of several bacterial taxa from the genus Alistipes.
Using 16S metagenomics by PhyloChip DNA hybridization and deep 454 pyrosequencing, we associated specific microbiome signatures with pediatric IBS. These findings indicate the important association between gastrointestinal microbes and IBS in children; these approaches might be used in diagnosis of functional bowel disorders in pediatric patients.
PMCID: PMC3417828  PMID: 21741921
12.  Atlas2 Cloud: a framework for personal genome analysis in the cloud 
BMC Genomics  2012;13(Suppl 6):S19.
Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation, remain a serious bottleneck. Cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues.
We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set.
We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms.
PMCID: PMC3481437  PMID: 23134663
13.  The Genboree Microbiome Toolset and the analysis of 16S rRNA microbial sequences 
BMC Bioinformatics  2012;13(Suppl 13):S11.
Microbial metagenomic analyses rely on an increasing number of publicly available tools. Installation, integration, and maintenance of the tools pose a significant burden on many researchers and create a barrier to adoption of microbiome analysis, particularly in translational settings.
To address this need we have integrated a rich collection of microbiome analysis tools into the Genboree Microbiome Toolset and exposed them to the scientific community using the Software-as-a-Service model via the Genboree Workbench. The Genboree Microbiome Toolset provides an interactive environment for users at all bioinformatic experience levels in which to conduct microbiome analysis. The Toolset drives hypothesis generation by providing a wide range of analyses including alpha diversity and beta diversity, phylogenetic profiling, supervised machine learning, and feature selection.
We validate the Toolset in two studies of the gut microbiota, one involving obese and lean twins, and the other involving children suffering from the irritable bowel syndrome.
By lowering the barrier to performing a comprehensive set of microbiome analyses, the Toolset empowers investigators to translate high-volume sequencing data into valuable biomedical discoveries.
PMCID: PMC3426808  PMID: 23320832
14.  Functional annotation of the human brain methylome identifies tissue-specific epigenetic variation across brain and blood 
Genome Biology  2012;13(6):R43.
Dynamic changes to the epigenome play a critical role in establishing and maintaining cellular phenotype during differentiation, but little is known about the normal methylomic differences that occur between functionally distinct areas of the brain. We characterized intra- and inter-individual methylomic variation across whole blood and multiple regions of the brain from multiple donors.
Distinct tissue-specific patterns of DNA methylation were identified, with a highly significant over-representation of tissue-specific differentially methylated regions (TS-DMRs) observed at intragenic CpG islands and low CG density promoters. A large proportion of TS-DMRs were located near genes that are differentially expressed across brain regions. TS-DMRs were significantly enriched near genes involved in functional pathways related to neurodevelopment and neuronal differentiation, including BDNF, BMP4, CACNA1A, CACA1AF, EOMES, NGFR, NUMBL, PCDH9, SLIT1, SLITRK1 and SHANK3. Although between-tissue variation in DNA methylation was found to greatly exceed between-individual differences within any one tissue, we found that some inter-individual variation was reflected across brain and blood, indicating that peripheral tissues may have some utility in epidemiological studies of complex neurobiological phenotypes.
This study reinforces the importance of DNA methylation in regulating cellular phenotype across tissues, and highlights genomic patterns of epigenetic variation across functionally distinct regions of the brain, providing a resource for the epigenetics and neuroscience research communities.
PMCID: PMC3446315  PMID: 22703893
15.  A Metagenomic Approach to Characterization of the Vaginal Microbiome Signature in Pregnancy 
PLoS ONE  2012;7(6):e36466.
While current major national research efforts (i.e., the NIH Human Microbiome Project) will enable comprehensive metagenomic characterization of the adult human microbiota, how and when these diverse microbial communities take up residence in the host and during reproductive life are unexplored at a population level. Because microbial abundance and diversity might differ in pregnancy, we sought to generate comparative metagenomic signatures across gestational age strata. DNA was isolated from the vagina (introitus, posterior fornix, midvagina) and the V5V3 region of bacterial 16S rRNA genes was sequenced (454FLX Titanium platform). Sixty-eight samples from 24 healthy gravidae (18 to 40 confirmed weeks) were compared with 301 non-pregnant controls (60 subjects). Generated sequence data were quality filtered, taxonomically binned, normalized, and organized by phylogeny and into operational taxonomic units (OTU); principal coordinates analysis (PCoA) of the resultant beta diversity measures was used for visualization and analysis in association with sample clinical metadata. Altogether, 1.4 gigabytes of data containing >2.5 million reads (averaging 6,837 sequences/sample of 493 nt in length) were generated for computational analyses. Although gravidae were not excluded by virtue of a posterior fornix pH >4.5 at the time of screening, a unique vaginal microbiome signature encompassing several specific OTUs and higher-level clades was nevertheless observed and confirmed using a combination of phylogenetic, non-phylogenetic, supervised, and unsupervised approaches. Both overall diversity and richness were reduced in pregnancy, with dominance of Lactobacillus species (L. iners, crispatus, jensenii and johnsonii), and the orders Lactobacillales (and Lactobacillaceae family), Clostridiales, Bacteroidales, and Actinomycetales.
This intergroup comparison using rigorous standardized sampling protocols and analytical methodologies provides robust initial evidence that the vaginal microbial 16S rRNA gene catalogue uniquely differs in pregnancy, with variance of taxa across vaginal subsite and gestational age.
PMCID: PMC3374618  PMID: 22719832
16.  Genomic Hypomethylation in the Human Germline Associates with Selective Structural Mutability in the Human Genome 
PLoS Genetics  2012;8(5):e1002692.
The hotspots of structural polymorphisms and structural mutability in the human genome remain to be explained mechanistically. We examine associations of structural mutability with germline DNA methylation and with non-allelic homologous recombination (NAHR) mediated by low-copy repeats (LCRs). Combined evidence from four human sperm methylome maps, human genome evolution, structural polymorphisms in the human population, and previous genomic and disease studies consistently points to a strong association of germline hypomethylation and genomic instability. Specifically, methylation deserts, the ∼1% fraction of the human genome with the lowest methylation in the germline, show a tenfold enrichment for structural rearrangements that occurred in the human genome since the branching of chimpanzee and are highly enriched for fast-evolving loci that regulate tissue-specific gene expression. Analysis of copy number variants (CNVs) from 400 human samples identified using a custom-designed array comparative genomic hybridization (aCGH) chip, combined with publicly available structural variation data, indicates that association of structural mutability with germline hypomethylation is comparable in magnitude to the association of structural mutability with LCR–mediated NAHR. Moreover, rare CNVs occurring in the genomes of individuals diagnosed with schizophrenia, bipolar disorder, and developmental delay and de novo CNVs occurring in those diagnosed with autism are significantly more concentrated within hypomethylated regions. These findings suggest a new connection between the epigenome, selective mutability, evolution, and human disease.
Author Summary
The human genome contains many loci with high incidence of structural mutations, including insertions and deletions of chromosomal segments. This excessive mutability has accelerated evolution and contributed to human disease but has yet to be explained. Segments of DNA repeated in low-copy numbers (LCRs) have been previously implicated in promoting structural mutability in specific disease-associated loci. Lack of methylation (hypomethylation) of genomic DNA has been previously associated with high structural mutability in gibbons and in human cancer cells, but the association with structural mutability in the human germline has not been explored prior to this study. Our analyses confirm the role of LCRs in promoting structural mutability on the genome scale but also reveal a surprisingly strong association of genomic instability with hypomethylation. Specifically, evolutionary analyses reveal that methylation deserts, the ∼1% fraction of the human genome with the lowest methylation in human sperm, harbor a tenfold higher number of structural mutations than genome-wide average. Moreover, the structural mutations in individuals diagnosed with schizophrenia, bipolar disorder, developmental delay, and autism are significantly more concentrated within hypomethylated regions. Our findings suggest a new connection between methylation of genomic DNA, selective structural mutability, evolution, and human disease.
PMCID: PMC3355074  PMID: 22615578
17.  An integrative variant analysis suite for whole exome next-generation sequencing data 
BMC Bioinformatics  2012;13:8.
Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although the exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data.
Using statistical models trained on validated whole-exome capture sequencing data, the Atlas2 Suite is an integrative variant analysis pipeline optimized for variant discovery on all three of the widely used next generation sequencing platforms (SOLiD, Illumina, and Roche 454). The suite employs logistic regression models in conjunction with user-adjustable cutoffs to accurately separate true SNPs and INDELs from sequencing and mapping errors with high sensitivity (96.7%).
We have implemented the Atlas2 Suite and applied it to 92 whole exome samples from the 1000 Genomes Project. The Atlas2 Suite is available for download. In addition to a command line version, the suite has been integrated into the Genboree Workbench, allowing biomedical scientists with minimal informatics expertise to remotely call, view, and further analyze variants through a simple web interface. The existing genomic databases displayed via the Genboree browser also streamline the process from variant discovery to functional genomics analysis, resulting in an off-the-shelf toolkit for the broader community.
PMCID: PMC3292476  PMID: 22239737
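The abstract of entry 17 describes logistic regression models combined with user-adjustable cutoffs for separating true variants from errors. As a rough illustration only, the sketch below scores one candidate site and applies a cutoff. The feature names, the coefficients, and the function names are hypothetical and are not taken from the Atlas2 Suite; the trained models and their inputs are not given in the abstract.

```python
import math

# Hypothetical coefficients for illustration; NOT the trained Atlas2 model.
INTERCEPT = -4.0
WEIGHTS = {
    "mean_base_quality": 0.1,      # assumed feature: mean quality of variant bases
    "variant_read_fraction": 5.0,  # assumed feature: fraction of reads with the variant
}


def logistic(z):
    """Standard logistic function mapping a linear score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def variant_probability(features):
    """Return the modeled probability that a candidate site is a true variant."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return logistic(z)


def call_variant(features, cutoff=0.5):
    """Accept the candidate if its probability reaches the user-chosen cutoff."""
    return variant_probability(features) >= cutoff


candidate = {"mean_base_quality": 30.0, "variant_read_fraction": 0.45}
print(call_variant(candidate, cutoff=0.5), call_variant(candidate, cutoff=0.9))
```

Raising the cutoff trades sensitivity for specificity: the same candidate is accepted at 0.5 and rejected at 0.9.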
18.  MicroRNA transcriptome in the newborn mouse ovaries determined by massive parallel sequencing 
Molecular Human Reproduction  2010;16(7):463-471.
Small non-coding RNAs, such as microRNAs (miRNAs), are involved in diverse biological processes including organ development and tissue differentiation. Global disruption of miRNA biogenesis in Dicer knockout mice disrupts early embryogenesis and primordial germ cell formation. However, the role of miRNAs in early folliculogenesis is poorly understood. In order to identify a full transcriptome set of small RNAs expressed in the newborn (NB) ovary, we extracted the small RNA fraction from mouse NB ovary tissues and subjected it to massive parallel sequencing using the Genome Analyzer from Illumina. Massive sequencing produced 4 655 992 reads of 33 bp each, representing a total of 154 Mbp of sequence data. The Pash alignment algorithm mapped 50.13% of the reads to the mouse genome. Sequence reads were clustered based on overlapping mapping coordinates and intersected with known miRNAs, small nucleolar RNAs (snoRNAs), piwi-interacting RNA (piRNA) clusters and repetitive genomic regions; 25.2% of the reads mapped to known miRNAs, 25.5% to genomic repeats, 3.5% to piRNAs and 0.18% to snoRNAs. The sequenced small RNAs included 398 known miRNA species and 118 isomiR sequences that are not in the miRBase database. The let-7 family was the most abundantly expressed miRNA, and the mmu-mir-672, mmu-mir-322, mmu-mir-503 and mmu-mir-465 families were the most abundant X-linked miRNAs detected. The X-linked mmu-mir-503, mmu-mir-672 and mmu-mir-465 families showed preferential expression in testes and ovaries. We also identified four novel miRNAs that are preferentially expressed in gonads. Gonadal selective miRNAs may play important roles in ovarian development, folliculogenesis and female fertility.
PMCID: PMC2882868  PMID: 20215419
miRNA; ovary; oocyte; microRNA; ncRNA
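Entry 18 reports read totals and a clustering of reads by overlapping mapping coordinates. The sketch below checks the quoted total (4 655 992 reads of 33 bp) and shows one simple way such clustering could be done. The function name, the tuple layout, and the example coordinates are assumptions made for illustration; the abstract does not describe the authors' actual procedure.

```python
def cluster_overlapping(mappings):
    """Merge overlapping (chrom, start, end) mappings into clusters.

    Coordinates are treated as closed intervals. Each cluster is returned as
    (chrom, start, end, read_count).
    """
    clusters = []
    for chrom, start, end in sorted(mappings):
        if clusters and clusters[-1][0] == chrom and start <= clusters[-1][2]:
            # Overlaps the previous cluster on the same chromosome: extend it.
            _, c_start, c_end, count = clusters[-1]
            clusters[-1] = (chrom, c_start, max(c_end, end), count + 1)
        else:
            clusters.append((chrom, start, end, 1))
    return clusters


# Sequence total quoted in the abstract: reads x read length.
total_bases = 4_655_992 * 33
print(total_bases)  # 153647736, i.e. about 154 Mbp

# Made-up mappings of 33 bp reads, for illustration only.
example = [("chr1", 120, 152), ("chr1", 100, 132), ("chrX", 10, 42), ("chr1", 500, 532)]
print(cluster_overlapping(example))
```

Sorting first means each mapping only needs to be compared with the most recent cluster, so the clustering itself is a single pass over the sorted list.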
19.  Song exposure regulates known and novel microRNAs in the zebra finch auditory forebrain 
BMC Genomics  2011;12:277.
In an important model for neuroscience, songbirds learn to discriminate songs they hear during tape-recorded playbacks, as demonstrated by song-specific habituation of both behavioral and neurogenomic responses in the auditory forebrain. We hypothesized that microRNAs (miRNAs or miRs) may participate in the changing pattern of gene expression induced by song exposure. To test this, we used massively parallel Illumina sequencing to analyse small RNAs from auditory forebrain of adult zebra finches exposed to tape-recorded birdsong or silence.
In the auditory forebrain, we identified 121 known miRNAs conserved in other vertebrates. We also identified 34 novel miRNAs that do not align to human or chicken genomes. Five conserved miRNAs showed significant and consistent changes in copy number after song exposure across three biological replications of the song-silence comparison, with two increasing (tgu-miR-25, tgu-miR-192) and three decreasing (tgu-miR-92, tgu-miR-124, tgu-miR-129-5p). We also detected a locus on the Z sex chromosome that produces three different novel miRNAs, with supporting evidence from Northern blot and TaqMan qPCR assays for differential expression in males and females and in response to song playbacks. One of these, tgu-miR-2954-3p, is predicted (by TargetScan) to regulate eight song-responsive mRNAs that all have functions in cellular proliferation and neuronal differentiation.
The experience of hearing another bird singing alters the profile of miRNAs in the auditory forebrain of zebra finches. The response involves both known conserved miRNAs and novel miRNAs described so far only in the zebra finch, including a novel sex-linked, song-responsive miRNA. These results indicate that miRNAs are likely to contribute to the unique behavioural biology of learned song communication in songbirds.
PMCID: PMC3118218  PMID: 21627805
20.  Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications 
Nature biotechnology  2010;28(10):1097-1105.
Sequencing-based DNA methylation profiling methods are comprehensive and, as accuracy and affordability improve, will increasingly supplant microarrays for genome-scale analyses. Here, four sequencing-based methodologies were applied to biological replicates of human embryonic stem cells to compare their CpG coverage genome-wide and in transposons, resolution, cost, concordance and its relationship with CpG density and genomic context. The two bisulfite methods reached concordance of 82% for CpG methylation levels and 99% for non-CpG cytosine methylation levels. Using binary methylation calls, two enrichment methods were 99% concordant, while regions assessed by all four methods were 97% concordant. To achieve comprehensive methylome coverage while reducing cost, an approach integrating two complementary methods was examined. The integrative methylome profile along with histone methylation, RNA, and SNP profiles derived from the sequence reads allowed genome-wide assessment of allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.
PMCID: PMC2955169  PMID: 20852635
DNA methylation; Sequencing; Bisulfite
21.  ReadDepth: A Parallel R Package for Detecting Copy Number Alterations from Short Sequencing Reads 
PLoS ONE  2011;6(1):e16327.
Copy number alterations are important contributors to many genetic diseases, including cancer. We present the readDepth package for R, which can detect these aberrations by measuring the depth of coverage obtained by massively parallel sequencing of the genome. In addition to achieving higher accuracy than existing packages, our tool runs much faster by utilizing multi-core architectures to parallelize the processing of these large data sets. In contrast to other published methods, readDepth does not require the sequencing of a reference sample, and uses a robust statistical model that accounts for overdispersed data. It includes a method for effectively increasing the resolution obtained from low-coverage experiments by utilizing breakpoint information from paired end sequencing to do positional refinement. We also demonstrate a method for inferring copy number using reads generated by whole-genome bisulfite sequencing, thus enabling integrative study of epigenomic and copy number alterations. Finally, we apply this tool to two genomes, showing that it performs well on genomes sequenced to both low and high coverage. The readDepth package runs on Linux and MacOSX, is released under the Apache 2.0 license, and is available at
PMCID: PMC3031566  PMID: 21305028
22.  Analysis of MicroRNA Expression in the Prepubertal Testis 
PLoS ONE  2010;5(12):e15317.
Only thirteen microRNAs are conserved between D. melanogaster and the mouse; however, conditional loss of miRNA function through mutation of Dicer causes defects in proliferation of premeiotic germ cells in both species. This highlights the potentially important, but uncharacterized, role of miRNAs during early spermatogenesis. The goal of this study was to characterize the content and editing of murine testicular miRNAs on postnatal days 7, 10, and 14, when they predominantly arise from spermatogonia and spermatocytes, in contrast to prior descriptions of miRNAs in the adult mouse testis, which largely reflect the content of spermatids. Previous studies have shown miRNAs to be abundant in the mouse testis by postnatal day 14; however, through Next Generation Sequencing of testes from a B6;129 background we found abundant earlier expression of miRNAs and describe shifts in the miRNA signature during this period. We detected robust expression of miRNAs encoded on the X chromosome in postnatal day 14 testes, consistent with prior studies showing their resistance to meiotic sex chromosome inactivation. Unexpectedly, we also found a similar positional enrichment for most miRNAs on chromosome 2 at postnatal day 14 and for those on chromosome 12 at postnatal day 7. We quantified in vivo developmental changes in three types of miRNA variation: 5′ heterogeneity, editing, and 3′ nucleotide addition. We identified eleven putative novel pubertal testis miRNAs whose developmental expression suggests a possible role in early male germ cell development. These studies provide a foundation for interpretation of miRNA changes associated with testicular pathology and identification of novel components of the miRNA editing machinery in the testis.
PMCID: PMC3012074  PMID: 21206922
23.  Pash 3.0: A versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing 
BMC Bioinformatics  2010;11:572.
Massively parallel sequencing readouts of epigenomic assays are enabling integrative genome-wide analyses of genomic and epigenomic variation. Pash 3.0 performs sequence comparison and read mapping and can be employed as a module within diverse configurable analysis pipelines, including ChIP-Seq and methylome mapping by whole-genome bisulfite sequencing.
Pash 3.0 generally matches the accuracy and speed of niche programs for fast mapping of short reads, and exceeds their performance on longer reads generated by a new generation of massively parallel sequencing technologies. By exploiting longer read lengths, Pash 3.0 maps reads onto the large fraction of genomic DNA that contains repetitive elements and polymorphic sites, including indel polymorphisms.
We demonstrate the versatility of Pash 3.0 by analyzing the interaction between CpG methylation, CpG SNPs, and imprinting based on publicly available whole-genome shotgun bisulfite sequencing data. Pash 3.0 makes use of gapped k-mer alignment, a non-seed based comparison method, which is implemented using multi-positional hash tables. This allows Pash 3.0 to run on diverse hardware platforms, including individual computers with standard RAM capacity, multi-core hardware architectures and large clusters.
PMCID: PMC3001746  PMID: 21092284
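Entry 23 mentions gapped k-mer alignment implemented with hash tables. The sketch below illustrates only the general idea of hashing gapped k-mers, where some positions in a window are ignored so that a mismatch at those positions does not change the key. The sampling pattern, function names, and sequences are hypothetical; this is not Pash 3.0's algorithm or data structure, which the abstract does not specify.

```python
from collections import defaultdict

# Hypothetical sampling pattern: "1" = position used in the key, "0" = ignored.
MASK = "1101011"


def gapped_key(seq, pos, mask=MASK):
    """Build a key from the sampled positions of the window starting at pos."""
    return "".join(seq[pos + i] for i, m in enumerate(mask) if m == "1")


def build_index(reference, mask=MASK):
    """Hash every gapped k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - len(mask) + 1):
        index[gapped_key(reference, pos, mask)].append(pos)
    return index


def candidate_positions(read, index, mask=MASK):
    """Count gapped k-mer hits per implied read start (reference pos - read offset)."""
    hits = defaultdict(int)
    for offset in range(len(read) - len(mask) + 1):
        for ref_pos in index.get(gapped_key(read, offset, mask), []):
            hits[ref_pos - offset] += 1
    return dict(hits)


reference = "ACGTTGCAAGGCTTACGGAT"          # made-up sequence
read = reference[5:15]                       # exact copy of reference[5:15]
mutated = read[:2] + "T" + read[3:]          # one substitution at read position 2
index = build_index(reference)
print(candidate_positions(read, index).get(5), candidate_positions(mutated, index).get(5))
```

In this toy case the exact read supports its true start position with all four windows, while the read with one substitution is still supported by the two windows that either skip or do not cover the changed base.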
25.  GASZ Is Essential for Male Meiosis and Suppression of Retrotransposon Expression in the Male Germline 
PLoS Genetics  2009;5(9):e1000635.
Nuage are amorphous ultrastructural granules in the cytoplasm of male germ cells as divergent as Drosophila, Xenopus, and Homo sapiens. Most nuage are cytoplasmic ribonucleoprotein structures implicated in diverse RNA metabolism including the regulation of PIWI-interacting RNA (piRNA) synthesis by the PIWI family (i.e., MILI, MIWI2, and MIWI). MILI is prominent in embryonic and early post-natal germ cells in nuage (also called germinal granules) that are often associated with mitochondria and are then called intermitochondrial cement. We find that GASZ (Germ cell protein with Ankyrin repeats, Sterile alpha motif, and leucine Zipper) co-localizes with MILI in intermitochondrial cement. Knockout of Gasz in mice results in a dramatic downregulation of MILI, and phenocopies the zygotene–pachytene spermatocyte block and male sterility defect observed in MILI null mice. In Gasz null testes, we observe increased hypomethylation and expression of retrotransposons similar to those in MILI null testes. We also find global shifts in the small RNAome, including down-regulation of repeat-associated, known, and novel piRNAs. These studies provide the first evidence for an essential structural role for GASZ in male fertility and epigenetic and post-transcriptional silencing of retrotransposons by stabilizing MILI in nuage.
Author Summary
Many aspects of RNA processing are essential for or prominent in the differentiation of germ cells. Some RNA metabolism in animal germ cells is associated with physical structures surrounding the cell nucleus called nuage. Nuage has a distinct granular appearance prior to the meiotic divisions with unclear functions. We have identified a protein called GASZ, which plays a structural role in this early nuage. In mice lacking GASZ, retrotransposons—endogenous viral-like particles—become released from their typical repressed state in the germline by the loss of small RNAs called piRNAs, resulting in DNA damage and delayed germ cell maturation. Protection of the germline from genetic intruders may require the association of piRNA-synthesizing enzymes and other components of this nuage structure through direct or indirect associations with GASZ. Mutations in GASZ and other nuage components may contribute to infertility in men who do not produce spermatozoa.
PMCID: PMC2727916  PMID: 19730684
