Background. Genome-wide association studies (GWAS) have been successful in recent years. A key challenge, however, is that their results are not straightforward to interpret, especially for trans-acting SNPs. Integrating transcriptome data into GWAS may help elucidate the mechanisms by which a genetic variant leads to a disease.
Methods. Here, we developed a novel mediation analysis approach to identify new expression quantitative trait loci (eQTLs) driving CYP2D6 activity by combining genotype, gene expression, and enzyme activity data.
Results. In total, 389,573 and 1,214,416 SNP-transcript-CYP2D6 activity trios were found to be strongly associated (P < 10⁻⁵; FDR = 16.6% and 11.7%) for the Affymetrix and Illumina genotyping platforms, respectively. The majority of the eQTLs are trans-SNPs. A single polymorphism can lead to widespread downstream changes in the expression of distant genes by affecting major regulators or transcription factors (TFs); such a polymorphism would be visible as an eQTL hotspot and can produce large and consistent biological effects. Overlapping the eQTL hotspots with the mediators led to the discovery of 64 TFs.
Conclusions. Our mediation analysis is a powerful approach for identifying trans-QTL-phenotype associations. It improves our understanding of how functional genetic variation affects liver metabolism.
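For readers unfamiliar with mediation analysis, the Python sketch below shows the classic product-of-coefficients decomposition for a single SNP-transcript-activity trio, fitted by ordinary least squares. It is a minimal illustration under stated assumptions (additive 0/1/2 genotype coding, linear models, no covariates, no significance testing), not the authors' method, and all function names are hypothetical.

```python
from statistics import mean


def _slope(x, y):
    """Slope of the simple least-squares regression y ~ x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def _two_predictor_fit(x1, x2, y):
    """Coefficients (b1, b2) of y ~ x1 + x2 from the centered normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 ** 2  # zero if SNP and transcript are perfectly collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def mediation_effects(snp, expression, activity):
    """Hypothetical helper: split the SNP effect on activity into direct and mediated parts."""
    a = _slope(snp, expression)  # path a: SNP -> transcript
    direct, b = _two_predictor_fit(snp, expression, activity)  # activity ~ SNP + transcript
    total = _slope(snp, activity)  # path c: SNP -> activity
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}


# Toy trio: genotype dosages, expression driven by the SNP, activity driven by both.
snp = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
noise = [0.1, -0.2, 0.05, 0.3, -0.1, 0.2, 0.0, -0.3, 0.15, -0.05]
expression = [2 * s + e for s, e in zip(snp, noise)]
activity = [3 * x + 0.5 * s for x, s in zip(expression, snp)]
effects = mediation_effects(snp, expression, activity)
print({k: round(v, 3) for k, v in effects.items()})
```

With OLS on the same samples, total = direct + indirect holds exactly, which is a useful sanity check; a genome-wide trio scan would additionally need a significance test for a·b (for example a Sobel or bootstrap test) and multiple-testing control.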
BACKGROUND & AIMS
Early embryogenesis involves cell fate decisions that define the body axes and establish pools of progenitor cells. Development does not stop once lineages are specified; cells continue to undergo specific maturation events, and changes in gene expression patterns lead to their unique physiological functions. Secretory pancreatic acinar cells mature postnatally to synthesize large amounts of protein, polarize, and communicate with other cells. The transcription factor MIST1 is expressed only by secretory cells and regulates maturation events. MIST1-deficient acinar cells in mice do not establish apical-basal polarity, properly position zymogen granules, or communicate with adjacent cells, disrupting pancreatic function. We investigated whether MIST1 directly induces and maintains the mature phenotype of acinar cells.
We analyzed the effects of Cre-mediated expression of Mist1 in adult Mist1-deficient (Mist1KO) mice. Pancreatic tissues were collected and analyzed by light and electron microscopy, immunohistochemistry, real-time polymerase chain reaction analysis, and chromatin immunoprecipitation. Primary acini were isolated from mice and analyzed in amylase secretion assays.
Induced expression of Mist1 in adult Mist1KO mice restored wild-type gene expression patterns in acinar cells. The acinar cells changed phenotypes, establishing apical-basal polarity, increasing the size of zymogen granules, reorganizing the cytoskeletal network, communicating intercellularly (by synthesizing gap junctions), and undergoing exocytosis.
The exocrine pancreas of adult mice can be remodeled by re-expression of the transcription factor MIST1. MIST1 regulates acinar cell maturation and might be used to repair damaged pancreata in patients with pancreatic disorders.
DIMM; Exocrine Pancreas Disease; Secretion; Transcription
Active peptide from shark liver (APSL) is a cytokine from Chiloscyllium plagiosum that stimulates liver regeneration and protects the pancreas. To study the effect of orally administered recombinant APSL (rAPSL) in an animal model of type 2 diabetes mellitus, the APSL gene was cloned, and APSL was expressed in Bombyx mori N cells (BmN cells), silkworm larvae, and silkworm pupae using the silkworm baculovirus expression vector system (BEVS). rAPSL significantly reduced the blood glucose level in mice with streptozotocin-induced type 2 diabetes. Analysis of paraffin sections of mouse pancreatic tissue revealed that rAPSL effectively protected mouse islets from streptozotocin-induced lesions. Compared with powder prepared from normal silkworm pupae, powder prepared from pupae expressing rAPSL exhibited greater protective effects. These results suggest that rAPSL may be useful as an oral drug for the treatment of diabetes mellitus.
active peptide from shark liver; Bombyx mori pupae; BmNPV/Bac-to-Bac baculovirus expression system; type 2 diabetes mellitus; oral administration
Bidirectional promoters are promoter sequences shared by divergent gene pairs (genes proximal to each other on opposite strands), and they can regulate the genes in both directions. In the human genome, > 10% of protein-coding genes are arranged head-to-head on opposite strands, with transcription start sites separated by < 1,000 base pairs. Many transcription factor binding sites occur in bidirectional promoters, where they can influence the expression of both genes in the pair. Recently, RNA polymerase II (RPol II) ChIP-seq data have been used to identify the promoters of coding genes and non-coding RNAs. However, RPol II ChIP-seq data have not yet been used to identify bidirectional promoters.
In some bidirectional promoter regions, the RPol II binding profile forms a bi-peak shape, which indicates that two promoters are located in the bidirectional region. We have developed a computational approach to identify the regulatory regions of all divergent gene pairs using genome-wide RPol II binding patterns derived from ChIP-seq data, based on the assumption that the RPol II binding profile around a bidirectional promoter is the sum of the RPol II binding at its two promoters. In HeLa S3 cells, 249 promoter pairs and 1094 single promoters were identified; of the single promoters, 76 cover only positive-strand genes, 86 cover only negative-strand genes, and 932 cover both genes. Gene expression levels and STAT1 binding sites were then examined for the different promoter categories.
Identifying the regulatory regions of bidirectional promoters from RPol II binding patterns provides important temporal and spatial information about the initiation of transcription. Gene expression and transcription factor binding site analyses suggest that the promoters in bidirectional regions may regulate their closest genes and that STAT1 is involved at the primary promoter.
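The bi-peak idea can be illustrated with a deliberately simple heuristic: smooth the RPol II coverage across a divergent-pair region and count local maxima above a height threshold. This Python sketch is an assumption-laden stand-in, not the authors' decomposition of the profile into two promoter contributions; the window size, `min_height`, and all function names are hypothetical.

```python
def moving_average(signal, window=3):
    """Centered moving average; edges use the shorter available window."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def find_peaks(profile, min_height):
    """Indices of local maxima at or above min_height (first position of a plateau)."""
    return [
        i
        for i in range(1, len(profile) - 1)
        if profile[i] >= min_height
        and profile[i] > profile[i - 1]
        and profile[i] >= profile[i + 1]
    ]


def classify_region(coverage, min_height, window=3):
    """Hypothetical classifier: 'bi-peak', 'single', or 'none' for one region."""
    peaks = find_peaks(moving_average(coverage, window), min_height)
    if len(peaks) >= 2:
        return "bi-peak", peaks
    if len(peaks) == 1:
        return "single", peaks
    return "none", peaks


# Toy per-bin RPol II read counts across a divergent gene pair.
two_promoters = [0, 1, 5, 9, 5, 1, 0, 1, 6, 10, 6, 1, 0]
one_promoter = [0, 1, 5, 9, 5, 1, 0]
print(classify_region(two_promoters, min_height=3))
print(classify_region(one_promoter, min_height=3))
```

A model-based approach would instead fit the observed profile as a sum of two promoter-shaped components and compare it against a one-component fit, which is closer to the assumption stated in the abstract.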
Over 10,000 long intergenic non-coding RNAs (lincRNAs) have been identified in the human genome. Some have been well characterized and are known to participate in various stages of gene regulation. In post-transcriptional processes, another well-known class of small non-coding RNA, microRNA (miRNA), actively inhibits mRNA. Several recent studies have revealed features shared by mRNAs and lincRNAs, and a few isolated miRNA-lincRNA relationships have been observed. Despite these advances, the comprehensive pattern of miRNA regulation of lincRNAs has not been clarified.
In this study, we investigated the possible interaction between these two classes of non-coding RNAs. Instead of using existing long non-coding RNA databases, we employed an ab initio method to annotate lincRNAs expressed in a group of normal breast tissues and breast tumors.
Approximately 90 lincRNAs show strong inverse expression correlation with miRNAs for which they contain at least one predicted target site. These target sites are statistically more conserved than their neighboring genomic regions and than other predicted target sites. Several miRNAs that target these lincRNAs are known to play essential roles in breast cancer.
As with mRNAs, miRNAs show potential to promote the degradation of lincRNAs. Breast-cancer-related miRNAs may influence their target lincRNAs, resulting in differential expression between normal and malignant breast tissues. This suggests that miRNA regulation of lincRNAs may be involved in regulatory processes in tumor cells.
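The screening step described above (strong inverse correlation between a lincRNA and a miRNA that has a predicted site in it) can be sketched as follows. The abstract does not state the correlation measure or cutoff, so the Pearson coefficient and the -0.7 threshold here are assumptions, and the names and toy expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def inverse_pairs(lincrna_expr, mirna_expr, predicted_sites, r_cutoff=-0.7):
    """Keep (lincRNA, miRNA) pairs with a predicted site and r at or below the cutoff."""
    hits = []
    for linc, mir in predicted_sites:
        r = pearson(lincrna_expr[linc], mirna_expr[mir])
        if r <= r_cutoff:
            hits.append((linc, mir, round(r, 3)))
    return hits


# Hypothetical expression across five samples (normal and tumor tissues).
lincrna_expr = {"linc1": [1, 2, 3, 4, 5], "linc2": [1, 2, 3, 4, 5]}
mirna_expr = {"miR-a": [10, 8, 6, 4, 2], "miR-b": [1, 3, 2, 5, 4]}
predicted_sites = [("linc1", "miR-a"), ("linc2", "miR-b")]
print(inverse_pairs(lincrna_expr, mirna_expr, predicted_sites))
```

Restricting the correlation test to pairs with a predicted site, rather than all lincRNA-miRNA combinations, reduces the number of tests and ties each hit to a plausible mechanism.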
Typical analyses of time-series gene expression data, such as clustering or graphical models, cannot distinguish between early and late drug-responsive gene targets in cancer cells. However, these genes would represent good candidate biomarkers.
We propose a new model, the dynamic time order network, to distinguish and connect early and late drug-responsive gene targets. This network is constructed from an integrated differential equation. Spline regression is applied to accurately model the time variation of gene expression, and a likelihood ratio test is then used to infer the time order of any gene expression pair. One application of the model is the discovery of estrogen-response biomarkers. For this purpose, we focused on genes whose responses are late when breast cancer cells are treated with estradiol (E2).
Our approach has been validated by successfully finding time order relations between genes of the cell cycle system. More notably, we found late-response genes that are potentially useful as biomarkers of E2 treatment.
Alternative splicing increases proteome diversity by expressing multiple gene isoforms that often differ in function. Identifying alternative splicing events from RNA-seq experiments is important for understanding the diversity of transcripts and for investigating the regulation of splicing.
We developed Alt Event Finder, a tool for identifying novel splicing events using transcript annotation derived from genome-guided transcript reconstruction tools, such as Cufflinks and Scripture. With a proper combination of alignment and transcript reconstruction tools, Alt Event Finder is capable of identifying novel splicing events in the human genome. We further applied Alt Event Finder to a set of RNA-seq data from rat liver tissues and identified dozens of novel cassette exon events whose splicing patterns changed after extensive alcohol exposure.
Alt Event Finder is capable of identifying de novo splicing events from data-driven transcript annotation, and is a useful tool for studying splicing regulation.
Estrogens control multiple functions of hormone-responsive breast cancer cells. They regulate diverse physiological processes in various tissues through genomic and non-genomic mechanisms that result in activation or repression of gene expression. Transcriptional regulation upon estrogen stimulation is a critical biological process underlying the onset and progression of most breast cancers. ERα requires distinct co-regulators or modulators for efficient transcriptional regulation, and together they form a regulatory network. Knowing this regulatory network will enable systematic study of the effect of ERα on breast cancer.
To investigate the regulatory network of ERα and discover novel modulators of ERα function, we proposed an analytical method based on a linear regression model to identify translational modulators and their network relationships. In the network analysis, specific modulator and target genes were selected according to modulator function and ERα binding. The network formed from target genes with ERα binding is called the ERα genomic regulatory network, and the network formed from target genes without ERα binding is called the ERα non-genomic regulatory network. Considering the active or repressive function of ERα, the active or repressive function of a modulator, and the agonist or antagonist effect of a modulator on ERα, the ERα/modulator/target relationships were categorized into 27 classes.
Using gene expression data and ERα ChIP-seq data from the MCF-7 cell line, the ERα genomic and non-genomic regulatory networks were built by merging ERα/modulator/target triplets (TF, M, T), where TF refers to ERα, M to the modulator, and T to the target. Of the two networks, the ERα non-genomic network had a lower FDR than the genomic network. To validate these two networks, the same network analysis was performed on gene expression data from the ZR-75-1 cell line. The network overlap analysis between the two cell lines showed 1% overlap for the ERα genomic regulatory network but 4% overlap for the non-genomic regulatory network.
We proposed a novel approach to infer ERα/modulator/target relationships and to construct the genomic and non-genomic regulatory networks in two breast cancer cell lines. We found that the non-genomic regulatory network is more reliable than the genomic regulatory network.
A number of empirical Bayes models (each with different statistical distribution assumptions) have now been developed to analyze differential DNA methylation using high-density oligonucleotide tiling arrays. However, it remains unclear which model performs best. For example, when differentially methylated regions are analyzed for conserved and functional sequence characteristics (e.g., enrichment of transcription factor-binding sites (TFBSs)), the sensitivity of the various empirical Bayes models remains unclear. In this paper, five empirical Bayes models were constructed, based on either a gamma distribution or a log-normal distribution, to identify differentially methylated loci and their cell-division-dependent (1, 3, and 5 divisions) and drug-treatment-dependent (cisplatin) methylation patterns. While differential methylation patterns generated by log-normal models were enriched with numerous TFBSs, we observed almost no TFBS-enriched sequences using gamma assumption models. Statistical and biological results suggest that log-normal, rather than gamma, empirical Bayes models are highly accurate and precise for differential methylation microarray analysis. In addition, we present one of the log-normal models for differential methylation analysis and test its reproducibility in a simulation study. We believe this research to be the first extensive comparison of statistical modeling for the analysis of differential DNA methylation, an important biological phenomenon that precisely regulates gene transcription.
Motivation: One of the fundamental questions in genetics is the identification of functional DNA variants responsible for a disease or phenotype of interest. Results from large-scale genetic studies, such as genome-wide association studies (GWAS), and the availability of high-throughput sequencing technologies provide opportunities for identifying causal variants. Despite these technical advances, informatics methodologies still need to be developed to prioritize thousands of variants for potential causative effects.
Results: We present regSNPs, an informatics strategy that integrates several established bioinformatics tools for prioritizing regulatory SNPs, i.e. SNPs in promoter regions that potentially affect phenotype by changing the transcription of downstream genes. Compared with existing tools, regSNPs has two distinct features: it considers the degenerate nature of binding motifs by calculating the differences in binding affinity caused by the candidate variants, and it integrates the potential phenotypic effects of various transcription factors. When tested on the disease-causing variants documented in the Human Gene Mutation Database, regSNPs showed mixed performance across diseases. regSNPs predicted three SNPs that can potentially affect bone density in a region detected in an earlier linkage study. Potential effects of one of the variants were validated using a luciferase reporter assay.
Supplementary data are available at Bioinformatics online
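To make "differences in binding affinity caused by the candidate variants" concrete, the sketch below uses a common proxy: the change in the best log-odds position weight matrix (PWM) score when the reference allele is replaced by the alternative allele. This is a swapped-in illustration, not regSNPs' own scoring; the pseudocount, uniform background, forward-strand-only scan, and function names are all assumptions.

```python
from math import log2

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log2-odds PWM (uniform background assumed)."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: log2((col[b] + pseudocount) / total / background) for b in BASES})
    return pwm


def best_score(seq, pwm):
    """Highest PWM score over all forward-strand windows of seq."""
    w = len(pwm)
    return max(
        sum(pwm[j][seq[i + j]] for j in range(w)) for i in range(len(seq) - w + 1)
    )


def allele_delta(left, ref, alt, right, pwm):
    """Hypothetical helper: best score with the alt allele minus best score with the ref allele."""
    return best_score(left + alt + right, pwm) - best_score(left + ref + right, pwm)


# Toy motif "TGA" built from hypothetical site counts.
counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
pwm = log_odds_pwm(counts)
delta = allele_delta("CCT", "G", "C", "ACC", pwm)
print(round(delta, 3))  # negative: the C allele weakens the predicted site
```

A production scan would also score the reverse complement and use genome-specific background frequencies.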
Potential epigenetic mechanisms underlying fetal alcohol syndrome (FAS) include alcohol-induced alterations of methyl metabolism, resulting in aberrant patterns of DNA methylation and gene expression during development. Having previously demonstrated an essential role for epigenetics in neural stem cell (NSC) development and that inhibiting DNA methylation prevents NSC differentiation, here we investigated the effect of alcohol exposure on genome-wide DNA methylation patterns and NSC differentiation.
NSCs in culture were treated with or without a 6-hr, 88 mM (“binge-like”) alcohol exposure and examined at 48 hr for migration, growth, and genome-wide DNA methylation. DNA methylation was examined using methylated DNA immunoprecipitation (MeDIP) followed by microarray analysis. Further validation was performed using independent Sequenom analysis.
NSCs differentiated within 24 to 48 hr, with migration, neuronal expression, and morphological transformation. Alcohol exposure retarded the migration, neuronal formation, and growth of NSCs, similar to treatment with the methylation inhibitor 5-aza-cytidine. When NSCs departed from the quiescent state, a genome-wide diversification of DNA methylation was observed: many moderately methylated genes changed methylation levels and became hyper- or hypomethylated. Alcohol prevented this diversification in many genes, including genes related to neural development, neuronal receptors, and olfaction, while retarding differentiation. Validation of specific genes by Sequenom analysis demonstrated that alcohol exposure prevented methylation of specific genes associated with neural development [Cutl2 (cut-like 2), Igf1 (insulin-like growth factor 1), Efemp1 (epidermal growth factor-containing fibulin-like extracellular matrix protein 1), and Sox7 (SRY-box containing gene 7)]; eye development [Lim2 (lens intrinsic membrane protein 2)]; the epigenetic regulator Smarca2 (SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 2); and developmental disorder [Dgcr2 (DiGeorge syndrome critical region gene 2)]. Specific sites with altered DNA methylation also correlated with transcription factor binding sites known to be critical for regulating neural development.
The data indicate that alcohol prevents normal DNA methylation programming of key neural stem cell genes and retards NSC differentiation. Thus, the role of DNA methylation in FAS warrants further investigation.
Epigenetics; Epigenomics; MeDIP-Chip; Neural development; Fetal alcohol syndrome
It is now established that, compared with normal cells, the cancer cell genome has an overall inverse distribution of DNA methylation (the “methylome”), i.e., predominant hypomethylation and localized hypermethylation within “CpG islands” (CGIs). Moreover, although cancer cells have reduced methylation “fidelity” and genomic instability, accurate maintenance of the aberrant methylomes that underlie malignant phenotypes remains necessary. However, the mechanism(s) of cancer methylome maintenance remain largely unknown. Here, we assessed CGI methylation patterns propagated over 1, 3, and 5 divisions of A2780 ovarian cancer cells during exposure to the DNA cross-linking chemotherapeutic cisplatin, and observed successive increases, with each cell generation, in the total numbers of hyper- and hypomethylated CGIs. Empirical Bayesian modeling revealed five distinct modes of methylation propagation: (1) heritable (i.e., unchanged) high methylation (1186 probe loci on the CGI microarray); (2) heritable (i.e., unchanged) low methylation (286 loci); (3) stochastic hypermethylation (i.e., progressively increased, 243 loci); (4) stochastic hypomethylation (i.e., progressively decreased, 247 loci); and (5) considerable “random” methylation (582 loci). These results support a “stochastic model” of DNA methylation equilibrium arising from the efficiency of two distinct processes, methylation maintenance and de novo methylation. A role for cis-regulatory elements in methylation fidelity was also demonstrated by highly significant (p < 2.2×10⁻⁵) enrichment of transcription factor binding sites in CGI probe loci showing heritably high (118 elements) and low (47 elements) methylation, and also in loci showing stochastic hyper- (30 elements) and hypo- (31 elements) methylation. Notably, loci with “random” methylation heritability displayed nearly no enrichment.
These results demonstrate an influence of cis-regulatory elements on the nonrandom propagation of both strictly heritable and stochastically heritable CGIs.
It is estimated that more than 90% of human genes express multiple mRNA transcripts due to alternative splicing. Consequently, the proteins produced by different splice variants may have different functions and expression levels. Several genes with splice variants in bone are known to affect osteoblast function and bone formation. The primary goal of this study was to evaluate the extent of alternative splicing in bone subjected to mechanical loading and subsequent bone formation. We used the rat forelimb loading model, in which the right forelimb was loaded axially for 3 minutes, while the left forearm served as a non-loaded control. Animals were subjected to loading sessions every day, with 24 hours between sessions. Ulnae were sampled at 11 time points, from 4 hours to 32 days after loading began. RNA was isolated, and mRNA abundance was measured at each time point using Affymetrix exon arrays (GeneChip® Rat Exon 1.0 ST Arrays). An ANOVA model was used to identify potential alternatively spliced genes across the time course, and five alternatively spliced genes were validated with qPCR: Akap12, Fn1, Pcolce, Sfrp4, and Tpm1. The number of alternatively spliced genes varied with time, ranging from a low of 68 at 12 h to a high of 992 at 16 d. We identified genes across the time course that encode proteins with known functions in bone formation, including collagens, matrix proteins, and components of the Wnt/β-catenin and TGF-β signaling pathways. We also identified alternatively spliced genes encoding cytokines, ion channels, muscle-related genes, and solute carriers that have no known function in bone formation and represent potentially novel findings. In addition, a functional characterization was performed to categorize the global functions of the alternatively spliced genes in our data set.
In conclusion, mechanical loading induces alternative splicing in bone, which may play an important role in the response of bone to mechanical loading.
Alternative splicing; bone formation; exon arrays; mechanical loading
Bone responds to mechanical loading with increased bone formation, and the time course of bone formation after the initiation of mechanical loading is well characterized. However, the regulatory activities governing the loading-dependent changes in gene expression are not well understood. The goal of this study was to identify the time-dependent regulatory mechanisms that govern mechanical loading-induced gene expression in bone using a predictive bioinformatics algorithm. A standard model for bone loading in rodents was employed in which the right forelimb was loaded axially for three minutes per day, while the left forearm served as a non-loaded, contralateral control. Animals were subjected to loading sessions every day, with 24 hours between sessions. Ulnae were sampled at 11 time points, from 4 hours to 32 days after loading began. Using a predictive bioinformatics algorithm, we created a linear model of gene expression and identified 44 transcription factor binding motifs and 29 microRNA binding sites that were predicted to regulate gene expression across the time course. Known and novel transcription factor binding motifs were identified throughout the time course, as were several novel microRNA binding sites. These time-dependent regulatory mechanisms may be important in controlling the loading-induced bone formation process.
bone; exon array; mechanical loading; microRNA; regulation; transcription factor
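The "linear model of gene expression" used to nominate regulatory motifs can be illustrated in its simplest form: regress each gene's loading-induced expression change on the number of copies of a candidate motif in its regulatory sequence, and rank motifs by variance explained. The abstract does not specify the algorithm, so this one-motif-at-a-time regression, the toy data, and the function names are assumptions rather than the study's method.

```python
from statistics import mean


def fit_single_motif(motif_counts, expr_change):
    """Slope and R^2 of expr_change ~ motif_counts (ordinary least squares)."""
    mx, my = mean(motif_counts), mean(expr_change)
    sxx = sum((c - mx) ** 2 for c in motif_counts)
    if sxx == 0:
        return 0.0, 0.0  # motif count is constant across genes: no information
    sxy = sum((c - mx) * (e - my) for c, e in zip(motif_counts, expr_change))
    syy = sum((e - my) ** 2 for e in expr_change)
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy) if syy else 0.0
    return slope, r2


def rank_motifs(motif_table, expr_change):
    """Hypothetical helper: rank motifs by R^2; motif_table maps motif -> per-gene counts."""
    scored = {m: fit_single_motif(c, expr_change) for m, c in motif_table.items()}
    return sorted(scored.items(), key=lambda kv: kv[1][1], reverse=True)


# Toy data: log expression ratios (loaded vs. control) for five genes at one time point.
expr_change = [0.0, 1.0, 2.0, 0.1, 1.1]
motif_table = {"A": [0, 1, 2, 0, 1], "B": [1, 0, 1, 0, 1]}
for motif, (slope, r2) in rank_motifs(motif_table, expr_change):
    print(motif, round(slope, 3), round(r2, 3))
```

Repeating the fit separately for each sampled time point would give a time-resolved set of candidate motifs; iterative multi-motif fitting and significance testing are omitted from this sketch.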
Next-generation sequencing technology provides new opportunities and challenges in the search for genetic variants that underlie complex traits. It will also likely uncover many new rare variants, but how these variants should be incorporated into data analysis remains an open question. Several papers in our group from Genetic Analysis Workshop 17 evaluated different methods of rare variant analysis, including single-variant, gene-based, and pathway-based analyses and analyses that incorporated biological information. Although the performance of some of these methods strongly depends on the underlying disease model, integration of known biological information is helpful in detecting causal genes. Two work groups demonstrated that the use of a Bayesian network and a collapsing receiver operating characteristic curve approach improves risk prediction when a disease is caused by many rare variants. Another work group suggested that modeling local rather than global ancestry may be beneficial when controlling for the effect of population structure in rare variant association analysis.
rare variant; association analysis; risk prediction model; population structure; biological information; receiver operating characteristic; Bayesian network
The advent of high-throughput measurements of gene expression and bioinformatics analysis methods offers new ways to study gene expression patterns. The primary goal of this study was to determine the time sequence of gene expression in bone subjected to mechanical loading during key periods of the bone-formation process, including expression of matrix-related genes, the appearance of active osteoblasts, and bone desensitization. A standard model for bone loading was employed in which the right forelimb was loaded axially for 3 minutes per day, whereas the left forearm served as a nonloaded contralateral control. We evaluated loading-induced gene expression over a time course of 4 hours to 32 days after the first loading session. Six distinct time-dependent patterns of gene expression were identified over the time course and were categorized into three primary clusters: genes upregulated early in the time course, genes upregulated during matrix formation, and genes downregulated during matrix formation. Genes were then grouped by function and/or signaling pathway. Many gene groups known to be important in loading-induced bone formation were identified within the clusters, including AP-1-related genes in the early-response cluster, matrix-related genes in the upregulated gene clusters, and Wnt/β-catenin signaling pathway inhibitors in the downregulated gene clusters. Several novel gene groups were identified as well, including chemokine-related genes, which were upregulated early but downregulated later in the time course; solute carrier genes, which were both upregulated and downregulated; and muscle-related genes, which were primarily downregulated. © 2011 American Society for Bone and Mineral Research.
EXON ARRAYS; GENE EXPRESSION; MECHANICAL LOADING
We present a report on BIOCOMP'10, the 2010 International Conference on Bioinformatics & Computational Biology, and on other related work in the area of systems biology.
Recent studies suggest that many proteins or regions of proteins lack a fixed 3D structure. These intrinsically disordered proteins (or peptides) are functionally important. Recent advances in next-generation sequencing technologies enable genome-wide identification of novel nucleotide variations in a specific population or cohort.
Using the exonic single nucleotide variations (SNVs) identified in the 1000 Genomes Project and distributed by Genetic Analysis Workshop 17, we systematically analyzed the genetic features and predicted disorder potential of the non-synonymous variations. Our results suggest that a significant change, caused by SNVs, in the tendency of a protein region to be structured or disordered may lead to malfunction of the protein and contribute to disease risk.
After validation with functional SNVs on the traits distributed by GAW17, we conclude that it is valuable to consider structure/disorder tendencies while prioritizing and predicting mechanistic effects arising from novel genetic variations.
RNA-binding proteins (RBPs) play diverse roles in eukaryotic RNA processing. Despite their pervasive functions in coding and noncoding RNA biogenesis and regulation, elucidating the sequence specificities that define protein-RNA interactions remains a major challenge. Recently, CLIP-seq (cross-linking immunoprecipitation followed by high-throughput sequencing) has been successfully used to study the transcriptome-wide binding patterns of the SRSF1, PTBP1, NOVA and FOX2 proteins. These studies either adopted traditional methods such as Multiple EM for Motif Elicitation (MEME) to discover the sequence consensus of an RBP's binding sites or used Z-score statistics to search for overrepresented oligonucleotides of a given length. We argue that most of these methods are not well suited for RNA motif identification, as they cannot incorporate the RNA structural context of protein-RNA interactions, which may affect binding specificity. Here, we describe a novel model-based approach, RNAMotifModeler, which identifies the consensus of protein-RNA binding regions by integrating sequence features and RNA secondary structures.
As an example, we applied RNAMotifModeler to SRSF1 (SF2/ASF) CLIP-seq data. The sequence-structural consensus we identified is a purine-rich octamer, 'AGAAGAAG', in a highly single-stranded RNA context. The unpaired probabilities (the probabilities that bases do not form pairs) are significantly higher than those of negative controls and of the flanking sequences surrounding the binding site, indicating that SRSF1 proteins tend to bind single-stranded RNA. Further statistical evaluations revealed that the second and fifth bases of the SRSF1 octamer motif have much stronger sequence specificities but weaker single-strandedness, whereas the third, fourth, sixth and seventh bases are far more likely to be single-stranded but have more degenerate sequence specificities. Therefore, we hypothesize that nucleotide specificity and secondary structure play complementary roles during binding site recognition by SRSF1.
In this study, we presented a computational model to predict the sequence consensus and optimal RNA secondary structure of protein-RNA binding regions. Its successful application to SRSF1 CLIP-seq data demonstrates great potential to improve our understanding of the binding specificity of RNA-binding proteins.
Identifying rare variants that are responsible for complex disease has been facilitated by advances in sequencing technologies. However, statistical methods that can handle the vast amount of data generated and interpret the complicated relationship between disease and these variants have lagged behind. We apply a zero-inflated Poisson regression model to account for the excess of zeros caused by the extremely low frequency of the 24,487 exonic variants in the Genetic Analysis Workshop 17 data. We grouped the 697 subjects in the data set as Europeans, Asians, and Africans based on principal components analysis and computed the total number of rare variants per gene for each individual. We then analyzed these collapsed variants under the assumption that rare variants are enriched in a group of people affected by a disease compared with a group of unaffected people. We also tested the hypothesis with quantitative traits Q1, Q2, and Q4. Analyses performed on the combined 697 individuals and on each ethnic group yielded different results. In the combined population analysis, we found that UGT1A1, which was not part of the simulation model, was associated with disease liability and that FLT1, which was a causal locus in the simulation model, was associated with Q1. Of the causal loci in the simulation models, FLT1 and KDR were associated with Q1, and VNN1 was correlated with Q2. No significant genes were associated with Q4. These results show the feasibility and capability of our new statistical model to detect multiple rare variants influencing disease risk.
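The core of a zero-inflated Poisson (ZIP) model is a mixture: each count is a structural zero with probability π, and otherwise comes from a Poisson distribution with mean λ. The Python sketch below fits that mixture to per-individual collapsed rare-variant counts with the standard EM algorithm. It is an intercept-only simplification under stated assumptions; the study's ZIP regression also has covariates (e.g. case status or trait), which are omitted here, and the function names and toy counts are hypothetical.

```python
from math import exp, lgamma, log


def zip_loglik(counts, pi, lam):
    """Log-likelihood of an intercept-only zero-inflated Poisson model."""
    ll = 0.0
    for y in counts:
        if y == 0:
            ll += log(pi + (1 - pi) * exp(-lam))  # structural or sampling zero
        else:
            ll += log(1 - pi) - lam + y * log(lam) - lgamma(y + 1)
    return ll


def fit_zip(counts, max_iter=5000, tol=1e-10):
    """EM estimates of (pi, lambda): pi = P(structural zero), lambda = Poisson mean."""
    n = len(counts)
    total = sum(counts)
    n_zero = counts.count(0)
    pi, lam = 0.5, max(total / n, 1e-6)
    for _ in range(max_iter):
        # E-step: posterior probability that an observed zero is a structural zero.
        p_struct = pi / (pi + (1 - pi) * exp(-lam))
        expected_structural = p_struct * n_zero
        # M-step: closed-form updates given the expected number of structural zeros.
        new_pi = expected_structural / n
        new_lam = total / (n - expected_structural)
        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam


# Hypothetical collapsed counts: rare variants per individual in one gene.
counts = [0] * 60 + [1] * 15 + [2] * 15 + [3] * 10
pi, lam = fit_zip(counts)
print(round(pi, 3), round(lam, 3), round(zip_loglik(counts, pi, lam), 3))
```

At convergence the fitted model reproduces both the sample mean, (1 − π)λ, and the observed fraction of zeros, π + (1 − π)e^(−λ), which a plain Poisson fit cannot do when zeros are in excess.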
Recent evidence suggests that many complex diseases are caused by genetic variations that play regulatory roles in controlling gene expression. Most genetic studies focus on nonsynonymous variations, which can alter the amino acid composition of a protein and are therefore believed to have the greatest impact on phenotype. Synonymous variations, however, can also play important roles in disease pathogenesis by regulating pre-mRNA processing and translational control. In this study, we systematically survey the effects of single-nucleotide variations (SNVs) on the binding affinity of RNA-binding proteins (RBPs). Among the 10,113 synonymous SNVs identified in 697 individuals in the 1000 Genomes Project and distributed by Genetic Analysis Workshop 17 (GAW17), we identified 182 variations located in alternatively spliced exons that can significantly change the binding affinity of nine RBPs whose binding preferences on 7-mer RNA sequences were previously reported. We found that the minor allele frequencies of these variations are similar to those of nonsynonymous SNVs, suggesting that they are in fact functional. We propose a workflow to identify, from exome-sequencing-derived genetic variations, phenotype-associated regulatory SNVs that might affect alternative splicing. Based on the SNVs affecting the quantitative traits simulated in GAW17, we further identified two and four functional SNVs that are predicted to be involved in alternative splicing regulation in traits Q1 and Q2, respectively.
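The effect of an SNV on RBP binding can be scored from a table of 7-mer binding preferences by comparing the 7-mers that overlap the variant before and after the substitution. In the minimal Python sketch below, summarizing each allele by its best overlapping 7-mer, treating unlisted 7-mers as 0, and the preference values themselves are all assumptions for illustration; the function names are hypothetical.

```python
def overlapping_kmer_scores(seq, pos, table, k=7):
    """Scores of every k-mer in seq that covers position pos (unlisted k-mers score 0)."""
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [table.get(seq[i:i + k], 0.0) for i in range(first, last + 1)]


def snv_affinity_change(seq, pos, alt, table, k=7):
    """Best overlapping k-mer score with the alt allele minus that with the reference."""
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    ref_best = max(overlapping_kmer_scores(seq, pos, table, k))
    alt_best = max(overlapping_kmer_scores(alt_seq, pos, table, k))
    return alt_best - ref_best


# Hypothetical preference table for one RBP (a full table covers all 16,384 7-mers).
rbp_table = {"GAAGAAG": 3.0, "GAACAAG": 0.5}
exon = "UUGAAGAAGUU"
print(snv_affinity_change(exon, 5, "C", rbp_table))  # G>C at index 5 weakens the site
```

Repeating this for each of the nine RBPs and flagging variants whose score change exceeds a chosen threshold would reproduce the shape of the survey described above, though the study's actual significance criterion is not given here.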
Serum microRNAs have the potential to be valuable biomarkers of cancer. This investigation addresses two issues that impact their utility: a) appropriate normalization controls and b) whether their altered levels persist in patients who are clinically free of the disease.
Sera from 40 age-matched healthy women and 39 breast cancer patients without clinical disease at the time of serum collection were analyzed for the microRNAs let-7f, miR-16, miR-21, and miR-155 using quantitative real-time PCR. U6 and 5S, which are transcribed by RNA polymerase III (RNAP-III), and the small nucleolar RNA RNU44 (SNORD44) were also analyzed as candidate normalization controls. Significant results from the initial study were verified using a second set of sera from 15 healthy subjects, 15 breast cancer patients without clinical disease, and 15 patients with metastatic disease, and a third set from 12 healthy subjects and 18 patients with metastatic disease. The U6 result was further verified in the extended second cohort of 75 healthy subjects and 68 breast cancer patients without clinical disease.
The U6:SNORD44 ratio was consistently higher in breast cancer patients with or without active disease (fold change range 1.5-6.6, p value range 0.0003 to 0.05). This increase in the U6:SNORD44 ratio was observed in the sera of both estrogen receptor-positive (ER+) and ER-negative breast cancer patients. MiR-16 and 5S, which are often used as normalization controls for microRNAs, showed remarkable experimental variability and thus are not ideal for normalization.
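The abstract reports the U6:SNORD44 ratio as a fold change but does not spell out the calculation. A common approach for qPCR data is the comparative Ct (2^−ΔΔCt) method, and the sketch below is built on it. By default it treats amplification as perfect doubling per cycle (efficiency 2.0). Because a lower Ct means more template, the ratio is efficiency^(Ct_reference − Ct_target). The function names are illustrative.

```python
from statistics import mean


def ratio_from_ct(ct_target, ct_reference, efficiency=2.0):
    """Target:reference abundance ratio from Ct values (equal efficiencies assumed)."""
    return efficiency ** (ct_reference - ct_target)


def fold_change(case_pairs, control_pairs, efficiency=2.0):
    """2^-ddCt fold change of the target:reference ratio, cases versus controls.

    Each element of case_pairs / control_pairs is (ct_target, ct_reference),
    e.g. (Ct of U6, Ct of SNORD44) for one serum sample.
    """
    dct_case = mean([t - r for t, r in case_pairs])
    dct_control = mean([t - r for t, r in control_pairs])
    return efficiency ** -(dct_case - dct_control)


# Invented Ct values: U6 comes up one cycle earlier (relative to SNORD44) in cases
print(fold_change([(25.0, 27.0), (25.0, 27.0)], [(26.0, 27.0)]))
```

Averaging ΔCt values before exponentiating amounts to taking a geometric mean of the per-sample ratios. This is the usual choice because Ct values are on a log scale.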
Elevated serum U6 levels in breast cancer patients, irrespective of disease activity at the time of serum collection, suggest a new paradigm in cancer: persistent systemic changes during cancer progression that result in elevated activity of RNAP-III and/or of the U6 stability/release pathways in non-cancer tissues. Additionally, these results highlight the need to develop standards for normalization between samples in microRNA-related studies comparing healthy individuals with cancer patients, and for inter-laboratory reproducibility. Our studies rule out the utility of miR-16, U6, and 5S RNAs for this purpose.
We previously showed that alcohol-preferring (P) rats have higher bone density than alcohol-nonpreferring (NP) rats. Genetic mapping in P and NP rats identified a major quantitative trait locus (QTL) for alcohol preference between 4q22 and 4q34. At the same location, several QTLs linked to bone density and structure were detected in Fischer 344 (F344) and Lewis (LEW) rats, suggesting that bone mass and strength genes might cosegregate with genes that regulate alcohol preference. The aim of this study was to identify the genes segregating for skeletal phenotypes in congenic P and NP rats. Transfer of the NP chromosome 4 QTL into the P background (P.NP) significantly decreased areal bone mineral density (aBMD) and volumetric bone mineral density (vBMD) at several skeletal sites, whereas transfer of the P chromosome 4 QTL into the NP background (NP.P) significantly increased bone mineral content (BMC) and aBMD at the same skeletal sites. Microarray analysis of the femurs using Affymetrix Rat Genome arrays revealed 53 genes that were differentially expressed among the rat strains with a false discovery rate (FDR) of less than 10%. Nine candidate genes were found to be strongly correlated (r² > 0.50) with bone mass at multiple skeletal sites. The top three candidate genes, neuropeptide Y (Npy), α synuclein (Snca), and sepiapterin reductase (Spr), were confirmed using real-time quantitative PCR (qPCR). Ingenuity pathway analysis revealed relationships among the candidate genes related to bone metabolism involving β-estradiol, interferon-γ, and a voltage-gated calcium channel. We identified several candidate genes, including some novel genes on chromosome 4, segregating for skeletal phenotypes in reciprocal congenic P and NP rats. © 2010 American Society for Bone and Mineral Research.
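The two-stage gene selection described above (FDR < 10%, then r² > 0.50 with bone mass) can be sketched as follows. The abstract does not say which FDR procedure was used, so this sketch takes Benjamini-Hochberg as an assumption. It computes r² as the squared Pearson correlation and, for simplicity, checks a single skeletal site, whereas the study required correlation at multiple sites. Input names and data are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def r_squared(x, y):
    """Squared Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def candidate_genes(pvalues, expression, bone_mass, fdr=0.10, min_r2=0.50):
    """Genes passing the FDR cutoff whose expression tracks bone mass (r^2 > min_r2).

    pvalues: gene -> differential-expression p-value
    expression: gene -> per-animal expression values (same order as bone_mass)
    """
    genes = list(pvalues)
    qvalues = benjamini_hochberg([pvalues[g] for g in genes])
    return [
        g for g, qv in zip(genes, qvalues)
        if qv < fdr and r_squared(expression[g], bone_mass) > min_r2
    ]
```

A correlation with r² > 0.50 means that more than half of the variance in bone mass is linearly associated with the gene's expression. It does not, on its own, establish a causal role.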
bone mass; congenic; QTL; neuropeptide Y; gene
Several lines of evidence have suggested that estrogen receptor α (ERα)–negative breast tumors, which are highly aggressive and nonresponsive to hormonal therapy, arise from ERα-positive precursors through different molecular pathways. Because microRNAs (miRNAs) modulate gene expression, we hypothesized that they may have a role in ER-negative tumor formation.
Gene expression profiles were used to highlight the global changes induced by miRNA modulation of ERα protein. miRNA transfection and luciferase assays enabled us to identify new targets of miRNA 206 (miR-206) and miRNA cluster 221-222 (miR-221-222). Northern blot, luciferase assays, estradiol treatment, and chromatin immunoprecipitation were performed to identify the miR-221-222 transcription unit and the mechanism implicated in its regulation.
Different global changes in gene expression were induced by overexpression of miR-221-222 and miR-206 in ER-positive cells. miR-221 and -222 increased proliferation of ERα-positive cells, whereas miR-206 had an inhibitory effect (mean absorbance units [AU]: miR-206: 500 AU, 95% confidence interval [CI] = 480 to 520; miR-221: 850 AU, 95% CI = 810 to 873; miR-222: 879 AU, 95% CI = 850 to 893; P < .05). We identified hepatocyte growth factor receptor and forkhead box O3 as new targets of miR-206 and miR-221-222, respectively. We demonstrated that ERα negatively modulates miR-221 and -222 through the recruitment of transcriptional corepressor partners: nuclear receptor corepressor and silencing mediator of retinoic acid and thyroid hormone receptor.
These findings suggest that the negative regulatory loop involving miR-221-222 and ERα may confer proliferative advantage and migratory activity to breast cancer cells and promote the transition from ER-positive to ER-negative tumors.
Estrogens regulate diverse physiological processes in various tissues through genomic and non-genomic mechanisms that result in activation or repression of gene expression. Transcriptional regulation upon estrogen stimulation is a critical biological process underlying the onset and progression of the majority of breast cancers. Dynamic gene expression changes have been shown to characterize the breast cancer cell response to estrogens, but the underlying molecular mechanisms are still not well understood.
We developed a modulated empirical Bayes model and constructed a novel topological and temporal transcription factor (TF) regulatory network in the MCF7 breast cancer cell line upon 17β-estradiol stimulation. In the network, significant genomic TF hubs included ERα and AP-1; significant non-genomic hubs included ZFP161, TFDP1, NRF1, TFAP2A, EGR1, E2F1, and PITX2. Although the early and late networks were distinct (<5% overlap of ERα target genes between the 4 and 24 h time points), all nine hubs were significantly represented in both networks. In MCF7 cells with acquired resistance to tamoxifen, the ERα regulatory network was unresponsive to 17β-estradiol stimulation. This significant loss of hormone responsiveness was associated with marked epigenomic changes, including hyper- or hypomethylation of promoter CpG islands and repressive histone methylation.
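The hub and overlap statements above can be made concrete with a toy edge-list representation of a TF network. This sketch does not reproduce the modulated empirical Bayes model. It assumes the network is already given as (TF, target) pairs and uses a simple out-degree threshold as a stand-in for the study's significance test for hubs. It also measures target overlap as a Jaccard index (shared targets divided by all targets), which is only one possible definition of "overlap". The edges, gene names, and function names are invented for illustration.

```python
def targets_of(edges, tf):
    """Set of target genes regulated by tf in a list of (tf, target) edges."""
    return {target for source, target in edges if source == tf}


def degree_hubs(edges, min_targets):
    """TFs with at least min_targets distinct targets (degree-threshold stand-in)."""
    targets = {}
    for tf, gene in edges:
        targets.setdefault(tf, set()).add(gene)
    return {tf for tf, genes in targets.items() if len(genes) >= min_targets}


def target_overlap(edges_a, edges_b, tf):
    """Jaccard overlap of one TF's targets between two networks (0.0 if none)."""
    a, b = targets_of(edges_a, tf), targets_of(edges_b, tf)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy early (4 h) and late (24 h) networks; target names are placeholders
early = [("ESR1", "g1"), ("ESR1", "g2"), ("ESR1", "g3"), ("EGR1", "g1")]
late = [("ESR1", "g3"), ("ESR1", "g4"), ("ESR1", "g5"), ("ESR1", "g6")]
shared_hubs = degree_hubs(early, 3) & degree_hubs(late, 3)
print(shared_hubs, target_overlap(early, late, "ESR1"))
```

In this toy case the same TF is a hub at both time points even though few of its targets are shared. This is the qualitative pattern the abstract describes for the early and late networks.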
We identified a number of estrogen-regulated target genes and established an estrogen-regulated network that distinguishes the genomic and non-genomic actions of the estrogen receptor. Many gene targets of this network were no longer active in anti-estrogen-resistant cell lines, possibly because their DNA methylation and histone acetylation patterns had changed.