1.  Colocalization of Coregulated Genes: A Steered Molecular Dynamics Study of Human Chromosome 19 
PLoS Computational Biology  2013;9(3):e1003019.
The connection between chromatin nuclear organization and gene activity is vividly illustrated by the observation that transcriptional coregulation of certain genes appears to be directly influenced by their spatial proximity. This fact poses the more general question of whether it is at all feasible that the numerous genes that are coregulated on a given chromosome, especially those at large genomic distances, might become proximate inside the nucleus. This problem is studied here using steered molecular dynamics simulations in order to enforce the colocalization of thousands of knowledge-based gene sequences on a model for the gene-rich human chromosome 19. Remarkably, it is found that most gene pairs can be brought simultaneously into contact. This is made possible by the low degree of intra-chromosome entanglement and the large number of cliques in the gene coregulatory network. A clique is a set of genes that are all coregulated with one another as a group. The constrained conformations for the model chromosome 19 are further shown to be organized in spatial macrodomains that are similar to those inferred from recent HiC measurements. The findings indicate that gene coregulation and colocalization are largely compatible and that this relationship can be exploited to draft the overall spatial organization of the chromosome in vivo. The more general validity and implications of these findings could be investigated by applying the general and transferable computational strategy introduced here to other eukaryotic chromosomes.
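The clique concept can be made concrete with a small sketch. Assuming the coregulatory network is given as a list of significantly coregulated gene pairs (the gene names below are hypothetical placeholders, not data from the study), the classic Bron–Kerbosch algorithm with pivoting enumerates every maximal clique, i.e. every group of genes that are all pairwise coregulated and cannot be extended by another gene. This is a generic graph routine, not the authors' pipeline.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_network(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from a list of coregulated gene pairs."""
    adj: Dict[str, Set[str]] = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Bron-Kerbosch with pivoting: returns every maximal clique of the network."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)  # nothing can extend r, so it is maximal
            return
        # Pivot on the vertex with most neighbours in p to skip redundant branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored ...
            x = x | {v}  # ... so later branches must not rebuild cliques with it

    expand(set(), set(adj), set())
    return cliques


# Hypothetical toy network: A, B, C are mutually coregulated; C also pairs with D.
network = build_network([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(sorted(sorted(c) for c in maximal_cliques(network)))
```

Pivoting does not change the result, only how many recursive branches are explored, which matters for networks with thousands of genes.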
Author Summary
Recent high-throughput experiments have shown that chromosome regions (loci) which accommodate specific sets of coregulated genes can be in close spatial proximity despite their possibly large sequence separation. The findings pose the question of whether gene coregulation and gene colocalization are related in general. Here, we tackle this problem using a knowledge-based coarse-grained model of human chromosome 19. Specifically, we carry out steered molecular dynamics simulations to promote the colocalization of hundreds of gene pairs that are known to be significantly coregulated. We show that most such pairs can be simultaneously colocalized. This result is, in turn, shown to depend on at least two distinctive chromosomal features: the remarkably low degree of intra-chain entanglement found in chromosomes inside the nucleus and the large number of cliques present in the gene coregulatory network. The results are therefore largely consistent with the coregulation-colocalization hypothesis. Furthermore, the model chromosome conformations obtained by applying the coregulation constraints are found to display spatial macrodomains that have significant similarities with those inferred from HiC measurements of human chromosome 19. This finding suggests that suitable extensions of the present approach might be used to propose viable ensembles of eukaryotic chromosome conformations in vivo.
PMCID: PMC3610629  PMID: 23555238
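The steering step can be illustrated with a flat-bottom harmonic restraint, a common way to pull two coarse-grained beads into contact in molecular dynamics. This is a minimal sketch under stated assumptions: the spring constant k, the contact distance d0 and the bead coordinates are hypothetical, and the study's actual steering schedule and chromosome model are not reproduced. The restraint is zero once a pair is within d0 and otherwise pulls the two beads toward each other with a force proportional to the excess distance.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def contact_restraint(ri: Vec, rj: Vec, k: float = 1.0, d0: float = 1.0) -> Tuple[float, Vec, Vec]:
    """Energy and forces of E = k/2 (d - d0)^2 for d > d0, and E = 0 otherwise."""
    diff = [a - b for a, b in zip(ri, rj)]
    d = math.sqrt(sum(c * c for c in diff))
    if d <= d0:
        return 0.0, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)  # pair already in contact
    energy = 0.5 * k * (d - d0) ** 2
    scale = -k * (d - d0) / d  # F_i = -dE/dd * (r_i - r_j) / d
    fi = (scale * diff[0], scale * diff[1], scale * diff[2])
    fj = (-fi[0], -fi[1], -fi[2])  # equal and opposite force on bead j
    return energy, fi, fj


def total_restraint_energy(coords: Sequence[Vec], pairs: Sequence[Tuple[int, int]],
                           k: float = 1.0, d0: float = 1.0) -> float:
    """Sum the restraint over all (hypothetical) coregulated bead pairs."""
    return sum(contact_restraint(coords[i], coords[j], k, d0)[0] for i, j in pairs)


# Bead 0 is far from bead 1 (restrained) but already in contact with bead 2.
beads = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
print(total_restraint_energy(beads, [(0, 1), (0, 2)]))
```

In a steered simulation these forces would be added to the chain's own bonded and excluded-volume forces at every integration step.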
2.  Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics 
PLoS Genetics  2014;10(5):e1004383.
Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, in particular cardiovascular diseases and lipid biomarkers. The next challenge consists of understanding the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. An application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including >100,000 individuals of European ancestry. Combining all lipid biomarkers, our re-analysis supported 26 out of 38 reported colocalisation results with eQTLs and identified 14 new colocalisation results, hence highlighting the value of a formal statistical test. In three cases of reported eQTL-lipid pairs (SYPL2, IFT172, TBKBP1) for which our analysis suggests that the eQTL pattern is not consistent with the lipid association, we identify alternative colocalisation results with SORT1, GCKR, and KPNB1, indicating that these genes are more likely to be causal in these genomic intervals. A key feature of the method is the ability to derive the output statistics from single SNP summary statistics, hence making it possible to perform systematic meta-analysis type comparisons across multiple GWAS datasets (implemented online). Our methodology provides information about candidate causal genes in associated intervals and has direct implications for the understanding of complex diseases as well as the design of drugs to target disease pathways.
Author Summary
Genome-wide association studies (GWAS) have found a large number of genetic regions (“loci”) affecting clinical end-points and phenotypes, many outside coding intervals. One approach to understanding the biological basis of these associations has been to explore whether GWAS signals from intermediate cellular phenotypes, in particular gene expression, are located in the same loci (“colocalise”) and potentially mediate the disease signals. However, it is not clear how to assess whether the same variants are responsible for the two GWAS signals or whether distinct causal variants lie close to each other. In this paper, we describe a statistical method that can use only single-variant summary statistics to test for colocalisation of GWAS signals. We describe one application of our method to a meta-analysis of blood lipids and liver expression, although any two datasets resulting from association studies can be used. Our method is able to detect the subset of GWAS signals explained by regulatory effects and identify candidate genes affected by the same GWAS variants. As summary GWAS data are increasingly available, applications of colocalisation methods to integrate the findings will be essential for functional follow-up, and will also be particularly useful to identify tissue-specific signals in eQTL datasets.
PMCID: PMC4022491  PMID: 24830394
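As a sketch of how such a test can work from single-SNP summary statistics alone, the code below follows the widely used approximate Bayes factor formulation: Wakefield's approximate Bayes factor per SNP, combined over five hypotheses (H0 no association; H1 and H2 association with one trait only; H3 two distinct causal variants; H4 one shared causal variant). Assumptions to note: at most one causal variant per trait in the region, independent datasets, priors p1 = p2 = 1e-4 and p12 = 1e-5, and a prior effect standard deviation of 0.15. These values are illustrative defaults, not necessarily those used in the paper, and the effect sizes in the example are made up.

```python
import math
from typing import Dict, List, Sequence


def log_abf(beta: Sequence[float], se: Sequence[float], prior_sd: float = 0.15) -> List[float]:
    """Log of Wakefield's approximate Bayes factor (association vs. none) per SNP."""
    w = prior_sd ** 2  # assumed prior variance of the true effect
    out = []
    for b, s in zip(beta, se):
        r = w / (s ** 2 + w)  # shrinkage factor W / (V + W)
        z = b / s
        out.append(0.5 * (math.log(1.0 - r) + r * z * z))
    return out


def logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log1mexp(a: float) -> float:
    """Numerically stable log(1 - exp(a)) for a <= 0."""
    if a >= 0.0:
        return float("-inf")
    return math.log(-math.expm1(a)) if a > -math.log(2.0) else math.log1p(-math.exp(a))


def coloc_posteriors(labf1: Sequence[float], labf2: Sequence[float],
                     p1: float = 1e-4, p2: float = 1e-4, p12: float = 1e-5) -> Dict[str, float]:
    """Posterior probabilities of H0..H4, assuming at most one causal SNP per trait."""
    l1, l2 = logsumexp(labf1), logsumexp(labf2)
    l12 = logsumexp([a + b for a, b in zip(labf1, labf2)])  # same SNP causal for both
    log_h = {
        "H0": 0.0,
        "H1": math.log(p1) + l1,
        "H2": math.log(p2) + l2,
        # H3 sums over pairs of *different* SNPs: S1 * S2 - S12
        "H3": math.log(p1) + math.log(p2) + l1 + l2 + log1mexp(l12 - l1 - l2),
        "H4": math.log(p12) + l12,
    }
    total = logsumexp(list(log_h.values()))
    return {h: math.exp(v - total) for h, v in log_h.items()}


# Hypothetical region of five SNPs with identical standard errors.
se = [0.05] * 5
shared = coloc_posteriors(log_abf([0, 0, 0.4, 0, 0], se), log_abf([0, 0, 0.4, 0, 0], se))
distinct = coloc_posteriors(log_abf([0, 0.4, 0, 0, 0], se), log_abf([0, 0, 0, 0.4, 0], se))
print(round(shared["H4"], 3), round(distinct["H3"], 3))
```

Working on the log scale matters here: per-SNP Bayes factors for strong signals easily exceed the range of ordinary floating-point numbers once multiplied across the two datasets.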
3.  Kerfuffle: a web tool for multi-species gene colocalization analysis 
BMC Bioinformatics  2013;14:22.
The evolutionary pressures that underlie the large-scale functional organization of the genome are not well understood in eukaryotes. Recent evidence suggests that functionally similar genes may colocalize (cluster) in the eukaryotic genome, implicating chromatin-level gene regulation in shaping the physical distribution of coordinated genes. However, few of the bioinformatic tools currently available allow for a systematic study of gene colocalization across several, evolutionarily distant species. Furthermore, most tools require the user to input manually curated lists of gene position information, DNA sequence or gene homology relations between species. With the growing number of sequenced genomes, there is a need to provide new comparative genomics tools that can address the analysis of multi-species gene colocalization.
Kerfuffle is a web tool designed to help discover, visualize, and quantify the physical organization of genomes by identifying significant gene colocalization and conservation across the assembled genomes of available species (currently up to 47, from humans to worms). Kerfuffle only requires the user to specify a list of human genes and the names of other species of interest. Without further input from the user, the software queries the e!Ensembl BioMart server to obtain positional information and discovers homology relations in all genes and species specified. Using this information, Kerfuffle performs a multi-species clustering analysis, presents downloadable lists of clustered genes, performs Monte Carlo statistical significance calculations, estimates how conserved gene clusters are across species, plots histograms and interactive graphs, allows users to save their queries, and generates a downloadable visualization of the clusters using the Circos software. These analyses may be used to further explore the functional roles of gene clusters by interrogating the enriched molecular pathways associated with each cluster.
Kerfuffle is a new, easy-to-use and publicly available tool to aid our understanding of functional genomics and comparative genomics. This software allows flexible and quick investigation of a user-defined set of genes, and the results may be saved online for further analysis. Kerfuffle is freely available online; it is implemented in JavaScript (using jQuery and jsCharts libraries) and PHP 5.2, runs on an Apache server, and stores data in flat files and an SQLite database.
PMCID: PMC3598493  PMID: 23327649
Genes; Clusters; Colocalization; Conservation; Synteny
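The Monte Carlo significance step can be sketched as a simple permutation test. This is a generic illustration, not Kerfuffle's implementation: it assumes a single chromosome, measures clustering as the number of gene pairs whose positions lie within a fixed window, and builds the null distribution by placing the same number of genes uniformly at random. Chromosome length, window size and gene positions below are hypothetical.

```python
import random
from itertools import combinations
from typing import Sequence, Tuple


def clustered_pairs(positions: Sequence[int], window: int) -> int:
    """Number of gene pairs whose positions lie within `window` bp of each other."""
    return sum(1 for a, b in combinations(positions, 2) if abs(a - b) <= window)


def clustering_p_value(positions: Sequence[int], chrom_length: int, window: int,
                       n_sim: int = 1000, seed: int = 0) -> Tuple[int, float]:
    """Monte Carlo p-value: how often random placements cluster at least as much."""
    rng = random.Random(seed)
    observed = clustered_pairs(positions, window)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.randrange(chrom_length) for _ in positions]
        if clustered_pairs(simulated, window) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_sim + 1)


# Ten hypothetical genes packed into ~45 kb of a 50 Mb chromosome.
genes = [1_000_000 + 5_000 * i for i in range(10)]
print(clustering_p_value(genes, chrom_length=50_000_000, window=100_000))
```

A multi-species version would repeat the same test per genome, using each species' homolog positions and chromosome lengths.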
4.  Functional Centromeres Determine the Activation Time of Pericentric Origins of DNA Replication in Saccharomyces cerevisiae 
PLoS Genetics  2012;8(5):e1002677.
The centromeric regions of all Saccharomyces cerevisiae chromosomes are found in early replicating domains, a property conserved among centromeres in fungi and some higher eukaryotes. Surprisingly, little is known about the biological significance or the mechanism of early centromere replication; however, the extensive conservation suggests that it is important for chromosome maintenance. Do centromeres ensure their early replication by promoting early activation of nearby origins, or have they migrated over evolutionary time to reside in early replicating regions? In Candida albicans, a neocentromere contains an early firing origin, supporting the first hypothesis but not addressing whether the new origin is intrinsically early firing or whether the centromere influences replication time. Because the activation time of individual origins is not an intrinsic property of S. cerevisiae origins, but is influenced by surrounding sequences, we sought to test the hypothesis that centromeres influence replication time by moving a centromere to a late replication domain. We used a modified Meselson-Stahl density transfer assay to measure the kinetics of replication for regions of chromosome XIV in which either the functional centromere or a point-mutated version had been moved near origins that reside in a late replication region. We show that a functional centromere acts in cis over a distance as great as 19 kb to advance the initiation time of origins. Our results constitute a direct link between establishment of the kinetochore and the replication initiation machinery, and suggest that the proposed higher-order structure of the pericentric chromatin influences replication initiation.
Author Summary
Genome duplication requires the orderly initiation of DNA synthesis at sites called origins of replication. It has long been known that different origins become active at different times in S-phase (the period during which cells duplicate their chromosomes). Although such temporal regulation of replication is broadly conserved among eukaryotes, how this regional control of replication time occurs largely remains a mystery. The early replication of baker's yeast centromeres (genetic elements essential for proper segregation of chromosomes during cell division) is one frequently cited example of temporal regulation, yet the biological significance of early centromere replication also remains speculative. Increasing evidence suggests that early centromere replication is a conserved feature of the DNA replication program across many species. Here, we show that centromeres in this yeast can advance the time at which origins in their genomic neighborhood initiate DNA replication. The distance over which centromeres can influence origin activation time extends up to 19 kilobases. We further show that centromere-mediated early origin activation depends on the centromere's ability to recruit at least a subset of the proteins needed for chromosome segregation. This study thus provides the first direct functional link between kinetochore establishment and the mechanisms of DNA replication initiation.
PMCID: PMC3349730  PMID: 22589733
5.  Insertion of Foreign DNA into an Established Mammalian Genome Can Alter the Methylation of Cellular DNA Sequences
Journal of Virology  1999;73(2):1010-1022.
The insertion of adenovirus type 12 (Ad12) DNA into the hamster genome and the transformation of these cells by Ad12 can lead to marked alterations in the levels of DNA methylation in several cellular genes and DNA segments. Since such alterations in DNA methylation patterns are likely to affect the transcription patterns of cellular genes, it is conceivable that these changes have played a role in the generation or the maintenance of the Ad12-transformed phenotype. We have now isolated clonal BHK21 hamster cell lines that carry in their genomes bacteriophage λ and plasmid pSV2neo DNAs in an integrated state. Most of these cell lines contain one or multiple copies of integrated λ DNA, which often colocalize with the pSV2neo DNA, usually in a single chromosomal site as determined by fluorescent in situ hybridization. In different cell lines, the loci of foreign DNA insertion are different. The inserted bacteriophage λ DNA frequently becomes de novo methylated. In some of these λ-transgenic hamster cell lines, the levels of DNA methylation in the retrotransposon genomes of the endogenous intracisternal A particles (IAP) are higher than in the non-λ-DNA-transgenic BHK21 cell lines. These changes in the methylation patterns of the IAP subclone I (IAPI) segment have been documented by restriction analyses with methylation-sensitive restriction endonucleases followed by Southern transfer hybridization and phosphorimager quantitation. Genomic sequencing experiments using the bisulfite protocol yielded additional evidence for alterations in the patterns of DNA methylation in selected segments of the IAPI sequences. In these experiments, the nucleotide sequences of >330 PCR-generated cloned DNA molecules were determined. Upon prolonged cultivation of cell lines with altered cellular methylation patterns, these differences became less apparent, perhaps due to counterselection of the transgenic cells.
The possibility existed that the hamster BHK21 cell genomes represent mosaics with respect to DNA methylation in the IAPI segment. Hence, some of the cells with the patterns observed after λ DNA integration might have existed prior to λ DNA integration and been selected by chance. A total of 66 individual BHK21 cell clones from the BHK21 cell stock have been recloned up to three times, and the DNAs of these cell populations have been analyzed for differences in IAPI methylation patterns. None have been found. These patterns are identical among the individual BHK21 cell clones and identical to the patterns of the originally used BHK21 cell line. Similar results have been obtained with nine clones isolated from BHK21 cells mock transfected by the calcium phosphate precipitation procedure with DNA omitted from the transfection mixture. In four clonal sublines of nontransgenic control BHK21 cells, genomic sequencing of 335 PCR-generated clones by the bisulfite protocol revealed 5′-CG-3′ methylation levels in the IAPI segment that were comparable to those in the uncloned BHK21 cell line. We conclude that the observed changes in the DNA methylation patterns in BHK21 cells with integrated λ DNA are unlikely to preexist or to be caused by the transfection procedure. Our data support the interpretation that the insertion of foreign DNA into a preexisting mammalian genome can alter the cellular patterns of DNA methylation, perhaps via changes in chromatin structure. The cellular sites affected by and the extent of these changes could depend on the site and size of foreign DNA insertion.
PMCID: PMC103921  PMID: 9882302
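The bisulfite readout described above reduces to simple counting: after bisulfite treatment and PCR, an unmethylated cytosine reads as T while a methylated 5′-CG-3′ cytosine stays C. Assuming gap-free clone sequences already aligned to the top strand of the reference (a simplification; real analyses need alignment and quality checks), the CpG methylation level of a segment such as IAPI is the fraction of CpG calls that remain C. The sequences below are toy examples, not IAPI data.

```python
from typing import Iterable, List


def cpg_sites(reference: str) -> List[int]:
    """0-based positions of the C in every CpG dinucleotide of the reference."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def cpg_methylation_level(reference: str, clones: Iterable[str]) -> float:
    """Fraction of CpG calls that stayed C (methylated) across bisulfite clones."""
    sites = cpg_sites(reference)
    methylated = unmethylated = 0
    for read in clones:
        for i in sites:
            if read[i] == "C":
                methylated += 1  # protected from conversion: 5-methylcytosine
            elif read[i] == "T":
                unmethylated += 1  # converted by bisulfite: unmethylated C
            # any other base (N, sequencing error) is not counted
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no informative CpG calls")
    return methylated / total


# Toy reference with CpGs at positions 1 and 5, and two cloned PCR products.
ref = "ACGTTCGA"
clones = ["ACGTTTGA", "ACGTTCGA"]
print(cpg_sites(ref), cpg_methylation_level(ref, clones))
```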
6.  Epigenetically-Inherited Centromere and Neocentromere DNA Replicates Earliest in S-Phase 
PLoS Genetics  2010;6(8):e1001068.
Eukaryotic centromeres are maintained at specific chromosomal sites over many generations. In the budding yeast Saccharomyces cerevisiae, centromeres are genetic elements defined by a DNA sequence that is both necessary and sufficient for function, whereas in most other eukaryotes, centromeres are maintained by poorly characterized epigenetic mechanisms in which DNA has a less definitive role. Here we use the pathogenic yeast Candida albicans as a model organism to study the DNA replication properties of centromeric DNA. By determining the genome-wide replication timing program of the C. albicans genome, we discovered that each centromere is associated with a replication origin that is the first to fire on its respective chromosome. Importantly, epigenetic formation of new ectopic centromeres (neocentromeres) was accompanied by shifts in replication timing, such that a neocentromere became the first to replicate and became associated with origin recognition complex (ORC) components. Furthermore, changing the level of the centromere-specific histone H3 isoform led to a concomitant change in levels of ORC association with centromere regions, further supporting the idea that centromere proteins determine origin activity. Finally, analysis of centromere-associated DNA revealed a replication-dependent sequence pattern characteristic of constitutively active replication origins. This strand-biased pattern is conserved, together with centromere position, among related strains and species, in a manner independent of primary DNA sequence. Thus, inheritance of centromere position is correlated with a constitutively active origin of replication that fires at a distinct early time. We suggest a model in which the distinct timing of DNA replication serves as an epigenetic mechanism for the inheritance of centromere position.
Author Summary
Centromeres form at the same chromosomal position from generation to generation, yet in most species this inheritance occurs in a DNA sequence–independent manner that is not well understood. Here, we determine the timing of DNA replication across the genome of the human fungal pathogen Candida albicans and find that centromeric DNA is the first locus to replicate on each chromosome. Furthermore, this unique replication timing may be important for centromere inheritance, based on several observations. First, DNA sequence patterns at centromeres indicate that, despite high levels of primary sequence divergence, the region has served as a replication origin for millions of years; second, formation of a neocentromere (a new centromere formed at an ectopic locus following deletion of the native centromere DNA) results in the establishment of a new, early-firing origin of replication; and third, a centromere-specific protein, Cse4p, recruits origin recognition complex (ORC) proteins in a concentration-dependent manner. Thus, centromere position is inherited by an epigenetic mechanism that appears to be defined by a distinctively early firing DNA replication origin.
PMCID: PMC2924309  PMID: 20808889
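One common way to look for the replication-dependent, strand-biased composition mentioned above is GC skew, (G − C)/(G + C), computed in sliding windows; under a widely used model, the skew changes sign at long-lived replication origins. The abstract does not state which statistic the authors used, so this is an assumed illustration, and the window size, step and toy sequence are hypothetical.

```python
from typing import List, Tuple


def gc_skew(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """(G - C) / (G + C) for each window; returns (window start, skew) pairs."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without G or C carry no strand information; report 0.
        skews.append((start, (g - c) / (g + c) if g + c else 0.0))
    return skews


# Toy sequence: G-rich on the left, C-rich on the right, so the skew switches sign.
toy = "GGGTGGAGGT" + "ACCTCCACCC"
for start, skew in gc_skew(toy, window=10, step=5):
    print(start, round(skew, 2))
```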
7.  Targeted interactomics reveals a complex core cell cycle machinery in Arabidopsis thaliana 
A protein interactome focused on cell proliferation was mapped, comprising 857 interactions among 393 proteins and leading to many new insights into plant cell cycle regulation. A comprehensive view of heterodimeric cyclin-dependent kinase (CDK)/cyclin complexes in plants is obtained, in relation to their regulators. Over 100 new candidate cell cycle proteins were predicted.
The basic underlying mechanisms that govern the cell cycle are conserved among all eukaryotes. What is peculiar to plants, however, is that their genomes contain an intriguingly large collection of cell cycle regulatory genes compared with other eukaryotes (Vandepoele et al, 2002; Menges et al, 2005). Arabidopsis thaliana (Arabidopsis) encodes 71 genes in five regulatory classes versus only 15 in yeast and 23 in human.
Despite the discovery of numerous cell cycle genes, little is known about the protein complex machinery that steers plant cell division. Therefore, we applied a tandem affinity purification (TAP) approach coupled with mass spectrometry (MS) to Arabidopsis cell suspension cultures to isolate and analyze protein complexes involved in the cell cycle. This approach allowed us to map a first draft of the basic cell cycle complex machinery of Arabidopsis, providing many new insights into plant cell division.
To map the interactome, we relied on a streamlined platform comprising generic Gateway-based vectors with high cloning flexibility, the fast generation of transgenic suspension cultures, TAP adapted for plant cells, and matrix-assisted laser desorption ionization (MALDI) tandem-MS for the identification of purified proteins (Van Leene et al, 2007, 2008). Complexes for 102 cell cycle proteins were analyzed using this approach, leading to a non-redundant data set of 857 interactions among 393 proteins (Figure 1A). Two subspaces were identified in this data set: domain I1, containing interactions confirmed in at least two independent experimental repeats or in the reciprocal purification experiment, and domain I2, consisting of uniquely observed interactions.
Several observations underlined the quality of both domains. All tested reverse purifications recovered the original interaction, and 150 known or predicted interactions were confirmed, meaning that the large majority of interactions in the data set are new. An in-depth computational analysis revealed enrichment for many cell cycle-related features among the proteins of the network (Figure 1B), and many protein pairs were coregulated at the transcriptional level (Figure 1C). Through integration of known cell cycle-related features, more than 100 new candidate cell cycle proteins were predicted (Figure 1D). Beyond the qualities shared by both interactome domains, their real significance emerged from their differences, which expose two subspaces in the cell cycle interactome: a central regulatory network of stable complexes that are repeatedly isolated and represent core regulatory units, and a peripheral network of transient, less frequently identified interactions that are involved in other aspects of the process, such as crosstalk between core complexes or connections with other pathways. To evaluate the biological relevance of the cell cycle interactome in plants, we validated interactions from both domains with a transient split-luciferase assay in Arabidopsis plants (Marion et al, 2008), further supporting the hypothesis-generating power of the data set for understanding plant growth.
To gain insight into cell cycle physiology, the interactome was subdivided according to the functional classes of the baits, and core protein complexes were extracted, covering cyclin-dependent kinase (CDK)/cyclin core complexes together with their positive and negative regulatory networks, DNA replication complexes, the anaphase-promoting complex, and spindle checkpoint complexes. The data imply that mitotic A- and B-type cyclins exclusively form heterodimeric complexes with the plant-specific B-type CDKs and not with CDKA;1, whereas D-type cyclins seem to associate with CDKA;1. Besides complexes previously shown in other organisms, our data also suggested many new functional links, for example one coupling cell division with the regulation of transcript splicing. The association of negative regulators of CDK/cyclin complexes with transcription factors suggests that their role in reallocation is not solely targeted to CDK/cyclin complexes. New members of the Siamese-related inhibitory proteins were identified, and potential inhibitors of the plant-specific mitotic B-type CDKs were found in plants for the first time. New evidence is provided that the E2F–DP–RBR network is active not only at the G1-to-S but also at the G2-to-M transition, and many complexes involved in DNA replication or repair were isolated. For the first time, a plant APC has been isolated biochemically, identifying three potential new plant-specific APC interactors; finally, complexes involved in the spindle checkpoint were isolated, mapping many new but specific interactions.
Finally, to obtain an overall view of the complex machinery, modules of interacting cyclins and core cell cycle regulators were ranked along the cell cycle phases according to the transcript expression peak of the cyclins, revealing an assorted set of CDK–cyclin complexes with high regulatory differentiation (Figure 4). Even within the same subfamily (e.g. cyclin A3, B1, B2, D3, and D4), cyclins differ not only in their functional time frame but also in the type and number of CDKs, inhibitors, and scaffolding proteins they bind, further indicating their functional diversification. According to our interaction data, at least 92 different variants of CDK–cyclin complexes are found in Arabidopsis.
In conclusion, these results reflect how several rounds of gene duplication (Sterck et al, 2007) led to the evolution of a large set of cyclin paralogs and a myriad of regulators, resulting in a significant jump in the complexity of the cell cycle machinery that could accommodate unique plant-specific features such as an indeterminate mode of postembryonic development. Through their extensive regulation and connection with a myriad of up- and downstream pathways, the core cell cycle complexes might offer the plant a flexible toolkit to fine-tune cell proliferation in response to an ever-changing environment.
Cell proliferation is the main driving force for plant growth. Although genome sequence analysis revealed a high number of cell cycle genes in plants, little is known about the molecular complexes steering cell division. In a targeted proteomics approach, we mapped the core complex machinery at the heart of the Arabidopsis thaliana cell cycle control. Besides a central regulatory network of core complexes, we distinguished a peripheral network that links the core machinery to up- and downstream pathways. Over 100 new candidate cell cycle proteins were predicted and an in-depth biological interpretation demonstrated the hypothesis-generating power of the interaction data. The data set provided a comprehensive view on heterodimeric cyclin-dependent kinase (CDK)–cyclin complexes in plants. For the first time, inhibitory proteins of plant-specific B-type CDKs were discovered and the anaphase-promoting complex was characterized and extended. Important conclusions were that mitotic A- and B-type cyclins form complexes with the plant-specific B-type CDKs and not with CDKA;1, and that D-type cyclins and S-phase-specific A-type cyclins seem to be associated exclusively with CDKA;1. Furthermore, we could show that plants have evolved a combinatorial toolkit consisting of at least 92 different CDK–cyclin complex variants, which strongly underscores the functional diversification among the large family of cyclins and reflects the pivotal role of cell cycle regulation in the developmental plasticity of plants.
PMCID: PMC2950081  PMID: 20706207
Arabidopsis thaliana; cell cycle; interactome; protein complex; protein interactions
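The split of the data set into domains I1 and I2 follows a simple rule that can be expressed in a few lines. The sketch assumes each purification is recorded as (bait, repeat number, set of identified preys) and treats an interaction as I1 when it is seen in at least two independent purifications, either repeats with the same bait or the reciprocal purification, and as I2 otherwise. Protein names are placeholders, not data from the study.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# (bait, experimental repeat, preys identified by mass spectrometry)
Purification = Tuple[str, int, Set[str]]


def classify_interactions(purifications: Iterable[Purification]) -> Dict[FrozenSet[str], str]:
    """Label each bait-prey interaction as I1 (confirmed) or I2 (seen once)."""
    observations = defaultdict(set)  # interaction -> {(bait, repeat)} in which it was seen
    for bait, repeat, preys in purifications:
        for prey in preys:
            if prey != bait:  # ignore recovery of the bait itself
                observations[frozenset((bait, prey))].add((bait, repeat))
    # Two distinct (bait, repeat) records means either a repeat with the same
    # bait or a reciprocal purification with the partner used as bait.
    return {pair: ("I1" if len(seen) >= 2 else "I2") for pair, seen in observations.items()}


data = [
    ("P1", 1, {"P1", "P2", "P3"}),
    ("P1", 2, {"P1", "P2"}),
    ("P3", 1, {"P4"}),
    ("P4", 1, {"P3"}),
]
for pair, label in sorted(classify_interactions(data).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), label)
```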
8.  Nuclear IE2 Structures Are Related to Viral DNA Replication Sites during Baculovirus Infection 
Journal of Virology  2002;76(10):5198-5207.
The ie2 gene of Autographa californica multicapsid nuclear polyhedrosis virus is 1 of the 10 baculovirus genes that have been identified as factors involved in viral DNA replication. IE2 is detectable in the nucleus as one of the major early-expressed proteins and exhibits a dynamic localization pattern during the infection cycle (D. Murges, I. Quadt, J. Schröer, and D. Knebel-Mörsdorf, Exp. Cell Res. 264:219-232, 2001). Here, we investigated whether IE2 localized to regions of viral DNA replication. After viral DNA was labeled with bromodeoxyuridine (BrdU), confocal imaging indicated that defined IE2 domains colocalized with viral DNA replication centers as soon as viral DNA replication was detectable. In addition, a subpopulation of IE2 structures colocalized with two further virus-encoded replication factors, late expression factor 3 (LEF-3) and the DNA binding protein (DBP). While DBP and LEF-3 structures always colocalized and enlarged simultaneously with viral DNA replication sites, only those IE2 structures that colocalized with replication sites also colocalized with DBP. Replication and transcription of DNA viruses in association with promyelocytic leukemia protein (PML) oncogenic domains have been observed. By confocal imaging we demonstrated that the human PML colocalized with IE2. Triple staining revealed PML/IE2 domains in the vicinity of viral DNA replication centers, while IE2 alone colocalized with early replication sites, demonstrating that PML structures do not form common domains with viral DNA replication centers. Thus, we conclude that IE2 colocalizes alternately with PML and the sites of viral DNA replication. Small ubiquitin-like modifier SUMO-1 has been implicated in the nuclear distribution of PML. Similar to what was found for mammalian cells, small ubiquitin-like modifiers were recruited to PML domains in infected insect cells, which suggests that IE2 and PML colocalize in conserved cellular domains. 
In summary, our results support a model for IE2 as part of various functional sites in the nucleus that are connected with viral DNA replication.
PMCID: PMC136171  PMID: 11967334
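The colocalization reported here rests on confocal imaging. A frequently used quantitative complement, not necessarily applied in this study, is Pearson's correlation coefficient between the intensities of two channels (for example IE2 and BrdU-labelled DNA) over the same pixels: values near +1 indicate that the two signals rise and fall together. The pixel intensities below are invented for illustration.

```python
import math
from typing import Sequence


def pearson_colocalization(ch1: Sequence[float], ch2: Sequence[float]) -> float:
    """Pearson's r between two channels sampled at the same pixels."""
    if len(ch1) != len(ch2) or not ch1:
        raise ValueError("channels must be non-empty and the same size")
    n = len(ch1)
    m1, m2 = sum(ch1) / n, sum(ch2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(ch1, ch2))
    v1 = sum((a - m1) ** 2 for a in ch1)
    v2 = sum((b - m2) ** 2 for b in ch2)
    if v1 == 0 or v2 == 0:
        raise ValueError("a channel with constant intensity has undefined correlation")
    return cov / math.sqrt(v1 * v2)


# Hypothetical flattened pixel intensities from two channels.
ie2 = [0, 2, 10, 12, 1]
brdu = [1, 1, 9, 11, 0]
print(round(pearson_colocalization(ie2, brdu), 3))
```

In practice the coefficient is computed over a region of interest after background subtraction, since shared background inflates the correlation.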
9.  Statistical independence of the colocalized association signals for type 1 diabetes and RPS26 gene expression on chromosome 12q13 
Biostatistics (Oxford, England)  2008;10(2):327-334.
Following the recent success of genome-wide association studies in uncovering disease-associated genetic variants, the next challenge is to understand how these variants affect downstream pathways. The most proximal trait to a disease-associated variant, most commonly a single nucleotide polymorphism (SNP), is differential gene expression due to the cis effect of SNP alleles on transcription, translation, and/or splicing; such variants are known as gene expression quantitative trait loci (eQTL). Several genome-wide SNP–gene expression association studies have already provided convincing evidence that eQTLs are widespread. As a consequence, some eQTL associations are found in the same genomic region as a disease variant, either by coincidence or because of a causal relationship. Cis-regulation of RPS26 gene expression and a type 1 diabetes (T1D) susceptibility locus have been colocalized to the 12q13 genomic region. A recent study has also suggested RPS26 as the most likely susceptibility gene for T1D in this genomic region. However, it is still not clear whether this colocalization is the result of chance alone or whether RPS26 expression is directly correlated with T1D susceptibility, and therefore, potentially causal. Here, we derive and apply a statistical test of this hypothesis. We conclude that RPS26 expression is unlikely to be the molecular trait responsible for T1D susceptibility at this locus, at least not in a direct, linear connection.
PMCID: PMC2648905  PMID: 19039033
Association studies; Gene expression; RPS26; T1D
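The hypothesis being tested can be phrased as proportionality: if a single causal variant drives both the RPS26 eQTL and the T1D association, the two vectors of SNP regression coefficients should be proportional, β_T1D ≈ η β_expr for some scalar η. The sketch below computes a simplified chi-square-type statistic by minimizing over η on a grid. Important assumptions: SNP estimates are treated as independent (SNPs in linkage disequilibrium are correlated, so a proper test needs the full covariance matrix), the grid search is a crude stand-in for the paper's handling of η, and the coefficients are made up. Under proportionality, the statistic is approximately chi-square with k − 1 degrees of freedom for k SNPs.

```python
from typing import Optional, Sequence, Tuple


def proportionality_statistic(b1: Sequence[float], s1: Sequence[float],
                              b2: Sequence[float], s2: Sequence[float],
                              grid: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Min over eta of sum_j (b1_j - eta*b2_j)^2 / (s1_j^2 + eta^2 * s2_j^2)."""
    if grid is None:
        grid = [i / 1000.0 for i in range(-10_000, 10_001)]  # eta in [-10, 10]
    best_stat, best_eta = float("inf"), 0.0
    for eta in grid:
        # Squared residual over its variance, assuming independent SNPs and
        # independent studies (a simplification).
        stat = sum((x - eta * y) ** 2 / (sx ** 2 + eta ** 2 * sy ** 2)
                   for x, sx, y, sy in zip(b1, s1, b2, s2))
        if stat < best_stat:
            best_stat, best_eta = stat, eta
    return best_stat, best_eta


# Hypothetical coefficients for three SNPs (disease: b1, expression: b2).
se = [0.02, 0.02, 0.02]
print(proportionality_statistic([0.1, 0.2, 0.05], se, [0.2, 0.4, 0.1], se))
print(proportionality_statistic([0.1, -0.2, 0.05], se, [0.2, 0.4, 0.1], se))
```

A small statistic is compatible with a shared causal variant; a large one, as in the second call, argues against the two signals being driven by the same variant.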
10.  Identification of cis-Acting Elements That Mediate the Replication and Maintenance of Human Papillomavirus Type 16 Genomes in Saccharomyces cerevisiae 
Journal of Virology  2005;79(10):5933-5942.
Papillomaviruses contain small double-stranded DNA genomes that are maintained in persistently infected mammalian host epithelia as nuclear plasmids and rely upon the host replication machinery for replication. Papillomaviruses encode a DNA helicase, E1, which can specifically bind to the viral genome and support DNA synthesis. Under some conditions in mammalian cells, E1 is not required for viral DNA synthesis, leading to the hypothesis that papillomavirus DNA can be replicated solely by the host replication machinery. This machinery is highly conserved among eukaryotes. We and others found that papillomavirus DNA could replicate in a simple eukaryote, Saccharomyces cerevisiae. Specifically, papillomavirus DNA could substitute for the function of the autonomously replicating sequence (ARS) and centromere (CEN) elements that are normally both required for the stable replication of extrachromosomal DNAs in yeast. Furthermore, this form of replication in yeast was E1 independent. In this study, we map the elements in the human papillomavirus type 16 (HPV16) genome that can substitute for yeast ARS and CEN elements. A single element, termed rep, was identified that can substitute for ARS, and multiple elements, termed mtc, could substitute for CEN. The location of one of these mtc elements overlaps the location of rep, and this approximately 1,000-bp region of HPV16 was sufficient to support stable replication of a bacterial-yeast shuttle plasmid deleted of both ARS and CEN elements.
PMCID: PMC1091711  PMID: 15857979
11.  A Bayesian approach to efficient differential allocation for resampling-based significance testing 
BMC Bioinformatics  2009;10:198.
Large-scale statistical analyses have become hallmarks of post-genomic era biological research due to advances in high-throughput assays and the integration of large biological databases. One accompanying issue is the simultaneous estimation of p-values for a large number of hypothesis tests. In many applications, a parametric assumption about the null distribution, such as normality, may be unreasonable, and resampling-based p-values are the preferred procedure for establishing statistical significance. Using resampling-based procedures for multiple testing is computationally intensive and typically requires large numbers of resamples.
We present a new approach to more efficiently assign resamples (such as bootstrap samples or permutations) within a nonparametric multiple testing framework. We formulated a Bayesian-inspired approach to this problem and devised an algorithm that adapts the assignment of resamples iteratively with negligible space and running time overhead. In two experimental studies, a breast cancer microarray dataset and a genome-wide association study dataset for Parkinson's disease, we demonstrated that our differential allocation procedure is substantially more accurate than the traditional uniform resample allocation.
Our experiments demonstrate that using a more sophisticated allocation strategy can improve our inference for hypothesis testing without a drastic increase in the amount of computation on randomized data. Moreover, we gain more improvement in efficiency when the number of tests is large. R code for our algorithm and the shortcut method are available at .
PMCID: PMC2718927  PMID: 19558706
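The differential-allocation idea described above, spending additional resamples on the tests whose significance is still in doubt, can be illustrated with a small sketch. It is not the authors' Bayesian-inspired algorithm: it assumes a simpler, hypothetical uncertainty score (the distance between each test's running permutation p-value and α, measured in binomial standard errors), a two-sample mean-difference statistic, and illustrative function names (`perm_hits`, `adaptive_pvalues`).

```python
import math
import random


def perm_hits(x, y, n_perm, rng):
    """Count label permutations whose |mean difference| >= the observed one."""
    nx = len(x)
    obs = abs(sum(x) / nx - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:nx]) / nx - sum(pooled[nx:]) / (len(pooled) - nx))
        if diff >= obs:
            hits += 1
    return hits


def adaptive_pvalues(tests, budget, alpha=0.05, batch=50, seed=0):
    """Hypothetical allocation rule: a uniform first pass, then each further
    batch of permutations goes to the test whose p-value estimate is closest
    to alpha in standard-error units. Assumes budget >= batch * len(tests)."""
    rng = random.Random(seed)
    m = len(tests)
    hits, n = [0] * m, [0] * m

    def closeness(i):
        p = (hits[i] + 1) / (n[i] + 1)
        se = math.sqrt(max(p * (1 - p), 1e-12) / n[i])
        return abs(p - alpha) / se  # small value = decision still uncertain

    for i, (x, y) in enumerate(tests):  # uniform first pass
        hits[i] += perm_hits(x, y, batch, rng)
        n[i] += batch
    while sum(n) + batch <= budget:  # adaptive passes
        i = min(range(m), key=closeness)
        x, y = tests[i]
        hits[i] += perm_hits(x, y, batch, rng)
        n[i] += batch
    pvals = [(h + 1) / (k + 1) for h, k in zip(hits, n)]
    return pvals, n
```

In this sketch, tests whose p-values sit far from α stop receiving resamples early, so the budget concentrates where the accept/reject decision is closest; the (h + 1)/(n + 1) estimator keeps every permutation p-value strictly positive.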
12.  Cellular reprogramming by the conjoint action of ERα, FOXA1, and GATA3 to a ligand-inducible growth state 
Estrogen receptor α (ERα), FOXA1, and GATA3 form a functional enhanceosome in MCF-7 breast carcinoma cells that is significantly associated with active transcriptional features such as enhanced p300 co-activator and RNA Pol II recruitment as well as chromatin opening. The enhanceosome has a significant impact on, and provides optimal transcriptional control of, E2-responsive genes. The presence of FOXA1 and GATA3 is indispensable for restoring the ERα growth-response machinery in ERα-negative cells and recapitulating the appropriate expression cassette.
Estrogen receptor α (ERα) is a ligand-inducible nuclear hormone receptor that has important physiological and pathological roles in reproduction, cancer, and cardiovascular biology. ERα regulates transcription by binding to DNA recognition sequences known as estrogen-response elements (EREs) and recruiting a variety of co-activators, corepressors, and chromatin remodeling enzymes to initiate the transcription machinery. In our previous (Lin et al, 2007) and recent (Joseph et al, 2010) studies, we identified high-confidence ERα binding sites in MCF-7 human mammary carcinoma cells. Using known-motif scanning and de novo motif detection, we found that FOXA1 and GATA3 motifs were commonly enriched around ERα binding sites. Moreover, numerous microarray studies have documented the co-expression of ERα, FOXA1, and GATA3 in primary breast tumors (Badve et al, 2007; Wilson and Giguere, 2008). This evidence suggests that these three transcription factors (TFs) may cluster on DNA binding sites and contribute to the breast cancer phenotype. However, little is understood about the nature of their coordinated interaction at the genome level or its biological consequences.
We mapped the genome-wide binding profiles of ERα, FOXA1, and GATA3 using massively parallel chromatin immunoprecipitation-sequencing (ChIP-seq). We observed that ERα, FOXA1, and GATA3 colocalized in a coordinated manner: upon estrogen (E2) stimulation, ∼30% of all ERα binding sites overlapped with FOXA1 and GATA3 binding sites. Moreover, the ERα+FOXA1+GATA3 conjoint sites were associated with the highest p300 co-activator recruitment, RNA Pol II occupancy, and chromatin opening. These results indicate that the three TFs form a functional enhanceosome and cooperatively modulate the transcriptional networks previously ascribed to ERα alone. These enhanceosome binding sites also appear to regulate the genes driving core ERα function.
To further validate that ERα+FOXA1+GATA3 co-binding represents an optimal configuration for E2-mediated transcriptional activation, we performed luciferase reporter assays on the GREB1 locus, which actively engages ERα enhanceosome sites in gene regulation (Figure 5C). ERα alone induced GREB1 luciferase activity to ∼246% of that of the control construct. FOXA1 and GATA3, individually or in combination, produced only subtle changes in GREB1 luciferase activity. The ERα+FOXA1 and ERα+GATA3 combinations increased luciferase activity to ∼330%. Interestingly, the full ERα+FOXA1+GATA3 assembly gave the greatest ER responsiveness (370%). This suggests that ERα provides the fundamental gene regulatory module, while FOXA1 and GATA3 incrementally enhance ERα-regulated transcriptional induction.
ERα is a ligand-activated TF known to mediate the proliferative effects of E2 in breast cancer cells. Paradoxically, Garcia et al (1992) showed that forced expression of ERα in ERα-negative MDA-MB-231 cells inhibited their growth upon E2 treatment. The reason for these opposite outcomes has remained elusive. We hypothesized that higher-order regulatory mechanisms of ERα function, such as the formation and composition of enhanceosomes, may explain the establishment of transcriptional regulatory cassettes favoring either growth enhancement or growth repression.
To test this hypothesis, we stably transfected MDA-MB-231 cells with ERα, FOXA1, or GATA3, individually or in combination (Figure 6A). Enforced expression of ERα or FOXA1 inhibited growth, whereas expression of GATA3 left growth unaltered. Co-expression of ERα+FOXA1 or ERα+GATA3 also inhibited cell proliferation compared with control cells. However, co-expression of ERα together with FOXA1 and GATA3 markedly induced cell proliferation under E2 stimulation. We recapitulated this cellular reprogramming in another ERα-negative breast cancer cell line, BT-549, and observed similar E2-responsive growth induction in ERα+FOXA1+GATA3-expressing BT-549 cells. This suggests that the proliferative phenotype associated with ligand-induced ERα becomes manifest only when the conjoint binding sites are fully activated by all three TFs.
To assess the nature of this transcriptional reprogramming, we asked whether the reprogrammed MDA-MB-231 cells display any similarity to the expression profile of the ERα-positive breast cancer cell line MCF-7 (Figure 6C). We pooled the E2-regulated genes from the differently transfected MDA-MB-231 cells and compared their expression in these transfected cells and in MCF-7 cells. Strikingly, the expression profile of ERα+FOXA1+GATA3-expressing MDA-MB-231 cells showed a positive correlation (R=0.42) with the E2-induced expression profile of MCF-7, whereas no such correlation was observed for MDA-MB-231 cells transfected with ERα only (R=−0.21). Furthermore, compared with vector control cells, ERα+FOXA1+GATA3-expressing MDA-MB-231 cells showed marginally induced expression of luminal marker genes and reduced expression of basal genes. This suggests that the enhanceosome components are competent to partially reprogramme basal cells to resemble luminal cells.
Taken together, we have uncovered the genomic impact as well as the functional importance of an enhanceosome comprising ERα, FOXA1, and GATA3 in the estrogen responsiveness of ERα-positive breast cancer cells. This enhanceosome exerts significant combinatorial control over the transcriptional network regulating the growth and proliferation of ERα-positive breast cancer cells. Most importantly, we show that transfection of all the enhanceosome components was necessary to reprogramme ERα-negative cells to restore estrogen-responsive growth and to transcriptionally induce a basal-to-luminal transition.
Despite the role of the estrogen receptor α (ERα) pathway as a key growth driver for breast cells, exogenous introduction of ERα into ERα-negative cells has paradoxically resulted in growth inhibition. We mapped the binding profiles of ERα and its interacting transcription factors (TFs), FOXA1 and GATA3, in MCF-7 breast carcinoma cells and observed that these three TFs form a functional enhanceosome that regulates the genes driving core ERα function and cooperatively modulates the transcriptional networks previously ascribed to ERα alone. We demonstrate that these enhanceosome-occupied sites display optimal enhancer characteristics, with the highest p300 co-activator recruitment, RNA Pol II occupancy, and chromatin opening. Most importantly, we show that transfection of all three TFs was necessary to reprogramme the ERα-negative MDA-MB-231 and BT-549 cells to restore estrogen-responsive growth resembling that of estrogen-treated ERα-positive MCF-7 cells. Cumulatively, these results suggest that all the enhanceosome components, ERα, FOXA1, and GATA3, are necessary for the full repertoire of cancer-associated effects of ERα.
PMCID: PMC3202798  PMID: 21878914
enhanceosome; estrogen receptor α; FOXA1; GATA3; synthetic phenotypes
13.  The migration behaviour of DNA replicative intermediates containing an internal bubble analyzed by two-dimensional agarose gel electrophoresis. 
Nucleic Acids Research  1993;21(23):5474-5479.
Initiation of DNA replication in higher eukaryotes is still a matter of controversy. Some evidence suggests that it occurs at specific sites. Data obtained using two-dimensional (2D) agarose gel electrophoresis, however, led to the notion that it may occur at random in broad zones. This hypothesis is based primarily on the observation that several contiguous DNA fragments generate a mixture of the so-called 'bubble' and 'simple Y' patterns in neutral/neutral 2D gels. Interpreting this mixture of hybridisation patterns as indicative of random initiation of DNA synthesis relies on the assumption that replicative intermediates (RIs) containing an internal bubble, in which initiation occurred at different relative positions, generate comigrating signals. This assumption, however, has yet to be proven. We investigated this problem by analysing together, in the same 2D gel, populations of pBR322 RIs digested with different restriction endonucleases that each cut the monomer only once, at different locations. DNA synthesis begins at a specific site in pBR322 and progresses unidirectionally. Thus, the main difference between these sets of RIs was the relative position of the origin. The results clearly showed that populations of RIs containing an internal bubble in which initiation occurred at different relative positions do not generate signals that comigrate all the way in 2D gels. Despite this observation, our results support the notion that random initiation is indeed responsible for the peculiar 'bubble' signal observed in several metazoan eukaryotes.
PMCID: PMC310588  PMID: 8265365
14.  Statistical colocalization of monocyte gene expression and genetic risk variants for type 1 diabetes 
Human Molecular Genetics  2012;21(12):2815-2824.
One mechanism by which disease-associated DNA variation can alter disease risk is by altering gene expression. However, linkage disequilibrium (LD) between variants, mostly single-nucleotide polymorphisms (SNPs), means that it is not sufficient to show that a particular variant is associated with both disease and expression, as there could be two distinct causal variants in LD. Here, we describe a formal statistical test of colocalization and apply it to type 1 diabetes (T1D)-associated regions, identified mostly through genome-wide association studies, and to expression quantitative trait loci (eQTLs) discovered in a recently generated large monocyte expression data set from the Gutenberg Health Study (1370 individuals), with confirmation sought in an additional data set from the Cardiogenics Transcriptome Study (558 individuals). Of 60 overlapping eQTLs in 49 T1D regions, we excluded 39 from possible colocalization and identified 21 coincident eQTLs, representing 21 genes in 14 distinct T1D regions. Our results reflect the importance of gene expression in monocytes (and their derivatives, macrophages and dendritic cells) in human T1D and support the candidacy of several genes as causal factors in autoimmune pancreatic beta-cell destruction, including AFF3, CD226, CLECL1, DEXI, FKRP, PRKD2, RNLS, SMARCE1 and SUOX, in addition to the recently described GPR183 (EBI2) gene.
PMCID: PMC3363338  PMID: 22403184
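The LD caveat above can be made concrete. A standard measure of LD between two SNPs is r², the squared correlation of their alleles; when r² is high, a trait associated with one SNP will also appear associated with the other. The following sketch computes a genotype-based proxy for r² from allele dosages (0, 1, 2) on hypothetical data; it illustrates only the LD measure, not the formal colocalization test used in the study.

```python
def ld_r2(g1, g2):
    """Squared Pearson correlation between allele dosages (0, 1, 2) at two SNPs.

    A genotype-based proxy for haplotype r^2; raises ValueError if either
    SNP is monomorphic in the sample (zero variance).
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must have equal length")
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        raise ValueError("monomorphic SNP: r^2 undefined")
    return cov * cov / (v1 * v2)


# Hypothetical dosages for 8 individuals: snp_b closely tracks snp_a,
# so an association seen at one SNP would largely be echoed at the other.
snp_a = [0, 1, 2, 1, 0, 2, 1, 0]
snp_b = [0, 1, 2, 1, 0, 2, 0, 0]
print(round(ld_r2(snp_a, snp_b), 3))
```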
15.  KAP-1 Corepressor Protein Interacts and Colocalizes with Heterochromatic and Euchromatic HP1 Proteins: a Potential Role for Krüppel-Associated Box–Zinc Finger Proteins in Heterochromatin-Mediated Gene Silencing 
Molecular and Cellular Biology  1999;19(6):4366-4378.
Krüppel-associated box (KRAB) domains are present in approximately one-third of all human zinc finger proteins (ZFPs) and are potent transcriptional repression modules. We have previously cloned a corepressor for the KRAB domain, KAP-1, which is required for KRAB-mediated repression in vivo. To characterize the repression mechanism utilized by KAP-1, we have analyzed the ability of KAP-1 to interact with murine (M31 and M32) and human (HP1α and HP1γ) homologues of the HP1 protein family, a class of nonhistone heterochromatin-associated proteins with a well-established epigenetic gene silencing function in Drosophila. In vitro studies confirmed that KAP-1 is capable of directly interacting with M31 and hHP1α, which are normally found in centromeric heterochromatin, as well as M32 and hHP1γ, both of which are found in euchromatin. Mapping of the region in KAP-1 required for HP1 interaction showed that amino acid substitutions which abolish HP1 binding in vitro reduce KAP-1 mediated repression in vivo. We observed colocalization of KAP-1 with M31 and M32 in interphase nuclei, lending support to the biochemical evidence that M31 and M32 directly interact with KAP-1. The colocalization of KAP-1 with M31 is sometimes found in subnuclear territories of potential pericentromeric heterochromatin, whereas colocalization of KAP-1 and M32 occurs in punctate euchromatic domains throughout the nucleus. This work suggests a mechanism for the recruitment of HP1-like gene products by the KRAB-ZFP–KAP-1 complex to specific loci within the genome through formation of heterochromatin-like complexes that silence gene activity. We speculate that gene-specific repression may be a consequence of the formation of such complexes, ultimately leading to silenced genes in newly formed heterochromatic chromosomal environments.
PMCID: PMC104396  PMID: 10330177
16.  HeliCis: a DNA motif discovery tool for colocalized motif pairs with periodic spacing 
BMC Bioinformatics  2007;8:418.
Correct temporal and spatial gene expression during metazoan development relies on combinatorial interactions between different transcription factors. As a consequence, cis-regulatory elements often colocalize in clusters termed cis-regulatory modules. These may have requirements on organizational features such as spacing, order and helical phasing (periodic spacing) between binding sites. Due to the turning of the DNA helix, a small modification of the distance between a pair of sites may sometimes drastically disrupt function, while insertion of a full helical turn of DNA (10–11 bp) between cis elements may cause functionality to be restored. Recently, de novo motif discovery methods which incorporate organizational properties such as colocalization and order preferences have been developed, but there are no tools which incorporate periodic spacing into the model.
We have developed a web-based motif discovery tool, HeliCis, featuring a flexible model that allows de novo detection of motifs with periodic spacing. Depending on the parameter settings, it can also be used to discover colocalized motifs without periodicity, or motifs separated by a fixed gap of known or unknown length. We show on simulated data that it can efficiently capture the synergistic effects of colocalization and periodic spacing to improve the detection of weak DNA motifs. It provides a simple-to-use web interface that interactively visualizes the current settings, making the parameters and model structure easy to understand.
HeliCis provides simple and efficient de novo discovery of colocalized DNA motif pairs, with or without periodic spacing. Our evaluations show that it can detect weak periodic patterns which are not easily discovered using a sequential approach, i.e. first finding the binding sites and second analyzing the properties of their pairwise distances.
PMCID: PMC2200674  PMID: 17963524
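HeliCis models colocalization and helical phasing jointly during motif discovery. For contrast, the second step of the sequential approach mentioned above (analyzing the pairwise distances of already-found sites) can be sketched with a circular statistic: each spacing is mapped to an angle on the helical turn, and the mean resultant length R measures how strongly the spacings share a phase. The 10.5 bp period, the Rayleigh statistic, and the function name are assumptions for illustration, not part of HeliCis.

```python
import cmath
import math


def helical_phasing(distances, period=10.5):
    """Mean resultant length R of site spacings mapped onto the helical turn.

    R near 1: spacings cluster at one helical phase; R near 0: no phasing.
    Also returns the Rayleigh statistic Z = n * R**2 and its large-sample
    approximate p-value exp(-Z).
    """
    if not distances:
        raise ValueError("need at least one distance")
    n = len(distances)
    mean_vec = sum(cmath.exp(2j * math.pi * d / period) for d in distances) / n
    r = abs(mean_vec)
    z = n * r * r
    return r, z, math.exp(-z)


# Spacings of roughly 2, 3, 4, 5, and 6 helical turns (hypothetical).
r, z, p = helical_phasing([21, 31, 42, 52, 63])
print(round(r, 3), round(p, 4))
```

Spacings that differ by a whole number of helical turns land near the same angle and push R toward 1, whereas spacings spread evenly over whole turns cancel out.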
17.  NuChart: An R Package to Study Gene Spatial Neighbourhoods with Multi-Omics Annotations 
PLoS ONE  2013;8(9):e75146.
Long-range chromosomal associations between genomic regions, and their repositioning in the 3D space of the nucleus, are now considered key contributors to the regulation of gene expression, and important links have been highlighted with other genomic features involved in DNA rearrangements. Recent Chromosome Conformation Capture (3C) measurements performed with high-throughput sequencing (Hi-C), together with molecular dynamics studies, show a strong correlation between the colocalization and coregulation of genes, but this important research is hampered by the lack of biologist-friendly analysis and visualisation software. Here, we describe NuChart, an R package that allows the user to annotate and statistically analyse a list of input genes using information derived from Hi-C data, integrating knowledge about genomic features involved in chromosome spatial organization. NuChart works directly with sequenced reads to identify the related Hi-C fragments, with the aim of creating gene-centric neighbourhood graphs on which multi-omics features can be mapped. Predictions of CTCF binding sites, isochores and cryptic Recombination Signal Sequences are provided with the package for mapping, although other annotation data in bed format (such as methylation profiles and histone patterns) can also be used. Gene expression data can be automatically retrieved and processed from the Gene Expression Omnibus and ArrayExpress repositories to highlight the expression profile of genes in the identified neighbourhood. Moreover, statistical inferences about the graph structure and correlations between its topology and multi-omics features can be performed using Exponential-family Random Graph Models. The Hi-C fragment visualisation provided by NuChart allows cells in different conditions to be compared, potentially enabling the identification of novel biomarkers. NuChart is compliant with the Bioconductor standard and is freely available at
PMCID: PMC3777921  PMID: 24069388
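The gene-centric neighbourhood graph idea can be illustrated independently of NuChart itself, which is an R package with its own interface. The Python sketch below assumes a hypothetical input in which Hi-C contacts have already been reduced to pairs of genes whose fragments were found ligated together; it builds an undirected graph and collects the genes reachable from a seed gene within a chosen number of hops by breadth-first search.

```python
from collections import deque


def gene_neighbourhood(contacts, seed, depth=1):
    """Map each gene reachable from `seed` within `depth` hops to its hop count.

    `contacts` is an iterable of (gene_a, gene_b) pairs, a hypothetical
    gene-level summary of Hi-C fragment contacts.
    """
    graph = {}
    for a, b in contacts:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        gene = queue.popleft()
        if seen[gene] == depth:  # do not expand beyond the requested depth
            continue
        for nb in graph.get(gene, ()):
            if nb not in seen:
                seen[nb] = seen[gene] + 1
                queue.append(nb)
    return seen


contacts = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
print(gene_neighbourhood(contacts, "A", depth=2))  # {'A': 0, 'B': 1, 'C': 2}
```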
18.  Role of the Herpes Simplex Virus Helicase-Primase Complex during Adeno-Associated Virus DNA Replication 
Journal of Virology  2006;80(11):5241-5250.
A subset of DNA replication proteins of herpes simplex virus (HSV) comprising the single-strand DNA-binding protein, ICP8 (UL29), and the helicase-primase complex (UL5, UL8, and UL52 proteins) has previously been shown to be sufficient for the replication of adeno-associated virus (AAV). We recently demonstrated complex formation between ICP8, AAV Rep78, and the single-stranded DNA AAV genome, both in vitro and in the nuclear HSV replication domains of coinfected cells. In this study the functional role(s) of HSV helicase and primase during AAV DNA replication were analyzed. To differentiate between their necessity as structural components of the HSV replication complex or as active enzymes, point mutations within the helicase and primase catalytic domains were analyzed. In two complementary approaches the remaining HSV helper functions were either provided by infection with HSV mutants or by plasmid transfection. We show here that upon cotransfection of the minimal four HSV proteins (i.e., the four proteins constituting the minimal requirements for basal AAV replication), UL52 primase catalytic activity was not required for AAV DNA replication. In contrast, UL5 helicase activity was necessary for fully efficient replication. Confocal microscopy confirmed that all mutants retained the ability to support formation of ICP8-positive nuclear replication foci, to which AAV Rep78 colocalized in a manner strictly dependent on the presence of AAV single-stranded DNA (ssDNA). The data indicate that recruitment of AAV Rep78 and ssDNA to nuclear replication sites by the four HSV helper proteins is maintained in the absence of catalytic primase or helicase activities and suggest an involvement of the HSV UL5 helicase activity during AAV DNA replication.
PMCID: PMC1472166  PMID: 16699004
19.  Origin Replication Complex Binding, Nucleosome Depletion Patterns, and a Primary Sequence Motif Can Predict Origins of Replication in a Genome with Epigenetic Centromeres 
mBio  2014;5(5):e01703-14.
Origins of DNA replication are key genetic elements, yet their identification remains elusive in most organisms. In previous work, we found that centromeres contain origins of replication (ORIs) that are determined epigenetically in the pathogenic yeast Candida albicans. In this study, we used origin recognition complex (ORC) binding and nucleosome occupancy patterns in Saccharomyces cerevisiae and Kluyveromyces lactis to train a machine learning algorithm to predict the position of active arm (noncentromeric) origins in the C. albicans genome. The model identified bona fide active origins as determined by the presence of replication intermediates on nondenaturing two-dimensional (2D) gels. Importantly, these origins function at their native chromosomal loci and also as autonomously replicating sequences (ARSs) on a linear plasmid. A “mini-ARS screen” identified at least one and often two ARS regions of ≥100 bp within each bona fide origin. Furthermore, a 15-bp AC-rich consensus motif was associated with the predicted origins and conferred autonomous replicating activity to the mini-ARSs. Thus, while centromeres and the origins associated with them are epigenetic, arm origins are dependent upon critical DNA features, such as a binding site for ORC and a propensity for nucleosome exclusion.
DNA replication machinery is highly conserved, yet the definition of exactly what specifies a replication origin differs among species. Here, we used computational genomics to predict origin locations in Candida albicans by combining the locations of binding sites for the conserved origin recognition complex, which is necessary for replication initiation, with chromatin organization patterns. We identified predicted sequences that exhibited bona fide origin function and developed a linear plasmid assay to delimit the DNA fragments necessary for origin function. Additionally, we found that a short AC-rich motif, which is enriched in predicted origins, is required for origin function. Thus, we demonstrated a new machine learning paradigm for identifying potential origins in a genome with no prior information. Furthermore, this work suggests that C. albicans has two different types of origins: “hard-wired” arm origins that rely upon specific sequence motifs and “epigenetic” centromeric origins that are recruited to kinetochores in a sequence-independent manner.
PMCID: PMC4173791  PMID: 25182328
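The training idea above, learning from ORC-binding and nucleosome-occupancy features in species with known origins and then scoring windows of another genome, can be sketched with a simple classifier. Plain logistic regression is used here as a stand-in and is not necessarily the algorithm used in the study; the two features, their scaling to [0, 1], and all values are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    """Predicted probability that window x is an active origin."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical training windows: [ORC-binding score, nucleosome depletion],
# labelled 1 for known origins in the training species, else 0.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
print(round(predict(w, b, [0.85, 0.85]), 3))
```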
20.  Human Papillomavirus DNA Replication Compartments in a Transient DNA Replication System 
Journal of Virology  1999;73(2):1001-1009.
Many DNA viruses replicate their genomes at nuclear foci in infected cells. Using indirect immunofluorescence combined with fluorescence in situ hybridization, we colocalized the human papillomavirus (HPV) replication proteins E1 and E2 and a plasmid containing the replication origin to nuclear foci in transiently transfected cells. The host replication protein A (RP-A) also colocalized to these foci. These nuclear structures were identified as active sites of viral DNA synthesis by bromodeoxyuridine (BrdU) pulse-labeling. Unexpectedly, the great majority of the RP-A signal and of BrdU incorporation was found in these HPV replication domains. Furthermore, E1, E2, and RP-A also colocalized to nuclear foci in the absence of an origin-containing plasmid. These observations suggest a spatial reorganization of the host DNA replication machinery upon HPV DNA replication or E1 and E2 expression. Alternatively, viral DNA replication might be targeted to host nuclear domains that are active during late S phase, when such domains are limited in number. In a fraction of cells expressing E1 and E2, the promyelocytic leukemia protein, a component of nuclear domain 10 (ND10), partially or completely colocalized with E1 and E2. Since ND10 structures were recently hypothesized to be sites of bovine papillomavirus virion assembly, our observation suggests that HPV DNA amplification might be partially coupled to virion assembly.
PMCID: PMC103920  PMID: 9882301
21.  Testing mitochondrial sequences and anonymous nuclear markers for phylogeny reconstruction in a rapidly radiating group: molecular systematics of the Delphininae (Cetacea: Odontoceti: Delphinidae) 
Many molecular phylogenetic analyses rely on DNA sequence data obtained from single or multiple loci, particularly mitochondrial DNA loci. However, phylogenies for taxa that have undergone recent, rapid radiation events often remain unresolved. Alternative methodologies for discerning evolutionary relationships under these conditions are desirable. The dolphin subfamily Delphininae is a group that has likely resulted from a recent and rapid radiation. Despite several efforts, the evolutionary relationships among the species in the subfamily remain unclear.
Here, we compare a phylogeny estimated using mitochondrial DNA (mtDNA) control region sequences to a multi-locus phylogeny inferred from 418 polymorphic genomic markers obtained from amplified fragment length polymorphism (AFLP) analysis. The two sets of phylogenies are largely incongruent, primarily because the mtDNA tree provides very poor resolving power; very few species' nodes in the tree are supported by bootstrap resampling. The AFLP phylogeny is considerably better resolved and more congruent with relationships inferred from morphological data. Both phylogenies support paraphyly for the genera Stenella and Tursiops. The AFLP data indicate a close relationship between the two spotted dolphin species and recent ancestry between Stenella clymene and S. longirostris. The placement of the Lagenodelphis hosei lineage is ambiguous: phenetic analysis of the AFLP data is consistent with morphological expectations but the phylogenetic analysis is not.
For closely related, recently diverged taxa, a multi-locus genome-wide survey is likely the most comprehensive approach currently available for phylogenetic inference.
PMCID: PMC2770059  PMID: 19811651
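Bootstrap support, mentioned above, comes from resampling characters (here, AFLP loci) with replacement and asking how often a grouping reappears. The sketch below is a simplified proxy under stated assumptions: instead of rebuilding a full tree per replicate, it records how often one taxon's nearest neighbour (by mismatch count over the resampled loci) is a given partner. The taxa and 0/1 band profiles in the example are hypothetical.

```python
import random


def mismatches(a, b, cols):
    """Number of resampled loci at which two 0/1 band profiles differ."""
    return sum(a[c] != b[c] for c in cols)


def bootstrap_pair_support(profiles, taxon, partner, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which `partner` is the nearest
    neighbour of `taxon`. Ties go to the taxon listed first in `profiles`."""
    rng = random.Random(seed)
    n_loci = len(next(iter(profiles.values())))
    others = [t for t in profiles if t != taxon]
    hits = 0
    for _ in range(n_boot):
        cols = [rng.randrange(n_loci) for _ in range(n_loci)]  # resample loci
        nearest = min(others, key=lambda t: mismatches(profiles[taxon], profiles[t], cols))
        if nearest == partner:
            hits += 1
    return hits / n_boot


a = [i % 2 for i in range(20)]
profiles = {
    "A": a,
    "B": list(a),                                          # identical to A
    "C": [1 - v if i < 10 else v for i, v in enumerate(a)],  # differs at 10 loci
    "D": [1 - v for v in a],                               # differs everywhere
}
print(bootstrap_pair_support(profiles, "A", "B"))  # 1.0
```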
22.  Assessing Differential Expression in Two-Color Microarrays: A Resampling-Based Empirical Bayes Approach 
PLoS ONE  2013;8(11):e80099.
Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim to control both Type I and Type II error rates; however, real microarray data do not always satisfy their distributional assumptions. Smyth's widely used parametric method, for example, does not adequately accommodate violations of normality, resulting in inflated Type I error rates. The Significance Analysis of Microarrays (SAM), another widely used method, is based on a permutation test and is robust to non-normally distributed data; however, its fold-change criteria are problematic and can critically alter the conclusion of a study as a result of compositional changes in the control data set used in the analysis. We propose a novel approach combining resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data but is also unaffected by the choice of fold-change threshold, since no control data set selection is needed. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across Smyth's parametric method, SAM, and the Resampling-based empirical Bayes Methods. Differences in false discovery rate control among the approaches are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates than Smyth's parametric method when data are not normally distributed.
The Resampling-based empirical Bayes Methods also offer higher statistical power than SAM when the proportion of significantly differentially expressed genes is large, for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next-generation sequencing (RNA-seq) data analysis.
PMCID: PMC3842292  PMID: 24312198
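Permutation-based approaches such as SAM estimate the false discovery rate by comparing how many genes pass a threshold in the real data with how many pass, on average, after the group labels are shuffled. The sketch below shows only that generic permutation step; it does not implement the empirical Bayes component of the proposed methods, and the Welch t statistic, the threshold, and the function names are assumptions.

```python
import random
import statistics


def welch_t(a, b):
    """Welch two-sample t statistic (group a minus group b)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def permutation_fdr(data, labels, threshold, n_perm=200, seed=0):
    """Estimate the FDR of calling genes with |t| >= threshold.

    data: one list of expression values per gene, samples in the same order.
    labels: 0/1 group label per sample. Labels are shuffled jointly for all
    genes, which preserves gene-gene correlation within each permutation.
    Returns (estimated FDR, number of genes called in the real data).
    """
    rng = random.Random(seed)

    def abs_t_all(lab):
        out = []
        for row in data:
            a = [v for v, g in zip(row, lab) if g == 0]
            b = [v for v, g in zip(row, lab) if g == 1]
            out.append(abs(welch_t(a, b)))
        return out

    n_called = sum(t >= threshold for t in abs_t_all(labels))
    if n_called == 0:
        return 0.0, 0
    shuffled = list(labels)
    null_calls = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_calls += sum(t >= threshold for t in abs_t_all(shuffled))
    # Mean number of calls under permuted labels / observed calls, capped at 1.
    return min(1.0, null_calls / n_perm / n_called), n_called
```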
23.  Conformational Changes in the Herpes Simplex Virus ICP8 DNA-Binding Protein Coincident with Assembly in Viral Replication Structures 
Journal of Virology  2003;77(13):7467-7476.
The herpes simplex virus (HSV) single-stranded DNA-binding protein, ICP8, is required for viral DNA synthesis. Before viral DNA replication, ICP8 colocalizes with other replication proteins at small punctate foci called prereplicative sites. With the onset of viral genome amplification, these proteins become redistributed into large globular replication compartments. Here we present the results of immunocytochemical and biochemical analysis of ICP8 showing that various antibodies recognize distinct forms of ICP8. Using these ICP8-specific antibodies as probes for ICP8 structure, we detected a time-dependent appearance and disappearance of ICP8 epitopes in immunoprecipitation assays. Immunofluorescence staining of ICP8 in cells infected with different HSV mutant viruses as well as cells transfected with a limited number of viral genes demonstrated that these and other antigenic changes occur coincident with ICP8 assembly at intranuclear replication structures. Genetic analysis has revealed a correlation between the ability of various ICP8 mutant proteins to form the 39S epitope and their ability to bind to DNA. These results support the hypothesis that ICP8 undergoes a conformational change upon binding to other HSV proteins and/or to DNA coincident with assembly into viral DNA replication structures.
PMCID: PMC164794  PMID: 12805446
24.  Paternal Poly (ADP-ribose) Metabolism Modulates Retention of Inheritable Sperm Histones and Early Embryonic Gene Expression 
PLoS Genetics  2014;10(5):e1004317.
To achieve the extreme nuclear condensation necessary for sperm function, most histones are replaced with protamines during spermiogenesis in mammals. Mature sperm retain only a small fraction of nucleosomes, which are, in part, enriched on gene regulatory sequences, and recent findings suggest that these retained histones provide epigenetic information that regulates expression of a subset of genes involved in embryo development after fertilization. We addressed this tantalizing hypothesis by analyzing two mouse models exhibiting abnormal histone positioning in mature sperm due to impaired poly(ADP-ribose) (PAR) metabolism during spermiogenesis and identified altered sperm histone retention in specific gene loci genome-wide using MNase digestion-based enrichment of mononucleosomal DNA. We then set out to determine the extent to which expression of these genes was altered in embryos generated with these sperm. For control sperm, most genes showed some degree of histone association, unexpectedly suggesting that histone retention in sperm genes is not an all-or-none phenomenon and that a small number of histones may remain associated with genes throughout the genome. The amount of retained histones, however, was altered in many loci when PAR metabolism was impaired. To ascertain whether sperm histone association and embryonic gene expression are linked, the transcriptome of individual 2-cell embryos derived from such sperm was determined using microarrays and RNA sequencing. Strikingly, a moderate but statistically significant portion of the genes that were differentially expressed in these embryos also showed different histone retention in the corresponding gene loci in sperm of their fathers. These findings provide new evidence for the existence of a linkage between sperm histone retention and gene expression in the embryo.
Author Summary
That not all histones are replaced by protamines in the sperm nucleus during spermiogenesis has been known for almost three decades, along with the notion that protamines do not bear any specific epigenetic information whereas histones typically carry posttranslational modifications with epigenetic regulatory functions. The enrichment of histones with distinct epigenetic modifications around transcriptional start sites, as well as unmethylated GC-rich promoter regions and exons in murine and human sperm, has recently been demonstrated by others at high resolution. The evolutionary conservation of the common principles underlying sperm histone retention provides a plausible rationale for epigenetic inheritance by nucleosomes. The present study takes a different approach towards testing the overarching hypothesis that sperm histones are linked to early embryonic gene expression: it analyzes expression of genes in 2-cell embryos originating from sperm in which the histone association of these genes was experimentally altered. The results are consistent with the aforementioned hypothesis and support the view of sperm histones as potential mediators of epigenetic inheritance through the male germ line, which could also contribute to phenotypic variation in mammals in response to environmental or dietary factors that affect sensitive chromatin-modulating pathways such as PAR metabolism.
PMCID: PMC4014456  PMID: 24810616
25.  Kaposi's Sarcoma-Associated Herpesvirus Latency in Endothelial and B Cells Activates Gamma Interferon-Inducible Protein 16-Mediated Inflammasomes 
Journal of Virology  2013;87(8):4417-4431.
Kaposi's sarcoma-associated herpesvirus (KSHV) infections of endothelial and B cells are etiologically linked with Kaposi's sarcoma (KS) and primary effusion B-cell lymphoma (PEL), respectively. KS endothelial and PEL B cells carry multiple copies of the nuclear episomal latent KSHV genome and secrete a variety of inflammatory cytokines, including interleukin-1β (IL-1β) and IL-18. The maturation of IL-1β and IL-18 depends upon active caspase-1, which is regulated by a multiprotein inflammasome complex induced by sensing of danger signals. During primary KSHV infection of endothelial cells, gamma interferon-inducible protein 16 (IFI16), acting as a nuclear pattern recognition receptor, colocalized with the KSHV genome in the nuclei and interacted with ASC and procaspase-1 to form a functional inflammasome (Kerur N et al., Cell Host Microbe 9:363-375, 2011). Here, we demonstrate that telomerase-immortalized human umbilical vein endothelial cells (TIVE) supporting stable KSHV latency (TIVE-LTC cells) and PEL (body cavity-based B-cell lymphoma 1 [BCBL-1]) cells show evidence of inflammasome activation, such as the activation of caspase-1 and cleavage of pro-IL-1β and pro-IL-18. Interaction of ASC with IFI16, but not with AIM2 or NOD-like receptor P3 (NLRP3), was detected. The KSHV latency-associated viral FLIP (vFLIP) gene induced the expression of IL-1β, IL-18, and caspase-1 mRNAs in an NF-κB-dependent manner. IFI16 and cleaved IL-1β were detected in the exosomes released from BCBL-1 cells; exosomal release could be a KSHV-mediated strategy to subvert IL-1β functions. In fluorescence in situ hybridization analyses, IFI16 colocalized with multiple copies of the KSHV genome in BCBL-1 cells. IFI16 colocalization with ASC was also detected in lung PEL sections from patients. Taken together, these findings demonstrate the constant sensing of the latent KSHV genome by IFI16-mediated innate defense and unravel a potential mechanism of inflammation induction associated with KS and PEL lesions.
PMCID: PMC3624349  PMID: 23388709

