Results 1-12 (12)
author:("luck, marges")
1.  Enhancer Evolution across 20 Mammalian Species 
Cell  2015;160(3):554-566.
The mammalian radiation has corresponded with rapid changes in noncoding regions of the genome, but we lack a comprehensive understanding of regulatory evolution in mammals. Here, we track the evolution of promoters and enhancers active in liver across 20 mammalian species from six diverse orders by profiling genomic enrichment of H3K27 acetylation and H3K4 trimethylation. We report that rapid evolution of enhancers is a universal feature of mammalian genomes. Most of the recently evolved enhancers arise from ancestral DNA exaptation, rather than lineage-specific expansions of repeat elements. In contrast, almost all liver promoters are partially or fully conserved across these species. Our data further reveal that recently evolved enhancers can be associated with genes under positive selection, demonstrating the power of this approach for annotating regulatory adaptations in genomic sequences. These results provide important insight into the functional genetics underpinning mammalian regulatory evolution.
• Rapid enhancer and slow promoter evolution across genomes of 20 mammalian species
• Enhancers are rarely conserved across these mammals
• Recently evolved enhancers dominate mammalian regulatory landscapes
• Unbiased mapping links candidate enhancers with lineage-specific positive selection
Comparative functional genomic analysis in 20 mammalian species reveals distinct features for the evolution of enhancers, in comparison to those of promoters, across 180 million years.
PMCID: PMC4313353  PMID: 25635462
2.  Multi-species, multi-transcription factor binding highlights conserved control of tissue-specific biological pathways 
eLife  2014;3:e02626.
As exome sequencing gives way to genome sequencing, the need to interpret the function of regulatory DNA becomes increasingly important. To test whether evolutionary conservation of cis-regulatory modules (CRMs) gives insight into human gene regulation, we determined transcription factor (TF) binding locations of four liver-essential TFs in liver tissue from human, macaque, mouse, rat, and dog. Approximately two-thirds of the TF-bound regions fell into CRMs. Less than half of the human CRMs were found as a CRM in the orthologous region of a second species. Shared CRMs were associated with liver pathways and disease loci identified by genome-wide association studies. Recurrent rare human disease-causing mutations at the promoters of several blood coagulation and lipid metabolism genes were also identified within CRMs shared in multiple species. This suggests that multi-species analyses of experimentally determined combinatorial TF binding will help identify genomic regions critical for tissue-specific gene control.
eLife digest
Stretches of DNA called cis-regulatory modules (or CRMs for short) could help researchers to identify the regions of DNA that are most important for controlling genes. CRMs are regions where multiple transcription factors—proteins that control when and how genes are expressed—bind to DNA. As important biological pathways are often regulated by more than one transcription factor, CRMs are therefore a good target when looking for DNA regions that, if mutated, are likely to cause disease.
If a stretch of DNA performs an important role, it is often conserved throughout evolution. This is often observed for genes that make proteins. Indeed, DNA regions that specify critical amino acids that make up proteins are often conserved across distantly related species. However, unlike the changes made to the amino acid encoding parts of genes, it is currently a challenge to predict which changes in the rest of the genome will affect gene expression.
One reason for this challenge is that transcription factor binding sites are rapidly evolving. This rapid evolution means that strictly comparing DNA sequences between species may fail to identify where transcription factors like to bind in the genome. Numerous experimental efforts have therefore been made to map these sites. These have revealed that there are a huge number of regions in the human genome that can bind transcription factors: hundreds of thousands of sites, far more than there are genes. For this reason, there is a great interest in revealing which of these regulatory regions are critical for maintaining normal levels and timings of gene expression.
Ballester et al. compared the binding sites of four transcription factors responsible for regulating liver function in humans, macaques, mice, rats, and dogs. About two-thirds of these binding sites were found in CRMs. Less than half of the CRMs in humans were also CRMs in another species—but Ballester et al. found that these shared CRMs are predominantly in charge of regulating the essential biological pathways that allow the liver to function correctly. In addition, Ballester et al. identified several examples of disease-causing DNA mutations in shared CRMs that affected the expression of genes that make up pathways such as the blood clotting cascade. Genome-wide association studies also uncovered common variants for liver-related traits that were enriched for the CRMs found in more than one species, further supporting their importance.
As transcription factors work in different ways in different tissues, further studies are now required to expand these observations to organs other than the liver. Future work is also needed to investigate the function of thousands of conserved CRMs whose role in liver gene regulation remains unknown.
PMCID: PMC4359374  PMID: 25279814
cis regulatory module; transcription factors; molecular evolution; macaque; dog; liver; human; mouse; rat; other
3.  AHT-ChIP-seq: a completely automated robotic protocol for high-throughput chromatin immunoprecipitation 
Genome Biology  2013;14(11):R124.
ChIP-seq is an established, manually performed method for identifying DNA-protein interactions genome-wide. Here, we describe a protocol for automated high-throughput (AHT) ChIP-seq. To demonstrate the quality of data obtained using AHT-ChIP-seq, we applied it to five proteins in mouse livers using a single 96-well plate, demonstrating an extremely high degree of qualitative and quantitative reproducibility among biological and technical replicates. We estimated the optimum and minimum recommended cell numbers required to perform AHT-ChIP-seq by running an additional plate using HepG2 and MCF7 cells. With this protocol, commercially available robotics can perform four hundred experiments in five days.
PMCID: PMC4053851  PMID: 24200198
4.  Global Gene Expression Profiling Reveals SPINK1 as a Potential Hepatocellular Carcinoma Marker 
PLoS ONE  2013;8(3):e59459.
Liver cirrhosis is the most important risk factor for hepatocellular carcinoma (HCC) but the role of liver disease aetiology in cancer development remains under-explored. We investigated global gene expression profiles from HCC arising in different liver diseases to test whether HCC development is driven by expression of common or different genes, which could provide new diagnostic markers or therapeutic targets.
Methodology and Principal Findings
Global gene expression profiling was performed for 4 normal (control) livers as well as 8 background liver and 7 HCC from 3 patients with hereditary haemochromatosis (HH) undergoing surgery. In order to investigate different disease phenotypes causing HCC, the data were compared with public microarray repositories for gene expression in normal liver, hepatitis C virus (HCV) cirrhosis, HCV-related HCC (HCV-HCC), hepatitis B virus (HBV) cirrhosis and HBV-related HCC (HBV-HCC). Principal component analysis and differential gene expression analysis were carried out using R Bioconductor. Liver disease-specific and shared gene lists were created and genes identified as highly expressed in hereditary haemochromatosis HCC (HH-HCC) were validated using quantitative RT-PCR. Selected genes were investigated further using immunohistochemistry in 86 HCC arising in liver disorders with varied aetiology. Using a 2-fold cut-off, 9 genes were highly expressed in all HCC, 11 in HH-HCC, 270 in HBV-HCC and 9 in HCV-HCC. Six genes identified by microarray as highly expressed in HH-HCC were confirmed by RT qPCR. Serine peptidase inhibitor, Kazal type 1 (SPINK1) mRNA was very highly expressed in HH-HCC (median fold change 2291, p = 0.0072) and was detected by immunohistochemistry in 91% of HH-HCC, 0% of HH-related cirrhotic or dysplastic nodules and 79% of mixed-aetiology HCC.
HCC, arising from diverse backgrounds, uniformly over-express a small set of genes. SPINK1, a secretory trypsin inhibitor, demonstrated potential as a diagnostic HCC marker and should be evaluated in future studies.
PMCID: PMC3601070  PMID: 23527199
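The 2-fold cut-off mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they used R Bioconductor, and their normalization and statistics are not reproduced here). The function names, the use of a median ratio on a linear scale, and the expression values are all assumptions made for illustration.

```python
from statistics import median


def fold_change(tumour, control):
    """Ratio of median tumour expression to median control expression (linear scale)."""
    return median(tumour) / median(control)


def highly_expressed(expression, cutoff=2.0):
    """Return, in sorted order, the genes whose fold change meets the cut-off."""
    return sorted(
        gene
        for gene, (tumour, control) in expression.items()
        if fold_change(tumour, control) >= cutoff
    )


# Hypothetical expression values, for illustration only (not data from the study).
example = {
    "GENE_A": ([40.0, 44.0, 48.0], [10.0, 11.0, 12.0]),  # 4-fold
    "GENE_B": ([10.0, 12.0, 14.0], [9.0, 10.0, 11.0]),   # 1.2-fold
    "GENE_C": ([20.0, 20.0, 20.0], [10.0, 10.0, 10.0]),  # exactly 2-fold
}

print(highly_expressed(example))  # ['GENE_A', 'GENE_C']
```

A gene sitting exactly at the threshold is kept here because the comparison is inclusive; whether the study treated the boundary that way is not stated in the abstract.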
5.  Latent Regulatory Potential of Human-Specific Repetitive Elements 
Molecular Cell  2013;49(2):262-272.
At least half of the human genome is derived from repetitive elements, which are often lineage specific and silenced by a variety of genetic and epigenetic mechanisms. Using a transchromosomic mouse strain that transmits an almost complete single copy of human chromosome 21 via the female germline, we show that a heterologous regulatory environment can transcriptionally activate transposon-derived human regulatory regions. In the mouse nucleus, hundreds of locations on human chromosome 21 newly associate with activating histone modifications in both somatic and germline tissues, and influence the gene expression of nearby transcripts. These regions are enriched with primate and human lineage-specific transposable elements, and their activation corresponds to changes in DNA methylation at CpG dinucleotides. This study reveals the latent regulatory potential of the repetitive human genome and illustrates the species specificity of mechanisms that control it.
► A mouse carrying human chromosome 21 fails to repress primate-specific repeats
► The lack of repression was revealed by H3K4me3 and transcription factor binding
► Activation corresponded to a decrease in CpG methylation
► Primate-specific repeats activated in human testes were activated in the Tc1 mouse
PMCID: PMC3560060  PMID: 23246434
6.  MageComet—web application for harmonizing existing large-scale experiment descriptions 
Bioinformatics  2012;28(10):1402-1403.
Motivation: Meta-analysis of large gene expression datasets obtained from public repositories requires consistently annotated data. Curation of such experiments, however, is an expert activity that involves repetitive manipulation of text. Few tools exist for automated curation, which creates a bottleneck in the analysis pipeline.
Results: We present MageComet, a web application for biologists and annotators that facilitates the re-annotation of gene expression experiments in MAGE-TAB format. It incorporates data mining, automatic annotation, use of ontologies and data validation to improve the consistency and quality of experimental meta-data from the ArrayExpress Repository.
Availability and implementation: Source and tutorials for MageComet are openly available at under the GNU GPL v3 licenses. An implementation can be found at
PMCID: PMC3348561  PMID: 22474121
7.  Assessing Affymetrix GeneChip microarray quality 
BMC Bioinformatics  2011;12:137.
Microarray technology has become a widely used tool in the biological sciences. Over the past decade, the number of users has grown exponentially, and with the number of applications and secondary data analyses rapidly increasing, we expect this rate to continue. Various initiatives such as the External RNA Control Consortium (ERCC) and the MicroArray Quality Control (MAQC) project have explored ways to provide standards for the technology. For microarrays to become generally accepted as a reliable technology, statistical methods for assessing quality will be an indispensable component; however, there remains a lack of consensus in both defining and measuring microarray quality.
We begin by providing a precise definition of microarray quality and reviewing existing Affymetrix GeneChip quality metrics in light of this definition. We show that the best-performing metrics require multiple arrays to be assessed simultaneously. While such multi-array quality metrics are adequate for bench science, as microarrays begin to be used in clinical settings, single-array quality metrics will be indispensable. To this end, we define a single-array version of one of the best multi-array quality metrics and show that this metric performs as well as the best multi-array metrics. We then use this new quality metric to assess the quality of microarray data available via the Gene Expression Omnibus (GEO) using more than 22,000 Affymetrix HGU133a and HGU133plus2 arrays from 809 studies.
We find that approximately 10 percent of these publicly available arrays are of poor quality. Moreover, the quality of microarray measurements varies greatly from hybridization to hybridization, study to study, and lab to lab, with some experiments producing unusable data. Many of the concepts described here are applicable to other high-throughput technologies.
PMCID: PMC3097162  PMID: 21548974
8.  ArrayExpress update—an archive of microarray and high-throughput sequencing-based functional genomics experiments 
Nucleic Acids Research  2010;39(Database issue):D1002-D1004.
The ArrayExpress Archive is one of the three international public repositories of functional genomics data supporting publications. It includes data generated by sequencing or array-based technologies. Data are submitted by users and imported directly from the NCBI Gene Expression Omnibus. The ArrayExpress Archive is closely integrated with the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries provided via ontology-enabled interfaces include queries based on technology and sample attributes such as disease, cell types and anatomy.
PMCID: PMC3013660  PMID: 21071405
9.  A global map of human gene expression 
Nature Biotechnology  2010;28(4):322-324.
PMCID: PMC2974261  PMID: 20379172
10.  Importing ArrayExpress datasets into R/Bioconductor 
Bioinformatics  2009;25(16):2092-2094.
Summary: ArrayExpress is one of the largest public repositories of microarray datasets. R/Bioconductor provides a comprehensive suite of microarray analysis and integrative bioinformatics software. However, easy ways for importing datasets from ArrayExpress into R/Bioconductor have been lacking. Here, we present such a tool that is suitable for both interactive and automated use.
Availability: The ArrayExpress package is available from the Bioconductor project at A users guide and examples are provided with the package.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2723004  PMID: 19505942
11.  ArrayExpress update—from an archive of functional genomics experiments to the atlas of gene expression 
Nucleic Acids Research  2008;37(Database issue):D868-D872.
ArrayExpress consists of three components: the ArrayExpress Repository—a public archive of functional genomics experiments and supporting data, the ArrayExpress Warehouse—a database of gene expression profiles and other bio-measurements and the ArrayExpress Atlas—a new summary database and meta-analytical tool of ranked gene expression across multiple experiments and different biological conditions. The Repository contains data from over 6000 experiments comprising approximately 200 000 assays, and the database doubles in size every 15 months. The majority of the data are array based, but other data types are included, most recently—ultra high-throughput sequencing transcriptomics and epigenetic data. The Warehouse and Atlas allow users to query for differentially expressed genes by gene names and properties, experimental conditions and sample properties, or a combination of both. In this update, we describe the ArrayExpress developments over the last two years.
PMCID: PMC2686529  PMID: 19015125
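The growth figure quoted in the abstract above (a doubling every 15 months) corresponds to simple exponential growth. The Python sketch below works that arithmetic through under the assumption that the rate stays constant. The function name is hypothetical, and the projected values are an extrapolation for illustration, not figures reported by the authors.

```python
def projected_size(initial, months, doubling_months=15.0):
    """Size after `months`, assuming a constant doubling time in months."""
    return initial * 2 ** (months / doubling_months)


# Starting from the roughly 6000 experiments mentioned in the abstract:
print(projected_size(6000, 15))  # 12000.0 after one doubling period
print(projected_size(6000, 30))  # 24000.0 after two doubling periods
```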
12.  MAGETabulator, a suite of tools to support the microarray data format MAGE-TAB 
Bioinformatics  2008;25(2):279-280.
Summary: The MAGE-TAB format for microarray data representation and exchange has been proposed by the microarray community to replace the more complex MAGE-ML format. We present a suite of tools to support MAGE-TAB generation and validation, conversion between existing formats for data exchange, visualization of the experiment designs encoded by MAGE-TAB documents and the mining of such documents for semantic content.
Availability: Software is available from
PMCID: PMC2638998  PMID: 19038988
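MAGE-TAB documents are tab-delimited text, so a basic structural check can be sketched with the Python standard library alone. The sketch below is not MAGETabulator and does not use its interface. The function names and the three-column sample are assumptions for illustration, and real sample-and-data (SDRF) files carry many more columns and rules than this checks.

```python
import csv
import io


def read_sdrf(text):
    """Parse tab-delimited SDRF-style text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def check_rows(text):
    """Return the line numbers of rows whose column count differs from the header."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    width = len(rows[0])
    return [i for i, row in enumerate(rows[1:], start=2) if len(row) != width]


# Hypothetical sample: the second data row is missing its data-file column.
SAMPLE = (
    "Source Name\tCharacteristics[organism]\tArray Data File\n"
    "sample1\tHomo sapiens\tsample1.CEL\n"
    "sample2\tHomo sapiens\n"
)

print(check_rows(SAMPLE))                 # [3]
print(read_sdrf(SAMPLE)[0]["Source Name"])  # sample1
```

Reporting line numbers rather than raising on the first problem lets a curator see every malformed row in one pass.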