1.  Human-specific epigenetic variation in the immunological Leukotriene B4 Receptor (LTB4R/BLT1) implicated in common inflammatory diseases 
Genome Medicine  2014;6(3):19.
Common human diseases are caused by the complex interplay of genetic susceptibility and environmental factors. Because the environment influences the epigenome, and therefore genome function, and the genome in turn has a facilitative effect on the epigenome, analysis of this level of regulation may increase our knowledge of disease pathogenesis.
In order to identify human-specific epigenetic influences, we have performed a novel genome-wide DNA methylation analysis comparing human, chimpanzee and rhesus macaque.
We find that the immunological Leukotriene B4 receptor (LTB4R, BLT1 receptor) is the most epigenetically divergent human gene in peripheral blood in comparison with other primates. This difference is due to the co-ordinated active state of human-specific hypomethylation in the promoter and human-specific increased gene body methylation. This gene is significant in innate immunity and the LTB4/LTB4R pathway is involved in the pathogenesis of the spectrum of human inflammatory diseases. This finding was confirmed by additional neutrophil-only DNA methylome and lymphoblastoid H3K4me3 chromatin comparative data. Additionally, we show through functional analysis that this receptor has increased expression and a higher response to the LTB4 ligand in human versus rhesus macaque peripheral blood mononuclear cells. Genome-wide, we also find that human species-specific differentially methylated regions (human s-DMRs) are more prevalent in CpG island shores than within the islands themselves, and within the latter are associated with the CTCF motif.
This result further emphasises the exclusive nature of the human immunological system, its divergent adaptation even from very closely related primates, and the power of comparative epigenomics to identify and understand human uniqueness.
PMCID: PMC4062055  PMID: 24598577
2.  Integrated virus-host methylome analysis in head and neck squamous cell carcinoma 
Epigenetics  2013;8(9):953-961.
One in six cancers worldwide is caused by infection and human papillomavirus (HPV) is one of the main culprits. To better understand the dynamics of HPV integration and its effect on both the viral and host methylomes, we conducted whole-genome DNA methylation analysis using MeDIP-seq of HPV+ and HPV- head and neck squamous cell carcinoma (HNSCC). We determined the viral subtype to be HPV-16 in all cases and show that HPV-16 integrates into the host genome at multiple random sites and that this process predominantly involves the transcriptional repressor gene (E2) in the viral genome. Comparative analysis identified 453 (FDR ≤ 0.01) differentially methylated regions (DMRs) in the HPV+ host methylome. Bioinformatics characterization of these DMRs confirmed the previously reported cadherin genes to be affected but also revealed new targets for HPV-mediated methylation changes at regions not covered by array-based platforms, including the recently identified super-enhancers.
PMCID: PMC3883772  PMID: 23867721
human papillomavirus (HPV); head and neck squamous cell carcinoma (HNSCC); DNA methylation; methylome; epigenome
3.  Human-specific CpG “beacons” identify loci associated with human-specific traits and disease 
Epigenetics  2012;7(10):1188-1199.
Regulatory change has long been hypothesized to drive the delineation of the human phenotype from other closely related primates. Here we provide evidence that CpG dinucleotides play a special role in this process. CpGs enable epigenome variability via DNA methylation, and this epigenetic mark functions as a regulatory mechanism. Therefore, species-specific CpGs may influence species-specific regulation. We report non-polymorphic species-specific CpG dinucleotides (termed “CpG beacons”) as a distinct genomic feature associated with CpG island (CGI) evolution, human traits and disease. Using an inter-primate comparison, we identified 21 extreme CpG beacon clusters (≥ 20/kb peaks, empirical p < 1.0 × 10⁻³) in humans, which include associations with four monogenic developmental and neurological disease related genes (Benjamini-Hochberg corrected p = 6.03 × 10⁻³). We also demonstrate that beacon-mediated CpG density gain in CGIs correlates with reduced methylation in these species in orthologous CGIs over time, via human, chimpanzee and macaque MeDIP-seq. Therefore, mapping into both the genomic and epigenomic space, the identified CpG beacon clusters define points of intersection where a substantial two-way interaction between genetic sequence and epigenetic state has occurred. Taken together, our data support a model for CpG beacons to contribute to CGI evolution from genesis to tissue-specific to constitutively active CGIs.
PMCID: PMC3469460  PMID: 22968434
epigenetics; epigenomics; CpG islands; gene regulation; evolution; human disease
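The cluster criterion quoted in the abstract above (at least 20 beacons per kb) can be illustrated with a simple sliding-window count. This is only a sketch: the function name `dense_windows`, the choice to start a window at each beacon position, and the toy coordinates are assumptions for illustration. It is not the authors' pipeline and it omits the empirical p-value step.

```python
from bisect import bisect_left

def dense_windows(positions, window=1000, min_count=20):
    """Return (start, count) for every window that begins at a beacon
    position and holds at least `min_count` beacons within `window` bp."""
    pos = sorted(positions)
    hits = []
    for i, start in enumerate(pos):
        # Index of the first beacon at or beyond the end of the window.
        end = bisect_left(pos, start + window)
        count = end - i
        if count >= min_count:
            hits.append((start, count))
    return hits

# Hypothetical coordinates: 25 beacons packed into 500 bp, plus sparse background.
cluster = [10_000 + 20 * k for k in range(25)]
background = [50_000, 80_000, 120_000]
hits = dense_windows(cluster + background)
print(hits[0])  # densest window, anchored at the first beacon of the cluster
```

Overlapping windows that pass the threshold would normally be merged into one cluster; that step is left out to keep the sketch short.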
4.  AutoMeDIP-seq: A high-throughput, whole genome, DNA methylation assay 
Methods (San Diego, Calif.)  2010;52(3-13):223-231.
DNA methylation is an epigenetic mark linking DNA sequence and transcription regulation, and therefore plays an important role in phenotypic plasticity. The ideal whole genome methylation (methylome) assay should be accurate, affordable, high-throughput and agnostic with respect to genomic features. To this end, the methylated DNA immunoprecipitation (MeDIP) assay provides a good balance of these criteria. In this Methods paper, we present AutoMeDIP-seq, a technique that combines an automated MeDIP protocol with library preparation steps for subsequent second-generation sequencing. We assessed recovery of DNA sequences covering a range of CpG densities using in vitro methylated λ-DNA fragments (and their unmethylated counterparts) spiked-in against a background of human genomic DNA. We show that AutoMeDIP is more reliable than manual protocols, shows a linear recovery profile of fragments related to CpG density (R² = 0.86), and that it is highly specific (>99%). AutoMeDIP-seq offers a competitive approach to high-throughput methylome analysis of medium to large cohorts.
PMCID: PMC2977854  PMID: 20385236
DNA methylation; Automation; Whole genome; High-throughput sequencing; MeDIP
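The linear recovery profile reported in the abstract above relates fragment recovery to CpG density and summarises the fit with R². A minimal sketch of both quantities follows, assuming CpG content is counted on one strand and that recovery is regressed on CpG count by ordinary least squares. The function names and the toy recovery values are hypothetical and are not taken from the paper.

```python
def cpg_count(seq):
    """Number of CpG dinucleotides on the given strand."""
    return seq.upper().count("CG")

def r_squared(x, y):
    """Coefficient of determination for a simple linear fit of y on x
    (equal to the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical spike-in fragments and made-up recovery fractions.
fragments = ["ATATATAT", "ACGTATAT", "ACGTCGAT", "ACGTCGCG"]
recovery = [0.02, 0.11, 0.19, 0.33]
density = [cpg_count(f) for f in fragments]
print(density, round(r_squared(density, recovery), 3))
```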
5.  A Three-Stage Genome-Wide Association Study of General Cognitive Ability: Hunting the Small Effects 
Behavior Genetics  2010;40(6):759-767.
Childhood general cognitive ability (g) is important for a wide range of outcomes in later life, from school achievement to occupational success and life expectancy. Large-scale association studies will be essential in the quest to identify variants that make up the substantial genetic component implicated by quantitative genetic studies. We conducted a three-stage genome-wide association study for general cognitive ability using over 350,000 single nucleotide polymorphisms (SNPs) in the quantitative extremes of a population sample of 7,900 7-year-old children from the UK Twins Early Development Study. Using two DNA pooling stages to enrich true positives, each of around 1,000 children selected from the extremes of the distribution, and a third individual genotyping stage of over 3,000 children to test for quantitative associations across the normal range, we aimed to home in on genes of small effect. Genome-wide results suggested that our approach was successful in enriching true associations and 28 SNPs were taken forward to individual genotyping in an unselected population sample. However, although we found an enrichment of low P values and identified nine SNPs nominally associated with g (P < 0.05) that show interesting characteristics for follow-up, further replication will be necessary to meet rigorous standards of association. These replications may take advantage of SNP sets to overcome limitations of statistical power. Despite our large sample size and three-stage design, the genes associated with childhood g remain tantalizingly beyond our current reach, providing further evidence for the small effect sizes of individual loci. Larger samples, denser arrays and multiple replications will be necessary in the hunt for the genetic variants that influence human cognitive ability.
Electronic supplementary material
The online version of this article (doi:10.1007/s10519-010-9350-4) contains supplementary material, which is available to authorized users.
PMCID: PMC2992848  PMID: 20306291
Genetics; Genome-wide association; General cognitive ability; Intelligence; Population sample; Middle childhood
6.  A Genome-Wide Association Study of Social and Non-Social Autistic-Like Traits in the General Population Using Pooled DNA, 500 K SNP Microarrays and Both Community and Diagnosed Autism Replication Samples 
Behavior Genetics  2009;40(1):31-45.
Two separate genome-wide association studies were conducted to identify single nucleotide polymorphisms (SNPs) associated with social and non-social autistic-like traits. We predicted that we would find SNPs associated with social and non-social autistic-like traits and that different SNPs would be associated with the social and the non-social traits. In Stage 1, each study screened for allele frequency differences in ~430,000 autosomal SNPs using pooled DNA on microarrays in high-scoring versus low-scoring boys from a general population sample (N = ~400/group). In Stage 2, 22 and 20 SNPs in the social and non-social studies, respectively, were tested for QTL association by individually genotyping an independent community sample of 1,400 boys. One SNP (rs11894053) was nominally associated (P < .05, uncorrected for multiple testing) with social autistic-like traits. When the sample was increased by adding females, 2 additional SNPs were nominally significant (P < .05). These 3 SNPs, however, showed no significant association in transmission disequilibrium analyses of diagnosed ASD families.
PMCID: PMC2797846  PMID: 20012890
Autistic traits; Genome-wide association; Autism; Microarrays; Heritability; Pooling
7.  The Nature of Nurture: A Genomewide Association Scan for Family Chaos 
Behavior Genetics  2008;38(4):361-371.
Widely used measures of the environment, especially the family environment of children, show genetic influence in dozens of twin and adoption studies. This phenomenon is known as gene-environment correlation, in which genetically driven influences of individuals affect their environments. We conducted the first genome-wide association (GWA) analysis of an environmental measure. We used a measure called CHAOS which assesses ‘environmental confusion’ in the home, a measure that is more strongly associated with cognitive development in childhood than any other environmental measure. CHAOS was assessed by parental report when the children were 3 years old and again when the children were 4 years old; a composite CHAOS measure was constructed across the 2 years. We screened 490,041 autosomal single-nucleotide polymorphisms (SNPs) in a two-stage design in which children in low chaos families (N = 469) versus high chaos families (N = 369) from 3,000 families of 4-year-old twins were screened in Stage 1 using pooled DNA. In Stage 2, following SNP quality control procedures, 41 nominated SNPs were tested for association with family chaos by individually genotyping an independent representative sample of 3,529. Despite having 99% power to detect associations that account for more than 0.5% of the variance, none of the 41 nominated SNPs met conservative criteria for replication. Similar to GWA analyses of other complex traits, it is likely that most of the heritable variation in environmental measures such as family chaos is due to many genes of very small effect size.
PMCID: PMC2480594  PMID: 18360741
GE correlation; Family chaos; Genomewide association; Microarrays; Community sample
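The power statement in the abstract above can be explored with a normal approximation for a one-degree-of-freedom association test. The sketch assumes a two-sided significance level of 0.05 and a non-centrality of sqrt(N * r² / (1 - r²)). The abstract does not state the threshold or method behind its power figure, so both are assumptions, as is the function name.

```python
from statistics import NormalDist

def power_for_variance_explained(n, r2, alpha=0.05):
    """Approximate power to detect a SNP explaining a fraction `r2` of the
    trait variance in `n` individuals (two-sided test, normal approximation)."""
    nd = NormalDist()
    ncp = (n * r2 / (1 - r2)) ** 0.5   # expected test statistic under H1
    z_crit = nd.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Probability that the statistic falls in either rejection tail.
    return nd.cdf(ncp - z_crit) + nd.cdf(-ncp - z_crit)

# Stage 2 sample size and effect size as quoted in the abstract.
power = power_for_variance_explained(n=3529, r2=0.005)
print(round(power, 3))
```

Under these assumptions the result lands close to the 99% quoted in the abstract. A stricter alpha, for example one adjusted for testing 41 SNPs, would lower it.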
8.  No evidence for association between BMI and 10 candidate genes at ages 4, 7 and 10 in a large UK sample of twins 
BMC Medical Genetics  2008;9:12.
Over the last decade, associations between Body Mass Index (BMI) and a variety of candidate genes have been reported, but samples have almost all been limited to adults. The purpose of the present study was to test the developmental origins of some of these associations in a large longitudinal sample of children.
For 10 single-nucleotide polymorphisms (SNPs) in candidate genes reported to be associated with BMI in adults, we examined associations with BMI in a sample of 5000 children (2500 twin pairs) with BMI data at 4, 7 and 10 years. Association analyses were performed using the Quantitative Transmission Disequilibrium Test and we corrected for multiple testing using the False Discovery Rate.
Despite having 80% power to detect associations that account for as little as 0.2% of the variance of BMI, none of the 10 SNPs were significantly associated with BMI at any age, although two SNPs showed trends in the expected direction.
The failure to replicate these ten previously reported associations, despite our large sample size, is typical of associations between candidate genes and complex traits. However, some of the reported SNP associations with BMI might emerge as we continue to follow the sample into adolescence and adulthood. This report highlights the importance of developmentally appropriate candidate genes.
PMCID: PMC2270805  PMID: 18304332
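The multiple-testing correction mentioned in the abstract above can be illustrated with the Benjamini-Hochberg step-up procedure, one common way to control the false discovery rate. The abstract does not say which FDR procedure was used, so this choice, the function name, and the toy p-values are all assumptions.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return one boolean per p-value: True where the null hypothesis is
    rejected while controlling the false discovery rate at level `q`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[i] = True
    return rejected

# Hypothetical p-values for 10 SNP tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(pvals)
print(sum(flags), "of", len(pvals), "tests pass at FDR 0.05")
```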
9.  Applicability of DNA pools on 500 K SNP microarrays for cost-effective initial screens in genomewide association studies 
BMC Genomics  2007;8:214.
Genetic influences underpinning complex traits are thought to involve multiple quantitative trait loci (QTLs) of small effect size. Detection of such QTL associations requires systematic screening of large numbers of DNA markers within large sample populations. Using pooled DNA on SNP microarrays to screen for allelic frequency differences between groups such as cases and controls (called SNP Microarray and Pooling, or SNP-MaP) has been validated as an efficient solution on both 10 k and 100 k platforms. We demonstrate that this approach can be effectively applied to the truly genomewide Affymetrix GeneChip® Mapping 500 K Array.
In comparisons between five independent DNA pools (N ~200 per pool) on separate Affymetrix GeneChip® Mapping 500 K Array sets, we show that, for SNPs with minor allele frequencies > 0.05, the reliability of the rank order of estimated allele frequencies, assessed as the average correlation between allele frequency estimates across the DNA pools, was 0.948 (average mean difference across the five pools = 0.069). Similarly, validity of the SNP-MaP approach was demonstrated by a rank-order correlation of 0.937 (average mean difference = 0.095) between the average DNA pool allele frequency estimates and the allele frequencies of an independent (CEPH) sample of 60 unrelated individually genotyped subjects.
We conclude that SNP-MaP can be extended for use on the Affymetrix GeneChip® Mapping 500 K Array, providing a cost-effective, reliable and valid initial screen of 500 K SNP microarrays in genomewide association scans.
PMCID: PMC1925094  PMID: 17610740
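The reliability measure described in the abstract above is an average correlation of allele frequency estimates across DNA pools, with an emphasis on rank order. A minimal sketch, assuming Spearman's rank correlation averaged over all pool pairs, is given here. The helper names and the toy estimates are hypothetical, and the paper may have computed the average differently.

```python
from itertools import combinations

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def mean_pairwise_spearman(pools):
    pairs = list(combinations(pools, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)

# Hypothetical allele frequency estimates for 5 SNPs in 3 pools.
pools = [
    [0.10, 0.35, 0.50, 0.72, 0.90],
    [0.12, 0.30, 0.55, 0.70, 0.88],
    [0.08, 0.40, 0.38, 0.75, 0.93],
]
print(round(mean_pairwise_spearman(pools), 3))
```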
10.  Genotyping pooled DNA using 100K SNP microarrays: a step towards genomewide association scans 
Nucleic Acids Research  2006;34(4):e28.
The identification of quantitative trait loci (QTLs) of small effect size that underlie complex traits poses a particular challenge for geneticists due to the large sample sizes and large numbers of genetic markers required for genomewide association scans. An efficient solution for screening purposes is to combine single nucleotide polymorphism (SNP) microarrays and DNA pooling (SNP-MaP), an approach that has been shown to be valid, reliable and accurate in deriving relative allele frequency estimates from pooled DNA for groups such as cases and controls for 10K SNP microarrays. However, in order to conduct a genomewide association study many more SNP markers are needed. To this end, we assessed the validity and reliability of the SNP-MaP method using the Affymetrix GeneChip® Mapping 100K Array set. Interpretable results emerged for 95% of the SNPs (nearly 110 000 SNPs). We found that SNP-MaP allele frequency estimates correlated 0.939 with allele frequencies for 97 605 SNPs that were genotyped individually in an independent population; the correlation was 0.971 for 26 SNPs that were genotyped individually for the 1028 individuals used to construct the DNA pools. We conclude that extending the SNP-MaP method to the Affymetrix GeneChip® Mapping 100K Array set provides a useful screen of >100 000 SNP markers for QTL association scans.
PMCID: PMC1368655  PMID: 16478714
11.  Genotyping DNA pools on microarrays: Tackling the QTL problem of large samples and large numbers of SNPs 
BMC Genomics  2005;6:52.
Quantitative trait locus (QTL) theory predicts that genetic influence on complex traits involves multiple genes of small effect size. To detect QTL associations of small effect size, large samples and systematic screens of thousands of DNA markers are required. An efficient solution is to genotype case and control DNA pools using SNP microarrays. We demonstrate that this is practical using DNA pools of 100 individuals.
Using standard microarray protocols for the Affymetrix GeneChip® Mapping 10 K Array Xba 131, we show that relative allele signal (RAS) values provide a quantitative index of allele frequencies in pooled DNA that correlate 0.986 with allele frequencies for 104 SNPs that were genotyped individually for 100 individuals. The sensitivity of the assay was demonstrated empirically in a spiking experiment in which 15% and 20% of one individual's DNA was added to a DNA pool.
We conclude that this approach, which we call SNP-MaP (SNP microarrays and pooling), is rapid, cost effective and promises to be a valuable initial screening method in the hunt for QTLs.
PMCID: PMC1079828  PMID: 15811185
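The relative allele signal (RAS) described in the abstract above is a per-SNP index of allele frequency in a pool. The abstract does not give its formula. The sketch below assumes the simple ratio A / (A + B) of the two allele-specific signal intensities and compares it with the frequency obtained from individual genotypes. The function names and numbers are hypothetical.

```python
def relative_allele_signal(signal_a, signal_b):
    """Assumed form of RAS: the share of total probe signal from allele A."""
    total = signal_a + signal_b
    if total <= 0:
        raise ValueError("total signal must be positive")
    return signal_a / total

def genotype_frequency(a_allele_counts):
    """Allele-A frequency from individual genotypes coded as 0, 1 or 2
    copies of allele A per person."""
    return sum(a_allele_counts) / (2 * len(a_allele_counts))

# Hypothetical pooled intensities and individual genotypes for one SNP.
ras = relative_allele_signal(signal_a=300.0, signal_b=100.0)
freq = genotype_frequency([2, 2, 1, 1, 2, 1, 2, 1])
print(ras, freq)
```

Validation of the kind reported in the abstract would then correlate the pooled estimates with the individually genotyped frequencies across many SNPs.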
12.  A central resource for accurate allele frequency estimation from pooled DNA genotyped on DNA microarrays 
Nucleic Acids Research  2005;33(3):e25.
Analysing pooled DNA on microarrays is an efficient way to genotype hundreds of individuals for thousands of markers for genome-wide association. Although direct comparison of case and control fluorescence scores is possible, correction for differential hybridization of alleles is important, particularly for rare single nucleotide polymorphisms. Such correction relies on heterozygous fluorescence scores and requires the genotyping of hundreds of individuals to obtain sufficient estimates of the correction factor, completely negating any benefit gained by pooling samples. We explore the effect of differential hybridization on test statistics and provide a solution to this problem in the form of a central resource for the accumulation of heterozygous fluorescence scores, allowing accurate allele frequency estimation at no extra cost.
PMCID: PMC549427  PMID: 15701753
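The correction for differential hybridization discussed in the abstract above is often written with a factor k, the mean ratio of allele A to allele B signal in heterozygous individuals, giving a corrected frequency of A / (A + k * B). That form is an assumption here, since the abstract does not spell out the formula. The function names and intensities are hypothetical.

```python
def correction_factor(het_signals):
    """Mean A/B signal ratio across heterozygotes; it would equal 1.0 if both
    alleles hybridized equally well."""
    ratios = [a / b for a, b in het_signals]
    return sum(ratios) / len(ratios)

def corrected_frequency(pool_a, pool_b, k=1.0):
    """Allele-A frequency in a pool, scaling the B signal by k to offset
    unequal hybridization. With k = 1.0 this is the uncorrected estimate."""
    return pool_a / (pool_a + k * pool_b)

# Hypothetical heterozygote intensities (A, B) and pooled intensities.
k = correction_factor([(150.0, 100.0), (100.0, 100.0)])
raw = corrected_frequency(250.0, 200.0)
adjusted = corrected_frequency(250.0, 200.0, k=k)
print(k, round(raw, 3), adjusted)
```

A shared repository of heterozygote scores, as the abstract proposes, would supply k for each SNP without each study having to genotype individuals.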

