1.  Genome-wide DNA methylation profiling in rheumatoid arthritis identifies disease-associated methylation changes that are distinct to individual T- and B-lymphocyte populations 
Epigenetics  2014;9(9):1228-1237.
Changes to the DNA methylome have been described in patients with rheumatoid arthritis (RA). In previous work, we reported genome-wide methylation differences in T-lymphocyte and B-lymphocyte populations from healthy individuals. Now, using HumanMethylation450 BeadChips to interrogate genome-wide DNA methylation, we have determined disease-associated methylation changes in blood-derived T- and B-lymphocyte populations from 12 female patients with seropositive established RA, relative to 12 matched healthy individuals. Array data were analyzed using NIMBL software and bisulfite pyrosequencing was used to validate array candidates. Genome-wide DNA methylation, determined by analysis of LINE-1 sequences, revealed higher methylation in B-lymphocytes compared with T-lymphocytes (P ≤ 0.01), which is consistent with our findings in healthy individuals. Moreover, loci-specific methylation differences that distinguished T-lymphocytes from B-lymphocytes in healthy individuals were also apparent in RA patients. However, disease-associated methylation differences were also identified in RA. In these cases, we identified 509 and 252 CpGs in RA-derived T- and B-lymphocytes, respectively, that showed significant changes in methylation compared with their cognate healthy counterparts. Moreover, this included a restricted set of 32 CpGs in T-lymphocytes and 20 CpGs in B-lymphocytes (representing 15 and 10 genes, respectively, and including two, MGMT and CCS, that were common to both cell types) that displayed more substantial changes in methylation. These changes, apparent as hyper- or hypo-methylation, were independently confirmed by pyrosequencing analysis. Validation by pyrosequencing also revealed additional sites in some candidate genes that also displayed altered methylation in RA. 
In this first study of genome-wide DNA methylation in individual T- and B-lymphocyte populations in RA patients, we report disease-associated methylation changes that are distinct to each cell type and which support a role for discrete epigenetic regulation in this disease.
PMCID: PMC4169015  PMID: 25147922
Rheumatoid arthritis; DNA methylation; T-lymphocyte; B-lymphocyte; CpG; Genome-wide; Illumina 450k array
2.  Assembly, Assessment, and Availability of De novo Generated Eukaryotic Transcriptomes 
Frontiers in Genetics  2016;6:361.
De novo assembly of a complete transcriptome without the need for a guiding reference genome is attractive, particularly where the cost and complexity of generating a eukaryote genome is prohibitive. The transcriptome should not, however, be seen as just a quick and cheap alternative to building a complete genome. Transcriptomics allows the understanding and comparison of spatial and temporal samples within an organism, and allows surveying of multiple individuals or closely related species. De novo assembly in theory allows the building of a complete transcriptome without any prior knowledge of the genome. It also allows the discovery of alternate splice forms of coding RNAs and of non-coding RNAs, which are often missed by proteomic approaches or are incompletely annotated in genome studies. The main limitation of the method is that the generation of a truly complete assembly is unlikely, so methods are required for assessing the quality and appropriateness of a generated transcriptome. Whilst no single consensus pipeline or tool is agreed as optimal, various algorithms and easy-to-use software do exist, making transcriptome generation a more common approach. With this expansion of data, questions remain about how to make these datasets fully discoverable, comparable and most useful for understanding complex biological systems.
PMCID: PMC4707302  PMID: 26793234
de novo transcriptome assembly; high-throughput sequencing; assessment; availability; annotation
3.  Transposon insertion mapping with PIMMS – Pragmatic Insertional Mutation Mapping System 
Frontiers in Genetics  2015;6:139.
The PIMMS (Pragmatic Insertional Mutation Mapping System) pipeline has been developed for simple conditionally essential genome discovery experiments in bacteria. Capable of using raw sequence data files alongside a FASTA sequence of the reference genome and GFF file, PIMMS will generate a tabulated output of each coding sequence with corresponding mapped insertions accompanied with normalized results enabling streamlined analysis. This allows for a quick assay of the genome to identify conditionally essential genes on a standard desktop computer prioritizing results for further investigation.
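As a rough illustration of the tabulation step described above, the following Python sketch counts unique insertion sites falling within each coding sequence and normalizes the count by gene length. It is not the PIMMS implementation: the function name `insertions_per_cds`, the input layout, the toy coordinates and the per-kilobase normalization are all assumptions made for this example.

```python
from bisect import bisect_left, bisect_right


def insertions_per_cds(cds_list, insertion_positions):
    """Count unique insertion sites inside each CDS and normalize per kb.

    cds_list: iterable of (gene_id, start, end), 1-based inclusive coordinates.
    insertion_positions: iterable of mapped insertion coordinates.
    Returns a list of (gene_id, length, site_count, sites_per_kb) rows.
    """
    # Collapse repeated reads at the same coordinate into one site.
    sites = sorted(set(insertion_positions))
    rows = []
    for gene_id, start, end in cds_list:
        # Binary search gives the number of sites with start <= pos <= end.
        count = bisect_right(sites, end) - bisect_left(sites, start)
        length = end - start + 1
        rows.append((gene_id, length, count, 1000.0 * count / length))
    return rows


# Hypothetical toy data, not taken from any real genome.
cds = [("geneA", 1, 1000), ("geneB", 1500, 2000)]
hits = [10, 10, 250, 999, 1200, 1750]
table = insertions_per_cds(cds, hits)
for row in table:
    print("\t".join(str(value) for value in row))
```

In a table of this kind, genes with few or no insertions relative to their length are the natural candidates for follow-up as conditionally essential.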
Availability: The PIMMS script, manual and accompanying test data is freely available at
PMCID: PMC4391243  PMID: 25914720
TnSeq; INseq; TraDIS; transposon mapping; sequencing
4.  Methylation of HOXA9 and ISL1 Predicts Patient Outcome in High-Grade Non-Invasive Bladder Cancer 
PLoS ONE  2015;10(9):e0137003.
Inappropriate DNA methylation is frequently associated with human tumour development, and in specific cases, is associated with clinical outcomes. Previous reports of DNA methylation in low/intermediate grade non-muscle invasive bladder cancer (NMIBC) have suggested that specific patterns of DNA methylation may have a role as diagnostic or prognostic biomarkers. In view of the aggressive and clinically unpredictable nature of high-grade (HG) NMIBC, and the current shortage of the preferred treatment option (Bacillus Calmette-Guérin), novel methylation analyses may similarly reveal biomarkers of disease outcome that could risk-stratify patients and guide clinical management at initial diagnosis.
Promoter-associated CpG island methylation was determined in primary tumour tissue of 36 initial presentation high-grade NMIBCs, 12 low/intermediate-grade NMIBCs and 3 normal bladder controls. The genes HOXA9, ISL1, NKX6-2, SPAG6, ZIC1 and ZNF154 were selected for investigation on the basis of previous reports and/or prognostic utility in low/intermediate-grade NMIBC. Methylation was determined by Pyrosequencing of sodium-bisulphite converted DNA, and then correlated with gene expression using RT-qPCR. Methylation was additionally correlated with tumour behaviour, including tumour recurrence and progression to muscle invasive bladder cancer or metastases.
The ISL1 gene's promoter-associated island was more frequently methylated in recurrent and progressive high-grade tumours than in their non-recurrent counterparts (60.0% vs. 18.2%, p = 0.008). ISL1 and HOXA9 showed significantly higher mean methylation in recurrent and progressive tumours compared to non-recurrent tumours (43.3% vs. 20.9%, p = 0.016 and 34.5% vs. 17.6%, p = 0.017, respectively). Concurrent ISL1/HOXA9 methylation in HG-NMIBC reliably predicted tumour recurrence and progression within one year (Positive Predictive Value 91.7%), and was associated with disease-specific mortality (DSM).
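For readers unfamiliar with the Positive Predictive Value quoted above, the short Python sketch below shows how the statistic is computed. The counts are hypothetical and were chosen only because they happen to give the same percentage; the abstract does not report the underlying numbers, and the function name is an assumption for this example.

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP): the fraction of marker-positive cases
    that actually go on to show the outcome."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("PPV is undefined when there are no positive calls")
    return true_positives / total


# Hypothetical counts (NOT from the study): 11 marker-positive tumours
# with the outcome and 1 marker-positive tumour without it.
ppv = positive_predictive_value(11, 1)
print(f"{ppv:.1%}")  # prints 91.7%
```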
In this study we report methylation differences and similarities between clinical sub-types of high-grade NMIBC. We report the potential ability of methylation biomarkers, at initial diagnosis, to predict tumour recurrence and progression within one year of diagnosis. We found that specific biomarkers reliably predict disease outcome and therefore may help guide patient treatment despite the unpredictable clinical course and heterogeneity of high-grade NMIBC. Further investigation is required, including validation in a larger patient cohort, to confirm the clinical utility of methylation biomarkers in high-grade NMIBC.
PMCID: PMC4558003  PMID: 26332997
5.  The A2 gene of alcelaphine herpesvirus-1 is a transcriptional regulator affecting cytotoxicity in virus-infected T cells but is not required for malignant catarrhal fever induction in rabbits 
Virus research  2014;188:68-80.
Alcelaphine herpesvirus-1 (AlHV-1) causes malignant catarrhal fever (MCF). The A2 gene of AlHV-1 is a member of the bZIP transcription factor family. We wished to determine whether A2 is a virulence gene or not and whether it is involved in pathogenesis by interference with host transcription pathways. An A2 gene knockout (A2ΔAlHV-1) virus, revertant (A2revAlHV-1) virus, and wild-type virus (wtAlHV-1) were used to infect three groups of rabbits. A2ΔAlHV-1-infected rabbits succumbed to MCF, albeit with a delayed onset compared to the control groups, so A2 is not a critical virulence factor. Differential gene transcription analysis by RNAseq and qRT-PCR validation of a selection of these was performed in infected large granular lymphocyte (LGL) T cells obtained in culture from the MCF-affected animals. A2 was involved in the transcriptional regulation of immunological, cell cycle and apoptosis pathways. In particular, there was a bias towards γδ T cell receptor (TCR) expression and downregulation of αβ TCR. TCR signalling, apoptosis, cell cycle, IFN-γ and NFAT pathways were affected. Of particular interest was partial inhibition of the cytotoxicity-associated pathways involving perforin and the granzymes A and B in the A2ΔAlHV-1-infected LGLs compared to controls. In functional assays, A2ΔAlHV-1-infected LGLs were significantly less cytotoxic than wtAlHV-1- and A2revAlHV-1-infected LGLs using rabbit corneal epithelial cells (SIRC) as targets. This implies that A2 is involved in a pathway enhancing the expression of LGL cytotoxicity. This is important as virus-infected T cell cytotoxicity in vivo has been suggested as a potential mechanism of disease induction in MCF.
PMCID: PMC4441327  PMID: 24732177
Malignant catarrhal fever; Herpesvirus; bZIP protein; Transcription regulation; Cytotoxicity
6.  Multiple Groups of Endogenous Epsilon-Like Retroviruses Conserved across Primates 
Journal of Virology  2014;88(21):12464-12471.
Several types of cancer in fish are caused by retroviruses, including those responsible for major outbreaks of disease, such as walleye dermal sarcoma virus and salmon swim bladder sarcoma virus. These viruses form a phylogenetic group often described as the epsilonretrovirus genus. Epsilon-like retroviruses have become endogenous retroviruses (ERVs) on several occasions, integrating into germ line cells to become part of the host genome, and sections of fish and amphibian genomes are derived from epsilon-like retroviruses. However, epsilon-like ERVs have been identified in very few mammals. We have developed a pipeline to screen full genomes for ERVs, and using this pipeline, we have located over 800 endogenous epsilon-like ERV fragments in primate genomes. Genomes from 32 species of mammals and birds were screened, and epsilon-like ERV fragments were found in all primate and tree shrew genomes but no others. These viruses appear to have entered the genome of a common ancestor of Old and New World monkeys between 42 million and 65 million years ago. Based on these results, there is an ancient evolutionary relationship between epsilon-like retroviruses and primates. Clearly, these viruses had the potential to infect the ancestors of primates and were at some point a common pathogen in these hosts. Therefore, this result raises questions about the potential of epsilonretroviruses to infect humans and other primates and about the evolutionary history of these retroviruses.
IMPORTANCE Epsilonretroviruses are a group of retroviruses that cause several important diseases in fish. Retroviruses have the ability to become a permanent part of the DNA of their host by entering the germ line as endogenous retroviruses (ERVs), where they lose their infectivity over time but can be recognized as retroviruses for millions of years. Very few mammals are known to have epsilon-like ERVs; however, we have identified over 800 fragments of endogenous epsilon-like ERVs in the genomes of all major groups of primates, including humans. These viruses seem to have circulated and infected primate ancestors 42 to 65 million years ago. We are now interested in how these viruses have evolved and whether they have the potential to infect modern humans or other primates.
PMCID: PMC4248910  PMID: 25142585
7.  Virulence related sequences; insights provided by comparative genomics of Streptococcus uberis of differing virulence 
BMC Genomics  2015;16(1):334.
Streptococcus uberis, a Gram-positive, catalase-negative member of the family Streptococcaceae, is an important environmental pathogen responsible for a significant proportion of subclinical and clinical bovine intramammary infections. Currently, the genome of only a single reference strain (0140J) has been described. Here we present a comparative analysis of complete draft genome sequences of an additional twelve S. uberis strains.
Pan and core genome analysis revealed the core genome common to all strains to be 1,550 genes in 1,509 orthologous clusters, complemented by 115-246 accessory genes present in one or more S. uberis strains but absent in the reference strain 0140J. Most of the previously predicted virulence genes were present in the core genome of all 13 strains, but gene gain/loss was observed between the isolates in CDS associated with clustered regularly interspaced short palindromic repeats (CRISPRs), prophage and bacteriocin production. Challenge experiments confirmed strain EF20 as non-virulent: it was only able to infect in a transient manner that did not result in clinical mastitis. Comparison of the genome sequence of EF20 with the validated virulent strain 0140J identified genes associated with virulence; however, these did not relate clearly to the clinical/non-clinical status of infection.
The gain/loss of mobile genetic elements such as CRISPRs and prophage is a potential driving force for evolutionary change. This first “whole-genome” comparison of strains isolated from clinical vs non-clinical intramammary infections, including the type virulent vs non-virulent strains, did not identify simple gene gain/loss rules that readily explain, or can be confidently associated with, differences in virulence. This suggests that a more complex dynamic, not simply gene content, determines infection potential and clinical outcome.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1512-6) contains supplementary material, which is available to authorized users.
PMCID: PMC4427978  PMID: 25898893
Mastitis; Streptococcus uberis; Comparative genomics; vru; de novo assembly; CRISPRs
8.  A novel member of the let-7 microRNA family is associated with developmental transitions in filarial nematode parasites 
BMC Genomics  2015;16(1):331.
Filarial nematodes are important pathogens in the tropics transmitted to humans via the bite of blood sucking arthropod vectors. The molecular mechanisms underpinning survival and differentiation of these parasites following transmission are poorly understood. microRNAs are small non-coding RNA molecules that regulate target mRNAs and we set out to investigate whether they play a role in the infection event.
microRNAs differentially expressed during the early post-infective stages of Brugia pahangi L3 were identified by microarray analysis. One of these, bpa-miR-5364, was selected for further study as it is upregulated ~12-fold at 24 hours post-infection, is specific to clade III nematodes, and is a novel member of the let-7 family, which are known to have key developmental functions in the free-living nematode Caenorhabditis elegans. Predicted mRNA targets of bpa-miR-5364 were identified using bioinformatics and comparative genomics approaches that relied on the conservation of miR-5364 binding sites in the orthologous mRNAs of other filarial nematodes. Finally, we confirmed the interaction between bpa-miR-5364 and three of its predicted targets using a dual luciferase assay.
These data provide new insight into the molecular mechanisms underpinning the transmission of third stage larvae of filarial nematodes from vector to mammal. This study is the first to identify parasitic nematode mRNAs that are verified targets of specific microRNAs and demonstrates that post-transcriptional control of gene expression via stage-specific expression of microRNAs may be important in the success of filarial infection.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1536-y) contains supplementary material, which is available to authorized users.
PMCID: PMC4428239  PMID: 25896062
Brugia; Lymphatic filariasis; Nematodes; microRNAs
9.  A predictive model for canine dilated cardiomyopathy—a meta-analysis of Doberman Pinscher data 
PeerJ  2015;3:e842.
Dilated cardiomyopathy is a prevalent and often fatal disease in humans and dogs. Indeed, dilated cardiomyopathy is the third most common form of cardiac disease in humans, reported to affect approximately 36 individuals per 100,000 individuals. In dogs, dilated cardiomyopathy is the second most common cardiac disease and is most prevalent in the Irish Wolfhound, Doberman Pinscher and Newfoundland breeds. Dilated cardiomyopathy is characterised by ventricular chamber enlargement and systolic dysfunction, which often leads to congestive heart failure. Although multiple human loci have been implicated in the pathogenesis of dilated cardiomyopathy, the identified variants are typically associated with rare monogenic forms of dilated cardiomyopathy. The potential for multigenic interactions contributing to human dilated cardiomyopathy remains poorly understood. Consistent with this, several known human dilated cardiomyopathy loci have been excluded as common causes of canine dilated cardiomyopathy, although canine dilated cardiomyopathy resembles the human disease functionally. This suggests additional genetic factors contribute to the dilated cardiomyopathy phenotype. This study represents a meta-analysis of available canine dilated cardiomyopathy genetic datasets with the goal of determining potential multigenic interactions relating the sex chromosome genotype (XX vs. XY) with known dilated cardiomyopathy associated loci on chromosome 5 and the PDK4 gene in the incidence and progression of dilated cardiomyopathy. The results show an interaction between known canine dilated cardiomyopathy loci and an unknown X-linked locus. Our study is the first to test a multigenic contribution to dilated cardiomyopathy and suggests a genetic basis for the known sex-disparity in dilated cardiomyopathy outcomes.
PMCID: PMC4380154  PMID: 25834770
Dilated cardiomyopathy; PDK4; Canine; Multigenic; Human
10.  TGF-β superfamily members from the helminth Fasciola hepatica show intrinsic effects on viability and development 
Veterinary Research  2015;46:29.
The helminth Fasciola hepatica causes fasciolosis throughout the world, a major disease of livestock and an emerging zoonotic disease in humans. Sustainable control mechanisms such as vaccination are urgently required. To discover potential vaccine targets we undertook a genome screen to identify members of the transforming growth factor (TGF) family of proteins. Herein we describe the discovery of three ligands belonging to this superfamily and the cloning and characterisation of an activin/TGF-like molecule we term FhTLM. FhTLM has a limited expression pattern, both temporally across the parasite stages and spatially within the worm. Furthermore, a recombinant form of this protein is able to enhance the rate (or magnitude) of multiple developmental processes of the parasite, indicating a conserved role for this protein superfamily in the developmental biology of a major trematode parasite. Our study demonstrates for the first time the existence of this protein superfamily within F. hepatica and assigns a function to one of the three identified ligands. Moreover, further exploration of this superfamily may yield future targets for diagnostic or vaccination purposes due to its stage-restricted expression and functional role.
Electronic supplementary material
The online version of this article (doi:10.1186/s13567-015-0167-2) contains supplementary material, which is available to authorized users.
PMCID: PMC4354977  PMID: 25879787
11.  Identifying the Cellular Targets of Drug Action in the Central Nervous System Following Corticosteroid Therapy 
ACS Chemical Neuroscience  2013;5(1):51-63.
Corticosteroid (CS) therapy is used widely in the treatment of a range of pathologies, but can delay production of myelin, the insulating sheath around central nervous system nerve fibers. The cellular targets of CS action are not fully understood, that is, “direct” action on cells involved in myelin genesis [oligodendrocytes and their progenitors the oligodendrocyte precursor cells (OPCs)] versus “indirect” action on other neural cells. We evaluated the effects of the widely used CS dexamethasone (DEX) on purified OPCs and oligodendrocytes, employing complementary histological and transcriptional analyses. Histological assessments showed no DEX effects on OPC proliferation or oligodendrocyte genesis/maturation (key processes underpinning myelin genesis). Immunostaining and RT-PCR analyses show that both cell types express glucocorticoid receptor (GR; the target for DEX action), ruling out receptor expression as a causal factor in the lack of DEX-responsiveness. GRs function as ligand-activated transcription factors, so we simultaneously analyzed DEX-induced transcriptional responses using microarray analyses; these substantiated the histological findings, with limited gene expression changes in DEX-treated OPCs and oligodendrocytes. With identical treatment, microglial cells showed profound and global changes post-DEX addition; an unexpected finding was the identification of the transcription factor Olig1, a master regulator of myelination, as a DEX responsive gene in microglia. Our data indicate that CS-induced myelination delays are unlikely to be due to direct drug action on OPCs or oligodendrocytes, and may occur secondary to alterations in other neural cells, such as the immune component. To the best of our knowledge, this is the first comparative molecular and cellular analysis of CS effects in glial cells, to investigate the targets of this major class of anti-inflammatory drugs as a basis for myelination deficits.
PMCID: PMC3894723  PMID: 24147833
Oligodendrocyte; Olig1; corticosteroid; glucocorticoid receptor; microglia; microarray
12.  Identification of DNA methylation biomarkers from Infinium arrays 
Frontiers in Genetics  2012;3:161.
Epigenetic modifications of DNA, such as cytosine methylation, are differentially abundant in diseases such as cancer. A goal for clinical research is finding sites that are differentially methylated between groups of samples to act as potential biomarkers for disease outcome. However, clinical samples are often limited in availability, represent a heterogeneous collection of cells or are of uncertain clinical class. Array-based methods for identification of methylation provide a cost-effective method to survey a proportion of the methylome at single base resolution. The Illumina Infinium array has become a popular and reliable high-throughput method in this field and is proving useful in the identification of biomarkers for disease. Here, we compare a commonly used statistical test with a new intuitive and flexible computational approach to quickly detect differentially methylated sites. The method rapidly identifies and ranks candidate lists with greatest inter-group variability whilst controlling for intra-group variability. Intuitive and biologically relevant filters can be imposed to quickly identify sites and genes of interest.
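The idea of ranking sites by inter-group difference while penalizing intra-group spread can be sketched in a few lines of Python. This is only an illustration of the general principle, not the algorithm used by the published software: the scoring formula, the `min_delta` filter, the function name `rank_sites` and the toy beta values are all assumptions made for this example.

```python
from statistics import mean, pstdev


def rank_sites(betas_a, betas_b, min_delta=0.2):
    """Rank CpG sites by between-group difference relative to within-group spread.

    betas_a, betas_b: dict mapping site_id -> list of beta values (0..1),
    one list per group. Sites whose mean difference is below min_delta
    are filtered out before ranking.
    Returns (site_id, delta, score) tuples, best score first.
    """
    ranked = []
    for site in betas_a.keys() & betas_b.keys():
        a, b = betas_a[site], betas_b[site]
        delta = abs(mean(a) - mean(b))  # inter-group difference
        if delta < min_delta:
            continue
        spread = pstdev(a) + pstdev(b)  # intra-group variability
        ranked.append((site, delta, delta / (spread + 1e-6)))
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked


# Hypothetical beta values for three sites in two groups of three samples.
group_a = {"cg01": [0.10, 0.12, 0.11], "cg02": [0.50, 0.52, 0.49], "cg03": [0.20, 0.60, 0.40]}
group_b = {"cg01": [0.80, 0.78, 0.82], "cg02": [0.55, 0.51, 0.50], "cg03": [0.70, 0.30, 0.95]}
ranked = rank_sites(group_a, group_b)
for site, delta, score in ranked:
    print(site, round(delta, 3), round(score, 2))
```

In this toy data the site with a large, consistent difference ranks first, the noisy site ranks lower, and the site with a negligible difference is removed by the filter.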
PMCID: PMC3427494  PMID: 22936948
DNA methylation array; Infinium 450k; biomarker discovery; differential methylation; epigenetics; DNA methylome; epigenomics; NIMBL
13.  Homopolymer tract organization in the human malarial parasite Plasmodium falciparum and related Apicomplexan parasites 
BMC Genomics  2014;15(1):848.
Homopolymeric tracts, particularly poly dA.dT, are enriched within the intergenic sequences of eukaryotic genomes where they appear to act as intrinsic regulators of nucleosome positioning. A previous study of the incomplete genome of the human malarial parasite Plasmodium falciparum reports a higher than expected enrichment of poly dA.dT tracts, far above that anticipated even in this highly AT rich genome. Here we report an analysis of the relative frequency, length and spatial arrangement of homopolymer tracts for the complete P. falciparum genome, extending this analysis to twelve additional genomes of Apicomplexan parasites important to human and animal health. In addition, using nucleosome-positioning data available for P. falciparum, we explore the correlation of poly dA.dT tracts with nucleosome-positioning data over key expression landmarks within intergenic regions.
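A minimal Python sketch of how homopolymer tracts might be located in a sequence is given below. It is an illustration only and not the analysis pipeline used in the study: the function name `find_homopolymer_tracts`, the minimum tract length of 10 and the toy sequence are assumptions. On one strand, runs of A or of T both correspond to poly dA.dT in the duplex.

```python
import re


def find_homopolymer_tracts(sequence, min_length=10):
    """Return (start, length, base) for each single-base run >= min_length.

    start is a 0-based offset into the sequence. Only A, C, G and T are
    considered; ambiguous bases such as N break a run.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # One base, followed by at least (min_length - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (m.start(), m.end() - m.start(), m.group(1))
        for m in pattern.finditer(sequence.upper())
    ]


# Hypothetical toy sequence: a 12-base A run, a 5-base T run, a 10-base T run.
seq = "GC" + "A" * 12 + "CG" + "T" * 5 + "G" + "T" * 10
tracts = find_homopolymer_tracts(seq)
print(tracts)  # [(2, 12, 'A'), (22, 10, 'T')]
```

Tract frequency and length distributions can then be tabulated from the returned tuples, and the start offsets compared against annotated landmarks such as open reading frame boundaries.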
We describe three apparent lineage-specific patterns of homopolymeric tract organization within the intergenic regions of these Apicomplexan parasites. Moreover, a striking pattern of enrichment of overly long poly dA.dT tracts in the intergenic regions of Plasmodium spp. uniquely extends into protein coding sequences. There is a conserved spatial arrangement of poly dA.dT immediately flanking open reading frames and over predicted core promoter sites. These key landmarks are all relatively depleted in nucleosomes in P. falciparum, as would be expected for poly dA.dT acting as nucleosome exclusion sequences.
Previous comparative studies of homopolymer tract organization emphasize evolutionary diversity; this is the first report of such an analysis within a single phylum. Our data provide insights into the evolution of homopolymeric tracts and the selective pressures at play in their maintenance and expansion.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-848) contains supplementary material, which is available to authorized users.
PMCID: PMC4194402  PMID: 25281558
Poly dA.dT; Intergenic regions; Malaria; Nucleosome; Gene expression
14.  Comparison of Clustering Methods for Investigation of Genome-Wide Methylation Array Data 
The use of genome-wide methylation arrays has proved very informative to investigate both clinical and biological questions in human epigenomics. The use of clustering methods, either for exploration of these data or to compare to an a priori grouping (e.g., normal versus disease), allows assessment of groupings of data without user bias. However, no consensus on the methods to use for clustering of methylation array data has been reached. To determine the most appropriate clustering method for analysis of Illumina array methylation data, a collection of data sets was simulated and used to compare clustering methods. Both hierarchical clustering and non-hierarchical clustering methods (k-means, k-medoids, and fuzzy clustering algorithms) were compared using a range of distance and linkage methods. As no single method consistently outperformed others across different simulations, we propose a method to capture the best clustering outcome based on an additional measure, the silhouette width. This approach produced a consistently higher cluster accuracy compared to using any one method in isolation.
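The selection step described above, choosing among candidate clusterings by silhouette width, can be sketched as follows. This is a minimal illustration, not the procedure from the paper: the Euclidean distance, the function name `mean_silhouette`, the candidate labels and the toy points are assumptions, and the labelings are supplied by hand instead of being produced by real clustering algorithms.

```python
from math import dist


def mean_silhouette(points, labels):
    """Average silhouette width of a labelling, using Euclidean distance."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical 2-D samples and two candidate labelings to compare.
points = [(0.10, 0.10), (0.15, 0.12), (0.90, 0.85), (0.88, 0.90)]
candidates = {"method_1": [0, 0, 1, 1], "method_2": [0, 1, 0, 1]}
best = max(candidates, key=lambda name: mean_silhouette(points, candidates[name]))
print(best)  # method_1
```

The appeal of this design is that the silhouette width needs no reference grouping, so the same score can be used to compare outputs from hierarchical and non-hierarchical methods on equal terms.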
PMCID: PMC3268382  PMID: 22303382
hierarchical; k-means; k-medoids; epigenomics; epigenetics; illumina; infinium
15.  A consensus approach to vertebrate de novo transcriptome assembly from RNA-seq data: assembly of the duck (Anas platyrhynchos) transcriptome 
Frontiers in Genetics  2014;5:190.
For vertebrate organisms where a reference genome is not available, de novo transcriptome assembly enables a cost-effective insight into the identification of tissue-specific or differentially expressed genes and variation of the coding part of the genome. However, since there are a number of different tools and parameters that can be used to reconstruct transcripts, it is difficult to determine an optimal method. Here we suggest a pipeline based on (1) assessing the performance of three different assembly tools, (2) using both single and multiple k-mer (MK) approaches, (3) examining the influence of the number of reads used in the assembly, and (4) merging assemblies from different tools. We use an example dataset from the vertebrate Anas platyrhynchos domestica (Pekin duck). We find that taking a subset of data enables a robust assembly to be produced by multiple methods without the need for very high memory capacity. The use of reads mapped back to transcripts (RMBT) and CEGMA (Core Eukaryotic Genes Mapping Approach) provides useful metrics to determine the completeness of the assembly obtained. For this dataset the use of MK in the assembly generated a more complete assembly, as measured by a greater number of RMBT and a higher CEGMA score. Merged single k-mer assemblies are generally smaller but consist of longer transcripts, suggesting an assembly consisting of fewer fragmented transcripts. We suggest that the use of a subset of reads during assembly allows the relatively rapid investigation of assembly characteristics and can guide the user to the most appropriate transcriptome for particular downstream use. Transcriptomes generated by the compared assembly methods and the final merged assembly are freely available for download at
PMCID: PMC4070175  PMID: 25009556
RNA-seq; de novo transcriptome; assembly; Illumina; high-throughput sequencing
16.  Characterisation of the horse transcriptome from immunologically active tissues 
PeerJ  2014;2:e382.
The immune system of the horse has not been well studied, despite the fact that the horse displays several features, such as sensitivity to bacterial lipopolysaccharide, that make it in many ways a more suitable model of some human disorders than the current rodent models. The difficulty of working with large animal models has, however, limited characterisation of gene expression in the horse immune system, with current annotations for the equine genome restricted to predictions from other mammals and the few described horse proteins. This paper outlines sequencing of 184 million transcriptome short reads from immunologically active tissues of three horses, including the genome reference “Twilight”. In a comparison with the Ensembl horse genome annotation, we found 8,763 potentially novel isoforms.
PMCID: PMC4017814  PMID: 24860704
Equus caballus; RNA-Seq; Transcriptome assembly
17.  Molecular Characterization of Adipose Tissue in the African Elephant (Loxodonta africana) 
PLoS ONE  2014;9(3):e91717.
Adipose tissue (AT) is a dynamic and flexible organ with regulatory roles in physiological functions including metabolism, reproduction and inflammation; secreted adipokines, including leptin, and fatty acids facilitate many of these roles. The African elephant (Loxodonta africana) is experiencing serious challenges to optimal reproduction in captivity. The physiological and molecular basis of this impaired fertility remains unknown. AT production of leptin is a crucial molecular link between nutritional status, adiposity and fertility in many species. We propose that leptin has a similar function in the African elephant. African elephant visceral and subcutaneous adipose tissue (AT) was obtained from both sexes and a range of ages including females with known pregnancy status. RNA was extracted and histological sections created and analyzed by microarray, PCR and immunohistochemistry respectively. Gas-chromatography was used to determine the fatty acid composition of AT. Microarray expression profiling was used to compare gene expression profiles of AT from pre-pubertal versus reproductively competent adult African elephants. This study demonstrates, for the first time, leptin mRNA and protein expression in African elephant AT. The derived protein sequence of the elephant leptin protein was exploited to determine its relationship within the class I helical cytokine superfamily, which indicates that elephant leptin is most closely related to the leptin orthologs of Oryctolagus cuniculus (European rabbit), Lepus oiostolus (woolly hare), and members of the Ochotonidae (Pika). Immunohistological analysis identified considerable leptin staining within the cytoplasm of adipocytes. Significant differences in fatty acid profiles between pregnant and non-pregnant animals were revealed, most notably a reduction in both linoleic and α linoleic acid in pregnant animals. 
This report forms the basis for future studies to address the effect of nutrient composition and body condition on reproduction in captive and wild elephants.
PMCID: PMC3954733  PMID: 24633017
18.  Mutations in CCDC39 and CCDC40 are the major cause of primary ciliary dyskinesia with axonemal disorganisation and absent inner dynein arms 
Human mutation  2013;34(3):462-472.
Primary ciliary dyskinesia (PCD) is a genetically heterogeneous disorder caused by cilia and sperm dysmotility. About 12% of cases show perturbed 9+2 microtubule cilia structure and inner dynein arm (IDA) loss, historically termed ‘radial spoke defect’. We sequenced CCDC39 and CCDC40 in 54 ‘radial spoke defect’ families, as these are the two genes identified so far to cause this defect. We discovered biallelic mutations in a remarkable 69% (37/54) of families, including identification of 25 (19 novel) mutant alleles (12 in CCDC39 and 13 in CCDC40). All the mutations were nonsense, splice and frameshift predicting early protein truncation, which suggests this defect is caused by ‘null’ alleles conferring complete protein loss. Most families (73%; 27/37) had homozygous mutations, including families from outbred populations. A major putative hotspot mutation was identified, CCDC40 c.248delC, as well as several other possible hotspot mutations. Together, these findings highlight the key role of CCDC39 and CCDC40 in PCD with axonemal disorganisation and IDA loss, and these genes represent major candidates for genetic testing in families affected by this ciliary phenotype. We show that radial spoke structures are largely intact in these patients and propose this ciliary ultrastructural abnormality be referred to as ‘IDA and nexin-dynein regulatory complex (N-DRC) defect’, rather than ‘radial spoke defect’.
PMCID: PMC3630464  PMID: 23255504
primary ciliary dyskinesia; cilia; CCDC39; CCDC40; radial spoke; dynein regulatory complex; nexin link
19.  Targeted NGS gene panel identifies mutations in RSPH1 causing primary ciliary dyskinesia and a common mechanism for ciliary central pair agenesis due to radial spoke defects 
Human Molecular Genetics  2014;23(13):3362-3374.
Primary ciliary dyskinesia (PCD) is an inherited chronic respiratory obstructive disease with randomized body laterality and infertility, resulting from cilia and sperm dysmotility. PCD is characterized by clinical variability and extensive genetic heterogeneity, associated with different cilia ultrastructural defects and mutations identified in >20 genes. Next generation sequencing (NGS) technologies therefore present a promising approach for genetic diagnosis which is not yet in routine use. We developed a targeted panel-based NGS pipeline to identify mutations by sequencing of selected candidate genes in 70 genetically undefined PCD patients. This detected loss-of-function RSPH1 mutations in four individuals with isolated central pair (CP) agenesis and normal body laterality, from two unrelated families. Ultrastructural analysis in RSPH1-mutated cilia revealed transposition of peripheral outer microtubules into the ‘empty’ CP space, accompanied by a distinctive intermittent loss of the central pair microtubules. We find that mutations in RSPH1, RSPH4A and RSPH9, which all encode homologs of components of the ‘head’ structure of ciliary radial spoke complexes identified in Chlamydomonas, cause clinical phenotypes that appear to be indistinguishable except at the gene level. By high-resolution immunofluorescence we identified a loss of RSPH4A and RSPH9 along with RSPH1 from RSPH1-mutated cilia, suggesting RSPH1 mutations may result in loss of the entire spoke head structure. CP loss is seen in up to 28% of PCD cases, in whom laterality determination specified by CP-less embryonic node cilia remains undisturbed. We propose this defect could arise from instability or agenesis of the ciliary central microtubules due to loss of their normal radial spoke head tethering.
PMCID: PMC4049301  PMID: 24518672
20.  Evolutionary expansion and anatomical specialization of synapse proteome complexity 
Nature neuroscience  2008;11(7):799-806.
Understanding the origins and evolution of synapses may provide insight into species diversity and organisation of the brain. Using comparative proteomics and genomics we examined the evolution of the postsynaptic density (PSD) and MAGUK associated signalling complexes (MASCs) underlying learning and memory. PSD/MASC orthologues found in yeast perform basic cellular functions regulating protein synthesis and structural plasticity. Striking changes in signalling complexity were observed at the yeast:metazoan and invertebrate:vertebrate boundaries, with expansion of key synapse components, notably receptors, adhesion/cytoskeletal and scaffold proteins. Proteomic comparison of Drosophila and mouse MASCs revealed species-specific adaptation with greater signalling complexity in mouse. Although synapse components were conserved amongst diverse vertebrate species, mapping mRNA and protein expression within the mouse brain showed vertebrate-specific components preferentially contributed to differences between brain regions. We propose that evolution of synapse complexity around a core proto-synapse has contributed to invertebrate–vertebrate differences and to brain specialisation.
PMCID: PMC3624047  PMID: 18536710
21.  Insights into the evolution of Darwin’s finches from comparative analysis of the Geospiza magnirostris genome sequence 
BMC Genomics  2013;14:95.
A classical example of repeated speciation coupled with ecological diversification is the evolution of 14 closely related species of Darwin’s (Galápagos) finches (Thraupidae, Passeriformes). Their adaptive radiation in the Galápagos archipelago took place in the last 2–3 million years and some of the molecular mechanisms that led to their diversification are now being elucidated. Here we report evolutionary analyses of the genome of the large ground finch, Geospiza magnirostris.
13,291 protein-coding genes were predicted from a 991.0 Mb G. magnirostris genome assembly. We then defined gene orthology relationships and constructed whole genome alignments between the G. magnirostris and other vertebrate genomes. We estimate that 15% of genomic sequence is functionally constrained between G. magnirostris and zebra finch. Genic evolutionary rate comparisons indicate that similar selective pressures acted along the G. magnirostris and zebra finch lineages, suggesting that historical effective population size values have been similar in both lineages. 21 otherwise highly conserved genes were identified that each show evidence for positive selection on amino acid changes in the Darwin’s finch lineage. Two of these genes (Igf2r and Pou1f1) have been implicated in beak morphology changes in Darwin’s finches. Five of 47 genes showing evidence of positive selection in early passerine evolution have cilia-related functions, and may be examples of adaptively evolving reproductive proteins.
These results provide insights into past evolutionary processes that have shaped G. magnirostris genes and its genome, and provide the necessary foundation upon which to build population genomics resources that will shed light on more contemporaneous adaptive and non-adaptive processes that have contributed to the evolution of the Darwin’s finches.
PMCID: PMC3575239  PMID: 23402223
Genomics; Evolution; Darwin’s finches; Large ground finch; Geospiza magnirostris
22.  A Comparative Approach to Understanding Tissue-Specific Expression of Uncoupling Protein 1 Expression in Adipose Tissue 
Frontiers in Genetics  2013;3:304.
The thermoregulatory function of brown adipose tissue (BAT) is due to the tissue-specific expression of uncoupling protein 1 (UCP1), which is thought to have evolved in early mammals. We report that a CpG island close to the UCP1 transcription start site is highly conserved in all 29 vertebrates examined apart from the mouse and Xenopus. Using methylation-sensitive restriction digest and bisulfite mapping we show that the CpG island in both the bovine and human is largely unmethylated and is not related to differences in UCP1 expression between white and BAT. Tissue-specific expression of UCP1 has been proposed to be regulated by a conserved 5′ distal enhancer which has been reported to be absent in marsupials. We demonstrate that the enhancer is also absent in five eutherians as well as in marsupials, monotremes, amphibians, and fish, that it is present in pigs despite UCP1 having become a pseudogene, and that absence of the enhancer element does not relate to BAT-specific UCP1 expression. We identify an additional putative 5′ regulatory unit which is conserved in 14 eutherian species but absent in other eutherians and vertebrates, but again unrelated to UCP1 expression. We conclude that despite clear evidence of conservation of regulatory elements in the UCP1 5′ untranslated region, this does not appear to be related to species- or tissue-specific expression of UCP1.
PMCID: PMC3535714  PMID: 23293654
CpG islands; methylation; uncoupling protein 1; phylogenic analysis
23.  Adaptive evolution of Toll-like receptor 5 in domesticated mammals 
Previous studies have proposed that mammalian Toll-like receptors (TLRs) have evolved under diversifying selection due to their role in pathogen detection. To determine if this is the case, we examined the extent of adaptive evolution in the TLR5 gene in both individual species and defined clades of the Mammalia.
In support of previous studies, we find evidence of adaptive evolution of mammalian TLR5. However, we also show that TLR5 genes of domestic livestock have a concentration of single nucleotide polymorphisms, suggesting a specific signature of adaptation. Using codon models of evolution we have identified a concentration of rapidly evolving codons within the TLR5 extracellular domain, a site of interaction between host and the bacterial surface protein flagellin.
The results suggest that interactions between pathogen and host may be driving adaptive change in TLR5 by competition between species. In support of this, we have identified single nucleotide polymorphisms (SNPs) in sheep and cattle TLR5 genes that are co-localised and coincident with the predicted adaptive codons, suggesting that adaptation in this region of the TLR5 gene is ongoing in domestic species.
PMCID: PMC3483281  PMID: 22827462
Toll-like receptor; SNP; Adaptive evolution; Positive selection; Sheep; Cattle
25.  Endogenous retroviruses in primates 
Retrovirology  2011;8(Suppl 2):P7.
PMCID: PMC3236961

