Clostridium botulinum encompasses bacteria that produce at least one of the seven serotypes of botulinum neurotoxin (BoNT/A-G). The availability of genome sequences of four closely related Type A1 or A1(B) strains, as well as the A1-specific microarray, allowed the analysis of their genomic organizations and evolutionary relationship. The four genomes share >90% core genes and >96% functional groups. Phylogenetic analysis based on COG shows that the A1(B) strain, NCTC 2916, is more closely related to B1 and F1 strains than to A1 strains. Alignment of the genomes of the three A1 strains revealed a highly similar chromosomal structure with three small gaps in the genome of ATCC 19397 and one additional gap in the genome of Hall A, suggesting ATCC 19397 as an evolutionary intermediate between Hall A and ATCC 3502. Analyses of the four gap regions indicated potential horizontal gene transfer and recombination events important for the evolution of A1 strains.
Clostridium botulinum; genomic comparison; microarray
The brain responds to injury and infection by activating innate defense and tissue repair mechanisms. Working upon the hypothesis that the brain defense response involves common genes and pathways across diverse pathologies, we analysed global gene expression in brain from mouse models representing three major central nervous system disorders, cerebral stroke, multiple sclerosis and Alzheimer’s disease compared to normal brain using DNA microarray expression profiling. A comparison of dysregulated genes across disease models revealed common genes and pathways including key components of estrogen and TGF-β signaling pathways that have been associated with neuroprotection as well as a neurodegeneration mediator, TRPM7. Further, for each disease model, we discovered collections of differentially expressed genes that provide novel insight into the individual pathology and its associated mechanisms. Our data provide a resource for exploring the complex molecular mechanisms that underlie brain neurodegeneration and a new approach for identifying generic and disease-specific targets for therapy.
cDNA microarrays; Brain inflammation; Cerebral stroke; Alzheimer’s disease; Systems biology; Gene expression profiling
Two common sources of DNA for whole exome sequencing (WES) are whole blood (WB) and immortalized lymphoblastoid cell line (LCL). However, it is possible that LCLs have a substantially higher rate of mutation than WB, causing concern for their use in sequencing studies. We compared results from paired WB and LCL DNA samples for 16 subjects, using LCLs of low passage number (<5). Using a standard analysis pipeline we detected a large number of discordant genotype calls (approximately 50 per subject) that we segregated into categories of “confidence” based on read-level quality metrics. From these categories and validation by Sanger sequencing, we estimate that the vast majority of the candidate differences were false positives and that our categories were effective in predicting valid sequence differences, including LCLs with putative mosaicism for the non-reference allele (3–4 per exome). These results validate the use of DNA from LCLs of low passage number for exome sequencing.
graphical diagnostics; lymphoblastoid cell line; mosaicism; sequence variant call; strand bias; somatic mutation
High-throughput cancer studies have been extensively conducted, searching for genetic markers associated with outcomes beyond clinical and environmental risk factors. Gene–environment interactions can have important implications beyond main effects. The commonly adopted single-marker analysis cannot accommodate the joint effects of a large number of markers. The existing joint-effects methods also have limitations. Specifically, they may suffer from high computational cost, fail to respect the “main effect, interaction” hierarchical structure, or use ineffective techniques. We develop a penalization method for the identification of important G × E interactions and main effects. It has an intuitive formulation, respects the hierarchical structure, accommodates the joint effects of multiple markers, and is computationally affordable. In the numerical study, we analyze prognosis data under the AFT (accelerated failure time) model. Simulation shows satisfactory performance of the proposed method. Analysis of an NHL (non-Hodgkin lymphoma) study with SNP measurements shows that the proposed method identifies markers with important implications and satisfactory prediction performance.
Gene–environment interaction; Penalized marker identification; Cancer prognosis
Candida albicans maintains both commensal and pathogenic states in humans. Here, we have defined the genomic response to osmotic stress mediated by transcription factor Sko1. We performed microarray analysis of a sko1Δ/Δ mutant strain subjected to osmotic stress, and we utilized gene sequence enrichment analysis and enrichment mapping to identify Sko1-dependent osmotic stress-response genes. We found that Sko1 regulates distinct gene classes with functions in ribosomal synthesis, mitochondrial function, and vacuolar transport. Our in silico analysis suggests that Sko1 may recognize two unique DNA binding motifs. Our C. albicans genomic analyses and complementation studies in Saccharomyces cerevisiae showed that Sko1 is conserved as a regulator of carbohydrate metabolism, redox metabolism, and glycerol synthesis. Further, our real time-qPCR results showed that osmotic stress-response genes that are dependent on the kinase Hog1 also require Sko1 for full expression. Our findings reveal divergent and conserved aspects of Sko1-dependent osmotic stress signaling.
Yeast; Transcription factor; SKO1; Osmotic stress; Enrichment mapping
A species-specific population of arrangements of repetitive elements (REs), called RE arrays, exists in the human and mouse genomes. We developed an RE analytical tool, named REViewer, for visualizing RE occurrences within RE arrays and other genomic regions as an interactive line map. REViewer utilizes an RE reference library which is established with two RE types: 1) REMiner-generated undefined REs and 2) RepeatMasker-derived defined REs. RE occurrences within queries are visualized as a line map using these two RE types. The REViewer’s controller provides analytical options, such as zoom, customization of axis unit, and RE type selection. The functionality of REViewer was evaluated using the human chromosome Y sequence. The REViewer is determined to be an efficient tool that facilitates visualization of up to 6000 REs in RE arrays and other genomic regions. The maximum query size is linked to the RE mining tools (e.g., REMiner, RepeatMasker), not to REViewer.
In coronary artery bypass grafting (CABG), the combined use of left and right internal mammary arteries (LIMA and RIMA), collectively known as bilateral IMAs (BIMAs), provides a survival advantage over the use of LIMA alone. However, gene expression in RIMA has never been compared to that in LIMA. Here we report a genome-wide transcriptional analysis of BIMA to investigate the expression profiles of these conduits in patients undergoing CABG. As expected, in comparing the BIMAs to the aorta, we found differences in pathways and processes associated with atherosclerosis, inflammation, and cell signaling, pathways which provide biological support for the observation that BIMA grafts deliver long-term benefits to the patients and protect against continued atherosclerosis. These data support the widespread use of BIMAs as the preferred conduits in CABG.
Cardiovascular disease; Transcriptome analysis; Genomics; Coronary artery bypass grafting
We employed ABI high-density oligonucleotide microarrays containing 31,700 sixty-mer probes (representing 27,868 annotated human genes) to determine differential gene expression in idiopathic dilated cardiomyopathy (DCM). We identified 626 up-regulated and 636 down-regulated genes in DCM compared to controls. Most significant changes occurred in the tricarboxylic acid cycle, angiogenesis, and apoptotic signaling pathways, among which 32 apoptosis- and 13 MAPK activity-related genes were altered. Inorganic cation transporter, catalytic activities, energy metabolism and electron transport-related processes were among the most critically influenced pathways. Among the up-regulated genes were HTRA1 (6.9-fold), PDCD8 (AIFM1) (5.2) and PRDX2 (4.4), and among the down-regulated genes were NR4A2 (4.8), MX1 (4.3), LGALS9 (4), IFNA13 (4), UNC5D (3.6) and HDAC2 (3) (p < 0.05), none of which has a clearly defined cardiac-related function yet. Gene ontology and enrichment analysis also revealed significant alterations in mitochondrial oxidative phosphorylation, metabolism and Alzheimer’s disease pathways. Concordance was also confirmed for a significant number of genes and pathways in an independent validation microarray dataset. Furthermore, verification by real-time RT-PCR showed a high degree of consistency with the microarray results. Our data demonstrate an association of DCM with alterations in various cellular events and multiple yet undeciphered genes that may contribute to heart muscle disease pathways.
Idiopathic dilated cardiomyopathy; Global gene expression; Microarray gene expression; Gene up/down-regulation; Mitochondrial function; Apoptotic signaling
To explore the mechanisms controlling erythroid differentiation and development, we analyzed the genome-wide transcription dynamics occurring during the differentiation of human embryonic stem cells (HESCs) into the erythroid lineage and development of embryonic to adult erythropoiesis using high throughput sequencing technology. HESCs and erythroid cells at three developmental stages: ESER (embryonic), FLER (fetal), and PBER (adult) were analyzed. Our findings revealed that the number of expressed genes decreased during differentiation, whereas the total expression intensity increased. At each of the three transitions (HESCs–ESERs, ESERs–FLERs, and FLERs–PBERs), many differentially expressed genes were observed, which were involved in maintaining pluripotency, early erythroid specification, rapid cell growth, and cell–cell adhesion and interaction. We also discovered dynamic networks and their central nodes in each transition. Our study provides a fundamental basis for further investigation of erythroid differentiation and development, and has implications in using ESERs for transfusion product in clinical settings.
High-throughput RNA sequencing; Erythropoiesis; Cell differentiation; Development; Gene regulatory networks
Clopidogrel is an oral antiplatelet pro-drug prescribed to 40 million patients worldwide who are at risk for thrombotic events or receiving percutaneous coronary intervention (PCI). However, about a fifth of patients treated with clopidogrel do not respond adequately to the drug. From a cohort of 105 patients for whom we had functional data on clopidogrel response, we used ultra-high throughput sequencing to assay mutations in CYP2C19 and ABCB1, the two genes genetically linked to response. Testing for mutations in CYP2C19, as recommended by the FDA, correctly predicted whether a patient would respond to clopidogrel only 52.4% of the time. Similarly, testing of the ABCB1 gene correctly foretold response in only 51 (48.6%) patients. These results are clinically relevant and suggest that until additional genetic factors are discovered that predict response more completely, functional assays are more appropriate for clinical use.
Cardiovascular surgery; Ultra-high throughput sequencing; Gene polymorphisms; Platelets
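The percentages quoted in the clopidogrel abstract above can be checked against the cohort size with simple arithmetic. The sketch below is only an illustration: the function name is hypothetical, and the count of 55 correct CYP2C19 predictions is an inference from the reported 52.4% of 105 patients, not a figure stated in the abstract.

```python
def percent_correct(n_correct: int, n_total: int) -> float:
    """Percentage of patients whose response was predicted correctly."""
    return round(100.0 * n_correct / n_total, 1)


COHORT = 105  # patients with functional response data (from the abstract)

# ABCB1: 51 correct predictions are stated directly in the abstract.
abcb1_pct = percent_correct(51, COHORT)

# CYP2C19: only the percentage is reported; 55 of 105 is the count
# consistent with 52.4% (an assumption made for this check).
cyp2c19_pct = percent_correct(55, COHORT)

print(f"ABCB1: {abcb1_pct}%  CYP2C19: {cyp2c19_pct}%")
```

Both values sit close to 50%, which is the basis for the abstract's conclusion that genotype alone is a weak predictor in this cohort.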
We investigated the genome-wide distribution of CNVs in the Alzheimer's disease (AD) Neuroimaging Initiative (ADNI) sample (146 with AD, 313 with Mild Cognitive Impairment (MCI), and 181 controls). Comparison of single CNVs between cases (MCI and AD) and controls shows overrepresentation of large heterozygous deletions in cases (p-value < 0.0001). The analysis of CNV-Regions identifies 44 copy number variable loci of heterozygous deletions, with more CNV-Regions among affected than controls (p = 0.005). Seven of the 44 CNV-Regions are nominally significant for association with cognitive impairment. We validated and confirmed our main findings with genome re-sequencing of selected patients and controls. The functional pathway analysis of the genes putatively affected by deletions of CNV-Regions reveals enrichment of genes implicated in axonal guidance, cell–cell adhesion, neuronal morphogenesis and differentiation. Our findings support the role of CNVs in AD, and suggest an association between large deletions and the development of
Alzheimer's disease; Copy Number Variable Regions (CNV-Regions); Copy Number Variations (CNVs); Genome-wide scan; Next Generation Sequencing (NGS)
We conducted a translational genomics pilot study to evaluate the impact of genomic information related to colorectal cancer (CRC) risk on psychosocial, behavioral and communication outcomes. In 47 primary care participants, 96% opted for testing of three single nucleotide polymorphisms (SNPs) related to CRC risk. Participants averaged 2.5 of 6 possible SNP risk alleles (10% lifetime risk). At 3-months, participants did not report significant increases in cancer worry/distress; over half reported physical activity and dietary changes. SNP risk scores were unrelated to behavior change at 3-months. Many participants (64%) shared their SNP results, including 28% who shared results with a physician. In this pilot, genomic risk education, including discussion of other risk factors, appeared to impact patients' health behaviors, regardless of the level of SNP risk. Future work can compare risk education with and without SNP results to evaluate if SNP information adds value to existing approaches.
translational genomic research; SNP testing; colorectal cancer risk; genomics education; behavior change
Next-generation sequencing is increasingly employed in biomedical investigations. Strong concordance between microarray and mRNA-seq levels has been reported in high quality specimens but information is lacking on formalin-fixed, paraffin-embedded (FFPE) tissues, and particularly for microRNA (miRNA) analysis. We conducted a preliminary examination of the concordance between miRNA-seq and cDNA-mediated annealing, selection, extension, and ligation (DASL) miRNA assays. Quantitative agreement between platforms is moderate (Spearman correlation 0.514–0.596) and there is discordance of detection calls on a subset of miRNAs. Quantitative PCR (q-RT-PCR) performed for several discordant miRNAs confirmed the presence of most sequences detected by miRNA-seq but not by DASL but also that miRNA-seq did not detect some sequences, which DASL confidently detected. Our results suggest that miRNA-seq is specific, with few false positive calls, but it may not detect certain abundant miRNAs in FFPE tissue. Further work is necessary to fully address these issues that are pertinent for translational research.
RNA-sequencing; microarrays; paraffin-tissue
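The platform agreement in the miRNA-seq versus DASL abstract above is reported as a Spearman correlation, which is the Pearson correlation of the ranks of the two measurement vectors. A minimal standard-library sketch follows; the function names and the example values are made up for illustration and are not data from the study.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-miRNA signal on two platforms (illustrative values only).
seq_counts = [120.0, 15.0, 980.0, 4.0, 310.0]
dasl_signal = [7.1, 6.8, 9.4, 5.0, 6.9]
rho = spearman(seq_counts, dasl_signal)
```

Because only ranks are compared, the statistic is insensitive to the different measurement scales of the two platforms, which is presumably why it suits a cross-platform comparison of this kind.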
Precise characterization of chromatin states is an important but difficult task for understanding the regulatory role of chromatin. A number of computational methods have been developed with varying levels of success. However, a remaining challenge is to model epigenomic patterns over multi-scales, as each histone mark is distributed with its own characteristic length scale. We developed a tiered hidden Markov model and applied it to analyze a ChIP-seq dataset in human embryonic stem cells. We identified a two-tier structure containing 15 distinct bin-level chromatin states grouped into three domain-level states. Whereas the bin-level states capture the local variation of histone marks, the domain-level states detect large-scale variations. Compared to bin-level states, the domain-level states are more robust and coherent. We also found active regions in intergenic regions that upon closer examination were expressed non-coding RNAs and pseudogenes. These results provide insights into an additional layer of complexity in chromatin organization.
hidden Markov model; chromatin; computational biology
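To make the bin-level decoding idea in the chromatin-state abstract above concrete, here is a minimal single-tier Viterbi decoder. It is a sketch under stated assumptions, not the authors' tiered model: the two states, the binary "mark"/"none" observations, and all probabilities are invented for illustration.

```python
import math
from typing import Dict, List, Sequence


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Most likely hidden-state path for obs, computed in log space."""
    scores = [
        {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    ]
    back: List[Dict[str, str]] = []
    for o in obs[1:]:
        prev = scores[-1]
        cur: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for s in states:
            # Best predecessor state for landing in s at this bin.
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            cur[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o])
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical two-state model over genomic bins (all numbers are made up).
STATES = ["active", "inactive"]
START = {"active": 0.5, "inactive": 0.5}
TRANS = {
    "active": {"active": 0.9, "inactive": 0.1},
    "inactive": {"active": 0.1, "inactive": 0.9},
}
EMIT = {
    "active": {"mark": 0.8, "none": 0.2},
    "inactive": {"mark": 0.1, "none": 0.9},
}

bins = ["mark", "mark", "none", "none"]
decoded = viterbi(bins, STATES, START, TRANS, EMIT)
```

A tiered model of the kind the abstract describes would add a second, slower-changing layer of domain-level states above these bin-level states; that layer is not attempted here.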
Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it is not likely to amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants.
Sequencing; functional analysis; computer modeling; genomic variation
Single Nucleotide Polymorphisms (SNPs) between microarray probes and RNA targets can affect the performance of expression arrays by weakening the hybridization. In this paper, we examined the effect of SNPs on Affymetrix GeneChip probe set summaries and on expression quantitative trait loci (eQTL) mapping results in two eQTL datasets, one from mouse and one from human. We showed that removing SNP-containing probes significantly changed the probe set summaries, and that the more SNP-containing probes we removed, the greater the change. Comparison of the eQTL mapping results with and without SNP-containing probes showed that less than 70% of the significant eQTL peaks were concordant regardless of the significance threshold. These results indicate that SNPs do affect both probe set summaries and eQTLs (both cis and trans); thus, SNP-containing probes should be filtered out to improve the performance of eQTL mapping.
microarray; SNP; eQTL; cis-eQTL; trans-eQTL; mouse; human
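The filtering step recommended in the abstract above can be illustrated with a toy probe set. This sketch assumes a plain median of log-scale intensities as the probe set summary, which is a simplification chosen for illustration and not necessarily the summarization method used in the study; the function name and all values are hypothetical.

```python
from statistics import median
from typing import Optional, Sequence


def probe_set_summary(
    intensities: Sequence[float],
    has_snp: Sequence[bool],
    drop_snp_probes: bool,
) -> Optional[float]:
    """Median log-intensity of a probe set, optionally excluding SNP probes.

    Returns None when filtering leaves no probes to summarize.
    """
    kept = [
        value
        for value, snp in zip(intensities, has_snp)
        if not (drop_snp_probes and snp)
    ]
    return median(kept) if kept else None


# Hypothetical probe set: the third probe overlaps a SNP and hybridizes weakly.
log_intensity = [8.0, 8.2, 6.1, 8.1]
snp_flags = [False, False, True, False]

with_snp_probes = probe_set_summary(log_intensity, snp_flags, drop_snp_probes=False)
without_snp_probes = probe_set_summary(log_intensity, snp_flags, drop_snp_probes=True)
shift = without_snp_probes - with_snp_probes  # change caused by filtering
```

In this toy case the weak SNP-containing probe pulls the unfiltered summary down, so dropping it raises the summary slightly, the kind of shift the abstract reports growing as more probes are removed.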
Hedgehog Interacting Protein (HHIP) was implicated in chronic obstructive pulmonary disease (COPD) by genome-wide association studies (GWAS). However, it remains unclear how HHIP contributes to COPD pathogenesis. To identify genes regulated by HHIP, we performed gene expression microarray analysis in a human bronchial epithelial cell line (Beas-2B) stably infected with HHIP shRNAs. HHIP silencing led to differential expression of 296 genes; enrichment for variants nominally associated with COPD was found. Eighteen of the differentially expressed genes were validated by real-time PCR in Beas-2B cells. Seven of 11 validated genes tested in human COPD and control lung tissues demonstrated significant gene expression differences. Functional annotation indicated enrichment for extracellular matrix and cell growth genes. Network modeling demonstrated that the extracellular matrix and cell proliferation genes influenced by HHIP tended to be interconnected. Thus, we identified potential HHIP targets in human bronchial epithelial cells that may contribute to COPD pathogenesis.
Hedgehog interacting protein (HHIP); Gene expression profiling; COPD (Chronic obstructive pulmonary disease); extracellular matrix (ECM); network modeling
Canine progressive rod-cone degeneration (prcd) is a retinal disease previously mapped to a broad, gene-rich centromeric region of canine chromosome 9. As allelic disorders are present in multiple breeds, we used linkage disequilibrium (LD) to narrow the ∼6.4 Mb candidate interval. Multiple dog breeds, each representing genetically isolated populations, were typed for SNPs and other polymorphisms identified from BACs. The candidate region was initially localized to a 1.5 Mb zero recombination interval between growth factor receptor-bound protein 2 (GRB2) and SEC14-like 1 (SEC14L). A fine-scale haplotype of the region was developed which reduced the LD interval to 106 Kb, and identified a conserved haplotype of 98 polymorphisms present in all prcd-affected chromosomes from 14 different dog breeds. The findings strongly suggest that a common ancestor transmitted the prcd disease allele to many of the modern dog breeds, and demonstrate the power of the LD approach in the canine model.
Disease Models; Animal; Genetic Diversity; Genetic Linkage; Genetic Markers; Genetic Predisposition to Disease; Genetic Variation; Retinal Degeneration
Progressive rod-cone degeneration (prcd) is a late-onset, autosomal recessive photoreceptor degeneration of dogs, and a homolog for some forms of human retinitis pigmentosa (RP). Previously, the disease relevant interval was reduced to a 106 Kb region on CFA9, and a common phenotype-specific haplotype was identified in all affected dogs from several different breeds, and breed varieties. Screening of a canine retinal EST library identified partial cDNAs for novel candidate genes in the disease relevant interval. The complete cDNA of one of these, PRCD, was cloned in dog, human and mouse. The gene codes for a 54 amino acid (aa) protein in dog and human, and 53 aa protein in the mouse; the first 24 aa, coded for by exon 1, are highly conserved in 14 vertebrate species. A homozygous mutation (TGC → TAC) in the second codon shows complete concordance with the disorder in 18 different dog breeds/breed varieties tested. The same homozygous mutation was identified in a human patient from Bangladesh with autosomal recessive (ar) RP. Expression studies support the predominant expression of this gene in the retina, with equal expression in the retinal pigment epithelium (RPE), photoreceptors and ganglion cell layers. This study provides strong evidence that a mutation in the novel gene, PRCD, is the cause of autosomal recessive retinal degeneration in both dogs and man.
Dogs; Disease Models, Animal; Genetic diversity; Genetic linkage; Genetic markers; Genetic predisposition to disease; Genetic variation; Mutation; Retinal Degeneration; Retinitis Pigmentosa
Amplicons – large, nearly identical repeats in direct or inverted orientation – are abundant in the male-specific region of the human Y chromosome (MSY) and provide targets for intrachromosomal non-allelic homologous recombination (NAHR). Thus far, NAHR events resulting in deletions, duplications, inversions, or isodicentric chromosomes have been reported only for amplicon pairs located exclusively on the short arm (Yp) or the long arm (Yq). Here we report our finding of four men with Y chromosomes that evidently formed by intrachromosomal NAHR between inverted repeat pairs comprising one amplicon on Yp and one amplicon on Yq. In two men with spermatogenic failure, sister-chromatid crossing-over resulted in pseudoisoYp chromosome formation and loss of distal Yq. In two men with normal spermatogenesis, intrachromatid crossing-over generated pericentric inversions. These findings highlight the recombinogenic nature of the MSY, as intrachromosomal NAHR occurs for nearly all Y-chromosome amplicon pairs, even those located on opposing chromosome arms.
Human Y chromosome; Non-allelic homologous recombination; Structural variation; Male infertility
The preterm birth rate in the United States is now 12%. Multiple genes, gene networks, and variants have been associated with this disease. Using a custom database for preterm birth (dbPTB) with a refined set of genes extensively curated from literature and biological databases, we analyzed a GWAS of preterm birth for complete genotype data on nearly 2000 preterm and term mothers. We used both the curated genes and a genome-wide approach to carry out a pathway-based analysis. Nineteen significant pathways that withstood FDR correction for multiple testing were identified using both the curated genes and the genome-wide approach. The analysis based on the curated genes was more significant than the genome-wide analysis in 15 out of 19 pathways. This approach demonstrates the use of a validated set of genes, in the analysis of otherwise unsuccessful GWAS data, to identify gene-gene interactions in a way that enhances statistical power and discovery.
Preterm birth; Pathway analysis; GWAS
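The preterm birth abstract above mentions FDR correction for multiple testing without naming the procedure. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up adjustment, a common choice; that this is the procedure the authors used is an assumption, and the pathway p-values are invented.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical pathway-level p-values (illustrative only).
pathway_p = [0.01, 0.04, 0.03, 0.5]
q_values = benjamini_hochberg(pathway_p)
significant = [q <= 0.05 for q in q_values]
```

A pathway is then called significant when its adjusted value falls at or below the chosen FDR level, here taken to be 0.05 for the example.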
Three genes that encode related Ig-superfamily molecules have recently been mapped to human chromosome 15 in the region q22.3-23, and to the syntenic region on mouse chromosome 9. These genes presumably derived from gene duplications and they are highly similar to Deleted in Colorectal Cancer (DCC), which functions as an axon guidance molecule during development of the nervous system. In order to find out whether additional genes of this class were present in a chromosomal cluster, we produced a comparative physical map within the region of synteny between mouse chromosome 9 and human chromosome 15. This interval overlaps the critical region for the fourth genetic locus for Bardet-Biedl Syndrome (BBS4) in humans. Bardet-Biedl Syndrome (OMIM 600374) is characterized by poly/syn/brachydactyly, retinal degeneration, hypogonadism, mental retardation, obesity, diabetes, and kidney abnormalities. A detailed map of this locus will help to identify candidate genes for this disorder.
We genotyped a Chinese and an Indian-origin rhesus macaque using the Affymetrix Genome-Wide Human SNP Array 6.0 and catalogued 85,473 uniquely mapping heterospecific SNPs. These SNPs were assigned to rhesus chromosomes according to their probe sequence alignments as displayed in the human and rhesus reference sequences. The conserved gene order (synteny) revealed by heterospecific SNP maps is in concordance with that of the published human and rhesus macaque genomes.
Using these SNPs’ original human rs numbers, we identified 12,328 genes annotated in humans that are associated with these SNPs, 3,674 of which were found in at least one of the two rhesus macaques studied. Due to their density, the heterospecific SNPs allow fine-grained comparisons, including approximate boundaries of intra- and extra-chromosomal rearrangements involving gene orthologs, which can be used to distinguish rhesus macaque chromosomes from human chromosomes.
Macaca mulatta; single nucleotide polymorphisms (SNPs); Homo sapiens; heterospecific sequence maps
Recent studies have demonstrated that gene set analysis, which tests disease association with genetic variants in a group of functionally related genes, is a promising approach for analyzing and interpreting genome-wide association studies (GWAS) data. These approaches aim to increase power by combining association signals from multiple genes in the same gene set. In addition, gene set analysis can also shed more light on the biological processes underlying complex diseases. However, current approaches for gene set analysis are still in an early stage of development in that analysis results are often prone to sources of bias, including gene set size and gene length, linkage disequilibrium patterns and the presence of overlapping genes. In this paper, we provide an in-depth review of the gene set analysis procedures, along with parameter choices and the particular methodology challenges at each stage. In addition to providing a survey of recently developed tools, we also classify the analysis methods into larger categories and discuss their strengths and limitations. In the last section, we outline several important areas for improving the analytical strategies in gene set analysis.
Genome-wide association study; Gene set; Pathway; Gene-set enrichment analysis; Statistical significance; Complex disease
Gene-expression microarrays allow researchers to characterize biological phenomena in a high-throughput fashion but are subject to technological biases and inevitable variabilities that arise during sample collection and processing. Normalization techniques aim to correct such biases. Most existing methods require multiple samples to be processed in aggregate; consequently, each sample's output is influenced by other samples processed jointly. However, in personalized-medicine workflows, samples may arrive serially, so renormalizing all samples upon each new arrival would be impractical. We have developed Single Channel Array Normalization (SCAN), a single-sample technique that models the effects of probe-nucleotide composition on fluorescence intensity and corrects for such effects, dramatically increasing the signal-to-noise ratio within individual samples while decreasing variation across samples. In various benchmark comparisons, we show that SCAN performs as well as or better than competing methods yet has no dependence on external reference samples and can be applied to any single-channel microarray platform.
Method; normalization; microarray; linear model; mixture model; single-sample technique