author:("Gai, xiaohui")
1.  Single Nucleotide Polymorphism Array Analysis of Bone Marrow Failure Patients Reveals Characteristic Patterns of Genetic Changes 
British journal of haematology  2013;164(1):73-82.
The bone marrow failure syndromes (BMFS) are a heterogeneous group of rare blood disorders characterized by inadequate haematopoiesis, clonal evolution, and increased risk of leukaemia. Single nucleotide polymorphism arrays (SNP-A) have been proposed as a tool for surveillance of clonal evolution in BMFS. To better understand the natural history of BMFS and to assess the clinical utility of SNP-A in these disorders, we analysed 124 SNP-A from a comprehensively characterized cohort of 91 patients at our BMFS centre. SNP-A were correlated with medical histories, haematopathology, cytogenetic and molecular data. To assess clonal evolution, longitudinal analysis of SNP-A was performed in 25 patients. We found that acquired copy number-neutral loss of heterozygosity (CN-LOH) was significantly more frequent in acquired aplastic anaemia (aAA) than in other BMFS (odds ratio 12.2, p<0.01). Homozygosity by descent was most common in congenital BMFS, frequently unmasking autosomal recessive mutations. Copy number variants (CNVs) were frequently polymorphic, and we identified CNVs enriched in neutropenia and aAA. Our results suggest that acquired CN-LOH is a general phenomenon in aAA that is probably mechanistically and prognostically distinct from typical CN-LOH of myeloid malignancies. Our analysis of clinical utility of SNP-A shows the highest yield of detecting new clonal haematopoiesis at diagnosis and at relapse.
PMCID: PMC3986350  PMID: 24116929
bone marrow failure; aplastic anaemia; chromosomal rearrangements; clonal evolution; cytogenetic diagnosis
2.  Urine Is Not Sterile: Use of Enhanced Urine Culture Techniques To Detect Resident Bacterial Flora in the Adult Female Bladder 
Journal of Clinical Microbiology  2014;52(3):871-876.
Our previous study showed that bacterial genomes can be identified using 16S rRNA sequencing in urine specimens of both symptomatic and asymptomatic patients who are culture negative according to standard urine culture protocols. In the present study, we used a modified culture protocol that included plating larger volumes of urine, incubation under varied atmospheric conditions, and prolonged incubation times to demonstrate that many of the organisms identified in urine by 16S rRNA gene sequencing are, in fact, cultivable using an expanded quantitative urine culture (EQUC) protocol. Sixty-five urine specimens (from 41 patients with overactive bladder and 24 controls) were examined using both the standard and EQUC culture techniques. Fifty-two of the 65 urine samples (80%) grew bacterial species using EQUC, while the majority of these (48/52 [92%]) were reported as no growth at 10^3 CFU/ml by the clinical microbiology laboratory using the standard urine culture protocol. Thirty-five different genera and 85 different species were identified by EQUC. The most prevalent genera isolated were Lactobacillus (15%), followed by Corynebacterium (14.2%), Streptococcus (11.9%), Actinomyces (6.9%), and Staphylococcus (6.9%). Other genera commonly isolated include Aerococcus, Gardnerella, Bifidobacterium, and Actinobaculum. Our current study demonstrates that urine contains communities of living bacteria that comprise a resident female urine microbiota.
PMCID: PMC3957746  PMID: 24371246
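The proportions quoted in the abstract above can be re-derived from the counts it reports. The following is a minimal arithmetic check; the variable names are illustrative and not taken from the paper.

```python
# Counts as reported in the abstract above.
total_specimens = 65       # urine specimens examined
equc_positive = 52         # specimens that grew bacteria under EQUC
standard_no_growth = 48    # EQUC-positive specimens reported as no growth by standard culture

# Fraction of all specimens that grew bacteria under EQUC.
equc_rate = equc_positive / total_specimens

# Fraction of EQUC-positive specimens that standard culture reported as no growth.
missed_rate = standard_no_growth / equc_positive

print(f"EQUC positive: {equc_rate:.0%}")
print(f"Reported as no growth by standard culture: {missed_rate:.0%}")
```

Both values round to the percentages given in the abstract (80% and 92%).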
3.  The Female Urinary Microbiome: a Comparison of Women with and without Urgency Urinary Incontinence 
mBio  2014;5(4):e01283-14.
Bacterial DNA and live bacteria have been detected in human urine in the absence of clinical infection, challenging the prevailing dogma that urine is normally sterile. Urgency urinary incontinence (UUI) is a poorly understood urinary condition characterized by symptoms that overlap urinary infection, including urinary urgency and increased frequency with urinary incontinence. The recent discovery of the urinary microbiome warrants investigation into whether bacteria contribute to UUI. In this study, we used 16S rRNA gene sequencing to classify bacterial DNA and expanded quantitative urine culture (EQUC) techniques to isolate live bacteria in urine collected by using a transurethral catheter from women with UUI and, in comparison, a cohort without UUI. For these cohorts, we demonstrated that the UUI and non-UUI urinary microbiomes differ by group based on both sequence and culture evidence. Compared to the non-UUI microbiome, sequencing experiments revealed that the UUI microbiome was composed of increased Gardnerella and decreased Lactobacillus. Nine genera (Actinobaculum, Actinomyces, Aerococcus, Arthrobacter, Corynebacterium, Gardnerella, Oligella, Staphylococcus, and Streptococcus) were more frequently cultured from the UUI cohort. Although Lactobacillus was isolated from both cohorts, distinctions existed at the species level, with Lactobacillus gasseri detected more frequently in the UUI cohort and Lactobacillus crispatus most frequently detected in controls. Combined, these data suggest that potentially important differences exist in the urinary microbiomes of women with and without UUI, which have strong implications in prevention, diagnosis, or treatment of UUI.
New evidence indicates that the human urinary tract contains microbial communities; however, the role of these communities in urinary health remains to be elucidated. Urgency urinary incontinence (UUI) is a highly prevalent yet poorly understood urinary condition characterized by urgency, frequency, and urinary incontinence. Given the significant overlap of UUI symptoms with those of urinary tract infections, it is possible that UUI may have a microbial component. We compared the urinary microbiomes of women affected by UUI to those of a comparison group without UUI, using both high-throughput sequencing and extended culture techniques. We identified statistically significant differences in the frequency and abundance of bacteria present. These differences suggest a potential role for the urinary microbiome in female urinary health.
PMCID: PMC4161260  PMID: 25006228
5.  Mitochondrial Disease Genetic Diagnostics: Optimized whole-exome analysis for all MitoCarta nuclear genes and the mitochondrial genome 
Discovery medicine  2012;14(79):389-399.
Discovering causative genetic variants in individual cases of suspected mitochondrial disease requires interrogation of both the mitochondrial (mtDNA) and nuclear genomes. Whole-exome sequencing can support simultaneous dual-genome analysis, although currently available capture kits do not target the mtDNA genome and provide insufficient capture for some nuclear-encoded mitochondrial genes. To optimize interrogation of nuclear and mtDNA genes relevant to mitochondrial biology and disease, a custom SureSelect “Mito-Plus” whole-exome library was formulated by blending RNA “baits” from three separate designs: (A) Agilent Technologies SureSelectXT 50 Mb All Exon PLUS Targeted Enrichment Kit, (B) 16-gene nuclear panel targeting sequences for known MitoCarta proteins not included in the 50 Mb All-Exon design, and (C) sequences targeting the entire mtDNA genome. The final custom formulations consisted of a 1:1 ratio of nuclear baits to which a 1 to 1,000-fold diluted ratio of mtDNA genome baits were blended. Patient sample capture libraries were paired-end sequenced on an Illumina HiSeq 2000 system using v3.0 SBS chemistry. mtDNA genome coverage varied depending on the mtDNA:nuclear blend ratio, where a 1:100 ratio provided optimal dual-genome coverage with 10X coverage for over 97.5% of all targeted nuclear regions and 1,000X coverage for 99.8% of the mtDNA genome. mtDNA mutations were reliably detected to at least an 8% heteroplasmy level, as discriminated both from sequencing errors and potential contamination from nuclear mtDNA transcripts (Numts). The “1:100 Mito-Plus Whole-Exome” Agilent capture kit offers an optimized tool for whole-exome analysis of nuclear and mtDNA genes relevant to the diagnostic evaluation of mitochondrial disease.
PMCID: PMC3923327  PMID: 23272691
Exome; Capture; Mitochondria; MitoCarta; heteroplasmy; variants; Agilent; SureSelect; HiSeq; NUMT
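The 8% heteroplasmy detection floor mentioned in the abstract above can be illustrated with a simple read-count ratio. This is only a sketch under stated assumptions: the function names and read counts are hypothetical, and a plain threshold does not reproduce how the authors discriminated true variants from sequencing errors or NUMT contamination.

```python
DETECTION_FLOOR = 0.08  # 8% heteroplasmy level quoted in the abstract


def heteroplasmy_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at one mtDNA position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def is_reportable(alt_reads: int, total_reads: int, floor: float = DETECTION_FLOOR) -> bool:
    """True when the observed variant fraction reaches the detection floor."""
    return heteroplasmy_fraction(alt_reads, total_reads) >= floor


# Hypothetical read counts at 1,000X depth, for illustration only.
print(is_reportable(120, 1000))  # variant fraction 0.12
print(is_reportable(30, 1000))   # variant fraction 0.03
```

At the 1,000X mtDNA coverage reported in the abstract, an 8% variant corresponds to roughly 80 supporting reads.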
6.  AGC1 Deficiency Causes Infantile Epilepsy, Abnormal Myelination, and Reduced N-Acetylaspartate 
JIMD Reports  2014;14:77-85.
Background: Whole exome sequencing (WES) offers a powerful diagnostic tool to rapidly and efficiently sequence all coding genes in individuals presenting for consideration of phenotypically and genetically heterogeneous disorders such as suspected mitochondrial disease. Here, we report results of WES and functional validation in a consanguineous Indian kindred where two siblings presented with profound developmental delay, congenital hypotonia, refractory epilepsy, abnormal myelination, fluctuating basal ganglia changes, cerebral atrophy, and reduced N-acetylaspartate (NAA).
Methods: Whole blood DNA from one affected and one unaffected sibling was captured by Agilent SureSelect Human All Exon kit and sequenced on the Illumina HiSeq2000. Mutations were validated by Sanger sequencing in all family members. Protein from wild-type and mutant fibroblasts was isolated to assess mutation effects on protein expression and enzyme activity.
Results: A novel SLC25A12 homozygous missense mutation, c.1058G>A; p.Arg353Gln, segregated with disease in this kindred. SLC25A12 encodes the neuronal aspartate-glutamate carrier 1 (AGC1) protein, an essential component of the neuronal malate/aspartate shuttle that transfers NADH and H+ reducing equivalents from the cytosol to mitochondria. AGC1 activity enables neuronal export of aspartate, the glial substrate necessary for proper neuronal myelination. Recombinant mutant p.Arg353Gln AGC1 activity was reduced to 15% of wild type. One prior reported SLC25A12 mutation caused complete loss of AGC1 activity in a child with epilepsy, hypotonia, hypomyelination, and reduced brain NAA.
Conclusions: These data strongly suggest that SLC25A12 disease impairs neuronal AGC1 activity. SLC25A12 sequencing should be considered in children with infantile epilepsy, congenital hypotonia, global delay, abnormal myelination, and reduced brain NAA.
Electronic supplementary material
The online version of this chapter (doi:10.1007/8904_2013_287) contains supplementary material, which is available to authorized users.
PMCID: PMC4213337  PMID: 24515575
7.  NMNAT1 mutations cause Leber congenital amaurosis 
Nature genetics  2012;44(9):1040-1045.
Leber congenital amaurosis (LCA) is an infantile-onset form of inherited retinal degeneration characterized by severe vision loss [1, 2]. Two-thirds of LCA cases are caused by mutations in 17 known disease genes [3] (RetNet Retinal Information Network). Using exome sequencing, we identified a homozygous missense mutation (c.25G>A, p.Val9Met) in NMNAT1 as likely disease-causing in two siblings of a consanguineous Pakistani kindred affected by LCA. This mutation segregated with disease in their kindred, including in three other children with LCA. NMNAT1 resides in the previously identified LCA9 locus and encodes the nuclear isoform of nicotinamide mononucleotide adenylyltransferase, a rate-limiting enzyme in nicotinamide adenine dinucleotide (NAD+) biosynthesis [4, 5]. Functional studies showed the p.Val9Met mutation decreased NMNAT1 enzyme activity. Sequencing NMNAT1 in 284 unrelated LCA families identified 14 rare mutations in 13 additional affected individuals. These results are the first to link an NMNAT isoform to disease and indicate that NMNAT1 mutations cause LCA.
PMCID: PMC3454532  PMID: 22842227
8.  Mitochondrial genome sequence analysis: A custom bioinformatics pipeline substantially improves Affymetrix MitoChip v2.0 call rate and accuracy 
BMC Bioinformatics  2011;12:402.
Mitochondrial genome sequence analysis is critical to the diagnostic evaluation of mitochondrial disease. Existing methodologies differ widely in throughput, complexity, cost efficiency, and sensitivity of heteroplasmy detection. Affymetrix MitoChip v2.0, which uses a sequencing-by-genotyping technology, allows potentially accurate and high-throughput sequencing of the entire human mitochondrial genome to be completed in a cost-effective fashion. However, the relatively low call rate achieved using existing software tools has limited the wide adoption of this platform for either clinical or research applications. Here, we report the design and development of a custom bioinformatics software pipeline that achieves a much improved call rate and accuracy for the Affymetrix MitoChip v2.0 platform. We used this custom pipeline to analyze MitoChip v2.0 data from 24 DNA samples representing a broad range of tissue types (18 whole blood, 3 skeletal muscle, 3 cell lines), mutations (a 5.8 kilobase pair deletion and 6 known heteroplasmic mutations), and haplogroup origins. All results were compared to those obtained by at least one other mitochondrial DNA sequence analysis method, including Sanger sequencing, denaturing HPLC-based heteroduplex analysis, and/or the Illumina Genome Analyzer II next generation sequencing platform.
An average call rate of 99.75% was achieved across all samples with our custom pipeline. Comparison of calls for 15 samples characterized previously by Sanger sequencing revealed a total of 29 discordant calls, which translates to an estimated 0.012% for the base call error rate. We successfully identified 4 known heteroplasmic mutations and 24 other potential heteroplasmic mutations across 20 samples that passed quality control.
Affymetrix MitoChip v2.0 analysis using our optimized MitoChip Filtering Protocol (MFP) bioinformatics pipeline now offers the high sensitivity and accuracy needed for reliable, high-throughput and cost-efficient whole mitochondrial genome sequencing. This approach provides a viable alternative of potential utility for both clinical diagnostic and research applications to traditional Sanger and other emerging sequencing technologies for whole mitochondrial genome analysis.
PMCID: PMC3234255  PMID: 22011106
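The base call error rate quoted in the abstract above (29 discordant calls across 15 samples, about 0.012%) can be approximated with a short calculation. The abstract does not state the denominator, so this sketch assumes each sample contributes the full 16,569 bp of the human mitochondrial reference sequence; the true number of compared positions may differ slightly.

```python
# Values from the abstract.
discordant_calls = 29
samples_compared = 15

# Assumption (not stated in the abstract): every sample contributes the full
# length of the human mitochondrial reference genome.
MTDNA_LENGTH_BP = 16_569

compared_positions = samples_compared * MTDNA_LENGTH_BP
error_rate_percent = 100 * discordant_calls / compared_positions

print(f"Compared positions: {compared_positions:,}")
print(f"Estimated error rate: {error_rate_percent:.3f}%")
```

Under that assumption the estimate rounds to the 0.012% reported by the authors.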
9.  Genomic Alterations in Biliary Atresia Suggest Region of Potential Disease Susceptibility in 2q37.3 
Biliary atresia (BA) is a progressive, idiopathic obliteration of the extrahepatic biliary system occurring exclusively in the neonatal period. It is the most common disease leading to liver transplantation in children. The etiology of BA is unknown, although infectious, immune and genetic causes have been suggested. While the recurrence of BA in families is not common, there are more than 30 multiplex families reported and an underlying genetic susceptibility has been hypothesized. We screened a cohort of 35 BA patients for genomic alterations that might confer susceptibility to BA. DNA was genotyped on the Illumina Quad550 platform, which analyzes over 550,000 single nucleotide polymorphisms (SNPs) for genomic deletions and duplications. Areas of increased and decreased copy number were compared to those found in control populations. In order to identify regions that could serve as susceptibility factors for BA, we searched for regions that were found in BA patients, but not in controls. We identified two unrelated BA patients with overlapping heterozygous deletions of 2q37.3. Patient 1 had a 1.76 Mb (280 SNP), heterozygous deletion containing thirty genes. Patient 2 had a 5.87 Mb (1,346 SNP) heterozygous deletion containing fifty-five genes. The overlapping 1.76 Mb deletion on chromosome 2q37.3 from 240,936,900 to 242,692,820 constitutes the critical region and the genes within this region could be candidates for susceptibility to BA.
PMCID: PMC2914625  PMID: 20358598
Biliary atresia; copy number variation; deletion 2q37.3
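The critical region described in the abstract above is the overlap of two deletions, and its size follows directly from the stated coordinates. The sketch below computes the interval size and shows a generic overlap helper. Only the critical-region coordinates come from the abstract; the second interval is a hypothetical stand-in, since the abstract gives no coordinates for Patient 2.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start, end) in base pairs on the same chromosome


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the shared interval of a and b, or None if they do not overlap."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start < end else None


def size_mb(interval: Interval) -> float:
    """Interval length in megabases."""
    return (interval[1] - interval[0]) / 1_000_000


# Coordinates of the 2q37.3 critical region as given in the abstract.
critical_region: Interval = (240_936_900, 242_692_820)

# Hypothetical larger deletion that fully contains the critical region.
larger_deletion_hypothetical: Interval = (236_900_000, 242_770_000)

shared = overlap(critical_region, larger_deletion_hypothetical)
print(shared)
print(f"{size_mb(critical_region):.2f} Mb")
```

The coordinate difference is 1,755,920 bp, which matches the 1.76 Mb stated in the abstract.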
10.  Duplication of 7q34 in Pediatric Low-Grade Astrocytomas Detected by High-Density Single-Nucleotide Polymorphism-Based Genotype Arrays Results in a Novel BRAF Fusion Gene 
In the present study, DNA from 28 pediatric low-grade astrocytomas was analyzed using Illumina HumanHap550K single-nucleotide polymorphism oligonucleotide arrays. A novel duplication in chromosome band 7q34 was identified in 17 of 22 juvenile pilocytic astrocytomas and three of six fibrillary astrocytomas. The 7q34 duplication spans 2.6 Mb of genomic sequence and contains approximately 20 genes, including two candidate tumor genes, HIPK2 and BRAF. There were no abnormalities in HIPK2, and analysis of two mutation hot-spots in BRAF revealed a V600E mutation in only one tumor without the duplication. Fluorescence in situ hybridization confirmed the 7q34 copy number change and was suggestive of a tandem duplication. Reverse transcription polymerase chain reaction-based sequencing revealed a fusion product between KIAA1549 and BRAF. The predicted fusion product includes the BRAF kinase domain and lacks the auto-inhibitory N-terminus. Western blot analysis revealed phosphorylated mitogen-activated protein kinase (MAPK) protein in tumors with the duplication, consistent with BRAF-induced activation of the pathway. Further studies are required to determine the role of this fusion gene in downstream MAPK signaling and its role in development of pediatric low-grade astrocytomas.
PMCID: PMC2850204  PMID: 19016743
astrocytoma; BRAF; glioma; HIPK2; SNP array; 7q34
11.  Genomic analysis using high density SNP based oligonucleotide arrays and MLPA provides a comprehensive analysis of INI1/SMARCB1 in malignant rhabdoid tumors 
Translational Relevance
Previous reports suggested that abnormalities of INI1 could be detected in 70–75% of malignant rhabdoid tumors. The mechanism of inactivation in the other 25% remained unclear. The goal of this study was to perform a high-resolution genomic analysis of a large series of rhabdoid tumors with the expectation of identifying additional loci related to the initiation or progression of these malignancies. We also developed a comprehensive set of assays, including a new MLPA assay, to interrogate the INI1 locus in 22q11.2. Intragenic deletions could be detected using the Illumina 550K Beadchip, whereas single exon deletions could be detected using MLPA. The current study demonstrates that with a multi-platform approach, alterations at the INI1 locus can be detected in almost all cases. Thus, appropriate molecular genetic testing can be used as an aid in the diagnosis and for treatment planning for most patients.
A high-resolution genomic profiling and comprehensive targeted analysis of INI1/SMARCB1 of a large series of pediatric rhabdoid tumors was performed. The aim was to identify regions of copy number change and loss of heterozygosity that might pinpoint additional loci involved in the development or progression of rhabdoid tumors, and define the spectrum of genomic alterations of INI1 in this malignancy.
Experimental Design
A multi-platform approach, utilizing Illumina single nucleotide polymorphism (SNP) based oligonucleotide arrays, multiplex ligation dependent probe amplification (MLPA), fluorescence in situ hybridization (FISH), and coding sequence analysis was used to characterize genome wide copy number changes, loss of heterozygosity, and genomic alterations of INI1/SMARCB1 in a series of pediatric rhabdoid tumors.
The bi-allelic alterations of INI1 that led to inactivation were elucidated in 50 of 51 tumors. INI1 inactivation was demonstrated by a variety of mechanisms, including deletions, mutations, and loss of heterozygosity. The results from the array studies highlighted the complexity of rearrangements of chromosome 22, compared to the low frequency of alterations involving the other chromosomes.
The results from the genome wide SNP-array analysis suggest that INI1 is the primary tumor suppressor gene involved in the development of rhabdoid tumors with no second locus identified. In addition, we did not identify hot spots for the breakpoints in sporadic tumors with deletions of chromosome 22q11.2. By employing a multimodality approach, the wide spectrum of alterations of INI1 can be identified in the majority of patients, which increases the clinical utility of molecular diagnostic testing.
PMCID: PMC2668138  PMID: 19276269
INI1/SMARCB1; rhabdoid tumor; 22q11.2; SNP array; MLPA
12.  SNP array mapping of 20p deletions: Genotypes, Phenotypes and Copy Number Variation 
Human mutation  2009;30(3):371-378.
The use of array technology to define chromosome deletions and duplications is bringing us closer to establishing a genotype/phenotype map of genomic copy number alterations. We studied 21 patients and 5 relatives with deletions of the short arm of chromosome 20 using the Illumina HumanHap550 SNP array to 1) more accurately determine the deletion sizes, 2) identify and compare breakpoints, 3) establish genotype/phenotype correlations and 4) investigate the use of the HumanHap550 platform for analysis of chromosome deletions. Deletions ranged from 95kb to 14.62Mb, and all of the breakpoints were unique. Eleven patients had deletions between 95kb and 4Mb and these individuals had normal development, with no anomalies outside of those associated with Alagille syndrome. The proximal and distal boundaries of these eleven deletions constitute a 5.4Mb region, and we propose that haploinsufficiency for only 1 of the 12 genes in this region causes phenotypic abnormalities. This defines the JAG1 associated critical region, in which deletions do not confer findings other than those associated with Alagille syndrome. The other 10 patients had deletions between 3.28Mb and 14.62Mb, which extended outside the critical region, and notably, all of these patients had developmental delay. This group had other findings such as autism, scoliosis and bifid uvula. We identified 47 additional polymorphic genome-wide copy number variants (>20 SNPs), with 0–5 variants called per patient. Deletions of the short arm of chromosome 20 are associated with relatively mild and limited clinical anomalies. The use of SNP arrays provides accurate high-resolution definition of genomic abnormalities.
PMCID: PMC2650004  PMID: 19058200
SNP array analysis; 20p deletion; copy number variants; Alagille syndrome; haploinsufficiency; JAG1
13.  CNV Workshop: an integrated platform for high-throughput copy number variation discovery and clinical diagnostics 
BMC Bioinformatics  2010;11:74.
Recent studies have shown that copy number variations (CNVs) are frequent in higher eukaryotes and associated with a substantial portion of inherited and acquired risk for various human diseases. The increasing availability of high-resolution genome surveillance platforms provides opportunity for rapidly assessing research and clinical samples for CNV content, as well as for determining the potential pathogenicity of identified variants. However, few informatics tools for accurate and efficient CNV detection and assessment currently exist.
We developed a suite of software tools and resources (CNV Workshop) for automated, genome-wide CNV detection from a variety of SNP array platforms. CNV Workshop includes three major components: detection, annotation, and presentation of structural variants from genome array data. CNV detection utilizes a robust and genotype-specific extension of the Circular Binary Segmentation algorithm, and the use of additional detection algorithms is supported. Predicted CNVs are captured in a MySQL database that supports cohort-based projects and incorporates a secure user authentication layer and user/admin roles. To assist with determination of pathogenicity, detected CNVs are also annotated automatically for gene content, known disease loci, and gene-based literature references. Results are easily queried, sorted, filtered, and visualized via a web-based presentation layer that includes a GBrowse-based graphical representation of CNV content and relevant public data, integration with the UCSC Genome Browser, and tabular displays of genomic attributes for each CNV.
To our knowledge, CNV Workshop represents the first cohesive and convenient platform for detection, annotation, and assessment of the biological and clinical significance of structural variants. CNV Workshop has been successfully utilized for assessment of genomic variation in healthy individuals and disease cohorts and is an ideal platform for coordinating multiple associated projects.
Availability and Implementation
Available on the web at:
PMCID: PMC2827374  PMID: 20132550
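The abstract above states that CNV detection relies on an extension of the Circular Binary Segmentation algorithm. The sketch below is not that algorithm and not CNV Workshop code: it is a deliberately simplified, single-split binary segmentation that illustrates the core idea of locating a change in mean signal level along ordered probes. Full CBS additionally tests interior segments, assesses significance by permutation, and recurses. The input values are hypothetical log ratios.

```python
from math import sqrt
from typing import List, Tuple


def best_split(values: List[float]) -> Tuple[int, float]:
    """Return (index, score) of the single split that best separates two mean levels.

    Probes [0, index) form the left segment and [index, n) the right one.
    The score is the absolute difference in means, weighted by segment sizes.
    An index of 0 means no split scored above zero.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two probes")
    total = sum(values)
    left_sum = 0.0
    best_index, best_score = 0, 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        score = abs(left_mean - right_mean) * sqrt(i * (n - i) / n)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


# Hypothetical log ratios: ten probes at a normal level, then five in a deletion.
log_ratios = [0.0] * 10 + [-1.0] * 5
split, score = best_split(log_ratios)
print(split, round(score, 3))
```

In a real pipeline the candidate split would only be accepted after a significance test, and each resulting segment would be searched again for further change points.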
14.  Sequence mining and transcript profiling to explore cyst nematode parasitism 
BMC Genomics  2009;10:58.
Cyst nematodes are devastating plant parasites that become sedentary within plant roots and induce the transformation of normal plant cells into elaborate feeding cells with the help of secreted effectors, the parasitism proteins. These proteins are the translation products of parasitism genes and are secreted molecular tools that allow cyst nematodes to infect plants.
We present here the expression patterns of all previously described parasitism genes of the soybean cyst nematode, Heterodera glycines, in all major life stages except the adult male. These insights were gained by analyzing our gene expression dataset from experiments using the Affymetrix Soybean Genome Array GeneChip, which contains probeset sequences for 6,860 genes derived from preparasitic and parasitic H. glycines life stages. Targeting the identification of additional H. glycines parasitism-associated genes, we isolated 633 genes encoding secretory proteins using algorithms to predict secretory signal peptides. Furthermore, because some of the known H. glycines parasitism proteins have strongest similarity to proteins of plants and microbes, we searched for predicted protein sequences that showed their highest similarities to plant or microbial proteins and identified 156 H. glycines genes, some of which also contained a signal peptide. Analyses of the expression profiles of these genes allowed the formulation of hypotheses about potential roles in parasitism. This is the first study combining sequence analyses of a substantial EST dataset with microarray expression data of all major life stages (except adult males) for the identification and characterization of putative parasitism-associated proteins in any parasitic nematode.
We have established an expression atlas for all known H. glycines parasitism genes. Furthermore, in an effort to identify additional H. glycines genes with putative functions in parasitism, we have reduced the currently known 6,860 H. glycines genes to a pool of 788 most promising candidate genes (including known parasitism genes) and documented their expression profiles. Using our approach to pre-select genes likely involved in parasitism now allows detailed functional analyses in a manner not feasible for larger numbers of genes. The generation of the candidate pool described here is an important enabling advance because it will significantly facilitate the unraveling of fascinating plant-animal interactions and deliver knowledge that can be transferred to other pathogen-host systems. Ultimately, the exploration of true parasitism genes verified from the gene pool delineated here will identify weaknesses in the nematode life cycle that can be exploited by novel anti-nematode efforts.
PMCID: PMC2640417  PMID: 19183474
15.  Concept, Design and Implementation of a Cardiovascular Gene-Centric 50 K SNP Array for Large-Scale Genomic Association Studies 
PLoS ONE  2008;3(10):e3583.
A wealth of genetic associations for cardiovascular and metabolic phenotypes in humans has been accumulating over the last decade, in particular a large number of loci derived from recent genome wide association studies (GWAS). True complex disease-associated loci often exert modest effects, so their delineation currently requires integration of diverse phenotypic data from large studies to ensure robust meta-analyses. We have designed a gene-centric 50 K single nucleotide polymorphism (SNP) array to assess potentially relevant loci across a range of cardiovascular, metabolic and inflammatory syndromes. The array utilizes a “cosmopolitan” tagging approach to capture the genetic diversity across ∼2,000 loci in populations represented in the HapMap and SeattleSNPs projects. The array content is informed by GWAS of vascular and inflammatory disease, expression quantitative trait loci implicated in atherosclerosis, pathway based approaches and comprehensive literature searching. The custom flexibility of the array platform facilitated interrogation of loci at differing stringencies, according to a gene prioritization strategy that allows saturation of high priority loci with a greater density of markers than the existing GWAS tools, particularly in African HapMap samples. We also demonstrate that the IBC array can be used to complement GWAS, increasing coverage in high priority CVD-related loci across all major HapMap populations. DNA from over 200,000 extensively phenotyped individuals will be genotyped with this array with a significant portion of the generated data being released into the academic domain facilitating in silico replication attempts, analyses of rare variants and cross-cohort meta-analyses in diverse populations. These datasets will also facilitate more robust secondary analyses, such as explorations with alternative genetic models, epistasis and gene-environment interactions.
PMCID: PMC2571995  PMID: 18974833
16.  Divergent evolution of arrested development in the dauer stage of Caenorhabditis elegans and the infective stage of Heterodera glycines 
Genome Biology  2007;8(10):R211.
The generation and analysis of over 20,000 ESTs allowed the identification and expression profiling of 6,860 predicted genes in the nematode Heterodera glycines. This revealed that gene expression patterns in the dauer stage of Caenorhabditis elegans are not conserved in H. glycines.
The soybean cyst nematode Heterodera glycines is the most important parasite in soybean production worldwide. A comprehensive analysis of large-scale gene expression changes throughout the development of plant-parasitic nematodes has been lacking to date.
We report an extensive genomic analysis of H. glycines, beginning with the generation of 20,100 expressed sequence tags (ESTs). In-depth analysis of these ESTs plus approximately 1,900 previously published sequences predicted 6,860 unique H. glycines genes and allowed a classification by function using InterProScan. Expression profiling of all 6,860 genes throughout the H. glycines life cycle was undertaken using the Affymetrix Soybean Genome Array GeneChip. Our data sets and results represent a comprehensive resource for molecular studies of H. glycines. Demonstrating the power of this resource, we were able to address whether arrested development in the Caenorhabditis elegans dauer larva and the H. glycines infective second-stage juvenile (J2) exhibits shared gene expression profiles. We determined that the gene expression profiles associated with the C. elegans dauer pathway are not uniformly conserved in H. glycines and that the expression profiles of genes for metabolic enzymes of C. elegans dauer larvae and H. glycines infective J2 are dissimilar.
Our results indicate that hallmark gene expression patterns and metabolism features are not shared in the developmentally arrested life stages of C. elegans and H. glycines, suggesting that developmental arrest in these two nematode species has undergone more divergent evolution than previously thought and pointing to the need for detailed genomic analyses of individual parasite species.
PMCID: PMC2246285  PMID: 17919324
17.  Ty5 gag Mutations Increase Retrotransposition and Suggest a Role for Hydrogen Bonding in the Function of the Nucleocapsid Zinc Finger† 
Journal of Virology  2002;76(7):3240-3247.
The Ty5 retrotransposon of Saccharomyces paradoxus transposes in Saccharomyces cerevisiae at frequencies 1,000-fold lower than do the native Ty1 elements. The low transposition activity of Ty5 could be due to differences in cellular environments between these yeast species or to naturally occurring mutations in Ty5. By screening of a Ty5 mutant library, two single mutants (D252N and Y68C) were each found to increase transposition approximately sixfold. When combined, transposition increased 36-fold, implying that the two mutations act independently. Neither mutation affected Ty5 protein synthesis, processing, cDNA recombination, or target site choice. However, cDNA levels in both single mutants and the double mutant were significantly higher than in the wild type. The D252N mutation resides in the zinc finger of nucleocapsid and increases the potential for hydrogen bonding with nucleic acids. We generated other mutations that increase the hydrogen bonding potential (i.e., D252R and D252K) and found that they similarly increased transposition. This suggests that hydrogen bonding within the zinc finger motif is important for cDNA production and builds upon previous studies implicating basic amino acids flanking the zinc finger as important for zinc finger function. Although NCp zinc fingers differ from the zinc finger motifs of cellular enzymes, the requirement for efficient hydrogen bonding is likely universal.
PMCID: PMC136051  PMID: 11884548
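The independence inference in the Ty5 gag abstract above rests on simple arithmetic: if two mutations act independently, their fold effects on transposition should multiply. A minimal sketch of that check, using only the approximate fold changes quoted in the abstract; the function name `expected_combined_fold` is hypothetical and not taken from the paper.

```python
def expected_combined_fold(fold_a: float, fold_b: float) -> float:
    """Expected fold change if two mutations act independently (effects multiply)."""
    return fold_a * fold_b


# Approximate fold increases quoted in the abstract for the single mutants.
d252n_fold = 6
y68c_fold = 6

combined = expected_combined_fold(d252n_fold, y68c_fold)
print(combined)  # 36, the fold increase reported for the double mutant
```

An additive model would instead predict a combined increase well below the 36-fold value reported, which is why the multiplicative result is read as evidence of independent action.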
18.  Targeting of the Yeast Ty5 Retrotransposon to Silent Chromatin Is Mediated by Interactions between Integrase and Sir4p
Molecular and Cellular Biology  2001;21(19):6606-6614.
The Ty5 retrotransposons of Saccharomyces cerevisiae integrate preferentially into regions of silent chromatin at the telomeres and silent mating loci (HMR and HML). We define a Ty5-encoded targeting domain that spans 6 amino acid residues near the C terminus of integrase (LXSSXP). The targeting domain establishes silent chromatin when it is tethered to a weakened HMR-E silencer, and it disrupts telomeric silencing when it is overexpressed. As determined by both yeast two-hybrid and in vitro binding assays, the targeting domain interacts with the C terminus of Sir4p, a structural component of silent chromatin. This interaction is abrogated by mutations in the targeting domain that disrupt integration into silent chromatin, suggesting that recognition of Sir4p by the targeting domain is the primary determinant in Ty5 target specificity.
PMCID: PMC99806  PMID: 11533248
19.  Gene discovery using the maize genome database ZmDB 
Nucleic Acids Research  2000;28(1):94-96.
Zea mays DataBase (ZmDB) is a repository and analysis tool for sequence, expression and phenotype data of the major crop plant maize. The data accessible in ZmDB are mostly generated in a large collaborative project of maize gene discovery, sequencing and phenotypic analysis using a transposon tagging strategy and expressed sequence tag (EST) sequencing. ESTs constitute most of the current content. Database search tools, convenient links to external databases, and novel sequence analysis programs for spliced alignment are provided and together serve as an efficient protocol for gene discovery by sequence inspection. ZmDB can be accessed at http://zmdb. . ZmDB also provides web-based ordering of materials generated in the project, including EST and genomic DNA clones, seeds of mutant plants and microarrays of amplified EST and genomic DNA sequences.
PMCID: PMC102455  PMID: 10592191
20.  Prevalence of rare mitochondrial DNA mutations in mitochondrial disorders 
Journal of Medical Genetics  2013;50(10):704-714.
Mitochondrial DNA (mtDNA) diseases are rare disorders whose prevalence is estimated at around 1 in 5000. Patients are usually tested only for deletions and for common mutations of mtDNA, which account for 5–40% of cases, depending on the study. However, the prevalence of rare mtDNA mutations is not known.
We analysed the whole mtDNA in a cohort of 743 patients suspected of manifesting a mitochondrial disease, after excluding deletions and common mutations. Both heteroplasmic and homoplasmic variants were identified using two complementary strategies (Surveyor and MitoChip). Multiple correspondence analyses followed by hierarchical ascendant cluster process were used to explore relationships between clinical spectrum, age at onset and localisation of mutations.
Deleterious mutations were identified in 7.4% of patients and novel putative mutations in 22.4%. Pathogenic heteroplasmic mutations were more frequent than homoplasmic mutations (4.6% vs 2.8%). Patients carrying deleterious mutations showed symptoms before 16 years of age in 67% of cases. Early onset disease (<1 year) was significantly associated with mutations in protein coding genes (mainly in complex I) while late onset disorders (>16 years) were associated with mutations in tRNA genes. MTND5 and MTND6 genes were identified as ‘hotspots’ of mutations, with Leigh syndrome accounting for the large majority of associated phenotypes.
Rare mitochondrial DNA mutations probably account for more than 7.4% of patients with respiratory chain deficiency. This study shows that a comprehensive analysis of mtDNA is essential, and should include young children, for an accurate diagnosis that is now accessible with the development of next generation sequencing technology.
PMCID: PMC3786640  PMID: 23847141
Mitochondrial disease; Mitochondrial DNA; Rare mutations; Patient cohort
