1.  The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data 
Nucleic Acids Research  2013;42(Database issue):D966-D974.
The Human Phenotype Ontology (HPO) project, available at, provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online.
PMCID: PMC3965098  PMID: 24217912
2.  Finding Our Way through Phenotypes 
Deans, Andrew R. | Lewis, Suzanna E. | Huala, Eva | Anzaldo, Salvatore S. | Ashburner, Michael | Balhoff, James P. | Blackburn, David C. | Blake, Judith A. | Burleigh, J. Gordon | Chanet, Bruno | Cooper, Laurel D. | Courtot, Mélanie | Csösz, Sándor | Cui, Hong | Dahdul, Wasila | Das, Sandip | Dececchi, T. Alexander | Dettai, Agnes | Diogo, Rui | Druzinsky, Robert E. | Dumontier, Michel | Franz, Nico M. | Friedrich, Frank | Gkoutos, George V. | Haendel, Melissa | Harmon, Luke J. | Hayamizu, Terry F. | He, Yongqun | Hines, Heather M. | Ibrahim, Nizar | Jackson, Laura M. | Jaiswal, Pankaj | James-Zorn, Christina | Köhler, Sebastian | Lecointre, Guillaume | Lapp, Hilmar | Lawrence, Carolyn J. | Le Novère, Nicolas | Lundberg, John G. | Macklin, James | Mast, Austin R. | Midford, Peter E. | Mikó, István | Mungall, Christopher J. | Oellrich, Anika | Osumi-Sutherland, David | Parkinson, Helen | Ramírez, Martín J. | Richter, Stefan | Robinson, Peter N. | Ruttenberg, Alan | Schulz, Katja S. | Segerdell, Erik | Seltmann, Katja C. | Sharkey, Michael J. | Smith, Aaron D. | Smith, Barry | Specht, Chelsea D. | Squires, R. Burke | Thacker, Robert W. | Thessen, Anne | Fernandez-Triana, Jose | Vihinen, Mauno | Vize, Peter D. | Vogt, Lars | Wall, Christine E. | Walls, Ramona L. | Westerfeld, Monte | Wharton, Robert A. | Wirkner, Christian S. | Woolley, James B. | Yoder, Matthew J. | Zorn, Aaron M. | Mabee, Paula
PLoS Biology  2015;13(1):e1002033.
Imagine if we could compute across phenotype data as easily as genomic data; this article calls for efforts to realize this vision and discusses the potential benefits.
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.
PMCID: PMC4285398  PMID: 25562316
3.  The main pulmonary artery in adults: a controlled multicenter study with assessment of echocardiographic reference values, and the frequency of dilatation and aneurysm in Marfan syndrome 
Echocardiographic upper normal limits of both main pulmonary artery (MPA) diameters (MPA-d) and ratio of MPA to aortic root diameter (MPA-r) are not defined in healthy adults. Accordingly, frequency of MPA dilatation based on echocardiography remains to be assessed in adults with Marfan syndrome (MFS).
We enrolled 123 normal adults (72 men, 52 women aged 42 ± 14 years) and 98 patients with MFS (42 men, 56 women aged 39 ± 14 years) in a retrospective cross-sectional observational controlled study in four tertiary care centers. We defined outcome measures including upper normal limits of MPA-d and MPA-r as the 95% quantile in normal persons, MPA dilatation as diameters > upper normal limits, MPA aneurysm as diameters >4 cm, and indication for surgery as MPA diameters >6 cm.
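The outcome definitions above can be sketched in a few lines of Python. This is only an illustration: the abstract does not say which quantile estimator the authors used, so linear interpolation between order statistics is an assumption, and the function names and the control diameters are hypothetical. Only the 4 cm and 6 cm thresholds come from the text.

```python
def quantile(values, q):
    """Quantile by linear interpolation between order statistics (assumed estimator)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def classify_mpa(diameter_cm, upper_normal_limit_cm):
    """Apply the study's outcome definitions to one MPA diameter."""
    if diameter_cm > 6.0:
        return "surgery indication"   # diameters >6 cm
    if diameter_cm > 4.0:
        return "aneurysm"             # diameters >4 cm
    if diameter_cm > upper_normal_limit_cm:
        return "dilatation"           # above the upper normal limit
    return "normal"


# Hypothetical control diameters in cm, for illustration only.
control_diameters = [1.8, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5]
limit = quantile(control_diameters, 0.95)
print(limit, classify_mpa(3.1, limit))
```

The checks run from the largest threshold downward, so each diameter gets the most severe label that applies.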
MPA diameters revealed normal distribution without correlation to age, sex, body weight, body height, body mass index and body surface area. The upper normal limit was 2.6 cm (95% confidence interval (CI) =2.44-2.76 cm) for MPA-d, and 1.05 (95% CI = .86–1.24) for MPA-r. MPA dilatation presented in 6 normal persons (4.9%) and in 68 MFS patients (69.4%; P < .001), MPA aneurysm presented only in MFS (15 patients; 15.3%; P < .001), and no patient required surgery. Mean MPA-r were increased in MFS (P < .001), but ratios >1.05 were equally frequent in 7 normal persons (5%) and in 8 MFS patients (10.5%; P = .161). MPA-r related to aortic root diameters (P = .042), reduced left ventricular ejection fraction (P = .006), and increased pulmonary artery systolic pressures (P = .040). No clinical manifestations of MFS and no FBN1 mutation characteristics related to MPA diameters.
We established 2.6 cm for MPA-d and 1.05 for MPA-r as upper normal limits. MFS exhibits a high prevalence of MPA dilatation and aneurysm. However, patients may require MPA surgery only in scarce circumstances, most likely because formation of marked MPA aneurysm may require LV dysfunction and increased PASP.
PMCID: PMC4272795  PMID: 25491897
Pulmonary artery; Marfan syndrome; FBN1; Echocardiography; Reference values
5.  Pseudoautosomal Region 1 Length Polymorphism in the Human Population 
PLoS Genetics  2014;10(11):e1004578.
The human sex chromosomes differ in sequence, except for the pseudoautosomal regions (PAR) at the terminus of the short and the long arms, denoted as PAR1 and PAR2. The boundary between PAR1 and the unique X and Y sequences was established during the divergence of the great apes. During a copy number variation screen, we noted a paternally inherited chromosome X duplication in 15 independent families. Subsequent genomic analysis demonstrated that an insertional translocation of X chromosomal sequence into the Y chromosome generates an extended PAR. The insertion is generated by non-allelic homologous recombination between a 548 bp LTR6B repeat within the Y chromosome PAR1 and a second LTR6B repeat located 105 kb from the PAR boundary on the X chromosome. The identification of the reciprocal deletion on the X chromosome in one family and the occurrence of the variant in different chromosome Y haplogroups demonstrate this is a recurrent genomic rearrangement in the human population. This finding represents a novel mechanism shaping sex chromosomal evolution.
Author Summary
The human sex chromosomes differ in sequence, except for homologous sequences at both ends, termed the pseudoautosomal regions (PAR1 and PAR2). PAR enables the pairing of chromosomes Y and X during meiosis. The PARs are located at the termini of respectively the short and long arms of chromosomes X and Y. The observation of gradual shortening of the Y chromosome over evolutionary time has led to speculations that the Y chromosome is “doomed to extinction.” However, the Y chromosome has been shaped over evolution not only by the loss of genes, but also by addition of genes as a result of interchromosomal exchanges. In this work, we identified males with a duplication on chromosome Xp22.33 of about 136 kb as an incidental finding during a copy number variation screen. We demonstrate that the duplicon is an insertional translocation due to non-allelic homologous recombination from the X to the Y chromosome that is flanked by a long terminal repeat (LTR6B). We show this translocation event has occurred independently multiple times and that the duplicated region recombines with the X chromosome. Therefore, the duplicated region represents an extension of the pseudoautosomal region, representing a novel mechanism shaping sex chromosomal evolution in humans.
PMCID: PMC4222609  PMID: 25375121
6.  Doubly heterozygous LMNA and TTN mutations revealed by exome sequencing in a severe form of dilated cardiomyopathy 
European Journal of Human Genetics  2013;21(10):1105-1111.
Familial dilated cardiomyopathy (DCM) is a heterogeneous disease; although 30 disease genes have been discovered, they explain no more than half of all cases; in addition, the causes of intra-familial variability in DCM have remained largely unknown. In this study, we exploited the use of whole-exome sequencing (WES) to investigate the causes of clinical variability in an extended family with 14 affected subjects, four of whom showed particularly severe manifestations of cardiomyopathy requiring heart transplantation in early adulthood. This analysis, followed by confirmative conventional sequencing, identified the mutation p.K219T in the lamin A/C gene in all 14 affected patients. An additional variant in the gene for titin, p.L4855F, was identified in the severely affected patients. The age at heart transplantation was substantially lower for LMNA:p.K219T/TTN:p.L4855F double heterozygotes than for LMNA:p.K219T single heterozygotes. Myocardial specimens of doubly heterozygous individuals showed increased nuclear length, sarcomeric disorganization, and myonuclear clustering compared with samples from single heterozygotes. In conclusion, our results show that WES can be used for the identification of causal and modifier variants in families with variable manifestations of DCM. In addition, they not only indicate that LMNA and TTN mutational status may be useful in this family for risk stratification in individuals at risk for DCM but also suggest titin as a modifier for DCM.
PMCID: PMC3778353  PMID: 23463027
familial dilated cardiomyopathy; lamin A/C; titin; whole-exome sequencing; modifying variant
7.  Genomic data sharing for translational research and diagnostics 
Genome Medicine  2014;6(9):78.
Translational genomics is changing, not only in the technology used but also in the sharing of data. The enormous potential for genomics technologies to improve patient care has been recognized, but it will not be reached unless powerful but secure data-sharing technologies are developed. A recent study demonstrates the power of federated queries, in which sequence variants can be searched simultaneously in files distributed over multiple centers.
PMCID: PMC4254431  PMID: 25473437
8.  Deletions of chromosomal regulatory boundaries are associated with congenital disease 
Genome Biology  2014;15(9):423.
Recent data from genome-wide chromosome conformation capture analysis indicate that the human genome is divided into conserved megabase-sized self-interacting regions called topological domains. These topological domains form the regulatory backbone of the genome and are separated by regulatory boundary elements or barriers. Copy-number variations can potentially alter the topological domain architecture by deleting or duplicating the barriers and thereby allowing enhancers from neighboring domains to ectopically activate genes causing misexpression and disease, a mutational mechanism that has recently been termed enhancer adoption.
We use the Human Phenotype Ontology database to relate the phenotypes of 922 deletion cases recorded in the DECIPHER database to monogenic diseases associated with genes in or adjacent to the deletions. We identify combinations of tissue-specific enhancers and genes adjacent to the deletion and associated with phenotypes in the corresponding tissue, whereby the phenotype matched that observed in the deletion. We compare this computationally with a gene-dosage pathomechanism that attempts to explain the deletion phenotype based on haploinsufficiency of genes located within the deletions. Up to 11.8% of the deletions could be best explained by enhancer adoption or a combination of enhancer adoption and gene-dosage effects.
Our results suggest that enhancer adoption caused by deletions of regulatory boundaries may contribute to a substantial minority of copy-number variation phenotypes and should thus be taken into account in their medical interpretation.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0423-1) contains supplementary material, which is available to authorized users.
PMCID: PMC4180961  PMID: 25315429
9.  Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome 
Usher syndrome is an autosomal recessive disorder characterized by both deafness and blindness. For the three clinical subtypes of Usher syndrome, causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is well suited to a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach with those of Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield.
PMCID: PMC4190874  PMID: 25333064
Gene panel diagnostics; next-generation sequencing; Usher syndrome
10.  Getting Ready for the Human Phenome Project: The 2012 Forum of the Human Variome Project 
Human Mutation  2013;34(4):661-666.
A forum of the Human Variome Project (HVP) was held as a satellite to the 2012 Annual Meeting of the American Society of Human Genetics in San Francisco, California. The theme of this meeting was “Getting Ready for the Human Phenome Project.” Understanding the genetic contribution to both rare single-gene “Mendelian” disorders and more complex common diseases will require integration of research efforts among many fields and better defined phenotypes. The HVP is dedicated to bringing together researchers and research populations throughout the world to provide the resources to investigate the impact of genetic variation on disease. To this end, there needs to be a greater sharing of phenotype and genotype data. For this to occur, many databases that currently exist will need to become interoperable to allow for the combining of cohorts with similar phenotypes to increase statistical power for studies attempting to identify novel disease genes or causative genetic variants. Improved systems and tools that enhance the collection of phenotype data from clinicians are urgently needed. This meeting begins the HVP’s effort toward this important goal.
PMCID: PMC4130157  PMID: 23401191
meeting report; database; phenotype; database interoperability
11.  Walking the interactome for candidate prioritization in exome sequencing studies of Mendelian diseases 
Bioinformatics  2014;30(22):3215-3222.
Motivation: Whole-exome sequencing (WES) has opened up previously unheard of possibilities for identifying novel disease genes in Mendelian disorders, only about half of which have been elucidated to date. However, interpretation of WES data remains challenging.
Results: Here, we analyze protein–protein association (PPA) networks to identify candidate genes in the vicinity of genes previously implicated in a disease. The analysis, using a random-walk with restart (RWR) method, is adapted to the setting of WES by developing a composite variant-gene relevance score based on the rarity, location and predicted pathogenicity of variants and the RWR evaluation of genes harboring the variants. Benchmarking using known disease variants from 88 disease-gene families reveals that the correct gene is ranked among the top 10 candidates in ≥50% of cases, a figure which we confirmed using a prospective study of disease genes identified in 2012 and PPA data produced before that date. We implement our method in a freely available Web server, ExomeWalker, that displays a ranked list of candidates together with information on PPAs, frequency and predicted pathogenicity of the variants to allow quick and effective searches for candidates that are likely to reward closer investigation.
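A minimal sketch of random walk with restart on a small undirected network may help make the RWR step concrete. It is not the ExomeWalker implementation: the restart probability, the convergence tolerance, the function name and the gene labels are all assumptions chosen for illustration, and the composite variant-gene relevance score described above is not modeled.

```python
def random_walk_with_restart(edges, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    Iterates p_next = (1 - restart) * W * p + restart * p0, where W spreads
    each node's probability evenly over its neighbors and p0 is uniform
    over the seed genes (seeds are assumed to occur in the edge list).
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    nodes = sorted(neighbors)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical path-shaped network: KNOWN - CAND_A - CAND_B - CAND_C
toy_edges = [("KNOWN", "CAND_A"), ("CAND_A", "CAND_B"), ("CAND_B", "CAND_C")]
scores = random_walk_with_restart(toy_edges, seeds={"KNOWN"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

In this toy network, candidates closer to the known disease gene receive higher scores, which is the proximity signal the ranking relies on.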
Availability and implementation:
PMCID: PMC4221119  PMID: 25078397
12.  Clinical phenotype-based gene prioritization: an initial study using semantic similarity and the human phenotype ontology 
BMC Bioinformatics  2014;15(1):248.
Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, variant interpretation relative to patient phenotype can be challenging in some scenarios, particularly clinical assessment of rare complex phenotypes. Each patient’s sequence reveals many possibly damaging variants that must be individually assessed to establish clear association with patient phenotype. To assist interpretation, we implemented an algorithm that ranks a given set of genes relative to patient phenotype. The algorithm orders genes by the semantic similarity computed between phenotypic descriptors associated with each gene and those describing the patient. Phenotypic descriptor terms are taken from the Human Phenotype Ontology (HPO) and semantic similarity is derived from each term’s information content.
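The ranking idea can be sketched as follows. The abstract says only that similarity is derived from information content, so the Resnik-style measure here (information content of the most informative common ancestor), the best-match averaging, and the toy ontology, term names and gene names are all assumptions rather than the authors' exact method.

```python
import math


def ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotations, parents):
    """IC(t) = -log(fraction of genes annotated to t or one of its descendants)."""
    counts = {}
    for terms in annotations.values():
        covered = set()
        for t in terms:
            covered |= ancestors(t, parents)
        for t in covered:
            counts[t] = counts.get(t, 0) + 1
    total = len(annotations)
    return {t: -math.log(c / total) for t, c in counts.items()}


def resnik(t1, t2, parents, ic):
    """IC of the most informative ancestor shared by the two terms."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic.get(t, 0.0) for t in common), default=0.0)


def rank_genes(patient_terms, annotations, parents):
    """Order genes by the mean best-match similarity to the patient's terms."""
    ic = information_content(annotations, parents)
    scores = {}
    for gene, terms in annotations.items():
        best = [max(resnik(p, g, parents, ic) for g in terms) for p in patient_terms]
        scores[gene] = sum(best) / len(best)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy ontology (child -> parents) and gene annotations.
toy_parents = {"ear": ["root"], "hearing_loss": ["ear"],
               "eye": ["root"], "cataract": ["eye"]}
toy_annotations = {"GENE_EAR": ["hearing_loss"],
                   "GENE_EYE": ["cataract"],
                   "GENE_BOTH": ["ear", "eye"]}
ranking = rank_genes(["hearing_loss"], toy_annotations, toy_parents)
print(ranking)
```

A gene annotated only with a less specific term such as "ear" still matches the patient, but with a lower score than an exact match. This is the imprecision effect the validation below quantifies.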
Model validation was performed via simulation and with clinical data. We simulated 33 Mendelian diseases with 100 patients per disease. We modeled clinical conditions by adding noise and imprecision, i.e. phenotypic terms unrelated to the disease and terms less specific than the actual disease terms. We ranked the causative gene against all 2488 HPO annotated genes. The median causative gene rank was 1 for the optimal and noise cases, 12 for the imprecision case, and 60 for the imprecision with noise case. Additionally, we examined a clinical cohort of subjects with hearing impairment. The disease gene median rank was 22. However, when also considering the patient’s exome data and filtering non-exomic and common variants, the median rank improved to 3.
Semantic similarity can rank a causative gene highly within a gene list relative to patient phenotype characteristics, provided that imprecision is mitigated. The clinical case results suggest that phenotype rank combined with variant analysis provides significant improvement over the individual approaches. We expect that this combined prioritization approach may increase accuracy and decrease effort for clinical genetic diagnosis.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-248) contains supplementary material, which is available to authorized users.
PMCID: PMC4117966  PMID: 25047600
Clinical; Phenotype; Exome; Genome; Informatics
13.  The influence of disease categories on gene candidate predictions from model organism phenotypes 
Journal of Biomedical Semantics  2014;5(Suppl 1):S4.
The molecular etiology is still to be identified for about half of the currently described Mendelian diseases in humans, thereby hindering efforts to find treatments or preventive measures. Advances, such as new sequencing technologies, have led to increasing amounts of data becoming available with which to address the problem of identifying disease genes. Therefore, automated methods are needed that reliably predict disease gene candidates based on available data. We have recently developed Exomiser as a tool for identifying causative variants from exome analysis results by filtering and prioritising using a number of criteria including the phenotype similarity between the disease and mouse mutants involving the gene candidates. Initial investigations revealed a variation in performance for different medical categories of disease, due in part to a varying contribution of the phenotype scoring component.
In this study, we further analyse the performance of our cross-species phenotype matching algorithm, and examine in more detail the reasons why disease gene filtering based on phenotype data works better for certain disease categories than others. We found that in addition to misleading phenotype alignments between species, some disease categories are still more amenable to automated predictions than others, and that this often ties in with community perceptions on how well the organism works as a model.
In conclusion, our automated disease gene candidate predictions are highly dependent on the organism used for the predictions and the disease category being studied. Future work on computational disease gene prediction using phenotype data would benefit from methods that take into account the disease category and the source of model organism data.
PMCID: PMC4108905  PMID: 25093073
14.  Phenotype Ontologies and Cross-Species Analysis for Translational Research 
PLoS Genetics  2014;10(4):e1004268.
The use of model organisms as tools for the investigation of human genetic variation has significantly and rapidly advanced our understanding of the aetiologies underlying hereditary traits. However, while equivalences in the DNA sequence of two species may be readily inferred through evolutionary models, the identification of equivalence in the phenotypic consequences resulting from comparable genetic variation is far from straightforward, limiting the value of the modelling paradigm. In this review, we provide an overview of the emerging statistical and computational approaches to objectively identify phenotypic equivalence between human and model organisms with examples from the vertebrate models, mouse and zebrafish. Firstly, we discuss enrichment approaches, which deem the most frequent phenotype among the orthologues of a set of genes associated with a common human phenotype as the orthologous phenotype, or phenolog, in the model species. Secondly, we introduce and discuss computational reasoning approaches to identify phenotypic equivalences made possible through the development of intra- and interspecies ontologies. Finally, we consider the particular challenges involved in modelling neuropsychiatric disorders, which illustrate many of the remaining difficulties in developing comprehensive and unequivocal interspecies phenotype mappings.
PMCID: PMC3974665  PMID: 24699242
15.  Marfan syndrome: an update of genetics, medical and surgical management 
Heart  2007;93(6):755-760.
PMCID: PMC1955191  PMID: 17502658
16.  Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research 
F1000Research  2014;2:30.
Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species.
We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases.
This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from
PMCID: PMC3799545  PMID: 24358873
17.  Observational Cohort Study of Ventricular Arrhythmia in Adults with Marfan Syndrome Caused by FBN1 Mutations 
PLoS ONE  2013;8(12):e81281.
Marfan syndrome is associated with ventricular arrhythmia but risk factors including FBN1 mutation characteristics require elucidation.
Methods and Results
We performed an observational cohort study of 80 consecutive adults (30 men, 50 women aged 42±15 years) with Marfan syndrome caused by FBN1 mutations. We assessed ventricular arrhythmia on baseline ambulatory electrocardiography as >10 premature ventricular complexes per hour (>10 PVC/h), as ventricular couplets (Couplet), or as non-sustained ventricular tachycardia (nsVT), and during 31±18 months of follow-up as ventricular tachycardia (VT) events (VTE) such as sudden cardiac death (SCD), and sustained ventricular tachycardia (sVT). We identified >10 PVC/h in 28 (35%), Couplet/nsVT in 32 (40%), and VTE in 6 patients (8%), including 3 with SCD (4%). PVC>10/h, Couplet/nsVT, and VTE exhibited increased N-terminal pro–brain natriuretic peptide serum levels (P<.001). All arrhythmias related to increased NT-proBNP (P<.001), where PVC>10/h and Couplet/nsVT also related to increased indexed end-systolic LV diameters (P = .024 and P = .020), to moderate mitral valve regurgitation (P = .018 and P = .003), and to prolonged QTc intervals (P = .001 and P = .006), respectively. Moreover, VTE related to mutations in exons 24–32 (P = .021). Kaplan–Meier analysis corroborated an association of VTE with increased NT-proBNP (P<.001) and with mutations in exons 24–32 (P<.001).
Marfan syndrome with causative FBN1 mutations is associated with an increased risk for arrhythmia, and affected persons may require life-long monitoring. Ventricular arrhythmia on electrocardiography, signs of myocardial dysfunction and mutations in exons 24–32 may be risk factors of VTE.
PMCID: PMC3862481  PMID: 24349050
18.  Bayesian ontology querying for accurate and noise-tolerant semantic searches 
Bioinformatics  2012;28(19):2502-2508.
Motivation: Ontologies provide a structured representation of the concepts of a domain of knowledge as well as the relations between them. Attribute ontologies are used to describe the characteristics of the items of a domain, such as the functions of proteins or the signs and symptoms of disease, which opens the possibility of searching a database of items for the best match to a list of observed or desired attributes. However, naive search methods do not perform well on realistic data because of noise in the data, imprecision in typical queries and because individual items may not display all attributes of the category they belong to.
Results: We present a method for combining ontological analysis with Bayesian networks to deal with noise, imprecision and attribute frequencies and demonstrate an application of our method as a differential diagnostic support system for human genetics.
Availability: We provide an implementation for the algorithm and the benchmark at
Supplementary Information: Supplementary Material for this article is available at Bioinformatics online.
PMCID: PMC3463114  PMID: 22843981
19.  Loss-of-function mutations in the IL-21 receptor gene cause a primary immunodeficiency syndrome 
A primary immunodeficiency syndrome caused by loss-of-function mutations in the IL-21 receptor exhibits impaired B, T, and NK cell function.
Primary immunodeficiencies (PIDs) represent exquisite models for studying mechanisms of human host defense. In this study, we report on two unrelated kindreds, with two patients each, who had cryptosporidial infections associated with chronic cholangitis and liver disease. Using exome and candidate gene sequencing, we identified two distinct homozygous loss-of-function mutations in the interleukin-21 receptor gene (IL21R; c.G602T, p.Arg201Leu and c.240_245delCTGCCA, p.C81_H82del). The IL-21RArg201Leu mutation causes aberrant trafficking of the IL-21R to the plasma membrane, abrogates IL-21 ligand binding, and leads to defective phosphorylation of signal transducer and activator of transcription 1 (STAT1), STAT3, and STAT5. We observed impaired IL-21–induced proliferation and immunoglobulin class-switching in B cells, cytokine production in T cells, and NK cell cytotoxicity. Our study indicates that human IL-21R deficiency causes an immunodeficiency and highlights the need for early diagnosis and allogeneic hematopoietic stem cell transplantation in affected children.
PMCID: PMC3600901  PMID: 23440042
20.  Filtering for Compound Heterozygous Sequence Variants in Non-Consanguineous Pedigrees 
PLoS ONE  2013;8(8):e70151.
The identification of disease-causing mutations in next-generation sequencing (NGS) data requires efficient filtering techniques. In patients with rare recessive diseases, compound heterozygosity of pathogenic mutations is the most likely inheritance model if the parents are non-consanguineous. We developed a web-based compound heterozygous filter that is suited for data from NGS projects and that is easy to use for non-bioinformaticians. We analyzed the power of compound heterozygous mutation filtering by deriving background distributions for healthy individuals from different ethnicities and studied the effectiveness in trios as well as more complex pedigree structures. While usually more than 30 genes harbor potential compound heterozygotes in single exomes, this number can be markedly reduced with every additional member of the pedigree that is included in the analysis. In a real data set with exomes of four family members, two sisters affected by Mabry syndrome and their healthy parents, the disease-causing gene PIGO, which harbors the pathogenic compound heterozygous variants, could be readily identified. Compound heterozygous filtering is an efficient means to reduce the number of candidate mutations in studies aiming at identifying recessive disease genes in non-consanguineous families. A web-server is provided to make this filtering strategy available at
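For a single parent-child trio, the core of such a filter can be sketched as below. This is not the authors' web server: the rule used here (the child is heterozygous at two or more variants in one gene, with at least one inherited only from the mother and at least one only from the father) is a common simplification assumed for illustration, and the record layout, genotype labels, gene names and variant identifiers are hypothetical.

```python
from collections import defaultdict


def compound_het_genes(variants):
    """Return genes with at least one maternal-only and one paternal-only
    heterozygous variant in the child.

    Each variant is a dict with keys "id", "gene", "child", "mother",
    "father"; genotypes are "ref", "het" or "hom" (assumed encoding).
    """
    maternal = defaultdict(list)
    paternal = defaultdict(list)
    for v in variants:
        if v["child"] != "het":
            continue
        if v["mother"] == "het" and v["father"] == "ref":
            maternal[v["gene"]].append(v["id"])
        elif v["father"] == "het" and v["mother"] == "ref":
            paternal[v["gene"]].append(v["id"])
    return {g: (maternal[g], paternal[g]) for g in maternal if g in paternal}


# Hypothetical trio genotypes, for illustration only.
trio_variants = [
    {"id": "v1", "gene": "GENE1", "child": "het", "mother": "het", "father": "ref"},
    {"id": "v2", "gene": "GENE1", "child": "het", "mother": "ref", "father": "het"},
    {"id": "v3", "gene": "GENE2", "child": "het", "mother": "het", "father": "ref"},
    {"id": "v4", "gene": "GENE2", "child": "het", "mother": "het", "father": "ref"},
    {"id": "v5", "gene": "GENE3", "child": "het", "mother": "ref", "father": "het"},
]
candidates = compound_het_genes(trio_variants)
print(candidates)
```

GENE2 is rejected because both of its variants come from the mother and so lie on the same parental haplotype. This shows how adding family members prunes the candidate list.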
PMCID: PMC3734130  PMID: 23940540
21.  Estimating exome genotyping accuracy by comparing to data from large scale sequencing projects 
Genome Medicine  2013;5(7):69.
With exome sequencing becoming a tool for mutation detection in routine diagnostics there is an increasing need for platform-independent methods of quality control. We present a genotype-weighted metric that allows comparison of all the variant calls of an exome to a high-quality reference dataset of an ethnically matched population. The exome-wide genotyping accuracy is estimated from the distance to this reference set, and does not require any further knowledge about data generation or the bioinformatics involved. The distances of our metric are visualized by non-metric multidimensional scaling and serve as an intuitive, standardizable score for the quality assessment of exome data.
PMCID: PMC3978951  PMID: 23902830
22.  MouseFinder: candidate disease genes from mouse phenotype data 
Human Mutation  2012;33(5):858-866.
Mouse phenotype data represents a valuable resource for the identification of disease-associated genes, especially where the molecular basis is unknown and there is no clue to the candidate gene’s function, pathway involvement or expression pattern. However, until recently these data have not been systematically used due to difficulties in mapping between clinical features observed in humans and mouse phenotype annotations. Here, we describe a semantic approach to solve this problem and demonstrate highly significant recall of known disease-gene associations and orthology relationships. A web application (MouseFinder) has been developed to allow users to search the results of our whole-phenome comparison of human and mouse. We demonstrate its use in identifying ARTN as a strong candidate gene within the 1p34.1-p32 mapped locus for a hereditary form of ptosis.
PMCID: PMC3327758  PMID: 22331800
phenotype; candidate disease genes; model organism; mouse
23.  Phenotypic overlap in the contribution of individual genes to CNV pathogenicity revealed by cross-species computational analysis of single-gene mutations in humans, mice and zebrafish 
Disease Models & Mechanisms  2012;6(2):358-372.
Numerous disease syndromes are associated with regions of copy number variation (CNV) in the human genome and, in most cases, the pathogenicity of the CNV is thought to be related to altered dosage of the genes contained within the affected segment. However, establishing the contribution of individual genes to the overall pathogenicity of CNV syndromes is difficult and often relies on the identification of potential candidates through manual searches of the literature and online resources. We describe here the development of a computational framework to comprehensively search phenotypic information from model organisms and single-gene human hereditary disorders, and thus speed the interpretation of the complex phenotypes of CNV disorders. There are currently more than 5000 human genes about which nothing is known phenotypically but for which detailed phenotypic information for the mouse and/or zebrafish orthologs is available. Here, we present an ontology-based approach to identify similarities between human disease manifestations and the mutational phenotypes in characterized model organism genes; this approach can therefore be used even in cases where there is little or no information about the function of the human genes. We applied this algorithm to detect candidate genes for 27 recurrent CNV disorders and identified 802 gene-phenotype associations, approximately half of which involved genes that were previously reported to be associated with individual phenotypic features and half of which were novel candidates. A total of 431 associations were made solely on the basis of model organism phenotype data. Additionally, we observed a striking, statistically significant tendency for individual disease phenotypes to be associated with multiple genes located within a single CNV region, a phenomenon that we denote as pheno-clustering. 
Many of the clusters also display statistically significant similarities in protein function or vicinity within the protein-protein interaction network. Our results provide a basis for understanding previously un-interpretable genotype-phenotype correlations in pathogenic CNVs and for mobilizing the large amount of model organism phenotype data to provide insights into human genetic disorders.
PMCID: PMC3597018  PMID: 23104991
24.  Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research 
F1000Research  2013;2:30.
Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species.
We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have integrated the data generation pipeline into our continuous integration system, ensuring stable and up-to-date releases.
This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from
PMCID: PMC3799545  PMID: 24358873
