1.  Towards an Ontological Representation of Resistance: The Case of MRSa 
This paper addresses a family of issues surrounding the biological phenomenon of resistance and its representation in realist ontologies. The treatments of resistance terms in various existing ontologies are examined and found to be either overly narrow, internally inconsistent, or otherwise problematic. We propose a more coherent characterization of resistance in terms of what we shall call blocking dispositions, which are collections of mutually coordinated dispositions which are of such a sort that they cannot undergo simultaneous realization within a single bearer. A definition of ‘protective resistance’ is proposed for use in the Infectious Disease Ontology (IDO) and we show how this definition can be used to characterize the antibiotic resistance in Methicillin-Resistant Staphylococcus aureus (MRSa). The ontological relations between entities in our MRSa case study are used alongside a series of logical inference rules to illustrate logical reasoning about resistance. A description logic representation of blocking dispositions is also provided. We demonstrate that our characterization of resistance is sufficiently general to cover two other cases of resistance in the infectious disease domain involving HIV and malaria.
PMCID: PMC2930208  PMID: 20206294
Infectious Disease Ontology; Basic Formal Ontology; MRSa
2.  The Antibody Genetics of Multiple Sclerosis: Comparing Next-Generation Sequencing to Sanger Sequencing 
We previously identified a distinct mutation pattern in the antibody genes of B cells isolated from cerebrospinal fluid (CSF) that can identify patients who have relapsing-remitting multiple sclerosis (RRMS) and patients with clinically isolated syndromes who will convert to RRMS. This antibody gene signature (AGS) was developed using Sanger sequencing of single B cells. While potentially helpful to patients, Sanger sequencing is not an assay that can be practically deployed in clinical settings. In order to provide AGS evaluations to patients as part of their diagnostic workup, we developed protocols to generate AGS scores using next-generation DNA sequencing (NGS) on CSF-derived cell pellets without the need to isolate single cells. This approach has the potential to increase the coverage of the B-cell population being analyzed, reduce the time needed to generate AGS scores, and may improve the overall performance of the AGS approach as a diagnostic test in the future. However, no investigations have focused on whether NGS-based repertoires will properly reflect antibody gene frequencies and somatic hypermutation patterns defined by Sanger sequencing. To address this issue, we isolated paired CSF samples from eight patients who either had MS or were at risk to develop MS. Here, we present data that antibody gene frequencies and somatic hypermutation patterns are similar in Sanger and NGS-based antibody repertoires from these paired CSF samples. In addition, AGS scores derived from the NGS database correctly identified the patients who initially had or subsequently converted to RRMS, with precision similar to that of the Sanger sequencing approach. Further investigation of the utility of the AGS in predicting conversion to MS using NGS-derived antibody repertoires in a larger cohort of patients is warranted.
PMCID: PMC4165282  PMID: 25278930
multiple sclerosis; B cell; antibody; Roche 454; next-generation sequencing
3.  A genome-wide association study of variants associated with acquisition of Staphylococcus aureus bacteremia in a healthcare setting 
Humans vary in their susceptibility to acquiring Staphylococcus aureus infection, and research suggests that there is a genetic basis for this variability. Several recent genome-wide association studies (GWAS) have identified variants that may affect susceptibility to infectious diseases, demonstrating the potential value of GWAS in this arena.
We conducted a GWAS to identify common variants associated with acquisition of S. aureus bacteremia (SAB) resulting from healthcare contact. We performed a logistic regression analysis to compare patients with healthcare contact who developed SAB (361 cases) to patients with healthcare contact in the same hospital who did not develop SAB (699 controls), testing 542,410 SNPs and adjusting for age (by decade), sex, and 6 significant principal components from our EIGENSTRAT analysis. Additionally, we evaluated the joint effect of the host and pathogen genomes in association with severity of SAB infection via logistic regression, including an interaction of host SNP with bacterial genotype, and adjusting for age (by decade), sex, the 6 significant principal components, and dialysis status. Bonferroni corrections were applied in both analyses to control for multiple comparisons.
Ours is the first study that has attempted to evaluate the entire human genome for variants potentially involved in the acquisition or severity of SAB. Although this study identified no common variant of large effect size to have genome-wide significance for association with either the risk of acquiring SAB or severity of SAB, the variant (rs2043436) most significantly associated with severity of infection is located in a biologically plausible candidate gene (CDON, a member of the immunoglobulin family) and may warrant further study.
The genetic architecture underlying SAB is likely to be complex. Future investigations using larger samples, narrowed phenotypes, and advances in both genotyping and analytical methodologies will be important tools for identifying causative variants for this common and serious cause of healthcare-associated infection.
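The Bonferroni correction mentioned in the methods above can be illustrated with a short calculation. This is a minimal sketch, not the authors' analysis code: the SNP count (542,410) comes from the abstract, while the family-wise error rate of 0.05 and the function names are assumptions made for illustration.

```python
# Bonferroni-adjusted per-SNP significance threshold (illustrative sketch).
N_SNPS = 542_410      # number of SNPs tested, as stated in the abstract
FAMILY_ALPHA = 0.05   # ASSUMED family-wise error rate; not stated in the abstract


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_significant(p_value: float,
                   alpha: float = FAMILY_ALPHA,
                   n_tests: int = N_SNPS) -> bool:
    """True if a single SNP's p-value survives the Bonferroni correction."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(FAMILY_ALPHA, N_SNPS)
print(f"per-SNP threshold: {threshold:.3e}")
```

Under these assumptions a SNP must reach a p-value on the order of 1e-7 to count as genome-wide significant, which helps explain why no common variant reached significance with 361 cases.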
PMCID: PMC3928605  PMID: 24524581
Genomics; Genome-wide association study; Case–control study; Staphylococcus aureus; Bacteremia; Gram-positive bacterial infections; Polymorphism, single-nucleotide; Infections; Nosocomial; Cross infection
4.  Design and Evaluation of a Bacterial Clinical Infectious Diseases Ontology 
With antimicrobial resistance increasing worldwide, there is a great need to use automated antimicrobial decision support systems (ADSSs) to lower antimicrobial resistance rates by promoting appropriate antimicrobial use. However, they are infrequently used, mostly because of their poor interoperability with different health information technologies. Ontologies can augment portable ADSSs by providing an explicit knowledge representation for biomedical entities and their relationships, helping to standardize and integrate heterogeneous data resources. We developed a bacterial clinical infectious diseases ontology (BCIDO) using Protégé-OWL. BCIDO defines a controlled terminology for clinical infectious diseases along with domain knowledge commonly used in hospital settings for clinical infectious disease treatment decision-making. BCIDO has 599 classes and 2355 object properties. Terms were imported from or mapped to Systematized Nomenclature of Medicine, Unified Medical Language System, RxNorm and National Center for Biotechnology Information Organismal Classification where possible. Domain expert evaluation using the “laddering” technique, ontology visualization, and clinical notes and scenarios confirmed the correctness and potential usefulness of BCIDO.
PMCID: PMC3900194  PMID: 24551353
5.  CD19-targeted T cells rapidly induce molecular remissions in adults with chemotherapy-refractory acute lymphoblastic leukemia 
Science translational medicine  2013;5(177):177ra38.
Adults with relapsed B-acute lymphoblastic leukemia (ALL) have a dismal prognosis. Only those patients able to achieve a second remission with no minimal residual disease (MRD−) have a hope for long-term survival in the context of a subsequent allogeneic hematopoietic stem cell transplantation (allo-HSCT). We have treated 5 relapsed B-ALL subjects with autologous T cells expressing a CD19-specific CD28/CD3ζ second generation dual-signaling chimeric antigen receptor (CAR) termed 19-28z. All patients with persistent morphological disease or MRD+ disease upon T cell infusion demonstrated rapid tumor eradication and achieved MRD-negative complete remissions as assessed by deep sequencing PCR. Therapy was well tolerated although significant cytokine elevations, specifically observed in those patients with morphologic evidence of disease at the time of treatment, required lymphotoxic steroid therapy to ameliorate cytokine-mediated toxicities. Significantly, cytokine elevations directly correlated to tumor burden at the time of CAR modified T cell infusions. Tumor cells from one patient with relapsed disease after CAR modified T cell therapy, ineligible for additional allo-HSCT therapy, exhibited persistent expression of CD19 and sensitivity to autologous 19-28z T cell mediated cytotoxicity suggesting potential clinical benefit of additional CAR modified T cell infusions. These results demonstrate the marked anti-tumor efficacy of 19-28z CAR modified T cells in patients with relapsed/refractory B-ALL and the reliability of this novel therapy to induce profound molecular remissions, an ideal bridge to potentially curative therapy with subsequent allo-HSCT.
PMCID: PMC3742551  PMID: 23515080
6.  Ontology for Vector Surveillance and Management 
Journal of medical entomology  2013;50(1):1-14.
Ontologies, which are made up of standardized and defined controlled vocabulary terms and their interrelationships, are comprehensive and readily searchable repositories for knowledge in a given domain. The Open Biomedical Ontologies (OBO) Foundry was initiated in 2001 with the aims of becoming an “umbrella” for life-science ontologies and promoting the use of ontology development best practices. A software application (OBO-Edit; *.obo file format) was developed to facilitate ontology development and editing. The OBO Foundry now comprises over 100 ontologies and candidate ontologies, including the NCBI organismal classification ontology (NCBITaxon), the Mosquito Insecticide Resistance Ontology (MIRO), the Infectious Disease Ontology (IDO), the IDOMAL malaria ontology, and ontologies for mosquito gross anatomy and tick gross anatomy. We previously developed a disease data management system for dengue and malaria control programs, which incorporated a set of information trees built upon ontological principles, including a “term tree” to promote the use of standardized terms. In the course of doing so, we realized that there were substantial gaps in existing ontologies with regard to concepts, processes, and, especially, physical entities (e.g., vector species, pathogen species, and vector surveillance and management equipment) in the domain of surveillance and management of vectors and vector-borne pathogens. We therefore produced an ontology for vector surveillance and management, focusing on arthropod vectors and vector-borne pathogens with relevance to humans or domestic animals, and with special emphasis on content to support operational activities through inclusion in databases, data management systems, or decision support systems. The Vector Surveillance and Management Ontology (VSMO) includes >2,200 unique terms, of which the vast majority (>80%) were newly generated during the development of this ontology.
One core feature of the VSMO is the linkage, through the has_vector relation, of arthropod species to the pathogenic microorganisms for which they serve as biological vectors. We also recognized and addressed a potential roadblock for use of the VSMO by the vector-borne disease community: the difficulty in extracting information from OBO-Edit ontology files (*.obo files) and exporting the information to other file formats. A novel ontology explorer tool was developed to facilitate extraction and export of information from the VSMO *.obo file into lists of terms and their associated unique IDs in *.txt or *.csv file formats. These lists can then be imported into a database or data management system for use as select lists with predefined terms. This is an important step to ensure that the knowledge contained in our ontology can be put into practical use.
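The extraction step described above (pulling terms and their unique IDs out of an *.obo file into a *.csv list) can be sketched as follows. This is not the authors' ontology explorer tool; it is a minimal illustration that assumes the standard OBO stanza layout (`[Term]` headers followed by `id:` and `name:` tags). The function names and the `EX:` identifiers in the sample are hypothetical.

```python
import csv
import io


def parse_obo_terms(obo_text):
    """Return (id, name) pairs for every [Term] stanza in OBO-format text."""
    terms = []
    current = None  # dict for the [Term] stanza being read, or None otherwise
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins: store the previous term, if there was one.
            if current is not None and "id" in current:
                terms.append((current["id"], current.get("name", "")))
            current = {} if line == "[Term]" else None
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag in ("id", "name") and tag not in current:
                # Drop any trailing "! comment" that OBO allows on a line.
                current[tag] = value.split(" ! ")[0].strip()
    if current is not None and "id" in current:
        terms.append((current["id"], current.get("name", "")))
    return terms


def terms_to_csv(terms):
    """Serialize (id, name) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "name"])
    writer.writerows(terms)
    return buffer.getvalue()


# Hypothetical sample; the identifiers are placeholders, not real VSMO IDs.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0000001
name: example vector species

[Typedef]
id: has_vector
name: has_vector

[Term]
id: EX:0000002
name: example pathogen species
"""

sample_terms = parse_obo_terms(SAMPLE_OBO)
sample_csv = terms_to_csv(sample_terms)
print(sample_csv)
```

The resulting CSV text could then be loaded into a database table and used as a select list of predefined terms, as the abstract describes.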
PMCID: PMC3695545  PMID: 23427646
ontology; vector; pathogen; surveillance; management
7.  Haplotype Association Mapping Identifies a Candidate Gene Region in Mice Infected With Staphylococcus aureus 
G3: Genes|Genomes|Genetics  2012;2(6):693-700.
Exposure to Staphylococcus aureus has a variety of outcomes, from asymptomatic colonization to fatal infection. Strong evidence suggests that host genetics play an important role in susceptibility, but the specific host genetic factors involved are not known. The availability of genome-wide single nucleotide polymorphism (SNP) data for inbred Mus musculus strains means that haplotype association mapping can be used to identify candidate susceptibility genes. We applied haplotype association mapping to Perlegen SNP data and kidney bacterial counts from Staphylococcus aureus-infected mice from 13 inbred strains and detected an associated block on chromosome 7. Strong experimental evidence supports the result: a separate study demonstrated the presence of a susceptibility locus on chromosome 7 using consomic mice. The associated block contains no genes, but lies within the gene cluster of the 26-member extended kallikrein gene family, whose members have well-recognized roles in the generation of antimicrobial peptides and the regulation of inflammation. Efficient mixed-model association (EMMA) testing of all SNPs with two alleles and located within the gene cluster boundaries finds two significant associations: one of the three polymorphisms defining the associated block and one in the gene closest to the block, Klk1b11. In addition, we find that 7 of the 26 kallikrein genes are differentially expressed between susceptible and resistant mice, including the Klk1b11 gene. These genes represent a promising set of candidate genes influencing susceptibility to Staphylococcus aureus.
PMCID: PMC3362298  PMID: 22690378
host genetic susceptibility; infectious disease; kallikrein gene family
8.  Hematopoietic Cell Types: Prototype for a Revised Cell Ontology 
The Cell Ontology (CL) aims for the representation of in vivo and in vitro cell types from all of biology. The CL is a candidate reference ontology of the OBO Foundry and requires extensive revision to bring it up to current standards for biomedical ontologies, both in its structure and its coverage of various subfields of biology. We have now addressed the specific content of one area of the CL, the section of the ontology dealing with hematopoietic cells. This section has been extensively revised to improve its content and eliminate multiple inheritance in the asserted hierarchy, and the groundwork was laid for structuring the hematopoietic cell type terms as cross-products incorporating logical definitions built from relationships to external ontologies, such as the Protein Ontology and the Gene Ontology. The methods and improvements to the CL in this area represent a paradigm for improvement of the entire ontology over time.
PMCID: PMC2892030  PMID: 20123131
ontology; hematopoietic cells; immunology
9.  Memory B cells from a subset of treatment naïve relapsing remitting multiple sclerosis patients elicit CD4+ T cell proliferation and IFN-γ production in response to MBP and MOG 
European journal of immunology  2010;40(10):2942-2956.
Recent evidence suggests that B and T cell interactions may be paramount in relapsing remitting multiple sclerosis (RRMS) disease pathogenesis. We hypothesized that memory B cell pools from RRMS patients may specifically harbor a subset of potent neuro-antigen presenting cells that support neuro-antigen reactive T cell proliferation and cytokine secretion. To test this hypothesis, we compared CD80 and HLA-DR expression, IL-10 and LTα secretion, neuro-antigen binding capacity, and neuro-antigen presentation by memory B cells from RRMS patients to naïve B cells from RRMS patients and to memory and naïve B cells from healthy donors (HD). We identified memory B cells from some RRMS patients that elicited CD4+ T cell proliferation and IFN-γ secretion in response to myelin basic protein (MBP) and myelin oligodendrocyte glycoprotein (MOG). Notwithstanding the fact that the phenotypic parameters that promote efficient antigen presentation were observed to be similar between RRMS and HD memory B cells, a corresponding capability to elicit CD4+ T cell proliferation in response to MBP and MOG was not observed in HD memory B cells. Our results demonstrate for the first time that the memory B cell pool in RRMS harbors neuro-antigen specific B cells that can activate T cells.
PMCID: PMC3072802  PMID: 20812237
multiple sclerosis; B cells; autoimmunity; antigen presentation
10.  Logical Development of the Cell Ontology 
BMC Bioinformatics  2011;12:6.
The Cell Ontology (CL) is an ontology for the representation of in vivo cell types. As biological ontologies such as the CL grow in complexity, they become increasingly difficult to use and maintain. By making the information in the ontology computable, we can use automated reasoners to detect errors and assist with classification. Here we report on the generation of computable definitions for the hematopoietic cell types in the CL.
Computable definitions for over 340 CL classes have been created using a genus-differentia approach. These define cell types according to multiple axes of classification such as the protein complexes found on the surface of a cell type, the biological processes participated in by a cell type, or the phenotypic characteristics associated with a cell type. We employed automated reasoners to verify the ontology and to reveal mistakes in manual curation. The implementation of this process exposed areas in the ontology where new cell type classes were needed to accommodate species-specific expression of cellular markers. Our use of reasoners also inferred new relationships within the CL, and between the CL and the contributing ontologies. This restructured ontology can be used to identify immune cells by flow cytometry, supports sophisticated biological queries involving cells, and helps generate new hypotheses about cell function based on similarities to other cell types.
Use of computable definitions enhances the development of the CL and supports the interoperability of OBO ontologies.
PMCID: PMC3024222  PMID: 21208450
11.  Two Genes on A/J Chromosome 18 Are Associated with Susceptibility to Staphylococcus aureus Infection by Combined Microarray and QTL Analyses 
PLoS Pathogens  2010;6(9):e1001088.
Although it has recently been shown that A/J mice are highly susceptible to Staphylococcus aureus sepsis as compared to C57BL/6J, the specific genes responsible for this differential phenotype are unknown. Using chromosome substitution strains (CSS), we found that loci on chromosomes 8, 11, and 18 influence susceptibility to S. aureus sepsis in A/J mice. We then used two candidate gene selection strategies to identify genes on these three chromosomes associated with S. aureus susceptibility, and targeted genes identified by both gene selection strategies. First, we used whole genome transcription profiling to identify 191 (56 on chr. 8, 100 on chr. 11, and 35 on chr. 18) genes on our three chromosomes of interest that are differentially expressed between S. aureus-infected A/J and C57BL/6J. Second, we identified two significant quantitative trait loci (QTL) for survival post-infection on chr. 18 using N2 backcross mice (F1 [C18A]×C57BL/6J). Ten genes on chr. 18 (March3, Cep120, Chmp1b, Dcp2, Dtwd2, Isoc1, Lman1, Spire1, Tnfaip8, and Seh1l) mapped to the two significant QTL regions and were also identified by the expression array selection strategy. Using real-time PCR, 6 of these 10 genes (Chmp1b, Dtwd2, Isoc1, Lman1, Tnfaip8, and Seh1l) showed significantly different expression levels between S. aureus-infected A/J and C57BL/6J. For two (Tnfaip8 and Seh1l) of these 6 genes, siRNA-mediated knockdown of gene expression in S. aureus–challenged RAW264.7 macrophages induced significant changes in the cytokine response (IL-1 β and GM-CSF) compared to negative controls. These cytokine response changes were consistent with those seen in S. aureus-challenged peritoneal macrophages from CSS 18 mice (which contain A/J chromosome 18 but are otherwise C57BL/6J), but not C57BL/6J mice. These findings suggest that two genes, Tnfaip8 and Seh1l, may contribute to susceptibility to S. aureus in A/J mice, and represent promising candidates for human genetic susceptibility studies.
Author Summary
Staphylococcus aureus has a wide spectrum of human infection, ranging from asymptomatic nasal carriage to overwhelming sepsis and death. Mouse models offer an attractive strategy for investigating complex diseases such as S. aureus infections. A/J mice are highly susceptible to S. aureus infection compared with C57BL/6J mice. We showed that genes on chromosomes 8, 11, and 18 in A/J are responsible for susceptibility to S. aureus by using chromosome substitution strains (CSS). From the ∼4200 genes on these three chromosomes, we identified 191 which were differentially expressed between A/J and C57BL/6J when challenged with S. aureus. Next, we identified two significant QTLs on chromosome 18 that are associated with susceptibility to S. aureus infection in N2 backcross mice. Ten genes (March3, Cep120, Chmp1b, Dcp2, Dtwd2, Isoc1, Lman1, Spire1, Tnfaip8, and Seh1l) mapped to the two significant QTLs and were differentially expressed between A/J and C57BL/6J. One gene on each QTL, Tnfaip8 and Seh1l, affected expression of cytokines in mouse macrophages exposed to S. aureus. These cytokine response patterns were consistent with those seen in S. aureus-challenged peritoneal macrophages from CSS 18, but not C57BL/6J. Tnfaip8 and Seh1l are strong candidates for genes influencing susceptibility to S. aureus of A/J mice.
PMCID: PMC2932726  PMID: 20824097
12.  Conserved cryptic recombination signals in Vκ gene segments are cleaved in small pre-B cells 
BMC Immunology  2009;10:37.
The cleavage of recombination signals (RS) at the boundaries of immunoglobulin V, D, and J gene segments initiates the somatic generation of the antigen receptor genes expressed by B lymphocytes. RS contain a conserved heptamer and nonamer motif separated by non-conserved spacers of 12 or 23 nucleotides. Under physiologic conditions, V(D)J recombination follows the "12/23 rule" to assemble functional antigen-receptor genes, i.e., cleavage and recombination occur only between RS with dissimilar spacer types. Functional, cryptic RS (cRS) have been identified in VH gene segments; these VH cRS were hypothesized to facilitate self-tolerance by mediating VH → VHDJH replacements. At the Igκ locus, however, secondary, de novo rearrangements can delete autoreactive VκJκ joins. Thus, under the hypothesis that V-embedded cRS are conserved to facilitate self-tolerance by mediating V-replacement rearrangements, there would be little selection for Vκ cRS. Recent studies have demonstrated that VH cRS cleavage is only modestly more efficient than V(D)J recombination in violation of the 12/23 rule and first occurs in pro-B cells unable to interact with exogenous antigens. These results are inconsistent with a model of cRS cleavage during autoreactivity-induced VH gene replacement.
To test the hypothesis that cRS are absent from Vκ gene segments, a corollary of the hypothesis that the need for tolerizing VH replacements is responsible for the selection pressure to maintain VH cRS, we searched for cRS in mouse Vκ gene segments using a statistical model of RS. Scans of 135 mouse Vκ gene segments revealed highly conserved cRS that were shown to be cleaved in the 103/BCL2 cell line and mouse bone marrow B cells. Analogous to results for VH cRS, we find that Vκ cRS are conserved at multiple locations in Vκ gene segments and are cleaved in pre-B cells.
Our results, together with those for VH cRS, support a model of cRS cleavage in which cleavage is independent of BCR-specificity. Our results are inconsistent with the hypothesis that cRS are conserved solely to support receptor editing. The extent to which these sequences are conserved, and their pattern of conservation, suggest that they may serve an as yet unidentified purpose.
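The RS structure and the "12/23 rule" described in this abstract can be expressed as a small data structure and a compatibility check. This is a minimal sketch: the class and function names are hypothetical, the heptamer and nonamer shown are the commonly cited consensus sequences, and the spacers are placeholders rather than real sequence.

```python
from dataclasses import dataclass

HEPTAMER_LEN = 7
NONAMER_LEN = 9
VALID_SPACER_LENGTHS = (12, 23)


@dataclass(frozen=True)
class RecombinationSignal:
    """A recombination signal: heptamer, spacer of 12 or 23 nt, nonamer."""
    heptamer: str
    spacer: str
    nonamer: str

    def __post_init__(self):
        if len(self.heptamer) != HEPTAMER_LEN:
            raise ValueError("heptamer must be 7 nucleotides long")
        if len(self.nonamer) != NONAMER_LEN:
            raise ValueError("nonamer must be 9 nucleotides long")
        if len(self.spacer) not in VALID_SPACER_LENGTHS:
            raise ValueError("spacer must be 12 or 23 nucleotides long")

    @property
    def spacer_type(self) -> int:
        """Spacer length, which defines the signal type (12 or 23)."""
        return len(self.spacer)


def obeys_12_23_rule(a: RecombinationSignal, b: RecombinationSignal) -> bool:
    """True only when one signal has a 12-nt spacer and the other a 23-nt spacer."""
    return {a.spacer_type, b.spacer_type} == {12, 23}


# Consensus heptamer/nonamer with placeholder spacers ("N" = any nucleotide).
rs12 = RecombinationSignal("CACAGTG", "N" * 12, "ACAAAAACC")
rs23 = RecombinationSignal("CACAGTG", "N" * 23, "ACAAAAACC")

print(obeys_12_23_rule(rs12, rs23))
print(obeys_12_23_rule(rs12, rs12))
```

The check only models spacer compatibility. It says nothing about cleavage efficiency, which the abstract reports can be non-zero even for pairings that violate the rule.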
PMCID: PMC2711918  PMID: 19555491
13.  An improved ontological representation of dendritic cells as a paradigm for all cell types 
BMC Bioinformatics  2009;10:70.
Recent increases in the volume and diversity of life science data and information and an increasing emphasis on data sharing and interoperability have resulted in the creation of a large number of biological ontologies, including the Cell Ontology (CL), designed to provide a standardized representation of cell types for data annotation. Ontologies have been shown to have significant benefits for computational analyses of large data sets and for automated reasoning applications, leading to organized attempts to improve the structure and formal rigor of ontologies to better support computation. Currently, the CL employs multiple is_a relations, defining cell types in terms of histological, functional, and lineage properties, and the majority of definitions are written with sufficient generality to hold across multiple species. This approach limits the CL's utility for computation and for cross-species data integration.
To enhance the CL's utility for computational analyses, we developed a method for the ontological representation of cells and applied this method to develop a dendritic cell ontology (DC-CL). DC-CL subtypes are delineated on the basis of surface protein expression, systematically including both species-general and species-specific types and optimizing DC-CL for the analysis of flow cytometry data. We avoid multiple uses of is_a by linking DC-CL terms to terms in other ontologies via additional, formally defined relations such as has_function.
This approach brings benefits in the form of increased accuracy, support for reasoning, and interoperability with other ontology resources. Accordingly, we propose our method as a general strategy for the ontological representation of cells. DC-CL is available from .
PMCID: PMC2662812  PMID: 19243617
14.  Multiple, conserved cryptic recombination signals in VH gene segments: detection of cleavage products only in pro–B cells 
The Journal of Experimental Medicine  2007;204(13):3195-3208.
Receptor editing is believed to play the major role in purging newly formed B cell compartments of autoreactivity by the induction of secondary V(D)J rearrangements. In the process of immunoglobulin heavy (H) chain editing, these secondary rearrangements are mediated by direct VH-to-JH joining or cryptic recombination signals (cRSs) within VH gene segments. Using a statistical model of RS, we have identified potential cRSs within VH gene segments at conserved sites flanking complementarity-determining regions 1 and 2. These cRSs are active in extrachromosomal recombination assays and cleaved during normal B cell development. Cleavage of multiple VH cRSs was observed in the bone marrow of C57BL/6 and RAG2:GFP and μMT congenic animals, and we determined that cRS cleavage efficiencies are 30–50-fold lower than a physiological RS. cRS signal ends are abundant in pro–B cells, including those recovered from μMT mice, but undetectable in pre– or immature B cells. Thus, VH cRS cleavage regularly occurs before the generation of functional preBCR and BCR. Conservation of cRSs distal from the 3′ end of VH gene segments suggests a function for these cryptic signals other than VH gene replacement.
PMCID: PMC2150985  PMID: 18056287
15.  V(D)J Recombinase-Mediated Processing of Coding Junctions at Cryptic Recombination Signal Sequences in Peripheral T Cells during Human Development 
V(D)J recombinase mediates rearrangements at immune loci and cryptic recombination signal sequences (cRSS), resulting in a variety of genomic rearrangements in normal lymphocytes and leukemic cells from children and adults. The frequency at which these rearrangements occur and their potential pathologic consequences are developmentally dependent. To gain insight into V(D)J recombinase-mediated events during human development, we investigated 265 coding junctions associated with cRSS sites at the hypoxanthine-guanine phosphoribosyltransferase (HPRT) locus in peripheral T cells from 111 children during the late stages of fetal development through early adolescence. We observed a number of specific V(D)J recombinase processing features that were both age and gender dependent. In particular, TdT-mediated nucleotide insertions varied depending on age and gender, including percentage of coding junctions containing N-nucleotide inserts, predominance of GC nucleotides, and presence of inverted repeats (Pr-nucleotides) at processed coding ends. In addition, the extent of exonucleolytic processing of coding ends was inversely related to age. We also observed a coding-partner-dependent difference in exonucleolytic processing and an age-specific difference in the subtypes of V(D)J-mediated events. We investigated these age- and gender-specific differences with recombination signal information content analysis of the cRSS sites in the human HPRT locus to gain insight into the mechanisms mediating these developmentally specific V(D)J recombinase-mediated rearrangements in humans.
PMCID: PMC1937029  PMID: 17015725
16.  Reassignment of the murine 3'TRDD1 recombination signal sequence 
Immunogenetics  2006;58(11):895-903.
T cell receptor genes are assembled in developing T lymphocytes from discrete V, D and J genes by a site-specific somatic rearrangement mechanism. A flanking recombination signal, composed of a conserved heptamer and a semi-conserved nonamer separated by 12 or 23 variable nucleotides, targets the activity of the rearrangement machinery to the adjoining V, D and J genes. Following rearrangement of V, D or J genes, their respective recombination signals are ligated together. Although these signal joints are allegedly invariant, created by the head-to-head abuttal of the heptamers, some do exhibit junctional diversity. Recombination signals were initially identified by comparison and alignment of germ-line sequences with the sequence of rearranged genes. However, their overall low level of sequence conservation makes their characterization solely from sequence data difficult. Recently, computational analysis unravelled correlations between nucleotides at several positions scattered within the spacer and recombination activity, so that it is now possible to identify putative recombination signals and determine and predict their recombination efficiency. In this paper, we analyzed the variability introduced in signal joints generated after rearrangement of the TRDD1 and TRDD2 genes in murine thymocytes. The recurrent presence of identical nucleotides inserted in these signal joints led us to reconsider the location and sequence of the TRDD1 recombination signal. By combining molecular characterization and computational analysis, we show that the functional TRDD1 recombination signal is shifted inside the putative coding sequence of the TRDD1 gene, and consequently that this gene is shorter than indicated in the databases.
PMCID: PMC1876511  PMID: 17021860
Animals; Base Sequence; Gene Rearrangement, T-Lymphocyte; genetics; Mice; Molecular Sequence Data; Proteins; genetics; Recombination, Genetic; genetics; Sequence Analysis, DNA; Thymus Gland; immunology; T cell receptor gene; V(D)J recombination; recombination signal; signal joint; junctional diversity
17.  Prospective Estimation of Recombination Signal Efficiency and Identification of Functional Cryptic Signals in the Genome by Statistical Modeling 
The recombination signals (RS) that guide V(D)J recombination are phylogenetically conserved but retain a surprising degree of sequence variability, especially in the nonamer and spacer. To characterize RS variability, we computed the position-wise information, a measure correlated with sequence conservation, for each nucleotide position in an RS alignment and demonstrate that most position-wise information is present in the RS heptamers and nonamers. We have previously demonstrated significant correlations between RS positions and here show that statistical models of the correlation structure that underlies RS variability efficiently identify physiologic and cryptic RS and accurately predict the recombination efficiencies of natural and synthetic RS. In scans of mouse and human genomes, these models identify a highly conserved family of repetitive DNA as an unexpected source of frequent, cryptic RS that rearrange both in extrachromosomal substrates and in their genomic context.
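The idea of position-wise information in an alignment can be sketched with a short calculation. The abstract does not give the exact formula, so this example assumes one common definition for DNA: 2 bits minus the Shannon entropy of each alignment column. The function name and the toy four-sequence alignment are hypothetical and are not RS data.

```python
import math
from collections import Counter

MAX_BITS_DNA = 2.0  # log2(4): maximum information at a DNA position


def positionwise_information(alignment):
    """Return per-column information (bits) for equal-length DNA sequences.

    ASSUMED definition: 2 - H(column), where H is the Shannon entropy in bits.
    Fully conserved columns score 2.0; uniformly variable columns score 0.0.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")

    information = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in counts.values())
        information.append(MAX_BITS_DNA - entropy)
    return information


# Toy alignment: columns 1-2 conserved, column 3 partly variable, column 4 uniform.
toy_alignment = ["CACA", "CACT", "CAGG", "CATC"]
toy_information = positionwise_information(toy_alignment)
print(toy_information)
```

Applied to a real RS alignment, a profile of this kind would be expected to peak in the heptamer and nonamer and fall in the spacer, consistent with the abstract's statement. This sketch omits small-sample corrections that a careful analysis would likely include.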
PMCID: PMC2193808  PMID: 12538660
recombination signal sequence; cryptic recombination signal; recombination efficiency; recombination signal models; illegitimate V(D)J recombination
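The position-wise information mentioned in the abstract above can be illustrated with a minimal sketch. It assumes one common definition for DNA alignments, 2 bits minus the Shannon entropy of each column, which may differ from the exact measure used in the paper. The function name `positionwise_information` and the toy alignment are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def positionwise_information(alignment):
    """Return the information (in bits) of each column of a DNA alignment.

    Assumed definition: 2 - H(column), where H is the Shannon entropy of the
    observed base frequencies. A fully conserved column scores 2 bits.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    info = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column)
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        info.append(2.0 - entropy)
    return info


# Hypothetical toy alignment of heptamer-like sequences (illustration only).
toy_alignment = ["CACAGTG", "CACAGTG", "CACTGTG", "CACAATG"]
info = positionwise_information(toy_alignment)
for position, bits in enumerate(info, start=1):
    print(f"position {position}: {bits:.3f} bits")
```

Columns where every sequence carries the same base reach the 2-bit maximum, while variable columns score lower, which is the sense in which the measure tracks conservation. No small-sample correction is applied in this sketch.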
18.  Identification and utilization of arbitrary correlations in models of recombination signal sequences 
Genome Biology  2002;3(12):research0072.1-research0072.20.
A significant challenge in bioinformatics is to develop methods for detecting and modeling patterns in variable DNA sequence sites. Current approaches sometimes perform poorly when positions in the site do not independently affect protein binding. A statistical technique has been developed for modeling the correlation structure in variable DNA sequence sites.
A significant challenge in bioinformatics is to develop methods for detecting and modeling patterns in variable DNA sequence sites, such as protein-binding sites in regulatory DNA. Current approaches sometimes perform poorly when positions in the site do not independently affect protein binding. We developed a statistical technique for modeling the correlation structure in variable DNA sequence sites. The method places no restrictions on the number of correlated positions or on their spatial relationship within the site. No prior empirical evidence for the correlation structure is necessary.
We applied our method to the recombination signal sequences (RSS) that direct assembly of B-cell and T-cell antigen-receptor genes via V(D)J recombination. The technique is based on model selection by cross-validation and produces models that allow computation of an information score for any signal-length sequence. We also modeled RSS using order zero and order one Markov chains. The scores from all models are highly correlated with measured recombination efficiencies, but the models arising from our technique are better than the Markov models at discriminating RSS from non-RSS.
Our model-development procedure produces models that estimate well the recombinogenic potential of RSS and are better at RSS recognition than the order zero and order one Markov models. Our models are, therefore, valuable for studying the regulation of both physiologic and aberrant V(D)J recombination. The approach could be equally powerful for the study of promoter and enhancer elements, splice sites, and other DNA regulatory sites that are highly variable at the level of individual nucleotide positions.
PMCID: PMC151174  PMID: 12537561
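The order zero and order one Markov chain baselines named in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the chains here are position-independent, use add-one pseudocounts, and are trained on a hypothetical toy set. The function names and the training sequences are invented for the example.

```python
import math

ALPHABET = "ACGT"


def train_order0(seqs, pseudo=1.0):
    """Estimate base frequencies (order zero model) with pseudocounts."""
    counts = {b: pseudo for b in ALPHABET}
    for s in seqs:
        for b in s:
            counts[b] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in ALPHABET}


def train_order1(seqs, pseudo=1.0):
    """Estimate transition probabilities P(next | previous) with pseudocounts."""
    counts = {a: {b: pseudo for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }


def score_order0(seq, p0):
    """Log2-likelihood of seq when each base is drawn independently."""
    return sum(math.log2(p0[b]) for b in seq)


def score_order1(seq, p0, p1):
    """Log2-likelihood of seq when each base depends on the previous base."""
    score = math.log2(p0[seq[0]])
    score += sum(math.log2(p1[a][b]) for a, b in zip(seq, seq[1:]))
    return score


# Hypothetical toy training set (illustration only, not data from the paper).
training = ["CACAGTG", "CACAGTG", "CACTGTG", "CACAATG"]
p0 = train_order0(training)
p1 = train_order1(training)

for candidate in ("CACAGTG", "GGGGGGG"):
    print(candidate, score_order0(candidate, p0), score_order1(candidate, p0, p1))
```

The order one model can only capture dependence between adjacent bases, which is consistent with the abstract's point that a method placing no restriction on which positions are correlated may discriminate signals better.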
19.  A Functional Analysis of the Spacer of V(D)J Recombination Signal Sequences 
PLoS Biology  2003;1(1):e1.
During lymphocyte development, V(D)J recombination assembles antigen receptor genes from component V, D, and J gene segments. These gene segments are flanked by a recombination signal sequence (RSS), which serves as the binding site for the recombination machinery. The murine Jβ2.6 gene segment is a recombinationally inactive pseudogene, but examination of its RSS reveals no obvious reason for its failure to recombine. Mutagenesis of the Jβ2.6 RSS demonstrates that the sequences of the heptamer, nonamer, and spacer are all important. Strikingly, changes solely in the spacer sequence can result in dramatic differences in the level of recombination. The subsequent analysis of a library of more than 4,000 spacer variants revealed that spacer residues of particular functional importance are correlated with their degree of conservation. Biochemical assays indicate distinct cooperation between the spacer and heptamer/nonamer along each step of the reaction pathway. The results suggest that the spacer serves not only to ensure the appropriate distance between the heptamer and nonamer but also regulates RSS activity by providing additional RAG:RSS interaction surfaces. We conclude that while RSSs are defined by a “digital” requirement for absolutely conserved nucleotides, the quality of RSS function is determined in an “analog” manner by numerous complex interactions between the RAG proteins and the less-well conserved nucleotides in the heptamer, the nonamer, and, importantly, the spacer. Those modulatory effects are accurately predicted by a new computational algorithm for “RSS information content.” The interplay between such binary and multiplicative modes of interactions provides a general model for analyzing protein–DNA interactions in various biological systems.
Spacers not only ensure that the distance between the nonamer and heptamer is correct but also regulate recombination activity by providing protein-binding sites along the DNA sequences that affect recombination.
PMCID: PMC212687  PMID: 14551903
