2.  Next-generation sequencing (NGS) as a diagnostic tool for retinal degeneration reveals a much higher detection rate in early-onset disease 
Inherited retinal degeneration (IRD) is a common cause of visual impairment (prevalence ∼1/3500). There is considerable phenotype and genotype heterogeneity, making a specific diagnosis very difficult without molecular testing. We investigated targeted capture combined with next-generation sequencing using Nimblegen 12plex arrays and the Roche 454 sequencing platform to explore its potential for clinical diagnostics in two common types of IRD, retinitis pigmentosa and cone-rod dystrophy. Fifty patients (36 unknowns and 14 positive controls) were screened, and pathogenic mutations were identified in 25% of patients in the unknown group, rising to 53% in the early-onset cases. All patients with new mutations detected had an age of onset <21 years and 44% had a family history. Thirty-one percent of mutations detected were novel. A de novo mutation in rhodopsin was identified in one early-onset case without a family history. Bioinformatic pipelines were developed to identify likely pathogenic mutations and stringent criteria were used for assignment of pathogenicity. Analysis of sequencing metrics revealed significant variability in capture efficiency and depth of coverage. We conclude that targeted capture and next-generation sequencing are likely to be very useful in a diagnostic setting, but patients with earlier onset of disease are more likely to benefit from using this strategy. The mutation-detection rate suggests that many patients are likely to have mutations in novel genes.
doi:10.1038/ejhg.2012.172
PMCID: PMC3573204  PMID: 22968130
retinal degeneration; molecular diagnostics; next-generation sequencing
3.  Causes and Consequences of Chromatin Variation between Inbred Mice 
PLoS Genetics  2013;9(6):e1003570.
Variation at regulatory elements, identified through hypersensitivity to digestion by DNase I, is believed to contribute to variation in complex traits, but the extent and consequences of this variation are poorly characterized. Analysis of terminally differentiated erythroblasts in eight inbred strains of mice identified reproducible variation at approximately 6% of DNase I hypersensitive sites (DHS). Only 30% of such variable DHS contain a sequence variant predictive of site variation. Nevertheless, sequence variants within variable DHS are more likely to be associated with complex traits than those in non-variant DHS, and variants associated with complex traits preferentially occur in variable DHS. Changes at a small proportion (less than 10%) of variable DHS are associated with changes in nearby transcriptional activity. Our results show that whilst DNA sequence variation is not the major determinant of variation in open chromatin, where such variants exist they are likely to be causal for complex traits.
Author Summary
Regulatory sites of the genome affect gene expression and complex traits, including disease susceptibility. Variable regulatory sites are potentially interesting because they are a likely cause of phenotypic variation, providing a bridge between sequence and transcriptional variation. In this paper we identify regions of the genome where DNA is not wrapped up in chromatin (hence potentially regulatory) in eight inbred strains of mice. We compare sites that vary among strains with non-variable sites. We show that more than half of variable sites cannot be attributed to local sequence variation. Functional consequences (in terms of readily detectable changes in gene expression) are associated with less than 10% of variable DNase I hypersensitive sites. We show that variable sites are enriched for sequence variants contributing to complex traits in mice.
doi:10.1371/journal.pgen.1003570
PMCID: PMC3681629  PMID: 23785304
4.  Scaffolding low quality genomes using orthologous protein sequences 
Bioinformatics  2012;29(2):160-165.
Motivation: The ready availability of next-generation sequencing has led to a situation where it is easy to produce very fragmentary genome assemblies. We present a pipeline, SWiPS (Scaffolding With Protein Sequences), that uses orthologous proteins to improve low quality genome assemblies. The protein sequences are used as guides to scaffold existing contigs, while simultaneously allowing the gene structure to be predicted by homology.
Results: SWiPS does not depend on a high N50 or on whole proteins being encoded on a single contig. We tested our algorithm on simulated next-generation data from Ciona intestinalis, real next-generation data from Drosophila melanogaster, a complex genome assembly of Homo sapiens and the low coverage Sanger sequence assembly of Callorhinchus milii. The improvements in N50 are of the order of ∼20% for the C.intestinalis and H.sapiens assemblies, which is significant, considering the large size of intergenic regions in these eukaryotes. Using the CEGMA pipeline to assess the gene space represented in the genome assemblies, the number of genes retrieved increased by >110% for C.milii and from 20 to 40% for C.intestinalis. The scaffold error rates are low: 85–90% of scaffolds are fully correct, and >95% of local contig joins are correct.
Availability: SWiPS is available freely for download at http://www.well.ox.ac.uk/∼yli142/swips.html.
Contact: yang.li@well.ox.ac.uk or copley@well.ox.ac.uk
doi:10.1093/bioinformatics/bts661
PMCID: PMC3546802  PMID: 23162087
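The N50 statistic quoted in the SWiPS abstract can be illustrated with a minimal sketch. It is not taken from SWiPS; the function name `n50` and the contig lengths are hypothetical. N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that takes the running sum past half the
        # assembly size is the N50.
        if running >= half_total:
            return length
    raise ValueError("no lengths supplied")


# Hypothetical assembly before and after joining the two largest contigs.
contigs = [100, 80, 60, 40, 20]
scaffolds = [180, 60, 40, 20]
print(n50(contigs), n50(scaffolds))
```

Joining contigs into scaffolds leaves the total length unchanged but moves more of it into long sequences, which is why scaffolding raises N50.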
5.  Dynamic and Physical Clustering of Gene Expression during Epidermal Barrier Formation in Differentiating Keratinocytes 
PLoS ONE  2009;4(10):e7651.
The mammalian epidermis is a continually renewing structure that provides the interface between the organism and an innately hostile environment. The keratinocyte is its principal cell. Keratinocyte proteins form a physical epithelial barrier, protect against microbial damage, and prepare immune responses to danger. Epithelial immunity is disordered in many common diseases and disordered epithelial differentiation underlies many cancers. In order to identify the genes that mediate epithelial development we used a tissue model of the skin derived from primary human keratinocytes. We measured global gene expression in triplicate at five times over the ten days that the keratinocytes took to fully differentiate. We identified 1282 gene transcripts that significantly changed during differentiation (false discovery rate <0.01%). We robustly grouped these transcripts by K-means clustering into modules with distinct temporal expression patterns, shared regulatory motifs, and biological functions. We found a striking cluster of late expressed genes that form the structural and innate immune defences of the epithelial barrier. Gene Ontology analyses showed that undifferentiated keratinocytes were characterised by genes for motility and the adaptive immune response. We systematically identified calcium-binding genes, which may operate with the epidermal calcium gradient to control keratinocyte division during skin repair. The results provide multiple novel insights into keratinocyte biology, in particular providing a comprehensive list of known and previously unrecognised major components of the epidermal barrier. The findings provide a reference for subsequent understanding of how the barrier functions in health and disease.
doi:10.1371/journal.pone.0007651
PMCID: PMC2766255  PMID: 19888454
6.  The animal in the genome: comparative genomics and evolution 
Comparisons between completely sequenced metazoan genomes have generally emphasized how similar their encoded protein content is, even when the comparison is between phyla. Given the manifest differences between phyla and, in particular, intuitive notions that some animals are more complex than others, this creates something of a paradox. Simplistic explanations have included arguments such as increased numbers of genes; greater numbers of protein products produced through alternative splicing; increased numbers of regulatory non-coding RNAs and increased complexity of the cis-regulatory code. An obvious value of complete genome sequences lies in their ability to provide us with inventories of such components. I examine progress being made in linking genome content to the pattern of animal evolution, and argue that the gap between genomic and phenotypic complexity can only be understood through the totality of interacting components.
doi:10.1098/rstb.2007.2235
PMCID: PMC2614226  PMID: 18192189
comparative genomics; evolution; Metazoa; transcription factors; ultraconserved regions
7.  Genome-wide Association of Hypoxia-inducible Factor (HIF)-1α and HIF-2α DNA Binding with Expression Profiling of Hypoxia-inducible Transcripts 
The Journal of Biological Chemistry  2009;284(25):16767-16775.
Hypoxia-inducible factor (HIF) controls an extensive range of adaptive responses to hypoxia. To better understand this transcriptional cascade we performed genome-wide chromatin immunoprecipitation using antibodies to two major HIF-α subunits, and correlated the results with genome-wide transcript profiling. Within a tiled promoter array we identified 546 and 143 sequences that bound, respectively, to HIF-1α or HIF-2α at high stringency. Analysis of these sequences confirmed an identical core binding motif for HIF-1α and HIF-2α (RCGTG) but demonstrated that binding to this motif was highly selective, with binding enriched at distinct regions both upstream and downstream of the transcriptional start. Comparison of HIF-promoter binding data with bidirectional HIF-dependent changes in transcript expression indicated that whereas a substantial proportion of positive responses (>20% across all significantly regulated genes) are direct, HIF-dependent gene suppression is almost entirely indirect. Comparison of HIF-1α- versus HIF-2α-binding sites revealed that whereas some loci bound HIF-1α in isolation, many bound both isoforms with similar affinity. Despite high-affinity binding to multiple promoters, HIF-2α contributed to few, if any, of the transcriptional responses to acute hypoxia at these loci. Given emerging evidence for biologically distinct functions of HIF-1α versus HIF-2α understanding the mechanisms restricting HIF-2α activity will be of interest.
doi:10.1074/jbc.M901790200
PMCID: PMC2719312  PMID: 19386601
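As a rough illustration of the core binding motif named in the abstract (RCGTG, where the IUPAC code R stands for A or G), the sketch below scans both strands of a DNA string for matches. The function names and the example sequence are hypothetical, and the abstract itself stresses that binding to this motif is highly selective, so a sequence match alone does not imply HIF binding.

```python
import re

# Lookahead so that overlapping matches are all reported.
HRE = re.compile(r"(?=([AG]CGTG))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_rcgtg(seq):
    """Return sorted (start, strand) pairs for RCGTG matches.

    Start positions are 0-based and always refer to the forward strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), "+") for m in HRE.finditer(seq)]
    # A 5-mer starting at i on the reverse complement covers forward
    # positions n - i - 5 .. n - i - 1.
    hits += [(n - m.start() - 5, "-")
             for m in HRE.finditer(reverse_complement(seq))]
    return sorted(hits)


print(find_rcgtg("TTACGTGAA"))
```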
8.  POPE—a tool to aid high-throughput phylogenetic analysis 
Bioinformatics  2008;24(23):2778-2779.
Summary: POPE (Phylogeny, Ortholog and Paralog Extractor) provides an integrated platform for automatic ortholog identification. Intermediate steps can be visualized, modified and analyzed in order to assess and improve the underlying quality of orthology and paralogy assignments.
Availability: POPE is available for download from the website: http://www.well.ox.ac.uk/~tota/pope.
Contact: tota@well.ox.ac.uk
doi:10.1093/bioinformatics/btn533
PMCID: PMC2639271  PMID: 18849569
9.  A Common Genomic Framework for a Diverse Assembly of Plasmids in the Symbiotic Nitrogen Fixing Bacteria 
PLoS ONE  2008;3(7):e2567.
This work centres on the genomic comparisons of two closely-related nitrogen-fixing symbiotic bacteria, Rhizobium leguminosarum biovar viciae 3841 and Rhizobium etli CFN42. These strains maintain a stable genomic core that is also common to other rhizobia species plus a very variable and significant accessory component. The chromosomes are highly syntenic, whereas plasmids are related by fewer syntenic blocks and have mosaic structures. The pairs of plasmids p42f-pRL12, p42e-pRL11 and p42b-pRL9 as well as large parts of p42c with pRL10 are shown to be similar, whereas the symbiotic plasmids (p42d and pRL10) are structurally unrelated and seem to follow distinct evolutionary paths. Even though purifying selection is acting on the whole genome, the accessory component is evolving more rapidly. This component consists largely of proteins for the transport of diverse metabolites and elements of external origin. The present analysis allows us to conclude that a heterogeneous and quickly diversifying group of plasmids co-exists in a common genomic framework.
doi:10.1371/journal.pone.0002567
PMCID: PMC2434198  PMID: 18596979
10.  Dual Targeted Mitochondrial Proteins Are Characterized by Lower MTS Parameters and Total Net Charge 
PLoS ONE  2008;3(5):e2161.
Background
In eukaryotic cells, identical proteins can be located in different subcellular compartments (termed dual-targeted proteins).
Methodology/Principal Findings
We divided a reference set of mitochondrial proteins (published single gene studies) into two groups: i) Dual targeted mitochondrial proteins and ii) Exclusive mitochondrial proteins. Mitochondrial proteins were considered dual-targeted if they were also found or predicted to be localized to the cytosol, the nucleus, the endoplasmic reticulum (ER) or the peroxisome. We found that dual localized mitochondrial proteins have i) A weaker mitochondrial targeting sequence (MitoProtII score, hydrophobic moment and number of basic residues) and ii) a lower whole-protein net charge, when compared to exclusive mitochondrial proteins. We have also generated an annotation list of dual-targeted proteins within the predicted yeast mitochondrial proteome. This considerably large group of dual-localized proteins comprises approximately one quarter of the predicted mitochondrial proteome. We supported this prediction by experimental verification of a subgroup of the predicted dual targeted proteins.
Conclusions/Significance
Taken together, these results establish dual targeting as a widely abundant phenomenon that should affect our concepts of gene expression and protein function. Possible relationships between the MTS/mature sequence traits and protein dual targeting are discussed.
doi:10.1371/journal.pone.0002161
PMCID: PMC2367453  PMID: 18478128
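Two of the sequence properties compared in the abstract, whole-protein net charge and the number of basic residues in the targeting sequence, can be approximated with a very simple count. The sketch below is an assumption-laden illustration, not the authors' procedure: it treats K and R as +1 and D and E as -1, ignores histidine and the chain termini, and uses a hypothetical fixed N-terminal window in place of a predicted targeting sequence.

```python
def net_charge(protein):
    """Approximate net charge as (K + R) - (D + E)."""
    seq = protein.upper()
    positive = sum(seq.count(residue) for residue in "KR")
    negative = sum(seq.count(residue) for residue in "DE")
    return positive - negative


def basic_residue_count(protein, window=20):
    """Count K and R in the first `window` residues.

    The window length is a placeholder assumption standing in for a
    real mitochondrial targeting sequence prediction.
    """
    n_terminus = protein.upper()[:window]
    return sum(n_terminus.count(residue) for residue in "KR")


# Hypothetical toy peptide, not a real protein.
print(net_charge("MKKRLD"), basic_residue_count("MKKRLD", window=3))
```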
11.  De-Orphaning the Structural Proteome through Reciprocal Comparison of Evolutionarily Important Structural Features 
PLoS ONE  2008;3(5):e2136.
Function prediction frequently relies on comparing genes or gene products to search for relevant similarities. Because the number of protein structures with unknown function is mushrooming, however, we asked here whether such comparisons could be improved by focusing narrowly on the key functional features of protein structures, as defined by the Evolutionary Trace (ET). Therefore a series of algorithms was built to (a) extract local motifs (3D templates) from protein structures based on ET ranking of residue importance; (b) to assess their geometric and evolutionary similarity to other structures; and (c) to transfer enzyme annotation whenever a plurality was reached across matches. Whereas a prototype had only been 80% accurate and was not scalable, here a speedy new matching algorithm enabled large-scale searches for reciprocal matches and thus raised annotation specificity to 100% in both positive and negative controls of 49 enzymes and 50 non-enzymes, respectively—in one case even identifying an annotation error—while maintaining sensitivity (∼60%). Critically, this Evolutionary Trace Annotation (ETA) pipeline requires no prior knowledge of functional mechanisms. It could thus be applied in a large-scale retrospective study of 1218 structural genomics enzymes and reached 92% accuracy. Likewise, it was applied to all 2935 unannotated structural genomics proteins and predicted enzymatic functions in 320 cases: 258 on first pass and 62 more on second pass. Controls and initial analyses suggest that these predictions are reliable. Thus the large-scale evolutionary integration of sequence-structure-function data, here through reciprocal identification of local, functionally important structural features, may contribute significantly to de-orphaning the structural proteome.
doi:10.1371/journal.pone.0002136
PMCID: PMC2362850  PMID: 18461181
12.  Asap: A Framework for Over-Representation Statistics for Transcription Factor Binding Sites 
PLoS ONE  2008;3(2):e1623.
Background
In studies of gene regulation, the efficient computational detection of over-represented transcription factor binding sites is increasingly important. Several published methods can be used for testing whether a set of hypothesised co-regulated genes share a common regulatory regime based on the occurrence of the modelled transcription factor binding sites. However, there is little or no information available to guide the end user's choice of method. Furthermore, it would be necessary to obtain several different software programs from various sources to make a well-founded choice.
Methodology
We introduce a software package, Asap, for fast searching with position weight matrices that includes several standard methods for assessing over-representation. We have compared the ability of these methods to detect over-represented transcription factor binding sites in artificial promoter sequences. By controlling all aspects of our input data, we are able to identify the optimal statistics across multiple threshold values and for sequence sets containing different distributions of transcription factor binding sites.
Conclusions
We show that our implementation is significantly faster than more naïve scanning algorithms when searching with many weight matrices in large sequence sets. When comparing the various statistics, we show that those based on binomial over-representation and Fisher's exact test perform almost equally well and better than the others. An online server is available at http://servers.binf.ku.dk/asap/.
doi:10.1371/journal.pone.0001623
PMCID: PMC2229843  PMID: 18286180
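The two statistics singled out in the conclusions can be sketched in a few lines. This is a generic textbook formulation, not Asap's implementation, which the abstract does not describe; the function names, argument names and the one-sided form of the test are assumptions. It needs Python 3.8 or later for `math.comb`.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Here n is the number of sequences in the test set, k the number that
    contain a site, and p the background probability of a site.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def fisher_one_sided(hits_fg, n_fg, hits_bg, n_bg):
    """One-sided Fisher's exact test for enrichment in the foreground.

    Sums hypergeometric probabilities of seeing at least `hits_fg`
    site-containing sequences among the `n_fg` foreground sequences.
    """
    total = n_fg + n_bg
    hits = hits_fg + hits_bg
    upper = min(hits, n_fg)
    favourable = sum(comb(hits, i) * comb(total - hits, n_fg - i)
                     for i in range(hits_fg, upper + 1))
    return favourable / comb(total, n_fg)


# Hypothetical counts: 3 of 3 foreground and 0 of 3 background sequences hit.
print(fisher_one_sided(3, 3, 0, 3))
```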
13.  Evolution of a Complex Locus: Exon Gain, Loss and Divergence at the Gr39a Locus in Drosophila 
PLoS ONE  2008;3(1):e1513.
Background
Gene families typically evolve by gene duplication followed by the adoption of new or altered gene functions. A different way to evolve new but related functions is alternative splicing of existing exons of a complex gene. The chemosensory gene families of animals are characterised by numerous loci of related function. Alternative splicing has only rarely been reported in chemosensory loci, for example in 5 out of around 120 loci in Drosophila melanogaster. The gustatory receptor gene Gr39a has four large exons that are alternatively spliced with three small conserved exons. Recently the genome sequences of eleven additional species of Drosophila have become available allowing us to examine variation in the structure of the Gr39a locus across a wide phylogenetic range of fly species.
Methodology/Principal Findings
We describe a fifth exon and show that the locus has a complex evolutionary history with several duplications, pseudogenisations and losses of exons. PAML analyses suggested that the whole gene has a history of purifying selection, although this was less strong in exons which underwent duplication.
Conclusions/Significance
Estimates of functional divergence between exons were similar in magnitude to functional divergence between duplicated genes, suggesting that exon divergence is broadly equivalent to gene duplication.
doi:10.1371/journal.pone.0001513
PMCID: PMC2204066  PMID: 18231599
14.  MHC Adaptive Divergence between Closely Related and Sympatric African Cichlids 
PLoS ONE  2007;2(8):e734.
Background
The haplochromine cichlid species assemblages of Lake Malawi and Victoria represent some of the most important study systems in evolutionary biology. Identifying adaptive divergence between closely-related species can provide important insights into the processes that may have contributed to these spectacular radiations. Here, we studied a pair of sympatric Lake Malawi species, Pseudotropheus fainzilberi and P. emmiltos, whose reproductive isolation depends on olfactory communication. We tested the hypothesis that these species have undergone divergent selection at MHC class II genes, which are known to contribute to olfactory-based mate choice in other taxa.
Methodology/Principal Findings
Divergent selection on functional alleles was inferred from the higher genetic divergence at putative antigen binding sites (ABS) amino acid sequences than at putatively neutrally evolving sites at intron 1, exon 2 synonymous sequences and exon 2 amino acid residues outside the putative ABS. In addition, sympatric populations of these fish species differed significantly in communities of eukaryotic parasites.
Conclusions/Significance
We propose that local host-parasite coevolutionary dynamics may have driven adaptive divergence in MHC alleles, influencing odor-mediated mate choice and leading to reproductive isolation. These results provide the first evidence for a novel mechanism of adaptive speciation and the first evidence of adaptive divergence at the MHC in closely related African cichlid fishes.
doi:10.1371/journal.pone.0000734
PMCID: PMC1939875  PMID: 17710134
15.  Identification of Common Genetic Variation That Modulates Alternative Splicing 
PLoS Genetics  2007;3(6):e99.
Alternative splicing of genes is an efficient means of generating variation in protein function. Several disease states have been associated with rare genetic variants that affect splicing patterns. Conversely, splicing efficiency of some genes is known to vary between individuals without apparent ill effects. What is not clear is whether commonly observed phenotypic variation in splicing patterns, and hence potential variation in protein function, is to a significant extent determined by naturally occurring DNA sequence variation and in particular by single nucleotide polymorphisms (SNPs). In this study, we surveyed the splicing patterns of 250 exons in 22 individuals who had been previously genotyped by the International HapMap Project. We identified 70 simple cassette exon alternative splicing events in our experimental system; for six of these, we detected consistent differences in splicing pattern between individuals, with a highly significant association between splice phenotype and neighbouring SNPs. Remarkably, for five out of six of these events, the strongest correlation was found with the SNP closest to the intron–exon boundary, although the distance between these SNPs and the intron–exon boundary ranged from 2 bp to greater than 1,000 bp. Two of these SNPs were further investigated using a minigene splicing system, and in each case the SNPs were found to exert cis-acting effects on exon splicing efficiency in vitro. The functional consequences of these SNPs could not be predicted using bioinformatic algorithms. Our findings suggest that phenotypic variation in splicing patterns is determined by the presence of SNPs within flanking introns or exons. Effects on splicing may represent an important mechanism by which SNPs influence gene function.
Author Summary
Genetic variation, through its effects on gene expression, influences many aspects of the human phenotype. Understanding the impact of genetic variation on human disease risk has become a major goal for biomedical research and has the potential of revealing both novel disease mechanisms and novel functional elements controlling gene expression. Recent large-scale studies have suggested that a relatively high proportion of human genes show allele-specific variation in expression. Effects of common DNA polymorphisms on mRNA splicing are less well studied. Variation in splicing patterns is known to be tissue specific, and for a small number of genes has been shown to vary among individuals. What is not known is whether allele-specific splicing events are an important mechanism by which common genetic variation affects gene expression. In this study we show that allele-specific alternative splicing was observed in six out of 70 exon-skipping events. Sequence analysis of the relevant splice sites and of the regions surrounding single nucleotide polymorphisms correlated with the splicing events failed to identify any predictive bioinformatic signals. A genome-wide study of allele-specific splicing, using an experimental rather than a bioinformatic approach, is now required.
doi:10.1371/journal.pgen.0030099
PMCID: PMC1904363  PMID: 17571926
17.  Efficient Identification of Critical Residues Based Only on Protein Structure by Network Analysis 
PLoS ONE  2007;2(5):e421.
Despite the increasing number of published protein structures, and the fact that each protein's function relies on its three-dimensional structure, there is limited access to automatic programs used for the identification of critical residues from the protein structure, compared with those based on protein sequence. Here we present a new algorithm based on network analysis applied exclusively on protein structures to identify critical residues. Our results show that this method identifies critical residues for protein function with high reliability and improves automatic sequence-based approaches and previous network-based approaches. The reliability of the method depends on the conformational diversity screened for the protein of interest. We have designed a web site to give access to this software at http://bis.ifc.unam.mx/jamming/. In summary, a new method is presented that relates critical residues for protein function with the most traversed residues in networks derived from protein structures. A unique feature of the method is the inclusion of the conformational diversity of proteins in the prediction, thus reproducing a basic feature of the structure/function relationship of proteins.
doi:10.1371/journal.pone.0000421
PMCID: PMC1855080  PMID: 17502913
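The abstract relates critical residues to the "most traversed residues" in a network derived from the structure. One standard way to quantify how often a node is traversed is shortest-path betweenness centrality, sketched below with Brandes' algorithm for an unweighted, undirected graph. Whether the published method uses exactly this measure is not stated in the abstract, so treat it as a stand-in; the residue contact graph shown is hypothetical.

```python
from collections import deque


def betweenness(graph):
    """Shortest-path betweenness for {node: set_of_neighbours} graphs."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths
        dist = dict.fromkeys(graph, -1)   # -1 marks "not yet visited"
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                      # breadth-first search
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                      # accumulate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical chain of five residues in contact with their neighbours.
contacts = {"A1": {"B2"}, "B2": {"A1", "C3"}, "C3": {"B2", "D4"},
            "D4": {"C3", "E5"}, "E5": {"D4"}}
print(betweenness(contacts))
```

In this toy chain the central residue has the highest score because every shortest path between the two halves passes through it.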
19.  Conflict between Translation Initiation and Elongation in Vertebrate Mitochondrial Genomes 
PLoS ONE  2007;2(2):e227.
The strand-biased mutation spectrum in vertebrate mitochondrial genomes results in an AC-rich L-strand and a GT-rich H-strand. Because the L-strand is the sense strand of 12 of the 13 protein-coding genes, the third codon position is overall strongly AC-biased. The wobble site of the anticodon of the 22 mitochondrial tRNAs is either U or G to pair with the most abundant synonymous codon, with only one exception. The wobble site of Met-tRNA is C instead of U, forming the Watson-Crick match with AUG instead of AUA, the latter being much more frequent than the former. This has been attributed to a compromise between translation initiation and elongation; i.e., AUG is not only a methionine codon, but also an initiation codon, and an anticodon matching AUG will increase the initiation rate. However, such an anticodon would impose selection against the use of AUA codons because AUA needs to be wobble-translated. According to this translation conflict hypothesis, AUA should be used relatively less frequently compared to UUA in the UUR codon family. A comprehensive analysis of mitochondrial genomes from a variety of vertebrate species revealed a general deficiency of AUA codons relative to UUA codons. In contrast, urochordate mitochondrial genomes with two tRNA-Met genes with CAU and UAU anticodons exhibit increased AUA codon usage. Furthermore, six bivalve mitochondrial genomes with both of their tRNA-Met genes with a CAU anticodon have reduced AUA usage relative to three other bivalve mitochondrial genomes with one of their two tRNA-Met genes having a CAU anticodon and the other having a UAU anticodon. We conclude that the translation conflict hypothesis is empirically supported, and our results highlight the fine details of selection in shaping molecular evolution.
doi:10.1371/journal.pone.0000227
PMCID: PMC1794132  PMID: 17311091
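The comparison at the heart of the translation conflict hypothesis, AUA usage among AUR codons against UUA usage among UUR codons, can be sketched as a simple codon tally. The function names and the toy coding sequence below are hypothetical, and the abstract does not say how the authors normalised their counts, so the A-ending fraction used here is only one plausible summary.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons of a coding sequence, reported as RNA."""
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3      # drop a trailing partial codon
    return Counter(rna[i:i + 3] for i in range(0, usable, 3))


def a_ending_fraction(counts, a_codon, g_codon):
    """Fraction of an NNR codon pair that uses the A-ending codon."""
    total = counts[a_codon] + counts[g_codon]
    return counts[a_codon] / total if total else float("nan")


# Hypothetical toy sequence: AUG AUG AUA UUA UUA UUG.
counts = codon_counts("ATGATGATATTATTATTG")
print(a_ending_fraction(counts, "AUA", "AUG"),
      a_ending_fraction(counts, "UUA", "UUG"))
```

Under the hypothesis, the first fraction would be expected to fall below the second in genomes whose only tRNA-Met carries a CAU anticodon.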
20.  Insights into the Molecular Evolution of the PDZ/LIM Family and Identification of a Novel Conserved Protein Motif 
PLoS ONE  2007;2(2):e189.
The PDZ and LIM domain-containing protein family is encoded by a diverse group of genes whose phylogeny has not yet been analyzed. In mammals, ten genes are found that encode both a PDZ- and one or several LIM-domains. These genes are: ALP, RIL, Elfin (CLP36), Mystique, Enigma (LMP-1), Enigma homologue (ENH), ZASP (Cypher, Oracle), LMO7 and the two LIM domain kinases (LIMK1 and LIMK2). As conventional alignment and phylogenetic procedures of full-length sequences fell short of elucidating the evolutionary history of these genes, we started to analyze the PDZ and LIM domain sequences themselves. Using information from most sequenced eukaryotic lineages, our phylogenetic analysis is based on full-length cDNA-, EST-derived- and genomic- PDZ and LIM domain sequences of over 25 species, ranging from yeast to humans. Plant and protozoan homologs were not found. Our phylogenetic analysis identifies a number of domain duplication and rearrangement events, and shows a single convergent event during evolution of the PDZ/LIM family. Further, we describe the separation of the ALP and Enigma subfamilies in lower vertebrates and identify a novel consensus motif, which we call ‘ALP-like motif’ (AM). This motif is highly conserved between ALP subfamily proteins of diverse organisms. Here we used a combinatorial approach to define the relation of the PDZ and LIM domain encoding genes and to reconstruct their phylogeny. This analysis allowed us to classify the PDZ/LIM family and to suggest a meaningful model for the molecular evolution of the diverse gene architectures found in this multi-domain family.
doi:10.1371/journal.pone.0000189
PMCID: PMC1781342  PMID: 17285143
21.  New developments in the InterPro database 
Nucleic Acids Research  2007;35(Database issue):D224-D228.
InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver, and for download by anonymous FTP. The InterProScan search tool is now also available via a web service.
doi:10.1093/nar/gkl841
PMCID: PMC1899100  PMID: 17202162
22.  Towards Alignment Independent Quantitative Assessment of Homology Detection 
PLoS ONE  2006;1(1):e113.
Identification of homologous proteins provides a basis for protein annotation. Sequence alignment tools reliably identify homologs sharing high sequence similarity. However, identification of homologs that share low sequence similarity remains a challenge. Lowering the cutoff value could enable the identification of diverged homologs, but also introduces numerous false hits. Methods are being continuously developed to minimize this problem. Estimation of the fraction of homologs in a set of protein alignments can help in the assessment and development of such methods, and provides users with an intuitive quantitative assessment of protein alignment results. Herein, we present a computational approach that estimates the number of homologs in a set of protein pairs. The method requires a prevalent and detectable protein feature that is conserved between homologs. By analyzing the feature prevalence in a set of pairwise protein alignments, the method can estimate the number of homolog pairs in the set independently of the alignments' quality. Using the HomoloGene database as a standard of truth, we implemented this approach in a proteome-wide analysis. The results revealed that this approach, which is independent of the alignments themselves, works well for estimating the number of homologous proteins across a wide range of homology values. In summary, the presented method can accompany homology searches and method development, provides validation of search results, and allows tuning of tools and methods.
doi:10.1371/journal.pone.0000113
PMCID: PMC1762415  PMID: 17205117
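One way to read the feature-prevalence idea is as a two-component mixture: pairs in the set are either homologs, which share the conserved feature at a high rate, or unrelated, which share it only at a background rate. The sketch below rests on that assumption and is not the published method. The function name, the parameter names and the example rates are all hypothetical.

```python
def estimate_homolog_fraction(
    n_pairs_sharing: int,
    n_pairs: int,
    p_homolog: float,
    p_random: float,
) -> float:
    """Estimate the fraction of homolog pairs from feature agreement alone.

    Assumed model (not taken from the abstract):
        observed_rate = f * p_homolog + (1 - f) * p_random
    where f is the homolog fraction, p_homolog is the rate at which true
    homologs share the feature, and p_random is the rate for unrelated pairs.
    """
    if n_pairs <= 0:
        raise ValueError("n_pairs must be positive")
    if not 0 <= n_pairs_sharing <= n_pairs:
        raise ValueError("n_pairs_sharing must lie between 0 and n_pairs")
    if p_homolog <= p_random:
        raise ValueError("the feature must be shared more often by homologs")
    observed_rate = n_pairs_sharing / n_pairs
    fraction = (observed_rate - p_random) / (p_homolog - p_random)
    # Sampling noise can push the raw estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, fraction))


# Hypothetical numbers: 600 of 1000 aligned pairs share the feature.
fraction = estimate_homolog_fraction(600, 1000, p_homolog=0.9, p_random=0.1)
print(f"estimated homolog fraction: {fraction:.3f}")
print(f"estimated homolog pairs: {round(fraction * 1000)}")
```

The estimate uses only feature agreement, not alignment scores, which is the sense in which such an approach can be independent of alignment quality. In practice the two rates would have to be calibrated on a trusted reference set.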
23.  A High-Resolution Single Nucleotide Polymorphism Genetic Map of the Mouse Genome 
PLoS Biology  2006;4(12):e395.
High-resolution genetic maps are required for mapping complex traits and for the study of recombination. We report the highest density genetic map yet created for any organism, except humans. Using more than 10,000 single nucleotide polymorphisms evenly spaced across the mouse genome, we have constructed genetic maps for both outbred and inbred mice, and separately for males and females. Recombination rates are highly correlated in outbred and inbred mice, but show relatively low correlation between males and females. Differences between male and female recombination maps and the sequence features associated with recombination are strikingly similar to those observed in humans. Genetic maps are available from http://gscan.well.ox.ac.uk/#genetic_map and as supporting information to this publication.
A high-density SNP map based on outbred and inbred mice with male and female separation suggests a high degree of homology between mouse and human recombination.
doi:10.1371/journal.pbio.0040395
PMCID: PMC1635748  PMID: 17105354
24.  SMART 5: domains in the context of genomes and networks 
Nucleic Acids Research  2005;34(Database issue):D257-D260.
The Simple Modular Architecture Research Tool (SMART) is an online resource used for protein domain identification and the analysis of protein domain architectures. Many new features were implemented to make SMART more accessible to scientists from different fields. The new ‘Genomic’ mode in SMART makes it easy to analyze domain architectures in completely sequenced genomes. Domain annotation has been updated with a detailed taxonomic breakdown and a prediction of the catalytic activity for 50 SMART domains is now available, based on the presence of essential amino acids. Furthermore, intrinsically disordered protein regions can be identified and displayed. The network context is now displayed in the results page for more than 350 000 proteins, enabling easy analyses of domain interactions.
doi:10.1093/nar/gkj079
PMCID: PMC1347442  PMID: 16381859
25.  The EH1 motif in metazoan transcription factors 
BMC Genomics  2005;6:169.
Background
The Engrailed Homology 1 (EH1) motif is a small region, believed to have evolved convergently in homeobox and forkhead containing proteins, that interacts with the Drosophila protein groucho (C. elegans unc-37, Human Transducin-like Enhancers of Split). The small size of the motif makes its reliable identification by computational means difficult. I have systematically searched the predicted proteomes of Drosophila, C. elegans and human for further instances of the motif.
Results
Using motif identification methods and database searching techniques, I delimit which homeobox and forkhead domain containing proteins also have likely EH1 motifs. I show that despite low database search scores, there is a significant association of the motif with transcription factor function. I further show that likely EH1 motifs are found in combination with T-Box, Zinc Finger and Doublesex domains, and discuss other plausible candidate associations. I identify strong candidate EH1 motifs in basal metazoan phyla.
Conclusion
Candidate EH1 motifs exist in combination with a variety of transcription factor domains, suggesting that these proteins have repressor functions. The distribution of the EH1 motif is suggestive of convergent evolution, although in many cases the motif has been conserved throughout bilaterian orthologs. Groucho-mediated repression was established prior to the evolution of Bilateria.
doi:10.1186/1471-2164-6-169
PMCID: PMC1310626  PMID: 16309560
