A great amount of data has been accumulated on genetic variations in the human genome, but we still do not know much about how the genetic variations affect gene function. In particular, little is known about the distribution of nonsense polymorphisms in human genes despite their drastic effects on gene products.
To detect polymorphisms affecting gene function, we analyzed all publicly available polymorphisms in a database for single nucleotide polymorphisms (dbSNP build 125) located in the exons of 36,712 known and predicted protein-coding genes that were defined in an annotation project of all human genes and transcripts (H-InvDB ver3.8). We found a total of 252,555 single nucleotide polymorphisms (SNPs) and 8,479 insertions and deletions in the representative transcripts of these genes. The SNPs located in ORFs include 40,484 synonymous and 53,754 nonsynonymous SNPs, and 1,258 SNPs that were predicted to be nonsense SNPs or read-through SNPs. We estimated the density of nonsense SNPs to be 0.85×10⁻³ per site, which is lower than that of nonsynonymous SNPs (2.1×10⁻³ per site). On average, nonsense SNPs were located 250 codons upstream of the original termination codon, with the substitution occurring most frequently at the first codon position. Of the nonsense SNPs, 581 were predicted to cause nonsense-mediated decay (NMD) of transcripts that would prevent translation. We found that nonsense SNPs causing NMD were more common in genes involving kinase activity and transport. The remaining 602 nonsense SNPs are predicted to produce truncated polypeptides, with an average truncation of 75 amino acids. In addition, 110 read-through SNPs at termination codons were detected.
Our comprehensive exploration of nonsense polymorphisms showed that nonsense SNPs exist at a lower density than nonsynonymous SNPs, suggesting that nonsense mutations have more severe effects than amino acid changes. The correspondence of nonsense SNPs to known pathological variants suggests that phenotypic effects of nonsense SNPs have been reported for only a small fraction of nonsense SNPs, and that nonsense SNPs causing NMD are more likely to be involved in phenotypic variations. These nonsense SNPs may include pathological variants that have not yet been reported. These data are available from Transcript View of H-InvDB and VarySysDB (http://h-invitational.jp/varygene/).
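The four classes of coding SNP counted above (synonymous, nonsynonymous, nonsense and read-through) follow directly from the standard genetic code. The sketch below is a minimal illustration of that classification, not the pipeline used in the study; the function name `classify_snp` and its 0-based `position` argument are hypothetical choices made for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base substitution within one codon.

    codon     reference codon on the coding strand (DNA alphabet)
    position  0-based position of the substituted base within the codon
    alt_base  the alternative (variant) base
    """
    codon = codon.upper()
    alt_base = alt_base.upper()
    if len(codon) != 3 or set(codon) - set(BASES) or alt_base not in BASES:
        raise ValueError("codon and alt_base must use the bases A, C, G, T")
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1 or 2")
    if codon[position] == alt_base:
        raise ValueError("alt_base is identical to the reference base")

    mutant = codon[:position] + alt_base + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]

    if ref_aa == alt_aa:
        return "synonymous"      # includes stop-to-stop changes such as TAA -> TAG
    if alt_aa == "*":
        return "nonsense"        # a sense codon becomes a premature stop
    if ref_aa == "*":
        return "read-through"    # the termination codon becomes a sense codon
    return "nonsynonymous"
```

For example, `classify_snp("CAG", 0, "T")` turns a glutamine codon into TAG and is reported as nonsense, a substitution at the first codon position of the kind the abstract describes as most frequent.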
The BARD1 gene encodes the BRCA1-associated RING domain (BARD1) protein. Germ line and somatic mutations in BARD1 are found in sporadic breast, ovarian and uterine cancers. There is a plethora of single nucleotide polymorphisms (SNPs) which may or may not be involved in the onset of female cancers. Hence, before planning a larger population study, it is advisable to sort out the possible functional SNPs. To accomplish this goal, data available in the dbSNP database and different computer programs can be used. To the best of our knowledge, until now there has been no such study on record for the BARD1 gene. Therefore, this study was undertaken to find the functional nsSNPs in BARD1.
2.85% of all SNPs in the dbSNP database were present in the coding regions. SIFT predicted 11 out of 50 nsSNPs as not tolerable and PolyPhen assessed 27 out of 50 nsSNPs as damaging. FastSNP revealed that the rs58253676 SNP in the 3′ UTR may have splicing regulator and enhancer functions. In the 5′ UTR, rs17489363 and rs17426219 may alter the transcriptional binding site. The intronic region SNP rs67822872 may have a medium-high risk level. The protein structures 1JM7, 3C5R and 2NTE were predicted by PDBSum and shared 100% similarity with the BARD1 amino acid sequence. Among the predicted nsSNPs, rs4986841, rs111367604, rs13389423 and rs139785364 were identified as deleterious and damaging by the SIFT and PolyPhen programs. Additionally, I-Mutant showed a decrease in stability for these nsSNPs upon mutation. Finally, the ExPASy-PROSITE program revealed that the predicted deleterious mutations are contained in the ankyrin ring and BRCT domains.
Using the available bioinformatics tools and the data present in the dbSNP database, the four nsSNPs, rs4986841, rs111367604, rs13389423 and rs139785364, were identified as deleterious, reducing the protein stability of BARD1. Hence, these SNPs can be used for the larger population-based studies of female cancers.
Mutations in the TNF-α gene are seen in many diseases, especially inflammatory diseases. Hence, before planning a larger population study, it is advisable to sort out the possible functional SNPs. To accomplish this goal, data available in the dbSNP database and different computer programs can be used. Therefore, this study was undertaken to find the functional nsSNPs (non-synonymous single nucleotide polymorphisms) in TNF-α.
Out of the total 169 SNPs, 48 were nsSNPs (non-synonymous single nucleotide polymorphisms), 23 occurred in the mRNA 3′ UTR, 10 occurred in 5′ UTR region, 41 occurred in intronic regions and the rest were other types of SNPs. SIFT and PolyPhen predicted 2 out of 48 nsSNPs as damaging. Among the predicted nsSNPs, rs4645843 and rs1800620 were identified as deleterious and damaging by the SIFT (Sorting Intolerant from Tolerant) and PolyPhen programs. Additionally, I-Mutant and nsSNPAnalyzer showed a decrease in stability for these nsSNPs upon mutation. Protein structural analysis with these amino acid variants was performed by using I-Mutant, Swiss PDB viewer, ANOLEA (Atomic Non-Local Environment Assessment), MUSTER (MUlti-Sources ThreadER) and NOMAD-Ref servers to check their molecular dynamics and energy minimization calculations. This study suggested that P84L and A94T variants of TNF-α could directly or indirectly destabilize the amino acid interactions and hydrogen bond networks thus explaining the functional deviations of protein to some extent.
•We analyzed a total of 48 nsSNPs. Among the predicted nsSNPs, rs4645843 and rs1800620 were identified as deleterious and damaging.
•The amino acid residue substitutions which had the greatest impact on the stability of the TNF-α protein were mutations P84L (rs4645843) and A94T (rs1800620).
•rs4645843 and rs1800620 should be considered important candidates in causing diseases related to TNF-α gene malfunction.
TNF, tumor necrosis factor; SIFT, Sorting Intolerant from Tolerant; PolyPhen, Polymorphism Phenotyping; SNP, single nucleotide polymorphism; nsSNP, nonsynonymous single nucleotide polymorphism; OMIM, Online Mendelian Inheritance in Man; ANOLEA, Atomic Non-Local Environment Assessment; MUSTER, MUlti-Sources ThreadER
Single nucleotide polymorphism (SNP); TNF-α; In silico analysis; Gene variant
SNPs located within the open reading frame of a gene that result in an alteration in the amino acid sequence of the encoded protein [nonsynonymous SNPs (nsSNPs)] might directly or indirectly affect the functionality of the protein, alone or in its interactions within a multi-protein complex, by increasing or decreasing the activity of the metabolic pathway. Understanding the functional consequences of such changes, and drawing conclusions about the molecular basis of diseases, involves integrating information from multiple heterogeneous sources including sequence, structure data and pathway relations between proteins. The data from NCBI's SNP database (dbSNP), gene and protein databases from Entrez, protein structures from the PDB and pathway information from KEGG have all been cross-referenced into the StSNP web server, in an effort to provide combined, integrated reports about nsSNPs. StSNP provides ‘on the fly’ comparative modeling of nsSNPs with links to metabolic pathway information, along with real-time visual comparative analysis of the modeled structures using the Friend software application. The use of metabolic pathways in StSNP allows a researcher to examine possible disease-related pathways associated with a particular nsSNP(s), and link the diseases with the currently available molecular structure data. The server is publicly available at http://glinka.bio.neu.edu/StSNP/.
Understanding how genetic variation affects the molecular function of gene products is an emergent area of bioinformatic research. Here, we present updates to MutDB (http://www.mutdb.org), a tool aiming to aid bioinformatic studies by integrating publicly available databases of human genetic variation with molecular features and clinical phenotype data. MutDB, first developed in 2002, integrates annotated SNPs in dbSNP and amino acid substitutions in Swiss-Prot with protein structural information, links to scores that predict functional disruption and other useful annotations. Though these functional annotations are mainly focused on nonsynonymous SNPs, some information on other SNP types included in dbSNP is also provided. Additionally, we have developed a new functionality that facilitates KEGG pathway visualization of genes containing SNPs and a SNP query tool for visualizing and exporting sets of SNPs that share selected features based on certain filters.
Analysis of single nucleotide polymorphisms (SNPs) is becoming a key research area in genomics. Many functional analyses of SNPs have been carried out for coding regions and splicing sites that can alter proteins and mRNA splicing. However, SNPs in non-coding regulatory regions can also influence important biological regulation. Presently, there are few databases for SNPs in non-coding regulatory regions.
We identified 488,452 human SNPs in the putative promoter regions that extended from the +5000 bp to -500 bp region of the transcription start sites. Some of these SNPs (47,832 SNPs; 9.8%) were also predicted to occur in transcription factor (TF) binding sites. The results are stored in a database: SNP@Promoter. Users can search the SNP@Promoter database in three ways: 1) by SNP identifier (rs number from dbSNP), 2) by gene (gene name, gene symbol, RefSeq ID), and 3) by disease term. The SNP@Promoter database provides extensive genetic information and graphical views of queried terms.
We present the SNP@Promoter database. It was created in order to predict functional SNPs in putative promoter regions and predicted transcription factor binding sites. SNP@Promoter will help researchers to identify functional SNPs in non-coding regions.
Studies on the relationship between disease and genetic variations such as single nucleotide polymorphisms (SNPs) are important. Genetic variations can cause disease by influencing important biological regulation processes. Despite the need to analyze SNP-disease correlations, most existing databases provide information only on functional variants at specific locations on the genome, or deal with only a few genes associated with disease. There is no combined resource to widely support gene-, SNP-, and disease-related information, and to capture relationships among such data. Therefore, we developed an integrated database-pipeline system for studying SNPs and diseases.
To implement the pipeline system for the integrated database, we first unified complicated and redundant disease terms and gene names using the Unified Medical Language System (UMLS) for classification and noun modification, and the HUGO Gene Nomenclature Committee (HGNC) and NCBI gene databases. Next, we collected and integrated representative databases for three categories of information. For genes and proteins, we examined the NCBI mRNA, UniProt, UCSC Table Track and MitoDat databases. For genetic variants, we used the dbSNP, JSNP, ALFRED, and HGVbase databases. For disease, we employed the OMIM, GAD, and HGMD databases. The database-pipeline system provides a disease thesaurus, including genes and SNPs associated with disease. The search results for these categories are available on the web page, and a genome browser is also available to highlight findings, as well as to permit the convenient review of potentially deleterious SNPs among genes strongly associated with specific diseases and clinical phenotypes.
Our system is designed to capture the relationships between SNPs associated with disease and disease-causing genes. The integrated database-pipeline provides a list of candidate genes and SNP markers for evaluation in both epidemiological and molecular biological approaches to disease-gene association studies. Furthermore, researchers can then semi-automatically select the data set for association studies while considering the relationships between genetic variation and diseases. The database can also make disease-association studies more economical and facilitate an understanding of the processes that cause disease. Currently, the database contains 14,674 SNP records and 109,715 gene records associated with human diseases, and it is updated at regular intervals.
Summary: Many existing databases annotate experimentally characterized single nucleotide polymorphisms (SNPs). Each non-synonymous SNP (nsSNP) changes one amino acid in the gene product (single amino acid substitution; SAAS). This change can either affect protein function or be neutral in that respect. Most polymorphisms lack experimental annotation of their functional impact. Here, we introduce SNPdbe (SNP database of effects), with predictions of computationally annotated functional impacts of SNPs. Database entries represent nsSNPs in dbSNP and the 1000 Genomes collection, as well as variants from UniProt and PMD. SAASs come from >2600 organisms, ‘human’ being the most prevalent. The impact of each SAAS on protein function is predicted using the SNAP and SIFT algorithms and augmented with experimentally derived function/structure information and disease associations from PMD, OMIM and UniProt. SNPdbe is consistently updated and easily augmented with new sources of information. The database is available as a MySQL dump and via a web front end that allows searches with any combination of organism names, sequences and mutation IDs.
Several studies have demonstrated an association between polycystic ovary syndrome (PCOS) and the dinucleotide repeat microsatellite marker D19S884, which is located in intron 55 of the fibrillin-3 (FBN3) gene. Fibrillins, including FBN1 and 2, interact with latent transforming growth factor (TGF)-β-binding proteins (LTBP) and thereby control the bioactivity of TGFβs. TGFβs stimulate fibroblast replication and collagen production. The PCOS ovarian phenotype includes increased stromal collagen and expansion of the ovarian cortex, features feasibly influenced by abnormal fibrillin expression. To examine a possible role of fibrillins in PCOS, particularly FBN3, we undertook tagging and functional single nucleotide polymorphism (SNP) analysis (32 SNPs including 10 that generate non-synonymous amino acid changes) using DNA from 173 PCOS patients and 194 controls. No SNP showed a significant association with PCOS and alleles of most SNPs showed almost identical population frequencies between PCOS and control subjects. No significant differences were observed for microsatellite D19S884. In human PCO stroma/cortex (n = 4) and non-PCO ovarian stroma (n = 9), follicles (n = 3) and corpora lutea (n = 3) and in human ovarian cancer cell lines (KGN, SKOV-3, OVCAR-3, OVCAR-5), FBN1 mRNA levels were approximately 100 times greater than FBN2 and 200–1000-fold greater than FBN3. Expression of LTBP-1 mRNA was 3-fold greater than LTBP-2. We conclude that FBN3 appears to have little involvement in PCOS but cannot rule out that other markers in the region of chromosome 19p13.2 are associated with PCOS or that FBN3 expression occurs in other organs and that this may be influencing the PCOS phenotype.
fibrillin; latent-transforming growth factor β-binding protein; polycystic ovary syndrome; ovary
Single-nucleotide polymorphisms (SNPs) are biomarkers for exploring the genetic basis of many complex human diseases. The prediction of SNPs is promising in modern genetic analysis, but it is still a great challenge to identify the functional SNPs in a disease-related gene. The computational approach has overcome this challenge, increasing the success rate of genetic association studies and reducing the cost of genotyping. The objective of this study is to identify deleterious non-synonymous SNPs (nsSNPs) associated with the COL1A1 gene.
Material and methods
The SNPs were retrieved from the Single Nucleotide Polymorphism Database (dbSNP). Using I-Mutant, protein stability change was calculated. The potentially functional nsSNPs and their effect on proteins were predicted by PolyPhen and SIFT respectively. FASTSNP was used for estimation of risk score.
Our analysis revealed 247 SNPs as non-synonymous, out of which 5 nsSNPs were found to be least stable by I-Mutant 2.0 with a DDG value of > –1.0. Four nsSNPs, namely rs17853657, rs17857117, rs57377812 and rs1059454, showed a highly deleterious tolerance index score of 0.00 with a change in their physicochemical properties by the SIFT server. Seven nsSNPs, namely rs1059454, rs8179178, rs17853657, rs17857117, rs72656340, rs72656344 and rs72656351, were found to be probably damaging with a PSIC score difference between 2.0 and 3.5 by the PolyPhen server. Three nsSNPs, namely rs1059454, rs17853657 and rs17857117, were found to be highly polymorphic with a risk score of 3-4 with a possible effect of non-conservative change and splicing regulation by FASTSNP.
Three nsSNPs, namely rs1059454, rs17853657 and rs17857117, are potential functional polymorphisms that are likely to have a functional impact on the COL1A1 gene.
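The three nsSNPs named in the conclusion are exactly the rsIDs flagged by all three of SIFT, PolyPhen and FASTSNP in the results above. The abstract does not say the authors derived them this way, so the sketch below only illustrates a consensus-by-intersection step over the reported lists; the helper name `consensus_snps` is hypothetical.

```python
# rsIDs as reported in the abstract for each prediction tool.
SIFT_DELETERIOUS = {"rs17853657", "rs17857117", "rs57377812", "rs1059454"}
POLYPHEN_DAMAGING = {
    "rs1059454", "rs8179178", "rs17853657", "rs17857117",
    "rs72656340", "rs72656344", "rs72656351",
}
FASTSNP_HIGH_RISK = {"rs1059454", "rs17853657", "rs17857117"}


def consensus_snps(*call_sets: set[str]) -> list[str]:
    """Return the rsIDs present in every call set, sorted for stable output."""
    if not call_sets:
        return []
    return sorted(set.intersection(*call_sets))


consensus = consensus_snps(SIFT_DELETERIOUS, POLYPHEN_DAMAGING, FASTSNP_HIGH_RISK)
print(consensus)
```

Requiring agreement across tools is a conservative choice: it lowers the number of candidates at the cost of dropping variants, such as rs57377812 here, that only one predictor flags.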
in silico analysis; dbSNP; SIFT; PolyPhen
Advances in high-throughput genotyping and the International HapMap Project have enabled association studies at the whole-genome level. We have constructed whole-genome genotyping panels of over 550,000 (HumanHap550) and 650,000 (HumanHap650Y) SNP loci by choosing tag SNPs from all populations genotyped by the International HapMap Project. These panels also contain additional SNP content in regions that have historically been overrepresented in diseases, such as nonsynonymous sites, the MHC region, copy number variant regions and mitochondrial DNA. We estimate that the tag SNP loci in these panels cover the majority of all common variation in the genome as measured by coverage of both all common HapMap SNPs and an independent set of SNPs derived from complete resequencing of genes obtained from SeattleSNPs. We also estimate that, given a sample size of 1,000 cases and 1,000 controls, these panels have the power to detect single disease loci of moderate risk (λ ∼ 1.8–2.0). Relative risks as low as λ ∼ 1.1–1.3 can be detected using 10,000 cases and 10,000 controls depending on the sample population and disease model. If multiple loci are involved, the power increases significantly to detect at least one locus such that relative risks 20%–35% lower can be detected with 80% power if between two and four independent loci are involved. Although our SNP selection was based on HapMap data, which is a subset of all common SNPs, these panels effectively capture the majority of all common variation and provide high power to detect risk alleles that are not represented in the HapMap data.
Advances in high-throughput genotyping technology and the International HapMap Project have enabled genetic association studies at the whole-genome level. Our paper describes two genome-wide SNP panels that contain tag SNPs derived from the International HapMap Project. Tag SNPs are proxies for groups of highly correlated SNPs. Information can be captured for the entire group of correlated SNPs by genotyping only one representative SNP, the tag SNP. These whole-genome SNP panels also contain additional content thought to be overrepresented in disease, such as amino acid–changing nonsynonymous SNPs and mitochondrial SNPs. We show that these panels cover the genome with very high efficiency as measured by coverage of all HapMap SNPs and a set of SNPs derived from completely resequenced genes from the Seattle SNPs database. We also show that these panels have high power to detect disease risk alleles for both HapMap and non-HapMap SNPs. In complex disease where multiple risk alleles are believed to be involved, we show that the ability to detect at least one risk allele with the tag SNP panels is also high.
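The idea of a tag SNP standing in for a group of highly correlated SNPs can be illustrated with a greedy selection over pairwise r². This is a sketch under stated assumptions, not the procedure used to build the panels: the r² threshold of 0.8 is an assumed value, r² is approximated by the squared Pearson correlation of 0/1/2 genotype dosages, and the SNP names and genotypes are invented toy data.

```python
def r_squared(x: list[int], y: list[int]) -> float:
    """Squared Pearson correlation between two genotype-dosage vectors (0/1/2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return sxy * sxy / (sxx * syy)


def greedy_tag_snps(genotypes: dict[str, list[int]], threshold: float = 0.8) -> dict[str, list[str]]:
    """Greedily pick tag SNPs; each tag maps to the SNPs it captures (itself included)."""
    # For every SNP, the set of SNPs it could represent at the given threshold.
    proxies = {
        s: {t for t in genotypes
            if s == t or r_squared(genotypes[s], genotypes[t]) >= threshold}
        for s in genotypes
    }
    untagged = set(genotypes)
    tags: dict[str, list[str]] = {}
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs (ties broken by name).
        best = max(sorted(untagged), key=lambda s: len(proxies[s] & untagged))
        captured = proxies[best] & untagged
        tags[best] = sorted(captured)
        untagged -= captured
    return tags


# Toy genotypes for six individuals (hypothetical data).
toy = {
    "snpA": [0, 1, 2, 0, 1, 2],
    "snpB": [0, 1, 2, 0, 1, 2],  # identical to snpA
    "snpC": [2, 1, 0, 2, 1, 0],  # perfectly anti-correlated with snpA
    "snpD": [0, 0, 1, 2, 1, 0],  # weakly correlated with the others
}
tag_map = greedy_tag_snps(toy)
print(tag_map)
```

In this toy set, genotyping snpA alone recovers snpB and snpC, while snpD must be genotyped separately, which is the economy that tag SNP panels exploit.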
Polycystic ovary syndrome (PCOS) is characterized by excessive theca cell androgen secretion, dependent upon LH, which acts through the intermediacy of 3′,5′-cyclic adenosine monophosphate (cAMP). cAMP signaling pathways are controlled through regulation of its synthesis by adenylyl cyclases, and cAMP degradation by phosphodiesterases (PDEs). PDE8A, a high-affinity cAMP-specific PDE, is expressed in the ovary and testis. Leydig cells from mice with a targeted mutation in the Pde8a gene are sensitized to the action of LH in terms of testosterone production. These observations led us to evaluate the human PDE8A gene as a PCOS candidate gene, and the hypothesis that reduced PDE8A activity or expression would contribute to excessive ovarian androgen production. We identified a rare variant (R136Q; NM_002605.2 c.407G > A) and studied another known single nucleotide polymorphism (SNP) (rs62019510, N401S) in the PDE8A coding sequence causing non-synonymous amino acid substitutions, and a new SNP in the promoter region (NT_010274.16:g.490155G > A). Although PDE8A kinetics were consistent with reduced activity in theca cell lysates, study of the expressed variants did not confirm reduced activity in cell-free assays. Sub-cellular localization of the enzyme was also not different among the coding sequence variants. The PDE8A promoter SNP and a previously described promoter SNP did not affect promoter activity in in vitro assays. The more common coding sequence SNP (N401S) and the promoter SNPs were not associated with PCOS in our transmission/disequilibrium test-based analysis, nor were they associated with total testosterone or dehydroepiandrosterone sulfate levels. These findings exclude a significant role for PDE8A as a PCOS candidate gene, and as a major determinant of androgen levels in women.
PDE8A; polycystic ovary syndrome; androgens; theca; SNP
Quantifying the distribution of fitness effects among newly arising mutations in the human genome is key to resolving important debates in medical and evolutionary genetics. Here, we present a method for inferring this distribution using Single Nucleotide Polymorphism (SNP) data from a population with non-stationary demographic history (such as that of modern humans). Application of our method to 47,576 coding SNPs found by direct resequencing of 11,404 protein coding-genes in 35 individuals (20 European Americans and 15 African Americans) allows us to assess the relative contribution of demographic and selective effects to patterning amino acid variation in the human genome. We find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. After accounting for these demographic effects, we find strong evidence for great variability in the selective effects of new amino acid replacing mutations. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or log-normal) peaked near neutrality. Specifically, we predict 27–29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral (|s|<0.01%), 30–42% are moderately deleterious (0.01%<|s|<1%), and nearly all the remainder are highly deleterious or lethal (|s|>1%). Our results are consistent with 10–20% of amino acid differences between humans and chimpanzees having been fixed by positive selection with the remainder of differences being neutral or nearly neutral. Our analysis also predicts that many of the alleles identified via whole-genome association mapping may be selectively neutral or (formerly) positively selected, implying that deleterious genetic variation affecting disease phenotype may be missed by this widely used approach for mapping genes underlying complex traits.
Although mutations are known to cause varying degrees of harmful effects, it is difficult to quantify the distribution that best describes the variation of fitness effects of these mutations. Here we present a new method for inferring this distribution and inferring population history using Single Nucleotide Polymorphism (SNP) data from human populations. Using 47,576 SNPs discovered in 11,404 genes from sequencing 35 individuals (20 European Americans and 15 African Americans), we find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or log-normal) peaked near neutrality. Specifically, we predict 27–29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral, 30–42% are moderately deleterious, and nearly all the remainder are highly deleterious or lethal. Furthermore, we infer that 10–20% of amino acid differences between humans and chimpanzees were fixed by positive selection, with the remainder of differences being neutral or nearly neutral.
Single nucleotide polymorphisms (SNPs) constitute the most common type of genetic variation in humans. SNPs introducing premature termination codons (PTCs), herein called X-SNPs, can alter the stability and function of transcripts and proteins and thus are considered to be biologically important. Initial studies suggested a strong selection against such variations/mutations. In this study, we undertook a genome-wide systematic screening to identify human X-SNPs using the dbSNP database. Our results demonstrated the presence of 28 X-SNPs from 28 genes with known minor allele frequencies. Eight X-SNPs (28.6 per cent) were predicted to cause transcript degradation by nonsense-mediated mRNA decay. Seventeen X-SNPs (60.7 per cent) resulted in moderate to severe truncation at the C-terminus of the proteins (deletion of > 50 per cent of the amino acids). The majority of the X-SNPs (78.6 per cent) represent commonly occurring SNPs, by contrast with the rarely occurring disease-causing PTC mutations. Interestingly, X-SNPs displayed a non-uniform distribution across human populations: eight X-SNPs were reported to be prevalent across three different human populations, whereas six X-SNPs were found exclusively in one or two population(s). In conclusion, we have systematically investigated human SNPs introducing PTCs with respect to their possible biological consequences, distributions across different human populations and evolutionary aspects. We believe that the SNPs reported here are likely to affect gene/protein function, although their biological and evolutionary roles need to be further investigated.
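The abstract does not state which criterion was used to predict nonsense-mediated mRNA decay. A commonly used rule of thumb is that a premature termination codon lying more than about 50 to 55 nucleotides upstream of the last exon-exon junction triggers NMD; the sketch below applies that rule as an assumption. The function name `predicts_nmd`, its parameters and the exon lengths in the example are hypothetical.

```python
def predicts_nmd(exon_lengths: list[int], ptc_end: int, min_distance: int = 50) -> bool:
    """Apply the 50-nucleotide rule of thumb for nonsense-mediated decay.

    exon_lengths  lengths of the exons in the spliced transcript, 5' to 3'
    ptc_end       1-based transcript position of the last base of the premature stop codon
    min_distance  assumed minimum distance (nt) between the stop codon and the
                  last exon-exon junction for NMD to be predicted
    """
    if len(exon_lengths) < 2:
        return False  # a single-exon transcript has no exon-exon junction
    # Position of the last base of the penultimate exon, i.e. the last junction.
    last_junction = sum(exon_lengths[:-1])
    return last_junction - ptc_end > min_distance


# Toy transcript with exons of 200, 150 and 300 nt; the last junction is at position 350.
print(predicts_nmd([200, 150, 300], ptc_end=120))  # stop codon far upstream of the junction
print(predicts_nmd([200, 150, 300], ptc_end=320))  # stop codon within 50 nt of the junction
print(predicts_nmd([200, 150, 300], ptc_end=400))  # stop codon in the last exon
```

Under this rule, a stop codon in the last exon or close to the final junction is expected to escape NMD and yield a truncated protein instead, which matches the two outcomes the abstract distinguishes.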
SNP; premature termination codons; nonsense-mediated mRNA decay; population distribution; evolutionary selection
Carcinogenesis occurs, at least in part, due to the accumulation of mutations in critical genes that control the mechanisms of cell proliferation, differentiation and death. Publicly accessible databases contain millions of expressed sequence tag (EST) and single nucleotide polymorphism (SNP) records, which have the potential to assist in the identification of SNPs overrepresented in tumor tissue.
An in silico SNP-tumor association study was performed utilizing tissue library and SNP information available in NCBI's dbEST (release 092002) and dbSNP (build 106).
A total of 4865 SNPs were identified which were present at higher allele frequencies in tumor compared to normal tissues. A subset of 327 (6.7%) SNPs induce amino acid changes to the protein coding sequences. This approach identified several SNPs which have been previously associated with carcinogenesis, as well as a number of SNPs that now warrant further investigation.
This novel in silico approach can assist in prioritization of genes and SNPs in the effort to elucidate the genetic mechanisms underlying the development of cancer.
We have developed coliSNP, a database server (http://yayoi.kansai.jaea.go.jp/colisnp) that maps non-synonymous single nucleotide polymorphisms (nsSNPs) on the three-dimensional (3D) structure of proteins. Once a week, the SNP data from the dbSNP database and the protein structure data from the Protein Data Bank (PDB) are downloaded, and the correspondence of the two data sets is automatically tabulated in the coliSNP database. Given an amino acid sequence, protein name or PDB ID, the server will immediately provide known nsSNP information, including the amino acid mutation caused by the nsSNP, the solvent accessibility, the secondary structure and the flanking residues of the mutated residue in a single page. The position of the nsSNP within the amino acid sequence and on the 3D structure of the protein can also be observed. The database provides key information with which to judge whether an observed nsSNP critically affects protein function and/or stability. As far as we know, this is the only web-based nsSNP database that automatically compiles SNP and protein information in a concise manner.
The most common form of genetic variation, single nucleotide polymorphisms or SNPs, can affect the way an individual responds to the environment and modify disease risk. Although most of the millions of SNPs have little or no effect on gene regulation and protein activity, there are many circumstances where base changes can have deleterious effects. Non-synonymous SNPs that result in amino acid changes in proteins have been studied because of their obvious impact on protein activity. It is well known that SNPs within regulatory regions of the genome can result in dysregulation of gene transcription. However, the impact of SNPs located in putative regulatory regions, or rSNPs, is harder to predict for two primary reasons. First, the mechanistic roles of non-coding genomic sequence remain poorly defined. Second, experimental validation of the functional consequences of rSNPs is often slow and laborious. In this review, we summarize traditional and novel methodologies for candidate rSNP selection, in particular in silico techniques that aid in candidate rSNP selection. Additionally, we discuss molecular biological techniques that assess the impact of rSNPs on binding of regulatory machinery, as well as functional consequences on transcription. Standard techniques such as EMSA and luciferase reporter constructs are still widely used to assess effects of rSNPs on binding and gene transcription; however, these protocols are often bottlenecks in the discovery process. Therefore, we highlight novel and developing high-throughput protocols that promise to aid in shortening the process of rSNP validation. Given the large amount of genomic information generated from a multitude of re-sequencing and genome-wide SNP array efforts, future focus should be to develop validation techniques that will allow greater understanding of the impact these polymorphisms have on human health and disease.
polymorphism; SNPs; gene regulation; functional genomics; microsphere assay
Age-related cataract is a clinically and genetically heterogeneous disorder affecting the ocular lens, and the leading cause of vision loss and blindness worldwide. Here we screened nonsynonymous single nucleotide polymorphisms (nsSNPs) of a novel gene, EPHA2, responsible for age-related cataracts. The SNPs were retrieved from dbSNP. Using I-Mutant, protein stability change was calculated. The potentially functional nsSNPs and their effect on protein were predicted by PolyPhen and SIFT respectively. FASTSNP was used for functional analysis and estimation of risk score. The functional impact on the EPHA2 protein was evaluated by using SWISSPDB viewer and NOMAD-Ref server. Our analysis revealed 16 SNPs as nonsynonymous, out of which 6 nsSNPs, namely rs11543934, rs2291806, rs1058371, rs1058370, rs79100278 and rs113882203, were found to be least stable by I-Mutant 2.0 with a DDG value of > −1.0. Six nsSNPs, namely rs35903225, rs2291806, rs1058372, rs1058370, rs79100278 and rs113882203, showed a highly deleterious tolerance index score of 0.00 by the SIFT server. Four nsSNPs, namely rs11543934, rs2291806, rs1058370 and rs113882203, were found to be probably damaging with a PSIC score of ≥ 2.0 by the PolyPhen server. Three nsSNPs, namely rs11543934, rs2291806 and rs1058370, were found to be highly polymorphic with a risk score of 3-4 with a possible effect of non-conservative change and splicing regulation by FASTSNP. The total energy and RMSD value were higher for the mutant-type structure compared to the native-type structure. We concluded that the nsSNP rs2291806 is the potential functional polymorphism that is likely to have a functional impact on the EPHA2 gene.
Computational analysis; single nucleotide polymorphism; EPHA2; cataract
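The consensus step implied by the EPHA2 screen above can be sketched in a few lines. The rs IDs are copied from the abstract; the idea of intersecting the four tool outputs is an assumption about how the candidates might be combined, not a method the abstract states, and the function names are hypothetical.

```python
# rs IDs as listed in the abstract, grouped by the tool that flagged them.
flagged = {
    "I-Mutant": {"rs11543934", "rs2291806", "rs1058371",
                 "rs1058370", "rs79100278", "rs113882203"},
    "SIFT": {"rs35903225", "rs2291806", "rs1058372",
             "rs1058370", "rs79100278", "rs113882203"},
    "PolyPhen": {"rs11543934", "rs2291806", "rs1058370", "rs113882203"},
    "FASTSNP": {"rs11543934", "rs2291806", "rs1058370"},
}


def consensus(flagged_by_tool):
    """Return the rs IDs flagged by every tool (assumed combination rule)."""
    sets = list(flagged_by_tool.values())
    return set.intersection(*sets) if sets else set()


def support_counts(flagged_by_tool):
    """Count how many tools flagged each rs ID."""
    counts = {}
    for ids in flagged_by_tool.values():
        for rs_id in ids:
            counts[rs_id] = counts.get(rs_id, 0) + 1
    return counts


consensus_ids = consensus(flagged)
counts = support_counts(flagged)
print(sorted(consensus_ids))
```

On these lists the intersection contains rs2291806 and rs1058370. The abstract singles out rs2291806, presumably on the basis of the structural comparison, which this sketch does not model.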
Complex human diseases may be associated with many gene interactions. Gene interactions take several different forms and it is difficult to identify all of the interactions that are potentially associated with human diseases. One approach that may fill this knowledge gap is to infer previously unknown gene interactions via identification of non-physical linkages between different mutations (or single nucleotide polymorphisms, SNPs) to avoid the hitchhiking effect or lack of recombination. Strong non-physical SNP linkages are considered to be an indication of biological (gene) interactions. These interactions can be physical protein interactions, regulatory interactions, functional compensation/antagonization or many other forms of interactions. Previous studies have shown that mutations in different genes can be linked to the same disorders. Therefore, non-physical SNP linkages, coupled with knowledge of SNP-disease associations, may shed more light on the role of gene interactions in human disorders. A user-friendly web resource that integrates information about non-physical SNP linkages, gene annotations, SNP information, and SNP-disease associations may thus be a good reference for biomedical research.
Here we extracted the SNPs located within the promoter or exonic regions of protein-coding genes from the HapMap database to construct a database named the Linkage-Disequilibrium-based Gene Interaction database (LDGIdb). The database stores 646,203 potential human gene interactions inferred from SNP pairs that are subject to long-range strong linkage disequilibrium (LD), or non-physical linkages. To minimize the possibility of hitchhiking, SNP pairs inferred to be non-physically linked were required to be located on different chromosomes or in different LD blocks of the same chromosome. According to the genomic locations of the involved SNPs (i.e., promoter, untranslated region (UTR) and coding region (CDS)), the SNP linkages inferred were categorized into promoter-promoter, promoter-UTR, promoter-CDS, CDS-CDS, CDS-UTR and UTR-UTR linkages. For the CDS-related linkages, the coding SNPs were further classified into nonsynonymous and synonymous variations, which represent potential gene interactions at the protein and RNA level, respectively. The LDGIdb also incorporates human disease-association databases such as Genome-Wide Association Studies (GWAS) and Online Mendelian Inheritance in Man (OMIM), so that the user can search for potential disease-associated SNP linkages. The inferred SNP linkages are also classified in the context of population stratification to provide a resource for investigating potential population-specific gene interactions.
The LDGIdb is a user-friendly resource that integrates non-physical SNP linkages and SNP-disease associations for studies of gene interactions in human diseases. With the help of the LDGIdb, it is plausible to infer population-specific SNP linkages for more focused studies, an avenue that is potentially important for pharmacogenetics. Moreover, by referring to disease-association information such as the GWAS data, the LDGIdb may help identify previously uncharacterized disease-associated gene interactions and potentially lead to new discoveries in studies of human diseases.
Gene interaction, SNP, Linkage disequilibrium, Systems biology, Bioinformatics
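The two rules the LDGIdb abstract describes for SNP pairs, the non-physical filter and the region-pair category, can be illustrated with a minimal sketch. The `Snp` record, its field names, and the integer LD-block index are hypothetical; the abstract does not describe the database schema or how LD blocks are identified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    rs_id: str
    chrom: str
    ld_block: int  # hypothetical LD-block index within the chromosome
    region: str    # "promoter", "CDS", or "UTR"


# Ordering chosen so labels match the six categories named in the abstract.
REGION_ORDER = {"promoter": 0, "CDS": 1, "UTR": 2}


def is_non_physical(a, b):
    """True if the pair lies on different chromosomes or in different LD blocks."""
    return a.chrom != b.chrom or a.ld_block != b.ld_block


def linkage_category(a, b):
    """Return a label such as 'promoter-CDS' regardless of argument order."""
    first, second = sorted((a.region, b.region), key=REGION_ORDER.__getitem__)
    return f"{first}-{second}"


# Made-up records for illustration only.
snp_a = Snp("rsA", "1", 3, "CDS")
snp_b = Snp("rsB", "7", 1, "promoter")
snp_c = Snp("rsC", "1", 3, "UTR")

print(is_non_physical(snp_a, snp_b), linkage_category(snp_a, snp_b))
print(is_non_physical(snp_a, snp_c), linkage_category(snp_c, snp_a))
```

The filter only decides whether a pair is eligible; measuring the strength of LD between the two SNPs is a separate step that is not shown here.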
Asthma is a chronic inflammatory disease of the airways with a complex genetic background. In this study, we carried out a meta-analysis of single nucleotide polymorphisms (SNPs) thought to be associated with asthma.
The literature (PubMed) was searched for SNPs within genes relevant to asthma. The SNP-modified genes were converted to corresponding proteins, and their protein–protein interactions were retrieved from six different databases. This interaction network was analyzed using annotated vocabularies (ontologies), such as the Gene Ontology and Nature pathway interaction databases.
In total, 127 genes with SNPs related to asthma were found in the literature. The corresponding proteins were then entered into a large protein–protein interaction network with the help of various databases. Ninety-six SNP-related proteins had more than one interacting protein each, and a network containing 309 proteins and 644 connections was generated. This network was significantly enriched with a gene ontology entitled “protein binding” and several of its daughter categories, including receptor binding and cytokine binding, when compared with the background human proteome. In the detailed analysis, the chemokine network, including eight proteins and 13 toll-like receptors, were shown to interact with each other. Of great interest are the nonsynonymous SNPs which code for an alternative amino acid sequence of proteins and, of the toll-like receptor network, TLR1, TLR4, TLR5, TLR6, TLR10, IL4R, and IL13 are among these.
Protein binding, toll-like receptors, and chemokines dominated in the asthma-related protein interaction network. Systems level analysis of allergy-related mutations can provide new insights into the pathogenetic mechanisms of disease.
asthma; network; pathway; pathogenesis; single nucleotide polymorphisms
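The asthma abstract above reports enrichment of a Gene Ontology term relative to the background proteome but does not name the statistical test. One common choice is a one-sided hypergeometric test, sketched below as an assumption. Only the network size of 309 proteins comes from the abstract; the background size and the annotation counts are made-up placeholders.

```python
from math import comb


def hypergeom_sf(k, population, annotated, sample):
    """P(X >= k) when drawing `sample` proteins without replacement from
    `population`, of which `annotated` carry the term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(k, upper + 1)
    )
    return hits / total


# 309 is the network size reported in the abstract; the rest is hypothetical.
background_size = 20000      # placeholder proteome size
background_annotated = 8000  # placeholder count annotated "protein binding"
network_size = 309
network_annotated = 200      # placeholder count in the network

p_value = hypergeom_sf(network_annotated, background_size,
                       background_annotated, network_size)
print(f"one-sided enrichment p-value: {p_value:.3g}")
```

Exact integer arithmetic is used so that no external statistics package is needed. A real analysis would also correct for testing many ontology terms.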
Introduction. Apolipoprotein E (APOE) is an important risk factor for Alzheimer's disease (AD) and is present in 30–50% of patients who develop late-onset AD. Several single-nucleotide polymorphisms (SNPs) are present in the APOE gene and act as biomarkers for exploring the genetic basis of this disease. The objective of this study is to identify deleterious nsSNPs associated with the APOE gene. Methods. The SNPs were retrieved from dbSNP. Protein stability changes were calculated using I-Mutant. The potentially functional nonsynonymous (ns) SNPs and their effects on the protein were predicted by PolyPhen and SIFT, respectively. FASTSNP was used for functional analysis and estimation of risk score. The functional impact on the APOE protein was evaluated by using Swiss PDB viewer and the NOMAD-Ref server. Results. Six nsSNPs were found to be least stable by I-Mutant 2.0 with a DDG value of >−1.0. Four nsSNPs showed a highly deleterious tolerance index score of 0.00. Nine nsSNPs were found to be probably damaging with a position-specific independent counts (PSIC) score of ≥2.0. Seven nsSNPs were found to be highly polymorphic with a risk score of 3-4. The total energies and root-mean-square deviation (RMSD) values were higher for three mutant-type structures compared to the native modeled structure. Conclusion. We conclude that three nsSNPs, namely, rs11542041, rs11542040, and rs11542034, are potentially functional polymorphisms.
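The score cut-offs quoted in the APOE abstract above can be expressed as a small flagging function. This is a sketch under stated assumptions: the record type, field names, and example values are hypothetical, and the I-Mutant DDG criterion is left out because the abstract's wording of that threshold is ambiguous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NsSnpScores:
    rs_id: str
    sift_tolerance: float  # SIFT tolerance index
    polyphen_psic: float   # PolyPhen PSIC score
    fastsnp_risk: int      # FASTSNP risk score


def deleterious_flags(scores):
    """List the tools whose cut-off, as quoted in the abstract, is met."""
    flags = []
    if scores.sift_tolerance <= 0.0:      # "tolerance index score of 0.00"
        flags.append("SIFT")
    if scores.polyphen_psic >= 2.0:       # "PSIC score of >= 2.0"
        flags.append("PolyPhen")
    if 3 <= scores.fastsnp_risk <= 4:     # "risk score of 3-4"
        flags.append("FASTSNP")
    return flags


# Made-up records for illustration; these are not values from the study.
examples = [
    NsSnpScores("rs_example_1", 0.00, 2.4, 3),
    NsSnpScores("rs_example_2", 0.12, 1.1, 2),
    NsSnpScores("rs_example_3", 0.00, 1.5, 4),
]

for record in examples:
    print(record.rs_id, deleterious_flags(record))
```

Returning the list of tools rather than a single yes/no keeps the evidence visible, so a later step can decide how many flags are required.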
Restriction fragment length polymorphism (RFLP) analysis is a common laboratory method for genotyping single nucleotide polymorphisms (SNPs). Here, we describe a web-based software tool, named SNP-RFLPing, which provides the restriction enzymes for RFLP assays on a batch of SNPs and genes from the human, rat, and mouse genomes.
Three user-friendly input types are accepted for the SNP-RFLPing assay: 1) NCBI dbSNP "rs" or "ss" IDs; 2) NCBI Entrez gene ID and HUGO gene name; and 3) SNP-in-sequence in any format. These inputs are automatically converted to SNP-containing sequences and their complementary sequences for the selection of restriction enzymes. For each input gene, all SNPs with available RFLP restriction enzymes are reported, even if many SNPs exist. The SNP-RFLPing analysis provides the SNP contig position, heterozygosity, function, protein residue, and amino acid position for cSNPs, as well as commercial and non-commercial restriction enzymes.
This web-based software solves the input format problems found in similar software and greatly simplifies the procedure for identifying the RFLP enzyme. Mixed free-form input is convenient for users who perform the SNP-RFLPing assay. SNP-RFLPing offers a time-saving application for association studies in personalized medicine and is freely available online.
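The enzyme-selection idea described above, finding an enzyme whose recognition site is present with one SNP allele but not the other, can be sketched as follows. This is not the SNP-RFLPing implementation: the function names are hypothetical, the enzyme table is a tiny hand-picked subset of well-known recognition sites, and the example sequence is made up.

```python
# A few well-known recognition sites; a real tool would use a full enzyme database.
ENZYME_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "TaqI": "TCGA",
    "MspI": "CCGG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def discriminating_enzymes(left, alleles, right, sites=ENZYME_SITES):
    """Map enzyme name -> the single allele whose sequence contains its site.

    Both strands are checked by also searching for the reverse complement
    of each site. Enzymes that match every allele or none are skipped.
    """
    result = {}
    for name, site in sites.items():
        patterns = {site, reverse_complement(site)}
        matching = [
            allele for allele in alleles
            if any(p in (left + allele + right).upper() for p in patterns)
        ]
        if len(matching) == 1:
            result[name] = matching[0]
    return result


# Made-up SNP-in-sequence: AAGGAA[T/C]TCAAGG
print(discriminating_enzymes("AAGGAA", ("T", "C"), "TCAAGG"))
```

In this example only the T allele completes a GAATTC site, so EcoRI would distinguish the two alleles. A site that already occurs elsewhere in the flanking sequence matches both alleles and is therefore not reported.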
The abundance and identity of functional variation segregating in natural populations is paramount to dissecting the molecular basis of quantitative traits as well as human genetic diseases. Genome sequencing of multiple organisms of the same species provides an efficient means of cataloging rearrangements, insertion or deletion polymorphisms (InDels), and single-nucleotide polymorphisms (SNPs). While inbreeding depression and heterosis imply that a substantial amount of polymorphism is deleterious, distinguishing deleterious from neutral polymorphism remains a significant challenge. To identify deleterious and neutral DNA sequence variation within Saccharomyces cerevisiae, we sequenced the genomes of a vineyard strain and an oak tree strain and compared them to a reference genome. Among these three strains, 6% of the genome is variable, mostly attributable to variation in genome content that results from large InDels. Out of the 88,000 polymorphisms identified, 93% are SNPs and a small but significant fraction can be attributed to recent interspecific introgression and ectopic gene conversion. In comparison to the reference genome, there is substantial evidence for functional variation in gene content and structure that results from large InDels, frame-shifts, and polymorphic start and stop codons. Comparison of polymorphism to divergence reveals scant evidence for positive selection but an abundance of evidence for deleterious SNPs. We estimate that 12% of coding and 7% of noncoding SNPs are deleterious. Based on divergence among 11 yeast species, we identified 1,666 nonsynonymous SNPs that disrupt conserved amino acids and 1,863 noncoding SNPs that disrupt conserved noncoding motifs. The deleterious coding SNPs include those known to affect quantitative traits, and a subset of the deleterious noncoding SNPs occurs in the promoters of genes that show allele-specific expression, implying that some cis-regulatory SNPs are deleterious.
Our results show that the genome sequences of both closely and distantly related species provide a means of identifying deleterious polymorphisms that disrupt functionally conserved coding and noncoding sequences.
DNA sequence variation makes an important contribution to most traits that vary in natural populations. However, mapping mutations that underlie a trait of interest is a significant challenge. Genome sequencing of multiple organisms provides a complete list of DNA sequence differences responsible for any trait that differs among the organisms. Yet, distinguishing those DNA sequence variants that contribute to a trait from all other variants is not easy. Here, we sequence the genomes of two strains of yeast and, through comparisons with a reference genome, we catalog multiple types of DNA sequence variation among the three strains. Using a variety of comparative genomics methods, we show that a substantial fraction of DNA sequence variations has deleterious effects on fitness. Finally, we show that a subset of deleterious mutations is associated with changes in gene expression levels. Our results imply that comparative genomics methods will be a valuable approach to identifying DNA sequence changes underlying numerous traits of interest.
Human single nucleotide polymorphisms (SNPs) represent the most frequent type of human population DNA variation. One of the main goals of SNP research is to understand the genetics of the human phenotype variation and especially the genetic basis of human complex diseases. Non-synonymous coding SNPs (nsSNPs) comprise a group of SNPs that, together with SNPs in regulatory regions, are believed to have the highest impact on phenotype. Here we present a World Wide Web server to predict the effect of an nsSNP on protein structure and function. The prediction method enabled analysis of the publicly available SNP database HGVbase, which gave rise to a dataset of nsSNPs with predicted functionality. The dataset was further used to compare the effect of various structural and functional characteristics of amino acid substitutions responsible for phenotypic display of nsSNPs. We also studied the dependence of selective pressure on the structural and functional properties of proteins. We found that in our dataset the selection pressure against deleterious SNPs depends on the molecular function of the protein, although it is insensitive to several other protein features considered. The strongest selective pressure was detected for proteins involved in transcription regulation.
The identification of genetic variants that are responsible for human inherited diseases is a fundamental problem in human and medical genetics. As a typical type of genetic variation, nonsynonymous single-nucleotide polymorphisms (nsSNPs) occurring in protein coding regions may alter the encoded amino acid, potentially affect protein structure and function, and further result in human inherited diseases. Therefore, it is of great importance to develop computational approaches to facilitate the discrimination of deleterious nsSNPs from neutral ones. In this paper, we review databases that collect nsSNPs and summarize computational methods for the identification of deleterious nsSNPs. We classify the existing methods for characterizing nsSNPs into three categories (sequence based, structure based, and annotation based), and we introduce machine learning models for the prediction of deleterious nsSNPs. We further discuss methods for identifying deleterious nsSNPs in noncoding variants and those for dealing with rare variants.