1.  Chromatin Interaction Analysis with Paired-End Tag (ChIA-PET) sequencing technology and application 
BMC Genomics  2014;15(Suppl 12):S11.
Long-range chromatin interactions play an important role in transcription regulation. Chromatin Interaction Analysis with Paired-End-Tag sequencing (ChIA-PET) is an emerging technology that has unique advantages in chromatin interaction analysis, and thus provides insight into the study of transcription regulation.
This article introduces the experimental protocol and data analysis process of ChIA-PET, as well as discusses some applications using this technology. It also unveils the direction of future studies based on this technology.
Overall we show that ChIA-PET is the cornerstone to explore the three-dimensional (3D) chromatin structure, and certainly will lead the forthcoming wave of 3D genomics studies.
PMCID: PMC4303937  PMID: 25563301
2.  Molecular analysis of the benthos microbial community in Zavarzin thermal spring (Uzon Caldera, Kamchatka, Russia) 
BMC Genomics  2014;15(Suppl 12):S12.
Geothermal areas are of great interest for the study of microbial communities. The results of such investigations can be used in a variety of fields (ecology, microbiology, medicine) to answer fundamental questions, as well as those with practical benefits. Uzon caldera is located in the Uzon-Geyser depression that is situated in the centre of the Karym-Semyachin region of the East Kamchatka graben-synclinorium. The microbial communities of Zavarzin spring are well studied; however, its benthic microbial mat has not been previously described.
Pyrosequencing of the V3 region of the 16S rRNA gene was used to study the benthic microbial community of the Zavarzin thermal spring (Uzon Caldera, Kamchatka). The community is dominated by bacteria (>95% of all sequences), including thermophilic, chemoorganotrophic Caldiserica (33.0%) and Dictyoglomi (24.8%). The benthic community and the previously examined planktonic community of Zavarzin spring have qualitatively similar, but quantitatively different, compositions.
In this study, we performed a metagenomic analysis of the benthic microbial mat of Zavarzin spring. We compared this benthic community to microbial communities found in the water and of an integral probe consisting of water and bottom sediments. Various phylogenetic groups of microorganisms, including potentially new ones, represent the full-fledged trophic system of Zavarzin. A thorough geochemical study of the spring was performed.
PMCID: PMC4303939  PMID: 25563397
Microbiology; metagenome; Zavarzin thermal spring; Uzon Caldera; Kamchatka
3.  Time-course human urine proteomics in space-flight simulation experiments 
BMC Genomics  2014;15(Suppl 12):S2.
Long-term space travel simulation experiments have revealed different aspects of human metabolism, such as the complexity of NaCl salt balance. Detailed proteomics data were collected during the Mars105 isolation experiment, enabling deeper insight into the molecular processes involved.
We studied the abundance of about two thousand proteins extracted from urine samples of six volunteers collected weekly during a 105-day isolation experiment under controlled dietary conditions, including progressive reduction of salt consumption. Machine learning using self-organizing maps (SOM) in combination with different analysis tools was applied to describe the time trajectories of protein abundance in urine. The method enables a personalized and intuitive view of the physiological state of the volunteers. The abundance of more than one half of the proteins measured clearly changes in the course of the experiment. The trajectory splits roughly into three time ranges: an early (weeks 1-6), an intermediate (weeks 7-11) and a late one (weeks 12-15). Regulatory modes associated with distinct biological processes were identified using prior knowledge by applying enrichment and pathway flow analysis. Early protein activation modes can be related to immune response and inflammatory processes, activation at intermediate times to developmental and proliferative processes, and late activations to stress and responses to chemicals.
The protein abundance profiles support previous results about alternative mechanisms of salt storage in an osmotically inactive form. We hypothesize that reduced NaCl consumption of about 6 g/day will reduce or even prevent the activation of inflammatory processes observed in the early time range of isolation. SOM machine learning in combination with analysis methods of class discovery and functional annotation enables the straightforward analysis of complex proteomics data sets generated by means of mass spectrometry.
PMCID: PMC4303941  PMID: 25563515
4.  Molecular dynamics simulations of the Nip7 proteins from the marine deep- and shallow-water Pyrococcus species 
The identification of the mechanisms of adaptation of protein structures to extreme environmental conditions is a challenging task of structural biology. We performed molecular dynamics (MD) simulations of the Nip7 protein involved in RNA processing from the shallow-water (P. furiosus) and the deep-water (P. abyssi) marine hyperthermophilic archaea at different temperatures (300 and 373 K) and pressures (0.1, 50 and 100 MPa). The aim was to disclose similarities and differences between the deep- and shallow-sea protein models at different temperatures and pressures.
The current results demonstrate that the 3D models of the two proteins at all the examined values of pressures and temperatures are compact, stable and similar to the known crystal structure of the P. abyssi Nip7. The structural deviations and fluctuations in the polypeptide chain during the MD simulations were the most pronounced in the loop regions, their magnitude being larger for the C-terminal domain in both proteins. A number of highly mobile segments of the protein globule presumably involved in protein-protein interactions were identified. Regions of the polypeptide chain with significant difference in conformational dynamics between the deep- and shallow-water proteins were identified.
The results of our analysis demonstrated that, in the examined ranges of temperatures and pressures, an increase in temperature has a stronger effect on the dynamic properties of the protein globule than an increase in pressure. The conformational changes of both the deep- and shallow-sea protein models under increasing temperature and pressure are non-uniform. Our current results indicate that amino acid substitutions between shallow- and deep-water proteins only slightly affect the overall stability of the two proteins. Rather, they may affect the interactions of the Nip7 protein with its protein or RNA partners.
PMCID: PMC4209456  PMID: 25315147
Molecular dynamics simulation; Nip7 protein; High pressure; Adaptation; Salt bridges
5.  Genetic basis of olfactory cognition: extremely high level of DNA sequence polymorphism in promoter regions of the human olfactory receptor genes revealed using the 1000 Genomes Project dataset 
The molecular mechanism of olfactory cognition is very complicated. Olfactory cognition is initiated by olfactory receptor proteins (odorant receptors), which are activated by olfactory stimuli (ligands). Olfactory receptors are the initial players in the signal transduction cascade producing a nerve impulse, which is transmitted to the brain. The sensitivity to a particular ligand depends on the expression level of multiple proteins involved in the process of olfactory cognition: olfactory receptor proteins, proteins that participate in the signal transduction cascade, etc. The expression level of each gene is controlled by its regulatory regions, and especially by the promoter [a region of DNA about 100–1000 base pairs long located upstream of the transcription start site (TSS)]. We analyzed single nucleotide polymorphisms using human whole-genome data from the 1000 Genomes Project and revealed an extremely high level of single nucleotide polymorphisms in promoter regions of olfactory receptor genes and HLA genes. We hypothesized that the high level of polymorphisms in olfactory receptor promoters was responsible for the diversity in regulatory mechanisms controlling the expression levels of olfactory receptor proteins. Such diversity of regulatory mechanisms may cause the great variability of olfactory cognition of numerous environmental olfactory stimuli perceived by human beings (air pollutants, human body odors, culinary odors, etc.). In turn, this variability may provide a wide range of emotional and behavioral reactions related to the vast variety of olfactory stimuli.
PMCID: PMC3970011  PMID: 24715883
olfactory cognition; olfactory receptor gene; single nucleotide polymorphism; promoter; 1000 Genomes Project
6.  A New Stochastic Model for Subgenomic Hepatitis C Virus Replication Considers Drug Resistant Mutants 
PLoS ONE  2014;9(3):e91502.
As an RNA virus, hepatitis C virus (HCV) is able to rapidly acquire drug resistance, and for this reason the design of effective anti-HCV drugs is a real challenge. The HCV subgenomic replicon-containing cells are widely used for experimental studies of the HCV genome replication mechanisms, for drug testing in vitro and in studies of HCV drug resistance. The NS3/4A protease is essential for virus replication and, therefore, it is one of the most attractive targets for developing specific antiviral agents against HCV. We have developed a stochastic model of subgenomic HCV replicon replication, in which the emergence and selection of drug resistant mutant viral RNAs in replicon cells is taken into account. Incorporation into the model of key NS3 protease mutations leading to resistance to BILN-2061 (A156T, D168V, R155Q), VX-950 (A156S, A156T, T54A) and SCH 503034 (A156T, A156S, T54A) inhibitors allows us to describe the long-term dynamics of the viral RNA suppression for various inhibitor concentrations. We theoretically showed that the observable difference between the viral RNA kinetics for different inhibitor concentrations can be explained by differences in the replication rate and inhibitor sensitivity of the mutant RNAs. The pre-existing mutants of the NS3 protease contribute more significantly to the appearance of new resistant mutants during treatment with inhibitors than the wild-type replicon. The model can be used to interpret the results of anti-HCV drug testing on replicon systems, as well as to estimate the efficacy of potential drugs and predict optimal schemes of their usage.
PMCID: PMC3958367  PMID: 24643004
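As a rough illustration of what a stochastic replication model with resistant mutants can look like, the sketch below runs Gillespie's direct method on a toy two-species system (wild-type and mutant RNA). It is not the published model: the reaction scheme, the function name `simulate`, and every rate, inhibition and mutation value are assumptions chosen only for demonstration.

```python
import random

def simulate(w0, m0, t_max, rep_w=1.0, rep_m=0.8, decay=0.5, mu=1e-3,
             inhib_w=0.9, inhib_m=0.1, seed=0, max_steps=100_000):
    """Toy Gillespie simulation of wild-type (w) and mutant (m) RNA counts.

    All parameters are hypothetical: rep_* are replication rates, decay is a
    shared degradation rate, mu is the per-copy mutation probability, and
    inhib_* is the fraction of replication blocked by an inhibitor.
    """
    rng = random.Random(seed)
    t, w, m = 0.0, w0, m0
    trajectory = [(t, w, m)]
    for _ in range(max_steps):
        rates = [
            rep_w * (1.0 - inhib_w) * w,  # a wild-type template is copied
            rep_m * (1.0 - inhib_m) * m,  # a mutant template is copied
            decay * w,                    # a wild-type RNA is degraded
            decay * m,                    # a mutant RNA is degraded
        ]
        total = sum(rates)
        if total == 0.0:
            break  # nothing left to react
        t += rng.expovariate(total)  # exponential waiting time to next event
        if t > t_max:
            break
        pick = rng.random() * total  # choose a reaction proportionally to rate
        if pick < rates[0]:
            if rng.random() < mu:
                m += 1  # the copy carries a resistance mutation
            else:
                w += 1
        elif pick < rates[0] + rates[1]:
            m += 1
        elif pick < rates[0] + rates[1] + rates[2]:
            w -= 1
        elif m > 0:
            m -= 1
        trajectory.append((t, w, m))
    return trajectory
```

With strong inhibition of the wild type and weak inhibition of the mutant (the defaults above), repeated runs with different seeds can be compared to see how often a resistant population emerges; this mirrors the qualitative question in the abstract, not its quantitative results.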
7.  Abundances of microRNAs in human cells can be estimated as a function of the abundances of YRHB and RHHK tetranucleotides in these microRNAs as an ill-posed inverse problem solution 
Frontiers in Genetics  2013;4:122.
Mature microRNAs (miRNAs) are small endogenous non-coding RNAs 18–25 nt in length. They program the RNA Induced Silencing Complex (RISC) to make it inhibit either messenger RNAs or promoter DNAs. We have found that the mean abundance of miRNAs in Arabidopsis is correlated with the abundance of DRYD tetranucleotides near the 3′-end and the abundance of WRHB tetranucleotides in the center of the miRNA sequence. Based on this correlation, we have estimated miRNA abundances in seven organs of this plant, namely: inflorescences, stems, siliques, seedlings, roots, cauline, and rosette leaves. We have also found that the mean affinity of miRNAs for two proteins in the Argonaute family (Ago2 and Ago3) in man is correlated with the abundance of YRHB tetranucleotides near the 3′-end and that the preference of miRNAs for Ago2 is correlated with the abundance of RHHK tetranucleotides in the center of the miRNA sequence. This allowed us to obtain statistically significant estimates of miRNA abundances in human embryonic kidney cells, HEK293T. These findings in relation to two taxonomically distant entities (man and Arabidopsis) fit one another like pieces of a jigsaw puzzle, which allowed us to heuristically generalize them and state that the miRNA abundance in the human brain may be determined by the abundance of YRHB and RHHK tetranucleotides in these miRNAs.
PMCID: PMC3697047  PMID: 23847649
microRNA; Argonaute; miRNA/Ago-affinity; miRNA abundance; quantitative sequence-activity relationship (QSAR); ill-posed inverse problem; linear-additive approximation; “limiting stage” approximation
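The tetranucleotides named in this abstract (YRHB, RHHK, DRYD, WRHB) are written in IUPAC degenerate nucleotide codes. A minimal sketch of how such motifs could be counted in a miRNA sequence is given below. The function names are hypothetical, and the sketch counts matches over the whole sequence; it does not reproduce the paper's position-specific weighting (3′-end versus center) or its abundance estimates.

```python
import re

# IUPAC degenerate nucleotide codes, expressed in the RNA alphabet.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "W": "AU", "S": "CG", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}

def motif_to_regex(motif):
    """Translate a degenerate motif into a regex that finds overlapping hits."""
    body = "".join("[" + IUPAC[ch] + "]" for ch in motif.upper().replace("T", "U"))
    # A zero-width lookahead lets overlapping occurrences be counted.
    return re.compile("(?=" + body + ")")

def count_motif(seq, motif):
    """Count (possibly overlapping) occurrences of a degenerate motif."""
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA spelling
    return len(motif_to_regex(motif).findall(rna))
```

For example, `count_motif("CAUGCAUG", "YRHB")` counts the two windows `CAUG` that satisfy Y=C/U, R=A/G, H=A/C/U, B=C/G/U.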
8.  An Experimental Verification of the Predicted Effects of Promoter TATA-Box Polymorphisms Associated with Human Diseases on Interactions between the TATA Boxes and TATA-Binding Protein 
PLoS ONE  2013;8(2):e54626.
Human genome sequencing has resulted in a great body of data, including a stunningly large number of single nucleotide polymorphisms (SNPs) with unknown phenotypic manifestations. Identification and comprehensive analysis of regulatory SNPs in human gene promoters will help quantify the effects of these SNPs on human health. Based on our experimental and computer-aided study of SNPs in TATA boxes and the use of literature data, we have derived an equation for TBP/TATA equilibrium binding in three successive steps: TATA-binding protein (TBP) sliding along DNA due to their nonspecific affinity for each other ↔ recognition of the TATA box ↔ stabilization of the TBP/TATA complex. Using this equation, we have analyzed TATA boxes containing SNPs associated with human diseases and made in silico predictions of changes in TBP/TATA affinity. An electrophoretic mobility shift assay (EMSA)-based experimental study performed under the most standardized conditions demonstrates that the experimentally measured values are highly correlated with the predicted values: the coefficient of linear correlation, r, was 0.822 at a significance level of α < 10⁻⁷ for equilibrium KD values (−ln KD) and 0.785 at a significance level of α < 10⁻³ for changes in equilibrium KD (δ) due to SNPs in the TATA boxes. It has been demonstrated that the SNPs associated with increased risk of human diseases such as α-, β- and δ-thalassemia, myocardial infarction and thrombophlebitis, changes in immune response, amyotrophic lateral sclerosis, lung cancer and hemophilia B Leyden cause 2–4-fold changes in TBP/TATA affinity in most cases. The results obtained strongly suggest that the TBP/TATA equilibrium binding equation derived can be used for analysis of TATA-box sequences and identification of SNPs with a potential of being functionally important.
PMCID: PMC3570547  PMID: 23424617
9.  SitEx: a computer system for analysis of projections of protein functional sites on eukaryotic genes 
Nucleic Acids Research  2011;40(Database issue):D278-D283.
The search for interrelationships between the structural–functional organization of a protein and the exon structure of its encoding gene provides insights into issues concerned with the function, origin and evolution of genes and proteins. The functions of proteins and their domains are defined mostly by functional sites. The relation of the exon–intron structure of the gene to the protein functional sites has been little studied. Development of resources containing data on projections of protein functional sites on eukaryotic genes is needed. We have developed SitEx, a database that contains information on functional site amino acid positions in the exon structure of the encoding gene. SitEx is integrated with the BLAST and 3DExonScan programs. BLAST is used for searching sequence similarity between the query protein and polypeptides encoded by single exons stored in SitEx. The 3DExonScan program is used for searching for structural similarity of the given protein with these polypeptides using superimpositions. The developed computer system allows users to analyze the coding features of functional sites by taking into account the exon structure of the gene, to detect the exons involved in shuffling in protein evolution, and to design protein-engineering experiments. SitEx is accessible online. Currently, it contains information about 9994 functional sites presented in 2021 proteins described in proteomes of 17 organisms.
PMCID: PMC3245165  PMID: 22139920
10.  Molecular evolution of cyclin proteins in animals and fungi 
The passage through the cell cycle is controlled by complexes of cyclins, the regulatory units, with cyclin-dependent kinases, the catalytic units. It is also known that cyclins form several families, which differ considerably in primary structure from one eukaryotic organism to another. Despite these lines of evidence, the relationship between the evolution of cyclins and their function is an open issue. Here we present the results of our study on the molecular evolution of A-, B-, D-, E-type cyclin proteins in animals and fungi.
We constructed phylogenetic trees for these proteins, reconstructed their ancestral sequences, and analyzed patterns of amino acid replacements. The analysis of infrequently fixed atypical amino acid replacements in cyclins indicated that accelerated evolution proceeded predominantly during or after paralog duplication in animals and fungi, and that it was related to aromorphic changes in animals. It was also shown that evolutionary flexibility of cyclin function may be provided by consequential reorganization of regions on the protein surface remote from CDK binding sites in animal and fungal cyclins and by functional differentiation of paralogous cyclins formed in animal evolution.
The results suggested that changes in the number and/or nature of cyclin-binding proteins may underlie the evolutionary role of the alterations in the molecular structure of cyclins and their involvement in diverse molecular-genetic events.
PMCID: PMC3162929  PMID: 21798004
11.  Molecular evolution of the hyperthermophilic archaea of the Pyrococcus genus: analysis of adaptation to different environmental conditions 
BMC Genomics  2009;10:639.
Prokaryotic microorganisms are able to survive and proliferate in severe environmental conditions. The increasing number of complete sequences of prokaryotic genomes has provided the basis for studying the molecular mechanisms of their adaptation at the genomic level. We apply here a computer-based approach to compare the genomes and proteomes from P. furiosus, P. horikoshii, and P. abyssi to identify features of their molecular evolution related to adaptation strategy to diverse environmental conditions.
Phylogenetic analysis of rRNA genes from 26 Pyrococcus strains suggested that the divergence of P. furiosus, P. horikoshii and P. abyssi might have occurred from ancestral deep-sea organisms. It was demonstrated that the function of genes that have been subject to positive Darwinian selection is closely related to the abiotic and biotic conditions to which the archaea became adapted. Divergence of the P. furiosus archaea might have been due to loss of some genes involved in cell motility or signal transduction, and/or to evolution under positive selection of the genes for translation machinery. In the course of P. horikoshii divergence, positive selection was found to operate mainly on the transcription machinery; divergence of P. abyssi was associated with positive selection on the genes mainly involved in inorganic ion transport. Analysis of radical amino acid replacement rate in evolving P. furiosus, P. horikoshii and P. abyssi showed that the fixation rate was higher for radical substitutions relative to the volume of amino acid side-chain.
The current results give due credit to the important role of hydrostatic pressure as a cause of variability in the P. furiosus, P. horikoshii and P. abyssi genomes evolving in different habitats. Nevertheless, adaptation to pressure does not appear to be the sole factor ensuring adaptation to environment. For example, at the stage of the divergence of P. horikoshii and P. abyssi, an essential evolutionary role may be assigned to changes in the trophic chain, namely, acquisition of a consumer status at a high (P. horikoshii) or low level (P. abyssi).
PMCID: PMC2816203  PMID: 20042074
12.  Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions 
BMC Bioinformatics  2007;8:481.
Reliable transcription factor binding site (TFBS) prediction methods are essential for computer annotation of large amounts of genome sequence data. However, current methods to predict TFBSs are hampered by the high false-positive rates that occur when only sequence conservation at the core binding sites is considered.
To improve this situation, we have quantified the performance of several Position Weight Matrix (PWM) algorithms, using exhaustive approaches to find their optimal length and position. We applied these approaches to bio-medically important TFBSs involved in the regulation of cell growth and proliferation as well as in inflammatory, immune, and antiviral responses (NF-κB, ISGF3, IRF1, STAT1), obesity and lipid metabolism (PPAR, SREBP, HNF4), and regulation of steroidogenic (SF-1) and cell cycle (E2F) gene expression. We have also gained extra specificity using a method called SiteGA, which takes into account structural interactions within TFBS core and flanking regions, using a genetic algorithm (GA) with a discriminant function of locally positioned dinucleotide (LPD) frequencies.
To ensure a higher confidence in our approach, we applied resampling (jackknife and bootstrap) tests for the comparison. Optimized PWM and SiteGA showed similar recognition performance. We then applied SiteGA and optimized PWMs (both separately and together) to sequences in the Eukaryotic Promoter Database (EPD). The resulting SiteGA recognition models can now be used to search sequences for binding sites using the web tool, SiteGA.
Analysis of dependencies between close and distant LPDs revealed by SiteGA models has shown that the most significant correlations are between close LPDs, and are generally located in the core (footprint) region. A greater number of less significant correlations are mainly between distant LPDs, which spanned both core and flanking regions. When SiteGA and optimized PWM models were applied together, this substantially reduced false positives at least at higher stringencies.
Based on this analysis, SiteGA adds substantial specificity even to optimized PWMs and may be considered for large-scale genome analysis. It adds to the range of techniques available for TFBS prediction, and EPD analysis has led to a list of genes which appear to be regulated by the above TFs.
PMCID: PMC2265442  PMID: 18093302
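For readers unfamiliar with the baseline method named in this abstract, a position weight matrix can be sketched in a few lines. The example below builds a log-odds PWM from aligned sites and scans a sequence for the best-scoring window. It is a generic illustration, not the optimized PWMs or SiteGA from the paper: the pseudocount, the uniform background, the function names and the example sites are all assumptions.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned DNA sites."""
    length = len(sites[0])
    pwm = []
    for i in range(length):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        # log2 ratio of observed column frequency to background frequency
        pwm.append({base: math.log2((counts[base] / total) / background)
                    for base in "ACGT"})
    return pwm

def scan(seq, pwm):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    return [(i, sum(pwm[j][seq[i + j]] for j in range(width)))
            for i in range(len(seq) - width + 1)]

def best_hit(seq, pwm):
    """Return the (start, score) pair with the highest PWM score."""
    return max(scan(seq, pwm), key=lambda hit: hit[1])
```

A PWM scores each position independently, which is why methods that also model dependencies between positions, such as the dinucleotide-based approach described above, can add specificity.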
13.  AUG_hairpin: prediction of a downstream secondary structure influencing the recognition of a translation start site 
BMC Bioinformatics  2007;8:318.
The translation start site plays an important role in the control of translation efficiency of eukaryotic mRNAs. The recognition of the start AUG codon by eukaryotic ribosomes is considered to depend on its nucleotide context. However, the fraction of eukaryotic mRNAs with the start codon in a suboptimal context is relatively large. It may be expected that mRNA should possess some features providing efficient translation, including the proper recognition of a translation start site. It has been experimentally shown that a downstream hairpin located in certain positions with respect to start codon can compensate in part for the suboptimal AUG context and also increases translation from non-AUG initiation codons. Prediction of such a compensatory hairpin may be useful in the evaluation of eukaryotic mRNA translation properties.
We evaluated interdependency between the start codon context and mRNA secondary structure at the CDS beginning: it was found that a suboptimal start codon context significantly correlated with higher base pairing probabilities at positions 13 – 17 of CDS of human and mouse mRNAs. It is likely that the downstream hairpins are used to enhance translation of some mammalian mRNAs in vivo. Thus, we have developed a tool, AUG_hairpin, to predict local stem-loop structures located within the defined region at the beginning of mRNA coding part. The implemented algorithm is based on the available published experimental data on the CDS-located stem-loop structures influencing the recognition of upstream start codons.
An occurrence of a potential secondary structure downstream of start AUG codon in a suboptimal context (or downstream of a potential non-AUG start codon) may provide researchers with a testable assumption on the presence of additional regulatory signal influencing mRNA translation initiation rate and the start codon choice. AUG_hairpin, which has a convenient Web-interface with adjustable parameters, will make such an evaluation easy and efficient.
PMCID: PMC2001202  PMID: 17760957
14.  Recognition of interferon-inducible sites, promoters, and enhancers 
BMC Bioinformatics  2007;8:56.
Computational analysis of gene regulatory regions is important for prediction of functions of many uncharacterized genes. With this in mind, the search for target genes of interferon (IFN) induction is of interest. IFNs are multi-functional cytokines. Their effects are immunomodulatory, antiviral, antibacterial, and antitumor. The interaction of the IFNs with their cell surface receptors activates several transcription factors. Four regulatory factors, ISGF3, STAT1, IRF1, and NF-κB, are essential for the function of the IFN system. The aim of this work is the development of computational approaches for the recognition of DNA binding sites for these factors and computer programs for the prediction of the IFN-inducible regions.
We developed computational approaches to the recognition of the binding sites for ISGF3, STAT1, IRF1, and NF-κB. Analysis of the distribution of these binding sites demonstrated that the regions -500 upstream of the transcription start site in IFN-inducible genes are enriched in putative binding sites for these transcription factors. Based on selected combinations of the sites whose frequencies were significantly higher than in the other functional gene groups, we developed methods for the prediction of the IFN-inducible promoters and enhancers. We analyzed 1004 sequences of the IFN-inducible genes compiled using microarray data analyses and also about 10,000 human gene sequences from the EPD and RefSeq databases; 74 of 1,664 human genes annotated in EPD were significantly IFN-inducible.
Analyses of several control datasets demonstrated that the developed methods have a high accuracy of prediction of the IFN-inducible genes. Application of these methods to several datasets suggested that the number of the IFN-inducible genes is approximately 1500–2000 in the human genome.
PMCID: PMC1810324  PMID: 17309789
15.  ARGO: a web system for the detection of degenerate motifs and large-scale recognition of eukaryotic promoters 
Nucleic Acids Research  2005;33(Web Server issue):W417-W422.
Reliable recognition of the promoters in eukaryotic genomes remains an open issue. This is largely owing to the poor understanding of the features of the structural–functional organization of the eukaryotic promoters essential for their function and recognition. However, it was demonstrated that detection of ensembles of regulatory signals characteristic of specific promoter groups increases the accuracy of promoter recognition and prediction of specific expression features of the queried genes. The ARGO_Motifs package was developed for the detection of sets of region-specific degenerate oligonucleotide motifs in the regulatory regions of the eukaryotic genes. The ARGO_Viewer package was developed for the recognition of tissue-specific gene promoters based on the presence and distribution of oligonucleotide motifs obtained by the ARGO_Motifs program. Analysis and recognition of tissue-specific promoters in five gene samples demonstrated high quality of promoter recognition. The public version of the ARGO system is available online.
PMCID: PMC1160220  PMID: 15980502
16.  CRASP: a program for analysis of coordinated substitutions in multiple alignments of protein sequences 
Nucleic Acids Research  2004;32(Web Server issue):W64-W68.
Recent results suggest that during evolution certain substitutions at protein sites may occur in a coordinated manner due to interactions between amino acid residues. Information on these coordinated substitutions may be useful for analysis of protein structure and function. CRASP is an Internet-available software tool for the detection and analysis of coordinated substitutions in multiple alignments of protein sequences. The approach is based on estimation of the correlation coefficient between the values of a physicochemical parameter at a pair of positions of sequence alignment. The program enables the user to detect and analyze pairwise relationships between amino acid substitutions at protein sequence positions, estimate the contribution of the coordinated substitutions to the evolutionary invariance or variability in integral protein physicochemical characteristics such as the net charge of protein residues and hydrophobic core volume. The CRASP program is available at
PMCID: PMC441589  PMID: 15215352
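The core idea described here, a correlation coefficient between the values of a physicochemical parameter at two alignment positions, can be sketched as follows. This is an illustrative simplification, not the CRASP implementation: the toy side-chain charge scale, the function names and the threshold are assumptions, and gaps or unlisted residues are simply treated as 0.

```python
import math

# Toy physicochemical scale (assumption): formal side-chain charge.
CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}

def column_values(alignment, pos, scale=CHARGE):
    """Parameter values at one alignment column, one value per sequence."""
    return [scale.get(seq[pos], 0.0) for seq in alignment]

def pearson(xs, ys):
    """Pearson correlation; None if either column has no variation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def coordinated_pairs(alignment, threshold=0.8):
    """List column pairs (i, j, r) whose |r| meets the threshold."""
    length = len(alignment[0])
    pairs = []
    for i in range(length):
        for j in range(i + 1, length):
            r = pearson(column_values(alignment, i), column_values(alignment, j))
            if r is not None and abs(r) >= threshold:
                pairs.append((i, j, r))
    return pairs
```

A strongly negative correlation on a charge scale, as between the first and third columns of `["KAD", "KAD", "DAK", "DAK", "KAD"]`, is the kind of compensatory pattern that keeps an integral property such as net charge invariant.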
17.  NotI flanking sequences: a tool for gene discovery and verification of the human genome 
Nucleic Acids Research  2002;30(14):3163-3170.
A set of 22 551 unique human NotI flanking sequences (16.2 Mb) was generated. More than 40% of the set had regions with significant similarity to known proteins and expressed sequences. The data demonstrate that regions flanking NotI sites are less likely to form nucleosomes efficiently and resemble promoter regions. The draft human genome sequence contained 55.7% of the NotI flanking sequences, Celera’s database contained matches to 57.2% of the clones and all public databases (including non-human and previously sequenced NotI flanks) matched 89.2% of the NotI flanking sequences (identity ≥90% over at least 50 bp, data from December 2001). The data suggest that the shotgun sequencing approach used to generate the draft human genome sequence resulted in a bias against cloning and sequencing of NotI flanks. A rough estimation (based primarily on chromosomes 21 and 22) is that the human genome contains 15 000–20 000 NotI sites, of which 6000–9000 are unmethylated in any particular cell. The results of the study suggest that the existing tools for computational determination of CpG islands fail to identify a significant fraction of functional CpG islands, and unmethylated DNA stretches with a high frequency of CpG dinucleotides can be found even in regions with low CG content.
PMCID: PMC135748  PMID: 12136098
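The last point of this abstract can be made concrete with the two quantities that CpG-island detection tools commonly compute: GC content and the observed/expected CpG ratio, (number of CpG × length) / (number of C × number of G). The sketch below is a generic illustration; the function names and the threshold defaults (length of at least 200 bp, GC content above 0.5, ratio above 0.6) are widely used conventions assumed here, not values taken from this study.

```python
def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("C") + seq.count("G")) / len(seq)

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact number of CpGs.
    return seq.count("CG") * len(seq) / (c_count * g_count)

def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_oe=0.6):
    """Apply the assumed conventional thresholds to one sequence window."""
    return (len(seq) >= min_len
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_oe)
```

A window such as `"AT" * 90 + "CG" * 10` has a very high observed/expected CpG ratio but only 10% GC content, so a GC-content threshold rejects it; this is one way a CpG-rich stretch in a low-CG region can be missed.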
18.  ASPD (Artificially Selected Proteins/Peptides Database): a database of proteins and peptides evolved in vitro 
Nucleic Acids Research  2002;30(1):200-202.
ASPD is a new curated database that incorporates data on full-length proteins, protein domains and peptides that were obtained through in vitro directed evolution processes (mainly by means of phage display). At present, the ASPD database contains data on 195 selection experiments, which were described in 112 original papers. For each experiment, the following information is given: (i) description of the target for binding, (ii) description of the protein or peptide which serves as the template for library construction and description of the native protein which binds the target, (iii) links to the major proteomic databases (SWISS-PROT, PDB, PROSITE and ENZYME), (iv) keywords referring to the biological significance of the experiment, (v) aligned sequences of proteins or peptides retrieved through in vitro evolution and relevant native or constructed sequences, (vi) the number of rounds of selection/amplification and (vii) the number of occurrences of clones with each sequence. The literature data include a full reference, a link to the MEDLINE database and the name of the corresponding author with his email address. ASPD has a user-friendly interface which allows for simple queries using the names of proteins and ligands, as well as keywords describing the biological role of the interaction studied, and also for queries based on authors’ names. It is also possible to access the database by means of the SRS system, allowing complex queries. There is a BLAST search tool against the ASPD for looking directly for homologous sequences. Research tools of the ASPD allow the analysis of pairwise correlations in the sequences of proteins and peptides selected against one target. The URL for the ASPD database is
PMCID: PMC99101  PMID: 11752292
19.  ACTIVITY: a database on DNA/RNA sites activity adapted to apply sequence-activity relationships from one system to another 
Nucleic Acids Research  2001;29(1):284-287.
ACTIVITY is a database on DNA/RNA site sequences with known activity magnitudes, measurement systems, sequence-activity relationships under fixed experimental conditions and procedures to adapt these relationships from one measurement system to another. This database deposits information on DNA/RNA affinities to proteins and cell nuclear extracts, cutting efficiencies, gene transcription activity, mRNA translation efficiencies, mutability and other biological activities of natural sites occurring within promoters, mRNA leaders, and other regulatory regions in pro- and eukaryotic genomes, their mutant forms and synthetic analogues. Since activity magnitudes are heavily system-dependent, the current version of ACTIVITY is supplemented by three novel sub-databases: (i) SYSTEM, measurement systems; (ii) KNOWLEDGE, sequence-activity relationships under fixed experimental conditions; and (iii) CROSS_TEST, procedures adapting a relationship from one measurement system to another. These databases are useful in molecular biology, pharmacogenetics, metabolic engineering, drug design and biotechnology. The databases can be queried using SRS and are available through the Web,
PMCID: PMC29829  PMID: 11125114
20.  COMPEL: a database on composite regulatory elements providing combinatorial transcriptional regulation 
Nucleic Acids Research  2000;28(1):311-315.
COMPEL is a database on composite regulatory elements, the basic structures of combinatorial regulation. Composite regulatory elements contain two closely situated binding sites for distinct transcription factors and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor–DNA and factor–factor interactions contribute to the function of composite elements (CEs). Information about the structure of known CEs and specific gene regulation achieved through such CEs appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well. The structure of the relational model of COMPEL is determined by the concept of molecular structure and regulatory role of CEs. Based on the set of a particular CE, a program has been developed for searching potential CEs in gene regulatory regions. WWW search and browse routines were developed for COMPEL release 3.0. The COMPEL database equipped with the search and browse tools is available at . The program for prediction of potential CEs of NFAT type is available at and
PMCID: PMC102399  PMID: 10592258
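The definition of a composite element given in the abstract above (two closely situated binding sites for distinct transcription factors) can be sketched as a simple pair search. This is not the COMPEL prediction program: the `Site` record, the `max_gap` parameter and its default of 20 bp, and the function name are all assumptions made for illustration.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Site(NamedTuple):
    """Hypothetical predicted binding site (half-open interval, in bp)."""
    factor: str  # name of the transcription factor
    start: int   # first position of the site
    end: int     # position just past the last base of the site


def candidate_composite_elements(
    sites: List[Site], max_gap: int = 20
) -> List[Tuple[Site, Site]]:
    """Pair up sites for distinct factors that lie close together.

    max_gap is an assumed cutoff, not a value from the source.
    Overlapping sites give a negative gap and are also reported.
    """
    ordered = sorted(sites, key=lambda s: s.start)
    pairs = []
    for left, right in combinations(ordered, 2):
        if left.factor == right.factor:
            continue  # a composite element needs two distinct factors
        gap = right.start - left.end
        if gap <= max_gap:
            pairs.append((left, right))
    return pairs
```

The sketch checks only proximity and factor identity; the factor-factor interactions that the abstract says also contribute to composite element function are outside its scope.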
21.  SELEX_DB: an activated database on selected randomized DNA/RNA sequences addressed to genomic sequence annotation 
Nucleic Acids Research  2000;28(1):205-208.
SELEX_DB is a novel curated database on selected randomized DNA/RNA sequences designed for accumulation of experimental data on functional site sequences obtained by using SELEX and SELEX-like technologies from the pools of random sequences. This database also contains the programs for DNA/RNA functional site recognition within arbitrary nucleotide sequences. The first release of SELEX_DB has been installed under SRS and is available through the WWW at
PMCID: PMC102392  PMID: 10592226
