Background: Existing risk scores for predicting atherosclerotic renal artery stenosis (ARAS) are not simple to use. Our study aimed to develop a simple score to predict ARAS in Eastern patients with ischemic heart disease. Methods: Two data sources were used in this study. Using data from patients with acute myocardial infarction, we developed a clinical score for predicting ARAS. We then validated this score using data from patients with ischemic heart failure. Results: In multivariable logistic regression analysis, only age, hypertension, stroke or intermittent claudication, and serum creatinine were retained in the model. Receiver operating characteristic curves were plotted. In the first data source, the area under the curve was 0.808 for ARAS and 0.762 for bilateral ARAS. In the second data source, the area under the curve was 0.721 for ARAS and 0.827 for bilateral ARAS. A cutoff value of 35.0 yielded a sensitivity of 82.4% and a specificity of 51.0% for ARAS, and a sensitivity of 78.9% and a specificity of 47.1% for bilateral ARAS. In the second data source, this cutoff value yielded a sensitivity of 85.0% and a specificity of 30.5% for ARAS, and a sensitivity of 85.7% and a specificity of 17.5% for bilateral ARAS. Conclusions: We have developed a simple score for predicting ARAS in Eastern patients with ischemic heart disease, with acceptable sensitivity and specificity. The score still needs to be validated in the general population and in patients without coronary heart disease.
Renal artery obstruction; heart failure; systolic; coronary artery disease; myocardial infarction
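The sensitivity, specificity and area under the curve reported for the ARAS score can be illustrated with a minimal sketch. The function names, the toy scores and the convention that a score at or above the cutoff counts as positive are assumptions made for illustration; the abstract does not state how ties at the cutoff were handled, and the data below are invented.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], has_aras: Sequence[bool], cutoff: float
) -> Tuple[float, float]:
    """Treat score >= cutoff as a positive prediction (an assumption)."""
    tp = fn = tn = fp = 0
    for score, diseased in zip(scores, has_aras):
        predicted = score >= cutoff
        if diseased and predicted:
            tp += 1
        elif diseased:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def area_under_curve(scores: Sequence[float], has_aras: Sequence[bool]) -> float:
    """AUC as the fraction of (diseased, healthy) pairs ranked correctly."""
    pos = [s for s, d in zip(scores, has_aras) if d]
    neg = [s for s, d in zip(scores, has_aras) if not d]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data, not taken from the study.
toy_scores = [20.0, 30.0, 36.0, 40.0, 50.0, 34.0, 45.0, 25.0]
toy_labels = [False, False, False, True, True, True, True, False]

sens, spec = sensitivity_specificity(toy_scores, toy_labels, cutoff=35.0)
auc = area_under_curve(toy_scores, toy_labels)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.4f}")
```

Lowering the cutoff raises sensitivity at the cost of specificity, which is the trade-off visible in the reported figures for the cutoff of 35.0.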
Identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
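The two kinds of signal named above, expression-level continuity and variance, can be sketched for a single pair of adjacent genes. This is only an illustration under stated assumptions: the abstract does not define the exact features or the classifier used by SeqTU, so the function `pair_features`, its formulas and the toy coverage values are all hypothetical.

```python
from statistics import mean, pvariance


def pair_features(coverage, gene_a, gene_b):
    """Return (continuity, variance) for two adjacent same-strand genes.

    coverage: list of per-base read depths on one strand.
    gene_a, gene_b: (start, end) half-open index ranges, gene_a upstream.
    Both formulas are hypothetical stand-ins, not SeqTU's definitions.
    """
    depth_a = coverage[gene_a[0]:gene_a[1]]
    depth_b = coverage[gene_b[0]:gene_b[1]]
    gap = coverage[gene_a[1]:gene_b[0]]
    flank = (mean(depth_a) + mean(depth_b)) / 2
    if flank == 0:
        continuity = 0.0          # neither gene is expressed
    elif len(gap) == 0:
        continuity = 1.0          # no intergenic region to break the signal
    else:
        continuity = mean(gap) / flank
    # Variance of depth across the whole span covering both genes.
    variance = pvariance(coverage[gene_a[0]:gene_b[1]])
    return continuity, variance


# Invented toy coverage: one continuous transcript versus a clear break.
joined = [10] * 5 + [9] * 3 + [11] * 5
split = [10] * 5 + [0] * 3 + [11] * 5

cont_joined, var_joined = pair_features(joined, (0, 5), (8, 13))
cont_split, var_split = pair_features(split, (0, 5), (8, 13))
print(cont_joined, var_joined)
print(cont_split, var_split)
```

In a setup like this, feature vectors for gene pairs with known operon status would be used to train a classifier, and a high continuity with low variance would argue for placing both genes in the same TU.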
AIM: To investigate the effects and underlying mechanisms of resveratrol and genistein on contractile responses of rat gastrointestinal smooth muscle.
METHODS: Isolated strips of gastrointestinal smooth muscle from Sprague-Dawley rats were suspended in organ baths containing Krebs solution. Smooth muscle contractility was measured before and after incubation with resveratrol and genistein, and the underlying mechanisms were studied by co-incubation with various inhibitors.
RESULTS: Resveratrol and genistein dose-dependently decreased the resting tension and reduced the mean contractile amplitude of gastrointestinal smooth muscle. The estrogen receptor blockers ICI 182780 and tamoxifen failed to alter the inhibitory effects induced by resveratrol and genistein. However, these effects were attenuated by inhibitors of α-adrenergic receptors (phentolamine), nitric oxide synthase (levorotatory-NG-nitroarginine), ATP-sensitive potassium channels (glibenclamide), and cyclic adenosine monophosphate (SQ22536). In high-K+/Ca2+-free Krebs solution containing 0.01 mmol/L egtazic acid, resveratrol and genistein reduced the contractile responses to CaCl2 and shifted its cumulative concentration-response curve rightward.
CONCLUSION: Resveratrol and genistein relax gastrointestinal smooth muscle via α-adrenergic receptors, nitric oxide and cyclic adenosine monophosphate pathways, ATP-sensitive potassium channels, and inhibition of L-type Ca2+ channels.
Ca2+ channel; Gastrointestinal; Motility; Phytoestrogen; Smooth muscle
About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. With the evolution of genomes, operon structures are undergoing changes which could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG) and a pair of nodes will be connected if any two genes, from the corresponding two OGGs respectively, are located in the same operon as immediate neighbors in any of the considered genomes. Through identifying the connected components in the above graph, we found that genes in a connected component are likely to be functionally related and these identified components tend to form treelike topology, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation as follows. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as ribosome; and (ii) a star-structure component not only groups related genes together, but also reflects the key functional roles of the central node of this component, such as the ABC transporter with a transporter permease and substrate-binding proteins surrounding it. Most interestingly, the genes from organisms with highly diverse living environments, i.e., biomass degraders and animal pathogens of clostridia in our study, can be clearly classified into different topological groups on some connected components.
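The graph construction described above can be sketched as follows. The input layout (a mapping from genome name to a list of operons, each an ordered list of OGG identifiers), the function names and the toy identifiers are assumptions for illustration, and the simple degree-based test for paths and stars is one possible reading of those topologies rather than the authors' procedure.

```python
from collections import defaultdict, deque


def build_ogg_graph(genomes):
    """Connect two OGGs if their genes are immediate neighbors in any operon."""
    adjacency = defaultdict(set)
    for operons in genomes.values():
        for operon in operons:
            for ogg in operon:
                adjacency[ogg]  # register the node even without neighbors
            for left, right in zip(operon, operon[1:]):
                if left != right:
                    adjacency[left].add(right)
                    adjacency[right].add(left)
    return dict(adjacency)


def connected_components(adjacency):
    """Breadth-first search over every unvisited node."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def classify(component, adjacency):
    """Label a component as singleton, path, star, tree or other."""
    n = len(component)
    degrees = sorted(len(adjacency[v]) for v in component)
    if n == 1:
        return "singleton"
    if sum(degrees) // 2 != n - 1:
        return "other"            # contains a cycle, so not a tree
    if degrees[-1] <= 2:
        return "path"
    if degrees[-1] == n - 1:
        return "star"
    return "tree"


# Invented toy operons; identifiers are placeholders, not real annotations.
toy_genomes = {
    "genome1": [["rpsA", "rpsB", "rpsC", "rpsD"], ["perm", "sbp1"]],
    "genome2": [["perm", "sbp2"], ["sbp3", "perm"], ["rpsB", "rpsC"]],
}
graph = build_ogg_graph(toy_genomes)
shapes = sorted(classify(c, graph) for c in connected_components(graph))
print(shapes)
```

In the toy input the ribosomal-style OGGs form a path, while the permease sits at the center of a star with three substrate-binding partners, mirroring the two cases (i) and (ii) in the text.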
DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular.
As biotechnology advances rapidly, a tremendous amount of cancer genetic data has become available, providing an unprecedented opportunity for understanding the genetic mechanisms of cancer. To understand the effects of duplications and deletions on cancer progression, two genomes (normal and tumor) were sequenced from each of five stomach cancer patients in different stages (I, II, III and IV). We developed a phylogenetic model for analyzing the stomach cancer data. The model assumes that duplications and deletions occur according to a continuous-time Markov chain along the branches of a phylogenetic tree with five extended branches leading to the tumor genomes. Moreover, the coalescence times of the phylogenetic tree follow a coalescent process. A simulation study suggests that the maximum likelihood approach can accurately estimate the parameters of the phylogenetic model. The phylogenetic model was then applied to the stomach cancer data. We found that the expected number of changes (duplications and deletions) per gene for the tumor genomes is significantly higher than that for the normal genomes. A goodness-of-fit test suggests that the phylogenetic model with constant duplication and deletion rates can adequately fit the duplication data for the normal genomes. The analysis found nine duplicated genes that are significantly associated with stomach cancer.
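A continuous-time Markov chain for duplication and deletion along one branch can be simulated in a few lines. The sketch below assumes a linear birth-death process in which each gene copy duplicates at rate `dup_rate` and is deleted at rate `del_rate`; the abstract does not give the exact parameterization, so this form, the function name and the rate values are assumptions for illustration only.

```python
import math
import random


def simulate_copy_number(n0, dup_rate, del_rate, branch_length, rng):
    """Gillespie simulation of gene copy number along a single branch.

    Returns (final copy number, number of duplication/deletion events).
    Zero copies is an absorbing state in this assumed model.
    """
    copies, elapsed, changes = n0, 0.0, 0
    per_copy_rate = dup_rate + del_rate
    while copies > 0 and per_copy_rate > 0:
        # Waiting time to the next event is exponential in the total rate.
        elapsed += rng.expovariate(copies * per_copy_rate)
        if elapsed > branch_length:
            break
        if rng.random() < dup_rate / per_copy_rate:
            copies += 1
        else:
            copies -= 1
        changes += 1
    return copies, changes


rng = random.Random(0)  # fixed seed so the run is repeatable
runs = [simulate_copy_number(1, 0.5, 0.3, 1.0, rng) for _ in range(20000)]
mean_copies = sum(copies for copies, _ in runs) / len(runs)
mean_changes = sum(changes for _, changes in runs) / len(runs)

# For a linear birth-death process, E[N(t)] = n0 * exp((dup - del) * t).
expected_copies = math.exp((0.5 - 0.3) * 1.0)
print(f"simulated mean copies {mean_copies:.3f}, theory {expected_copies:.3f}")
print(f"mean number of changes per gene {mean_changes:.3f}")
```

Comparing the mean number of changes per gene between branches with different rates gives the flavor of the tumor-versus-normal comparison, although the study itself estimates the rates by maximum likelihood rather than by simulation.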
We have recently developed a new version of the DOOR operon database, DOOR 2.0, which is available online at http://csbl.bmb.uga.edu/DOOR/ and will be updated on a regular basis. DOOR 2.0 contains genome-scale operons for 2072 prokaryotes with complete genomes, three times the number of genomes covered in the previous version published in 2009. DOOR 2.0 has a number of new features, compared with its previous version, including (i) more than 250 000 transcription units, experimentally validated or computationally predicted based on RNA-seq data, providing a dynamic functional view of the underlying operons; (ii) an integrated operon-centric data resource that provides not only operons for each covered genome but also their functional and regulatory information such as their cis-regulatory binding sites for transcription initiation and termination, gene expression levels estimated based on RNA-seq data and conservation information across multiple genomes; (iii) a high-performance web service for online operon prediction on user-provided genomic sequences; (iv) an intuitive genome browser to support visualization of user-selected data; and (v) a keyword-based Google-like search engine for finding the needed information intuitively and rapidly in this database.
The thermophilic anaerobe Clostridium thermocellum is a candidate consolidated bioprocessing (CBP) biocatalyst for cellulosic ethanol production. It is capable of both cellulose solubilization and its fermentation to produce lignocellulosic ethanol. Intolerance to stresses routinely encountered during industrial fermentations may hinder the commercial development of this organism. A previous C. thermocellum ethanol stress study showed that the largest transcriptomic response was in genes and proteins related to nitrogen uptake and metabolism.
In this study, C. thermocellum was grown to mid-exponential phase and treated with furfural or heat to a final concentration of 3 g.L-1 or 68°C respectively to investigate general and specific physiological and regulatory stress responses. Samples were taken at 10, 30, 60 and 120 min post-shock, and from untreated control fermentations, for transcriptomic analyses and fermentation product determinations and compared to a published dataset from an ethanol stress study. Urea uptake genes were induced following furfural stress, but not to the same extent as ethanol stress and transcription from these genes was largely unaffected by heat stress. The largest transcriptomic response to furfural stress was genes for sulfate transporter subunits and enzymes in the sulfate assimilatory pathway, although these genes were also affected late in the heat and ethanol stress responses. Lactate production was higher in furfural treated culture, although the lactate dehydrogenase gene was not differentially expressed under this condition. Other redox related genes such as a copy of the rex gene, a bifunctional acetaldehyde-CoA/alcohol dehydrogenase and adjacent genes did show lower expression after furfural stress compared to the control, heat and ethanol fermentation profiles. Heat stress induced expression from chaperone related genes and overlap was observed with the responses to the other stresses. This study suggests the involvement of C. thermocellum genes with functions in oxidative stress protection, electron transfer, detoxification, sulfur and nitrogen acquisition, and DNA repair mechanisms in its stress responses and the use of different regulatory networks to coordinate and control adaptation.
This study has identified C. thermocellum gene regulatory motifs and aspects of physiology and gene regulation for further study. The nexus between future systems biology studies and recently developed genetic tools for C. thermocellum offers the potential for more rapid strain development and for broader insights into this organism’s physiology and regulation.
Biomass; Recalcitrance; Inhibitor; Stress; DNA microarray; Regulation; Regulatory motif
Paris polyphylla var. yunnanensis is an important medicinal plant. Seed dormancy is one of the main factors restricting artificial cultivation. The molecular mechanisms of seed dormancy remain unclear, and little genomic or transcriptome data are available for this plant.
In this study, massive parallel pyrosequencing on the Roche 454-GS FLX Titanium platform was used to generate a substantial sequence dataset for the P. polyphylla embryo. 369,496 high quality reads were obtained, ranging from 50 to 1146 bp, with a mean of 219 bp. These reads were assembled into 47,768 unigenes, which included 16,069 contigs and 31,699 singletons. Using BLASTX searches of public databases, 15,757 (32.3%) unique transcripts were identified. Gene Ontology and Cluster of Orthologous Groups of proteins annotations revealed that these transcripts were broadly representative of the P. polyphylla embryo transcriptome. The Kyoto Encyclopedia of Genes and Genomes assigned 5961 of the unique sequences to specific metabolic pathways. Relative expression levels analysis showed that eleven phytohormone-related genes and five other genes have different expression patterns in the embryo and endosperm in the seed stratification process.
Gene annotation and quantitative RT-PCR expression analysis identified 464 transcripts that may be involved in phytohormone catabolism and biosynthesis, hormone signaling, seed dormancy, seed maturation, cell wall growth and circadian rhythms. In particular, the relative expression analysis of sixteen genes (CYP707A, NCED, GA20ox2, GA20ox3, ABI2, PP2C, ARP3, ARP7, IAAH, IAAS, BRRK, DRM, ELF1, ELF2, SFR6, and SUS) in embryo and endosperm and at two temperatures indicated that these genes may be candidates for clarifying the molecular basis of seed dormancy in P. polyphylla var. yunnanensis.
Embryo; Stratification; Seed dormancy; High-throughput sequencing; Paris polyphylla
The circular chromosome of Escherichia coli has been suggested to fold into a collection of sequentially consecutive domains, genes in each of which tend to be co-expressed. It has also been suggested that such domains, forming a partition of the genome, are dynamic with respect to the physiological conditions. However, little is known about which DNA segments of the E. coli genome form these domains and what determines the boundaries of these domain segments. We present a computational model here to partition the circular genome into consecutive segments, theoretically suggestive of the physically folded supercoiled domains, along with a method for predicting such domains under specified conditions. Our model is based on a hypothesis that the genome of E. coli is partitioned into a set of folding domains so that the total number of unfoldings of these domains in the folded chromosome is minimized, where a domain is unfolded when a biological pathway, consisting of genes encoded in this DNA segment, is being activated transcriptionally. Based on this hypothesis, we have predicted seven distinct sets of such domains along the E. coli genome for seven physiological conditions, namely exponential growth, stationary growth, anaerobiosis, heat shock, oxidative stress, nitrogen limitation and SOS responses. These predicted folding domains are highly stable statistically and are generally consistent with the experimental data of DNA binding sites of the nucleoid-associated proteins that assist the folding of these domains, as well as genome-scale protein occupancy profiles, hence supporting our proposed model. Our study established for the first time a strong link between a folded E. coli chromosomal structure and the encoded biological pathways and their activation frequencies.
The role of the peroxisome proliferator-activated receptor-δ (PPAR δ) gene in colon carcinogenesis remains highly controversial. Here, we established a nude mouse xenograft model using the human colon cancer cell line KM12C with PPAR δ either silenced or normal. Compared with those of the PPAR δ-normal group, the xenografts in the PPAR δ-silenced group grew significantly larger and heavier, with less differentiation, increased cell proliferation, increased expression of vascular endothelial growth factor (VEGF) and a similar apoptosis index. After treatment with the specific VEGF inhibitor bevacizumab, the growth and proliferation of xenografts decreased in both groups but remained significantly higher in the PPAR δ-silenced group than in the PPAR δ-normal group. Administration of a PPAR δ agonist significantly decreased VEGF expression in PPAR δ-normal KM12C cells but not in PPAR δ-silenced cells. These findings demonstrate that knockdown of PPAR δ promotes the growth of colon cancer by reducing differentiation and accelerating the proliferation and VEGF expression of tumor cells in vivo, and that it reduces tumor sensitivity to bevacizumab. This study indicates that PPAR δ attenuates colon carcinogenesis.
Extremely thermophilic bacteria of the genus Caldicellulosiruptor utilize carbohydrate components of plant cell walls, including cellulose and hemicellulose, facilitated by a diverse set of glycoside hydrolases (GHs). From a biofuel perspective, this capability is crucial for deconstruction of plant biomass into fermentable sugars. While all species from the genus grow on xylan and acid-pretreated switchgrass, growth on crystalline cellulose is variable. The basis for this variability was examined using microbiological, genomic, and proteomic analyses of eight globally diverse Caldicellulosiruptor species. The open Caldicellulosiruptor pangenome (4,009 open reading frames [ORFs]) encodes 106 GHs, representing 43 GH families, but only 26 GHs from 17 families are included in the core (noncellulosic) genome (1,543 ORFs). Differentiating the strongly cellulolytic Caldicellulosiruptor species from the others is a specific genomic locus that encodes multidomain cellulases from GH families 9 and 48, which are associated with cellulose-binding modules. This locus also encodes a novel adhesin associated with type IV pili, which was identified in the exoproteome bound to crystalline cellulose. Taking into account the core genomes, pangenomes, and individual genomes, the ancestral Caldicellulosiruptor was likely cellulolytic and evolved, in some cases, into species that lost the ability to degrade crystalline cellulose while maintaining the capacity to hydrolyze amorphous cellulose and hemicellulose.
The availability of a large number of sequenced bacterial genomes allows researchers not only to derive functional and regulatory information about specific organisms but also to study the fundamental properties of the organization of a genome. Here we address an important and challenging question regarding the global arrangement of operons in a bacterial genome: why are the operons in a bacterial genome arranged the way they are? We have previously studied this question and found that operons of more frequently activated pathways tend to be more clustered together in a genome. Specifically, we developed a simple sequential distance-based pseudo energy function and found that, for a variety of bacterial genomes, the arrangement of operons tends to minimize this clusteredness function (C value) in comparison with artificially generated alternatives. Here we extend our previous work and report a number of new observations: (a) operons of the same pathways tend to group into a few clusters rather than one; and (b) the global arrangement of these operon clusters tends to minimize a new “energy” function (C+ value) that reflects the efficiency of the transcriptional activation of the encoded pathways. These observations provide insights for further study of the genomic organization of genes in bacteria.
Global genomic arrangement; Bacterial genome; Chromosomal supercoils; Biological pathways; Gene expression
Polo-like kinases (PLKs) play an essential role in the ordered execution of mitotic events, and four mammalian PLK family members have been identified. Accumulating evidence indicates that PLK1 is an attractive target for anticancer drugs. In this paper, a series of beta-carboline derivatives were synthesized and three compounds, DH281, DH285 and DH287, were identified as potent new PLK inhibitors. We employed various biochemical and cellular approaches to determine the effects of these compounds on the activity of PLK1 and other mitotic kinases and on cell cycle progression. We found that these three compounds could selectively inhibit the kinase activity of purified PLK1, PLK2 and PLK3 in vitro. They show strong antitumor activity against a number of cancer cell lines with relatively low micromolar IC50s, but are relatively less toxic to non-cancer cells (MRC5). Moreover, these compounds could induce obvious accumulation of HeLa cells in G2/M and S phases and trigger apoptosis. Although MRC5 cells show clear S-phase arrest after treatment with these compounds, the G2/M arrest and apoptosis are less significant, indicating a difference in sensitivity between normal and cancer cells. We also found that HeLa cells treated with these drugs exhibit monopolar spindles and increased Wee1 protein levels, which are characteristic of cells treated with PLK1 inhibitors. Together, these results demonstrate that the beta-carboline compounds DH281, DH285 and DH287 are new PLK inhibitors with potential for cancer treatment.
Identification of the novel genes relevant to plant cell-wall (PCW) synthesis represents a highly important and challenging problem. Although substantial efforts have been invested into studying this problem, the vast majority of the PCW related genes remain unknown.
Here we present a computational study focused on identification of novel PCW genes in Arabidopsis based on co-expression analyses of transcriptomic data collected under 351 conditions, using a bi-clustering technique. Our analysis identified 217 gene clusters (modules) that are highly co-expressed under some experimental conditions, each containing at least one gene annotated as PCW related according to the Purdue Cell Wall Gene Families database. These co-expression modules cover 349 known/annotated PCW genes and 2,438 new candidates. For each candidate gene, we annotated the specific PCW synthesis stages in which it is involved and predicted its detailed function. In addition, for the co-expressed genes in each module, we predicted and analyzed their cis regulatory motifs in the promoters using our motif discovery pipeline, providing strong evidence that the genes in each co-expression module are transcriptionally co-regulated. From all the co-expression modules, we infer that 108 modules are related to four major PCW synthesis components, using three complementary methods.
We believe our approach and data presented here will be useful for further identification and characterization of PCW genes. All the predicted PCW genes, co-expression modules, motifs and their annotations are available at a web-based database: http://csbl.bmb.uga.edu/publications/materials/shanwang/CWRPdb/index.html.
Plant cell wall; Arabidopsis; Co-expression network analysis; Bi-clustering; Cis regulatory motifs
The aim of this study was to detect differences in the expression levels of melanoma-associated antigen D4 (MAGED4) mRNA between non-small cell lung cancer (NSCLC) tissues and normal tissues, and to compare the expression levels of MAGED4 among subgroups of tumor patients. Patients were grouped according to age, gender, smoking history, tumor size, pathological classification, degree of lung cancer cell differentiation and presence of lymph node metastasis. The expression levels of MAGED4 were detected using real-time fluorescence quantitative PCR. MAGED4 expression was higher in squamous cell carcinomas than in adenocarcinomas (P<0.05), in poorly differentiated tissues than in well-differentiated tissues (P<0.05), and in patients with lymph node metastasis than in patients without lymph node metastasis (P<0.05). MAGED4 may be used as a specific antigen for NSCLC to help improve diagnosis, prognosis and immunological therapy outcomes in lung cancer patients.
MAGED4 gene; non-small cell lung cancer; polymerase chain reaction
Biclustering is a powerful technique for identification of co-expressed gene groups under any (unspecified) substantial subset of given experimental conditions, which can be used for elucidation of transcriptionally co-regulated genes.
We have previously developed a biclustering algorithm, QUBIC, which can solve more general biclustering problems than previous biclustering algorithms. To fully utilize the analysis power the algorithm provides, we have developed a web server, QServer, for prediction, computational validation and analyses of co-expressed gene clusters. Specifically, the QServer has the following capabilities in addition to biclustering by QUBIC: (i) prediction and assessment of conserved cis regulatory motifs in promoter sequences of the predicted co-expressed genes; (ii) functional enrichment analyses of the predicted co-expressed gene clusters using Gene Ontology (GO) terms, and (iii) visualization capabilities in support of interactive biclustering analyses. QServer supports the biclustering and functional analysis for a wide range of organisms, including human, mouse, Arabidopsis, bacteria and archaea, whose underlying genome database will be continuously updated.
We believe that QServer provides an easy-to-use and highly effective platform useful for hypothesis formulation and testing related to transcription co-regulation.
Extracellular matrix metalloproteinase inducer (EMMPRIN) is a transmembrane glycoprotein that is involved in tumor invasion by stimulating matrix metalloproteinase (MMP) expression. Our previous immunohistochemical study found that the expression of EMMPRIN in salivary adenoid cystic carcinoma (SACC) was positively correlated with tumor perineural and perivascular invasion. The present study was designed to further investigate the role of EMMPRIN in the invasion of SACC. Western blot results showed that EMMPRIN was upregulated in the highly metastatic SACC cell line SACC-LM, compared to SACC-83, a SACC cell line with low metastatic ability. Blocking EMMPRIN with its antibody significantly decreased the adhesion, the secretion of MMP-2 and MMP-9, and the invasion activity of SACC-LM cells in vitro (P<0.01). Co-cultures of SACC-LM cells with fibroblasts produced significantly elevated levels of MMP-2 and MMP-9 and promoted the in vitro invasion activity of SACC-LM cells, compared with cultures of SACC-LM cells alone (P<0.01). These results indicate that EMMPRIN may play an important role in the invasion of SACC by stimulating the expression of MMP-2 and MMP-9 in tumor and stromal cells.
salivary adenoid cystic carcinoma; invasion; extracellular matrix metalloproteinase inducer; matrix metalloproteinases
Objective: In this paper, we investigated the effect of the traditional Chinese medicine Chaiqin Chengqi Decoction (CQCQD) on serum cytokines in acute pancreatitis (AP) patients. Methods: Peripheral blood samples from 107 AP patients were collected within the first 48 h of AP onset and on the 10th day of CQCQD treatment. Control samples were collected from 20 healthy individuals. Serum proinflammatory cytokines tumor necrosis factor-α (TNF-α) and interleukin-6 (IL-6), and anti-inflammatory cytokines IL-10 and IL-1β receptor antagonist (IL-1ra) were examined using the Luminex 100 system. Results: Within the first 48 h of AP onset, IL-6 and IL-1ra levels in severe AP (SAP) patients were significantly higher than those in mild AP (MAP) patients, but IL-10 levels in SAP patients were significantly lower than those in MAP patients. Proinflammatory cytokine IL-6 was significantly decreased after CQCQD treatment (P<0.05), especially in SAP patients (n=25 of 36, P<0.05). The hospitalization time of SAP patients was shortened significantly when serum IL-6 decreased after CQCQD treatment (P<0.05). Conclusions: CQCQD decreased proinflammatory cytokine IL-6 levels in AP patients.
Acute pancreatitis; Cytokine; Chaiqin Chengqi Decoction (CQCQD)
Existing methods for orthologous gene mapping suffer from two general problems: (i) they are computationally too slow and their results are difficult to interpret for automated large-scale applications when based on phylogenetic analyses; or (ii) they are too prone to making mistakes in dealing with complex situations involving horizontal gene transfers and gene fusion due to the lack of a sound basis when based on sequence similarity information. We present a novel algorithm, Global Optimization Strategy (GOST), for orthologous gene mapping through combining sequence similarity and contextual (working partners) information, using a combinatorial optimization framework. Genome-scale applications of GOST show substantial improvements over the predictions by three popular sequence similarity-based orthology mapping programs. Our analysis indicates that our algorithm overcomes the intrinsic issues faced by sequence similarity-based methods, when orthology mapping involves gene fusions and horizontal gene transfers. Our program runs as efficiently as the most efficient sequence similarity-based algorithm in the public domain. GOST is freely downloadable at http://csbl.bmb.uga.edu/~maqin/GOST.
We present a new algorithm, BOBRO, for prediction of cis-regulatory motifs in a given set of promoter sequences. The algorithm substantially improves the prediction accuracy and extends the scope of applicability of the existing programs based on two key new ideas: (i) we developed a highly effective method for reliably assessing the possibility for each position in a given promoter to be the (approximate) start of a conserved sequence motif; and (ii) we developed a highly reliable way of distinguishing actual motifs from accidental ones based on the concept of ‘motif closure’. These two key ideas are embedded in a classical framework for motif finding through finding cliques in a graph, and they make this framework substantially more sensitive as well as more selective for motif finding in a very noisy background. A comparative analysis shows that our program improved the performance coefficient from 29% to 41% compared to the best of six other state-of-the-art prediction tools on large-scale data sets of promoters from one genome, and also consistently improved it by substantial margins on another kind of large-scale data set, consisting of orthologous promoters across multiple genomes. The power of BOBRO in dealing with noisy data was further demonstrated through identification of the motifs of the global transcriptional regulators by running it over 2390 promoter sequences of Escherichia coli K12.
The adsorption of Cd(II) and Pb(II) by squid melanin was investigated. At a metal ion concentration of 2 mM/L, the biosorption efficiency of melanin reached 95% for Cd(II) and Pb(II). The maximum content of bound Cd(II) and Pb(II) was 0.93 mM/g and 0.65 mM/g, respectively. Temperature had no obvious effect on the adsorption of the metals, and in a pH range of 4.0–7.0, the adsorption yield was high and stable. Macrosalts such as NaCl, MgCl2, and CaCl2 had no obvious effect on the binding of Pb(II) but greatly diminished the adsorption of Cd(II), which indicated that different functional groups in squid melanin are responsible for their adsorption. IR analysis of metal ion-enriched squid melanin demonstrated that the possible functional groups responsible for metal binding were phenolic hydroxyl (OH), carboxyl (COOH), and amine groups (NH). This study reports a new material for the removal of heavy metals from low-strength wastewater.
Biclustering extends the traditional clustering techniques by attempting to find (all) subgroups of genes with similar expression patterns under to-be-identified subsets of experimental conditions when applied to gene expression data. Still, the real power of this clustering strategy is yet to be fully realized due to the lack of effective and efficient algorithms for reliably solving the general biclustering problem. We report a QUalitative BIClustering algorithm (QUBIC) that can solve the biclustering problem in a more general form, compared to existing algorithms, by employing a combination of qualitative (or semi-quantitative) measures of gene expression data and a combinatorial optimization technique. One key unique feature of the QUBIC algorithm is that it can identify all statistically significant biclusters, including biclusters with the so-called ‘scaling patterns’, a problem considered to be rather challenging; another key unique feature is that the algorithm solves such general biclustering problems very efficiently, handling problems with tens of thousands of genes under up to thousands of conditions in a few minutes of CPU time on a desktop computer. We have demonstrated considerably improved biclustering performance by our algorithm compared to the existing algorithms on various benchmark sets and data sets of our own. QUBIC was written in ANSI C and tested using GCC (version 4.1.2) on Linux. Its source code is available at: http://csbl.bmb.uga.edu/~maqin/bicluster. A server version of QUBIC is also available upon request.
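The idea of a qualitative representation, and why it helps with scaling patterns, can be sketched as follows. This is not QUBIC's actual discretization, which the abstract does not specify: the function `qualitative_row`, the quantile parameter `q` and the toy expression values are assumptions chosen only to show the principle.

```python
def qualitative_row(values, q=0.25):
    """Map one gene's expression values to -1 (low), 0 or +1 (high).

    The lowest and highest q fraction of conditions are marked; q is
    assumed to be at most 0.5 so the two groups cannot overlap.
    """
    n = len(values)
    k = max(1, int(n * q))
    order = sorted(range(n), key=lambda i: values[i])
    row = [0] * n
    for i in order[:k]:
        row[i] = -1
    for i in order[-k:]:
        row[i] = 1
    return row


# Invented toy data: gene_b is gene_a multiplied by 3 (a scaling pattern).
gene_a = [1.0, 5.0, 2.0, 8.0, 3.0, 2.5, 9.0, 0.5]
gene_b = [3 * x for x in gene_a]

row_a = qualitative_row(gene_a)
row_b = qualitative_row(gene_b)
print(row_a)
print(row_b)
```

Because the representation depends only on the rank order of the conditions, two genes that differ by a positive scaling factor receive identical rows, so a search for rows that agree on a subset of columns can group them without estimating the factor.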