The lymphatic system is an important pathway for tumor dissemination to the lymph nodes, but the extent to which it contributes to the formation of distant metastases remains unknown. We report that induction of lymphangiogenesis by vascular endothelial growth factor-C (VEGF-C) at the secondary site, in the lung, facilitates expansion of already disseminated cancer cells throughout the lung tissue. By using orthotopic spontaneous metastasis models in nude mice, we show that VEGF-C expression by tumor cells altered the pattern of pulmonary metastases from nodular to diffuse and facilitated disease progression. Metastases expressing VEGF-C were tightly associated with the airways, in contrast to the control cells that were scattered in the lung parenchyma, throughout the alveolar region. VEGF-C induced lung lymphangiogenesis and promoted intralymphatic spread of metastases in the lung and formation of tumor emboli in the pulmonary arteries. This pattern of metastasis corresponds to the lymphangitic carcinomatosis metastatic phenotype in human cancer patients, an extremely aggressive pattern of pulmonary metastases. In accordance, pulmonary breast cancer metastases from patients that were classified as lymphangitic carcinomatosis showed high levels of VEGF-C expression in cancer cells. These data show that VEGF-C promotes late steps of the metastatic process and identify the VEGF-C/VEGF receptor-3 pathway as the target not only for prevention of metastases, but also for treatment of established metastatic disease.
The prevalence of suicide attempts (SA) in bipolar II disorder (BPII), particularly in comparison to the prevalence in bipolar I disorder (BPI), is an understudied and controversial issue with mixed results. To date, there has been no comprehensive review of the published prevalence data for attempted suicide in BPII.
We conducted a literature review and meta-analysis of published reports that specified the proportion of individuals with BPII in their presentation of SA data. Systematic searching yielded 24 reports providing rates of SA in BPII and 21 reports including rates of SA in both BPI and BPII. We estimated the prevalence of SA in BPII by combining data across reports of similar designs. To compare rates of SA in BPII and BPI, we calculated a pooled odds ratio (OR) and 95% confidence interval (CI) using random-effects meta-analytic techniques on retrospective data from 15 reports that detailed rates of SA in both BPI and BPII.
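As an illustration of the pooling step, the following is a minimal sketch of a random-effects pooled odds ratio using the DerSimonian-Laird estimator. The source does not name the estimator used, so that choice is an assumption, and the function name and the study counts are hypothetical.

```python
import math

def pooled_odds_ratio(tables):
    """DerSimonian-Laird random-effects pooled odds ratio (a sketch).

    Each table is (a, b, c, d): attempters and non-attempters in group 1,
    then attempters and non-attempters in group 2.
    Returns (pooled OR, 95% CI lower bound, 95% CI upper bound).
    """
    ys, vs = [], []
    for a, b, c, d in tables:
        # Continuity correction: add 0.5 to every cell if any cell is zero.
        if 0 in (a, b, c, d):
            a, b, c, d = (x + 0.5 for x in (a, b, c, d))
        ys.append(math.log((a * d) / (b * c)))   # log odds ratio
        vs.append(1 / a + 1 / b + 1 / c + 1 / d)  # its approximate variance

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))

    # Between-study variance tau^2, truncated at zero.
    c_term = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c_term) if c_term > 0 else 0.0

    # Random-effects weights and pooled estimate on the log scale.
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return math.exp(y_re), math.exp(y_re - 1.96 * se), math.exp(y_re + 1.96 * se)

# Hypothetical counts for three studies (not data from the reviewed reports).
studies = [(40, 60, 30, 70), (25, 75, 20, 80), (55, 95, 45, 105)]
or_, ci_low, ci_high = pooled_odds_ratio(studies)
print(f"pooled OR = {or_:.2f}, 95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

A confidence interval that includes 1 corresponds to a difference that is not significant at the 5% level.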
Among the 24 reports with any BPII data, 32.4% (356/1099) of individuals retrospectively reported a lifetime history of SA, 19.8% (93/469) prospectively reported attempted suicide, and 20.5% (55/268) of index attempters were diagnosed with BPII. In 15 retrospective studies suitable for meta-analysis, the prevalence of attempted suicide in BPII and BPI was not significantly different: 32.4% and 36.3%, respectively (OR = 1.21, 95% CI: 0.98–1.48, p = 0.07).
The contribution of BPII to suicidal behavior is considerable. Our findings suggest that there is no significant effect of bipolar subtype on rate of SA. Our findings are particularly alarming in concert with other evidence, including (i) the well-documented predictive role of SA for completed suicide and (ii) the evidence suggesting that individuals with BPII use significantly more violent and lethal methods than do individuals with BPI. To reduce suicide-related morbidity and mortality, routine clinical care for BPII must include ongoing risk assessment and interventions targeted at risk factors.
attempted suicide; bipolar disorder; bipolar II; meta-analysis; suicide
To provide recommendations to patients, physicians, and other health care providers on several issues involving deep brain stimulation (DBS) for Parkinson disease (PD).
Data Sources and Study Selection
An international consortium of experts organized, reviewed the literature, and attended the workshop. Topics were introduced at the workshop, followed by group discussion.
Data Extraction and Synthesis
A draft of a consensus statement was presented and further edited after plenary debate. The final statements were agreed on by all members.
(1) Patients with PD without significant active cognitive or psychiatric problems who have medically intractable motor fluctuations, intractable tremor, or intolerance of medication adverse effects are good candidates for DBS. (2) Deep brain stimulation surgery is best performed by an experienced neurosurgeon with expertise in stereotactic neurosurgery who is working as part of an interprofessional team. (3) Surgical complication rates are extremely variable, with infection being the most commonly reported complication of DBS. (4) Deep brain stimulation programming is best accomplished by a highly trained clinician and can take 3 to 6 months to obtain optimal results. (5) Deep brain stimulation improves levodopa-responsive symptoms, dyskinesia, and tremor; benefits seem to be long-lasting in many motor domains. (6) Subthalamic nuclei DBS may be complicated by increased depression, apathy, impulsivity, worsened verbal fluency, and executive dysfunction in a subset of patients. (7) Both globus pallidus pars interna and subthalamic nuclei DBS have been shown to be effective in addressing the motor symptoms of PD. (8) Ablative therapy is still an effective alternative and should be considered in a select group of appropriate patients.
Freshwater planarians are an attractive model for regeneration and stem cell research and have become a promising tool in the field of regenerative medicine. With the availability of a sequenced planarian genome, the recent application of modern genetic and high-throughput tools has resulted in revitalized interest in these animals, long known for their amazing regenerative capabilities, which enable them to regrow even a new head after decapitation. However, a detailed description of the planarian transcriptome is essential for future investigation into regenerative processes using planarians as a model system.
To complement and improve existing gene annotations, we used a 454 pyrosequencing approach to analyze the transcriptome of the planarian species Schmidtea mediterranea. Altogether, 598,435 454-sequencing reads, with an average length of 327 bp, were assembled together with the ~10,000 sequences of the S. mediterranea UniGene set using different similarity cutoffs. The assembly was then mapped onto the current genome data. Remarkably, our Smed454 dataset contains more than 3 million novel transcribed nucleotides sequenced for the first time. A descriptive analysis of planarian splice sites was conducted on those Smed454 contigs that mapped univocally to the current genome assembly. Sequence analysis allowed us to identify genes encoding putative proteins with defined structural properties, such as transmembrane domains. Moreover, we annotated the Smed454 dataset using Gene Ontology, and identified putative homologues of several gene families that may play a key role during regeneration, such as neurotransmitter and hormone receptors, homeobox-containing genes, and genes related to eye function.
We report the first planarian transcript dataset, Smed454, as an open resource tool that can be accessed via a web interface. Smed454 contains significant novel sequence information about most expressed genes of S. mediterranea. Analysis of the annotated data promises to contribute to identification of gene families poorly characterized at a functional level. The Smed454 transcriptome data will assist in the molecular characterization of S. mediterranea as a model organism, which will be useful to a broad scientific community.
Corynebacterium pseudotuberculosis is generally regarded as an important animal pathogen that rarely infects humans. Clinical strains are occasionally recovered from human cases of lymphadenitis, such as C. pseudotuberculosis FRC41, which was isolated from the inguinal lymph node of a 12-year-old girl with necrotizing lymphadenitis. To detect potential virulence factors and corresponding gene-regulatory networks in this human isolate, the genome sequence of C. pseudotuberculosis FRC41 was determined by pyrosequencing and functionally annotated.
Sequencing and assembly of the C. pseudotuberculosis FRC41 genome yielded a circular chromosome with a size of 2,337,913 bp and a mean G+C content of 52.2%. Specific gene sets associated with iron and zinc homeostasis were detected among the 2,110 predicted protein-coding regions and integrated into a gene-regulatory network that is linked with both the central metabolism and the oxidative stress response of FRC41. Two gene clusters encode proteins involved in the sortase-mediated polymerization of adhesive pili that can probably mediate the adherence to host tissue to facilitate additional ligand-receptor interactions and the delivery of virulence factors. The prominent virulence factors phospholipase D (Pld) and corynebacterial protease CP40 are encoded in the genome of this human isolate. The genome annotation revealed additional serine proteases, neuraminidase H, nitric oxide reductase, an invasion-associated protein, and acyl-CoA carboxylase subunits involved in mycolic acid biosynthesis as potential virulence factors. The cAMP-sensing transcription regulator GlxR plays a key role in controlling the expression of several genes contributing to virulence.
The functional data deduced from the genome sequencing and the extended knowledge of virulence factors indicate that the human isolate C. pseudotuberculosis FRC41 is equipped with a distinct gene set promoting its survival under unfavorable environmental conditions encountered in the mammalian host.
Salmonella paratyphi C is one of the few human-adapted pathogens along with S. typhi, S. paratyphi A and S. paratyphi B that cause typhoid, but it is not clear whether these bacteria cause the disease by the same or different pathogenic mechanisms. Notably, these typhoid agents have distinct sets of large genomic insertions, which may encode different pathogenicity factors. Previously we identified a novel prophage, SPC-P1, in S. paratyphi C RKS4594 and wondered whether it might be involved in pathogenicity of the bacteria.
We analyzed the sequence of SPC-P1 and found that it is an inducible phage with an overall G+C content of 47.24%, similar to that of most Salmonella phages such as P22 and ST64T but significantly lower than the 52.16% average of the RKS4594 chromosome. Electron microscopy showed short-tailed phage particles very similar to the lambdoid phage CUS-3. To evaluate its role in pathogenicity, we lysogenized S. paratyphi C strain CN13/87, which did not have this prophage, and infected mice with the lysogenized CN13/87. Compared to the phage-free wild type CN13/87, the lysogenized CN13/87 exhibited significantly increased virulence and caused multi-organ damage in mice at considerably lower infection doses.
SPC-P1 contributes pathogenicity to S. paratyphi C in animal infection models, so it is possible that this prophage is involved in typhoid pathogenesis in humans. Genetic and functional analyses of SPC-P1 may facilitate the study of pathogenic evolution of the extant typhoid agents, providing particular help in elucidating the pathogenic determinants of the typhoid agents.
Grain endosperm chalkiness of rice is a varietal characteristic that negatively affects not only the appearance and milling properties but also the cooking texture and palatability of cooked rice. However, grain chalkiness is a complex quantitative genetic trait and the molecular mechanisms underlying its formation are poorly understood.
A near-isogenic line CSSL50-1 with high chalkiness was compared with its normal parental line Asominori for grain endosperm chalkiness. Physico-biochemical analyses of ripened grains showed that, compared with Asominori, CSSL50-1 contains higher levels of amylose and 8 DP (degree of polymerization) short-chain amylopectin, but lower levels of medium-length 12 DP amylopectin. Transcriptome analysis of 15 DAF (days after flowering) caryopses of the isogenic lines identified 623 differentially expressed genes (P < 0.01), among which 324 genes are up-regulated and 299 down-regulated. These genes were classified into 18 major categories, with 65.3% of them belonging to six major functional groups: signal transduction, cell rescue/defense, transcription, protein degradation, carbohydrate metabolism and redox homeostasis. Detailed pathway dissection demonstrated that genes involved in sucrose and starch synthesis are up-regulated, whereas those involved in non-starch polysaccharides are down-regulated. Several genes involved in oxidoreductive homeostasis were found to have higher expression levels in CSSL50-1 as well, suggesting potential roles of ROS in grain chalkiness formation.
Extensive gene expression changes were detected during rice grain chalkiness formation. Over half of these differentially expressed genes are implicated in several important categories of genes, including signal transduction, transcription, carbohydrate metabolism and redox homeostasis, suggesting that chalkiness formation involves multiple metabolic and regulatory pathways.
Single nucleotide polymorphisms (SNPs) are ideally suited for the construction of high-resolution genetic maps, studying population evolutionary history and performing genome-wide association mapping experiments. Here, we used a genome-wide set of 1536 SNPs to study linkage disequilibrium (LD) and population structure in a panel of 478 spring and winter wheat cultivars (Triticum aestivum) from 17 populations across the United States and Mexico.
Most of the wheat oligo pool assay (OPA) SNPs that were polymorphic within the complete set of 478 cultivars were also polymorphic in all subpopulations. Higher levels of genetic differentiation were observed among wheat lines within populations than among populations. A total of nine genetically distinct clusters were identified, suggesting that some of the pre-defined populations shared a significant proportion of genetic ancestry. Estimates of population structure (FST) at individual loci showed a high level of heterogeneity across the genome. In addition, seven genomic regions with elevated FST were detected between the spring and winter wheat populations. Some of these regions overlapped with previously mapped flowering time QTL. Across all populations, the highest extent of significant LD was observed in the wheat D-genome, followed by lower LD in the A- and B-genomes. The differences in the extent of LD among populations and genomes were mostly driven by differences in long-range LD (> 10 cM).
Genome- and population-specific patterns of genetic differentiation and LD were discovered in the populations of wheat cultivars from different geographic regions. Our study demonstrated that the estimates of population structure between spring and winter wheat lines can identify genomic regions harboring candidate genes involved in the regulation of growth habit. Variation in LD suggests that breeding and selection had a different impact on each wheat genome both within and among populations. The higher extent of LD in the wheat D-genome versus the A- and B-genomes likely reflects the episodes of recent introgression and population bottleneck accompanying the origin of hexaploid wheat. The assessment of LD and population structure in this assembled panel of diverse lines provides critical information for the development of genetic resources for genome-wide association mapping of agronomically important traits in wheat.
The tuberous root of sweetpotato is an important agricultural and biological organ. There are not sufficient transcriptomic and genomic data in public databases for understanding of the molecular mechanism underlying the tuberous root formation and development. Thus, high throughput transcriptome sequencing is needed to generate enormous transcript sequences from sweetpotato root for gene discovery and molecular marker development.
In this study, more than 59 million sequencing reads were generated using Illumina paired-end sequencing technology. De novo assembly yielded 56,516 unigenes with an average length of 581 bp. Based on sequence similarity search with known proteins, a total of 35,051 (62.02%) genes were identified. Out of these annotated unigenes, 5,046 and 11,983 unigenes were assigned to gene ontology and clusters of orthologous group, respectively. Searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG) indicated that 17,598 (31.14%) unigenes were mapped to 124 KEGG pathways, and 11,056 were assigned to metabolic pathways, which were well represented by carbohydrate metabolism and biosynthesis of secondary metabolites. In addition, 4,114 cDNA SSRs (cSSRs) were identified as potential molecular markers in our unigenes. One hundred pairs of PCR primers were designed and used for validation of the amplification and assessment of the polymorphism in genomic DNA pools. The results revealed that 92 primer pairs amplified successfully in initial screening tests.
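To make the cSSR identification step concrete, here is a minimal sketch of a perfect-repeat SSR scan over a nucleotide sequence. The source does not name the tool or the repeat thresholds used, so the function name, the minimum repeat counts, and the example sequence are all assumptions for illustration.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Find perfect simple sequence repeats with motif lengths 2 to 6.

    Returns a list of (start, motif, repeat_count) tuples, 0-based start.
    """
    if min_repeats is None:
        # Assumed thresholds (motif length -> minimum number of repeats);
        # these are illustrative, not the values used in the study.
        min_repeats = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT").
            if any(motif == motif[:j] * (k // j)
                   for j in range(1, k) if k % j == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits

# Hypothetical unigene fragment containing an (AT)7 and an (AGC)5 repeat.
example = "GGC" + "AT" * 7 + "CCG" + "AGC" * 5 + "TT"
print(find_ssrs(example))
```

Flanking sequence around each hit is what primer design for marker validation would then draw on.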
This study generated a substantial fraction of sweetpotato transcript sequences, which can be used to discover novel genes associated with tuberous root formation and development and will also make it possible to construct high density microarrays for further characterization of gene expression profiles during these processes. Thousands of cSSR markers identified in the present study can enrich molecular markers and will facilitate marker-assisted selection in sweetpotato breeding. Overall, these sequences and markers will provide valuable resources for the sweetpotato community. Additionally, these results suggest that transcriptome analysis based on Illumina paired-end sequencing is a powerful tool for gene discovery and molecular marker development for non-model species, especially those with large and complex genomes.
Type A1 Clostridium botulinum strains are a group of Gram-positive, spore-forming anaerobic bacteria that produce a genetically, biochemically, and biophysically indistinguishable 150 kD protein that causes botulism. The genomes of three type A1 C. botulinum strains have been sequenced and show a high degree of synteny. The purpose of this study was to characterize differences among these genomes and compare these differentiating features with two additional unsequenced strains used in previous studies.
Several strategies were deployed in this report. First, the University of Massachusetts Dartmouth laboratory Hall strain (UMASS strain) neurotoxin gene was amplified by PCR and sequenced; its sequence was aligned with the published ATCC 3502 Sanger Institute Hall strain and Allergan Hall strain neurotoxin gene regions. Sequence alignment showed that there was a synonymous single nucleotide polymorphism (SNP) in the region encoding the heavy chain between the Allergan strain and the ATCC 3502 and UMASS strains. Second, comparative genomic hybridization (CGH) demonstrated that the UMASS strain and a strain expected to be derived from ATCC 3502 in the Centers for Disease Control and Prevention (CDC) laboratory (ATCC 3502*) differed in gene content compared to the ATCC 3502 genome sequence published by the Sanger Institute. Third, alignment of the three sequenced C. botulinum type A1 strain genomes revealed the presence of four comparable blocks. Strains ATCC 3502 and ATCC 19397 share the same genome organization, while the organization of the blocks in strain Hall was switched. Lastly, PCR was designed to identify UMASS and ATCC 3502* strain genome organizations. The PCR results indicated that the UMASS strain belonged to the Hall type and the ATCC 3502* strain was identical to the ATCC 3502 (Sanger Institute) type.
Taken together, C. botulinum type A1 strains including Sanger Institute ATCC 3502, ATCC 3502*, ATCC 19397, Hall, Allergan, and UMASS strains demonstrate differences at the level of the neurotoxin gene sequence, in gene content, and in genome arrangement.
Rapid and efficient microbial cell factory design and construction is made possible by the enabling technology of metabolic engineering, which is now being facilitated by systems biology approaches. Metabolic engineering is often complemented by directed evolution, where selective pressure is applied to a partially genetically engineered strain to confer a desirable phenotype. The exact genetic modification or resulting genotype that leads to the improved phenotype is often not identified or understood well enough to enable further metabolic engineering.
In this work, whole genome high-throughput sequencing and annotation were used to identify single nucleotide polymorphisms (SNPs) between Saccharomyces cerevisiae strains S288c and CEN.PK113-7D. The yeast strain S288c was the first eukaryote sequenced, serving as the reference genome for the Saccharomyces Genome Database, while CEN.PK113-7D is a preferred laboratory strain for industrial biotechnology research. A total of 13,787 high-quality SNPs were detected between the two strains (reference strain: S288c). Considering only metabolic genes (782 of 5,596 annotated genes), a total of 219 metabolism-specific SNPs are distributed across 158 metabolic genes, with 85 of the SNPs being nonsynonymous (i.e., encoding amino acid modifications). Amongst the metabolic SNPs detected, there was pathway enrichment in the galactose uptake pathway (GAL1, GAL10) and the ergosterol biosynthetic pathway (ERG8, ERG9). Physiological characterization confirmed a strong deficiency in galactose uptake and metabolism in S288c compared to CEN.PK113-7D, and similarly, ergosterol content in CEN.PK113-7D was significantly higher in both glucose and galactose supplemented cultivations compared to S288c. Furthermore, DNA microarray profiling of S288c and CEN.PK113-7D in both glucose and galactose batch cultures did not provide a clear hypothesis for the major phenotypes observed, suggesting that genotype to phenotype correlations are manifested post-transcriptionally or post-translationally, through protein concentration and/or function.
With an intensifying need for microbial cell factories that produce a wide array of target compounds, whole genome high-throughput sequencing and annotation for SNP detection can aid in better reducing and defining the metabolic landscape. This work demonstrates direct correlations between genotype and phenotype that provide clear metabolic engineering targets with a high probability of success. The genome sequence, annotation, and a SNP viewer of CEN.PK113-7D are deposited at http://www.sysbio.se/cenpk.
As we enter an era when testing millions of SNPs in a single gene association study will become the standard, consideration of multiple comparisons is an essential part of determining statistical significance. Bonferroni adjustments can be made but are conservative due to the preponderance of linkage disequilibrium (LD) between genetic markers, and permutation testing is not always a viable option. Three major classes of corrections have been proposed to correct the dependent nature of genetic data in Bonferroni adjustments: permutation testing and related alternatives, principal components analysis (PCA), and analysis of blocks of LD across the genome. We consider seven implementations of these commonly used methods using data from 1514 European American participants genotyped for 700,078 SNPs in a GWAS for AIDS.
A Bonferroni correction using the number of LD blocks found by the three algorithms implemented by Haploview resulted in an insufficiently conservative threshold, corresponding to a genome-wide significance level of α = 0.15–0.20. We observed a moderate increase in power when using PRESTO, SLIDE, and simpleℳ when compared with traditional Bonferroni methods for population data genotyped on the Affymetrix 6.0 platform in European Americans (α = 0.05 thresholds between 1 × 10⁻⁷ and 7 × 10⁻⁸).
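The arithmetic behind these thresholds can be sketched as follows. The standard Bonferroni per-test threshold is α divided by the number of tests; the helper names are hypothetical, and reading a less stringent threshold as an "implied number of independent tests" is an interpretive assumption rather than a result reported in the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

def implied_effective_tests(alpha, threshold):
    """Number of independent tests that a given per-test threshold implies."""
    return alpha / threshold

n_snps = 700_078  # SNPs genotyped in the GWAS described above

# Traditional Bonferroni: treat every SNP as an independent test.
naive = bonferroni_threshold(0.05, n_snps)
print(f"traditional Bonferroni threshold: {naive:.2e}")

# A threshold of 1e-7 at alpha = 0.05 corresponds to fewer effective tests
# than SNPs, reflecting correlation (LD) between markers.
m_eff = implied_effective_tests(0.05, 1e-7)
print(f"effective tests implied by 1e-7: {m_eff:,.0f}")
```

The traditional threshold lands near the stringent end of the reported range, which is consistent with the other methods gaining only moderate power over it.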
Correcting for the number of LD blocks resulted in an anti-conservative Bonferroni adjustment. SLIDE and simpleℳ are particularly useful when using a statistical test not handled in optimized permutation testing packages, and genome-wide corrected p-values using SLIDE are much easier to interpret for consumers of GWAS studies.
The link between reproductive life history and incidence of ovarian tumors is well known. Periods of reduced ovulations may confer protection against ovarian cancer. Using phenotypic data available for mouse, a possible association between the ovarian transcriptome, reproductive records and spontaneous ovarian tumor rates was investigated in four mouse inbred strains. NIA15k-DNA microarrays were employed to obtain expression profiles of BalbC, C57BL6, FVB and SWR adult ovaries.
Linear regression analysis with multiple-test control (adjusted p ≤ 0.05) resulted in ovarian tumor frequency (OTF) and number of litters (NL) as the top-correlated among five tested phenotypes. Moreover, nearly one hundred genes were coincident between these two traits and were decomposed into 76 OTF(–) NL(+) and 20 OTF(+) NL(–) genes, where the plus/minus signs indicate the direction of correlation. Enriched functional categories were RNA-binding/mRNA-processing and protein folding in the OTF(–) NL(+) and the OTF(+) NL(–) subsets, respectively. In contrast, no associations were detected between OTF and litter size (LS), the latter a measure of ovulation events in a single estrous cycle.
Literature text-mining pointed to post-transcriptional control of ovarian processes including oocyte maturation, folliculogenesis and angiogenesis as possible causal relationships of observed tumor and reproductive phenotypes. We speculate that repetitive cycling, rather than repetitive ovulation, represents the actual link between ovarian tumorigenesis and reproductive records.
A large number of probabilistic models used in sequence analysis assign non-zero probability values to most input sequences. The most common way to decide whether a given probability is sufficient is Bayesian binary classification, where the probability of the model characterizing the sequence family of interest is compared to that of an alternative probability model. A null model can serve as this alternative model. This is the scoring technique used by sequence analysis tools such as HMMER, SAM and INFERNAL. The most prevalent null models are position-independent residue distributions that include the uniform distribution, genomic distribution, family-specific distribution and the target sequence distribution. This paper presents a study to evaluate the impact of the choice of a null model on the final result of classifications. In particular, we are interested in minimizing the number of false predictions in a classification. This is a crucial issue for reducing the costs of biological validation.
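The comparison against a null model is usually expressed as a log-odds score. The sketch below scores a DNA sequence under a toy position-specific family model against two of the null models named above, the uniform distribution and the target sequence distribution. The family model, the pseudocount, and all function names are hypothetical, and this is not the scoring code of HMMER, SAM or INFERNAL.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def uniform_null(_seq):
    """Uniform null model: every base has probability 0.25."""
    return {b: 0.25 for b in ALPHABET}

def target_null(seq, pseudocount=1.0):
    """Target null model: base frequencies of the scored sequence itself."""
    counts = Counter(seq)
    total = len(seq) + pseudocount * len(ALPHABET)
    return {b: (counts[b] + pseudocount) / total for b in ALPHABET}

def log_odds(seq, model, null):
    """Sum over positions of log2(P_model(base) / P_null(base)).

    `model` is a list of per-position dicts mapping base -> probability,
    with one entry per position of `seq`.
    """
    return sum(math.log2(model[i][b] / null[b]) for i, b in enumerate(seq))

# Hypothetical GC-rich family model, identical at every position.
family = [{"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}] * 10

# A GC-rich candidate that matches the model by composition alone.
candidate = "GCGGCCGCGG"

score_uniform = log_odds(candidate, family, uniform_null(candidate))
score_target = log_odds(candidate, family, target_null(candidate))
print(f"uniform null: {score_uniform:.2f} bits, target null: {score_target:.2f} bits")
```

In this toy case the uniform null rewards the candidate for its composition, while the target null largely cancels that effect, which is the kind of compositional bias the choice of null model is meant to control.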
For all the tests, the target null model presented the lowest number of false positives when using random sequences as a test. The study was performed on DNA sequences using GC content as the measure of content bias, but the results should also be valid for protein sequences. To broaden the application of the results, the study was performed using randomly generated sequences. Previous studies were performed on amino acid sequences, using only one probabilistic model (HMM) and a specific benchmark, and lack more general conclusions about the performance of null models. Finally, a benchmark test with P. falciparum confirmed these results.
Of the evaluated models the best suited for classification are the uniform model and the target model. However, the use of the uniform model presents a GC bias that can cause more false positives for candidate sequences with extreme compositional bias, a characteristic not described in previous studies. In these cases the target model is more dependable for biological validation due to its higher specificity.
Physical protein-protein interaction (PPI) is a critical phenomenon for the function of most proteins in living organisms and a significant fraction of PPIs are the result of domain-domain interactions. Exon shuffling, intron-mediated recombination of exons from existing genes, is known to have been a major mechanism of domain shuffling in metazoans. Thus, we hypothesized that exon shuffling could have a significant influence in shaping the topology of PPI networks.
We tested our hypothesis by compiling exon shuffling and PPI data from six eukaryotic species: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Cryptococcus neoformans and Arabidopsis thaliana. For all four metazoan species, genes enriched in exon shuffling events presented on average higher vertex degree (number of interacting partners) in PPI networks. Furthermore, we verified that a set of protein domains that are simultaneously promiscuous (known to interact with multiple types of other domains), self-interacting (able to interact with another copy of themselves) and abundant in the genomes presents a stronger signal for exon shuffling.
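The vertex degree comparison can be illustrated with a minimal sketch that counts distinct interacting partners per protein from an edge list and averages the degree over a gene set. The edge list, the gene sets, and the function names are hypothetical; the source does not describe its implementation or the statistical test applied.

```python
from collections import defaultdict

def vertex_degrees(edges):
    """Number of distinct interacting partners per protein.

    Duplicate edges are counted once and self-interactions are not
    counted as partners.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {p: len(n - {p}) for p, n in neighbors.items()}

def mean_degree(degrees, genes):
    """Average degree over the genes of a set that appear in the network."""
    present = [degrees[g] for g in genes if g in degrees]
    return sum(present) / len(present) if present else 0.0

# Hypothetical PPI edge list, including a duplicate and a self-interaction.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "A"), ("C", "C"), ("D", "E")]
degrees = vertex_degrees(edges)

# Hypothetical gene sets: enriched vs. not enriched in exon shuffling events.
shuffled = {"A", "D"}
other = {"B", "C", "E"}
print(mean_degree(degrees, shuffled), mean_degree(degrees, other))
```

A real analysis would follow this with a significance test on the two degree distributions rather than comparing means alone.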
Exon shuffling appears to have been a recurrent mechanism for the emergence of new PPIs along metazoan evolution. In metazoan genomes, exon shuffling also promoted the expansion of some protein domains. We speculate that their promiscuous and self-interacting properties may have been decisive for that expansion.
The thymus is a central lymphoid organ, in which bone marrow-derived T cell precursors undergo a complex process of maturation. Developing thymocytes interact with the thymic microenvironment in a defined spatial order. A component of the thymic microenvironment, the thymic epithelial cells (TEC), is crucial for the maturation of T-lymphocytes through cell-cell contact, cell-matrix interactions and secretion of cytokines/chemokines. There is evidence that extracellular matrix molecules play a fundamental role in guiding differentiating thymocytes in both cortical and medullary regions of the thymic lobules. The interaction between the integrin α5β1 (CD49e/CD29; VLA-5) and fibronectin is relevant for thymocyte adhesion and migration within the thymic tissue. Our previous results have shown that adhesion of thymocytes to a cultured TEC line is enhanced in the presence of fibronectin, and can be blocked with anti-VLA-5 antibody.
Herein, we studied the role of CD49e expressed by the human thymic epithelium. For this purpose we knocked down the CD49e by means of RNA interference. This procedure resulted in the modulation of more than 100 genes, some of them coding for other proteins also involved in adhesion of thymocytes; others related to signaling pathways triggered after integrin activation, or even involved in the control of F-actin stress fiber formation. Functionally, we demonstrated that disruption of VLA-5 in human TEC by CD49e-siRNA-induced gene knockdown decreased the ability of TEC to promote thymocyte adhesion. Such a decrease comprised all CD4/CD8-defined thymocyte subsets.
Conceptually, our findings unravel the complexity of gene regulation, as regards key genes involved in the heterocellular cell adhesion between developing thymocytes and the major component of the thymic microenvironment, an interaction that is a mandatory event for proper intrathymic T cell differentiation.
Retrieving pertinent information from the biological scientific literature requires text mining methods that can recognize the meaning of the highly ambiguous names of biological entities. Aliases of a gene share a common vocabulary in their respective collections of PubMed abstracts. This may be true even when these aliases are not associated with the same subset of documents. This gene-specific vocabulary defines a unique fingerprint that can be used to disclose ambiguous aliases. The present work describes an original method for automatically assessing the ambiguity levels of gene aliases in large gene terminologies based exclusively on the content of their associated literature. The method can deal with the two major problems restricting the usage of current text mining tools: 1) different names associated with the same gene; and 2) one name associated with multiple genes, or even with non-gene entities. Importantly, this method does not require training examples.
Aliases were considered “ambiguous” when their Jaccard distance to the respective official gene symbol was equal to or greater than the smallest distance between the official gene symbol and one of the three internal controls (randomly picked unrelated official gene symbols). Otherwise, they were assigned the status of “synonyms”. We evaluated the coherence of the results by comparing the frequencies of the official gene symbols in the text corpora retrieved with their respective “synonyms” or “ambiguous” aliases. Official gene symbols were mentioned in the abstract collections of 42% (70/165) of their respective synonyms. No official gene symbol occurred in the abstract collections of any of their respective ambiguous aliases. Overall, querying PubMed with official gene symbols and “synonym” aliases allowed a 3.6-fold increase in the number of unique documents retrieved.
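The decision rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each vocabulary is represented as a plain set of terms, and the names `jaccard_distance` and `classify_alias` are hypothetical, not taken from the original work. How the vocabularies are actually extracted from PubMed abstracts is not shown.

```python
def jaccard_distance(vocab_a, vocab_b):
    """Jaccard distance between two term sets: 1 - |A ∩ B| / |A ∪ B|."""
    a, b = set(vocab_a), set(vocab_b)
    union = a | b
    if not union:
        # Two empty vocabularies are treated as identical (an assumption).
        return 0.0
    return 1.0 - len(a & b) / len(union)


def classify_alias(alias_vocab, official_vocab, control_vocabs):
    """Label an alias 'ambiguous' or 'synonym' using the control threshold.

    The threshold is the smallest distance between the official symbol
    and any of the unrelated control symbols.
    """
    alias_distance = jaccard_distance(alias_vocab, official_vocab)
    threshold = min(jaccard_distance(c, official_vocab) for c in control_vocabs)
    return "ambiguous" if alias_distance >= threshold else "synonym"


# Toy vocabularies, invented purely for illustration.
official = {"kinase", "tumor", "apoptosis", "p53"}
controls = [
    {"kinase", "yeast", "ribosome"},
    {"plant", "leaf"},
    {"tumor", "p53", "neuron", "axon"},
]
print(classify_alias({"kinase", "tumor", "apoptosis", "mdm2"}, official, controls))
print(classify_alias({"cat", "dog"}, official, controls))
```

Using the closest control as the threshold means an alias is kept as a synonym only if its vocabulary is nearer to the official symbol than that of any unrelated gene.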
These results confirm that this method is able to distinguish between synonyms and ambiguous gene aliases based exclusively on their vocabulary fingerprint. The approach we describe could be used to enhance the retrieval of relevant literature related to a gene.
Alternative splicing (AS) is a central mechanism in the generation of genomic complexity and is a major contributor to transcriptome and proteome diversity. Alterations of the splicing process can lead to deregulation of crucial cellular processes and have been associated with a large spectrum of human diseases. Cancer-associated transcripts are potential molecular markers and may contribute to the development of more accurate diagnostic and prognostic methods and also serve as therapeutic targets. Alternative splicing-enriched cDNA libraries have been used to explore the variability generated by alternative splicing. In this study, by combining the use of trapping heteroduplexes and RNA amplification, we developed a powerful approach that enables transcriptome-wide exploration of the AS repertoire for identifying AS variants associated with breast tumor cells modulated by ERBB2 (HER-2/neu) oncogene expression.
The human breast cell line (C5.2) and a pool of 5 ERBB2 over-expressing breast tumor samples were used independently for the construction of two AS-enriched libraries. In total, 2,048 partial cDNA sequences were obtained, revealing 214 alternative splicing sequence-enriched tags (ASSETs). A subset with 79 multiple exon ASSETs was compared to public databases and reported 138 different AS events. A high success rate of RT-PCR validation (94.5%) was obtained, and 2 novel AS events were identified. The influence of ERBB2-mediated expression on AS regulation was evaluated by capillary electrophoresis and probe-ligation approaches in two mammary cell lines (Hb4a and C5.2) expressing different levels of ERBB2. The relative expression balance between AS variants from 3 genes was differentially modulated by ERBB2 in this model system.
In this study, we present a method for exploring AS from any RNA source in a transcriptome-wide format, which can be easily adapted to next-generation sequencers. We identified AS transcripts that were differently modulated by ERBB2-mediated expression and that can be tested as molecular markers for breast cancer. Such a methodology will be useful for completely deciphering the cancer cell transcriptome diversity resulting from AS and for finding more precise molecular markers.
Cathepsin B (catB) is a promising target for anti-cancer drug design due to its implication in several steps of tumorigenesis. catB activity and inhibition are pH-dependent, making it difficult to identify efficient inhibitor candidates for clinical trials. In addition it is known that heparin binding stabilizes the enzyme in alkaline conditions. However, the molecular mechanism of stabilization is not well understood, indicating the need for more detailed structural and dynamic studies in order to clarify the influence of pH and heparin binding on catB stability.
Our pKa calculations of catB titratable residues revealed distinct protonation states under different pH conditions for six key residues, of which four lie in the crucial interdomain interface. This implies changes in the overall charge distribution at the catB surface, as revealed by calculation of the electrostatic potential. We identified two basic surface regions as possible heparin binding sites, which were confirmed by docking calculations. Molecular dynamics (MD) of both apo catB and catB-heparin complexes were performed using protonation states for catB residues corresponding to the relevant acidic or alkaline conditions. The MD of apo catB at pH 5.5 was very stable, and presented the highest number and occupancy of hydrogen bonds within the inter-domain interface. In contrast, under alkaline conditions the enzyme's overall flexibility was increased: interactions between active site residues were lost, helical content decreased, and domain separation was observed as well as high-amplitude motions of the occluding loop – a main target of drug design studies. Essential dynamics analysis revealed that heparin binding modulates large amplitude motions promoting rearrangement of contacts between catB domains, thus favoring the maintenance of helical content as well as active site stability.
The results of our study contribute to unraveling the molecular events involved in catB inactivation at alkaline pH, highlighting the fact that protonation changes of a few residues can alter the overall dynamics of an enzyme. Moreover, we propose an allosteric role for heparin in the regulation of catB stability, in which the restriction of enzyme flexibility would allow the establishment of stronger contacts and thus the maintenance of overall structure.
Molecular docking simulation is the Rational Drug Design (RDD) step that investigates the affinity between protein receptors and ligands. Typically, molecular docking algorithms consider receptors as rigid bodies. Receptors are, however, intrinsically flexible in the cellular environment. The use of a time series of receptor conformations is an approach to explore receptor flexibility in molecular docking computer simulations, but it is extremely time-consuming. Hence, selection of the most promising conformations can accelerate docking experiments and, consequently, the RDD efforts.
We previously docked four ligands (NADH, TCL, PIF and ETH) to 3,100 conformations of the InhA receptor from M. tuberculosis. Based on the receptor residue-ligand distances, we preprocessed all docking results to generate appropriate input for data mining. Data preprocessing was done by calculating the shortest interatomic distances between the ligand and the receptor’s residues for each docking result. These distances were the predictive attributes. The target attribute was the estimated free energy of binding (FEB) value calculated by the AutoDock 3.0.5 software. The mining inputs were submitted to the M5P model tree algorithm, which produced short and understandable trees. For NADH, TCL and PIF we obtained correlations above 95%, while for ETH the correlation was only about 60%. In post-processing the generated model trees, we calculated, for each linear model (LM), the average FEB of its associated instances. From these values we considered an LM as representative if its average FEB was smaller than or equal to the average FEB of the test set. The instances in the selected LMs were considered the most promising snapshots. These totaled 1,521, 1,780, 2,085 and 902 snapshots for NADH, TCL, PIF and ETH, respectively.
By post-processing the generated model trees we were able to propose a selection criterion for linear models which, in turn, is capable of selecting a set of promising receptor conformations. As future work we intend to go further and use these results to elaborate a strategy to preprocess the receptor's 3-D spatial conformation in order to predict FEB values. In addition, we intend to select other compounds, among the million catalogued, that may be promising as new drug candidates for our particular protein receptor target.
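The LM selection criterion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each docking instance has already been assigned to a linear model by the model tree, and the function name, the tuple layout and the toy FEB values are all hypothetical.

```python
from collections import defaultdict


def select_promising_snapshots(instances, test_set_mean_feb):
    """Return snapshot ids belonging to 'representative' linear models.

    instances: iterable of (snapshot_id, lm_id, feb) tuples (assumed layout).
    An LM is representative if the mean FEB of its instances is smaller
    than or equal to the mean FEB of the test set (lower FEB = tighter binding).
    """
    by_lm = defaultdict(list)
    for snapshot_id, lm_id, feb in instances:
        by_lm[lm_id].append((snapshot_id, feb))

    selected = []
    for lm_id, members in by_lm.items():
        lm_mean = sum(feb for _, feb in members) / len(members)
        if lm_mean <= test_set_mean_feb:
            selected.extend(snapshot_id for snapshot_id, _ in members)
    return sorted(selected)


# Toy data, invented for illustration only (FEB values in kcal/mol).
toy_instances = [
    (1, "LM1", -10.0),
    (2, "LM1", -9.0),
    (3, "LM2", -6.0),
    (4, "LM2", -7.0),
    (5, "LM3", -8.5),
]
print(select_promising_snapshots(toy_instances, test_set_mean_feb=-8.0))
```

Note that the criterion keeps or discards whole linear models, so an individual snapshot with a poor FEB can still be selected if it falls under an LM whose average is favorable.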
G. diazotrophicus and A. vinelandii are aerobic nitrogen-fixing bacteria. Although oxygen is essential for the survival of these organisms, it irreversibly inhibits nitrogenase, the complex responsible for nitrogen fixation. Both microorganisms deal with this paradox through compensatory mechanisms. In A. vinelandii a conformational protection mechanism occurs through the interaction between the nitrogenase complex and the FeSII protein. Previous studies suggested the existence of a similar system in G. diazotrophicus, but the putative protein involved has not yet been described. This study aims to identify the protein-coding gene in the recently sequenced genome of G. diazotrophicus and also to provide detailed structural information on nitrogenase conformational protection in both organisms.
Genomic analysis of G. diazotrophicus sequences revealed a protein-coding ORF (Gdia0615) enclosing a conserved “fer2” domain, typical of the ferredoxin family and found in A. vinelandii FeSII. Comparative models of both FeSII and Gdia0615 disclosed a conserved beta-grasp fold. Cysteine residues that coordinate the 2[Fe-S] cluster are in conserved positions towards the metallocluster. Analysis of solvent-accessible residues and electrostatic surfaces unveiled a hydrophobic dimerization interface. Dimers assembled by molecular docking presented stable behaviour and a proper accommodation of regions possibly involved in binding of FeSII to nitrogenase throughout molecular dynamics simulations in aqueous solution. Molecular modeling of the nitrogenase complex of G. diazotrophicus was performed and models were compared to the crystal structure of A. vinelandii nitrogenase. Docking experiments of FeSII and Gdia0615 with their corresponding nitrogenase complexes pointed out in both systems a putative binding site presenting shape and charge complementarities at the Fe-protein/MoFe-protein complex interface.
The identification of the putative FeSII coding gene in G. diazotrophicus genome represents a large step towards the understanding of the conformational protection mechanism of nitrogenase against oxygen. In addition, this is the first study regarding the structural complementarities of FeSII-nitrogenase interactions in diazotrophic bacteria. The combination of bioinformatic tools for genome analysis, comparative protein modeling, docking calculations and molecular dynamics provided a powerful strategy for the elucidation of molecular mechanisms and structural features of FeSII-nitrogenase interaction.
The need to manage large amounts of data is a clear demand for laboratories nowadays. The use of Laboratory Information Management Systems (LIMS) to achieve this is growing each day. A LIMS is a complex computational system used to manage laboratory data with emphasis on quality assurance. Several LIMS are currently available. However, most of them have proprietary code and are sold at a high cost. Moreover, due to their complexity, LIMS are usually designed to comply with the needs of one kind of laboratory, making it very difficult to reuse a LIMS. In this work we describe the Sistema Integrado de Gerência de Laboratórios (SIGLa), an open source LIMS with a new approach designed to allow it to adapt its activities and processes to various types of laboratories.
SIGLa incorporates a workflow management system, making it possible to create and manage customized workflows. For each new laboratory a workflow is defined with its activities, rules and procedures. During execution, for each workflow created, the values of attributes defined in an XPDL file (which describes the workflow) are stored in SIGLa’s database, allowing them to be managed and retrieved upon request. These characteristics increase the system’s flexibility and extend its usability to include the needs of multiple types of laboratories. To construct the main functionalities of SIGLa, a workflow of a proteomic laboratory was first defined. To validate SIGLa's capability of adapting to multiple laboratories, in this paper we study the process and the needs of a microarray laboratory and define its workflow. This workflow was defined in a period of about two weeks, showing the efficiency and flexibility of the tool.
Using SIGLa it has been possible to construct a microarray LIMS in a few days, illustrating the flexibility and power of the proposed method. With SIGLa’s development we hope to contribute positively to the area of complex data management in laboratories by managing their large amounts of data, guaranteeing the consistency of the data and increasing laboratory productivity. We also hope to make it possible for laboratories with few resources to afford a high-level system for complex data management.
The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products.
In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was then used to assign morbidity and druggability scores to genes not known to be morbid and druggable, and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability. Among these rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors for morbidity and druggability, respectively.
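For readers less familiar with the two figures quoted above, the fraction of known genes "correctly recovered" corresponds to recall, which is reported alongside precision. The short Python sketch below shows how both are computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen only so the arithmetic is easy to follow; they are not the counts from the study.

```python
def recall_and_precision(true_pos, false_neg, false_pos):
    """Recall = TP / (TP + FN); precision = TP / (TP + FP)."""
    recall = true_pos / (true_pos + false_neg)
    precision = true_pos / (true_pos + false_pos)
    return recall, precision


# Hypothetical counts, NOT taken from the paper: 100 known positive genes,
# of which 65 are recovered, plus 35 false positives among the predictions.
recall, precision = recall_and_precision(true_pos=65, false_neg=35, false_pos=35)
print(f"recall={recall:.2f}, precision={precision:.2f}")
```

Recall measures how many of the known genes the classifier finds, whereas precision measures how many of its positive predictions are correct, so the two can differ substantially for the same classifier.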
We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.