The Neanthes acuminata species complex (Polychaeta) is cosmopolitan in distribution. Neanthes arenaceodentata, the Southern California member of the N. acuminata complex, has been widely used as a toxicological test animal in the marine environment. The method of reproduction is unique in this polychaete complex. Individuals of the same sex fight, whereas opposite sexes lie side by side until egg laying. Females lose about 75% of their weight and die after laying eggs. The male, capable of reproducing up to nine times, fertilizes the eggs and incubates the embryos for 3-4 weeks. The objective of this study was to determine whether any set of proteins influences this unique pattern of reproduction. Gel-based two-dimensional electrophoresis (2-DE) and gel-free quantitative proteomics methods were used to identify differential protein expression patterns before and after spawning in both male and female N. arenaceodentata. Males showed a higher degree of similarity in protein expression patterns, but females showed large changes in the phosphoproteome before and after spawning. There was a decrease (about 70%) in the number of detected phosphoproteins in spent females. The proteins involved in muscular development, cell signaling, structure and integrity, and translation were differentially expressed. This study provides proteomic insights into the male and female worms that may serve as a foundation for better understanding of unusual reproductive patterns in polychaete worms.
Motivation: Most functions within the cell emerge thanks to protein–protein interactions (PPIs), yet experimental determination of PPIs is both expensive and time-consuming. PPI networks present significant levels of noise and incompleteness. Predicting interactions using only PPI-network topology (topological prediction) is difficult but essential when prior biological knowledge is absent or unreliable.
Methods: Network embedding emphasizes the relations between network proteins embedded in a low-dimensional space, in which protein pairs that are closer to each other represent good candidate interactions. To achieve network denoising, which boosts prediction performance, we first applied minimum curvilinear embedding (MCE), and then adopted shortest path (SP) in the reduced space to assign likelihood scores to candidate interactions. Furthermore, we introduce (i) a new valid variation of MCE, named non-centred MCE (ncMCE); (ii) two automatic strategies for selecting the appropriate embedding dimension; and (iii) two new randomized procedures for evaluating predictions.
Results: We compared our method against several unsupervised and supervisedly tuned embedding approaches and node neighbourhood techniques. Despite its computational simplicity, ncMCE-SP was the overall leader, outperforming the current methods in topological link prediction.
Conclusion: Minimum curvilinearity is a valuable non-linear framework that we successfully applied to the embedding of protein networks for the unsupervised prediction of novel PPIs. The rationale for our approach is that biological and evolutionary information is imprinted in the non-linear patterns hidden behind the protein network topology, and can be exploited for predicting new protein links. The predicted PPIs represent good candidates for testing in high-throughput experiments or for exploitation in systems biology tools such as those used for network-based inference and prediction of disease-related functional modules.
Supplementary data are available at Bioinformatics online.
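The shortest-path (SP) scoring step described in the Methods can be illustrated with a minimal sketch. It assumes the embedding has already been computed: the coordinates below are hypothetical toy values, not the output of MCE or ncMCE, and the function name `shortest_path_scores` is illustrative only. Each observed edge is weighted by the Euclidean distance between its endpoints in the reduced space, and every non-adjacent pair is scored by its shortest-path length, so that smaller scores indicate better candidate interactions.

```python
import heapq
import math
from itertools import combinations


def shortest_path_scores(coords, edges):
    """Rank non-adjacent node pairs by shortest-path length in the network,
    where each edge is weighted by the Euclidean distance between its
    endpoints in the (already computed) embedding. Lower = better candidate."""
    adj = {node: [] for node in coords}
    for u, v in edges:
        w = math.dist(coords[u], coords[v])
        adj[u].append((v, w))
        adj[v].append((u, w))

    def dijkstra(source):
        # Standard Dijkstra with a binary heap and lazy deletion.
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, math.inf):
                continue
            for v, w in adj[u]:
                nd = d + w
                if nd < dist.get(v, math.inf):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist

    linked = {frozenset(e) for e in edges}
    all_dist = {u: dijkstra(u) for u in coords}
    scores = []
    for u, v in combinations(sorted(coords), 2):
        # Only unobserved pairs in the same connected component are candidates.
        if frozenset((u, v)) not in linked and v in all_dist[u]:
            scores.append((all_dist[u][v], u, v))
    return sorted(scores)


# Toy 2-D embedding (hypothetical coordinates, not MCE output).
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0), "D": (5.0, 1.0)}
edges = [("A", "B"), ("B", "C"), ("C", "D")]
ranked = shortest_path_scores(coords, edges)
```

In this toy network the pair A-C receives the smallest score because both of its connecting edges are short in the embedding, whereas any path reaching D must cross the long C-D edge.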
Subgraph matching algorithms are designed to find all instances of predefined subgraphs in a large graph or network and play an important role in the discovery and analysis of so-called network motifs, subgraph patterns which occur more often than expected by chance. We present the index-based subgraph matching algorithm (ISMA), a novel tree-based algorithm. ISMA realizes a speedup compared to existing algorithms by carefully selecting the order in which the nodes of a query subgraph are investigated. In order to achieve this, we developed a number of data structures and maximally exploited symmetry characteristics of the subgraph. We compared ISMA to a naive recursive tree-based algorithm and to a number of well-known subgraph matching algorithms. Our algorithm outperforms the other algorithms, especially on large networks and with large query subgraphs. An implementation of ISMA in Java is freely available at http://sourceforge.net/projects/isma/.
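For orientation, the kind of naive recursive tree-based matcher that serves as a baseline can be sketched as follows. This is not ISMA and not the authors' baseline code: it is a minimal backtracking search under the assumption of undirected graphs stored as dictionaries of neighbour sets, with a fixed node order and no symmetry breaking. The function name `find_subgraph_instances` is hypothetical.

```python
def find_subgraph_instances(graph, query):
    """Enumerate injective mappings of query nodes onto graph nodes that
    preserve every query edge (non-induced matching).

    graph, query: dict mapping each node to the set of its neighbours.
    """
    order = list(query)  # fixed order; smarter orderings prune the tree earlier
    matches = []

    def extend(mapping):
        if len(mapping) == len(order):
            matches.append(dict(mapping))
            return
        q = order[len(mapping)]
        for candidate in graph:
            if candidate in mapping.values():
                continue  # keep the mapping injective
            # Every already-mapped query neighbour of q must map to a
            # graph neighbour of the candidate.
            if all(mapping[n] in graph[candidate] for n in query[q] if n in mapping):
                mapping[q] = candidate
                extend(mapping)
                del mapping[q]

    extend({})
    return matches


# Toy network: a triangle a-b-c with a pendant node d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
matches = find_subgraph_instances(graph, triangle)
instances = {frozenset(m.values()) for m in matches}
```

Because the sketch ignores the symmetry of the query, the single triangle in the toy network is reported once per automorphism of the query and has to be deduplicated afterwards, which illustrates why exploiting symmetry and choosing the node order carefully can pay off.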
Since hub nodes have been found to play important roles in many networks, highly connected hub genes are expected to play an important role in biology as well. However, the empirical evidence remains ambiguous. An open question is whether (or when) hub gene selection leads to more meaningful gene lists than a standard statistical analysis based on significance testing when analyzing genomic data sets (e.g., gene expression or DNA methylation data). Here we address this question for the special case when multiple genomic data sets are available. This is of great practical importance since for many research questions multiple data sets are publicly available. In this case, the data analyst can decide between a standard statistical approach (e.g., based on meta-analysis) and a co-expression network analysis approach that selects intramodular hubs in consensus modules. We assess the performance of these two types of approaches according to two criteria. The first criterion evaluates the biological insights gained and is relevant in basic research. The second criterion evaluates the validation success (reproducibility) in independent data sets and often applies in clinical diagnostic or prognostic applications. We compare meta-analysis with consensus network analysis based on weighted correlation network analysis (WGCNA) in three comprehensive and unbiased empirical studies: (1) finding genes predictive of lung cancer survival, (2) finding methylation markers related to age, and (3) finding mouse genes related to total cholesterol. The results demonstrate that intramodular hub gene status with respect to consensus modules is more useful than a meta-analysis p-value when identifying biologically meaningful gene lists (reflecting criterion 1). However, standard meta-analysis methods perform as well as (if not better than) a consensus network approach in terms of validation success (criterion 2).
The article also reports a comparison of meta-analysis techniques applied to gene expression data and presents novel R functions for carrying out consensus network analysis, network based screening, and meta analysis.
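As a rough illustration of what intramodular hub status means in a weighted correlation network, the sketch below computes a connectivity score for each gene in a module. It is written in Python rather than R and is not the article's code. It assumes an unsigned network with adjacency |cor|^beta, a single data set (no consensus across data sets), and a module membership that is already known; the gene names, expression values, and the choice beta = 6 are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def intramodular_connectivity(expr, module, beta=6):
    """Sum of unsigned soft-thresholded adjacencies |cor|**beta between
    each gene and the other genes of the same module."""
    return {
        g: sum(abs(pearson(expr[g], expr[h])) ** beta for h in module if h != g)
        for g in module
    }


# Hypothetical expression profiles across five samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "g3": [1.0, 3.0, 2.0, 5.0, 4.0],
}
k_within = intramodular_connectivity(expr, ["g1", "g2", "g3"])
hubs = sorted(k_within, key=k_within.get, reverse=True)
```

Genes at the top of `hubs` are the candidates a hub-based screen would select, whereas a meta-analysis would instead rank genes by a combined p-value for association with the trait.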
Ocean acidification due to rising atmospheric CO2 is expected to affect the physiology of important calcifying marine organisms, but the nature and magnitude of change is yet to be established. In coccolithophores, different species and strains display varying calcification responses to ocean acidification, but the underlying biochemical properties remain unknown. We employed an approach combining tandem mass-spectrometry with isobaric tagging (iTRAQ) and multiple database searching to identify proteins that were differentially expressed in cells of the marine coccolithophore species Emiliania huxleyi (strain NZEH) between two CO2 conditions: 395 (∼current day) and ∼1340 p.p.m.v. CO2. Cells exposed to the higher CO2 condition contained more cellular particulate inorganic carbon (CaCO3) and particulate organic nitrogen and carbon than those maintained in present-day conditions. These results are linked with the observation that cells grew slower under elevated CO2, indicating cell cycle disruption. Under high CO2 conditions, coccospheres were larger and cells possessed bigger coccoliths that did not show any signs of malformation compared to those from cells grown under present-day CO2 levels. No differences in calcification rate, particulate organic carbon production or cellular organic carbon:nitrogen ratios were observed. Results were not related to nutrient limitation or acclimation status of cells. At least 46 homologous protein groups from a variety of functional processes were quantified in these experiments, of which four (histones H2A, H3, H4 and a chloroplastic 30S ribosomal protein S7) showed down-regulation in all replicates exposed to high CO2, perhaps reflecting the decrease in growth rate. We present evidence of cellular stress responses but proteins associated with many key metabolic processes remained unaltered. Our results therefore suggest that this E. huxleyi strain possesses some acclimation mechanisms to tolerate future CO2 scenarios, although the observed decline in growth rate may be an overriding factor affecting the success of this ecotype in future oceans.
Growth and remodelling impact the network topology of complex systems, yet a general theory explaining how new links arise between existing nodes has been lacking, and little is known about the topological properties that facilitate link-prediction. Here we investigate the extent to which the connectivity evolution of a network might be predicted by mere topological features. We show how a link/community-based strategy triggers substantial prediction improvements because it accounts for the singular topology of several real networks organised in multiple local communities - a tendency here named local-community-paradigm (LCP). We observe that LCP networks are mainly formed by weak interactions and characterise heterogeneous and dynamic systems that use self-organisation as a major adaptation strategy. These systems seem designed for global delivery of information and processing via multiple local modules. Conversely, non-LCP networks have steady architectures formed by strong interactions, and seem designed for systems in which information/energy storage is crucial.
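One way to make the link/community-based idea concrete is to score a candidate pair not only by how many neighbours its two nodes share, but also by how densely those shared neighbours are linked among themselves. The sketch below multiplies the two counts. This particular combination, the function name `local_community_score`, and the toy network are assumptions made for illustration and are not taken from the text above.

```python
from itertools import combinations


def local_community_score(adj, u, v):
    """Common-neighbour count of (u, v) multiplied by the number of links
    among those common neighbours (the pair's 'local community').

    adj: dict mapping each node to the set of its neighbours.
    """
    common = adj[u] & adj[v]
    local_links = sum(1 for a, b in combinations(common, 2) if b in adj[a])
    return len(common) * local_links


# Toy network: u and v share neighbours a, b, c; only a and b are linked.
adj = {
    "u": {"a", "b", "c"},
    "v": {"a", "b", "c"},
    "a": {"u", "v", "b"},
    "b": {"u", "v", "a"},
    "c": {"u", "v"},
}
score_uv = local_community_score(adj, "u", "v")
score_ac = local_community_score(adj, "a", "c")
```

Under this score the pair a-c, whose two common neighbours are not linked to each other, receives zero, while u-v is rewarded for sitting next to an internally connected group of shared neighbours.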
Coral bleaching, triggered by elevated sea-surface temperatures (SSTs), has caused a decline in coral cover and changes in the abundances of corals on reefs worldwide. Coral decline can be exacerbated by the effects of local stressors like turbidity, yet some reefs with a natural history of turbidity can support healthy and resilient coral communities. However, little is known about responses of coral communities to bleaching events on anthropogenically turbid reefs as a result of recent (post World War II) terrestrial runoff. Analysis of region-scale coral cover and species abundance at 17–20 sites on the turbid reefs of Okinawa Island (total of 79 species, 30 genera, and 13 families) from 1995 to 2009 indicates that coral cover decreased drastically, from 24.4% to 7.5% (1.1%/year), subsequent to bleaching events in 1998 and 2001. This dramatic decrease in coral cover corresponded to the demise of Acropora species (e.g., A. digitifera) by 2009, when Acropora had mostly disappeared from turbid reefs on Okinawa Island. In contrast, Merulinidae species (e.g., Dipsastraea pallida/speciosa/favus) and Porites species (e.g., P. lutea/australiensis), which are characterized by tolerance to thermal stress, survived on turbid reefs of Okinawa Island throughout the period. Our results suggest that high turbidity, influenced by recent terrestrial runoff, could have caused a reduction in resilience of Acropora species to severe thermal stress events, because the corals could not have adapted to a relatively recent decline in water quality. The coral reef ecosystems of Okinawa Island will be severely impoverished if Acropora species fail to recover.
In order for society to make effective policy decisions on complex and far-reaching subjects, such as appropriate responses to global climate change, scientists must effectively communicate complex results to the non-scientifically specialized public. However, there are few ways to transform highly complicated scientific data into formats that are engaging to the general community. Taking inspiration from patterns observed in nature and from some of the principles of jazz bebop improvisation, we have generated Microbial Bebop, a method by which microbial environmental data are transformed into music. Microbial Bebop uses meter, pitch, duration, and harmony to highlight the relationships between multiple data types in complex biological datasets. We use a comprehensive microbial ecology time-course dataset collected at the L4 marine monitoring station in the Western English Channel as an example of microbial ecological data that can be transformed into music. Four compositions were generated (www.bio.anl.gov/MicrobialBebop.htm) from L4 Station data using Microbial Bebop. Each composition, though deriving from the same dataset, is created to highlight different relationships between environmental conditions and microbial community structure. The approach presented here can be applied to a wide variety of complex biological datasets.
Cryptorchidism is the most frequent congenital disorder in male children; however, the genetic causes of cryptorchidism remain poorly investigated. Comparative integratomics combined with a systems biology approach was employed to elucidate genetic factors and molecular pathways underlying testis descent.
Literature mining was performed to collect genomic loci associated with cryptorchidism in seven mammalian species. Information regarding the collected candidate genes was stored in a MySQL relational database. A genomic view of the loci was presented using the Flash GViewer web tool (http://gmod.org/wiki/Flashgviewer/). DAVID Bioinformatics Resources 6.7 was used for pathway enrichment analysis. The Cytoscape plug-in PiNGO 1.11 was employed for protein-network-based prediction of novel candidate genes. Relevant protein-protein interactions were confirmed and visualized using the STRING database (version 9.0).
The developed cryptorchidism gene atlas includes 217 candidate loci (genes, regions involved in chromosomal mutations, and copy number variations) identified at the genomic, transcriptomic, and proteomic level. Human orthologs of the collected candidate loci were presented using a genomic map viewer. The cryptorchidism gene atlas is freely available online: http://www.integratomics-time.com/cryptorchidism/. Pathway analysis suggested the presence of twelve enriched pathways associated with the list of 179 literature-derived candidate genes. Additionally, a list of 43 network-predicted novel candidate genes was significantly associated with four enriched pathways. Joint pathway analysis of the collected and predicted candidate genes revealed the pivotal importance of the muscle-contraction pathway in cryptorchidism and evidence for genomic associations with cardiomyopathy pathways in RASopathies.
The developed gene atlas represents an important resource for the scientific community researching the genetics of cryptorchidism. The collected data will further facilitate development of novel genetic markers and could be of interest for functional studies in animals and humans. The proposed network-based systems biology approach elucidates molecular mechanisms underlying the co-presence of cryptorchidism and cardiomyopathy in RASopathies. Such an approach could also aid in the molecular explanation of the co-presence of diverse and apparently unrelated clinical manifestations in other syndromes.
Cryptorchidism; Muscle-contraction pathway; Cardiomyopathy; Comparative integratomics; Protein-protein interactions; Systems biology; Undescended testes; RASopathy
Physical interactions between proteins mediate a variety of biological functions, including signal transduction, physical structuring of the cell and regulation. While extensive catalogs of such interactions are known from model organisms, their evolutionary histories are difficult to study given the lack of interaction data from phylogenetic outgroups. Using phylogenomic approaches, we infer an upper bound on the time of origin for a large set of human protein-protein interactions, showing that most such interactions appear relatively ancient, dating no later than the radiation of placental mammals. By analyzing paired alignments of orthologous and putatively interacting protein-coding genes from eight mammals, we find evidence for weak but significant co-evolution, as measured by relative selective constraint, between pairs of genes with interacting proteins. However, we find no strong evidence for shared instances of directional selection within an interacting pair. Finally, we use a network approach to show that the distribution of selective constraint across the protein interaction network is non-random, with a clear tendency for interacting proteins to share similar selective constraints. Collectively, the results suggest that, on the whole, protein interactions in mammals are under selective constraint, presumably due to their functional roles.
The properties (or labels) of nodes in networks can often be predicted based on their proximity and their connections to other labeled nodes. So-called “label propagation algorithms” predict the labels of unlabeled nodes by propagating information about local label density iteratively through the network. These algorithms are fast, simple and scale to large networks but nonetheless regularly perform better than slower and much more complex algorithms on benchmark problems. We show here, however, that these algorithms have an intrinsic limitation that prevents them from adapting to some common patterns of network node labeling; we introduce a new algorithm, 3Prop, that retains all their advantages but is much more adaptive. As we show, 3Prop performs very well on node labeling problems ill-suited to label propagation, including predicting gene function in protein and genetic interaction networks and gender in friendship networks, and also performs slightly better on problems already well-suited to label propagation such as labeling blogs and patents based on their citation networks. 3Prop gains its adaptability by assigning separate weights to label information from different steps of the propagation. Surprisingly, we found that for many networks, the third iteration of label propagation receives a negative weight.
The code is available from the authors by request.
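The idea of weighting each propagation step separately can be sketched in a few lines. This is not the authors' 3Prop implementation: the neighbour-averaging update, the function names, and the weights below are all assumptions chosen for illustration, and the way 3Prop actually fits its weights is not shown. Known labels are encoded as +1 or -1 and unknown labels as 0.

```python
def propagate(adj, labels, steps=3):
    """Return one dict per step holding the average label value seen among
    each node's neighbours after 1, 2, ..., `steps` rounds of propagation."""
    current = dict(labels)
    history = []
    for _ in range(steps):
        nxt = {
            node: (sum(current[n] for n in nbrs) / len(nbrs)) if nbrs else 0.0
            for node, nbrs in adj.items()
        }
        history.append(nxt)
        current = nxt
    return history


def combine(history, weights):
    """Weighted sum of the per-step label information for every node."""
    return {
        node: sum(w * step[node] for w, step in zip(weights, history))
        for node in history[0]
    }


# Toy path network a - b - c in which only node a carries a known label.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
labels = {"a": 1.0, "b": 0.0, "c": 0.0}
history = propagate(adj, labels)
# Hypothetical weights; the negative third weight mirrors the observation above.
scores = combine(history, weights=(1.0, 0.5, -0.25))
```

With a single shared weight for all steps this reduces to ordinary accumulated propagation. Allowing the weights to differ, and even to change sign, is what lets the combined score adapt to networks in which linked nodes tend to carry different labels.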
Ultraconserved elements of DNA have been identified in vertebrate and invertebrate genomes. These elements have been found to have diverse functions, including enhancer activities in developmental processes. The evolutionary origins and functional roles of these elements in cellular systems, however, have not yet been determined.
Here, we identified a wide range of ultraconserved elements common to distant species, from primitive aquatic organisms to terrestrial species with complicated body systems, including some novel elements conserved in fruit fly and human. In addition to a well-known association with developmental genes, these DNA elements have a strong association with genes implicated in essential cell functions, such as epigenetic regulation, apoptosis, detoxification, innate immunity, and sensory reception. Interestingly, we observed that ultraconserved elements clustered by sequence similarity. Furthermore, species composition and flanking genes of clusters showed lineage-specific patterns. Ultraconserved elements are highly enriched with binding sites to developmental transcription factors regardless of how they cluster.
We identified large numbers of ultraconserved elements across distant species. Specific classes of these conserved elements seem to have been generated before the divergence of taxa and fixed during the process of evolution. Our findings indicate that these ultraconserved elements are not the exclusive property of higher modern eukaryotes, but rather transmitted from their metazoan ancestors.
Ultraconserved elements; Developmental enhancers; Transcriptional regulatory networks; Genome evolution; Marine biology
Fishes are known to use chemical alarm cues from both conspecifics and heterospecifics to assess local predation risks and enhance predator detection. Yet it is unknown how recognition of heterospecific cues arises for coral reef fishes. Here, we test if naïve juvenile fish have an innate recognition of heterospecific alarm cues. We also examine if there is a relationship between the intensity of the antipredator response to these cues and the degree to which species are related to each other. Naïve juvenile anemone fish, Amphiprion percula, were tested to see if they displayed antipredator responses to chemical alarm cues from four closely related heterospecific species (family Pomacentridae), a distantly related sympatric species (Asterropteryx semipunctatus) and a saltwater control. Juveniles displayed significant reductions in foraging rate when exposed to all four confamilial heterospecific species but they did not respond to the distantly related sympatric species or the saltwater control. There was also a strong relationship between the intensity of the antipredator response and the extent to which species were related, with responses weakening as species became more distantly related. These findings demonstrate that chemical alarm cues are conserved within the pomacentrid family, providing juveniles with an innate recognition of heterospecific alarm cues as predicted by the phylogenetic relatedness hypothesis.
Gielis curves and surfaces can describe a wide range of natural shapes and they have been used in various studies in biology and physics as a descriptive tool. This has stimulated the generalization of widely used computational methods. Here we show that proper normalization of the Levenberg-Marquardt algorithm allows for efficient and robust reconstruction of Gielis curves, including self-intersecting and asymmetric curves, without increasing the overall complexity of the algorithm. Then, we show how complex curves of k-type can be constructed and how solutions to the Dirichlet problem for the Laplace equation on these complex domains can be derived using a semi-Fourier method. In all three methods, descriptive and computational power and efficiency are obtained in a surprisingly simple way.
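For readers unfamiliar with Gielis curves, the sketch below evaluates the commonly quoted polar form of the Gielis superformula, r(θ) = (|cos(mθ/4)/a|^n2 + |sin(mθ/4)/b|^n3)^(-1/n1). The formula itself is not stated in the abstract, so its use here is an assumption about which parameterization is meant, and the function names are illustrative. Curve reconstruction with Levenberg-Marquardt and the Dirichlet problem are not covered.

```python
import math


def gielis_radius(theta, m, a=1.0, b=1.0, n1=1.0, n2=1.0, n3=1.0):
    """Radius of a Gielis curve at polar angle theta (in radians)."""
    c = abs(math.cos(m * theta / 4.0) / a) ** n2
    s = abs(math.sin(m * theta / 4.0) / b) ** n3
    return (c + s) ** (-1.0 / n1)


def gielis_points(m, n_points=360, **params):
    """Sample the closed curve as a list of Cartesian (x, y) points."""
    points = []
    for i in range(n_points):
        theta = 2.0 * math.pi * i / n_points
        r = gielis_radius(theta, m, **params)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


# With a = b = 1 and n1 = n2 = n3 = 2 the formula reduces to the unit circle,
# because cos^2 + sin^2 = 1 for any symmetry number m.
circle = gielis_points(m=4, n_points=8, n1=2.0, n2=2.0, n3=2.0)
# A rounded, starfish-like outline (parameter values chosen arbitrarily).
star = gielis_points(m=5, n1=0.3, n2=0.3, n3=0.3)
```

The parameter m sets the rotational symmetry and the exponents control how pinched or inflated the lobes are, which is why a small parameter vector can describe a wide range of outlines.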
Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. First, local polynomial fitting is applied to estimate the heteroscedastic function; then the coefficients of the regression model are obtained by the generalized least squares method. One noteworthy feature of our approach is that we avoid testing for heteroscedasticity by improving the traditional two-stage method. Because local polynomial estimation is non-parametric, it is unnecessary to know the form of the heteroscedastic function. Therefore, we can improve the estimation precision when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients are asymptotically normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is effective in finite-sample situations.
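The two-stage idea can be sketched for the simplest case. The code below is a univariate simplification, not the authors' multivariate estimator: it assumes a single predictor, a local linear (degree-one) fit with a Gaussian kernel, squared ordinary-least-squares residuals as the raw variance signal, and a fixed bandwidth `h`. All function names and the example data are hypothetical.

```python
import math


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def local_linear(x, y, x0, h):
    """Local linear estimate of E[y | x = x0] with a Gaussian kernel."""
    kernel = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
    b0, b1 = weighted_line_fit(x, y, kernel)
    return b0 + b1 * x0


def two_stage_fit(x, y, h):
    """Stage 1: smooth squared OLS residuals to estimate the variance function.
    Stage 2: refit with weights 1 / variance (GLS with diagonal covariance)."""
    b0, b1 = weighted_line_fit(x, y, [1.0] * len(x))
    sq_res = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    # Clip at a small positive floor so that the weights stay finite.
    var_hat = [max(local_linear(x, sq_res, xi, h), 1e-8) for xi in x]
    return weighted_line_fit(x, y, [1.0 / v for v in var_hat])


# Hypothetical data: the spread of y around 2 + 3x grows with x.
x = [0.5 * i for i in range(1, 21)]
y = [2.0 + 3.0 * xi + 0.2 * xi * (-1) ** i for i, xi in enumerate(x)]
b0_hat, b1_hat = two_stage_fit(x, y, h=2.0)
```

No parametric form is imposed on the variance function and no preliminary heteroscedasticity test is run: observations in regions where the smoothed squared residuals are large simply receive less weight in the second stage.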
Research on the mechanism for early development of shellfish, such as body plan, shell formation, settlement and metamorphosis is currently an active research field. However, studies were still limited and not deep enough because of the lack of genomic resources such as genome or transcriptome sequences. In the present research, de novo transcriptome sequencing was performed for Crassostrea angulata, the most economically important cultured oyster species in China, at eight early developmental stages using the 454 sequencing technology. A total of 555,215 reads were produced with an average length of 309 nucleotides that were then assembled into 10,462 contigs. As determined by GO annotation and KEGG pathway mapping, functional annotation of the unigenes recovered diverse biological functions and processes. Six unique sequences related to settlement, metamorphosis and growth were subsequently analyzed by real-time PCR. Given the lack of whole genome information for oysters, transcriptome and de novo analysis of C. angulata from the eight different developing phases will provide important and useful information on early development mechanism and help genetic breeding of shellfish.
Interaction networks are central elements of ecological systems and have very complex structures. Historically, much effort has focused on niche-mediated processes to explain these structures, while an emerging consensus posits that both niche and neutral mechanisms simultaneously shape many features of ecological communities. However, the study of interaction networks still lacks a comprehensive neutral theory. Here we present a neutral model of predator-prey interactions and analyze the structural characteristics of the simulated networks. We find that connectance values (complexity) and complexity-diversity relationships of neutral networks are close to those observed in empirical bipartite networks. High nestedness and low modularity values observed in neutral networks fall in the range of those from empirical antagonist bipartite networks. Our results suggest that, as an alternative to niche-mediated processes that induce incompatibility between species (“niche forbidden links”), neutral processes create “neutral forbidden links” due to uneven species abundance distributions and the low probability of interaction between rare species. Neutral trophic networks must be seen as the missing endpoint of a continuum from niche to purely stochastic approaches of community organization.
With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains, while it was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatase (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and available at http://sfb.kaust.edu.sa/Pages/Software.aspx.
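The indexing and graph-construction steps can be illustrated with a small sketch. It is not SECOM's code: it assumes that a seed is simply a contiguous k-mer, uses a Python dictionary as the hash index, and stops once the weighted graph is built, so the overlapping community-finding and backward percolation steps are omitted. The function name, the protein names, and the sequences are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def seed_graph(proteins, k=3):
    """Index every k-mer seed of every protein, then weight each protein
    pair by the number of distinct seeds the two sequences share."""
    index = defaultdict(set)  # seed -> names of proteins containing it
    for name, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)

    weights = defaultdict(int)  # (protein, protein) -> shared seed count
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Toy sequences: p1 and p2 share the local segment "MKVL"; p3 shares nothing.
proteins = {"p1": "MKVLAAG", "p2": "GGMKVLT", "p3": "TTTTTTT"}
graph = seed_graph(proteins)
```

Because every sequence is scanned only once to fill the index, local similarity is detected without aligning every protein against every other, which is where the speed advantage over all-against-all alignment comes from.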
Gene expression data are influenced by multiple biological and technological factors leading to a wide range of dispersion scenarios, although skewed patterns are not commonly addressed in microarray analyses. In this study, the distribution pattern of several human transcriptomes has been studied on free-access microarray gene expression data. Our results showed that, even in previously normalized gene expression data, probe and differential expression within probe effects suffer from substantial departures from the commonly assumed symmetric Gaussian distribution. We developed a flexible mixed model for non-competitive microarray data analysis that accounted for asymmetric and heavy-tailed (Student’s t distribution) dispersion processes. Random effects for gene expression data were modeled under asymmetric Student’s t distributions where the asymmetry parameter (λ) took values from perfect symmetry (λ = 0) to right- (λ > 0) or left-side (λ < 0) over-expression patterns. This approach was applied to four free-access human data sets and revealed clearly better model performance compared with standard approaches that assume traditional symmetric Gaussian distribution patterns. Our analyses of human gene expression data revealed a substantial degree of right-hand asymmetry for probe effects, whereas differential gene expression showed both symmetric and left-hand asymmetric patterns. Although these results cannot be extrapolated to all microarray experiments, they highlighted the incidence of skewed dispersion patterns in the human transcriptome; moreover, we provided a new analytical approach to appropriately address this biological phenomenon. The source code of the program accommodating these analytical developments and additional information about practical aspects of running the program are freely available by request to the corresponding author of this article.
In this study, we analyzed the metamorphosis of the marine bryozoan Bugula neritina. We observed the morphogenesis of the ancestrula. We defined three distinct pre-ancestrula stages based on the anatomy of the developing polypide and the overall morphology of the pre-ancestrula. We then used an annotation-based enrichment analysis tool to analyze the B. neritina transcriptome and identified over-representation of genes related to the Wnt signaling pathway, suggesting its involvement in metamorphosis. Finally, we studied the temporal-spatial expression of several Wnt pathway genes. We found that one of the Wnt ligands, BnWnt10, was expressed spatially opposite to the Wnt antagonist BnsFRP within the blastemas, which form the presumptive polypide. Downstream components of the canonical Wnt signaling pathway were exclusively expressed in the blastemas. Bnβcatenin and BnFz5/8 were exclusively expressed in the blastemas throughout metamorphosis. Based on the gene expression patterns, we propose that BnWnt10 and BnsFRP may relate to the patterning of the polypide, in which the two genes serve as positional signals and contribute to the polarization of the blastemas. Another Wnt ligand, BnWnt6, was expressed in the apical part of the pre-ancestrula epidermis. Overall, our findings suggest that the Wnt signaling pathway may be important to the pattern formation of the polypide and the development of the epidermis.
Relationships we have with our friends, family, or colleagues influence our personal decisions, as well as decisions we make together with others. As in human beings, despotic and egalitarian societies also seem to exist in animals. While studies have shown that social networks constrain many phenomena from amoebae to primates, we still do not know how consensus emerges from the properties of social networks in many biological systems. We created artificial social networks that represent the continuum from centralized to decentralized organization and used an agent-based model to make predictions about the patterns of consensus and collective movements we observed according to the social network. These theoretical results showed that different social networks, especially strongly contrasting ones (star network vs. equal network), led to totally different patterns. Our model showed that, by moving from a centralized network to a decentralized one, the central individual seemed to lose its leadership in the collective movement's decisions. We therefore showed a link between the type of social network and the resulting consensus. By comparing our theoretical data with data on five groups of primates, we confirmed that this relationship between social network and consensus also appears to exist in animal societies.
Elementary arithmetic (e.g., addition, subtraction) in humans has been shown to exhibit spatial properties. The exact nature of these properties has remained elusive, however. To address this issue, we combine two earlier models of parietal cortex: a model we recently proposed on number-space interactions, and a modeling framework of parietal cortex that implements radial basis functions for performing spatial transformations. Together, they provide us with a framework in which elementary arithmetic is based on evolutionarily more basic spatial transformations, thus providing the first implemented instance of Dehaene and Cohen's recycling hypothesis.
Cnidarians, including corals and anemones, offer unique insights into metazoan evolution because they harbor genetic similarities with vertebrates beyond those found in model invertebrates and retain genes known only from non-metazoans. Cataloging genes expressed in Acropora palmata, a foundation species of reefs in the Caribbean and western Atlantic, will advance our understanding of the genetic basis of ecologically important traits in corals and comes at a time when sequencing efforts in other cnidarians allow for multi-species comparisons.
A cDNA library from a sample enriched for symbiont-free larval tissue was sequenced on the 454 GS-FLX platform. Over 960,000 reads were obtained and assembled into 42,630 contigs. Annotation data were acquired for 57% of the assembled sequences. Analysis of the assembled sequences indicated that 83–100% of all A. palmata transcripts were tagged, and provided a rough estimate of the total number of genes expressed in our samples (∼18,000–20,000). The coral annotation data contained many of the same molecular components as in the Bilateria, particularly in pathways associated with oxidative stress and DNA damage repair, and provided evidence that homologs of p53, a key player in DNA repair pathways, have experienced selection along the branch separating Cnidaria and Bilateria. Transcriptome-wide screens of paralog groups and transition/transversion ratios highlighted genes including green fluorescent proteins, carbonic anhydrase, and oxidative stress proteins, as well as functional groups involved in protein and nucleic acid metabolism and in the formation of structural molecules. These results provide a starting point for the study of adaptive evolution in corals.
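The abstract does not say how the transition/transversion ratios were computed. As a minimal sketch of the quantity itself, the code below counts transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse) between two aligned DNA sequences. The function name and the example sequences are hypothetical, and skipping gaps and ambiguous bases is an assumption.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def ts_tv_counts(seq_a, seq_b):
    """Count transitions and transversions between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical sites, gaps, and ambiguous bases such as N.
        if a == b or a not in BASES or b not in BASES:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Hypothetical aligned fragments, for illustration only.
ts, tv = ts_tv_counts("ACGTAC", "GCTTAT")
ratio = ts / tv if tv else None  # undefined when no transversions are observed
print(ts, tv, ratio)
```

This raw count ignores multiple substitutions at the same site. Published analyses usually estimate the ratio under a substitution model instead.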
Currently available transcriptome data now make comparative studies of the mechanisms underlying corals' evolutionary success possible. Here we identified candidate genes that enable corals to maintain genomic integrity despite considerable exposure to genotoxic stress over long life spans, and showed conservation of important physiological pathways between corals and bilaterians.
Machine learning-based miRNA-target prediction algorithms often fail to achieve balanced prediction accuracy in terms of both sensitivity and specificity. The causes are the lack of gold-standard negative examples, of relevant features specific to the miRNA-targeting site context, and of an efficient feature selection process. Moreover, none of the sequence-, structure- and machine learning-based algorithms is able to place the true positive predictions preferentially at the top of the ranked list, which makes them unreliable for biologists. In addition, these algorithms fail to achieve a satisfactory combination of precision and recall for target transcripts that are translationally repressed at the protein level.
In this article, we introduce an efficient miRNA-target prediction system, MultiMiTar, a Support Vector Machine (SVM) based classifier integrated with a multiobjective metaheuristic-based feature selection technique. The robust performance of the proposed method is mainly the result of using high-quality negative examples and selecting biologically relevant features specific to the miRNA-targeting site context. The features are selected using a novel feature selection technique, AMOSA-SVM, which integrates the multiobjective optimization technique Archived Multi-Objective Simulated Annealing (AMOSA) with SVM.
On a completely independent test data set, MultiMiTar achieves a much higher Matthews correlation coefficient (MCC) of 0.583 and average class-wise accuracy (ACA) of 0.8 than the other target prediction methods, whose MCC and ACA values range from −0.269 to 0.155 and from 0.321 to 0.582, respectively. Moreover, it shows a more balanced result in terms of precision and sensitivity (recall) for the translationally repressed data set than all the other existing methods. Importantly, the true positive predictions are distributed preferentially at the top of the ranked list, which makes MultiMiTar reliable for biologists. MultiMiTar is now available as an online tool at www.isical.ac.in/~bioinfo_miu/multimitar.htm. The MultiMiTar software can be downloaded from www.isical.ac.in/~bioinfo_miu/multimitar-download.htm.
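For readers unfamiliar with the two reported metrics, the sketch below computes both from a binary confusion matrix. The MCC formula is the standard one. The abstract does not define ACA, so taking it as the mean of sensitivity and specificity is an assumption. The counts are hypothetical and are not MultiMiTar results.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


def average_classwise_accuracy(tp, tn, fp, fn):
    """Mean of per-class accuracies (assumed definition of ACA).

    Assumes both classes are present, i.e. tp + fn > 0 and tn + fp > 0.
    """
    sensitivity = tp / (tp + fn)  # accuracy on the positive class
    specificity = tn / (tn + fp)  # accuracy on the negative class
    return (sensitivity + specificity) / 2


# Hypothetical counts, for illustration only.
counts = dict(tp=40, tn=30, fp=10, fn=20)
print(f"MCC = {mcc(**counts):.3f}, ACA = {average_classwise_accuracy(**counts):.3f}")
```

MCC ranges from −1 to 1 and ACA from 0 to 1. Both are less sensitive to class imbalance than plain accuracy, which is presumably why they are used to assess balanced performance.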
Ribosomal RNAs have been widely used for the identification and classification of species, and have produced data giving new insights into phylogenetic relationships. Recently, multilocus genotyping and even whole-genome sequencing-based technologies have been adopted in ambitious comparative biology studies. However, such technologies are still far from routine use in species classification studies because of their high costs in labor, equipment and consumables.
Here, we describe a simple and powerful approach for species classification called genome profiling (GP). The GP method, composed of random PCR, temperature gradient gel electrophoresis (TGGE) and computer-aided gel image processing, is highly informative and less laborious. For demonstration, we classified 26 species of insects using the GP and 18S rDNA-sequencing approaches. As assessed by a congruence value, the GP method corresponded better to the classical phenotype-based approach than did 18S rDNA sequencing. To our surprise, a single probe in GP was sufficient to identify the relationships between the insect species, making this approach more straightforward.
The data gathered here, together with those of previous studies, show that GP is a simple and powerful method that can be applied universally to identify and classify species. The current success supports our previous proposal that a GP-based web database can be constructed and would be effective for the global identification and classification of species.