Bioinformatics is an interdisciplinary research field that applies advanced computational techniques to biological data. Bibliometrics analysis has recently been adopted to understand the knowledge structure of a research field by citation pattern. In this paper, we explore the knowledge structure of Bioinformatics from the perspective of a core open access Bioinformatics journal, BMC Bioinformatics with trend analysis, the content and co-authorship network similarity, and principal component analysis. Publications in four core journals including Bioinformatics – Oxford Journal and four conferences in Bioinformatics were harvested from DBLP. After converting publications into TF-IDF term vectors, we calculate the content similarity, and we also calculate the social network similarity based on the co-authorship network by utilizing the overlap measure between two co-authorship networks. Key terms is extracted and analyzed with PCA, visualization of the co-authorship network is conducted. The experimental results show that Bioinformatics is fast-growing, dynamic and diversified. The content analysis shows that there is an increasing overlap among Bioinformatics journals in terms of topics and more research groups participate in researching Bioinformatics according to the co-authorship network similarity.
The goal of this research is to learn how the editorial staffs of bioinformatics and medical informatics journals provide support for cross-community exposure. Models such as co-citation and co-author analysis measure the relationships between researchers; but they do not capture how environments that support knowledge transfer across communities are organized.
In this paper, we propose a social network analysis model to study how editorial boards integrate researchers from disparate communities. We evaluate our model by building relational networks based on the editorial boards of approximately 40 journals that serve as research outlets in medical informatics and bioinformatics. We track the evolution of editorial relationships through a longitudinal investigation over the years 2000 through 2005.
Our findings suggest that there are research journals that support the collocation of editorial board members from the bioinformatics and medical informatics communities. Network centrality metrics indicate that editorial board members are located in the intersection of the communities and that the number of individuals in the intersection is growing with time.
Social network analysis methods provide insight into the relationships between the medical informatics and bioinformatics communities. The number of editorial board members facilitating the publication intersection of the communities has grown, but the intersection remains dependent on a small group of individuals and fragile.
Significant interest exists in establishing synergistic research in bioinformatics, systems biology and intelligent computing. Supported by the United States National Science Foundation (NSF), International Society of Intelligent Biological Medicine (http://www.ISIBM.org), International Journal of Computational Biology and Drug Design (IJCBDD) and International Journal of Functional Informatics and Personalized Medicine, the ISIBM International Joint Conferences on Bioinformatics, Systems Biology and Intelligent Computing (ISIBM IJCBS 2009) attracted more than 300 papers and 400 researchers and medical doctors world-wide. It was the only inter/multidisciplinary conference aimed to promote synergistic research and education in bioinformatics, systems biology and intelligent computing. The conference committee was very grateful for the valuable advice and suggestions from honorary chairs, steering committee members and scientific leaders including Dr. Michael S. Waterman (USC, Member of United States National Academy of Sciences), Dr. Chih-Ming Ho (UCLA, Member of United States National Academy of Engineering and Academician of Academia Sinica), Dr. Wing H. Wong (Stanford, Member of United States National Academy of Sciences), Dr. Ruzena Bajcsy (UC Berkeley, Member of United States National Academy of Engineering and Member of United States Institute of Medicine of the National Academies), Dr. Mary Qu Yang (United States National Institutes of Health and Oak Ridge, DOE), Dr. Andrzej Niemierko (Harvard), Dr. A. Keith Dunker (Indiana), Dr. Brian D. Athey (Michigan), Dr. Weida Tong (FDA, United States Department of Health and Human Services), Dr. Cathy H. Wu (Georgetown), Dr. Dong Xu (Missouri), Drs. Arif Ghafoor and Okan K Ersoy (Purdue), Dr. Mark Borodovsky (Georgia Tech, President of ISIBM), Dr. Hamid R. Arabnia (UGA, Vice-President of ISIBM), and other scientific leaders. The committee presented the 2009 ISIBM Outstanding Achievement Awards to Dr. Joydeep Ghosh (UT Austin), Dr. Aidong Zhang (Buffalo) and Dr. Zhi-Hua Zhou (Nanjing) for their significant contributions to the field of intelligent biological medicine.
The advent of high-throughput next generation sequencing technologies have fostered enormous potential applications of supercomputing techniques in genome sequencing, epi-genetics, metagenomics, personalized medicine, discovery of non-coding RNAs and protein-binding sites. To this end, the 2008 International Conference on Bioinformatics and Computational Biology (Biocomp) – 2008 World Congress on Computer Science, Computer Engineering and Applied Computing (Worldcomp) was designed to promote synergistic inter/multidisciplinary research and education in response to the current research trends and advances. The conference attracted more than two thousand scientists, medical doctors, engineers, professors and students gathered at Las Vegas, Nevada, USA during July 14–17 and received great success. Supported by International Society of Intelligent Biological Medicine (ISIBM), International Journal of Computational Biology and Drug Design (IJCBDD), International Journal of Functional Informatics and Personalized Medicine (IJFIPM) and the leading research laboratories from Harvard, M.I.T., Purdue, UIUC, UCLA, Georgia Tech, UT Austin, U. of Minnesota, U. of Iowa etc, the conference received thousands of research papers. Each submitted paper was reviewed by at least three reviewers and accepted papers were required to satisfy reviewers' comments. Finally, the review board and the committee decided to select only 19 high-quality research papers for inclusion in this supplement to BMC Genomics based on the peer reviews only. The conference committee was very grateful for the Plenary Keynote Lectures given by: Dr. Brian D. Athey (University of Michigan Medical School), Dr. Vladimir N. Uversky (Indiana University School of Medicine), Dr. David A. Patterson (Member of United States National Academy of Sciences and National Academy of Engineering, University of California at Berkeley) and Anousheh Ansari (Prodea Systems, Space Ambassador). The theme of the conference to promote synergistic research and education has been achieved successfully.
Supported by National Science Foundation (NSF), International Society of Intelligent Biological Medicine (ISIBM), International Journal of Computational Biology and Drug Design and International Journal of Functional Informatics and Personalized Medicine, IEEE 7th Bioinformatics and Bioengineering attracted more than 600 papers and 500 researchers and medical doctors. It was the only synergistic inter/multidisciplinary IEEE conference with 24 Keynote Lectures, 7 Tutorials, 5 Cutting-Edge Research Workshops and 32 Scientific Sessions including 11 Special Research Interest Sessions that were designed dynamically at Harvard in response to the current research trends and advances. The committee was very grateful for the IEEE Plenary Keynote Lectures given by: Dr. A. Keith Dunker (Indiana), Dr. Jun Liu (Harvard), Dr. Brian Athey (Michigan), Dr. Mark Borodovsky (Georgia Tech and President of ISIBM), Dr. Hamid Arabnia (Georgia and Vice-President of ISIBM), Dr. Ruzena Bajcsy (Berkeley and Member of United States National Academy of Engineering and Member of United States Institute of Medicine of the National Academies), Dr. Mary Yang (United States National Institutes of Health and Oak Ridge, DOE), Dr. Chih-Ming Ho (UCLA and Member of United States National Academy of Engineering and Academician of Academia Sinica), Dr. Andy Baxevanis (United States National Institutes of Health), Dr. Arif Ghafoor (Purdue), Dr. John Quackenbush (Harvard), Dr. Eric Jakobsson (UIUC), Dr. Vladimir Uversky (Indiana), Dr. Laura Elnitski (United States National Institutes of Health) and other world-class scientific leaders. The Harvard meeting was a large academic event 100% full-sponsored by IEEE financially and academically. After a rigorous peer-review process, the committee selected 27 high-quality research papers from 600 submissions. The committee is grateful for contributions from keynote speakers Dr. Russ Altman (IEEE BIBM conference keynote lecturer on combining simulation and machine learning to recognize function in 4D), Dr. Mary Qu Yang (IEEE BIBM workshop keynote lecturer on new initiatives of detecting microscopic disease using machine learning and molecular biology, http://ieeexplore.ieee.org/servlet/opac?punumber=4425386) and Dr. Jack Y. Yang (IEEE BIBM workshop keynote lecturer on data mining and knowledge discovery in translational medicine) from the first IEEE Computer Society BioInformatics and BioMedicine (IEEE BIBM) international conference and workshops, November 2-4, 2007, Silicon Valley, California, USA.
This paper presents the Bioinformatics Computational Journal (BCJ), a framework for conducting and managing computational experiments in bioinformatics and computational biology. These experiments often involve series of computations, data searches, filters, and annotations which can benefit from a structured environment. Systems to manage computational experiments exist, ranging from libraries with standard data models to elaborate schemes to chain together input and output between applications. Yet, although such frameworks are available, their use is not widespread–ad hoc scripts are often required to bind applications together. The BCJ explores another solution to this problem through a computer based environment suitable for on-site use, which builds on the traditional laboratory notebook paradigm. It provides an intuitive, extensible paradigm designed for expressive composition of applications. Extensive features facilitate sharing data, computational methods, and entire experiments. By focusing on the bioinformatics and computational biology domain, the scope of the computational framework was narrowed, permitting us to implement a capable set of features for this domain. This report discusses the features determined critical by our system and other projects, along with design issues. We illustrate the use of our implementation of the BCJ on two domain-specific examples.
The explosion of genome sequencing data along with genotype to phenotype correlation studies has created data deluge in the area of biomedical sciences. The aim of the Medical bioinformatics section is to aid the development and maturation of the field by providing a platform for the translation of these datasets into useful clinical applications. The increase in computing capabilities and availability of different data from advanced technologies will allow researchers to build System Biology models of various diseases in order to efficiently develop new therapeutic interventions and reduce the current prohibitively large costs of drug discovery.
The section welcomes studies on the development of Biomedical Informatics for translational medicine and clinical applications, including tools, methodologies and data integration.
Motivation: With the explosion of biomedical literature and the evolution of online and open access, scientists are reading more articles from a wider variety of journals. Thus, the list of core journals relevant to their research may be less obvious and may often change over time. To help researchers quickly identify appropriate journals to read and publish in, we developed a web application for finding related journals based on the analysis of PubMed log data.
Supplementary information: Supplementary data are available at Bioinformatics online.
The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to ). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics.
This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000–2005 ("recent") and 1990–1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays.
We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science.
When the human genome project was conceived, its leaders wanted all researchers to have equal access to the data and associated research tools. Their vision of equal access provides an unprecedented teaching opportunity. Teachers and students have free access to the same databases that researchers are using. Furthermore, the recent movement to deliver scientific publications freely has presented a second source of current information for teaching. I have developed a genomics course that incorporates many of the public-domain databases, research tools, and peer-reviewed journals. These online resources provide students with exciting entree into the new fields of genomics, proteomics, and bioinformatics. In this essay, I outline how these fields are especially well suited for inclusion in the undergraduate curriculum. Assessment data indicate that my students were able to utilize online information to achieve the educational goals of the course and that the experience positively influenced their perceptions of how they might contribute to biology.
genomic; proteomics; bioinformatics; teaching; research; undergraduate; model organisms; online databases; public domain
The 6th Benelux Bioinformatics Conference (BBC11) held in Luxembourg on 12 and 13 December 2011 attracted around 200 participants, including internationally-renowned guest speakers and more than 100 peer-reviewed submissions from 3 continents. Researchers from the public and private sectors convened at BBC11 to discuss advances and challenges in a wide spectrum of application areas. A key theme of the conference was the contribution of bioinformatics to enable and accelerate translational and clinical research. The BBC11 stressed the need for stronger collaborating efforts across disciplines and institutions. The demonstration of the clinical relevance of systems approaches and of next-generation sequencing-based measurement technologies are among the existing opportunities for increasing impact in translational research. Translational bioinformatics will benefit from research models that strike a balance between the importance of protecting intellectual property and the need to openly access scientific and technological advances. The full conference proceedings are freely available at http://www.bbc11.lu.
Translational bioinformatics; Clinical bioinformatics; Translational research; Systems biology; Next-generation sequencing; Bioinformatic infrastructure
The public sharing of primary research datasets potentially benefits the research community but is not yet common practice. In this pilot study, we analyzed whether data sharing frequency was associated with funder and publisher requirements, journal impact factor, or investigator experience and impact. Across 397 recent biomedical microarray studies, we found investigators were more likely to publicly share their raw dataset when their study was published in a high-impact journal and when the first or last authors had high levels of career experience and impact. We estimate the USA’s National Institutes of Health (NIH) data sharing policy applied to 19% of the studies in our cohort; being subject to the NIH data sharing plan requirement was not found to correlate with increased data sharing behavior in multivariate logistic regression analysis. Studies published in journals that required a database submission accession number as a condition of publication were more likely to share their data, but this trend was not statistically significant. These early results will inform our ongoing larger analysis, and hopefully contribute to the development of more effective data sharing initiatives.
bibliometrics; bioinformatics; policy evaluation; data sharing
The Asia-Pacific Bioinformatics Network (APBioNet) held the first International Conference on Bioinformatics (InCoB) in Bangkok in 2002 to promote North-South networking. Commencing as a forum for Asia-Pacific researchers to interact with and learn from with scientists of developed countries, InCoB has become a major regional bioinformatics conference, with participants from the region as well as North America and Europe. Since 2006, InCoB has selected the best submissions for publication in BMC Bioinformatics. In response to the growth and maturation of data-driven approaches, InCoB added BMC Genomics in 2009 and with the introduction of this conference supplement, BMC Systems Biology to its journal choices for submitting authors. Co-hosting InCoB2013 with the second International Conference for Translational Bioinformatics (ICTBI) is in line with InCoB's support for the current trend in taking bioinformatics to the bedside, along with a systems approach to solving biological problems.
This editorial announces Algorithms for Molecular Biology, a new online open access journal published by BioMed Central. By launching the first open access journal on algorithmic bioinformatics, we provide a forum for fast publication of high-quality research articles in this rapidly evolving field. Our journal will publish thoroughly peer-reviewed papers without length limitations covering all aspects of algorithmic data analysis in computatioal biology. Publications in Algorithms for Molecular Biology are easy to find, highly visible and tracked by organisations such as PubMed. An established online submission system makes a fast reviewing procedure possible and enables us to publish accepted papers without delay. All articles published in our journal are permanently archived by PubMed Central and other scientific archives. We are looking forward to receiving your contributions.
Biology has a conceptual basis that allows one to build models and theorize across many life sciences, including medicine and medically-related disciplines. A dearth of good venues for publication has been perceived during a period when bioinformatics, systems analysis and biomathematics are burgeoning.
Steps have been taken to provide the sort of journal with a quick turnaround time for manuscripts which is online and freely accessible to all readers, whatever their persuasion or discipline. We have now been running for some time a journal which has had many good papers presented pre-launch, and a steady stream of papers thereafter. The value of this journal as a new venue has already been vindicated.
Within a short space of time, we have founded a state-of-the-art electronic journal freely accessible to all in a much sort-after interdisciplinary field that will be of benefit to the thinking life scientist, which must include medically qualified doctors as well as scientists who prefer to build their new hypotheses on basic principles and sound concepts underpinning biology. At the same time, these principles are not sacrosanct and require critical analysis. The journal promises to deliver many exciting ideas in the future.
Policies supporting the rapid and open sharing of proteomic data are being implemented by the leading journals in the field. The proteomics community is taking steps to ensure that data are made publicly accessible and are of high quality, a challenging task that requires the development and deployment of methods for measuring and documenting data quality metrics. On September 18, 2010, the U.S. National Cancer Institute (NCI) convened the “International Workshop on Proteomic Data Quality Metrics” in Sydney, Australia, to identify and address issues facing the development and use of such methods for open access proteomics data. The stakeholders at the workshop enumerated the key principles underlying a framework for data quality assessment in mass spectrometry data that will meet the needs of the research community, journals, funding agencies, and data repositories. Attendees discussed and agreed up on two primary needs for the wide use of quality metrics: (1) an evolving list of comprehensive quality metrics and (2) standards accompanied by software analytics. Attendees stressed the importance of increased education and training programs to promote reliable protocols in proteomics. This workshop report explores the historic precedents, key discussions, and necessary next steps to enhance the quality of open access data.
By agreement, this article is published simultaneously in the Journal of Proteome Research, Molecular and Cellular Proteomics, Proteomics, and Proteomics Clinical Applications as a public service to the research community. The peer review process was a coordinated effort conducted by a panel of referees selected by the journals.
selected reaction monitoring; bioinformatics; data quality; metrics; open access; Amsterdam Principles; standards
Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.
In today’s proteomics research, various techniques and instrumentation bioinformatics tools are necessary to manage the large amount of heterogeneous data with an automatic quality control to produce reliable and comparable results. Therefore a data-processing pipeline is mandatory for data validation and comparison in a data-warehousing system. The proteome bioinformatics platform ProteinScape has been proven to cover these needs. The reprocessing of HUPO BPP participants’ MS data was done within ProteinScape. The reprocessed information was transferred into the global data repository PRIDE.
ProteinScape as a data-warehousing system covers two main aspects: archiving relevant data of the proteomics workflow and information extraction functionality (protein identification, quantification and generation of biological knowledge). As a strategy for automatic data validation, different protein search engines are integrated. Result analysis is performed using a decoy database search strategy, which allows the measurement of the false-positive identification rate. Peptide identifications across different workflows, different MS techniques, and different search engines are merged to obtain a quality-controlled protein list.
The proteomics identifications database (PRIDE), as a public data repository, is an archiving system where data are finally stored and no longer changed by further processing steps. Data submission to PRIDE is open to proteomics laboratories generating protein and peptide identifications. An export tool has been developed for transferring all relevant HUPO BPP data from ProteinScape into PRIDE using the PRIDE.xml format.
The EU-funded ProDac project will coordinate the development of software tools covering international standards for the representation of proteomics data. The implementation of data submission pipelines and systematic data collection in public standards–compliant repositories will cover all aspects, from the generation of MS data in each laboratory to the conversion of all the annotating information and identifications to a standardized format. Such datasets can be used in the course of publishing in scientific journals.
Since 2010, the European Molecular Biology Laboratory's (EMBL) Heidelberg laboratory and the European Bioinformatics Institute (EMBL-EBI) have jointly run bioinformatics training courses developed specifically for secondary school science teachers within Europe and EMBL member states. These courses focus on introducing bioinformatics, databases, and data-intensive biology, allowing participants to explore resources and providing classroom-ready materials to support them in sharing this new knowledge with their students.
In this article, we chart our progress made in creating and running three bioinformatics training courses, including how the course resources are received by participants and how these, and bioinformatics in general, are subsequently used in the classroom. We assess the strengths and challenges of our approach, and share what we have learned through our interactions with European science teachers.
In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suitable to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit with commonly-used accuracy measures (e.g. sensitivity, PPV, MCC and F-score) as well as it can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. Not only the concept presented in this paper gives a useful framework to design MEA-based estimators but also it is highly extendable and sheds new light on many problems in bioinformatics.
Flow cytometry bioinformatics is the application of bioinformatics to flow cytometry data, which involves storing, retrieving, organizing, and analyzing flow cytometry data using extensive computational resources and tools. Flow cytometry bioinformatics requires extensive use of and contributes to the development of techniques from computational statistics and machine learning. Flow cytometry and related methods allow the quantification of multiple independent biomarkers on large numbers of single cells. The rapid growth in the multidimensionality and throughput of flow cytometry data, particularly in the 2000s, has led to the creation of a variety of computational analysis methods, data standards, and public databases for the sharing of results. Computational methods exist to assist in the preprocessing of flow cytometry data, identifying cell populations within it, matching those cell populations across samples, and performing diagnosis and discovery using the results of previous steps. For preprocessing, this includes compensating for spectral overlap, transforming data onto scales conducive to visualization and analysis, assessing data for quality, and normalizing data across samples and experiments. For population identification, tools are available to aid traditional manual identification of populations in two-dimensional scatter plots (gating), to use dimensionality reduction to aid gating, and to find populations automatically in higher dimensional space in a variety of ways. It is also possible to characterize data in more comprehensive ways, such as the density-guided binary space partitioning technique known as probability binning, or by combinatorial gating. Finally, diagnosis using flow cytometry data can be aided by supervised learning techniques, and discovery of new cell types of biological importance by high-throughput statistical methods, as part of pipelines incorporating all of the aforementioned methods.
Open standards, data, and software are also key parts of flow cytometry bioinformatics. Data standards include the widely adopted Flow Cytometry Standard (FCS) defining how data from cytometers should be stored, but also several new standards under development by the International Society for Advancement of Cytometry (ISAC) to aid in storing more detailed information about experimental design and analytical steps. Open data is slowly growing with the opening of the CytoBank database in 2010 and FlowRepository in 2012, both of which allow users to freely distribute their data, and the latter of which has been recommended as the preferred repository for MIFlowCyt-compliant data by ISAC. Open software is most widely available in the form of a suite of Bioconductor packages, but is also available for web execution on the GenePattern platform.
The interdisciplinary nature of bioinformatics makes it an ideal framework to develop activities enabling enquiry-based learning. We describe here the development and implementation of a pilot project to use bioinformatics-based research activities in high schools, called “Bioinformatics@school.” It includes web-based research projects that students can pursue alone or under teacher supervision and a teacher training program. The project is organized so as to enable discussion of key results between students and teachers. After successful trials in two high schools, as measured by questionnaires, interviews, and assessment of knowledge acquisition, the project is expanding by the action of the teachers involved, who are helping us develop more content and are recruiting more teachers and schools.
The aims of this study were to develop bioinformatics tools to explore ultra-deep pyrosequencing (UDPS) data, to test these tools, and to use them to determine the optimum error threshold, and to compare results from UDPS and cloning based sequencing (CBS).
Four serum samples, infected with either genotype D or E, from HBeAg-positive and HBeAg-negative patients were randomly selected. UDPS and CBS were used to sequence the basic core promoter/precore region of HBV. Two online bioinformatics tools, the “Deep Threshold Tool” and the “Rosetta Tool” (http://hvdr.bioinf.wits.ac.za/tools/), were built to test and analyze the generated data.
A total of 10952 reads were generated by UDPS on the 454 GS Junior platform. In the four samples, substitutions, detected at 0.5% threshold or above, were identified at 39 unique positions, 25 of which were non-synonymous mutations. Sample #2 (HBeAg-negative, genotype D) had substitutions in 26 positions, followed by sample #1 (HBeAg-negative, genotype E) in 12 positions, sample #3 (HBeAg-positive, genotype D) in 7 positions and sample #4 (HBeAg-positive, genotype E) in only four positions. The ratio of nucleotide substitutions between isolates from HBeAg-negative and HBeAg-positive patients was 3.5∶1. Compared to genotype E isolates, genotype D isolates showed greater variation in the X, basic core promoter/precore and core regions. Only 18 of the 39 positions identified by UDPS were detected by CBS, which detected 14 of the 25 non-synonymous mutations detected by UDPS.
UDPS data should be approached with caution. Appropriate curation of read data is required prior to analysis, in order to clean the data and eliminate artefacts. CBS detected fewer than 50% of the substitutions detected by UDPS. Furthermore it is important that the appropriate consensus (reference) sequence is used in order to identify variants correctly.
Ab initio assembly of transcriptome sequencing data has been widely used to identify large intergenic non-coding RNAs (lincRNAs), a novel class of gene regulators involved in many biological processes. To differentiate real lincRNA transcripts from thousands of assembly artifacts, a series of filtering steps such as filters of transcript length, expression level and coding potential, need to be applied. However, an easy-to-use and publicly available bioinformatics pipeline that integrates these filters is not yet available. Hence, we implemented sebnif, an integrative bioinformatics pipeline to facilitate the discovery of bona fide novel lincRNAs that are suitable for further functional characterization. Specifically, sebnif is the only pipeline that implements an algorithm for identifying high-quality single-exonic lincRNAs that were often omitted in many studies. To demonstrate the usage of sebnif, we applied it on a real biological RNA-seq dataset from Human Skeletal Muscle Cells (HSkMC) and built a novel lincRNA catalog containing 917 highly reliable lincRNAs. Sebnif is available at http://sunlab.lihs.cuhk.edu.hk/sebnif/.