Human Heredity and Health in Africa (H3Africa) research seeks to promote fair collaboration between scientists in Africa and those from elsewhere. Here, we outline how concerns over inequality and exploitation led to a policy framework that places a firm focus on African leadership and capacity building as guiding principles for African genomics research.
Africa; genomics; ethics; fairness; H3Africa
In recent years, high-throughput technologies have brought big data to the life sciences. The march of progress has been rapid, leaving in its wake a demand for courses in data analysis, data stewardship, computing fundamentals, etc., a need that universities have not yet been able to satisfy—paradoxically, many are actually closing “niche” bioinformatics courses at a time of critical need. The impact of this is being felt across continents, as many students and early-stage researchers are being left without appropriate skills to manage, analyse, and interpret their data with confidence. This situation has galvanised a group of scientists to address the problems on an international scale. For the first time, bioinformatics educators and trainers across the globe have come together to address common needs, rising above institutional and international boundaries to cooperate in sharing bioinformatics training expertise, experience, and resources, aiming to put ad hoc training practices on a more professional footing for the benefit of all.
We report a study of genome-wide, dense SNP (∼900K) and copy number polymorphism data of indigenous southern Africans. We demonstrate the genetic contribution to southern and eastern African populations, which involved admixture between indigenous San, Niger-Congo-speaking and populations of Eurasian ancestry. This finding illustrates the need to account for stratification in genome-wide association studies, and that admixture mapping would likely be a successful approach in these populations. We developed a strategy to detect the signature of selection prior to and following putative admixture events. Several genomic regions show an unusual excess of Niger-Kordofanian, and unusual deficiency of both San and Eurasian ancestry, which were considered the footprints of selection after population admixture. Several SNPs with strong allele frequency differences were observed predominantly between the admixed indigenous southern African populations, and their ancestral Eurasian populations. Interestingly, many candidate genes, which were identified within the genomic regions showing signals for selection, were associated with southern African-specific high-risk, mostly communicable diseases, such as malaria, influenza, tuberculosis, and human immunodeficiency virus/AIDs. This observation suggests a potentially important role that these genes might have played in adapting to the environment. Additionally, our analyses of haplotype structure, linkage disequilibrium, recombination, copy number variation and genome-wide admixture highlight, and support the unique position of San relative to both African and non-African populations. This study contributes to a better understanding of population ancestry and selection in south-eastern African populations; and the data and results obtained will support research into the genetic contributions to infectious as well as non-communicable diseases in the region.
Genome-wide analysis of human populations is useful in shedding light on the evolutionary history of the human genome, with a wide range of applications from reconstructing past associations between different population histories to disease mapping. In this manuscript we report on the application of genome-wide data to southern African populations and the identification of genome-wide signatures of selection pre- and post-admixture. Several signals of selection, before and after admixture, were identified, some of which involved loci associated with human diseases, including malaria, influenza, tuberculosis and HIV/AIDS. These results may reflect adaptations of southern African populations to infectious diseases. Consistent with previous studies, this study highlights the significance of the San in the genetics of human populations, as they are distinct from the other populations in many respects i.e. haplotype structure, locations of recombination hotspots, copy number and population structure. Furthermore, our study demonstrates the admixture of the San, Bantu-speaking populations and populations of Eurasian ancestry in some of the southern and eastern African populations. It illustrates the value in correcting for this stratification in future genome-wide association studies, and suggests that a future admixture mapping in these populations would likely be warranted and successful.
“Scientific community” refers to a group of people collaborating together on scientific-research-related activities who also share common goals, interests, and values. Such communities play a key role in many bioinformatics activities. Communities may be linked to a specific location or institute, or involve people working at many different institutions and locations. Education and training is typically an important component of these communities, providing a valuable context in which to develop skills and expertise, while also strengthening links and relationships within the community. Scientific communities facilitate: (i) the exchange and development of ideas and expertise; (ii) career development; (iii) coordinated funding activities; (iv) interactions and engagement with professionals from other fields; and (v) other activities beneficial to individual participants, communities, and the scientific field as a whole. It is thus beneficial at many different levels to understand the general features of successful, high-impact bioinformatics communities; how individual participants can contribute to the success of these communities; and the role of education and training within these communities. We present here a quick guide to building and maintaining a successful, high-impact bioinformatics community, along with an overview of the general benefits of participating in such communities. This article grew out of contributions made by organizers, presenters, panelists, and other participants of the ISMB/ECCB 2013 workshop “The ‘How To Guide’ for Establishing a Successful Bioinformatics Network” at the 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and the 12th European Conference on Computational Biology (ECCB).
The current increase in Gene Ontology (GO) annotations of proteins in the existing genome databases and their use in different analyses have fostered the improvement of several biomedical and biological applications. To integrate this functional data into different analyses, several protein functional similarity measures based on GO term information content (IC) have been proposed and evaluated, especially in the context of annotation-based measures. In the case of topology-based measures, each approach was set with a specific functional similarity measure depending on its conception and applications for which it was designed. However, it is not clear whether a specific functional similarity measure associated with a given approach is the most appropriate, given a biological data set or an application, i.e., achieving the best performance compared to other functional similarity measures for the biological application under consideration. We show that, in general, a specific functional similarity measure often used with a given term IC or term semantic similarity approach is not always the best for different biological data and applications. We have conducted a performance evaluation of a number of different functional similarity measures using different types of biological data in order to infer the best functional similarity measure for each different term IC and semantic similarity approach. The comparisons of different protein functional similarity measures should help researchers choose the most appropriate measure for the biological application under consideration.
Infectious diseases are the leading cause of death, particularly in developing countries. Although many drugs are available for treating the most common infectious diseases, in many cases the mechanism of action of these drugs or even their targets in the pathogen remain unknown. In addition, the key factors or processes in pathogens that facilitate infection and disease progression are often not well understood. Since proteins do not work in isolation, understanding biological systems requires a better understanding of the interconnectivity between proteins in different pathways and processes, which includes both physical and other functional interactions. Such biological networks can be generated within organisms or between organisms sharing a common environment using experimental data and computational predictions. Though different data sources provide different levels of accuracy, confidence in interactions can be measured using interaction scores. Connections between interacting proteins in biological networks can be represented as graphs and edges, and thus studied using existing algorithms and tools from graph theory. There are many different applications of biological networks, and here we discuss three such applications, specifically applied to the infectious disease tuberculosis, with its causative agent Mycobacterium tuberculosis and host, Homo sapiens. The applications include the use of the networks for function prediction, comparison of networks for evolutionary studies, and the generation and use of host–pathogen interaction networks.
Biological networks; Tuberculosis; Pathogen; Evolution; Protein–protein interaction
With the advancement of new high throughput sequencing technologies, there has been an increase in the number of genome sequencing projects worldwide, which has yielded complete genome sequences of human, animals and plants. Subsequently, several labs have focused on genome annotation, consisting of assigning functions to gene products, mostly using Gene Ontology (GO) terms. As a consequence, there is an increased heterogeneity in annotations across genomes due to different approaches used by different pipelines to infer these annotations and also due to the nature of the GO structure itself. This makes a curator's task difficult, even if they adhere to the established guidelines for assessing these protein annotations. Here we develop a genome-scale approach for integrating GO annotations from different pipelines using semantic similarity measures. We used this approach to identify inconsistencies and similarities in functional annotations between orthologs of human and Drosophila melanogaster, to assess the quality of GO annotations derived from InterPro2GO mappings compared to manually annotated GO annotations for the Drosophila melanogaster proteome from a FlyBase dataset and human, and to filter GO annotation data for these proteomes. Results obtained indicate that an efficient integration of GO annotations eliminates redundancy up to 27.08 and 22.32% in the Drosophila melanogaster and human GO annotation datasets, respectively. Furthermore, we identified lack of and missing annotations for some orthologs, and annotation mismatches between InterPro2GO and manual pipelines in these two proteomes, thus requiring further curation. This simplifies and facilitates tasks of curators in assessing protein annotations, reduces redundancy and eliminates inconsistencies in large annotation datasets for ease of comparative functional genomics.
functional annotation; Gene Ontology annotation; annotation pipeline; manual annotation; electronic annotation
The discipline of bioinformatics has developed rapidly since the complete sequencing of the first genomes in the 1990s. The development of many high-throughput techniques during the last decades has ensured that bioinformatics has grown into a discipline that overlaps with, and is required for, the modern practice of virtually every field in the life sciences. This has placed a scientific premium on the availability of skilled bioinformaticians, a qualification that is extremely scarce on the African continent. The reasons for this are numerous, although the absence of a skilled bioinformatician at academic institutions to initiate a training process and build sustained capacity seems to be a common African shortcoming. This dearth of bioinformatics expertise has had a knock-on effect on the establishment of many modern high-throughput projects at African institutes, including the comprehensive and systematic analysis of genomes from African populations, which are among the most genetically diverse anywhere on the planet. Recent funding initiatives from the National Institutes of Health and the Wellcome Trust are aimed at ameliorating this shortcoming. In this paper, we discuss the problems that have limited the establishment of the bioinformatics field in Africa, as well as propose specific actions that will help with the education and training of bioinformaticians on the continent. This is an absolute requirement in anticipation of a boom in high-throughput approaches to human health issues unique to data from African populations.
bioinformatics education; bioinformatics in Africa; postgraduate program
Population differentiation is the result of demographic and evolutionary forces. Whole genome datasets from the 1000 Genomes Project (October 2012) provide an unbiased view of genetic variation across populations from Europe, Asia, Africa and the Americas. Common population-specific SNPs (MAF > 0.05) reflect a deep history and may have important consequences for health and wellbeing. Their interpretation is contextualised by currently available genome data.
The identification of common population-specific (CPS) variants (SNPs and SSV) is influenced by admixture and the sample size under investigation. Nine of the populations in the 1000 Genomes Project (2 African, 2 Asian (including a merged Chinese group) and 5 European) revealed that the African populations (LWK and YRI), followed by the Japanese (JPT) have the highest number of CPS SNPs, in concordance with their histories and given the populations studied. Using two methods, sliding 50-SNP and 5-kb windows, the CPS SNPs showed distinct clustering across large genome segments and little overlap of clusters between populations. iHS enrichment score and the population branch statistic (PBS) analyses suggest that selective sweeps are unlikely to account for the clustering and population specificity. Of interest is the association of clusters close to recombination hotspots. Functional analysis of genes associated with the CPS SNPs revealed over-representation of genes in pathways associated with neuronal development, including axonal guidance signalling and CREB signalling in neurones.
Common population-specific SNPs are non-randomly distributed throughout the genome and are significantly associated with recombination hotspots. Since the variant alleles of most CPS SNPs are the derived allele, they likely arose in the specific population after a split from a common ancestor. Their proximity to genes involved in specific pathways, including neuronal development, suggests evolutionary plasticity of selected genomic regions. Contrary to expectation, selective sweeps did not play a large role in the persistence of population-specific variation. This suggests a stochastic process towards population-specific variation which reflects demographic histories and may have some interesting implications for health and susceptibility to disease.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-437) contains supplementary material, which is available to authorized users.
Interaction between proteins is one of the most important mechanisms in the execution of cellular functions. The study of these interactions has provided insight into the functioning of an organism’s processes. As of October 2013, Homo sapiens had over 170000 Protein-Protein interactions (PPI) registered in the Interologous Interaction Database, which is only one of the many public resources where protein interactions can be accessed. These numbers exemplify the volume of data that research on the topic has generated. Visualization of large data sets is a well known strategy to make sense of information, and protein interaction data is no exception. There are several tools that allow the exploration of this data, providing different methods to visualize protein network interactions. However, there is still no native web tool that allows this data to be explored interactively online.
Given the advances that web technologies have made recently it is time to bring these interactive views to the web to provide an easily accessible forum to visualize PPI. We have created a Web-based Protein Interaction Network Visualizer: PINV, an open source, native web application that facilitates the visualization of protein interactions (http://biosual.cbio.uct.ac.za/pinv.html). We developed PINV as a set of components that follow the protocol defined in BioJS and use the D3 library to create the graphic layouts. We demonstrate the use of PINV with multi-organism interaction networks for a predicted target from Mycobacterium tuberculosis, its interacting partners and its orthologs.
The resultant tool provides an attractive view of complex, fully interactive networks with components that allow the querying, filtering and manipulation of the visible subset. Moreover, as a web resource, PINV simplifies sharing and publishing, activities which are vital in today’s research collaborative environments. The source code is freely available for download at https://github.com/4ndr01d3/biosual.
Visualization; Protein-Protein Interactions; PPI; Web development
Summary: We present two web-based components for the display of Protein-Protein Interaction networks using different self-organizing layout methods: force-directed and circular. These components conform to the BioJS standard and can be rendered in an HTML5-compliant browser without the need for third-party plugins. We provide examples of interaction networks and how the components can be used to visualize them, and refer to a more complex tool that uses these components.
The use of Gene Ontology (GO) data in protein analyses have largely contributed to
the improved outcomes of these analyses. Several GO semantic similarity measures
have been proposed in recent years and provide tools that allow the integration of
biological knowledge embedded in the GO structure into different biological
analyses. There is a need for a unified tool that provides the scientific
community with the opportunity to explore these different GO similarity measure
approaches and their biological applications.
We have developed DaGO-Fun, an online tool available at
http://web.cbio.uct.ac.za/ITGOM, which incorporates many different
GO similarity measures for exploring, analyzing and comparing GO terms and
proteins within the context of GO. It uses GO data and UniProt proteins with their
GO annotations as provided by the Gene Ontology Annotation (GOA) project to
precompute GO term information content (IC), enabling rapid response to user
The DaGO-Fun online tool presents the advantage of integrating all the relevant
IC-based GO similarity measures, including topology- and annotation-based
approaches to facilitate effective exploration of these measures, thus enabling
users to choose the most relevant approach for their application. Furthermore,
this tool includes several biological applications related to GO semantic
similarity scores, including the retrieval of genes based on their GO annotations,
the clustering of functionally related genes within a set, and term enrichment
Admixed populations can make an important contribution to the discovery of disease susceptibility genes if the parental populations exhibit substantial variation in susceptibility. Admixture mapping has been used successfully, but is not designed to cope with populations that have more than two or three ancestral populations. The inference of admixture proportions and local ancestry and the imputation of missing genotypes in admixed populations are crucial in both understanding variation in disease and identifying novel disease loci. These inferences make use of reference populations, and accuracy depends on the choice of ancestral populations. Using an insufficient or inaccurate ancestral panel can result in erroneously inferred ancestry and affect the detection power of GWAS and meta-analysis when using imputation. Current algorithms are inadequate for multi-way admixed populations. To address these challenges we developed PROXYANC, an approach to select the best proxy ancestral populations. From the simulation of a multi-way admixed population we demonstrate the capability and accuracy of PROXYANC and illustrate the importance of the choice of ancestry in both estimating admixture proportions and imputing missing genotypes. We applied this approach to a complex, uniquely admixed South African population. Using genome-wide SNP data from over 764 individuals, we accurately estimate the genetic contributions from the best ancestral populations: isiXhosa , ‡Khomani SAN , European , Indian , and Chinese . We also demonstrate that the ancestral allele frequency differences correlate with increased linkage disequilibrium in the South African population, which originates from admixture events rather than population bottlenecks.
The collective term for people of mixed ancestry in southern Africa is “Coloured,” and this is officially recognized in South Africa as a census term, and for self-classification. Whilst we acknowledge that some cultures may use this term in a derogatory manner, these connotations are not present in South Africa, and are certainly not intended here.
Several approaches have been proposed for computing
term information content (IC) and semantic similarity scores
within the gene ontology (GO) directed acyclic graph (DAG).
These approaches contributed to improving protein analyses at
the functional level. Considering the recent proliferation of these
approaches, a unified theory in a well-defined mathematical
framework is necessary in order to provide a theoretical basis
for validating these approaches. We review the existing IC-based
ontological similarity approaches developed in the context
of biomedical and bioinformatics fields to propose a general
framework and unified description of all these measures. We
have conducted an experimental evaluation to assess the impact
of IC approaches, different normalization models, and correction
factors on the performance of a functional similarity metric.
Results reveal that considering only parents or only children of
terms when assessing information content or semantic similarity
scores negatively impacts the approach under consideration.
This study produces a unified framework for current and future
GO semantic similarity measures and provides theoretical basics
for comparing different approaches. The experimental evaluation
of different approaches based on different term information
content models paves the way towards a solution to the issue of scoring a term's specificity in the GO DAG.
The outcome of infection by Mycobacterium tuberculosis (Mtb) depends greatly on how the host responds to the bacteria and how the bacteria manipulates the host, which is facilitated by protein–protein interactions. Thus, to understand this process, there is a need for elucidating protein interactions between human and Mtb, which may enable us to characterize specific molecular mechanisms allowing the bacteria to persist and survive under different environmental conditions. In this work, we used the interologs method based on experimentally verified intra-species and inter-species interactions to predict human-Mtb functional interactions. These interactions were further filtered using known human-Mtb interactions and genes that are differentially expressed during infection, producing 190 interactions. Further analysis of the subcellular location of proteins involved in these human-Mtb interactions confirms feasibility of these interactions. We also conducted functional analysis of human and Mtb proteins involved in these interactions, checking whether these proteins play a role in infection and/or disease, and enriching Mtb proteins in a previously predicted list of drug targets. We found that the biological processes of the human interacting proteins suggested their involvement in apoptosis and production of nitric oxide, whereas those of the Mtb interacting proteins were relevant to the intracellular environment of Mtb in the host. Mapping these proteins onto KEGG pathways highlighted proteins belonging to the tuberculosis pathway and also suggested that Mtb proteins might use the host to acquire nutrients, which is in agreement with the intracellular lifestyle of Mtb. This indicates that these interactions can shed light on the interplay between Mtb and its human host and thus, contribute to the process of designing novel drugs with new biological mechanisms of action.
The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.
bioinformatics; training; bioinformatics courses; training life scientists; train the trainers
Summary: We present iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. iAnn allows automatic visualisation and integration of customised event reports. A central repository lies at the core of the platform: curators add submitted events, and these are subsequently accessed via web services. Thus, once an iAnn widget is incorporated into a website, it permanently shows timely relevant information as if it were native to the remote site. At the same time, announcements submitted to the repository are automatically disseminated to all portals that query the system. To facilitate the visualization of announcements, iAnn provides powerful filtering options and views, integrated in Google Maps and Google Calendar. All iAnn widgets are freely available.
Measles virus (MV) causes T cell suppression by interference with phosphatidylinositol-3-kinase (PI3K) activation. We previously found that this interference affected the activity of splice regulatory proteins and a T cell inhibitory protein isoform was produced from an alternatively spliced pre-mRNA.
Differentially regulated and alternatively splice variant transcripts accumulating in response to PI3K abrogation in T cells potentially encode proteins involved in T cell silencing.
To test this hypothesis at the cellular level, we performed a Human Exon 1.0 ST Array on RNAs isolated from T cells stimulated only or stimulated after PI3K inhibition. We developed a simple algorithm based on a splicing index to detect genes that undergo alternative splicing (AS) or are differentially regulated (RG) upon T cell suppression.
Applying our algorithm to the data, 9% of the genes were assigned as AS, while only 3% were attributed to RG. Though there are overlaps, AS and RG genes differed with regard to functional regulation, and were found to be enriched in different functional groups. AS genes targeted extracellular matrix (ECM)-receptor interaction and focal adhesion pathways, while RG genes were mainly enriched in cytokine-receptor interaction and Jak-STAT. When combined, AS/RG dependent alterations targeted pathways essential for T cell receptor signaling, cytoskeletal dynamics and cell cycle entry.
PI3K abrogation interferes with key T cell activation processes through both differential expression and alternative splicing, which together actively contribute to T cell suppression.
Latent tuberculosis is a clinical syndrome that occurs after an individual has been exposed to the Mycobacterium tuberculosis (Mtb) Bacillus, the infection has been established and an immune response has been generated to control the pathogen and force it into a quiescent state. Mtb can exit this quiescent state where it is unresponsive to treatment and elusive to the immune response, and enter a rapid replicating state, hence causing infection reactivation. It remains a gray area to understand how the pathogen causes a persistent infection and it is unclear whether the organism will be in a slow replicating state or a dormant non-replicating state. The ability of the pathogen to adapt to changing host immune response mechanisms, in which it is exposed to hypoxia, low pH, nitric oxide (NO), nutrient starvation, and several other anti-microbial effectors, is associated with a high metabolic plasticity that enables it to metabolize under these different conditions. Adaptive gene regulatory mechanisms are thought to coordinate how the pathogen changes their metabolic pathways through mechanisms that sense changes in oxygen tension and other stress factors, hence stimulating the pathogen to make necessary adjustments to ensure survival. Here, we review studies that give insights into latency/dormancy regulatory mechanisms that enable infection persistence and pathogen adaptation to different stress conditions. We highlight what mathematical and computational models can do and what they should do to enhance our current understanding of TB latency.
Mycobacterium tuberculosis; latency and dormancy regulation; latency models; mathematical and computational modeling
A large number of diverse, complex, and distributed data resources are currently available in the Bioinformatics domain. The pace of discovery and the diversity of information means that centralised reference databases like UniProt and Ensembl cannot integrate all potentially relevant information sources. From a user perspective however, centralised access to all relevant information concerning a specific query is essential. The Distributed Annotation System (DAS) defines a communication protocol to exchange annotations on genomic and protein sequences; this standardisation enables clients to retrieve data from a myriad of sources, thus offering centralised access to end-users.
We introduce MyDas, a web server that facilitates the publishing of biological annotations according to the DAS specification. It deals with the common functionality requirements of making data available, while also providing an extension mechanism in order to implement the specifics of data store interaction. MyDas allows the user to define where the required information is located along with its structure, and is then responsible for the communication protocol details.
The wide coverage and biological relevance of the Gene Ontology (GO), confirmed through its successful use in protein function prediction, have led to the growth in its popularity. In order to exploit the extent of biological knowledge that GO offers in describing genes or groups of genes, there is a need for an efficient, scalable similarity measure for GO terms and GO-annotated proteins. While several GO similarity measures exist, none adequately addresses all issues surrounding the design and usage of the ontology. We introduce a new metric for measuring the distance between two GO terms using the intrinsic topology of the GO-DAG, thus enabling the measurement of functional similarities between proteins based on their GO annotations. We assess the performance of this metric using a ROC analysis on human protein-protein interaction datasets and correlation coefficient analysis on the selected set of protein pairs from the CESSM online tool. This metric achieves good performance compared to the existing annotation-based GO measures. We used this new metric to assess functional similarity between orthologues, and show that it is effective at determining whether orthologues are annotated with similar functions and identifying cases where annotation is inconsistent between orthologues.
InterPro amalgamates predictive protein signatures from a number of well-known partner databases into a single resource. To aid with interpretation of results, InterPro entries are manually annotated with terms from the Gene Ontology (GO). The InterPro2GO mappings are comprised of the cross-references between these two resources and are the largest source of GO annotation predictions for proteins. Here, we describe the protocol by which InterPro curators integrate GO terms into the InterPro database. We discuss the unique challenges involved in integrating specific GO terms with entries that may describe a diverse set of proteins, and we illustrate, with examples, how InterPro hierarchies reflect GO terms of increasing specificity. We describe a revised protocol for GO mapping that enables us to assign GO terms to domains based on the function of the individual domain, rather than the function of the families in which the domain is found. We also discuss how taxonomic constraints are dealt with and those cases where we are unable to add any appropriate GO terms. Expert manual annotation of InterPro entries with GO terms enables users to infer function, process or subcellular information for uncharacterized sequences based on sequence matches to predictive models.
http://www.ebi.ac.uk/interpro. The complete InterPro2GO mappings are available at: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/external2go/interpro2go
Technological developments in large-scale biological experiments, coupled with bioinformatics tools, have opened the doors to computational approaches for the global analysis of whole genomes. This has provided the opportunity to look at genes within their context in the cell. The integration of vast
amounts of data generated by these technologies provides a strategy for identifying potential drug targets
within microbial pathogens, the causative agents of infectious diseases. As proteins are druggable targets,
functional interaction networks between proteins are used to identify proteins essential to the survival,
growth, and virulence of these microbial pathogens. Here we have integrated functional genomics data to
generate functional interaction networks between Mycobacterium tuberculosis proteins and carried out computational analyses to dissect the functional interaction network produced for identifying drug targets
using network topological properties. This study has provided the opportunity to expand the range of potential drug targets and to move towards optimal target-based strategies.