Search tips
Search criteria

Results 1-5 (5)

Clipboard (0)

Select a Filter Below

more »
Year of Publication
Document Types
author:("soudan, assam")
1.  Finding and identifying the viral needle in the metagenomic haystack: trends and challenges 
Collectively, viruses have the greatest genetic diversity on Earth, occupy extremely varied niches and are likely able to infect all living organisms. Viral infections are an important issue for human health and cause considerable economic losses when agriculturally important crops or husbandry animals are infected. The advent of metagenomics has provided a precious tool to study viruses by sampling them in natural environments and identifying the genomic composition of a sample. However, reaching a clear recognition and taxonomic assignment of the identified viruses has been hampered by the computational difficulty of these problems. In this perspective paper we examine the trends in current research for the identification of viral sequences in a metagenomic sample, pinpoint the intrinsic computational difficulties for the identification of novel viral sequences within metagenomic samples, and suggest possible avenues to overcome them.
PMCID: PMC4285800  PMID: 25610431
microbial metagenomics; NGS; virome; host—pathogen interactions; taxonomic assignment
2.  Timely deposition of macromolecular structures is necessary for peer review 
Deposition of crystallographic structures should be concurrent with or prior to manuscript submission for peer review, enabling validation and increasing reliability of the PDB.
Most of the macromolecular structures in the Protein Data Bank (PDB), which are used daily by thousands of educators and scientists alike, are determined by X-ray crystallography. It was examined whether the crystallographic models and data were deposited to the PDB at the same time as the publications that describe them were submitted for peer review. This condition is necessary to ensure pre-publication validation and the quality of the PDB public archive. It was found that a significant proportion of PDB entries were submitted to the PDB after peer review of the corresponding publication started, and many were only submitted after peer review had ended. It is argued that clear description of journal policies and effective policing is important for pre-publication validation, which is key in ensuring the quality of the PDB and of peer-reviewed literature.
PMCID: PMC3852646  PMID: 24311569
Protein Data Bank; deposition; validation
3.  Finishing bacterial genome assemblies with Mix 
BMC Bioinformatics  2013;14(Suppl 15):S16.
Among challenges that hamper reaping the benefits of genome assembly are both unfinished assemblies and the ensuing experimental costs. First, numerous software solutions for genome de novo assembly are available, each having its advantages and drawbacks, without clear guidelines as to how to choose among them. Second, these solutions produce draft assemblies that often require a resource intensive finishing phase.
In this paper we address these two aspects by developing Mix , a tool that mixes two or more draft assemblies, without relying on a reference genome and having the goal to reduce contig fragmentation and thus speed-up genome finishing. The proposed algorithm builds an extension graph where vertices represent extremities of contigs and edges represent existing alignments between these extremities. These alignment edges are used for contig extension. The resulting output assembly corresponds to a set of paths in the extension graph that maximizes the cumulative contig length.
We evaluate the performance of Mix on bacterial NGS data from the GAGE-B study and apply it to newly sequenced Mycoplasma genomes. Resulting final assemblies demonstrate a significant improvement in the overall assembly quality. In particular, Mix is consistent by providing better overall quality results even when the choice is guided solely by standard assembly statistics, as is the case for de novo projects.
Mix is implemented in Python and is available at, novel data for our Mycoplasma study is available at
PMCID: PMC3851838  PMID: 24564706
4.  Flux Analysis of the Trypanosoma brucei Glycolysis Based on a Multiobjective-Criteria Bioinformatic Approach 
Advances in Bioinformatics  2012;2012:159423.
Trypanosoma brucei is a protozoan parasite of major of interest in discovering new genes for drug targets. This parasite alternates its life cycle between the mammal host(s) (bloodstream form) and the insect vector (procyclic form), with two divergent glucose metabolism amenable to in vitro culture. While the metabolic network of the bloodstream forms has been well characterized, the flux distribution between the different branches of the glucose metabolic network in the procyclic form has not been addressed so far. We present a computational analysis (called Metaboflux) that exploits the metabolic topology of the procyclic form, and allows the incorporation of multipurpose experimental data to increase the biological relevance of the model. The alternatives resulting from the structural complexity of networks are formulated as an optimization problem solved by a metaheuristic where experimental data are modeled in a multiobjective function. Our results show that the current metabolic model is in agreement with experimental data and confirms the observed high metabolic flexibility of glucose metabolism. In addition, Metaboflux offers a rational explanation for the high flexibility in the ratio between final products from glucose metabolism, thsat is, flux redistribution through the malic enzyme steps.
PMCID: PMC3477527  PMID: 23097667
5.  Identification of conserved gene clusters in multiple genomes based on synteny and homology 
BMC Bioinformatics  2011;12(Suppl 9):S18.
Uncovering the relationship between the conserved chromosomal segments and the functional relatedness of elements within these segments is an important question in computational genomics. We build upon the series of works on gene teams and homology teams.
Our primary contribution is a local sliding-window SYNS (SYNtenic teamS) algorithm that refines an existing family structure into orthologous sub-families by analyzing the neighborhoods around the members of a given family with a locally sliding window. The neighborhood analysis is done by computing conserved gene clusters. We evaluate our algorithm on the existing homologous families from the Genolevures database over five genomes of the Hemyascomycete phylum.
The result is an efficient algorithm that works on multiple genomes, considers paralogous copies of genes and is able to uncover orthologous clusters even in distant genomes. Resulting orthologous clusters are comparable to those obtained by manual curation.
PMCID: PMC3283307  PMID: 22151970

Results 1-5 (5)