
Results 1-12 (12)

1.  In Vitro Quantification of the Relative Packaging Efficiencies of Single-Stranded RNA Molecules by Viral Capsid Protein 
Journal of Virology  2012;86(22):12271-12282.
While most T=3 single-stranded RNA (ssRNA) viruses package in vivo about 3,000 nucleotides (nt), in vitro experiments have demonstrated that a broad range of RNA lengths can be packaged. Under the right solution conditions, for example, cowpea chlorotic mottle virus (CCMV) capsid protein (CP) has been shown to package RNA molecules whose lengths range from 100 to 10,000 nt. Furthermore, in each case it can package the RNA completely, as long as the mass ratio of CP to nucleic acid in the assembly mixture is 6:1 or higher. Yet the packaging efficiencies of the RNAs can differ widely, as we demonstrate by measurements in which two RNAs compete head-to-head for a limited amount of CP. We show that the relative efficiency depends nonmonotonically on the RNA length, with 3,200 nt being optimum for packaging by the T=3 capsids preferred by CCMV CP. When two RNAs of the same length—and hence the same charge—compete for CP, differences in packaging efficiency are necessarily due to differences in their secondary structures and/or three-dimensional (3D) sizes. For example, the heterologous RNA1 of brome mosaic virus (BMV) is packaged three times more efficiently by CCMV CP than is RNA1 of CCMV, even though the two RNAs have virtually identical lengths. Finally, we show that in an assembly mixture at neutral pH, CP binds reversibly to the RNA and there is a reversible equilibrium between all the various RNA/CP complexes. At acidic pH, excess protein unbinds from RNA/CP complexes and nucleocapsids form irreversibly.
PMCID: PMC3486494  PMID: 22951822
2.  Self-Assembly of Viral Capsid Protein and RNA Molecules of Different Sizes: Requirement for a Specific High Protein/RNA Mass Ratio 
Journal of Virology  2012;86(6):3318-3326.
Virus-like particles can be formed by self-assembly of capsid protein (CP) with RNA molecules of increasing length. If the protein “insisted” on a single radius of curvature, the capsids would be identical in size, independent of RNA length. However, there would be a limit to length of the RNA, and one would not expect RNA much shorter than native viral RNA to be packaged unless multiple copies were packaged. On the other hand, if the protein did not favor predetermined capsid size, one would expect the capsid diameter to increase with increase in RNA length. Here we examine the self-assembly of CP from cowpea chlorotic mottle virus with RNA molecules ranging in length from 140 to 12,000 nucleotides (nt). Each of these RNAs is completely packaged if and only if the protein/RNA mass ratio is sufficiently high; this critical value is the same for all of the RNAs and corresponds to equal RNA and N-terminal-protein charges in the assembly mix. For RNAs much shorter in length than the 3,000 nt of the viral RNA, two or more molecules are assembled into 24- and 26-nm-diameter capsids, whereas for much longer RNAs (>4,500 nt), a single RNA molecule is shared/packaged by two or more capsids with diameters as large as 30 nm. For intermediate lengths, a single RNA is assembled into 26-nm-diameter capsids, the size associated with T=3 wild-type virus. The significance of these assembly results is discussed in relation to likely factors that maintain T=3 symmetry in vivo.
PMCID: PMC3302347  PMID: 22205731
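The charge-matching criterion in the abstract above can be checked with a short calculation. The sketch below is illustrative only: the capsid protein mass (about 20 kDa), the number of positive charges on its N-terminus (about 10), and the average nucleotide mass (about 330 Da) are assumed round values, not figures taken from the abstract, and the function name is hypothetical. If each nucleotide carries one negative charge, charge equality requires one CP per `cp_positive_charges` nucleotides, so the protein/RNA mass ratio is the CP mass divided by the mass of that many nucleotides.

```python
def charge_matched_mass_ratio(cp_mass_da, cp_positive_charges, nt_mass_da=330.0):
    """Protein/RNA mass ratio at which total CP N-terminal positive charge
    equals total RNA phosphate charge (one negative charge per nucleotide).

    All inputs are assumed illustrative values, not measurements.
    """
    if cp_mass_da <= 0 or cp_positive_charges <= 0 or nt_mass_da <= 0:
        raise ValueError("masses and charge count must be positive")
    # One CP neutralizes `cp_positive_charges` nucleotides, so the ratio is
    # independent of RNA length.
    return cp_mass_da / (cp_positive_charges * nt_mass_da)


# Assumed values: ~20 kDa CP, ~10 N-terminal positive charges, ~330 Da per nt.
ratio = charge_matched_mass_ratio(20_000.0, 10)
print(f"charge-matched CP/RNA mass ratio: {ratio:.2f}")
```

Under these assumptions the ratio is close to 6, and because RNA length cancels out of the expression, the same critical value applies to every RNA, as the abstract reports.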
3.  Salt-Dependent DNA-DNA Spacings in Intact Bacteriophage λ Reflect Relative Importance of DNA Self-Repulsion and Bending Energies 
Physical review letters  2011;106(2):028102.
Using solution synchrotron X-ray scattering, we measure the variation of DNA-DNA d-spacings in bacteriophage λ with mono-, di- and poly-valent salt concentrations, for wild-type (48.5 kbp) and short-genome-mutant (37.8 kbp) strains. From the decrease in d-spacings with increasing salt, we deduce the relative contributions of DNA self-repulsion and bending to the energetics of packaged phage genomes. We quantify the DNA-DNA interaction energies within the intact phage by combining the measured d-spacings in the capsid with measurements of osmotic pressure in DNA assemblies under the same salt conditions in bulk solution. In the commonly used Tris-Mg buffer, the DNA-DNA interaction energies inside the phage capsids are shown to be about 1 kT/base pair, an order of magnitude larger than the bending energies.
PMCID: PMC3420006  PMID: 21405253
4.  Automatic categorization of diverse experimental information in the bioscience literature 
BMC Bioinformatics  2012;13:16.
Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify from all published literature the papers that contain results for a specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest and thus is usually time-consuming. We developed an automatic method for identifying papers containing these curation data types among a large pool of published scientific papers based on the machine learning method Support Vector Machine (SVM). This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in use in production for automatic categorization of 10 different experimental data types in the biocuration process at WormBase for the past two years, and it is in the process of being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community, thereby greatly reducing the time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure to utilize training papers of similar data types from different bodies of literature such as C. elegans and D. melanogaster to identify papers with any of these data types for a single database. This approach has great significance because for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance.
We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genome Informatics (MGI). It is being used in the curation workflow at WormBase for automatic association of newly published papers with ten data types including RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction.
Our methods are applicable to a variety of data types with training sets containing several hundred to a few thousand documents. The approach is completely automatic and thus can be readily incorporated into different workflows at different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort.
PMCID: PMC3305665  PMID: 22280404
5.  Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures 
Nature  2007;450(7167):219-232.
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.
PMCID: PMC2474711  PMID: 17994088
6.  Comparative Genomics of the Eukaryotes 
Science (New York, N.Y.)  2000;287(5461):2204-2215.
A comparative analysis of the genomes of Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae—and the proteins they are predicted to encode—was undertaken in the context of cellular, developmental, and evolutionary processes. The nonredundant protein sets of flies and worms are similar in size and are only twice that of yeast, but different gene families are expanded in each genome, and the multidomain proteins and signaling pathways of the fly and worm are far more complex than those of yeast. The fly has orthologs to 177 of the 289 human disease genes examined and provides the foundation for rapid analysis of some of the basic processes involved in human disease.
PMCID: PMC2754258  PMID: 10731134
7.  VectorBase: a data resource for invertebrate vector genomics 
Nucleic Acids Research  2008;37(Database issue):D583-D587.
VectorBase is an NIAID-funded Bioinformatic Resource Center focused on invertebrate vectors of human pathogens. VectorBase annotates and curates vector genomes providing a web accessible integrated resource for the research community. Currently, VectorBase contains genome information for three mosquito species: Aedes aegypti, Anopheles gambiae and Culex quinquefasciatus, a body louse Pediculus humanus and a tick species Ixodes scapularis. Since our last report VectorBase has initiated a community annotation system, a microarray and gene expression repository and controlled vocabularies for anatomy and insecticide resistance. We have continued to develop both the software infrastructure and tools for interrogating the stored data.
PMCID: PMC2686483  PMID: 19028744
8.  Inferring genome-scale rearrangement phylogeny and ancestral gene order: a Drosophila case study 
Genome Biology  2007;8(11):R236.
A simple, fast, and biologically-inspired computational approach to infer genome-scale rearrangement phylogeny and ancestral gene order has been developed and applied to eight Drosophila genomes, providing insights into evolutionary chromosomal dynamics.
A simple, fast, and biologically inspired computational approach for inferring genome-scale rearrangement phylogeny and ancestral gene order has been developed. This has been applied to eight Drosophila genomes. Existing techniques are either limited to a few hundred markers or a small number of taxa. This analysis uses over 14,000 genomic loci and employs discrete elements consisting of pairs of homologous genetic elements. The results provide insight into evolutionary chromosomal dynamics and synteny analysis, and inform speciation studies.
PMCID: PMC2258185  PMID: 17996033
9.  Analysis of 14 BAC sequences from the Aedes aegypti genome: a benchmark for genome annotation and assembly 
Genome Biology  2007;8(5):R88.
In order to provide a set of manually curated and annotated sequences from the Aedes aegypti genome, mapped BAC clones encompassing 1.57 Mb were sequenced, assembled and manually annotated using computational gene-finding, EST matches as well as comparative protein homology.
Aedes aegypti is the principal vector of yellow fever and dengue viruses throughout the tropical world. To provide a set of manually curated and annotated sequences from the Ae. aegypti genome, 14 mapped bacterial artificial chromosome (BAC) clones encompassing 1.57 Mb were sequenced, assembled and manually annotated using a combination of computational gene-finding, expressed sequence tag (EST) matches and comparative protein homology. PCR and sequencing were used to experimentally confirm expression and sequence of a subset of these transcripts.
Of the 51 manual annotations, 50 and 43 demonstrated a high level of similarity to Anopheles gambiae and Drosophila melanogaster genes, respectively. Ten of the 12 BAC sequences with more than one annotated gene exhibited synteny with the A. gambiae genome. Putative transcripts from eight BAC clones were found in multiple copies (two copies in most cases) in the Aedes genome assembly, which points to the probable presence of haplotype polymorphisms and/or misassemblies.
This study not only provides a benchmark set of manually annotated transcripts for this genome that can be used to assess the quality of the auto-annotation pipeline and the assembly, but it also looks at the effect of a high repeat content on the genome assembly and annotation pipeline.
PMCID: PMC1929151  PMID: 17519023
10.  VectorBase: a home for invertebrate vectors of human pathogens 
Nucleic Acids Research  2006;35(Database issue):D503-D505.
VectorBase is a web-accessible data repository for information about invertebrate vectors of human pathogens. VectorBase annotates and maintains vector genomes providing an integrated resource for the research community. Currently, VectorBase contains genome information for two organisms: Anopheles gambiae, a vector for the Plasmodium protozoan agent causing malaria, and Aedes aegypti, a vector for the flaviviral agents causing Yellow fever and Dengue fever.
PMCID: PMC1751530  PMID: 17145709
11.  FlyBase: genomes by the dozen 
Nucleic Acids Research  2006;35(Database issue):D486-D491.
FlyBase is the primary database of genetic and genomic data for the insect family Drosophilidae. Historically, Drosophila melanogaster has been the most extensively studied species in this family, but recent determination of the genomic sequences of an additional 11 Drosophila species opens up new avenues of research for other Drosophila species. This extensive sequence resource, encompassing species with well-defined phylogenetic relationships, provides a model system for comparative genomic analyses. FlyBase has developed tools to facilitate access to and navigation through this invaluable new data collection.
PMCID: PMC1669768  PMID: 17099233
12.  Annotation of the Drosophila melanogaster euchromatic genome: a systematic review 
Genome Biology  2002;3(12):research0083.1-83.22.
The recent completion of the Drosophila melanogaster genomic sequence to high quality, and the availability of a greatly expanded set of Drosophila cDNA sequences, afforded FlyBase the opportunity to significantly improve genomic annotations.
The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences.
Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes.
Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.
PMCID: PMC151185  PMID: 12537572
