Open source tools are needed to facilitate the construction, analysis, and visualization of gene-gene interaction networks for sequencing data. To address this need, we present Encore, an open source network analysis pipeline for GWAS and rare variant data. Encore constructs Genetic Association Interaction Networks or Epistasis Networks using two optional approaches: our previous information-theory method or a generalized linear model approach. Additionally, Encore includes multiple data filtering options, including Random Forest/Random Jungle for main effect enrichment and Evaporative Cooling and Relief-F filters for enrichment of interaction effects. Encore implements SNPrank network centrality for identifying susceptibility hubs (nodes containing a large amount of disease susceptibility information through the combination of multivariate main effects and multiple gene-gene interactions in the network), and it provides appropriate files for interactive visualization of a network using tools from our online Galaxy instance. We implemented these algorithms in C++ using OpenMP for shared-memory parallel analysis on a server or desktop. To demonstrate Encore’s utility in analysis of genetic sequencing data, we present an analysis of exome resequencing data from healthy individuals and those with Systemic Lupus Erythematosus (SLE). Our results verify the importance of the previously associated SLE genes HLA-DRB and NCF2, and these two genes had the highest gene-gene interaction degrees among the susceptibility hubs. An additional 14 genes previously associated with SLE emerged in our epistasis network model of the exome data, and three novel candidate genes, ST8SIA4, CMTM4, and C2CD4B, were implicated in the model. In summary, we present a comprehensive tool for epistasis network analysis and the first such analysis of exome data from a genetic study of SLE.
epistasis network; machine learning; network analysis; network centrality; Systemic Lupus Erythematosus
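The idea of a centrality score that combines main effects with gene-gene interactions can be illustrated with a small PageRank-style power iteration. This is only a sketch under stated assumptions: the matrix layout (main effects on the diagonal, interaction weights off the diagonal), the `damping` parameter, and the function name `centrality_scores` are hypothetical choices made for illustration, and the code is not the SNPrank formula or the Encore implementation.

```python
def centrality_scores(gain, damping=0.85, iterations=100):
    """PageRank-style power iteration over a symmetric, non-negative matrix.

    Assumed (hypothetical) layout: gain[i][i] is a main-effect weight for
    variant i, and gain[i][j] is an interaction weight between i and j.
    """
    n = len(gain)
    # Column sums normalize each column so that it distributes a total of 1.
    col_sums = [sum(gain[i][j] for i in range(n)) for j in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Score flowing into node i from every node j with nonzero weight.
            flow = sum(
                gain[i][j] / col_sums[j] * scores[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            updated.append(damping * flow + (1.0 - damping) / n)
        # Renormalize so the scores always sum to 1.
        total = sum(updated)
        scores = [s / total for s in updated]
    return scores


# Toy network: variant 0 interacts with both other variants (a "hub").
toy_gain = [
    [1.0, 2.0, 2.0],
    [2.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
]
toy_scores = centrality_scores(toy_gain)
```

In this toy matrix the first variant receives the highest score because it carries both a main effect and two interaction edges, which mirrors the notion of a susceptibility hub described above.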
The simultaneous targeting of host and pathogen processes represents an untapped approach for the treatment of intracellular infections. Hypoxia-inducible factor-1 (HIF-1) is a host cell transcription factor that is activated by and required for the growth of the intracellular protozoan parasite Toxoplasma gondii at physiological oxygen levels. Parasite activation of HIF-1 is blocked by inhibiting the family of closely related Activin-Like Kinase (ALK) host cell receptors ALK4, ALK5, and ALK7, which was determined in part by use of an ALK4,5,7 inhibitor named SB505124. Besides inhibiting HIF-1 activation, SB505124 also potently blocks parasite replication under normoxic conditions. To determine whether SB505124 inhibition of parasite growth was exclusively due to inhibition of ALK4,5,7 or because the drug inhibited a second kinase, SB505124-resistant parasites were isolated by chemical mutagenesis. Whole-genome sequencing of these mutants revealed mutations in the Toxoplasma MAP kinase, TgMAPK1. Allelic replacement of mutant TgMAPK1 alleles into wild-type parasites was sufficient to confer SB505124 resistance. SB505124 independently impacts TgMAPK1 and ALK4,5,7 signaling since drug resistant parasites could not activate HIF-1 in the presence of SB505124 or grow in HIF-1 deficient cells. In addition, TgMAPK1 kinase activity is inhibited by SB505124. Finally, mice treated with SB505124 had significantly lower tissue burdens following Toxoplasma infection. These data therefore identify SB505124 as a novel small molecule inhibitor that acts by inhibiting two distinct targets, host HIF-1 and TgMAPK1.
Understanding how a compound blocks growth of an intracellular pathogen is important not only for developing these compounds into drugs that can be prescribed to patients, but also because these data will likely provide novel insight into the biology of these pathogens. Forward genetic screens are one established approach towards defining these mechanisms. But performing these screens with intracellular parasites has been limited not only because of technical limitations but also because the compounds may have off-target effects in either the host or parasite. Here, we report the first compound that kills a pathogen by simultaneously inhibiting distinct host- and parasite-encoded targets. Because developing drug resistance simultaneously to two targets is less likely, this work may highlight a new approach to antimicrobial drug discovery.
Genotyping variants in the human genome has proven to be an efficient method to identify genetic associations with phenotypes. The distribution of variants within families or populations can facilitate identification of the genetic factors of disease. Illumina's panel of genotyping BeadChips allows investigators to genotype thousands or millions of single nucleotide polymorphisms (SNPs) or to analyze other genomic variants, such as copy number, across a large number of DNA samples. These SNPs can be spread throughout the genome or targeted in specific regions in order to maximize potential discovery. The Infinium assay has been optimized to yield high-quality, accurate results quickly. With proper setup, a single technician can process from a few hundred to over a thousand DNA samples per week, depending on the type of array. This assay guides users through every step, starting with genomic DNA and ending with the scanning of the array. Using proprietary reagents, samples are amplified, fragmented, precipitated, resuspended, hybridized to the chip, extended by a single base, stained, and scanned on either an iScan or HiScan high-resolution optical imaging system. One overnight step is required to amplify the DNA. The DNA is denatured and isothermally amplified by whole-genome amplification; therefore, no PCR is required. Samples are hybridized to the arrays during a second overnight step. By the third day, the samples are ready to be scanned and analyzed. Amplified DNA may be stockpiled in large quantities, allowing bead arrays to be processed every day of the week, thereby maximizing throughput.
Basic Protocol; Issue 81; genomics; SNP; Genotyping; Infinium; iScan; HiScan; Illumina
Systemic lupus erythematosus (SLE) is an autoimmune disease characterized by autoantibody production and altered type I interferon expression. Genetic surveys and genome-wide association studies have identified more than 30 SLE susceptibility genes. One of these genes, TNIP1, encodes the ABIN1 protein. ABIN1 functions in the immune system by restricting NF-κB signaling. In order to better understand the genetic factors that influence association with SLE in genes that regulate the NF-κB pathway, we analyzed a dense set of genetic markers spanning TNIP1 and TAX1BP1, as well as the TNIP1 homolog, TNIP2, in case-control sets of diverse ethnic origins.
We fine-mapped TNIP1, TNIP2, and TAX1BP1 in a total of 8372 SLE cases and 7492 healthy controls from European-ancestry, African-American, Hispanic, East Asian, and African-American Gullah populations. Levels of TNIP1 mRNA and ABIN1 protein were analyzed using quantitative RT-PCR and Western blotting, respectively, in EBV-transformed human B cell lines.
We found significant associations between genetic variants within TNIP1 and SLE but not in TNIP2 or TAX1BP1. After resequencing and imputation, we identified two independent risk haplotypes within TNIP1 in individuals of European ancestry that were also present in African-American and Hispanic populations. These risk haplotypes produced lower levels of TNIP1 mRNA and ABIN1 protein, suggesting they harbor hypomorphic functional variants that influence susceptibility to SLE by restricting ABIN1 expression.
Our results confirmed the association signals between SLE and TNIP1 variants in multiple populations and provide new insight into the mechanism by which TNIP1 variants may contribute to SLE pathogenesis.
Although new and emerging next-generation sequencing (NGS) technologies have reduced sequencing costs significantly, much work remains to implement them for de novo sequencing of complex and highly repetitive genomes such as the tetraploid genome of Upland cotton (Gossypium hirsutum L.). Herein we report the results from implementing a novel, hybrid Sanger/454-based BAC-pool sequencing strategy using minimum tiling path (MTP) BACs from Ctg-3301 and Ctg-465, two large genomic segments in A12 and D12 homoeologous chromosomes (Ctg). To enable generation of longer contig sequences in assembly, we implemented a hybrid assembly method to process ~35x data from 454 technology and 2.8-3x data from the Sanger method. Hybrid assemblies offered higher sequence coverage and better sequence assemblies. Homology studies revealed the presence of retrotransposon regions such as Copia and Gypsy elements in these contigs and also helped in identifying new genomic SSRs. Unigenes were anchored to the sequences in Ctg-3301 and Ctg-465 to support the physical map. Gene density, gene structure and protein sequence information derived from protein prediction programs were used to obtain the functional annotation of these genes. Comparative analysis of both contigs with the Arabidopsis genome exhibited synteny and microcollinearity with a conserved gene order in both genomes. This study provides insight into the use of an MTP-based BAC-pool sequencing approach for sequencing complex polyploid genomes with limited constraints in generating better sequence assemblies to build reference scaffold sequences. Combining the utilities of MTP-based BAC-pool sequencing with current longer and short read NGS technologies in multiplexed format would provide a new direction to cost-effectively and precisely sequence complex plant genomes.
Functional characterization of causal variants present on risk haplotypes identified through genome-wide association studies (GWAS) is a primary objective of human genetics. In this report, we evaluate the function of a pair of tandem polymorphic dinucleotides, 42 kb downstream of the promoter of TNFAIP3 (rs148314165, rs200820567, collectively referred to as TT>A), recently nominated as causal variants responsible for genetic association of systemic lupus erythematosus (SLE) with tumor necrosis factor alpha inducible protein 3 (TNFAIP3). TNFAIP3 encodes the ubiquitin-editing enzyme, A20, a key negative regulator of NF-κB signaling. A20 expression is reduced in subjects carrying the TT>A risk alleles; however, the underlying functional mechanism by which this occurs is unclear. We used a combination of electrophoretic mobility shift assays (EMSA), mass spectrometry (MS), reporter assays, chromatin immunoprecipitation-PCR (ChIP-PCR) and chromosome conformation capture (3C) in EBV-transformed lymphoblastoid cell lines (LCL) from individuals carrying risk and non-risk TNFAIP3 haplotypes to characterize the effect of TT>A on A20 expression. Our results demonstrate that the TT>A variants reside in an enhancer element that binds NF-κB and SATB1, enabling physical interaction of the enhancer with the TNFAIP3 promoter through long-range DNA looping. Impaired binding of NF-κB to the TT>A risk alleles, or knockdown of SATB1 expression by shRNA, inhibits the looping interaction, resulting in reduced A20 expression. Together, these data reveal a novel mechanism of TNFAIP3 transcriptional regulation and establish the functional basis by which the TT>A risk variants attenuate A20 expression through inefficient delivery of NF-κB to the TNFAIP3 promoter. These results provide critical functional evidence supporting a direct causal role for TT>A in the genetic predisposition to SLE.
A key objective of human genetics is the identification and characterization of variants responsible for association with complex diseases. A pair of single nucleotide polymorphisms (rs148314165, rs200820567) 42 kb downstream from the promoter of TNFAIP3 have been proposed as the variants responsible for association with systemic lupus erythematosus based on comprehensive genetic and bioinformatic analyses. TNFAIP3 encodes the ubiquitin-editing enzyme, A20, which plays a central role in maintaining immune system homeostasis through restriction of NF-κB signaling. Cells that carry this risk haplotype express low levels of TNFAIP3 compared to cells carrying the nonrisk haplotype. How the risk alleles of rs148314165 and rs200820567 might lead to low TNFAIP3 expression is unknown. In this paper, we demonstrate that these variants reside in an enhancer element that binds NF-κB and SATB1, enabling the interaction of the enhancer with the TNFAIP3 promoter through long-range DNA looping. Impaired binding of NF-κB directly to the risk alleles or shRNA-mediated knockdown of SATB1 inhibits interaction of the enhancer with the TNFAIP3 promoter, resulting in reduced A20 expression. These results clarify the functional mechanism by which rs148314165 and rs200820567 attenuate A20 expression and support a causal role for these variants in the predisposition to autoimmune disease.
Recent advances in the field of genetics have dramatically changed our understanding of autoimmune disease. Candidate gene and, more recently, genome-wide association (GWA) studies have led to an explosion in the number of loci and pathways known to contribute to autoimmune phenotypes. Since the 1970s, researchers have known that several alleles in the MHC region play a role in the pathogenesis of many autoimmune diseases. More recent work has identified numerous risk loci involving both the innate and adaptive immune responses. However, much remains to be learned about the heritability of autoimmune conditions. For most regions found through GWA scans, the association has yet to be isolated to the causal allele(s) responsible for conferring disease risk. A role for rare variants (allele frequencies of <1%) has begun to emerge. Future research will use next generation sequencing (NGS) technology to comprehensively evaluate the human genome for risk variants. Whole transcriptome sequencing is now possible, which will provide much more detailed gene expression data. The dramatic drop in the cost and time required to sequence the entire human genome will ultimately make it possible for this technology to be used as a clinical diagnostic tool.
Genetics; Genomics; Genome-wide association study; Autoimmune disease
Systemic lupus erythematosus (SLE) is an autoimmune disease with diverse clinical manifestations characterized by the development of pathogenic autoantibodies manifesting in inflammation of target organs such as the kidneys, skin and joints. Genome-wide association studies have identified genetic variants in the UBE2L3 region that are associated with SLE in subjects of European and Asian ancestry. UBE2L3 encodes a ubiquitin-conjugating enzyme, UBCH7, involved in cell proliferation and immune function. In this study, we sought to further characterize the genetic association in the region of UBE2L3 and use molecular methods to determine the functional effect of the risk haplotype. We identified significant associations between variants in the region of UBE2L3 and SLE in individuals of European and Asian ancestry that exceeded a Bonferroni-corrected threshold (P < 1 × 10⁻⁴). A single risk haplotype was observed in all associated populations. Individuals harboring the risk haplotype display a significant increase in both UBE2L3 mRNA expression (P = 0.0004) and UBCH7 protein expression (P = 0.0068). The results suggest that variants carried on the SLE-associated UBE2L3 risk haplotype influence autoimmunity by modulating UBCH7 expression.
Systemic Lupus Erythematosus; UBE2L3; Multi Ethnic Association Study; UBCH7 Expression
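A Bonferroni-corrected threshold like the one mentioned above is obtained by dividing the family-wise significance level by the number of tests. The sketch below shows that arithmetic; the significance level of 0.05 and the count of 500 tests are hypothetical values chosen only because they reproduce a threshold of 1 × 10⁻⁴, and neither number is reported in the abstract.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that falls below alpha / (number of tests)."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical example: 500 tests at a family-wise alpha of 0.05.
example_threshold = bonferroni_threshold(0.05, 500)

# Hypothetical p-values; with three tests the threshold is 0.05 / 3.
example_flags = significant_after_bonferroni([0.001, 0.03, 0.2])
```

The correction is conservative when markers are correlated, as nearby variants in one region usually are, so it bounds the false-positive rate rather than estimating it exactly.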
Homozygous C1q deficiency is an extremely rare condition that is strongly associated with systemic lupus erythematosus. To assess and characterize C1q deficiency in an African-American lupus pedigree, the C1q genomic region was evaluated in the lupus cases and family members.
Genomic DNA was obtained from the patient, and the C1q A, B and C gene cluster was sequenced using next-generation sequencing. The identified mutation was confirmed by direct Sanger sequencing in the patient and all blood relatives. C1q levels in serum were measured by sandwich ELISA.
In an African-American patient with lupus and C1q deficiency, we identified and confirmed a novel homozygous start codon mutation in the C1qA gene that changes methionine to arginine at position 1 (Met1Arg). The Met1Arg mutation prevents protein translation. Mutation analysis of the patient’s family members also revealed the homozygous Met1Arg mutation in her deceased brother, who also had lupus with absence of total complement activity, consistent with a recessive pattern of inheritance.
This mutation in the C1qA gene, which disrupts the start codon (ATG to AGG; Met1Arg), has not been reported previously. Its identification expands knowledge of the importance of the C1q gene in the pathogenesis of lupus, especially in the high-risk African-American population.
Pyrosequencing analysis of 16S rRNA genes was used to examine impacts of elevated CO2 (eCO2) on soil microbial communities from 12 replicates each from ambient CO2 (aCO2) and eCO2 settings. The results suggest that the soil microbial community composition and structure were significantly altered under conditions of eCO2, and that these changes were closely associated with soil and plant properties.
Legumes (Fabaceae or Leguminosae) are unique among cultivated plants for their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria, a process that takes place in a specialized structure known as the nodule. Legumes belong to one of the two main groups of eurosids, the Fabidae, which includes most species capable of endosymbiotic nitrogen fixation [1]. Legumes comprise several evolutionary lineages derived from a common ancestor 60 million years ago (Mya). Papilionoids are the largest clade, dating nearly to the origin of legumes and containing most cultivated species [2]. Medicago truncatula (Mt) is a long-established model for the study of legume biology. Here we describe the draft sequence of the Mt euchromatin based on a recently completed BAC-assembly supplemented with Illumina-shotgun sequence, together capturing ~94% of all Mt genes. A whole-genome duplication (WGD) approximately 58 Mya played a major role in shaping the Mt genome and thereby contributed to the evolution of endosymbiotic nitrogen fixation. Subsequent to the WGD, the Mt genome experienced higher levels of rearrangement than two other sequenced legumes, Glycine max (Gm) and Lotus japonicus (Lj). Mt is a close relative of alfalfa (M. sativa), a widely cultivated crop with limited genomics tools and complex autotetraploid genetics. As such, the Mt genome sequence provides significant opportunities to expand alfalfa’s genomic toolbox.
We previously isolated a spontaneous mutant of Escherichia coli K-12, strain MG1655, following passage through the streptomycin-treated mouse intestine, that has colonization traits superior to the wild-type parent strain (M. P. Leatham et al., Infect. Immun. 73:8039–8049, 2005). This intestine-adapted strain (E. coli MG1655*) grew faster on several different carbon sources than the wild type and was nonmotile due to deletion of the flhD gene. We now report the results of several high-throughput genomic analysis approaches to further characterize E. coli MG1655*. Whole-genome pyrosequencing did not reveal any changes in its genome, aside from the deletion at the flhDC locus, that could explain the colonization advantage of E. coli MG1655*. Microarray analysis revealed modest yet significant induction of catabolic gene systems across the genome in both E. coli MG1655* and an isogenic flhD mutant constructed in the laboratory. Catabolome analysis with Biolog GN2 microplates revealed an enhanced ability of both E. coli MG1655* and the isogenic flhD mutant to oxidize a variety of carbon sources. The results show that intestine-adapted E. coli MG1655* is more fit than the wild type for intestinal colonization, because loss of FlhD results in elevated expression of genes involved in carbon and energy metabolism, resulting in more efficient carbon source utilization and a higher intestinal population. Hence, mutations that enhance metabolic efficiency confer a colonization advantage.
Systemic Lupus Erythematosus (SLE, OMIM 152700) is an autoimmune disease characterized by self-reactive antibodies resulting in systemic inflammation and organ failure. TNFAIP3, encoding the ubiquitin-modifying enzyme A20, is an established susceptibility locus for SLE. By fine mapping and genomic resequencing in ethnically diverse populations we fully characterized the TNFAIP3 risk haplotype and isolated a novel TT>A polymorphic dinucleotide associated with SLE in subjects of European (P = 1.58 × 10⁻⁸; odds ratio (OR) = 1.70) and Korean (P = 8.33 × 10⁻¹⁰; OR = 2.54) ancestry. This variant, located in a region of high conservation and regulatory potential, bound a nuclear protein complex comprised of NF-κB subunits with reduced avidity. Furthermore, compared with the non-risk haplotype, the haplotype carrying this variant resulted in reduced TNFAIP3 mRNA and A20 protein expression. These results establish this TT>A variant as the most likely functional polymorphism responsible for the association between TNFAIP3 and SLE.
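For readers less familiar with the odds ratios quoted above, the following sketch shows how an OR and an approximate 95% confidence interval (Woolf's log method) are computed from a 2 × 2 table of carrier counts. The counts are hypothetical and are not taken from the study; the function names are likewise illustrative.

```python
import math


def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers):
    """Odds of carrying the allele in cases divided by the odds in controls."""
    return (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers
    )


def woolf_confidence_interval(case_carriers, case_noncarriers,
                              control_carriers, control_noncarriers, z=1.96):
    """Approximate confidence interval on the log-odds scale (Woolf method)."""
    log_or = math.log(odds_ratio(case_carriers, case_noncarriers,
                                 control_carriers, control_noncarriers))
    standard_error = math.sqrt(
        1 / case_carriers + 1 / case_noncarriers
        + 1 / control_carriers + 1 / control_noncarriers
    )
    return (math.exp(log_or - z * standard_error),
            math.exp(log_or + z * standard_error))


# Hypothetical counts: 30 of 100 cases and 20 of 100 controls carry the allele.
example_or = odds_ratio(30, 70, 20, 80)
example_ci = woolf_confidence_interval(30, 70, 20, 80)
```

An OR above 1 means the allele is more common among cases than controls; an interval that excludes 1 corresponds to a nominally significant association at the chosen level.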
Follicular lymphoma (FL) is a form of non-Hodgkin's lymphoma (NHL) that arises from germinal center (GC) B-cells. Despite the significant advances in immunotherapy, FL is still not curable. Beyond transcriptional profiling and genomics datasets, there currently is no epigenome-scale dataset or integrative biology approach that can adequately model this disease and therefore identify novel mechanisms and targets for successful prevention and treatment of FL.
We performed methylation-enriched genome-wide bisulfite sequencing of FL cells and normal CD19+ B-cells using 454 sequencing technology. The methylated DNA fragments were enriched with methyl-binding proteins, treated with bisulfite, and sequenced using the Roche-454 GS FLX sequencer. The total number of bases covered in the human genome was 18.2 and 49.3 million including 726,003 and 1.3 million CpGs in FL and CD19+ B-cells, respectively. 11,971 and 7,882 methylated regions of interest (MRIs) were identified respectively. The genome-wide distribution of these MRIs displayed significant differences between FL and normal B-cells. A reverse trend in the distribution of MRIs between the promoter and the gene body was observed in FL and CD19+ B-cells. The MRIs identified in FL cells also correlated well with transcriptomic data and ChIP-on-Chip analyses of genome-wide histone modifications such as tri-methyl-H3K27, and tri-methyl-H3K4, indicating a concerted epigenetic alteration in FL cells.
This study is the first to provide a large scale and comprehensive analysis of the DNA methylation sequence composition and distribution in the FL epigenome. These integrated approaches have led to the discovery of novel and frequent targets of aberrant epigenetic alterations. The genome-wide bisulfite sequencing approach developed here can be a useful tool for profiling DNA methylation in clinical samples.
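The principle behind bisulfite-based methylation calling can be sketched briefly: bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C. The function below is a minimal illustration that assumes a read already aligned to the forward reference strand with no gaps or mismatches other than conversions; it is not the analysis pipeline used in the study, and the name `call_cpg_methylation` is hypothetical.

```python
def call_cpg_methylation(reference, read):
    """Return {position: is_methylated} for each CpG cytosine in the reference.

    Assumes `read` is a gap-free, forward-strand alignment of the same length
    as `reference` (a simplifying assumption for illustration).
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    calls = {}
    for pos in range(len(reference) - 1):
        if reference[pos:pos + 2] != "CG":
            continue  # only cytosines in a CpG context are considered
        base = read[pos]
        if base == "C":
            calls[pos] = True    # cytosine protected from conversion
        elif base == "T":
            calls[pos] = False   # cytosine converted, so unmethylated
        # any other base is treated as uninformative and skipped
    return calls


# Toy example with CpG sites at positions 1, 5 and 9 of the reference.
toy_reference = "ACGTTCGACCGA"
toy_read = "ACGTTTGATCGA"
toy_calls = call_cpg_methylation(toy_reference, toy_read)
```

In the toy read, the CpG at position 5 and the non-CpG cytosine at position 8 appear as T, so only the CpG sites at positions 1 and 9 are called methylated.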
Sugarcane (Saccharum spp.) has become an increasingly important crop for its leading role in biofuel production. The high sugar content species S. officinarum is an octoploid without known diploid or tetraploid progenitors. Commercial sugarcane cultivars are hybrids between S. officinarum and wild species S. spontaneum with ploidy at ~12×. The complex autopolyploid sugarcane genome has not been characterized at the DNA sequence level.
The microsynteny between sugarcane and sorghum was assessed by comparing 454 pyrosequences of 20 sugarcane bacterial artificial chromosomes (BACs) with sorghum sequences. These 20 BACs were selected by hybridization of 1961 single copy sorghum overgo probes to the sugarcane BAC library with one sugarcane BAC corresponding to each of the 20 sorghum chromosome arms. The genic regions of the sugarcane BACs shared an average of 95.2% sequence identity with sorghum, and the sorghum genome was used as a template to order sequence contigs covering 78.2% of the 20 BAC sequences. About 53.1% of the sugarcane BAC sequences aligned with sorghum sequence. The unaligned regions contain non-coding and repetitive sequences. Within the aligned sequences, 209 genes were annotated in sugarcane and 202 in sorghum. Seventeen genes appeared to be sugarcane-specific and all were validated by sugarcane ESTs, while 12 appeared sorghum-specific but only one was validated by sorghum ESTs. Twelve of the 17 sugarcane-specific genes have no match in the non-redundant protein database in GenBank, perhaps encoding proteins for sugarcane-specific processes. The sorghum orthologous regions appeared to have expanded relative to sugarcane, mostly by the increase of retrotransposons.
The sugarcane and sorghum genomes are mostly collinear in the genic regions, and the sorghum genome can be used as a template for assembling much of the genic DNA of the autopolyploid sugarcane genome. The comparable gene density between sugarcane BACs and corresponding sorghum sequences defied the notion that polyploid species might have a faster pace of gene loss due to the redundancy of multiple alleles at each locus.
The Human Microbiome Project (HMP) is one of the initiatives of the U.S. National Institutes of Health Roadmap for Medical Research. Primary interests of the HMP include the distinctiveness of different gut microbiomes, the factors influencing microbiome diversity, and the functional redundancies of the members of human microbiotas. In this present work, we contribute to these interests by characterizing two extinct human microbiotas.
We examine two paleofecal samples originating from cave deposits in Durango, Mexico, and dating to approximately 1300 years ago. Contamination control is a serious issue in ancient DNA research; we use a novel approach to control contamination. After we determined that each sample originated from a different human, we generated 45,000 shotgun DNA sequencing reads. The phylotyping and functional analysis of these reads reveals a signature consistent with the modern gut ecology. Interestingly, inter-individual variability for phenotypes but not functional pathways was observed. The two ancient samples have more similar functional profiles to each other than to a recently published profile for modern humans. This similarity could not be explained by a chance sampling of the databases.
We conduct a phylotyping and functional analysis of ancient human microbiomes, while providing novel methods to control for DNA contamination and novel hypotheses about past microbiome biogeography. We postulate that natural selection has more of an influence on microbiome functional profiles than it does on the species represented in the microbial ecology. We propose that human microbiomes were more geographically structured during pre-Columbian times than today.
An improved assembly of the Ciona intestinalis genome reveals that it contains non-canonical introns and that about 20% of Ciona genes reside in operons.
The draft genome sequence of the ascidian Ciona intestinalis, along with associated gene models, has been a valuable research resource. However, recently accumulated expressed sequence tag (EST)/cDNA data have revealed numerous inconsistencies with the gene models due in part to intrinsic limitations in gene prediction programs and in part to the fragmented nature of the assembly.
We have prepared a less-fragmented assembly on the basis of scaffold-joining guided by paired-end EST and bacterial artificial chromosome (BAC) sequences, and BAC chromosomal in situ hybridization data. The new assembly (115.2 Mb) is similar in length to the initial assembly (116.7 Mb) but contains 1,272 (approximately 50%) fewer scaffolds. The largest scaffold in the new assembly incorporates 95 initial-assembly scaffolds. In conjunction with the new assembly, we have prepared a greatly improved global gene model set strictly correlated with the extensive currently available EST data. The total gene number (15,254) is similar to that of the initial set (15,582), but the new set includes 3,330 models at genomic sites where none were present in the initial set, and 1,779 models that represent fusions of multiple previously incomplete models. In approximately half, 5'-ends were precisely mapped using 5'-full-length ESTs, an important refinement even in otherwise unchanged models.
Using these new resources, we identify a population of non-canonical (non-GT-AG) introns and also find that approximately 20% of Ciona genes reside in operons and that operons contain a high proportion of single-exon genes. Thus, the present dataset provides an opportunity to analyze the Ciona genome much more precisely than ever.
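The distinction between canonical (GT-AG) and non-canonical introns comes down to the dinucleotides at the two intron boundaries. The sketch below classifies intron sequences on that basis; it assumes each sequence is given 5' to 3' on the coding strand and runs from the first to the last intronic base, and the function names are hypothetical rather than part of any published Ciona annotation tool.

```python
def is_canonical_intron(intron_seq):
    """True when the intron starts with GT and ends with AG."""
    seq = intron_seq.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def split_by_splice_sites(introns):
    """Partition intron sequences into (canonical, non_canonical) lists."""
    canonical, non_canonical = [], []
    for seq in introns:
        if is_canonical_intron(seq):
            canonical.append(seq)
        else:
            non_canonical.append(seq)
    return canonical, non_canonical


# Toy sequences: one GT-AG intron, one GC-AG intron, one AT-AC intron.
toy_introns = ["GTAAGTCCTTTCAG", "GCAAGTCCTTTCAG", "ATATCCTTTTCCAC"]
toy_canonical, toy_non_canonical = split_by_splice_sites(toy_introns)
```

In practice such a check is only as reliable as the exon boundaries it relies on, which is one reason precisely mapped EST-supported gene models matter for identifying non-GT-AG introns.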
Biological nitrogen fixation is a prokaryotic process that plays an essential role in the global nitrogen cycle. Azorhizobium caulinodans ORS571 has the dual capacity to fix nitrogen both as a free-living organism and in a symbiotic interaction with Sesbania rostrata. The host is a fast-growing, submergence-tolerant tropical legume on which A. caulinodans can efficiently induce nodule formation on the root system and on adventitious rootlets located on the stem.
The 5.37-Mb genome consists of a single circular chromosome with an overall average GC of 67% and numerous islands with varying GC contents. Most nodulation functions as well as a putative type-IV secretion system are found in a distinct symbiosis region. The genome contains a plethora of regulatory and transporter genes and many functions possibly involved in contacting a host. It potentially encodes 4717 proteins of which 96.3% have homologs and 3.7% are unique for A. caulinodans. Phylogenetic analyses show that the diazotroph Xanthobacter autotrophicus is the closest relative among the sequenced genomes, but the synteny between both genomes is very poor.
The genome analysis reveals that A. caulinodans is a diazotroph that acquired the capacity to nodulate most probably through horizontal gene transfer of a complex symbiosis island. The genome contains numerous genes that reflect a strong adaptive and metabolic potential. These combined features and the availability of the annotated genome make A. caulinodans an attractive organism to explore symbiotic biological nitrogen fixation beyond leguminous plants.
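Regions whose GC content departs from the genome-wide average, such as the islands mentioned above, are commonly located by scanning the sequence in windows. The following sketch computes GC fraction per window; the window and step sizes and the function names are illustrative assumptions, not the parameters used in the genome analysis.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step):
    """Return (start, gc_fraction) for each full window along the sequence."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: an AT-only stretch followed by a GC-only stretch.
toy_sequence = "ATATATAT" + "GCGCGCGC"
toy_profile = windowed_gc(toy_sequence, window=8, step=8)
```

Windows whose GC fraction differs markedly from the overall value (67% for the chromosome described here) are candidates for horizontally acquired regions, although composition alone is not conclusive.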
H8 is derived from a collection of Salmonella enterica serotype Enteritidis bacteriophage. Its morphology and genomic structure closely resemble those of bacteriophage T5 in the family Siphoviridae. H8 infected S. enterica serotypes Enteritidis and Typhimurium and Escherichia coli by initial adsorption to the outer membrane protein FepA. Ferric enterobactin inhibited H8 binding to E. coli FepA (50% inhibition concentration, 98 nM), and other ferric catecholate receptors (Fiu, Cir, and IroN) did not participate in phage adsorption. H8 infection was TonB dependent, but exbB mutations in Salmonella or E. coli did not prevent infection; only exbB tolQ or exbB tolR double mutants were resistant to H8. Experiments with deletion and substitution mutants showed that the receptor-phage interaction first involves residues distributed over the protein's outer surface and then narrows to the same charged (R316) or aromatic (Y260) residues that participate in the binding and transport of ferric enterobactin and colicins B and D. These data rationalize the multifunctionality of FepA: toxic ligands like bacteriocins and phage penetrate the outer membrane by parasitizing residues in FepA that are adapted to the transport of the natural ligand, ferric enterobactin. DNA sequence determinations revealed the complete H8 genome of 104.4 kb. A total of 120 of its 143 predicted open reading frames (ORFs) were homologous to ORFs in T5, at a level of 84% identity and 89% similarity. As in T5, the H8 structural genes clustered on the chromosome according to their function in the phage life cycle. The T5 genome contains a large section of DNA that can be deleted and that is absent in H8: compared to T5, H8 contains a 9,000-bp deletion in the early region of its chromosome, and nine potentially unique gene products.
Sequence analyses of the tail proteins of phages in the same family showed that relative to pb5 (Oad) of T5 and Hrs of BF23, the FepA-binding protein (Rbp) of H8 contains unique acidic and aromatic residues. These side chains may promote binding to basic and aromatic residues in FepA that normally function in the adsorption of ferric enterobactin. Furthermore, a predicted H8 tail protein showed extensive identity and similarity to pb2 of T5, suggesting that it also functions in pore formation through the cell envelope. The variable region of this protein contains a potential TonB box, intimating that it participates in the TonB-dependent stage of the phage infection process.
Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis.
Methylation filtration makes practical the sequencing of large genomes, such as those found in sorghum, by preferentially capturing functionally relevant sequences.