1.  Comprehensive Transcriptome Assembly of Chickpea (Cicer arietinum L.) Using Sanger and Next Generation Sequencing Platforms: Development and Applications 
PLoS ONE  2014;9(1):e86039.
A comprehensive transcriptome assembly of chickpea has been developed using 134.95 million Illumina single-end reads, 7.12 million single-end FLX/454 reads and 139,214 Sanger expressed sequence tags (ESTs) from >17 genotypes. This hybrid transcriptome assembly, referred to as Cicer arietinum Transcriptome Assembly version 2 (CaTA v2, available at http://data.comparative-legumes.org/transcriptomes/cicar/lista_cicar-201201), comprising 46,369 transcript assembly contigs (TACs) has an N50 length of 1,726 bp and a maximum contig size of 15,644 bp. Putative functions were determined for 32,869 (70.8%) of the TACs and gene ontology assignments were determined for 21,471 (46.3%). The new transcriptome assembly was compared with the previously available chickpea transcriptome assemblies as well as to the chickpea genome. Comparative analysis of CaTA v2 against transcriptomes of three legumes - Medicago, soybean and common bean, resulted in 27,771 TACs common to all three legumes indicating strong conservation of genes across legumes. CaTA v2 was also used for identification of simple sequence repeats (SSRs) and intron spanning regions (ISRs) for developing molecular markers. ISRs were identified by aligning TACs to the Medicago genome, and their putative mapping positions at chromosomal level were identified using transcript map of chickpea. Primer pairs were designed for 4,990 ISRs, each representing a single contig for which predicted positions are inferred and distributed across eight linkage groups. A subset of randomly selected ISRs representing all eight chickpea linkage groups were validated on five chickpea genotypes and showed 20% polymorphism with average polymorphic information content (PIC) of 0.27. 
In summary, the hybrid transcriptome assembly developed and novel markers identified can be used for a variety of applications such as gene discovery, marker-trait association, diversity analysis etc., to advance genetics research and breeding applications in chickpea and other related legumes.
doi:10.1371/journal.pone.0086039
PMCID: PMC3900451  PMID: 24465857
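The N50 length quoted for CaTA v2 (1,726 bp) is a standard assembly summary statistic: the contig length at which contigs of that size or longer account for at least half of the total assembled bases. The abstract does not say which tool computed it, so the following is only a minimal sketch of the usual definition; the function name `n50` and the example lengths are hypothetical and not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; N50 is the length of the contig at which the running
    total first reaches half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths supplied")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp), for illustration only.
example_lengths = [2, 3, 4, 5, 6]
example_n50 = n50(example_lengths)
```

With these illustrative lengths the total is 20 bp, and the two longest contigs (6 + 5) already cover half of it, so the N50 is 5.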
2.  The Medicago Genome Provides Insight into the Evolution of Rhizobial Symbioses 
Young, Nevin D. | Debellé, Frédéric | Oldroyd, Giles E. D. | Geurts, Rene | Cannon, Steven B. | Udvardi, Michael K. | Benedito, Vagner A. | Mayer, Klaus F. X. | Gouzy, Jérôme | Schoof, Heiko | Van de Peer, Yves | Proost, Sebastian | Cook, Douglas R. | Meyers, Blake C. | Spannagl, Manuel | Cheung, Foo | De Mita, Stéphane | Krishnakumar, Vivek | Gundlach, Heidrun | Zhou, Shiguo | Mudge, Joann | Bharti, Arvind K. | Murray, Jeremy D. | Naoumkina, Marina A. | Rosen, Benjamin | Silverstein, Kevin A. T. | Tang, Haibao | Rombauts, Stephane | Zhao, Patrick X. | Zhou, Peng | Barbe, Valérie | Bardou, Philippe | Bechner, Michael | Bellec, Arnaud | Berger, Anne | Bergès, Hélène | Bidwell, Shelby | Bisseling, Ton | Choisne, Nathalie | Couloux, Arnaud | Denny, Roxanne | Deshpande, Shweta | Dai, Xinbin | Doyle, Jeff | Dudez, Anne-Marie | Farmer, Andrew D. | Fouteau, Stéphanie | Franken, Carolien | Gibelin, Chrystel | Gish, John | Goldstein, Steven | González, Alvaro J. | Green, Pamela J. | Hallab, Asis | Hartog, Marijke | Hua, Axin | Humphray, Sean | Jeong, Dong-Hoon | Jing, Yi | Jöcker, Anika | Kenton, Steve M. | Kim, Dong-Jin | Klee, Kathrin | Lai, Hongshing | Lang, Chunting | Lin, Shaoping | Macmil, Simone L | Magdelenat, Ghislaine | Matthews, Lucy | McCorrison, Jamison | Monaghan, Erin L. | Mun, Jeong-Hwan | Najar, Fares Z. | Nicholson, Christine | Noirot, Céline | O’Bleness, Majesta | Paule, Charles R. | Poulain, Julie | Prion, Florent | Qin, Baifang | Qu, Chunmei | Retzel, Ernest F. | Riddle, Claire | Sallet, Erika | Samain, Sylvie | Samson, Nicolas | Sanders, Iryna | Saurat, Olivier | Scarpelli, Claude | Schiex, Thomas | Segurens, Béatrice | Severin, Andrew J. | Sherrier, D. Janine | Shi, Ruihua | Sims, Sarah | Singer, Susan R. | Sinharoy, Senjuti | Sterck, Lieven | Viollet, Agnès | Wang, Bing-Bing | Wang, Keqin | Wang, Mingyi | Wang, Xiaohong | Warfsmann, Jens | Weissenbach, Jean | White, Doug D. | White, Jim D. | Wiley, Graham B. | Wincker, Patrick | Xing, Yanbo | Yang, Limei | Yao, Ziyun | Ying, Fu | Zhai, Jixian | Zhou, Liping | Zuber, Antoine | Dénarié, Jean | Dixon, Richard A. | May, Gregory D. | Schwartz, David C. | Rogers, Jane | Quétier, Francis | Town, Christopher D. | Roe, Bruce A.
Nature  2011;480(7378):520-524.
Legumes (Fabaceae or Leguminosae) are unique among cultivated plants for their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria, a process that takes place in a specialized structure known as the nodule. Legumes belong to one of the two main groups of eurosids, the Fabidae, which includes most species capable of endosymbiotic nitrogen fixation 1. Legumes comprise several evolutionary lineages derived from a common ancestor 60 million years ago (Mya). Papilionoids are the largest clade, dating nearly to the origin of legumes and containing most cultivated species 2. Medicago truncatula (Mt) is a long-established model for the study of legume biology. Here we describe the draft sequence of the Mt euchromatin based on a recently completed BAC-assembly supplemented with Illumina-shotgun sequence, together capturing ~94% of all Mt genes. A whole-genome duplication (WGD) approximately 58 Mya played a major role in shaping the Mt genome and thereby contributed to the evolution of endosymbiotic nitrogen fixation. Subsequent to the WGD, the Mt genome experienced higher levels of rearrangement than two other sequenced legumes, Glycine max (Gm) and Lotus japonicus (Lj). Mt is a close relative of alfalfa (M. sativa), a widely cultivated crop with limited genomics tools and complex autotetraploid genetics. As such, the Mt genome sequence provides significant opportunities to expand alfalfa’s genomic toolbox.
doi:10.1038/nature10625
PMCID: PMC3272368  PMID: 22089132
3.  A Comprehensive Transcriptome Assembly of Pigeonpea (Cajanus cajan L.) using Sanger and Second-Generation Sequencing Platforms 
Molecular Plant  2012;5(5):1020-1028.
A comprehensive transcriptome assembly for pigeonpea has been developed by analyzing 128.9 million short Illumina GA IIx single end reads, 2.19 million single end FLX/454 reads, and 18 353 Sanger expressed sequence tags from more than 16 genotypes. The resultant transcriptome assembly, referred to as CcTA v2, comprised 21 434 transcript assembly contigs (TACs) with an N50 of 1510 bp, the largest one being ∼8 kb. Of the 21 434 TACs, 16 622 (77.5%) could be mapped on to the soybean genome build 1.0.9 under fairly stringent alignment parameters. Based on knowledge of intron junctions, 10 009 primer pairs were designed from 5033 TACs for amplifying intron spanning regions (ISRs). By using in silico mapping of BAC-end-derived SSR loci of pigeonpea on the soybean genome as a reference, putative mapping positions at the chromosome level were predicted for 6284 ISR markers, covering all 11 pigeonpea chromosomes. A subset of 128 ISR markers was analyzed on a set of eight genotypes. While 116 markers were validated, 70 markers showed one to three alleles, with an average polymorphism information content (PIC) value of 0.16. In summary, the CcTA v2 transcript assembly and ISR markers will serve as a useful resource to accelerate genetic research and breeding applications in pigeonpea.
doi:10.1093/mp/ssr111
PMCID: PMC3440007  PMID: 22241453
Cajanus cajan (L.); second-generation sequencing; transcriptome assembly; intron spanning region (ISR) markers
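The polymorphism information content (PIC) values reported for the ISR markers summarize how informative a marker is, given its allele frequencies. The abstract does not state which formula was used; the sketch below assumes the widely used Botstein et al. (1980) definition, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², and the function name `pic` and the example frequencies are hypothetical.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content for one marker.

    Assumes the Botstein et al. (1980) formula:
        PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    where p_i are allele frequencies that sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    correction = sum(2 * (p * p) * (q * q)
                     for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - correction


# Hypothetical marker with two equally frequent alleles.
two_allele_pic = pic([0.5, 0.5])
```

Under this formula a marker with two equally frequent alleles has a PIC of 0.375, and a monomorphic marker has a PIC of 0, so the averages of 0.16 to 0.27 reported for these ISR markers would correspond to modest polymorphism.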
4.  Evolutionary and comparative analyses of the soybean genome 
Breeding Science  2012;61(5):437-444.
The soybean genome assembly has been available since the end of 2008. Significant features of the genome include large, gene-poor, repeat-dense pericentromeric regions, spanning roughly 57% of the genome sequence; a relatively large genome size of ~1.15 billion bases; remnants of a genome duplication that occurred ~13 million years ago (Mya); and fainter remnants of older polyploidies that occurred ~58 Mya and >130 Mya. The genome sequence has been used to identify the genetic basis for numerous traits, including disease resistance, nutritional characteristics, and developmental features. The genome sequence has provided a scaffold for placement of many genomic feature elements, both from within soybean and from related species. These may be accessed at several websites, including http://www.phytozome.net, http://soybase.org, http://comparative-legumes.org, and http://www.legumebase.brc.miyazaki-u.ac.jp. The taxonomic position of soybean in the Phaseoleae tribe of the legumes means that there are approximately two dozen other beans and relatives that have undergone independent domestication, and which may have traits that will be useful for transfer to soybean. Methods of translating information between species in the Phaseoleae range from design of markers for marker assisted selection, to transformation with Agrobacterium or with other experimental transformation methods.
doi:10.1270/jsbbs.61.437
PMCID: PMC3406793  PMID: 23136483
Glycine max; soybean; legume evolution; polyploidy; SoyBase; Legume Information System; Legumebase; Phytozome
5.  Chromosome Visualization Tool: A Whole Genome Viewer 
CViT (chromosome visualization tool) is a Perl utility for quickly generating images of features on a whole genome at once. It reads GFF3-formatted data representing chromosomes (linkage groups or pseudomolecules) and sets of features on those chromosomes. It can display features on any chromosomal unit system, including genetic (centimorgan), cytological (centiMcClintock), and DNA unit (base-pair) coordinates. CViT has been used to track sequencing progress (status of genome sequencing, location and number of gaps), to visualize BLAST hits on a whole genome view, to associate maps with one another, to locate regions of repeat densities, to display syntenic regions, and to visualize centromeres and knobs on chromosomes.
doi:10.1155/2011/373875
PMCID: PMC3246742  PMID: 22220167
6.  Taking the First Steps towards a Standard for Reporting on Phylogenies: Minimal Information about a Phylogenetic Analysis (MIAPA) 
In the eight years since phylogenomics was introduced as the intersection of genomics and phylogenetics, the field has provided fundamental insights into gene function, genome history and organismal relationships. The utility of phylogenomics is growing with the increase in the number and diversity of taxa for which whole genome and large transcriptome sequence sets are being generated. We assert that the synergy between genomic and phylogenetic perspectives in comparative biology would be enhanced by the development and refinement of minimal reporting standards for phylogenetic analyses. Encouraged by the development of the Minimum Information About a Microarray Experiment (MIAME) standard, we propose a similar roadmap for the development of a Minimal Information About a Phylogenetic Analysis (MIAPA) standard. Key in the successful development and implementation of such a standard will be broad participation by developers of phylogenetic analysis software, phylogenetic database developers, practitioners of phylogenomics, and journal editors.
doi:10.1089/omi.2006.10.231
PMCID: PMC3167193  PMID: 16901231
7.  Defining the Transcriptome Assembly and Its Use for Genome Dynamics and Transcriptome Profiling Studies in Pigeonpea (Cajanus cajan L.) 
This study reports generation of large-scale genomic resources for pigeonpea, a so-called ‘orphan crop species’ of the semi-arid tropic regions. FLX/454 sequencing carried out on a normalized cDNA pool prepared from 31 tissues produced 494 353 short transcript reads (STRs). Cluster analysis of these STRs, together with 10 817 Sanger ESTs, resulted in a pigeonpea transcriptome assembly (CcTA) comprising 127 754 tentative unique sequences (TUSs). Functional analysis of these TUSs highlights several active pathways and processes in the sampled tissues. Comparison of the CcTA with the soybean genome showed similarity to 10 857 and 16 367 soybean gene models (depending on alignment methods). Additionally, Illumina 1G sequencing was performed on Fusarium wilt (FW)- and sterility mosaic disease (SMD)-challenged root tissues of 10 resistant and susceptible genotypes. More than 160 million sequence tags were used to identify FW- and SMD-responsive genes. Sequence analysis of CcTA and the Illumina tags identified a large new set of markers for use in genetics and breeding, including 8137 simple sequence repeats, 12 141 single-nucleotide polymorphisms and 5845 intron-spanning regions. Genomic resources developed in this study should be useful for basic and applied research, not only for pigeonpea improvement but also for other related, agronomically important legumes.
doi:10.1093/dnares/dsr007
PMCID: PMC3111231  PMID: 21565938
Cajanus cajan L.; next generation sequencing; transcriptome assembly; molecular markers and gene discovery
8.  The clinical significance of the FUS-CREB3L2 translocation in low-grade fibromyxoid sarcoma 
Background
Low-grade fibromyxoid sarcoma (LGFMS) is a rare soft-tissue neoplasm with a deceptively benign histological appearance. Local recurrences and metastases can manifest many years following excision. The FUS-CREB3L2 gene translocation, which occurs commonly in LGFMS, may be detected by reverse-transcriptase polymerase chain reaction (RT-PCR) and fluorescence in situ hybridisation (FISH). We assessed the relationship between clinical outcome and translocation test result by both methods.
Methods
We report genetic analysis of 23 LGFMS cases and clinical outcomes of 18 patients with mean age of 40.6 years. During follow-up (mean 24.8 months), there were no cases of local recurrence or metastasis. One case was referred with a third recurrence of a para-spinal tumour previously incorrectly diagnosed as a neurofibroma.
Results
Results showed 50% of cases tested positive for the FUS-CREB3L2 translocation by RT-PCR and 81.8% by FISH, suggesting FISH is more sensitive than RT-PCR for confirming LGFMS diagnosis. Patients testing positive by both methods tended to be younger and had larger tumours. Despite this, there was no difference in clinical outcome seen during short and medium-term follow-up.
Conclusions
RT-PCR and FISH for the FUS-CREB3L2 fusion transcript are useful tools for confirming LGFMS diagnosis, but have no role in predicting medium-term clinical outcome. Due to the propensity for late recurrence or metastasis, wide excision is essential, and longer-term follow-up is required. This may identify a difference in long-term clinical outcome between translocation-positive and negative patients.
doi:10.1186/1749-799X-6-15
PMCID: PMC3063187  PMID: 21406083
9.  Applying Small-Scale DNA Signatures as an Aid in Assembling Soybean Chromosome Sequences 
Advances in Bioinformatics  2010;2010:976792.
Previous work has established a genomic signature based on relative counts of the 16 possible dinucleotides. Until now, it has been generally accepted that the dinucleotide signature is characteristic of a genome and is relatively homogeneous across a genome. However, we found some local regions of the soybean genome with a signature differing widely from that of the rest of the genome. Those regions were mostly centromeric and pericentromeric, and enriched for repetitive sequences. We found that DNA binding energy also presented large-scale patterns across soybean chromosomes. These two patterns were helpful during assembly and quality control of soybean whole genome shotgun scaffold sequences into chromosome pseudomolecules.
doi:10.1155/2010/976792
PMCID: PMC2933861  PMID: 20827309
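The dinucleotide signature described in this abstract is based on relative counts of the 16 possible dinucleotides. The abstract does not give the exact formula, so the sketch below assumes a common odds-ratio form, ρ_xy = f_xy / (f_x · f_y), computed on a single strand; the function names `dinucleotide_signature` and `signature_distance` are hypothetical. Comparing a local window's signature against the genome-wide one is one way regions with a deviating signature could be flagged.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_signature(seq):
    """Return {dinucleotide: observed/expected ratio} for a DNA string.

    Assumed form: rho_xy = f_xy / (f_x * f_y), single strand only.
    Positions containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1)
                 if seq[i] in BASES and seq[i + 1] in BASES)
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    signature = {}
    for x, y in product(BASES, repeat=2):
        if n_mono == 0 or n_di == 0 or mono[x] == 0 or mono[y] == 0:
            signature[x + y] = 0.0  # ratio undefined; report 0.0
            continue
        expected = (mono[x] / n_mono) * (mono[y] / n_mono)
        signature[x + y] = (di[x + y] / n_di) / expected
    return signature


def signature_distance(sig_a, sig_b):
    """Mean absolute difference across the 16 dinucleotide ratios."""
    return sum(abs(sig_a[k] - sig_b[k]) for k in sig_a) / len(sig_a)


# Hypothetical toy sequences, for illustration only.
window_sig = dinucleotide_signature("AAAA")
genome_sig = dinucleotide_signature("ACGTACGTACGT")
delta = signature_distance(window_sig, genome_sig)
```

A larger `delta` indicates a window whose dinucleotide usage departs more from the reference signature. Real analyses would use long windows rather than toy strings like these.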
10.  RNA-Seq Atlas of Glycine max: A guide to the soybean transcriptome 
BMC Plant Biology  2010;10:160.
Background
Next generation sequencing is transforming our understanding of transcriptomes. It can determine the expression level of transcripts with a dynamic range of over six orders of magnitude from multiple tissues, developmental stages or conditions. Patterns of gene expression provide insight into functions of genes with unknown annotation.
Results
The RNA Seq-Atlas presented here provides a record of high-resolution gene expression in a set of fourteen diverse tissues. Hierarchical clustering of transcriptional profiles for these tissues suggests three clades with similar profiles: aerial, underground and seed tissues. We also investigate the relationship between gene structure and gene expression and find a correlation between gene length and expression. Additionally, we find dramatic tissue-specific gene expression of both the most highly-expressed genes and the genes specific to legumes in seed development and nodule tissues. Analysis of the gene expression profiles of over 2,000 genes with preferential gene expression in seed suggests there are more than 177 genes with functional roles that are involved in the economically important seed filling process. Finally, the Seq-atlas also provides a means of evaluating existing gene model annotations for the Glycine max genome.
Conclusions
This RNA-Seq atlas extends the analyses of previous gene expression atlases performed using Affymetrix GeneChip technology and provides an example of new methods to accommodate the increase in transcriptome data obtained from next generation sequencing. Data contained within this RNA-Seq atlas of Glycine max can be explored at http://www.soybase.org/soyseq.
doi:10.1186/1471-2229-10-160
PMCID: PMC3017786  PMID: 20687943
11.  Polyploidy Did Not Predate the Evolution of Nodulation in All Legumes 
PLoS ONE  2010;5(7):e11630.
Background
Several lines of evidence indicate that polyploidy occurred by around 54 million years ago, early in the history of legume evolution, but it has not been known whether this event was confined to the papilionoid subfamily (Papilionoideae; e.g. beans, medics, lupins) or occurred earlier. Determining the timing of the polyploidy event is important for understanding whether polyploidy might have contributed to rapid diversification and radiation of the legumes near the origin of the family; and whether polyploidy might have provided genetic material that enabled the evolution of a novel organ, the nitrogen-fixing nodule. Although symbioses with nitrogen-fixing partners have evolved in several lineages in the rosid I clade, nodules are widespread only in legume taxa, being nearly universal in the papilionoids and in the mimosoid subfamily (e.g., mimosas, acacias) – which diverged from the papilionoid legumes around 58 million years ago, soon after the origin of the legumes.
Methodology/Principal Findings
Using transcriptome sequence data from Chamaecrista fasciculata, a nodulating member of the mimosoid clade, we tested whether this species underwent polyploidy within the timeframe of legume diversification. Analysis of gene family branching orders and synonymous-site divergence data from C. fasciculata, Glycine max (soybean), Medicago truncatula, and Vitis vinifera (grape; an outgroup to the rosid taxa) establish that the polyploidy event known from soybean and Medicago occurred after the separation of the mimosoid and papilionoid clades, and at or shortly before the Papilionoideae radiation.
Conclusions
The ancestral legume genome was not fundamentally polyploid. Moreover, because there has not been an independent instance of polyploidy in the Chamaecrista lineage there is no necessary connection between polyploidy and nodulation in legumes. Chamaecrista may serve as a useful model in the legumes that lacks a paleopolyploid history, at least relative to the widely studied papilionoid models.
doi:10.1371/journal.pone.0011630
PMCID: PMC2905438  PMID: 20661290
12.  An Evaluation of the Diagnostic Accuracy of the Grade of Preoperative Biopsy Compared to Surgical Excision in Chondrosarcoma of the Long Bones 
Chondrosarcoma is the second most common primary malignant bone tumour. Distinguishing between grades is not necessarily straightforward and may alter the disease management. We evaluated the correlation between histological grading of the preoperative image-guided needle biopsy and the resection specimen of 78 consecutive cases of chondrosarcoma of the femur, humerus, and tibia. In 11 instances, there was a discrepancy in histological grade between the biopsy and surgical specimen. Therefore, there was an 85.9% (67/78) accuracy rate for pre-operative histological grading of chondrosarcoma, based on needle biopsy. However, the accuracy of the diagnostic biopsy to distinguish low-grade from high-grade chondrosarcoma was 93.6% (73/78). We conclude that accurate image-guided biopsy is a very useful adjunct in determining histological grade of chondrosarcoma and the subsequent treatment plan. At present, a multidisciplinary approach, comprising experienced orthopaedic surgeons, radiologists, and pathologists, offers the most reliable means of accurately diagnosing and grading of chondrosarcoma of long bones.
doi:10.1155/2010/270195
PMCID: PMC3265259  PMID: 22312488
13.  Complementary genetic and genomic approaches help characterize the linkage group I seed protein QTL in soybean 
BMC Plant Biology  2010;10:41.
Background
The nutritional and economic value of many crops is effectively a function of seed protein and oil content. Insight into the genetic and molecular control mechanisms involved in the deposition of these constituents in the developing seed is needed to guide crop improvement. A quantitative trait locus (QTL) on Linkage Group I (LG I) of soybean (Glycine max (L.) Merrill) has a striking effect on seed protein content.
Results
A soybean near-isogenic line (NIL) pair contrasting in seed protein and differing in an introgressed genomic segment containing the LG I protein QTL was used as a resource to demarcate the QTL region and to study variation in transcript abundance in developing seed. The LG I QTL region was delineated to less than 8.4 Mbp of genomic sequence on chromosome 20. Using Affymetrix® Soy GeneChip and high-throughput Illumina® whole transcriptome sequencing platforms, 13 genes displaying significant seed transcript accumulation differences between NILs were identified that mapped to the 8.4 Mbp LG I protein QTL region.
Conclusions
This study identifies gene candidates at the LG I protein QTL for potential involvement in the regulation of protein content in the soybean seed. The results demonstrate the power of complementary approaches to characterize contrasting NILs and provide genome-wide transcriptome insight towards understanding seed biology and the soybean genome.
doi:10.1186/1471-2229-10-41
PMCID: PMC2848761  PMID: 20199683
14.  Granular Cell Tumours: A Rare Entity in the Musculoskeletal System 
Sarcoma  2010;2009:765927.
Granular Cell Tumours are rare mesenchymal soft tissue tumours that arise throughout the body and are believed to be of neural origin. They often present as asymptomatic, slow-growing, benign, solitary lesions but may be multifocal. 1-2% of cases are malignant and can metastasise. Described series in the literature are sparse. We identified eleven cases in ten patients treated surgically and followed-up for a period of over 6 years in our regional bone and soft tissue tumour centre. Five tumours were located in the lower limb, four in the upper limb, and two in the trunk. Mean patient age was 31.2 years (range 8–55 years). Excision was complete in one case, marginal in five cases and intralesional in five cases. No patients required postoperative adjuvant treatment. Mean follow-up was 19.3 months (range 1–37 months). One case was multifocal, but there were no cases of local recurrence or malignancy. Histopathological and immunohistochemical analysis revealed the classical granular cell tumour features in all cases. We believe this case series to be the largest of its type in patients presenting to an orthopaedic soft tissue tumour unit. We present our findings and correlate them with findings of other series in the literature.
doi:10.1155/2009/765927
PMCID: PMC2821775  PMID: 20169099
15.  High-throughput SNP discovery through deep resequencing of a reduced representation library to anchor and orient scaffolds in the soybean whole genome sequence 
BMC Genomics  2010;11:38.
Background
The Soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy, but its marker density was only sufficient to properly orient 66% of the sequence scaffolds. The discovery and genetic mapping of more single nucleotide polymorphism (SNP) markers were needed to anchor and orient the remaining genome sequence. To that end, next generation sequencing and high-throughput genotyping were combined to obtain a much higher resolution genetic map that could be used to anchor and orient most of the remaining sequence and to help validate the integrity of the existing scaffold builds.
Results
A total of 7,108 to 25,047 predicted SNPs were discovered using a reduced representation library that was subsequently sequenced by the Illumina sequence-by-synthesis method on the clonal single molecule array platform. Using multiple SNP prediction methods, the validation rate of these SNPs ranged from 79% to 92.5%. A high resolution genetic map using 444 recombinant inbred lines was created with 1,790 SNP markers. Of the 1,790 mapped SNP markers, 1,240 markers had been selectively chosen to target existing unanchored or un-oriented sequence scaffolds, thereby increasing the amount of anchored sequence to 97%.
Conclusion
We have demonstrated how next generation sequencing was combined with high-throughput SNP detection assays to quickly discover large numbers of SNPs. Those SNPs were then used to create a high resolution genetic map that assisted in the assembly of scaffolds from the 8× whole genome shotgun sequences into pseudomolecules corresponding to chromosomes of the organism.
doi:10.1186/1471-2164-11-38
PMCID: PMC2817691  PMID: 20078886
16.  SoyBase, the USDA-ARS soybean genetics and genomics database 
Nucleic Acids Research  2009;38(Database issue):D843-D846.
SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for professionally curated genetics, genomics and related data resources for soybean. SoyBase contains the most current genetic, physical and genomic sequence maps integrated with qualitative and quantitative traits. The quantitative trait loci (QTL) represent more than 18 years of QTL mapping of more than 90 unique traits. SoyBase also contains the well-annotated ‘Williams 82’ genomic sequence and associated data mining tools. The genetic and sequence views of the soybean chromosomes and the extensive data on traits and phenotypes are extensively interlinked. This allows entry to the database using almost any kind of available information, such as genetic map symbols, soybean gene names or phenotypic traits. SoyBase is the repository for controlled vocabularies for soybean growth, development and trait terms, which are also linked to the more general plant ontologies. SoyBase can be accessed at http://soybase.org.
doi:10.1093/nar/gkp798
PMCID: PMC2808871  PMID: 20008513
17.  Integrating microarray analysis and the soybean genome to understand the soybeans iron deficiency response 
BMC Genomics  2009;10:376.
Background
Soybeans grown in the upper Midwestern United States often suffer from iron deficiency chlorosis, which results in yield loss at the end of the season. To better understand the effect of iron availability on soybean yield, we identified genes in two near isogenic lines with changes in expression patterns when plants were grown in iron sufficient and iron deficient conditions.
Results
Transcriptional profiles of soybean (Glycine max, L. Merr) near isogenic lines Clark (PI548553, iron efficient) and IsoClark (PI547430, iron inefficient) grown under Fe-sufficient and Fe-limited conditions were analyzed and compared using the Affymetrix® GeneChip® Soybean Genome Array. There were 835 candidate genes in the Clark (PI548553) genotype and 200 candidate genes in the IsoClark (PI547430) genotype putatively involved in soybean's iron stress response. Of these candidate genes, fifty-eight genes in the Clark genotype were identified with a genetic location within known iron efficiency QTL and 21 in the IsoClark genotype. The arrays also identified 170 single feature polymorphisms (SFPs) specific to either Clark or IsoClark. A sliding window analysis of the microarray data and the 7X genome assembly coupled with an iterative model of the data showed the candidate genes are clustered in the genome. An analysis of 5' untranslated regions in the promoter of candidate genes identified 11 conserved motifs in 248 differentially expressed genes, all from the Clark genotype, representing 129 clusters identified earlier, confirming the cluster analysis results.
Conclusion
These analyses have identified the first genes with expression patterns that are affected by iron stress and are located within QTL specific to iron deficiency stress. The genetic location and promoter motif analysis results support the hypothesis that the differentially expressed genes are co-regulated. The combined results of all analyses lead us to postulate iron inefficiency in soybean is a result of a mutation in a transcription factor(s), which controls the expression of genes required in inducing an iron stress response.
doi:10.1186/1471-2164-10-376
PMCID: PMC2907705  PMID: 19678937
18.  Evolutionary genomics of LysM genes in land plants 
Background
The ubiquitous LysM motif recognizes peptidoglycan, chitooligosaccharides (chitin) and, presumably, other structurally-related oligosaccharides. LysM-containing proteins were first shown to be involved in bacterial cell wall degradation and, more recently, were implicated in perceiving chitin (one of the established pathogen-associated molecular patterns) and lipo-chitin (nodulation factors) in flowering plants. However, the majority of LysM genes in plants remain functionally uncharacterized and the evolutionary history of complex LysM genes remains elusive.
Results
We show that LysM-containing proteins display a wide range of complex domain architectures. However, only a simple core architecture is conserved across kingdoms. Each individual kingdom appears to have evolved a distinct array of domain architectures. We show that early plant lineages acquired four characteristic architectures and progressively lost several primitive architectures. We report plant LysM phylogenies and associated gene, protein and genomic features, and infer the relative timing of duplications of LYK genes.
Conclusion
We report a domain architecture catalogue of LysM proteins across all kingdoms. The unique pattern of LysM protein domain architectures indicates the presence of distinctive evolutionary paths in individual kingdoms. We describe a comparative and evolutionary genomics study of LysM genes in the plant kingdom. One of the two groups of tandemly arrayed plant LYK genes likely resulted from an ancient genome duplication followed by local genomic rearrangement, while the origin of the other groups of tandemly arrayed LYK genes remains obscure. Given that no animal LysM motif-containing genes have been functionally characterized, this study provides clues to functional characterization of plant LysM genes and is also informative with regard to evolutionary and functional studies of animal LysM genes.
doi:10.1186/1471-2148-9-183
PMCID: PMC2728734  PMID: 19650916
19.  An analysis of synteny of Arachis with Lotus and Medicago sheds new light on the structure, stability and evolution of legume genomes 
BMC Genomics  2009;10:45.
Background
Most agriculturally important legumes fall within two sub-clades of the Papilionoid legumes: the Phaseoloids and Galegoids, which diverged about 50 Mya. The Phaseoloids are mostly tropical and include crops such as common bean and soybean. The Galegoids are mostly temperate and include clover, fava bean and the model legumes Lotus and Medicago (both with substantially sequenced genomes). In contrast, peanut (Arachis hypogaea) falls in the Dalbergioid clade, which is more basal in its divergence within the Papilionoids. The aim of this work was to integrate the genetic map of Arachis with Lotus and Medicago and improve our understanding of the Arachis genome and legume genomes in general. To do this, we placed on the Arachis map comparative anchor markers defined using a previously described bioinformatics pipeline. We also investigated the possible role of transposons in the observed patterns of synteny.
Results
The Arachis genetic map was substantially aligned with Lotus and Medicago with most synteny blocks presenting a single main affinity to each genome. This indicates that the last common whole genome duplication within the Papilionoid legumes predated the divergence of Arachis from the Galegoids and Phaseoloids sufficiently that the common ancestral genome was substantially diploidized. The comparison of the Arachis and model legume genomes made here, together with a previously published comparison of Lotus and Medicago, allowed all possible Arachis-Lotus-Medicago species-by-species comparisons to be made and genome syntenies to be observed. Distinct conserved synteny blocks and non-conserved regions were present in all genome comparisons, implying that certain legume genomic regions are consistently more stable during evolution than others. We found that in Medicago and possibly also in Lotus, retrotransposons tend to be more frequent in the variable regions. Furthermore, while these variable regions generally have lower densities of single copy genes than the more conserved regions, some harbor high densities of the fast evolving disease resistance genes.
Conclusion
We suggest that gene space in Papilionoids may be divided into two broadly defined components: more conserved regions which tend to have low retrotransposon densities and are relatively stable during evolution; and variable regions that tend to have high retrotransposon densities, and whose frequent restructuring may fuel the evolution of some gene families.
doi:10.1186/1471-2164-10-45
PMCID: PMC2656529  PMID: 19166586
20.  Highly syntenic regions in the genomes of soybean, Medicago truncatula, and Arabidopsis thaliana 
BMC Plant Biology  2005;5:15.
Background
Recent genome sequencing enables mega-base scale comparisons between related genomes. Comparisons between animals, plants, fungi, and bacteria demonstrate extensive synteny tempered by rearrangements. Within the legume plant family, glimpses of synteny have also been observed. Characterizing syntenic relationships in legumes is important in transferring knowledge from model legumes to crops that are important sources of protein, fixed nitrogen, and health-promoting compounds.
Results
We have uncovered two large soybean regions exhibiting synteny with M. truncatula and with a network of segmentally duplicated regions in Arabidopsis. In all, syntenic regions comprise over 500 predicted genes spanning 3 Mb. Up to 75% of soybean genes are colinear with M. truncatula, including one region in which 33 of 35 soybean predicted genes with database support are colinear to M. truncatula. In some regions, 60% of soybean genes share colinearity with a network of A. thaliana duplications. One region is especially interesting because this 500 kbp segment of soybean is syntenic to two paralogous regions in M. truncatula on different chromosomes. Phylogenetic analysis of individual genes within these regions demonstrates that one is orthologous to the soybean region, with which it also shows substantially denser synteny and significantly lower levels of synonymous nucleotide substitutions. The other M. truncatula region is inferred to be paralogous, presumably resulting from a duplication event preceding speciation.
Conclusion
The presence of well-defined M. truncatula segments showing orthologous and paralogous relationships with soybean allows us to explore the evolution of contiguous genomic regions in the context of ancient genome duplication and speciation events.
doi:10.1186/1471-2229-5-15
PMCID: PMC1201151  PMID: 16102170
21.  The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana 
BMC Plant Biology  2004;4:10.
Background
Most genes in Arabidopsis thaliana are members of gene families. How do the members of gene families arise, and how are gene family copy numbers maintained? Some gene families may evolve primarily through tandem duplication and high rates of birth and death in clusters, and others through infrequent polyploidy or large-scale segmental duplications and subsequent losses.
Results
Our approach to understanding the mechanisms of gene family evolution was to construct phylogenies for 50 large gene families in Arabidopsis thaliana, identify large internal segmental duplications in Arabidopsis, map gene duplications onto the segmental duplications, and use this information to identify which nodes in each phylogeny arose due to segmental or tandem duplication. Examples of six gene families exemplifying characteristic modes are described. Distributions of gene family sizes and patterns of duplication by genomic distance are also described in order to characterize patterns of local duplication and copy number for large gene families. Both gene family size and duplication by distance closely follow power-law distributions.
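As a rough illustration of what "closely follow power-law distributions" means for gene family sizes, the sketch below fits a straight line to log(count) against log(family size); the slope is the power-law exponent. This is not the authors' procedure: the function name `fit_power_law` and the sample data are hypothetical, and log-log least squares is only a quick check (maximum-likelihood fitting is generally more reliable).

```python
import math


def fit_power_law(sizes):
    """Fit count ~ C * size**slope by least squares on log-log axes.

    sizes: iterable of positive integers, one per gene family
           (hypothetical input; any list of family sizes works).
    Returns (slope, intercept) of log(count) versus log(size).
    """
    # Count how many families have each size.
    counts = {}
    for s in sizes:
        counts[s] = counts.get(s, 0) + 1

    xs = [math.log(s) for s in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic family sizes constructed so that count = 400 / size**2.
example = [1] * 400 + [2] * 100 + [4] * 25 + [5] * 16 + [10] * 4 + [20] * 1
slope, intercept = fit_power_law(example)
print(round(slope, 3))
```

A slope near a constant negative value with small residuals is consistent with a power law; strong curvature on the log-log plot would argue against one.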
Conclusions
Combining information about genomic segmental duplications, gene family phylogenies, and gene positions provides a method to evaluate contributions of tandem duplication and segmental genome duplication in the generation and maintenance of gene families. Differences among gene families in these contributions appear to correspond meaningfully to differences in the functional roles of their members.
doi:10.1186/1471-2229-4-10
PMCID: PMC446195  PMID: 15171794
22.  DiagHunter and GenoPix2D: programs for genomic comparisons, large-scale homology discovery and visualization 
Genome Biology  2003;4(10):R68.
The DiagHunter and GenoPix2D applications work together to enable genomic comparisons and exploration at both genome-wide and single-gene scales. DiagHunter identifies homologous regions (synteny blocks) within or between genomes. DiagHunter works efficiently with diverse, large datasets to predict extended and interrupted synteny blocks and to generate graphical and text output quickly. GenoPix2D allows interactive display of synteny blocks and other genomic features, as well as querying by annotation and by sequence similarity.
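The general idea of finding synteny blocks as diagonals in a gene-by-gene dot plot can be sketched as follows. This is a deliberately simplified illustration, not DiagHunter's actual algorithm: the function `find_diagonal_runs` and its parameters `max_gap` and `min_hits` are hypothetical, hits are assumed to be pairs of gene-order indices, and only exact diagonals in one orientation are followed (real synteny blocks drift off the diagonal and may be inverted).

```python
from collections import defaultdict


def find_diagonal_runs(hits, max_gap=2, min_hits=3):
    """Group homology hits into runs along dot-plot diagonals.

    hits:     iterable of (i, j) pairs, where i and j are gene-order
              indices in the two genomes being compared (assumed input).
    max_gap:  largest step in i allowed between consecutive hits, so a
              run may be interrupted by a few genes without hits.
    min_hits: minimum number of hits for a run to be reported.
    Returns a list of runs, each a sorted list of (i, j) hits.
    """
    # Hits on the same diagonal share the same offset j - i.
    by_diagonal = defaultdict(set)
    for i, j in hits:
        by_diagonal[j - i].add((i, j))

    blocks = []
    for offset in sorted(by_diagonal):
        run = []
        for hit in sorted(by_diagonal[offset]):
            # Close the current run when the gap becomes too large.
            if run and hit[0] - run[-1][0] > max_gap:
                if len(run) >= min_hits:
                    blocks.append(run)
                run = []
            run.append(hit)
        if len(run) >= min_hits:
            blocks.append(run)
    return blocks


example_hits = [(0, 10), (1, 11), (3, 13), (7, 17), (2, 5), (5, 0)]
print(find_diagonal_runs(example_hits))
# → [[(0, 10), (1, 11), (3, 13)]]
```

Here the hit at (7, 17) lies on the same diagonal but is too far from the run to be joined, and the two isolated hits are discarded as noise.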
PMCID: PMC328457  PMID: 14519203
23.  OrthoParaMap: Distinguishing orthologs from paralogs by integrating comparative genome data and gene phylogenies 
BMC Bioinformatics  2003;4:35.
Background
In eukaryotic genomes, most genes are members of gene families. When comparing genes from two species, therefore, most genes in one species will be homologous to multiple genes in the second. This often makes it difficult to distinguish orthologs (separated through speciation) from paralogs (separated by other types of gene duplication). Combining phylogenetic relationships and genomic position in both genomes helps to distinguish between these scenarios. This kind of comparison can also help to describe how gene families have evolved within a single genome that has undergone polyploidy or other large-scale duplications, as in the case of Arabidopsis thaliana – and probably most plant genomes.
Results
We describe a suite of programs called OrthoParaMap (OPM) that makes genomic comparisons, identifies syntenic regions, determines whether sets of genes in a gene family are related through speciation or internal chromosomal duplications, maps this information onto phylogenetic trees, and infers internal nodes within the phylogenetic tree that may represent local – as opposed to speciation or segmental – duplication. We describe the application of the software using three examples: the melanoma-associated antigen (MAGE) gene family on the X chromosomes of mouse and human; the 20S proteasome subunit gene family in Arabidopsis, and the major latex protein gene family in Arabidopsis.
Conclusion
OPM combines comparative genomic positional information and phylogenetic reconstructions to identify which gene duplications are likely to have arisen through internal genomic duplications (such as polyploidy), through speciation, or through local duplications (such as unequal crossing-over). The software is freely available at .
doi:10.1186/1471-2105-4-35
PMCID: PMC200972  PMID: 12952558
24.  Large-scale transcriptome analysis in chickpea (Cicer arietinum L.), an orphan legume crop of the semi-arid tropics of Asia and Africa 
Plant Biotechnology Journal  2011;9(8):922-931.
Chickpea (Cicer arietinum L.) is an important legume crop in the semi-arid regions of Asia and Africa. Gains in crop productivity have been low, however, particularly because of biotic and abiotic stresses. To help enhance crop productivity using molecular breeding techniques, next generation sequencing technologies such as Roche/454 and Illumina/Solexa were used to determine the sequence of most gene transcripts and to identify drought-responsive genes and gene-based molecular markers. A total of 103 215 tentative unique sequences (TUSs) have been produced from 435 018 Roche/454 reads and 21 491 Sanger expressed sequence tags (ESTs). Putative functions were determined for 49 437 (47.8%) of the TUSs, and gene ontology assignments were determined for 20 634 (41.7%) of the TUSs. Comparison of the chickpea TUSs with the Medicago truncatula genome assembly (Mt 3.5.1 build) resulted in 42 141 aligned TUSs with putative gene structures (including 39 281 predicted intron/splice junctions). Alignment of ∼37 million Illumina/Solexa tags generated from drought-challenged root tissues of two chickpea genotypes against the TUSs identified 44 639 differentially expressed TUSs. The TUSs were also used to identify a diverse set of markers, including 728 simple sequence repeats (SSRs), 495 single nucleotide polymorphisms (SNPs), 387 conserved orthologous sequence (COS) markers, and 2088 intron-spanning region (ISR) markers. This resource will be useful for basic and applied research for genome analysis and crop improvement in chickpea.
doi:10.1111/j.1467-7652.2011.00625.x
PMCID: PMC3437486  PMID: 21615673
chickpea; next generation sequencing; transcriptome; drought-responsive genes; markers
