1.  Pyrosequencing-Based Transcriptome Analysis of the Asian Rice Gall Midge Reveals Differential Response during Compatible and Incompatible Interaction 
The Asian rice gall midge (Orseolia oryzae) is a major pest responsible for immense loss in rice productivity. Currently, very little knowledge exists with regard to this insect at the molecular level. The present study was initiated with the aim of developing molecular resources as well as identifying alterations at the transcriptome level in the gall midge maggots that are in a compatible (SH) or in an incompatible interaction (RH) with their rice host. Roche 454 pyrosequencing strategy was used to develop both transcriptomics and genomics resources that led to the identification of 79,028 and 85,395 EST sequences from gall midge biotype 4 (GMB4) maggots feeding on a susceptible and resistant rice variety, TN1 (SH) and Suraksha (RH), respectively. Comparative transcriptome analysis of the maggots in SH and RH revealed over-representation of transcripts from proteolysis and protein phosphorylation in maggots from RH. In contrast, over-representation of transcripts for translation, regulation of transcription and transcripts involved in electron transport chain were observed in maggots from SH. This investigation, besides unveiling various mechanisms underlying insect-plant interactions, will also lead to a better understanding of strategies adopted by insects in general, and the Asian rice gall midge in particular, to overcome host defense.
PMCID: PMC3497313  PMID: 23202939
Orseolia oryzae; susceptible host; resistant host; next generation sequencing (NGS); real time PCR; insect biotypes; insect-plant interaction
2.  Serine Proteases-Like Genes in the Asian Rice Gall Midge Show Differential Expression in Compatible and Incompatible Interactions with Rice 
The Asian rice gall midge, Orseolia oryzae (Wood-Mason), is a serious pest of rice. Investigations into the gall midge-rice interaction will unveil the underlying molecular mechanisms which, in turn, can be used as a tool to assist in developing suitable integrated pest management strategies. The insect gut is known to be involved in various physiological and biological processes including digestion, detoxification and interaction with the host. We have cloned and identified two genes, OoprotI and OoprotII, homologous to serine proteases with the conserved His87, Asp136 and Ser241 residues. OoProtI shared 52.26% identity with mosquito-type trypsin from Hessian fly whereas OoProtII showed 52.49% identity to complement component activated C1s from the Hessian fly. Quantitative real-time PCR analysis revealed that both genes were expressed at significantly higher levels in larvae feeding on the resistant cultivar than in those feeding on the susceptible cultivar. These results provide an opportunity to understand the gut physiology of the insect under compatible or incompatible interactions with the host. Phylogenetic analysis grouped these genes in the clade containing proteases of phytophagous insects, away from those of hematophagous insects.
PMCID: PMC3116160  PMID: 21686154
biotype; chymotrypsin; insect-plant interaction; phytophagous insects; real time PCR; trypsin
3.  Development of New Polymorphic Microsatellite Loci for the Barley Stem Gall Midge, Mayetiola hordei (Diptera: Cecidomyiidae) from an Enriched Library 
Using an enriched library method, seven polymorphic microsatellite loci were isolated from the barley stem gall midge, Mayetiola hordei. Polymorphism at loci was surveyed on 57 individual midges collected on barley in Tunisia. Across loci, polymorphism ranged from two to six alleles per locus. The observed heterozygosity varied between 0.070 and 0.877. Based on the number of alleles detected and the associated levels of heterozygosity, we believe that these loci will prove useful for population genetic studies on M. hordei.
PMCID: PMC3509590  PMID: 23203074
Mayetiola destructor; dinucleotide; trinucleotide; molecular ecology
4.  Genome-wide characterization of simple sequence repeats in cucumber (Cucumis sativus L.) 
BMC Genomics  2010;11:569.
Cucumber, Cucumis sativus L., is an important vegetable crop worldwide. Until very recently, cucumber genetic and genomic resources, especially molecular markers, have been very limited, impeding progress of cucumber breeding efforts. Microsatellites are short tandemly repeated DNA sequences, which are frequently favored as genetic markers due to their high level of polymorphism and codominant inheritance. Data from previously characterized genomes has shown that these repeats vary in frequency, motif sequence, and genomic location across taxa. During the last year, the genomes of two cucumber genotypes were sequenced including the Chinese fresh market type inbred line '9930' and the North American pickling type inbred line 'Gy14'. These sequences provide a powerful tool for developing markers on a large scale. In this study, we surveyed and characterized the distribution and frequency of perfect microsatellites in 203 Mbp assembled Gy14 DNA sequences, representing 55% of its nuclear genome, and in cucumber EST sequences. Similar analyses were performed in genomic and EST data from seven other plant species, and the results were compared with those of cucumber.
A total of 112,073 perfect repeats were detected in the Gy14 cucumber genome sequence, accounting for 0.9% of the assembled Gy14 genome, with an overall density of 551.9 SSRs/Mbp. While tetranucleotides were the most frequent microsatellites in genomic DNA sequence, dinucleotide repeats, which had more repeat units than any other SSR type, had the highest cumulative sequence length. Coding regions (ESTs) of the cucumber genome had fewer microsatellites compared to its genomic sequence, with trinucleotides predominating in EST sequences. AAG was the most frequent repeat in cucumber ESTs. Overall, AT-rich motifs prevailed in both genomic and EST data. Compared to the other species examined, cucumber genomic sequence had the highest density of SSRs (although comparable to the density of poplar, grapevine and rice), and was richest in AT dinucleotides. Using an electronic PCR strategy, we investigated the polymorphism between 9930 and Gy14 at 1,006 SSR loci, and found an unexpectedly high degree of polymorphism (48.3%) between the two genotypes. The level of polymorphism seems to be positively associated with the number of repeat units in the microsatellite. The in silico PCR results were validated empirically in 660 of the 1,006 SSR loci. In addition, primer sequences for more than 83,000 newly discovered cucumber microsatellites, and their exact positions in the Gy14 genome assembly, were made publicly available.
The cucumber genome is rich in microsatellites; AT and AAG are the most abundant repeat motifs in genomic and EST sequences of cucumber, respectively. Considering all the species investigated, some commonalities were noted, especially within the monocot and dicot groups, although the distribution of motifs and the frequency of certain repeats were characteristic of the species examined. The large number of SSR markers developed from this study should be a significant contribution to the cucurbit research community.
PMCID: PMC3091718  PMID: 20950470
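As an illustration of what a "perfect microsatellite" survey like the one summarized in entry 4 involves, the following Python sketch scans a sequence for uninterrupted tandem repeats and converts a repeat count into a density per Mbp. It is not the software used in the study. The function names and the minimum repeat thresholds in `MIN_REPEATS` are assumptions chosen for the example, since the abstract does not state the search criteria.

```python
import re

# Minimum number of copies required per motif length.
# These thresholds are assumed for illustration; the abstract does not give them.
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures one motif; \1{n-1,} requires at least n copies in total.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are copies of a shorter unit (e.g. ATAT is really AT).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def ssr_density_per_mbp(n_ssrs, n_bases):
    """Number of SSRs per million base pairs of sequence."""
    return n_ssrs / (n_bases / 1_000_000)


if __name__ == "__main__":
    toy = "GGC" + "AT" * 8 + "CCT" + "AAG" * 5 + "TTC"
    print(find_perfect_ssrs(toy))
    print(round(ssr_density_per_mbp(112_073, 203_000_000), 1))
```

With the rounded figures quoted in the abstract (112,073 repeats in 203 Mbp) the density comes out at roughly 552 SSRs/Mbp; the reported 551.9 presumably reflects the unrounded assembly length.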
5.  A Review of Microsatellite Markers and Their Applications in Rice Breeding Programs to Improve Blast Disease Resistance 
Over the last few decades, the use of molecular markers has played an increasing role in rice breeding and genetics. Of the different types of molecular markers, microsatellites have been utilized most extensively, because they can be readily amplified by PCR and show a large amount of allelic variation at each locus. Microsatellites are also known as simple sequence repeats (SSR), and they are typically composed of 1–6 nucleotide repeats. These markers are abundant, distributed throughout the genome and are highly polymorphic compared with other genetic markers, as well as being species-specific and co-dominant. For these reasons, they have become increasingly important genetic markers in rice breeding programs. The evolution of new biotypes of pests and diseases as well as the pressures of climate change pose serious challenges to rice breeders, who would like to increase rice production by introducing resistance to multiple biotic and abiotic stresses. Recent advances in rice genomics have now made it possible to identify and map a number of genes through linkage to existing DNA markers. Among the more noteworthy examples of genes that have been tightly linked to molecular markers in rice are those that confer resistance or tolerance to blast. Therefore, in combination with conventional breeding approaches, marker-assisted selection (MAS) can be used to monitor the presence or absence of these genes in breeding populations. For example, marker-assisted backcross breeding has been used to integrate important genes with significant biological effects into a number of commonly grown rice varieties. The use of cost-effective, finely mapped microsatellite markers and MAS strategies should provide opportunities for breeders to develop high-yield, blast-resistant rice cultivars.
The aim of this review is to summarize the current knowledge concerning the linkage of microsatellite markers to rice blast resistance genes, as well as to explore the use of MAS in rice breeding programs aimed at improving blast resistance in this species. We also discuss the various advantages, disadvantages and uses of microsatellite markers relative to other molecular marker types.
PMCID: PMC3856076  PMID: 24240810
simple sequence repeats; marker development and application; blast resistance; marker assisted selection; rice breeding
6.  Microsatellite isolation and marker development in carrot - genomic distribution, linkage mapping, genetic diversity analysis and marker transferability across Apiaceae 
BMC Genomics  2011;12:386.
The Apiaceae family includes several vegetable and spice crop species among which carrot is the most economically important member, with ~21 million tons produced yearly worldwide. Despite its importance, molecular resources in this species are relatively underdeveloped. The availability of informative, polymorphic, and robust PCR-based markers, such as microsatellites (or SSRs), will facilitate genetics and breeding of carrot and other Apiaceae, including integration of linkage maps, tagging of phenotypic traits and assisting positional gene cloning. Thus, with the purpose of isolating carrot microsatellites, two different strategies were used; a hybridization-based library enrichment for SSRs, and bioinformatic mining of SSRs in BAC-end sequence and EST sequence databases. This work reports on the development of 300 carrot SSR markers and their characterization at various levels.
Evaluation of microsatellites isolated from both DNA sources in subsets of 7 carrot F2 mapping populations revealed that SSRs from the hybridization-based method were longer, had more repeat units and were more polymorphic than SSRs isolated by sequence search. Overall, 196 SSRs (65.1%) were polymorphic in at least one mapping population, and the percentage of polymorphic SSRs across F2 populations ranged from 17.8 to 24.7. Polymorphic markers in one family were evaluated in the entire F2, allowing the genetic mapping of 55 SSRs (38 codominant) onto the carrot reference map. The SSR loci were distributed throughout all 9 carrot linkage groups (LGs), with 2 to 9 SSRs/LG. In addition, SSR evaluations in carrot-related taxa indicated that a significant fraction of the carrot SSRs transfer successfully across Apiaceae, with heterologous amplification success rate decreasing with the target-species evolutionary distance from carrot. SSR diversity evaluated in a collection of 65 D. carota accessions revealed a high level of polymorphism for these selected loci, with an average of 19 alleles/locus and 0.84 expected heterozygosity.
The addition of 55 SSRs to the carrot map, together with marker characterizations in six other mapping populations, will facilitate future comparative mapping studies and integration of carrot maps. The markers developed herein will be a valuable resource for assisting breeding, genetic, diversity, and genomic studies of carrot and other Apiaceae.
PMCID: PMC3162538  PMID: 21806822
7.  Rapid Development of Microsatellite Markers for Callosobruchus chinensis Using Illumina Paired-End Sequencing 
PLoS ONE  2014;9(5):e95458.
The adzuki bean weevil, Callosobruchus chinensis L., is one of the most destructive pests of stored legume seeds such as mungbean, cowpea, and adzuki bean, and usually causes considerable loss in the quantity and quality of stored seeds during transportation and storage. However, because genetic information on this pest is lacking, a series of genetic questions remain largely unanswered, including population genetic structure, kinship, biotype abundance, and so on. Co-dominant microsatellite markers offer great resolving power for addressing these questions. Here, we report rapid microsatellite isolation from C. chinensis via high-throughput sequencing.
Principal Findings
In this study, 94,560,852 quality-filtered and trimmed reads were obtained for genome assembly using Illumina paired-end sequencing technology. In total, a genome assembly with a total length of 497,124,785 bp, comprising 403,113 high-quality contigs, was generated de novo. More than 6800 SSR loci were detected, a suite of 6303 primer pair sequences was designed, and 500 of them were randomly selected for validation. Of these, 196 primer pairs (39.2%) produced reproducible amplicons that were polymorphic among 8 C. chinensis genotypes collected from different geographical regions. Twenty of the 196 polymorphic SSR markers were used to analyze the genetic diversity of 18 C. chinensis populations. The results showed that the twenty SSR loci were highly polymorphic among these populations.
This study presents a first report of genome sequencing and de novo assembly for C. chinensis and demonstrates the feasibility of generating a large scale of sequence information and SSR loci isolation by Illumina paired-end sequencing. Our results provide a valuable resource for C. chinensis research. These novel markers are valuable for future genetic mapping, trait association, genetic structure and kinship among C. chinensis.
PMCID: PMC4023940  PMID: 24835431
8.  Nonrandom distribution and frequencies of genomic and EST-derived microsatellite markers in rice, wheat, and barley 
BMC Genomics  2005;6:23.
Earlier comparative maps between the genomes of rice (Oryza sativa L.), barley (Hordeum vulgare L.) and wheat (Triticum aestivum L.) were linkage maps based on cDNA-RFLP markers. The low number of polymorphic RFLP markers has limited the development of dense genetic maps in wheat and the number of available anchor points in comparative maps. Higher density comparative maps using PCR-based anchor markers are necessary to better estimate the conservation of colinearity among cereal genomes. The purposes of this study were to characterize the proportion of transcribed DNA sequences containing simple sequence repeats (SSR or microsatellites) by length and motif for wheat, barley and rice and to determine in-silico rice genome locations for primer sets developed for wheat and barley Expressed Sequence Tags.
The proportions of SSR types (di-, tri-, tetra-, and penta-nucleotide repeats) and motifs varied with the length of the SSRs within and among the three species, with trinucleotide SSRs being the most frequent. Distributions of genomic microsatellites (gSSRs), EST-derived microsatellites (EST-SSRs), and transcribed regions in the contiguous sequence of rice chromosome 1 were highly correlated. More than 13,000 primer pairs were developed for use by the cereal research community as potential markers in wheat, barley and rice.
Trinucleotide SSRs were the most common type in each of the species; however, the relative proportions of SSR types and motifs differed among rice, wheat, and barley. Genomic microsatellites were found to be primarily located in gene-rich regions of the rice genome. Microsatellite markers derived from the use of non-redundant EST-SSRs are an economic and efficient alternative to RFLP for comparative mapping in cereals.
PMCID: PMC550658  PMID: 15720707
9.  Highly Informative Single-Copy Nuclear Microsatellite DNA Markers Developed Using an AFLP-SSR Approach in Black Spruce (Picea mariana) and Red Spruce (P. rubens) 
PLoS ONE  2014;9(8):e103789.
Microsatellites or simple sequence repeats (SSRs) are highly informative molecular markers for various biological studies in plants. In spruce (Picea) and other conifers, the development of single-copy polymorphic genomic microsatellite markers is quite difficult, owing primarily to the large genome size and predominance of repetitive DNA sequences throughout the genome. We have developed highly informative single-locus genomic microsatellite markers in black spruce (Picea mariana) and red spruce (Picea rubens) using a simple but efficient method based on a combination of AFLP and microsatellite technologies.
Principal Findings
A microsatellite-enriched library was constructed from genomic AFLP DNA fragments of black spruce. Sequencing of the 108 putative SSR-containing clones provided 94 unique sequences with microsatellites. Twenty-two of the designed 34 primer pairs yielded scorable amplicons, with single-locus patterns. Fourteen of these microsatellite markers were characterized in 30 black spruce and 30 red spruce individuals drawn from many populations. The number of alleles at a polymorphic locus ranged from 2 to 18, with a mean of 9.3 in black spruce, and from 3 to 15, with a mean of 6.2 alleles in red spruce. The polymorphic information content or expected heterozygosity ranged from 0.340 to 0.909 (mean = 0.67) in black spruce and from 0.161 to 0.851 (mean = 0.62) in red spruce. Ten SSR markers showing inter-parental polymorphism were inherited in a single-locus Mendelian mode, with two cases of distorted segregation. Primer pairs for almost all polymorphic SSR loci resolved microsatellites of comparable size in Picea glauca, P. engelmannii, P. sitchensis, and P. abies.
The AFLP-based microsatellite-enriched library appears to be a rapid, cost-effective approach for isolating and developing single-locus informative genomic microsatellite markers in black spruce. The markers developed should be useful in black spruce, red spruce and other Picea species for various genetics, genomics, breeding, forensics, conservation studies and applications.
PMCID: PMC4134192  PMID: 25126846
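Several entries in this list report observed heterozygosity, expected heterozygosity and polymorphic information content (PIC) per locus. A minimal Python sketch of the textbook definitions follows, assuming diploid genotypes scored as pairs of allele sizes at a single locus. The function names are hypothetical, and the abstracts do not state which estimators or small-sample corrections the authors applied, so this should be read as a worked definition rather than a reproduction of their analyses.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """genotypes: list of (allele_a, allele_b) tuples, one per diploid individual."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2 (Botstein et al. definition)."""
    p = list(freqs.values())
    cross = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1.0 - sum(x * x for x in p) - cross


if __name__ == "__main__":
    # Toy locus with two alleles (sizes 100 and 104 bp) in four individuals.
    locus = [(100, 100), (100, 104), (104, 104), (100, 104)]
    freqs = allele_frequencies(locus)
    print(observed_heterozygosity(locus))
    print(expected_heterozygosity(freqs))
    print(polymorphic_information_content(freqs))
```

For a locus with many alleles at similar frequencies, PIC and expected heterozygosity converge, which may be why the two are sometimes reported interchangeably.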
10.  Characterization of novel microsatellite markers in Musa acuminata subsp. burmannicoides, var. Calcutta 4 
BMC Research Notes  2010;3:148.
Banana is a nutritionally important crop across tropical and sub-tropical countries in sub-Saharan Africa, Central and South America and Asia. Although cultivars have evolved from diploid, triploid and tetraploid wild Asian species of Musa acuminata (A genome) and Musa balbisiana (B genome), many of today's commercial cultivars are sterile triploids or diploids, with fruit developing via parthenocarpy. As a result of restricted genetic variation, improvement has been limited, resulting in a crop frequently lacking resistance to pests and disease. Considering the importance of molecular tools to facilitate development of disease resistant genotypes, the objectives of this study were to develop polymorphic microsatellite markers from BAC clone sequences for M. acuminata subsp. burmannicoides, var. Calcutta 4. This wild diploid species is used as a donor cultivar in breeding programs as a source of resistance to diverse biotic stresses.
Microsatellite sequences were identified from five Calcutta 4 BAC consensi datasets. Specific primers were designed for 41 loci. Isolated di-nucleotide repeat motifs were the most abundant, followed by tri-nucleotides. From 33 tested loci, 20 displayed polymorphism when screened across 21 diploid M. acuminata accessions, contrasting in resistance to Sigatoka diseases. The number of alleles per SSR locus ranged from two to four, with a total of 56. Six repeat classes were identified, with di-nucleotides the most abundant. Expected heterozygosity values for polymorphic markers ranged from 0.31 to 0.75.
This is the first report identifying polymorphic microsatellite markers from M. acuminata subsp. burmannicoides, var. Calcutta 4 across accessions contrasting in resistance to Sigatoka diseases. These BAC-derived polymorphic microsatellite markers are a useful resource for banana, applicable for genetic map development, germplasm characterization, evolutionary studies and marker assisted selection for traits.
PMCID: PMC2893197  PMID: 20507605
11.  Conspecific Crop-Weed Introgression Influences Evolution of Weedy Rice (Oryza sativa f. spontanea) across a Geographical Range 
PLoS ONE  2011;6(1):e16189.
Introgression plays an important role in evolution of plant species via its influences on genetic diversity and differentiation. Outcrossing determines the level of introgression but little is known about the relationships of outcrossing rates, genetic diversity, and differentiation particularly in a weedy taxon that coexists with its conspecific crop.
Methodology/Principal Findings
Eleven weedy rice (Oryza sativa f. spontanea) populations from China were analyzed using microsatellite (SSR) fingerprints to study outcrossing rate and its relationship with genetic variability and differentiation. To estimate outcrossing, six highly polymorphic SSR loci were used to analyze >5500 progeny from 216 weedy rice families, applying a mixed mating model; to estimate genetic diversity and differentiation, 22 SSR loci were analyzed based on 301 weedy individuals. Additionally, four weed-crop shared SSR loci were used to estimate the influence of introgression from rice cultivars on weedy rice differentiation. Outcrossing rates varied significantly (0.4∼11.7%) among weedy rice populations showing relatively high overall Nei's genetic diversity (0.635). The observed heterozygosity was significantly correlated with outcrossing rates among populations (r2 = 0.783; P<0.001) although no obvious correlation between outcrossing rates and genetic diversity parameters was observed. Allelic introgression from rice cultivars to their coexisting weedy rice was detected. Weedy rice populations demonstrated considerable genetic differentiation that was correlated with their spatial distribution (r2 = 0.734; P<0.001), and possibly also influenced by the introgression from rice cultivars.
Outcrossing rates can significantly affect heterozygosity of populations, which may shape the evolutionary potential of weedy rice. Introgression from the conspecific crop rice can influence the genetic differentiation and possibly evolution of its coexisting weedy rice populations.
PMCID: PMC3020953  PMID: 21249201
12.  Large-scale identification of polymorphic microsatellites using an in silico approach 
BMC Bioinformatics  2008;9:374.
Simple Sequence Repeat (SSR) or microsatellite markers are valuable for genetic research. Experimental methods to develop SSR markers are laborious, time consuming and expensive. In silico approaches have become a practicable and relatively inexpensive alternative during the last decade, although testing putative SSR markers still is time consuming and expensive. In many species only a relatively small percentage of SSR markers turn out to be polymorphic. This is particularly true for markers derived from expressed sequence tags (ESTs). In EST databases a large redundancy of sequences is present, which may contain information on length-polymorphisms in the SSR they contain, and whether they have been derived from heterozygotes or from different genotypes. Up to now, although a number of programs have been developed to identify SSRs in EST sequences, no software can detect putatively polymorphic SSRs.
We have developed PolySSR, a new pipeline to identify polymorphic SSRs rather than just SSRs. Sequence information is obtained from public EST databases derived from heterozygous individuals and/or at least two different genotypes. The pipeline includes PCR-primer design for the putatively polymorphic SSR markers, taking into account Single Nucleotide Polymorphisms (SNPs) in the flanking regions, thereby improving the success rate of the potential markers. A large number of polymorphic SSRs were identified using publicly available EST sequences of potato, tomato, rice, Arabidopsis, Brassica and chicken.
The SSRs obtained were divided into long and short based on the number of times the motif was repeated. Surprisingly, the frequency of polymorphic SSRs was much higher in the short SSRs.
PolySSR is a very effective tool to identify polymorphic SSRs. Using PolySSR, several hundred putative markers were developed and stored in a searchable database. Validation experiments showed that almost all markers that were indicated as putatively polymorphic by PolySSR were indeed polymorphic. This greatly improves the efficiency of marker development, especially in species where there are low levels of polymorphism, like tomato. When combined with the new sequencing technologies, PolySSR will have a big impact on the development of polymorphic SSRs in any species.
PolySSR and the polymorphic SSR marker database are available from .
PMCID: PMC2562394  PMID: 18793407
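The idea behind entry 12, that redundant EST reads covering the same SSR can reveal length polymorphism before any wet-lab testing, can be sketched in a few lines of Python. This is not the PolySSR pipeline: the function name, the regular expression, the 4-copy threshold and the 8-bp flank length are all assumptions made for the example, and the sketch ignores the SNP-aware primer design and genotype bookkeeping that the abstract describes.

```python
import re
from collections import defaultdict

# Di- and trinucleotide motifs repeated at least 4 times (assumed threshold).
SSR_RE = re.compile(r"([ACGT]{2,3})\1{3,}")


def putatively_polymorphic_ssrs(reads, flank=8):
    """Group SSRs by motif plus exact flanking sequence and keep loci whose
    repeat count differs between reads."""
    counts_by_locus = defaultdict(set)
    for read in reads:
        read = read.upper()
        for m in SSR_RE.finditer(read):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # ignore homopolymer runs matched as di/tri motifs
            left = read[max(0, m.start() - flank):m.start()]
            right = read[m.end():m.end() + flank]
            if len(left) < flank or len(right) < flank:
                continue  # need full flanks to treat reads as the same locus
            counts_by_locus[(left, motif, right)].add(len(m.group(0)) // len(motif))
    return {
        locus: sorted(counts)
        for locus, counts in counts_by_locus.items()
        if len(counts) > 1
    }


if __name__ == "__main__":
    reads = [
        "AC" + "GATTACGG" + "CA" * 5 + "TTGGCCAT" + "GT",
        "AC" + "GATTACGG" + "CA" * 7 + "TTGGCCAT" + "GT",
        "TTCCGGAATC" + "AAG" * 4 + "CGTACCGTAA",
    ]
    print(putatively_polymorphic_ssrs(reads))
```

Requiring identical flanks is a deliberately strict choice for the sketch: a single sequencing error or SNP in a flank splits one locus into two, which is exactly the kind of case a production pipeline has to handle with alignment rather than exact matching.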
13.  Development and Application of Microsatellites in Candidate Genes Related to Wood Properties in the Chinese White Poplar (Populus tomentosa Carr.) 
Gene-derived simple sequence repeats (genic SSRs), also known as functional markers, are often preferred over random genomic markers because they represent variation in gene coding and/or regulatory regions. We characterized 544 genic SSR loci derived from 138 candidate genes involved in wood formation, distributed throughout the genome of Populus tomentosa, a key ecological and cultivated wood production species. Of these SSRs, three-quarters were located in the promoter or intron regions, and dinucleotide (59.7%) and trinucleotide repeat motifs (26.5%) predominated. By screening 15 wild P. tomentosa ecotypes, we identified 188 polymorphic genic SSRs with 861 alleles, 2–7 alleles for each marker. Transferability analysis of 30 random genic SSRs, testing whether these SSRs work in 26 genotypes of five genus Populus sections (outgroup, Salix matsudana), showed that 72% of the SSRs could be amplified in Turanga and 100% could be amplified in Leuce. Based on genotyping of these 26 genotypes, a neighbour-joining analysis showed the expected six phylogenetic groupings. In silico analysis of SSR variation in 220 sequences that are homologous between P. tomentosa and Populus trichocarpa suggested that genic SSR variations between relatives were predominantly affected by repeat motif variations or flanking sequence mutations. Inheritance tests and single-marker associations demonstrated the power of genic SSRs in family-based linkage mapping and candidate gene-based association studies, as well as marker-assisted selection and comparative genomic studies of P. tomentosa and related species.
PMCID: PMC3576656  PMID: 23213110
candidate gene-derived SSRs; cross-species transferability; in silico analysis of SSR variations; Populus tomentosa; single marker–trait association mapping
14.  A White Campion (Silene latifolia) floral expressed sequence tag (EST) library: annotation, EST-SSR characterization, transferability, and utility for comparative mapping 
BMC Genomics  2009;10:243.
Expressed sequence tag (EST) databases represent a valuable resource for the identification of genes in organisms with uncharacterized genomes and for development of molecular markers. One class of markers derived from EST sequences are simple sequence repeat (SSR) markers, also known as EST-SSRs. These are useful in plant genetic and evolutionary studies because they are located in transcribed genes and a putative function can often be inferred from homology searches. Another important feature of EST-SSR markers is their expected high level of transferability to related species that makes them very promising for comparative mapping. In the present study we constructed a normalized EST library from floral tissue of Silene latifolia with the aim to identify expressed genes and to develop polymorphic molecular markers.
We obtained a total of 3662 high quality sequences from a normalized Silene cDNA library. These represent 3105 unigenes, with 73% of unigenes matching genes in other species. We found 255 sequences containing one or more SSR motifs. More than 60% of these SSRs were trinucleotides. A total of 30 microsatellite loci were identified from 106 ESTs having sufficient flanking sequences for primer design. The inheritance of these loci was tested via segregation analyses and their usefulness for linkage mapping was assessed in an interspecific cross. Tests for crossamplification of the EST-SSR loci in other Silene species established their applicability to related species.
The newly characterized genes and gene-derived markers from our Silene EST library represent a valuable genetic resource for future studies on Silene latifolia and related species. The polymorphism and transferability of EST-SSR markers facilitate comparative linkage mapping and analyses of genetic diversity in the genus Silene.
PMCID: PMC2689282  PMID: 19467153
15.  Vibrio vulnificus Typing Based on Simple Sequence Repeats: Insights into the Biotype 3 Group 
Journal of Clinical Microbiology  2007;45(9):2951-2959.
Vibrio vulnificus is an opportunistic, highly invasive human pathogen with worldwide distribution. V. vulnificus strains are commonly divided into three biochemical groups (biotypes), most members of which are pathogenic. Simple sequence repeats (SSR) provide a source of high-level genomic polymorphism used in bacterial typing. Here, we describe the use of variations in mutable SSR loci for accurate and rapid genotyping of V. vulnificus. An in silico screen of the genomes of two V. vulnificus strains revealed thousands of SSR tracts. Twelve SSR with core motifs longer than 5 bp in a panel of 32 characterized and 56 other V. vulnificus isolates, including both clinical and environmental isolates from all three biotypes, were tested for polymorphism. All tested SSR were polymorphic, and diversity indices ranged from 0.17 to 0.90, allowing a high degree of discrimination among isolates (27 of 32 characterized isolates). Genetic analysis of the SSR data resulted in the clear distinction of isolates that belong to the highly virulent biotype 3 group. Despite the clonal nature of this new group, SSR analysis demonstrated high-level discriminatory power within the biotype 3 group, as opposed to other molecular methods that failed to differentiate these isolates. Thus, SSR are suitable for rapid typing and classification of V. vulnificus strains by high-throughput capillary electrophoresis methods. SSR (≥5 bp) by their nature enable the identification of variations occurring on a small scale and, therefore, may provide new insights into the newly emerged biotype 3 group of V. vulnificus and may be used as an efficient tool in epidemiological studies.
PMCID: PMC2045284  PMID: 17652479
16.  Isolation of New 40 Microsatellite Markers in Mandarin Fish (Siniperca chuatsi) 
In this study, 23 genomic microsatellite DNA markers and 17 expressed sequence tag (EST)-derived microsatellites were developed and characterized using the fast isolation by AFLP of sequences containing repeats (FIASCO) method and data mining from public EST databases of mandarin fish (Siniperca chuatsi). These microsatellite markers were then tested for polymorphism in a wild S. chuatsi population. The number of alleles at 23 genomic SSRs varied from 2 to 19 with an average of 8.0 alleles per locus. The average observed and expected heterozygosities were 0.746 and 0.711, respectively. Of 5361 EST sequences examined, 3.9% (209) contain microsatellites, and di-nucleotide repeats are the most abundant (67.0%), followed by tri-nucleotide (29.7%) and tetra-nucleotide repeats (3.3%). The number of alleles at 17 EST-SSRs varied from 2 to 17 with an average of 8.4 alleles per locus. The average observed and expected heterozygosities were 0.789 and 0.685, respectively. No significant difference of loci polymorphism was found between genomic SSRs and EST-SSRs in terms of number of alleles and heterozygosities. Results of cross-species utility indicated that 13 (52.2%) of the genomic-SSRs and 13 (76.5%) of the EST-SSRs were successfully cross-amplified in a related species, the golden mandarin fish (Siniperca scherzeri).
PMCID: PMC3155344  PMID: 21845071
microsatellite; genomic SSRs; EST-SSRs; Siniperca chuatsi
17.  Genome-wide identification of microsatellites in white clover (Trifolium repens L.) using FIASCO and phpSSRMiner 
Plant Methods  2008;4:19.
Allotetraploid white clover (Trifolium repens L.) is an important forage legume widely cultivated in most temperate regions. Only a small number of microsatellite markers are publicly available and can be utilized in white clover breeding programs. The objectives of this study were to develop an integrated approach for microsatellite development and to evaluate the approach for the development of new SSR markers for white clover.
Genomic libraries containing simple sequence repeat (SSR) sequences were constructed using a modified Fast Isolation by AFLP of Sequences COntaining repeats (FIASCO) procedure and phpSSRMiner was used to develop the microsatellite markers. SSR motifs were isolated using two biotin-labeled probes, (CA)17 and (ATG)12. The sequences of 6,816 clones were assembled into 1,698 contigs, 32% of which represented novel sequences based on BLASTN searches. Approximately 32%, 28%, and 16% of these SSRs contained hexa-, tri-, and di-nucleotide repeats, respectively. The most frequent motifs were the CA and ATG complementary repeats and the associated compound sequences. Primer pairs were designed for 859 SSR loci based on sequences from these genomic libraries and from GenBank white clover nucleotide sequences. A total of 191 SSR primers developed from the two libraries were tested for polymorphism in individual clones from the parental genotypes GA43 ('Durana'), 'SRVR' and six F1 progeny from a mapping population. Ninety two percent produced amplicons and 66% of these were polymorphic.
The combined approach of identifying SSR-enriched fragments by FIASCO coupled with the primer design and in silico amplification using phpSSRMiner represents an efficient and low cost pipeline for the large-scale development of microsatellite markers in plants.
The approach described here could be readily adapted and utilized in other, unrelated species with no or limited genomic resources.
PMCID: PMC2517061  PMID: 18631390
18.  Genome-Wide Microsatellite Identification in the Fungus Anisogramma anomala Using Illumina Sequencing and Genome Assembly 
PLoS ONE  2013;8(11):e82408.
High-throughput sequencing has been dramatically accelerating the discovery of microsatellite markers (also known as Simple Sequence Repeats). Both 454 and Illumina reads have been used directly in microsatellite discovery and primer design (the “Seq-to-SSR” approach). However, constraints of this approach include: 1) many microsatellite-containing reads do not have sufficient flanking sequences to allow primer design, and 2) difficulties in removing microsatellite loci residing in longer, repetitive regions. In the current study, we applied the novel “Seq-Assembly-SSR” approach to overcome these constraints in Anisogramma anomala. In our approach, Illumina reads were first assembled into a draft genome, and the latter was then used in microsatellite discovery. A. anomala is an obligate biotrophic ascomycete that causes eastern filbert blight disease of commercial European hazelnut. Little is known about its population structure or diversity. Approximately 26 M 146 bp Illumina reads were generated from a paired-end library of a fungal strain from Oregon. The reads were assembled into a draft genome of 333 Mb (excluding gaps), with contig N50 of 10,384 bp and scaffold N50 of 32,987 bp. A bioinformatics pipeline identified 46,677 microsatellite motifs at 44,247 loci, including 2,430 compound loci. Primers were successfully designed for 42,923 loci (97%). After removing 2,886 loci close to assembly gaps and 676 loci in repetitive regions, a genome-wide microsatellite database of 39,361 loci was generated for the fungus. In experimental screening of 236 loci using four geographically representative strains, 228 (96.6%) were successfully amplified and 214 (90.7%) produced single PCR products. Twenty-three (9.7%) were found to be perfect polymorphic loci. A small-scale population study using 11 polymorphic loci revealed considerable gene diversity. Clustering analysis grouped isolates of this fungus into two clades in accordance with their geographic origins. 
Thus, the “Seq-Assembly-SSR” approach has proven to be a successful one for microsatellite discovery.
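The contig and scaffold N50 values quoted above summarize assembly contiguity: N50 is the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembled bases. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are sorted from longest to shortest; N50 is the length at
    which the cumulative sum first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy assembly (lengths in bp), not data from the study.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: 80 + 70 = 150 reaches half of 300
```

Applying the same function to contig lengths and then to scaffold lengths would give the two figures reported for a draft assembly.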
PMCID: PMC3842260  PMID: 24312419
19.  Characterization of simple sequence repeats (SSRs) from Phlebotomus papatasi (Diptera: Psychodidae) expressed sequence tags (ESTs) 
Parasites & Vectors  2011;4:189.
Phlebotomus papatasi is a natural vector of Leishmania major, which causes cutaneous leishmaniasis in many countries. Simple sequence repeats (SSRs), or microsatellites, are common in eukaryotic genomes and are short, repeated nucleotide sequence elements arrayed in tandem and flanked by non-repetitive regions. The enrichment methods used previously for finding new microsatellite loci in sand flies remain laborious and time consuming; in silico mining, which involves retrieving and screening microsatellites from large amounts of data in sequence databases using microsatellite search tools, can yield many new candidate markers.
Simple sequence repeats (SSRs) were characterized in P. papatasi expressed sequence tags (ESTs) derived from a public database, National Center for Biotechnology Information (NCBI). A total of 42,784 sequences were mined, and 1,499 SSRs were identified with a frequency of 3.5% and an average density of 15.55 kb per SSR. Dinucleotide motifs were the most common SSRs, accounting for 67%, followed by tri-, tetra-, and penta-nucleotide repeats, accounting for 31.1%, 1.5%, and 0.1%, respectively. The length of microsatellites varied from 5 to 16 repeats. Among dinucleotide types, AG and CT had the highest frequency. Dinucleotide SSR-ESTs are relatively biased toward an excess of (AX)n repeats and a low GC base content. Forty primer pairs were designed based on motif lengths for further experimental validation.
The first large-scale survey of SSRs derived from P. papatasi is presented; dinucleotide SSRs identified are more frequent than other types. EST data mining is an effective strategy to identify functional microsatellites in P. papatasi.
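In silico SSR mining of the kind described above can be sketched with a regular-expression scan for perfect tandem repeats. This is a minimal illustration, not the search tool used in the study: the minimum repeat thresholds in `MIN_REPEATS`, the function names, and the toy sequences are all assumptions, and the summary statistics use one plausible definition each (share of sequences with at least one SSR, and kilobases scanned per SSR).

```python
import re

# Assumed thresholds: motif size (bp) -> minimum number of perfect repeats.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in min_repeats.items():
        # A motif of `size` bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. "AGAG" counted as a 4-mer
                hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


def summarize(seqs):
    """Count SSRs and report simple frequency and density figures."""
    per_seq = [find_ssrs(s) for s in seqs]
    n_ssr = sum(len(h) for h in per_seq)
    total_kb = sum(len(s) for s in seqs) / 1000
    return {
        "ssr_count": n_ssr,
        "pct_sequences_with_ssr": 100 * sum(1 for h in per_seq if h) / len(seqs),
        "kb_per_ssr": total_kb / n_ssr if n_ssr else float("inf"),
    }


# Hypothetical toy sequences, not EST data from the study.
toy = "TTGC" + "AG" * 8 + "CCGTA" + "ATG" * 5 + "GGC"
print(find_ssrs(toy))
print(summarize([toy, "ACGTTGCA" * 5]))
```

The scan reports only perfect repeats; imperfect and compound SSRs, which dedicated tools also classify, are outside this sketch.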
PMCID: PMC3191335  PMID: 21958493
20.  Simple sequence repeat variation in the Daphnia pulex genome 
BMC Genomics  2010;11:691.
Simple sequence repeats (SSRs) are highly variable features of all genomes. Their rapid evolution makes them useful for tracing the evolutionary history of populations and investigating patterns of selection and mutation across genomes. The recently sequenced Daphnia pulex genome provides us with a valuable data set to study the mode and tempo of SSR evolution, without the inherent biases that accompany marker selection.
Here we catalogue SSR loci in the Daphnia pulex genome with repeated motif sizes of 1-100 nucleotides with a minimum of 3 perfect repeats. We then used whole genome shotgun reads to determine the average heterozygosity of each SSR type and the relationship that it has to repeat number, motif size, motif sequence, and distribution of SSR loci. We find that SSR heterozygosity is motif specific, and positively correlated with repeat number as well as motif size. For non-repeat unit polymorphisms, we identify a motif-dependent end-nucleotide polymorphism bias that may contribute to the patterns of abundance for specific homopolymers, dimers, and trimers. Our observations confirm the high frequency of multiple unit variation (multistep) at large microsatellite loci, and further show that the occurrence of multiple unit variation is dependent on both repeat number and motif size. Using the Daphnia pulex genetic map, we show a positive correlation between dimer and trimer frequency and recombination.
This genome-wide analysis of SSR variation in Daphnia pulex indicates that several aspects of SSR variation are motif dependent and suggests that a combination of unit length variation and end repeat biased base substitution contribute to the unique spectrum of SSR repeat loci.
PMCID: PMC3017760  PMID: 21129182
21.  Identity and diversity of blood meal hosts of biting midges (Diptera: Ceratopogonidae: Culicoides Latreille) in Denmark 
Parasites & Vectors  2012;5:143.
Host preference studies in haematophagous insects, e.g. Culicoides biting midges, are pivotal for assessing transmission routes of vector-borne diseases and critical for the development of veterinary contingency plans that identify which species should be included due to their risk potential. Species of Culicoides have been found in almost all parts of the world and are known to live in a variety of habitats. Several parasites and viruses are transmitted by Culicoides biting midges, including Bluetongue virus and Schmallenberg virus. The aim of the present study was to determine the identity and diversity of blood meals taken from vertebrate hosts in wild-caught Culicoides biting midges near livestock farms.
Biting midges were collected at weekly intervals for 20 weeks from May to October 2009 using light traps at four collection sites on the island Sealand, Denmark. Blood-fed female biting midges were sorted and head and wings were removed for morphological species identification. The thoraxes and abdomens including the blood meals of the individual females were subsequently subjected to DNA isolation. The molecular marker cytochrome oxidase I (COI barcode) was applied to identify the species of the collected biting midges (GenBank accessions JQ683259-JQ683374). The blood meals were first screened with a species-specific cytochrome b primer pair for cow and if negative with a universal cytochrome b primer pair followed by sequencing to identify mammal or avian blood meal hosts.
Twenty-four species of biting midges were identified from the four study sites. A total of 111,356 Culicoides biting midges were collected, of which 2,164 were blood-fed. Specimens of twenty species were identified with blood in their abdomens. Blood meal sources were successfully identified by DNA sequencing from 242 (76%) out of 320 Culicoides specimens. Eight species of mammals and seven species of birds were identified as blood meal hosts. The most common host species was the cow, which constituted 77% of the identified blood meals. The second most numerous host species was the common wood pigeon, which constituted 6% of the identified blood meals.
Our results suggest that some Culicoides species are opportunistic and readily feed on a variety of mammals and birds, while others seem to be strictly mammalophilic or ornithophilic. Based on their number, dispersal potential and blood feeding behaviour, we conclude that Culicoides biting midges are potential vectors for many pathogens not yet introduced to Denmark.
PMCID: PMC3461417  PMID: 22824422
COI barcoding; Bluetongue virus; Schmallenberg virus; Blood meal host; Culicoides; Ornithophilic insects; Mammalophilic insects
22.  Frequency, type, and distribution of EST-SSRs from three genotypes of Lolium perenne, and their conservation across orthologous sequences of Festuca arundinacea, Brachypodium distachyon, and Oryza sativa 
BMC Plant Biology  2007;7:36.
Simple sequence repeat (SSR) markers are highly informative and widely used for genetic and breeding studies in several plant species. They are used for cultivar identification, variety protection, as anchor markers in genetic mapping, and in marker-assisted breeding. Currently, a limited number of SSR markers are publicly available for perennial ryegrass (Lolium perenne). We report on the exploitation of a comprehensive EST collection in L. perenne for SSR identification. The objectives of this study were 1) to analyse the frequency, type, and distribution of SSR motifs in ESTs derived from three genotypes of L. perenne, 2) to perform a comparative analysis of SSR motif polymorphisms between allelic sequences, 3) to conduct a comparative analysis of SSR motif polymorphisms between orthologous sequences of L. perenne, Festuca arundinacea, Brachypodium distachyon, and O. sativa, 4) to identify functionally associated EST-SSR markers for application in comparative genomics and breeding.
From 25,744 ESTs, representing 8.53 megabases of nucleotide information from three genotypes of L. perenne, 1,458 ESTs (5.7%) contained one or more SSRs. Of these SSRs, 955 (3.7%) were non-redundant. Tri-nucleotide repeats were the most abundant type of repeats followed by di- and tetra-nucleotide repeats. The EST-SSRs from the three genotypes were analysed for allelic- and/or genotypic SSR motif polymorphisms. Most of the SSR motifs (97.7%) showed no polymorphisms, whereas 22 EST-SSRs showed allelic- and/or genotypic polymorphisms. All polymorphisms identified were changes in the number of repeat units. Comparative analysis of the L. perenne EST-SSRs with sequences of Festuca arundinacea, Brachypodium distachyon, and Oryza sativa identified 19 clusters of orthologous sequences between these four species. Analysis of the clusters showed that the SSR motif is generally conserved in the closely related species F. arundinacea, but often differs in length. In contrast, SSR motifs are often lost in the more distantly related species B. distachyon and O. sativa.
The results indicate that the L. perenne EST-SSR markers are a valuable resource for genetic mapping, as well as evaluation of co-location between QTLs and functionally associated markers.
PMCID: PMC1950305  PMID: 17626623
23.  Use of genome sequence data in the design and testing of SSR markers for Phytophthora species 
BMC Genomics  2008;9:620.
Microsatellites or simple sequence repeats (SSRs) are a powerful choice of marker in the study of Phytophthora population biology, epidemiology, ecology, genetics and evolution. A strategy was tested in which the publicly available unigene datasets extracted from genome sequences of P. infestans, P. sojae and P. ramorum were mined for candidate SSR markers that could be applied to a wide range of Phytophthora species.
A first approach, aimed at the identification of polymorphic SSR loci common to many Phytophthora species, yielded 171 reliable sequences containing 211 SSRs. Microsatellites were identified from 16 target species representing the breadth of diversity across the genus. Repeat number ranged from 3 to 16, with most SSRs having seven repeats or fewer and four being the most commonly found. Trinucleotide repeats such as (AAG)n, (AGG)n and (AGC)n were the most common, followed by pentanucleotide, tetranucleotide and dinucleotide repeats. A second approach was aimed at the identification of useful loci common to a restricted number of species more closely related to P. sojae (P. alni, P. cambivora, P. europaea and P. fragariae). This analysis yielded 10 trinucleotide and 2 tetranucleotide SSRs which were repeated 4, 5 or 6 times.
Key studies on inter- and intra-specific variation of selected microsatellites remain. Despite the screening of conserved gene coding regions, the sequence diversity between species was high and the identification of useful SSR loci applicable to anything other than the most closely related pairs of Phytophthora species was challenging. That said, many novel SSR loci for species other than the three 'source species' (P. infestans, P. sojae and P. ramorum) are reported, offering great potential for the investigation of Phytophthora populations. In addition to the presence of microsatellites, many of the amplified regions may represent useful molecular marker regions for other studies as they are highly variable and easily amplifiable from different Phytophthora species.
PMCID: PMC2647557  PMID: 19099584
24.  Identification, characterization and utilization of unigene derived microsatellite markers in tea (Camellia sinensis L.) 
BMC Plant Biology  2009;9:53.
Despite great advances in genomic technology observed in several crop species, the availability of molecular tools such as microsatellite markers has been limited in tea (Camellia sinensis L.). The development of microsatellite markers will have a major impact on genetic analysis, gene mapping and marker assisted breeding. Unigene derived microsatellite (UGMS) markers identified from publicly available sequence database have the advantage of assaying variation in the expressed component of the genome with unique identity and position. Therefore, they can serve as efficient and cost effective alternative markers in such species.
Considering the multiple advantages of UGMS markers, 1,223 unigenes were predicted from 2,181 expressed sequence tags (ESTs) of tea (Camellia sinensis L.). A total of 109 (8.9%) unigenes containing 120 SSRs were identified. SSR abundance was one in every 3.55 kb of EST sequences. The microsatellites mainly comprised di- (50.8%), tri- (30.8%), tetra- (6.6%), penta- (7.5%) and a few hexa- (4.1%) nucleotide repeats. Among the dinucleotide repeats, (GA)n.(TC)n were most abundant (83.6%). Ninety-six primer pairs could be designed from 83.5% of SSR-containing unigenes. Of these, 61 (63.5%) primer pairs were experimentally validated and used to investigate the genetic diversity among the 34 accessions of different Camellia spp. Fifty-one primer pairs (83.6%) were successfully cross-transferred to the related species at various levels. Functional annotation of the unigenes containing SSRs was done through gene ontology (GO) characterization. Thirty-six (60%) of them revealed significant sequence similarity with the known/putative proteins of Arabidopsis thaliana. Polymorphism information content (PIC) ranged from 0.018 to 0.972 with a mean value of 0.497. The average expected (HE) and observed (Ho) heterozygosities were 0.654 and 0.413, respectively, suggesting the highly heterogeneous nature of tea. Further, tests under the IAM and SMM models for the UGMS loci showed excess heterozygosity and no evidence of a bottleneck in the tea population.
UGMS markers identified and characterized in this study provided insight into the abundance and distribution of SSRs in the expressed genome of C. sinensis. The identification and validation of 61 new UGMS markers will not only help in intra- and inter-specific genetic diversity assessment but also enrich the limited microsatellite marker resources in tea. Further, the use of these markers would reduce the cost of, and facilitate, gene mapping and marker-aided selection in tea. Because 36 of these UGMS markers correspond to Arabidopsis protein sequences with known functions, they offer the opportunity to investigate the consequences of SSR polymorphism on gene function.
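The per-locus diversity statistics reported above (number of alleles, Ho, HE and PIC) can be illustrated with a short calculation from diploid genotypes. This is a sketch under stated assumptions rather than the software used in the study: it assumes diploid genotypes given as pairs of allele sizes, uses the uncorrected estimate HE = 1 - Σp_i², and uses the PIC formula of Botstein et al. (1980), PIC = 1 - Σp_i² - Σ_{i<j} 2p_i²p_j². The function name `locus_stats` and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Compute allele number, Ho, He and PIC for one diploid locus.

    `genotypes` is a list of (allele_a, allele_b) pairs, one per individual.
    """
    alleles = [a for g in genotypes for a in g]
    counts = Counter(alleles)
    freqs = [c / len(alleles) for c in counts.values()]

    # Expected heterozygosity (gene diversity), without sample-size correction.
    he = 1 - sum(p * p for p in freqs)
    # PIC subtracts the pairwise term 2 * p_i^2 * p_j^2 for every allele pair.
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    # Observed heterozygosity: share of individuals with two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)

    return {"n_alleles": len(counts), "Ho": ho, "He": he, "PIC": pic}


# Hypothetical genotypes (allele sizes in bp) for four individuals.
toy_genotypes = [(150, 150), (150, 154), (154, 154), (150, 154)]
print(locus_stats(toy_genotypes))
```

With two equally frequent alleles, as in the toy data, HE is 0.5 and PIC is 0.375; averaging such per-locus values over all loci gives study-level means of the kind quoted in the abstract.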
PMCID: PMC2693106  PMID: 19426565
25.  An annotated genetic map of loblolly pine based on microsatellite and cDNA markers 
BMC Genetics  2011;12:17.
Previous loblolly pine (Pinus taeda L.) genetic linkage maps have been based on a variety of DNA polymorphisms, such as AFLPs, RAPDs, RFLPs, and ESTPs, but only a few SSRs (simple sequence repeats), also known as simple tandem repeats or microsatellites, have been mapped in P. taeda. The objective of this study was to integrate a large set of SSR markers from a variety of sources and published cDNA markers into a composite P. taeda genetic map constructed from two reference mapping pedigrees. A dense genetic map that incorporates SSR loci will benefit complete pine genome sequencing, pine population genetics studies, and pine breeding programs. Careful marker annotation using a variety of references further enhances the utility of the integrated SSR map.
The updated P. taeda genetic map, with an estimated genome coverage of 1,515 cM(Kosambi) across 12 linkage groups, incorporated 170 new SSR markers and 290 previously reported SSR, RFLP, and ESTP markers. The average marker interval was 3.1 cM. Of 233 mapped SSR loci, 84 were from cDNA-derived sequences (EST-SSRs) and 149 were from non-transcribed genomic sequences (genomic-SSRs). Of all 311 mapped cDNA-derived markers, 77% were associated with NCBI Pta UniGene clusters, 67% with RefSeq proteins, and 62% with functional Gene Ontology (GO) terms. Duplicate (i.e., redundant accessory) and paralogous markers were tentatively identified by evaluating marker sequences by their UniGene cluster IDs, clone IDs, and relative map positions. The average gene diversity, He, among polymorphic SSR loci, including those that were not mapped, was 0.43 for 94 EST-SSRs and 0.72 for 83 genomic-SSRs. The genetic map can be viewed and queried at
Many polymorphic and genetically mapped SSR markers are now available for use in P. taeda population genetics, studies of adaptive traits, and various germplasm management applications. Annotating mapped genes with UniGene clusters and GO terms allowed assessment of redundant and paralogous EST markers and further improved the quality and utility of the genetic map for P. taeda.
PMCID: PMC3038140  PMID: 21269494

