Genom Data. 2017 June; 12: 111–115.
Published online 2017 March 30. doi:  10.1016/j.gdata.2017.03.014
PMCID: PMC5384296

APMicroDB: A microsatellite database of Acyrthosiphon pisum


Pea aphids represent a complex genetic system that can be used for QTL analysis, genetic diversity, and population genetics studies. Here, we describe the development of the first microsatellite repeat database of the pea aphid (APMicroDB), accessible at “”. We identified 340,233 SSRs using the MIcroSAtellite (MISA) tool; these were distributed across 14,067 of the 23,924 scaffolds of the pea aphid genome. Simple repeats accounted for 89.53% of the SSRs, of which 73.41% were mono-nucleotide repeats, followed by di-nucleotide repeats. The database stores information about the repeat kind, GC content, motif type (mono- to hexa-nucleotide), genomic location, etc. We have also incorporated primer information derived with the Primer3 software from the 250 bp flanking regions of each identified marker. A BLAST tool is provided for searching a user query sequence against the identified markers and their primers. This work will be useful to the scientific community working on agricultural pest management, QTL mapping, and host-pathogen interaction analysis.

1. Introduction

Simple Sequence Repeats (SSRs), also known as microsatellites, are widely dispersed short tandem repeat units that harbor substantial length variation [1], [2]. These markers make up a notable proportion (up to 4%) of eukaryotic genomes. Although they occur in both coding and non-coding regions, they are most abundant in the non-coding regions of the genome [3], [4]. Previous studies suggest that short tandem repeats (STRs) are under selective pressure and have played an important role in genome structure and evolution [5], [6], [7].

SSRs offer several advantages, such as their wide distribution, specificity, and reproducibility; they have therefore been extensively employed in studies of population genetics [8], [9], genetic diversity [10], [11], [12], [13], and evolution [14], [15]. Based on their origin, SSRs are classified into two types: 1) genomic SSRs (derived from the genome), and 2) EST-SSRs (derived from expressed sequence tags) [10], [14]. EST-SSRs originate from transcribed regions, which are more conserved than the regions that yield genomic SSRs [16], [17]. Genomic SSRs are therefore more polymorphic and better suited to genetic diversity studies within a particular species.

The present study focuses on the identification of SSRs in the genome of the pea aphid, Acyrthosiphon pisum. Pea aphids are phloem-feeding insects with several advantages over other aphid species as a study system [18]. The association of the pea aphid with more than 20 legume genera reflects its host-race-specific evolution. Each race is more or less specialized and genetically differentiated from the other host races [19], [20]. To reveal the host-pathogen relationship, it is important to understand the architecture of the aphid genome. To this end, the International Aphid Genomics Consortium reported the first draft genome of the pea aphid, 464 Mb in size. Initially, ~3.13 million reads were assembled into 72,844 contigs using the Atlas assembly pipeline. In the second version, the number of contigs was reduced to 60,596, with an N50 length of around 28 kb. Only a few studies have experimentally characterized microsatellite markers in the pea aphid [21], [22], [23]. However, wet-lab characterization is tedious and time-consuming. Researchers have therefore turned to in silico identification of SSRs in the aphid genome [2], [24]. For example, Behura et al. reported 169,601 and 4283 microsatellite repeats in the whole genome and the coding regions of A. pisum, respectively. Based on identified SSRs, a few insect-specific databases, such as InSatDb and EuMicroSatdb, have been developed in the past [25], [26]. To the best of our knowledge, no publicly accessible database of SSRs has been reported for the pea aphid. Given the importance of microsatellites, and of the pea aphid as a model insect species, the main purpose of this manuscript is to characterize the abundance and distribution of SSRs in the pea aphid genome.

2. Database development

2.1. Database construction and architecture

We downloaded the pea aphid genome v2.0 from the NCBI database in FASTA format [27]. The complete genome was scanned scaffold by scaffold for microsatellite repeats using the MIcroSAtellite (MISA) tool. We used the Primer3 software to design primers for the identified microsatellite markers [28]. For this, we extracted the 250 bp flanking regions on both sides of each repeat using bedtools [29]. Custom Perl scripts were used to convert the MISA output to CSV format. Finally, the file was loaded into a MySQL database. The front end of the database was developed using HTML, PHP, and JavaScript.
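The scanning and flank-extraction steps can be illustrated with a minimal Python sketch. It is not the MISA or bedtools implementation used in this work: it detects only perfect repeats with a regular expression, and the minimum copy numbers in `MIN_REPEATS` are hypothetical values chosen for illustration, since the settings used are not stated here. The names `find_ssrs`, `is_primitive`, and `flank_window` are likewise illustrative; only the 250 bp flank length comes from the text.

```python
import re

# Hypothetical minimum copy numbers per motif length (assumption, not from the article).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
FLANK = 250  # flank length in bp, as stated in the article


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return perfect SSRs with 0-based, end-exclusive coordinates."""
    seq = seq.upper()
    hits = []
    for unit_len, min_copies in MIN_REPEATS.items():
        # One motif of unit_len bases followed by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for m in pattern.finditer(seq):
            if is_primitive(m.group(1)):
                hits.append({
                    "motif": m.group(1),
                    "copies": (m.end() - m.start()) // unit_len,
                    "start": m.start(),
                    "end": m.end(),
                })
    return sorted(hits, key=lambda h: h["start"])


def flank_window(start, end, scaffold_len, flank=FLANK):
    """Extend the SSR by the flank on both sides, clipped to the scaffold boundaries."""
    return max(0, start - flank), min(scaffold_len, end + flank)


if __name__ == "__main__":
    filler = "ACGTTGCA" * 40  # toy sequence with no SSR at the thresholds above
    scaffold = filler + "CT" * 8 + filler
    for ssr in find_ssrs(scaffold):
        print(ssr, flank_window(ssr["start"], ssr["end"], len(scaffold)))
```

Clipping the window matters for repeats near scaffold ends, where fewer than 250 bp of flanking sequence may be available for primer design.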

2.2. Genome analysis

We analyzed the distribution of STRs across the scaffolds and observed that simple microsatellite repeats represent 89.53% of the total STRs (Table 1). We also plotted the repeats by motif type, from mono- to hexa-nucleotide, to show their relative abundance in the pea aphid genome. As evident from Fig. 1 and Table-S1, mononucleotide repeats (73.41%) were the most abundant type [30], [31], whereas hexanucleotide repeats (0.03%) were the least abundant (suppl-1.docx, Table-S1). Our analysis also agrees with that of Katti et al.: tri-nucleotide repeats reached the greatest maximum length (441 bp), followed by dinucleotide repeats (suppl-1.docx, Table-S1) [32]. STRs of up to 15 bp in length made up the largest proportion in the genome, followed by those of 16–20 bp (Fig. 2). In contrast, repeats of 46–50 bp accounted for only 0.13% (Fig. 2, Table-S2).
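The percentages reported above are shares of repeats per motif class and per length range. A minimal sketch of how such a summary could be computed from a list of (motif, copy number) records is given below. The 5 bp bin edges beyond 15 bp are an assumption inferred from the ranges mentioned in the text (16–20 bp, 46–50 bp), and the function names and example records are illustrative, not data from this study.

```python
from collections import Counter

CLASS_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def length_bin(length_bp):
    """Map a repeat length in bp to a range label (bin edges are an assumption)."""
    if length_bp <= 15:
        return "<=15"
    if length_bp > 50:
        return ">50"
    low = 16 + 5 * ((length_bp - 16) // 5)
    return f"{low}-{low + 4}"


def percent_shares(labels):
    """Percentage of items carrying each label, rounded to two decimals."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 2) for label, n in counts.items()}


def summarise(records):
    """records: iterable of (motif, copies). Returns (class shares, length shares)."""
    records = list(records)
    by_class = percent_shares(CLASS_NAMES[len(motif)] for motif, _ in records)
    by_length = percent_shares(length_bin(len(motif) * copies) for motif, copies in records)
    return by_class, by_length


if __name__ == "__main__":
    toy_records = [("A", 12), ("T", 10), ("AT", 8), ("AAG", 5)]  # made-up examples
    print(summarise(toy_records))
```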

Fig. 1
Histogram plot of SSRs with the type of repeats in the x-axis and their percentage in the y-axis.
Fig. 2
Pie chart showing the percent distribution of microsatellite repeats within different length ranges.
Table 1
Overall distribution of SSRs and their percentage in pea aphid genome.

2.3. STR validation

Previously, Kurokawa et al. reported six microsatellite markers in the pea aphid using an experimental approach [21]. In the same year, Caillaud et al. reported fifteen markers from pea aphids [22]. To validate our database, we searched it with the FASTA sequences of the reported markers using the BLAST tool. We observed that 76% of the markers matched our database partially or completely (Table 2). Of the 15 markers, six matched exactly, and seven matched in repeat kind but differed in copy number. This may be because the pea aphid genome assembly is available only at a preliminary scaffold level and not at the chromosome level.
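The distinction drawn above between exact matches and matches that share the repeat kind but differ in copy number can be expressed as a small comparison routine. The sketch below is an illustration under stated assumptions rather than the procedure used for Table 2: it assumes each marker is reduced to a (motif, copy number) pair, and it treats rotations and reverse complements of a motif as the same repeat kind (so that AT and TA, or CT and AG, compare equal). The function names and example values are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Smallest string among all rotations of the motif and of its reverse complement."""
    motif = motif.upper()
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    rotations = [s[i:] + s[:i] for s in (motif, revcomp) for i in range(len(s))]
    return min(rotations)


def classify(reported, found):
    """Compare a reported marker with the database SSR hit by the search.

    reported, found: (motif, copies) tuples; found is None when nothing was hit.
    """
    if found is None:
        return "no match"
    if canonical_motif(reported[0]) != canonical_motif(found[0]):
        return "no match"
    return "exact" if reported[1] == found[1] else "copy number differs"


if __name__ == "__main__":
    # Made-up marker pairs, for illustration only.
    print(classify(("AT", 12), ("TA", 12)))
    print(classify(("CT", 10), ("AG", 14)))
    print(classify(("AAG", 5), None))
```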

Table 2
Validation of previously identified STRs with APMicroDB.

3. Utility

3.1. Search

We provide a scaffold-wise search option for STRs, along with marker properties such as motif type and repeat kind. An advanced search option filters the results by scaffold region, marker copy number, and GC content. This helps users locate markers in a given genomic region, which may be coding or non-coding. The search results are shown in a tabular format with an additional button for retrieving the primer information of a particular SSR (Fig. 3). On clicking the show primer button, users obtain the primers (designed from the 250 bp flanking regions of the marker) and their properties.
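The advanced search described above amounts to a parameterised query over the stored repeat table. The following sketch shows one way such a filter could be written. It uses Python's built-in sqlite3 module as a stand-in for the MySQL back end, and the table name, column names, and demo rows are all hypothetical, since the actual schema of APMicroDB is not given here.

```python
import sqlite3

# Hypothetical schema and rows; the real APMicroDB schema is not described in the article.
SCHEMA = """
CREATE TABLE ssr (
    ssr_id INTEGER PRIMARY KEY,
    scaffold TEXT, motif TEXT, copies INTEGER,
    start_pos INTEGER, end_pos INTEGER, gc_percent REAL
)
"""
DEMO_ROWS = [
    (1, "scaffold_1", "A", 12, 100, 111, 0.0),
    (2, "scaffold_1", "AT", 8, 500, 515, 0.0),
    (3, "scaffold_1", "CG", 7, 900, 913, 100.0),
    (4, "scaffold_2", "AAG", 5, 50, 64, 33.3),
]


def build_demo_db():
    """Create an in-memory database populated with the demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO ssr VALUES (?, ?, ?, ?, ?, ?, ?)", DEMO_ROWS)
    return conn


def search_ssrs(conn, scaffold, region_start=None, region_end=None,
                min_copies=None, min_gc=None, max_gc=None):
    """Return SSR rows on a scaffold, optionally filtered by region, copies, and GC."""
    clauses, params = ["scaffold = ?"], [scaffold]
    optional = [
        ("end_pos >= ?", region_start),   # SSR must overlap the requested region
        ("start_pos <= ?", region_end),
        ("copies >= ?", min_copies),
        ("gc_percent >= ?", min_gc),
        ("gc_percent <= ?", max_gc),
    ]
    for clause, value in optional:
        if value is not None:
            clauses.append(clause)
            params.append(value)
    sql = ("SELECT ssr_id, motif, copies, start_pos, end_pos, gc_percent "
           "FROM ssr WHERE " + " AND ".join(clauses) + " ORDER BY start_pos")
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in search_ssrs(conn, "scaffold_1", region_start=400, region_end=1000):
        print(row)
```

Binding user input through placeholders rather than string concatenation keeps the query safe when the values come from a web form.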

Fig. 3
Showing the database search page and its results along with primer information.

3.2. Web tool

A customized BLAST tool is implemented in the database for similarity searches. The user's query sequence is searched against the database of repeats together with their flanking regions. The BLAST search offers options for the e-value cut-off, the query coverage, and the number of hits to be displayed. Each identified hit is linked to its primer information (Fig. 4).
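The three search options mentioned above (e-value cut-off, query coverage, number of hits) can be illustrated by post-filtering BLAST results. The sketch below assumes the results are available in the standard 12-column tabular format (BLAST+ `-outfmt 6`) and that the query length is known; whether APMicroDB filters hits this way is not stated, and the function name, default thresholds, and example lines are hypothetical.

```python
def filter_hits(tabular_text, query_len, max_evalue=1e-5, min_qcov=80.0, max_hits=10):
    """Filter BLAST tabular hits by e-value and query coverage, best bit score first.

    Expected columns: qseqid sseqid pident length mismatch gapopen
                      qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        evalue, bitscore = float(fields[10]), float(fields[11])
        # Query coverage of this single alignment, in percent.
        qcov = 100.0 * (abs(qend - qstart) + 1) / query_len
        if evalue <= max_evalue and qcov >= min_qcov:
            hits.append({
                "subject": fields[1],
                "identity": float(fields[2]),
                "evalue": evalue,
                "bitscore": bitscore,
                "qcov": round(qcov, 1),
            })
    hits.sort(key=lambda h: (-h["bitscore"], h["evalue"]))
    return hits[:max_hits]


if __name__ == "__main__":
    # Made-up hit lines for a 100 bp query.
    example = "\n".join([
        "q1\tssr_1\t100.00\t100\t0\t0\t1\t100\t251\t350\t1e-50\t185",
        "q1\tssr_2\t95.00\t60\t3\t0\t1\t60\t10\t69\t1e-20\t90",
        "q1\tssr_3\t90.00\t90\t9\t0\t5\t94\t300\t389\t0.5\t40",
        "q1\tssr_4\t98.00\t95\t2\t0\t3\t97\t251\t345\t1e-40\t160",
    ])
    for hit in filter_hits(example, query_len=100):
        print(hit)
```

The subject identifier of each retained hit could then serve as the key for looking up the stored primer record.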

Fig. 4
The overall flow of user Blast query, and its link to database and primers.

4. Discussion

Here, we report the mining of 340,233 microsatellite markers, roughly double the number reported by Behura and Severson [24]. Mono-nucleotide repeats had the highest percentage, followed by di-, tri-, tetra-, penta-, and hexa-nucleotide repeats, respectively. A similar trend was observed by Sharma et al., supporting the observation that the number of repeats decreases as the repeat length increases [31]. The distribution of repeat lengths showed good coverage in the range of 11–15 bp, whereas low coverage (0.13%) was observed for repeats of 46–50 bp. In 2001, Katti et al. observed that tri-nucleotide repeats tend to be much longer than other repeats in Drosophila [32]. This agrees with our results for the pea aphid, which belongs to the same phylum. The agreement with previously identified markers supports the applicability of this database. Despite the improvement in the pea aphid assembly from version 1.0 to version 2.0, the assembly still exists only at the scaffold level. This indicates a gap in the knowledge of SSR markers in pea aphids and suggests that there may be many more SSR markers that can be resolved only at the chromosome level.

5. Data maintenance

APMicroDB will be regularly maintained by our team. We welcome scientific suggestions from readers via the ‘Contact’ link on the database. In the future, we will upgrade the database whenever a new assembly from a different strain or race of the pea aphid is reported. Such updates will help in studying species-specific primers and in establishing evolutionary relationships.

6. Conclusion

STRs are among the most extensively studied markers, with wide application in genetic diversity, evolution, and genome mapping. Despite the great importance of microsatellite markers, no database has existed to store and compile genome-wide information on SSR markers from the pea aphid. Therefore, in the present work, we developed the first whole-genome-based SSR database of the pea aphid, which will be useful for phylogenetic analysis and for evolutionary insight into the pea aphid.

Competing interests

The authors declare that they have no competing interests.


The author is thankful to Mr. Amit Pandey for his help in database design and to ICAR-IASRI for providing RA support. No separate funding was provided for publication of this article.


SSRs: Simple Sequence Repeats
STR: Short tandem repeat
bp: base pair
HTML: Hyper Text Markup Language
PHP: Hypertext Preprocessor


Appendix A. Supplementary data to this article can be found online at

Appendix A. Supplementary data

Supplementary tables



1. Jun T.-H., Michel A.P., Mian M.A.R. Development of soybean aphid genomic SSR markers using next generation sequencing. Genome. May 2011;54(5):360–367.
2. Behura S.K., Severson D.W. Association of microsatellite pairs with segmental duplications in insect genomes. BMC Genomics. Jan 2013;14(1):907.
3. Tóth G., Gáspári Z., Jurka J. Microsatellites in different eukaryotic genomes: Survey and analysis. Genome Res. Jul 2000;10(7):967–981.
4. Hancock J.M. Simple sequences and the expanding genome. Bioessays. May 1996;18(5):421–425.
5. Ellegren H. Microsatellites: Simple sequences with complex evolution. Nat. Rev. Genet. Jun 2004;5(6):435–445.
6. Kashi Y., King D.G. Simple sequence repeats as advantageous mutators in evolution. Trends Genet. May 2006;22(5):253–259.
7. Behura S.K. Molecular marker systems in insects: current trends and future avenues. Mol. Ecol. Oct 2006;15(11):3087–3113.
8. Kim K.S., Sappington T.W. Microsatellite data analysis for population genetics. Methods Mol. Biol. Jan 2013;1006:271–295.
9. Putman A.I., Carbone I. Challenges in analysis and interpretation of microsatellite data for population genetic studies. Ecol. Evol. Oct 2014;4(22).
10. Jing S., Liu B., Peng L., Peng X., Zhu L., Fu Q., He G. Development and use of EST-SSR markers for assessing genetic diversity in the brown planthopper (Nilaparvata lugens Stål). Bull. Entomol. Res. Feb 2012;102(1):113–122.
11. Fontes F. von H.M., Colombo C.A., Lourenção A.L. Structure of genetic diversity of Bemisia tabaci (Genn.) (Hemiptera: Aleyrodidae) populations in Brazilian crops and locations. Sci. Agric. Feb 2012;69(1):47–53.
12. Arunkumar K.P., Sahu A.K., Mohanty A.R., Awasthi A.K., Pradeep A.R., Urs S.R., Nagaraju J. Genetic diversity and population structure of Indian golden silkmoth (Antheraea assama). PLoS One. Jan 2012;7(8):e43716.
13. Mahon A.R., Arango C.P., Halanych K.M. Genetic diversity of Nymphon (Arthropoda: Pycnogonida: Nymphonidae) along the Antarctic Peninsula with a focus on Nymphon australe Hodgson 1902. Mar. Biol. Jul 2008;155(3):315–323.
14. Kim K.S., Ratcliffe S.T., French B.W., Liu L., Sappington T.W. Utility of EST-derived SSRs as population genetics markers in a beetle. J. Hered. Jan 2008;99(2):112–124.
15. Stolle E., Kidner J.H., Moritz R.F.A. Patterns of evolutionary conservation of microsatellites (SSRs) suggest a faster rate of genome evolution in Hymenoptera than in Diptera. Genome Biol. Evol. Jan 2013;5(1):151–162.
16. Brunet B.M.T., Doucet D., Sturtevant B.R., Sperling F.A.H. Characterization of EST-based SSR loci in the spruce budworm, Choristoneura fumiferana (Lepidoptera: Tortricidae). Conserv. Genet. Resour. Jan 2013;5(2):541–544.
17. Shiferaw E., Pè M.E., Porceddu E., Ponnaiah M. Exploring the genetic diversity of Ethiopian grass pea (Lathyrus sativus L.) using EST-SSR markers. Mol. Breed. Aug 2012;30(2):789–797.
18. Will T., Furch A.C.U., Zimmermann M.R. How phloem-feeding insects face the challenge of phloem-located defenses. Front. Plant Sci. Jan 2013;4:336.
19. Peccoud J., Ollivier A., Plantegenest M., Simon J.-C. A continuum of genetic divergence from sympatric host races to species in the pea aphid complex. Proc. Natl. Acad. Sci. U. S. A. May 2009;106(18):7495–7500.
20. Stavrinides J., McCloskey J.K., Ochman H. Pea aphid as both host and vector for the phytopathogenic bacterium Pseudomonas syringae. Appl. Environ. Microbiol. Apr 2009;75(7):2230–2235.
21. Kurokawa T., Yao I., Akimoto S.-I., Hasegawa E. Isolation of six microsatellite markers from the pea aphid, Acyrthosiphon pisum (Homoptera, Aphididae). Mol. Ecol. Notes. Sep 2004;4(3):523–524.
22. Caillaud M.C., Mondor-Genson G., Levine-Wilkinson S., Mieuzet L., Frantz A., Simon J.C., Coeur D'acier A. Microsatellite DNA markers for the pea aphid Acyrthosiphon pisum. Mol. Ecol. Notes. Sep 2004;4(3):446–448.
23. Weng Y., Azhaguvel P., Michels G.J., Rudd J.C. Cross-species transferability of microsatellite markers from six aphid (Hemiptera: Aphididae) species and their use for evaluating biotypic diversity in two cereal aphids. Insect Mol. Biol. Oct 2007;16(5):613–622.
24. Behura S.K., Severson D.W. Genome-wide comparative analysis of simple sequence coding repeats among 25 insect species. Gene. Aug 2012;504(2):226–232.
25. Archak S., Meduri E., Kumar P.S., Nagaraju J. InSatDb: a microsatellite database of fully sequenced insect genomes. Nucleic Acids Res. Jan 2007;35(Database issue):D36–D39.
26. Aishwarya V., Grover A., Sharma P.C. EuMicroSatdb: a database for microsatellites in the sequenced genomes of eukaryotes. BMC Genomics. Jan 2007;8(1):225.
27. Genome sequence of the pea aphid Acyrthosiphon pisum. PLoS Biol. Feb 2010;8(2):e1000313.
28. Untergasser A., Cutcutache I., Koressaar T., Ye J., Faircloth B.C., Remm M., Rozen S.G. Primer3—new capabilities and interfaces. Nucleic Acids Res. Aug 2012;40(15):e115.
29. Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. Mar 2010;26(6):841–842.
30. Temnykh S., DeClerck G., Lukashova A., Lipovich L., Cartinhour S., McCouch S. Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length variation, transposon associations, and genetic marker potential. Genome Res. Aug 2001;11(8):1441–1452.
31. Sharma P.C., Grover A., Kahl G. Mining microsatellites in eukaryotic genomes. Trends Biotechnol. Nov 2007;25(11):490–498.
32. Katti M.V., Ranjekar P.K., Gupta V.S. Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol. Biol. Evol. Jul 2001;18(7):1161–1167.

Articles from Genomics Data are provided here courtesy of Elsevier