Pea aphids represent a complex genetic system that can be used for QTL analysis, genetic diversity, and population genetics studies. Here, we describe the development of the first microsatellite repeat database of the pea aphid (APMicroDB), accessible at http://deepaklab.com/aphidmicrodb. Using the MIcroSAtellite (MISA) tool, we identified 340,233 SSRs distributed across 14,067 of the 23,924 scaffolds of the pea aphid genome. Simple repeats accounted for 89.53% of the SSRs, of which 73.41% were mono-nucleotide repeats, followed by di-nucleotide repeats. The database stores information about the repeat kind, GC content, motif type (mono- to hexa-nucleotide), genomic location, etc. We have also incorporated primer information derived with Primer3 software from the 250 bp flanking regions of each identified marker. A BLAST tool is also provided for searching a user query sequence against the identified markers and their primers. This work will be of considerable use to the scientific community working in agricultural pest management, QTL mapping, and host-pathogen interaction analysis.
Simple Sequence Repeats (SSRs), also known as microsatellites, are widely dispersed short tandem repeat units that harbor substantial length variation. A proportion of eukaryotic genomes (up to 4%) is composed of these markers. Although they occur in both coding and non-coding regions, high abundance has been observed only in the non-coding regions of the genome. Previous studies suggested that short tandem repeats (STRs) are under selective pressure and have played an important role in genome structure and evolution.
SSRs offer several advantages, such as their wide distribution, specificity, and reproducibility; they have therefore been extensively employed in studies of population genetics, genetic diversity, and evolution. Based on their origin, SSRs are classified into two types: 1) genomic SSRs (derived from the genome), and 2) EST-SSRs (derived from expressed sequence tags). EST-SSRs originate from transcribed regions, which are more conserved than the regions that yield genomic SSRs. Genomic SSRs are therefore more polymorphic and better suited to genetic diversity studies within a particular species.
The present study focuses on the identification of SSRs from the genome of the pea aphid, Acyrthosiphon pisum. Pea aphids are phloem-feeding insects that have several advantages over other aphid species. The association of the pea aphid with more than 20 legume genera reflects host-race-specific evolution. Each race is more or less specialized and genetically differentiated from the other host races. To reveal the host-pathogen relationship, it is important to understand the genomic architecture of the aphid. To this end, the International Aphid Genomics Consortium reported the first draft genome of the pea aphid, 464 Mb in size. Initially, ~3.13 million reads were assembled into 72,844 contigs using the Atlas assembly pipeline. In the second version, the number of contigs was reduced to 60,596, with an N50 length of around 28 kb. Only a few studies have experimentally characterized microsatellite markers in the pea aphid. However, wet-lab characterization is tedious and time-consuming. Researchers have therefore turned their attention to in silico identification of SSRs in the aphid genome. For example, Behura et al. reported 169,601 and 4,283 microsatellite repeats in the whole genome and the coding regions of A. pisum, respectively. Based on identified SSRs, a few insect-specific databases, such as InSatDb and EuMicrosatdb, have been developed in the past. To the best of our knowledge, no publicly accessible database of SSRs has been reported for the pea aphid. Given the importance of microsatellites, and of the pea aphid as a model insect species, the main purpose of this manuscript is to characterize the abundance and distribution of SSRs in the pea aphid genome.
We downloaded the pea aphid genome v2.0 from the NCBI database in FASTA format. The complete genome was scanned scaffold by scaffold for microsatellite repeats using the MIcroSAtellite (MISA) tool (http://pgrc.ipk-gatersleben.de/misa/). We used Primer3 software to design primers for the identified microsatellite markers. For this, we extracted a 250 bp flanking region on both sides of each repeat using bedtools. Custom Perl scripts were used to process the MISA output into CSV format. Finally, the file was loaded into a MySQL database. The front end of the database was developed using HTML, PHP, and JavaScript.
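The scanning and flank-extraction steps described above can be illustrated with a minimal Python sketch. It is not the MISA algorithm or the authors' pipeline: the function names are hypothetical, the minimum repeat counts in `MIN_REPEATS` are illustrative assumptions (the thresholds used in the study are not stated here), and only perfect repeats are detected with a simple regular-expression scan.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical minimum copy numbers per motif length (unit size -> min copies).
# Illustrative values only; they are NOT the settings used in the study.
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[Tuple[int, int, str, int]]:
    """Return (start, end, motif, copies) for perfect SSRs; 0-based, end-exclusive."""
    seq = seq.upper()
    hits = []
    for unit, copies in sorted(min_repeats.items()):
        # A motif of `unit` bases followed by at least (copies - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(motif == motif[:k] * (unit // k) for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // unit))
    return sorted(hits)


def flank_window(start: int, end: int, scaffold_len: int, flank: int = 250) -> Tuple[int, int]:
    """Extend an SSR interval by `flank` bp on both sides, clipped to the scaffold ends."""
    return max(0, start - flank), min(scaffold_len, end + flank)
```

For a toy sequence such as `"GATTACA" + "AT" * 8 + "CCGTTGAC" + "A" * 12 + "GC"`, `find_ssrs` reports an (AT)8 repeat and an (A)12 repeat. The clipping in `flank_window` matters for repeats that lie within 250 bp of a scaffold end, where the full flank is not available for primer design.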
We analyzed the distribution of STRs across the scaffolds and observed that simple microsatellite repeats represent 89.53% of the total STRs (Table 1). We also plotted the repeats by motif type, from mono- to hexa-nucleotide, to show their relative abundance in the pea aphid genome. As evident from Fig. 1 and Table-S1, mono-nucleotide repeats (73.41%) were the most abundant type, whereas hexa-nucleotide repeats (0.03%) were the least abundant (suppl-1.docx, Table-S1). Our analysis also supports the finding of Katti et al. that tri-nucleotide repeats reach the greatest length (a maximum of 441 bp), followed by di-nucleotide repeats (suppl-1.docx, Table-S1). We also observed that STRs up to 15 bp long make up the largest proportion in the genome, followed by those 16–20 bp long (Fig. 2). In contrast, repeats 46–50 bp long account for only 0.13% (Fig. 2, Table-S2).
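The kind of summary reported above (share of each motif class, and share of each repeat-length range) can be sketched in Python as follows. This is an illustration under assumptions, not the authors' script: the function names are hypothetical, input is a list of (motif, copy number) pairs, and the 5 bp bin width above 15 bp is inferred from the ranges quoted in the text (16–20, 46–50).

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

CLASS_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def motif_class_share(ssrs: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Percentage of SSRs in each motif class, given (motif, copies) pairs."""
    counts = Counter(CLASS_NAMES[len(motif)] for motif, _ in ssrs)
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


def length_bin(motif: str, copies: int) -> str:
    """Assign an SSR to a length bin: '<=15', then 5 bp bins ('16-20', '21-25', ...)."""
    length = len(motif) * copies  # total repeat length in bp
    if length <= 15:
        return "<=15"
    low = 16 + 5 * ((length - 16) // 5)
    return f"{low}-{low + 4}"
```

Applied to every record in the MISA output, `motif_class_share` would give figures of the kind shown in Fig. 1, and counting the values returned by `length_bin` would give a length distribution of the kind shown in Fig. 2.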
Previously, Kurokawa et al. reported six microsatellite markers in the pea aphid using an experimental approach. In the same year, Caillaud et al. reported fifteen markers from pea aphids. To validate our database against these reports, we searched it with the FASTA sequences of the reported markers using the BLAST tool. We observed that 76% of the markers matched our database partially or completely (Table 2). Of the 15 markers, six matched exactly and seven matched in repeat kind but differed in copy number. This may be because the pea aphid genome assembly is available only at a preliminary scaffold level, not at the chromosome level.
We provide a scaffold-wise search option for STRs, together with marker properties such as the motif type and repeat kind. An advanced search option is also available to filter the results by scaffold region, marker copy number, and GC content. This helps users who want to locate markers in a given genomic region, whether coding or non-coding. The search results are shown in a tabular format with an additional button for extracting the primer information of a particular SSR (Fig. 3). On clicking the show primer button, users obtain information about the primers (designed from the 250 bp flanking regions of the marker) and their properties.
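The advanced search described above amounts to a filtered query over the SSR table. The sketch below illustrates the idea with Python's built-in `sqlite3` module standing in for the MySQL back end. The table name `ssr`, its columns, the demo rows, and the choice to compute GC content on the repeat itself are all assumptions made for illustration; the actual APMicroDB schema is not described here.

```python
import sqlite3


def gc_content(seq: str) -> float:
    """GC percentage of a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(b) for b in "GC") / len(seq) if seq else 0.0


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table with a hypothetical schema and three toy SSRs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ssr (scaffold TEXT, ssr_start INTEGER, ssr_end INTEGER, "
        "motif TEXT, copies INTEGER, gc REAL)"
    )
    rows = [
        ("scaffold_1", 120, 135, "AT", 8),
        ("scaffold_1", 900, 917, "GCC", 6),
        ("scaffold_2", 40, 51, "A", 12),
    ]
    conn.executemany(
        "INSERT INTO ssr VALUES (?, ?, ?, ?, ?, ?)",
        [(s, a, b, m, c, gc_content(m * c)) for s, a, b, m, c in rows],
    )
    return conn


def search_ssrs(conn, scaffold, region_start, region_end,
                min_copies=1, min_gc=0.0, max_gc=100.0):
    """Return SSRs lying fully inside a scaffold region, filtered by copy number and GC%."""
    query = """
        SELECT scaffold, ssr_start, ssr_end, motif, copies, gc
        FROM ssr
        WHERE scaffold = ? AND ssr_start >= ? AND ssr_end <= ?
          AND copies >= ? AND gc BETWEEN ? AND ?
        ORDER BY ssr_start
    """
    params = (scaffold, region_start, region_end, min_copies, min_gc, max_gc)
    return conn.execute(query, params).fetchall()
```

Passing the user's values as query parameters, rather than formatting them into the SQL string, keeps a web-facing search form safe from SQL injection; the same practice applies to a PHP and MySQL front end.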
A customized BLAST tool is implemented in the database for similarity searches. The user's query sequence is searched against the database of repeats together with their flanking regions. The BLAST search page lets the user set the e-value cutoff, the query coverage, and the number of hits to be displayed. Each identified hit is linked to its primer information (Fig. 4).
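The three search options mentioned above (e-value cutoff, query coverage, number of hits) can be thought of as a filter applied to the list of BLAST hits. The following Python sketch shows one way such a filter could work. The `Hit` record, its field names, and the default thresholds are hypothetical and do not reflect how APMicroDB passes these options to BLAST.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    subject_id: str    # hypothetical identifier of an SSR-plus-flank record
    evalue: float
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    bitscore: float


def query_coverage(hit: Hit, query_len: int) -> float:
    """Percentage of the query sequence covered by the aligned region."""
    return 100.0 * (hit.query_end - hit.query_start + 1) / query_len


def filter_hits(hits: List[Hit], query_len: int, max_evalue: float = 1e-5,
                min_coverage: float = 80.0, max_hits: int = 10) -> List[Hit]:
    """Keep hits passing the e-value and coverage cutoffs, best hits first."""
    kept = [
        h for h in hits
        if h.evalue <= max_evalue and query_coverage(h, query_len) >= min_coverage
    ]
    # Lower e-value is better; ties are broken by higher bit score.
    kept.sort(key=lambda h: (h.evalue, -h.bitscore))
    return kept[:max_hits]
```

Each retained `subject_id` could then be used to look up the primer record of the matching marker.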
Here, we report the mining of 340,233 microsatellite markers, almost double the number reported by Behura and Severson. Mono-nucleotide repeats were the most frequent, followed by di-, tri-, tetra-, penta-, and hexa-nucleotide repeats, in that order. A similar trend was observed by Sharma et al., supporting the observation that the number of repeats decreases as repeat length increases. Repeats 11–15 bp long were well represented, whereas repeats 46–50 bp long were rare (0.13%). In 2001, Katti et al. observed that tri-nucleotide repeats tend to be much longer than other repeats in Drosophila. This is consistent with our observations in the pea aphid, which belongs to the same phylum. The agreement with previously identified markers supports the utility of this database. Despite the improvement in the pea aphid assembly from version 1.0 to version 2.0, the assembly still exists only at the scaffold level. This leaves a gap in the knowledge of SSR markers in pea aphids and suggests that many more SSR markers may be resolved only with a chromosome-level assembly.
APMicroDB will be regularly maintained by our team. We welcome scientific suggestions from readers via the 'Contact' link on the database. In future, we will update the database whenever a new assembly from a different strain or race of pea aphid is reported. Such updates will help in designing species-specific primers and in establishing evolutionary relationships.
STRs are among the most extensively studied markers, with wide application in genetic diversity, evolution, and genome mapping. Despite the importance of microsatellite markers, no database existed to store and compile genome-wide information on SSR markers from the pea aphid. In the present work, we have therefore developed the first whole-genome-based SSR database of the pea aphid, which will be useful for phylogenetic analysis and for evolutionary insight into the pea aphid.
The authors declare that they have no competing interests.
The author is thankful to Mr. Amit Pandey for his help in database design, and to ICAR-IASRI for providing RA support. No separate funding was provided for publication of this article.
Appendix A. Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gdata.2017.03.014.