Given population growth, climate change, and soil deterioration, crop improvement remains a priority for securing food supplies. Although the production of grain legumes is in general lower than that of cereals, the nutritional value of grain legumes makes them important components of food security. Nevertheless, limited by severe genetic bottlenecks during domestication and human selection, grain legumes, like other crops, have suffered a loss of genetic diversity, which is essential for providing genetic materials for crop improvement programs. Whole-genome sequencing has shown that wild relatives of crops, adapted to various environments, maintain high genetic diversity. In this review, we focus on nine important grain legumes (soybean, peanut, pea, chickpea, common bean, lentil, cowpea, lupin, and pigeonpea) to discuss the potential uses of their wild relatives as genetic resources for crop breeding and improvement, and summarize the various genetic/genomic approaches adopted for these purposes.
Wild plants have been domesticated for thousands of years since the beginning of human civilization, as a means to ensure a stable food supply. Through plant breeding activities over the centuries, crop plants have been manipulated to develop new and desirable traits. Artificial selection based on phenotypes (Appendix A) drove the development of new varieties with desirable features and is considered the most ancient form of plant breeding. Over time, these new species or varieties have diverged genetically from their original progenitors.
Given the demands of an ever-increasing global population and the negative effects of mono-cropping and climate change, there is a constant need for crop improvement. Furthermore, the genetic diversity of crops is generally low due to a strong bottleneck effect during domestication and artificial selection, which limits the potential for crop improvement.
Wild relatives are potential genetic resources for crop improvement [5,6,7], as well as for exploring new or alternative production systems. The rationale is straightforward: wild populations are expected to retain higher genetic variability because they have propagated across a wide range of habitats without human selection. For example, desirable traits such as biotic and abiotic stress resistance and special nutritional value, all important for crop improvement, can be found in some wild relatives [9,10]. In addition, since the genetic modification of food crops is still controversial among the public, it is more acceptable to introduce genetic material from wild relatives (of the same or closely related species) into crop varieties through breeding, hybridization or other techniques to generate improved crops. Although the use of wild relatives as sources of new alleles (Appendix A) is challenging, mainly because of linkage drag (Appendix A), advances in genetic and genomic research on crop plants and their wild relatives have expanded our understanding of complex traits and led to the discovery of new genes.
The idea of generating exotic genetic libraries in order to accelerate plant breeding is further supported by the development of genomic and genetic tools in the post-genomic era. Nonetheless, many genetic populations generated over the years have not been further genotyped or phenotyped under different environmental conditions. It is also important to point out that wild genetic materials are not always available; establishing formal holding institutions for each crop worldwide, together with the continued selection and conservation of this biodiversity, should therefore be a priority. Two recent review papers have summarized the main repository institutions in the world for grain legumes as well as the number of introductions and collections of both cultivated and wild pulses [13,14].
Molecular plant breeding has been hailed as the foundation of 21st-century crop improvement; it integrates traditional plant breeding practices, molecular markers, genomic research and biotechnology. Undoubtedly, the development of tools and strategies over the past 30 years has contributed to crop improvement in many ways, and the information generated is vast and complex. However, the success of any breeding program for crop improvement depends on the plant material, and specifically on its variability for key traits.
Grain legumes are legumes whose seeds are harvested for consumption. Commonly known grain legumes include soybean, peanut, pea, chickpea, common bean, lentil, cowpea, lupin and pigeonpea. The current agronomic challenge is the generally lower yield of grain legumes compared to cereals, together with the mismatch between the populations for which they are a dietary staple and the geographical locations where they are grown. However, unlike cereals, most grain legumes are good sources of protein, which makes them good substitutes for animal proteins. The seeds of grain legumes are also sources of edible oils and other compounds of high nutraceutical value. Another distinctive feature of legumes is their ability to interact with soil rhizobia to fix atmospheric nitrogen. This mutualistic relationship provides nitrogen to support the growth of the legume plant and helps replenish soil fertility, thus minimizing the need for inorganic nitrogen fertilizers. As a result, legumes are commonly used in crop rotation or intercropping to improve diversity, quality and sustainability in traditional food production systems, providing an excellent way to increase agro-ecosystem services.
In this review, we will focus our discussions on grain legumes as 2016 has been declared the International Year of Pulses by the United Nations. We will include a brief history of their domestication, a description on the impacts of genome sequencing on the studies of wild relatives through the generation of genetic maps, the use of the new strategy of genotyping-by-sequencing (GBS) and the search and introgression of wild alleles in breeding programs. Finally, we will discuss our perspectives on the roles of wild relatives in the new challenges that arise in agriculture.
Grain legumes were planted as companion crops of wheat and barley when agriculture began in the Near East [17,18], while some other important grain legumes have their origins of domestication in Asia and the New World. The domestication of plants was generally associated with centers of cultural diversity along with fascinating relationships between ancient human settlements and particular phytogeographic characteristics. It is difficult to trace back the history of domestication, but today a significant amount of information generated by agronomists, biologists, anthropologists and historians has been made available to support the community hypothesis on the centers of origin and domestication. These descriptions painted a complicated picture of the domestication process through the associations with different ancient civilizations, such as how it was done before and after the existence of seed exchanges for cultivation, natural or intentional crossbreeding with wild relatives in different parts of the world, etc. In this section, we will summarize the existing information about the centers of origin and domestication of grain legumes analyzed in this review. This information allows us to grasp their importance in ancient times, understand their history of domestication and identify potential geographical locations of the divergence from their wild ancestors for future improvement. In addition, the concept of “gene pool” proposed by Harlan and de Wet (1971)  is particularly important for using wild relatives for crop improvement. The classification of crop related species is not based on formal taxonomy, but on gene pools with different levels (possibilities) of crosses. 
Primary, secondary and tertiary “gene pools” refer to subspecies or species that can, respectively: (i) cross freely with crops to produce fertile hybrids; (ii) cross with crops to produce hybrids with some degree of fertility; and (iii) cross with crops only by using special approaches such as chromosome doubling, embryo rescue, tissue culture, etc.
Soybean (Glycine max L.): Soybean is one of the oldest crops. The cultivated soybean was domesticated from an endemic wild species in China, Glycine soja Siebold & Zucc., probably 6000–9000 years ago. Theodore Hymowitz, an eminent researcher in the history of soybean, has suggested that it is unlikely we will ever know the exact time when soybean cultivation began, and, based on early bronze inscriptions, domestication may have occurred during the Shang Dynasty (1500–1100 B.C.). There is evidence that soybean was domesticated during the Zhou dynasty in northeastern China, corresponding to Vavilov’s Chinese–Japanese center. The earliest documented evidence of the use of Glycine spp. by humans comes from a Neolithic site dating to 7800–9000 years ago in Jiahu, Henan Province, where charred remains of soybean were recovered.
Peanut (Arachis hypogaea L.): The center of origin of Arachis spp. is South America, with wild species found in Bolivia, Brazil, Paraguay, Argentina and Uruguay. The oldest archaeological records of A. hypogaea come from Huarmey Valley, Peru, dating from approximately 3500–4500 years ago [23,24], although there is evidence suggesting that peanut could have originated in northern Argentina and eastern Bolivia. There are also records of peanut use by ancient people in Ñanchoc Valley in northern Peru approximately 7840 years ago, although these records may correspond to the use of wild species or to a very early stage of domestication. Archaeological and genetic evidence together suggest that A. monticola Krapov. & Rigoni is the tetraploid wild ancestor from which the peanut was domesticated [26,27], in a complex scenario that involved natural evolution and human domestication of diploid species distributed in Argentina and Bolivia. Important evidence suggests that two diploid wild species, Arachis duranensis Krapov. & W.C.Greg. (AA) and Arachis ipaensis Krapov. & W.C.Greg. (BB), are the progenitors of cultivated peanut. A single hybridization event between the two progenitors, followed by genome duplication about 3500 years ago, led to the origin of cultivated peanut [28,29].
Pea (Pisum sativum): Pisum sativum L. has its origin and domestication in the Mediterranean, primarily in the Middle East, about 10,000 years ago [30,31]. There are currently thousands of pea varieties from hundreds of years of selection and breeding. The natural growing range of the wild representatives of P. sativum (Pisum elatius M. Bieb. and Pisum humile Boiss. & Noë) extends from Iran and Turkmenistan through Anterior Asia, northern Africa and southern Europe. However, it is difficult to precisely locate the diversity center given the early domestication and the diverse areas of cultivation . Three pea types are currently recognized: (i) Pisum sativum, which extends from Iran and Turkmenistan through Anterior Asia, northern Africa and southern Europe; (ii) P. fulvum Sibth. & Sm., which is found in Jordan, Syria, Lebanon and Israel; and (iii) P. abyssinicum A. Braun, which is found from Yemen to Ethiopia . It was suggested that both P. sativum and P. fulvum were domesticated in the Near East about 11,000 years ago from an extinct ancestor of Pisum spp., and P. abyssinicum was developed from P. sativum independently in Old Kingdom or Middle Kingdom Egypt about 4000–5000 years ago .
Chickpea (Cicer arietinum L.): Chickpea was domesticated about 11,000 years ago from its wild progenitor, C. reticulatum Ladiz., which is known from southeastern Turkey and adjacent Syria [35,36]. Interestingly, unlike other legumes, the center of origin of the wild progenitor is confined to a small, specific area. Domesticated chickpeas have been found in archaeological sites corresponding to the Pre-Pottery Neolithic period, and chickpea is part of the “founder crops package” that gave rise to farming. Two main chickpea types are currently cultivated worldwide: the small-seeded desi and the larger-seeded kabuli.
Common bean (Phaseolus vulgaris L.): Most small-seeded, small-leaved Phaseolus species originated in Mesoamerica [38,39,40]. Common bean was domesticated in Mesoamerica and in the Andes about 5000 years ago, and a substantial number of investigations have revealed and traced the complex evolutionary and domestication history of the different gene pools [41,42,43]. Interestingly, there is a wide variety of common beans, all belonging to the same species and classified as “landraces”, with a fascinating diversity of seed sizes, shapes and colors. They are the result of a complex and remarkable history of domestication and selection. While P. vulgaris is the most economically important species of Phaseolus, other species of the genus have also been domesticated: P. lunatus L., P. dumosus Macfad., P. coccineus L. and P. acutifolius A. Gray.
Lentil (Lens culinaris Medik.): Lentil is one of the oldest crops cultivated and domesticated by humans and has been recovered from archaeological sites dating from the Neolithic period. Lentils were domesticated in the Near East, in an area called “the cradle of agriculture”, about 11,000 B.C. from the wild progenitor, Lens culinaris subsp. orientalis (Boiss.) Ponert.
Cowpea (Vigna unguiculata [L.] Walp.): Cowpea is widely cultivated in the semiarid and sub-humid zones of Africa and Asia. It is one of the most important foods for sub-Saharan populations and is adaptable to marginal and changing environments. Because the evidence is sparse, the history of cowpea domestication remains to be elucidated, a task complicated by the diverse morphology and widespread distribution of the wild species. Central-southern Africa seems to be the center of origin of cowpea, and West Africa and India are the first and second most probable centers of domestication, respectively [47,48]. The earliest archaeological evidence of cowpea cultivation in Africa dates from 1830–1595 B.C. The wild progenitor of cowpea is V. unguiculata var. spontanea (formerly var. dekindtiana).
Lupin or lupini bean (Lupinus L.): The Mediterranean region and the American continent are the two centers of wild lupin and of its domestication. Lupin was probably introduced into cultivation in the Old World in ancient Greece. The earliest archaeological evidence of lupin dates from 2000 B.C., in the tombs of Egyptian Pharaohs where domesticated seeds were discovered. Andean pearl lupin (L. mutabilis) was domesticated in the 6th–7th century B.C. in America, by a pre-Incan culture in what is modern-day Peru. L. albus L. (white lupin), L. luteus L. (yellow lupin), L. angustifolius L. (narrow-leafed lupin), L. mutabilis Sweet (pearl lupin) and L. polyphyllus Lindl. (multifoliate or Washington lupin) are currently under widespread cultivation for many different purposes.
Pigeonpea (Cajanus cajan [L.] Millsp.): Populations of the wild progenitor, Cajanus cajanifolius (Haines) Maesen, have been identified in eastern Peninsular India alongside a diverse group of other Cajanus species. C. cajanifolius is rare today, probably due to habitat loss [50,51]. Archaeological evidence suggests that pigeonpea could have been domesticated during the middle of the 2nd millennium B.C. by settlements in Orissa, close to the areas where the wild species grew (Cajanus cajanifolius Gopalpur and Golbai). Currently, pigeonpea is widely cultivated in all tropical and semitropical regions.
Loss in genetic diversity due to the founder effect and domestication syndrome are two main characteristics of cultivated crops. Domestication syndrome is defined as all the morpho-physiological modifications that make the cultivated crops different from their wild ancestors, conferring adaptability for agriculture [43,52]. Some of these include changes in growth habits, seed dispersal mechanisms, loss of germination inhibition, etc. These changes seem to have occurred in parallel in different regions of our planet . A conceptual framework has been proposed to distinguish between crucial domestication and crop evolution/diversification traits, i.e., between short episodes and long-term historical processes such as domestication . In this context, it is worth noting that archaeobotany has also characterized and contrasted different patterns of domestication, such as that between legumes and non-legume crops. Regarding the seed size, grain legumes do not show evidence of seed size increase with domestication, although selection pressure persisted for larger seeds in association with animal-drawn ploughs (or ards) . Others have proposed that the seeding depth by humans might have contributed in some circumstances to increasing the biomass of the seed, but this did not seem to have strong empirical support after testing . This shows how agriculture is an interaction between cultural behavior and management practices acting together on the available genetic diversity of plants.
Likewise, the question of conscious versus unconscious selection is of great importance for understanding plant domestication. If cultivation practices and regimes were strong selection pressures during the domestication of crops, we should also study the desires and decisions of human beings. There is increasing evidence suggesting that humans actively modified particular ecosystems to increase the availability of certain plant resources hundreds of years before the indicators of domestication appeared.
Whatever the case, it is encouraging to consider the progress that has been made, and what can be foreseen, in understanding the spatio-temporal patterns of domestication, the speed at which it happened, intentionality versus serendipity, etc. Finally, it is important to keep in mind that plant domestication is still occurring at present; it is not only a series of events from the past. Great potential remains, as the unprecedented development of selection tools should allow us to produce more and higher-quality food for our planet.
The release of the reference genome of the dicot model plant Arabidopsis thaliana in 2000 and the two rice genomes [62,63] marked the beginning of the age of plant genome sequencing. Nevertheless, the cost of sequencing a crop genome was barely affordable at that time. Advances in next-generation sequencing technologies have greatly reduced the sequencing cost and labor required. It is expected that the sequencing cost can soon be reduced from several million to several thousand US dollars. This makes the sequencing of crop genomes more accessible to researchers, and hence more and more crop genomes are being sequenced in the hope of speeding up crop research. In recent years, in addition to sequencing crop genomes, efforts have also been made to sequence the genomes of their wild relatives (Table 1). These data should facilitate a greater understanding of the evolutionary and domestication relationships between crops and their wild relatives, while also providing solid ground for mining important genetic resources from the wild species.
Comparative population genomic analyses have confirmed that wild species tend to have higher genetic diversities, making the wild relatives promising natural reservoirs of potential genes/alleles for crop improvement. Wild soybeans have been shown to have higher genetic diversities over cultivated soybeans . In addition to the overall genetic diversity, researchers have also uncovered specific gene sequences unique to wild soybeans that confer enhanced disease resistance and metabolic functions , which serve as good candidates for soybean improvement. In contrast, a study using wild soybeans in Korea also identified possible gene loss events in wild species . The discrepancy in these two studies may imply that the outcome of any comparative genomic study on wild germplasms really relies on the diversity of the wild collections. Qi et al. (2014)  conducted the de novo sequencing of a wild soybean, Glycine soja (G. soja) W05 helping to build a deeper understanding of the wild soybean genome and demonstrating the potential of its use for crop improvement. Li et al. (2014)  also published the de novo assembly of 7 wild and cultivated soybeans and provide a pan-genome analysis identifying lineage-specific genes, copy number variations and mutations that are eventually associated with positive human selection for certain agronomic traits, making this an important source of information regarding wild soybean genetic diversity. Using a complete re-sequencing approach, Zhou et al. (2015)  analyzed the genomic diversities of 302 lines of wild soybeans, landraces and improved varieties. They also characterized important genomic regions associated with domestication and improvement for important agronomic traits using genome wide association studies and discovered that some traits are closely associated with specific geographic regions.
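To make the notion of "higher genetic diversity" concrete, the following minimal Python sketch computes nucleotide diversity (π, the average number of pairwise differences per site) for two sets of aligned sequences. It is only an illustration: the function name `nucleotide_diversity` and the toy "wild" and "cultivated" sequences are assumptions made for this example and are not data from the studies cited above.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Return pi: mean pairwise differences per site among aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    # Count mismatching sites for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignments (not real soybean data).
wild = ["ACGTACGTAC", "ACGTTCGTAC", "ACCTACGTAA", "TCGTACGAAC"]
cultivated = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]

if __name__ == "__main__":
    print("pi (wild):      ", nucleotide_diversity(wild))
    print("pi (cultivated):", nucleotide_diversity(cultivated))
```

In real analyses π is estimated from genome-wide variant calls with appropriate handling of missing data; the sketch only shows the quantity being compared between wild and cultivated collections.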
In 2014, a high-quality reference genome of Andean (Peruvian) common bean (P. vulgaris) landrace (G19833) was published along with the pooled re-sequencing analysis of 30 wild individuals from the Mesoamerican and Andean populations. The results suggested that the wild Mesoamerican populations are more genetically diverse than those from the Andes. Both populations are also substantially different based on Fst values (fixation index) (Appendix A), and the divergence probably occurred ~165,000 years ago. An interesting contradiction occurred when comparing landraces and wild relatives within each of the two genetic pools. Landraces from Mesoamerica are less diverse than their wild counterparts while the Andean landrace populations are more diverse than their wild Andean relatives . Likewise, the recently published genome assembly of common bean BAT93 with a Mesoamerican origin, together with transcriptomic and phylogenetic analyses, suggests that most of the bean-specific gene family expansions predated the differentiation between Mesoamerican and Andean gene pools and consequently prior to domestication. The latter results suggest that pre-existing adaptations could contribute to the subsequent domestication process . However, a complete analysis of 577 accessions of common bean revealed the existence of several genetic groups and the presence of varying degrees of diversity in Mesoamerica and the Andes based on the genetic-spatial patterns of wild common bean. An interesting landscape genetics approach demonstrated that demographic processes and natural selection are correlated with the characterized genetic structure. This can be a source of potentially important genes associated with the adaptation to specific local environmental conditions .
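As an illustration of the fixation index mentioned above, the sketch below computes Wright's Fst at a single biallelic locus as (H_T − H_S) / H_T, where H_T is the expected heterozygosity of the pooled population and H_S is the mean expected heterozygosity within populations. This is a simplified textbook form assuming equal population weights; the function name `fst_biallelic` and the allele frequencies are hypothetical, and the cited studies may have used different estimators.

```python
def fst_biallelic(allele_freqs):
    """Wright's F_ST at one biallelic locus.

    allele_freqs: frequency of the same allele in each population
    (populations are weighted equally, an assumption of this sketch).
    """
    if len(allele_freqs) < 2:
        raise ValueError("need at least two populations")
    if any(p < 0.0 or p > 1.0 for p in allele_freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    n = len(allele_freqs)
    p_bar = sum(allele_freqs) / n
    h_total = 2.0 * p_bar * (1.0 - p_bar)          # pooled heterozygosity
    if h_total == 0.0:
        return 0.0                                  # locus is monomorphic
    h_within = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / n
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical frequencies in two gene pools (illustrative only).
    print(fst_biallelic([0.9, 0.1]))   # strongly differentiated
    print(fst_biallelic([0.5, 0.5]))   # no differentiation
```

Values near 0 indicate that the populations share similar allele frequencies, while values near 1 indicate nearly fixed differences between them.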
The draft genome sequence of chickpea (Cicer arietinum CDC Frontier, a kabuli variety) published in 2013 together with the re-sequencing of 29 elite (17 desi and 12 kabuli) varieties allowed us to get a first glimpse at the genetic history of chickpea accessions . The results suggested that the genetic diversity in the desi group was slightly higher than in the kabuli group, but population structure, diversity and phylogenetic analyses showed a mixing of desi and kabuli genotypes (Appendix A) during breeding processes . In the same year, the draft genome sequence of chickpea (Cicer arietinum ICC4958, a desi variety) was also published  and the final version of this genome was released in 2015 . The evidence suggests that the kabuli-type chickpea was recently derived from the desi-type through artificial selection for increased seed size, from a small gene pool . This hypothesis is supported by results showing that the divergence of the two chickpea types occurred about 8000 years ago. However, an older initial divergence can also be detected about 160,000 to 250,000 years ago . Recently, the SNP (Appendix A)-genotyping of 93 wild and cultivated Cicer spp. accessions on a genome-wide scale revealed the natural allelic diversity, population genetic structure, phylogeny, etc., within a wider genetic pool . High intra- and inter-specific polymorphic potential (66%–85%) and broader natural allelic diversity (6%–64%) were described, suggesting a great potential for the discovery of new alleles of importance for specific geographical origin and phenotypic characteristics in the wild Cicer primary gene pool .
Peanut (Arachis hypogaea) is an allotetraploid (AABB-type genome; 2n = 4x = 40) apparently derived from hybridization between two diploid species followed by polyploidization [29,84]. A recent sequencing effort reported the genome sequences of its diploid ancestors, Arachis duranensis and Arachis ipaensis, which carry the A and B subgenomes, respectively. The information generated, together with empirical evidence from crossing these two species, supports the proposed hypothesis of the origin of the cultivated peanut. Armed with this genomic information on the diploid ancestors, the researchers were able to characterize genomic regions and genes probably associated with disease resistance, and to describe important genomic features such as gene evolution, DNA methylation, and transposons, thus laying the foundation for more in-depth peanut genomics studies.
Since the initiation of the pigeonpea genome project, a concerted international effort has been made to produce the first draft genome of pigeonpea. The cultivated pigeonpea genome shows extensive synteny with other legumes, including those belonging to different clades, but lacks the recent genome duplication found in soybean. Population structure analyses using SNPs from 79 pigeonpea accessions and 31 wild relatives disclosed information about its domestication history and relationships with wild species. Evidence suggests that recent gene flow between cultivated and non-cultivated forms probably occurred as a result of frequent cross-pollination between this diploid crop and its wild relatives. The gene pool of wild pigeonpea shows high genetic diversity, and the presence of rare alleles is potentially important for crop improvement.
The sequencing of the lentil, pea and cowpea genomes is being undertaken by the scientific community. Lentil Genome v1.2 is available in a pre-release form. Pea genome data are now available at the Unité de Recherche Génomique Info (URGI). The elucidation of the cowpea genome is being advanced by the Cowpea Genomics Initiative (CGI), and important progress has been achieved in understanding the genetic diversity among Vigna species, mainly by using traditional markers [90,91,92,93,94]. A recent study using genotyping-by-sequencing of globally cultivated cowpea genotypes (768 genotypes from 56 countries) revealed the worldwide distribution of genetic diversity and structure. It suggested the existence of three genetically well-differentiated populations associated with the areas where the genotypes were collected, supporting the hypothesis of two areas of domestication: West and East Africa as the first and India as a sub-domestication region of cowpea. A previous report also showed extensive gene flow between wild and domesticated types.
Lupin is an interesting complex of species of the genus Lupinus, and its domestication and cultivation have origins in both the New and the Old World. There are more wild germplasms than cultivated ones, and the main repository institutions are summarized in an excellent review. However, much work still needs to be done to obtain genomic information for all lupin species. The lupin genomes vary widely in chromosome number. The taxonomy has been confusing, but it is constantly being improved [14,49,97,98,99]. Different morphological and ecological adaptations are associated with lupins from the New World versus those from the Old World. Given the current importance of L. angustifolius, a draft genome of the narrow-leafed cultivar Tanjil has recently been published, providing useful information for understanding the genetic basis of traits in the genistoid clade of Papilionoideae legumes and for facilitating genomics-based breeding approaches.
Complete genome releases and continual improvement in genome quality will undoubtedly accelerate the improvement of important traits of cultivated grain legumes [101,102]. The availability of genomic information on incompletely sequenced grain legumes is critical, considering that many of them are important staples in many poor countries. Therefore, more time and money should be invested in increasing our knowledge of these neglected grain legumes, in order to facilitate the improvement of specific features suitable for very small family farms and/or alternative production systems.
The genome information of both the crops and their wild relatives has also served as the foundation for studies such as comparative genomics, functional genomics (transcriptomics, proteomics and epigenomics), association mapping (Appendix A) and gene discovery. High-density markers generated by sequencing are important for precise marker-assisted selection (Appendix A) of targeted beneficial genomic regions and for the removal of undesirable regions carried over from the wild species. Furthermore, genome-wide high-density markers are also important for genomic selection, and the current unprecedented availability of technologies and genomic information should have a significantly positive impact on the quality, diversity and speed of breeding programs. The major breeding strategies for grain legumes, taking into consideration genome size, ploidy, genome availability and the number of accessions in the main holding institutes in the world, have recently been summarized [13,14].
In the following section, we will discuss the population approaches for dissecting the genetic variabilities and looking for important genomic regions associated with traits for nine grain legumes, especially focusing on the use of wild relatives as potential reservoirs of variability.
In the past, genetic mapping could locate only the approximate positions of target loci, expressed as the distance from genetic markers in centimorgans (cM). To identify the target gene, a large genetic population was needed to narrow the locus down to a small region of the genome. BAC (bacterial artificial chromosome) sequencing or primer walking might then be needed to identify the genes linked to the genetic markers. With genome sequencing, the situation has improved. First, the physical positions of markers can be found in a well-annotated reference genome. Second, sequences and gene models within the locus of interest can be examined to pinpoint possible candidate genes for more in-depth studies. Third, by examining the genomic sequences, non-synonymous SNPs, InDels, and CNVs can be discovered, which can then be used to explain the phenotypic differences.
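Map distances in cM are commonly derived from the observed recombination fraction between two markers by means of a mapping function. The sketch below shows the two standard ones, Haldane and Kosambi, under the assumption that the recombination fraction has already been estimated from a mapping population; the function names and the counts used are hypothetical and are not taken from any study discussed here.

```python
import math


def recombination_fraction(recombinants, total):
    """Observed fraction of recombinant progeny between two markers."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("invalid progeny counts")
    return recombinants / total


def haldane_cm(r):
    """Haldane map distance in cM (assumes no crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM (allows for some interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


if __name__ == "__main__":
    # Hypothetical example: 20 recombinants among 200 progeny.
    r = recombination_fraction(20, 200)
    print(f"r = {r:.3f}")
    print(f"Haldane: {haldane_cm(r):.2f} cM")
    print(f"Kosambi: {kosambi_cm(r):.2f} cM")
```

For small recombination fractions both functions give distances close to 100 × r, which is why a large population, with many observable recombination events, is needed to resolve closely linked loci.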
An advantage of using wild crop relatives over unrelated species is that the former are more likely to produce fertile offspring with the domesticated crops, for generating mapping populations or for breeding purposes. Genetic mapping can involve the generation of different kinds of populations, such as unrelated populations (mini-core collections), advanced backcross (A-BC) populations, recombinant inbred lines (RILs), near isogenic lines (NILs), nested association mapping (NAM) populations and multi-parent advanced generation inter-cross (MAGIC) populations. Each method has its own advantages and serves a unique purpose (Figure 1).
Wild relatives have played unique roles in association mapping. In general, linkage disequilibrium (LD) in crop genomes decays slowly with distance [78,111,112,113]. The resolution of genome-wide association studies (GWAS) usually depends on the size of the LD blocks. Therefore, the resolution of maps generated from cultivated varieties alone tends to be low. Owing to the higher genetic diversity and probably higher outcrossing rates among wild germplasms, the genomes of wild relatives usually show faster LD decay [78,111,112,113], and thus they can serve as better materials for GWAS than the cultivated crops.
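LD between two loci is often summarized by the squared correlation r², and LD decay is then described by how quickly r² falls with physical distance. As a minimal, hedged illustration, the sketch below computes r² for a pair of biallelic loci from phased haplotypes coded as 0/1. The function name `ld_r_squared` and the two toy haplotype panels are assumptions for this example only.

```python
def ld_r_squared(haplotypes, i, j):
    """r^2 between biallelic loci i and j from phased 0/1 haplotypes."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(h[i] for h in haplotypes) / n          # allele freq at locus i
    p_b = sum(h[j] for h in haplotypes) / n          # allele freq at locus j
    p_ab = sum(h[i] * h[j] for h in haplotypes) / n  # freq of the 1-1 haplotype
    d = p_ab - p_a * p_b                             # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical panels: alleles at two loci per haplotype.
cultivated_panel = [(0, 0), (0, 0), (1, 1), (1, 1)]   # alleles always co-occur
wild_panel = [(0, 0), (0, 1), (1, 0), (1, 1)]         # alleles are shuffled

if __name__ == "__main__":
    print("r^2 (cultivated-like):", ld_r_squared(cultivated_panel, 0, 1))
    print("r^2 (wild-like):      ", ld_r_squared(wild_panel, 0, 1))
```

When r² stays high over long distances, a trait-associated marker may be far from the causal gene, which is why faster LD decay in wild germplasm can translate into finer mapping resolution.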
On the other hand, it has also been demonstrated that some QTLs (Appendix A) fixed by domestication can hardly be mapped using cultivated populations. For example, two 100-seed weight loci on Chromosome 12 of the soybean genome suggested to be related to domestication were only found in the wild soybean-derived populations and not the cultivated soybean-derived populations. Hence, mapping involving wild relatives may help discover more domestication-related loci which may also be important for crop improvement.
Genetic mapping is important for the identification of genes/loci controlling specific agronomic traits. There have been many mapping studies using populations derived from wild crop relatives. In this review, we specifically collected all the information about population approaches in pulses. For any crop, the variability in the genomes of the wild relatives is high and thus provides a higher number of genetic markers for mapping. In addition to the traditional markers such as variable length polymorphisms, variable number of repeats, InDel markers and SNPs, there is currently an increased reliance on array-based and sequencing-based markers such as Diversity Arrays Technology (DArT), restriction site-associated DNA markers (RAD), reduced-representation libraries (RRLs), complexity reduction of polymorphic sequences (CRoPS), bin markers and other new technologies. Meanwhile, whole-genome sequencing has played a pivotal role in genetic mapping. In theory, whole-genome sequencing can generate the highest density of markers. While array-based SNP detection is limited by the number of probes on the array, any SNP covered by sequencing reads can be used as a marker. However, in reality, constrained by the sequencing depth (a factor of the operation cost) and the high error rates of next-generation sequencing, a “bin”, which is an array of high-confidence SNPs detected by a sliding window, is often used instead of any single SNP. Genotyping-by-sequencing (GBS)-based mapping was first successfully demonstrated in rice. GBS can greatly reduce the financial cost and labor required for linkage mapping compared to traditional mapping using PCR-based markers [120,121].
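The idea of a sliding-window "bin" can be sketched in a few lines. The example below takes noisy per-SNP parental calls along a chromosome of one line, calls a genotype for each window by the proportion of parent-A alleles, and then merges consecutive windows with the same call into bins. The window size of 5 SNPs, the 0.7 threshold, the `A`/`B`/`H` coding, and the function names are assumptions chosen for this illustration and are not the parameters used in the cited rice or legume studies.

```python
from typing import List, Tuple


def window_calls(snp_calls: str, window: int = 5, min_frac: float = 0.7) -> List[str]:
    """Call one genotype per sliding window from noisy per-SNP parental calls.

    snp_calls: string of 'A' (parent 1 allele) and 'B' (parent 2 allele),
               ordered by physical position.
    Returns 'A', 'B', or 'H' (mixed or ambiguous) for each window start.
    """
    if window < 1 or window > len(snp_calls):
        raise ValueError("window must be between 1 and the number of SNPs")
    calls = []
    for start in range(len(snp_calls) - window + 1):
        chunk = snp_calls[start:start + window]
        frac_a = chunk.count("A") / window
        if frac_a >= min_frac:
            calls.append("A")
        elif frac_a <= 1 - min_frac:
            calls.append("B")
        else:
            calls.append("H")
    return calls


def merge_bins(calls: List[str]) -> List[Tuple[int, int, str]]:
    """Collapse runs of identical window calls into (first, last, genotype) bins."""
    bins: List[Tuple[int, int, str]] = []
    for i, call in enumerate(calls):
        if bins and bins[-1][2] == call:
            bins[-1] = (bins[-1][0], i, call)    # extend the current bin
        else:
            bins.append((i, i, call))            # start a new bin
    return bins


# A single miscalled SNP ('B' inside an 'A' block) is smoothed away.
noisy_block = merge_bins(window_calls("AAABAAAAAA"))

# A genuine recombination breakpoint yields an A bin, a transition bin, and a B bin.
breakpoint_bins = merge_bins(window_calls("AAAAABBBBB"))
```

The design choice here is to trade positional precision for robustness: an isolated sequencing error no longer creates a spurious double recombinant, at the cost of a short ambiguous window around each true breakpoint.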
GBS has thus far been used to map many important production-related loci in crops. Nevertheless, there have so far been few successful cases of mapping genes from wild relatives in grain legumes. GBS of a unique recombinant inbred (RI) population of G. max × G. soja successfully identified a major QTL conferring salt tolerance in wild soybean. By combining this with an association study of the resequencing consensus of 20 unrelated germplasms and a comparison of the de novo genomes of cultivated and wild soybean, the authors identified the causal gene for salt tolerance in the wild parent as a gene encoding a cation/proton exchanger (GmCHX1). Similarly, a potential multidrug and toxic compound extrusion (MATE) transporter has recently been identified as being associated with the total contents of antioxidants, phenolics, and flavonoids in soybean seeds. This common genomic region for the three groups of compounds can explain up to 64% of the phenotypic variance under field conditions (Table 2). Another trait studied using GBS is resistance to sclerotinia stem rot disease in soybean. After genotyping 101 soybean lines with different levels of resistance, the researchers found three major QTLs, distributed on chromosomes Gm03, Gm08 and Gm20 (Table 2). An approach combining a large soybean collection, including wild germplasms, with whole-genome resequencing allowed researchers to identify important genes related to domestication and crop improvement, making use not only of the new sequencing technologies but also of the information generated during many years of characterizing QTLs and genes in soybean. Soybean root architecture and total fresh weight seem to be clearly associated with a specific region in the soybean genome. Similar results have been generated by two independent groups using different wild and cultivated soybean populations and approaches (Table 2).
In common bean, mapping populations derived from crosses between cultivated lines and wild accessions or landraces have been developed to dissect agronomic traits such as white mold resistance (NILs and BC) [125,126] as well as seed weight, seed size, days to flowering, yield, plant height and the concentration of minerals such as Zn and Fe in seeds [127,128] (Table 2). The recently available information on the common bean genome (Table 1), together with the re-sequenced genome, transcriptome and methylome [41,72,129,130], will allow the scientific community to speed up the dissection of important traits for this grain legume using GBS approaches [131,132,133].
In the case of chickpea, the role of wild species for mapping and crop improvement has been extensively reviewed [160,161] and huge efforts have been made to develop different mapping populations including wild chickpea and landraces (Table 2) for traits such as flowering time, 100-seed weight, pod and branch number per plant, plant hairiness, seed yield per plant, etc. [137,138,139,140,141]. During the past several years, important articles reporting on the genetics and genomics of chickpea have deepened our understanding of this legume and contributed to the possibility of developing new approaches to improve this crop by characterizing the genetic diversity in the wild species [68,69,70,71,83,140,162,163,164,165].
The development of mapping populations using wild or diploid ancestors of peanut has presented a special challenge, given the ploidy difference and sexual incompatibility between wild and cultivated peanut. Therefore, generating the mapping populations sometimes involves the development of wild synthetic allotetraploids. Still, there are five examples of using wild relatives to successfully map important genetic regions controlling root-knot nematode resistance, drought-related and agronomic/domestication-related traits, flowering precocity, seed and pod numbers, pod length and size, pod maturity time, height of the main stem, plant spread, flower color and late leaf spot resistance [142,143,144,146,167] (Table 2). Janila et al. (2016) also summarized the history and perspectives of genetics and genomics-assisted breeding (Appendix A) in peanut to date, highlighting the importance of wild relatives as a source of novel alleles.
In the case of pea, an RIL population between the wild relative, Pisum sativum subsp. syriacum, and a cultivated line has been developed, revealing six QTLs related to Mycosphaerella pinodes resistance (Table 2). Interesting QTLs related to domestication features have been described using five populations generated by crossing lines representing different stages of domestication (e.g., wild, landrace, etc.) (Table 2). Pea landraces tolerant of abiotic stresses such as frost, drought and high temperature have been identified as having great potential as germplasms for breeding. Although there have not been many studies on QTL mapping using wild relatives together with traditional or new GBS-derived markers up to now, the use of GBS is starting to rise for pea. For instance, a study using RILs from cultivated lines has yielded high-density and high-quality SNP markers with great potential. A recent review summarized the current status of genomic tools in pea breeding programs, which could be applied to better explore the wild germplasm of pea.
As a globally popular food crop, lentil has attracted more and more attention from researchers. The wild lentil relative, Lens ervoides “Brign”, was crossed with the cultivated lentil line, “Eston”, and the resulting RIL population was then phenotyped for 23 important and complex traits including anthracnose resistance. There is great potential for this population to be genotyped using new technologies and further explored (Table 2). In addition, next-generation sequencing of both the wild and cultivated lentils revealed a large collection of SNPs and improved the genotyping platform for the mapping of the L. culinaris genome.
Pigeonpea is a cross-pollinated diploid crop. Great efforts have been made by the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) over the years to increase productivity. Diverse breeding strategies are continuously being developed for pigeonpea improvement. The development of cytoplasmic male sterile lines for hybrid breeding has systematically and steadily been generating the most promising materials available [151,172,173,174]. Cross-incompatibility barriers between cultivated pigeonpea and its wild relatives have not hindered its improvement [175,176]. A number of interesting examples show the improvement of this grain legume using traits from alien germplasms, such as abiotic stress (salinity) tolerance and resistance to biotic stress (fusarium wilt, phytophthora blight and cyst nematode), genetic dwarfs, high protein content and special nutritional value, cleistogamy, male-sterile lines, etc. [149,150,151,152,175,177] (Table 2). A recent review summarized the current status of genomics-assisted breeding in pigeonpea after the release of the first draft genome.
Cowpea is a readily self-pollinating crop. Using cultivated cowpea as well as wild relatives and landraces, a considerable number of mapping populations have been developed during the past 30 years. Close cross-compatible relatives have been explored as genetic reservoirs and donor parents of important agronomic traits [178,179,180]. The available high-marker-density linkage map and the use of synteny with other legumes have increased the possibilities of searching for regions and genes associated with agronomically important traits, and a series of databases strongly facilitates cowpea breeding. Most of them have been developed using cultivated cowpea genotypes, demonstrating a great potential for increasing the genetic diversity of this cultivated pulse. A set of traits, such as floral scent compounds, seed size, pod fiber layer thickness, seed weight, time of flower opening, and days to flower, have also been improved using genetic resources from wild relatives [154,155,156] (Table 2). Some population approaches evaluating pod length, pod tenderness and domestication-related traits have also been developed in yardlong bean (Vigna unguiculata [L.] Walp. ssp. unguiculata cv.-gr. sesquipedalis), which, interestingly, has evolved from cowpea by divergent domestication in Asia [157,158,159].
In the case of lupins, mapping studies have been done, either through traditional molecular markers or high-throughput sequencing techniques, to complement breeding programs for different species within the complex [75,99,182,183,184,185]. However, crossing barriers between species of Lupinus, which either block hybridization completely or allow it only at low frequency, make interspecific gene transfer too complicated for improving specific agronomic traits. Some examples of interspecific crosses have been reported, but the situation is far too complicated for such crosses to be a useful tool for lupin breeding.
Taking advantage of the genetic diversity of wild relatives of crop plants to improve the performance of crops is not a novel concept, but it is worth noting that, for grain legumes, it is often difficult to trace back the breeding history and the real impact on crop production. In contrast, the record-keeping on other major crops seems to be a little more comprehensive. It seems that grain legumes, especially those cultivated in more limited geographical areas, have been systematically neglected.
The advent of genomics and genomics-assisted breeding has expanded our understanding of complex traits, allowing us to dissect the genetic bases of traditional and new important agronomical traits, not only to increase productivity or adapt to climate change, but also to develop alternative food production systems tailored to poor areas and small farms to grow more and better food.
The use of genomics-assisted breeding using wild relatives, particularly in grain legumes, should be intensified. The skepticism of plant breeders about making use of wild and exotic plant genetic resources, due to the associated linkage drag, is gradually being overcome because the exploration of expanded gene pools is providing us with unprecedented opportunities to discover major genes controlling important traits. Once the major genes or genomic regions have been characterized, it would be worthwhile to move forward with their introgression into the cultivated populations and even consider transgenesis, genome editing or the less controversial mutagenesis. Scientific developments and the knowledge generated therefrom using wild relatives are revolutionizing our understanding of biological processes. However, it has been cautioned that the improvement of specific traits using wild relatives masks the true potential of genetic diversity in the wild relatives for breeding. Clearly, the potential of using wild relatives goes beyond crop improvement itself. It is a promising scenario in which international collaborations should arise and deepen in order to contribute to increased food production, environmental sustainability and better quality of life for future generations.
In Table 2, we summarize published works on genes or genomic regions associated with a wide range of important traits from reproductive features to interactions with microorganisms, by mining the genetic resources in wild relatives for different types of population development. In this work, we have compiled the most up-to-date information on population approaches using wild relatives of the nine grain legumes. Only a few of these major works have been performed using available genotyping technology by deep sequencing, and even fewer of them explored the diversity of genomic information for the characterized regions. Finally, the various grain legume populations already generated and summarized in this review are in themselves great resources for further exploration and deeper genomic and genetic analyses, especially with respect to the -omics information on their wild versus cultivated germplasms.
Although the value of wild relatives has long been recognized, there always seems to be a repeated pattern of recognizing their potential importance following periods of calamity in food production. One example is the great interest in wild crop relatives after the Second World War, and a second followed the devastating losses in maize production that occurred in the USA in 1970. Both periods led to significant advances in the collection and evaluation of wild materials, culminating in the formation of the International Plant Genetic Resources Institute (today operating under the name Bioversity International).
The contribution of wild relatives to modern agriculture, helping to increase yield, quality and disease resistance, is significant. Joint efforts from plant breeders and collaborative institutions across the world have contributed to the examples of success. Linkage drag and reproductive barriers are difficult hurdles to overcome, and the contributions made thus far to dealing with these issues should be especially recognized. However, the genetic erosion of the most important crops remains problematic, particularly when we are facing more frequent periods of flooding, drought and/or diseases with global warming, coupled with intensive agricultural systems. In this regard, the use of wild relatives may be the solution.
Tanksley and McCouch proposed the paradigm shift from “looking at the phenotype” to “looking at the gene” during the screening of exotic germplasm, shifting away from selecting potential parents based on phenotype to evaluating directly the presence of novel genes. The examples given in this review are some of the biggest successes, and plant breeders have continued working on this idea. With the emergence of next-generation sequencing techniques, several studies using wild relatives have been published, generating important information about genetic diversity, population structure, gene expression, methylation patterns, adaptation mechanisms, etc. Our ability to characterize and understand genetic variability is the basis for this paradigm shift, which relies on examining the genetic composition rather than the phenotype. We face a big challenge in that the information on characterized genetic diversity generated by researchers around the world is scattered across various genetic resource centers. It is important to characterize not only the traditional morpho-agronomic traits but also those more complex traits that are crucial for enhancing the potential to adapt to future climate scenarios. Characterization of, and access to, core and mini-core collections of pulse diversity are fundamental requirements for any breeding approach [190,191]. The organization and availability of this vast amount of new information must be made a worldwide priority in order to facilitate the use of these genetic resources [192,193]. In a review for the International Year of Pulses, the number of accessions and locations of grain legume collections around the world have been collated.
The discovery of new wild alleles controlling specific traits in specific crops should be facilitated by the centralization of genomics information into large databases that contain important information such as the expression of specific genes in different tissues, at different developmental stages or under different stress conditions.
One major limitation in the utilization of crop wild relatives in breeding programs is due to major gaps in the genetic diversity of “gene pools”. The availability of crop wild relatives could be hampered by many factors, such as loss of natural habitats. “Gap analysis” is a tool to assess genetic conservation and to formulate conservation strategies by prioritizing among taxa containing gaps due to sampling, geographic and environmental factors. The power of this tool was demonstrated by a case study of the Phaseolus gene pool [195,196].
A big international effort is underway with the aim of adapting agriculture to climate change, which includes collecting, protecting and preparing crop wild relatives. Several pulses are among the major targets: common bean, adzuki bean, chickpea, cowpea, faba bean, groundnut, lentil, lima bean, mung bean, pea, pigeonpea, soybean, urd bean and vetch [197,198]. The information generated and systematized from this project will certainly be a unique source of information and materials for facing the current and future challenges for agriculture in the context of crop wild relatives use.
This work is supported by grants from the Hong Kong Research Grants Council (14108014), the Lo Kwee-Seong Biomedical Research Fund, and the CUHK VC Discretionary Fund (4930716 and 4940734). Nacira Muñoz was supported by a fellowship from Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina. Jee-Yan Chu copy-edited this manuscript.
|QTL||Quantitative trait locus|
|CNV||Copy number variation|
|A-BC||Advanced backcross population|
|RIL||Recombinant inbred line|
|NAM||Nested association mapping|
|MAGIC||Multi-parent advanced generation inter-cross|
|GWAS||Genome-wide association studies|
|SNP||Single nucleotide polymorphism|
|DArT||Diversity Arrays Technology|
|RAD||Restriction site-associated DNA marker|
|CRoPS||Complexity reduction of polymorphic sequence|
|IBL||Inbred backcross line|
|MATE||Multidrug and toxic compound extrusion|
|SRAP||Sequence-related amplified polymorphism|
|TRAP||Target region amplification polymorphism|
|SSR||Simple sequence repeat|
|RAPD||Random amplified polymorphic DNA|
|AFLP||Amplified fragment length polymorphism|
|Gene||A DNA sequence that determines the appearance of hereditary characteristics in living organisms|
|Allele||Each alternative form of a gene, occupying the same position in each pair of homologous chromosomes|
|Fst (fixation index)||Measure of population differentiation due to the genetic structure|
|Phenotype||A set of inherited characteristics that is dependent on both the genes and the environment|
|Genotype||A set of genes that are characteristic of each organism or individual|
|QTL (quantitative trait locus)||A locus on the chromosome that is associated with the quantitative variation of a trait|
|Association mapping||A germplasm-based approach to characterize QTLs or variations by exploiting the historic linkage disequilibrium to associate phenotypes with the underlying genotypes|
|Joint-linkage association mapping||A family-based approach to characterize QTLs or variations that are shared across families. It differs in power and scope from the characterization of QTLs based on bi-parental populations.|
|Marker-assisted selection||A set of tools that use gene markers to select, in a precise manner, the plants with the genetic potential to produce the desired trait for breeding|
|Genomics-assisted breeding||A set of genomics tools (using high-throughput approaches) to select in a targeted manner the plants with the genetic potential to produce the desired trait for breeding|
|Introgression||The gene flow from one genetic background (individual) to another gene pool (individual) by backcrossing with one of its parents|
|Backcross||Crossing of offspring lines with one of the original parental lines|
|Linkage drag||Offspring with undesirable genetic background inherited from one of the parental lines|
|SNP (Single Nucleotide Polymorphism)||Single-nucleotide variations on the DNA sequence within a population or between paired chromosomes|
Nacira Muñoz, Ailin Liu, Leo Kan, Man-Wah Li and Hon-Ming Lam analyzed the literature and co-wrote the manuscript. Hon-Ming Lam coordinated the writing efforts. Nacira Muñoz and Ailin Liu designed the framework of the manuscript.
The authors declare no conflict of interest. The funding sponsors had no role in the collection, analyses, or interpretation of literature, in the writing of the manuscript, and in the decision to publish the results.