|Home | About | Journals | Submit | Contact Us | Français|
Rodents are recognized as hosts for at least 60 zoonotic diseases and may represent a serious threat for human health. In the context of global environmental changes and increasing mobility of humans and animals, contacts between pathogens and potential animal hosts and vectors are modified, amplifying the risk of disease emergence. An accurate identification of each rodent at a specific level is needed in order to understand their implications in the transmission of diseases. Among the Muridae, the Rattini tribe encompasses 167 species inhabiting South East Asia, a hotspot of both biodiversity and emerging and re-emerging diseases. The region faces growing economical development that affects habitats, biodiversity and health. Rat species have been demonstrated as significant hosts of pathogens but are still difficult to recognize at a specific level using morphological criteria. DNA-barcoding methods appear as accurate tools for rat species identification but their use is hampered by the need of reliable identification of reference specimens. In this study, we explore and highlight the limits of the current taxonomy of the Rattini tribe.
We used the DNA sequence information itself as the primary information source to establish group membership and estimate putative species boundaries. We sequenced two mitochondrial and one nuclear genes from 122 rat samples to perform phylogenetic reconstructions. The method of Pons and colleagues (2006) that determines, with no prior expectations, the locations of ancestral nodes defining putative species was then applied to our dataset. To give an appropriate name to each cluster recognized as a putative species, we reviewed information from the literature and obtained sequences from a museum holotype specimen following the ancient DNA criteria.
Using a recently developed methodology, this study succeeds in refining the taxonomy of one of the most difficult groups of mammals. Most of the species expected within the area were retrieved but new putative species limits were also indicated, in particular within Berylmys and Rattus genera, where future taxonomic studies should be directed. Our study lays the foundations to better investigate rodent-born diseases in South East Asia and illustrates the relevance of evolutionary studies for health and medical sciences.
Among mammals, rodents are recognized as major hosts and vectors of parasites and pathogens, some of them causing important zoonoses and representing a serious threat for human health [1-5]. Most epidemiological studies have focused on the most common rodents with emphasis on commensal species such as the laboratory rat, Rattus norvegicus. A common assumption is that the rodent species responsible for disease transmission are those living close to humans, but since wild species distant from human settlements have been proven to play a key role in maintaining, spreading and transmitting pathogens and parasites (e.g. ), this point of view is being questionned. Specific diversity within the host community has also been shown to play an important function in the maintenance of a disease and in the probability of its transmission to humans [6,7]. Consequently, researchers are now focusing not on a single particular host species but on the whole host community and are endeavouring to understand the role of each rodent species in the context of the entire host-pathogen community.
Today this knowledge is more urgent than ever since biodiversity in many areas is being altered rapidly by the ongoing global change. Because of anthropogenic disturbances, the host-pathogen interactions are being dramatically modified leading to new and unexpected disease risks and the emergence and/or re-emergence of infectious diseases [6-10]. To be able to predict and to anticipate some of these risks, one should be able in the case of rodent host communities, to identify first and foremost each rodent at a specific level, a real challenge when considering that rodents represent 40% of mammalian species  including many cryptic species, and that new genera and species are yearly described (e.g. Laonastes aenigmamus, ; Saxatilomys paulinae, ; Mayermys germani, ; Tonkinomys daovantieni, ).
Among Muridae rodents, the Rattini tribe encompasses 35 genera corresponding to 167 rat species  following the tribal arrangement of the Murinae proposed by Lecompte et al. . Nearly all representatives of this tribe inhabit South East Asia, a major hotspot of biodiversity  faced with a runaway economic growth damaging habitats, biodiversity and health but also a hotspot of emerging and re-emerging diseases [19,20]. If the partition of the tribe among five divisions (i.e. Crunomys, Dacnomys, Maxomys, Micromys and Rattus divisions) [16,17] is widely accepted, its taxonomy remains however largely untested phylogenetically and its delimitations are not yet secured. Chiropodomys, Vandeleuria, Hapalomys, Haeromys and Vernaya genera were included in the Micromys division by Musser and Carleton . As the Eurasian harvest mouse, Micromys was proven to belong to the Rattini tribe ([17,21]), the whole Micromys division should belong to the Rattini tribe if Musser and Carleton's assumption is right. However, some of these genera (i.e. Chiropodomys and Vandeleuria) were recently shown to be unaffiliated to Micromys according to molecular evidences , while putative representatives of the Rattini tribe (i.e. Tonkinomys daovantieni, Saxatilomys paulinae, Srilankamys sp., Hapalomys sp., Haeromys sp., Vernaya sp.) have not been investigated using molecular data and are currently considered as Murinae incertae sedis . Numerous rat species have been demonstrated or postulated as major hosts of pathogens (e.g. Hantaviruses described from bandicoot rat, Bandicota indica in Thailand, [22,23]; Bandicota indica, B. savilei, Berylmys berdmorei, Niviventer sp., and Rattus sp. serologically tested positive for Rickettsia tsutsugamushi, the agent responsible for scrub typhus ; etc.). Although easily identified at a generic level by an expert, Asian rats are often difficult to discriminate at a specific level using morphological or cytological criteria. The wide range of intra-specific morphological variation makes morphological criteria unsuitable for accurate rat species identification and has led to an over-description of species and to a confusing taxonomy, hampered by an overabundance of synonyms. It is particularly true concerning the Rattus genus (e.g. 41 synonyms for R. norvegicus, 83 for R. rattus, etc.  and see also ) that consists of a heterogeneous accumulation of species and of several monophyletic clusters that may or may not prove to be grouped in a single genus . This polyphyletic pattern is highlighted by the six species groups proposed by Musser and Carleton  (i.e. the Rattus rattus, Rattus exulans, Rattus norvegicus, Rattus fuscipes, Rattus leucopus and Rattus xanthurus species groups) and a seventh assemblage containing unaffiliated species (i.e. the Rattus species group unresolved) for which phylogenetic affinities are uncertain; some representatives will eventually be removed from the genus. Even karyotypic criteria, which previously claimed to be species diagnostic tools, were recently revealed to be unsuitable to discriminate between Asian rat species . DNA-based methods, however, appear to be promising tools for easy and accurate rat species-specific identifications .
Robins et al.  were the first to attempt to identify Rattus species using mitochondrial DNA sequences mostly obtained from museum tissue samples. Nevertheless, their conclusions based on DNA-barcoding and tree based methods were limited because these methods need reliably identified specimens as reference. Specimens and tissues offered by museums to scientists are collected by many different people and it seems likely, given the extent of some misidentifications, that rat species identification is not an easy task even for mammal specialists. Moreover, the taxonomy of the tribe Rattini is complex and changing and often different to that in use when samples were first described and listed in museums .
Level of variation in cytochrome b sequences was also proposed as a reference point in making decisions concerning species-level distinctions . Based on the analysis of 4 genera of rodents, Bradley and Baker  suggested that genetic distance values lesser than 2% were indicative of intraspecific variation and values higher than 11% of species recognition. But how to conclude between 2 and 11% ? The DNA-based species delimitation approach proposed by Pons et al.  relies on DNA sequence information itself as the primary information source for establishing group membership and defining putative species and does not require defining entities as priors. This method was shown to be useful for identifying meaningful entities among groups whose current taxonomy is incomplete (e.g. tiger beetles of the genus Rivacindela, ) or uncertain (e.g. aphids of the genus Brachycaudus) and has already been successfully applied when species are difficult to conceptualize (e.g. bacteria  or for asexual animals, [30,31]). Using a likelihood framework, this new procedure detects the point of transition in the rate of lineage branching of a tree from interspecific long branches to intraspecific short burgeoning branching and identifies clusters of specimens corresponding to putative species.
In our study, we used molecular data to test the limits of the current taxonomy of the Rattini tribe. We aimed at identifying where species boundaries are unclear and where further investigations need to be carried out to provide a more rigorous systematic framework for epidemiological surveys. As molecular data are useful to detect and distinguish morphologically similar species, this study investigated the existence of putative cryptic species among the Rattini tribe (i.e. two or more species that are classified as a single nominal species because they are at least superficially morphologically indistinguishable ). To these aims, we first sequenced two mitochondrial and one nuclear genes from rat specimens coming from Southeast Asia (Thailand, Cambodia and Lao People's Democratic Republic) to perform phylogenetic reconstructions. Then, as morphological characters are often misleading, we applied the method developed by Pons et al.  that determines, with no prior expectations, the locations of ancestral nodes to define putative species. Finally, we endeavoured to give a name to each cluster recognized as a putative species using information from the literature and also sequences obtained from a museum holotype specimen following all the ancient DNA guidelines.
116 specimens of Rattini were selected among the 3,000 trapped by our team in the fields mostly in Thailand and punctually in Cambodia and in Lao PDR. Specimens selected were chosen in order to maximise the number of species and geographic locations analysed. Field specimen identifications and locality information are listed in Table Table11 and indicated in Figure Figure1.1. Field identifications were made based on morphological criteria according to [11,33-35]. Based on morphological and cytological evidences, no specimen was identified by us as a representative of the cosmopolitan Rattus rattus species. Considering their preponderant place in epidemiological surveys, 4 worldwide black rat specimens (identified in ) were added to the sample set. To provide an appropriate outgroup, we included specimens of the Eurasian harvest mouse, Micromys belonging to the Rattini tribe and previously recognized as the sister lineage to the Rattus group sensu lato of Verneau et al., [37,38,17,21]. In total, our taxa sampling consisted of 122 rats.
For nomenclatural prospects, a small piece of skin from the holotype specimen of Leopoldamys neilli was also analysed in this study. The type specimen is the male n°54-4330 from the Centre for Thai National Reference collections, collected by W.A. Neill in 1973 at Wat Tham Prapothisat, in the Saraburi Province (Kaengkhoi District, Thailand, 14°35'N X 101°8'E) (see  for further details).
Three genes proven valuable for rodent systematics were considered for the phylogenetic analyses [39,40,25,17]. We targeted two mitochondrial markers, the cytochrome b (cytb) and the cytochrome c oxydase I (COI) genes and the first exon of the nuclear gene encoding the interphotoreceptor retinoid binding protein (IRBP).
To avoid contamination, pre-amplification procedures and post-amplification analyses were performed in independent rooms in the laboratory. DNA was extracted from tissue with DNEasy Tissue Kit (Qiagen) in accordance with the manufacturer's instructions. Primer sets used to amplify the cytb, COI and IRBP genes are listed in Table Table2.2. All amplifications were carried out in 25 μL reactions containing about 30 ng of extracted DNA, 0.2 mg/mL BSA (Roche, 1 mg/mL), 300 μM of each dNTP, 0.2 μM of each primer, 1 unit of Taq polymerase (Qiagen), 2.5 μL of 10X buffer, 0.5 mM of extra MgCl2. Cycling conditions were as follows: one activation step at 94°C for 4 min followed by 40 cycles of denaturation at 94°C for 30 s, annealing at 48°C-58°C depending on the primers (Table (Table2)2) for 30 s, elongation at 72°C for 45 s-1'30 min depending on the length of the target (1 minute per kb), and a final extension at 72°C for 10 min. PCR products were sequenced by Macrogen (Seoul, South Korea).
Sequences were aligned by eye using SEAVIEW  and translated into peptide sequences using the Transeq EMBOSS tool  to exclude putative NUMt copies and to ensure sequence orthology. As the risk of homoplasy by convergence and reversal is reduced by considering a large number of characters , we combined the three genes into a single dataset using the DAMBE software . Thus, a total of 3,068 bp were considered in the subsequent phylogenetic analyses.
Base composition bias was evaluated using PAUP* v4.0b10 , and a chi-square test was performed to check for taxa with deviations of nucleotide composition. Substitutional saturation was assessed via saturation plots. Using DAMBE , the absolute number of transitions was plotted against MLComposite TN93 (Tamura-Nei Model) distance for all pairwise comparisons of taxa. For the three genes, the curve did not reach a plateau when subtracting the third codon position, but did reach a plateau when considering the entire sequences (data not shown). To discard fast evolving transitions and improve inferences without drastically compromising the resolution, we decided to recode the third codon position nucleotides to two state categories, R (purine) and Y (pyrimidine), (RY-coding strategy; ).
Phylogenetic trees were reconstructed using two probabilistic approaches: maximum likelihood (ML) and Bayesian inferences (BI). The appropriate model of evolution was first determined for each gene and for the concatenated dataset (with and without RY-coding) using corrected Akaike information criterion (AICc) and MrAIC . The HKY+I+Γ model was selected for both the cytb and COI genes while the GTR+ Γ was selected for the IRBP gene and the combined dataset (with and without RY-coding). ML analyses were performed with PhyML-v2.4.4 . For each analysis, the transition/transversion ratio, the proportion of invariable sites as well as the gamma distribution parameter (if necessary) were estimated and the starting tree was determined by BioNJ analysis of the dataset (default settings). Using optimization options, 500 bootstrap (Bp) replicates were performed. PhyML analyses were first run independently on each locus and then on the combined dataset (with and without RY-coding). Taking into account that PhyML does not allow data-partitioning, partitioned ML analysis was also performed using RAxML 7.0.4 . As the model choice is limited in RAxML, the general time-reversible (GTR) + Γ model (option -m GTRGAMMA) was selected for the three partitions (option -q multipleModelFileName), and individual α-shape parameters, GTR-rates and base frequencies were estimated and optimized for each partition. Robustness of the tree was assessed using the rapid bootstrap procedure (option -f a) with 100 replications (option -# numberOfRuns) .
Bayesian analyses were performed using MrBayes v3.1 . Four independent runs of 5,000,000 generations each were performed applying appropriate independent models of evolution to each gene. A burn-in period of 1,000,000 generations was determined graphically using Tracer1.2 . For each dataset, all runs gave similar tree topologies and posterior probability (pp) values.
We used the DNA-based approach proposed by Pons et al. . Using a likelihood framework, this new procedure detects the switch in the rate of lineage branching of a tree from interspecific long branches to intraspecific short budding branching and identifies clusters of specimens corresponding to putative species. Two models are implemented to account for the branching process of the entire tree. Under the null model, the whole sample derives from a single population obeying a coalescent process. The alternative model, called general mixed Yule coalescent (GMYC) model combines equations that separately describe branching within populations (coalescent process) and branching between species (a Yule model including speciation and extinction rates). Under the GMYC model, a threshold (T) is optimized such that nodes before the threshold are considered as species diversification events, whereas branches crossing the threshold define clusters following a coalescent process. A standard likelihood ratio test (LRT) is used to assess whether the alternative model provides a better fit than the null model. If the GMYC model is favoured over the null model, the T parameter of the maximum likelihood solution allows the number of species to be estimated. This test was achieved using the R code provided by T. G. Barraclough. This latest version outputs the estimates of the number of species, of the threshold time and their 95% confidence limits (i.e. solutions with 2-log likelihood units of the maximum).
Because a pre-requisite of the method is an ultrametric tree, we used the relaxed Bayesian dating method implemented in Multidivtime  to convert our optimal phylogram tree (estimated from the Bayesian analysis of the combined dataset) in a rooted additive tree with terminal nodes equally distant to the root. In this aim, we followed the documentation files written by Rutschmann  and the procedure detailed in . The settings for the Markov chain Monte Carlo analyses were slightly modified (200,000 cycles in which the Markov chain was sampled 20,000 times every 10th cycle following a burnin period of 100,000 cycles). No fossil is described to calibrate our Rattini phylogeny. As our aim was simply to obtain an ultrametric tree, prior ages to lineages were arbitrarily assigned to 1 (rttm = 1; rttmsd = 0). The mean of the prior distribution for the rate of molecular evolution at the ingroup root node (rtrate) was computed as the mean of the median of the amount of evolution for the different tips of the three independent gene trees (rtrate = 0.735; rtratesd = 0.367).
Rattus cytb (663 bp) and COI (655 bp) sequences obtained by Robins et al.  were extracted from GenBank and added to our mitochondrial (mt) dataset (see Table Table3).3). As our study focuses on rodents from the Indochinese region, sequences of species belonging to the Rattus fuscipes species group (i.e. native Australian species) and to the Rattus leucopus species group (i.e. species indigenous to New Guinea and adjacent archipelagos) were not incorporated in this dataset. Two other unpublished cytb sequences of R. argentiventer and R. sikkimensis (synonym of R. andamanensis) provided by O. Verneau and F. Catzeflis were also included in the subsequent analysis. Sequences of a single representative of Berylmys, Niviventer, Leopoldamys, Maxomys and Micromys were used to root our mitochondrial phylogeny. Therefore, the mt dataset included 129 sequences corresponding to 1,318 bp of mt DNA. Partitioned ML analysis was performed using RAxML 7.0.4  and the same options as before.
For species assignment, we tested the relevance of DNA sequences obtained from a holotype specimen. As museum samples contain tiny amounts of poorly preserved DNA, we selected a 85 bp fragment of the cytb gene, corresponding to positions from 666 to 750 of the gene sequence of Rattus norvegicus (NCBI accession number [GenBank NC_001665]). This fragment was chosen for the following reasons: i) it corresponds to an highly variable region of the gene that allows the discrimination of most vertebrate species including the closest related ones  ii) its short length is suited for the PCR amplification of degraded DNA  and iii) it has proved valuable for species assignment based on degraded DNA extracted from archaeological samples .
To check if it provides adequate discrimination for rat species, the whole cytb sequences of the 122 specimens were reduced to the 85 bp fragment following the groups evidenced by the DNA-based species delimitation method. Based on our sampling, rat species could be easily discriminated with this small sequence (except the two entities hereafter named Be2a and Be2b but see discussion) (see the 85 bp alignment in additional file 1). So, we decided to target this DNA barcode from the holotype of Leopoldamys neilli.
As we used a museum specimen, the difficulties associated with ancient DNA studies are relevant to this analysis. Hence, ancient DNA work was performed at the PALGENE national platform (CNRS, ENS Lyon, France) dedicated to ancient DNA analysis, following the standard procedures and using specific equipment and personal protections [58,59].
DNA was extracted from the holotype of Leopoldamys neilli following the protocol detailed by Rohland and Hofreiter . Primer sets declined from Télétchéa et al.,  were used for PCR attempts (Table (Table2).2). At least two independent PCR amplifications were performed in 25 μL reaction volumes containing 2.5 units of Perkin Elmer Gold Taq polymerase (Applied Biosystems), 1 mg/mL BSA (Roche, 20 mg/mL), 2 mM MgCl2, 250 μM of each dNTP, 0.5 μM of primers. For each independent PCR attempt, a range of dilutions was performed to find the best compromise between inhibitor's concentration and targeted DNA molecule concentration. DNA was amplified with a 5 min activation step at 95°C followed by 55 cycles of denaturation (94°C, 30 s), annealing (48°C, 30 s) and elongation (72°C, 45 s). Amplification products were systematically cloned using Topo TA Cloning for sequencing kit (Invitrogen). 16 clones of independent amplifications were sequenced to determine the consensus sequence (Macrogen, Seoul, South Korea).
The CAOS software, a two step character-based DNA barcoding method  was then used to determine if the Leopoldamys neilli holotype consensus sequence could be assigned to one of the clusters recognized as a putative species by the method of Pons et al., . First, a diagnostic rules generator, P-Gnome, was used to search DNA changes through the 85 bp cytb matrix (122 sequences) and to establish diagnostic rule sets for each of the previously described entities (outputs of the DNA-based species delimitation method). Then, the P-Elf program was run to classify as a query the holotype sequence according to the rules generated by P-Gnome.
Cytb, IRBP and COI sequences were generated for 122, 120 and 116 rat specimens respectively. All sequences were deposited in GenBank under the accession numbers HM217360 to HM217717 (Table (Table1).1). No significant difference in nucleotide composition among taxa was detected which indicated that no artificial grouping could occur due to a misleading compositional signal in the dataset. PhyML analyses were first carried out on each locus independently (data not shown). Each gene considered separately does not result in a robust Rattini phylogeny: mitochondrial markers help to resolve terminal nodes, while IRBP lends support to deepest ones. But, since the 3 genes yielded consistent, compatible topologies, sequences were concatenated and phylogenetic analyses were then carried out using the combined dataset.
Identical topologies were obtained with and without a RY-coding of the 3rd codon position (data not shown). However, better resolution and stronger topological supports (Bp and pp) were reached without an RY recoding strategy. It seems that our dataset was not informative enough for a RY recoding strategy resulting in this case in an over-depletion of the phylogenetic signal.
BI, partitioned and unpartitioned ML analyses (without RY recoding strategy) yielded the identical topology given in Figure Figure2.2. Most relationships among the Rattini tribe were well resolved (supports 61-100 for Bp, 0.82-1.00 for pp). Monophyletic groups corresponding to the Rattini divisions proposed by Musser and Carleton  are sustained with the highest values of Bp or pp. The Maxomys division clearly appears as the first division to diverge followed by the Dacnomys division, here represented by Leopoldamys and Niviventer genera, and the Rattus division. Berylmys appears with maximum support values as the earliest lineage to diverge among the Rattus division. A sister grouping is indicated between the genera Bandicota and Rattus, but this association is weakly supported. In fact, the monophyly of the Rattus genus received moderate pp (0.82) to weak Bp supports (61 for unpartitioned, 63 for partitioned ML analyses). To test the reliability of these findings, we considered an alternative hypothesis concerning the position of Bandicota within the Rattus division (i.e. Bandicota was placed inside the Rattus sp. cluster). SH-test failed to find significant differences between these hypotheses and the alternative branching orders of Bandicota inside the Rattus division could not be excluded (P > 0.05). Inside the Rattus sp. clade, the 3 Rattus species groups proposed by Musser and Carleton  could be distinguished. The R. exulans monotypic group (Re, Figure Figure2)2) clustered with the R. rattus species group (Rr, Figure Figure2)2) with high branch supports (Bp = 94/96 for the unpartitioned/partitioned ML analyses; pp = 1) and the R. norvegicus species group (Rn, Figure Figure2)2) is placed as sister taxa to the R. exulans species group/R. rattus species group cluster.
At this point in the analysis, 23 lineages (labelled R1 to M2 in the Figure Figure2)2) are identified within our taxon sampling. As their specific status are still questioned, intra-generic relationships are problematic to describe and will not be discussed in this section.
The existence of distinct phylogenetic lineages was corroborated by the analysis of the branching rate pattern. A lineage-through-time plot based on the Multidivtime ultrametric tree evidenced a sudden increase in branching rate towards the present, likely corresponding to the switch from interspecies to intraspecies branching events (see additional file 2). To fit the position of the switch, the method of Pons et al.  was applied to the time calibrated tree (Figure (Figure3).3). The GMYC model was preferred over the null model of uniform branching rates (logL = 700.133, compared to null model logL = 687.218; 2ΔL= 25.83, χ2 test, d.f. = 3, p < 0.0001). The model fitted the switch in the branching pattern occurring at -0.07084 (i.e. T of the ML solution/it is worth reminding that the time separating the ingroup root from the present was arbitrarily assigned to 1), leading to an estimate of 24 putative species, 4 of which containing a single individual (labelled R5, Be2b, N2 and N3 respectively in Figure Figure3).3). Two Maxomys (M1 and M2), 4 Niviventer (N1 to N4), 3 Leopoldamys (L1 to L3), 2 Bandicota (B1 and B2), 3 Berylmys (Be1, Be2a, Be2b) and 10 Rattus species (R1 to R10) could be numbered as indicated in Figure Figure3.3. It is worth noting that the Berylmys lineage (labelled Be2 in Figure Figure2)2) actually seems to correspond to two putative species following Pons et al's approach (therefore labelled Be2a and Be2b in Figure Figure3).3). Confidence interval for the threshold ranged from -0.09439 to -0.04189 and the estimated number of species ranged from 22 to 32 (i.e. estimates falling within 2 log-likelihood units of the ML solution).
The partitioned ML analysis of the mt dataset including 64 new Rattus sequences (this study) plus 61 from previous studies  gave the highly resolved and robust tree represented in Figure Figure4.4. This has allowed us to name some clusters identified as putative species by the DNA-based species delimitation method. Because the monophyly of each cluster embracing the supplementary published sequences is supported with the highest Bp value, the level of confidence of these identifications could be considered as maximal if the voucher identification beforehand is correct.
Robins' sequences identified as Rattus rattus cluster with 100% Bp support with sequences assigned to R. rattus specimens in . Specific identification of group R1 as Rattus rattus is thus convincingly confirmed. According to the mt tree, none of our samples from Thailand, Cambodia or Lao PDR could be assigned to this species. Following the same approach, R2 seems to correspond to Rattus tanezumi, R5 to Rattus tiomanicus, R8 to Rattus exulans and R9 to Rattus norvegicus. Sequences provided by O. Verneau and F. Catzeflis allow us identifying R6 as R. argentiventer and R7 as R. andamanensis. As expected, since its distribution is restricted to Sulawesi, sequences of Rattus hoffmanni group with none of our specimens. R. hoffmanni whose phylogenetic affinities among the Rattus rattus group need to be elucidated  appears as the sister taxa to R. argentiventer with strong support (88 Bp). The situation appears more complex for the species R3. This group corresponds to a mix of specimens identified as R. rattus diardi in , Rattus kandianus (considered as a synonym of R. rattus, ) in , R. tanezumi from Indonesia  and R. tanezumi, R. andamanensis or R. argentiventer according to the field names we assigned during our sampling. Consequently, no nominal species could be reliably assigned to R3.
We successfully obtained 85 bp cytb sequences from the Leopoldamys neilli holotype. At least two independent PCR runs were performed, positive PCR products were cloned and consensus sequences were determined using clone sequences of independent PCR amplifications. Analysis of the differences observed between the clone sequences and consensus sequence shows that 75% of the degradation was due to deamination of cytosines, as expected from ancient DNA substrates [62,63].
The consensus sequence was identified as a rat cytochrome b sequence using a BLAST program (no Leopoldamys neilli cytochrome b sequence was available in databanks such as EMBL or GenBank before this study). This sequence is a genuine holotype sequence for the following reasons: (i) Rattini samples were never introduced in the ancient DNA facilities before the analysis of this specimen was performed; (ii) all the 16 clones analysed were identified as rat; (iii) the errors induced by DNA damage are perfectly consistent with the pattern generally observed for ancient DNA sequences (strong bias toward type 2 transitions caused by deamination of cytosine [62,63]); (iv) for each amplification, all three PCR blanks remained negative ; (v) independent PCRs were performed and furnished the same conclusions. All in all, these points satisfy criteria of authentication for the ancient DNA work .
The genuine holotype sequence was deposited in GenBank under the accession number HM235947. It was assigned using the CAOS software to the monophyletic cluster corresponding to the Leopoldamys species, L2, in our tree (Figures (Figures22 and and3).3). Consequently, this monophyletic cluster recognized as a putative species by the method of Pons et al.  could be without ambiguity named as Leopoldamys neilli.
Our phylogenetic analyses of Indochinese Rattini based on the combination of cytb, COI and the first exon of the IRBP genes is compatible with the revised taxonomy of Rattini divisions proposed by Musser and Carleton . The Maxomys division, the Dacnomys division (here consisting of Leopoldamys and Niviventer as sister taxa) and the Rattus division (here including the genera Rattus, Bandicota and Berylmys) are sustained with the highest support values (Figure (Figure2).2). These results are congruent with the Murinae phylogeny obtained by Lecompte et al.  based on the analysis of the combined cytb, IRBP and GHR genes. In this latter analysis, the 3 divisions are well supported and the Maxomys division is also the first to diverge followed by the Dacnomys one and the Rattus group sensu stricto of Verneau .
In our analyses, the position of Bandicota still remains uncertain. The monophyly of the genus Rattus is in reality weakly supported (0.82 for pp and 61/63 for Bp) and SH-test failed to reject the hypothesis of a paraphyletic Rattus genus (i.e. Bandicota is placed within Rattus). Verneau and collaborators [64,37] attempted to determine the evolutionary relationships in Rattus sensu lato using LINE-1 (L1) amplification events. In their study , two LINE subfamilies were identified in the Bandicota and the other Rattus species except in Rattus fuscipes. Since L1 subfamily absence from a particular taxa reflects an ancestral state rather than a derived state , these findings excluded Rattus fuscipes from a Bandicota/Rattus clade and placed Bandicota inside the genus Rattus leading to its paraphyly. Our study is in agreement with the multi-locus phylogeny of Lecompte et al.,  which shows Bandicota and the genus Diplothrix diverging together prior to the Rattus clade. In the Lecompte's study, the monophyly of the genus Rattus is highly supported (98 Bp, 1 pp) but, as in our study, no specimen of the Rattus fuscipes species group was included. To draw conclusions about paraphyly in Rattus genus, it would be judicious to complete the taxa sampling among the genus Rattus and to include representatives of each Rattus species group defined by Musser and Carleton  particularly representatives of the Rattus fuscipes species group.
Within this genus, 7 species groups have been defined by Musser and Carleton , of which 3 inhabit the Indochinese region and are relevant to this study (Rr, Re and Rn in Figures Figures22 and and3).3). The Rattus rattus species group as described by Musser and Carleton  comprises 21 species of which 5 may be found in Thailand, Cambodia and Lao PDR. In our phylogenetic analysis, this cluster appears unambiguously to be monophyletic (1.00 for pp; 93/97 for Bp) and was placed undoubtedly as the sister group of the monotypic exulans species group (pp = 1.00; Bp = 94/96). This association was also found in recent molecular studies [25,17] but encompassing fewer representatives of the Rattus rattus species group. According to Musser and Carleton , the R. norvegicus species group includes 3 species (Rattus norvegicus, R. nitidus and R. pyctoris) of which only 2 may occur in the Indochinese region (Rattus norvegicus and R. nitidus). This group appears in our study as the sister taxa to the "R. exulans species group/R. rattus species group" cluster as found in  and .
Robins and colleagues  focusing on rats inhabiting islands in Southeast Asia, included in their sampling specimens from Australia (i.e. belonging to the Rattus fuscipes species group as defined by ) and from New Guinea and adjacent archipelagos (i.e. belonging to the Rattus leucopus group). Based on the analysis of nearly 2 kb of mt DNA, they recovered 5 of the 7 groups proposed by Musser and Carleton . Our study, even if focusing on a different region of South East Asia, is perfectly congruent with Robins' study, and both studies are compatible with the revised taxonomy of the Rattus genus recently proposed by Musser and Carleton . The sixth group defined by the authors  corresponds to the xanthurus species group encompassing species native to Sulawesi and adjacent islands. According to preliminary phylogenetic analyses of cytb sequences cited in , this assemblage could be placed as the sister-group to the R. leucopus and R. fuscipes groups. The last group defined by Musser and Carleton  does not correspond to a natural cluster but was formed for practical reasons since it includes species whose phylogenetic affinities have to be clarified; some may need to be excised from Rattus.
At a specific level, we realized that phylogenetic relationships were difficult to discuss. Species misidentifications are indeed plentiful and recurrent both in our sampling (see Table Table1)1) and in the literature. Mt sequences from Robins et al.  or provided by O. Verneau and F. Catzeflis were included in our dataset but questions about the reliability of the identification of vouchers were rapidly raised. To cite a few examples, the Rattus tanezumi sample occurring in the tiomanicus cluster in  (see Figure Figure4)4) was proposed by the authors to represent a misidentification. Similarly, the R. rattus cf. moluccarius specimen in  and  was, according to Musser and Carleton , an example of R. nitidus whereas their specimen assigned to Niviventer niviventer was probably improperly identified since N. niviventer has never been described in the locality where the specimen was caught . We observed that the situation was worse regarding the Niviventer genus. When including sequences available in the databanks (i.e. cytochrome b sequences from ), numerous species appeared to be paraphyletic (data not shown). These results are presumably the consequence of species misidentifications and this explains why we decided to exclude these sequences from our analyses. All in all, these reports ([25,64] and this study) stressed the necessity of a sound taxonomic revision of the Rattini tribe. Consequently one must first determine valid species boundaries and then assign an appropriate name in accordance with the rules of the International Code of the Nomenclature.
According to Musser and Carleton , 9 genera corresponding to the following 27 species of Rattini may occur in our sampling area (Figure (Figure1):1): Hapalomys delacouri (see Background for justification of its inclusion into the Rattini tribe), Sundamys muelleri, Chiromyscus chiropus, 3 Maxomys species (rajah, surifer, whiteheadi), 6 Niviventer species (fulvescens, hinpoon, langbianis, tenaster, cremoriventer, confucianus), 3 Leopoldamys species (neilli, edwardsi, sabanus), 2 Bandicota species (indica and savilei), 2 Berylmys species (bowersi and berdmorei) and 8 Rattus species (andamanensis, argentiventer, exulans, tanezumi, losea, tiomanicus, norvegicus, nitidus). According to our phylogeny (Figure (Figure2),2), 23 lineages exist within our sampling and 24 putative species were suggested by the method of Pons et al. . Confidence interval for the estimated number of species ranged from 22 to 32 (i.e. estimates falling within 2 log-likelihood units of the ML solution). An inadequate population sampling is one of the potential limitations of the branch length method as identified by Pons et al. . However, the GMYC model was preferred over the null model of uniform branching rates indicated that the intraspecific sampling effort is satisfactory in our dataset (failure to reject the null model over the GMYC model could be an incomplete sampling per species; ). Moreover, among the 24 estimated species, 4 species (labelled R5, Be2b, N2 and N3 respectively in Figure Figure3)3) contain a single individual. In accordance with Pons et al, it seems that the GMYC method correctly deals with the inclusion of some rare species represented by only one single individual .
The estimated number of species fit well with the number of species described in the literature for this area, although there are some exceptions, in particular within the Berylmys and the Rattus genera. Our study suggests 3 putative species of Berylmys in our sampling whereas only 2 are mentioned in the literature within the geographic area sampled (Berylmys bowersi and B. berdmorei) (see Table Table4).4). This outcome was supported by all the solutions included in the 95% confidence interval of the estimate of the number of species (Figure (Figure3).3). This finding may be an artefact of the species delimitation method which could have difficulty in dealing with high level of population differentiation and strong phylogeographic patterns. As acknowledged by Pons et al., , a limitation of this method is that populations with partial gene flow risk being recognized as separate entities. A marked phylogeographic structuring within Berylmys bowersi could explain the distinction of Be2a and Be2b as two putative species by the branch-length method. Be2b specimen came from the Kanchanaburi locality (Table (Table1,1, Figure Figure1),1), North to the Isthmus of Kra corresponding to the limit of the peninsular Thailand whereas the specimens of the Be2b group came from the Northern Thailand (Loei and Nan provinces, Figure Figure1)1) and Northern Lao PDR (Luang Prabang province, Figure Figure1).1). Populations of Berylmys bowersi in peninsular Thailand were reported to be geographically isolated and to differ in some ways from other populations . Our findings are congruent with this report. Further investigations are needed to determine if Be2a and Be2b are two phylogenetic lineages of a same species exhibiting a strong phylogeographical pattern or if they have two be considered as two closely related but separate species.
In a similar way, five species belonging to the Rattus rattus species group have been described in this area (i.e. R. andamanensis, argentiventer, tanezumi, losea, and tiomanicus). Marshall  reported also the presence of R. rattus in all provinces of Thailand and considered the roof rat as the most abundant mammal in the country. Interestingly, since 1998, no specimen among the 3,000 caught during our successive field surveys in rural or urban areas of Thailand, Lao PDR and Cambodia could be identified as a representative of R. rattus, according to morphological, cytological and molecular evidences. Our findings offer no support for the presence of R. rattus in the area and are in conflict with previous claims of R. rattus in the Indochinese region . However, this inconsistancy is probably due to a difference in the usage of "Rattus rattus" in place of "Rattus tanezumi" rather than a problem of identification or occurrence.
Finally, our analysis corroborates the presence of an additional Rattus species (labelled R3 in Figure Figure3)3) already identified as the diardii clade in the mitochondrial phylogeny of Robins et al. . R3 could be a cryptic species. This statement yet needs further investigation using independent data (morphology, nuclear genes). Then, if this hypothesis proved to be correct, the R3 species would have to be carefully named (R. diardii is indeed considered at present as a synonym of R. tanezumi ). In agreement with our result, Aplin in his preliminary study of the cytb  observed that the taxonomy of the Rattus rattus species group might be rather thornier than suggested by previous studies mostly based on karyotypic or electrophoretic evidences. Indeed, his ongoing study reports two distinct phylogenetic clades in the Asian region. The first one would correspond to an endemic Southeast Asian taxon (recorded in Vietnam, Cambodia and Southern Laos) and might correspond to our R3 according to geographical evidence. Our study and Robins' work reveal that the distribution of this Southeast group spreads far into the South as it occurs in Thailand and in Sri Lanka and also in Malaysia, in Indonesia and Northern Sulawesi (Figures (Figures44 and and5).5). The second clade proposed by Aplin  would be a northern and South Asian taxon (found in Japan, Hong Kong, northern Vietnam, northern Laos, and Bangladesh) and might correspond to R2 (here also found in Thailand and Indonesia, Figure Figure4,4, Table Table3/see3/see also Table Table44 for species name). Indeed when including Robins' sequences, R2 includes specimens from Japan and Hong Kong (Figures (Figures44 and and5).5). As mentioned by Aplin , the latter group (R2) is more closely related to Rattus rattus rather than the former group (R3). In our trees (Figures (Figures22 and and4),4), R2 is clearly placed as the sister taxa of R. rattus (R1). Our study reinforces Aplin's assumption  that the two Asian clades (i.e. R2 and R3) are sympatric in some part of their distribution by increasing greatly the area where the two taxa co-occur in continental Southeast Asia. Both are found in Northern and Central Thailand (Phrae, Nakhon Pathom and Ratchaburi provinces; this study). Since some specimens of both taxa were trapped in exactly the same location and time, at least in Phrae, they probably also share similar habitats and are likely syntopic.
By integrating phylogenetic, morphological and geographical evidence, we proposed to attribute the names summarized in Table Table44 to the 24 species highlighted herein. Our propositions are not definitive but are revisable ones. Indeed, once species boundaries are delimitated, assigning the appropriate name to each species is not an easy task particularly for the Rattini species whose taxonomy is complicated by a large number of synonym names. Even for a rodent specialist, morphological characters are sometimes misleading (see aforementioned misidentification examples) and intraspecific morphological polymorphism makes the problem more difficult. To alleviate this last difficulty, morphological studies have to consider a large number of specimens, a process that may be difficult and time-consuming to perform.
These inconveniences highlighted the great interest in obtaining molecular data from a holotype. Indeed, the holotype is by definition the element to which the name of a taxon is permanently attached. Consequently, including holotype specimens in molecular phylogenies would be very suitable to name each cluster recognized as a valid species providing that a rigorous and sound taxonomy is already set up. Indeed, holotype specimens may correspond to problematic taxa (e.g. problems of synonymy not yet revealed), and the use of type specimens could be misleading in such context. Including holotype specimens in molecular phylogenies is however totally infeasible for the two following reasons. Firstly, holotype specimens are unique and are difficult to obtain for genetic research purposes. Sampling authorisations are very scarce and destructive sampling is generally not possible. To achieve our study, no more than 24 holotypes would be damaged if our assumptions are correct. Faced with the understandable reluctance of museum curators, non-destructive extraction procedure  would be an elegant suggestion. Secondly, ancient materials contain tiny amounts of poorly preserved and highly fragmented DNA. As required for this study, getting 3 kb corresponding to 3 different genes (including one nuclear one) for more than 24 holotype specimens, and following the ancient DNA guidelines would be too expensive and much too time-consuming. To circumvent this problem it is fortunately possible to target small DNA fragments as barcodes. Our study proved that this strategy is a powerful one. Following all the ancient DNA requirements, we succeeded in amplifying a genuine small cytb fragment from the Leopoldamys neilli holotype. This barcode was used to assign a name without ambiguity to one of the clusters (i.e. L2) recognized as a valid species in our analyses. Even if more holotype specimens have to be investigated to achieve a steady revision of the Rattini tribe, our work illustrates the huge opportunities ancient DNA analysis may offer to taxonomists.
This study represents the first step of a long-term project aiming at a deep taxonomic revision of the Rattini. Putative species delimitations have been determined here without prior assumptions and we propose a suitable methodology using molecular data from holotypes to assign the right name to each delineated species. Ancient DNA analysis of holotypes should be considered by taxonomists as a promising tool opening up new realms of possibilities (e.g. testing synonymy of names of unclear taxonomies such as the synonymy of R. tanezumi and R. diardii; see Table Table4).4). Although DNA data alone are not a panacea for species description and delimitation, we are confident that future investigations combined with other types of information will clarify the taxonomy of this confusing group. Indeed, integrative approaches merging independent data such as morphology, karyology, mitochondrial and nuclear markers are the only means to understand the diversification among, and interactions between, evolutionary lineages. Our molecular study revealed that at least 7 putative different species, including a cryptic one (R3), could exist among the Rattus rattus species group (among which six were sampled within the area we investigated). As each of these species is expected to have specific ecological traits and to carry its own set of diseases, the recognition of cryptic species within Rattini could have serious implications for human health in Southeast Asia. However, this result has to be carefully considered. Indeed, it is worth noticing that the terminal nodes of our multilocus phylogeny are mostly supported by mitochondrial data (cytb and COI genes) while the deepest nodes are sustained by nuclear data (IRPB). Other kinds of markers have thus to be checked for congruence. Such clarifications for the Rattini tribe are today urgently required to achieve meaningful epidemiological research in South East Asia.
bp: base pairs; kb: kilo base pairs.
Conceived and designed the experiments: JM, JFC, MP. Performed the experiments: MP, YC, VH, SW. Analyzed the data: MP. Wrote the paper: MP, YC. Senior epidemiologists and supervisors responsible for all scientific output of the program: SM, JM, JPH. All authors read and approved the final manuscript.
Rat 85 pb cytb alignment. The whole cytb sequences obtained from the 122 specimens selected in this study were reduced to the 85 bp DNA marker already used to discriminate closely related species from degraded DNA [56,57]. Small sequences were sorted following the results of the DNA-based species delimitation method. Dots indicate identical positions as those of the reference sequence of Rattus rattus R12. Sites allowing discrimination between species are those shared by all the specimens of a same entitie but different for all the specimens of another one. Each rat species could be distinguished from each other based on this fragment except the two Berylmys species Be2a and Be2b (but see discussion). We tried to maximize the geographic diversity of the specimens, however, our sampling was achieved without prior expectation and some entities determined by the DNA-based species delimitation method encompass few specimens coming from the same locality (e.g. R5, R6, R7, Be2b, L3, etc.). In this case, intra-polymorphism is not taken into account and substitutions allowing discriminatation between species are thus overestimated. However, closely related rat species (such as R1 and R2 or R3 and R4, see phylogeny in Figure Figure2)2) could be easily discriminated. We thus considered that this fragment is reliable for an adequate discrimination for rat species.
Lineage-through-time plot based on the Multidivtime ultrametric tree. The sudden increase in branching rate, indicated by a red line, corresponds to the shift from interspecific to intraspecific lineage branching.
Firstly, we would like to thank the two anonymous reviewers for their comments that helped us to greatly improve the manuscript. We thank K. Blasdell for English corrections. We are particularly grateful to the people who have made this work possible in the field: S. Jittapalapong from the Faculty of Veterinary Medicine, W. Rerkamnuaychoke from the Faculty of Veterinary Technology at Kasetsart University in Thailand, B. Douangboupha from the National Agricultural and Forestry Research Institute in Lao PDR and P. Buchy from the Pasteur Institute in Cambodia. We also warmly thank all the people that worked hard with us in the field to collect samples used in this study, and especially K. Chaisiri and K. Satchasataporn. Thanks to C. Tollenaere and F. Catzeflis who provided us rat sequences or samples. We would like to acknowledge the Ambrose Monell Cryo Collection (AMCC) at the American Museum of Natural History, New York, for their support in our research. We are indebted to all the people of the ancient DNA platform PALGENE (directed by C. Hänni) and more particularly to B. Gillet. Thanks to L. Missa and S. Sutjarit for help in the lab and to C. Corbisier who started this work during her master study. Thanks to A. Cruaud, JY. Rasplus, E. Jousselin and G. Kergoat for helpful discussions, to G. Dobigny for constant support. We express gratitude to PH. Fabre who offered us judicious advice to use the Multidivtime software and help us to obtain the ultrametric tree needed in this study. Many thanks from Asia to T. Barraclough for help for the R code.
This work was supported by the French GIPANR, Programme Santé Environnement - Santé Travail (Program 00121 05), by the PHC Franco-Thai Cooperation in Higher Education and Research (Program No.16601PK). This study is part of the "CERoPath project" (Community ecology of rodents and their pathogens in South-East Asia: effects of biodiversity changes and implications in health ecology/ANR 07 BDIV 012) funded by the French National Agency for Research.