|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: KL MAS FT. Analyzed the data: KL MAS EH FT CB. Wrote the paper: KL MAS EH FT. Sample preparation and sequencing: FT EN SB CL. Bioinformatics analysis: CB KL XW BBT.
Stochastic and deterministic allele specific gene expression (ASE) might influence single cell phenotype, but the extent and nature of the phenomenon at the onset of early mouse development is unknown. Here we performed single cell RNA-Seq analysis of single blastomeres of mouse embryos, which revealed significant changes in the transcriptome. Importantly, over half of the transcripts with detectable genetic polymorphisms exhibit ASE, most notably, individual blastomeres from the same two-cell embryo show similar pattern of ASE. However, about 6% of them exhibit stochastic expression, indicated by altered expression ratio between the two alleles. Thus, we demonstrate that ASE is both deterministic and stochastic in early blastomeres. Furthermore, we also found that 1,718 genes express two isoforms with different lengths of 3′UTRs, with the shorter one on average 5–6 times more abundant in early blastomeres compared to the transcripts in epiblast cells, suggesting that microRNA mediated regulation of gene expression acquires increasing importance as development progresses.
Preimplantation development from the totipotent zygote to the blastocyst culminates in the establishment of pluripotent inner cell mass cells (ICM) and trophectoderm cells at embryonic day (E) 3.5 . Differential gene expression in individual cells is a key determinant of cellular differentiation, functions and physiology , . This includes allele specific expression (ASE) or differential allelic expression (DAE), which in some instances is due to parent of origin specific imprinting that also affects X inactivation. ASE might also occur due to sequence polymorphisms , , with consequences for phenotypic diversity and disease susceptibility amongst individuals , . A subtle change in gene expression, as in the case of tumor suppressor gene, Pten, is sufficient for a profound effect on cell susceptibility phenotype in mice . However, ASE has not been investigated during early development due to the limited amount of material available for analysis. However, it is now feasible to address this question using single cell RNA-Seq analysis.
ASE can potentially serve as an elegant strategy for contributing to cellular diversity with broad functional consequences during development and in adults –. Nearly half of protein-coding genes in humans have multiple alleles with sequence polymorphisms within their mRNA's coding region (CDS) or untranslated region (UTR) . Between 5–50% of the expressed genes exhibit ASE in mammalian tissues or cell lines –. However, it is unclear if different subpopulations of cells at different cell cycle stages or circadian clock stages within a tissue, exhibit the same pattern with respect to ASE (Supporting Information S1). The extent of ASE within individual cells is yet unknown, which is crucial in order to establish how sequence polymorphisms and epigenetic status regulate gene expression differently within the same cell.
ASE when coupled with stochastic gene expression could influence cell phenotype . Seemingly identical cells can be affected by the microenvironment and the intrinsic transcriptional noise, to produce stochastic characteristics –, as reflected in the number of mRNA copies from expressed genes . However, the relationship and the extent to which stochasticity contributes to ASE are unknown. Furthermore, this aspect has not been assessed in single blastomeres at the onset of development.
Deep-sequencing based RNA-Seq is a powerful tool to analyze transcriptome at an unprecedented depth and accuracy. We have used single cell RNA-Seq to analyze the digital transcriptome of individual blastomeres in early mouse embryos . We produced over 1 billion 50 base reads from 52 individual mouse cells from oocytes to the blastocyst stage.
On average, our single cell RNA-Seq method detected 14,920 to 18,125 RefSeq transcripts expressed at each stage during preimplantation development (Reads per million, RPM >0.1). On average there were 16,666 transcripts (Table S1) in individual cells, which is more than half of all known genes that are expressed in early development.
Using principal component analysis (PCA), we found individual cells to be clustered at each development stage, and the relative distances between each stage represent the extent of change in the transcriptome (Fig. 1). This showed that mature oocytes are most separated from two-cell blastomeres, when the maternal transcriptome is turned off and the zygotic transcriptome is turned on –. We found that 2,193 transcripts were upregulated and 8,173 transcripts downregulated when comparing oocyte with blastomeres from two-cell stage embryos, a change by approximately 70% of all expressed genes. Between two- to four-cell stage, 5,384 transcripts were upregulated, and from four- to eight-cell stage, another 3,412 transcripts were upregulated, which represents the second wave of major zygotic gene activation. Notably, the ‘distance’ in changes from eight-cell blastomeres to the trophectoderm cell (TE) is much shorter than that between eight-cell blastomeres and ICM cells. There is therefore a greater change in the transcriptome from the 8-cell stage to the ICM compared to the change involved in the formation of the TE lineage. This suggests that early blastomeres may be destined for the trophectoderm lineage before diversification towards the pluripotent ICM lineage.
There is considerable interest concerning potential differences between individual blastomeres up to the 8-cell stage. We therefore compared different blastomeres within the same embryo [Fold Change, FC(B/B) >2 or <0.5] and between embryos at the same developmental stage [(FC(E/E) >2 or <0.5]. We found that two blastomeres from individual two-cell embryo showed significant similarities [(Pearson correlation coefficient >0.99 (Supporting Information S1)]. By contrast, blastomeres from different embryos at the same developmental stages showed greater differences. This is probably due to cell cycle variations between embryos.
We generated over 700 million 50 base reads for 20 individual two- to eight- cell blastomeres to examine the extent and nature of ASE. Our results show that the correlation coefficient between the transcriptomes of two blastomeres from the same two-cell embryo is as high as 0.993 to 0.996, based on the analysis of 16,525 expressed transcripts, which excludes technical bias in our data (Supporting Information S1 and Table S1). To determine ASE in individual blastomeres (Fig. 2 and Supporting Information S1), we examined loci that have two dbSNP (build 128) alleles expressed with a minimum coverage of 25 reads (Fig. 3 and Table S2). To obtain accurate ASE gene calls, we first eliminate homozygous loci that appear to be heterozygous due to potential technical errors by comparing loci from a relatively homozygous mouse embryonic stem (ES) cell line, where 95% of the time the minor allele frequency accounts for less than 6% (Fig. 4, Supporting Information S1). After filtering our data for homozygous loci, we found that between 531 and 2,029 of them in individual cells have expressed heterozygous SNPs (Tables 1 and and22 and Table S2), which allowed us to discriminate between the mRNA derived from both alleles of a gene in a single cell.
Between 39 and 51% of the selected heterozygous loci show clear ASE in individual cells of a two-cell embryo (Tables 1 and and2,2, Fig. 4, Supporting Information S1). Thus, the two alleles of a significant number of genes in the same individual cell exposed to identical trans-regulating factors respond differentially and are expressed at different levels. This differential response to an identical intracellular environment potentially provides relatively stable, yet flexible expression by allowing the two alleles to respond differently to trans-acting signals/factors within the same cell . An alternative possibility is that allelic differences in cis regulatory elements might have an important impact on the activity of many genes that we decided to explore further.
Differential allelic expression might be predetermined by genetic differences such as cis regulatory elements or epigenetic modifications such as local chromatin structure, or these differences may arise randomly affecting either allele. If ASE is predetermined, we would expect to find an essentially identical pattern in individual blastomeres of a two-cell embryo. Using SNP-coupled genetic polymorphism to identify the alleles of individual genes, we found that for more than ninety percent of the genes in two-, and four-cell embryos, expression of one of the alleles, henceforth called allele A, was always higher than that of the other allele a in individual cells within the same embryo (Table S2). Thus, for nearly half of the expressed genes distinguishable by mRNA SNPs, their functional read out was coupled to genetic polymorphism in cis in early embryos. This indicates that trans-regulatory factors such as differences in cell cycle, asymmetric division or micro-environment, do not have a significant effect. The differential allelic expression might be regulated either directly through deployment of the transcription machinery, or indirectly by differential epigenetic marking and status of the two alleles in the same cell –. The ability of two alleles to respond differently within the same cell offers a possible explanation for why progeny from out bred crosses with two distinct alleles often show superior performance compared with those from inbred parents, since the choice of alternative alleles might be advantageous through robustness of response to the environment .
Next we addressed how genetic background affects regulation of gene expression, since we had obtained embryos from MF1 out bred mice for this analysis. We found that for gene loci sharing the same set of heterozygous alleles across blastomeres at the same developmental stage, but from different embryos and therefore with different genetic backgrounds, between 66% and 83% showed allelic ratios that were skewed in the same direction (Table S2). This implies that a large number of alleles are systematically expressed more highly within individual cells of different embryos. Nonetheless, since some of the loci can show ASE in either direction, this indicates that genetic background can also sometimes affect the allele that is predisposed for expression within individual cells.
Next we asked if stochastic events contribute to differential allelic expression. For this, it is necessary to distinguish between ‘noise’ that could be due to technical issues, from genuine stochastic differences in expression, particularly as both might occur on a similar scale. To eliminate differences in expression due to technical reasons, we looked at the expression ratios of two alleles of genes in individual blastomeres obtained from the same two-cell embryos, which will potentially remove all technical noise associated with similar cells from different embryos (Fig. 2). We found that for nearly 6% of the expressed loci with distinguishable heterozygous alleles at the two-cell stage, and 10–12% of them at the four-cell and eight-cell stage, there was a significant difference in the magnitude of allele specific expression in blastomeres from the same embryo (allelic imbalance (AI), FC[(allele A/allele a in blastomere #1)/(allele A/allele a in blastomere #2)] >2 or <0.5, p<0.05) (Table 1). Thus, there could be hundreds of genes in early embryos with intrinsic stochastic characteristics, which as expected have no concordance in similarly analyzed blastomeres from different two-cell embryos (Table S3). Thus, combined data leads to a critical conclusion that for hundreds of genes, the combination of deterministic and stochastic expression establishes their functional readout, which can potentially control cell fate, developmental potential and phenotype of early embryo and adult organisms. Furthermore, these observations might offer an explanation, at least in part, for the developmental differences between identical twins, beyond environmental effects.
Alternative splicing is known to play an important role in defining cell identity and their physiological function . We detected 18–25% of known transcript variants (RPM >0.1) (Table S4). From oocyte to two-cell stage, 269 transcript variants were up regulated (fold change, FC[oocyte/two-cell, splicing]>2, p<0.05), and 677 variants were down regulated (FC[oocyte/two-cell, splicing]<0.5, p<0.05). There are even more transcript variants from two- to four-cell stage, but less so from eight-cell to blastocyst stage. Next we asked if different transcript isoforms of a gene show similar changes between different developmental stages, or if these changes are significantly different. We detected dramatic differential regulation of simultaneously expressed transcript variants during preimplantation development. (Table S4). Some of these transcript variants can encode different proteins by skipping exons or by utilizing alternative exons, suggesting that some proteins from a gene may have different expression dynamics during early embryonic development.
Our analysis also revealed differences in the 3′UTRs of transcripts –. Whereas these differences do not appear to have any relationship with ASE, they might contribute to the regulation of gene expression. The single cell RNA-Seq analysis uses poly(T) primer for reverse transcription, which is biased towards a better coverage of the 3′ end of mRNAs, so that after about 1 kb from 3′end of mRNAs, the read coverage dropped significantly and nearly disappeared after 1.5 kb (Fig. 5 and Table S5). However, for a large number of genes with long 3′UTR at the region upstream of the 1.5 kb of 3′ most regions, there are other peaks of reads covering the 5′ part of 3′UTRs and often with significantly more coverage than the distal 3′UTR (Fig. 5B and Supporting Information S1). We propose that this is due to the presence of alternative transcripts from the same gene with shorter 3′UTR. We found that 1,718 genes have two transcript isoforms with the same coding region but different lengths of 3′UTR (Table S5), suggesting that approximately 13% of expressed genes exhibit at least two isoforms with long (distal) and short (proximal) 3′UTR in the same cell at the same time (Supporting Information S1). We verified the RNA-Seq result with allele specific RT-qPCR (Supporting Information S1, Table S6, and Table S7).
The majority of transcripts with different 3′UTRs from a gene are due to alternative polyadenylation, since 40% of short (proximal) and 70% of long (distal) 3′UTRs, respectively, have CPSF binding sites located at −20 nt upstream of their 3′end, suggesting that the stop of the majority of these 3′UTRs are at the canonical polyadenylation recognition sites (Fig. 6). We propose that in individual blastomeres, the majority of transcripts from a gene have a short 3′UTR, which will be translated constantly, whereas for a small number of transcripts from the same gene with long 3′UTR, their translation will be dynamically regulated by microRNAs or RNA-binding proteins (Fig. 7). This would allow relatively stable translation of a large set of genes, while permitting subtle adjustments in protein levels. The abundance of transcript isoforms with short 3′UTR is about 5–6 fold higher compared with those from the same gene with a long 3′UTR in two-cell embryos (Fig. 6C and Supporting Information S1). Thus, the impact of miRNA regulation of mRNA translation for these mRNAs with two different lengths of 3′UTRs in early blastomeres is probably negligible (Fig. 7).
Interestingly, whereas the ratio of long to short 3′UTR is maintained at four- and eight-cell stages, it changes in postimplantation epiblast cells, where mRNAs with the longer 3′UTR isoforms increase from 16% to 33% (Fig. 6). This suggests that later in development, transcripts with long 3‘UTR might provide greater flexibility for translational regulation, at a time when cells respond to more complex signaling conditions for diverse cell fate decisions.
Non-coding RNAs play an important role during early embryonic development . We found that for the 1,446 non-coding genes in the RefSeq database, between 400 and 600 of them were expressed during preimplantation development of which, on average, 203 of them showed dynamic changes, suggesting that they may contribute to the regulation of early development (Table S1). Furthermore, 3′UTR of non-coding RNAs increased significantly from mature oocyte to two-cell blastomeres, but decreased slightly from two- to four- to eight-cell stage (Table S5). This was maintained thereafter in the ICM but decreased significantly in TE. At E4.5, the length of 3′UTR decreased significantly in the epiblast, which matches with the pattern for the protein-coding genes. This shows that as with the protein-coding genes, non-coding RNAs have similar pattern of expression level dynamics and of the length of 3′UTR.
A crucial step towards understanding early embryonic development is to obtain comprehensive knowledge of gene expression network underlying the ordered development phenotype. Previously the transcriptome landscape of preimplantation embryos has been analyzed using hundreds of pooled embryos. Furthermore, there is as yet no complete description of the overall differences in the transcriptomes of the ICM and trophectoderm. We have taken advantage of our single cell RNA-Seq analysis of blastomeres to address some of these questions (Table S8). We found that the most dramatic change in gene expression network occurred between oocyte and two-cell embryo stage when the maternal transcriptome is replaced by the expression of the zygotic transcriptome. We also found that eight-cell blastomeres are clearly more similar to trophectoderm cells than to ICM cells, suggesting that the initial development of blastomeres is probably towards the trophectoderm lineage from which the pluripotent ICM lineage emerges subsequently. We also obtained comprehensive transcriptomes of the ICM and trophectoderm.
It may be argued that changes in expressed genes may not necessarily reflect corresponding protein abundance due to extensive post-transcriptional and post-translational regulation. However, global transcriptome and proteome analysis has shown that transcriptomes exhibit strong correlation with corresponding proteomes . In addition, we also detected significant numbers of long non-coding RNAs. Nearly half of these non-coding RNAs showed dynamic changes during preimplantation development, which not only reflects the overall transcriptional activity of the genome, but it also reflects the functional status of the genome during early development.
Concerning ASE, over half of the genes with distinguishable SNPs display differential ASE in individual cells. Notably, a significant number of these specific alleles are systematically more highly expressed in blastomeres from different embryos, indicating that differences in their cis-regulatory elements may play an important role in determining allele-specific expression. Future studies with hybrid embryos from mutual crossing of two different strains of inbred mice will be required to obtain further insights into the influence of the parental origin of genes and cis-regulatory elements in dictating ASE. However, we also found several genes, which show intrinsic stochastic expression characteristics that may contribute to the differences between identical cells. Such differences could contribute to phenotypic differences, and account for the divergence between identical twins, independently of genetic and environment factors. We cannot formally exclude the possibility that the high percentage of ASE within individual early blastomeres is due to the presence of maternally inherited transcripts. However this seems unlikely, because if this were the case, the subsequent decline in the maternal transcripts during early development should result in a significant reduction in ASE. However, we see an increase in the percentage of ASE between 2 and 8-cell stage. We cannot rule out transcriptional oscillation as a contributing factor to ASE, particularly for those genes where the ASE is shifted to one allele or the other between blastomeres.
We also found that the lengths of the 3′UTR of transcripts show surprising changes during development, indicating that for many genes there are also post-transcriptional control mechanisms, which impact on translational regulation during early development. A large number of transcripts are known to be the targets of microRNAs . Here we found that nearly two thousand genes with potential microRNA binding sites have majority of their mRNA copies with truncated 3′UTRs, which would potentially allow them to escape from possible repression by the co-expressed microRNAs (Fig. 7). This is most significant up to the eight-cell stages. The proportion of transcripts with longer 3′UTRs increased later, which probably allows for the establishment of more complex gene regulation networks as development progresses, through more flexible regulation of mRNAs by microRNAs, and possibly by RNA binding proteins.
All the mouse work has been conducted under the license from Home Office of UK (the project license number is: PPL 80/2193). All embryos at two-cell, four-cell, and eight-cell stages were recovered from outbred MF1 females mated with MF1 male mice . Two-cell, four-cell, and eight-cell embryos were flushed from the oviduct using a 30-gauge blunt tip flushing needle at E1.5, E2.0, and E2.5, respectively. The zona pellucida was removed by use of acidic tyrode solution. Then the individual blastomeres were separated by gentle pipetting with a glass capillary in calcium-free medium. The resulting single cells were washed twice in 0.1% BSA in 1X PBS before they were picked individually for subsequent analysis.
The single-cell RNA-seq was performed following a detailed protocol we developed recently , . Briefly, an individual cell was manually picked and transferred into lysate buffer by a glass capillary controlled by a mouth pipette. The reverse transcription was directly done using the whole cell lysate. Then a poly(A) tail was added to the 3′ end of first-strand cDNAs using terminal deoxynucleotidyl transferase, which was followed by 20 plus 9 cycles of PCR to amplify the single cell cDNAs evenly.
Libraries produced from individual cells were barcoded and sequenced using the SOLiD system. The 60 Gb of sequence produced by 50 bp reads was aligned (www.solidsoftwaretools/gf/project/transcriptome/, raw data was deposited in GEO database, GSE22182) to the mouse reference genome (NCBI build 37) containing IUB codes on every locus reported in dbSNP database (build 128), in order to minimize the coverage bias of the reference allele. For each alignment location we assign a score that reflects the size of alignment and the number of mismatches between reads and the reference genome. For each read, low scoring alignments (more than 9 units lower than the maximum alignment score of the read) are removed and uniquely aligned reads are used for downstream analysis (Table S1, Supporting Information S1). For more details of alignment scoring, see .
For a given genomic coordinate, the “allelic ratio” is the number of reads aligned across a given position which contains the reference nucleotide divided by the number of reads containing the first non-reference nucleotide in dbSNP (Supporting Information S1). Thus we will only concern ourselves here with the subset of genomic positions for which (1) sequence data produces multiple nucleotides; (2) observed alleles are listed in dbSNP; (3) the position is covered by at least 25 uniquely aligned reads; (4) no more than 50% of the covering reads supporting each nucleotide align to identical positions in the genome (in order to account for PCR replication); and (5) UCSC annotation supports transcription of only one strand (Table S2 and Supporting Information S1). These genomic positions account for less than 1% of all sequenced bases, while the remaining 99% of the sequenced bases show a single nucleotide, indicating that those loci are homozygous (and so are not included for allele specific expression).
In general, a complete solution for this problem is difficult to answer with only RNA libraries . There are three essential advantages to this study that allows us to produce such a solution: (1) our libraries are produced from single cells, which makes every SNP either homozygous or heterozygous; (2) the use of cells produced from nearly inbred mice as a negative control (Table S2 and Supporting Information S1), which enables the detection of heterozygous loci; and (3) double-strand cDNA synthesis, which enables us to measure the variation of allelic ratios from balanced loci. The method we used here is based on empirical distributions calculated from data and therefore incorporates both library and instrument biases.
For every locus of interest (as described in the beginning of this section), we want to use allelic ratios for the common alleles based on the NCBI dbSNP annotation to detect heterozygous loci and determine the allelic specific expression of the 2 alleles. Detection of heterozygous loci is mainly affected by errors produced in reverse transcription combined with PCR amplification steps, which are known to create false polymorphic sites. Additionally, instrument error, even after 2 base encoding corrections, can produce the same effect.
In order to measure the extent of these biases, we used 12 mouse ES cells (for one of these cells, we have two technical replicates) obtained from the same embryonic stem cell line, generated from a nearly inbred (near homozygous) embryo. All of the 12 ES cells have identical genomic background, which allows us to aggregate data from all cells, and additionally, they are known to be nearly homozygous (Table S3). The allelic ratios generated from these cells help us measure the biases mentioned above. Figure S5 shows that there is a 95% chance for a homozygous locus in the search space to generate a minor (false) allele frequency of 0.066 or less. We use this observation to call a locus (from search space) heterozygous if the minor allele frequency is above 0.066. The expected false positive rate of incorrectly calling a homozygous locus as a heterozygous one will be less than 5% (Supporting Information S1). We also designed allele specific real-time PCR assays and validated 23 out of 26 ASE loci for 20 individual blastomeres (Supporting Information S1, Table S7)
Next, we want to measure the variation in allelic ratios of heterozygous loci introduced by library preparation and instrument measurement. To do this, consider that we have a SNP with 2 alleles (at the cDNA level): a1=5′ ATCGCCC 3′ and a2=5′ ATCACCC 3′ that have x and y copies, respectively. In the library preparation, the second strand is synthesized and x and y copies of the alleles b1=3′ TAGCGGG 5′ and b2=3′ TAGTGGG 5′ are complementarily produced. The ratio B= (number of a1 observations + number of b2 observations)/(number of a2 observations + number of b1 observations) should equal 1 if no library and instrument biases exists, irrespective if the locus is allelic balanced or not (Supporting Information S1). Since the alignment method we used makes a distinction between the two strands of the mouse genome, the number of a1 allele observations is equal to the number of reads aligned to the positive strand containing nucleotide G (same for all other alleles). The distribution of this statistic D generated from all heterozygous loci of an individual cell represents an approximation of the distribution of allelic ratios of balanced loci, which takes into consideration library and instrument biases. For a locus with allelic ratio r, the associated p-value (when testing H0: locus has balanced allelic expression) is the result of the two-tailed test when r is generated from D distribution. The locus is called as having allele specific expression (ASE) if the resulting p-value <0.01 (Supporting Information S1, and Tables 1 and and22).
Next we investigated the extent and types of allelic imbalance (AI) observed between blastomeres from the same embryos (AI= (ASE in blastomere #1)/(ASE in blastomere #2)). Here we focus on relative AI, which compares the ratio of the expression of two alleles between pairs of blastomeres from same embryo . We analyzed further only those genomic positions having a significant AI with χ2 p-value less than 0.05 (Tables 1 and and22 and Table S3).
Single cell RNA-Seq counts and RPM mapped to RefSeq of single oocytes, blastomeres of pre-implantation embryos.
Allelic Specific Expression (ASE) of single oocytes, and blastomeres of 2-cell, 4-cell, and 8-cell stage embryos. Each entry represents a locus that was found to be heterozygous in at least half of the cells of one cell type. For each such locus we generate observed allele counts, log2 allelic ratio and p-value (as described in SOM) from each individual cell.
ASE counts and AI χ2 test p-values for ES cells, and 2-cell, 4-cell, 8-cell embryos. The log2(AI) =log2[(ASE in blastomere #1)/(ASE in blastomere #2)] and χ2 p-values were calculated for 12 ES cells and each blastomeres.
Single cell RNA-Seq counts and RPM mapped to unique known exon-exon junctions and exons for each known splicing variant in RefSeq of single oocytes, blastomeres of pre-implantation embryos.
Consensus 3′UTRs of single mature oocytes, blastomeres of 2-cell, 4-cell, and 8-cell stage embryos, TE, and ICM of E3.5 blastocysts, epiblast of E4.5 blastocysts, and ESC. Each entry presents up to six UTR predictions of a RefSeq transcript. For each cell type we report a predicted UTR size(s) if the prediction was observed in at least half of all the individual cells.
The expression of 380 genes in blastomeres of 2-cell and 4-cell embryos by single cell TaqMan real-time PCR.
Allelic specific real-time PCR validation of ASE based on single cell RNA-Seq data. The averaged Ct values and standard deviations (SD) were calculated from three technical repeats.
Gene Ontology (GO) analysis results of functions, networks, and components for genes with their fold changes more than 2-fold and p-values <0.01 between two developmental stages.
Supporting materials including supplementary discussions, methods, figures, and references.
Competing Interests: KL, CB, EN, XW, BT are employees of Life Technologies Corporation.
Funding: The work was supported by a grant from the Wellcome Trust to M.A.S. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.