Dr. Rafi Kopan, Professor of Medicine and Developmental Biology, Washington University School of Medicine: You would open up Pandora's box by doing the single cell analysis because, as we discussed this morning, you have this gene noise, you have stochastic events, so what kind of statistical tools do you think you will need to differentiate between the majority behavior and noise and would it be in some ways better to look at populations of cells, some randomly chosen population of cells, instead of individual cells, groups of 10 or 20, whatever you want?
Dr. Potter: Yes, we thought about that question a lot, actually. First of all, you are absolutely right, when you perform single cell analysis there is more noise. There are multiple sources. First, not all cells are identical, even cells that we think belong to the same subtype. Of course this is one reason to examine single cells, to define individual differences and define novel subtypes. But, there is also just more inherent noise at the cellular level, with small numbers of molecules, in many cases only one or a few copies of a given transcript on average per cell, meaning some cells have a few and others none. And one must add to this the technical noise, as it is much more difficult to accurately define the gene expression profile of a single cell. We know that when you get down to five or ten picograms of total RNA and you apply these analysis procedures you don't get correlation coefficients that are 0.99, as is possible when examining greater quantities of RNA. Instead, they are in the range of 0.7 to 0.8. So there is noise in the analysis. Our take on this is that in order to get anything really meaningful out of it, you need to significantly increase the number of samples. For most of the studies I have described today we had really good correlation coefficients and we are talking about doing everything in triplicate, but I think when you get down to a single cell level, you have to forget about triplicate. It is too noisy to get an accurate analysis out of triplicate. So, let's say in an ideal world, where cost is no object, you look at a thousand individual cells. We could never afford to do that, but suppose you could. When you look at a thousand you are going to have the array patterns fall into distinct categories or bins and even though each array pattern is in itself going to be noisy, if you sum them together then we think that you're going to get a reliable view of the gene expression profile of that cell type. 
It is important to emphasize that even though the single cell analysis is not perfect, it is nevertheless extremely powerful and very capable of distinguishing different gene expression profiles and dividing them into distinct groups.
I think one very interesting approach would be to do it all by NexGen RNA-Seq. With NexGen, it is very easy to pool data sets, because you are just taking digital counts, so one just adds up the counts for the individual cells of a given subtype. One would find distinct cell types, for example, if one looked at a large number of cells from the cloud of metanephric mesenchyme. I don't know how many bins there would be, but there might be three: vascular, stroma and nephrogenic, as a very simple example. There might be more. If you had 100 single cell samples, perhaps more feasible than a thousand, and you divided them into three categories, then you could actually pool the RNA-Seq counts for the three different categories and get a very accurate readout of the gene expression profile of each cell type. By adding them together one would overcome the noise issues associated with the single cell samples.
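The pooling step described here, adding up digital counts for all cells assigned to the same category, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pool_counts`, the cell identifiers, the category labels and the count values are all hypothetical, and the step that assigns each cell to a category (the binning of expression patterns) is assumed to have been done already.

```python
from collections import defaultdict


def pool_counts(cell_counts, cell_labels):
    """Sum per-gene read counts over all cells that share a category label.

    cell_counts: {cell_id: {gene: count}}  (hypothetical single cell RNA-Seq counts)
    cell_labels: {cell_id: category}       (assumed to come from a prior binning step)
    """
    pooled = defaultdict(lambda: defaultdict(int))
    for cell_id, counts in cell_counts.items():
        label = cell_labels[cell_id]
        for gene, n in counts.items():
            pooled[label][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {label: dict(genes) for label, genes in pooled.items()}


# Hypothetical toy data: invented counts, not measurements.
cells = {
    "cell_1": {"Six2": 3, "Foxd1": 0, "Tie2": 0},
    "cell_2": {"Six2": 0, "Foxd1": 0, "Tie2": 0},  # a cell where no copies were detected
    "cell_3": {"Six2": 5, "Foxd1": 1, "Tie2": 0},
    "cell_4": {"Six2": 0, "Foxd1": 4, "Tie2": 0},
    "cell_5": {"Six2": 0, "Foxd1": 2, "Tie2": 0},
    "cell_6": {"Six2": 0, "Foxd1": 0, "Tie2": 6},
}
labels = {
    "cell_1": "nephrogenic",
    "cell_2": "nephrogenic",
    "cell_3": "nephrogenic",
    "cell_4": "stroma",
    "cell_5": "stroma",
    "cell_6": "vascular",
}

pooled = pool_counts(cells, labels)
for category, profile in pooled.items():
    print(category, profile)
```

In this toy example, `cell_2` shows no Six2 counts on its own, yet the pooled nephrogenic profile still reflects Six2 expression, which is the sense in which summing within a category compensates for noisy individual cells.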
You also asked if it might be better to examine pools of cells, perhaps ten or twenty. But it seems to me that this really defeats the purpose of single cell analysis. One is then looking at averages of many cells, and single cell distinctions would be very difficult to distinguish. The different category types would disappear. Does this make sense?
Dr. Kopan: Yes it does but it opens the question of how then do you correlate that with decision making? Just categorizing the different, let's call them options of gene expression that exist in the cloud, we still don't know what that option translates into in actual developmental potential.
Dr. Potter: I think we will have to sort of play it as we go a little bit. We can't predict exactly what we will find. If we find three distinct categories, for example and some cells are expressing Tie2 and some are expressing Foxd1 and some are expressing Six2, that would be a very simple world and we might be able to easily discern that some cells are determined to make vascular cells, some to make stromal cells and some to become the epithelial cells of the nephron. If we have more complicated results, like our very preliminary data suggesting simultaneous robust Six2 and Foxd1 expression in single cells, that suggests the existence of E11.5 metanephric mesenchyme progenitors that are undecided yet as to whether they are going to become epithelial or stroma.
Dr. Scott Boyle, Postdoctoral Fellow in Nephrology, Washington University School of Medicine: I understand that you can look at the gene expression profiles at P0 through P4 and get an idea of maybe the program that is responsible for the cessation of nephrogenesis. But I am also interested in another comparison, perhaps P2 versus E14.5. So is there a group of genes there that may identify what is responsible for self renewal in the mesenchyme, given that this is the part of the program that is shut off during cessation?
Dr. Potter: In more general terms we have compared, for example, the renal vesicle at P1 versus the renal vesicle at E12.5. I didn't point it out on the heat map I showed, but it is quite interesting to me that at earlier developmental times there are global differences in gene expression compared to later times. For example, the earlier cells in all compartments seem to be more devoted to cell division and DNA synthesis compared to later compartments. A capping mesenchyme later is not going to be the same as earlier; a renal vesicle later is not the same as earlier. We thought going in they might be, but we see now that they are not. Your question, though, is whether, if we looked at, say, an E15 capping mesenchyme and compared it to P1–3, we could derive a deeper insight into what the differentiation program is. After all, early in development we have active renewal as well as differentiation, but later, at say P3, we have differentiation without renewal. We have not made the comparisons that you are suggesting. Our problem is that we are using two different arrays for those two different analyses. About a year ago, when we switched over to GFP single cell type analysis, we switched to a Nugen target amplification method, away from an Epicentre system, and to an Affymetrix Gene 1.0 ST array, away from the old Affymetrix MOE430. We upgraded our technology, and it is difficult to make good array data comparisons when you are using different target amplification methods and different arrays. That would be a nice thing to do and my bioinformatics colleague, Bruce Aronow, might be able to do it, so perhaps I should ask him to try. But I think we can learn much the same thing by comparing the P1 data, immediately after birth, when renewal still is taking place, to the P3 data, when differentiation prevails.
Dr. Chen: Many people think that the Six2 population and the Foxd1 population do not overlap. Is this just because in situ hybridization is not sensitive enough to detect the low level of Foxd1 signal in these Six2 positive cells? One thing worth mentioning is that we looked at mice carrying Foxd1-Cre and ROSA-lacZ and found a small number of blue cells in tubular epithelia, which should be the derivatives of the Six2 population. We thought it could be ectopic expression or occasional repair by stroma cells. But this may also mean that some Six2 positive cap mesenchymal cells express Foxd1, as your preliminary results suggested.
Dr. Potter: I like that. This corroborates what we are seeing in our very preliminary single cell data.
Dr. Sanjay Jain, Assistant Professor of Medicine and Pathology and Immunology, Washington University School of Medicine: I have a few questions. One is a developmental question. You commented about a few hundred specific genes that are compartmentalized, right, so have you gone further and looked at what might be driving the co-regulated expression, at upstream regions or common transcription factors or micro RNAs?
Dr. Potter: No, not really. What we have done is not focused on those couple of hundred genes; instead, we looked at all of the genes that were highly upregulated in a specific compartment, not just those few that were unique to a given compartment and not expressed anywhere else. Actually, we have a website that was made at Cincinnati Children's Hospital Medical Center, which I like a lot, called ToppGene (http://toppgene.cchmc.org/). You just drop in your list of genes, and it will perform an evolutionarily conserved transcription factor binding site analysis of the proximal promoters. So, for each kidney development compartment we took the list of 1–200 genes with most compartment specific expression and used this website to look for the over representation of specific transcription factor binding sites in their promoters. It would, however, be interesting to re-examine the data the way you suggest, with focus on just the small number of genes with extremely specific expression within single compartments.
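As a rough illustration of what looking for "over representation" of a binding site in a gene list can mean, the sketch below applies a generic hypergeometric test. This is an assumption chosen for illustration, not a description of how ToppGene computes its results, and the function name `hypergeom_upper_tail` and all of the numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """Probability of seeing at least k site-carrying genes in a list of n genes,
    when K of the N genes in the background carry the site and the list is
    drawn at random without replacement."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical numbers: 20,000 background genes, 400 of which carry a given
# binding site in their promoter; a compartment list of 150 genes, 12 with the site.
p_value = hypergeom_upper_tail(k=12, n=150, K=400, N=20000)
print(f"P(at least 12 of 150 genes carry the site by chance) = {p_value:.3g}")
```

With these invented numbers about 3 of the 150 genes would be expected to carry the site by chance, so observing 12 gives a small tail probability. A real analysis would also need to correct for testing many binding sites at once.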