We perform our experiments on a collection of 304 3D movies of GFP-tagged proteins in NIH 3T3 cells over 12 different cell lines, with a different protein labeled in each cell line. The lines were generated by CD tagging (Jarvik et al., 2002), and the microscope and image acquisition parameters are as described in Hu et al. (2006). At each time point of the movie, we have a single-channel stack of 15 z-slices, 1280×1024 pixels each. The x-resolution is 0.11 microns, and the distance between pixels in the z-direction is 0.5 microns. There is a 45 s interval between frames.
The proteins included in this study are located in six major subcellular structures. The table below summarizes the tagged gene/protein for each cell line, along with the location in the cell where the protein is expressed, the number of movies for each cell line, and the number of frames in each movie. Note that we aim to distinguish proteins even when they appear in the same subcellular compartment. This is possible because such proteins still exhibit different dynamic behavior, and may also have a different spatial distribution within the compartment. As a known example, histones, RNA polymerases and nucleoporins are three nuclear proteins with very different distributions: they are respectively involved in the compaction of DNA into chromatin, the transcription of DNA (with RNA polymerase I concentrated in the nucleoli) and the nuclear pore complex.
Overview of the experimental dataset composed of 12 cell lines
We tested our classification method using leave-one-out cross-validation (i.e. when classifying a test cell, we built the class models from all cells except that test cell), and achieved 84.2% accuracy on our dataset. A previous study on this same 3T3 dataset used only the middle z-slice at each time point and, with a range of morphological, texture and temporal features, achieved an accuracy of 80.9% (Hu et al., 2010). By improving on that classification accuracy, we verify that our models capture relevant discriminative information present in the movies.
If we consider only the static component of our model, the proportion of objects of each type m_λ, we get 70.4% accuracy. Adding the dynamic components (m_{λ,λ′}, m_{λ,∅}) completes the model and gives our final result of 84.2%.
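The leave-one-out protocol above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `build_models` and `classify` are hypothetical stand-ins for building per-class models and scoring a held-out cell against them, and the toy 1-D "features" exist only to make the example runnable.

```python
def loocv_accuracy(cells, labels, build_models, classify):
    """Leave-one-out cross-validation: hold out each cell in turn,
    build class models from all remaining cells, and classify the
    held-out cell against those models."""
    correct = 0
    for i in range(len(cells)):
        train = [(c, l) for j, (c, l) in enumerate(zip(cells, labels)) if j != i]
        models = build_models(train)              # one model per class
        correct += classify(models, cells[i]) == labels[i]
    return correct / len(cells)

# Toy stand-ins: each class model is the mean of its training values;
# classification picks the class with the nearest mean.
def build_models(train):
    by_class = {}
    for value, label in train:
        by_class.setdefault(label, []).append(value)
    return {l: sum(v) / len(v) for l, v in by_class.items()}

def classify(models, value):
    return min(models, key=lambda l: abs(models[l] - value))

cells = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95]
labels = ["A", "A", "A", "B", "B", "B"]
print(loocv_accuracy(cells, labels, build_models, classify))  # 1.0
```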
3.3 Cell models: when to stop acquiring frames for each cell
To test our intelligent stopping algorithms, we take each movie in our dataset and determine how many frames would have been acquired for a given stopping algorithm and threshold. We then measure the average number of frames acquired and the average classification accuracy. For example, in the extrinsic scenario with α=0.7, we acquire 3–35 frames per cell, with an average of 8.0 frames, yielding 80.3% classification accuracy. We compare this to the standard method that acquires exactly eight frames for each cell, yielding only 75.0% classification accuracy. To reach 80% classification accuracy using the standard method, we would need to acquire over 14 frames per cell—almost twice as many as with our intelligent method.
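The extrinsic stopping rule described above can be sketched as a simple threshold test. This is an illustrative sketch only: `frames_until_stop` and the per-frame confidence sequence are hypothetical, standing in for the classifier's posterior confidence in its top class after each newly acquired frame.

```python
def frames_until_stop(confidences, alpha=0.7, min_frames=3, max_frames=35):
    """Acquire frames one at a time and stop as soon as the classifier's
    confidence in its top class reaches the threshold alpha.
    confidences[t-1] is the confidence after seeing the first t frames."""
    for t, c in enumerate(confidences[:max_frames], start=1):
        if t >= min_frames and c >= alpha:
            return t
    return min(len(confidences), max_frames)

# Confidence typically grows as evidence accumulates; with alpha = 0.7
# this toy cell stops after 4 frames instead of a fixed budget of 8.
conf = [0.30, 0.45, 0.60, 0.72, 0.80, 0.85, 0.88, 0.90]
print(frames_until_stop(conf, alpha=0.7))  # 4
```

Lowering `alpha` trades accuracy for fewer acquired frames, which is how the different points on the curves in the figure are generated.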
In Figure 2, we compare our intelligent methods for the extrinsic and intrinsic scenarios with the standard method, using a range of stopping thresholds to get the different points on the curves. The intelligent algorithms achieve significantly higher accuracy for the same average number of frames acquired, with the best results for the extrinsic scenario.
Fig. 2. When to stop acquiring (cell models). The average classification accuracy is shown as a function of the average number of frames acquired with one of three methods to choose when to stop acquiring. The first method (solid line) uses the intelligent algorithm …
3.4 Class models: when to stop acquiring frames for each cell
To test these intelligent acquisition algorithms, we set aside 10 movies as our testing set. We take the remaining 294 training movies and determine which frames would have been acquired for each intelligent stopping method and threshold. We then use these to build a class model for each of the 12 classes, and attempt to classify the testing set. We use every frame for movies in the testing set. We ran 10 000 trials, randomizing the testing and training sets, including their order, in each trial. In Figure 3, we show the resulting classification accuracy against the average number of frames acquired per cell with (i) the intelligent acquisition algorithm for the extrinsic scenario, varying the classification confidence threshold to get the different points on the curve, (ii) the intelligent acquisition algorithm for the intrinsic scenario, varying the likelihood uncertainty threshold to get the different points on the curve and (iii) a standard acquisition algorithm that acquires a fixed number of frames per cell. Once again, we see that the intelligent algorithms achieve significantly higher accuracy than the standard method for the same average number of frames acquired; the results for the extrinsic scenario are best. In this scenario, the intelligent algorithm requires only 10 frames per cell to reach a classification accuracy of 80%, whereas the standard algorithm requires 18 frames. We expect that the difference between the two intelligent methods will increase as the number of cells per class increases.
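The randomized trial loop described above can be sketched as follows. This is a hypothetical outline, not the paper's code: `run_trials` and `evaluate` are stand-in names, and the toy `evaluate` merely checks label coverage so that the example runs deterministically; the real evaluation builds class models from the training movies and classifies the held-out movies.

```python
import random

def run_trials(movies, n_test=10, n_trials=1000, seed=0):
    """Repeatedly split the movies into a held-out testing set and a
    training set, randomizing membership and order in each trial, and
    average the resulting classification accuracy."""
    rng = random.Random(seed)
    accs = []
    for _ in range(n_trials):
        shuffled = movies[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        accs.append(evaluate(train, test))
    return sum(accs) / len(accs)

# Toy stand-in for model building + classification: a test movie counts
# as correct if its class is represented in the training set at all.
def evaluate(train, test):
    train_labels = {label for _, label in train}
    return sum(label in train_labels for _, label in test) / len(test)

movies = [(i, i % 12) for i in range(304)]   # 304 movies, 12 classes
print(run_trials(movies, n_test=10, n_trials=50))  # 1.0
```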
Fig. 3. When to stop acquiring (class models). The average classification accuracy is shown as a function of the average number of frames acquired with one of three methods to choose when to stop acquiring. The first method (solid line) uses the intelligent algorithm …
3.5 Class models: how many cells to acquire
To test our methods for how many cells to acquire, we randomly choose one movie (cell) per class to serve as the testing set. From the remaining movies, we randomly choose 10 movies per class as the training set. We choose a set of r classes to revisit and add five more movies from each of these classes to the training set. Finally, we classify the testing set, and record the accuracy.
We vary the number of classes to revisit, r, from 0 (none) to 12 (all). We test three methods for choosing which r classes to revisit: those with the highest global marginal utility, those with the highest local marginal utility, and randomly chosen classes. The results are averaged over 100 000 trials and shown in Figure 4a, which displays classification accuracy as a function of r. We can see that intelligently choosing which classes to revisit gives higher classification accuracy, with the best results for the global scenario.
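The utility-based selection step can be sketched as a top-r ranking. This is a minimal illustration under stated assumptions: `classes_to_revisit`, the class names, and the utility values are all hypothetical; in the paper the utilities are the (global or local) marginal utilities estimated from the current class models.

```python
def classes_to_revisit(utilities, r):
    """Pick the r classes with the highest marginal utility, i.e. the
    classes where acquiring additional movies is expected to improve
    classification accuracy the most."""
    ranked = sorted(utilities, key=utilities.get, reverse=True)
    return ranked[:r]

# Hypothetical per-class marginal utilities (expected accuracy gain).
utilities = {"Nucleus": 0.02, "ER": 0.11, "Golgi": 0.07,
             "Mito": 0.01, "Cytoskeleton": 0.09}
print(classes_to_revisit(utilities, r=2))  # ['ER', 'Cytoskeleton']
```

A random baseline would instead sample r classes uniformly, which is the third method compared in the experiment.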
Fig. 4. How many cells to acquire. Initially, we acquire 10 movies from each class. We then acquire five more movies (a) or 10 more movies (b) from a selected set of classes. The acquired movies are used to train a classifier that is then evaluated on held-out …
Figure 4b repeats the above experiment, but acquires 10 additional movies from the revisited classes (instead of 5). Because we have fewer than 20 movies available for some classes, we do this for only six classes. The results are similar to those of Figure 4a, but the increase in accuracy with intelligent acquisition is greater.
3.6 Computation time
The time taken to build a model is about 1–3 s per frame (Intel Core Duo 2.2 GHz processor, 1.96 GB of memory), depending on the number of objects; the time taken for each of the intelligent acquisition algorithms is negligible. This is relatively fast in comparison with the acquisition time of 45 s per frame. Furthermore, for the algorithms that determine when to stop acquiring frames, we can immediately begin acquiring the subsequent frame even while we are evaluating whether to stop; thus, the computational penalty applies only to the one frame at which we choose to stop acquiring. For the algorithm that determines how many cells to acquire, the added computation time for building the model of a cell is negligible in comparison with the time it takes to acquire that cell.
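The overlap of acquisition and model building described above is a classic producer–consumer pattern, sketched below. This is an illustrative sketch, not the acquisition software: `acquire_and_model`, `acquire` and `update_model` are hypothetical names, and the toy timings merely mimic acquisition (slow) dominating model updates (fast).

```python
import queue
import threading
import time

def acquire_and_model(n_frames, acquire, update_model):
    """Overlap acquisition with model building: while the model is being
    updated with frame t, the microscope is already acquiring frame t+1,
    so the 1-3 s of computation hides behind the 45 s acquisition."""
    frames = queue.Queue()

    def producer():
        for t in range(n_frames):
            frames.put(acquire(t))   # ~45 s per frame on the microscope
        frames.put(None)             # sentinel: acquisition finished

    threading.Thread(target=producer, daemon=True).start()
    processed = 0
    while (frame := frames.get()) is not None:
        update_model(frame)          # ~1-3 s per frame
        processed += 1
    return processed

# Toy timings: the consumer keeps up easily, so no frame is ever delayed.
print(acquire_and_model(5, acquire=lambda t: t,
                        update_model=lambda f: time.sleep(0.001)))  # 5
```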