Bioinformatics. 2011 July 1; 27(13): 1854–1859.
Published online 2011 May 9. doi: 10.1093/bioinformatics/btr286
PMCID: PMC3117392

Model building and intelligent acquisition with application to protein subcellular location classification


Motivation: We present a framework and algorithms to intelligently acquire movies of protein subcellular location patterns by learning their models as they are being acquired, and simultaneously determining how many cells to acquire as well as how many frames to acquire per cell. This is motivated by the desire to minimize acquisition time and photobleaching, given the need to build such models for all proteins, in all cell types, under all conditions. Our key innovation is to build models during acquisition rather than as a post-processing step, thus allowing us to intelligently and automatically adapt the acquisition process based on the model built so far.

Results: We validate our framework on protein subcellular location classification, and show that the combination of model building and intelligent acquisition results in time and storage savings without loss of classification accuracy, or alternatively, higher classification accuracy for the same total acquisition time.

Availability and implementation: The data and software used for this study will be made available upon publication.

Contact: jelenak@cmu.edu


1 INTRODUCTION

The creation of accurate and predictive models for cells and tissues will require detailed information on the subcellular location of all proteins. The field of location proteomics (Murphy, 2005) is concerned with learning this information for entire proteomes and with capturing it in the form of generative models that can be used in cell simulations. Given the scale of the problem, efforts to optimize acquisition of location information are highly desirable (Jackson et al., 2009).

When studying the spatiotemporal behavior of proteins in a single cell using fluorescence microscopy, we typically have to choose a priori the number of frames to acquire. This can be problematic as we do not generally know in advance how many frames will be necessary to obtain the information we seek. We thus risk acquiring either too few frames to learn what we wish to know, or more frames than needed, thereby wasting time and storage space. We use the term frame to denote a single time point in the time-lapse imaging of live cells.

Similarly, when learning about a homogeneous population of cells, here called a class of cells, we may take several cells from this class and acquire a movie of each. However, we do not know in advance how many cell movies we should acquire to gain an understanding of the class, nor how many frames to acquire for each cell movie. Once again, we risk acquiring too few cells and frames and not learning what we wish to know, or wasting time by acquiring too many cells and frames. Reducing the area and duration of exposure to the exciting light also protects the sample from photobleaching and phototoxicity in fluorescence microscopy.

In this work, we propose intelligent model building and acquisition algorithms to deal with the above problems. These algorithms automatically determine, during acquisition, when to stop acquiring frames from a particular cell and when to stop acquiring cells from a particular class. As shown in Figure 1, they work by building models during acquisition. This is in contrast to the sequential approach to processing microscopy images, which treats model building as a post-processing step. We apply the algorithms to 3D movies of 12 3T3 cell lines tagged with green fluorescent protein (GFP), with a different protein labeled in each cell line. We consider each cell line to be a different class and determine the parameters of acquisition (how many cell movies we need to learn about each class, and how many frames we need in each cell movie).

Fig. 1.
Framework for intelligent acquisition of a class model.

We test these algorithms by trying to recognize (classify) the pattern of the labeled proteins and show that:

  1. We can build models both of an individual cell and of a class of cells, which can then be used to correctly classify the subcellular location pattern of a given cell.
  2. When we build models from data that is intelligently acquired, the models achieve a higher classification accuracy on an independent test set than when we build models from the same amount of data acquired with standard acquisition methods (a fixed number of frames per cell and a fixed number of cells per class).


2 METHODS

2.1 Model building and classification

We distinguish between two types of models: a cell model, built entirely from a movie of a single cell, and a class model, built from all cell models of that class. In this application, a class consists of all the cells expressing a specific GFP-tagged protein. Note that while some of these proteins show similar static location patterns, they are largely distinguishable when temporal information is available (Hu et al., 2006, 2010). We now describe how to build cell models and class models, as well as our classification method.

2.1.1 Cell models

Our models are based entirely on objects present in the movie. The basis for this approach is our previous work demonstrating that subcellular patterns in static images can be adequately recognized by finding objects by thresholding, describing each object by numerical features, identifying object types by clustering and representing an image by the fraction of objects of each type that it contains (Velliste, 2002; Zhao et al., 2005).

The object detection stage aims to find sets of connected pixels in 3D that display a higher intensity than their local environment. The pixel intensities represent the GFP signal in the protein channel, and the resulting objects represent the pattern displayed by a labeled protein within a cell. Because the background intensity is not uniform and objects may be touching, we use a multi-threshold approach for object detection (Gonzalez and Woods, 2008). First, the background is removed in each z-slice by subtracting the most common pixel intensity below the average intensity. Then, we apply a Gaussian filter (diameter=5 pixels, SD=1) to reduce image noise. Finally, we find local thresholds (above an empirically determined minimum threshold) such that each object is the largest set of 3D-connected pixels containing only one local intensity maximum.
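As an illustration, the following Python sketch outlines the background subtraction and denoising steps described above, with the final multi-threshold split simplified to a single minimum-threshold labeling. The SciPy-based implementation and all names are our own assumptions, not the authors' code.

```python
import numpy as np
from scipy import ndimage

def subtract_background(stack):
    """Remove background from each z-slice by subtracting the most
    common pixel intensity below that slice's mean, as described above.
    `stack` is a 3D array indexed (z, y, x) of non-negative integers."""
    out = np.empty(stack.shape, dtype=np.float64)
    for z in range(stack.shape[0]):
        sl = stack[z].astype(np.float64)
        below = sl[sl < sl.mean()].astype(np.int64)
        mode = np.bincount(below).argmax() if below.size else 0
        out[z] = np.clip(sl - mode, 0.0, None)
    return out

def detect_objects(stack, min_threshold):
    """Simplified 3D object detection: denoise with a Gaussian filter
    (SD = 1), apply the empirical minimum threshold, and label
    3D-connected pixels. The per-object multi-threshold refinement
    (one intensity maximum per object) is omitted for brevity."""
    smoothed = ndimage.gaussian_filter(subtract_background(stack), sigma=1)
    labels, n_objects = ndimage.label(smoothed > min_threshold)
    return labels, n_objects
```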

To assign object types, we compute seven static features based on previous work (Velliste, 2002; Zhao et al., 2005), as shown in Table 1. As in that previous work, we take all objects across all frames and movies, normalize their features to unit standard deviation, and cluster them based on these features using the batch k-means algorithm. An object's type is defined as the number of the cluster into which it falls. We choose the number of types, k, to maximize classification accuracy on the training set using nested cross-validation.

Table 1.
List of static subcellular object features (SOFs)
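As a sketch of the object-typing step, the snippet below normalizes pooled object features to unit standard deviation and clusters them with batch k-means, here via scikit-learn (an assumed implementation); the nested cross-validation loop that selects k is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def assign_object_types(features, k):
    """Cluster pooled object feature vectors (one row per object, taken
    across all frames and movies) into k object types. Features are
    scaled to unit standard deviation before clustering."""
    stds = features.std(axis=0)
    stds[stds == 0] = 1.0            # guard against constant features
    km = KMeans(n_clusters=k, n_init=10).fit(features / stds)
    return km.labels_, km            # object types and the fitted model
```

In practice, k would be chosen by running this for a range of values and keeping the one that maximizes training-set classification accuracy under nested cross-validation.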

The cell model then consists of three components (mλ, mλ,λ′, mλ,Ø):

  1. The first component, mλ, is a k-by-1 vector representing the proportion of objects of type λ.
  2. The second component, mλ,λ′, is a k-by-k matrix representing the proportion of objects of type λ that have a nearby object of type λ′ in the subsequent frame. We define a nearby object as an object whose center is within a distance dmax of the original object. We choose dmax to be 0.5 μm (corresponding to 1 pixel in the z-direction, and 4.5 pixels in the x,y-directions).
  3. The third component, mλ,Ø, is a k-by-1 vector representing the proportion of objects of type λ that have no nearby objects in the subsequent frame.

When computing these proportions, we take the posterior mean under the assumption of a uniform prior (Bishop, 2006). Hence, to determine mλ, we count the number of objects of type λ across all frames, denoting this count Nλ. If N is the total number of objects, we calculate mλ as:

\[ m_{\lambda} = \frac{N_{\lambda} + 1}{N + k} \]

Similarly, to determine mλ,λ′, we iterate through all objects of type λ, and look for nearby objects in the subsequent frame of type λ′. If the total number of such events is Nλ,λ′, we calculate mλ,λ′ as:

\[ m_{\lambda,\lambda'} = \frac{N_{\lambda,\lambda'} + 1}{N_{\lambda} + k + 1} \]

We derive mλ,Ø in the same fashion as mλ,λ′, but replacing Nλ,λ′ with Nλ,Ø, where Nλ,Ø is the number of objects of type λ with no nearby objects in the subsequent frame.

Cell models can be built up while the movie is being acquired, simply by including all objects present up until the current frame. As more frames are acquired, the cell model is refined by adding in the newly available objects.
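The sketch below assembles a cell model from the objects observed so far, and can simply be re-run as new frames arrive. The per-object data layout is hypothetical, and the smoothing constants follow the uniform-prior posterior means in the equations above (our reconstruction).

```python
import numpy as np

def build_cell_model(objects, k, dmax=0.5):
    """Build (m_l, m_ll, m_l0) from the objects observed so far. Each
    object is a dict with 'frame' (int), 'type' (0..k-1) and 'center'
    (np.array of x, y, z in microns, so dmax = 0.5 matches the text)."""
    N = len(objects)
    N_l = np.zeros(k)        # count of objects per type
    N_ll = np.zeros((k, k))  # type -> nearby type in subsequent frame
    N_l0 = np.zeros(k)       # type -> nothing nearby in subsequent frame
    by_frame = {}
    for o in objects:
        N_l[o['type']] += 1
        by_frame.setdefault(o['frame'], []).append(o)
    for o in objects:
        nxt = by_frame.get(o['frame'] + 1, [])
        nearby = [p for p in nxt
                  if np.linalg.norm(p['center'] - o['center']) <= dmax]
        if not nearby:
            N_l0[o['type']] += 1
        for p in nearby:
            N_ll[o['type'], p['type']] += 1
    # Posterior means under a uniform prior (reconstructed denominators).
    m_l = (N_l + 1) / (N + k)
    m_ll = (N_ll + 1) / (N_l + k + 1)[:, None]
    m_l0 = (N_l0 + 1) / (N_l + k + 1)
    return (m_l, m_ll, m_l0), (N_l, N_ll, N_l0)
```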

2.1.2 Class models

A class model is simply the collection of cell models for that class. Hence, we can view the class model as a mixture model, where its constituent cell models form the component models of that mixture. As additional cells from the class are acquired, the class model is refined to include the resulting cell models.

2.1.3 Classification

To classify a cell model of unknown class, we first measure, for each candidate class, the likelihood of each of that class's constituent cell models. The likelihood of a constituent cell model m′ given the observed cell model m, L(m′|m), is found by treating the three components of the model individually:

\[ L(m' \mid m) \;=\; \prod_{\lambda} \bigl(m'_{\lambda}\bigr)^{N_{\lambda}} \cdot \prod_{\lambda,\lambda'} \bigl(m'_{\lambda,\lambda'}\bigr)^{N_{\lambda,\lambda'}} \cdot \prod_{\lambda} \bigl(m'_{\lambda,\varnothing}\bigr)^{N_{\lambda,\varnothing}} \]

where the counts Nλ, Nλ,λ′ and Nλ,Ø are those observed in the cell being classified.

Note that the above expressions assume all components of the model are conditionally independent given the underlying cell model m′. This is analogous to the assumption made in a naïve Bayes classifier (Ng and Jordan, 2002), and has been shown to give good classification results even when it does not strictly hold (Zhang, 2004).

We classify by assigning the cell model to the class with the maximum likelihood.
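A sketch of this classification step under the factorized likelihood above, with observed counts as produced by the cell-model sketch earlier. Because the text does not spell out how constituent log-likelihoods combine into a class score, the maximum over constituents used here is an assumption (averaging the mixture components would be an alternative).

```python
import numpy as np

def loglik(counts, model):
    """Log-likelihood of a constituent cell model `model` given the
    observed counts (N_l, N_ll, N_l0) of the cell being classified."""
    N_l, N_ll, N_l0 = counts
    m_l, m_ll, m_l0 = model
    return (np.sum(N_l * np.log(m_l))
            + np.sum(N_ll * np.log(m_ll))
            + np.sum(N_l0 * np.log(m_l0)))

def classify(counts, class_models):
    """`class_models` maps class name -> list of constituent cell
    models. Returns the maximum-likelihood class."""
    scores = {c: max(loglik(counts, m) for m in models)
              for c, models in class_models.items()}
    return max(scores, key=scores.get)
```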

2.2 Intelligent acquisition

We now describe our intelligent acquisition algorithms and show that they result in a better model for a given amount of data (as verified by classification accuracy). We begin by discussing how many frames to acquire when building a cell model.

2.2.1 Cell models: when to stop acquiring frames for each cell

When building a cell model with the end application of classification, the goal is to stop acquiring when the classification decision is unlikely to change. We distinguish between two scenarios:

  1. Extrinsic scenario: we have access to the class models during acquisition. This means that we can classify the cell after acquiring each frame, and stop acquiring when this classification result reaches a sufficient confidence.
  2. Intrinsic scenario: we do not have access to the class models during acquisition. Hence, we cannot classify during acquisition, and our choice of when to stop acquiring the cell must be based solely on the data from that cell.

As discussed earlier, we classify by choosing the class of maximum likelihood. Equivalently, we can use the log-likelihood of a class, ℓ(c), and calculate the standard error of this log-likelihood, e(c).

To compute these quantities, we first rewrite the likelihood as a log-likelihood by taking logarithms of both sides:

\[ \ell(m' \mid m) \;=\; \sum_{\lambda} N_{\lambda} \log m'_{\lambda} \;+\; \sum_{\lambda,\lambda'} N_{\lambda,\lambda'} \log m'_{\lambda,\lambda'} \;+\; \sum_{\lambda} N_{\lambda,\varnothing} \log m'_{\lambda,\varnothing} \]

A cell model m is built from all of the N individual object observations in that cell. We can see from the above equation that each individual object observation contributes additively toward ℓ(m′|m), the log-likelihood of a cell model m′ given the observed cell model m. We can therefore express ℓ(m′|m) as the sum of the log-likelihoods given each individual object observation, ℓ(1)(m′),…,ℓ(N)(m′), and express its standard error as the standard deviation of ℓ(1)(m′),…,ℓ(N)(m′) multiplied by √N.

If c1 is the most likely class and c2 is the second most likely class, we define the classification confidence, C, as:

\[ C \;=\; \frac{\ell(c_1) - \ell(c_2)}{\sqrt{e(c_1)^2 + e(c_2)^2}} \]

In the extrinsic scenario, we acquire until C exceeds a given classification confidence threshold, α. In the intrinsic scenario, we cannot compute the log-likelihood of a class model, because the class models are unavailable. However, we can still compute the log-likelihood of some sample model, including its standard error, providing an indication of how accurately we expect to predict the log-likelihood of the class models when they become available. The natural choice of a sample model is the model of the cell being acquired. We refer to the standard error of the resulting log-likelihood as the likelihood uncertainty, and acquire until this likelihood uncertainty falls below the likelihood uncertainty threshold.
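A sketch of the extrinsic stopping test, assuming the per-object log-likelihood contributions for each class are available as arrays; the form of C follows the reconstruction above.

```python
import numpy as np

def class_stats(per_object_ll):
    """Total log-likelihood and its standard error (standard deviation
    of the per-object contributions times sqrt(N)); assumes N >= 2."""
    n = len(per_object_ll)
    return per_object_ll.sum(), per_object_ll.std(ddof=1) * np.sqrt(n)

def should_stop(per_class_ll, alpha):
    """Extrinsic stopping rule: stop once the confidence C, computed
    from the two most likely classes, exceeds the threshold alpha."""
    stats = sorted((class_stats(v) for v in per_class_ll.values()),
                   key=lambda t: t[0], reverse=True)
    (l1, e1), (l2, e2) = stats[0], stats[1]
    C = (l1 - l2) / np.sqrt(e1 ** 2 + e2 ** 2)
    return C > alpha
```

In the intrinsic scenario, the same machinery applies with the model of the cell being acquired as the sample model, stopping when the standard error alone falls below the likelihood uncertainty threshold.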

2.2.2 Class models: when to stop acquiring frames for each cell

When acquiring cells to build a class model, intelligent acquisition can yield even greater savings. In the extrinsic scenario, we stop acquiring a cell if we recognize that it is similar to a previously acquired cell in that class. This allows us to focus our time and resources on acquiring cells that differ from previously acquired cells of that class, as these provide the most new information.

To assess whether a cell is similar to previous cells in the class, we simply classify it. If it is correctly classified with high confidence (exceeding the classification confidence threshold), we know that it must resemble previously acquired cells of its class, and we stop acquiring. We also stop acquiring when the likelihood uncertainty falls below the likelihood uncertainty threshold, ensuring that acquisition eventually stops even when the cell is very different from previously acquired cells of that class.

This method implicitly assumes that acquisition alternates between cells of all the classes, so that we can build and refine a class model of every class simultaneously. In the intrinsic scenario, we do not alternate between cells, and thus the class models are not available before starting the acquisition. In this scenario, we can still use the likelihood uncertainty threshold as before, which gives almost as good results for small numbers of cells.

2.2.3 Class models: how many cells to acquire

Finally, having discussed how many frames to acquire from each cell, we now discuss when to stop acquiring cells altogether. Although all class models will improve when we acquire more cells of that class, we want to identify those classes for which further acquisition is especially beneficial. We again consider two scenarios:

  1. Global scenario: we acquire N1 cells from every class, but then have time to revisit r classes and acquire an additional N2 cells. We want to decide which r classes to revisit.
  2. Local scenario: we acquire N1 cells from a class, and then have to decide whether we should acquire an additional N2 cells. Unlike the global scenario, we do not have access to the other classes, and thus must make this decision locally with the information available at that time.

In the global scenario, we revisit the r classes with the highest global marginal utility. We define this as the decrease in classification accuracy on the training set that results from removing a cell from that class. To measure this for a cell C, we begin by testing whether C is classified correctly when all other cells are used for training. Let γ1=1 when it is classified incorrectly, and 0 otherwise. Next, we measure the classification accuracy of all other cells in the same class as C under two scenarios: (i) when training with all cells except the test cell, and (ii) when training with all cells except both the test cell and C. Let γ2 be the mean drop in accuracy from the first scenario to the second. The global marginal utility of cell C is then given by γ12. The global marginal utility of an entire class is the mean of the utility of its constituent cells.
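The global marginal utility computation might be sketched as follows; `classify_loo(test, excluded)` is a hypothetical helper that classifies `test` using class models trained on all cells except `test` itself and those in `excluded`.

```python
def global_marginal_utility(cells, classify_loo):
    """Global marginal utility of each cell, per the definition above.
    Each cell carries a ground-truth `label` attribute; returns a
    mapping cell -> gamma1 + gamma2."""
    utility = {}
    for c in cells:
        gamma1 = 0 if classify_loo(c, set()) == c.label else 1
        same_class = [x for x in cells if x.label == c.label and x is not c]
        # Drop in accuracy when c is additionally removed from training.
        drops = [(classify_loo(x, set()) == x.label)
                 - (classify_loo(x, {c}) == x.label) for x in same_class]
        gamma2 = sum(drops) / len(drops) if drops else 0.0
        utility[c] = gamma1 + gamma2
    return utility
```

The utility of a class is then the mean of these values over its constituent cells.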

In the local scenario, for each cell model m in the class, we determine its closest match, m′, which is the cell model that has the highest likelihood given m. We define the local marginal utility of a class as the proportion of cell models in that class that are chosen at least once as a closest match. This estimates the probability that a new cell will affect the classification result of existing cells. We revisit the classes with high-local marginal utility.
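Correspondingly, a sketch of the local marginal utility for one class, reusing `loglik` from the classification sketch; the parallel lists of per-cell counts and fitted models are a hypothetical layout.

```python
def local_marginal_utility(cell_counts, cell_models):
    """Proportion of a class's cell models chosen at least once as the
    closest match (highest likelihood) of another cell in the class.
    cell_counts[i] and cell_models[i] refer to the same cell."""
    chosen = set()
    for i, counts in enumerate(cell_counts):
        others = [(j, m) for j, m in enumerate(cell_models) if j != i]
        best_j, _ = max(others, key=lambda jm: loglik(counts, jm[1]))
        chosen.add(best_j)
    return len(chosen) / len(cell_models)
```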


3 RESULTS

3.1 Dataset

We perform our experiments on a collection of 304 3D movies of GFP-tagged proteins in NIH 3T3 cells across 12 different cell lines, with a different protein labeled in each cell line. The lines were generated by CD tagging (Jarvik et al., 2002), and the microscope and image acquisition parameters are as described in Hu et al. (2006, 2010). At each time point of a movie, we have a single-channel stack of 15 z-slices of 1280×1024 pixels each. The resolution is 0.11 μm per pixel in x and y, and 0.5 μm between z-slices. There is a 45 s interval between frames.

The proteins included in this study localize to six major subcellular structures. Table 2 summarizes the tagged gene/protein for each cell line, along with the location in the cell where the protein is expressed, the number of movies for each cell line, and the number of frames in each movie. Note that we aim to distinguish proteins even when they appear in the same subcellular compartment. This is possible because such proteins still exhibit different dynamic behavior, and may also have a different spatial distribution within the compartment. For example, histones, RNA polymerases and nucleoporins are three nuclear proteins with very different distributions, involved respectively in the compaction of DNA into chromatin, the transcription of DNA (with RNA polymerase I concentrated in nucleoli) and the nuclear pore complex.

Table 2.
Overview of the experimental dataset composed of 12 cell lines

3.2 Classification

We tested our classification method using leave-one-out cross-validation (i.e. when classifying a test cell, we built the class models from all cells except this test cell), and achieved 84.2% accuracy on our dataset. A previous result on this same 3T3 dataset used only the middle z-slice at each time point and, with a range of morphological, texture and temporal features, achieved an accuracy of 80.9% (Hu et al., 2010). By improving on this accuracy, we verify that our models capture relevant discriminative information present in the movies.

If we consider only the static component of our model, the proportion of objects of each type (mλ), we obtain 70.4% accuracy. Adding the dynamic components (mλ,λ′, mλ,Ø) completes the model, giving our final result of 84.2%.

3.3 Cell models: when to stop acquiring frames for each cell

To test our intelligent stopping algorithms, we take each movie in our dataset and determine how many frames would have been acquired for a given stopping algorithm and threshold. We then measure the average number of frames acquired and the average classification accuracy. For example, in the extrinsic scenario with α=0.7, we acquire 3–35 frames per cell, with an average of 8.0 frames, yielding 80.3% classification accuracy. We compare this to the standard method that acquires exactly eight frames for each cell, yielding only 75.0% classification accuracy. To reach 80% classification accuracy using the standard method, we would need to acquire over 14 frames per cell, almost twice as many as with our intelligent method.

In Figure 2, we compare our intelligent methods for the extrinsic and intrinsic scenarios with the standard method, using a range of stopping thresholds to get the different points on the curves. The intelligent algorithms achieve significantly higher accuracy for the same average number of frames acquired, with the best results for the extrinsic scenario.

Fig. 2.
When to stop acquiring (cell models). The average classification accuracy is shown as a function of the average number of frames acquired with one of three methods to choose when to stop acquiring: the intelligent algorithm for the extrinsic scenario (solid line), the intelligent algorithm for the intrinsic scenario, and the standard fixed-frame method.

3.4 Class models: when to stop acquiring frames for each cell

To test these intelligent acquisition algorithms, we set aside 10 movies as our testing set. We take the remaining 294 training movies and determine which frames would have been acquired for each intelligent stopping method and threshold. We then use these to build a class model for each of the 12 classes, and attempt to classify the testing set. We use every frame for movies in the testing set. We ran 10 000 trials, randomizing the testing and training sets, including their order, in each trial. In Figure 3, we show the resulting classification accuracy against the average number of frames acquired per cell with (i) the intelligent acquisition algorithm for the extrinsic scenario, varying the classification confidence threshold to get the different points on the curve, (ii) the intelligent acquisition algorithm for the intrinsic scenario, varying the likelihood uncertainty threshold to get the different points on the curve and (iii) a standard acquisition algorithm that acquires a fixed number of frames per cell. Once again, we see that the intelligent algorithms achieve significantly higher accuracy than the standard method for the same average number of frames acquired; the results for the extrinsic scenario are best. In this scenario, the intelligent algorithm requires only 10 frames per cell to reach a classification accuracy of 80%, whereas the standard algorithm requires 18 frames. We expect that the difference between the two intelligent methods will increase as the number of cells per class increases.

Fig. 3.
When to stop acquiring (class models). The average classification accuracy is shown as a function of the average number of frames acquired with one of three methods to choose when to stop acquiring: the intelligent algorithm for the extrinsic scenario (solid line), the intelligent algorithm for the intrinsic scenario, and the standard fixed-frame method.

3.5 Class models: how many cells to acquire

To test our methods for how many cells to acquire, we randomly choose one movie (cell) per class to serve as the testing set. From the remaining movies, we randomly choose 10 movies per class as the training set. We choose a set of r classes to revisit and add five more movies from each of these classes to the training set. Finally, we classify the testing set, and record the accuracy.

We vary the number of classes to revisit, r, from 0 (none) to 12 (all). We test three methods for choosing which r classes to revisit: those of high-global marginal utility, those of high-local marginal utility and randomly chosen. The results are averaged over 100 000 trials and shown in Figure 4a, which displays classification accuracy as a function of r. We can see that intelligently choosing which classes to revisit gives higher classification accuracy, with the best results for the global scenario.

Fig. 4.
How many cells to acquire. Initially, we acquire 10 movies from each class. We then acquire five more movies (a) or 10 more movies (b) from a selected set of classes. The acquired movies are used to train a classifier that is then evaluated on a held-out testing set of one movie per class.

Figure 4b repeats the above experiment, but acquires 10 additional movies from the revisited classes (instead of five). Because we have fewer than 20 movies available for some classes, we do this for only six classes. The results are similar to those of Figure 4a, but the increase in accuracy with intelligent acquisition is greater.

3.6 Computation time

The time taken to build a model is about 1–3 s per frame (Intel Core Duo 2.2 GHz processor, 1.96 GB of memory), depending on the number of objects; the time taken for each of the intelligent acquisition algorithms is negligible. This is fast in comparison with the acquisition time of 45 s per frame. Furthermore, for the algorithms that determine when to stop acquiring frames, we can begin acquiring the subsequent frame even while we are evaluating whether to stop; thus, this computational penalty applies only to the single frame at which we choose to stop acquiring. For the algorithm that determines how many cells to acquire, the added computation time for building the model of a cell is negligible in comparison with the time it takes to acquire that cell.


4 DISCUSSION

We have demonstrated that intelligently choosing when to stop acquiring frames and cells leads to increased accuracy for a given amount of acquired data, or equivalently, reduced acquisition time and resources for a given accuracy. The intelligent acquisition algorithms described here are not closely tied to the model building procedure used, and thus have broad applicability in other modeling scenarios.

In addition, we have presented a model-building technique based solely on the locations and types of objects present within a cell, and shown that the resulting models classify with higher accuracy than previous results obtained using all of the image data.

Automated microscopy is increasingly used both for basic research in cell and systems biology and for drug screening and development (Taylor et al., 2006). Approaches such as those we have described here can be directly incorporated into automated microscopes, such as high-content screening systems, which are typically designed to include decision making during acquisition. Alternatively, they can also be added relatively easily to conventional microscopes. In this case, the main challenge is to create a control loop between the model-building software (MBS) and the microscope control software (MCS), in particular to give the MBS the ability to control rudimentary aspects of microscope acquisition. The two critical requirements are for the MBS to be able to retrieve each cell image after it is acquired, and for the MBS to be able to either initiate or stop acquisition of the next frame (assuming that continued acquisition of a large number of frames is the default). These are surprisingly difficult to achieve with the MCS of commercial microscopes as typically configured, because some microscopes wait until all (or a certain number) of the frames have been acquired before writing them to disk, and because the MCS often cannot itself be controlled other than via a graphical user interface.

At least three solutions present themselves. The first is to configure the microscope with the optional 'macro' languages provided by the manufacturer, which can incorporate external software into the control loop. The second is to use third-party MCS, such as the open-source Micro-Manager (Edelstein et al., 2010), which gives nearly complete microscope control. This is an excellent solution, with the main disadvantages being that manufacturer support may be lacking in the case of hardware problems and that the performance (latency, acquisition speed) may not match that of software optimized for the manufacturer's hardware. The last is to use software that simulates interaction with the graphical user interface to perform basic control. This solution has the lowest cost and the least impact on the existing system, but the external control software may need to be extensively modified to work with new versions of the manufacturer software.
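As a schematic of such a control loop, the sketch below shows the MBS side; the `scope` interface is entirely hypothetical and stands in for whatever the vendor macro language or third-party control software actually exposes.

```python
def acquisition_loop(scope, update_model, should_stop, max_frames=50):
    """Schematic MBS-side control loop. `scope` must provide
    acquire_frame() and stop(); continued acquisition up to max_frames
    is the default, and the model builder decides when to stop early."""
    model = None
    for _ in range(max_frames):
        frame = scope.acquire_frame()       # retrieve each frame as acquired
        model = update_model(model, frame)  # refine the cell model incrementally
        if should_stop(model):              # e.g. confidence exceeds alpha
            scope.stop()                    # halt acquisition of further frames
            break
    return model
```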

Once the control loop is established, implementation of the approaches described here is straightforward. As discussed above, the time required for computing models is small relative to acquisition time. The one potential exception is the time required for defining the object types, which is not included in the model-building time (the object types are assumed to be constant during the model building). To learn object types we currently use k-means clustering for many different values of k, and the time required can be many minutes. In the extrinsic scenario, this is not a problem since classes do not change. In the intrinsic scenario, the object types may need to be relearned when a new class is observed. Possible solutions include reducing the range of k over which the search is done, using more highly optimized clustering code, and/or using ‘online’ or incremental clustering approaches.

For the future, we plan to extend the methods described here to allow models to be built from image series collected with spatial and temporal resolution that may vary under computer control.


ACKNOWLEDGEMENTS

We thank Yanhua Hu, Jesus Carmona and Theodore Scott Nowicki for acquiring the images used in this study.

Funding: National Science Foundation (grant EF-0331657, in part); National Institutes of Health (grants GM075205 and U54 RR022241, in part); the PA State Tobacco Settlement, Kamlet-Smith Bioinformatics Grant (in part).

Conflict of Interest: none declared.


REFERENCES

  • Bishop C.M. Probability distributions. In: Jordan M., et al., editors. Pattern Recognition and Machine Learning. Information Science and Statistics. 1st edn. New York, NY, USA: Springer; 2006. pp. 76–78.
  • Edelstein A., et al. Computer control of microscopes using μManager. Curr. Protoc. Mol. Biol. 2010; Chapter 14: Unit 14.20.
  • Gonzalez R.C., Woods R.E. Digital Image Processing. 3rd edn. Upper Saddle River, NJ, USA: Prentice Hall; 2008. pp. 752–763.
  • Hu Y., et al. Application of temporal texture features to automated analysis of protein subcellular locations in time series fluorescence microscope images. In: Proceedings of the 2006 IEEE International Symposium on Biomedical Imaging. Arlington, VA, USA: IEEE; 2006. pp. 1028–1031.
  • Hu Y., et al. Automated analysis of protein subcellular locations in time series images. Bioinformatics. 2010;26:1630–1636.
  • Jackson C., et al. Intelligent acquisition and learning of fluorescence microscope data models. IEEE Trans. Image Process. 2009;18:2071–2084.
  • Jarvik J.W., et al. In vivo functional proteomics: mammalian genome annotation using CD-tagging. BioTechniques. 2002;33:852–867.
  • Murphy R.F. Location proteomics: a systems approach to subcellular location. Biochem. Soc. Trans. 2005;33:535–538.
  • Ng A.Y., Jordan M.I. On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. Adv. Neural Inform. Process. Syst. 2002;2:841–848.
  • Taylor D.L., et al. High Content Screening: A Powerful Approach to Systems Cell Biology and Drug Discovery. New York, NY, USA: Springer; 2006.
  • Velliste M. Image interpretation methods for a systematics of protein subcellular location. PhD thesis. Pittsburgh, PA, USA: Carnegie Mellon University; 2002.
  • Zhang H. The optimality of naïve Bayes. In: Proceedings of the Seventeenth Florida Artificial Intelligence Research Society Conference. AAAI Press; 2004. pp. 562–567.
  • Zhao T., et al. Object type recognition for automated analysis of protein subcellular location. IEEE Trans. Image Process. 2005;14:1351–1359.
