Understanding the many complex cellular and subcellular processes underlying biological phenomena will require approaches for obtaining spatiotemporal information for the tens of thousands of proteins expressed in a typical cell. These measurements (in many cases in the form of statistical estimates) can then be used in modeling and simulation efforts whose goal is to predict and help explain the behavior of cellular systems. One example, in microtubule organization, is the simulation of microtubules with motor proteins in order to understand their phenotypic behavior (1). Another, in cytoskeleton organization, is the simulation of actin to understand the lamellipodial behavior of a cell (2).
Among the many relevant cellular phenomena to be modeled and quantified, the subcellular spatial distribution of proteins (their location and overall organization) is important because location plays a crucial role in many cellular processes. Many neurodegenerative diseases, such as Alzheimer’s and Parkinson’s, are related to the malfunction of microtubule-associated proteins and the microtubule network, which leads to the accumulation of protein aggregates in brain cells (3). Technologies that facilitate the quantitative analysis of protein location on a proteome-wide scale would therefore have potentially high impact.
Many approaches have been described for obtaining subcellular location data for large numbers of proteins (4). Green fluorescent protein (GFP) tagging has emerged as the most widely used tool for this purpose and has enabled proteome-scale studies (see (9) for a prominent example using GFP fusions in yeast). A notable exception is the work of the Human Protein Atlas project (10), which uses antibody-based methods and has generated millions of images for over six thousand antisera against various proteins. Although it is possible to interpret the information content in such collections visually, automated approaches can play an important role in extracting more detailed quantitative information from them (12).
Potential frameworks for characterizing protein location patterns from such image data include descriptive techniques and generative models. In short, descriptive techniques seek to describe the content of images using numerical feature vectors, one vector per cell or image. These techniques enable automated determination of subcellular location using supervised learning approaches (see (13) for an example) but, in the absence of an associated modeling technique, they cannot provide quantitative physical information about the protein distributions. Generative models, on the other hand, generalize from examples by learning a description of the underlying process believed to give rise to the image (14). We have previously described a framework for learning generative models of multiple subcellular location patterns from cells (15). Cell membrane, nuclear, and protein object models were constructed so that simulated images representing seven different subcellular location patterns could be generated. One way to fully understand the location patterns of individual proteins in a given cell type is thus to summarize this information in a model that accurately represents the statistical variation contained in a set of fluorescence microscopy images. In this work, we sought to demonstrate that physically meaningful parameters describing the process by which protein distributions are generated can also be learned from these images. We also sought to extend our previous modeling framework, which represented protein distributions as collections of distinct objects (15), to protein distributions, such as microtubule networks, that cannot easily be represented as objects.
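The distinction between descriptive and generative approaches can be illustrated with a minimal Python sketch. It is not the framework of (13) or (15): the three features, the choice of a normal distribution over the number of objects per cell, and all function names are assumptions made purely for illustration.

```python
import random
import statistics


def describe(image):
    """Descriptive approach: reduce one image to a fixed-length feature vector.

    The three features used here are hypothetical examples.
    """
    pixels = [value for row in image for value in row]
    mean = statistics.mean(pixels)
    spread = statistics.pstdev(pixels)
    # Fraction of pixels brighter than the image mean.
    bright_fraction = sum(value > mean for value in pixels) / len(pixels)
    return [mean, spread, bright_fraction]


def fit_object_count_model(object_counts):
    """Generative approach: learn the parameters of an assumed process.

    Assumption: the number of objects per cell is normally distributed.
    """
    return {
        "mean": statistics.mean(object_counts),
        "stdev": statistics.stdev(object_counts),
    }


def sample_object_count(model, rng):
    """Draw a new, synthetic object count from the learned model."""
    return max(0, round(rng.gauss(model["mean"], model["stdev"])))


if __name__ == "__main__":
    print(describe([[0, 0], [0, 4]]))
    model = fit_object_count_model([10, 12, 14])
    print(model, sample_object_count(model, random.Random(0)))
```

A feature vector such as the one returned by `describe` can be passed to a classifier, but it cannot be used to synthesize a new image. The fitted model, by contrast, has interpretable parameters and can be sampled to produce new instances.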
Several direct methods for estimating microtubule parameters by tracing have been described in the literature. For these, however, either the imaging approach is not suitable for intact cells or the image resolution is not sufficient to discern individual microtubules throughout the entire extent of the cell (16). This can be seen in the example image from the 3D HeLa dataset, in which the high density of microtubules near the centrosomal region makes it impossible to extract individual tracks either visually or computationally. Even in regions where individual tracks can be discerned (often near the boundary of the cell), tracing algorithms are invariably hindered by “crossing” tracks. One solution is to use specialized microscopy methods that greatly enhance the estimation of filament-like structures: Fluorescence Speckle Microscopy (21), Fluorescence Correlation Spectroscopy (22), and Stimulated Emission Depletion microscopy (23). However, these methods are not easy to apply on a proteome scale. Indirect approaches, on the other hand, are better suited to filament structures because the structures themselves do not have to be matched exactly; instead, the pattern they form in an image is matched. A compelling example of such an approach was used to validate models of the mitotic spindle (24). In that study, however, only a limited set of simple image features, such as mean fluorescence intensity, was used to compare patterns in the images. Another excellent example of an indirect method is the analysis of the structure and dynamics of the actin filament network in the lamellipodia of a migrating cell (25). However, images in that work were cropped to a representative region of the lamellipodia, which would not be expected to yield accurate parameter estimates for the entire cell. In addition, the method of comparison used only a distribution of correlation lengths from images, which may not be adequate to fully quantify complex image patterns that result from overlapping filament structures.
Example image from the 3D HeLa dataset. (A) shows the sum X-Y projection of the image, (B) shows a slice along the X-Z plane, and (C) shows a slice along the X-Y plane. The scale bar is 10 μm.
OVERVIEW OF OUR CONTRIBUTION
The principle behind the system we describe is that very basic a priori knowledge can be used to formulate models of proteins from which artificial images are generated (according to initial estimates of the model parameters). The model parameters are then iteratively modified until a specified similarity measure between the real input images and the artificial ones is maximized. The critical steps in this procedure include microtubule pattern generation, image simulation, and comparison with a real microscopy image. These steps are assembled into an optimization procedure to be detailed below. We have obtained preliminary results with both simulated and real data showing that extraction of such parameters for microtubule distributions is feasible.
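The generate, simulate, and compare loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not the procedure detailed later in the paper: the 2D straight-filament generator, the radial intensity profile used as the feature vector, the negative squared distance used as the similarity measure, and the exhaustive search over a candidate list (standing in for iterative parameter modification) are all hypothetical choices made for illustration.

```python
import math
import random

SIZE = 41  # hypothetical image width and height in pixels


def simulate_image(n_filaments, mean_length, seed):
    """Toy pattern generation and image simulation: straight filaments
    radiating from the image centre with exponentially distributed lengths."""
    rng = random.Random(seed)
    image = [[0.0] * SIZE for _ in range(SIZE)]
    centre = SIZE // 2
    for _ in range(n_filaments):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        length = rng.expovariate(1.0 / mean_length)
        for step in range(int(length) + 1):
            x = int(round(centre + step * math.cos(angle)))
            y = int(round(centre + step * math.sin(angle)))
            if 0 <= x < SIZE and 0 <= y < SIZE:
                image[y][x] += 1.0
    return image


def radial_profile(image, n_bins=10):
    """Feature vector: total intensity in concentric rings around the centre."""
    centre = SIZE // 2
    max_radius = centre * math.sqrt(2.0)
    profile = [0.0] * n_bins
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            radius = math.hypot(x - centre, y - centre)
            index = min(int(n_bins * radius / max_radius), n_bins - 1)
            profile[index] += value
    return profile


def similarity(features_a, features_b):
    """Negative squared Euclidean distance, so larger means more similar."""
    return -sum((a - b) ** 2 for a, b in zip(features_a, features_b))


def mean_features(n_filaments, mean_length, seeds):
    """Average the feature vector over several simulated images."""
    profiles = [
        radial_profile(simulate_image(n_filaments, mean_length, seed))
        for seed in seeds
    ]
    return [sum(column) / len(column) for column in zip(*profiles)]


def estimate_parameters(real_image, candidates, seeds=range(5)):
    """Return the candidate (n_filaments, mean_length) pair whose simulated
    images are most similar to the real image."""
    target = radial_profile(real_image)
    return max(
        candidates,
        key=lambda p: similarity(mean_features(p[0], p[1], seeds), target),
    )


if __name__ == "__main__":
    # A simulated image stands in for a real one, so the true values are known.
    observed = simulate_image(20, 8.0, seed=123)
    grid = [(n, length) for n in (5, 20, 50) for length in (4.0, 8.0, 16.0)]
    print(estimate_parameters(observed, grid))
```

In this sketch, a simulated image with known parameters plays the role of the real image, which mirrors the use of simulated data to check whether parameters can be recovered. In practice, the exhaustive search would be replaced by an iterative optimizer and the single feature vector by a richer set of image features.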