Proteins function in different cellular or subcellular compartments as part of complex systems. In systems biology, investigating and modeling these complex systems from different aspects and at various levels is hoped to lead to a mechanistic understanding of cell behavior [
1,
2]. Extensive efforts have been made towards proteome-scale determination of protein sequence, structure, abundance and interactions and tremendous progress has been achieved. Much less information is available about protein location within cells, with descriptions using words (such as GO terms) being the main approach used to represent this important concept. More detailed and comprehensive approaches to learning and describing the spatial distributions of proteins at different levels of accuracy will be critical for systems models.
Development of modern microscopy technology makes observation of protein localization possible both in vitro and in vivo with high throughput. However, traditional visual analysis to recognize protein localizations can be a key barrier for converting large sets of images to useful descriptions of protein locations. To overcome this difficulty, machine learning methods and digital image processing tools have been combined to develop systems that automatically recognize protein subcellular location patterns [
3]. Here “pattern” designates the subcellular distribution of a protein, or of a set of proteins whose distributions are statistically indistinguishable. The most critical component of these systems is sets of numerical features to describe protein subcellular location patterns in 2D or 3D images. With these features, the feasibility is capable of classifying major protein subcellular location patterns with high accuracy and efficiency compared to visual analysis has been demonstrated [
4,
5].
However, recognition of location patterns provides only limited information. For example, describing a protein location as “nucleus” in a given cell type under a given condition provides no detail on how it is distributed within the nucleus (and of course no information on the size or shape of nuclei in that cell type). Similarly, recognition based approaches can describe a protein’s “relocation from organelle A to B“ but communicates no information about how this process happens spatially and geometrically. Thus, beyond simply recognizing subcellular location patterns, an important goal is to be able to build models to capture the essence and variation of a specific pattern.
Zhao and Murphy [
6] describe the first system for constructing
generative models of subcellular patterns in 2D images, providing a framework in which cell structure and subcellular location patterns can be represented and communicated. In this work, images are viewed as the manifestation of a set of random variables and image synthesis or generation is viewed as a stepwise random process. A statistical generative model is the combination of all distributions of these random variables. Building a generative model for images in the form of a joint distribution of all pixels in an image is too computationally expensive (and potentially underdetermined) to be practical. Therefore, methods of computational geometry and data analysis were explored in order to compromise between complexity and accuracy of the model. 2D fluorescence images of cells were represented by three major components: nucleus, cell membrane, and protein objects distributed inside these compartments. All three components were represented by small sets of parameters (much fewer than the number of image pixels) from which the key features related to protein locations in the original image can be reconstructed with reasonable accuracy. The three components were modeled conditionally on each other, e.g., the output of the model of organelle position takes as inputs the instances drawn from models of cell and nuclear shape.
While this initial approach is useful for the vast majority of fluorescence microscope images that are acquired in only 2D, they represent a significant simplification of actual cell organization in 3D. Therefore, in this paper we describe extending the generative modeling and simulation framework to 3D.