Location proteomics has grown as a branch of proteomics over the last 10 years, and automated algorithms have been developed to analyze the resulting large amounts of data systematically. This article presents our effort to extend the analysis of static patterns with an additional dimension: the temporal domain. Time series microscopy images of 12 proteins were collected, 5 sets of 2D temporal features and 3 sets of 3D temporal features were implemented, and classification was performed to validate their usefulness. The best 2D temporal feature sets, in order of their ability to improve classification accuracy, were normal flow, temporal texture and Fourier transform features. Combining 2D static features with 3 sets of 2D temporal features gave the best accuracy of 78%, compared with 66% for static features alone. Accuracy using 3D static and/or temporal features was lower than for 2D features.
If limited acquisition time forces a choice between collecting 3D static images and 2D time series images, our results suggest that 2D time series images are more likely to discriminate between location patterns. Although not all of the 2D or 3D temporal feature sets improved classification accuracy for our dataset, we present them here because they may be useful for future datasets.
While each protein in a proteome is unique, its location pattern might not be. Thus, while an increase in the accuracy of distinguishing the 12 proteins is an indication of feature value, the ability to distinguish them all perfectly is not expected. If proteins interact or colocalize with each other, they cannot be differentiated by either static or temporal patterns. In our dataset, alpha-actinin-4 (actn4) and caldesmon 1 (cald1) both bind to actin, and over 30% of caldesmon 1 is misclassified as alpha-actinin-4. These two proteins always fall in the same cluster in cluster analysis (data not shown). Similarly, ADP-ATP translocase 23 (timm23) and catalase (cat), which have both been described as mitochondrial proteins, are difficult to distinguish (50% of catalase is misclassified as ADP-ATP translocase 23). The results suggest that there are only 10 distinguishable patterns in our 12 protein set, which agrees with prior knowledge about these proteins.
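The effect of treating indistinguishable proteins as one pattern can be illustrated with a small confusion-matrix calculation. The following is a minimal sketch, not the analysis used in this study: the function names are hypothetical, and the counts are invented for illustration, chosen only so that 30% of cald1 is assigned to actn4 and 50% of cat is assigned to timm23, as reported above.

```python
def accuracy(confusion):
    """Overall accuracy: correctly classified count over total count."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def merge_classes(confusion, labels, groups):
    """Collapse each group of labels into one class and sum the counts."""
    # Map every original label to its merged name (itself if not grouped).
    merged_name = {lab: lab for lab in labels}
    for group in groups:
        name = "+".join(group)
        for lab in group:
            merged_name[lab] = name

    # Keep merged labels in order of first appearance.
    new_labels = []
    for lab in labels:
        if merged_name[lab] not in new_labels:
            new_labels.append(merged_name[lab])
    index = {name: i for i, name in enumerate(new_labels)}

    n = len(new_labels)
    merged = [[0] * n for _ in range(n)]
    for i, true_lab in enumerate(labels):
        for j, pred_lab in enumerate(labels):
            row = index[merged_name[true_lab]]
            col = index[merged_name[pred_lab]]
            merged[row][col] += confusion[i][j]
    return merged, new_labels


# Hypothetical counts (rows = true class, columns = predicted class).
labels = ["actn4", "cald1", "timm23", "cat"]
confusion = [
    [75, 20, 5, 0],
    [30, 65, 0, 5],
    [5, 0, 55, 40],
    [0, 5, 50, 45],
]

merged, merged_labels = merge_classes(
    confusion, labels, [["actn4", "cald1"], ["timm23", "cat"]]
)
print(accuracy(confusion))  # accuracy over the 4 original classes
print(accuracy(merged))     # accuracy over the 2 merged patterns
```

With these invented counts, most errors fall inside the two merged pairs, so accuracy over the merged patterns is much higher than accuracy over the individual proteins.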
For each classification, we compared accuracy with and without SDA feature selection, because SVMs are known to be highly robust to large numbers of correlated features. Our results show that when the number of features is large (340 normal flow and 189 autoregression features), SDA outperforms no feature selection. When the number of features is small (21 static and 130 temporal texture features), the SVM does well without SDA feature selection. Many different feature selection algorithms and classifiers could be tried in order to achieve higher classification accuracy, but such an analysis is beyond the scope of this study.
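The with/without comparison can be outlined in code. The sketch below is illustrative only and deliberately swaps in simpler techniques so that it runs with the Python standard library: a univariate Fisher-ratio filter stands in for SDA, a nearest-centroid classifier stands in for the SVM, and the data are synthetic. All function names and parameters are hypothetical. The point it shows is the protocol: features are selected on the training folds only, and cross-validated accuracy is computed once with selection and once without.

```python
import random


def fisher_scores(X, y):
    """Score each feature by between-class over within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for f in range(len(X[0])):
        values = [row[f] for row in X]
        grand_mean = sum(values) / len(values)
        between = within = 0.0
        for c in classes:
            vals = [row[f] for row, lab in zip(X, y) if lab == c]
            mean_c = sum(vals) / len(vals)
            between += len(vals) * (mean_c - grand_mean) ** 2
            within += sum((v - mean_c) ** 2 for v in vals)
        scores.append(between / (within + 1e-12))
    return scores


def select_top_k(X, y, k):
    """Indices of the k highest-scoring features (stand-in for SDA)."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    return ranked[:k]


def nearest_centroid_predict(X_train, y_train, X_test, features):
    """Assign each test row to the closest class centroid (stand-in for SVM)."""
    centroids = {}
    for c in set(y_train):
        rows = [row for row, lab in zip(X_train, y_train) if lab == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
    preds = []
    for row in X_test:
        point = [row[f] for f in features]
        preds.append(min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        ))
    return preds


def cv_accuracy(X, y, n_folds, k=None):
    """Cross-validated accuracy; if k is given, select k features per fold."""
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        test = [i for i in range(len(X)) if i % n_folds == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te, y_te = [X[i] for i in test], [y[i] for i in test]
        # Selection uses training data only, to avoid leaking test labels.
        features = list(range(len(X[0]))) if k is None else select_top_k(X_tr, y_tr, k)
        preds = nearest_centroid_predict(X_tr, y_tr, X_te, features)
        correct += sum(p == t for p, t in zip(preds, y_te))
    return correct / len(X)


# Synthetic data: 3 classes, 50 features, only features 0-2 carry signal.
rng = random.Random(0)
X, y = [], []
for i in range(60):
    label = i % 3
    row = [rng.gauss(3.0 * label if f < 3 else 0.0, 1.0) for f in range(50)]
    X.append(row)
    y.append(label)

acc_all = cv_accuracy(X, y, n_folds=5)       # no feature selection
acc_sel = cv_accuracy(X, y, n_folds=5, k=3)  # with feature selection
print(acc_all, acc_sel)
```

Whether selection helps depends on the classifier, the number of features and the data, which is why both settings were compared for each feature set in this study; this toy example is not intended to reproduce the reported pattern.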
Since temporal texture features and Fourier transform features can be calculated within 25 and 50 s per time series, respectively, they are readily applicable to many high-throughput applications. On the other hand, calculating normal flow features, object tracking features and autoregression features takes 8, 22 and 2 min per time series, respectively.
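To put these timings in throughput terms, the per-series times can be converted into series processed per hour on a single processor. This is simple arithmetic on the figures above; the screen size of 10,000 time series is a hypothetical example, and the function names are illustrative. Because the temporal texture and Fourier transform times are upper bounds, their throughput values are lower bounds.

```python
# Per-series computation times reported above, in seconds.
seconds_per_series = {
    "temporal texture": 25,
    "Fourier transform": 50,
    "autoregression": 2 * 60,
    "normal flow": 8 * 60,
    "object tracking": 22 * 60,
}


def series_per_hour(seconds):
    """Number of time series processed per hour on one processor."""
    return 3600 / seconds


def hours_for_screen(n_series, seconds):
    """Total single-processor hours for a screen of n_series time series."""
    return n_series * seconds / 3600


n_series = 10_000  # hypothetical screen size, for illustration only
for name, secs in seconds_per_series.items():
    print(f"{name}: {series_per_hour(secs):.1f} series/h, "
          f"{hours_for_screen(n_series, secs):.0f} h for {n_series} series")
```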
Given the dramatic increase in automated microscopy over the past decade, we anticipate that methods for analyzing temporal changes in protein patterns such as those we have described here will be of significant utility both for basic research in systems biology and for drug screening and development purposes.