In our previous study [4], we demonstrated an earlier version of our paraICA algorithm and showed improved performance compared to regular ICA, especially in terms of connection accuracy. In this letter, we update the algorithm and focus on how its parameters affect paraICA performance. We also apply the algorithm to new data to evaluate associations between brain function and genomic factors. We show that paraICA can give different results for the same dataset under different tolerance levels, which emphasizes the importance of controlling for both overfitting and underfitting. Based on the simulation, we empirically selected a tolerance level of −1.0e−3. The simulation also demonstrates that paraICA is robust to different connection strengths (see ). However, the number of components estimated, which is typically unknown in real data, has a large effect on paraICA performance. An overestimated component number increases overfitting, lowers the accuracy of the components, and produces a higher but false correlation. An underestimated component number also decreases performance and underestimates the connection strength, illustrated in .
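One contributing reason why a larger component number can yield a higher but false correlation is that more component pairs are searched for the strongest link. The sketch below illustrates only this selection effect and is not the paraICA algorithm: it draws two mutually independent sets of random loadings and reports the largest absolute correlation found among the first k columns of each. The subject count, the component counts, and the function name `max_abs_cross_correlation` are assumptions chosen for illustration.

```python
import numpy as np


def max_abs_cross_correlation(a, b):
    """Return the largest |Pearson r| between any column of a and any column of b."""
    # Standardize each column (population standard deviation, ddof=0).
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    # With standardized columns, a.T @ b / n is the cross-correlation matrix.
    corr = a.T @ b / a.shape[0]
    return float(np.abs(corr).max())


rng = np.random.default_rng(0)
n_subjects = 60        # hypothetical sample size
max_components = 30    # hypothetical upper bound on the component number

# Two independent random "loading" matrices: no true connection exists.
loadings_a = rng.standard_normal((n_subjects, max_components))
loadings_b = rng.standard_normal((n_subjects, max_components))

component_counts = [2, 5, 10, 20, 30]
spurious = [
    max_abs_cross_correlation(loadings_a[:, :k], loadings_b[:, :k])
    for k in component_counts
]

for k, r in zip(component_counts, spurious):
    print(f"components={k:2d}  max |r| between unrelated data = {r:.3f}")
```

Because each larger set of columns contains the smaller ones, the maximum correlation cannot decrease as k grows, even though the two datasets are unrelated. In a real analysis the overfitting introduced by the ICA estimation itself adds to this effect, which the sketch does not model.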
For the real data, the related fMRI and SNP components found by paraICA suggest an interesting relationship between brain function and its possible genetic basis. The largest portion of the brain function component lies in the precuneus, cuneus, and lingual gyrus, areas mainly involved in memory retrieval [9]. Some of these regions were previously implicated in schizophrenia and other psychiatric disorders [10]. The linked genetic component consists of ten contributing SNPs (in nine genes). Three of these genes, CHRNA7, DISC1, and CHAT, have been previously reported to be closely associated with schizophrenia [12]. The DDC gene encodes an enzyme implicated in two metabolic pathways and involved in the synthesis of important neurotransmitters: dopamine, norepinephrine, epinephrine, and serotonin. The ADRA2A gene has a critical role in regulating neurotransmitter release in the central nervous system. Both SCARB1 and GNAO1 are expressed in the brain. These results are encouraging and show the utility of our algorithm for combining fMRI and SNP data.
In summary, using an approach called parallel ICA, we built a framework that combines two high-dimensional data types in order to find hidden factors within each and the connections between them. With properly controlled constraints that avoid overfitting and underfitting, both of which can arise from several sources, the extended paraICA algorithm yields reliable results. Our algorithm provides a promising way to assess multivariate genetic influence on endophenotypes, such as brain function related to mental disorders. Given that current technology can investigate over 500,000 SNPs, the analysis of such data will provide a more comprehensive means to identify possible SNP/fMRI associations, and the proposed approach is well suited to such an analysis.