Deterministic formulas for the accuracy of genomic predictions highlight the relationships among prediction accuracy and potential factors influencing prediction accuracy prior to performing computationally intensive cross-validation. Visualizing such deterministic formulas in an interactive manner may lead to a better understanding of how genetic factors control prediction accuracy.
The software to simulate deterministic formulas for genomic prediction accuracy was implemented in R and encapsulated as a web-based Shiny application. Shiny genomic prediction accuracy simulator (ShinyGPAS) simulates various deterministic formulas and delivers dynamic scatter plots of prediction accuracy versus genetic factors impacting prediction accuracy, while requiring only mouse navigation in a web browser. ShinyGPAS is available at: https://chikudaisei.shinyapps.io/shinygpas/.
ShinyGPAS is a Shiny-based interactive genomic prediction accuracy simulator using deterministic formulas. It can be used for interactively exploring potential factors that influence prediction accuracy in genome-enabled prediction, simulating achievable prediction accuracy prior to genotyping individuals, or supporting in-class teaching. ShinyGPAS is open source software and is hosted online as a freely available web-based resource with an intuitive graphical user interface.
Prediction of breeding values from high-dimensional single nucleotide polymorphisms is a primary interest in quantitative genetics [1–3]. This is particularly true for the application of genomic selection in animal and plant breeding programs, where genetic improvement of agricultural species relies on how well a model predicts unknown breeding values, that is, on its prediction accuracy. Here prediction accuracy is defined as the correlation between true and predicted genomic values. A deterministic formula such as the one proposed by Daetwyler et al. highlights the relationship between prediction accuracy and potential factors that influence it. In general, deterministic formulas compute the expected predictive correlation (or the squared prediction accuracy, R²) from a number of factors, which makes them useful for assessing prediction accuracy before performing computationally demanding cross-validation (CV). They also allow us to decide on the optimal design for reference populations (e.g., reference population size) to achieve a desired level of accuracy in selection candidates. Both the theoretical derivation of deterministic formulas and their applications are active research areas. For instance, Brard and Ricard recently performed a comparison and meta-analysis of deterministic formulas, and Erbe et al. inferred parameters that influence prediction accuracy in deterministic formulas via maximum likelihood. Collectively, these studies have shed new light on aspects of the factors influencing predictive performance that may not be obvious from empirical genome-enabled prediction analysis based on CV.
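As a concrete illustration of what such a formula computes, the sketch below uses the widely cited form associated with Daetwyler et al., r = sqrt(N h² / (N h² + Me)), where N is the reference population size, h² the heritability and Me the number of independent chromosome segments. The function and argument names are hypothetical and chosen for illustration only; this is not the ShinyGPAS source code. The second function inverts the same expression to show how a reference population size could be chosen for a desired accuracy.

```python
import math


def expected_accuracy(n_ref, h2, m_e):
    """Expected accuracy r = sqrt(N h^2 / (N h^2 + Me)).

    n_ref: reference population size (N)
    h2:    heritability, between 0 and 1
    m_e:   number of independent chromosome segments (Me)
    Names are illustrative assumptions, not taken from ShinyGPAS.
    """
    if n_ref < 0 or m_e <= 0 or not 0.0 <= h2 <= 1.0:
        raise ValueError("need n_ref >= 0, m_e > 0 and 0 <= h2 <= 1")
    signal = n_ref * h2
    return math.sqrt(signal / (signal + m_e))


def required_reference_size(target_r, h2, m_e):
    """Solve the same formula for N: N = r^2 Me / (h^2 (1 - r^2))."""
    if not 0.0 < target_r < 1.0 or not 0.0 < h2 <= 1.0 or m_e <= 0:
        raise ValueError("need 0 < target_r < 1, 0 < h2 <= 1 and m_e > 0")
    r2 = target_r ** 2
    return r2 * m_e / (h2 * (1.0 - r2))


if __name__ == "__main__":
    print(expected_accuracy(n_ref=1000, h2=0.5, m_e=500))
    print(required_reference_size(target_r=0.8, h2=0.5, m_e=500))
```

Under this formula, accuracy rises with N and h² and falls as Me grows, which is the kind of relationship the simulators described here are meant to make visible.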
In particular, visualizing such deterministic formulas may lead to a better understanding of how genetic factors control prediction accuracy. Typically, visualization involves generating a static two-dimensional graph, where the y-axis is the genomic prediction accuracy and the x-axis is one of the factors influencing prediction accuracy, while keeping the other factors constant. Given that this type of static graph is a snapshot of a complex dynamic system, if users want to change parameters, they need to re-type and re-execute the code. To provide an overview of the whole landscape of genomic prediction simulation, we need an efficient visualization tool that is capable of generating interactive as well as dynamic graphs. The objective of this article is to describe a Shiny-based web application called Shiny genomic prediction accuracy simulator (ShinyGPAS), which produces interactive graphs and offers an intuitive graphical user interface (GUI) for simulating genomic prediction accuracy based on deterministic formulas.
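The static approach described above can be sketched as follows: one factor is swept along the x-axis while the others are held constant, and any change to the fixed values means editing and re-running the script. The formula is the Daetwyler-type expression r = sqrt(N h² / (N h² + Me)); the function names, parameter names and numerical values are assumptions made for this example.

```python
import math


def accuracy(n_ref, h2, m_e):
    """Daetwyler-type expected accuracy: sqrt(N h^2 / (N h^2 + Me))."""
    return math.sqrt(n_ref * h2 / (n_ref * h2 + m_e))


def sweep(factor, values, fixed):
    """Vary one factor over `values`, holding the others at `fixed`.

    Returns a list of (x, accuracy) pairs, i.e. the points of one
    static two-dimensional curve.
    """
    points = []
    for value in values:
        params = dict(fixed)    # copy so the fixed settings are not mutated
        params[factor] = value  # overwrite only the factor on the x-axis
        points.append((value, accuracy(**params)))
    return points


# Illustrative settings; changing any of them requires re-running the script.
fixed = {"n_ref": 1000, "h2": 0.5, "m_e": 1000}
curve = sweep("n_ref", [500, 1000, 2000, 5000], fixed)

for x, r in curve:
    print(f"N = {x:>5}  r = {r:.3f}")
```

Each run yields a single snapshot; an interactive tool replaces the edit-and-rerun cycle with input controls.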
ShinyGPAS is implemented entirely in R, which is an open source programming language and environment for performing statistical computing and data visualization. The GUI is provided by the shiny R package, a web application framework for R. ShinyGPAS is a Shiny application that leverages R and the shiny package to construct an intuitive framework for deterministic formulas using dynamic interaction and visualization. The ShinyGPAS user interface is shown in Fig. 1. Although ShinyGPAS is R-based software, it does not require users to be familiar with the programming language or to download the software to a local computer. The underlying R code is encapsulated by Shiny and offered as cohesive web-based software that can be used solely by mouse navigation in a web browser. This increases accessibility to the software, especially for users with less R programming experience. ShinyGPAS is deployed through the cloud-based shinyapps.io platform for hosting Shiny web applications (https://www.shinyapps.io/).
ShinyGPAS currently delivers eight simulators based on deterministic formulas described in (a) Daetwyler et al. [4, 9], (b) Goddard, (c) Goddard et al., (d) Rabier et al., (e) Rabier et al., (f) de los Campos et al., (g) Karaman et al. and (h) Wientjes et al. The first seven formulas predict accuracy within populations, whereas the last one is designed for multi-population scenarios, including multi-environment and multi-trait settings. Deterministic formulas are functions derived from combinations of the number of individuals in a reference set, the number of independent chromosome segments underlying the trait, the effective population size, the number of markers, the proportion of genetic variance explained by the molecular markers, and heritability. The Shiny-based interactive application implements these formulas dynamically, allowing users to evaluate the simultaneous impact of all the parameters described above on prediction accuracy. A user can click a link located within each deterministic formula simulator to access the original journal article. Below are the deterministic formulas currently implemented in ShinyGPAS.
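The implemented equations themselves are not reproduced in this text. As a hedged illustration of how the listed quantities can combine, the sketch below derives the number of independent chromosome segments from the effective population size using one commonly cited approximation, usually attributed to Goddard, Me = 2 Ne L / ln(4 Ne L), and feeds it into the Daetwyler-type accuracy expression. The genome length L (in Morgans) is an extra input assumed for this example, the function names are hypothetical, and whether ShinyGPAS uses exactly these forms is not confirmed here.

```python
import math


def independent_segments(n_e, genome_length_morgans):
    """Approximate Me as 2 Ne L / ln(4 Ne L).

    n_e:                   effective population size (Ne)
    genome_length_morgans: genome length L in Morgans (assumed input)
    """
    product = n_e * genome_length_morgans
    if 4.0 * product <= 1.0:
        raise ValueError("4 * Ne * L must exceed 1 for a positive Me")
    return 2.0 * product / math.log(4.0 * product)


def expected_accuracy(n_ref, h2, m_e):
    """Daetwyler-type expected accuracy: sqrt(N h^2 / (N h^2 + Me))."""
    return math.sqrt(n_ref * h2 / (n_ref * h2 + m_e))


# Illustrative values only: Ne = 100, L = 30 Morgans, N = 2000, h2 = 0.3.
m_e = independent_segments(n_e=100, genome_length_morgans=30)
r = expected_accuracy(n_ref=2000, h2=0.3, m_e=m_e)
print(f"Me = {m_e:.1f}, expected accuracy = {r:.3f}")
```

Chaining the two functions shows how a single parameter such as Ne can influence accuracy indirectly through Me.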
Prediction of genomic values is a challenging task and there is no universally best deterministic formula that accounts for all potential factors. Therefore, we will continue adding newly developed deterministic formulas to ShinyGPAS.
A typical workflow starts by selecting one of the tab panels at the top (Fig. 1) and then moving to a preferred deterministic formula simulator. Each deterministic formula captures a different aspect of the genotype–phenotype map in the context of genomic prediction accuracy. Thus, navigating through the interactively visualized deterministic formulas may highlight the common patterns as well as the differences among them. A suite of available parameters such as h², N, Me, Ne, M, and b is located in the sidebar panel. Shiny sliders provide possible input values that can be chosen from pre-defined ranges, and users can pick a preferred value by simple mouse navigation. A radio button located at the top offers the factors influencing prediction accuracy that can be used as the x-axis. Shiny reactive expressions are used in ShinyGPAS to cache results efficiently and ease the computational burden, ensuring fast processing during an interactive session.
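The caching idea behind reactive expressions can be illustrated with a plain Python analogy. This is not Shiny's mechanism: it uses `functools.lru_cache` from the Python standard library to show that a curve is recomputed only when its inputs change. The function name, the call counter and the parameter values are assumptions made for this example.

```python
import math
from functools import lru_cache

call_count = 0  # counts how often the curve is actually recomputed


@lru_cache(maxsize=128)
def accuracy_curve(h2, m_e, n_values):
    """Daetwyler-type accuracy for each N in `n_values` (a tuple).

    Results are cached by argument values, so repeating a request with
    unchanged inputs returns the stored result without recomputation.
    """
    global call_count
    call_count += 1
    return tuple(math.sqrt(n * h2 / (n * h2 + m_e)) for n in n_values)


n_grid = tuple(range(100, 5001, 100))       # x-axis values for N

first = accuracy_curve(0.5, 1000, n_grid)   # computed
second = accuracy_curve(0.5, 1000, n_grid)  # same inputs: served from cache
third = accuracy_curve(0.3, 1000, n_grid)   # h2 changed: recomputed

print(f"recomputations: {call_count}")
```

In an interactive session the same principle means that moving one slider only triggers the computations that depend on it.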
A Shiny application has great potential to deliver interactive data analysis and visualization in a web browser. Yet there is limited application of this type of tool in breeding and quantitative genetics. The Shiny framework allows users to convert deterministic formulas of genomic prediction accuracy into interactive graphics in an engaging and straightforward manner. ShinyGPAS can be used for interactive exploration of potential factors that influence prediction accuracy in genome-enabled prediction, simulation of achievable prediction accuracy prior to genotyping individuals, or supporting in-class teaching. The ShinyGPAS source code has been made publicly available on GitHub: https://github.com/morota/ShinyGPAS.
The author declares that he has no competing interests.
GM acknowledges funding from the University of Nebraska startup funds and the National Science Foundation under Grant No. 1736192.