This study provides an unbiased evaluation of potential biomarkers in a well-characterized SMA cohort, carefully controlled to avoid confounding by age and functional level. Preliminary analyses identified a rich set of more than 400 analytes that were significantly associated, by regression, with one or more clinical outcome measures in the BforSMA study; 200 of these were associated with the primary outcome measure, the MHFMS. These 200 candidate markers comprised 97 plasma proteins, 59 plasma metabolites and 44 urine metabolites.
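The associations above are reported as significant, and the strongest hits are later ranked by Q-value, but this section does not say how the multiple-testing correction was performed. As a hedged illustration only, the sketch below applies the Benjamini-Hochberg false discovery rate procedure, one common way to convert per-analyte p-values into adjusted values (often loosely called q-values), to a few hypothetical analytes. The function names, the 0.05 threshold and the example p-values are assumptions for illustration and are not the BforSMA pipeline; Storey's q-value method is a distinct alternative that the study may have used instead.

```python
from typing import Dict, List


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def significant_analytes(pvalues: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Names of analytes whose BH-adjusted p-value is below alpha, best first."""
    names = list(pvalues)
    q = bh_qvalues([pvalues[n] for n in names])
    hits = [(qv, n) for qv, n in zip(q, names) if qv < alpha]
    return [n for _, n in sorted(hits)]


# Hypothetical analyte names and p-values, for illustration only.
example = {"analyte_A": 0.001, "analyte_B": 0.04, "analyte_C": 0.03, "analyte_D": 0.2}
print(significant_analytes(example))  # prints ['analyte_A']
```

Note that analyte_B and analyte_C have raw p-values below 0.05 but do not survive adjustment; this is the kind of filtering that keeps an unbiased screen of hundreds of analytes from being dominated by false positives.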
A key feature of this study is the combination of measures of historical (SMA type) and current function, in a cohort in which recruitment was targeted specifically to minimize the correlation between the two. SMA is unique among monogenic disorders in that its phenotypic severity is modified by a smaller quantity of identical SMN protein produced from the highly homologous SMN2 gene, which is present in all individuals with SMA. In monogenic neurodegenerative disorders, the level of impairment can be modeled as a function of the residual activity of the mutated gene and of other (genetic and environmental) factors that modify its impact. An analyte that is more strongly associated with the MHFMS than with SMA type may therefore relate to factors other than SMN that influence phenotypic expression, or to more downstream consequences of the debility reflected in SMA type or MHFMS.
As there were no a priori assumptions about which class of analyte (transcript, protein or metabolite) would be most likely to yield biomarkers associated with the MHFMS, a broad set of analytes was selected to maximize coverage across classes while minimizing experimental error. This approach has been used previously for biomarker discovery in models of atherosclerosis and hepatotoxicity.
Cohort Recruitment in SMA Clinical Studies
A number of lessons were learned in the clinical phase of this project. Rapid enrollment was aided by a competitive process in which weekly reports to participating sites encouraged, and later constrained, recruitment to the patient groups still needed by age, SMA type and gender. Recruitment was further aided by the support of patient advocacy groups and by the International Spinal Muscular Atrophy Patient Registry at Indiana University, which helped identify subjects interested in participating in the study. One important barrier to enrollment, particularly for the weaker individuals with SMA Type I who were the most difficult to recruit, was the exclusion of patients taking prescription drugs. In a few subjects, blood or urine samples could not be obtained; in each case this was due to difficult venous access or the limited cooperation possible from children. As expected, recruitment was slower in the winter months and around holidays. We cannot exclude the possibility that the inclusion and exclusion criteria introduced biases relative to the SMA population as a whole. Because only children aged 2 to 12 years were included, for reasons described previously, the more typical SMA Type I infants with severe weakness were largely excluded, as many would not still be alive or would not meet the inclusion criteria. Thus, the Type I population in this study is potentially biased toward stronger and more stable subjects.
The study group identified several issues associated with sampling procedures, including an apparent lack of appropriate sample collection and storage materials for pediatric subjects, a lack of reference standards for pediatric subjects, and a paucity of data on sample handling and analytics for biomarker and pediatric studies. This is an untapped area of research that is of critical importance given the increasing attention to therapeutic programs for children. For this study population, the only systematic issue identified was that samples were more often difficult to obtain from the smaller children. In future studies of this type, using experts in venipuncture for small children may improve the rate of successful sample procurement.
Issues Arising from Limited Range of Functional Measures
One limitation of this study is that the primary outcome measure of motor function used for correlation with analyte values is a scale that may not differentiate among the lowest-functioning subjects (“floor effect”) or among the highest-functioning subjects (“ceiling effect”). The MHFMS is designed to assess function in children with Type II SMA. As a consequence, it cannot capture the range of differences in motor function among those with SMA who are unable to sit (Type I infants, and those with SMA who once sat but have since lost this ability) or who are able to stand and walk (i.e., those with Type III SMA who retain this defining motor ability). Our strategy was therefore to use the MHFMS as the primary outcome measure for regression to analyte values, but also to evaluate analyte regressions to other measures of function (… and Table S1) that assess ability outside the MHFMS range or complement the MHFMS by assessing motor functions not targeted by its items.
The restricted range of the MHFMS also introduces potential error in the strength of the associations found. By design, our study cohort included subjects who scored at the constrained maximum or minimum MHFMS value. Because scores pinned at the ceiling or floor may exaggerate the slopes of identified associations, whether positive or negative, and alter correlation strength in ways that cannot be predicted, we conducted a post hoc analysis of the primary outcome measure after removing subjects with these border values. By reducing the number of subjects, this post hoc approach necessarily reduces the statistical power of the models. Nonetheless, the strongest hits as ranked by Q-value were reproduced when the border-score subjects were removed from the analysis.
Given the restricted range of the MHFMS and some of the supplemental measures, it is notable that the candidate markers found to discriminate between high- and low-functioning SMA subjects were generally consistent across all outcome measures employed in the study (MHFMS, Current Level of Function, Respiratory Support, and Feeding Method). The biological importance of these findings needs to be explored: these correlations may indicate distinct phases of disease pathogenesis, different pathological mechanisms, or even the temporal importance of SMN during early motor development. However, the very existence of markers that distinguish between Type I and Type II, between Type II and Type III, and between Type III and control subjects is highly encouraging for the potential to develop pharmacodynamic biomarkers from this list. Whether these candidate markers have any predictive value for therapies with different mechanisms of action remains to be determined.
Protein, Metabolite, and Transcript Findings
Plasma protein candidate markers were generally the most significant markers of the set in the univariate analysis against the MHFMS, as well as across the other outcome measures. Although statistically significant metabolomic analytes were identified, these candidates are undergoing further post hoc evaluation to assess potential confounding by the special diets and nutritional supplements provided to the majority of the Type I SMA subjects. If such confounding is confirmed, it may undermine both the usefulness of these metabolomic associations with severe motor impairment as candidate markers and enthusiasm for further work toward their validation.
Unexpectedly, there were no widespread changes in gene expression that correlated with disease severity or distinguished SMA subjects from controls. We also did not observe a significant difference in SMN expression levels between SMA patients and control subjects. Using another method of transcript quantification, however, lower levels of SMN-FL transcripts were demonstrated in the patients of this cohort compared with controls (see companion paper, Crawford et al.). This discrepancy is most likely due to a difference in the methodology used to quantify SMN transcript levels. The absence of widespread changes in gene expression or splicing in blood suggests that the reduction in SMN protein levels in this tissue is not sufficient to cause dramatic changes at the level of gene expression or splicing, at least in peripheral blood mononuclear cells. This would be consistent with the fact that blood and other tissues of the body do not typically exhibit significant disturbances of cellular or organ function, except in the Type I cohort. At the level of SMN expression found in symptomatic SMA, the deficiency appears to predominantly affect motor neurons, and possibly some other neuronal types. Pathologic changes have been observed in every tissue investigated in the context of extreme reductions of SMN production, but the basis for the increased vulnerability of motor neurons is unknown. The study did identify a decrease in the expression of NLR family, apoptosis inhibitory protein (NAIP), which is consistent with the genomic deletion of SMN1 together with the neighboring NAIP gene in SMA alleles associated with more severe types.
The Path to Validation and Qualification and the Biological Utility of the Plasma Biomarkers
The BforSMA project was designed as an unbiased approach to generating a dataset that can be used for biomarker identification as well as post hoc hypothesis testing. On one hand, unbiased ascertainment across large data sets will, by design, generate false positives.
When one attempts to determine a role in disease for the statistically significant biomarkers discovered in plasma, a few caveats must be emphasized. Any biological differences recorded in plasma are likely to be downstream, in some cases far downstream, of the original disease perturbation in SMA at the motor neuron or neuromuscular junction. A significant deviation from the norm that is related to a measure of motor function can reflect any of a number of biological processes that affect bone and other connective tissues, and need not indicate a specific pathophysiological link at the level of the motor neuron or reflect reduced SMN protein expression. From a biomechanical perspective, changes in bone components are not surprising in SMA. Tendons transmit contractile muscle forces to bone. In a state of muscle weakness, such as SMA, these forces on bone are reduced, especially in the long bones of the limbs in the non-ambulatory SMA Types I and II. As a result, bone remodeling, an especially important process in a growing child, is altered. Thus, differences in bone growth and remodeling between the weakest and strongest children are plausible. Cartilage intermediate layer protein 2 (CILP2), a marker with a strong association with the MHFMS, may not have a direct relationship to motor neuron biology. However, it is notable that CILP2, like other high-scoring plasma protein markers such as COMP, TNXB, THBS4, SPP1 and COL2A1, is associated with connective tissue development (cartilage matrix synthesis in the case of CILP2) and with bone and joint disorders. SMA patients have been reported to have age-correlated losses in bone density, frequent fractures and, in severe cases, congenital fractures. It is possible that the bone and joint protein signature present in the BforSMA study relates to secondary connective tissue sequelae of the disease. In addition, Shanmurgan et al. found that the SMN protein is a binding partner of osteoclast stimulating factor (OSF), a protein involved in osteoclast development and bone resorption, which raises the possibility of a direct SMN-related perturbation of bone in SMA.
On the other hand, biomarkers such as CILP2 identified in this study have the advantage of being independent of specific hypotheses about pathophysiology. While some biomarkers may initially attract attention because of their biological plausibility, the role of other potentially valuable biomarkers may not be immediately obvious. Much further work will be necessary to identify analytes on this list that can improve understanding of the effects of SMN deficiency on cellular pathophysiology, or to develop a biomarker that can improve the efficiency of SMA treatment trials, whether of a specific SMN-enhancing agent or of a therapy targeting other steps of the disease cascade.
Several questions remain to be answered. Will a candidate biomarker be stable over time, and is measurement stability influenced by technical issues such as sample handling or by short-term biological factors such as diet or time of day? Will changes in candidate biomarkers track meaningful changes in impairment, such that they may serve as early surrogates for the consequences of SMA as experienced by patients? Or will changes over time reflect other processes (e.g., sarcopenia) that are not related to motor function and therefore may not be informative in the domain of interest? By including a cohort of age- and gender-matched subjects with other motor impairments arising from localizable neurologic conditions (e.g., myopathy or cerebral palsy), a confirmatory study may help to clarify whether changes in biomarkers are specific to SMA or instead relate to downstream consequences of the disorder. Any single biomarker might be insufficiently powerful for use in treatment trials, but a panel of qualified biomarkers that each contribute independently to clinical assessment may be worth exploring. Experimental subjects in this cohort were all healthy at the time of enrollment and were cared for at major academic centers where the care of children with SMA is a priority. Would a candidate biomarker perform as well in a larger population of children with SMA as it did in the BforSMA cohort? Not all of the development need take place in the clinical laboratory. Improvements in the measurement of clinical function, whether by better matching measures to the underlying neuronal dysfunction or by extending their range so that a broader range of children with SMA can be assessed, may improve biomarker performance characteristics.
The path of a biomarker, from candidate identified by single-visit correlation to a clinical feature to becoming a qualified biomarker with well-characterized meaning, is necessarily multi-faceted and complicated. Confirmation of these observations will require that we reproduce findings in other prospectively collected samples from SMA cohorts that share features with the BforSMA population. Because the BforSMA study was a single-visit effort, to truly determine whether these candidate markers are prognostic and can change with SMA status, they must be tested over time in longitudinal assessments. Such projects are now in development with new analytic methods that can more readily be scaled to clinical research protocols.
Support for, and further validation of, these biomarkers can come from other areas, such as confirmation in SMA animal models. Assembling networks of metabolites, proteins and transcripts based on statistical significance has the potential to be partially internally validating, as it may identify coordinated cellular or tissue physiologic strain of pathophysiologic importance that is not apparent from any single metabolite or analytic platform. Further evaluation of the candidates' pathophysiologic relationship to SMN, or to the consequences of neuromuscular impairment, might be possible through additional biomarker studies in SMA patients that include muscle physiology outcome measures describing motor unit function and structure, or through comparison with SMA animal models or other human disease control populations. In particular, testing the hypothesis that a subset of these markers is primarily SMA-specific, rather than a consequence of secondary changes downstream of neuromuscular disease, would be valuable. However, it is important to emphasize that, whether they are primary or secondary to SMA pathophysiology, any markers that strongly associate with SMA outcomes over time and replicate in different studies will be of great value in clinical trials and patient management.
Finally, once a subset of the markers is confirmed in other populations and prospective studies, efforts can be devised to determine whether combining markers across platforms with SMA clinical characteristics could produce a multicomponent predictive model that has even stronger associations with SMA status or is better able to predict outcomes or response to interventions. Such markers have proven very powerful in the cancer field, and with the emergence of new SMA drug trials there is potential to develop multicomponent models of drug response, and even to stratify responder populations, using some of these SMA candidate markers once they have been confirmed and shown to be responsive to therapy.
This BforSMA project has generated a resource of protein and metabolite candidate biomarkers for future study. The effort has taken advantage of recent technological developments that enhance our ability to measure a broad range of proteins, metabolites and transcripts from a single blood sample; a well-characterized set of SMA subjects in whom function and age are independent; and the advanced bioinformatics and biostatistical resources necessary to support the project. The data and samples generated by this effort will be an important resource for the field and future studies, with the additional value of serving as a single-visit ‘test run’ of an industry-style multicenter trial across several SMA clinical sites. The full dataset from the study will be made available to all investigators in an accessible format, to be used as a resource to address the many questions raised by our findings.
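The multicomponent predictive model envisioned above, combining confirmed markers across platforms, could take many forms, and the text does not specify one. As a deliberately simple, hypothetical sketch, the code below combines a panel of markers into a single score by averaging z-scores computed against a reference group, with each marker's sign aligned so that a higher score always indicates worse status. The marker names, reference values and direction flags are invented for illustration; a real model would presumably estimate weights (for example by regression), add clinical covariates, and be validated in independent, longitudinal cohorts.

```python
from statistics import mean, stdev
from typing import Dict, List


def composite_score(
    sample: Dict[str, float],
    reference: Dict[str, List[float]],
    direction: Dict[str, int],
) -> float:
    """Unweighted mean of sign-aligned z-scores across a marker panel.

    reference: values of each marker in a hypothetical reference group.
    direction: +1 if higher values indicate worse status, -1 if lower values do.
    """
    aligned = []
    for marker, ref_values in reference.items():
        # Standardize the sample against the reference group's mean and SD.
        z = (sample[marker] - mean(ref_values)) / stdev(ref_values)
        aligned.append(direction[marker] * z)
    return mean(aligned)


# Hypothetical two-marker panel, for illustration only.
reference = {"marker_1": [1.0, 2.0, 3.0], "marker_2": [10.0, 20.0, 30.0]}
direction = {"marker_1": 1, "marker_2": -1}
sample = {"marker_1": 4.0, "marker_2": 0.0}
print(composite_score(sample, reference, direction))  # prints 2.0
```

An unweighted composite is used here only because it needs no fitted parameters beyond each marker's reference mean and standard deviation; it ignores correlations between markers, which a fitted multivariable model would take into account.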