The promise of Alzheimer’s disease (AD) biomarkers has led to their incorporation in new diagnostic criteria and in therapeutic trials; however, significant barriers exist to widespread use. Chief among these is the lack of internationally accepted standards for quantitative metrics. Hippocampal volumetry is the most widely studied quantitative magnetic resonance imaging (MRI) measure in AD and thus represents the most rational target for an initial effort at standardization.
The authors of this position paper propose a path toward this goal. The steps include: 1) Establish and empower an oversight board to manage and assess the effort, 2) Adopt the standardized definition of anatomic hippocampal boundaries on MRI arising from the EADC-ADNI hippocampal harmonization effort as a Reference Standard, 3) Establish a scientifically appropriate, publicly available Reference Standard Dataset based on manual delineation of the hippocampus in an appropriate sample of subjects (ADNI), and 4) Define minimum technical and prognostic performance metrics for validation of new measurement techniques using the Reference Standard Dataset as a benchmark.
Although manual delineation of the hippocampus is the best available reference standard, practical application of hippocampal volumetry will require automated methods. Our intent is to establish a mechanism for credentialing automated software applications to achieve internationally recognized accuracy and prognostic performance standards that lead to the systematic evaluation and then widespread acceptance and use of hippocampal volumetry. The standardization and assay validation process outlined for hippocampal volumetry is envisioned as a template that could be applied to other imaging biomarkers.
A biomarker is a physiological, biochemical, or anatomic parameter that can be objectively measured as an indicator of normal biologic processes, pathological processes, or responses to a therapeutic intervention (1). Biomarkers used in the Alzheimer’s disease (AD) field include both imaging measures and biofluid analytes. Biofluid analytes in this context can refer to proteins in any biofluid; however, cerebrospinal fluid (CSF) biomarkers are presently the best developed (2). The five most widely studied biomarkers in AD can be divided into two major categories: 1) Biomarkers of cerebral Aβ amyloid accumulation - these are increased radiotracer retention on amyloid-tracer based positron emission tomography (PET) imaging and low CSF Aβ 1-42, and 2) Biomarkers of neuronal degeneration or injury - these are elevated CSF tau (both total and phosphorylated tau); decreased fluorodeoxyglucose (FDG) uptake on PET in the temporo-parietal cortex; and brain atrophy in the medial, basal and lateral temporal lobes and the medial and lateral parietal cortices determined from structural magnetic resonance imaging (MRI) or computed tomography (CT) (3). Three of these five major AD biomarkers are imaging measures and imaging is the primary focus of this position paper. Biomarkers are increasingly important in AD in two contexts: clinical diagnosis/prognosis and therapeutic trials.
Criteria for the clinical diagnosis of AD were established in 1984 (4). These criteria have been widely adopted, validated against neuropathological examination in many studies, and are still used today. A consensus now exists, however, that diagnostic criteria for AD should be updated to reflect the scientific advances of the past quarter of a century. One of the most important of these advances is the development of biomarkers for AD. This recognition has inspired recent efforts on several fronts to revise diagnostic criteria for AD. The two best-known such efforts are those of Dubois et al (5, 6) and the National Institute on Aging (NIA)-Alzheimer’s Association (AA) (7-10). The NIA-AA commissioned three work groups to revise diagnostic criteria. Each was assigned the task of defining or revising criteria for one of three recognized phases of the disease: pre-clinical or asymptomatic AD, symptomatic pre-dementia or mild cognitive impairment (MCI), and the AD dementia phase (7-10). Biomarkers providing evidence of in situ AD pathophysiology are employed in the revised definitions of AD in all three phases of the disease by the NIA-AA and are also included in the criteria of Dubois et al (5, 6).
The second major use for biomarkers of AD is in clinical trials, where biomarkers can be employed for several distinct purposes. As an indicator of AD pathophysiological processes, AD biomarkers may be used for subject inclusion/exclusion – to ensure study subjects are appropriate for targeting of the therapeutic mechanism of action or as an enrichment strategy to improve efficiency of therapeutic trials (2, 11). Biomarkers also provide a biologically based measure of disease severity. They can be used as a covariate in outcome analyses and as safety measures. Finally, an important application of AD biomarkers in clinical trials is as outcome measures, in which an effect on the biomarker is sought as evidence of modification of the underlying pathological AD process (12-21). However, since AD pathophysiology is increasingly being recognized to be very complex and multifaceted, effects of candidate drugs on some individual pathophysiological aspects of AD may not necessarily be of functional or cognitive relevance. Therefore, increasing efforts are being spent on developing biomarkers which could serve as surrogate endpoints in clinical trials, accurately predicting and reflecting clinically significant outcomes (2, 22). Biomarkers are more objective and reliable quantitative measures of AD pathophysiological processes than traditional cognitive and functional outcomes that are affected by subject motivation and extrinsic factors such as alertness, environmental stresses, and informant mood and distress.
The evaluation of the value of biomarkers is different for therapeutic trials than for clinical diagnosis, but the rationale and methods to standardize and validate the reliability of the measures are very similar. Moreover, if an imaging biomarker is used as an inclusion criterion for subjects participating in a clinical trial of a compound that subsequently achieves regulatory approval, then it is possible, some would say likely, that regulators will require that the same biomarker be approved as a diagnostic to identify patients who are suitable for treatment. This would then require that the biomarker, in our case imaging, be easily implementable in clinical imaging facilities world-wide. Therefore, although requirements in terms of precision and sensitivity to pathology may vary, issues pertaining to standardization of an imaging biomarker for use in clinical trials and for clinical diagnostics are inextricably interwoven.
The potential value of quantitative imaging biomarkers for both clinical diagnosis and clinical trials is clear, but major barriers exist to widespread acceptance and implementation. The most substantive barriers have been the lack of standardized methods for 1) image acquisition, 2) extraction of quantitative information from images, and 3) linking quantitative metrics to internationally recognized performance criteria. These in turn have impeded the establishment of cut points in the continuous range of quantitative values that can be used in diagnosis and evaluating change in clinical trials. Standardization of image acquisition for structural MRI and PET scans has been a major focus of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) project (23, 24) and ADNI acquisition protocols have become the de facto standard for clinical trials and could be applied clinically. On the other hand, little progress has been made in the standardization of techniques for quantitative image analysis, either in ADNI or in the field in general. This is particularly true for MRI where the lack of standardization has led to publication of values that are highly disparate across the literature. For example, greater than two-fold differences in hippocampal volume of cognitively normal elderly subjects have been reported from different centers (25). This is unlikely to have a basis in biology and is almost certainly due to inter-center differences in the measurement tools and the anatomical protocols for delineating the hippocampus. Likewise, a strong methodological dependence is evident in published rates of hippocampal atrophy. Three-fold differences in rates of hippocampal atrophy have been reported in elderly controls as well as wide variations in apparently similar cohorts of AD patients (26). 
For example, Du et al (27) reported annualized rates of hippocampal atrophy in healthy elderly controls mean age 77 of 0.8%/yr; Jack et al (28) in controls age 78 of 1.4%/yr and Wang et al (29) mean age 73 of 2.3%/yr. This strong dependence upon the method used and its specific implementation undermines the credibility of the results. Both newly proposed diagnostic criteria explicitly point out that extensive work on imaging biomarker standardization is needed prior to widespread adoption for diagnostic purposes.
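The spread among the three control-group rates quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary and the helper name `fold_difference` are hypothetical, and the only data used are the three percentages cited in the text.

```python
# Annualized hippocampal atrophy rates in healthy elderly controls,
# exactly as quoted in the text (percent per year).
reported_rates = {
    "Du et al (27)": 0.8,
    "Jack et al (28)": 1.4,
    "Wang et al (29)": 2.3,
}


def fold_difference(rates):
    """Return the ratio of the largest to the smallest reported rate."""
    values = list(rates.values())
    return max(values) / min(values)


ratio = fold_difference(reported_rates)
print(f"Largest-to-smallest ratio of reported rates: {ratio:.1f}-fold")
```

The ratio of the highest to the lowest rate is close to three, which is the order of disagreement described for elderly controls, even though the mean ages of the three cohorts differ by only a few years.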
Qualification or general acceptance of the validity of a biomarker in clinical trials must rest on a well-established body of evidence beginning with widespread agreement that there is clinical significance to the result of the biomarker and that it can be measured with appropriate accuracy and reproducibility. Quantitative measurement of hippocampal volume fulfills these basic criteria. The advantages of hippocampal volume as a target for an initial standardization and assay validation exercise are: 1) The hippocampus is an anatomically defined structure with boundaries that are visually definable in a properly acquired MRI scan. 2) The hippocampus is involved early and progressively with neuronal loss and neurofibrillary tangles, which is one of the primary hallmarks of AD pathology (30). 3) A large imaging and pathology literature provides evidence that loss of hippocampal volume is significant in AD. Numerous studies have shown the association of hippocampal atrophy with neurodegenerative pathology at autopsy (31-36), with clinical diagnoses of AD or MCI (37-43), and with the severity of cognitive disorders and episodic memory deficits due to AD pathophysiology (44, 45). In addition, longitudinal measures of change in hippocampal volume both predict future cognitive decline and correlate with contemporary indices of clinical decline (46, 47), and quantitative measures of the hippocampus predict progression from MCI to AD (48-63). 4) Fully automated software tools are now available that can measure hippocampal volume efficiently and reproducibly (21, 37, 58, 64-71). Visual rating (72-74), while convenient and currently used in some diagnostic settings, does not lend itself to detecting subtle size differences, lacks precision relative to quantitative methods, and does not take advantage of the power of current technology. Formal computer-aided manual tracing of the entire hippocampus was introduced over two decades ago to aid in seizure lateralization (75).
Although manual hippocampal tracing has been effective for research studies in different diseases, and still serves as the best available Reference Standard measure of the hippocampus on MRI (76), it is time consuming and requires highly trained operators. Thus it is not feasible in routine clinical practice and due to its expense it is impractical in clinical trials. Fully-automated hippocampal volumetry using standardized methods would be a practical alternative to manual methods. Automated hippocampal volumetry has successfully enabled the discovery of novel genes associated with hippocampal volume in over 7000 subjects scanned at multiple internationally distributed sites. This result supports the assertion that such methods can be efficiently and reproducibly applied on a worldwide scale (77). Furthermore, software methods that employ within-subject registration permit sensitive measures of volume change over time (51, 78). 5) While more complex MRI measures of disease-related atrophy consisting of combinations of multiple regions of interest (ROI) might have superior diagnostic properties compared to hippocampal volume (79-84), the analysis of hippocampal volume is less complex than multi-ROI approaches so a reference standard is easier to generate. Specifically, the hippocampus can be delineated by hand, but the disease signatures of more complex analytic methods are a result of training and machine learning methods that would present a further challenge to validate, and are likely to evolve over time.
Further supporting hippocampal volumetry as a target for initial AD imaging biomarker standardization and assay validation is the fact that clinical guidelines in many countries (85, 86) dictate that all patients investigated for cognitive impairment should undergo structural brain imaging to exclude treatable causes such as tumors and hematoma. An MRI acquisition sequence that would permit quantitative analysis of hippocampal volume is easy to include in a routine clinical MRI examination, only lengthens the exam by a few minutes, and is currently considered to be an essential part of a clinically diagnostic imaging protocol at some centers. Moreover, a significant effort has already been expended to standardize acquisition parameters for the high resolution 3D anatomical MR imaging sequence needed for quantitative volume measures across MRI vendors in the ADNI study (23). The ADNI 3D T1 anatomical sequence used for volumetric measurements can be performed in a standardized manner in an overwhelming majority of imaging centers worldwide. Finally, there is an ongoing international initiative led by one of the co-authors (GBF) to establish a Reference Standard in hand-drawn hippocampal volumes, which is the European Alzheimer’s Disease Centers (EADC) – ADNI Hippocampal Harmonization Effort (87, 88).
The issue of validating imaging biomarkers for AD has recently drawn the attention of non-profit organizations, including the Radiological Society of North America (RSNA) and the Coalition Against Major Diseases (CAMD). CAMD is part of the Critical Path Institute, a nonprofit public-private partnership dedicated to more efficient drug development. Qualification of hippocampal atrophy for use in clinical trial enrichment is being pursued by CAMD with the US Food and Drug Administration (FDA) and European Medicines Agency (EMA). At a meeting of the Radiological Society of North America Quantitative Imaging Biomarkers consortium in September 2010, a work group was convened to address the issue of standardizing quantitative imaging of AD. Among the candidate imaging modalities discussed, measures of hippocampal volume on structural MRI were identified as the most widely used in the context of multicenter clinical trials, and therefore were the most obvious candidates for an initial (exemplar) effort to standardize quantitative imaging biomarkers. This position paper follows from the recommendations of this RSNA work group.
In general terms, three separate steps are required for biomarker development: 1) Assay validation (also called technical or analytical performance validity) to show that, when following defined standardized procedures, the biomarker can be measured precisely and accurately compared to a reference standard (89), 2) Clinical Validation to establish that the biomarker has value for a specific intended task and context of use, and 3) Qualification of the biomarker with the appropriate regulatory agencies based upon wide-spread consensus that the biomarker is “fit for purpose” for a particular use. Each proposed task (e.g., diagnostic, prognostic, outcome) needs to be considered separately. Qualification of a biomarker for clinical trials may be a stepping stone to a qualification for its use as a clinical diagnostic. However, the use of a biomarker in clinical diagnosis is distinct from its use in therapeutic trials, and development may focus on one or the other first. The use of a biomarker in clinical trials is at the discretion of the trial sponsor, but mechanisms have been introduced by which regulatory bodies (e.g., the US Food and Drug Administration Center for Drug Evaluation and Research, FDA CDER; or the European Medicines Agency EMA) qualify biomarkers for use in clinical trials. The use of a biomarker for clinical diagnosis requires regulatory approval in the relevant jurisdiction (e.g., approval by FDA Center for Devices and Radiological Health, CDRH, in the USA; or CE marking in Europe), and may separately also require approval from healthcare funders for reimbursement.
Below we outline the steps of a proposed work plan that would lead to standardization of quantitative (automated or manual) hippocampal volumetry as a biomarker for AD in evaluative studies in the context of clinical trials and for diagnosis.
Ideally, the work plan would follow the timeline above where initial steps would focus on establishing the reference standard of manual hippocampus traces, generating a standardized approach to volume normalization and benchmark performance metrics. Once the reference standard is established, the focus likely would be on evaluation studies and qualifying the reference standard with the FDA and EMA for diagnostic, prognostic and outcome use in clinical trials. Standardized acquisition of MRI scans suitable for hippocampal volumetry is already widely performed and support from the pharmaceutical industry is likely. Subsequently, we expect evaluation studies will be conducted to show the diagnostic value of hippocampal volumetry outside the context of clinical trials. We wish to emphasize that the intent of this position paper is not to stifle existing alternative methods or innovative development of new methods, but rather to facilitate the development of widely available implementations of automated hippocampal volumetry methods, and to serve as a template for an initial effort which can then be used for other imaging biomarkers.
As an example illustrating the approach discussed above we identified 373 ADNI subjects diagnosed as MCI at baseline who qualified for an analysis of time to progression to AD. Of the 397 ADNI subjects diagnosed as MCI at baseline, 16 had no follow-up visits, and 8 failed quality control, leaving 373 for this analysis (Table 1). A list of the ADNI subject ID numbers used in the example MCI analyses is included as a Supplement. All subjects had hippocampal volume measured in three ways, labeled Methods A, B and C here. In this exercise, we considered Method A to represent the Reference Standard Dataset, and assessed Methods B and C in two ways: technical performance accuracy relative to the Reference Standard Dataset and prognostic performance in predicting conversion from MCI to AD at 2 years post baseline. While the data presented below are real, and not hypothetical, the specific methods are left undefined because we do not wish to have this position paper misconstrued as evidence that the authors endorse a particular method for credentialing.
Of the 373 patients, 166 progressed from MCI to AD during follow-up and 8 progressed to non-AD dementia based upon clinical criteria. We also examined a subset of 313 subjects that either progressed to AD at or prior to the 24 month visit (n=135) or had available follow-up through the 24 month visit without progressing to AD (n=178) to evaluate differences in hippocampal volume for those that progressed at 24 months vs. those that remain stable. Subjects who progressed to non-AD dementia at or before 24 months were excluded from this analysis.
Method B potentially meets two major criteria for credentialing – it is highly accurate in the group-wise and individual measurement of hippocampal volume relative to Method A as shown in the table and scatter plots, and it also has essentially identical performance in predicting conversion from MCI to AD (Fig. 1, Table 2). Method C has a similar prognostic performance in predicting conversion to AD as Method A as shown in the ROC analysis, but in its current form might not meet technical accuracy criteria relative to the Reference Standard Dataset. This is how we envision the credentialing process proceeding for most automated applications, with the EADC-ADNI harmonization data set of manually traced hippocampi serving as the Reference Standard Dataset and the oversight committee setting predetermined minimal benchmark criteria to judge the performance of individual methods.
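The two-part check described above (technical accuracy against the reference, then prognostic performance) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the credentialing procedure itself: the function names, the thresholds (`max_abs_bias_pct`, `min_r`, `max_auc_drop`), and the toy volumes are all hypothetical, since the actual benchmark criteria would be set by the oversight committee. The AUC is computed by direct pairwise comparison, assuming that a smaller hippocampal volume is the evidence for progression.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_percent_difference(candidate, reference):
    """Mean signed difference from the reference, in percent of the reference."""
    diffs = [(c - r) / r for c, r in zip(candidate, reference)]
    return 100.0 * sum(diffs) / len(diffs)


def auc_for_progression(volumes, progressed):
    """Area under the ROC curve, treating smaller volume as predicting progression."""
    pos = [v for v, p in zip(volumes, progressed) if p]
    neg = [v for v, p in zip(volumes, progressed) if not p]
    # Each progressor/non-progressor pair scores 1 if correctly ordered, 0.5 if tied.
    wins = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def credential(candidate, reference, progressed,
               max_abs_bias_pct=5.0, min_r=0.9, max_auc_drop=0.02):
    """Compare a candidate method with the reference (all thresholds hypothetical)."""
    bias = mean_percent_difference(candidate, reference)
    r = pearson_r(candidate, reference)
    auc_ref = auc_for_progression(reference, progressed)
    auc_cand = auc_for_progression(candidate, progressed)
    return {
        "bias_pct": bias,
        "pearson_r": r,
        "auc_reference": auc_ref,
        "auc_candidate": auc_cand,
        "technical_ok": abs(bias) <= max_abs_bias_pct and r >= min_r,
        "prognostic_ok": auc_ref - auc_cand <= max_auc_drop,
    }


# Toy volumes in cm^3 (synthetic, not ADNI data) and 24-month progression flags.
reference = [3.1, 2.4, 3.4, 2.2, 2.5, 2.6]
progressed = [False, True, False, True, False, True]
# A candidate that ranks subjects identically but overestimates every volume by 30%.
scaled_candidate = [1.3 * v for v in reference]

print(credential(scaled_candidate, reference, progressed))
```

The scaled candidate in this toy example preserves the ranking of subjects, so its prognostic performance matches the reference while its systematic overestimate fails the technical accuracy check. That is the same pattern the text describes for Method C.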
One important feature of the process for critically evaluating automated hippocampal segmentation algorithms is the failure rate. For a variety of reasons, usually related to poor scan quality, automated algorithms will fail to produce a plausible result in some proportion of cases in a study. Taken to the extreme, imagine, for example, a method that produced perfect predictive results in cases that underwent successful hippocampal segmentation, but that failed in 99% of cases. The method would score quite well on prognostic metrics, but would not be practical. A fair and objective approach therefore is needed to penalize automated segmentation algorithms that fail in an unacceptably high proportion of cases.
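One way such a penalty could work is sketched below. It is an assumption for illustration only, not a rule proposed in this paper: the sketch combines a hard cap on the failure rate with a heuristic that blends the AUC measured on successful cases toward chance (0.5) in proportion to the failure rate. The function names and the 5% cap are hypothetical, and the linear blend is a simple heuristic rather than an exact AUC over all cases.

```python
def failure_rate(n_failed, n_total):
    """Fraction of cases in which segmentation produced no plausible result."""
    if n_total <= 0 or not 0 <= n_failed <= n_total:
        raise ValueError("need 0 <= n_failed <= n_total and n_total > 0")
    return n_failed / n_total


def failure_adjusted_auc(auc_on_successes, n_failed, n_total):
    """Heuristic: treat every failed case as a chance-level (0.5) prediction."""
    f = failure_rate(n_failed, n_total)
    return (1.0 - f) * auc_on_successes + f * 0.5


def passes_failure_gate(n_failed, n_total, max_failure_rate=0.05):
    """Hard cap on failures; the 5% default is a hypothetical threshold."""
    return failure_rate(n_failed, n_total) <= max_failure_rate


# The extreme case from the text: perfect prediction, but failure in 99% of cases.
adjusted = failure_adjusted_auc(auc_on_successes=1.0, n_failed=99, n_total=100)
print(f"Failure-adjusted AUC: {adjusted:.3f}")
print("Passes failure gate:", passes_failure_gate(99, 100))
```

Under this heuristic the method from the extreme example falls to roughly chance-level performance and is rejected by the failure cap, whereas a method with no failures keeps its measured AUC unchanged.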
Heather J Wiste, Mayo Clinic, data analysis
Clifford R. Jack Jr. serves as a consultant for Janssen, Lilly, GE, Johnson and Johnson, Eisai, and Elan. He is an investigator in clinical trials sponsored by Pfizer, Allon and Baxter, Inc. He receives research funding from the NIA [R01-AG11378 (PI), P50-AG16574 (Co-I), R21-AG38736 (Co-I), and U01 AG024904-01 (Co-I)], and the Alexander Family Alzheimer’s Disease Research Professorship of the Mayo Foundation. He owns stock in Johnson and Johnson. Frederik Barkhof has received compensation for consulting from Roche, Lundbeck, Janssen Alz Immunotherapy, and GE Medical Systems. Matthew A. Bernstein receives research contract funding from Pfizer Inc., and Baxter Allon. Marc Cantillon is an employee of Coalition Against Major Diseases.
Patricia E. Cole reports no conflicts. Charles DeCarli reports no conflicts. Bruno Dubois has served on the advisory boards of GE Healthcare and Eisai/Pfizer. Simon Duchesne reports no conflicts. Nick C. Fox served on the scientific advisory boards of Alzheimer’s Research Forum, Alzheimer’s Society and Alzheimer’s Research Trust and editorial boards of Alzheimer’s Disease and Associated Disorders; Neurodegenerative Diseases and BioMed Central - Alzheimer’s Research and Therapy. He holds a patent for QA Box that may accrue revenue. In the last five years his research group has received payment for consultancy or for conducting studies from Abbott Laboratories, Elan Pharmaceuticals, Eisai, Eli Lilly, GE Healthcare, IXICO, Lundbeck, Pfizer Inc, Sanofi-Aventis and Wyeth Pharmaceuticals. He receives research support from MRC [G0801306 (PI), G0601846 (PI)], NIH [U01 AG024904 (Co-investigator, subcontract)], Alzheimer Research Trust [ART/RF/2007/1 (PI)], NIHR (Senior Investigator) and EPSRC [GR/S48844/01 (PI)]. Giovanni B. Frisoni serves/has served on the advisory boards for Lilly, BMS, Bayer, Lundbeck, Elan, Astra Zeneca, Pfizer, Taurx, and Wyeth. He is a member of the editorial boards of Lancet Neurology, Aging Clinical & Experimental Research, Alzheimer’s Disease & Associated Disorders, and Neurodegenerative Diseases. He serves as the Imaging Selection Editor of Neurobiology of Aging. He has received grants from Wyeth Int’l, Lilly Int’l, Lundbeck Italia, and the Alzheimer’s Association. Harald Hampel disclosed no conflicts. Derek L.G. Hill is an employee and stock holder in IXICO Ltd. Keith Johnson reports no conflicts. Jean-Francois Mangin reports no conflicts. Philip Scheltens serves/has served on the advisory boards of Genentech, Novartis, Roche, Danone, Nutricia, Baxter and Lundbeck. He has been a speaker at symposia organized by Lundbeck, Merz, Danone, Novartis, Roche and Genentech. For all his activities he receives no personal compensation.
He serves on the editorial board of Alzheimer’s Research & Therapy and Alzheimer’s Disease and Associated Disorders, is a member of the scientific advisory board of the EU Joint Programming Initiative and the French National Plan Alzheimer. The Alzheimer Center receives unrestricted funding from various sources through the VUmc Fonds. Adam J. Schwartz is an employee and shareholder of Eli Lilly and Company. Reisa Sperling has been a consultant for Pfizer, Janssen, Elan, Bristol-Myers-Squibb, Bayer and Avid (unpaid). She is a site investigator for clinical trials for Bristol-Myers-Squibb, Pfizer, Janssen, and Avid. Joyce Suhy is an employee of Synarc. Paul M. Thompson reports no conflicts. Michael Weiner serves/has served on the advisory boards of Elan/Wyeth, Novartis, Banner, VACO, Lilly, Araclon and Institut Catala de Neurociencies Aplicades, Biogen Idec, and Pfizer. He serves/has served as a consultant for Elan/Wyeth, Novartis, Forest, Ipsen, Daiichi Sankyo, Inc., Astra Zeneca, Araclon, Medivation/Pfizer, TauRx Therapeutics LTD, Bayer Healthcare, Biogen Idec, Exonhit Therapeutics, SA, Servier, Synarc, and Pfizer. Funding for his travel has been provided by Elan/Wyeth, Alzheimer’s Association, Forest, University of California, Davis, Tel-Aviv University Medical School, Colloquium Paris, Ipsen, Wenner-Gren Foundations, Social Security Administration, Korean Neurological Association, National Institutes of Health, Washington University at St. Louis, Banner Alzheimer’s Institute, CTAD, Veterans Affairs Central Office, Beijing Institute of Geriatrics, Innogenetics, New York University, NeuroVigil, Inc., CHRU-Hospital Roger Salengro, Siemens, AstraZeneca, Geneva University Hospitals, Lilly, University of California, San Diego – ADNI, Paris University, Institut Catala de Neurociencies Aplicades, University of New Mexico School of Medicine, Pfizer, Paul Sabatier University, and Novartis. He serves on the editorial advisory boards of Alzheimer’s and Dementia and MRI.
He has received honoraria from American Academy of Neurology, Ipsen, NeuroVigil, Inc., and Institut Catala de Neurociencies Aplicades. He receives research support from Merck and Avid. He owns stock in Synarc and Elan. Norman L. Foster disclosed no conflicts.