Suggested characteristics of an optimal cohort study for examining genetic and environmental influences on disease have been described (1). Large size is a key component, as relevant genetic variants and other risk exposures may be uncommon and effect sizes are often modest (11). These are not, however, simply small studies made large, as the costs and inefficiencies of a 100-fold expansion of a 5,000-person, disease-specific cohort study would be prohibitive. Large studies require fundamentally different approaches in which minimizing cost is a primary consideration and “process” expertise to maximize the efficiency of high-throughput operations is as important as scientific rigor. Modern industrial design principles that identify and manage critical choke points are essential to ensuring high throughput and maintaining quality (12).
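Why uncommon exposures and modest effects push studies toward very large size can be made concrete with the standard normal-approximation sample-size formula for comparing exposure prevalence between cases and an equal number of controls (for instance, cases arising during follow-up compared with noncases from the same cohort). This is a rough sketch under stated assumptions: the function name `cases_needed`, its defaults (two-sided alpha of 0.05, 80% power), and the example exposure frequencies and odds ratios are illustrative choices, not values from this study.

```python
import math
from statistics import NormalDist


def cases_needed(p0: float, odds_ratio: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Hypothetical helper: cases (with an equal number of controls) needed to
    detect `odds_ratio` for a binary exposure with prevalence `p0` in controls.

    Uses the normal-approximation formula for two independent proportions
    with a two-sided test; illustrative only.
    """
    # Exposure prevalence among cases implied by the assumed odds ratio
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


if __name__ == "__main__":
    # Illustrative scenarios: common vs. uncommon exposure, strong vs. modest effect
    for p0 in (0.30, 0.05):
        for odds_ratio in (2.0, 1.3):
            print(f"exposure {p0:.0%}, OR {odds_ratio}: {cases_needed(p0, odds_ratio):,} cases")
```

Under these assumptions, a common exposure (30%) with an odds ratio of 2.0 needs roughly 140 cases, while an uncommon exposure (5%) with an odds ratio of 1.3 needs more than 4,000. Passing a much stricter alpha, such as the 5 × 10^-8 threshold conventionally used in genome-wide association studies, raises the requirement several-fold again.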
Table. Characteristics of Optimal Cohort Study
Decentralized models involving semipermanent research centers can be expensive to maintain and can present challenges in standardization. Using temporary assessment centers in a centralized model avoids the need to maintain remote offices, staff, and laboratory capabilities. Centralized models may provide greater overall control of costs, as well as agility in responding to changing situations such as relocating underperforming sites or modifying suboptimal procedures. They may thus free investigators to focus on science rather than miring them in the operational concerns of their individual sites.
Centralized models may also have drawbacks. The inherent need to choose a specific population base and standardized assessments for a very large, centralized cohort may limit the questions that can be addressed. This contrasts with the diverse approaches fostered by multiple independent studies, which can be a powerful force for improving methodology and assessing the replicability of findings. In addition, centralization may disenfranchise academic centers accustomed to operational leadership in their assigned geographic areas, risking the loss of critical scientific input from these groups. Special care is also needed to involve community-based organizations and to ensure that they feel local concerns are being addressed in a centralized study design. Keeping them engaged in a centralized model requires significant effort, including frequent local visits and community meetings that involve the study leadership. Being chosen as part of a major national effort can be a source of considerable community pride, especially if it is clear that community input is valued and implemented.
Experienced investigators may also have well-functioning local recruitment systems and an understanding of unique local conditions that may require tailoring of methods. Approaches for harnessing this expertise need careful attention but might include engaging academic investigators in protocol development and implementation or tasking individual academic centers with study-wide functions such as ensuring diversity of participants, developing novel substudies, or responding to queries through a participant call center.
A key aspect of limiting costs is the vigor with which a high “response rate” is pursued during recruitment. Nationally representative surveys, such as the National Health and Nutrition Examination Survey (NHANES) (13), and smaller disease-specific studies, such as the Cardiovascular Health Study (14), serve important aims in providing population-based estimates of disease prevalence and incidence. For this purpose, they need representative population samples, necessitating considerable expenditures to ensure high response rates (15). The limitations of an essentially volunteer sample are well known, particularly the generally healthier profile, higher educational attainment, and greater health consciousness of volunteers (17). This “healthy volunteer” effect can lead to underestimation of disease prevalence and incidence, but its impact on relative risk estimates for environmental and genetic factors is generally not important (18). Although high response rates are critical for population-based estimates of disease incidence, prevalence, or mortality, a 10% or even 1% response rate may be acceptable in certain situations, especially if the focus is on risk associations and the base population is large enough to capture a diversity of exposures and backgrounds. Results from such studies can still be applicable to populations with different distributions of these exposures, although for exposures that are unknown or unknowable this can only be assumed, not proven. High response rates thus need not be a driving factor, either scientifically or economically. For these reasons, UK Biobank chose to emphasize diversity but to de-emphasize response rates. It has accepted yields of 5%–10% while realizing substantial savings by not attempting to convert initial refusals, because conversion attempts were not found to be effective in pilot studies.
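The argument that a healthy volunteer effect distorts absolute rates more than relative risks can be illustrated with a small simulation. It is a sketch under deliberately simple, hypothetical assumptions: a “health-conscious” trait lowers baseline risk and raises the chance of volunteering, while the exposure is independent of that trait and multiplies risk by the same factor in everyone. The function name `simulate` and all probabilities are invented for illustration. If selection were also related to the exposure, or if the selection-related trait modified the exposure's effect, relative risks among volunteers could be biased as well.

```python
import random


def simulate(n: int = 1_000_000, seed: int = 1) -> dict:
    """Hypothetical toy cohort: full population vs. its self-selected volunteers.

    Illustrative assumptions only: 30% of people are 'health conscious'; they
    have lower baseline risk (3% vs. 8%) and volunteer more often (15% vs. 3%).
    A binary exposure (25% prevalence) is independent of that trait and
    multiplies risk by 1.5 in everyone.
    """
    rng = random.Random(seed)
    # tallies[group][exposed] = [number of people, number of cases]
    tallies = {g: {True: [0, 0], False: [0, 0]} for g in ("population", "volunteers")}
    for _ in range(n):
        health_conscious = rng.random() < 0.30
        exposed = rng.random() < 0.25
        risk = (0.03 if health_conscious else 0.08) * (1.5 if exposed else 1.0)
        case = rng.random() < risk
        volunteered = rng.random() < (0.15 if health_conscious else 0.03)
        # Everyone is in the population; only volunteers enter the study sample
        groups = ("population", "volunteers") if volunteered else ("population",)
        for group in groups:
            tallies[group][exposed][0] += 1
            tallies[group][exposed][1] += int(case)

    results = {}
    for group, t in tallies.items():
        people = t[True][0] + t[False][0]
        cases = t[True][1] + t[False][1]
        risk_exposed = t[True][1] / t[True][0]
        risk_unexposed = t[False][1] / t[False][0]
        results[group] = {
            "incidence": cases / people,
            "relative_risk": risk_exposed / risk_unexposed,
        }
    return results


if __name__ == "__main__":
    for group, r in simulate().items():
        print(f"{group:>10}: incidence {r['incidence']:.3f}, RR {r['relative_risk']:.2f}")
```

Under these assumptions the expected incidence is about 7.3% in the full population but only about 5.2% among volunteers, whereas the expected relative risk is 1.5 in both groups; the simulated values should land close to these, with more sampling noise in the much smaller volunteer group.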
Other important cost determinants include method of ascertainment (such as registry vs. household enumeration), complexity of data collection, and follow-up methods (such as active vs. passive). Choice among these is largely driven by the scope and goals of a given study. Irrespective of these choices, however, centralized approaches are likely to provide the advantages described above, as evidenced by the marked reductions in UK Biobank costs when it shifted to a centralized design while keeping other aspects constant.