For each checklist item shown in , this section provides examples of appropriate reporting from actual scientific articles of genetic risk models for diseases and health conditions, as well as an explanation of the importance and need for the item and helpful guidance about details that constitute transparent reporting.
Methods
Item 4: Specify the key elements of the study design and describe the setting, locations and relevant dates, including periods of recruitment, follow-up and data collection.
Examples: ‘The Rotterdam Study is a prospective, population-based, cohort study among 7,983 inhabitants of a Rotterdam suburb, designed to investigate determinants of chronic diseases. Participants were aged 55 years and older. Baseline examinations took place from 1990 until 1993. Follow-up examinations were performed in 1993–1994, 1997–1999, and 2002–2004. Between these exams, continuous surveillance on major disease outcomes was conducted. Information on vital status was obtained from municipal health authorities.' [42]
‘A cohort of 2,576 men and 2,636 women from a general population (aged 30–65 years at inclusion) participated in the DESIR longitudinal study and were clinically and biologically evaluated at inclusion, at 3-, 6-, and 9-year visits.' [43]
Explanation: Key elements about the study design include whether the analyses were performed in: a cohort study, which follows a group of individuals over time to identify incident cases of disease; a cross-sectional study, which examines prevalent disease in a defined population; or a case–control study, which compares individuals with the trait of interest to those without. [17, 29, 44] Setting refers to how participants were recruited, for example, through hospitals, outpatient clinics, screening centers or registries, and location refers to the country, region and cities, if relevant. Stating the dates of data collection rather than the duration of the follow-up helps to place the study in historical context and is particularly important in the context of changes in diagnostic methods (eg, imaging and use of biomarkers), and changes in the assessment of genotype and other risk factors.
Researchers should also state whether the data were collected de novo specifically for the purpose stated in the introduction, or whether the analyses were conducted using previously collected data. [29] The secondary use of existing data is not necessarily less credible, but a statement might help to explain limitations in the study, including, but not limited to, relevant data not being assessed or the presence of peculiar population characteristics.
Item 5: Describe eligibility criteria for participants, and sources and methods of selection of participants.
Examples: (Eligibility criteria) ‘The diagnosis of diabetes in case subjects was based on either current treatment with diabetes-specific medication or laboratory evidence of hyperglycemia if treated with diet alone. Patients with confirmed diagnosis of monogenic diabetes and those treated with regular insulin therapy within 1 year of diagnosis were excluded. Case subjects in this study had an age at diagnosis between 35 and 70 years, inclusive. Control subjects had not been diagnosed with diabetes at the time of recruitment or subsequently and were excluded if there was evidence of hyperglycemia during recruitment (fasting glucose >7.0 mmol/l, A1C >6.4%) or if they were >80 years old'. [45]
(Sources and methods of selection) ‘The study population consisted of 283 women with previous gestational diabetes mellitus who were admitted to the Department of Obstetrics, Copenhagen University Hospital, Rigshospitalet, Denmark, during 1978–1996 and who had participated in a follow-up study during 2000–2002'. [32]
Explanation: The predictive performance of a risk model might vary with the population in which the test is applied, and is preferably assessed by testing a random sample of individuals from the population at risk of the disease or outcome. The eligibility criteria, source and methods of selection of the study participants thus inform readers about the assumed target population for testing as well as about the representativeness of the study population. Knowledge of the selection criteria is essential in appraising the validity and generalizability of the study results. Eligibility criteria may be presented as inclusion and exclusion criteria, specifying characteristics, such as age, sex, ancestry, ethnicity and/or geographical region, and, for case–control studies, diagnosis and comorbidity. The source refers to the populations from which the participants were selected and to the methods of selection, that is, whether participants were, for example, randomly invited, referred or self-selected. The diagnostic criteria should be clearly described, including references to standards, if applicable.
For cohort and cross-sectional studies, the population base from which participants were invited (eg, from a general population, specific region or hospital) should be specified. Depending on the aim of the cohort, typical eligibility criteria may include age, sex, ethnicity, specific risk factors and, for cohorts of patients, diagnosis, disease duration or stage and comorbidity. [29]
For case–control studies, one should specify the (diagnostic) criteria that were used to select cases, and the criteria for selecting the controls. The extent to which controls were screened for absence of symptoms related to the disease or outcome under study should be described. Description of the criteria should enable understanding of the spectrum of disease involved. Case–control studies sometimes compare very severe cases with very healthy controls, particularly if the data were previously collected primarily for gene discovery. [8, 46] Such stringent selection of participants is an effective strategy for gene discovery, but predictive performance might be overestimated compared with assessment in unselected populations where controls might have early symptoms or risk factors of disease. Furthermore, for case–control studies, it is important to specify whether cases and controls were matched and how, as overmatching might affect the predictive power of the matching factor in the sample relative to its predictive power in an unmatched population.
Item 6: Clearly define all participant characteristics, risk factors and outcomes. Clearly define genetic variants using a widely-used nomenclature system.
Examples: (Predictors) ‘We selected six SNPs from six loci on the basis of their association with levels of LDL or HDL cholesterol in at least one previous study. These six SNPs were, for association with LDL cholesterol, APOB (apolipoprotein B, rs693), PCSK9 (proprotein convertase subtilisin/kexin type 9, rs11591147), and LDLR (low-density lipoprotein receptor, rs688); and for association with HDL cholesterol, CETP (cholesteryl ester transfer protein, rs1800775), LIPC (hepatic lipase, rs1800588), and LPL (lipoprotein lipase, rs328).' [47]
(Predictors) Another example is provision of the information in tabular form (See ). [48]
Table 3. Example table: description of genetic variants used in the analyses
(Predictors) ‘We defined a positive self reported family history of diabetes as a report that one or both parents had diabetes; this definition is more than 56% sensitive and 97% specific for confirmed parental diabetes. […] We considered diabetes to be present in a parent when medication was prescribed to control the diabetes or when the casual plasma glucose level was 11.1 mmol per liter or higher or 200.0 mg per deciliter or higher at any examination'. [48]
(Outcomes) ‘The prespecified composite end point of cardiovascular events was defined as myocardial infarction, ischemic stroke, and death from coronary heart disease. Myocardial infarction was defined on the basis of codes 410 and I21 in the International Classification of Diseases, 9th Revision and 10th Revision (ICD-9 and ICD-10), respectively. Ischemic stroke was defined on the basis of codes 434 or 436 (ICD-9) and I63 or I64 (ICD-10)'. [47]
Explanation: All participant characteristics, genetic and non-genetic risk factors, and outcomes that are considered and used in the analyses, should be defined and described unambiguously. Disease outcomes should be defined by reference to established diagnostic criteria or justification of study-specific criteria, if such are employed. The selection of both genetic and non-genetic risk factors should be clarified. Authors should specify whether all known risk factors are included, and, if not, why some are excluded. Genetic variants should be described using widely-used nomenclature. [49] For example, SNPs could be presented with rs numbers with reference to the pertinent database and build (eg, HapMap release 27). [50] When proxies (surrogate markers) are considered, the correlation with the intended variant should be quantified, for example, in terms of R², along with the population used to derive the correlation. When variants are obtained by imputation, the imputation method and reference database should be described along with an estimate of the quality of the imputation.
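As an illustration of quantifying the correlation between a proxy and the intended variant, the following minimal sketch computes R² as the squared Pearson correlation between allele counts (0, 1 or 2) at the two variants. This genotype-based correlation is one common way to estimate R²; haplotype-based estimates from a reference panel are an alternative. The function name and the allele counts are hypothetical and serve only as an example.

```python
from statistics import mean


def genotype_r2(target, proxy):
    """Squared Pearson correlation between two allele-count vectors (0, 1 or 2)."""
    if len(target) != len(proxy) or len(target) < 2:
        raise ValueError("need two equally long vectors with at least 2 samples")
    mean_t, mean_p = mean(target), mean(proxy)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(target, proxy))
    var_t = sum((t - mean_t) ** 2 for t in target)
    var_p = sum((p - mean_p) ** 2 for p in proxy)
    if var_t == 0 or var_p == 0:
        raise ValueError("a monomorphic variant has no defined correlation")
    return cov * cov / (var_t * var_p)


# Hypothetical allele counts for eight individuals (illustration only).
target_snp = [0, 1, 2, 1, 0, 2, 1, 0]
proxy_snp = [0, 1, 2, 1, 0, 2, 0, 0]
print(round(genotype_r2(target_snp, proxy_snp), 3))
```

Because R² depends on allele frequencies, the value should be reported together with the population in which it was estimated.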
Item 7: (a) Describe sources of data and details of methods of assessment (measurement) for each variable. (b) Give a detailed description of genotyping and other laboratory methods.
Examples: (Sources of data) ‘Phenotyping was performed by the participating gastroenterologist from each university medical center by reviewing a patient's chart retrospectively.' [7]
(Sources of data) ‘All clinical measurements were performed in practice by [the first author] (first measurement) and a nurse practitioner (second, third and fourth measurements with in-between periods of 3 months)'. [51]
(Methods of assessment) ‘Weight was measured in underwear to the nearest 0.1 kg on Soehnle electronic scales. We measured height in bare feet to the nearest 1 mm by using a stadiometer with the participant standing erect with head in the Frankfort plane. We calculated body mass index as weight (kilograms)/height (metres) squared. We measured waist circumference, taken as the smallest circumference at or below the costal margin, with participants unclothed in the standing position by using a fibreglass tape measure at 600 g tension. We measured systolic blood pressure and diastolic blood pressure twice in the sitting position after five minutes' rest with the Hawksley random zero sphygmomanometer. We took the average of the two readings to be the measured blood pressure. We took venous blood in the fasting state or at least 5 h after a light, fat-free breakfast, before a 2 h 75 g oral glucose tolerance test was done. Serum for lipid analyses was refrigerated at −4°C and assayed within 72 h. We used a Cobas Fara centrifugal analyzer (Roche Diagnostics System, Nutley, NJ, USA) to measure cholesterol and triglyceride concentrations. We measured high-density lipoprotein cholesterol by precipitating non-high-density lipoprotein cholesterol with dextran sulfate-magnesium chloride with the use of centrifuge and measuring cholesterol in the supernatant fluid. We used the Friedewald formula to calculate low-density lipoprotein cholesterol concentration'. [52]
(Outcomes) ‘Women with gestational diabetes mellitus in the years 1978–1985 were diagnosed by a 3 h, 50 g oral glucose tolerance test (OGTT), whereas women with gestational diabetes mellitus in 1987–1996 were diagnosed by a 3 h, 75 g OGTT'. [32]
(Genotyping) ‘Genotyping was performed with the use of matrix-assisted laser desorption–ionization time-of-flight mass spectrometry on a MassARRAY platform (Sequenom), as described previously. All SNPs were in Hardy–Weinberg equilibrium (P>0.001). The genotyping success rate was 96%. Using 15 samples analyzed in quadruplicate, we found the genotyping error rate to be <0.7%'. [47]
Explanation: Apart from the selection and definitions of the variables, the sources and methods used for the assessment can impact the quality of the study. Important quality concerns are the potential for misclassification of risk factors and outcomes, as well as the accuracy of genotyping. [29] Sources of data basically refer to who did the data collection and how. Were the data collected by research physicians or trained students? Were questionnaires completed in an interview or based on self-report, and was the genotyping performed in house or by a specialized laboratory? Methods of assessment refer to the specific techniques or questionnaires that were used. If methods have been published previously, provide a reference. The laboratory procedures used to measure biomarkers should be described in sufficient detail for others to be able to perform them and evaluate the generalizability of prediction models that include them. For less widely-used assessments, such as questionnaires and procedures that are developed by the researchers themselves, authors should report validity and reliability information about the quality of the assessment. [53] When different assessments are used at baseline and follow-up (eg, baseline assessments done by research physicians and follow-up assessments obtained from medical records of the general practitioner) these should be explained. When there is an arbitration process for outcomes (eg, centralized team arbitrating on outcomes based on information contributed by local investigators in peripheral centers), this process should be specified.
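The genotyping example quoted under this item reports a Hardy–Weinberg equilibrium check as one indicator of genotyping quality. The sketch below shows one simple way such a check could be computed for a biallelic SNP: a chi-square goodness-of-fit test with one degree of freedom. The genotype counts are hypothetical, and exact tests are often preferred for small samples or rare alleles, so this is an illustration rather than the method used in the cited study.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions for genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic variant: test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts (AA, AB, BB) for illustration only.
chi2, p_value = hwe_chi_square(400, 400, 200)
print(f"chi-square = {chi2:.2f}, P = {p_value:.2g}")
```

When reporting such a check, the P-value threshold used to flag deviation should be stated, as in the quoted example.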
Item 8: (a) Describe how genetic variants were handled in the analyses. (b) Explain how other quantitative variables were handled in the analyses. If applicable, describe which groupings were chosen, and why.
Examples: (Genetic variants) ‘Using these 18 SNPs, we constructed a genotype score ranging from 0 to 36 on the basis of the number of risk alleles (see for coding of the risk alleles)'. [48]
(Genetic variants) ‘For the first analysis of the effects of the polymorphic DNA variants, we used additive genetic models. In addition, we tested dominant and recessive alternative models for the best fit […]. Multivariate linear regression analyses were used to test correlations between genotype and phenotype. Non-normally distributed variables were log-transformed before analysis. The effect size of a genetic or clinical risk factor on the risk of type 2 diabetes was calculated from multivariate regression analysis, with adjustment for age and sex, with the use of Nagelkerke R square. We estimated the predictive value of a combination of risk alleles (each person could have 0, 1, or 2 of them, for a total of 22) in 11 genes, which significantly predicted the risk of diabetes by defining subjects with more than 12 risk alleles (about 20%) as being at high risk and those with fewer than 8 risk alleles (about 20%) as being at low risk'. [33]
(Other variables) ‘Multivariate unconditional logistic regression analysis was performed to evaluate the relationships between prevalence or progression of AMD and all the genotypes plus various risk factors, controlling for age (70 years or older versus younger than 70), sex and education (high school or less versus more than high school), cigarette smoking (never, past, or current) and body mass index (BMI), which was calculated as the weight in kilograms divided by the square of the height in meters (<25, 25–29.9, and 30+)'. [6]
Explanation: There are many approaches to data analysis of genetic variants; thus, specification and clarification of this handling is particularly relevant. Genetic variants may be entered in regression analysis separately as, for example, dominant or recessive effects [54, 55], per allele (additive or log-additive) effects [32] or genotype categories [42, 56]. Any of these three approaches can be followed depending on what was the best fitting genetic model for each variant. [6, 7, 8] Alternatively, genetic variants may be entered combined as risk scores. [33, 47, 52] Risk scores often simply sum the number of risk alleles or genotypes (unweighted), or sum these weighted by their β-coefficients from regression analyses (weighted). When using risk scores, authors should explain which of the alleles or genotypes is considered as the risk variant, as this is not necessarily the less common (minor) variant (see ). The description of the coding of the genetic variants should enable other researchers to replicate the analyses for validation or updating of the risk model.
Quantitative variables can be handled as continuous or be categorized. Transformations may be required when the relationships between the variables and the outcome are not linear, and these should be specified. Frequently, quantitative variables are categorized before inclusion in the analyses. A well-known example is body mass index, which is categorized as underweight, normal weight, overweight and obese. The rationale and thresholds used for categorization should be explained, particularly when they deviate from commonly used cut-offs based on clinical or epidemiological studies.
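The unweighted and weighted risk scores described under this item can be sketched as follows. This is a minimal illustration, not the coding used in any of the cited studies: the SNP labels, risk alleles and β-coefficients are hypothetical, and genotypes are assumed to be given as two-letter strings.

```python
def risk_allele_counts(genotypes, risk_alleles):
    """Count risk alleles (0, 1 or 2) per SNP for one person."""
    counts = {}
    for snp, risk_allele in risk_alleles.items():
        genotype = genotypes[snp]
        if len(genotype) != 2:
            raise ValueError(f"{snp}: expected a two-letter genotype, got {genotype!r}")
        counts[snp] = genotype.count(risk_allele)
    return counts


def unweighted_score(genotypes, risk_alleles):
    """Sum of risk alleles across SNPs (each SNP contributes 0, 1 or 2)."""
    return sum(risk_allele_counts(genotypes, risk_alleles).values())


def weighted_score(genotypes, risk_alleles, betas):
    """Sum of risk-allele counts, each multiplied by its per-allele beta."""
    counts = risk_allele_counts(genotypes, risk_alleles)
    return sum(betas[snp] * n for snp, n in counts.items())


# Hypothetical coding: which allele is the risk allele must be reported explicitly,
# because it is not necessarily the minor allele.
risk_alleles = {"snp_a": "A", "snp_b": "T", "snp_c": "G"}
betas = {"snp_a": 0.10, "snp_b": 0.25, "snp_c": 0.05}  # hypothetical log odds ratios
person = {"snp_a": "AG", "snp_b": "TT", "snp_c": "CC"}

print(unweighted_score(person, risk_alleles))                 # ranges from 0 to 2 x number of SNPs
print(round(weighted_score(person, risk_alleles, betas), 2))
```

With 18 SNPs, the unweighted score in this coding ranges from 0 to 36, as in the first example quoted above. Reporting the risk allele and weight for every variant is what allows others to recompute the score.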
Item 9: Specify the procedure and data used for the derivation of the risk model. Specify which candidate variables were initially examined or considered for inclusion in models. Include details of any variable selection procedures and other model-building issues. Specify the horizon of risk prediction (eg, 5-year risk).
Examples: (Model derivation) ‘We constructed multivariable proportional-hazards models to examine the association between the genotype score and the time to the first cardiovascular event, excluding subjects who had had a previous myocardial infarction or ischemic stroke. We first confirmed that the proportional-hazards assumption was met. The hazard ratio for the genotype score as a continuous measure was estimated in a model adjusting for all 14 available baseline covariates. Cumulative incidence curves were constructed according to the genotype score with the use of Cox regression analysis.' [47]
(Variable selection) ‘Twenty-three candidate genes involved in the pathogenesis of inflammation and myocardial ischemia-reperfusion injury were selected a priori based on previous transcription profiling in humans and animal models, pathway analysis, a review of linkage and association studies reported in the literature, and expert opinion. Forty-eight SNPs were subsequently selected in these process-specific candidate genes, based on literature review, genomic context, and predictive analyses with an emphasis on functionally important variants'. [54]
(Model building issues) ‘Both univariate and multivariate odds ratios (ORs) were calculated with a binary-logistic regression model … to evaluate the relationship between polymorphisms and prevalent CVD. For that purpose, dummy variables were created using the homozygous wild-type genotype as reference category. Age and gender, both demographic variables, were incorporated in both the univariate as well as in the multivariate linear regression analyses … Adjustment for potential confounders was performed by incorporating smoking, alcohol, diabetes mellitus, waist circumference, serum creatinine, mean systolic and diastolic blood pressure, microalbuminuria and dyslipidaemia into these models. To avoid collinearity, waist circumference was used instead of waist-to-hip ratio or body mass index and condensed measures such as diabetes and dyslipidaemia were used, as defined earlier'. [51]
Explanation: Because of the potential for flexibility in the derivation of the risk model, authors need to clarify why and how they constructed the model as they did and which data they used. This clarification includes a specification of the variables, defined in item 6, that were initially considered and which procedures were followed for a final selection (eg, backward deletion or forward inclusion, and the criteria for deletion and inclusion), if applicable. Clarification also includes a specification of the study participants included in the analysis, if different from the total study population, transformations of the variables, the choice of statistical model (eg, logistic or Cox proportional hazards models), and the handling of interaction effects between predictors in the model (see also item 13). The specification also concerns the rationale for constructing separate models for subgroups, eg, for different ethnic groups, or including the stratification variable as a variable or interaction effect in a model for the total population.
Authors should also specify and explain the horizon of the risk prediction, when appropriate (eg, in cohort studies, whether the model predicts, for instance, 5 year or lifetime risk). When more complicated risk prediction models are developed using statistical learning methods such as regularized regression or support vector machines, these should be explained and specified in sufficient detail that others can implement these models in other data sets. For some more complex ‘black box' models (such as random forests) this may require making a software implementation of the final model available. The description of the data used should include whether a selection of the population was used for the derivation of the model, how this sub-population was selected, and how censored data were handled in cohort studies.
Some studies aim only to validate and further apply an already existing model. In this case, it should simply be stated that a previous model was used with appropriate reference to the previous study or studies that developed the model along with a succinct description of its features.
Item 10: Specify the procedure and data used for the validation of the risk model.
Example: ‘The internal validity of the prediction models was assessed using bootstrapping techniques. A total of 100 random bootstrap samples were drawn with replacement from the (total) group of 1337 patients. The discriminative accuracy of the 100 prediction models as fit on these bootstrap samples was determined for each bootstrap sample and for the original group (n=1,337). This comparison gives an impression of how ‘overoptimistic' the model is, ie, how much the performance of the model would deteriorate when applied to a new group of similar patients.' [36]
‘Evaluation of model predictive performance using the same dataset used for fitting the model usually leads to a biased assessment. To obtain an unbiased assessment of discriminatory power of the multivariate regression models, a tenfold cross-validation was used in the ROC analysis and in the IDI analysis. Tenfold cross-validation randomly divides the data into ten (roughly) equal subsets and repeatedly uses any nine subsets for model fitting and the remaining subset as validation until each of the ten subsets has been used exactly once as validation data.' [57]
Explanation: Assessment of the risk model in the same population as that from which the model was derived generally leads to more positive conclusions than when the evaluation is conducted in an independent population. [58] Therefore, validation of the risk model, reassessing the performance of the model in another dataset, is an essential part of model evaluation [59], especially when models are developed with the specific intention to apply them in health care. There are two main types of validation: internal validation in the same population or external validation in an independent sample. Internal validation is useful to prevent optimistic assessments, but it does not inform about the performance of the model in other samples of the same population. [60] Moreover, many methods of standard internal validation, such as cross-validation, can still give inflated estimates of classification accuracy, even if properly performed. Authors should report whether they performed (internal or external) validation, and describe the procedure of the validation process. For example, for internal validation, authors should describe what part of the population was used to derive the risk model and what part was used for the validation, and whether they, for example, used cross-validation or bootstrapping techniques. [60] For external validation, they should describe the populations that are used for the validation, particularly the comparability with the population that was used to derive the risk model. If the model is already validated elsewhere in previous research, this should also be stated. So far, none of the genetic risk prediction studies has performed an external validation of the risk model. [3]
Item 11: Specify how missing data were handled.
Examples: ‘Variables with missing values were hypertension (1%), smoking (10%), BMI (14%), plasma HDL cholesterol (19%), plasma LDL cholesterol (20%) and plasma triglycerides (16%). We applied a multiple imputation method (aregImpute function of the R statistical package; version 2.5.1; http://www.r-project.org) to impute these missing values in our Cox proportional hazards models because imputation decreases bias in the hazard ratios that may occur when patients with incomplete information are excluded from the analysis. In a secondary analysis, we used the full data set (n=2,145) and multiple imputation to impute both missing values for conventional risk factors and missing genotype data. This analysis gave discriminative accuracies for the 3 prediction models virtually identical to the analysis without imputation of missing genotype data […].' [36]
Explanation: Missing data are inevitable in observational studies. Authors should specify the percentage of missing values in their data, indicate whether there are theoretical or empirical grounds that missingness could be non-random, and specify how missing data were handled in the analyses. Authors should specify the methods used to deal with the missing data (eg, complete case analysis, imputation, reweighting) and the assumptions that underlie this choice. Assumptions may include the distribution of the data and whether data were missing completely at random, or related to other variables, including the outcome of the study. [61]
Item 12: Specify all measures used for the evaluation of the risk model including, but not limited to, measures of model fit and predictive ability.
Examples: ‘We calculated odds ratios and 95% confidence intervals associated with each additional risk allele for each SNP individually and in the genotype score. Using C statistics …, we evaluated the discriminatory capability of the models with the genotype score as compared with the models without the genotype score. We also evaluated risk reclassification with the use of the genotype score, according to the method developed by Pencina et al for determining net reclassification improvement. We assessed model calibration using the Hosmer–Lemeshow chi-square test. We used categories of genotype score to calculate likelihood ratios and posterior probabilities of diabetes. Statistical analyses were performed with the use of SAS software, version 8 (SAS Institute). A two-tailed P-value of ≤0.05 was considered to indicate statistical significance'. [48]
‘Our primary measure of discrimination was the Harrell c-index, a generalization of the area under the receiver-operating characteristic curve that allows for censored data. The c-index assesses the ability of the risk score to rank women who develop incident cardiovascular disease higher than women who do not. We assessed general calibration across deciles of predicted risk by using the Hosmer–Lemeshow goodness-of-fit test to compare the average predicted risk with the Kaplan–Meier risk estimate within each decile and considered a chi-square value of 20 or higher (P<0.01) to be poor calibration. We assessed risk reclassification by sorting the predicted 10-year risk for each model into 4 categories (<5, 5 to <10, 10 to <20, and ≥20%). We then compared the assigned categories for a pair of models. For each pair, we calculated the proportion of participants who were reclassified by the comparison model versus the reference model; we considered reclassification to be correct if the Kaplan–Meier risk estimate for the reclassified group was closer to the comparison category than the reference. We computed the Hosmer–Lemeshow statistic for the reclassification tables, which assesses agreement between the Kaplan–Meier risk estimate and predicted risk within the reclassified categories. We also computed the Net Reclassification Improvement, which compares the shifts in reclassified categories by observed outcome, and the Integrated Discrimination Improvement, which directly compares the average difference in predicted risk for women who go on to develop cardiovascular disease with women who do not for the 2 models, on the women who were not censored before 8 years'. [56]
Explanation: A thorough assessment of a risk prediction model comprises many different aspects, but generally includes at least the following questions: (1) How well does the model fit the underlying data?; and (2) What is the predictive ability of the model? Several measures are available to answer each question, and the methods section should clearly describe which measures were used to answer which questions. [4, 62] Measures of model fit (also referred to as calibration) include the Hosmer–Lemeshow statistic, R², log-likelihood and Akaike information criterion (AIC), and measures of predictive ability (also called discrimination measures) include the area under the receiver operating characteristic curve (AUC), discrimination slope and Brier score. These measures can be accompanied by figures and tables, including calibration plots (eg, [60]), risk distributions (see ), AUC plots (see ), discrimination plots (eg, [63]) and predictiveness curves (eg, [64]). The description of the methods used should also clarify what measures of uncertainty are employed (eg, 95% confidence intervals) and specify any tests used to determine the significance of the findings. When P-values are reported, authors should indicate what P-value threshold they considered for statistical significance.
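To make two of the measures named above concrete, the following minimal sketch computes the AUC (as the proportion of case–non-case pairs in which the case has the higher predicted risk, with ties counted as one half) and the Brier score (the mean squared difference between predicted risk and observed outcome) for a binary outcome without censoring. The predicted risks and outcomes are hypothetical; censored data would require, for example, the Harrell c-index mentioned in the quoted example.

```python
def auc(risks, outcomes):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(cases) * len(controls))


def brier_score(risks, outcomes):
    """Mean squared difference between predicted risk and observed outcome (0 or 1)."""
    if not risks:
        raise ValueError("no predictions supplied")
    return sum((r - y) ** 2 for r, y in zip(risks, outcomes)) / len(risks)


# Hypothetical predicted risks and observed outcomes for four people.
risks = [0.10, 0.40, 0.35, 0.80]
outcomes = [0, 0, 1, 1]
print(auc(risks, outcomes))
print(round(brier_score(risks, outcomes), 4))
```

In a report, each such estimate would be accompanied by a measure of uncertainty, such as a 95% confidence interval.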
When two risk models are compared and one is an expanded version of the other, the assessment of the risk models includes the two questions for each model. Increases in AUC or in discrimination slope (called integrated discrimination improvement, IDI) provide simple ways to assess improvement of one model over the other. [58] Recent studies have also assessed whether the improvement of risk models also reclassifies people into different risk categories. [2, 65] These measures of reclassification, such as the percentage of total reclassification and net reclassification improvement [4, 66], are calculated from a reclassification table (). When risk categories are used (eg, for the calculation of reclassification measures), the rationale for the cut-off values should be presented with either appropriate reference to previous work showing that this is a standard choice, or appropriate justification for the choice of cut-offs made by the authors. When several different cut-off categorizations have been studied, all of them should be reported.
Table 4. Example table: net reclassification improvement based on addition of gene count score to Framingham offspring risk score
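A categorical net reclassification improvement of the kind summarized in a reclassification table can be sketched as follows: the net proportion of people with the outcome who move to a higher risk category, plus the net proportion of people without the outcome who move to a lower category. The cut-offs of 5, 10 and 20% mirror the categories in the example quoted under this item; the predicted risks and outcomes are hypothetical, and the sketch ignores censoring.

```python
def risk_category(risk, cutoffs):
    """Index of the risk category, given cut-offs in ascending order."""
    return sum(risk >= cutoff for cutoff in cutoffs)


def net_reclassification_improvement(old_risks, new_risks, outcomes, cutoffs):
    """Categorical NRI: net upward moves in events plus net downward moves in non-events."""
    moves = {1: {"up": 0, "down": 0, "n": 0}, 0: {"up": 0, "down": 0, "n": 0}}
    for old, new, outcome in zip(old_risks, new_risks, outcomes):
        group = moves[outcome]
        group["n"] += 1
        old_cat = risk_category(old, cutoffs)
        new_cat = risk_category(new, cutoffs)
        if new_cat > old_cat:
            group["up"] += 1
        elif new_cat < old_cat:
            group["down"] += 1
    events, nonevents = moves[1], moves[0]
    if events["n"] == 0 or nonevents["n"] == 0:
        raise ValueError("need at least one event and one non-event")
    event_part = (events["up"] - events["down"]) / events["n"]
    nonevent_part = (nonevents["down"] - nonevents["up"]) / nonevents["n"]
    return event_part + nonevent_part


# Hypothetical predicted risks from a reference model and an expanded model.
cutoffs = (0.05, 0.10, 0.20)
old_risks = [0.04, 0.12, 0.08, 0.22, 0.15, 0.03]
new_risks = [0.06, 0.21, 0.07, 0.18, 0.09, 0.02]
outcomes = [1, 1, 0, 0, 0, 0]
print(net_reclassification_improvement(old_risks, new_risks, outcomes, cutoffs))
```

Because the result depends on the chosen cut-offs, the rationale for them should be reported, and results for every categorization that was examined should be shown.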
Item 13: Describe all subgroups, interactions and exploratory analyses that were examined.
Examples: (Subgroups) ‘[In introduction:] However, it remains unknown whether all these genetic and environmental factors act independently or jointly and to what extent they as a group can predict the occurrence of AMD or progression to advanced AMD from early and intermediate stages. Such information may be useful for screening those at high risk due to a positive family history or having signs of early or intermediate disease, among whom some progress to advanced stages of AMD with visual loss. … [In Methods:] Individuals with advanced AMD were compared to the control group of persons with no AMD, and progressors were compared to non-progressors with regard to genotype and risk factor data'. [6]
(Interactions) ‘Multiplicative interactions were tested for each pair of [all 6] SNPs by including both main effects and an interaction term (a product of two main effects) in a logistic regression model'. [67]
Explanation: For the evaluation of the predictive performance there might be subgroups in which the risk model performs better than in the initial study population, and there might be genetic variants that jointly have a larger impact on disease risk. The large number of possible analyses that include subgroups or interactions, however, increases the likelihood of finding at least some statistically significant effect by chance. [68] Authors should therefore not only clarify all additional subgroup analyses they performed, but also indicate whether these were planned based on a priori clinical or epidemiological evidence, or arose in an exploratory manner. Similarly, authors should also explain whether interaction effects were considered and, if so, which ones and why, and how the selection in the final model was done (see item 9). These descriptions should include any methods used to prevent overinterpretation of the results, for example, methods that adjust the P-value thresholds for multiple testing. Planned analyses of subgroups and interactions should logically follow from the introduction (see item 3); exploratory analyses can be introduced in the methods.
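The growth in the number of tests, and one simple adjustment for it, can be illustrated with a short sketch. Testing every pair among six SNPs, as in the interaction example quoted above, gives 15 tests; a Bonferroni correction divides the significance threshold by that number. Bonferroni is only one of several possible adjustments and is used here as an assumption for illustration; the SNP labels and P-values are hypothetical.

```python
from itertools import combinations


def pairwise_interactions(snps):
    """All unordered pairs of SNPs that could be tested for interaction."""
    return list(combinations(snps, 2))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("need at least one test")
    return alpha / n_tests


def significant_after_correction(p_values, alpha=0.05):
    """Keep the tests whose P-value falls below the Bonferroni-adjusted threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {test: p for test, p in p_values.items() if p < threshold}


snps = ["snp1", "snp2", "snp3", "snp4", "snp5", "snp6"]  # hypothetical labels
pairs = pairwise_interactions(snps)
print(len(pairs), bonferroni_threshold(0.05, len(pairs)))

# Hypothetical P-values: all pairs set to 0.20 except two of interest.
p_values = {pair: 0.20 for pair in pairs}
p_values[("snp1", "snp2")] = 0.001
p_values[("snp3", "snp4")] = 0.02
print(significant_after_correction(p_values))
```

In this hypothetical case a P-value of 0.02 would pass an unadjusted threshold of 0.05 but not the adjusted one, which is why the adjustment method and the number of analyses performed should both be reported.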