Five centres (Auckland, New Zealand; Adelaide, Australia; London and Manchester, UK; and Cork, Ireland) recruited nulliparous women with singleton pregnancies to the SCOPE study between November 2004 and August 2008.19
Women (n=4961) attending hospital antenatal clinics, obstetricians, general practitioners, or community midwives before 15 weeks’ gestation were invited to participate. Exclusion criteria included being recognised as at high risk of pre-eclampsia, a small for gestational age baby, or spontaneous preterm birth because of underlying medical conditions (chronic hypertension requiring antihypertensive drugs, diabetes, renal disease, systemic lupus erythematosus, antiphospholipid syndrome, sickle cell disease, HIV), previous cervical knife cone biopsy, three or more abortions or three or more miscarriages, or current ruptured membranes; known major fetal anomaly or abnormal karyotype; or an intervention that could modify the outcome of pregnancy (such as aspirin or cervical suture).19
A research midwife interviewed and examined women at 14-16 and 19-21 weeks’ gestation. Women underwent an ultrasound scan at 19-21 weeks. At the time of interview, data were entered on an internet accessed central database with a complete audit trail (MedSciNet).
At 14-16 weeks’ gestation the following data were collected: demographic information including age, ethnicity, immigration details, education, work, socioeconomic index, income level, living situation; the woman’s birth weight and gestation at delivery and whether it was a singleton or multiple pregnancy; previous miscarriages, abortions, or ectopic pregnancies and whether these pregnancies were with the same partner as the current pregnancy or not; history of infertility, use of assisted reproductive technologies, duration of sexual relationship, and exposure to partner’s sperm; gynaecological (including polycystic ovarian syndrome) and medical history, including hypertension while taking combined oral contraception, asthma, urinary tract infection, inflammatory bowel disease, thyroid disease, and thromboembolism; and family history (in mother and sisters) of obstetric complications (miscarriage, pre-eclampsia, eclampsia, gestational hypertension, spontaneous preterm birth, any preterm birth, gestational diabetes, stillbirth, and neonatal death) and family history (mother, father, sibling) of medical conditions (hypertension, coronary artery heart disease, cerebrovascular accident, type 1 and 2 diabetes, and venous thromboembolism).
Information was collected on vaginal bleeding early in pregnancy (gestation, severity and duration of bleeding, and recurrent bleeds), hyperemesis, and infections during pregnancy. Vegetarian status was recorded, and other dietary information before conception and during pregnancy was obtained from food frequency questions for fruit, green leafy vegetables, oily and other fish, and fast foods. Use of folate and multivitamins, cigarettes, alcohol (including binge drinking), and recreational drugs (including marijuana, amphetamine, cocaine, heroin, ecstasy, LSD (lysergide)) was recorded for before conception, first trimester, and at 15 weeks. A lifestyle questionnaire was completed on work, exercise and sedentary activities, snoring, domestic violence, and social supports. Psychological scales were completed to measure perceived stress,20 and behavioural responses to pregnancy (adapted from the behavioural responses to illness questionnaire23). Two consecutive manual blood pressure measurements (mercury or aneroid sphygmomanometer, with a large cuff if the arm circumference ≥33 cm and Korotkoff V for diastolic blood pressure) were recorded. Other maternal measurements included maternal height and weight and waist, hip, arm, and head circumference. Proteinuria in a midstream urine specimen was measured by dipstick or a protein:creatinine ratio. Random whole blood glucose and serum lipid concentrations (triglycerides, total cholesterol, high density lipoprotein cholesterol, low density lipoprotein cholesterol, total cholesterol:high density lipoprotein cholesterol ratio) were also measured.
Ultrasound examination at 19-21 weeks’ gestation included measurements of the fetus (biparietal diameter, head circumference, abdominal circumference, and femur length) and Doppler studies of the umbilical and uterine arteries.24
All fetal measurements were adjusted for gestational age by calculating the multiple of the median for each gestational week. Mean uterine resistance index (RI) was calculated from the left and right uterine resistance index. If only a left or right uterine resistance index was available, this was used as “mean resistance index” (n=20). Notching of each uterine artery was recorded. An abnormal uterine artery Doppler result was defined as a mean resistance index >90th centile (>0.695).
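The derived ultrasound quantities described above can be sketched as follows. This is a minimal illustration, not the study's software: the function names are hypothetical, and the reference median for each gestational week must come from a table that the text does not supply.

```python
from statistics import mean
from typing import Optional

# 90th centile of mean uterine artery RI, as quoted in the text.
ABNORMAL_RI_THRESHOLD = 0.695


def multiple_of_median(measurement: float, median_for_week: float) -> float:
    """Express a fetal measurement as a multiple of the median (MoM)
    for its gestational week. The median is an assumed input taken
    from a reference table."""
    if median_for_week <= 0:
        raise ValueError("median must be positive")
    return measurement / median_for_week


def mean_uterine_ri(left: Optional[float], right: Optional[float]) -> Optional[float]:
    """Mean of the left and right uterine artery RI. If only one side
    is available, that value is used as the 'mean', as in the text."""
    available = [value for value in (left, right) if value is not None]
    if not available:
        return None
    return mean(available)


def is_abnormal_doppler(mean_ri: float) -> bool:
    """Abnormal result: mean RI above the 90th centile (>0.695)."""
    return mean_ri > ABNORMAL_RI_THRESHOLD
```

Note that the comparison is strict, so a mean resistance index of exactly 0.695 is not classed as abnormal.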
Participants were followed prospectively, and research midwives collected data on pregnancy outcome and measurements of the baby. Data monitoring included individual checks of all data for each participant, including checks for any transcription errors of the lifestyle questionnaire, and detection of illogical or inconsistent data and outliers with customised software.
Our primary outcome was pre-eclampsia defined as systolic blood pressure ≥140 mm Hg or diastolic blood pressure ≥90 mm Hg, or both, on at least two occasions four hours apart after 20 weeks’ gestation but before the onset of labour, or postpartum, with either proteinuria (24 hour urinary protein ≥300 mg or spot urine protein:creatinine ratio ≥ 30 mg/mmol creatinine or urine dipstick protein ≥++) or any multisystem complication of pre-eclampsia.19 25
Multisystem complications included any of acute renal insufficiency defined as a new increase in serum creatinine concentration ≥100 µmol/L antepartum or >130 µmol/L postpartum; effects on liver, defined as raised aspartate transaminase or alanine transaminase concentration, or both, >45 IU/L and/or severe right upper quadrant or epigastric pain or liver rupture; neurological effects included eclampsia, imminent eclampsia (severe headache with hyper-reflexia and persistent visual disturbance), or cerebral haemorrhage; and haematological effects included thrombocytopenia (platelets <100×10⁹/L), disseminated intravascular coagulation, or haemolysis. The reference group was women who did not develop pre-eclampsia.
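The logical structure of the outcome definition (qualifying hypertension together with either proteinuria or a multisystem complication) can be written out as a short sketch. The class and function names are hypothetical, the dipstick is encoded as a count of plus signs for illustration only, and the timing criteria for blood pressure (two occasions four hours apart, after 20 weeks) are assumed to have been checked by the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UrineResults:
    """Any subset of the three proteinuria tests may be available."""
    protein_24h_mg: Optional[float] = None    # 24 hour urinary protein (mg)
    pcr_mg_per_mmol: Optional[float] = None   # spot protein:creatinine ratio
    dipstick_plusses: Optional[int] = None    # 0 = negative, 1 = +, 2 = ++, ...


def is_hypertensive_reading(systolic: float, diastolic: float) -> bool:
    """Systolic >=140 mm Hg or diastolic >=90 mm Hg, or both."""
    return systolic >= 140 or diastolic >= 90


def has_proteinuria(urine: UrineResults) -> bool:
    """True if any available test meets its threshold from the text."""
    return (
        (urine.protein_24h_mg is not None and urine.protein_24h_mg >= 300)
        or (urine.pcr_mg_per_mmol is not None and urine.pcr_mg_per_mmol >= 30)
        or (urine.dipstick_plusses is not None and urine.dipstick_plusses >= 2)
    )


def meets_pre_eclampsia_definition(
    qualifying_hypertension: bool,
    urine: UrineResults,
    multisystem_complication: bool,
) -> bool:
    """Hypertension plus either proteinuria or a multisystem complication."""
    return qualifying_hypertension and (
        has_proteinuria(urine) or multisystem_complication
    )
```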
The estimated date of delivery was calculated as follows: if the woman was certain of the date of her last menstrual period (LMP), the estimated date of delivery was adjusted only if a scan at <16 weeks’ gestation found a difference of seven or more days between the scan gestation and that calculated by the LMP or a scan at 19-21 weeks found a difference of 10 or more days. If her date was uncertain, scan dates were used to calculate the estimated date of delivery. Preterm pre-eclampsia was pre-eclampsia resulting in delivery before 37⁺⁰ weeks’ gestation. Small for gestational age was defined as a birth weight below the 10th customised centile, adjusted for maternal height, booking weight, ethnicity, and delivery gestation and infant’s sex.26 27
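The dating rule above amounts to a small decision procedure, sketched here under stated assumptions. The function names are hypothetical. The text does not say which scan takes precedence when both are available, or which scan is used when the LMP is uncertain; this sketch assumes the earlier scan is considered first in both cases. The discrepancy in gestation is taken to equal the difference in days between the two candidate delivery dates.

```python
from datetime import date
from typing import Optional


def estimated_date_of_delivery(
    lmp_certain: bool,
    edd_by_lmp: Optional[date],
    edd_by_early_scan: Optional[date] = None,  # scan at <16 weeks
    edd_by_mid_scan: Optional[date] = None,    # scan at 19-21 weeks
) -> date:
    """Choose between LMP and scan dating using the thresholds in the text."""
    if not lmp_certain or edd_by_lmp is None:
        # Uncertain LMP: use scan dates (assumption: earliest scan available).
        scan = edd_by_early_scan if edd_by_early_scan is not None else edd_by_mid_scan
        if scan is None:
            raise ValueError("no scan available to date the pregnancy")
        return scan
    if edd_by_early_scan is not None:
        if abs((edd_by_early_scan - edd_by_lmp).days) >= 7:
            return edd_by_early_scan
    if edd_by_mid_scan is not None:
        if abs((edd_by_mid_scan - edd_by_lmp).days) >= 10:
            return edd_by_mid_scan
    return edd_by_lmp


def is_preterm_delivery(gestation_days: int) -> bool:
    """Delivery before 37+0 weeks, that is, before 259 completed days."""
    return gestation_days < 37 * 7
```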
The number of women required to be screened was based on achieving suitable screening test characteristics and precise estimates of their values. Given a pretest probability (prevalence) of pre-eclampsia of 5%, then a post-test probability of 30% or greater would make this a useful test, based on current clinical practice. The algorithm must therefore have sufficient ability such that it is unlikely the post-test probability will fall below 0.30 (30%) for pre-eclampsia. This can be attained, with a power of 80%, in a cohort of 3000 if the true positive likelihood ratio of the screening test is 9.2 to 10.0. Given a prevalence of 5%, if we observe a sensitivity of 90% this cohort size will give a 95% confidence interval for this sensitivity of 84.0 to 94.3, and a specificity of 91%.
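The link between the 5% pretest probability, the positive likelihood ratio, and the 30% post-test target follows from the odds form of Bayes' theorem. A minimal sketch (the function names are hypothetical) is given below; it reproduces only this arithmetic, not the power calculation itself.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert the pretest probability to odds, multiply by the
    likelihood ratio, and convert back to a probability."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


if __name__ == "__main__":
    for lr in (9.2, 10.0):
        print(lr, round(post_test_probability(0.05, lr), 3))
```

With a prevalence of 5%, likelihood ratios of 9.2 and 10.0 give post-test probabilities of about 0.33 and 0.34, both above the 0.30 target. A sensitivity of 90% with a specificity of 91% corresponds to a positive likelihood ratio of 10.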
We used two datasets to construct the predictive models for pre-eclampsia. The first comprised clinical variables obtained at 14-16 weeks’ gestation and the second comprised clinical data at 14-16 weeks combined with variables from the 19-21 week ultrasound scan. Of the 933 original and derived variables recorded, we excluded variables added after recruitment commenced (n=76), paternal variables (n=48), variables not applicable to prediction of pre-eclampsia (n=246), variables with more than 10% missing data (clinical laboratory data and work variables not applicable to women not working, n=27), and 402 variables with P>0.10 on univariable comparison of women with and without pre-eclampsia. Of the remaining 134 variables, we selected 38 as candidate predictors on the following criteria: known potential risk factors for pre-eclampsia, ease of collection in the clinical setting, and potential applicability to future populations (see table A in appendix 1 on bmj.com for a full list of variables). With this approach the only established “risk” factor not included in the candidate predictors was cigarette smoking, as in our dataset this was not associated with pre-eclampsia. We added the number of cigarettes smoked a day at 15 weeks’ gestation as a candidate predictor, giving a total of 39 variables for the multivariable analysis. The other 96 variables were not included as candidate predictors because of collinearity (n=61), a low cell count (<5) in the χ² test (n=11), lack of a consistent relation with pre-eclampsia in the literature (n=4), or because they were not readily applicable to a future obstetric population (n=20).
Among the 39 candidate predictors, data were complete in 32, missing in <1% for six variables, and missing in 6% for participant’s birth weight. We imputed missing continuous data (n=4) with expectation maximisation and used the median for three variables unrelated to other observed data. The expectation maximisation algorithm was implemented in the “mix” package in R, version 220.127.116.11 29
To evaluate the imputation, we used a permutation technique on the complete dataset. For each variable, we systematically removed each data point in turn and imputed the “missing” value using expectation maximisation. We calculated the ratio of the mean absolute error between imputed and original values to the mean value for that variable. The mean ratio for the variables imputed with expectation maximisation was 10.8%.
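The error ratio used to evaluate the imputation can be illustrated with a short sketch. The function names are hypothetical, and the imputer shown is a simple mean of the remaining values, used purely as a stand-in: the study used expectation maximisation, which also draws on the other variables and is not reproduced here.

```python
from statistics import mean
from typing import Callable, List, Sequence


def leave_one_out_error_ratio(
    values: Sequence[float],
    impute: Callable[[List[float]], float],
) -> float:
    """Remove each data point in turn, impute it from the remaining
    values, and return mean absolute error divided by the variable mean."""
    absolute_errors = []
    for index, true_value in enumerate(values):
        remaining = list(values[:index]) + list(values[index + 1:])
        absolute_errors.append(abs(impute(remaining) - true_value))
    return mean(absolute_errors) / mean(values)


def mean_imputer(remaining: List[float]) -> float:
    """Stand-in imputer (an assumption, not the EM algorithm)."""
    return mean(remaining)
```

Any callable that returns an imputed value from the remaining data can be passed in place of `mean_imputer`, so the same loop could wrap a model-based imputer.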
We used SAS (version 9.1) for univariable data analysis and to generate a multivariable logistic regression model. We used Student’s t test, Wilcoxon rank sum test, or χ² test for comparing characteristics in the study population and pregnancy outcomes between women who did and did not develop pre-eclampsia. Stepwise logistic regression was used to determine independent risk factors for pre-eclampsia in both datasets. The order of variable selection was determined by the χ² statistic for each potential variable and the forward selection step could be followed by removal of variables in one or more backward elimination steps. We calculated receiver operating characteristics curves and determined screening test characteristics at a 25%, 10%, and 5% false positive rate. For internal validation we evaluated the calibration and discrimination (10-fold cross validation) of the model using methods described by Altman et al.30
Calibration was assessed by plotting the observed proportion of events against the predicted probabilities. For the cross validation, participants were stratified by region (New Zealand, Australia, Ireland, and UK), pre-eclampsia status (positive or negative), and gestation (<260 days or ≥260 days) and randomly allocated to one of 10 groups. Tenfold cross validation was then performed, with 90% of the data used to generate a model, and estimation of disease risk was performed in the 10% remaining. These predicted values were then combined across the 10 runs and summarised by the C statistic (AUC). This entire procedure was repeated 10 times.
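The cross validation procedure described above can be sketched as follows. This is an illustration under assumptions, not the SAS implementation used in the study: all names are hypothetical, `fit` and `predict` are placeholders for whatever model is being validated, and the C statistic is computed in its Mann-Whitney form with ties counted as one half.

```python
import random
from collections import defaultdict
from typing import Any, Callable, Hashable, List, Sequence


def stratified_folds(strata: Sequence[Hashable], k: int = 10, seed: int = 0) -> List[int]:
    """Randomly allocate participants to k groups within each stratum
    (for example a tuple of region, outcome status, and gestation band)."""
    rng = random.Random(seed)
    members_by_stratum = defaultdict(list)
    for index, key in enumerate(strata):
        members_by_stratum[key].append(index)
    folds = [0] * len(strata)
    for members in members_by_stratum.values():
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[index] = position % k
    return folds


def c_statistic(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random case scores higher than a random non-case."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def cross_validated_c_statistic(
    features: Sequence[Any],
    labels: Sequence[int],
    strata: Sequence[Hashable],
    fit: Callable[[list, list], Any],
    predict: Callable[[Any, Any], float],
    k: int = 10,
    seed: int = 0,
) -> float:
    """Fit on k-1 groups, predict the held-out group, pool the
    predictions across all k runs, and summarise with the C statistic."""
    folds = stratified_folds(strata, k, seed)
    pooled = [0.0] * len(labels)
    for fold in range(k):
        train = [i for i, f in enumerate(folds) if f != fold]
        held_out = [i for i, f in enumerate(folds) if f == fold]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        for i in held_out:
            pooled[i] = predict(model, features[i])
    return c_statistic(labels, pooled)
```

Repeating the whole procedure, as the text describes, would correspond to calling `cross_validated_c_statistic` with 10 different values of `seed`.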
To determine the variables most consistently retained in the prediction models, we generated the 10 “best models,” based on the proportion of variance explained, by calculating all possible logistic regression models retaining 10 variables. We determined the frequency of each variable present in the 10 highest scoring models and identified key risk factors. We then calculated the proportion of women with specific combinations of key clinical risk factors and abnormal uterine artery Doppler at 20 weeks’ gestation who developed pre-eclampsia.
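Counting how often each variable appears among the highest scoring models can be sketched as below. The names are hypothetical and `score` is a placeholder for the model fit measure. Exhaustively enumerating every 10-variable subset of 39 candidates means several hundred million models, so this pure Python loop is only practical for small illustrative inputs; how the study's software performed the search is not described in the text.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Sequence, Tuple


def variable_frequency_in_top_models(
    candidates: Sequence[str],
    subset_size: int,
    score: Callable[[Tuple[str, ...]], float],
    top_n: int = 10,
) -> Counter:
    """Score every subset of the given size, keep the top_n by score,
    and count how many of those models contain each variable."""
    scored = [
        (score(subset), subset)
        for subset in combinations(candidates, subset_size)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    counts: Counter = Counter()
    for _, subset in scored[:top_n]:
        counts.update(subset)
    return counts
```

Variables with counts close to `top_n` are those retained most consistently across the best models.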