In this review we summarize the scientific literature on reproductive health following deployment to the first Gulf War by armed service personnel. All the studies examined had methodological limitations, making interpretation difficult. Nonetheless we conclude that for male veterans there is no strong or consistent evidence to date for an effect of service in the first Gulf War on the risk of major, clearly defined, birth defects or stillbirth in offspring conceived after deployment. Effects on specific rare defects cannot be excluded at this stage since none of the studies had the statistical power to examine them. For miscarriage and infertility, there is some evidence of small increased risks associated with service, but the role of bias is likely to be strong. For female veterans, there is insufficient information to make robust conclusions, although the weight of evidence to date does not indicate any major problem associated specifically with deployment to the Gulf. None of the studies have been able to examine risk according to particular exposures, and so possible associations with specific exposures for smaller groups of exposed veterans cannot be excluded.
We suggest that the way forward to address the question of veterans' reproductive health with confidence in the future is prospective surveillance following deployment. Anything less will result in further problems of interpretation and continued anxiety for parents, as well as prospective parents, in the armed forces.
The first reports of adverse reproductive outcomes among military personnel deployed to the 1990/91 Gulf War were anecdotal media stories of congenital malformations in the children of male veterans (Briggs 1995; Sylvester & Chambers 1995; Moehringer 1995; Reid 1996). They emerged in 1995 and 1996, several years after the war had ended and later than the news of Gulf War illness in the veterans themselves, and were alarming. From a scientific viewpoint, there was insufficient evidence to make a judgement, but concern about exposure to substances hazardous to reproduction during armed service grew. A report by the US General Accounting Office in 1994 had identified 21 potential reproductive toxicants and teratogens that were present during the 1990/91 Gulf War (US GAO 1994). The agents identified were present in smoke from oil fires, soil samples (arsenic, benzene, benzopyrene, cadmium, lead, mercury, nickel, toluene, xylene, di-n-butyl phthalate, hexachlorobenzene, hexachloroethane, pentachlorophenol, hexachlorocyclopentadiene), pesticides (carbaryl, diazinon, dichlorvos, ethanol, lindane, warfarin) and decontaminating agents (ethylene glycol monomethyl ether). Although most of these agents are potentially harmful as exposures to females, effects arising from paternal exposure to chemical and physical agents during the Gulf War were plausible. There was evidence from occupational epidemiology and animal studies that male exposure to heavy metals, solvents, paints and pesticides was associated with poor sperm quality, spontaneous abortion, birth defects and cancer in offspring (Welch et al. 1988; Olshan et al. 1991; Olshan & Faustman 1993; Savitz et al. 1994, 1997a,b; Dimich-Ward et al. 1996; Daniels et al. 1997). This increasing awareness of male-mediated toxicity in the early 1990s formed a timely backdrop to discussions of reproductive hazards resulting from the Gulf War, where the majority of veterans are men.
The results of scientific studies of reproduction in veterans of the 1990/91 Gulf War began to be published in the mid-1990s and publications are still appearing. The aim of this paper was to review the results of studies published up to the time of writing.
Birth defects are abnormalities in the structure or function of the body which are present in the foetus. They can be diagnosed before or after birth, and can sometimes result in the death of the foetus before birth (foetal death) or the parents may choose to terminate the pregnancy following diagnosis (medical termination).
A chronological summary of published studies which have investigated the relationship between deployment to the Gulf War as an armed service employee and birth defects in subsequent offspring is presented in table 1. Of the 11 studies described, seven are based on survey information collected from veterans themselves and four are based on studies making novel use of existing medical and register information.
The first survey, which was small and with no comparison group, found no excess congenital anomalies in the children of Mississippi National Guardsmen deployed to the Gulf War (Penman et al. 1996). The next was a Canadian survey which reported an increased prevalence of birth defects among veterans' children. However, this result was difficult to interpret, because increases were noted for children born before, during and after the Gulf War, and especially for minor anomalies, indicating possible biased reporting (Canadian DND 1998). A thorough survey of Australian veterans did not find evidence of an increased risk of anomalies in the children of deployed men (Sim et al. 2003), but the relatively small number of Australian veterans who were deployed to the Gulf provided the study with little power to detect small effects. Two surveys stand out as being particularly large: one from the USA (Kang et al. 2001) and the other from the UK (Doyle et al. 2004). Both reported some evidence of a modest increase in risk of birth defect for male veterans' offspring born after the war, although cautious interpretations were offered because of concern about reporting bias. The UK study was able to investigate bias because it included medical validation of reported anomalies. It confirmed the influence of biased reporting of some of the less well-defined conditions since there was little evidence for an increased risk of any type of anomaly when the analysis was restricted to confirmed conditions (Doyle et al. 2004). Notably, there were no excesses for congenital syndromes, which include Goldenhar syndrome.
All four studies that made use of existing medical records or registers originate from the USA. The first analysed military hospital discharge data for over 80000 children born to Gulf War veterans (GWV) and non-Gulf veterans (NGWV), and found no differences in the prevalence of major birth defects diagnosed at birth (Cowan et al. 1997). Although large and with a suitable control group, this study was of livebirths only and did not include infants born in non-military hospitals (Doyle et al. 1997). Media reports of increased numbers of infants born to GWV suffering with Goldenhar syndrome (characterized by abnormal development of facial structures) led to an in-depth study of the same dataset. This study found that the prevalence of likely cases of Goldenhar syndrome was higher in GWV children than in NGWV children, but the numbers of this rare condition were small and the difference was not statistically significant (Araneta et al. 1997). Goldenhar syndrome is considered a variant of hemifacial microsomia, and a recent US case–control study of this condition did not find an association with parental deployment to the Gulf 5–11 years before the child was born (Werler et al. 2005).
In an attempt to address criticism that the studies of Cowan et al. (1997) and Araneta et al. (1997) excluded children born in non-military hospitals, in particular children born to parents who had left the armed services, a large study linking routinely collected state-wide birth defects surveillance data to military databases was undertaken. For the state of Hawaii, prevalence of the 48 birth defects studied was found to be similar for children of GWV and NGWV, and also for GWV infants who were conceived before and after the Gulf War (Araneta et al. 2000). However, a later study using the same design, but including information from six US states, found a higher prevalence of specific defects in the children of GWVs compared with the children of NGWVs (Araneta et al. 2003). The specific conditions were tricuspid valve deficiency and aortic valve stenosis in the offspring of male personnel, and hypospadias in offspring of female personnel. This study has been criticized with regard to multiple testing: given the very large number of statistical tests undertaken, some ‘statistically significant’ results are expected by chance alone.
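The multiple-testing concern can be made concrete with a short calculation. The sketch below is illustrative only: it assumes one test per defect category (48, the number of categories mentioned above), a conventional significance threshold of 0.05, and independence between tests. None of these assumptions is taken from the analysis plan of the study itself, which may have involved more comparisons, and the function name is hypothetical.

```python
def chance_findings(n_tests, alpha=0.05):
    """Return (expected false positives, probability of at least one)
    when every null hypothesis is true and tests are independent."""
    expected = n_tests * alpha
    p_at_least_one = 1 - (1 - alpha) ** n_tests
    return expected, p_at_least_one


# Assumption for illustration: one test per defect category, 48 in total.
expected, p_any = chance_findings(48)
print(f"Expected chance 'significant' results: {expected:.1f}")
print(f"Probability of at least one: {p_any:.2f}")
```

Under these assumptions, two or three nominally significant results would be expected even if deployment had no effect on any defect, which is why isolated findings from such an analysis need replication before they are interpreted.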
The only other published study to present information on specific types of defect reported is the UK study (Doyle et al. 2004), which found some evidence of increased prevalence of anomalies of the genital and urinary systems. Direct comparisons between this and the linkage study described earlier are not possible because of differences in the methods of coding and grouping the conditions: the US study used the Centers for Disease Control's (CDC's) Metropolitan Atlanta Congenital Defects Program method, which was in turn adapted from the British Paediatric coding system (personal correspondence with author). Coding was based on information received from the six birth defects surveillance programmes involved in the study, and encompassed 48 different categories. The UK study used the European Surveillance of congenital anomalies system of grouping (Eurocat 1997).
For the purposes of this review, we recoded all the individual conditions from the UK study and undertook a new analysis in order to compare the two studies diagnosis-by-diagnosis. In this analysis, validated congenital anomalies from male UK GWV and NGWV were re-coded strictly according to the 48 birth defect categories of the CDC system. The findings are presented in table 2. The exercise is hampered by low numbers, making interpretation difficult but, overall, the UK results do not confirm the US results, and vice versa. Two exceptions are (i) renal agenesis/hypoplasia with excess cases observed in both studies and (ii) Down's syndrome and other chromosomal anomalies, where a deficit of cases is observed in both studies. However, these results are not statistically significant, even if the estimates from both studies are combined: for renal agenesis/hypoplasia, the combined relative risk is 2.5 (95% CI 0.9–7.4); for Down's syndrome, combined relative risk is 0.5 (95% CI 0.3–1.1).
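One common way to combine relative risks from two studies is fixed-effect, inverse-variance pooling on the log scale. The sketch below shows that technique in general terms; it is not claimed to be the method used to obtain the combined estimates quoted above, and the input values are hypothetical placeholders, not the figures from table 2. Standard errors are recovered from the reported 95% confidence limits, which assumes the limits are symmetric on the log scale.

```python
import math


def pooled_relative_risk(estimates, z=1.96):
    """Fixed-effect inverse-variance pooling of relative risks.

    estimates: list of (rr, lower, upper) tuples with 95% confidence limits.
    Returns (pooled_rr, pooled_lower, pooled_upper).
    """
    weights = []
    weighted_logs = []
    for rr, lower, upper in estimates:
        # Standard error of log(RR), recovered from the CI width.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1 / se ** 2
        weights.append(weight)
        weighted_logs.append(weight * math.log(rr))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Hypothetical inputs for illustration only (not taken from either study).
studies = [(2.0, 1.0, 4.0), (2.0, 1.0, 4.0)]
rr, lower, upper = pooled_relative_risk(studies)
print(f"Pooled RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A pooled estimate is conventionally regarded as statistically significant at the 5% level only if its 95% confidence interval excludes 1, which is the criterion applied to the combined estimates in the text.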
The conclusion from these published works is that at present there is no consistent evidence of a strong association between Gulf War deployment of servicemen and the appearance of major, clearly defined, birth defects among infants conceived after the war. For service-women, low numbers in most of the studies make conclusions difficult, but overall there is little evidence of a major effect. The portfolio of studies addressing the risk of birth defects is reasonably sound, and includes a range of methodologies and data sources. But notable limitations include low statistical power to detect rare defects or defects in the offspring of female veterans, the strong likelihood of bias in studies relying on self-report with no validation and the inability to relate outcomes to specific exposures of concern.
The term foetal death encompasses both miscarriage and stillbirth. Miscarriage is defined as in-utero death of the foetus before 20 completed weeks (USA) or 24 completed weeks (Europe) gestation of pregnancy. Stillbirth is an in-utero death at or over 20 (USA) or 24 (Europe) completed weeks of gestation. Miscarriage is much more difficult to study than stillbirth because: (i) early miscarriage may occur before the mother recognizes a pregnancy and can go unrecognized and unreported; (ii) not all women who have suffered miscarriage will seek medical treatment, and hence there may be no medical documentation of the event; (iii) there are very few medical registers of miscarriage for comparison purposes in research studies. For these reasons, research on miscarriage, especially early miscarriage, usually has to rely on self-report of cases.
There have been seven studies of foetal death in the offspring of Gulf veterans to date (see table 3), although only four (Sato et al. 1999; Kang et al. 2001; Araneta et al. 2003; Doyle et al. 2004) had sufficient statistical power to be able to address the question with reasonable confidence. All have relied on self-report of miscarriage and stillbirth. None of these studies found an association between service in the Gulf and increased risk of stillbirth in pregnancies conceived after deployment. However, it must be noted that for female veterans, the numbers of stillbirths, and the resulting statistical power, were low even in the bigger studies, and effect sizes of less than 3 times the risk would have been undetectable.
For miscarriage, two studies found statistically significant increases in miscarriage reported by male Gulf veterans compared to male NGWVs. Kang et al. (2001) reported a 65% increased risk and Doyle et al. (2004) a 40% increased risk. Neither of these studies found an association between deployment and risk of miscarriage in pregnancies conceived after the war by female veterans, but another study reported an approximate trebling of risk (Araneta et al. 2003). Somewhat paradoxically this latter study did not find a raised risk of miscarriage for conceptions occurring during the conflict, when higher exposures to potential reproductive toxins would have been expected. The authors of all three reports have suggested that these findings be treated with caution because they are based on self-reported miscarriage and the role of biased reporting must be considered. The fact that in two of these studies (Kang et al. 2001; Doyle et al. 2004) the miscarriage rates in comparison (NGWV) pregnancies appear unusually low, rather than the rates being particularly high in the veterans' (GWV) pregnancies, provides some evidence that this might indeed be the case.
The study by Araneta et al. (2004) also found a high risk of ectopic pregnancy in conceptions after the war but, again, not in conceptions occurring during the war. However, this finding is based on very small numbers of events (four ectopic pregnancies in the post-war conception group) and was not found in another, larger, investigation (Sato et al. 1999).
We conclude that for stillbirth there is currently no evidence from published studies that deployment of male service personnel to the Gulf is associated with increased risk. The data are too sparse for female veterans to make meaningful conclusions, but a large effect of deployment can be ruled out. For miscarriage, the picture is less clear. Some evidence of a small effect has been presented, but the results are inconsistent and it is likely that bias can explain some of the observed effects. However, a large effect (over a doubling in risk) can be excluded.
Theoretically at least (Colie 1993; Schrader & Kesner 1993; Skakkebaek et al. 1993), exposure to one or more toxicants of the type thought to be present in the Gulf War could affect spermatogenesis, either temporarily through direct damage to spermatozoa, or more permanently through damage to the spermatogenic stem cells or testicular cells responsible for spermatogenesis. These effects would be manifest as increased levels of infertility. A laboratory-based study published in 2003 would seem to strengthen this hypothesis: it reported extensive damage to the testes of rats given insecticides and NAPS of the type used in the Gulf War, and the damage worsened when the rats were subjected to moderate levels of stress (Abou-Donia et al. 2003). Further plausibility arises from the adverse effect on semen quality seen among veterans of the Vietnam War (which included potential exposure to herbicides such as Agent Orange), such as an almost tripling in risk of poor sperm concentration (less than 20 million per ml) compared with a similar group of armed service personnel not deployed to Vietnam (DeStefano et al. 1989). Although depleted uranium (DU) has been shown to mobilize and translocate to the gonads in rats, and hence is potentially toxic to reproductive tissues (Domingo 2001), studies of implanted DU in male rats have found no evidence of a detrimental effect of DU on mating success, sperm concentration or sperm velocity (Arfsten et al. 2006).
Only three epidemiological studies (Ishoy et al. 2001; Sim et al. 2003; Maconochie et al. 2004) have specifically examined fertility in relation to Gulf War service in general, while a further set of overlapping surveillance studies have examined reproductive health in a small number of US GWV who were victims of ‘friendly fire’ involving DU weapons (McDiarmid et al. 2000, 2001, 2004, 2006). These studies are summarized in table 4. Notably, no studies have examined this outcome among US GWV in general.
The first epidemiological study relating to fertility was an interview-based study of 661 male Danish GWVs and a matched comparison group of 215 military servicemen not deployed to the Gulf, which included taking a blood sample (Ishoy et al. 2001). Semen samples were not taken, but instead reproductive hormones were used as serum markers of male reproductive health status. A serum inhibin B level of less than or equal to 80 pg ml−1 combined with a serum follicle stimulating hormone (FSH) of greater than or equal to 10 IU l−1 was used as a validated indicator of oligospermia (sperm count less than 20 million per ml). The study found no difference between GWVs and controls with respect to any of the reproductive hormones measured, including the proportions with FSH and inhibin B levels indicating suspected oligospermia, which were identical (1.6%) in the two groups. Nor did the study find a difference in proportions reporting ‘treatment due to childlessness after 1990’ (2.8% in GWV versus 2.6% in NGWV, p > 0.05). No attempt was made to validate the self-reported reproductive histories, however, and expected numbers were small, so power was low.
The second study examined self-reported fertility status among Australian veterans, using a postal questionnaire. Among males who responded to the questionnaire and had achieved or tried to achieve conception since 1991 (1313 GWV and 1412 NGWV), GWV were slightly more likely than the comparison group to report difficulties with fertility (defined as difficulties fathering a pregnancy despite trying for at least 12 months) following the Gulf War. There was no evidence of a difference in identifying a cause of infertility, however, and GWV with reported fertility difficulties appeared more likely subsequently to father a successful pregnancy. This latter finding could be related to the fact that slightly more GWV sought treatment than NGWV (4.0% versus 3.3%), though since there was no clinical validation of the self-reported reproductive histories, the possibility of recall (reporting) bias is perhaps more likely, with NGWV being less likely to report fertility difficulties if they had subsequently fathered a liveborn child. Expected numbers for men in this study were also small and power consequently low. Among women, identical proportions (10%: three GWV, four NGWV) reported fertility difficulties commencing 1991 or later, but with only 32 (out of 38) female Australian GWV and 40 NGWV participating in the study, the numbers in analyses were too small to draw meaningful conclusions.
The largest epidemiological study of infertility in GWV was the UK study (Maconochie et al. 2004), which examined failure to achieve any conceptions (type I infertility) or livebirths (type II infertility) after the Gulf War, having tried for more than 1 year and consulted a doctor, among 10465 male UK GWV and 7376 NGWV who had fathered or tried to father pregnancies after the Gulf War. Time to conception among pregnancies fathered by men not reporting fertility problems was also examined. Again, this study used a self-administered postal questionnaire to obtain details of reproductive history, but unlike the other studies, an attempt was made to verify and obtain further information on all reported fertility problems, including diagnostic details and a copy of the semen analysis results, if available, by contacting both male and female partners' General Practitioner or relevant clinician. The study found a small increased risk of reported infertility associated with Gulf War service, which was strengthened when the definition was extended to include men reporting fertility problems who had fathered only pregnancies ending in foetal death. This effect was regardless of whether or not the men had fathered pregnancies before the war, and was constant over time, which argues in favour of either paternal germ cell mutation or other damage to spermatogenic stem cells or the testicular cells necessary for supporting spermatogenesis. Furthermore, the results were similar when analyses were restricted to clinically confirmed diagnoses. The evidence for an adverse effect of Gulf war service on fertility was also strengthened by the finding that pregnancies fathered by GWV not reporting fertility problems also took longer to conceive.
These findings were consistent with the Australian study, but conflicted with that of Danish veterans. The UK study had a fairly low response rate (53% for GWV and 42% for NGWV), but a study of non-responders provided no evidence of bias with respect to infertility, the prevalence of (self-reported) infertility among GWV and NGWV being almost identical in responders and non-responders. Differential recall of infertility problems by GWV is a possibility in all three studies, and it could be argued that GWV had more incentive to report details relating to this highly sensitive issue, even if minor, if they perceived that it could be associated with their Gulf War service. The similarity in results when restricted to clinically confirmed infertility provides little evidence of this kind of biased reporting in the UK study, however.
Four rounds of medical surveillance (1994, 1997, 1999, 2001) have been conducted on a small number of US veterans exposed to DU during friendly fire incidents in the Gulf War when their vehicles were hit with munitions containing DU penetrators (McDiarmid et al. 2000, 2001, 2004, 2006). The numbers participating at each time point varied in size, but all are extremely small, between 29 and 50 GWV. The four overlapping studies involved GWV only, comparing either DU-exposed GWV having high (greater than or equal to 10 μg g−1 creatinine) urinary uranium concentrations with DU-exposed GWV having low (less than 10 μg g−1 creatinine) urinary uranium concentrations (McDiarmid et al. 2000, 2001, 2004, 2006), or DU-exposed GWV with non-DU-exposed GWV (McDiarmid et al. 2000). Overall, despite persistent urine uranium elevations in these DU-exposed GWV for more than 10 years, no clinically significant difference in semen characteristics (volume, count, concentration) and motility was found between groups at any of the time points.
In conclusion, epidemiological evidence for an effect of Gulf war service on risk of infertility is sparse, and the majority of studies lack statistical power. In particular, the numbers of female GWV in the populations studied are too small to produce meaningful analyses. Nevertheless, evidence from animal studies suggests that the possibility of sperm damage resulting from exposure to toxicants of the type present in the 1991 Gulf War is at least plausible, and the Australian and UK studies provide some evidence of a consistent, if small, effect of Gulf War service on risk of infertility. This is strengthened by findings of increased time to conception among UK GWV not reporting fertility problems, and by previous findings of increased risk of miscarriage among pregnancies fathered by GWV, but the possibility of reporting bias cannot be ruled out. Overall, this is a difficult outcome to study, and the epidemiological evidence is too sparse to draw firm conclusions.
A reproductive health concern that appears to be novel in its alleged association with the 1990/91 Gulf War is ‘burning semen’ (San Jose Mercury News 1995). The incidence and prevalence of burning semen symptoms, in either the general population or military deployers, have not been well described. Among case series, many couples who present with symptoms of burning semen can be diagnosed with seminal plasma hypersensitivity or a related hypersensitivity disorder (Bernstein et al. 1997, 2003; Bernstein 1998); and some are successfully treated with desensitization therapies. A causal relationship between burning semen symptoms and any exposures of the Gulf War has never been established. Stories of burning semen among GWVs did heighten concerns, however, and perhaps added to the sense of mystery about ‘Gulf War Syndrome’ in general. With subsequent military deployment to Iraq more than 10 years later, concerns about fertility and semen quality were of high enough interest to prompt some US military men to cryo-preserve their sperm prior to deployment (Kelly 2003).
In the spectrum of reproductive health outcomes, gynaecologic problems, unrelated to conception and pregnancy, should clearly also be considered. Women were more likely to present for evaluation in the US Gulf War registries for almost all medical symptoms and conditions (Gray et al. 1998). A modest survey study of US Air Force women suggested that GWVs reported more gynaecological problems after deployment than their non-deployed peers (Pierce 1997). Other, larger analyses have not borne out an association between any gynaecologic pathology and Gulf War service (Wittich 1996; Murphy et al. 1997; Frommelt et al. 2000). It should be noted, however, that deployment and austere environments, in general, have been described as challenging for women service members, perhaps predisposing to a limited number of problems, such as urinary tract infections or bacterial vaginoses. These challenges may be related to limitations in access and/or acceptability of medical services for women in theatre, rather than deployment-specific exposures (Ryan-Wenger & Lowe 2000).
Despite the concern about reproduction and adverse outcomes of pregnancy following service in the Gulf War, there have been few studies on the topic relative to the multitude of studies on Gulf War illness. This probably reflects the difficulty of studying such events, particularly in men. The majority of the epidemiological studies described here are either cross-sectional surveys requesting information on events in the past or data linkage studies. Both methodologies have their limitations. For cross-sectional surveys, a common problem is bias, and a limitation for data linkage studies is the restricted nature of the datasets being used. An additional problem in all the studies is low statistical power to examine rare events, for example a specific birth defect or stillbirth.
Despite these limitations, studies of reproductive outcome have a clear advantage over studies of Gulf War illness, in that the definition of outcome is usually clear and comparable across studies. This is especially true for stillbirths and specific birth defects, such as neural tube defects, reduction deformities of limbs or chromosomal anomalies. It is highly unlikely that a parent would report such a condition if it did not exist, or fail to report it if it did exist, whatever the exposure status of the parent. Also, such outcomes are easily validated using medical records. Unfortunately, this clarity reduces when we examine early miscarriage and some of the less well-defined birth defects, such as anomalies of the musculoskeletal system. For these conditions, there is an element of judgement—what one person regards as a birth defect another might regard as a normal variant of structure. The role of reporting bias, or differential reporting according to exposure status, is likely to be greater for these conditions. As we have seen from the above review, it is precisely these outcomes where associations with deployment to the 1990/91 Gulf War have been reported, and thus the findings need to be treated with caution.
It is possible to make statements about the current state of knowledge, albeit with qualifications. For male veterans, we conclude that there is no strong or consistent evidence in the literature to date for an effect of service in the first Gulf War on the risk of major, clearly defined, birth defects or stillbirth in offspring conceived after deployment. Effects on specific rare defects cannot be excluded at this stage, since none of the studies had the statistical power to examine them. For miscarriage and infertility, the picture is less clear. There is some evidence of small increased risks associated with service, but the role of bias cannot be ruled out. For female veterans, there is insufficient information to make robust conclusions, although the weight of evidence to date does not indicate any major problem.
None of the studies discussed here have been able to examine risk according to particular exposures and so we cannot exclude the possibility of undetected adverse effects for small groups of veterans with high exposures to specific agents. A recent review of all literature, including the studies examined here, has looked at teratogenicity in relation to potential exposure to DU (Hindin et al. 2005). The authors concluded that the evidence ‘is consistent with increased risk of birth defects in the offspring of persons exposed to Depleted Uranium’. We believe there is no evidence from published literature on armed service Gulf veterans to support this statement. That, of course, is not the same as saying that such effects do not exist.
Could such risks ever be quantified if they did exist? Exposures in theatre are difficult to define and measure, and estimates are confounded by multiple exposures, both deployment-specific and exposures unrelated to deployment. In the Gulf War, for example, some service members may have been exposed to vaccines, prophylactic medications, oil-well smoke, destruction of munitions and nerve agents, heat and other environmental stressors. They also may have been exposed to tobacco, alcohol, caffeine, medications and other factors not unique to deployment. Also many ‘deployers’, especially those who were not ground combat forces, may have had no concerning exposures at all. Epidemiologic studies from the first Gulf War were thus unable to distinguish exposed deployers from unexposed. It is now highly unlikely that future studies will ever be able to examine the impact of specific exposures in the 1990/91 Gulf war on reproductive outcome.
We suggest that the only way forward to address the question of veterans' reproductive health with confidence in the future is prospective surveillance following deployment. This avoids the methodological limitations of the epidemiological studies conducted to date. The US Department of Defense is committed to ongoing surveillance for birth defects and other reproductive health outcomes with the development of a Birth and Infant Health Registry in 1998 (Ryan et al. 2001). The US also recognized the importance of prospective studies for addressing the health of veterans by launching the Millennium Cohort Study in 2001 (Gray et al. 2002). This ambitious project is following the health of more than 100000 service members, collecting both subjective and objective data, for the next 20 years. Efforts such as these will help identify true changes in reproductive health outcomes in the military and, in concert with improved exposure data and prospective studies, better define cause–effect relationships between military service and reproductive health challenges.
We acknowledge Rebecca Simmons for her work on recoding and re-analysing the UK anomaly data for table 2.
One contribution of 17 to a Theme Issue ‘The health of Gulf War veterans’.