

Cancer Informatics
Cancer Inform. 2010; 9: 265–279.
Published online 2010 November 28. doi:  10.4137/CIN.S6202
PMCID: PMC2998934

Sinusoidal Cox Regression—A Rare Cancer Example


Evidence of an association between survival time and date of birth would suggest an etiologic role for a seasonally variable environmental exposure occurring within a narrow perinatal time period. Risk factors that may exhibit seasonal epidemicity include diet, infectious agents, allergens, and antihistamine use. Typically, such data have been analyzed by simply categorizing births into months or seasons of the year and performing multiple pairwise comparisons. This paper presents a statistically robust alternative, based upon a trigonometric Cox regression model, to analyze the cyclic nature of birth dates related to patient survival. Disease birth-date results are presented using a sinusoidal plot with peak date(s) of relative risk and a single P value that indicates whether an overall statistically significant seasonal association is present. Advantages of this derivative-free method include ease of use, increased power to detect statistically significant associations, and the ability to avoid arbitrary, subjective demarcation of seasons.

Keywords: sinusoidal Cox regression, seasonality of birth


The fetal origin hypothesis purports that early life environmental exposures may influence disease phenotype in adulthood.1 Accordingly, individuals born during a specific season may experience unique exposures at a time when they are particularly sensitive to such exposures. A key premise of the theory is that certain exposures and experiences around the time of birth lead to adaptive phenotypes that better prepare the individual for events that may occur later in life. However, not all developmental adaptations have an apparent value in modern society, such as the disruptive response to a host of man-made teratogens.

Experts have hypothesized an etiologic link between season of birth and various childhood diseases, in particular cancer.2–5 This is partly due to the temporal brevity between the perinatal window of susceptibility to environmental carcinogens and disease development. This narrow age-dependent period is characterized by rapid cell growth and division and an undeveloped immune system.6,7 Various chemicals and oncogenic viruses have been shown in the laboratory to induce developmental cancers when specifically administered during perinatal life versus maturity.8–10 Infectious agents, pesticides, antihistamines, indoor environmental tobacco smoke, and vitamin D are a few environmental exposures that tend to follow a seasonal pattern and conceivably may play a role in disease risk or protection.11

There also is evidence suggesting that prenatal and early childhood exposures may play a role in adult diseases. For example, exposure to airborne infections during infancy has been associated with increased mortality from similar infections in old age.12 A birth cohort from 1634 through 1870 in the small village on Minorca Island, Spain was used to demonstrate decreased mortality among summer births, thus suggesting the importance of early seasonal exposures in disease development decades later.1 Similarly, seasonal birth exposures have been implicated in some adult cancers.11

Evidence of an association between season of birth and childhood disease would suggest that a seasonally variable environmental exposure may underlie the disease. The type, nature, timing, and severity of the exposure ultimately may be important determinants of whether a seasonal factor has long-term negative or positive consequences for disease development and patient survival.

To the best of our knowledge, no studies have examined day of birth as a seasonal determinant of survival time, when controlling for outcome related covariates. This paper presents a simple method to determine the statistical significance of a seasonal risk factor in a Cox regression model that allows for control of confounding variables.


Cox regression, also known as the proportional hazards model, is a continuous time technique in which event rates vary instantaneously as a function of time.13–15 A common use of Cox regression is to estimate the relative risk (RR) for a binary event such as death or cancer recurrence, taking into account the variable length of follow-up among participants in a study. That is, some participants may not have experienced the event by the end of the study. A key advantage of the Cox technique over single variable methods (eg, Kaplan-Meier) for analyzing time to event/survival data is the ability to account for confounding and effect modification of other variables in the model. Another important feature is that survival probabilities may be estimated even when some participants fail to complete the trial (ie, censored data). Importantly, ignoring censored data may lead to serious bias and the distortion of study results. A simple modification of the basic Cox model also allows for the inclusion of covariates that change over the course of the study (eg, participant gets married). Cox regression has been widely used in the fields of epidemiology and clinical research for the analysis of nested case-control, case-cohort, and cohort studies including clinical trials.16

Letting $x_1, \ldots, x_r$ denote a study participant’s values for $r$ predictor variables, the Cox model specifies that $\log(\text{hazard rate}) = \alpha(t) + \beta_1 x_1 + \cdots + \beta_r x_r$, where $t$ represents time, $\alpha(t)$ is the intercept term, and $\beta_1, \ldots, \beta_r$ are the model coefficients.17 Unlike traditional regression models where the intercept is constant, the $\alpha(t)$ term in the Cox model is defined as the logarithm of an unspecified but positive baseline hazard function that depends on time. Setting all the $x_i$ equal to zero in the Cox model yields the baseline hazard. The hazard rate $h(t)$ is the instantaneous incidence for an event and is mathematically defined as $h(t) = \lim_{\Delta t \to 0^+} \Pr[t \le T < t + \Delta t \mid t \le T]/\Delta t$, where $T$ denotes the event time.13 In other words, the hazard rate is the probability of an event occurring in a very short interval, given that no event has occurred before the start of the interval, divided by the length of the interval. A desirable property of Cox regression is that the event times are not required to be normally distributed, as is the case for many statistical procedures, because the baseline hazard function is estimated independently of any parametric assumptions.
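As a numerical illustration of the limit that defines the hazard rate, the following minimal sketch assumes exponentially distributed event times with a constant rate. The exponential form, the function name, and the rate value are assumptions made only for this example; the Cox model itself leaves the baseline hazard unspecified.

```python
import math


def interval_hazard(t, dt, rate):
    """Pr[t <= T < t + dt | T >= t] / dt for exponential event times.

    Assumes the survival function S(t) = exp(-rate * t). This parametric
    form is an illustrative assumption, not part of the Cox model.
    """
    surv_start = math.exp(-rate * t)
    surv_end = math.exp(-rate * (t + dt))
    conditional_prob = (surv_start - surv_end) / surv_start
    return conditional_prob / dt


# The ratio approaches the constant hazard (here 0.5) as dt shrinks.
for dt in (1.0, 0.1, 0.001):
    print(dt, interval_hazard(t=2.0, dt=dt, rate=0.5))
```

For a finite interval the ratio falls below the true rate, which is why the definition is stated as a limit.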

The exponentiation of $\beta$ corresponding to a dichotomous variable in the Cox model gives an estimate for the hazard ratio of the comparison and referent groups, holding all other variables in the model constant. In applied terms, the hazard ratio may be interpreted as the odds that an individual in the group with the higher hazard reaches the endpoint first. Accordingly, the probability of reaching the endpoint first equals HR/(1 + HR).18 However, the HR does not gauge “how much faster” an event occurs in a particular arm of the study. When $x_i$ is continuously distributed, the HR represents the multiplicative factor for risk corresponding to a unit increase in that variable.17
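The conversion from a hazard ratio to the probability of reaching the endpoint first is simple arithmetic. A minimal sketch follows; the function name is hypothetical and the HR values are chosen only for illustration.

```python
def prob_reaches_endpoint_first(hr):
    """Probability that the higher-hazard individual reaches the endpoint first.

    Uses the odds interpretation of the hazard ratio: P = HR / (1 + HR).
    """
    if hr <= 0:
        raise ValueError("hazard ratio must be positive")
    return hr / (1.0 + hr)


for hr in (1.0, 2.0, 3.0):
    print(hr, prob_reaches_endpoint_first(hr))
```

A hazard ratio of 1 gives a probability of one half, that is, neither group is more likely to reach the endpoint first.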

A key assumption of Cox regression is that the hazard ratio is constant over time. This implies that the hazard for any study participant is a fixed proportion of the hazard for any other participant. Thus, the individual log hazards plotted over time should be parallel.19
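The proportionality property follows directly from the form of the model. As a short derivation, with $x_a$ and $x_b$ used here as hypothetical labels for the covariate vectors of two participants, exponentiating the log hazard and taking the ratio gives:

```latex
\begin{align*}
h(t \mid x) &= \exp\{\alpha(t)\}\,\exp(\beta_1 x_1 + \cdots + \beta_r x_r), \\
\frac{h(t \mid x_a)}{h(t \mid x_b)}
  &= \exp\{\beta_1 (x_{a1} - x_{b1}) + \cdots + \beta_r (x_{ar} - x_{br})\}.
\end{align*}
```

The baseline term $\exp\{\alpha(t)\}$ cancels, so the ratio does not depend on $t$, and the two log hazards differ by a constant at every time point.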

Given the observed data for $n$ individuals, an estimate of the vector of coefficients, ie, $\beta = (\beta_1, \ldots, \beta_r)$, is obtained by maximizing the log partial likelihood function $\sum_{i=1}^{k} [\beta x_i - \log\{\sum_{j \in R_i} \exp(\beta x_j)\}]$, where $x_i = (x_{i1}, \ldots, x_{ir})$ is an r-dimensional vector of covariates, $t_1 < \cdots < t_k$ denote the $k$ ordered distinct event times, and the censored observations occupy the remaining $n - k$ places in arbitrary order.20 The $k$ risk sets are defined such that $j$ is an element of $R_i$, representing the participants at risk immediately prior to the $i$th event, if and only if $t_j \ge t_i$. The unknown parameter estimates are determined iteratively by setting the first partial derivatives of the above expression equal to zero, computing the matrix of second partial derivatives, and applying the Newton-Raphson method. An estimate for the hazard rate is easily found to equal $[(t_i - t_{i-1}) \sum_{j \in R_i} \exp(\beta x_j)]^{-1}$. Although the Cox model as specified does not allow for tied values, approximate partial likelihood functions have been developed to handle non-distinct event times and these approximations have been incorporated into commonly available computer software packages.19
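To make the risk-set construction concrete, the log partial likelihood can be evaluated directly for a given coefficient vector. The sketch below is a plain-Python illustration that assumes distinct event times (no tie handling); the function and argument names are hypothetical, and in practice the maximization would be left to standard software.

```python
import math


def log_partial_likelihood(beta, times, events, covariates):
    """Cox log partial likelihood, assuming no tied event times.

    beta       : list of r coefficients
    times      : follow-up time for each participant
    events     : 1 if the event was observed, 0 if censored
    covariates : list of r-length covariate lists, one per participant
    """
    def linear_predictor(x):
        return sum(b * v for b, v in zip(beta, x))

    total = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored participants contribute only through risk sets
        # Risk set R_i: everyone still under observation at time t_i.
        risk_sum = sum(
            math.exp(linear_predictor(covariates[j]))
            for j, t_j in enumerate(times)
            if t_j >= t_i
        )
        total += linear_predictor(covariates[i]) - math.log(risk_sum)
    return total


# Three participants, one censored, a single binary covariate.
print(log_partial_likelihood([0.3], [1.0, 2.0, 3.0], [1, 0, 1], [[1], [0], [1]]))
```

With all coefficients set to zero, each observed event contributes minus the logarithm of the size of its risk set, which gives a convenient check on the risk-set logic.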

A seasonal variable in the simplest case may be expressed as a binary term, eg, where birth occurred in summer compared with winter. However, more complex forms may be important to consider when modeling annual seasonality. A temporal variable such as date of birth (DOB, coded as an integer from 1 to 365) may be expressed as a trigonometric function.21,22 In this example, let $x_1 = \cos[2\pi(\mathrm{DOB} - \xi_{\max})/365]$, with $\pi = \arccos(-1)$, where $\xi_{\max}$ is determined iteratively by finding the value from 1 to 365 that maximizes $\beta_1$. In the case of a leap year, the 29th day of February is recoded as calendar day 59 so that the respective year consists of 365 days.23
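The construction of the sinusoidal covariate and the search for the peak day can be sketched as follows. This is an illustration under stated assumptions rather than the author's program: the function names are hypothetical, the Cox fit is delegated to a user-supplied callable (`fit_beta1`, a hypothetical interface that returns the fitted coefficient for a given covariate), and shifting the days after February 29 down by one is our reading of the leap-year recoding.

```python
import calendar
import math
from datetime import date


def calendar_day(d):
    """Day of the year on a 365-day scale, with Feb 29 recoded as day 59."""
    day = d.timetuple().tm_yday
    if calendar.isleap(d.year) and day >= 60:
        day -= 1  # Feb 29 -> 59; later days shift down (our interpretation)
    return day


def sinusoid(dob, xi, period=365.0):
    """x1 = cos[2*pi*(DOB - xi) / period]."""
    return math.cos(2.0 * math.pi * (dob - xi) / period)


def find_xi_max(dobs, fit_beta1):
    """Grid search over xi = 1..365 for the value that maximizes beta1.

    fit_beta1 is a hypothetical callable: it takes the list of x1 values
    and returns the coefficient estimated by a Cox regression routine.
    """
    best_xi, best_beta = None, -math.inf
    for xi in range(1, 366):
        x1 = [sinusoid(d, xi) for d in dobs]
        beta1 = fit_beta1(x1)
        if beta1 > best_beta:
            best_xi, best_beta = xi, beta1
    return best_xi, best_beta


print(calendar_day(date(2000, 2, 29)), calendar_day(date(2000, 3, 1)))
```

Passing the fitting routine as an argument keeps the search independent of any particular statistical package; any covariates used for adjustment would be handled inside that routine.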

The maximum 3-month seasonal period of risk for a time-to-event outcome is found by taking the 91.25-day-wide interval centered on $\xi_{\max}$. Analogously, the minimum risk period is found by taking the symmetrically opposite 3-month interval centered on $\xi_{\min}$ (ie, the value from 1 to 365 that minimizes $\beta_1$ in the equation for $x_1$). Dummy coding $x_1 = 1$ if DOB falls within the maximum window of risk and $x_1 = 0$ if DOB falls within the minimum window of risk and exponentiating $\beta_1$ gives an estimate of the peak-to-trough HR, assuming 4 distinct seasons. Using standard statistical methods, a corresponding 95% confidence interval (CI) may be computed assuming asymptotic normality of log(HR).19
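The window coding and the confidence interval can be sketched as below. The function names are hypothetical, births outside both windows are returned as `None` (an assumption about how they would be excluded from the peak-to-trough comparison), and the example takes the trough half a year from day 254 purely for illustration.

```python
import math


def circular_distance(a, b, period=365.0):
    """Shortest distance between two calendar days on a circular year."""
    diff = abs(a - b) % period
    return min(diff, period - diff)


def window_code(dob, xi_max, xi_min, width=91.25):
    """1 inside the peak window, 0 inside the trough window, else None."""
    half = width / 2.0
    if circular_distance(dob, xi_max) <= half:
        return 1
    if circular_distance(dob, xi_min) <= half:
        return 0
    return None  # birth falls in neither 3-month window


def hazard_ratio_ci(beta1, se, z=1.959964):
    """HR and 95% CI, assuming asymptotic normality of log(HR)."""
    return (math.exp(beta1),
            math.exp(beta1 - z * se),
            math.exp(beta1 + z * se))


# Illustrative values only: peak at day 254, trough taken half a year away.
for dob in (254, 72, 300):
    print(dob, window_code(dob, xi_max=254, xi_min=71.5))
```

Measuring distance on a circle lets a window that straddles the turn of the year (for example, one centered on day 15) be handled without special cases.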

A P value for determining the statistical significance of the sinusoidal term may be determined by taking twice the logarithm of the ratio of the partial likelihoods for the models with and without the variable. The resulting value is compared with a $\chi^2$ distribution having 1 degree of freedom. The seasonal association is visualized by plotting harmonic displacement ($\beta_1 x_1$) against DOB over the range 1 to 365.
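Given the maximized log partial likelihoods of the two models, the test reduces to a few lines. In this minimal sketch the function name is hypothetical, and the upper tail of the chi-square distribution with 1 degree of freedom is obtained from the complementary error function, using the identity Pr[χ² > s] = erfc(√(s/2)).

```python
import math


def lrt_p_value(loglik_with, loglik_without):
    """Likelihood ratio statistic and P value for one added term (1 df)."""
    stat = max(0.0, 2.0 * (loglik_with - loglik_without))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Illustrative log partial likelihoods, not values from the paper.
print(lrt_p_value(-100.0, -103.0))
```

The statistic is twice the difference in log partial likelihoods, which is the same quantity as twice the logarithm of the likelihood ratio.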

The sinusoidal Cox model also may be used to model multiple cycles within a period of interest. For example, certain biologic phenomena may occur in synchrony with a lunar cycle and have a peak incidence every 29.53 days. The model for a lunar cycle would be computed by substituting 29.53 for 365 in the denominator of the sinusoidal term. In another example, a scientist wishes to test the hypothesis that weather-related stress (eg, cold winters and hot summers) at the time of birth triggers epigenetic mechanisms which program immune response later in life. As above, 182.5 would be substituted for 365 in the denominator of x1 in order to fit a bimodal sinusoidal model to the data.
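Written generally, the substitution amounts to changing the period of the cosine term. In the expression below, $\tau$ is a symbol introduced here for the cycle length in days and $m$ indexes successive cycles; neither appears in the original notation.

```latex
x_1 = \cos\!\left[\frac{2\pi\,(\mathrm{DOB} - \xi_{\max})}{\tau}\right],
\qquad \tau \in \{365,\ 182.5,\ 29.53\},
\qquad \text{peaks at } \mathrm{DOB} = \xi_{\max} + m\tau,\quad m = 0, \pm 1, \pm 2, \ldots
```

With $\tau = 182.5$ the covariate completes two cycles per 365-day year, so the fitted curve has two peaks of equal height separated by 182.5 days.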


Using anonymized DOB data for a rare neurogenic cancer (see Appendix), we conducted analyses using the method described above, and for comparison, the typical, more basic method to examine whether a seasonal pattern of birth significantly predicts time to death following diagnosis (note: this simplified example is presented for illustration purposes only and is not intended to represent a comprehensive epidemiologic analysis of the data). The identification of an underlying sinusoidal trend in births would be consistent with the hypothesis that a seasonally varying exposure around the time of birth influences the risk of dying from this cancer later in life.

Among patients alive at last follow-up, 17% were >60 years of age, compared with 32% of those who died (Table 1). A discernible pattern for period of birth was not observed, although a higher percentage of living patients than of deceased patients were born during the 1950s. Overall, the percentage of deaths was higher among patients born in winter and summer than among those born during spring and fall, suggesting a possible season-of-birth survival difference. However, individual follow-up times differed considerably.

Table 1.
Characteristics of patients by survival status (N = 958).

A traditional Cox regression model was used to obtain HRs for month-to-month birth comparisons and to account for differences in patient follow-up times (Table 2). Similar to the above results comparing the percentage of deaths by month, a bimodal pattern was observed in the month-to-month birth HRs. However, all CIs overlapped unity after adjusting for multiplicity.24

Table 2.
Month-to-month birth hazard ratios (HR) and 95% confidence intervals (CI) of dying following diagnosis.*§

When a unimodal sinusoidal Cox regression model was applied to the data, peak seasonal risk for death was observed at calendar day 254 (mid-September); however, the result was not statistically significant (Fig. 1). Furthermore, the 3-month peak-to-trough HR did not statistically differ from a null result (HR = 1.1, 95% CI = 0.89–1.4) (not shown in Fig. 1). However, when a bimodal model was fitted to the data, statistically significant seasonal peaks for birth (likelihood ratio test, P = 5.6E-6) were observed at day 15 (mid-January) and day 196 (mid-July) (Fig. 2).

Figure 1.
Calendar day of birth for peak risk of dying following diagnosis—unimodal fit (adjusted for period of birth and age).
Figure 2.
Calendar day for birth and peak risk of dying following diagnosis—bimodal fit (adjusted for period of birth and age).

Simulation Results

Using the SAS® programming language (version 9.2, Cary, NC), a simulation was performed to illustrate the relative efficiency of sinusoidal Cox regression compared with a traditional season-to-season Cox regression model. Sinusoidally varying observations, with the longest average survival times occurring among participants born during spring versus fall (ie, greatest risk of dying for fall births), were simulated as


In the above code, rannor and ranuni are SAS® functions that generate values from a standard normal and uniform distribution, respectively. The numbers within the parentheses for the functions denote random number generator seed values. The floor function returns the largest integer that is less than or equal to the argument and gives the day of birth.
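For readers without access to SAS®, a simulation of this general kind can be sketched in Python. The sketch below is not the author's program: the log-normal form for survival time, the parameter values (`peak_day`, `beta`, `mu`, `sigma`), and the seed are all assumptions chosen for illustration. It mirrors only the ingredients described in the text, namely a uniform draw passed through a floor function to obtain the day of birth and a standard normal draw for random variation.

```python
import math
import random


def simulate_births(n, peak_day=288, beta=0.5, mu=4.0, sigma=1.0, seed=12345):
    """Simulate (day of birth, survival time) pairs with a seasonal effect.

    Hypothetical model: log(time) = mu - beta * cos(2*pi*(dob - peak_day)/365)
    + sigma * Z, so births near peak_day (mid-October here, ie, fall) have
    the shortest average survival and births half a year away (spring)
    the longest.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        dob = math.floor(365 * rng.random()) + 1  # uniform on 1..365
        x1 = math.cos(2.0 * math.pi * (dob - peak_day) / 365.0)
        log_time = mu - beta * x1 + sigma * rng.gauss(0.0, 1.0)
        rows.append((dob, math.exp(log_time)))
    return rows


sample = simulate_births(200)
print(sample[:3])
```

Fixing the seed makes the simulated data reproducible, which plays the same role as the seed values passed to the SAS® random number functions.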

A sinusoidal Cox regression model, as presented in this paper, was used to test the seasonality hypothesis that fall births were more likely to die sooner from disease than spring births. A season-to-season analysis involved comparing survival times among fall and spring (referent group) births by dummy coding season of birth (ie, fall = 1, spring = 0) as the independent variable in a Cox regression model.

The sinusoidal Cox regression model was observed to be relatively more efficient than the traditional season-to-season method at detecting a statistically significant season-of-birth effect, and efficiency improved as the number (n) of simulated survival time observations increased (Table 3). For example, for n = 35, the sinusoidal Cox model detected a significant season-of-birth effect on disease survival (P = 0.01491), whereas the season-to-season analysis failed to achieve a statistically significant result (P = 0.23593). A Kaplan-Meier plot (n = 200) contrasting survival curves for fall and spring births is shown in Fig. 3.

Figure 3.
Kaplan-Meier plot comparing survival times for fall and spring births (n = 200 simulated observations).
Table 3.
Comparison of sinusoidal and season-to-season Cox regression models.


We have presented a simple, iterative Cox regression-based method to analyze censored time-to-event outcome data with seasonal predictor variables. The method is a straightforward extension of earlier trigonometric models yet is easier to apply and interpret.25–30 A parallel sinusoidal logistic regression model has been presented in the literature for analyzing noncensored binary event data.23

A useful feature of sinusoidal Cox regression is its ability to optimally fit a sinusoidal curve to the underlying data by plotting harmonic displacement against calendar time. Whereas no single method provides a universal approach to analyze harmonic data, the current method accommodates varying lengths of months, different populations at risk, and simultaneous adjustment for multiple confounders, and is reasonably robust when used for small samples. The accompanying statistical test will have greater family-wise power to detect a sinusoidal pattern than performing multiple pairwise seasonal or monthly comparisons.

Similar to a dose response relationship based upon a best-fitting monotonic model and a priori biologic mechanism of action, multiplicity correction is not necessary for the optimal peak sinusoidal Cox regression model. Additionally, the model takes into account consecutively high/low time periods (eg, order of events), and the definition of season does not depend on an arbitrary start and end date but rather is determined by the model algorithm.

Several potential limitations should be noted when interpreting the results of a sinusoidal Cox regression analysis. A discrepancy between values expected under the model and the actual data may result in biased parameter estimates. Accordingly, the data should be tested for goodness-of-fit using standard statistical methods for Cox regression.19 Ambiguous results may occur in the case of competing out-of-phase cycles resulting in a cancelling of effects (eg, opposing seasonal effects by hemisphere of birth). When appropriate, stratification or the use of a multimodal model may help mitigate this problem. Additionally, the sinusoidal Cox regression model as specified will not distinguish between major and minor peaks since the magnitude of harmonic displacement will be equal for all peaks.

A statistically nonsignificant seasonal risk factor in the model does not necessarily rule out an underlying seasonal effect. For example, the factor may have a lopsided shape that may be difficult to statistically detect using a sinusoidal Cox model. Conversely, the seasonal association of a specific risk factor with disease does not necessarily imply causality. As with any statistical model, the results of sinusoidal Cox regression should be carefully interpreted in light of underlying limitations and biologic plausibility. Furthermore, the lack of a well-defined hypothesis in advance of analysis and selection bias may lead to spurious results. Selection bias generally is difficult to correct after the data have been collected. A type of selection bias based on differential survival of certain individuals in the population at risk is a particular concern in season-of-birth studies. For example, susceptible individuals with a weak immune system may die soon after birth and distort the population base of adult survival studies.

The example presented in this paper is limited in scope. Future studies would be needed to determine whether a bimodal peak of seasonal risk is unique to this data set. New studies also would benefit by adjusting for potential confounders such as tumor grade, gender, diet, and body weight, and stratifying analyses by hemisphere or latitude of birth. The sinusoidal Cox regression model will provide a flexible tool for conducting such analyses.

In summary, seasonal environmental exposures occurring within a brief “critical window” during prenatal development or early infancy have been hypothesized to influence susceptibility to disease and survival later in life.31,32 Studies of season of birth and survival time have been difficult to interpret due to limitations of the statistical methods used to analyze the data. In this paper, we have presented a sinusoidal Cox regression method that is derivative-free, is easy to use, and does not require the arbitrary demarcation of seasons characterizing other techniques. Furthermore, the sinusoidal Cox regression model yields a single P value, unlike traditional methods for analyzing seasonal data, which require multiple pairwise comparisons. The sinusoidal model also is statistically more powerful in detecting season-of-birth trends in the data.


The author thanks Dr. Katherine T. Jones for valuable comments during the writing of this manuscript; her knowledge and insight are greatly appreciated. The contents of this publication are solely the responsibility of the author and do not necessarily represent the views of any institution or funding agency.

Appendix. Data File


Abbreviations: CDB, calendar day of the year birth date; P, period of birth (1 = 1917–29, 2 = 1930–39, 3 = 1940–49, 4 = 1950–59, 5 = 1960–69, 6 = 1970–79, 7 = 1980–89); A, age category (1 = ≤40 yrs., 2 = 41–50 yrs., 3 = 51–60 yrs., 4 = >60 yrs.); C, censor variable (1 = Dead, 0 = Alive); STM, survival time in weeks.



This manuscript has been read and approved by the author. This paper is unique and is not under consideration by any other publication and has not been published elsewhere. The author and peer reviewers of this paper report no conflicts of interest. The author confirms that they have permission to reproduce any copyrighted material.


1. Munoz-Tuduri M, Garcia-Moro C. Season of birth affects short- and long-term survival. Am J Anthropol. 2008;135:462–8.
2. Yamakawa Y, Fukui M, Kinoshita K, Ohgami S, Kitamura K. Seasonal variation in incidence of cerebellar medulloblastoma by month of birth. Fukuoka Igaku Zasshi (Hukuoka Acta Medica). 1979;70:295–300.
3. Ederer F, Miller R, Scotto J, Bailer J. Birth-month and infant cancer mortality. Lancet. 1965;7404:185–6.
4. Meltzer A, Annegers F, Spitz M. Month-of-birth and incidence of acute lymphoblastic leukemia in children. Leuk Lymphoma. 1996;23:85–92.
5. Halperin E, Miranda M, Watson D, George S, Stanberry M. Medulloblastoma and birth date: evaluating three US data sets. Arch Environ Health. 2004;59:26–30.
6. Rice J, Ward J. Age dependence of susceptibility to carcinogenesis in the nervous system. Ann NY Acad Sci. 1982;381:274–89.
7. Alexandrov V, Aiello C, Rossi L. Modifying factors in prenatal carcinogenesis. In Vivo. 1990;4:327–36.
8. Druckrey H, Ivankovic S. Teratogenic and carcinogenic effects in the offspring after single injection of ethylnitrosourea to pregnant rats. Nature. 1966;210:1378–9.
9. Wechsler W, Kleihues P, Matsumoto S, et al. Pathology of experimental neurogenic tumors chemically induced during prenatal and postnatal life. Ann NY Acad Sci. 1969;159:360–408.
10. Sanders F. Experimental carcinogenesis: induction of multiple tumors by viruses. Cancer. 1977;40:1841–4.
11. Efird J. Season of birth and risk for adult onset glioma. Int J Environ Res Public Health. 2010;7:1913–36.
12. Bengtsson T, Lindström M. Airborne infectious diseases during infancy and mortality in later life in southern Sweden, 1766–1894. Int J Epidemiol. 2003;32:286–94.
13. Cox D. Regression models and life-tables (with discussion). J R Stat Soc Ser B Stat (Methodological). 1972;34:187–220.
14. Clark T, Bradburn M, Love S, Altman D. Survival analysis part I: basic concepts and first analyses. Br J Cancer. 2003;89:232–8.
15. Bradburn M, Clark T, Love S, Altman D. Survival analysis part II: multivariate data analysis – an introduction to concepts and methods. Br J Cancer. 2003;89:431–6.
16. Kelsey J, Whittemore A, Evans A, Thompson W. Methods in Observational Epidemiology. New York: Oxford University Press; 1996. pp. 1–412.
17. Friedman G. Primer of Epidemiology. New York: McGraw-Hill; 1994. pp. 1–366.
18. Spruance S, Reid J, Grace M, Samore M. Hazard ratio in clinical trials. Antimicrob Agents Chemother. 2004;48:2787–92.
19. Allison P. Survival Analysis Using the SAS System: A Practical Guide. Cary: SAS Institute; 1995. pp. 1–292.
20. Link C. Confidence intervals for the survival function using Cox’s proportional-hazard model with covariates. Biometrics. 1984;40:601–10.
21. Beckett L. Harvard University; 1987. Personal communication.
22. Chodick G, Shalev V, Goren I, Inskip P. Seasonality in birth weight in Israel: new evidence suggests several global patterns and different etiologies. Ann Epidemiol. 2007;17:440–6.
23. Efird J, Searles Nielsen S. A method to model season of birth as a surrogate environmental risk factor for disease. Int J Environ Res Public Health. 2008;5:49–53.
24. Efird J, Searles Nielsen S. A method to compute multiplicity corrected confidence intervals for odds ratios and other relative effect estimates. Int J Environ Res Public Health. 2008;5:394–8.
25. Stutvoet H. Seasonal birth frequencies in parameters. Acta Genet Stat Med. 1951;2:177–92.
26. Edwards J. The recognition and estimation of cyclic trends. Ann Hum Genet Lond. 1961;25:83–7.
27. Thomas J, Wallis K. Seasonal variation in regression analysis. J R Statist Soc A. 1971;134:57–72.
28. Roger J. A significant test for cyclic trends in incidence data. Biometrika. 1977;64:152–5.
29. Stolwijk A, Straatman H, Zielhuis G. Studying seasonality by using sine and cosine functions in regression analysis. J Epidemiol Community Health. 1999;53:235–8.
30. Haus E, Touitou Y. Chronobiology in laboratory medicine. In: Touitou Y, Haus E, editors. Biological Rhythms in Clinical and Laboratory Medicine. Berlin and Heidelberg: Springer; 2001. pp. 693–5.
31. Ben-Shloma K, Kuh D. Model advocates that an exposure in a critical period results in permanent and irreversible damage or disease. Int J Epidemiol. 2002;31:285–93.
32. Gillman M. Epidemiological challenges in studying the fetal origins of adult chronic disease. Int J Epidemiol. 2002;31:294–9.

Articles from Cancer Informatics are provided here courtesy of SAGE Publications