
Stat Med. Author manuscript; available in PMC 2011 March 15.

PMCID: PMC3057135

NIHMSID: NIHMS268452

Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA.

In this article, we propose a new generalization of the Weibull distribution, which incorporates the exponentiated Weibull distribution introduced by Mudholkar and Srivastava [1] as a special case. We refer to the new family of distributions as the beta-Weibull distribution. We investigate the potential usefulness of the beta-Weibull distribution for modeling censored survival data from biomedical studies. Several other generalizations of the standard two-parameter Weibull distribution are compared with regard to maximum likelihood inference of the cumulative incidence function in the setting of competing risks. These Weibull-based parametric models are fitted to a breast cancer dataset from the National Surgical Adjuvant Breast and Bowel Project (NSABP). In terms of statistical significance of the treatment effect and model adequacy, all generalized models lead to similar conclusions, suggesting that the beta-Weibull family is a reasonable candidate for modeling survival data.

A suitable parametric model is often of interest in the analysis of survival data, as it provides insight into characteristics of failure times and hazard functions that may not be available with non-parametric methods. The Weibull distribution [2] is one of the most commonly used families for modeling such data. However, the classic two-parameter Weibull distribution can generate only monotonically increasing or decreasing hazard functions; as such, this two-parameter model is inadequate when the true hazard shape is unimodal or bathtub-shaped.

The widely used Kaplan-Meier product-limit estimator is a flexible method for modeling survival data but is often noted to be inefficient [3]. Semi-parametric methods, such as proportional hazards modelling, require assumptions that may not be plausible in many situations [4]. Meanwhile, various parametric techniques have been developed to accommodate the wide variety of patterns in survival data. Some of the proposed parametric models have incorporated a shape parameter into the classic Weibull distribution to account for additional possible hazard shapes. One of the first models of this type was proposed by Kalbfleisch [5], but this model may be impractical in the presence of censored data as it often requires the evaluation of an incomplete gamma integral or beta ratio.

Mudholkar and Srivastava [1] introduced an exponentiated version of the Weibull model that included an additional shape parameter. The distribution has closed-form probability density, survival, and hazard functions that are flexible and able to generate a wide variety of frequently observed hazard shapes, including unimodal and bathtub. It can be shown that this extension of the Weibull can be achieved through a simple application of the probability integral transform, using the densities of the beta and two-parameter Weibull distributions. Mudholkar, Srivastava, and Kollia [6] proposed another generalization of the Weibull model, which is able to generate hazard shapes similar to those of the exponentiated model; however, irregularities may arise because the support of the distribution depends on the parameter values. A four-parameter generalization of the Weibull distribution was introduced by Jeong [7] based on stable distributions proposed by Hougaard [8], induced under the semiparametric frailty model. This model can also be viewed as a generalization of the model of Mudholkar et al. [6].

A decade has passed since the exponentiated Weibull was proposed; however, only a limited amount of literature is available on its application to survival data. This article introduces the family of beta-Weibull distributions, which encompasses the regular two-parameter Weibull and the exponentiated Weibull distributions. It focuses on the potential use of the beta-Weibull distribution as a model for survival data in the presence of competing risks, in comparison with other generalizations of the Weibull model.

In the analysis of breast cancer data, it is often of interest to investigate only a subset of first events. For example, investigators may be interested in local or regional recurrence of the original breast cancer, while treating other events, such as distant recurrence, new primary cancers other than breast, or deaths prior to any disease, as competing events [9]. In this case, a competing risks situation arises, i.e. local or regional events against other events. It is well known that the cumulative incidence function correctly estimates the proportion of local or regional recurrences as a function of time in the presence of other events [5, 10]. In this paper, the cumulative incidence function will be parameterized using various existing Weibull distributions along with the proposed beta-Weibull distribution, and the resulting estimates will be compared. These parametric approaches will be illustrated on a data set from one of the phase III breast cancer clinical trials performed by the National Surgical Adjuvant Breast and Bowel Project (NSABP).

The article is organized as follows. In Section 2, the beta-Weibull family of distributions is introduced along with an overview of existing Weibull-based families. In Section 3, maximum likelihood estimation and corresponding inferential procedures are described for the generalized Weibull models in the setting of competing risks. In Section 4, these models are fitted to a previously analyzed breast cancer data set, and the performance of the Weibull-based approaches is evaluated using conventional model assessment methods. We conclude with a few remarks in Section 5.

The classic two-parameter Weibull distribution has the probability density function (pdf)

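$$f(x; \beta, \gamma) = \frac{\gamma}{\beta}\left(\frac{x}{\beta}\right)^{\gamma-1} e^{-(x/\beta)^{\gamma}}, \qquad x > 0,\ \beta > 0,\ \gamma > 0, \tag{1}$$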

with its cumulative distribution function being

$$F(x; \beta, \gamma) = 1 - e^{-(x/\beta)^{\gamma}}. \tag{2}$$

While this convenient distribution can be easily incorporated into practical tools for survival analysis such as proportional hazards and accelerated failure time models, only monotonically increasing or decreasing hazard shapes can be properly estimated. To overcome this limitation, several generalized versions of the classic two-parameter Weibull model have been proposed. In this section we describe two such models that require at most two extra parameters to be estimated.

A three-parameter generalization [6] of the Weibull distribution is defined by the survival function

(3)

and the corresponding hazard function

The regular case of this generalized Weibull distribution occurs when λ ≥ 0 with the density having support (0, ∞), generating monotonically decreasing or unimodal hazard functions. In particular, the distribution approaches the two-parameter Weibull as λ → ∞. When λ < 0, monotonically increasing or bathtub hazard shapes are generated, but the support of the density becomes parameter-dependent, within the range (0, β(−λ)^{α}). As with other non-regular densities, in these situations the likelihood may become unbounded, so that maximum likelihood estimates may not exist; even when they do exist, the classical asymptotic theory may not apply to them [11].

A four-parameter Weibull model was proposed by Jeong [7]. This generalization of the Weibull distribution adds a parameter τ to the three-parameter version proposed by Mudholkar et al. [6] discussed above. It can be characterized by the survival function

with the corresponding hazard function

This distribution reduces to Mudholkar’s generalized Weibull distribution as τ approaches zero. As one would expect, it covers a wider range of shapes than the three-parameter extension and is regular, regardless of the parameter values.

It has been demonstrated by Wahed [12] and Ferreira and Steel [13] that any parametric family of distributions can be incorporated into larger families through an application of the probability integral transform. Specifically, given two valid probability density functions (pdf) *g*_{1}(·) and *g*_{2}(·), with the latter having support on the unit interval, a third valid density function *f*(·) may be obtained by applying the equation

$$f(x) = g_{2}\big(G_{1}(x)\big)\, g_{1}(x), \tag{4}$$

where *G*_{1}(·) is the cumulative distribution function (cdf) corresponding to *g*_{1}(·). Using this technique, we obtain yet another generalization of the two-parameter Weibull distribution as described below.
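The construction in (4) can be sketched in a few lines of Python. This is only an illustration under our reading of (4) as *f*(*x*) = *g*_{2}(*G*_{1}(*x*)) *g*_{1}(*x*); the function names (`transform_pdf`, `midpoint_integral`) and the parameter values are hypothetical, and the beta density is set to zero outside the open unit interval purely as a numerical convenience.

```python
import math

def transform_pdf(g1, G1, g2):
    """Return f(x) = g2(G1(x)) * g1(x), the density obtained from the
    probability integral transform (g2 must live on the unit interval)."""
    def f(x):
        return g2(G1(x)) * g1(x)
    return f

def weibull_pdf(x, beta, gamma):
    # Two-parameter Weibull density, scale beta and shape gamma, for x > 0.
    return (gamma / beta) * (x / beta) ** (gamma - 1.0) * math.exp(-((x / beta) ** gamma))

def weibull_cdf(x, beta, gamma):
    return -math.expm1(-((x / beta) ** gamma))

def beta_pdf(u, a, b):
    # Beta(a, b) density; returns 0 outside (0, 1) (numerical convenience).
    if u <= 0.0 or u >= 1.0:
        return 0.0
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1.0) * math.log(u) + (b - 1.0) * math.log1p(-u) - log_b)

def midpoint_integral(f, lo, hi, n=20000):
    # Midpoint rule; avoids evaluating f at the endpoint x = 0.
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))

# Hypothetical parameter values, chosen only for illustration.
BETA, GAMMA = 2.0, 1.5
base_pdf = lambda x: weibull_pdf(x, BETA, GAMMA)
base_cdf = lambda x: weibull_cdf(x, BETA, GAMMA)

skewed = transform_pdf(base_pdf, base_cdf, lambda u: beta_pdf(u, 2.0, 3.0))
unchanged = transform_pdf(base_pdf, base_cdf, lambda u: beta_pdf(u, 1.0, 1.0))

total_mass = midpoint_integral(skewed, 0.0, 20.0)
```

With a uniform *g*_{2} (a beta density with both parameters equal to 1) the base density is returned unchanged, and the transformed density should still integrate to approximately one.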

To obtain the appropriate extension of the Weibull distribution, we considered *g*_{1}(·) as the two-parameter Weibull pdf given by (1) and *g*_{2}(·) as a two-parameter beta distribution with pdf

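$$g_{2}(y) = \frac{1}{B(\alpha_{1}, \alpha_{2})}\, y^{\alpha_{1}-1}(1-y)^{\alpha_{2}-1}, \qquad 0 < y < 1. \tag{5}$$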

Using (1) and (5) in (4) we obtain a new probability density function

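$$f(x; \alpha_{1}, \alpha_{2}, \beta, \gamma) = \frac{1}{B(\alpha_{1}, \alpha_{2})}\,\frac{\gamma}{\beta}\left(\frac{x}{\beta}\right)^{\gamma-1}\left[1 - e^{-(x/\beta)^{\gamma}}\right]^{\alpha_{1}-1} e^{-\alpha_{2}(x/\beta)^{\gamma}}, \tag{6}$$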

where 0 < *x* < ∞, 0 < α_{1}, α_{2}, γ, β < ∞ and *B*(α_{1}, α_{2}) = Γ(α_{1})Γ(α_{2})/Γ(α_{1}+α_{2}). We will refer to this distribution as the beta-Weibull distribution throughout the article. The density given by (6) is a proper density and, when α_{1} = α_{2} = 1, it reduces to the two-parameter Weibull density with scale parameter β and shape parameter γ. This model is generated entirely from two commonly used probability distributions, beta and Weibull, and therefore can be easily implemented using conventional statistical software packages. This process of using the beta distribution to generate a new class of distributions has been previously considered by Jones [14]. The survival function of the beta-Weibull distribution cannot be expressed in closed form; however, it may be evaluated by computing the regularized beta function at the cumulative distribution function of the Weibull distribution. Specifically, the survival function of the beta-Weibull distribution (6) is given by

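$$S(x; \alpha_{1}, \alpha_{2}, \beta, \gamma) = 1 - \frac{B\big(1 - e^{-(x/\beta)^{\gamma}};\ \alpha_{1}, \alpha_{2}\big)}{B(\alpha_{1}, \alpha_{2})}, \tag{7}$$

where

$$B(y; \alpha_{1}, \alpha_{2}) = \int_{0}^{y} t^{\alpha_{1}-1}(1-t)^{\alpha_{2}-1}\, dt \tag{8}$$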

is the incomplete beta function. The corresponding hazard function is obtained as

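$$h(x; \alpha_{1}, \alpha_{2}, \beta, \gamma) = \frac{f(x; \alpha_{1}, \alpha_{2}, \beta, \gamma)}{S(x; \alpha_{1}, \alpha_{2}, \beta, \gamma)}, \tag{9}$$

with *f* given by (6) and *S* by (7).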

The survival and hazard functions are plotted in Figures 1 and 2, respectively. The hazard plots show that this model can generate unimodal, v-shaped and bathtub-shaped hazards. The parameters α_{1} and α_{2} are related to the beta component of the distribution; as a result, the mean and variance of the distribution increase as α_{1} increases, as for the beta distribution. Similarly, the survival probability is higher for larger α_{1}. The shape of the hazard function is influenced by the parameters α_{1} and γ, while β remains a scale parameter. The second parameter from the beta distribution, α_{2}, also primarily acts as a scale parameter, as it is absorbed into the Weibull function through the exponent term *e*^{−(x/β)^{γ}}.
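As a rough illustration of how the beta-Weibull density, survival and hazard functions might be evaluated numerically, the sketch below uses only the Python standard library. It assumes the density has the usual beta-Weibull form (a beta density evaluated at the Weibull cdf, multiplied by the Weibull pdf) and computes the survival function as a regularized incomplete beta function evaluated at *e*^{−(x/β)^{γ}} with the two beta parameters swapped, which is algebraically the same as one minus the regularized incomplete beta function at the Weibull cdf. The continued-fraction routine is a textbook method that we substitute here for a library call; all function names are hypothetical.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    # Continued fraction for the incomplete beta function (modified Lentz method).
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def beta_weibull_pdf(x, a1, a2, beta, gamma):
    # Density evaluated on the log scale for x > 0.
    z = (x / beta) ** gamma
    log_u = math.log(-math.expm1(-z))          # log of the Weibull cdf
    log_b = math.lgamma(a1) + math.lgamma(a2) - math.lgamma(a1 + a2)
    return math.exp(math.log(gamma / beta) + (gamma - 1.0) * math.log(x / beta)
                    + (a1 - 1.0) * log_u - a2 * z - log_b)

def beta_weibull_surv(x, a1, a2, beta, gamma):
    # 1 - I_u(a1, a2) with u the Weibull cdf, written as I_{1-u}(a2, a1).
    return reg_inc_beta(a2, a1, math.exp(-((x / beta) ** gamma)))

def beta_weibull_hazard(x, a1, a2, beta, gamma):
    return beta_weibull_pdf(x, a1, a2, beta, gamma) / beta_weibull_surv(x, a1, a2, beta, gamma)
```

Setting both beta parameters to 1 should recover the two-parameter Weibull survival function, and setting only the second to 1 should recover the exponentiated Weibull, which gives two simple checks of the sketch.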

The skewing mechanism based on the beta distribution has also been considered in other settings. For example, Ferreira and Steel [15] used a restricted version of the beta distribution to obtain skewed versions of multivariate distributions where they imposed the restriction α_{2} = 1/α_{1} on the beta-parameters. We impose the same constraint on (6) to obtain what we will refer to as a restricted beta-Weibull distribution.

Notice that when α_{2} = 1, the beta-Weibull distribution defined by (6) reduces to

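$$f(x; \alpha_{1}, \beta, \gamma) = \alpha_{1}\,\frac{\gamma}{\beta}\left(\frac{x}{\beta}\right)^{\gamma-1}\left[1 - e^{-(x/\beta)^{\gamma}}\right]^{\alpha_{1}-1} e^{-(x/\beta)^{\gamma}}, \tag{10}$$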

which is the so-called exponentiated Weibull distribution [1, 16]. For this specific case, both the survival function and the hazard function are available in closed form and are given by

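$$S(x; \alpha_{1}, \beta, \gamma) = 1 - \left[1 - e^{-(x/\beta)^{\gamma}}\right]^{\alpha_{1}} \tag{11}$$

and

$$h(x; \alpha_{1}, \beta, \gamma) = \frac{\alpha_{1}\,\dfrac{\gamma}{\beta}\left(\dfrac{x}{\beta}\right)^{\gamma-1}\left[1 - e^{-(x/\beta)^{\gamma}}\right]^{\alpha_{1}-1} e^{-(x/\beta)^{\gamma}}}{1 - \left[1 - e^{-(x/\beta)^{\gamma}}\right]^{\alpha_{1}}}, \tag{12}$$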

respectively.

Setting α = α_{1} for notational convenience, the derivative of the hazard function with respect to *x* can be expressed as

(13)

where

(14)

Notice that *m*(*x*; α, β, γ) is non-negative for all values of *x* and of the parameters.

As discussed in Mudholkar and Srivastava [1], for α < 1, the hazard shapes are unimodal if α < 1/γ and monotonically decreasing otherwise; likewise when α > 1 bathtub shapes are generated for α > 1/γ, with monotonically increasing shapes otherwise. The hazard function is monotonically increasing when α ≥ 1, γ > 1, while α ≤ 1, γ < 1 implies a monotonically decreasing function; when both parameters are equal to 1, the distribution reduces to the exponential distribution (constant hazard). However, when one of the parameters is greater than 1 and the other is less than 1, more shapes can be generated. If α > 1 and γ < 1, a unimodal hazard shape is generated if α > 1/γ and is monotonically decreasing otherwise; if α < 1 and γ > 1, bathtub shapes are generated when α < 1/γ < 1 and are monotonically increasing otherwise. Behavior of the hazard function with respect to the parameters is depicted in Figure 3 for values of α less than 1 (left) and greater than 1 (right). Similarly, the hazard shapes for different values of the shape parameter γ for α < 1 (left) and α > 1 (right) are plotted in Figure 4. Different types of hazard shapes may be obtained by altering this parameter while keeping the other two parameters fixed.
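A numerical check of these hazard shapes is straightforward. The sketch below evaluates the exponentiated Weibull hazard (density divided by survival function) on a grid and labels the shape by the sign pattern of successive differences. The grid, the tolerance and the function names are our own assumptions, and the parameter values are taken from the settings mentioned in the captions of Figures 3 and 4 (β = 2.0 with γ = 0.5 or 2.0).

```python
import math

def ew_hazard(x, alpha, beta, gamma):
    """Exponentiated Weibull hazard f(x)/S(x) for x > 0, computed on the log scale."""
    z = (x / beta) ** gamma
    log_u = math.log1p(-math.exp(-z))            # log of the Weibull cdf
    log_f = (math.log(alpha * gamma / beta) + (gamma - 1.0) * math.log(x / beta)
             + (alpha - 1.0) * log_u - z)
    surv = -math.expm1(alpha * log_u)            # 1 - [Weibull cdf]^alpha
    return math.exp(log_f) / surv

def hazard_shape(alpha, beta, gamma, x_max=10.0, n=1000, tol=1e-12):
    """Classify the hazard on (0, x_max] by the sign changes of its increments."""
    xs = [x_max * k / n for k in range(1, n + 1)]
    h = [ew_hazard(x, alpha, beta, gamma) for x in xs]
    signs = []
    for prev, cur in zip(h, h[1:]):
        diff = cur - prev
        if abs(diff) > tol:
            s = 1 if diff > 0 else -1
            if not signs or signs[-1] != s:
                signs.append(s)
    labels = {(1,): "increasing", (-1,): "decreasing",
              (1, -1): "unimodal", (-1, 1): "bathtub"}
    return labels.get(tuple(signs), "other")

shapes = {
    "alpha=4.0, gamma=0.5": hazard_shape(4.0, 2.0, 0.5),
    "alpha=0.2, gamma=2.0": hazard_shape(0.2, 2.0, 2.0),
    "alpha=1.0, gamma=2.0": hazard_shape(1.0, 2.0, 2.0),
    "alpha=1.0, gamma=0.5": hazard_shape(1.0, 2.0, 0.5),
}
```

A grid-based classification of this kind only describes the hazard on the chosen interval, so a turning point outside (0, `x_max`] would go undetected.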

Hazard shapes for exponentiated Weibull, with varying α and fixed β = 2.0, for γ = 0.5 (left), and γ = 2.0 (right).

Hazard functions of exponentiated Weibull distribution for varying γ and fixed β = 2.0, for α = 0.2 (left), and α = 2.0 (right). Three types of hazard shapes are generated in each graph, with respect to γ only.

While the three-parameter generalized Weibull (Section 2.1) and the exponentiated Weibull distribution reduce to the two-parameter Weibull distribution as λ approaches ∞ and as α approaches 1, respectively, there is no clear link between the parameters that comprise these distribution functions. This is apparent from the behavior of the respective hazard functions; λ < 0 implies either a monotonically increasing or bathtub shape for the generalized Weibull, while α > 1 may result in a monotone hazard function that is increasing or decreasing, or a unimodal function. Similarly, λ > 0 produces a monotonically decreasing or unimodal shape for the generalized Weibull, whereas if 0 < α < 1 for the exponentiated Weibull, a bathtub shape is possible, along with monotonically decreasing or increasing shapes. The exponentiated Weibull is a generalization of the Burr Type X distribution, with distribution function *F*(*x*) = [1 − *e*^{−(x/β)^{2}}]^{α}, while the family of the three-parameter generalized Weibull includes the Burr Type XII family of distributions [17] as a special case.

When multiple cause-specific events, or competing risks, are present, the cumulative incidence function correctly assesses the cumulative probability of a particular cause-specific event while taking into account other types of events [5]. This prevents the cumulative probability of a sub-distribution of cause specific events of interest from being overestimated, which would occur if failures due to other causes are considered as censored [10]. For data consisting of only two types of cause-specific events, the cumulative incidence function for cause-specific events of type 1, under the assumption of cumulative hazards, is defined as

$$F_{1}(t; \phi) = \int_{0}^{t} f_{1}(u; \phi_{1})\, S_{2}(u; \phi_{2})\, du, \tag{15}$$

where *S*_{2}(·, ϕ_{2}) and *f*_{1}(·, ϕ_{1}) are the survival function for events of type 2 and the probability density function for events of type 1, respectively. In this paper, we will focus on parameterizing *S*_{2}(·, ϕ_{2}) and *f*_{1}(·, ϕ_{1}) via Weibull-based models to estimate the cumulative incidence function.
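To make the role of *f*_{1} and *S*_{2} concrete, the following sketch evaluates a cumulative incidence function of the form ∫_{0}^{t} *f*_{1}(*u*) *S*_{2}(*u*) d*u* by the midpoint rule. The integral form is our reading of the definition above, and the two-parameter Weibull components, the parameter values and the function names are illustrative assumptions only.

```python
import math

def weibull_pdf(t, beta, gamma):
    return (gamma / beta) * (t / beta) ** (gamma - 1.0) * math.exp(-((t / beta) ** gamma))

def weibull_surv(t, beta, gamma):
    return math.exp(-((t / beta) ** gamma))

def cumulative_incidence(t, f1, s2, n=20000):
    """Midpoint-rule approximation of the integral of f1(u) * s2(u) over (0, t)."""
    step = t / n
    return step * sum(f1((i + 0.5) * step) * s2((i + 0.5) * step) for i in range(n))

# Hypothetical exponential special case (gamma = 1) with rates 0.1 and 0.2,
# for which the integral is available in closed form.
rate1, rate2, horizon = 0.1, 0.2, 5.0
cif_numeric = cumulative_incidence(
    horizon,
    lambda u: weibull_pdf(u, 1.0 / rate1, 1.0),
    lambda u: weibull_surv(u, 1.0 / rate2, 1.0),
)
cif_exact = rate1 / (rate1 + rate2) * (1.0 - math.exp(-(rate1 + rate2) * horizon))
```

In the exponential case the closed form is λ_{1}/(λ_{1}+λ_{2}) · (1 − *e*^{−(λ_{1}+λ_{2})t}), which provides a convenient check of the numerical integration; any of the Weibull-based densities and survival functions could be passed in place of the two lambdas.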

The parameter vector ϕ = (ϕ_{1}, ϕ_{2}) of a given Weibull-based model can be estimated from a given data set by the maximum likelihood method. For the cause-specific event of type *k* (*k* = 1, 2), the observed data for the *i*th individual is denoted by (*x*_{i}, δ

(16)

For all the Weibull-based models described in the previous section, the likelihood function (16) cannot be maximized analytically. Thus, an iterative procedure such as the Newton-Raphson method needs to be used to obtain maximum likelihood estimates for the parameter vector ϕ_{k} for *k*th cause-specific event. We have used the S-Plus function *nlminb* to minimize −*l _{k}*(ϕ
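As a small illustration of iterative maximum likelihood fitting with censored data, the sketch below fits the two-parameter Weibull model only. It assumes the standard right-censored log-likelihood (log-density for observed events, log-survival for censored observations) and replaces the general-purpose optimizer with a simpler device: the scale is profiled out in closed form and the shape is found by bisection on the profile score equation. The data values and function names are hypothetical.

```python
import math

def weibull_loglik(times, events, beta, gamma):
    """Right-censored Weibull log-likelihood (event indicator 1 = observed, 0 = censored)."""
    ll = 0.0
    for x, d in zip(times, events):
        z = (x / beta) ** gamma
        if d:
            ll += math.log(gamma / beta) + (gamma - 1.0) * math.log(x / beta) - z
        else:
            ll -= z
    return ll

def fit_weibull(times, events, tol=1e-10):
    """Maximum likelihood estimates (beta, gamma) via bisection on the profile score."""
    x_max = max(times)
    y = [x / x_max for x in times]               # rescale to avoid overflow in powers
    n_events = sum(events)
    mean_log = sum(math.log(v) for v, d in zip(y, events) if d) / n_events

    def score(g):
        a = sum(v ** g * math.log(v) for v in y)
        b = sum(v ** g for v in y)
        return a / b - 1.0 / g - mean_log

    lo = hi = 1.0
    while score(lo) > 0.0:                       # bracket the root from below
        lo /= 2.0
    while score(hi) < 0.0:                       # and from above
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    gamma = 0.5 * (lo + hi)
    beta = x_max * (sum(v ** gamma for v in y) / n_events) ** (1.0 / gamma)
    return beta, gamma

# Hypothetical follow-up times (years) and event indicators.
times = [0.5, 1.2, 1.9, 2.3, 3.1, 3.8, 4.4, 5.0]
events = [1, 1, 0, 1, 1, 0, 1, 0]
beta_hat, gamma_hat = fit_weibull(times, events)
```

For the models with more parameters no such reduction to one dimension is available, and a general numerical optimizer applied to the negative log-likelihood would be needed instead.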

The corresponding estimators for the cumulative incidence and survival functions are obtained by substituting the parameter estimates into the respective functions. For instance, the cumulative incidence function (15) is estimated by

$$\hat{F}_{1}(t) = \int_{0}^{t} f_{1}(u; \hat{\phi}_{1})\, S_{2}(u; \hat{\phi}_{2})\, du. \tag{17}$$

The standard errors of the estimated cumulative incidence function and the survival probability are then obtained through an application of the multivariate delta method. For instance, an estimate of the approximate variance of the estimated cumulative incidence function *F*_{1}(·) is given by (see [7] for derivation)

(18)
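The multivariate delta method approximates the variance of a smooth function *g* of the parameter estimates by the quadratic form of its gradient with the estimated covariance matrix. The sketch below is a generic version that uses a central finite-difference gradient rather than the analytic derivatives of [7]; the function name, the step size and the numerical values are hypothetical.

```python
import math

def delta_method_variance(g, theta_hat, cov, rel_step=1e-6):
    """Approximate Var[g(theta_hat)] as grad' * cov * grad, with a numerical gradient."""
    k = len(theta_hat)
    grad = []
    for j in range(k):
        h = rel_step * max(1.0, abs(theta_hat[j]))
        up, down = list(theta_hat), list(theta_hat)
        up[j] += h
        down[j] -= h
        grad.append((g(up) - g(down)) / (2.0 * h))   # central difference
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))

# Hypothetical example: Weibull survival probability at t = 5 with
# made-up estimates (beta, gamma) and a made-up covariance matrix.
def surv_at_5(theta):
    beta, gamma = theta
    return math.exp(-((5.0 / beta) ** gamma))

theta_hat = [12.0, 1.3]
cov_hat = [[0.90, 0.02], [0.02, 0.01]]
se_surv = math.sqrt(delta_method_variance(surv_at_5, theta_hat, cov_hat))
```

The same function could be applied to a numerically integrated cumulative incidence function, at the cost of one extra integral evaluation per parameter perturbation.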

Equality of survival or cumulative incidence functions between independent treatment groups can then be tested using the Wald test. For testing the equality of two parametric functions θ^{(1)}(*t*) and θ^{(2)}(*t*) from two different treatments at a specific time point *t*, the appropriate test statistic is

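$$Z = \frac{\hat{\theta}^{(1)}(t) - \hat{\theta}^{(2)}(t)}{\sqrt{\widehat{\mathrm{Var}}\big(\hat{\theta}^{(1)}(t)\big) + \widehat{\mathrm{Var}}\big(\hat{\theta}^{(2)}(t)\big)}}, \tag{19}$$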

where *Z* ~ *N*(0, 1). For instance, the appropriate Wald test statistic to compare two cumulative incidence functions at time *t* will have the form

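$$Z = \frac{\hat{F}_{1}^{(1)}(t) - \hat{F}_{1}^{(2)}(t)}{\sqrt{\widehat{\mathrm{Var}}\big(\hat{F}_{1}^{(1)}(t)\big) + \widehat{\mathrm{Var}}\big(\hat{F}_{1}^{(2)}(t)\big)}}. \tag{20}$$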
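A Wald comparison of two independent estimates at a fixed time point reduces to a few lines of arithmetic. The sketch below assumes the usual form of the statistic for independent groups (difference of estimates divided by the square root of the summed variances) and a two-sided normal p-value; the input numbers are made up for illustration and are not taken from the study.

```python
import math

def wald_z(est1, se1, est2, se2):
    """Wald statistic for the difference of two independent estimates."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical cumulative incidence estimates and standard errors at one time point.
z_stat = wald_z(0.10, 0.020, 0.06, 0.015)
p_value = two_sided_p(z_stat)
```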

We fitted the models to a data set from a breast cancer study (Protocol B-20) performed by the National Surgical Adjuvant Breast and Bowel Project (NSABP) [19]. The study was designed to test whether the addition of chemotherapy to tamoxifen would result in increased disease-free survival compared to tamoxifen alone. Here the disease-free survival events include breast cancer recurrences in local, regional, or distant sites, second primary cancers other than breast, or death prior to the previously mentioned events, whichever occurs first. In this section, the cause-specific events of interest will be the local or regional recurrences, and the other competing events consist of distant recurrences, second primaries, and death without evidence of disease (see Figure 5). A total of 2,363 patients were randomized to tamoxifen, CMF (a chemotherapy containing the alkylating agent cyclophosphamide) or CMFT (CMF plus tamoxifen). The sample consists of 770 of the patients in the tamoxifen group and 766 in the CMFT group. We present our data only up to the first 10 years, as the risk sets become sparse afterwards.

Maximum likelihood estimates for the parameters from different model fits are displayed in Table I for the cause-specific models, and Table II for the disease-free model. The Akaike Information Criterion (AIC) is also provided. For the 2-parameter, 3-parameter, 4-parameter and exponentiated versions of the Weibull model, the standard errors were obtained by inverting the matrix of minus the second derivatives of the log-likelihoods, evaluated at the MLEs. Bootstrapping, using 1000 replicates, was used to estimate the standard errors for the restricted and full beta-Weibull models, for which the second derivatives of the log-likelihood were not available in closed form.
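The two model-assessment devices mentioned here, the AIC and bootstrap standard errors, can be sketched generically as follows. The AIC formula is the standard one (twice the number of parameters minus twice the maximized log-likelihood). The bootstrap routine is a plain non-parametric resampling scheme with a user-supplied estimator, which is our assumption about the kind of bootstrap used; the data and function names are hypothetical.

```python
import random
import statistics

def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 * (maximized log-likelihood)."""
    return 2.0 * n_params - 2.0 * loglik

def bootstrap_se(data, estimator, n_boot=1000, seed=0):
    """Standard deviation of an estimator over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    return statistics.stdev(replicates)

# Hypothetical illustration: standard error of a sample mean.
toy_data = list(range(1, 21))
se_mean = bootstrap_se(toy_data, statistics.mean)
```

For a parametric survival model the estimator passed to `bootstrap_se` would refit the model to each resample of (time, event indicator) pairs and return the parameter or function value of interest.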

Maximum likelihood estimates of Weibull parameters from breast cancer data (all disease-free events)

Within the tamoxifen group, the 4-parameter Weibull model had the best fit as assessed by the AIC, followed by the restricted and unrestricted beta-Weibull models, although there was hardly any difference between the AICs for the latter two models. The two-parameter Weibull model had the largest AIC. By contrast, in the CMFT group there were no significant differences among the fits, regardless of the number of parameters in the model.

The estimated overall hazard function from each of the six model fits is displayed in Figure 6 for each type of the competing events and in Figure 7 for all disease-free events. The non-parametric hazard estimates are life table estimates of the annual hazard rate at the midpoint of each year. Model fits seem generally better for the other events, as local or regional recurrences were relatively infrequent and had more variable hazard shapes (see more on model adequacy in Section 4.2).

Estimated event-specific hazard functions; a non-parametric fit (straight lines) and fitted curves from each of the 6 Weibull models, for local or regional recurrences (left) and other events (right), and the tamoxifen (top) and CMFT (bottom) groups.

Estimated hazard functions for all disease-free events, from a non-parametric fit (straight lines) and fitted curves from each of the six Weibull models, for the tamoxifen (left) and CMFT (right) groups.

The generalized Weibull models, including the beta-Weibull versions, were able to capture more complex unimodal hazard shapes (local or regional recurrences) as well as the basic monotone shapes (other events). Within the tamoxifen group, according to the non-parametric fit there was an apparent peak at 2–3 years which leveled off after 10 years. This peak was inadequately captured by the two-parameter model, which only produced monotonic shapes; the Weibull extensions produced unimodal shapes that captured this peak. The curves from the exponentiated model were closest to those from the two-parameter model, tending to produce a more conservative peak at 2–3 years than those from the other four extensions of the Weibull. The hazards from the restricted version of the beta-Weibull tended to follow closely those from the full beta-Weibull model. It is interesting to observe that, as the number of parameters increases, a model tends to pick up the peak earlier in this dataset.

There did not appear to be a similar single peak for the observed events in the CMFT group. As such, monotonically increasing hazard shapes from the two-parameter Weibull model provided an adequate fit to the observed non-parametric curves. The three and four-parameter models very closely followed the two-parameter fit.

Model adequacy was investigated using signed deviance residuals with censored observations [20] to assess goodness of fit (Tables III and IV). These are obtained by calculating the expected number of events *e*_{y} during each year

(21)

The models tended to provide a relatively poor fit to the data for the endpoint of local or regional recurrences, in large part due to their relative scarcity and the irregular hazard shapes; the models provided better fits for the other events and when all events were grouped together to estimate the overall survival curve.

The cumulative incidence function was estimated using all six Weibull models for each type of competing events, i.e. local or regional recurrences and other events.

The non-parametric cumulative incidence was estimated through Gray’s method [21], while the parametric cumulative incidence functions were obtained through the invariance property of the maximum likelihood estimates from all six Weibull models. To demonstrate how closely the parametric cumulative incidence curves followed the non-parametric estimates, we present only the estimated cumulative incidence curves from the four-parameter Jeong model and the restricted beta-Weibull model in Figure 8 for each event type. We have chosen these two Weibull models as they had lower AIC values than the other models. Figure 9 shows the corresponding disease-free survival curves. There were hardly any differences in cumulative incidence or survival estimates between the two models. The fits from the Weibull models generally followed the non-parametric cumulative incidence curves closely. There is a significant reduction in local or regional recurrences in the CMFT group, but no apparent difference between the two treatment groups in other types of events (Table V). A formal Wald test also concluded that there was a significant improvement in disease-free survival in the CMFT group, which was mainly due to the reduction in local or regional recurrences. Both the Jeong model and the restricted beta-Weibull model resulted in the same conclusions.
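For readers who wish to reproduce a non-parametric benchmark, the sketch below implements the standard product-limit-type cumulative incidence estimator, in which each cause-specific increment is weighted by the overall event-free survival just before that time. We assume this is the estimator underlying the non-parametric curves; the coding of causes (0 for censored) and the toy data are hypothetical.

```python
def cumulative_incidence_np(times, causes, cause):
    """Non-parametric cumulative incidence for one cause.

    causes: 0 = censored, 1, 2, ... = type of first event.
    Returns a list of (event time, cumulative incidence) pairs.
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    surv = 1.0          # overall event-free survival just before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        j = i
        d_cause = d_all = 0
        while j < len(data) and data[j][0] == t:   # handle tied times together
            d_all += data[j][1] != 0
            d_cause += data[j][1] == cause
            j += 1
        if d_all:
            cif += surv * d_cause / at_risk
            surv *= 1.0 - d_all / at_risk
            curve.append((t, cif))
        at_risk -= j - i
        i = j
    return curve

# Toy data: five subjects, one censored at time 4.
toy_times = [1, 2, 3, 4, 5]
toy_causes = [1, 2, 1, 0, 1]
curve_cause1 = cumulative_incidence_np(toy_times, toy_causes, 1)
curve_cause2 = cumulative_incidence_np(toy_times, toy_causes, 2)
```

Treating the competing events as censored and using one minus the Kaplan-Meier estimate instead would overstate the cumulative probability, which is the point made in Section 3.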

Cumulative incidence function estimates and 95% confidence intervals, between tamoxifen (solid) and CMFT (dashed) groups, for local or regional recurrences (top) and other events (bottom) using the restricted beta-Weibull model (left panels) and 4-parameter **...**

Estimated disease-free probability in tamoxifen (solid) and CMFT (dashed) groups using the restricted beta-Weibull model (left) and 4-parameter Jeong model (right), along with the corresponding 95% confidence intervals. Non-parametric estimates (non-smooth **...**

In this article, we have presented a new generalization of the Weibull distribution, the beta-Weibull family, by applying the beta skewing mechanism to the classical two-parameter Weibull distribution. This generalization is easily obtained by transforming the two-parameter Weibull model through a beta distribution and hence can be implemented easily using standard statistical software packages.

The beta-Weibull, two-parameter, exponentiated and other existing generalized models were used to model competing risks and disease-free survival in data from a previously published breast cancer study. The two-parameter Weibull distribution produced simple monotone hazard shapes, as expected, that did not reflect the pattern of the hazard in the observed data. On the other hand, the extensions of the Weibull model were able to capture the observed hazard pattern more accurately. In terms of statistical significance of the treatment effect and model adequacy, all extended models led to similar conclusions, suggesting that the beta-Weibull family is a reasonable candidate for modeling survival data, including competing risks. One limitation of the beta-Weibull family is that the survival and hazard functions cannot be expressed in closed form, and hence numerical integration techniques were needed to evaluate them.

1. Mudholkar GS, Srivastava DK. Exponentiated Weibull family for analyzing bathtub failure-rate data. IEEE Transactions on Reliability. 1993;42:299–302.

2. Weibull WA. A statistical distribution of wide applicability. Journal of Applied Mechanics. 1951;18:293–297.

3. Miller RG. What price Kaplan-Meier? Biometrics. 1983;39:1077–1081.

4. Cox DR, Oakes D. Analysis of Survival Data. New York: Chapman & Hall Ltd; 1984.

5. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. New York: John Wiley & Sons; 1980. p. 321.

6. Mudholkar GS, Srivastava DK, Kollia GD. A generalization of the Weibull distribution with application to the analysis of survival data. Journal of the American Statistical Association. 1996;91:1575–1583.

7. Jeong JH. A new parametric family for modelling cumulative incidence function: application to breast cancer data. Journal of the Royal Statistical Society, Series A. 2006;169:289–303.

8. Hougaard P. Survival models for heterogeneous populations derived from stable distributions. Biometrika. 1986;73:387–396.

9. Taghian A, Jeong JH, Mamounas E, Anderson S, Bryant J, Deutsch M, Wolmark N. Patterns of locoregional failure in patients with operable breast cancer treated by mastectomy and adjuvant chemotherapy with or without tamoxifen and without radiotherapy: results from five National Surgical Adjuvant Breast and Bowel Project randomized clinical trials. Journal of Clinical Oncology. 2004;22:4237–4239.

10. Gooley T, Leisenring W, Crowley J, Storer B. Estimation of failure probabilities in the presence of competing risks: new representations of old estimators. Statistics in Medicine. 1999;18:695–706.

11. Smith RL. Maximum likelihood estimation in a class of nonregular cases. Biometrika. 1985;72:67–90.

12. Wahed AS. A general method of constructing extended families of distributions from an existing continuous class. Journal of Probability and Statistical Science. 2006;4:165–177.

13. Ferreira JT, Steel M. A constructive representation of univariate skewed distributions. Journal of the American Statistical Association, Theory and methods. 2006;101:823–829.

14. Jones MC. Families of Distributions arising from Distributions of Order Statistics. Test. 2004;13:1–43.

15. Ferreira JT, Steel M. Model comparison of coordinate-free multivariate skewed distributions with an application to stochastic frontiers. Journal of Econometrics. 2007;137:641–673.

16. Mudholkar GS, Srivastava DK, Freimer M. The exponentiated Weibull family: a reanalysis of the bus-motor-failure data. Technometrics. 1995;37:436–445.

17. Rodriguez RN. A guide to the Burr type XII distributions. Biometrika. 1977;64:129–134.

18. Thisted RA. Elements of Statistical Computing. New York: Chapman & Hall; 1988.

19. Fisher B, Dignam J, Wolmark N, DeCillis A, Emir B, Wickerham DL, Bryant J, Dimitrov N, Abramson N, Atkins J, Shibata H, Deschenes L, Margolese R. Tamoxifen and Chemotherapy for Lymph Node-Negative, Estrogen Receptor-Positive Breast Cancer. Journal of the National Cancer Institute. 1997;89(22):1673–1682.

20. Efron B. Logistic regression, survival analysis, and the Kaplan-Meier curve. Journal of the American Statistical Association. 1988;83:414–425.

21. Gray R. A class of k-sample tests for comparing the cumulative incidence of a competing risk. Annals of Statistics. 1988;16:1141–1154.
