Int J Ayurveda Res. 2011 Jan-Mar; 2(1): 62–64.
PMCID: PMC3157115

Revisiting survival analysis


The methods of survival analysis are required to analyze duration (time-to-event) data, but their use is restricted, possibly because of lack of awareness and the intricacies involved. These methods (including the clinical life table, Kaplan–Meier survival estimation, the Log-rank test, and survival analysis with covariates, such as Cox proportional hazards models or accelerated failure time models) deal with any duration from a defined start to a specific endpoint, not only “death”. The article titled “Understanding survival analysis: Kaplan–Meier estimate” (vol. 1, issue 4, pp 274-278) is very good. Although the Kaplan–Meier procedure and the subsequent Log-rank test are difficult to understand, clinicians and other researchers must know them, because these techniques (like all in this category) allow ‘serial intake’ and ‘serial dropout’. It is important to understand the concepts, and the authors have explained them excellently. It is the ‘concept’ and not the ‘calculations’ that should be given importance. Good software is now available for the calculations, so why not use it? One very useful program in the public domain (can be downloaded free of cost from http:/) is WHO-CDC's EPI-INFO.

However, it is necessary to understand ‘how these values are arrived at’, and for that the calculations should be done correctly. In that article, the first two tables (Tables 1 and 2) are correct. One suggestion is to add a column showing the number ‘censored’ (indicated with ‘*’), which helps in understanding ‘No. at risk’ (i.e. the numbers in the column headed ‘Live at the start of the day’), because No. at risk at start of next point = No. at risk at start of previous point – [No. died at/after previous point + No. censored at/after previous point]. Both these tables are re-calculated.

Table 1
(Kaplan–Meier survival probabilities - sample-I)
Table 2
(Kaplan–Meier survival probabilities - sample-II)
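
The ‘No. at risk’ rule stated above, together with the Kaplan–Meier product of conditional survival probabilities, can be sketched in a few lines of Python. This is only an illustration: the function name `kaplan_meier` and the counts used are hypothetical and are not taken from Tables 1 and 2. It assumes that subjects censored at a time point are still counted as at risk for deaths recorded at that same point, and that at least one subject is at risk at every listed point.

```python
def kaplan_meier(n_start, deaths, censored):
    """Return (at_risk, survival) lists, one entry per time point.

    deaths[i] and censored[i] are the counts at time point i.
    Assumption: subjects censored at a time point still count as
    at risk for the deaths recorded at that same point.
    """
    at_risk, survival = [], []
    n, s = n_start, 1.0
    for d, c in zip(deaths, censored):
        at_risk.append(n)
        s *= (n - d) / n   # conditional probability of surviving this point
        survival.append(s)
        n -= d + c         # No. at risk at the start of the next point
    return at_risk, survival


# Hypothetical counts, not the data of the article under discussion.
deaths = [1, 0, 2, 1]
censored = [0, 1, 0, 1]
at_risk, survival = kaplan_meier(10, deaths, censored)
print(at_risk)                          # [10, 9, 8, 6]
print([round(s, 4) for s in survival])  # [0.9, 0.9, 0.675, 0.5625]
```

Carrying the censored counts as a separate list plays the role of the suggested extra column: each ‘No. at risk’ value can be checked directly against the deaths and censorings of the previous point.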

The life table method of survival analysis is generally used for grouped or interval-censored data, where the exact duration is not known but only the interval is known (or where the duration is known but grouped for convenience). This method of data collection is generally adopted when the number of subjects is very large and periodic visits are more cost-effective than continuous observation. The usual life-table method assumes that dropouts (i.e. censoring) occur uniformly over the interval, so that subjects censored in an interval are treated as being at risk for half of it. The probability of survival for each interval is obtained conditional on surviving the preceding interval. The survival function is obtained by multiplying the successive conditional probabilities. The plot of the survival function against the end-point of each time interval, with the points joined by lines, is the survival curve.
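
A minimal sketch of the life-table calculation described above, under the usual actuarial convention that subjects censored in an interval count as at risk for half of that interval. The function name `life_table` and the interval counts are hypothetical, chosen only to show the arithmetic.

```python
def life_table(n_start, deaths, censored):
    """Return one row per interval:
    (entering, effective_at_risk, prob_death, cumulative_survival).

    Assumption: censoring is uniform within an interval, so each
    censored subject contributes half an interval of exposure.
    """
    rows = []
    n, s = n_start, 1.0
    for d, c in zip(deaths, censored):
        effective = n - c / 2.0              # effective number at risk
        q = d / effective if effective > 0 else 0.0
        s *= 1.0 - q                         # multiply conditional survival
        rows.append((n, effective, q, s))
        n -= d + c                           # number entering next interval
    return rows


# Hypothetical grouped data: 100 subjects followed over two intervals.
for row in life_table(100, deaths=[10, 8], censored=[4, 6]):
    print(row)
```

In the first interval the effective number at risk is 100 – 4/2 = 98, so the conditional probability of death is 10/98; 86 subjects enter the second interval.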

The idea behind the Log-rank test for the comparison of two life tables (or survival curves by the Kaplan–Meier method) is simple: if there were no difference between the two groups, the total deaths occurring at any time should split between the two groups in the ratio of the numbers at risk in the two groups at that time. For example, if the numbers at risk in the first and second groups (in some fixed interval or at some fixed time point) were 70 and 30, respectively, and 10 deaths occurred in that interval or at that time point in both groups together, we would expect 10 × 70/(70+30) = 7 deaths to have occurred in the first group and 10 × 30/(70+30) = 3 deaths to have occurred in the second group.

A similar calculation can be made at each time of death (in either group). Summing the results of all such calculations for the first group gives what is called the ‘extent of exposure’ (denoted E1), which represents the ‘expected number of deaths’ in the first group if the two groups had the same distribution of survival time. The ‘extent of exposure’ for the second group (denoted E2) can be obtained in the same way, or by subtracting, at each time point, the first group's expected deaths from the total observed deaths at that time point and then summing.
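
The summation just described can be sketched as follows. The function name `expected_totals` and the second time point are hypothetical; the first time point reuses the 70/30 split with 10 deaths from the example above.

```python
def expected_totals(time_points):
    """Sum expected deaths over all death times.

    time_points: list of (n1, n2, d) tuples giving the numbers at
    risk in groups 1 and 2 and the total deaths (both groups) at
    each time of death.
    """
    e1 = e2 = 0.0
    for n1, n2, d in time_points:
        e1_j = d * n1 / (n1 + n2)  # expected deaths in group 1 at this time
        e1 += e1_j
        e2 += d - e1_j             # remainder is expected in group 2
    return e1, e2


# First tuple is the worked example; the second is made up.
print(expected_totals([(70, 30, 10), (50, 50, 4)]))  # (9.0, 5.0)
```

The two totals add up to the 14 deaths observed overall, which is why E2 can also be obtained by subtraction.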

Let O1 and O2 denote the actual/observed numbers of deaths in the two groups, respectively. Since O1 + O2 = E1 + E2, E2 can even be calculated just by subtraction. The discrepancy between the O's and E's is measured by (O1 − E1)²/E1 + (O2 − E2)²/E2, which is called the ‘Log-Rank’ test statistic and follows a χ² distribution with 1 degree of freedom (this test is by Mantel–Cox, the other three being Breslow's generalized Wilcoxon, Tarone–Ware, and Peto–Prentice). Table 3 of that article displays the log-rank calculations. There are some errors in that table (for example, for time of event = 27, ‘Live at start of the day’ should be N = 41 and not 40 as shown, because five deaths occurred together with no ‘censored’ before that point, so N = 46 – 5 = 41). Fortunately these errors have not changed the conclusion; with borderline significance, however, such errors could change it. This table is also re-calculated and displayed below:

Table 3
Log-Rank test statistic calculations
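
For readers who want to check the final step by hand, the following sketch evaluates the statistic (O1 − E1)²/E1 + (O2 − E2)²/E2 and converts a χ² value with 1 degree of freedom to a P value using only the Python standard library. The function names and the observed/expected counts in the example are hypothetical; statistical packages may compute the Log-rank statistic with a variance-based formula instead, so their output need not agree exactly with this simple form.

```python
import math


def logrank_chi2(o1, e1, o2, e2):
    """Simple (O - E)^2 / E form of the Log-rank statistic."""
    return (o1 - e1) ** 2 / e1 + (o2 - e2) ** 2 / e2


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 d.f.

    Uses P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical totals satisfying O1 + O2 = E1 + E2 = 14.
stat = logrank_chi2(o1=12, e1=9.0, o2=2, e2=5.0)
print(round(stat, 4))          # 2.8
print(chi2_1df_pvalue(stat))   # P value for the hypothetical example
```

Applying `chi2_1df_pvalue` to a statistic of 0.3789 gives a P value of about 0.538.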

The last two columns in the table given in that article contain numbers and not probabilities as stated. Note that a few values in these columns are above one. The columns should have a heading like ‘Expected number of deaths in group ….’.

The ‘Log-rank’ test statistic yielded by EPI-INFO is 0.3789, with P = 0.5382. These figures are confirmed by a larger (but priced) software package, Bio-Medical Data Processing (BMDP). When software is available, why take on the burden of performing these complicated calculations?

Articles from International Journal of Ayurveda Research are provided here courtesy of Wolters Kluwer -- Medknow Publications