Survival analysis: part II – applied clinical data analysis

Article information

Korean J Anesthesiol. 2019;72(5):441-457

Publication date (electronic) : 2019 May 17

doi : https://doi.org/10.4097/kja.19183

Junyong In ¹

, Dong Kyu Lee^,²

¹Department of Anesthesiology and Pain Medicine, Dongguk University Ilsan Hospital, Goyang, Korea

²Department of Anesthesiology and Pain Medicine, Guro Hospital, Korea University School of Medicine, Seoul, Korea

Corresponding author: Dong Kyu Lee, M.D., Ph.D. Department of Anesthesiology and Pain Medicine, Guro Hospital, Korea University School of Medicine, 148 Gurodong-ro, Guro-gu, Seoul 08308, Korea Tel: +82-2-2626-3237, Fax: +82-2-2626-1438 Email: entopic@korea.ac.kr

Received 2019 May 2; Revised 2019 May 13; Accepted 2019 May 16.

Abstract

As a follow-up to a previous article, this review provides several in-depth concepts regarding a survival analysis. Also, several codes for specific survival analysis are listed to enhance the understanding of such an analysis and to provide an applicable survival analysis method. A proportional hazard assumption is an important concept in survival analysis. Validation of this assumption is crucial for survival analysis. For this purpose, a graphical analysis method and a goodnessof- fit test are introduced along with detailed codes and examples. In the case of a violated proportional hazard assumption, the extended models of a Cox regression are required. Simplified concepts of a stratified Cox proportional hazard model and time-dependent Cox regression are also described. The source code for an actual analysis using an available statistical package with a detailed interpretation of the results can enable the realization of survival analysis with personal data. To enhance the statistical power of survival analysis, an evaluation of the basic assumptions and the interaction between variables and time is important. In doing so, survival analysis can provide reliable scientific results with a high level of confidence.

Keywords: Cox regression; Extended Cox regression; Goodness of fit test; Log minus log plot; Proportional hazard assumption; Schoenfeld residual; Stratified Cox regression; Survival analysis; Time-dependent coefficient; Time-dependent Cox regression

Introduction

The previous article ‘Survival analysis: Part I – analysis of time-to-event’ introduced the basic concepts of a survival analysis [1]. To decrease the gap between the data from a clinical case and a statistical analysis, this article presents several extended forms of the Cox proportional hazards (CPH) model in-series.

The most important aspect of the CPH model is a proportional hazard assumption during the observation period. The hazard of an event occurring during an observation cannot always be remained constantly, and the hazard ratio cannot be maintained at a constant level. This is the main obstacle for a clinical data analysis using a CPH model.

The basic concepts required to understand and interpret the results of a survival analysis were covered in a previous article [1]. Part 2 of this article, described herein, focuses on the analytical methods applying clinical data and coping with problems that can occur during an analysis. Such methods for validating a proportional hazard assumption apply clinical data and several extended Cox models to overcome the problem of a violated proportional hazard assumption. This article also includes the R codes used for estimating several Cox models based on clinical data. For those familiar with a statistical analysis, the R codes can easily enable an extension of the Cox model estimation.¹⁾

Proportional Hazard Assumption

Refer to the previous article [1] for a description of diagnostic methods applied to a CPH model. Here, we consider only a proportional hazard assumption. A hazard is defined as the probability of an event occurring at a time point (t). The survival function of a CPH model is an exponential function, and the hazard ratio (λ) is constant during an observation; thus, a survival function is defined in the exponential form of the hazard ratio at a time point (equation 1) [1].

(1) s(t) = exp−λts(t): survival function based on the CPH modelt: specific time pointλ: hazard ratio

To estimate hazard ratio, which is included in the survival function, hazard function (h) is required and it contains a specific explanatory variable (X) which indicates a specific treatment or exposure to a specific circumstance. At the time point of t, the hazard function of the control group is defined as the basal hazard function (h₀ (t)), and hazard function of the treatment group as the combined form of the basal hazard function and a certain function with the explanatory variable (X). The hazard ratio is the value of the hazard functions of treatment over control groups (equation 2) [2].

(2) hC (t) = h0 (t)hT (t) = h0 (t) × expβX λ =hT (t)hc (t)=h0 (t) × expβXh0 (t)=expβXhC (t): Hazard function of control grouphT (t): Hazard function of treatment groupλ: Hazard ratioh0 (t): Baseline hazard function at time tt: specific time pointX: explanatory variableβ: coefficient for X

As shown in equation 2, the CPH model processes the analysis under the constant hazard ratio assumption with the explanatory variable, which is not affected by the time [3]. The hazard ratio remains constant, and the hazards of each group at any time point remain at a distance and never meet graphically during an observation. However, this does not guarantee the satisfaction of the proportional hazard assumption. In a clinical setting, one hazard could remain lower or higher than the others, and their ratio cannot be constant because the treatment effect may vary owing to various factors. Therefore, we need a statistical method to prove the satisfaction or violation of the proportional hazard assumption.

Validation of Proportional Hazard Assumption

There are three representative validation methods of a proportional hazard assumption. One is a graphical approach, another is using the goodness of fit (GOF), and the last is applying a time-dependent covariate [4,5].

Graphical analysis for validation of proportional hazard assumption

As mentioned in the previous article, a log minus log plot (LML plot) is one of the most frequently used methods for the validation of a proportional hazard assumption [1]. The log transformation is applied twice during a mathematical process for estimating the survival function. The first log transformation results in negative values because the probability values from the survival function lay between zero and 1, and such values should be made positive to conduct a second log transformation. The name of the LML plot implies this process. A survival function is the exponential form of a hazard ratio, and the hazard ratio is constituted with the hazard function, which is an exponential form of an explanatory variable. As a result of an LML transformation, the survival function is converted into a linear functional form, and the difference from the explanatory variable creates a distance on the y-axis at a time point. Ultimately, survival functions that are log transformed twice become parallel during the observation period. Deductively, two curves on an LML plot also become parallel, which indicates that the hazard ratio remains constant during the observation period [4].

There is a risk of subjective decision regarding the validation of a proportional hazard assumption using an LML plot because this method is based on a visual check. It is recommended that the interpretation be as conservative as possible except under strong evidence of a violation, including instances in which the curves are crossing each other or apparently meet. A continuous explanatory variable should be converted into a categorical variable of two or three levels to produce an LML plot. When doing so, the data thin out and a different result can be reached according to the criteria used for dividing the variable [5].

R codes for Kaplan–Meier survival analysis under the assumption of a proportional hazard

The sample data ‘Survival2_PONV.csv’ contains the imaginary data of 104 patients regarding the first onset time of postoperative nausea and vomiting (PONV). All patients received one of two types of antiemetics (Drugs A or B). The columns represent the patient number (No), types of antiemetics (Antiemetics), age (Age), body weight (Wt), amount of opioid used during anesthesia (Inopioid), the first PONV onset time (Time), and whether PONV occurred (PONV). To load such data into R software 3.5.2 (R Development Core Team, Vienna, Austria, 2018), the following code can be used. In this code, the location of the CSV file on the hard drive is ‘d:’, and users should adequately modify the path. This code provides the first several lines of data (Table 1).

Table 1.

First Three Data Imported as PONV.raw

# Read data

PONV.raw <- read.csv ("d:/Survival2_PONV.csv",

TRUE, sep = ","

)

# Check imported data

head(PONV.raw)

To conduct a survival analysis using R, two R packages are required, ‘survival’²⁾^, and ‘survminer’.³⁾ When these packages are not supplied as a default, manual installation is not difficult when using the command ‘install.packages(“package name”)’. These packages are then called.

#Load Package: survival, survminer

library (survival)

library (survminer)

Then, a Kaplan–Meier survival analysis is applied. The following code covers a Kaplan–Meier analysis, comparing the PONV using a log-rank test, and the LML plot introduced in part I of this article [1]. Small modifications of this code can enable a survival analysis with the user’s own data.

### Kaplan-Meier Estimation (KME)

#Add survival object

PONV.raw$Survobj <- with(PONV.raw,

Surv(Time, PONV == 1)

)

head (PONV.raw)

## Single KME. The log-log confidence interval is preferred.

km.one <- survfit(Survobj ~1, data = PONV.raw,

conf.type = "log-log")

# Result of KME

km.one

# Survival table

summary (km.one)

# Survival curve

ggsurvplot (km.one, data = PONV.raw,

conf.int = TRUE,

palette = "grey",

surv.median.line = "hv",

break.time.by = 4,

censor = TRUE,

legend = "none",

xlab = "Time (hour)",

risk.table = TRUE,

tables.height = 0.2,

tables.theme = theme_cleantable(),

risk.table.y.text = FALSE

)

R applies a Kaplan–Meier analysis using the new variable ‘Survobj’. The results of a Kaplan–Meier analysis and a survival table are presented in Table 2. Out of 104 patients, 63 patients suffered from PONV, and the median onset time was 10 h. A graphical presentation is also possible (Fig. 1). Here, ‘ggsurvplot’ produces survival curves with complex arguments, fine-tuning the argument options to draw intuitive graphs.

Table 2.

Results of Kaplan–Meier Estimation and Survival Table

Fig. 1.

Kaplan–Meier curve of overall survival status with sample data. A 95% confidence interval (estimated from a log hazard) is presented in the shadowed area. The dashed line indicates the median survival time.

The next code is for an estimation of the survival curves according to two antiemetics and conducting a log-rank test.

### KME by Antiemetics

km.antiemetics <- survfit (Survobj ~ Antiemetics,

data = PONV.raw,

conf.type = "log-log"

)

# Result of KME by Antiemetics

km.antiemetics

# Survival table of KME by Antiemetics

summary (km.antiemetics)

# KM estimation, log-rank test

survdiff ( formula = Surv(Time, PONV == 1)

~ Antiemetics,

data = PONV.raw

)

# Survival curve of KME by Antiemetics

ggsurvplot ( km.antiemetics, data = PONV.raw,

fun = "pct", pval = TRUE,

conf.int = TRUE, surv.median.line = "hv",

linetype = "strata", palette = "grey",

xlab="Time (hour)",

legend.title = "Antiemetics",

legend.labs = c("Drug A", "Drug B"),

legend = c(.1, .2), break.time.by = 4,

risk.table = TRUE,

tables.height = 0.2,

tables.theme = theme_cleantable(),

risk.table.y.text.col = TRUE,

risk.table.y.text = TRUE

)

Table 3 and Fig. 2 show the results of this code. Antiemetics are coded as 0 for Drug A and 1 for Drug B, namely ‘Antiemetics = 0’ and ‘1’ represent Drugs A and B, respectively in the Table and Figure.

Table 3.

Results of Kaplan–Meier Estimation between Antiemetics, Survival Tables of Two Antiemetics, and Comparison Results of Log-rank Test

Fig. 2.

Kaplan–Meier curves of two antiemetics with sample data. The P value is estimated based on a log-rank test. A 95% confidence interval (estimated from a log hazard) is presented in the shadowed area. The dashed lines indicate the median survival times of groups taking Drugs A and B. Drug A is coded as ‘Antiemetics = 0’ and Drug B is coded as ‘Antiemetics = 1’ in the original data.

As the interpretation of a log-rank test, the survival functions of two antiemetics are statistically different (P = 0.009), and the median PONV free time is 13 and 6 h for Drugs A and B, respectively.

The log-rank test is also based on the proportional hazard assumption, and an LML plot can be used to validate this assumption. The code for this process is as follows, and the output graph is shown in Fig. 3.⁴^,,⁵)

Fig. 3.

Log minus log plot of Kaplan–Meier estimation with log-rank test between two antiemetics. The two curves do not meet during the observation period, indicating the satisfaction of the proportional hazard assumption. The log-time scale is shown in the x-axis.

# LML plot

plot (survfit(Surv(Time, PONV == 1) ~ Antiemetics,

data = PONV.raw), fun = "cloglog")

The goodness of fir test (GOF test)

The second method for validating a proportional hazard assumption is a GOF test between the observed and estimated survival function values. This provides a P value and hence is a more objective method than a visual check [5].

A Schoenfeld residual test is a representative GOF test for validation of a proportional hazard assumption [6–8]. A Schoenfeld residual is the difference between explanatory variables observed in the real world and estimated using a CPH model for patients who experience an event. Thus, Schoenfeld residuals are calculated using all explanatory variables included in the model. If the CPH model includes two explanatory variables, the two Schoenfeld residuals come out for one patient at a time.⁶)

Because the hazard ratio is constant during the observation period (a proportional hazard assumption), Schoenfeld residuals are independent of time. A violation of the proportional hazard assumption may be suspected when the Schoenfeld residual plot presents a relationship with time. Also, a Schoenfeld residual test is possible under a null hypothesis of ‘there is no correlation between the Schoenfeld residuals and ranked event time’.⁷)

Schoenfeld residual tests cannot be used to validate a proportional hazard assumption in a Kaplan–Meier estimation because it is based on estimated values using the CPH model. A Schoenfeld residual test is lacking in terms of the statistical hypothesis testing process. Null hypothesis significance testing applies a statistical process to validate ‘no difference,’ and when the null hypothesis is not true under a significant level, an alternative hypothesis is true except for the probability of the significance level, that is, differences exist between comparatives within the probability of significance. A Schoenfeld residual test determines whether a proportional hazard assumption is violated based on the probability of the correlation statistics. Correlation statistics with a higher probability than the significance level result in a satisfaction of the proportional hazard assumption without null hypothesis testing. This method cannot guarantee sufficient evidence to reject a hypothesis, however. Furthermore, the P value is dependent on the sample size, and large sample size will produce a high significance with a minimal violation of the assumption; an apparent assumption violation may be insignificant with small sample size. Although a Schoenfeld residual test is more objective than an LML plot, the use of two methods simultaneously is recommended owing to the problems listed above [4,5,7].

R codes for the Cox proportional hazard regression model and GOF test

To estimate a CPH model, libraries used in a Kaplan–Meier analysis are also required. After importing the data and calling the required libraries, the CPH model can be estimated according to the antiemetics using the following code.

# Univariate Cox proportional hazard model

# for a single covariate

cph.antiemetics <- coxph(Surv(Time, PONV == 1) ~ Antiemetics

, data = PONV.raw

)

summary(cph.antiemetics)

Table 4 summarizes the results. The PONV incidence rate is 1.9471-fold higher (95% CI, 1.174–3.229, P = 0.010) in the drug B groups than in the drug A groups.

Table 4.

Results of the Cox Proportional Hazard Model Estimation Using Antiemetics with Sample Data

Survival2_PONV.csv has four covariates. A multivariate analysis is possible using these covariates with a CPH model. Multivariate analysis can estimate the most compatible model, including significant covariates, through regression diagnostic statistics. Still, several controversies remain [9], both directional stepwise selection methods are applied in this example.

# Multivariate Cox regression

cph.full <- coxph (Surv(Time, PONV == 1)

~ Antiemetics + Age + Wt + Inopioid

, data = PONV.raw

)

summary (cph.full)

# Variables selection

cph.selection <- step(

coxph(Surv(Time, PONV == 1)

~ Antiemetics + Age + Wt + Inopioid

, data = PONV.raw)

, direction = "both"

)

summary (cph.selection)

# Final model selected

cph.selected <- coxph(Surv(Time, PONV == 1)

~ Antiemetics + Inopioid

, data = PONV.raw

)

summary (cph.selected)

After examining the full model including all covariates (summary(cph.full)), the most compatible model is confirmed through a covariate selection (summary(cph.selection)), and a clean result is finally obtained (summary(cph.selected)). Table 5 shows the final model. According to the result, the PONV increment is estimated as 2.021-fold (95% CI, 1.217–3.358, P = 0.007) based on the antiemetics, and 1.013-fold (95% CI, 1.008–1.018, P < 0.001) based on intraoperative opioid usage.

Table 5.

Multivariate Cox Proportional Hazard Model with Sample Data

The next code draws survival curves against the antiemetics for the final model (Fig. 4).⁸)

Fig. 4.

Survival curves of antiemetics estimated using the Cox proportional hazards regression model. a solid black line indicates Drug A (Antiemetics = 0) and a solid grey line indicates Drug B (Antiemetic = 1). Dashed lines present a 95% CI range. Drug A is coded as ‘Antiemetics = 0’ and drug B is coded as ‘Antiemetics = 1’ in the original data.

# Survival curves of the Cox PH model

# grouped by Antiemetics

new.cph.antiemetics <-with (PONV.raw

,data.frame(Antiemetics = c(0, 1),

Inopioid = c(0,0)

))

new.cph.antiemetics.fit <- survfit(cph.selected

, newdata = new.cph.antiemetics

)

ggsurvplot(new.cph.antiemetics.fit

, data = PONV.raw

, conf.int = TRUE

, conf.int.style = "step"

, censor = FALSE

, palette = "grey"

, break.time.by = 4

, linetype = "solid"

, axes.offset = FALSE

, xlab = "Time (hour)"

, legend = c(0.1, 0.15)

, legend.labs = c("Drug A", "Drug B")

, legend.title = "Antiemetics")

The R code for an LML plot is described above. For categorical variables, an LML plot provides an easy to interpret and intuitive validation method for a proportional hazard assumption.⁹^,) Validation of the proportional hazard assumption of the antiemetics, which is a categorical variable, is possible using an LML plot. (Fig. 5)

Fig. 5.

LML plot of Cox proportional hazards model based on antiemetics with sample data.

#LML for CoxPH

plot (survfit(coxph(Surv(Time, PONV == 1)

~ strata(Antiemetics)

, data = PONV.raw

)

fun = "cloglog"

)

The proportional hazard assumption of the antiemetics is not violated according to the graphs shown in Fig. 5. The covariate “Inopioid” is a continuous type of variable, and an LML plot using this variable is impossible to achieve without a categorical transformation.

A Schoenfeld residual test is shown below. Here, ‘cox.zph’ included in the ‘survminer’ library enables this test. The results are listed in Table 6, and graphical output is shown in Fig. 6.

Table 6.

Results of the Schoenfeld Residual Test

Fig. 6.

Schoenfeld residual plot with ‘Antiemetics’ and ‘Inopioid’. Dotted horizontal lines indicate the estimated coefficient values of these covariates.

# Schoenfeld residuals test

sf.residual <- cox.zph(cph.selected)

print(sf.residual) # display the results

par (mfrow = c(2,1))

plot(sf.residual[1]) # plot curves

abline (h = coef(cph.selected)[1]

, lty = "dotted", lwd = 1)

plot(sf.residual[2])

abline (h = coef(cph.selected)[2]

, lty = "dotted", lwd = 1)

The P value in Table 6 indicates the significance probability of the Schoenfeld residual test for the antiemetics and intraoperative opioid used, and such values indicate a violation of the proportional hazard assumption. A positive increment of the Schoenfeld residual curve for ‘Inopioid’ is shown in Fig. 6. The curve for the antiemetics gradually changes toward a negative value over time, but not continuously. In this way, a Schoenfeld residual test provides more objective results than an LML plot, which is strictly conservative.

Adding a time-dependent covariate

To validate a proportional hazard assumption in a CPH model, a time-dependent covariate is intentionally added into the estimated model. This covariate can be made using a time-independent variable and time, or a function of time. For example, the process compares two models, namely, a CPH model that assumes the proportional hazard assumption has not been violated, and another model incorporated with a combined covariate of the explanatory variable and time (or a function of time) in the estimated CPH model. A likelihood ratio test or Wald statistics are used for comparison. This type of method has certain advantages, including a simultaneous comparison with multiple covariates and various time functions; note that the results may change depending on the covariates and types of functions selected [5,10,11].¹⁰)

Cox Proportional Hazard Regression Models with Time-dependent Covariates

Covariates violating the proportional hazard assumption in a CPH model should be adequately adjusted. This section introduces a stratification and time-dependent Cox regression to deal with covariates violating the proportional hazard assumption.

Stratified Cox proportional hazard model

To fit the CPH model with variables violating the proportional hazard assumption, one method is to apply a stratified CPH model. This method makes one integrated result from the results of each stratum containing a categorical variable classified based on a certain criterion. Unlike the Mantel–Haenszel method, which is based on the sample size of each stratum, stratification in the CPH model sets a different baseline hazard corresponding to each stratum, and a statistical estimation is then applied to achieve common coefficients for the remaining explanatory variables except for the stratified variables.¹¹^,) This provides a hazard ratio of the controlled effects of variables violating the proportional hazard assumption [12].

A stratified CPH model can be applied to control the variables violating a constant hazard assumption, as well as to control the confounding factors that influence the results with little or no clinical significance. Stratification always requires categorical variables, and conversion into categorical variables is required for continuous variables. Under this situation, care should be taken that the sample size of each stratum is reduced (data thinned out) and information held by the continuous variable is simplified. Therefore, conversion into a categorical variable should consider as small number of strata as possible, setting the range of clinical or scientific meaning, and maintaining a balance among the strata [12].

R codes for stratified Cox proportional hazard model

In the previous CPH modeling, the variable ‘Inopioid’ violated the constant hazard assumption based on the Schoenfeld residual test (Fig. 6). Here, ‘Inopioid’ is a continuous variable that records the dose of intraoperatively used opioid. To apply a stratified CPH modeling, continuous variables should be converted into categorical variables. For convenience, the following is a code that converts ‘Inopioid’ into a categorical variable of 0 or 1, when not used or used, respectively.

##### Stratified Cox regression

### Add categorical variables from Inopioid

PONV.raw <- transform(PONV.raw,

Inopioid_c = ifelse(

Inopioid == 0, 0, 1))

head (PONV.raw)

According to this, the categorical variable ‘Inopioid_c’ is recorded as 0 or 1 and is newly added to the dataset (Table 7).

Table 7.

PONV.raw Added a New Categorical Variable ‘Inopioid_c’ from the Variable ‘Inopioid’

Next, the code for a stratified CPH model is as follows:

### Stratified Cox proportional hazard modeling

cph.strata <- coxph (Surv(Time, PONV == 1)

~ Antiemetics + strata(Inopioid_c)

, data = PONV.raw)

summary (cph.strata)

ggsurvplot(survfit(cph.strata)

, data = PONV.raw

, risk.table = TRUE

, palette = c("black","black")

, linetype = c("solid","dashed")

)

par( mfrow = c(1,1))

plot (survfit(cph.strata)

, fun = "cloglog"

, main = "Antiememtics"

)

sf.residual.strata <- cox.zph(cph.strata)

print(sf.residual.strata)

plot(sf.residual.strata)

abline (h = coef(cph.strata)

, lty = "dotted"

, lwd = 1)

This code outputs a stratified CPH model by controlling ‘Inopioid_c’ (Table 8). The command ‘ggsurvplot’ provides survival curves of two strata and prints the LML plot using the last ‘plot’ command (Fig. 7). The Schoenfeld residual test using a ‘cox.zph’ command reveals that ‘Antiemetics’ violates the proportional hazard assumption (rho = −0.265, chisq = 4.26, P = 0.039, shown in Fig. 8). It is possible to obtain an adequate CPH model by stratifying ‘Inopioid’ and ‘Antiemetics’, although the interpretations may be complex because it is difficult to integrate the comparison results among all strata.

Table 8.

Results of Stratified Cox Proportional Hazard Model. Stratification with ‘Inopioid_c’

Fig. 7.

Examples of the stratified Cox proportional hazard model and corresponding LML plot. (A) Survival curves of estimated stratified Cox proportional hazard model. Stratification is achieved using the categorical variable ‘Inopioid_c’. (B) Log-minus log plot for evaluation of proportional hazard assumption against two antiemetics. Note that a non-parallelism of below 2 h is not assured, whereas the overall curves are roughly parallel without crossing.

Fig. 8.

Schoenfeld residual test for the stratified Cox proportional hazard model. For the covariate ‘Antemetics’, the probability was estimated as 0.039, and a violation of the proportional hazard assumption was strongly suggested under the controlled covariate ‘Inopioid’ (the dotted horizontal line shows the estimated coefficient of ‘Antiemetics’).

Time-dependent Cox regression

Most clinical situations change over time, and the variables affected by a specific treatment also change even when the treatment remains constant during the observation period [13]. For example, consider an analgesic having a toxic effect on the hepatobiliary function for patients with chronic pain. A periodic liver function test will be crucial, and all laboratory results will vary for every follow-up time. The administration dose may also vary according to the laboratory results or analgesic effects. Moreover, the laboratory results may not be valid after the patients are censored or after an event occurs. These variables are common in clinical practice, and the existence of time-dependent variables should be considered and checked before starting the data collection for survival analysis. If an adequate measurement method is developed, a time-dependent covariate Cox regression will be possible. Another type of time-dependent variable is a covariate with a time-dependent coefficient [14]. If the analgesics mentioned above produces a level of tolerance, its effect decreases over time. This indicates that the risk of breakthrough pain occurrence may be higher as time passes, which apparently violates the proportional hazard assumption. In this case, the effect of the analgesics can be included in the survival function, which is expressed as a covariate with a coefficient of the function of time.

As mentioned above, a time-dependent covariate is incorporated into the analysis as a single value according to the repeated observation intervals. For example, a patient under analgesics medication takes an initial liver function test, the results of which show 40 IU/L and 100 IU/L after four weeks with continued pain and 130 IU/L at eight weeks with pain, whereas at 12 weeks after analgesics administration, the pain is subsided and medications are discontinued without a further laboratory test. The laboratory data input for the time-dependent covariate are 40 until 4th weeks without an event, 100 from 4th to 8th weeks without an event, 130 from 8th to 12th weeks, and an event occurs at 12th weeks.

Clinical studies in the area of anesthesiology often include variables related to the response or effect of a specific treatment or medication. Depending on the characteristics and measurement methods of the variables, once a specific treatment or medication is applied, their effects are gradually decreased over time or delayed until the onset time. The effects of treatment or medication changes over time, the coefficient of these effects can be expressed as a time function, and for Cox regression, a step function is frequently applied. A step function is a method applying different coefficient values to different time intervals. A Cox regression can thus be established and output the integrated results [15]. In addition, a continuous parametric function for a time-dependent coefficient can be used for analysis instead of a step function [14].

R code for time-dependent coefficient Cox regression model: step function

As shown in Fig. 6, the Schoenfeld residuals of ‘Antiemetics’ and ‘Inopioid’ turn from positive to negative or vice versa at approximately 3 and 6 h. The data are arbitrarily separated using these time points.

tdc <- survSplit (Surv(Time, PONV) ~.

, data = PONV.raw

, cut=c(3, 6)

, episode = "tgroup"

, id = "id"

)

head(tdc)

The command ‘survSplit’ separates the patient data according to the established time interval, where the value for each interval is the measured value on the left side of the interval (start time, ‘tstart’), and ‘Time,’ which is the end of the interval succeeds the next interval. That is, one interval is closed at the left and opened at the right, and if an event occurs during the interval, the survival function is estimated using the variables measured at the left side of the interval (Table 9). It seems that the data being duplicated at the end and the start of the interval, problems do not occur because the divided time does not overlap. It is possible to apply a Cox regression and GOF test with these separated data.

Table 9.

Data Divided by survSplit Function

# Fitting Cox regression

fit.tdc <- coxph(Surv(tstart,Time, PONV)

~ Antiemetics:strata(tgroup)

+ Inopioid

, data = tdc)

summary(fit.tdc)

# GOF test

sf.tdc <- cox.zph(fit.tdc)

print (sf.tdc)

par(mfrow=c(2,2))

plot(sf.tdc[1])

abline (h = coef(fit.tdc)[1], lty = "dotted")

plot(sf.tdc[2])

abline (h = coef(fit.tdc)[2], lty = "dotted")

plot(sf.tdc[3])

abline (h = coef(fit.tdc)[3], lty = "dotted")

plot(sf.tdc[4])

abline (h = coef(fit.tdc)[4], lty = "dotted")

Table 10 shows the estimated Cox regression and GOF test results, and Fig. 9 presents a plot of the Schoenfeld residuals. The risk of the PONV increases 1.0126-fold (95% CI: 1.0078–1.017, P < 0.001) by one unit of intraoperative opioid. For the antiemetics, the group taking drug B showed an increased PONV risk of 3.6545-fold (95% CI: 1.2024–11.107, P = 0.022) until 3 h post-operation, 3.8969-fold (95% CI: 1.4020–10.831, P = 0.009) until 6 h post-operation, with no significant difference shown until the end of the observation (risk ratio = 0.9382, 95% CI: 0.4242–2.075, P = 0.957). The results of the Schoenfeld residual test (Table 10 and Fig. 9) indicate that all variables do not violate the proportional hazard assumption. These results cannot provide a single desired outcome, and it is necessary to combine the results.

Table 10.

Results of Time-dependent Coefficient Cox Regression Using Step Function and Schoenfeld Residual Test

Fig. 9.

Schoenfeld residual graphs of time-dependent coefficient Cox regression.

# Combined results

combine.tdc <- data.frame(tstart = rep(c(0,3,6), 2)

, Time = rep(c(3,6, 24), 2)

, PONV = rep(0,12)

, tgroup= rep(1:3,4)

, trt = rep(1,12)

, prior= rep(0,12)

, Antiemetics = rep(c(0,1), each = 6)

, Inopioid = rep (c(0,1), each = 3)

, parameter = rep(0:1, each = 6)

)

combine.tdc

cfit.tdc <- survfit(fit.tdc

, newdata = combine.tdc

, id = parameter

)

cfit.tdc

km <- survfit(Surv(Time, PONV) ~Antiemetics

, data = PONV.raw

)

summary (km)

par( mfrow = c(1,1))

plot(km, xmax= 24, col="Black"

, lty = c("solid","dashed"), lwd=2

, xlab="Postoperative hours"

, ylab="PONV free"

)

lines(cfit.tdc, col="Grey"

, lty= c("solid","dashed"), lwd=2)

legend (x = 0.15, y = 0.25

, c("Drug A, Kaplan-Meier estimation"

, "Drug B, Kaplan-Meier estimation"

, "Drug A, Cox regression with time-dependent coefficient"

, "Drug B, Cox regression with time-dependent coefficient"

)

, col = c("black", "black", "grey", "grey")

, lty = c("solid", "dashed", "solid", "dashed")

)

To compare the results from two antiemetics, the data divided by ‘survSplit’ are combined to enable an interpretation (combine. tdc). The results are shown in Table 11. The survival model considering the time-dependent coefficient increases the sample size because the data of one patient are separated at the established time points. Note that the median survival times in this model are 31 and 16 h, and the median survival times from the Kaplan–Meier analysis are 13 and 6 h. Plotting these two models into a single graph enables a visual comparison (Fig. 10). Here, although ‘ggsurvplot’ provides comprehensive graphs, it cannot draw two graphs simultaneously. Another graphics software is required to make a single graph from these graphs (Fig. 11).

Table 11.

Comparison Kaplan–Meier Analysis and Survival Analysis with Time-dependent Coefficient

Fig. 10.

Graphical comparison between survival models of Kaplan– Meier and Cox regression with time-dependent coefficient. Black curves indicate the model fitted using a Kaplan–Meier analysis, and the gray curves are from a Cox regression with a time-dependent coefficient. The solid lines indicate Antiemetics = 0 (Drug A), and the dashed lines indicate Antiemetics = 1 (Drug B).

Fig. 11.

Cox regression model with the time-dependent coefficient. Survival curves of Kaplan–Meier analysis (A) and time-dependent coefficient (B) using ‘ggsurvplot’ command. Gray solid lines indicate Antiemetics = 0 (Drug A), and the black dashed lines indicate Antiemetics = 1 (Drug B). The results of the survival analysis are changed when considering the constant hazard ratio assumption.

## plot using ggsurvplot

ggsurvplot ( km, data = PONV.raw,

fun = "pct", pval = TRUE,

conf.int = TRUE, surv.median.line = "hv",

linetype = "strata", palette = "grey",

legend.title = "Antiemetics",

legend.labs = c("Drug A", "Drug B"),

legend = c(.1, .2), break.time.by = 4,

xlab = "Time (hour)",

risk.table = TRUE, tables.height = 0.2,

tables.theme = theme_cleantable(),

risk.table.y.text.col = TRUE,

risk.table.y.text = TRUE

)

ggsurvplot ( cfit.tdc, data = PONV.raw,

fun = "pct",

conf.int = TRUE, surv.median.line = "hv",

linetype = "strata", palette = "grey",

legend.title = "Antiemetics",

legend.labs = c("Drug A", "Drug B"),

legend = c(.1, .2), break.time.by = 4,

xlab = "Time (hour)",

risk.table = TRUE, tables.height = 0.2,

tables.theme = theme_cleantable(),

risk.table.y.text.col = TRUE,

risk.table.y.text = TRUE

Conclusions

Clinical studies in the area of anesthesiology had rarely presented statistical results using survival analysis. In recent years, studies on the survival or recurrence of cancer according to the anesthetics have been actively published [16–18]. Survival analysis has the power to present clear and comprehensive results based on studies on pain control or the effects of medications. Previous articles have focused on the basic concepts of survival analysis and interpretations of the published results [1], and the present article covered the process of conducting a survival analysis using clinical data, finding errors, and achieving adequate results. Although this article does not include all existing survival analysis methods, it introduced several R codes to enable an intermediate level of survival analysis for clinical data in the field of anesthesiology.12)

Some clinical papers dealing with a survival analysis have presented statistical results without considering a proportional hazard assumption or an interaction between the covariates and time. The power of a log-rank test, which is commonly used to compare two groups, tends to decrease when a proportional hazard assumption is violated and can generate an incorrect result [19,20]. An investigation into the reporting of survival analysis results in leading medical journals indicated that the use of survival analysis has significantly increased, although several problems still exist, including descriptions regarding the censoring, sample size calculation, constant proportional hazard ratio assumption validations, and GOF testing [21]. Because most statistical analyses require several basic assumptions, survival analysis also requires some essential assumptions. In a Kaplan–Meier analysis, the likelihood of an event of interest and censoring occurring should be independent from each other, and the survival probabilities of patients who participated in earlier and later studies should be similar. A log-rank test also requires the previously described and proportional hazard assumptions [22]. A CPH model requires a proportional hazard assumption, independence between the survival times among different patients, and a multiplicative relationship between the predictors and hazard [23].

When reporting or interpreting the results of survival analysis, it is important that the identification of the underlying assumptions corresponds to the statistical analysis, and it is necessary to verify that the assumptions are reasonable and well maintained. Statistical results with violated assumptions cause deviated decisions because of an increased probability of error. Survival analysis will be a powerful tool to achieve a scientific conclusion when an appropriate method is chosen with regard to the nature of the variables, the relationship with time, and other basic assumptions.

Notes

¹⁾

Sample data (Survival2_PONV.csv) and the R console output of entire code are provided as supplemental information. Refer to online help or R statistical textbooks for detailed explanations of the argument. The included R code covers the process beginning with the survival analysis introduced in [1]. A detailed description of a violation of a proportional hazard assumption is provided in [14].

²⁾

Terry M. survival: Survival Analysis. R package version 2.42-4. 2018. https://github.com/therneau/survival.

³⁾

Alboukadel K, Marcin K, Przemyslaw B, Scheipl F. survminer: Drawing Survival Curves using 'ggplot2', R package version 0.4.3. 2018. http://www.sthda.com/english/rpkgs/survminer/.

⁴⁾

There are several ways to draw an LML plot in R; ‘plot.survfit’ with the argument ‘fun = “cloglog”’ provides an LML plot of the log-scaled x-axis. Most statistical references describe a log-scaled x-axis LML plot, whereas others describe a standard linear-scaled x-axis LML plot. The R code for a non-log-scaled LML plot can be created through the following. # Non-log scaled LML plot ponvsurv=Surv(PONV.raw$Time, PONV.raw$PONV) NLML.fun=function(p){return(log(-log(p)))} plot (survfit(ponvsurv ~ PONV.raw$Antiemetics), fun=NLML.fun)

⁵⁾

R package ‘survival’ version 2.44-1 (updated in March 2019) has an error with an x value of 1 when log scaled using the x-axis. Versions before 2.44-1 work properly.

⁶⁾

Schoenfeld residual is only for the patient who experienced the event. It is the difference between observed value of explanatory variable at a specific time and expected value of the explanatory variable (covariate) at a specific time which is a weighted-average value by likelihood of event from the risk set at that time point.

⁷⁾

Some statistical software provides a method using scaled Schoenfeld residuals. Under a specific circumstance, these two results are different, although they mostly produce similar results. Please refer to the following: Grambsch, P.M. and Therneau, T.M. 1994. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 81: 515-526.

⁸⁾

The command ‘ggadjustedcurves’ included in the ‘survminer’ library easily produces the survival curves of the CPH model. Unfortunately, this command still has minor functional errors such as in printing the 95% CI or labelling, and a somewhat complex ‘ggsurvplot’ is used in this example.

⁹⁾

In R, categorical variables should be treated as a stratum for comparison using an LML plot of the CPH model.

¹⁰⁾

Because various application methods and their variations are available, they are not discussed in detail herein.

¹¹⁾

This is a non-interaction stratified CPH model. Several survival functions are estimated through stratification, and if the explanatory variables have interactions with each other, the coefficients at each stratum may be different. In this case, it is assumed that an interaction model between explanatory variables and a likelihood ratio test provide clues to judge whether there is an interaction between explanatory variables. That is, if two or more variables are included in the model, it is necessary to check whether an interaction between them exists.

¹²⁾

A clustered event time analysis and an accelerated failure time analysis are often applied to survival analysis methods in clinical study. A clustered event time analysis is similar with a stratified CPH model, and has certain advantages when each stratum has insufficient event cases. It has two types of processes, one is a marginal approach that estimates the survival function through an overall cluster from the pooled effect of each stratum, and another is a conditional approach that estimates the survival function from the heterogeneity between clusters. An accelerated failure time analysis estimates the model similarly with a linear regression based on a Weibull distribution or log-logistic distribution. Unlike a CPH model that continuously maintains the risk ratio of the covariates, this model assumes that the disease process can be accelerated or decelerated over time.

Notes

Conflicts of Interest

No potential conflict of interest relevant to this article was reported.

Author Contributions

Junyong In (Conceptualization; Writing–original draft; Writing–review & editing)

DDong Kyu Lee (Conceptualization; Writing–original draft; Writing – review & editing)

Supplementary Materials

Futher detailes are presented in the online version of this article (Available from https://doi.org/10.4097/kja.19183).

kja-19183-suppl.pdf

kja-19183-suppl2.csv

References

1. In J, Lee DK. Survival analysis: Part I - analysis of time-to-event. Korean J Anesthesiol 2018;71:182–91.

2. Clark TG, Bradburn MJ, Love SB, Altman DG. Survival analysis part I: basic concepts and first analyses. Br J Cancer 2003;89:232–8.

3. Bewick V, Cheek L, Ball J. Statistics review 12: survival analysis. Crit Care 2004;8:389–94.

4. Hancock MJ, Maher CG, Costa Lda C, Williams CM. A guide to survival analysis for manual therapy clinicians and researchers. Man Ther 2014;19:511–6.

5. Kleinbaum D, Klein M. Evaluating the proportional hazards assumption. Survival Analysis. A Self-Learning Text 2nd edth ed. New York: Springer Science+Business Media, Inc; 2005. p. 131–72.

6. Schonfeld D. Partial residuals for the proportional hazards model. Biometrika 1982;69:238–41.

7. Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994;81:515–26.

8. Abeysekera W, Sooriyarachchi R. Use of Schoenfeld’s global test to test the proportional hazards assumption in the Cox proportional hazards model: an application to a clinical study. J Natl Sci Found 2009;37:41–51.

9. Ekman A. Variable selection for the Cox proportional hazards model: A simulation study comparing the stepwise, lasso and bootstrap approach [Master's thesis]. [Umeå]: UMEÅUniversity;; 2017. 50 p. Available from http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-130521.

10. Prashant Narayan KC. Extension of Cox PH Model When Hazards are Non-Proportional Applied to Residential Treatment for Drug Abuse [Master's thesis]. [Mankato (MN)]: Minnesota State University;; 2016. 51 p. Available from https://cornerstone.lib.mnsu.edu/etds/661/.

11. Collett D. Testing the assumption of proportional hazards. Modelling Survival Data in Medical Research 2nd edth ed. Boca Raton: Chapman & Hall/CRC; 2003. p. 141131–7.

12. Kleinbaum D, Klein M. The stratified Cox procedure. Survival analysis 2nd edth ed. New York: Springer Science+Business Media, Inc; 2005. p. 173–210.

13. Collett D. Time-dependent variables Modelling Survival Data in Medical Research. 2nd edth ed. Boca Raton: Chapman & Hall/CRC; 2003. p. 251–72.

14. Zhang Z, Reinikainen J, Adeleke KA, Pieterse ME, Groothuis-Oudshoorn CG. Time-varying covariates and coefficients in Cox regression models. Ann Transl Med 2018;6:121.

15. Thomas L, Reyes EM. Tutorial: survival estimation for Cox regression models with time-varying coefficients using SAS and R. J Stat Softw 2014;61:1–23.

16. Wigmore TJ, Mohammed K, Jhanji S. Long-term survival for patients undergoing volatile versus IV anesthesia for cancer surgery: a retrospective analysis. Anesthesiology 2016;124:69–79.

17. Tsui BC, Rashiq S, Schopflocher D, Murtha A, Broemling S, Pillay J, et al. Epidural anesthesia and cancer recurrence rates after radical prostatectomy. Can J Anaesth 2010;57:107–12.

18. Biki B, Mascha E, Moriarty DC, Fitzpatrick JM, Sessler DI, Buggy DJ. Anesthetic technique for radical prostatectomy surgery affects cancer recurrence: a retrospective analysis. Anesthesiology 2008;109:180–7.

19. Qiu P, Sheng J. A two‐stage procedure for comparing hazard rate functions. J R Stat Soc Series B Stat Methodol 2008;70:191–208.

20. Li H, Han D, Hou Y, Chen H, Chen Z. Statistical inference methods for two crossing survival curves: a comparison of methods. PLoS One 2015;10e0116774.

21. Abraira V, Muriel A, Emparanza JI, Pijoan JI, Royuela A, Plana MN, et al. Reporting quality of survival analyses in medical journals still needs improvement. A minimal requirements proposal. J Clin Epidemiol 2013;66:1340–6.

22. Goel MK, Khanna P, Kishore J. Understanding survival analysis: Kaplan-Meier estimate. Int J Ayurveda Res 2010;1:274–8.

23. George B, Seals S, Aban I. Survival analysis and regression models. J Nucl Cardiol 2014;21:686–94.

Article information Continued

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Call: survfit(formula = Survobj ~ 1, data = PONV.raw, conf.type = "log − log")
n		Events		Median	0.95LCL		0.95UCL
104		63		10	7		16

Call: survfit(formula = Survobj ~ 1, data = PONV.raw, conf.type = "log − log")
Time	n.risk		n.event	Survival	std.err	Lower 95% CI		Upper 95% CI

1	104		8	0.923	0.0261	0.852		0.961
2	96		7	0.856	0.0345	0.772		0.910
3	89		3	0.827	0.0371	0.739		0.887

Call: survfit(formula = Survobj ~ Antiemetics, data = PONV.raw, conf.type = "log − log")
	n		Events	Median	0.95LCL	0.95UCL
Antiemetics = 0	51		25	13	9	NA
Antiemetics = 1	53		38	6	4	10

Call: survfit(formula = Survobj ~ Antiemetics, data = PONV.raw, conf.type = "log − log")
Antiemetics = 0
Time	n.risk	n.event	Survival	std.err	Lower 95% CI	Upper 95% CI

1	51	1	0.980	0.0194	0.869	0.997
2	50	2	0.941	0.0329	0.829	0.981
3	48	1	0.922	0.0376	0.804	0.970
4	47	2	0.882	0.0451	0.757	0.945
︙	︙	︙	︙	︙	︙	︙

Antiemetics = 1
Time	n.risk	n.event	Survival	std.err	Lower 95% CI	Upper 95% CI

1	53	7	0.868	0.0465	0.743	0.935
2	46	5	0.774	0.0575	0.636	0.865
3	41	2	0.736	0.0606	0.595	0.834
4	39	6	0.623	0.0666	0.478	0.738
︙	︙	︙	︙	︙	︙	︙

Call: survdiff(formula = Surv(Time, PONV == 1) ~ Antiemetics, data = PONV.raw)
		N		Observed		Expected

Antiemetics = 0		51		25		34.9
Antiemetics = 1		53		38		28.1
Chisq = 6.8 on 1 degrees of freedom, P = 0.009

Call: coxph(formula = Surv(Time, PONV == 1) ~ Antiemetics, data = PONV.raw)
n = 104, number of events = 63
	coef	exp(coef)	se(coef)	z	Pr(>\|z\|)
Antiemetics	0.6664	1.9471	0.2581	2.582	0.00983**
---

Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘’ 1**
	exp(coef)	exp(-coef)	Lower .95	Upper .95

Antiemetics	1.947	0.5136	1.174	3.229

Concordance		= 0.615	(se = 0.032)
Rsquare		= 0.064	(max possible = 0.993)
Likelihood ratio test		= 6.85	on 1 df, P = 0.009
Wald test		= 6.67	on 1 df, P = 0.01
Score (logrank) test		= 6.91	on 1 df, P = 0.009

Call: coxph(formula = Surv(Time, PONV == 1) ~ Antiemetics + Inopioid, data = PONV.raw)
n = 104, number of events = 63
	coef	exp(coef)	se(coef)	z	Pr(>\|z\|)
Antiemetics	0.703650	2.021116	0.258971	2.717	0.00659**
Inopioid	0.012740	1.012821	0.002417	5.271	1.35e-07***
---

Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘’
	exp(coef)	exp(-coef)	Lower .95	Upper .95

Antiemetics	2.021	0.4948	1.217	3.358
Inopioid	1.013	0.9873	1.008	1.018

Concordance		= 0.694	(se = 0.03)
Rsquare		= 0.284	(max possible= 0.993)
Likelihood ratio test		= 34.69	on 2 df, P = 3e-08
Wald test		= 34.09	on 2 df, P = 4e-08
Score (logrank) test		= 38.29	on 2 df, P = 5e-09

Results of ‘print(sf.residual)’
	rho	chisq	P
Antiemetics	−0.275	4.5	0.0340
Inopioid	0.307	5.36	0.0206
GLOBAL	NA	10.26	0.0059

Result of command “head(PONV.raw)
	No	Antiemetics	Age	Wt	Inopioid	Time	PONV
1	1	0	48	78.5	0	4	0
2	3	0	54	88.3	100	21	0
3	4	0	22	49.4	0	14	0
︙	︙	︙	︙	︙	︙	︙	︙

Call: coxph(formula = Surv(tstart, Time, PONV) ~ Antiemetics:strata(tgroup) + Inopioid, data = tdc)
n = 250, number of events = 63
	coef	exp(coef)	se(coef)	z	Pr(>\|z\|)
Inopioid	0.012477	1.012556	0.002413	5.172	2.32E-07***
Antiemetics: strata(tgroup)tgroup = 1	1.295949	3.654464	0.567181	2.285	0.02232*
Antiemetics: strata(tgroup)tgroup = 2	1.360185	3.896914	0.521567	2.608	0.00911**
Antiemetics: strata(tgroup)tgroup = 3	−0.063743	0.938247	0.404993	−0.157	0.87494
---

Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1**
	exp(coef)	exp(-coef)		Lower .95	Upper .95

Inopioid	1.0126	0.9876		1.0078	1.017
Antiemetics: strata(tgroup)tgroup = 1	3.6545	0.2736		1.2024	11.107
Antiemetics: strata(tgroup)tgroup = 2	3.8969	0.2566		1.4020	10.831
Antiemetics: strata(tgroup)tgroup = 3	0.9382	1.0658		0.4242	2.075
Concordance		= 0.67		(se = 0.031)
Rsquare		= 0.152		(max possible = 0.874)
Likelihood ratio test		= 41.35		on 4 df, P = 2e-08
Wald test		= 38.92		on 4 df, P = 7e-08
Score (logrank) test		= 44.61		on 4 df, P = 5e-09

Results of Schoenfeld residual test
	rho		chisq		P

Inopioid	0.29948		5.150327		0.0232
Antiemetics: strata(tgroup)tgroup = 1	−0.02755		0.047411		0.8276
Antiemetics: strata(tgroup)tgroup = 2	−0.00368		0.000845		0.9768
Antiemetics: strata(tgroup)tgroup = 3	0.02486		0.038692		0.8441
GLOBAL	NA		5.199691		0.2674