Using diagnostic testing to determine the presence or absence of a disease is essential in clinical practice. In many cases, test results are obtained as continuous values and require conversion into a dichotomous form to determine the presence of a disease. The primary method used for this process is the receiver operating characteristic (ROC) curve. The ROC curve is used to assess the overall diagnostic performance of a test and to compare the performance of two or more diagnostic tests. It is also used to select an optimal cut-off value for determining the presence or absence of a disease. Although clinicians who do not have expertise in statistics do not need to understand all the complex mathematical equations and the analytic processes of ROC curves, understanding the core concepts of ROC curve analysis is a prerequisite for the proper use and interpretation of the ROC curve. This review describes the basic concepts for the correct use and interpretation of the ROC curve, including parametric/nonparametric ROC curves, the meaning of the area under the ROC curve (AUC), the partial AUC, methods for selecting the best cut-off value, and the statistical software to use for ROC curve analyses.

Using diagnostic testing to determine the presence or absence of a disease is an essential process in the medical field. To determine whether a patient is diseased, it is necessary to select the best-performing diagnostic method by comparing various diagnostic tests. In many cases, test results are obtained as continuous values, which require conversion into dichotomous groups to determine the presence or absence of a disease. At this point, determining the cut-off value (also called the reference value) to discriminate between normal and abnormal conditions is critical. The method mainly used for this process is the receiver operating characteristic (ROC) curve. The ROC curve aims to classify a patient's disease state as either positive or negative based on test results and to find the optimal cut-off value with the best diagnostic performance. The ROC curve is also used to evaluate the overall diagnostic performance of a test and to compare the performance of two or more tests.

Although non-statisticians do not need to understand all the complex mathematical equations and the analytical process associated with ROC curves, understanding the core concepts of the ROC curve analysis is a prerequisite for the correct interpretation and application of analysis results. This review describes the basic concepts for the correct use and interpretation of the ROC curve, including how to draw an ROC curve, the difference between parametric and nonparametric ROC curves, the meaning of the area under the ROC curve (AUC) and the partial AUC, the methods for selecting the best cut-off value, and the statistical software for ROC curve analysis.

To understand the ROC curve, it is first necessary to understand the meaning of sensitivity and specificity, which are used to evaluate the performance of a diagnostic test. Sensitivity is defined as the proportion of people who actually have the target disease who test positive, and specificity is the proportion of people who do not have the target disease who test negative. FP refers to the proportion of people who do not have the disease but incorrectly test positive, while FN refers to the proportion of people who have the disease but incorrectly test negative (
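As a minimal sketch of these definitions, sensitivity and specificity can be computed directly from the four counts of a 2 × 2 table (the counts below are hypothetical, chosen only for illustration):

```python
# Sensitivity/specificity from a 2 x 2 table:
#                 Test (+)   Test (-)
# Disease (+)        tp         fn
# Disease (-)        fp         tn
def sensitivity(tp, fn):
    return tp / (tp + fn)  # true positive rate

def specificity(tn, fp):
    return tn / (tn + fp)  # true negative rate

# Hypothetical example: 90 of 100 diseased patients test positive,
# and 80 of 100 healthy people test negative.
print(sensitivity(tp=90, fn=10))  # 0.9
print(specificity(tn=80, fp=20))  # 0.8
```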

As shown in

The ROC curve is an analytical method, represented as a graph, that is used to evaluate the performance of a binary diagnostic classification method. The diagnostic test results need to be classified into one of the clearly defined dichotomous categories, such as the presence or absence of a disease. However, since many test results are presented as continuous or ordinal variables, a reference value (cut-off value) for diagnosis must be set. Whether a disease is present can thus be determined based on the cut-off value. An ROC curve is used for this process.

The ROC curve was initially developed to discriminate between a signal (true positive result) and noise (false positive result) when analyzing signals on a radar screen during World War II. This method, which has been used for signal detection/discrimination, was later introduced to psychology [

The ROC curve connects the coordinate points using "1 – specificity (false positive rate)" as the x-axis and "sensitivity" as the y-axis for all cut-off values measured from the test results. The stricter the criterion for determining a positive result, the further the corresponding point on the curve shifts downward and to the left (

The ROC curve has various advantages and disadvantages. First, the ROC curve provides a comprehensive visualization for discriminating between normal and abnormal over the entire range of test results. Second, because the ROC curve shows all the sensitivity and specificity at each cut-off value obtained from the test results in the graph, the data do not need to be grouped like a histogram to draw the curve. Third, since the ROC curve is a function of sensitivity and specificity, it is not affected by prevalence, meaning that samples can be taken regardless of the prevalence of a disease in the population [

The types of ROC curves can be primarily divided into nonparametric (or empirical) and parametric. Examples of the two curves are shown in

However, when the ROC curve is obtained using the parametric method, an improper ROC curve is obtained if the data do not meet the normality assumption or if the within-group variances are not similar (heteroscedasticity). An example of an improper parametric ROC curve is shown in

To overcome this limitation, a nonparametric ROC curve can be used since this method does not take into account the distribution of the data. This is the most commonly used ROC curve analysis method (also called the empirical method). For this method, the test results do not require an assumption of normality. The sensitivity and false positive rates calculated from the 2 × 2 table based on each cut-off value are simply plotted on the graph, resulting in a jagged line rather than a smooth curve.

Additionally, a semiparametric ROC curve is sometimes used to overcome the drawbacks of the nonparametric and parametric methods. This method has the advantage of presenting a smooth curve without requiring assumptions about the distribution of the diagnostic test results. However, many statistical packages do not include this method, and it is not widely used in medical research.

Consider an example in which a cancer marker is measured for a total of 10 patients to determine the presence of cancer, and an empirical ROC curve is drawn (
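The 10-patient example can be worked through in a short script. A minimal sketch follows, assuming (consistently with the sensitivity/specificity pairs in Table 3) that marker values at or above each candidate cut-off are classified as positive:

```python
# Marker values and biopsy-confirmed cancer status for the 10 patients (Table 3).
values = [25.8, 26.6, 28.1, 29.0, 30.5, 31.0, 33.6, 39.3, 43.3, 45.8]
status = [0, 0, 0, 1, 0, 0, 0, 0, 1, 1]  # 1 = confirmed cancer

n_pos = sum(status)              # 3 diseased patients
n_neg = len(status) - n_pos      # 7 non-diseased patients

# Use each observed value as a candidate cut-off (>= cut called positive).
results = []
for cut in values:
    tp = sum(1 for v, s in zip(values, status) if v >= cut and s == 1)
    fp = sum(1 for v, s in zip(values, status) if v >= cut and s == 0)
    results.append((cut, tp / n_pos, (n_neg - fp) / n_neg))

for cut, se, sp in results:
    print(f"cut-off {cut}: sensitivity {se:.2f}, specificity {sp:.2f}")
```

Plotting each (1 − specificity, sensitivity) pair and connecting the points yields the jagged empirical ROC curve.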

The AUC is widely used to measure the accuracy of diagnostic tests. The closer the ROC curve is to the upper left corner of the graph, the higher the accuracy of the test, because at the upper left corner, the sensitivity = 1 and the false positive rate = 0 (specificity = 1). The ideal ROC curve thus has an AUC = 1.0. However, when the true positive rate equals the false positive rate at every cut-off value, the ROC curve falls on the 45° diagonal (y = x) and the AUC = 0.5. Such a situation corresponds to determining the presence or absence of disease by a random method, such as a coin toss, and has no meaning as a diagnostic tool. Therefore, for any diagnostic technique to be meaningful, the AUC must be greater than 0.5, and in general, it must be greater than 0.8 to be considered acceptable (
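For an empirical ROC curve, the AUC is simply the trapezoidal area under the plotted (1 − specificity, sensitivity) points. A minimal sketch (the function name is my own):

```python
def empirical_auc(points):
    """Trapezoidal AUC for ROC points given as (fpr, tpr) pairs."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between adjacent points
    return area

# A perfect test (curve through the upper-left corner) has AUC = 1.0;
# the 45-degree chance diagonal has AUC = 0.5.
print(empirical_auc([(0, 0), (0, 1), (1, 1)]))  # 1.0
print(empirical_auc([(0, 0), (1, 1)]))          # 0.5
```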

The AUC is often presented with a 95% CI because the data obtained from the sample are not fixed values but rather influenced by statistical errors. The 95% CI provides a range of possible values around the actual value. Therefore, for any test to be statistically significant, the lower 95% CI value of the AUC must be > 0.5.

The CI of the AUC can be estimated using the parametric or nonparametric method. The binormal method proposed by Metz [

Nonparametric AUC estimates for empirical ROC curves tend to underestimate the AUC on a discrete rating scale, such as a 5-point scale. Except when the sample size is extremely small, the parametric method is preferred even for discrete data, because the bias in the parametric estimates of the AUC is small enough to be negligible. However, if the collected data are not normally distributed, a nonparametric method is the correct option. For continuous data, the parametric and nonparametric estimates of the AUC have very similar values [. When comparing two AUCs (AUC_{1} vs. AUC_{2}), the difference can be tested using the following Z-statistic. To determine whether a single AUC (A_{1}) is significant under the null hypothesis, Z can be calculated by substituting A_{2} = 0.5.
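The equation itself did not survive extraction; a commonly used form of this statistic for two independent AUC estimates (following Hanley and McNeil; this reconstruction is an assumption, not necessarily the article's exact equation) is:

```latex
Z = \frac{A_1 - A_2}{\sqrt{SE_1^2 + SE_2^2}}
```

where $SE_1$ and $SE_2$ are the standard errors of the two AUC estimates. For paired designs (both tests applied to the same patients), a correlation term is subtracted in the radicand, $\sqrt{SE_1^2 + SE_2^2 - 2r\,SE_1 SE_2}$, with $r$ the correlation between the two estimates.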

When comparing the AUC of two diagnostic tests, if the AUC values are the same, this only means that the overall diagnostic performance of the two tests is the same, not necessarily that the ROC curves of the two tests are the same [

As its name suggests, the pAUC is the area under a portion of the ROC curve. It is the region under the curve between two false positive rates (FPR), defined as the pAUC between FPR_{1} and FPR_{2} (with FPR_{1} < FPR_{2}). The pAUC thus summarizes diagnostic performance only over the FPR range of clinical interest.

The minimum possible value of the pAUC corresponds to the chance diagonal: it is the area under the line y = x between FPR_{1} and FPR_{2}, that is, (FPR_{2}² − FPR_{1}²)/2. The maximum possible value is FPR_{2} − FPR_{1}, the area of the full rectangle over that interval.

To calculate the sample size for the ROC curve analysis, the expected AUCs to be compared (namely, AUC_{1} and AUC_{2}, where AUC_{2} = 0.5 for the null hypothesis), the significance level (α), power (1 – β), and the ratio of negative/positive results should be considered [
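The cited sample-size formulas are not reproduced here; a minimal sketch of one common approach, using the Hanley–McNeil approximation of the AUC variance and an iterative search, might look like the following (function names and the search strategy are my own, not the article's):

```python
from statistics import NormalDist

def hanley_mcneil_var(auc, n_pos, n_neg):
    """Approximate variance of an AUC estimate (Hanley & McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc * auc / (1 + auc)
    return (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
            + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)

def n_per_group(auc1, auc2=0.5, alpha=0.05, power=0.8, ratio=1):
    """Smallest (n_pos, n_neg) detecting auc1 vs. auc2 at the given
    two-sided significance level and power; n_neg = ratio * n_pos."""
    za = NormalDist().inv_cdf(1 - alpha / 2)
    zb = NormalDist().inv_cdf(power)
    for n_pos in range(2, 100000):
        n_neg = int(ratio * n_pos)
        se0 = hanley_mcneil_var(auc2, n_pos, n_neg) ** 0.5  # under H0
        se1 = hanley_mcneil_var(auc1, n_pos, n_neg) ** 0.5  # under H1
        if za * se0 + zb * se1 <= abs(auc1 - auc2):
            return n_pos, n_neg
    return None

print(n_per_group(auc1=0.7))  # required numbers of positives and negatives
```

As expected, a larger anticipated AUC (a bigger effect) requires fewer subjects.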

In general, it is crucial to set a cut-off value with an appropriate sensitivity and specificity because applying less stringent criteria to increase sensitivity results in a trade-off in which specificity decreases. Finding the optimal cut-off value is not simply done by maximizing sensitivity and specificity, but by finding an appropriate compromise between them based on various criteria. Sensitivity is more important than specificity when a disease is highly contagious or associated with serious complications, such as COVID-19. In contrast, specificity is more important than sensitivity when a test to confirm the diagnosis is expensive or highly risky. If there is no preference between sensitivity and specificity, or if both are equally important, then the most reasonable approach is to maximize them both. Since the methods introduced here are based on various assumptions, the choice of which method to use should be judged based on the importance of the sensitivity versus the specificity of the test. There are more than 30 methods known to find the optimal cut-off value [

Youden’s J statistic refers to the maximum vertical distance between the ROC curve and the 45° diagonal, found by moving the 45° diagonal (a straight line with a slope of 1) toward the coordinate (0, 1) (
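Using the sensitivity/specificity pairs from the worked 10-patient example (Table 3), selecting the cut-off by Youden's J = Se + Sp − 1 can be sketched in a few lines:

```python
# (cut-off, sensitivity, specificity) pairs from the 10-patient example (Table 3).
pairs = [(25.8, 1.00, 0.00), (26.6, 1.00, 0.14), (28.1, 1.00, 0.29),
         (29.0, 1.00, 0.43), (30.5, 0.67, 0.43), (31.0, 0.67, 0.57),
         (33.6, 0.67, 0.71), (39.3, 0.67, 0.86), (43.3, 0.67, 1.00),
         (45.8, 0.33, 1.00)]

# Youden's J = Se + Sp - 1; choose the cut-off that maximizes it.
best = max(pairs, key=lambda p: p[1] + p[2] - 1)
print(best)  # (43.3, 0.67, 1.0) -> J = 0.67
```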

Another method for determining the optimal reference value is to use the Euclidean distance from the coordinate (0, 1), which is also called the upper-left (UL) index [

The point at which this value is minimized is considered the optimal cut-off value. The Euclidean distance on the ROC curve is shown in
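With the same sensitivity/specificity pairs from the 10-patient example (Table 3), the Euclidean-distance (upper-left index) criterion d = √((1 − Se)² + (1 − Sp)²) can be sketched as:

```python
import math

# (cut-off, sensitivity, specificity) pairs from the 10-patient example (Table 3).
pairs = [(25.8, 1.00, 0.00), (26.6, 1.00, 0.14), (28.1, 1.00, 0.29),
         (29.0, 1.00, 0.43), (30.5, 0.67, 0.43), (31.0, 0.67, 0.57),
         (33.6, 0.67, 0.71), (39.3, 0.67, 0.86), (43.3, 0.67, 1.00),
         (45.8, 0.33, 1.00)]

# Distance from each ROC point to the ideal corner (0, 1); minimize it.
best = min(pairs, key=lambda p: math.hypot(1 - p[1], 1 - p[2]))
print(best)  # cut-off closest to the upper-left corner
```

In this small example the criterion happens to select the same cut-off as Youden's J, but the two methods can disagree on other data.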

Accuracy refers to the proportion of cases that are correctly classified, as shown in

This definition assumes that all correctly classified results (whether true positive or true negative) are of equal value and that all misclassified results are equally undesirable. However, this is often not the case. The costs of false-positive and false-negative classifications are rarely equivalent; the greater the cost difference between false positive and false negative results, the more likely it is that accuracy distorts the clinical usefulness of the test. Accuracy is also highly dependent on the prevalence of the disease in the sample; even when the sensitivity and specificity are low, the accuracy may be high [

IU uses the absolute differences between the diagnostic accuracy measurements (sensitivity and specificity) and the AUC value to minimize the misclassification rate, calculated using the following formula [

IU is a method for finding the point at which the sensitivity and specificity are simultaneously maximized. It is similar to the Euclidean distance; however, it differs in that it uses the absolute differences between the AUC value and diagnostic accuracy measurements (sensitivity and specificity). This method does not require complicated calculations since it only involves checking whether the sensitivity and specificity at the optimal cut-off value are sufficiently close to the AUC values. In addition, the IU has been found to have a better diagnostic performance compared to the other methods in most cases [
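A sketch of this criterion on the 10-patient example follows, assuming the IU definition IU(c) = |Se(c) − AUC| + |Sp(c) − AUC|, to be minimized over the cut-off c (this formulation matches the verbal description above):

```python
# Index of Union (IU): pick the cut-off whose sensitivity and specificity
# are both closest to the AUC, i.e., minimize |Se - AUC| + |Sp - AUC|.
pos = [29.0, 43.3, 45.8]                           # confirmed cancer (Table 3)
neg = [25.8, 26.6, 28.1, 30.5, 31.0, 33.6, 39.3]   # no cancer (Table 3)

# Empirical AUC = probability that a positive case outranks a negative one.
auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

# (cut-off, sensitivity, specificity) pairs from Table 3.
pairs = [(25.8, 1.00, 0.00), (26.6, 1.00, 0.14), (28.1, 1.00, 0.29),
         (29.0, 1.00, 0.43), (30.5, 0.67, 0.43), (31.0, 0.67, 0.57),
         (33.6, 0.67, 0.71), (39.3, 0.67, 0.86), (43.3, 0.67, 1.00),
         (45.8, 0.33, 1.00)]

best = min(pairs, key=lambda p: abs(p[1] - auc) + abs(p[2] - auc))
print(best)  # cut-off minimizing the IU
```

Note that on these data the IU selects a different cut-off than Youden's J, illustrating that the criteria need not agree.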

The cost approach is a method for finding the optimal cut-off value that takes into account the benefits of correct classification or the costs of misclassification. This method can be used when the costs of true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs) of a diagnostic test are known. The costs here can be medical or financial and can be considered from a patient and/or social perspective. When determining the cut-off value using the cost approach, there are two ways: to calculate the cost itself [ or to use a cost index (C_{m}) [

where Pr is the prevalence and C_{FP}, C_{TN}, C_{FN}, and C_{TP} refer to the costs of FPs, TNs, FNs, and TPs, respectively. These four costs should be expressed in a common unit. When the cost index (C_{m}) is maximized, the average cost is minimized, and this point is considered the optimal cut-off value.

Another method for determining the optimal cut-off value in terms of cost is to use the misclassification cost term (MCT). Considering only the prevalence of the disease, the C_{FP}, and the C_{FN}, the point at which the MCT is minimized is determined as the optimal cut-off value [
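The MCT formula itself did not survive extraction; it is commonly written (following Greiner; this reconstruction is an assumption, not necessarily the article's exact equation) as:

```latex
MCT(c) = \frac{C_{FN}}{C_{FP}} \cdot Pr \cdot \bigl(1 - Se(c)\bigr) + (1 - Pr)\bigl(1 - Sp(c)\bigr)
```

where $c$ is the candidate cut-off value; the cut-off minimizing $MCT(c)$ is selected. Note that only the cost ratio $C_{FN}/C_{FP}$ matters, not the absolute costs.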

LR^{+} is the ratio of the true positive rate to the false positive rate, LR^{+} = Se / (1 – Sp), while LR^{–} is the ratio of the false negative rate to the true negative rate, LR^{–} = (1 – Se) / Sp.

Researchers can choose a cut-off value that either maximizes LR^{+} or minimizes LR^{–}.

For this method, the point at which the product of Se and Sp is maximized is considered the optimal cut-off value.

This can also be represented graphically, as shown in

For this method, the point at which the sum of Se and Sp is maximized is considered the optimal cut-off value.

At the point where the summation value is maximized, Youden’s index (Se + Sp – 1) and the difference between the true positives (Se) and false positives (1 – Sp) are also maximized [

This method refers to the number of patients required to obtain one misdiagnosis when conducting a diagnostic test. In other words, if NNM = 10, it means that ten people must be tested to find one misdiagnosed patient. The higher the NNM, the better the test performance. NNM is calculated as follows, and the point at which the NNM is maximized can be selected as the optimal cut-off value [
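The NNM formula did not survive extraction; following Habibzadeh and Yadollahie, it is commonly written as the reciprocal of the probability of misdiagnosis, NNM = 1 / [Pr·(1 − Se) + (1 − Pr)·(1 − Sp)]. A minimal sketch (the numbers in the example are hypothetical):

```python
def nnm(se, sp, prevalence):
    """Number needed to misdiagnose: patients tested per one misdiagnosis."""
    p_misdiagnosis = prevalence * (1 - se) + (1 - prevalence) * (1 - sp)
    return 1 / p_misdiagnosis

# Hypothetical test: Se = 0.9, Sp = 0.8, prevalence 10%.
# Misdiagnosis probability = 0.1*0.1 + 0.9*0.2 = 0.19.
print(nnm(0.9, 0.8, 0.1))  # about 5.26 patients per one misdiagnosis
```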

Statistical programs used to perform ROC curve analysis include various commercial software programs, such as IBM SPSS, MedCalc, Stata, and NCSS, and open-source software such as R. Most statistical analysis software programs provide basic ROC analysis functions. However, the functions provided by each software product are slightly different. IBM SPSS, the most widely used commercial software, provides fundamental statistical analyses for ROC curves, such as plotting ROC curves and calculating the AUC and its CI with statistical significance. However, IBM SPSS does not include various functions for optimal cut-off values and does not provide a sample size calculation. Stata provides a variety of functions for ROC curve analyses, including the pAUC, multiple ROC curve comparisons, optimal cut-off value determination using Youden’s index, and multiple performance measures. MedCalc, as the name suggests, is software developed specifically for medical research. MedCalc provides a sample size estimation for a single diagnostic test and includes various analytical techniques to determine the optimal cut-off value but does not provide a function to calculate the pAUC.

Unlike commercial software packages, the R program is a free, open-source software that includes all the functions for ROC curve analyses using packages such as ROCR [

Although these R packages have a considerable number of functions, they require good programming knowledge of the R language. For someone who is not an R user, working with a command-based interface may therefore be challenging and time-consuming. To overcome these shortcomings, web-based tools that combine several R packages have recently been developed, enabling a more straightforward ROC analysis. The web tools for ROC curve analysis based on R include easyROC and plotROC [

The ROC curve is used to represent the overall performance of a diagnostic test by connecting the coordinate points with "1 – specificity" (= false positive rate) as the x-axis and "sensitivity" as the y-axis for all cut-off points measured from the test results. It is also used to determine the optimal cut-off value for diagnosing a disease. The AUC is a measure of the overall performance of a diagnostic test and can be interpreted as the average value of sensitivity over all possible specificities. The AUC has a value between 0 and 1 but is meaningful as a diagnostic test only when it is > 0.5. The larger the value, the better the overall performance of the test. Because nonparametric estimates of the AUC tend to be underestimated for discrete rating-scale data, whereas parametric estimates have a negligible risk of bias unless the sample size is very small, parametric estimates are recommended for discrete rating-scale data. When evaluating the diagnostic performance of a test over only some regions of the overall ROC curve, the pAUC over the specific FPR region should be used.

Youden’s index, Euclidean distance, accuracy, and cost index can be used to determine the optimal cut-off value. However, the approach should be selected according to the clinical situation that the researcher intends to analyze. Various commercial programs and R packages as well as a web tool based on R can be used for ROC curve analyses.

In conclusion, the ROC curve is a statistical method used to identify the diagnostic test and the cut-off value with the best diagnostic performance. Both should be determined using the method appropriate to the clinical situation.

The author would like to thank Ms. Mihee Park at the Seoul National University Bundang Hospital for her assistance in editing the figures included in this paper.

None.

No potential conflict of interest relevant to this article was reported.

Graphical illustrations of two hypothetical distributions for patients with or without disease of interest. The vertical line indicates the cut-point criterion to determine the presence of the disease. TN: true negative, TP: true positive, FN: false negative, FP: false positive.

A receiver operating characteristic (ROC) curve connects coordinate points with 1 - specificity (= false positive rate) as the x-axis and sensitivity as the y-axis at all cut-off values measured from the test results. When a strict cut-off point (reference) value is applied, the point on the curve moves downward and to the left (Point A). When a loose cut-off point value is applied, the point moves upward and to the right (Point B). The 45° diagonal line serves as the reference line, since it is the ROC curve of random classification.

The features of the empirical (nonparametric) and binormal (parametric) receiver operating characteristic (ROC) curves. In contrast to the empirical ROC curve, the binormal ROC curve assumes the normal distribution of the data, resulting in a smooth curve. For estimating the binormal ROC curve, the sample mean and sample standard deviation are calculated from the disease-positive group and the disease-negative group. The 45° diagonal line serves as the reference line, since it is the ROC curve of random classification.

A comparison of the empirical (solid line) and parametric (dot-dashed line) receiver operating characteristic (ROC) curves drawn from the same data. In contrast to the empirical ROC curve, an inappropriate parametric ROC curve can be distorted or pass through the 45° diagonal line if the data are not normally distributed or heteroscedastic. In this case, the empirical method is recommended to overcome this problem.

Empirical (A) and parametric (B) receiver operating characteristic (ROC) curves drawn from the data in Table 3. Eleven labeled points on the empirical ROC curve correspond to each cut-off value used to estimate sensitivity and specificity. A gradual increase or decrease of the cut-off value will change the proportion of patients classified as disease-positive. Depending on the cut-off value, each sensitivity and specificity pair can be obtained. Using these calculated sensitivity and specificity pairs, an ROC curve can be obtained with "1 – specificity" as the x coordinates and "sensitivity" as the y coordinates.

Schematic diagram of two receiver operating characteristic (ROC) curves with an equal area under the ROC curve (AUC). Although the AUC is the same, the features of the ROC curves are not identical. Test B shows better performance in the high false-positive rate range than test A, whereas test A is better in the low false-positive range. In this example, the partial AUC (pAUC) can compare these two ROC curves at a specific false positive rate range.

Figures illustrating the various methods to select the best cut-off values. (A) Youden’s J statistic, (B) Euclidean distance to the upper-left corner, and (C) maximum product of sensitivity and specificity.

The Decision Matrix

|  |  | Predicted condition |  |
|---|---|---|---|
|  |  | Test (+) | Test (−) |
| True condition | Disease (+) | a | b |
|  | Disease (−) | c | d |

The receiver operating characteristic curve is drawn with 1 − specificity (false positive rate) as the x-axis and sensitivity as the y-axis. Sensitivity = a / (a + b), specificity = d / (c + d), false negative rate = b / (a + b), false positive rate = c / (c + d), and accuracy = (a + d) / (a + b + c + d).

Pros and Cons of the Nonparametric (Empirical) and Parametric Receiver Operating Characteristic Curve Approaches

|  | Nonparametric ROC curve | Parametric ROC curve |
|---|---|---|
| Pros | No need for assumptions about the distribution of data. | Shows a smooth curve. |
|  | Provides unbiased estimates of sensitivity and specificity. | Compares plots at any sensitivity and specificity value. |
|  | The plot passes through all points. |  |
|  | Uses all data. |  |
|  | Computation is simple. |  |
| Cons | Has a jagged or staircase appearance. | Actual data are discarded. |
|  | Compares plots only at observed values of sensitivity or specificity. | Curve does not necessarily go through actual points. |
|  |  | ROC curves and the AUC are possibly biased. |
|  |  | Computation is complex. |

ROC: receiver operating characteristic curve, AUC: area under the curve.

An Example of Simple Data with Ten Patients for Drawing Receiver Operating Characteristic Curves

| Patient | Confirmed cancer | Tumor marker (continuous value) |
|---|---|---|
| 1 | (−) | 25.8 |
| 2 | (−) | 26.6 |
| 3 | (−) | 28.1 |
| 4 | (+) | 29.0 |
| 5 | (−) | 30.5 |
| 6 | (−) | 31.0 |
| 7 | (−) | 33.6 |
| 8 | (−) | 39.3 |
| 9 | (+) | 43.3 |
| 10 | (+) | 45.8 |

| Cut-off value | TP | FN | FP | TN | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| ≥ 25.8 | 3 | 0 | 7 | 0 | 1.00 | 0.00 |
| ≥ 26.6 | 3 | 0 | 6 | 1 | 1.00 | 0.14 |
| ≥ 28.1 | 3 | 0 | 5 | 2 | 1.00 | 0.29 |
| ≥ 29.0 | 3 | 0 | 4 | 3 | 1.00 | 0.43 |
| ≥ 30.5 | 2 | 1 | 4 | 3 | 0.67 | 0.43 |
| ≥ 31.0 | 2 | 1 | 3 | 4 | 0.67 | 0.57 |
| ≥ 33.6 | 2 | 1 | 2 | 5 | 0.67 | 0.71 |
| ≥ 39.3 | 2 | 1 | 1 | 6 | 0.67 | 0.86 |
| ≥ 43.3 | 2 | 1 | 0 | 7 | 0.67 | 1.00 |
| ≥ 45.8 | 1 | 2 | 0 | 7 | 0.33 | 1.00 |
| > 45.8 | 0 | 3 | 0 | 7 | 0.00 | 1.00 |

Suppose three patients (4, 9, and 10) had biopsy-confirmed cancer diagnoses. The continuous test results can be transformed into binary categories by comparing each value with the cut-off (reference) value; here, values at or above the cut-off are determined to be cancer. TP: true positive, FN: false negative, FP: false positive, TN: true negative. As the cut-off value increases, the sensitivity for cancer diagnosis decreases and the specificity increases. At each cut-off value, one pair of sensitivity and specificity values can be obtained from the corresponding 2 × 2 table.

Interpretation of the Area Under the Curve

| Area under the curve (AUC) | Interpretation |
|---|---|
| 0.9 ≤ AUC | Excellent |
| 0.8 ≤ AUC < 0.9 | Good |
| 0.7 ≤ AUC < 0.8 | Fair |
| 0.6 ≤ AUC < 0.7 | Poor |
| 0.5 ≤ AUC < 0.6 | Fail |

For a diagnostic test to be meaningful, the AUC must be greater than 0.5. Generally, an AUC ≥ 0.8 is considered acceptable.

Comparison of the Statistical Packages for Receiver Operating Characteristic Curve Analyses

|  | Statistical packages | ROC plot | Confidence interval | pAUC | Multiple comparisons | Cut-off values | Sample size | Open source | Web tool access | User interface |
|---|---|---|---|---|---|---|---|---|---|---|
| Commercial program | IBM SPSS (ver. 25) | ○ | ○ | × | × | × | × | × | × | ○ |
|  | STATA (ver. 14) | ○ | ○ | ○ | ○ | ○ | × | × | × | ○ |
|  | MedCalc (ver. 19.4.1) | ○ | ○ | × | ○ | ○ | ○ | × | × | ○ |
|  | NCSS 2021 | ○ | ○ | × | ○ | ○ | ○ | × | × | ○ |
| Free program | OptimalCutpoints (ver. 1.1-4) | ○ | ○ | × | × | ○ | × | ○ | × | × |
|  | ROCR (ver. 1.0-11) | ○ | ○ | ○ | × | × | × | ○ | × | × |
|  | pROC (ver. 1.17.0.1) | ○ | ○ | ○ | ○ | ○ | ○ | ○ | × | ○ |
|  | easyROC (ver. 1.3.1) | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ |
|  | plotROC (ver. 2.2.1) | ○ | ○ | × | ○ | ○ | × | ○ | ○ | ○ |

This table was adapted and modified from Goksuluk et al. [