# Key insights and challenges in noninferiority trials

## Abstract

Noninferiority clinical trials are crucial for evaluating the effectiveness of new interventions compared to standard interventions. By establishing statistical and clinical comparability, these trials can be conducted to demonstrate that a new intervention is not significantly inferior to the standard intervention. However, selecting an appropriate noninferiority margin and study design is essential to ensuring valid and reliable results. Moreover, employing the Consolidated Standards of Reporting Trials (CONSORT) statement for reporting noninferiority clinical trials enhances the quality and transparency of research findings. This article addresses key considerations and challenges faced by investigators in planning, conducting, and interpreting the results of noninferiority clinical trials.

**Keywords:** Bias; Clinical trial; Noninferiority; Randomized controlled trial; Research design; Sample size; Treatment outcome

## Introduction

One commonly used study design is a comparison between a new treatment method or method of interest and another (an existing method or placebo). Null hypothesis significance testing is used for statistical analysis in comparative studies. The null hypothesis, which states that no difference exists between two treatments, is rejected and the alternative hypothesis, which states that a difference does exist between two treatments, is adopted if the P value is less than the predetermined significance level (typically 5%). If the result is not significant (P ≥ 0.05), the null hypothesis cannot be rejected, and the conventional conclusion is that no difference was found between the two groups. However, the possibility of a type II error, in which the null hypothesis is not rejected even though a difference exists in reality, must also be considered. The researchers may have simply not discovered statistical evidence supporting a difference between the two treatments, or the research may not have had sufficient power to detect differences [1].

The research design described above cannot be used to demonstrate the similarity between the effects of two treatments. Conducting a clinical trial with a placebo control group is unethical when an established standard treatment is available. A hypothesis test can be conducted to test whether a new treatment is better than the standard treatment by examining the difference between the two treatments. However, even if the new treatment has similar effects to the standard treatment, it will be clinically important as long as it offers additional benefits such as enhanced safety, convenience, cost-effectiveness, or other supplementary effects. Trials that show that a new treatment (experimental treatment) is noninferior to the standard treatment (active control treatment) are referred to as noninferiority clinical trials.

In recent years, many noninferiority clinical trials have been conducted to compare new drugs or nerve block approaches with well-established strategies for postoperative pain management [2–7]. These trials are believed to be appropriate designs in the context of a recent trend in which new nerve block strategies are being developed to address the limitations of existing strategies and test new approaches while eliminating ethical concerns regarding placebo treatment. Using pain scores as an example, this review describes the assumptions that must be considered in noninferiority clinical trials, the equations for calculating sample size, the interpretation of results using forest plots, and the Consolidated Standards of Reporting Trials (CONSORT) guidelines for reporting noninferiority clinical trials, including the latest trends.

## Hypothesis setting in noninferiority clinical trials

In noninferiority clinical trials, hypotheses are tested using a value known as the noninferiority margin (*M*). The null hypothesis states that an experimental treatment is inferior to the active control treatment, meaning that the difference in the pain scores between the two treatments is greater than or equal to *M*. The alternative hypothesis states that the experimental treatment is not inferior to the active control treatment, meaning that the difference in the pain scores between the two treatments is less than *M*:

$$
\begin{aligned}
H_0 &: \mu_e - \mu_s \geq M \\
H_1 &: \mu_e - \mu_s < M
\end{aligned}
$$

where $\mu_e$ is the mean pain score of the experimental treatment and $\mu_s$ is the mean pain score of the standard (active control) treatment.

## Noninferiority margin

The noninferiority margin is defined as an insignificant or acceptable difference between the experimental and active control treatments. If the noninferiority margin is too large, it may lead to the erroneous conclusion that the experimental treatment is noninferior, even when a clinically meaningful difference exists between the two treatments. Conversely, if the noninferiority margin is set unreasonably small, a clinically insignificant difference may exceed the noninferiority margin and thus erroneously lead to a failure to reject the null hypothesis and a conclusion that the experimental treatment is not noninferior. The noninferiority margin should be smaller than or equal to the observed typical effect size of the active control treatment compared to the placebo treatment in previous trials (statistical margin). At the same time, the noninferiority margin should be determined as the largest clinically acceptable difference that is not clinically relevant in practice (clinical margin). If the clinical margin is larger than the statistical margin, the noninferiority margin is determined using the statistical margin. This ensures that the experimental treatment can be shown to have a greater effect than the virtual placebo treatment. Conversely, if the clinical margin is smaller than the statistical margin, the noninferiority margin is determined based on the clinical margin. In essence, the smaller of the two values is chosen as the noninferiority margin. The statistical margin can be set based on the lower limit of the CI or a portion of the effect size of the active control treatment compared to previous placebo treatments, taking into consideration uncertain differences across previous studies. As the noninferiority margin is not a fixed value, it should be clearly justified and explained, and the value must be acceptable to researchers in the field. 
Because the sample size of a study depends on the noninferiority margin, the margin must be prespecified based on strict criteria and cannot be modified after the study is completed and the results are known.
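The rule of taking the smaller of the statistical and clinical margins can be sketched in R. This is only an illustration under stated assumptions: the function name `choose_margin`, the preserved fraction of 50%, and all numbers (a historical 95% CI lower limit of 1.4 pain-score points and a clinical margin of 1.0) are hypothetical and not taken from any trial.

```r
# Minimal sketch: choose the noninferiority margin as the smaller of the
# statistical and clinical margins. All inputs are hypothetical.
choose_margin <- function(hist_ci_lower, clinical_margin, fraction = 0.5) {
  # Statistical margin: a portion of the conservative estimate (CI lower
  # limit) of the active control effect versus placebo in previous trials.
  # Setting fraction = 1 uses the CI lower limit itself.
  statistical_margin <- fraction * hist_ci_lower
  min(statistical_margin, clinical_margin)
}

# Statistical margin 0.5 * 1.4 = 0.7 is smaller than the clinical margin 1.0
choose_margin(hist_ci_lower = 1.4, clinical_margin = 1.0)
```

In this hypothetical case the statistical margin is the binding one, so a trial that meets it would also rule out a clinically relevant difference.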

## Assumptions of noninferiority clinical trials

In a noninferiority clinical trial comparing experimental treatment with an active control treatment, one necessary assumption is that the experimental treatment is superior to a virtual placebo treatment. The term ‘active control treatment’ refers to a treatment proven to be more effective than placebo treatment in previous research and currently serves as the control group in noninferiority clinical trials. In contrast, ‘virtual placebo treatment’ is a purely hypothetical treatment that does not exist in the current noninferiority study. In a noninferiority clinical trial, the experimental treatment is compared with an active control treatment that has been shown to be superior to a placebo treatment in a previous trial, thereby indirectly confirming the superiority of the experimental treatment over the virtual placebo treatment. The experimental treatment is considered superior to the virtual placebo treatment when the difference between the effects of the active control and experimental treatments in the current noninferiority clinical trial is smaller than the difference between the effects of the active control and placebo treatments identified in a previous trial [8].
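The indirect argument can be written out for an outcome in which lower values are better, such as pain scores. Here $\mu_p$ denotes the mean under the virtual placebo treatment; the symbol is introduced only for this illustration, and the step relies on the assumption that the effect of the active control treatment over placebo seen in previous trials carries over to the current trial. If the margin does not exceed that historical effect, $M \leq \mu_p - \mu_s$, then establishing noninferiority gives

$$
\mu_e - \mu_s < M \leq \mu_p - \mu_s \;\Longrightarrow\; \mu_e < \mu_p
$$

so the experimental treatment would be expected to yield lower pain scores than the virtual placebo treatment.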

In a parallel-group trial, such as a randomized controlled trial, a direct comparison can be used to definitively assess whether the experimental treatment is better than the control treatment. However, in a noninferiority clinical trial using an active control treatment, proving that the experimental treatment is noninferior to the active control treatment does not necessarily mean that the experimental treatment is better than the virtual placebo treatment. This is because a noninferiority clinical trial does not directly compare the experimental treatment with a virtual placebo treatment. The hypothesis of the noninferiority trial assumes that the effect of the active control treatment in previous studies with assay sensitivity is reproduced. Assay sensitivity refers to the ability of the clinical study to distinguish whether a treatment is effective or not compared with a placebo. Maintaining assay sensitivity in noninferiority clinical trials is critical because of the absence of a placebo for evaluating the effect of the experimental treatment. For more details, refer to the guideline on noninferiority clinical trials released by the U.S. Food and Drug Administration in 2016. The guideline outlines three considerations when determining if a noninferiority trial has assay sensitivity: 1) historical evidence of sensitivity to drug effects, 2) similarity of the new noninferiority trial to the historical trials (the constancy assumption), and 3) quality of the new trial (ruling out defects that would tend to minimize differences between treatments). Assay sensitivity is also dependent on the measurement tool for an outcome or the noninferiority margin.

A few aspects need to be considered when attempting to reproduce the effects of active control treatment in a noninferiority clinical trial. First, consistent study findings indicate that the effects of the active control treatment are stable and reliable. If the effects of the active control treatment vary across studies, the superiority of the experimental treatment to a virtual placebo treatment cannot be definitively concluded, even if noninferiority to the active control treatment is confirmed. Second, noninferiority clinical trials must be designed in a manner similar to that of the previous studies that investigated the effects of the active control treatment. Deviations from previous study conditions can lead to a misinterpretation of results. For example, if the trial involves surgeries that are less painful than those in prior studies, patients might report lower pain scores, even if the new intervention has minimal analgesic effects. This can lead to an overestimation of the therapeutic benefits such that they appear similar to those of the standard treatment. Third, a high standard must be used to manage noninferiority clinical trials to maintain data reliability and prevent the withdrawal of participants. When problems in data acquisition, such as data manipulation, mistakes, or other errors, make the outcomes of the two treatments appear similar, the null hypothesis is retained in trials assessing superiority or inequality, whereas the alternative hypothesis is adopted in a noninferiority test. If a high dropout rate occurs due to an imperfect protocol, the sample size decreases, widening the CI of the difference in sample means and reducing statistical power. Indeed, protocols that are not adequately designed to align with the research objectives can lead to an increased dropout rate and a subsequent reduction in the reliability of the study.

For instance, if patients who received unplanned additional analgesics are excluded from the experimental treatment due to protocol violations, those who may have experienced relatively more severe pain are excluded from the analysis. Consequently, the pain scores for the experimental treatment may appear similar to those for the active control treatment. Therefore, noninferiority clinical trials with a high dropout rate require the careful examination of dropout cases and the reasons for withdrawal [9]. Conversely, if patients with well-controlled pain due to additional treatment are included in the active control treatment, the difference between the active control and experimental treatments may be reduced, leading to noninferior results. In the context of a noninferiority clinical trial, when the experimental treatment is reported to be noninferior to the active control treatment, one may erroneously conclude that the new treatment can be considered for clinical application. Therefore, noninferiority clinical trials require strict management.

## Hypothesis testing by interval estimation

For noninferiority testing, the central tendency and CI of the difference in effects ($\mu_e - \mu_s$) between two treatments are estimated and compared using the predetermined noninferiority margin for hypothesis testing. The noninferiority hypothesis is formulated as one-sided relative to the noninferiority margin, thus requiring the use of a one-sided test. In other words, comparing one side of the CI based on the noninferiority margin allows for the alternative hypothesis to be accepted or rejected.

For outcome measures that indicate better outcomes with smaller values, such as pain scores, the upper limit of the CI must be smaller than the predefined noninferiority margin to adopt the alternative hypothesis. The alternative hypothesis in noninferiority testing does not include an equal sign (=). Thus, if the upper limit of the 95% CI is equal to the noninferiority margin, the alternative hypothesis is not satisfied, and the null hypothesis must be adopted [10]. Forest plots can be used to visualize the results more intuitively (Fig. 1). When presenting results using a figure for outcomes, with a smaller value indicating a better outcome, as in the example, subtracting the value of the active control treatment from that of the experimental treatment and determining whether the upper limit of the CI is below the noninferiority margin allows for a more intuitive determination of noninferiority. Conversely, for better outcomes with higher values, noninferiority can be determined by comparing the upper limit of the CI obtained by subtracting the value of the experimental treatment from that of the active control treatment with the noninferiority margin. Noninferiority can be determined based on the intersection between the CI and the noninferiority margin.

Importantly, the opposite of noninferiority is not inferiority [8]. An experimental treatment that does not meet the conditions of noninferiority must not be considered inferior [11]. Inferiority is indicated when the value is far to the right of the predefined noninferiority margin (Figs. 1C and D). Additionally, while the experimental treatment in Fig. 1C is noninferior to the active control treatment based on the noninferiority margin, the lower limit of the CI is greater than 0, indicating that the experimental treatment is significantly different from the active control treatment.
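The interval-based reading of the results can be sketched as a small R helper. This is a hedged illustration, not a standard routine: the function name `interpret_ni` and the wording of the labels are assumptions, and treating a CI that lies entirely above the margin as "inferior" is one possible operational reading of the text. The difference is taken as experimental minus active control for outcomes in which lower values are better; for outcomes in which higher values are better, the CI of active control minus experimental would be passed instead.

```r
# Minimal sketch: interpret a two-sided CI for (experimental - control)
# against a noninferiority margin when lower values are better.
# Function name and labels are illustrative assumptions.
interpret_ni <- function(ci_lower, ci_upper, margin) {
  stopifnot(ci_lower <= ci_upper, margin > 0)
  if (ci_upper < 0) {
    "noninferior and superior"
  } else if (ci_upper < margin && ci_lower > 0) {
    "noninferior, but significantly different from control"
  } else if (ci_upper < margin) {
    "noninferior"
  } else if (ci_lower > margin) {
    "inferior"
  } else {
    # Includes the case in which the upper limit equals the margin
    "inconclusive (noninferiority not shown)"
  }
}

interpret_ni(-0.144, 0.503, margin = 1)  # "noninferior"
interpret_ni(0.2, 1.0, margin = 1)       # "inconclusive (noninferiority not shown)"
```

Keeping "inconclusive" separate from "inferior" reflects the point that failing to show noninferiority is not evidence of inferiority.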

The P value for noninferiority testing is estimated by comparing the mean difference between the two treatments with the noninferiority margin based on the hypothesis. The P value can be calculated in a similar way to traditional one-sided null hypothesis testing. The P value is a widely accepted statistic to describe the evidence of statistical decisions. For noninferiority test results, reporting a one-sided P value at half the conventional significance level (e.g., 2.5%) together with two-sided CIs (e.g., 95% CI) is highly recommended. Two-sided CIs are especially useful when arguing for noninferiority as well as for the results of the equivalence test. Two-sided significance-level testing is a standard method for the equivalence test. For *t*-tests in R (R Core Team, 2021, R Foundation for Statistical Computing, https://www.R-project.org/), the `t.test()` function can be used, with the noninferiority margin entered in the `mu` argument. A two-tailed test is recommended. If necessary, a one-sided test can be performed by setting the `alternative` argument to 'greater' or 'less'.

```r
# Noninferiority t-test example
set.seed(0) # for a reproducible result

# Generate 100 random numbers for the experimental
# treatment following a normal distribution with
# a mean of 2.1 and a standard deviation of 1.5
A <- rnorm(100, 2.1, 1.5)

# Generate 100 random numbers for the active control
# treatment following a normal distribution with
# a mean of 2 and a standard deviation of 1
B <- rnorm(100, 2, 1)

# Mean of experimental treatment minus mean of active
# control treatment
mean(A) - mean(B)
## [1] 0.1795542

# One-sided result of noninferiority t-test
# noninferiority margin of 1 was used
# when comparing the lower confidence interval to M,
# alternative = 'greater' should be used.
t.test(A, B,
       alternative = 'less',
       conf.level = 0.975,
       mu = 1)
##  Welch Two Sample t-test
## data: A and B
## t = -5.0069, df = 181.08, p-value = 6.527e-07
## alternative hypothesis: true difference in means is less than 1
## 97.5 percent confidence interval:
##  -Inf 0.5028789
## sample estimates:
## mean of x mean of y
##  2.134003  1.954448

# Two-sided result of noninferiority t-test
t.test(A, B, alternative = 'two.sided', mu = 1)
##  Welch Two Sample t-test
## data: A and B
## t = -5.0069, df = 181.08, p-value = 1.305e-06
## alternative hypothesis: true difference in means is not equal to 1
## 95 percent confidence interval:
##  -0.1437704 0.5028789
## sample estimates:
## mean of x mean of y
##  2.134003  1.954448
```

As a result, the mean difference between the two groups was 0.180 (95% CI: −0.144, 0.503), and the upper limit of the 95% CI was lower than the specified noninferiority margin of 1 (one-sided P < 0.001 at a significance level of 0.025).
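To make explicit how the margin enters the test statistic, the one-sided Welch test can also be computed by hand. The sketch below assumes that lower values are better and that the alternative hypothesis is that the mean difference is less than the margin; the helper name `ni_welch` is hypothetical, and the data are regenerated with the same seed and parameters as in the example so that the block is self-contained.

```r
# Minimal sketch: one-sided noninferiority Welch t-test computed by hand.
# H1: mean(x) - mean(y) < margin (lower values are better).
ni_welch <- function(x, y, margin) {
  vx <- var(x) / length(x)
  vy <- var(y) / length(y)
  se <- sqrt(vx + vy)
  # Shift the observed difference by the margin before standardizing
  t_stat <- (mean(x) - mean(y) - margin) / se
  # Welch-Satterthwaite approximation of the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  list(t = t_stat, df = df, p_one_sided = pt(t_stat, df))
}

set.seed(0)
A <- rnorm(100, 2.1, 1.5)  # experimental treatment
B <- rnorm(100, 2, 1)      # active control treatment

res <- ni_welch(A, B, margin = 1)
ref <- t.test(A, B, alternative = "less", mu = 1)

# Compare the hand calculation with the built-in test
all.equal(res$p_one_sided, ref$p.value)
```

The only change from an ordinary two-sample test is the subtraction of the margin in the numerator, which is what the `mu` argument of `t.test()` does.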

## Sample size calculation for noninferiority test

The sample size for noninferiority testing is determined by significance level (α), power (1 − β), standard deviation, and the noninferiority margin (*M*) using the following equation [12,13]:

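$$
n_s = \left(1 + \frac{1}{r}\right)\frac{\sigma^2\left(Z_{1-\alpha} + Z_{1-\beta}\right)^2}{\left(\mu_e - \mu_s - M\right)^2}
$$

where $n_s$ is the sample size of the active control treatment, $r$ is the allocation ratio of the groups, $Z$ is the quantile of the standard normal distribution (mean = 0, standard deviation = 1), $\sigma$ is the standard deviation, $\mu_s$ is the population mean of the active control treatment (standard treatment), $\mu_e$ is the population mean of the experimental treatment, $\mu_e - \mu_s$ is the expected mean difference, and $M$ is the noninferiority margin.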

To obtain balanced data, the inter-treatment ratio is set to 1. When an unequal allocation is required for any reason [14], $r$ can change. As the active control and experimental treatments in the population are generally assumed not to be different, the expected mean difference is 0. Under the common clinical trial conditions with a power of 80% ($Z_{1-\beta}$ = 0.842) for a one-sided test at a 0.025 significance level ($Z_{1-\alpha}$ = 1.96), the sample size equation shown above can be simplified as follows:

$$
n_s = \frac{2\sigma^2\left(1.96 + 0.842\right)^2}{M^2} \approx 15.7\,\frac{\sigma^2}{M^2}
$$

With the commonly used noninferiority margin of 1 for pain scores [15], assuming standard deviations of 1, 2, and 3, the required sample sizes for the active control treatment are approximately 15.7, 62.8, and 141.3, respectively, which are rounded up to 16, 63, and 142 participants.

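If we aim for a power of 90%, we can use $Z_{1-\beta}$ = 1.282 for calculations:

$$
n_s = \frac{2\sigma^2\left(1.96 + 1.282\right)^2}{M^2} \approx 21.0\,\frac{\sigma^2}{M^2}
$$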

For simpler calculations, an online sample size calculator, such as that offered by Sealed Envelope™ (sealedenvelope.com), can be used as it provides syntax (Fig. 2) [16]. The mobile application Cytel East Lite for iOS ver 1.0 (2017, Cytel Inc.) provides the sample size calculation with an unequal allocation ratio.
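The sample size calculation described in this section can also be sketched directly in R. The function below is a minimal illustration under the normal approximation, not a validated calculator: the name `ni_sample_size` and its argument names are assumptions, `expected_diff` is the expected mean difference (experimental minus active control), and `r` is taken as the ratio of the experimental to the active control group size.

```r
# Minimal sketch: normal-approximation sample size for a noninferiority
# comparison of two means. Returns group sizes rounded up to whole
# participants. Function and argument names are illustrative.
ni_sample_size <- function(sd, margin, alpha = 0.025, power = 0.80,
                           expected_diff = 0, r = 1) {
  z_alpha <- qnorm(1 - alpha)  # about 1.96 for one-sided alpha = 0.025
  z_beta  <- qnorm(power)      # about 0.842 for 80% power
  n_control <- (1 + 1 / r) * sd^2 * (z_alpha + z_beta)^2 /
    (expected_diff - margin)^2
  c(control = ceiling(n_control), experimental = ceiling(r * n_control))
}

# Margin of 1 with standard deviations of 1, 2, and 3 (80% power)
sapply(c(1, 2, 3), function(s) ni_sample_size(sd = s, margin = 1)[["control"]])

# Same margin with a standard deviation of 2 and 90% power
ni_sample_size(sd = 2, margin = 1, power = 0.90)
```

Because the required size grows with the square of the ratio of the standard deviation to the margin, halving the margin roughly quadruples the sample size.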

## Analysis in noninferiority clinical trials

In clinical trials, bias may occur depending on the method used to create the analysis set [17]. Two commonly used approaches are the intention-to-treat (ITT) analysis, which includes all randomized subjects in the analysis regardless of whether they received the assigned intervention, and the per-protocol (PP) analysis, which includes only those subjects who have successfully completed the intervention according to the trial protocol [18]. For the ITT analysis, subjects’ data are analyzed based on their initial assignment, irrespective of whether they received the assigned intervention or issues of protocol violation, compliance, or withdrawal occurred. This approach includes subjects who may have received a different treatment than assigned, and thus, the effect may be underestimated compared to that for the PP analysis [19]. The ITT analysis is generally a conservative analytical approach close to real-world scenarios that preserves the value of randomization and is widely accepted as the primary analysis method in superiority or inequality assessments of randomized controlled trials (RCTs). However, in noninferiority clinical trials, ITT analyses are not considered conservative as the actual difference between the two treatments can be reduced, leading to inaccurate conclusions that an experimental treatment is noninferior when it is actually inferior. Therefore, the PP analysis is a more conservative approach and is accepted as more appropriate for noninferiority clinical trials [20].

However, the inherent exclusion of subjects as a result of noncompliance or nonadherence, which is an inevitable aspect of PP analyses, can potentially impact the interpretation of the results. Therefore, being mindful of the potential risks associated with information censoring in PP analyses is crucial [18]. Thus, the characteristics of the subjects excluded from the PP analysis must be carefully reviewed before drawing conclusions. Both ITT and PP analysis approaches must be employed in noninferiority clinical trials, and the alternative hypothesis can be adopted only when the results of both analyses consistently indicate the noninferiority of the experimental treatment. In the case of existing discrepancies between ITT and PP analysis results, the possibility of bias, such as exclusion bias, cannot be ruled out. To enhance patient compliance and lower protocol violations, investigators should strive to reduce the discrepancy between the ITT and PP sets during the trial and specify the criteria for the two analysis sets during the trial planning stage.
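Running the same noninferiority test on both analysis sets can be sketched as follows. The data are simulated purely for illustration; the column names, the 10% protocol violation rate, and the helper `ni_upper_limit` are hypothetical, and lower pain scores are assumed to be better.

```r
# Minimal sketch: apply the same noninferiority criterion to the ITT and
# PP analysis sets. Simulated data; all names and rates are hypothetical.
set.seed(1)
n <- 100
trial <- data.frame(
  arm = rep(c("experimental", "control"), each = n),
  pain = c(rnorm(n, 2.1, 1.5), rnorm(n, 2, 1)),
  per_protocol = runif(2 * n) > 0.10  # FALSE marks a protocol violation
)

ni_upper_limit <- function(data) {
  x <- data$pain[data$arm == "experimental"]
  y <- data$pain[data$arm == "control"]
  # Upper limit of the two-sided 95% CI for (experimental - control)
  t.test(x, y)$conf.int[2]
}

margin <- 1
upper <- c(ITT = ni_upper_limit(trial),                        # all randomized
           PP  = ni_upper_limit(trial[trial$per_protocol, ]))  # completers only
upper

# Noninferiority is claimed only if both analysis sets agree
all(upper < margin)
```

In a real trial the excluded subjects would not be missing at random, so a disagreement between the two upper limits would prompt a review of who was excluded and why.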

If both noninferiority and superiority assessments are required, they should be specified during the trial-planning stage since establishing a noninferiority margin and incorporating a noninferiority hypothesis after the completion of a clinical trial can introduce subjective judgment from investigators. Generally, if a single primary endpoint has been indicated, a trial designed to demonstrate noninferiority can be used to establish superiority without increasing the risk of type I errors [21]. For instance, if noninferiority is established, the CI can be used to estimate the statistical significance of a particular result to determine superiority [22].

## Reporting of noninferiority randomized trials: CONSORT statement

The CONSORT statement contains a set of guidelines for reporting RCTs. In 2006, an extension to the CONSORT statement specifically addressing noninferiority clinical trials was introduced [20], and a brief update was presented in 2012 (Table 1) [22]. An explanation of each item in the summary is provided below.

The title should include the term ‘noninferiority’ to clearly inform readers that this is a noninferiority trial. This is important both to clearly present the purpose of the study in the title and for the systematic review process.

In the background section, evidence of the superiority of the active control treatment from previous studies should be presented, along with the effect size and significance level, and the supporting literature should be cited. If no research exists on the effect size of active control treatments, alternative evidence for the therapeutic effect of active control should be presented. The rationale for demonstrating noninferiority should be provided by presenting evidence for the additional benefits that could be obtained from the experimental treatment compared to the active control treatment. The noninferiority hypothesis should be established for the primary outcome of the study and results on the additional benefits of the experimental treatment should be presented. If sequential noninferiority and superiority assessments are planned, they should also be disclosed. Rationale for the choice of noninferiority margin should be provided and whether the value is absolute or relative should be specified. The results should also be presented as absolute or relative values, depending on the noninferiority margin.

The assay sensitivity of a noninferiority clinical trial should be addressed in the methods section. Noninferiority hypothesis testing assumes that the active control treatment is effective, as reported in previous trials; thus, describing how the study participants differ from those in previous studies is essential, as this could potentially influence their responses to the intervention. The similarity of the intervention for the active control treatment in the present trial to that of previous trials should be presented, and any differences should be explained. In addition, any disparities in the outcomes between the current study and previous studies should be elucidated. This could include variations in the timing of outcome assessments, and any enhancements or modifications arising from differences in the nature of the outcome measures should be clarified. The noninferiority margin should be presented with rational clinical evidence. If the noninferiority margin is too large, it can lead to the inaccurate conclusion that an intervention is noninferior when it is actually inferior. Conversely, a small noninferiority margin may make it difficult to obtain noninferior results, even for interventions that may be adequately significant, or require an excessively large sample size.

In noninferiority trials, the results are typically interpreted based on the upper limit of the two-sided 95% CI, which is equivalent to the upper limit of the 97.5% CI for a one-sided test. Two-sided 95% CIs can be used to determine whether the experimental treatment is superior to the active control treatment after establishing that it is clearly not inferior. A forest plot can provide a more intuitive visualization of the relationship between the CI and noninferiority margin and may be helpful for readers when interpreting the results. The results must be interpreted with respect to the purpose. Therefore, the conclusion of the trial should pertain to the noninferiority status, and if a conclusion regarding superiority is drawn, it should be supported using valid evidence.

Abstracts are crucial for study accessibility and screening. The abstract of a noninferiority trial must include the noninferiority margin, a clear description of the noninferiority hypothesis, and a discussion of whether the results of previous studies point to noninferiority or superiority. An analysis of the primary outcomes should be presented in comparison with the noninferiority margin, and the interpretation of the conclusion should be presented based on the noninferiority hypothesis.

## Conclusion

Noninferiority testing is an important research method that demonstrates that a new intervention is not substantially worse than a standard intervention and is at least comparable at a statistically and clinically acceptable level. The clinical efficacy of a new approach can be established using indirect comparisons with a virtual placebo treatment by choosing an appropriate study design and a valid noninferiority margin based on previous studies. A valid interpretation of the results should be presented and the CONSORT statement for reporting noninferiority clinical trials can be used to enhance the quality of the study. We hope that this article provides the necessary information for investigators performing and assessing noninferiority clinical trials to adequately consider the issues mentioned in this article as they plan and interpret the results.

## Notes

**Funding**

This work was supported by the National Research Foundation of Korea (NRF-2022R1C1C1007982) and Chungnam National University.

**Conflicts of Interest**

No potential conflict of interest relevant to this article was reported.

**Data Availability**

All data generated or analyzed during this study are included in this published article.

**Author Contributions**

Boohwi Hong (Conceptualization; Funding acquisition; Writing – original draft)

Dong-Kyu Lee (Conceptualization; Project administration; Supervision; Visualization; Writing – review & editing)