The letter to the editor by Carson and the reply by Green and Schriger raise important questions about the nature of noninferiority studies and the meaning of the results in the article by Rehrer et al. In my opinion, both sides are correct. Noninferiority studies have recently enjoyed a resurgence in the examination of both previously validated and investigational therapies. The key to interpreting these studies lies in two domains: defining the noninferiority margin and interpreting the statistical outcomes.
The principle of a noninferiority study is that we are “very sure” that the experimental therapy is not worse than the standard control therapy. The noninferiority margin is the maximal difference in outcomes that we would consider not to be clinically significant. It is usually informed by previous studies of the control intervention, but the final choice tends to be more or less subjective. A more clinically relevant interpretation of this margin is the number needed to treat to fail the outcome. In the study by Rehrer et al, the noninferiority margin selected was 8%, which corresponds to a number needed to treat to fail the outcome of 13 (1 divided by 0.08 is 12.5, rounded up). In other words, at the margin, for every 13 patients treated with oral dexamethasone rather than prednisone, one additional patient would fail therapy and return to the emergency department (ED) within 14 days with an exacerbation of asthma.
The next issue is that of statistical significance. In a noninferiority study, the a priori hypothesis is that the experimental treatment is not worse than the control and is therefore either better or the same. This allows use of a 1-tailed statistical test in calculating the P value and the 95% confidence intervals (CIs) of the difference between groups found in the study. Using a 1-tailed test means that a finding of superiority would not be considered statistically significant, because this result could have occurred by chance alone when in fact the two treatments are equivalent. With the standard 2-tailed test, the P value would be .51, whereas it is .29 with the 1-tailed test.
Interpreting the CI, and using it to calculate the number needed to treat to fail the outcome, is a bit more complex. If the CI lies completely within the noninferiority range, that is, if its upper limit is below the noninferiority margin, we accept the results as “statistically significant.” In the study by Rehrer et al, the difference between groups was 2.3% and the 95% CI was –4.1% to 8.6%, which was not statistically significant because the upper limit of 8.6% crossed the noninferiority margin of 8%.
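The decision rule just described can be written out as a short sketch. This is only an illustration, not the analysis used in the study: the function name `classify` and the four labels are hypothetical, and the difference is assumed to be expressed as the experimental rate minus the control rate, in percentage points, so that positive values favor the control.

```python
def classify(ci_low: float, ci_high: float, margin: float) -> str:
    """Classify a trial result from the CI of the risk difference.

    All arguments are in percentage points. The difference is assumed to be
    (experimental failure rate - control failure rate), so a positive value
    means more failures with the experimental therapy.
    The labels are illustrative, not taken from the study.
    """
    if ci_high < 0:
        return "superior"        # whole CI favors the experimental therapy
    if ci_high < margin:
        return "noninferior"     # upper limit stays below the margin
    if ci_low > margin:
        return "inferior"        # whole CI lies beyond the margin
    return "inconclusive"        # CI crosses the margin


# Figures quoted above: 95% CI of -4.1% to 8.6%, margin of 8%.
print(classify(-4.1, 8.6, 8.0))  # prints "inconclusive"
```

With an upper limit of 8.6% against a margin of 8%, the interval crosses the margin, which is why noninferiority could not be claimed.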
The number needed to treat is the reciprocal of the absolute difference in outcome rates, so each limit of the CI can be converted to a number needed to treat. Because the CI includes zero, the corresponding interval for the number needed to treat includes 100% divided by zero, an infinite or indeterminate number. At the lower limit of the 95% CI, –4.1%, there is a decrease in the percentage of patients receiving oral dexamethasone who returned to the emergency department (ED) within 14 days. This is equal to a number needed to treat for benefit of about 25. As the absolute rate reduction shrinks toward zero, the number needed to treat for benefit increases to very large numbers, meaning that more and more patients must be treated for one additional patient to have a changed outcome. This implies equality of the two treatments. As the difference rises above zero, now favoring the control, the number needed to treat to fail the outcome decreases. The upper limit of the 95% CI is 8.6%, which gives a number needed to treat to fail the outcome of 12. A smaller number needed to treat means that more people will benefit from one of the two interventions, and therefore we would use the superior one. A noninferiority study cannot tell you that this is correct, only that this is another testable hypothesis.
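The conversion from a risk difference to a number needed to treat can be sketched in a few lines. The helper `nnt` below is hypothetical and assumes the common convention of rounding up to the next whole patient; the inputs are the interval quoted with the study result (–4.1% to 8.6%), the point estimate of 2.3%, and the 8% margin.

```python
import math


def nnt(risk_difference_pct: float) -> float:
    """Number needed to treat for a risk difference given in percentage points.

    Computed as 100 / |difference| and rounded up to a whole patient
    (an assumed convention). A difference of zero gives an infinite value.
    """
    if risk_difference_pct == 0:
        return math.inf
    return math.ceil(100 / abs(risk_difference_pct))


# Negative differences favor the experimental therapy (benefit);
# positive differences favor the control (fail the outcome).
for label, diff in [
    ("lower CI limit", -4.1),
    ("no difference", 0.0),
    ("point estimate", 2.3),
    ("noninferiority margin", 8.0),
    ("upper CI limit", 8.6),
]:
    kind = "benefit" if diff < 0 else "fail the outcome"
    print(f"{label}: {diff:+.1f}% -> NNT to {kind} = {nnt(diff)}")
```

Walking the difference from the lower limit through zero to the upper limit shows the number needed to treat growing without bound near zero and then shrinking again on the other side, which is the behavior described above.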
The decision to accept the results would be up to the clinician. The case for accepting the study result is that with a confidence level of 95%, the lowest statistically expected number needed to treat to fail the outcome would be 12. This means that for every 12 patients treated with dexamethasone rather than prednisone, one additional patient will fail outpatient management and return to the ED within 14 days with an asthma exacerbation. This would be a relatively small additional cost for using the therapy, making the result clinically reasonable. Factors such as patient acceptability (taste and frequency of administration) and treatment cost would then dictate whether the treatment should be considered. This method of analysis gives the physician some tools for a meaningful shared decisionmaking discussion with the patient.