### NHST and Related Problems

… ($H_0$). In other words, when this concept was first published, only the null hypothesis was mentioned; the concept of an alternative hypothesis was absent. In addition, the critical value (CV) of the P value, which is now used as the criterion for accepting or rejecting a hypothesis, was not defined before the start of an experiment.

The concept of the alternative hypothesis was introduced later by Neyman and Pearson. Their method of hypothesis testing is similar to the one devised by Fisher. It is based on the Type I error (α), the Type II error (β), the minimum effect size (MES), power (1 - β), the alternative hypothesis, and a critical value that determines which of the two hypotheses is accepted. At first glance, this method appears very similar to Fisher's, but there are two fundamental differences.

First, in the Neyman-Pearson method all of the quantities listed above are set before the study begins, and the number of subjects is also determined in advance from this setup.

Second, because Fisher's method did not include the concept of a critical value, Fisher interpreted a small P value as follows: when the experimental results are difficult to explain under the null hypothesis, the result is significant, and the experimental intervention may therefore explain the difference. Under this view, a smaller P value was interpreted as 'more significant.' In the Neyman-Pearson method, by contrast, a critical value is set in advance as the criterion for judging a hypothesis, so the judgment is dichotomous. When the critical value is set at 0.05 and the experiment yields a P value greater than 0.05, the null hypothesis is accepted; because an MES has been specified, this null hypothesis is not a nil hypothesis (one stating an effect of exactly zero). If the experiment yields a P value less than 0.05, the alternative hypothesis is accepted.
In other words, a P value of 0.01 or smaller should not be interpreted as 'more significant,' and a P value of 0.06 should not be interpreted as 'tending toward significance.' In Fisher's method, the P value may be read as representing the strength of the significance, whereas in the Neyman-Pearson method the P value serves only as a criterion, compared against the CV, for choosing between the null hypothesis and the alternative hypothesis; it is not a measure of the strength of the significance. The objectives of the two methods also differ: Fisher's method aims to test the significance of a research result, whereas the Neyman-Pearson method aims to choose between two competing hypotheses.
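The Neyman-Pearson workflow described above, in which α, power, and the MES are fixed first, the sample size follows from them, and the P value is then only compared with the CV, can be sketched in Python. This is a minimal sketch under stated assumptions, not a procedure taken from the original text: it assumes a two-sided, two-sample comparison of means with equal group sizes and a common standard deviation, uses the normal approximation $n = 2\,(z_{1-\alpha/2} + z_{1-\beta})^2 / \text{MES}^2$ per group, and expresses the MES as a standardized difference (difference in means divided by the SD). The function names `sample_size_per_group` and `neyman_pearson_decision` are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(alpha: float, power: float, mes: float) -> int:
    """Approximate subjects per group, fixed before the study starts.

    Assumes a two-sided, two-sample z-test with equal group sizes;
    `mes` is the minimum effect size as a standardized difference.
    """
    std_normal = NormalDist()                     # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical z for alpha
    z_beta = std_normal.inv_cdf(power)            # z for power = 1 - beta
    n = 2 * ((z_alpha + z_beta) / mes) ** 2
    return math.ceil(n)                           # round up to whole subjects


def neyman_pearson_decision(p_value: float, cv: float = 0.05) -> str:
    """Dichotomous decision against a critical value set in advance.

    Only the side of the CV matters, so P = 0.001 and P = 0.04 lead to
    the same decision. Treating P == CV as 'accept HA' is an assumption.
    """
    return "accept HA" if p_value <= cv else "accept H0"


if __name__ == "__main__":
    # Planning stage (illustrative values): alpha = 0.05, power = 0.80, MES = 0.5
    print(sample_size_per_group(alpha=0.05, power=0.80, mes=0.5))  # 63
    # Analysis stage: the P value is compared with the CV and nothing more
    for p in (0.001, 0.01, 0.04, 0.06):
        print(p, neyman_pearson_decision(p))
```

The decision function deliberately ignores how far the P value lies from the CV: 0.001 and 0.04 both give 'accept HA', and 0.06 gives 'accept H0' rather than 'tending toward significance'.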

### Incorrect Interpretation of the P value

A common incorrect interpretation is that a small P value means that **the probability** that $H_0$ is true **is very low**.

To illustrate, consider a diagnostic test for a disease, with the following notation:

$H_0$ = not having the disease

$H_A$ = having the disease

$D_+$ = positive test result

$D_-$ = negative test result

What is the probability that a person with a positive test result is in fact normal, that is, $P(H_0 \mid D_+)$? Under the common interpretation of the P value, this probability would be 5%. However, the answer is not 5%. The calculation is summarized in Table 1. In the population, there are 275 positive test results, 180 of which belong to normal individuals. Therefore, the probability that a person with a positive test result is in fact normal is 180/275, or about 0.65. Isn't that surprising? Of course, these data were constructed arbitrarily. However, they were prepared with reference to the fact that the success rate of a drug from the start of clinical trials to approval is close to 10% [5]. The sensitivity of 95% was chosen to mirror the conventional 95% that corresponds to a Type I error rate of 5%, and the specificity of 80% was chosen to mirror the conventional statistical power of 80%.
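The arithmetic behind Table 1 is Bayes' theorem. In the notation above, with $P(D_+ \mid H_0) = 1 - \text{specificity}$ and $P(D_+ \mid H_A) = \text{sensitivity}$,

$$P(H_0 \mid D_+) = \frac{P(D_+ \mid H_0)\,P(H_0)}{P(D_+ \mid H_0)\,P(H_0) + P(D_+ \mid H_A)\,P(H_A)} = \frac{0.20 \times 0.90}{0.20 \times 0.90 + 0.95 \times 0.10} = \frac{0.18}{0.275} \approx 0.65.$$

The Python sketch below reproduces this under stated assumptions: a disease prevalence of 10% (mirroring the roughly 10% success rate cited above) and a population of 1,000, which are the values that yield the 275 positive results and 180 normal positives quoted in the text. The function names `expected_counts` and `prob_h0_given_positive` are hypothetical.

```python
def expected_counts(n: int, prevalence: float, sensitivity: float,
                    specificity: float) -> dict[str, int]:
    """Cross-tabulate a population of size n by true state and test result."""
    n_ha = round(n * prevalence)              # having the disease (HA)
    n_h0 = n - n_ha                           # not having the disease (H0)
    tp = round(n_ha * sensitivity)            # HA and D+
    fp = round(n_h0 * (1 - specificity))      # H0 and D+ (normal but positive)
    return {"H0,D+": fp, "H0,D-": n_h0 - fp, "HA,D+": tp, "HA,D-": n_ha - tp}


def prob_h0_given_positive(prevalence: float, sensitivity: float,
                           specificity: float) -> float:
    """P(H0 | D+) by Bayes' theorem."""
    p_pos_given_h0 = 1 - specificity          # P(D+ | H0)
    p_pos_given_ha = sensitivity              # P(D+ | HA)
    p_h0, p_ha = 1 - prevalence, prevalence   # P(H0), P(HA)
    p_pos = p_pos_given_h0 * p_h0 + p_pos_given_ha * p_ha   # P(D+)
    return p_pos_given_h0 * p_h0 / p_pos


if __name__ == "__main__":
    counts = expected_counts(1000, prevalence=0.10, sensitivity=0.95, specificity=0.80)
    positives = counts["H0,D+"] + counts["HA,D+"]
    print(counts, positives)
    print(round(prob_h0_given_positive(0.10, 0.95, 0.80), 2))  # 0.65
```

The common interpretation effectively swaps the conditioning, reading $P(H_0 \mid D_+)$ as if it were a fixed error rate of the test; Bayes' theorem shows that the prior probability $P(H_0) = 0.90$ also enters the calculation.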