A comprehensive resource providing new statistical methodologies and demonstrating how new approaches work for applications
M-statistics introduces a new approach to statistical inference, redesigning the fundamentals of statistics and improving on the classical methods we already use. This book targets exact optimal statistical inference for a small sample under one methodological umbrella. Two competing approaches are offered: maximum concentration (MC) and mode (MO) statistics, which together give the symbolic equation M = MC + MO. M-statistics defines an estimator as the limit point of the MC or MO exact optimal confidence interval as the confidence level approaches zero, yielding the MC and MO estimators, respectively. Neither mean nor variance plays a role in M-statistics theory.
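The idea of an estimator as the limit of a confidence interval can be illustrated with the normal variance, the book's motivating example. The sketch below is not taken from the book or its R code: it assumes the classical minimum-length interval built from the chi-square pivot S/σ² (with S the sum of squared deviations from the sample mean), and the function names `chi2_cdf` and `shortest_variance_ci` are hypothetical. Under those assumptions the interval shrinks toward S/(n+1) as the confidence level goes to zero.

```python
import math

def chi2_cdf(q, k):
    """Chi-square CDF with k degrees of freedom, computed from the power
    series of the regularized lower incomplete gamma function."""
    a, x = k / 2.0, q / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def shortest_variance_ci(S, n, level):
    """Minimum-length interval for the normal variance (illustrative sketch).

    Uses the pivot S / sigma^2 ~ chi-square(n - 1). The interval is
    [S/q2, S/q1], where q1 < q2 satisfy q1^2 f(q1) = q2^2 f(q2) and
    F(q2) - F(q1) = level (f and F are the chi-square density and CDF).
    """
    k = n - 1
    mode = k + 2.0  # maximizer of q^2 f(q)

    def log_g(q):
        # log of q^2 f(q), up to an additive constant
        return (k / 2.0 + 1.0) * math.log(q) - q / 2.0

    def upper_for(q1):
        # Solve log_g(q2) = log_g(q1) for q2 > mode by bisection.
        target = log_g(q1)
        lo, hi = mode, mode + 1.0
        while log_g(hi) > target:
            hi *= 2.0
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if log_g(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    def coverage(q1):
        return chi2_cdf(upper_for(q1), k) - chi2_cdf(q1, k)

    # Coverage decreases from 1 to 0 as q1 moves from 0 up to the mode.
    lo, hi = 1e-12, mode
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if coverage(mid) > level:
            lo = mid
        else:
            hi = mid
    q1 = (lo + hi) / 2.0
    q2 = upper_for(q1)
    return S / q2, S / q1

# Hypothetical data summary: n = 10 observations, S = 18.
S, n = 18.0, 10
for level in (0.95, 0.5, 0.1, 1e-6):
    low, high = shortest_variance_ci(S, n, level)
    print(f"level={level:g}: [{low:.4f}, {high:.4f}]")
print("S/(n+1) =", S / (n + 1))
```

Swapping the equal-height condition for a different optimality criterion changes the limit point, which is one way to see why the MC and MO approaches can lead to different estimators.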
Novel statistical methodologies in the form of double-sided unbiased and short confidence intervals and tests apply to major statistical parameters:
- Exact statistical inference for small sample sizes is illustrated with effect size and coefficient of variation, the rate parameter of the Pareto distribution, two-sample statistical inference for normal variance, and the rate of exponential distributions.
- M-statistics is illustrated with discrete, binomial, and Poisson distributions. Novel estimators eliminate paradoxes with the classic unbiased estimators when the outcome is zero.
- Exact optimal statistical inference applies to correlation analysis including Pearson correlation, squared correlation coefficient, and coefficient of determination. New MC and MO estimators along with optimal statistical tests, accompanied by respective power functions, are developed.
- M-statistics is extended to the multidimensional parameter and illustrated with the simultaneous statistical inference for the mean and standard deviation, shape parameters of the beta distribution, the two-sample binomial distribution, and finally, nonlinear regression.
Our new developments are accompanied by the respective algorithms and R code, available on GitHub and therefore ready to use in applications.
M-statistics is suitable for professionals and students alike. It is highly useful for theoretical statisticians and teachers, researchers, and data science analysts as an alternative to classical and approximate statistical inference.
Table of Contents
Preface
1 Limitations of classic statistics and motivation
1.1 Limitations of classic statistics
1.1.1 Mean
1.1.2 Unbiasedness
1.1.3 Limitations of equal-tail statistical inference
1.2 The rationale for a new statistical theory
1.3 Motivating example: normal variance
1.3.1 Confidence interval for the normal variance
1.3.2 Hypothesis testing for the variance
1.3.3 MC and MO estimators of the variance
1.3.4 Sample size determination for variance
1.4 Neyman-Pearson lemma and its extensions
1.4.1 Introduction
1.4.2 Two lemmas
References
2 Maximum concentration statistics
2.1 Assumptions
2.2 Short confidence interval and MC estimator
2.3 Density level test
2.4 Efficiency and the sufficient statistic
2.5 Parameter is positive or belongs to a finite interval
2.5.1 Parameter is positive
2.5.2 Parameter belongs to a finite interval
References
3 Mode statistics
3.1 Unbiased test
3.2 Unbiased CI and MO estimator
3.3 Cumulative information and the sufficient statistic
References
4 P-value and duality
4.1 P-value for the double-sided hypothesis
4.1.1 General definition
4.1.2 P-value for normal variance
4.2 The overall powerful test
4.3 Duality: converting the CI into a hypothesis test
4.4 Bypassing assumptions
4.5 Overview
References
5 M-statistics for major statistical parameters
5.1 Exact statistical inference for standard deviation
5.1.1 MC-statistics
5.1.2 MC-statistics on the log scale
5.1.3 MO-statistics
5.1.4 Computation of the p-value
5.2 Pareto distribution
5.2.1 Confidence intervals
5.2.2 Hypothesis testing
5.3 Coefficient of variation for lognormal distribution
5.4 Statistical testing for two variances
5.4.1 Computation of the p-value
5.4.2 Optimal sample size
5.5 Inference for two-sample exponential distribution
5.5.1 Unbiased statistical test
5.5.2 Confidence intervals
5.5.3 The MC estimator of ν
5.6 Effect size and coefficient of variation
5.6.1 Effect size
5.6.2 Coefficient of variation
5.6.3 Double-sided hypothesis tests
5.6.4 Multivariate ES
5.7 Binomial probability
5.7.1 The MCL estimator
5.7.2 The MCL2 estimator
5.7.3 The MCL2 estimator of pn
5.7.4 Confidence interval on the double-log scale
5.7.5 Equal-tail and unbiased tests
5.8 Poisson rate
5.8.1 Two-sided short CI on the log scale
5.8.2 Two-sided tests and p-value
5.8.3 The MCL estimator of the rate parameter
5.9 Meta-analysis model
5.9.1 CI and MCL estimator
5.10 M-statistics for the correlation coefficient
5.10.1 MC and MO estimators
5.10.2 Equal-tail and unbiased tests
5.10.3 Power function and p-value
5.10.4 Confidence intervals
5.11 The square multiple correlation coefficient
5.11.1 Unbiased statistical test
5.11.2 Computation of p-value
5.11.3 Confidence intervals
5.11.4 The two-sided CI on the log scale
5.11.5 The MCL estimator
5.12 Coefficient of determination for linear model
5.12.1 CoD and multiple correlation coefficient
5.12.2 Unbiased test
5.12.3 The MCL estimator for CoD
References
6 Multidimensional parameter
6.1 Density level test
6.2 Unbiased test
6.3 Confidence region dual to the DL test
6.4 Unbiased confidence region
6.5 Simultaneous inference for normal mean and standard deviation
6.5.1 Statistical test
6.5.2 Confidence region
6.6 Exact confidence inference for parameters of the beta distribution
6.6.1 Statistical tests
6.6.2 Confidence regions
6.7 Two-sample binomial probability
6.7.1 Hypothesis testing
6.7.2 Confidence region
6.8 Exact and profile statistical inference for nonlinear regression
6.8.1 Statistical inference for the whole parameter
6.8.2 Statistical inference for an individual parameter of interest via profiling
References
Index