import stata_setup
stata_setup.config('C:/Program Files/Stata18', 'mp', splash=False)
18 - Ordinal (Ordered Multi-Category) Logistic Regression
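1 Multi-category variables
Multi-category variables fall into two main types: ordered (ordinal) and unordered (nominal).
1.1 Ordered multi-category variables
Disease stage, severity, developmental stage, etc.
1.2 Unordered multi-category variables
Direction (east, south, west, north), brand, etc.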
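2 Ordinal multi-category outcomes
2.1 How ordinal logistic regression works
- The n categories of the outcome y are split into n-1 binary logistic regressions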
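- In the example, Excellent; Good; Average; Fair; Poor are split into four cumulative comparisons: Poor vs. Excellent+Good+Average+Fair; Fair+Poor vs. Excellent+Good+Average; Average+Fair+Poor vs. Excellent+Good; Good+Average+Fair+Poor vs. Excellent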
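The splitting rule can be sketched in a few lines of Python. This is only an illustration of how n ordered levels yield n-1 cumulative binary comparisons; the function name `cumulative_splits` is hypothetical and not part of Stata or pystata.

```python
# Ordered categories from worst to best, matching the 1..5 coding of rep77.
levels = ["Poor", "Fair", "Average", "Good", "Excellent"]


def cumulative_splits(ordered_levels):
    """Return the n-1 (lower group, upper group) pairs for n ordered levels."""
    return [
        (ordered_levels[:k], ordered_levels[k:])
        for k in range(1, len(ordered_levels))
    ]


splits = cumulative_splits(levels)
for lower, upper in splits:
    # Each pair defines one binary outcome: "at or below the cut" vs. "above it".
    print("+".join(lower), "vs.", "+".join(upper))
```

Each printed line corresponds to one of the binary logistic regressions that the ordinal model combines.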
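2.2 Proportional odds assumption
- Across the binary logistic regressions, all coefficients except the intercept \(\beta_0\) are equal, so the odds ratio for a given exposure is the same at every split: \[OR(\text{Poor vs. Excellent+Good+Average+Fair})\\ = OR(\text{Fair+Poor vs. Excellent+Good+Average})\\ = OR(\text{Average+Fair+Poor vs. Excellent+Good})\\ = OR(\text{Good+Average+Fair+Poor vs. Excellent})\]
- Whether the proportional odds assumption holds depends mainly on the nature of the research question itself. It can be checked against the data, but the data themselves may be biased
- If the assumption does not hold: fit an unordered (multinomial) logistic regression instead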
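Written as a model, the assumption can be sketched as follows. The notation is an assumption introduced here for illustration: \(\kappa_j\) denotes the split-specific intercept (the role played by \(\beta_0\) above, reported by Stata's ologit as /cut1, /cut2, ...), \(x\) is a single binary exposure, and the sign convention follows Stata's ologit parameterization.

```latex
\[
\ln\frac{P(Y \le j \mid x)}{P(Y > j \mid x)} = \kappa_j - \beta x,
\qquad j = 1, \dots, n-1
\]
\[
\frac{P(Y \le j \mid x=1)\,/\,P(Y > j \mid x=1)}
     {P(Y \le j \mid x=0)\,/\,P(Y > j \mid x=0)}
= e^{(\kappa_j - \beta) - \kappa_j} = e^{-\beta}
\quad \text{for every } j
\]
```

Because \(\kappa_j\) cancels, the odds ratio does not depend on where the scale is split. That is exactly what "proportional odds" means.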
3 Importing the data
Automobile repair record data from 1977
%%stata
webuse fullauto.dta,clear
(Automobile models)
3.1 Outcome variable
Outcome: vehicle repair record (rep77)
%%stata
codebook rep77
-------------------------------------------------------------------------------
rep77 Repair record 1977
-------------------------------------------------------------------------------
Type: Numeric (byte)
Label: repair
Range: [1,5] Units: 1
Unique values: 5 Missing .: 8/74
Tabulation: Freq. Numeric Label
3 1 Poor
11 2 Fair
27 3 Average
20 4 Good
5 5 Excellent
8 .
3.2 Exposure variable
Exposure: whether the car is imported (foreign)
%%stata
codebook foreign
-------------------------------------------------------------------------------
foreign Foreign
-------------------------------------------------------------------------------
Type: Numeric (byte)
Label: foreign
Range: [0,1] Units: 1
Unique values: 2 Missing .: 0/74
Tabulation: Freq. Numeric Label
52 0 Domestic
22 1 Foreign
4 Chi-squared test
\(H_0\): whether a car is imported is unrelated to its repair record
%%stata
tab foreign rep77,chi2
| Repair record 1977
Foreign | Poor Fair Average Good Excellent | Total
-----------+-------------------------------------------------------+----------
Domestic | 2 10 20 13 0 | 45
Foreign | 1 1 7 7 5 | 21
-----------+-------------------------------------------------------+----------
Total | 3 11 27 20 5 | 66
Pearson chi2(4) = 13.8619 Pr = 0.008
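\(P=0.008<0.05\): at the \(\alpha=0.05\) significance level we reject the null hypothesis and conclude that whether a car is imported is associated with its repair record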
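As a cross-check, the Pearson statistic can be recomputed by hand from the counts in the table above. The sketch below uses only the Python standard library. The p-value line relies on the closed-form upper tail of the chi-squared distribution, which in this simple form holds only for 4 degrees of freedom, so it is not a general-purpose routine.

```python
import math

# Observed counts copied from the tab output (columns: Poor ... Excellent).
observed = [
    [2, 10, 20, 13, 0],  # Domestic
    [1, 1, 7, 7, 5],     # Foreign
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-squared: sum over cells of (observed - expected)^2 / expected.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)

# Upper-tail probability; this closed form is valid for df = 4 only.
half = chi2 / 2
p_value = math.exp(-half) * (1 + half)

print(f"chi2({df}) = {chi2:.4f}, p = {p_value:.3f}")
```

The result should agree with the `Pearson chi2(4)` line reported by Stata. Several expected counts are below 5 here, which is worth keeping in mind when reading the p-value.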
5 Ordinal logistic regression
5.1 Syntax
ologit y x1 x2 x3 ...xn [if] [in] [weight] [,options]
- The most commonly used option in [,options] is or, which reports odds ratios (OR) directly
- Examples:
- ologit rep77 foreign
- ologit rep77 foreign, or
- ologit rep77 foreign length mpg, or
%%stata
ologit rep77 foreign
Iteration 0: Log likelihood = -89.895098
Iteration 1: Log likelihood = -85.951765
Iteration 2: Log likelihood = -85.908227
Iteration 3: Log likelihood = -85.908161
Iteration 4: Log likelihood = -85.908161
Ordered logistic regression Number of obs = 66
LR chi2(1) = 7.97
Prob > chi2 = 0.0047
Log likelihood = -85.908161 Pseudo R2 = 0.0444
------------------------------------------------------------------------------
rep77 | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
foreign | 1.455878 .5308951 2.74 0.006 .4153425 2.496413
-------------+----------------------------------------------------------------
/cut1 | -2.765562 .5988208 -3.939229 -1.591895
/cut2 | -.9963603 .3217706 -1.627019 -.3657016
/cut3 | .9426153 .3136398 .3278925 1.557338
/cut4 | 3.123351 .5423257 2.060412 4.18629
------------------------------------------------------------------------------
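Comparing imported cars (foreign=1) with domestic cars (foreign=0):
\[OR=e^{-\beta}=e^{-1.456}=0.23\]
- With the higher repair-record categories as the reference, this is the odds ratio for being in a lower category
Alternatively:
\[OR=e^{\beta}=e^{1.456}=4.29\]
- With the lower repair-record categories as the reference, this is the odds ratio for being in a higher category
Compared with domestic cars (foreign=0), imported cars (foreign=1) have 0.23 times the odds of being in a lower (rather than higher) repair-record category.
The following interpretation is generally used: compared with domestic cars (foreign=0), imported cars (foreign=1) have 4.29 times the odds of being in a higher (rather than lower) repair-record category.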
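The two odds ratios, and the confidence interval on the odds-ratio scale, follow from exponentiating the coefficient and its interval bounds. A minimal sketch, with the numbers copied from the ologit output above:

```python
import math

# Coefficient and 95% CI for foreign, copied from the ologit output.
beta = 1.455878
ci_low, ci_high = 0.4153425, 2.496413

or_higher = math.exp(beta)    # odds ratio for a higher repair-record category
or_lower = math.exp(-beta)    # odds ratio for a lower category (the reciprocal)
or_ci = (math.exp(ci_low), math.exp(ci_high))

print(f"OR (higher category) = {or_higher:.2f}")
print(f"OR (lower category)  = {or_lower:.2f}")
print(f"95% CI for OR        = ({or_ci[0]:.2f}, {or_ci[1]:.2f})")
```

The two odds ratios are reciprocals of each other, so they describe the same association from opposite directions.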
%%stata
ologit rep77 foreign, or
Iteration 0: Log likelihood = -89.895098
Iteration 1: Log likelihood = -85.951765
Iteration 2: Log likelihood = -85.908227
Iteration 3: Log likelihood = -85.908161
Iteration 4: Log likelihood = -85.908161
Ordered logistic regression Number of obs = 66
LR chi2(1) = 7.97
Prob > chi2 = 0.0047
Log likelihood = -85.908161 Pseudo R2 = 0.0444
------------------------------------------------------------------------------
rep77 | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
foreign | 4.288246 2.276609 2.74 0.006 1.51489 12.13888
-------------+----------------------------------------------------------------
/cut1 | -2.765562 .5988208 -3.939229 -1.591895
/cut2 | -.9963603 .3217706 -1.627019 -.3657016
/cut3 | .9426153 .3136398 .3278925 1.557338
/cut4 | 3.123351 .5423257 2.060412 4.18629
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation to odds ratios.
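The cut points in the output can be turned into predicted category probabilities. The sketch below assumes Stata's ologit parameterization, in which the probability of being at or below category j is the logistic function of cut_j minus beta times x. The helper names `logistic` and `category_probs` are hypothetical.

```python
import math


def logistic(z):
    """Standard logistic (inverse logit) function."""
    return 1.0 / (1.0 + math.exp(-z))


# Estimates copied from the ologit output (coefficient scale, not OR scale).
beta = 1.455878
cuts = [-2.765562, -0.9963603, 0.9426153, 3.123351]
labels = ["Poor", "Fair", "Average", "Good", "Excellent"]


def category_probs(x):
    """Predicted probability of each repair category for foreign = x."""
    # Cumulative probabilities P(Y <= j); the last category closes at 1.
    cum = [logistic(c - beta * x) for c in cuts] + [1.0]
    # Differences of adjacent cumulative probabilities give P(Y = j).
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


for x, name in [(0, "Domestic"), (1, "Foreign")]:
    probs = category_probs(x)
    print(name, {lab: round(p, 3) for lab, p in zip(labels, probs)})
```

A positive coefficient shifts probability mass toward the higher categories, which is why the odds ratio above is read as the odds of a higher repair-record category.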
%%stata
ologit rep77 foreign length mpg, or
Iteration 0: Log likelihood = -89.895098
Iteration 1: Log likelihood = -78.775147
Iteration 2: Log likelihood = -78.254294
Iteration 3: Log likelihood = -78.250719
Iteration 4: Log likelihood = -78.250719
Ordered logistic regression Number of obs = 66
LR chi2(3) = 23.29
Prob > chi2 = 0.0000
Log likelihood = -78.250719 Pseudo R2 = 0.1295
------------------------------------------------------------------------------
rep77 | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
foreign | 18.1162 14.32342 3.66 0.000 3.846558 85.32223
length | 1.086354 .024682 3.65 0.000 1.03904 1.135823
mpg | 1.259567 .0887425 3.28 0.001 1.097109 1.44608
-------------+----------------------------------------------------------------
/cut1 | 17.92748 5.551191 7.047344 28.80761
/cut2 | 19.86506 5.59648 8.896161 30.83396
/cut3 | 22.10331 5.708936 10.914 33.29262
/cut4 | 24.69213 5.890754 13.14647 36.2378
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation to odds ratios.
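- After adjusting for car length and mileage (mpg), imported cars have 18.12 times the odds of a higher repair-record category compared with domestic cars (95% CI: 3.85, 85.32)
- After adjusting for origin and mileage (mpg), each additional 1 inch of car length increases the odds of a higher repair-record category by 8.64% (95% CI: 3.90%, 13.58%)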
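- After adjusting for origin and car length, each additional 1 mpg increases the odds of a higher repair-record category by 25.96% (95% CI: 9.71%, 44.61%)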
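For continuous covariates, the percentage change in odds per one-unit increase is (OR - 1) x 100. The sketch below applies this to the length and mpg rows of the adjusted model, with the odds ratios and interval bounds copied from the output above. The helper name `pct_change` is hypothetical.

```python
def pct_change(odds_ratio):
    """Percentage change in odds per one-unit increase, rounded to 2 decimals."""
    return round((odds_ratio - 1) * 100, 2)


# (odds ratio, CI lower, CI upper) copied from the adjusted ologit output.
estimates = {
    "length": (1.086354, 1.03904, 1.135823),
    "mpg": (1.259567, 1.097109, 1.44608),
}

for name, (odds_ratio, lower, upper) in estimates.items():
    print(
        f"{name}: {pct_change(odds_ratio)}% "
        f"(95% CI: {pct_change(lower)}%, {pct_change(upper)}%)"
    )
```

The same transformation applied to the confidence bounds gives the interval for the percentage change.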