import stata_setup
stata_setup.config('C:/Program Files/Stata18', 'mp', splash=False)
19 - Unordered Multi-Category Logistic Regression
Multinomial Logistic Regression
1 Ordered Multi-Category (Ordinal) Logistic Regression
When the proportional odds assumption holds
%%stata
webuse fullauto.dta,clear
(Automobile models)
%%stata
ologit rep77 foreign, or
Iteration 0: Log likelihood = -89.895098
Iteration 1: Log likelihood = -85.951765
Iteration 2: Log likelihood = -85.908227
Iteration 3: Log likelihood = -85.908161
Iteration 4: Log likelihood = -85.908161
Ordered logistic regression Number of obs = 66
LR chi2(1) = 7.97
Prob > chi2 = 0.0047
Log likelihood = -85.908161 Pseudo R2 = 0.0444
------------------------------------------------------------------------------
rep77 | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
foreign | 4.288246 2.276609 2.74 0.006 1.51489 12.13888
-------------+----------------------------------------------------------------
/cut1 | -2.765562 .5988208 -3.939229 -1.591895
/cut2 | -.9963603 .3217706 -1.627019 -.3657016
/cut3 | .9426153 .3136398 .3278925 1.557338
/cut4 | 3.123351 .5423257 2.060412 4.18629
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation to odds ratios.
For imported cars (foreign=1), the odds of a higher repair-record rating are 4.29 times those for domestic cars (foreign=0) (95% CI: 1.51, 12.14).
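As a check on how the reported interval arises, the sketch below rebuilds the z statistic and the 95% confidence interval from the odds ratio and its standard error in the table above. It assumes the usual Wald construction on the log-odds scale, with the odds-ratio standard error obtained by the delta method; the variable names are illustrative only.

```python
import math

# Values copied from the ologit output above (odds-ratio scale).
odds_ratio = 4.288246
se_odds_ratio = 2.276609  # assumed to be the delta-method standard error of the odds ratio

# Move to the log-odds (coefficient) scale, where the Wald interval is symmetric.
coef = math.log(odds_ratio)
se_coef = se_odds_ratio / odds_ratio

z_stat = coef / se_coef
z_crit = 1.959964  # 97.5th percentile of the standard normal distribution
ci_lower = math.exp(coef - z_crit * se_coef)
ci_upper = math.exp(coef + z_crit * se_coef)

print(f"z = {z_stat:.2f}, 95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric around the odds ratio itself.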
When the proportional odds assumption does not hold
Use generalized ordinal logistic regression.
This requires installing the gologit2 command.
%%stata
ssc install gologit2
checking gologit2 consistency and verifying not already installed...
installing into C:\Users\asus\ado\plus\...
installation complete.
2 The gologit2 command
2.1 Proportional odds assumption holds
gologit2 y x1 x2 x3 ..., pl or
This command gives the same results as the ologit command.
2.2 Proportional odds assumption does not hold
gologit2 y x1 x2 x3 ..., npl or
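The pl and npl options specify that the parallel-lines (proportional odds) assumption is imposed or not imposed, respectively.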
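For reference, the generalized ordered logit model is usually written as below. The notation is an assumption introduced here, not taken from the notes: \(M\) outcome levels, intercepts \(\alpha_j\), and slope vectors \(\beta_j\).

\[\Pr(Y>j)=\frac{\exp(\alpha_j+X\beta_j)}{1+\exp(\alpha_j+X\beta_j)},\qquad j=1,\dots,M-1\]

With pl, the slopes are constrained to be equal, \(\beta_1=\beta_2=\cdots=\beta_{M-1}\), which gives the proportional odds model; with npl, each \(\beta_j\) is estimated freely.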
2.3 Testing whether the proportional odds assumption holds
Likelihood-ratio test: lrtest
%%stata
gologit2 rep77 foreign, pl or
Generalized Ordered Logit Estimates Number of obs = 66
LR chi2(1) = 7.97
Prob > chi2 = 0.0047
Log likelihood = -85.908161 Pseudo R2 = 0.0444
( 1) [Poor]foreign - [Fair]foreign = 0
( 2) [Fair]foreign - [Average]foreign = 0
( 3) [Average]foreign - [Good]foreign = 0
------------------------------------------------------------------------------
rep77 | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Poor |
foreign | 4.288247 2.276609 2.74 0.006 1.51489 12.13888
_cons | 15.88797 9.514049 4.62 0.000 4.913051 51.37901
-------------+----------------------------------------------------------------
Fair |
foreign | 4.288247 2.276609 2.74 0.006 1.51489 12.13888
_cons | 2.708406 .8714855 3.10 0.002 1.441525 5.088683
-------------+----------------------------------------------------------------
Average |
foreign | 4.288247 2.276609 2.74 0.006 1.51489 12.13888
_cons | .3896075 .1221964 -3.01 0.003 .2106962 .7204404
-------------+----------------------------------------------------------------
Good |
foreign | 4.288247 2.276609 2.74 0.006 1.51489 12.13888
_cons | .0440095 .0238675 -5.76 0.000 .0152026 .1274015
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
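The claim in 2.1 that gologit2 with pl reproduces ologit can be checked numerically. Under the usual parameterization (assumed here), ologit models Pr(rep77 <= j) with cutpoints, while gologit2 models Pr(rep77 > j) with intercepts, so each baseline odds reported by gologit2 should equal exp(-cut_j). The sketch copies both sets of estimates from the outputs above.

```python
import math

# Cutpoints from the ologit output and baseline odds (_cons) from the
# gologit2 ..., pl output, both copied from the tables above.
ologit_cuts = {"Poor": -2.765562, "Fair": -0.9963603, "Average": 0.9426153, "Good": 3.123351}
gologit2_cons = {"Poor": 15.88797, "Fair": 2.708406, "Average": 0.3896075, "Good": 0.0440095}

implied_odds = {}
for level, cut in ologit_cuts.items():
    # Odds of a rating above this level when foreign = 0, implied by ologit.
    implied_odds[level] = math.exp(-cut)
    print(f"{level:8s} exp(-cut) = {implied_odds[level]:.5f}   gologit2 _cons = {gologit2_cons[level]:.5f}")
```

The two columns agree up to rounding, and the odds ratio for foreign is the same in both fits.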
%%stata
gologit2 rep77 foreign, npl or
Generalized Ordered Logit Estimates Number of obs = 66
LR chi2(4) = 15.24
Prob > chi2 = 0.0042
Log likelihood = -82.27372 Pseudo R2 = 0.0848
------------------------------------------------------------------------------
rep77 | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Poor |
foreign | .9300305 1.166495 -0.06 0.954 .0795928 10.86727
_cons | 21.50014 15.55202 4.24 0.000 5.208693 88.74704
-------------+----------------------------------------------------------------
Fair |
foreign | 3.453614 2.818944 1.52 0.129 .6974251 17.10213
_cons | 2.750213 .9271033 3.00 0.003 1.420445 5.324862
-------------+----------------------------------------------------------------
Average |
foreign | 3.281111 1.804947 2.16 0.031 1.116279 9.644262
_cons | .4062893 .1336252 -2.74 0.006 .2132467 .7740847
-------------+----------------------------------------------------------------
Good |
foreign | 3.94e+07 6.55e+10 0.01 0.992 0 .
_cons | 7.93e-09 .0000132 -0.01 0.991 0 .
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
When the proportional odds assumption does not hold,
comparing imported cars (foreign=1) with domestic cars (foreign=0):
- Odds ratio for Excellent+Good+Average+Fair vs. Poor = 0.93
- Odds ratio for Excellent+Good+Average vs. Fair+Poor = 3.45
- Odds ratio for Excellent+Good vs. Average+Fair+Poor = 3.28
- Odds ratio for Excellent vs. Good+Average+Fair+Poor = 3.94*10^7
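To make these odds ratios more concrete, the sketch below converts the npl estimates into cumulative probabilities for domestic and imported cars. It assumes that _cons is the baseline odds for foreign=0 (as the output note states) and that the odds for foreign=1 are the baseline odds times the odds ratio. The Good equation is left out because its estimates are numerically unstable.

```python
# (odds ratio for foreign, baseline odds _cons) copied from the gologit2 ..., npl output above.
npl_estimates = {
    "Poor": (0.9300305, 21.50014),
    "Fair": (3.453614, 2.750213),
    "Average": (3.281111, 0.4062893),
}

def odds_to_prob(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)

cumulative_probs = {}
for level, (odds_ratio, baseline_odds) in npl_estimates.items():
    p_domestic = odds_to_prob(baseline_odds)
    p_foreign = odds_to_prob(baseline_odds * odds_ratio)
    cumulative_probs[level] = (p_domestic, p_foreign)
    print(f"Pr(rep77 > {level}): domestic = {p_domestic:.3f}, foreign = {p_foreign:.3f}")
```

An odds ratio below 1 (the Poor equation) corresponds to a slightly lower probability for imported cars of being rated above Poor, while the Fair and Average equations show higher probabilities for imported cars.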
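2.4 Checking whether the proportional odds assumption holds
\(H_0\): the proportional odds assumption holds, i.e. the non-proportional odds model does not explain the relationship among the outcome levels any better. \(H_1\): the non-proportional odds model explains it better.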
%%stata
gologit2 rep77 foreign,pl or
est store A
gologit2 rep77 foreign,npl or
est store B
lrtest A B //Likelihood-ratio test
. gologit2 rep77 foreign,pl or
Generalized Ordered Logit Estimates Number of obs = 66
LR chi2(1) = 7.97
Prob > chi2 = 0.0047
Log likelihood = -85.908161 Pseudo R2 = 0.0444
( 1) [Poor]foreign - [Fair]foreign = 0
( 2) [Fair]foreign - [Average]foreign = 0
( 3) [Average]foreign - [Good]foreign = 0
------------------------------------------------------------------------------
rep77 | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Poor |
foreign | 4.288247 2.276609 2.74 0.006 1.51489 12.13888
_cons | 15.88797 9.514049 4.62 0.000 4.913051 51.37901
-------------+----------------------------------------------------------------
Fair |
foreign | 4.288247 2.276609 2.74 0.006 1.51489 12.13888
_cons | 2.708406 .8714855 3.10 0.002 1.441525 5.088683
-------------+----------------------------------------------------------------
Average |
foreign | 4.288247 2.276609 2.74 0.006 1.51489 12.13888
_cons | .3896075 .1221964 -3.01 0.003 .2106962 .7204404
-------------+----------------------------------------------------------------
Good |
foreign | 4.288247 2.276609 2.74 0.006 1.51489 12.13888
_cons | .0440095 .0238675 -5.76 0.000 .0152026 .1274015
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
. est store A
. gologit2 rep77 foreign,npl or
Generalized Ordered Logit Estimates Number of obs = 66
LR chi2(4) = 15.24
Prob > chi2 = 0.0042
Log likelihood = -82.27372 Pseudo R2 = 0.0848
------------------------------------------------------------------------------
rep77 | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Poor |
foreign | .9300305 1.166495 -0.06 0.954 .0795928 10.86727
_cons | 21.50014 15.55202 4.24 0.000 5.208693 88.74704
-------------+----------------------------------------------------------------
Fair |
foreign | 3.453614 2.818944 1.52 0.129 .6974251 17.10213
_cons | 2.750213 .9271033 3.00 0.003 1.420445 5.324862
-------------+----------------------------------------------------------------
Average |
foreign | 3.281111 1.804947 2.16 0.031 1.116279 9.644262
_cons | .4062893 .1336252 -2.74 0.006 .2132467 .7740847
-------------+----------------------------------------------------------------
Good |
foreign | 3.94e+07 6.55e+10 0.01 0.992 0 .
_cons | 7.93e-09 .0000132 -0.01 0.991 0 .
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
. est store B
. lrtest A B //Likelihood-ratio test
Likelihood-ratio test
Assumption: A nested within B
LR chi2(3) = 7.27
Prob > chi2 = 0.0638
.
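The likelihood-ratio test gives \(P=0.0638>0.05\), so the null hypothesis that the proportional odds constraints hold is not rejected: the non-proportional odds model does not explain the relationship among the outcome levels significantly better.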
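The lrtest result can be reproduced by hand from the two log likelihoods reported above. The sketch below is a minimal check using only the Python standard library; the helper chi2_sf_3df is a name introduced here, and it uses the closed form of the chi-squared upper tail that is valid for 3 degrees of freedom only.

```python
import math

loglik_pl = -85.908161   # restricted model A (pl), from the output above
loglik_npl = -82.27372   # unrestricted model B (npl), from the output above
df = 3                   # number of equality constraints listed in the pl output

# Likelihood-ratio statistic: twice the gain in log likelihood from relaxing the constraints.
lr_stat = 2.0 * (loglik_npl - loglik_pl)

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

p_value = chi2_sf_3df(lr_stat)
print(f"LR chi2({df}) = {lr_stat:.2f}, P = {p_value:.4f}")
```

The degrees of freedom equal the difference in the number of estimated parameters between the two models, here the three slope constraints imposed by pl.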
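3 Unordered Multi-Category (Multinomial) Logistic Regression
- Take one category of the outcome variable as the reference, then compare each of the other categories with the reference in terms of relative risk.
\[RR_j=Pr(cat=j)/Pr(reference\ cat)\]
\[log(RR_j)=\beta_{0j}+\beta_{1j}X_1+\cdots +\beta_{pj}X_p\]
Note: cat is short for category.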
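Given this definition, the category probabilities can be recovered from the relative risks. A short derivation, assuming the sums run over all non-reference categories \(k\) and using the fact that the probabilities of all categories sum to 1:

\[Pr(reference\ cat)\Big(1+\sum_{k}RR_k\Big)=1\]
\[Pr(reference\ cat)=\frac{1}{1+\sum_{k}RR_k},\qquad Pr(cat=j)=\frac{RR_j}{1+\sum_{k}RR_k}\]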
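4 Comparing ordered and unordered multi-category models
- Ordered multi-category (ordinal) logistic regression:
- \(RR_j=Pr(cat>j)/Pr(cat\leq j)\)
ologit y x_1 x_2 x_3 ..., or
- Unordered multi-category (multinomial) logistic regression:
- \(RR_j=Pr(cat=j)/Pr(reference\ cat)\)
mlogit y x_1 x_2 x_3 ..., rrr baseoutcome(j)
mlogit is short for multinomial logit.
If baseoutcome(j) is not specified, Stata chooses the base outcome automatically.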
%%stata
mlogit rep77 foreign, rrr baseoutcome(1)
Iteration 0: Log likelihood = -89.895098
Iteration 1: Log likelihood = -85.605381
Iteration 2: Log likelihood = -82.670821
Iteration 3: Log likelihood = -82.335383
Iteration 4: Log likelihood = -82.28077
Iteration 5: Log likelihood = -82.274431
Iteration 6: Log likelihood = -82.273851
Iteration 7: Log likelihood = -82.273742
Iteration 8: Log likelihood = -82.273725
Iteration 9: Log likelihood = -82.27372
Multinomial logistic regression Number of obs = 66
LR chi2(4) = 15.24
Prob > chi2 = 0.0042
Log likelihood = -82.27372 Pseudo R2 = 0.0848
------------------------------------------------------------------------------
rep77 | RRR Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Poor | (base outcome)
-------------+----------------------------------------------------------------
Fair |
foreign | .2000452 .3225721 -1.00 0.318 .0084834 4.717229
_cons | 5.000509 3.873398 2.08 0.038 1.095653 22.82209
-------------+----------------------------------------------------------------
Average |
foreign | .7001516 .9110327 -0.27 0.784 .054653 8.969536
_cons | 10.00009 7.416364 3.10 0.002 2.337371 42.78389
-------------+----------------------------------------------------------------
Good |
foreign | 1.076972 1.412458 0.06 0.955 .0823847 14.07869
_cons | 6.500016 4.937183 2.46 0.014 1.466803 28.80429
-------------+----------------------------------------------------------------
Excellent |
foreign | 1.32e+07 1.52e+10 0.01 0.989 0 .
_cons | 3.79e-07 .0004353 -0.01 0.990 0 .
------------------------------------------------------------------------------
Note: _cons estimates baseline relative risk for each outcome.
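Comparing imported cars (foreign=1) with domestic cars (foreign=0), the relative risk ratios are:
- Risk(Fair)/Risk(Poor) = 0.20
- Risk(Average)/Risk(Poor) = 0.70
- Risk(Good)/Risk(Poor) = 1.08
- Risk(Excellent)/Risk(Poor) = 1.32*10^7
Risk(Excellent)/Risk(Poor) = 1.32*10^7 is this large because one of the Excellent cells contains zero observations, so the estimate is biased.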
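The empty-cell explanation can be illustrated from the fitted model itself. The sketch below turns the baseline relative risks (_cons on the rrr scale, i.e. for foreign=0) into predicted probabilities by dividing each by their sum, with the base outcome Poor given a relative risk of 1. The dictionary names are illustrative only.

```python
# Baseline relative risks for domestic cars (foreign = 0), copied from the
# mlogit output above. Poor is the base outcome, so its relative risk is 1.
baseline_rr = {
    "Poor": 1.0,
    "Fair": 5.000509,
    "Average": 10.00009,
    "Good": 6.500016,
    "Excellent": 3.79e-07,
}

# Each probability is the category's relative risk divided by the sum over all categories.
total = sum(baseline_rr.values())
domestic_probs = {level: rr / total for level, rr in baseline_rr.items()}

for level, prob in domestic_probs.items():
    print(f"Pr(rep77 = {level} | foreign = 0) = {prob:.4f}")
```

The predicted probability of Excellent for domestic cars is essentially zero, which would be consistent with no domestic car being rated Excellent in this sample; the relative risk ratio for that category then has no finite maximum-likelihood estimate.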