19 - Multinomial (Unordered) Logistic Regression

Multinomial Logistic Regression

Author

Simon Zhou

Published

May 9, 2025

import stata_setup
stata_setup.config('C:/Program Files/Stata18', 'mp', splash=False)

1 Ordered logistic regression

When the proportional odds assumption holds

%%stata
webuse fullauto.dta,clear
(Automobile models)
%%stata
ologit rep77 foreign,or

Iteration 0:  Log likelihood = -89.895098  
Iteration 1:  Log likelihood = -85.951765  
Iteration 2:  Log likelihood = -85.908227  
Iteration 3:  Log likelihood = -85.908161  
Iteration 4:  Log likelihood = -85.908161  

Ordered logistic regression                             Number of obs =     66
                                                        LR chi2(1)    =   7.97
                                                        Prob > chi2   = 0.0047
Log likelihood = -85.908161                             Pseudo R2     = 0.0444

------------------------------------------------------------------------------
       rep77 | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     foreign |   4.288246   2.276609     2.74   0.006      1.51489    12.13888
-------------+----------------------------------------------------------------
       /cut1 |  -2.765562   .5988208                     -3.939229   -1.591895
       /cut2 |  -.9963603   .3217706                     -1.627019   -.3657016
       /cut3 |   .9426153   .3136398                      .3278925    1.557338
       /cut4 |   3.123351   .5423257                      2.060412     4.18629
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation to odds ratios.

The odds of being in a higher repair-record category for foreign cars (foreign = 1) are 4.29 times those for domestic cars (foreign = 0) (95% CI: 1.51, 12.14).
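As a quick check on how the table is built, the z statistic and confidence interval can be recovered from the reported odds ratio and its standard error. The sketch below assumes the usual delta-method relationship (standard error of the log odds ratio = standard error of the odds ratio divided by the odds ratio) and a normal critical value of 1.959964. The variable names are illustrative and not part of Stata.

```python
import math

# Values copied from the ologit output above
odds_ratio = 4.288246
se_odds_ratio = 2.276609

# Assumption: the reported SE is a delta-method SE, so SE(log OR) = SE(OR) / OR
log_or = math.log(odds_ratio)
se_log_or = se_odds_ratio / odds_ratio

z = log_or / se_log_or
z_crit = 1.959964  # two-sided 95% normal critical value
ci_lower = math.exp(log_or - z_crit * se_log_or)
ci_upper = math.exp(log_or + z_crit * se_log_or)

print(f"z = {z:.2f}, 95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```

Up to rounding, this should reproduce the z value and interval in the foreign row of the table.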

When the proportional odds assumption does not hold

Use generalized ordered logistic regression.

This requires installing the gologit2 command.

%%stata
ssc install gologit2
checking gologit2 consistency and verifying not already installed...
installing into C:\Users\asus\ado\plus\...
installation complete.

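2 The gologit2 command

2.1 When the proportional odds assumption holds

gologit2 y x1 x2 x3 ..., pl or

This command gives the same odds ratios and log likelihood as the ologit command.

2.2 When the proportional odds assumption does not hold

gologit2 y x1 x2 x3 ..., npl or

The pl and npl options specify that the parallel-lines (proportional odds) assumption is imposed and relaxed, respectively.

2.3 Testing the proportional odds assumption

Likelihood-ratio test: lrtest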

%%stata
gologit2 rep77 foreign,pl or

Generalized Ordered Logit Estimates                     Number of obs =     66
                                                        LR chi2(1)    =   7.97
                                                        Prob > chi2   = 0.0047
Log likelihood = -85.908161                             Pseudo R2     = 0.0444

 ( 1)  [Poor]foreign - [Fair]foreign = 0
 ( 2)  [Fair]foreign - [Average]foreign = 0
 ( 3)  [Average]foreign - [Good]foreign = 0
------------------------------------------------------------------------------
       rep77 | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Poor         |
     foreign |   4.288247   2.276609     2.74   0.006      1.51489    12.13888
       _cons |   15.88797   9.514049     4.62   0.000     4.913051    51.37901
-------------+----------------------------------------------------------------
Fair         |
     foreign |   4.288247   2.276609     2.74   0.006      1.51489    12.13888
       _cons |   2.708406   .8714855     3.10   0.002     1.441525    5.088683
-------------+----------------------------------------------------------------
Average      |
     foreign |   4.288247   2.276609     2.74   0.006      1.51489    12.13888
       _cons |   .3896075   .1221964    -3.01   0.003     .2106962    .7204404
-------------+----------------------------------------------------------------
Good         |
     foreign |   4.288247   2.276609     2.74   0.006      1.51489    12.13888
       _cons |   .0440095   .0238675    -5.76   0.000     .0152026    .1274015
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
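The constrained gologit2 model and ologit appear to differ only in how the intercepts are parameterized. Judging from the two outputs, each gologit2 _cons (a baseline odds) matches exp(-cut) for the corresponding ologit cutpoint. The check below uses values copied from those outputs; the variable names are illustrative.

```python
import math

# Cutpoints from the ologit output (/cut1 ... /cut4)
cuts = [-2.765562, -0.9963603, 0.9426153, 3.123351]
# _cons (baseline odds) from the "gologit2 rep77 foreign, pl or" output
gologit2_cons = [15.88797, 2.708406, 0.3896075, 0.0440095]
labels = ["Poor", "Fair", "Average", "Good"]

# For domestic cars (foreign = 0), the odds of being above cutpoint j are exp(-cut_j)
baseline_odds = [math.exp(-c) for c in cuts]

for label, odds, cons in zip(labels, baseline_odds, gologit2_cons):
    print(f"{label:8s} exp(-cut) = {odds:.4f}   gologit2 _cons = {cons:.4f}")
```

The two columns should agree to the precision shown in the Stata output.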
%%stata
gologit2 rep77 foreign,npl or

Generalized Ordered Logit Estimates                     Number of obs =     66
                                                        LR chi2(4)    =  15.24
                                                        Prob > chi2   = 0.0042
Log likelihood = -82.27372                              Pseudo R2     = 0.0848

------------------------------------------------------------------------------
       rep77 | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Poor         |
     foreign |   .9300305   1.166495    -0.06   0.954     .0795928    10.86727
       _cons |   21.50014   15.55202     4.24   0.000     5.208693    88.74704
-------------+----------------------------------------------------------------
Fair         |
     foreign |   3.453614   2.818944     1.52   0.129     .6974251    17.10213
       _cons |   2.750213   .9271033     3.00   0.003     1.420445    5.324862
-------------+----------------------------------------------------------------
Average      |
     foreign |   3.281111   1.804947     2.16   0.031     1.116279    9.644262
       _cons |   .4062893   .1336252    -2.74   0.006     .2132467    .7740847
-------------+----------------------------------------------------------------
Good         |
     foreign |   3.94e+07   6.55e+10     0.01   0.992            0           .
       _cons |   7.93e-09   .0000132    -0.01   0.991            0           .
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

When the proportional odds assumption is relaxed

Comparing foreign cars (foreign = 1) with domestic cars (foreign = 0), the odds ratio at each cutpoint is:

  • Fair, Average, Good, or Excellent vs. Poor: 0.93
  • Average, Good, or Excellent vs. Fair or Poor: 3.45
  • Good or Excellent vs. Average, Fair, or Poor: 3.28
  • Excellent vs. Good, Average, Fair, or Poor: 3.94*10^7

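2.4 Checking whether the proportional odds assumption holds

\(H_0\): the proportional odds model fits as well as the non-proportional odds model, i.e. the coefficient of each predictor is the same at every cutpoint. A small p-value favors the non-proportional odds model.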

%%stata
gologit2 rep77 foreign,pl or 
est store A
gologit2 rep77 foreign,npl or 
est store B
lrtest A B //Likelihood-ratio test

. gologit2 rep77 foreign,pl or 

Generalized Ordered Logit Estimates                     Number of obs =     66
                                                        LR chi2(1)    =   7.97
                                                        Prob > chi2   = 0.0047
Log likelihood = -85.908161                             Pseudo R2     = 0.0444

 ( 1)  [Poor]foreign - [Fair]foreign = 0
 ( 2)  [Fair]foreign - [Average]foreign = 0
 ( 3)  [Average]foreign - [Good]foreign = 0
------------------------------------------------------------------------------
       rep77 | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Poor         |
     foreign |   4.288247   2.276609     2.74   0.006      1.51489    12.13888
       _cons |   15.88797   9.514049     4.62   0.000     4.913051    51.37901
-------------+----------------------------------------------------------------
Fair         |
     foreign |   4.288247   2.276609     2.74   0.006      1.51489    12.13888
       _cons |   2.708406   .8714855     3.10   0.002     1.441525    5.088683
-------------+----------------------------------------------------------------
Average      |
     foreign |   4.288247   2.276609     2.74   0.006      1.51489    12.13888
       _cons |   .3896075   .1221964    -3.01   0.003     .2106962    .7204404
-------------+----------------------------------------------------------------
Good         |
     foreign |   4.288247   2.276609     2.74   0.006      1.51489    12.13888
       _cons |   .0440095   .0238675    -5.76   0.000     .0152026    .1274015
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

. est store A

. gologit2 rep77 foreign,npl or 

Generalized Ordered Logit Estimates                     Number of obs =     66
                                                        LR chi2(4)    =  15.24
                                                        Prob > chi2   = 0.0042
Log likelihood = -82.27372                              Pseudo R2     = 0.0848

------------------------------------------------------------------------------
       rep77 | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Poor         |
     foreign |   .9300305   1.166495    -0.06   0.954     .0795928    10.86727
       _cons |   21.50014   15.55202     4.24   0.000     5.208693    88.74704
-------------+----------------------------------------------------------------
Fair         |
     foreign |   3.453614   2.818944     1.52   0.129     .6974251    17.10213
       _cons |   2.750213   .9271033     3.00   0.003     1.420445    5.324862
-------------+----------------------------------------------------------------
Average      |
     foreign |   3.281111   1.804947     2.16   0.031     1.116279    9.644262
       _cons |   .4062893   .1336252    -2.74   0.006     .2132467    .7740847
-------------+----------------------------------------------------------------
Good         |
     foreign |   3.94e+07   6.55e+10     0.01   0.992            0           .
       _cons |   7.93e-09   .0000132    -0.01   0.991            0           .
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

. est store B

. lrtest A B //Likelihood-ratio test

Likelihood-ratio test
Assumption: A nested within B

 LR chi2(3) =   7.27
Prob > chi2 = 0.0638

. 

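According to the likelihood-ratio test, \(P=0.0638>0.05\), so we fail to reject the null hypothesis of proportional odds: the non-proportional odds model does not explain the relationship among the outcome categories significantly better.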
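The test statistic can be reproduced by hand from the two log likelihoods. The sketch below assumes 3 degrees of freedom (the unconstrained model frees three extra foreign coefficients) and uses the closed-form upper tail of a chi-square distribution with 3 degrees of freedom, so no statistics library is needed. The function name chi2_sf_3df is a hypothetical helper, not a Stata or SciPy call.

```python
import math

ll_pl = -85.908161   # log likelihood of model A (gologit2 ..., pl)
ll_npl = -82.27372   # log likelihood of model B (gologit2 ..., npl)
df = 3               # assumption: 4 free foreign coefficients vs. 1 shared

# Likelihood-ratio statistic: twice the gain in log likelihood
lr_stat = 2 * (ll_npl - ll_pl)

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square distribution with exactly 3 df."""
    t = x / 2
    return math.erfc(math.sqrt(t)) + 2 * math.sqrt(t / math.pi) * math.exp(-t)

p_value = chi2_sf_3df(lr_stat)
print(f"LR chi2({df}) = {lr_stat:.2f}, p = {p_value:.4f}")
```

This should agree with the lrtest output up to rounding.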

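3 Multinomial (unordered) logistic regression

  • One category of the outcome variable is taken as the reference, and every other category is compared with it in terms of relative risk.

\[RR_j=\Pr(cat=j)/\Pr(reference\ cat)\] \[\log(RR_j)=\beta_{0j}+\beta_{1j}X_1+\cdots +\beta_{pj}X_p\]

Note: cat is short for category.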

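4 Comparing ordered and multinomial logistic regression

  1. Ordered logistic regression:
    • \(Odds_j=\Pr(cat>j)/\Pr(cat\leq j)\)
    • ologit y x_1 x_2 x_3 ..., or
  2. Multinomial logistic regression:
    • \(RR_j=\Pr(cat=j)/\Pr(reference\ cat)\)
    • mlogit y x_1 x_2 x_3 ..., rrr baseoutcome(j)

mlogit is short for multinomial logit.

If baseoutcome(j) is not specified, Stata chooses the base outcome automatically (by default, the most frequent category).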

%%stata
mlogit rep77 foreign,rrr baseoutcome(1)

Iteration 0:  Log likelihood = -89.895098  
Iteration 1:  Log likelihood = -85.605381  
Iteration 2:  Log likelihood = -82.670821  
Iteration 3:  Log likelihood = -82.335383  
Iteration 4:  Log likelihood =  -82.28077  
Iteration 5:  Log likelihood = -82.274431  
Iteration 6:  Log likelihood = -82.273851  
Iteration 7:  Log likelihood = -82.273742  
Iteration 8:  Log likelihood = -82.273725  
Iteration 9:  Log likelihood =  -82.27372  

Multinomial logistic regression                         Number of obs =     66
                                                        LR chi2(4)    =  15.24
                                                        Prob > chi2   = 0.0042
Log likelihood = -82.27372                              Pseudo R2     = 0.0848

------------------------------------------------------------------------------
       rep77 |        RRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Poor         |  (base outcome)
-------------+----------------------------------------------------------------
Fair         |
     foreign |   .2000452   .3225721    -1.00   0.318     .0084834    4.717229
       _cons |   5.000509   3.873398     2.08   0.038     1.095653    22.82209
-------------+----------------------------------------------------------------
Average      |
     foreign |   .7001516   .9110327    -0.27   0.784      .054653    8.969536
       _cons |   10.00009   7.416364     3.10   0.002     2.337371    42.78389
-------------+----------------------------------------------------------------
Good         |
     foreign |   1.076972   1.412458     0.06   0.955     .0823847    14.07869
       _cons |   6.500016   4.937183     2.46   0.014     1.466803    28.80429
-------------+----------------------------------------------------------------
Excellent    |
     foreign |   1.32e+07   1.52e+10     0.01   0.989            0           .
       _cons |   3.79e-07   .0004353    -0.01   0.990            0           .
------------------------------------------------------------------------------
Note: _cons estimates baseline relative risk for each outcome.

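Comparing foreign cars (foreign = 1) with domestic cars (foreign = 0), the relative risk ratios are:

  • Fair vs. Poor: 0.20
  • Average vs. Poor: 0.70
  • Good vs. Poor: 1.08
  • Excellent vs. Poor: 1.32*10^7

The Excellent vs. Poor ratio of 1.32*10^7 is this large because one group has no cars rated Excellent (the baseline relative risk for Excellent, 3.79e-07, is effectively zero), so the estimate is unreliable.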
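To see what the extreme Excellent estimate means on the probability scale, the relative risks can be normalized into predicted probabilities. The sketch below assumes the standard multinomial logit relationship: each category's probability is its relative risk (versus the base outcome) divided by the sum over all categories. The values are copied from the mlogit output, and to_probabilities is a hypothetical helper.

```python
# Baseline relative risks (_cons, foreign = 0) and RRRs for foreign, from the mlogit output
baseline_rr = {"Poor": 1.0, "Fair": 5.000509, "Average": 10.00009,
               "Good": 6.500016, "Excellent": 3.79e-07}
rrr_foreign = {"Poor": 1.0, "Fair": 0.2000452, "Average": 0.7001516,
               "Good": 1.076972, "Excellent": 1.32e+07}

def to_probabilities(relative_risks):
    """Normalize relative risks (vs. the base outcome) into probabilities."""
    total = sum(relative_risks.values())
    return {cat: rr / total for cat, rr in relative_risks.items()}

p_domestic = to_probabilities(baseline_rr)
# For foreign cars, each relative risk is the baseline multiplied by its RRR
p_foreign = to_probabilities({c: baseline_rr[c] * rrr_foreign[c] for c in baseline_rr})

for cat in baseline_rr:
    print(f"{cat:10s} domestic = {p_domestic[cat]:.3f}   foreign = {p_foreign[cat]:.3f}")
```

Under these estimates the predicted probability of Excellent is essentially zero for domestic cars, which is what pushes the ratio toward infinity.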