18-Ordered Multi-Category Logistic Regression

Ordinal Logistic Regression

Author

Simon Zhou

Published

May 8, 2025

# Point pystata at the local Stata installation (adjust the path and edition to your machine)
import stata_setup
stata_setup.config('C:/Program Files/Stata18', 'mp', splash=False)

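1 Multi-category variables

Multi-category variables fall into two types: ordinal (ordered) and nominal (unordered).

1.1 Ordinal variables

Disease stage, severity, developmental stage, etc.

1.2 Nominal variables

Direction (east, south, west, north), brand, etc.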

2 Ordinal outcomes

2.1 How ordinal logistic regression works

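  1. The n categories of the outcome y are split into n-1 binary logistic regressions.
  2. In the example, Excellent; Good; Average; Fair; Poor are split into four cumulative comparisons: Poor vs. Excellent+Good+Average+Fair; Fair+Poor vs. Excellent+Good+Average; Average+Fair+Poor vs. Excellent+Good; and Good+Average+Fair+Poor vs. Excellent.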
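Written as equations, the splitting step looks roughly as follows. The symbols are illustrative and not part of the original notes: \(j\) indexes the cut point, \(x\) is a single predictor, and each split gets its own intercept \(\beta_{0j}\) and slope \(\beta_{1j}\).

\[\operatorname{logit}\Pr(Y > j \mid x) = \ln\frac{\Pr(Y > j \mid x)}{\Pr(Y \le j \mid x)} = \beta_{0j} + \beta_{1j}x,\qquad j = 1,\dots,n-1\]

With the five categories coded 1 (Poor) to 5 (Excellent), \(j\) runs from 1 to 4, giving one equation per split.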

2.2 The proportional odds assumption

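  1. Across the binary logistic regressions, every coefficient except the intercept \(\beta_0\) is the same, so the odds ratio (OR) for a given predictor is identical at each split: \[OR(\text{Poor vs. Excellent+Good+Average+Fair})\\ = OR(\text{Fair+Poor vs. Excellent+Good+Average})\\ = OR(\text{Average+Fair+Poor vs. Excellent+Good})\\ = OR(\text{Good+Average+Fair+Poor vs. Excellent})\]
  2. Whether the proportional odds assumption holds depends mostly on the nature of the research question itself. It can be checked against the data, but the data themselves may be biased.
  3. If the assumption does not hold: treat the outcome as nominal and fit a multinomial (unordered) logistic regression instead.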
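One informal way to look at the assumption is to collapse the outcome at each cut point and compare the crude odds ratios. The sketch below does this in plain Python, using the counts from the foreign-by-rep77 cross-tabulation shown in section 4 of this document. The function names are hypothetical, and this is an eyeball check rather than a formal test.

```python
# Counts copied from this document's `tab foreign rep77` output,
# ordered Poor, Fair, Average, Good, Excellent.
LABELS = ["Poor", "Fair", "Average", "Good", "Excellent"]
DOMESTIC = [2, 10, 20, 13, 0]
FOREIGN = [1, 1, 7, 7, 5]


def odds_above(counts, cut):
    """Odds of falling in a category above the 1-based cut point `cut`."""
    higher = sum(counts[cut:])
    lower = sum(counts[:cut])
    return float("inf") if lower == 0 else higher / lower


def split_odds_ratios(exposed, unexposed):
    """Crude odds ratio (exposed vs. unexposed) at each of the n-1 cut points."""
    ratios = []
    for cut in range(1, len(exposed)):
        num = odds_above(exposed, cut)
        den = odds_above(unexposed, cut)
        # A zero denominator means the unexposed group has nobody above the cut.
        ratios.append(float("inf") if den == 0 else num / den)
    return ratios


ratios = split_odds_ratios(FOREIGN, DOMESTIC)
for cut, ratio in enumerate(ratios, start=1):
    low = "+".join(LABELS[:cut])
    high = "+".join(LABELS[cut:])
    print(f"{high} vs. {low}: OR = {ratio:.2f}")
```

With only 3 cars rated Poor and no domestic car rated Excellent, the crude ratios at the extreme cut points are unstable (the last one is not even finite), which illustrates why a purely data-driven check should be read with caution.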

3 Loading the data

Automobile repair record data from 1977

%%stata
webuse fullauto.dta,clear
(Automobile models)

3.1 Outcome variable

Outcome: the car's repair record (rep77)

%%stata
codebook rep77

-------------------------------------------------------------------------------
rep77                                                        Repair record 1977
-------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: repair

                 Range: [1,5]                         Units: 1
         Unique values: 5                         Missing .: 8/74

            Tabulation: Freq.   Numeric  Label
                            3         1  Poor
                           11         2  Fair
                           27         3  Average
                           20         4  Good
                            5         5  Excellent
                            8         .  

3.2 Exposure variable

Exposure: whether the car is foreign (imported)

%%stata
codebook foreign

-------------------------------------------------------------------------------
foreign                                                                 Foreign
-------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: foreign

                 Range: [0,1]                         Units: 1
         Unique values: 2                         Missing .: 0/74

            Tabulation: Freq.   Numeric  Label
                           52         0  Domestic
                           22         1  Foreign

4 Chi-squared test

\(H_0\): whether a car is foreign is unrelated to its repair record

%%stata
tab foreign rep77,chi2

           |                   Repair record 1977
   Foreign |      Poor       Fair    Average       Good  Excellent |     Total
-----------+-------------------------------------------------------+----------
  Domestic |         2         10         20         13          0 |        45 
   Foreign |         1          1          7          7          5 |        21 
-----------+-------------------------------------------------------+----------
     Total |         3         11         27         20          5 |        66 

          Pearson chi2(4) =  13.8619   Pr = 0.008

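\(P=0.008<0.05\): at the \(\alpha=0.05\) significance level we reject the null hypothesis and conclude that whether a car is foreign is associated with its repair record.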
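To make the arithmetic behind the test statistic explicit, the Pearson chi-squared value can be recomputed by hand from the observed counts. This is a minimal sketch in plain Python; the closed-form tail probability it uses holds only for 4 degrees of freedom and is a shortcut to avoid an external statistics library.

```python
import math

# Observed counts copied from the `tab foreign rep77, chi2` output above
# (columns: Poor, Fair, Average, Good, Excellent).
observed = [
    [2, 10, 20, 13, 0],  # Domestic
    [1, 1, 7, 7, 5],     # Foreign
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)

# For df = 4 the chi-squared upper-tail probability has the closed form
# exp(-x/2) * (1 + x/2); this shortcut is specific to df = 4.
assert df == 4
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(f"Pearson chi2({df}) = {chi2:.4f}, Pr = {p_value:.3f}")
```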

5 Ordinal logistic regression

5.1 Syntax

ologit y x1 x2 x3 ...xn [if] [in] [weight] [,options]
  • The most commonly used option is or, which reports odds ratios (OR) directly instead of coefficients
  • Examples:
    • ologit rep77 foreign
    • ologit rep77 foreign, or
    • ologit rep77 foreign length mpg, or
%%stata
ologit rep77 foreign

Iteration 0:  Log likelihood = -89.895098  
Iteration 1:  Log likelihood = -85.951765  
Iteration 2:  Log likelihood = -85.908227  
Iteration 3:  Log likelihood = -85.908161  
Iteration 4:  Log likelihood = -85.908161  

Ordered logistic regression                             Number of obs =     66
                                                        LR chi2(1)    =   7.97
                                                        Prob > chi2   = 0.0047
Log likelihood = -85.908161                             Pseudo R2     = 0.0444

------------------------------------------------------------------------------
       rep77 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     foreign |   1.455878   .5308951     2.74   0.006     .4153425    2.496413
-------------+----------------------------------------------------------------
       /cut1 |  -2.765562   .5988208                     -3.939229   -1.591895
       /cut2 |  -.9963603   .3217706                     -1.627019   -.3657016
       /cut3 |   .9426153   .3136398                      .3278925    1.557338
       /cut4 |   3.123351   .5423257                      2.060412     4.18629
------------------------------------------------------------------------------

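Comparing foreign cars (foreign=1) with domestic cars (foreign=0):

\[OR=e^{-\beta}=e^{-1.456}=0.23\]

  • Reference: the higher repair-record categories; this is the odds ratio for being in a lower category

Equivalently:

\[OR=e^{\beta}=e^{1.456}=4.29\]

  • Reference: the lower repair-record categories; this is the odds ratio for being in a higher category

The odds that a foreign car (foreign=1) falls in a lower repair-record category rather than a higher one are 0.23 times the corresponding odds for a domestic car (foreign=0).

The usual interpretation is the second one: the odds that a foreign car (foreign=1) falls in a higher repair-record category rather than a lower one are 4.29 times the corresponding odds for a domestic car (foreign=0).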
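The conversion from the reported coefficient to these odds ratios is a plain exponentiation, and the same applies to the confidence limits. A short sketch, with the numbers copied from the `ologit rep77 foreign` output and variable names chosen only for illustration:

```python
import math

# Values copied from the `ologit rep77 foreign` output above.
beta = 1.455878                        # coefficient for foreign (log odds ratio)
ci_low, ci_high = 0.4153425, 2.496413  # 95% CI on the log-odds scale

odds_ratio = math.exp(beta)      # higher vs. lower category, foreign vs. domestic
inverse_ratio = math.exp(-beta)  # same comparison with the direction reversed
or_ci = (math.exp(ci_low), math.exp(ci_high))

print(f"OR = {odds_ratio:.2f} (95% CI: {or_ci[0]:.2f}, {or_ci[1]:.2f})")
print(f"1/OR = {inverse_ratio:.2f}")
```

Exponentiating the interval endpoints, rather than building a new interval around the odds ratio, is why the interval on the OR scale is not symmetric around the point estimate.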

%%stata
ologit rep77 foreign,or

Iteration 0:  Log likelihood = -89.895098  
Iteration 1:  Log likelihood = -85.951765  
Iteration 2:  Log likelihood = -85.908227  
Iteration 3:  Log likelihood = -85.908161  
Iteration 4:  Log likelihood = -85.908161  

Ordered logistic regression                             Number of obs =     66
                                                        LR chi2(1)    =   7.97
                                                        Prob > chi2   = 0.0047
Log likelihood = -85.908161                             Pseudo R2     = 0.0444

------------------------------------------------------------------------------
       rep77 | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     foreign |   4.288246   2.276609     2.74   0.006      1.51489    12.13888
-------------+----------------------------------------------------------------
       /cut1 |  -2.765562   .5988208                     -3.939229   -1.591895
       /cut2 |  -.9963603   .3217706                     -1.627019   -.3657016
       /cut3 |   .9426153   .3136398                      .3278925    1.557338
       /cut4 |   3.123351   .5423257                      2.060412     4.18629
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation to odds ratios.
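The /cut1 to /cut4 rows stay on the logit scale because they are thresholds, not effects. As a rough sketch of how they combine with the coefficient, the code below turns the estimates into predicted category probabilities, assuming the cumulative-logit form Pr(rep77 ≤ j) = 1 / (1 + exp(-(cut_j - β·foreign))). The helper names are hypothetical.

```python
import math

# Estimates copied from the `ologit rep77 foreign` output above.
BETA = 1.455878
CUTS = [-2.765562, -0.9963603, 0.9426153, 3.123351]
LABELS = ["Poor", "Fair", "Average", "Good", "Excellent"]


def cumulative_probs(x):
    """Pr(rep77 <= j | foreign = x) for j = 1..4, via the logistic CDF."""
    return [1 / (1 + math.exp(-(cut - BETA * x))) for cut in CUTS]


def category_probs(x):
    """Pr(rep77 = j | foreign = x), by differencing the cumulative probabilities."""
    cum = [0.0] + cumulative_probs(x) + [1.0]
    return [upper - lower for lower, upper in zip(cum, cum[1:])]


def odds_above(p_at_or_below):
    """Odds of being above a cut point, given Pr(Y <= cut)."""
    return (1 - p_at_or_below) / p_at_or_below


domestic = category_probs(0)
foreign = category_probs(1)
for label, p0, p1 in zip(LABELS, domestic, foreign):
    print(f"{label:<10} domestic {p0:.3f}   foreign {p1:.3f}")

# Under the fitted model the foreign/domestic odds ratio is exp(BETA) at every cut.
cut_ratios = [
    odds_above(p1) / odds_above(p0)
    for p0, p1 in zip(cumulative_probs(0), cumulative_probs(1))
]
print([round(r, 4) for r in cut_ratios])
```

The last line prints the same ratio four times, which is the proportional odds assumption expressed in numbers: the model forces one odds ratio across all cut points.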
%%stata
ologit rep77 foreign length mpg,or

Iteration 0:  Log likelihood = -89.895098  
Iteration 1:  Log likelihood = -78.775147  
Iteration 2:  Log likelihood = -78.254294  
Iteration 3:  Log likelihood = -78.250719  
Iteration 4:  Log likelihood = -78.250719  

Ordered logistic regression                             Number of obs =     66
                                                        LR chi2(3)    =  23.29
                                                        Prob > chi2   = 0.0000
Log likelihood = -78.250719                             Pseudo R2     = 0.1295

------------------------------------------------------------------------------
       rep77 | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     foreign |    18.1162   14.32342     3.66   0.000     3.846558    85.32223
      length |   1.086354    .024682     3.65   0.000      1.03904    1.135823
         mpg |   1.259567   .0887425     3.28   0.001     1.097109     1.44608
-------------+----------------------------------------------------------------
       /cut1 |   17.92748   5.551191                      7.047344    28.80761
       /cut2 |   19.86506    5.59648                      8.896161    30.83396
       /cut3 |   22.10331   5.708936                        10.914    33.29262
       /cut4 |   24.69213   5.890754                      13.14647     36.2378
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation to odds ratios.
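  • After adjusting for length and fuel economy (mpg), foreign cars have 18.12 times the odds of a higher repair-record category compared with domestic cars (95% CI: 3.85, 85.32)
  • After adjusting for origin and fuel economy (mpg), each additional inch of length is associated with an 8.64% increase in the odds of a higher repair-record category (95% CI: 3.90%, 13.58%)
  • After adjusting for origin and length, each additional mile per gallon (mpg) is associated with a 25.96% increase in the odds of a higher repair-record category (95% CI: 9.71%, 44.61%)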
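For continuous predictors, the percentage statements above come from (OR - 1) × 100, and the effect of a change of several units is obtained by raising the odds ratio to that power. A small sketch, with the odds ratios copied from the adjusted model output; the helper names and the 10-inch example are illustrative assumptions only.

```python
# Odds ratios and 95% CIs copied from the
# `ologit rep77 foreign length mpg, or` output above.
ESTIMATES = {
    "length": (1.086354, 1.03904, 1.135823),
    "mpg": (1.259567, 1.097109, 1.44608),
}


def percent_change(odds_ratio):
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1) * 100


def scaled_odds_ratio(odds_ratio, units):
    """Odds ratio for a `units`-unit increase; ORs multiply, so raise to a power."""
    return odds_ratio ** units


for name, (or_, low, high) in ESTIMATES.items():
    print(
        f"{name}: {percent_change(or_):.2f}% per unit "
        f"(95% CI: {percent_change(low):.2f}%, {percent_change(high):.2f}%)"
    )

# Hypothetical illustration: a 10-inch difference in length.
print(f"10 inches: OR = {scaled_odds_ratio(ESTIMATES['length'][0], 10):.2f}")
```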