import stata_setup
'C:/Program Files/Stata18', 'mp', splash=False) stata_setup.config(
14-广义线性模型(GLM)
Generalized Linear Model
%%stata
sysuse auto.dta,clear
(1978 automobile data)
大于15%会产生Bias的这个问题,只有在我们想用odds去estimaterisk的时候才会发生。
- 在cohort study和clinicaltrial中,outcome是结局事件的发生率;
- 在cross-sectional中,outcome是结局事件的prevalence;
- 在case-control中,我们不报告risk,只报告odds,因此就没有这个问题了。
1 什么是广义线性模型
- 线性回归 Linear regression:
- \(exp Y = \beta_0 + \beta_1*X_1+\cdots + \beta_n *X_n\)
- 广义线性回归 Generalized linear model:
- \(f(exp Y)= β_0 + β_1*X_1 + \cdots + \beta_n* X_n\)
- Linear regression is a special case of GLM
- Same for logistic regression, Poisson regression, Log-Binomial regression
- Even for Cox regression (Aug.31st & Sep.7th), GEE model (Sep.14th)
- 总结:在GLM中,转化Y使得转化之后的Y和X们呈线性关系
2 Simple GLM vs. Multiple GLM
2.1 简单广义线性模型
\[f(exp Y)= β_0 + β_1*X_1\]
2.2 多元广义线性模型
\[f(exp Y)= β_0 + β_1*X_1 + \cdots + \beta_n* X_n\]
X变量可以是各种类型的变量(连续、分类)
3 Specify Options
3.1 Family
Y 变量的分布类型
3.2 Link
把 Y 怎么转化,才和 X 成linear关系
4 linear regression
4.1 Y var
- 连续分布(continuous variable)
- 如何分布? 高斯分布(Gaussian Distribution)
- Model:
- \[f(exp Y)= β_0 + β_1*X_1 + \cdots + \beta_n* X_n\]
- Y 怎么转化才和 X 成 linear 关系?
- 不用转化
- Link:Identity(恒等)
5 Logistic regression
5.1 Y var
- Binary variable
- 如何分布? 二项分布(Binomial)
- Model:
- \[ln[P(Y=1)/P(Y=0)]= β_0 + β_1*X_1+\cdots + β_n*X_n\]
- Exp Y = P(Y=1)
- Y 怎么转化才和X成linear关系?Link:logit
6 Poisson regression
6.1 Y var
- Count variable:整数(1,2,3,.n)
- 如何分布? 泊松分布(Poisson)
- Model:
- \[ln[risk of event)]= β_0 + β_1*X_1+\cdots + β_n*X_n\]
- Y 怎么转化才和X成linear关系?Link:Log
7 Log-binomial Regression
7.1 Y var
- Binary variable
- 如何分布? 二项分布(Binomial)
- Model:
- \[ln[P(Y=1)]= β_0 + β_1*X_1+\cdots + β_n*X_n\]
- Exp Y = P(Y=1)
- Y 怎么转化才和X成linear关系?Link:Log
%%stata
help glm
[R] glm -- Generalized linear models
(View complete PDF manual entry)
Syntax
------
glm depvar [indepvars] [if] [in] [weight] [, options]
options Description
-------------------------------------------------------------------------
Model
family(familyname) distribution of depvar; default is
family(gaussian)
link(linkname) link function; default is canonical link for
family() specified
Model 2
noconstant suppress constant term
exposure(varname) include ln(varname) in model with coefficient
constrained to 1
offset(varname) include varname in model with coefficient
constrained to 1
constraints(constraints) apply specified linear constraints
asis retain perfect predictor variables
mu(varname) use varname as the initial estimate for the
mean of depvar
init(varname) synonym for mu(varname)
SE/Robust
vce(vcetype) vcetype may be oim, robust, cluster clustvar,
eim, opg, bootstrap, jackknife, hac kernel,
jackknife1, or unbiased
vfactor(#) multiply variance matrix by scalar #
disp(#) quasilikelihood multiplier
scale(x2|dev|#) set the scale parameter
Reporting
level(#) set confidence level; default is level(95)
eform report exponentiated coefficients
nocnsreport do not display constraints
display_options control columns and column formats, row
spacing, line width, display of omitted
variables and base and empty cells, and
factor-variable labeling
Maximization
ml use maximum likelihood optimization; the
default
irls use iterated, reweighted least-squares
optimization of the deviance
maximize_options control the maximization process; seldom used
fisher(#) use the Fisher scoring Hessian or expected
information matrix (EIM)
search search for good starting values
noheader suppress header table from above coefficient
table
notable suppress coefficient table
nodisplay suppress the output; iteration log is still
displayed
collinear keep collinear variables
coeflegend display legend instead of statistics
-------------------------------------------------------------------------
familyname Description
-------------------------------------------------------------------------
gaussian Gaussian (normal)
igaussian inverse Gaussian
binomial[varnameN|#N] Bernoulli/binomial
poisson Poisson
nbinomial[#k|ml] negative binomial
gamma gamma
-------------------------------------------------------------------------
linkname Description
-------------------------------------------------------------------------
identity identity
log log
logit logit
probit probit
cloglog cloglog
power # power
opower # odds power
nbinomial negative binomial
loglog log-log
logc log-complement
-------------------------------------------------------------------------
indepvars may contain factor variables; see fvvarlist.
depvar and indepvars may contain time-series operators; see tsvarlist.
bayes, bootstrap, by, collect, fmm, fp, jackknife, mfp, mi estimate,
nestreg, rolling, statsby, stepwise, and svy are allowed; see prefix.
For more details, see [BAYES] bayes: glm and [FMM] fmm: glm.
vce(bootstrap), vce(jackknife), and vce(jackknife1) are not allowed with
the mi estimate prefix.
Weights are not allowed with the bootstrap prefix.
aweights are not allowed with the jackknife prefix.
vce(), vfactor(), disp(), scale(), irls, fisher(), noheader, notable,
nodisplay, and weights are not allowed with the svy prefix.
fweights, aweights, iweights, and pweights are allowed; see weight.
noheader, notable, nodisplay, collinear, and coeflegend do not appear in
the dialog box.
See [R] glm postestimation for features available after estimation.
Menu
----
Statistics > Generalized linear models > Generalized linear models (GLM)
Description
-----------
glm fits generalized linear models. It can fit models by using either
IRLS (maximum quasilikelihood) or Newton-Raphson (maximum likelihood)
optimization, which is the default.
See [U] 27 Overview of Stata estimation commands for a description of all
of Stata's estimation commands, several of which fit models that can also
be fit using glm.
Links to PDF documentation
--------------------------
Quick start
Remarks and examples
Methods and formulas
The above sections are not included in this help file.
Options
-------
+-------+
----+ Model +------------------------------------------------------------
family(familyname) specifies the distribution of depvar; family(gaussian)
is the default.
link(linkname) specifies the link function; the default is the canonical
link for the family() specified (except for family(nbinomial)).
+---------+
----+ Model 2 +----------------------------------------------------------
noconstant, exposure(varname), offset(varname), constraints(constraints);
see [R] Estimation options. constraints(constraints) is not allowed
with irls.
asis forces retention of perfect predictor variables and their
associated, perfectly predicted observations and may produce
instabilities in maximization; see [R] probit. This option is
allowed only with option family(binomial) with a denominator of 1.
mu(varname) specifies varname as the initial estimate for the mean of
depvar. This option can be useful with models that experience
convergence difficulties, such as family(binomial) models with power
or odds-power links. init(varname) is a synonym.
+-----------+
----+ SE/Robust +--------------------------------------------------------
vce(vcetype) specifies the type of standard error reported, which
includes types that are derived from asymptotic theory (oim, opg),
that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use
bootstrap or jackknife methods (bootstrap, jackknife); see [R]
vce_option.
In addition to the standard vcetypes, glm allows the following
alternatives:
vce(eim) specifies that the EIM estimate of variance be used.
vce(jackknife1) specifies that the one-step jackknife estimate of
variance be used.
vce(hac kernel [#]) specifies that a heteroskedasticity- and
autocorrelation-consistent (HAC) variance estimate be used. HAC
refers to the general form for combining weighted matrices to
form the variance estimate. There are three kernels built into
glm. kernel is a user-written program or one of
nwest | gallant | anderson
# specifies the number of lags. If # is not specified, N - 2 is
assumed. If you wish to specify vce(hac ... ), you must tsset
your data before calling glm.
vce(unbiased) specifies that the unbiased sandwich estimate of
variance be used.
vfactor(#) specifies a scalar by which to multiply the resulting variance
matrix. This option allows you to match output with other packages,
which may apply degrees of freedom or other small-sample corrections
to estimates of variance.
disp(#) multiplies the variance of depvar by # and divides the deviance
by #. The resulting distributions are members of the quasilikelihood
family. This option is allowed only with option irls.
scale(x2|dev|#) overrides the default scale parameter. This option is
allowed only with Hessian (information matrix) variance estimates.
By default, scale(1) is assumed for the discrete distributions
(binomial, Poisson, and negative binomial), and scale(x2) is assumed
for the continuous distributions (Gaussian, gamma, and inverse
Gaussian).
scale(x2) specifies that the scale parameter be set to the Pearson
chi-squared (or generalized chi-squared) statistic divided by the
residual degrees of freedom, which is recommended by McCullagh and
Nelder (1989) as a good general choice for continuous distributions.
scale(dev) sets the scale parameter to the deviance divided by the
residual degrees of freedom. This option provides an alternative to
scale(x2) for continuous distributions and overdispersed or
underdispersed discrete distributions. This option is allowed only
with option irls.
scale(#) sets the scale parameter to #. For example, using scale(1)
in family(gamma) models results in exponential-errors regression.
Additional use of link(log) rather than the default link(power -1)
for family(gamma) essentially reproduces Stata's streg, dist(exp)
nohr command (see [ST] streg) if all the observations are uncensored.
+-----------+
----+ Reporting +--------------------------------------------------------
level(#); see [R] Estimation options.
eform displays the exponentiated coefficients and corresponding standard
errors and confidence intervals. For family(binomial) link(logit)
(that is, logistic regression), exponentiation results are odds
ratios; for family(nbinomial) link(log) (that is, negative binomial
regression) and for family(poisson) link(log) (that is, Poisson
regression), exponentiated coefficients are incidence-rate ratios.
nocnsreport; see [R] Estimation options.
display_options: noci, nopvalues, noomitted, vsquish, noemptycells,
baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style),
cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R]
Estimation options.
+--------------+
----+ Maximization +-----------------------------------------------------
ml requests that optimization be carried out using Stata's ml commands
and is the default.
irls requests iterated, reweighted least-squares (IRLS) optimization of
the deviance instead of Newton-Raphson optimization of the log
likelihood. If the irls option is not specified, the optimization is
carried out using Stata's ml commands, in which case all options of
ml maximize are also available.
maximize_options: difficult, technique(algorithm_spec), iterate(#),
[no]log, trace, gradient, showstep, hessian, showtolerance,
tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and
from(init_specs); see [R] Maximize. These options are seldom used.
Setting the optimization method to technique(bhhh) resets the default
vcetype to vce(opg).
If option irls is specified, only maximize_options iterate(), nolog,
trace, and ltolerance() are allowed. With irls specified, the
convergence criterion is satisfied when the absolute change in
deviance from one iteration to the next is less than or equal to
ltolerance(), where ltolerance(1e-6) is the default.
fisher(#) specifies the number of Newton-Raphson steps that should use
the Fisher scoring Hessian or EIM before switching to the observed
information matrix (OIM). This option is useful only for
Newton-Raphson optimization (and not when using irls).
search specifies that the command search for good starting values. This
option is useful only for Newton-Raphson optimization (and not when
using irls).
The following options are available with glm but are not shown in the
dialog box:
noheader suppresses the header information from the output. The
coefficient table is still displayed.
notable suppresses the table of coefficients from the output. The header
information is still displayed.
nodisplay suppresses the output. The iteration log is still displayed.
collinear, coeflegend; see [R] Estimation options. collinear is not
allowed with irls.
Remarks
-------
Although glm can be used to fit linear regression (and, in fact, does so
by default), this should be viewed as an instructional feature; regress
produces such estimates more quickly, and many postestimation commands
are available to explore the adequacy of the fit; see [R] regress and [R]
regress postestimation.
In any case, you should specify the link function by using the link()
option and specify the distributional family by using family(). The
available link functions are
Link function glm option
----------------------------------------
identity link(identity)
log link(log)
logit link(logit)
probit link(probit)
complementary log-log link(cloglog)
odds power link(opower #)
power link(power #)
negative binomial link(nbinomial)
log-log link(loglog)
log-complement link(logc)
The available distributional families are
Family glm option
----------------------------------------
Gaussian(normal) family(gaussian)
inverse Gaussian family(igaussian)
Bernoulli/binomial family(binomial)
Poisson family(poisson)
negative binomial family(nbinomial)
gamma family(gamma)
You do not have to specify both family() and link(); the default link()
is the canonical link for the specified family() (except for nbinomial):
Family Default link
--------------------------------------
family(gaussian) link(identity)
family(igaussian) link(power -2)
family(binomial) link(logit)
family(poisson) link(log)
family(nbinomial) link(log)
family(gamma) link(power -1)
If you specify both family() and link(), not all combinations make sense.
You may choose from the following combinations:
| id log logit probit clog pow opower nbinomial loglog logc
----------+-------------------------------------------------------------------
Gaussian | x x x
inv. Gau. | x x x
binomial | x x x x x x x x x
Poisson | x x x
neg. bin. | x x x x
gamma | x x x
Examples
--------
---------------------------------------------------------------------------
Setup
. webuse lbw
Generalized linear model with Bernoulli family and default logit link
. glm low age lwt i.race smoke ptl ht ui, family(binomial)
Replay results and report exponentiated coefficients
. glm, eform
---------------------------------------------------------------------------
Setup
. webuse ldose
Generalized linear model with binomial family and default logit link
. glm r ldose, family(binomial n)
Generalized linear model with binomial family and cloglog link
. glm r ldose, family(binomial n) link(cloglog)
---------------------------------------------------------------------------
Setup
. webuse beetle
Generalized linear model with binomial family and cloglog link
. glm r i.beetle ldose, family(binomial n) link(cloglog)
Replay results with 99% confidence intervals
. glm, level(99)
---------------------------------------------------------------------------
Stored results
--------------
glm, ml stores the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k_eq) number of equations in e(b)
e(k_eq_model) number of equations in overall model test
e(k_dv) number of dependent variables
e(df_m) model degrees of freedom
e(df) residual degrees of freedom
e(phi) scale parameter
e(aic) model AIC
e(bic) model BIC
e(ll) log likelihood, if NR
e(N_clust) number of clusters
e(chi2) chi-squared
e(p) p-value for model test
e(deviance) deviance
e(deviance_s) scaled deviance
e(deviance_p) Pearson deviance
e(deviance_ps) scaled Pearson deviance
e(dispers) dispersion
e(dispers_s) scaled dispersion
e(dispers_p) Pearson dispersion
e(dispers_ps) scaled Pearson dispersion
e(nbml) 1 if negative binomial parameter estimated via
ML, 0 otherwise
e(vf) factor set by vfactor(), 1 if not set
e(power) power set by link(power #) or link(opower #)
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) glm
e(cmdline) command as typed
e(depvar) name of dependent variable
e(varfunc) program to calculate variance function
e(varfunct) variance title
e(varfuncf) variance function
e(link) program to calculate link function
e(linkt) link title
e(linkf) link function
e(m) number of binomial trials
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(chi2type) Wald; type of model chi-squared test
e(cons) noconstant, if specified
e(hac_kernel) HAC kernel
e(hac_lag) HAC lag
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. err.
e(opt) ml or irls
e(opt1) optimization title, line 1
e(opt2) optimization title, line 2
e(which) max or min; whether optimizer is to perform
maximization or minimization
e(ml_method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance-covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
In addition to the above, the following is stored in r():
Matrices
r(table) matrix containing the coefficients with their
standard errors, test statistics, p-values,
and confidence intervals
Note that results stored in r() are updated when the command is replayed
and will be replaced when any r-class command is run after the estimation
command.
glm, irls stores the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k_eq_model) number of equations in overall model test
e(df_m) model degrees of freedom
e(df) residual degrees of freedom
e(phi) scale parameter
e(disp) dispersion parameter
e(bic) model BIC
e(N_clust) number of clusters
e(deviance) deviance
e(deviance_s) scaled deviance
e(deviance_p) Pearson deviance
e(deviance_ps) scaled Pearson deviance
e(dispers) dispersion
e(dispers_s) scaled dispersion
e(dispers_p) Pearson dispersion
e(dispers_ps) scaled Pearson dispersion
e(nbml) 1 if negative binomial parameter estimated via
ML, 0 otherwise
e(vf) factor set by vfactor(), 1 if not set
e(power) power set by link(power #) or link(opower #)
e(rank) rank of e(V)
e(rc) return code
Macros
e(cmd) glm
e(cmdline) command as typed
e(depvar) name of dependent variable
e(varfunc) program to calculate variance function
e(varfunct) variance title
e(varfuncf) variance function
e(link) program to calculate link function
e(linkt) link title
e(linkf) link function
e(m) number of binomial trials
e(wtype) weight type
e(wexp) weight expression
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(cons) noconstant, if specified
e(hac_kernel) HAC kernel
e(hac_lag) HAC lag
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. err.
e(opt) ml or irls
e(opt1) optimization title, line 1
e(opt2) optimization title, line 2
e(properties) b V
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(V) variance-covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
In addition to the above, the following is stored in r():
Matrices
r(table) matrix containing the coefficients with their
standard errors, test statistics, p-values,
and confidence intervals
Note that results stored in r() are updated when the command is replayed
and will be replaced when any r-class command is run after the estimation
command.
Reference
---------
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd
ed. London: Chapman & Hall/CRC.
glm command in Stata - Umbrella Command - linear regression: family(gaussian) link(identity) - logistic regression: family(binomial) link(logit) - poisson regression: family(poisson) link(log) - log-binomial regression: family(binomial) link(log)
加粗部分表示,在实际代码中可以简写为加粗的字母或单词即可
8 代码的转换
8.1 Linear regression:
regress price weight length mpg i.rep78
glm price weight length mpg i.rep78, family(gaussian)link(identity)
8.2 Logistic regression:
logistic low age i.smoke i.race
glm low age i.smoke i.race,family(binomial) link(logit)
glm low age i.smoke i.race,family(binomial) link(logit) eform
eform
直接输出 OR 值,即 \(e^{\beta}\)
9 线性回归模型的比较
%%stata
regress price weight length mpg i.rep78
Source | SS df MS Number of obs = 69
-------------+---------------------------------- F(7, 61) = 7.25
Model | 262008114 7 37429730.6 Prob > F = 0.0000
Residual | 314788844 61 5160472.86 R-squared = 0.4542
-------------+---------------------------------- Adj R-squared = 0.3916
Total | 576796959 68 8482308.22 Root MSE = 2271.7
------------------------------------------------------------------------------
price | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
weight | 5.186695 1.163383 4.46 0.000 2.860367 7.513022
length | -124.1544 40.07637 -3.10 0.003 -204.292 -44.01671
mpg | -126.8367 84.49819 -1.50 0.138 -295.8012 42.12791
|
rep78 |
2 | 1137.284 1803.332 0.63 0.531 -2468.701 4743.269
3 | 1254.642 1661.545 0.76 0.453 -2067.823 4577.108
4 | 2267.188 1698.018 1.34 0.187 -1128.208 5662.584
5 | 3850.759 1787.272 2.15 0.035 276.8886 7424.63
|
_cons | 14614.49 6155.842 2.37 0.021 2305.125 26923.86
------------------------------------------------------------------------------
%%stata
glm price weight length mpg i.rep78, family(gaussian)link(identity)
Iteration 0: Log likelihood = -626.90582
Generalized linear models Number of obs = 69
Optimization : ML Residual df = 61
Scale parameter = 5160473
Deviance = 314788844.4 (1/df) Deviance = 5160473
Pearson = 314788844.4 (1/df) Pearson = 5160473
Variance function: V(u) = 1 [Gaussian]
Link function : g(u) = u [Identity]
AIC = 18.40307
Log likelihood = -626.9058204 BIC = 3.15e+08
------------------------------------------------------------------------------
| OIM
price | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
weight | 5.186695 1.163383 4.46 0.000 2.906506 7.466883
length | -124.1544 40.07637 -3.10 0.002 -202.7026 -45.60612
mpg | -126.8367 84.49819 -1.50 0.133 -292.4501 38.77675
|
rep78 |
2 | 1137.284 1803.332 0.63 0.528 -2397.182 4671.75
3 | 1254.642 1661.545 0.76 0.450 -2001.927 4511.211
4 | 2267.188 1698.018 1.34 0.182 -1060.866 5595.241
5 | 3850.759 1787.272 2.15 0.031 347.7711 7353.747
|
_cons | 14614.49 6155.842 2.37 0.018 2549.263 26679.72
------------------------------------------------------------------------------
10 logistic 回归的比较
重新导入数据:
%%stata
webuse lbw,clear
(Hosmer & Lemeshow data)
%%stata
logistic low age i.smoke i.race
Logistic regression Number of obs = 189
LR chi2(4) = 15.81
Prob > chi2 = 0.0033
Log likelihood = -109.4311 Pseudo R2 = 0.0674
------------------------------------------------------------------------------
low | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .9657186 .0322573 -1.04 0.296 .9045206 1.031057
|
smoke |
Smoker | 3.00582 1.118001 2.96 0.003 1.449982 6.231081
|
race |
Black | 2.749483 1.356659 2.05 0.040 1.045318 7.231924
Other | 2.876948 1.167921 2.60 0.009 1.298314 6.375062
|
_cons | .365111 .3146026 -1.17 0.242 .0674491 1.976395
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
%%stata
glm low age i.smoke i.race,family(binomial) link(logit) eform
Iteration 0: Log likelihood = -109.53148
Iteration 1: Log likelihood = -109.43111
Iteration 2: Log likelihood = -109.4311
Iteration 3: Log likelihood = -109.4311
Generalized linear models Number of obs = 189
Optimization : ML Residual df = 184
Scale parameter = 1
Deviance = 218.8621974 (1/df) Deviance = 1.189468
Pearson = 182.9642078 (1/df) Pearson = .9943707
Variance function: V(u) = u*(1-u) [Bernoulli]
Link function : g(u) = ln(u/(1-u)) [Logit]
AIC = 1.210911
Log likelihood = -109.4310987 BIC = -745.6193
------------------------------------------------------------------------------
| OIM
low | Odds ratio std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .9657186 .0322573 -1.04 0.296 .9045206 1.031057
|
smoke |
Smoker | 3.00582 1.118001 2.96 0.003 1.449982 6.231081
|
race |
Black | 2.749483 1.356659 2.05 0.040 1.045318 7.231924
Other | 2.876948 1.167921 2.60 0.009 1.298314 6.375062
|
_cons | .365111 .3146026 -1.17 0.242 .0674491 1.976395
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
11 Log-Binomial Model
glm low age i.smoke i.race,family(binomial) link(log)
- Logistic regression 的一种替代
- Fail to converge
- Poisson regression with robust variance estimate
为了避免出现 Fail to converge 的问题,采用稳健方差估计:
glm low age i.smoke i.race,family(poisson) link(log) robust
robust
可以缩写为r
%%stata
glm low age i.smoke i.race,family(poisson) link(log) robust
Iteration 0: Log pseudolikelihood = -124.55888
Iteration 1: Log pseudolikelihood = -122.39663
Iteration 2: Log pseudolikelihood = -122.39591
Iteration 3: Log pseudolikelihood = -122.39591
Generalized linear models Number of obs = 189
Optimization : ML Residual df = 184
Scale parameter = 1
Deviance = 126.791811 (1/df) Deviance = .6890859
Pearson = 124.4629927 (1/df) Pearson = .6764293
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
AIC = 1.348105
Log pseudolikelihood = -122.3959055 BIC = -837.6896
------------------------------------------------------------------------------
| Robust
low | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | -.0246781 .0203034 -1.22 0.224 -.0644722 .0151159
|
smoke |
Smoker | .6972155 .211848 3.29 0.001 .2820011 1.11243
|
race |
Black | .639629 .2818353 2.27 0.023 .0872419 1.192016
Other | .6813692 .2428128 2.81 0.005 .2054649 1.157273
|
_cons | -1.286544 .5405667 -2.38 0.017 -2.346035 -.2270526
------------------------------------------------------------------------------