import stata_setup
stata_setup.config('C:/Program Files/Stata18', 'mp', splash=False)
24 - Difference-in-Differences (DID)
Difference-in-differences: model and examples
1 What is DID
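Difference-in-differences (DID) regression evaluates the causal effect of an event by comparing the set of units where the event occurred (the treatment group) with the set of units where it did not (the control group).
The logic behind DID is that, had the event never occurred, the difference between the treatment and control groups would have stayed constant over time.
DID estimates the causal effect of the event by comparing how the gap between the treatment and control groups changes from before the event to after it.
DID is an alternative for settings where units cannot be randomly assigned, and it is mainly applied to the evaluation of regional policies.
Goal: obtain a treatment group and a control group that are relatively homogeneous. "Relatively" means that, apart from the effect of the policy, the outcome variables of the two groups differ by a roughly constant amount as they evolve over time.
For such relatively homogeneous groups, DID uses the first difference to remove this roughly constant gap and the second difference to remove the common time trend, leaving the actual effect of the policy.
It follows from this goal that the method works with panel data (cross-sections observed at several time points) covering periods before and after the intervention. At least two time points before the intervention are needed so that the pre-intervention trend can be examined.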
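\[ y = \alpha_0 + \alpha_1 g + \alpha_2 T + \alpha_3 gT + \epsilon \]
Here \(y\) is the outcome variable, \(g\) is the group dummy (1 for the treatment group, 0 for the control group), \(T\) is the time dummy (1 from the event onward, 0 before) and \(gT\) is their interaction. \(\alpha_0\) is the constant, \(\alpha_1\) is the baseline difference between the treatment and control groups, \(\alpha_2\) is the time effect common to both groups and \(\epsilon\) is the error term.
\(\alpha_3\), the coefficient on the interaction term, is the DID estimator: the before-to-after change in the treatment group minus the before-to-after change in the control group.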
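As a rough illustration of how the four coefficients map onto the four group-by-period cell means, here is a minimal Python sketch. The numbers and the function name did_from_means are hypothetical and chosen only for readability; they do not come from the datasets used later.

```python
# Toy cell means (hypothetical numbers, not taken from the example datasets).
means = {
    ("control", "before"): 10.0,
    ("control", "after"): 14.0,
    ("treated", "before"): 12.0,
    ("treated", "after"): 19.0,
}

def did_from_means(m):
    """Return (alpha_0, alpha_1, alpha_2, alpha_3) implied by the four cell means."""
    alpha_0 = m[("control", "before")]                             # constant
    alpha_1 = m[("treated", "before")] - m[("control", "before")]  # baseline gap
    alpha_2 = m[("control", "after")] - m[("control", "before")]   # common time effect
    change_treated = m[("treated", "after")] - m[("treated", "before")]
    change_control = m[("control", "after")] - m[("control", "before")]
    alpha_3 = change_treated - change_control                      # DID estimator
    return alpha_0, alpha_1, alpha_2, alpha_3

alpha_0, alpha_1, alpha_2, alpha_3 = did_from_means(means)
print(alpha_0, alpha_1, alpha_2, alpha_3)
```

In this saturated two-by-two setting the sum of all four coefficients reproduces the mean of the treated group after the event, which is a quick sanity check on any DID table.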
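Checking the validity of the DID model
For the model to be valid, the study design must satisfy the parallel trends assumption: before the event, the outcome variables of the treatment and control groups differ by a roughly constant amount over time.
Parallel trends means that the treatment and control groups follow the same trend before the intervention.
Three common ways to check parallel trends:
- Graphical check: plot the outcome variable of the treatment and control groups before and after the event and see whether the two pre-event trends are parallel.
- Statistical test: use a t test, an F test or a similar procedure to check whether the pre-event trends of the outcome variable differ significantly between the treatment and control groups.
- Placebo DID: pick a time point before the real event, treat it as a fake event date, run the DID on pre-event data only and check whether the placebo estimate is significant. A significant placebo estimate casts doubt on parallel trends.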
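The placebo idea in the last bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function placebo_did, the toy series and the fake event year 1992 are all hypothetical, and the sketch returns a point estimate without any significance test.

```python
from statistics import mean

def placebo_did(rows, fake_event_year):
    """2x2 DID on pre-event rows, pretending the event happened in fake_event_year.

    rows: list of (year, treated, y) tuples, all dated before the real event.
    """
    def cell(treated, after):
        # Mean outcome for one group on one side of the fake event date.
        return mean(y for year, g, y in rows
                    if g == treated and (year >= fake_event_year) == after)

    change_treated = cell(1, True) - cell(1, False)
    change_control = cell(0, True) - cell(0, False)
    return change_treated - change_control

years = [1990, 1991, 1992, 1993]          # all before a real event in 1994
control = [1.0, 2.0, 3.0, 4.0]
parallel_treated = [5.0, 6.0, 7.0, 8.0]   # same slope as the control group
diverging_treated = [5.0, 7.0, 9.0, 11.0] # steeper slope than the control group

parallel = ([(t, 0, y) for t, y in zip(years, control)]
            + [(t, 1, y) for t, y in zip(years, parallel_treated)])
diverging = ([(t, 0, y) for t, y in zip(years, control)]
             + [(t, 1, y) for t, y in zip(years, diverging_treated)])

print(placebo_did(parallel, 1992))   # zero when pre-trends are parallel
print(placebo_did(diverging, 1992))  # nonzero when pre-trends diverge
```

With real data the placebo estimate would be obtained from a regression so that its standard error is available; the sketch only shows which observations enter the comparison.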
2 Importing the data
This section uses the example program and dataset provided by Princeton University; see: Differences‐in‐Differences (using Stata)
%%stata
use "http://dss.princeton.edu/training/Panel101.dta", clear
* List the first ten observations
list in 1/10
. use "http://dss.princeton.edu/training/Panel101.dta", clear
. * List the first ten observations
. list in 1/10
+------------------------------------------------------------------------+
1. | country | year | y | y_bin | x1 | x2 | x3 |
| A | 1990 | 1.343e+09 | 1 | .2779036 | -1.107956 | .2825536 |
|------------------------------------------------------------------------|
| opinion | op |
| Str agree | 1 |
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
2. | country | year | y | y_bin | x1 | x2 | x3 |
| A | 1991 | -1.900e+09 | 0 | .3206847 | -.94872 | .4925385 |
|------------------------------------------------------------------------|
| opinion | op |
| Disag | 0 |
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
3. | country | year | y | y_bin | x1 | x2 | x3 |
| A | 1992 | -11234363 | 0 | .3634657 | -.789484 | .7025234 |
|------------------------------------------------------------------------|
| opinion | op |
| Disag | 0 |
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
4. | country | year | y | y_bin | x1 | x2 | x3 |
| A | 1993 | 2.646e+09 | 1 | .246144 | -.885533 | -.0943909 |
|------------------------------------------------------------------------|
| opinion | op |
| Disag | 0 |
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
5. | country | year | y | y_bin | x1 | x2 | x3 |
| A | 1994 | 3.008e+09 | 1 | .424623 | -.7297683 | .9461306 |
|------------------------------------------------------------------------|
| opinion | op |
| Disag | 0 |
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
6. | country | year | y | y_bin | x1 | x2 | x3 |
| A | 1995 | 3.230e+09 | 1 | .4772141 | -.723246 | 1.02968 |
|------------------------------------------------------------------------|
| opinion | op |
| Str agree | 1 |
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
7. | country | year | y | y_bin | x1 | x2 | x3 |
| A | 1996 | 2.757e+09 | 1 | .499805 | -.7815716 | 1.092288 |
|------------------------------------------------------------------------|
| opinion | op |
| Disag | 0 |
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
8. | country | year | y | y_bin | x1 | x2 | x3 |
| A | 1997 | 2.772e+09 | 1 | .0516284 | -.7048455 | 1.415901 |
|------------------------------------------------------------------------|
| opinion | op |
| Str agree | 1 |
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
9. | country | year | y | y_bin | x1 | x2 | x3 |
| A | 1998 | 3.397e+09 | 1 | .3664108 | -.6983712 | 1.548723 |
|------------------------------------------------------------------------|
| opinion | op |
| Disag | 0 |
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
10. | country | year | y | y_bin | x1 | x2 | x3 |
| A | 1999 | 39770336 | 1 | .3958425 | -.643154 | 1.794198 |
|------------------------------------------------------------------------|
| opinion | op |
| Str disag | 0 |
+------------------------------------------------------------------------+
.
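3 Creating the variables
3.1 Creating the time dummy
Create a dummy variable that indicates when the treatment started. Suppose the treatment started in 1994. The variable is then 0 for years before 1994 and 1 for 1994 and later. Skip this step if you already have such a dummy.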
%%stata
gen time = (year>=1994) & !missing(year)
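3.2 Creating the treatment dummy
Create a dummy variable that identifies the group exposed to the treatment. In this example, assume the countries with codes 5, 6 and 7 were treated (=1) and the countries with codes 1-4 were not (=0). Skip this step if you already have such a dummy.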
%%stata
gen treated = (country>4) & !missing(country)
4 Creating the interaction term
Create an interaction between time and treatment. We call this interaction "did".
%%stata
gen did = time*treated
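For readers who want to see the coding rule outside Stata, the three generate steps can be mirrored in Python. This is a minimal sketch: the function make_dummies is hypothetical, and None stands in for a Stata missing value. Stata treats a missing numeric value as larger than any number, which is why the Stata cells guard each comparison with !missing().

```python
def make_dummies(country, year):
    """Build the time, treated and did dummies for one observation.

    None stands in for a Stata missing value; the explicit None checks
    play the role of the !missing() guards in the Stata commands.
    """
    time = int(year is not None and year >= 1994)        # 1 from 1994 onward
    treated = int(country is not None and country > 4)   # codes 5, 6, 7
    return {"time": time, "treated": treated, "did": time * treated}

for country, year in [(1, 1993), (1, 1994), (5, 1993), (5, 1994), (None, 1994)]:
    print(country, year, make_dummies(country, year))
```

Only observations that are both treated and dated 1994 or later receive did = 1.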
5 Estimating the DID estimator
%%stata
reg y time treated did, r
Linear regression Number of obs = 70
F(3, 66) = 2.17
Prob > F = 0.0998
R-squared = 0.0827
Root MSE = 3.0e+09
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
time | 2.29e+09 9.00e+08 2.54 0.013 4.92e+08 4.09e+09
treated | 1.78e+09 1.05e+09 1.70 0.094 -3.11e+08 3.86e+09
did | -2.52e+09 1.45e+09 -1.73 0.088 -5.42e+09 3.81e+08
_cons | 3.58e+08 7.61e+08 0.47 0.640 -1.16e+09 1.88e+09
------------------------------------------------------------------------------
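The coefficient on did is the difference-in-differences estimator. The effect is significant at the 10% level (p = 0.088), and the estimated treatment effect is negative.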
Estimating the DID estimator with the # operator (no need to generate the interaction term)
%%stata
reg y time##treated, r
Linear regression Number of obs = 70
F(3, 66) = 2.17
Prob > F = 0.0998
R-squared = 0.0827
Root MSE = 3.0e+09
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
1.time | 2.29e+09 9.00e+08 2.54 0.013 4.92e+08 4.09e+09
1.treated | 1.78e+09 1.05e+09 1.70 0.094 -3.11e+08 3.86e+09
|
time#treated |
1 1 | -2.52e+09 1.45e+09 -1.73 0.088 -5.42e+09 3.81e+08
|
_cons | 3.58e+08 7.61e+08 0.47 0.640 -1.16e+09 1.88e+09
------------------------------------------------------------------------------
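The coefficient on time#treated is the DID estimator (it corresponds to did in the previous example). The estimate is significant at the 10% level and indicates that the treatment had a negative effect.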
6 Using the diff command
diff is a user-written command and must be installed before it can be used:
%%stata
ssc install diff
checking diff consistency and verifying not already installed...
installing into C:\Users\asus\ado\plus\...
installation complete.
%%stata
diff y, t(treated) p(time)
DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
--------------------------------------------
Number of observations in the DIFF-IN-DIFF: 70
Before After
Control: 16 24 40
Treated: 12 18 30
28 42
--------------------------------------------------------
Outcome var. | y | S. Err. | |t| | P>|t|
----------------+---------+---------+---------+---------
Before | | | |
Control | 3.6e+08| | |
Treated | 2.1e+09| | |
Diff (T-C) | 1.8e+09| 1.1e+09| 1.58 | 0.120
After | | | |
Control | 2.6e+09| | |
Treated | 1.9e+09| | |
Diff (T-C) | -7.4e+08| 9.2e+08| 0.81 | 0.422
| | | |
Diff-in-Diff | -2.5e+09| 1.5e+09| 1.73 | 0.088*
--------------------------------------------------------
R-square: 0.08
* Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
Type help diff for more details and options.
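The regression in section 5 and the diff table report the same quantities in two layouts. The short Python check below rebuilds the four cell means from the rounded coefficients printed by reg and compares them with the rounded values printed by diff. The variable names are illustrative, and agreement is only expected up to the rounding of the printed output.

```python
# Rounded coefficients copied from the reg output in section 5.
cons, time, treated, did = 3.58e8, 2.29e9, 1.78e9, -2.52e9

# Cell means implied by the saturated 2x2 regression.
cells = {
    "control_before": cons,
    "treated_before": cons + treated,
    "control_after": cons + time,
    "treated_after": cons + time + treated + did,
}

# Rounded values copied from the diff output in section 6.
diff_table = {
    "control_before": 3.6e8,
    "treated_before": 2.1e9,
    "control_after": 2.6e9,
    "treated_after": 1.9e9,
}

for name, value in cells.items():
    print(f"{name}: regression {value:.3e}, diff table {diff_table[name]:.1e}")

# Treated-minus-control gap after the event, reported by diff as -7.4e+08.
gap_after = treated + did
print(f"gap after: {gap_after:.2e}")
```

The Diff-in-Diff row of the diff table (-2.5e+09, p = 0.088) is the did coefficient itself.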
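7 Difference-in-differences
8 Before/after dummy variable
8.1 Creating an indicator variable
- 0: periods before the event
- 1: the period of the event and all later periods
8.2 Importing the data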
%%stata
use "C:\Users\asus\Desktop\R\quarto\Med-Stat-Notes\Data\WDI.dta", clear
- A hypothetical event X occurs in 2009 and affects all countries
- Build a before/after dummy: 0 before the event, 1 from the event onward
%%stata
gen after = (year >= 2009) if !missing(year)
8.3 Checking the data
%%stata
tab year after
| after
Time | 0 1 | Total
-----------+----------------------+----------
2000 | 126 0 | 126
2001 | 126 0 | 126
2002 | 126 0 | 126
2003 | 126 0 | 126
2004 | 126 0 | 126
2005 | 126 0 | 126
2006 | 126 0 | 126
2007 | 126 0 | 126
2008 | 126 0 | 126
2009 | 0 126 | 126
2010 | 0 126 | 126
2011 | 0 126 | 126
2012 | 0 126 | 126
2013 | 0 126 | 126
2014 | 0 126 | 126
2015 | 0 126 | 126
2016 | 0 126 | 126
2017 | 0 126 | 126
2018 | 0 126 | 126
2019 | 0 126 | 126
2020 | 0 126 | 126
2021 | 0 126 | 126
-----------+----------------------+----------
Total | 1,134 1,638 | 2,772
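9 Treatment variable
Create an indicator variable that identifies the units that received the treatment, where:
- 0 marks units that were never treated, for example states that never implemented the policy;
- 1 marks units that were treated, for example states that implemented the policy.
For example, if the three states abc, xyz and cgi form the treatment group and the state name is stored as a string, the treatment variable can be created as follows: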
gen treated = (state == "abc" | ///
               state == "xyz" | ///
               state == "cgi") if !missing(state)
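In this example, the treated countries have been saved in a separate, made-up Stata dataset that contains a variable named treated with the value 1.
Next we merge that file so that the treatment variable becomes available in the main dataset.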
%%stata
merge m:1 country using "C:\Users\asus\Desktop\R\quarto\Med-Stat-Notes\Data\Treated.dta", gen(merge1)
Result Number of obs
-----------------------------------------
Not matched 1,276
from master 1,276 (merge1==1)
from using 0 (merge1==2)
Matched 1,496 (merge1==3)
-----------------------------------------
After the merge, untreated units have a missing value (.) in treated.
%%stata
replace treated = 0 if treated == .
(1,276 real changes made)
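The merge-then-fill pattern can be pictured with plain Python dictionaries. This is a sketch under stated assumptions: the list master stands in for WDI.dta and the dictionary treated_lookup stands in for Treated.dta, both reduced to a couple of countries whose status matches the tabulation of country by treated.

```python
# Master rows stand in for WDI.dta; only two countries and two years are shown.
master = [
    {"country": "Albania", "year": 2008},
    {"country": "Albania", "year": 2009},
    {"country": "Algeria", "year": 2008},
    {"country": "Algeria", "year": 2009},
]

# Lookup stands in for Treated.dta: one entry per treated country, value 1.
treated_lookup = {"Albania": 1}

for row in master:
    # dict.get returns None for unmatched countries, much like the missing
    # value (.) that merge m:1 leaves in rows found only in the master data.
    row["treated"] = treated_lookup.get(row["country"])
    # Equivalent of: replace treated = 0 if treated == .
    if row["treated"] is None:
        row["treated"] = 0

print([(r["country"], r["year"], r["treated"]) for r in master])
```

Every row of a treated country receives 1 because the lookup is keyed by country only, which is what the m:1 (many-to-one) merge expresses.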
Inspect the data
%%stata
tab country treated
| treated
Country Name | 0 1 | Total
----------------------+----------------------+----------
Albania | 0 22 | 22
Algeria | 22 0 | 22
Argentina | 22 0 | 22
Armenia | 0 22 | 22
Australia | 22 0 | 22
Austria | 22 0 | 22
Bahamas, The | 22 0 | 22
Bangladesh | 0 22 | 22
Belarus | 0 22 | 22
Belgium | 22 0 | 22
Belize | 22 0 | 22
Benin | 22 0 | 22
Bhutan | 0 22 | 22
Bolivia | 0 22 | 22
Bosnia and Herzegov.. | 0 22 | 22
Botswana | 22 0 | 22
Brazil | 22 0 | 22
Brunei Darussalam | 0 22 | 22
Bulgaria | 0 22 | 22
Burundi | 0 22 | 22
Cambodia | 0 22 | 22
Cameroon | 0 22 | 22
Canada | 22 0 | 22
Chile | 0 22 | 22
Colombia | 0 22 | 22
Comoros | 22 0 | 22
Congo, Dem. Rep. | 22 0 | 22
Congo, Rep. | 22 0 | 22
Costa Rica | 0 22 | 22
Cote d'Ivoire | 22 0 | 22
Croatia | 0 22 | 22
Cuba | 22 0 | 22
Cyprus | 0 22 | 22
Czechia | 0 22 | 22
Denmark | 22 0 | 22
Dominican Republic | 0 22 | 22
Ecuador | 22 0 | 22
Egypt, Arab Rep. | 0 22 | 22
El Salvador | 0 22 | 22
Estonia | 0 22 | 22
Eswatini | 0 22 | 22
Finland | 22 0 | 22
France | 22 0 | 22
Gabon | 22 0 | 22
Germany | 22 0 | 22
Greece | 0 22 | 22
Guatemala | 0 22 | 22
Haiti | 22 0 | 22
Honduras | 22 0 | 22
Hong Kong SAR, China | 0 22 | 22
Hungary | 22 0 | 22
Iceland | 0 22 | 22
India | 22 0 | 22
Indonesia | 0 22 | 22
Iran, Islamic Rep. | 0 22 | 22
Ireland | 0 22 | 22
Israel | 22 0 | 22
Italy | 22 0 | 22
Japan | 0 22 | 22
Jordan | 22 0 | 22
Kazakhstan | 0 22 | 22
Kenya | 22 0 | 22
Korea, Rep. | 22 0 | 22
Kyrgyz Republic | 22 0 | 22
Latvia | 22 0 | 22
Lebanon | 22 0 | 22
Lithuania | 0 22 | 22
Luxembourg | 0 22 | 22
Macao SAR, China | 0 22 | 22
Madagascar | 0 22 | 22
Malaysia | 0 22 | 22
Mali | 0 22 | 22
Malta | 22 0 | 22
Mauritania | 0 22 | 22
Mauritius | 22 0 | 22
Mexico | 0 22 | 22
Moldova | 0 22 | 22
Morocco | 0 22 | 22
Mozambique | 0 22 | 22
Namibia | 22 0 | 22
Netherlands | 0 22 | 22
New Zealand | 0 22 | 22
Nicaragua | 0 22 | 22
Niger | 22 0 | 22
Nigeria | 22 0 | 22
North Macedonia | 0 22 | 22
Norway | 22 0 | 22
Oman | 0 22 | 22
Pakistan | 22 0 | 22
Panama | 0 22 | 22
Paraguay | 0 22 | 22
Peru | 22 0 | 22
Philippines | 22 0 | 22
Poland | 0 22 | 22
Portugal | 22 0 | 22
Romania | 0 22 | 22
Russian Federation | 22 0 | 22
Rwanda | 0 22 | 22
Saudi Arabia | 0 22 | 22
Senegal | 22 0 | 22
Serbia | 22 0 | 22
Sierra Leone | 22 0 | 22
Singapore | 0 22 | 22
Slovak Republic | 22 0 | 22
Slovenia | 0 22 | 22
Solomon Islands | 22 0 | 22
South Africa | 22 0 | 22
Spain | 0 22 | 22
Sri Lanka | 0 22 | 22
Sudan | 0 22 | 22
Sweden | 0 22 | 22
Switzerland | 0 22 | 22
Tanzania | 22 0 | 22
Thailand | 22 0 | 22
Timor-Leste | 22 0 | 22
Tunisia | 0 22 | 22
Turkiye | 0 22 | 22
Uganda | 0 22 | 22
Ukraine | 0 22 | 22
United Kingdom | 22 0 | 22
United States | 0 22 | 22
Uruguay | 0 22 | 22
Uzbekistan | 22 0 | 22
Vietnam | 0 22 | 22
West Bank and Gaza | 22 0 | 22
Zimbabwe | 22 0 | 22
----------------------+----------------------+----------
Total | 1,276 1,496 | 2,772
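10 DID indicator variable
The DID (difference-in-differences) indicator is the interaction between the treatment variable and the before/after variable.
In this example the treatment variable is named treated and the before/after variable is named after (replace these with the variable names in your own data).
Create the DID indicator: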
%%stata
gen did = after * treated
Create a labeled numeric variable to serve as the group or panel variable, so that Stata's panel commands can recognize the panel structure of the data.
%%stata
encode country, gen(country1)
Declare the data as panel data (only needed for commands that start with xt)
%%stata
xtset country1 year
Panel variable: country1 (strongly balanced)
Time variable: year, 2000 to 2021
Delta: 1 unit
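The event occurs at the same time in all treated units
Use Stata's xtdidregress or didregress command for the difference-in-differences analysis:
- for panel data, use xtdidregress;
- for repeated cross-sectional data (that is, sample surveys taken at different points in time), use didregress.
The two data types differ as follows:
- Panel data: the same set of units is observed at several time points (for example, the same countries or individuals followed over time).
- Repeated cross-sectional data: a different sample of units is drawn at each time point (for example, an annual national sample survey).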
11 Using Stata's xtdidregress command
Available only in Stata 17 and later (manual estimation is covered later).
For a detailed description of the command and examples, type: help xtdidregress
%%stata
xtdidregress (gdppc) (did), group(country1) time(year)
Treatment and time information
Time variable: year
Control: did = 0
Treatment: did = 1
-----------------------------------
| Control Treatment
-------------+---------------------
Group |
country1 | 58 68
-------------+---------------------
Time |
Minimum | 2000 2009
Maximum | 2000 2009
-----------------------------------
Difference-in-differences regression Number of obs = 2,772
Data type: Longitudinal
(Std. err. adjusted for 126 clusters in country1)
------------------------------------------------------------------------------
| Robust
gdppc | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
ATET |
did |
(1 vs 0) | 1164.492 610.0838 1.91 0.059 -42.93971 2371.923
------------------------------------------------------------------------------
Note: ATET estimate adjusted for panel effects and time effects.
11.1 Parallel trends
For details and examples of the postestimation commands for didregress, type: help xtdidregress_postestimation
Run xtdidregress
%%stata
xtdidregress (gdppc) (did), group(country1) time(year)
Treatment and time information
Time variable: year
Control: did = 0
Treatment: did = 1
-----------------------------------
| Control Treatment
-------------+---------------------
Group |
country1 | 58 68
-------------+---------------------
Time |
Minimum | 2000 2009
Maximum | 2000 2009
-----------------------------------
Difference-in-differences regression Number of obs = 2,772
Data type: Longitudinal
(Std. err. adjusted for 126 clusters in country1)
------------------------------------------------------------------------------
| Robust
gdppc | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
ATET |
did |
(1 vs 0) | 1164.492 610.0838 1.91 0.059 -42.93971 2371.923
------------------------------------------------------------------------------
Note: ATET estimate adjusted for panel effects and time effects.
12 Visualizing xtdidregress
For visualization examples and more information, type: help xtdidregress_postestimation
%%stata
xtdidregress (gdppc) (did), group(country1) time(year)
estat trendplots, ytitle(GDP pc)
. xtdidregress (gdppc) (did), group(country1) time(year)
Treatment and time information
Time variable: year
Control: did = 0
Treatment: did = 1
-----------------------------------
| Control Treatment
-------------+---------------------
Group |
country1 | 58 68
-------------+---------------------
Time |
Minimum | 2000 2009
Maximum | 2000 2009
-----------------------------------
Difference-in-differences regression Number of obs = 2,772
Data type: Longitudinal
(Std. err. adjusted for 126 clusters in country1)
------------------------------------------------------------------------------
| Robust
gdppc | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
ATET |
did |
(1 vs 0) | 1164.492 610.0838 1.91 0.059 -42.93971 2371.923
------------------------------------------------------------------------------
Note: ATET estimate adjusted for panel effects and time effects.
. estat trendplots, ytitle(GDP pc)
.
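13 Using OLS fixed-effects regression (manual estimation)
13.1 Basic DID regression: all units experience the event at the same time
- Create a labeled numeric variable to serve as the group or panel variable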
encode country, gen(country1)
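- The DID regression does not need separate after and treated variables, because the panel and time fixed effects are already included
When a fixed-effects model (such as xtreg, fe or areg with absorb()) is used for a difference-in-differences regression, only the interaction between the treatment group and time (that is, treated#after) needs to be included, because:
- the unit fixed effects already control for time-invariant differences between the treatment and control groups;
- the time fixed effects already control for the trend common to all units over time.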
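To see why the main effects can be dropped, the following Python sketch removes unit and year means from both the outcome and the did indicator and then regresses one on the other. It is an illustration only: the function twfe_did, the toy panel and the true effect of 3.0 are hypothetical, the double-demeaning shortcut is exact only for a balanced panel, and no standard errors are computed.

```python
from statistics import mean

def twfe_did(rows):
    """Coefficient on did with unit and year fixed effects (balanced panel).

    rows: list of (unit, year, did, y) tuples.
    Applies x - unit mean - year mean + grand mean to did and y, then
    computes the no-intercept OLS slope of y on did.
    """
    def demean(idx):
        grand = mean(r[idx] for r in rows)
        by_unit, by_year = {}, {}
        for r in rows:
            by_unit.setdefault(r[0], []).append(r[idx])
            by_year.setdefault(r[1], []).append(r[idx])
        unit_mean = {u: mean(v) for u, v in by_unit.items()}
        year_mean = {t: mean(v) for t, v in by_year.items()}
        return [r[idx] - unit_mean[r[0]] - year_mean[r[1]] + grand for r in rows]

    d, y = demean(2), demean(3)
    return sum(di * yi for di, yi in zip(d, y)) / sum(di * di for di in d)

# Toy balanced panel: units C and D are treated from 2009 onward.
unit_effect = {"A": 1.0, "B": 2.0, "C": 5.0, "D": 7.0}
year_effect = {2007: 0.0, 2008: 1.0, 2009: 4.0, 2010: 2.0}
true_effect = 3.0

rows = []
for unit, a in unit_effect.items():
    for year, b in year_effect.items():
        did = float(unit in ("C", "D") and year >= 2009)
        rows.append((unit, year, did, a + b + true_effect * did))

estimate = twfe_did(rows)
print(round(estimate, 6))
```

The unit and year effects differ widely across units and years, yet the estimate recovers the effect built into the toy data, because both sets of effects are removed by the demeaning step before the slope is computed.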
xtreg gdppc did i.year, fe vce(cluster country1)
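The coefficient on did is the difference-in-differences (DID) estimator. The effect is not significant at the 5% level (P>|t| > 0.05), so there is no statistically significant evidence that the event affected the dependent variable.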
13.2 Visualizing parallel trends
bysort year treated: egen mean_gdppc = mean(gdppc)
twoway line mean_gdppc year if treated == 0, sort || ///
line mean_gdppc year if treated == 1, sort lpattern(dash) ///
legend(label(1 "Control") label(2 "Treated")) ///
xline(2009)