Posted on June 24, 2023
Average Treatment Effect
Li Gan
Nov, 2007
1. The Regression Method:
We are interested in the average change in an outcome y caused by a treatment.
Denote by y1 the outcome with treatment and by y0 the outcome without treatment.
The Average Treatment Effect is defined as:
ATE = E(y1 – y0) (1)
The difficulty in estimating (1) is that we observe either y1 or y0, not both, for each person.
More precisely, let w = 1 if treated and w = 0 otherwise. The observed outcome y can be written as:
y = (1-w) y0 + w y1. (2)
If w is independent of (y0, y1), then:
E(y1-y0) = E(y1-y0 |w) = E(y1|w=1) – E(y0|w=0) = E(y|w=1) – E(y|w=0)
so the ATE is the difference in mean observed outcomes between the two groups.
In fact, we only need a weaker assumption than independence, namely mean
independence: E(y0|w) = E(y0), E(y1|w) = E(y1).
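The claim that a simple difference in means recovers the ATE when assignment is independent of the potential outcomes can be checked with a small simulation. This is only a sketch: the data-generating process, the true ATE of 2, and the function name `simulate_difference_in_means` are illustrative assumptions, not part of the notes.

```python
import random
import statistics

def simulate_difference_in_means(n=20000, seed=0):
    """Randomized treatment: compare the difference in means with the sample ATE."""
    rng = random.Random(seed)
    treated, control, effects = [], [], []
    for _ in range(n):
        y0 = rng.gauss(1.0, 1.0)               # outcome without treatment
        y1 = y0 + 2.0 + rng.gauss(0.0, 1.0)    # outcome with treatment (assumed ATE = 2)
        w = rng.random() < 0.5                 # assignment independent of (y0, y1)
        effects.append(y1 - y0)
        # Only one potential outcome is observed: y = (1 - w) y0 + w y1.
        if w:
            treated.append(y1)
        else:
            control.append(y0)
    difference = statistics.fmean(treated) - statistics.fmean(control)
    return difference, statistics.fmean(effects)

estimate, sample_ate = simulate_difference_in_means()
print(f"difference in means: {estimate:.3f}, sample ATE: {sample_ate:.3f}")
```

The second returned value uses both potential outcomes, which is possible only in a simulation; with real data only the first quantity is available.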
Now let:
y0 = μ0 + v0, E(v0) = 0
y1 = μ1 + v1, E(v1) = 0
Therefore, (2) can be written as:
y = (1−w)y0 + wy1
  = μ0 + (μ1−μ0)w + v0 + w(v1−v0) (3)
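The step from (2) to (3) is a direct substitution of the two decompositions. Written out in LaTeX, using only the symbols defined above:

```latex
\begin{align*}
y &= (1-w)(\mu_0 + v_0) + w(\mu_1 + v_1) \\
  &= \mu_0 + v_0 - w\mu_0 - w v_0 + w\mu_1 + w v_1 \\
  &= \mu_0 + (\mu_1 - \mu_0)\,w + v_0 + w\,(v_1 - v_0).
\end{align*}
```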
First, assume conditional mean independence:
Assumption 1 (ATE 1): (a) E(y0|w,x) = E(y0|x), and (b) E(y1|w,x) = E(y1|x)
Intuition: even though y1 and y0 may be correlated with w, they are uncorrelated
with w if we partial out x.
Taking the expectation of (3) conditional on (w, x) and using ATE 1:
E(y|w,x) = μ0 + αw + g0(x) + w(g1(x) − g0(x)), (4)
where α = μ1 − μ0 is the Average Treatment Effect (ATE), and gi(x) = E(vi|x), i = 0, 1.
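Linearization of gi(x):
E(y|w,x) = μ0 + αw + xβ0 + w(x−ψ)δ,
where ψ = E(x). Centering x at ψ in the last term ensures that the linearized
g1(x) − g0(x) = (x−ψ)δ has mean zero, so that α is still the ATE. So the regression to
estimate the ATE α is:
yi on 1, wi, xi, wi(xi − x̄),
where x̄ is the sample mean of xi.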
Here the control functions involve not just xi, but also interactions of the
covariates with the treatment variable.
We can also estimate the treatment effect conditional on x:
ATÊ(x) = α̂ + (x − x̄)δ̂
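The regression of yi on 1, wi, xi and wi(xi − x̄) can be sketched with NumPy as follows. The simulated data (selection on x only, a treatment effect that varies linearly with x, a true ATE of 2) and the helper names `regression_adjusted_ate` and `conditional_ate` are assumptions made for illustration.

```python
import numpy as np

def regression_adjusted_ate(y, w, x):
    """OLS of y on 1, w, x, w*(x - xbar); returns (alpha_hat, delta_hat, xbar)."""
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    xbar = x.mean(axis=0)
    design = np.column_stack([np.ones(len(y)), w, x, w[:, None] * (x - xbar)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    k = x.shape[1]
    # Column order: constant, w, the k covariates, the k demeaned interactions.
    return coef[1], coef[2 + k:], xbar

def conditional_ate(x_value, alpha_hat, delta_hat, xbar):
    """Estimated effect at x: alpha_hat + (x - xbar) delta_hat."""
    return alpha_hat + (np.atleast_1d(x_value) - xbar) @ delta_hat

rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(1.0, 1.0, size=n)
# Selection on observables: the treatment probability depends on x only.
w = (rng.random(n) < 1.0 / (1.0 + np.exp(-(x - 1.0)))).astype(float)
y0 = 1.0 + 0.5 * x + rng.normal(size=n)
y1 = y0 + 2.0 + 1.0 * (x - 1.0)      # effect varies with x; ATE = 2 because E(x) = 1
y = (1 - w) * y0 + w * y1

alpha_hat, delta_hat, xbar = regression_adjusted_ate(y, w, x)
naive = y[w == 1].mean() - y[w == 0].mean()
print(f"regression-adjusted ATE: {alpha_hat:.3f}, naive difference: {naive:.3f}")
print(f"estimated effect at x = 2: {conditional_ate(2.0, alpha_hat, delta_hat, xbar):.3f}")
```

In this design the naive difference in means is biased upward because treated units have larger x, while the coefficient on w in the interacted regression stays close to the assumed ATE.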
2. Propensity Score:
Let p(x) = Pr(w=1|x).
(w – p(x)) y = (w – p(x))(wy1 + (1-w) y0)
= wy1 – p(x) (1-w) y0 – p(x)wy1
Take conditional expectation with respect to y:
Ey[(w – p(x))y|w,x]= wm1(x)– p(x) (1-w) m0(x)– p(x)wm1(x),
where E(yj|w,x)= E(yj|x)=mj(x). Taking expectation with respect to w:
Ew{Ey[(w – p(x))y|w,x]|x}
= Ew[wm1(x)– p(x) (1-w) m0(x)– p(x)wm1(x)]
= p(x)m1(x)– p(x) (1- p(x)) m0(x)– p(x) p(x)m1(x)
= m1(x)p(x)(1-p(x))- m0(x)p(x)(1- p(x))
=(m1(x)-m0(x))p(x)(1-p(x))
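Therefore, provided 0 < p(x) < 1,
ATE(x) = m1(x) − m0(x) = E((w − p(x))y | x) / [p(x)(1 − p(x))],
and averaging over the distribution of x gives
ATE = E{ (w − p(x))y / [p(x)(1 − p(x))] }.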
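A sample analogue of the propensity-score formula averages (w − p(x))y / [p(x)(1 − p(x))] over observations. The sketch below assumes a discrete covariate so that p(x) can be estimated by the treated share in each cell; the data-generating process and the name `propensity_weighted_ate` are illustrative assumptions.

```python
import numpy as np

def propensity_weighted_ate(y, w, p):
    """Sample mean of (w - p) y / (p (1 - p)); requires 0 < p < 1."""
    return np.mean((w - p) * y / (p * (1.0 - p)))

rng = np.random.default_rng(1)
n = 100_000
x = rng.integers(0, 3, size=n)                  # covariate taking values 0, 1, 2
true_p = np.array([0.3, 0.5, 0.7])[x]           # Pr(w = 1 | x), away from 0 and 1
w = (rng.random(n) < true_p).astype(float)
y0 = 1.0 + x + rng.normal(size=n)
y1 = y0 + 2.0 + 0.5 * (x - 1)                   # ATE = 2 because E(x) = 1
y = (1 - w) * y0 + w * y1

# With a discrete x, estimate p(x) by the share of treated units in each cell.
cell_share = np.array([w[x == k].mean() for k in range(3)])
p_hat = cell_share[x]
ate_hat = propensity_weighted_ate(y, w, p_hat)
print(f"propensity-weighted ATE: {ate_hat:.3f}")
```

With a continuous x the cell shares would be replaced by a fitted binary response model such as a logit or probit.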
A simple and popular estimator in program evaluation is obtained from the OLS
regression:
yi on 1, wi, p̂(xi)
where the coefficient on wi is the estimate of the treatment effect. In other words, the
estimated propensity score plays the role of the control function.
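The regression of yi on 1, wi and the estimated propensity score might look like this in NumPy. The sketch assumes a constant treatment effect of 2 and a logit propensity score fitted by Newton-Raphson. The helper `fit_logit` and the simulated data are hypothetical and are not taken from the notes.

```python
import numpy as np

def fit_logit(features, w, iterations=25):
    """Logit coefficients by Newton-Raphson; `features` must include a constant."""
    gamma = np.zeros(features.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-features @ gamma))
        gradient = features.T @ (w - p)
        hessian = (features * (p * (1.0 - p))[:, None]).T @ features
        gamma += np.linalg.solve(hessian, gradient)
    return gamma

rng = np.random.default_rng(2)
n = 50_000
x = rng.normal(size=n)
w = (rng.random(n) < 1.0 / (1.0 + np.exp(-x))).astype(float)
y = 1.0 + x + 2.0 * w + rng.normal(size=n)       # assumed constant effect of 2

features = np.column_stack([np.ones(n), x])
p_hat = 1.0 / (1.0 + np.exp(-features @ fit_logit(features, w)))

# OLS of y on 1, w, p_hat: the coefficient on w estimates the treatment effect.
design = np.column_stack([np.ones(n), w, p_hat])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
naive = y[w == 1].mean() - y[w == 0].mean()
print(f"coefficient on w: {coef[1]:.3f}, naive difference: {naive:.3f}")
```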
3. Dummy Endogenous Variables
Consider the model again:
y = μ0 + αw + xβ0 + u0, (4)
where w is endogenous. Again, w = 1 if treated, and 0 otherwise.
Assume that Pr(w=1|x,z) = G(x, z; γ), where z denotes variables excluded from x.
Procedure 1:
(1) Estimate the binary response model Pr(wi=1|xi,zi) = G(xi,zi;γ), and obtain the
fitted values Ĝi.
(2) Estimate (4) by IV using instruments 1, Ĝi and xi.
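The two steps of Procedure 1 can be sketched as follows. The logit form for G, the simulated data with a logistic treatment error, the true α of 2, and the helpers `fit_logit` and `iv_estimate` are all assumptions made for illustration.

```python
import numpy as np

def fit_logit(features, w, iterations=25):
    """Logit coefficients by Newton-Raphson; `features` must include a constant."""
    gamma = np.zeros(features.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-features @ gamma))
        gradient = features.T @ (w - p)
        hessian = (features * (p * (1.0 - p))[:, None]).T @ features
        gamma += np.linalg.solve(hessian, gradient)
    return gamma

def iv_estimate(y, regressors, instruments):
    """Just-identified IV estimator (Z'X)^{-1} Z'y."""
    return np.linalg.solve(instruments.T @ regressors, instruments.T @ y)

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(size=n)
z = rng.normal(size=n)                       # excluded from the outcome equation
e = rng.logistic(size=n)                     # error in the treatment equation
w = (0.5 * x + z + e > 0).astype(float)      # so Pr(w=1|x,z) is a logit in (x, z)
u0 = 0.5 * e + rng.normal(size=n)            # correlated with e: w is endogenous
y = 1.0 + 2.0 * w + 0.5 * x + u0             # assumed alpha = 2

ones = np.ones(n)
# Step (1): fit the binary response model and form the fitted values G_hat.
first_stage = np.column_stack([ones, x, z])
g_hat = 1.0 / (1.0 + np.exp(-first_stage @ fit_logit(first_stage, w)))
# Step (2): IV estimation of (4) with instruments 1, G_hat, x.
regressors = np.column_stack([ones, w, x])
instruments = np.column_stack([ones, g_hat, x])
iv = iv_estimate(y, regressors, instruments)
ols = np.linalg.lstsq(regressors, y, rcond=None)[0]
print(f"IV estimate of alpha: {iv[1]:.3f}, OLS estimate: {ols[1]:.3f}")
```

In this design OLS overstates α because the unobservable that drives treatment also raises the outcome, while the IV estimate stays near the assumed value.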
Procedure 1 has important robustness properties:
(a) Because we use Ĝi as an IV, the model Pr(wi=1|xi,zi) = G(xi,zi;γ) does not
have to be correctly specified.
(b) Technically, α and β0 are identified even if we do not have extra variables
excluded from x, but the estimator can rarely be justified in this case,
because identification then rests entirely on functional form.
Suppose that w given x follows a probit model (no z). Because G(x, γ)
= Φ(γ0 + xγ1) is a nonlinear function of x, it is not perfectly correlated
with x, so it can be used as an IV for w.
(c) It is important to recognize that Procedure 1 is not the same as
using Ĝ as a regressor in place of w, that is, running the OLS regression
yi on 1, Ĝi and xi.
Consistency of the OLS estimators from the regression:
yi = δ0 + αĜi + xiβ0 + ui (5)
would rely on G( ) being correctly specified. Note that (5) also has problems with
standard errors that need to be corrected.
Allow an interaction term:
yi = δ0 + αwi + xiβ0 + wi(xi − x̄)δ + ei (6)
Procedure 2:
(a) Estimate Pr(wi=1|xi,zi) = G(xi,zi;γ).
(b) Use 1, Ĝi, xi, and Ĝi(xi − x̄) as IVs.
The discussion is the same as before.
4. Regression discontinuity
It is useful to distinguish between two general settings, the Sharp and the Fuzzy
Regression Discontinuity designs. In the sharp design, the assignment wi is a
deterministic function of one of the covariates, the forcing (or treatment-determining)
variable x:
Sharp design:
wi = 1(xi > x0)
All units with xi > x0 are assigned to the treatment group (and participation is
mandatory for these individuals), and all units with xi ≤ x0 are assigned to the control
group. In this sharp design, we look at the discontinuity in the conditional expectation of
the outcome given the forcing variable to uncover the ATE at the cutoff:
ATE = lim_{x↓x0} E[y|x] − lim_{x↑x0} E[y|x] = E(y1 − y0 | x = x0)
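The jump in E[y|x] at x0 in the sharp design can be estimated by fitting a line on each side of the cutoff and differencing the two intercepts. The sketch below assumes a window of width h on each side with equal weights, and simulated data with a true jump of 1.5. The name `sharp_rd_estimate` is hypothetical.

```python
import numpy as np

def sharp_rd_estimate(y, x, cutoff, bandwidth):
    """Difference of the two one-sided linear fits evaluated at the cutoff."""
    right = (x > cutoff) & (x < cutoff + bandwidth)
    left = (x <= cutoff) & (x > cutoff - bandwidth)
    # Centre x at the cutoff so each intercept estimates a one-sided limit of E[y|x].
    _, intercept_right = np.polyfit(x[right] - cutoff, y[right], 1)
    _, intercept_left = np.polyfit(x[left] - cutoff, y[left], 1)
    return intercept_right - intercept_left

rng = np.random.default_rng(4)
n = 20_000
x = rng.uniform(-1.0, 1.0, size=n)
w = (x > 0.0).astype(float)                  # sharp design: w = 1(x > x0), x0 = 0
y = 1.0 + 0.8 * x + 1.5 * w + rng.normal(scale=0.5, size=n)
rd_hat = sharp_rd_estimate(y, x, cutoff=0.0, bandwidth=0.25)
print(f"estimated jump at the cutoff: {rd_hat:.3f}")
```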
Fuzzy design:
E(wi|xi = x) = Pr(wi = 1|xi = x) is discontinuous at a known value x0.
The sharp and fuzzy designs differ in that in the sharp design the treatment
assignment is deterministic given x, while in the fuzzy design the treatment assignment may
depend on additional factors unobserved by the econometrician. In both designs, the
discontinuity point x0 is known.
Assumption (RD):
(i) w+ = lim_{x↓x0} E(w|x) and w− = lim_{x↑x0} E(w|x) exist.
(ii) w+ ≠ w−.
In Angrist and Lavy (1999), an identifying assumption would be that the class
size for a student in a school with a number of pupils approaching (for example) 800
from above differs from that of a student in a school with a number of pupils approaching 800
from below.
Assumption: E(y1i – y0i |xi = x) is continuous in x at x0.
This assumption is plausible when we have reason to believe that persons close to the
threshold x0 are similar and thus would experience similar outcomes absent treatment.
Theorem: Let y+ = lim_{x↓x0} E(y|x) and y− = lim_{x↑x0} E(y|x). The ATE, denoted as α, is:
α = (y+ − y−) / (w+ − w−)
Proof:
Let Δ be a small positive number.
E(y|x0+Δ) − E(y|x0−Δ)
= E((y1−y0)w + y0|x0+Δ) − E((y1−y0)w + y0|x0−Δ)
= E((y1−y0)w|x0+Δ) − E((y1−y0)w|x0−Δ) + (E(y0|x0+Δ) − E(y0|x0−Δ))
= α(E(w|x0+Δ) − E(w|x0−Δ)) + (E(y0|x0+Δ) − E(y0|x0−Δ))
The last equality takes the treatment effect y1 − y0 to equal α for units near x0.
As Δ → 0, we have:
y+ − y− = α(w+ − w−)
Here we use the assumption that E(y0|x) is continuous at x0, so the last term vanishes.
The conclusion follows.
Given this theorem, we can obtain an estimate of α by estimating y+, y−, w+, and
w−. There are several ways to do this. The most popular is to do it non-parametrically.
In practice,
ŷ+ = ∑ yi 1(x0 < xi < x0+h) / ∑ 1(x0 < xi < x0+h)
ŷ− = ∑ yi 1(x0−h < xi < x0) / ∑ 1(x0−h < xi < x0)
ŵ+ = ∑ wi 1(x0 < xi < x0+h) / ∑ 1(x0 < xi < x0+h)
ŵ− = ∑ wi 1(x0−h < xi < x0) / ∑ 1(x0−h < xi < x0)
where h is the bandwidth. An interesting note is that this is numerically equivalent to an IV
estimator for the regression of yi on wi for people in the subsample (x0−h < xi < x0+h),
using 1(x0 < xi) as the instrument. This is useful because one can add control variables in the
regression.
Practically, for a sharp design,
1. Graph the data by computing the average value of the outcome variable over a set of
bins. The bandwidth has to be large enough to give sufficient precision so that the plots
look smooth on either side of the cutoff value, but at the same time small enough to make
the jump around the cutoff value clear.
2. Estimate the treatment effect by running linear regressions on both sides of the cutoff
point. Since we propose to use a rectangular kernel, these are just standard regressions
estimated within a bin of width h on both sides of the cutoff point.
Fuzzy design:
1. Graph the average outcome over a set of bins as in the case of the sharp design, but also
graph the probability of treatment.
2. Estimate the treatment effect using TSLS.
Note that:
i. Standard errors can be computed using standard least squares methods (robust standard
errors).
ii. The optimal bandwidth can be chosen using cross-validation methods.
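The numerical equivalence between the ratio of local mean differences and the IV estimator within the window can be checked directly. This is a minimal sketch on simulated data; the take-up probabilities, the true effect of 2, and the function names `fuzzy_rd_wald` and `fuzzy_rd_iv` are assumptions made for illustration.

```python
import numpy as np

def fuzzy_rd_wald(y, w, x, cutoff, bandwidth):
    """Ratio of the jump in mean y to the jump in mean w across the cutoff."""
    above = (x > cutoff) & (x < cutoff + bandwidth)
    below = (x > cutoff - bandwidth) & (x <= cutoff)
    jump_y = y[above].mean() - y[below].mean()
    jump_w = w[above].mean() - w[below].mean()
    return jump_y / jump_w

def fuzzy_rd_iv(y, w, x, cutoff, bandwidth):
    """IV slope of y on w inside the window, instrumenting w with 1(x > cutoff)."""
    window = (x > cutoff - bandwidth) & (x < cutoff + bandwidth)
    z = (x[window] > cutoff).astype(float)
    return np.cov(z, y[window])[0, 1] / np.cov(z, w[window])[0, 1]

rng = np.random.default_rng(5)
n = 40_000
x = rng.uniform(-1.0, 1.0, size=n)
take_up = 0.2 + 0.6 * (x > 0)                # Pr(w=1|x) jumps from 0.2 to 0.8 at x0 = 0
w = (rng.random(n) < take_up).astype(float)
y = 1.0 + 0.2 * x + 2.0 * w + rng.normal(scale=0.5, size=n)   # assumed effect of 2

wald = fuzzy_rd_wald(y, w, x, cutoff=0.0, bandwidth=0.2)
iv = fuzzy_rd_iv(y, w, x, cutoff=0.0, bandwidth=0.2)
print(f"Wald ratio: {wald:.4f}, IV estimate: {iv:.4f}")
```

The two numbers agree up to floating-point error. Both carry a small bias from the slope of y in x inside the window, which is why linear regressions on each side of the cutoff are often preferred to plain local means.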