Notes on Treatment Effects: Advanced Econometrics Notes (Texas A&M University, Prof. Li Gan)

Posted June 24, 2023

Average Treatment Effect

Li Gan

Nov, 2007

1. The Regression Method:

We are interested in the average change in an outcome y caused by a treatment. Denote by y1 the outcome with treatment and by y0 the outcome without treatment. The Average Treatment Effect is defined as:

ATE = E(y1 – y0) (1)

The difficulty in estimating (1) is that, for each person, we observe either y1 or y0, never both.

More precisely, let w = 1 if the person is treated and w = 0 otherwise. The observed outcome y can be written as:

y = (1-w) y0 + w y1. (2)

If w is independent of (y0, y1), then:

E(y1-y0) = E(y1-y0 |w) = E(y1|w=1) – E(y0|w=0)

In fact, we only need a weaker assumption than independence, namely mean independence: E(y0|w) = E(y0), E(y1|w) = E(y1).
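As a quick numerical check of this argument, here is a minimal Python (NumPy) sketch. It assumes μ0 = 1, μ1 = 3 (so the true ATE is 2) and a coin-flip assignment that is independent of (y0, y1). The function name diff_in_means and all simulation values are illustrative and not part of the original notes.

```python
import numpy as np

def diff_in_means(y, w):
    """Sample analogue of E(y|w=1) - E(y|w=0) using only observed data."""
    return y[w == 1].mean() - y[w == 0].mean()

rng = np.random.default_rng(0)
n = 200_000
y0 = 1.0 + rng.normal(size=n)    # assumed: mu0 = 1
y1 = 3.0 + rng.normal(size=n)    # assumed: mu1 = 3, so ATE = 2
w = rng.integers(0, 2, size=n)   # random assignment, independent of (y0, y1)
y = (1 - w) * y0 + w * y1        # observed outcome, equation (2)

ate_hat = diff_in_means(y, w)
print(round(ate_hat, 3))         # should be close to 2
```

Only y and w enter the estimator; y0 and y1 are generated separately just to build the observed outcome.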

Now let:

y0 = μ0 + v0, E(v0) = 0

y1 = μ1 + v1, E(v1) = 0

Therefore, (2) can be written as:

y = (1 − w)y0 + wy1 = μ0 + (μ1 − μ0)w + v0 + w(v1 − v0). (3)

First, assume conditional mean independence:

Assumption 1 (ATE 1): (a) E(y0|w,x) = E(y0|x), and (b) E(y1|w,x) = E(y1|x)

Intuition: even though y1 and y0 may be correlated with w, they are uncorrelated

with w if we partial out x.

Taking the expectation of (3) conditional on (w, x), and using ATE 1:

E(y|w,x) = μ0 + αw + g0(x) + w(g1(x)- g0(x)), (4)

where α=μ1- μ0 is the Average Treatment Effect (ATE), and gi(x)=E(vi|x).

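Linearization of gi(x): taking g0(x) and g1(x) − g0(x) to be linear in x,

E(y|w,x) = μ0 + αw + xβ0 + w(x − ψ)δ,

where ψ = E(x). Demeaning x in the last term ensures that E[g1(x) − g0(x)] = 0, so that α remains the ATE. So the regression to estimate the ATE α is:

yi on 1, wi, xi, wi(xi − x̄),

where x̄ is the sample mean of xi.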

Here the control functions involve not just xi, but also interactions of the

covariates with the treatment variable.

We can also estimate the treatment effect conditional on x, ATE(x) = E(y1 − y0|x), by:

α̂ + (x − x̄)δ̂
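The regression with demeaned interactions can be sketched as follows. This is a hedged illustration in Python (NumPy), assuming linear g0 and g1, a single covariate with E(x) = 1, and a logistic treatment probability. The function name ate_regression and all numbers in the simulated design are assumptions made for the example.

```python
import numpy as np

def ate_regression(y, w, x):
    """OLS of y on 1, w, x, w*(x - xbar); returns (alpha_hat, delta_hat, xbar)."""
    xbar = x.mean(axis=0)
    X = np.column_stack([np.ones(len(y)), w, x, w[:, None] * (x - xbar)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    k = x.shape[1]
    return coef[1], coef[2 + k:], xbar

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(loc=1.0, size=(n, 1))                     # assumed: E(x) = 1
p = 1.0 / (1.0 + np.exp(-(x[:, 0] - 1.0)))               # treatment depends on x only
w = (rng.uniform(size=n) < p).astype(float)
y0 = 1.0 + 0.5 * (x[:, 0] - 1.0) + rng.normal(size=n)    # mu0 = 1
y1 = 3.0 + 1.5 * (x[:, 0] - 1.0) + rng.normal(size=n)    # mu1 = 3, ATE = 2
y = (1 - w) * y0 + w * y1

alpha_hat, delta_hat, xbar = ate_regression(y, w, x)
# Estimated treatment effect conditional on x = 2: alpha_hat + (x - xbar) delta_hat
ate_at_2 = alpha_hat + (np.array([2.0]) - xbar) @ delta_hat
print(round(alpha_hat, 3), round(float(ate_at_2), 3))    # close to 2 and 3
```

In this design g1(x) − g0(x) = x − 1, so the true δ is 1 and the conditional effect at x = 2 is 3.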

2. Propensity Score:

Let p(x) = Pr(w=1|x).

(w – p(x)) y = (w – p(x))(wy1 + (1-w) y0)

= wy1 – p(x) (1-w) y0 – p(x)wy1

Take conditional expectation with respect to y:

Ey[(w – p(x))y|w,x]= wm1(x)– p(x) (1-w) m0(x)– p(x)wm1(x),

where E(yj|w,x)= E(yj|x)=mj(x). Taking expectation with respect to w:

Ew{Ey[(w – p(x))y|w,x]|x}

= Ew[wm1(x)– p(x) (1-w) m0(x)– p(x)wm1(x)]

= p(x)m1(x)– p(x) (1- p(x)) m0(x)– p(x) p(x)m1(x)

= m1(x)p(x)(1-p(x))- m0(x)p(x)(1- p(x))

=(m1(x)-m0(x))p(x)(1-p(x))

Therefore,

ATE(x) = m1(x) − m0(x) = E[(w − p(x))y | x] / [p(x)(1 − p(x))],

and averaging over x gives ATE = E{(w − p(x))y / [p(x)(1 − p(x))]}.
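The sample analogue of this expression averages (wi − p(xi))yi / [p(xi)(1 − p(xi))] over the observations. The Python (NumPy) sketch below assumes, for simplicity, that the propensity score p(x) is known; in applications it would be estimated. The function name ate_propensity and the simulated design are illustrative assumptions.

```python
import numpy as np

def ate_propensity(y, w, p):
    """Sample mean of (w - p(x)) y / [p(x) (1 - p(x))]."""
    return np.mean((w - p) * y / (p * (1.0 - p)))

rng = np.random.default_rng(2)
n = 200_000
x = rng.uniform(0.0, 2.0, size=n)                  # assumed: E(x) = 1
p = 0.25 + 0.25 * x                                # assumed known p(x), in [0.25, 0.75]
w = (rng.uniform(size=n) < p).astype(float)
y0 = 1.0 + 0.5 * (x - 1.0) + rng.normal(size=n)    # m0(x)
y1 = 3.0 + 1.5 * (x - 1.0) + rng.normal(size=n)    # m1(x); ATE = E[m1 - m0] = 2
y = (1 - w) * y0 + w * y1

ate_hat = ate_propensity(y, w, p)
print(round(ate_hat, 3))                           # should be close to 2
```

Keeping p(x) away from 0 and 1 matters here, because the weights 1/[p(x)(1 − p(x))] become very large near the boundaries.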

A simple and popular estimator in program evaluation is obtained from the OLS regression:

yi on 1, wi, p̂(xi),

where the coefficient on wi is the estimate of the treatment effect. In other words, the estimated propensity score plays the role of the control function.
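A minimal Python (NumPy) sketch of this regression follows. It assumes a logit model for p(x), fitted here by a hand-written Newton iteration (the helper fit_logit is hypothetical, not a library function), and a treatment effect that does not vary with x, the case in which the coefficient on wi is easiest to interpret. All simulation values are assumptions.

```python
import numpy as np

def fit_logit(X, w, iters=25):
    """Logit maximum likelihood by Newton-Raphson (illustrative helper)."""
    gamma = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ gamma))
        grad = X.T @ (w - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        gamma = gamma + np.linalg.solve(hess, grad)
    return gamma

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(size=n)
w = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-x))).astype(float)
y0 = 1.0 + 2.0 * x + rng.normal(size=n)   # outcome without treatment depends on x
y = y0 + 2.0 * w                          # assumed constant treatment effect of 2

X1 = np.column_stack([np.ones(n), x])
p_hat = 1.0 / (1.0 + np.exp(-X1 @ fit_logit(X1, w)))

# OLS of y on 1, w, p_hat(x): the coefficient on w estimates the treatment effect
R = np.column_stack([np.ones(n), w, p_hat])
coef, *_ = np.linalg.lstsq(R, y, rcond=None)
ate_hat = coef[1]

naive = y[w == 1].mean() - y[w == 0].mean()   # ignores selection on x
print(round(ate_hat, 3), round(naive, 3))
```

The naive difference in means is included only for contrast: because treated units have higher x on average, it overstates the effect in this design.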

3. Dummy Endogenous Variables

Consider the model again:

y = μ0 + αw + xβ0 + u0, (4)

where w is endogenous. Again, w = 1 if treated, and 0 otherwise.

Assume that Pr(w=1|x,z) = G(x, z; γ)

Procedure 1:

(1) Estimate the binary response model Pr(wi=1|xi,zi) = G(xi,zi;γ), and obtain the fitted values Ĝi.

(2) Estimate (4) by IV using instruments 1, Ĝi and xi.
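The two steps of Procedure 1 can be sketched in Python (NumPy) as below. The sketch assumes a simulated design in which w is generated from a probit-type rule but G is fitted as a logit, so the first-stage model is deliberately misspecified. The helpers fit_logit and iv_estimate are hypothetical names, and all coefficients are assumptions.

```python
import numpy as np

def fit_logit(X, w, iters=25):
    """Logit maximum likelihood by Newton-Raphson (illustrative helper)."""
    gamma = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ gamma))
        grad = X.T @ (w - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        gamma = gamma + np.linalg.solve(hess, grad)
    return gamma

def iv_estimate(y, X, Z):
    """Just-identified IV estimator: solves Z'X b = Z'y."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)

rng = np.random.default_rng(4)
n = 100_000
x = rng.normal(size=n)
z = rng.normal(size=n)                       # excluded instrument
u0 = rng.normal(size=n)
latent = 0.5 * x + z + 0.8 * u0 + 0.6 * rng.normal(size=n)
w = (latent > 0).astype(float)               # endogenous: depends on u0
y = 1.0 + 2.0 * w + 0.5 * x + u0             # model (4) with alpha = 2

ones = np.ones(n)
# Step (1): binary response model for w given (x, z), fitted values G_hat
XZ = np.column_stack([ones, x, z])
G_hat = 1.0 / (1.0 + np.exp(-XZ @ fit_logit(XZ, w)))

# Step (2): IV estimation of (4) with instruments 1, G_hat, x
X = np.column_stack([ones, w, x])
Z = np.column_stack([ones, G_hat, x])
alpha_iv = iv_estimate(y, X, Z)[1]
alpha_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]
print(round(alpha_iv, 3), round(alpha_ols, 3))
```

OLS is reported only to show the endogeneity bias; the IV estimate should be close to 2 even though the logit form of G is not the true model for w.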

Procedure 1 has an important robustness property:

(a) Because we use Ĝi as an IV, the model Pr(wi=1|xi,zi) = G(xi,zi;γ) does not have to be correctly specified.

(b) Technically, α and β0 are identified even if we do not have extra variables excluded from x, but we can rarely justify the estimator in this case. Suppose that w given x follows a probit model (no z). Because G(x, γ) = Φ(γ0 + xγ1) is a nonlinear function of x, it is not perfectly correlated with x, so it can be used as an IV for w.

(c) In principle, it is important to recognize that Procedure 1 is not the same as using Ĝi as a regressor in place of wi. Consistency of the OLS estimators from the regression of yi on 1, Ĝi and xi,

yi = δ0 + αĜi + xiβ0 + ui, (5)

would rely on G(·) being correctly specified. Note that (5) also has problems with standard errors, which need to be corrected.

Allowing for an interaction term:

yi = δ0 + αwi + xiβ0 + wi(xi − x̄)δ + ei (6)

Procedure 2:

(a) Estimate Pr(wi=1|xi,zi) = G(xi,zi;γ) and obtain the fitted values Ĝi.

(b) Estimate (6) by IV, using 1, Ĝi, xi, and Ĝi(xi − x̄) as IVs.

The discussion is the same as before.

4. Regression discontinuity

It is useful to distinguish between two general settings, the Sharp and the Fuzzy

Regression Discontinuity designs. In the sharp design, the assignment wi is a

deterministic function of one of the covariates, the forcing (or treatment-determining)

variable x:

Sharp design:

wi = 1(xi > x0)

All units with xi > x0 are assigned to the treatment group (and participation is

mandatory for these individuals), and all units with xi ≤ x0 are assigned to the control

group. In this sharp design, we look at the discontinuity in the conditional expectation of

the outcome given the covariates to uncover the ATE:

ATE = lim_{x→x0+} E[y|x] − lim_{x→x0−} E[y|x] = E(y1 − y0 | x = x0)
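To illustrate the sharp design numerically, the Python (NumPy) sketch below approximates the two one-sided limits by sample means of y in a narrow window on each side of x0. The window half-width h, the function name sharp_rd_means, and the simulated design (cutoff at 0, jump of 2) are assumptions made for the example.

```python
import numpy as np

def sharp_rd_means(y, x, x0, h):
    """Mean of y just above x0 minus mean of y just below x0."""
    above = (x > x0) & (x < x0 + h)
    below = (x > x0 - h) & (x <= x0)
    return y[above].mean() - y[below].mean()

rng = np.random.default_rng(5)
n = 200_000
x = rng.uniform(-1.0, 1.0, size=n)
w = (x > 0.0).astype(float)                        # sharp design: w = 1(x > x0), x0 = 0
y = 1.0 + 0.5 * x + 2.0 * w + rng.normal(size=n)   # assumed jump of 2 at the cutoff

jump_hat = sharp_rd_means(y, x, x0=0.0, h=0.05)
print(round(jump_hat, 3))                          # close to 2, up to a small slope bias
```

Because E[y|x] has a nonzero slope in this design, the difference in window means carries a small bias of order h, which shrinks as the window narrows.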

Fuzzy design:

E(wi|xi = x) = Pr(wi = 1|x) is discontinuous at a known value x0.

The sharp and fuzzy designs differ in that, in the sharp design, the treatment assignment is deterministic given x, while in the fuzzy design the treatment assignment may depend on additional factors unobserved by the econometrician. In both designs, the discontinuity point x0 is known.

Assumption (RD):

(i) w+ = lim_{x→x0+} E(w|x) and w− = lim_{x→x0−} E(w|x) exist.

(ii) w+ ≠ w−.

In Angrist and Lavy (1999), an identifying assumption would be that the class size for a student in a school with a number of pupils approaching (for example) 800 from above differs from that of a student in a school with a number of pupils approaching 800 from below.

Assumption: E(y1i – y0i |xi = x) is continuous in x at x0.

This assumption is valid where we have reason to believe that persons close to the threshold x0 are similar and thus would experience similar outcomes absent treatment.

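Theorem: The ATE at x0, denoted α, is:

α = (y+ − y−) / (w+ − w−),

where y+ and y− are the limits of E(y|x) as x approaches x0 from above and from below, defined analogously to w+ and w−.

Proof: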

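Let Δ be a small positive number. With a constant treatment effect α for persons near x0, y = y0 + αw, so:

E(y|x0+Δ) − E(y|x0−Δ) = α[E(w|x0+Δ) − E(w|x0−Δ)] + E(y0|x0+Δ) − E(y0|x0−Δ)

As Δ → 0, we have:

y+ − y− = α(w+ − w−)

Here we use the fact (assumption) that E(y0|x) is continuous at x0 without treatment.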

The conclusion follows.

Given this theorem, we can obtain an estimate of α by estimating y+, y−, w+, and w−. There are several ways to do this; the most popular is to do it nonparametrically. In practice,

ŷ+ = Σ yi 1(x0 < xi < x0+h) / Σ 1(x0 < xi < x0+h), ŷ− = Σ yi 1(x0−h < xi < x0) / Σ 1(x0−h < xi < x0),

ŵ+ = Σ wi 1(x0 < xi < x0+h) / Σ 1(x0 < xi < x0+h), ŵ− = Σ wi 1(x0−h < xi < x0) / Σ 1(x0−h < xi < x0),

where h is the bandwidth. An interesting note is that this is numerically equivalent to an IV estimator for the regression of yi on wi for people in the subsample (x0−h < xi < x0+h), using 1(x0 < xi) as the instrument. This is useful because one can add control variables in the regression.
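The numerical equivalence between the ratio of window means and the IV estimator can be checked directly. The Python (NumPy) sketch below assumes a simulated fuzzy design in which the treatment probability jumps from 0.2 to 0.8 at x0 = 0 and treatment take-up depends on an unobserved variable v that also shifts the outcome. The function names and every number in the design are illustrative assumptions.

```python
import numpy as np

def fuzzy_rd_wald(y, w, x, x0, h):
    """(y+ - y-) / (w+ - w-) using sample means within bandwidth h of x0."""
    above = (x > x0) & (x < x0 + h)
    below = (x > x0 - h) & (x < x0)
    return (y[above].mean() - y[below].mean()) / (w[above].mean() - w[below].mean())

def fuzzy_rd_iv(y, w, x, x0, h):
    """IV regression of y on w in the window, with instrument 1(x > x0)."""
    sub = (x > x0 - h) & (x < x0 + h) & (x != x0)
    t = (x[sub] > x0).astype(float)
    t_c = t - t.mean()                       # demeaning absorbs the intercept
    return (t_c @ y[sub]) / (t_c @ w[sub])

rng = np.random.default_rng(6)
n = 400_000
x = rng.uniform(-1.0, 1.0, size=n)
v = rng.uniform(size=n)                              # unobserved by the econometrician
w = (v < 0.2 + 0.6 * (x > 0.0)).astype(float)        # Pr(w=1|x) jumps from 0.2 to 0.8
y = 1.0 + 2.0 * w + 0.5 * x - (v - 0.5) + rng.normal(size=n)   # assumed alpha = 2

wald = fuzzy_rd_wald(y, w, x, x0=0.0, h=0.1)
iv = fuzzy_rd_iv(y, w, x, x0=0.0, h=0.1)
print(round(wald, 4), round(iv, 4))                  # identical up to rounding error
```

Writing the estimator as an IV regression is what makes it straightforward to append control variables to both the regressor and instrument lists.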

In practice, for a sharp design:

(a) Graph the data by computing the average value of the outcome variable over a set of bins. The bandwidth has to be large enough to give sufficient precision, so that the plots look smooth on either side of the cutoff value, but at the same time small enough to make the jump around the cutoff value clear.

(b) Estimate the treatment effect by running linear regressions on both sides of the cutoff point. Since we propose to use a rectangular kernel, these are just standard regressions estimated within a bin of width h on both sides of the cutoff point.
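The regression step for the sharp design can be sketched as two separate OLS fits, one on each side of the cutoff, with the treatment effect estimated by the difference in fitted values at x0. This Python (NumPy) sketch assumes a rectangular kernel with bandwidth h; the function names and the simulated design (slope 3, jump 2) are illustrative.

```python
import numpy as np

def boundary_intercept(y, x, x0):
    """Fitted value at x0 from an OLS regression of y on 1 and (x - x0)."""
    X = np.column_stack([np.ones(len(x)), x - x0])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]

def sharp_rd_local_linear(y, x, x0, h):
    """Difference of the two one-sided linear fits at the cutoff."""
    right = (x > x0) & (x < x0 + h)
    left = (x > x0 - h) & (x <= x0)
    return (boundary_intercept(y[right], x[right], x0)
            - boundary_intercept(y[left], x[left], x0))

rng = np.random.default_rng(7)
n = 200_000
x = rng.uniform(-1.0, 1.0, size=n)
w = (x > 0.0).astype(float)
y = 1.0 + 3.0 * x + 2.0 * w + rng.normal(size=n)   # assumed slope 3, jump 2

h = 0.2
ll_hat = sharp_rd_local_linear(y, x, x0=0.0, h=h)

# For contrast: raw difference in window means, which absorbs the slope
right = (x > 0.0) & (x < h)
left = (x > -h) & (x <= 0.0)
means_hat = y[right].mean() - y[left].mean()
print(round(ll_hat, 3), round(means_hat, 3))
```

With a steep slope in E[y|x], the difference in window means is pushed away from 2, while the one-sided linear fits remove that slope bias at the same bandwidth.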

Fuzzy design:

(a) Graph the average outcome over a set of bins as in the sharp design, but also graph the probability of treatment.

(b) Estimate the treatment effect using two-stage least squares (TSLS).

Note that:

(i) Standard errors can be computed using standard least squares methods (robust standard errors).

(ii) The optimal bandwidth can be chosen using cross-validation methods.
