## Field Experiments II

**POLSCI 4SS3**

Winter 2024

## Last time

We learned about implementing field experiments

Lots of details!

Sometimes cannot randomly assign

`(stepped-wedge design)`

**Today:** Thinking about how to do better

## Why do better?

Conducting research is expensive

Field experiments are **very** expensive

Even if you had the resources, we have a mandate to do better

## Research ethics

**Belmont report:** Benefits should outweigh costs

Researchers have duties beyond getting review board approval

At a minimum, participating in a study takes time

**Mandate:** Find the most efficient, ethical study before collecting data

Sometimes that means doing more with a *smaller sample*

# Improving Precision

## Two ways to improve precision

\[ SE(\widehat{ATE}) =\\ \sqrt{\frac{\text{Var}(Y_i(0)) + \text{Var}(Y_i(1)) + 2\text{Cov}(Y_i(0), Y_i(1))}{N-1}} \]

## Two ways to improve precision

\[ SE(\widehat{ATE}) =\\ \sqrt{\frac{\text{Var}(Y_i(0)) + \text{Var}(Y_i(1)) + 2\text{Cov}(Y_i(0), Y_i(1))}{\color{#ac1455}{N-1}}} \]

- Increase sample size: make the **denominator** larger

## Two ways to improve precision

\[ SE(\widehat{ATE}) =\\ \sqrt{\frac{\color{#ac1455} {\text{Var}(Y_i(0)) + \text{Var}(Y_i(1)) + 2\text{Cov}(Y_i(0), Y_i(1))}}{N-1}} \]

- Alternative research design: make the **numerator** smaller
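Both levers can be checked numerically. The sketch below is an illustration, not part of the original slides: the `se_ate` helper and the simulated potential outcomes are assumptions. It plugs potential outcomes into the SE formula above (the equal-split version), then shows that increasing \(N\) or reducing outcome variance both shrink the standard error.

```python
import numpy as np

rng = np.random.default_rng(42)

def se_ate(y0, y1):
    """SE of the ATE estimator under a 50/50 split (formula above)."""
    n = len(y0)
    return np.sqrt(
        (np.var(y0, ddof=1) + np.var(y1, ddof=1)
         + 2 * np.cov(y0, y1, ddof=1)[0, 1]) / (n - 1)
    )

# Hypothetical potential outcomes with a constant treatment effect of 2
y0 = rng.normal(0, 1, size=100)
y1 = y0 + 2

print(se_ate(y0, y1))                           # baseline
print(se_ate(np.tile(y0, 4), np.tile(y1, 4)))   # larger N -> smaller SE
print(se_ate(0.5 * y0, 0.5 * y0 + 2))           # less variance -> smaller SE
```

With a constant treatment effect, halving the outcome's standard deviation halves the SE, while quadrupling \(N\) only roughly halves it.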

## Pre-post design

. . .

Outcomes are measured *at least* twice

Once before treatment, once after treatment

. . .

| Condition | \(t=1\) | Treatment | \(t=2\) |
|---|---|---|---|
| \(Z_i=1\) | \(Y_{i, t=1}\) | X | \(Y_{i, t=2}(1)\) |
| \(Z_i=0\) | \(Y_{i, t=1}\) |  | \(Y_{i, t=2}(0)\) |

AKA repeated measures design

## How does this work?

. . .

- Standard ATE estimator:

\[ E[Y_i(1) | Z_i = 1] - E[Y_i(0) | Z_i = 0] \]

. . .

- Pre-post ATE estimator:

\[ E[(Y_{i,t=2}(1) - Y_{i,t=1}) | Z_i = 1] - E[(Y_{i,t=2}(0) - Y_{i,t=1}) | Z_i = 0] \]

## How does this work?

- Standard ATE estimator:

\[ E[Y_i(1) | Z_i = 1] - E[Y_i(0) | Z_i = 0] \]

- Pre-post ATE estimator:

\[ E[(Y_{i,t=2}(1) \color{#ac1455} {- Y_{i,t=1}}) | Z_i = 1] - E[(Y_{i,t=2}(0) \color{#ac1455} {- Y_{i,t=1}}) | Z_i = 0] \]

. . .

- We improve precision by subtracting the variation in the outcome that is unrelated to the treatment
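A quick simulation makes the precision gain concrete. This sketch is an assumption-laden illustration (the data-generating process and all parameter values are made up): pre-treatment outcomes correlate strongly with \(Y(0)\), so subtracting them removes most of the treatment-unrelated variation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau, reps = 200, 1.0, 2000

post_only, pre_post = [], []
for _ in range(reps):
    pre = rng.normal(0, 2, n)             # pre-treatment outcome
    post0 = pre + rng.normal(0, 0.5, n)   # Y(0): highly correlated with pre
    post1 = post0 + tau                   # Y(1): constant effect tau
    z = rng.permutation(n) < n // 2       # complete random assignment
    y = np.where(z, post1, post0)
    # Standard estimator: difference in post-treatment means
    post_only.append(y[z].mean() - y[~z].mean())
    # Pre-post estimator: difference in mean change scores
    d = y - pre
    pre_post.append(d[z].mean() - d[~z].mean())

print(np.std(post_only), np.std(pre_post))  # pre-post varies far less
```

Both estimators are unbiased for \(\tau\), but the sampling distribution of the pre-post estimator is much tighter because the change score has far less variance than the raw outcome.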

## Reasons to use pre-post design

- To increase precision in ATE estimates

. . .

- Most useful when pre-treatment outcomes correlate highly with post-treatment outcomes

. . .

- Problematic when:

. . .

- Pre-treatment outcomes correlate with *potential outcomes*
- Measuring pre-treatment outcomes leads to *attrition*

## Block randomization

Change how randomization happens

Group units in *blocks* or *strata*

Estimate the average treatment effect within each block

Aggregate with a weighted average

## How does it work?

. . .

- Within-block ATE estimator:

\[ \widehat{ATE}_b = E[Y_{ib}(1) | Z_{ib} = 1] - E[Y_{ib}(0) | Z_{ib} = 0] \]

## How does it work?

- Within-block ATE estimator:

\[ \widehat{ATE}_\color{#ac1455}b = E[Y_{i\color{#ac1455}b}(1) | Z_{i\color{#ac1455}b} = 1] - E[Y_{i\color{#ac1455}b}(0) | Z_{i\color{#ac1455}b} = 0] \]

. . .

- Overall ATE estimator:

\[ \widehat{ATE}_{\text{Block}} = \sum_{b=1}^B \frac{n_b}{N} \widehat{ATE}_b \]
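The two formulas above can be computed in a few lines. The observed data below are hypothetical, chosen only for illustration: two equal-sized blocks, each with two treated and two control units.

```python
import numpy as np

# Hypothetical observed data: block label, assignment, outcome
block = np.array([1, 1, 1, 1, 2, 2, 2, 2])
z     = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y     = np.array([4, 2, 5, 1, 8, 3, 9, 4])

ate_blocks, weights = [], []
for b in np.unique(block):
    in_b = block == b
    # Within-block ATE: difference in means among units in block b
    ate_b = y[in_b & (z == 1)].mean() - y[in_b & (z == 0)].mean()
    ate_blocks.append(ate_b)
    weights.append(in_b.sum() / len(y))  # weight n_b / N

# Overall ATE: weighted average of within-block estimates
ate_block = np.dot(weights, ate_blocks)
print(ate_block)  # → 4.0
```

Here the within-block estimates are 3 and 5, and the \(n_b/N\) weights are each 1/2, so the blocked estimate is 4.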

## Illustration

| ID | Block | \(Y_i(0)\) | \(Y_i(1)\) |
|---|---|---|---|
| 1 | 1 | 1 | 4 |
| 2 | 1 | 2 | 5 |
| 3 | 1 | 1 | 4 |
| 4 | 1 | 2 | 5 |
| 5 | 2 | 3 | 8 |
| 6 | 2 | 4 | 9 |
| 7 | 2 | 3 | 8 |
| 8 | 2 | 4 | 9 |

Potential outcomes *correlate* with blocks

True \(ATE = 4\)

Simulate 500 experiments

Compare complete and block randomization

## Simulation

Block randomization yields a narrower distribution of estimates
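A sketch of that simulation, using the potential outcomes from the illustration table (true \(ATE = 4\)); the seed and implementation details are my own assumptions. Complete randomization treats any 4 of the 8 units; block randomization treats exactly 2 within each block.

```python
import numpy as np

rng = np.random.default_rng(1)

# Potential outcomes from the illustration table (true ATE = 4)
block = np.array([1, 1, 1, 1, 2, 2, 2, 2])
y0 = np.array([1, 2, 1, 2, 3, 4, 3, 4])
y1 = np.array([4, 5, 4, 5, 8, 9, 8, 9])

def diff_in_means(z):
    y = np.where(z == 1, y1, y0)  # reveal observed outcomes
    return y[z == 1].mean() - y[z == 0].mean()

complete, blocked = [], []
for _ in range(500):
    # Complete randomization: any 4 of 8 treated, ignoring blocks
    z = np.zeros(8, int)
    z[rng.choice(8, 4, replace=False)] = 1
    complete.append(diff_in_means(z))
    # Block randomization: exactly 2 of 4 treated within each block
    zb = np.zeros(8, int)
    for b in (1, 2):
        idx = np.flatnonzero(block == b)
        zb[rng.choice(idx, 2, replace=False)] = 1
    blocked.append(diff_in_means(zb))

print(np.std(complete), np.std(blocked))  # blocked estimates vary less
```

Both estimators center on 4, but because potential outcomes correlate with blocks, blocking rules out lopsided assignments (e.g., all of block 2 treated) and tightens the distribution of estimates.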

## Reasons to block randomize

To increase precision in ATE estimates

To account for possible heterogeneous treatment effects

. . .

Most useful when blocking variables correlate with potential outcomes

And it rarely hurts when they do not correlate!

`(more in the lab!)`

# Example

## Kalla et al (2018): Are You My Mentor?

Correspondence experiment with \(N = 8189\) legislators in the US

Send an email from a fictitious student seeking advice on becoming a politician

Cue gender with student’s name

Also called *audit* experiments since they were originally designed to audit how responsive elected officials are

## Sample email

## Data strategy

Block-randomize by legislator’s gender

`(why?)`

**Outcomes:** Reply content and length

## Findings

| Outcome | Male Sender | Female Sender | p-value |
|---|---|---|---|
| Received reply | 0.25 | 0.27 | 0.15 |
| Meaningful response | 0.11 | 0.13 | 0.47 |
| Praised | 0.05 | 0.06 | 0.17 |
| Offer to help | 0.03 | 0.05 | 0.09 |
| Warned against running | 0.01 | 0.02 | 0.14 |
| Substantive advice | 0.07 | 0.08 | 0.33 |
| Word count (logged) | 1.00 | 1.10 | 0.06 |
| Character count | 145.00 | 170.00 | 0.04 |

. . .

- Why not much difference by gender?

Adapted from Table 1

## Break time!