Commit 4989b28

Merge pull request #26 from DoubleML/s-add-did-multi-cs
Add coverage simulations for repeated cross sections over multiple periods
2 parents 10a5f4b + 163e9f6 commit 4989b28

28 files changed
+1054 -208 lines changed

.github/workflows/did_sim.yml

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ jobs:
 'scripts/did/did_pa_atte_coverage.py',
 'scripts/did/did_cs_atte_coverage.py',
 'scripts/did/did_pa_multi.py',
+'scripts/did/did_cs_multi.py',
 ]
 
 steps:

doc/_website.yml

Lines changed: 2 additions & 1 deletion
@@ -27,7 +27,8 @@ website:
 - plm/pliv.qmd
 - text: "DID"
 menu:
-- did/did_multi.qmd
+- did/did_pa_multi.qmd
+- did/did_cs_multi.qmd
 - did/did_pa.qmd
 - did/did_cs.qmd
 - text: "SSM"

doc/did/did_cs_multi.qmd

Lines changed: 322 additions & 0 deletions
@@ -0,0 +1,322 @@
+---
+title: "DiD for Cross-Sectional Data over Multiple Periods"
+
+jupyter: python3
+---
+
+```{python}
+#| echo: false
+
+import numpy as np
+import pandas as pd
+from itables import init_notebook_mode
+import os
+import sys
+
+doc_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
+if doc_dir not in sys.path:
+    sys.path.append(doc_dir)
+
+from utils.style_tables import generate_and_show_styled_table
+
+init_notebook_mode(all_interactive=True)
+```
+
+## ATTE Coverage
+
+The simulations are based on the [make_did_cs_CS2021](https://docs.doubleml.org/dev/api/generated/doubleml.did.datasets.make_did_cs_CS2021.html)-DGP with $2000$ observations. Learners are both set to either boosting or a linear (logistic) model. Due to time constraints we only consider the following DGPs:
+
+- Type 1: Linear outcome model and treatment assignment
+- Type 4: Nonlinear outcome model and treatment assignment
+- Type 6: Randomized treatment assignment and nonlinear outcome model
+
+The non-uniform results (coverage, ci length and bias) refer to averaged values over all $ATTs$ (point-wise confidence intervals).
+
+::: {.callout-note title="Metadata" collapse="true"}
+
+```{python}
+#| echo: false
+metadata_file = '../../results/did/did_cs_multi_metadata.csv'
+metadata_df = pd.read_csv(metadata_file)
+print(metadata_df.T.to_string(header=False))
+```
+
+:::
+
+```{python}
+#| echo: false
+
+# set up data
+df = pd.read_csv("../../results/did/did_cs_multi_detailed.csv", index_col=None)
+
+assert df["repetition"].nunique() == 1
+n_rep = df["repetition"].unique()[0]
+
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+```
+
+### Observational Score
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df,
+    filters={"level": 0.95, "Score": "observational"},
+    display_cols=display_columns,
+    n_rep=n_rep,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df,
+    filters={"level": 0.9, "Score": "observational"},
+    display_cols=display_columns,
+    n_rep=n_rep,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+
+### Experimental Score
+
+The results are only valid for the DGP 6, as the experimental score assumes a randomized treatment assignment.
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df,
+    filters={"level": 0.95, "Score": "experimental"},
+    display_cols=display_columns,
+    n_rep=n_rep,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df,
+    filters={"level": 0.9, "Score": "experimental"},
+    display_cols=display_columns,
+    n_rep=n_rep,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+## Aggregated Effects
+
+These simulations test different types of aggregation, as described in [DiD User Guide](https://docs.doubleml.org/dev/guide/models.html#difference-in-differences-models-did).
+
+The non-uniform results (coverage, ci length and bias) refer to averaged values over all $ATTs$ (point-wise confidence intervals).
+
+### Group Effects
+
+```{python}
+#| echo: false
+
+# set up data
+df_group = pd.read_csv("../../results/did/did_cs_multi_group.csv", index_col=None)
+
+assert df_group["repetition"].nunique() == 1
+n_rep_group = df_group["repetition"].unique()[0]
+
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+```
+
+#### Observational Score
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_group,
+    filters={"level": 0.95, "Score": "observational"},
+    display_cols=display_columns,
+    n_rep=n_rep_group,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_group,
+    filters={"level": 0.9, "Score": "observational"},
+    display_cols=display_columns,
+    n_rep=n_rep_group,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+#### Experimental Score
+
+The results are only valid for the DGP 6, as the experimental score assumes a randomized treatment assignment.
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_group,
+    filters={"level": 0.95, "Score": "experimental"},
+    display_cols=display_columns,
+    n_rep=n_rep_group,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_group,
+    filters={"level": 0.9, "Score": "experimental"},
+    display_cols=display_columns,
+    n_rep=n_rep_group,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+### Time Effects
+
+```{python}
+#| echo: false
+
+# set up data
+df_time = pd.read_csv("../../results/did/did_cs_multi_time.csv", index_col=None)
+
+assert df_time["repetition"].nunique() == 1
+n_rep_time = df_time["repetition"].unique()[0]
+
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+```
+
+#### Observational Score
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_time,
+    filters={"level": 0.95, "Score": "observational"},
+    display_cols=display_columns,
+    n_rep=n_rep_time,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_time,
+    filters={"level": 0.9, "Score": "observational"},
+    display_cols=display_columns,
+    n_rep=n_rep_time,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+#### Experimental Score
+
+The results are only valid for the DGP 6, as the experimental score assumes a randomized treatment assignment.
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_time,
+    filters={"level": 0.95, "Score": "experimental"},
+    display_cols=display_columns,
+    n_rep=n_rep_time,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_time,
+    filters={"level": 0.9, "Score": "experimental"},
+    display_cols=display_columns,
+    n_rep=n_rep_time,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+### Event Study Aggregation
+
+```{python}
+#| echo: false
+
+# set up data
+df_es = pd.read_csv("../../results/did/did_cs_multi_eventstudy.csv", index_col=None)
+
+assert df_es["repetition"].nunique() == 1
+n_rep_es = df_es["repetition"].unique()[0]
+
+display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+```
+
+#### Observational Score
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_es,
+    filters={"level": 0.95, "Score": "observational"},
+    display_cols=display_columns,
+    n_rep=n_rep_es,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_es,
+    filters={"level": 0.9, "Score": "observational"},
+    display_cols=display_columns,
+    n_rep=n_rep_es,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+#### Experimental Score
+
+The results are only valid for the DGP 6, as the experimental score assumes a randomized treatment assignment.
+
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_es,
+    filters={"level": 0.95, "Score": "experimental"},
+    display_cols=display_columns,
+    n_rep=n_rep_es,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
+
+```{python}
+#| echo: false
+generate_and_show_styled_table(
+    main_df=df_es,
+    filters={"level": 0.9, "Score": "experimental"},
+    display_cols=display_columns,
+    n_rep=n_rep_es,
+    level_col="level",
+    coverage_highlight_cols=["Coverage", "Uniform Coverage"]
+)
+```
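
The page added in doc/did/did_cs_multi.qmd reports both point-wise and uniform coverage. As a minimal sketch of how the two quantities differ, assuming a matrix of confidence bounds with one row per repetition and one column per ATT, the snippet below computes coverage averaged over all ATTs and the share of repetitions in which every ATT is covered at once. The name `coverage_summary` and its arguments are hypothetical and are not part of `montecover` or DoubleML. The actual simulations presumably evaluate uniform coverage with joint (wider) intervals, which this sketch does not construct.

```python
import numpy as np


def coverage_summary(theta_true, ci_lower, ci_upper):
    """Summarise interval coverage over repetitions (hypothetical helper).

    theta_true: array of shape (n_params,), the true ATTs.
    ci_lower, ci_upper: arrays of shape (n_rep, n_params).
    """
    theta_true = np.asarray(theta_true, dtype=float)
    ci_lower = np.asarray(ci_lower, dtype=float)
    ci_upper = np.asarray(ci_upper, dtype=float)

    # True where the interval of a given repetition contains the target.
    covered = (ci_lower <= theta_true) & (theta_true <= ci_upper)
    return {
        # Point-wise coverage, averaged over repetitions and over all ATTs.
        "coverage": float(covered.mean()),
        # Share of repetitions in which all ATTs are covered simultaneously.
        "simultaneous_coverage": float(covered.all(axis=1).mean()),
        # Average interval length over repetitions and ATTs.
        "ci_length": float((ci_upper - ci_lower).mean()),
    }


# Toy input for illustration only: 3 repetitions, 2 ATTs.
summary = coverage_summary(
    theta_true=[0.0, 1.0],
    ci_lower=[[-1.0, 0.5], [0.5, 0.5], [-1.0, 2.0]],
    ci_upper=[[1.0, 1.5], [1.5, 1.5], [1.0, 3.0]],
)
print(summary)
```

In this toy input, four of the six intervals contain their target, while only one of the three repetitions covers both ATTs at once, so the simultaneous figure is lower than the averaged point-wise one.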

doc/did/did_multi.qmd renamed to doc/did/did_pa_multi.qmd

Lines changed: 5 additions & 5 deletions
@@ -36,7 +36,7 @@ The non-uniform results (coverage, ci length and bias) refer to averaged values
 
 ```{python}
 #| echo: false
-metadata_file = '../../results/did/did_multi_metadata.csv'
+metadata_file = '../../results/did/did_pa_multi_metadata.csv'
 metadata_df = pd.read_csv(metadata_file)
 print(metadata_df.T.to_string(header=False))
 ```
@@ -47,7 +47,7 @@ print(metadata_df.T.to_string(header=False))
 #| echo: false
 
 # set up data
-df = pd.read_csv("../../results/did/did_multi_detailed.csv", index_col=None)
+df = pd.read_csv("../../results/did/did_pa_multi_detailed.csv", index_col=None)
 
 assert df["repetition"].nunique() == 1
 n_rep = df["repetition"].unique()[0]
@@ -122,7 +122,7 @@ The non-uniform results (coverage, ci length and bias) refer to averaged values
 #| echo: false
 
 # set up data
-df_group = pd.read_csv("../../results/did/did_multi_group.csv", index_col=None)
+df_group = pd.read_csv("../../results/did/did_pa_multi_group.csv", index_col=None)
 
 assert df_group["repetition"].nunique() == 1
 n_rep_group = df_group["repetition"].unique()[0]
@@ -190,7 +190,7 @@ generate_and_show_styled_table(
 #| echo: false
 
 # set up data
-df_time = pd.read_csv("../../results/did/did_multi_time.csv", index_col=None)
+df_time = pd.read_csv("../../results/did/did_pa_multi_time.csv", index_col=None)
 
 assert df_time["repetition"].nunique() == 1
 n_rep_time = df_time["repetition"].unique()[0]
@@ -258,7 +258,7 @@ generate_and_show_styled_table(
 #| echo: false
 
 # set up data
-df_es = pd.read_csv("../../results/did/did_multi_eventstudy.csv", index_col=None)
+df_es = pd.read_csv("../../results/did/did_pa_multi_eventstudy.csv", index_col=None)
 
 assert df_es["repetition"].nunique() == 1
 n_rep_es = df_es["repetition"].unique()[0]
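
Each data-loading chunk in these pages asserts that the `repetition` column holds a single distinct value before reading it as the number of repetitions. A hedged sketch of the same check with an explicit error message follows. `single_repetition_count` is a hypothetical helper, not a function in the repository, and the data frame holds toy values for illustration only.

```python
import pandas as pd


def single_repetition_count(df, column="repetition"):
    """Return the one distinct value in `column`, or raise if there are several."""
    values = df[column].unique()
    if len(values) != 1:
        raise ValueError(
            f"expected one distinct value in '{column}', found {len(values)}"
        )
    return int(values[0])


# Toy results table; the numbers are placeholders, not simulation output.
results = pd.DataFrame(
    {"repetition": [1000, 1000, 1000], "Coverage": [0.95, 0.94, 0.96]}
)
n_rep = single_repetition_count(results)
print(n_rep)
```

Compared with a bare `assert`, raising `ValueError` keeps the check active when Python runs with assertions disabled and reports how many distinct values were found.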

doc/index.qmd

Lines changed: 2 additions & 0 deletions
@@ -264,3 +264,5 @@ fig.show()
 ```
 
 :::
+
+:::
Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
 """Monte Carlo coverage simulations for DiD."""
 
+from montecover.did.did_cs_multi import DIDCSMultiCoverageSimulation
 from montecover.did.did_pa_multi import DIDMultiCoverageSimulation
 
-__all__ = ["DIDMultiCoverageSimulation"]
+__all__ = ["DIDMultiCoverageSimulation", "DIDCSMultiCoverageSimulation"]
