update cs example and add to gallery

SvenKlaassen · SvenKlaassen · commit 23fa23c63ea3 · 2025-07-07T17:36:20.000+02:00
diff --git a/doc/examples/did/py_rep_cs.ipynb b/doc/examples/did/py_rep_cs.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Python: Pepeated Cross-Sectional Data with Multiple Time Periods\n",
+    "# Python: Repeated Cross-Sectional Data with Multiple Time Periods\n",
     "\n",
     "In this example, a detailed guide on Difference-in-Differences with multiple time periods using the [DoubleML-package](https://docs.doubleml.org/stable/index.html). The implementation is based on [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001).\n",
     "\n",
@@ -37,9 +37,11 @@
    "source": [
     "## Data\n",
     "\n",
-    "We will rely on the `make_did_CS2021` DGP, which is inspired by [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001) (Appendix SC) and [Sant'Anna and Zhao (2020)](https://doi.org/10.1016/j.jeconom.2020.06.003).\n",
+    "We will rely on the `make_did_cs_CS2021` DGP, which is inspired by [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001) (Appendix SC) and [Sant'Anna and Zhao (2020)](https://doi.org/10.1016/j.jeconom.2020.06.003).\n",
     "\n",
-    "We will observe `n_obs` units over `n_periods`. Remark that the dataframe includes observations of the potential outcomes `y0` and `y1`, such that we can use oracle estimates as comparisons. "
+    "We will observe approximately `n_obs` units over `n_periods` time periods. The parameter `lambda_t` determines the probability of observing unit $i$ in time period $t$. Here, `lambda_t` is set to 0.5 for all time periods, so each unit has a 50% chance of being observed in each period.\n",
+    "\n",
+    "Note that the dataframe includes the potential outcomes `y0` and `y1`, so that oracle estimates can be used as comparisons."
    ]
   },
   {
@@ -389,7 +391,7 @@
     "The choice `gt_combinations=\"standard\"`, used estimates all possible combinations of $ATT(g,t_\\text{eval})$ via $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$,\n",
     "where the standard choice is $t_\\text{pre} = \\min(\\mathrm{g}, t_\\text{eval}) - 1$ (without anticipation).\n",
     "\n",
-    "Remark that this includes pre-tests effects if $\\mathrm{g} > t_{eval}$, e.g. $\\widehat{ATT}(g=\\text{2025-04}, t_{\\text{pre}}=\\text{2025-01}, t_{\\text{eval}}=\\text{2025-02})$ which estimates the pre-trend from January to February even if the actual treatment occured in April."
+    "Note that this includes pre-test effects if $\\mathrm{g} > t_\\text{eval}$, e.g. $\\widehat{ATT}(g=3, t_{\\text{pre}}=0, t_{\\text{eval}}=1)$, which estimates the pre-trend from time period $0$ to $1$ even though the actual treatment occurred in time period $3$."
    ]
   },
   {
@@ -630,9 +632,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "df[\"e\"] = pd.to_datetime(df[\"t\"]).values.astype(\"datetime64[M]\") - \\\n",
-    "    pd.to_datetime(df[\"d\"]).values.astype(\"datetime64[M]\")\n",
-    "df.groupby(\"e\")[\"ite\"].mean()[1:]"
+    "df_treated = df[df[\"d\"] != np.inf].copy()\n",
+    "df_treated[\"e\"] = df_treated[\"t\"] - df_treated[\"d\"]\n",
+    "df_treated.groupby(\"e\")[\"ite\"].mean().iloc[1:]"
    ]
   },
   {
@@ -899,9 +901,11 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Selected Combinations\n",
+    "### Universal Base Period\n",
     "\n",
-    "Instead it is also possible to just submit a list of tuples containing $(\\mathrm{g}, t_\\text{pre}, t_\\text{eval})$ combinations. E.g. only two combinations"
+    "The option `gt_combinations=\"universal\"` sets $t_\\text{pre} = \\mathrm{g} - \\delta - 1$, corresponding to a universal/constant comparison or base period.\n",
+    "\n",
+    "Note that this implies $t_\\text{pre} > t_\\text{eval}$ for all pre-treatment periods (accounting for anticipation). Therefore, these effects do not have the same straightforward interpretation as ATTs."
    ]
   },
   {
@@ -910,28 +914,19 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "gt_dict = {\n",
-    "    \"gt_combinations\": [\n",
-    "        (4.0, 1, 2),\n",
-    "        (4.0, 1, 3),\n",
-    "        ]\n",
-    "}\n",
-    "\n",
-    "dml_obj_all = DoubleMLDIDMulti(dml_data, **(default_args| gt_dict))\n",
-    "dml_obj_all.fit()\n",
-    "dml_obj_all.bootstrap(n_rep_boot=5000)\n",
-    "dml_obj_all.plot_effects()"
+    "dml_obj_universal = DoubleMLDIDMulti(dml_data, **(default_args | {\"gt_combinations\": \"universal\"}))\n",
+    "dml_obj_universal.fit()\n",
+    "dml_obj_universal.bootstrap(n_rep_boot=5000)\n",
+    "dml_obj_universal.plot_effects()"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Universal Base Period\n",
-    "\n",
-    "The  option `gt_combinations=\"universal\"` set $t_\\text{pre} = \\mathrm{g} - \\delta - 1$, corresponding to a universal/constant comparison or base period.\n",
+    "### Selected Combinations\n",
     "\n",
-    "Remark that this implies $t_\\text{pre} > t_\\text{eval}$ for all pre-treatment periods (accounting for anticipation). Therefore these effects do not have the same straightforward interpretation as ATT's."
+    "Alternatively, it is possible to pass a list of tuples containing $(\\mathrm{g}, t_\\text{pre}, t_\\text{eval})$ combinations, e.g. only two combinations:"
    ]
   },
   {
@@ -940,16 +935,23 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "dml_obj_universal = DoubleMLDIDMulti(dml_data, **(default_args| {\"gt_combinations\": \"universal\"}))\n",
-    "dml_obj_universal.fit()\n",
-    "dml_obj_universal.bootstrap(n_rep_boot=5000)\n",
-    "dml_obj_universal.plot_effects()"
+    "gt_dict = {\n",
+    "    \"gt_combinations\": [\n",
+    "        (4.0, 1, 2),\n",
+    "        (4.0, 1, 3),\n",
+    "    ]\n",
+    "}\n",
+    "\n",
+    "dml_obj_all = DoubleMLDIDMulti(dml_data, **(default_args | gt_dict))\n",
+    "dml_obj_all.fit()\n",
+    "dml_obj_all.bootstrap(n_rep_boot=5000)\n",
+    "dml_obj_all.plot_effects()"
    ]
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "dml_dev",
    "language": "python",
    "name": "python3"
   },
@@ -963,7 +965,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.10"
+   "version": "3.12.8"
   }
  },
  "nbformat": 4,
diff --git a/doc/examples/index.rst b/doc/examples/index.rst
@@ -62,6 +62,7 @@ Difference-in-Differences
     did/py_panel_simple.ipynb
     did/py_panel.ipynb
     did/py_panel_data_example.ipynb
+    did/py_rep_cs.ipynb
 
 
 R: Case studies