diff --git a/doc/examples/did/py_panel.ipynb b/doc/examples/did/py_panel.ipynb
index 64d84be8..9defe82b 100644
--- a/doc/examples/did/py_panel.ipynb
+++ b/doc/examples/did/py_panel.ipynb
@@ -64,7 +64,7 @@
    "source": [
     "### Data Details\n",
     "\n",
-    "*Here, we slightly abuse the definition of the potential outcomes. $Y_{i,t}(1)$ corresponds to the (potential) outcome if unit $i$ would have received treatment at time period $\\mathrm{g}$ (where the group $\\mathrm{g}$ is drawn with probabilities based on $Z$).*\n",
+    "Here, we slightly abuse the definition of the potential outcomes. $Y_{i,t}(1)$ corresponds to the (potential) outcome if unit $i$ would have received treatment at time period $\\mathrm{g}$ (where the group $\\mathrm{g}$ is drawn with probabilities based on $Z$).\n",
     "\n",
     "More specifically\n",
     "\n",
@@ -873,7 +873,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### All combinations\n",
+    "### All Combinations\n",
     "\n",
     "The  option `gt_combinations=\"all\"` includes all relevant group time values with $t_\\text{pre} < \\min(\\mathrm{g}, t_\\text{eval})$, including longer parallel trend assumptions.\n",
     "This can result in multiple estimates for the same $ATT(\\mathrm{g},t)$, which have slightly different assumptions (length of parallel trends)."
@@ -891,6 +891,29 @@
     "dml_obj_all.plot_effects()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Universal Base Period\n",
+    "\n",
+    "The  option `gt_combinations=\"universal\"` set $t_\\text{pre} = \\mathrm{g} - \\delta - 1$, corresponding to a universal/constant comparison or base period.\n",
+    "\n",
+    "Remark that this implies $t_\\text{pre} > t_\\text{eval}$ for all pre-treatment periods (accounting for anticipation). Therefore these effects do not have the same straightforward interpretation as ATT's."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dml_obj_universal = DoubleMLDIDMulti(dml_data, **(default_args| {\"gt_combinations\": \"universal\"}))\n",
+    "dml_obj_universal.fit()\n",
+    "dml_obj_universal.bootstrap(n_rep_boot=5000)\n",
+    "dml_obj_universal.plot_effects()"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
diff --git a/doc/examples/did/py_rep_cs.ipynb b/doc/examples/did/py_rep_cs.ipynb
new file mode 100644
index 00000000..d666dee0
--- /dev/null
+++ b/doc/examples/did/py_rep_cs.ipynb
@@ -0,0 +1,971 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Python: Pepeated Cross-Sectional Data with Multiple Time Periods\n",
+    "\n",
+    "In this example, a detailed guide on Difference-in-Differences with multiple time periods using the [DoubleML-package](https://docs.doubleml.org/stable/index.html). The implementation is based on [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001).\n",
+    "\n",
+    "The notebook requires the following packages:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import seaborn as sns\n",
+    "import matplotlib.pyplot as plt\n",
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "\n",
+    "from lightgbm import LGBMRegressor, LGBMClassifier\n",
+    "from sklearn.linear_model import LinearRegression, LogisticRegression\n",
+    "\n",
+    "from doubleml.did import DoubleMLDIDMulti\n",
+    "from doubleml.data import DoubleMLPanelData\n",
+    "\n",
+    "from doubleml.did.datasets import make_did_cs_CS2021"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Data\n",
+    "\n",
+    "We will rely on the `make_did_CS2021` DGP, which is inspired by [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001) (Appendix SC) and [Sant'Anna and Zhao (2020)](https://doi.org/10.1016/j.jeconom.2020.06.003).\n",
+    "\n",
+    "We will observe `n_obs` units over `n_periods`. Remark that the dataframe includes observations of the potential outcomes `y0` and `y1`, such that we can use oracle estimates as comparisons. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "n_obs = 5000\n",
+    "n_periods = 6\n",
+    "\n",
+    "df = make_did_cs_CS2021(n_obs, dgp_type=4, include_never_treated=True, n_periods=n_periods, n_pre_treat_periods=3,\n",
+    "                        lambda_t=0.5, time_type=\"float\")\n",
+    "df[\"ite\"] = df[\"y1\"] - df[\"y0\"]\n",
+    "\n",
+    "print(df.shape)\n",
+    "df.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Data Details\n",
+    "\n",
+    "*Here, we slightly abuse the definition of the potential outcomes. $Y_{i,t}(1)$ corresponds to the (potential) outcome if unit $i$ would have received treatment at time period $\\mathrm{g}$ (where the group $\\mathrm{g}$ is drawn with probabilities based on $Z$).*\n",
+    "\n",
+    "The data set with repeated cross-sectional data is generated on the basis of a panel data set with the following data generating process (DGP). To obtain repeated cross-sectional data, the number of generated individuals is increased to $\\frac{n_{obs}}{\\lambda_t}$, where $\\lambda_t$ denotes the probability to observe a unit at each time period (time constant).\n",
+    "\n",
+    "More specifically\n",
+    "\n",
+    "$$\n",
+    "\\begin{align*}\n",
+    "Y_{i,t}(0)&:= f_t(Z) + \\delta_t + \\eta_i + \\varepsilon_{i,t,0}\\\\\n",
+    "Y_{i,t}(1)&:= Y_{i,t}(0) + \\theta_{i,t,\\mathrm{g}} + \\epsilon_{i,t,1} - \\epsilon_{i,t,0}\n",
+    "\\end{align*}\n",
+    "$$\n",
+    "\n",
+    "where\n",
+    " - $f_t(Z)$ depends on pre-treatment observable covariates $Z_1,\\dots, Z_4$ and time $t$\n",
+    " - $\\delta_t$ is a time fixed effect\n",
+    " - $\\eta_i$ is a unit fixed effect\n",
+    " - $\\epsilon_{i,t,\\cdot}$ are time varying unobservables (iid. $N(0,1)$)\n",
+    " - $\\theta_{i,t,\\mathrm{g}}$ correponds to the exposure effect of unit $i$ based on group $\\mathrm{g}$ at time $t$\n",
+    "\n",
+    "For the pre-treatment periods the exposure effect is set to\n",
+    "$$\n",
+    "\\theta_{i,t,\\mathrm{g}}:= 0 \\text{ for } t<\\mathrm{g}\n",
+    "$$\n",
+    "such that \n",
+    "\n",
+    "$$\n",
+    "\\mathbb{E}[Y_{i,t}(1) - Y_{i,t}(0)] = \\mathbb{E}[\\epsilon_{i,t,1} - \\epsilon_{i,t,0}]=0  \\text{ for } t<\\mathrm{g}\n",
+    "$$\n",
+    "\n",
+    "The [DoubleML Coverage Repository](https://docs.doubleml.org/doubleml-coverage/) includes coverage simulations based on this DGP."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Data Description"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The data is a balanced panel where each unit is observed over `n_periods` starting Janary 2025."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.groupby(\"t\").size()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The treatment column `d` indicates first treatment period of the corresponding unit, whereas `NaT` units are never treated.\n",
+    "\n",
+    "*Generally, never treated units should take either on the value `np.inf` or `pd.NaT` depending on the data type (`float` or `datetime`).*\n",
+    "\n",
+    "The individual units are roughly uniformly divided between the groups, where treatment assignment depends on the pre-treatment covariates `Z1` to `Z4`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.groupby(\"d\", dropna=False).size()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here, the group indicates the first treated period and `NaT` units are never treated. To simplify plotting and pands"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.groupby(\"d\", dropna=False).size()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To get a better understanding of the underlying data and true effects, we will compare the unconditional averages and the true effects based on the oracle values of individual effects `ite`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# rename for plotting\n",
+    "\n",
+    "# Create aggregation dictionary for means\n",
+    "def agg_dict(col_name):\n",
+    "    return {\n",
+    "        f'{col_name}_mean': (col_name, 'mean'),\n",
+    "        f'{col_name}_lower_quantile': (col_name, lambda x: x.quantile(0.05)),\n",
+    "        f'{col_name}_upper_quantile': (col_name, lambda x: x.quantile(0.95))\n",
+    "    }\n",
+    "\n",
+    "# Calculate means and confidence intervals\n",
+    "agg_dictionary = agg_dict(\"y\") | agg_dict(\"ite\")\n",
+    "\n",
+    "agg_df = df.groupby([\"t\", \"d\"]).agg(**agg_dictionary).reset_index()\n",
+    "agg_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def plot_data(df, col_name='y'):\n",
+    "    \"\"\"\n",
+    "    Create an improved plot with colorblind-friendly features\n",
+    "    \n",
+    "    Parameters:\n",
+    "    -----------\n",
+    "    df : DataFrame\n",
+    "        The dataframe containing the data\n",
+    "    col_name : str, default='y'\n",
+    "        Column name to plot (will use '{col_name}_mean')\n",
+    "    \"\"\"\n",
+    "    plt.figure(figsize=(12, 7))\n",
+    "    n_colors = df[\"d\"].nunique()\n",
+    "    color_palette = sns.color_palette(\"colorblind\", n_colors=n_colors)\n",
+    "\n",
+    "    sns.lineplot(\n",
+    "        data=df,\n",
+    "        x='t',\n",
+    "        y=f'{col_name}_mean',\n",
+    "        hue='d',\n",
+    "        style='d',\n",
+    "        palette=color_palette,\n",
+    "        markers=True,\n",
+    "        dashes=True,\n",
+    "        linewidth=2.5,\n",
+    "        alpha=0.8\n",
+    "    )\n",
+    "    \n",
+    "    plt.title(f'Average Values {col_name} by Group Over Time', fontsize=16)\n",
+    "    plt.xlabel('Time', fontsize=14)\n",
+    "    plt.ylabel(f'Average Value {col_name}', fontsize=14)\n",
+    "    \n",
+    "\n",
+    "    plt.legend(title='d', title_fontsize=13, fontsize=12, \n",
+    "               frameon=True, framealpha=0.9, loc='best')\n",
+    "    \n",
+    "    plt.grid(alpha=0.3, linestyle='-')\n",
+    "    plt.tight_layout()\n",
+    "\n",
+    "    plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "So let us take a look at the average values over time"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plot_data(agg_df, col_name='y')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Instead the true average treatment treatment effects can be obtained by averaging (usually unobserved) the `ite` values.\n",
+    "\n",
+    "The true effect just equals the exposure time (in months):\n",
+    "\n",
+    "$$\n",
+    "ATT(\\mathrm{g}, t) = \\min(\\mathrm{t} - \\mathrm{g} + 1, 0) =: e\n",
+    "$$\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "jupyter": {
+     "source_hidden": true
+    }
+   },
+   "outputs": [],
+   "source": [
+    "plot_data(agg_df, col_name='ite')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### DoubleMLPanelData\n",
+    "\n",
+    "Finally, we can construct our `DoubleMLPanelData`, specifying\n",
+    "\n",
+    " - `y_col` : the outcome\n",
+    " - `d_cols`: the group variable indicating the first treated period for each unit\n",
+    " - `id_col`: the unique identification column for each unit\n",
+    " - `t_col` : the time column\n",
+    " - `x_cols`: the additional pre-treatment controls\n",
+    " - `datetime_unit`: unit required for `datetime` columns and plotting"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dml_data = DoubleMLPanelData(\n",
+    "    data=df,\n",
+    "    y_col=\"y\",\n",
+    "    d_cols=\"d\",\n",
+    "    id_col=\"id\",\n",
+    "    t_col=\"t\",\n",
+    "    x_cols=[\"Z1\", \"Z2\", \"Z3\", \"Z4\"],\n",
+    ")\n",
+    "print(dml_data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## ATT Estimation\n",
+    "\n",
+    "The [DoubleML-package](https://docs.doubleml.org/stable/index.html) implements estimation of group-time average treatment effect via the `DoubleMLDIDMulti` class (see [model documentation](https://docs.doubleml.org/stable/guide/models.html#difference-in-differences-models-did))."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Basics\n",
+    "\n",
+    "The class basically behaves like other `DoubleML` classes and requires the specification of two learners (for more details on the regression elements, see [score documentation](https://docs.doubleml.org/stable/guide/scores.html#difference-in-differences-models)).\n",
+    "\n",
+    "The basic arguments of a `DoubleMLDIDMulti` object include\n",
+    "\n",
+    " - `ml_g` \"outcome\" regression learner\n",
+    " - `ml_m` propensity Score learner\n",
+    " - `control_group` the control group for the parallel trend assumption\n",
+    " - `gt_combinations` combinations of $(\\mathrm{g},t_\\text{pre}, t_\\text{eval})$\n",
+    " - `anticipation_periods` number of anticipation periods\n",
+    "\n",
+    "We will construct a `dict` with \"default\" arguments.\n",
+    "\n",
+    "For repeated cross-sectional data, we additionally specify the argument\n",
+    "\n",
+    "  - `panel=False`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "default_args = {\n",
+    "    \"ml_g\": LGBMRegressor(n_estimators=500, learning_rate=0.01, verbose=-1, random_state=123),\n",
+    "    \"ml_m\": LGBMClassifier(n_estimators=500, learning_rate=0.01, verbose=-1, random_state=123),\n",
+    "    \"control_group\": \"never_treated\",\n",
+    "    \"anticipation_periods\": 0,\n",
+    "    \"n_folds\": 5,\n",
+    "    \"n_rep\": 1,\n",
+    "    \"panel\": False,\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    " The model will be estimated using the `fit()` method. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dml_obj = DoubleMLDIDMulti(dml_data, **default_args)\n",
+    "dml_obj.fit()\n",
+    "print(dml_obj)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The summary displays estimates of the $ATT(g,t_\\text{eval})$ effects for different combinations of $(g,t_\\text{eval})$ via $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$, where\n",
+    " - $\\mathrm{g}$ specifies the group\n",
+    " - $t_\\text{pre}$ specifies the corresponding pre-treatment period\n",
+    " - $t_\\text{eval}$ specifies the evaluation period\n",
+    "\n",
+    "The choice `gt_combinations=\"standard\"`, used estimates all possible combinations of $ATT(g,t_\\text{eval})$ via $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$,\n",
+    "where the standard choice is $t_\\text{pre} = \\min(\\mathrm{g}, t_\\text{eval}) - 1$ (without anticipation).\n",
+    "\n",
+    "Remark that this includes pre-tests effects if $\\mathrm{g} > t_{eval}$, e.g. $\\widehat{ATT}(g=\\text{2025-04}, t_{\\text{pre}}=\\text{2025-01}, t_{\\text{eval}}=\\text{2025-02})$ which estimates the pre-trend from January to February even if the actual treatment occured in April."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As usual for the DoubleML-package, you can obtain joint confidence intervals via bootstrap."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "level = 0.95\n",
+    "\n",
+    "ci = dml_obj.confint(level=level)\n",
+    "dml_obj.bootstrap(n_rep_boot=5000)\n",
+    "ci_joint = dml_obj.confint(level=level, joint=True)\n",
+    "ci_joint"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A visualization of the effects can be obtained via the `plot_effects()` method.\n",
+    "\n",
+    "Remark that the plot used joint confidence intervals per default. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": [
+     "nbsphinx-thumbnail"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "dml_obj.plot_effects()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Sensitivity Analysis\n",
+    "\n",
+    "As descripted in the [Sensitivity Guide](https://docs.doubleml.org/stable/guide/sensitivity.html), robustness checks on omitted confounding/parallel trend violations are available, via the standard `sensitivity_analysis()` method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dml_obj.sensitivity_analysis()\n",
+    "print(dml_obj.sensitivity_summary)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this example one can clearly, distinguish the robustness of the non-zero effects vs. the pre-treatment periods."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Control Groups\n",
+    "\n",
+    "The current implementation support the following control groups\n",
+    "\n",
+    " - ``\"never_treated\"``\n",
+    " - ``\"not_yet_treated\"``\n",
+    "\n",
+    "Remark that the ``\"not_yet_treated\" depends on anticipation.\n",
+    "\n",
+    "For differences and recommendations, we refer to [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dml_obj_nyt = DoubleMLDIDMulti(dml_data, **(default_args | {\"control_group\": \"not_yet_treated\"}))\n",
+    "dml_obj_nyt.fit()\n",
+    "dml_obj_nyt.bootstrap(n_rep_boot=5000)\n",
+    "dml_obj_nyt.plot_effects()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Linear Covariate Adjustment\n",
+    "\n",
+    "Remark that we relied on boosted trees to adjust for conditional parallel trends which allow for a nonlinear adjustment. In comparison to linear adjustment, we could rely on linear learners.\n",
+    "\n",
+    "**Remark that the DGP (`dgp_type=4`) is based on nonlinear conditional expectations such that the estimates will be biased**\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "linear_learners = {\n",
+    "    \"ml_g\": LinearRegression(),\n",
+    "    \"ml_m\": LogisticRegression(),\n",
+    "}\n",
+    "\n",
+    "dml_obj_linear = DoubleMLDIDMulti(dml_data, **(default_args | linear_learners))\n",
+    "dml_obj_linear.fit()\n",
+    "dml_obj_linear.bootstrap(n_rep_boot=5000)\n",
+    "dml_obj_linear.plot_effects()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Aggregated Effects\n",
+    "As the [did-R-package](https://bcallaway11.github.io/did/index.html), the $ATT$'s can be aggregated to summarize multiple effects.\n",
+    "For details on different aggregations and details on their interpretations see [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001).\n",
+    "\n",
+    "The aggregations are implemented via the `aggregate()` method."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Group Aggregation\n",
+    "\n",
+    "\n",
+    "To obtain group-specific effects one can would like to average $ATT(\\mathrm{g}, t_\\text{eval})$ over $t_\\text{eval}$.\n",
+    "As a sample oracle we will combine all `ite`'s based on group $\\mathrm{g}$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_post_treatment = df[df[\"t\"] >= df[\"d\"]]\n",
+    "df_post_treatment.groupby(\"d\")[\"ite\"].mean()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To obtain group-specific effects it is possible to aggregate several $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$ values based on the group $\\mathrm{g}$ by setting the `aggregation=\"group\"` argument."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "aggregated_group = dml_obj.aggregate(aggregation=\"group\")\n",
+    "print(aggregated_group)\n",
+    "_ = aggregated_group.plot_effects()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The output is a `DoubleMLDIDAggregation` object which includes an overall aggregation summary based on group size."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Time Aggregation\n",
+    "\n",
+    "To obtain time-specific effects one can would like to average $ATT(\\mathrm{g}, t_\\text{eval})$ over $\\mathrm{g}$ (respecting group size).\n",
+    "As a sample oracle we will combine all `ite`'s based on group $\\mathrm{g}$. As oracle values, we obtain"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_post_treatment.groupby(\"t\")[\"ite\"].mean()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To aggregate $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$, based on $t_\\text{eval}$, but weighted with respect to group size. Corresponds to *Calendar Time Effects* from the [did-R-package](https://bcallaway11.github.io/did/index.html).\n",
+    "\n",
+    "For calendar time effects set `aggregation=\"time\"`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "aggregated_time = dml_obj.aggregate(\"time\")\n",
+    "print(aggregated_time)\n",
+    "fig, ax = aggregated_time.plot_effects()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Event Study Aggregation\n",
+    "\n",
+    "To obtain event-study-type effects one can would like to aggregate $ATT(\\mathrm{g}, t_\\text{eval})$ over $e = t_\\text{eval} - \\mathrm{g}$ (respecting group size).\n",
+    "As a sample oracle we will combine all `ite`'s based on group $\\mathrm{g}$. As oracle values, we obtain"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df[\"e\"] = pd.to_datetime(df[\"t\"]).values.astype(\"datetime64[M]\") - \\\n",
+    "    pd.to_datetime(df[\"d\"]).values.astype(\"datetime64[M]\")\n",
+    "df.groupby(\"e\")[\"ite\"].mean()[1:]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Analogously, `aggregation=\"eventstudy\"` aggregates $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$ based on exposure time $e = t_\\text{eval} - \\mathrm{g}$ (respecting group size)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "aggregated_eventstudy = dml_obj.aggregate(\"eventstudy\")\n",
+    "print(aggregated_eventstudy)\n",
+    "aggregated_eventstudy.plot_effects()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Aggregation Details\n",
+    "\n",
+    "The `DoubleMLDIDAggregation` objects include several `DoubleMLFrameworks` which support methods like `bootstrap()` or `confint()`.\n",
+    "Further, the weights can be accessed via the properties\n",
+    "\n",
+    " - ``overall_aggregation_weights``: weights for the overall aggregation\n",
+    " - ``aggregation_weights``: weights for the aggregation\n",
+    "\n",
+    "To clarify, e.g. for the eventstudy aggregation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(aggregated_eventstudy)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here, the overall effect aggregation aggregates each effect with positive exposure"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(aggregated_eventstudy.overall_aggregation_weights)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If one would like to consider how the aggregated effect with $e=0$ is computed, one would have to look at the corresponding set of weights within the ``aggregation_weights`` property"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# the weights for e=0 correspond to the fifth element of the aggregation weights\n",
+    "aggregated_eventstudy.aggregation_weights[4]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Taking a look at the original `dml_obj`, one can see that this combines the following estimates (only show month):\n",
+    "\n",
+    " - $\\widehat{ATT}(04,03,04)$\n",
+    " - $\\widehat{ATT}(05,04,05)$\n",
+    " - $\\widehat{ATT}(06,05,06)$"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(dml_obj.summary[\"coef\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Anticipation\n",
+    "\n",
+    "As described in the [Model Guide](https://docs.doubleml.org/stable/guide/models.html#difference-in-differences-models-did), one can include anticipation periods $\\delta>0$ by setting the `anticipation_periods` parameter."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Data with Anticipation\n",
+    "\n",
+    "The DGP allows to include anticipation periods via the `anticipation_periods` parameter.\n",
+    "In this case the observations will be \"shifted\" such that units anticipate the effect earlier and the exposure effect is increased by the number of periods where the effect is anticipated."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "n_obs = 4000\n",
+    "n_periods = 6\n",
+    "\n",
+    "df_anticipation = make_did_cs_CS2021(n_obs, dgp_type=4, n_periods=n_periods, n_pre_treat_periods=3, time_type=\"datetime\", anticipation_periods=1)\n",
+    "\n",
+    "print(df_anticipation.shape)\n",
+    "df_anticipation.head()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To visualize the anticipation, we will again plot the \"oracle\" values"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_anticipation[\"ite\"] = df_anticipation[\"y1\"] - df_anticipation[\"y0\"]\n",
+    "agg_df_anticipation = df_anticipation.groupby([\"t\", \"d\"]).agg(**agg_dictionary).reset_index()\n",
+    "agg_df_anticipation.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "One can see that the effect is already anticipated one period before the actual treatment assignment."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plot_data(agg_df_anticipation, col_name='ite')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Initialize a corresponding `DoubleMLPanelData` object."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dml_data_anticipation = DoubleMLPanelData(\n",
+    "    data=df_anticipation,\n",
+    "    y_col=\"y\",\n",
+    "    d_cols=\"d\",\n",
+    "    id_col=\"id\",\n",
+    "    t_col=\"t\",\n",
+    "    x_cols=[\"Z1\", \"Z2\", \"Z3\", \"Z4\"],\n",
+    "    datetime_unit=\"M\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### ATT Estimation\n",
+    "\n",
+    "Let us take a look at the estimation without anticipation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dml_obj_anticipation = DoubleMLDIDMulti(dml_data_anticipation, **default_args)\n",
+    "dml_obj_anticipation.fit()\n",
+    "dml_obj_anticipation.bootstrap(n_rep_boot=5000)\n",
+    "dml_obj_anticipation.plot_effects()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The effects are obviously biased. To include anticipation periods, one can adjust the `anticipation_periods` parameter. Correspondingly, the outcome regression (and not yet treated units) are adjusted."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dml_obj_anticipation = DoubleMLDIDMulti(dml_data_anticipation, **(default_args| {\"anticipation_periods\": 1}))\n",
+    "dml_obj_anticipation.fit()\n",
+    "dml_obj_anticipation.bootstrap(n_rep_boot=5000)\n",
+    "dml_obj_anticipation.plot_effects()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Group-Time Combinations\n",
+    "\n",
+    "The default option `gt_combinations=\"standard\"` includes all group time values with the specific choice of $t_\\text{pre} = \\min(\\mathrm{g}, t_\\text{eval}) - 1$ (without anticipation) which is the weakest possible parallel trend assumption.\n",
+    "\n",
+    "Other options are possible or only specific combinations of $(\\mathrm{g},t_\\text{pre},t_\\text{eval})$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### All combinations\n",
+    "\n",
+    "The  option `gt_combinations=\"all\"` includes all relevant group time values with $t_\\text{pre} < \\min(\\mathrm{g}, t_\\text{eval})$, including longer parallel trend assumptions.\n",
+    "This can result in multiple estimates for the same $ATT(\\mathrm{g},t)$, which have slightly different assumptions (length of parallel trends)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dml_obj_all = DoubleMLDIDMulti(dml_data, **(default_args| {\"gt_combinations\": \"all\"}))\n",
+    "dml_obj_all.fit()\n",
+    "dml_obj_all.bootstrap(n_rep_boot=5000)\n",
+    "dml_obj_all.plot_effects()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Selected Combinations\n",
+    "\n",
+    "Instead it is also possible to just submit a list of tuples containing $(\\mathrm{g}, t_\\text{pre}, t_\\text{eval})$ combinations. E.g. only two combinations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "gt_dict = {\n",
+    "    \"gt_combinations\": [\n",
+    "        (4.0, 1, 2),\n",
+    "        (4.0, 1, 3),\n",
+    "        ]\n",
+    "}\n",
+    "\n",
+    "dml_obj_all = DoubleMLDIDMulti(dml_data, **(default_args| gt_dict))\n",
+    "dml_obj_all.fit()\n",
+    "dml_obj_all.bootstrap(n_rep_boot=5000)\n",
+    "dml_obj_all.plot_effects()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Universal Base Period\n",
+    "\n",
+    "The  option `gt_combinations=\"universal\"` set $t_\\text{pre} = \\mathrm{g} - \\delta - 1$, corresponding to a universal/constant comparison or base period.\n",
+    "\n",
+    "Remark that this implies $t_\\text{pre} > t_\\text{eval}$ for all pre-treatment periods (accounting for anticipation). Therefore these effects do not have the same straightforward interpretation as ATT's."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dml_obj_universal = DoubleMLDIDMulti(dml_data, **(default_args| {\"gt_combinations\": \"universal\"}))\n",
+    "dml_obj_universal.fit()\n",
+    "dml_obj_universal.bootstrap(n_rep_boot=5000)\n",
+    "dml_obj_universal.plot_effects()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/doc/guide/models/did/did_aggregation.rst b/doc/guide/models/did/did_aggregation.rst
index d1750089..cd69a65e 100644
--- a/doc/guide/models/did/did_aggregation.rst
+++ b/doc/guide/models/did/did_aggregation.rst
@@ -52,6 +52,10 @@ The method ``aggregate()`` requires the ``aggregation`` argument to be set to on
 * ``'eventstudy'``: aggregates :math:`ATT(\mathrm{g},t)` estimates based on time difference to first treatment assignment like an event study (based on group size).
 * ``dictionary``: a dictionary with values containing the aggregation weights (as ``numpy.ma.MaskedArray``).
 
+.. warning::
+    Remark that ``'time'`` and ``'eventstudy'`` aggregation use internal group reweighting according to the total group size (e.g. the group decomposition should be relatively stable over time, as assumed in Assumption 2).
+    It can be helpful to check the aggregation weights as in the :ref:`example gallery <did_examplegallery>`.
+
 .. note::
     A more detailed example on effect aggregation is available in the :ref:`example gallery <did_examplegallery>`.
     For a detailed discussion on different aggregation schemes, we refer to of `Callaway and Sant'Anna (2021) <https://doi.org/10.1016/j.jeconom.2020.12.001>`_.
diff --git a/doc/guide/models/did/did_cs.rst b/doc/guide/models/did/did_cs.rst
index dee07ecb..980293fb 100644
--- a/doc/guide/models/did/did_cs.rst
+++ b/doc/guide/models/did/did_cs.rst
@@ -1,2 +1,74 @@
+For the estimation of the target parameters :math:`ATT(\mathrm{g},t)` the following nuisance functions are required:
+
+.. math::
+    \begin{align}
+    g^{\text{treat}}_{0,\mathrm{g}, t, \text{eval} + \delta}(X_i) &:= \mathbb{E}[Y_{i,t} |X_i, G_i^{\mathrm{g}}=1, T_i=t], \\
+    g^{\text{control}}_{0,\mathrm{g}, t, \text{eval} + \delta}(X_i) &:= \mathbb{E}[Y_{i,t} |X_i, C_{i,t_\text{eval} + \delta}^{(\cdot)}=1, T_i=t], \\
+    m_{0, \mathrm{g}, t_\text{eval} + \delta}(X_i) &:= P(G_i^{\mathrm{g}}=1|X_i, G_i^{\mathrm{g}} + C_{i,t_\text{eval} + \delta}^{(\cdot)}=1).
+    \end{align}
+
+for :math:`t\in\{t_\text{pre}, t_\text{eval}\}`.
+Here, :math:`g^{(\cdot)}_{\mathrm{g}, t, \text{eval} + \delta}(\cdot)` denotes the population outcome regression function (for either treatment or control group at time period :math:`t`) and :math:`m_{0, \mathrm{g}, t_\text{eval} + \delta}(\cdot)` the generalized propensity score.
+
 .. note::
-    Will be implemented soon.
\ No newline at end of file
+    Remark that the nuisance functions depend on the control group used for the estimation of the target parameter.
+    By slight abuse of notation we use the same notation for both control groups :math:`C_{i,t}^{(\text{nev})}` and :math:`C_{i,t}^{(\text{nyt})}`. More specifically, the
+    control group only depends on :math:`\delta` for *not yet treated* units.
+
+For a given tuple :math:`(\mathrm{g}, t_\text{pre}, t_\text{eval})` the target parameter :math:`ATT(\mathrm{g},t)` is estimated by solving the empirical version of the the following linear moment condition:
+
+.. math::
+    ATT(\mathrm{g}, t_\text{pre}, t_\text{eval}):= -\frac{\mathbb{E}[\psi_b(W,\eta_0)]}{\mathbb{E}[\psi_a(W,\eta_0)]}
+
+with nuisance elements :math:`\eta_0=(g^{0,\text{treat}}_{\mathrm{g}, t_\text{pre}, t_\text{eval} + \delta}, g^{0,\text{control}}_{\mathrm{g}, t_\text{pre}, t_\text{eval} + \delta}, g^{0,\text{treat}}_{\mathrm{g}, t_\text{eval}, t_\text{eval} + \delta}, g^{0,\text{control}}_{\mathrm{g}, t_\text{eval}, t_\text{eval} + \delta}, m_{0, \mathrm{g}, t_\text{eval}})` and score function :math:`\psi(W,\theta, \eta)` defined in the :ref:`DiD Score Section<did-cs-score>`.
+
+Setting ``panel=False`` will estimate the target parameter for repeated cross sections. Estimation is conducted via its ``fit()`` method:
+
+.. tab-set::
+
+    .. tab-item:: Python
+        :sync: py
+
+        .. ipython:: python
+            :okwarning:
+
+            import numpy as np
+            import doubleml as dml
+            from doubleml.did.datasets import make_did_cs_CS2021
+            from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
+
+            np.random.seed(42)
+            df = make_did_cs_CS2021(n_obs=500) 
+            dml_data = dml.data.DoubleMLPanelData(
+                df,
+                y_col="y",
+                d_cols="d",
+                id_col="id",
+                t_col="t",
+                x_cols=["Z1", "Z2", "Z3", "Z4"],
+                datetime_unit="M"
+            )
+            dml_did_obj = dml.did.DoubleMLDIDMulti(
+                obj_dml_data=dml_data,
+                ml_g=RandomForestRegressor(min_samples_split=10),
+                ml_m=RandomForestClassifier(min_samples_split=10),
+                gt_combinations="standard",
+                control_group="never_treated",
+                panel=False,
+            )
+            print(dml_did_obj.fit())
+
+.. note::
+    Remark that the output contains four different outcome regressions :math:`g(d,t, X)` for :math:`d,t\in\{0,1\}` . As in the :ref:`IRM model <irm-model>`
+    the outcome regression with :math:`d=0` refers to the control group, whereas :math:`t=0` refers to the pre-treatment period, i.e.
+
+    .. math::
+        \begin{align}
+        g(0,0,X) &\approx g^{\text{control}}_{0,\mathrm{g}, t_\text{pre}, \text{eval} + \delta}(X_i) = \mathbb{E}[Y_{i,t_\text{pre}} |X_i, C_{i,t_\text{eval} + \delta}^{(\cdot)}=1, T_i=t_\text{pre}],\\
+        g(0,1,X) &\approx g^{\text{control}}_{0,\mathrm{g}, t_\text{eval}, \text{eval} + \delta}(X_i) = \mathbb{E}[Y_{i,t_\text{eval}} |X_i, C_{i,t_\text{eval} + \delta}^{(\cdot)}=1, T_i=t_\text{eval}],\\
+        g(1,0,X) &\approx g^{\text{treat}}_{0,\mathrm{g}, t_\text{pre}, \text{eval} + \delta}(X_i) = \mathbb{E}[Y_{i,t_\text{pre}} |X_i, G_i^{\mathrm{g}}=1,, T_i=t_\text{pre}],\\
+        g(1,1,X) &\approx g^{\text{treat}}_{0,\mathrm{g}, t_\text{eval}, \text{eval} + \delta}(X_i) = \mathbb{E}[Y_{i,t_\text{eval}} |X_i, G_i^{\mathrm{g}}=1,, T_i=t_\text{eval}].
+        \end{align}
+
+.. note::
+    A more detailed example is available in the :ref:`Example Gallery <did_examplegallery>`.
\ No newline at end of file
diff --git a/doc/guide/models/did/did_implementation.rst b/doc/guide/models/did/did_implementation.rst
new file mode 100644
index 00000000..20e06e88
--- /dev/null
+++ b/doc/guide/models/did/did_implementation.rst
@@ -0,0 +1,35 @@
+To estimate the target parameter :math:`ATT(\mathrm{g},t_\text{eval})`, the implementation (both for panel data or repeated cross sections) is based on the following parameters:
+
+* :math:`\mathrm{g}` is the first post-treatment period of interest, i.e. the treatment group.
+* :math:`t_\text{pre}` is the pre-treatment period, i.e. the time period from which the conditional parallel trends are assumed.
+* :math:`t_\text{eval}` is the time period of interest or evaluation period, i.e. the time period where the treatment effect is evaluated.
+* :math:`\delta` is number of anticipation periods, i.e. the number of time periods for which units are assumed to anticipate the treatment.
+
+
+Under the assumptions above the target parameter :math:`ATT(\mathrm{g},t_\text{eval})` can be estimated by choosing a suitable combination
+of :math:`(\mathrm{g}, t_\text{pre}, t_\text{eval}, \delta)` if :math:`t_\text{eval} - t_\text{pre} \ge 1 + \delta`, i.e. the parallel trends are assumed to hold at least one period more than the anticipation period.
+
+.. note::
+    The choice :math:`t_\text{pre}= \min(\mathrm{g},t_\text{eval}) -\delta-1` corresponds to the definition of :math:`ATT_{dr}(\mathrm{g},t_\text{eval};\delta)` from `Callaway and Sant'Anna (2021) <https://doi.org/10.1016/j.jeconom.2020.12.001>`_.
+
+    As an example, if the target parameter is the effect on the group receiving treatment in :math:`2006` but evaluated in :math:`2007` with an anticipation period of :math:`\delta=1`, then the pre-treatment period is :math:`2004`.
+    The parallel trend assumption is slightly stronger with anticipation as the trends have to parallel for a longer periods, i.e. :math:`ATT_{dr}(2006,2007;1)=ATT(2006,2004;2006)`.
+
+In the following, we will omit the subscript :math:`\delta` in the notation of the nuisance functions and the control group (implicitly assuming :math:`\delta=0`).
+
+For a given tuple :math:`(\mathrm{g}, t_\text{pre}, t_\text{eval})` the target parameter :math:`ATT(\mathrm{g},t)` is estimated by solving the empirical version of the the following linear moment condition:
+
+.. math::
+    ATT(\mathrm{g}, t_\text{pre}, t_\text{eval}):= -\frac{\mathbb{E}[\psi_b(W,\eta_0)]}{\mathbb{E}[\psi_a(W,\eta_0)]}
+
+with nuisance elements :math:`\eta_0` which depend on the parameter combination :math:`(\mathrm{g}, t_\text{pre}, t_\text{eval})` and score function :math:`\psi(W,\theta, \eta)` (for details, see :ref:`Panel Data Details <did-pa-model>` or :ref:`Repeated Cross-Section Details <did-cs-model>`).
+Under the identifying assumptions above 
+
+.. math::
+    ATT(\mathrm{g}, t_\text{pre}, t_\text{eval}) = ATT(\mathrm{g},t).
+
+``DoubleMLDIDMulti`` implements the estimation of :math:`ATT(\mathrm{g}, t_\text{pre}, t_\text{eval})` for multiple time periods and requires :ref:`DoubleMLPanelData <dml_panel_data>` as input.
+
+Setting ``gt_combinations='standard'`` will estimate the target parameter for all (possible) combinations of :math:`(\mathrm{g}, t_\text{pre}, t_\text{eval})` with :math:`\mathrm{g}\in\{2,\dots,\mathcal{T}\}` and :math:`(t_\text{pre}, t_\text{eval})` with :math:`t_\text{eval}\in\{2,\dots,\mathcal{T}\}` and
+:math:`t_\text{pre}= \min(\mathrm{g},t_\text{eval}) -\delta-1`.
+This corresponds to the setting where all trends are set as short as possible, but still respecting the anticipation period.
diff --git a/doc/guide/models/did/did_models.inc b/doc/guide/models/did/did_models.inc
index 88575949..f9324b18 100644
--- a/doc/guide/models/did/did_models.inc
+++ b/doc/guide/models/did/did_models.inc
@@ -1,10 +1,17 @@
 .. include:: /guide/models/did/did_setup.rst
 
+.. _did-implementation-model:
+
+Parameters & Implementation
+***************************
+
+.. include:: /guide/models/did/did_implementation.rst
+
 
 .. _did-pa-model:
 
 Panel data
-**********
+******************
 
 .. include:: /guide/models/did/did_pa.rst
 
@@ -12,7 +19,7 @@ Panel data
 .. _did-cs-model:
 
 Repeated cross-sections
-***********************
+*******************************
 
 .. include:: /guide/models/did/did_cs.rst
 
diff --git a/doc/guide/models/did/did_pa.rst b/doc/guide/models/did/did_pa.rst
index 3b12d6d9..08ad728a 100644
--- a/doc/guide/models/did/did_pa.rst
+++ b/doc/guide/models/did/did_pa.rst
@@ -6,45 +6,19 @@ For the estimation of the target parameters :math:`ATT(\mathrm{g},t)` the follow
     m_{0, \mathrm{g}, t_\text{eval} + \delta}(X_i) &:= P(G_i^{\mathrm{g}}=1|X_i, G_i^{\mathrm{g}} + C_{i,t_\text{eval} + \delta}^{(\cdot)}=1).
     \end{align}
 
-where :math:`g_{0, \mathrm{g}, t_\text{pre}, t_\text{eval},\delta}(\cdot)` denotes the population outcome regression function and :math:`m_{0, \mathrm{g}, t_\text{eval} + \delta}(\cdot)` the generalized propensity score.
-The interpretation of the parameters is as follows:
-
-* :math:`\mathrm{g}` is the first post-treatment period of interest, i.e. the treatment group.
-* :math:`t_\text{pre}` is the pre-treatment period, i.e. the time period from which the conditional parallel trends are assumed.
-* :math:`t_\text{eval}` is the time period of interest or evaluation period, i.e. the time period where the treatment effect is evaluated.
-* :math:`\delta` is number of anticipation periods, i.e. the number of time periods for which units are assumed to anticipate the treatment.
+where :math:`g_{0, \mathrm{g}, t_\text{pre}, t_\text{eval},\delta}(\cdot)` denotes the population outcome change regression function and :math:`m_{0, \mathrm{g}, t_\text{eval} + \delta}(\cdot)` the generalized propensity score.
 
 .. note::
     Remark that the nuisance functions depend on the control group used for the estimation of the target parameter.
     By slight abuse of notation we use the same notation for both control groups :math:`C_{i,t}^{(\text{nev})}` and :math:`C_{i,t}^{(\text{nyt})}`. More specifically, the
     control group only depends on :math:`\delta` for *not yet treated* units.
 
-Under these assumptions the target parameter :math:`ATT(\mathrm{g},t_\text{eval})` can be estimated by choosing a suitable combination
-of :math:`(\mathrm{g}, t_\text{pre}, t_\text{eval}, \delta)` if :math:`t_\text{eval} - t_\text{pre} \ge 1 + \delta`, i.e. the parallel trends are assumed to hold at least one period more than the anticipation period.
-
-.. note::
-    The choice :math:`t_\text{pre}= \min(\mathrm{g},t_\text{eval}) -\delta-1` corresponds to the definition of :math:`ATT_{dr}(\mathrm{g},t_\text{eval};\delta)` from `Callaway and Sant'Anna (2021) <https://doi.org/10.1016/j.jeconom.2020.12.001>`_.
-
-    As an example, if the target parameter is the effect on the group receiving treatment in :math:`2006` but evaluated in :math:`2007` with an anticipation period of :math:`\delta=1`, then the pre-treatment period is :math:`2004`.
-    The parallel trend assumption is slightly stronger with anticipation as the trends have to parallel for a longer periods, i.e. :math:`ATT_{dr}(2006,2007;1)=ATT(2006,2004;2006)`.
-
-In the following, we will omit the subscript :math:`\delta` in the notation of the nuisance functions and the control group (implicitly assuming :math:`\delta=0`).
-
 For a given tuple :math:`(\mathrm{g}, t_\text{pre}, t_\text{eval})` the target parameter :math:`ATT(\mathrm{g},t)` is estimated by solving the empirical version of the the following linear moment condition:
 
 .. math::
     ATT(\mathrm{g}, t_\text{pre}, t_\text{eval}):= -\frac{\mathbb{E}[\psi_b(W,\eta_0)]}{\mathbb{E}[\psi_a(W,\eta_0)]}
 
-with nuisance elements :math:`\eta_0=(g_{0, \mathrm{g}, t_\text{pre}, t_\text{eval}}, m_{0, \mathrm{g}, t_\text{eval}})` and score function :math:`\psi(W,\theta, \eta)` being defined in section :ref:`did-pa-score`.
-Under the identifying assumptions above 
-
-.. math::
-    ATT(\mathrm{g}, t_\text{pre}, t_\text{eval}) = ATT(\mathrm{g},t).
-
-``DoubleMLDIDMulti`` implements the estimation of :math:`ATT(\mathrm{g}, t_\text{pre}, t_\text{eval})` for multiple time periods and requires :ref:`DoubleMLPanelData <dml_panel_data>` as input.
-Setting ``gt_combinations='standard'`` will estimate the target parameter for all (possible) combinations of :math:`(\mathrm{g}, t_\text{pre}, t_\text{eval})` with :math:`\mathrm{g}\in\{2,\dots,\mathcal{T}\}` and :math:`(t_\text{pre}, t_\text{eval})` with :math:`t_\text{eval}\in\{2,\dots,\mathcal{T}\}` and
-:math:`t_\text{pre}= \min(\mathrm{g},t_\text{eval}) -\delta-1`.
-This corresponds to the setting where all trends are set as short as possible, but still respecting the anticipation period. 
+with nuisance elements :math:`\eta_0=(g_{0, \mathrm{g}, t_\text{pre}, t_\text{eval}}, m_{0, \mathrm{g}, t_\text{eval}})` and score function :math:`\psi(W,\theta, \eta)` being defined in the :ref:`DiD Score Section<did-pa-score>`.
 
 Estimation is conducted via its ``fit()`` method:
 
@@ -83,7 +57,7 @@ Estimation is conducted via its ``fit()`` method:
 
 .. note::
     Remark that the output contains two different outcome regressions :math:`g(0,X)` and :math:`g(1,X)`. As in the :ref:`IRM model <irm-model>`
-    the outcome regression :math:`g(0,X)` refers to the control group, whereas :math:`g(1,X)` refers to the outcome regression for the treatment group, i.e.
+    the outcome regression :math:`g(0,X)` refers to the control group, whereas :math:`g(1,X)` refers to the outcome change regression for the treatment group, i.e.
 
     .. math::
         \begin{align}
diff --git a/doc/guide/models/did/did_setup.rst b/doc/guide/models/did/did_setup.rst
index 73fbb732..18fd0000 100644
--- a/doc/guide/models/did/did_setup.rst
+++ b/doc/guide/models/did/did_setup.rst
@@ -40,8 +40,15 @@ The corresponding identifying assumptions are:
     :math:`D_{i,1} = 0 \quad a.s.`
     For all :math:`t=2,\dots,\mathcal{T}`, :math:`D_{i,t-1} = 1` implies :math:`D_{i,t} = 1 \quad a.s.`
 
-2. **Panel Data (Random Sampling):** 
-    :math:`(Y_{i,1},\dots, Y_{i,\mathcal{T}}, X_i, D_{i,1}, \dots, D_{i,\mathcal{T}})_{i=1}^n` is independent and identically distributed.
+2. **Data:**
+    The observed data are generated according to the following mechanisms:
+
+    a. **Panel Data (Random Sampling):** 
+    The sample :math:`(Y_{i,1},\dots, Y_{i,\mathcal{T}}, X_i, D_{i,1}, \dots, D_{i,\mathcal{T}})_{i=1}^n` is independent and identically distributed.
+    
+    b. **Repeated Cross Sections:**
+    The sample consists of :math:`(Y_{i,t},G^{2}_i,\dots,G^{\mathcal{T}}_i, C_i,T_i, X_i)_{i=1}^n`, where :math:`T_i\in \{1,\dots,\mathcal{T}\}` denotes the time period of unit :math:`i` being observed.
+    Conditional on :math:`T=t`, the data are independent and identically distributed from the distribution of :math:`(Y_{t},G^{2},\dots,G^{\mathcal{T}}, C, X)`, with :math:`(G^{2},\dots,G^{\mathcal{T}}, C, X)` being invariant to :math:`T`.
 
 3. **Limited Treatment Anticipation:**
     There is a known :math:`\delta\ge 0` such that
@@ -64,4 +71,4 @@ The corresponding identifying assumptions are:
 .. note:: 
     For a detailed discussion of the assumptions see `Callaway and Sant'Anna (2021) <https://doi.org/10.1016/j.jeconom.2020.12.001>`_.
 
-Under the assumptions above (either Assumption 4.a or 4.b), the target parameter :math:`ATT(\mathrm{g},t)` is identified see Theorem 1. `Callaway and Sant'Anna (2021) <https://doi.org/10.1016/j.jeconom.2020.12.001>`_.
\ No newline at end of file
+Under the assumptions above (either Assumption a. or b.), the target parameter :math:`ATT(\mathrm{g},t)` is identified see Theorem 1. `Callaway and Sant'Anna (2021) <https://doi.org/10.1016/j.jeconom.2020.12.001>`_.
diff --git a/doc/guide/scores/did/did_cs_binary_score.rst b/doc/guide/scores/did/did_cs_binary_score.rst
index 72717da4..0930ec49 100644
--- a/doc/guide/scores/did/did_cs_binary_score.rst
+++ b/doc/guide/scores/did/did_cs_binary_score.rst
@@ -55,8 +55,8 @@ If ``in_sample_normalization='False'``, the score is set to
 
     =\; &\psi_a(W; \eta) \theta + \psi_b(W; \eta)
 
-with :math:`\eta=(g, p, \lambda)`, where :math:`p_0 = \mathbb{E}[D]` and :math:`\lambda_0 = \mathbb{E}[T]` are estimated on the cross-fitting folds.
-Remark that this will result in the same score, but just uses slightly different normalization.
+with :math:`\eta=(g, p, \lambda)`, where :math:`p_0 = \mathbb{E}[D]` and :math:`\lambda_0 = \mathbb{E}[T]` are estimated on the whole sample.
+Remark that this will result in a similar score, but just uses slightly different normalization.
 
 ``score='experimental'`` assumes that the treatment probability is independent of the covariates :math:`X` and
 implements the score function:
@@ -115,5 +115,5 @@ Analogously, if ``in_sample_normalization='False'``,  the score is set to
 
     =\; &\psi_a(W; \eta) \theta + \psi_b(W; \eta)
 
-with :math:`\eta=(g, m, p, \lambda)`, where :math:`p_0 = \mathbb{E}[D]` and :math:`\lambda_0 = \mathbb{E}[T]` are estimated on the cross-fitting folds.
-Remark that this will result in the same score, but just uses slightly different normalization.
+with :math:`\eta=(g, m, p, \lambda)`, where :math:`p_0 = \mathbb{E}[D]` and :math:`\lambda_0 = \mathbb{E}[T]` are estimated on the whole sample.
+Remark that this will result in a similar score, but just uses slightly different normalization.
diff --git a/doc/guide/scores/did/did_cs_score.rst b/doc/guide/scores/did/did_cs_score.rst
index dee07ecb..1f465b1f 100644
--- a/doc/guide/scores/did/did_cs_score.rst
+++ b/doc/guide/scores/did/did_cs_score.rst
@@ -1,2 +1,125 @@
+As in the description of the :ref:`DiD model <did-cs-model>`, the required nuisance elements are
+
+.. math::
+    \begin{align}
+    g^{\text{treat}}_{0,\mathrm{g}, t, \text{eval} + \delta}(X_i) &:= \mathbb{E}[Y_{i,t} |X_i, G_i^{\mathrm{g}}=1, T_i=t], \\
+    g^{\text{control}}_{0,\mathrm{g}, t, \text{eval} + \delta}(X_i) &:= \mathbb{E}[Y_{i,t} |X_i, C_{i,t_\text{eval} + \delta}^{(\cdot)}=1, T_i=t], \\
+    m_{0, \mathrm{g}, t_\text{eval} + \delta}(X_i) &:= P(G_i^{\mathrm{g}}=1|X_i, G_i^{\mathrm{g}} + C_{i,t_\text{eval} + \delta}^{(\cdot)}=1).
+    \end{align}
+
+for :math:`t\in\{t_\text{pre}, t_\text{eval}\}` and a certain choice of :math:`(\mathrm{g}, t_\text{pre}, t_\text{eval})` and :math:`\delta` and control group :math:`C_{i,t_\text{eval} + \delta}^{(\cdot)}`.
+
+For notational purposes, we will omit the subscripts :math:`\mathrm{g}, t_\text{pre}, t_\text{eval}, \delta` in the following and use the notation 
+
+* :math:`g_0(1, 0, X_i) \equiv g^{\text{treat}}_{0,\mathrm{g}, t_\text{pre}, \text{eval} + \delta}(X_i)` (pop. outcome regr. function for treatment group in :math:`t_\text{pre}`)
+* :math:`g_0(1, 1, X_i) \equiv g^{\text{treat}}_{0,\mathrm{g}, t_\text{eval}, \text{eval} + \delta}(X_i)` (pop. outcome regr. function for treatment group in :math:`t_\text{eval}`)
+* :math:`g_0(0, 0, X_i) \equiv g^{\text{control}}_{0,\mathrm{g}, t_\text{pre}, \text{eval} + \delta}(X_i)` (pop. outcome regr. function for control group in :math:`t_\text{pre}`)
+* :math:`g_0(0, 1, X_i) \equiv g^{\text{control}}_{0,\mathrm{g}, t_\text{eval}, \text{eval} + \delta}(X_i)` (pop. outcome regr. function for control group in :math:`t_\text{eval}`)
+* :math:`m_0(X_i)\equiv m_{0, \mathrm{g}, t_\text{eval} + \delta}(X_i)` (generalized propensity score).
+
+All scores in the multi-period setting have the form 
+
+.. math::
+
+    \psi(W_i,\theta, \eta) := 
+    \begin{cases}
+    \tilde{\psi}(W_i,\theta, \eta) & \text{for } G_i^{\mathrm{g}} \vee C_{i,t_\text{eval} + \delta}^{(\cdot)}=1 \\
+    0 & \text{otherwise}
+    \end{cases}
+
+i.e. the score is only non-zero for units in the corresponding treatment group :math:`\mathrm{g}` and control group :math:`C_{i,t_\text{eval} + \delta}^{(\cdot)}`.
+
+For the difference-in-differences model implemented in ``DoubleMLDIDMulti`` one can choose between
+``score='observational'`` and ``score='experimental'``.
+
+``score='observational'`` implements the score function (dropping the unit index :math:`i`):
+
+.. math::
+
+    \tilde{\psi}(W,\theta,\eta) :=\; & - \frac{G^{\mathrm{g}}}{\mathbb{E}_n[G^{\mathrm{g}}]}\theta + \frac{G^{\mathrm{g}}}{\mathbb{E}_n[G^{\mathrm{g}}]}\Big(g(1,1,X) - g(1,0,X) - (g(0,1,X) - g(0,0,X))\Big)
+
+    & + \frac{G^{\mathrm{g}}T}{\mathbb{E}_n[G^{\mathrm{g}}T]} (Y - g(1,1,X)) 
+
+    & - \frac{G^{\mathrm{g}}(1-T)}{\mathbb{E}_n[G^{\mathrm{g}}(1-T)]}(Y - g(1,0,X))
+
+    & - \frac{m(X) (1-G^{\mathrm{g}})T}{1-m(X)} \mathbb{E}_n\left[\frac{m(X) (1-G^{\mathrm{g}})T}{1-m(X)}\right]^{-1} (Y-g(0,1,X)) 
+
+    & + \frac{m(X) (1-G^{\mathrm{g}})(1-T)}{1-m(X)} \mathbb{E}_n\left[\frac{m(X) (1-G^{\mathrm{g}})(1-T)}{1-m(X)}\right]^{-1} (Y-g(0,0,X))
+
+    =\; &\tilde{\psi}_a(W; \eta) \theta + \tilde{\psi}_b(W; \eta)
+
+where the components of the final linear score :math:`\psi` are
+
+.. math::
+    \psi_a(W; \eta) &=  \tilde{\psi}_a(W; \eta) \cdot \max(G^{\mathrm{g}}, C^{(\cdot)}),
+
+    \psi_b(W; \eta) &= \tilde{\psi}_b(W; \eta) \cdot \max(G^{\mathrm{g}}, C^{(\cdot)})
+
+and the nuisance elements :math:`\eta=(g, m)`.
+
 .. note::
-    Will be implemented soon.
\ No newline at end of file
+    Remark that :math:`1-G^{\mathrm{g}}=C^{(\cdot)}` if :math:`G^{\mathrm{g}} \vee C_{t_\text{eval} + \delta}^{(\cdot)}=1`.
+
+If ``in_sample_normalization='False'``, the score is set to
+
+.. math::
+
+    \tilde{\psi}(W,\theta,\eta) :=\; & - \frac{G^{\mathrm{g}}}{p}\theta + \frac{G^{\mathrm{g}}}{p}\Big(g(1,1,X) - g(1,0,X) - (g(0,1,X) - g(0,0,X))\Big)
+
+    & + \frac{G^{\mathrm{g}}T}{p\lambda} (Y - g(1,1,X)) 
+
+    & - \frac{G^{\mathrm{g}}(1-T)}{p(1-\lambda)}(Y - g(1,0,X))
+
+    & - \frac{m(X) (1-G^{\mathrm{g}})T}{p(1-m(X))\lambda} (Y-g(0,1,X)) 
+
+    & + \frac{m(X) (1-G^{\mathrm{g}})(1-T)}{p(1-m(X))(1-\lambda)} (Y-g(0,0,X))
+
+    =\; &\tilde{\psi}_a(W; \eta) \theta + \tilde{\psi}_b(W; \eta)
+
+with :math:`\eta=(g, m, p, \lambda)`, where :math:`p_0 = \mathbb{E}[G^{\mathrm{g}}]` and :math:`\lambda_0 = \mathbb{E}[T]` are estimated on the subsample.
+Remark that this will result a similar score, but just uses slightly different normalization.
+
+``score='experimental'`` assumes that the treatment probability is independent of the covariates :math:`X` and
+implements the score function:
+
+.. math::
+
+    \tilde{\psi}(W,\theta,\eta) :=\; & - \theta + \Big(g(1,1,X) - g(1,0,X) - (g(0,1,X) - g(0,0,X))\Big)
+
+    & + \frac{G^{\mathrm{g}}T}{\mathbb{E}_n[G^{\mathrm{g}}T]} (Y - g(1,1,X)) 
+
+    & - \frac{G^{\mathrm{g}}(1-T)}{\mathbb{E}_n[G^{\mathrm{g}}(1-T)]}(Y - g(1,0,X))
+
+    & - \frac{(1-G^{\mathrm{g}})T}{\mathbb{E}_n[(1-G^{\mathrm{g}})T]} (Y-g(0,1,X)) 
+
+    & + \frac{(1-G^{\mathrm{g}})(1-T)}{\mathbb{E}_n[(1-G^{\mathrm{g}})(1-T)]} (Y-g(0,0,X))
+
+    =\; &\tilde{\psi}_a(W; \eta) \theta + \tilde{\psi}_b(W; \eta)
+
+where the components of the final linear score :math:`\psi` are
+
+.. math::
+    \psi_a(W; \eta) &=  \tilde{\psi}_a(W; \eta) \cdot \max(G^{\mathrm{g}}, C^{(\cdot)}),
+
+    \psi_b(W; \eta) &= \tilde{\psi}_b(W; \eta) \cdot \max(G^{\mathrm{g}}, C^{(\cdot)})
+
+and the nuisance elements :math:`\eta=(g, m)`.
+
+Analogously, if ``in_sample_normalization='False'``,  the score is set to
+
+.. math::
+
+    \tilde{\psi}(W,\theta,\eta) :=\; & - \theta + \Big(g(1,1,X) - g(1,0,X) - (g(0,1,X) - g(0,0,X))\Big)
+
+    & + \frac{G^{\mathrm{g}}T}{p\lambda} (Y - g(1,1,X)) 
+
+    & - \frac{G^{\mathrm{g}}(1-T)}{p(1-\lambda)}(Y - g(1,0,X))
+
+    & - \frac{(1-G^{\mathrm{g}})T}{(1-p)\lambda} (Y-g(0,1,X)) 
+
+    & + \frac{(1-G^{\mathrm{g}})(1-T)}{(1-p)(1-\lambda)} (Y-g(0,0,X))
+
+    =\; &\tilde{\psi}_a(W; \eta) \theta + \tilde{\psi}_b(W; \eta)
+
+with :math:`\eta=(g, m, p, \lambda)`, where :math:`p_0 = \mathbb{E}[G^{\mathrm{g}}]` and :math:`\lambda_0 = \mathbb{E}[T]` are estimated on the subsample.
+Remark that this will result in a similar score, but just uses slightly different normalization.
\ No newline at end of file
diff --git a/doc/guide/scores/did/did_pa_score.rst b/doc/guide/scores/did/did_pa_score.rst
index 8d6602e5..1f1ab9f5 100644
--- a/doc/guide/scores/did/did_pa_score.rst
+++ b/doc/guide/scores/did/did_pa_score.rst
@@ -10,7 +10,7 @@ for a certain choice of :math:`(\mathrm{g}, t_\text{pre}, t_\text{eval})` and :m
 
 For notational purposes, we will omit the subscripts :math:`\mathrm{g}, t_\text{pre}, t_\text{eval}, \delta` in the following and use the notation 
 
-* :math:`g_0(0, X_i)\equiv g_{0, \mathrm{g}, t_\text{pre}, t_\text{eval}, \delta}(X_i)` (population outcome regression function of the control group)
+* :math:`g_0(0, X_i)\equiv g_{0, \mathrm{g}, t_\text{pre}, t_\text{eval}, \delta}(X_i)` (population outcome change regression function of the control group)
 * :math:`m_0(X_i)\equiv m_{0, \mathrm{g}, t_\text{eval} + \delta}(X_i)` (generalized propensity score)
 
 All scores in the multi-period setting have the form 
diff --git a/doc/guide/sensitivity/did/did_cs_sensitivity.rst b/doc/guide/sensitivity/did/did_cs_sensitivity.rst
index dee07ecb..204c4948 100644
--- a/doc/guide/sensitivity/did/did_cs_sensitivity.rst
+++ b/doc/guide/sensitivity/did/did_cs_sensitivity.rst
@@ -1,2 +1,38 @@
+For a detailed description of the scores and nuisance elements, see :ref:`did-cs-score`.
+
+In the :ref:`did-cs-model` with ``score='observational'`` and ``in_sample_normalization=True`` the score function implies the following representations
+
+.. math::
+
+    m(W,g) &= \Big(\big(g(1,1,X) - g(1,0,X)\big) - \big(g(0,1,X) - g(0,0,X)\big)\Big) \frac{G^{\mathrm{g}}}{\mathbb{E}[G^{\mathrm{g}}]} \cdot \max(G^{\mathrm{g}}, C^{(\cdot)})
+
+    \alpha(W) &= \Bigg(\frac{G^{\mathrm{g}}T}{\mathbb{E}[G^{\mathrm{g}}T]} - \frac{G^{\mathrm{g}}(1-T)}{\mathbb{E}[G^{\mathrm{g}}(1-T)]}
+
+    &\quad - \frac{m(X)(1-G^{\mathrm{g}})T}{1-m(X)}\mathbb{E}\left[\frac{m(X)(1-G^{\mathrm{g}})T}{1-m(X)}\right]^{-1}
+
+    &\quad + \frac{m(X)(1-G^{\mathrm{g}})(1-T)}{1-m(X)}\mathbb{E}\left[\frac{m(X)(1-G^{\mathrm{g}})(1-T)}{1-m(X)}\right]^{-1} \Bigg) \cdot \max(G^{\mathrm{g}}, C^{(\cdot)}).
+
+If instead ``in_sample_normalization=False``, the Riesz representer changes to 
+
+.. math::
+
+    \alpha(W) = \left(\frac{T}{\mathbb{E}[G^{\mathrm{g}}]\mathbb{E}[T]} + \frac{1-T}{\mathbb{E}[G^{\mathrm{g}}](1-\mathbb{E}[T])}\right)\left(G^{\mathrm{g}} - (1-G^{\mathrm{g}})\frac{m(X)}{1-m(X)}\right) \cdot \max(G^{\mathrm{g}}, C^{(\cdot)}).
+
+For ``score='experimental'`` the score function implies the following representations
+
+.. math::
+
+    m(W,g) &= \big(g(1,1,X) - g(1,0,X)\big) - \big(g(0,1,X) - g(0,0,X)\big) \cdot \max(G^{\mathrm{g}}, C^{(\cdot)})
+
+    \alpha(W) &= \left(\frac{G^{\mathrm{g}}T}{\mathbb{E}[G^{\mathrm{g}}T]} - \frac{G^{\mathrm{g}}(1-T)}{\mathbb{E}[G^{\mathrm{g}}(1-T)]} - \frac{(1-G^{\mathrm{g}})T}{\mathbb{E}[(1-G^{\mathrm{g}})T]} + \frac{(1-G^{\mathrm{g}})(1-T)}{\mathbb{E}[(1-G^{\mathrm{g}})(1-T)]}\right) \cdot \max(G^{\mathrm{g}}, C^{(\cdot)}).
+
+And again, if instead ``in_sample_normalization=False``, the Riesz representer changes to 
+
+.. math::
+
+    \alpha(W) = \left(\frac{G^{\mathrm{g}}T}{\mathbb{E}[G^{\mathrm{g}}]\mathbb{E}[T]} - \frac{G^{\mathrm{g}}(1-T)}{\mathbb{E}[G^{\mathrm{g}}](1-\mathbb{E}[T])} - \frac{(1-G^{\mathrm{g}})T}{(1-\mathbb{E}[G^{\mathrm{g}}])\mathbb{E}[T]} + \frac{(1-G^{\mathrm{g}})(1-T)}{(1-\mathbb{E}[G^{\mathrm{g}}])(1-\mathbb{E}[T])}\right) \cdot \max(G^{\mathrm{g}}, C^{(\cdot)}).
+
+The ``nuisance_elements`` are then computed with plug-in versions according to the general :ref:`sensitivity_implementation`, but the scores :math:`\psi_{\sigma^2}` and :math:`\psi_{\nu^2}` are scaled according to the sample size of the subset, i.e. with scaling factor :math:`c=\frac{n_{\text{obs}}}{n_{\text{subset}}}`.
+
 .. note::
-    Will be implemented soon.
\ No newline at end of file
+    Remark that the elements are only non-zero for units in the corresponding treatment group :math:`\mathrm{g}` and control group :math:`C^{(\cdot)}`, as :math:`1-G^{\mathrm{g}}=C^{(\cdot)}` if :math:`\max(G^{\mathrm{g}}, C^{(\cdot)})=1`.
\ No newline at end of file
diff --git a/doc/guide/sensitivity/did/did_pa_sensitivity.rst b/doc/guide/sensitivity/did/did_pa_sensitivity.rst
index 7a9a665d..8d9b1d36 100644
--- a/doc/guide/sensitivity/did/did_pa_sensitivity.rst
+++ b/doc/guide/sensitivity/did/did_pa_sensitivity.rst
@@ -4,7 +4,7 @@ In the :ref:`did-pa-model` with ``score='observational'`` and ``in_sample_normal
 
 .. math::
 
-    m(W,g) &= \big(g(1,X) - g(0,X)\big)\cdot \frac{G^{\mathrm{g}}}{\mathbb{E}[G^{\mathrm{g}}]}
+    m(W,g) &= \big(g(1,X) - g(0,X)\big)\cdot \frac{G^{\mathrm{g}}}{\mathbb{E}[G^{\mathrm{g}}]}\cdot \max(G^{\mathrm{g}}, C^{(\cdot)})
 
     \alpha(W) &= \left(\frac{G^{\mathrm{g}}}{\mathbb{E}[G^{\mathrm{g}}]} - \frac{\frac{m(X)(1-G^{\mathrm{g}})}{1-m(X)}}{\mathbb{E}\left[\frac{m(X)(1-G^{\mathrm{g}})}{1-m(X)}\right]}\right) \cdot \max(G^{\mathrm{g}}, C^{(\cdot)}).
 
@@ -22,7 +22,7 @@ For ``score='experimental'`` implies the score function implies the following re
 
     \alpha(W) &= \left(\frac{G^{\mathrm{g}}}{\mathbb{E}[G^{\mathrm{g}}]} - \frac{1-G^{\mathrm{g}}}{1-\mathbb{E}[G^{\mathrm{g}}]}\right) \cdot \max(G^{\mathrm{g}}, C^{(\cdot)}).
 
-The ``nuisance_elements`` are then computed with plug-in versions according to the general :ref:`sensitivity_implementation`.
+The ``nuisance_elements`` are then computed with plug-in versions according to the general :ref:`sensitivity_implementation`, but the scores :math:`\psi_{\sigma^2}` and :math:`\psi_{\nu^2}` are scaled according to the sample size of the subset, i.e. with scaling factor :math:`c=\frac{n_{\text{ids}}}{n_{\text{subset}}}`.
 
 .. note::
-    Remark that the elements are only non-zero for units in the corresponding treatment group :math:`\mathrm{g}` and control group :math:`C^{(\cdot)}`, as :math:`1-G^{\mathrm{g}}=C^{(\cdot)}` if :math:`G^{\mathrm{g}} \vee C_{t_\text{eval} + \delta}^{(\cdot)}=1`.
+    Remark that the elements are only non-zero for units in the corresponding treatment group :math:`\mathrm{g}` and control group :math:`C^{(\cdot)}`, as :math:`1-G^{\mathrm{g}}=C^{(\cdot)}` if :math:`\max(G^{\mathrm{g}}, C^{(\cdot)})=1`.