
Add weights option to event_study #920


Open · wants to merge 32 commits into base: master

Conversation

@shapiromh (Contributor) commented May 28, 2025

This PR adds an (analytic) weights option to the did/event_study function and to all the DiD subclasses it currently calls (twfe, did2s, and saturated_twfe). The related issue is #919.

Changes in this PR:

  • The DiD2S class already accepted weights, so the new event_study function simply passes weights through to it. No new test was added for this change.
  • The TWFE class was opened up to accept weights, which are passed through to the underlying FEOLS call. One test was added on top of the existing TWFE event study tests.
  • The saturated event study was similarly opened up to accept weights. This required a substantive change so that its aggregation method uses the weights. New tests were added comparing it against fixest's sunab with weights.
  • The DID abstract base class now accepts weights as an argument. LPDID is the only DID subclass that still does not allow weights.
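The pass-through described in the bullets above can be sketched with toy classes. This is an illustration only, not the actual pyfixest code; the class and argument names only loosely mirror the real ones:

```python
# Toy sketch of the weights pass-through this PR implements (NOT the
# actual pyfixest code): event_study forwards an optional weights column
# name to whichever DiD subclass handles the requested estimator.
class TWFE:
    def __init__(self, weights=None):
        # Newly opened up: stored and later forwarded to the feols()-style call.
        self.weights = weights

class DID2S:
    def __init__(self, weights=None):
        # Already accepted weights before this PR.
        self.weights = weights

def event_study(estimator="twfe", weights=None):
    estimators = {"twfe": TWFE, "did2s": DID2S}
    return estimators[estimator](weights=weights)

fit = event_study(estimator="twfe", weights="pop")  # "pop": a hypothetical weights column
```

The point is only that the dispatcher itself needs no weighting logic; it forwards the column name and each subclass decides what to do with it.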

Matthew Shapiro and others added 25 commits May 17, 2025 11:09
@shapiromh (Contributor, Author)

I also added a fix for the saturated event study not being able to call summarize because of an out-of-date check in the summarize code.


codecov bot commented May 28, 2025

Codecov Report

Attention: Patch coverage is 90.00000% with 2 lines in your changes missing coverage. Please review.

Files with missing lines        Patch %   Lines
pyfixest/report/summarize.py    0.00%     2 Missing ⚠️

Flag            Coverage Δ
core-tests      78.36% <35.00%> (-0.11%) ⬇️
tests-extended  ?
tests-vs-r      48.84% <85.00%> (+33.42%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines            Coverage Δ
pyfixest/did/did.py                 85.00% <100.00%> (+0.78%) ⬆️
pyfixest/did/did2s.py               90.47% <ø> (+0.85%) ⬆️
pyfixest/did/estimation.py          98.70% <100.00%> (+18.43%) ⬆️
pyfixest/did/saturated_twfe.py      75.00% <100.00%> (+58.44%) ⬆️
pyfixest/did/twfe.py                80.76% <100.00%> (ø)
pyfixest/estimation/estimation.py   94.35% <ø> (ø)
pyfixest/report/summarize.py        87.40% <0.00%> (-0.33%) ⬇️

... and 11 files with indirect coverage changes


@s3alfisc (Member) commented May 28, 2025

Awesome, thanks so much! Will review this first thing tomorrow morning!

@s3alfisc (Member)

pre-commit.ci autofix

if weights is not None and use_weights:
    post = (
        df[df[treatment] == 1]
        .groupby([cohort, period])[weights]
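For context, here is a minimal, self-contained sketch of what the (truncated) snippet above appears to compute; the data and column names are made up for illustration:

```python
import pandas as pd

# Hypothetical panel: "treat" flags treated units, "w" holds the weights.
df = pd.DataFrame({
    "treat":  [1, 1, 1, 0],
    "cohort": [2000, 2000, 2001, 0],
    "period": [2000, 2001, 2001, 2001],
    "w":      [2.0, 3.0, 5.0, 1.0],
})

# Sum the weights over treated observations within each cohort x period
# cell -- the quantity the snippet's groupby appears to build for the
# later aggregation step.
post = df[df["treat"] == 1].groupby(["cohort", "period"])["w"].sum()
# post[(2000, 2000)] -> 2.0, post[(2000, 2001)] -> 3.0, post[(2001, 2001)] -> 5.0
```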
@s3alfisc (Member)

Wouldn't this be frequency weights? I.e. we sum up weights over all treated by cohort and period? 🤔

@shapiromh (Contributor, Author)

Hm, this is also the implementation in fixest. I didn't test, but I would guess the std. errors wouldn't match if we treated these as frequency weights.

@s3alfisc (Member) commented May 29, 2025

Based on the fixest docs: if use_weights = TRUE, the aggregation uses the weights, and the weights are treated as frequency weights:

#' @param use_weights Logical, default is TRUE. If the estimation was weighted,
#' whether the aggregation should take into account the weights. Basically if the
#' weights reflected frequency it should be TRUE.

So my understanding would be - use_weights = FALSE -> analytical weights, use_weights = TRUE -> frequency weights?
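To illustrate why this distinction matters, here is a small numpy sketch (my own illustration, not code from fixest or this PR) showing that a weighted mean is identical under both readings while the standard errors differ; the aweight formula follows the Stata-style rescaling of weights to sum to n:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=6)
w = np.array([1, 2, 3, 1, 2, 3])
n = len(y)

# The weighted point estimate is the same under either interpretation.
mean = np.average(y, weights=w)

# Frequency weights: row i stands for w_i identical observations,
# so the effective sample size is sum(w) = 12.
y_rep = np.repeat(y, w)
se_freq = y_rep.std(ddof=1) / np.sqrt(len(y_rep))

# Analytic weights (Stata-style): rescale the weights to sum to n,
# keeping the sample size at n = 6.
w_a = w * n / w.sum()
var_aw = (w_a * (y - mean) ** 2).sum() / (n - 1)
se_aw = np.sqrt(var_aw / n)

# se_freq and se_aw differ in general, even though the mean is identical.
```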

@shapiromh (Contributor, Author) left a comment

Thanks for the comments, let me make sure the weighted test covers the saturated twfe function and update the PR.

The "use_weights" issue is down to your taste, I think. I added it to match fixest, but I also don't see the purpose of a separate argument if the user already supplied weights.

I think the question of frequency versus analytic weights comes back to the discussion we were having about guessing which one fixest uses. At least, feeding the weights to feols as analytic matched the std. errors from fixest.


@s3alfisc (Member)

pre-commit.ci autofix

@s3alfisc (Member)

Now I think everything looks good. The only things I'd like to clarify before merging:

  • Does use_weights = True trigger fweights or aweights? If fweights, we need to update the docs to make this behavior explicit.
  • If fweights, should we maybe a) deviate from the fixest syntax and call the argument "weights_type" instead of "use_weights", and b) default to analytical weights? Since aweights are the default in feols(), I think that is the behavior users would expect.

@shapiromh (Contributor, Author)

Thanks, Alex. I checked and verified that the tests would fail if we were using "fweights" instead of "aweights". I had expected this, since you had mentioned that fixest isn't explicit about the types of weights it uses, but that it should be aweights.

As far as I can see, the sunab code only applies additional weighting in the way I already implemented here. I agree that both the docs and the usage make it fairly clear that the weights at this stage are used as "fweights", though I don't think this stage is ever seen by the underlying estimation routine that fits the model.

My best current guess is that the weights argument is used in both ways in fixest. It is used as aweights when running the core estimation routine. Then for aggregating from the cohort-period estimates to whatever the user specifies, it is practically treated as a frequency weight.

Finally, I realized that the "aggregate" function is not really user-facing in fixest. It is just called behind the scenes when the user specifies the "agg" argument while summarizing the estimation result (https://lrberge.github.io/fixest/reference/sunab.html). In the codebase I didn't find a case where use_weights = FALSE, so I am guessing the weights, if supplied in the initial model, are always used in the aggregation.

If my understanding of all this is correct, then I don't think there is a way to both match fixest's output and be consistent in pyfixest's use of the weights. If you want to match the output, then I guess the right thing is to remove the "use_weights" option and force the aggregation to use the weights as sampling weights. If matching fixest's output is not a requirement, the whole PR might need a rethink.
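To make the second stage concrete, here is a small pandas sketch (my illustration of the behavior described above, with made-up numbers) of aggregating cohort-by-period estimates into event-time effects, weighting each cell by its summed treated weights:

```python
import pandas as pd

# Hypothetical cohort x period coefficients plus the summed (analytic)
# weights of the treated units in each cell.
cells = pd.DataFrame({
    "cohort": [2000, 2000, 2001, 2001],
    "period": [2000, 2001, 2001, 2002],
    "coef":   [0.10, 0.30, 0.20, 0.40],
    "w_sum":  [10.0, 10.0, 30.0, 30.0],
})
cells["event_time"] = cells["period"] - cells["cohort"]

# Weighted average of cell coefficients within each event time -- at this
# point the summed weights act, in effect, like frequency weights.
cells["wc"] = cells["coef"] * cells["w_sum"]
sums = cells.groupby("event_time")[["wc", "w_sum"]].sum()
att = sums["wc"] / sums["w_sum"]
# att[0] -> 0.175, att[1] -> 0.375
```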

@s3alfisc (Member) commented Jun 1, 2025

Sorry for the delayed response, my parents were visiting over the weekend =)

My best current guess is that the weights argument is used in both ways in fixest. It is used as aweights when running the core estimation routine. Then for aggregating from the cohort-period estimates to whatever the user specifies, it is practically treated as a frequency weight.

This is also my understanding, and I wonder whether this leads to errors, because frequency-weight and analytic-weight SEs are different. I.e., when computing the vcov matrix, fixest uses "analytical" errors? There's also a slight chance that analytical and frequency errors have the exact same form and I still misunderstand something.

Maybe the best next step here would be to open a PR in the fixest repo and ask Laurent directly?

@s3alfisc (Member) commented Jun 1, 2025

Also linking to the Stata documentation that might be helpful / I need to take a closer look at later: link

@shapiromh (Contributor, Author)

Thanks, I'll also try to read through and get more clarity before pushing an update. I should have time again this weekend.

@shapiromh (Contributor, Author)

Sorry for disappearing on this one, Alex. Let me take a look this weekend.

@s3alfisc (Member)

Sorry for disappearing on this one, Alex. Let me take a look this weekend.

Hi, no worries at all! Thanks for the update!
