-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Elex 1235 create default aggregates #53
Conversation
…lections). Also can now aggregate lower than district for districtwide elections
…valid for state house elections
src/elexmodel/client.py
Outdated
@@ -142,7 +160,8 @@ def get_estimates( | |||
column_values = current_data[0] | |||
current_data = pd.DataFrame(current_data[1:], columns=column_values) | |||
features = kwargs.get("features", []) | |||
aggregates = kwargs.get("aggregates", ["postal_code", "unit"]) | |||
default_aggregates = self.get_default_aggregates(office) | |||
aggregates = kwargs.get("aggregates", default_aggregates) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you feel about doing this in one line like:
aggregates = kwargs.get("aggregates", self.get_default_aggregates(office))
The reason I'm suggesting that is that if we have a new office that we want to include, we don't have to make a code change to add it to the defaults list, we can just include the aggregates arg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
…ed to get it to break if no aggregates are passed
Description
We now set the default aggregates dynamically based on whether the election is statewide, in which case the default aggregate is
postal_code
or districtwide (e.g. house elections), in which case the default aggregate is(postal_code, district)
.The second change is that we no longer hard-code
postal_code
as the main aggregate when generating the aggregate predictions. Previously we always passed justpostal_code
as the largest aggregate to the model, even when were were generatingcounty_classification
predictions for House races. This meant that we wouldn't be able to createcounty_classification
predictions for each state, district, but only for each state. This is now resolved, since we use the default aggregate for that also.Jira Ticket
https://arcpublishing.atlassian.net/browse/ELEX-1235
Test Steps
Added unit tests to run tox. To see the new functionality run this in
develop
:The
county_fips
predictions do not take into account the district, since we are aggregating overpostal_code, county_fips
instead ofpostal_code, district, county_fips
. If you run the same invocation in this branch, this will be resolved.Note
The model now forces the user to input
district
for district wide races (ie. when office id isH
,Y
orZ
since otherwise the model may break when dealing with unexpected units. This is because district was not in the passed in defaults (so we do not create a district column for the unexpected units) but it's expected as part of the default aggregates when creating the list to generate the aggregate predictions. Here is an example:This can likely be fixed, if we think that is necessary.