Le Wagon Data Science bootcamp final project

andreabrumana/UK_election

 
 

TL;DR

Try our app here to forecast the 2024 UK general election results!

Forecasting the results of the next UK election

In this project, we use a machine learning model to forecast the results of the next UK general election, expected in 2024. Because the census for England and Wales is conducted separately from those for Scotland and Northern Ireland, we have restricted the scope of this work to England and Wales.

We use data on demographics, polling, and prior election results to train a model at the constituency¹ level. The model estimates how each constituency would vote given user-defined poll ratings ahead of the next election.

Methods

Demographic data

Demographic data was taken from the UK census (source: House of Commons, link) at the constituency level for 2011 and 2021. This gives us the proportion of each constituency's population by:

  • Age bucket (10-year buckets from 0-9 years to 80+)
  • Broad ethnicity (Black, Asian, Mixed, Other, White)
  • Living status (home owner, private renter, social renter)

Census data is collected only once every 10 years. We therefore estimated the values for general election years (i.e. 2010, 2015, 2017, 2019, 2024) from the 2011 and 2021 figures using a linear interpolation function.
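The interpolation step can be sketched as below. This is a minimal illustration, not the project's code: the function name and the example shares are hypothetical, and since 2010 and 2024 fall outside the 2011 to 2021 range, the sketch assumes the same straight line is simply extended to those years.

```python
CENSUS_YEARS = (2011, 2021)
ELECTION_YEARS = (2010, 2015, 2017, 2019, 2024)


def interpolate_share(share_2011, share_2021, year):
    """Estimate a demographic share for `year` on the straight line
    through the 2011 and 2021 census values.

    Years outside 2011-2021 are extrapolated along the same line
    (an assumption; the original text does not say how they were handled).
    """
    span = CENSUS_YEARS[1] - CENSUS_YEARS[0]
    slope = (share_2021 - share_2011) / span
    return share_2011 + slope * (year - CENSUS_YEARS[0])


# Hypothetical constituency: 60% home owners in 2011, 55% in 2021.
estimates = {
    year: interpolate_share(0.60, 0.55, year) for year in ELECTION_YEARS
}
```

In practice, shares estimated this way for a group of categories (for example, all age buckets) may need to be clipped to the 0 to 1 range and rescaled so that they still sum to one.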

General election results data

Historical general election data was taken from the House of Commons Library (link). This gave the number of votes for each party, by constituency, in each previous general election.

Polling data

Polling data was taken from an aggregated polling dataset going back to 1943 (link).

Polling data was available at the national level only. We transformed it to constituency-level data by scaling each party's national poll figure by the ratio of that party's vote share in the constituency at the previous election to its national vote share at that election.
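As a rough sketch of this transformation, the function below scales a national poll by how each party performed locally relative to nationally at the previous election. The function name, the party labels and the numbers are hypothetical, and the optional renormalisation step is an assumption that the original text does not mention.

```python
def constituency_poll(national_poll, prev_local_share, prev_national_share,
                      normalise=False):
    """Scale each party's national poll figure by the ratio of its
    previous local vote share to its previous national vote share.

    All arguments are dicts mapping party name to a share in [0, 1].
    """
    scaled = {
        party: national_poll[party]
        * prev_local_share[party]
        / prev_national_share[party]
        for party in national_poll
    }
    if normalise:
        # Assumption: rescale so the constituency shares sum to one.
        total = sum(scaled.values())
        scaled = {party: value / total for party, value in scaled.items()}
    return scaled


# Hypothetical two-party example.
local_poll = constituency_poll(
    national_poll={"A": 0.45, "B": 0.25},
    prev_local_share={"A": 0.30, "B": 0.50},
    prev_national_share={"A": 0.40, "B": 0.40},
)
```

Here party A polls above its previous national share, but the constituency leaned towards party B last time, so the local estimate for A is pulled below its national figure.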

Machine learning models

We fitted multiple machine learning models, including Ridge Regression, K-Nearest Neighbours (KNN), Gradient Boosted Trees (XGB) and Support Vector Machine (SVC). All models were fitted using the scikit-learn package in Python. In this work, we present the output of the XGB model. A grid search was conducted to determine the best-fitting parameters for n_estimators, learning_rate and max_tree_depth.
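A grid search over these three parameters might look like the sketch below. It is illustrative only: it uses scikit-learn's GradientBoostingRegressor as a stand-in for the XGB model, treats the task as regression, runs on synthetic data, and uses made-up grid values. Note that in scikit-learn the tree depth parameter is named max_depth.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the constituency features and vote-share target.
X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=0)

# Hypothetical candidate values; the grid actually searched is not given.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=3,  # 3-fold cross-validation for each parameter combination
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)

best_params = search.best_params_
```

After fitting, search.best_estimator_ holds the model refitted on all the data with the best-scoring combination.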

Interface

The user interface was created using the Streamlit package; the UK map was created using the Altair package. The interface lets users adjust the pre-election poll ratings to see the impact on results at the constituency and national level.

You can use the interface here

Footnotes

  1. A constituency is a geographical area that elects a representative to serve in Parliament.

