FINA4350

Predicting Rate Hikes with FOMC and FED data - an NLP and ML approach

About

This FINA4350 project predicts rate hikes from the text of official sources: FOMC statements, FOMC minutes, FED speeches and FED testimonies. We apply NLP (Natural Language Processing) techniques and ML (Machine Learning) models to this data to train the prediction models.

Because our aim is to predict rate hikes, all textual data used for a prediction were released before the rate hike was announced. During merging, we built 3 datasets:

Rate Hike with FOMC statements and minutes
Rate Hike with FED speeches and testimonies
Rate Hike with FOMC statements, FOMC minutes, FED speeches and FED testimonies (Full)
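
The timing rule described above (only text released before the announcement is used) could be enforced at merge time roughly as follows. This is a minimal sketch, not the repository's merging code: the function name `texts_before`, the tuple layout, and the sample dates and strings are illustrative assumptions.

```python
from datetime import date


def texts_before(documents, announcement_date):
    """Return the texts of documents released strictly before the announcement.

    `documents` is a list of (release_date, text) tuples; this layout is a
    hypothetical stand-in for the project's preprocessed data.
    """
    return [text for released, text in documents if released < announcement_date]


# Illustrative placeholder records, not real FOMC or FED data.
documents = [
    (date(2023, 1, 10), "speech A"),
    (date(2023, 1, 25), "minutes B"),
    (date(2023, 2, 3), "speech C"),
]

usable = texts_before(documents, announcement_date=date(2023, 2, 1))
print(usable)
```

Filtering on the release date keeps information published after the decision out of the features, which would otherwise leak the label into the training data.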

The data here were collected up to April 2023. If you wish to train the models with more data, you can run the Data Collection, Preprocessing and Merging programs to collect new data.

For NLP and ML, we applied several techniques and trained several models. For the machine learning models, we transform the text with a count vectoriser and term frequency–inverse document frequency (tf-idf), then train 3 ML models: Support Vector Machine (SVM), Random Forest Classifier (RF) and k-Nearest Neighbours (kNN). For Deep Learning, we combine pre-trained GloVe (Global Vectors for Word Representation) embeddings with a 3-layer LSTM model. We also trained another model that combines pre-trained BERT (Bidirectional Encoder Representations from Transformers) with a CNN model.
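
To make the tf-idf step concrete, here is a small hand-rolled sketch of the textbook weighting. The README does not say which implementation the project uses, so the function `tf_idf` and the toy sentences below are illustrative assumptions rather than the project's code.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Compute textbook tf-idf weights for a list of tokenised documents.

    tf  = term count in the document / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy sentences for illustration only, not taken from the project data.
corpus = [
    "inflation remains elevated".split(),
    "inflation has eased".split(),
]
weights = tf_idf(corpus)
print(weights[0])
```

A term that appears in every document ("inflation" here) gets weight zero, so the classifier is driven by the words that distinguish documents. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed idf and normalise each vector by default, so their numbers will differ from this plain formula.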

Results

The full dataset has the highest accuracy of the 3 datasets for most of the models, so the following results correspond to the full dataset.

Model	Accuracy (Testing Data)
SVM	65.22%
kNN	69.57%
RF	73.91%
LSTM - GloVe	67.39%
CNN - BERT	71.74%

How to use

First, download or clone the repository. Then install the required libraries with:

pip install -r requirements.txt

Data Collection:

cd "Data Collection"
python3 Minutes.py
python3 fomc_testimony.py
python3 speeches.py
python3 Statements.py

Please note that rate hike data are downloaded rather than collected.

Data Preprocessing and Topic Modelling:

Please return to the base directory (FINA4350) before running the following commands. Note that speeches_preprocessing.py must be run before speeches_topic_classification.py.

cd "Data Preprocessing"
python3 fomc_testimony_preprocessing.py
python3 minutes.py
python3 ratehikes_processing.py
python3 speeches_preprocessing.py
python3 speeches_topic_classification.py
python3 statement_preprocessed.py

Data Merging:

cd "Data Merging"
python3 merging_rate_FOMC.py
python3 merging_rate_speeches_testimony.py
python3 merging_rate_FOMC_speeches_testimony.py

Machine Learning:

Please return to the base directory (FINA4350) before running the following commands.

cd "Machine learning models"

For SVM, RF or kNN models:

SVM: SVM.py, RF: Random_forest.py, kNN: knn.py

E.g.

python3 SVM.py
Please choose the data : [data]

You can choose [data] from merging_rate_FOMC, merging_rate_speeches_testimony or merging_rate_FOMC_speeches_testimony, e.g. Please choose the data : merging_rate_FOMC

For Deep Learning models (LSTM with GloVe, CNN with BERT):

LSTM with GloVe: LSTM_Glove.py, CNN with BERT: CNN.py

python3 [file] [data]

[data] could be "../Data/Merged Data/rate_FOMC.csv", "../Data/Merged Data/rate_speeches_testimony.csv" or "../Data/Merged Data/rate_FOMC_speeches_testimony.csv"

E.g.

python3 LSTM_Glove.py "../Data/Merged Data/rate_FOMC.csv"
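
The quotes in the command above matter because the path contains a space; without them the shell would pass two separate arguments. As a rough illustration of how a script can read such a path, here is a minimal sketch using `argparse`. The function `parse_args` and the argument name `data` are assumptions for illustration and may not match how LSTM_Glove.py or CNN.py actually read their input.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse a single positional argument: the path to a merged CSV file."""
    parser = argparse.ArgumentParser(description="Train a model on a merged dataset.")
    parser.add_argument("data", type=Path, help="path to a merged CSV file")
    return parser.parse_args(argv)


# Passing a list simulates the command line; the quoted path arrives as one item.
args = parse_args(["../Data/Merged Data/rate_FOMC.csv"])
print(args.data.name)
```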

If you get an error saying that the program is not permitted to create a folder or file, please run the code with higher privileges (e.g. sudo), as the pre-trained GloVe model has to be downloaded to the computer.

thomasshin/FINA4350_NLP_in_Finance