We implement latent Dirichlet allocation (LDA) using scikit-learn on two different datasets:
- Text documents from the Associated Press found here and here in our project.
- Speech-to-text transcriptions of IFT6269 lecture recordings at Mila (Université de Montréal), found here.
We pre-process the data and store it as corpus.txt.
- Use train to train the model and save it as a pickle file.
- Use save to save the topics extracted from training.
- Text documents from the scribe notes of IFT6269, found here.
- Text documents from the Associated Press found here and here in our project.
Completed in lda. However, the code still needs to be fixed and cleaned up, since it was run in Colab.