
script to convert Coursera's on-demand data to a postgresql database.

renspoesse/coursera-ondemand-exports

 
 


Convert Coursera on-demand export data to a PostgreSQL database

  • Written by: Jasper Ginn
  • Date: 27-11-2015
  • Email: j.h.ginn[at]cdh.leidenuniv.nl

Feedback/Questions

If you have any questions or feedback, please contact me at: j.h.ginn[at]cdh.leidenuniv.nl

Introduction

This repository contains a Python script that converts Coursera on-demand export data to a PostgreSQL database. It also contains an R script showing how to connect to the database. More scripts will be added later as I spend more time analyzing the data.

You can find more information on the Python conversion script in the convert_ondemand folder.

Why PostgreSQL?

If you look at the headers supplied in the .html files, you will see that some of them contain header specifications that are illegal in, for example, MySQL. Additionally, PostgreSQL can import CSV files fairly painlessly. Of course, you can choose not to use these header files and use another database instead.
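As a rough illustration of the CSV import mentioned above, the sketch below streams a CSV file into an existing table with PostgreSQL's COPY command through psycopg2's `copy_expert`. It is not taken from the conversion script in this repository. The helper names (`quote_ident`, `build_copy_sql`, `load_csv`), the table and column names, and the assumption that the CSV file has a header row and default CSV quoting are all hypothetical.

```python
def quote_ident(name):
    """Double-quote an identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_copy_sql(table, columns):
    """Build a COPY ... FROM STDIN statement for a CSV file with a header row."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return "COPY {0} ({1}) FROM STDIN WITH (FORMAT csv, HEADER true)".format(
        quote_ident(table), cols
    )


def load_csv(conn, table, columns, csv_path):
    """Stream csv_path into an existing table.

    conn is assumed to be an open psycopg2 connection, for example the
    result of psycopg2.connect(...). The table must already exist.
    """
    sql = build_copy_sql(table, columns)
    with open(csv_path, "r") as handle:
        cur = conn.cursor()
        cur.copy_expert(sql, handle)  # sends the file contents as STDIN
        cur.close()
    conn.commit()
```

A call might look like `load_csv(conn, "users", ["user_id", "country"], "users.csv")`, where the table, columns, and file name are placeholders for whatever your export contains.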

Dependencies

PostgreSQL

To install PostgreSQL, please refer to the official documentation.

Then, follow the 'first steps' tutorial.

Python

Python version: 2.7.9

The Python script that converts the data depends on the following modules:

  • BeautifulSoup 4 (tested with version == 4.3.2)
  • psycopg2 (tested with version == 2.6.1)

You can install these modules with easy_install or pip. If pip is not yet installed, please see the pip installation instructions. (NB: To install psycopg2 on Ubuntu, you first need to run: sudo apt-get install python-dev)

Alternatively, you can install Anaconda.
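To verify that the Python dependencies are available before running the conversion script, a small check along these lines may help. It is a sketch only and not part of this repository: the function names are hypothetical, and it assumes the usual import names `bs4` (BeautifulSoup 4) and `psycopg2`.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__, "unknown" if it has none,
    or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def check_dependencies(module_names=("bs4", "psycopg2")):
    """Map each module name to its installed version (None if missing)."""
    return dict((name, installed_version(name)) for name in module_names)


if __name__ == "__main__":
    for name, version in sorted(check_dependencies().items()):
        print("{0}: {1}".format(name, version if version else "NOT INSTALLED"))
```

Any module reported as NOT INSTALLED can then be installed with pip or easy_install as described above.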

R

The R script depends on the RPostgreSQL package. Installation instructions can be found in the script.
