- Written by: Jasper Ginn
- Date: 27-11-2015
- Email: j.h.ginn[at]cdh.leidenuniv.nl
If you have any questions or feedback, please contact me at: j.h.ginn[at]cdh.leidenuniv.nl
This repository contains a Python script to convert on-demand export data to a PostgreSQL database. It also contains an R script showing how to connect to the database. More scripts will be added later as I spend more time analyzing the data.
You can find more information on the Python conversion script in the convert_ondemand folder.
If you look at the headers supplied in the .html files, you will see that some of them contain header specifications that are illegal in, for example, MySQL. In addition, PostgreSQL can import CSV files fairly painlessly. Of course, you can choose not to use these header files and use another database instead.
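As a rough illustration of the CSV import mentioned above, the sketch below builds a PostgreSQL `COPY ... FROM STDIN` statement and passes it to psycopg2's `cursor.copy_expert`. It is not the conversion script from this repository; the helper names `build_copy_statement` and `load_csv`, and the table, column, database and file names in the usage comment, are made up for the example.

```python
def build_copy_statement(table, columns):
    """Return a COPY ... FROM STDIN statement for a CSV file with a header row."""
    # Double-quote identifiers (doubling any embedded quotes) so that
    # unusual column names are passed to PostgreSQL unchanged.
    quoted = ", ".join('"{}"'.format(c.replace('"', '""')) for c in columns)
    return 'COPY "{}" ({}) FROM STDIN WITH (FORMAT csv, HEADER true)'.format(
        table.replace('"', '""'), quoted)


def load_csv(cursor, table, columns, csv_file):
    """Stream an open CSV file into an existing table via cursor.copy_expert."""
    cursor.copy_expert(build_copy_statement(table, columns), csv_file)


# Hypothetical usage (assumes a running server and an existing table):
#
#   import psycopg2
#   conn = psycopg2.connect(dbname="ondemand", user="postgres")
#   with conn, conn.cursor() as cur, open("example.csv") as f:
#       load_csv(cur, "example_table", ["user_id", "grade"], f)
```

The target table has to exist before the import, with columns matching the ones listed in the statement.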
To install PostgreSQL, please refer to the official documentation.
Then, follow the 'first steps' tutorial.
Python version: 2.7.9
The Python script to convert the data depends on the following modules:
- BeautifulSoup 4 (tested with version == 4.3.2)
- psycopg2 (tested with version == 2.6.1)
You can install these modules with easy_install or pip. To install pip, please see the pip installation instructions. (NB: To install psycopg2 on Ubuntu, you first need to run: sudo apt-get install python-dev)
Alternatively, you can install Anaconda.
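To check that both dependencies are available before running the conversion, something like the following sketch can be used. It is not part of this repository, and the function name `missing_modules` is made up for the example. Note that BeautifulSoup 4 is imported under the name `bs4`.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names of the two dependencies listed above.
    absent = missing_modules(["bs4", "psycopg2"])
    if absent:
        print("Missing modules: " + ", ".join(absent))
    else:
        print("All dependencies are importable.")
```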
The R script depends on the RPostgreSQL package. Installation instructions can be found in the script.