In this project, we build an accurate fine-scale air quality prediction model. The accompanying paper is "Mining Public Datasets for Modeling Intra-City PM2.5 Concentrations at a Fine Spatial Resolution".
We are collecting air quality data, including O3, PM2.5, PM10, CO, NO2, and SO2 concentration/AQI observations, from the monitoring stations in Los Angeles County through the EPA's AirNow web service. The air quality table (los_angeles_air_quality) has been initialized in the JonSnow database (SQL code). We query the web service automatically every hour (Python code), scheduled with crontab (Appendix I).
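The hourly AirNow query can be sketched as follows. This is a minimal illustration, not the project's Python script: it assumes the AirNow API's current-observations-by-location endpoint and its JSON field names (`DateObserved`, `HourObserved`, `ReportingArea`, `ParameterName`, `AQI`), and the tuple layout returned by the hypothetical `to_rows` helper is a stand-in for the columns of `los_angeles_air_quality`. Network access and the database insert are left out so the parsing step can be checked offline.

```python
import json
from urllib.parse import urlencode

# Assumed AirNow endpoint for current observations near a point.
AIRNOW_URL = "https://www.airnowapi.org/aq/observation/latLong/current/"


def build_airnow_url(lat, lon, api_key, distance_miles=25):
    """Build a request URL for current observations near (lat, lon)."""
    params = {
        "format": "application/json",
        "latitude": lat,
        "longitude": lon,
        "distance": distance_miles,
        "API_KEY": api_key,
    }
    return AIRNOW_URL + "?" + urlencode(params)


def to_rows(payload):
    """Flatten an AirNow JSON response into (time, area, parameter, aqi) tuples.

    The tuple layout is hypothetical; map it onto the real table columns.
    """
    rows = []
    for obs in json.loads(payload):
        # DateObserved may carry trailing whitespace, so strip it first.
        timestamp = f"{obs['DateObserved'].strip()} {obs['HourObserved']:02d}:00"
        rows.append((timestamp, obs["ReportingArea"], obs["ParameterName"], obs["AQI"]))
    return rows
```

In the scheduled script, the URL would be fetched once per hour and each tuple inserted as one row.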
We are collecting meteorological data through the Dark Sky API. The weather table (los_angeles_meteorology) has been initialized in the JonSnow database (SQL code). In the same way, we query the web service automatically every hour at a given location (or at the sensor locations) (Python code). We can also query historical data over a specified time range (Python code).
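Querying over a time range can be sketched as generating one request per day. Assumptions: the URL follows Dark Sky's Time Machine request format (`/forecast/[key]/[latitude],[longitude],[time]`), each response is taken to cover the full local day containing `time`, and `daily_request_urls` is a hypothetical helper rather than the project's code. Passing datetimes in the location's local time zone keeps the day boundaries aligned.

```python
from datetime import datetime, timedelta, timezone

# Assumed Dark Sky Time Machine request format.
DARKSKY_URL = "https://api.darksky.net/forecast/{key}/{lat},{lon},{t}"


def daily_request_urls(key, lat, lon, start, end):
    """Return one request URL per day from the day of `start` through `end`."""
    urls = []
    # Snap to midnight (in start's time zone) so each request names a whole day.
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while day <= end:
        urls.append(DARKSKY_URL.format(key=key, lat=lat, lon=lon, t=int(day.timestamp())))
        day += timedelta(days=1)
    return urls


if __name__ == "__main__":
    pst = timezone(timedelta(hours=-8))  # fixed PST offset; ignores daylight saving
    for url in daily_request_urls("YOUR_KEY", 34.05, -118.24,
                                  datetime(2019, 1, 1, tzinfo=pst),
                                  datetime(2019, 1, 2, 23, tzinfo=pst)):
        print(url)
```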
We are using OpenStreetMap to generate geographic features for our model. For a given location, the code creates buffers around the location (by default 100 m to 3000 m at 100 m intervals) and computes the intersected area, length, or count between those buffers and various geographic categories in the OpenStreetMap data (see figure below). (Python code)
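The buffer idea can be illustrated for the simplest case: counting point features (for example, OpenStreetMap points of interest of one category) inside each buffer radius. This is a simplified sketch under stated assumptions: features are given as hypothetical `(category, lat, lon)` tuples, distance is great-circle (haversine) distance, and the area and length features for polygons and lines, which need a geometry library, are not shown.

```python
import math


def buffer_radii(start=100, stop=3000, step=100):
    """Buffer radii in meters: 100, 200, ..., 3000 by default (30 buffers)."""
    return list(range(start, stop + 1, step))


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6371008.8  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def buffer_counts(lat, lon, features, radii):
    """Count point features per (category, radius) buffer around (lat, lon).

    features: iterable of hypothetical (category, lat, lon) tuples.
    Buffers are nested, so a feature is counted in every buffer that contains it.
    """
    counts = {}
    for category, flat, flon in features:
        d = haversine_m(lat, lon, flat, flon)
        for r in radii:
            if d <= r:
                counts[(category, r)] = counts.get((category, r), 0) + 1
    return counts
```

Each `(category, radius)` count then becomes one column of the feature vector for that location.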
We are collecting data from PurpleAir. We query the PurpleAir web service, where each sensor updates its air quality data about every minute, including PM2.5, PM10, PM1, temperature, and humidity. Each device has two channels (A and B) at the same location. This two-channel design means that if one channel is noisy, the other can still work properly. Each unique "sensor" has three ID numbers:
- id: each sensor (channel A or B) has its own unique id.
- sensor_id: each sensor (channel A or B) has its own unique sensor_id.
- parent_id: each channel A sensor has a unique parent_id, and parent_id = null indicates a channel B sensor. If the id of a channel B sensor equals the parent_id of a channel A sensor, the two sensors belong to the same device and share the same location.
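The `parent_id` rule above can be applied to pair the two channels of each device. In this sketch, sensor records are hypothetical dicts carrying the three ID fields; a channel A record is identified by a non-null `parent_id`, which is looked up against the `id` of the channel B records (those whose `parent_id` is null).

```python
def pair_channels(sensors):
    """Pair channel A and channel B records from the same device.

    sensors: list of dicts with keys "id", "sensor_id", and "parent_id".
    Returns (channel_a, channel_b) tuples; channel_b is None when no
    matching channel B record exists.
    """
    # Channel B records have parent_id = null; index them by id.
    channel_b_by_id = {s["id"]: s for s in sensors if s["parent_id"] is None}
    pairs = []
    for s in sensors:
        if s["parent_id"] is not None:  # channel A record
            pairs.append((s, channel_b_by_id.get(s["parent_id"])))
    return pairs
```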
A grid of around 3,000 points over Los Angeles County is used for fine-scale prediction.
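A regular grid of this size can be sketched as cell centers over a bounding box. The bounding box below is an approximate, hypothetical extent for mainland Los Angeles County, and the 60-by-50 layout is chosen only because it yields 3,000 points; we do not know how the project's grid was actually generated. A real grid would also be clipped to the county boundary, which reduces the count.

```python
def make_grid(min_lon, min_lat, max_lon, max_lat, nx, ny):
    """Return nx * ny cell-center points (lon, lat) covering a bounding box."""
    dx = (max_lon - min_lon) / nx
    dy = (max_lat - min_lat) / ny
    return [
        (min_lon + (i + 0.5) * dx, min_lat + (j + 0.5) * dy)
        for j in range(ny)
        for i in range(nx)
    ]


# Approximate mainland LA County extent (hypothetical values, not clipped).
LA_BBOX = (-118.95, 33.70, -117.65, 34.82)
grid = make_grid(*LA_BBOX, nx=60, ny=50)  # 3,000 points before clipping
```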
Edit the configuration in config.json.
- Cross validation: run CrossValidation.scala to evaluate the model on its own dataset.
- Validation: run Validation.scala to evaluate the model on another dataset.
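A generic k-fold cross-validation split with an RMSE score is sketched below to illustrate the evaluation step. It is not a transcription of CrossValidation.scala, whose exact scheme is not described here; `k_fold_indices` and `rmse` are hypothetical helpers. For station data, one common variant groups the folds by station so that no station appears in both the training and test sets.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross validation over n samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))
```

Each fold trains on `train`, predicts on `test`, and the per-fold RMSE values are averaged.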
Run FishnetPrediction.scala to generate predictions for the fishnet, either for the current time or for a specified time range.
You need a username and password for both the JonSnow server and the database.
- Log in to the server with your server username and password, forwarding a local port to the database port (5432):
ssh -L [your local port]:localhost:5432 [your username]@jonsnow.usc.edu
- Use Postico (macOS only) or pgAdmin to log in with the database username and password shown in the figure below. Set [port] to [your local port].
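Once the SSH tunnel is open, scripts can connect through it the same way the GUI clients do. This sketch assumes the third-party `psycopg2` driver and uses placeholder credentials and a placeholder database name; `tunnel_conn_params` is a hypothetical helper. The key point is that the host is `localhost` and the port is your forwarded local port, not the server's hostname.

```python
def tunnel_conn_params(local_port, user, password, dbname):
    """Connection keywords for a PostgreSQL database reached through an SSH tunnel."""
    return {
        "host": "localhost",  # the tunnel listens locally
        "port": local_port,   # [your local port] from the ssh -L command
        "dbname": dbname,
        "user": user,
        "password": password,
    }


if __name__ == "__main__":
    import psycopg2  # third-party PostgreSQL driver; assumed to be installed

    params = tunnel_conn_params(5433, "your_db_user", "your_db_password", "your_db_name")
    with psycopg2.connect(**params) as conn, conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM los_angeles_air_quality")
        print(cur.fetchone()[0])
```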
- List the current crontab entries
crontab -l
- Edit the user crontab file
crontab -e
- Restart the cron service to apply the changes
sudo /bin/systemctl restart crond.service
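As an alternative to editing with `crontab -e`, an hourly job can be installed from a script. This sketch assumes a cron implementation whose `crontab` command reads a new table from standard input when given `-` (common implementations such as cronie do), and the script path in `HOURLY_JOB` is hypothetical. The schedule `0 * * * *` runs the job at minute 0 of every hour.

```python
import subprocess

# Hypothetical job: adjust the interpreter and script path to your setup.
HOURLY_JOB = "0 * * * * /usr/bin/python3 /home/user/air_quality/query_airnow.py"


def add_cron_line(existing, line):
    """Return crontab text with `line` appended unless it is already present."""
    lines = existing.splitlines()
    if line not in lines:
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # `crontab -l` exits non-zero when no crontab exists; stdout is then empty.
    current = subprocess.run(["crontab", "-l"], capture_output=True, text=True).stdout
    subprocess.run(["crontab", "-"], input=add_cron_line(current, HOURLY_JOB),
                   text=True, check=True)
```

Running it twice leaves a single copy of the job, since the line is only appended when missing.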