spatial-computing/air-quality-prediction-scala

prisms-air-quality-modeling

In this project, we build an accurate fine-scale air quality prediction model. The approach is described in the paper "Mining Public Datasets for Modeling Intra-City PM2.5 Concentrations at a Fine Spatial Resolution".

Data Source

Air Quality Data

We collect air quality data, including O3, PM2.5, PM10, CO, NO2, and SO2 concentration/AQI observations, from the monitoring stations in Los Angeles County through the EPA's AirNow web service. The air quality table (los_angeles_air_quality) has been initialized in the JonSnow database (SQL code). We query the web service automatically every hour (Python code) using crontab (Appendix II).

Weather Data

We collect meteorological data through the Dark Sky API. The weather table (los_angeles_meteorology) has been initialized in the JonSnow database (SQL code). We query the web service automatically every hour for a given location (or for the sensor locations) (Python code), in the same way as for the air quality data. We can also query historical data between a given start time and end time (Python code).
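The time-range query can be thought of as one request per hour between the two times. A minimal sketch of generating those hourly timestamps is shown here; the function name `hourly_timestamps` is hypothetical and not part of this repository, and the actual request to the weather service is deliberately left out.

```python
from datetime import datetime, timedelta


def hourly_timestamps(start, end):
    """Return every whole-hour datetime between start and end, inclusive.

    Hypothetical helper for illustration; not taken from the repository.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    # Snap the start up to the next whole hour if it is not already on one.
    current = start.replace(minute=0, second=0, microsecond=0)
    if current < start:
        current += timedelta(hours=1)
    stamps = []
    while current <= end:
        stamps.append(current)
        current += timedelta(hours=1)
    return stamps


if __name__ == "__main__":
    # Each timestamp would correspond to one query for one location.
    for ts in hourly_timestamps(datetime(2018, 1, 1, 0, 30),
                                datetime(2018, 1, 1, 3, 0)):
        print(ts.isoformat())
```

Each returned timestamp would then be paired with a location and passed to whatever function performs the actual query.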

Geographic Data

We use OpenStreetMap to generate geographic features for our model. For a given location, we create buffers around it (by default from 100 m to 3000 m at 100 m intervals) and compute the intersected area, length, or count between each buffer and the various geographic categories in the OpenStreetMap data (see figure below). (Python code)
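To illustrate the buffer idea, the sketch below builds the default list of radii and computes the simplest of the three feature types, the count of point features inside each buffer. It is a simplified illustration, not the repository's implementation: it assumes coordinates are already projected to meters, the function names are hypothetical, and the area and length features would need a geometry library or a spatial database.

```python
import math


def buffer_radii(start=100, stop=3000, step=100):
    """Buffer radii in meters: 100, 200, ..., 3000 by default."""
    return list(range(start, stop + 1, step))


def count_points_in_buffers(center, points, radii):
    """Count the point features that fall inside each circular buffer.

    center and points are (x, y) pairs in a projected coordinate system
    whose unit is meters (an assumption of this sketch).
    """
    cx, cy = center
    distances = [math.hypot(x - cx, y - cy) for x, y in points]
    return {r: sum(1 for d in distances if d <= r) for r in radii}


if __name__ == "__main__":
    # Made-up point features of one geographic category around a location.
    features = [(50, 0), (0, 150), (2000, 2000)]
    counts = count_points_in_buffers((0, 0), features, buffer_radii())
    for radius in (100, 200, 3000):
        print(radius, counts[radius])
```

Repeating this for every geographic category gives one feature per (category, radius) pair for the location.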

ScreenShot

Other data sources

Purple Air

We also collect data from Purple Air. Each sensor updates its air quality data roughly every minute, and we query the Purple Air web service for PM2.5, PM10, PM1, temperature, and humidity. Each machine has two channels (A and B) at the same location, so that if one channel produces noisy readings, the other can still work properly. Each unique "sensor" has three ID numbers:

  • id - Each sensor (channel A or B) has a unique id.
  • sensor_id - Each sensor (channel A or B) has a unique sensor_id.
  • parent_id - A channel A sensor has a unique parent_id; a null parent_id indicates a channel B sensor. If the id of a channel B sensor equals the parent_id of a channel A sensor, the two sensors share the same machine and location.
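The pairing rule above can be sketched as a lookup from id to sensor. This is only an illustration that follows the convention exactly as described in this README; the record layout (a dict with `id`, `sensor_id`, and `parent_id` keys) and the function name are assumptions, so adjust them to match the actual table.

```python
def pair_channels(sensors):
    """Pair the two channels that share a machine.

    sensors is a list of dicts with 'id', 'sensor_id', and 'parent_id'
    keys (an assumed layout). Following the description in this README,
    a record with parent_id None is treated as channel B, and a record
    whose parent_id equals a channel B id is treated as channel A.
    Returns a list of (channel_a, channel_b) tuples.
    """
    channel_b_by_id = {s["id"]: s for s in sensors if s["parent_id"] is None}
    pairs = []
    for s in sensors:
        parent = s["parent_id"]
        if parent is not None and parent in channel_b_by_id:
            pairs.append((s, channel_b_by_id[parent]))
    return pairs


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        {"id": 1, "sensor_id": "aa", "parent_id": 2},
        {"id": 2, "sensor_id": "bb", "parent_id": None},
        {"id": 3, "sensor_id": "cc", "parent_id": None},  # no partner found
    ]
    for a, b in pair_channels(records):
        print(a["sensor_id"], "pairs with", b["sensor_id"])
```

Records without a matching partner are simply left out, which makes it easy to spot machines where one channel is missing.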

Fishnet Data

A grid of around 3000 points covering Los Angeles County, used for fine-scale prediction.
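As a rough illustration of what a fishnet is, the sketch below generates the cell-center points of a regular grid over a bounding box. It is not how the repository's fishnet was produced: the function name and the bounding box in the usage example are hypothetical, and a real fishnet for the county would also be clipped to the county boundary.

```python
def fishnet_points(min_lon, min_lat, max_lon, max_lat, n_cols, n_rows):
    """Return the (lon, lat) cell centers of a regular n_cols x n_rows grid.

    Hypothetical helper for illustration; no clipping to a boundary.
    """
    if n_cols < 1 or n_rows < 1:
        raise ValueError("n_cols and n_rows must be at least 1")
    dx = (max_lon - min_lon) / n_cols
    dy = (max_lat - min_lat) / n_rows
    return [
        (min_lon + (col + 0.5) * dx, min_lat + (row + 0.5) * dy)
        for row in range(n_rows)
        for col in range(n_cols)
    ]


if __name__ == "__main__":
    # A made-up bounding box; 60 x 50 cells gives 3000 grid points.
    points = fishnet_points(0.0, 0.0, 6.0, 5.0, 60, 50)
    print(len(points), points[0])
```

Each grid point can then be treated like a sensor location when generating geographic features and predictions.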

Algorithm

Edit configuration in config.json.

High Level Architecture

ScreenShot

Model Evaluation

Fishnet Prediction

Run FishnetPrediction.scala to generate the prediction results for the fishnet, either for the current time or for a given time range.

Appendix

I. Access JonSnow Database

You need a username and password for both the JonSnow server and the database.

  • Log in to the server with the server username and password, forwarding a local port to the database port
ssh -L [your local port]:localhost:5432 [your username]@jonsnow.usc.edu
  • Use Postico (macOS only) or pgAdmin to log in with the database username and password, as shown in the figure below. Set [port] to [your local port].

ScreenShot

II. crontab

  • List the current user's crontab entries
crontab -l
  • Edit the current user's crontab file
crontab -e
  • Restart the cron service after updating the crontab
sudo /bin/systemctl restart crond.service
