
Python x GCP x dbt x Power BI project. I built a data warehouse in Google BigQuery from data I scraped from the websites of several chocolate makers and distributors in France. The data was then transformed in dbt and visualized in a Power BI dashboard.


MargotMarchais/Chocolate-e-commerce


Market study | Chocolate e-commerce in France

Executive summary: My goal for this project was to collect data from various chocolate makers and distributors in France and to build a comprehensive dataset about the French online chocolate market.

Methodology:

  • Web scraping: I scraped product data from the chocolate sections of several French e-commerce websites, using the Python scrapy and requests libraries.
  • Python: I built an additional script to:
    • apply minor transformations (data cleaning) to the .csv files produced by the web scraping
    • automatically load the resulting dataframes into BigQuery using bigquery.Client
  • GCP BigQuery: I created an account and an empty project in GCP.
  • dbt: I created a dbt Cloud project connected to my BigQuery project. In this dbt project, I created 3 layers:
    • Bronze (staging): raw data, with few modifications.
    • Silver (transformations): the raw data with some transformations (new columns, filters, etc.).
    • Gold (final): final datasets that are used for analysis.
    Thanks to dbt, I could 'export' the SQL views and tables to GCP BigQuery. I also set up the data lineage, data quality (QoD) tests and documentation. All dbt changes were versioned through the dbt-GitHub integration.
  • Power BI: Finally, I connected my Gold datasets to Power BI to create a visual overview of the market.
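To make the cleaning and loading step more concrete, here is a minimal sketch of what such a script could look like. It is not the project's actual script: the column names (name, brand, price), the helper names clean_price, clean_rows and load_to_bigquery, and the price format are all assumptions made for illustration. Only bigquery.Client and its load_table_from_dataframe method come from the google-cloud-bigquery library, which also needs pandas and pyarrow installed for this call.

```python
import csv
import io
import re


def clean_price(raw):
    """Turn a scraped price string such as '12,90 €' into a float, or None."""
    if raw is None:
        return None
    # Keep digits and separators only, then use '.' as the decimal mark.
    # Deliberately simple: values with thousands separators come back as None.
    text = re.sub(r"[^\d,.]", "", raw).replace(",", ".")
    try:
        return float(text)
    except ValueError:
        return None


def clean_rows(csv_text):
    """Strip whitespace, parse prices, drop unnamed and duplicate products."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get("name") or "").strip()
        brand = (row.get("brand") or "").strip()
        if not name:
            continue  # a product without a name is unusable
        key = (name.lower(), brand.lower())
        if key in seen:
            continue  # same product scraped twice
        seen.add(key)
        cleaned.append(
            {"name": name, "brand": brand, "price_eur": clean_price(row.get("price"))}
        )
    return cleaned


def load_to_bigquery(rows, table_id):
    """Load cleaned rows into a BigQuery table, replacing its content.

    table_id is a hypothetical 'project.dataset.table' string.
    """
    # Imported lazily so the cleaning helpers work without these packages.
    import pandas as pd
    from google.cloud import bigquery

    client = bigquery.Client()  # uses the default GCP credentials and project
    job_config = bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE")
    job = client.load_table_from_dataframe(
        pd.DataFrame(rows), table_id, job_config=job_config
    )
    job.result()  # wait for the load job to finish
```

With this layout, the cleaning functions can be checked locally on a small .csv sample before anything is sent to BigQuery, and WRITE_TRUNCATE makes each run replace the previous raw table rather than append to it.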

Final output

  • A comprehensive dataset about the French online chocolate market
  • A visual Power BI dashboard

Go further: To learn more about the project, you can read my non-technical article here: https://margot-marchais-maurice.webflow.io/chocolate-french-market

Technical learnings: I did this project to acquire new skills: building a data warehouse in GCP BigQuery, automatically feeding this DWH with scraped data through a Python script, and learning how to use dbt (data transformations, tests and docs generation). It also let me refresh my web scraping skills (scrapy and requests libraries).
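As an illustration of the scraping side, here is a small sketch of fetching a product listing page with requests and extracting name/price pairs. Everything specific in it is assumed: the markup (elements with the classes product, name and price), the ProductParser, parse_products and fetch_products names, and the URL passed by the caller. The project itself used scrapy and requests; the parsing here uses the standard library html.parser only to keep the example self-contained.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collect name/price pairs from hypothetical product-card markup."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if css_class == "product":
            # A new product card starts: open an empty record.
            self.products.append({"name": "", "price": ""})
        elif css_class in ("name", "price") and self.products:
            self._field = css_class

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field:
            self.products[-1][self._field] += data.strip()


def parse_products(html):
    """Return a list of {'name': ..., 'price': ...} dicts found in the page."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.products


def fetch_products(url):
    """Download one listing page and parse it (requires the requests package)."""
    import requests  # third-party, imported lazily

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return parse_products(response.text)
```

Keeping the parsing separate from the download means the extraction logic can be tested on a saved HTML snippet without hitting the website.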

Dashboard preview:

[Screenshots: 2024-05-06_11h58_36, 2024-05-06_11h58_45, 2024-05-06_12h39_09]

Brand positioning:

[Screenshots: 2024-05-06_12h39_36, 2024-05-06_11h59_27]
