Skip to content

MEDHAT-ALHADDAD/Pizza_Runner

Repository files navigation

Case Study #2 - Pizza Runner

Introduction

Did you know that over 115 million kilograms of pizza is consumed daily worldwide??? (Well according to Wikipedia anyway…)

Danny was scrolling through his Instagram feed when something really caught his eye - “80s Retro Styling and Pizza Is The Future!”

Danny was sold on the idea, but he knew that pizza alone was not going to help him get seed funding to expand his new Pizza Empire - so he had one more genius idea to combine with it - he was going to Uberize it - and so Pizza Runner was launched!

Danny started by recruiting “runners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Danny’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers.

Available Data

Because Danny had a few years of experience as a data scientist - he was very aware that data collection was going to be critical for his business’ growth.

He has prepared for us an entity relationship diagram of his database design but requires further assistance to clean his data and apply some basic calculations so he can better direct his runners and optimise Pizza Runner’s operations.

All datasets exist within the pizza_runner database schema - be sure to include this reference within your SQL scripts as you start exploring the data and answering the case study questions.

ERD.png


Table 1: runners

The runners table shows the registration_date for each new runner

runner_id registration_date
1 2021-01-01
2 2021-01-03
3 2021-01-08
4 2021-01-15

Table 2: customer_orders

Customer pizza orders are captured in the customer_orders table with 1 row for each individual pizza that is part of the order.

The pizza_id relates to the type of pizza which was ordered whilst the exclusions are the ingredient_id values which should be removed from the pizza and the extras are the ingredient_id values which need to be added to the pizza.

Note that customers can order multiple pizzas in a single order with varying exclusions and extras values even if the pizza is the same type!

The exclusions and extras columns will need to be cleaned up before using them in your queries.

order_id customer_id pizza_id exclusions extras order_time
1 101 1 2021-01-01 18:05:02
2 101 1 2021-01-01 19:00:52
3 102 1 2021-01-02 23:51:23
3 102 2 NaN 2021-01-02 23:51:23
4 103 1 4 2021-01-04 13:23:46
4 103 1 4 2021-01-04 13:23:46
4 103 2 4 2021-01-04 13:23:46
5 104 1 null 1 2021-01-08 21:00:29
6 101 2 null null 2021-01-08 21:03:13
7 105 2 null 1 2021-01-08 21:20:29
8 102 1 null null 2021-01-09 23:54:33
9 103 1 4 1, 5 2021-01-10 11:22:59
10 104 1 null null 2021-01-11 18:34:49
10 104 1 2, 6 1, 4 2021-01-11 18:34:49

Table 3: runner_orders

After each orders are received through the system - they are assigned to a runner - however not all orders are fully completed and can be cancelled by the restaurant or the customer.

The pickup_time is the timestamp at which the runner arrives at the Pizza Runner headquarters to pick up the freshly cooked pizzas. The distance and duration fields are related to how far and long the runner had to travel to deliver the order to the respective customer.

There are some known data issues with this table so be careful when using this in your queries - make sure to check the data types for each column in the schema SQL!

order_id runner_id pickup_time distance duration cancellation
1 1 2021-01-01 18:15:34 20km 32 minutes
2 1 2021-01-01 19:10:54 20km 27 minutes
3 1 2021-01-03 00:12:37 13.4km 20 mins NaN
4 2 2021-01-04 13:53:03 23.4 40 NaN
5 3 2021-01-08 21:10:57 10 15 NaN
6 3 null null null Restaurant Cancellation
7 2 2020-01-08 21:30:45 25km 25mins null
8 2 2020-01-10 00:15:02 23.4 km 15 minute null
9 2 null null null Customer Cancellation
10 1 2020-01-11 18:50:20 10km 10minutes null

Table 4: pizza_names

At the moment - Pizza Runner only has 2 pizzas available the Meat Lovers or Vegetarian!

pizza_id pizza_name
1 Meat Lovers
2 Vegetarian

Table 5: pizza_recipes

Each pizza_id has a standard set of toppings which are used as part of the pizza recipe.

pizza_id toppings
1 1, 2, 3, 4, 5, 6, 8, 10
2 4, 6, 7, 9, 11, 12

Table 6: pizza_toppings

This table contains all of the topping_name values with their corresponding topping_id value

topping_id topping_name
1 Bacon
2 BBQ Sauce
3 Beef
4 Cheese
5 Chicken
6 Mushrooms
7 Onions
8 Pepperoni
9 Peppers
10 Salami
11 Tomatoes
12 Tomato Sauce

PART I: Connecting to DB

1- Establish Connection

%load_ext sql
from sqlalchemy import create_engine
conn_text = 'postgresql://{}:{}@{}/{}'.format(
    pg_user, pg_password, pg_host, pg_db
)

%sql $conn_text
engine = create_engine(conn_text)

2-Create DB

%%sql

CREATE SCHEMA IF NOT EXISTS pizza_runner;
SET search_path = pizza_runner;

DROP TABLE IF EXISTS runners CASCADE;
CREATE TABLE runners (
  "runner_id" INTEGER,
  "registration_date" DATE
);
INSERT INTO runners
  ("runner_id", "registration_date")
VALUES
  (1, '2021-01-01'),
  (2, '2021-01-03'),
  (3, '2021-01-08'),
  (4, '2021-01-15');


DROP TABLE IF EXISTS customer_orders CASCADE;
CREATE TABLE customer_orders (
  "order_id" INTEGER,
  "customer_id" INTEGER,
  "pizza_id" INTEGER,
  "exclusions" VARCHAR(4),
  "extras" VARCHAR(4),
  "order_time" TIMESTAMP
);

INSERT INTO customer_orders
  ("order_id", "customer_id", "pizza_id", "exclusions", "extras", "order_time")
VALUES
  ('1', '101', '1', '', '', '2020-01-01 18:05:02'),
  ('2', '101', '1', '', '', '2020-01-01 19:00:52'),
  ('3', '102', '1', '', '', '2020-01-02 23:51:23'),
  ('3', '102', '2', '', NULL, '2020-01-02 23:51:23'),
  ('4', '103', '1', '4', '', '2020-01-04 13:23:46'),
  ('4', '103', '1', '4', '', '2020-01-04 13:23:46'),
  ('4', '103', '2', '4', '', '2020-01-04 13:23:46'),
  ('5', '104', '1', 'null', '1', '2020-01-08 21:00:29'),
  ('6', '101', '2', 'null', 'null', '2020-01-08 21:03:13'),
  ('7', '105', '2', 'null', '1', '2020-01-08 21:20:29'),
  ('8', '102', '1', 'null', 'null', '2020-01-09 23:54:33'),
  ('9', '103', '1', '4', '1, 5', '2020-01-10 11:22:59'),
  ('10', '104', '1', 'null', 'null', '2020-01-11 18:34:49'),
  ('10', '104', '1', '2, 6', '1, 4', '2020-01-11 18:34:49');


DROP TABLE IF EXISTS runner_orders CASCADE;
CREATE TABLE runner_orders (
  "order_id" INTEGER,
  "runner_id" INTEGER,
  "pickup_time" VARCHAR(19),
  "distance" VARCHAR(7),
  "duration" VARCHAR(10),
  "cancellation" VARCHAR(23)
);

INSERT INTO runner_orders
  ("order_id", "runner_id", "pickup_time", "distance", "duration", "cancellation")
VALUES
  ('1', '1', '2020-01-01 18:15:34', '20km', '32 minutes', ''),
  ('2', '1', '2020-01-01 19:10:54', '20km', '27 minutes', ''),
  ('3', '1', '2020-01-03 00:12:37', '13.4km', '20 mins', NULL),
  ('4', '2', '2020-01-04 13:53:03', '23.4', '40', NULL),
  ('5', '3', '2020-01-08 21:10:57', '10', '15', NULL),
  ('6', '3', 'null', 'null', 'null', 'Restaurant Cancellation'),
  ('7', '2', '2020-01-08 21:30:45', '25km', '25mins', 'null'),
  ('8', '2', '2020-01-10 00:15:02', '23.4 km', '15 minute', 'null'),
  ('9', '2', 'null', 'null', 'null', 'Customer Cancellation'),
  ('10', '1', '2020-01-11 18:50:20', '10km', '10minutes', 'null');


DROP TABLE IF EXISTS pizza_names CASCADE;
CREATE TABLE pizza_names (
  "pizza_id" INTEGER,
  "pizza_name" TEXT
);
INSERT INTO pizza_names
  ("pizza_id", "pizza_name")
VALUES
  (1, 'Meatlovers'),
  (2, 'Vegetarian');


DROP TABLE IF EXISTS pizza_recipes;
CREATE TABLE pizza_recipes (
  "pizza_id" INTEGER,
  "toppings" TEXT
);
INSERT INTO pizza_recipes
  ("pizza_id", "toppings")
VALUES
  (1, '1, 2, 3, 4, 5, 6, 8, 10'),
  (2, '4, 6, 7, 9, 11, 12');


DROP TABLE IF EXISTS pizza_toppings;
CREATE TABLE pizza_toppings (
  "topping_id" INTEGER,
  "topping_name" TEXT
);
INSERT INTO pizza_toppings
  ("topping_id", "topping_name")
VALUES
  (1, 'Bacon'),
  (2, 'BBQ Sauce'),
  (3, 'Beef'),
  (4, 'Cheese'),
  (5, 'Chicken'),
  (6, 'Mushrooms'),
  (7, 'Onions'),
  (8, 'Pepperoni'),
  (9, 'Peppers'),
  (10, 'Salami'),
  (11, 'Tomatoes'),
  (12, 'Tomato Sauce');
 * postgresql://medhat:***@localhost/medhat
Done.
Done.
Done.
Done.
4 rows affected.
Done.
Done.
14 rows affected.
Done.
Done.
10 rows affected.
Done.
Done.
2 rows affected.
Done.
Done.
2 rows affected.
Done.
Done.
12 rows affected.





[]

3- Checking DB created successfully

%%sql

SELECT
    runners.runner_id,
    runners.registration_date,
    COUNT(DISTINCT runner_orders.order_id) AS orders
FROM pizza_runner.runners
INNER JOIN pizza_runner.runner_orders
    ON runners.runner_id = runner_orders.runner_id
WHERE runner_orders.cancellation IS NOT NULL
GROUP BY
    runners.runner_id,
    runners.registration_date;
 * postgresql://medhat:***@localhost/medhat
3 rows affected.
runner_id registration_date orders
1 2021-01-01 3
2 2021-01-03 3
3 2021-01-08 1

PART II: Exploring DB

1- checking tables.columns and dtypes

%%sql

SELECT ordinal_position, table_name, column_name, data_type
FROM INFORMATION_SCHEMA.columns
WHERE table_schema = 'pizza_runner'
ORDER BY table_name, ordinal_position;
 * postgresql://medhat:***@localhost/medhat
140 rows affected.
ordinal_position table_name column_name data_type
1 customer_orders order_id integer
2 customer_orders customer_id integer
3 customer_orders pizza_id integer
4 customer_orders exclusions character varying
5 customer_orders extras character varying
6 customer_orders order_time timestamp without time zone
1 customer_orders_clean order_id integer
2 customer_orders_clean customer_id integer
3 customer_orders_clean pizza_id integer
4 customer_orders_clean exclusions character varying
5 customer_orders_clean extras character varying
6 customer_orders_clean order_time timestamp without time zone
1 date_dim date timestamp without time zone
1 exclusions_extras_dim exc_ext_id integer
2 exclusions_extras_dim concat_exclusions text
3 exclusions_extras_dim concat_extras text
4 exclusions_extras_dim exclusions_or_extras text
1 execlusions_inclusions_dim order_id numeric
2 execlusions_inclusions_dim pizza_id numeric
3 execlusions_inclusions_dim bacon boolean
4 execlusions_inclusions_dim bbq_sauce boolean
5 execlusions_inclusions_dim beef boolean
6 execlusions_inclusions_dim cheese boolean
7 execlusions_inclusions_dim chicken boolean
8 execlusions_inclusions_dim mushrooms boolean
9 execlusions_inclusions_dim onions boolean
10 execlusions_inclusions_dim pepperoni boolean
11 execlusions_inclusions_dim peppers boolean
12 execlusions_inclusions_dim salami boolean
13 execlusions_inclusions_dim tomatoes boolean
14 execlusions_inclusions_dim tomato_sauce boolean
15 execlusions_inclusions_dim concat_toppings text
16 execlusions_inclusions_dim exclusions_or_extras text
1 init_exclusions_extras_dim exc_ext_id integer
2 init_exclusions_extras_dim concat_exclusions text
3 init_exclusions_extras_dim concat_extras text
4 init_exclusions_extras_dim exclusions_or_extras text
1 init_order_fact order_id integer
2 init_order_fact customer_id integer
3 init_order_fact pizza_id integer
4 init_order_fact runner_id integer
5 init_order_fact exc_ext_id integer
6 init_order_fact order_date timestamp without time zone
7 init_order_fact pickup_time timestamp without time zone
8 init_order_fact duration interval
9 init_order_fact expected_arrival timestamp without time zone
10 init_order_fact distance numeric
11 init_order_fact exclusions character varying
12 init_order_fact extras character varying
13 init_order_fact cancellation character varying
1 order_fact order_id integer
2 order_fact customer_id integer
3 order_fact pizza_id integer
4 order_fact runner_id integer
5 order_fact exc_ext_id integer
6 order_fact order_date timestamp without time zone
7 order_fact pickup_time timestamp without time zone
8 order_fact duration interval
9 order_fact expected_arrival timestamp without time zone
10 order_fact distance numeric
11 order_fact exclusions character varying
12 order_fact extras character varying
13 order_fact cancellation character varying
14 order_fact rating integer
1 order_preparing_time order_id integer
2 order_preparing_time pizza_numbers bigint
3 order_preparing_time preparing_time interval
1 orders_speed order_id integer
2 orders_speed runner_id integer
3 orders_speed distance numeric
4 orders_speed duration interval
5 orders_speed speed (km per min) numeric
1 pizza_dim pizza_id integer
2 pizza_dim pizza_name text
3 pizza_dim bacon boolean
4 pizza_dim bbq_sauce boolean
5 pizza_dim beef boolean
6 pizza_dim cheese boolean
7 pizza_dim chicken boolean
8 pizza_dim mushrooms boolean
9 pizza_dim onions boolean
10 pizza_dim pepperoni boolean
11 pizza_dim peppers boolean
12 pizza_dim salami boolean
13 pizza_dim tomatoes boolean
14 pizza_dim tomato_sauce boolean
15 pizza_dim concat_toppings text
1 pizza_names pizza_id integer
2 pizza_names pizza_name text
1 pizza_recipes pizza_id integer
2 pizza_recipes toppings text
1 pizza_toppings topping_id integer
2 pizza_toppings topping_name text
1 runner_orders order_id integer
2 runner_orders runner_id integer
3 runner_orders pickup_time character varying
4 runner_orders distance character varying
5 runner_orders duration character varying
6 runner_orders cancellation character varying
1 runner_orders_clean order_id integer
2 runner_orders_clean runner_id integer
3 runner_orders_clean pickup_time timestamp without time zone
4 runner_orders_clean distance numeric
5 runner_orders_clean duration interval
6 runner_orders_clean cancellation character varying
1 runners runner_id integer
2 runners registration_date date
1 runners_clean runner_id integer
2 runners_clean registration_date date
1 test_al pizza_id integer
2 test_al pizza_name text
3 test_al bacon boolean
4 test_al bbq_sauce boolean
5 test_al beef boolean
6 test_al cheese boolean
7 test_al chicken boolean
8 test_al mushrooms boolean
9 test_al onions boolean
10 test_al pepperoni boolean
11 test_al peppers boolean
12 test_al salami boolean
13 test_al tomatoes boolean
14 test_al tomato_sauce boolean
15 test_al concat_toppings text
1 updated_exclusions_extras_dim order_id numeric
2 updated_exclusions_extras_dim pizza_id numeric
3 updated_exclusions_extras_dim bacon boolean
4 updated_exclusions_extras_dim bbq_sauce boolean
5 updated_exclusions_extras_dim beef boolean
6 updated_exclusions_extras_dim cheese boolean
7 updated_exclusions_extras_dim chicken boolean
8 updated_exclusions_extras_dim mushrooms boolean
9 updated_exclusions_extras_dim onions boolean
10 updated_exclusions_extras_dim pepperoni boolean
11 updated_exclusions_extras_dim peppers boolean
12 updated_exclusions_extras_dim salami boolean
13 updated_exclusions_extras_dim tomatoes boolean
14 updated_exclusions_extras_dim tomato_sauce boolean
15 updated_exclusions_extras_dim concat_toppings text
16 updated_exclusions_extras_dim exclusions_or_extras text
%%sql

-- explore missing and unique values for each table

SELECT 'exclusions: (' || string_agg(DISTINCT exclusions, ' | ') || ')' FROM customer_orders
UNION
SELECT 'extras: (' || string_agg(DISTINCT extras, ' | ') || ')' FROM customer_orders
UNION
SELECT 'pizza_name: (' || string_agg(DISTINCT pizza_name, ' | ') || ')' FROM pizza_names
UNION
SELECT 'pizza_recipes: (' || string_agg(DISTINCT toppings, ' | ') || ')' FROM pizza_recipes
UNION
SELECT 'cancellation: (' || string_agg(DISTINCT cancellation, ' | ') || ')' FROM runner_orders;
 * postgresql://medhat:***@localhost/medhat
5 rows affected.
?column?
exclusions: ( | 2, 6 | 4 | null)
cancellation: ( | Customer Cancellation | Restaurant Cancellation | null)
pizza_recipes: (1, 2, 3, 4, 5, 6, 8, 10 | 4, 6, 7, 9, 11, 12)
pizza_name: (Meatlovers | Vegetarian)
extras: ( | 1 | 1, 4 | 1, 5 | null)

Notes

we have 6 tables, wrangling and cleaning requirments

. RELATIONS AND TYPES:

  • customer_orders(exclusions, extras) --> trim, replace null and empty with null
  • runner_orders(pickup_time to timestamp & duration to interval, distance to int, cancellation nan, null, and empty to null)

. ADDED COLUMNS:

  • add expected_arrival column (runner_order)

2- Clean Tables

i) customer_orders

%%sql
DROP TABLE IF EXISTS customer_orders_clean;

CREATE TABLE customer_orders_clean AS (
    SELECT * FROM customer_orders
);

UPDATE customer_orders_clean
    SET exclusions= NULL WHERE exclusions IN ('null', '');
UPDATE customer_orders_clean
    SET extras= NULL WHERE extras IN ('null', '');
    

SELECT * FROM customer_orders_clean 
ORDER BY 1;
 * postgresql://medhat:***@localhost/medhat
Done.
14 rows affected.
9 rows affected.
9 rows affected.
14 rows affected.
order_id customer_id pizza_id exclusions extras order_time
1 101 1 None None 2020-01-01 18:05:02
2 101 1 None None 2020-01-01 19:00:52
3 102 1 None None 2020-01-02 23:51:23
3 102 2 None None 2020-01-02 23:51:23
4 103 1 4 None 2020-01-04 13:23:46
4 103 1 4 None 2020-01-04 13:23:46
4 103 2 4 None 2020-01-04 13:23:46
5 104 1 None 1 2020-01-08 21:00:29
6 101 2 None None 2020-01-08 21:03:13
7 105 2 None 1 2020-01-08 21:20:29
8 102 1 None None 2020-01-09 23:54:33
9 103 1 4 1, 5 2020-01-10 11:22:59
10 104 1 2, 6 1, 4 2020-01-11 18:34:49
10 104 1 None None 2020-01-11 18:34:49

ii) runner_orders

%%sql
DROP TABLE IF EXISTS runner_orders_clean;

CREATE TABLE IF NOT EXISTS runner_orders_clean AS (
    SELECT * FROM runner_orders
);

-- REPLACE 'null' with NULL
UPDATE runner_orders_clean
    SET pickup_time = NULL WHERE pickup_time = 'null';
UPDATE runner_orders_clean
    SET duration = NULL WHERE duration = 'null';
UPDATE runner_orders_clean
    SET distance = NULL WHERE distance = 'null';
UPDATE runner_orders_clean
    SET cancellation = NULL WHERE cancellation IN ('null', '');


UPDATE runner_orders_clean
    SET duration = REPLACE( REPLACE( REPLACE(
        TRIM(duration), 'minute', ''), 's', ''), 'min', '') || 'm',
    distance = REPLACE( TRIM(distance), 'km', '');
    

    
-- CAST COLUMNS
ALTER TABLE runner_orders_clean 
    ALTER COLUMN pickup_time TYPE TIMESTAMP USING pickup_time::timestamp without time zone,
    ALTER COLUMN distance TYPE NUMERIC USING distance::numeric,
    ALTER COLUMN duration TYPE INTERVAL USING duration::interval;
        
-- CHECK COLUMNS DTYPE
SELECT ordinal_position, table_name, column_name, data_type
FROM INFORMATION_SCHEMA.columns
WHERE table_name = 'runner_orders_clean';
 * postgresql://medhat:***@localhost/medhat
Done.
10 rows affected.
2 rows affected.
2 rows affected.
2 rows affected.
5 rows affected.
10 rows affected.
Done.
6 rows affected.
ordinal_position table_name column_name data_type
1 runner_orders_clean order_id integer
2 runner_orders_clean runner_id integer
3 runner_orders_clean pickup_time timestamp without time zone
4 runner_orders_clean distance numeric
5 runner_orders_clean duration interval
6 runner_orders_clean cancellation character varying
%%sql
SELECT * FROM runner_orders_clean
 * postgresql://medhat:***@localhost/medhat
10 rows affected.
order_id runner_id pickup_time distance duration cancellation
3 1 2020-01-03 00:12:37 13.4 0:20:00 None
4 2 2020-01-04 13:53:03 23.4 0:40:00 None
5 3 2020-01-08 21:10:57 10 0:15:00 None
6 3 None None None Restaurant Cancellation
9 2 None None None Customer Cancellation
1 1 2020-01-01 18:15:34 20 0:32:00 None
2 1 2020-01-01 19:10:54 20 0:27:00 None
7 2 2020-01-08 21:30:45 25 0:25:00 None
8 2 2020-01-10 00:15:02 23.4 0:15:00 None
10 1 2020-01-11 18:50:20 10 0:10:00 None

3- Modeling

To simplify Queries, let's merge customer and runner orders in one order_fact table.

  • order_fact [ order_id, customer_id, runner_id, pizza_id, pickup_time, order_date, duration, expected_arrival, distance, exclusions, extras, cancellation]

i) Create Order Fact Table

%%sql

-- CREATE order_fact
DROP TABLE IF EXISTS order_fact CASCADE;

CREATE TABLE IF NOT EXISTS order_fact AS
    SELECT
        customer_orders_clean.order_id,
        customer_id,
        pizza_id,
        runner_id,
        CASE WHEN exclusions IS NOT NULL OR extras IS NOT NULL
            THEN floor(random() * 1000000 + 1)::int END AS exc_ext_id,
        order_time AS order_date,
        pickup_time,
        duration,
        pickup_time + duration AS expected_arrival,
        distance,
        exclusions,
        extras,
        cancellation
    FROM customer_orders_clean
    JOIN runner_orders_clean
    USING(order_id)
    ORDER BY 1;
    
SELECT *
FROM order_fact
LIMIT 3;
 * postgresql://medhat:***@localhost/medhat
Done.
14 rows affected.
3 rows affected.
order_id customer_id pizza_id runner_id exc_ext_id order_date pickup_time duration expected_arrival distance exclusions extras cancellation
1 101 1 1 None 2020-01-01 18:05:02 2020-01-01 18:15:34 0:32:00 2020-01-01 18:47:34 20 None None None
2 101 1 1 None 2020-01-01 19:00:52 2020-01-01 19:10:54 0:27:00 2020-01-01 19:37:54 20 None None None
3 102 1 1 None 2020-01-02 23:51:23 2020-01-03 00:12:37 0:20:00 2020-01-03 00:32:37 13.4 None None None

ii) Create Date Dim

%%sql
DROP TABLE IF EXISTS date_dim;

CREATE TABLE IF NOT EXISTS date_dim AS
    SELECT 
        generate_series(min_date, max_date, '1 day'::INTERVAL) AS date
    FROM (
        SELECT MIN(order_date) AS min_date, MAX(order_date) AS max_date
        FROM order_fact
    ) AS date_intervals;
    
SELECT *
FROM date_dim
 * postgresql://medhat:***@localhost/medhat
Done.
11 rows affected.
11 rows affected.
date
2020-01-01 18:05:02
2020-01-02 18:05:02
2020-01-03 18:05:02
2020-01-04 18:05:02
2020-01-05 18:05:02
2020-01-06 18:05:02
2020-01-07 18:05:02
2020-01-08 18:05:02
2020-01-09 18:05:02
2020-01-10 18:05:02
2020-01-11 18:05:02

iii) spliting toppings to fixed-positional columns with concatenated desc, then denormalizing pizza_names, pizza_recipes, pizza_toppings in to single dim

%%sql

CREATE OR REPLACE VIEW pizza_r2 AS (
    SELECT 
        pizza_id,
        pizza_name,
        LOWER(REPLACE(STRING_AGG(topping_name, ','), ' ', '_')) AS toppings
    FROM pizza_names
    JOIN (
        SELECT
            pizza_id,
            UNNEST(REGEXP_MATCHES(toppings, '[0-9]{1,2}','g'))::INTEGER AS topping_id
        FROM pizza_recipes
    ) splitted_recipes
    USING(pizza_id)
    JOIN pizza_toppings
    USING(topping_id)
    GROUP BY 1, 2
);

SELECT * FROM pizza_r2;
 * postgresql://medhat:***@localhost/medhat
Done.
2 rows affected.
pizza_id pizza_name toppings
1 Meatlovers bbq_sauce,pepperoni,cheese,salami,chicken,bacon,mushrooms,beef
2 Vegetarian tomato_sauce,cheese,mushrooms,onions,peppers,tomatoes
%%sql

DROP TABLE IF EXISTS pizza_dim;

CREATE TABLE IF NOT EXISTS pizza_dim AS (
    SELECT pizza_id, pizza_name
    FROM pizza_names
);

-- 1) create toppings columns
DO $$
DECLARE
    col TEXT;

BEGIN
  -- Generate column dynamically
  FOR col IN
    SELECT toppings[i]
    FROM (
        SELECT
            string_to_array(
                STRING_AGG(topping_name, ','), ',') AS toppings,
            generate_series(1, 
                array_length(
                    string_to_array(
                        STRING_AGG(topping_name, ','), ','), 1)) AS i
        FROM pizza_toppings
    ) subquery
  LOOP
    EXECUTE 'ALTER TABLE pizza_dim ADD COLUMN ' || replace(col, ' ', '_') || ' BOOL';
  END LOOP;
        EXECUTE 'ALTER TABLE pizza_dim ADD COLUMN concat_toppings TEXT';
END $$;

SELECT * FROM pizza_dim;
 * postgresql://medhat:***@localhost/medhat
Done.
2 rows affected.
Done.
2 rows affected.
pizza_id pizza_name bacon bbq_sauce beef cheese chicken mushrooms onions pepperoni peppers salami tomatoes tomato_sauce concat_toppings
1 Meatlovers None None None None None None None None None None None None None
2 Vegetarian None None None None None None None None None None None None None
%%sql

-- FILL Boolean columns 
DO $$
DECLARE
    p_id INT;
    query_text TEXT;
    record_toppings TEXT;
    col_name TEXT;
    condition TEXT;
    column_names TEXT[] := (
        SELECT
            string_to_array(
                REPLACE(STRING_AGG(topping_name, ','), ' ', '_'), ',')
            AS toppings
        FROM pizza_toppings
    );
BEGIN
    -- Iterate over pizza_ids
    FOR p_id IN SELECT pizza_id FROM pizza_dim
    LOOP
        query_text := 'UPDATE pizza_dim SET ';

        -- Iterate over columns
        FOR col_name IN 
            SELECT col[i]
            FROM (
                SELECT column_names AS col,
                    generate_series(1, array_length(column_names, 1), 1) AS i
            ) subquery
        LOOP
            
            -- current pizza_toppings
            record_toppings := (
                SELECT toppings FROM pizza_r2 WHERE pizza_id=p_id
            );
            
            condition := CASE 
                WHEN record_toppings ILIKE '%' || col_name || '%' 
                THEN 'TRUE' 
                ELSE 'FALSE' END;
        
            query_text := query_text || col_name || '=' || condition || ', ';
        END LOOP;
        query_text := query_text || 'concat_toppings= '''|| record_toppings || '''';
        query_text := query_text || ' WHERE pizza_id = ' || p_id || ';';

        EXECUTE query_text;
    END LOOP;
END $$;

SELECT * FROM pizza_dim;
 * postgresql://medhat:***@localhost/medhat
Done.
2 rows affected.
pizza_id pizza_name bacon bbq_sauce beef cheese chicken mushrooms onions pepperoni peppers salami tomatoes tomato_sauce concat_toppings
1 Meatlovers True True True True True True False True False True False False bacon,bbq_sauce,beef,cheese,chicken,mushrooms,pepperoni,salami
2 Vegetarian False False False True False True True False True False True True cheese,mushrooms,onions,peppers,tomatoes,tomato_sauce

b) repeat steps for execlusions_inclusions_dim

%%sql
DROP TABLE IF EXISTS init_exclusions_extras_dim CASCADE;

CREATE TABLE IF NOT EXISTS init_exclusions_extras_dim (
    exc_ext_id INT,
    concat_exclusions TEXT,
    concat_extras TEXT,
    exclusions_or_extras TEXT
);

INSERT INTO init_exclusions_extras_dim 
    (exc_ext_id, concat_exclusions, concat_extras, exclusions_or_extras)
SELECT
    exc_ext_id,
    exclusions AS concat_exclusions,
    extras AS concat_extras,
    CASE 
        WHEN exclusions IS NOT NULL AND extras IS NOT NULL 
            THEN 'both'
        WHEN exclusions IS NOT NULL
            THEN 'exclusions'
        WHEN extras IS NOT NULL
            THEN 'extras'
        ELSE NULL END AS exclusions_or_extras       
FROM order_fact
WHERE exclusions IS NOT NULL OR extras IS NOT NULL
ORDER BY 1, 2;


SELECT * FROM init_exclusions_extras_dim;
 * postgresql://medhat:***@localhost/medhat
Done.
Done.
7 rows affected.
7 rows affected.
exc_ext_id concat_exclusions concat_extras exclusions_or_extras
87242 None 1 extras
433564 4 None exclusions
588917 4 None exclusions
638446 2, 6 1, 4 both
678602 None 1 extras
720427 4 1, 5 both
967343 4 None exclusions
%%sql

DROP TABLE IF EXISTS exclusions_extras_dim CASCADE;
CREATE TABLE IF NOT EXISTS exclusions_extras_dim AS (
    SELECT
        exc_ext_id,
        (
            SELECT LOWER(REPLACE(STRING_AGG(topping_name, ','), ' ', '_'))
            FROM (
                SELECT DISTINCT unnest(string_to_array(concat_exclusions, ', '))::integer AS topping_id
                FROM init_exclusions_extras_dim
                WHERE exc_ext_id = e.exc_ext_id
            ) AS subquery
            JOIN pizza_toppings
            ON subquery.topping_id = pizza_toppings.topping_id
        ) AS concat_exclusions,
        (
            SELECT LOWER(REPLACE(STRING_AGG(topping_name, ','), ' ', '_'))
            FROM (
                SELECT DISTINCT unnest(string_to_array(concat_extras, ', '))::integer AS topping_id
                FROM init_exclusions_extras_dim
                WHERE exc_ext_id = e.exc_ext_id
            ) AS subquery
            JOIN pizza_toppings
            ON subquery.topping_id = pizza_toppings.topping_id
        ) AS concat_extras,
        exclusions_or_extras
    FROM init_exclusions_extras_dim AS e
);

SELECT * FROM exclusions_extras_dim;
 * postgresql://medhat:***@localhost/medhat
Done.
7 rows affected.
7 rows affected.
exc_ext_id concat_exclusions concat_extras exclusions_or_extras
87242 None bacon extras
433564 cheese None exclusions
588917 cheese None exclusions
638446 bbq_sauce,mushrooms bacon,cheese both
678602 None bacon extras
720427 cheese bacon,chicken both
967343 cheese None exclusions

Final Schema

modeled_erd.png

PART III: Reporting

1- Pizza Metrics

i) How many pizzas were ordered?

%%sql

SELECT count(*)
FROM order_fact;
 * postgresql://medhat:***@localhost/medhat
1 rows affected.
count
14

ii) How many unique customer orders were made?

%%sql

SELECT count(DISTINCT order_id)
FROM order_fact;
 * postgresql://medhat:***@localhost/medhat
1 rows affected.
count
10

iii) How many successful orders were delivered by each runner?

%%sql

SELECT runner_id, COUNT(DISTINCT order_id) AS successful_orders
FROM order_fact
WHERE cancellation IS NULL
GROUP BY 1;
 * postgresql://medhat:***@localhost/medhat
3 rows affected.
runner_id successful_orders
1 4
2 3
3 1

iv) How many of each type of pizza was delivered?

%%sql

SELECT pizza_name, count(*) AS delivered
FROM order_fact
JOIN pizza_names
USING(pizza_id)
WHERE cancellation IS NULL
GROUP BY 1
ORDER BY 2 DESC;
 * postgresql://medhat:***@localhost/medhat
2 rows affected.
pizza_name delivered
Meatlovers 9
Vegetarian 3

v) How many Vegetarian and Meatlovers were ordered by each customer?

%%sql

SELECT 
    customer_id, 
    SUM(CASE WHEN pizza_id = 1 THEN 1 ELSE 0 END) AS meat_lovers_count,
    SUM(CASE WHEN pizza_id = 2 THEN 1 ELSE 0 END) AS vegetarian_count
FROM order_fact
JOIN pizza_names
USING(pizza_id)
GROUP BY 1
ORDER BY 1;
 * postgresql://medhat:***@localhost/medhat
5 rows affected.
customer_id meat_lovers_count vegetarian_count
101 2 1
102 2 1
103 3 1
104 3 0
105 0 1

vi) What was the maximum number of pizzas delivered in a single order?

%%sql

SELECT order_id, count(*) AS max_pizzas_delivered
FROM order_fact
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1;
 * postgresql://medhat:***@localhost/medhat
1 rows affected.
order_id max_pizzas_delivered
4 3

vii) For each customer, how many delivered pizzas had at least 1 change and how many had no changes?

%%sql

SELECT 
    customer_id, 
    SUM(CASE WHEN exc_ext_id IS NOT NULL THEN 1 ELSE 0 END) AS changed_order,
    SUM(CASE WHEN exc_ext_id IS NULL THEN 1 ELSE 0 END) AS non_changed_order
FROM order_fact as o
LEFT JOIN exclusions_extras_dim as e
USING(exc_ext_id)
WHERE cancellation IS NULL
GROUP BY 1;
 * postgresql://medhat:***@localhost/medhat
5 rows affected.
customer_id changed_order non_changed_order
101 0 2
102 0 3
103 3 0
104 2 1
105 1 0

viii) How many pizzas were delivered that had both exclusions and extras?

%%sql

SELECT COUNT(DISTINCT exc_ext_id)
FROM order_fact
WHERE MOD(exc_ext_id::int, 10) = 0  AND cancellation IS NOT NULL
 * postgresql://medhat:***@localhost/medhat
1 rows affected.
count
0

ix) What was the total volume of pizzas ordered for each hour of the day?

%%sql

-- for hours of the day
/* WITH hours_of_day AS ( SELECT generate_series(1, 24, 1) AS hod)

SELECT 
    h.hod,
    COALESCE(count, 0) AS orders
FROM hours_of_day as h
LEFT JOIN (
    SELECT 
        EXTRACT(hour FROM order_date) AS hod,
        Count(*)
    FROM order_fact
    GROUP BY 1
) AS order_per_hour
USING(hod)
ORDER BY 1; */

SELECT 
    EXTRACT(hour FROM order_date) AS hod,
    Count(*)
FROM order_fact
GROUP BY 1
ORDER BY 1;
 * postgresql://medhat:***@localhost/medhat
6 rows affected.
hod count
11 1
13 3
18 3
19 1
21 3
23 3

x) What was the volume of orders for each day of the week?

%%sql

SELECT 
    TO_CHAR(order_date, 'Day') AS day_of_week,
    Count(*) AS order_count
FROM order_fact
GROUP BY day_of_week, DATE_PART('dow', order_date)
ORDER BY DATE_PART('dow', order_date);
 * postgresql://medhat:***@localhost/medhat
4 rows affected.
day_of_week order_count
Wednesday 5
Thursday 3
Friday 1
Saturday 5

2- Runner and Customer Experience

i) How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)

%%sql

SELECT 
    (date_trunc('week', registration_date) + INTERVAL '4 DAYS')::DATE,
    count(*)    
FROM runners
GROUP BY 1
ORDER BY 1;
 * postgresql://medhat:***@localhost/medhat
3 rows affected.
date count
2021-01-01 2
2021-01-08 1
2021-01-15 1

ii-a) What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?

%%sql

SELECT 
    runner_id,
    EXTRACT( MINUTES FROM AVG(pickup_time - order_date)) AS AVG_MIN
FROM order_fact
GROUP BY 1
ORDER BY 1;
 * postgresql://medhat:***@localhost/medhat
3 rows affected.
runner_id avg_min
1 15
2 23
3 10

ii-b) What is the average pick-up time in minutes for the orders?

%%sql

WITH distinct_orders AS (
    SELECT DISTINCT
        order_id,
        EXTRACT( EPOCH FROM (pickup_time - order_date)) / 60 AS pickup_time
    FROM order_fact
)

SELECT round(avg(pickup_time), 2) AS avg_pickup_time
FROM distinct_orders
 * postgresql://medhat:***@localhost/medhat
1 rows affected.
avg_pickup_time
15.98

iii) Is there any relationship between the number of pizzas and how long the order takes to prepare?

%%sql

-- assumption that pickup_time represent when the pizza is prepared

CREATE VIEW order_preparing_time AS (
    SELECT 
        order_id,
        count(*) AS pizza_numbers,
        MAX(pickup_time - order_date) AS preparing_time
    FROM order_fact
    GROUP BY 1
);

SELECT 
    pizza_numbers,
    round(avg(EXTRACT( EPOCH FROM preparing_time) / 60), 2) AS avg_prep_time
FROM order_preparing_time
GROUP BY 1
ORDER BY 1;
 * postgresql://medhat:***@localhost/medhat
Done.
3 rows affected.
pizza_numbers avg_prep_time
1 12.36
2 18.38
3 29.28
%%sql

WITH avg_prep_time AS (
    SELECT 
        pizza_numbers,
        avg(EXTRACT( EPOCH FROM preparing_time) / 60) AS avg_prep_time
    FROM order_preparing_time
    GROUP BY 1
    ORDER BY 1
)

SELECT ROUND((corr(pizza_numbers, avg_prep_time) * 100)::NUMERIC, 2) AS corr_percent
FROM avg_prep_time;
 * postgresql://medhat:***@localhost/medhat
1 rows affected.
corr_percent
98.64

There appear to be a strong corrolation between number of pizzas in order and preparing time; on avarage every new pizza add about 10 min to total_time.

iv) What was the average distance travelled for each customer?

%%sql

SELECT customer_id, ROUND(avg(distance), 2) AS avg_distance
FROM (
    SELECT DISTINCT order_id, customer_id, distance
    FROM order_fact
    
) AS uniq_orders
GROUP BY 1
ORDER BY 1;
 * postgresql://medhat:***@localhost/medhat
5 rows affected.
customer_id avg_distance
101 20.00
102 18.40
103 23.40
104 10.00
105 25.00

v) What was the difference between the longest and shortest delivery times for all orders?

%%sql

SELECT 
    DATE_part('min', MIN(duration)) AS "shortest_delivery (min)",
    DATE_part('min', MAX(duration)) AS "longest_delivery (min)",
    DATE_part('min', MAX(duration) - MIN(duration)) AS "duration_diff (min)"
FROM order_fact
 * postgresql://medhat:***@localhost/medhat
1 rows affected.
shortest_delivery (min) longest_delivery (min) duration_diff (min)
10.0 40.0 30.0

vi) What was the average speed for each runner for each delivery and do you notice any trend for these values?

%%sql

CREATE OR REPLACE VIEW orders_speed AS (
    SELECT DISTINCT
        order_id,
        runner_id,
        distance,
        duration,
        round(
            distance / (EXTRACT(EPOCH FROM duration) / 60)
            , 2) AS "speed (km per min)"
    FROM order_fact
);

SELECT *
FROM orders_speed
ORDER BY 1;
 * postgresql://medhat:***@localhost/medhat
Done.
10 rows affected.
order_id runner_id distance duration speed (km per min)
1 1 20 0:32:00 0.63
2 1 20 0:27:00 0.74
3 1 13.4 0:20:00 0.67
4 2 23.4 0:40:00 0.59
5 3 10 0:15:00 0.67
6 3 None None None
7 2 25 0:25:00 1.00
8 2 23.4 0:15:00 1.56
9 2 None None None
10 1 10 0:10:00 1.00
%%sql

SELECT runner_id, round(avg("speed (km per min)"), 2)
FROM orders_speed
GROUP BY 
1
ORDER BY 2 DESC;
 * postgresql://medhat:***@localhost/medhat
3 rows affected.
runner_id round
2 1.05
1 0.76
3 0.67

with an avarage of 1 km/min runner 2 seems the fastest on delivering orders; without consedration to

  • number of orders per each
  • time in day of the order

vii) What is the successful delivery percentage for each runner?

%%sql
WITH distinct_orders AS (
    SELECT DISTINCT order_id, runner_id, cancellation
    FROM order_fact
    ORDER BY 1
), runner_orders_val AS (
    SELECT
        runner_id,
        SUM(CASE WHEN cancellation IS NULL THEN 1 ELSE 0 END) 
            AS successful_orders,
        count(order_id) AS total_orders
    FROM distinct_orders
    GROUP BY 1
)


SELECT *, 
    ROUND(successful_orders * 1.0 / total_orders, 2) AS successful_ratio
FROM runner_orders_val
ORDER BY 1;
 * postgresql://medhat:***@localhost/medhat
3 rows affected.
runner_id successful_orders total_orders successful_ratio
1 4 4 1.00
2 3 4 0.75
3 1 2 0.50

3- Ingredient Optimisation

i) What are the standard ingredients for each pizza?

%%sql

SELECT
    pizza_name,
    INITCAP(REPLACE(REPLACE(concat_toppings, '_', ' '), ',', ', '))
        AS toppings
FROM pizza_dim;
 * postgresql://medhat:***@localhost/medhat
2 rows affected.
pizza_name toppings
Meatlovers Bacon, Bbq Sauce, Beef, Cheese, Chicken, Mushrooms, Pepperoni, Salami
Vegetarian Cheese, Mushrooms, Onions, Peppers, Tomatoes, Tomato Sauce

ii) What was the most commonly added extra?

%%sql
SELECT * FROM exclusions_extras_dim
 * postgresql://medhat:***@localhost/medhat
7 rows affected.
exc_ext_id concat_exclusions concat_extras exclusions_or_extras
87242 None bacon extras
433564 cheese None exclusions
588917 cheese None exclusions
638446 bbq_sauce,mushrooms bacon,cheese both
678602 None bacon extras
720427 cheese bacon,chicken both
967343 cheese None exclusions
%%sql

SELECT extras, count(*)
FROM (
    SELECT
        UNNEST(REGEXP_MATCHES(concat_extras, '[a-z]{1,50}','g')) AS extras
    FROM exclusions_extras_dim
) subquery
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1;
 * postgresql://medhat:***@localhost/medhat
1 rows affected.
extras count
bacon 4

iii) What was the most common exclusion?

%%sql

SELECT extras, count(*)
FROM (
    SELECT
        UNNEST(REGEXP_MATCHES(concat_exclusions, '[a-z]{1,50}','g')) AS extras
    FROM exclusions_extras_dim
) subquery
GROUP BY 1

ORDER BY 2 DESC
LIMIT 1;
 * postgresql://medhat:***@localhost/medhat
1 rows affected.
extras count
cheese 4

Who are these people? XD:)

iv) Generate an order item for each record in the customers_orders table in the format of one of the following:

Meat Lovers Meat Lovers - Exclude Beef Meat Lovers - Extra Bacon Meat Lovers - Exclude Cheese, Bacon - Extra Mushroom, Peppers

%%sql

SELECT
    order_id,
    order_date,
    pizza_name::TEXT ||
    CASE WHEN concat_exclusions IS NOT NULL
        THEN ' - Exclude ' || 
            INITCAP(REPLACE(
                REPLACE(e.concat_exclusions, ',', ', '), '_', ' ')) 
        ELSE '' END ||
    CASE WHEN concat_extras IS NOT NULL 
        THEN ' - Extra ' ||
        INITCAP(REPLACE(
            REPLACE(e.concat_extras, ',', ', '), '_', ' '))
        ELSE '' END AS order_item
FROM order_fact as o
JOIN pizza_dim as p
USING(pizza_id)
LEFT JOIN exclusions_extras_dim as e
USING(exc_ext_id)
ORDER BY 1;
 * postgresql://medhat:***@localhost/medhat
14 rows affected.
order_id order_date order_item
1 2020-01-01 18:05:02 Meatlovers
2 2020-01-01 19:00:52 Meatlovers
3 2020-01-02 23:51:23 Vegetarian
3 2020-01-02 23:51:23 Meatlovers
4 2020-01-04 13:23:46 Vegetarian - Exclude Cheese
4 2020-01-04 13:23:46 Meatlovers - Exclude Cheese
4 2020-01-04 13:23:46 Meatlovers - Exclude Cheese
5 2020-01-08 21:00:29 Meatlovers - Extra Bacon
6 2020-01-08 21:03:13 Vegetarian
7 2020-01-08 21:20:29 Vegetarian - Extra Bacon
8 2020-01-09 23:54:33 Meatlovers
9 2020-01-10 11:22:59 Meatlovers - Exclude Cheese - Extra Bacon, Chicken
10 2020-01-11 18:34:49 Meatlovers
10 2020-01-11 18:34:49 Meatlovers - Exclude Bbq Sauce, Mushrooms - Extra Bacon, Cheese

v) Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients For example: "Meat Lovers: 2xBacon, Beef, ... , Salami"

%%sql
WITH base AS (
    SELECT
        order_id,
        pizza_name,
        exc_ext_id,
        UNNEST(string_to_array(concat_toppings, ','))
    FROM order_fact
    JOIN pizza_dim
    USING(pizza_id)
    WHERE cancellation IS NULL
),
exclusions AS (
    SELECT
        order_id,
        pizza_name,
        exc_ext_id,
        UNNEST(string_to_array(concat_exclusions, ','))
    FROM order_fact
    JOIN pizza_dim
    USING(pizza_id)
    JOIN exclusions_extras_dim
    USING(exc_ext_id)
    WHERE cancellation IS NULL
),
extras AS (
    SELECT
        order_id,
        pizza_name,
        exc_ext_id, UNNEST(string_to_array(concat_extras, ','))
    FROM order_fact
    JOIN pizza_dim
    USING(pizza_id)
    JOIN exclusions_extras_dim
    USING(exc_ext_id)
    WHERE cancellation IS NULL
),
final_ing AS (
    SELECT * FROM base
    EXCEPT ALL
    SELECT * FROM exclusions
    UNION ALL
    SELECT * FROM extras
)

SELECT 
    order_id,
    pizza_name,
    pizza_name || ' ' ||
    STRING_AGG(
        CASE 
        WHEN count > 1 
            THEN count || 'X' || INITCAP(REPLACE(unnest, '_', ' '))
        ELSE INITCAP(REPLACE(unnest, '_', ' ')) END
    , ', ') AS order_item
FROM (
    SELECT order_id, pizza_name, exc_ext_id, unnest, count(*)
    FROM final_ing
    GROUP BY 1,2,3,4
    ORDER BY 1,2,3,4
) subquery
GROUP BY 1,2;

 * postgresql://medhat:***@localhost/medhat
10 rows affected.
order_id pizza_name order_item
1 Meatlovers Meatlovers Bacon, Bbq Sauce, Beef, Cheese, Chicken, Mushrooms, Pepperoni, Salami
2 Meatlovers Meatlovers Bacon, Bbq Sauce, Beef, Cheese, Chicken, Mushrooms, Pepperoni, Salami
3 Meatlovers Meatlovers Bacon, Bbq Sauce, Beef, Cheese, Chicken, Mushrooms, Pepperoni, Salami
3 Vegetarian Vegetarian Cheese, Mushrooms, Onions, Peppers, Tomato Sauce, Tomatoes
4 Meatlovers Meatlovers Bacon, Bbq Sauce, Beef, Chicken, Mushrooms, Pepperoni, Salami, Bacon, Bbq Sauce, Beef, Chicken, Mushrooms, Pepperoni, Salami
4 Vegetarian Vegetarian Mushrooms, Onions, Peppers, Tomato Sauce, Tomatoes
5 Meatlovers Meatlovers 2XBacon, Bbq Sauce, Beef, Cheese, Chicken, Mushrooms, Pepperoni, Salami
7 Vegetarian Vegetarian Bacon, Cheese, Mushrooms, Onions, Peppers, Tomato Sauce, Tomatoes
8 Meatlovers Meatlovers Bacon, Bbq Sauce, Beef, Cheese, Chicken, Mushrooms, Pepperoni, Salami
10 Meatlovers Meatlovers 2XBacon, Beef, 2XCheese, Chicken, Pepperoni, Salami, Bacon, Bbq Sauce, Beef, Cheese, Chicken, Mushrooms, Pepperoni, Salami

vi) What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?

%%sql
WITH base AS (
    SELECT
        exc_ext_id,
        UNNEST(string_to_array(concat_toppings, ','))
    FROM order_fact
    JOIN pizza_dim
    USING(pizza_id)
    WHERE cancellation IS NULL
),
exclusions AS (
    SELECT 
        exc_ext_id,
        UNNEST(string_to_array(concat_exclusions, ','))
    FROM order_fact
    JOIN exclusions_extras_dim
    USING(exc_ext_id)
    WHERE cancellation IS NULL
),
extras AS (
    SELECT 
        exc_ext_id, UNNEST(string_to_array(concat_extras, ','))
    FROM order_fact
    JOIN exclusions_extras_dim
    USING(exc_ext_id)
    WHERE cancellation IS NULL
),
final_ing AS (
    SELECT * FROM base
    EXCEPT ALL
    SELECT * FROM exclusions
    UNION ALL
    SELECT * FROM extras
)

SELECT unnest, count(*)
FROM final_ing
GROUP BY 1
ORDER BY 2 DESC;
 * postgresql://medhat:***@localhost/medhat
12 rows affected.
unnest count
bacon 12
mushrooms 11
cheese 10
salami 9
pepperoni 9
chicken 9
beef 9
bbq_sauce 8
peppers 3
tomato_sauce 3
tomatoes 3
onions 3

4- Pricing and Ratings

i) If a Meat Lovers pizza costs \$12 and Vegetarian costs \$10 and there were no charges for changes - how much money has Pizza Runner made so far if there are no delivery fees?

%%sql

SELECT 
    pizza_name,
    SUM(
        CASE WHEN pizza_id = 1 THEN 12 WHEN pizza_id = 2 THEN 10 END
    )AS total_dollars
FROM order_fact
JOIN pizza_dim
USING(pizza_id)
WHERE cancellation IS NULL
GROUP BY 1
 * postgresql://medhat:***@localhost/medhat
2 rows affected.
pizza_name total_dollars
Meatlovers 108
Vegetarian 30
%%sql

SELECT 
    SUM(
        CASE WHEN pizza_id = 1 THEN 12 WHEN pizza_id = 2 THEN 10 END
    ) AS total_dollars
FROM order_fact
JOIN pizza_dim
USING(pizza_id)
WHERE cancellation IS NULL
 * postgresql://medhat:***@localhost/medhat
1 rows affected.
total_dollars
138

ii) What if there was an additional \$1 charge for any pizza extras?

  • Add cheese is \$1 extras
%%sql

SELECT 
    pizza_name,
    SUM(
        CASE WHEN pizza_id = 1 THEN 12 WHEN pizza_id = 2 THEN 10 END
    ) +
    SUM(ARRAY_LENGTH(
            string_to_array(concat_extras, ','), 1)
    )AS total_dollars
FROM order_fact
JOIN pizza_dim
USING(pizza_id)
LEFT JOIN exclusions_extras_dim
USING(exc_ext_id)
WHERE cancellation IS NULL
GROUP BY 1
 * postgresql://medhat:***@localhost/medhat
2 rows affected.
pizza_name total_dollars
Meatlovers 111
Vegetarian 31
%%sql

SELECT 
    SUM(
        CASE WHEN pizza_id = 1 THEN 12 WHEN pizza_id = 2 THEN 10 END
    ) +
    SUM(ARRAY_LENGTH(
            string_to_array(concat_extras, ','), 1)
    )AS total_dollars
FROM order_fact
JOIN pizza_dim
USING(pizza_id)
LEFT JOIN exclusions_extras_dim
USING(exc_ext_id)
WHERE cancellation IS NULL;
 * postgresql://medhat:***@localhost/medhat
1 rows affected.
total_dollars
142

iii) The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset - generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.

%%sql

ALTER TABLE order_fact
    DROP COLUMN IF EXISTS rating,
    ADD COLUMN IF NOT EXISTS rating INT;

DO $$
    DECLARE
        ord_id INT;
    BEGIN
        FOR ord_id IN SELECT order_id FROM order_fact
        LOOP
            EXECUTE 'UPDATE order_fact SET rating=' ||
                floor(random() * 5 + 1) ||
                ' WHERE cancellation IS NULL AND order_id = ' || ord_id;
        END LOOP;
    END $$;

SELECT order_id, rating
FROM order_fact
ORDER BY 1
LIMIT 3;
 * postgresql://medhat:***@localhost/medhat
Done.
Done.
3 rows affected.
order_id rating
1 5
2 4
3 4

iv) Using your newly generated table - can you join all of the information together to form a table which has the following information for successful deliveries?

  • customer_id
  • order_id
  • runner_id
  • rating
  • order_time
  • pickup_time
  • Time between order and pickup
  • Delivery duration
  • Average speed
  • Total number of pizzas
%%sql

SELECT
    order_id,
    customer_id,
    runner_id,
    rating,
    order_date,
    pickup_time,
    pickup_time - order_date AS "Time between order and pickup",
    duration,
    round(distance / (EXTRACT(EPOCH FROM duration) / 60), 2)
        AS "speed (km per min)",
    count(*) AS "Total number of pizzas"
FROM order_fact
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9
ORDER BY 1;
 * postgresql://medhat:***@localhost/medhat
10 rows affected.
order_id customer_id runner_id rating order_date pickup_time Time between order and pickup duration speed (km per min) Total number of pizzas
1 101 1 5 2020-01-01 18:05:02 2020-01-01 18:15:34 0:10:32 0:32:00 0.63 1
2 101 1 4 2020-01-01 19:00:52 2020-01-01 19:10:54 0:10:02 0:27:00 0.74 1
3 102 1 4 2020-01-02 23:51:23 2020-01-03 00:12:37 0:21:14 0:20:00 0.67 2
4 103 2 3 2020-01-04 13:23:46 2020-01-04 13:53:03 0:29:17 0:40:00 0.59 3
5 104 3 2 2020-01-08 21:00:29 2020-01-08 21:10:57 0:10:28 0:15:00 0.67 1
6 101 3 None 2020-01-08 21:03:13 None None None None 1
7 105 2 4 2020-01-08 21:20:29 2020-01-08 21:30:45 0:10:16 0:25:00 1.00 1
8 102 2 1 2020-01-09 23:54:33 2020-01-10 00:15:02 0:20:29 0:15:00 1.56 1
9 103 2 None 2020-01-10 11:22:59 None None None None 1
10 104 1 2 2020-01-11 18:34:49 2020-01-11 18:50:20 0:15:31 0:10:00 1.00 2

v) If a Meat Lovers pizza was \$12 and Vegetarian \$10 fixed prices with no cost for extras and each runner is paid \$0.30 per kilometre traveled - how much money does Pizza Runner have left over after these deliveries?

%%sql

SELECT 
    *,
    SUM( distance * 0.30 )AS total_cost,
    total_sales - SUM( distance * 0.30 ) AS total_profit
FROM (
    SELECT
        order_id,
        distance,
        count(*),
        SUM(
            CASE WHEN pizza_id = 1 THEN 12 WHEN pizza_id = 2 THEN 10 END
        ) as total_sales
    FROM order_fact
    JOIN pizza_dim
    USING(pizza_id)
    LEFT JOIN exclusions_extras_dim
    USING(exc_ext_id)
    WHERE cancellation IS NULL
    GROUP BY 1, 2
) subquery
GROUP BY 1,2,3,4
ORDER BY 1
 * postgresql://medhat:***@localhost/medhat
8 rows affected.
order_id distance count total_sales total_cost total_profit
1 20 1 12 6.00 6.00
2 20 1 12 6.00 6.00
3 13.4 2 22 4.020 17.980
4 23.4 3 34 7.020 26.980
5 10 1 12 3.00 9.00
7 25 1 10 7.50 2.50
8 23.4 1 12 7.020 4.980
10 10 2 24 3.00 21.00
%%sql
SELECT SUM(total_profit) AS _net
FROM (
    SELECT 
        *,
        SUM( distance * 0.30 )AS total_cost,
        total_sales - SUM( distance * 0.30 ) AS total_profit
    FROM (
        SELECT
            order_id,
            distance,
            count(*),
            SUM(
                CASE WHEN pizza_id = 1 THEN 12 WHEN pizza_id = 2 THEN 10 END
            ) as total_sales
        FROM order_fact
        JOIN pizza_dim
        USING(pizza_id)
        LEFT JOIN exclusions_extras_dim
        USING(exc_ext_id)
        WHERE cancellation IS NULL
        GROUP BY 1, 2
    ) subquery
    GROUP BY 1,2,3,4
    ORDER BY 1
) subquery
ORDER BY 1
 * postgresql://medhat:***@localhost/medhat
1 rows affected.
_net
94.440

5- Bonus DML Challenges (DML = Data Manipulation Language)

i) If Danny wants to expand his range of pizzas - how would this impact the existing data design? Write an INSERT statement to demonstrate what would happen if a new Supreme pizza with all the toppings was added to the Pizza Runner menu?

%%sql

INSERT INTO pizza_dim
VALUES(
        3,
        'Suprime',
        True,
        True,
        True,
        True,
        True,
        True,
        True,
        True,
        True,
        True,
        True,
        True,
        (
            SELECT STRING_AGG(LOWER(REPLACE(topping_name, ' ', '_')), ',')
            FROM pizza_toppings
        )
);

SELECT * FROM pizza_dim;
 * postgresql://medhat:***@localhost/medhat
1 rows affected.
3 rows affected.
pizza_id pizza_name bacon bbq_sauce beef cheese chicken mushrooms onions pepperoni peppers salami tomatoes tomato_sauce concat_toppings
1 Meatlovers True True True True True True False True False True False False bacon,bbq_sauce,beef,cheese,chicken,mushrooms,pepperoni,salami
2 Vegetarian False False False True False True True False True False True True cheese,mushrooms,onions,peppers,tomatoes,tomato_sauce
3 Suprime True True True True True True True True True True True True bacon,bbq_sauce,beef,cheese,chicken,mushrooms,onions,pepperoni,peppers,salami,tomatoes,tomato_sauce,bacon,bbq_sauce,beef,cheese,chicken,mushrooms,onions,pepperoni,peppers,salami,tomatoes,tomato_sauce