SQL Project: Data Cleaning for Layoffs 2022 Dataset

This project cleans and standardizes the Layoffs 2022 dataset from Kaggle. The dataset records layoffs at companies worldwide, and the goal was to make the data usable for further analysis.

Dataset

Source: Kaggle - Layoffs 2022 Dataset
Description: The dataset contains fields such as company, industry, total_laid_off, percentage_laid_off, date, location, stage, country, and funds_raised_millions.

Objective

The objective was to clean the layoffs dataset end to end. The key tasks were:

Removing duplicates
Standardizing data
Handling null values
Preparing the dataset for further analysis

Steps Involved

1. Staging Table Creation

Created a staging table (layoffs_staging) to work on data cleaning while keeping the raw data intact for reference.
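A minimal sketch of this step, assuming the raw CSV was imported into a table named layoffs (a hypothetical name; the README only names the staging table):

```sql
-- Assumption: the raw import lives in a table called `layoffs`.
-- Copy its structure, then its rows, so the raw table is never modified.
CREATE TABLE layoffs_staging LIKE layoffs;

INSERT INTO layoffs_staging
SELECT *
FROM layoffs;
```

All later cleaning statements then target layoffs_staging, and the raw table remains available for comparison.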

2. Duplicate Removal

Checked for duplicates using ROW_NUMBER() and partitioning techniques. Duplicates were identified and removed from the staging table.
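One way to do this in MySQL 8.0 or later (window functions are not available in earlier versions) is sketched below. The helper column row_id is hypothetical and the exact script in this repository may differ; the sketch treats two rows as duplicates only when every data column matches.

```sql
-- Assumption: add a temporary surrogate key so individual rows can be targeted.
ALTER TABLE layoffs_staging
    ADD COLUMN row_id INT AUTO_INCREMENT PRIMARY KEY;

-- Number the rows inside each group of identical records,
-- then delete every row after the first one in its group.
DELETE s
FROM layoffs_staging AS s
JOIN (
    SELECT row_id,
           ROW_NUMBER() OVER (
               PARTITION BY company, location, industry, total_laid_off,
                            percentage_laid_off, `date`, stage, country,
                            funds_raised_millions
               ORDER BY row_id
           ) AS row_num
    FROM layoffs_staging
) AS ranked
    ON ranked.row_id = s.row_id
WHERE ranked.row_num > 1;
```

Partitioning on all columns is the conservative choice: rows that differ in any field are kept, so no distinct layoff event is dropped by accident.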

3. Data Standardization

Standardized various columns, including:

Industry: Consolidated multiple entries of the same industry (e.g., "Crypto Currency" and "CryptoCurrency" were standardized to "Crypto").
Country: Corrected inconsistencies in country names (e.g., "United States." to "United States").
Date: Converted string-formatted dates to the DATE data type.
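The three fixes above could look roughly like the following. The date pattern '%m/%d/%Y' is an assumption about how the raw strings are formatted and should be checked against the data first. MySQL Workbench's safe-update mode may also reject UPDATE statements that filter on non-key columns, in which case it has to be disabled for the session.

```sql
-- Industry: collapse the Crypto variants into a single label.
UPDATE layoffs_staging
SET industry = 'Crypto'
WHERE industry LIKE 'Crypto%';

-- Country: strip the trailing period from "United States.".
UPDATE layoffs_staging
SET country = TRIM(TRAILING '.' FROM country)
WHERE country LIKE 'United States%';

-- Date: parse the text values, then change the column type.
-- Assumption: the raw strings look like 12/31/2022.
UPDATE layoffs_staging
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

ALTER TABLE layoffs_staging
    MODIFY COLUMN `date` DATE;
```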

4. Handling Null Values

Identified null values and handled them case by case. For example, null values in the industry column were populated from other rows of the same company where the industry was present. Nulls in key columns such as total_laid_off were left as-is rather than filled in, so that later analysis can decide how to treat them.
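The industry backfill can be sketched as a self-join on company. The first statement is an assumption: it covers the case where missing industries were imported as empty strings rather than NULL.

```sql
-- Assumption: blanks may have been imported as '' instead of NULL.
UPDATE layoffs_staging
SET industry = NULL
WHERE industry = '';

-- Copy the industry from another row of the same company, where one exists.
UPDATE layoffs_staging AS t1
JOIN layoffs_staging AS t2
    ON t1.company = t2.company
SET t1.industry = t2.industry
WHERE t1.industry IS NULL
  AND t2.industry IS NOT NULL;
```

Companies that have no row with a known industry keep a NULL industry after this step.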

5. Removing Unnecessary Data

Removed rows with no useful data (e.g., both total_laid_off and percentage_laid_off were null) and dropped temporary columns.
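A sketch of this cleanup follows. The column name row_id is hypothetical and stands for whatever helper column was added during duplicate removal.

```sql
-- Rows with neither a headcount nor a percentage carry no layoff information.
DELETE FROM layoffs_staging
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;

-- Assumption: `row_id` is the temporary helper column from deduplication.
ALTER TABLE layoffs_staging
    DROP COLUMN row_id;
```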

Outcome

The dataset was cleaned, standardized, and prepared for exploratory data analysis (EDA) and modeling. The cleaned dataset can now be used to analyze layoff trends.

Technologies Used

SQL: MySQL
Tools: MySQL Workbench

How to Use

Clone this repository (vannyyyaaa/DataCleaningwithSQL).
Import the raw dataset (layoffs.csv) into your MySQL environment.
Execute the SQL script in the Data Cleaning with sql.sql file to clean the data.
Once cleaned, you can proceed with your analysis or export the cleaned data for further processing.
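Before starting analysis, a couple of quick checks can confirm that the cleaning script ran as expected. This is an optional sketch and not part of the repository's script; it assumes the cleaned data is in layoffs_staging.

```sql
-- How many rows survived cleaning?
SELECT COUNT(*) AS remaining_rows
FROM layoffs_staging;

-- Any exact duplicates left? This should return no rows.
SELECT company, location, `date`, COUNT(*) AS copies
FROM layoffs_staging
GROUP BY company, location, industry, total_laid_off,
         percentage_laid_off, `date`, stage, country,
         funds_raised_millions
HAVING COUNT(*) > 1;
```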

Acknowledgements

Dataset sourced from Kaggle.
