This repository contains an Exploratory Data Analysis (EDA) of a vehicle repairs dataset. The dataset consists of 100 records and 52 columns, capturing detailed information about vehicle repair transactions. The goal is to uncover trends, frequent issues, and potential improvements based on the data.
- File Name: `vehicle_repairs.csv`
- Records: 100
- Columns: 52
  - `VIN`: Vehicle Identification Number
  - `TRANSACTION_ID`: Unique repair transaction ID
  - `CORRECTION_VERBATIM`: Description of the repair done
  - `CUSTOMER_VERBATIM`: Customer's issue description
  - `REPAIR_DATE`: Date of repair
  - `CAUSAL_PART_NM`: Part responsible for the issue
  - `GLOBAL_LABOR_CODE_DESCRIPTION`: Type of repair performed
  - `PLATFORM`: Vehicle platform (e.g., Full-Size Trucks, BEV)
  - `BODY_STYLE`: Body style (e.g., Crew Cab, 4 Door Utility)
  - `REPORTING_COST`, `TOTALCOST`, `LBRCOST`: Cost-related metrics
  - ...and many more
- Examined dataset shape, data types, and missing values
- Generated descriptive statistics for numerical and categorical columns
- Identified unique values and frequent patterns
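
The inspection steps above could be sketched in pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the `summarize` helper and the three-row `sample` frame are hypothetical, and the real data would be loaded from `vehicle_repairs.csv` instead.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect the basic facts an initial inspection usually looks at."""
    return {
        "shape": df.shape,                           # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),   # data type per column
        "missing": df.isna().sum().to_dict(),        # missing values per column
        "n_unique": df.nunique().to_dict(),          # distinct non-missing values
    }


# Made-up rows for illustration only; the real frame would come from
# pd.read_csv("vehicle_repairs.csv").
sample = pd.DataFrame({
    "PLATFORM": ["Full-Size Trucks", "BEV", None],
    "TOTALCOST": [120.0, None, 310.5],
})

summary = summarize(sample)

# Descriptive statistics for numerical and categorical columns together.
stats = sample.describe(include="all")
```

Returning a plain dictionary keeps the summary easy to print or compare before and after cleaning.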
- Replaced missing categorical values with "Unknown"
- Substituted corrupted characters with "Corrupt Value"
- Filled missing `TOTALCOST` values using `REPORTING_COST` for consistency
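
A possible pandas version of these cleaning steps is sketched below. Several details are assumptions rather than facts from the notebook: the `clean_repairs` helper is hypothetical, "corrupted" is taken to mean a value containing the Unicode replacement character (U+FFFD), such a value is replaced as a whole, and the sample rows are made up.

```python
import pandas as pd


def clean_repairs(df: pd.DataFrame, text_cols: list) -> pd.DataFrame:
    """Apply the three cleaning steps to a copy of the frame."""
    out = df.copy()
    for col in text_cols:
        # Step 1: missing categorical values become "Unknown".
        out[col] = out[col].fillna("Unknown")
        # Step 2 (assumption): a value is corrupted if it contains U+FFFD.
        corrupted = out[col].str.contains("\ufffd", regex=False, na=False)
        out[col] = out[col].mask(corrupted, "Corrupt Value")
    # Step 3: fall back to REPORTING_COST where TOTALCOST is missing.
    out["TOTALCOST"] = out["TOTALCOST"].fillna(out["REPORTING_COST"])
    return out


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "CAUSAL_PART_NM": ["Steering wheel", None, "Mod\ufffdule"],
    "TOTALCOST": [120.0, None, 310.5],
    "REPORTING_COST": [120.0, 85.25, 310.5],
})

cleaned = clean_repairs(sample, text_cols=["CAUSAL_PART_NM"])
```

Working on a copy leaves the original frame untouched, which makes it easy to compare missing-value counts before and after.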
- Common Repairs: Steering wheel-related issues were most frequent
- Platform Trends: Full-Size Trucks had the highest repair count
- Cost Distribution: Repair costs varied significantly; a few were high-cost outliers
- Geographical Patterns: Most repairs occurred in the US, especially in CA and TX
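
Findings of this kind typically come from frequency counts and a simple outlier rule. The sketch below shows one way to compute them; the 1.5 × IQR threshold is a common convention chosen here for illustration, not necessarily the rule used in the notebook, and the helper names and sample rows are hypothetical.

```python
import pandas as pd


def top_counts(df: pd.DataFrame, column: str, n: int = 5) -> pd.Series:
    """Return the n most frequent values of a column, most frequent first."""
    return df[column].value_counts().head(n)


def iqr_outliers(costs: pd.Series, k: float = 1.5) -> pd.Series:
    """Return costs above Q3 + k * IQR (high-cost outliers only)."""
    q1, q3 = costs.quantile(0.25), costs.quantile(0.75)
    upper = q3 + k * (q3 - q1)
    return costs[costs > upper]


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "PLATFORM": ["Full-Size Trucks", "Full-Size Trucks", "BEV",
                 "Full-Size Trucks", "BEV"],
    "TOTALCOST": [100.0, 120.0, 110.0, 130.0, 900.0],
})

platform_counts = top_counts(sample, "PLATFORM")
high_cost = iqr_outliers(sample["TOTALCOST"])
```

The same `top_counts` call can be pointed at other categorical columns, such as `CAUSAL_PART_NM`, to rank frequent repairs.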

Output files:
- `cleaned_steering_repair_data.csv`: Final cleaned version of the dataset after preprocessing
- `generated_repair_tags.csv`: Tags extracted from the `CORRECTION_VERBATIM` and `CUSTOMER_VERBATIM` columns for downstream use
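
The README does not say how the tags were generated. As one simple possibility, a keyword lookup over the two verbatim columns might look like the sketch below. The `TAG_KEYWORDS` vocabulary, the helper functions, and the sample rows are all hypothetical and are not taken from the notebook.

```python
import re

import pandas as pd

# Hypothetical tag vocabulary: tag name -> keywords that trigger it.
TAG_KEYWORDS = {
    "steering": ["steering"],
    "heating": ["heated", "heat"],
    "replacement": ["replaced", "replace"],
}


def extract_tags(text) -> list:
    """Return every tag whose keyword starts a word in the text."""
    if not isinstance(text, str):
        return []
    lowered = text.lower()
    return [
        tag
        for tag, keywords in TAG_KEYWORDS.items()
        if any(re.search(r"\b" + re.escape(kw), lowered) for kw in keywords)
    ]


def tag_rows(df: pd.DataFrame) -> pd.Series:
    """Tag each row from its combined correction and customer verbatim."""
    combined = (
        df["CORRECTION_VERBATIM"].fillna("")
        + " "
        + df["CUSTOMER_VERBATIM"].fillna("")
    )
    return combined.map(extract_tags)


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "CORRECTION_VERBATIM": ["Replaced steering wheel", None],
    "CUSTOMER_VERBATIM": ["Heated steering wheel not working",
                          "Customer states noise"],
})

tags = tag_rows(sample)
```

A fixed vocabulary keeps the output predictable; free-text tagging with an NLP model would be a different design with different trade-offs.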

Repository files:
- `eda.ipynb`: Jupyter notebook containing all EDA steps: loading, cleaning, exploration, and visualization
- `vehicle_repairs.csv`: The original dataset in CSV format
- `cleaned_vehicle_repairs.csv`: Cleaned dataset with consistent and processed values
- `generated_tags.csv`: Tags generated from free-text fields (correction/customer verbatim)

This analysis can support further research on repair cost optimization, predictive maintenance, and customer experience improvements.