Workshop on "Machine learning model developments for classification problems"

Workshop Day 1

  1. General introduction to building an ML model
  2. Classification problem use case
  3. HR analytics data: context and content of the data
  4. Practical Demo using R
  • Data Cleaning
  • Data Pre-processing
  • Missing value imputation
  • Exploratory data analysis
  • Feature Engineering
  • Feature Selection
  • Data scaling
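
The imputation and scaling steps listed above can be sketched as follows. The workshop demo itself was done in R; this is a minimal standard-library Python illustration, assuming median imputation and z-score scaling (the workshop may have used other methods). The `age` column is hypothetical and not taken from the HR data.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardise(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical numeric column with missing entries (illustration only).
age = [25, None, 35, 45, None]
age_filled = impute_median(age)
age_scaled = standardise(age_filled)
```

Imputing before scaling matters here: the mean and standard deviation are computed on the filled column, so missing entries do not distort the scale.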

Created a master dataset containing all existing and derived features. Processing and feature engineering produced ~190 additional features.
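
One common way derived features accumulate is one-hot encoding of categorical columns. The sketch below is a Python illustration under that assumption; the README does not say which transformations produced the derived features, and the `department` column and its values are hypothetical.

```python
def one_hot(rows, column):
    """Return copies of rows with `column` replaced by 0/1 indicator columns."""
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Hypothetical records (illustration only, not the workshop data).
rows = [
    {"id": 1, "department": "Sales"},
    {"id": 2, "department": "HR"},
    {"id": 3, "department": "Sales"},
]
master = one_hot(rows, "department")
```

Each categorical column with k levels adds k indicator columns, which is how a modest raw table can grow by many features.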

Workshop Day 2

  1. Created training and validation data using an 80:20 split
  2. Logistic Regression
  • GLM model development
  • Predict scores on validation data
  • Evaluation of the model before optimising the probability cut-off
  • Evaluation of the model after optimising the probability cut-off
  • AUC, confusion matrix and ROC curve
  • Gain table
  3. Random Forest Model
  • Random forest model development
  • Predict scores on validation data
  • Evaluation of the model before optimising the probability cut-off
  • Evaluation of the model after optimising the probability cut-off
  • AUC, confusion matrix and ROC curve
  • Gain table
  4. XGBoost Model
  • XGBoost model development
  • Predict scores on validation data
  • Evaluation of the model before optimising the probability cut-off
  • Evaluation of the model after optimising the probability cut-off
  • AUC, confusion matrix and ROC curve
  • Gain table
  5. Model comparison
  6. Model deployment
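
The evaluation steps shared by all three models (80:20 split, confusion matrix before and after tuning the probability cut-off, AUC, gain table) can be sketched independently of the model. This is a standard-library Python illustration, not the workshop's R code. It assumes the cut-off is chosen by maximising Youden's J (TPR minus FPR), which is only one common criterion. The labels and scores are hypothetical stand-ins for any model's predicted probabilities.

```python
import random


def train_valid_split(rows, train_frac=0.8, seed=42):
    """Shuffle a copy of rows and split it into training and validation parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(y_true, scores, cutoff=0.5):
    """Count TP, FP, TN, FN when scores >= cutoff are predicted positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(y_true, scores):
        if s >= cutoff:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts


def best_cutoff(y_true, scores):
    """Return the observed score that maximises Youden's J = TPR - FPR.

    Assumes both classes are present in y_true.
    """
    best, best_j = 0.5, -1.0
    for c in sorted(set(scores)):
        cm = confusion_matrix(y_true, scores, c)
        tpr = cm["tp"] / (cm["tp"] + cm["fn"])
        fpr = cm["fp"] / (cm["fp"] + cm["tn"])
        if tpr - fpr > best_j:
            best, best_j = c, tpr - fpr
    return best


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gain_table(y_true, scores, bins=10):
    """Rows of (bin, cumulative cases, cumulative share of positives captured)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    total_pos = sum(y_true)
    n = len(ranked)
    table, captured = [], 0
    for b in range(1, bins + 1):
        start, end = (b - 1) * n // bins, b * n // bins
        captured += sum(y for _, y in ranked[start:end])
        table.append((b, end, captured / total_pos))
    return table


# Hypothetical validation labels and predicted probabilities (illustration only).
y_valid = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

train, valid = train_valid_split(range(10))
cm_default = confusion_matrix(y_valid, scores)         # before tuning the cut-off
cutoff = best_cutoff(y_valid, scores)
cm_tuned = confusion_matrix(y_valid, scores, cutoff)   # after tuning the cut-off
model_auc = auc(y_valid, scores)
gains = gain_table(y_valid, scores, bins=5)
```

Because these functions take only labels and scores, the same code can be run on each model's validation predictions, and the resulting AUC values and gain tables compared side by side.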

References:

  1. Data Source:
  2. Logistic Regression
  3. Random Forest
  4. XGBoost