- General introduction to building an ML model
- Classification problem use case
- HR analytics data: context and content of the data
- Practical Demo using R
- Data Cleaning
- Data Pre-processing
- Missing Imputations
- Exploratory data analysis
- Feature Engineering
- Feature Selection
- Data scaling
- Created a master dataset with all existing and derived features. Roughly 190 additional features were derived through processing and feature engineering.
- Created train and validation datasets using an 80:20 split
- Logistic Regression
- GLM model development
- Predict scores on validation data
- Evaluation of the model before optimising the probability cut-off
- Evaluation of the model after optimising the probability cut-off
- AUC, confusion matrix and ROC curve
- Gain table
- Random Forest Model
- Random Forest model development
- Predict scores on validation data
- Evaluation of the model before optimising the probability cut-off
- Evaluation of the model after optimising the probability cut-off
- AUC, confusion matrix and ROC curve
- Gain table
- XGBoost Model
- XGBoost model development
- Predict scores on validation data
- Evaluation of the model before optimising the probability cut-off
- Evaluation of the model after optimising the probability cut-off
- AUC, confusion matrix and ROC curve
- Gain table
- Model comparison
- Model Deployment
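
The modelling steps listed above (80:20 split, GLM fit, scoring the validation data, evaluation before and after optimising the probability cut-off, AUC and gain table) can be sketched in base R as follows. This is only an illustrative sketch, not the code used in the demo: the data is simulated, the column names `x1`, `x2`, `y` and the helper functions are hypothetical, and the use of Youden's J as the cut-off criterion is an assumption, since the outline does not say which criterion was used.

```r
# Illustrative sketch on simulated data; names and the cut-off criterion are assumptions.
set.seed(42)

# Simulate a binary outcome driven by two numeric features.
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-1 + 1.2 * dat$x1 - 0.8 * dat$x2))

# 80:20 train/validation split.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- dat[train_idx, ]
valid <- dat[-train_idx, ]

# Logistic regression via glm(), then predicted probabilities on validation data.
fit <- glm(y ~ x1 + x2, data = train, family = binomial())
valid$score <- predict(fit, newdata = valid, type = "response")

# Confusion matrix at a given probability cut-off (rows = predicted, cols = actual).
confusion_at <- function(score, actual, cutoff) {
  predicted <- factor(as.integer(score >= cutoff), levels = c(0, 1))
  table(predicted = predicted, actual = factor(actual, levels = c(0, 1)))
}

# Youden's J = sensitivity + specificity - 1 (assumed optimisation criterion).
youden_j <- function(cm) {
  sens <- cm["1", "1"] / sum(cm[, "1"])
  spec <- cm["0", "0"] / sum(cm[, "0"])
  sens + spec - 1
}

# Evaluate a grid of cut-offs and keep the one with the highest J.
cutoffs <- seq(0.05, 0.95, by = 0.01)
j_values <- sapply(cutoffs, function(k) {
  youden_j(confusion_at(valid$score, valid$y, k))
})
best_cutoff <- cutoffs[which.max(j_values)]

cm_default <- confusion_at(valid$score, valid$y, 0.5)          # before optimising
cm_best    <- confusion_at(valid$score, valid$y, best_cutoff)  # after optimising

# AUC from the rank-sum (Mann-Whitney) identity; needs no extra packages.
auc <- function(score, actual) {
  r <- rank(score)
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
valid_auc <- auc(valid$score, valid$y)

# Decile gain table: sort by score, split into 10 equal groups, cumulate events.
ord <- valid[order(-valid$score), ]
ord$decile <- ceiling(seq_len(nrow(ord)) / (nrow(ord) / 10))
gain <- aggregate(y ~ decile, data = ord, FUN = sum)
gain$cum_pct_events <- round(100 * cumsum(gain$y) / sum(gain$y), 1)

print(cm_default)
print(cm_best)
cat("Best cut-off:", best_cutoff, " AUC:", round(valid_auc, 3), "\n")
print(gain)
```

Because the cut-off here is tuned and evaluated on the same validation data, the "after" figures are likely to be somewhat optimistic; a separate hold-out set would give a cleaner estimate. The same scoring and evaluation helpers could be reused for Random Forest or XGBoost probabilities when comparing models.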
- Data Source:
- Logistic Regression
- https://www.kdnuggets.com/2018/02/logistic-regression-concise-technical-overview.html#%2EWoceyX3ClwM%2Elinkedin
  ![image](https://user-images.githubusercontent.com/17849762/125186922-9a5a5280-e24a-11eb-9bf8-e01db4e03367.png)
- http://www.real-statistics.com/logistic-regression/receiver-operating-characteristic-roc-curve/
  ![image](https://user-images.githubusercontent.com/17849762/125186930-a219f700-e24a-11eb-8d90-4a062322417e.png)
- https://medium.com/convoy-tech/down-the-auc-rabbit-hole-and-into-open-source-part-1-42c47e90e357
  ![image](https://user-images.githubusercontent.com/17849762/125186934-ae9e4f80-e24a-11eb-9ada-d3a8e46d92aa.png)
- Random Forest
- https://www.hackerearth.com/practice/machine-learning/machine-learning-algorithms/tutorial-random-forest-parameter-tuning-r/tutorial/
- https://blog.clairvoyantsoft.com/entropy-information-gain-and-gini-index-the-crux-of-a-decision-tree-99d0cdc699f4
- https://towardsdatascience.com/an-implementation-and-explanation-of-the-random-forest-in-python-77bf308a9b76
- https://www.topcoder.com/thrive/articles/understanding-random-forest-and-hyper-parameter-tuning
  ![image](https://user-images.githubusercontent.com/17849762/125186892-70089500-e24a-11eb-8f15-356d38f16b75.png)
- https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74
  ![image](https://user-images.githubusercontent.com/17849762/125186904-80b90b00-e24a-11eb-9f39-9631e26ecff3.png)
- XGBoost
- https://xgboost.readthedocs.io/en/latest/
- http://explained.ai/gradient-boosting/index.html
  ![image](https://user-images.githubusercontent.com/17849762/125186878-62eba600-e24a-11eb-95d9-355b2641243a.png)
- https://eng.uber.com/productionizing-distributed-xgboost/
- https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/