Implementation of a multiple linear regression model.
- The multiple linear regression algorithm is used for regression problems.
- The dataset has continuous data and multiple independent variables, so multiple linear regression is used to build the model.
- The 50 Startups dataset is used for model building.
- The data is in CSV (comma-separated values) format.
- Attributes can be integer or real values.
- Responses can be integer, real or categorical.
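A dataset of this shape could be loaded and inspected roughly as follows. This is a minimal sketch: the column names are assumed from the variables named in this document, and the rows are synthetic stand-ins rather than values from the real file.

```python
import io

import pandas as pd

# Synthetic stand-in rows, NOT values from the real dataset.
# The column names are an assumption based on the variables this README mentions.
csv_text = """R&D Spend,Administration,Marketing Spend,State,Profit
1000.0,500.0,2000.0,New York,3000.0
1500.0,600.0,2500.0,California,3900.0
800.0,550.0,1200.0,Florida,2400.0
"""

# In the project itself this would be pd.read_csv("<path to the CSV file>").
df = pd.read_csv(io.StringIO(csv_text))

# Inspect column types: numeric attributes versus the categorical State column.
print(df.dtypes)

# Count missing values per column.
null_counts = df.isnull().sum()
print(null_counts)

# Names of the numeric columns (attributes plus the Profit response).
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```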
The primary goal is to predict the profit of a startup based on R&D spend, administration, marketing spend, and state.
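Written as a multiple linear regression, this goal corresponds to a model of roughly the following form. The symbol names are chosen here for illustration, and the state term assumes the state is one-hot encoded into indicator variables s_k with one category dropped.

```latex
\hat{y}_{\text{profit}}
  = \beta_0
  + \beta_1 \, x_{\text{R\&D}}
  + \beta_2 \, x_{\text{admin}}
  + \beta_3 \, x_{\text{marketing}}
  + \sum_{k} \gamma_k \, s_k
```

The coefficients \beta_i and \gamma_k are estimated from the data by ordinary least squares.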
- pandas, numpy, matplotlib, seaborn, sklearn, and joblib are used in the project.
- Followed the industry-standard practice of machine learning life cycle steps.
- Implemented the necessary transformations and preprocessing of the dataset.
- Conducted exploratory data analysis on the dataset.
- Visualised the data using visualisation libraries such as matplotlib and seaborn.
- The scikit-learn library is used for linear regression.
- The model is validated with r2_score and RMSE.
- The joblib library is used to dump the model.
- The model is saved in .ipynb format as 50_startups_multiple_regression_model.
- There are no null values in the dataset.
- Profit is strongly correlated with R&D spend (97%), followed by marketing spend (74%).
- There are no outliers in the dataset.
- Profit in California shows a strongly increasing trend with respect to R&D spend.
- Profit in California shows a strongly increasing trend with respect to marketing spend.
- The most profitable state is New York, at 1.93M.
- R&D spend in New York is 35.1% of total spend.
- The highest marketing spend is in Florida.
- Multiple models were built using different sets of independent variables.
- The categorical state data is converted to numeric using pandas get_dummies.
- The model with an r2_score of 90% had the highest accuracy, so it was saved and loaded using joblib.
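The encoding, fitting, validation, and persistence steps listed above can be sketched as follows. This is an illustrative outline, not the project's notebook code. The data is synthetic, the column names are assumed, and the 80/20 train/test split, `drop_first=True`, and the file name `model.joblib` are all assumptions made for the example.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the real dataset); column names are assumed.
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "R&D Spend": rng.uniform(0, 150_000, n),
    "Administration": rng.uniform(50_000, 150_000, n),
    "Marketing Spend": rng.uniform(0, 400_000, n),
    "State": rng.choice(["New York", "California", "Florida"], n),
})
# Made-up linear relationship plus noise, only so the example has signal to fit.
df["Profit"] = (
    50_000
    + 0.8 * df["R&D Spend"]
    + 0.03 * df["Marketing Spend"]
    + rng.normal(0, 5_000, n)
)

# Convert the categorical State column to numeric indicator columns.
# drop_first=True (an assumption here) removes one redundant dummy column.
X = pd.get_dummies(
    df.drop(columns="Profit"), columns=["State"], drop_first=True, dtype=float
)
y = df["Profit"]

# Hold out 20% of the rows for validation (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the multiple linear regression and predict on the held-out rows.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Validate with r2_score and RMSE (square root of the mean squared error).
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"r2_score: {r2:.3f}, RMSE: {rmse:.1f}")

# Dump the fitted model with joblib, then load it back and predict again.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.joblib")  # hypothetical file name
    joblib.dump(model, model_path)
    loaded_model = joblib.load(model_path)
    reloaded_pred = loaded_model.predict(X_test)
```

To compare models built on different independent variables, the same fit and validation steps can be repeated on different column subsets of `X`, keeping whichever model scores best on the held-out rows.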