HOUSING_PRICE
TYehan edited this page Feb 27, 2025 · 1 revision
This document explains the primary machine learning concepts demonstrated in the Housing Price prediction practical notebook. The notebook builds an end-to-end pipeline, from data acquisition to model evaluation, on the California housing dataset.
- Dataset Acquisition:
  - The notebook uses `fetch_california_housing` from `sklearn.datasets` to load the California housing dataset.
- Data Structuring:
  - A Pandas DataFrame is created from the dataset features, and a `target` column representing the median house prices is appended.
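The loading and structuring steps above might look roughly like the following sketch. The helper names `build_frame` and `load_housing_frame` are hypothetical and do not come from the notebook. Because `fetch_california_housing` downloads the data on first use, the loader is wrapped in a function rather than called at import time.

```python
import pandas as pd
from sklearn.datasets import fetch_california_housing


def build_frame(dataset):
    """Turn a scikit-learn Bunch-like dataset into a DataFrame with a 'target' column.

    Hypothetical helper: the notebook may build the DataFrame inline instead.
    """
    # Feature matrix with named columns
    df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
    # Append the regression target as an extra column
    df["target"] = dataset.target
    return df


def load_housing_frame():
    """Fetch the California housing data (downloads on first call) and structure it."""
    return build_frame(fetch_california_housing())
```

Calling `load_housing_frame().head()` would then show the first few rows, which corresponds to the preview step of the notebook.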
- Initial Exploration:
  - A preview of the DataFrame is displayed to verify the data structure and to inspect feature values.
- Feature Selection:
  - Two features (`MedInc` and `AveRooms`) are removed from the DataFrame to create the feature matrix X.
  - The target variable y is set as the `target` column.
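A minimal sketch of this selection step follows. The function name `split_features_target` is hypothetical, and the sketch assumes that the `target` column is also excluded from X so that the label does not leak into the features; the text above does not state this explicitly.

```python
import pandas as pd


def split_features_target(df, dropped=("MedInc", "AveRooms"), target_col="target"):
    """Return (X, y) from a DataFrame that holds features plus a target column.

    Assumption: the target column is dropped from X together with the two
    removed features. The original DataFrame is left unchanged.
    """
    X = df.drop(columns=[*dropped, target_col])
    y = df[target_col]
    return X, y
```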
- Train/Test Split:
  - The dataset is partitioned into training and test sets using an 80/20 split with a fixed random state, so the split is reproducible and the model is evaluated on data it did not see during training.
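The split can be sketched with `train_test_split` from `sklearn.model_selection`. The seed value `42` and the wrapper name `split_80_20` are assumptions for illustration; the text only says that the random state is fixed, not which value is used.

```python
from sklearn.model_selection import train_test_split

RANDOM_STATE = 42  # assumed value; any fixed integer gives a reproducible split


def split_80_20(X, y, random_state=RANDOM_STATE):
    """Split X and y into 80% training and 20% test data.

    Returns X_train, X_test, y_train, y_test (the order used by scikit-learn).
    """
    return train_test_split(X, y, test_size=0.2, random_state=random_state)
```

Because the seed is fixed, calling the function twice on the same data yields identical partitions, which makes metric values comparable across runs.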
- Linear Regression Model:
  - A `LinearRegression` model is instantiated and trained using the training data.
  - After training, the model is used to predict the housing prices on the test set.
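Training and prediction might be sketched as below. The wrapper `fit_and_predict` is a hypothetical name; the notebook may well call `fit` and `predict` directly in separate cells.

```python
from sklearn.linear_model import LinearRegression


def fit_and_predict(X_train, y_train, X_test):
    """Fit an ordinary least squares model and predict on the held-out features."""
    model = LinearRegression()
    model.fit(X_train, y_train)       # learn coefficients and intercept
    y_pred = model.predict(X_test)    # predictions for unseen rows
    return model, y_pred
```

After fitting, `model.coef_` and `model.intercept_` hold the learned weights, which can be inspected to see how each remaining feature contributes to the prediction.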
- Metric Computation:
  - The notebook calculates several regression metrics:
    - Mean Absolute Error (MAE)
    - Mean Squared Error (MSE)
    - Root Mean Squared Error (RMSE)
    - R² Score (Coefficient of Determination)
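The four metrics listed above can be computed with `sklearn.metrics`, as in this sketch. The function name `regression_metrics` and the dictionary keys are illustrative choices, and RMSE is derived here as the square root of MSE, which may differ from how the notebook obtains it.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² for a set of predictions as a dict."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),  # mean of |y - y_hat|
        "MSE": mse,                                  # mean of (y - y_hat)^2
        "RMSE": float(np.sqrt(mse)),                 # same units as the target
        "R2": r2_score(y_true, y_pred),              # 1 - SS_res / SS_tot
    }
```

Lower MAE, MSE and RMSE indicate smaller errors, while an R² closer to 1 means the model explains more of the variance in the target.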
- Visualization:
  - A plot is created using matplotlib to graphically compare the evaluation metrics, with a logarithmic scale so that metrics of different magnitudes remain readable.
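One possible form of this plot is a bar chart with a logarithmic y-axis, sketched below. The chart type, labels, and the function name `plot_metrics` are assumptions; the text only says that matplotlib and a logarithmic scale are used.

```python
import matplotlib.pyplot as plt


def plot_metrics(metrics):
    """Draw one bar per metric on a log-scaled y-axis and return (fig, ax).

    `metrics` is a dict mapping metric names to values. A log axis cannot
    show zero or negative values, so a negative R² would not be drawn.
    """
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(list(metrics.keys()), list(metrics.values()))
    ax.set_yscale("log")
    ax.set_ylabel("Metric value (log scale)")
    ax.set_title("Regression metrics")
    return fig, ax
```

In a notebook, calling `plot_metrics(...)` followed by `plt.show()` would render the figure.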
- Output Display:
  - The calculated metrics are printed to provide a quantitative measure of the model’s performance.
This practical notebook serves as a comprehensive example of applying fundamental machine learning techniques to a real-world housing dataset for predictive modeling.