# Linear Regression

Train a linear regression model on real estate listings from San Luis Obispo County to predict the price of a house from its number of bedrooms, number of bathrooms, and square footage.

Print the line coefficients and the line intercept.

Print the estimated prices for the following houses:

a). Bedrooms = 3, bathrooms = 3, size = 2371

b). Bedrooms = 5, bathrooms = 2, size = 3600
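
As a rough sketch of what is being fitted, the model can be written as a single linear equation. The symbols here are notational assumptions, not names from the notebook: $w_1, w_2, w_3$ correspond to the entries of `model.coef_` and $b$ to `model.intercept_`.

```latex
\hat{y} = w_1 \, x_{\text{bedrooms}} + w_2 \, x_{\text{bathrooms}} + w_3 \, x_{\text{size}} + b
```

Here $\hat{y}$ is the estimated price, and the three $x$ values are the inputs listed for each house.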


In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# read dataset
df = pd.read_csv("housing_price_data.csv")
print(df.head())

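# Stack features and target as columns, one row per listing: [bedrooms, bathrooms, size, price]
Z = np.column_stack((df["Bedrooms"].values, df["Bathrooms"].values, df["Size"].values, df["Price"].values))
# Shuffle the rows in place
np.random.shuffle(Z)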

     MPLS   Price  Bedrooms  Bathrooms  Size  PriceSq
0  132842  795000         3          3  2371   335.30
1  134364  399000         4          3  2818   141.59
2  135141  545000         4          3  3032   179.75
3  135712  909000         4          4  3540   256.78
4  136282  109900         3          1  1249    87.99
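
The feature matrix and target can also be taken straight from the DataFrame by column name, which avoids tracking column positions by hand. The sketch below is only an illustration: it assumes a small stand-in frame (`sample`, a hypothetical name) holding the five rows printed above rather than the full CSV.

```python
import pandas as pd

# Only the five rows shown in the df.head() output above, typed in by hand
sample = pd.DataFrame({
    "Price": [795000, 399000, 545000, 909000, 109900],
    "Bedrooms": [3, 4, 4, 4, 3],
    "Bathrooms": [3, 3, 3, 4, 1],
    "Size": [2371, 2818, 3032, 3540, 1249],
})

# Select features by name so the column order is explicit
feature_cols = ["Bedrooms", "Bathrooms", "Size"]
X = sample[feature_cols].to_numpy()   # 2-D array, one row per listing
y = sample["Price"].to_numpy()        # 1-D array of prices

print(X.shape, y.shape)
```

Passing a 1-D `y` to `LinearRegression.fit` is also valid; the fitted `coef_` is then 1-D instead of the nested array printed by this notebook.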


In [2]:
# Training
X = Z[:, :-1]          # features: bedrooms, bathrooms, size
y = Z[:, -1]           # target: price
y = y.reshape(-1, 1)   # column vector, so coef_ has shape (1, 3)

model = LinearRegression()
model.fit(X, y)
print('Line coefficients {0}, line intercept={1}'.format(model.coef_, model.intercept_))

Line coefficients [[-100544.07258947   45981.15827545     309.60130515]], line intercept=[47553.18737558]


In [3]:
# Prediction
bedrooms = 3
bathrooms = 3
size = 2371
x = np.array([bedrooms, bathrooms, size])  # same feature order as in training
# coef_ has shape (1, 3), so the product has shape (1,); np.int32 truncates to a whole number
yhat = np.int32(np.matmul(model.coef_, x) + model.intercept_)
print('Estimated price is', yhat)

Estimated price is [617929]
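
The cell above covers house (a) only. As a minimal sketch, both houses can be evaluated in one step by applying the printed coefficients to a small matrix of inputs. The names `coef`, `intercept`, `houses`, and `prices` are illustrative, and the numeric values are copied from the output above instead of being read from a fitted model.

```python
import numpy as np

# Values copied from the "Line coefficients ..." output above
coef = np.array([-100544.07258947, 45981.15827545, 309.60130515])
intercept = 47553.18737558

# One row per house, columns in training order: bedrooms, bathrooms, size
houses = np.array([
    [3, 3, 2371],   # house (a)
    [5, 2, 3600],   # house (b)
])

# Matrix-vector product gives one estimate per row
prices = houses @ coef + intercept
for label, price in zip("ab", prices):
    print(f"House ({label}): estimated price {price:,.0f}")
```

With these coefficients the estimate for house (a) matches the 617929 printed above, and house (b) comes out at roughly 751,360. When the fitted model is still in memory, `model.predict(houses)` should give the same estimates without copying the coefficients by hand.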
