Machine Learning With Python – A Real Life Example

In this article we discuss machine learning with Python with the help of a real-life example. Before we proceed to the example, let's recap the basic concept of Linear Regression.

Linear Regression is usually used for predictive analysis. It is a linear approximation of the relationship between one dependent variable and one or more independent variables.

The main steps of linear regression are to get sample data, fit the model that works best for that sample, and then make predictions for the whole dataset. Linear Regression is mainly used for trend forecasting, measuring the strength of predictors, and predicting an effect.

There are various types of regression analysis. The most commonly used are Simple Linear Regression (one dependent variable and one independent variable), Multiple Linear Regression (one dependent variable and two or more independent variables), and Logistic Regression (a categorical dependent variable and one or more independent variables).

Let’s start with Simple Linear Regression with one dependent variable and one independent variable.

On the basis of the given data, we will build a machine learning model that predicts the price of one kg of mangoes in the upcoming years, i.e. 2020 and 2021.

year mangoes_price (in Rs.)
2011 40
2012 50
2013 55
2014 60
2015 65
2016 70
2017 75
2018 80
2019 90

We can represent the values in the table above as a scatter plot and then draw a straight line that best fits the points on the chart, as shown in the figure.

We could draw many such lines, but we select the one for which the total sum of error is lowest.

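Assuming the usual least-squares definition, the total sum of error can be calculated as the sum of the squared differences between each actual price and the price the line predicts:

Total sum of error = Σ (actual_price − predicted_price)²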
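To make the idea of "lowest total error" concrete, here is a minimal sketch that compares two candidate lines on the mango data. It assumes the total sum of error is the sum of squared errors used by ordinary least squares; the function name sum_squared_error and the two candidate slopes are hypothetical and chosen only for illustration.

```python
import numpy as np

# Data from the table above.
years = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90])

def sum_squared_error(m, b):
    """Sum of squared differences between actual and predicted prices."""
    predicted = m * years + b
    return float(np.sum((prices - predicted) ** 2))

# Two hypothetical candidate lines, both passing through the point (2015, 65).
sse_a = sum_squared_error(5.0, 65 - 5.0 * 2015)  # slope of 5 Rs. per year
sse_b = sum_squared_error(6.0, 65 - 6.0 * 2015)  # slope of 6 Rs. per year

print(sse_a, sse_b)  # 50.0 30.0
```

Of these two candidates, the second line has the smaller error, so it is the better fit. Linear regression searches for the slope and intercept that make this number as small as possible.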

As we learned in high school mathematics, a straight line is y = mx + b; therefore, mango prices can be represented by the following equation.

Mangoes_price = m × year + b

Here, m is the slope (gradient) and b is the intercept.
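For a single independent variable, m and b can be worked out directly with the standard least-squares formulas. The sketch below is one way to do that with NumPy on the table data; the variable names are our own, and this is an illustration of the formulas rather than how any particular library computes them internally.

```python
import numpy as np

# Data from the table above.
years = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019], dtype=float)
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90], dtype=float)

x_mean = years.mean()   # 2015.0
y_mean = prices.mean()  # 65.0

# Least-squares slope: covariance of (year, price) divided by variance of year.
m = np.sum((years - x_mean) * (prices - y_mean)) / np.sum((years - x_mean) ** 2)

# The best-fit line always passes through the point of means.
b = y_mean - m * x_mean

print(m, b)  # roughly 5.667 and -11353.33
```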

Now, let’s start coding in Python. First we import the important libraries: pandas (for data manipulation and analysis in tabular form), numpy (for working with multidimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays), matplotlib (a 2D plotting library for Python that works well for visualizing NumPy computations) and sklearn (formally known as scikit-learn, for data mining and data analysis), as shown in the figure.

import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt

Load the dataset:

Now we load the dataset, i.e. mangoes_price.csv, which is already placed in the same folder as the Simple Linear Regression.ipynb file, and check what is inside the file, as shown in the figure.

df = pd.read_csv('mangoes_price.csv')
df
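If you do not have mangoes_price.csv at hand, the sketch below shows the layout we assume the file has: a header row with the column names year and mangoes_price, followed by the nine rows from the table. It reads the same text from memory with io.StringIO, so it runs without any file on disk.

```python
import io
import pandas as pd

# Assumed contents of mangoes_price.csv, taken from the table above.
csv_text = """year,mangoes_price
2011,40
2012,50
2013,55
2014,60
2015,65
2016,70
2017,75
2018,80
2019,90
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (9, 2)
print(df.head())
```

Saving the same text as mangoes_price.csv next to the notebook should let the read_csv call above work as written.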

We can also represent that data frame as a scatter plot as shown here.

%matplotlib inline
plt.xlabel('year')
plt.ylabel('mangoes_price')
plt.scatter(df.year, df.mangoes_price, color='blue', marker='.', linewidth=5)

The basic purpose of plotting the data points on a scatter plot is to check for a linear relationship between the variables. If we find one, we use the Linear Regression model.

In this scenario, there is a linear relationship between year and mangoes_price because the price of mangoes increased with the passage of time. Before creating a linear model, we create a new data frame by dropping the mangoes_price column, because the linear model expects a 2-D array of input features.

new_df = df.drop('mangoes_price',axis='columns')
new_df
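The difference between a 2-D and a 1-D input is easiest to see from the shapes. This small sketch rebuilds the data frame from the table and compares them; the names X, X_alt and y are our own choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    "mangoes_price": [40, 50, 55, 60, 65, 70, 75, 80, 90],
})

# Dropping the target leaves a DataFrame: 9 rows and 1 column (2-D).
X = df.drop("mangoes_price", axis="columns")

# Selecting with a list of column names gives the same 2-D result.
X_alt = df[["year"]]

# Selecting a single column by name gives a Series (1-D).
y = df["mangoes_price"]

print(X.shape, X_alt.shape, y.shape)  # (9, 1) (9, 1) (9,)
```

The features passed to fit() need the (rows, columns) shape even when there is only one column, while the target can stay 1-D.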

Also, store the price of mangoes (the target) in its own variable and check it like this

mangoes_price = df.mangoes_price
mangoes_price

In order to train the model, we create an object of the LinearRegression class and call its fit() method like this

reg_model = linear_model.LinearRegression()
reg_model.fit(new_df,df.mangoes_price)

We will predict the price of mangoes in the year 2020; the price for 2021 can be predicted in the same way.

reg_model.predict([[2020]])
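Both years can also be predicted in one call. The sketch below rebuilds the data from the table, fits the same kind of model, and passes a small data frame with a year column; using the same column name as in training is our choice here, and the name future is hypothetical.

```python
import pandas as pd
from sklearn import linear_model

df = pd.DataFrame({
    "year": [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    "mangoes_price": [40, 50, 55, 60, 65, 70, 75, 80, 90],
})

reg_model = linear_model.LinearRegression()
reg_model.fit(df[["year"]], df["mangoes_price"])

# One row per year we want a prediction for.
future = pd.DataFrame({"year": [2020, 2021]})
predictions = reg_model.predict(future)

print(predictions.round(2))  # roughly 93.33 and 99.00 (Rs. per kg)
```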

Now, we manually check how the model arrived at this value. To do so, we find the slope (coefficient) and intercept like this

reg_model.coef_
reg_model.intercept_

As we already know, y = mx + b, where ‘m’ is the slope and ‘b’ is the intercept. Putting the coefficient and intercept into this equation gives the same price for one kg of mangoes in 2020 that our model has already predicted, as shown in the figure

2020*5.66666667 + (-11353.333333333334)
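Rather than typing the numbers in by hand, the same check can be done with the fitted attributes themselves. This is a minimal sketch on the table data; manual_price and model_price are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

df = pd.DataFrame({
    "year": [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    "mangoes_price": [40, 50, 55, 60, 65, 70, 75, 80, 90],
})

reg_model = linear_model.LinearRegression()
reg_model.fit(df[["year"]], df["mangoes_price"])

m = reg_model.coef_[0]    # slope: coef_ holds one value per feature
b = reg_model.intercept_  # intercept

manual_price = m * 2020 + b
model_price = reg_model.predict(pd.DataFrame({"year": [2020]}))[0]

# The two values should agree up to floating-point rounding.
print(np.isclose(manual_price, model_price))  # True
```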

This means that our linear model works well. Now we will check how well it fits the data,

reg_model.score(new_df,mangoes_price)

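Woo… our model fits very well: score() returns the coefficient of determination (R²), about 0.988, meaning the fitted line explains roughly 98.80% of the variation in price. Note that this is measured on the same data the model was trained on, so it is not accuracy on unseen data.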
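To see where this number comes from, the sketch below computes R² by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares, and compares it with score(). It rebuilds the data from the table; the names r2_manual and r2_score are our own.

```python
import numpy as np
from sklearn import linear_model

years = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019], dtype=float)
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90], dtype=float)

X = years.reshape(-1, 1)  # 2-D feature array: 9 rows, 1 column
reg_model = linear_model.LinearRegression().fit(X, prices)
predicted = reg_model.predict(X)

ss_res = np.sum((prices - predicted) ** 2)      # error left after fitting the line
ss_tot = np.sum((prices - prices.mean()) ** 2)  # error of always guessing the mean

r2_manual = 1 - ss_res / ss_tot
r2_score = reg_model.score(X, prices)

print(round(r2_manual, 4), round(r2_score, 4))  # both about 0.988
```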

Now, we will read a csv file (in which only years are listed, with no mangoes price) and add a column of mangoes price predictions to it, like this

year_df = pd.read_csv("year.csv")
year_df
price = reg_model.predict(year_df)
price
year_df['mangoes_price']=price
year_df

A comparison of the actual and predicted prices of mangoes during the last five years, i.e. 2015 to 2019, is given below.

S # Year Actual Price per Kg of mangoes (in Rs.) Predicted Price per Kg of mangoes (in Rs.)
1 2015 65 65.00
2 2016 70 70.67
3 2017 75 76.33
4 2018 80 82.00
5 2019 90 87.67
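A comparison like the one in this table can be produced directly from the fitted model. The sketch below rebuilds the data, fits the model, and adds a predicted_price column for 2015 to 2019; the column name predicted_price and the variable recent are hypothetical.

```python
import pandas as pd
from sklearn import linear_model

df = pd.DataFrame({
    "year": [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    "mangoes_price": [40, 50, 55, 60, 65, 70, 75, 80, 90],
})

reg_model = linear_model.LinearRegression()
reg_model.fit(df[["year"]], df["mangoes_price"])

# Keep only the last five years and place predictions next to the actual prices.
recent = df[df["year"] >= 2015].copy()
recent["predicted_price"] = reg_model.predict(recent[["year"]]).round(2)

print(recent.to_string(index=False))
```

The gap between the two price columns in each row is the error of the line for that year.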

Lastly, we save this result in a new csv file named price_prediction.csv.

year_df.to_csv("price_prediction.csv")
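By default, to_csv also writes the data frame's row index as an extra first column. If only the year and price columns are wanted, index=False leaves it out. The sketch below writes to an in-memory buffer so the output can be inspected; the two rows are the rounded predictions for 2020 and 2021 and are typed in here only for illustration.

```python
import io
import pandas as pd

# Illustrative result frame with the rounded predictions for 2020 and 2021.
year_df = pd.DataFrame({
    "year": [2020, 2021],
    "mangoes_price": [93.33, 99.0],
})

# Write to an in-memory buffer instead of a file so we can look at the text.
buffer = io.StringIO()
year_df.to_csv(buffer, index=False)  # index=False drops the row-number column

csv_lines = buffer.getvalue().splitlines()
print(csv_lines[0])  # year,mangoes_price
```

Passing "price_prediction.csv" in place of the buffer writes the same text to disk.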

As the saying goes, “Practice makes perfect”; therefore, here are two problem statements for you to work through to get a firm grasp of this technique.

Problem Statement No.1:

You are required to build a Regression Model and predict the price of Lux Soap in the upcoming year i.e. 2020. Download the file lux_price.csv

Problem Statement No.2:

You are required to build a Regression Model and predict the per capita income of the citizens of a country in the previous years (1990 & 1994). Download the file country_income.csv
