Machine Learning With Python - A Real Life Example

In this article, we discuss machine learning with Python through a real-life example. Before turning to the example, let us recap the basic concept of Linear Regression.

Linear Regression is commonly used for predictive analysis. It is a linear approximation of the relationship between one dependent variable and one or more independent variables.


The main steps of linear regression are to collect sample data, fit the model that best describes that sample, and use it to make predictions for values outside the sample. Linear Regression is mainly used for trend forecasting, measuring the strength of predictors, and predicting an effect.

Commonly used types of regression analysis include Simple Linear Regression (one dependent variable and one independent variable) and Multiple Linear Regression (one dependent variable and two or more independent variables). Logistic Regression (a categorical dependent variable and one or more independent variables) is often listed alongside them, although it is a classification technique rather than a form of linear regression.

Let’s start with Simple Linear Regression with one dependent variable and one independent variable.

Based on the data below, we will build a machine learning model that predicts the price of one kg of mangoes in the upcoming years, i.e. 2020 and 2021.

year	mangoes_price (in Rs.)
2011	40
2012	50
2013	55
2014	60
2015	65
2016	70
2017	75
2018	80
2019	90

We can plot the values in the table above as a scatter plot and then draw the straight line that best fits the points.


Many different lines could be drawn through these points, but we select the one for which the total sum of squared errors is lowest.


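The total error can be calculated as the sum of the squared differences between the actual and the predicted prices:

Total error = Σ (actual_price − predicted_price)²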
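
To make the idea of "lowest total error" concrete, here is a minimal sketch that compares two candidate lines on the data from the table. The helper name sum_squared_error and the two candidate slopes (5 and 6) are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# Data from the table above
years = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90])

def sum_squared_error(m, b):
    """Sum of squared differences between actual and predicted prices."""
    predicted = m * years + b
    return float(np.sum((prices - predicted) ** 2))

# Two hypothetical candidate lines, both passing through the point (2015, 65)
sse_slope_5 = sum_squared_error(5.0, 65 - 5.0 * 2015)
sse_slope_6 = sum_squared_error(6.0, 65 - 6.0 * 2015)

print(sse_slope_5, sse_slope_6)
```

Of these two candidates, the line with slope 6 has the smaller error, so it fits the data better. Linear regression searches for the slope and intercept that make this quantity as small as possible.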

Recall the equation of a straight line from high-school mathematics, y = mx + b. The price of mangoes can therefore be represented by the following equation.

Mangoes_price = m × year + b

Here, m is the slope (gradient) and b is the intercept.
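
For a single independent variable, the best-fitting m and b can be computed directly with the standard least-squares formulas. The following is a sketch using NumPy only; the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Data from the table above
years = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019], dtype=float)
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90], dtype=float)

year_mean = years.mean()
price_mean = prices.mean()

# Slope: covariance of year and price divided by the variance of year
m = np.sum((years - year_mean) * (prices - price_mean)) / np.sum((years - year_mean) ** 2)

# Intercept: the fitted line passes through the point of means
b = price_mean - m * year_mean

print(m, b)
```

These are the same two quantities that a fitted regression model exposes as its coefficient and intercept.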

Now let's start coding in Python. First we import the libraries we need: pandas (for manipulating and analysing tabular data), numpy (for working with multidimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on them), matplotlib (a 2D plotting library for Python that works well with NumPy data) and sklearn (formally scikit-learn, a library for data mining and data analysis).

import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt


Load the dataset:

Now we load the dataset, mangoes_price.csv, which is placed in the same folder as the Simple Linear Regression.ipynb notebook, and display its contents.

df = pd.read_csv('mangoes_price.csv')
df
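
If mangoes_price.csv is not at hand, an equivalent data frame can be built directly from the table at the start of the article. This is a sketch that assumes the CSV has exactly the two columns year and mangoes_price.

```python
import pandas as pd

# Same values as the table above, typed in by hand instead of read from the CSV
df = pd.DataFrame({
    'year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    'mangoes_price': [40, 50, 55, 60, 65, 70, 75, 80, 90],
})

print(df.shape)
print(df.head())
```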


We can also represent that data frame as a scatter plot as shown here.

%matplotlib inline
plt.xlabel('year')
plt.ylabel('mangoes_price')
plt.scatter(df.year, df.mangoes_price, color='blue', marker='.', linewidth=5)


The purpose of plotting the data points on a scatter plot is to check for a linear relationship between the variables. If such a relationship exists, a Linear Regression model is a suitable choice.

In this scenario, there is a roughly linear relationship between year and mangoes_price, because the price of mangoes increased steadily over time. Before creating the linear model, we create a new data frame without the mangoes_price column, because the model expects its input features as a 2-D array.

new_df = df.drop('mangoes_price',axis='columns')
new_df
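
The difference between a 2-D feature table and a 1-D target column can be checked through their shapes. The sketch below rebuilds the data inline so that it runs on its own; the names features and target are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    'mangoes_price': [40, 50, 55, 60, 65, 70, 75, 80, 90],
})

# Dropping the target column leaves a DataFrame: 2-D, one row per sample
features = df.drop('mangoes_price', axis='columns')

# Selecting a single column gives a Series: 1-D
target = df['mangoes_price']

print(features.shape)  # rows x columns
print(target.shape)    # rows only
```

Selecting the column with double brackets, df[['year']], produces the same 2-D data frame as the drop() call.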


We also store the mangoes_price column, which is the target, in its own variable:

mangoes_price = df.mangoes_price
mangoes_price


To train the model, we create an object of the LinearRegression class and call its fit() method:

reg_model = linear_model.LinearRegression()
reg_model.fit(new_df,df.mangoes_price)


Next we predict the price of mangoes, starting with the year 2020; the same call works for 2021.

reg_model.predict([[2020]])
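
The call above covers 2020 only. Here is a minimal sketch of predicting both 2020 and 2021 in one call, with the data rebuilt inline so that the block is self-contained. Passing a data frame with the same column name used in training (the name future is an assumption) keeps the input consistent with what the model was fitted on.

```python
import pandas as pd
from sklearn import linear_model

df = pd.DataFrame({
    'year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    'mangoes_price': [40, 50, 55, 60, 65, 70, 75, 80, 90],
})

reg_model = linear_model.LinearRegression()
reg_model.fit(df[['year']], df['mangoes_price'])

# One row per year to predict, using the training column name
future = pd.DataFrame({'year': [2020, 2021]})
predictions = reg_model.predict(future)

for year, price in zip(future['year'], predictions):
    print(year, round(float(price), 2))
```

On this data the model predicts roughly Rs. 93.33 for 2020 and Rs. 99.00 for 2021.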


Now let us check manually how the model arrived at this value. To do so, we look up the slope (coefficient) and the intercept:

reg_model.coef_

reg_model.intercept_


As we already know, y = mx + b, where m is the slope and b is the intercept. Putting the coefficient and intercept into this equation gives the same price for one kg of mangoes in 2020 that the model predicted (about Rs. 93.33):

2020*5.66666667 + (-11353.333333333334)
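
Instead of copying the numbers by hand, the same check can read the slope and intercept straight from the fitted model. This sketch rebuilds the data inline; manual_price and model_price are illustrative names.

```python
import pandas as pd
from sklearn import linear_model

df = pd.DataFrame({
    'year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    'mangoes_price': [40, 50, 55, 60, 65, 70, 75, 80, 90],
})

reg_model = linear_model.LinearRegression()
reg_model.fit(df[['year']], df['mangoes_price'])

m = reg_model.coef_[0]     # slope: one coefficient per feature
b = reg_model.intercept_   # intercept

# y = m * x + b, evaluated by hand for the year 2020
manual_price = m * 2020 + b
model_price = reg_model.predict(pd.DataFrame({'year': [2020]}))[0]

print(manual_price, model_price)
```

The two values agree up to floating-point rounding.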


This confirms that the linear model behaves as expected. Next we check how well it fits the data:

reg_model.score(new_df,mangoes_price)

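The score is about 0.988. Note that score() returns the coefficient of determination (R²), not a classification accuracy: the model explains roughly 98.8% of the variance in the prices it was trained on.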
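
For a LinearRegression model, score() returns the coefficient of determination, R². The sketch below recomputes that value by hand with NumPy, under the assumption that the line is fitted by ordinary least squares on the nine rows of the table; the variable names are illustrative.

```python
import numpy as np

years = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019], dtype=float)
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90], dtype=float)

# Least-squares slope and intercept
m = np.sum((years - years.mean()) * (prices - prices.mean())) / np.sum((years - years.mean()) ** 2)
b = prices.mean() - m * years.mean()
predicted = m * years + b

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((prices - predicted) ** 2)
ss_tot = np.sum((prices - prices.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(round(float(r_squared), 4))
```

Because the score is measured on the same rows used for training, it describes how well the line fits the known data and says less about how reliable the forecasts for future years will be.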

Now we load a CSV file that contains only years (no mangoes prices) and generate a price prediction for each of them:

year_df = pd.read_csv("year.csv")
year_df


price = reg_model.predict(year_df)
price

year_df['mangoes_price']=price
year_df


A comparison of the actual and predicted prices of mangoes over the last five years, i.e. 2015 to 2019, is given below.

S #	Year	Actual Price per kg of mangoes (in Rs.)	Predicted Price per kg of mangoes (in Rs.)
1	2015	65	65.00
2	2016	70	70.67
3	2017	75	76.33
4	2018	80	82.00
5	2019	90	87.67
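
A comparison table like this can also be produced in code. The following is a sketch under the assumption that the model is trained on all nine rows, as in the article; the names recent and predicted_price are illustrative.

```python
import pandas as pd
from sklearn import linear_model

df = pd.DataFrame({
    'year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    'mangoes_price': [40, 50, 55, 60, 65, 70, 75, 80, 90],
})

reg_model = linear_model.LinearRegression()
reg_model.fit(df[['year']], df['mangoes_price'])

# Keep the last five years and add the model's prediction next to the actual price
recent = df[df['year'] >= 2015].copy()
recent['predicted_price'] = reg_model.predict(recent[['year']]).round(2)

print(recent.to_string(index=False))
```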

Lastly, we save this result in a new CSV file named price_prediction.csv.

year_df.to_csv("price_prediction.csv")

Practice makes perfect, so here are two problem statements to help you get a firm grasp of this technique.

Problem Statement No.1:

You are required to build a Regression Model and predict the price of Lux Soap in the upcoming year i.e. 2020. Download the file lux_price.csv

Problem Statement No.2:

You are required to build a Regression Model and predict the per capita income of the citizens of a country in the previous years (1990 & 1994). Download the file country_income.csv