Machine Learning - Linear Regression

Balachandar Paulraj
4 min read · Aug 1, 2020

Let us learn the concepts of Linear Regression by relating them to a single input of data. Like most learning resources, I’ll consider the square feet of an area as the input (denoted as x). Based on this input, I’ll predict the price of a house (denoted as y).

Need For Algorithm:

Before diving deep into Machine Learning, let us try to figure out the need for the algorithms implemented in it. Consider a table with Products (x) and Dollars (y), where x represents the input and y the output. Our job is to frame an equation that predicts the output (dollars) from the given input (products).

The relation can be easily derived as y = x + (x/5). Deriving such an equation is easy when the predicted output depends on only one input variable. Imagine a case where the output depends on more than 10 input variables; there are cases where 100 to 1,000 inputs are used to determine an output. This creates the need for an algorithm that can handle such complex scenarios.
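To see how trivial the single-input case is, here is a minimal sanity check in Python. The sample (x, y) pairs are hypothetical, chosen only to satisfy the relation above; they are not taken from the original table.

```python
# Minimal sanity check for the relation y = x + (x / 5).
# The sample (x, y) pairs below are hypothetical illustrations,
# not values from the article's table.
samples = [(5, 6.0), (10, 12.0), (25, 30.0)]

for x, y in samples:
    predicted = x + x / 5
    assert predicted == y, f"mismatch for x={x}"
    print(f"x={x} -> predicted y={predicted}, actual y={y}")
```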

Important terms in Machine Learning:

  • Hypothesis : How are you going to predict the output? A hypothesis is a formula/expression used to compute the output from a given set of input features. In the above example, y = x + (x/5) is the hypothesis used to predict the output.
  • Cost Function : Calculates the difference between the actual output and the predicted output (hypothesis). The cost function is also called the squared error function. The cost for the above example is 0, because if you apply every value of x from the table to the hypothesis (y = x + (x/5)), the predicted output matches the actual output exactly. (A small sketch of this cost appears right after this list.) In Microsoft Excel, this difference is represented as “Error Bars”. (For better clarity, try enabling “Error Bars” in a scatter chart.)
  • Algorithm : Reduces the value of the cost function, i.e., improves the hypothesis so that its output moves closer to the actual output. The predicted output is closer to the actual output when the cost derived from the hypothesis is closer to 0. How can we achieve this? By using algorithms developed for Machine Learning.
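For reference, the standard squared error cost has the form J(θ₀, θ₁) = (1/2m) Σ(h(xᵢ) − yᵢ)². Below is a minimal Python sketch of it; the function name and sample data are my own illustrations, not from the article.

```python
# A minimal sketch of the squared error cost function, assuming the
# standard form J = (1/2m) * sum((h(x) - y)^2).
# Function and variable names are illustrative.
def squared_error_cost(hypothesis, xs, ys):
    m = len(xs)
    return sum((hypothesis(x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# With the exact relation y = x + x/5, the cost is 0.
xs = [5, 10, 25]
ys = [6.0, 12.0, 30.0]
print(squared_error_cost(lambda x: x + x / 5, xs, ys))  # 0.0
```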

Linear Regression:

When the output we are trying to predict is continuous, the problem is termed a regression problem, and modeling it with a straight line is Linear Regression.

Consider the picture above, where a graph has been plotted for data points with area in square feet as the input (x) and the price of a house as the output (y). Let us go through the hypothesis, cost function, and algorithm implemented in Linear Regression by referring to this graph.

Derivation of Hypothesis:

In the graph plotted, our job is to find the line that passes closest to all the data points. In Microsoft Excel, such a line is termed a “Trend Line”. Given x (input) and y (output), how can a line be formed? Recall from school mathematics that a line can be expressed by the equation y = mx + b, where m is the slope and b is the intercept. In Linear Regression, b and m are denoted as θ₀ and θ₁ respectively. Hence, the hypothesis for Linear Regression can be written as h(x) = θ₀ + θ₁x.
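Here is a minimal sketch of this hypothesis in Python; the θ values used are made up purely for illustration.

```python
# A minimal sketch of the linear regression hypothesis
# h(x) = theta0 + theta1 * x. Names and values are illustrative.
def hypothesis(theta0, theta1, x):
    return theta0 + theta1 * x

# Example: a made-up intercept of 50 (base price) and a made-up
# slope of 0.2 per square foot.
print(hypothesis(50.0, 0.2, 1500))  # 350.0
```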

Derivation of Cost Function:

Why is the cost function needed when the hypothesis is already formed? Look at the pictures below, where the hypothesis results in three different lines on the same graph.

How do we find which of the lines derived from the hypothesis is closest to the data points? By checking the plotted graph, we can easily say the line in Figure 2 is closest to the data points. This can be judged visually when there is a single input.

But if the number of input features is more than 10, it becomes difficult to identify the closest line visually. Hence, we use the cost function to identify the line that best fits the data points.

The cost function calculates a cost for each of the lines plotted in the three figures. The line with the least cost has its predicted output closest to the actual output. (Note: the formula and full derivation of the cost function for Linear Regression are not explained here.)
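As a rough illustration, the squared error cost from the earlier sketch can rank candidate lines. All data points and θ values below are hypothetical.

```python
# Ranking three hypothetical candidate lines by squared error cost.
# Data points and theta values are made up for illustration.
xs = [1000, 1500, 2000, 2500]
ys = [250.0, 350.0, 450.0, 550.0]

def cost(theta0, theta1):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

candidates = {"Figure 1": (0.0, 0.1), "Figure 2": (50.0, 0.2), "Figure 3": (0.0, 0.3)}
for name, (t0, t1) in candidates.items():
    print(name, cost(t0, t1))
# The line with the least cost (here, Figure 2) fits the data best.
```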

Derivation of Algorithm:

The purpose of the algorithm is to identify the optimal values for θ₀ and θ₁. Optimal values of θ₀ and θ₁ result in the predicted output being closer to the actual output; in other words, the line drawn using the theta values obtained from the algorithm will lie closer to the plotted data points. The algorithm used by Linear Regression to minimize the cost function is called Gradient Descent.
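Below is a minimal Python sketch of batch gradient descent under the standard update rules; the learning rate, iteration count, and data are arbitrary illustrative choices, not values from the article.

```python
# A minimal batch gradient descent sketch for linear regression,
# assuming the standard update rules:
#   theta0 := theta0 - alpha * (1/m) * sum(h(x) - y)
#   theta1 := theta1 - alpha * (1/m) * sum((h(x) - y) * x)
# Learning rate and iteration count are arbitrary illustrative choices.
def gradient_descent(xs, ys, alpha=1e-7, iterations=1000):
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

xs = [1000, 1500, 2000, 2500]
ys = [250.0, 350.0, 450.0, 550.0]
print(gradient_descent(xs, ys))  # thetas move toward the best-fit line
```

In practice, feature scaling (e.g., normalizing the square footage) allows a larger learning rate and far fewer iterations to reach a tight fit.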
