# Cost function

Why do we minimize the square of (prediction - actual) when finding the values of theta 0 and theta 1 in the hypothesis?
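We square each difference before summing so that positive and negative errors cannot cancel each other out. If we summed the raw differences instead, an overestimate on one example could offset an underestimate on another, and the cost function could come out as zero even though the predictions are wrong.
The cost function should be zero only when every predicted value equals its label. Squaring ensures this, because each squared difference is non-negative and is zero only when the prediction matches the label exactly.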
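
The cancellation is easy to see with a small sketch. The data and the function names `sum_of_errors` and `sum_of_squared_errors` below are made up for illustration and are not taken from any particular course or library; the averaging factor that a full cost function usually includes is left out to keep the comparison simple.

```python
def sum_of_errors(predicted, actual):
    """Sum of raw differences; positive and negative errors can cancel."""
    return sum(p - a for p, a in zip(predicted, actual))


def sum_of_squared_errors(predicted, actual):
    """Sum of squared differences; every term is non-negative."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


# Hypothetical labels and predictions: every prediction is off by 2,
# alternating between too high and too low.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [3.0, 0.0, 5.0, 2.0]

print(sum_of_errors(predicted, actual))          # 0.0, looks perfect but is not
print(sum_of_squared_errors(predicted, actual))  # 16.0, reveals the error

# With perfect predictions both sums are zero.
print(sum_of_squared_errors(actual, actual))     # 0.0
```

The raw sum reports zero for a model that is wrong on every example, while the squared sum is zero only in the last case, where each prediction equals its label.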
