# sigma part
Best answer by mars
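We generally square each difference before summing so that errors of opposite sign cannot cancel out. If we do not square the individual differences and just sum them over all the values, there is a chance that the positive and negative errors offset each other and we end up with a zero value for the cost function even though the predictions are wrong.
The cost function should be zero only when every predicted value is equal to its label. Squaring ensures this, because each squared term is non-negative and is zero only when that prediction matches.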
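
Here is a small Python sketch of that cancellation. The function names and the sample numbers are made up for illustration, not taken from any particular course or library:

```python
def sum_of_errors(predictions, labels):
    """Sum of raw (signed) differences; positive and negative errors can cancel."""
    return sum(p - y for p, y in zip(predictions, labels))


def sum_of_squared_errors(predictions, labels):
    """Sum of squared differences; every term is >= 0, so nothing cancels."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels))


# Hypothetical data: the model predicts 2.0 for everything.
labels = [1.0, 2.0, 3.0]
predictions = [2.0, 2.0, 2.0]  # errors are +1, 0, -1

raw = sum_of_errors(predictions, labels)              # +1 and -1 cancel
squared = sum_of_squared_errors(predictions, labels)  # 1 + 0 + 1

print("sum of raw errors:    ", raw)
print("sum of squared errors:", squared)
print("perfect predictions:  ", sum_of_squared_errors(labels, labels))
```

The raw sum comes out as zero even though two of the three predictions are wrong, while the squared sum is zero only in the last case, where predictions and labels match exactly.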