Bias-Variance Trade-off

Kritika Joshi · Published in WiCDS · Jan 25, 2021


(Image source: digitalvidya.com)

BIAS

Bias is the error introduced into a model by oversimplifying the Machine Learning algorithm, and it leads to underfitting. A biased dataset does not accurately represent a model’s use case, resulting in skewed outcomes, low accuracy levels, and analytical errors. Bias measures how far the predicted values are from the actual values: it is known as the difference between the average prediction of our model and the actual (correct) value. If the average predicted values are far off from the actual values, then the bias is high.
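In symbols (a standard textbook definition added here for clarity; it is not spelled out in the original post), if f(x) is the true function and f̂(x) is the model fitted on a randomly drawn training set, bias is the gap between the average prediction and the truth:

```latex
\mathrm{Bias}\big[\hat{f}(x)\big] = \mathbb{E}\big[\hat{f}(x)\big] - f(x)
```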

VARIANCE

Variance is the error introduced into the model by an overly complex ML algorithm. The model learns noise from the training dataset and therefore performs badly on the test dataset. It can lead to high sensitivity to small fluctuations in the training data, and to overfitting.
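A minimal sketch of how bias and variance can be estimated empirically (the sine ground truth, noise level, and the two example learners are my own illustrative choices, not from the article): train the same model on many freshly sampled training sets, then compare the average prediction and the spread of predictions against the true function.

```python
# Estimate bias^2 and variance of two learners by refitting them on many
# fresh training samples drawn from a known ground-truth function.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(x)                      # assumed ground-truth function

x_test = np.linspace(0, 2 * np.pi, 50).reshape(-1, 1)
n_rounds, preds_lin, preds_tree = 200, [], []

for _ in range(n_rounds):
    x_train = rng.uniform(0, 2 * np.pi, 30).reshape(-1, 1)
    y_train = true_f(x_train).ravel() + rng.normal(0, 0.3, 30)   # noisy labels
    preds_lin.append(LinearRegression().fit(x_train, y_train).predict(x_test))
    preds_tree.append(DecisionTreeRegressor().fit(x_train, y_train).predict(x_test))

for name, preds in [("linear", preds_lin), ("tree", preds_tree)]:
    preds = np.array(preds)
    bias2 = np.mean((preds.mean(axis=0) - true_f(x_test).ravel()) ** 2)  # squared bias
    var = preds.var(axis=0).mean()                                       # variance
    print(f"{name}: bias^2 = {bias2:.3f}, variance = {var:.3f}")
```

On this toy problem the straight-line model shows high bias and low variance, while the unpruned tree shows low bias and high variance, matching the descriptions above.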

High bias, low variance: models are consistent but inaccurate on average.
High bias, high variance: models are both inaccurate and inconsistent on average.
Low bias, high variance: models are accurate on average but inconsistent.
Low bias, low variance: models are accurate and consistent on average.

(Image source: machinelearningtutorial.net)

Low variance, high bias = Underfitting

Low bias, high variance = Overfitting

Low-bias ML algorithms: Decision Tree, KNN, SVM

High-bias ML algorithms: Linear Regression, Logistic Regression

| Algorithm | Bias | Variance |
| --- | --- | --- |
| Linear Regression | High Bias | Less Variance |
| Decision Tree | Low Bias | High Variance |
| Bagging | Low Bias | High Variance |
| Random Forest | Low Bias | High Variance |

TRADE-OFF

It is the tension between the error introduced by bias and the error introduced by variance.

Best fit : Low Variance and Low Bias

Overfitting: good performance on the training data, poor generalization to other data. Here the model tries to cover every point of the x-y plot. A statistical model is said to be overfitted when it fits the training data too closely, capturing the noise along with the underlying pattern.

Why is Overfitting called high variance?

If our model has parameters that give a low error on the training data, then on any test data that resembles the training data the error will also be low. But when the model encounters a test dataset that is not similar to the training set, it produces a large error. This is what leads to high variance.
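A quick sketch of this effect (an assumed toy setup with a degree-14 polynomial; none of it comes from the original post): the over-complex model nearly memorizes the 15 training points but does much worse on fresh data drawn from the same process.

```python
# An over-complex model: near-zero training error, large error on unseen data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(0, 1, 15)).reshape(-1, 1)
y_train = np.sin(2 * np.pi * x_train).ravel() + rng.normal(0, 0.2, 15)
x_test = rng.uniform(0, 1, 200).reshape(-1, 1)
y_test = np.sin(2 * np.pi * x_test).ravel() + rng.normal(0, 0.2, 200)

# Degree-14 polynomial: enough capacity to pass through every training point.
model = make_pipeline(PolynomialFeatures(degree=14), LinearRegression()).fit(x_train, y_train)
print("train MSE:", mean_squared_error(y_train, model.predict(x_train)))
print("test  MSE:", mean_squared_error(y_test, model.predict(x_test)))
```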

Techniques to reduce overfitting:
1. Increase the training data.
2. Reduce model complexity.
3. Ridge regularization and Lasso regularization (see the sketch after this list).
4. Cross-validation, e.g. K-Fold.
5. Keep the model simple: reduce variance by taking into account fewer variables and parameters.
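A hedged sketch of points 3 and 4, assuming scikit-learn and a synthetic regression dataset (the data and the alpha values are illustrative choices, not from the article): both regularizers are scored with 5-fold cross-validation.

```python
# Ridge (L2) and Lasso (L1) regularization, evaluated with K-Fold cross-validation.
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import KFold, cross_val_score
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=30, noise=10.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

for name, model in [("ridge", Ridge(alpha=1.0)), ("lasso", Lasso(alpha=0.1))]:
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    print(f"{name}: mean CV MSE = {-scores.mean():.2f}")
```

The penalty term shrinks the coefficients, trading a little extra bias for a larger reduction in variance, which is exactly why these techniques help against overfitting.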

(Image source: towardsdatascience.com)

Underfitting: poor performance on the training data and poor generalization to other data. Here only a few points lie on the fitted line and the rest are far away in the x-y plot. Underfitting destroys the accuracy of our machine learning model. Its occurrence simply means that our model or algorithm does not fit the data well enough. It usually happens when we have too little data to build an accurate model, or when we try to fit a linear model to non-linear data.

Techniques to reduce underfitting:

1. Increase model complexity (a short sketch follows this list).
2. Increase the number of features by performing feature engineering.
3. Remove noise from the data.
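A minimal sketch of points 1 and 2, assuming scikit-learn and a toy quadratic target (my own illustration, not from the article): a plain linear model underfits, while adding polynomial features lets the same learner capture the non-linearity.

```python
# Raising model complexity with polynomial features so a linear learner
# can fit non-linear data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 200).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(0, 0.5, 200)         # clearly non-linear target

for degree in (1, 2):                                 # degree 1 underfits, degree 2 fits
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(x, y)
    print(f"degree {degree}: train MSE = {mean_squared_error(y, model.predict(x)):.2f}")
```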

Noise: the irreducible error that no model can predict away, no matter how well it fits the data.

WHAT EXACTLY IS THE BIAS-VARIANCE TRADE-OFF?

(Image source: digitalvidya.com)

The goal of any supervised Machine Learning algorithm is to have low bias and low variance in order to achieve good prediction performance. If the algorithm is too simple, it may end up in a high-bias, low-variance condition and thus be error-prone. If the algorithm is too complex, it may end up with high variance and low bias. The sweet spot lies between these two conditions, and finding it is known as the trade-off, or the Bias-Variance Trade-off.
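The standard decomposition of expected squared error (a textbook identity added here for reference; the original post does not write it out) makes this tension explicit: the total error splits into a squared-bias term, a variance term, and the irreducible noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^{2}\big]
  = \mathrm{Bias}\big[\hat{f}(x)\big]^{2}
  + \mathrm{Var}\big[\hat{f}(x)\big]
  + \sigma^{2}
```

Making the model more complex shrinks the bias term but inflates the variance term; simplifying it does the reverse, which is exactly the trade-off described above.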
