Learning Curves

Learning curves in machine learning show how a model performs as the number of training samples varies. They are produced by tracking the training and validation scores (for example, model accuracy) as the training set grows.
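The scores described above can be computed with scikit-learn's `learning_curve` helper. The following is a minimal sketch, not the method used by the sources: the synthetic dataset, the choice of logistic regression, the five training-set sizes, and 5-fold cross-validation are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# Fit the model on growing fractions of the training folds and score it
# on both the data it was trained on and the held-out validation fold.
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% to 100% of each training fold
    cv=5,
    scoring="accuracy",
)

# Each row holds one training-set size, each column one CV fold.
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

for n, tr, va in zip(train_sizes, train_mean, val_mean):
    print(f"{int(n):4d} samples: train={tr:.3f}  validation={va:.3f}")
```

Averaging over the folds gives one training score and one validation score per training-set size, which are the two curves a learning-curve plot displays.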

Knowing how to use learning curves helps in assessing whether the model is suffering from high bias (underfitting) or high variance (overfitting) and whether increasing training data samples could help solve the bias or variance problem.

[Figure: Learning curve]

The plot above shows models with different training and validation accuracies. The training accuracy is shown by the orange dashed line, the validation accuracy by the blue line, and the desired accuracy by the black dashed line.
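A plot with the same visual conventions can be drawn with matplotlib. This is a rough sketch only: the helper name `plot_learning_curve` is hypothetical, and the accuracy values are made up to resemble a high-variance model rather than taken from the original figure.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_learning_curve(train_sizes, train_acc, val_acc, desired_acc):
    """Draw training, validation, and desired accuracy on one set of axes."""
    fig, ax = plt.subplots()
    # Orange dashed line: training accuracy.
    ax.plot(train_sizes, train_acc, color="orange", linestyle="--",
            marker="o", label="Training accuracy")
    # Blue solid line: validation accuracy.
    ax.plot(train_sizes, val_acc, color="blue", marker="s",
            label="Validation accuracy")
    # Black dashed horizontal line: desired accuracy.
    ax.axhline(desired_acc, color="black", linestyle="--",
               label="Desired accuracy")
    ax.set_xlabel("Number of training samples")
    ax.set_ylabel("Accuracy")
    ax.legend(loc="lower right")
    return fig, ax


# Made-up numbers for illustration only.
sizes = [50, 100, 200, 400]
fig, ax = plot_learning_curve(
    sizes,
    train_acc=[0.99, 0.98, 0.97, 0.96],
    val_acc=[0.80, 0.85, 0.88, 0.90],
    desired_acc=0.92,
)
```

Calling `fig.savefig("learning_curve.png")` afterwards would write the figure to disk.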

Notes:

  1. Use learning curves to diagnose bias-variance problems in a machine learning model.
  2. For a model that is underfitting (high bias), both the training and validation scores are very low and fall below the desired accuracy.
  3. To reduce underfitting, consider adding more features. Alternatively, for models that support regularization (for example, those built with SVM or logistic regression), consider reducing the degree of regularization.
  4. For a model that is overfitting (high variance), there is a large gap between the training and validation accuracy. The training accuracy may also exceed the desired accuracy.
  5. To reduce overfitting, consider collecting more training data (although adding data may not always help) or reducing model complexity, for example by using fewer features. For regularized models, consider increasing the strength of regularization, but do so with caution, because too much regularization will cause the model to underfit.
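Notes 2 and 4 can be turned into a rough rule of thumb applied to the last point of a learning curve. The sketch below is only a heuristic: the function name `diagnose` and the `gap_tol` threshold of 0.05 are hypothetical choices, not values given by the sources.

```python
def diagnose(train_acc, val_acc, desired_acc, gap_tol=0.05):
    """Classify a model from its final training and validation accuracy.

    gap_tol is a hypothetical threshold for what counts as a "large gap"
    between training and validation accuracy; tune it for your problem.
    """
    gap = train_acc - val_acc
    # Both scores are low and close together: the model cannot fit the data.
    if train_acc < desired_acc and val_acc < desired_acc and gap <= gap_tol:
        return "high bias (underfitting)"
    # Training score is much higher than validation score: poor generalization.
    if gap > gap_tol:
        return "high variance (overfitting)"
    return "no clear bias or variance problem"


# Made-up accuracies for illustration.
print(diagnose(0.70, 0.68, desired_acc=0.90))
print(diagnose(0.99, 0.80, desired_acc=0.90))
print(diagnose(0.93, 0.91, desired_acc=0.90))
```

A single threshold cannot replace reading the whole curve, since the trend as training data grows also indicates whether more data is likely to help.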

Sources

  1. Ajitesh Kumar
  2. Valamis
