Understanding Classification Models

Classification Models

Classification is a machine learning task in which a model learns from labelled observations and then assigns a class label to new observations, identifying each one as belonging to one class or another. An example is classifying an email as either spam or not spam.

To build any classification model, you need a training dataset with many examples of inputs (feature variables) and outputs (target variables) from which the model learns. The training data should cover the scenarios the model is expected to handle and should contain enough examples of each label for the model to be trained correctly. Class labels are often stored as string values and hence need to be encoded as integers, for example 0 for "not spam" and 1 for "spam".
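
The label encoding step described above can be sketched in plain Python. This is a minimal illustration, not a prescribed method: the `raw_labels` list is made-up data, and the mapping in `label_to_int` (0 for "not spam", 1 for "spam") is an assumed choice; any consistent mapping of labels to integers works.

```python
# Hypothetical raw labels as they might appear in a training dataset.
raw_labels = ["spam", "not spam", "not spam", "spam", "not spam"]

# Assumed mapping for illustration; any consistent assignment of integers works.
label_to_int = {"not spam": 0, "spam": 1}

# Reverse mapping, used to turn integer predictions back into readable labels.
int_to_label = {value: key for key, value in label_to_int.items()}

# Encode the string labels as integers for training.
encoded = [label_to_int[label] for label in raw_labels]

# Decode integers back to strings, e.g. when reporting predictions.
decoded = [int_to_label[value] for value in encoded]

print(encoded)  # [1, 0, 0, 1, 0]
print(decoded == raw_labels)  # True
```

Keeping the reverse mapping alongside the forward one makes it easy to report model output in the original label names.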

Types of classification tasks

  1. Binary Classification
  2. Multi-class classification
  3. Multi-label classification
  4. Imbalanced classification

Binary Classification for Machine Learning

  • This refers to tasks where the output is one of exactly two classes
  • One class is normally considered as the normal state, and the other is considered as the abnormal state
    • Email spam detection: Normal State - Not Spam, Abnormal State - Spam
    • Churn prediction: Normal State - Not Churned, Abnormal State - Churned
  • The most common convention is to assign 0 to the normal state and 1 to the abnormal state
  • Instead of predicting a class label directly, the model can predict a probability for the abnormal class, which treats the output as a Bernoulli distribution
  • The most popular algorithms for binary classification tasks are:
    • K-Nearest Neighbours
    • Logistic Regression
    • Support Vector Machine
    • Decision Trees
    • Naive Bayes
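
To make the Bernoulli probability idea concrete, here is a minimal sketch in the style of logistic regression: a weighted sum of features is passed through a sigmoid to give the probability of the abnormal class (1), which is then thresholded into a label. The weights, bias, feature values and the 0.5 threshold are all hypothetical, hand-picked values, not the result of training.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Return the predicted probability of the abnormal class (label 1)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict_label(probability, threshold=0.5):
    """Turn a probability into a class label: 1 (abnormal) or 0 (normal)."""
    return 1 if probability >= threshold else 0


# Hypothetical parameters; a real model would learn these from training data.
weights = [1.5, -2.0]
bias = -0.5

# Hypothetical features for one email.
email_features = [2.0, 0.5]

p_spam = predict_proba(email_features, weights, bias)
label = predict_label(p_spam)

print(round(p_spam, 4))  # 0.8176
print(label)  # 1
```

Predicting a probability rather than a hard label lets the threshold be tuned later, for instance raising it when false alarms are costly.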

Multi-class Classification for Machine Learning

  • These tasks have more than two class labels (three at minimum), and each example is assigned exactly one of them
  • Examples are:
    • Plant species classification 
    • Sentiment analysis (happy, sad, neutral)
    • Optical character recognition
  • These tasks are usually modelled with a categorical distribution over the class labels
  • The model predicts the probability of the input belonging to each of the output labels
  • The most common algorithms used for multi-class classification are:
    • K-Nearest Neighbours
    • Naive Bayes
    • Decision Trees
    • Gradient Boosting
    • Random Forest
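
The per-class probabilities described above can be sketched with a softmax function, which turns one raw score per class into a categorical distribution. Softmax is one common way of producing such probabilities and is used here only for illustration; the algorithms listed above arrive at class probabilities in their own ways. The class names follow the sentiment example, and the scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps the exponentials numerically stable.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["happy", "sad", "neutral"]

# Hypothetical raw scores produced by some model for one input.
scores = [2.0, 0.5, 1.0]

probabilities = softmax(scores)

# The predicted class is the one with the highest probability.
predicted = classes[probabilities.index(max(probabilities))]

for name, p in zip(classes, probabilities):
    print(f"{name}: {p:.3f}")
print(predicted)  # happy
```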

Multi-label Classification for Machine Learning

  • In these tasks there are two or more class labels, and one or more of them may be predicted for each example
  • An example is identifying all the objects that appear in a single photo
  • The commonly used algorithms are:
    • Multi-label Random Forest
    • Multi-label Decision Trees
    • Multi-label Gradient Boosting
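
One way to represent multi-label targets is a binary indicator vector with one position per possible label. The sketch below assumes a small hypothetical label set for the photo example and a hypothetical list of per-label scores; each label is then decided independently with an assumed threshold of 0.5.

```python
# Hypothetical set of objects the model knows about.
all_labels = ["cat", "dog", "car", "tree"]


def to_indicator(present, label_names):
    """Encode the labels present in one example as a 0/1 vector."""
    present_set = set(present)
    return [1 if name in present_set else 0 for name in label_names]


def labels_from_scores(scores, label_names, threshold=0.5):
    """Keep every label whose score reaches the threshold."""
    return [name for name, s in zip(label_names, scores) if s >= threshold]


# A photo that contains a dog and a tree.
target = to_indicator(["dog", "tree"], all_labels)

# Hypothetical per-label scores from some model for the same photo.
predicted_scores = [0.1, 0.9, 0.3, 0.7]
predicted_labels = labels_from_scores(predicted_scores, all_labels)

print(target)  # [0, 1, 0, 1]
print(predicted_labels)  # ['dog', 'tree']
```

Unlike the multi-class case, the scores here do not need to sum to 1, because several labels can be true at once.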

Imbalanced Classification for Machine Learning

  • This refers to tasks where the number of examples in each class is unequally distributed
  • Generally, imbalanced classification tasks are binary classification tasks where the majority of the training examples belong to the normal class and a minority belong to the abnormal class
  • Examples are:
    • Fraud Detection
    • Outlier Detection
    • Medical Diagnosis Test
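
A short sketch can show why imbalance matters. The dataset below is hypothetical (95 normal examples, 5 abnormal). A baseline that always predicts the normal class looks accurate while finding none of the abnormal cases. The inverse-frequency class weights at the end are one common heuristic for counteracting this and are an assumption of this example, not something the article prescribes.

```python
from collections import Counter

# Hypothetical imbalanced labels: 0 = normal (e.g. legitimate), 1 = abnormal (e.g. fraud).
labels = [0] * 95 + [1] * 5

counts = Counter(labels)
n_samples = len(labels)
n_classes = len(counts)

# A naive baseline that always predicts the normal class.
baseline_predictions = [0] * n_samples

# Accuracy: the fraction of all predictions that are correct.
correct = sum(p == y for p, y in zip(baseline_predictions, labels))
accuracy = correct / n_samples

# Recall on the abnormal class: the fraction of abnormal examples that were found.
found = sum(p == 1 and y == 1 for p, y in zip(baseline_predictions, labels))
minority_recall = found / counts[1]

# Inverse-frequency weights: rarer classes receive larger weights.
class_weights = {c: n_samples / (n_classes * n) for c, n in counts.items()}

print(accuracy)  # 0.95
print(minority_recall)  # 0.0
print(class_weights[1])  # 10.0
```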

Evaluating Classification Models

 

Sources

  1. Arnab Mondal
  2. Bernoulli Distribution
  3. Categorical Distribution
  4. Mayank Banoula
  5. Zoumana Keita
  6. Mohammad Waseem
  7. Classification Predictive Modelling
