Classification is a machine learning task that assigns a label value to each observation and uses these labelled observations to learn to identify which class a new example belongs to. An example is the classification of email as either spam or not spam.
To build any classification model, you need a training dataset with many examples of inputs (feature variables) and outputs (target variables) from which the model learns. The training data should cover the possible scenarios of the problem and contain sufficient examples of each label for the model to train correctly. Class labels are often given as string values and hence need to be encoded as integers, for example 0 for "not spam" and 1 for "spam".
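As a minimal sketch of this encoding step, scikit-learn's `LabelEncoder` maps string labels to integer codes (the example labels below are illustrative):

```python
# Encode string class labels as integers with scikit-learn's LabelEncoder.
from sklearn.preprocessing import LabelEncoder

labels = ["spam", "not spam", "not spam", "spam"]
encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

print(list(encoded))           # [1, 0, 0, 1] - codes assigned alphabetically
print(list(encoder.classes_))  # ['not spam', 'spam']
```

Note that `LabelEncoder` assigns codes by sorting the class names alphabetically, so here "not spam" happens to get 0 and "spam" gets 1, matching the usual normal/abnormal convention.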
Types of classification tasks
Binary classification
Multi-class classification
Multi-label classification
Imbalanced classification
Binary Classification for Machine Learning
Binary classification refers to tasks that predict one of exactly two classes as output
One class is normally considered the normal state, and the other the abnormal state
Email spam detection: Normal State - Not Spam, Abnormal State - Spam
Churn prediction: Normal State - Not Churned, Abnormal State - Churned
The most commonly followed notation assigns 0 to the normal state and 1 to the abnormal state
Instead of predicting a class label directly, a model can also predict a Bernoulli probability for the output, i.e. the probability that the example belongs to the abnormal class
The most popular algorithms for binary classification tasks are:
K-Nearest Neighbours
Logistic Regression
Support Vector Machine
Decision Trees
Naive Bayes
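A minimal binary-classification sketch using scikit-learn's `LogisticRegression` on a synthetic dataset (the dataset and parameter choices here are illustrative, not from the text). It shows both hard 0/1 predictions and the Bernoulli probabilities mentioned above:

```python
# Binary classification: hard labels vs. Bernoulli probabilities.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic two-class dataset: 0 = normal state, 1 = abnormal state.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)

print(model.predict(X_test[:3]))        # hard 0/1 class labels
print(model.predict_proba(X_test[:3]))  # probability per class, rows sum to 1
```

The same pattern works with the other listed algorithms (e.g. `KNeighborsClassifier`, `DecisionTreeClassifier`), since scikit-learn classifiers share the `fit`/`predict`/`predict_proba` interface.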
Multi-class Classification for Machine Learning
Multi-class classification tasks can have any number of class labels, with a minimum of three
Examples are:
Plant species classification
Sentiment analysis (happy, sad, neutral)
Optical character recognition
These models are normally based on a categorical distribution
The model predicts a probability for the input with respect to each of the output labels
The most common algorithms used for multi-class classification are:
K-Nearest Neighbours
Naive Bayes
Decision Trees
Gradient Boosting
Random Forest
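A minimal multi-class sketch using a Random Forest on the Iris dataset (three plant species, matching the plant-species example above); the choice of dataset and model is illustrative:

```python
# Multi-class classification: one probability per class (categorical).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)  # three species, labels 0, 1, 2
clf = RandomForestClassifier(random_state=0).fit(X, y)

proba = clf.predict_proba(X[:1])
print(proba.shape)  # (1, 3): one probability column per species
```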
Multi-label Classification for Machine Learning
In these tasks, two or more class labels may be predicted simultaneously for each example
An example is a single photo in which the model must identify all the objects present
The commonly used algorithms are:
Multi-label Random Forest
Multi-label Decision Trees
Multi-label Gradient Boosting
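As a minimal multi-label sketch, scikit-learn's `RandomForestClassifier` natively supports a label-indicator target matrix, so it can predict several labels per example at once (the synthetic dataset and its parameters are illustrative):

```python
# Multi-label classification: each row of Y is a 0/1 indicator per label.
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier

# 100 examples, 4 possible labels; each example may have several labels.
X, Y = make_multilabel_classification(n_samples=100, n_classes=4,
                                      random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, Y)

pred = clf.predict(X[:2])
print(pred.shape)  # (2, 4): one 0/1 indicator per label, per example
```

For algorithms without native multi-label support, scikit-learn's `MultiOutputClassifier` wrapper fits one classifier per label.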
Imbalanced Classification for Machine Learning
This refers to tasks where the number of examples in each class is unequally distributed
Typically, imbalanced classification tasks are binary classification problems where the majority of the training dataset belongs to the normal class and only a minority belongs to the abnormal class
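A minimal imbalanced-classification sketch: generating a skewed binary dataset and using the `class_weight="balanced"` option so the minority class is not ignored (the 95/5 split and model choice are illustrative assumptions, not from the text):

```python
# Imbalanced binary classification: 95% normal (0), 5% abnormal (1).
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, weights=[0.95], flip_y=0,
                           random_state=0)
counts = Counter(y)
print(counts)  # heavily skewed toward class 0

# class_weight="balanced" reweights examples inversely to class frequency,
# one common way to counter the imbalance during training.
clf = LogisticRegression(class_weight="balanced").fit(X, y)
```

Other common remedies include resampling the training data (over-sampling the minority class or under-sampling the majority class) and evaluating with metrics such as precision and recall rather than plain accuracy.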