When do we cross features in logistic regression

Crossing features in Logistic Regression

Robert Ayub Technology 02 July 2023

A crossed feature, also known as an interaction term, is the product of two or more predictor variables that is included in the model as an additional predictor. Interaction terms are used when the effect of one predictor variable on the response variable depends on the value of another predictor variable.
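As a sketch of this definition, a logistic regression with two predictors and one crossed feature can be written as below. The symbols p (probability of the positive outcome), x_1 and x_2 (the predictors) and \beta_0 to \beta_3 (the coefficients) are notation assumed here for illustration; they do not come from the original text.

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 \,(x_1 x_2)
```

Here \beta_3 is the coefficient of the crossed feature x_1 x_2. When \beta_3 = 0, the model reduces to the purely additive form with no interaction.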

Example #1

Suppose we are building a logistic regression model that predicts whether a patient will develop a disease based on their age and BMI. If the effect of BMI on the likelihood of developing the disease depends on the patient's age, we can include an interaction term between age and BMI in the model. The interaction term would look something like age * BMI.
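The age and BMI example can be sketched in a few lines of plain Python. The coefficient values and the function names (disease_log_odds, disease_probability, bmi_effect) are hypothetical and chosen only for illustration; they are not fitted to any data. The sketch shows that, once age * BMI is in the model, the effect of one extra BMI unit on the log odds is different at different ages.

```python
import math


def sigmoid(z):
    """Map log-odds z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only (not estimated from data).
INTERCEPT = -6.0
B_AGE = 0.03
B_BMI = 0.05
B_AGE_BMI = 0.002  # coefficient of the crossed feature age * BMI


def disease_log_odds(age, bmi):
    """Log-odds of disease from age, BMI and their product."""
    return INTERCEPT + B_AGE * age + B_BMI * bmi + B_AGE_BMI * (age * bmi)


def disease_probability(age, bmi):
    """Predicted probability of disease."""
    return sigmoid(disease_log_odds(age, bmi))


def bmi_effect(age):
    """Change in log-odds for a one-unit increase in BMI at a given age."""
    return disease_log_odds(age, 26.0) - disease_log_odds(age, 25.0)


if __name__ == "__main__":
    for age in (30, 60):
        print("age", age,
              "BMI effect on log-odds:", round(bmi_effect(age), 3),
              "P(disease | BMI=30):", round(disease_probability(age, 30.0), 3))
```

With these made-up numbers the BMI effect equals B_BMI + B_AGE_BMI * age, so it grows with age. Without the crossed feature it would be the same constant B_BMI for every patient.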

Example #2

Suppose we are building a logistic regression model to predict whether a customer will purchase a product based on their age and income. Age and income may have a complex relationship with the likelihood of purchase, and it may be useful to include an interaction term between age and income to capture that complexity.

Necessity

When there is reason to believe that the relationship between the response variable and the predictor variables is not purely additive (that is, the effect of one predictor variable on the response variable changes depending on the value of another predictor variable), it may be useful to include interaction terms in the logistic regression model.

Overfitting

Including interaction terms in logistic regression may lead to overfitting if there are too many of them or if they are not carefully selected. To avoid overfitting, it is important to combine domain knowledge with statistical techniques such as:

stepwise regression - adding or removing interaction terms one at a time based on their statistical significance
regularization - adding a penalty term to the model's loss function, which shrinks the coefficients and so discourages overfitting
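The regularization idea can be sketched with a small, dependency-free logistic regression trained by gradient descent with an L2 penalty. Everything here is an assumption made for illustration: the synthetic data, the function names fit_logistic_l2 and make_data, the choice of an L2 (ridge) penalty, and the learning rate and step count. Stepwise selection is not shown.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_l2(rows, labels, penalty, steps=1000, lr=0.5):
    """Gradient descent on mean log-loss + (penalty / 2) * sum of squared
    non-intercept weights. Returns [intercept, w_1, ..., w_k]."""
    k = len(rows[0])
    n = len(rows)
    weights = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
            err = sigmoid(z) - y
            grad[0] += err
            for j, v in enumerate(x, start=1):
                grad[j] += err * v
        weights[0] -= lr * grad[0] / n  # the intercept is not penalized
        for j in range(1, k + 1):
            weights[j] -= lr * (grad[j] / n + penalty * weights[j])
    return weights


def make_data(n=200, seed=0):
    """Synthetic data whose true log-odds include an x1 * x2 interaction."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        x1 = rng.uniform(-1, 1)
        x2 = rng.uniform(-1, 1)
        z = 0.5 * x1 - 0.5 * x2 + 2.0 * x1 * x2
        labels.append(1 if rng.random() < sigmoid(z) else 0)
        rows.append([x1, x2, x1 * x2])  # third column is the crossed feature
    return rows, labels


rows, labels = make_data()
w_weak = fit_logistic_l2(rows, labels, penalty=0.0)    # no penalty
w_strong = fit_logistic_l2(rows, labels, penalty=1.0)  # strong penalty
print("no penalty:    ", [round(w, 3) for w in w_weak])
print("strong penalty:", [round(w, 3) for w in w_strong])
```

Comparing the two printed weight vectors shows the penalty pulling the coefficients, including the one on the crossed feature, toward zero. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.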

Types of Interaction Terms

Multiplicative - the most common type, created by multiplying two or more predictor variables together
Polynomial - created by raising one or more predictor variables to a power and multiplying them together. For example, if the predictor variables are x1 and x2, the interaction term could be x1^2 * x2
Categorical - created by multiplying a categorical predictor variable (encoded as one or more 0/1 indicator variables) with a continuous predictor variable. For example, if the predictor variables are gender and age, the interaction term would be gender * age
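The three types listed above can be sketched as small helper functions. The function names and the level labels "male" and "female" are hypothetical, and the categorical helper assumes indicator (dummy) encoding with the first level treated as the reference category.

```python
def multiplicative(x1, x2):
    """Multiplicative interaction: the plain product of two predictors."""
    return x1 * x2


def polynomial(x1, x2, power=2):
    """Polynomial interaction: x1 raised to a power, times x2 (x1^2 * x2 by default)."""
    return (x1 ** power) * x2


def categorical_by_continuous(category, value, levels):
    """Categorical interaction: one 0/1 indicator per non-reference level,
    each multiplied by the continuous value. levels[0] is the reference."""
    return [value if category == level else 0.0 for level in levels[1:]]


if __name__ == "__main__":
    print(multiplicative(3, 4))
    print(polynomial(3, 2))
    print(categorical_by_continuous("female", 40.0, ["male", "female"]))
    print(categorical_by_continuous("male", 40.0, ["male", "female"]))
```

For a categorical variable with more than two levels, the helper returns one interaction column per non-reference level, so the model estimates a separate slope adjustment for each level.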

Interpretation

The interpretation of interaction terms can be complex. The coefficient of an interaction term between two predictor variables represents how much the effect of one predictor on the log odds of the response variable changes for each one-unit increase in the other predictor. The coefficient of each individual predictor then describes its effect on the log odds only when the other predictor equals zero. The direction and magnitude of the interaction coefficient indicate whether the two predictors reinforce or offset each other, and how strongly.
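A short derivation may make this concrete. Assume, for illustration, a model with two predictors x_1 and x_2, coefficients \beta_0 to \beta_3, and outcome probability p; this notation is not from the original text.

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2

\frac{\partial}{\partial x_1}\,\log\frac{p}{1-p} = \beta_1 + \beta_3 x_2

\text{odds ratio for a one-unit increase in } x_1 = \exp(\beta_1 + \beta_3 x_2)
```

The effect of x_1 on the log odds is therefore not a single number: it equals \beta_1 when x_2 = 0 and shifts by \beta_3 for every additional unit of x_2.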

Non-Linear Effects

In some cases, the relationship between the predictor variables and the response variable may be non-linear, and interaction terms may not be sufficient to capture this complexity. In such cases, it may be useful to include higher-order terms, such as quadratic or cubic terms, or to use non-linear regression techniques.
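As a minimal sketch of a higher-order term, the snippet below adds a squared copy of a single predictor. The coefficients and the function names (expand_quadratic, quadratic_log_odds) are hypothetical and chosen only so that the log odds rise and then fall, a shape that a model with only the linear term cannot represent.

```python
# Hypothetical coefficients, for illustration only (not estimated from data).
B0 = -1.0
B1 = 2.0
B2 = -0.5  # coefficient of the quadratic term x^2


def expand_quadratic(x):
    """Return the feature vector [x, x^2] for a single predictor x."""
    return [x, x * x]


def quadratic_log_odds(x):
    """Log-odds under a model with a linear and a quadratic term."""
    linear, squared = expand_quadratic(x)
    return B0 + B1 * linear + B2 * squared


if __name__ == "__main__":
    for x in (0.0, 2.0, 4.0):
        print(x, quadratic_log_odds(x))
```

With these values the log odds peak at x = -B1 / (2 * B2) = 2 and are equal at x = 0 and x = 4, so the relationship is not monotonic in x.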

Practical Notebooks

Students enrolling in any AI-related course at Carnegie Training Institute have access to Jupyter notebooks and class exercises illustrating this reasoning.

Sources

Robert Ayub

Kenya

+254 718 758 221

robert@ayub.co.ke
