Linear regression is a powerful statistical tool used to model the relationship between a dependent variable and one or more independent variables (features). An important, and often forgotten, concept in regression analysis is that of interaction terms. In short, interaction terms enable you to examine whether the relationship between the target and the independent variable changes depending on the value of another independent variable.
Interaction terms are a crucial component of regression analysis, and understanding how they work can help practitioners better train models and interpret their data. Despite their importance, however, interaction terms can be difficult to understand.
This post provides an intuitive explanation of interaction terms in the context of linear regression.
What are interaction terms in regression models?
First, here’s the simpler case; that is, a linear model without interaction terms. Such a model assumes that the effect of each feature or predictor on the dependent variable (target) is independent of other predictors in the model.
The following equation describes such a model specification with two features:

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$$
To make the explanation easier to understand, here’s an example. Imagine that you are interested in modeling the price of real estate properties ($y$) using two features: their size ($X_1$) and a Boolean flag indicating whether the apartment is located in the city center ($X_2$). In the equation, $\beta_0$ is the intercept, $\beta_1$ and $\beta_2$ are the coefficients of the linear model, and $\epsilon$ is the error term (unexplained by the model).
After gathering data and estimating a linear regression model, you obtain the following coefficients:

$$\hat{y} = 300 + 20 X_1 + 10 X_2$$
Knowing the estimated coefficients and that $X_2$ is a Boolean feature, you can write out the two possible scenarios depending on the value of $X_2$.

City center ($X_2 = 1$):

$$\hat{y} = (300 + 10) + 20 X_1 = 310 + 20 X_1$$

Outside of the city center ($X_2 = 0$):

$$\hat{y} = 300 + 20 X_1$$
How do you interpret these? While it might not make much sense in the context of real estate, you can say that a 0-square-meter apartment in the city center costs 310 (the value of the intercept). Each square meter of additional space increases the price by 20. Outside of the city center, the only difference is that the intercept is smaller by 10 units. Figure 1 shows the two best-fit lines.
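To make the numbers concrete, here’s a minimal sketch that plugs the estimated coefficients into a prediction function (the function name and the 50-square-meter example are just for illustration):

```python
# Coefficients estimated in the example above
beta_0, beta_1, beta_2 = 300, 20, 10

def predict_price(size: float, city_center: int) -> float:
    """Prediction from the linear model without an interaction term."""
    return beta_0 + beta_1 * size + beta_2 * city_center

print(predict_price(50, city_center=1))  # 300 + 20 * 50 + 10 = 1310
print(predict_price(50, city_center=0))  # 300 + 20 * 50      = 1300
```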

As you can see, the lines are parallel and have the same slope: the coefficient of $X_1$, which is identical in both scenarios.
Interaction terms represent joint effects
At this point, you might argue that an additional square meter in an apartment in the city center costs more than an additional square meter in an apartment on the outskirts. In other words, there might be a joint effect of these two features on the price of real estate.
So, you believe that not only the intercept but also the slope of the lines should differ between the two scenarios. How can you do that? This is exactly where interaction terms come into play. They make the model’s specification more flexible and enable you to account for such patterns.
An interaction term is effectively a multiplication of the two features that you believe have a joint effect on the target. The following equation presents the model’s new specification:

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon$$
Again, assume that you have estimated your model and you know the coefficients. For simplicity, $\beta_0$, $\beta_1$, and $\beta_2$ keep the same values as in the previous example, and the new interaction coefficient is denoted by $\beta_3$. Bear in mind that in a real-life scenario, the values would likely differ.
City center ($X_2 = 1$):

$$\hat{y} = (300 + 10) + (20 + \beta_3) X_1$$

Outside of the city center ($X_2 = 0$):

$$\hat{y} = 300 + 20 X_1$$
After you write out the two scenarios for $X_2$ (city center or outside of the city center), you can immediately see that the slope (the coefficient of $X_1$) of the two lines is different. As hypothesized, with a positive $\beta_3$, an additional square meter of space in the city center is now more expensive than in the suburbs.
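In practice, creating an interaction term boils down to multiplying the two columns. Here’s a minimal sketch with made-up data (the DataFrame and column names are hypothetical):

```python
import pandas as pd

# Hypothetical toy data: apartment size and a city-center flag
flats = pd.DataFrame({
    "size": [40, 60, 80, 100],
    "city_center": [1, 0, 1, 0],
})

# The interaction term is simply the product of the two features
flats["size_x_center"] = flats["size"] * flats["city_center"]
print(flats)
```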
Interpreting the coefficients with interaction terms
Adding interaction terms to a model changes the interpretation of all the coefficients. Without an interaction term, you interpret the coefficients as the unique effect of a predictor on the dependent variable.
So in this case, you could say that $\beta_1$ was the unique effect of the size of an apartment on its price. However, with an interaction term, the effect of the apartment’s size is different for different values of $X_2$. In other words, the unique effect of apartment size on its price is no longer limited to $\beta_1$.
To better understand what each coefficient represents, here’s one more look at the raw specification of a linear model with interaction terms. As a reminder, $X_2$ is a Boolean feature indicating whether a particular apartment is in the city center.

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon$$
Now, you can interpret each of the coefficients in the following way:
- $\beta_0$: Intercept for the apartments outside of the city center (or whichever group had a zero value for the Boolean feature $X_2$).
- $\beta_1$: Slope (effect of size on price) for apartments outside of the city center.
- $\beta_2$: Difference in the intercept between the two groups.
- $\beta_3$: Difference in slopes between apartments in the city center and outside of it.
For example, assume that you are testing the hypothesis that the size of an apartment has an equal impact on its price, regardless of whether the apartment is in the city center. Then, you would estimate the linear regression with the interaction term and check whether $\beta_3$ is significantly different from 0.
Some additional notes on the interaction terms:
- I’ve presented two-way interaction terms; however, higher-order interactions (for example, of three features) are also possible.
- In this example, I showed an interaction of a numerical feature (size of the apartment) with a Boolean one (is the apartment in the city center?). However, you can also create interaction terms for two numerical features. For example, you might create an interaction term of the size of the apartment with the number of rooms, as sketched after this list. For more information, see the Related resources section.
- It might be the case that the interaction term is statistically significant, but the main effects are not. Then, you should follow the hierarchical principle stating that if you include an interaction term in the model, you should also include the main effects, even if their impacts are not statistically significant.
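Here’s a hedged sketch of such a numeric-by-numeric interaction using the statsmodels formula API (the toy data and column names are made up for illustration):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data: price, size, and number of rooms
flats = pd.DataFrame({
    "price": [1150, 1420, 2100, 2350, 3020, 3480],
    "size": [40, 50, 70, 80, 100, 110],
    "rooms": [2, 2, 3, 3, 4, 5],
})

# Interaction of two numerical features: size and number of rooms
model = smf.ols("price ~ size + rooms + size:rooms", data=flats).fit()
print(model.params)
```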
Hands-on example in Python
After all the theoretical introduction, here’s how to add interaction terms to a linear regression model in Python. As always, start by importing the required libraries.
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# plotting
import seaborn as sns
import matplotlib.pyplot as plt

# settings
plt.style.use("seaborn-v0_8")
sns.set_palette("colorblind")
plt.rcParams["figure.figsize"] = (16, 8)
%config InlineBackend.figure_format = 'retina'
```
In this example, you estimate linear models using the statsmodels library. For the data, use the mtcars dataset. If you have ever worked with R, you will already be familiar with this dataset.
First, load the dataset:
```python
mtcars = sm.datasets.get_rdataset("mtcars", "datasets", cache=True)
print(mtcars.__doc__)
```
Executing the code example prints a comprehensive description of the dataset. In this post, I only show the relevant parts — an overall description and the definition of the columns:
The data was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
Here’s a DataFrame with 32 observations on 11 (numeric) variables:
| Column | Description |
| --- | --- |
| mpg | Miles/(US) gallon |
| cyl | Number of cylinders |
| disp | Displacement (cu. in.) |
| hp | Gross horsepower |
| drat | Rear axle ratio |
| wt | Weight (1,000 lbs) |
| qsec | 1/4 mile time |
| vs | Engine (0 = V-shaped, 1 = straight) |
| am | Transmission (0 = automatic, 1 = manual) |
| gear | Number of forward gears |
| carb | Number of carburetors |
Then, extract the actual dataset from the loaded object:
```python
df = mtcars.data
df.head()
```
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
For this example, assume that you want to investigate the relationship between miles per gallon (`mpg`) and two features: weight (`wt`, continuous) and type of transmission (`am`, Boolean).
First, plot the data to get some initial insights:
```python
sns.lmplot(x="wt", y="mpg", hue="am", data=df, fit_reg=False)
plt.ylabel("Miles per Gallon")
plt.xlabel("Vehicle Weight");
```

Just by eyeballing Figure 2, you can see that the regression lines for the two categories of the am variable will be quite different. For comparison’s sake, start off with a model without interaction terms.
```python
model_1 = smf.ols(formula="mpg ~ wt + am", data=df).fit()
model_1.summary()
```
The following tables show the results of fitting a linear regression without the interaction term.
```
                            OLS Regression Results
==============================================================================
Dep. Variable:                    mpg   R-squared:                       0.753
Model:                            OLS   Adj. R-squared:                  0.736
Method:                 Least Squares   F-statistic:                     44.17
Date:                Sat, 22 Apr 2023   Prob (F-statistic):           1.58e-09
Time:                        23:15:11   Log-Likelihood:                -80.015
No. Observations:                  32   AIC:                             166.0
Df Residuals:                      29   BIC:                             170.4
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     37.3216      3.055     12.218      0.000      31.074      43.569
wt            -5.3528      0.788     -6.791      0.000      -6.965      -3.741
am            -0.0236      1.546     -0.015      0.988      -3.185       3.138
==============================================================================
Omnibus:                        3.009   Durbin-Watson:                   1.252
Prob(Omnibus):                  0.222   Jarque-Bera (JB):                2.413
Skew:                           0.670   Prob(JB):                        0.299
Kurtosis:                       2.881   Cond. No.                         21.7
==============================================================================
```
From the summary tables, you can see that the coefficient of the `am` feature is not statistically significant. Using the interpretation of the coefficients you’ve already learned, you can plot the best-fit lines for both classes of the `am` feature.
```python
X = np.linspace(1, 6, num=20)

sns.lmplot(x="wt", y="mpg", hue="am", data=df, fit_reg=False)
plt.title("Best fit lines from the model without interactions")
plt.ylabel("Miles per Gallon")
plt.xlabel("Vehicle Weight")
plt.plot(X, 37.3216 - 5.3528 * X, "blue")
plt.plot(X, (37.3216 - 0.0236) - 5.3528 * X, "orange");
```

Figure 3 shows that the lines are almost overlapping, as the coefficient of the `am` feature is practically zero.
Follow up with a second model, this time with an interaction term between the two features. Here’s how to add an interaction term as an additional input in the statsmodels formula.
```python
model_2 = smf.ols(formula="mpg ~ wt + am + wt:am", data=df).fit()
model_2.summary()
```
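As a side note, the formula interface also offers a shorthand: `wt * am` expands to `wt + am + wt:am`, so the following call fits the same model:

```python
# "wt * am" is shorthand for "wt + am + wt:am"
model_2_alt = smf.ols(formula="mpg ~ wt * am", data=df).fit()
```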
The following summary tables show the results of fitting a linear regression with the interaction term.
```
                            OLS Regression Results
==============================================================================
Dep. Variable:                    mpg   R-squared:                       0.833
Model:                            OLS   Adj. R-squared:                  0.815
Method:                 Least Squares   F-statistic:                     46.57
Date:                Mon, 24 Apr 2023   Prob (F-statistic):           5.21e-11
Time:                        21:45:40   Log-Likelihood:                -73.738
No. Observations:                  32   AIC:                             155.5
Df Residuals:                      28   BIC:                             161.3
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     31.4161      3.020     10.402      0.000      25.230      37.602
wt            -3.7859      0.786     -4.819      0.000      -5.395      -2.177
am            14.8784      4.264      3.489      0.002       6.144      23.613
wt:am         -5.2984      1.445     -3.667      0.001      -8.258      -2.339
==============================================================================
Omnibus:                        3.839   Durbin-Watson:                   1.793
Prob(Omnibus):                  0.147   Jarque-Bera (JB):                3.088
Skew:                           0.761   Prob(JB):                        0.213
Kurtosis:                       2.963   Cond. No.                         40.1
==============================================================================
```
Here are two conclusions that you can quickly draw from the summary tables with the interaction term:
- All the coefficients, including the interaction term, are statistically significant.
- By inspecting R² (and its adjusted variant, as the models have a different number of features), you can state that the model with the interaction term results in a better fit; see the sketch after this list.
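If you prefer to pull those numbers programmatically instead of reading the summary tables, the fitted statsmodels results objects expose them directly:

```python
# Compare goodness of fit (values match the summary tables above)
print(f"Model 1 (no interaction): adj. R2 = {model_1.rsquared_adj:.3f}")  # ~0.736
print(f"Model 2 (interaction):    adj. R2 = {model_2.rsquared_adj:.3f}")  # ~0.815
```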
Similarly to the previous case, plot the best-fit lines:
```python
X = np.linspace(1, 6, num=20)

sns.lmplot(x="wt", y="mpg", hue="am", data=df, fit_reg=False)
plt.title("Best fit lines from the model with interactions")
plt.ylabel("Miles per Gallon")
plt.xlabel("Vehicle Weight")
plt.plot(X, 31.4161 - 3.7859 * X, "blue")
plt.plot(X, (31.4161 + 14.8784) + (-3.7859 - 5.2984) * X, "orange");
```

In Figure 4, you can immediately see the difference in the fitted lines, both in terms of the intercept and the slope, for cars with automatic and manual transmissions.
Here’s a bonus: you can also add interaction terms using scikit-learn’s `PolynomialFeatures`. The transformer offers not only the possibility to add interaction terms of arbitrary order, but it also creates polynomial features (for example, squared values of the available features). For more information, see sklearn.preprocessing.PolynomialFeatures.
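Here’s a minimal sketch of what that could look like for the same two columns (`interaction_only=True` keeps products of distinct features and skips the squared terms):

```python
from sklearn.preprocessing import PolynomialFeatures

# Generate interaction features for weight and transmission type
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_inter = poly.fit_transform(df[["wt", "am"]])

# ['wt' 'am' 'wt am'] -- the last column is the interaction term
print(poly.get_feature_names_out())
```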
Wrapping up
When working with interaction terms in linear regression, there are a few things to remember:
- Interaction terms enable you to examine whether the relationship between the target and a feature changes depending on the value of another feature.
- Add interaction terms as a multiplication of the original features. By adding these new variables to the regression model, you can measure the joint effect of the interacting features on the target. It is crucial to interpret the coefficients of the interaction terms carefully to understand the direction and the strength of the relationship.
- By using interaction terms, you can make the specification of a linear model more flexible (different slopes for different lines), which can result in a better fit to the data and better predictive performance.
You can find the code used in this post in my /erykml GitHub repo. As always, any constructive feedback is more than welcome. You can reach out to me on Twitter or in the comments below.
Related resources