How to draw multiple regression lines in R

To draw multiple regression lines in a single plot with ggplot2, we can combine the geom_jitter() function with geom_smooth(). geom_smooth(method = "lm") draws a regression line for each x/y pair, each in its own color, while geom_jitter() offsets overlapping points so the groups stay distinguishable.

Check out the example below to see how it is done.

Example

The following snippet creates a sample data frame −

x1<-rpois(20,1)
y1<-rpois(20,5)
x2<-rpois(20,2)
y2<-rpois(20,8)
x3<-rpois(20,2)
y3<-rpois(20,4)
df<-data.frame(x1,y1,x2,y2,x3,y3)
df

The following data frame is created −

  x1 y1 x2 y2 x3 y3
 1 2  2  0  6  1 6
 2 3  4  0  9  1 7
 3 2  4  3  7  2 3
 4 0 12  2 11  0 1
 5 0  2  0  6  1 1
 6 1  7  2  7  1 3
 7 0  4  0  4  1 5
 8 0  3  2  5  0 1
 9 1  4  3  3  0 9
10 0  2  0  8  3 5
11 0  7  4 11  2 4
12 0  4  3  8  2 1
13 0  6  0  6  2 4
14 1  6  1  9  2 2
15 2  3  1  9  6 2
16 1  3  1 10  5 2
17 0  5  1  8  2 6
18 1  2  4  7  2 4
19 0  5  2 11  0 7
20 2  8  4  8  2 4

To load the ggplot2 package and create regression lines for multiple models in a single plot on the data frame created above, add the following code to the snippet.
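A sketch of that ggplot2 call (assuming the three-pair column layout above; the data frame is re-created with a seed so the snippet runs standalone):

```r
library(ggplot2)

# Re-create a data frame like the one above (three x/y pairs)
set.seed(1)
df <- data.frame(x1 = rpois(20, 1), y1 = rpois(20, 5),
                 x2 = rpois(20, 2), y2 = rpois(20, 8),
                 x3 = rpois(20, 2), y3 = rpois(20, 4))

# One jittered point layer plus one lm smooth per pair,
# with matching colours so each regression line is distinguishable
p <- ggplot(df) +
  geom_jitter(aes(x1, y1), colour = "blue") +
  geom_smooth(aes(x1, y1), method = "lm", se = FALSE, colour = "blue") +
  geom_jitter(aes(x2, y2), colour = "red") +
  geom_smooth(aes(x2, y2), method = "lm", se = FALSE, colour = "red") +
  geom_jitter(aes(x3, y3), colour = "darkgreen") +
  geom_smooth(aes(x3, y3), method = "lm", se = FALSE, colour = "darkgreen")
p
```

Each geom_smooth() layer fits its own lm() on the pair given in its aes(), so three regression lines appear on one plot.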

Next, let’s create two variables and see how to plot them and include a regression line. We take height to be a variable that describes the heights (in cm) of ten people. Copy and paste the following code into the R command line to create this variable.

height <- c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)

Now let’s take bodymass to be a variable that describes the masses (in kg) of the same ten people. Copy and paste the following code to the R command line to create the bodymass variable.

bodymass <- c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)

Both variables are now stored in the R workspace. To view them, enter:

height
 [1] 176 154 138 196 132 176 181 169 150 175
bodymass
 [1] 82 49 53 112 47 69 77 71 62 78

We can now create a simple plot of the two variables as follows:

plot(bodymass, height)


We can enhance this plot using various arguments within the plot() command. Copy and paste the following code into the R workspace:

plot(bodymass, height, pch = 16, cex = 1.3, col = "blue", main = "HEIGHT PLOTTED AGAINST BODY MASS", xlab = "BODY MASS (kg)", ylab = "HEIGHT (cm)")


In the above code, pch = 16 creates solid dots, while cex = 1.3 makes the dots 1.3 times bigger than the default (cex = 1). More about these arguments later.

Now let’s perform a linear regression using lm() on the two variables by adding the following text at the command line:

lm(height ~ bodymass)
Call:
lm(formula = height ~ bodymass)
Coefficients:
(Intercept)     bodymass
    98.0054       0.9528

We see that the intercept is 98.0054 and the slope is 0.9528. By the way – lm stands for “linear model”.

Finally, we can add a best-fit line (regression line) to the plot by passing the fitted model to abline().
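A sketch of that call, repeating the data from above so it runs standalone:

```r
# The two variables from the walkthrough above
height <- c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass <- c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)

plot(bodymass, height, pch = 16, cex = 1.3, col = "blue",
     main = "HEIGHT PLOTTED AGAINST BODY MASS",
     xlab = "BODY MASS (kg)", ylab = "HEIGHT (cm)")

# abline() reads the intercept and slope straight from the fitted model
abline(lm(height ~ bodymass))
```

The line drawn has intercept 98.0054 and slope 0.9528, matching the lm() output above.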

R provides comprehensive support for multiple linear regression. The topics below are provided in order of increasing complexity.

Fitting the Model

# Multiple Linear Regression Example
fit <- lm(y ~ x1 + x2 + x3, data=mydata)
summary(fit) # show results

# Other useful functions
coefficients(fit) # model coefficients
confint(fit, level=0.95) # CIs for model parameters
fitted(fit) # predicted values
residuals(fit) # residuals
anova(fit) # anova table
vcov(fit) # covariance matrix for model parameters
influence(fit) # regression diagnostics

Diagnostic Plots

Diagnostic plots provide checks for heteroscedasticity, normality, and influential observations.

# diagnostic plots
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(fit)


For a more comprehensive evaluation of model fit see regression diagnostics or the exercises in this interactive course on regression.

Comparing Models

You can compare nested models with the anova( ) function. The following code provides a simultaneous test that x3 and x4 add to linear prediction above and beyond x1 and x2.

# compare models
fit1 <- lm(y ~ x1 + x2 + x3 + x4, data=mydata)
fit2 <- lm(y ~ x1 + x2, data=mydata)
anova(fit1, fit2)

Cross Validation

You can do K-Fold cross-validation using the cv.lm( ) function in the DAAG package.

# K-fold cross-validation
library(DAAG)
cv.lm(data=mydata, form.lm=fit, m=3) # 3-fold cross-validation (older DAAG versions used df= instead of data=)

Sum the MSE for each fold, divide by the number of observations, and take the square root to get the cross-validated standard error of estimate.
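The same computation can be sketched in base R without DAAG. This toy example (the data frame, its column names, and the 3-fold split are all illustrative) accumulates squared prediction error across folds and then applies the formula above:

```r
# Toy data standing in for mydata
set.seed(42)
mydata <- data.frame(x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
mydata$y <- 1 + 2 * mydata$x1 - mydata$x2 + rnorm(30)

k <- 3
folds <- sample(rep(1:k, length.out = nrow(mydata)))  # random fold labels

sq_err <- 0
for (i in 1:k) {
  train <- mydata[folds != i, ]
  test  <- mydata[folds == i, ]
  m <- lm(y ~ x1 + x2 + x3, data = train)
  # squared error of out-of-fold predictions, summed over all folds
  sq_err <- sq_err + sum((test$y - predict(m, test))^2)
}

# divide by the number of observations and take the square root
cv_se <- sqrt(sq_err / nrow(mydata))
cv_se
```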

You can assess R2 shrinkage via K-fold cross-validation. Using the crossval() function from the bootstrap package, do the following:

# Assessing R2 shrinkage using 10-Fold Cross-Validation

fit <- lm(y~x1+x2+x3,data=mydata)

library(bootstrap)
# define functions
theta.fit <- function(x,y){lsfit(x,y)}
theta.predict <- function(fit,x){cbind(1,x)%*%fit$coef}

# matrix of predictors
X <- as.matrix(mydata[c("x1","x2","x3")])
# vector of predicted values
y <- as.matrix(mydata[c("y")])

results <- crossval(X,y,theta.fit,theta.predict,ngroup=10)
cor(y, fit$fitted.values)**2 # raw R2
cor(y,results$cv.fit)**2 # cross-validated R2

Variable Selection

Selecting a subset of predictor variables from a larger set (e.g., stepwise selection) is a controversial topic. You can perform stepwise selection (forward, backward, both) using the stepAIC( ) function from the MASS package. stepAIC( ) performs stepwise model selection by exact AIC.
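A minimal stepAIC( ) sketch on toy data (the data frame and variable names are illustrative; only x1 carries signal, so the other predictors are candidates for removal):

```r
library(MASS)  # provides stepAIC()

set.seed(1)
mydata <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
mydata$y <- 1 + 2 * mydata$x1 + rnorm(50)

fit <- lm(y ~ x1 + x2 + x3, data = mydata)
step <- stepAIC(fit, direction = "both", trace = FALSE)
step$anova  # display the selection path
```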


Alternatively, you can perform all-subsets regression using the leaps( ) function from the leaps package. In the following code nbest indicates the number of subsets of each size to report. Here, the ten best models will be reported for each subset size (1 predictor, 2 predictors, etc.).
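A sketch using the package's formula interface, regsubsets( ) (the leaps package also exposes a lower-level leaps( ) function; the toy data and names here are illustrative):

```r
library(leaps)

set.seed(1)
mydata <- data.frame(x1 = rnorm(50), x2 = rnorm(50),
                     x3 = rnorm(50), x4 = rnorm(50))
mydata$y <- 1 + 2 * mydata$x1 - mydata$x2 + rnorm(50)

# nbest = 10: report up to the ten best models of each subset size
best <- regsubsets(y ~ x1 + x2 + x3 + x4, data = mydata, nbest = 10)
summary(best)
plot(best, scale = "r2")  # visualise which predictors enter the best models
```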



Other options for plot( ) are bic, Cp, and adjr2. Other options for plotting with subsets( ) are bic, cp, adjr2, and rss.

Relative Importance

The relaimpo package provides measures of relative importance for each of the predictors in the model. See help(calc.relimp) for details on the four measures of relative importance provided.
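A minimal calc.relimp( ) sketch on toy data (the data frame and names are illustrative; "lmg" is one of the four importance measures, and rela = TRUE rescales the shares to sum to 1):

```r
library(relaimpo)

set.seed(1)
mydata <- data.frame(x1 = rnorm(60), x2 = rnorm(60), x3 = rnorm(60))
mydata$y <- 1 + 2 * mydata$x1 + 0.5 * mydata$x2 + rnorm(60)

fit <- lm(y ~ x1 + x2 + x3, data = mydata)

# lmg: R^2 contribution averaged over orderings of the predictors
calc.relimp(fit, type = "lmg", rela = TRUE)
```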



Graphic Enhancements

The car package offers a wide variety of plots for regression, including added-variable plots and enhanced diagnostic and scatter plots.

Going Further

Nonlinear Regression

The nls() function (in R's base stats package) provides nonlinear least-squares regression. See John Fox's Nonlinear Regression and Nonlinear Least Squares for an overview. Huet and colleagues' Statistical Tools for Nonlinear Regression: A Practical Guide with S-PLUS and R Examples is a valuable reference book.
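A minimal nls() sketch, fitting the (illustrative) exponential model y = a * exp(b * x) to simulated data:

```r
# Simulate data from y = 3 * exp(1.5 * x) plus noise
set.seed(1)
x <- seq(0, 2, length.out = 40)
y <- 3 * exp(1.5 * x) + rnorm(40, sd = 0.5)

# nls() needs starting values for the nonlinear parameters
nfit <- nls(y ~ a * exp(b * x), start = list(a = 1, b = 1))
coef(nfit)  # estimates should land near a = 3, b = 1.5
```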

Robust Regression

There are many functions in R to aid with robust regression. For example, you can perform robust regression with the rlm( ) function in the MASS package. John Fox's (who else?) Robust Regression provides a good starting overview. The UCLA Statistical Computing website has Robust Regression Examples.

The robust package provides a comprehensive library of robust methods, including regression. The robustbase package also provides basic robust statistics, including model selection methods. And David Olive has provided a detailed online review of Applied Robust Statistics with sample R code.
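A minimal rlm( ) sketch on toy data with one injected outlier, compared against ordinary lm():

```r
library(MASS)  # provides rlm()

set.seed(1)
x <- 1:30
y <- 2 + 0.5 * x + rnorm(30)
y[30] <- 60  # inject a gross outlier at the last point

rfit <- rlm(y ~ x)  # M-estimation downweights the outlier
ofit <- lm(y ~ x)   # ordinary least squares for comparison
coef(rfit)
coef(ofit)          # the OLS slope is pulled toward the outlier
```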

To Practice

This course in machine learning in R includes exercises in multiple regression and cross-validation.

How to plot two different regression lines in R?

In base R, each regression line is drawn using the function abline() with lm() for the linear model. The syntax is abline(lm(y ~ x)). We use the same colors as those used in the scatter plot to differentiate the two regression lines.
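A sketch of that approach with two simulated groups (data and colors are illustrative):

```r
# Two groups with different underlying relationships
set.seed(1)
x1 <- rnorm(20); y1 <- 1 + 2 * x1 + rnorm(20)
x2 <- rnorm(20); y2 <- 4 - x2 + rnorm(20)

# Plot both groups on shared axes
plot(x1, y1, col = "blue", pch = 16,
     xlim = range(x1, x2), ylim = range(y1, y2),
     xlab = "x", ylab = "y")
points(x2, y2, col = "red", pch = 16)

# One abline(lm(...)) per group, colour-matched to its points
abline(lm(y1 ~ x1), col = "blue")
abline(lm(y2 ~ x2), col = "red")
```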

How to make a multiple regression model in R?

Step 1 - Install the necessary libraries.
Step 2 - Read a CSV file and do EDA (exploratory data analysis).
Step 3 - Plot a scatter plot between x and y.
Step 4 - Split the data into train and test sets.
Step 5 - Create a linear regression model.
Step 6 - Add the regression line to the plot.
Step 7 - Make predictions on the test dataset.
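The modelling steps (4, 5, and 7) can be sketched in base R on toy data (the data frame, split ratio, and names are all illustrative):

```r
# Step 2 stand-in: simulate a data set instead of reading a CSV
set.seed(1)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)

# Step 4: 80/20 train/test split
idx <- sample(n, 0.8 * n)
train <- dat[idx, ]
test  <- dat[-idx, ]

# Step 5: fit the multiple regression model on the training set
model <- lm(y ~ x1 + x2, data = train)

# Step 7: predict on the held-out test set
pred <- predict(model, newdata = test)
head(pred)
```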

How to draw a regression line in R?

A regression line will be added on the plot using the function abline(), which takes the output of lm() as an argument. You can also add a smoothing line using the function loess().

Can we plot multiple linear regression?

At the center of the multiple linear regression analysis lies the task of fitting a single line through a scatter plot. More specifically, the multiple linear regression fits a line through a multi-dimensional cloud of data points. The simplest form has one dependent and two independent variables.