How to Do Regressions in Excel

In Excel for the web, you can view the results of a regression analysis (in statistics, a way to predict and forecast trends), but you can’t create one because the Regression tool isn’t available.

You also won’t be able to use a statistical worksheet function such as LINEST to do a meaningful analysis because it requires that you enter it as an array formula, which isn’t supported in Excel for the web.

If you have the Excel desktop application, you can use the Open in Excel button to open your workbook and use either the Analysis ToolPak’s Regression tool or statistical functions to perform a regression analysis there.

Click Open in Excel and perform a regression analysis.


Regression is an Analysis Tool, which we use for analyzing large amounts of data and making forecasts and predictions in Microsoft Excel.


Want to predict the future? No, we are not going to learn astrology. We are into numbers and we will learn regression analysis in Excel today.

To predict future estimates, we will study:

  • REGRESSION ANALYSIS USING EXCEL FUNCTIONS (MANUAL REGRESSION FINDING)
  • REGRESSION ANALYSIS USING EXCEL’S ANALYSIS TOOLPAK ADD-IN
  • REGRESSION CHART IN EXCEL

Let’s do it…

Scenario:

Let’s assume you sell soft drinks. How cool will it be if you can predict:

  • How many soft drinks will be sold next year based on previous year’s data?
  • Which areas need more focus?
  • And how can you increase your sales by changing your strategy?

It will be profitably awesome. Right?… I know. So let’s get started.

You have 11 records of salesmen and soft drinks sold.


Now based on this data you want to predict the number of salesmen required to achieve 2000 sales of soft drinks.


The regression equation is a tool to make such close estimates. To do so, we need to know Regression first.

REGRESSION ANALYSIS USING EXCEL FUNCTIONS (MANUAL REGRESSION FINDING)

This part will help you understand regression itself, rather than just walking through the Excel regression procedure.

Introduction:

Simple Linear Regression: 

Simple Linear Regression is the study of the relationship between two variables, where one variable depends on the other, independent variable. The dependent variable is often called the Driven, Response, or Target variable. The independent variable is often called the Driving or Predictor variable, or simply the Independent variable. These names clearly describe them.

Now let’s compare this with your scenario. You want to know the number of salesmen required to achieve 2000 sales. So here, the dependent variable is the number of salesmen and the independent variable is sold soft drinks.

The independent variable is mostly denoted as x and the dependent variable as y.

In our case, soft drinks sold is x and the number of salesmen is y.

If we want to know how many soft drinks will be sold if we appoint 200 salesmen, then the roles are reversed: the number of salesmen becomes x and soft drinks sold becomes y.

Moving On.

The “Simple” Math of Linear Regression Equation:

Well, it’s not simple. But Excel made it simple to do.

We fit a line to all 11 records and then use it to predict the 12th case: the number of salesmen needed for 2000 sales.

Let’s say:

Soft Drink Sold is x

The number of Salesmen is y

The predicted y (number of salesmen), also called the Regression Equation, would be

y = Slope * x + Intercept

Now you must be wondering where you will get the slope and intercept. Don’t worry, Excel has functions for them. You do not need to learn how to find the slope and intercept manually.

If you want, I will prepare a separate tutorial for that. Let me know in the comments section. These are some important data analytics tools.

Now let’s step into our calculation:

Step 1: Prepare a small table for the results. The steps below use cells B16 to B19.

Step 2: Find the slope of the regression line

The Excel function for the slope is

=SLOPE(known_y’s,known_x’s)

Your known_y’s are in range B2:B12 and known_x’s are in range C2:C12

In cell B16, write the formula below

=SLOPE(B2:B12,C2:C12)

(Note: Slope is also called coefficient of x in the regression equation)

You will get 0.058409. Rounded to 2 decimal digits, it is 0.06.

Step 3: Find the Intercept of Regression Line

Excel function for the intercept is

=INTERCEPT(known_y’s, known_x’s)

We already know our known x’s and y’s.

In cell B17, write down this formula

=INTERCEPT(B2:B12, C2:C12)

You will get a value of -1.1118969. Rounded to 2 decimal digits, it is -1.11.

Our Linear Regression Equation is y = x*0.06 + (-1.11). Now we can easily predict y for any target x.
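If you are curious what SLOPE and INTERCEPT do behind the scenes, here is a minimal Python sketch of the usual least-squares formulas. The data is made up for illustration (it is not the salesmen table from this tutorial), and the names slope and intercept are simply labels chosen here to mirror the Excel functions.

```python
def mean(values):
    return sum(values) / len(values)

def slope(known_ys, known_xs):
    # Least-squares slope: sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
    x_bar, y_bar = mean(known_xs), mean(known_ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(known_xs, known_ys))
    denominator = sum((x - x_bar) ** 2 for x in known_xs)
    return numerator / denominator

def intercept(known_ys, known_xs):
    # The fitted line always passes through (x_bar, y_bar)
    return mean(known_ys) - slope(known_ys, known_xs) * mean(known_xs)

# Hypothetical data: soft drinks sold (x) and salesmen (y)
xs = [100, 200, 300, 400, 500]
ys = [7, 11, 19, 22, 31]

b = slope(ys, xs)
a = intercept(ys, xs)
print(round(b, 3))  # 0.059
print(round(a, 2))  # 0.3
```

Like the Excel functions, both helpers take the y values first and the x values second.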

Step 4: In D2 write the formula below

=C2*$B$16+$B$17    (Regression Equation)

You will get a value of 13.55.

Select D2 to D13 and press CTRL+D to fill down the formula in the range D2:D13.

In cell D13 you have your required number of salesmen.

Hence, to achieve the target of 2000 Soft Drink Sales, you need an estimated 115.71 salesmen, or say 116, since it is illegal to cut humans into pieces.
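The same prediction can be sketched in a few lines of Python. This simply plugs the slope and intercept reported in B16 and B17 into the regression equation; the function name predict_salesmen is made up for this example.

```python
import math

SLOPE = 0.058409        # value reported in cell B16
INTERCEPT = -1.1118969  # value reported in cell B17

def predict_salesmen(soft_drinks_sold):
    # Regression equation: y = x * slope + intercept
    return soft_drinks_sold * SLOPE + INTERCEPT

estimate = predict_salesmen(2000)
print(round(estimate, 2))   # 115.71
print(math.ceil(estimate))  # 116, since we cannot hire a fraction of a person

# Using the rounded coefficients instead shifts the answer noticeably
rounded_estimate = 2000 * 0.06 + (-1.11)
print(round(rounded_estimate, 2))  # 118.89
```

The last line is a reminder to keep the cell references ($B$16, $B$17) in the formula rather than typing in the rounded 0.06 and -1.11.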

Now using this you can easily conduct What-If analysis in Excel. Just change the number of sales and it will show you how many salesmen it will take to achieve that sales target.

Play around with it to find out:

How much workforce do you need to increase sales?

How many sales will increase if you increase your salesmen?

Make Your Estimate More Reliable:

Now you know that you need 116 salesmen to get 2000 sales done.

In analytics, nothing is just said and believed. You must give a percentage of reliability on your estimate. It is like giving a certificate of your equation.


Correlation Coefficient Formula:

The next thing you will be asked is how strongly these two variables are related. In statistical terms, you need to report the coefficient of correlation.

Excel function for correlation is

=CORREL(array1, array2)

In your case, known_x’s and known_y’s are array1 and array2 respectively.

In B18 enter this formula

=CORREL(C2:C12, B2:B12)

You will get 0.919090. Format cell B18 as a percentage and it will show 92%.

Now, what does this 0.92 mean? It is not a probability. The correlation coefficient ranges from -1 to 1, and a value close to 1 means a strong positive linear relationship: records with more salesmen tend to have more sales, and records with fewer salesmen tend to have fewer. This is called a positive correlation.

R Square (R^2):

The R Square value tells you what share of the variation in y is explained by your regression equation, that is, how well the line fits the data provided.

The Excel function for R Square is RSQ.

=RSQ(known_y’s, known_x’s)

In our case, we will get the R Square value in cell B19.

In B19 enter this formula

=RSQ(B2:B12, C2:C12)

So we have an R Square value of 84%, which is a good fit for our regression. It says that 84% of the variation in Y (number of salesmen) is explained by X (sales of soft drinks).
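To make the link between the correlation coefficient and R Square concrete, here is a small Python sketch. It computes the Pearson correlation and, separately, R Square from the residuals of a fitted line, then shows that for a single x variable the two agree once the correlation is squared. The data is hypothetical and the function names are invented for this example.

```python
import math

def pearson_r(xs, ys):
    # Same quantity Excel's CORREL reports
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def r_square_from_fit(xs, ys):
    # Fit y = a + b*x by least squares, then compute 1 - SS_residual / SS_total
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data, not the table from this tutorial
xs = [100, 200, 300, 400, 500]
ys = [7, 11, 19, 22, 31]

r = pearson_r(xs, ys)
r2 = r_square_from_fit(xs, ys)
print(round(r, 4), round(r2, 4))
print(abs(r ** 2 - r2) < 1e-9)  # True: r squared equals R Square here

# The tutorial's own numbers follow the same rule
print(round(0.919090619 ** 2, 6))
```

This is why the correlation of 0.919090 in B18 and the R Square in B19 are not independent checks: the second is the square of the first.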

There are many other tests we can do on this data to check our regression. But doing them manually is a complex and lengthy procedure. That is why Excel provides the Analysis ToolPak. Using this tool, we can do this regression analysis in seconds.

REGRESSION IN EXCEL USING EXCEL’S ANALYSIS TOOLPAK ADD-IN

If you already know what regression equations are, and you just want your results quickly then this part is for you. But if you want to understand regression equations easily then scroll up to REGRESSION ANALYSIS USING EXCEL FUNCTIONS (MANUAL REGRESSION FINDING).

Excel provides a whole bunch of tools for analysis in its Analysis Toolpak. By default, it is not available in the Data tab. You need to add it. So let’s add it first.

Adding Analysis Toolpak to Excel 2016

If you don’t know where Data Analysis is in Excel, follow these steps.

Step 1: Go to Excel Options: File > Options > Add-Ins

Step 2: Click on Add-Ins. You will see a list of available add-ins.

Select Analysis ToolPak and, at the bottom of the window, find Manage. In Manage, select Excel Add-ins and click Go.

The Add-ins window will open. Here, select Analysis ToolPak. Then click the OK button.

Now you can access all the functions of the Analysis ToolPak from the Data tab.

Using Analysis ToolPak for Regression

Step 1: Go to the Data tab and locate Data Analysis. Then click on it.

A dialog box will pop up.

Step 2: Find ‘Regression’ in the Analysis Tools list and hit the OK button.

The regression input window will pop up. You will see a number of available input options. But for now, we will just concentrate on Y Range and X Range, leaving everything else at the default.

Step 3: Provide Inputs:

No. of Salesmen is Y

Sales of soft drinks are X

Hence

  • Y Range = B2:B12

And

  • X Range = C2:C12

For the output range, I have selected E4 on the same sheet. You may instead send the results to a new worksheet in the same workbook or to a completely new workbook. When you are done with your inputs, hit the OK button.

Results:

You will be served with a variety of information from your data. Don’t get overwhelmed. You don’t need to consume all the dishes.

We will only deal with those results which will help us to estimate the required number of salesmen.

Step 4: We know the regression equation for estimation of y, that is

x*Slope+Intercept

We just need to locate Slope and Intercept in the results.

And here they are.

The Intercept coefficient is clearly labeled.

The slope is written as ‘X Variable 1’, sometimes also called the coefficient of X. Rounding them, we get -1.11 as Intercept and 0.06 as Slope.

Step 5: From the results, we can derive the Regression equation. And that would be

=x*(0.06) + (-1.11)

Prepare a small table in Excel for the prediction.

For now, x is 2000, which is in cell E2.

In cell F2 enter this formula

=E2*F21+F20

You will get a result of 115.7052757.

Rounding it up will give us 116 required salesmen.

So we have learned how to form the regression equation, both manually and using the Analysis ToolPak, and how to use this equation to estimate future figures.

Now let’s understand the regression output given by Analysis Toolpak.

Understanding the Regression Output:

There is no benefit in running a regression with the Analysis ToolPak in Excel if you can’t interpret its output.

Summary Section:

As the name suggests, it is a summary of the data.

  1. Multiple R: It tells how strong the linear relationship is. In simple regression it is the absolute value of the correlation coefficient. In our case, it is 0.919090619 or 0.92 (rounded). This indicates a strong positive relationship between salesmen count and sales; it is not a 92% chance of anything.
  2. R Square: It tells how well the regression fits. It is the share of the variation in y that the regression line explains. In our case, it is 0.844727566 or 0.84, so the regression explains about 84% of the variation.
  3. Adjusted R Square: The adjusted R Square is R Square corrected for the number of predictors. Mainly useful in Multiple Regression Analysis.
  4. Standard Error: While R Square tells you how much of the variation the line explains, the standard error tells you how far a data point typically falls from the regression line. In our case, it is 6.74.
  5. Observations: This is simply the number of observations, which is 11 in our example.

Anova Section:

This section is used less often in simple linear regression.

  1. df. Degrees of freedom. It is used when calculating regression manually.
  2. SS. Sum of squares. It is the sum of squared deviations. Used to find the R Square value.
  3. MS. Mean square. It is SS divided by df.
  4. and 5. F and Significance F. If Significance F (which in simple regression equals the p-value of the slope) is less than your chosen significance level, commonly 0.05, you can reject the null hypothesis that x has no effect on y. In simple language, you can conclude that there is some effect of x on y.

In our case, F is 48.96264 and Significance F is 0.000063. Since 0.000063 is far below 0.05, the relationship is statistically significant.
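The F value is tied directly to R Square and the number of observations, which you can check with a short Python sketch. The inputs are the R Square and observation count reported in the summary section. The adjusted R Square line uses the standard textbook formula, and its result is computed here rather than quoted from the output.

```python
r_square = 0.844727566  # R Square from the summary section
n = 11                  # Observations
k = 1                   # number of x variables in simple regression

df_regression = k
df_residual = n - k - 1

# F = (explained variation per df) / (unexplained variation per df)
f_stat = (r_square / df_regression) / ((1 - r_square) / df_residual)
print(round(f_stat, 2))  # 48.96

# Adjusted R Square penalises R Square for the number of predictors
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
print(round(adjusted_r_square, 4))  # 0.8275
```

The first result agrees with the F of 48.96264 in the ANOVA table, so the summary and ANOVA sections are telling the same story in two forms.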

Regression Section:

In this section, we have the two most important values for our regression equation.

  1. Intercept: The value of y where the regression line crosses the Y axis, that is, the predicted y when x is 0. This is an important part of the regression equation. It is -1.11 in our case.
  2. X Variable 1 (Slope): Also called the coefficient of x. It is the change in predicted y for each one-unit increase in x, which sets the steepness of the regression line.

REGRESSION CHART IN EXCEL

In Excel, it is easy to plot a regression chart. To add a regression chart in Excel 2016, 2013, and 2010, follow these simple steps.

Step 1. Have your known x’s in the first column and known y’s in the second.

In our case, the known_x’s are Soft Drinks Sold and the known_y’s are Salesmen.

Step 2. Select your known x’s and y’s range.

Step 3: Go to the Insert tab and click on the scatter chart.

You will have a scatter chart of your data points.

Step 4. Add the trendline: Go to the Layout tab and locate the Trendline option in the Analysis section. (In Excel 2013 and 2016, the same option is under Design > Add Chart Element > Trendline.)

Under the Trendline option, click on Linear Trendline.

Your chart will now show a straight line through the points.

This is your regression graph.

Now if you add data below the table and extend the selected range, you will see a change in your graph.

For our example, we added 2000 to Soft Drink Sold and left Salesmen blank. Then we extended the range of the graph.

The chart then shows the required number of salesmen for 2000 sales of soft drinks in graphical form. It is slightly below 120 in the graph, and from our regression equation, we know it is 116.

In this article, I tried to cover everything under Excel Regression Analysis. I explained regression in Excel 2016. Regression in Excel 2010 and Excel 2013 works the same as in Excel 2016.

For any further query on this topic, use the comments section. Ask a question, give an opinion or just mention my grammatical mistakes. Everything is welcome. Just don’t hesitate to use the comment section.


Regression is done to define relationships between two or more variables in a data set. In statistics, regression is done by some complex formulas. But, Excel has provided us with tools for regression analysis. So, in the Excel Analysis ToolPak, click “Data Analysis” and “Regression” to conduct regression analysis in Excel.

Table of contents
  • What is Regression Analysis in Excel?
    • Explained
    • Examples
    • How to Run Regression Analysis Tool in Excel?
    • How to Use Regression Analysis Tool in Excel?
    • Steps to Create Regression Chart in Excel
    • Things to Remember
    • Recommended Articles

Explained

The Regression analysis tool performs linear regression in Excel using the “least squares” technique to fit a line through a set of observations. You can examine how a single dependent variable is influenced by the values of one or more independent variables. For instance, you can investigate how factors such as age, height, and weight influence a sportsman’s performance. Given a set of performance data, you can apportion shares of the performance measure to each of these three factors and then use the results to predict the performance of another person.

The Excel regression analysis tool helps you see how the dependent variable changes when one of the independent variables fluctuates and permits you to numerically figure out which of those variables truly has an effect.


Examples

  1. Sales of shampoo depend on advertising. If advertising expenditure increases by $1 million, sales are expected to increase by $23 million. If there were no advertising, we would expect no such increase in sales.
  2. House sales: details of past sales (selling price, number of bedrooms, location, size, design) are used to predict the selling price of future sales in the same area.
  3. Soft drink sales increase sharply in summer when the weather is hot. People purchase more soft drinks to keep cool. The higher the temperature, the higher the sales, and vice versa.
  4. In March, exam season starts, and sales increase as students purchase exam pads. Exam pad sales depend on the examination season.

How to Run Regression Analysis Tool in Excel?

  1. We must enable the Analysis ToolPak add-in.
  2. In Excel, click “File” on the extreme left-hand side, then click “Options” at the bottom.
  3. After clicking “Options,” select “Add-ins” on the left side. In the “Manage” box at the bottom of the window, choose “Excel Add-ins.” Then, click “Go.”
  4. In the Add-ins dialog box, check Analysis ToolPak and click OK.

    It will add the “Data Analysis” tools to the right-hand side of the “Data” tab on the Excel ribbon.

How to Use Regression Analysis Tool in Excel?

We need a data set with a dependent and an independent variable for regression analysis in Excel.

Once the Analysis ToolPak is added and enabled in Excel, follow the steps mentioned below to practice regression analysis in Excel:

  • Step 1: On the “Data” tab in the Excel ribbon, click “Data Analysis.”

  • Step 2: Click “Regression” and click “OK” to open the tool.

  • Step 3: In the “Regression” dialog box, we must configure the following settings:
    • For the dependent variable, select the “Input Y Range,” which denotes the dependent data. Here, we have selected the range $D$2:$D$13.
    • Select the “Input X Range,” which denotes the independent data for the independent variable. Here, we have selected the range $C$2:$C$13.

  • Step 4: Click “OK” and analyze the data accordingly.

When you run the regression analysis in Excel, the output appears in three parts: the regression statistics, the ANOVA table, and the coefficients table.

You can also make a scatter plot in Excel of the residuals.

Steps to Create Regression Chart in Excel

  • Step 1: Select the data.

  • Step 2: Click the “Insert” tab. In the “Charts” group, click the “Scatter” chart icon or another chart type as required. Select the chart which suits the data.

  • Step 3: We can modify the chart when required and choose the colors and lines we prefer. For instance, we can pick a different color and use a solid line or a dashed line.

Things to Remember

  1. We must always check which values are dependent and which are independent. Otherwise, the analysis will be wrong.
  2. If you test a large amount of data, rank the results thoroughly based on their validation-period statistics.
  3. Choose the data carefully to avoid any kind of error in Excel analysis.
  4. We can optionally check any of the boxes at the bottom of the dialog box, although none of these is necessary to obtain the best-fit line formula.
  5. Start practicing with small data sets to understand the analysis better and run the regression analysis tool in Excel easily.


In this tutorial, you’ll learn how to perform Linear Regression in Excel. Linear regression is an approach to modeling the linear relationship between a dependent and an independent variable. Simple linear regression uses one independent variable to predict the outcome of the dependent variable.

The equation for linear regression is given by: y = a + bx, where x is the independent variable, y is the dependent variable, n is the number of data points, and the coefficients are given by:

a = (Σy · Σx² − Σx · Σxy) / (n · Σx² − (Σx)²)

b = (n · Σxy − Σx · Σy) / (n · Σx² − (Σx)²)

Our aim is to find the coefficients a, the intercept, and b, the slope, to obtain the equation of the straight line that best fits our data by the least squares method. There are two ways to find the linear regression line in Excel, which are discussed below for a data set of four (x, y) pairs, with the x values in column B and the y values in column C.

Calculate Linear Regression in Excel Using Its Formula

First, we need to calculate the parameters in the formula for coefficients a and b. The parameters are Σx, Σy, Σxy and Σx². To calculate Σx follow these steps:

  • Select the cell where you want to calculate and display the summation of x.
  • Type =SUM(, select the cells containing the numbers and complete the formula with ).
  • Press the Enter key to display the result.

To calculate Σy follow these steps:

  • Select the cell where you want to calculate and display the summation of y.
  • Type =SUM(, select the cells containing the numbers and complete the formula with ).
  • Press the Enter key to display the result.

To calculate Σxy follow these steps:

  • Select the cell where you want to calculate and display the product of a pair of x and y values.
  • Type =B2*C2, as the first x and y values are in cells B2 and C2 respectively.
  • Press the Enter key to display the result.
  • Copy the formula for the calculation of product xy for the entire list by dragging down the fill handle.
  • Select the cell where you want to calculate and display the summation of xy.
  • Type =SUM(, select the cells containing the numbers and complete the formula with ).
  • Press the Enter key to display the result.

To calculate Σx² follow these steps:

  • Select the cell where you want to calculate and display the square of the first x value.
  • Type =B2^2, as the first x value is in cell B2. The caret operator raises the number to the power written next to it.
  • Press the Enter key to display the result.
  • Copy the formula for the calculation of squares of x for the entire list by dragging down the fill handle.
  • Select the cell where you want to calculate and display the summation of x².
  • Type =SUM(, select the cells containing the numbers and complete the formula with ).
  • Press the Enter key to display the result.

We now have the parameters essential for the calculation of the intercept a and the slope b. Follow the steps to calculate the intercept a:

  • Select the cell where you want to display the value of the intercept.
  • Type =(C6*E6-B6*D6)/(4*E6-B6^2), where C6 contains the value of Σy, E6 contains the value of Σx², B6 contains the value of Σx, D6 contains the value of Σxy and 4 is the number of data points in the data set.
  • Press the Enter key to display the result.

Follow the steps to calculate the slope b:

  • Select the cell where you want to display the value of the slope.
  • Type =(4*D6-B6*C6)/(4*E6-B6^2), where C6 contains the value of Σy, E6 contains the value of Σx², B6 contains the value of Σx, D6 contains the value of Σxy and 4 is the number of data points in the data set.
  • Press the Enter key to display the result.

Now that we have the values for the slope and intercept, the equation for the linear regression can be written as y = 1.5 + 0.95x. This equation can now be used to predict values of y for different values of x.
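The spreadsheet steps above can be cross-checked with a few lines of Python that build the same four sums and apply the same two formulas. The four data points here are invented for illustration, so the result differs from the y = 1.5 + 0.95x obtained in this tutorial; the variable names are also just choices made for this sketch.

```python
# Hypothetical data set with four (x, y) pairs
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 4]
n = len(xs)

# The four sums built column by column in the worksheet
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x ** 2 for x in xs)

# Same formulas as the cell formulas for intercept a and slope b
denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b = (n * sum_xy - sum_x * sum_y) / denominator

print(a, b)  # 2.0 0.7
print(f"y = {a} + {b}x")
```

Using n instead of a hard-coded 4 means the same sketch keeps working if you add more rows of data.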

Linear Regression in Excel Using Data Analysis

To use the Data Analysis feature, you need to enable Analysis Toolpak in Excel. Follow these steps to manually enable the feature:

  • Click on the File option present at the top left corner of the Excel window.
  • From the menu that appears, click on Options to launch the Excel Options dialog box.
  • Select the Add-ins option at the left side of the Excel Options dialog box.
  • Select Excel Add-ins in the Manage box, and click Go.
  • In the Add-ins dialog box, check the Analysis ToolPak checkbox, and then click OK.
  • The Data Analysis option now appears in the Analysis group on the Data tab.

Follow these steps to perform linear regression using Data Analysis:

  • Click on Data Analysis present in the Analysis group on the Data tab.
  • From the Data Analysis dialog box that appears, select Regression under the Analysis Tools and click on OK.
  • Enter the cell ranges containing y values in the Input Y Range: text box and x values in the Input X Range: text box in the Regression dialog box and click OK.
  • The results are displayed in a new worksheet. You can copy the intercept and slope coefficients to obtain an equation for the linear regression: y = 1.5 + 0.95x.

Conclusion

In this tutorial, we learned how to perform linear regression both using the formulas and Excel Add-ins.




Regression analysis can be very helpful for analyzing large amounts of data and making forecasts and predictions. To run regression analysis in Microsoft Excel, follow these instructions.

  1. If your version of Excel displays the ribbon (Home, Insert, Page Layout, Formulas…)

    • Click on the Office Button at the top left of the page and go to Excel Options.
    • Click on Add-Ins on the left side of the page.
    • Find Analysis ToolPak. If it’s on your list of active add-ins, you’re set.
      • If it’s on your list of inactive add-ins, look at the bottom of the window for the drop-down list next to Manage, make sure Excel Add-Ins is selected, and hit Go. In the next window that pops up, make sure Analysis ToolPak is checked and hit OK to activate. Allow it to install if necessary.
  2. If your version of Excel displays the traditional toolbar (File, Edit, View, Insert…)

    • Go to Tools > Add-Ins.
    • Find Analysis ToolPak. (If you don’t see it, look for it using the Browse function.)
      • If it’s in the Add-Ins Available box, make sure Analysis ToolPak is checked and hit OK to activate. Allow it to install if necessary.

  3. Excel for Mac 2011 does not include the Analysis ToolPak, so you need a different piece of software to run a regression there. Excel 2016 for Mac and later include it again under Tools > Excel Add-ins.

  1. Image titled Run Regression Analysis in Microsoft Excel Step 4

    1

    Enter the data into the spreadsheet that you are evaluating. You should have at least two columns of numbers that will be representing your Input Y Range and your Input X Range. Input Y represents the dependent variable while Input X is your independent variable.

  2. Image titled Run Regression Analysis in Microsoft Excel Step 5

    2

    Open the Regression Analysis tool.

    • If your version of Excel displays the ribbon, go to Data, find the Analysis section, hit Data Analysis, and choose Regression from the list of tools.
    • If your version of Excel displays the traditional toolbar, go to Tools > Data Analysis and choose Regression from the list of tools.
  3. Image titled Run Regression Analysis in Microsoft Excel Step 6

    3

    Define your Input Y Range. In the Regression Analysis box, click inside the Input Y Range box. Then, click and drag your cursor in the Input Y Range field to select all the numbers you want to analyze. You will see a formula that has been entered into the Input Y Range spot.

  4. Image titled Run Regression Analysis in Microsoft Excel Step 7

    4

    Repeat the previous step for the Input X Range.

  5. Image titled Run Regression Analysis in Microsoft Excel Step 8

    5

    Modify your settings if desired. Choose whether or not to display labels, residuals, residual plots, etc. by checking the desired boxes.

  6. Designate where the output will appear. You can either select a particular output range or send the data to a new workbook or worksheet.

  7. Click OK. The summary of your regression output will appear where designated.



  • Question

    What is the slope in a simple regression data?

    Community Answer

    The slope is the beta coefficient B1, the coefficient of the independent variable X. B0 is a constant, the “intercept”. For example, Y = B0 + B1X.

  • Question

    How do I calculate standard error?

    Community Answer

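    Step 1: Calculate the mean (total of all samples divided by the number of samples). Step 2: Calculate each measurement’s deviation from the mean (mean minus the individual measurement). Step 3: Square each deviation from the mean; squared negatives become positive. Step 4: Sum the squared deviations. Step 5: Divide that sum by one less than the sample size (n - 1). Step 6: Take the square root to get the sample standard deviation. Step 7: Divide the standard deviation by the square root of the sample size (n) to get the standard error of the mean.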

  • Question

    How can I calculate the equation of a line in regression in Excel?

    Community Answer

    One quick way to do this is to arrange your X and Y variables in adjacent columns (X on the left), then select the two-column range and use the Insert > Scatter chart command to insert an X-Y scatter chart. Then right-click on the chart, choose Add Trendline from the drop-down menu, and check the box for Display Equation on chart. Or, you could use dedicated software to fit the whole regression model. Try RegressIt, a free add-in (available at regressit-dot-com). It gives very detailed and well-designed output, and among other things it will show the equation for any number of independent variables. Just click the “Show All” button after fitting a model.
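The standard error question above can also be checked outside Excel. The following is a minimal Python sketch of the standard error of the mean, computed as the sample standard deviation divided by the square root of the sample size. The function name and the sample measurements are illustrative assumptions, not data from the article.

```python
import math


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Mean of the samples.
    mean = sum(samples) / n
    # Squared deviation of each measurement from the mean.
    squared_deviations = [(mean - x) ** 2 for x in samples]
    # Sample variance uses n - 1 in the denominator.
    variance = sum(squared_deviations) / (n - 1)
    std_dev = math.sqrt(variance)
    return std_dev / math.sqrt(n)


# Hypothetical measurements, for illustration only.
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(round(standard_error(measurements), 4))
```

In Excel itself, the same quantity could be obtained with STDEV.S divided by SQRT(COUNT(...)) over the same range.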


About This Article

Thanks to all authors for creating a page that has been read 1,310,913 times.


How to do Linear Regression in Excel: Full Guide (2023)


Linear regression is an easy way of evaluating the relationship between two variables.

Previously, performing linear regression in Excel was nothing less than a complex task. But with advanced Excel data analysis tools, it is now only a matter of a few clicks.

The guide below will not only teach you how to perform linear regression in Excel but also how you may analyze a linear regression graph in Excel.

So, without further ado, let’s dive right in 👇

Download our free sample workbook here as you continue reading.

Linear regression equation

Simple linear regression draws the relationship between a dependent and an independent variable.

👉 The dependent variable is the variable that needs to be predicted (or whose value is to be found).

👉 The independent variable explains (or causes) the change in the dependent variable.

Simply put, the dependent variable depends upon the independent variable. And as the independent variable changes, the dependent variable changes too.

Mathematically, the linear relationship between these two variables is explained as follows:

Y = a + bx

Where,

Y = dependent variable

a = regression intercept term

b = regression slope coefficient

x = independent variable

“a” and “b” are also called regression coefficients. And Excel returns the estimated values of these regression coefficients too.

Kasper Langmann, Microsoft Office Specialist
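To make the equation concrete, here is a minimal Python sketch of how the coefficients “a” and “b” can be estimated by ordinary least squares. This is the standard textbook calculation, shown for illustration; the function name and the sample data are assumptions and are not taken from the article’s workbook.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(a, b)
```

In Excel, the SLOPE and INTERCEPT worksheet functions return the same two values for a pair of ranges.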

How to do linear regression through a graph

Imagine a company that sells sweaters in a cold region. And the sale of sweaters is directly linked to the temperatures in that region.

The colder it is (low temperatures 🥶), the higher the sales of sweaters 🧣 go. This means sales (the dependent variable) depend upon the temperature (the independent variable).

Now, to predict the company’s sales for the future, you must analyze the sales trend in the past. This can be done by drawing a trendline.

Drawing this trendline between a dependent variable Y (the sales) and an independent variable X (the temperature) is called running linear regression.

So let’s do it!

Data of X and Y values

The image above contains the historical data for both variables (temperatures and sales) for a few months.

To explain the relationship between these variables, we need to make a scatter plot.

To plot the above data in a scatter plot in Excel:

  1. Select the data.
  2. Go to the Insert Tab > Charts Group
Selection of the graph from the insert tab
  3. Click on the scatter chart icon.
  4. Choose a scatter plot type from the drop-down menu.
Selection of data and scatterplot

Excel plots the data in a scatter plot.

Scatterplot in raw form

Note that each dot in the scatter plot above is formed at the intersection of Variable X and Y.

For example, the first dot is plotted at the point where Y = 625 and X = 2.

Next, we must draw a trend line out of this scatter plot. To do so:

  1. Click anywhere on the chart to select it.
  2. Click on the “+” icon on the top right of the chart.
Chart elements for trendline
  3. Hover your cursor over the option “Trendline”📈

A drop-down menu appears.

  4. Select More Options. This will take you to the Format Trendline Pane.
  5. Choose the linear trendline option to draw a trendline between the scatter points.
Selection for trendline option

And there you go! Excel draws a linear trendline on the scatterplot.

Trendline for the data

The above image shows a downward regression line which represents a negative trend. But why is that?

To understand that, you must know how to analyze the results of a linear regression graph. And don’t worry – it’s only a section ahead.

Adding the equation and R-squared

We also want Excel to show the equation and R-squared for this graph. For that:

  1. Scroll down the Format Trendline pane.
  2. Check the options “Display Equation on chart” and “Display R-squared value on chart”.
Selection of options for the graph

And Excel will display the following regression statistics on the graph:

Equation: y= -19.622x + 612.77

R-squared= 0.7456

Regression statistics for dependent variables

What are these? And what do they tell? We will discuss this shortly.

Pro Tip!

How to quickly interpret the relationship between two variables? By checking the sign of the coefficient of x 💡

A positive sign means a positive relationship. And a negative sign means a negative relationship between the two variables.

Since our equation shows a “-19.622x”, the relation between our variables is negative.

Formatting the trendline

Do you also find the trendline a little overshadowed? Not to worry – You can always format it in Excel.

For example, to change the color of the trendline:

  1. Select the trendline and right-click on it to launch the context menu.
  2. Go to Format Trendline.
  3. Under the Format Trendline pane, select “Fill & Line”.
  4. To change the color of the trendline, choose a color as shown below.
Color tab

Guess we will go with red for now 🚩 What do you think about it?

Changing the color of the trendline

Trendline Style

Not only the color, but you can also change the style of the trendline.

Say, we want to change our dotted trendline to a solid one. To do so:

  1. Select the trendline and right-click on it to launch the context menu.
  2. Click on Format Trendline to launch the Format Trendline Pane.
  3. Go to “Dash type” from the fill & line menu.
  4. Select a solid line type.
Formatting of the trendline

This will change the style of the trendline from a dotted line to a perfectly solid line.

Changing the style of the trendline

Chart Title

To enhance the readability of the graph, you may add graph titles and axes titles to it as follows:

  1. Select the graph.
  2. Go to Chart Elements > Chart Title > Above Chart.
Adding chart title
  3. Type in a Graph/Chart title as desired.
Adding chart title

Axis titles

How about adding the Axis titles too?

To add a vertical title (for the Y-axis) to your chart:

  1. Click Chart Elements > Axis Titles > Primary Vertical.
Adding Axis Title
  2. Type in a suitable title for the subject axis.

We have set the title for the Y-axis to “Sale of Sweaters”.

New Vertical Axis Title

To add a horizontal Axis Title (for the X-axis):

  1. Go to Chart Elements > Axis Titles > Primary Horizontal.
Adding Axis Title
  2. Type in a suitable title for the subject axis.

We have set the title for the X-axis to “Avg. Temperature”

New Horizontal Axis Title

And that’s it. We’ve successfully run linear regression in Excel 🥳

How to analyze the linear regression graph

Good job with running linear regression in Excel.

Now is the time that we analyze the linear regression trendline formed above.

A linear trendline in Excel can take the following three shapes:

Positive trendline (upward facing)

If your trendline is upward facing (it elevates as it goes from left to right), it denotes a positive trend.

This means that there exists a positive relationship between both variables. An increase in the independent variable is associated with an increase in the dependent variable.

This is how your graph will look with a positive trendline to it.

Positive trendline

Negative trendline (downward sloping)

If your trendline is downward sloping (it slopes down as it goes from left to right), it denotes a negative trend.

A negative trendline means a negative relationship between both variables.

When there is a negative relationship between two variables, an increase in the independent variable is associated with a decrease in the dependent variable.

This is how your graph will look with a negative trendline to it.

Negative trendline

Think back to the trendline in our example above. It was also a downward-sloping (negative) trendline.

That’s because there exists a negative relationship between sales and temperature. As the temperature falls, sales increase.

No trend

The two variables can also be independent of each other. In this case, movement in both variables is random with no relation to each other.

As there exists no relationship between them (neither positive nor negative), there is no particular slope for the trendline between them (neither upward facing nor downward sloping).

Such a trendline might look like this.

No trend

The trendline above is not exactly horizontal but very close to that. This is because there is no relation between the variables.

The slope of the graph

What if we want to know how much Y changes for a given change in X?

For example, if the temperature drops by one unit, by how much do sales increase?

The slope of the graph is an answer to this. Remember the linear regression equation?

Y = a + bx

In the above equation, the slope is represented by “b”. And the linear regression equation for our example turned out as follows:

Y = 612.77 – 19.622x

Here, the value for b is -19.622 and so is our slope. This means that a one-unit increase in the X variable (the temperature) is associated with a change of -19.622 units in the Y variable (the sales). The slope is expressed in the units of the data, not in percentages.

Also, as the sign with the value for b is a minus sign, a one-unit decrease in Variable X (temperature) corresponds to an increase of 19.622 units in Variable Y (Sales).

Slope of the trendline

Pro Tip!

An easy way to remember the slope is to remember Rise over Run. Rise means vertical axis. Run means horizontal axis. So the slope defines the change in variable Y caused by a change in variable X.
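As a quick check of what the slope means, the Python sketch below plugs two temperatures one unit apart into the trendline equation from the example. The difference between the two predictions equals the slope. The function name and the sample temperatures (10 and 11) are illustrative assumptions; only the two coefficients come from the chart.

```python
# Coefficients read off the trendline equation shown on the chart.
INTERCEPT = 612.77
SLOPE = -19.622


def predicted_sales(temperature):
    """Predicted sales for a given temperature, using Y = a + b*x."""
    return INTERCEPT + SLOPE * temperature


# Rise over run: change in predicted Y for a one-unit change in X.
change = predicted_sales(11) - predicted_sales(10)
print(round(change, 3))
```

The change is measured in the same units as the sales column, which is why the slope reads as “units of Y per unit of X”.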

R-Squared

Another important output of our scatterplot is the R-squared value 👀

It tells us how much of the variation in the dependent variable is explained by the independent variable.

 R-squared of dependent and independent variables

The R-squared for our example is 0.7456.

This tells us that about 74.56% of the variation in Variable Y can be explained by Variable X.
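For readers who want to see where such a number comes from, here is a minimal Python sketch of the usual R-squared calculation for a simple linear regression: one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the data are hypothetical; the article’s own dataset is not reproduced here.

```python
def r_squared(xs, ys):
    """R-squared of the least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Variation left unexplained by the fitted line.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of Y around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data, for illustration only.
print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))
```

Excel’s RSQ worksheet function returns the same quantity for two ranges.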

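Another statistical measure relevant to the linear regression model is the p-value. It answers a different question than R-squared: it indicates whether the estimated relationship is statistically significant, and unlike R-squared, a smaller value is better.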

Kasper Langmann, Microsoft Office Specialist

That’s it – Now what?

The above guide explains how to perform a linear regression analysis in Excel. And then, how to analyze the linear regression trendline and other relevant statistics.

👉 In addition to that, it also explains how you may format a trendline in Excel in different ways.

Performing linear regression in Excel through a scatter plot is super smart. But this is only one feature of Excel.

And there are many more smart functions in Excel. Like the VLOOKUP, SUMIF, and IF functions.

Want to learn them already? Enroll in my 30-minute free email course that teaches you these and many more functions of Excel.

Other resources

Linear regression can be challenging to understand. But once you get a hold of it, you can run it for any possible dataset with sheer ease.

In addition to linear regression, Excel offers other forecasting functions too. Like the data analysis tools in Excel and the Excel FORECAST function.

Kasper Langmann, 2023-02-23T14:55:48+00:00


What Is Linear Regression?

Linear regression is a type of data analysis that considers the linear relationship between a dependent variable and one or more independent variables. It is typically used to visually show the strength of the relationship or correlation between various factors and the dispersion of results – all for the purpose of explaining the behavior of the dependent variable. The goal of a linear regression model is to estimate the magnitude of a relationship between variables and whether or not it is statistically significant.

Say we wanted to test the strength of the relationship between the amount of ice cream eaten and obesity. We would take the independent variable, the amount of ice cream, and relate it to the dependent variable, obesity, to see if there was a relationship. Given a regression is a graphical display of this relationship, the lower the variability in the data, the stronger the relationship and the tighter the fit to the regression line. 

In finance, linear regression is used to determine relationships between asset prices and economic data across a range of applications. For instance, it is used to determine the factor weights in the Fama-French Model and is the basis for determining the Beta of a stock in the capital asset pricing model (CAPM).

Here, we look at how to use data imported into Microsoft Excel to perform a linear regression and how to interpret the results.

Key Takeaways

  • Linear regression models the relationship between a dependent and independent variable(s).
  • Typically fitted by ordinary least squares (OLS), a linear regression essentially estimates a line of best fit among all variables in the model.
  • Regression analysis can be considered robust if the variables are independent, there is no heteroscedasticity, and the error terms of variables are not correlated.
  • Modeling linear regression in Excel is easier with the Data Analysis ToolPak.
  • Regression output can be interpreted for both the size and the strength of the relationship between one or more independent variables and the dependent variable.

Important Considerations

There are a few critical assumptions about your data set that must be true to proceed with a regression analysis. Otherwise, the results will be interpreted incorrectly or they will exhibit bias:

  1. The variables must be truly independent (this can be checked with a Chi-square test).
  2. The data must not have different error variances (this is called heteroskedasticity, also spelled heteroscedasticity).
  3. The error terms must be uncorrelated with one another. If they are not, the errors are serially correlated.

If those three points sound complicated, they can be. But the effect of one of those considerations not being true is a biased estimate. Essentially, you would misstate the relationship you are measuring.
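One common way to screen for the third problem (serially correlated errors) is the Durbin-Watson statistic. It is not mentioned in the article and is shown here only as an illustrative sketch in Python. The function name and the residual values are assumptions; in practice the residuals would come from the regression output.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals.

    Values near 2 suggest little first-order autocorrelation; values
    toward 0 suggest positive and toward 4 negative autocorrelation.
    """
    # Sum of squared differences between consecutive residuals.
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    # Sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that flip sign every period.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
```

A statistic far from 2 is a hint that the time ordering of the data matters and that the plain regression output should be read with caution.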

Outputting a Regression in Excel

The first step in running regression analysis in Excel is to double-check that the free Excel add-in Analysis ToolPak is installed. This add-in makes calculating a range of statistics very easy. It is not required to chart a linear regression line, but it makes creating statistics tables simpler. To verify that it is installed, select “Data” from the toolbar. If “Data Analysis” is an option, the feature is installed and ready to use. If it is not installed, you can enable it by clicking on the Office button and selecting “Excel Options” (in newer versions, File > Options > Add-ins).

Using the Data Analysis ToolPak, creating a regression output is just a few clicks.

The independent variable in Excel goes in the X range.

Given the S&P 500 returns, say we want to know if we can estimate the strength and relationship of Visa (V) stock returns. The Visa (V) stock returns data populates column 1 as the dependent variable. S&P 500 returns data populates column 2 as the independent variable.

  1. Select “Data” from the toolbar. The “Data” menu displays.
  2. Select “Data Analysis”. The Data Analysis dialog box displays, listing the Analysis Tools.
  3. From the menu, select “Regression” and click “OK”.
  4. In the Regression dialog box, click the “Input Y Range” box and select the dependent variable data (Visa (V) stock returns).
  5. Click the “Input X Range” box and select the independent variable data (S&P 500 returns).
  6. Click “OK” to run the results.


Interpret the Results

Using that data (the same from our R-squared article), we get the following table:

The R2 value, also known as the coefficient of determination, measures the proportion of variation in the dependent variable explained by the independent variable or how well the regression model fits the data. The R2 value ranges from 0 to 1, and a higher value indicates a better fit. The p-value, or probability value, also ranges from 0 to 1 and indicates if the test is significant. In contrast to the R2 value, a smaller p-value is favorable as it indicates a correlation between the dependent and independent variables.

Interpreting the Results

The bottom line here is that changes in Visa stock seem to be highly correlated with the S&P 500.

  • In the regression output above, we can see that for every 1-point change in the S&P 500, there is a corresponding 1.36-point change in Visa.
  • We can also see that the p-value is very small (0.000036), which also corresponds to a very large t-statistic. This indicates that this finding is highly statistically significant, so the odds that this result was caused by chance are exceedingly low.
  • From the R-squared, we can see that the S&P 500 returns alone can explain more than 62% of the observed fluctuations in Visa returns.

However, an analyst at this point may heed a bit of caution for the following reasons:

  • With only one variable in the model, it is unclear whether V affects the S&P 500 prices, if the S&P 500 affects V prices, or if some unobserved third variable affects both prices.
  • Visa is a component of the S&P 500, so there could be a co-correlation between the variables here.
  • There are only 20 observations, which may not be enough to make a good inference.
  • The data is a time series, so there could also be autocorrelation.
  • The time period under study may not be representative of other time periods.

Charting a Regression in Excel

We can chart a regression in Excel by highlighting the data and charting it as a scatter plot. To add a regression line, choose “Add Chart Element” from the “Chart Design” menu. In the dialog box, select “Trendline” and then “Linear Trendline”. To add the R2 value, select “More Trendline Options” from the “Trendline” menu. Lastly, select “Display R-squared value on chart”. The visual result sums up the strength of the relationship, albeit at the expense of not providing as much detail as the table above.

Image by Sabrina Jiang © Investopedia 2020

How Do You Interpret a Linear Regression?

The output of a regression model will produce various numerical results. The coefficients (or betas) tell you the association between an independent variable and the dependent variable, holding everything else constant. If the coefficient is, say, +0.12, it tells you that every 1-point change in that variable corresponds with a 0.12-point change in the dependent variable in the same direction. If it were instead -3.00, it would mean a 1-point change in the explanatory variable corresponds with a 3-point change in the dependent variable, in the opposite direction.

How Do You Know If a Regression Is Significant?

In addition to producing beta coefficients, a regression output will also indicate tests of statistical significance based on the standard error of each coefficient (such as the p-value and confidence intervals). Often, analysts use a p-value of 0.05 or less to indicate significance; if the p-value is greater, then you cannot rule out chance or randomness for the resultant beta coefficient. Other tests of significance in a regression model can be t-tests for each variable, as well as an F-statistic or chi-square for the joint significance of all variables in the model together.
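To illustrate how a significance test is built from the standard error of a coefficient, the Python sketch below computes the t-statistic for the slope of a simple linear regression: the slope divided by its standard error. The function name and data are hypothetical. Converting the t-statistic into a p-value requires the Student's t distribution with n - 2 degrees of freedom, which this sketch leaves out; Excel's regression output reports that p-value directly.

```python
import math


def slope_t_statistic(xs, ys):
    """t-statistic for the slope of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual variance with n - 2 degrees of freedom.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    residual_variance = ss_res / (n - 2)
    # Standard error of the slope estimate.
    se_b = math.sqrt(residual_variance / sxx)
    return b / se_b


# Hypothetical data, for illustration only.
print(round(slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))
```

The larger the absolute t-statistic, the smaller the corresponding p-value for a given sample size.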

How Do You Interpret the R-Squared of a Linear Regression?

R2 (R-squared) is a statistical measure of the goodness of fit of a linear regression model (from 0.00 to 1.00), also known as the coefficient of determination. In general, the higher the R2, the better the model’s fit. The R-squared can also be interpreted as how much of the variation in the dependent variable is explained by the independent (explanatory) variables in the model. Thus, an R-squared of 0.50 suggests that half of all of the variation observed in the dependent variable can be explained by the independent variable(s).

R Square | Significance F and P-Values | Coefficients | Residuals

This example teaches you how to run a linear regression analysis in Excel and how to interpret the Summary Output.

Below you can find our data. The big question is: is there a relation between Quantity Sold (Output) and Price and Advertising (Input). In other words: can we predict Quantity Sold if we know Price and Advertising?

Regression Data in Excel

1. On the Data tab, in the Analysis group, click Data Analysis.

Click Data Analysis

Note: can’t find the Data Analysis button? Click here to load the Analysis ToolPak add-in.

2. Select Regression and click OK.

Select Regression

3. Select the Y Range (A1:A8). This is the response variable (also called dependent variable).

4. Select the X Range (B1:C8). These are the explanatory variables (also called independent variables). These columns must be adjacent to each other.

5. Check Labels.

6. Click in the Output Range box and select cell A11.

7. Check Residuals.

8. Click OK.

Regression Input and Output

Excel produces the following Summary Output (rounded to 3 decimal places).

R Square

R Square equals 0.962, which is a very good fit. 96% of the variation in Quantity Sold is explained by the independent variables Price and Advertising. The closer to 1, the better the regression line (read on) fits the data.

R Square

Significance F and P-values

To check if your results are reliable (statistically significant), look at Significance F (0.001). If this value is less than 0.05, you’re OK. If Significance F is greater than 0.05, it’s probably better to stop using this set of independent variables. Delete a variable with a high P-value (greater than 0.05) and rerun the regression until Significance F drops below 0.05.

Most or all P-values should be below 0.05. In our example this is the case (0.000, 0.001 and 0.005).

Anova

Coefficients

The regression line is: y = Quantity Sold = 8536.214 - 835.722 * Price + 0.592 * Advertising. In other words, for each unit increase in Price, Quantity Sold decreases by 835.722 units. For each unit increase in Advertising, Quantity Sold increases by 0.592 units. This is valuable information.

You can also use these coefficients to do a forecast. For example, if price equals $4 and Advertising equals $3000, you might be able to achieve a Quantity Sold of 8536.214 -835.722 * 4 + 0.592 * 3000 = 6970.

Residuals

The residuals show you how far away the actual data points are from the predicted data points (using the equation). For example, the first data point equals 8500. Using the equation, the predicted data point equals 8536.214 - 835.722 * 2 + 0.592 * 2800 = 8523.009, giving a residual of 8500 - 8523.009 = -23.009.
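The forecast and residual calculations above can be sketched in a few lines of Python. The coefficients are the rounded values quoted in the text; the function name is an illustrative assumption. Because the coefficients are rounded to three decimals, the results differ slightly from the figures in the text (6970 and -23.009), which presumably come from Excel's unrounded coefficients.

```python
# Rounded coefficients from the Summary Output quoted in the text.
INTERCEPT = 8536.214
PRICE_COEF = -835.722
ADVERTISING_COEF = 0.592


def predict_quantity(price, advertising):
    """Predicted Quantity Sold from the fitted regression line."""
    return INTERCEPT + PRICE_COEF * price + ADVERTISING_COEF * advertising


# Forecast for Price = 4 and Advertising = 3000.
forecast = predict_quantity(4, 3000)

# Residual for the first data point: actual minus predicted.
actual_first = 8500
residual = actual_first - predict_quantity(2, 2800)

print(round(forecast, 3), round(residual, 3))
```

A residual is always the actual value minus the predicted value, so a negative residual means the model predicted more than was actually sold.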

Residuals

You can also create a scatter plot of these residuals.

Scatter Plot
