Abstract
In previous chapters, we have had data for which there has been a dependent variable (Y) and an independent variable (X – even though, to be consistent with the notation that is close to universal in the field of experimental design, we have been using factor names, A, B, etc., or “column factor” and “row factor,” instead of, literally, the letter X). The latter has been treated mostly as a categorical variable, whether actually numerical/metric or not. Often, we have had more than one independent variable. Assuming only one independent variable, if we want to say it this way (and we do!), we can say that we have had n (X, Y) pairs of data, where n is the total number of data points. With more than one independent variable, we can say that we have n \( \left({X}_1,{X}_2,\dots, Y\right) \) data points.
Notes
1. Unless the scatter diagram indicates dramatically otherwise, we usually consider first a straight line, since it is the simplest functional form.
2. The correlation coefficient is calculated as
$$ r=\frac{\sum_{i=1}^n\left({X}_i-\overline{X}\right)\left({Y}_i-\overline{Y}\right)}{\left(n-1\right){s}_X{s}_Y} $$
where \( s_X \) and \( s_Y \) are the sample standard deviations of the independent and the dependent variables, respectively, and, of course, \( \overline{X} \) and \( \overline{Y} \) are their sample means. Note that r is the same, regardless of which variable is labeled as independent or dependent. (The sketch following these notes evaluates this formula, and the least-squares coefficients of Note 3, in R.)
3. There are an infinite number of possible lines – so that REALLY would take up our free time! Until the availability of personal computers with software packages such as Excel, we would find the parameter values by evaluating the least-squares equations by hand.
4. It can be shown that the LS line is unique.
5. Confidence intervals for predictions are covered in the next chapter.
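To see Notes 2 and 3 in action, here is a minimal R sketch (our own illustration, with variable names X and Y of our choosing) that evaluates the correlation formula and the least-squares slope and intercept directly from their definitions, using the advertising and sales data of Example 14.9 in the appendix. The values it prints match the cor() and lm() output shown there.
> X <- c(10, 20, 30, 40, 50, 60, 70, 80)      # advertisement, from Example 14.9
> Y <- c(35, 80, 56, 82, 126, 104, 177, 153)  # sales, from Example 14.9
> n <- length(X)
> # r, evaluated from the formula in Note 2
> sum((X - mean(X)) * (Y - mean(Y))) / ((n - 1) * sd(X) * sd(Y))
[1] 0.9060138
> # least-squares slope (b1) and intercept (b0), evaluated directly
> b1 <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
> b0 <- mean(Y) - b1 * mean(X)
> c(b0, b1)
[1] 21.321429  1.784524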
Appendix
Example 14.9 Trends in Selling Toys using R
To analyze the same example, we can import the data as we have done previously or create our own in R. We will demonstrate the second option here – after all, it is more fun! This is how it is done:
> advertisement <- c(10, 20, 30, 40, 50, 60, 70, 80)
> sales <- c(35, 80, 56, 82, 126, 104, 177, 153)
> toy <- data.frame(advertisement, sales)
A quick inspection will show us the data frame was successfully created.
> toy
  advertisement sales
1            10    35
2            20    80
3            30    56
4            40    82
5            50   126
6            60   104
7            70   177
8            80   153
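If the data already existed in a file, the import route mentioned at the start of this example could be used instead. The following is a minimal sketch that assumes a hypothetical file toy.csv (not supplied with the chapter) with header columns named advertisement and sales:
> toy <- read.csv("toy.csv")   # hypothetical file with columns advertisement, sales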
Using the plot() function, we can generate a scatter plot of our data, shown in Fig. 14.10. The correlation analysis is shown next.
> plot(toy, pch=16, cex=1.0, main="Sales vs. Advertisement")
> cor(toy,method="pearson")
              advertisement     sales
advertisement     1.0000000 0.9060138
sales             0.9060138 1.0000000
> cor.test(toy$advertisement, toy$sales, method="pearson")

        Pearson's product-moment correlation

data:  toy$advertisement and toy$sales
t = 5.2434, df = 6, p-value = 0.001932
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5568723 0.9830591
sample estimates:
      cor
0.9060138
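As a side note, the t statistic reported by cor.test() is simply the correlation transformed by \( t=r\sqrt{n-2}/\sqrt{1-{r}^2} \), with n − 2 = 6 degrees of freedom here. A quick sketch of ours confirming this from r itself:
> r <- cor(toy$advertisement, toy$sales)
> r * sqrt(nrow(toy) - 2) / sqrt(1 - r^2)   # reproduces the t value above
[1] 5.243428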
Now, we will perform a regression analysis, using the lm() function.
> toy_regression <- lm(sales~advertisement, data=toy)
> summary(toy_regression)

Call:
lm(formula = sales ~ advertisement, data = toy)
Residuals:
    Min      1Q  Median      3Q     Max
-24.393 -13.027  -7.434  17.336  30.762
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    21.3214    17.1861   1.241  0.26105
advertisement   1.7845     0.3403   5.243  0.00193 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 22.06 on 6 degrees of freedom
Multiple R-squared:  0.8209,	Adjusted R-squared:  0.791
F-statistic: 27.49 on 1 and 6 DF,  p-value: 0.001932

> anova(toy_regression)
Analysis of Variance Table

Response: sales
              Df  Sum Sq Mean Sq F value   Pr(>F)
advertisement  1 13375.0 13375.0  27.494 0.001932 **
Residuals      6  2918.9   486.5
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
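The fitted model can also be queried directly, rather than read off the printed summary. The sketch below extracts the coefficients and then, purely as our own illustration (an advertising level of 55 is not part of the original example), computes the point prediction at that level; interval estimates for such predictions are deferred to the next chapter, per Note 5.
> coef(toy_regression)
  (Intercept) advertisement 
    21.321429      1.784524 
> predict(toy_regression, newdata=data.frame(advertisement=55))
       1 
119.4702 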
To include the regression line in the scatter plot (shown in Fig. 14.11), we can use either of the following commands:
> abline(21.3214, 1.7845)
> abline(lm(sales~advertisement))
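Both commands draw the same line; each must be issued after the plot() command, while that graphics device is still active. Since the fitted model is already stored, a third equivalent form is to pass the model object itself, from which abline() extracts the intercept and slope:
> abline(toy_regression)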