Linear regression
Subjects: Algebra, Finance, Statistics
Examples of linear regression in the following topics:
The Equation of a Line
- In statistics, linear regression can be used to fit a predictive model to an observed data set of $y$ and $x$ values.
- In statistics, simple linear regression is the least squares estimator of a linear regression model with a single explanatory variable.
- Simple linear regression fits a straight line through the set of $n$ points in such a way that makes the sum of squared residuals of the model (that is, vertical distances between the points of the data set and the fitted line) as small as possible.
- Linear regression was the first type of regression analysis to be studied rigorously, and to be used extensively in practical applications.
- If the goal is prediction, or forecasting, linear regression can be used to fit a predictive model to an observed data set of $y$ and $X$ values.
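The least-squares fit described above has a simple closed form in the single-variable case. A minimal sketch, using hypothetical example data (the function name `fit_line` is illustrative, not from any particular library):

```python
# Simple linear regression via least squares: the slope and intercept
# minimize the sum of squared vertical residuals.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```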
Introduction to inference for linear regression
- In this section we discuss uncertainty in the estimates of the slope and y-intercept for a regression line.
- However, in the case of regression, we will identify standard errors using statistical software.
- This video introduces the uncertainty associated with the parameter estimates in linear regression.
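While the section relies on statistical software for standard errors, the standard error of the slope in simple linear regression has a well-known formula: $SE(b_1) = s / \sqrt{S_{xx}}$, where $s^2 = SSE/(n-2)$. A sketch with hypothetical data (the function name is illustrative):

```python
import math

def slope_standard_error(xs, ys):
    # Fit by least squares, then estimate SE of the slope:
    # SE = s / sqrt(Sxx), where s^2 = SSE / (n - 2).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    return b1, s / math.sqrt(sxx)

# Hypothetical noisy observations of roughly y = 2x.
b1, se = slope_standard_error([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
```

A small standard error relative to the slope estimate indicates the slope is estimated precisely; software packages report the same quantity alongside each coefficient.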
Evaluating Model Utility
- Multiple regression is beneficial in some respects, since it can show the relationships between more than just two variables; however, it should not always be taken at face value.
- It is easy to throw a big data set at a multiple regression and get an impressive-looking output.
- But many people are skeptical of the usefulness of multiple regression, especially for variable selection, and you should view the results with caution.
- You should examine the linear regression of the dependent variable on each independent variable, one at a time, examine the linear regressions between each pair of independent variables, and consider what you know about the subject matter.
- You should probably treat multiple regression as a way of suggesting patterns in your data, rather than rigorous hypothesis testing.
Slope and Intercept
- A simple example is the equation for the regression line, $\hat{y} = b_0 + b_1 x$.
- Linear regression is an approach to modeling the relationship between a scalar dependent variable $y$ and one or more explanatory (independent) variables denoted $X$.
- The case of one explanatory variable is called simple linear regression.
- For more than one explanatory variable, it is called multiple linear regression.
- (This term should be distinguished from multivariate linear regression, where multiple correlated dependent variables are predicted rather than a single scalar variable.)
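The step from simple to multiple linear regression is small computationally: with more than one explanatory variable, one solves a least-squares problem for several slopes at once. A minimal sketch with hypothetical, noise-free data:

```python
import numpy as np

# Hypothetical data: y depends linearly on two explanatory variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]   # exact linear relationship

# Multiple linear regression: augment X with a column of ones for the
# intercept, then solve the least-squares problem.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
# coef ≈ [1.0, 2.0, -3.0]: intercept, then one slope per explanatory variable
```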
Polynomial Regression
- The goal of polynomial regression is to model a non-linear relationship between the independent and dependent variables.
- Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function $E(y\ | \ x)$ is linear in the unknown parameters that are estimated from the data.
- For this reason, polynomial regression is considered to be a special case of multiple linear regression.
- This is similar to the goal of non-parametric regression, which aims to capture non-linear regression relationships.
- Explain how the linear and nonlinear aspects of polynomial regression make it a special case of multiple linear regression.
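The "linear in the parameters" point is easiest to see in code: fitting a quadratic uses exactly the multiple-regression machinery, with $x$ and $x^2$ as the two explanatory variables. A sketch with hypothetical data:

```python
import numpy as np

# Polynomial regression: nonlinear in x, but linear in the parameters.
# Build the design matrix [1, x, x^2] and solve by ordinary least
# squares, exactly as in multiple linear regression.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 0.5 * x ** 2   # exact quadratic relationship

A = np.column_stack([np.ones_like(x), x, x ** 2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
# coef ≈ [1.0, 2.0, 0.5]
```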
Checking the Model and Assumptions
- These assumptions are similar to those of standard linear regression models.
- Linearity.
- Fortunately, slight deviations from linearity will not greatly affect a multiple regression model.
- When the homoscedasticity assumption is violated, residuals appear clustered at some predicted values and spread apart at others along the linear regression line, and the mean squared error for the model will be incorrect.
- Paraphrase the assumptions made by multiple regression models: linearity, homoscedasticity, normality, absence of multicollinearity, and adequate sample size.
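One rough way to probe the homoscedasticity assumption numerically is to fit the model and compare the residual spread at small versus large fitted values; a ratio near 1 is consistent with constant variance. A sketch with hypothetical data generated to satisfy the assumption:

```python
import numpy as np

# Hypothetical data with a constant error variance.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
residuals = y - fitted

# Compare residual spread for the lower vs. upper half of fitted values.
lower = residuals[fitted < np.median(fitted)]
upper = residuals[fitted >= np.median(fitted)]
ratio = lower.std() / upper.std()   # near 1 when variance is constant
```

In practice, a residuals-versus-fitted plot conveys the same information visually and is the more common diagnostic.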
Estimating and Making Inferences About the Slope
- The purpose of a multiple regression is to find an equation that best predicts the $Y$ variable as a linear function of the $X$ variables.
- You use multiple regression when you have three or more measurement variables.
- When the purpose of multiple regression is prediction, the important result is an equation containing partial regression coefficients (slopes).
- A graphical representation of a best fit line for simple linear regression.
Predictions and Probabilistic Models
- Best-practice advice here is that a linear-in-variables and linear-in-parameters relationship should not be chosen simply for computational convenience, but that all available knowledge should be deployed in constructing a regression model.
- A scatterplot shows a linear relationship between a quantitative explanatory variable $x$ and a quantitative response variable $y$.
- A good rule of thumb when using the linear regression method is to look at the scatter plot of the data.
- This graph is a visual example of why it is important that the data have a linear relationship.
- Each of these four data sets has the same linear regression line and therefore the same correlation, 0.816.
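The four data sets referred to above are Anscombe's quartet. The first of them is reproduced below to show the correlation computation; all four share approximately the same regression line and the same correlation of about 0.816, even though their scatterplots look very different:

```python
import math

# Anscombe's quartet, data set I.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

# Pearson correlation: r = Sxy / sqrt(Sxx * Syy).
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
r = sxy / math.sqrt(sxx * syy)   # close to 0.816
```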
Regression Analysis for Forecast Improvement
- One can forecast based on linear relationships.
- Regression Analysis is a causal / econometric forecasting method.
- These methods include both parametric (linear or non-linear) and non-parametric techniques.
- The predictors are linearly independent, i.e., it is not possible to express any predictor as a linear combination of the others.
- Familiar methods, such as linear regression and ordinary least squares regression, are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data.
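The linear-independence condition on the predictors can be checked directly: the design matrix should have full column rank. A sketch with hypothetical data in which one predictor is deliberately a linear combination of the others:

```python
import numpy as np

# Hypothetical predictors; x3 is exactly 2*x1 - x2, so the three
# columns are linearly dependent.
rng = np.random.default_rng(2)
x1 = rng.normal(size=30)
x2 = rng.normal(size=30)
x3 = 2.0 * x1 - x2

X = np.column_stack([x1, x2, x3])
rank = np.linalg.matrix_rank(X)   # 2, not 3: the predictors are collinear
```

A rank-deficient design matrix means the least-squares coefficients are not uniquely determined, which is why the condition matters for regression-based forecasting.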
Multiple Regression Models
- Multiple regression is used to find an equation that best predicts the $Y$ variable as a linear function of the multiple $X$ variables.
- You use multiple regression when you have three or more measurement variables.
- The purpose of a multiple regression is to find an equation that best predicts the $Y$ variable as a linear function of the $X$ variables.
- Multiple regression is a statistical way to try to control for this; it can answer questions like, "If sand particle size (and every other measured variable) were the same, would the regression of beetle density on wave exposure be significant?"
- As you are doing a multiple regression, there is also a null hypothesis for each $X$ variable, meaning that adding that $X$ variable to the multiple regression does not improve the fit of the multiple regression equation any more than expected by chance.
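The per-variable null hypothesis described above is usually tested with a t-statistic for each partial regression coefficient. A sketch under hypothetical data, where one predictor has a strong effect and the other a weak one (the computation mirrors what statistical software reports):

```python
import numpy as np

# Hypothetical data: x2 genuinely contributes to y; x1's effect is weak.
rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 0.05 * x1 + 1.5 * x2 + rng.normal(size=n)

# Fit the multiple regression by least squares.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# t-statistic for each coefficient: estimate / standard error,
# testing the null hypothesis that the partial slope is zero.
resid = y - X @ beta
s2 = resid @ resid / (n - X.shape[1])      # residual variance
cov = s2 * np.linalg.inv(X.T @ X)          # covariance of the estimates
t_stats = beta / np.sqrt(np.diag(cov))     # one t per coefficient
```

A large |t| for a coefficient is evidence that adding that $X$ variable improves the fit more than expected by chance.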