variable
Algebra
(noun)
An alphabetic character representing a number that is arbitrary or unknown.
(noun)
A symbol that represents a quantity in a mathematical expression, as used in many sciences.
Economics
(noun)
Something whose value may be dictated or discovered.
Statistics
(noun)
A quantity that may assume any one of a set of values.
Examples of variable in the following topics:
Explanatory and response variables
- If we suspect poverty might affect spending in a county, then poverty is the explanatory variable and federal spending is the response variable in the relationship.
- Sometimes the explanatory variable is called the independent variable and the response variable is called the dependent variable.
- If there are many variables, it may be possible to consider a number of them as explanatory variables.
- The explanatory variable might affect the response variable.
- In some cases, there is no explanatory or response variable.
Variables
- In this case, the variable is "type of antidepressant." When a variable is manipulated by an experimenter, it is called an independent variable.
- An important distinction between variables is between qualitative variables and quantitative variables.
- Qualitative variables are sometimes referred to as categorical variables.
- Quantitative variables are those variables that are measured in terms of numbers.
- The variable "type of supplement" is a qualitative variable; there is nothing quantitative about it.
Types of Variables
- Numeric variables have values that describe a measurable quantity as a number, like "how many" or "how much." Therefore, numeric variables are quantitative variables.
- A continuous variable is a numeric variable.
- A discrete variable is a numeric variable.
- An ordinal variable is a categorical variable.
- A nominal variable is a categorical variable.
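The four types above split into two families: continuous and discrete variables are numeric, while ordinal and nominal variables are categorical. The Python sketch below encodes that taxonomy. `VariableType` and `classify` are hypothetical names, and reducing the choice to three yes/no questions is a simplification for illustration, not a formal rule.

```python
from enum import Enum


class VariableType(Enum):
    """The four variable types, grouped into numeric and categorical families."""
    CONTINUOUS = "continuous"  # numeric: any value in a range (e.g., temperature)
    DISCRETE = "discrete"      # numeric: countable values (e.g., number of legs)
    ORDINAL = "ordinal"        # categorical: levels with a natural ordering
    NOMINAL = "nominal"        # categorical: levels with no ordering

    @property
    def is_numeric(self) -> bool:
        """Numeric (quantitative) types answer "how many" or "how much"."""
        return self in (VariableType.CONTINUOUS, VariableType.DISCRETE)

    @property
    def is_categorical(self) -> bool:
        """Categorical (qualitative) types record group membership."""
        return not self.is_numeric


def classify(numeric: bool, countable: bool = False, ordered: bool = False) -> VariableType:
    """Hypothetical helper: pick a type from three yes/no questions.

    `countable` only matters for numeric variables; `ordered` only matters
    for categorical ones.
    """
    if numeric:
        return VariableType.DISCRETE if countable else VariableType.CONTINUOUS
    return VariableType.ORDINAL if ordered else VariableType.NOMINAL
```

For example, temperature would be `classify(numeric=True)` (continuous), the number of legs of an animal `classify(numeric=True, countable=True)` (discrete), and "type of supplement" `classify(numeric=False)` (nominal).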
Qualitative Variable Models
- Dummy variables, which stand in for qualitative variables, often act as independent variables in regression and affect the predicted value of the dependent variable.
- Dummy variables are "proxy" variables, or numeric stand-ins for qualitative facts in a regression model.
- In regression analysis, the dependent variable may be influenced not only by quantitative variables (income, output, prices, etc.), but also by qualitative variables (gender, religion, geographic region, etc.).
- One type of ANOVA model, applicable when dealing with qualitative variables, is a regression model in which the dependent variable is quantitative in nature but all the explanatory variables are dummies (qualitative in nature).
- A dummy variable can be inserted into a regression analysis to account for the effects of a qualitative variable.
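A minimal sketch of dummy coding in Python, assuming a single qualitative predictor whose levels are strings. `dummy_encode` and `dummy_regression` are hypothetical helpers, and the region labels in the usage note are made up. One level is held out as the baseline: with an intercept in the model, keeping a column for every level would make the columns perfectly collinear (the so-called dummy variable trap).

```python
from statistics import fmean


def dummy_encode(values, baseline):
    """Turn one qualitative variable into 0/1 indicator columns.

    The baseline level gets no column; it is represented by all indicators being 0.
    """
    levels = sorted(set(values) - {baseline})
    return {f"is_{level}": [1 if v == level else 0 for v in values] for level in levels}


def dummy_regression(y, groups, baseline):
    """Fit y = b0 + sum_j b_j * D_j where every predictor is a dummy.

    With one qualitative predictor coded this way, ordinary least squares gives
    b0 = mean of the baseline group and b_j = (mean of group j) - b0, so the
    coefficients can be read directly off the group means.
    """
    means = {g: fmean([yi for yi, gi in zip(y, groups) if gi == g]) for g in set(groups)}
    b0 = means[baseline]
    slopes = {f"is_{g}": means[g] - b0 for g in sorted(means) if g != baseline}
    return b0, slopes
```

With `region = ["north", "south", "west", "south"]` and baseline `"north"`, `dummy_encode` yields `is_south = [0, 1, 0, 1]` and `is_west = [0, 0, 1, 0]`. Because the all-dummy model's coefficients are just differences between group means, it carries the same information as comparing the group means in an ANOVA.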
Slope and Intercept
- The general purpose is to explain how one variable, the dependent variable, is systematically related to the values of one or more independent variables.
- The coefficients are numeric constants that either multiply the variable values in the equation or are added to them to determine the unknown.
- Here, by convention, $x$ and $y$ are the variables of interest in our data, with $y$ the unknown or dependent variable and $x$ the known or independent variable.
- Linear regression is an approach to modeling the relationship between a scalar dependent variable $y$ and one or more explanatory (independent) variables denoted $X$.
- The equation $y = mx + b$, where $y$ is the dependent variable, $x$ is the independent variable, $m$ is the slope, and $b$ is the intercept.
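Assuming the line is fitted by ordinary least squares (the usual choice for linear regression, though the text does not name a fitting method), the slope and intercept of $y = mx + b$ have closed forms: $m = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b = \bar{y} - m\bar{x}$. A Python sketch follows; `least_squares_line` is a hypothetical name.

```python
from statistics import fmean


def least_squares_line(x, y):
    """Return (m, b) for the line y = m*x + b minimizing the sum of squared residuals.

    m = sum((x_i - x_bar) * (y_i - y_bar)) / sum((x_i - x_bar) ** 2)
    b = y_bar - m * x_bar, so the fitted line passes through (x_bar, y_bar).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to define a slope")
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b
```

For the points (1, 3), (2, 5), (3, 7), (4, 9), which lie exactly on a line, it returns m = 2.0 and b = 1.0.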
Types of variables
- This variable seems to be a hybrid: it is a categorical variable but the levels have a natural ordering.
- A variable with these properties is called an ordinal variable.
- To simplify analyses, any ordinal variables in this book will be treated as categorical variables.
- Are these numerical or categorical variables?
- Thus, each is a categorical variable.
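The sketch below contrasts the two views of an ordinal variable: the natural ordering of its levels, and the plain categorical treatment that only counts how often each level occurs. The levels "low", "medium", "high" and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical ordinal levels, listed in their natural order.
LEVELS = ("low", "medium", "high")
RANK = {level: i for i, level in enumerate(LEVELS)}


def sort_ordinal(values):
    """Sort by the levels' natural order (uses the ordering)."""
    return sorted(values, key=RANK.__getitem__)


def tabulate_categorical(values):
    """Treat the variable as plain categorical: count each level, ignore order."""
    counts = Counter(values)
    return {level: counts[level] for level in LEVELS}
```

Sorting `["high", "low", "medium", "low"]` by rank gives `["low", "low", "medium", "high"]`, whereas Python's default alphabetical sort would put "high" first. The ordering is exactly the information that treating the variable as categorical sets aside.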
Correlation and Causation
- A positive correlation means that as one variable increases (e.g., ice cream consumption) the other variable also increases (e.g., crime).
- A negative correlation is just the opposite; as one variable increases (e.g., socioeconomic status), the other variable decreases (e.g., infant mortality rates).
- Causation refers to a relationship between two (or more) variables where one variable causes the other.
- Change in the independent variable must precede change in the dependent variable in time.
- It must be shown that a different (third) variable is not causing the change in the two variables of interest; if one is, the correlation is spurious.
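The sign of the Pearson correlation coefficient captures the positive/negative distinction above, and the third-variable criterion explains why even a strong correlation does not establish causation. A minimal Python sketch; `pearson_r` is a hypothetical name, and the demo series are invented for illustration.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient, between -1 and 1.

    r > 0: as x increases, y tends to increase (positive correlation).
    r < 0: as x increases, y tends to decrease (negative correlation).
    r says nothing about which variable, if either, causes the other.
    """
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented data: a third variable drives both series; neither causes the other.
    temperature = [1, 2, 3, 4, 5, 6]
    ice_cream = [2 * t + e for t, e in zip(temperature, [0.5, -0.5, 0.3, -0.3, 0.1, -0.1])]
    crime = [3 * t + e for t, e in zip(temperature, [-0.4, 0.4, 0.2, -0.2, 0.6, -0.6])]
    print(round(pearson_r(ice_cream, crime), 3))
```

In the demo, both invented series rise with the same third variable, so their correlation comes out close to 1 even though neither series affects the other.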
Controlling for a Variable
- Controlling for a variable is a method to reduce the effect of extraneous variations that may also affect the value of the dependent variable.
- For instance, temperature is a continuous variable, while the number of legs of an animal is a discrete variable.
- There are also quasi-independent variables, which are used by researchers to group things without affecting the variable itself.
- In a scientific experiment measuring the effect of one or more independent variables on a dependent variable, controlling for a variable is a method of reducing the confounding effect of variations in a third variable that may also affect the value of the dependent variable.
- The failure to do so results in omitted-variable bias.
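Regression adjustment is one common way to control for a variable (randomization and stratification are others). The Python sketch below compares the slope of y on x alone with the slope of x when a third variable z is included, which illustrates omitted-variable bias. The helper names are hypothetical, and the two-predictor fit solves the least-squares normal equations on centered data.

```python
from statistics import fmean


def _centered_sum(a, b):
    """S_ab: sum of products of deviations from the respective means."""
    a_bar, b_bar = fmean(a), fmean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def slope_ignoring(x, y):
    """Slope of y on x alone; the third variable is omitted."""
    return _centered_sum(x, y) / _centered_sum(x, x)


def slopes_controlling_for(x, z, y):
    """Slopes of y on x and z together: the effect of x with z held fixed.

    Solves [[Sxx, Sxz], [Sxz, Szz]] @ [b_x, b_z] = [Sxy, Szy] by Cramer's rule.
    """
    sxx, szz, sxz = _centered_sum(x, x), _centered_sum(z, z), _centered_sum(x, z)
    sxy, szy = _centered_sum(x, y), _centered_sum(z, y)
    det = sxx * szz - sxz ** 2
    if det == 0:
        raise ValueError("x and z are perfectly collinear")
    b_x = (sxy * szz - sxz * szy) / det
    b_z = (szy * sxx - sxz * sxy) / det
    return b_x, b_z
```

For instance, with hypothetical data z = [0, 1, 2, 3, 4, 5], x equal to z plus the offsets [1, -1, 0, 1, -1, 0], and y = x + 3z, `slope_ignoring(x, y)` returns about 3.66 because x partly stands in for z, while `slopes_controlling_for(x, z, y)` recovers (1, 3) up to rounding.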
An alternative test statistic
- Recall that $R^2$ described the proportion of variability in the response variable ($y$) explained by the explanatory variable ($x$).
- If this proportion is large, then this suggests a linear relationship exists between the variables.
- This concept – considering the amount of variability in the response variable explained by the explanatory variable – is a key component in some statistical techniques.
- The method states that if enough variability is explained away by the categories, then we would conclude that the mean varies between the categories.
- On the other hand, we might not be convinced if only a little variability is explained.
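Both ideas reduce to the same ratio: one minus residual variability over total variability. For a regression, the fitted values come from the line; for categories, each observation is predicted by its group mean, which gives the between-group share of variability that ANOVA builds on. A Python sketch with hypothetical helper names:

```python
from statistics import fmean


def r_squared(y, fitted):
    """Proportion of the variability in y explained by the fitted values.

    R^2 = 1 - SS_residual / SS_total, where SS_total is the spread around the mean.
    """
    y_bar = fmean(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    if ss_total == 0:
        raise ValueError("y has no variability to explain")
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ss_resid / ss_total


def explained_by_groups(y, groups):
    """Share of variability explained by categories: predict each y by its group mean."""
    means = {g: fmean([yi for yi, gi in zip(y, groups) if gi == g]) for g in set(groups)}
    return r_squared(y, [means[g] for g in groups])
```

For `y = [1, 2, 3, 7, 8, 9]` split into groups `a, a, a, b, b, b`, the group means are 2 and 8, and `explained_by_groups` returns 54/58, about 0.93: most of the variability is explained by the categories. If both groups had the same mean, it would return 0.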
Email data
- The email data set was first presented in Chapter 1 with a relatively small number of variables. In fact, there are many more variables available that might be useful for classifying spam. Descriptions of these variables are presented in Table 8.13. The spam variable will be the outcome, and the other 10 variables will be the model predictors. While we have limited the predictors used in this section to be categorical variables (where many are represented as indicator variables), numerical predictors may also be used in logistic regression.
- Recall from Chapter 7 that if outliers are present in predictor variables, the corresponding observations may be especially influential on the resulting model.
- This is the motivation for omitting the numerical variables, such as the number of characters and line breaks in emails, that we saw in Chapter 1.
- These variables exhibited extreme skew.
- We could resolve this issue by transforming these variables (e.g., using a log transformation), but we will omit this further investigation for brevity.
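A minimal Python sketch of how a log transformation tames right skew, measured with the moment (population) skewness. The helper names and the counts are hypothetical. Using `log1p` (the log of 1 + v) rather than a plain log is an assumption made here so that zero counts stay defined.

```python
from math import log1p
from statistics import fmean, pstdev


def skewness(values):
    """Moment skewness: the average cubed z-score (0 for symmetric data)."""
    m, s = fmean(values), pstdev(values)
    if s == 0:
        raise ValueError("values must vary")
    return fmean([((v - m) / s) ** 3 for v in values])


def log_transform(values):
    """Apply log(1 + v): compresses large values, keeps zero counts finite."""
    return [log1p(v) for v in values]


if __name__ == "__main__":
    # Invented counts with a long right tail (a few very large values).
    counts = [1, 3, 10, 30, 100, 300, 1000]
    print(round(skewness(counts), 2), round(skewness(log_transform(counts)), 2))
```

On the invented counts, the skewness drops from about 1.7 to below 0.5 after the transformation, so the few extreme values no longer dominate.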