Logistic regression models have a binary or categorical variable as their dependent variable.
Types of logistic regression models
| Model | Dependent variable | Independent variables | Example |
|---|---|---|---|
| Binary Logit | Binary → two outcomes | Individual-level | Does a person vote or not? |
| Multinomial Logit | Unordered categorical → three or more outcomes | Individual-level | What is the probability a person votes for the Greens (compared to other parties)? |
| Conditional Logit | Choice among alternatives → dependent on attributes of alternatives | Alternative-level (+ individual-level) | Why does a voter choose one candidate over another based on candidate attributes? |
Binary Logit
Binary logistic regression models have a binary dependent variable, coded 0 (no) and 1 (yes).
The goal is:
- to estimate how likely it is that the variable is true (i.e. takes the value 1).
- to see how each of the independent variables affects that probability
Example
- A person voted in the last election
- A person voted by mail in the last election
- A person voted for a specific party, e.g. the Greens
Why can’t you use a linear (OLS) regression for these kinds of questions?
Because it violates some of the OLS assumptions and therefore leads to incorrect results.
- OLS can predict probabilities that are below 0 and above 1 which doesn’t make sense given the dependent variable can only take two values
- The homoskedasticity assumption is violated (the variance of a binary outcome depends on its mean)
- Linear relationship is inappropriate for binary outcomes
Model formula:
$$\Pr(y_i = 1 \mid x_i) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$$
- $\Pr(y_i = 1 \mid x_i)$: The model predicts the probability for observation $i$ that the dependent variable equals 1. Example: the probability that a person voted in an election.
- $x_i'\beta = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}$ is a linear predictor. It can take any real value.
| Logistic function | Log odds (logit) |
|---|---|
| Probability between 0 and 1 | Can take any real value (-∞ to +∞) |
| Non-linear | Linear in predictors |
| Interpreted as expected probability | Interpreted as ratio of success/failure odds |
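The two columns of the table are linked by a pair of inverse transformations. A minimal sketch in Python (the value 0.71 is just an illustrative linear predictor, not from any real model):

```python
import math

def logistic(eta):
    """Map a linear predictor (any real value) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    """Map a probability in (0, 1) back to the log-odds scale."""
    return math.log(p / (1.0 - p))

# The two functions are inverses of each other:
eta = 0.71
p = logistic(eta)
print(round(p, 3))         # a probability between 0 and 1
print(round(logit(p), 2))  # recovers the original linear predictor, 0.71
```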
Multinomial Logit
Multinomial logistic regression models have a categorical dependent variable (e.g. which party someone votes for in an election).
Whereas the binary model only has two outcomes, the multinomial model can have more than two outcomes.
Model formula
Probability for the alternatives:
$$\Pr(y_i = m \mid x_i) = \frac{\exp(x_i'\beta_m)}{1 + \sum_{j \neq b} \exp(x_i'\beta_j)}$$
The model gives the probability for the different alternatives $m$ compared to a reference alternative $b$.
- $\Pr(y_i = m \mid x_i)$ = Probability that observation $i$ chooses category $m$, conditional on a set of predictors
- $m$ = Alternatives of the outcome category
- $b$ = Reference alternative
- $x_i$ = Vector of predictors for observation $i$
- $x_i'\beta_m$ = Linear predictor (one for each category, each time with its own coefficients, scaled in log-odds)
- The denominator normalises the probabilities. It sums the exponentiated linear predictors over all categories, including the reference category. This way the resulting probabilities are positive and sum up to 1.
- The 1 represents the reference category, since $\exp(0) = 1$.
- The sum $\sum_{j \neq b} \exp(x_i'\beta_j)$ covers the linear predictors of all other categories.
Probability for the reference category:
$$\Pr(y_i = b \mid x_i) = \frac{1}{1 + \sum_{j \neq b} \exp(x_i'\beta_j)}$$
For the numerator: the linear predictor of the reference category is fixed at 0. As it is exponentiated like the other predictors, it takes the value 1 ($\exp(0) = 1$).
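The normalisation can be made concrete with a short sketch. The linear-predictor values below are made up purely for illustration; the point is that the "1" in the denominator stands in for the reference category:

```python
import math

# Hypothetical linear predictors x_i'beta_m for two non-reference categories;
# the reference category's linear predictor is fixed at 0, so exp(0) = 1.
eta = {"Greens": 0.8, "SPD": 0.3}

denom = 1.0 + sum(math.exp(v) for v in eta.values())  # the "1" is the reference

probs = {m: math.exp(v) / denom for m, v in eta.items()}
probs["reference"] = 1.0 / denom

print(probs)
print(sum(probs.values()))  # all positive and summing to 1
```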
Conditional Logit
A conditional logit models choices as the result of utility maximisation.
The goal is to examine how individual-specific characteristics and alternative-specific characteristics influence the probability of choosing an alternative $j$ from a set of alternatives $J$.
$$U_{ij} = \alpha_j + x_i'\gamma_j + z_{ij}'\beta$$
- $U_{ij}$ = the expected utility for individual $i$ choosing alternative $j$
- $\alpha_j$ = an alternative-specific intercept (each choice can have its own baseline attractiveness)
- $x_i'\gamma_j$ = a vector of individual-specific characteristics (age, income, etc.) with their alternative-specific coefficients $\gamma_j$ → tells us how the individual characteristics affect the utility of alternative $j$
- $z_{ij}'\beta$ = a vector of alternative-specific characteristics that vary by individual. This term captures the effect of characteristics that depend both on the alternative and the individual
Example
Suppose a voter is choosing among three candidates in an election.
- Alternative-specific intercept: Things like the general popularity of the candidates
- Individual characteristics: Age and income of the voter
- Because each candidate might appeal differently to people of different ages and with different incomes, these characteristics have alternative-specific coefficients.
- Alternative-specific characteristics: Campaign-spending per voter
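The example above can be sketched numerically. All intercepts, coefficients, and attribute values below are invented for illustration; the sketch only shows how utilities for three candidates translate into choice probabilities:

```python
import math

# Conditional-logit sketch: U_ij = alpha_j + gamma_j * age_i + beta * spend_ij.
# Candidate "A" serves as the reference (intercept and age coefficient 0).
alpha = {"A": 0.0, "B": 0.5, "C": -0.2}        # alternative-specific intercepts
gamma_age = {"A": 0.0, "B": -0.02, "C": 0.01}  # alt-specific coefficients on voter age
beta_spend = 0.3                               # generic coefficient on campaign spending

age = 40                                       # individual-specific characteristic
spend = {"A": 1.0, "B": 2.0, "C": 0.5}         # spending per voter (varies by alternative)

U = {j: alpha[j] + gamma_age[j] * age + beta_spend * spend[j] for j in alpha}
denom = sum(math.exp(u) for u in U.values())
probs = {j: math.exp(U[j]) / denom for j in U}
print(probs)  # choice probabilities, summing to 1
```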
Difference between Multinomial and Conditional Logit
The core difference between the two models is that the Conditional Logit allows one to specify alternative-specific characteristics, such as ideological distances (which vary across alternatives and individuals) in addition to individual-specific characteristics such as age.
Interpretations of coefficients
- Odds scale → log-odds, odds, odds ratio
- Probability scale → predicted probabilities, marginal effects
Log odds (logits)
The coefficients of the linear predictor take the form of log odds (logarithmised odds).
Formula:
$$\ln\left(\frac{\Pr(y = 1)}{1 - \Pr(y = 1)}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$
Interpretation:
“If $x_k$ increases by 1 unit, the log-odds of $y$ change by $\beta_k$, holding all other variables constant”
Because they are logarithmised, they are hard to interpret substantively. Only the direction of the effect (positive / negative sign) can be interpreted directly.
The advantage is that their value does not depend on the levels of the independent variables.
Example
“For every unit increase in the year of schooling, the log-odds of going to vote (versus non-voting) increase by 0.71”
The log-odds are additive (as in a linear model).
Multinomial logit models
The interpretation of the log odds always has to be relative to the specified base alternative $b$.
Interpretation:
“For a one-unit change in $x_k$, the logit of outcome $m$ versus outcome $b$ is expected to change by $\beta_{mk}$ units, holding all other variables constant”
Conditional logit models
Odds
The odds are the ratio of two probabilities:
- the probability that $y$ is true (takes the value 1)
- the probability that $y$ is false (takes the value 0)
$$\text{odds} = \frac{\Pr(y = 1)}{\Pr(y = 0)} = \frac{\Pr(y = 1)}{1 - \Pr(y = 1)}$$
Percentage change in odds:
$$\%\Delta\,\text{odds} = 100 \cdot \left(e^{\beta_k \cdot \delta} - 1\right)$$
- $\beta_k$: Coefficient of the predictor (from the logistic regression output)
- $\delta$: Size of change in units of the predictor
Hint
Odds can only be determined for specific observations, as you need information on all covariates.
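A short numerical sketch of both formulas (the probability 0.8 and the coefficient 0.71 are invented for illustration):

```python
import math

# Odds as the ratio of the two probabilities:
p = 0.8                      # P(y = 1)
odds = p / (1 - p)           # 0.8 / 0.2 = 4: "yes" is four times as likely as "no"
print(odds)

# Percentage change in the odds for a change of delta units in a predictor
# with a hypothetical coefficient beta:
beta, delta = 0.71, 1
pct_change = 100 * (math.exp(beta * delta) - 1)
print(round(pct_change, 1))  # the odds roughly double
```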
Multinomial Models → Relative Risks
For multinomial models, the odds (also called relative risks) are the ratio of:
- the risk / probability of outcome $m$
- the risk / probability of the base outcome $b$
Odds Ratios
The odds ratio compares two odds at different levels. It usually refers to the change in odds when the value of the independent variable changes by 1 unit.
To get the odds ratios, the log odds (the coefficients) are exponentiated: $OR = e^{\beta}$. Odds ratios are multiplicative in the parameters.
Interpretation:
- $e^{\beta} > 1$: the odds increase (they are $e^{\beta}$ times larger)
- $e^{\beta} = 1$: the odds do not change
- $e^{\beta} < 1$: the odds decrease (they are $1/e^{\beta}$ times smaller)
Odds ratios can also be interpreted as percentage changes:
If $x$ changes by $\delta$ units, the odds change by a factor (important, because it indicates that the change is multiplicative instead of additive) of $e^{\beta \cdot \delta}$.
Example
We use a binary logistic regression model to predict the probability that a person goes to vote. The independent variable we use is age.
The logistic regression output gives 0.05 as the coefficient for age ($\beta_{\text{age}} = 0.05$).
The odds ratio gives us the increase in odds (not in probability!) for one additional unit of age: $e^{0.05} \approx 1.051$.
The same can directly be calculated as a percentage change: $100 \cdot (e^{0.05} - 1) \approx 5.1\,\%$.
If we are interested in the change in odds for multiple years, we can simply change the factor: for ten additional years, $e^{0.05 \cdot 10} = e^{0.5} \approx 1.649$.
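The example can be checked directly (the coefficient 0.05 is the one from the example above):

```python
import math

beta_age = 0.05  # coefficient for age from the example regression output

# Odds ratio for one additional year of age:
or_1 = math.exp(beta_age)
print(round(or_1, 3))        # ≈ 1.051 → odds increase by about 5.1 %

# The same as a percentage change:
print(round(100 * (math.exp(beta_age) - 1), 1))  # ≈ 5.1

# Change for ten additional years (multiplicative, not additive):
or_10 = math.exp(beta_age * 10)
print(round(or_10, 3))       # ≈ 1.649, not 10 * 1.051
```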
Multinomial Models → Relative Risk Ratio
For multinomial models, the exponentiated coefficients are called relative risk ratios; they are interpreted relative to the base outcome $b$.
Predicted probabilities
It is also possible to calculate $\Pr(y = 1)$ for specific values of the predictors using the probability formula.
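A minimal sketch of this calculation, using hypothetical coefficients (intercept and age) from a fitted binary logit:

```python
import math

# Hypothetical fitted coefficients for illustration only:
b0, b_age = -2.0, 0.05
age = 50  # the specific value we are interested in

eta = b0 + b_age * age            # linear predictor: -2.0 + 0.05 * 50 = 0.5
p = 1.0 / (1.0 + math.exp(-eta))  # logistic transformation into a probability
print(round(p, 3))                # predicted probability of voting at age 50
```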
Marginal effects
- A marginal effect shows how much the probability changes when a predictor changes slightly.
- In a non-linear model, the effect depends on the values of the predictors.
- Since each observation has different predictor values, the marginal effect can be different for each individual.
Example
In a linear regression model with the independent variable “age”, the effect of that variable on the dependent variable is the same whether a person is 20 or 50 years old. The coefficient stays the same across all values of $x$.
In a logistic regression model the effect of the coefficients is not linear. For example, the effect of age on whether a person votes or not is different at age 20 than it is at age 50.
Types of marginal effects:
| Type | Where the effect is evaluated | Interpretation |
|---|---|---|
| Average Marginal Effects | Average of the individual marginal effects over all observations | Population-average effect (on average for everyone) |
| Marginal Effects at the Means | Marginal effect at the mean values of all predictors (independent variables) | Typical-case effect (for a single, representative individual) |
| Marginal Effects at a specific (representative) value | Marginal effect at specified values for all predictors (you choose your own values that you think are representative) | Effect for selected cases (e.g. male / female or low / high income) |
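The difference between the first two types can be made concrete with a numerical sketch. The coefficients and the four ages are invented for illustration; the marginal effect of a predictor in a binary logit is $\beta_k \cdot p \cdot (1 - p)$:

```python
import math

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fitted binary logit with a single predictor (age):
b0, b1 = -2.0, 0.05
ages = [20, 35, 50, 65]

def me(age):
    """Marginal effect of age at a given age: dP/dx = b1 * p * (1 - p)."""
    p = logistic(b0 + b1 * age)
    return b1 * p * (1 - p)

ame = sum(me(a) for a in ages) / len(ages)  # Average Marginal Effect
mem = me(sum(ages) / len(ages))             # Marginal Effect at the Mean
print(round(ame, 4), round(mem, 4))         # 0.0106 0.0125 — the two differ
```

Because the model is non-linear, averaging the individual effects (AME) and evaluating the effect at the average person (MEM) generally give different numbers.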
Margin plots
You basically plot the probabilities that the event occurs (e.g. that a person votes) for each level of one of the explanatory variables while holding the other variables in the model at a specific value.
Estimation method: Maximum likelihood estimation
Logistic regression models are estimated using maximum likelihood estimation. The estimates from this method are the values of the model parameters that have the highest likelihood of generating the observed sample of the data.
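The principle can be illustrated with a toy case. For a sample of 0/1 outcomes and no predictors, the parameter with the highest likelihood of generating the data is simply the sample share of 1s; a crude grid search (illustrative only, real software uses iterative methods such as Fisher scoring) finds exactly that:

```python
import math

# Toy sample of binary outcomes (7 ones, 3 zeros → sample mean 0.7):
y = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

def loglik(p):
    """Log-likelihood of an intercept-only model with P(y = 1) = p."""
    return sum(math.log(p) if yi == 1 else math.log(1 - p) for yi in y)

# Grid search over candidate probabilities:
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=loglik)
print(p_hat)  # 0.7 — the sample mean, as ML theory predicts
```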
Measures of model quality
McFadden's Pseudo-R²
$$R^2_{\text{McFadden}} = 1 - \frac{\ln L_1}{\ln L_0}$$
The resulting value (which can range from 0 to 1) indicates by how much the log-likelihood improves when including the independent variables.
It basically compares the estimated model with a null model (i.e. one without independent variables) and tells us how much more probable the data is compared to the empty model.
- $\ln L_1$: Log-likelihood of the estimated model
- $\ln L_0$: Log-likelihood of the model without independent variables (i.e. one in which all coefficients are 0, except the constant term)
Hint
The log-likelihood $\ln L$ is the natural logarithm of the likelihood function $L$.
The values of McFadden's Pseudo-R² range from 0 to 1. Values between 0.1 and 0.2 already indicate a relatively good fit (unlike with the R² for OLS models).
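A quick calculation using the deviances from the `glm()` output in the Application in R section of these notes (null deviance 1893.8, residual deviance 1779.9); the deviance reported by R is $-2 \ln L$, so it converts directly to log-likelihoods:

```python
# Deviance = -2 * log-likelihood, so divide by -2 to recover ln L:
ll_null = -1893.8 / 2   # from the null deviance in the glm() output
ll_full = -1779.9 / 2   # from the residual deviance in the glm() output

pseudo_r2 = 1 - ll_full / ll_null
print(round(pseudo_r2, 3))  # ≈ 0.06 → a modest improvement over the null model
```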
Likelihood-Ratio-Test
The Likelihood-Ratio-Test asks whether the model fits the data significantly better than a null model (i.e. a model without predictors).
$$G^2 = 2 \cdot (\ln L_1 - \ln L_0)$$
Again:
- $\ln L_1$: Log-likelihood of the estimated model
- $\ln L_0$: Log-likelihood of the model without independent variables (i.e. one in which all coefficients are 0, except the constant term)
Values:
- $G^2$ is measured in log-likelihood units
- $G^2$ close to 0 → the additional variables add little to nothing
- $G^2$ larger than 0 → the additional variables improve the fit
Test hypothesis:
- $H_0$: The additional parameters are (altogether) 0
- $H_1$: The additional parameters are (altogether) different from 0
$G^2$ follows a chi-square distribution. Small $p$-values (< 0.05) → $H_0$ can be rejected, which means that the additional variables (altogether) improve the fit of the model. The test is the equivalent of the F-test for OLS models.
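With the deviances from the `glm()` output in the Application in R section of these notes, the test statistic is simply the drop in deviance, since the deviance is $-2 \ln L$:

```python
# G² = 2 * (ln L1 - ln L0) = null deviance - residual deviance:
g2 = 1893.8 - 1779.9   # deviances from the glm() output below
df = 3                 # three added predictors: frau, alter, demo
print(round(g2, 1))    # 113.9

# The chi-square critical value with 3 degrees of freedom at the 5 % level
# is about 7.81, so G² = 113.9 lies far in the rejection region → reject H0.
```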
Application in R
Binomial Logit Model
Command:
`glm()` function (base R, stats package)
logit_model <- glm(enth ~ frau + alter + demo,
data = clean_data_btw09,
family = binomial(link="logit"))
summary(logit_model)
Output:
Call:
glm(formula = enth ~ frau + alter + demo, family = binomial(link = "logit"),
data = clean_data_btw09)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.600236 0.187318 -3.204 0.00135 **
frau 0.097672 0.120252 0.812 0.41666
alter -0.014744 0.003423 -4.307 1.66e-05 ***
demo -1.423570 0.169070 -8.420 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1893.8 on 2103 degrees of freedom
Residual deviance: 1779.9 on 2100 degrees of freedom
AIC: 1787.9
Number of Fisher Scoring iterations: 5
Marginal Effects
Plot:
Conditional Logit Model
Generic specification (can be seen from the distance variable and the estimated coefficients)
> summary(clogit_model_gles25)
Call:
mclogit(formula = cbind(chosen, ID) ~ asc_spd + asc_fdp + asc_gruene +
asc_linke + asc_afd + asc_bsw + disimmi + voter_satisfaction_demo_spd +
voter_satisfaction_demo_fdp + voter_satisfaction_demo_gruene +
voter_satisfaction_demo_linke + voter_satisfaction_demo_afd +
voter_satisfaction_demo_bsw + voter_age_spd + voter_age_fdp +
voter_age_gruene + voter_age_linke + voter_age_afd + voter_age_bsw,
data = gles_25_LONG)
Estimate Std. Error z value Pr(>|z|)
asc_spd -0.630705 0.451182 -1.398 0.162144
asc_fdp -1.434503 0.565712 -2.536 0.011221 *
asc_gruene 1.708457 0.405892 4.209 2.56e-05 ***
asc_linke -0.038676 0.474478 -0.082 0.935033
asc_afd -4.971085 0.607000 -8.190 2.62e-16 ***
asc_bsw -4.642590 0.703062 -6.603 4.02e-11 ***
disimmi -0.427625 0.016412 -26.056 < 2e-16 ***
voter_satisfaction_demo_spd -0.324406 0.142732 -2.273 0.023036 *
voter_satisfaction_demo_fdp 0.487301 0.173556 2.808 0.004989 **
voter_satisfaction_demo_gruene -0.301205 0.132570 -2.272 0.023084 *
voter_satisfaction_demo_linke 0.496719 0.146618 3.388 0.000704 ***
voter_satisfaction_demo_afd 1.889413 0.164997 11.451 < 2e-16 ***
voter_satisfaction_demo_bsw 1.232663 0.178160 6.919 4.55e-12 ***
voter_age_spd 0.012460 0.005740 2.171 0.029940 *
voter_age_fdp -0.024976 0.007393 -3.378 0.000729 ***
voter_age_gruene -0.020292 0.005177 -3.920 8.87e-05 ***
voter_age_linke -0.033104 0.006206 -5.334 9.62e-08 ***
voter_age_afd -0.022761 0.006947 -3.276 0.001052 **
voter_age_bsw -0.001857 0.008337 -0.223 0.823779
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 6815
Residual Deviance: 4349
Number of Fisher Scoring iterations: 6
Number of observations: 1751
The logit for disimmi of -0.427625 means that the estimated logarithmic chance (log-odds) of voting for any party, as compared to voting for the CDU/CSU, decreases by 0.427625 when disimmi (the distance between the individual's position on immigration and a party's position on this issue) increases by one unit, c.p. Because disimmi is a generic distance measure, its coefficient does not change across alternatives.
Example with an alternative-specific constant: The logit for asc_gruene (1.708457) means that the estimated log-odds of voting for the Gruene are 1.708457 larger than those of voting for the CDU/CSU, c.p.
Or: The alternative-specific constant for Gruene (1.708457) means that, when all other variables in the model are zero (or at their reference levels), the log-odds of choosing Gruene over the reference category (CDU/CSU) are 1.708457 higher.