Account for U-shaped distribution in Logistic Regression

Thomas Selleslagh

Join Date: May 2020

Posts: 3
#1


08 May 2020, 08:39

Hello everyone,

I hope this question fits here. I've used this forum a lot, but my current problem has finally made me create an account.

I want to run a logistic regression to predict "fail" (company bankruptcy). One of my independent variables, "vvlt_twelve", has a U-shaped distribution.
vvlt_twelve is a factor variable with 12 levels that represent a solvency score for a company (1 being the worst score and 12 the best possible score). It was derived from a continuous variable.

As you can see in the output below, 76% of all observations fall in the two extreme levels (1 and 12) of "vvlt_twelve", the worst and best scores respectively.
35% (5,426 observations) of all bankruptcies have the highest/best possible score on this variable.
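
A table of this kind could be produced along the following lines. This is only a minimal sketch: it uses the variable names `fail` and `vvlt_twelve` from this post and assumes that `fail` is coded 0/1.

```stata
* Cross-tabulate the solvency score against the outcome.
* The column option reports, within each value of fail,
* the share of observations at each score level.
tabulate vvlt_twelve fail, column

* Observed bankruptcy rate at each score level
* (the mean of fail is a proportion only if fail is coded 0/1).
tabulate vvlt_twelve, summarize(fail) means freq
```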

As I believe this distorts my results, is there a way to account for this kind of distribution?

Some people might want to know why the variable (vvlt_twelve) was transformed into a factor variable. This was done because the continuous variable is a ratio in which higher values represent lower solvency scores. However, the continuous variable can also take negative values, which are even worse for solvency than a high value. As such, the lowest score of the factor variable (1) represents the negative values of the continuous variable. The remaining 11 levels of the factor variable (2 - 12) represent the positive values of the continuous variable in decreasing order.

Thank you in advance for your feedback.
Bruce Weaver

Join Date: May 2014

Posts: 1119
#2

08 May 2020, 09:54

Hi Thomas Selleslagh. Thanks for explaining why the original continuous variable has been converted to a factor variable. That was going to be my first question!

Have you considered polynomial contrasts (via p. or q. prefixes)?
https://www.stata.com/manuals/rcontrast.pdf
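
A minimal sketch of how this might look, assuming the outcome `fail` and the factor `vvlt_twelve` from post #1 and no other covariates (the original model specification is not shown in this thread):

```stata
* Fit the logit with vvlt_twelve entered as a factor variable
logit fail i.vvlt_twelve

* Orthogonal polynomial contrasts across the 12 levels.
* p. uses the level values; q. treats the levels as equally spaced.
* With levels coded 1 to 12 the two should agree.
contrast p.vvlt_twelve, noeffects

* Restrict attention to the linear, quadratic, and cubic terms;
* a U shape would be expected to show up in the quadratic contrast.
contrast p(1/3).vvlt_twelve, noeffects
```

The `noeffects` option suppresses the table of individual contrast estimates and leaves only the Wald tests; drop it to see the estimates themselves.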

HTH.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
Thomas Selleslagh

Join Date: May 2020

Posts: 3
#3

08 May 2020, 11:37

Hi Bruce Weaver. Thank you for your help!
I am not familiar with this technique and will read the documentation you provided. I will report back with the results.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#4

11 May 2020, 12:51

To add to Bruce's helpful comment: when you talk about bankruptcy, it makes me think that you have panel data. With panel data, including a squared term (or other polynomial terms) can be problematic; there is a paper by Shaver in Strategy Science that addresses the problem of squared terms in fixed effects estimation.