
  • Account for U-shaped distribution in Logistic Regression

    Hello everyone,

    I hope this question fits here. I've used this forum a lot, but my current problem finally prompted me to create an account.

    I want to run a logistic regression to predict "fail" (company bankruptcy). One of my independent variables, "vvlt_twelve", has a U-shaped distribution.
    Vvlt_twelve is a factor variable with 12 levels that represent a score for the solvency of a company (1 being the worst score and 12 the best possible score). It is derived from a continuous variable.

    As you can see in the output below, 76% of all observations fall in the two extreme levels (1 and 12) of "vvlt_twelve", the worst and best scores respectively.
    35% (5,426 observations) of all bankruptcies have the highest (best) possible score on this variable.

    Since I believe this distorts my results, is there a way to account for this kind of distribution?

    [Image: vvlt_twelve.PNG]


    Some of you may wonder why vvlt_twelve was converted to a factor variable. The underlying continuous variable is a ratio in which higher values indicate lower solvency. It can also take negative values, however, and a negative value is even worse for solvency than a high positive value. The lowest level of the factor variable (1) therefore represents the negative values of the continuous variable, and the remaining 11 levels (2 to 12) represent its positive values in decreasing order.


    Thank you in advance for your feedback.

  • #2
    Hi Thomas Selleslagh. Thanks for explaining why the original continuous variable has been converted to a factor variable. That was going to be my first question!

    Have you considered polynomial contrasts (via p. or q. prefixes)? HTH.
    --
    Bruce Weaver
    Version: Stata/MP 18.5 (Windows)
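
For readers unfamiliar with the polynomial contrasts suggested in #2, the following is a minimal sketch of what such contrasts look like for a 12-level factor. It is plain Python for illustration only, not Stata syntax, and it assumes the 12 levels are equally spaced. The function name `orthogonal_polynomial_contrasts` is hypothetical. The quadratic contrast it produces is itself U-shaped (largest weights at levels 1 and 12), so it would test for a U-shaped trend in the outcome across levels, which is a separate matter from how the observations are distributed over the levels.

```python
import math


def orthogonal_polynomial_contrasts(n_levels, max_degree):
    """Return unit-length contrast vectors (linear, quadratic, ...)
    for a factor with equally spaced levels 1..n_levels."""
    levels = [float(i) for i in range(1, n_levels + 1)]
    mean = sum(levels) / n_levels
    centered = [x - mean for x in levels]

    # Start from the constant vector so every later vector sums to zero.
    basis = [[1.0 / math.sqrt(n_levels)] * n_levels]

    for degree in range(1, max_degree + 1):
        vec = [x ** degree for x in centered]
        # Gram-Schmidt step: remove the part explained by lower degrees.
        for b in basis:
            dot = sum(v * w for v, w in zip(vec, b))
            vec = [v - dot * w for v, w in zip(vec, b)]
        norm = math.sqrt(sum(v * v for v in vec))
        basis.append([v / norm for v in vec])

    return basis[1:]  # drop the constant vector


# Assumption: 12 equally spaced levels, as for vvlt_twelve in the question.
linear, quadratic = orthogonal_polynomial_contrasts(12, 2)

for level, (lin, quad) in enumerate(zip(linear, quadratic), start=1):
    print(f"level {level:2d}  linear {lin:+.3f}  quadratic {quad:+.3f}")
```

The linear weights rise steadily from level 1 to level 12, while the quadratic weights are positive at both ends and negative in the middle. The two vectors are orthogonal, so each contrast captures a distinct component of the trend across levels.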



    • #3
      Hi Bruce Weaver. Thank you for your help!
      I am not familiar with this technique and will read the documentation you provided. I will come back to you with the results.



      • #4
        To add to Bruce's helpful comment, when you talk about bankruptcy it makes me think that you have panel data. With panel data, including the squared term or whatever can be problematic – there is a paper by Shaver in Strategy Science that addresses the problem of squared terms in fixed effects estimation.
