  • Assistance Needed: Optimizing Logistic Regression Model for Unbalanced Panel Data in Stata

    Dear Stata-Community,

    I am struggling to set up a logistic regression model in Stata. I am working with a highly unbalanced panel of firm-level observations and a binary (0/1) dependent variable. Because the model may need fixed effects at both the nation and the year level, I would appreciate your expertise in specifying it correctly.

    Here's a brief summary of my dataset:
    NATION   | YEAR | DEP. VAR. (0 or 1) | VAL. DEP. VAR. | IND. VAR.    | VAL. IND. VAR.
    ---------+------+--------------------+----------------+--------------+---------------
    Nation A | 2009 | Variable Dep       | Value 1        | Variable Ind | Value 1
    Nation A | 2011 | Variable Dep       | Value 2        | Variable Ind | Value 2
    Nation B | 2009 | Variable Dep       | Value 3        | Variable Ind | Value 3
    Nation B | 2010 | Variable Dep       | Value 4        | Variable Ind | Value 4
    Nation B | 2011 | Variable Dep       | Value 5        | Variable Ind | Value 5
    Nation B | 2012 | Variable Dep       | Value 6        | Variable Ind | Value 6
    Nation B | 2013 | Variable Dep       | Value 7        | Variable Ind | Value 7
    Nation C | 2012 | Variable Dep       | Value 8        | Variable Ind | Value 8
    Nation D | 2011 | Variable Dep       | Value 9        | Variable Ind | Value 9
    Nation D | 2012 | Variable Dep       | Value 10       | Variable Ind | Value 10
    Nation D | 2013 | Variable Dep       | Value 11       | Variable Ind | Value 11

    Despite several attempts, I have not been able to identify the most suitable model specification. The options I have tried include:
    1. logit DEPVAR INDVAR, vce(cluster CommonIdentifier_NATION_YEAR)
    2. logit DEPVAR INDVAR i.NATIONDUMMY i.YEARDUMMY
    3. logit DEPVAR INDVAR i.NATIONDUMMY, vce(cluster YEAR)
    4. logit DEPVAR INDVAR i.YEARDUMMY, vce(cluster NATION)
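A minimal sketch of the dummy-plus-clustering setup with valid syntax, assuming NATION is stored as text (as in the excerpt above). Factor-variable notation (i.) and -xtset- both need numeric variables, so the nation name is encoded first, and -vce()- expects the keyword cluster followed by a variable. The names nation_id and nation_year are hypothetical, and the data are simulated only so the block runs on its own; replace the simulation with the real dataset.

```stata
* Simulated stand-in for the real data (hypothetical values):
* 4 nations x 5 years, 20 observations per nation-year cell
clear
set seed 12345
set obs 400
generate NATION = "Nation " + char(65 + mod(_n, 4))
generate YEAR = 2009 + mod(_n, 5)
generate INDVAR = rnormal()
generate DEPVAR = runiform() < invlogit(0.5 * INDVAR)

* i. and xtset need numeric variables: encode the nation name
encode NATION, generate(nation_id)

* One identifier per nation-year cell, for use as a cluster variable
egen nation_year = group(nation_id YEAR)

* Nation and year dummies, standard errors clustered by nation-year cell
logit DEPVAR INDVAR i.nation_id i.YEAR, vce(cluster nation_year)
```

Clustering by year or by nation alone follows the same pattern, for example vce(cluster YEAR) or vce(cluster nation_id).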
    I also tried -xtlogit- for panel data, but declaring the panel structure failed with the following error:

    xtset NATION YEAR
    repeated time values within panel
    r(451);
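The r(451) error means that at least one nation has more than one observation in the same year, which is expected when each row is a firm. A minimal sketch on a tiny hand-made dataset, assuming the real data contain a firm identifier (called FIRM here, a hypothetical name): -duplicates report- shows the repeated nation-year cells, and a firm-year panel can be declared instead.

```stata
* Tiny hand-made example (hypothetical values): two firms share Nation A in 2009
clear
input str8 NATION YEAR FIRM
"Nation A" 2009 1
"Nation A" 2009 2
"Nation A" 2011 1
"Nation B" 2009 3
"Nation B" 2010 3
end
encode NATION, generate(nation_id)

* Count nation-year cells that hold more than one observation
duplicates report nation_id YEAR

* Reproduces "repeated time values within panel", r(451)
capture noisily xtset nation_id YEAR

* Firm-year pairs are unique, so this declaration succeeds
xtset FIRM YEAR
```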

    Even after declaring only the panel variable (-xtset NATION-), -xtlogit- with fixed effects still failed:

    xtlogit DEPVAR INDVAR, fe
    note: multiple positive outcomes within groups encountered.
    1,991 (group size) take 1,635 (# positives) combinations results in numeric overflow;
    computations cannot proceed
    r(1400);
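This second error is about size rather than syntax. The conditional (fixed-effects) logit conditions on the number of positive outcomes in each group, which involves the C(n, k) ways of placing k positives among n observations; the message reports n = 1,991 and k = 1,635 for one nation. A short check of how large that count is, working on the log scale with lnfactorial() so the calculation itself does not overflow (lnC is a hypothetical scalar name):

```stata
* ln C(1991, 1635) = ln 1991! - ln 1635! - ln 356!
scalar lnC = lnfactorial(1991) - lnfactorial(1635) - lnfactorial(356)

* Compare the order of magnitude with the largest double Stata can store
display "C(1991, 1635) is about 10^" %6.1f scalar(lnC)/ln(10)
display "largest double is about 10^" %6.1f log10(c(maxdouble))
```

The first figure is on the order of 10^400, far beyond the roughly 10^308 that a double can hold, which is why the computation stops for large groups with many positive outcomes.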

    I would be grateful for any guidance on how to specify this model so that I can draw sound conclusions from the data.

    Thank you very much for your support.

    Kind regards,

    Michael


  • #2
    Michael:
    the issue seems to be that your dependent variable has too little within-panel variation.
    Substantial within-panel variation in the time-varying variables is one of the main conditions for the (conditional) -fe- estimator to work properly.
    As an aside, adding -panelid- dummies as predictors (as you would to estimate -fe- with OLS) does not work with -logit-, because of the incidental parameters problem.
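One way to inspect how much within-nation variation the dependent variable has. A minimal sketch on simulated data, where nation_id is a hypothetical numeric nation identifier and one nation is deliberately given almost only positive outcomes: -xttab- splits the tabulation into overall, between and within components, and the row percentages of -tabulate- show how lopsided each nation is. Groups whose outcome never changes (all 0 or all 1) carry no information for the conditional -fe- estimator.

```stata
* Simulated data (hypothetical): nation 1 is almost always positive
clear
set seed 2024
set obs 300
generate nation_id = 1 + mod(_n, 3)
generate DEPVAR = runiform() < cond(nation_id == 1, 0.95, 0.5)

* Declare nations as panels (xttab needs only the panel variable)
xtset nation_id

* Overall, between and within tabulation of the 0/1 outcome
xttab DEPVAR

* Share of 0s and 1s within each nation
tabulate nation_id DEPVAR, row
```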
    Kind regards,
    Carlo
    (StataNow 18.5)