Panel data binary logistic regression xtlogit: Procedure, robust standard errors and country, year & industry fixed effects

Franka Gutzmer

Join Date: Jul 2022

Posts: 8
#1

29 Jul 2022, 05:07

Hi!

I am looking at a binary DV for 139 companies over 8 years, so I have panel data. I have three IVs, one of which is also a binary variable, and several control variables. As this is the first time I have worked with a binary DV, I have a few questions regarding pre-tests and the general approach, and I am unsure about the correct procedure. My approach so far:
Classical data cleaning / preparation, including transformation to panel data via xtset id year

Descriptive statistics including correlation and multicollinearity check (VIF). VIF below 10 --> no multicollinearity. Two of the IVs are moderately correlated and I may separate them into two equations.

Heteroscedasticity check confirming heteroscedasticity in my sample. Thus, I thought I have to include robust standard errors by adding vce(robust) at the end of my regression command.

Now my questions for the further procedure: With panel data I need to use xtlogit as the command, I guess. I used the Hausman test to evaluate whether I need random or fixed effects. The first time I ran it, the result pointed to fixed effects. Then I added a CV and the result changed to random effects. However, in both cases I cannot include vce(robust), as I get an error message. Do I just leave it out, or what is the reason it cannot be included? Or is there a common way of including robust standard errors?

Next, I want to include industry, country and year fixed effects. Is there a certain test that I have to run to decide whether these effects need to be included, or do I just include them as common sense to reduce the risk of omitted variable bias and to reduce endogeneity concerns? If there is a test, which one is it and how do I apply it?

Once that is all clarified and I have my final regression equation, I would use McFadden's pseudo R-squared to look at the model fit and the "margins, dydx(IV) atmeans" command for interpretation, correct?

Can anyone help me with my open questions? Any feedback on my procedure? Am I missing an important step?

Thanks in advance!
Franka
Tags: fixed effects, logistic regression, logit, margins, panel data

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17709

29 Jul 2022, 05:34

Franka:
welcome to this forum.
The following thread is possibly helpful: https://stats.stackexchange.com/ques...ression-with-d.
In addition:
1) OK, with the proviso that -xtset- does not transform your dataset; it simply alerts Stata that you're dealing with a panel dataset;
2) Multicollinearity is harmful insofar as it creates "weird" standard errors (SEs) (whatever "weird" may mean). In addition, there's a growing view that multicollinearity is a tad oversold as an issue (see https://www.hup.harvard.edu/catalog....=9780674175440, Chapter 23);
3) see https://www.statalist.org/forums/for...-time-variable.
4) -hausman- does not support non-default SEs, nor is it correct to run it with default SEs and then switch to non-default SEs once you have the -hausman- outcome;
5) there's no gain in including time-invariant predictors if you go -fe-, as you can see from the following toy-example:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtlogit not_smsa i.race, fe
note: multiple positive outcomes within groups encountered.
note: 3,976 groups (23,028 obs) omitted because of all positive or
      all negative outcomes.
note: 2.race omitted because of no within-group variance.
note: 3.race omitted because of no within-group variance.


Conditional fixed-effects logistic regression         Number of obs    = 5,498
Group variable: idcode                                Number of groups =   735

                                                      Obs per group:
                                                                   min =     2
                                                                   avg =   7.5
                                                                   max =    15

                                                      LR chi2(0)       = -0.00
Log likelihood = -2112.1815                           Prob > chi2      =     .

------------------------------------------------------------------------------
    not_smsa | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        race |
      Black  |          0  (omitted)
      Other  |          0  (omitted)
------------------------------------------------------------------------------

.

6) detecting endogeneity has more to do with the knowledge of the data generating process than with white/black magic tests;
7) you may want to test the specification of the functional form of the regressand via a procedure similar to the one detailed in -linktest-, as -linktest- itself does not work after [XT] commands.
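
A hand-rolled check along the lines of point 7) might look like the sketch below. It is only an illustration, not a definitive procedure: the -nlswork- dataset is reused from the toy example above, the specification (union on age, grade and south) is an arbitrary assumption, and the variable names xb_hat and xb_hat_sq are made up for the example.

```stata
* Illustrative data and model only; replace with your own DV and predictors
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtset idcode year

* Fit a random-effects logit (hypothetical specification)
xtlogit union age grade i.south, re

* Linear prediction and its square, the two terms -linktest- would build
predict double xb_hat if e(sample), xb
generate double xb_hat_sq = xb_hat^2

* Refit the model on the linear prediction and its square
xtlogit union xb_hat xb_hat_sq, re

* A statistically significant squared term hints at a misspecified functional form
test xb_hat_sq
```

If the squared term turns out significant, a usual next step is to reconsider the right-hand side (for example, adding squared terms or interactions) rather than to treat the test result as a verdict on the model.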

Kind regards,
Carlo
(Stata 19.0)


Franka Gutzmer

Join Date: Jul 2022

Posts: 8
#3

30 Jul 2022, 09:09

Hi Carlo,

Thank you very much for the links. That helped. Also, I guess now that fixed effects actually do not fit my research purpose, because of the observations that are dropped for lack of within-subject variability. Can I use xtlogit without random or fixed effects and then include time-invariant predictors like industry or country fixed effects?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#4

30 Jul 2022, 23:43

Franka:
if you call -xtlogit- without specifying its estimator, you're implicitly calling the -re- one.
Hence, the answer is that you can switch to the -re- estimator and add the time-invariant predictors to the right-hand side of your regression equation.
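
As a rough sketch of what that could look like, again using the -nlswork- dataset from the earlier toy example: race stands in here for a time-invariant predictor such as industry or country, and the choice of union, age and grade is purely illustrative. It assumes a recent Stata release in which -xtlogit, re- accepts vce(cluster) and -margins- accepts predict(pr) after it.

```stata
* Illustrative data only; race plays the role of a time-invariant predictor
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtset idcode year

* -re- is the default estimator, so time-invariant factors stay in the model;
* i.year adds year dummies, SEs are clustered on the panel identifier
xtlogit union age grade i.race i.year, re vce(cluster idcode)

* Joint Wald test of the year dummies
testparm i.year

* Average marginal effects on the marginal predicted probability
margins, dydx(age grade) predict(pr)
```

Note that -margins- without atmeans reports average marginal effects; adding atmeans would instead evaluate the effects at the sample means of the covariates.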

Kind regards,
Carlo
(Stata 19.0)
Franka Gutzmer

Join Date: Jul 2022

Posts: 8
#5

31 Jul 2022, 08:37

Thanks Carlo!