predict after xtlogit with fixed effect

Tracy Yang

Join Date: Jul 2020

Posts: 29
#1

10 Jan 2022, 20:18

I am running a two-stage regression to account for endogeneity. In the first stage I use xtlogit with fixed effects. Here is the code.

Code:

xtlogit y x1 x2, fe
predict yhat

In this stage, some observations are dropped because in some ID groups the dependent variable is all 0 or all 1. But predict still gives predicted values for all observations. Should this be the case? In the second stage, I will use yhat as an independent variable. Because predict fills in values for observations that the first regression dropped, the number of observations will differ between the first and second stages. Should this be a concern?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

11 Jan 2022, 08:06

Code:

predict yhat if e(sample)

as described in the output of help predict.
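
A minimal sketch of the effect of the e(sample) restriction, run on Stata's shipped example panel rather than on the data in the question. The dataset (webuse union, which needs an internet connection) and its variable names are stand-ins; with your own data you would keep y, x1 and x2.

```stata
* Example panel from Stata's documentation; stands in for the poster's data
webuse union, clear
xtset idcode

* Fixed-effects (conditional) logit; groups with no variation in union are dropped
xtlogit union age grade south, fe

* Predict only for observations that were used in estimation
predict yhat if e(sample)

* The two counts should agree once the if e(sample) restriction is applied
count if e(sample)
count if !missing(yhat)
```

Without the if qualifier, the second count can be larger than the first, which is the mismatch described in #1.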
Tracy Yang

Join Date: Jul 2020

Posts: 29
#3

11 Jan 2022, 22:23

Originally posted by William Lisowski

Code:

predict yhat if e(sample)

as described in the output of help predict.

Thanks.
Joao Santos Silva

Join Date: Apr 2014

Posts: 2962
#4

12 Jan 2022, 03:19

Dear Tracy Yang

In addition to William's helpful advice, please note that the prediction you get is the "probability of a positive outcome conditional on one positive outcome within group," which may not be what you want.

Best wishes,

Joao
Tracy Yang

Join Date: Jul 2020

Posts: 29
#5

05 Mar 2022, 05:13

Joao Santos Silva, thanks for the note. What is the difference between pc1 (the default) and pu0?

pc1 predicted probability of a positive outcome conditional on one positive outcome within group; the default
pu0 probability of a positive outcome assuming that the fixed effect is zero
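
Written out, the two definitions above look roughly as follows. The notation is an assumption added for illustration and is not from the thread: i indexes groups, t indexes observations within a group, \alpha_i is the group fixed effect, and T_i is the number of observations in group i.

```latex
\begin{align*}
\texttt{pu0}:\quad
  & \Pr(y_{it}=1 \mid x_{it},\ \alpha_i = 0)
    = \frac{\exp(x_{it}\beta)}{1+\exp(x_{it}\beta)} \\[4pt]
\texttt{pc1}:\quad
  & \Pr\Bigl(y_{it}=1 \Bigm| \textstyle\sum_{s=1}^{T_i} y_{is}=1\Bigr)
    = \frac{\exp(x_{it}\beta)}{\sum_{s=1}^{T_i}\exp(x_{is}\beta)}
\end{align*}
```

Read this way, pc1 sums to one within each group by construction, and pu0 matches the unconditional probability only for a group whose fixed effect happens to be zero. Neither is the probability of a positive outcome given x and the group's own (unestimated) fixed effect.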
Tracy Yang

Join Date: Jul 2020

Posts: 29
#6

05 Mar 2022, 05:14

Joao Santos Silva, I assume the linear prediction (xb) will also give values outside the 0 to 1 range?
Joao Santos Silva

Join Date: Apr 2014

Posts: 2962
#7

05 Mar 2022, 07:06

Dear Tracy Yang,

pc1 and pu0 do exactly as described, but neither of them is interesting; you are right in saying that xb may give you values outside that range, and there is no way to transform them into something useful.

How long is your panel?

Best wishes,

Joao
Tracy Yang

Join Date: Jul 2020

Posts: 29
#8

05 Mar 2022, 20:52

Dear Joao Santos Silva, would you please explain what pc1 and pu0 mean? That is, what does it mean to assume a zero fixed effect in pu0, and to condition on one positive outcome within group in pc1? Why do you say neither of them is interesting?

My dataset is cars running in different locations at different times. I would like to run a regression with time and location fixed effects. Since there are multiple cars within each location and time group, I am unable to set

Code:

xtset location time

.

Instead I create an id

Code:

egen id = group(location time)

and then set that id as the panel id, which is

Code:

xtset id

This id has 1.7 million unique values, and the dataset has around 3 million observations in total.

Thanks

Tracy
Joao Santos Silva

Join Date: Apr 2014

Posts: 2962
#9

06 Mar 2022, 00:48

Dear Tracy Yang,

Please check the definitions of pc1 and pu0 in the documentation. I do not think your approach is suitable. What are the dimensions of your panel?

Best wishes,

Joao
Tracy Yang

Join Date: Jul 2020

Posts: 29
#10

06 Mar 2022, 06:16

Hi Joao Santos Silva, what do you mean by dimensions of the panel?

Thanks

Tracy
Joao Santos Silva

Join Date: Apr 2014

Posts: 2962
#11

06 Mar 2022, 11:39

How many ID groups and how many time periods?
Tracy Yang

Join Date: Jul 2020

Posts: 29
#12

06 Mar 2022, 20:10

Hi Joao Santos Silva, we would like to control for time and location fixed effects. If we take location as the ID, there are 46K groups. If we take day as the time variable, there are 730 days (i.e., a two-year sample). But this does not work, because multiple cars can run in the same location on the same day, so Stata gives the error "repeated time values within panel".

That is why I combined time and location and set the combined id as the panel ID. In this case, there are 1.7 million groups and no time variable. The dataset has around 3 million observations. Could you please let me know why this approach is not suitable? Do you have any suggestions for alternatives?

thanks

Tracy
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2081
#13

06 Mar 2022, 21:25

There are a bunch of messy statistical issues with these estimates due to the incidental parameters problem, but if you just want to have predictions of the probabilities, it might work to just xtset the location. Then, put in dummies for each of the days.

Code:

xtset location
xtlogit y x1 ... xk i.day, fe
predict yhat

If you have all zeros or all ones then your prediction is a zero or a one, respectively. That's the deal when trying to predict a binary outcome using fixed effects logit.

Your approach, with not even two cars per ID on average, will have far messier statistical problems than usual.
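
One way to see how thin the groups are before estimating is to count group sizes and groups with no variation in the outcome. This is only a sketch: id and y are the names used earlier in the thread, and n_in_group, y_mean, no_variation and first_in_group are hypothetical helper variables.

```stata
* Group size and within-group mean of the binary outcome
bysort id: generate n_in_group = _N
bysort id: egen y_mean = mean(y)

* A group whose outcome is all 0 or all 1 contributes nothing to the FE logit
generate byte no_variation = inlist(y_mean, 0, 1)

* Flag one observation per group so that counts are at the group level
egen byte first_in_group = tag(id)

* Number of singleton groups, and number of groups without variation in y
count if first_in_group & n_in_group == 1
count if first_in_group & no_variation

* Observations in groups without variation (these are dropped by xtlogit, fe)
count if no_variation
```

With about 3 million observations spread over 1.7 million groups, the last count is likely to be a large share of the data, but that is something to check rather than assume.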
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2081
#14

06 Mar 2022, 21:35

By the way, I bet a linear model estimated by FE would give similar results in the end. You'll probably get some negative fitted values and some above one, but I would just bring those into the unit interval -- a lot like you'd predict a zero probability or probability of one in the FE logit case.
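
A rough sketch of that linear alternative, assuming the same hypothetical names as above (y, x1, x2, location, day). Using the xbu prediction, which adds the estimated location effect to the linear prediction, is one reading of "fitted values" and is an assumption here; yhat_lpm is a made-up variable name.

```stata
* Linear probability model with location fixed effects and day dummies
xtset location
xtreg y x1 x2 i.day, fe

* xbu = linear prediction plus the estimated fixed effect, estimation sample only
predict yhat_lpm if e(sample), xbu

* Bring fitted values into [0, 1]; the guard is needed because min(1, .) is 1 in Stata,
* so missing predictions would otherwise be turned into ones
replace yhat_lpm = max(0, min(1, yhat_lpm)) if !missing(yhat_lpm)
```

The censored yhat_lpm could then be carried into the second stage in place of the xtlogit prediction.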
Tracy Yang

Join Date: Jul 2020

Posts: 29
#15

06 Mar 2022, 21:52

Thanks, Prof. Jeff Wooldridge. I will try that.