Logit model including industry FE

Kathrin Me

Join Date: Sep 2021

Posts: 54
#1


21 Jun 2024, 17:54

Hi all,

I am working with trade data and have one observation per firm, year, and destination. I would like to predict the probability of trading as

Code:

logit exporter distance ... i.industry if inrange(year,2016,2019), vce(robust)

where my outcome variable exporter is an indicator variable.
Afterwards, I would like to make an out-of-sample prediction:

Code:

gen prediction_exporter = normprob(_b[cons]+_b[distance]xdistance + ... + _b[_industry__1]+_b[_industry__2] + ... _b[_industry__99]) if year == 2012

I have two questions:
1) Can I use the ordinary logit command? (I found forum entries stating that, if there are enough observations within each industry and not too many industry categories, one can use xtlogit and include industry dummies. How are "enough" and "not too many" defined? I have 99 categories, and around half of them are omitted when estimating a logit. The minimum number of observations per industry is 220; most industries have many more. I cannot use the xtlogit command because I do not have one observation per firm and year, but one per firm, year, and destination.)
If it is not possible to use the ordinary logit command, what could be an alternative?
2) Something is wrong with my prediction, but I don't know what causes the problem. (I tried an in-sample prediction and compared it to

Code:

predict prediction_exporter, pr

which does not give me the same values.)

Unfortunately, I am not able to provide a data example due to confidentiality reasons.

Best,
Kathrin
Tags: fixed effects, logit, prediction
George Ford

Join Date: Aug 2014

Posts: 3035
#2

21 Jun 2024, 18:00

How many observations per industry? If few, you may have an incidental parameters problem.

xtlogit is an RE method by default, not an FE method (its fe option fits a conditional FE logit instead).

What's wrong with using predict rather than generate?
Kathrin Me

Join Date: Sep 2021

Posts: 54
#3

21 Jun 2024, 20:30

George Ford : Thank you for your answer.

As indicated in the post, the smallest industry has 220 observations; 4 industries are below 1,000 observations; the largest has 7 million observations. Most categories are a 6-digit number.

How would you specify the predict command when the logit is estimated for 2016-2020 but the probability is to be predicted for 2012?

Best,
Kathrin
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17601
#4

22 Jun 2024, 02:41

Kathrin:
you may want to use -logit-, clustering your standard errors on -industry-.

Kind regards,
Carlo
(StataNow 18.5)
Maarten Buis

Join Date: Mar 2014

Posts: 3405
#5

22 Jun 2024, 04:17

predict prhat2012 if year ==2012

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
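
The one-line answer above relies on predict applying the stored coefficients to every observation in memory, not only to the estimation sample. A minimal sketch of that behaviour, assuming the shipped auto dataset as a stand-in for the confidential trade data (the split on rep78 and the variable names insample and phat_out are hypothetical and only play the role of the 2016-2019 versus 2012 split):

```stata
sysuse auto, clear

* Stand-in for the estimation window: cars with a recorded rep78
logit foreign mpg weight if !missing(rep78), vce(robust)

* e(sample) marks the observations that entered the estimation
gen byte insample = e(sample)

* predict applies the stored coefficients to any observation in memory,
* so an -if- restriction gives predictions outside the estimation sample
predict double phat_out if !insample, pr

* Number of out-of-sample predictions that are non-missing
count if !missing(phat_out)
```

In the thread's setting the restriction would be if year == 2012, provided the 2012 observations are in memory when predict is run.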
Kathrin Me

Join Date: Sep 2021

Posts: 54
#6

22 Jun 2024, 06:57

Carlo Lazzaro Maarten Buis : Thanks for your input.

Fair point with the standard errors. Carlo Lazzaro : Can I "go logit" although more than half of all industry dummies are omitted in the estimation? (I have ~50 industries being omitted, and ~35 that are not.) If I cannot "go logit", what would you suggest instead?
Also, Stata notes "676 failures and 0 successes completely determined", not sure what this means...

Maarten Buis : Sorry. I thought I had tried it and it didn't work, but I just looked at it again and it does work. The number of predicted values is quite low compared to the total number of observations in 2012 (~900,000 out of 1,400,000 observations have a non-missing prediction in 2012). I would still be curious to understand how you would do it manually (to understand the mechanics behind the predict command).
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17601
#7

22 Jun 2024, 08:53

Kathrin:
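1) if you have at least 30 (surviving) industries, you can use -vce(cluster industry)- standard errors (see Cameron_Miller_Cluster_Robust_October152013.pdf (ucdavis.edu));
2) it means that, for those 676 observations, the fitted probability of exporting is numerically zero, so their outcome (a failure) is completely determined by the covariates; in other words, you have limited variation across those observations.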

Kind regards,
Carlo
(StataNow 18.5)
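
One possible reason for the omitted industry dummies and the missing predictions (an assumption, since the data cannot be shared) is that the outcome does not vary within some industries. In that case logit drops the dummy together with its observations, and predict returns missing for such observations by default; the logit postestimation documentation describes a rules option for that case. A rough check, sketched on the auto dataset with rep78 standing in for industry and foreign for exporter (share_foreign and no_variation are hypothetical names):

```stata
sysuse auto, clear

* Share of "successes" within each category (rep78 stands in for industry)
bysort rep78: egen double share_foreign = mean(foreign)

* Flag categories in which the outcome is all 0 or all 1;
* a dummy for such a category predicts the outcome perfectly
gen byte no_variation = inlist(share_foreign, 0, 1) if !missing(rep78)

* Cross-tabulate categories against the flag
tabulate rep78 no_variation
```

Running the same check by industry on the 2016-2019 sample would show whether the omitted dummies line up with industries that have no exporters at all.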

George Ford

Join Date: Aug 2014
Posts: 3035
#8

22 Jun 2024, 09:00

Code:

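sysuse auto, clear

* Fit the logit and store the coefficient vector
logit foreign mpg weight
matrix b = e(b)
* Linear prediction xb from the coefficient vector, then the logistic transform
matrix score double xb = b
gen p = invlogit(xb)
* Built-in equivalents, for comparison
predict xb_predict, xb
predict p_predict , pr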

PS. Borrowed from Clyde:

https://www.statalist.org/forums/forum/general-stata-discussion/general/1633917-manually-producing-probabilites-after-logit


Andrew Musau

Join Date: Oct 2014

Posts: 9945
#9

22 Jun 2024, 09:28

Originally posted by Kathrin Me

Hi all,

I am working with trade data and have one observation per firm, year, and destination. I would like to predict the probability of trading as

Code:

logit exporter distance ... i.industry if inrange(year,2016,2019), vce(robust)

where my outcome variable exporter is an indicator variable.
Afterwards, I would like to make an out-of-sample prediction:

Code:

gen prediction_exporter = normprob(_b[cons]+_b[distance]xdistance + ... + _b[_industry__1]+_b[_industry__2] + ... _b[_industry__99]) if year == 2012

Something is wrong with my prediction, but I don't know what causes the problem. (I tried an in-sample prediction and compared it to

Code:

predict prediction_exporter, pr

which does not give me the same values.)

You are mixing up logit and probit. The latter applies a normal transformation from \(Xb\) to \(Pr(Xb)\), which is what you have, whereas the former applies a logistic transformation. The logistic function is very tractable, so you can do the computations by hand. You can find some illustrations here in my old lecture notes.
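
To make the difference concrete, here is a small sketch on the auto dataset (not the original trade data) that applies both transformations to the same linear prediction. The variable names xb_hat, p_hat, p_logistic, p_formula and p_normal are made up for the illustration; invlogit() and normal() are Stata's logistic and standard normal CDF functions.

```stata
sysuse auto, clear
logit foreign mpg weight

* Linear prediction and the probability reported by predict
predict double xb_hat, xb
predict double p_hat, pr

* Logistic transformation: matches predict after logit
gen double p_logistic = invlogit(xb_hat)
gen double p_formula  = exp(xb_hat) / (1 + exp(xb_hat))

* Normal transformation: the probit link, which does not match
gen double p_normal = normal(xb_hat)

summarize p_hat p_logistic p_formula p_normal
```

Only the logistic versions should agree with p_hat; the normal version is what the hand-written formula in the question effectively computes.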
Kathrin Me

Join Date: Sep 2021

Posts: 54
#10

22 Jun 2024, 09:39

Okay, thank you Carlo Lazzaro, George Ford and Andrew Musau!

Last edited by Kathrin Me; 22 Jun 2024, 10:02.

George Ford

Join Date: Aug 2014
Posts: 3035

#11

22 Jun 2024, 09:48

Adding in Andrew's suggestion:

Code:

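sysuse auto, clear

* Fit the logit and store the coefficient vector
logit foreign mpg weight
matrix b = e(b)
* Linear prediction xb from the coefficient vector, then the logistic transform
matrix score double xb = b
gen p = invlogit(xb)
* Built-in equivalents, for comparison
predict xb_predict, xb
predict p_predict , pr
* Same probability written out term by term from the stored coefficients
g p2 = invlogit(_b[_cons] + _b[mpg]*mpg + _b[weight]*weight)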
