Panel data (large N, small T), fractional outcome

Tariku Getaneh

Join Date: Nov 2021

Posts: 53
#1


01 Dec 2024, 07:37

Dear all,
I have a panel dataset of firms (N = 1800, T = 16). The outcome is fractional: it lies between 0 and 1, and some firms can have a value of 1, but 0 is impossible. The panel is unbalanced, i.e., some firms have data for only 14 years, some for 13 years, etc. Given the fractional nature of the outcome, I am unsure which model is appropriate. Could you please suggest one? Does xtreg Yit Xit, fe cluster(firmid) work well? I cannot find the post now, but I recall Prof. Jeff suggesting, for the case where N is large and T is small, a random effects model that includes the within-firm time averages of the controls, i.e., xtreg Yit Xit Xit_hat, cluster(firmid). Could you please help?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2167
#2

01 Dec 2024, 08:33

It's the correlated random effects approach you should apply, and then you should use "fracreg logit" or "glm" with the "fam(bin) link(logit)" options. But you have to generate the time averages, indicators for the number of complete cases, and the control function (from the first stage) if you have endogeneity correlated with the time-varying part of the error term. I have a recent paper with coauthors where we describe how to do this with unbalanced panels. xtreg estimates linear models only. Here is a link to the paper:

Bates, Papke, Wooldridge

The paper has been published in Econometric Reviews. Michael should be able to provide the Stata code.
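
For readers following along, a minimal sketch of the steps described above (time averages, indicators for the number of complete cases, then a fractional logit) might look like the following. The variable names y, x1, x2, firmid, and year are placeholders, and this is not the authors' code; the paper's specification for unbalanced panels may be richer, for example by interacting the time averages with the indicators. No control function is included, so the sketch assumes the covariates are exogenous conditional on the firm effect.

```stata
* Placeholder names (assumptions): y = fractional outcome in (0,1],
* x1 x2 = time-varying covariates, firmid = firm id, year = time variable.

* Flag complete cases: touse = 1 if y, x1 and x2 are all nonmissing
mark touse
markout touse y x1 x2

* Number of complete-case periods for each firm
bysort firmid: egen Ti = total(touse)

* Within-firm time averages computed over complete cases only
foreach v of varlist x1 x2 {
    bysort firmid: egen `v'_bar = mean(cond(touse, `v', .))
}

* Correlated random effects fractional logit, pooled across years,
* with year dummies, indicators for Ti, and firm-clustered standard errors
fracreg logit y x1 x2 x1_bar x2_bar i.year i.Ti if touse, vce(cluster firmid)

* Average partial effects of the time-varying covariates
margins, dydx(x1 x2)

* Equivalent pooled quasi-MLE using glm
glm y x1 x2 x1_bar x2_bar i.year i.Ti if touse, ///
    family(binomial) link(logit) vce(cluster firmid)
```

The coefficients are on the logit index scale, so the margins output is usually the more interpretable quantity. Neither fracreg nor glm requires xtset here, because the panel structure enters only through the by-firm averages and the clustering.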
Tariku Getaneh

Join Date: Nov 2021

Posts: 53
#3

01 Dec 2024, 13:37

Prof. Jeff,

Thank you very much for your recommendations and for answering my question. I plan to read the paper soon, but I have a few additional queries, if I may:
If I use fracreg logit or glm, do I still need to declare my data as a panel using xtset firmid year, or can I proceed without it?

I have variables such as the proportion of customers aged 70 and above, aged 60–70, and of black ethnicity for firm i and year t. When calculating firm-level averages, should these variables be included? By “average,” I mean something like:
bys firmid: egen Xi_hat = mean(Xi) if e(sample)
e(sample) is intended to address missing values. Would this be the correct method?

This is still an early stage of my analysis, but I anticipate needing to account for lagged effects of y at some point, so I would appreciate a model suggestion for that case too.

Finally, the Stata code you mentioned would be incredibly helpful. Could you kindly indicate how I might access it?

Thank you again for your guidance.
