Are pooled cross sections the best way to analyze non-panel longitudinal data?

Pratap Pundir

Join Date: Oct 2018

Posts: 143
#1


20 Mar 2019, 15:36

There are a bunch of datasets, e.g. the World Values Survey, alcohol usage reports, etc., which provide large amounts of data over long periods of time.

However, these aren't really panel datasets (in some cases they can't be, e.g. many ageing reports): the entities surveyed are not the same across the various "waves". These are pooled cross sections.

So typical longitudinal models, such as individual-level fixed effects, don't make sense.

In such cases, is pooled OLS the best way to analyze the relationships between variables? If so, what sort of causality can be claimed, assuming each wave drew a random, representative sample?

Also, if pooled OLS is the best tool for the job, how does one run it in Stata? Should I run a regression for every year in a loop and report the average coefficients? (What about the standard errors?) Is there a specific command for this? Or simply something like

Code:

reg y x i.year, vce(cluster country)

would serve the purpose?

Last edited by Pratap Pundir; 20 Mar 2019, 15:53.

Thank you for your help!

Stata SE/17.0, Windows 10 Enterprise
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

20 Mar 2019, 16:15

The ability to claim causality here is no different from the ability to claim causality when you run a fixed-effects regression on panel data. A fixed-effects regression in no way enhances the causal interpretation of the estimated effects; that is a matter of study design.

There is an important difference between pooled cross sections and panel data analysis. With panel data, your effects represent changes over time within people or entities. In pooled cross sections, you can speak only of changes in the population prevalence of attributes; you cannot speak of individuals changing. Observed changes in population prevalence can arise here even if no individual changes at all, simply because the composition of the sampled population shifts over time. Also, whereas a fixed-effects analysis automatically adjusts for time-invariant attributes of individuals, including unobserved ones, in pooled cross sections all adjustments must be made explicitly, and it is not possible to adjust for unobserved variables.
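To make the compositional point concrete, here is a minimal simulation sketch (an illustration, not part of the original answer). The variables drinks and old are hypothetical. It assumes two waves in which the later wave happens to sample an older population, while the age-specific probability of the outcome is identical in both waves. The crude year effect is then non-zero, and it goes away only when age group is adjusted for explicitly.

```stata
* Hypothetical simulation: two cross-sectional waves of 10,000 respondents each
clear
set seed 12345
set obs 20000
generate int year = cond(_n <= 10000, 2000, 2010)

* Assumption: the 2010 wave samples an older population (60% vs 30% "old")
generate byte old = runiform() < cond(year == 2000, 0.3, 0.6)

* Outcome depends only on age group, never on year
generate byte drinks = runiform() < cond(old, 0.5, 0.2)

* Crude comparison: prevalence appears to rise between waves
* (expected difference 0.38 - 0.29 = 0.09, purely from composition)
regress drinks i.year

* Explicit adjustment for the observed compositional variable
* pushes the year effect back toward zero
regress drinks i.year i.old
```

Note that this only works because old is observed; a compositional shift in an unobserved attribute could not be adjusted away like this.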

Yes, -reg y x i.year- will do just fine. If you like clustered standard errors, that's fine, too. Including year-specific shocks (i.year) is sensible if the nature of the outcome calls for it. In some cases a continuous linear time trend (c.year) might be simpler and appropriate. And for some variables one might not expect any time effects at all. So just do it thoughtfully.
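A self-contained sketch of the three time specifications, run on simulated data. The data-generating process (20 countries, five waves five years apart, a fresh sample of 200 respondents per country and wave, and a true linear trend of 0.1 per year) is an assumption for illustration only; y, x, year and country follow the names used in the question.

```stata
* Hypothetical data: a new sample of respondents in every wave (not a panel)
clear
set seed 2019
set obs 20000
generate int country = ceil(_n / 1000)                     // 20 countries
generate int year = 2000 + 5 * mod(ceil(_n / 200) - 1, 5)  // 2000, ..., 2020
generate double x = rnormal()
generate double y = 1 + 0.5 * x + 0.1 * (year - 2000) + rnormal()

* (a) Year-specific shocks: a separate intercept shift for each wave
regress y x i.year, vce(cluster country)

* (b) Continuous linear time trend: a single slope on calendar year
regress y x c.year, vce(cluster country)

* (c) No time effects at all
regress y x, vce(cluster country)
```

In specification (c) the coefficient on x stays close to 0.5 only because x was generated independently of year here; if x trended over time, dropping the time terms would bias it.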

There is another issue, however. It appears that your survey is administered across several countries, and I would expect some of the variables you mention to differ appreciably by country. So it would make sense here to -xtset country- and -xtreg y x, fe- (perhaps with i.year or c.year, and with vce(cluster country) if you wish). The country fixed effects then adjust for time-invariant country-level differences, even unobserved ones, though still not for unobserved attributes of individuals.
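A minimal sketch of why the country fixed effects matter, again on simulated data. The country-level factor u is a hypothetical unobserved variable, deliberately made correlated with x, so pooled OLS overstates the coefficient on x while -xtreg, fe- recovers the assumed true value of 0.5. -areg- with absorb(country) is shown as an alternative that gives the same point estimates.

```stata
* Hypothetical data: 20 countries x 5 waves x 200 fresh respondents
clear
set seed 314
set obs 20000
generate int country = ceil(_n / 1000)
generate int year = 2000 + 5 * mod(ceil(_n / 200) - 1, 5)

* Unobserved country-level factor u, shared within each country
bysort country: generate double u = rnormal() if _n == 1
by country: replace u = u[1]

* Assumption: x is correlated with u, and u also affects y
generate double x = u + rnormal()
generate double y = 1 + 0.5 * x + 2 * u + rnormal()

* Pooled OLS omits u, so the x coefficient picks up part of its effect
regress y x i.year, vce(cluster country)

* Country fixed effects; with no time variable, -xtset- accepts
* repeated country values (many respondents per country and wave)
xtset country
xtreg y x i.year, fe vce(cluster country)

* Same point estimates with the country dummies absorbed
areg y x i.year, absorb(country) vce(cluster country)
```

The last two commands report the same coefficients, but their clustered standard errors can differ slightly because the commands apply different small-sample adjustments.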