Repeated cross sectional analysis - how to start

Chiara De Siena

Join Date: Oct 2020

Posts: 28
#1


23 Oct 2020, 05:26

Good morning everyone,

I am planning a repeated cross-sectional analysis covering 14 years, and I am wondering how the dataset should be structured. Specifically, should I have all years in a single dataset? If so, how can I do that, and which commands are useful for it? Can you also suggest some papers or documents on how to carry out a repeated cross-sectional analysis?

(All variables are the same, and I know how to recode the ones whose categories have changed over the years, so this is not an issue.)
I don't know if this is useful information, but I will be running logit and multinomial logistic regressions.

I didn't find an answer to this basic question in the forum, but if there is one, please let me know. Thanks a lot for your time and have a nice day.

Chiara DS

Roman Mostazir

Join Date: Apr 2014
Posts: 870

23 Oct 2020, 07:00

Assuming that by "repeated cross sectional analysis" you mean fitting a separate model for each year, you can keep your dataset in either wide or long format. In the long format, you need to add an if condition that selects the year:

Code:

//Wide format example: y = outcome in each year, x = independent variable in each year
************************************************************************************************************
  id   y2020   y2021   y2022   y2023      x2020      x2021      x2022      x2023  
     1       0       1       0       0   .2278204   .6249384   .0883416   .1439645  
     2       1       0       0       1   .5782465   .6531338   .2020382   .3736859  
     3       1       1       0       1   .7533595   .8717756   .2142248   .9615316  
     4       1       0       1       1   .8570072   .3228789   .6291219   .0034408  
     5       1       0       0       1   .9322746   .7929984   .4517648   .9452119  


//logistic regression for each year:

forval i = 2020/2023 {
    logit y`i' x`i'
}


//Long format example: y = outcome in each year, x = independent variable in each year
************************************************************************************************************

   id   year   y          x  
     1   2020   0   .2278204  
     1   2021   1   .6249384  
     1   2022   0   .0883416  
     1   2023   0   .1439645  
     2   2020   1   .5782465  
     2   2021   0   .6531338  
     2   2022   0   .2020382  
     2   2023   1   .3736859  
     3   2020   1   .7533595  
     3   2021   1   .8717756  

//logistic regression for each year:

forval i = 2020/2023 {
    logit y x if year == `i'
}

Roman
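On the original question of how to get all years into a single dataset: one common route is to stack the yearly files with append and tag each record with its survey year. The following is only a minimal sketch. The file names toy_wave2020.dta to toy_wave2023.dta and their contents are made up so that the example runs on its own; with real data you would replace the first loop with your own yearly files, which are assumed to share the same variable names.

```stata
* Toy stand-ins for the yearly survey files (hypothetical names and contents)
clear all
set seed 12345
forvalues yr = 2020/2023 {
    clear
    set obs 100
    generate byte y = runiform() < 0.5
    generate double x = runiform()
    save "toy_wave`yr'.dta", replace
}

* Stack the yearly files into one long dataset, tagging each record with its year
clear
tempfile stacked
save `stacked', emptyok              // start from an empty dataset
forvalues yr = 2020/2023 {
    use "toy_wave`yr'.dta", clear
    generate int year = `yr'         // assumes the file has no variable called year
    append using `stacked'
    save `stacked', replace
}
tabulate year                        // check the number of observations per year
```

The result is the long layout shown above, except that with fresh samples each year there is no id that links records across years.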


Chiara De Siena

Join Date: Oct 2020

Posts: 28
#3

27 Oct 2020, 06:07

Thanks Roman! Have a nice day

Chiara
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#4

27 Oct 2020, 07:07

In my opinion, you should use the long format so that you can take advantage of Stata's built-in commands for panel data analysis. And I doubt you want to estimate a separate model for each year. I guess I'm still not sure if you are following the same individuals (or firms, or whatever the unit is) across years. Or, do you have a new sample in each year?
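As a rough illustration of the alternative to one model per year, a pooled model on long-format data can include year indicators, and optionally interactions with year. This is a sketch on simulated data, not a recommendation for a particular specification; the variable names y, x and year and the data-generating process are assumptions made for the example.

```stata
* Simulated long-format data: a fresh sample in each of four years (illustrative only)
clear
set seed 2020
set obs 2000
generate int year = 2020 + mod(_n, 4)
generate double x = runiform()
generate byte y = runiform() < invlogit(-0.5 + x + 0.1*(year - 2020))

* Pooled logit: common slope on x, year-specific intercepts
logit y c.x i.year

* Let the effect of x differ by year, then test the interactions jointly
logit y c.x##i.year
testparm i.year#c.x
```

The same right-hand side could be passed to mlogit for a multinomial outcome.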
Chiara De Siena

Join Date: Oct 2020

Posts: 28
#5

28 Oct 2020, 03:46

Hi Jeff, I have both panel and cross-sectional data available, but I do not know which kind of data suits a repeated cross-sectional analysis better. Since I am not interested in changes within individuals over time, I won't do a panel analysis, so I am considering using the cross-sectional waves of EU-SILC. Is this right?

Moreover, since the cross-sectional data also has a rotational design (respondents are followed for at most 4 years), I think the samples are not fully independent. But since I am not doing a panel analysis, I do not need to link individuals across waves (and this is not possible with the cross-sectional data anyway). Lastly, this dependence weakens when you pool waves that are further apart in time, such as wave 1 and wave 5.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#6

28 Oct 2020, 12:06

Chiara: I don't know why you would want to ignore the fact that you have panel data in doing your analysis. If you have multiple years of data for individuals you can control for individual heterogeneity. If you ignore that then you

Even if you're doing something like diff-in-diffs it is better to use the panel structure. There is no problem with a rotating panel because it just means your panel is unbalanced. You'll have to account for the same units showing up in different periods in computing standard errors, so why not exploit that in estimation, too?

I guess you should be more specific about what you hope to learn.
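To make the rotating-panel point concrete, here is a small sketch on simulated data, assuming a person identifier id is available. Each person enters in a random year and is followed for at most four years, which produces an unbalanced panel. The first model only adjusts the standard errors for repeated observations on the same person; the second also uses the panel structure in estimation through a random-effects logit. All variable names and the data-generating process are invented for the example.

```stata
* Simulated rotating panel: entry in a random year, followed for at most 4 years
clear
set seed 42
set obs 500
generate long id = _n
generate double u = rnormal()                     // unobserved individual heterogeneity
generate int first = 2020 + floor(6*runiform())   // entry year, 2020 to 2025
expand 4
bysort id: generate int year = first + _n - 1
drop if year > 2025                               // survey window ends in 2025
generate double x = runiform()
generate byte y = runiform() < invlogit(-0.5 + x + u)

* Pooled logit with standard errors clustered on the individual
logit y x i.year, vce(cluster id)

* Declare the panel structure, inspect it, and fit a random-effects logit
xtset id year
xtdescribe
xtlogit y x i.year, re
```

Whether a random-effects, fixed-effects or correlated random-effects specification is appropriate depends on the research question, which is why being specific about the goal matters.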