  • Repeated cross sectional analysis - how to start

    Good morning you all,


    I am planning a repeated cross-sectional analysis covering 14 years, and I am wondering how the dataset should be structured. Specifically, should I have all years in a single dataset? If so, how do I build it, and which commands are useful for this? Can you also suggest some papers or documents on how to do a repeated cross-sectional analysis?

    (All variables are the same across years, and I know how to recode the ones whose categories changed over the years, so this is not an issue.)
    I don't know if this is useful information, but I will estimate logit and multinomial logistic regressions.


    I didn't find an answer to this basic question in the forum, but if there is one, please let me know. Thanks a lot for your time and have a nice day.

    Chiara DS

  • #2
    Assuming that by "repeated cross-sectional analysis" you mean a separate model for each year: you can keep your dataset in either wide or long format. In the long format, you need an if condition to select the year:

    Code:
    // Wide format example: y = outcome in each year, x = independent variable in each year
    ************************************************************************************************************
    /* Example data (commented out so the block runs as a do-file):
      id   y2020   y2021   y2022   y2023      x2020      x2021      x2022      x2023  
         1       0       1       0       0   .2278204   .6249384   .0883416   .1439645  
         2       1       0       0       1   .5782465   .6531338   .2020382   .3736859  
         3       1       1       0       1   .7533595   .8717756   .2142248   .9615316  
         4       1       0       1       1   .8570072   .3228789   .6291219   .0034408  
         5       1       0       0       1   .9322746   .7929984   .4517648   .9452119  
    */
    
    
    //logistic regression for each year:
    
    forval i = 2020/2023 {
        logit y`i' x`i'
    }
    
    
    // Long format example: y = outcome, x = independent variable, one row per id and year
    ************************************************************************************************************
    
    /* Example data (commented out so the block runs as a do-file):
       id   year   y          x  
         1   2020   0   .2278204  
         1   2021   1   .6249384  
         1   2022   0   .0883416  
         1   2023   0   .1439645  
         2   2020   1   .5782465  
         2   2021   0   .6531338  
         2   2022   0   .2020382  
         2   2023   1   .3736859  
         3   2020   1   .7533595  
         3   2021   1   .8717756  
    */
    
    //logistic regression for each year:
    
    forval i = 2020/2023 {
        logit y x if year == `i'
    }
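
    The two layouts can be converted into each other with reshape. A minimal sketch, rebuilding the first three ids of the wide example with input (values copied from the listing; nothing else is assumed about the real data):

    ```stata
    * Rebuild a small wide dataset (first three ids of the example)
    clear
    input id y2020 y2021 y2022 y2023 x2020 x2021 x2022 x2023
    1 0 1 0 0 .2278204 .6249384 .0883416 .1439645
    2 1 0 0 1 .5782465 .6531338 .2020382 .3736859
    3 1 1 0 1 .7533595 .8717756 .2142248 .9615316
    end

    * Wide -> long: the stubs y and x become variables, the numeric suffix becomes year
    reshape long y x, i(id) j(year)
    list, sepby(id)

    * Long -> wide again, in case a command needs one row per id
    reshape wide y x, i(id) j(year)
    ```

    After the first reshape there is one row per id and year, which is the layout the second loop expects.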

    Roman



    • #3
      Thanks Roman! Have a nice day

      Chiara



      • #4
        In my opinion, you should use the long format so that you can take advantage of Stata's built-in commands for panel data analysis. And I doubt you want to estimate a separate model for each year. I guess I'm still not sure if you are following the same individuals (or firms, or whatever the unit is) across years. Or, do you have a new sample in each year?
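
        A rough sketch of what a single pooled model in long format could look like, as opposed to one model per year. The data are simulated purely for illustration, and the names id, year, y and x are placeholders, not variables from the poster's files. The xtset step only makes sense if the same units are followed across years; with a fresh sample each year the pooled logit with year indicators still applies, but without the clustering on id.

        ```stata
        * Simulated long-format data: 200 units observed in 2020-2023 (illustrative only)
        clear
        set seed 12345
        set obs 200
        generate id = _n
        expand 4
        bysort id: generate year = 2019 + _n
        generate x = runiform()
        generate y = runiform() < invlogit(-0.5 + x)

        * Declare the panel structure and inspect the participation patterns
        xtset id year
        xtdescribe

        * One pooled model with year indicators; standard errors clustered on the unit
        logit y x i.year, vce(cluster id)

        * Let the effect of x differ by year, then test whether it actually does
        logit y c.x##i.year, vce(cluster id)
        testparm i.year#c.x
        ```

        If testparm does not reject, the pooled model with a common coefficient on x is a simpler summary than four separate regressions.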



        • #5
          Hi Jeff, I have both panel and cross-sectional data available, but I do not know which kind of data suits a repeated cross-sectional analysis better. Since I am not interested in changes within individuals over time, I won't do a panel analysis, so I am considering using the cross-sectional waves of EU-SILC. Is this right?

          Moreover, since the cross-sectional data also have a rotational design (respondents are followed for a maximum of 4 years), I think the samples are not fully independent. But since I am not doing a panel analysis, I do not need to link individuals across waves (which is not possible with the cross-sectional data anyway). Lastly, this dependence becomes weaker when you pool waves that are further apart in time, such as wave 1 and wave 5.
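
          For pooling separate waves into one dataset, append stacks the files on top of each other. A minimal sketch with two simulated waves stored in temporary files; the tempfile names, sample sizes and variables are invented for illustration, and the real wave files would take their place:

          ```stata
          * Create two hypothetical wave files (stand-ins for the real yearly files)
          clear
          set seed 2024
          tempfile wave2020 wave2021

          set obs 100
          generate x = runiform()
          generate y = runiform() < 0.4
          save `wave2020'

          clear
          set obs 120
          generate x = runiform()
          generate y = runiform() < 0.5
          save `wave2021'

          * Stack the waves; generate() records which file each row came from (0 = master, 1 = first using file)
          use `wave2020', clear
          append using `wave2021', generate(wave)
          generate year = 2020 + wave
          tabulate year

          * Pooled model with a year indicator
          logit y x i.year
          ```

          With 14 yearly files the same append step could be run inside a loop, as long as variable names and codings are harmonized first.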



          • #6
            Chiara: I don't know why you would want to ignore the fact that you have panel data in doing your analysis. If you have multiple years of data for individuals you can control for individual heterogeneity. If you ignore that then you

            Even if you're doing something like diff-in-diffs it is better to use the panel structure. There is no problem with a rotating panel because it just means your panel is unbalanced. You'll have to account for the same units showing up in different periods in computing standard errors, so why not exploit that in estimation, too?
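
            To illustrate the point about unbalanced panels, here is a sketch on simulated data in which some person-years are dropped at random. This only mimics an unbalanced panel and is not the actual rotation scheme of any survey; id, year, x, y and the individual effect u are all made up for the example.

            ```stata
            * Simulated panel with an unobserved individual effect u (illustrative only)
            clear
            set seed 4321
            set obs 300
            generate id = _n
            generate u = rnormal()
            expand 4
            bysort id: generate year = 2019 + _n
            generate x = runiform() + 0.3*u
            generate y = runiform() < invlogit(-0.5 + x + u)

            * Drop roughly 30% of person-years so the panel is unbalanced
            drop if runiform() < 0.3

            * xtset accepts unbalanced panels as they are
            xtset id year

            * Exploit the panel in estimation: conditional fixed-effects logit
            xtlogit y x i.year, fe

            * Or at least account for repeated units in the standard errors
            logit y x i.year, vce(cluster id)
            ```

            The fixed-effects logit only uses units whose outcome changes over time, so its estimation sample is smaller than that of the pooled logit.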

            I guess you should be more specific about what you hope to learn.
