
  • Problem in setting the panel

    Hi everyone:
    I have a panel that is structured as follows:

    I have 10 countries. For every country I have a wave variable identifying the survey round in which the data were collected; it runs from 4 to 57 (from 2020-04 to 2024-07, i.e. monthly data). In every wave about 1,500 people per country were interviewed, and they could be followed up in the following months. Some people gave just one interview, say in wave 4; others took part from the fourth interview through the 30th.

    To give you an example, imagine that Mario took his first interview in Austria in 2021-04 and his last one in 2023-01 (gaps are not allowed: if you want to continue, you must do the follow-up the following month); meanwhile George took only two interviews in Germany, in 2022-04 and 2022-05.

    At first, when I was setting up the panel, I tried the classic xtset country_n wave, but of course it returned an error, namely that there are repeated waves within every country. How should I set the panel if I'm interested in analyzing the differences between countries over time, and between ids within each country over time? Initially I thought of just doing xtset id wave, but the panel is very strongly unbalanced (for reasons you can guess, e.g. the heterogeneity in the number of waves each person took) and it doesn't work very well.

    Thank you all for your attention.

  • #2
    You can leave out the time element in xtset.

    reghdfe does not require you to xtset your data.
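
    A minimal sketch of what this could look like, assuming hypothetical variable names y (outcome) and x (regressor) alongside the id, wave, and country_n variables from #1. The choice of absorbed effects and of clustering level is an assumption for illustration, not a recommendation from this post.

    ```stata
    * Sketch only: y and x are hypothetical variable names.
    * reghdfe is community-contributed and depends on ftools:
    *   ssc install ftools
    *   ssc install reghdfe

    * Option 1: declare only the panel identifier, no time variable.
    xtset id

    * Option 2: skip xtset entirely and absorb the fixed effects directly.
    * absorb(id wave) removes person and wave fixed effects.
    reghdfe y x, absorb(id wave) vce(cluster country_n)
    ```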



    • #3
      If I understand #1 correctly, at least some of the individuals, indexed by the variable id, are re-observed in this data. So it seems that what you have is not real panel data but a three-level data design, with observations nested within id and id nested within country. (Or perhaps the same person changes country during the course of observation, in which case id:country is a multiple membership model.) If you just -xtset country- and proceed with fixed-effects regression models, you are ignoring the repeated observations of individuals, which means you are ignoring part of the non-independence of observations.

      Strictly speaking, this kind of data does not lend itself to fixed-effects models. However, it is common practice in this situation to -xtset id- (or -xtset id wave- if you will need lags, leads, or autoregressive structure) and then use standard errors clustered at the country level in your fixed-effects analyses. This does take into account the repeated observations. This is not perfect, but the alternative is to use multilevel random-effects models, which have drawbacks of their own. Which of these imperfect approaches is better depends on what kind of effects you are interested in estimating, or hypotheses you are interested in testing, in this data.
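
      The two approaches described above might be sketched as follows. The names y and x are hypothetical, and the model specifications (including i.wave as a set of wave indicators) are assumptions for illustration only; id, wave, and country_n come from #1.

      ```stata
      * Sketch only: y and x are hypothetical variable names.

      * Confirm that each person appears at most once per wave.
      isid id wave

      * Person is the panel unit, wave the time variable.
      xtset id wave
      xtdescribe                      // participation patterns in the unbalanced panel

      * Fixed effects at the person level, standard errors clustered by country.
      * This requires every id to belong to exactly one country.
      xtreg y x i.wave, fe vce(cluster country_n)

      * Multilevel alternative: random intercepts for country and for
      * person nested within country.
      mixed y x i.wave || country_n: || id:
      ```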



      • #4
        Originally posted by Clyde Schechter
        Dear Clyde,
        thanks a lot for the answer; you captured the idea behind the dataset. Indeed, I think the variation within id is crucial to my analysis, since many variables are liable to change over the course of the waves. Fortunately, the data do not allow migration from one country to another to be recorded: in that case, the observation is simply taken at the last measurement before the eventual change.

        Best,
        Riccardo



        • #5
          Originally posted by George Ford
          Dear George,
          thanks for the answers. I'm only worried about eliminating important variation within the same id over the period of the waves. By doing this, am I not cancelling out the variation over time within an id that took, for example, 30 interviews?
          Best
          Riccardo



          • #6
            Then xtset id wave.
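
            As a rough sketch, with hypothetical variable names y and x, declaring the time variable makes time-series operators available, and xtsum shows how much of a variable's variation is within id as opposed to between ids:

            ```stata
            * Sketch only: y and x are hypothetical variable names.
            xtset id wave

            * Overall, between-id, and within-id standard deviations of x.
            xtsum x

            * With wave declared, lags such as L.x can be used directly.
            xtreg y x L.x, fe vce(cluster country_n)
            ```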
