Problem in setting the panel

Riccardo Rasoni

Join Date: Dec 2024

Posts: 8
#1

20 Jan 2025, 08:00

Hi everyone:
I have a panel structured as follows:

I have 10 countries. For every country there is a wave variable identifying the survey round in which the data were collected; it runs from 4 to 57 (from 2020-04 to 2024-07, i.e. monthly data). In every wave almost 1,500 people per country were interviewed, and each of them could be followed up in the subsequent months. Some people gave just one interview, say in wave 4; others took part from wave 4 through wave 30.

To give you an example, imagine that Mario gave his first interview in Austria in 2021-04 and his last one in 2023-01 (gaps are not allowed: to continue, you must do the follow-up the very next month), while George gave only two interviews in Germany, in 2022-04 and 2022-05.

At first, when setting up the panel, I tried the classic xtset country_n wave, but of course it returned an error, because wave values are repeated within every country. How should I set the panel if I'm interested in analyzing the differences between countries over time, and between ids within each country over time? Initially I thought of simply using xtset id wave, but the panel is very strongly unbalanced (for reasons you can guess, e.g. the heterogeneity in the number of waves each person completed) and it doesn't work very well.

Thank you all for the attention
Tags: data, panel, panel data, setting, xtset
George Ford

Join Date: Aug 2014

Posts: 2931
#2

20 Jan 2025, 10:02

You can leave out the time element in xtset.

reghdfe does not require you to xtset your data.
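
A minimal sketch of both options, meant to be run from a do-file. It assumes a hypothetical outcome y and regressor x; id, wave and country_n are the variables described in #1. The clustering level shown is only a placeholder choice, not a recommendation.

```stata
* Sketch only: y and x are hypothetical variable names.

* Option 1: declare the panel variable without a time variable
xtset id
xtreg y x, fe                 // within-person (fixed-effects) estimator

* Option 2: reghdfe needs no xtset at all
* (community-contributed; install once with: ssc install reghdfe)
reghdfe y x, absorb(id wave) vce(cluster id)
```

Without a time variable, time-series operators such as L. and D. are not available, but the fixed-effects estimator itself still uses all the within-id variation.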
Clyde Schechter

Join Date: Apr 2014

Posts: 29683
#3

20 Jan 2025, 10:27

If I understand #1 correctly, at least some of the individuals, indexed by variable id, are re-observed in this data. So it seems that what you have is not real panel data but a three-level data design with observations nested within id and id nested within country. (Or perhaps the same person changes country during the course of observation, in which case id:country is a multiple membership model.) If you just -xtset country- and proceed with fixed-effects regression models you are ignoring the repeated observations of individuals, which means you are ignoring part of the non-independence of observations.

Strictly speaking, this kind of data does not lend itself to fixed-effects models. However, it is common practice in this situation to -xtset id- (or -xtset id wave- if you will need lags, leads, or autoregressive structure) and then use standard errors clustered at the country level in your fixed-effects analyses. This does take into account the repeated observations. This is not perfect, but the alternative is to use multilevel random-effects models, which have drawbacks of their own. Which of these imperfect approaches is better depends on what kind of effects you are interested in estimating, or hypotheses you are interested in testing, in this data.
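
The two approaches described above could be sketched roughly as follows. The outcome y and regressor x are hypothetical names, and the random-intercept specification is only one of many possible multilevel models; id, wave and country_n are the variables from #1.

```stata
* Sketch only: y and x are hypothetical variable names.

* Approach 1: person fixed effects, standard errors clustered by country
xtset id wave                              // wave enables L., F. and D. operators
xtreg y x, fe vce(cluster country_n)

* Approach 2: three-level random-intercept model
* (observations nested within id, id nested within country)
mixed y x || country_n: || id:
```

Note that -xtreg, fe- with -vce(cluster country_n)- requires each id to belong to a single country, and that with only 10 countries the cluster-robust standard errors rest on very few clusters.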
Riccardo Rasoni

Join Date: Dec 2024

Posts: 8
#4

20 Jan 2025, 10:38

Dear Clyde,
thanks a lot for the answer; you captured the idea behind the dataset. Indeed, I think the variation within id is crucial for my analysis, since many variables are liable to change over the course of the waves. Fortunately, the data do not record migration from one country to another: in that case, the person is simply observed up to the last measurement before the eventual move.

Best,
Riccardo
Riccardo Rasoni

Join Date: Dec 2024

Posts: 8
#5

20 Jan 2025, 10:40

Dear George,
thanks for the answers. My only worry is losing important variation within the same id over the waves. If I leave out the time element, am I not discarding the within-person variation over time for an id that gave, for example, 30 interviews?
Best,
Riccardo
George Ford

Join Date: Aug 2014

Posts: 2931
#6

20 Jan 2025, 11:48

then xtset id wave
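
A rough sketch of how one might check the structure described in #1 before and after declaring the panel. The variable gap is a hypothetical helper created here; id, wave and country_n are the variables from #1.

```stata
* Sketch only: gap is a hypothetical helper variable.

* Each id should belong to exactly one country (no migration recorded)
bysort id (country_n): assert country_n[1] == country_n[_N]

* Declare the panel and inspect participation patterns
xtset id wave
xtdescribe

* Flag any break in consecutive monthly waves within a person
bysort id (wave): gen byte gap = (wave - wave[_n-1] > 1) if _n > 1
count if gap == 1
```

An unbalanced panel is not an error for -xtset-; -xtdescribe- just makes the differing numbers of waves per id visible.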