Mixed frequency data in xtdpdgmm package

Nursena Sagir

#1

20 Jan 2022, 07:28

Dear Statalisters and Sebastian Kripfganz,

I have a panel data set that consists of weekly observations of income and monthly observations of a depression score for 800 individuals. I would like to estimate a dynamic panel model by GMM with depression_score as the dependent variable and its lag and income as independent variables. However, I observe these variables at different frequencies (irregular time spacing), and I have missing values. For example, individual 1 has two mental health interviews, at week 5 (wave 1) and week 10 (wave 2), while individual 2 has the same interviews at week 3 (wave 1) and week 7 (wave 2). Thus, I could not decide how to define the time variable in xtset for xtdpdgmm (week or wave?). I could keep only the observations with nonmissing depression_score and use week as the time variable. Would xtdpdgmm then take into account the time elapsed between two interviews, which differs across individuals? Individual 1 has a 5-week gap between Y and L.Y, while individual 2 has a 4-week gap. Would that be a problem for the estimation?

Or should I use wave instead? This keeps the lag between Y and L.Y the same for all individuals (weeks 3 and 5 in wave 1, weeks 7 and 10 in wave 2). Do you have any suggestions?

Code:

input float(id week wave) double income float depression_score
1 1 1 100 .
1 2 1 . .
1 3 1 50 .
1 4 1 . .
1 5 1 60 12
1 6 2 . .
1 7 2 . .
1 8 2 80 .
1 9 2 . .
1 10 2 100 10
2 1 1 . .
2 2 1 50 .
2 3 1 90 8
2 4 1 . .
2 5 1 60 .
2 6 2 . .
2 7 2 100 12
2 8 2 . .
2 9 2 . .
2 10 2 . .
end

Thanks in advance!

Best regards,

Last edited by Nursena Sagir; 20 Jan 2022, 07:38.
Sebastian Kripfganz

#2

20 Jan 2022, 10:28

I believe you would need to collapse your data by wave, e.g.

Code:

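* sum income within each individual and wave; keep the first nonmissing depression score
collapse (sum) income (firstnm) depression_score, by(id wave)
xtset id wave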

xtdpdgmm and other panel data commands do not automatically ignore the missing observations when calculating the differences.

https://www.kripfganz.de/stata/
Nursena Sagir

#3

20 Jan 2022, 14:13

Thanks, Sebastian. I have one further question. If I have missing values in the wave variable, will xtdpdgmm or any other panel data command treat L.Y as Y[_n-1] or as Y at wave[_n-1], which are not necessarily the same? Depending on that, would it make sense to create a variable like "gen lag_Y=L.Y if wave[_n]==wave[_n-1]+1" to keep only the true lag of Y in the regression?
Sebastian Kripfganz

#4

21 Jan 2022, 04:47

With your current data set, you probably have set week as the time variable. L.Y would then correspond to Y in the previous week, not wave.
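To illustrate the point, here is a minimal sketch that uses the id, week, and depression_score values from post #1 and assumes week has been set as the time variable. Because no individual has two interviews in consecutive weeks, no observation has both Y and L.Y nonmissing, so a dynamic model would have no usable observations.

```stata
clear
input float(id week) float depression_score
1 1 .
1 2 .
1 3 .
1 4 .
1 5 12
1 6 .
1 7 .
1 8 .
1 9 .
1 10 10
2 1 .
2 2 .
2 3 8
2 4 .
2 5 .
2 6 .
2 7 12
2 8 .
2 9 .
2 10 .
end

* week as the time variable: L. now means "one week earlier"
xtset id week

* count rows where both the score and its one-week lag are observed
count if !missing(depression_score, L.depression_score)
* the count is 0 here: the week before each interview never has a score
```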

Also, if wave[_n] is nonmissing in your case, then wave[_n-1] is always missing. Thus, you cannot generate the lag of Y in the way you proposed. As an alternative to collapse, you could drop all observations for which wave is missing, and then set wave as the time identifier:

Code:

drop if missing(wave)
xtset id wave

However, you would then lose some information from the income variable.
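One possible way to limit that loss, sketched here under assumptions rather than as a recommendation from this thread, is to aggregate income within each individual and wave before dropping the non-interview rows. The variable name income_wave is hypothetical, the data are a subset of the rows in post #1, and the drop condition uses depression_score because wave is never missing in that example.

```stata
clear
input float(id week wave) double income float depression_score
1 1 1 100 .
1 3 1 50 .
1 5 1 60 12
1 8 2 80 .
1 10 2 100 10
2 2 1 50 .
2 3 1 90 8
2 5 1 60 .
2 7 2 100 12
end

* hypothetical variable: total income observed within each id-wave cell
* (egen's total() treats missing income values as zero)
bysort id wave: egen double income_wave = total(income)

* keep only the interview rows, one per individual and wave
drop if missing(depression_score)

xtset id wave
list id wave income_wave depression_score, sepby(id)
```

Note that this total also includes income reported after the interview within the same wave (for example, week 5 for individual 2), which may or may not be appropriate for the model.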

Nursena Sagir

#5

21 Jan 2022, 05:30

Originally posted by Sebastian Kripfganz

[...] you could drop all observations for which wave is missing, and then set wave as the time identifier:

Code:

drop if missing(wave)
xtset id wave

However, you would then lose some information from the income variable.

Yes, this is exactly what I did. My question was: if I drop the missing waves with this command, how do panel data commands treat L.Y? Let's say individual 1 has nonmissing Y values in wave 1 and wave 3, and we drop the missing wave 2. What is L.Y in the estimation for Y at wave 1 then? Is it wave 3, or does Stata automatically drop individual 1 because there is no wave 2?

Moreover, so as not to lose information from the income variable, I can generate the required lag of income as follows: gen lag_income=income[_n-1]. I think this is the most efficient way. What do you think?
Sebastian Kripfganz

#6

21 Jan 2022, 08:35

I just noticed that in my previous post it should have been missing(depression_score), not missing(wave).

If wave is the time identifier and it can take the values 1, 2, 3, then for wave 3 the operation L.Y refers to Y from wave 2, irrespective of whether that observation is available. If that observation is not available, L.Y is missing, and the wave 3 observation will not be included in any calculation that uses it.
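A small sketch of this behavior, using made-up data in which individual 1 has no wave 2 observation (the values and the variable name lag_Y are hypothetical):

```stata
clear
input float(id wave) float depression_score
1 1 12
1 3 10
2 1 8
2 2 12
2 3 9
end

xtset id wave

* L. looks up wave - 1 within the same id, not the previous row
generate lag_Y = L.depression_score
list, sepby(id)
* id 1, wave 3: lag_Y is missing because wave 2 is absent
* id 2, wave 2: lag_Y is 8; id 2, wave 3: lag_Y is 12
```

Stata does not drop individual 1 altogether; only the rows with a missing lag are left out of an estimation that uses L.depression_score.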

When working with panel data, you need to be careful with the _n indicator. _n-1 always refers to the previous row in the data set, which could belong to a different individual. Ideally, define your time variable appropriately and use the time-series lag operator. Alternatively, use the by id: prefix, so that _n is evaluated separately within each individual.
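The code that originally followed here is not preserved in the thread. As a hedged sketch of what the by prefix does, with made-up data and hypothetical variable names lag_naive and lag_income:

```stata
clear
input float(id week) double income
1 1 100
1 3 50
1 5 60
2 2 50
2 3 90
end

* without by: the first row of id 2 picks up the last income of id 1
generate double lag_naive = income[_n-1]

* with by: _n restarts at 1 within each id, sorted by week
bysort id (week): generate double lag_income = income[_n-1]

list, sepby(id)
* first row of id 2: lag_naive is 60 (from id 1), lag_income is missing
```

Even with the by prefix, income[_n-1] is the previous observed row for that individual, not necessarily the previous week, so the elapsed time behind the lag can still differ across rows.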

Nursena Sagir

#7

22 Jan 2022, 06:30

Thanks Sebastian, your answer perfectly clarifies my question.