  • XTGEE with different frequency of data collection across countries

    Hello!

    I have individual-level panel data for 12 countries, derived from surveys. In these surveys, individuals are asked to self-report the prices that they pay for cigarettes.

    From each of these datasets, I am interested in calculating a country-level measure of the cigarette price distribution (e.g. the coefficient of variation of all self-reported prices, or the difference between the upper and lower 25th, 10th, 5th, and 1st percentiles divided by the median price).
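
    For concreteness, the two kinds of dispersion measure mentioned above could be sketched as follows. This is a minimal illustration in Python rather than Stata; the function name `price_dispersion`, the `tail` parameter, and the toy prices are all hypothetical, and the percentile interpolation rule (the "inclusive" method) is an assumption that may differ from what a given package uses by default.

```python
import statistics

def price_dispersion(prices, tail=25):
    """Return (coefficient of variation, percentile-range ratio).

    tail=25 compares the 75th and 25th percentiles, tail=10 the 90th
    and 10th, and so on. The range is divided by the median price.
    Hypothetical helper for illustration only.
    """
    if not 1 <= tail <= 49:
        raise ValueError("tail must be between 1 and 49")
    # 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = statistics.quantiles(prices, n=100, method="inclusive")
    lower = cuts[tail - 1]
    upper = cuts[100 - tail - 1]
    median = statistics.median(prices)
    # Sample standard deviation divided by the mean.
    cv = statistics.stdev(prices) / statistics.mean(prices)
    return cv, (upper - lower) / median

if __name__ == "__main__":
    # Made-up self-reported prices for one country-wave.
    toy_prices = [1, 2, 3, 4, 5]
    cv, iqr_ratio = price_dispersion(toy_prices, tail=25)
    print(f"CV = {cv:.3f}, (p75 - p25) / median = {iqr_ratio:.3f}")
```

    Each country-wave would yield one such pair of values, which then becomes one observation in the country-level model.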

    The model I am estimating has the country-level measure of cigarette price dispersion as the dependent variable. My independent variables are all country-level variables, many of which are dummies.

    The difficulty I am facing is that the number of survey years (and the gaps between them) varies irregularly across the country surveys (probably a function of donor funding to run the surveys), although the same individuals in each country are still interviewed over time. The table in the image below shows the countries, years of data collection, and number of waves available to me. As it shows, there is no uniformity in the survey years covered: even countries with the same number of waves often collected them in different years. Moreover, some countries have many more waves of data than others, e.g. Malaysia has 6 waves of data available; Zambia has two. Overall, the number of countries is 12 and the number of usable waves of data across all countries is 47.
    [Image: image_34257.png, table of countries, survey years, and number of waves]
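
    One way to make this kind of imbalance concrete is to tabulate, per country, the number of waves and the gaps between consecutive survey years. The sketch below is in Python for illustration only; the helper name `summarise_waves` is hypothetical, and the country labels and years are invented placeholders, not the values from the table in the image.

```python
from collections import defaultdict

def summarise_waves(records):
    """Summarise an unbalanced panel of country-waves.

    records: iterable of (country, survey_year) pairs, one per wave.
    Returns {country: {"waves": int, "years": [...], "gaps": [...]}}.
    Hypothetical helper for illustration only.
    """
    years_by_country = defaultdict(set)
    for country, year in records:
        years_by_country[country].add(year)

    summary = {}
    for country, years in years_by_country.items():
        ordered = sorted(years)
        # Gap in years between each pair of consecutive waves.
        gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
        summary[country] = {"waves": len(ordered), "years": ordered, "gaps": gaps}
    return summary

if __name__ == "__main__":
    # Invented placeholder data: NOT the real survey years.
    toy_records = [
        ("A", 2005), ("A", 2006), ("A", 2009),
        ("B", 2012), ("B", 2014),
    ]
    for country, info in sorted(summarise_waves(toy_records).items()):
        print(country, info["waves"], info["gaps"])
```

    Unequal wave counts show up in the "waves" entry and irregular spacing in the "gaps" entry, which are the two features asked about in question 3.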



    Virtually all studies that have used these datasets to produce country-level dependent variables (which they then link with country-level independent variables from other sources) employ Generalised Estimating Equations using xtgee. Authors indicate that this is done to account for the correlation within the same country over time.

    The models they run have year dummies but no country fixed effects. I presume this is because there isn't much within-country variation in many of their independent variables, so they rely on between-country variation by excluding the country fixed effects in the GEE framework.

    My questions are as follows:
    1. Is this dataset a special case of an unbalanced cross-country panel, or is it simply an unbalanced cross-country panel? The combination of differences in both the years of data collection and the frequency at which the data are collected makes me feel that this is a special breed, but I do not know.

    2. Is it appropriate to use a panel estimation framework (GEE) when you aren’t using within-country variation to measure the effect of the independent variables on the dependent variable? Why not just pooled OLS with SEs clustered at the country level?

    3. Is there a method one can employ to adjust the estimation framework to account for the fact that (a) the number of waves differs across countries and (b) the years of data collection differ across many of the countries?

    Thank you for taking the time to read this post!
    Last edited by Pete Huckelba (StataCorp); 04 Apr 2024, 11:27. Reason: Attempt to fix sort order