  • Can anyone give me some suggestions on what is the right type of regression to run in Stata for this empirical situation?

    I am working on an empirical paper that has a setting like the following.

    Suppose I want to know what determines whether someone goes to church. I have three snapshots of 1000 randomly picked persons, taken when they are 5, 25, and 45 years old. The dependent variable is an ordered categorical variable with three categories: not going to church at all, going to church more than once but less than half of the time, and going to church more than half of the time. The explanatory variables are family income, education, region fixed effects, calendar-year fixed effects, etc.

    Right now, I am running three cross-sectional ordered logit regressions, one each for 5-year-olds, 25-year-olds, and 45-year-olds, because the relation between the explanatory variables and the dependent variable is likely to be very different when someone is a child, a young adult, or a mature adult. Intuitively I feel there is a problem here, because whether someone goes to church at 25 or 45 should somehow depend on whether they went when they were 5 years old.

    One solution might be to pool all three snapshots and run a larger ordered logit with standard errors clustered at the person level, but I don't think that's good for two reasons: (1) a single regression yields only one set of coefficient estimates, which imposes the same relation between the dependent variable and the explanatory variables at every age, and that is unlikely to hold; (2) I am not sure that inference with clustered standard errors in a panel ordered logit is well established, judging from the last time I checked Prof. Wooldridge's paper.

    Another solution is to include the age-5 outcome as an independent variable in the other two regressions and, in the age-45 regression, perhaps control for both past outcomes. But I also feel uneasy about this one, since these were dependent variables in the earlier cross-sectional regressions.

    What do you think is the right kind of test to run here? Any suggestions and discussion are greatly appreciated. Thanks in advance!

  • #2
    If the observations at 5, 25, and 45 years old refer to the same set of individuals, then you have panel data, and you should treat it as such. The three ages can function as your time variable. Since it is not evident that your outcome categories are ordered, I would suggest that you consider fixed effects (FE) multinomial logit and compare the results with FE ordered logit. The advantage here is that, with panel data, you can control for individual heterogeneity.

    Code:
    help xtmlogit
    search feologit
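
    As a rough sketch of how the two estimators might be called (the variable names id, age, church, income, and educ are placeholders, not taken from the original post; feologit is community-contributed and has to be installed first; check both help files for the options available in your Stata version):

    Code:
    * declare the panel: person identifier, with age (5, 25, 45) as the time variable
    xtset id age
    * fixed-effects multinomial logit (official command, Stata 17 or later)
    xtmlogit church income educ, fe
    * fixed-effects ordered logit (community-contributed feologit)
    feologit church income educ, group(id)

    Regressors that do not vary within a person, such as a region the person never leaves, are absorbed by the fixed effects in both models.
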
    Quote from #1: "Right now, I am running three cross-sectional Ordered Logit regressions for 5 year olds, 25 year olds, and 45 year olds, because the relation between the explanatory variable and the dependent variable are likely to be very different when someone is a child, a young person, or an old mature person."
    If you do not have panel data, i.e., if the age cohorts refer to different sets of individuals, you can still pool the data, create an age-cohort variable, and interact it with your other regressors. There is no need to run separate sub-sample regressions. However, note that your analysis will be descriptive in this instance.
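
    A minimal sketch of that pooled specification, with placeholder variable names (church, income, educ, region, year) and assuming age takes the values 5, 25, and 45:

    Code:
    * pooled ordered logit; income and education slopes are allowed to differ by age cohort
    ologit church i.age##c.income i.age##c.educ i.region i.year
    * joint test of whether the income coefficient differs across cohorts
    testparm i.age#c.income

    The interaction terms play the role of the separate sub-sample regressions, and the testparm line checks whether the cohort-specific income slopes can be distinguished from each other.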
    Last edited by Andrew Musau; 22 Jan 2024, 03:38.