I am working on an empirical paper that has a setting like the following.
Suppose I want to know what determines whether someone goes to church. I have three snapshots of 1000 randomly picked persons, taken when they are 5, 25, and 45 years old respectively. The dependent variable is an ordered categorical variable with three categories: not going to church at all, going to church more than once but less than half of the time, and going to church more than half of the time. The explanatory variables are family income, education, region fixed effects, calendar-year fixed effects, etc.
Right now, I am running three cross-sectional Ordered Logit regressions, one each for the 5-year-olds, 25-year-olds, and 45-year-olds, because the relation between the explanatory variables and the dependent variable is likely to be very different when someone is a child, a young adult, or a mature adult. Intuitively I feel there is a problem here, because whether someone goes to church at 25 or 45 should somehow depend on whether they went when they were 5 years old.
One solution might be to pool all three snapshots and run one larger Ordered Logit with standard errors clustered at the person level, but I don't think that's good, for two reasons: (1) the relation between the dependent variable and the explanatory variables is not likely to be the same at every age, yet a single regression imposes that by giving only one set of coefficient estimates. (2) I am not sure that inference with clustered standard errors in a panel Ordered Logit is well established, at least as of the last time I checked Prof. Wooldridge's paper.
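For concreteness, here is a minimal sketch of how a pooled Ordered Logit could still carry age-specific coefficients, by interacting every regressor with age dummies. Everything in it is an assumption made for illustration: the helper names `interact_with_age` and `ordered_logit_probs`, the two regressors, and all numbers are hypothetical and are not estimates from any data. The cutpoints are also held common across ages here, which is itself a restriction.

```python
import math

AGES = (5, 25, 45)  # the three snapshot ages from the setting above


def interact_with_age(x, age, ages=AGES):
    """Expand regressors x into one block per age; only the block for `age` is non-zero."""
    row = []
    for a in ages:
        row.extend(x if a == age else [0.0] * len(x))
    return row


def ordered_logit_probs(index, cutpoints):
    """Category probabilities P(y = j) for an ordered logit with increasing cutpoints."""
    cdf = [1.0 / (1.0 + math.exp(-(c - index))) for c in cutpoints]
    bounds = [0.0] + cdf + [1.0]
    return [bounds[j + 1] - bounds[j] for j in range(len(bounds) - 1)]


# Hypothetical values, for illustration only (not estimates).
beta_by_age = {5: [0.8, -0.2], 25: [0.1, 0.5], 45: [-0.3, 0.4]}  # [income, education]
pooled_beta = [b for a in AGES for b in beta_by_age[a]]  # stacked age-specific blocks
cutpoints = [-0.5, 1.0]  # assumed common across ages in this sketch

x = [1.2, 0.7]  # one person's regressors at age 25
row = interact_with_age(x, age=25)

# The pooled index equals the age-25 coefficients applied to x.
index = sum(b * v for b, v in zip(pooled_beta, row))
probs = ordered_logit_probs(index, cutpoints)
```

With the regressors expanded this way, the single pooled coefficient vector is just the three age-specific vectors stacked, so pooling by itself need not force one set of slopes. Whether clustered inference is appropriate for such a model is a separate question that this sketch does not address.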
Another solution is to include the age-5 dependent variable as an explanatory variable in the other two regressions, and, in the age-45 regression, perhaps to control for both earlier dependent variables. But I am uneasy about this one too, since these were dependent variables in the earlier cross-sectional regressions.
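As a rough sketch of the data step this would involve, the snippet below attaches each person's previous-snapshot outcome to the later snapshots as dummy variables. The record layout (keys `person`, `age`, `church`), the 0/1/2 coding of the categories, and the function name `add_lagged_outcome` are all assumptions made for illustration. It only builds the regressors; it attaches just the immediately preceding outcome, and it does nothing about the concern that the lagged outcome was itself a dependent variable.

```python
from collections import defaultdict

CATEGORIES = (0, 1, 2)  # assumed coding: 0 = never, 1 = less than half, 2 = more than half


def add_lagged_outcome(records):
    """Return copies of the records with the previous snapshot's outcome attached.

    Each record is a dict with hypothetical keys 'person', 'age', 'church'.
    The first snapshot of each person has no lag and is dropped.
    """
    by_person = defaultdict(list)
    for rec in records:
        by_person[rec["person"]].append(rec)

    out = []
    for recs in by_person.values():
        recs = sorted(recs, key=lambda r: r["age"])
        for prev, cur in zip(recs, recs[1:]):
            new = dict(cur)
            new["lag_church"] = prev["church"]
            # Dummies for the lagged category, with the lowest category as the base.
            for c in CATEGORIES[1:]:
                new[f"lag_church_{c}"] = 1 if prev["church"] == c else 0
            out.append(new)
    return out


# Made-up records for two persons, for illustration only.
records = [
    {"person": 1, "age": 5, "church": 2},
    {"person": 1, "age": 25, "church": 1},
    {"person": 1, "age": 45, "church": 1},
    {"person": 2, "age": 5, "church": 0},
    {"person": 2, "age": 25, "church": 0},
    {"person": 2, "age": 45, "church": 2},
]
lagged = add_lagged_outcome(records)
```

The age-25 and age-45 rows in `lagged` could then be fed to the separate regressions with `lag_church_1` and `lag_church_2` as extra regressors.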
What do you think is the right kind of test to run here? Any suggestions or discussion would be greatly appreciated. Thanks in advance!