Sorry. I should have said to run -by prepost, sort: tab HID CUTOFF3 if e(sample)-. Your -tab- output is based on your full data set of millions of observations, whereas your estimation sample contains a mere 6,684. The estimation sample is restricted both by the -if- condition in your command and by listwise deletion of every observation that has a missing value on any model variable, so it is possible that within it there is a 0 cell somewhere in that cross-tabulation.
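Here is a minimal sketch of that check, run on simulated data so that it is self-contained. The variables y and x1, the seed, the injected missing values and the use of -regress- are all hypothetical stand-ins for your actual command, and prepost is assumed here to be simply a copy of postperiod. It begins with -clear-, so run it as a do-file in a fresh Stata session, not on top of your own data. The habit worth copying is saving e(sample) into a variable immediately after estimation, because the next estimation command you run (including the probe regression) replaces it.

```stata
* Hypothetical simulated data; -clear- wipes whatever is in memory
clear
set seed 12345
set obs 1000
gen byte HID        = runiform() < 0.5
gen byte CUTOFF3    = runiform() < 0.5
gen byte postperiod = runiform() < 0.5
* Assumption: prepost is the period indicator used in the -by- prefix
gen byte prepost    = postperiod
gen x1 = rnormal()
* Missing values in a model variable shrink the estimation sample (listwise deletion)
replace x1 = . in 1/100
gen y = x1 + rnormal()

* Stand-in for the original estimation command
regress y i.HID##i.CUTOFF3##i.postperiod x1

* Save the estimation sample right away, before any other estimation overwrites e()
gen byte in_sample = e(sample)

* Cross-tabulate within the estimation sample only, separately by period;
* a 0 in any cell could explain why HID#CUTOFF3 gets omitted
by prepost, sort: tab HID CUTOFF3 if in_sample
```

With your own data you would skip the simulation, rerun your original command unchanged, and then use the last two commands.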
But if that restricted cross-tabulation shows no zero cells, that implies that HID#CUTOFF3 is somehow predictable from some other variable(s) in the model. The names of most of your model variables don't convey any meaning to me, so I'm not able to guess which are the most likely culprits. Try to think about the meanings of all your variables and see whether any of them correspond to something that should only happen when HID and CUTOFF3 are both true (or, equally good, should never happen when HID and CUTOFF3 are both true). If you can't come up with anything, you can always find it by brute force. Create a homebrew interaction variable: -gen ix_probe = HID*CUTOFF3-. Then regress ix_probe on all of the variables in your model other than HID#CUTOFF3. (To exclude HID#CUTOFF3, you will have to abandon the ## operator and replace that three-way interaction with HID CUTOFF3 postperiod HID#postperiod CUTOFF3#postperiod HID#CUTOFF3#postperiod.) The regression will give you an R2 of 1.0 (or very nearly so), and the coefficients will tell you the linear combination that is causing the collinearity. Then you can decide what to remove so that the model is properly identified while still including the HID#CUTOFF3 term. And, I hasten to add, this regression must also be restricted to the estimation sample of the original analysis.
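A sketch of the brute-force search, again on simulated data and again starting with -clear-, so run it as a do-file in a fresh session. Everything in it is hypothetical: flag_both plays the role of a culprit variable that is 1 exactly when HID and CUTOFF3 are both 1, and x1 stands in for your other covariates. It assumes HID, CUTOFF3 and postperiod are 0/1 indicators, which is what makes HID*CUTOFF3 a valid stand-in for HID#CUTOFF3. I wrote the interactions in the probe as c. products so that each enters as exactly one column; with i. notation, Stata may rebuild some HID#CUTOFF3 cells inside the three-way term when the two-way term is left out, which would defeat the purpose of the probe.

```stata
* Hypothetical simulated data; -clear- wipes whatever is in memory
clear
set seed 2024
set obs 1000
gen byte HID        = runiform() < 0.5
gen byte CUTOFF3    = runiform() < 0.5
gen byte postperiod = runiform() < 0.5
gen x1 = rnormal()
* Hypothetical culprit: 1 exactly when HID and CUTOFF3 are both 1
gen byte flag_both = HID*CUTOFF3
gen y = x1 + rnormal()

* Stand-in for the original model; Stata omits one of the collinear terms
regress y i.HID##i.CUTOFF3##i.postperiod x1 flag_both
* Save the estimation sample before the probe regression overwrites e()
gen byte in_sample = e(sample)

* Homebrew interaction
gen ix_probe = HID*CUTOFF3

* Every model term except HID#CUTOFF3, restricted to the original estimation sample
regress ix_probe c.HID c.CUTOFF3 c.postperiod c.HID#c.postperiod ///
    c.CUTOFF3#c.postperiod c.HID#c.CUTOFF3#c.postperiod x1 flag_both if in_sample

* An R-squared of 1 (or very nearly) confirms the collinearity
display "R-squared = " e(r2)
```

In this simulated case the probe fits perfectly: the coefficient on flag_both is 1 and the others are essentially 0, which points straight at flag_both as the source of the collinearity. With your real data you would substitute your actual model variables for y, x1 and flag_both.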