Hi all,
I am conducting a study in which I have measured two variables. Theoretically, there is a causal relation between them, such that one variable (the cause, C) leads to the other (the effect, E). C was measured at time point T1 (Ct1), whereas E was measured at two time points, T1 and T2 (Et1 and Et2, respectively). C is an ordinal categorical variable with 4 levels and E is a binary variable.
I have 600 unique subject IDs in total, and for each ID the variables Ct1, Et1 and Et2 were measured for 8 different body parts, yielding 4800 entries in total.
I'm using logistic regression to estimate odds ratios for how C (measured at T1) predicts the development of E at T2, adjusted for the presence of E at T1. In other words, I want to see how C predicts the development of new E at T2, adjusted for preexisting E at T1.
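Written out, the model I believe I am fitting with `logistic Et2 i.Ct1 Et1` is roughly the following. The symbols β and γ are my own notation, and I am assuming Ct1 is coded 0 to 3 with 0 as the reference level, as in the tabulation in this post:

```latex
\operatorname{logit}\Pr(E_{t2}=1)
  = \beta_0
  + \beta_1\,\mathbf{1}[C_{t1}=1]
  + \beta_2\,\mathbf{1}[C_{t1}=2]
  + \beta_3\,\mathbf{1}[C_{t1}=3]
  + \gamma\,E_{t1},
\qquad
\mathrm{OR}_k = e^{\beta_k},\quad k = 1, 2, 3
```

With the `i.` prefix, C enters as three indicator terms against the reference level rather than as a single ordinal trend.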
Despite the large sample, some of the cells have small numbers (see the tabulation of Ct1 by Et2 at the end of this post).
My question is: should I test the model fit when I am only including two predictors in the model?
As far as I understand, the Pearson chi-squared goodness-of-fit test is problematic if there are few observations for some values of the predictor variable(s), and the Hosmer–Lemeshow test is best suited to models with more than five predictors and works best with continuous predictors. So neither of the two tests seems suitable in my case. Are there any other diagnostic tests I should run?
Also, if you have any concerns regarding my approach in general, or if there is further model checking I need to do, I would appreciate it if you let me know.
Best wishes,
Jane
Code:
. logistic Et2 i.Ct1 Et1, vce(cluster ID)

Logistic regression                             Number of obs     =      4,800
                                                Wald chi2(4)      =     245.01
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -386.54839               Pseudo R2         =     0.3717

                                   (Std. Err. adjusted for 600 clusters in ID)
------------------------------------------------------------------------------
             |               Robust
         Et2 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         Ct1 |
          1  |   3.198967   1.232098     3.02   0.003     1.503714    6.805409
          2  |   4.993883   3.840523     2.09   0.037      1.10618    22.54504
          3  |   55.21636   33.82345     6.55   0.000     16.62088    183.4348
             |
         Et1 |   53.07789   20.83338    10.12   0.000     24.59303    114.5553
       _cons |   .0118504   .0019137   -27.47   0.000     .0086352    .0162627
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
Code:
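. tab Ct1 Et2

           |          Et2
       Ct1 |         0          1 |     Total
-----------+----------------------+----------
         0 |     4,438         67 |     4,505
         1 |       198         38 |       236
         2 |        22          8 |        30
         3 |         6         23 |        29
-----------+----------------------+----------
     Total |     4,664        136 |     4,800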
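As a rough sanity check on the tabulation, the unadjusted (crude) odds ratios for each level of Ct1 against level 0 can be recomputed by hand. The sketch below is in Python rather than Stata, the names `counts` and `crude_odds_ratio` are purely illustrative, and it ignores both Et1 and the clustering within ID, so it is not a substitute for the model above.

```python
# Cell counts copied from the `tab Ct1 Et2` output:
# {Ct1 level: (number with Et2 == 0, number with Et2 == 1)}
counts = {0: (4438, 67), 1: (198, 38), 2: (22, 8), 3: (6, 23)}


def crude_odds_ratio(level, reference=0):
    """Unadjusted odds ratio of Et2 = 1 for a Ct1 level versus the reference level."""
    non_events, events = counts[level]
    ref_non_events, ref_events = counts[reference]
    return (events / non_events) / (ref_events / ref_non_events)


# Crude odds ratios for levels 1-3, each compared with level 0
crude = {level: crude_odds_ratio(level) for level in (1, 2, 3)}

# Smallest single cell in the 4 x 2 table (the sparse-data concern)
smallest_cell = min(min(pair) for pair in counts.values())

for level, value in crude.items():
    print(f"Ct1 = {level}: crude OR = {value:.2f}")
print(f"smallest cell count = {smallest_cell}")
```

The crude values come out at roughly 12.7, 24.1 and 254, all larger than the corresponding odds ratios from the regression, presumably because the regression also adjusts for Et1. The smallest cell holds only 6 observations.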