Dear Statalisters,
I want to fit a binary logistic regression to a dataset in which people have been split into 3 groups (grp), with a binary outcome (outcome) and several explanatory variables, some of which are binary and some continuous (x1, x2, c1, c2...), as well as 'age', 'sex' and 'proc' (procedure). I have used multilevel models before (-xtmixed- and -mixed-), but have not done a binary logistic regression. My questions are at the end.
I started out with the following command:
Code:
logit outcome age sex x1 c1 x2 x3 c2 proc i.grp
The output is below:
Code:
note: proc != 1 predicts failure perfectly
      proc dropped and 37 obs not used

Iteration 0:   log likelihood = -31.403149
Iteration 1:   log likelihood = -27.613462
Iteration 2:   log likelihood =  -26.88484
Iteration 3:   log likelihood = -26.874952
Iteration 4:   log likelihood = -26.874938
Iteration 5:   log likelihood = -26.874938

Logistic regression                             Number of obs     =        113
                                                LR chi2(9)        =       9.06
                                                Prob > chi2       =     0.4321
Log likelihood = -26.874938                     Pseudo R2         =     0.1442

------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0382949    .032096     1.19   0.233    -.0246122    .1012019
         sex |   1.076406   1.239829     0.87   0.385    -1.353614    3.506426
          x1 |  -.0144402   1.173195    -0.01   0.990    -2.313859    2.284979
          c1 |   .0438674   .0333587     1.32   0.189    -.0215144    .1092492
          x2 |  -1.639493   1.058371    -1.55   0.121    -3.713863    .4348764
          x3 |   2.111303    1.22703     1.72   0.085    -.2936309    4.516237
          c2 |   .0033934   .0177173     0.19   0.848    -.0313318    .0381186
        proc |          0  (omitted)
             |
         grp |
          2  |  -.2179836   1.343467    -0.16   0.871     -2.85113    2.415163
          3  |   1.477582   1.018119     1.45   0.147    -.5178936    3.473058
             |
       _cons |  -8.467525   3.412397    -2.48   0.013     -15.1557    -1.77935
------------------------------------------------------------------------------
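Because grp enters the model as two indicator coefficients, its overall contribution cannot be read from either row on its own. A minimal sketch of a joint Wald test of both grp coefficients, assuming it is run immediately after the -logit- command above so that those estimates are still in memory:

```stata
* Joint Wald test that the coefficients on grp levels 2 and 3 are both zero
* (must follow the -logit- fit that included i.grp)
testparm i.grp
```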
As 'proc' perfectly predicts failure, I tried re-running without including it as a variable:
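Code:
logit outcome age sex x1 c1 x2 x3 c2 i.grp
To prevent the post from being too long I have omitted the output, but this time 150 observations are used.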
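The perfect prediction by 'proc' can be inspected directly before deciding what to do about it. A minimal sketch, assuming proc is a numeric variable in which 1 marks one procedure and any other non-missing value marks the alternative:

```stata
* Cross-tabulate proc against outcome; an empty cell is what triggers
* the "predicts failure perfectly" note
tabulate proc outcome

* Count the observations with proc != 1, excluding missing values
* (in Stata, missing counts as != 1 unless excluded explicitly)
count if proc != 1 & !missing(proc)
```

If the count matches the 37 observations reported as not used, those are the cases in which the outcome never occurred.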
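Having spoken to one of my colleagues (who uses SPSS), I was advised to perform a stepwise selection of variables, e.g. by removing those with significance level > 0.1. Noting some of the problems with stepwise analysis (http://www.stata.com/support/faqs/st...ems/index.html), I nevertheless decided to try it, as I believe it is still used by many non-statisticians (in the medical field). I therefore ran the following:
Code:
stepwise, pr(0.1): logit outcome age sex x1 c1 x2 x3 c2 proc grp
As you can see, I am unable to specify 'grp' as a categorical variable in this.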
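As far as I can tell, -stepwise- does not accept factor-variable notation such as i.grp. One possible workaround is to create the indicator variables by hand and pass them as a single parenthesised term, so that they are kept or removed together. This is only a sketch: it assumes grp is coded 1, 2, 3, the names grp_1 to grp_3 are simply what the stub grp_ would produce, and proc is left out because it is dropped anyway.

```stata
* Create one 0/1 indicator per level of grp: grp_1, grp_2, grp_3
tabulate grp, generate(grp_)

* Parentheses make -stepwise- treat the two indicators as one term,
* so grp_2 and grp_3 are removed or retained together (grp_1 is the base)
stepwise, pr(0.1): logit outcome age sex x1 c1 x2 x3 c2 (grp_2 grp_3)
```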
The output:
Code:
note: proc dropped because of estimability
note: o.proc dropped because of estimability
note: 37 obs. dropped because of estimability
                      begin with full model
p = 0.8570 >= 0.1000  removing c2
p = 0.7386 >= 0.1000  removing x1
p = 0.3661 >= 0.1000  removing sex
p = 0.1929 >= 0.1000  removing age
p = 0.3303 >= 0.1000  removing grp
p = 0.2421 >= 0.1000  removing x2
p = 0.2371 >= 0.1000  removing x3
p = 0.2081 >= 0.1000  removing c1

Logistic regression                             Number of obs     =        113
                                                LR chi2(0)        =       0.00
                                                Prob > chi2       =          .
Log likelihood = -31.403149                     Pseudo R2         =     0.0000

------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -2.447166   .3474572    -7.04   0.000     -3.12817   -1.766163
------------------------------------------------------------------------------
There don't appear to be any significant predictors.
My questions therefore are:
1. Does my first -logit- command tell me all that I need to know?
2. Is it unsafe to remove 'proc' as a variable, as I have done in the second command, which results in more observations being used? That is, is this quick fix for complete separation likely to render the results inaccurate?
3. Does the -stepwise- command I have used support the results of the first -logit- command?
4. Whichever command I use, do I then need to go on to re-run the model with a more limited number of variables to produce a 'final model'?
Thanks,
Jem