  • Multivariate logistic regression

    Hi all,

    I am trying to run a multivariate logistic regression. I first ran univariate regressions on each independent variable to see which factors are significant; no errors appeared. I then put all the significant variables into a multivariate model. However, for one of the independent variables, YEAR (2010, 2011, 2012, 2013), Stata indicates that there are no observations for the reference year (2010), and another year (2011) is omitted due to collinearity.

    In the table for multivariate logistic regression:

    YEAR    Odds Ratio    Std. Err.
    2010    1             (empty)
    2012    .7434123      .1081421
    2013    1             (omitted)

    I would like to know what the problem is and whether there is any way to overcome it. Subsequently, in the multivariate analysis, I plan to manually remove variables that have P > 0.05 until I arrive at the final model. If I encounter the situation above, is it advisable to remove 'YEAR' from the model immediately and continue with my procedure? Thank you!

  • #2
    Rey:
    you seem to have a multiple logistic regression (one dependent variable; >=2 independent variables) rather than a multivariate one (>=2 dependent variables; >=2 independent variables).
    Besides, as per your explanation, there's nothing to fix with your data, as the reference year is, as expected, removed and 2011 is omitted due to collinearity.
    As an aside, targeting your efforts at a "tailor-made" model based on your data is not an approach that I would vouch for.
    Kind regards,
    Carlo
    (Stata 19.0)
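
    As an illustration of the point about the reference year, here is a minimal sketch of how the base level of a factor variable can be displayed or changed. The variable names outcome, year, x1 and x2 are hypothetical stand-ins and do not come from Rey's dataset.

    ```stata
    * Hypothetical variable names: outcome (0/1), year, x1, x2
    * With i.year, the smallest value of year is the base (reference) level by default
    logistic outcome i.year x1 x2

    * Display the base level as an explicit row in the output table
    logistic outcome i.year x1 x2, baselevels

    * Use 2012 as the reference year instead
    logistic outcome ib2012.year x1 x2
    ```

    The odds ratios for the remaining years are then read relative to whichever year is the base.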


    • #3
      The empty message for 2010 is probably due to the fact that all observations in 2010 have missing values on your dependent variable.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------


      • #4
        Originally posted by Carlo Lazzaro:
        Rey:
        you seem to have a multiple logistic regression (one dependent variable; >=2 independent variables) rather than a multivariate one (>=2 dependent variables; >=2 independent variables).
        Besides, as per your explanation, there's nothing to fix with your data, as the reference year is, as expected, removed and 2011 is omitted due to collinearity.
        As an aside, targeting your efforts at a "tailor-made" model based on your data is not an approach that I would vouch for.

        Thanks for the reply. If there is nothing to fix in the data, should I still remove 'YEAR' from the multiple logistic regression?


        • #5
          Originally posted by Maarten Buis:
          The empty message for 2010 is probably due to the fact that all observations in 2010 have missing values on your dependent variable.

          That's the part that I am confused about. I have already run a cross-tabulation (YEAR against outcome), and for each year there are at least 100 observations where the outcome is '1' and 3,000 observations where the outcome is '0'.


          • #6
            Rey:
            I would keep -YEAR- as a predictor and instead check whether the problem with 2010 is related to some selection that you made when hunting for significant predictors.
            Another tentative guess is a perfect prediction issue, but if that were the case, Stata should have issued a dedicated warning message.
            Otherwise, it seems strange that Stata reports an empty SE without missing data on your depvar, as Maarten suggested.
            Kind regards,
            Carlo
            (Stata 19.0)


            • #7
              That's the part that I am confused about. I have already run a cross-tabulation (YEAR against outcome), and for each year there are at least 100 observations where the outcome is '1' and 3,000 observations where the outcome is '0'.
              That may well be, but if each of those 100 or so observations has a missing value on one or more of the other variables in your model, then you may find there are no such observations left in the estimation sample. Remember, the only observations included are those with non-missing values on every variable in the model. It may be that there are none like that in year 2010.

              Perfect prediction is another possibility: it may be that the outcome doesn't vary among the observations from 2010 that remain in the estimation sample. But Stata would tell you that in a message near the top of the regression output.
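
              A minimal sketch of how these two possibilities could be checked. The variable names (outcome, year, x1, x2) and the helper variable nmiss are hypothetical placeholders for whatever is in the actual model.

              ```stata
              * Hypothetical variable names: outcome (0/1), year, x1, x2
              * Count missing values per observation across all model variables
              egen nmiss = rowmiss(outcome year x1 x2)

              * Cross-tabulate complete cases only.
              * No 2010 row: every 2010 observation is missing something.
              * A 2010 row with counts in only one outcome column: perfect prediction.
              tabulate year outcome if nmiss == 0

              * Show which variables account for the missing values in 2010
              misstable summarize outcome x1 x2 if year == 2010

              * Fit the model, then confirm which years reached the estimation sample
              logistic outcome i.year x1 x2
              tabulate year if e(sample)
              ```

              If 2010 has no complete cases, the variable reported by misstable summarize as mostly missing in that year is the likely culprit.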
