
  • Solving Dummy Trap

    Hello Stata friends,

    I'm trying to help my girlfriend with her master's thesis analysis and noticed that she fell victim to the dummy variable trap.
    Her model includes 3 independent variables (IVs) and consequently 3 hypotheses. All IVs are dummy variables: if one of them takes the value of 1, the other two are 0. Now, when we run the OLS regression, Stata naturally omits one of the IVs due to perfect multicollinearity:
    (the IVs are ISVO, CSVO, and ASVO)

    Code:
    . regress $ylist $xlist gender nationality education download vt
    note: ASVO omitted because of collinearity
    
          Source |       SS           df       MS      Number of obs   =       578
    -------------+----------------------------------   F(7, 570)       =     10.34
           Model |  14.9357089         7   2.1336727   Prob > F        =    0.0000
        Residual |  117.645606       570    .2063958   R-squared       =    0.1127
    -------------+----------------------------------   Adj R-squared   =    0.1018
           Total |  132.581315       577  .229776976   Root MSE        =    .45431
    
    ------------------------------------------------------------------------------
        accuracy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            ISVO |  -.3565605    .055648    -6.41   0.000    -.4658606   -.2472603
            CSVO |  -.2683451   .0460203    -5.83   0.000    -.3587352    -.177955
            ASVO |          0  (omitted)
          gender |  -.0733097   .0407658    -1.80   0.073    -.1533792    .0067598
     nationality |   .0131282   .0050501     2.60   0.010     .0032092    .0230472
       education |   .0004471   .0032192     0.14   0.890    -.0058757      .00677
        download |   .1688039   .0566191     2.98   0.003     .0575963    .2800114
              vt |   .0604796    .041079     1.47   0.141     -.020205    .1411643
           _cons |   .4201674   .0689283     6.10   0.000     .2847829     .555552
    ------------------------------------------------------------------------------
    I've seen this before, but I'm not sure how to deal with it so that we can test all three hypotheses. Thank you in advance, guys.
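
    For what it's worth, the collinearity is easy to verify directly: since exactly one of the three dummies is 1 for every observation, they always sum to 1, which is perfectly collinear with the constant. A quick check (using the variable names ISVO, CSVO, and ASVO from the output above):

    Code:
    . gen byte dummysum = ISVO + CSVO + ASVO
    . summarize dummysum

    If the mean, min, and max are all 1, the three indicators are exhaustive and mutually exclusive.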

    Konstantin

  • #2
    I'm not sure what "three hypotheses" you mean exactly, but my guess is that you want to test each against a null of 0; if that is the case, use the -hascons- option; see
    Code:
    help regress
    I am also confused by your inclusion of 2 global macros; first, note that globals are dangerous, as discussed many times on this list; second, why a macro at all for the single outcome variable?

    Finally, your three indicator (dummy) variables are unclear to me and might be better dealt with using a single variable and factor-variable notation; this would, e.g., make use of -margins- easier if you wanted to use that command.
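
    As a sketch of what I mean (the variable names ISVO, CSVO, ASVO, and accuracy are taken from your output; adjust as needed):

    Code:
    . gen byte svo = 1 if ISVO==1
    . replace svo = 2 if CSVO==1
    . replace svo = 3 if ASVO==1
    . label define svolbl 1 "ISVO" 2 "CSVO" 3 "ASVO"
    . label values svo svolbl
    . regress accuracy i.svo gender nationality education download vt

    With factor-variable notation, Stata picks the base category for you (or you can set it yourself with, e.g., ib3.svo), and -margins- works directly.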



    • #3
      What are the hypotheses? The above happens all the time when you include indicator variables, so I am not sure what you want. My guess is that the third hypothesis logically follows from the other two and is therefore redundant.
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam

      Comment


      • #4
        Originally posted by Rich Goldstein:
        I'm not sure what "three hypotheses" you mean exactly but my guess is that you want to test each against a null of 0;
        If Rich is correct, then just interpret the constant term in the model as estimated ...

        Best
        Daniel
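
        To spell it out: with ASVO omitted, the constant is the predicted accuracy for the ASVO group (when all other covariates are zero), and the ISVO and CSVO coefficients are deviations from that baseline. For example (variable names taken from #1):

        Code:
        . regress accuracy ISVO CSVO gender nationality education download vt
        . lincom _cons
        . lincom _cons + ISVO

        The first -lincom- gives the ASVO baseline, the second the corresponding prediction for the ISVO group.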



        • #5
          Thank you for the answers.
          @Mr Goldstein: I don't fully comprehend the issue with the two global macros but will investigate it.

          Concerning the hypotheses they go like:

          'ISVO' is negatively related to accuracy.
          'CSVO' is negatively related to accuracy.
          'ASVO' is positively related to accuracy.

          Originally posted by Richard Williams:
          My guess is that the third hypothesis logically follows from the other two and is therefore redundant.

          This describes it pretty well, Mr. Williams. If either ISVO or CSVO takes the value of 1, ASVO is consequently 0. The question is: how can we show that if ISVO and CSVO are 0 (and ASVO consequently 1), accuracy will increase?



          • #6
            I believe that the hypotheses are not very well stated. A relationship is usually described as "the more x, the less y", and this kind of description does not make sense for indicator variables. I think the hypotheses should be rephrased to read, e.g., accuracy is lower for ISVO [than for ASVO [and/or CSVO]]. The important thing here is to make the comparison an explicit part of the hypotheses. From that point of view, you want to follow Rich's advice and have a single variable that takes on three values: ISVO, CSVO, and ASVO. Say that variable is called x; then

            Code:
            regress y i.x
            margins x , pwcompare
            can be used to obtain the pairwise differences.
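
            If such a variable does not exist yet, it can be created from the three indicators (names assumed from #1):

            Code:
            . gen byte x = cond(ISVO==1, 1, cond(CSVO==1, 2, 3))
            . label define xlbl 1 "ISVO" 2 "CSVO" 3 "ASVO"
            . label values x xlbl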

            Best
            Daniel

