
  • Solving Dummy Trap

    Hello Stata friends,

    I'm trying to help my girlfriend with her master's thesis analysis and noticed that she fell victim to the dummy variable trap.
    Her model includes 3 independent variables (IVs) and consequently 3 hypotheses. All IVs are dummy variables: if one of them takes the value of 1, the other two are 0. Now, when we run the OLS regression, Stata naturally omits one of the IVs due to perfect multicollinearity:
    (the IVs are ISVO, CSVO, and ASVO)

    Code:
    . regress $ylist $xlist gender nationality education download vt
    note: ASVO omitted because of collinearity
    
          Source |       SS           df       MS      Number of obs   =       578
    -------------+----------------------------------   F(7, 570)       =     10.34
           Model |  14.9357089         7   2.1336727   Prob > F        =    0.0000
        Residual |  117.645606       570    .2063958   R-squared       =    0.1127
    -------------+----------------------------------   Adj R-squared   =    0.1018
           Total |  132.581315       577  .229776976   Root MSE        =    .45431
    
    ------------------------------------------------------------------------------
        accuracy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            ISVO |  -.3565605    .055648    -6.41   0.000    -.4658606   -.2472603
            CSVO |  -.2683451   .0460203    -5.83   0.000    -.3587352    -.177955
            ASVO |          0  (omitted)
          gender |  -.0733097   .0407658    -1.80   0.073    -.1533792    .0067598
     nationality |   .0131282   .0050501     2.60   0.010     .0032092    .0230472
       education |   .0004471   .0032192     0.14   0.890    -.0058757      .00677
        download |   .1688039   .0566191     2.98   0.003     .0575963    .2800114
              vt |   .0604796    .041079     1.47   0.141     -.020205    .1411643
           _cons |   .4201674   .0689283     6.10   0.000     .2847829     .555552
    ------------------------------------------------------------------------------
    I've seen this before, but I'm not sure how to deal with it so that we can test all three hypotheses. Thank you in advance, guys.
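
    For what it's worth, the collinearity is easy to verify directly: since exactly one of the three dummies is 1 for every observation, they always sum to 1, which is perfectly collinear with the constant. A quick check (using the variable names ISVO, CSVO, and ASVO from the output above):

    Code:
    . gen byte dummysum = ISVO + CSVO + ASVO
    . summarize dummysum

    If the mean, min, and max are all 1, the three indicators are exhaustive and mutually exclusive.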

    Konstantin

  • #2
    I'm not sure what "three hypotheses" you mean exactly, but my guess is that you want to test each against a null of 0; if that is the case, use the -hascons- option; see
    Code:
    help regress
    I am also confused by your inclusion of 2 global macros; first, note that globals are dangerous, as discussed many times on this list; second, why a macro at all for the single outcome variable?

    Finally, your three indicator (dummy) variables are unclear to me and might be better dealt with using a single variable and factor-variable notation; this would, e.g., make use of -margins- easier if you wanted to use that command.
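
    As a sketch of what I mean (the variable names ISVO, CSVO, ASVO, and accuracy are taken from your output; adjust as needed):

    Code:
    . gen byte svo = 1 if ISVO==1
    . replace svo = 2 if CSVO==1
    . replace svo = 3 if ASVO==1
    . label define svolbl 1 "ISVO" 2 "CSVO" 3 "ASVO"
    . label values svo svolbl
    . regress accuracy i.svo gender nationality education download vt

    With factor-variable notation, Stata picks the base category for you (or you can set it yourself with, e.g., ib3.svo), and -margins- works directly.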



    • #3
      What are the hypotheses? The above happens all the time when you include indicator variables, so I am not sure what you want. My guess is that the third hypothesis logically follows from the other two and is therefore redundant.
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam

      Comment


      • #4
        Originally posted by Rich Goldstein:
        I'm not sure what "three hypotheses" you mean exactly but my guess is that you want to test each against a null of 0;
        If Rich is correct, then just interpret the constant term in the model as estimated ...

        Best
        Daniel
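
        To spell it out: with ASVO omitted, the constant is the predicted accuracy for the ASVO group (when all other covariates are zero), and the ISVO and CSVO coefficients are deviations from that baseline. For example (variable names taken from #1):

        Code:
        . regress accuracy ISVO CSVO gender nationality education download vt
        . lincom _cons
        . lincom _cons + ISVO

        The first -lincom- gives the ASVO baseline, the second the corresponding prediction for the ISVO group.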



        • #5
          Thank you for the answers.
          @Mr Goldstein: I don't fully comprehend the issue with the two global macros but will investigate it.

          Concerning the hypotheses they go like:

          'ISVO' is negatively related to accuracy.
          'CSVO' is negatively related to accuracy.
          'ASVO' is positively related to accuracy.

          Originally posted by Richard Williams:
          My guess is that the third hypothesis logically follows from the other two and is therefore redundant.

          This describes it pretty well, Mr. Williams. If either ISVO or CSVO takes the value of 1, ASVO is consequently 0. The question is: how can we show that if ISVO and CSVO are 0 (and ASVO consequently 1), accuracy will increase?



          • #6
            I believe that the hypotheses are not very well stated. A relationship is usually described as "the more x, the less y", and this kind of description does not make sense for indicator variables. I think the hypotheses should be rephrased to read, e.g., accuracy is lower for ISVO [than for ASVO [and/or CSVO]]. The important thing here is to make the comparison an explicit part of the hypotheses. From that point of view, you want to follow Rich's advice and have a single variable that takes on three values: ISVO, CSVO, and ASVO. Say that variable is called x; then

            Code:
            regress y i.x
            margins x , pwcompare
            can be used to obtain the pairwise differences.
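
            If such a variable does not exist yet, it can be created from the three indicators (names assumed from #1):

            Code:
            . gen byte x = cond(ISVO==1, 1, cond(CSVO==1, 2, 3))
            . label define xlbl 1 "ISVO" 2 "CSVO" 3 "ASVO"
            . label values x xlbl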

            Best
            Daniel

