  • Coding help for regression - dummy variables

    Hello!

    Quick question, and very simple to answer.

    I want to look at whether participation in an early childhood education program affects later school readiness. My predictor will be one dummy variable with two groups: participation and non-participation. Since I want the regression coefficient (b) to reflect participation, how should I code the variable so that participation is what is being looked at in my regression?

    Eg:
    -Participation = 1, non-participation=0
    OR
    -participation=0, non-participation=1

    Thank you for the clarification. I'm just all turned around and can't seem to figure it out.

    Paige


  • #2
    Paige:
    you should create a two-level categorical predictor (say, 0=no participation; 1=participation), something along the lines of:
    Code:
    . use "C:\Program Files\Stata16\ado\base\a\auto.dta"
    (1978 Automobile Data)
    
    . tab foreign
    
       Car type |      Freq.     Percent        Cum.
    ------------+-----------------------------------
       Domestic |         52       70.27       70.27
        Foreign |         22       29.73      100.00
    ------------+-----------------------------------
          Total |         74      100.00
    
    . regress price i.foreign
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(1, 72)        =      0.17
           Model |  1507382.66         1  1507382.66   Prob > F        =    0.6802
        Residual |   633558013        72  8799416.85   R-squared       =    0.0024
    -------------+----------------------------------   Adj R-squared   =   -0.0115
           Total |   635065396        73  8699525.97   Root MSE        =    2966.4
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         foreign |
        Foreign  |   312.2587   754.4488     0.41   0.680    -1191.708    1816.225
           _cons |   6072.423    411.363    14.76   0.000     5252.386     6892.46
    ------------------------------------------------------------------------------
    
    .
    Please note the use of -fvvarlist- notation.
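    A minimal sketch of what -fvvarlist- notation adds, assuming Stata's shipped auto dataset (loaded with -sysuse- instead of a hard-coded path): the base level can be chosen with the -ib#.- operator, so the coefficient is reported for whichever group you did not pick as the base. Here -ib1.- makes Foreign the base, so the coefficient becomes Domestic relative to Foreign.

    ```stata
    * Load the example dataset shipped with Stata
    sysuse auto, clear

    * Default: level 0 (Domestic) is the base, so the coefficient is Foreign vs Domestic
    regress price i.foreign

    * ib1. makes level 1 (Foreign) the base, so the coefficient is Domestic vs Foreign
    regress price ib1.foreign
    ```

    Going by the output above, the second model should report the same magnitude with the opposite sign (-312.2587), and its constant should be the foreign-car mean, 6072.423 + 312.2587 = 6384.682. The fit (F, R-squared) is identical; only the reference group changes.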
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      The choice is substantive and hinges on what you want to explain or predict. Often the state of concern is a minority or negative condition, e.g. dying, becoming sick or unemployed, being a smoker. If your concern is with explaining participation, that should be the indicator (*) variable.

      (*) For an opinionated discussion of why this is a better term than "dummy", see https://www.stata-journal.com/articl...article=dm0099



      • #4
        If you are interested in the effect of participation on something, code your dummy as 1 for participation and 0 otherwise. If you are interested in the effect of non-participation, code it as 1 for non-participation and 0 otherwise.

        This makes the results easier to interpret: the coefficient on the dummy is the estimated difference in the outcome for the group coded 1 relative to the group coded 0 (holding any other predictors constant).
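        A minimal sketch of this point, again using Stata's auto dataset as a stand-in (foreign playing the role of participation, price the role of school readiness; the variable domestic is created here purely for illustration): reversing the 0/1 coding leaves the fit unchanged and only flips the sign of the coefficient, while the constant switches to the mean of the other group.

        ```stata
        * Load the example dataset shipped with Stata
        sysuse auto, clear

        * Indicator as stored: 1 = Foreign, 0 = Domestic
        regress price foreign

        * Reverse the coding: 1 = Domestic, 0 = Foreign (foreign has no missing values)
        generate byte domestic = 1 - foreign
        label variable domestic "1 = domestic car, 0 = foreign car"
        regress price domestic
        ```

        Based on the results shown in #2, the first model should give a coefficient of 312.2587 and the second -312.2587, with a constant of about 6384.682 (the foreign-car mean) instead of 6072.423.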
