
  • Regression on Continuous Variables with ANOVA Normalization

    Dear Statalisters-

    I would like to run a one-way ANOVA-type regression of a continuous variable on a categorical variable and get the output with ANOVA normalization instead of dummy-variable normalization.

    I generated some data in which the grand mean is 50 and the four group means are 44, 48, 52, and 56, respectively:

    Code:
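    clear
    set obs 1000                          // 1,000 observations
    egen group = seq(), to(4) block(250)  // four groups of 250 observations each
    set seed 112                          // make the random draws reproducible
    * group means of 44, 48, 52, 56 plus normal noise with SD 6
    gen y = 50 + (12*(group-1))/3-6 + rnormal(0,6)
    reg y i.group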
    I get this output:

    Code:
    reg y i.group

          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(3, 996)       =    174.74
           Model |  19051.9031         3  6350.63438   Prob > F        =    0.0000
        Residual |  36198.6462       996  36.3440223   R-squared       =    0.3448
    -------------+----------------------------------   Adj R-squared   =    0.3429
           Total |  55250.5494       999  55.3058552   Root MSE        =    6.0286

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           group |
              2  |   3.229101   .5392144     5.99   0.000     2.170974    4.287227
              3  |   7.916969   .5392144    14.68   0.000     6.858843    8.975096
              4  |   11.41936   .5392144    21.18   0.000     10.36123    12.47749
                 |
           _cons |   44.50601   .3812822   116.73   0.000     43.75781    45.25422
    ------------------------------------------------------------------------------

    This is the typical dummy variable normalization in which the first group mean is suppressed and "sent" to the constant. So, the constant, beta_0, is the mean of Group 1; the Group 2 mean is (beta_0 + beta_1); the Group 3 mean is (beta_0 + beta_2), etc.

    What I would like is for the output to be expressed in ANOVA-type normalization. In dummy-variable normalization, the identifying restriction is to suppress the coefficient for one of the categories. In ANOVA-type normalization, the identifying restriction is that the coefficients sum to 0 (the sum of beta_i over all groups equals 0). So, the output for the above regression would give the grand mean, 50, as the constant, and all four groups would have regression coefficients: -6, -2, 2, and 6 (or thereabouts), respectively.

    Does anyone know how to do this?

    I greatly appreciate your help.

    Best,
    David
    Last edited by David Crow; 05 Dec 2016, 16:24.
    Web site:
    http://investigadores.cide.edu/crow/

    Las Américas y el Mundo:
    http://lasamericasyelmundo.cide.edu/

    ==========================================
    David Crow
    Associate Professor, División de Estudios Internacionales
    Centro de Investigación y Docencia Económicas (CIDE)
    ==========================================
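The two normalizations described in the post above are linked by simple arithmetic, which can be sketched as follows. The symbols m_j (group means), mu (grand mean) and alpha_j (sum-to-zero coefficients) are notation introduced here for illustration and are not in the original post; beta_0 is the constant and beta_1 to beta_3 are the coefficients on groups 2 to 4 in the dummy-variable output. The grand mean is taken to be the unweighted mean of the four group means, which coincides with the overall mean of y here because every group has 250 observations.

```latex
\begin{align*}
m_1 &= \beta_0, \qquad m_j = \beta_0 + \beta_{j-1} \quad (j = 2, 3, 4) \\
\mu &= \tfrac{1}{4}\sum_{j=1}^{4} m_j = \beta_0 + \tfrac{1}{4}(\beta_1 + \beta_2 + \beta_3) \\
\alpha_1 &= m_1 - \mu = -\tfrac{1}{4}(\beta_1 + \beta_2 + \beta_3) \\
\alpha_j &= m_j - \mu = \beta_{j-1} - \tfrac{1}{4}(\beta_1 + \beta_2 + \beta_3) \quad (j = 2, 3, 4) \\
\sum_{j=1}^{4} \alpha_j &= 0
\end{align*}
```

Plugging in the coefficients reported in the post (44.50601, 3.229101, 7.916969, 11.41936) gives a grand mean of about 50.147 and group coefficients of about -5.641, -2.412, 2.276, and 5.778, which is close to the 50 and -6, -2, 2, 6 used to simulate the data.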

  • #2
    Hi David. Does the g. contrast operator give what you're after? Using your code to simulate the data:

    Code:
    . reg y i.group
    
          Source |       SS       df       MS              Number of obs =    1000
    -------------+------------------------------           F(  3,   996) =  193.11
           Model |  20731.4776     3  6910.49253           Prob > F      =  0.0000
        Residual |  35642.6277   996  35.7857708           R-squared     =  0.3677
    -------------+------------------------------           Adj R-squared =  0.3658
           Total |  56374.1053   999  56.4305359           Root MSE      =  5.9821
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           group |
              2  |   3.465081   .5350572     6.48   0.000     2.415113     4.51505
              3  |   7.992682   .5350572    14.94   0.000     6.942713    9.042651
              4  |   12.04898   .5350572    22.52   0.000     10.99901    13.09895
                 |
           _cons |   44.24847   .3783425   116.95   0.000     43.50603    44.99091
    ------------------------------------------------------------------------------
    
    . contrast g.group, effects // add nowald option to suppress F-tests
    
    Contrasts of marginal linear predictions
    
    Margins      : asbalanced
    
    ------------------------------------------------
                 |         df           F        P>F
    -------------+----------------------------------
           group |
    (1 vs mean)  |          1      321.69     0.0000
    (2 vs mean)  |          1       54.17     0.0000
    (3 vs mean)  |          1       41.71     0.0000
    (4 vs mean)  |          1      354.86     0.0000
          Joint  |          3      193.11     0.0000
                 |
     Denominator |        996
    ------------------------------------------------
    
    ------------------------------------------------------------------------------
                 |   Contrast   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           group |
    (1 vs mean)  |  -5.876685   .3276543   -17.94   0.000    -6.519657   -5.233714
    (2 vs mean)  |  -2.411604   .3276543    -7.36   0.000    -3.054576   -1.768632
    (3 vs mean)  |   2.115996   .3276543     6.46   0.000     1.473024    2.758968
    (4 vs mean)  |   6.172293   .3276543    18.84   0.000     5.529321    6.815265
    ------------------------------------------------------------------------------
    Notice that the 3-df F-test from contrast matches the model F-test from regress.

    HTH.
    --
    Bruce Weaver
    Email: [email protected]
    Version: Stata/MP 18.5 (Windows)



    • #3
      Sorry. The topic should read "Regression on Categorical Variables with ANOVA Normalization." Can anyone tell me how to change a topic title, or delete the post?

      Thanks,
      David


      • #4
        Bruce-

        Many thanks for your reply. The g. contrast operator gets me close, but not quite there. The problem is that the grand mean is still missing. Any other options? Is there a way to get what I want in the context of the "reg" command?

        Best,
        David
        Last edited by David Crow; 05 Dec 2016, 16:25.


        • #5
          How about using margins to show the grand mean of y, like this? (Note that the original version of this post used a different, clunkier method.)

          Code:
          clear
          set obs 1000
          egen group = seq(), to(4) block(250)
          set seed 112
          gen y = 50 + (12*(group-1))/3-6 + rnormal(0,6)
          reg y i.group
          margins // display grand mean of y
          contrast g.group, effects // add nowald option to suppress F-tests
          Last edited by Bruce Weaver; 05 Dec 2016, 16:38. Reason: Original version used a clunkier method for showing the grand mean of y.
          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 18.5 (Windows)
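For readers who want the grand mean and the deviations to come straight out of regress, one possible approach is to build effect-coded (sum-to-zero) regressors by hand. This is only a sketch and was not proposed in the thread: the variable names e1, e2 and e3 are hypothetical, and group 4 is arbitrarily chosen as the level coded -1. Under this coding, _cons estimates the unweighted mean of the four group means, and the coefficients on e1 to e3 estimate the deviations of groups 1 to 3 from it.

```stata
* Simulate the data exactly as in the original post
clear
set obs 1000
egen group = seq(), to(4) block(250)
set seed 112
gen y = 50 + (12*(group-1))/3-6 + rnormal(0,6)

* Effect coding (hypothetical names e1-e3):
* e_j is 1 in group j, -1 in group 4, and 0 otherwise
forvalues j = 1/3 {
    gen e`j' = (group == `j') - (group == 4)
}

* _cons is the unweighted mean of the group means;
* e1-e3 are the deviations of groups 1-3 from that mean
regress y e1 e2 e3

* The deviation for group 4 follows from the sum-to-zero restriction
display "group 4 deviation: " -(_b[e1] + _b[e2] + _b[e3])
```

The display line gives only a point estimate for group 4. The contrast g.group, effects output shown earlier in the thread reports all four deviations with standard errors.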
