Regression on Continuous Variables with ANOVA Normalization

David Crow

Join Date: Apr 2014

Posts: 37
#1


05 Dec 2016, 15:57

Dear Statalisters-

I would like to run a one-way ANOVA-type regression of a continuous variable on a categorical variable and get the output with ANOVA normalization instead of dummy-variable normalization.

I generated some data in which the grand mean is 50 and the four group means are 44, 48, 52, and 56, respectively:

Code:

clear
set obs 1000
egen group = seq(), to(4) block(250)
set seed 112
gen y = 50 + (12*(group-1))/3-6 + rnormal(0,6)
reg y i.group

I get this output:

Code:

. reg y i.group

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(3, 996)       =    174.74
       Model |  19051.9031         3  6350.63438   Prob > F        =    0.0000
    Residual |  36198.6462       996  36.3440223   R-squared       =    0.3448
-------------+----------------------------------   Adj R-squared   =    0.3429
       Total |  55250.5494       999  55.3058552   Root MSE        =    6.0286

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |
          2  |   3.229101   .5392144     5.99   0.000     2.170974    4.287227
          3  |   7.916969   .5392144    14.68   0.000     6.858843    8.975096
          4  |   11.41936   .5392144    21.18   0.000     10.36123    12.47749
             |
       _cons |   44.50601   .3812822   116.73   0.000     43.75781    45.25422
------------------------------------------------------------------------------

This is the typical dummy-variable normalization, in which the coefficient for the first group is suppressed and its mean is "sent" to the constant. So, the constant, beta_0, is the mean of Group 1; the Group 2 mean is (beta_0 + beta_1); the Group 3 mean is (beta_0 + beta_2); and the Group 4 mean is (beta_0 + beta_3).

What I would like is for the output to be expressed as ANOVA-type normalization. In dummy variable normalization, the identifying restriction is to suppress the coefficient for one of the categories. In ANOVA-type normalization, the identifying restriction is that the coefficients sum to 0: summation beta_i = 0. So, the output for the above regression would give the grand mean, 50, as the constant, and all four groups would have regression coefficients: -6, -2, 2, and 6 (or thereabouts), respectively.
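To make the two normalizations concrete, the relationship between them can be written out as below. This is only a sketch: it assumes a balanced design (250 observations per group, as in the simulated data), so that the grand mean equals the unweighted mean of the group means. The symbols \mu and \alpha_j are notation introduced here for the ANOVA-type constant and coefficients; they do not appear in the Stata output.

```latex
\begin{aligned}
\text{Dummy-variable normalization:}\quad
  & \bar{y}_1 = \beta_0, \qquad \bar{y}_j = \beta_0 + \beta_{j-1} \quad (j = 2, 3, 4) \\
\text{ANOVA normalization:}\quad
  & \bar{y}_j = \mu + \alpha_j, \qquad \sum_{j=1}^{4} \alpha_j = 0 \\
\text{Hence:}\quad
  & \mu = \beta_0 + \tfrac{1}{4}\,(\beta_1 + \beta_2 + \beta_3), \\
  & \alpha_1 = \beta_0 - \mu, \qquad \alpha_j = \beta_0 + \beta_{j-1} - \mu \quad (j = 2, 3, 4)
\end{aligned}
```

Applied to the output above, this gives \mu of about 44.506 + (3.229 + 7.917 + 11.419)/4 = 50.147, and deviations of about -5.641, -2.412, 2.276, and 5.778, which sum to zero up to rounding.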

Does anyone know how to do this?

I greatly appreciate your help.

Best,
David

Last edited by David Crow; 05 Dec 2016, 16:24.

Web site:
http://investigadores.cide.edu/crow/

Las Américas y el Mundo:
http://lasamericasyelmundo.cide.edu/

==========================================
David Crow
Associate Professor, División de Estudios Internacionales
Centro de Investigación y Docencia Económicas (CIDE)
==========================================

Bruce Weaver

Join Date: May 2014
Posts: 1132
#2

05 Dec 2016, 16:13

Hi David. Does the g. contrast operator give what you're after? Using your code to simulate the data:

Code:

. reg y i.group

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  3,   996) =  193.11
       Model |  20731.4776     3  6910.49253           Prob > F      =  0.0000
    Residual |  35642.6277   996  35.7857708           R-squared     =  0.3677
-------------+------------------------------           Adj R-squared =  0.3658
       Total |  56374.1053   999  56.4305359           Root MSE      =  5.9821

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |
          2  |   3.465081   .5350572     6.48   0.000     2.415113     4.51505
          3  |   7.992682   .5350572    14.94   0.000     6.942713    9.042651
          4  |   12.04898   .5350572    22.52   0.000     10.99901    13.09895
             |
       _cons |   44.24847   .3783425   116.95   0.000     43.50603    44.99091
------------------------------------------------------------------------------

. contrast g.group, effects // add nowald option to suppress F-tests

Contrasts of marginal linear predictions

Margins      : asbalanced

------------------------------------------------
             |         df           F        P>F
-------------+----------------------------------
       group |
(1 vs mean)  |          1      321.69     0.0000
(2 vs mean)  |          1       54.17     0.0000
(3 vs mean)  |          1       41.71     0.0000
(4 vs mean)  |          1      354.86     0.0000
      Joint  |          3      193.11     0.0000
             |
 Denominator |        996
------------------------------------------------

------------------------------------------------------------------------------
             |   Contrast   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |
(1 vs mean)  |  -5.876685   .3276543   -17.94   0.000    -6.519657   -5.233714
(2 vs mean)  |  -2.411604   .3276543    -7.36   0.000    -3.054576   -1.768632
(3 vs mean)  |   2.115996   .3276543     6.46   0.000     1.473024    2.758968
(4 vs mean)  |   6.172293   .3276543    18.84   0.000     5.529321    6.815265
------------------------------------------------------------------------------

Notice that the 3-df F-test from contrast matches the model F-test from regress.

HTH.

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)


David Crow

Join Date: Apr 2014

Posts: 37
#3

05 Dec 2016, 16:14

Sorry. The topic should read "Regression on Categorical Variables with ANOVA Normalization." Can anyone tell me how to change a topic title, or delete the post?

Thanks,
David

David Crow

Join Date: Apr 2014

Posts: 37
#4

05 Dec 2016, 16:16

Bruce-

Many thanks for your reply. The g. contrast operator gets me close, but not quite there. The problem is that the grand mean is still missing. Any other options? Is there a way to get what I want in the context of the "reg" command?

Best,
David

Last edited by David Crow; 05 Dec 2016, 16:25.

Bruce Weaver

Join Date: May 2014

Posts: 1132
#5

05 Dec 2016, 16:31

How about using margins to show the grand mean of y, like this? (Note that the original version of this post used a different, clunkier method.)

Code:

clear
set obs 1000
egen group = seq(), to(4) block(250)
set seed 112
gen y = 50 + (12*(group-1))/3-6 + rnormal(0,6)
reg y i.group
margins // display grand mean of y
contrast g.group, effects // add nowald option to suppress F-tests
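If the aim is to see the grand mean and the deviations in a single regress table, one possibility is to build effect-coded (deviation-coded) regressors by hand. The following is a minimal sketch and not part of the original reply. It assumes the simulated data above are in memory, and e1, e2, and e3 are hypothetical variable names chosen for illustration.

```stata
* Effect (deviation) coding by hand; group 4 is the omitted category.
* e1-e3 are hypothetical names: +1 for own group, -1 for group 4, 0 otherwise.
forvalues k = 1/3 {
    generate e`k' = (group == `k') - (group == 4)
}

* _cons is the unweighted mean of the four group means, which equals the
* grand mean here only because the design is balanced (250 per group).
* The coefficients on e1-e3 are the deviations of groups 1-3 from that mean.
regress y e1 e2 e3

* The deviation for group 4 is implied by the sum-to-zero restriction.
lincom -e1 - e2 - e3
```

With unequal group sizes the constant would still be the unweighted mean of the group means, not the overall sample mean of y, so the two readings of "grand mean" would no longer coincide.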

Last edited by Bruce Weaver; 05 Dec 2016, 16:38. Reason: Original version used a clunkier method for showing the grand mean of y.
