Dear Statalisters-
I would like to run a one-way ANOVA-type regression of a continuous variable on a categorical variable and get the output with ANOVA normalization instead of dummy-variable normalization.
I generated some data in which the grand mean is 50 and the four group means are 44, 48, 52, and 56, respectively:
Code:
clear
set obs 1000
egen group = seq(), to(4) block(250)
set seed 112
gen y = 50 + (12*(group-1))/3 - 6 + rnormal(0,6)
reg y i.group
I get this output:
. reg y i.group

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(3, 996)      =    174.74
       Model |  19051.9031         3  6350.63438   Prob > F        =    0.0000
    Residual |  36198.6462       996  36.3440223   R-squared       =    0.3448
-------------+----------------------------------   Adj R-squared   =    0.3429
       Total |  55250.5494       999  55.3058552   Root MSE        =    6.0286

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |
          2  |   3.229101   .5392144     5.99   0.000     2.170974    4.287227
          3  |   7.916969   .5392144    14.68   0.000     6.858843    8.975096
          4  |   11.41936   .5392144    21.18   0.000     10.36123    12.47749
             |
       _cons |   44.50601   .3812822   116.73   0.000     43.75781    45.25422
------------------------------------------------------------------------------
This is the usual dummy-variable normalization, in which the coefficient for the first group is omitted and that group's mean is absorbed into the constant. So the constant, beta_0, is the mean of Group 1; the Group 2 mean is (beta_0 + beta_1); the Group 3 mean is (beta_0 + beta_2); and so on.
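To make that mapping concrete, here is a minimal sketch that recovers each group mean from the dummy-coded fit with lincom. It assumes the simulated data from the code above are still in memory, and it does not change the normalization; it only adds the constant back to each coefficient.

```stata
* Refit quietly so the dummy-coded estimates are the active results
quietly regress y i.group

* Group 1 mean is the constant itself
lincom _cons

* Groups 2-4: constant plus the corresponding dummy coefficient
lincom _cons + 2.group
lincom _cons + 3.group
lincom _cons + 4.group
```

Each lincom call reports the group mean with its own standard error and confidence interval, which the regression table alone does not show for Groups 2 to 4.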
What I would like is for the output to use the ANOVA-type normalization instead. In the dummy-variable normalization, the identifying restriction is that the coefficient for one category is set to zero. In the ANOVA-type normalization, the identifying restriction is that the coefficients sum to zero: sum_i beta_i = 0. The output for the regression above would then report the grand mean, 50, as the constant, and all four groups would have regression coefficients: -6, -2, 2, and 6 (or thereabouts), respectively.
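One way the sum-to-zero restriction might be imposed by hand is effect (deviation) coding, sketched below. This is an illustration under stated assumptions, not necessarily the best Stata idiom: the variable names e1, e2, and e3 are made up for the sketch, and Group 4 is arbitrarily chosen as the category whose coefficient is implied by the restriction. Each e-variable equals 1 for its own group, -1 for Group 4, and 0 otherwise.

```stata
* Effect-coded regressors (hypothetical names e1-e3); Group 4 is coded -1
forvalues j = 1/3 {
    generate e`j' = (group == `j') - (group == 4)
}

* _cons is now the unweighted mean of the four group means,
* and e1-e3 are the deviations of Groups 1-3 from that mean
regress y e1 e2 e3

* Group 4's deviation follows from the restriction that all four sum to zero
lincom -e1 - e2 - e3
```

Because the design here is balanced (250 observations per group), the unweighted mean of the group means coincides with the overall sample mean of y. From the dummy-coded output above, the arithmetic implies a constant of roughly 50.15 and deviations of about -5.64, -2.41, 2.28, and 5.78. Stata's contrast command also provides a g. operator for differences from the grand mean, so running contrast g.group after reg y i.group may give all four deviations in one table; that route is untested here.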
Does anyone know how to do this?
I greatly appreciate your help.
Best,
David