Regression with noconstant and two sets of dummy variables

Krista Lane

Join Date: Jun 2014

Posts: 81
#1


06 Jul 2016, 11:53

I'm estimating a model that has two sets of dummy variables (for location and type of observation). When I use the noconstant option with one complete set of dummy variables, nothing is dropped. However, when I include two sets of dummy variables, Stata drops one variable:

Code:

reg cost size ibn.location ibn.type, noconstant

Does anyone have a mathematical understanding of why the noconstant option doesn't allow for two complete sets of dummy variables? Or could this just be a collinearity problem?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

06 Jul 2016, 12:03

It is, indeed, a collinearity problem. When you have a single set of indicator (dummy) variables and a constant, the collinearity between the indicators and the constant term arises because, with a complete set of indicators, the sum of the indicators is always 1. (Each indicator is either 0 or 1, and all but one of them are 0 in any observation.) You get around that by omitting one indicator, corresponding to a "reference" or "base" category, and that breaks the collinearity.

Now when you have two complete sets of indicators (no reference case removed from either set), each set sums to 1 in every observation. So the sum of the first set of indicators equals the sum of the second set of indicators: and that is the collinearity.
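The argument above can be written out as a short sketch. The symbols are assumptions introduced here for illustration: L_{ij} is the indicator for location j and T_{ik} the indicator for type k in observation i, with J locations and K types.

```latex
\sum_{j=1}^{J} L_{ij} = 1
\quad\text{and}\quad
\sum_{k=1}^{K} T_{ik} = 1
\quad\text{for every observation } i
\;\;\Longrightarrow\;\;
\sum_{j=1}^{J} L_{ij} \;-\; \sum_{k=1}^{K} T_{ik} = 0 .
```

This is an exact linear dependence among the J + K indicator columns, so they have rank at most J + K - 1. Equivalently, adding any number c to every location coefficient and subtracting c from every type coefficient leaves all fitted values unchanged, so one coefficient cannot be identified.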
Krista Lane

Join Date: Jun 2014

Posts: 81
#3

06 Jul 2016, 12:54

Thank you, Clyde. This makes sense.
Ruben Jakobs

Join Date: May 2018

Posts: 22
#4

16 May 2018, 05:31

Clyde Schechter, I have run into the same problem, but I cannot find code that gets around it. Is there any, or do you know of any?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

16 May 2018, 10:09

There is no way around this. You can't get around linear algebra. It is not possible to include two complete sets of indicators in a model with no constant. You have to omit one level from one of the categorical variables. You can either choose which one to omit (see -help fvset-, with attention to -fvset base-, or use the -ib- operator), or you can let Stata pick one if it makes no difference to you which.
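As a minimal sketch of those two ways of choosing the omitted level, the following uses Stata's shipped auto dataset rather than the data from this thread. The variables price, rep78, and foreign stand in for the poster's cost, location, and type, so the choice of base level 1 is purely illustrative.

```stata
sysuse auto, clear

* Two complete indicator sets and no constant: Stata omits one level itself
regress price ibn.rep78 ibn.foreign, noconstant

* Choose the omitted level yourself with the ib#. operator
regress price ibn.rep78 ib1.foreign, noconstant

* Or set the base level once with fvset and use plain i. notation
fvset base 1 foreign
regress price ibn.rep78 i.foreign, noconstant
```

The last two commands should fit the same model. The first shows the note about the omitted level that prompted the original question.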

If your concern is that you want to see predicted values at all combinations of levels of both categorical variables, you don't need for them all to be explicitly represented in the regression model. Just do

Code:

regression_command i.var1##i.var2 perhaps_other_variables, nocons perhaps_other_options
margins var1#var2

The margins output will give you all those results, even though one of the levels of var1 or var2 will be absent from the regression output itself.

Last edited by Clyde Schechter; 16 May 2018, 10:12.

Ruben Jakobs

Join Date: May 2018
Posts: 22
#6

21 May 2018, 03:20

Dear Clyde, thank you for your response.

You are indeed correct, sorry for the misunderstanding.

I'm currently writing my master's thesis, in which I reproduce Heston and Rouwenhorst 1992. I don't want to bother you too much about it, but they argue that, by construction, the constant in the regression is the European equally weighted portfolio return.

They run regress ibn.industry ibn.country and get values for all dummy betas, since these are all relative to the constant, which in theory is the European equally weighted portfolio return (defined in the dataset as eumean).

Question:
Is there a way in Stata to restrict my regression such that I get a value for all dummies and that I can interpret these values as relative to the constant, eumean?

Code:

clear
input double return long country1 byte industry float eumean
                   . 1 8    .009301667
                   . 1 3   -.005097404
                   0 1 0    .016462082
                   . 1 5    -.02495319
 -.04693019343986542 1 5   .0014418297
                   0 1 0    .003838392
                   0 1 1   .0006765302
                   0 1 3  -.0007444344
                   . 1 5  -.0045234524
                   . 1 9   -.009020145
                   . 1 8   -.008166097
                   . 1 9    .015156833
                   . 1 8   -.009267249
                   . 1 5    .002813685
                   . 1 5  -.0004599904
                   . 1 3    .015990013
 .038940809968847384 1 8   .0014067027
                   . 1 8   -.003939412
-.028766630708378305 1 2    .017752841
                   . 1 3   .0046346895
                   0 1 5  -.0021102552
  .04582210242587605 1 8   -.007787341
                   . 1 8    .006955021
-.019426456984273737 1 5    .007632317
                   0 1 0  -.0019239604
  .04601479046836487 1 5     .00595451
                   . 1 2  -.0044453237
                   0 1 2  -.0028929706
-.015459723352319061 1 3   -.005399399
   -.272572402044293 1 0  -.0010237095
-.025758768218425883 1 8  -.0039903144
  .01149187915305091 1 8   -.004830983
                   . 1 8  .00027958644
  .22726708845056454 1 1    .002823978
-.019894179894179957 1 8   -.001309757
                   0 1 5    -.00683727
                   . 1 1   -.007892507
                   0 1 3   -.011376578
                   . 1 6  -.0012130983
                   0 1 2  -.0022452052
                   . 1 3    .007048565
-.057142857142857065 1 1  -.0031190135
-.004716981132075476 1 5    .005249627
  .11031390134529148 1 3   .0039518373
  -.2857142857142857 1 1   -.009029237
  .01724137931034492 1 3    .003383549
 -.05767012687427905 1 8   -.006635818
                   . 1 3   -.005806014
 .010613207547169821 1 5    .005872808
                   . 1 2 -.00010122437
                   . 1 9    .002817735
                   . 1 2   -.008325396
                   . 1 5     .01303832
  .07861271676300585 1 5    .009426574
                   0 1 2    .006505816
                   0 1 2    .008076578
                   . 1 5   -.009491695
   .2777756718850689 1 1   -.022050153
                   . 1 4   -.017893007
                   . 1 8   -.011441022
   .0736196319018405 1 8   -.008348257
   .0635944700460829 1 5   .0014533884
                   . 1 2    .006948206
                   . 1 8    .022759307
                   . 1 5     .00875297
  .06666666666666672 1 2    .013767933
 .006741213483146158 1 8   .0031469495
                   . 1 2    -.00961061
                   . 1 4   -.007523719
   .0404372752198839 1 8   -.006245673
 -.03847163176402669 1 1   -.015323406
                   . 1 5     .04435108
                   0 1 1  -.0029473284
                   . 1 3   -.016179435
  .09402949759444737 1 8   .0003084856
                   . 1 8    .004910988
                   . 1 8   .0018868532
                   . 1 3   -.002550102
                   . 1 3   -.008367604
                   . 1 6  -.0013081726
 -.10009765625000003 1 3   -.004115071
-.046822742474916426 1 2    .004703355
 -.14155251141552508 1 5    -.00327035
                   . 1 4   -.009987714
                   . 1 5   -.009175543
                   . 1 8   -.014873824
                   . 1 1 -.00017275906
                   . 1 8    .001370456
  .06196784757368259 1 8   .0020699077
                   . 1 8    .006124476
                   0 1 1    .002870253
-.026186906710310882 1 8     .00962502
                   0 1 0    .001393826
                   0 1 2   -.010966062
  .08002183406113543 1 1   .0022542225
                   . 1 8   -.005699447
                   . 1 8  -.0041342042
 -.11538461538461531 1 2   -.005153734
  .06895983812907318 1 1   -.004886846
                   0 1 0    .003494086
end

Lots of thanks in advance.

Last edited by Ruben Jakobs; 21 May 2018, 03:35.


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#7

21 May 2018, 14:55

"They regress, regress ibn.industry ibn.country and get values for all dummy betas since they are all relative to the constant, which in theory the european equally weighted portfolio return (defined in the dataset as eumean is)."

That is mathematically impossible. I do not believe it.

Heston and Rouwenhorst 1992 may be folklore in your discipline, but the reference means nothing to others (including me). A complete reference or link would perhaps enable me to check it out, if it's not behind a paywall. But I am quite confident that either you are misinterpreting their findings or they have misrepresented them.

"Is there a way in Stata to restrict my regression such that I get a value for all dummies and that I can interpret these values as relative to the constant, eumean?"

No, not in Stata. Not in any software. You cannot defeat linear algebra.
Ruben Jakobs

Join Date: May 2018

Posts: 22
#8

22 May 2018, 02:44

I hope this clarifies my question. Thank you a lot.

Last edited by sladmin; 22 May 2018, 09:07. Reason: Attachment removed
Nick Cox

Join Date: Mar 2014

Posts: 35697
#9

22 May 2018, 05:35

Please don't post an entire .pdf unless you are completely confident that copyright is not an issue.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#10

22 May 2018, 09:41

OK. You have misinterpreted (or perhaps understood but incorrectly explained) what they did. They even acknowledge in the article that, due to collinearity, it is not possible to obtain estimates for all of the country and industry effects. As they explain, the conventional approach would be to constrain one country and one industry coefficient to be zero (that country and that industry being the reference or base values). Instead, they chose to constrain a weighted sum of the country coefficients and a weighted sum of the industry coefficients to zero, which, they point out, is equivalent to setting the equally weighted mean as the reference.
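A sketch of why such sum-to-zero constraints make the constant the equally weighted mean. The notation is assumed here, not taken from the article: R_i is the return of security i, I_{ij} and C_{ik} are its industry and country indicators, n_j and m_k are the numbers of securities in industry j and country k, and N is the total number of securities. The weights are taken to be these simple counts.

```latex
R_i = \alpha + \sum_{j} \beta_j I_{ij} + \sum_{k} \gamma_k C_{ik} + e_i ,
\qquad
\sum_{j} n_j \beta_j = 0 ,
\qquad
\sum_{k} m_k \gamma_k = 0 .

% Least-squares residuals sum to zero when a constant is included, so averaging over i gives
\bar{R}
= \hat{\alpha}
+ \frac{1}{N}\sum_{j} n_j \hat{\beta}_j
+ \frac{1}{N}\sum_{k} m_k \hat{\gamma}_k
= \hat{\alpha} .
```

Under these constraints each estimated industry or country coefficient is a deviation from the equally weighted average return, rather than from an arbitrary base category.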

You can do this in Stata using -constraint def- and -cnsreg-. Before proceeding, you should read the help files and manual sections on both of these. Your situation is complicated because the weighting required for these weighted means is not defined a priori but is data dependent, so you need to calculate those weights and build up the constraints accordingly. The code will be something like this:

Code:

levelsof country1, local(countries)
levelsof industry, local(industries)

local country_constraint 0
local industry_constraint 0

foreach c of local countries {
    count if country1 == `c'
    local country_constraint `country_constraint' + `r(N)'*cc`c'
    gen byte cc`c' = `c'.country
}

foreach i of local industries {
    count if industry == `i'
    local industry_constraint `industry_constraint' + `r(N)'*ii`i'
    gen byte ii`i' = `i'.industry
}
display `"`industry_constraint'"'

constraint def 1 `country_constraint' = 0
constraint def 2 `industry_constraint' = 0

cnsreg ret cc* ii*, constraints(1 2)

Now, this code does not work with your example data because it does not have an adequate representation of countries and industries. But I suspect it will work, perhaps with some modifications, in your full data.

Last edited by Clyde Schechter; 22 May 2018, 09:43.
Ruben Jakobs

Join Date: May 2018

Posts: 22
#11

23 May 2018, 01:13

Thank you very much Clyde!
Ruben Jakobs

Join Date: May 2018

Posts: 22
#12

23 May 2018, 03:15

Dear Clyde,

Your code seems to work although I get an error that there are no observations.

Code:

Return code 2000 no observations; You have requested some statistical calculation and there are no observations on which to perform it. Perhaps you specified if or in and inadvertently filtered all the data.

Do you have any clue how this can be? When I open the data it seems as if there are observations.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#13

23 May 2018, 08:57

There are a couple of possibilities. If any observation has a missing value for any of the variables, that observation is excluded. If every observation has a missing value for some variable, then everything is excluded and nothing remains. The other possibility is that one of the variables you have specified is a string variable: regression commands treat those as if they consisted of all missing values.
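Both possibilities can be checked directly. The sketch below uses the shipped auto dataset so that it runs as written. In the thread's data one would presumably substitute return, cc*, and ii* (an assumption about which variables enter the regression).

```stata
sysuse auto, clear

* Storage types: a string variable shows up as str#, for example make
describe price rep78 make
ds, has(type string)

* Missing-value counts for the variables used in the model
misstable summarize price rep78

* Observations with no missing value in any model variable
count if !missing(price, rep78)
```

If the final count is 0, every observation has a missing value in at least one variable, which would produce the "no observations" error.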

Ruben Jakobs

Join Date: May 2018
Posts: 22

#14

24 May 2018, 01:36

Dear Clyde,

First of all, thanks for all the effort. Unfortunately, your code only partly works. The estimated betas times the number of securities in the respective industry or country sum to zero, although the code multiplies the number of securities by the number of observations (991). This is probably because the regression is not set up for panel data. Do you know how to fix this?

And the second problem is that it still omits two dummies.

Code:

. cnsreg ret cc* ii*, constraints(1 2)
note: cc12 omitted because of collinearity
note: ii9 omitted because of collinearity

Constrained linear regression                   Number of obs     =  1,318,853
                                                F(  18,1318834)   =      30.24
                                                Prob > F          =     0.0000
                                                Root MSE          =     0.0570

 ( 1)  46463*cc1 + 87208*cc2 + 47568*cc3 + 237840*cc4 + 236090*cc5 + 48559*cc6 +
       33694*cc7 + 145677*cc8 + 94145*cc9 + 44595*cc10 + 101082*cc11 +
       518293*o.cc12 = 0
 ( 2)  44581*ii0 + 98129*ii1 + 340901*ii2 + 193278*ii3 + 96144*ii4 + 208134*ii5 +
       27733*ii6 + 56496*ii7 + 472727*ii8 + 103091*o.ii9 = 0
------------------------------------------------------------------------------
      return |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         cc1 |   .0003768   .0002855     1.32   0.187    -.0001827    .0009363
         cc2 |    -.00002   .0002044    -0.10   0.922    -.0004208    .0003807
         cc3 |    .000438    .000275     1.59   0.111    -.0001009     .000977
         cc4 |   .0007736   .0001129     6.85   0.000     .0005524    .0009949
         cc5 |    .000943   .0001164     8.10   0.000     .0007149    .0011711
         cc6 |   .0001051   .0002663     0.39   0.693    -.0004169    .0006271
         cc7 |   .0011205   .0003506     3.20   0.001     .0004333    .0018077
         cc8 |  -.0004979   .0001609    -3.09   0.002    -.0008133   -.0001825
         cc9 |  -.0041431   .0001987   -20.85   0.000    -.0045326   -.0037536
        cc10 |   .0001309   .0002855     0.46   0.647    -.0004287    .0006906
        cc11 |  -.0002903   .0002012    -1.44   0.149    -.0006846     .000104
        cc12 |          0  (omitted)
         ii0 |   .0000802   .0002988     0.27   0.788    -.0005053    .0006658
         ii1 |   .0003641   .0001948     1.87   0.062    -.0000178     .000746
         ii2 |   .0001962   .0000955     2.05   0.040     9.06e-06    .0003833
         ii3 |   .0001457   .0001318     1.11   0.269    -.0001125     .000404
         ii4 |   .0001225   .0002131     0.57   0.566    -.0002953    .0005402
         ii5 |  -.0001471   .0001314    -1.12   0.263    -.0004046    .0001103
         ii6 |   -.000773   .0003941    -1.96   0.050    -.0015455   -5.21e-07
         ii7 |  -.0010563   .0002608    -4.05   0.000    -.0015675   -.0005451
         ii8 |  -.0000727   .0000785    -0.93   0.354    -.0002267    .0000812
         ii9 |          0  (omitted)
       _cons |    .001949   .0000498    39.16   0.000     .0018515    .0020466
------------------------------------------------------------------------------

This is how the code looks now:

Code:

bysort _j: drop if industry==. 

levelsof country1, local(country1)
levelsof industry, local(industry)

local country1_constraint 0
local industry_constraint 0

foreach c of local country1 {
    count if country1 == `c'
    local country1_constraint `country1_constraint' + `r(N)'*cc`c'
    gen byte cc`c' = `c'.country1
}

foreach i of local industry {
    count if industry == `i'
    local industry_constraint `industry_constraint' + `r(N)'*ii`i'
    gen byte ii`i' = `i'.industry
}
display `"`industry_constraint'"'

constraint def 1 `country1_constraint' = 0
constraint def 2 `industry_constraint' = 0

cnsreg ret cc* ii*, noconstant constraints(1 2)

Last edited by Ruben Jakobs; 24 May 2018, 01:41.


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#15

24 May 2018, 08:13

Hmm. It appears my approach will not do what you want. I thought that -cnsreg- applies the constraints before dealing with collinearity, but apparently that is not so. And, in any case, the panel data estimators do not accept constrained estimation. I'm afraid I don't know what else to suggest.
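One thing that may be worth trying, offered as an untested sketch and not as something anyone in this thread confirmed: -cnsreg- documents a collinear option, which asks the command not to omit collinear variables and is meant for models that are identified only through their constraints. The example below uses the shipped auto dataset, with rep78 and foreign standing in for country1 and industry. The names rr* and ff* are made up for illustration.

```stata
sysuse auto, clear
drop if missing(rep78)

* One indicator per level, plus a count-weighted sum-to-zero constraint per set
levelsof rep78, local(reps)
local rep_constraint 0
foreach r of local reps {
    count if rep78 == `r'
    local rep_constraint `rep_constraint' + `r(N)'*rr`r'
    generate byte rr`r' = (rep78 == `r')
}

levelsof foreign, local(origins)
local for_constraint 0
foreach f of local origins {
    count if foreign == `f'
    local for_constraint `for_constraint' + `r(N)'*ff`f'
    generate byte ff`f' = (foreign == `f')
}

constraint define 1 `rep_constraint' = 0
constraint define 2 `for_constraint' = 0

* collinear: keep both complete indicator sets and let the constraints identify the model
cnsreg price rr* ff*, constraints(1 2) collinear

* With count weights, _cons should match the overall mean of the outcome
summarize price
```

Whether this behaves as hoped on the full panel, and whether count weights over security-period observations are the weights the article intends, would still need to be checked against the data.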