  • Regression with noconstant and two sets of dummy variables

    I'm estimating a model that has two sets of dummy variables (for location and type of observation). When I use the noconstant option with one complete set of dummy variables, nothing is dropped. However, when I include two sets of dummy variables, Stata drops one variable:

    Code:
    reg cost size ibn.location ibn.type, noconstant
    Does anyone have a mathematical understanding of why the noconstant option doesn't allow for two complete sets of dummy variables? Or could this just be a collinearity problem?

  • #2
    It is, indeed, a collinearity problem. When you have a single set of indicator (dummy) variables and a constant, the collinearity between the indicators and the constant term arises because, with a complete set of indicators, the sum of the indicators is always 1. (Each indicator is either 0 or 1, and all but one of them are 0 in any observation.) You get around that by omitting one indicator, corresponding to a "reference" or "base" category, and that breaks the collinearity.

    Now when you have two complete sets of indicators (no reference category removed from either set), each set sums to 1 in every observation. So the sum of the first set of indicators equals the sum of the second set, whether or not the model has a constant: that is the collinearity.
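
    A quick way to see this on data is sketched below. It uses the auto dataset that ships with Stata as a stand-in; rep78 and foreign, and the generated names r1-r5, f1-f2, sum_r and sum_f, are illustrative assumptions and have nothing to do with the data in the question.

    ```stata
    sysuse auto, clear
    drop if missing(rep78)              // keep complete cases only

    * Build both complete sets of indicators by hand
    tabulate rep78, generate(r)         // r1-r5, one per level of rep78
    tabulate foreign, generate(f)       // f1-f2, one per level of foreign

    * Each set sums to 1 in every observation ...
    egen sum_r = rowtotal(r1-r5)
    egen sum_f = rowtotal(f1-f2)
    assert sum_r == 1 & sum_f == 1

    * ... so sum_r - sum_f is identically 0: an exact linear dependence.
    * Stata therefore has to omit one indicator, even without a constant.
    regress price r1-r5 f1-f2, noconstant
    ```

    The assert passing is the whole point: the two row sums are the same column of ones, so the seven indicators carry only six independent pieces of information.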


    • #3
      Thank you, Clyde. This makes sense.


      • #4
        Clyde Schechter, I have run into the same problem, but I cannot find code that gets around it. Is there any, or do you know of any?


        • #5
          There is no way around this. You can't get around linear algebra. It is not possible to include two complete sets of indicators in a -noconstant- model. You have to omit one level from one of the categorical variables. You can either choose which one to omit (see -help fvset-, with attention to -fvset base-, or use the -ib- operator), or you can let Stata pick one if it makes no difference to you which.

          If your concern is that you want to see predicted values at all combinations of levels of both categorical variables, you don't need them all to be explicitly represented in the regression model. Just do

          Code:
          regression_command i.var1##i.var2 perhaps_other_variables, nocons perhaps_other_options
          margins var1#var2
          The margins output will give you all those results, even though one of the levels of var1 or var2 will be absent from the regression output itself.
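
          As a concrete trial of that pattern, here is a minimal sketch on the nlsw88 dataset shipped with Stata. The choice of wage, race and collgrad is purely illustrative, and the model uses main effects only (no interaction), which is an assumption made to mirror the original question rather than the template above.

          ```stata
          sysuse nlsw88, clear

          * Complete set of indicators for race; collgrad keeps a base level
          regress wage ibn.race i.collgrad, noconstant

          * Predictions for every race x collgrad combination, base level included
          margins race#collgrad
          ```

          The regression table has no row for the base level of collgrad, but the margins table should still list one predicted wage for each of the six race-by-collgrad cells.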
          Last edited by Clyde Schechter; 16 May 2018, 10:12.


          • #6
            Dear Clyde, thank you for your response.

            You are indeed correct, sorry for the misunderstanding.

            I'm currently writing my master's thesis, in which I reproduce Heston and Rouwenhorst 1992. I won't bother you too much about it, but they argue that, by construction, the constant in the regression is the European equally weighted portfolio return.

            They run regress ibn.industry ibn.country and get values for all dummy betas, since they are all relative to the constant, which in theory is the European equally weighted portfolio return (defined in the dataset as eumean).

            Question:
            Is there a way in Stata to restrict my regression such that I get a value for all dummies and that I can interpret these values as relative to the constant, eumean?

            Code:
            clear
            input double return long country1 byte industry float eumean
                               . 1 8    .009301667
                               . 1 3   -.005097404
                               0 1 0    .016462082
                               . 1 5    -.02495319
             -.04693019343986542 1 5   .0014418297
                               0 1 0    .003838392
                               0 1 1   .0006765302
                               0 1 3  -.0007444344
                               . 1 5  -.0045234524
                               . 1 9   -.009020145
                               . 1 8   -.008166097
                               . 1 9    .015156833
                               . 1 8   -.009267249
                               . 1 5    .002813685
                               . 1 5  -.0004599904
                               . 1 3    .015990013
             .038940809968847384 1 8   .0014067027
                               . 1 8   -.003939412
            -.028766630708378305 1 2    .017752841
                               . 1 3   .0046346895
                               0 1 5  -.0021102552
              .04582210242587605 1 8   -.007787341
                               . 1 8    .006955021
            -.019426456984273737 1 5    .007632317
                               0 1 0  -.0019239604
              .04601479046836487 1 5     .00595451
                               . 1 2  -.0044453237
                               0 1 2  -.0028929706
            -.015459723352319061 1 3   -.005399399
               -.272572402044293 1 0  -.0010237095
            -.025758768218425883 1 8  -.0039903144
              .01149187915305091 1 8   -.004830983
                               . 1 8  .00027958644
              .22726708845056454 1 1    .002823978
            -.019894179894179957 1 8   -.001309757
                               0 1 5    -.00683727
                               . 1 1   -.007892507
                               0 1 3   -.011376578
                               . 1 6  -.0012130983
                               0 1 2  -.0022452052
                               . 1 3    .007048565
            -.057142857142857065 1 1  -.0031190135
            -.004716981132075476 1 5    .005249627
              .11031390134529148 1 3   .0039518373
              -.2857142857142857 1 1   -.009029237
              .01724137931034492 1 3    .003383549
             -.05767012687427905 1 8   -.006635818
                               . 1 3   -.005806014
             .010613207547169821 1 5    .005872808
                               . 1 2 -.00010122437
                               . 1 9    .002817735
                               . 1 2   -.008325396
                               . 1 5     .01303832
              .07861271676300585 1 5    .009426574
                               0 1 2    .006505816
                               0 1 2    .008076578
                               . 1 5   -.009491695
               .2777756718850689 1 1   -.022050153
                               . 1 4   -.017893007
                               . 1 8   -.011441022
               .0736196319018405 1 8   -.008348257
               .0635944700460829 1 5   .0014533884
                               . 1 2    .006948206
                               . 1 8    .022759307
                               . 1 5     .00875297
              .06666666666666672 1 2    .013767933
             .006741213483146158 1 8   .0031469495
                               . 1 2    -.00961061
                               . 1 4   -.007523719
               .0404372752198839 1 8   -.006245673
             -.03847163176402669 1 1   -.015323406
                               . 1 5     .04435108
                               0 1 1  -.0029473284
                               . 1 3   -.016179435
              .09402949759444737 1 8   .0003084856
                               . 1 8    .004910988
                               . 1 8   .0018868532
                               . 1 3   -.002550102
                               . 1 3   -.008367604
                               . 1 6  -.0013081726
             -.10009765625000003 1 3   -.004115071
            -.046822742474916426 1 2    .004703355
             -.14155251141552508 1 5    -.00327035
                               . 1 4   -.009987714
                               . 1 5   -.009175543
                               . 1 8   -.014873824
                               . 1 1 -.00017275906
                               . 1 8    .001370456
              .06196784757368259 1 8   .0020699077
                               . 1 8    .006124476
                               0 1 1    .002870253
            -.026186906710310882 1 8     .00962502
                               0 1 0    .001393826
                               0 1 2   -.010966062
              .08002183406113543 1 1   .0022542225
                               . 1 8   -.005699447
                               . 1 8  -.0041342042
             -.11538461538461531 1 2   -.005153734
              .06895983812907318 1 1   -.004886846
                               0 1 0    .003494086
            end
            Many thanks in advance.
            Last edited by Ruben Jakobs; 21 May 2018, 03:35.


            • #7
              They run regress ibn.industry ibn.country and get values for all dummy betas, since they are all relative to the constant, which in theory is the European equally weighted portfolio return (defined in the dataset as eumean).
              That is mathematically impossible. I do not believe it.

              Heston and Rouwenhorst 1992 may be folklore in your discipline, but the reference means nothing to others (including me). A complete reference or link would perhaps enable me to check it out, if it's not behind a paywall. But I am quite confident that either you are misinterpreting their findings or they have misrepresented them.

              Is there a way in Stata to restrict my regression such that I get a value for all dummies and that I can interpret these values as relative to the constant, eumean?
              No, not in Stata. Not in any software. You cannot defeat linear algebra.


              • #8

                I hope this clarifies my question. Thank you a lot.
                Last edited by sladmin; 22 May 2018, 09:07. Reason: Attachment removed


                • #9
                  Please don't post an entire .pdf unless you are completely confident that copyright is not an issue.


                  • #10
                    OK. You have misinterpreted (or perhaps understood but incorrectly explained) what they did. They even acknowledge in the article that, due to collinearity, it is not possible to obtain estimates for all of the country and industry effects. As they explain, the conventional approach would be to constrain one country and one industry coefficient to be zero (that country and that industry being the reference or base values). Instead, they chose to constrain a weighted sum of the country coefficients and a weighted sum of the industry coefficients to zero, which, they point out, is equivalent to setting the equally weighted mean as the reference.

                    You can do this in Stata using -constraint def- and -cnsreg-. Before proceeding, you should read the help files and manual sections on both of these. Your situation is complicated because the weighting required for these weighted means is not defined a priori but is data dependent, so you need to calculate those weights and build up the constraints accordingly. The code will be something like this:

                    Code:
                    * Distinct values of each categorical variable
                    levelsof country1, local(countries)
                    levelsof industry, local(industries)
                    
                    * Each constraint starts at 0 and gains one count-weighted term per level
                    local country_constraint 0
                    local industry_constraint 0
                    
                    foreach c of local countries {
                        count if country1 == `c'
                        local country_constraint `country_constraint' + `r(N)'*cc`c'
                        * Indicator for this country (the variable is country1, not country)
                        gen byte cc`c' = `c'.country1
                    }
                    
                    foreach i of local industries {
                        count if industry == `i'
                        local industry_constraint `industry_constraint' + `r(N)'*ii`i'
                        * Indicator for this industry
                        gen byte ii`i' = `i'.industry
                    }
                    display `"`industry_constraint'"'
                    
                    * Constrain each weighted sum of coefficients to zero
                    constraint def 1 `country_constraint' = 0
                    constraint def 2 `industry_constraint' = 0
                    
                    cnsreg ret cc* ii*, constraints(1 2)
                    Now, this code does not work with your example data because it does not have an adequate representation of countries and industries. But I suspect it will work, perhaps with some modifications, in your full data.
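
                    For a self-contained trial of the same idea, the sketch below swaps in the auto dataset shipped with Stata: rep78 and foreign stand in for country and industry, and every generated name (r1-r5, f1-f2, rcons, fcons, ybar) is illustrative. It also adds the -collinear- option of -cnsreg-, which is not part of the code above. That option asks -cnsreg- to keep collinear variables instead of omitting them; whether that is enough in the full data is an assumption to check, not something confirmed here.

                    ```stata
                    sysuse auto, clear
                    drop if missing(rep78)
                    tabulate rep78, generate(r)         // r1-r5
                    tabulate foreign, generate(f)       // f1-f2

                    * Count-weighted sum-to-zero constraint for each set of indicators
                    local rcons 0
                    forvalues k = 1/5 {
                        quietly count if r`k' == 1
                        local rcons `rcons' + `r(N)'*r`k'
                    }
                    local fcons 0
                    forvalues k = 1/2 {
                        quietly count if f`k' == 1
                        local fcons `fcons' + `r(N)'*f`k'
                    }
                    constraint define 1 `rcons' = 0
                    constraint define 2 `fcons' = 0

                    * Save the sample mean before estimation replaces the stored results
                    quietly summarize price
                    scalar ybar = r(mean)

                    * -collinear- keeps every indicator in the model; the two
                    * constraints are then what identifies the coefficients
                    cnsreg price r1-r5 f1-f2, constraints(1 2) collinear
                    display "constant: " _b[_cons] "   sample mean of price: " ybar
                    ```

                    If this behaves as intended, no indicator is reported as omitted and the constant should match the sample mean of price, which is the analogue of having the equally weighted mean as the reference.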
                    Last edited by Clyde Schechter; 22 May 2018, 09:43.


                    • #11
                      Thank you very much Clyde!


                      • #12
                        Dear Clyde,

                        Your code seems to work although I get an error that there are no observations.

                        Code:
                         Return code 2000
                                no observations;
                                You have requested some statistical calculation and there are
                                no observations on which to perform it.  Perhaps you specified
                                if or in and inadvertently filtered all the data.
                        Do you have any clue how this can be? When I open the data it seems as if there are observations.


                        • #13
                          There are a couple of possibilities. If any observation has a missing value for any of the variables, that observation is excluded. If every observation has a missing value for some variable, then everything is excluded and nothing remains. The other possibility is that one of the variables you have specified is a string variable: regression commands treat those as if they consisted of all missing values.
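
                          Both possibilities can be checked before running the regression. The sketch below is a minimal illustration on the auto dataset shipped with Stata; price, rep78 and foreign are stand-ins for whatever variables the model actually uses.

                          ```stata
                          sysuse auto, clear

                          * How many observations are complete on every model variable?
                          * A result of 0 here reproduces the "no observations" error.
                          count if !missing(price, rep78, foreign)

                          * Stops with an error if any listed variable is a string
                          confirm numeric variable price rep78 foreign

                          * Missing-value counts for any listed variable that has them
                          misstable summarize price rep78 foreign
                          ```

                          In this stand-in data the count is 69 of 74 observations, because rep78 is missing in 5 of them.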


                          • #14
                            Dear Clyde,

                            First of all, thanks for all the effort. Unfortunately, your code only partly works. The estimated betas times the number of securities in the respective industry or country sum to zero, although the weights multiply the number of securities by the number of observations (991). That is probably because the regression is not set up for panel data. Do you know how to fix this?

                            And the second problem is that it still omits two dummies.

                            Code:
                            . cnsreg ret cc* ii*, constraints(1 2)
                            note: cc12 omitted because of collinearity
                            note: ii9 omitted because of collinearity
                            
                            Constrained linear regression                   Number of obs     =  1,318,853
                                                                            F(  18,1318834)   =      30.24
                                                                            Prob > F          =     0.0000
                                                                            Root MSE          =     0.0570
                            
                             ( 1)  46463*cc1 + 87208*cc2 + 47568*cc3 + 237840*cc4 + 236090*cc5 + 48559*cc6 +
                                   33694*cc7 + 145677*cc8 + 94145*cc9 + 44595*cc10 + 101082*cc11 +
                                   518293*o.cc12 = 0
                             ( 2)  44581*ii0 + 98129*ii1 + 340901*ii2 + 193278*ii3 + 96144*ii4 + 208134*ii5 +
                                   27733*ii6 + 56496*ii7 + 472727*ii8 + 103091*o.ii9 = 0
                            ------------------------------------------------------------------------------
                                  return |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                                     cc1 |   .0003768   .0002855     1.32   0.187    -.0001827    .0009363
                                     cc2 |    -.00002   .0002044    -0.10   0.922    -.0004208    .0003807
                                     cc3 |    .000438    .000275     1.59   0.111    -.0001009     .000977
                                     cc4 |   .0007736   .0001129     6.85   0.000     .0005524    .0009949
                                     cc5 |    .000943   .0001164     8.10   0.000     .0007149    .0011711
                                     cc6 |   .0001051   .0002663     0.39   0.693    -.0004169    .0006271
                                     cc7 |   .0011205   .0003506     3.20   0.001     .0004333    .0018077
                                     cc8 |  -.0004979   .0001609    -3.09   0.002    -.0008133   -.0001825
                                     cc9 |  -.0041431   .0001987   -20.85   0.000    -.0045326   -.0037536
                                    cc10 |   .0001309   .0002855     0.46   0.647    -.0004287    .0006906
                                    cc11 |  -.0002903   .0002012    -1.44   0.149    -.0006846     .000104
                                    cc12 |          0  (omitted)
                                     ii0 |   .0000802   .0002988     0.27   0.788    -.0005053    .0006658
                                     ii1 |   .0003641   .0001948     1.87   0.062    -.0000178     .000746
                                     ii2 |   .0001962   .0000955     2.05   0.040     9.06e-06    .0003833
                                     ii3 |   .0001457   .0001318     1.11   0.269    -.0001125     .000404
                                     ii4 |   .0001225   .0002131     0.57   0.566    -.0002953    .0005402
                                     ii5 |  -.0001471   .0001314    -1.12   0.263    -.0004046    .0001103
                                     ii6 |   -.000773   .0003941    -1.96   0.050    -.0015455   -5.21e-07
                                     ii7 |  -.0010563   .0002608    -4.05   0.000    -.0015675   -.0005451
                                     ii8 |  -.0000727   .0000785    -0.93   0.354    -.0002267    .0000812
                                     ii9 |          0  (omitted)
                                   _cons |    .001949   .0000498    39.16   0.000     .0018515    .0020466
                            ------------------------------------------------------------------------------
                            This is how the code looks now:
                            Code:
                            bysort _j: drop if industry==. 
                            
                            levelsof country1, local(country1)
                            levelsof industry, local(industry)
                            
                            local country1_constraint 0
                            local industry_constraint 0
                            
                            foreach c of local country1 {
                                count if country1 == `c'
                                local country1_constraint `country1_constraint' + `r(N)'*cc`c'
                                gen byte cc`c' = `c'.country1
                            }
                            
                            foreach i of local industry {
                                count if industry == `i'
                                local industry_constraint `industry_constraint' + `r(N)'*ii`i'
                                gen byte ii`i' = `i'.industry
                            }
                            display `"`industry_constraint'"'
                            
                            constraint def 1 `country1_constraint' = 0
                            constraint def 2 `industry_constraint' = 0
                            
                            cnsreg ret cc* ii*, noconstant constraints(1 2)
                            Last edited by Ruben Jakobs; 24 May 2018, 01:41.


                            • #15
                              Hmm. It appears my approach will not do what you want. I thought that -cnsreg- applies the constraints before dealing with collinearity, but apparently that is not so. And, in any case, the panel data estimators do not accept constrained estimation. I'm afraid I don't know what else to suggest.
