
  • OLS regression with dummy variables versus factor variables

    Dear all,

    I am currently estimating a regression equation in which the explanatory variables are all dummy variables. The data are a cross-section containing price data for several items per country. I want to regress item-country prices P_ij on country and item dummies, so the regression equation looks as follows: P_ij = A_i*Q_i + B_j*C_j + E_ij, where Q is the item dummy, C is the country dummy, and E_ij is the error term.

    I am familiar with Stata dropping one category per categorical variable to avoid the perfect multicollinearity problem, but if I understand the econometrics correctly, dropping the intercept term makes it possible to include a dummy for each category, right? In my case that would be every country and every item. The reason I ask is that I would like to obtain a coefficient for each country dummy (I am not interested in the coefficients of the item dummies), so I would like to avoid Stata omitting one of the country dummy coefficients. If anyone has an idea how I could do this, I would greatly appreciate it. I have tried the following code:
    Code:
    reg logp i.itemcode i.country, nocons
    but I still lose one country dummy coefficient. Related to this, I would like to ask a second question if that is okay. When trying a second method to deal with this problem, I first generated the country and item dummies with the following code:
    Code:
    tabulate iso3code, gen(cc)
    tabulate itemcode, gen(ic)
    Afterwards, I included these dummies in my regression as follows:
    Code:
    reg logp cc* ic*, nocons
    Yet, when using this alternative method I get very different coefficients for the country dummy variables, and I cannot figure out why. I have posted both sets of results below. The first output is from the factor-variable method (note that AUS is the second country in my dataset; the first country is the one whose dummy Stata drops), and the second output is from the regression on the generated dummies.

    Code:
    ------------------------------------------------------------------------------
            logp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         country |
            AUS  |   1.743436   .1003003    17.38   0.000     1.546779    1.940092
            AUT  |   1.409392   .1053655    13.38   0.000     1.202804    1.615979
            BEL  |   1.305924   .1185969    11.01   0.000     1.073393    1.538454
            BGR  |   1.269148   .0997988    12.72   0.000     1.073475    1.464821
            BRA  |   1.484607   .1019623    14.56   0.000     1.284692    1.684522
            CAN  |   1.362466   .1092622    12.47   0.000     1.148238    1.576694
            CHL  |   7.195977   .1084014    66.38   0.000     6.983437    7.408517
    Code:
    ------------------------------------------------------------------------------
            logp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             cc1 |   7.412792    .587841    12.61   0.000     6.260226    8.565359
             cc2 |   7.589908   .5877965    12.91   0.000     6.437428    8.742387
             cc3 |   7.221128   .5882736    12.28   0.000     6.067713    8.374543
             cc4 |   7.096145   .5897634    12.03   0.000     5.939809    8.252481
             cc5 |    7.10125   .5877189    12.08   0.000     5.948923    8.253578
             cc6 |    7.34436   .5880183    12.49   0.000     6.191446    8.497275
             cc7 |   7.177504   .5887245    12.19   0.000     6.023205    8.331803
             cc8 |   13.02219   .5886107    22.12   0.000     11.86812    14.17627
    As always, I thank you for taking the time to respond to my questions.

    Best,

    Satya

  • #2
    Both of your problems are manifestations of the same underlying difficulty and misunderstanding.

    but if I understand the econometrics correctly, if I drop the intercept term, then it is possible to include a dummy for each category right?

    Well, you sort of understand it correctly, but not entirely. If you had only country and not itemcode, this would be true. But when you have more than one categorical variable, eliminating the constant term does not break the collinearity. The reason is that the itemcode indicators ("dummies") sum to 1, and so do the country indicators, so there is still a collinear relationship among the country and itemcode indicators collectively.

    Now what this points up is that in a model with both country and itemcode there are two collinearities operating, and eliminating the constant term only resolves one of them. The remaining collinearity involves both country and itemcode and leaves the model unidentified. So, to carry out a regression, Stata (or any other statistical package) must disrupt the collinear relationship by imposing some constraint or omitting some variables. The choice of which variable(s) get removed changes the coefficients estimated for the others. There is no getting around this. And it is important to remember that the coefficients you get from this kind of regression are meaningless: they do not reflect attributes of the countries or items. They are just a bunch of numbers that satisfy the regression equation once some arbitrary conditions are added, and they are more a reflection of the arbitrary condition chosen than of anything else.
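    To see that remaining linear dependence concretely, here is a minimal sketch using the cc* and ic* dummies created with -tabulate, gen()- in post #1 (it assumes no missing values of iso3code or itemcode):
    Code:
    * Each observation has exactly one country dummy and one item dummy equal to 1,
    * so rowtotal(cc*) - rowtotal(ic*) is identically zero: a linear dependence
    * that does not involve the constant term.
    egen double sum_cc = rowtotal(cc*)
    egen double sum_ic = rowtotal(ic*)
    assert sum_cc == 1 & sum_ic == 1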

    So, what to do? Well, you think you are interested in the country-level coefficients. But I am sure that, in fact, you could not (and should not) care less about them. What you can and should care about is the country-level expected prices, for each item, or perhaps averaged over all items. Those parameters of your model are simply not reflected in the coefficients (except in a very indirect and largely indecipherable manner). What you want instead is the output from
    Code:
    margins country
    if you want prices averaged over all items, or,
    Code:
    margins country#itemcode
    if you want country specific expected prices for each item.

    Note: To get these margins, the regression must be fit with factor-variable notation. The separate indicator variables created by -tab, gen()- will not work with -margins-.
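    Putting the pieces together, a minimal sketch of the workflow described above (variable names as in your post) would be:
    Code:
    reg logp i.itemcode i.country    // factor-variable notation; the constant can stay, dropping it does not remove the collinearity
    margins country                  // expected log price by country, averaged over items
    margins country#itemcode         // expected log price for each country-item combination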



    • #3
      Hi Clyde,

      Thank you for your very extensive answer; it is now clear to me what the problem is, and thank you for highlighting the point about interpreting the coefficients of the dummies. I am still left wondering why the two alternative methods give such different coefficients for the dummies. Is this because more dummies are dropped due to perfect multicollinearity in one method than in the other? In principle these are two ways of estimating coefficients for the dummy variables, right? So why do the results differ so much?

      Best,

      Satya



      • #4
        When comparing two approaches, I always look at the fit statistics and the df. If they are identical, it almost always means the models are equivalent but perhaps parameterized differently. So before focusing too much on the coefficients, I would first make sure the models are equivalent.
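        A minimal sketch of that check, using the two models from post #1 and the scalars that -regress- leaves behind in e():
        Code:
        quietly reg logp i.itemcode i.country, nocons
        display "factor variables:   R2 = " e(r2) "  df_r = " e(df_r) "  RMSE = " e(rmse)
        quietly reg logp cc* ic*, nocons
        display "tab, gen() dummies: R2 = " e(r2) "  df_r = " e(df_r) "  RMSE = " e(rmse)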
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam



        • #5
          To Richard Williams' excellent advice I would add: if you run -predict- after both models, you will see that they give identical results. The coefficients that have the same (or, in your case, corresponding) names in the two models mean entirely different things in the two models and there is no reason to expect them to be the same, or even to resemble each other. And, importantly, in neither model do they represent the expected country-specific values.
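          A minimal sketch of that check (the new variable names xb_factor and xb_dummies are just illustrative):
          Code:
          quietly reg logp i.itemcode i.country, nocons
          predict double xb_factor, xb
          quietly reg logp cc* ic*, nocons
          predict double xb_dummies, xb
          * If the two models are equivalent parameterizations, the fitted values
          * agree (up to rounding error) even though the coefficients look nothing alike.
          count if reldif(xb_factor, xb_dummies) > 1e-6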



          • #6
            Dear Clyde and Richard,

            Thank you for your insights on this matter, it is much appreciated. This has become much clearer to me now.

            Best,

            Satya
