  • regression coefficient calculation not correct

    Hello everyone,

    The coefficient on my construction costs index is too high. I corrected the construction costs index for inflation by dividing it by the CPI index and multiplying by 100 to get the real construction cost variable. Afterwards I took the log of real construction cost to get the log real construction cost index. However, when I run my regression I get a very high coefficient, and I was wondering how I can solve this problem?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double ConsCost_index float(real_consCost log_real_consCost)
      100       100 4.6051702
    105.2 102.73438  4.632147
    109.3 102.53437  4.630198
    111.5 101.25672  4.617659
    113.7 101.13087 4.6164155
    115.9 101.76472 4.6226635
    119.3 102.99907   4.63472
    124.1 105.97746 4.6632266
    129.8 109.09948 4.6922603
      130 106.60252  4.669107
    130.8  105.9867 4.6633134
    133.4 106.70628   4.67008
    135.7 106.10562  4.664435
      136 103.74653 4.6419506
    137.2  102.1092  4.626043
    139.8 103.01408 4.6348658
    142.6  104.4506  4.648714
    145.9 106.54813 4.6685967
    149.6 107.74178 4.6797376
    153.8 108.91506 4.6905684
    157.2 108.50176 4.6867666
      100       100 4.6051702
    105.2 102.73438  4.632147
    109.3 102.53437  4.630198
    111.5 101.25672  4.617659
    113.7 101.13087 4.6164155
    115.9 101.76472 4.6226635
    119.3 102.99907   4.63472
    124.1 105.97746 4.6632266
    129.8 109.09948 4.6922603
      130 106.60252  4.669107
    130.8  105.9867 4.6633134
    133.4 106.70628   4.67008
    135.7 106.10562  4.664435
      136 103.74653 4.6419506
    137.2  102.1092  4.626043
    139.8 103.01408 4.6348658
    142.6  104.4506  4.648714
    145.9 106.54813 4.6685967
    149.6 107.74178 4.6797376
    153.8 108.91506 4.6905684
    157.2 108.50176 4.6867666
      100       100 4.6051702
    105.2 102.73438  4.632147
    109.3 102.53437  4.630198
    111.5 101.25672  4.617659
    113.7 101.13087 4.6164155
    115.9 101.76472 4.6226635
    119.3 102.99907   4.63472
    124.1 105.97746 4.6632266
    129.8 109.09948 4.6922603
      130 106.60252  4.669107
    130.8  105.9867 4.6633134
    133.4 106.70628   4.67008
    135.7 106.10562  4.664435
      136 103.74653 4.6419506
    137.2  102.1092  4.626043
    139.8 103.01408 4.6348658
    142.6  104.4506  4.648714
    145.9 106.54813 4.6685967
    149.6 107.74178 4.6797376
    153.8 108.91506 4.6905684
    157.2 108.50176 4.6867666
      100       100 4.6051702
    105.2 102.73438  4.632147
    109.3 102.53437  4.630198
    111.5 101.25672  4.617659
    113.7 101.13087 4.6164155
    115.9 101.76472 4.6226635
    119.3 102.99907   4.63472
    124.1 105.97746 4.6632266
    129.8 109.09948 4.6922603
      130 106.60252  4.669107
    130.8  105.9867 4.6633134
    133.4 106.70628   4.67008
    135.7 106.10562  4.664435
      136 103.74653 4.6419506
    137.2  102.1092  4.626043
    139.8 103.01408 4.6348658
    142.6  104.4506  4.648714
    145.9 106.54813 4.6685967
    149.6 107.74178 4.6797376
    153.8 108.91506 4.6905684
    157.2 108.50176 4.6867666
      100       100 4.6051702
    105.2 102.73438  4.632147
    109.3 102.53437  4.630198
    111.5 101.25672  4.617659
    113.7 101.13087 4.6164155
    115.9 101.76472 4.6226635
    119.3 102.99907   4.63472
    124.1 105.97746 4.6632266
    129.8 109.09948 4.6922603
      130 106.60252  4.669107
    130.8  105.9867 4.6633134
    133.4 106.70628   4.67008
    135.7 106.10562  4.664435
      136 103.74653 4.6419506
    137.2  102.1092  4.626043
    139.8 103.01408 4.6348658
    end
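
    A quick way to double-check the construction described above, as a minimal sketch on the first three rows of the example data: recompute the log from real_consCost and back out the CPI value that the posted numbers imply. The names check_log and implied_cpi are made up for illustration, and the CPI series itself is not part of the posted data, so implied_cpi is only a back-computed value.

    ```stata
    clear
    input double ConsCost_index float(real_consCost log_real_consCost)
      100       100 4.6051702
    105.2 102.73438  4.632147
    109.3 102.53437  4.630198
    end

    * Recompute the log directly from the deflated index (hypothetical name)
    generate double check_log = ln(real_consCost)

    * real = nominal / cpi * 100, so cpi = nominal / real * 100 (back-computed)
    generate double implied_cpi = ConsCost_index / real_consCost * 100

    * The stored float and the recomputed double should agree to float precision
    assert reldif(log_real_consCost, check_log) < 1e-6

    list, noobs
    summarize log_real_consCost
    ```

    If the assert passes, the log variable was built as described, and summarize shows how narrow its range is.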


    This is my regression command and results:
    Code:
    . asdoc xtreg log_realHP log_PopulationDensity log_Population Unemployment_rate log_
    > real_consCost logReal_income real_interest i.low_dev i.Year,fe vce(robust) replace
    >  cnames(low development) save(PanelData_regression) add(Low dev Dummy,YES, Year Du
    > mmy,YES) dec(3)
    note: 2018.Year omitted because of collinearity
    note: 2019.Year omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs     =      2,532
    Group variable: GM_code                         Number of groups  =        282
    
    R-sq:                                           Obs per group:
         within  = 0.8875                                         min =          8
         between = 0.0581                                         avg =        9.0
         overall = 0.0023                                         max =          9
    
                                                    F(13,281)         =     946.78
    corr(u_i, Xb)  = -0.5914                        Prob > F          =     0.0000
    
                                       (Std. Err. adjusted for 282 clusters in GM_code)
    -----------------------------------------------------------------------------------
                      |               Robust
           log_realHP |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------+----------------------------------------------------------------
    log_PopulationD~y |   .0364162   .0302895     1.20   0.230    -.0232068    .0960393
       log_Population |   .1433754    .076217     1.88   0.061    -.0066533    .2934041
    Unemployment_rate |     .03056   .0082342     3.71   0.000     .0143513    .0467686
    log_real_consCost |   78.75831   1.275555    61.74   0.000     76.24746    81.26917
       logReal_income |   .2192286   .1029772     2.13   0.034     .0165239    .4219334
        real_interest |   .5209448   .0086732    60.06   0.000     .5038721    .5380175
            1.low_dev |   .0109145   .0026326     4.15   0.000     .0057325    .0160966
                      |
                 Year |
                2012  |   1.041003   .0177587    58.62   0.000     1.006046     1.07596
                2013  |   2.727895   .0451684    60.39   0.000     2.638983    2.816806
                2014  |   3.375704   .0558562    60.44   0.000     3.265754    3.485653
                2015  |   2.838823   .0477437    59.46   0.000     2.744843    2.932804
                2016  |    1.79892   .0302964    59.38   0.000     1.739283    1.858557
                2017  |   .7132875   .0122971    58.00   0.000     .6890815    .7374935
                2018  |          0  (omitted)
                2019  |          0  (omitted)
                      |
                _cons |  -359.9836    6.54667   -54.99   0.000    -372.8704   -347.0969
    ------------------+----------------------------------------------------------------
              sigma_u |  .29981225
              sigma_e |  .02925436
                  rho |  .99056879   (fraction of variance due to u_i)
    ----------------------------------------------------------------------------------

  • #2
    How do you know it isn't the correct coefficient?



    • #3
      Originally posted by Jared Greathouse
      How do you know it isn't the correct coefficient?
      Well, if a 1 percent increase in log_real_consCost_index means that log real house prices increase by 70%, isn't that effect too large?



      • #4
        Not at all. I'm not a housing economist, and presumably you've got some data issues, but think about it along with me. The cost of houses explaining the price of houses..... I don't know how much my house cost to build, but ours was like 160k (I think, I didn't buy it).

        Anyways, let's compare this to Jeff Bezos' house or some other home in Buckhead, Atlanta or some other luxurious area. Presumably, by virtue of these houses being much larger and in wealthier areas, a house that cost (I'm making this up) 10,000,000 to build may end up costing far more.

        I don't know and I don't have the full dataset you have, but this relationship doesn't seem entirely unreasonable.



        • #5
          Originally posted by Jared Greathouse
          Not at all. I'm not a housing economist, and presumably you've got some data issues, but think about it along with me. The cost of houses explaining the price of houses..... I don't know how much my house cost to build, but ours was like 160k (I think, I didn't buy it).

          Anyways, let's compare this to Jeff Bezos' house or some other home in Buckhead, Atlanta or some other luxurious area. Presumably, by virtue of these houses being much larger and in wealthier areas, a house that cost (I'm making this up) 10,000,000 to build may end up costing far more.

          I don't know and I don't have the full dataset you have, but this relationship doesn't seem entirely unreasonable.
          Actually, your example does make sense, and I think the coefficient is correct in this case. Maybe my question is stupid, but I use panel data on 25 cities over 2000-2013 and I have the variable interest. Now my question is: should each city be linked with the interest rate, or should I isolate interest in my regression?

          Thank you for your time!



          • #6
            I'm feeling lucky about my telepathy skills today, so I'll plunge in. (I may well come to regret this.) First, to be clear, I'm with Jared Greathouse on this: there is no reason to think the coefficient is wrong.

            I believe O.P. thinks the coefficient is too high because, compared to all the other coefficients in the output, it is by far the largest, with no close seconds. And he thinks to himself, it cannot possibly be that construction costs are so much more important a factor in the determination of HP (whatever that is!!!!) than anything else. What he is overlooking here is that, the way he has constructed his log_real_consCost variable, it has a very compressed range of values, whereas the other variables have a wider range of values, and he is confusing the importance of a variable with scaling effects on regression coefficients.

            Specifically, variables like log real income are going to vary over a pretty wide range across, and even within (which is what matters here with a fixed-effects model), geographic or social units (whatever a GM_code represents). On the other hand, the variation in log_real_consCost is quite compressed. Severely so: the entire range in the example data is from just barely above 4.6 to just barely below 4.7. I'm imagining that log real income, by contrast, has a range of something like 10 to 11--an order of magnitude greater, and perhaps even wider. I'm just using log real income as an example here because I have better intuitions about what values that might take on than the other variables--but I'm betting that similar conclusions can be drawn about them as well.

            The "importance" of a variable in a regression model is not reflected cleanly in the magnitude of its coefficient: the scale of the variable also has a strong influence on that. Variables with less variability will have larger regression coefficients than those of equally "important" variables that have greater variability. People sometimes try to overcome this problem by standardizing the variables in a regression. (I do not recommend that because it introduces additional problems of its own, and it doesn't really solve this problem completely anyway. But I digress.)
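
            The scaling point can be checked with a small simulation. This is only a sketch on made-up data, not the O.P.'s panel: x, y, x_narrow, the seed, and the factor of 100 are all arbitrary choices for illustration. Dividing a regressor by 100 compresses its range and multiplies its coefficient by 100, while the t statistic is unchanged.

            ```stata
            clear
            set seed 2022
            set obs 500

            * Simulated regressor and outcome (illustrative values only)
            generate double x = rnormal(0, 1)
            generate double y = 0.5*x + rnormal(0, 1)

            * Same information as x, but squeezed into 1/100 of the range
            generate double x_narrow = x/100

            regress y x
            scalar b_wide = _b[x]
            scalar t_wide = _b[x]/_se[x]

            regress y x_narrow
            scalar b_narrow = _b[x_narrow]
            scalar t_narrow = _b[x_narrow]/_se[x_narrow]

            * Ratio of coefficients is 100 up to floating-point rounding
            display "coefficient ratio: " b_narrow/b_wide
            * The two t statistics agree up to floating-point rounding
            display "t statistics: " t_wide "  " t_narrow
            ```

            Nothing about the fit changes between the two regressions; only the units of the regressor do, and the coefficient moves to compensate.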

            So I think that O.P. is simply not appreciating the importance of scaling as a determinant of the regression coefficient. That's my theory of why O.P. mistakenly thinks the coefficient is wrong. It is his understanding of the model, not the model itself, that needs fixing here.

            Added: Crossed with #4 and #5. I will comment on:

            Well, if a 1 percent increase in log_real_consCost_index means that log real house prices increase by 70%, isn't that effect too large?
            Given the restricted range of log_real_consCost_index, a 1 percent increase takes you all the way from the bottom to the middle of the range of housing construction costs. That's a huge change. The same would not be true for, say, log income.
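
            The arithmetic can be checked against the example data, where log_real_consCost runs from 4.6051702 to 4.6922603. A minimal sketch using only those two posted values (the scalar names lo and hi are made up for illustration):

            ```stata
            * Minimum and maximum of log_real_consCost in the example data
            scalar lo = 4.6051702
            scalar hi = 4.6922603

            * A 1 percent increase starting from the minimum: about 4.651
            display "1% above the minimum: " lo*1.01

            * Midpoint of the observed range: about 4.649
            display "midpoint of the range: " (lo + hi)/2

            * Share of the whole observed range covered by that step: about 0.53
            display "share of range covered: " (lo*0.01)/(hi - lo)
            ```

            So a 1 percent rise in the logged variable spans a bit more than half of its observed range. In terms of the unlogged index, that step corresponds to multiplying real construction costs by exp(0.046), roughly a 4.7 percent increase.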
            Last edited by Clyde Schechter; 28 Jan 2022, 17:50.
