  • coefficient larger than mean of dependent variable

    Dear all,

    I have the following model


    Code:
    reghdfe Y X1 X2 X3  ,  absorb(school_code Year) vce(cluster school_code#Year)
    where Y is a binary variable with a mean of 5% (only 5% of the observations take the value 1).

    I have the following result:

    Code:
    HDFE Linear regression                            Number of obs   =     20,001
    Absorbing 2 HDFE groups                           F(   3,    186) =       1.12
    Statistics robust to heteroskedasticity           Prob > F        =     0.3408
                                                      R-squared       =     0.0062
                                                      Adj R-squared   =     0.0048
                                                      Within R-sq.    =     0.0002
    Number of clusters (school_code#Year) =        187 Root MSE        =     0.2235
    
                                 (Std. err. adjusted for 187 clusters in school_code#Year)
    --------------------------------------------------------------------------------------
                         |               Robust
                           Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ---------------------+----------------------------------------------------------------
                          X1 |  -.1149805   .0667706    -1.72   0.087    -.2467057    .0167446
                          X2 |  -.0194905   .1469087    -0.13   0.895     -.309312    .2703309
                          X3 |   .0116331   .0122768     0.95   0.345    -.0125866    .0358528
                   _cons |    .047886   .0097145     4.93   0.000     .0287213    .0670508
    --------------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
     school_code |        17           0          17     |
            Year |        11           1          10     |
    -----------------------------------------------------+

    My question is: why is the coefficient on X1 larger (in absolute value) than the mean of Y? Do you have any ideas, please?
    Note that I cannot use probit or logit models because they do not allow an interaction in the cluster variable.

    All the best

  • #2
    First, you are using a linear probability model, so predicted probabilities outside the 0 to 1 interval are possible. When they occur, they suggest that a linear probability model may not be appropriate for the data.

    But before reaching a conclusion of that nature, it is important to consider the scale of variable X1. The coefficient represents the expected change in Y associated with a 1-unit difference in X1. If the scale of variable X1 is very small, such that a unit difference in X1 is not even possible, or would be extraordinarily unusual, in the real world, then there is nothing surprising about such a coefficient. If, for example, X1 ranges only between 0.5 and 0.6, then the maximum possible difference in values of X1 is only 0.1, and correspondingly the maximum associated difference in Y would be 0.1*-.1149805 = -.01149805, which is perfectly reasonable.
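
    A quick way to check this is sketched below. It assumes the reghdfe command from #1 has just been run, so that _b[X1] still holds the estimated coefficient; summarize only overwrites r() results, not the stored estimates.

    ```stata
    * Inspect the observed range of X1 before interpreting its coefficient
    summarize X1

    * Largest difference in Pr(Y = 1) implied across the observed range of X1
    * (r(min) and r(max) come from the summarize call just above)
    display _b[X1] * (r(max) - r(min))
    ```

    If the displayed value is small relative to the 5% mean of Y, the size of the coefficient is not a problem in itself.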



    • #3
      Clyde Schechter thank you so much for your answer.
      You are totally right. X1 varies only between 0 and 0.33. So in my text, should I interpret the results in terms of a 0.1 change in X1?



      • #4
        Yes, in a situation like this, it is probably better to describe your results in terms of the effect on Y of a 0.1 difference in X1, as a unit difference in X1 never occurs, and perhaps doesn't even make sense at any level.
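
        One possible way to get the results directly on that scale is to rescale the regressor and re-estimate. This is only a sketch: X1_tenths is a made-up variable name, and the rest of the command is copied from #1.

        ```stata
        * Hypothetical rescaled regressor: one unit of X1_tenths equals 0.1 units of X1
        generate X1_tenths = 10 * X1

        * Same model as in #1, with the rescaled variable in place of X1
        reghdfe Y X1_tenths X2 X3, absorb(school_code Year) vce(cluster school_code#Year)
        ```

        The coefficient and standard error on X1_tenths should be one tenth of those reported for X1, while the t statistic and p-value stay the same. Running lincom 0.1*X1 after the original regression should give the same numbers without re-estimating, assuming lincom behaves after reghdfe as it does after regress.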
