
  • Overall effect of factor variable

    Stata 14, regress command

    I am trying to summarize the effect of a categorical predictor variable in a regression model. Using the i. prefix gives an estimate of the effect for each level of the categorical predictor (relative to the base level), but I need the overall effect of the variable (an effect size and p-value would suffice). How might I accomplish this?

    Thank you.

  • #2
    Say your predictor variable is called xvar, and you have run a regression command using i.xvar as a predictor. Then running -testparm i.xvar- will give you an omnibus test of the joint significance of all levels of xvar. But I do not even know what you have in mind for an overall effect size.
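A minimal sketch of that omnibus test, run on Stata's bundled auto dataset because the original poster's data are not available here; -mpg-, -weight-, -length- and -rep78- are stand-ins for ROA, lnTA, lnFactiva and naics1:

```stata
* Illustration on the bundled auto data; rep78 plays the role of the factor
sysuse auto, clear

* Fit the model with rep78 entered as a factor variable (level 1 is the base)
regress mpg weight length i.rep78

* Joint (omnibus) F test that all rep78 indicator coefficients are zero
testparm i.rep78

* testparm leaves its results in r(): F statistic, numerator and
* denominator degrees of freedom, and the p-value
display "F(" r(df) ", " r(df_r) ") = " %5.2f r(F) ",  p = " %6.4f r(p)
```

Because this is OLS with the default VCE, the F statistic from -testparm- should be the same incremental F test that -nestreg- reports for the rep78 block in #8.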



    • #3
      Thanks, Clyde. I am interested in the overall effect of the categorical predictor, i.e. the same kind of single estimate that Stata outputs for continuous predictors. I suppose I might mean 'effect' instead of 'effect size'.

      My code is very simple:
      regress ROA lnTA lnFactiva i.naics1

      I am interested in what would go in the 'Coef.' column below:
      ------------------------------------------------------------------------------
               ROA |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
              lnTA |  -.0088407   .0028043    -3.15   0.002     -.014354   -.0033274
         lnFactiva |   .0079373    .002886     2.75   0.006     .0022634    .0136112
                   |
            naics1 |
                 2 |  -.0376348   .0654484    -0.58   0.566    -.1663066    .0910371
                 3 |   .0038737     .06444     0.06   0.952    -.1228155     .130563
                 4 |  -.0010321   .0645695    -0.02   0.987    -.1279759    .1259117
                 5 |  -.0063115   .0644432    -0.10   0.922    -.1330071    .1203841
                 6 |  -.0387632   .0671382    -0.58   0.564    -.1707572    .0932307
                 7 |  -.0048737    .065561    -0.07   0.941    -.1337669    .1240195
                 9 |  -.0167333   .0786204    -0.21   0.832    -.1713013    .1378347
                   |
             _cons |   .1152539   .0678577     1.70   0.090    -.0181545    .2486624
      ------------------------------------------------------------------------------

      Many thanks.



      • #4
        > I am interested in the overall effect of the categorical predictor,
        But there is no such thing. Each level of the factor variable has its own separate effect. The "effect" of a continuous variable is defined as the associated change in the outcome corresponding to a unit change in the continuous variable. But for a categorical variable there is no such thing as a unit change in the overall variable. There is no one number that "would go in the Coef column" of the output table.
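To make the point concrete, here is a sketch on the bundled auto data (an assumption, standing in for the poster's data): each i.rep78 coefficient compares one level with the base level, and other comparisons, such as level 3 versus level 2, are separate effects that -lincom- can compute from the same fit.

```stata
* Illustration on the bundled auto data: each rep78 coefficient is a
* contrast between one level and the base level (rep78 == 1)
sysuse auto, clear
quietly regress mpg weight length i.rep78

* Effect of level 3 relative to level 2: not a row in the coefficient
* table, but implied by it (difference of the two coefficients)
lincom 3.rep78 - 2.rep78
```

Re-fitting with a different base level, for example -ib3.rep78-, changes which contrasts appear in the table but not the fitted values or R2.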

        The closest thing I can think of to an overall "effect" that one might define here is the change in R2 if you omit the categorical variable entirely from the model. But this isn't an effect on the outcome; it's an effect on the model fit.
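In symbols, writing RSS for the residual sum of squares and TSS for the total sum of squares (notation introduced here for illustration), the change in R2 is the share of the total variation that the factor's indicators remove from the residuals:

```latex
\Delta R^2 = R^2_{\text{full}} - R^2_{\text{reduced}}
  = \left(1 - \frac{\mathrm{RSS}_{\text{full}}}{\mathrm{TSS}}\right)
  - \left(1 - \frac{\mathrm{RSS}_{\text{reduced}}}{\mathrm{TSS}}\right)
  = \frac{\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}}}{\mathrm{TSS}}
```

The two TSS terms are equal, and so cancel, only when both models are fit to the same observations.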



        • #5
          I understand; I guess I got carried away and forgot that basic fact. Thanks again for your help.

          I wonder if there is a way to get the amount of variance accounted for by all the categories? If there is such a thing for each individual category, it seems like there should be some way to arrive at a figure for total variance accounted for by all the categories.
          Last edited by Michael Kimmel; 22 Sep 2017, 14:06.



          • #6
            You may want to look into sheaf coefficients.

            Best
            Daniel


            • #7
              > I wonder if there is a way to get the amount of variance accounted for by all the categories?
              Yes, that's what I alluded to in the final paragraph of #4.

              Code:
              regress ROA lnTA lnFactiva i.naics1
              local rsq_with `e(r2)'
              regress ROA lnTA lnFactiva if e(sample)
              local rsq_without `e(r2)'
              
              display "Proportion of variance accounted for by naics: " %3.2f  =`rsq_with'-`rsq_without'
              Note the -if e(sample)- in the second -regress-. This is critical: if any observations have missing values for naics1, then without that restriction they would come back into the estimation sample. The second regression would then be run on a larger set of observations than the first, and the difference in R2 would not be meaningful (it could even turn out to be negative).
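A sketch of why the restriction matters, again on the bundled auto data (an assumption; rep78 has missing values, playing the role of naics1). Because e(sample) is overwritten by every estimation command, this version saves it in a variable, here given the illustrative name -fullsample-, before fitting the unrestricted comparison model:

```stata
* Illustration on the bundled auto data: rep78 has missing values, so the
* full model uses fewer observations than a model that omits rep78
sysuse auto, clear

* Full model; observations with missing rep78 are dropped
quietly regress mpg weight length i.rep78
local n_full = e(N)
local rsq_with = e(r2)

* Save the full model's estimation sample before it is overwritten
generate byte fullsample = e(sample)

* Reduced model WITHOUT the restriction: the missing-rep78 rows come back
quietly regress mpg weight length
local n_unrestricted = e(N)

* Reduced model restricted to the full model's estimation sample
quietly regress mpg weight length if fullsample
local rsq_without = e(r2)

display "N (full model):                  " `n_full'
display "N (reduced, no restriction):     " `n_unrestricted'
display "N (reduced, same sample):        " e(N)
display "Change in R2 on the same sample: " %5.4f `rsq_with' - `rsq_without'
```

On these data the unrestricted reduced model picks up the 5 observations with missing rep78, which is exactly the situation -if e(sample)- guards against.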



              • #8
                Alternatively, you could use the -nestreg- prefix command. Note, though, that it does not allow factor variables, so you have to create your own indicator variables.

                Code:
                clear
                sysuse auto
                
                * Clyde's method
                regress mpg weight length i.rep78
                local rsq_with `e(r2)'
                regress mpg weight length if e(sample)
                local rsq_without `e(r2)'
                
                * Using -nestreg-, but note that it does not allow factor variables;
                * therefore we need to compute our own indicator variables.
                tabulate rep78, generate(r) // generate indicator variables
                nestreg : regress mpg (weight length) (r2 r3 r4 r5)
                * Compare with result using Clyde's method:
                display "Proportion of variance accounted for by rep78: " %5.4f  =`rsq_with'-`rsq_without'
                Here is the key part of the output.

                Code:
                  +-------------------------------------------------------------+
                  |       |          Block  Residual                     Change |
                  | Block |       F     df        df   Pr > F       R2    in R2 |
                  |-------+-----------------------------------------------------|
                  |     1 |   65.43      2        66   0.0000   0.6647          |
                  |     2 |    1.33      4        62   0.2706   0.6912   0.0264 |
                  +-------------------------------------------------------------+
                
                . display "Proportion of variance accounted for by rep78: " %5.4f  =`rsq_with'-`rsq_without'
                Proportion of variance accounted for by rep78: 0.0264
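As a worked check, the block-2 F statistic can be recovered from the R2 columns above, using q = 4 added indicators and 62 residual df (rounded values, so the result only agrees to two decimals). For OLS this incremental F is the same statistic that -testparm i.rep78- reports after the full model:

```latex
F = \frac{\Delta R^2 / q}{\left(1 - R^2_{\text{full}}\right) / \mathit{df}_{\text{resid}}}
  = \frac{0.0264 / 4}{(1 - 0.6912) / 62}
  = \frac{0.00660}{0.00498}
  \approx 1.33
```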

                --
                Bruce Weaver
                Version: Stata/MP 18.5 (Windows)



                • #9
                  To extend Daniel's comment: type -ssc desc sheafcoef- in Stata, and see the materials found here: http://www.maartenbuis.nl/software/sheafcoef.html
                  ---------------------------------
                  Maarten L. Buis
                  University of Konstanz
                  Department of history and sociology
                  box 40
                  78457 Konstanz
                  Germany
                  http://www.maartenbuis.nl
                  ---------------------------------

