  • Margins command and factor variable notations.

    I think this is a fairly basic question, but I obtain different coefficients when I run models with and without factor-variable notation.

    (1) When I use the margins command, is it necessary to use factor-variable notation even when the model has no interaction terms?

    (2) Is it also necessary to use factor-variable notation when I do not run margins after the regression? I am very confused because I get different coefficients depending on whether I use factor-variable notation. In other words, which coefficients should be reported in the paper: those obtained with or without factor-variable notation?

  • #2
    April:
    The one-size-fits-all reply is that it is a very good habit to use -fvvarlist- notation, especially when you are dealing with categorical variables (and, obviously, interactions).
    As you noticed, neglecting -fvvarlist- notation can produce unreliable or ridiculous results, as shown in the following toy example (where the second regression makes no sense at all, even though -rep78- is coded with integer values):
    Code:
    . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
    (1978 automobile data)
    
    . regress price i.rep78
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(4, 64)        =      0.24
           Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
        Residual |   568436416        64     8881819   R-squared       =    0.0145
    -------------+----------------------------------   Adj R-squared   =   -0.0471
           Total |   576796959        68  8482308.22   Root MSE        =    2980.2
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           rep78 |
              2  |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
              3  |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
              4  |       1507   2221.338     0.68   0.500    -2930.633    5944.633
              5  |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
                 |
           _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
    ------------------------------------------------------------------------------
    
    . regress price rep78
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(1, 67)        =      0.00
           Model |  24770.7652         1  24770.7652   Prob > F        =    0.9574
        Residual |   576772188        67  8608540.12   R-squared       =    0.0000
    -------------+----------------------------------   Adj R-squared   =   -0.0149
           Total |   576796959        68  8482308.22   Root MSE        =      2934
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           rep78 |   19.28012   359.4221     0.05   0.957    -698.1295    736.6897
           _cons |   6080.379    1274.06     4.77   0.000     3537.345    8623.413
    ------------------------------------------------------------------------------
    
    .
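    On the -margins- part of the question, here is a minimal sketch, assuming the auto data shipped with Stata (loaded with -sysuse- rather than the full path above). With -i.rep78-, each coefficient is the difference in mean price between that category and the base category; without the prefix, -rep78- enters as a single continuous slope, and -margins- has no way of knowing that it is categorical.

    ```stata
    sysuse auto, clear

    * Categorical specification: one coefficient per non-base level of rep78
    regress price i.rep78

    * Predictive margins: average predicted price at each level of rep78
    margins rep78

    * Contrasts of each level against the base level
    margins, dydx(rep78)

    * Continuous specification: a single slope for rep78
    quietly regress price rep78

    * Asking for margins by level is expected to fail here, because rep78
    * was not entered as a factor variable
    capture noisily margins rep78

    * The average marginal effect of the continuous term is still available
    margins, dydx(rep78)
    ```

    So the answer depends on what -rep78- is meant to be: if it is categorical, the -i.- version is the one to report, and it is also the one -margins- can work with by level.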
    Last edited by Carlo Lazzaro; 16 Mar 2022, 04:27.
    Kind regards,
    Carlo
    (StataNow 18.5)

    • #3
      Thank you, but I am still confused, because I do not recall seeing articles that report a separate coefficient for each value of a categorical variable in regression analyses.

      • #4
        April:
        -fvvarlist- notation offers greater flexibility than simply entering the categorical predictor without the -i.- prefix. For instance, we can choose the reference category (and switch it very conveniently).
        Code:
        . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
        . regress price i0.foreign
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(1, 72)        =      0.17
               Model |  1507382.66         1  1507382.66   Prob > F        =    0.6802
            Residual |   633558013        72  8799416.85   R-squared       =    0.0024
        -------------+----------------------------------   Adj R-squared   =   -0.0115
               Total |   635065396        73  8699525.97   Root MSE        =    2966.4
        
        ------------------------------------------------------------------------------
               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
             foreign |
           Domestic  |  -312.2587   754.4488    -0.41   0.680    -1816.225    1191.708
               _cons |   6384.682   632.4346    10.10   0.000     5123.947    7645.417
        ------------------------------------------------------------------------------
        
        . regress price i1.foreign
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(1, 72)        =      0.17
               Model |  1507382.66         1  1507382.66   Prob > F        =    0.6802
            Residual |   633558013        72  8799416.85   R-squared       =    0.0024
        -------------+----------------------------------   Adj R-squared   =   -0.0115
               Total |   635065396        73  8699525.97   Root MSE        =    2966.4
        
        ------------------------------------------------------------------------------
               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
             foreign |
            Foreign  |   312.2587   754.4488     0.41   0.680    -1191.708    1816.225
               _cons |   6072.423    411.363    14.76   0.000     5252.386     6892.46
        ------------------------------------------------------------------------------
        
        .
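        With more than two categories, the base level can also be named explicitly. The following is a minimal sketch, assuming the auto data loaded with -sysuse-. Note that an -i#.- prefix such as -i0.foreign- selects the indicator for a single level, which for a two-level variable amounts to switching the reference, whereas -ib#.- keeps all levels and only changes the base.

        ```stata
        sysuse auto, clear

        * Level 3 of rep78 as the base category
        regress price ib3.rep78

        * Highest level as the base category
        regress price ib(last).rep78

        * Most frequent level as the base category
        regress price ib(freq).rep78

        * Set a default base for rep78, used whenever i.rep78 appears
        fvset base 3 rep78
        regress price i.rep78

        * Restore the default behavior
        fvset clear rep78
        ```

        The fit of the model is the same in every case; only the category against which the coefficients are measured changes.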
        As an aside, when the model includes a constant, a reference category is mandatory to avoid the so-called dummy variable trap (https://en.wikipedia.org/wiki/Dummy_variable_(statistics)).
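        The dummy variable trap can be seen directly with hand-made indicators. This is a minimal sketch, assuming the auto data loaded with -sysuse-; the indicator names rep_1 to rep_5 are simply what -tabulate, generate()- produces with the stub chosen here.

        ```stata
        sysuse auto, clear

        * Build one 0/1 indicator per level of rep78 by hand (rep_1 ... rep_5)
        tabulate rep78, generate(rep_)

        * All five indicators plus a constant are perfectly collinear,
        * so Stata omits one of them and notes it in the output
        regress price rep_1 rep_2 rep_3 rep_4 rep_5

        * Factor-variable notation handles the base level automatically
        regress price i.rep78

        * Without a constant, no base level is needed: ibn. keeps every level,
        * and each coefficient is then the mean price in that category
        regress price ibn.rep78, noconstant
        ```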
        Kind regards,
        Carlo
        (StataNow 18.5)

        • #5
          Should factor-variable notation be used for ordinal variables as well?
          Last edited by April Kimm; 18 Mar 2022, 01:52.

          • #6
            April:
            see https://www.statalist.org/forums/for...ting-variables.
            In addition, if you have an ordinal predictor (say, good; fair; bad), you can use -fvvarlist- notation.
            Kind regards,
            Carlo
            (StataNow 18.5)

            • #7
              It may or may not be legitimate to treat ordinal independent variables as continuous. For a discussion see

              https://www3.nd.edu/~rwilliam/xsoc73...ndependent.pdf

              Better yet, if your library provides you with free access, see

              https://methods.sagepub.com/foundati...dent-variables
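
              One rough way to check whether the continuous treatment is defensible is to compare it with the categorical specification. The sketch below assumes the auto data loaded with -sysuse- and uses -rep78- purely as an illustration; it is not necessarily the procedure followed in the references above.

              ```stata
              sysuse auto, clear

              * rep78 treated as continuous: one slope, equal steps between levels
              regress price c.rep78
              estimates store linear

              * rep78 treated as categorical: a separate coefficient per non-base level
              regress price i.rep78
              estimates store categorical

              * The linear model is nested in the categorical one; a small p-value
              * suggests that the equal-steps assumption is too restrictive
              lrtest linear categorical
              ```

              If the test gives no evidence against the linear specification, the more parsimonious continuous coding may be acceptable; otherwise the -i.- version is the safer choice.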
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 18.5 MP (2 processor)

              WWW: https://www3.nd.edu/~rwilliam

              • #8
                Thank you all. I will check them out.
