Margins command and factor-variable notation

April Kimm

#1

16 Mar 2022, 04:13

This is probably a fairly basic question, but I obtain different coefficients when I run models with and without factor-variable notation.

(1) When I use the -margins- command, is it necessary to use factor-variable notation even when the model has no interaction terms?

(2) Is it also necessary to use factor-variable notation when I do not run -margins- after the regression? I am confused because I get different coefficients depending on whether I use factor-variable notation. In other words, which coefficients should I report in the paper: those obtained with or without factor-variable notation?

Carlo Lazzaro


16 Mar 2022, 04:19

April:
the one-size-fits-all reply is that it is a very good habit to use -fvvarlist- notation, especially when you're dealing with categorical variables (and, obviously, interactions).
As you noticed, neglecting -fvvarlist- notation can produce unreliable (even ridiculous) results, as shown in the following toy example. The second regression makes no sense at all: -rep78- takes the integer values 1 to 5, but entering it without the -i.- prefix forces a single linear slope across its levels instead of estimating a separate shift for each level:

Code:

. use "C:\Program Files\Stata17\ado\base\a\auto.dta"
(1978 automobile data)

. regress price i.rep78

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =      0.24
       Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
    Residual |   568436416        64     8881819   R-squared       =    0.0145
-------------+----------------------------------   Adj R-squared   =   -0.0471
       Total |   576796959        68  8482308.22   Root MSE        =    2980.2

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
          3  |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
          4  |       1507   2221.338     0.68   0.500    -2930.633    5944.633
          5  |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
             |
       _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
------------------------------------------------------------------------------

. regress price rep78

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(1, 67)        =      0.00
       Model |  24770.7652         1  24770.7652   Prob > F        =    0.9574
    Residual |   576772188        67  8608540.12   R-squared       =    0.0000
-------------+----------------------------------   Adj R-squared   =   -0.0149
       Total |   576796959        68  8482308.22   Root MSE        =      2934

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |   19.28012   359.4221     0.05   0.957    -698.1295    736.6897
       _cons |   6080.379    1274.06     4.77   0.000     3537.345    8623.413
------------------------------------------------------------------------------

.
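A minimal sketch of why the -i.- coefficients are the same ones you would get from hand-made dummies: -i.rep78- expands to one 0/1 indicator per level and leaves the lowest level (rep78 == 1) out as the base. It assumes the auto data are loaded with -sysuse auto- rather than the full path above; the stub name -rep- for the generated dummies is an arbitrary choice.

Code:
```stata
* Load the example data shipped with Stata
sysuse auto, clear

* Factor-variable notation: indicators for levels 2-5, level 1 is the base
regress price i.rep78

* Build the indicators by hand: creates rep1-rep5, one per observed level
tabulate rep78, generate(rep)

* Leave out rep1 so that level 1 is the base, as with i.rep78, and keep
* the same 69 cars; the coefficients on rep2-rep5 match 2.rep78-5.rep78
regress price rep2 rep3 rep4 rep5 if !missing(rep78)
```

These separate per-level coefficients are what papers usually report when a categorical predictor is entered as a set of dummies; the -i.- prefix just saves you from building them yourself.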

Last edited by Carlo Lazzaro; 16 Mar 2022, 04:27.

Kind regards,
Carlo
(StataNow 18.5)


April Kimm

#3

16 Mar 2022, 04:31

Thank you, but I am still confused, because I do not recall seeing articles that report a separate coefficient for each value of a categorical variable in regression analyses.

Carlo Lazzaro


16 Mar 2022, 04:42

April:
-fvvarlist- notation is more flexible than simply plugging in the categorical predictor without the -i.- prefix. For instance, we can choose the reference category (and switch it very conveniently):

Code:

use "C:\Program Files\Stata17\ado\base\a\auto.dta"
. regress price i0.foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =      0.17
       Model |  1507382.66         1  1507382.66   Prob > F        =    0.6802
    Residual |   633558013        72  8799416.85   R-squared       =    0.0024
-------------+----------------------------------   Adj R-squared   =   -0.0115
       Total |   635065396        73  8699525.97   Root MSE        =    2966.4

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     foreign |
   Domestic  |  -312.2587   754.4488    -0.41   0.680    -1816.225    1191.708
       _cons |   6384.682   632.4346    10.10   0.000     5123.947    7645.417
------------------------------------------------------------------------------

. regress price i1.foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =      0.17
       Model |  1507382.66         1  1507382.66   Prob > F        =    0.6802
    Residual |   633558013        72  8799416.85   R-squared       =    0.0024
-------------+----------------------------------   Adj R-squared   =   -0.0115
       Total |   635065396        73  8699525.97   Root MSE        =    2966.4

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     foreign |
    Foreign  |   312.2587   754.4488     0.41   0.680    -1191.708    1816.225
       _cons |   6072.423    411.363    14.76   0.000     5252.386     6892.46
------------------------------------------------------------------------------

.
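A minimal sketch of the base-level operators on a five-level variable, again assuming -sysuse auto-. With a two-level variable such as -foreign-, -i0.- and -i1.- each keep a single indicator, which happens to be equivalent to choosing the base; with more levels, -i2.rep78- would keep only the indicator for level 2, so the -ib#.- operator (or -fvset base-) is the general way to pick the reference category.

Code:
```stata
sysuse auto, clear

* Default: the lowest level (rep78 == 1) is the base
regress price i.rep78

* Level 3 as the base
regress price ib3.rep78

* Highest level as the base
regress price ib(last).rep78

* Most frequent level as the base
regress price ib(freq).rep78

* Make level 3 the default base for rep78, so plain i.rep78 uses it
* (the setting is stored with the data if you save the dataset)
fvset base 3 rep78
regress price i.rep78
```

Switching the base only relabels the comparisons: fitted values, R-squared and the overall F test are unchanged, as the two -foreign- outputs above show.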

As an aside, when the model includes a constant, a reference category is mandatory to avoid the so-called dummy variable trap; see:

https://en.wikipedia.org/wiki/Dummy_variable_(statistics)
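A minimal sketch of the trap itself, assuming -sysuse auto-. The -ibn.- (no base) operator requests an indicator for every level of -foreign-. With a constant in the model the two indicators always add up to 1, so they are perfectly collinear with the constant and -regress- has to omit one of them. Dropping the constant instead removes the collinearity, and each coefficient becomes the mean price of its group (the two -_cons- values in the outputs above).

Code:
```stata
sysuse auto, clear

* One indicator per level plus a constant: perfectly collinear,
* so regress reports one indicator as omitted
regress price ibn.foreign

* Without the constant there is no collinearity; each coefficient
* is the mean price within that group
regress price ibn.foreign, noconstant
```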

Kind regards,
Carlo
(StataNow 18.5)


April Kimm

#5

18 Mar 2022, 01:48

Should factor-variable notation be used for ordinal variables as well?

Last edited by April Kimm; 18 Mar 2022, 01:52.
Carlo Lazzaro

#6

18 Mar 2022, 02:44

April:
see https://www.statalist.org/forums/for...ting-variables.
In addition, if you have an ordinal predictor (say, good; fair; bad), you can use -fvvarlist- notation.
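A minimal sketch with the ordinal -rep78- entered via -i.-, which also bears on question (1) in #1. The marginlist of -margins- must consist of factor variables in the current model, so -margins rep78- is only allowed because -rep78- entered with -i.-. For a continuous covariate in a linear model without interactions, the average marginal effect reported by -margins- equals the plain coefficient. Adding -mpg- as a covariate is an illustrative choice, not something from the thread; -sysuse auto- is assumed.

Code:
```stata
sysuse auto, clear

* Ordinal repair rating entered as a set of indicators, plus mpg
regress price i.rep78 mpg

* Predictive margins: average predicted price at each level of rep78
margins rep78

* Average marginal effect of mpg; with no interactions in a linear
* model it equals the mpg coefficient
margins, dydx(mpg)
```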

Kind regards,
Carlo
(StataNow 18.5)
Richard Williams

#7

18 Mar 2022, 05:16

It may or may not be legitimate to treat ordinal independent variables as continuous. For a discussion, see

https://www3.nd.edu/~rwilliam/xsoc73...ndependent.pdf

Better yet, if your library provides you with free access, see

https://methods.sagepub.com/foundati...dent-variables
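One way to check whether treating an ordinal predictor as continuous is acceptable, sketched under the assumption of -sysuse auto- with -rep78- as the ordinal predictor: the linear specification is a restricted version of the categorical one (it forces equal spacing between adjacent levels), so the two fits can be compared with a likelihood-ratio test. With five levels the test has 3 degrees of freedom (5 coefficients including the constant versus 2). The stored-estimate names -linear- and -categorical- are arbitrary.

Code:
```stata
sysuse auto, clear

* Model 1: rep78 treated as continuous (a single slope)
quietly regress price rep78
estimates store linear

* Model 2: rep78 treated as categorical (one shift per level)
quietly regress price i.rep78
estimates store categorical

* The linear model is nested in the categorical one; a small test
* statistic suggests the single slope is not clearly worse
lrtest linear categorical
```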

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 18.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
April Kimm

#8

18 Mar 2022, 11:38

Thank you all. I will check them out.