  • factor variables may not contain non integer values

    I know somebody has asked this question before, but I found the answers unhelpful. It seems that Stata automatically treats some variables as factor variables, but I do not want those variables to be treated as "factor variables". How can I change the status of such a variable to make it just a normal numeric variable? Thanks

  • #2
    Shiping:
    you may try something along the lines of the following example:
    Code:
    . use auto.dta
    (1978 Automobile Data)
    
    . regress price foreign
    
    
    . regress price i.rep78
    
          Source |       SS       df       MS              Number of obs =      69
    -------------+------------------------------           F(  4,    64) =    0.24
           Model |  8360542.63     4  2090135.66           Prob > F      =  0.9174
        Residual |   568436416    64     8881819           R-squared     =  0.0145
    -------------+------------------------------           Adj R-squared = -0.0471
           Total |   576796959    68  8482308.22           Root MSE      =  2980.2
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           rep78 |
              2  |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
              3  |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
              4  |       1507   2221.338     0.68   0.500    -2930.633    5944.633
              5  |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
                 |
           _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
    ------------------------------------------------------------------------------
    
    . regress price c.rep78
    
          Source |       SS       df       MS              Number of obs =      69
    -------------+------------------------------           F(  1,    67) =    0.00
           Model |  24770.7652     1  24770.7652           Prob > F      =  0.9574
        Residual |   576772188    67  8608540.12           R-squared     =  0.0000
    -------------+------------------------------           Adj R-squared = -0.0149
           Total |   576796959    68  8482308.22           Root MSE      =    2934
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           rep78 |   19.28012   359.4221     0.05   0.957    -698.1295    736.6897
           _cons |   6080.379    1274.06     4.77   0.000     3537.345    8623.413
    ------------------------------------------------------------------------------
    However, whether the above is meaningful depends on which values your variable takes on.
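    As a rough check of which values a variable takes on, here is a minimal sketch on the same auto data. Factor-variable notation (i.) requires nonnegative integer values. The use of gear_ratio as a counterexample is only an illustration and is not part of the original question.
    Code:
    sysuse auto, clear

    * List the distinct values of rep78, including missing
    tabulate rep78, missing

    * Passes silently if every nonmissing value is an integer
    assert rep78 == floor(rep78) if !missing(rep78)

    * gear_ratio has noninteger values, so this assertion is expected to fail;
    * capture noisily shows the message without stopping a do-file
    capture noisily assert gear_ratio == floor(gear_ratio) if !missing(gear_ratio)
    A variable that fails this check can still enter the model as continuous with the c. prefix.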
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      It worked! (I am using a different dataset, but I see the logic). Thanks a lot, Carlo. But is there a way to unmake a "factor variable" (into a regular numeric variable)?



      • #4
        Hi, Carlo. It worked for the regression with interaction terms, but it does not work with margins. Here is what I have:


        . reg cpi c.elf##c.polright, r

        Linear regression                                      Number of obs =      86
                                                               F(  3,    82) =   37.95
                                                               Prob > F      =  0.0000
                                                               R-squared     =  0.4456
                                                               Root MSE      =   1.808

        ----------------------------------------------------------------------------------
                         |               Robust
                     cpi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -----------------+----------------------------------------------------------------
                     elf |   3.073329   1.328065     2.31   0.023     .4313854    5.715273
                polright |   1.006441   .1797779     5.60   0.000     .6488049    1.364076
                         |
        c.elf#c.polright |   -.590187   .2873289    -2.05   0.043    -1.161776   -.0185983
                         |
                   _cons |   2.105206   .5455919     3.86   0.000      1.01985    3.190562
        ----------------------------------------------------------------------------------

        . margins c.elf#c.polright
        only factor variables and their interactions are allowed
        r(198);

        . margins elf#polright
        elf: factor variables may not contain noninteger values
        r(452);

        Did I still do something wrong?



        • #5
          margins will not let you enter continuous predictors in the marginlist, as this would mean you want a margin for each level of a continuous predictor, which hardly makes sense. Chances are you want something along the lines of

          Code:
          margins , dydx(elf polright)
          or perhaps

          Code:
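          summarize elf
          local min = r(min)
          local max = r(max)
          local mean = r(mean)
          * polright is continuous in this model, so request its slope with dydx()
          * rather than listing it in the marginlist
          margins , dydx(polright) at(elf = (`min' `mean' `max'))
          marginsplot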
          although in a linear model, you can see the marginal effect by just looking at the coefficients.
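          To spell out what reading the marginal effect off the coefficients means when there is a product term: in a model fit as c.x##c.z, the slope of x is the coefficient on x plus the interaction coefficient times z. With the estimates reported in #4, the slope of elf is 3.073329 - .590187 × polright. A minimal sketch of that arithmetic follows, using the auto data instead of the poster's dataset (mpg and weight are stand-ins chosen for illustration only):
          Code:
          sysuse auto, clear
          regress price c.mpg##c.weight, vce(robust)

          * Pick a value of weight at which to evaluate the slope of mpg
          summarize weight
          local m = r(mean)

          * Slope of mpg at that weight, computed by hand from the coefficients
          display _b[mpg] + _b[c.mpg#c.weight]*`m'

          * The same slope from margins, which also reports a standard error
          margins , dydx(mpg) at(weight = `m')
          The two numbers should agree up to display precision; margins adds the standard error and confidence interval.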

          Also note there is no such thing as a marginal effect for the interaction term (i.e. the product term) itself; see this post by Vince Wiggins, but that might be another story.

          Best
          Daniel



          • #6
            That works! Thanks Dan.



            • #7
              To be clear, if you do

              reg y x1 x2

              both x1 and x2 are treated as continuous variables. But if you do

              reg y x1 x2 x1#x2

              then in the interaction they will be treated as categorical variables. Hence, when specifying interactions with continuous variables, specify

              reg y x1 x2 c.x1#c.x2

              If I ruled the world the default would always be continuous and you would specify i. as needed. As it is, Stata is basically using different defaults for interaction terms (assume vars are categorical unless specified otherwise) and non-interactions (assume continuous unless specified otherwise).

              If you want to be super-safe you can always use c. and i., even when you are replicating the default behavior. Doing so forces you to always be clear in your own mind whether a variable is categorical or continuous.
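              The different defaults can be seen in a short sketch on the auto data. The variable choices are mine, for illustration only: gear_ratio takes noninteger values, so leaving off c. in an interaction should reproduce the error in the thread title.
              Code:
              sysuse auto, clear

              * No operators, no interaction: both predictors are treated as continuous
              regress price mpg gear_ratio

              * Interaction without operators: both variables are treated as categorical,
              * which is expected to fail because gear_ratio is not integer-valued
              capture noisily regress price mpg gear_ratio mpg#gear_ratio

              * c. on both variables: a single product term
              regress price mpg gear_ratio c.mpg#c.gear_ratio

              * Shorthand: ## expands to the main effects plus the interaction
              regress price c.mpg##c.gear_ratio
              The last two commands fit the same model. Note that the c. and i. operators apply to a variable only within the command where they are typed; nothing is stored in the dataset.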
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 18.5 MP (2 processor)

              WWW: https://www3.nd.edu/~rwilliam



              • #8
                Thanks so much, Richard. Yes, I am also reading the earlier post you guys had.
