
  • Dichotomizing a Factor Variable

    Dear All,

    Hi, I have a question about dichotomizing a factor variable (or a variable that contains non-integer values).


    I am currently using -prchange- from spost9 after fitting a logistic regression.
    One of my main independent variables is the average of six related variables (-alpha- value 0.8).
    The six original variables are a set of items that measure participation frequency in six different areas, each on a scale from 1 to 4.
    I could dichotomize any one of the original variables by recoding 1 through 2 to 0 and 3 through 4 to 1.
    Yet once I averaged the six variables, it became hard to dichotomize the result.
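
    For concreteness, the setup described above might look like the following sketch. The names part1-part6, partavg, and part1_bin are hypothetical, since the post does not give the actual variable names.

    Code:
    * hypothetical names: part1-part6 are the six items, each coded 1 to 4
    alpha part1-part6
    * row-wise average of the six items (the averaged predictor)
    egen partavg = rowmean(part1-part6)
    * dichotomize a single item: 1 and 2 become 0, 3 and 4 become 1
    recode part1 (1/2 = 0) (3/4 = 1), generate(part1_bin)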

    As many of you may already know, the -prchange- command does not accept factor variables.

    Therefore, I have to dichotomize my averaged variable, which contains non-integer values.
    I have been searching for a standard or recommended method to do so, but I have not found one.
    Does this mean dichotomizing an averaged variable is against mathematical rules?
    Would it be wrong to dichotomize my average variable by recoding any values above 2 to 1?
    (In other words, what should be the cut point in this case?)

    Any recommendation or advice will be welcome.
    Thank you for reading this post.
    Last edited by Jae Park; 14 Apr 2016, 06:20. Reason: -prchange- "function" to "command"

  • #2
    You can dichotomise anything you like, and that can make sense.

    (speed of car > speed limit)

    keeps important information. Below, you're legal. Above, you're illegal. Watch out for police, but slow down now. If it's somebody else's car, keep out of its way.

    But you could probably collect a hundred articles and books warning that you should not usually do this in statistical analysis, as in most cases it is arbitrary, increases the scope for error and throws away information.
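
    If a binary version of an averaged score is wanted anyway, a minimal sketch in Stata could look like this. The variable name partavg is hypothetical, and the cut point of 2 is simply the one floated in #1, not a recommended threshold. The -if !missing()- qualifier matters because Stata treats missing values as larger than any number, so a bare comparison would code missing scores as 1.

    Code:
    * partavg is a hypothetical name for the averaged score; 2 is the cut point from #1
    generate byte high = (partavg > 2) if !missing(partavg)
    * check the split, including any missing values
    tabulate high, missing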

    I don't know anything much about the prchange command (not function), but if it doesn't apply, use something that does.

    I don't understand enough about your problem to say more than this.



    • #3
      Dear Nick Cox,
      Thank you for your advice. I also believe that making a binary variable out of an averaged variable would risk losing a lot of information. I will consider using my original variables for this command, too.



      • #4
        You are better off upgrading to spost13, which supports factor variables, and using commands like mtable and mchange. Do -findit spost13_ado-. For more see

        http://www.indiana.edu/~jslsoc/web_s..._faqspost9.htm

        For Long & Freese's book, see

        http://www.stata.com/bookstore/regre...ent-variables/
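
        As a rough sketch of what that might look like with an averaged predictor kept continuous, assuming spost13 is installed: the outcome y, the predictor partavg, and the covariate female are hypothetical names, and the range 1 to 4 is assumed from the scale described in #1.

        Code:
        * hypothetical variables: y (binary outcome), partavg (averaged score), female (0/1)
        logit y c.partavg i.female
        * changes in Pr(y=1) for changes in partavg (spost13)
        mchange partavg
        * predicted probabilities at selected values of partavg (spost13)
        mtable, at(partavg=(1(1)4))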
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam



        • #5
          Richard Williams, thank you so much! In fact I had installed spost13, but I was only using commands such as -prchange- (I did not realize spost13 would require different commands). I will try again with the spost13 commands!



          • #6
            Jae, follow Scott Long's advice about what to do if you want both spost9 and spost13 installed. Otherwise you may run into some problems.

            http://www.indiana.edu/~jslsoc/web_s..._faqspost9.htm

            Long and Freese's book is well worth reading:

            http://www.stata.com/bookstore/regre...ent-variables/


            • #7
              Under recent versions of Stata, and depending on what you want, -margins- may well do the trick.
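
              A minimal sketch with -margins-, using the same lbw data as the example below; the age values in -at()- are arbitrary choices for illustration.

              Code:
              use http://www.stata-press.com/data/r14/lbw.dta, clear
              logit low c.age i.smoke i.race
              * average marginal effect of age on Pr(low = 1)
              margins, dydx(age)
              * average predicted probabilities at selected ages
              margins, at(age=(20(5)40))
              * plot the predictions from the last -margins- call
              marginsplot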

              You may also try -listcoef- from the SPost package. I wonder whether the -percent- option would give you what you want. Please see the example below:


              Code:
              . use http://www.stata-press.com/data/r14/lbw.dta
              (Hosmer & Lemeshow data)
              
              . logistic low c.age i.smoke i.race
              
              Logistic regression                             Number of obs     =        189
                                                              LR chi2(4)        =      15.81
                                                              Prob > chi2       =     0.0033
              Log likelihood =  -109.4311                     Pseudo R2         =     0.0674
              
              ------------------------------------------------------------------------------
                       low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                       age |   .9657186   .0322573    -1.04   0.296     .9045206    1.031057
                           |
                     smoke |
                   smoker  |    3.00582   1.118001     2.96   0.003     1.449982    6.231081
                           |
                      race |
                    black  |   2.749483   1.356659     2.05   0.040     1.045318    7.231924
                    other  |   2.876948   1.167921     2.60   0.009     1.298314    6.375062
                           |
                     _cons |    .365111   .3146026    -1.17   0.242     .0674491    1.976395
              ------------------------------------------------------------------------------
              
              . listcoef, help percent
              
              logit (N=189): Percentage change in odds
              
                Odds of: 1 vs 0
              
              ------------------------------------------------------------------------
                           |         b        z    P>|z|         %     %StdX     SDofX
              -------------+----------------------------------------------------------
                       age |   -0.0349   -1.044    0.296      -3.4     -16.9     5.299
                           |
                     smoke |
                   smoker  |    1.1006    2.959    0.003     200.6      71.4     0.489
                           |
                      race |
                    black  |    1.0114    2.050    0.040     174.9      41.8     0.345
                    other  |    1.0567    2.603    0.009     187.7      66.0     0.480
                           |
                  constant |   -1.0076   -1.169    0.242         .         .         .
              ------------------------------------------------------------------------
                     b = raw coefficient
                     z = z-score for test of b=0
                 P>|z| = p-value for z-test
                     % = percent change in odds for unit increase in X
                 %StdX = percent change in odds for SD increase in X
                 SDofX = standard deviation of X
              Best,

              Marcos
              Last edited by Marcos Almeida; 14 Apr 2016, 07:05.



              • #8
                Thank you so much for your helpful advice. Richard Williams, fortunately I have access to Long and Freese's book, which I will start reading right away. I will also try -margins- and the other two options you suggested, Marcos Almeida. I appreciate your help.



                • #9
                  Margins is great. But the spost13 commands are basically shells for margins and make things easier, especially with multiple outcome commands like mlogit and ologit. I also like the mcp command if you want graphs involving continuous variables after single-outcome commands like logit. For examples, see the appendices of

                  http://www3.nd.edu/~rwilliam/xsoc73994/Ologit01.pdf

                  For a discussion of mcp, see

                  http://www3.nd.edu/~rwilliam/xsoc73994/Margins03.pdf