  • "Factor variables may not include negative values" for categorical variable

    I am very new to Stata and not really sure what I'm doing; I'm just trying to get some regression output for a university assignment.

    The data I am using can be found here https://www.datafirst.uct.ac.za/data...hp/catalog/570
    "Get microdata" will prompt the download.

    The variables I would like to use are named: "w4_a_em1pay" as the dependent variable (monthly income) and "w4_a_em1prod_c" as the independent (economic sector).

    As far as I know, w4_a_em1prod_c is a categorical variable with no numeric observations. There are 26,000+ observations, so I can't check them all by hand.

    I am using the following input:

    regress w4_a_em1pay i.w4_a_em1prod_c

    And getting the following output:
    w4_a_em1prod_c: factor variables may not contain negative values

    Is there a fix for this? Can I search the variable for negative values and remove them, or somehow ignore the observations with negative values?
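A minimal sketch of how negative codes could be found without checking 26,000 observations by eye. It runs on a small made-up dataset: the variable name sector, the code -8 and the labels are hypothetical stand-ins for w4_a_em1prod_c and its real codes.

```stata
* Hypothetical toy data standing in for the survey file;
* -8 plays the role of a negative code such as "Missing"
clear
input sector
-8
1
2
2
3
.
end
label define sectorlbl -8 "Missing" 1 "Agriculture" 2 "Mining" 3 "Manufacturing"
label values sector sectorlbl

* How many observations carry a negative code?
* System missing (.) is not counted: Stata treats it as larger than any number
count if sector < 0

* Which negative codes occur, shown as numbers rather than labels
tabulate sector if sector < 0, nolabel

* The minimum is a quick check for anything below zero
summarize sector
```

Replacing sector with w4_a_em1prod_c in the last three commands would run the same checks on the real data.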




  • #2
    . table w4_a_em1prod_c

    -------------------------------------------------------------------------------------
                                                                             |  Frequency
    -------------------------------------------------------------------------+-----------
    eb8 - Sector code for primary occupation                                 |
      Missing                                                                |         71
      Private households                                                     |        570
      Agriculture, hunting, forestry and fishing                             |        639
      Mining and Quarrying                                                   |        213
      Manufacturing                                                          |        685
      Electricity, gas and water supply                                      |         66
      Construction                                                           |        423
      Wholesale and Retail trade; repair etc; hotels and restaurants         |      1,104
      Transport, storage and communication                                   |        286
      Financial intermediation, insurance, real estate and business services |        593
      Community, social and personal services                                |      1,784
    Total                                                                    |      6,434
    -------------------------------------------------------------------------------------




    • #3
      Welcome to Statalist.

      Instead of the command used in #2, try this:
      Code:
      table w4_a_em1prod_c, nolab
      If there are any negative numeric codes, they should show up there. Once you have identified them, you can create a copy of the variable and assign each negative code a different, unused value. For example, this code shows how to replace -8 with 5:
      Code:
      gen w4_2 = w4_a_em1prod_c
      replace w4_2 = 5 if w4_2 == -8
      The reason for creating a copy is to avoid overwriting the original information. That way, if a mistake is made in the recoding, you'll always have the original w4_a_em1prod_c to go back to.
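One caveat about the copy: generate copies only the values, so w4_2 starts without the value labels. A sketch of a variation using clonevar, which also carries over the value label, on made-up data. The variable names, the -8 code and the replacement code 99 are hypothetical; in the real data a small code such as 5 may already belong to a sector, so it is worth checking tab, nolabel for a free value first.

```stata
* Hypothetical toy data: -8 stands in for a negative code found with tab, nolabel
clear
input sector
-8
1
2
3
end
label define sectorlbl -8 "Missing" 1 "Agriculture" 2 "Mining" 3 "Manufacturing"
label values sector sectorlbl

* clonevar copies storage type, variable label and value label;
* generate would copy only the values
clonevar sector2 = sector

* Move -8 to an unused non-negative code (99 is an arbitrary choice) and label it
replace sector2 = 99 if sector2 == -8
label define sectorlbl 99 "Missing (was -8)", add

* Cross-check the recoded copy against the untouched original
tabulate sector sector2, nolabel
```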

      And if you are absolutely sure that the negative values can be excluded, you can leave those cases out of the regression with:
      Code:
      regress w4_a_em1pay i.w4_a_em1prod_c if w4_a_em1prod_c >= 0 & w4_a_em1prod_c < .
      Last edited by Ken Chui; 30 Sep 2021, 13:11.
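To see that the if-qualifier above does what it should, here is a sketch on a small invented dataset (pay, sector and all values are hypothetical). Without the qualifier Stata refuses the factor variable; with it, the row coded -8 and the missing row drop out of the estimation sample.

```stata
* Hypothetical toy data: -8 is a negative sector code, the last row is missing
clear
input pay sector
1500 -8
1200  1
1300  1
2500  2
2700  2
3100  3
3300  3
.     .
end

* Without a restriction, i.sector is rejected because of the -8 code
capture noisily regress pay i.sector
display "return code: " _rc

* Keep non-negative, non-missing codes only
regress pay i.sector if sector >= 0 & sector < .

* Observations actually used (the -8 row and the missing row are left out)
display e(N)
```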



      • #4
        Hi Ken, thank you for the fast response.
        I tried the command table w4_a_em1prod_c, nolab and got the following output:

        . table w4_a_em1prod_c, nolab
        option nolab not allowed
        r(198);

        The rest is very helpful; I just need to figure out how to identify the relevant values.



        • #5
          Please see #4 in https://www.statalist.org/forums/help#adviceextras



          • #6
            Try tabulate rather than table.

            (I guess you’re using Stata 17 and Ken is using or presuming an earlier version.)



            • #7
              Oh, sorry. As Nick said in #6, I was thinking of "tab" (tabulate) when I copied and pasted your "table", which is a different command. If you replace "table" with "tab" or "tabulate", you should see the numeric codes instead of the labels.

              Code:
              tab w4_a_em1prod_c, nolab



              • #8
                Thank you, tab worked.



                • #9
                  I’m a big fan of the user-written -fre-, which shows numeric values and value labels side by side. It is available from SSC.
                  -------------------------------------------
                  Richard Williams, Notre Dame Dept of Sociology
                  StataNow Version: 19.5 MP (2 processor)

                  WWW: https://www3.nd.edu/~rwilliam
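With internet access, the -fre- route is ssc install fre followed by fre w4_a_em1prod_c. If installing from SSC is not an option, built-in commands can give a similar view. A sketch on made-up data; the variable name sector and the label name sectorlbl are hypothetical:

```stata
* Hypothetical toy data with a value label that includes a negative code
clear
input sector
-8
1
2
end
label define sectorlbl -8 "Missing" 1 "Agriculture" 2 "Mining"
label values sector sectorlbl

* Route 1: look up the value label attached to sector and list each code with its label
local lbl : value label sector
label list `lbl'

* Route 2: temporarily put each numeric code in front of its label,
* tabulate, then strip the prefixes again
numlabel `lbl', add
tabulate sector
numlabel `lbl', remove
```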



                  • #10
                    We're moving pretty slowly here. Once you see the results, you should be able to decide what to do about the negative categories. It could be as simple as there being just one negative category, which is for missing values. I would change that negative value to a Stata missing value such as system missing (.). See

                    Code:
                    help mvdecode
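A sketch of the mvdecode route, assuming that the only negative code is a missing-value code (the case described above). The data, the variable name sector and the code -8 are made up:

```stata
* Hypothetical toy data: -8 is the only negative code and means "missing"
clear
input pay sector
1200 -8
1300  1
2500  2
3100  3
end

* Recode -8 to the extended missing value .a; mv(-8) alone would give plain .
mvdecode sector, mv(-8=.a)

* No negative values remain, so i.sector is now acceptable as a factor variable
tabulate sector, missing
```

Using .a rather than . keeps a record that the value was a coded missing, while Stata still treats it as missing in estimation.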
                    If negative codes for real categories remain after that, you can get a variable coded by positive integers by pushing w4_a_em1prod_c through egen group().

                    Here's a dopey example. Suppose badcat arrived with values -2 -1 0 1 2. You could just add 3 and then everything's positive. But a better idea is to go

                    Code:
                    egen goodcat = group(badcat), label
                    and then you get the best of both worlds, as value labels will get carried across.
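The badcat example can be run as it stands once badcat exists. A sketch that builds it; the value labels are invented for illustration, since only the codes -2 to 2 are given:

```stata
* Toy version of badcat, coded -2 -1 0 1 2 (the labels are hypothetical)
clear
input badcat
-2
-1
0
1
2
end
label define badlbl -2 "very low" -1 "low" 0 "middle" 1 "high" 2 "very high"
label values badcat badlbl

* group() numbers the distinct values 1, 2, 3, ... in sorted order;
* the label option copies badcat's value labels onto the new codes
egen goodcat = group(badcat), label

* Side by side: old codes and new codes, then the labels that came across
list badcat goodcat, nolabel
tabulate goodcat
```

Because group() numbers the sorted distinct values, the new codes here happen to be the old ones plus 3, but unlike adding 3 by hand, the labels travel with them.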
