Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Dropping obs with string variables of value "0" / converting string to numeric when string contains "X"

    Hello,

    I am trying to drop all observations in my dataset that have the value "0" in variable "occsoc". Unfortunately, the command

    drop if occsoc=0

    is returning 'type mismatch'.

    This may be because the variable has leading blanks. So to play it safe I created an identical variable 'occsoc_num' and removed the leading blanks with strtrim(occsoc_num). But the command

    drop if occsoc_num=0

    is still returning 'type mismatch'.

    Please see my screenshot below (I know we are not supposed to post screenshots in the forum, but I wasn't able to make dataex work for this example):

    Click image for larger version

Name:	Screen Shot 2019-03-15 at 12.42.09.png
Views:	1
Size:	53.5 KB
ID:	1488277

    I think the 'type mismatch' error may have something to do with the fact that occsoc (and therefore occsoc_num) are string variables. However, these variables contain 'X' in addition to numbers (see data extract below). Can I destring a variable that includes the character X? Numerically, all X's could be replaced with a 0.


    See example below:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 occsoc
    "     0"
    "     0"
    "436010"
    "173020"
    "     0"
    "     0"
    "     0"
    "436010"
    "     0"
    "471011"
    "395012"
    "311010"
    "319091"
    "252020"
    "513020"
    "412010"
    "     0"
    "     0"
    "     0"
    "434051"
    "     0"
    "     0"
    "     0"
    "491011"
    "212011"
    "     0"
    "537064"
    "     0"
    "372012"
    "499071"
    "     0"
    "     0"
    "     0"
    "373010"
    "514XXX"
    "436010"
    "     0"
    "113031"
    "232011"
    "399011"
    "434181"
    "351012"
    "372012"
    "     0"
    "332011"
    "252020"
    "     0"
    "     0"
    "37201X"
    "37201X"
    "     0"
    "     0"
    "472111"
    "436010"
    "37201X"
    "412010"
    "212011"
    "352010"
    "     0"
    "     0"
    "372012"
    "435081"
    "     0"
    "     0"
    "     0"
    "292021"
    "519196"
    "411011"
    "     0"
    "131070"
    "151122"
    "     0"
    "     0"
    "     0"
    "     0"
    "516063"
    "311010"
    "     0"
    "352010"
    "     0"
    "471011"
    "     0"
    "471011"
    "273043"
    "     0"
    "     0"
    "412010"
    "411011"
    "     0"
    "434051"
    "291141"
    "151150"
    "291060"
    "253000"
    "     0"
    "     0"
    "     0"
    "111021"
    "395012"
    "     0"
    end

    Best of thanks for your consideration.

  • #2
    Using the trimming was the correct thing to do.
    Your variable mismatch error is because Stata expects quotation marks around stirng values.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 occsoc
    "252020"
    "     0"
    "     0"
    "37201X"
    "291060"
    "253000"
    "     0"
    "     0"
    "     0"
    "111021"
    "395012"
    "     0"
    end
    gen var2 = strtrim(occsoc)
    drop if occsoc == "0"
    drop if var2 == "0"
    Note further that your trimmed variable is not numeric just yet. Your variable naming seems to suggest you'd hope for that. Also consider just setting these values to missing rather than drop, as the rest of the data might prove useful.

    Code:
    replace var2="" if var2=="0"

    Comment


    • #3
      Thank you Jorrit, the quotation marks worked!

      And you are right, my trimmed variable is not numeric, although I would like it to be. However, it contains an X in the string. Do you know how I could replace all X's in the string with zeros, in order to then convert it to a numeric variable?

      Comment


      • #4
        starting from your origial variable:
        Code:
        gen occsoc_noX = subinstr(occsoc, "X", "0", .)
        destring occsoc_noX, gen(occsoc_num)
        Possibly followed by one of:
        Code:
        replace occsoc_num=. if occsoc_num==0
        drop if occsoc_num==0
        to get rid of the observations with 0 values

        Comment


        • #5
          Thanks Jorrit. Unfortunately, Stata is telling me that "occsoc_noX: contains nonnumeric characters; no generate". I thought it might have something to do with the fact that occsoc contains leading blanks, but the problem persists even if I use your code after removing leading blanks. See screenshot below:

          Click image for larger version

Name:	Screen Shot 2019-03-20 at 10.43.20.png
Views:	1
Size:	35.6 KB
ID:	1489093


          Do you have any further suggestions?

          Best wishes and many thanks,
          Rosa


          ps: I know it's bad practice to post screenshots, but I can't get dataex to work properly. I've posted a separate question about that here:
          https://www.statalist.org/forums/for...e-to-statalist

          Comment


          • #6
            You'll need to figure out what other non-numeric characters you have in your occsoc variable.
            Figure out which are the values creating the issue with something like below. Probably best to start after youve already removed all the X's you know to occur in your occsoc variable.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str6 occsoc
            "252020"
            "     0"
            "     0"
            "37201X"
            "291060"
            "253000"
            "     0"
            "     0"
            "   n/a"
            "111021"
            "395012"
            "     0"
            end
            destring occsoc, gen(occsoc_num) force
            list occsoc if occsoc_num==.
            Code:
            . list occsoc if occsoc_num==.
            
                 +--------+
                 | occsoc |
                 |--------|
              4. | 37201X |
              9. |    n/a |
                 +--------+
            With the dataex issue: dataex is for creating example datasets for posting here.
            The output you have above, you can just copy paste the text from the output window, and use code blocks around them to create the proper layout. Do so by putting the text in between the code blocks that appear when you click the # button in the formatting bar.:
            Click image for larger version

Name:	Untitled.png
Views:	1
Size:	5.5 KB
ID:	1489100
            Last edited by Jorrit Gosens; 20 Mar 2019, 04:12.

            Comment


            • #7
              It worked!! There was an errant "Y", but now occsoc_num is finally numeric, without Xs or Ys. Thanks a lot.

              Comment

              Working...
              X