  • "Replace if" with strings containing slashes

    Hello,

    I have a weird little problem in Stata 14 that I have been able to work around, but I'm confused as to why it's not working in the way I expect it to.

    I have a dataset that has labels rather than numeric values for responses. I am trying to replace them with values and then destring the variables.

    As shown below, my "replace if" command matches the value as reported in the codebook exactly, but then makes no changes.
    [Image: Capture.PNG]

    I think it must have something to do with the slash in the label. Below is a reproducible example:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str25 human1
    "1"                       
    "Refused/
    Skipped"        
    "1"                       
    "Incorrect/
    Does not know"
    "Refused/
    Skipped"          
    end
    After inputting the data, run the following commands to illustrate the problem:

    Code:
    codebook human1
    replace human1 = "0" if human1 == "Incorrect/ Does not know"
    Any ideas? Thanks in advance!

  • #2
    Your input code runs for me but yields this:

    Code:
         +------------+
         |     human1 |
         |------------|
      1. |          1 |
      2. |   Refused/ |
      3. |    Skipped |
      4. |          1 |
      5. | Incorrect/ |
      6. |       Does |
      7. |   Refused/ |
      8. |    Skipped |
         +------------+
    The implication to me is that the character after the slash is not a space at all, which is why your replace command failed.

    Try this

    Code:
    replace human1 = "0" if substr(human1, 1, 9) == "Incorrect"
    Incidentally, replacing informative text with 0 seems to me a step backwards.



    • #3
      Hi Nick,

      Thanks for the quick response. Yes, I did something similar as the workaround (I used regexm rather than substr, but the result was the same).

      It seems like Stata is adding a "carriage return" which is not included in the "replace if" command. I imagine that there is a way to use my original syntax with a character code, but it's not a big deal.
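      A minimal sketch of that character-code approach, assuming the break after the slash is a line feed, char(10). If the data actually hold a carriage return, char(13) would be the code to use instead. The small test dataset here is made up for illustration.

      ```stata
      * Assumption: the character after the slash is a line feed, char(10).
      * Build a small illustrative dataset with an embedded line feed.
      clear
      set obs 3
      generate str25 human1 = "1"
      replace human1 = "Incorrect/" + char(10) + "Does not know" in 2
      replace human1 = "Refused/" + char(10) + "Skipped" in 3

      * Match the exact stored value, control character included
      replace human1 = "0" if human1 == "Incorrect/" + char(10) + "Does not know"
      list human1
      ```

      The comparison works because the right-hand side now contains the same byte as the stored string, rather than a space.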

      As for your other comment, I am not discarding this information. I am just recoding my variables as numeric so that I can manipulate them more easily (for these binaries, 1 corresponds to correct/yes and 0 corresponds to incorrect/no), and I'm immediately re-labelling them.

      Thanks again.


      • #4
        I doubt that Stata is inserting carriage returns unilaterally. I would focus on what was imported and how.

        If you replace "Incorrect/" etc. with "0", where is the informative text going? At best you have to put it back at some point.

        I'd use encode for your positive purpose. Very likely your reasons for non-response to a question (which these appear to be) should be coded as flavours of extended missing values.
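        A rough sketch of the extended-missing-value idea, using explicit replace statements rather than encode so that each mapping is visible. The variable name human1_num, the label name human1_lbl, the cleaned one-line strings, and the choice of .a for refusals are all assumptions made for illustration.

        ```stata
        * Illustrative data: assumes the line breaks have already been removed
        clear
        input str25 human1
        "1"
        "Incorrect/Does not know"
        "Refused/Skipped"
        end

        * Numeric copy: 1 = correct, 0 = incorrect, .a = refused/skipped
        generate byte human1_num = .
        replace human1_num = 1  if human1 == "1"
        replace human1_num = 0  if strpos(human1, "Incorrect") == 1
        replace human1_num = .a if strpos(human1, "Refused") == 1

        * Keep the informative text as value labels
        label define human1_lbl 0 "Incorrect/Does not know" 1 "Correct" .a "Refused/Skipped"
        label values human1_num human1_lbl
        tabulate human1_num, missing
        ```

        Coded this way, refusals drop out of means and regressions automatically while remaining distinguishable from ordinary missing values.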


        • #5
          Hi Jonathan,

          I am not sure what your final goal is here, but with data like yours I would expand the qualitative values into a set of dummies that capture the same information, with the advantage that they can be used in regression analysis.

          The easiest way to do this (if what I have in mind is what you want to do) is

          Code:
          tab human1, gen(response)
          This will generate four dummies called response1, response2, response3, and response4 (nicely and appropriately labelled), capturing the same information as your original variable but in numerical form, ready for inclusion in regression analysis.




          • #6
            Jonathan, you may not need to use this, but I pull in a lot of data from websites and PDF files, and carriage returns, line feeds, and other non-printing ASCII characters are the bane of my existence. If you install charlist (SSC), it will tell you which ASCII codes are in your data (line feed = 10, carriage return = 13).

            You can see a list of ASCII codes here (any code <=31 is a non-printing code; codes >=127 I would check).

            Code:
            * Seeing which codes are in the variable
            ssc install charlist
            charlist human1 // this will show them as ABCDEFGHIJKLMNOPQRSTUVWXYZabcd, etc
            display r(ascii) // this will list their ASCII codes
            
            * Say it returns a carriage return (char==13)
            replace human1 = subinstr( human1, char(13), " ",.)
            
            * Say it returns a number of codes after charlist (this is a sample from a project of mine)
            foreach var of varlist notes exit_notes industry_comments company_desc {
                foreach i in 146 147 148 150 153 160 174 {
                    replace `var' = subinstr(`var', char(`i'), " ", .)
                }
            }
            
            * If you want to run something for ALL string variables
            ds, has(type string) alpha
            foreach var of varlist `r(varlist)' {
            
            *** DO STUFF HERE
            }
            Hope that helps!
