"Replace if" with strings containing slashes

Jonathan Seiden

Join Date: Aug 2017

Posts: 28
#1

12 Nov 2018, 11:55

Hello,

I have a weird little problem in Stata 14 that I have been able to work around, but I'm confused as to why it's not working in the way I expect it to.

I have a dataset that has labels rather than numeric values for responses. I am trying to replace them with values and then destring the variables.

As shown below, my "replace if" command matches the value as reported in the codebook exactly, but then makes no changes.

I think it must have something to do with the slash in the label. Below is a reproducible example:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str25 human1
"1"
"Refused/ Skipped"
"1"
"Incorrect/ Does not know"
"Refused/ Skipped"
end

After inputting the data, run the following command to illustrate the problem:

Code:

codebook human1
replace human1 = "0" if human1 == "Incorrect/ Does not know"

Any ideas? Thanks in advance!
Nick Cox

Join Date: Mar 2014

Posts: 35436
#2

12 Nov 2018, 12:02

Your input code runs for me but yields this:

Code:

     +------------+
     |     human1 |
     |------------|
  1. |          1 |
  2. |   Refused/ |
  3. |    Skipped |
  4. |          1 |
  5. | Incorrect/ |
  6. |       Does |
  7. |   Refused/ |
  8. |    Skipped |
     +------------+

The implication to me is that the character after the slash is not a space at all, which is why your replace command failed.

Try this

Code:

replace human1 = "0" if substr(human1, 1, 9) == "Incorrect"
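
One way to check what the character after the slash actually is: a minimal sketch, assuming Stata 14 or later, where tobytes() is available. The stray character is simulated here with char(13), which is only an assumption for illustration and not a diagnosis of the real data. The names bytes_human1 and has_odd are made up for this example.

```stata
* Simulate a value whose "space" after the slash is really a carriage return
* (char(13) is an assumption for illustration only)
clear
set obs 2
generate str25 human1 = "Incorrect/ Does not know" in 1
replace human1 = "Incorrect/" + char(13) + "Does not know" in 2

* tobytes() writes each byte as an escaped decimal code; a true space is \d032
generate bytes_human1 = tobytes(human1)
list human1 bytes_human1, noobs

* Flag observations containing any byte outside printable ASCII (32 to 126)
generate byte has_odd = regexm(human1, "[^ -~]")
list human1 has_odd, noobs
```

Comparing the two rows of bytes_human1 shows which code sits where the space was expected.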

Incidentally, replacing informative text with 0 seems to me a step backwards.
Jonathan Seiden

Join Date: Aug 2017

Posts: 28
#3

12 Nov 2018, 12:25

Hi Nick,

Thanks for the quick response. Yes, I did something similar as the workaround (I used regexm rather than substr, but the result was the same).

It seems like Stata is adding a "carriage return" which is not included in the "replace if" command. I imagine that there is a way to use my original syntax with a character code, but it's not a big deal.
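
A sketch of what the original syntax with a character code might look like, assuming the hidden character really is a carriage return (char(13)). That is an assumption; it could just as well be a line feed (char(10)) or something else, and the toy data below are built only to illustrate the idea.

```stata
* Rebuild one value with a carriage return after the slash
* (assumption: the real character may be a different code)
clear
set obs 1
generate str25 human1 = "Incorrect/" + char(13) + "Does not know"

* The literal with an ordinary space does not match ...
count if human1 == "Incorrect/ Does not know"

* ... but spelling out the character code does
replace human1 = "0" if human1 == "Incorrect/" + char(13) + "Does not know"
```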

As for your other comment, I am not discarding this information. I am just recoding my variables as numeric so that I can manipulate them more easily (for these binaries, 1 corresponds to correct/yes and 0 to incorrect/no), and I'm immediately re-labelling them.

Thanks again.
Nick Cox

Join Date: Mar 2014

Posts: 35436
#4

12 Nov 2018, 12:32

I doubt that Stata is inserting carriage returns unilaterally. I would focus on what was imported and how.

If you replace "Incorrect/" etc with "0" where is the informative text going? At best you have to put it back at some point.

I'd use encode for your positive purpose. Very likely your reasons for non-response to a question (which these appear to be) should be coded to flavours of extended missing values.
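
A minimal sketch of both suggestions, assuming the strings contain ordinary spaces (unlike the imported data in this thread) and that "1" means correct. The names human1_enc, human1_num and correct_lbl, and the choice of .r for refusals, are made up for illustration.

```stata
clear
input str25 human1
"1"
"Refused/ Skipped"
"1"
"Incorrect/ Does not know"
"Refused/ Skipped"
end

* Option 1: encode keeps the text as value labels on a new numeric variable
encode human1, generate(human1_enc)

* Option 2: a 0/1 variable with refusals as an extended missing value (.r)
generate byte human1_num = .
replace human1_num = 1  if human1 == "1"
replace human1_num = 0  if substr(human1, 1, 9) == "Incorrect"
replace human1_num = .r if substr(human1, 1, 7) == "Refused"

* Value labels can be attached to extended missing values too
label define correct_lbl 0 "Incorrect/ Does not know" 1 "Correct" .r "Refused/ Skipped"
label values human1_num correct_lbl
tabulate human1_num, missing
```

With the second option the refusals drop out of means and regressions automatically, while the label still records why the value is missing.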
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#5

13 Nov 2018, 06:10

Hi Jonathan,

I am not sure what your final goal is here, but what I would want to do with data like yours is expand the qualitative values into a set of dummies that capture the same information, with the advantage that they can be used in regression analysis.

The easiest way to do this (if what I have in mind is what you want to do) is

Code:

tab human1, gen(response)

This will generate one dummy per distinct value of human1, called response1, response2, and so on (four if the variable has four categories), nicely and appropriately labelled, capturing the same information as your original variable but in numerical form ready for inclusion in regression analysis.

David Benson

Join Date: Oct 2018
Posts: 489

13 Nov 2018, 16:31

Jonathan, you may not need to use this, but I pull in a lot of data from websites and PDF files, and carriage returns, line feeds, and other non-printing ASCII characters are the bane of my existence. If you install charlist (SSC), it will tell you what ASCII codes are in your data. (Line Feed==10, Carriage Return==13).

You can look up a list of ASCII codes, but any code <=31 is a non-printing code. Codes >=127 I would check.

Code:

* Seeing which codes are in the variable
ssc install charlist
charlist human1 // this will show them as ABCDEFGHIJKLMNOPQRSTUVWXYZabcd, etc
display r(ascii) // this will list their ASCII codes

* Say it returns a carriage return (char==13)
replace human1 = subinstr( human1, char(13), " ",.)

* Say it returns a number of codes after charlist (this is a sample from a project of mine)
foreach var of varlist notes exit_notes industry_comments company_desc {
    foreach i in 146 147 148 150 153 160 174 {
        replace `var' = subinstr(`var', char(`i'), " ", .)
    }
}

* If you want to run something for ALL string variables
ds, has(type string) alpha
foreach var of varlist `r(varlist)' {

*** DO STUFF HERE
}

Hope that helps!
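
As an illustration of what could go in the "DO STUFF HERE" loop, here is a small sketch that replaces line feeds and carriage returns with a space in every string variable. The toy variables a and b are made up, and the choice of codes 10 and 13 is an assumption; use whatever charlist reports for your own data.

```stata
* Toy data: two string variables with a carriage return and a line feed
clear
set obs 1
generate str20 a = "Refused/" + char(13) + "Skipped"
generate str20 b = "Does" + char(10) + "not know"

* Replace line feeds and carriage returns with a space in all string variables
ds, has(type string)
foreach var of varlist `r(varlist)' {
    foreach i in 10 13 {
        replace `var' = subinstr(`var', char(`i'), " ", .)
    }
}
list, noobs
```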
