  • character encoding problem with string variable value - cannot get STATA to recognize a string

    Hi everyone, my programmer collaborator and I have been banging our heads on this one, can you help? It's for our medical survey.

    I have a string variable, QID88_7_TEXT. Each observation has an id number, which I'll use for clarity; the observation in question has id 1747.
    We cannot change any values in the original data, so we can't replace this value. We can only create a new dataset from do-files.
    In the Stata Data Browser, the value of QID88_7_TEXT for id 1747 is this:


    Code:
     I’ve stopped asking. And they don’t tell me voluntarily. I give so few fox, that I honestly can’t answer this question with confidence. In my mind the most accurate is PTSD, but I’m sure depression and bipolar are on the chart somewhere.
    When I run this code:

    Code:
    replace dx_recode = 24 if QID88_7_TEXT == "I’ve stopped asking. And they don’t tell me voluntarily. I give so few fox, that I honestly can’t answer this question with confidence. In my mind the most accurate is PTSD, but I’m sure depression and bipolar are on the chart somewhere."
    STATA returns this, indicating it is not finding a == match:

    Code:
    (0 real changes made)
    Looking at the string, I try it with a leading space, like this:

    Code:
    replace dx_recode = 24 if QID88_7_TEXT == " I’ve stopped asking. And they don’t tell me voluntarily. I give so few fox, that I honestly can’t answer this question with confidence. In my mind the most accurate is PTSD, but I’m sure depression and bipolar are on the chart somewhere."
    STATA returns this, indicating it is not finding a == match:

    Code:
    (0 real changes made)
    When I do a partial string match, it finds the obs:

    Code:
    list QID88_7_TEXT if strpos(QID88_7_TEXT, "asking") > 0
    
          +-----------------------------------------------------------------------------------------------------------------------+
          | QID88_7_TEXT                                                                                                          |
          |-----------------------------------------------------------------------------------------------------------------------|
    1138. |  I’ve stopped asking. And they don’t tell me voluntarily. I give so few fox, that I honestly can’t answer this ques.. |
          +-----------------------------------------------------------------------------------------------------------------------+
    
    .
    I think the problem is the special character (the curly apostrophe). In UTF-8 it is this:

    Code:
     
    E2 80 99 Right single quotation mark
    Note that it appears multiple times in the string. Here is the string again, copied and pasted on macOS directly from the Stata Data Browser into Chrome:

    Code:
    I’ve stopped asking. And they don’t tell me voluntarily. I give so few fox, that I honestly can’t answer this question with confidence. In my mind the most accurate is PTSD, but I’m sure depression and bipolar are on the chart somewhere.
    So the question is: how do I rewrite the following code so that the == comparison matches the special character, identifies the observation, and performs the replace?

    Code:
    replace dx_recode = 24 if QID88_7_TEXT == " I’ve stopped asking. And they don’t tell me voluntarily. I give so few fox, that I honestly can’t answer this question with confidence. In my mind the most accurate is PTSD, but I’m sure depression and bipolar are on the chart somewhere."
    THANK YOU!!!!!

    PS
    I have tried the code with a leading space and without a leading space. In the STATA data browser a leading space does seem to be in the string in question, but for some reason copy-paste drops the leading space. This might be a red herring but thought I'd mention it!
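
    One way to check what that leading character really is, as a minimal sketch. It assumes Stata 14 or later (for the Unicode string functions) and that the observation is still in row 1138, as in the list output above; the row number depends on the current sort order.

    Code:
    * Sketch only: row 1138 comes from the list output above and is an assumption
    * Byte length versus character length; they differ when non-ASCII characters are present
    display strlen(QID88_7_TEXT[1138])
    display ustrlen(QID88_7_TEXT[1138])
    * Escaped code points of the string, starting at character 1 (up to 200 characters per call)
    display ustrtohex(QID88_7_TEXT[1138], 1)
    A right single quotation mark shows up as \u2019. The code reported for the first character indicates whether the leading character is an ordinary space or something else, which would explain why a typed leading space does not match.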



    Last edited by Will Hall; Yesterday, 10:05.

  • #2
    You can use dataex to capture the entire string as is.

    Code:
    dataex QID88_7_TEXT in 1138



    • #3
      Thanks, but our do-file needs to run and show the actual string so that outside researchers can follow the changes and see that they are valid from a methodology standpoint. Otherwise we could just use the ID number or even the row number, so I don't think your suggestion works for us. We need something that actually matches the string, something like a + char(34) + code.
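
      A minimal sketch of that kind of concatenation, assuming Stata 14 or later. uchar(8217) returns U+2019, the right single quotation mark (UTF-8 bytes E2 80 99). Wrapping the variable in ustrtrim() is an assumption about the leading character: it strips leading and trailing Unicode whitespace, but whether it removes the particular character in this value has not been verified.

      Code:
      * Sketch only: not confirmed against the actual data
      * The full text stays visible in the do-file; only the apostrophes are built with uchar()
      replace dx_recode = 24 if ustrtrim(QID88_7_TEXT) ==                        ///
          "I" + uchar(8217) + "ve stopped asking. And they don" + uchar(8217) +  ///
          "t tell me voluntarily. I give so few fox, that I honestly can" +      ///
          uchar(8217) + "t answer this question with confidence. In my mind " +  ///
          "the most accurate is PTSD, but I" + uchar(8217) +                     ///
          "m sure depression and bipolar are on the chart somewhere."
      If this still reports 0 real changes, the stored apostrophe or the leading character is probably a different code point than assumed here.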

      Really appreciate the prompt reply!



      • #4
        Show us the dataex output.
