  • Identify a segment of a string

    Hi,

    I want to assign a binary code to each cell in every column: 1 if the cell's text appears in that column's first row, and 0 otherwise.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str33 WHODISCLOSURE2 str31 WHODISCLOSURE3 str26 WHODISCLOSURE4 str24 WHODISCLOSURE5
    "Securities regulator - Encourages" "Securities regulator - Requires" "Corporate law - Encourages" "Corporate law - Requires"
    ""                                  "Requires"                        ""                           ""                        
    ""                                  ""                                ""                           ""                        
    "Encourages"                        ""                                ""                           ""                        
    ""                                  "Requires"                        ""                           ""                        
    "Encourages"                        ""                                ""                           ""                        
    "Encourages"                        "Requires"                        ""                           ""                        
    ""                                  "Requires"                        ""                           "Requires"                
    "Encourages"                        ""                                ""                           "Requires"                
    ""                                  "Requires"                        ""                           ""                        
    ""                                  ""                                ""                           ""                        
    ""                                  "Requires"                        "Encourages"                 ""                        
    "Encourages"                        ""                                ""                           "Requires"                
    "Encourages"                        ""                                ""                           ""                        
    ""                                  "Requires"                        ""                           ""                        
    ""                                  ""                                ""                           ""                        
    "Encourages"                        ""                                ""                           "Requires"                
    ""                                  "Requires"                        "Encourages"                 ""                        
    "Encourages"                        ""                                ""                           ""                        
    ""                                  "Requires"                        ""                           ""                        
    ""                                  ""                                ""                           ""                        
    "Encourages"                        ""                                ""                           ""                        
    ""                                  "Requires"                        "Encourages"                 "Requires"                
    ""                                  ""                                ""                           ""                        
    "Encourages"                        "Requires"                        ""                           ""                        
    ""                                  ""                                ""                           ""                        
    ""                                  "Requires"                        ""                           ""                        
    ""                                  ""                                ""                           ""                        
    ""                                  "Requires"                        ""                           "Requires"                
    ""                                  ""                                "Encourages"                 ""                        
    ""                                  "Requires"                        ""                           "Requires"                
    "Encourages"                        "Requires"                        ""                           "Requires"                
    ""                                  "Requires"                        "Encourages"                 ""                        
    ""                                  ""                                ""                           ""                        
    ""                                  "Requires"                        ""                           ""                        
    "Encourages"                        ""                                ""                           "Requires"                
    ""                                  "Requires"                        ""                           ""                        
    "Encourages"                        "Requires"                        ""                           ""                        
    ""                                  "Requires"                        "Encourages"                 ""                        
    ""                                  ""                                ""                           ""                        
    "Encourages"                        "Requires"                        ""                           ""                        
    ""                                  "Requires"                        ""                           "Requires"                
    ""                                  ""                                ""                           ""                        
    ""                                  ""                                ""                           ""                        
    "Encourages"                        ""                                ""                           ""                        
    ""                                  "Requires"                        ""                           ""                        
    "Encourages"                        ""                                "Encourages"                 ""                        
    "Encourages"                        ""                                "Encourages"                 ""                        
    ""                                  ""                                ""                           ""                        
    ""                                  ""                                ""                           ""                        
    ""                                  ""                                ""                           ""                        
    ""                                  ""                                ""                           ""                        
    "Encourages"                        ""                                "Encourages"                 ""                        
    end
    I have tried:

    Code:
    local N = _N
    forvalues i = 2/`N' {
        foreach j of varlist WHODISCLOSURE* {
            local m1 = `j'[1]
            if `j'[`i'] == "`m1'" {
                replace `j' = "1" in `i' if `j' != ""
            }
            replace `j' = "0" in `i' if `j' == ""
        }
    }
    However, it doesn't work because the text in each cell doesn't exactly match what's in the first row. Does anyone know how to test for an inexact (partial) match instead?
    Many thanks in advance!

  • #2
    Code:
    foreach v of varlist WHODISCLOSURE* {
        replace `v' = cond(strpos(`v'[1], `v') & !missing(`v'), "1", "0") if _n > 1
    }
    Looping over observations (your forvalues i loop) is rarely a good idea in Stata. There is usually a better way to do things.
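
    To see why this handles the inexact match, it may help to try strpos() on single values first. This is only a minimal sketch, using strings copied from the example data. strpos(s1, s2) returns the position at which s2 first appears in s1, and 0 if it does not appear at all:
    Code:
    * the cell text is a substring of the first-row text, so the result is nonzero (true)
    display strpos("Securities regulator - Encourages", "Encourages")
    * "Requires" does not occur in that first-row text, so the result is 0 (false)
    display strpos("Securities regulator - Encourages", "Requires")
    The first command should display 24 and the second 0, which is why a nonzero result can serve as the "contains" test inside cond(). The !missing(`v') condition makes the treatment of empty cells explicit rather than relying on how strpos() handles an empty search string.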

    While this code does what you ask, it seems to me that it leaves you with a pretty bizarre dataset. The contents of your first observation look more like they should be the names (not possible due to embedded spaces) or variable labels (very possible) of the variables. Everything you do from there will require you to explicitly exclude _n == 1 from your calculations. Also, while I am a big fan of 1/0 coding of dichotomies, having the 1 and 0 as strings rather than numbers obliterates the benefits of that, because strings cannot be used as logical expressions. You can -destring- the variables, but the text in the first observation will torpedo that approach, or, if you -force- the issue, you will lose those contents.

    It's your project, but I would reorganize the data, after the above, as follows:
    Code:
    foreach v of varlist WHODISCLOSURE* {
        label var `v' `"`=`v'[1]'"'
    }
    drop in 1
    destring WHODISCLOSURE*, replace
    I think you will find this a much more workable layout for further data management or analysis in Stata.
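
    As a rough illustration of what the numeric layout makes possible, here is a minimal sketch that assumes both code blocks above have already been run. The variable name n_sources is hypothetical, chosen only for this example:
    Code:
    * a numeric 0/1 variable can be used directly as a logical condition
    count if WHODISCLOSURE2
    * count how many of the four indicators equal 1 in each observation
    egen n_sources = rowtotal(WHODISCLOSURE*)
    tabulate n_sources
    * the former first-row text is now stored in the variable labels
    describe WHODISCLOSURE*
    None of these commands needs an _n > 1 qualifier, because the descriptive text no longer occupies an observation.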



    • #3
      Thank you, Clyde. I have labelled the variables and removed the first row, and you are right: the dataset is much more manageable now.
