Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Concatenating data in Stata for export

    Hi all,

    Thanks, as ever, for the time and advice you all give here!

    I have a dataset I've been working on, which I need to export for use in AirTable. Unfortunately, AirTable isn't great with multiple-choice responses so I need to concatenate some of the data.

    The data below lists each type of Sustainable Development Goal that respondents have selected is relevant for their particular project. I need to combine each response into a new variable (sdgvar). But, I'd like to be able to export the new variable with the labelled values and not the numbers (as this will be how users interact with the data in AirTable).

    Currently, I'm running: egen sdgvar = concat(newsdg2*), punct(,)

    Which correctly concatenates all of the variables into one - but also includes the missing variables. Is there a way to:
    • Concatenate all of the labels (perhaps by re-stringing)?
    • Drop any missing values when concatenating?
    My aim is to have the new variable (sdgvar) have data that looks like:
    • Response 1: SDG 17
    • Response 2: SDG 5, SDG 10
    • Response 3: SDG 17
    • Response 4: SDG 3, SDG 5, SDG 6, etc.
    I hope that makes sense.

    Thanks in advance! And code from the file is below.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(newsdg21 newsdg22 newsdg23 newsdg24 newsdg25 newsdg26 newsdg27)
    17  .  .  .  .  .  .
     5 10  .  .  .  .  .
    17  .  .  .  .  .  .
     3  5  6  7 11 13  .
     5  7  9 10 13  .  .
     3  .  .  .  .  .  .
     3 10  .  .  .  .  .
     3  .  .  .  .  .  .
     3 10  .  .  .  .  .
     3  .  .  .  .  .  .
    17  .  .  .  .  .  .
    17  .  .  .  .  .  .
    10  .  .  .  .  .  .
    10  .  .  .  .  .  .
     3  7  8 10  .  .  .
     5  .  .  .  .  .  .
     3  5 17  .  .  .  .
     5  .  .  .  .  .  .
     3 17  .  .  .  .  .
     3  .  .  .  .  .  .
     3  5  7  8 10 13 17
    17  .  .  .  .  .  .
     5  7  8 10  .  .  .
    17  .  .  .  .  .  .
     5  .  .  .  .  .  .
     8 17  .  .  .  .  .
     8 16 17  .  .  .  .
     9 10  .  .  .  .  .
    10  .  .  .  .  .  .
     3  8  .  .  .  .  .
     5  .  .  .  .  .  .
     1  .  .  .  .  .  .
     1  .  .  .  .  .  .
     1  .  .  .  .  .  .
     9 16  .  .  .  .  .
     1  5  9 10 13  .  .
     1  5 10 13  .  .  .
     1  5 10 17  .  .  .
     5  6  .  .  .  .  .
    17  .  .  .  .  .  .
    17  .  .  .  .  .  .
     7 11  .  .  .  .  .
     9  .  .  .  .  .  .
     4  .  .  .  .  .  .
     8  9 11  .  .  .  .
     1 10 11  .  .  .  .
    10 11  .  .  .  .  .
     1 10  .  .  .  .  .
    10  .  .  .  .  .  .
     3  .  .  .  .  .  .
     4  8 11 17  .  .  .
    17  .  .  .  .  .  .
     3  .  .  .  .  .  .
     3  .  .  .  .  .  .
     3  .  .  .  .  .  .
    10  .  .  .  .  .  .
     3  .  .  .  .  .  .
     3  .  .  .  .  .  .
     1  4  5 10 12  .  .
    10  .  .  .  .  .  .
    11 13  .  .  .  .  .
     3  .  .  .  .  .  .
     1  8  .  .  .  .  .
    16  .  .  .  .  .  .
     5 10 17  .  .  .  .
     9 10 17  .  .  .  .
     3  5 10  .  .  .  .
    10 17  .  .  .  .  .
     5  9 10 11 16 17  .
    16  .  .  .  .  .  .
    16  .  .  .  .  .  .
    16  .  .  .  .  .  .
    16  .  .  .  .  .  .
    16  .  .  .  .  .  .
    16  .  .  .  .  .  .
    16  .  .  .  .  .  .
    16  .  .  .  .  .  .
    16  .  .  .  .  .  .
     1  5  8  9 10 16 17
     2  3 10  .  .  .  .
    17  .  .  .  .  .  .
     3  5  8 10 17  .  .
     8  .  .  .  .  .  .
     8  9 13 17  .  .  .
     3  5 16 17  .  .  .
     3 17  .  .  .  .  .
     3 17  .  .  .  .  .
     9  .  .  .  .  .  .
     5 10 13  .  .  .  .
    10 11  .  .  .  .  .
     1  3  5  9 10  .  .
    10  .  .  .  .  .  .
    10  .  .  .  .  .  .
     1  5 10 16  .  .  .
     5  9 10  .  .  .  .
     3  .  .  .  .  .  .
     3  5  .  .  .  .  .
     2  3  4 16  .  .  .
     3  .  .  .  .  .  .
    10  .  .  .  .  .  .
    end
    label values newsdg21 sdg1label
    label values newsdg22 sdg1label
    label values newsdg23 sdg1label
    label values newsdg24 sdg1label
    label values newsdg25 sdg1label
    label values newsdg26 sdg1label
    label values newsdg27 sdg1label
    label def sdg1label 1 "1: No Poverty", modify
    label def sdg1label 2 "2: Zero Hunger", modify
    label def sdg1label 3 "3: Good Health and Well-being", modify
    label def sdg1label 4 "4: Quality Education", modify
    label def sdg1label 5 "5: Gender Equality", modify
    label def sdg1label 7 "7: Affordable and Clean Energy", modify
    label def sdg1label 8 "8: Decent Work and Economic Growth", modify
    label def sdg1label 9 "9: Industry, Innovation and Infrastructure", modify
    label def sdg1label 10 "10: Reduced Inequality", modify
    label def sdg1label 11 "11: Sustainable Cities and Communities", modify
    label def sdg1label 16 "16: Peace and Justice and Strong Institutions", modify
    label def sdg1label 17 "17: Partnerships to achieve the Goals", modify
    label def sdg1label 6 "6: Clean Water and Sanitation", modify
    label def sdg1label 13 "13: Climate Action", modify
    label def sdg1label 12 "12: Responsible Consumption and Production", modify

  • #2
    you could add,
    Code:
    replace sdgvar = subinstr(sdgvar,",.","",.)
    edit:
    Code:
    foreach v of var newsdg2* {
    gen str`v' = "SDG " + string(`v')
    }
    egen sdgvar = concat(strnewsdg2*), punct(", ")
    replace sdgvar = subinstr(sdgvar,", SDG .","",.)
    Last edited by Øyvind Snilsberg; 15 Feb 2022, 02:04.

    Comment


    • #3
      Fantastic, thank you! That drops the missing variables. But, the concatenating is still based on the numeric values - not the labels. If I wanted the concatenating to include the labelled values, should I convert the labels into strings? And then run the two commands?

      Comment


      • #4
        Øyvind, I've just seen your edit. This works perfectly!

        Do you mind walking me through (or pointing me to) what you're doing with each line of the code? I have a number of other multiple-choice responses to concatenate (some are a number of words each, so I won't be able to use an acronym each time), so curious to understand what you've done here!

        Comment


        • #5
          Apologies, just enclosing some code here to make it clearer. As with the SDG example, I'm looking to concatenate the software (Zoom, WhatsApp, .NET, etc.) into one variable - but to concatenate the labelled name, not the numeric identifier.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input int(newsoftware1 newsoftware2 newsoftware3 newsoftware4 newsoftware5 newsoftware6 newsoftware7 newsoftware8 newsoftware9)
          634   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            1   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            8  10  14   .   .   .   .   .   .
          615   .   .   .   .   .   .   .   .
          615   .   .   .   .   .   .   .   .
          615   .   .   .   .   .   .   .   .
          615   .   .   .   .   .   .   .   .
          633   .   .   .   .   .   .   .   .
          204 219 285 311 418 419 466 467   .
          633   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          633   .   .   .   .   .   .   .   .
          633   .   .   .   .   .   .   .   .
          634   .   .   .   .   .   .   .   .
          633   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          415   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            1   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            1   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          138   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          466 633   .   .   .   .   .   .   .
          466 633   .   .   .   .   .   .   .
          465 633   .   .   .   .   .   .   .
           29  33  59  92 113 151 174 206 207
            .   .   .   .   .   .   .   .   .
          633   .   .   .   .   .   .   .   .
          633   .   .   .   .   .   .   .   .
          633   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          633   .   .   .   .   .   .   .   .
          634   .   .   .   .   .   .   .   .
          634   .   .   .   .   .   .   .   .
          634   .   .   .   .   .   .   .   .
          634   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          196 277 355 357 358 443 615 621 632
            .   .   .   .   .   .   .   .   .
            1   .   .   .   .   .   .   .   .
          379   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          207 356   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          295   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          633   .   .   .   .   .   .   .   .
           92   .   .   .   .   .   .   .   .
            1  28  42  43 314 467 468 490   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          478 553   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
           28  29  59 102 150 204 226 227 228
          285   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          207 222 434   .   .   .   .   .   .
          285   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          473 540   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          624   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          207 356   .   .   .   .   .   .   .
            .   .   .   .   .   .   .   .   .
          138 139 140 141 142   .   .   .   .
            .   .   .   .   .   .   .   .   .
            1   3 533 609 610   .   .   .   .
          138 139 140 141 142   .   .   .   .
            .   .   .   .   .   .   .   .   .
          285 454   .   .   .   .   .   .   .
          end
          label values newsoftware1 softwarelabel
          label values newsoftware2 softwarelabel
          label values newsoftware3 softwarelabel
          label values newsoftware4 softwarelabel
          label values newsoftware5 softwarelabel
          label values newsoftware6 softwarelabel
          label values newsoftware7 softwarelabel
          label values newsoftware8 softwarelabel
          label values newsoftware9 softwarelabel
          label def softwarelabel 1 ".NET", modify
          label def softwarelabel 8 "AccessMod", modify
          label def softwarelabel 28 "angular", modify
          label def softwarelabel 29 "Angular 8", modify
          label def softwarelabel 92 "Cisco", modify
          label def softwarelabel 138 "DHIS 2", modify
          label def softwarelabel 196 "Excel", modify
          label def softwarelabel 204 "FHIR", modify
          label def softwarelabel 207 "Flutter", modify
          label def softwarelabel 285 "Java", modify
          label def softwarelabel 295 "KoBoToolbox", modify
          label def softwarelabel 379 "mSupply", modify
          label def softwarelabel 415 "OpenClinica", modify
          label def softwarelabel 465 "post gres DB", modify
          label def softwarelabel 466 "Postgres", modify
          label def softwarelabel 473 "python", modify
          label def softwarelabel 478 "R", modify
          label def softwarelabel 615 "Web", modify
          label def softwarelabel 624 "Wordpress", modify
          label def softwarelabel 633 "Other (Specify)", modify
          label def softwarelabel 634 "Not applicable", modify
          label def softwarelabel 3 "115 Call Center", modify
          label def softwarelabel 10 "ACT (Assisted Contact Tracing)", modify
          label def softwarelabel 33 "Apache Superset", modify
          label def softwarelabel 139 "DHIS 2 Tracker", modify
          label def softwarelabel 219 "GeneXpert Lab Management Information System", modify
          label def softwarelabel 222 "Geospatial Widget", modify
          label def softwarelabel 277 "iOS", modify
          label def softwarelabel 356 "Microsoft SQL", modify
          label def softwarelabel 454 "PHP", modify
          label def softwarelabel 540 "SQL", modify
          label def softwarelabel 553 "Tableau", modify
          label def softwarelabel 14 "ADT (Admissions, discharge and transfer)", modify
          label def softwarelabel 42 "AWS", modify
          label def softwarelabel 59 "Bootstrap", modify
          label def softwarelabel 140 "DHIS 2 Tracker Capture", modify
          label def softwarelabel 355 "Microsoft Excel", modify
          label def softwarelabel 434 "OpenStreetMap", modify
          label def softwarelabel 533 "SMS", modify
          label def softwarelabel 43 "AWS Cloud", modify
          label def softwarelabel 102 "CodeIgniter", modify
          label def softwarelabel 141 "DHIS2", modify
          label def softwarelabel 311 "LIMS (Laboratory Information Management System)", modify
          label def softwarelabel 357 "Microsoft Teams", modify
          label def softwarelabel 609 "Vodacom", modify
          label def softwarelabel 113 "Comodo Technology", modify
          label def softwarelabel 142 "DHIS2 Android App", modify
          label def softwarelabel 150 "django", modify
          label def softwarelabel 314 "LMIS (Logistics Management Information Systems)", modify
          label def softwarelabel 358 "Microsoft Word", modify
          label def softwarelabel 418 "OpenELIS", modify
          label def softwarelabel 610 "Voice", modify
          label def softwarelabel 151 "Docker", modify
          label def softwarelabel 419 "OpenELIS Global", modify
          label def softwarelabel 443 "Other", modify
          label def softwarelabel 467 "Power BI", modify
          label def softwarelabel 174 "eLMIS", modify
          label def softwarelabel 226 "Google Earth", modify
          label def softwarelabel 468 "PowerBI", modify
          label def softwarelabel 206 "Firebase", modify
          label def softwarelabel 227 "Google Maps", modify
          label def softwarelabel 490 "Redis", modify
          label def softwarelabel 621 "WhatsApp", modify
          label def softwarelabel 228 "Google My Maps", modify
          label def softwarelabel 632 "Zoom", modify

          Comment


          • #6
            For #5:

            Code:
            foreach v of var newsoft* {
            decode `v', g(n`v')
            }
            
            egen cnctvar = concat(nnews*), punc(", ")
            gen software = ustrregexs(0) if ustrregexm(cnctvar, ".*[^, ]")
            
            drop nnews* cnctvar
            Roman

            Comment


            • #7
              Hi Roman,

              Thanks for this! I'll give it a test tomorrow.

              Do you have a resource or something I can review that unpacks exactly what each of these values (and the ` and other characters) means? I have a few other variables to apply this to!

              Thank you!

              Comment


              • #8
                Originally posted by Calum Andrew View Post

                Hi Roman,

                Thanks for this! I'll give it a test tomorrow.Do you have a resource or something I can review that unpacks exactly what each of these values (and the ` and other characters) means? I have a few other variables to apply this to!

                Thank you!
                There are several pieces. Single quotes are used to refer Stata macros. Type -help macro- in your Stata command box. -decode-, as the name suggests, is a function that is capturing the strings of the labels. See -help decode- . Finally, -ustrregex- is for string manipulations for unicode characters using regular expression. There is not much in the Stata help file on regular expression -help ustrregex-. But see this UCLA page with -regex- which works for plain ASCII strings.
                Roman

                Comment


                • #9
                  Roman, this worked perfectly - and thank you for the detail above too. Both very much appreciated, thank you!

                  Comment

                  Working...
                  X