Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Splitting a string variable into several variables

    Hi all,

    I am working on this dataset which has a string variable (named "lga_perc_covered") that provides information on all geographical areas an identifier (named "sa2_name") falls in, along with the proportion of area covered.

    What I would like to is:

    1. Split the string variable "lga_perc_covered" into several variables which denotes the Local Government Area (LGA) an identifier falls in and also the proportion of identifier that lies in a LGA. For ex, in row 2, the SA2 area 'Goulburn Region' lies in two LGAs - Upper Lachlan Shire (A) and Queanbeyan-Palerang Regional (A). And 67.59% of Goulburn region is in Upper Lachlan Shire (A) LGA and the remaining area is in Queanbeyan-Palerang Regional (A) LGA. All these information are available in the variable 'lga_perc_covered'.

    2. Now, I would like to split this 'lga_perc_covered' into several variables (such as LGA1, LGA1_Perc, LGA2 and LGA2_Perc) that provides information on which LGA a SA2 falls in and the proportion of SA2 that falls in a LGA.

    3. Finally, create a set of variables called 'LGA_Max' and 'LGA_Perc_Max' that lists the LGA which has the most percentage of a SA2 area. For example, for row 2, Upper Lachlan Shire (A) LGA contains the highest proportion of Goulburn region SA2 i.e. 67.59%.

    Thanks for your help.

    Regards
    Vignesh.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3 state_name str40 sa2_name str575 lga_perc_covered str24 lga1 float lga1_perc str33 lga2 float lga2_perc str24 lga_max float lga_perc_max
    "NSW" "Goulburn"        `"[{"lga" : "Goulburn Mulwaree (A)",  "perc" : 100.00}]"'                                                                "Goulburn Mulwaree (A)"     100 ""                                     . "Goulburn Mulwaree (A)"     100
    "NSW" "Goulburn Region" `"[{"lga" : "Upper Lachlan Shire (A)",  "perc" : 67.59}, {"lga" : "Queanbeyan-Palerang Regional (A)", "perc" : 32.41}]"' "Upper Lachlan Shire (A)" 67.59 "Queanbeyan-Palerang Regional (A)" 32.41 "Upper Lachlan Shire (A)" 67.59
    "NSW" "Yass"            `"[{"lga" : "Yass Valley (A)",  "perc" : 100.00}]"'                                                                      "Yass Valley (A)"           100 ""                                     . "Yass Valley (A)"           100
    "NSW" "Yass Region"     `"[{"lga" : "Yass Valley (A)", "perc" : 67.84}, {"lga" : "Upper Lachlan Shire (A)", "perc" : 32.16},]"'                  "Yass Valley (A)"         67.84 "Upper Lachlan Shire (A)"          32.16 "Yass Valley (A)"         67.84
    "NSW" "Young"           `"[{"lga" : "Hilltops (A)", "perc" : 100.00}]"'                                                                          "Hilltops (A)"              100 ""                                     . "Hilltops (A)"              100
    end

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3 state_name str40 sa2_name str575 lga_perc_covered str24 lga1 float lga1_perc str33 lga2 float lga2_perc str24 lga_max float lga_perc_max
    "NSW" "Goulburn"        `"[{"lga" : "Goulburn Mulwaree (A)",  "perc" : 100.00}]"'                                                                "Goulburn Mulwaree (A)"     100 ""                                     . "Goulburn Mulwaree (A)"     100
    "NSW" "Goulburn Region" `"[{"lga" : "Upper Lachlan Shire (A)",  "perc" : 67.59}, {"lga" : "Queanbeyan-Palerang Regional (A)", "perc" : 32.41}]"' "Upper Lachlan Shire (A)" 67.59 "Queanbeyan-Palerang Regional (A)" 32.41 "Upper Lachlan Shire (A)" 67.59
    "NSW" "Yass"            `"[{"lga" : "Yass Valley (A)",  "perc" : 100.00}]"'                                                                      "Yass Valley (A)"           100 ""                                     . "Yass Valley (A)"           100
    "NSW" "Yass Region"     `"[{"lga" : "Yass Valley (A)", "perc" : 67.84}, {"lga" : "Upper Lachlan Shire (A)", "perc" : 32.16},]"'                  "Yass Valley (A)"         67.84 "Upper Lachlan Shire (A)"          32.16 "Yass Valley (A)"         67.84
    "NSW" "Young"           `"[{"lga" : "Hilltops (A)", "perc" : 100.00}]"'                                                                          "Hilltops (A)"              100 ""                                     . "Hilltops (A)"              100
    end
    
    replace lga_perc_covered = ustrregexra(" " + lga_perc_covered + " ", `"(\(A\)|:|\[|\]|\{|\}|lga|perc|")"', "")
    split lga_perc_covered, p(,) g(wanted)
    l state_name wanted*
    Res.:

    Code:
    . l state_name wanted*
    
         +--------------------------------------------------------------------------------------------+
         | state_~e                wanted1      wanted2                            wanted3    wanted4 |
         |--------------------------------------------------------------------------------------------|
      1. |      NSW     Goulburn Mulwaree        100.00                                               |
      2. |      NSW   Upper Lachlan Shire         67.59      Queanbeyan-Palerang Regional       32.41 |
      3. |      NSW           Yass Valley        100.00                                               |
      4. |      NSW           Yass Valley         67.84               Upper Lachlan Shire       32.16 |
      5. |      NSW              Hilltops        100.00                                               |
         +--------------------------------------------------------------------------------------------+

    Comment


    • #3
      Hi Andrew,

      Thanks for your kind help. It worked perfectly.

      For the last thing I asked earlier i.e. 3. Finally, create a set of variables called 'LGA_Max' and 'LGA_Perc_Max' that lists the LGA which has the most percentage of a SA2 area. For example, for row 2, Upper Lachlan Shire (A) LGA contains the highest proportion of Goulburn region SA2 i.e. 67.59%.

      To execute it, I tried the following:

      drop lga_max lga_perc_max lga1 lga1_perc lga2 lga2_perc lga_perc_covered

      destring wanted2 wanted4, replace
      egen opp = rowmax(wanted2 wanted4)

      rename wanted2 percentage1
      rename wanted4 percentage2

      rename wanted1 lgawanted1
      rename wanted3 lgawanted2

      gen lga_max=""
      replace lga_max=lgawanted1 if percentage1==100

      forvalues i=1/2{
      replace lga_max=lgawanted`i' if opp-percentage`i'<0.0001 & percentage`i'!=. & lga_max!=" " /* This code is used as the variables 'Opp' and 'Percentage' are in different formats */
      }


      I wonder whether there is an easy way please to find the maximum percentage and allocate the respective LGA name to the "lga_max" variable.


      Regards
      Vignesh.

      Comment


      • #4
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str3 state_name str40 sa2_name str575 lga_perc_covered str24 lga1 float lga1_perc str33 lga2 float lga2_perc str24 lga_max float lga_perc_max
        "NSW" "Goulburn"        `"[{"lga" : "Goulburn Mulwaree (A)",  "perc" : 100.00}]"'                                                                "Goulburn Mulwaree (A)"     100 ""                                     . "Goulburn Mulwaree (A)"     100
        "NSW" "Goulburn Region" `"[{"lga" : "Upper Lachlan Shire (A)",  "perc" : 67.59}, {"lga" : "Queanbeyan-Palerang Regional (A)", "perc" : 32.41}]"' "Upper Lachlan Shire (A)" 67.59 "Queanbeyan-Palerang Regional (A)" 32.41 "Upper Lachlan Shire (A)" 67.59
        "NSW" "Yass"            `"[{"lga" : "Yass Valley (A)",  "perc" : 100.00}]"'                                                                      "Yass Valley (A)"           100 ""                                     . "Yass Valley (A)"           100
        "NSW" "Yass Region"     `"[{"lga" : "Yass Valley (A)", "perc" : 67.84}, {"lga" : "Upper Lachlan Shire (A)", "perc" : 32.16},]"'                  "Yass Valley (A)"         67.84 "Upper Lachlan Shire (A)"          32.16 "Yass Valley (A)"         67.84
        "NSW" "Young"           `"[{"lga" : "Hilltops (A)", "perc" : 100.00}]"'                                                                          "Hilltops (A)"              100 ""                                     . "Hilltops (A)"              100
        end
        
        replace lga_perc_covered = ustrregexra(" " + lga_perc_covered + " ", `"(\(A\)|:|\[|\]|\{|\}|lga|perc|")"', "")
        split lga_perc_covered, p(,) g(wanted)
        l state_name wanted*
        
        qui ds wanted*
        local howmany=wordcount("`r(varlist)'")
        local j 1
        forval i= 1(2)`=`howmany'-1'{
            rename wanted`i' lga_`j'
            local ++j
        }
        local j 1
        forval i= 2(2)`howmany'{
            rename wanted`i' percent`j'
            destring percent`j', replace
            local ++j
        }
        reshape long lga_ percent, i(state sa2) j(which)
        gsort state sa2 -percent
        by state_name sa2: gen wanted1= lga_[1]
        by state sa2: gen wanted2= percent[1]
        reshape wide lga_ percent, i(state sa2) j(which)

        Res.:

        Code:
        . list state lga_? percent? wanted*
        
             +---------------------------------------------------------------------------------------------------------------------------+
             | state_~e                  lga_1                              lga_2   percent1   percent2                wanted1   wanted2 |
             |---------------------------------------------------------------------------------------------------------------------------|
          1. |      NSW     Goulburn Mulwaree                                            100          .     Goulburn Mulwaree        100 |
          2. |      NSW   Upper Lachlan Shire       Queanbeyan-Palerang Regional       67.59      32.41   Upper Lachlan Shire      67.59 |
          3. |      NSW           Yass Valley                                            100          .           Yass Valley        100 |
          4. |      NSW           Yass Valley                Upper Lachlan Shire       67.84      32.16           Yass Valley      67.84 |
          5. |      NSW              Hilltops                                            100          .              Hilltops        100 |
             +---------------------------------------------------------------------------------------------------------------------------+
        
        .
        Last edited by Andrew Musau; 28 Apr 2022, 07:06.

        Comment


        • #5
          Hi Andrew,

          Thanks for the code. I tried it and it works well.
          Your help is much appreciated.

          Regards
          Vignesh.

          Comment

          Working...
          X