Splitting a string variable into several variables

Vignesh Ponnusamy

Join Date: May 2018
Posts: 29

Splitting a string variable into several variables

27 Apr 2022, 00:35

Hi all,

I am working on this dataset which has a string variable (named "lga_perc_covered") that provides information on all geographical areas an identifier (named "sa2_name") falls in, along with the proportion of area covered.

What I would like to is:

1. Split the string variable "lga_perc_covered" into several variables which denotes the Local Government Area (LGA) an identifier falls in and also the proportion of identifier that lies in a LGA. For ex, in row 2, the SA2 area 'Goulburn Region' lies in two LGAs - Upper Lachlan Shire (A) and Queanbeyan-Palerang Regional (A). And 67.59% of Goulburn region is in Upper Lachlan Shire (A) LGA and the remaining area is in Queanbeyan-Palerang Regional (A) LGA. All these information are available in the variable 'lga_perc_covered'.

2. Now, I would like to split this 'lga_perc_covered' into several variables (such as LGA1, LGA1_Perc, LGA2 and LGA2_Perc) that provides information on which LGA a SA2 falls in and the proportion of SA2 that falls in a LGA.

3. Finally, create a set of variables called 'LGA_Max' and 'LGA_Perc_Max' that lists the LGA which has the most percentage of a SA2 area. For example, for row 2, Upper Lachlan Shire (A) LGA contains the highest proportion of Goulburn region SA2 i.e. 67.59%.

Thanks for your help.

Regards
Vignesh.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str3 state_name str40 sa2_name str575 lga_perc_covered str24 lga1 float lga1_perc str33 lga2 float lga2_perc str24 lga_max float lga_perc_max
"NSW" "Goulburn"        `"[{"lga" : "Goulburn Mulwaree (A)",  "perc" : 100.00}]"'                                                                "Goulburn Mulwaree (A)"     100 ""                                     . "Goulburn Mulwaree (A)"     100
"NSW" "Goulburn Region" `"[{"lga" : "Upper Lachlan Shire (A)",  "perc" : 67.59}, {"lga" : "Queanbeyan-Palerang Regional (A)", "perc" : 32.41}]"' "Upper Lachlan Shire (A)" 67.59 "Queanbeyan-Palerang Regional (A)" 32.41 "Upper Lachlan Shire (A)" 67.59
"NSW" "Yass"            `"[{"lga" : "Yass Valley (A)",  "perc" : 100.00}]"'                                                                      "Yass Valley (A)"           100 ""                                     . "Yass Valley (A)"           100
"NSW" "Yass Region"     `"[{"lga" : "Yass Valley (A)", "perc" : 67.84}, {"lga" : "Upper Lachlan Shire (A)", "perc" : 32.16},]"'                  "Yass Valley (A)"         67.84 "Upper Lachlan Shire (A)"          32.16 "Yass Valley (A)"         67.84
"NSW" "Young"           `"[{"lga" : "Hilltops (A)", "perc" : 100.00}]"'                                                                          "Hilltops (A)"              100 ""                                     . "Hilltops (A)"              100
end

Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10190

27 Apr 2022, 07:37

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str3 state_name str40 sa2_name str575 lga_perc_covered str24 lga1 float lga1_perc str33 lga2 float lga2_perc str24 lga_max float lga_perc_max
"NSW" "Goulburn"        `"[{"lga" : "Goulburn Mulwaree (A)",  "perc" : 100.00}]"'                                                                "Goulburn Mulwaree (A)"     100 ""                                     . "Goulburn Mulwaree (A)"     100
"NSW" "Goulburn Region" `"[{"lga" : "Upper Lachlan Shire (A)",  "perc" : 67.59}, {"lga" : "Queanbeyan-Palerang Regional (A)", "perc" : 32.41}]"' "Upper Lachlan Shire (A)" 67.59 "Queanbeyan-Palerang Regional (A)" 32.41 "Upper Lachlan Shire (A)" 67.59
"NSW" "Yass"            `"[{"lga" : "Yass Valley (A)",  "perc" : 100.00}]"'                                                                      "Yass Valley (A)"           100 ""                                     . "Yass Valley (A)"           100
"NSW" "Yass Region"     `"[{"lga" : "Yass Valley (A)", "perc" : 67.84}, {"lga" : "Upper Lachlan Shire (A)", "perc" : 32.16},]"'                  "Yass Valley (A)"         67.84 "Upper Lachlan Shire (A)"          32.16 "Yass Valley (A)"         67.84
"NSW" "Young"           `"[{"lga" : "Hilltops (A)", "perc" : 100.00}]"'                                                                          "Hilltops (A)"              100 ""                                     . "Hilltops (A)"              100
end

replace lga_perc_covered = ustrregexra(" " + lga_perc_covered + " ", `"(\(A\)|:|\[|\]|\{|\}|lga|perc|")"', "")
split lga_perc_covered, p(,) g(wanted)
l state_name wanted*

Res.:

Code:

. l state_name wanted*

     +--------------------------------------------------------------------------------------------+
     | state_~e                wanted1      wanted2                            wanted3    wanted4 |
     |--------------------------------------------------------------------------------------------|
  1. |      NSW     Goulburn Mulwaree        100.00                                               |
  2. |      NSW   Upper Lachlan Shire         67.59      Queanbeyan-Palerang Regional       32.41 |
  3. |      NSW           Yass Valley        100.00                                               |
  4. |      NSW           Yass Valley         67.84               Upper Lachlan Shire       32.16 |
  5. |      NSW              Hilltops        100.00                                               |
     +--------------------------------------------------------------------------------------------+

Comment

Vignesh Ponnusamy

Join Date: May 2018

Posts: 29
#3

28 Apr 2022, 01:53

Hi Andrew,

Thanks for your kind help. It worked perfectly.

For the last thing I asked earlier i.e. 3. Finally, create a set of variables called 'LGA_Max' and 'LGA_Perc_Max' that lists the LGA which has the most percentage of a SA2 area. For example, for row 2, Upper Lachlan Shire (A) LGA contains the highest proportion of Goulburn region SA2 i.e. 67.59%.

To execute it, I tried the following:

drop lga_max lga_perc_max lga1 lga1_perc lga2 lga2_perc lga_perc_covered

destring wanted2 wanted4, replace
egen opp = rowmax(wanted2 wanted4)

rename wanted2 percentage1
rename wanted4 percentage2

rename wanted1 lgawanted1
rename wanted3 lgawanted2

gen lga_max=""
replace lga_max=lgawanted1 if percentage1==100

forvalues i=1/2{
replace lga_max=lgawanted`i' if opp-percentage`i'<0.0001 & percentage`i'!=. & lga_max!=" " /* This code is used as the variables 'Opp' and 'Percentage' are in different formats */
}

I wonder whether there is an easy way please to find the maximum percentage and allocate the respective LGA name to the "lga_max" variable.

Regards
Vignesh.
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10190

28 Apr 2022, 06:03

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str3 state_name str40 sa2_name str575 lga_perc_covered str24 lga1 float lga1_perc str33 lga2 float lga2_perc str24 lga_max float lga_perc_max
"NSW" "Goulburn"        `"[{"lga" : "Goulburn Mulwaree (A)",  "perc" : 100.00}]"'                                                                "Goulburn Mulwaree (A)"     100 ""                                     . "Goulburn Mulwaree (A)"     100
"NSW" "Goulburn Region" `"[{"lga" : "Upper Lachlan Shire (A)",  "perc" : 67.59}, {"lga" : "Queanbeyan-Palerang Regional (A)", "perc" : 32.41}]"' "Upper Lachlan Shire (A)" 67.59 "Queanbeyan-Palerang Regional (A)" 32.41 "Upper Lachlan Shire (A)" 67.59
"NSW" "Yass"            `"[{"lga" : "Yass Valley (A)",  "perc" : 100.00}]"'                                                                      "Yass Valley (A)"           100 ""                                     . "Yass Valley (A)"           100
"NSW" "Yass Region"     `"[{"lga" : "Yass Valley (A)", "perc" : 67.84}, {"lga" : "Upper Lachlan Shire (A)", "perc" : 32.16},]"'                  "Yass Valley (A)"         67.84 "Upper Lachlan Shire (A)"          32.16 "Yass Valley (A)"         67.84
"NSW" "Young"           `"[{"lga" : "Hilltops (A)", "perc" : 100.00}]"'                                                                          "Hilltops (A)"              100 ""                                     . "Hilltops (A)"              100
end

replace lga_perc_covered = ustrregexra(" " + lga_perc_covered + " ", `"(\(A\)|:|\[|\]|\{|\}|lga|perc|")"', "")
split lga_perc_covered, p(,) g(wanted)
l state_name wanted*

qui ds wanted*
local howmany=wordcount("`r(varlist)'")
local j 1
forval i= 1(2)`=`howmany'-1'{
    rename wanted`i' lga_`j'
    local ++j
}
local j 1
forval i= 2(2)`howmany'{
    rename wanted`i' percent`j'
    destring percent`j', replace
    local ++j
}
reshape long lga_ percent, i(state sa2) j(which)
gsort state sa2 -percent
by state_name sa2: gen wanted1= lga_[1]
by state sa2: gen wanted2= percent[1]
reshape wide lga_ percent, i(state sa2) j(which)

Res.:

Code:

. list state lga_? percent? wanted*

     +---------------------------------------------------------------------------------------------------------------------------+
     | state_~e                  lga_1                              lga_2   percent1   percent2                wanted1   wanted2 |
     |---------------------------------------------------------------------------------------------------------------------------|
  1. |      NSW     Goulburn Mulwaree                                            100          .     Goulburn Mulwaree        100 |
  2. |      NSW   Upper Lachlan Shire       Queanbeyan-Palerang Regional       67.59      32.41   Upper Lachlan Shire      67.59 |
  3. |      NSW           Yass Valley                                            100          .           Yass Valley        100 |
  4. |      NSW           Yass Valley                Upper Lachlan Shire       67.84      32.16           Yass Valley      67.84 |
  5. |      NSW              Hilltops                                            100          .              Hilltops        100 |
     +---------------------------------------------------------------------------------------------------------------------------+

.

Last edited by Andrew Musau; 28 Apr 2022, 06:06.

Comment

Vignesh Ponnusamy

Join Date: May 2018

Posts: 29
#5

28 Apr 2022, 23:17

Hi Andrew,

Thanks for the code. I tried it and it works well.
Your help is much appreciated.

Regards
Vignesh.
Comment

Announcement

Splitting a string variable into several variables

Comment

Comment

Comment

Comment