replacing values in a loop

Mudassira Sarfraz

Join Date: Aug 2019

Posts: 21
#1

replacing values in a loop

04 Sep 2019, 04:38

Dear All

This is my first post on statalist. I have a household survey data (119,018 observations). It has a 5-digit variable named "stratum" where the first digit is for the province and second and third digit is for district. Fourth digit is for 'region' whether rural or urban and last digit is for income group. This data set I want to ultimately merge with shapefile for which I need district id variable which I will use as a key variable to merge with shapefile. I have created this variable by recoding the stratum equivalent to district ids in shapefile. There are 161 districts.
I need another variable 'district name' for which I am using a loop but it does not return what I want. It assigns last district name to each observation. What I want is to have the name of each district in front of its code.
Following is the code I have used so far. Thanks in advance.

Code:

clear input long stratum 12410 12410 12410 21110 21110 25410 25410 25410 61121 61121 60010 end capture drop distt_id gen distt_id=stratum recode distt_id (12410=62) (21110=86) (25410=108) (61121 60010=54) label var distt_id "District ID" sort distt_id capture drop distt_name gen distt_name = "" local id 54 62 86 108 local name `" "Islamabad" "Abbottabad" "Attock" "Nankana" "' foreach i of local id { foreach v of local name { replace distt_name="`v'" if distt_id == `i' } }
Tags: None

William Lisowski

Join Date: Dec 2014
Posts: 10150

04 Sep 2019, 13:36

Welcome to Statalist.

I'm not sure I completely understand the results you want, but the following code - to replace the double loop at the bottom of your current code - corrects the problem you are having with the double loop.

Code:

local n : word count `id'
forvalues w = 1/4 {
    local i : word `w' of `id'
    local v : word `w' of `name'
    replace distt_name="`v'" if distt_id == `i'
}

Code:

. list, sepby(distt_id) noobs

  +---------------------------------+
  | stratum   distt_id   distt_name |
  |---------------------------------|
  |   60010         54    Islamabad |
  |   61121         54    Islamabad |
  |   61121         54    Islamabad |
  |---------------------------------|
  |   12410         62   Abbottabad |
  |   12410         62   Abbottabad |
  |   12410         62   Abbottabad |
  |---------------------------------|
  |   21110         86       Attock |
  |   21110         86       Attock |
  |---------------------------------|
  |   25410        108      Nankana |
  |   25410        108      Nankana |
  |   25410        108      Nankana |
  +---------------------------------+

Comment

Mudassira Sarfraz

Join Date: Aug 2019

Posts: 21
#3

04 Sep 2019, 14:54

Dear William

Many thanks for your reply. Yes, that is exactly what I wanted.
Comment

Andrés Lahur Talavera Cuya

Join Date: Dec 2018
Posts: 54

11 Sep 2019, 13:13

I have a double loop problem. Please I need help

Code:

forvalues i=67/69 {
local i = `i'
foreach v in AMA ANC APU {
clear
import excel "http://proyectos.inei.gob.pe/web/biblioineipub/bancopub/Est/Lib0846/cuadros/c0`i'.xls", sheet("C 3.`i' `v'") cellrange(A11:AF17)

drop B C D E F G H I J K L M N O P Q R S T U V W Z AA AB AC AD AE AF
gen depart="`v'"
rename A gedad
rename X pob_mujfert2017
rename Y pob_mujfert2018
reshape long pob_mujfert , i(gedad) j(año)
sort año

bysort año: egen double mujfert = total(pob_mujfert)
collapse (mean) mujfert , by(año depart)
save "dpto_`i'",replace
}
}

Last edited by Andrés Lahur Talavera Cuya; 11 Sep 2019, 13:49.

Comment

Andrés Lahur Talavera Cuya

Join Date: Dec 2018

Posts: 54
#5

11 Sep 2019, 13:18

I am trying to automate this:

*AMA
clear
import excel "http://proyectos.inei.gob.pe/web/biblioineipub/bancopub/Est/Lib0846/cuadros/c067.xls", sheet("C 3.67 AMA") cellrange(A11:AF17)
drop B C D E F G H I J K L M N O P Q R S T U V W Z AA AB AC AD AE AF
gen depart="AMA"
rename A gedad
rename X pob_mujfert2017
rename Y pob_mujfert2018
reshape long pob_mujfert , i(gedad) j(año)
sort año

bysort año: egen double mujfert = total(pob_mujfert)
collapse (mean) mujfert , by(año depart)
save dpto_67,replace

*ANC
clear
import excel "http://proyectos.inei.gob.pe/web/biblioineipub/bancopub/Est/Lib0846/cuadros/c068.xls", sheet("C 3.68 ANC") cellrange(A11:AF17)
drop B C D E F G H I J K L M N O P Q R S T U V W Z AA AB AC AD AE AF
gen depart="ANC"
rename A gedad
rename X pob_mujfert2017
rename Y pob_mujfert2018
reshape long pob_mujfert , i(gedad) j(año)
sort año

bysort año: egen double mujfert = total(pob_mujfert)
collapse (mean) mujfert , by(año depart)
save dpto_68,replace

*APU
clear
import excel "http://proyectos.inei.gob.pe/web/biblioineipub/bancopub/Est/Lib0846/cuadros/c069.xls", sheet("C 3.69 APU") cellrange(A11:AF17)
drop B C D E F G H I J K L M N O P Q R S T U V W Z AA AB AC AD AE AF
gen depart="APU"
rename A gedad
rename X pob_mujfert2017
rename Y pob_mujfert2018
reshape long pob_mujfert , i(gedad) j(año)
sort año

bysort año: egen double mujfert = total(pob_mujfert)
collapse (mean) mujfert , by(año depart)
save dpto_69,replace

Last edited by Andrés Lahur Talavera Cuya; 11 Sep 2019, 13:20.
Comment

Andrés Lahur Talavera Cuya

Join Date: Dec 2018
Posts: 54

12 Sep 2019, 01:12

* SOLUTION: DOUBLE BUCLE PROBLEM:

Code:

cd "H:\GRAFICASANIMADAS\Ejemplo_46\temp\" 

local values 67 68 69
local variables `" "AMA" "ANC" "APU" "'


local z: word count `values'
forvalues n = 1/3 {

local variable: word `n' of `variables'
local value: word `n' of `values' 

clear 
import excel "http://proyectos.inei.gob.pe/web/biblioineipub/bancopub/Est/Lib0846/cuadros/c0`value'.xls", sheet("C 3.`value' `variable'") cellrange(A11:AF17)

drop B C D E F G H I J K L M N O P Q R S T U V  W Z AA AB AC AD AE AF 
gen depart="`variable'"
rename A gedad                             
rename X pob_mujfert2017   
rename Y pob_mujfert2018 
reshape long pob_mujfert , i(gedad) j(año)                       
sort año   
                  
bysort año: egen double mujfert = total(pob_mujfert)
collapse (mean) mujfert , by(año depart)
save "dpto_`variable'",replace
}
*******

Comment

Ridwan Sheikh

Join Date: Apr 2021

Posts: 169
#7

08 Feb 2024, 01:28

Hello!
I have a following data structure :

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input float(country year) 1 1990 1 1991 1 1992 1 1993 1 1994 2 1990 2 1991 2 1992 2 1993 2 1994 3 1990 3 1991 3 1992 3 1993 3 1994 end

I need to replace country codes (numeric) with corresponding country names (string), something like this.

replace country="Australia" if country==1
replace country="Belgium" if country==2
replace country="Canada" if country==3

I have 155 such codes(countries). How can i do that in loop.

Thanks,
(Ridwan)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#8

08 Feb 2024, 02:00

#7 could only, so far as I can imagine, be written as a loop by placing 155 names in a local (or global, but don't go there) macro and looping over its contents. Not a welcome prospect.

See Stata | FAQ: Efficiently defining group characteristics to create subsets for a much better idea. merge a dataset containing the mapping with your main dataset.
Comment
Ridwan Sheikh

Join Date: Apr 2021

Posts: 169
#9

08 Feb 2024, 03:52

Thanks Nick Cox
This is a hypothetical data set, that i created running following codes

Code:

clear all timer clear set seed 10 global T = 4 global I = 3 set obs `=$I*$T' gen code = int((_n-1)/$T )+1 gen year = mod((_n-1),$T )+1990 gen country="."

I am using much simpler approach by running the folllowng

Code:

replace country="Australia" if code==1 replace country="Austia" if code==2 . . . replace country="Zambia" if code==155

But doing it this way for all the 155 countries seems a tedious task.

I do'nt know how to code it using local or global with the loop. Your help will be really appreciated

Thanks,

Last edited by Ridwan Sheikh; 08 Feb 2024, 04:11.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#10

08 Feb 2024, 04:11

Sorry, no, but I won't show you what I think would be terrible coding practice for your problem.
Comment
Ridwan Sheikh

Join Date: Apr 2021

Posts: 169
#11

08 Feb 2024, 11:12

Thanks Nick Cox
I think i have been able to do it using the following code

Code:

#delimit ; local country_list "Australia Austria China Bangladesh India Pakistan"; #delimit cr local i = 1 foreach country in `country_list' { replace country = "`country'" if code == `i' local i = `i' + 1 }

It did exactly, what i needed. I am not sure what is terrible (as you said) in this case though.

Thank you for engaging with me on this forum.

Regards,
(Ridwan)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#12

09 Feb 2024, 03:35

Fair question.

First off, the method I recommended in #8 is an excellent clean method expounded and recommended by an expert Stata user, Kit Baum. (I won't mention myself as commending my own advice as coming from me would be vacuous or circular and might seem arrogant too!)

So, you need an argument that a loop approach is here as good or better. I really don't think it is, but I owe you more details on why. Here are some overlapping reasons.

Loops are wonderful and often lead to clean, concise, efficient code. But not always.

Here your example looks moderately simple, but you're looping in parallel, which is almost always more challenging.

But crucially, the approach doesn't scale well, meaning it becomes awkward and fallible as you extend it to larger problems. You have 6 names in your toy example and writing down the code for 6 names isn't too much work. Checking the code isn't too much work. But the real problem has 155 country names. Many of those will have spaces or other punctuation, so at a minimum you will need extra double quotes " " binding such names and compound double quotes `" "' binding the whole. I guess even user-programmers who thoroughly understand the principles there will confirm that getting those right can be fiddly: making small mistakes is common and frustrating and needs several iterations and much checking. And naturally the scope for typos, omissions or repetitions of any kind is large.

Even if you've typed out 155 country names and the code has run without Stata squawking, you've still expended time and effort in a way that's hard to benefit from. If you need something similar, you've got to retrieve your code with the local macro definition and do some awkward copying even if exactly the same code will help.

In contrast, a researcher often has the mapping of country names in another file, or such a file can be created quite easily. And that file will often be useful in related problems, or easily shared with others interested in following or benefitting from your work.

So, that's the gist of my case.
Comment

Announcement

replacing values in a loop

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment