Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • replacing values in a loop

    Dear All

    This is my first post on statalist. I have a household survey data (119,018 observations). It has a 5-digit variable named "stratum" where the first digit is for the province and second and third digit is for district. Fourth digit is for 'region' whether rural or urban and last digit is for income group. This data set I want to ultimately merge with shapefile for which I need district id variable which I will use as a key variable to merge with shapefile. I have created this variable by recoding the stratum equivalent to district ids in shapefile. There are 161 districts.
    I need another variable 'district name' for which I am using a loop but it does not return what I want. It assigns last district name to each observation. What I want is to have the name of each district in front of its code.
    Following is the code I have used so far. Thanks in advance.

    Code:
    clear
    input long stratum 
    12410 
    12410 
    12410 
    21110 
    21110
    25410
    25410
    25410 
    61121
    61121
    60010
    end
    
    capture drop distt_id
    gen distt_id=stratum
    recode distt_id  (12410=62) (21110=86) (25410=108) (61121 60010=54)
    label var distt_id "District ID"
    
    sort distt_id 
    
    capture drop distt_name
    gen distt_name = ""
    
    local id 54 62 86 108 
    local name `" "Islamabad" "Abbottabad" "Attock" "Nankana" "' 
    
    foreach i of local id {
    foreach v of local name {
            replace distt_name="`v'" if distt_id == `i'
    }
    }

  • #2
    Welcome to Statalist.

    I'm not sure I completely understand the results you want, but the following code - to replace the double loop at the bottom of your current code - corrects the problem you are having with the double loop.
    Code:
    local n : word count `id'
    forvalues w = 1/4 {
        local i : word `w' of `id'
        local v : word `w' of `name'
        replace distt_name="`v'" if distt_id == `i'
    }
    Code:
    . list, sepby(distt_id) noobs
    
      +---------------------------------+
      | stratum   distt_id   distt_name |
      |---------------------------------|
      |   60010         54    Islamabad |
      |   61121         54    Islamabad |
      |   61121         54    Islamabad |
      |---------------------------------|
      |   12410         62   Abbottabad |
      |   12410         62   Abbottabad |
      |   12410         62   Abbottabad |
      |---------------------------------|
      |   21110         86       Attock |
      |   21110         86       Attock |
      |---------------------------------|
      |   25410        108      Nankana |
      |   25410        108      Nankana |
      |   25410        108      Nankana |
      +---------------------------------+

    Comment


    • #3
      Dear William

      Many thanks for your reply. Yes, that is exactly what I wanted.

      Comment


      • #4
        I have a double loop problem. Please I need help

        Code:
        forvalues i=67/69 {
        local i = `i'
        foreach v in AMA ANC APU {
        clear
        import excel "http://proyectos.inei.gob.pe/web/biblioineipub/bancopub/Est/Lib0846/cuadros/c0`i'.xls", sheet("C 3.`i' `v'") cellrange(A11:AF17)
        
        drop B C D E F G H I J K L M N O P Q R S T U V W Z AA AB AC AD AE AF
        gen depart="`v'"
        rename A gedad
        rename X pob_mujfert2017
        rename Y pob_mujfert2018
        reshape long pob_mujfert , i(gedad) j(año)
        sort año
        
        bysort año: egen double mujfert = total(pob_mujfert)
        collapse (mean) mujfert , by(año depart)
        save "dpto_`i'",replace
        }
        }
        Last edited by Andrés Lahur Talavera Cuya; 11 Sep 2019, 14:49.

        Comment


        • #5
          I am trying to automate this:


          *AMA
          clear
          import excel "http://proyectos.inei.gob.pe/web/biblioineipub/bancopub/Est/Lib0846/cuadros/c067.xls", sheet("C 3.67 AMA") cellrange(A11:AF17)
          drop B C D E F G H I J K L M N O P Q R S T U V W Z AA AB AC AD AE AF
          gen depart="AMA"
          rename A gedad
          rename X pob_mujfert2017
          rename Y pob_mujfert2018
          reshape long pob_mujfert , i(gedad) j(año)
          sort año

          bysort año: egen double mujfert = total(pob_mujfert)
          collapse (mean) mujfert , by(año depart)
          save dpto_67,replace

          *ANC
          clear
          import excel "http://proyectos.inei.gob.pe/web/biblioineipub/bancopub/Est/Lib0846/cuadros/c068.xls", sheet("C 3.68 ANC") cellrange(A11:AF17)
          drop B C D E F G H I J K L M N O P Q R S T U V W Z AA AB AC AD AE AF
          gen depart="ANC"
          rename A gedad
          rename X pob_mujfert2017
          rename Y pob_mujfert2018
          reshape long pob_mujfert , i(gedad) j(año)
          sort año

          bysort año: egen double mujfert = total(pob_mujfert)
          collapse (mean) mujfert , by(año depart)
          save dpto_68,replace

          *APU
          clear
          import excel "http://proyectos.inei.gob.pe/web/biblioineipub/bancopub/Est/Lib0846/cuadros/c069.xls", sheet("C 3.69 APU") cellrange(A11:AF17)
          drop B C D E F G H I J K L M N O P Q R S T U V W Z AA AB AC AD AE AF
          gen depart="APU"
          rename A gedad
          rename X pob_mujfert2017
          rename Y pob_mujfert2018
          reshape long pob_mujfert , i(gedad) j(año)
          sort año

          bysort año: egen double mujfert = total(pob_mujfert)
          collapse (mean) mujfert , by(año depart)
          save dpto_69,replace
          Last edited by Andrés Lahur Talavera Cuya; 11 Sep 2019, 14:20.

          Comment


          • #6

            * SOLUTION: DOUBLE BUCLE PROBLEM:
            Code:
            cd "H:\GRAFICASANIMADAS\Ejemplo_46\temp\" 
            
            local values 67 68 69
            local variables `" "AMA" "ANC" "APU" "'
            
            
            local z: word count `values'
            forvalues n = 1/3 {
            
            local variable: word `n' of `variables'
            local value: word `n' of `values' 
            
            clear 
            import excel "http://proyectos.inei.gob.pe/web/biblioineipub/bancopub/Est/Lib0846/cuadros/c0`value'.xls", sheet("C 3.`value' `variable'") cellrange(A11:AF17)
            
            drop B C D E F G H I J K L M N O P Q R S T U V  W Z AA AB AC AD AE AF 
            gen depart="`variable'"
            rename A gedad                             
            rename X pob_mujfert2017   
            rename Y pob_mujfert2018 
            reshape long pob_mujfert , i(gedad) j(año)                       
            sort año   
                              
            bysort año: egen double mujfert = total(pob_mujfert)
            collapse (mean) mujfert , by(año depart)
            save "dpto_`variable'",replace
            }
            *******

            Comment


            • #7
              Hello!
              I have a following data structure :
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float(country year)
              1 1990
              1 1991
              1 1992
              1 1993
              1 1994
              2 1990
              2 1991
              2 1992
              2 1993
              2 1994
              3 1990
              3 1991
              3 1992
              3 1993
              3 1994
              end
              I need to replace country codes (numeric) with corresponding country names (string), something like this.

              replace country="Australia" if country==1
              replace country="Belgium" if country==2
              replace country="Canada" if country==3

              I have 155 such codes(countries). How can i do that in loop.

              Thanks,
              ​​​​​​​(Ridwan)

              Comment


              • #8
                #7 could only, so far as I can imagine, be written as a loop by placing 155 names in a local (or global, but don't go there) macro and looping over its contents. Not a welcome prospect.

                See Stata | FAQ: Efficiently defining group characteristics to create subsets for a much better idea. merge a dataset containing the mapping with your main dataset.

                Comment


                • #9
                  Thanks Nick Cox
                  This is a hypothetical data set, that i created running following codes

                  Code:
                  clear all
                  timer clear
                  set seed 10
                  
                  global T = 4
                  global I =  3
                  
                  set obs `=$I*$T'
                  
                  gen code = int((_n-1)/$T )+1
                  gen year = mod((_n-1),$T )+1990
                  
                  gen country="."
                  I am using much simpler approach by running the folllowng
                  Code:
                  replace country="Australia" if code==1
                  replace country="Austia" if code==2
                  .
                  .
                  .
                  replace country="Zambia" if code==155
                  But doing it this way for all the 155 countries seems a tedious task.

                  I do'nt know how to code it using local or global with the loop. Your help will be really appreciated


                  Thanks,
                  Last edited by Ridwan Sheikh; 08 Feb 2024, 05:11.

                  Comment


                  • #10
                    Sorry, no, but I won't show you what I think would be terrible coding practice for your problem.

                    Comment


                    • #11
                      Thanks Nick Cox
                      I think i have been able to do it using the following code

                      Code:
                      #delimit ;
                      local country_list "Australia Austria China
                                          Bangladesh India Pakistan";   
                      #delimit cr
                      local i = 1
                      foreach country in `country_list' {
                          replace country = "`country'" if code == `i'
                          local i = `i' + 1
                      }
                      It did exactly, what i needed. I am not sure what is terrible (as you said) in this case though.

                      Thank you for engaging with me on this forum.

                      Regards,
                      (Ridwan)

                      Comment


                      • #12
                        Fair question.

                        First off, the method I recommended in #8 is an excellent clean method expounded and recommended by an expert Stata user, Kit Baum. (I won't mention myself as commending my own advice as coming from me would be vacuous or circular and might seem arrogant too!)

                        So, you need an argument that a loop approach is here as good or better. I really don't think it is, but I owe you more details on why. Here are some overlapping reasons.

                        Loops are wonderful and often lead to clean, concise, efficient code. But not always.

                        Here your example looks moderately simple, but you're looping in parallel, which is almost always more challenging.

                        But crucially, the approach doesn't scale well, meaning it becomes awkward and fallible as you extend it to larger problems. You have 6 names in your toy example and writing down the code for 6 names isn't too much work. Checking the code isn't too much work. But the real problem has 155 country names. Many of those will have spaces or other punctuation, so at a minimum you will need extra double quotes " " binding such names and compound double quotes `" "' binding the whole. I guess even user-programmers who thoroughly understand the principles there will confirm that getting those right can be fiddly: making small mistakes is common and frustrating and needs several iterations and much checking. And naturally the scope for typos, omissions or repetitions of any kind is large.

                        Even if you've typed out 155 country names and the code has run without Stata squawking, you've still expended time and effort in a way that's hard to benefit from. If you need something similar, you've got to retrieve your code with the local macro definition and do some awkward copying even if exactly the same code will help.

                        In contrast, a researcher often has the mapping of country names in another file, or such a file can be created quite easily. And that file will often be useful in related problems, or easily shared with others interested in following or benefitting from your work.

                        So, that's the gist of my case.

                        Comment

                        Working...
                        X