
  • Looping with levelsof command

    Hi,
    I am trying to "standardise" my variable prim_state to two letters only. It contains the codes of both single states and multiple states. In the latter case, I want to reduce the value to the code of the first state only. For example, reduce the observation "OH-KY-IN" to "OH".

    I am not sure how to use levelsof, a foreach loop, the string length function, and the replace command to set this up. Any help would be highly appreciated. Thank you.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str11 prim_state
    "TX"      
    "OH"      
    "GA"      
    "OR"      
    "NY"      
    "NM"      
    "LA"      
    "PA-NJ"   
    "PA"      
    "TX"      
    "IA"      
    "AK"      
    "MI"      
    "AL"      
    "WI"      
    "NC"      
    "GA"      
    "GA"      
    "NJ"      
    "AL"      
    "GA-SC"   
    "TX"      
    "CA"      
    "MD"      
    "ME"      
    "MA"      
    "LA"      
    "MI"      
    "MI"      
    "TX"      
    "WV"      
    "WA"      
    "OR"      
    "MT"      
    "NY"      
    "AL"      
    "ND"      
    "VA"      
    "IL"      
    "IN"      
    "PA"      
    "ID"      
    "MA-NH"   
    "CO"      
    "KY"      
    "WA"      
    "CT"      
    "TX"      
    "GA"      
    "NY"      
    "NC"      
    "VT"      
    "MD"      
    "OH"      
    "FL"      
    "MO-IL"   
    "IL"      
    "NV"      
    "WY"      
    "IA"      
    "PA"      
    "IL"      
    "WV"      
    "SC"      
    "NC-SC"   
    "VA"      
    "TN-GA"   
    "WY"      
    "IL-IN-WI"
    "CA"      
    "OH-KY-IN"
    "TN-KY"   
    "TN"      
    "OH"      
    "ID"      
    "TX"      
    "CO"      
    "MO"      
    "SC"      
    "GA-AL"   
    "IN"      
    "OH"      
    "TX"      
    "OR"      
    "FL"      
    "MD-WV"   
    "TX"      
    "GA"      
    "IL"      
    "AL"      
    "IA-IL"   
    "OH"      
    "AL"      
    "IL"      
    "FL"      
    "CO"      
    "IA"      
    "MI"      
    "AL"      
    "DE"      
    end

  • #2
    No need for loops here.
    Code:
    . generate str state1 = substr(prim_state,1,2)
    
    . list if strlen(prim_state)>2, clean
    
           prim_s~e   state1  
      8.      PA-NJ       PA  
     21.      GA-SC       GA  
     43.      MA-NH       MA  
     56.      MO-IL       MO  
     65.      NC-SC       NC  
     67.      TN-GA       TN  
     69.   IL-IN-WI       IL  
     71.   OH-KY-IN       OH  
     72.      TN-KY       TN  
     80.      GA-AL       GA  
     86.      MD-WV       MD  
     91.      IA-IL       IA
    Last edited by William Lisowski; 29 Jul 2018, 17:37.
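A slightly more general variant of the same no-loop idea, offered as an untested sketch rather than anything posted in this thread: instead of always taking two characters, keep everything before the first hyphen. Appending "-" to the value before calling strpos() guarantees that a hyphen is found, so single-state values pass through unchanged. The input lines are a small subset of the posted example data.

```stata
* small subset of the posted example data
clear
input str11 prim_state
"TX"
"PA-NJ"
"IL-IN-WI"
"OH-KY-IN"
end

* strpos() gives the position of the first "-"; the appended "-"
* ensures single-state values like "TX" are kept whole
generate str state1 = substr(prim_state, 1, strpos(prim_state + "-", "-") - 1)

list, clean
```

This only matters if some codes were not exactly two letters; for the data posted here, substr(prim_state,1,2) gives the same result.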



    • #3
      Thank you, William.
      Yes, your solution is much simpler.

      I was thinking about creating a loop over the values returned by levelsof prim_state that replaces observations based on the string length of each value.
      I would be very interested in that solution too, just to build my understanding of locals in Stata.



      • #4
        Here is sample code that uses levelsof to generate a list of the distinct values of prim_state, then looks at each value, checks whether it is longer than 2 characters, and if it is, uses the first two characters to replace the value of prim_state.
        Code:
        generate str original = prim_state
        levelsof prim_state, local(ps_list)
        foreach ps of local ps_list {
            // do something
            local len : strlen local ps
            if `len' > 2 {
                local st1 = substr("`ps'",1,2)
                replace prim_state = "`st1'" if prim_state=="`ps'"
                }
            }
        list if prim_state!=original, clean
        I provide this as an example of using levelsof and a foreach loop to "do something" for each value returned by levelsof. It's just that in this case, the "something" that it's doing is not a good approach to this particular problem.
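Since the stated goal is to understand locals, here is a small hypothetical sketch (not part of the thread) of what levelsof actually stores in the local and how foreach steps through it. It uses a few values from the posted data, with "TX" repeated to show that levelsof keeps only the distinct values.

```stata
* a few values from the posted data; "TX" appears twice on purpose
clear
input str11 prim_state
"TX"
"PA-NJ"
"TX"
"OH-KY-IN"
end

* levelsof stores each distinct value, wrapped in compound quotes, in the local
levelsof prim_state, local(ps_list)

* show the raw contents of the local
macro list _ps_list

* count the distinct values; each quoted item counts as one word
local n : word count `ps_list'
display "number of distinct values: `n'"

* foreach hands the values to `ps' one at a time, without the quotes
foreach ps of local ps_list {
    display "`ps'"
}
```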



        • #5
          Thanks a ton for the sample code, and for pointing out the shortcomings of the approach.
          Also, when I run the code, I get the error
          "strlen not allowed
          r(101);"



          • #6
            Works for me in Stata 15.1, and nothing I do to try to induce the error message you got seems to cause it. If you can't get it to work, copy all the commands, output, and error message from the Stata results window and paste it into a CODE block.
            Code:
            . generate str original = prim_state
            
            . levelsof prim_state, local(ps_list)
            `"AK"' `"AL"' `"CA"' `"CO"' `"CT"' `"DE"' `"FL"' `"GA"' `"GA-AL"' `"GA-SC"' `"IA"' `"IA-IL"' `"I
            > D"' `"IL"' `"IL-IN-WI"' `"IN"' `"KY"' `"LA"' `"MA"' `"MA-NH"' `"MD"' `"MD-WV"' `"ME"' `"MI"' `
            > "MO"' `"MO-IL"' `"MT"' `"NC"' `"NC-SC"' `"ND"' `"NJ"' `"NM"' `"NV"' `"NY"' `"OH"' `"OH-KY-IN"'
            >  `"OR"' `"PA"' `"PA-NJ"' `"SC"' `"TN"' `"TN-GA"' `"TN-KY"' `"TX"' `"VA"' `"VT"' `"WA"' `"WI"'
            > `"WV"' `"WY"'
            
            . foreach ps of local ps_list {
              2.     // do something
            .     local len : strlen local ps
              3.     if `len' > 2 {
              4.         local st1 = substr("`ps'",1,2)
              5.         replace prim_state = "`st1'" if prim_state=="`ps'"
              6.         }
              7.     }
            (1 real change made)
            (1 real change made)
            (1 real change made)
            (1 real change made)
            (1 real change made)
            (1 real change made)
            (1 real change made)
            (1 real change made)
            (1 real change made)
            (1 real change made)
            (1 real change made)
            (1 real change made)
            
            . list if prim_state!=original, clean
            
                   prim_s~e   original  
              8.         PA      PA-NJ  
             21.         GA      GA-SC  
             43.         MA      MA-NH  
             56.         MO      MO-IL  
             65.         NC      NC-SC  
             67.         TN      TN-GA  
             69.         IL   IL-IN-WI  
             71.         OH   OH-KY-IN  
             72.         TN      TN-KY  
             80.         GA      GA-AL  
             86.         MD      MD-WV  
             91.         IA      IA-IL  
            
            .
            Last edited by William Lisowski; 29 Jul 2018, 19:35.



            • #7
              I have Stata 13.0.

              Code:
              . do "C:\Users\arvindsh\AppData\Local\Temp\STD00000000.tmp"
              
              .  
              . clear
              
              . input str11 prim_state
              
                    prim_state
                1. "TX"      
                2. "OH"      
                3. "GA"      
                4. "OR"      
                5. "NY"      
                6. "NM"      
                7. "LA"      
                8. "PA-NJ"   
                9. "PA"      
               10. "TX"      
               11. "IA"      
               12. "AK"      
               13. "MI"      
               14. "AL"      
               15. "WI"      
               16. "NC"      
               17. "GA"      
               18. "GA"      
               19. "NJ"      
               20. "AL"      
               21. "GA-SC"   
               22. "TX"      
               23. "CA"      
               24. "MD"      
               25. "ME"      
               26. "MA"      
               27. "LA"      
               28. "MI"      
               29. "MI"      
               30. "TX"      
               31. "WV"      
               32. "WA"      
               33. "OR"      
               34. "MT"      
               35. "NY"      
               36. "AL"      
               37. "ND"      
               38. "VA"      
               39. "IL"      
               40. "IN"      
               41. "PA"      
               42. "ID"      
               43. "MA-NH"   
               44. "CO"      
               45. "KY"      
               46. "WA"      
               47. "CT"      
               48. "TX"      
               49. "GA"      
               50. "NY"      
               51. "NC"      
               52. "VT"      
               53. "MD"      
               54. "OH"      
               55. "FL"      
               56. "MO-IL"   
               57. "IL"      
               58. "NV"      
               59. "WY"      
               60. "IA"      
               61. "PA"      
               62. "IL"      
               63. "WV"      
               64. "SC"      
               65. "NC-SC"   
               66. "VA"      
               67. "TN-GA"   
               68. "WY"      
               69. "IL-IN-WI"
               70. "CA"      
               71. "OH-KY-IN"
               72. "TN-KY"   
               73. "TN"      
               74. "OH"      
               75. "ID"      
               76. "TX"      
               77. "CO"      
               78. "MO"      
               79. "SC"      
               80. "GA-AL"   
               81. "IN"      
               82. "OH"      
               83. "TX"      
               84. "OR"      
               85. "FL"      
               86. "MD-WV"   
               87. "TX"      
               88. "GA"      
               89. "IL"      
               90. "AL"      
               91. "IA-IL"   
               92. "OH"      
               93. "AL"      
               94. "IL"      
               95. "FL"      
               96. "CO"      
               97. "IA"      
               98. "MI"      
               99. "AL"      
              100. "DE"      
              101. end
              
              . 
              . 
              . generate str original = prim_state
              
              . levelsof prim_state, local(ps_list)
              `"AK"' `"AL"' `"CA"' `"CO"' `"CT"' `"DE"' `"FL"' `"GA"' `"GA-AL"' `"GA-SC"' `"IA"' `"IA-IL"' `"ID"' `"IL"' `"IL-IN-WI"' `"IN"' `"KY"' `
              > "LA"' `"MA"' `"MA-NH"' `"MD"' `"MD-WV"' `"ME"' `"MI"' `"MO"' `"MO-IL"' `"MT"' `"NC"' `"NC-SC"' `"ND"' `"NJ"' `"NM"' `"NV"' `"NY"' `"O
              > H"' `"OH-KY-IN"' `"OR"' `"PA"' `"PA-NJ"' `"SC"' `"TN"' `"TN-GA"' `"TN-KY"' `"TX"' `"VA"' `"VT"' `"WA"' `"WI"' `"WV"' `"WY"'
              
              . foreach ps of local ps_list {
                2.     // do something
              .     local len : strlen local ps
                3.     if `len' > 2 {
                4.         local st1 = substr("`ps'",1,2)
                5.         replace prim_state = "`st1'" if prim_state=="`ps'"
                6.         }
                7.     }
              strlen not allowed
              r(101);
              
              end of do-file
              
              r(101);
              
              .



              • #8
                That was a change in Stata 14 that coincided with the introduction of Unicode support. I haven't tried it, but if you substitute length for strlen, I believe you'll be fine in Stata 13: Stata 13 cannot store Unicode characters, so a length in bytes and a length in characters are the same thing there. Compare the two help entries:

                help extended_fnc in Stata 14:
                Code:
                 strlen { local | global } mname
                            returns the length of the contents of mname in bytes.  If mname is
                            undefined, 0 is returned. For instance,
                
                                . constraint 1 price = weight
                
                                . local myname : constraint 1
                
                                . macro list _myname
                                _myname          price = weight
                
                                . local lmyname : strlen local myname
                
                                . macro list _lmyname
                                _lmyname:        14

                help extended_fnc in Stata 13:
                Code:
                 length { local | global } macname
                            returns the length of macroname in characters.  If macroname is
                            undefined, 0 is returned. For instance,
                
                                . constraint 1 price = weight
                
                                . local myname : constraint 1
                
                                . macro list _myname
                                _myname          price = weight
                
                                . local lmyname : length local myname
                
                                . macro list _lmyname
                                _lmyname:        14
                Stata/MP 14.1 (64-bit x86-64)
                Revision 19 May 2016
                Win 8.1
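One way to sidestep the strlen/length difference between versions altogether is to test for a hyphen with the strpos() function instead of measuring the value with a macro extended function. The sketch below is untested and assumes strpos() behaves the same way in Stata 13 and Stata 14+ for plain ASCII codes like these; as noted in #4, a loop is still not the best approach to this particular problem. The input lines are a small subset of the posted example data.

```stata
* small subset of the posted example data
clear
input str11 prim_state
"TX"
"PA-NJ"
"IL-IN-WI"
"OH-KY-IN"
end

generate str original = prim_state
levelsof prim_state, local(ps_list)
foreach ps of local ps_list {
    * position of the first "-" in this value; 0 means a single state
    local dash = strpos("`ps'", "-")
    if `dash' > 0 {
        * keep only the text before the first "-"
        replace prim_state = substr("`ps'", 1, `dash' - 1) if prim_state == "`ps'"
    }
}
list if prim_state != original, clean
```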



                • #9
                  https://www.statalist.org/forums/help#version

                  11. What should I say about the version of Stata I use?

                  The current version of Stata is 15.1. Please specify if you are using an earlier version; otherwise, the answer to your question may refer to commands or features unavailable to you. Moreover, as bug fixes and new features are issued frequently by StataCorp, make sure that you update your Stata before posting a query, as your problem may already have been solved.



                  • #10
                    Hi Wilson,
                    Yes, substituting length for strlen solves the problem. Thank you.

