Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Vertically aligning a variable on do file to edit/change code simultaneously on multiple lines

    Due to certain nature of my master data, I need to hard code my using data ( the following one ) from city and state name to countyfip code.

    My using data has all three required variables of all the cities , states and counties of USA. On Mac, if you hold down the option button ( in windows you hold down alt button ) and move the cursor it lets you do multiple lines of coding simultaneously if all the variables are aligned like the attached picture of my gen county and city name. But, whenever I'm doing the state this particular thing is not aligned.

    If I could make the state_id aligned vertically , then I can easily code manually ( 100 ore more observations just typing once ) . Is there anything I could do on the do file to align the state_id vertically so that I can apply the same method ? I'm attaching picture and picture both so that you have a better grasp of what I'm talking about. As you can see in my data and picture that the bottom half needs be coded simultaneously to change it from bottom half format to top half format. The problem arises, I cannot do it simultaneously , since the statefip is not vertically aligned. But, county and city are vertically aligned , so I can apply the method in those variables.

    Code:
    gen county = 41051    if city == "Portland"    & statefip =="OR"
    gen county = 48029    if city == "San Antonio"    & statefip == "TX"
    gen county = 29510    if city == "St. Louis"    & statefip =="MO"
    gen county = 6067    if city == "Sacramento"    & statefip =="CA"
    gen county = 12095    if city == "Orlando" & statefip == "FL"
    gen county = 6085    if city == "San Jose"    & statefip =="CA"
    gen county = 39035    if city == "Cleveland"    & statefip =="OH"
    gen county = 42003    if city == "Pittsburgh"    & statefip =="PA"
    gen county = 48453    if city == "Austin    TX" & statefip =="TX"
    gen county = 39061    if city == "Cincinnati"    & statefip =="OH"
    gen county = 29095    if city == "Kansas City" & statefip =="MO"
    gen county = 36061    if city == "Manhattan"    & statefip =="NY"
    gen county = 18097    if city == "Indianapolis" & statefip =="IN"
    gen county = 39049    if city == "Columbus"    & statefip =="OH"
    gen county = 37119    if city == "Charlotte"    & statefip =="NC"
    
    5119    Little Rock    AR
    6001    Oakland    CA
    37063    Durham    NC
    37067    Winston-Salem    NC
    12071    Bonita Springs    FL
    6065    Indio    CA
    12035    Palm Coast    FL
    47065    Chattanooga    TN
    53063    Spokane    WA
    36067    Syracuse    NY
    42071    Lancaster    PA
    48439    Arlington    TX
    6077    Stockton    CA
    36027    Poughkeepsie    NY
    13245    Augusta    GA
    16001    Boise    ID
    6111    Oxnard    CA
    42069    Scranton    PA
    6099    Modesto    CA
    12097    Kissimmee    FL
    8001    Aurora    CO
    39099    Youngstown    OH
    5143    Fayetteville    AR
    Last edited by Tariq Abdullah; 26 Oct 2022, 14:25.

  • #2
    Here is one way to achieve what you want:

    Step 1. Copy your bottom half stuff into a file called stuff.txt in your working directory.
    Step 2. Run this code:

    Code:
    clear
    tempname readfile writefile
    file open `readfile' using stuff.txt, read text
    file open `writefile' using stuff.do, write text replace
    file read `readfile' line
    while r(eof) == 0 {
        local line = ustrtrim(itrim("`line'"))
        local county = ustrtrim("`:word 1 of `line''")
        local line = ustrtrim(subinstr("`line'","`county'","",1))
        local state_id = substr("`line'",-2,.)
        local city = ustrtrim(substr("`line'",1,length("`line'")-3))
        local command `"gen county = `county' if city == "`city'" & statefip == "`state_id'""'
        file write `writefile' `"`command'"' _n
        file read `readfile' line
    }
    
    file close `readfile'
    file close `writefile'
    Step 3. Enjoy your new code in the file stuff.do

    Starting with a file called stuff.txt with the following contents:
    Code:
    5119     Little Rock    AR
    6001     Oakland    CA
    37063     Durham    NC
    37067     Winston-Salem    NC
    12071     Bonita Springs    FL
    6065     Indio    CA
    12035     Palm Coast    FL
    47065     Chattanooga    TN
    53063     Spokane    WA
    36067     Syracuse    NY
    42071     Lancaster    PA
    48439     Arlington    TX
    6077     Stockton    CA
    36027     Poughkeepsie    NY
    13245     Augusta    GA
    16001     Boise    ID
    6111     Oxnard    CA
    42069     Scranton    PA
    6099     Modesto    CA
    12097     Kissimmee    FL
    8001     Aurora    CO
    39099     Youngstown    OH
    5143     Fayetteville    AR
    you will end up with a file called stuff.do with the contents:
    Code:
    gen county = 5119 if city == "Little Rock" & statefip == "AR"
    gen county = 6001 if city == "Oakland" & statefip == "CA"
    gen county = 37063 if city == "Durham" & statefip == "NC"
    gen county = 37067 if city == "Winston-Salem" & statefip == "NC"
    gen county = 12071 if city == "Bonita Springs" & statefip == "FL"
    gen county = 6065 if city == "Indio" & statefip == "CA"
    gen county = 12035 if city == "Palm Coast" & statefip == "FL"
    gen county = 47065 if city == "Chattanooga" & statefip == "TN"
    gen county = 53063 if city == "Spokane" & statefip == "WA"
    gen county = 36067 if city == "Syracuse" & statefip == "NY"
    gen county = 42071 if city == "Lancaster" & statefip == "PA"
    gen county = 48439 if city == "Arlington" & statefip == "TX"
    gen county = 6077 if city == "Stockton" & statefip == "CA"
    gen county = 36027 if city == "Poughkeepsie" & statefip == "NY"
    gen county = 13245 if city == "Augusta" & statefip == "GA"
    gen county = 16001 if city == "Boise" & statefip == "ID"
    gen county = 6111 if city == "Oxnard" & statefip == "CA"
    gen county = 42069 if city == "Scranton" & statefip == "PA"
    gen county = 6099 if city == "Modesto" & statefip == "CA"
    gen county = 12097 if city == "Kissimmee" & statefip == "FL"
    gen county = 8001 if city == "Aurora" & statefip == "CO"
    gen county = 39099 if city == "Youngstown" & statefip == "OH"
    gen county = 5143 if city == "Fayetteville" & statefip == "AR"
    Last edited by Hemanshu Kumar; 26 Oct 2022, 15:24.

    Comment


    • #3
      I have almost 28,000 observations in my data which spans from one type of negate to different type of length. On top of that, when I applied the code you graciously provided gave me do file that's not consistent over all the observation.

      I should have provided a detailed dataset like I'm doing now for my dataset. You can see the minimum length to maximum length of every variable in my following data set. If you think, I can change the code above to fit my data , then please do kindly tell. I'm giving you the format of do file that I get after applying your code. I' going the format of my variables so that you can see how what's the highest length of my variables - if that helps.

      Code:
      input str35 city str2 state_id long county_fips
      Code:
      gen county = 36061    New if city == "York" & statefip == "NY"
      gen county = 6037    Los if city == "Angeles" & statefip == "CA"
      gen county = 17031    Chicago    IL if city == "" & statefip == ""
      gen county = 12086    Miami    FL if city == "" & statefip == ""
      gen county = 48113    Dallas    TX if city == "" & statefip == ""
      gen county = 42101    Philadelphia    PA if city == "" & statefip == ""
      gen county = 48201    Houston    TX if city == "" & statefip == ""
      gen county = 13121    Atlanta    GA if city == "" & statefip == ""
      gen county = 11001    Washington    DC if city == "" & statefip == ""
      gen county = 25025    Boston    MA if city == "" & statefip == ""
      gen county = 4013    Phoenix    AZ if city == "" & statefip == ""
      gen county = 53033    Seattle    WA if city == "" & statefip == ""
      gen county = 6075    San if city == "Francisco" & statefip == "CA"
      gen county = 26163    Detroit    MI if city == "" & statefip == ""
      gen county = 6073    San if city == "Diego" & statefip == "CA"
      gen county = 27053    Minneapolis    MN if city == "" & statefip == ""
      gen county = 12057    Tampa    FL if city == "" & statefip == ""
      gen county = 8031    Denver    CO if city == "" & statefip == ""
      gen county = 36047    Brooklyn    NY if city == "" & statefip == ""
      gen county = 36081    Queens    NY if city == "" & statefip == ""

      The following is the sample of my data which has different length for city and statefip variable- and therefore the inconsistent result on the do file. Since, county has almost no variation in length ( 4 to 5) , county was consistent

      Code:
      input str35 city str2 state_id long county_fips
      Code:
      36061    New York    NY
      6037    Los Angeles    CA
      17031    Chicago    IL
      12086    Miami    FL
      48113    Dallas    TX
      42101    Philadelphia    PA
      48201    Houston    TX
      13121    Atlanta    GA
      11001    Washington    DC
      25025    Boston    MA
      4013    Phoenix    AZ
      53033    Seattle    WA
      6075    San Francisco    CA
      26163    Detroit    MI
      6073    San Diego    CA
      27053    Minneapolis    MN
      12057    Tampa    FL
      37183    Raleigh    NC
      22071    New Orleans    LA
      21111    Louisville    KY
      40109    Oklahoma City    OK
      9001    Bridgeport    CT
      36029    Buffalo    NY
      48439    Fort Worth    TX
      9003    Hartford    CT
      4019    Tucson    AZ
      31055    Omaha    NE
      48141    El Paso    TX
      15003    Honolulu    HI
      48215    McAllen    TX
      35001    Albuquerque    NM
      1073    Birmingham    AL
      12115    Sarasota    FL
      39113    Dayton    OH
      36055    Rochester    NY
      6019    Fresno    CA
      42077    Allentown    PA
      40143    Tulsa    OK
      12071    Cape Coral    FL
      6013    Concord    CA
      8041    Colorado Springs    CO
      45019    Charleston    SC
      25013    Springfield    MA
      26065    Lansing    MI
      21067    Lexington    KY
      17031    Arlington Heights    IL
      17197    Bolingbrook    IL
      26125    Rochester Hills    MI
      51840    Winchester    VA
      25017    Framingham    MA
      45013    Hilton Head Island    SC
      5119    North Little Rock    AR
      6081    South San Francisco    CA
      12099    Palm Beach Gardens    FL
      4019    Catalina Foothills    AZ
      13135    Peachtree Corners    GA
      6037    Rancho Palos Verdes    CA
      12086    South Miami Heights    FL
      27037    Inver Grove Heights    MN
      49035    Cottonwood Heights    UT
      12019    Bellair-Meadowbrook Terrace    FL
      36119    Jefferson Valley-Yorktown    NY
      34029    Holiday City-Berkeley    NJ
      30049    Helena Valley West Central    MT
      6113    University of California-Davis    CA
      35001    Los Ranchos de Albuquerque    NM
      24043    Fountainhead-Orchard Hills    MD
      6097    Fetters Hot Springs-Agua Caliente    CA
      53033    Maple Heights-Lake Desire    WA
      42017    Brittany Farms-The Highlands    PA
      26163    Village of Grosse Pointe Shores    MI
      26021    Shorewood-Tower Hills-Harbert    MI
      28083    Mississippi Valley State University    MS
      42103    Pocono Mountain Lake Estates    PA
      23029    Passamaquoddy Pleasant Point    ME

      Comment


      • #4
        In post #1 you showed lines of plain text.

        In post #3 you showed lines of plain text preceded by an input command. What is the purpose of the input command? It will not read the lines of plain text - the variable names are not even in the correct order. And even if they were, the spaces in some city names will cause problems.

        The elegant code provided in post #2 by Hemanshu Kumar is designed to read a plain text file contains lines like those in post #1 and post #3. I can assure you when used as described - on a plain text file containing lines of data - it works as intended. I prepared the plain text file stuff.txt containing the lines from post #3 and it produced the lines shown, without the problems you showed in post #3.

        I can only assume that you did not follow the instructions given for creating the stuff.txt file that is to be read by the code from post #3.
        Code:
        . clear
        
        . type ~/Downloads/stuff.txt, lines(10) // let's see the first 20 lines of data
        36061    New York    NY
        6037    Los Angeles    CA
        17031    Chicago    IL
        12086    Miami    FL
        48113    Dallas    TX
        42101    Philadelphia    PA
        48201    Houston    TX
        13121    Atlanta    GA
        11001    Washington    DC
        25025    Boston    MA
        . 
        . tempname readfile writefile
        
        . file open `readfile' using ~/Downloads/stuff.txt, read text
        
        . file open `writefile' using ~/Downloads/stuff.do, write text replace
        
        . file read `readfile' line
        
        . while r(eof) == 0 {
          2.     local line = ustrtrim(itrim("`line'"))
          3.     local county = ustrtrim("`:word 1 of `line''")
          4.     local line = ustrtrim(subinstr("`line'","`county'","",1))
          5.     local state_id = substr("`line'",-2,.)
          6.     local city = ustrtrim(substr("`line'",1,length("`line'")-3))
          7.     local command `"gen county = `county' if city == "`city'" & statefip == "`state_id'""'
          8.     file write `writefile' `"`command'"' _n
          9.     file read `readfile' line
         10. }
        
        . 
        . file close `readfile'
        
        . file close `writefile'
        
        . 
        . type ~/Downloads/stuff.do, lines(10) // let's see the results
        gen county = 36061 if city == "New York" & statefip == "NY"
        gen county = 6037 if city == "Los Angeles" & statefip == "CA"
        gen county = 17031 if city == "Chicago" & statefip == "IL"
        gen county = 12086 if city == "Miami" & statefip == "FL"
        gen county = 48113 if city == "Dallas" & statefip == "TX"
        gen county = 42101 if city == "Philadelphia" & statefip == "PA"
        gen county = 48201 if city == "Houston" & statefip == "TX"
        gen county = 13121 if city == "Atlanta" & statefip == "GA"
        gen county = 11001 if city == "Washington" & statefip == "DC"
        gen county = 25025 if city == "Boston" & statefip == "MA"
        .

        Comment


        • #5
          Yes both of you are abolsutely right that I'm doing something wrong producing the data set to text file. Since, I have the dataset stored in csv file and in dta file , I'm just copy pasting them to my text editor file on Mac to convert it to a plain text file. From stata file when I'm copy pasting , it comes with the variable names, and therefore the variable name shows up. In excel file, I removed the variable names but still it shows up in the same format after applying the code generously provided by Mr. Kumar.

          I'm thinking of some other options how I can format my data as plain text format like you are setting up. But, I'm not entirely sure how I can do that other than copy pasting from stata .dta file or excel file.

          If anything comes to your mind. do kindly let me know.

          I apologies for any unintentional confusion and sincerest gratitude for giving me the right direction.

          Comment


          • #6
            The dataset I'm using it uploaded on public forum of Kaggle website. From , there I'm using the excel file of us cities where all the other information I'm using. From the excel file, I've copy pasted to my stata dta file and also using the same copy pasted data on my text editor

            https://www.kaggle.com/datasets/serg...ities-database

            Comment


            • #7
              I converted my dta file to txt format by following command

              Code:
              outfile using final.txt, wide replace
              So, that converts my dta file without any variables. Then after applying the command I'm getting this error:

              Code:
              while r(eof) == 0 {
                2.     local line = ustrtrim(itrim("`line'"))
                3.     local county = ustrtrim("`:word 1 of `line''")
                4.     local line = ustrtrim(subinstr("`line'","`county'","",1))
                5.     local state_id = substr("`line'",-2,.)
                6.     local city = ustrtrim(substr("`line'",1,length("`line'")-3))
                7.     local command `"gen county = `county' if city == "`city'" & statefip == "`state_id'""'
                8.     file write `writefile' `"`command'"' _n
                9.     file read `readfile' line
               10. }
              Salem"                        "NC"         " invalid name
              r(198);
              
              end of do-file
              
              r(198);
              One of my observation is like the following case. Is the hiphen between Winston and Salem creating the error:

              Code:
              37067  "Winston-Salem"                        "NC"

              Comment


              • #8
                What exactly are you attempting to accomplish? If you are trying to create start with the the generate commands shown at the top of post #1, then since you have your data in Stata dataset you can write state code to create the commands and output the commands to a text file.
                Code:
                . use ~/Downloads/stuff.dta, clear
                
                . list, clean noobs
                
                     var1                                  var2   var3  
                    36061                              New York     NY  
                     6037                           Los Angeles     CA  
                    17031                               Chicago     IL  
                    12086                                 Miami     FL  
                    48113                                Dallas     TX  
                    26163       Village of Grosse Pointe Shores     MI  
                    26021         Shorewood-Tower Hills-Harbert     MI  
                    28083   Mississippi Valley State University     MS  
                    42103          Pocono Mountain Lake Estates     PA  
                    23029          Passamaquoddy Pleasant Point     ME  
                
                . generate command = `"gen county = "' + strofreal(var1) + ///
                >                    `" if city == ""' + var2 + ///
                >                    `"" & statefip == ""' + var3 + `"""'
                
                . outfile command using stuff.do, noquote replace
                
                . type stuff.do
                gen county = 36061 if city == "New York" & statefip == "NY"
                gen county = 6037 if city == "Los Angeles" & statefip == "CA"
                gen county = 17031 if city == "Chicago" & statefip == "IL"
                gen county = 12086 if city == "Miami" & statefip == "FL"
                gen county = 48113 if city == "Dallas" & statefip == "TX"
                gen county = 26163 if city == "Village of Grosse Pointe Shores" & statefip == "MI"
                gen county = 26021 if city == "Shorewood-Tower Hills-Harbert" & statefip == "MI"
                gen county = 28083 if city == "Mississippi Valley State University" & statefip == "MS"
                gen county = 42103 if city == "Pocono Mountain Lake Estates" & statefip == "PA"
                gen county = 23029 if city == "Passamaquoddy Pleasant Point" & statefip == "ME"
                
                .

                Comment


                • #9
                  I owe you a big thanks to Mr. Kumar and Mr. lisowski for helping me to get out of this terrible issue. This was a very challenging task to clean up this dataset from the scratch.

                  With both of your kind help and suggestion finally I've become able to clean up my current dataset to the format I needed to take my research further ahead ! Words won't be enough to show my gratitude !

                  Comment


                  • #10
                    I have to say that I am at a total loss as to what you are trying to accomplish.

                    With your using data in a Stata dataset, you could merge the using dataset to your primary dataset . But first you have to address the problem that your using data has, for example, two cities in New York named Woodbury - assigned to different counties - and your primary dataset has no way to distinguish which county is the correct one to match - and it could be the case that some of observations from Woodbury NY in your primary dataset come from one county and some from the other county.

                    Converting your dataset to generate commands does not solve that problem - there will be two generate commands for Woodbury NY with two different values of the county code.

                    Also, the string of generate commands will not run.
                    Code:
                    . sysuse auto
                    (1978 automobile data)
                    
                    . generate f = 1 if foreign==1
                    (52 missing values generated)
                    
                    . generate f = 0 if foreign==0
                    variable f already defined
                    r(110);
                    What you need is
                    Code:
                    generate long county = .
                    replace county = 36061 if city == "New York" & statefip == "NY"
                    ...
                    Finally,
                    From the excel file, I've copy pasted to my stata dta file
                    This is not as good an idea as using the import excel command, or using the import delimited command with the original csv dataset.

                    Comment


                    • #11
                      Yeah I realized it later Thankfully the hold down option button was there to change 14,000 lines of changing code lines was there which got the job Jone in two attempts Thanks again so much for being so kind and generous with your thoughts and suggestions!

                      Comment

                      Working...
                      X