Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating variables with 500+ conditions

    I am trying to map out school districts based on precinct numbers (which is present in both data sets that I am working with). The precinct numbers are contained in a different excel sheet that looks like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long precinct str36 name1
     50001 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50002 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50003 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50004 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50005 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50010 "LA CANADA UNIFIED SCHOOL"      
     50011 "LA CANADA UNIFIED SCHOOL"      
     50014 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50016 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50019 "LA CANADA UNIFIED SCHOOL"      
     50020 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50021 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50022 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50023 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50024 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50025 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50026 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50027 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50028 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50051 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50052 "LA CANADA UNIFIED SCHOOL"      
     50052 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50053 "LA CANADA UNIFIED SCHOOL"      
     50054 "LA CANADA UNIFIED SCHOOL"      
     50056 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50059 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50060 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50061 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50062 "ACTON-AGUA DULCE UNIF SCHOOL"  
     50063 "ACTON-AGUA DULCE UNIF SCHOOL"  
     70001 "LAS VIRGENES UNIFIED SCHOOL"    
     70002 "LAS VIRGENES UNIFIED SCHOOL"    
     70004 "LAS VIRGENES UNIFIED SCHOOL"    
     70006 "LAS VIRGENES UNIFIED SCHOOL"    
     70007 "LAS VIRGENES UNIFIED SCHOOL"    
     70008 "LAS VIRGENES UNIFIED SCHOOL"    
     70027 "LAS VIRGENES UNIFIED SCHOOL"    
     70040 "LAS VIRGENES UNIFIED SCHOOL"    
     70041 "LAS VIRGENES UNIFIED SCHOOL"    
     70041 "SANTA MONICA-MALIBU UNIF SCHOOL"
     70207 "LAS VIRGENES UNIFIED SCHOOL"    
     80001 "LAS VIRGENES UNIFIED SCHOOL"    
     80002 "LAS VIRGENES UNIFIED SCHOOL"    
     80003 "LAS VIRGENES UNIFIED SCHOOL"    
     80012 "LAS VIRGENES UNIFIED SCHOOL"    
     80021 "LAS VIRGENES UNIFIED SCHOOL"    
     80022 "LAS VIRGENES UNIFIED SCHOOL"    
     80025 "LAS VIRGENES UNIFIED SCHOOL"    
     80034 "LAS VIRGENES UNIFIED SCHOOL"    
     80038 "LAS VIRGENES UNIFIED SCHOOL"    
     80043 "LAS VIRGENES UNIFIED SCHOOL"    
     80047 "LAS VIRGENES UNIFIED SCHOOL"    
     80050 "LAS VIRGENES UNIFIED SCHOOL"    
     80051 "LAS VIRGENES UNIFIED SCHOOL"    
     80052 "LAS VIRGENES UNIFIED SCHOOL"    
     80053 "LAS VIRGENES UNIFIED SCHOOL"    
     80054 "LAS VIRGENES UNIFIED SCHOOL"    
     80070 "LAS VIRGENES UNIFIED SCHOOL"    
     80074 "LAS VIRGENES UNIFIED SCHOOL"    
     80075 "LAS VIRGENES UNIFIED SCHOOL"    
     90001 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90002 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90003 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90004 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90005 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90006 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90007 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90008 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90009 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90010 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90012 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90014 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90015 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90016 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90017 "ACTON-AGUA DULCE UNIF SCHOOL"  
     90018 "ACTON-AGUA DULCE UNIF SCHOOL"  
    150001 "ALHAMBRA USD"                  
    150001 "ALHAMBRA USD-1ST DISTRICT"      
    150002 "ALHAMBRA USD"                  
    150002 "ALHAMBRA USD-1ST DISTRICT"      
    150003 "ALHAMBRA USD"                  
    150003 "ALHAMBRA USD-1ST DISTRICT"      
    150004 "ALHAMBRA USD-1ST DISTRICT"      
    150004 "ALHAMBRA USD"                  
    150005 "ALHAMBRA USD"                  
    150005 "SAN GABRIEL UNIFIED SCHOOL"    
    150005 "ALHAMBRA USD-2ND DISTRICT"      
    150006 "ALHAMBRA USD-3RD DISTRICT"      
    150006 "ALHAMBRA USD"                  
    150007 "ALHAMBRA USD-1ST DISTRICT"      
    150007 "ALHAMBRA USD"                  
    150008 "ALHAMBRA USD-1ST DISTRICT"      
    150008 "ALHAMBRA USD"                  
    150009 "ALHAMBRA USD-1ST DISTRICT"      
    150009 "ALHAMBRA USD"                  
    150010 "ALHAMBRA USD-1ST DISTRICT"      
    150010 "ALHAMBRA USD"                  
    150010 "ALHAMBRA USD-2ND DISTRICT"      
    150011 "ALHAMBRA USD"                  
    150011 "ALHAMBRA USD-3RD DISTRICT"      
    end
    I have been trying to do this in a rather tedious way by running the following code for each district:

    PHP Code:
    inlist2 precinctname(Acton_Laguna_Dulcevalues(50028,50059,50001,50062,50024,50052,50062,50061,50001,50059,50004,50003,1770008,90004,1770008,50001,50005,90009,1770015,90003,50004,50004,6850010,50002,50028,50001,50005,50028,50014,50005,90004,50004,50022,90015,90003,50001,50001,50059,50016,90016,50063,50022,50016,50059,90003,50025,50062,50016,90003,90008,50004,50003,50004,6850010,90003,50025,50056,90018,50021,1770015,50022,50063,90003,90004,6850024,50001,1770008,50061,50004,50059,50001,90010,1770008,50059,50051,50060,50005,50001,90007,50003,90018,50027,50061,50059,90004,1770015,90003,5000014,50003,50003,50005,50016,50001,90005,90003,90004,90001,50024,90003,50016,90003,50001,5000014,50004,90003,90003,50026,50063,50025,90006,50059,50059,50063,50021,50002,50023,90016,90017,50002,50022,50021,5000014,90003,50003,50023,90002,50016,50001,50004,5000014,50059,50001,90003,50061,90010,5000014,50014,50005,90002,50059,6850031,50061,90014,6850032,90003,50063,50061,90002,90012,50014,50059,50016,50020,50002
    While this works for districts that contain fewer precincts like the Acton-Laguna Dulce it does not work for bigger districts like Los Angeles since inlist2 stops reading the precinct numbers after a certain point, citing a syntax error and resulting in a lot of missing values. I have also tried merging the data set above with the main data set that I want to create the districts in but this has not worked either since precinct was not able to match the observations of the two datasets uniquely, resulting in an error. I am also adding a data example of the main dataset for your reference.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(voter_id precinct)
      321 3450020
     1006 9000202
     1327 5150015
     1541 1940011
     1648 1000033
     1755 6220133
     1862 9000801
     2119 9000001
     2226  950003
     2547 5400047
     3339 7800006
     3981 9001312
     4024 5500109
     4131 1040020
     4559 3430010
     5565 1850040
     6143  750011
     6571 1770306
     6999  750043
     7149 6250029
     7898 1960005
     9161  950160
     9268 6250052
     9482 7750017
     9703 4800046
    10112 6100015
    10433 4050004
    11118 2550073
    11332 1500018
    11439  750014
    11974   50005
    12231   50005
    12445 5400048
    12552 5400048
    12980 9002416
    13237 9005821
    13344 9002160
    13451 9000214
    13879  950013
    14457 9002248
    14885   50016
    14992 9000151
    15891 1850033
    15998 3330010
    16255 2000012
    16362 2040015
    16897 6000009
    17047  950161
    17154 2700013
    17261 4800023
    17582 6000019
    17689   50003
    17796   50003
    18267 7700154
    18374 6700025
    18702 2600108
    18809 3550020
    19922 9003988
    20224 3650005
    20759 9001563
    21230   50016
    21337   50016
    21551 7700167
    21765 3850376
    22129 4800046
    22343 3850246
    22771 1070003
    22878   50004
    22985   50004
    23242   70008
    23456 3400014
    23777 4800046
    23991 1770001
    24034 1040042
    24462 9002107
    24569 9000288
    24676 1300021
    24890 6220125
    25254 9000549
    25896 9001799
    26153 3850366
    26367 9000963
    26795 3850212
    26802  600007
    26909   80052
    27052 6600005
    27915 6820009
    28058 1030026
    28379 3850104
    28486   50062
    28921 1040251
    29492 5230041
    29713 1450018
    30015 1030049
    30550 9002502
    31128 9001257
    32134 3300040
    32669 9000357
    32883 3450010
    32990 5150004
    end

    Also, the ideal outcome would be for regions like Alhambra USD, Alhambra USD 2ND DISTRICT, ALHAMBRA USD-3RD DISTRICT and so on to be coded as one district (i.e. just Alhambra) but this is largely a secondary concern.

    Does anyone have any thoughts on how to accomplish this?

  • #2
    Any of the mentioned approaches are fine in principle, but since you have so many districts, and you want to assign different subsets to new categories, a m:1 merge is most suitable.

    Suppose you have two datasets:
    1) the data you wish to analyze which includes district ID. Naturally you will have multiple observations at that level.
    2) a dataset to be used to assign district ID to region. Each observation in this dataset contains at a minimum the district ID and associated new region ID.

    This is example code that shows you the above idea.

    Code:
    use dataset1, clear
    merge m:1 precinct using dataset2

    Comment


    • #3
      Originally posted by Asteris Dougalis View Post
      I am trying to map out school districts based on precinct numbers (which is present in both data sets that I am working with). The precinct numbers are contained in a different excel sheet that looks like this:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long precinct str36 name1
      50001 "ACTON-AGUA DULCE UNIF SCHOOL"
      50002 "ACTON-AGUA DULCE UNIF SCHOOL"
      50003 "ACTON-AGUA DULCE UNIF SCHOOL"
      50004 "ACTON-AGUA DULCE UNIF SCHOOL"
      50005 "ACTON-AGUA DULCE UNIF SCHOOL"
      50010 "LA CANADA UNIFIED SCHOOL"
      50011 "LA CANADA UNIFIED SCHOOL"
      50014 "ACTON-AGUA DULCE UNIF SCHOOL"
      50016 "ACTON-AGUA DULCE UNIF SCHOOL"
      50019 "LA CANADA UNIFIED SCHOOL"
      50020 "ACTON-AGUA DULCE UNIF SCHOOL"
      50021 "ACTON-AGUA DULCE UNIF SCHOOL"
      50022 "ACTON-AGUA DULCE UNIF SCHOOL"
      50023 "ACTON-AGUA DULCE UNIF SCHOOL"
      50024 "ACTON-AGUA DULCE UNIF SCHOOL"
      50025 "ACTON-AGUA DULCE UNIF SCHOOL"
      50026 "ACTON-AGUA DULCE UNIF SCHOOL"
      50027 "ACTON-AGUA DULCE UNIF SCHOOL"
      50028 "ACTON-AGUA DULCE UNIF SCHOOL"
      50051 "ACTON-AGUA DULCE UNIF SCHOOL"
      50052 "LA CANADA UNIFIED SCHOOL"
      50052 "ACTON-AGUA DULCE UNIF SCHOOL"
      50053 "LA CANADA UNIFIED SCHOOL"
      50054 "LA CANADA UNIFIED SCHOOL"
      50056 "ACTON-AGUA DULCE UNIF SCHOOL"
      50059 "ACTON-AGUA DULCE UNIF SCHOOL"
      50060 "ACTON-AGUA DULCE UNIF SCHOOL"
      50061 "ACTON-AGUA DULCE UNIF SCHOOL"
      50062 "ACTON-AGUA DULCE UNIF SCHOOL"
      50063 "ACTON-AGUA DULCE UNIF SCHOOL"
      70001 "LAS VIRGENES UNIFIED SCHOOL"
      70002 "LAS VIRGENES UNIFIED SCHOOL"
      70004 "LAS VIRGENES UNIFIED SCHOOL"
      70006 "LAS VIRGENES UNIFIED SCHOOL"
      70007 "LAS VIRGENES UNIFIED SCHOOL"
      70008 "LAS VIRGENES UNIFIED SCHOOL"
      70027 "LAS VIRGENES UNIFIED SCHOOL"
      70040 "LAS VIRGENES UNIFIED SCHOOL"
      70041 "LAS VIRGENES UNIFIED SCHOOL"
      70041 "SANTA MONICA-MALIBU UNIF SCHOOL"
      70207 "LAS VIRGENES UNIFIED SCHOOL"
      80001 "LAS VIRGENES UNIFIED SCHOOL"
      80002 "LAS VIRGENES UNIFIED SCHOOL"
      80003 "LAS VIRGENES UNIFIED SCHOOL"
      80012 "LAS VIRGENES UNIFIED SCHOOL"
      80021 "LAS VIRGENES UNIFIED SCHOOL"
      80022 "LAS VIRGENES UNIFIED SCHOOL"
      80025 "LAS VIRGENES UNIFIED SCHOOL"
      80034 "LAS VIRGENES UNIFIED SCHOOL"
      80038 "LAS VIRGENES UNIFIED SCHOOL"
      80043 "LAS VIRGENES UNIFIED SCHOOL"
      80047 "LAS VIRGENES UNIFIED SCHOOL"
      80050 "LAS VIRGENES UNIFIED SCHOOL"
      80051 "LAS VIRGENES UNIFIED SCHOOL"
      80052 "LAS VIRGENES UNIFIED SCHOOL"
      80053 "LAS VIRGENES UNIFIED SCHOOL"
      80054 "LAS VIRGENES UNIFIED SCHOOL"
      80070 "LAS VIRGENES UNIFIED SCHOOL"
      80074 "LAS VIRGENES UNIFIED SCHOOL"
      80075 "LAS VIRGENES UNIFIED SCHOOL"
      90001 "ACTON-AGUA DULCE UNIF SCHOOL"
      90002 "ACTON-AGUA DULCE UNIF SCHOOL"
      90003 "ACTON-AGUA DULCE UNIF SCHOOL"
      90004 "ACTON-AGUA DULCE UNIF SCHOOL"
      90005 "ACTON-AGUA DULCE UNIF SCHOOL"
      90006 "ACTON-AGUA DULCE UNIF SCHOOL"
      90007 "ACTON-AGUA DULCE UNIF SCHOOL"
      90008 "ACTON-AGUA DULCE UNIF SCHOOL"
      90009 "ACTON-AGUA DULCE UNIF SCHOOL"
      90010 "ACTON-AGUA DULCE UNIF SCHOOL"
      90012 "ACTON-AGUA DULCE UNIF SCHOOL"
      90014 "ACTON-AGUA DULCE UNIF SCHOOL"
      90015 "ACTON-AGUA DULCE UNIF SCHOOL"
      90016 "ACTON-AGUA DULCE UNIF SCHOOL"
      90017 "ACTON-AGUA DULCE UNIF SCHOOL"
      90018 "ACTON-AGUA DULCE UNIF SCHOOL"
      150001 "ALHAMBRA USD"
      150001 "ALHAMBRA USD-1ST DISTRICT"
      150002 "ALHAMBRA USD"
      150002 "ALHAMBRA USD-1ST DISTRICT"
      150003 "ALHAMBRA USD"
      150003 "ALHAMBRA USD-1ST DISTRICT"
      150004 "ALHAMBRA USD-1ST DISTRICT"
      150004 "ALHAMBRA USD"
      150005 "ALHAMBRA USD"
      150005 "SAN GABRIEL UNIFIED SCHOOL"
      150005 "ALHAMBRA USD-2ND DISTRICT"
      150006 "ALHAMBRA USD-3RD DISTRICT"
      150006 "ALHAMBRA USD"
      150007 "ALHAMBRA USD-1ST DISTRICT"
      150007 "ALHAMBRA USD"
      150008 "ALHAMBRA USD-1ST DISTRICT"
      150008 "ALHAMBRA USD"
      150009 "ALHAMBRA USD-1ST DISTRICT"
      150009 "ALHAMBRA USD"
      150010 "ALHAMBRA USD-1ST DISTRICT"
      150010 "ALHAMBRA USD"
      150010 "ALHAMBRA USD-2ND DISTRICT"
      150011 "ALHAMBRA USD"
      150011 "ALHAMBRA USD-3RD DISTRICT"
      end
      I have been trying to do this in a rather tedious way by running the following code for each district:

      PHP Code:
      inlist2 precinctname(Acton_Laguna_Dulcevalues(50028,50059,50001,50062,50024,50052,50062,50061,50001,50059,50004,50003,1770008,90004,1770008,50001,50005,90009,1770015,90003,50004,50004,6850010,50002,50028,50001,50005,50028,50014,50005,90004,50004,50022,90015,90003,50001,50001,50059,50016,90016,50063,50022,50016,50059,90003,50025,50062,50016,90003,90008,50004,50003,50004,6850010,90003,50025,50056,90018,50021,1770015,50022,50063,90003,90004,6850024,50001,1770008,50061,50004,50059,50001,90010,1770008,50059,50051,50060,50005,50001,90007,50003,90018,50027,50061,50059,90004,1770015,90003,5000014,50003,50003,50005,50016,50001,90005,90003,90004,90001,50024,90003,50016,90003,50001,5000014,50004,90003,90003,50026,50063,50025,90006,50059,50059,50063,50021,50002,50023,90016,90017,50002,50022,50021,5000014,90003,50003,50023,90002,50016,50001,50004,5000014,50059,50001,90003,50061,90010,5000014,50014,50005,90002,50059,6850031,50061,90014,6850032,90003,50063,50061,90002,90012,50014,50059,50016,50020,50002
      While this works for districts that contain fewer precincts like the Acton-Laguna Dulce it does not work for bigger districts like Los Angeles since inlist2 stops reading the precinct numbers after a certain point, citing a syntax error and resulting in a lot of missing values. I have also tried merging the data set above with the main data set that I want to create the districts in but this has not worked either since precinct was not able to match the observations of the two datasets uniquely, resulting in an error. I am also adding a data example of the main dataset for your reference.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long(voter_id precinct)
      321 3450020
      1006 9000202
      1327 5150015
      1541 1940011
      1648 1000033
      1755 6220133
      1862 9000801
      2119 9000001
      2226 950003
      2547 5400047
      3339 7800006
      3981 9001312
      4024 5500109
      4131 1040020
      4559 3430010
      5565 1850040
      6143 750011
      6571 1770306
      6999 750043
      7149 6250029
      7898 1960005
      9161 950160
      9268 6250052
      9482 7750017
      9703 4800046
      10112 6100015
      10433 4050004
      11118 2550073
      11332 1500018
      11439 750014
      11974 50005
      12231 50005
      12445 5400048
      12552 5400048
      12980 9002416
      13237 9005821
      13344 9002160
      13451 9000214
      13879 950013
      14457 9002248
      14885 50016
      14992 9000151
      15891 1850033
      15998 3330010
      16255 2000012
      16362 2040015
      16897 6000009
      17047 950161
      17154 2700013
      17261 4800023
      17582 6000019
      17689 50003
      17796 50003
      18267 7700154
      18374 6700025
      18702 2600108
      18809 3550020
      19922 9003988
      20224 3650005
      20759 9001563
      21230 50016
      21337 50016
      21551 7700167
      21765 3850376
      22129 4800046
      22343 3850246
      22771 1070003
      22878 50004
      22985 50004
      23242 70008
      23456 3400014
      23777 4800046
      23991 1770001
      24034 1040042
      24462 9002107
      24569 9000288
      24676 1300021
      24890 6220125
      25254 9000549
      25896 9001799
      26153 3850366
      26367 9000963
      26795 3850212
      26802 600007
      26909 80052
      27052 6600005
      27915 6820009
      28058 1030026
      28379 3850104
      28486 50062
      28921 1040251
      29492 5230041
      29713 1450018
      30015 1030049
      30550 9002502
      31128 9001257
      32134 3300040
      32669 9000357
      32883 3450010
      32990 5150004
      end

      Also, the ideal outcome would be for regions like Alhambra USD, Alhambra USD 2ND DISTRICT, ALHAMBRA USD-3RD DISTRICT and so on to be coded as one district (i.e. just Alhambra) but this is largely a secondary concern.

      Does anyone have any thoughts on how to accomplish this?
      About inlist2, it could be either a limitation of tokenize or forvalues (probably the first), happy to look at the issue if you can send over a basic replication package. In general tho I agree with Leonardo, when basic programs give errors, there's probably a different approach meant to be used to solve the problem.

      Comment

      Working...
      X