Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • spmap - create rectangles to show labels and colors for small jurisdictions in choropleth maps

    I am making several choropleth (data) maps of the United States using spmap (M. Pisati, available from SSC). Some of the states in the northeast are too small to label or even to distinguish the color. I would like to add rectangles, adjacent to the small jurisdictions, to show the labels and colors legibly, like this example PNG image from this web page.

    I have two questions.

    1. What is the correct term for these rectangles?
    2. Can anyone please share code to create them?

    I imagine #2 is a matter of entering the coordinates and corresponding centroids in the shapefile, but I would really appreciate any help you may offer so as to not reinvent the wheel.

    David
    David Radwin
    Senior Researcher, California Competes
    californiacompetes.org
    Pronouns: He/Him

  • #2
    I think this is an interesting problem that deserves attention. I don't know if there's a name for these "rectangles" but it's hard to see how you can make a compelling choropleth map of US states without using a device similar to these outside proxy squares for some of the smaller states. This is particularly true if you want to overlay labels for each state. The short story is that you need to create polygons in the shape you want and add these, as if they were islands, to the polygon(s) of the desired state(s).

    The example map in #1 is the product of a series of large and small decisions and I don't think this can easily be automated. Just creating a basic composite map of the 50 states requires a bit of effort. I've done some ground work on this in the Fun with Maps section of the geo2xy (from SSC) help files. Because the placement of labels will use centroids as starting locations, I'll start from scratch here. The first step is to get a shapefile of the 50 US states, DC, and Puerto Rico. At the same time, I'll create a dataset with an attribute variable to use for the map.

    Code:
    * http://www2.census.gov/geo/tiger/GENZ2017/shp/cb_2017_us_state_20m.zip
    shp2dta using "cb_2017_us_state_20m/cb_2017_us_state_20m.shp", ///
        data("dbase.dta") coor("coor.dta") genid(id) gencentroids(c) replace
    
    use "dbase.dta", clear
    rename (id x_c y_c) (_ID _X _Y)
    save "dbase.dta", replace
    
    * make an attribute variable
    set seed 312356
    use "dbase.dta", clear
    keep STUSPS
    gen pcchange = runiformint(-30,30)
    gen spcchange = string(pcchange)
    save "attribute.dta", replace
    The second step is to convert the lat/lon coordinates of the downloaded shapefile to planar coordinates. To create the desired composite map, you will need to scale and offset the coordinates for Alaska, Hawaii, and Puerto Rico. Because the Census keeps changing their shapefiles (and the _ID codes used for each state), I'm creating local macros for the state identifiers (_ID) that need to be scaled and moved. The state's centroid coordinates are located in the database part of the shapefile (one per state) and they also need to be converted to planar using the exact same projection. To make things simpler, I'm combining the two sets of coordinates, applying the projections, and splitting them back at the end. I use USGS recommended projection parameters except for Puerto Rico where I simply use default parameters. This appears complicated but it only needs to be done once. The resulting database and coordinates form a shapefile of the composite US states in planar coordinates.

    Code:
    * create macros that contain the _ID of these 3 states
    use "dbase.dta", clear
    foreach state in AK HI PR {
        sum _ID if STUSPS == "`state'", meanonly
        local `state' = r(min)
        dis "`state' = " ``state''
    }
    
    * combine centroids with polygon coordinates to apply the same projections
    use _ID _X _Y STUSPS using "dbase.dta", clear
    append using "coor.dta"
    
    // flip longitudes to reconnect Hawaii and Alaska
    replace _X = cond(_X > 0, _X - 180, _X + 180) if inlist(_ID, `AK', `HI')
    
    // Alaska - USGS recommends standard parallels of 55 and 65 north
    sum _X if _ID == `AK'
    local midlon = (r(min) + r(max)) / 2
    geo2xy _Y _X if _ID == `AK', replace ///
            proj(albers, 6378137 298.257223563 55 65 0 `midlon')
    replace _Y = _Y / 3 + 800000 if _ID == `AK'
    replace _X = _X / 3 - 1700000 if _ID == `AK'
    
    // Hawaii - USGS recommends standard parallels of 8 and 18 north
    sum _X if _ID == `HI'
    local midlon = (r(min) + r(max)) / 2
    geo2xy _Y _X if _ID == `HI', replace ///
            proj(albers, 6378137 298.257223563 8 18 0 `midlon')
    replace _Y = _Y / 1.2 + 850000 if _ID == `HI'
    replace _X = _X / 1.2 - 800000 if _ID == `HI'
    
    // Puerto Rico
    geo2xy _Y _X if _ID == `PR', replace proj(albers)
    replace _Y = _Y + 500000 if _ID == `PR'
    replace _X = _X + 2200000 if _ID == `PR'
    
    // 48 states - USGS recommends standard parallels of 29.5 and 45.5 north
    sum _X if !inlist(_ID, `AK', `HI', `PR')
    local midlon = (r(min) + r(max)) / 2
    geo2xy _Y _X if !inlist(_ID, `AK', `HI', `PR'), replace ///
            proj(albers, 6378137 298.257223563 29.5 45.5 0 `midlon')
            
    * convert to km and save the database and coordinates
    replace _Y = _Y / 1000
    replace _X = _X / 1000
    
    preserve
    keep if !mi(STUSPS)
    rename (_Y _X) (y_c x_c)
    merge 1:1 _ID using "dbase.dta", assert(match) nogen
    sort _ID
    save "xy_dbase.dta", replace
    restore
    drop if !mi(STUSPS)
    drop STUSPS
    sort _ID, stable
    save "xy_coor.dta", replace
    
    * show the basic map made from projected coordinates
    use "xy_dbase.dta", clear
    spmap using "xy_coor.dta", id(_ID)
    
    graph export "basic.png", width(1000) replace
    And here's the map generate:

    Click image for larger version

Name:	basic.png
Views:	1
Size:	335.3 KB
ID:	1448048


    In the next post, I'll show how to derive anchor and label locations for each state.

    Comment


    • #3
      With the composite shapefile generated in #2, I now address the issue of label placement. Centroids are a good starting point but they are not well positioned in certain states like Michigan (because the Upper Peninsula pulls it to the north west) and Florida. Note that I'm not an expert on spmap so all the credit goes to Scott Merryman for showing how to create multiline labels in this post. The final goal here is to finalize, for each state, the anchor coordinates (the location that is pointed to by the line that connects the state to its proxy square ) and the label coordinates (where the label goes). For all states where there is no need for a proxy square, the anchor and label coordinates are the same.

      Because this requires manual tweaks and that the nature of the label can affect placement, I implement a strategy where coordinates can easily be changed from within the code itself. The code tries to follow the map example in #1 where labels from proxy squares are on a single line. In terms of label placement, I also respect the model so the line connector can connect either to the center-right or center-left of the proxy square. The anchor locations are marked by a red x. In choosing the location, I leave enough space for the proxy square. You can always come back to this step if the final result does not look right.

      Code:
      * derive from centroids anchor and label points
      use "xy_dbase.dta", clear
      
      gen x_anchor = int(x_c)
      gen y_anchor = int(y_c)
      gen x_label = x_anchor
      gen y_label = y_anchor
      sort STUSPS
      dataex _ID STUSPS x_anchor y_anchor x_label y_label
      
      * Edit the following to taste
      clear
      input byte _ID str2 STUSPS float(x_anchor y_anchor x_label y_label)
       1 "AK" -1591 3070 -1591 3070
      19 "AL"   837 3478   837 3478
      21 "AR"   308 3680   308 3680
      20 "AZ" -1439 3726 -1439 3726
       2 "CA" -2067 4196 -2067 4196
       3 "CO"  -831 4176  -831 4176
      25 "CT"  1892 4658  2090 4510
       4 "DC"  1606 4282  2040 4100
      26 "DE"  1745 4290  2040 4300
      45 "FL"  1390 3064  1390 3064
      27 "GA"  1152 3498  1152 3498
      28 "HI"  -750 2650  -750 2650
       7 "IA"   192 4480   192 4480
       5 "ID" -1485 4884 -1485 4884
       6 "IL"   560 4272   560 4272
      22 "IN"   808 4276   808 4276
      23 "KS"  -219 4079  -219 4079
       8 "KY"   950 4050   950 4050
       9 "LA"   310 3300   310 3300
      31 "MA"  1953 4755  2200 4860
      10 "MD"  1745 4230  2040 4200
      24 "ME"  2058 5133  2058 5133
      32 "MI"   900 4620   900 4620
      11 "MN"   118 4952   118 4952
      12 "MO"   290 4068   290 4068
      33 "MS"   574 3453   574 3453
      46 "MT" -1050 5107 -1050 5107
      39 "NC"  1489 3900  1489 3900
      40 "ND"  -351 5084  -351 5084
      34 "NE"  -328 4423  -328 4423
      36 "NH"  1926 4905  1690 5200
      37 "NJ"  1790 4465  2040 4400
      38 "NM"  -934 3671  -934 3671
      35 "NV" -1763 4367 -1763 4367
      13 "NY"  1638 4750  1638 4750
      42 "OH"  1096 4354  1096 4354
      43 "OK"  -150 3752  -150 3752
      14 "OR" -1955 4942 -1955 4942
      44 "PA"  1498 4486  1498 4486
      52 "PR"  2213 2600  2213 2620
      41 "RI"  1987 4687  2200 4600
      29 "SC"  1366 3671  1366 3671
      30 "SD"  -348 4750  -348 4750
      15 "TN"   848 3850   848 3850
      16 "TX"  -330 3297  -330 3297
      47 "UT" -1346 4282 -1346 4282
      17 "VA"  1484 4100  1484 4100
      48 "VT"  1831 4926  1600 5100
      49 "WA" -1851 5311 -1851 5311
      18 "WI"   461 4778   461 4778
      50 "WV"  1307 4199  1307 4199
      51 "WY"  -947 4640  -947 4640
      end
      
      * identify the list of special cases
      gen rconnect = 1 if inlist(STUSPS,"NH","VT")
      replace rconnect = 0 if inlist(STUSPS,"MA","RI","CT","NJ","DE","MD","DC")
      
      * create 2 labels for each state
      merge 1:1 STUSPS using "attribute.dta", assert(match) nogen
      expand 2
      bysort STUSPS: gen lgroup = _n
      gen s = cond(mi(rconnect), STUSPS, STUSPS + ", " + spcchange) if lgroup == 1
      replace s = spcchange if mi(rconnect) & lgroup == 2
      save "xy_dbase_locations.dta", replace
      
      use "xy_dbase.dta", clear
      spmap using "xy_coor.dta", id(_ID) /// 
          label(data("xy_dbase_locations.dta") xcoord(x_label) ycoord(y_label) ///
          by(lgroup) label(s) color(blue) size(*0.6 ..) pos(0 6)) ///
          point(data("xy_dbase_locations.dta") xcoord(x_anchor) ycoord(y_anchor) ///
          shape(x) ocolor(red) )
      
      graph export "locations.png", width(1000) replace
      and here's the resulting map:

      Click image for larger version

Name:	locations.png
Views:	1
Size:	425.6 KB
ID:	1448052


      I the next post, I'll create the squares and connectors.

      Comment


      • #4
        In order for spmap to assign the state's shading and color to the proxy square, you need to create a polygon for the proxy square and append it to the shapefile's coordinates. The proxy square is essentially like an extra island for the state (with the same _ID value as the original state polygon(s)). I create two types of squares, one that connects center-right and the other center-left. The first observation of a polygon has missing coordinates. The next observation is the starting point. A polygon always ends where it started. To keep it simple, I create two unit polygons, one for each type. I then use joinby to form all pairwise combinations of states with proxy squares with these unit squares. There's room to play with sizes here (I create squares of 60km * 60km).
        Code:
        * polygon for squares that connect to the right or left
        clear
        input float(obs x y rconnect)
        1  .  . 0
        2 -1  0 0
        3 -1  1 0
        4  1  1 0
        5  1 -1 0
        6 -1 -1 0
        7 -1  0 0
        1  .  . 1
        2  1  0 1
        3  1 -1 1
        4 -1 -1 1
        5 -1  1 1
        6  1  1 1
        7  1  0 1
        end
        tempfile poly
        save "`poly'"
        
        * group label by whether they connect to the middle-right or not
        use "xy_dbase_locations.dta", clear
        keep if !mi(rconnect)
        keep if lgroup == 1
        sort rconnect STUSPS
        keep _ID STUSPS x_label y_label rconnect
        list , sepby(rconnect)
        
        * form all pairwise combinations with the polygon points
        joinby rconnect using "`poly'"
        isid rconnect STUSPS obs, sort
        
        * scale using 30 km for each unit segment
        gen x_sq = x_label + cond(rconnect,130,-130) + 30 * x
        gen y_sq = y_label + 30 * y
        save "squares.dta", replace
        
        * combine with the composite map's coordinates
        keep _ID x_sq y_sq
        rename (x_sq y_sq) (_X _Y)
        keep _ID _X _Y
        append using "xy_coor.dta"
        sort _ID, stable
        save "xy_coor_with_squares.dta", replace
        To be able to connect the proxy square to the state, you now need to create polylines that spmap can use. These follow the same structure as the coordinates used to create the shapes of the state except that they do not end where they started. A single line connector will have 3 observations. The first has missing coordinates. The second is the coordinates of the proxy square's initial point. The third is the state's anchor location. In order to allow for tweaks, I use the same dataex technique that I used in #3. Specifically, I added an elbow to the connector for DC.

        Code:
        * get the state's anchor location
        use "xy_dbase_locations.dta", clear
        keep if !mi(rconnect)
        keep if lgroup == 1
        sort STUSPS
        keep _ID STUSPS x_anchor y_anchor rconnect
        list , sepby(rconnect)
        tempfile anchors
        save "`anchors'"
        
        * each polyline contains at least 3 obs, starting with a missing value.
        * the second obs constains the location of the square connect point
        use _ID STUSPS x_sq y_sq obs if obs <= 3 using "squares.dta", clear
        merge m:1 STUSPS using "`anchors'", assert(match) nogen
        isid STUSPS obs, sort
        
        * move the label location as the final point in the polyline
        gen _X = cond(obs == 3, x_anchor, x_sq)
        gen _Y = cond(obs == 3, y_anchor, y_sq)
        
        * print out initial polyline data, copy output here and adjust as needed
        dataex _ID STUSPS _X _Y
        
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte _ID str2 STUSPS float(_X _Y)
        25 "CT"    .    .
        25 "CT" 1930 4510
        25 "CT" 1892 4658
         4 "DC"    .    .
         4 "DC" 1880 4100
         4 "DC" 1800 4100
         4 "DC" 1606 4282
        26 "DE"    .    .
        26 "DE" 1880 4300
        26 "DE" 1745 4290
        31 "MA"    .    .
        31 "MA" 2040 4860
        31 "MA" 1953 4755
        10 "MD"    .    .
        10 "MD" 1880 4200
        10 "MD" 1745 4230
        36 "NH"    .    .
        36 "NH" 1850 5200
        36 "NH" 1926 4905
        37 "NJ"    .    .
        37 "NJ" 1880 4400
        37 "NJ" 1790 4465
        41 "RI"    .    .
        41 "RI" 2040 4600
        41 "RI" 1987 4687
        48 "VT"    .    .
        48 "VT" 1760 5100
        48 "VT" 1831 4926
        end
        drop STUSPS
        
        save "connectors.dta", replace
        Finally, it's time to put all of this together.
        Code:
        use "xy_dbase.dta", clear
        merge 1:1 STUSPS using "attribute.dta", assert(match) nogen
        
        spmap pcchange using "xy_coor_with_squares.dta", id(_ID) /// 
            fcolor(Greens) ocolor(gs10 gs9 gs8 gs7)  ///
            label( data("xy_dbase_locations.dta") xcoord(x_label) ycoord(y_label) ///
            by(lgroup) label(s)size(*0.6 ..) pos(0 6) ) ///
            line( data("connectors.dta") ) legend(ring(0) bplace(s) bmargin(25 0 3 0)) 
        
        graph export "map_all.png", width(1000) replace
        and here's the resulting map:

        Click image for larger version

Name:	map_all.png
Views:	1
Size:	469.0 KB
ID:	1448057


        In terms of file organization, I broke down each step into separate do-files and have a master do-file run to the whole thing:
        Code:
        do get_data
        do xy_data
        do locations
        do squares
        do connectors
        do map_all

        Comment


        • #5
          Wow, this is exactly what I need! Thank you very much, Robert Picard! I appreciate it very much.

          David
          David Radwin
          Senior Researcher, California Competes
          californiacompetes.org
          Pronouns: He/Him

          Comment


          • #6
            In light of the sad new this morning about Anthony Bourdain, I thought I could show how to reuse the code to replicate the recent CDC figure on trends in suicide rates.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str2 STUSPS str13 raw_both str5 spcchange str2 rank float pcchange
            "AL" "+ 21.9 % (33)" "+21.9" "33" 21.9
            "AK" "+ 37.4 % (13)" "+37.4" "13" 37.4
            "AZ" "+ 17.3 % (42)" "+17.3" "42" 17.3
            "AR" "+ 36.8 % (15)" "+36.8" "15" 36.8
            "CA" "+ 14.8 % (46)" "+14.8" "46" 14.8
            "CO" "+ 34.1 % (22)" "+34.1" "22" 34.1
            "CT" "+ 19.2 % (34)" "+19.2" "34" 19.2
            "DE" "+ 5.9 % (50)"  "+5.9"  "50"  5.9
            "DC" "+ 16.1 % (45)" "+16.1" "45" 16.1
            "FL" "+ 10.6 % (48)" "+10.6" "48" 10.6
            "GA" "+ 16.2 % (44)" "+16.2" "44" 16.2
            "HI" "+ 18.3 % (38)" "+18.3" "38" 18.3
            "ID" "+ 43.2 %  (7)" "+43.2" "7"  43.2
            "IL" "+ 22.8 % (32)" "+22.8" "32" 22.8
            "IN" "+ 31.9 % (25)" "+31.9" "25" 31.9
            "IA" "+ 36.2 % (18)" "+36.2" "18" 36.2
            "KS" "+ 45.0 % (5)"  "+45.0" "5"    45
            "KY" "+ 36.6 % (16)" "+36.6" "16" 36.6
            "LA" "+ 29.3 % (26)" "+29.3" "26" 29.3
            "ME" "+ 27.4 % (29)" "+27.4" "29" 27.4
            "MD" "+ 8.5 % (49)"  "+8.5"  "49"  8.5
            "MA" "+ 35.3 % (20)" "+35.3" "20" 35.3
            "MI" "+ 32.9 % (24)" "+32.9" "24" 32.9
            "MN" "+ 40.6 % (8)"  "+40.6" "8"  40.6
            "MS" "+ 17.8 % (40)" "+17.8" "40" 17.8
            "MO" "+ 36.4 % (17)" "+36.4" "17" 36.4
            "MT" "+ 38.0 % (11)" "+38.0" "11"   38
            "NE" "+ 16.2 % (43)" "+16.2" "43" 16.2
            "NV" "- 1.0 % (51)"  "-1.0"  "51"   -1
            "NH" "+ 48.3 % (3)"  "+48.3" "3"  48.3
            "NJ" "+ 19.2 % (35)" "+19.2" "35" 19.2
            "NM" "+ 18.3 % (39)" "+18.3" "39" 18.3
            "NY" "+ 28.8 % (27)" "+28.8" "27" 28.8
            "NC" "+ 12.7 % (47)" "+12.7" "47" 12.7
            "ND" "+ 57.6 % (1)"  "+57.6" "1"  57.6
            "OH" "+ 36.0 % (19)" "+36.0" "19"   36
            "OK" "+ 37.6 % (12)" "+37.6" "12" 37.6
            "OR" "+ 28.2 % (28)" "+28.2" "28" 28.2
            "PA" "+ 34.3 % (21)" "+34.3" "21" 34.3
            "RI" "+ 34.1 % (23)" "+34.1" "23" 34.1
            "SC" "+ 38.3 % (10)" "+38.3" "10" 38.3
            "SD" "+ 44.5 % (6)"  "+44.5" "6"  44.5
            "TN" "+ 24.2 % (31)" "+24.2" "31" 24.2
            "TX" "+ 18.9 % (36)" "+18.9" "36" 18.9
            "UT" "+ 46.5 % (4)"  "+46.5" "4"  46.5
            "VT" "+ 48.6 % (2)"  "+48.6" "2"  48.6
            "VA" "+ 17.4 % (41)" "+17.4" "41" 17.4
            "WA" "+ 18.8 % (37)" "+18.8" "37" 18.8
            "WV" "+ 37.1 % (14)" "+37.1" "14" 37.1
            "WI" "+ 25.8 % (30)" "+25.8" "30" 25.8
            "WY" "+ 39.0 % (9)"  "+39.0" "9"    39
            end
            replace pcchange = round(pcchange, 1)
            replace spcchange = string(pcchange)
            tempfile suicide_rates
            save "`suicide_rates'"
            
            * update labels
            use "xy_dbase_locations.dta", clear
            drop pcchange spcchange s
            merge m:1 STUSPS using "`suicide_rates'", nogen
            
            gen s = cond(mi(rconnect), STUSPS, STUSPS + ", " + spcchange) if lgroup == 1
            replace s = spcchange if mi(rconnect) & lgroup == 2
            drop if STUSPS == "PR"
            save "xy_dbase_labels.dta", replace
            
            use "xy_dbase.dta", clear
            merge 1:1 STUSPS using "`suicide_rates'", nogen
            drop if STUSPS == "PR"
            
            spmap pcchange using "xy_coor_with_squares.dta", id(_ID) /// 
                title("Percent change in annual suicide rate (1999-2016)", size(*0.7) color(gs4)) ///
                caption("source: https://stacks.cdc.gov/view/cdc/53785", size(*0.5) color(gs10)) ///
                clbreaks(-1 0 18 30 37 58) clmethod(custom) ///
                fcolor(ltblue "230 200 200" "230 150 150" "230 100 100" "230 50 50") ocolor(gs10 gs9 gs8 gs7)  ///
                label( data("xy_dbase_labels.dta") xcoord(x_label) ycoord(y_label) ///
                by(lgroup) label(s) size(*0.6 ..) pos(0 6) ) ///
                line( data("connectors.dta") ) legend(ring(0) bplace(s) bmargin(25 0 0 0)) 
            
            graph export "map_suicides.png", width(1000) replace
            and the resulting map:

            Click image for larger version

Name:	map_suicides.png
Views:	1
Size:	467.7 KB
ID:	1448210

            Comment


            • #7
              Many thanks to Robert Picard and Michael Stepner for their helpful guidance on maps. Today the National Center for Education Statistics in the US Department of Education released a publication that (among other things) uses maps to show regional variation in financial aid applications. Naturally, their contributions are also acknowledged in the references.

              https://nces.ed.gov/pubsearch/pubsin...?pubid=2018418
              David Radwin
              Senior Researcher, California Competes
              californiacompetes.org
              Pronouns: He/Him

              Comment


              • #8
                Thank you Robert Picard. I am trying to apply this your codes in step 3 to developing spmap for California county. I have some counties which are really small and i want to use the rectangle style in this post but i cannot seem to get the right coordinates that would place the points outside the map. Please can you clarify how i can derive this estimates. Thank you.

                Comment


                • #9

                  Dear Robert Picard

                  Thanks so much for detailed steps in this post. It is very helpful indeed. I have been able to replicate your code. However, in the final map, the connecting lines for the small states are inconsistent. I would really appreciate if you guide me on this issue. The output map is here:

                  Click image for larger version

Name:	map_all.png
Views:	2
Size:	142.1 KB
ID:	1533666



                  Thanks

                  Attached Files

                  Comment


                  • #10
                    I downloaded the shapefile anew (link in the first code block of #2) and copied the code from #2, #3, and #4 into separate do-files and ran a master do-file with the following content:
                    Code:
                    version 15
                    
                    do get_data.do      // #2 code block 1
                    do xy_data.do       // #2 code block 2
                    do locations.do     // #3 code block 1
                    do squares.do       // #4 code block 1
                    do connectors.do    // #4 code block 2
                    do map_all.do       // #4 code block 3
                    The final map looks identical to the one in #4. I can't tell where things went wrong for you but check that you copied all the parts correctly and run the do-files as a whole.

                    Comment


                    • #11
                      Hello Robert,

                      Thank you so much for this detailed code of how you got to make the map above and I think it looks fantastic. I wanted to apply your technique and in a way replicate the map you have on my own dataset that is of Pakistan. The problem is that I was unable to get the final step working where I am able to connect labels outside the map with lines and squares.

                      I was wondering if you could kindly have a look at my code and let me know where I am going wrong. It will be great if you can help me out with this.

                      I have also attached an image that shows at what point I reached. Also, I used the grmap option in Stata 16 to get this done.


                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input int _ID float district_new double(_CX _CY) float(Shape_Leng Shape_Area) str19 ADM2_EN float tot_complaints_dist
                       96   1 71.57102399300932  33.96826304010452  1.522665 .12526974 "Peshawar"         39520
                       76   2 72.09787154023972  34.31495116535756    1.8832 .16054375 "Mardan"           14455
                       15   3 71.71738341794575  34.24465505806053 1.4824945 .09679574 "Charsadda"         7496
                        8   4 70.65651337271892  32.91955405915595  1.778845 .11399214 "Bannu"             8962
                       21   5 70.69899065440383 31.869962607712854  4.436749  .7034219 "Dera Ismail Khan" 10406
                      127   6 72.03566989875996  35.28192972957805 3.3593645 .37131655 "Upper Dir"         3312
                       72   7 71.84545004152172  34.84476356450102 2.0598717  .1572234 "Lower Dir"         4644
                        1   8 73.26998212643791 34.108332575252426 2.1874955  .1715639 "Abbottabad"       13679
                       10   9 73.12267415270053  34.79141270761624 2.0509117 .13352133 "Batagram"          1629
                       12  10 72.51782763039995  34.46521312932733  2.162965 .17023906 "Buner"             3019
                       17  11 72.22485265048144  36.22554788585233   8.20395 1.4665498 "Chitral"           1524
                       34  12 70.84737720912672 33.430656237237805 2.1548097 .13382126 "Hangu"             2564
                       35  13 72.89508128480321  34.00501008422797  2.903675  .1875543 "Haripur"           7846
                       48  14 71.10924739977347 33.137505729535434 2.6047754  .2563766 "Karak"             5140
                       60  15  71.5394672122824  33.45592088037413  3.987598  .2885411 "Kohat"             7459
                       61  16 73.22681996339976 35.326556717856235 4.7075524  .7589301 "Kohistan"          1050
                       65  17 70.83372375416293  32.59846602201369  2.557964   .305942 "Lakki Marwat"      7403
                       73  18 71.89549074696235  34.53961904383832  1.657575  .0926152 "Malakand PA"       5630
                       75  19 73.44297497648662  34.66648399310128  4.276061  .4221201 "Mansehra"          9673
                       90  20 71.98608337906539  33.92841726337082 2.1039987 .16916557 "Nowshera"          9537
                      107  21 72.73041969747807 34.847820027112284  2.130305 .13688222 "Shangla"           2418
                      117  22 72.45922806355912  34.13610181558486 2.2540567 .14477351 "Swabi"             9409
                      118  23 72.46511237961398 35.250680745507815  4.142781 .53154844 "Swat"              7632
                      121  24 70.39242698179864  32.24356165350789  1.905292 .16023873 "Tank"              1853
                      125  25 72.83423351038117  34.55477365282171 1.3609056  .0458548 "Tor Ghar"           370
                        7 138  71.5046286351669  34.75315305669825 1.9832087 .13402374 "Bajaur"            2127
                       57 140 71.06682179824192  33.95970556478228 3.4295936  .2675712 "Khyber"            4603
                       63 141 70.32403406672428 33.708348327156706 3.0417414  .3305859 "Kurram"            1610
                       81 142 71.33862847620053 34.465521288680584 2.4713986 .22176823 "Mohmand"            817
                       89 143 69.98518851023353  32.95383641401969 3.8699205  .4752517 "North Waziristan"  1150
                       93 144 70.98913307258228  33.70000231484342  2.517406 .13378064 "Orakzai"            605
                      114 145  69.7494046231574  32.31388890741291  4.787823  .5979149 "South Waziristan"  1024
                      end

                      The Code i use is as below:

                      Code:
                      * derive from centroids anchor and label points to mark on map later for those
                      * district names that are difficult to read due to small area            
                      preserve
                      generate double x_anchor = _CX
                      generate double y_anchor = _CY
                      generate double x_label = x_anchor
                      generate double y_label = y_anchor
                      generate stot_complaints_dist = string(tot_complaints_dist)
                      
                      keep _ID district_new ADM2_EN x_anchor y_anchor x_label y_label stot_complaints_dist
                      replace x_label = 72.08102399300932 in 1
                      replace y_label = 33.26826304010452 in 1
                      
                      replace x_label = 70.71738341794575 in 3
                      replace y_label = 34.44465505806053 in 3
                      
                      replace x_label = 71.04545004152172 in 7
                      replace y_label = 35.14476356450102 in 7
                      
                      replace x_label = 70.51738341794575 in 18
                      replace y_label = 34.83961904383832 in 18
                      
                      replace x_label = 72.45922806355912 in 22
                      replace y_label = 33.83610181558486 in 22
                      
                      replace x_label = 73.83423351038117 in 25
                      replace y_label = 34.1477365282171 in 25
                      
                      
                      * dataex _ID district_new ADM2_EN x_anchor y_anchor x_label y_label
                      
                          
                      * Example generated by -dataex-. To install: ssc install dataex
                      *clear
                      *input int _ID float district_new str19 ADM2_EN double(x_anchor y_anchor x_label y_label)
                      * 96   1 "Peshawar"         71.57102399300932  33.96826304010452 72.08102399300932  33.26826304010452
                      * 76   2 "Mardan"           72.09787154023972  34.31495116535756 72.09787154023972  34.31495116535756
                      * 15   3 "Charsadda"        71.71738341794575  34.24465505806053 70.71738341794575  34.44465505806053
                      *  8   4 "Bannu"            70.65651337271892  32.91955405915595 70.65651337271892  32.91955405915595
                      * 21   5 "Dera Ismail Khan" 70.69899065440383 31.869962607712854 70.69899065440383 31.869962607712854
                      *127   6 "Upper Dir"        72.03566989875996  35.28192972957805 72.03566989875996  35.28192972957805
                      * 72   7 "Lower Dir"        71.84545004152172  34.84476356450102 71.04545004152172  35.14476356450102
                      *  1   8 "Abbottabad"       73.26998212643791 34.108332575252426 73.26998212643791 34.108332575252426
                      * 10   9 "Batagram"         73.12267415270053  34.79141270761624 73.12267415270053  34.79141270761624
                      * 12  10 "Buner"            72.51782763039995  34.46521312932733 72.51782763039995  34.46521312932733
                      * 17  11 "Chitral"          72.22485265048144  36.22554788585233 72.22485265048144  36.22554788585233
                      * 34  12 "Hangu"            70.84737720912672 33.430656237237805 70.84737720912672 33.430656237237805
                      * 35  13 "Haripur"          72.89508128480321  34.00501008422797 72.89508128480321  34.00501008422797
                      * 48  14 "Karak"            71.10924739977347 33.137505729535434 71.10924739977347 33.137505729535434
                      * 60  15 "Kohat"             71.5394672122824  33.45592088037413  71.5394672122824  33.45592088037413
                      * 61  16 "Kohistan"         73.22681996339976 35.326556717856235 73.22681996339976 35.326556717856235
                      * 65  17 "Lakki Marwat"     70.83372375416293  32.59846602201369 70.83372375416293  32.59846602201369
                      * 73  18 "Malakand PA"      71.89549074696235  34.53961904383832 70.51738341794575  34.83961904383832
                      * 75  19 "Mansehra"         73.44297497648662  34.66648399310128 73.44297497648662  34.66648399310128
                      * 90  20 "Nowshera"         71.98608337906539  33.92841726337082 71.98608337906539  33.92841726337082
                      *107  21 "Shangla"          72.73041969747807 34.847820027112284 72.73041969747807 34.847820027112284
                      *117  22 "Swabi"            72.45922806355912  34.13610181558486 72.45922806355912  33.83610181558486
                      *118  23 "Swat"             72.46511237961398 35.250680745507815 72.46511237961398 35.250680745507815
                      *121  24 "Tank"             70.39242698179864  32.24356165350789 70.39242698179864  32.24356165350789
                      *125  25 "Tor Ghar"         72.83423351038117  34.55477365282171 73.83423351038117  34.1477365282171
                      *  7 138 "Bajaur"            71.5046286351669  34.75315305669825  71.5046286351669  34.75315305669825
                      * 57 140 "Khyber"           71.06682179824192  33.95970556478228 71.06682179824192  33.95970556478228
                      * 63 141 "Kurram"           70.32403406672428 33.708348327156706 70.32403406672428 33.708348327156706
                      * 81 142 "Mohmand"          71.33862847620053 34.465521288680584 71.33862847620053 34.465521288680584
                      * 89 143 "North Waziristan" 69.98518851023353  32.95383641401969 69.98518851023353  32.95383641401969
                      * 93 144 "Orakzai"          70.98913307258228  33.70000231484342 70.98913307258228  33.70000231484342
                      *114 145 "South Waziristan"  69.7494046231574  32.31388890741291  69.7494046231574  32.31388890741291
                      *end
                      
                      
                      * identify the list of special cases that need to be moved outside the map
                      gen rconnect = 1 if inlist(ADM2_EN,"Peshawar","Charsadda","Malakand PA","Lower Dir")
                      replace rconnect = 0 if inlist(ADM2_EN,"Swabi","Tor Ghar")
                      
                      * create/make 2 labels for each district
                      expand 2
                      bysort district_new: gen lgroup = _n
                      generate s = cond(mi(rconnect),ADM2_EN,ADM2_EN + ", " + stot_complaints_dist) if lgroup==1
                      replace s = stot_complaints_dist if mi(rconnect) & lgroup==2     
                      save "xy_dbase_locations.dta", replace
                      restore
                      
                      
                      * NOW TO MAKE SQUARE MARKINGS FOR LOCATIONS
                      preserve
                      clear
                      input float(obs x y rconnect)
                      1  .  . 0
                      2 -1  0 0
                      3 -1  1 0
                      4  1  1 0
                      5  1 -1 0
                      6 -1 -1 0
                      7 -1  0 0
                      1  .  . 1
                      2  1  0 1
                      3  1 -1 1
                      4 -1 -1 1
                      5 -1  1 1
                      6  1  1 1
                      7  1  0 1
                      end
                      tempfile poly
                      save "`poly'"
                      
                      * group label by whether they connect to the middle-right or not
                      use "xy_dbase_locations.dta", clear
                      keep if !mi(rconnect)
                      keep if lgroup == 1
                      sort rconnect ADM2_EN
                      keep _ID ADM2_EN x_label y_label rconnect
                      list , sepby(rconnect)
                      
                      * form all pairwise combinations with the polygon points
                      joinby rconnect using "`poly'"
                      isid rconnect ADM2_EN obs, sort
                      
                      * scale using 30 km for each unit segment
                      gen x_sq = x_label + cond(rconnect,130,-130) + 30 * x
                      gen y_sq = y_label + 30 * y
                      save "squares.dta", replace
                      
                      * combine with the composite map's coordinates
                      keep _ID x_sq y_sq
                      rename (x_sq y_sq) (_CX _CY)
                      keep _ID _CX _CY
                      append using "./master/merged_v1"
                      sort _ID, stable
                      save "xy_coor_with_squares.dta", replace
                      
                      restore
                      
                      * get the state's anchor location
                      use "xy_dbase_locations.dta", clear
                      keep if !mi(rconnect)
                      keep if lgroup == 1
                      sort ADM2_EN
                      keep _ID ADM2_EN x_anchor y_anchor rconnect
                      list , sepby(rconnect)
                      tempfile anchors
                      save "`anchors'"
                      
                      * each polyline contains at least 3 obs, starting with a missing value.
                      * the second obs constains the location of the square connect point
                      use _ID ADM2_EN x_sq y_sq obs if obs <= 3 using "squares.dta", clear
                      merge m:1 ADM2_EN using `"`anchors'"', assert(match) nogen
                      isid ADM2_EN obs, sort
                      
                      * move the label location as the final point in the polyline
                      gen _X = cond(obs == 3, x_anchor, x_sq)
                      gen _Y = cond(obs == 3, y_anchor, y_sq)
                      
                      drop ADM2_EN
                      
                      save "connectors.dta", replace
                      
                      
                      use "merged_v1", clear
                      
                      
                          
                      * Test RUN
                      
                          #delimit ;
                          
                          grmap tot_complaints_dist,  
                          title("Total Complaints Per District", size(*1) color(gs4))
                          caption("source: KPK Administrative Data", size(*0.5) color(gs10))
                          fcolor("230 250 250" "230 200 200" "230 150 150" "230 100 100" "230 50 50" "230 25 25" gs8)
                          ocolor(white ..)
                          osize(medthin ..)
                          legenda(on)
                          legstyle(3)
                          legend(ring(0) position(4))
                          plotregion(icolor(white))
                          graphregion(icolor(white))
                          legtitle("Complaint Totals")
                          clbreaks(0 1001 2001 3001 5001 10001 15001 40000)
                          clmethod(custom)
                          
                          legend(label(2 "0 to 1,000") label(3 "1,001 to 2,000")  label(4 "2,001 to 3,000")  
                          label(5 "3,001 to 5,000") label(6 "5,001 to 10,000" ) label(7 "10,001 to 15,000") label(8 "35000+"))
                          
                          label(data("xy_dbase_locations.dta") xcoord(x_label) ycoord(y_label) by(lgroup) label(s)
                          color(gs3) size(*0.55 ..) pos(0 6))
                          line( data("./connectors.dta") )
                          
                          ;
                          
                          #delimit cr
                      Click image for larger version

Name:	Sample.png
Views:	1
Size:	78.9 KB
ID:	1574321

                      Comment


                      • #12
                        Anyone that may be able to help?

                        Comment


                        • #13
                          While it is much too late for you Fahad, maybe it will help someone else trying to use this thread to help them later. I think your biggest problem is in the code:

                          Code:
                           
                           * scale using 30 km for each unit segment gen x_sq = x_label + cond(rconnect,130,-130) + 30 * x gen y_sq = y_label + 30 * y save "squares.dta", replace
                          The original example switched to planar coordinates allowing for everything to be measured in km. You are using a coordinate system measured in degrees.
                          In my case when working with degrees for a map of Sindh, I found that multiplying x and y by 0.1 each worked nicely to get a 0.2 x 0.2 degree square

                          Comment

                          Working...
                          X