  • Generate a variable for number of observations within a distance using lat/long

    I have two kinds of observations, houses and schools, both with latitude and longitude variables. I would like to generate a new variable for the schools that counts the number of houses within a certain distance of each school. I've never used lat/long coordinates in Stata before.

  • #2
    This is pretty easy to do with geonear (from SSC). To install geonear, type the following in Stata's Command window:

    Code:
    ssc install geonear
    Here is a quick example using toy school and house datasets:

    Code:
    * toy school data
    clear
    set seed 3214132
    set obs 5
    gen school = _n
    gen double lat = runiform()
    gen double lon = runiform()
    save "schools.dta", replace
    
    * toy house data
    clear
    set obs 50
    gen house = _n
    gen double lat = runiform()
    gen double lon = runiform()
    save "houses.dta", replace
    
    * find houses within 15km of each school
    * this returns at least one house per school even if it's not within 15km
    use "schools.dta", clear
    geonear school lat lon using "houses.dta", n(house lat lon) within(15) long
    
    * number of houses
    bysort school (house): egen within15 = total(km_to_house <= 15)
    
    list, sepby(school)
    
    * return to one obs per school
    by school: keep if _n == 1
    drop house km_to_house
    list
    The first list shows:
    Code:
    . list, sepby(school)
    
         +---------------------------------------+
         | school   house   km_to_h~e   within15 |
         |---------------------------------------|
      1. |      1      15   13.528282          3 |
      2. |      1      18   6.9290569          3 |
      3. |      1      39   13.409289          3 |
         |---------------------------------------|
      4. |      2       7   2.9900385          4 |
      5. |      2      17   13.001577          4 |
      6. |      2      33   14.864743          4 |
      7. |      2      41   14.246075          4 |
         |---------------------------------------|
      8. |      3       4   9.7507377          2 |
      9. |      3      42   14.831461          2 |
         |---------------------------------------|
     10. |      4      23   12.106559          3 |
     11. |      4      34   12.312661          3 |
     12. |      4      43   9.0754699          3 |
         |---------------------------------------|
     13. |      5       1   14.415491          4 |
     14. |      5       9   9.5068169          4 |
     15. |      5      24   12.501883          4 |
     16. |      5      35   7.1974658          4 |
         +---------------------------------------+
    The final count per school is:
    Code:
    . list
    
         +-------------------+
         | school   within15 |
         |-------------------|
      1. |      1          3 |
      2. |      2          4 |
      3. |      3          2 |
      4. |      4          3 |
      5. |      5          4 |
         +-------------------+
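
    For readers who cannot install geonear, the same count can be sketched with built-in commands only. This is a rough alternative, not the method used above: it pairs every school with every house using cross and applies the haversine formula on a sphere with an assumed mean Earth radius of 6371 km, so distances may differ slightly from those reported by geonear. The variable names school_lat, school_lon, dlat, dlon, a, km and near are made up for this illustration.

    ```stata
    * sketch only: assumes schools.dta and houses.dta from the example above
    use "schools.dta", clear
    rename (lat lon) (school_lat school_lon)

    * form all school-house pairs (5 x 50 = 250 observations here)
    cross using "houses.dta"

    * haversine distance in km on a sphere; 6371 is an assumed mean Earth radius
    gen double dlat = (lat - school_lat) * _pi / 180
    gen double dlon = (lon - school_lon) * _pi / 180
    gen double a = sin(dlat / 2)^2 + ///
        cos(school_lat * _pi / 180) * cos(lat * _pi / 180) * sin(dlon / 2)^2
    gen double km = 2 * 6371 * asin(min(1, sqrt(a)))

    * flag houses within 15km, then sum the flags per school
    gen byte near = km <= 15
    collapse (sum) within15 = near, by(school)
    list
    ```

    Because every school-house pair is kept, a school with no house in range ends up with a count of 0 rather than being dropped. The number of pairs grows as schools times houses, so this is only practical for small datasets.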


    • #3
      On further thought, the following is probably a better example. In this case, school #3 has no house within the desired distance. The collapse command is used to count the number of houses per school. The final count is then merged back with the original school data.

      Code:
      * toy school data
      clear
      set seed 3214132
      set obs 5
      gen school = _n
      gen double lat = runiform()
      gen double lon = runiform()
      gen students = int(1000 * runiform())
      save "schools.dta", replace
      
      * toy house data
      clear
      set obs 50
      gen house = _n
      gen double lat = runiform()
      gen double lon = runiform()
      save "houses.dta", replace
      
      * find houses within 14km of each school. use the near(0) option
      * to exclude schools if no house in range
      use "schools.dta", clear
      geonear school lat lon using "houses.dta", n(house lat lon) within(14) near(0) long
      list, sepby(school)
      
      * number of houses
      collapse (count) within14 = house, by(school)
      list
      
      * merge back with original school data
      merge 1:1 school using "schools.dta", nogen
      sort school
      list
      The result is:

      Code:
      . list, sepby(school)
      
           +----------------------------+
           | school   house   km_to_h~e |
           |----------------------------|
        1. |      1      13   6.9290569 |
        2. |      1      34   13.409289 |
        3. |      1      10   13.528282 |
           |----------------------------|
        4. |      2       2   2.9900385 |
        5. |      2      12   13.001577 |
           |----------------------------|
        6. |      4      38   9.0754699 |
        7. |      4      18   12.106559 |
        8. |      4      29   12.312661 |
           |----------------------------|
        9. |      5      30   7.1974658 |
       10. |      5       4   9.5068169 |
       11. |      5      19   12.501883 |
           +----------------------------+
      
      .
      . * number of houses
      . collapse (count) within14 = house, by(school)
      
      . list
      
           +-------------------+
           | school   within14 |
           |-------------------|
        1. |      1          3 |
        2. |      2          2 |
        3. |      4          3 |
        4. |      5          3 |
           +-------------------+
      
      .
      . * merge back with original school data
      . merge 1:1 school using "schools.dta", nogen
      
          Result                           # of obs.
          -----------------------------------------
          not matched                             1
              from master                         0  
              from using                          1  
      
          matched                                 4  
          -----------------------------------------
      
      . sort school
      
      . list
      
           +------------------------------------------------------+
           | school   within14         lat         lon   students |
           |------------------------------------------------------|
        1. |      1          3    .4453467   .09112254        605 |
        2. |      2          2   .83407076   .24417454        167 |
        3. |      3          .    .7268483   .64209338        980 |
        4. |      4          3   .22134942   .58238519        640 |
        5. |      5          3   .73406389   .37945151        964 |
           +------------------------------------------------------+
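
      If a zero is preferred to a missing value for schools with no house in range (school 3 here), one possible follow-up, assuming the merged data from the code above is still in memory, is:

      ```stata
      * schools excluded by near(0) come back from the merge with a missing count
      replace within14 = 0 if missing(within14)
      list school within14 students
      ```
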
      Last edited by Robert Picard; 23 Feb 2016, 17:01. Reason: changed within15 to within 14
