Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • geodist two datasets in matrix form

    Dear all,

    I have 2 sets of data - the first is 55,000 pairs of latitude and longitude corresponding to each house, and the second dataset of 300 pairs of latitude and longitude for each school.

    I wish to use the geodist function to find the distance between each house (55,000) and each school (300). In other words, I should get 55,000*300 distances.

    My purpose of doing this is so that I can find out for each house (55,000), how many schools (300) fall within certain range of distances.

    Each dataset looks like this

    HouseID Latitude Longitude

    SchoolID Latitude Longitude

    Would appreciate ideas on how this can be done efficiently.

    Thanks.

  • #2
    Elizabeth,

    Take a look at the joinby command, which does exactly what you are requesting. You will probably have to rename the latitude and longitude variables before using this command so that you can distinguish between them after the join.

    Regards,
    Joe

    Comment


    • #3
      The -geonear- package at SSC, also written by Robert Picard as is -geodist-, contains an option to achieve your ultimate goal. In the help for -geodist-, there is simulated data and code to illustrate finding all cell towers within 50km of the block group centroids for the state of Colorado, analogous to finding all schools within SomeDistance of each house. In your context, this command would translate to something like:

      Code:
      local SomeDistance = 100
      geonear Latitude Longitude using "SchoolFile.dta",  n(SchoolID Latitude Longitude ) within(`SomeDistance') near(0) long
      Regards, Mike

      Comment


      • #4
        Thank you, the geonear one was especially helpful for my purpose. I got the results I needed (I'll go check up the joinby command as well!)
        Mike, I've read up on the helpfile for geonear. Wondering if you might be able to assist me on this extension. Based on my above question, I was wondering if it were possible to add a date option (DDMMYYYY) on the geonear command as well.

        The thing is some schools are built only after the houses are built. I wish to find out only the schools within the distance only if it existed at the time the house did.

        So now my dataset variables is as follows:

        HouseID House_Lat House_Lon House_Date
        SchoolID School_Lat School_Lon School_Date

        Any ideas?

        Comment


        • #5
          I'm afraid that geonear cannot be used with such a date constraint. You can, however, use the brute force approach with a problem this size. This means using geodist to calculate the distance between each house and all schools that match your date constraint. It would go something like:

          Code:
          * Generate schools
          clear
          set seed 123456
          set obs 300
          gen SchoolID = _n
          gen double School_Lat = 37 + (41 - 37) * uniform()
          gen double School_Lon = -109 + (109 - 102) * uniform()
          gen School_Date = mdy(1,1,1990) + int(runiform() * 20 * 365)
          format %d School_Date
          tempfile schools
          save "`schools'"
                  
          * Generate houses   
          clear
          set obs 5500
          gen HouseID = _n
          gen double Latitude = 37 + (41 - 37) * uniform()
          gen double Longitude = -109 + (109 - 102) * uniform()
          gen House_Date = mdy(1,1,1990) + int(runiform() * 20 * 365)
          format %d House_Date
          
          * Form all pairwise combinations of schools and houses
          cross using "`schools'"
          
          * Use -geodist- from SSC to calculate distances.
          geodist Latitude Longitude School_Lat School_Lon, gen(d) sphere
          
          * For each house, find the closest school
          sort HouseID d SchoolID
          by HouseID: drop if School_Date > House_Date
          by HouseID: keep if _n == 1

          Comment


          • #6
            Thank you Robert, I will try the codes.
            On the same note, I managed another method that also worked. Using the previous data churned, I subtracted the 2 dates and if it returned negative, I dropped the entire observation. So all I was left with were the pairs of observations with the correct date sequences.

            Comment


            • #7
              I misread your initial post and did not notice that you wanted schools within a predetermined distance. In that case, the date constraint is secondary and can be resolved after geonear has identified neighboring schools. Here's a reworked example that finds all schools within 10km of each house. The example then shows how to recover the house and school dates. Finally, observations where the school opened after House_Date are dropped.

              Code:
              * Generate schools
              clear
              set seed 123456
              set obs 300
              gen SchoolID = _n
              gen double School_Lat = 37 + (41 - 37) * uniform()
              gen double School_Lon = -109 + (109 - 102) * uniform()
              gen School_Date = mdy(1,1,1990) + int(runiform() * 20 * 365)
              format %d School_Date
              tempfile schools
              save "`schools'"
                      
              * Generate houses   
              clear
              set obs 5500
              gen HouseID = _n
              gen double Latitude = 37 + (41 - 37) * uniform()
              gen double Longitude = -109 + (109 - 102) * uniform()
              gen House_Date = mdy(1,1,1990) + int(runiform() * 20 * 365)
              format %d House_Date
              tempfile houses
              save "`houses'"
              
              * Use -geonear- from SSC to identify all schools within a radius of 10km
              geonear HouseID Latitude Longitude using "`schools'", ///
                  n(SchoolID School_Lat School_Lon) long within(10) nearcount(0)
              
              list in 1/10, sepby(HouseID)
              
              * Recover house and school dates.
              merge m:1 HouseID using "`houses'", assert(match using) keep(match) keepusing(House_Date) nogen
              merge m:1 SchoolID using "`schools'", assert(match using) keep(match) keepusing(School_Date) nogen
              
              * Drop schools that do not exist on House_Date
              gen todrop = School_Date > House_Date
              sort HouseID km_to_SchoolID SchoolID
              list in 1/10, sepby(HouseID)
              drop if todrop

              Comment


              • #8
                Thank you Robert! It's works and it's faster!

                Comment

                Working...
                X