  • Identifying nearest neighbors

    Hi Folks, I have a number of water wells, whose depths are recorded. I know their coordinates, and want to determine those that are located within 1 km of other wells. Will teffects get me there? Ultimately, I would like to do a spatial regression, but this is unbalanced panel data. I think my best bet is to take averages of nearest neighbors.

    Thanks!

    -Steve

  • #2
    So I believe your first question is how to identify wells that are located within 1 km of other wells. I don't know what your data looks like (see the FAQ for dataex), so I generated some sample data.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float wellNo str24(longitude latitude)
     1 "-152.43950945" "-21.85213278"
     2 "-152.3726578"  "-21.9021224"
     3 "-152.40563908" "-21.74724129"
     4 "-152.40739776" "-21.91309043"
     5 "-152.32675728" "-21.80928953"
     6 "-152.40195805" "-21.79260702"
     7 "-152.39944152" "-21.76482483"
     8 "-152.44531577" "-21.79091045"
     9 "-152.50109519" "-21.84645562"
    10 "-152.37789756" "-21.81652595"
    11 "-152.37917162" "-21.84705634"
    12 "-152.46804084" "-21.77047508"
    13 "-152.48958026" "-21.82356655"
    14 "-152.46396156" "-21.81541373"
    15 "-152.39214841" "-21.83137453"
    16 "-152.43509205" "-21.77150633"
    17 "-152.33159817" "-21.85552058"
    18 "-152.48673575" "-21.85673473"
    19 "-152.33912399" "-21.78932653"
    20 "-152.41024388" "-21.83814376"
    21 "-152.35050844" "-21.79710933"
    22 "-152.48882972" "-21.8151216"
    23 "-152.45806256" "-21.89380419"
    24 "-152.40388105" "-21.7423344"
    25 "-152.33906248" "-21.80901056"
    26 "-152.429231"   "-21.84659144"
    27 "-152.34997744" "-21.79235219"
    28 "-152.4138936"  "-21.89970719"
    29 "-152.4127534"  "-21.77729596"
    30 "-152.463331"   "-21.89522026"
    end
    The following creates a variable, hasNearby, equal to 1 if another well lies within 1 km of the well in that observation, and 0 otherwise.
    Code:
    ssc install geodist
    
    gen hasNearby = 0
    local N = _N
    
    forvalues i = 1/`N' {
        local lat1 = latitude[`i']
        local long1 = longitude[`i']
        forvalues j = 1/`N' {
            if `i' == `j' {
                continue
            }
            else {
                local lat2 = latitude[`j']
                local long2 = longitude[`j']
                geodist `lat1' `long1' `lat2' `long2'
                * flag well i and stop scanning as soon as one neighbor within 1 km is found
                if `r(distance)' <= 1 {
                    replace hasNearby = 1 in `i'
                    continue, break
                }
            }
        }    
    }
    Perhaps someone can comment on the other aspects.
    Last edited by Andrew Castro; 30 Jan 2017, 19:51.



    • #3
      Thanks Andrew, very good, much appreciated. I have a rich, long data set in which wells were not tested every year, so it's unbalanced. A spatial analysis seems like the best approach, but the unbalanced structure makes it a challenge. I wonder if there is some way to use a different weight matrix for each year in the series.
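
One possible reading of "a different weight matrix for each year" is to pair each well only with the wells that were actually tested in the same year, then average over the neighbors within 1 km. The sketch below is a minimal illustration under that assumption, not a full spatial regression. The toy panel, the variable names (year, depth, nbr_depth, n_nbrs) and all values are hypothetical; geodist is from SSC.

```stata
* Hypothetical toy panel: wells 1-3 sit close together, well 4 is far away.
* Well 3 has no reading in 2011, so the panel is unbalanced.
clear
input float wellNo int year double(latitude longitude depth)
1 2010 -21.850 -152.440 30
2 2010 -21.852 -152.442 40
3 2010 -21.851 -152.438 50
4 2010 -21.700 -152.300 90
1 2011 -21.850 -152.440 32
2 2011 -21.852 -152.442 44
4 2011 -21.700 -152.300 95
end
tempfile wells
save `wells'

* pair every well with every other well observed in the same year
rename (wellNo latitude longitude depth) (wellNo0 latitude0 longitude0 depth0)
joinby year using `wells'
drop if wellNo0 == wellNo

* distances in km between the two wells of each pair
geodist latitude0 longitude0 latitude longitude, gen(km)

* average depth of same-year neighbors within 1 km
keep if km <= 1
collapse (mean) nbr_depth = depth (count) n_nbrs = depth, by(wellNo0 year)
list, sepby(year)
```

Wells with no tested neighbor within 1 km in a given year (well 4 here) drop out of the result, so the set of neighbors, and hence the implied weights, changes from year to year.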



      • #4
        Originally posted by Andrew Castro
        So I believe your first question is identifying wells who are located within 1 km of other wells. I don't know what your data looks like (see the FAQ for dataex), but I generated some sample data.

        Quick question: what if the latitude and longitude are in km? Do I have to reproject the data?



        • #5
          Originally posted by Steven Archambault
          Thanks Andrew, very good, much appreciated. I have a rich, long data set in which wells were not tested every year, so it's unbalanced. A spatial analysis seems like the best approach, but the unbalanced structure makes it a challenge. I wonder if there is some way to use a different weight matrix for each year in the series.
          I'm not familiar with the kind of spatial analysis you are referring to. Could you link to a paper that discusses it? Otherwise, someone with more expertise may be able to help with that.


          Originally posted by Steven Archambault

          Quick question: what if the latitude and longitude are in km? Do I have to reproject the data?
          Latitude and longitude are measured in degrees. A value like "50 km lat" doesn't make sense unless it is measured from a baseline coordinate or something similar. If you could post a sample of your data using dataex, it would be a lot clearer what's going on, and you'd be more likely to get a response from others (see http://www.statalist.org/forums/help#stata).
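
To make the degrees-versus-kilometers point concrete, here is a minimal sketch that converts a pair of coordinates in decimal degrees into a distance in km by hand, using the haversine (great-circle) formula. The two points are wells 1 and 26 from the sample data. The earth radius of 6371 km is an assumed mean value, and geodist by default works on a reference ellipsoid, so its result can differ slightly from this spherical approximation.

```stata
clear
input double(lat1 lon1 lat2 lon2)
-21.85213278 -152.43950945 -21.84659144 -152.429231
end

* degrees-to-radians factor; Stata's trig functions expect radians
scalar d2r = _pi / 180

* haversine term for the two points
gen double a = sin((lat2 - lat1) * scalar(d2r) / 2)^2 + ///
    cos(lat1 * scalar(d2r)) * cos(lat2 * scalar(d2r)) * ///
    sin((lon2 - lon1) * scalar(d2r) / 2)^2

* great-circle distance in km (6371 km is an assumed mean earth radius)
gen double km = 2 * 6371 * asin(sqrt(a))
list km
```

The two wells differ by only about 0.006 degrees of latitude and 0.010 degrees of longitude, yet they are roughly 1.2 km apart, which is why a 1 km threshold cannot be applied to the raw degree values.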



          • #6
            Note that there are more efficient ways to find nearest neighbors using geographic coordinates. The most efficient by far is to use geonear (from SSC). The other is to skip the double loop, form all pairwise combinations, and then calculate distances with one call to geodist (also from SSC). Using Andrew's sample data (in numeric form):

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float wellNo double(longitude latitude)
             1 -152.43950945 -21.85213278
             2  -152.3726578  -21.9021224
             3 -152.40563908 -21.74724129
             4 -152.40739776 -21.91309043
             5 -152.32675728 -21.80928953
             6 -152.40195805 -21.79260702
             7 -152.39944152 -21.76482483
             8 -152.44531577 -21.79091045
             9 -152.50109519 -21.84645562
            10 -152.37789756 -21.81652595
            11 -152.37917162 -21.84705634
            12 -152.46804084 -21.77047508
            13 -152.48958026 -21.82356655
            14 -152.46396156 -21.81541373
            15 -152.39214841 -21.83137453
            16 -152.43509205 -21.77150633
            17 -152.33159817 -21.85552058
            18 -152.48673575 -21.85673473
            19 -152.33912399 -21.78932653
            20 -152.41024388 -21.83814376
            21 -152.35050844 -21.79710933
            22 -152.48882972  -21.8151216
            23 -152.45806256 -21.89380419
            24 -152.40388105  -21.7423344
            25 -152.33906248 -21.80901056
            26   -152.429231 -21.84659144
            27 -152.34997744 -21.79235219
            28  -152.4138936 -21.89970719
            29  -152.4127534 -21.77729596
            30   -152.463331 -21.89522026
            end
            save "temp.dta", replace
            
            * first method - use geonear (from SSC)
            rename wellNo wellNo0
            geonear wellNo0 latitude longitude using "temp.dta", ///
                neighbor(wellNo latitude longitude) within(1) long
            bysort wellNo0 (km_to_wellNo): keep if _n == _N
            list if km_to_wellNo > 0
            
            * second method - form all pairwise combinations of points
            use "temp.dta", clear
            rename * *0
            cross using "temp.dta"
            geodist latitude0 longitude0 latitude longitude, gen(km_to_wellNo)
            bysort wellNo0 (km_to_wellNo): egen hasNearby = total(km_to_wellNo > 0 & km_to_wellNo < 1)
            by wellNo0: keep if _n == 1
            list if hasNearby
            
            erase "temp.dta"
            If the locations are in map coordinates (in meters, kilometers, yards, etc.) instead of geographic coordinates, then you can only compute distances on the map (using Euclidean distances). All map projections involve distortions because portions of a spheroid are transferred to a plane, so distance accuracy depends on the map projection used. Here's a quick example (tweaked from a recent Statalist post):

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input double(PrimaryKey Longitude Latitude)
             2600000660 46829723 9855609
            26010002399 54790663 9426379
            12345677888 54799999 9426666
            end
            format %15.0g PrimaryKey
            save "temp.dta", replace
            
            * form all pairwise combination of points
            rename * *0
            cross using "temp.dta"
            
            * cartesian distance
            gen distance = sqrt((Longitude-Longitude0)^2 + (Latitude-Latitude0)^2)
            
            * so what's the unit of distance?
            sort PrimaryKey0 distance PrimaryKey
            list, sepby(PrimaryKey0)
            
            erase "temp.dta"



            • #7
              Thanks folks. I was able to get the coordinates in decimal degrees.

              Here is an example of a spatial autoregressive approach using an unbalanced panel.



              • #8
                Hello, I have data that include the coordinates of neighborhood centroids. I want to create a spatial weight matrix based on each centroid's 5 nearest neighboring centroids, and then use it to create a spatially lagged variable. I have looked on the web but could not find appropriate code. Could you please help me create the 5-nearest-neighbor matrix and the spatially lagged variable? Thanks in advance.
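
One way to approach this, reusing the pairwise technique from #6, is sketched below. It keeps the 5 nearest centroids for each centroid and takes the equally weighted mean of their values as the spatial lag. The long list of (id0, id) pairs plays the role of a sparse 5-nearest-neighbor weight matrix. The data, the variable names (id, x, x_lag, n_nbrs) and the equal-weight choice are all assumptions made for illustration; geodist is from SSC.

```stata
* Hypothetical data: 7 centroids with an outcome x (values invented)
clear
input int id double(latitude longitude x)
1 -21.80 -152.40 10
2 -21.81 -152.40 20
3 -21.83 -152.40 30
4 -21.86 -152.40 40
5 -21.90 -152.40 50
6 -21.95 -152.40 60
7 -22.01 -152.40 70
end
tempfile pts
save `pts'

* form all pairwise combinations, excluding each centroid paired with itself
rename (id latitude longitude x) (id0 latitude0 longitude0 x0)
cross using `pts'
drop if id0 == id
geodist latitude0 longitude0 latitude longitude, gen(km)

* keep the 5 nearest neighbors of each centroid (ties broken by id)
bysort id0 (km id): keep if _n <= 5

* spatial lag with equal weights: mean of the 5 neighbors' x
collapse (mean) x_lag = x (count) n_nbrs = x, by(id0 x0)
list
```

The cross step creates N^2 pairs, so for a large number of centroids a dedicated nearest-neighbor command such as geonear is likely to be a better fit than this sketch.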
