  • Error using spgenerate

    Hello Statalisters,

    I am trying to create a spatial lag variable manually using the spgenerate command. I first create my spatial weight matrix and then use it to multiply the variable I want to spatially lag. Below is my code:

    Code:
    **xtset the data.
    xtset ID YEAR
    
    save mypanel, replace
    
    **Spset data
    use mypanel
    spset ID, coord(lon lat)
    
    **Set the coordinate system (latitude/longitude) and the distance units (miles)
    spset, modify coordsys(latlong, miles)
    
    
    **Save the data
    save, replace
    
    
    ***Create an inverse-distance weight matrix W for point locations within a 0.5-mile radius
    
     spmatrix create idistance W if YEAR == 2009, vtruncate(2) normalize(row)
    
     spmatrix summarize W
    
    Weighting matrix  W
    ---------------------------------------
               Type |            idistance
      Normalization |                  row
          Dimension |          2105 x 2105
    Elements        |
       minimum      |                    0
       minimum > 0  |             .0572685
       mean         |             .0003566
       max          |                    1
    ---------------------------------------
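    For reference, here is a sketch of what `spgenerate Wfee = W*fee` computes with this matrix. It assumes, as the code comment above intends, that vtruncate(2) zeroes every inverse-distance weight of 2 or less (distances of 0.5 miles or more) before row normalization, and it is written for rows that have at least one neighbor:

```latex
\tilde{w}_{ij} =
\begin{cases}
  1/d_{ij} & \text{if } i \neq j \text{ and } 1/d_{ij} > 2 \ (\text{i.e., } d_{ij} < 0.5 \text{ miles}) \\
  0        & \text{otherwise}
\end{cases}
\qquad
w_{ij} = \frac{\tilde{w}_{ij}}{\sum_{k} \tilde{w}_{ik}}
\qquad
(W\,\text{fee})_i = \sum_{j=1}^{2105} w_{ij}\,\text{fee}_j
```

    Because W is 2105 x 2105 (one row and one column per ID in 2009), the product needs exactly one value of fee per ID, that is, one year of the panel at a time.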
    Now, to spatially lag a variable, fee, I run:
    Code:
    spgenerate Wfee = W*fee
    But I get the following error:
    Code:
    _IDs in weighting matrix W do not match _IDs in estimation sample
        There are places in W not in estimation sample and places in estimation sample not in W05.
    Is this because of the "islands"? If not, how do I resolve it?

    Any help is much appreciated.

    Thank you.

  • #2
    Dear Statalisters,

    Does anyone have an idea how to resolve my error, please?

    • #3
      FAQ 12 advises you to present a reproducible example to increase your chances of obtaining helpful replies. The following is probably the issue, but I cannot guarantee it because I am not able to test the proposed solution and I am unwilling to create a reproducible example on your behalf.

      spmatrix create idistance W if YEAR == 2009, vtruncate(2) normalize(row)
      Here you create a matrix using a condition, i.e., "if YEAR == 2009". Thereafter, you instruct Stata to generate a variable using the full set of observations:

      Now to spatially lag a variable, fee:
      Code:
      spgenerate Wfee = W*fee
      So it makes sense that Stata complains:

      _IDs in weighting matrix W do not match _IDs in estimation sample
      There are places in W not in estimation sample and places in estimation sample not in W05.
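      One way to see the mismatch is to compare the size of W with the number of observations spgenerate is asked to use. A minimal sketch, run on the public homicide panel (the same Stata Press files used later in this thread) in place of the private data from #1; with that data, the analogous check compares the 2105 rows of W with the observation count across all years:

```stata
* Sketch: why W*var fails on the full panel (public homicide panel, not the OP's data)
copy https://www.stata-press.com/data/r16/homicide_1960_1990.dta ., replace
copy https://www.stata-press.com/data/r16/homicide_1960_1990_shp.dta ., replace
use homicide_1960_1990, clear
xtset _ID year
spset
* W is built from one cross-section, so it has one row/column per county
spmatrix create contiguity W if year == 1990, replace
spmatrix summarize W
count if year == 1990      // one observation per county: matches W
count                      // full panel: one observation per county per year
duplicates report _ID      // every _ID appears once per year, i.e. several times
```

      The same spgenerate call that fails on all years succeeds once the sample is restricted to a single year, which is what the suggestions below do.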
      Depending on whether spgenerate allows the -if- qualifier, you may do the following:

      Code:
      spgenerate Wfee = W*fee if YEAR==2009
      or, alternatively:

      Code:
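      tempfile wfee2009
      preserve
      keep if YEAR==2009
      spgenerate Wfee = W*fee
      * keep the new variable and the identifiers, then restore and merge back
      keep ID YEAR Wfee
      save `wfee2009'
      restore
      merge 1:1 ID YEAR using `wfee2009', nogenerate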

      • #4
        Thank you, Andrew Musau. Actually, this is a strongly balanced panel, and the spset variable, ID, uniquely identifies each panel across the years.
        Code:
         
        //Verify that unit and YEAR jointly identify the observations
        assert ID!=.
        assert YEAR!=.
        bysort ID YEAR: assert _N==1
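        The three checks above can be collapsed into a single isid call: isid fails if the variables contain missing values (unless the missok option is given) or do not uniquely identify the observations. A minimal sketch on the public homicide panel used later in this thread; with the data from #1 the call would be `isid ID YEAR`:

```stata
* Sketch: one-line check that the panel and time identifiers are valid keys
copy https://www.stata-press.com/data/r16/homicide_1960_1990.dta ., replace
use homicide_1960_1990, clear
* errors out if _ID or year is missing, or if (_ID, year) is not unique
isid _ID year
* xtset also reports whether the panel is strongly balanced
xtset _ID year
```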
        Using the -if- qualifier produces missing values for the Wfee variable for all other years except 2009. As for the suggested alternative, I am not sure how that works. My data are somewhat private. But I will try to see how I can generate a reproducible sample so I can share it here to see if you or any other person can help me out on this.

        Thanks.

        • #5

          Using the -if- qualifier produces missing values for the Wfee variable for all other years except 2009.
          Why are you surprised by this? It's exactly what you asked for by using the -if- qualifier.

          But I will try to see how I can generate a reproducible sample so I can share it here to see if you or any other person can help me out on this.
          I will save you the work. The following replicates your error and shows that my suggestion in #3 is right on the money. There is no evidence that you tried what was suggested.

          Code:
          copy https://www.stata-press.com/data/r16/homicide1990.dta .
          copy https://www.stata-press.com/data/r16/homicide1990_shp.dta .
          use homicide1990, clear
          spset
          spmatrix create contiguity W if _n<=500
          spgenerate W_gini = W*gini
          spgenerate W_gini = W*gini if _n<=500
          Result:

          Code:
          . copy https://www.stata-press.com/data/r16/homicide1990.dta .
          
          . copy https://www.stata-press.com/data/r16/homicide1990_shp.dta .
          
          .
          . use homicide1990, clear
          (S.Messner et al.(2000), U.S southern county homicide rates in 1990)
          
          .
          . spset
            Sp dataset homicide1990.dta
                          data:  cross sectional
               spatial-unit id:  _ID
                   coordinates:  _CX, _CY (planar)
              linked shapefile:  homicide1990_shp.dta
          
          .
          . spmatrix create contiguity W if _n<=500
            weighting matrix in W contains 2 islands
          
          .
          . spgenerate W_gini = W*gini
          _IDs in weighting matrix W do not match _IDs in estimation sample
              There are places in W not in estimation sample and places in estimation sample not in W.
          r(459);
          
          .
          . spgenerate W_gini = W*gini if _n<=500
          Last edited by Andrew Musau; 13 Aug 2020, 01:06.

          • #6
            Hello Andrew Musau, I think you may be missing the point that I am dealing with panel data. With a single cross-section, your suggestion would have resolved the error.

            • #7
              Does that matter? Provide a data example for any further input from me.

              • #8
                Hello Andrew Musau, I tried the panel version of the homicide data you used. After a number of tries, below is what I came up with.

                Code:
                . copy https://www.stata-press.com/data/r16/homicide_1960_1990.dta .
                
                . copy https://www.stata-press.com/data/r16/homicide_1960_1990_shp.dta .
                
                
                . use homicide_1960_1990
                (S.Messner et al.(2000), U.S southern county homicide rate in 1960-1990)
                
                
                . xtset _ID year
                       panel variable:  _ID (strongly balanced)
                        time variable:  year, 1960 to 1990, but with gaps
                                delta:  1 unit
                
                . spset
                  Sp dataset homicide_1960_1990.dta
                                data:  panel
                     spatial-unit id:  _ID
                             time id:  year (see xtset)
                         coordinates:  _CX, _CY (planar)
                    linked shapefile:  homicide_1960_1990_shp.dta
                
                
                . spmatrix create contiguity W if year == 1990
                
                
                . spgenerate Wue = W*unemployment
                _IDs in weighting matrix W do not match _IDs in estimation sample
                    There are places in W not in estimation sample and places in estimation sample not in W.
                r(459);
                
                ** So I decided to create the lag variable for each year
                
                . spgenerate W_ue60 = W*unemployment if year == 1960
                
                . 
                . spgenerate W_ue70 = W*unemployment if year == 1970
                
                . 
                . spgenerate W_ue80 = W*unemployment if year == 1980
                
                . 
                . 
                . spgenerate W_ue90 = W*unemployment if year == 1990
                
                ***Then I generate W_ue as zeros and fill it in as follows:
                
                . gen W_ue = 0
                
                . replace W_ue = W_ue60 if year == 1960
                (1,412 real changes made)
                
                
                . replace W_ue = W_ue70 if year == 1970
                (1,412 real changes made)
                
                . replace W_ue = W_ue80 if year == 1980
                (1,412 real changes made)
                
                . replace W_ue = W_ue90 if year == 1990
                (1,412 real changes made)
                
                . drop W_ue60 W_ue70 W_ue80 W_ue90
                .
                Now, while this achieves what I wanted (the manual spatial lag W_ue), it is extremely laborious and inefficient, because I need to lag about 15 variables in my dataset. Is it possible to do this more efficiently?

                Thanks.

                • #9
                  Code:
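                  use homicide_1960_1990, clear
                  xtset _ID year
                  spset
                  * Build W from one cross-section; the _IDs are the same in every year
                  spmatrix create contiguity W if year == 1990
                  * Lag one year at a time, so each call sees exactly one row per _ID
                  levelsof year, local(years)
                  foreach year in `years' {
                      spgenerate XYZ`year' = W*unemployment if year==`year'
                  }
                  * Each observation is nonmissing in exactly one XYZ* variable
                  egen W_ue= rowmax(XYZ*)
                  drop XYZ*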
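                  The loop above handles one variable. For about 15 variables, the same idea can be wrapped in an outer loop over a variable list. A minimal sketch, not from the thread: the list name `tolag` and the temporary-variable approach are illustrative choices, unemployment is taken from #8, and gini is assumed to be present in the panel file as it is in the 1990 cross-section from #5:

```stata
* Sketch: spatial lags of several variables, one year at a time
copy https://www.stata-press.com/data/r16/homicide_1960_1990.dta ., replace
copy https://www.stata-press.com/data/r16/homicide_1960_1990_shp.dta ., replace
use homicide_1960_1990, clear
xtset _ID year
spset
spmatrix create contiguity W if year == 1990, replace

* Hypothetical list: replace with your own variables (gini assumed to exist)
local tolag unemployment gini
levelsof year, local(years)
tempvar lag

foreach v of local tolag {
    generate double W_`v' = .              // will hold the spatial lag of `v'
    foreach y of local years {
        * one cross-section per call, so the _IDs match those in W
        spgenerate `lag' = W*`v' if year == `y'
        quietly replace W_`v' = `lag' if year == `y'
        drop `lag'
    }
}
summarize W_unemployment W_gini
```

                  Filling each W_ variable directly avoids creating one helper variable per year and the rowmax step, which keeps the number of intermediate variables constant as the variable list grows.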

                  • #10
                    Thanks, Andrew Musau.
