Hello, I am working with Indian village-level data and want to check for spatial dependence in my dependent variable, which is binary. The dataset is large, with more than 235,000 villages. I create a spatial lag of order 1 with the spgen command, following Kondo (2018):
Code:
spgen depvar, lat(vill_lat) lon(vill_lon) swm(pow 2) dist(20) duni(km) large
I am able to include the spatial lag in my OLS regression, and its coefficient is highly significant. The same holds in a logit model. But the spatial lag of the dependent variable is endogenous, so the OLS estimates are inconsistent. Kondo (2018) suggests that, to overcome this endogeneity, the spatial econometric model should be estimated by maximum likelihood, instrumental variables (IV), or the generalized method of moments. Others suggest spatial autoregressive models with spatial autoregressive disturbances (SARAR models), estimated by the generalized spatial two-stage least squares (GS2SLS) procedure proposed by Kelejian and Prucha (1998).
The standard Stata commands such as
Code:
spregress y x, gs2sls dvarlag(Wy)
do not work, because I generated my spatial lag with the spgen command shown above.
I also tried
Code:
spregress y lagy xlist, gs2sls
It returns error "3900".
Can someone suggest Stata commands or manuals that would help with a dataset this large? How can I run the models mentioned above with the spatial lag I have already defined? How can I deal with the endogeneity concern? I also want to include the spatial lag of one of my independent variables of interest. Which models should I estimate in that case?
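To make the IV route concrete, this is roughly what I picture doing by hand with the spgen-generated lags. It is an untested sketch, not something I have run: y, x1, and x2 are placeholder variable names, and I am assuming that spgen names the power-weighted lag of a variable v as splag1_v_p (to be confirmed with describe). It treats the binary outcome as a linear probability model and uses the spatial lags of the regressors as instruments for the spatial lag of y, in the spirit of Kelejian and Prucha (1998).

```stata
* Untested sketch; y, x1, x2 are placeholder names.
* Assumption: with swm(pow 2), spgen stores the lag of v as splag1_v_p.
* Confirm the actual names after running spgen with: describe splag1_*

* Build spatial lags of y and of the regressors with the same weights
foreach v of varlist y x1 x2 {
    spgen `v', lat(vill_lat) lon(vill_lon) swm(pow 2) dist(20) duni(km) large
}

* Spatial 2SLS by hand: the lag of y is endogenous,
* and the lags of x1 and x2 serve as excluded instruments
ivregress 2sls y x1 x2 (splag1_y_p = splag1_x1_p splag1_x2_p), vce(robust)

* Check whether the instruments are strong enough
estat firststage
```

If the spatial lag of an independent variable also enters as a regressor, it can no longer act as an excluded instrument, so I presume other instruments (for example, higher-order spatial lags) would be needed.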