Controlling for village fixed effects in large cross-section data

Parul Gupta

Join Date: Jun 2020

Posts: 135
#1

21 Mar 2022, 03:55

A reviewer has suggested that I control for village-level fixed effects in my probit regression. I have a large cross-section with more than 17,000 villages and around 500,000 observations in total. I tried the following command in Stata 14 but I get an -r(103)- error (too many variables). I also tried -set matsize- and -set maxvar-, but that didn't help.

Code:

probit y i.village_code $xvar

Is there a solution I could use?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17612
#2

21 Mar 2022, 04:36

Parul:
type -help about- to display the limits of your version of Stata.

Kind regards,
Carlo
(StataNow 18.5)
Parul Gupta

Join Date: Jun 2020
Posts: 135

21 Mar 2022, 06:17

Thank you for the suggestion. I got the following, but I am not sure how to interpret it:

Code:

Stata/MP 14.2 for Windows (32-bit)
Revision 19 Dec 2017
Copyright 1985-2015 StataCorp LLC

Total physical memory:     2097151 KB
Available physical memory: 2097151 KB

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2081
#4

22 Mar 2022, 21:01

I don't think this is a good idea from a statistical perspective, let alone a computational one. It will suffer from the incidental parameters problem when you don't have many units per village. Plus, in many villages you'll get perfect prediction if the outcome does not vary within the village. You could try a correlated random effects approach using the village-level averages of all covariates. You might want both the intercept and the slopes to vary by the number of observations per village. I discuss this in Section 20.3 of my MIT Press book, although there I suggested only including functions of the cluster size. I would actually include dummies, and you could even estimate a heteroskedastic probit with such village size dummies.
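
A minimal sketch of the correlated random effects idea described above, run on simulated stand-in data so that it is self-contained. Everything here is an assumption made for illustration rather than part of the original posts: the covariates x1 and x2, the generated names x1_bar, x2_bar, n_v and vsize, and the choice of one dummy per exact village size. With real data you would skip the simulation and loop over the variables in $xvar instead.

```stata
* --- Simulated stand-in data (hypothetical): 500 villages, 3 to 6 obs each ---
clear
set seed 12345
set obs 500
generate village_code = _n
generate c   = rnormal()
generate n_v = 3 + mod(_n, 4)
* expand leaves n_v copies of each village row in total
expand n_v
* covariates are correlated with the unobserved village effect c
generate x1 = 0.5*c + rnormal()
generate x2 = rnormal()
generate y  = (0.8*x1 - 0.5*x2 + c + rnormal() > 0)

* --- Village-level averages of all covariates ---
* With real data: foreach v of varlist $xvar { ... }
foreach v of varlist x1 x2 {
    bysort village_code: egen `v'_bar = mean(`v')
}

* --- Number of observations per village ---
bysort village_code: generate vsize = _N

* --- CRE probit: covariates, their village means, village-size dummies ---
probit y x1 x2 x1_bar x2_bar i.vsize, vce(cluster village_code)

* Average partial effects of the covariates
margins, dydx(x1 x2)

* --- Heteroskedastic probit: variance depends on village-size dummies ---
hetprobit y x1 x2 x1_bar x2_bar i.vsize, het(i.vsize) vce(cluster village_code)
```

This keeps the number of estimated parameters small (no 17,000 village dummies), so it should stay well inside Stata's limits. If the village sizes in the real data take many distinct values, grouping them into a few bins before using i.vsize is probably more practical. Interactions such as i.vsize#c.x1_bar would be one way to let the slopes on the averages vary with village size as well.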
