  • r(2000) no observations in panel data regression (new to Stata)

    Hello, I am very new to Stata. I'm currently trying to run a panel regression, and the command I'm using is xtreg taxa_obito taxa_cf

    The variable taxa_cf is the rate of medical facilities in neighborhood i (the panel variable) in month j (the time variable), computed as (number of facilities / city population) * 100,000. I want to see the effect of growth in the number of medical facilities on the diabetes mellitus death rate (taxa_obito is the diabetes mortality rate by neighborhood and month). The time variable (date) runs from January 2006 to December 2016, and there are 158 neighborhoods.

    Before 2009 there were literally no medical facilities in those neighborhoods, so taxa_cf has a lot of zeros. Even today, not all neighborhoods have received medical facilities, so there are still some zeros after 2009. I first thought of a difference-in-differences approach, but the problem is that the treatment was gradual: a given neighborhood might have received one medical facility in January 2009, another in May 2011, another in June 2014, and so on, and the more facilities a neighborhood has, the more impact they supposedly make.

    I could merge the neighborhoods into bigger locations called "planning regions" (N would become something like 15 instead of 158). That way taxa_cf would always be positive after October 2010, but I'm not sure whether losing the 158 locations would be wise.

    Honestly I'm lost here, I gladly welcome all tips, hints, critiques, anything.

    Thank you all.

  • #2
    It is most unclear whether the title of your post has anything to do with the content.

    Let me assume the title is correct: you are running that regression and getting a message saying "r(2000) no observations." This would not have anything to do with zero values of the variables in the regressions. There are two reasons you would be getting this:

    1. The pattern of missing values (not zeroes) in the data is such that every observation has a missing value for at least one of the regression variables. In that case the regression cannot be done, as no observation with any missing values among the regressors can be part of the estimation sample. If you are using missing values as a code for zero, you should change them back to zero. Missing values should never be used to code for real numeric values (nor vice versa, in my opinion, notwithstanding widespread usage.) Missing values should mean that the value is not observed, nothing else.

    2. The more likely explanation is that one of your regressor variables is a string variable. String variables cannot participate in regressions at all, and their presence is interpreted by Stata as missing values on all observations, hence no observations left for the regression. The way to see if this is what's happening is to run -describe taxa_obito taxa_cf-. The second column of the output table from that command gives you the storage type. If a storage type begins with the letters str, then that variable is a string variable. The solution then is to convert it to numeric. Most likely the appropriate way to do that is with the -destring- command, but without actually seeing an example of the data it is hard to be certain of this.
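
The two checks above can be sketched in a few lines. This is only an illustration on made-up toy data: the values are hypothetical, and only the variable names taxa_obito and taxa_cf come from the original post. It assumes the strings contain plain numbers with a period as the decimal separator; if the data use a decimal comma, -destring- has a dpcomma option that may be needed instead.

```stata
* Toy data (hypothetical values): numbers stored as strings, as in reason 2
clear
input str8 taxa_obito str8 taxa_cf
"1.2" "0"
"0.8" "3.5"
"2.1" "0"
end

* Reason 2: a storage type beginning with str marks a string variable
describe taxa_obito taxa_cf

* Convert to numeric; destring leaves a variable unchanged (with a note)
* if any of its values contains nonnumeric characters
destring taxa_obito taxa_cf, replace
confirm numeric variable taxa_obito taxa_cf

* Reason 1: count observations with no missing value in either variable;
* a count of zero would explain "no observations"
count if !missing(taxa_obito, taxa_cf)
```

If -confirm- stops with an error after -destring-, at least one variable still holds nonnumeric text and needs a closer look before conversion.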

    If the above suggestions do not show you the source of your problem and enable you to resolve it, I suggest posting back with an example of your data (be sure to use the -dataex- command to do this), and also show the exact command you ran that led to the error message.
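
As a rough sketch of what a -dataex- call could look like, here it is run on hypothetical toy data; only the variable names come from the original post. In recent Stata releases -dataex- should already be installed; on older ones it may need to be installed first with -ssc install dataex-.

```stata
* Toy data (hypothetical values) standing in for the real dataset
clear
input taxa_obito taxa_cf
1.2 0
0.8 3.5
2.1 0
end

* Print a copy-and-paste data example for the first three observations
dataex taxa_obito taxa_cf in 1/3
```

The output is a block of Stata code that others can paste into their own session to recreate the example, storage types included.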

    Now, it may be that you are not actually getting an error message and you are just concerned about the distributions of zeroes in your data and wondering whether a different model might be in order. That's a different question altogether.

    • #3
      In reply to #2 by Clyde Schechter:
      Thank you, Clyde. Indeed, the variables were stored as strings; I've converted them and the regression seems to be working. Like I said, I'm very new to this program, so I'm still struggling with it a lot. As for the model, I am indeed worried about all the zeroes. I'm currently thinking about reducing N, but I wonder whether a panel model with FE will still be adequate. I'm still trying to figure out a solution, but as you said, that's a different matter altogether. Thank you!
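
For readers following along, a fixed-effects run on numeric variables might look like the sketch below. Everything in it is simulated: the panel identifier bairro is a hypothetical name, the data are random, and the estimates mean nothing. It only shows the order of the steps, declaring the panel with -xtset- and then calling -xtreg- with the fe option.

```stata
* Simulated toy panel: 3 neighborhoods x 24 months (all values hypothetical)
clear
set seed 12345
set obs 3
generate bairro = _n
expand 24
bysort bairro: generate date = ym(2006, 1) + _n - 1
format date %tm

* Facility rate is zero in the first year, positive afterwards
generate taxa_cf = cond(date >= ym(2007, 1), runiform() * 5, 0)
generate taxa_obito = 2 - 0.1 * taxa_cf + rnormal()

* Declare the panel structure, then fit the fixed-effects model
xtset bairro date
xtreg taxa_obito taxa_cf, fe
```

Zeros in taxa_cf are kept in the estimation sample here; only missing values would drop observations.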
