  • Using a count model to estimate a binary response model

    Hello, statalisters.
    I apologize in advance for this question because it might be too naive.
    I would like to know whether, statistically speaking, I can estimate a binary response model (usually estimated via probit or logit) using a count model instead.
    I ask because I'm struggling with the running time of -xtlogit- with FE, and I was wondering if I can use -ppmlhdfe- instead, which is much faster.
    I know the nature of the dependent variable is different, but perhaps it is an "acceptable" trick to deal with the computation time.
    For context, I have a binary response model with geographic (over 1,000 places) and time (10 years) fixed effects, plus other controls.
    Thank you for your advice.

  • #2
    https://stats.stackexchange.com/questions/18595/poisson-regression-to-estimate-relative-risk-for-binary-outcomes



    • #3
      Originally posted by George Ford View Post
      https://stats.stackexchange.com/questions/18595/poisson-regression-to-estimate-relative-risk-for-binary-outcomes
      Thank you George Ford. I had not seen that post before. I appreciate it.



      • #4
        My view is that yes, it is appropriate to use Poisson regression on binary outcomes. I think so because a binary outcome is a special case of a count outcome, where the count can only be 0 or 1.

        We have experts on this class of models here ( Jeff Wooldridge and Joao Santos Silva ), and like the OP, I would be interested to hear the experts' views on whether the Poisson model is appropriate for binary outcomes.
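        As a minimal sketch of the idea, using Stata's bundled auto data with foreign recast as a 0/1 outcome purely for illustration (none of these variables come from the thread), one can fit a Poisson regression with robust standard errors to a binary outcome and compare it with a logit:

        ```stata
        * Hypothetical illustration: Poisson on a 0/1 outcome.
        * With vce(robust), the coefficients are interpretable as
        * log relative risks; robust SEs guard against the
        * misspecified Poisson variance.
        sysuse auto, clear
        generate y = foreign              // 0/1 outcome, for illustration only
        poisson y mpg weight, vce(robust)
        estimates store pois
        logit y mpg weight
        estimates store lgt
        estimates table pois lgt, b se
        ```

        The two sets of coefficients will differ (log relative risks vs. log odds ratios), but when the outcome probabilities are small the fitted probabilities should be close.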



        • #5
          Like Joro, I would be interested to hear what Jeff thinks about this but, since I was tagged, here are my two cents.

          Before addressing the question, I would say that estimating a FE logit with 10 periods is always going to take a long time, so my first suggestion is that you simply let it run overnight and see if you get results. There is also another potential problem with Stata's implementation of this estimator: as far as I know (but I may be wrong!), Stata does not check for the existence of "separation" or perfect predictors in this estimator, and if this is an issue in your sample the estimator may never converge. Maybe you can try to estimate the model using just pairs of years in your sample (this should be quick) and see if those estimations converge; if they do, you can then use the estimates as starting values for the estimation with the full sample; hopefully that will speed things up.

          Moving now to the ppmlhdfe command: at the very least you can use it to investigate the separation problem above, because it will give you information about separated observations, and it can provide starting values.

          Whether the PPML estimates can be used for more than that will depend on your data. The obvious problem of using PPML in this context is that it assumes an exponential conditional expectation that will not be valid for binary data. However, if the probabilities you are estimating are all sufficiently close to zero, the exponential function will be a good approximation to the logistic and therefore the use of PPML can be justified on those grounds. So, perhaps you can start by computing the average of the dependent variable for each unit and see what that looks like. If you have a decent proportion of units with averages around 0.5 or above, I would not advise using PPML in your case.
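          The suggested diagnostic can be run in a few lines; a sketch with hypothetical variable names (a unit identifier `id` and a 0/1 outcome `y`, neither taken from the thread):

          ```stata
          * Hypothetical check: average of the 0/1 outcome by unit.
          * If most unit means are well below 0.5, the exponential
          * approximation to the logistic is more defensible.
          bysort id: egen unit_mean = mean(y)
          summarize unit_mean, detail
          count if unit_mean >= 0.5 & !missing(unit_mean)
          ```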

          Best wishes,

          Joao



          • #6
            Originally posted by Joao Santos Silva View Post
            Thank you, Dr. Santos Silva and Dr. Kolev. I ran -xtlogit- and it converged during the night. I am also checking the means for each unit, and it seems most of them are below 0.1.
            Regarding your other recommendation, is it possible to set initial values in xtlogit? Thank you.



            • #7
              I believe you can use the option from() to set the starting values; please check the maximization options.
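              A hedged sketch of that workflow, with hypothetical variable and year names (see [R] maximize for the from() syntax):

              ```stata
              * Hypothetical: fit on a two-year subsample, then reuse the
              * coefficient vector as starting values for the full panel.
              xtlogit y x1 x2 if inlist(year, 2010, 2011), fe
              matrix b0 = e(b)
              * skip lets Stata match coefficients by name and ignore
              * any that do not appear in the full model.
              xtlogit y x1 x2, fe from(b0, skip)
              ```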



              • #8
                I face very similar challenges. I am trying to estimate a gravity-style event-study regression with a binary dependent variable and a binary treatment variable. I have a panel covering trade in a certain product code between 324 regions over 10 years. The dependent variable, y_ijt, takes the value 1 if region i exports the product to region j at time t and 0 otherwise. The treatment variable is a policy change that affects a subset of the region pairs at t=4 (all in the same year). I want to see whether the policy change affected the probability that regions trade that product code. I have tried to run the following code:

                ppmlhdfe y c.treated#c.t_1 c.treated#c.t_2 …. c.treated#c.t_10, absorb(i.exporter_t i.importer_t i.exporter_importer) vce(cluster exporter_importer)

                (treated takes the value 1 if the exporter-importer pair was affected by the policy, t_n is a dummy variable that takes the value 1 if time equals n; I omit c.treated#c.t_4 to avoid collinearity)

                My concerns are the following:
                1. While I obtain output, I get the following message: “ReLU separation check: maximum number of iterations reached; aborting”. I get similar estimates when running the same specification with ppml_panel_sg, without any error messages.

                2. I worry that the probabilities I estimate are not close enough to zero for the exponential function to be a good approximation to the logistic function. The average probability of observing trade in this product between two regions is 0.25.

                3. I have tried to estimate the model using xtlogit, but the estimator does not converge. I have also tried using only pairs of years, without convergence either.

                I wonder if perhaps Joao Santos Silva or Sergio Correia may have some useful answers. I would be very thankful for any input.



                • #9
                  Dear Bengt Soderlund,

                  It is a bit early for me, so my brain may not be fully caffeinated, but I am afraid I do not see a good solution for this. From the description of your data, it looks as if a binary model would be preferable, but I do not think a logit would be consistent with the three sets of fixed effects. So, you are left with the PPML results, which will be somewhat unreliable because the exponential functional form may not be suitable. As for the message you are getting about the ReLU method, I think it is safe to ignore it, but you can try different flavours of the separation() option.

                  An alternative would be to model the volume of trade and compare results for the full sample and for the sub-sample where trade was positive before the policy change. This would allow you to use PPML and may allow you to see if an increase in trade volumes can be fully explained by the extensive margin.
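                  A hedged sketch of that comparison, with hypothetical variable names (`volume` for trade values, `pair` for the exporter-importer identifier, `t` for time; none are from the thread):

                  ```stata
                  * Hypothetical: PPML on trade volumes, full sample vs. the
                  * pairs that already traded before the policy change at t = 4.
                  bysort pair: egen traded_pre = max(volume > 0 & t < 4)
                  ppmlhdfe volume c.treated#i.t, absorb(exporter_t importer_t pair) vce(cluster pair)
                  ppmlhdfe volume c.treated#i.t if traded_pre, absorb(exporter_t importer_t pair) vce(cluster pair)
                  ```

                  If the full-sample effect is much larger than the effect among pre-policy traders, the difference is suggestive of an extensive-margin response.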

                  Best wishes,

                  Joao



                  • #10
                    Dear Joao Santos Silva ,

                    Thank you so much for taking the time to write this very valuable response. I did not know much about the various ways to handle separation. Also, comparing results between always-traders and the full sample to check the validity of PPML is a great suggestion.

                    Best,
                    Bengt
