  • Poisson regression for panel data: exposure variable

    Hello. I am working on a research project that seeks to identify whether precipitation influences the occurrence of traffic accidents. For this, I am working with a data set covering 2010 to 2015, with data on traffic accidents and precipitation in 47 municipalities in Brazil. I have been thinking of adding the fleet (number of vehicles per municipality) as an exposure variable. I would like to know if for this, it is necessary that the exposure variable has data in the same time period as the other variables. I ask because the fleet data I have found are for the year 2016. If I am going to use the fleet as an exposure variable, is it necessary to have data from 2010 to 2015?

  • #2
    Could anyone please tell me what the effect of the exposure variable in the regression will be? I did not find much about the exposure variable in the manual.


    • #3
      I would like to know if for this, it is necessary that the exposure variable has data in the same time period as the other variables.
      Ideally, yes, the exposure variable should be ascertained for the same time period as the other variables. Sometimes, an estimate of the actual exposure is made using a value from a single time point within the actual time period. Doing this can introduce some bias, and would certainly add uncertainty to your estimates. But if that's the best data available, it's better than not including an exposure at all (unless the fleet sizes in the different municipalities are all the same).

      When you run one of these count models with an exposure variable, the logarithm of the exposure variable is entered into the regression as a regressor, and its coefficient is constrained to 1. The result is that the exposure variable supplies the units of the denominator of the outcome, so the model describes a rate rather than a raw count. Thus, if you include fleet size as an exposure, and your incidence rate ratio for, say, rain is estimated as 1.002, that means that the accident rate per car in the fleet is multiplied by 1.002 for each one-unit increase in rain. There is no separate "effect" of the exposure variable in the sense that other model variables have an effect.
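
      In equation form, the constraint can be sketched as follows. The notation is assumed for illustration only and does not come from the manual: i indexes municipalities, t indexes years, and rain stands for whatever precipitation measure is used.

      ```latex
      % Poisson model with fleet as exposure: log(fleet) enters with coefficient fixed at 1
      \log \mathrm{E}[\text{accidents}_{it} \mid \text{rain}_{it}]
        = \beta_0 + \beta_1\,\text{rain}_{it} + 1 \cdot \log(\text{fleet}_{it})

      % Moving the exposure term to the left-hand side gives a model for the rate per vehicle
      \frac{\mathrm{E}[\text{accidents}_{it} \mid \text{rain}_{it}]}{\text{fleet}_{it}}
        = \exp(\beta_0 + \beta_1\,\text{rain}_{it})

      % The incidence rate ratio for a one-unit increase in rain
      \text{IRR}_{\text{rain}} = \exp(\beta_1)
      ```

      If only a single year of fleet data is available, fleet_{it} is replaced by that one value for every t, which is the approximation discussed above.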


      • #4
        Thanks a lot, Clyde


        • #5
          Picking up on this thread again: can an exposure variable be used to find effects on proportions? For example, if I am interested in estimating the effect of an intervention in a company on the proportion of low-income customers (as a % of total customers), could I use the total number of customers as the exposure variable, with the number of low-income customers as my DV, in a count model such as PPML?

          Also, how would this differ from using glm or xtgee, which are often used when the DV is a proportion, as discussed here: https://www.statalist.org/forums/for...both-inclusive

          Clyde Schechter, I would appreciate your thoughts!


          • #6
            If you're mainly interested in the proportion of low-income customers, model that proportion using fracreg without an offset. Alternatively, use the binomial quasi-MLE available with glm, where the total number of customers is the upper bound.

            Code:
            fracreg logit proplowinc x1 ... xK, vce(robust)
            glm lowinccust x1 ... xK, fam(bin totalcust) link(logit) vce(robust)


            • #7
              Sorry, read too quickly. You can still use the above, but use vce(cluster id). Or, use xtgee with fam(bin) if you're using the fraction or fam(bin totalcust) if using the count.
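
              A minimal sketch of the panel versions, not a definitive specification. It assumes a panel identifier named id and a single regressor named treat, both hypothetical names; proplowinc, lowinccust, and totalcust follow the code in #6.

              ```stata
              * Hypothetical names: id (panel identifier), treat (intervention indicator)
              xtset id

              * Pooled fractional logit with standard errors clustered by panel unit
              fracreg logit proplowinc treat, vce(cluster id)

              * Pooled binomial quasi-MLE on the count, with totalcust as the upper bound
              glm lowinccust treat, family(binomial totalcust) link(logit) vce(cluster id)

              * GEE version on the count with an exchangeable working correlation
              xtgee lowinccust treat, family(binomial totalcust) link(logit) corr(exchangeable) vce(robust)
              ```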


              • #8
                Thank you, Prof. Jeff Wooldridge. I really appreciate your response. I had previously tried using glm, but the model did not converge. What I am taking away from your comments is that fracreg, glm, and xtgee are the appropriate approaches, and that one should not use ppml with an offset. Am I understanding correctly?

                Also, what determines the choice among these options?

                Thank you.


                • #9
                  Dear Prof. Jeff Wooldridge, it would be very helpful to get further comments from you on my query. Thank you!


                  • #10
                    You can use PPML with an offset, but your estimates won't naturally satisfy the logical restriction that the predicted number of low-income customers is less than the total number of customers. That's because the exponential function that models the proportion can take values larger than one. But this would hardly be the only case where such logical restrictions aren't imposed. You can use total customers as an offset, or even include log(totalcustomers) as an explanatory variable in the PPML estimation and estimate its coefficient.
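
                    The two options can be sketched as below. This uses the official poisson command with cluster-robust standard errors as a stand-in for PPML, and it assumes the variable names lowinccust and totalcust from #6 plus the hypothetical names treat, id, and lntotal.

                    ```stata
                    * Hypothetical names: treat (intervention indicator), id (panel identifier)
                    generate double lntotal = ln(totalcust)

                    * Option 1: total customers as exposure, so the coefficient on ln(totalcust) is fixed at 1
                    poisson lowinccust treat, exposure(totalcust) vce(cluster id)

                    * Option 2: ln(totalcust) as an ordinary regressor with a freely estimated coefficient
                    poisson lowinccust treat lntotal, vce(cluster id)

                    * Wald test of whether the estimated coefficient differs from the value the offset imposes
                    test lntotal = 1
                    ```

                    If the test does not reject, the offset restriction in Option 1 is at least not contradicted by the data.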


                    • #11
                      Thank you very much, Prof. Jeff Wooldridge. I would also highly appreciate your advice on what determines the choice among fracreg, glm, and xtgee for modelling proportions. Thank you so much.
