  • Dummy variables in first difference regression

    Hi guys,

    I have a potentially very stupid question to ask, but for the life of me I cannot figure this out.

    I am estimating cartel damages using a dummy variable approach. Essentially I am setting my "cartel dummy" equal to 1 during the cartel and 0 otherwise, as is the standard approach. However, the other variables in my model are non-stationary and as such I am estimating the regression equation in first differences.

    When using a level regression, the interpretation of the dummy is relatively straightforward: one can more or less directly compute the percentage overcharge from the coefficient on the dummy variable. Basically, it shows by what percentage the price was higher than it would have been but for the cartel.

    My question is, how would one interpret the coefficient on such a dummy variable when all the regressors in the model are in first differences?

    Thanks!
    Albertus

  • #2
    I always emphasize the difference between the model and the estimation method. If your model is in levels, that's how you interpret the estimates regardless of how you estimate the parameters. It's similar to using, say, Cochrane-Orcutt (GLS), which also uses a transformation of the data. But we don't change the interpretation of the coefficients. If you like the model in levels but are using differencing as the estimation method, interpret the coefficient in levels.

    Having said that, if the model in levels is not "cointegrated" then the model in levels doesn't really make sense. Then you might want to specify the model in first differences, but then the dummy might not be differenced. That's a different model, though, as the dummy directly estimates the effect on the differenced y holding the differenced x's fixed.

    Is this panel data or pure time series?

    Jeff
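
As a sketch of the distinction drawn above, the two specifications can be written out side by side. The symbols $P_t$, $X_t$, $D_t$, $\beta$, $\delta$ and $\gamma$ are illustrative assumptions, not notation from the thread, and a single explanatory variable stands in for all the drivers.

```latex
% Levels model, estimated by first differencing: the dummy is differenced too.
\begin{align*}
\log P_t        &= \alpha + \beta \log X_t + \delta D_t + u_t \\
\Delta \log P_t &= \beta\, \Delta \log X_t + \delta\, \Delta D_t + \Delta u_t
\end{align*}
% \Delta D_t is +1 in the first cartel period, -1 in the first period after
% the cartel ends, and 0 otherwise. \delta keeps its levels meaning: prices are
% about 100 (e^{\delta} - 1) percent higher while the cartel operates.

% Model specified in first differences, with the dummy left undifferenced.
\begin{align*}
\Delta \log P_t       &= \beta\, \Delta \log X_t + \gamma D_t + e_t \\
\log P_t - \log P_0   &= \beta (\log X_t - \log X_0)
                         + \gamma \sum_{s=1}^{t} D_s + \sum_{s=1}^{t} e_s
\end{align*}
```

In the second specification $\gamma$ is a shift in the per-period growth rate, so the implied gap in log prices after $k$ cartel periods is $\gamma k$ rather than a constant.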

    • #3
      Hi Professor Wooldridge,

      Thanks for your reply, much appreciated. We are using pure quarterly time series.

      The model would be in first-difference form, something like the following: dlog(Price) = dlog(Explanatory Variable_1, etc.) + collusion_dummy. We are estimating it simply using OLS for the moment. What we are interested in is the coefficient on the collusion_dummy, to understand what the effect of the cartel was on prices over a chosen time period (i.e. by how much they were higher than would otherwise have been the case, controlling for demand and supply drivers). Maybe I am misunderstanding the difference between the model and the estimation method. To be clear, we have a theoretical model where price is explained by certain demand and supply drivers (and essentially the unexplained portion is what we attribute to a cartel that existed in this market, which we intend to quantify using a dummy variable). However, we have found unit roots in all series, and as such we take the first difference of the various series to control for the unit roots.

      When using an estimation such as log(Cement Price) = log(Explanatory Variable_1, etc.) + collusion_dummy, the interpretation is straightforward. However, as we are differencing our data (to control for the presence of unit roots in the series), that is where we are stuck. Would the collusion dummy then essentially be capturing the difference in growth rates between a cartel and a no-cartel scenario? How would one convert this to calculate what the differences are in level terms?

      Thanks,
      Albertus
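
One way to see the conversion asked about above is a small noise-free simulation. This is only a sketch under stated assumptions: the series, the parameter values (`beta`, `delta`, `gamma`), the cartel window and the helper `ols` are all hypothetical, and error terms are omitted so that OLS recovers the parameters exactly. It contrasts a levels model estimated in differences (dummy differenced) with a growth-rate model (dummy left in levels).

```python
import numpy as np

rng = np.random.default_rng(0)
T = 40
# Hypothetical cost driver following a random walk in logs
log_x = np.cumsum(rng.normal(0.0, 0.02, T))
cartel = np.zeros(T)
cartel[10:30] = 1.0  # cartel assumed active in quarters 10..29

beta, delta, gamma = 0.8, 0.10, 0.01  # illustrative values only

# Model A (levels): log p = beta*log x + delta*cartel, noise omitted
log_p_a = beta * log_x + delta * cartel
# Model B (growth): dlog p = beta*dlog x + gamma*cartel, noise omitted
dlog_p_b = beta * np.diff(log_x) + gamma * cartel[1:]


def ols(y, *cols):
    """Least-squares coefficients of y on the given columns (no intercept)."""
    X = np.column_stack(cols)
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Model A estimated in first differences: the dummy is differenced as well
coef_a = ols(np.diff(log_p_a), np.diff(log_x), np.diff(cartel))
# Model B: the dummy enters the differenced equation undifferenced
coef_b = ols(dlog_p_b, np.diff(log_x), cartel[1:])

# Model A: constant level overcharge during the cartel
overcharge_a = np.exp(coef_a[1]) - 1.0
# Model B: overcharge after k cartel quarters grows with k
k = np.arange(1, 21)
overcharge_b = np.exp(coef_b[1] * k) - 1.0
```

Under model A the coefficient on the differenced dummy is the level effect itself, so the overcharge is `exp(delta) - 1` throughout the cartel. Under model B the coefficient is a per-quarter growth differential, and the level gap after `k` cartel quarters is `exp(gamma * k) - 1`. With real data the differenced dummy is nonzero in only two quarters, so the model A estimate would rest on very few observations.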

      • #4
        Hi Professor Jeff Wooldridge.

        Thanks for your reply. I have another related and potentially silly question. I find myself in the exact same situation described by Albertus. I have panel data, and my DV and continuous IVs are cointegrated. I also have a binary predictor. I want to estimate a general ECM, but I am unsure whether the binary variable should be differenced or not. Substantively, I fail to see how differencing it would make sense. At the same time, I am interested in computing the long-run effect of my event variable. Do you have any advice?
        Last edited by Francesco Bromo; 23 Apr 2021, 11:18.
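
For reference, a hedged sketch of how a long-run effect is usually read off a general single-equation ECM. The ADL(1,1) starting point, the single continuous regressor $x$, the binary event $D$ and all coefficient names are assumptions made for illustration; panel subscripts and fixed effects are omitted.

```latex
\begin{align*}
y_t        &= a_0 + a_1 y_{t-1} + b_0 x_t + b_1 x_{t-1}
              + c_0 D_t + c_1 D_{t-1} + \varepsilon_t \\
\Delta y_t &= a_0 + (a_1 - 1)\, y_{t-1} + b_0\, \Delta x_t + (b_0 + b_1)\, x_{t-1}
              + c_0\, \Delta D_t + (c_0 + c_1)\, D_{t-1} + \varepsilon_t \\
\text{long-run effect of } D
           &= \frac{c_0 + c_1}{1 - a_1}
            = -\,\frac{\text{coefficient on } D_{t-1}}{\text{coefficient on } y_{t-1}}
\end{align*}
```

In this parameterization the event variable appears both differenced, where $c_0$ is the immediate impact, and as a lagged level, which carries the long-run effect. The ratio is only meaningful when $|a_1| < 1$.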

        • #5
          This paper might be of interest for those posting questions on this thread: https://northcoasteconomics.com/wp-c...g-JAE-2006.pdf

          • #6
            Is this panel data or pure time series?
            Jeff Wooldridge, what happens if it's panel data? Do I need to difference a dummy variable (e.g. a dummy for a shock) when I am using first differences on panel data with T=3?

            • #7
              I think the answer to your question is in this thread: "How to first difference a panel data set with many dummy variables?" (Statalist).

              • #8
                I have further queries. Let's say I want to estimate a model

                y_ict = b0 + b*X_ict + b1*shock_dummy + individual fixed effect + community fixed effect + error

                where y is the dependent variable, i indexes individuals, c indexes communities, and t indexes time (T=3).
                I can estimate this equation easily with fixed effects using reghdfe. However, if I want to run the same model in first differences, how can I control for the individual and community fixed effects together? Do I even need to do so?
                I know I can run the fixed-effects version like...
                reg yict Xict shock_dummy i.individual_id i.community_id i.year
                or
                reghdfe yict Xict shock_dummy i.year, absorb(individual_id community_id)

                Can I run the first-difference version of the same model as
                reg d.(yict Xict shock_dummy i.individual_id i.community_id i.year)
                (I know the results will not be the same, as we have T=3.)
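
A minimal simulation sketch of what first differencing does to the two sets of fixed effects in the model above. It assumes that each individual stays in one community for all three periods, in which case the community effect is constant within individual and drops out together with the individual effect. All names, sample sizes and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T, C = 500, 3, 20   # individuals, periods, communities (hypothetical)
b, b1 = 1.5, -0.7      # true slopes on x and on the shock dummy

community = rng.integers(0, C, N)         # each person stays in one community
a_i = rng.normal(0.0, 1.0, N)             # individual fixed effect
c_c = rng.normal(0.0, 1.0, C)[community]  # community fixed effect
year_fx = np.array([0.0, 0.3, -0.2])      # common year effects

# x is correlated with both fixed effects
x = rng.normal(0.0, 1.0, (N, T)) + (a_i + c_c)[:, None]
shock = (rng.random((N, T)) < 0.3).astype(float)
e = rng.normal(0.0, 0.5, (N, T))
y = b * x + b1 * shock + (a_i + c_c)[:, None] + year_fx[None, :] + e

# First differences within individual: a_i and c_c cancel out
dy = np.diff(y, axis=1).ravel()
dx = np.diff(x, axis=1).ravel()
dshock = np.diff(shock, axis=1).ravel()

# Differenced year effects: one dummy per differenced period
period = np.tile(np.arange(T - 1), N)
d_period = np.eye(T - 1)[period]

# No individual or community dummies appear in the differenced regression
X = np.column_stack([dx, dshock, d_period])
coef = np.linalg.lstsq(X, dy, rcond=None)[0]
b_hat, b1_hat = coef[0], coef[1]
```

Here the regressors and the shock dummy are differenced, while the individual and community identifiers are not included at all, because anything constant within an individual over time has already been removed by the differencing. If individuals moved between communities, the community effect would no longer cancel and this sketch would not apply.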
