
  • Time fixed effect and brand fixed effect

    Hi everyone,

    I have a question about adding a time fixed effect and a brand fixed effect in a random effects model. My current regression code is something like -xtreg y x1 x2 x3 ..., re-.
    However, I now need to add date dummies to the model (about 400 unique dates) and brand dummies (around 7,000 unique brands).

    First, I want to ask whether it makes sense to add, for instance, daily time dummies to the model, or whether -xtreg- already takes the date fixed effect into account in the regression. I've only seen people add year and month dummies to a model, never date dummies.
    Second, if it makes sense to add these dummies, how can I suppress the regression output for them while keeping only the main IVs (x1, x2, x3 in my case)?
    Third, I wonder if you know whether, in Stata, the random effect is on the intercept only, or on the intercept and the slopes?
    [Attachment: random_effect.png]



    Thanks a lot, and I look forward to your reply.

  • #2
    First, I want to ask whether it makes sense to add, for instance, daily time dummies to the model, or whether -xtreg- already takes the date fixed effect into account in the regression.
    Whether it makes sense to add the daily time indicators depends on what you are modeling. If your outcome is subject to daily shocks large enough that you need to account for them, then yes. Bear in mind that this is only doable if you have more than one observation per date within each brand. If brand and date uniquely identify observations, then adding the date effects is like adding an observation-level indicator, and the model will be meaningless.
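
    A quick way to check that point (variable names brand and date are illustrative, standing in for your panel and daily-date variables):

    Code:
    * how many observations share each brand-date pair?
    duplicates report brand date
    * alternatively, -isid- exits without error (_rc == 0) only if brand and
    * date uniquely identify observations -- the case where date fixed
    * effects would make the model meaningless
    capture isid brand date
    display _rc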

    -xtreg- does not automatically include a time effect. If you want a time effect, you have to put it into the varlist.
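
    For instance, a minimal sketch with illustrative names (y for the outcome, date as a daily date variable):

    Code:
    * i.date adds one indicator per unique daily date to the varlist
    xtreg y x1 x2 x3 i.date, re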

    I've only seen people add year and month dummies to a model, never date dummies.
    In principle there is no reason one cannot do this with daily dates. It's a question of whether you have enough data to make it feasible, and whether there is enough variation at the daily frequency level to warrant so expansive a model. With that many fixed effects this is going to be very slow at best, and possibly will exceed some limits and not run at all.

    how can I suppress the regression output for these dummies while keeping only the main IVs (x1, x2, x3 in my case)?
    I don't think you can suppress output selectively in -xtreg- itself. What you could do is run -xtreg- -quietly- and store the estimates. Then use a pretty-printing command like -estout- or -esttab-, which have -drop()- and -keep()- options that let you restrict the output to what you are interested in.
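
    A sketch of that workflow (variable names are illustrative; -esttab- is part of the user-written estout package, installable with ssc install estout):

    Code:
    quietly xtreg y x1 x2 x3 i.date i.brand, re
    estimates store m1
    * report only the main IVs; the date and brand indicators are omitted
    esttab m1, keep(x1 x2 x3) se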

    Third, I wonder if you know whether, in Stata, the random effect is on the intercept only, or on the intercept and the slopes?
    -xtreg, re- estimates models with random intercepts and fixed slopes. If you want random slopes (with or without intercepts), you have to use -mixed- instead.
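
    For example, assuming brand is the panel variable (names are illustrative):

    Code:
    * random intercept only -- comparable in spirit to xtreg, re:
    mixed y x1 x2 x3 || brand:
    * random intercept plus a random slope on x1:
    mixed y x1 x2 x3 || brand: x1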



    • #3
      Originally posted by Clyde Schechter View Post
      Whether it makes sense to add the daily time indicator depends on what you are modeling. […]
      Hi Clyde,

      Thanks for your reply. I'll try adding the dummies and see whether Stata will handle it. Your comments really helped a lot.

      Best wishes
      Meng



      • #4
        Originally posted by Clyde Schechter View Post
        I don't think you can suppress output selectively in -xtreg- itself. […]
        Hi Clyde,

        Thanks for your previous comments. The approach of running -xtreg- quietly and using the -esttab- -keep()- option works for me.

        However, when I use the code below, I can't get output for the adjusted R-squared. Do you have any idea why this happens?

        Code:
        esttab, ar2 label keep(x1, x2,x3)
        [Attachment: 1.PNG]


        Another question I have is that when I add i.brand to the regression using -reg-, running -vif- took more than 3 hours, and in the end I stopped the process manually. I wonder if you have any suggestions for checking multicollinearity in the model.

        Thanks a lot!



        • #5
          I think the code for adjusted R2 in esttab is r2_a, not ar2. Try that. That said, adjusted R2 is not part of the regular output of -xtreg-, so it may not be possible to get it at all.
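
          A sketch of that suggestion (variable names are illustrative; r2_a is requested here through -esttab-'s stats() option):

          Code:
          * after -regress-, the scalar e(r2_a) is stored and can be reported:
          quietly regress y x1 x2 x3 i.brand
          estimates store m1
          esttab m1, keep(x1 x2 x3) stats(r2_a N, labels("Adj. R-sq" "Obs"))
          * after -xtreg, re-, e(r2_a) is not stored, so that cell stays empty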

          As for multicollinearity, it's a waste of time looking for it even when it runs quickly. It is seldom a problem, and when it is, there is nothing you can do about it anyway except get a much larger sample, so there's nothing to be gained by testing for it. Read the chapter in Arthur Goldberger's econometrics textbook about it. Or, for a much shorter version, look at https://www.econlib.org/archives/200...ollineari.html where Bryan Caplan reviews the matter.

          The gist of it is this: multicollinearity does not introduce any bias into the coefficient estimates. What it does is inflate the standard errors. (That's why the standard test is called the variance inflation factor.) It only affects the variables that actually participate in the multicollinearity. Often, the only variables involved are those included only to adjust for their confounding effects, not the actual variables of interest. In that case, the multicollinearity has no importance at all: the results for the variables of interest are unaffected.

          Now, if a variable of interest is involved in the multicollinearity, there may be a problem. The problem will show up as a large standard error, a correspondingly wide confidence interval, a smaller test statistic, and a larger p-value. So just look at the output for your variable of interest. If the standard error is low enough, i.e. your confidence interval narrow enough, that you can draw conclusions that answer your research question, then you have no problem. If the confidence interval is too wide to enable you to answer your research question one way or another, then you have a problem. But it is a problem with no solution other than getting a much larger data set, or getting an altogether new data set sampled in such a way as to break the multicollinearity.



          • #6
            Originally posted by Clyde Schechter View Post
            As for multicollinearity, it's a waste of time looking for it even when it runs quickly. […]
            Thanks Clyde, I'll read the article you attached. One thing confused me: suppose that in the core regression model with the variables of interest, -reg y x1 x2 x3-, there is no severe multicollinearity. Now I want to incorporate brand as a control. If, after adding this variable, the VIF increases a lot, should I keep brand as a control or simply drop it from the model? If it won't affect the coefficients of x1, x2, x3, I can still keep it, right?



            • #7
              Brand is a categorical variable with several levels. The indicator variables ("dummies") that are created to represent it when you use i.brand are always necessarily highly collinear. But precisely because you are including it only "as control" that collinearity is completely irrelevant. The inclusion of brand does affect the estimates for x1, x2, and x3. And assuming there was good reason to want to include it "as control" in the first place, you should include it. But the collinearity among the brand variables does not affect the estimates for x1, x2, and x3. In short, you have exactly the situation where the multicollinearity is expected and is of no importance whatsoever. Keep brand in the model, and don't even give multicollinearity a thought. You've already wasted more of your time on it than it's worth.



              • #8
                Originally posted by Clyde Schechter View Post
                Brand is a categorical variable with several levels. […]
                I see. Thanks a lot for the detailed explanation, Clyde!

