
  • Why does running a loop flip the sign of the regression coefficient?

    Dear Statalist
    ***I run the following regression to calculate the coefficient on neg_siW

    gen c1=.
    ivreg2 cesur neg_siW, cluster(firmid yr)
    replace c1=_b[neg_siW]

    sum c1
    **I get:

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              c1 |      68490   -.0397268            0 -.0397268  -.0397268

    *of course c1 is the same for all observations
    *I then run a loop to estimate the coefficient for each industry (sic_2) and year (yr) combination:

    gen c2=.

    levelsof sic_2, local(levels)

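    foreach x of local levels {
        foreach z of numlist 1990/2012 {
            * capture suppresses errors (e.g. an industry-year cell with no observations)
            capture reg cesur neg_siW if sic_2==`x' & yr==`z'
            * store the slope only if the regression ran, and only for the observations it used
            if _rc == 0 {
                replace c2=_b[neg_siW] if e(sample)
            }
        }
    }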

    sum c2
    ** I get:
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              c2 |      68490    .0347764     1.877543 -25.60417   87.14806

    What I don't understand is why the loop generates a c2 whose mean is very similar in magnitude to c1 (as I would expect on average) but with the WRONG SIGN.
    I also re-ran the pooled cross-sectional regression with i.sic_2 i.yr and still get the same sign, which is negative. The sign in the pooled cross-sectional regression, i.e. that of c1, is the correct one,
    while the mean of c2 has the wrong sign.

    Why does the loop flip the sign?

    Thanks

  • #2
    I don't really understand your question, but if you are looking at a slope for individual industries, why would you expect all their coefficients to be positive? In a random slopes context for a nonsignificant slope overall, it seems reasonable to have some negative and some positive slopes.


    • #3
      Thanks Dave
      Yes, of course the coefficient on neg_siW will be positive in some industries and negative in others, but on average it should have a negative mean. When I run the regression on all the data in the panel, the coefficient is negative, similar to other studies, but the loop results in coefficients that have a positive mean. Surprisingly, it is almost the same value but with the opposite sign.
      Are there any errors in my loop code?


      • #4
        Ahmed,

        There is nothing particularly wrong with your loops, as far as I can tell. Like Dave, I don't understand why you would expect the two results to be the same sign, when both are very close to zero. The fact that the absolute values are similar is probably just coincidence.

        To expand on Dave's post, here are some specific things I would want to know in order to find other reasons why you are getting these results:
        • What does the distribution of coefficients look like in your stratified analysis? You have a small standard deviation but a pretty wide range, which suggests some significant outliers which could be influencing the mean.
        • Are the sample sizes in each stratum the same? If not, you are getting a weighted estimate of the mean (with weight=sample size), because c2 is stored once per observation. As a result, are some strata inordinately influencing the mean?
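
        A minimal sketch of how both questions could be checked, assuming c2, sic_2 and yr exist as created by the loop in the first post. The variable name one_per is a hypothetical helper, not something from the original code.

        ```stata
        * Assumes c2, sic_2 and yr exist as created by the loop in the first post.
        * one_per is a hypothetical helper: it flags one observation per
        * industry-year among those that actually received a slope.
        egen byte one_per = tag(sic_2 yr) if !missing(c2)

        * Every firm-year counts once, so large strata weigh more in this mean
        summarize c2, detail

        * One slope per industry-year, so every stratum weighs the same
        summarize c2 if one_per == 1, detail

        * Distribution of the stratum slopes, to look for outliers
        histogram c2 if one_per == 1
        ```

        If the two means differ noticeably, stratum size is playing a role; if the histogram shows a few slopes far from the rest, those strata deserve a closer look.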
        This may be a naive question (I don't know much about ivreg2), but just out of curiosity: why does your first run use ivreg2 and your stratified run use reg?

        Also, you need to indicate in your posts the source of any user-written commands. ivreg2 is from SSC, not from Stata (although it is heavily based on Stata's ivregress).

        Regards,
        Joe


        • #5
          I think the history is more complicated here. ivreg2 (SSC) is a much developed variant on ivreg, and ivregress is too.


          • #6
            Thanks Nick,
            I was waiting for you to be involved.
            The issue is not with ivreg2. Even if I use reg instead of ivreg2 in both the pooled cross-sectional time-series regression and the loop, the result is still the same (neither reg nor ivreg2 affects the value of the coefficients; the difference between them is in the standard errors and t statistics). I also tried a panel-data fixed-effects model instead of ivreg2 and it is still the same.
            The sign is very important in my research, as it is indicative of a certain type of behavior in accounting (earnings management). What I can't understand is why the mean of the coefficients across SIC-year combinations has the wrong sign compared to the sign of the coefficient from a regression on the whole panel.

            Dear Joe,
            If by strata you mean industry and year combinations,
            the sample size in each stratum is NOT the same. Of course the number of companies within a certain industry and year will differ from that in other industry-year combinations. I ran the following commands to get a better idea of each industry-year group:

            egen sic2id=group(sic_2 yr)

            egen count=count(sic2id),by(sic2id)

            sum count
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
                   count |      68490    158.2229     118.6049        17        499

            I attach two histograms for count and c2

            **I even tried dropping groups with count less than 25, which loses 3300 obs. I summarize c2 again below, but unfortunately the sign is still the same.
            . sum c2

                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
                      c2 |      65190    .0170691      .816219 -10.85221   7.269979

            Any more suggestions?


            • #7
              I am just commenting on the history, nothing more. But while I am posting I will add a different comment.

              Please don't post MS Word attachments. The presumption that all Statalist members can read them is quite incorrect. Many members just don't have MS Word on their machines. Stata graphics are best posted as .png files.


              • #8
                Ahmed,

                Your first histogram highlights the point I am trying to make well, but you need to investigate further. Almost all of your coefficients are close to zero, but you have a few outliers, some down to -20 and some up to 80. I did a little simulation to try to reproduce your issue. Using random normal values for x and random deviations from y=-0.03x for y, I got what you expected: the pooled slope was the same as the mean of the stratified slopes (-0.03). Then, for one industry-year combination, I made the y values random deviations from y=20x and I got what you got: a small negative pooled slope and a small positive mean of the stratified slopes.
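
                A simulation along these lines could be sketched as follows. The post does not give the actual settings, so everything specific here is an assumption: the seed, 50 strata of 100 observations, the noise scale of 0.2, and in particular the choice to give the aberrant stratum very little spread in x. That last assumption matters, because a stratum with little variation in x contributes almost nothing to the pooled slope but still gets a full share in the mean of the stratum slopes.

                ```stata
                * Hypothetical setup: all numbers below are illustrative assumptions
                clear
                set seed 2014
                set obs 5000
                generate stratum = ceil(_n/100)        // 50 strata of 100 observations

                * Typical strata: y deviates randomly from y = -0.03x
                generate x = rnormal()
                generate y = -0.03*x + 0.2*rnormal()

                * One aberrant stratum: steep slope, but very little spread in x (assumption)
                replace x = 0.05*rnormal()       if stratum == 1
                replace y = 20*x + 0.2*rnormal() if stratum == 1

                * Pooled slope over all observations
                regress y x

                * Stratum-by-stratum slopes, stored the same way as c2 in the first post
                generate b = .
                forvalues s = 1/50 {
                    quietly regress y x if stratum == `s'
                    quietly replace b = _b[x] if e(sample)
                }
                summarize b
                ```

                Under these assumptions the pooled slope should stay close to -0.03, while the mean of b is pulled above zero by the single stratum with a slope near 20.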

                In other words, a few observations that don't follow the usual pattern, especially if clustered in a particular industry-year combination, will not have undue influence on the pooled slope, but will have a lot of influence on the slope for that particular stratum, which will in turn influence the mean of the slopes. Accordingly, you need to investigate those strata with unusually high slopes (say >5 or <-5) and see what industry-years they are in and what might be causing them to have slopes that are so out of line with the rest of the industry-years.
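
                One way to pull out those strata, as a sketch that assumes c2, sic_2 and yr exist as created in the first post. The names n_used and pick are hypothetical helpers, and the cutoff of 5 is only the illustrative threshold mentioned above, not a recommended rule.

                ```stata
                * Assumes c2, sic_2 and yr from the first post.
                * n_used and pick are hypothetical helper names.
                bysort sic_2 yr: egen n_used = count(c2)        // observations behind each slope
                egen byte pick = tag(sic_2 yr) if !missing(c2)  // one row per industry-year

                * Industry-years whose slope is far from the rest
                list sic_2 yr c2 n_used if pick == 1 & abs(c2) > 5, noobs

                * Mean of the slopes with and without the extreme strata
                summarize c2
                summarize c2 if abs(c2) <= 5
                ```

                Observations with missing c2 drop out of the second summarize automatically, since abs() of a missing value is missing and missing compares as larger than any number in Stata.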

                The sample size in each stratum is probably less of an issue than I originally thought, since both analyses are weighted the same. Moreover, you don't have any strata with wildly unusual sample sizes.

                Regards,
                Joe


                • #9
                  Joe,
                  Thanks for your comments. I think there are some coefficients with extremely large values in some industries that don't make sense, probably because the dependent and independent variables have very different gross values! I tried removing those extreme coefficient values and the mean then seems to pick up the correct sign.
                  Thanks again, that was extremely useful!
                  Best regards
                  Ahmed
