  • Differences between reg and areg.

    I am running a regression with many dummy variables as controls. If I run the regression with the reg command, the output is totally different from what I obtain with areg (rejecting the null hypothesis that the coefficient of interest is equal to zero vs. not rejecting it). I would like to understand what lies behind these two commands that makes them produce such different results. Given that my model has many dummy variables as controls, should I always rely on the results from "areg" rather than those from "reg"?
    Thanks!

  • #2
    I would stand the question on its head. Rather than ask what the difference between reg and areg is, I would ask what reg and areg have in common. And the answer to that is the three letters r-e-g in the name, and not much else. The two commands estimate very different models and there is no reason to expect their results to be similar.

    -reg- does ordinary least squares regression of independent observations. -areg- is used with data that includes multiple observations on the same entities, and fits a regression model that allows each entity to have its own intercept, while all other coefficients are the same across entities. The effects estimated by -areg- are within-entity effects; those estimated by -reg- without the entity indicators blend within- and between-entity variation.

    As for which one you should use, you have to understand what kind of data you have and what your research question is. You don't provide enough information about that to give you any more concrete advice.
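
    The within-entity interpretation can be illustrated with a minimal sketch. It is not from the original post: it assumes the auto data shipped with Stata, with rep78 standing in for the entity identifier, and the names mean_price, mean_mpg, price_dm and mpg_dm are made up for the illustration.

    ```stata
    sysuse auto, clear
    * rep78 stands in for the entity identifier (an illustrative assumption)
    keep if !missing(rep78)

    * Demean the outcome and the regressor within each rep78 group
    bysort rep78: egen double mean_price = mean(price)
    bysort rep78: egen double mean_mpg   = mean(mpg)
    generate double price_dm = price - mean_price
    generate double mpg_dm   = mpg - mean_mpg

    * Slope from the regression on group-demeaned variables
    regress price_dm mpg_dm
    scalar b_within = _b[mpg_dm]

    * Slope from -areg-, which absorbs one intercept per rep78 group
    areg price mpg, absorb(rep78)
    scalar b_areg = _b[mpg]

    * Pooled regression that ignores the groups, for comparison
    regress price mpg
    scalar b_pooled = _b[mpg]

    display "within: " b_within "   areg: " b_areg "   pooled: " b_pooled
    ```

    The first two slopes should agree up to rounding. Their standard errors will not, because the demeaned regression does not count the absorbed group intercepts in its degrees of freedom.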

    • #3
      Really good explanation; now I see clearly that I should use -areg-. Thank you!

      • #4
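        Either I am confused or Clyde is <grin>; -areg- is a way to deal with a large set of indicator (dummy) variables. Try the following and compare the outputs:
        Code:
        sysuse auto, clear
        * indicators for rep78 entered explicitly as a factor variable
        regress price mpg i.rep78 foreign gear_ratio
        * the same indicators absorbed rather than reported
        areg price mpg foreign gear_ratio, absorb(rep78)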

        • #5
          I would say that you should provide a data example with -dataex- so that we can see what you are doing; the mystery would then probably be cleared up.

          If you have, say, a variable regionid that identifies regions, then

          reg y x i.regionid

          should be giving you the same results as

          areg y x, absorb(regionid).

          Notwithstanding the useful comments by Clyde, -areg- is in a way a convenience command, and everything that can be done through -areg- should in principle be implementable with an appropriate -reg- command.
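
          The equivalence described above can be checked numerically. The following is only a sketch under assumptions: it uses the auto data shipped with Stata, with price, mpg and rep78 standing in for y, x and regionid, and default (non-robust) standard errors.

          ```stata
          sysuse auto, clear

          * Group indicators entered explicitly (rep78 stands in for regionid)
          regress price mpg i.rep78
          scalar b_reg  = _b[mpg]
          scalar se_reg = _se[mpg]

          * The same indicators absorbed
          areg price mpg, absorb(rep78)
          scalar b_areg  = _b[mpg]
          scalar se_areg = _se[mpg]

          * Both differences should be zero up to rounding error
          display "coefficient difference: " b_reg - b_areg
          display "std. error difference:  " se_reg - se_areg
          ```

          If the two commands disagree on real data, the first things to check are whether both were run on the same estimation sample and whether the grouping variable really entered the -reg- command as a set of indicators.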

          • #6
            The examples given by Joro Kolev would, indeed, produce similar results. I probably should not have responded to the original post as I did but should, instead, have asked to see the code that was actually run. My assumption, for which the original post provides no evidence, was that a comparison was being made between -reg y x- and -areg y x, absorb(regionid)-. That was how I understood the original question, though, on re-reading, most of that interpretation came from my assumptions, not from what was written there.

            • #7
              "and everything that can be done through -areg- should be in principle implementable by appropriate -reg-". That was my concern Joro. I couldn't see the problem in using -reg- if I stated the control variables I wanted to use.
              To explain my case better: I am using data from a survey conducted twice across a sample of households. Therefore, I cannot treat the observations as independent of each other.

              • #8
                Mentioning the survey in #7 possibly changes everything: (1) you don't tell us how you have -svyset- the data; (2) -areg- is not an estimator that can be used with -svy-; see
                Code:
                help svy_estimation

                • #9
                  That puzzles me, because I am replicating a research paper from Harvard University ("Can higher prices stimulate product use? Evidence from a field experiment in Zambia"), and they use -areg- several times in the code that they have published.

                  • #10
                    Information about Carlos Javier Lopez' problem is dripping out slowly. I think that rather than keeping everybody guessing what is going on, he should show the actual code he ran (both -reg- and -areg-) that provoked his concerns, along with the actual Stata output that he got from those commands, so that we can all try to explain the real problem rather than conjuring up explanations for what we imagine the problem might be.

                    As William Lisowski tirelessly reminds people here, the more you tell people about your problem, the more they will be able to help you.

                    • #11
                      Alright, I just joined the forum yesterday and I didn't know that. I will try to explain myself. As I said, I am replicating a paper on an experiment conducted in Zambia. Individuals were offered a good at a randomly assigned offer price, and if they accepted it, they would receive an unexpected random transaction discount. The authors wanted to study which price better explains the proportion of households using the product: the accepted offer price or the actual amount paid after the transaction discount.
                      follow_use: use after one month
                      first: offer price
                      second: transaction price
                      bought: the household accepted the offer
                      all controls: a wide range of control variables (demographic, health related, ...), all collected in a house-to-house survey, mostly "yes or no" questions.
                      Let's say I want to study how the transaction price affects follow-up use, with offer-price fixed effects. I have done it in these two ways.

                      areg follow_have second `all_controls' if bought==1, absorb(first)

                      reg follow_have second `all_controls' `first' if bought==1

                      I also tried to post the output, but when I do, the formatting changes and it becomes really hard to read. The results are not totally different; the differences show up when a coefficient is marginally significant in one set of results and not in the other. Neither set of results exactly matches the output in the paper, but both come close, which is what prompts my doubts. In the code published with the paper the authors use areg, but since I still cannot reproduce the exact results, I was asking myself what the actual difference between the two commands is, and which one I should use in this specific case.

                      • #12
                        What is `first' in your -reg- command? Unless you have defined a local macro with the name first, the character sequence `first' will be interpreted by Stata as an empty string, and your -reg- command is interpreted as:
                        Code:
                        reg follow_have second `all_controls' if bought == 1
                        which, ironically, is what I imagined in my earlier response!

                        The following, however, would be equivalent to the -areg- command:
                        Code:
                        reg follow_have second `all_controls' i.first if bought == 1
                        Note: This will only work on the assumption that the variable first contains only non-negative integer values. If that is not true in the data (which seems questionable since it is a price so one might anticipate fractional currency units), then the equivalent would be:
                        Code:
                        xi: reg follow_have second `all_controls' i.first if bought == 1
                        That said, this looks offhand like a rather strange thing to do (whether with -reg- or with -areg-). The variable being "absorb"ed here is explained as an initial offer price. That sounds to me like it would ordinarily be treated as a continuous, or at least ordinal, variable in analyses. Its use as an absorbed variable treats it as discrete. Perhaps given that the context here is an experiment and the initial offer price was assigned at random as a treatment, this unusual treatment makes some sense. Nevertheless, such an analysis will be relatively insensitive to the effect of the offering price being related to how high or low it is, compared to an analysis that treats it as a continuous variable.
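
                        Two points from this post can be sketched in a short do-file. This is an illustration only, not the code from the thread or the paper: it assumes a fresh do-file in which no local macro named first exists, and it uses the auto data shipped with Stata, where headroom (which takes fractional values) stands in for a grouping variable that is not integer-valued. The name hr_id is made up. Grouping with -egen, group()- is a swapped-in alternative to the -xi- prefix mentioned above.

                        ```stata
                        * 1. An undefined local macro expands to nothing
                        display "[`first']"
                        * Once defined, the same reference expands to its contents
                        local first "i.first"
                        display "[`first']"

                        * 2. Fixed effects for a grouping variable with fractional values
                        sysuse auto, clear
                        * Factor-variable notation needs non-negative integers, so
                        * map each distinct value of headroom to an integer code
                        egen hr_id = group(headroom)

                        regress price mpg i.hr_id
                        scalar b_dummies = _b[mpg]

                        * -areg- can absorb the original variable directly
                        areg price mpg, absorb(headroom)
                        scalar b_absorb = _b[mpg]

                        display "difference in mpg coefficient: " b_dummies - b_absorb
                        ```

                        Placing the macro reference inside brackets in -display- makes an empty expansion visible, which is a quick way to confirm what a command line actually contains after macro substitution.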

                        • #13
                          I had a quick look at the data set for the original paper, which can be found at:
                          https://www.povertyactionlab.org/eva...eriment-zambia

                          Description of first and second:
                          Code:
                          . d first second
                          
                                        storage   display    value
                          variable name   type    format     label      variable label
                          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                          first           byte    %9.0g                 Offer price (100 Kw)
                          second          byte    %9.0g                 Transaction price (100 Kw)
                          
                          . sum first second
                          
                              Variable |        Obs        Mean    Std. Dev.       Min        Max
                          -------------+---------------------------------------------------------
                                 first |      1,260    5.301587    1.615404          3          8
                                second |      1,260    1.607937    1.763256          0          7
                          
                          . tab first
                          
                          Offer price |
                             (100 Kw) |      Freq.     Percent        Cum.
                          ------------+-----------------------------------
                                    3 |        226       17.94       17.94
                                    4 |        227       18.02       35.95
                                    5 |        227       18.02       53.97
                                    6 |        227       18.02       71.98
                                    7 |        227       18.02       90.00
                                    8 |        126       10.00      100.00
                          ------------+-----------------------------------
                                Total |      1,260      100.00
                          
                          . tab second
                          
                          Transaction |
                           price (100 |
                                  Kw) |      Freq.     Percent        Cum.
                          ------------+-----------------------------------
                                    0 |        500       39.68       39.68
                                    1 |        205       16.27       55.95
                                    2 |        210       16.67       72.62
                                    3 |        142       11.27       83.89
                                    4 |         96        7.62       91.51
                                    5 |         62        4.92       96.43
                                    6 |         34        2.70       99.13
                                    7 |         11        0.87      100.00
                          ------------+-----------------------------------
                                Total |      1,260      100.00
