  • Which type of Regression is appropriate? (reg; rreg; reg, vce(robust); ...)

    Hey there,

    I have a cross-sectional data set of about 180 firms with operating performance data after a restructuring event (the events take place within a period of 15 years). I want to regress the median operating performance in the 3 years after the restructuring event on different explanatory variables (e.g. prior performance, diversification level) and control variables (capex, industry, year, ...).

    However, I am unsure which regression type is appropriate (and how to find out which one to use). I ran regressions with reg; reg, vce(robust); and rreg. With rreg in particular, several independent variables become far more significant than with the other regression types.

    Do you have any advice on which regression type to use or how to find out which one is appropriate?

    Thank you so much!

    Andreas
    Last edited by Andreas Manuel; 16 Oct 2014, 09:31.

  • #2
    andymanu (the same yesterday's kind reminder about fixing your identifier as per FAQ still holds):
    - -reg depvar indepvar, vce(robust)- usually comes after detecting heteroskedasticity in -reg depvar indepvar- residuals via -estat hettest-;
    - -rreg depvar indepvar - usually comes after detecting high leverage points in -reg depvar indepvar- via -lvr2plot- and/or -predict xdist, hat-.

    It may well be that -rreg- downweights some nasty behaviour in your data; if this is the case, R-squared should have improved as well (you can retrieve it as -e(r2)- by typing -ereturn list- after -rreg-).

    Put differently, there are different answers to your question.
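
    A minimal sketch of the checks mentioned above, run on Stata's shipped auto dataset. The dataset, the model and the variable name xdist are assumptions chosen for illustration; they are not the poster's data:

    ```stata
    * Illustrative only: auto.dta stands in for the poster's firm data
    sysuse auto, clear
    regress price mpg weight foreign

    * Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
    estat hettest

    * Leverage versus squared normalized residuals
    lvr2plot

    * Leverage values (diagonal of the hat matrix)
    predict xdist, hat
    summarize xdist, detail

    * Same model with heteroskedasticity-robust standard errors
    regress price mpg weight foreign, vce(robust)

    * Same model with -rreg-; R-squared is among the stored results
    rreg price mpg weight foreign
    ereturn list
    display e(r2)
    ```

    Note that -estat hettest- is run after the plain -regress-, before the vce(robust) fit.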

    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Hi Carlo,

      Thank you very much for your appreciated help! I have already changed my username. I have some more questions regarding this issue:

      - So is rreg another way to handle outliers? I winsorized my variables beforehand. It probably makes no sense to first winsorize and then use rreg, right?

      - I posted four pictures (for two different dependent variables) showing the leverage plots and output of the heteroskedasticity measure - which of the three regression models reg (with/without vce(robust)) and rreg would you recommend to use?

      - It seems to me that my data allow both reg, vce(robust) and rreg, even though I don't understand how to draw definite conclusions from the leverage charts.

      Best regards,

      Andreas

      [Attached images: Heteroskedasticity_ROA.jpg, Heteroskedasticity_ROS.jpg, Leverage_ROA.jpg, Leverage_ROS.jpg]



      • #4
        Andreas:
        - thanks for fixing your identifier;
        - I would not "...winsorize and then use rreg";
        - as per your pictures, both your regressions have some problems concerning residual heteroskedasticity and/or higher-than-average leverage;
        - I would -rreg- the same models that you -reg- and investigate whether R-squared improves (as I would expect it to);
        - you could benefit from the clear explanations of these issues in -help lvr2plot- and -help regress postestimation diagnostic plots-, along with the related entries in the Stata 13.1 .pdf manual.


        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Carlo,

          - R-squared improves a lot if I use rreg, and both my explanatory and control variables gain a lot of statistical significance.

          - However, I am wondering whether using rreg is justifiable given my dataset, or whether it could be considered a kind of cheating (I am analysing data taken from a database for my university thesis), since the overall result changes substantially between reg and rreg: explanatory variables that are not statistically significant with reg become highly significant with rreg.

          Thank you for your appreciated advice!

          Best regards,

          Andreas

          Comment


          • #6
            Andreas:
            if R-squared improves with -rreg-, it seems that -rreg- is the way to go.
            If no downweighting of nasty observations were necessary, -rreg- would give about the same R-squared as -reg-.
            The problem you are facing now is how to justify your methodological choice (which, as usual, is your call).
            On this topic, I would take a comprehensive look at the -rreg- and -regression postestimation- (-help cooksd-, for instance) entries in the Stata 13.1 .pdf manual to understand how -rreg- works.
            On top of that, since you can probably borrow a copy from your university library, I would strongly recommend Treiman DJ. Quantitative data analysis. San Francisco, CA: Jossey-Bass, 2009. It is full of useful worked examples and code in Stata; Chapter 10 covers exactly what you're looking for.
            I take this chance to thank Maarten Buis once again for mentioning this textbook on the list some years ago.
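
            A minimal sketch of how one might see which observations drive the difference between -reg- and -rreg-. It runs on Stata's auto dataset as a stand-in for the poster's data; the variable names d and w, the stored-estimate names, and the cutoffs 4/n and 0.5 are illustrative assumptions, not recommendations from this thread:

            ```stata
            * Illustrative only: auto.dta stands in for the poster's firm data
            sysuse auto, clear

            * OLS fit and Cook's distance for each observation
            regress price mpg weight foreign
            estimates store ols
            predict d, cooksd

            * 4/n is a common rule-of-thumb cutoff (an assumption here)
            list make price d if d > 4/e(N) & !missing(d)

            * Robust regression, keeping the final weights in w
            rreg price mpg weight foreign, genwt(w)
            estimates store robust

            * Observations that -rreg- downweighted most heavily
            list make price w if w < 0.5

            * Coefficients and standard errors side by side
            estimates table ols robust, b(%9.3f) se stats(N r2)
            ```

            Listing the downweighted observations by name makes it easier to explain, case by case, why the two sets of results differ.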

            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Dear Andreas,

              I think you have to be careful here because -reg- and -rreg- identify different sets of parameters. That is, they are not alternative estimators for your model, they estimate models for different things. You can read more about that in:

              Baldauf, Markus and Santos Silva, J.M.C. (2012), "On the Use of Robust Regression in Econometrics," Economics Letters, 114(1), pp. 124–127.

              All the best,

              Joao



              • #8
                I'd suggest a different focus. rreg is just one flavour of robust regression. I talk to robustniks and look at their literature, and it's pretty clear that rreg is way behind the state of the art.

                I wrote this in 2011 in http://www.stata.com/statalist/archi.../msg00416.html See also back and forth in the thread, including an endorsement by Steve Samuels.

                The help file has it right: -rreg- is "one version of robust regression". When -rreg- was written the method seemed a good all-round flavour of robust regression, but it is doubtful whether it now looks like _the_ method of choice to anyone in 2011. If you ever used -rreg- for real, you'd be obliged to explain it and defend the choice in any serious forum. "I used robust regression" means virtually nothing. There are probably hundreds of ways to do robust regression (quite apart from what robustness means). "I used -rreg- as implemented in Stata" counts for little outside this community. "I used robust regression as codified by Li (1985)" obliges you to explain why you didn't use something more recent (to fad- and fashion-followers) or something else that someone else fancies for some reason of their own. The literature would keep you busy indefinitely.

                Li, G. 1985. Robust regression. In Exploring Data Tables, Trends, and Shapes, ed. D. C. Hoaglin, F. Mosteller, and J. W. Tukey, 281-340. New York: Wiley.

                As Steve said in 2011, there are now better modern methods available in Stata. For example, search for the work of Vincent Verardi.
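
                A minimal sketch of how such community-contributed commands can be looked up from within Stata. It assumes an internet connection and that the packages named later in this thread (robreg, mmregress) are hosted on SSC under those names; nothing is installed by these lines:

                ```stata
                * Search official and community-contributed material by keyword
                search robust regression, all

                * Show the SSC package descriptions without installing anything
                ssc describe robreg
                ssc describe mmregress
                ```

                Reading the package descriptions and help files first is a cheap way to check syntax and dependencies before committing to one estimator.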



                • #9
                  Thank you very much for your comments!

                  Nick, I had already read your post from 2011, and I also tried mmregress, msregress and robreg, but I didn't get any results after waiting for around two hours. My dataset has lots of control variables (14 controls for different years and around 30 controls for different industries). I think this is a problem when using these regressions, right?

                  I am feeling kind of desperate now because I have a dataset consisting of 181 observations and the following measures for Leverage, Outliers, residual distribution and homoskedasticity of residuals - I really don't know which kind of regression would be the most appropriate one given this diverse dataset.. I would very much appreciate any help of experienced Stata users!!!
                  [Attached images: Leverage.jpg, Outliers.jpg, LVR2PLOT.jpg, Residuals normal distribution.jpg]
                  Last edited by Andreas Manuel; 17 Oct 2014, 11:49.



                  • #10
                    I wouldn't want to try to recommend what to do based on these diagnostics alone. My comments were conditional on your interest in robust regression, but how to model your data is much broader.

                    As someone who prefers not to estimate more than about three parameters at once, I am certainly queasy about how many parameters you are estimating with a dataset that is small for the purpose. I prefer robust to non-robust, but my usual starting point when talking about outliers is to wonder whether you need a logarithmic link function.
                    Last edited by Nick Cox; 17 Oct 2014, 11:52.
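
                    A minimal sketch of what fitting a logarithmic link could look like, using -glm- on Stata's auto dataset. The dataset, the model and the variable name muhat are assumptions for illustration only; whether a log link suits operating performance data (which may be zero or negative) is a separate question this sketch does not settle:

                    ```stata
                    * Illustrative only: auto.dta stands in for the poster's firm data
                    sysuse auto, clear

                    * Gaussian family (the default) with a log link and robust standard errors:
                    * the log of the conditional mean is linear in the predictors,
                    * and the response itself is left untransformed
                    glm price mpg weight foreign, link(log) vce(robust)

                    * Fitted conditional means on the original scale of the response
                    predict muhat, mu
                    summarize price muhat
                    ```

                    Unlike regressing a logged response, this keeps predictions on the original scale, so no back-transformation is needed.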



                    • #11
                      Is there any command for "robust regression (rreg or similar) with robust & clustered standard errors" in Stata 13?
