Running regressions without normally distributed residuals

Astrid Lind

Join Date: Mar 2019

Posts: 7
#1

28 Mar 2019, 03:23

Hi,

I have a question about a regression that I am running for my Bachelor's thesis. I do not have normally distributed residuals, which I believe is an issue.
I was wondering if there is a command that makes it possible to run a regression without normally distributed residuals?
(I have learned that the robust command can be used to correct for heteroscedasticity for example).

The second question is that I have a problem with three of the regression assumptions (homoscedasticity, normally distributed residuals and no autocorrelation). Is there any command that can be used to correct for all of them? (I have heard that the robust command can correct for heteroscedasticity as well as autocorrelation)

I have panel data and using log/square to transform my dependent variables is not an option because they have negative as well as positive values.

Please be patient, I am a first-time user of Statalist. Let me know if I have missed any information that should be included in the question.

Thank you,
Astrid
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#2

28 Mar 2019, 10:53

Astrid:
welcome to this forum.
For most commands developed for panel data regression, -cluster- or -robust- options are what you're after for dealing with heteroskedasticity and/or autocorrelation.

Kind regards,
Carlo
(Stata 19.0)
Astrid Lind

Join Date: Mar 2019

Posts: 7
#3

29 Mar 2019, 02:20

Originally posted by Carlo Lazzaro

Astrid:
welcome to this forum.
For most commands developed for panel data regression, -cluster- or -robust- options are what you're after for dealing with heteroskedasticity and/or autocorrelation.

Thank you Carlo. We have tried that command. Do you know if there is a similar command to correct for the absence of normally distributed residuals?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#4

29 Mar 2019, 02:54

Astrid:
-cluster- or -robust- options are enough, as non-normality in residuals distribution is, in fact, heteroskedasticity.

Kind regards,
Carlo
(Stata 19.0)
Astrid Lind

Join Date: Mar 2019

Posts: 7
#5

29 Mar 2019, 03:46

Originally posted by Carlo Lazzaro

Astrid:
-cluster- or -robust- options are enough, as non-normality in residuals distribution is, in fact, heteroskedasticity.

Hi again Carlo,
My understanding is that non-normality in residuals is not the same as heteroscedasticity. A normal distribution requires that residuals become fewer and fewer as they get larger. Homoskedasticity simply requires that the variance is constant when plotted against the fitted values of the dependent variable => it's not necessarily always the same thing.

Do you still believe that the robust command can be used?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#6

29 Mar 2019, 05:00

Astrid:
you're correct about the difference between normally distributed and heteroskedastic residuals. Even if residuals are not normally distributed, it is possible to prove that confidence intervals of the coefficients and the related hypothesis tests are still valid asymptotically, that is, with a large enough sample.
Hence, the only thing you should care about is heteroskedasticity of the residuals.
That said, I should have been clearer in my previous post. Sorry for this.

Kind regards,
Carlo
(Stata 19.0)
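
The large-sample argument in #6 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated, the program name simskew and the variables x, e and y are hypothetical, the true slope is set to 2, and the errors are drawn from a centred chi-squared(1) distribution so that they are clearly non-normal. It is meant to be run as a do-file.

```stata
* Sketch: coverage of the 95% CI for a slope when errors are skewed, not normal.
* simskew, x, e and y are hypothetical names; the true slope is 2 by construction.
clear all
set seed 12345

program define simskew, rclass
    drop _all
    set obs 1000
    generate x = rnormal()
    * Centred chi-squared(1) errors: mean zero, strongly right-skewed
    generate e = rchi2(1) - 1
    generate y = 1 + 2*x + e
    regress y x, vce(robust)
    * 1 if the 95% CI covers the true slope, 0 otherwise
    return scalar covered = abs(_b[x] - 2) <= invttail(e(df_r), 0.025)*_se[x]
end

* Repeat the experiment 500 times and look at the share of CIs that cover 2
simulate covered = r(covered), reps(500) nodots: simskew
summarize covered
```

If the mean of covered is close to 0.95, the confidence interval behaves as advertised despite the skewed errors. Lowering the number of observations inside the program is one way to see how the approximation degrades in small samples.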
Astrid Lind

Join Date: Mar 2019

Posts: 7
#7

29 Mar 2019, 05:24

Thank you Mr. Lazzaro

How can we prove that the confidence intervals of the coefficients are still valid?

Also, what is the difference between the cluster option and the robust option?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#8

29 Mar 2019, 06:13

Astrid:
1) Appendix C of one of my favourite textbooks on introductory econometrics (admittedly, I judged the book by its cover!) https://www.wiley.com/en-gb/Introduc...-9780470032701 provides basic concepts in Asymptotic Theory;
2) if you refer to panel data regression commands, such as -xtreg-, both options do the very same job. If you refer to -regress- the previous comments still hold, but with the addition that the -robust- option allows you to take different types of heteroskedasticity into account.
As an aside, please call me Carlo, like all on (and many more off) this list do. Thanks.

Kind regards,
Carlo
(Stata 19.0)
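
A minimal sketch of point 2), assuming a simulated panel (the variables id, year, x, u and y are hypothetical, and the choice of the fe estimator is an assumption made only for illustration). Under -xtreg-, vce(robust) and vce(cluster id) should report the same standard errors when id is the panel variable; under -regress-, vce(robust) allows for heteroskedasticity only, while vce(cluster id) also allows for correlation within each panel.

```stata
* Sketch with simulated data; id, year, x, u and y are hypothetical names.
clear all
set seed 2019
set obs 100
generate id = _n
generate u = rnormal()                 // panel-specific effect
expand 6                               // six periods per panel
bysort id: generate year = 2000 + _n
generate x = rnormal() + 0.5*u
generate y = 1 + 0.5*x + u + rnormal()
xtset id year

* With -xtreg-, the two options give the same standard errors
xtreg y x, fe vce(robust)
scalar se_robust = _se[x]
xtreg y x, fe vce(cluster id)
scalar se_cluster = _se[x]
display "robust: " se_robust "   cluster: " se_cluster

* With -regress-, the two options generally give different standard errors
regress y x, vce(robust)
regress y x, vce(cluster id)
```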
Astrid Lind

Join Date: Mar 2019

Posts: 7
#9

29 Mar 2019, 06:24

Originally posted by Carlo Lazzaro

Astrid:
1) Appendix C of one of my favourite textbooks on introductory econometrics (admittedly, I judged the book by its cover!) https://www.wiley.com/en-gb/Introduc...-9780470032701 provides basic concepts in Asymptotic Theory;
2) if you refer to panel data regression commands, such as -xtreg-, both options do the very same job. If you refer to -regress- the previous comments still hold, but with the addition that the -robust- option allows you to take different types of heteroskedasticity into account.
As an aside, please call me Carlo, like all on (and many more off) this list do. Thanks.

Thank you again Carlo!
We're planning to use reg dep indep1 indep2 ...., robust for our function. What you are saying is that the robust command corrects for heteroscedasticity, non-normality and autocorrelation?

Thank you so much for your help
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#10

29 Mar 2019, 06:29

Astrid:
if you have panel data with a continuous regressand, why not use -xtreg- with the -robust- or -cluster- option (with a decent sample size, non-normality of the residuals is not an issue to worry about) instead of -regress-?

Kind regards,
Carlo
(Stata 19.0)
Astrid Lind

Join Date: Mar 2019

Posts: 7
#11

29 Mar 2019, 06:46

Originally posted by Carlo Lazzaro

Astrid:
if you have panel data with a continuous regressand, why not use -xtreg- with the -robust- or -cluster- option (with a decent sample size, non-normality of the residuals is not an issue to worry about) instead of -regress-?

We do not really see a reason for using that command. What difference does it make in terms of removing our assumption errors?
We have a sample of 152 companies over eight years. We're using six independent variables excluding our control variables.

Thank you,
Astrid
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#12

29 Mar 2019, 07:00

Originally posted by Astrid Lind

I have a problem with three of the regression assumptions (homoscedasticity, normally distributed residuals and no autocorrelation). Is there any command that can be used to correct for all of them? (I have heard that the robust command can correct for heteroscedasticity as well as autocorrelation)

What is your concern? If you have heteroscedasticity and autocorrelation, then model it. Or reach for the robust security blanket.

Deviation from normality for the residuals comes into play when you are concerned with p-values (and corresponding confidence intervals) with relatively small datasets. If that's your concern, then either take the p-values with a grain of salt, or use resampling methods (e.g., permute).
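
A minimal sketch of the resampling route mentioned above, using -permute- on simulated data (the variables x and y, the sample size of 60 and the skewed error distribution are all assumptions chosen for illustration). The predictor is shuffled repeatedly and the observed slope is compared with the slopes obtained under shuffling, so the p-value does not lean on normally distributed residuals. For real panel data the shuffling scheme needs more thought, for instance restricting permutations with the strata() option; that is not shown here.

```stata
* Sketch: permutation test for one slope; simulated data, hypothetical names.
clear all
set seed 42
set obs 60
generate x = rnormal()
generate y = 0.4*x + (rchi2(1) - 1)    // skewed errors, small sample

* Shuffle x 1000 times and compare the observed slope with the shuffled ones
permute x _b[x], reps(1000) seed(42) nodots: regress y x
```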
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#13

29 Mar 2019, 08:05

Astrid:
I do second Joseph's advice.
As a general note, if you have an N>T panel dataset, -xtreg- is the rule and -regress- the exception.
My impression is that you're resisting -xtreg- for fear of making a jump in the dark (this is a free translation of an Italian saying about leaving a known road for a new one, full of uncertainty).
-xtreg- is simply statistics; do not be afraid of it. Besides, it can easily take heteroskedasticity and autocorrelation into account with a single option (-robust- or -cluster-).
I also recommend reading https://www.stata.com/bookstore/micr...metrics-stata/, which covers -regress- and -xtreg- comprehensively.

Kind regards,
Carlo
(Stata 19.0)
Astrid Lind

Join Date: Mar 2019

Posts: 7
#14

29 Mar 2019, 08:10

Originally posted by Joseph Coveney

What is your concern? If you have heteroscedasticity and autocorrelation, then model it. Or reach for the robust security blanket.

Deviation from normality for the residuals comes into play when you are concerned with p-values (and corresponding confidence intervals) with relatively small datasets. If that's your concern, then either take the p-values with a grain of salt, or use resampling methods (e.g., permute).

I am concerned with the p-values, since the regression is to be used to reject a null hypothesis. My question is whether the robust command corrects for all deviations from the assumptions or only for heteroscedasticity and autocorrelation.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#15

29 Mar 2019, 08:35

Astrid:
as Joseph wisely pointed out, non-normality of the residuals bites when you have a small dataset (by the way, almost every deviation from OLS theory bites when you have a small dataset).
That said, it seems to me that you have received enough replies (and references) to take a step forward.
I would consider the following items:
- do not worry about non-normality of the residuals (if you have a decent sample size);
- consider using -xtreg- vs -regress- if you have linear panel data;
- check your regression for non-linearity between predictors and regressand (aka omitted variable bias);
- check for endogeneity (a nasty form of omitted variable bias);
- under -xtreg-, invoke -cluster- or -robust- option to take both heteroskedasticity and autocorrelation into account.

Kind regards,
Carlo
(Stata 19.0)
1 like
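
The checklist in #15 might translate into something like the following. This is a sketch on simulated data, not a recipe: every variable name (id, year, x1, x2, u, y) is hypothetical, the panel dimensions merely echo the 152 companies and eight years mentioned in #11, and the fe estimator is an assumption, since the thread does not settle between fixed and random effects. Endogeneity is not addressed here.

```stata
* Sketch of the checklist on simulated data; every variable name is hypothetical.
clear all
set seed 152
set obs 152                            // 152 companies, as in #11
generate id = _n
generate u = rnormal()                 // unobserved company effect
expand 8                               // eight years per company
bysort id: generate year = 2010 + _n
generate x1 = rnormal() + 0.5*u
generate x2 = rnormal()
generate y  = 1 + 0.5*x1 - 0.3*x2 + u + rnormal()

xtset id year                          // declare the panel structure

* Fixed effects with standard errors clustered on company
xtreg y x1 x2, fe vce(cluster id)

* Crude check for non-linearity: add a squared term and test it
xtreg y c.x1##c.x1 x2, fe vce(cluster id)
test c.x1#c.x1
```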