OLS with categorical independent variables_assumptions

Lina Massou

Join Date: Jul 2015

Posts: 41
#1


17 Jul 2015, 18:29

Hi everybody!
I'm running an OLS analysis with a continuous dependent variable and 8 categorical independent variables. Could you please help me with the assumptions of the method and how to check them in Stata? Also, in case they are not met, how should I proceed?

thanks!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#2

18 Jul 2015, 01:38

Lina:
welcome to the list.
Unfortunately, your question is too vague to receive a helpful reply.
Please take a look at the FAQ on how to post effectively: it's difficult to reply to your query if you do not post what you typed and what Stata gave you back.
OLS assumptions are covered in any decent textbook on basic statistics and econometrics.


Kind regards,
Carlo
(Stata 19.0)
Lina Massou

Join Date: Jul 2015

Posts: 41
#3

18 Jul 2015, 05:16

Thanks for this, Mr. Lazzaro. The dependent variable is expenditure and the independent variables are demographics; some of these have 2 categories (e.g. male/female) whereas others have more than 2 categories (e.g. employment status). I know the assumptions of OLS, but I'd like to ask you, first of all, whether I have to create dummy variables for the IVs with more than 2 categories or whether I can run the regression in Stata without any transformations. Also, how do I check whether there is a linear relationship between the DV and the IVs, and what if the DV is not normally distributed? I have about 3,500 observations, if this is helpful.

Best,
Lina
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#4

18 Jul 2015, 05:54

Lina:
thanks for providing more details, which allow some remarks concerning your query:
- you don't have to bother creating indicator (dummy) variables by hand, as Stata's factor-variable notation handles that task (see -help fvvarlist-);
- for checking the linear relationship between DV and IVs, Stata has lots of visual and analytical methods: please see -regress postestimation-;
- you don't have to worry about the DV not being normally distributed, because the normality assumption in OLS relates to the residuals, not to the DV.
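To make the factor-variable point concrete, here is a minimal Python sketch (not Stata) of what indicator coding does: a categorical variable with k levels becomes k-1 zero/one columns, and the omitted level is the reference (base) category. The function name `indicator_columns` and the numeric codes are hypothetical; using the lowest level as the default base is our assumption about how Stata's default behaves. In Stata itself you would simply write something like `i.maritalst` in the -regress- varlist.

```python
# Hypothetical sketch: expand one categorical variable into 0/1 indicator columns,
# dropping a base (reference) level. Factor-variable notation such as i.maritalst
# does this for you inside -regress-; this is only an illustration.

def indicator_columns(values, base=None):
    """Return (non-base levels, rows of 0/1 indicators) for a list of category codes.

    If base is None, the lowest level is used as the reference category
    (assumed here to mirror Stata's default base for factor variables).
    """
    levels = sorted(set(values))
    if base is None:
        base = levels[0]
    if base not in levels:
        raise ValueError(f"base level {base!r} not found in data")
    kept = [lev for lev in levels if lev != base]  # k-1 indicator columns
    rows = [[1 if v == lev else 0 for lev in kept] for v in values]
    return kept, rows


# Toy example with made-up marital-status codes:
# 1 = never married, 2 = married, 3 = widowed, 4 = divorced
codes = [1, 2, 2, 3, 4, 1]
kept, rows = indicator_columns(codes)
print(kept)     # levels that get their own indicator column
print(rows[0])  # an observation in the base level has all indicators equal to 0
```

If a different reference category is wanted, Stata's factor-variable notation also lets you choose the base (for example `ib4.maritalst` for level 4); the `base=` argument plays that role in the sketch.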
I find it hard to go further without seeing what you typed and what Stata gave you back (there's a FAQ explaining why this increases your likelihood of receiving helpful replies). The gist of the matter is that we know neither your code nor your results until you make them available to us (there's another FAQ on code delimiters, the best way to paste what you're going to post).

Kind regards,
Carlo
(Stata 19.0)
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#5

18 Jul 2015, 06:16

For categorical explanatory variables you don't have to check for linearity, as it is impossible to violate that assumption: categorical variables are turned into a set of indicator (dummy) variables if you use the factor variable notation (see: help fvvarlist, as Carlo already mentioned). With each indicator variable you just compare two points (conditional means), and you can always connect two points with a straight line.
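A small numeric illustration of the "two points" argument, written as a minimal Python sketch with made-up data (the variable and function names are hypothetical): regressing y on one 0/1 indicator by ordinary least squares gives an intercept equal to the mean of the base group and a slope equal to the difference between the two group means.

```python
# Minimal sketch: OLS of y on a single 0/1 indicator, using the closed-form
# simple-regression formulas. The fitted "line" just connects the two
# conditional means, so there is no room for a non-linear pattern.

def ols_one_regressor(x, y):
    """Return (intercept, slope) of the OLS line y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def mean(values):
    return sum(values) / len(values)


# Made-up data: male = 1 for males, 0 for females; y = expenditure.
male = [0, 0, 0, 1, 1]
y = [100.0, 120.0, 140.0, 90.0, 110.0]

a, b = ols_one_regressor(male, y)
mean_f = mean([yi for xi, yi in zip(male, y) if xi == 0])
mean_m = mean([yi for xi, yi in zip(male, y) if xi == 1])
# Each pair below should agree up to floating-point rounding.
print(a, mean_f)           # intercept vs. mean of the base group (females)
print(b, mean_m - mean_f)  # slope vs. difference in group means
```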

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Lina Massou

Join Date: Jul 2015

Posts: 41
#6

18 Jul 2015, 08:56

Thank you so much for your helpful comments! So, first I use the factor-variable notation (help fvvarlist) to create the dummy variables from the categorical IVs, then I run the regression, and then I check the assumptions through regress postestimation, right?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#7

18 Jul 2015, 10:49

Lina:
yes, you're right.
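As a rough illustration of the residual check in that workflow, the Python sketch below assumes the simplest case: a single categorical regressor entered as a full set of indicators, where each OLS fitted value is just the mean of the observation's group. It computes the residuals and their moment-based skewness and kurtosis (0 and 3 for a normal distribution). In Stata the corresponding steps would be -predict- with the `residuals` option after -regress-, followed by graphical or formal checks such as -qnorm- or -sktest-. The Python function names and data are hypothetical, and with several categorical variables and no interactions the fitted values are no longer simple group means, so the shortcut does not carry over.

```python
# Hypothetical sketch of "fit, then look at the residuals" for a model whose only
# regressor is one categorical variable (saturated case: fitted value = group mean).
from collections import defaultdict


def group_mean_residuals(groups, y):
    """Residuals from OLS of y on a full set of indicators for `groups`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g, yi in zip(groups, y):
        sums[g] += yi
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    return [yi - means[g] for g, yi in zip(groups, y)]


def skewness_kurtosis(values):
    """Moment-based (population) skewness and kurtosis; a normal has 0 and 3."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Made-up data for illustration only.
groups = ["a", "a", "a", "b", "b", "b"]
y = [10.0, 12.0, 14.0, 20.0, 21.0, 40.0]
res = group_mean_residuals(groups, y)
print(res)
print(skewness_kurtosis(res))
```

With a sample of a few thousand observations, a plot of the residuals is usually more informative than these two numbers alone; the sketch only shows where the quantities come from.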

Kind regards,
Carlo
(Stata 19.0)
Anton Ivanov

Join Date: Sep 2014

Posts: 267
#8

18 Jul 2015, 11:07

Lina,

I would strongly recommend a book by Berry (Berry, W. D. 1993. Understanding Regression Assumptions (Vol. 92), Sage.), which in my view is very helpful. Once you get the idea behind the assumptions, Stata can provide you with all the tools necessary to test them.

Anton
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#9

18 Jul 2015, 11:37

Lina:
expanding on Anton's reference list, I would recommend another valuable (and pleasantly short) textbook on this topic: Allison PD. Multiple regression: a primer. Thousand Oaks, CA: Pine Forge Press, 1999.

Last edited by Carlo Lazzaro; 18 Jul 2015, 11:41.

Kind regards,
Carlo
(Stata 19.0)
Lina Massou

Join Date: Jul 2015

Posts: 41
#10

18 Jul 2015, 19:01

Many, many thanks to all of you! I really appreciate it!
Lina Massou

Join Date: Jul 2015

Posts: 41
#11

16 Nov 2015, 15:59

Hello again!
I hope all of you are fine.
Could you please let me know whether this result is OK and whether I interpret it correctly?
HEALTHRT is health expenditure and it's continuous (DV)
MB02 is gender (male/female)
maritalst is marital status (never married/married/widowed/divorced)
Is the command right for the OLS analysis?

If so, then as for the interpretation:
there is NO significant difference in the expenditure of the two genders (what about the minus "-" in the males' coef.?)
Married and never married have significant effects on health expenditure contrary to the widowed.
The effect of divorced is also significant, judging by the constant (everything set equal to zero in the model).
Married paid 528.09 more on health than divorced did while never married 322.10 less than the reference category.
The same for widowed.
Is it OK? Is the weight OK?

[Attached: 1 photo of the Stata output]
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#12

17 Nov 2015, 02:24

Lina:
your regress code seems right for the purpose of your analysis.
However, your interpretation of the coefficients is not always correct:
- other things being equal, males spend less than females, but there's no evidence that the difference is statistically significant;
- other things being equal, "Married and never married have significant effects on health expenditure contrary to the widowed", "Married paid 528.09 more on health than divorced did while never married 322.10 less than the reference category" (i.e. widowed); I'm not sure whether your results are in line with the literature;
- the constant refers to divorced females only;
- I can't comment on the correctness of the weight.
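To see what "reference category" and "constant" mean with two categorical regressors, here is a minimal Python sketch that fits OLS through the normal equations on made-up numbers (not the results discussed in this thread), with female and divorced as the omitted levels. The helper names and data are hypothetical. In this additive model the constant is the fitted value when every indicator is zero, i.e. for a divorced female, and each marital-status coefficient is the fitted difference from the divorced group at the same gender.

```python
# Hypothetical sketch: OLS with two categorical regressors via the normal equations.
# Columns: _cons, male, never married, married, widowed (female and divorced = base).

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear regressors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


LEVELS = ["never married", "married", "widowed"]  # "divorced" is the base level


def design_row(is_male, marital):
    return [1.0, float(is_male)] + [1.0 if marital == lev else 0.0 for lev in LEVELS]


data = [  # (is_male, marital status, made-up expenditure)
    (0, "divorced", 400.0), (1, "divorced", 380.0),
    (0, "married", 900.0), (1, "married", 860.0),
    (0, "never married", 150.0), (1, "never married", 120.0),
    (0, "widowed", 450.0), (1, "widowed", 470.0),
]
X = [design_row(m, s) for m, s, _ in data]
y = [v for _, _, v in data]
b = ols(X, y)
for name, coef in zip(["_cons", "male"] + LEVELS, b):
    print(f"{name:>14}: {coef:9.2f}")
```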

For the future, please post what you typed and what Stata gave you back using code delimiters (see the FAQ on this topic). Thanks.

Kind regards,
Carlo
(Stata 19.0)
Lina Massou

Join Date: Jul 2015

Posts: 41
#13

17 Nov 2015, 03:29

Great! Many thanks for your response. I'm not interested in whether the results are in line with the literature (actually, I expect them not to be, since the data are pooled from different sources); I simply want to see whether the command and the interpretation are correct. It was a trial. Does the p-value (e.g. for male) mean that the effect of being male on expenditure is significant? And what about the weights? The dataset is from a household budget survey and the weights are provided by the survey source; I haven't calculated anything for them. Do you have any suggestions on how I could use the weights?

Thank you,

Lina
Lina Massou

Join Date: Jul 2015

Posts: 41
#14

17 Nov 2015, 03:34

One last query: if the F statistic is missing (it appears as a blue link whose meaning is not clear to me), what does that mean for my model?
I saw some posts on the forum but I couldn't understand them.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#15

17 Nov 2015, 04:24

Lina:
the reason why your F-statistic is missing is well covered in this thread: http://www.stata.com/statalist/archi.../msg00685.html.
If your data come from a survey, you should take a look at the -svy- prefix (-help svy-).
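Survey weights change the point estimates as well as the standard errors. The Python sketch below covers only the point-estimate part: weighted least squares with one 0/1 indicator, where the slope becomes the difference between the two weighted group means. As far as we know, -regress- with `[pweight=...]` and -svy: regress- after -svyset- produce these same coefficients, while the -svy- machinery is what provides design-based standard errors (strata, clusters); treat that as an assumption to check against the Stata manual. The function name and data are hypothetical.

```python
# Hypothetical sketch: weighted least squares fit of y = a + b*x with one 0/1 indicator.
# Only point estimates are computed; no survey-design standard errors.

def wls_one_regressor(x, y, w):
    """Return (intercept, slope) minimizing sum(w * (y - a - b*x)**2), weights > 0."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def weighted_mean(values, weights):
    return sum(v * wt for v, wt in zip(values, weights)) / sum(weights)


# Made-up data: x = 1 for males, y = expenditure, w = survey weight.
x = [0, 0, 1, 1]
y = [10.0, 20.0, 30.0, 50.0]
w = [1.0, 3.0, 1.0, 1.0]

a, b = wls_one_regressor(x, y, w)
wm0 = weighted_mean([yi for xi, yi in zip(x, y) if xi == 0], [wi for xi, wi in zip(x, w) if xi == 0])
wm1 = weighted_mean([yi for xi, yi in zip(x, y) if xi == 1], [wi for xi, wi in zip(x, w) if xi == 1])
print(a, wm0)        # intercept vs. weighted mean of the base group
print(b, wm1 - wm0)  # slope vs. difference in weighted group means
```

With all weights equal to 1 the function reduces to ordinary least squares, which is a quick way to see that the weights, not the estimator, are what move the coefficients.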

Kind regards,
Carlo
(Stata 19.0)