Dummy Independent Variables

Shiwani Varal

Join Date: Apr 2020

Posts: 24
#1

10 Sep 2020, 12:10

Hi everyone,

I am currently using Stata 16. For my project, I am trying to observe the effect of undergraduate education background (my INDEPENDENT variable) on my dependent variable. My education background variable is grouped into Engineering, Financial, Law and Others. I tried to create a dummy variable to categorize them into 4 different groups. First I used:

Code:

encode EDUCATION, gen(qualification)

Then I used:

Code:

xi: reg DEP_VAR i.qualification

This second step created dummies for each group. When I use these dummy variables in my regression, I get very large p-values. Is there something wrong with my approach? I am trying to redo an analysis I saw somewhere, and I took the same steps as they did. I need advice on whether my method of creating the dummy variables is incorrect, or whether the relationship I am trying to observe is simply not there.

I was not sure which thread would best fit this question, so I am posting it here. I would appreciate any help that you can provide me.
Thank you!
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#2

10 Sep 2020, 13:23

I see nothing wrong with the code. I wouldn't use the -xi:- prefix. It is obsolete: -reg DEP_VAR i.qualification- will do the same thing using factor variable notation (-help fvvarlist-) without cluttering your data set with otherwise unneeded "dummy" variables, and it will also enable you to use the -margins- command if that should prove handy. But the results won't be any different either way.
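As a minimal sketch of the factor-variable approach, assuming the variable names from #1 (DEP_VAR and qualification), it might look like this:

Code:

* factor-variable notation: no dummy variables are added to the data set
regress DEP_VAR i.qualification
* expected value of DEP_VAR at each level of qualification
margins qualification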

That your results have large p-values simply means that your data do not support estimates of these effects that are sufficiently precise to allow you to determine their signs. One possibility is that the effects truly are zero. Other more likely possibilities are that they are small in magnitude and the noise in your data, or a small sample size, makes your effect estimates too imprecise to distinguish them from zero. You should also consider other issues such as omitted variable bias when doing a simple model like this in observational data. Also bear in mind that the p-values you are seeing refer to a comparison between the expected values of DEP_VAR in each of the levels of qualification and the reference level of qualification (that is, the value that does not appear in the model.) Are you sure those are the comparisons you are interested in?
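To check which comparisons the p-values refer to, something along these lines may help. This is only a sketch: the choice of level 2 as the alternative reference category is purely illustrative, and the numeric coding that -encode- assigned should be confirmed first.

Code:

* see which number -encode- assigned to each education group
label list qualification
* display the omitted (reference) level in the output table
regress DEP_VAR i.qualification, baselevels
* same model with level 2 as the reference level (illustrative choice)
regress DEP_VAR ib2.qualification
* joint test that all levels have the same expected DEP_VAR
testparm i.qualification
* all pairwise comparisons between levels
pwcompare qualification, effects

The joint test from -testparm- does not depend on which level is chosen as the reference, whereas the individual coefficient p-values do.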
Shiwani Varal

Join Date: Apr 2020

Posts: 24
#3

21 Sep 2020, 04:26

Hi Clyde, thanks so much for your insight. Yes, you raised an important point: I am comparing the expected value of my dependent variable at each level of qualification with its expected value at the reference level. This got me thinking that I could also turn my dependent variable into a dummy variable and use logistic regression. However, I have never used "logit" before, and I am not sure how to include industry and year effects in it. Do you have any suggestions?
Felix Scholl

Join Date: Aug 2020

Posts: 33
#4

21 Sep 2020, 04:33

Is your dependent variable binary, or can it be transformed into a binary variable? If so, you can use logistic regression. Of course, you can include industry and year effects the same way as you do in OLS. You can find countless introductory examples of how to use logistic regression in Stata and how to interpret the results. However, I believe it is unlikely that you will get a significant effect by switching to a binary dependent variable.
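A minimal sketch, assuming a 0/1 outcome called binaryDEP_var and numeric variables called industry and year (all three names are hypothetical; a string industry variable would need -encode- first):

Code:

* industry and year effects enter as factor variables, exactly as in -regress-
logit binaryDEP_var i.qualification i.industry i.year
* the same model, reported as odds ratios
logit binaryDEP_var i.qualification i.industry i.year, or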
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#5

21 Sep 2020, 11:04

Changing the DV from continuous to binary, and especially "selecting" a cut-off value after seeing the results of the linear regression, may end up as a fishing expedition. That said, the commands are similar with regard to the predictors.

Best regards,

Marcos
Shiwani Varal

Join Date: Apr 2020

Posts: 24
#6

21 Sep 2020, 14:20

Hi Felix and Marcos, yes, it does sound unethical/fishy when I say I will switch from continuous to binary, but the nature of my dependent variable actually allows me to put it in binary form, because there is a cutoff threshold that I did not come up with. I am trying to use logistic regression, and I have panel data. Is

Code:

clogit binaryDEP_var independent variable, group()

the way to go? Thank you!!
Felix Scholl

Join Date: Aug 2020

Posts: 33
#7

22 Sep 2020, 02:43

Standard logistic regression is called with logistic (or with logit, which reports coefficients rather than odds ratios). Clogit estimates a conditional (fixed-effects) logistic regression (see help clogit), where groups are defined by var in group(var). You have to specify your group variable, otherwise Stata will complain, e.g.

Code:

clogit binaryDEP_var independent_variable, group(id)

If you really have panel data (the same units are observed multiple times), then using conditional fixed-effects logistic regression may be appropriate. Clogit disregards any variation across units and only takes into account variation within units. Whether this is the right thing to do for you is almost impossible to say without knowing your data and your research.

By the way, I cannot think of an example with panel data where using standard OLS regression is appropriate, because repeated observations on the same unit are not independent of each other.
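As a rough sketch of alternatives that account for this dependence, assuming a panel identifier called id and a time variable called year (both names are hypothetical):

Code:

* declare the panel structure
xtset id year
* OLS point estimates, with standard errors clustered on the panel unit
regress DEP_VAR i.qualification i.year, vce(cluster id)
* random-effects linear model
xtreg DEP_VAR i.qualification i.year, re vce(cluster id)
* random-effects logit for the binary outcome
xtlogit binaryDEP_var i.qualification i.year, re

Note that if qualification never changes within a unit, a fixed-effects estimator (xtreg, fe or clogit) cannot estimate its effect and will drop it from the model.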

Last edited by Felix Scholl; 22 Sep 2020, 02:46.