Average of a dummy variable

Simone Nuzzo

Join Date: May 2016

Posts: 17
#1

19 Oct 2016, 09:37

Hi All,

Let's say I want to control for a gender effect in a given market. I set up a binary variable "Male" which takes the value 1 for males and 0 otherwise. Does it make sense to average the dummy variable at the market level? For example, if I have 10 agents in a market and 8 of them are male, the averaged variable would take the value 0.8 and would tell the model that the market was "male-dominated".

Would that be correct?

Thanks
Simone
Chris Larkin

Join Date: Apr 2016

Posts: 296
#2

19 Oct 2016, 09:46

Yes, this is perfectly correct. Your result would be a proportion - which you can just times by 100 to derive percent.
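
To make this concrete, here is a minimal sketch of averaging a 0/1 dummy within each market. It is written in Python rather than Stata purely for illustration, and the data and the names `agents` and `share_by_market` are made up for the example.

```python
from collections import defaultdict

# Hypothetical data: (market id, male dummy) for each agent.
# Market A has 8 males out of 10 agents, market B has 3 out of 10.
agents = [("A", 1)] * 8 + [("A", 0)] * 2 + [("B", 1)] * 3 + [("B", 0)] * 7

def share_by_market(rows):
    """Return the mean of the 0/1 dummy within each market."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for market, male in rows:
        totals[market] += male
        counts[market] += 1
    return {m: totals[m] / counts[m] for m in counts}

male_share = share_by_market(agents)
for market, share in sorted(male_share.items()):
    # The mean of a dummy is a proportion; multiply by 100 for a percent.
    print(market, share, f"{100 * share:.0f} percent")
```

In Stata itself, the usual route would be something like `egen male_share = mean(Male), by(market)`, where `market` is an assumed name for the market identifier.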
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35432
#3

19 Oct 2016, 10:16

Chris: That's clearly good advice except that "times" is perhaps the word that small children may be taught! Above age 11 or so, the official word is surely "multiply".

There is a more serious point in that percent[age]s are not really different from proportions in that 80% = 80/100 = 0.8.

In most software I know to "see" percents, however, you must multiply by 100, as in the advice here.

I argue that percent is, or should be, just a change of display format.

I asked around a group of Stata people -- possibly at a recent users' meeting -- if they knew software that followed what to me is the logical procedure and was told that Excel has this the way I want. Perhaps I should change.

I did suggest to StataCorp a percent format so that say

Code:

di %2.1p 0.8

would show 80.0 but the idea is just lurking in their files. More importantly, a display format would at best be only a partial solution.
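
For comparison, Python's format mini-language is one example that behaves this way: the `%` presentation type multiplies by 100 only at display time and leaves the stored value alone. A small sketch (Python, not Stata; the variable names are just for illustration):

```python
share = 0.8  # stored as a proportion

as_percent = format(share, ".1%")  # display only: multiplies by 100, appends %
as_number = format(share, ".1f")   # plain fixed format of the same value

print(as_percent, as_number)
# Formatting does not alter the stored value.
print(share == 0.8)
```

Note that this format always appends the percent sign, so it corresponds to the "with symbol" variant only.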
Comment
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#4

19 Oct 2016, 10:58

Originally posted by Nick Cox

Chris: That's clearly good advice except that "times" is perhaps the word that small children may be taught! Above age 11 or so, the official word is surely "multiply".

Times is also the word I was taught in school when learning English. I don't doubt Nick's statement that multiply sounds more professional, but as a non-native speaker the distinction is not as self-evident as it appears to be for Nick.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Comment
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#5

19 Oct 2016, 11:04

At the risk of going very off-topic, I think the point is that "times" is generally not used as a verb (I'm not a native speaker, so don't put too much stock in my opinion).

For example, you can say: "10 is 2 times 5". But you shouldn't say, "if you time 2 by 5, you get 10".
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35432
#6

19 Oct 2016, 11:30

I was being mischievous, as often, and blame the digression on me. But it's certainly common in informal English (here meaning British) to say (e.g.) "times it by 10" and "times" is there a verb meaning multiply.

People do not say (same qualification) "time it" to mean multiply. That word is used to mean measuring how long something took, as in Usain Bolt was timed to run 100 m in less than 10 seconds. To see how long before the water boils, time it.

Perhaps inconsistently, even mathematicians here would not object to "3 times 4".
Comment
Chris Larkin

Join Date: Apr 2016

Posts: 296
#7

19 Oct 2016, 14:27

Noted Nick! I think 11 was about the age I left school... I've since managed to get myself a master's degree (I know... only a master's) but obviously missed out on this important lesson somewhere in the middle. You are, of course, completely right that a proportion is the same thing as a percent(age) but I've genuinely lost count of the number of times people interpret the coefficients on binary variables incorrectly because they're not multiplying by 100 in their head!

Last edited by Chris Larkin; 19 Oct 2016, 14:35.
Comment
Oded Mcdossi

Join Date: Jun 2014

Posts: 577
#8

19 Oct 2016, 14:46

Originally posted by Nick Cox

I asked around a group of Stata people -- possibly at a recent users' meeting -- if they knew software that followed what to me is the logical procedure and was told that Excel has this the way I want. Perhaps I should change.

I did suggest to StataCorp a percent format so that say

Code:

di %2.1p 0.8

would show 80.0 but the idea is just lurking in their files. More importantly, a display format would at best be only a partial solution.

I second this call for StataCorp to add percentages as another format type (with and without the option for the percentage symbol). This seems so trivial and easy to add and may make life easier.
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#9

19 Oct 2016, 18:29

If the model is linear in the gender dummy then it does not matter whether you average the partial effect or plug in the average. In a nonlinear model, one can make a case for the average partial effect (rather than the partial effect evaluated at the average) because the APE averages the partial effect for actual units in the population.
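
A small numerical sketch may help to see the difference. It is in Python, and the coefficients, the sample of dummies, and the logit functional form are all assumptions chosen only for illustration.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and data, made up for this example.
b0, b1, b2, b3 = -1.0, 0.5, 1.5, 0.3
educ = 2.0
female = [1, 1, 0, 0, 0]        # sample of 0/1 dummies
p = sum(female) / len(female)   # proportion female

def pe_linear(f):
    # Linear model with an interaction: dy/d(educ) = b1 + b3*f
    return b1 + b3 * f

def pe_logit(f):
    # Logit with index b0 + b1*educ + b2*f: dP/d(educ) = b1 * G * (1 - G)
    g = logistic(b0 + b1 * educ + b2 * f)
    return b1 * g * (1.0 - g)

def average(fn):
    # Average of the partial effect over the actual units in the sample.
    return sum(fn(f) for f in female) / len(female)

ape_linear, pea_linear = average(pe_linear), pe_linear(p)
ape_logit, pea_logit = average(pe_logit), pe_logit(p)

print("linear:", ape_linear, pea_linear)  # agree up to rounding
print("logit: ", ape_logit, pea_logit)    # differ
```

In the linear case the two numbers coincide because the partial effect is linear in the dummy; in the logit case they do not.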

But I'm not sure why you would want to do the averaging in most cases. For example, consider

y = b0 + b1*educ + b2*female + b3*female*educ + ...

then the effect of education is b1 for males and b1 + b3 for females. If you want the average effect of education across the entire population then you would report b1 + b3*(proportion female). I'm not sure why this is preferred to reporting b1 and b3 separately.
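
The population-average expression comes from weighting the two group effects by their shares. Writing p for the proportion female (a symbol introduced here only for the derivation):

```latex
\begin{aligned}
\frac{\partial y}{\partial\,\text{educ}} &= b_1 + b_3\,\text{female}, \\
\text{average effect} &= (1-p)\,b_1 + p\,(b_1 + b_3) = b_1 + b_3\,p .
\end{aligned}
```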
Comment
Simone Nuzzo

Join Date: May 2016

Posts: 17
#10

20 Oct 2016, 09:20

Thank you very much for your replies.

In particular to Jeff: we average over individual characteristics at the market level, since we indeed run the whole analysis at the market level (rather than at individual level).
More specifically, we employ the following tobit model:

where we set up a dummy variable for each of our n financial markets, and X is a vector of demographic variables that we want to control for. For instance, we want to control for the fact that women may be disproportionately represented in some markets, thus leading to exceptional trends in that specific market. Period is a trend variable capturing the time effect (we have several periods (t) in each market).

We also want to test pairwise treatment effects on the dependent variable. We were thinking of running the following lincom command to test, for instance, differences between Treatments 1 and 2:
lincom Mkt(1) + Mkt(2) + … + Mkt(6) - Mkt(7) - Mkt(8) - … - Mkt(12)

where the first six markets belong to Treatment 1 and markets 7...12 belong to Treatment 2.

Do you think this procedure is appropriate? Would it allow us to properly control for market effects as well as to account for cluster correlation at the market level?

If the procedure is correct, how can we deal with the treatment that contains the omitted market? Let's suppose this is Treatment 4. If we run the same lincom test as above, one treatment would now have 6 coefficients, while Treatment 4 would have 5 coefficients. We believe this would introduce a distortion. We were thus thinking of rerunning the regression with the omitted category changed to a market that belongs to another treatment. In other words, when we want treatment effects involving Treatment 4, we would run another regression where the omitted category belongs to, say, Treatment 1. Is this method OK?

Thanks!
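
On the omitted-category worry, a quick sanity check may be useful. The sketch below (Python, with made-up market effects and a hypothetical 12-market layout) assumes that each dummy coefficient measures a market's effect relative to the omitted market, so the omitted market implicitly has a coefficient of zero. Under that assumption, and with the same number of markets in each treatment, the sum-minus-sum contrast comes out the same whichever market is omitted. It says nothing about standard errors or about whether the tobit specification itself is appropriate.

```python
# Hypothetical market-level effects (market id -> effect); made-up numbers.
effects = {m: 0.25 * m for m in range(1, 13)}
treat_1 = range(1, 7)    # markets 1..6
treat_2 = range(7, 13)   # markets 7..12

def dummy_coefs(base):
    """Coefficients on market dummies when `base` is the omitted market.

    Each coefficient is the market's effect minus the base market's effect,
    so the omitted market has an implicit coefficient of exactly 0.
    """
    return {m: effects[m] - effects[base] for m in effects}

def contrast(coefs):
    """Sum over Treatment 1 markets minus sum over Treatment 2 markets."""
    return sum(coefs[m] for m in treat_1) - sum(coefs[m] for m in treat_2)

c_base_1 = contrast(dummy_coefs(base=1))    # omitted market in Treatment 1
c_base_12 = contrast(dummy_coefs(base=12))  # omitted market in Treatment 2
print(c_base_1, c_base_12)
```

The base effect enters six times with a plus sign and six times with a minus sign, so it cancels. The treatment with the omitted market is effectively still contributing six terms, one of which is zero.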
Comment
