Creating A Dummy Variable

Shahzeb Ahmed

Join Date: May 2017

Posts: 5
#1


01 May 2017, 04:41

Hi all,
I am new to econometrics and Stata, so please bear with me if I say something wrong. I have a few questions about creating a dummy variable.

I have a dataset built from the annual reports of 182 companies, which includes each firm's total assets. I want to create dummy variables for Small, Medium and Large firms based on total assets, split in a ratio of 30%, 40% and 30%. So the dummy variable would be: Top 30% of firms = 1, Mid 40% of firms = 2 and Small 30% of firms = 0. How do I create this dummy variable?

My second question: these 182 companies are from 4 different countries. How do I create a dummy for each country?

Thanks and Regards,
Shahzeb Ahmed
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#2

01 May 2017, 04:58

Shahzeb:
welcome to the list.
All you need to know about creating categorical variables and interactions is covered under -help fvvarlist-.
Taking a look at -help label- will answer the remaining part of your query.
As per the FAQ, please post an example/excerpt of your dataset via -dataex- (see -search dataex- to download it first), so that interested listers can reply more positively to your queries. Thanks.
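
For the country question, here is a minimal sketch of the two usual routes. Because no data example was posted, it uses the census dataset shipped with Stata, where region (four categories) only stands in for a country variable. The prefix reg_ and the regression itself are illustrative assumptions, not anything from the original dataset.

```stata
* Illustration only: region (4 categories) stands in for country
sysuse census, clear

* Route 1: factor-variable notation creates the indicators on the fly
regress medage i.region

* Route 2: explicit 0/1 indicators, one per category (reg_1 ... reg_4)
tabulate region, generate(reg_)

* If country is stored as a string, convert it to a labelled numeric
* variable first (country and country_id are hypothetical names):
* encode country, generate(country_id)
```

With factor-variable notation the first category is the base level by default; -help fvvarlist- explains how to choose a different one.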

Kind regards,
Carlo
(Stata 19.0)

Nick Cox

Join Date: Mar 2014
Posts: 35448

01 May 2017, 05:12

I wouldn't worry about being new to econometrics. Most of the people who answer most of the questions here aren't even economists.

I wouldn't call this a dummy variable, or even an indicator variable. It's just a categorical variable indicating bins.

I recommend thinking this through as a binning of percentile rank, on which see http://www.stata.com/support/faqs/st...ing-positions/

Here is some technique which you can run for yourself.

Code:

. webuse grunfeld

. egen rank = rank(mvalue), by(year)

. egen count = count(mvalue), by(year)

. gen pcrank = (rank - 0.5)/count

. tab pcrank

     pcrank |      Freq.     Percent        Cum.
------------+-----------------------------------
        .05 |         20       10.00       10.00
        .15 |         20       10.00       20.00
        .25 |         20       10.00       30.00
        .35 |         20       10.00       40.00
        .45 |         20       10.00       50.00
        .55 |         20       10.00       60.00
        .65 |         20       10.00       70.00
        .75 |         20       10.00       80.00
        .85 |         20       10.00       90.00
        .95 |         20       10.00      100.00
------------+-----------------------------------
      Total |        200      100.00

. gen bin = cond(pcrank < .3, 0, cond(pcrank < .7, 1, 2)) if pcrank < .

. tab bin

        bin |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         60       30.00       30.00
          1 |         80       40.00       70.00
          2 |         60       30.00      100.00
------------+-----------------------------------
      Total |        200      100.00

In your case, you may want to group by(country year) or whatever your variables are. You give no data example.
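
As a rough sketch of how that might look, the code below applies the same steps within country to simulated data. Everything here is an assumption for illustration: the variable names country, totalassets and size, the value labels, and the random numbers that replace the real 182 firms.

```stata
* Simulated stand-in for the real data: 182 firms in 4 countries
clear
set seed 2017
set obs 182
gen byte country = ceil(_n/46)
gen double totalassets = exp(rnormal(10, 2))

* Percentile rank of total assets within each country
egen rank = rank(totalassets), by(country)
egen count = count(totalassets), by(country)
gen double pcrank = (rank - 0.5)/count

* Bin at 30% and 70%; missing values of totalassets stay missing
gen byte size = cond(pcrank < .3, 0, cond(pcrank < .7, 1, 2)) if pcrank < .

* Value labels make the codes self-documenting (see -help label-)
label define sizelbl 0 "Small" 1 "Medium" 2 "Large"
label values size sizelbl

tabulate size country
```

Storing pcrank as a double is a deliberate choice: with the default float type, a percentile rank that falls exactly on .3 or .7 may not compare as expected with those constants.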

The Grunfeld data has 10 companies and ties are not a problem with the example above, so count ratios 3:4:3 can be achieved. Your data may not allow that exactly.
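
To see how ties can disturb the split, here is a small made-up example (the variable x and its ten values are invented purely for illustration). By default egen's rank() gives tied values the same average rank, so they all land in one bin; the unique option breaks ties arbitrarily, which restores the counts at the cost of an arbitrary assignment.

```stata
* Ten made-up values, four of them tied
clear
input x
1
2
3
3
3
3
7
8
9
10
end

egen count = count(x)

* Default: tied values share an average rank and hence a bin
egen rank = rank(x)
gen double pcrank = (rank - 0.5)/count
gen byte bin = cond(pcrank < .3, 0, cond(pcrank < .7, 1, 2)) if pcrank < .

* unique: ties are broken arbitrarily, so ranks run 1 to 10
egen rank_u = rank(x), unique
gen double pcrank_u = (rank_u - 0.5)/count
gen byte bin_u = cond(pcrank_u < .3, 0, cond(pcrank_u < .7, 1, 2)) if pcrank_u < .

tabulate bin
tabulate bin_u
```

Whether an arbitrary tie-break is acceptable depends on the analysis; comparing the two tabulations shows how much the ties matter in a given dataset.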
