  • Creating A Dummy Variable

    Hi all,
    I am new to econometrics and Stata, so if I say something wrong, please spare me. I have a few questions about creating a dummy variable.

    I have a dataset drawn from the annual reports of 182 companies. It includes the total assets of each firm. I want to create dummy variables for Small, Medium and Large firms based on their total assets, in the proportions 30%, 40% and 30%. So the dummy variable would be Top 30% firms = 1, Mid 40% firms = 2 and Small 30% firms = 0. How do I create this dummy variable?

    My second question: these 182 companies are from 4 different countries. How do I create a dummy for each country?

    Thanks and Regards,
    Shahzeb Ahmed

  • #2
    Shahzeb:
    welcome to the list.
    All you need to know about creating categorical variables and interactions is covered under -help fvvarlist-.
    Taking a look at -help label- will answer the remaining part of your query.
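
    As a minimal sketch of what those two help files cover, assuming a toy dataset with hypothetical variable names (country, y, size) that are not from the original post: a string country identifier is converted to a labelled numeric variable, the size categories are given value labels, and factor-variable notation creates the indicators on the fly, so no dummies need to be built by hand.

    ```stata
    * Toy data; variable names and values are illustrative assumptions
    clear
    input str1 country y size
    "A" 10 0
    "A" 12 1
    "B"  9 2
    "B" 14 1
    "C" 11 0
    "C" 13 2
    "D"  8 1
    "D" 15 0
    end

    * country is a string here, so map it to a labelled numeric variable first
    encode country, generate(country_id)

    * attach value labels to the size categories (see -help label-)
    label define sizelbl 0 "Small" 1 "Medium" 2 "Large"
    label values size sizelbl

    * factor-variable notation builds the indicators internally (see -help fvvarlist-)
    regress y i.country_id i.size

    * if explicit 0/1 variables are really wanted, -tabulate, generate()- creates them
    tabulate country_id, generate(ctry_)
    ```

    With factor-variable notation one level of each categorical variable is treated as the base category, which is usually what is wanted in a regression.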
    As per FAQ, please post an example/excerpt of your dataset via -dataex- (see -search dataex- to download it, first), so that interested listers can reply more positively to your queries. Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3


      I wouldn't worry about being new to econometrics. Most of the people who answer most of the questions here aren't even economists.

      I wouldn't call this a dummy variable, or even an indicator variable. It's just a categorical variable indicating bins.

      I recommend thinking this through as a binning of percentile rank, on which see http://www.stata.com/support/faqs/st...ing-positions/

      Here is some technique which you can run for yourself.

      Code:
      . webuse grunfeld
      
      . egen rank = rank(mvalue), by(year)
      
      . egen count = count(mvalue), by(year)
      
      . gen pcrank = (rank - 0.5)/count
      
      . tab pcrank
      
           pcrank |      Freq.     Percent        Cum.
      ------------+-----------------------------------
              .05 |         20       10.00       10.00
              .15 |         20       10.00       20.00
              .25 |         20       10.00       30.00
              .35 |         20       10.00       40.00
              .45 |         20       10.00       50.00
              .55 |         20       10.00       60.00
              .65 |         20       10.00       70.00
              .75 |         20       10.00       80.00
              .85 |         20       10.00       90.00
              .95 |         20       10.00      100.00
      ------------+-----------------------------------
            Total |        200      100.00
      
      . gen bin = cond(pcrank < .3, 0, cond(pcrank < .7, 1, 2)) if pcrank < .
      
      . tab bin
      
              bin |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         60       30.00       30.00
                1 |         80       40.00       70.00
                2 |         60       30.00      100.00
      ------------+-----------------------------------
            Total |        200      100.00

      In your case, you may want to rank within groups using by(country year), or whatever your variables are. You give no data example.

      The Grunfeld data has 10 companies and ties are not a problem with the example above, so count ratios 3:4:3 can be achieved. Your data may not allow that exactly.
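
      A sketch of the same technique for data shaped more like yours, under loud assumptions: the variable names country and assets are hypothetical, the data are simulated, and ranking within country rather than over all 182 firms is only one possible choice (drop the by() option to rank all firms together).

      ```stata
      * Simulated stand-in for 182 firms in 4 countries; all names are assumptions
      clear
      set seed 12345
      set obs 182
      generate country = ceil(4 * runiform())
      generate assets  = exp(rnormal(10, 1))

      * percentile rank of total assets within each country
      egen rank  = rank(assets), by(country)
      egen count = count(assets), by(country)
      generate pcrank = (rank - 0.5) / count

      * bottom 30% = 0, middle 40% = 1, top 30% = 2
      generate size = cond(pcrank < .3, 0, cond(pcrank < .7, 1, 2)) if pcrank < .

      * value labels make later tables and regression output readable
      label define sizelbl 0 "Small" 1 "Medium" 2 "Large"
      label values size sizelbl

      tabulate country size, row
      ```

      Because group sizes will rarely be multiples of 10, the row percentages will only approximate 30:40:30. By default the rank() function of egen gives tied values the mean of their ranks, so tied firms always land in the same bin.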
