  • tabulate, generate() with custom variable names

    I'm using tabulate with the generate() option but rather than creating dummy variable names like companyyr1, companyyr2, ... I'd like to create variable names with a prefix, then the actual company name and year. So, for example, given the following dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 name str1 pd float value
    "texas"  "1" 4.1023707
    "texas"  "1"  3.037997
    "texas"  "2"  4.152264
    "triana" "1"  3.017829
    "triana" "1"  3.063638
    "triana" "2" 3.3544934
    "turm"   "1"  5.234208
    "turm"   "1"  5.191209
    "turm"   "1"  4.775544
    "turm"   "2"  6.506348
    "turm"   "2"  5.759304
    "turm"   "2"  5.682965
    end
    I'd like to generate six new dummy variables:
    cpdum_texas1 (=1 if name == "texas" and pd == "1", and zero otherwise)
    cpdum_texas2 (=1 if name == "texas" and pd == "2", and zero otherwise)
    cpdum_triana1 (=1 if name == "triana" and pd == "1", and zero otherwise)
    cpdum_triana2 (=1 if name == "triana" and pd == "2", and zero otherwise)
    cpdum_turm1 (=1 if name == "turm" and pd == "1", and zero otherwise)
    cpdum_turm2 (=1 if name == "turm" and pd == "2", and zero otherwise)

    That is, the dummy variable names should take the form cpdum_[name][pd]. I have well over a hundred of these combinations (though pd is always 1 or 2, at least in the current iteration).

    tabulate, generate(cpdum_) seems promising, but the variables thus created have unintuitive names. Thus the challenge here is to try to replicate that functionality with custom names.

    My efforts so far have focused on using tabulate, generate() and then renaming the variables, like so:
    Code:
    gen namepd = name+pd
    tab namepd, gen(cpdum_)
    foreach vbl of varlist cpdum_* {
        local newvar = "cpdum_"+namepd
        rename `vbl' `newvar' 
    }
    However, this returns an error, as the variable cpdum_texas1 is already defined when we get to the second observation. FWIW I also tried adding a tag so that -rename- would be applied only to select observations, but that didn't work either. That is,

    Code:
    egen t=tag(namepd)
    then, in the foreach loop,
    Code:
    rename `vbl' `newvar' if t==1
    But that returned the same error, perhaps because -rename- doesn't play well with -if- (which is logical).

    I also searched past posts on Statalist, but did not find anything I could use to solve this question. A question from 2015 comes close (http://www.statalist.org/forums/foru...ars-dummy-vars) but I couldn't figure out how to translate that into my case, chiefly because the large number of possible combinations in my data (i.e., large number of distinct values of "name") makes it impossible to list out every one when declaring the loop. (Nonetheless, I thought I'd provide the cross-ref for posterity, in case someone in future finds this question but is looking for the other.)

    I'd be grateful for any suggestions. Many thanks.

  • #2
    Code:
    clear
    input str6 name str1 pd float value
    "texas"  "1" 4.1023707
    "texas"  "1"  3.037997
    "texas"  "2"  4.152264
    "triana" "1"  3.017829
    "triana" "1"  3.063638
    "triana" "2" 3.3544934
    "turm"   "1"  5.234208
    "turm"   "1"  5.191209
    "turm"   "1"  4.775544
    "turm"   "2"  6.506348
    "turm"   "2"  5.759304
    "turm"   "2"  5.682965
    end
    
    levelsof name, local(NAMES)
    levelsof pd, local(PD)
    
    foreach name of local NAMES {
        foreach pd of local PD {
            gen cpdum_`name'`pd' = (name == "`name'") & (pd == "`pd'")
        }
    }
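
    A possible variant, sketched under assumptions: the nested loops above create a variable for every pairing of a name with a period, so a name that never occurs with a given period would get an all-zero dummy. Looping over the observed values of a concatenated variable avoids that. Here namepd and COMBOS are illustrative names, the data are a shortened copy of the example in this thread, and strtoname() is used on the assumption that some company names may contain characters that are not legal in variable names.

    ```stata
    * Sketch only: a shortened copy of the example data from this thread
    clear
    input str6 name str1 pd float value
    "texas"  "1" 4.1023707
    "texas"  "2"  4.152264
    "triana" "1"  3.017829
    "triana" "2" 3.3544934
    "turm"   "1"  5.234208
    "turm"   "2"  6.506348
    end

    * Illustrative helper variable holding each name-period combination
    gen namepd = name + pd

    * Collect only the combinations that actually occur in the data
    levelsof namepd, local(COMBOS)

    foreach c of local COMBOS {
        * strtoname() turns the string into a legal variable name
        local newvar = strtoname("cpdum_`c'")
        gen byte `newvar' = (namepd == "`c'")
    }
    ```

    Two distinct values that strtoname() maps to the same legal name would make the second gen fail, so very long or unusual names would need checking first.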
    Note that rename can't be restricted to certain observations. A variable is an entire column in the dataset; there is no sense in which it is defined only for certain observations. Also, I think that the problem with

    Code:
    local newvar = "cpdum_" + namepd
    is that this can only ever be interpreted as

    Code:
    local newvar = "cpdum_" + namepd[1]
    You are not looping over observations at all: the foreach loop runs over variables, and on every pass that code uses only the value of namepd in the first observation, never any other.
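
    The point can be checked directly, and the original rename idea can be rescued along the following lines. This is a sketch rather than a tested solution for the full dataset: it relies on tabulate, generate() labelling each new variable in the form "namepd==value", and it assumes that every value of namepd yields a legal variable name that does not collide with an existing one. The local names a, b, lbl and suffix are illustrative.

    ```stata
    * Sketch only: a shortened copy of the example data from this thread
    clear
    input str6 name str1 pd
    "texas"  "1"
    "texas"  "2"
    "triana" "1"
    "triana" "2"
    end

    gen namepd = name + pd

    * A bare variable name in a local expression means its first observation,
    * so these two locals should hold the same string
    local a = "cpdum_" + namepd
    local b = "cpdum_" + namepd[1]
    display "`a'"
    display "`b'"

    * Create cpdum_1, cpdum_2, ... and rename each from its variable label
    tabulate namepd, generate(cpdum_)
    foreach v of varlist cpdum_* {
        * Label is assumed to look like "namepd==texas1"
        local lbl : variable label `v'
        * Keep everything after the "==" as the new suffix
        local suffix = substr("`lbl'", strpos("`lbl'", "==") + 2, .)
        rename `v' cpdum_`suffix'
    }
    describe cpdum_*
    ```

    The loop works here because each pass reads a different variable's label, whereas the expression "cpdum_" + namepd gives the same answer on every pass.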

    Last edited by Nick Cox; 06 Jun 2017, 11:35.

  • #3
    Brilliant - thanks Nick!
