I'm using tabulate with the generate() option, but rather than creating dummy variables with names like companyyr1, companyyr2, ..., I'd like to create variable names made up of a prefix, then the actual company name and year. So, for example, given the example dataset (see the -dataex- listing below), I'd like to generate six new dummy variables:
cpdum_texas1 (=1 if name == "texas" and pd == "1", and zero otherwise)
cpdum_texas2 (=1 if name == "texas" and pd == "2", and zero otherwise)
cpdum_triana1 (=1 if name == "triana" and pd == "1", and zero otherwise)
cpdum_triana2 (=1 if name == "triana" and pd == "2", and zero otherwise)
cpdum_turm1 (=1 if name == "turm" and pd == "1", and zero otherwise)
cpdum_turm2 (=1 if name == "turm" and pd == "2", and zero otherwise)
That is, the dummy variable names should take the form cpdum_[name][pd]. I have well over a hundred of these combinations (though pd is always 1 or 2, at least in the current iteration).
tabulate, generate(cpdum_) seems promising, but the variables it creates are simply numbered (cpdum_1, cpdum_2, ...), which is unintuitive. The challenge here is to replicate that functionality with custom names.
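To make the target concrete, here is how the first two dummies could be written by hand. This is only a sketch of the intended result for the example data; with well over a hundred combinations, typing each line out is not practical.

```stata
* Sketch of the intended result, assuming the -dataex- example is in memory.
* Each dummy is 1 for the matching name/pd pair and 0 otherwise.
gen byte cpdum_texas1 = (name == "texas" & pd == "1")
gen byte cpdum_texas2 = (name == "texas" & pd == "2")
```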
My efforts so far have focused on using tabulate, generate() and then renaming the variables, like so (the code is listed after the example data below):
However, this returns an error, as the variable cpdum_texas1 is already defined by the time the loop reaches the second variable. FWIW I also tried adding a tag so that -rename- would be applied only to select observations, but that didn't work either. That is, I first ran egen t=tag(namepd) and then, in the foreach loop, changed the rename line to rename `vbl' `newvar' if t==1.
But that returned the same error, perhaps because -rename- doesn't play well with -if- (which is logical).
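A quick check along the following lines may help pin down the name clash. It is a sketch that assumes the example data are freshly loaded and namepd has not yet been created. My understanding is that in a macro assignment a bare variable name is evaluated in the first observation, so the same new name would come back on every pass through the loop.

```stata
* Sketch only: assumes the -dataex- example is in memory and namepd is new.
gen namepd = name + pd

* In a macro assignment, namepd is read as namepd[1] (the first observation),
* so for the example data this is expected to display cpdum_texas1.
local newvar = "cpdum_" + namepd
display "`newvar'"
```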
I also searched past posts on Statalist, but did not find anything I could use to solve this question. A question from 2015 comes close (http://www.statalist.org/forums/foru...ars-dummy-vars) but I couldn't figure out how to translate that into my case, chiefly because the large number of possible combinations in my data (i.e., large number of distinct values of "name") makes it impossible to list out every one when declaring the loop. (Nonetheless, I thought I'd provide the cross-ref for posterity, in case someone in future finds this question but is looking for the other.)
I'd be grateful for any suggestions. Many thanks.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 name str1 pd float value
"texas"  "1" 4.1023707
"texas"  "1"  3.037997
"texas"  "2"  4.152264
"triana" "1"  3.017829
"triana" "1"  3.063638
"triana" "2" 3.3544934
"turm"   "1"  5.234208
"turm"   "1"  5.191209
"turm"   "1"  4.775544
"turm"   "2"  6.506348
"turm"   "2"  5.759304
"turm"   "2"  5.682965
end
Code:
gen namepd = name+pd
tab namepd, gen(cpdum_)
foreach vbl of varlist cpdum_* {
    local newvar = "cpdum_"+namepd
    rename `vbl' `newvar'
}
Code:
egen t=tag(namepd)
Code:
rename `vbl' `newvar' if t==1
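One direction that might avoid renaming altogether, offered as an untested sketch rather than a confirmed answer, is to loop over the distinct values of the combined string with -levelsof- and generate each dummy directly. It assumes the example data are in memory, that namepd does not exist yet, and that each combined string gives a distinct legal variable name once the prefix is added. The local macro names levels, lev and vname are only illustrative.

```stata
* Sketch: build one dummy per distinct name/pd combination.
gen namepd = name + pd

* Collect the distinct combined strings in a local macro.
levelsof namepd, local(levels)

foreach lev of local levels {
    * strtoname() replaces characters that are not allowed in variable names.
    local vname = strtoname("cpdum_`lev'")
    gen byte `vname' = (namepd == `"`lev'"')
}
```

Since Stata variable names are limited to 32 characters, two long company names could map to the same variable name, in which case -generate- would stop with an "already defined" error.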