Dear all,
I am looking for a way to create dummies from a two-way table of two variables that contain the same set of values, so something similar to the “tabulate variable, gen(newvariable)” command but with a varlist instead of just one variable.
The setting is as follows:
In one column (variable cityA) I have a complete list of cities and in another (variable cityB) I have the neighbouring cities for each city in the first column. Because of this setup, each pair of neighbouring cities is observed twice (as each city in cityB also appears in cityA and vice versa). Additionally, a third and a fourth variable, regionA and regionB, indicate the regions in which cityA and cityB, respectively, are located. The two cities could be located in the same region or in different regions (in which case both are located at the border of their respective regions).
Now what I would like to do is create an indicator variable for each unique combination of regions in the dataset, with the aim of both dividing up my dataset geographically and being able to distinguish between intra- and interregional neighbours. I thought of using “egen combo = group(regionA regionB)” followed by “tabulate combo, gen(neighbour)”, but “egen, group” is not suitable for this because it treats the combinations as ordered pairs and assigns a different value to each ordering. For example, the combination XYZ-ABC is considered different from ABC-XYZ, whereas I need the two to be considered the same, just as they would appear mirrored in “tabulate regionA regionB”. Is there a way around this issue? If deleting one of the two observations for each pair of cities is my only option, how would I go about that?
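To make the requirement concrete, here is a minimal sketch of one possible way to get an order-insensitive grouping: sort the two region values within each observation before passing them to “egen, group”. It assumes regionA and regionB are both string variables and that city names are unique; r_lo, r_hi and intra are illustrative names that do not exist in the dataset, and this is not necessarily the best approach.

```stata
* Assumption: regionA and regionB are both string variables.
* r_lo, r_hi and intra are hypothetical helper names.

* Put the alphabetically smaller region first, so that
* XYZ-ABC and ABC-XYZ both become ABC-XYZ.
generate r_lo = cond(regionA <= regionB, regionA, regionB)
generate r_hi = cond(regionA <= regionB, regionB, regionA)

* One group per unordered pair of regions, then one dummy per group.
egen combo = group(r_lo r_hi), label
tabulate combo, generate(neighbour)

* Flag intraregional neighbours (both cities in the same region).
generate byte intra = (regionA == regionB)

* Optional: keep only one observation per pair of cities.
* Assumes city names are unique and no city neighbours itself.
* keep if cityA < cityB
```

With this, every interregional pair of regions gets a single dummy regardless of which city is listed in cityA, and the intra variable separates intraregional from interregional neighbours.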
Thanks!
Best,
Sander