Hi Statalisters
I am working on a fairly simple question that I have not confronted before and hope that I may get input here.
Basically, I have created a dataset with an ID where some variables contain multiple values in a single cell. My aim is a data set in long format. I wonder what the best approach is. Do I make several rows for each ID?
I have created a data example to illustrate:
For analytic purposes, it's important to be able to count the number of e.g. "Economics", "Sociology", etc., as well as number of countries.
Note that there is not perfect overlap between number of values in Discipline and Country. It would be great if someone had input on how this may be solved through code instead of manual work as my real data set is larger.
Best
Tarjei
I am working on a fairly simple question that I have not confronted before and hope that I may get input here.
Basically, I have created a dataset with an ID where some variables contain multiple values in a single cell. My aim is a data set in long format. I wonder what the best approach is. Do I make several rows for each ID?
I have created a data example to illustrate:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float ID str25 Discipline str23 Country 1 `""Economics""' "USA" 2 `""Statistics, Economics""' "Canada" 3 `""Sociology""' "USA, Germany" 4 `""Economics, Epidemiology""' "USA" 5 `""Sociology""' "UK" 6 `""Sociology""' "Norway, Sweden, Denmark" 7 `""Economics""' "UK, USA" 8 `""Sociology""' "Italy" 9 `""Epidemiology, Economics""' "USA" 10 `""Economics""' "USA" end
Note that there is not perfect overlap between number of values in Discipline and Country. It would be great if someone had input on how this may be solved through code instead of manual work as my real data set is larger.
Best
Tarjei
Comment