Merging/Collapsing Multiple Variables of Similar Type

Danielle Julien

Join Date: Jun 2016

Posts: 26
#1

Merging/Collapsing Multiple Variables of Similar Type

20 Jun 2016, 09:36

Hello,
I am working with a dataset from a survey. In the survey, participants were asked to answer questions based on the number of dogs in their household.
The study was conducted online and the dataset output that I received lists for example, the sex of each of the respondents' dogs in vertical tables.

e.g.
Respondent number Sexdog1 Sexdog2 Sexdog3 Sexdog4

1 male female male male

2 Female male . .

There are 2006 respondents, and each has the ability to list many characteristics of the dog (including sex), for up to a maximum 10 dogs;
So, there are 2006 rows, and 10 columns across for one question in one table, for example.

Is there any way to merge/collapse the data in STATA14 wherein I can see by respondent number, what was the most commonly listed sex of dog, in all of their dogs listed?
So, merge Sexdog1, Sexdog2, Sexdog3 etc. into one variable for this particular respondent to show "the most common type of sex of the dog in respondent household 1 was ... "
?
I have both continuous and categorical data (just thought I would show the categorical as the example)

Thanks so much in advance!
Danielle
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 29942
#2

20 Jun 2016, 10:02

What do you want the result to be for Respondent 2? There is one male and one female: which is the "most common?"
Comment
Danielle Julien

Join Date: Jun 2016

Posts: 26
#3

20 Jun 2016, 10:29

Yes, very good question Clyde, thank you for responding!
Some of the respondents, only have 1 dog, some have 2, some have 5 and very, very few have 10 dogs.

I suppose the question is, on that basis, is it even useful for me to merge the variables??

Maybe I have to leave them all as is, and use the <tab> or <summ> command to view details for descriptive stats?

I am new to STATA so sorry for the overly simple questions,

Danielle
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29942
#4

20 Jun 2016, 11:01

Well, my question didn't really raise a Stata issue. If you were working in another package, my counter-question would still apply. This comes up all the time on the forum, where somebody wants to calculate a variable that tells them "the most frequent value" of some variable (or something related to that) and has not specified what to do if there is a tie.

Perhaps the most sensible thing in this case is to simply assign each respondent a count of males and a count of females? Or perhaps a three-way variable: more males than females, equal numbers, more females than males?

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte respondent_number str6(sexdog1 sexdog2) str4(sexdog3 sexdog4) 1 "male" "female" "male" "male" 2 "female" "male" "" "" end reshape long sexdog, i(respondent_number) j(dog_number) collapse (count) n_dogs_ = dog_number, by(respondent_number sexdog) drop if missing(sexdog) reshape wide n_dogs_, i(respondent_number) j(sexdog) string gen byte threeway = sign(n_dogs_male - n_dogs_female) label define threeway 1 "More Males" 0 "Equal" -1 "More Females" label values threeway threeway

The code above begins by inputting a cleaned up version of your example data (and, in the future, please use -dataex- to post examples, as I have done), and then calculates for you the counts of males and females, and a three-way variable showing which, if either, is the majority. The original detail data on the dogs has been obliterated in the process. But if you want it back, you can always -merge- these results with your original data set.

Last edited by Clyde Schechter; 20 Jun 2016, 11:38. Reason: Correct error in code: had the signs wrong in the label define command.
Comment
rezza cav

Join Date: Jul 2022

Posts: 13
#5

08 Jul 2022, 05:51

I have a very similar problem where I have not found the solution yet.

In my data (example from other post) I also have multiple entries for the same population ("entity") but also for different years ("year").

1. With "egen tag" and "egen ndistinct" I can find out how many different "firms" each combination of "entity" and "year" (group?) has.
( I will have to do this not only for "firm" but also for "value" and other variables)

2. Now I would like to know the frequencies for each value in "ndistinct" over all the combinations of "entity" and "year".
(To get an idea of the variation within different variables and think about how to merge them in the most sensible way).

3. Last I need to make sure there is only one observation per combinations of "entity" and "year" and therefore combine (merge, collapse, append...?) the values of other variables. If there is a most frequent value, choose that one, if there is not, choose the highest or first, or in some other cases the mean or sum (to be defined for each variable).

I really struggle to find a way how to do that. I hope my example is clear.

Original data

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte entity int year str1 firm float value 1 2010 "A" 15 1 2010 "A" 8 1 2010 "B" 12 1 2011 "B" 25 1 2012 "B" 8 2 2010 "A" 7 2 2011 "A" 5 2 2011 "A" 12 2 2011 "C" 13 2 2012 "A" 19 2 2012 "B" 25 2 2011 "B" 14 2 2012 "C" 18 2 2012 "D" 16 end sort entity year list, sepby(entity year)

1. Question

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte entity int year str1 firm float value 1 2010 "A" 15 1 2010 "A" 8 1 2010 "B" 12 1 2011 "B" 25 1 2012 "B" 8 2 2010 "A" 7 2 2011 "A" 5 2 2011 "A" 12 2 2011 "C" 13 2 2012 "A" 19 2 2012 "B" 25 2 2011 "B" 14 2 2012 "C" 18 2 2012 "D" 16 end egen tag = tag(firm entity year) egen ndistinct = total(tag), by(entity year) sort entity year list, sepby(entity year)

2. Question -> This is the table I would like to obtain

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte ndistinct freq 1 3 2 1 3 1 4 1 end sort ndistinct freq list, sepby(ndistinct freq)

3. Question?

See also:
https://www.statalist.org/forums/for...r-each-subject
https://www.statalist.org/forums/for...iven-condition
https://www.statalist.org/forums/for...cases-by-group
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29942
#6

08 Jul 2022, 10:01

It appears that this same question has been posted on 6 different threads. It is answered at https://www.statalist.org/forums/for...r-each-subject.

Seriously? 6 threads? That's abusive. Please stop it. Find one appropriate thread for your topic and post it there.
2 likes
Comment

Respondent number	Sexdog1	Sexdog2	Sexdog3	Sexdog4
1	male	female	male	male
2	Female	male	.	.

Announcement

Merging/Collapsing Multiple Variables of Similar Type

Comment

Comment

Comment

Comment

Comment