Reshaping to long

Federico DellAnna

Join Date: Aug 2020

Posts: 7
#1


04 Aug 2020, 04:34

Dear all,
thank you for the valuable advice you provide on this forum to all Stata newbies like me.
As part of research that I am carrying out with my group, we have developed a questionnaire based on the best-worst scaling approach. In an online survey, each interviewee is asked to pick the best and the worst from a set of alternatives (one choice for the best, one choice for the worst).
        BEST   WORST
OPT1a   [_]    [_]
OPT1b   [X]    [_]
OPT1c   [_]    [_]
OPT1d   [_]    [X]
OPT1e   [_]    [_]

In this example, OPT1b is the best, OPT1d is the worst.
This exercise is repeated 5 times by the same interviewee. We built the survey in LimeSurvey.
The software reports the results for each interviewee horizontally, so each interviewee's results take the form:
ID | OPTB1 | OPTW1 | OPTB2 | OPTW2 | OPTB3 | OPTW3 | OPTB4 | OPTW4 | OPTB5 | OPTW5 | AGE | INCOME
where ID is the identifier of the interviewee, OPTB1 is the "best" option chosen by the interviewee in the first group, OPTW1 is the "worst" option chosen by the interviewee in the first group, OPTB2 and OPTW2 are the same for the second group, and so on up to 5. AGE and INCOME are the socio-economic variables for each interviewee.
To run the analyses, we need the data in a vertical format with 5 lines for each interviewee:
ID | OPTB1 | OPTW1 | AGE | INCOME
ID | OPTB2 | OPTW2 | AGE | INCOME
ID | OPTB3 | OPTW3 | AGE | INCOME
ID | OPTB4 | OPTW4 | AGE | INCOME
ID | OPTB5 | OPTW5 | AGE | INCOME

The groups of questions are preset. There were 20 macro groups (randoms), each containing 5 different groups; LimeSurvey automatically selects one macro group for each interviewee so that the dataset is balanced. The number of the macro group selected is available for each interviewee. I don't know whether this last piece of information is useful.

I thank you very much for your support.
I hope you can help me.

Federico

Mike Lacy

Join Date: Apr 2014
Posts: 2400

04 Aug 2020, 08:52

This appears to require a standard -reshape long-. Note the example data and the use of -dataex-, as prescribed in the Statalist FAQ for new members. Because I created the example data from your description rather than from a concrete example, my suggestion below may not be exactly what you want.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte ID byte macgroup str1 (OPTB1 OPTW1 OPTB2 OPTW2 OPTB3 OPTW3 OPTB4 OPTW4 OPTB5 OPTW5) int AGE int INCOME
1 5 "d" "e" "a" "b" "b" "c" "e" "a" "d" "e" 65 10000
2 10 "b" "c" "a" "b" "b" "c" "a" "b" "d" "e" 49 20000
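3 15 "d" "e" "a" "b" "c" "d" "a" "b" "c" "d" 52 30000
end
// The OPTW5 value for ID 3 ("d") is an assumed placeholder: the row as posted had only nine choices.
list
// One row per ID and question: OPTB and OPTW are the stubs, the numeric suffix goes into question_num
reshape long OPTB OPTW, i(ID) j(question_num)
sort ID question_num
list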
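
A quick way to confirm that the reshape produced the layout asked for is sketched below. It is only a check under stated assumptions: the two data rows are made-up values in the same style as the example above, and it assumes every interviewee answered all five questions.

```stata
* Hypothetical check data: two interviewees, five questions each
clear
input byte ID str1(OPTB1 OPTW1 OPTB2 OPTW2 OPTB3 OPTW3 OPTB4 OPTW4 OPTB5 OPTW5) int AGE int INCOME
1 "d" "e" "a" "b" "b" "c" "e" "a" "d" "e" 65 10000
2 "b" "c" "a" "b" "b" "c" "a" "b" "d" "e" 49 20000
end
reshape long OPTB OPTW, i(ID) j(question_num)

* Each interviewee should now contribute exactly five rows
bysort ID: assert _N == 5
* ID and question_num together should identify every row
isid ID question_num
* AGE and INCOME should be repeated unchanged on every row of an interviewee
bysort ID (question_num): assert AGE == AGE[1] & INCOME == INCOME[1]
```

If any of these checks stops with an error, the wide file probably has a missing or duplicated ID, or a question that was skipped.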


Federico DellAnna

Join Date: Aug 2020

Posts: 7
#3

05 Nov 2020, 05:09

Dear Mike,
many thanks for your kind reply.
FD
Federico DellAnna

Join Date: Aug 2020

Posts: 7
#4

05 Nov 2020, 05:39

I have another question, because I now need to import the dataset in yet another format.
Starting from the vector built for each interviewee, I need to obtain a dataset like this:
ID; Interviewee_ID; Choice Set; A1; A2; A3; A4; A5; ...; A24; CHOICE; AGE; INCOME
1; 1; 1; 0; 1; 0; 0; 0; ...; 0; 0; 64; 10000
2; 1; 1; 1; 0; 0; 0; 0; ...; 0; 1; 64; 10000
3; 1; 1; 0; 0; 0; 0; 1; ...; 0; 0; 64; 10000
4; 1; 1; 0; 0; 0; 0; 0; ...; 1; 0; 64; 10000
5; 1; 1; 0; 0; -1; 0; 0; ...; 0; 0; 64; 10000
6; 1; 1; 0; -1; 0; 0; 0; ...; 0; 0; 64; 10000
7; 1; 1; 0; 0; 0; 0; -1; ...; 0; 1; 64; 10000
8; 1; 1; 0; 0; 0; 0; -1; ...; 1; 0; 64; 10000
9; 1; 2; 0; 0; 0; 1; 0; ...; 0; 1; 64; 10000
10; 1; 2; 0; 1; 0; 0; 0; ...; 0; 0; 64; 10000
11; 1; 2; 0; 0; 0; 0; 1; ...; 0; 0; 64; 10000
12; 1; 2; 0; 0; 0; 0; 0; ...; 1; 0; 64; 10000
13; 1; 2; 0; 0; 0; 0; -1; ...; 0; 0; 64; 10000
14; 1; 2; 0; 0; 0; -1; 0; ...; 0; 1; 64; 10000
15; 1; 2; 0; 0; 0; 0; -1; ...; 0; 0; 64; 10000
16; 1; 2; 0; -1; 0; 0; 0; ...; 1; 0; 64; 10000
...

Here the first variable is the ID of the response in the entire dataset, Interviewee_ID is the ID of the respondent, and Choice Set is the group of alternatives proposed in each choice set, from which the best and the worst are selected. Variables A1 to A24 are the different items: each takes the value 1 if the alternative is in the group of possible BEST solutions and -1 if it is in the WORST group. For each set there are 4 BEST and 4 WORST alternatives, and in the dataset the BEST rows must appear first, then the WORST rows. CHOICE indicates the choice made by the interviewee (1 marks the line that describes the interviewee's choice), followed by AGE and INCOME. The example shows only 2 choice sets for the first interviewee.

Is it possible to carry out this transformation of the dataset in Stata?

Many thanks in advance

Federico
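
For readers who want to try this in Stata, the row skeleton of such a layout can be sketched with -expand-. This is only a partial sketch under assumptions: it starts from long data like the output of -reshape long- above, the two data rows are made up, and the names row_in_set, block and resp_id are hypothetical. It does not fill A1 to A24 or CHOICE, because the mapping from each choice set to the 24 items is not given in this thread.

```stata
* Hypothetical long data: one row per interviewee and choice set
clear
input byte ID byte question_num str1 OPTB str1 OPTW int AGE int INCOME
1 1 "d" "e" 65 10000
1 2 "a" "b" 65 10000
end

* 8 rows per choice set, as described: 4 BEST rows followed by 4 WORST rows
expand 8
bysort ID question_num: gen byte row_in_set = _n

* block = 1 for the BEST rows, -1 for the WORST rows (hypothetical variable)
gen byte block = cond(row_in_set <= 4, 1, -1)

* Running response ID over the whole dataset, BEST rows first within each set
sort ID question_num row_in_set
gen long resp_id = _n
list resp_id ID question_num row_in_set block AGE INCOME, sepby(ID question_num)
```

The item indicators and CHOICE would then have to be merged in from a design table that lists, for each macro group and choice set, which item sits in which row.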
Federico DellAnna

Join Date: Aug 2020

Posts: 7
#5

06 Nov 2020, 06:24

Thanks to all. I solved the problem using SPSS.