  • Reshaping to long

    Dear all,
    thank you for the valuable advice you give on this forum to Stata newbies like me.
    As part of a research project I am carrying out with my group, we have developed a questionnaire based on the best-worst scaling approach. In an online survey, each interviewee is asked to choose, from a set of alternatives, the best one and the worst one (one choice for the best, one choice for the worst).
    BEST WORST
    OPT1a [_] [_]
    OPT1b [X] [_]
    OPT1c [_] [_]
    OPT1d [_] [X]
    OPT1e [_] [_]

    In this example, OPT1b is the best and OPT1d is the worst.
    Each interviewee completes this exercise 5 times. We built the survey with LimeSurvey.
    The software exports the results horizontally, one row per interviewee, so each interviewee's results take the form:
    ID | OPTB1 | OPTW1 | OPTB2 | OPTW2 | OPTB3 | OPTW3 | OPTB4 | OPTW4 | OPTB5 | OPTW5 | AGE | INCOME
    where ID identifies the interviewee, OPTB1 is the "best" option chosen by the interviewee in the first group and OPTW1 is the "worst" option chosen in the first group, OPTB2 and OPTW2 refer to the second group, and so on up to 5; AGE and INCOME are the socio-economic variables for each interviewee.
    To carry out the analyses, we need the data in a vertical (long) format, with 5 rows for each interviewee:
    ID | OPTB1 | OPTW1 | AGE | INCOME
    ID | OPTB2 | OPTW2 | AGE | INCOME
    ID | OPTB3 | OPTW3 | AGE | INCOME
    ID | OPTB4 | OPTW4 | AGE | INCOME
    ID | OPTB5 | OPTW5 | AGE | INCOME

    The groups of questions are preset. There are 20 macro groups (randomized blocks), each containing 5 different groups, and LimeSurvey automatically assigns one macro group to each interviewee so that the dataset is balanced. The number of the macro group assigned is available for each interviewee. I don't know whether this last piece of information is useful.

    Thank you very much for your support.
    I hope you can help me.

    Federico

  • #2

    This appears to require a standard -reshape long-. Note the example data and the use of -dataex-, as recommended in the Statalist FAQ for new members. Because I created the example data from your description rather than from a concrete example, my suggestion below may not be exactly what you want.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte ID byte macgroup str1 (OPTB1 OPTW1 OPTB2 OPTW2 OPTB3 OPTW3 OPTB4 OPTW4 OPTB5 OPTW5) int AGE int INCOME
    1 5 "d" "e" "a" "b" "b" "c" "e" "a" "d" "e" 65 10000
    2 10 "b" "c" "a" "b" "b" "c" "a" "b" "d" "e" 49 20000
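    3 15 "d" "e" "a" "b" "c" "d" "a" "b" "c" "d" 52 30000
    end
    * Wide layout: one row per interviewee
    list
    * Stack OPTB1-OPTB5 and OPTW1-OPTW5 into OPTB and OPTW; the suffix 1-5 goes into question_num
    reshape long OPTB OPTW, i(ID) j(question_num)
    sort ID question_num
    * Long layout: 5 rows per interviewee, with macgroup, AGE and INCOME repeated on each row
    list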



    • #3
      Dear Mike,
      many thanks for your kind reply.
      FD



      • #4
        I have another question: I also need to convert the dataset into a different format.
        Starting from the data vector recorded for each interviewee, I need to obtain a dataset like this:
        ID; Interviewee_ID; Choice Set; A1; A2; A3; A4; A5; ...; A24; CHOICE; AGE; INCOME
        1; 1; 1; 0; 1; 0; 0; 0; ...; 0; 0; 64; 10000
        2; 1; 1; 1; 0; 0; 0; 0; ...; 0; 1; 64; 10000
        3; 1; 1; 0; 0; 0; 0; 1; ...; 0; 0; 64; 10000
        4; 1; 1; 0; 0; 0; 0; 0; ...; 1; 0; 64; 10000
        5; 1; 1; 0; 0; -1; 0; 0; ...; 0; 0; 64; 10000
        6; 1; 1; 0; -1; 0; 0; 0; ...; 0; 0; 64; 10000
        7; 1; 1; 0; 0; 0; 0; -1; ...; 0; 1; 64; 10000
        8; 1; 1; 0; 0; 0; 0; -1; ...; 1; 0; 64; 10000
        9; 1; 2; 0; 0; 0; 1; 0; ...; 0; 1; 64; 10000
        10; 1; 2; 0; 1; 0; 0; 0; ...; 0; 0; 64; 10000
        11; 1; 2; 0; 0; 0; 0; 1; ...; 0; 0; 64; 10000
        12; 1; 2; 0; 0; 0; 0; 0; ...; 1; 0; 64; 10000
        13; 1; 2; 0; 0; 0; 0; -1; ...; 0; 0; 64; 10000
        14; 1; 2; 0; 0; 0; -1; 0; ...; 0; 1; 64; 10000
        15; 1; 2; 0; 0; 0; 0; -1; ...; 0; 0; 64; 10000
        16; 1; 2; 0; -1; 0; 0; 0; ...; 1; 0; 64; 10000
        ...

        Here the first variable (ID) identifies the row in the entire dataset; Interviewee_ID identifies the respondent; Choice Set identifies the group of alternatives proposed in each choice set, from which the best and the worst are selected; variables A1 to A24 correspond to the different items and take the value 1 if the alternative is in the group of possible BEST choices, -1 if it is in the group of possible WORST choices, and 0 otherwise (each set has 4 BEST and 4 WORST alternatives, and in the dataset the BEST rows must appear first, followed by the WORST rows); CHOICE indicates the choice made by the interviewee (it equals 1 on the row that records the interviewee's choice); AGE and INCOME follow. The example shows only 2 choice sets for the first interviewee.
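        A possible Stata sketch of this transformation, under assumptions that the thread does not confirm: (1) the preset design is available as a separate file with one row per item shown in each choice set of each macro group (hypothetical variables macgroup, question_num, item); (2) each choice set shows 4 of the 24 items, and each shown item gets one BEST row (+1) and one WORST row (-1); (3) the chosen best and worst options are stored as item numbers 1 to 24 (hypothetical variables best and worst), not as letters as in the example data in #2. All data values are made up for illustration. If the WORST rows are meant to list different items than the BEST rows, the design file would need an extra column for that.

```stata
* Sketch only: hypothetical design and response data (values made up).
* Design file: one row per item shown in each choice set of each macro group.
clear
input byte macgroup byte question_num byte item
5 1 1
5 1 2
5 1 5
5 1 24
5 2 2
5 2 4
5 2 5
5 2 24
end
tempfile design
save "`design'"

* Responses in long form (as after -reshape long-), with best/worst
* stored as item numbers 1-24 (assumption).
clear
input byte ID byte macgroup byte question_num byte best byte worst int AGE int INCOME
1 5 1 1 5 64 10000
1 5 2 4 2 64 10000
end

* Pair each response with every item shown in its choice set.
joinby macgroup question_num using "`design'"

* Duplicate each item row: first copy becomes the BEST row, second the WORST row.
expand 2
bysort ID question_num item: gen byte side = cond(_n == 1, 1, -1)

* Item indicators: +1 on BEST rows, -1 on WORST rows, 0 for items not on the row.
forvalues k = 1/24 {
    gen byte A`k' = side * (item == `k')
}

* CHOICE = 1 on the BEST row of the chosen best item
* and on the WORST row of the chosen worst item.
gen byte CHOICE = (side == 1 & item == best) | (side == -1 & item == worst)

* Within each choice set, put the BEST rows first, then the WORST rows.
rename ID Interviewee_ID
rename question_num choice_set
gsort Interviewee_ID choice_set -side item
gen long ID = _n

keep ID Interviewee_ID choice_set A1-A24 CHOICE AGE INCOME
order ID Interviewee_ID choice_set A1-A24 CHOICE AGE INCOME
list ID Interviewee_ID choice_set A1 A2 A4 A5 A24 CHOICE, sepby(choice_set)
```

        The key steps are -joinby-, which pairs each response with every item shown in its choice set, and -expand 2-, which creates a BEST copy and a WORST copy of each item row; -gsort- with -side descending puts the BEST rows before the WORST rows in each choice set.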

        Is it possible to carry out this transformation of the dataset in Stata?

        Many thanks in advance

        Federico



        • #5
          Thanks to all. I solved the problem using SPSS.
