Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping based on suffix not prefix

    Hey y'all! Apologies in advance if this has been asked, I ran a search and didn't find anything but please feel free to point me in the right direction if needed.

    tl;dr: I have data in wide format that I need to reshape into long. The variable names vary by prefix and not by suffix. AFAIK, reshape appears to require static prefixes for the stub and variable suffixes for j. Is there a way to run reshape without having to relabel all my variables?

    Longer explanation:

    I have data from a randomized vignette survey. Participants answered the same questions about 9 out of 54 possible vignettes. The dataset labels the variables by vignette and then by content, so my variables look like "A1impact A1intent B1impact B1intent A2impact A2intent B2impact B2intent." I would like to reshape from wide to long where "intent" and "impact" are my new variables with rows for A1, A2, B1 and B2. Is this possible? I know my i is PID (participant ID) and my instinct is to make j something like *impact. The closest option I found to that would be @impact but I've run into errors with all of those approaches. Any advice?


    Details:
    I get the following error when I try @@@outcome or *outcome or ???outcome

    Code:
    . reshape long @@@outcome @@@convince @@@intent @@@impact @@@difftreat @@@power @@@realistic @@@wrong, i(PID) j(vignette)
    no xij variables found
        You typed something like reshape wide a b, i(i) j(j).
        reshape looked for existing variables named a# and b# but could not find any.  Remember this picture:
    
             long                                wide
            +---------------+                   +------------------+
            | i   j   a   b |                   | i   a1 a2  b1 b2 |
            |---------------| <--- reshape ---> |------------------|
            | 1   1   1   2 |                   | 1   1   3   2  4 |
            | 1   2   3   4 |                   | 2   5   7   6  8 |
            | 2   1   5   6 |                   +------------------+
            | 2   2   7   8 |
            +---------------+
    
            long to wide: reshape wide a b, i(i) j(j)    (j existing variable)
            wide to long: reshape long a b, i(i) j(j)    (j    new   variable)
    I get the following error when I try @outcome

    Code:
    . reshape long @outcome @convince @intent @impact @difftreat @power @realistic @wrong, i(PID) j(vignette)
    variable vignette contains all missing values
    r(498);
    When I try to drop vignette it tells me the variable cannot be found.



    Existing Data Structure
    PID S1Aimpact S1Aintent S1Bimpact S1Bintent S2Aimpact S2Aintent S2Bimpact S2Bintent R1Aimpact R1Aintent
    1 90 50 100 100
    2 75 100 20 20 90 100
    3 50 75 20 0 100 50
    4 20 100 90 0
    Ideal Data Structure
    PID impact intent vignette
    1 90 50 S1A
    1 100 100 S2A
    2 75 100 S1B
    2 20 20 S2A
    2 90 100 R1A
    3 50 75 S1A
    3 20 0 S2B
    3 100 50 R1A
    4 20 100 S1A
    4 90 0 S2B
    Other helpful info: I'm running v16.1 on a Windows 10 machine.

    ETA: Unfortunately the data is confidential and cannot be shared. Please let me know if I can make the example data more helpful! All of the data is numerical with associated labels as appropriate.
    Last edited by Eva Warren; 02 Mar 2022, 13:08.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(pid s1aimpact s1aintent s1bimpact s1bintent s2aimpact s2aintent s2bimpact s2bintent r1aimpact r1aintent)
    1 90  50  .   . 100 100  . .   .   .
    2  .   . 75 100  20  20  . .  90 100
    3 50  75  .   .   .   . 20 0 100  50
    4 20 100  .   .   .   . 90 0   .   .
    end
    
    reshape long @impact @intent, i(pid) j(vignette) string
    drop if missing(impact) & missing(intent)
    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(pid s1aimpact s1aintent s1bimpact s1bintent s2aimpact s2aintent s2bimpact s2bintent r1aimpact r1aintent)
      1 90  50  .   . 100 100  . .   .   .
      2  .   . 75 100  20  20  . .  90 100
      3 50  75  .   .   .   . 20 0 100  50
      4 20 100  .   .   .   . 90 0   .   .
      end
      reshape long @impact @intent, i(pid) j(vignette) string
      drop if impact==. & intent==.
      list, clean
      Code:
      . reshape long @impact @intent, i(pid) j(vignette) string
      (j = r1a s1a s1b s2a s2b)
      
      Data                               Wide   ->   Long
      -----------------------------------------------------------------------------
      Number of observations                4   ->   20          
      Number of variables                  11   ->   4          
      j variable (5 values)                     ->   vignette
      xij variables:
            r1aimpact s1aimpact ... s2bimpact   ->   impact
            r1aintent s1aintent ... s2bintent   ->   intent
      -----------------------------------------------------------------------------
      
      . drop if impact==. & intent==.
      (10 observations deleted)
      
      . list, clean
      
             pid   vignette   impact   intent  
        1.     1        s1a       90       50  
        2.     1        s2a      100      100  
        3.     2        r1a       90      100  
        4.     2        s1b       75      100  
        5.     2        s2a       20       20  
        6.     3        r1a      100       50  
        7.     3        s1a       50       75  
        8.     3        s2b       20        0  
        9.     4        s1a       20      100  
       10.     4        s2b       90        0  
      
      .
      Added in edit: Clyde and I have simultaneously posted similar answers in the past, but never before have we posted identical code.
      Last edited by William Lisowski; 02 Mar 2022, 13:17.

      Comment


      • #4
        AMAZING! Thank you so much! Quick question about -dataex-: I went the table route because our data is confidential and can't be shared. In future similar cases, should I just create a mock .dta sample and then use -dataex-?

        Comment


        • #5
          Exactly. This is explained in the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

          Comment


          • #6
            In future similar cases, should I just create a mock .dta sample and then use -dataex-?
            YES!!! Just be sure that the mock example exhibits the same problems/questions as the original data.

            Comment


            • #7
              Will do! Thank you both for your help. You've save me many hours of frustration.

              Comment

              Working...
              X