Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping data with multiple obs per id

    Dear Stata users,

    I have the following panel data set covering all 50 states over 13 years.

    Code:
    . describe
      obs:       452,755                          
     vars:             4                          3 Mar 2020 23:39
     size:     7,244,080                          
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    year                   long    %12.0g                CCES year
    state                  long    %12.0g     state      State (FIPS)
    st                     long    %12.0g     st         State abbreviation (FIPS)
    approval_gov           long    %12.0g     approval_gov  Governor approval
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Sorted by: 
    
    
    
    . tab approval_gov,m
    
                   Governor approval |      Freq.     Percent        Cum.
    ---------------------------------+-----------------------------------
                    Strongly Approve |     67,404       14.89       14.89
          Approve / Somewhat Approve |    139,940       30.91       45.80
    Disapprove / Somewhat Disapprove |     84,271       18.61       64.41
                 Strongly Disapprove |    117,282       25.90       90.31
              Never Heard / Not Sure |     40,318        8.91       99.22
      Neither Approve Nor Disapprove |      1,414        0.31       99.53
                                   . |      2,126        0.47      100.00
    ---------------------------------+-----------------------------------
                               Total |    452,755      100.00
    
    . tab approval_gov,m nolab
    
       Governor |
       approval |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |     67,404       14.89       14.89
              2 |    139,940       30.91       45.80
              3 |     84,271       18.61       64.41
              4 |    117,282       25.90       90.31
              5 |     40,318        8.91       99.22
              6 |      1,414        0.31       99.53
              . |      2,126        0.47      100.00
    ------------+-----------------------------------
          Total |    452,755      100.00
    
    
    
    .    tab year
    
        CCES year    Freq.    Percent    Cum.
                    
        2006    36,421    8.04    8.04
        2007    9,999    2.21    10.25
        2008    32,800    7.24    17.50
        2009    13,800    3.05    20.55
        2010    55,400    12.24    32.78
        2011    20,150    4.45    37.23
        2012    54,535    12.05    49.28
        2013    16,400    3.62    52.90
        2014    56,200    12.41    65.31
        2015    14,250    3.15    68.46
        2016    64,600    14.27    82.73
        2017    18,200    4.02    86.75
        2018    60,000    13.25    100.00
                    
        Total    452,755    100.00
    The dataset is 4 variable wide and 452,755 observations long because each state year contains individual responses to a survey question about gubernatorial approval. Thus, there are 495 observations for Alabama in 2006, 131 obs for Alaska in 2006 etc. I would like to reshape the dataset and create 6 variables " Strongly_Approve", "Approve", "Disapprove", "Strongly_Disapprove", "Never_Heard", "Neither_Approve_Nor_Disapprove" instead of "approval_gov". Each of these 6 new variables would count the number of occurrences of each type of response in a given state year so that my dataset is 650 observations long (50 states by 13 years). There are about 2,000 missing values in approval_gov, which can be ignored/dropped. I tried to use `reshape`, but it does not seem to be able to give me what I want. I would be grateful for any help.

  • #2
    At this very moment, I cannot test if the code works in Stata but you can try something like:

    Code:
    bysort state year: tabulate approval_gov, gen(newvar)
    Then, just - rename - the variables accordingly.

    Hopefully it works.
    Best regards,

    Marcos

    Comment


    • #3
      Dear Marcos,

      Thank you for the advice! I've tested the code and I get the following error:


      Code:
      . bysort state year: tabulate approval_gov, gen(newvar)
      
      ----------------------------------------------------------------------------------
      -> state = Alabama, year = 2006
      request may not be combined with by
      r(190);
      
      end of do-file
      
      r(190);
      And, of course, if i get rid of bysort, there is no error but it gives me wrong counts...

      Comment


      • #4
        I tried to use `reshape`, but it does not seem to be able to give me what I want.
        Perhaps if you were to tell us the reshape command you issued and what it gave you, we could tell you how to change the command to get what you want. It is a lot easier to solve a problem by starting from a partial solution than to invent an entire solution from scratch, as you are asking us to do.

        In post #3 you did exactly what you should have done in post #1: copy the command and it's output from Stata's Results window and paste it into a code block in the forum, accompanying it if necessary with enough commentary to explain how you got differed from what you wanted. (Of course, if reshape quit with an error message, that doesn't take much explaining!).

        Comment


        • #5
          Dear William,

          Thank you for your reply. Here is the reshape code I have used, but I think it won't do the job because i need the sums of all occurrence of each value of approval_gov per state-year, whereas the `reshape` function only changes the structure of the data. This is why I assumed it would not be useful to post the code.

          Code:
          . reshape wide approval_gov, i(state) j(year)
          (note: j = 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018)
          values of variable year not unique within state
              Your data are currently long.  You are performing a reshape wide.  You
              specified i(state) and j(year).  There are observations within i(state) with
              the same value of j(year).  In the long data, variables i() and j() together
              must uniquely identify the observations.
          
                   long                                wide
                  +---------------+                   +------------------+
                  | i   j   a   b |                   | i   a1 a2  b1 b2 |
                  |---------------| <--- reshape ---> |------------------|
                  | 1   1   1   2 |                   | 1   1   3   2  4 |
                  | 1   2   3   4 |                   | 2   5   7   6  8 |
                  | 2   1   5   6 |                   +------------------+
                  | 2   2   7   8 |
                  +---------------+
              Type reshape error for a list of the problem variables.
          r(9);

          Comment


          • #6
            Well, I started down the reshape path, which took me far enough to understand what Marcos recommended in post #2.
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float(year state approval_gov)
            2006 1 3
            2006 1 3
            2006 1 2
            2006 1 1
            2006 1 .
            2006 2 3
            2006 2 1
            2006 2 3
            2006 2 5
            2006 2 2
            2007 1 5
            2007 1 3
            2007 1 .
            2007 1 4
            2007 1 1
            2007 2 .
            2007 2 6
            2007 2 6
            2007 2 4
            2007 2 1
            end
            
            tabulate approval_gov, generate(ag)
            list in 1/5
            collapse (sum) ag1-ag6, by(year state)
            rename (ag1-ag6) (SA A D SD NH NAOD)
            list, clean noobs
            Code:
            . list in 1/5
            
                 +-------------------------------------------------------------+
                 | year   state   approv~v   ag1   ag2   ag3   ag4   ag5   ag6 |
                 |-------------------------------------------------------------|
              1. | 2006       1          3     0     0     1     0     0     0 |
              2. | 2006       1          3     0     0     1     0     0     0 |
              3. | 2006       1          2     0     1     0     0     0     0 |
              4. | 2006       1          1     1     0     0     0     0     0 |
              5. | 2006       1          .     .     .     .     .     .     . |
                 +-------------------------------------------------------------+
            
            . collapse (sum) ag1-ag6, by(year state)
            
            . rename (ag1-ag6) (SA A D SD NH NAOD)
            
            . list, clean noobs
            
                year   state   SA   A   D   SD   NH   NAOD  
                2006       1    1   1   2    0    0      0  
                2006       2    1   1   2    0    1      0  
                2007       1    1   0   1    1    1      0  
                2007       2    1   0   0    1    0      2

            Comment


            • #7
              William,

              I tested the code with my dataset and it gives me exactly what I needed! Thank you so much for your time and this elegant solution!

              Comment


              • #8
                I have a merged data set for mutual funds and etfs. I want to create a bar graph to compare the number of mutual funds and etfs over the sample period (2010-2019). I created a dummy variable that distinguished between mutual funds and etfs (mutual == 1 if the observation is a mutual fund). How would I use this dummy variable to create a comparative bar graph of the number of mutual funds and etfs over the sample period?

                Comment


                • #9
                  Please start a new thread on this new matter.

                  Also, please take a at FAQ, particularly the recommendations about uses - dataex - or code delimiters to share a fraction of data.
                  Best regards,

                  Marcos

                  Comment

                  Working...
                  X