Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Issue reshaping panel data from wide to long: data for certain years are missing

    Greetings,

    I'm running Stata 15.1 on OSX and working with panel data. My variables were measured in different months between the years 2013-2018. I've labeled them according to the following format: 'burden_022013', where 022013 denotes February 2013 and so on and so forth. I then ran the following syntax:
    Code:
    reshape long burden_, i(caseid) j(date)
    I receive the following output:
    Code:
    . reshape long burden_, i(caseid) j(time)
    (note: j = 12017 12018 22013 22014 22016 22017 42017 52016 52017 72016 82017 92013 92015 102017 122014 122015)
    (note: burden_12017 not found)
    (note: burden_12018 not found)
    (note: burden_22013 not found)
    (note: burden_22014 not found)
    (note: burden_22016 not found)
    (note: burden_22017 not found)
    (note: burden_42017 not found)
    (note: burden_52016 not found)
    (note: burden_52017 not found)
    (note: burden_72016 not found)
    (note: burden_82017 not found)
    (note: burden_92013 not found)
    (note: burden_92015 not found)
    
    Data                               wide   ->   long
    -----------------------------------------------------------------------------
    Number of obs.                     3929   ->   62864
    Number of variables                  17   ->      16
    j variable (16 values)                    ->   time
    xij variables:
    burden_12017 burden_12018 ... burden_122015->  burden_
    -----------------------------------------------------------------------------
    I then proceed to convert the numeric date variable:
    Code:
    tostring date, replace
    gen month = real(substr(date,1,2)) if strlen(date)==6
    replace month = real(substr(date,1,1)) if strlen(date)==5
    gen year = real(substr(date,3,6)) if strlen(date)==6 
    replace year = real(substr(date,2,5)) if strlen(date)==5
    gen yearmonth=ym(year, month)
    format %tm yearmonth
    When I proceeded to check whether all of the data was reshaped as expected, I noticed that only 3 years are listed:
    Code:
    tab year if burden_!=.
    
           year |      Freq.     Percent        Cum.
    ------------+-----------------------------------
           2014 |      1,346       33.31       33.31
           2015 |      1,236       30.59       63.90
           2017 |      1,459       36.10      100.00
    ------------+-----------------------------------
          Total |      4,041      100.00
    In other words, I'm missing data for 2013, 2016, and 2018. I'm not quite sure what the issue is. Can anybody tell me what's going on here?

    Here are some of the data in wide format:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double caseid long(burden_022013 burden_092013) byte(burden_022016 burden_052016)
    989870 . . . .
    989871 2 . . .
    989872 . . . .
    989873 . . . .
    989874 1 . 1 1
    989875 2 1 1 1
    989876 1 1 1 1
    989877 3 3 2 2
    989878 3 3 3 3
    989879 . . . .
    989880 2 1 1 1
    989881 2 2 2 2
    989882 3 3 3 3
    989883 1 1 1 2
    989884 3 3 3 3
    989885 . . . .
    989886 2 2 2 2
    989887 . . . .
    989888 3 2 3 2
    989889 3 3 3 3
    989890 3 3 . .
    989891 . . . .
    989892 3 3 3 2
    989893 3 2 3 3
    989894 2 2 . .
    989895 . . . .
    989896 2 . . .
    989897 . 2 1 1
    989898 2 1 . .
    989899 3 2 . .
    989900 . . . .
    989901 . 2 . .
    989902 . . . .
    989903 3 3 3 3
    989904 2 2 3 3
    989905 2 3 3 3
    989906 2 2 1 2
    989907 3 3 3 3
    989908 3 3 3 3
    989909 . . . .
    989910 2 1 2 3
    989911 . . 2 3
    989912 3 3 3 3
    989913 3 3 3 3
    989914 2 1 2 2
    989915 . . . .
    989916 . 1 1 1
    989917 . . . .
    989918 2 2 . .
    989919 2 3 . .
    989920 3 3 3 3
    989921 2 2 2 2
    989922 1 1 1 1
    989923 3 3 3 3
    989924 2 3 . .
    989925 3 2 2 3
    989926 . . . .
    989927 1 2 2 2
    989928 2 3 3 3
    989929 1 1 . .
    989930 3 3 3 3
    989931 2 2 1 2
    989932 2 2 3 3
    989933 2 3 . .
    989934 1 1 1 1
    989935 . . . .
    989936 2 2 . .
    989937 3 3 3 3
    989938 . . 3 3
    989939 2 1 1 2
    989940 1 2 1 2
    989941 . . . .
    989942 . . . .
    989943 . . . .
    989944 1 1 2 2
    989945 1 2 1 1
    989946 2 3 2 2
    989947 2 2 2 1
    989948 3 3 3 3
    989949 . . . .
    989950 . . . .
    989951 3 3 3 3
    989952 2 2 . .
    989953 2 3 3 3
    989954 2 2 3 2
    989955 2 2 . .
    989956 1 3 3 3
    989957 . . . .
    989958 2 3 . .
    989959 3 2 . .
    989960 3 3 3 3
    989961 . . . .
    989962 2 2 . .
    989963 . . . .
    989964 . 2 1 1
    989965 3 3 3 3
    989966 3 2 2 2
    989967 . . . .
    989968 2 2 2 1
    989969 3 . 2 .
    end
    label values burden_022013 IMMIG7S15
    label def IMMIG7S15 1 "Refused", modify
    label def IMMIG7S15 2 "very concerned", modify
    label def IMMIG7S15 3 "somewhat concerned", modify
    label values burden_092013 IMMIG7S22
    label def IMMIG7S22 1 "Refused", modify
    label def IMMIG7S22 2 "very concerned", modify
    label def IMMIG7S22 3 "somewhat concerned", modify
    label values burden_022016 IMMIG7SS51
    label def IMMIG7SS51 1 "Refused", modify
    label def IMMIG7SS51 2 "very concerned", modify
    label def IMMIG7SS51 3 "somewhat concerned", modify
    label values burden_052016 IMMIG7SS54
    label def IMMIG7SS54 1 "Refused", modify
    label def IMMIG7SS54 2 "very concerned", modify
    label def IMMIG7SS54 3 "somewhat concerned", modify
    Thank in advance for your help!

  • #2
    I think I figured it out: the suffix for months can't begin with 0 (i.e. burden_012012 --> burden_12012). Is this even mentioned in the manual?

    Comment


    • #3
      The need for the string option with reshape when you have suffixes beginning with 0 is certainly mentioned in

      https://www.stata.com/support/faqs/d...-with-reshape/

      Comment

      Working...
      X