Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help gen a variable which counts qualifications (translating SPSS to Stata syntax)

    Hello all , I am attempting to manually translate some SPSS syntax into Stata syntax. It is a fairly large dataset and I am stuck on a section which does not seem to translate well. I would be really interested (and grateful) to see how the Stata community may approach this. The SPSS syntax generates some new variables which adds up the number of qualifications a student achieves. The syntax also creates other variables that counts how many qualifications an individual has been awarded at different levels. I have been attempting to use egen, rowfirst, _n, _N, count, gen and bysort commands with little success on Stata 16. I have posted the syntax in SPSS below for anyone that is bi-lingual. I would really appreciate any advice given. Best Wishes, Matthew

    Code:
     
     AGGREGATE   /OUTFILE='H:\Data\spss_data\aggr1.sav'   /BREAK=attainment_year uniqueid new_qualifications   /quallevel_first=FIRST(quallevel)    /qualresult1_first=FIRST(qualresult1)    / qualcode_first=FIRST(qualcode)    /centre_first_1=FIRST(centre)   /entrystage_first=FIRST(entrystage)   / subject_group_1=FIRST(subject_group)  /sqalevel_first = FIRST (sqalevel)  /sqalevel_1_first = FIRST (sqalevel_1). EXECUTE.
    Code:
       
     AGGREGATE   /OUTFILE=* MODE=ADDVARIABLES   /BREAK= attainment_year uniqueid   /N_BREAK=N.  Variable  LABELS  N_Break 'number of SGs and level 3-4-5 qualifications'. execute.
    Code:
       
     AGGREGATE   /OUTFILE=* MODE=ADDVARIABLES   /BREAK= attainment_year uniqueid sqalevel_first   /N_BREAK2=N. Variable  LABELS  N_Break2 'number qualifications pass at least some '. execute.  compute number_level3=0. if sqalevel_first=3 number_level3=N_BREAK2. execute.  compute number_level4=0. if sqalevel_first=4 number_level4=N_BREAK2. execute.  compute number_level5=0. if sqalevel_first=5 number_level5=N_BREAK2. execute.   AGGREGATE   /OUTFILE=* MODE=ADDVARIABLES   /BREAK=attainment_year uniqueid    /number_level3_max=MAX(number_level3)    /number_level5_max=MAX(number_level5)    /number_level4_max=Max(number_level4).

  • #2
    There are some people here who know SPSS well. Otherwise there are, I think, many more people who know Stata well and if no-one from the first group answers I suggest

    1. Posting a Stata data example of real or realistic data with variable names as used in Stata.

    2. Giving rules and examples of results.

    3. Showing the code you tried in Stata and explaining what went wrong.

    Comment


    • #3
      Thanks for your quick response and for all your hard work in the Statalist community Nick. I have posted some realistic data below.

      Code:
      uniqueid    attainment_year    school_code    qualcode    quallevel    qualresult    subject
      100000      2012                84848595       102020           4            1        Maths
      100000      2012                84848595       102030           4            4        English
      100000      2012                84848595       102040           4            3        Science
      100001      2013                94567577       254565           5            2        History
      100001      2013                94567577       232425           5            5        Geography
      What I am trying to do
      Generate various variables that counts the total number of qualifications, the number of qualification at different levels for each individual and again per subject grouping. For example, a variable number_level3 that counts the number of qualifications each individual achieves at quallevel 3.


      Problem
      I have attempted to translate the SPSS syntax as closely as possible with little success. I have tried using the COLLAPSE command to mimic the function of AGGREGATE with no success. Please find below some examples of the Stata syntax I have been using.

      Code:
       egen quallevel_first = rowfirst(quallevel)
      Code:
       bysort attainment_year uniqueid: gen N_BREAK = _N
      Code:
       bysort attainment_year uniqueid quallevel_first:    gen N_BREAKS2 = _N    gen number_level3 = 0 if quallevel_first = 3 & number_level3 = N_BREAK2
      I would really appreciate a fresh take on this if anyone has a spare few minutes. 🤝
      Last edited by Matthew Sutherland; 18 Feb 2022, 05:55.

      Comment


      • #4
        Matt,
        I tried to replicate your SPSS syntax (#1) but there are many other undefined variables in your small sample data (#3). I couldn't find, for example, 'qualifications', 'qualresults1', 'centre', etc..
        As Nick mentioned above, other Stata users will help you if you post a more broader sample and explain your goal (step-by-step if possible).
        Also, I am confused what is your end goal in this data set. The first SPSS syntax (#1) creates a collapse file but the second and third script generates new variables ('collapse' and 'egen' commands, respectively, in Stata).So, do you want to creates new file or create new variable in current data set? or both?
        C
        Last edited by Chul Lee; 18 Feb 2022, 09:15.

        Comment


        • #5
          Following on the advice from both Nick Cox and Chul Lee, I was able to deduce only one objective from post #3:

          For example, a variable number_level3 that counts the number of qualifications each individual achieves at quallevel 3.
          Your data wasn't sufficient to create a substantial example, so I invented some data of my own and present it using the output of the dataex command, as the Statalist FAQ recommends in its advice on effectively posing your questions, posting data, and sharing Stata output.
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input long uniqueid int quallevel
          200000 2
          200000 3
          200000 3
          200000 3
          200000 4
          200000 5
          200001 1
          200001 1
          200001 4
          200001 5
          200002 2
          200002 3
          200002 4
          200002 4
          200002 5
          200003 1
          200003 1
          200003 2
          200003 4
          200003 4
          200003 5
          200004 1
          200004 2
          200004 2
          200004 4
          200004 5
          end
          generate int number_level = 1
          collapse (sum) number_level, by(uniqueid  quallevel)
          list, abbreviate(16) sepby(uniqueid)
          reshape wide number_level, i(uniqueid) j(quallevel)
          mvencode number_level*, mv(. = 0)
          list, noobs abbreviate(16)
          Code:
          . generate int number_level = 1
          
          . collapse (sum) number_level, by(uniqueid  quallevel)
          
          . list, abbreviate(16) sepby(uniqueid)
          
               +-------------------------------------+
               | uniqueid   quallevel   number_level |
               |-------------------------------------|
            1. |   200000           2              1 |
            2. |   200000           3              3 |
            3. |   200000           4              1 |
            4. |   200000           5              1 |
               |-------------------------------------|
            5. |   200001           1              2 |
            6. |   200001           4              1 |
            7. |   200001           5              1 |
               |-------------------------------------|
            8. |   200002           2              1 |
            9. |   200002           3              1 |
           10. |   200002           4              2 |
           11. |   200002           5              1 |
               |-------------------------------------|
           12. |   200003           1              2 |
           13. |   200003           2              1 |
           14. |   200003           4              2 |
           15. |   200003           5              1 |
               |-------------------------------------|
           16. |   200004           1              1 |
           17. |   200004           2              2 |
           18. |   200004           4              1 |
           19. |   200004           5              1 |
               +-------------------------------------+
          
          . reshape wide number_level, i(uniqueid) j(quallevel)
          (j = 1 2 3 4 5)
          
          Data                               Long   ->   Wide
          -----------------------------------------------------------------------------
          Number of observations               19   ->   5          
          Number of variables                   3   ->   6          
          j variable (5 values)         quallevel   ->   (dropped)
          xij variables:
                                     number_level   ->   number_level1 number_level2 ... number_level5
          -----------------------------------------------------------------------------
          
          . mvencode number_level*, mv(. = 0)
          number_lev~1: 2 missing values recoded
          number_lev~2: 1 missing value recoded
          number_lev~3: 3 missing values recoded
          
          . list, noobs abbreviate(16)
          
            +------------------------------------------------------------------------------------------+
            | uniqueid   number_level1   number_level2   number_level3   number_level4   number_level5 |
            |------------------------------------------------------------------------------------------|
            |   200000               0               1               3               1               1 |
            |   200001               2               0               0               1               1 |
            |   200002               0               1               1               2               1 |
            |   200003               2               1               0               2               1 |
            |   200004               1               2               0               1               1 |
            +------------------------------------------------------------------------------------------+
          
          .

          Comment


          • #6
            Thanks you so much for taking the time to help Chun and William. I really appreciate it. Sorry about the poor attempt at example data. I have updated the example dataset below using dataex which should be more fitting for this forum.

            William. Thank you so much for the above code. I have been struggling in how best to translate the spss AGGREGATE command to Stata and your code has been really helpful.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str9 uniqueid str18 centre str19(attainment_year new_qualifications qual_level) str15 entry_stage str12(qual_result subject_code) str16 Subject_Group float(sqalevel sqa_level1)
            "100000" "#1234" "2011" "400010" "Standard Grade" "4" "1" "4010"   "Maths"            4 .
            "100000" "#1234" "2011" "400020" "Standard Grade" "4" "1" "4020"   "Science"          4 .
            "100000" "#1234" "2011" "400030" "Standard Grade" "4" "1" "4030"   "Modern Languages" 4 .
            "100000" "#1234" "2011" "400040" "Standard Grade" "4" "1" "4040"   "Vocational"       4 .
            "100001" "#1234" "2011" "400050" "Standard Grade" "4" "2" "4050"   "Technical"        4 .
            "100001" "#1234" "2011" "400060" "Standard Grade" "4" "1" "4060"   "Modern Languages" 4 .
            "100001" "#1234" "2011" "400070" "Standard Grade" "4" "2" "4070"   "Science"          4 .
            "100001" "#1234" "2012" "600001" "Higher"         "5" "3" "6001"   "Modern Languages" 5 .
            "100001" "#1234" "2012" "600020" "Higher"         "5" "4" "6020"   "Vocational"       5 .
            "100001" "#1234" "2012" "600030" "Higher"         "6" "5" "6030"   "Technical"        5 .
            "100002" "#5678" "2014" "204012" "National 5"     "6" "5" "200412" "Maths"            . 4
            "100002" "#5678" "2014" "204010" "National 5"     "6" "7" "200010" "Science"          . 4
            "100002" "#5678" "2015" "600056" "Higher"         "5" "3" "600056" "Modern Languages" . 5
            "100002" "#5678" "2016" "600434" "Higher"         "6" "2" "600434" "Vocational"       . 5
            "100003" "#7890" "2013" "102435" "National 4"     "4" "3" "102435" "Technical"        . 3
            "100003" "#7890" "2012" "103456" "National 4"     "4" "1" "103456" "Maths"            . 3
            "100003" "#7890" "2012" "103454" "National 4"     "4" "1" "103454" "Science"          . 3
            "100003" "#7890" "2013" "201342" "National 5"     "5" "1" "201342" "Modern Languages" . 4
            "100003" "#7890" "2013" "201234" "National 5"     "5" "1" "201234" "Vocational"       . 4
            "100004" "#1234" "2013" "201465" "National 5"     "4" "2" "201465" "Technical"        . 4
            "100004" "#1234" "2013" "201254" "National 5"     "4" "2" "201254" "Maths"            . 4
            "100004" "#1234" "2013" "201465" "National 5"     "4" "2" "201465" "Science"          . 4
            "100004" "#1234" "2013" "201345" "National 5"     "4" "3" "201345" "Modern Languages" . 4
            "100004" "#1234" "2013" "201364" "National 5"     "4" "2" "201364" "Vocational"       . 4
            "100004" "#1234" "2013" "201564" "National 5"     "4" "1" "201564" "Technical"        . 4
            "100004" "#1234" "2014" "602345" "Higher"         "5" "3" "602345" "Maths"            5 .
            "100004" "#1234" "2014" "603456" "Higher"         "5" "3" "603456" "Science"          5 .
            "100004" "#1234" "2014" "603256" "Higher"         "5" "3" "603256" "Modern Languages" 5 .
            "100004" "#1234" "2014" "602365" "Higher"         "5" "3" "602365" "Vocational"       5 .
            "100004" "#1234" "2015" "603456" "Higher"         "6" "1" "603456" "Technical"        5 .
            end

            End goal

            The data will be used to examine whether there are any trends in the number and configuration of subject choices and subsequent attainment based on individual, school and family characteristics.


            Objectives
            • Generate a new variable numberlevel_4 that counts the total number of qualifications each pupil achieves at sqalevel 4 or sqa_level_1 4. A pass is defined as qualresult >=1 or < 7.
            • Generate a new variable that counts qualifications within each subject group.For example, generate a variable science_2 to explore whether or not a student studied two science subjects.
            Hopefully that is a little clearer for everyone. I would appreciate any comments/help with the two objectives. I hope you are all having a great weekend,

            Comment


            • #7
              Perhaps this will start you in a useful direction.
              Code:
              destring uniqueid attainment_year new_qualifications ///
                  entry_stage qual_result subject_code, replace
              destring centre, ignore("#") replace
              
              bysort uniqueid: egen numberlevel_4 =  ///
                  total( qual_result>=1 & qual_result<7 & (sqalevel==4 | sqa_level1==4) )
              
              // if a subject is repeated only count once, the most recent 
              bysort uniqueid (Subject_Group subject_code attainment_year): egen science_count = ///
                  total( Subject_Group=="Science" & subject_code!=subject_code[_n+1] ) 
              
              sort uniqueid attainment_year Subject_Group subject_code
              list uniqueid attainment_year qual_result sqalevel sqa_level1 numberlevel_4 ///
                  Subject_Group subject_code science_count, noobs sepby(uniqueid)
              Code:
              . list uniqueid attainment_year qual_result sqalevel sqa_level1 numberlevel_4 ///
              >     Subject_Group subject_code science_count, noobs sepby(uniqueid)
              
                +----------------------------------------------------------------------------------------------------------+
                | uniqueid   attain~r   qual_r~t   sqalevel   sqa_le~1   number~4      Subject_Group   subjec~e   scienc~t |
                |----------------------------------------------------------------------------------------------------------|
                |   100000       2011          1          4          .          4              Maths       4010          1 |
                |   100000       2011          1          4          .          4   Modern Languages       4030          1 |
                |   100000       2011          1          4          .          4            Science       4020          1 |
                |   100000       2011          1          4          .          4         Vocational       4040          1 |
                |----------------------------------------------------------------------------------------------------------|
                |   100001       2011          1          4          .          3   Modern Languages       4060          1 |
                |   100001       2011          2          4          .          3            Science       4070          1 |
                |   100001       2011          2          4          .          3          Technical       4050          1 |
                |   100001       2012          3          5          .          3   Modern Languages       6001          1 |
                |   100001       2012          5          5          .          3          Technical       6030          1 |
                |   100001       2012          4          5          .          3         Vocational       6020          1 |
                |----------------------------------------------------------------------------------------------------------|
                |   100002       2014          5          .          4          1              Maths     200412          1 |
                |   100002       2014          7          .          4          1            Science     200010          1 |
                |   100002       2015          3          .          5          1   Modern Languages     600056          1 |
                |   100002       2016          2          .          5          1         Vocational     600434          1 |
                |----------------------------------------------------------------------------------------------------------|
                |   100003       2012          1          .          3          2              Maths     103456          1 |
                |   100003       2012          1          .          3          2            Science     103454          1 |
                |   100003       2013          1          .          4          2   Modern Languages     201342          1 |
                |   100003       2013          3          .          3          2          Technical     102435          1 |
                |   100003       2013          1          .          4          2         Vocational     201234          1 |
                |----------------------------------------------------------------------------------------------------------|
                |   100004       2013          2          .          4          6              Maths     201254          2 |
                |   100004       2013          3          .          4          6   Modern Languages     201345          2 |
                |   100004       2013          2          .          4          6            Science     201465          2 |
                |   100004       2013          2          .          4          6          Technical     201465          2 |
                |   100004       2013          1          .          4          6          Technical     201564          2 |
                |   100004       2013          2          .          4          6         Vocational     201364          2 |
                |   100004       2014          3          5          .          6              Maths     602345          2 |
                |   100004       2014          3          5          .          6   Modern Languages     603256          2 |
                |   100004       2014          3          5          .          6            Science     603456          2 |
                |   100004       2014          3          5          .          6         Vocational     602365          2 |
                |   100004       2015          1          5          .          6          Technical     603456          2 |
                +----------------------------------------------------------------------------------------------------------+

              Comment


              • #8
                A very useful direction, thank you William! Much appreciated.

                Comment

                Working...
                X