
  • #16
    Thank you! I knew it would be something simple that I was missing. I much appreciate you taking the time to share those tips too.



    • #17
      This was an easy one to overlook. Posting your exact code made it easier to troubleshoot, though.



      • #18
        To keep other variables in my dataset, e.g. postcode parity babynum, do I need to specify them in both the -collapse- and -keep- commands below?

        I've created a local macro VLISTKEEP which I define as a list of the other variables I want to keep:

        Code:
        **define list of variables that need to be split here 
        local VLISTSTRING "procsops obstetriccomplications" // quick for testing
        local VLISTKEEP "plurality gravidity parity babynum placeofbirth preflanguage postcode"
        
        foreach item in `VLISTSTRING'{
            use `original', clear
            *splitting your lists of procedures
            split `item', parse(`=char(10)')
            drop `item'
            keep episodeid babyepisodeid `VLISTKEEP' `item'*
            reshape long `item', i(episodeid babyepisodeid) j(`item'_no)
            drop if `item' ==""
        
            *making your binary variables:
            replace `item' =strtoname(`item')
        
            levelsof `item', local(listofthings)
            foreach thing of local listofthings{
                gen `thing'=0
                replace `thing'=1 if `item'=="`thing'"
            }
            collapse (max) `item'_no `listofthings' `VLISTKEEP', by(episodeid babyepisodeid)
            merge 1:1 episodeid babyepisodeid using `merged', nogenerate
            save `merged', replace
        }
        use `merged', clear
        At the moment I get a "type mismatch" error when Stata reaches the collapse (max) part of the loop:



        ---------------------------------------------------- begin collapse._max ---
        - args y x sortpreserve wt w by
        - if `sortpreserve' {
        = if 0 {
        tempvar sortpreserve
        local sortvars : sortedby
        gen `sortpreserve' = _n
        }
        - tempvar touse
        - quietly {
        - local ty : type `x'
        = local ty : type __00001Y
        - gen byte `touse' = (`x' < .)
        = gen byte __000021 = (__00001Y < .)
        type mismatch
        sort `by' `touse' `x'
        if `"`by'"' != "" {
        by `by': gen `ty' `y' = `x'[_N]
        }
        else gen `ty' `y' = `x'[_N]
        }
        ------------------------------------------------------ end collapse._max ---
        capture drop `drp`i''
        if `"`fcn`i''"' == "percent" & "`lab`i''" == "_percent" {
        label var `new`i'' "percent"
        }
        else if `"`fcn`i''"' == "count" & "`lab`i''" == "_freq" {
        label var `new`i'' "frequency"
        }
        else if "`clabel'"=="" {
        label var `new`i'' `"(`fcn`i'') `lab`i''"'
        }
        else label var `new`i'' `"`fcn`i'' of `lab`i''"'
        format `new`i'' `fmt`i''
        local ++i
        }
        while `i'<=`n' {
        _`fcn`i'' `new`i'' `use`i'' `sortpreserve' `"`weight'"' `"`w'"' `"`by'"'
        capture drop `drp`i''
        if `"`fcn`i''"' == "percent" & "`lab`i''" == "_percent" {
        label var `new`i'' "percent"
        }
        else if `"`fcn`i''"' == "count" & "`lab`i''" == "_freq" {
        label var `new`i'' "frequency"
        }
        else if "`clabel'"=="" {
        label var `new`i'' `"(`fcn`i'') `lab`i''"'
        }
        else label var `new`i'' `"`fcn`i'' of `lab`i''"'
        format `new`i'' `fmt`i''
        local ++i
        }
        }
        ------------------------------------------------------------- end collapse ---
        merge 1:1 episodeid babyepisodeid using `merged', nogenerate
        save `merged', replace
        }
        r(109);



        • #19
          Yes, you'd need to include those variables in both lines.
          However, you also need to tell collapse which statistic to compute for each of those variables.

          Right now, where it says:
          Code:
          collapse (max) `item'_no `listofthings' , by(episodeid babyepisodeid)
          It keeps only one observation for each (episodeid babyepisodeid) combination, holding the maximum value of each of the `item'_no `listofthings' variables.

          For your other variables, you might want to specify different statistics. See https://www.stata.com/manuals13/dcollapse.pdf
          I can't really judge from your variable descriptions, but in some cases you might want collapse to take the mean, or something else.
          Some variables, I presume, are string variables, and collapse cannot take a max, min, or mean of those; that is where your "type mismatch" comes from. In these cases, you could ask for the first non-missing value, e.g.:

          Code:
          collapse (max) `item'_no `listofthings' (firstnm) some_string_var other_string_var, by(episodeid babyepisodeid)
          If you think it's useful to keep those lists in macros, you could split them into a numeric keep list and a string keep list, and do e.g.:
          Code:
          collapse (max) `item'_no `listofthings' `VLISTKEEPNUMERIC' (firstnm) `VLISTKEEPSTRING', by(episodeid babyepisodeid)
          And include both lists in your line with keep.
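
          If you would rather not sort the keep list by hand, here is a minimal sketch that does it with confirm string variable. The toy data and the variable appendectomy are made up for illustration (assumptions, not your dataset), and the sketch assumes both resulting lists end up non-empty.
          Code:
          * toy data: assumed values, for illustration only
          clear
          input episodeid babyepisodeid parity str4 postcode
          1 1 2 "3000"
          1 1 2 "3000"
          2 1 0 "3051"
          end
          gen byte appendectomy = (_n == 1)

          local VLISTKEEP "parity postcode"

          * sort the keep list into numeric and string variables
          local VLISTKEEPNUMERIC
          local VLISTKEEPSTRING
          foreach v of local VLISTKEEP {
              capture confirm string variable `v'
              if _rc == 0 {
                  local VLISTKEEPSTRING `VLISTKEEPSTRING' `v'
              }
              else {
                  local VLISTKEEPNUMERIC `VLISTKEEPNUMERIC' `v'
              }
          }
          display "numeric: `VLISTKEEPNUMERIC'"
          display "string:  `VLISTKEEPSTRING'"

          * (max) for the numeric variables, (firstnm) for the string ones
          collapse (max) appendectomy `VLISTKEEPNUMERIC' (firstnm) `VLISTKEEPSTRING', by(episodeid babyepisodeid)
          list
          This only makes sense if the kept variables are constant within each (episodeid babyepisodeid) group; if they are not, max and first non-missing may not be the summaries you want.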



          • #20
            How would I name the new binary variables with a prefix that is equivalent to the original variable name, i.e. `item' in the macro? For example, continuing our original example, a new binary variable would be "procedure-appendectomy" rather than just "appendectomy"?



            • #21
              The best approach is to add the prefix to the values of your procedure (or other) variable before you turn those values into variable names. See below.



              Code:
              **define list of variables that need to be split here 
              local VLISTSTRING "procsops obstetriccomplications" // quick for testing
              local VLISTKEEP "plurality gravidity parity babynum placeofbirth preflanguage postcode"
              
              foreach item in `VLISTSTRING'{
                  use `original', clear
                  *splitting your lists of procedures
                  split `item', parse(`=char(10)')
                  drop `item'
                  keep episodeid babyepisodeid `VLISTKEEP' `item'*
                  reshape long `item', i(episodeid babyepisodeid) j(`item'_no)
                  drop if `item' ==""
              
                  *making your binary variables:
                  replace `item'="`item'_"+`item'
                  replace `item' =strtoname(`item')
              
                  levelsof `item', local(listofthings)
                  foreach thing of local listofthings{
                      gen `thing'=0
                      replace `thing'=1 if `item'=="`thing'"
                  }
                  * (firstnm) works for numeric and string variables; (max) on a string gives r(109)
                  collapse (max) `item'_no `listofthings' (firstnm) `VLISTKEEP', by(episodeid babyepisodeid)
                  merge 1:1 episodeid babyepisodeid using `merged', nogenerate
                  save `merged', replace
              }
              use `merged', clear
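              Note, however, that Stata variable names can be at most 32 characters long, and strtoname() truncates anything longer. With a long prefix such as obstetriccomplications_, very little room is left for the value itself, so different values can end up with the same name.
              You could instead take the first four letters of the original variable name as the prefix: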
              Code:
                  *making your binary variables:
                  replace `item'=substr("`item'",1,4)+"_"+`item'
                  replace `item' =strtoname(`item')
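
              To see what the truncation does, here is a small sketch with two made-up values (assumed for illustration, not taken from your data). It builds the name once with the full prefix and once with the four-letter prefix, then compares the results.
              Code:
              clear
              input str40 value
              "postpartum haemorrhage"
              "postpartum infection"
              end

              * full prefix: 23 characters are used up before the value starts
              gen name_full  = strtoname("obstetriccomplications_" + value)
              * four-letter prefix, as in the snippet above
              gen name_short = strtoname(substr("obstetriccomplications", 1, 4) + "_" + value)

              list name_full name_short
              * any surplus observations reported here are names that have collided
              duplicates report name_full
              duplicates report name_short
              With the full prefix both values are cut off after "postpartu" and so share one name; with the short prefix they stay distinct. It is worth running duplicates report on your real values before generating the binary variables.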
              If you would then also want to keep the full names, you could store them in the variable labels.
              Because you run collapse, and collapse replaces variable labels, you would need to follow this guide to retain those labels: https://www.stata.com/support/faqs/d...with-collapse/
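
              A minimal sketch of the idea: store each label in a local macro before collapse and put it back afterwards. The data and the proc_ variable names are made up for illustration. I number the macros rather than building the macro name from the variable name, on the assumption that a 32-character variable name plus a prefix would exceed the length allowed for local macro names.
              Code:
              * toy data: assumed values, for illustration only
              clear
              input episodeid proc_appendectomy proc_caesarean_section
              1 1 0
              1 0 1
              2 1 0
              end
              label variable proc_appendectomy "procsops: appendectomy"
              label variable proc_caesarean_section "procsops: caesarean section"

              * remember each variable's name and label under a numbered local
              local n = 0
              foreach v of varlist proc_* {
                  local ++n
                  local name`n' `v'
                  local lab`n' : variable label `v'
              }

              collapse (max) proc_*, by(episodeid)

              * collapse has overwritten the labels; restore the originals
              forvalues k = 1/`n' {
                  label variable `name`k'' `"`lab`k''"'
              }
              describe
              In your loop, the storing step would go just before the collapse line and the restoring step just after it, inside the foreach over `VLISTSTRING'.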



              • #22
                Thanks again Jorrit!

