  • Dropping variables not working

    Hi all,

    I have a dataset in which the variables come in matched sets of three (a grade, a cat, and an adevent) that share the same number (like cat1, grade1, adevent1) and the same ending (like _scr or _fu1).


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(cat1_scr grade1_scr) str2 adevent1_scr byte(cat2_scr grade2_scr) str2 adevent2_scr byte(cat6_wk4 grade6_wk4) str24 adevent6_wk4 byte(cat7_maint1 grade7_maint1) str43 adevent7_maint1 byte(cat8_maint1 grade8_maint1) str43 adevent8_maint1
    0 0 "" 0 0 "" 0 0 "" 0 0 "" 0 0 ""
    0 1 "" 0 0 "" 0 0 "" 0 0 "" 0 0 ""
    0 0 "" 0 0 "" 0 0 "" 0 0 "" 0 0 ""
    0 0 "" 0 0 "" 0 0 "" 0 0 "" 0 0 ""
    0 0 "" 0 0 "" 0 0 "" 0 0 "" 0 1 ""
    0 0 "" 0 0 "" 0 0 "" 0 0 "" 0 0 ""
    end
    label values cat1_scr cat_
    label values cat2_scr cat_
    label values grade1_scr grade_
    label values grade2_scr grade_
    label def grade_ 0 "0", modify
    label def grade_ 1 "1", modify
    label values cat6_wk4 cat4_
    label values grade6_wk4 grade4_
    label def grade4_ 0 "0", modify
    label values cat7_maint1 cat7_
    label values cat8_maint1 cat7_
    label values grade7_maint1 grade7_
    label values grade8_maint1 grade7_
    label def grade7_ 0 "0", modify
    label def grade7_ 1 "1", modify
    I'm trying to drop all three matching variables every time all the grades for that group are 0. My approach has been extracting the necessary parts of the grade string to create matching adevent and cat strings.

    Code:
    * drop matching cat and adevent if all grades == 0 
    qui desc
    local c = r(N)
    foreach var of varlist grade* {
         quietly count if `var' == 0 
         * get part of string after and including the "_" (e.g. _wk1)
         local when = substr("`var'", strpos("`var'", "_") , .)
    
         * get part of string before the "_" (e.g., grade7)
         local p = substr("`var'", 1, strpos("`var'", "_") - 1) 
         * and get just the numeric value out of it 
         local val = ustrregexra("`p'","[^0-9]+","")
     
         * drop grade, cat, and adevent of same 'type' if 'grade' is all 0s
         if r(N) == 6 drop `var'
         if r(N) == 6 drop  adevent`val'`when'
         if r(N) == 6 drop  cat`val'`when'
    }
    All the code works except for the last two lines:

    if r(N) == 6 drop adevent`val'`when'
    if r(N) == 6 drop cat`val'`when'

    However, the grade variable drops, but the associated cat and adevent variables refuse to drop.

    Any suggestions how to correct this?




  • #2
    Solved it finally... I needed to store the count in a local macro, because r(N) was no longer available once the first drop had run, so the second and third conditions failed.
    qui count if `var' == 0
    local ct = r(N)
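
To see why the local is needed, here is a minimal sketch on toy data. The variable names (grade1_x, cat1_x, adevent1_x, keepme) are made up for illustration. It assumes, as the behaviour reported in this thread suggests, that drop leaves its own results in r(), so the r(N) left by count is gone afterwards.

```stata
* toy data: hypothetical names, not the poster's variables
clear
set obs 6
generate byte grade1_x = 0
generate byte cat1_x = 0
generate str1 adevent1_x = ""
generate byte keepme = 1

quietly count if grade1_x == 0
* copy the count straight away, before any other command can replace r()
local ct = r(N)

drop grade1_x
* r(N) is expected to be missing here, because -drop- stores its own results
display r(N)

* the local still holds the count, so these drops still run
if `ct' == _N drop cat1_x
if `ct' == _N drop adevent1_x
```

A single drop naming all three variables should also avoid the problem, since the condition is then evaluated only once, before anything is dropped.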



    • #3
      You can drop variables (columns) you don't want OR you can drop observations (rows) you don't want. Your question is in terms of observations sharing values but your syntax is in terms of variables being dropped. It's hard for me to follow why your syntax is being said to have worked even partially.

      Also you use the term variable to refer to a batch of three values in three Stata variables. Naturally, you're entitled to your own terminology but Stata questions are hard to answer if you don't use Stata terminology.

      I suspect you need a quite different Stata data structure to work well, but as I don't understand your data I am not well placed to suggest anything beyond reshape long somehow.





      • #4
        Hi Nick,
        Yes, apologies for the terminology.
        Strange that the code doesn't work for you, though.
        I loaded the dataex example and ran the following, and it worked properly: it kept grade1 and grade8 with their associated columns and removed the rest.


        Code:
        * drop matching cat and adevent if all grades == 0 
        qui desc
        local c = r(N)
        foreach var of varlist grade* {
             qui count if `var' == 0 
             local ct = r(N)
             * get part of string after and including the "_" (e.g. _wk1)
             local when = substr("`var'", strpos("`var'", "_") , .)
        
             * get part of string before the "_" (e.g., grade7)
             local p = substr("`var'", 1, strpos("`var'", "_") - 1) 
             * and get just the numeric value out of it 
             local val = ustrregexra("`p'","[^0-9]+","")
         
             * drop grade, cat, and adevent of same 'type' if 'grade' is all 0s
        
             if `ct' == 6 drop `var'
             if `ct' == 6 drop  adevent`val'`when'
             if `ct' == 6 drop  cat`val'`when'
        }






        • #5
          Sorry to be so slow grasping what you wanted.

          Note that findname from the Stata Journal allows finding variables that are constant (and in this case all zero).

          Code:
          findname, all(@ == 0)
          I was also slow to see why 6 was the magic constant in the last few lines of code, but naturally it is just the number of observations in the data example. The more general comparison would be of r(N) with _N or c(N) (both of which are defined as the number of observations independently of describe.)
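
The comparison with _N suggested here could look like the following sketch. The three-observation dataset is made up, in the thread's naming pattern, and the single drop naming all three variables is a simplification for illustration, not the original poster's code.

```stata
* made-up example data following the thread's naming pattern
clear
input byte(cat1_scr grade1_scr) str2 adevent1_scr byte(cat2_scr grade2_scr) str2 adevent2_scr
0 0 "" 0 0 ""
0 1 "" 0 0 ""
0 0 "" 0 0 ""
end

foreach var of varlist grade* {
    quietly count if `var' == 0
    * compare with _N rather than a hard-coded number of observations
    if r(N) == _N {
        * everything after the 5-character prefix "grade", e.g. 2_scr
        local stub = substr("`var'", 6, .)
        drop `var' cat`stub' adevent`stub'
    }
}
describe, simple
```

Because r(N) is read once, in the if condition, before any drop runs, no intermediate local is needed in this form.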



          • #6
            Due to my bad explanation I'm sure.

            Regarding findname... it's an interesting command that I may try in future, but I'm not sure it fits the bill here, as I'm interested specifically in variables starting with grade* and in splitting off the substrings to drop the matching numeric and string variables (i.e., the rest of the trio). As I only use one line to limit the loop to grade* variables and one to evaluate the grade count, I can't see how findname would be more expedient or clearer, unless I am missing something.

            I've changed my hard-coding of 6 to `c' now that I've finished testing, but yes, good spot!







            • #7
              You can specify findname with a variable list. But the bigger point was just -- if you want to find variables that are constant, that's a tool you can use.
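
As a rough sketch of findname with a variable list: the example below assumes findname is installed (it is community-contributed, not part of official Stata) and uses its local() option to collect the matching names. The toy data and the macro name allzero are made up for illustration.

```stata
* sketch only: requires the community-contributed findname command
clear
input byte(cat1_scr grade1_scr cat2_scr grade2_scr)
0 0 0 0
0 1 0 0
0 0 0 0
end

* restrict the search to grade* and keep names whose values are all 0
findname grade*, all(@ == 0) local(allzero)

* drop each all-zero grade variable together with its matching cat variable
foreach var of local allzero {
    local stub = substr("`var'", 6, .)
    drop `var' cat`stub'
}
describe, simple
```

The loop over the returned names still needs the substring step to build the matching variable names, so findname replaces only the count-and-compare part.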
