
  • Macros and using _all and *

    Hello all,

    I am working on recoding the variables of various datasets based on their own properties. The basic rule is that if more than 5% of a variable's information is missing, then the missing codes (-99 and -97) should be recoded to missing (.) instead of being allowed to remain in their original form.

    I've built the following program, which works well until the last loop. It appears that Stata is reading the variable as * instead of each individual variable, as it seemed to do in the first loop.


    local all_vars *   // trying to indicate all of the original variables in the dataset
    foreach var of local all_vars {
        cap noi recode `var' (-99=1) (-97=1) (else=0), pre(missing_)
    }
    *
    foreach var of varlist missing_* {
        egen frac_`var' = mean(`var')
    }
    *
    // This is where the code fails: it reads the variable as "*" instead of the list of variables used earlier
    foreach var of local all_vars {
        recode `var' (-99 -97=.) if frac_missing_`var'>.05
    }

    Any thoughts on how to avoid this error?

  • #2
    The problem is that the first statement (local all_vars *) is not actually creating a list of variable names. Take a look at the unab command:

    Code:
    unab all_vars: *
    ...
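A minimal sketch of why the original loops misbehave, using Stata's shipped auto dataset purely for illustration. `local all_vars *` stores only the single character *, so each foreach over that local runs once with `var' equal to *. The first loop only appears to work because recode * recodes every variable in a single call; the third loop then has to build the name frac_missing_*, which is not a variable. unab expands the wildcard into the actual names:

```stata
sysuse auto, clear

* A local holding * is just the text "*", not a list of names
local literal *
display "`literal'"

* unab expands the wildcard into the names of the variables that exist now
unab all_vars : *
display "`all_vars'"
display `: word count `all_vars''
```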



    • #3
      Joe has explained the main point behind your bug, but there is another.

      Putting some text into a macro only to take out the same text immediately afterwards is possibly entertaining, but it's usually pointless. Try to imagine e.g. getting some chocolate, putting it into a box and then taking out the chocolate immediately after. Why did you feel obliged to do that?

      Code:
      foreach v of var * {
          * commands using `v' go here
      }

      is good style. * is a varlist and foreach understands varlists, so you can go straight there.



      • #4
        Nick,

        I was about to suggest the same thing, but I think the issue is that the user creates additional variables in the first two foreach loops and wants to use only the original set of variables in the third foreach loop.

        Regards,
        Joe
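Joe's point can be sketched as follows (auto data again, kept to three variables purely for illustration). A snapshot taken with unab before any new variables exist keeps listing only the original names, whereas a wildcard evaluated later also matches the generated missing_* variable. The recode call mimics the first loop of the original post:

```stata
sysuse auto, clear
keep price mpg rep78

* Snapshot the original variables before anything new is created
unab original : *

* Mimic the first loop of the original post: this creates missing_price
recode price (-99 -97 = 1) (else = 0), prefix(missing_)

* A wildcard evaluated now also matches the new variable
unab current : *
display "`original'"
display "`current'"
```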



        • #5
          Indeed. So, for the first loop the use of a local is pointless, but with the third loop in mind you need unab as well. As I said, the main point, that unab is needed to get the syntax right, was made by you. My secondary point was about style.



          • #6

            Quote: I am working on recoding the variables of various datasets based on their own properties. The basic rule is that if there is more than 5% missing information, then that missing information should be coded to missing (=.) instead of being allowed to remain in its original form (-99 -97).
            I don't see the utility of this exercise. Moreover, retaining numeric codes for missing values is short-sighted: to exclude them from an analysis you will be forced to apply an if clause to many statements. Instead, use Stata's extended missing values (help missing), which you can label.

            Code:
            sysuse auto, clear
            recode rep78 (1 = .a) (2 = .b)
            label define  mvlabel  .a  "Did not Know"  .b "Refused"
            label values rep78 mvlabel
            tab rep78, missing
            If you want to indicate that a variable has too many missing values, attach a characteristic to that variable (help char).
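A minimal sketch of the characteristic suggestion, using the auto data, where rep78 has 5 missing values out of 74 observations. The characteristic names frac_missing and too_many_missing are hypothetical choices, not anything Stata defines:

```stata
sysuse auto, clear

* Fraction of observations where rep78 is missing
quietly count if missing(rep78)
local frac = r(N) / _N

* Record the fraction, and flag the variable if it exceeds 5%
char rep78[frac_missing] `frac'
if `frac' > .05 char rep78[too_many_missing] yes

* Characteristics are saved with the dataset and can be listed later
char list rep78[]
```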
            Last edited by Steve Samuels; 14 Apr 2014, 18:27.
            Steve Samuels
            Statistical Consulting

            Stata 14.2



            • #7
              Steve Samuels rightly draws attention to the key question of what the code is doing, as well as the question raised in the original post.

              The algorithm used, as I understand it, is:

              foreach numeric variable {
                  calculate the fraction of -99 and -97 values
                  if the fraction > 0.05, recode such values to missing
              }
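A runnable sketch of that algorithm, on hypothetical toy data in which x has 4 of 40 values coded -99 (10%) and y has 1 of 40 coded -97 (2.5%). It assumes -99 and -97 are the only codes and checks only numeric variables, so it needs neither the intermediate missing_* and frac_* variables nor capture around recode:

```stata
clear
set obs 40

* Hypothetical toy data
generate x = _n
replace x = -99 in 1/4
generate y = _n
replace y = -97 in 1

* Snapshot of the numeric variables to check
quietly ds, has(type numeric)
local numvars `r(varlist)'

foreach v of local numvars {
    quietly count if inlist(`v', -99, -97)
    * Recode only when the codes exceed 5% of observations
    if r(N) / _N > .05 {
        replace `v' = . if inlist(`v', -99, -97)
    }
}
```

Here x is recoded and y keeps its -97, which is exactly the inconsistency questioned in the next paragraph.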

              On the face of it this is an absurd procedure. If -97 and -99 mean two flavours of missing, they have that meaning regardless of how often they occur. I don't think you could trust any results for the variables not recoded without access to the original data so that you could do your own corrections.

              Incidentally, mvdecode is a more convenient command: it lets you specify several different values that mean missing, and it works on a whole varlist, so no explicit loop over variables is needed.
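For instance, on hypothetical toy data like that below, a single mvdecode call handles both codes across several variables. Mapping them to .a and .b rather than . keeps the two kinds of missing distinct, in line with #6. Note that this recodes regardless of how often the codes occur:

```stata
clear
set obs 40

* Hypothetical toy data
generate x = _n
replace x = -99 in 1/4
generate y = _n
replace y = -97 in 1

* One call covers both codes in both variables
mvdecode x y, mv(-99=.a \ -97=.b)
list in 1/5
```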



              • #8
                Hey everyone,

                Thanks for all the replies and insights into the code. I am a fairly new Stata user, so I often have trouble getting my code to work properly, particularly when using macros, and that may account for my absurd procedures!

                The main thing I wanted this code to do was to go through a list of datasets and perform the procedure that Nick Cox outlined in the previous post. Essentially, there is a tracking study, some of the data recoding procedures changed at one point, and I need to follow those rules when cleaning the datasets so I can construct an aggregate file.

                I was having trouble making the initial loop look at all variables: I was using local all_vars *, and this was returning an asterisk in a later loop. The unab suggestion worked really well, but one of my colleagues also suggested I use:

                describe *, varlist
                local all_vars "`r(varlist)'"
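Either route gives the same list, and either way it has to be captured before the new missing_* and frac_* variables are created. A quick check on the auto data (used only for illustration), with quietly added to suppress the output of describe:

```stata
sysuse auto, clear

* Route 1: describe with the varlist option stores the names in r(varlist)
quietly describe, varlist
local from_describe `r(varlist)'

* Route 2: unab expands the wildcard directly
unab from_unab : *

display "`from_describe'"
display "`from_unab'"
```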

                This worked really well and the code is now up and running and going through all the 30+ datasets I need it to!

                Thank you everyone for the assistance, and sorry for the delay in responding.
