  • Parsing a user-given command to add an "if touse"

    I recently had occasion to write a program that uses the colon operator, where, as is typical, an estimation command comes after the colon. I'm seeking advice about a less clumsy way to put an "if touse" clause in place of any if/in in that command, so that I can modify touse. The context is that I want to walk through the sample, repeatedly applying the estimation command to the dataset while dropping one case at a time, so as to implement a brute-force analysis of influential observations for commands to which dfbeta and so forth don't apply.

    Here's my clumsy way to extract the estimation command, chop off any options on it, process the first part of that command so as to replace any if/in with an "if touse", and put the options back on the estimation command. The way I've done this works, but it seems clumsy, and I think I'd benefit more generally from advice about intelligent parsing of syntax. Here's what I have done:
    Code:
    // My program has a syntax like this-- MyProg, option1 option2: EstimationCommand
    // Extract and work on the user's estimation command
    capture _on_colon_parse `0'
    local estimcmd = `"`s(after)'"'
    // Get part of estimation command before options start at comma
    local pos = strpos("`estimcmd'", ",")
    if ((`pos') > 0 ) {    // options exist on the estimation command, so put them aside for the moment
       local commaafter= substr("`estimcmd'", `pos', .)
       local estimcmd = substr( "`estimcmd'", 1, `pos'-1)  // just the command
    }
    // Make a touse variable
    local 0 `estimcmd'
    syntax anything [if] [in]
    marksample touse  // record existing if/in
    //
    // Replace if/in material in estimation command string with an "if touse" clause.  
    local estimcmd = subinstr("`estimcmd'", "`in'", "", .)  // strip out in
    local estimcmd = subinstr("`estimcmd'", "`if'", "", .)  // strip out "if" material
    local estimcmd = "`estimcmd'" + " if " + "\`touse'"  +  "`commaafter'"  // touse and options onto the estimation command
    Having done the preceding, I can then easily and efficiently loop through the sample, modifying the `touse' variable at each iteration so as to drop each observation in turn while executing the estimation command. Besides better parsing approaches, I'd also be interested in other strategies for the larger problem.
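
    A minimal sketch of the loop described above, not the poster's actual code. It uses the auto dataset and a regress command as stand-ins, and the names bfull, bi, and d_b1 are hypothetical. Because the touse macro here holds a fixed temporary variable name, changing that variable's values between runs is enough to drop one case at a time.

    ```stata
    // Stand-in data and command (assumptions, for illustration only)
    sysuse auto, clear
    tempvar touse
    mark `touse'                                  // 1 for every observation to start
    local estimcmd regress price mpg weight if `touse'

    // Full-sample estimates for comparison
    tempname bfull bi
    quietly `estimcmd'
    matrix `bfull' = e(b)

    // Change in the first coefficient when observation i is left out
    generate double d_b1 = .

    forvalues i = 1/`=_N' {
        if (`touse'[`i'] == 0) continue           // skip cases outside the sample
        quietly replace `touse' = 0 in `i'        // drop case i
        capture `estimcmd'                        // rerun; capture also hides the output
        if (_rc == 0) {
            matrix `bi' = e(b)
            quietly replace d_b1 = `bi'[1, 1] - `bfull'[1, 1] in `i'
        }
        quietly replace `touse' = 1 in `i'        // restore case i
    }

    summarize d_b1
    ```

    Cases with a large absolute d_b1 are the ones whose removal moves the first coefficient most; the same pattern extends to any other element of e(b).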

    I understand that there are syntax features for capturing options and so forth, but the documentation is distinctly thin there. The maneuvers I see in the code for the built-in colon commands are much fancier than I'm ready to handle at this point.

    Thanks for taking a look at this.





  • #2
    Mike

    I am not really into statistics that identify influential observations, but it seems that, e.g., Cook's distance measure is available for any estimation performed by glm. I wonder whether you would really need a brute-force approach, actually running the estimation command on different samples. I also have no clear idea what exactly you want to do with the results you would get from this.

    Anyway, sticking with your brute-force approach, I would probably do something similar to your suggestion. I would implement the details a bit differently, though: set up a temporary variable in your main program, then have a small subprogram handle disassembling and reassembling the estimation command, so you do not need to fiddle around with those string functions.

    Code:
    program mycommand
        version 15
        
        _on_colon_parse `0'
        
        tempvar touse
        mark_touse `touse' `s(after)'
    end
    
    program mark_touse , sclass
        gettoken touse 0 : 0
        syntax                                 ///
            [ anything ]                       ///
            [ if ] [ in ]                      ///
            [ fweight aweight pweight iweight] ///
            [ , * ]
        
        if (`"`weight'"' != "") local weight [ `weight' `exp' ]
        mark `touse' `if' `in' `weight'
        
        if (`"`options'"' != "") local comma ","
        sreturn local after `anything' if `touse' `weight' `comma' `options'
    end
    Note that this approach respects weights. Note also that missing values in any of the variables will not affect the marked sample; to handle that, you would need to parse the estimation command's variable list. Also, the approach will fail for estimation commands with non-standard syntax, such as mixed and the me suite of commands.
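
    One way the missing-values point might be handled, as a rough sketch rather than part of the code above: treat the first word of the command as its name and pass the remaining words to markout. The program name mark_touse_vars is hypothetical, and the sketch assumes plain variable names; factor-variable or time-series operators, weights, and options would need extra handling.

    ```stata
    capture program drop mark_touse_vars
    program mark_touse_vars
        gettoken touse 0 : 0                  // first token: name for the marker variable
        syntax anything [if] [in]
        gettoken cmdname vars : anything      // split off the command name
        mark `touse' `if' `in'                // honor if/in
        markout `touse' `vars'                // also exclude missing values in the variables
    end

    // Example with the auto data, where rep78 has missing values
    sysuse auto, clear
    tempvar touse
    mark_touse_vars `touse' regress price rep78 mpg if foreign == 0
    count if `touse'
    ```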

    Best
    Daniel
    Last edited by daniel klein; 13 Feb 2019, 04:44.



    • #3
      To Daniel's substantive advice let me add the following: have you taken a look at how bootstrap.ado resolves similar problems?



      • #4
        Thanks to both of you.

        Aha, I hadn't realized that Cook's distance is available for any GLM. I'll have to look at this. And, I hadn't known of the construct "[, *]" (I do think I have a legitimate beef with the help for the syntax command: Nowhere that I could find does it state the simplest thing, namely something like: "The syntax command always applies to the contents of the local macro 0, whether or not that macro contains the original command line or has been constructed in some other way." ) I was leaning toward the idea of using an sclass program, so it's helpful to be encouraged in that direction.
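
        A small sketch of that behavior, using the auto dataset as an assumed example: syntax parses whatever local macro 0 currently holds, and "[, *]" collects any options it does not otherwise recognize into the local macro options.

        ```stata
        sysuse auto, clear

        // Build local 0 by hand instead of relying on the original command line
        local 0 price mpg if foreign == 0, robust level(90)

        // syntax reads local 0; unrecognized options land in `options'
        syntax varlist [if] [in] [, *]

        display "varlist: `varlist'"
        display `"if:      `if'"'
        display `"options: `options'"'
        ```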

        Regarding using bootstrap.ado, yes, I looked at that, and extracted some ideas from it, but I wanted to start with simpler stuff. The syntax parsing in bootstrap and other commands is well beyond what I understand--I wanted to start by walking before I tried to run <grin>.



        • #5
          For anyone who's interested in the problem for which I was trying to create a program, as opposed to the programming problem: I just noticed a recent post by Steve Samuels in which he pointed to a Stata blog entry describing how to use -jackknife- to implement a versatile "drop one observation" analysis.
          https://blog.stata.com/2014/05/08/us...ential-points/
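
          For readers who want a starting point, here is a rough sketch of the general idea, which is not taken from the blog entry and may differ from it. It assumes the auto dataset and a regress command as stand-ins: the jackknife prefix reruns the command with each observation left out in turn, and saving() stores the replicate estimates in a dataset for inspection.

          ```stata
          sysuse auto, clear
          tempfile jk

          // One replicate per left-out observation; _b requests all coefficients
          jackknife _b, saving(`jk', replace) nodots: regress price mpg weight

          // Inspect the saved replicate estimates
          use `jk', clear
          describe
          summarize
          ```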





          • #6
            #4 Some sympathies, but whatever is documented in [P] basically presupposes that you have mastered [U] 18 first.

            The question of where to start is always difficult. Several of my pieces in the Stata Journal were written so that I would understand something better once I had finished.



            • #7
              An unrelated question: I looked up the word "touse" in a dictionary, and every dictionary gives its meaning as "a noisy disturbance". Why does Stata use touse as the name of a marker variable? What is the origin of this programming usage?



              • #8
                I guess you should read it as "to use"
                Best wishes

                (Stata 16.1 MP)



                • #9
                  Indeed, the interpretation in #8 is strongly implied by [P] mark:
                  Both [mark and markout] create a 0/1 to-use variable that records which observations are to be used in subsequent code.



                  • #10
                    Dear Felix Bittmann and daniel klein, thanks to both of you. How silly of me. However, I am very glad: if I had not asked the question here, I would never have known why.

