  • A more parsimonious way of extracting the xvars from the e(cmdline)

    Hi All,

    I am trying to extract only the covariates ("xvars") from the e(cmdline) of a previously estimated logit model (although this should generalize to any regression model).

    I have written some code below that does the job, but I am guessing that it can be written in a more parsimonious manner, especially as it has to check whether [if][in] qualifiers are present.

    Any help is appreciated!

    Ariel

    Code:
    webuse lbw
    logit low age lwt i.race smoke if ftv <3
    
    local xvars = subinstr("`e(cmdline)'","`e(cmd)' `e(depvar)'","",1)
    local cntif = strpos("`xvars'", "if ") - 1
    local cntin = strpos("`xvars'", "in ") - 1        
    
    if `cntif' > 0 {
       local xvars = substr("`xvars'",1,`cntif')
    }
    else if `cntin' > 0 {
       local xvars = substr("`xvars'",1,`cntin')
    }
    else {
        local xvars = "`xvars'"
    }
    di "`xvars'"
    Result:
    age lwt i.race smoke
    Code:
    webuse lbw
    logit low age lwt i.race smoke in 1/150
    
    local xvars = subinstr("`e(cmdline)'","`e(cmd)' `e(depvar)'","",1)
    local cntif = strpos("`xvars'", "if ") - 1
    local cntin = strpos("`xvars'", "in ") - 1        
    
    if `cntif' > 0 {
       local xvars = substr("`xvars'",1,`cntif')
    }
    else if `cntin' > 0 {
       local xvars = substr("`xvars'",1,`cntin')
    }
    else {
        local xvars = "`xvars'"
    }
    di "`xvars'"
    Result:
    age lwt i.race smoke
    Code:
    webuse lbw
    logit low age lwt i.race smoke
    
    local xvars = subinstr("`e(cmdline)'","`e(cmd)' `e(depvar)'","",1)
    local cntif = strpos("`xvars'", "if ") - 1
    local cntin = strpos("`xvars'", "in ") - 1        
    
    if `cntif' > 0 {
       local xvars = substr("`xvars'",1,`cntif')
    }
    else if `cntin' > 0 {
       local xvars = substr("`xvars'",1,`cntin')
    }
    else {
        local xvars = "`xvars'"
    }
    di "`xvars'"
    Result:
    age lwt i.race smoke

  • #2
    Originally posted by Ariel Linden
    . . . especially as it evaluates whether there exists [if][in] qualifiers...
    I recommend avoiding working from the command line returned macro, because the command line can contain more than just if and in qualifiers. Maybe start from the coefficient vector, something like the following.
    Code:
    version 18.0
    
    clear *
    
    quietly sysuse auto
    
    // seedem
    set seed 846471640
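    // r(min) and r(max) must be set by summarize before they are used below
    quietly summarize rep78, meanonly
    quietly replace rep78 = runiformint(r(min), r(max)) if mi(rep78)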
    
    generate byte wgt = 1
    
    *
    * Begin here
    *
    quietly logit foreign c.mpg i.rep78 if 1 == 1 in 1/74 [fweight=wgt], nolog
    // Contains all sorts of stuff beyond if/in
    display in smcl as text e(cmdline)
    
    // Try this
    local predictors : colvarlist e(b)
    foreach predictor of local predictors {
        local point = strpos("`predictor'", ".")
        if `point' local parsed_predictors `parsed_predictors' `=substr("`predictor'", `point' + 1, .)'
        else local parsed_predictors `parsed_predictors' `predictor'
    }
    local parsed_predictors : list uniq parsed_predictors
    
    display in smcl as result "`parsed_predictors'"
    
    exit
    The resulting macro will include the _cons, but you can easily remove that, if desired.
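    A minimal sketch of one way to drop the constant, assuming the list was built with colvarlist as above. It uses the lbw data from #1 rather than the auto example, and the macro name cons is only illustrative.
    Code:
    webuse lbw, clear
    quietly logit low age lwt i.race smoke

    local predictors : colvarlist e(b)

    // macro list subtraction removes every element of cons from predictors
    local cons _cons
    local predictors : list predictors - cons

    display in smcl as result "`predictors'"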


    • #3
      Thank you, Joseph!

      This is a great idea. The only problem here is that the factor and continuous variable operators are stripped when the code extracts the unique variable name. Also, in the case of an interaction term, it misses the second operator (try for example the following command line):

      Code:
      logit foreign c.mpg##c.headroom i.rep78 if 1 == 1 in 1/74 [fweight=wgt], nolog
      Ultimately, I am using these "xvars" in a new regression, so they have to be specified exactly as in the original.

      Thank you!

      Ariel



      • #4
        As an added clarification to my statement:
        Ultimately, I am using these "xvars" in a new regression, so they have to be specified exactly as in the original.
        I use e(sample) following the estimation to limit the sample accordingly. Therefore, while extracting the "xvars", I don't have to parse out the [if][in] part of the command line.
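        As a sketch of what that looks like (the xvars macro is typed by hand here, and bwt merely stands in for the new dependent variable):
        Code:
        webuse lbw, clear
        quietly logit low age lwt i.race smoke if ftv < 3 in 1/150

        // placeholder: in practice xvars would come from one of the extraction methods in this thread
        local xvars age lwt i.race smoke

        // e(sample) still refers to the logit results when the new command is parsed,
        // so the new model uses the same observations without repeating if/in
        regress bwt `xvars' if e(sample)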


        • #5
          I think this should do it. Report if you find cases where it fails.

          Code:
          webuse lbw, clear
          logit low age lwt i.race smoke if ftv <3
          local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "\b`e(cmd)'\b", ""), "(.*)\s(\bif\b|\bin\b)\s(.*)$", "$1")
          di "`wanted'"
          Res.:

          Code:
          . di "`wanted'"
           low age lwt i.race smoke
          Last edited by Andrew Musau; 14 Apr 2024, 12:55.


          • #6
            Originally posted by Andrew Musau
            I think this should do it. Report if you find cases where it fails.
            How about

            Code:
            webuse lbw, clear
            
            rename smoke logit // <- new
            
            logit low age lwt i.race logit if ftv <3
            local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "\b`e(cmd)'\b", ""), "(.*)\s(\bif\b|\bin\b)\s(.*)$", "$1")
            di "`wanted'"


            I would [Edit: not necessarily] stick with Joseph's approach. If you don't want the factor operators stripped, just stop at

            Code:
            local xvars : colvarlist e(b)
            You will then only have to remove the constant, which should be the last element in most cases and always be named _cons. For running the regression in #3, it does not matter whether you type

            Code:
            logit foreign c.mpg##c.mpg i.rep78 if 1 == 1 in 1/74 [fweight=wgt], nolog
            or the expanded factor variable list

            Code:
            logit foreign mpg c.mpg#c.mpg 1b.rep78 2o.rep78 3.rep78 4.rep78 5o.rep78 if e(sample)

            Edit:

            By the way, it does matter whether or not you omit weights if weights are not equal to 1. You might also run into problems with svy or other prefixes. So perhaps, using e(cmdline) is not the worst option. However, if you just want to run another (different) regression, why not simply strip the original estimation command (and perhaps the dependent variable) and leave the remainder of e(cmdline) unchanged?
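            A rough sketch of that idea, assuming the command name and the dependent variable are always the first two space-separated words of e(cmdline) (which would not hold with a prefix such as svy:). The macro names typed_cmd, typed_depvar, and rest are made up for illustration, and bwt is only a stand-in for a different outcome.
            Code:
            webuse lbw, clear
            quietly logit low age lwt i.race smoke if ftv < 3, robust

            local rest `e(cmdline)'

            // peel off the command name, then the depvar exactly as it was typed
            gettoken typed_cmd rest : rest
            gettoken typed_depvar rest : rest

            display `"`rest'"'

            // reuse the remainder unchanged with a different command and outcome
            regress bwt `rest'
            Because the first two words are removed by position rather than by name, a covariate that happens to be called logit, or an abbreviated depvar, should not trip this up.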
            Last edited by daniel klein; 14 Apr 2024, 13:18.


            • #7
              Hi Andrew!

              A couple of things:

              First, this does not appear to properly handle the case where both [if] and [in] qualifiers are present. In the example below, it catches the [if] qualifier but not the [in].

              Code:
              logit low age lwt i.race smoke in 1/189 if ftv <3
              
              local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "\b`e(cmd)'\b", ""), "(.*)\s(\bif\b|\bin\b)\s(.*)$", "$1")
              di "`wanted'"
              result:
              low age lwt i.race smoke in 1/189
              Second, how do I get this to exclude the depvar? In my original code above (and here below), I had subinstr() search for "`e(cmd)' `e(depvar)'", which took out both the cmd and the depvar. I can't figure out how to apply this in your line of code.

              Code:
              local xvars = subinstr("`e(cmdline)'","`e(cmd)' `e(depvar)'","",1)
              I'll continue to try and find other scenarios where this may break down!

              Thanks!

              Ariel


              • #8
                The original code will misclassify variables that contain the sequence in as an in qualifier.

                Code:
                . display strpos("inefficient","in")
                1
                
                . display strpos("thin","in")
                3
                Edit:

                OK, I see that you have the trailing space. That still does not catch my second example. The approach might also fail if the if qualifier is followed immediately by a parenthesized expression, as in

                Code:
                summarize mpg if(foreign==1)
                Last edited by daniel klein; 14 Apr 2024, 13:57.


                • #9
                  Hi Daniel,

                  The original code has a space after the if and in, so it shouldn't misclassify strings that start with if or in

                  Code:
                  local cntif = strpos("`xvars'", "if ") - 1
                  local cntin = strpos("`xvars'", "in ") - 1
                  Code:
                  . display strpos("inefficient","in ")
                  0
                  
                  . display strpos("thin","in ")
                  0


                  • #10
                    Just to be clear on the in:

                    Code:
                    logit depvar thin and other vars
                    will fail.

                    Depending on how you want to handle prefixes, weights, non-standard syntax, such as sem, suest, etc., the coefficient vector, e(b), might really be your safest option.
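                    To illustrate with the made-up command line above, extended here with an in qualifier, this small sketch compares a substring search with a word-level lookup. It needs no estimation results, and the macro names are only for illustration.
                    Code:
                    local cmdline logit depvar thin and other vars in 1/10

                    // substring search: the first hit is the "in " at the end of thin
                    local pos_substr = strpos("`cmdline'", "in ")

                    // word-level lookup: only a whole word equal to in counts
                    local pos_word : list posof "in" in cmdline

                    display "substring position: `pos_substr'"
                    display "word position: `pos_word'"
                    The word-level lookup has its own gap: a qualifier typed without a following space, such as if(foreign==1), is not the word if, so it would presumably be missed.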


                    • #11
                      Hi Daniel,

                      I just now saw your previous response #6. The command I am writing does not take weights, so it will have an error if it detects a weight was used in the estimation model. I suppose I can look for a prefix as well and have that error too.

                      The reason why I want ONLY the xvars and not the entire command line is that I will be replacing the depvar with another depvar and making some additional modifications before estimating a new model. The only thing that remains the same in the original estimation model and the new model will be the xvars (as they were specified)...

                      I'll try to modify Joseph's code to get to where the xvars will work as they were specified in the original estimation model -- even with the variable operators specified separately.

                      Thanks!

                      Ariel


                      • #12
                        Originally posted by Ariel Linden
                        first, this does not appear to properly handle the case where both [if] and [in] qualifiers are present. In the example below, it doesn't catch the [in], but it does catch the [if] qualifier
                        We can make it not rely on the order. Here are some cases that I think can arise:


                        Code:
                        webuse lbw, clear
                        
                        qui logit low age lwt i.race smoke
                        local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "(\b`e(cmd)'\b|\b`e(depvar)'\b)", ""), "^(.+?)(\bif\b|\bin\b|\,)(.*)", "$1")
                        di "`wanted'"
                        
                        qui logit low age lwt i.race smoke, robust
                        local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "(\b`e(cmd)'\b|\b`e(depvar)'\b)", ""), "^(.+?)(\bif\b|\bin\b|\,)(.*)", "$1")
                        di "`wanted'"
                        
                        qui logit low age lwt i.race smoke if ftv<3, robust
                        local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "(\b`e(cmd)'\b|\b`e(depvar)'\b)", ""), "^(.+?)(\bif\b|\bin\b|\,)(.*)", "$1")
                        di "`wanted'"
                        
                        qui logit low age lwt i.race smoke in 1/180, robust
                        local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "(\b`e(cmd)'\b|\b`e(depvar)'\b)", ""), "^(.+?)(\bif\b|\bin\b|\,)(.*)", "$1")
                        di "`wanted'"
                        
                        qui logit low age lwt i.race smoke if ftv<3 in 1/180, robust
                        local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "(\b`e(cmd)'\b|\b`e(depvar)'\b)", ""), "^(.+?)(\bif\b|\bin\b|\,)(.*)", "$1")
                        di "`wanted'"
                        
                        qui logit low age lwt i.race smoke in 1/180 if ftv<3, robust
                        local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "(\b`e(cmd)'\b|\b`e(depvar)'\b)", ""), "^(.+?)(\bif\b|\bin\b|\,)(.*)", "$1")
                        di "`wanted'"
                        Res.:

                        Code:
                        . 
                        . qui logit low age lwt i.race smoke
                        
                        . 
                        . local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "(\b`e(cmd)'\b|\b`e(depvar)'\b)", ""), "^(.+?)(\bif\b|\bin\b|\,)(.*)", "$1")
                        
                        . 
                        . di "`wanted'"
                          age lwt i.race smoke
                        
                        . 
                        . 
                        . 
                        . qui logit low age lwt i.race smoke, robust
                        
                        . 
                        . local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "(\b`e(cmd)'\b|\b`e(depvar)'\b)", ""), "^(.+?)(\bif\b|\bin\b|\,)(.*)", "$1")
                        
                        . 
                        . di "`wanted'"
                          age lwt i.race smoke
                        
                        . 
                        . 
                        . 
                        . qui logit low age lwt i.race smoke if ftv<3, robust
                        
                        . 
                        . local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "(\b`e(cmd)'\b|\b`e(depvar)'\b)", ""), "^(.+?)(\bif\b|\bin\b|\,)(.*)", "$1")
                        
                        . 
                        . di "`wanted'"
                          age lwt i.race smoke 
                        
                        . 
                        . 
                        . 
                        . qui logit low age lwt i.race smoke in 1/180, robust
                        
                        . 
                        . local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "(\b`e(cmd)'\b|\b`e(depvar)'\b)", ""), "^(.+?)(\bif\b|\bin\b|\,)(.*)", "$1")
                        
                        . 
                        . di "`wanted'"
                          age lwt i.race smoke 
                        
                        . 
                        . 
                        . 
                        . qui logit low age lwt i.race smoke if ftv<3 in 1/180, robust
                        
                        . 
                        . local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "(\b`e(cmd)'\b|\b`e(depvar)'\b)", ""), "^(.+?)(\bif\b|\bin\b|\,)(.*)", "$1")
                        
                        . 
                        . di "`wanted'"
                          age lwt i.race smoke 
                        
                        . 
                        . 
                        . 
                        . qui logit low age lwt i.race smoke in 1/180 if ftv<3, robust
                        
                        . 
                        . local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "(\b`e(cmd)'\b|\b`e(depvar)'\b)", ""), "^(.+?)(\bif\b|\bin\b|\,)(.*)", "$1")
                        
                        . 
                        . di "`wanted'"
                          age lwt i.race smoke


                        • #13
                          Thank you, Andrew!

                          I can't think of any other way of breaking this!

                          I will have an error for e(wtype) so I don't have to worry about a weight getting caught up with the xvars. I need to give some more thought to whether I should have an error for prefix (there is an e(prefix) macro that is returned).
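                          A sketch of what those checks might look like inside the program, assuming it should simply stop with an error. The messages and the return codes (101 for weights, 198 otherwise) are assumptions, not something settled in this thread.
                          Code:
                          // stop if the original model used weights
                          if "`e(wtype)'" != "" {
                              display as error "weights not allowed"
                              exit 101
                          }

                          // stop if the original model was fit with a prefix command
                          if "`e(prefix)'" != "" {
                              display as error "prefix `e(prefix)' not supported"
                              exit 198
                          }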

                          I'll go with this code, unless somebody can find a scenario in which it doesn't work.

                          Thanks to Andrew, Joseph, and Daniel for all your thoughtful ideas!

                          Ariel


                          • #14
                            Daniel,

                            Just to close the loop on your #8

                            summarize mpg if(foreign==1)
                            Andrew's code catches the "if" even when it is followed immediately by something, such as in the following:

                            Code:
                            logit low age lwt i.race smoke if(ftv<3), robust
                                   
                            local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "(\b`e(cmd)'\b|\b`e(depvar)'\b)", ""), "^(.+?)(\bif\b|\bin\b|\,)(.*)", "$1")
                            di "`wanted'"
                            Code:
                            age lwt i.race smoke


                            • #15
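                              Just beware of depvar abbreviations. e(depvar) returns the full variable name, whereas e(cmdline) keeps the name as it was typed, so a pattern built from e(depvar) will not find an abbreviated depvar in the command line. But since we know that the second word of the command line is the depvar, we can use that instead. Command name abbreviations are fine, because e(cmdline) stores the full command name.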

                              Code:
                              sysuse auto, clear
                              qui reg mp weight turn
                              di "`e(cmdline)'"
                              di "`e(cmd)'"
                              di "`e(depvar)'"
                              di word("`e(cmdline)'", 1)
                              di word("`e(cmdline)'", 2)
                              Res.:

                              Code:
                              . di "`e(cmdline)'"
                              regress mp weight turn
                              
                              .
                              . di "`e(cmd)'"
                              regress
                              
                              .
                              . di "`e(depvar)'"
                              mpg
                              
                              .
                              . di word("`e(cmdline)'", 1)
                              regress
                              
                              .
                              . di word("`e(cmdline)'", 2)
                              mp
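                              One way to fold this into the pattern from #12, as a sketch: take the second word of e(cmdline) as the typed depvar and use it in place of e(depvar). The macro name typed_depvar is only illustrative, and this still assumes no prefix command precedes the command name.
                              Code:
                              sysuse auto, clear
                              qui reg mp weight turn if foreign == 0, robust

                              // depvar exactly as typed (mp), not the expanded e(depvar) (mpg)
                              local typed_depvar = word("`e(cmdline)'", 2)

                              local wanted= ustrregexra(ustrregexra("`e(cmdline)'", "(\b`e(cmd)'\b|\b`typed_depvar'\b)", ""), "^(.+?)(\bif\b|\bin\b|\,)(.*)", "$1")
                              di "`wanted'"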
                              Last edited by Andrew Musau; 14 Apr 2024, 16:24.
