
  • How to write ado-files and the syntax command

    Dear Statalist

    I'm using version 16.1.

    I have used a Bayesian method for estimating disease prevalence and the accuracy of diagnostic tests. The method is widely used (500+ citations); however, I did not find any official Stata command or user-written command that applies the method in my favorite statistical software, Stata. I have therefore written it myself using program define in a do-file. The code works well and produces correct results.

    Now I want to make the program more generic, save it as an ado-file, and ultimately make it available to others as a user-written command. I have followed the Stata PDF manual on programming.

    I would like the following syntax (I call the command nogoldone):

    nogoldone a b alpha_prev beta_prev alpha_sn beta_sn alpha_sp beta_sp , sims() burnin() graph detail bdpar

    where
    a b are numerical arguments restricted to contain an integer
    alpha_prev beta_prev alpha_sn beta_sn alpha_sp beta_sp numerical arguments - if possible with default values = 1
    sims() burnin() are optional options with the default values = 25000 and 5000, respectively
    graph detail and bdpar are optional options

    I have read the manual on the syntax command, but can't figure out how to do this.
    My best guess so far has been something like:

    Currently in the program I just do it like this, i.e. I call the numerical arguments from `1' [...] `8' and the options sims() and burnin() from `9' and `10'.

    Code:
    program define nogoldone , rclass
    
    version 16
    
    quietly {
    preserve
    
    tempfile simudat
    
    tempvar y1 y2 π s c
    
    
    save "`simudat'", emptyok 
    use "`simudat'", clear
    
    set obs `9'
    
    tempname a b απ βπ αs βs αc βc  p_alpha p_beta s_alpha s_beta c_alpha c_beta
    
    sca define `a' = `1'
    sca define `b' = `2'
    scalar define  `απ' = `3'
    scalar define  `βπ' = `4'
    scalar define  `αs' = `5'
    scalar define  `βs' = `6'
    scalar define  `αc' = `7'
    scalar define  `βc' = `8'
    
    gen `y1' =.
    gen `y2'=.
    gen `π'=.
    gen `s' =.
    gen `c'=.
    
    label variable `π' "Prevalence"
    label variable  `s' "Sensitivity"
    label variable `c' "Specificity"
    
    replace `y1'= rbinomial(`a', 0.5) in 1 
    replace `y2'= rbinomial(`b', 0.5) in 1
    
    [some more code calling the scalars `απ'   `βπ' `αs'   `βs'   `αc' `βc' ]
    
    
    forvalues i = 2/`9' {
    
    [some more code running simulations ]
    
    }
    
    drop in 1/`10' 
    }
    
    some code generating output, graphs, detailed output and beta distribution parameters as output
    
    end

    When I have the graph detail and bdpar options I will use it like this


    Code:
    if detail { 
    [some code generating detailed rather than simple estimates as output]
    }
    
    if graph { 
    [some code generating graphs as output]
    }
    
    if bdpar { 
    [some code generating estimates of beta distribution parameters as output]
    }

    Hope you can guide me on how to
    1) write the syntax command
    2) call the arguments in the code

    Best regards
    Esben Eriksen
    University of Copenhagen

  • #2
    Your syntax is programmable using gettoken as well. I would look inside the code of so-called immediate commands, such as ttesti.

    Code:
    viewsource ttesti.ado 



    • #3
      Esben Eriksen, I think it might help to show a little real syntax from one of my commands to help you along here. I'm nowhere near as good as Nick Cox at this, but a minimal worked example might be in order. Everyone writes code differently, but I think my approach makes programming a little more intuitive, or at least it does for me.


      If I understand your question correctly, you want to learn how to use syntax in your program so you don't need to keep calling `1' `2' and so on (trust me, I've been there). So, let's look at the first section of a command I'm working on. It starts with a simple subroutine.
      Code:
      cap prog drop scul // Drops previous iterations of the program
      
      *! SCUL v1.0.0, Jared Greathouse, 1/2/22
      prog scul
          graph close _all
          graph drop _all
          
      /**********************************************************
          * Installation*
      Installs relevant commands needed.
      **********************************************************/
          loc package st0594 gr0034 dm0042_3
          
          foreach x of loc package { // begin foreach
      
              qui: cap which cvlasso
          
                  if _rc { // if command is missing
      
                  qui: net inst `x'.pkg, replace
              
                  } // ends if
          } // ends foreach
          
          loc comm gtools labvars
          
          foreach x of loc comm { // begin foreach
      
              qui: cap which `x'
          
                  if _rc { // if command is missing
      
                  qui: ssc inst `x', replace
              
                  } // ends if
          } // ends foreach
      
          
          cap set scheme black_tableau
          
          if _rc {
              
              qui ssc inst schemepack, replace
          }    
      
      /**********************************************************
      
          
          
          * Preliminaries*
      
      
      If the data aren't a balanced panel, something about the
      user's dataset ain't right.
      **********************************************************/
      
      
      cap qui: xtset
      if _rc {
          
          disp as err "Data isn't xtset"
          exit 498
      }
      gl time: disp "`r(timevar)'"
      
      gl panel: disp "`r(panelvar)'"
      
      marksample touse
      
      _xtstrbal $panel $time  `touse'
      
      if _rc {
          
          disp as err "Data is not balanced."
          exit 498
      }
      
      
          syntax anything, ///
              [TReated(varname)] /// We need a treatment variable as 0 1
              ahead(numlist min=1 max=1 >=1 int) /// Number of forecasting periods.
              trdate(numlist min=1 max=1 >=1 int) /// Give the date of treatment
              trunit(numlist min=1 max=1 >=1 int) /// Which unit was treated? Relevant only for single-intervention studies
              [PLAcebos] /// Conducts iterative assignment of the intervention at time t
              LAMBda(string) ///
      [COVs(varlist)]
              
      gettoken depvar anything: anything
      
      local y_lab: variable label `depvar'
      
      gl outlab: disp "`y_lab'" // Grabs the label of our outcome variable
      
              
      /**********************************************************
      
          
          
          * Pre-Processing*
      
      
      Assuming the user doesn't want placebo tests and hasn't specified
      the multiple option, I presume they want the single-intervention
      design. We break the command into two stages: data validation
      and estimation.
      **********************************************************/
      
      if "`placebos'" != "placebos" { // thus......
      
      preserve // Keep the primary long dataset the exact same
      
      numcheck, unit($panel) time($time) depvar(`depvar') // Routine 1
      Okay, so let's see what we've got. We see my first routine, where I check the validity of the panel variables as well as the outcome. Note that the numcheck command we see here is no different from any other Stata command, and I write the code for it below using the syntax command. Here's the syntax for that routine.
      Code:
      cap prog drop numcheck // Subroutine 1.1
      prog numcheck
      // Original Data checking
      syntax, unit(varname) time(varname) depvar(varname)
          
              
      /*#########################################################
      
          * Section 1.1: Extract panel vars
      
          Before SCM can be done, we need panel data.
          
          
          Along with the R package, I'm checking that
          our main vairables of interest, that is,
          our panel variables and outcomes are all:
          
          a) Numeric
          b) Non-missing and
          c) Non-Constant
          
      *########################################################*/
      
      di as txt "{hline}"
      di as txt "Algorithm: Synthetic LASSO"
      di as txt "{hline}"
      di as txt "First Step: Data Setup"
      di as txt "{hline}"
      di as txt "Checking that setup variables make sense."
      
      tempvar obs_count
      qbys `unit' (`time'): g `obs_count' = _N
      qui su `obs_count'
      
      qui drop if `obs_count' < `r(max)'
      
      /*The panel should be balanced, but in case it isn't somehow, we drop any unit
      without the maximum number of observations (unbalanced) */
      
      
          foreach v of var `unit' `time' `depvar' {
          cap {    
              conf numeric v `v', ex // Numeric?
              
              as !mi(`v') // Not missing?
              
              qui: su `v'
              
              as r(sd) ~= 0 // Does the unit ID change?
          }
          }
          if !_rc {
              
              
              di as res "Setup successful!! All variables `unit' (ID), `time' (Time) and `depvar' (Outcome) pass."
              di as txt ""
              di as res "All are numeric, not missing and non-constant."
          }
          
          else if _rc {
              
              
              
              disp as err "All variables `unit' (ID), `time' (Time) and `depvar' must be numeric, not missing and non-constant."
              exit 498
          }
          
      
      end
      Okay, notice a few things here. I include the exact same syntactic structures as before. The panel variable, time variable and outcome variable are specified in the exact same way; the only difference is that they are now declared as varnames. The panel and time variables arrive through the global macros I defined above, and the outcome variable is the one I extracted with gettoken above. Because each option is a varname, Stata will spit out an error if we've misspelled any of the variables or the variable doesn't exist.
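
      To make the varname check concrete, here is a minimal sketch, separate from the command above. The program name showvar and its unit() option are made up for illustration, and the auto dataset shipped with Stata stands in for real panel data.
      Code:
      ```stata
      capture program drop showvar
      program define showvar
          version 16
          // varname makes syntax verify that the variable exists in memory
          syntax , unit(varname)
          display as text "unit variable is `unit'"
      end

      sysuse auto, clear
      showvar, unit(price)
      // A misspelled name is rejected by syntax before any of our own code runs
      capture noisily showvar, unit(prise)
      display "return code: " _rc
      ```
      The second call should stop with a "not found" style error and a nonzero return code, without any checking code written by hand.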

      This is a small example, but it's quite useful to think about it in this way. It shows you that the rules of ado programming are, in principle and in practice, the same as those for writing a regular do-file; the only difference is that you're replacing your normal dataset variables with local macros and so on. This may be a weird analogy, but it's a little like riding a bike or driving. At first it can be a fearsome experience where you fall or (hopefully not!) hit inanimate objects, but once you learn it and learn it well, you'll wonder why you weren't writing your own syntax before. A few more comments might be in order. When you're writing a command, it'll likely be hundreds or thousands of lines long. This is because what you're doing needs to be robust to a lot of situations and possibilities: you're writing a do-file that has to generalize to many other settings. Given this, if you don't do anything else, I would advise two things. One, use subroutines as I do. It'll help you organize your thoughts better, as well as break down complicated tasks into easier to understand, bite-sized tasks.

      Two, and sort of related to this: use comments. I use them profusely, and I still don't use them enough. When you're writing thousands of lines of code, it's quite easy to become lost quickly. Not every line needs a comment, but be sure to label your files in terms of sections, tests and related matters. You can easily edit your files with the user-written adoedit, which opens your program (any program) in a do-file editor.

      Whatever you do, find your own style and be organized.



      • #4
        Really appreciate your taking the time to share your ado file structuring with detailed thoughts, Jared Greathouse. There are so many aspects I could use from this. Was just scratching my head how to code installation of ssc packages and other user defined programs if they are missing, and your post totally answered how to code it in.



        • #5
          I would like the following syntax (I call the command nogoldone):
          nogoldone a b alpha_prev beta_prev alpha_sn beta_sn alpha_sp beta_sp , sims() burnin() graph detail bdpar
          which is not following standard syntax (-help language-):
          Code:
             [prefix :] command [varlist] [=exp] [if] [in] [weight] [using filename] [, options]
          The proposed syntax seems similar to immediate commands (-help immed-), ref #2. But, immediate commands typically have few arguments, like -iri-
          Code:
          iri #a #b #N1 #N2 [, iri_options]
          The first lines of iri.ado are:
          Code:
              gettoken a  0 : 0, parse(" ,")
              gettoken b  0 : 0, parse(" ,")
              gettoken n1 0 : 0, parse(" ,")
              gettoken n0 0 : 0, parse(" ,")
          
              confirm integer number `a'
              confirm integer number `b'
              confirm number `n1'
              confirm number `n0'
          
              if `a'<0 | `b'<0 | `n1'<0 | `n0'<0 {
                  di in red "negative numbers invalid"
                  exit 498
              }
          
              syntax [,          ///
                    Level(cilevel) ///
                    COL(string)      ///
                    ROW(string)      ///
                    ROW2(string)   ///
                    TB          /// undocumented
                    MIDP          ///
                    EXACT      ///
                    STIR         /// undocumented
                     ]
          If you follow the approach above, try to reduce the number of unnamed arguments by moving most of them to options.
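
          As a rough sketch of how that could look for nogoldone (not a tested implementation of the method): a and b are read with gettoken as in iri.ado, and everything else becomes an option. Turning the six prior parameters into options with these names is an assumption; the defaults of 1, 25000 and 5000 are the ones asked for in #1. The estimation code itself is left out.
          Code:
          ```stata
          capture program drop nogoldone
          program define nogoldone, rclass
              version 16

              // Peel the two required counts off the command line, as iri.ado does
              gettoken a 0 : 0, parse(" ,")
              gettoken b 0 : 0, parse(" ,")
              confirm integer number `a'
              confirm integer number `b'

              // Assumed option layout: priors default to 1, sims() and burnin()
              // default to 25000 and 5000, and the three switches are optional
              syntax [, alpha_prev(real 1) beta_prev(real 1)     ///
                        alpha_sn(real 1)   beta_sn(real 1)       ///
                        alpha_sp(real 1)   beta_sp(real 1)       ///
                        sims(integer 25000) burnin(integer 5000) ///
                        graph detail bdpar]

              // An on/off option is a local macro that is empty unless it was typed
              if "`detail'" != "" {
                  display as text "detailed output requested"
              }

              // Return the parsed values so the parsing can be checked
              return scalar a          = `a'
              return scalar b          = `b'
              return scalar sims       = `sims'
              return scalar burnin     = `burnin'
              return scalar alpha_prev = `alpha_prev'
          end

          nogoldone 12 34, sims(1000) detail
          return list
          ```
          Inside the program the values are then available as `a', `b', `alpha_prev', `sims' and so on, instead of `1' to `10', and graph, detail and bdpar are tested with if "`graph'" != "" rather than if graph.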

          An alternative implementation may be to allow some slightly non-standard syntax using the syntax command's keyword anything (see -help syntax##description_of_anything-). One example:
          Code:
          prog define testITi, rclass
          
          local 0 = subinstr("`0'", "=", "==", .) // avoid syntax error on "="
          
          syntax anything,                     /// non-std syntax to be parsed again below
               graph                           /// some description
               detail                          /// some description
               bdpar                           /// some description
               [sims(integer 2500)]            /// some description
          
          local 0 = ","+ustrregexra("`anything'",            /// non-std syntax
                                    "\b(.+?)(==)(\d+?)\b",   /// match non-std syntax
                                    "$1($3)"                 /// std opt syntax
                                   )
          
          syntax,                      /// parse `anything' modified to std opt syntax
               a(integer)              /// some description
               b(integer)              /// some description
               [alpha_prev(real 1)]    /// some description
          
          * use args
          
          di"`a'"
          di"`b'"
          di"`graph' `detail' `bdpar' sims=`sims' alpha_prev=`alpha_prev' "
          
          end 
          
          testITi a=12 b=34 , graph detail bdpar
          Code:
          12
          34
          graph detail bdpar sims=2500 alpha_prev=1

