How to write ado-files and the syntax command

Esben Eriksen

Join Date: Jan 2022

Posts: 4
#1

19 Jan 2022, 04:10

Dear Statalist

I'm using version 16.1.

I have used a Bayesian method for estimating disease prevalence and the accuracy of diagnostic tests. The method is widely used (500+ citations), but I could not find any official Stata command or user-written command that applies it in my favorite statistical software, Stata. I have therefore written it myself using program define in a do-file. The code works well and produces correct results.

Now I want to make the program more generic, save it as an ado-file, and ultimately make it available to others as a user-written command. I have followed the Stata PDF manual on programming.

I would like the following syntax (I call the command nogoldone):

nogoldone a b alpha_prev beta_prev alpha_sn beta_sn alpha_sp beta_sp , sims() burnin() graph detail bdpar

where
a b are numerical arguments restricted to contain an integer
alpha_prev beta_prev alpha_sn beta_sn alpha_sp beta_sp numerical arguments - if possible with default values = 1
sims() burnin() are optional options with the default values = 25000 and 5000, respectively
graph detail and bdpar are optional options

I have read the manual on the syntax command, but can't find out how to do this.
My best guess so far has been something like:

Currently in the program I just do it like this, i.e. I call the numerical arguments from `1' [...] `8' and the options sims() and burnin() from `9' and `10'.

Code:

program define nogoldone , rclass
    version 16
    quietly {
        preserve
        tempfile simudat
        tempvar y1 y2 π s c
        save "`simudat'", emptyok
        use "`simudat'", clear
        set obs `9'
        tempname a b απ βπ αs βs αc βc p_alpha p_beta s_alpha s_beta c_alpha c_beta
        sca define `a' = `1'
        sca define `b' = `2'
        scalar define `απ' = `3'
        scalar define `βπ' = `4'
        scalar define `αs' = `5'
        scalar define `βs' = `6'
        scalar define `αc' = `7'
        scalar define `βc' = `8'
        gen `y1' = .
        gen `y2' = .
        gen `π' = .
        gen `s' = .
        gen `c' = .
        label variable `π' "Prevalence"
        label variable `s' "Sensitivity"
        label variable `c' "Specificity"
        replace `y1' = rbinomial(`a', 0.5) in 1
        replace `y2' = rbinomial(`b', 0.5) in 1
        [some more code calling the scalars `απ' `βπ' `αs' `βs' `αc' `βc' ]
        forvalues i = 2/`9' {
            [some more code running simulations ]
        }
        drop in 1/`10'
    }
    some code generating output, graphs, detailed output and beta distribution parameters as output
end

When I have the graph detail and bdpar options I will use it like this

Code:

if detail {
    [some code generating detailed rather than simple estimates as output]
}
if graph {
    [some code generating graphs as output]
}
if bdpar {
    [some code generating estimates of beta distribution parameters as output]
}

Hope you can guide me on how to
1) write the syntax command
2) call the arguments in the code

Best regards
Esben Eriksen
University of Copenhagen
Nick Cox

Join Date: Mar 2014

Posts: 35432
#2

19 Jan 2022, 04:20

Your syntax is programmable using gettoken as well. I would look inside the code of so-called immediate commands, such as ttesti

Code:

viewsource ttesti.ado
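
As a rough sketch of that gettoken approach applied to the syntax requested in #1 (not code taken from ttesti.ado): the program name nogoldone_sketch is hypothetical, and letting trailing prior parameters be omitted and default to 1 is one assumed reading of "if possible with default values = 1".

```stata
program define nogoldone_sketch, rclass
    version 16

    // Peel the two required integers off the front of `0'.
    gettoken a 0 : 0, parse(" ,")
    gettoken b 0 : 0, parse(" ,")
    confirm integer number `a'
    confirm integer number `b'

    // Up to six prior parameters may follow; each defaults to 1.
    local names alpha_prev beta_prev alpha_sn beta_sn alpha_sp beta_sp
    foreach name of local names {
        local `name' 1
        gettoken next : 0, parse(" ,")       // peek at the next token, `0' unchanged
        capture confirm number `next'
        if !_rc {
            gettoken `name' 0 : 0, parse(" ,")   // it is a number: consume it
        }
    }

    // What is left in `0' is the comma and the options (or nothing).
    syntax [, sims(integer 25000) burnin(integer 5000) graph detail bdpar]

    display as text "a = `a'  b = `b'"
    display as text "priors: `alpha_prev' `beta_prev' `alpha_sn' `beta_sn' `alpha_sp' `beta_sp'"
    display as text "sims = `sims'  burnin = `burnin'  flags: `graph' `detail' `bdpar'"

    return scalar sims   = `sims'
    return scalar burnin = `burnin'
end
```

With this layout, nogoldone_sketch 40 60 2 3, sims(1000) detail would set alpha_prev to 2 and beta_prev to 3, leave the other four priors at 1, and keep burnin at 5000. Positional defaults can only be dropped from the right, which is one reason to consider options instead.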

Jared Greathouse

Join Date: Sep 2021
Posts: 2170

19 Jan 2022, 08:21

Esben Eriksen, I think it might help to show some real syntax from one of my commands. I'm nowhere near as good as Nick Cox at this, but a minimal worked example might be in order. Everyone writes code differently, but I think my way happens to make programming a little more intuitive, or at least it does for me.

If I understand your question well, you want to learn how to use syntax in your program so you don't need to keep calling `1' `2' and so on (trust me, I've been there). So, let's look at the first section of a command I'm working on. It starts with a simple subroutine.

Code:

cap prog drop scul // Drops previous iterations of the program

*! SCUL v1.0.0, Jared Greathouse, 1/2/22
prog scul
    graph close _all
    graph drop _all
    
/**********************************************************
    * Installation*
Installs relevant commands needed.
**********************************************************/
    loc package st0594 gr0034 dm0042_3
    
    foreach x of loc package { // begin foreach

        qui: cap which cvlasso
    
            if _rc { // if command is missing

            qui: net inst `x'.pkg, replace
        
            } // ends if
    } // ends foreach
    
    loc comm gtools labvars
    
    foreach x of loc comm { // begin foreach

        qui: cap which `x'
    
            if _rc { // if command is missing

            qui: ssc inst `x', replace
        
            } // ends if
    } // ends foreach

    
    cap set scheme black_tableau
    
    if _rc {
        
        qui ssc inst schemepack, replace
    }    

/**********************************************************

    
    
    * Preliminaries*


If the data aren't a balanced panel, something about the
user's dataset ain't right.
**********************************************************/


cap qui: xtset
if _rc {
    
    disp as err "Data isn't xtset"
    exit 498
}
gl time: disp "`r(timevar)'"

gl panel: disp "`r(panelvar)'"

marksample touse

_xtstrbal $panel $time  `touse'

if _rc {
    
    disp as err "Data is not balanced."
    exit 498
}


    syntax anything, ///
        [TReated(varname)] /// We need a treatment variable as 0 1
        ahead(numlist min=1 max=1 >=1 int) /// Number of forecasting periods.
        trdate(numlist min=1 max=1 >=1 int) /// Give the date of treatment
        trunit(numlist min=1 max=1 >=1 int) /// Which unit was treated? Relevant only for single-intervention studies
        [PLAcebos] /// Conducts iterative assignment of the intervention at time t
        LAMBda(string) ///
[COVs(varlist)]
        
gettoken depvar anything: anything

local y_lab: variable label `depvar'

gl outlab: disp "`y_lab'" // Grabs the label of our outcome variable

        
/**********************************************************

    
    
    * Pre-Processing*


Assuming the user doesn't want placebo tests and hasn't specified
the multiple option, I presume they want the single-intervention
design. We break the command into two stages: data validation
and estimation.
**********************************************************/

if "`placebos'" != "placebos" { // thus......

preserve // Keep the primary long dataset the exact same

numcheck, unit($panel) time($time) depvar(`depvar') // Routine 1

Okay so let's see what we've got. We see my first routine where I check the validity of the panel variables as well as the outcome. Note that the numcheck command we see here is no different than any other Stata command. And I write the code for it below, using the syntax command. Here's the syntax for that routine.

Code:

cap prog drop numcheck // Subroutine 1.1
prog numcheck
// Original Data checking
syntax, unit(varname) time(varname) depvar(varname)
    
        
/*#########################################################

    * Section 1.1: Extract panel vars

    Before SCM can be done, we need panel data.
    
    
    Along with the R package, I'm checking that
    our main variables of interest, that is,
    our panel variables and outcomes are all:
    
    a) Numeric
    b) Non-missing and
    c) Non-Constant
    
*########################################################*/

di as txt "{hline}"
di as txt "Algorithm: Synthetic LASSO"
di as txt "{hline}"
di as txt "First Step: Data Setup"
di as txt "{hline}"
di as txt "Checking that setup variables make sense."

tempvar obs_count
qbys `unit' (`time'): g `obs_count' = _N
qui su `obs_count'

qui drop if `obs_count' < `r(max)'

/*The panel should be balanced, but in case it isn't somehow, we drop any variable
without the maximum number of observations (unbalanced) */


    foreach v of var `unit' `time' `depvar' {
    cap {    
        conf numeric v `v', ex // Numeric?
        
        as !mi(`v') // Not missing?
        
        qui: su `v'
        
        as r(sd) ~= 0 // Does the unit ID change?
    }
    }
    if !_rc {
        
        
        di as res "Setup successful!! All variables `unit' (ID), `time' (Time) and `depvar' (Outcome) pass."
        di as txt ""
        di as res "All are numeric, not missing and non-constant."
    }
    
    else if _rc {
        
        
        
        disp as err "All variables `unit' (ID), `time' (Time) and `depvar' must be numeric, not missing and non-constant."
        exit 498
    }
    

end

Okay, notice a few things here. The subroutine uses the exact same syntactic structures as the main program. The panel variable, time variable and outcome variable are passed in the same way; the only difference is that they are now declared as varname options. Stata takes the panel and time variables from the global macros I defined above, and the outcome variable from the gettoken call above, and checks that each one is a valid variable name. Note that Stata will spit out an error if we've misspelled any of the variables or a variable doesn't exist.
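
To see that checking in isolation, here is a stripped-down illustration. The program showvars is made up for this purpose and is not part of scul; it only echoes the local macros that syntax fills in.

```stata
capture program drop showvars
program define showvars
    version 16
    // Each option must name exactly one existing variable.
    syntax , unit(varname) time(varname)
    display as text "unit = `unit', time = `time'"
end

sysuse auto, clear
showvars, unit(make) time(rep78)

// A misspelled variable is rejected by syntax itself,
// before any of the program's own code runs.
capture noisily showvars, unit(nosuchvar) time(rep78)
display as text "return code: " _rc
```

The second call fails inside syntax, so the subroutine needs no existence checks of its own.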

This is a small example, but it's a useful way to think about it. It shows that ado programming follows, in principle and in practice, the same rules as writing a regular do-file; the only difference is that you replace your normal dataset variables with local macros and the like. This may be a weird analogy, but it's a little like riding a bike or driving. At first it can be a fearsome experience where you fall or (hopefully not!) hit inanimate objects, but once you learn it well, you'll wonder why you weren't writing your own syntax before.

A few more comments might be in order. When you're writing a command, it'll likely be hundreds or thousands of lines long, because it needs to be robust to a lot of situations and possibilities: you're writing a do-file that has to generalize. Given this, if you don't do anything else, I would advise two things. One, use subroutines as I do. It'll help you organize your thoughts and break complicated tasks into easier-to-understand, bite-sized ones.

Two, and sort of related: use comments. I use them profusely, and I still don't use them enough. When you're writing thousands of lines of code, it's easy to get lost quickly. Not every line needs a comment, but be sure to label your files in terms of sections, tests and related matters. You can easily edit your files with the user-written ado-edit, which opens your program (any program) in a do-file editor.

Whatever you do, find your own style and be organized.


Girish Venkataraman

Join Date: Dec 2021

Posts: 280
#4

24 Nov 2023, 13:22

Really appreciate your taking the time to share your ado file structuring with detailed thoughts, Jared Greathouse. There are so many aspects I could use from this. Was just scratching my head how to code installation of ssc packages and other user defined programs if they are missing, and your post totally answered how to code it in.

Bjarte Aagnes

Join Date: Apr 2014
Posts: 783

26 Nov 2023, 10:56

I would like the following syntax (I call the command nogoldone):
nogoldone a b alpha_prev beta_prev alpha_sn beta_sn alpha_sp beta_sp , sims() burnin() graph detail bdpar

which is not following standard syntax (-help language-):

Code:

   [prefix :] command [varlist] [=exp] [if] [in] [weight] [using filename] [, options]

The proposed syntax seems similar to immediate commands (-help immed-), ref #2. But, immediate commands typically have few arguments, like -iri-

Code:

iri #a #b #N1 #N2 [, iri_options]

The first lines of iri.ado are:

Code:

    gettoken a  0 : 0, parse(" ,")
    gettoken b  0 : 0, parse(" ,")
    gettoken n1 0 : 0, parse(" ,")
    gettoken n0 0 : 0, parse(" ,")

    confirm integer number `a'
    confirm integer number `b'
    confirm number `n1'
    confirm number `n0'

    if `a'<0 | `b'<0 | `n1'<0 | `n0'<0 {
        di in red "negative numbers invalid"
        exit 498
    }

    syntax [,          ///
          Level(cilevel) ///
          COL(string)      ///
          ROW(string)      ///
          ROW2(string)   ///
          TB          /// undocumented
          MIDP          ///
          EXACT      ///
          STIR         /// undocumented
           ]

If following the above: try to reduce the number of unnamed arguments by moving most of them to options.
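
A minimal sketch of that advice, under stated assumptions: the name nogoldone_opts is hypothetical, only a and b stay positional, the six prior parameters become options with default 1, and the burnin() < sims() check is my assumption based on the drop in 1/`10' line in #1.

```stata
program define nogoldone_opts, rclass
    version 16

    // The two required integers stay as immediate arguments.
    gettoken a 0 : 0, parse(" ,")
    gettoken b 0 : 0, parse(" ,")
    confirm integer number `a'
    confirm integer number `b'

    // Everything else is a named option with a default.
    syntax [, alpha_prev(real 1) beta_prev(real 1)     ///
              alpha_sn(real 1)   beta_sn(real 1)       ///
              alpha_sp(real 1)   beta_sp(real 1)       ///
              sims(integer 25000) burnin(integer 5000) ///
              graph detail bdpar]

    // Assumed sanity check: burn-in draws are dropped from the simulated draws.
    if `burnin' >= `sims' {
        display as error "burnin() must be smaller than sims()"
        exit 198
    }

    // A flag option leaves its own name in the local macro, or nothing.
    if "`detail'" != "" {
        display as text "detailed output requested"
    }
    if "`graph'" != "" {
        display as text "graphs requested"
    }
    if "`bdpar'" != "" {
        display as text "beta distribution parameters requested"
    }

    return scalar a = `a'
    return scalar b = `b'
end
```

A call would then look like nogoldone_opts 40 60, alpha_sn(2) beta_sn(3) sims(10000) detail. Any prior can be set on its own, and the flags are tested as strings rather than with a bare if detail.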

An alternative is a slightly non-standard syntax built on the syntax command's anything keyword (see -help syntax##description_of_anything-). One example:

Code:

prog define testITi, rclass

local 0 = subinstr("`0'", "=", "==", .) // avoid syntax error on "="

syntax anything,                     /// non-std syntax to be parsed again below
     graph                           /// some description
     detail                          /// some description
     bdpar                           /// some description
     [sims(integer 2500)]            /// some description

local 0 = ","+ustrregexra("`anything'",            /// non-std syntax
                          "\b(.+?)(==)(\d+?)\b",   /// match non-std syntax
                          "$1($3)"                 /// std opt syntax
                         )

syntax,                      /// parse `anything' modified to std opt syntax
     a(integer)              /// some description
     b(integer)              /// some description
     [alpha_prev(real 1)]    /// some description

* use args

di"`a'"
di"`b'"
di"`graph' `detail' `bdpar' sims=`sims' alpha_prev=`alpha_prev' "

end 

testITi a=12 b=34 , graph detail bdpar

Code:

12
34
graph detail bdpar sims=2500 alpha_prev=1
