  • Simple user defined function/program with custom input and output

    How can I create a function that takes user-defined values (strings, numbers, etc.) as input and returns user-defined values (strings, numbers, etc.) as output? I have searched online and it seems surprisingly complicated. Still, there are user-written packages in Stata that have (Python- or R-like) custom functions, so this should be possible within Stata as well.

    Here is an MWE in Python, but I would like to have the same in Stata. The `paste_text` function should suffice to get the idea, but for completeness I've also added `paste_text_with_conditions`, as that is what I ultimately want to achieve this time.

    Code:
    def paste_text(x:str,y:str):
        return f"I {x} want to output a string based on some parameters, {y}!"
    
    print(paste_text("really","thank you"))
    # output
    # I really want to output a string based on some parameters, thank you!
    
    def paste_text_with_conditions(x:str,z:int):
        if z >=1:
            string= f"I {x} want to output a string based on some parameters!"
        else:
            string= f"It {x} should not be that difficult"
        return string
    
    print(paste_text_with_conditions("really",0))
    # output
    # It really should not be that difficult
    Later I would like to use it, for example, in the following way:

    Code:
    use paste_text_with_conditions("text",1), clear
    Last edited by Samuel Saari; 21 Feb 2024, 23:45.

  • #2
    What do you mean by output: printed on screen, or left behind in memory ready to be accessed by other programs? Here is an example of how to print a string on screen in different ways depending on some other input.

    Code:
    program define foo
        syntax , toshow(string) otherstuff(string)
        if "`otherstuff'" == "something" {
            di as txt "`toshow'"
        }
        else {
            di as err "`toshow'"
        }
    end
    
    foo, toshow("Denkend aan Holland zie ik breede rivieren traag door oneindig laagland gaan") otherstuff("bla bla")
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    • #3
      Thanks, Maarten.

      By output I mean something that can be stored in memory (like with global x 1 in Stata, x=1 in Python, or x <- 1 in R) and used elsewhere later. In this case, I want a function/program that creates paths based on a couple of parameters: wave name and module number in a large survey.

      Then later I would like to use the functions as such:

      Code:
      use *string generated by function*, clear
      merge 1:1 id using *string generated by function*
      If your way is the way to go, then something like this:

      Code:
      // this code does not work, for demonstration purposes only
      cap nois program drop paste_text
      program define paste_text
          syntax , text(string) number(int)
          if number >= 7 {
              di as txt "Let's display text: " + "`text'" + "`number'"
          }
          else {
              di as txt "Let's write something else: " + "`text'" + "`number'"
          }
      end
      
      paste_text, text("this") number(6)
      But instead of displaying text it should be used like this...

      Code:
      use paste_text, text("this") number(6) , clear
      ...in the same way you would after defining a macro:

      Code:
      global path_to_file "C:\DATA\module_demographics_wave_1.dta"
      use "${path_to_file}",clear
      But this time I just want to add parameters so that the module and wave can be defined dynamically:

      Code:
      global path_to_file "C:\DATA\module_MODULE_wave_WAVE.dta" // module and wave would need to be added as parameters dynamically (like in the Python example in the OP)
      use path_to_file(MODULE="demographics",WAVE=1),clear
      Last edited by Samuel Saari; 22 Feb 2024, 03:04.

      • #4
        See help return for how to return results. Personally, I would write most of such a function in Mata, as Mata has nice functions for manipulating paths; see help pathjoin. The Stata program would then be mostly a wrapper for that Mata function. Also look at help syntax for how to pass arguments to a Stata program.
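
        A minimal sketch of that advice, assuming the file-naming pattern from #3 (module_MODULE_wave_WAVE.dta). The program name module_path and its options are hypothetical, not part of any existing package. The path is built in Mata with pathjoin() and left behind in r(path):

        Code:
        capture program drop module_path
        program define module_path, rclass
            version 14
            // module() and wave() are required; directory() defaults to the working directory
            syntax , MODule(string) Wave(string) [ DIRectory(string) ]
            if ("`directory'" == "") local directory "`c(pwd)'"
            local file "module_`module'_wave_`wave'.dta"
            // join directory and file name with Mata's pathjoin()
            mata: st_local("path", pathjoin(st_local("directory"), st_local("file")))
            return local path "`path'"
        end
        
        module_path, module(demographics) wave(1) directory("C:/DATA")
        display "`r(path)'"
        // use "`r(path)'", clear   // uncomment once the file exists

        The caller still needs two lines (run the program, then refer to `r(path)'), because a Stata command cannot be nested inside the argument of another command.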

        • #5
          I think a colleague of mine and I wrote something along those lines a couple of years ago (net describe nepstools , from(http://nocrypt.neps-data.de/stata)).

          If I understand the ultimate goal, a basic layout could be something along these lines:

          Code:
          program use_path_to_file
              
              version 18
              
              syntax anything [ , * ]
              
              gettoken module anything : anything
              gettoken wave   anything : anything
              
              if (`"`anything'"' != "") error 198
              
              use "module_`module'_wave_`wave'.dta" , `options'
              
          end
          With the above, you could call your demographic module of wave 1 as

          Code:
          cd C:\DATA
          use_path_to_file demographics 1, clear
          Last edited by daniel klein; 22 Feb 2024, 13:27.

          • #6
            Thank you, Daniel.

            Your code would work if I only had to use the use command. However, I would also like to use merge and possibly other commands that take paths. In addition, I would prefer to use Stata's native syntax rather than a wrapper.

            After yet another round of googling, I stumbled upon this Statalist post and tweaked it to get a half-satisfactory solution.

            The caveat is that I first have to define the next path to be used (nextpath below) on one line, and then refer to that stored value whenever I call use or merge.

            I wonder whether it is possible to get a one-liner solution instead, like this:

            Code:
            use nextpath dn 7,clear
            ...that would work like the two-line solution...

            Code:
            nextpath dn 7
            use "`s(path)'",clear
            Here is the full MWE of the current, two-line solution:

            Code:
            // works fine, but would prefer a one-liner
            cap nois program drop nextpath
            program define nextpath, sclass
                 args module wave
                 capture local wave =string(`wave')
                 local mydirectory ="C:\my_folder" // add your path
                 if "`wave'" =="9" { // update when needed
                     local release="rel0"
                 }
                 else {
                     local release="rel1" // update when needed
                 }
                 local new_path = "`mydirectory'\survey_w`wave'_`release'_ALL_datasets_stata\survey_w`wave'_`release'_`module'.dta"
                 display "Current module: `module' & wave: `wave':"
                 display "Current path is `new_path'"
                 sreturn local path "`new_path'"
            end
            
            
            nextpath dn 6
            di s(path)
            //
            use `s(path)',clear
            nextpath gv_health 8
            merge 1:1 mergeid using `s(path)',keepusing(maxgrip) nogenerate
            nextpath ca 9ca
            merge 1:1 mergeid using `s(path)',keepusing(cah006) nogenerate
            Last edited by Samuel Saari; 23 Feb 2024, 00:59.

            • #7
              Originally posted by Samuel Saari
              Your code would work if I only had to use the use-command. However, I also would like to use merge and possibly other commands that use paths.
              Our command, neps (admittedly, hard to find in the package I have linked to) does that -- for NEPS datasets. It essentially allows for any command with the generic syntax

              Code:
              command [ anything ] [ using ]

              Originally posted by Samuel Saari
              In addition, would also prefer using stata's native syntax and not a wrapper.
              That is not possible. At least not as

              Originally posted by Samuel Saari
              Code:
              use nextpath dn 7,clear
              It should be obvious that such syntax would require changes to the use command; that won't happen. Frankly, I do not understand why you would not want a wrapper. You are going to rely on a community-contributed / user-written command anyway. A wrapper seems the most natural choice.

              There are a couple of issues with your program.

              1. What is the purpose of

              Code:
              capture local wave =string(`wave')
              2. Do not use backslashes. They interfere with macro substitution (see Cox, 2008). Windows understands forward slashes and so do all other OS.

              3. Hardwiring the directory does not seem like a good idea. Changing directories then requires changing the program. I do not see any advantage over

              Code:
              cd mydirectory
              nextpath ...
              which uses the built-in Stata command cd to set (and, if necessary, change) the directory.

              4. The sub-directory "survey_w`wave'_`release'_ALL_datasets_stata" does not strike me as very generic either. By the way, Windows is not case-sensitive, so "ALL" to Windows is the same as "all"; just something to be aware of when working with filepaths.
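
              To illustrate point 2, here is a small sketch with a hypothetical folder C:\data and a local macro wave. A backslash placed directly before the opening quote of a macro reference escapes it, so the macro is not expanded; with a forward slash the reference is substituted as intended:

              Code:
              local wave 7
              // backslash directly before the macro reference: no expansion takes place
              display "C:\data\`wave'.dta"
              // forward slash: the macro reference is expanded
              display "C:/data/`wave'.dta"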


              Here is how I would set up a wrapper instead (not thoroughly tested, though):

              Code:
              program complete_path
                  
                  version 18
                  
                  syntax anything [ using/ ] [ if ] [ in ] [ , * ]
                  
                  if ("`using'" == "") {
                      
                      /*
                          <module> and <wave> are part of <anything>
                          
                          More specifically, we assume that <anything> is
                          
                              <command> "<module> <wave>"
                              
                          as in, e.g.,
                          
                              use "<module> <wave>"
                          
                          (double quotes optional)
                      */
                      
                      gettoken command anything : anything
                      gettoken module  anything : anything
                      gettoken wave    anything : anything
                      
                      if (`"`anything'"' != "") error 198
                      
                      local anything `command'
                      
                      local using `module' `wave'
                      
                  }
                  else local the_word_using "using"
                  
                  gettoken module wave_void : using
                  gettoken wave        void : wave_void
                  
                  if ("`void'" != "") error 198
                  
                  confirm integer number `wave'
                  
                  local release = cond(`wave'==9, 10, 11)
                  
                  local using "survey_w`wave'_`release'_ALL_datasets_stata"
                  local using "`using'/survey_w`wave'_`release'_`module'.dta"
                  
                  `anything' `the_word_using' "`using'" `if' `in' , `options'
                  
              end
              Drawing on your examples, the syntax would be

              Code:
              cd "C:\my_folder"
              complete_path use "dn 6", clear
              complete_path merge 1:1 mergeid using "gv_health 8", keepusing(maxgrip) nogenerate
              complete_path merge 1:1 mergeid using "ca 9ca", keepusing(cah006) nogenerate


              Cox, N. J. 2008. Stata tip 65: Beware the backstabbing backslash. The Stata Journal 8(3), 446--447.
              Last edited by daniel klein; 23 Feb 2024, 03:46.

              • #8
                As a detail qualifying the excellent advice from daniel klein, the point is that Stata for Windows understands forward slashes within filepaths as if they were backslashes and translates them when dealing with the operating system. Windows itself, as I understand it, is not flexible on this point.

                • #9
                  Daniel, I can only:
                  A. be ever grateful for doing this and pointing to the potential issues in my previous implementation. (I, in essence, only changed confirm integer number `wave' to capture local wave =string(`wave') and added quotes to the cond-statement arguments. That is not needed if all waves are integers).
                  B. be astonished how complicated this is compared to competing languages used for data analysis.

                  As for your questions:
                  1. The purpose of "capture local wave =string(`wave')" is to be able to pass anything to the function - integer or string, like 8 or 8ca - and yet be able to handle everything. I know it is hacky and ugly but it does the job here. Happy to hear a better way of doing it.
                  2. Good to know that about backslashes; I did not know it mattered.
                  3. Hardwiring the directory seems like a matter of taste, unless I am missing something. It seems most practical to do it like that here.
                  4. You might be right about the non-generality here, but that is, as far as I can see, how each and every dataset folder and file is named, so it suffices here.

                  Will see how this works in practice and get back should there still be something.

                  • #10
                    On:

                    Originally posted by Samuel Saari
                    B. be astonished how complicated this is compared to competing languages used for data analysis.
                    Not sure how SPSS would handle this. As for Python and the like, those are generic programming languages that can also be used for data analyses. I'd like to see how simple you can do, say, multiple imputation and multi-level modeling in those languages compared to Stata. As for R, I personally think it gives you the worst of both worlds. It is not as capable as a more generic programming language and at the same time not nearly as easy to use as Stata - especially in terms of consistent syntax for similar tasks. Having said that, what appears complicated or not often depends on our familiarity with the software. I guess the main difference between Stata and some alternatives is Stata's command-based approach which is pretty different from function-based approaches typically used in the more generic programming languages. Functions nest; commands do not.

                    Originally posted by Samuel Saari
                    1. The purpose of "capture local wave =string(`wave')" is to be able to pass anything to the function - integer or string, like 8 or 8ca - and yet be able to handle everything. I know it is hacky and ugly but does the job here. Happy to hear a better way of doing it.
                    The line is not necessary. Local macros are always strings in Stata. If you want to allow for strings, you only need to add the double quotes in the cond() function, as you did:

                    Code:
                    local release = cond("`wave'"=="9", 10, 11)
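
                    A small sketch of this point, using a hypothetical wave label 9ca: because both sides of the comparison are strings, a non-numeric label needs no conversion before it is tested.

                    Code:
                    local wave 9ca
                    local release = cond("`wave'" == "9", 10, 11)
                    // displays 11, because the string "9ca" is not equal to "9"
                    display "`release'"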

                    • #11
                      With regards to
                      Code:
                      capture local wave =string(`wave')
                      it could just be removed as long as confirm integer number `wave' is deleted as well.

                      As for my complaints about Stata's ease of use in this case and your response to them: I guess every language has its pros and cons in general, and all have applications where both become visible. All of us also have our biases and tend to favor whatever is familiar. Stata has its merits, but not having a convenient way of creating custom functions is definitely a drawback for anyone who likes to work that way. But your generous help is yet another demonstration that many different tasks can ultimately be accomplished in many different languages.
