  • Passing arguments from Stata to Python (or Stata equivalent to "itertools" )

    Dear Readers

    Is there an equivalent in Stata to Python's "itertools"? For a large number of cases I have lists of numbers of different lengths, and for each list I want to find all the combinations of its numbers that add up to a given number. In Python this can be done as follows, for example with the list of numbers [0.02, 0.01, 0.18, 0.04, 0.03] and the target 0.25:

    . python
    ----------------------------------------------- python (type end to exit) ---------------------------------------------------------------------------------------------------------------------------------------
    >>> import itertools
    ... numbers = [0.02, 0.01, 0.18, 0.04, 0.03]
    >>>
    >>> result = [seq for i in range(len(numbers), 0, -1) for seq in itertools.combinations(numbers, i) if sum(seq) == 0.25]
    >>> print result
    [(0.02, 0.01, 0.18, 0.04), (0.18, 0.04, 0.03)]
    >>> end

    As a way round this I have written a python script which I call from Stata

    Script:

    #python script combinations.py
    import sys
    import itertools
    print sys.argv
    arguments = len(sys.argv) - 1
    arguments
    position = 1
    while (arguments >= position):
        print ("parameter %i: %s" % (position, sys.argv[position]))
        position = position + 1
    numbers = str(sys.argv[1])
    print numbers
    # seems to be list of strings
    length = len(numbers)
    length
    goal = float(sys.argv[2])
    print goal
    result = [seq for i in range(len(numbers), 0, -1) for seq in itertools.combinations(numbers, i) if sum(seq) == goal]
    print result

    Call from Stata

    local numbers `"[0.02, 0.01, 0.18, 0.04, 0.03]"'
    local target = 0.25
    python script [path]/combinations.py, args(`"`numbers'"' "`target'")

    which gives:

    . python script d:/india/banerjee&iyer/gis/dams/nrld/DoFiles/python/combinations.py, args(`"`numbers'"' "`target'")
    ['d:/india/banerjee&iyer/gis/dams/nrld/DoFiles/python/combinations.py', '[0.02, 0.01, 0.18, 0.04, 0.03]', '.25']
    parameter 1: [0.02, 0.01, 0.18, 0.04, 0.03]
    parameter 2: .25
    [0.02, 0.01, 0.18, 0.04, 0.03]
    0.25
    Traceback (most recent call last):
    File "d:/india/banerjee&iyer/gis/dams/nrld/DoFiles/python/combinations.py", line 24, in <module>
    result = [seq for i in range(len(numbers), 0, -1) for seq in itertools.combinations(numbers, i) if sum(seq) == goal]
    TypeError: unsupported operand type(s) for +: 'int' and 'str'
    failed to execute the specified Python script file
    r(7103);

    end of do-file

    But running the following directly in Stata gives the correct results:

    ************************************************** ***********
    local numbers `"[0.02, 0.01, 0.18, 0.04, 0.03]"'
    local target = 0.25
    python
    import sys
    import itertools
    numbers = `numbers'
    print numbers
    target = float(`target')
    print target
    result = [seq for i in range(len(numbers), 0, -1) for seq in itertools.combinations(numbers, i) if sum(seq) == target]
    print result
    end
    ************************************************** **************

    . local numbers `"[0.02, 0.01, 0.18, 0.04, 0.03]"'

    . local target = 0.25

    . python
    ----------------------------------------------- python (type end to exit) ---------------------------------------------------------------------------------------------------------------------------------------
    >>> import sys
    >>> import itertools
    >>> numbers = `numbers'
    >>> print numbers
    [0.02, 0.01, 0.18, 0.04, 0.03]
    >>> target = float(`target')
    >>> print target
    0.25
    >>> result = [seq for i in range(len(numbers), 0, -1) for seq in itertools.combinations(numbers, i) if sum(seq) == target]
    >>> print result
    [(0.02, 0.01, 0.18, 0.04), (0.18, 0.04, 0.03)]
    >>> end

    The problem seems to be that the "numbers" argument is passed entirely as a string through the args() option of the "python script" call, but I cannot work out how to get around it.

    Thanks for help

    Richard
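A minimal sketch of why the script fails, assuming the behaviour shown in the traceback above: every entry of sys.argv is a str, so itertools.combinations walks the characters of the string, and sum() then tries to add the integer 0 to a str. The variable names below are illustrative and not part of the original script.

```python
import itertools

# What the script receives from args(): a single string, not a list.
raw = "[0.02, 0.01, 0.18, 0.04, 0.03]"

# combinations() iterates over a string character by character.
first_pair = next(itertools.combinations(raw, 2))
print(first_pair)  # ('[', '0')

# sum() starts from the integer 0, so adding a str raises TypeError.
try:
    sum(first_pair)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

The string therefore has to be converted into a list of numbers inside the script before it is handed to itertools.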



  • #2
    I don't have any experience using Python in Stata so I can't help you with that part, but this can also be done with just Stata, using tuples from SSC. Your Python code may be faster though.
    Code:
    program sumlist, rclass
        local sum = 0
        foreach n of local 0 {
            local sum = `sum' + `n'
        }
        return local sumlist = `sum'
    end
    
    program sumequals, rclass
        args numlist sum
        tuples `numlist'
        forval i = 1/`ntuples' {
            sumlist `tuple`i''
            if `r(sumlist)' == `sum' local sumequalslist `"`sumequalslist' "`tuple`i''""'
        }
        di `"`sumequalslist'"'
        return local list "`sumequalslist'"
    end
    
    local numbers 0.02 0.01 0.18 0.04 0.03
    sumequals "`numbers'" .25
    Code:
    . sumequals "`numbers'" .25
     "0.18 0.04 0.03" "0.02 0.01 0.18 0.04"



    • #3
      Have you tried removing the double quotes around the numbers macro?
      Regards
      --------------------------------------------------
      Attaullah Shah, PhD.
      Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
      FinTechProfessor.com
      https://asdocx.com
      Check out my asdoc program, which sends outputs to MS Word.
      For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.



      • #4
        Fantastic - Wouter's code works a treat. I had forgotten about "tuples".

        Attaullah's suggestion unfortunately does not work - though I can explore that further:


        . local numbers `[0.02, 0.01, 0.18, 0.04, 0.03]'

        . local target = 0.25

        . python script d:/india/banerjee&iyer/gis/dams/nrld/DoFiles/python/combinations.py, args(`"`numbers'"' "`target'")
        ['d:/india/banerjee&iyer/gis/dams/nrld/DoFiles/python/combinations.py', '', '.25']
        parameter 1:
        parameter 2: .25

        0.25
        []


        I would still like to know how to pass the parameters to Python (which I have to use for (Arc)GIS stuff).

        Thanks again

        Richard



        • #5
          FWIW, the latest version of tuples actually calls Python's itertools.



          • #6
            Hi Richard,

            Related to:

            "I would still like to know how to pass the parameters to Python (which I have to use for (Arc)GIS stuff)."

            As you note, the arguments are being passed as a string.

            To get the passing to work better, try changing:

            Code:
            local numbers `"[0.02, 0.01, 0.18, 0.04, 0.03]"'
            to

            Code:
            local numbers "0.02 0.01 0.18 0.04 0.03"
            in your Stata code, and then parse the string into a list of floats in your Python code. That is, change:

            Code:
            numbers = str(sys.argv[1])
            to

            Code:
            numbers = [float(x) for x in sys.argv[1].split(" ")]
            This last component splits the input arguments on spaces and "de-strings" them into floats that can be used.

            For instance, this Python code:

            Code:
            import sys
            import itertools
            print(sys.argv)
            numbers = [float(x) for x in sys.argv[1].split(" ")]
            print(numbers)
            result = [seq for i in range(len(numbers), 0, -1) for seq in itertools.combinations(numbers, i) if sum(seq) == float(sys.argv[2])]
            print(result)
            Works when called as:

            Code:
            . local numbers "0.02 0.01 0.18 0.04 0.03"
            
            . local target = 0.25
            
            . python script ./combinations.py, args(`"`numbers'"' "`target'")
            ['./combinations.py', '0.02 0.01 0.18 0.04 0.03', '.25']
            [0.02, 0.01, 0.18, 0.04, 0.03]
            [(0.02, 0.01, 0.18, 0.04), (0.18, 0.04, 0.03)]
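The parsing step above can be sketched as a pair of small functions. This is only an illustration: the names parse_numbers and combos_summing_to and the tolerance tol are assumptions, not part of the posted script. The tolerance is an addition; comparing floating-point sums with == works for these inputs, but it can miss matches for other values, so the sketch compares within a small tolerance instead.

```python
import itertools


def parse_numbers(arg):
    """Turn a space-separated string such as '0.02 0.01' into a list of floats."""
    # split() with no argument also tolerates repeated spaces.
    return [float(x) for x in arg.split()]


def combos_summing_to(numbers, goal, tol=1e-9):
    """Return every combination of numbers whose sum is within tol of goal."""
    return [
        seq
        for i in range(len(numbers), 0, -1)
        for seq in itertools.combinations(numbers, i)
        if abs(sum(seq) - goal) <= tol
    ]


# Stand-ins for sys.argv[1] and sys.argv[2] as Stata would pass them.
numbers = parse_numbers("0.02 0.01 0.18 0.04 0.03")
result = combos_summing_to(numbers, float(".25"))
print(result)
```

In the script itself, the two stand-in strings would be replaced by sys.argv[1] and sys.argv[2].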
            Last edited by Joseph Luchman; 01 Jul 2020, 07:12. Reason: Removed commas in Stata local macro "numbers"
            Joseph Nicholas Luchman, Ph.D., PStat® (American Statistical Association)
            ----
            Research Fellow
            Fors Marsh

            ----
            Version 18.0 MP



            • #7
              Thanks Daniel - yes, after setting "trace on" (and reading the "tuples" help) I saw that "tuples" uses "itertools" unless the "nopython" option is specified.

              Thanks also Joseph - I am very new to Python, so your code helps me up the learning curve.

              I am hoping it will help with a related problem. When I do "tuples" on a set of strings it returns the tuples, but if there is a space in one string it returns the string with the space unadorned, so when I come to use it the particular element becomes two (or more).

              For example:

              local neighbours `" "a" "b" "c d""'
              tuples `neighbours', display

              . local neighbours `" "a" "b" "c d""'

              . tuples `neighbours', display
              tuple1: c d
              tuple2: b
              tuple3: a
              tuple4: b c d
              tuple5: a c d
              tuple6: a b
              tuple7: a b c d

              This is wrong for me as I need to distinguish only "a" "b" and "c d" in the returned tuples. I need to get "tuples" to return the separate elements.

              All ideas very gratefully .....

              Richard
              PS - embarrassingly I have to ask - how do I embed code in Statalist postings?



              • #8
                Hi Richard,

                Local macros won't keep those double quotes as they're meaningful to Stata as indicators for strings and will be stripped.

                Consider putting in other delimiters like commas. For instance:

                Code:
                . tuples ",a" ",b" ",c d", display
                tuple1: ,c d
                tuple2: ,b
                tuple3: ,a
                tuple4: ,b ,c d
                tuple5: ,a ,c d
                tuple6: ,a ,b
                tuple7: ,a ,b ,c d
                You can then use -gettoken- or similar other methods (e.g., tokens() in Mata) to break them up into a matrix or separate macros.

                Code:
                . mata: tokens(st_local("tuple7"),",")
                         1     2     3     4     5     6
                    +-------------------------------------+
                  1 |    ,    a      ,    b      ,   c d  |
                    +-------------------------------------+
                
                .
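For readers working on the Python side, the same delimiter idea can be sketched as follows. This is an illustration only: split_elements is a hypothetical helper, and the input string is copied from the tuple7 output above.

```python
def split_elements(tuple_string, delimiter=","):
    """Split a delimiter-prefixed tuple string into its elements."""
    pieces = tuple_string.split(delimiter)
    # Drop the empty piece before the first delimiter and trim padding spaces.
    return [piece.strip() for piece in pieces if piece.strip()]


elements = split_elements(",a ,b ,c d")
print(elements)  # ['a', 'b', 'c d']
```

Because the split happens on the comma rather than on spaces, "c d" survives as a single element.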
                Try putting [ CODE ] (without spaces) at the beginning of your code and [ /CODE ] (again without spaces) at the end to get it to display as code on Statalist.

                - joe



                • #9
                  Hello Joe

                  thanks for this - it may work, but I got around the problem by replacing spaces with underscores:
                  Code:
                  replace name = subinstr(name, " ", "_", .)
                  BUT - tuples is now crashing Stata MP (16.1, 29/9/2020) on this computer (Windows 10 Pro, version 10.0.18363) but not on my laptop (same Stata, Windows 10 Home, version 10.0.19041).

                  Code:
                  tuples 1 2 3 , di
                  Stata window disappears. Any ideas?

                  A bit annoying. Fortunately I don't have to run tuples just now, but I may expect to in due course.

                  Thanks to all again


                  Richard



                  • #10
                    Hi Richard Palmer-Jones,

                    The arguments passed to the .py file through the args() option of the -python script- command are stored as strings. In your case you can call the literal_eval() function of the ast module to convert the string representation of your list into a real list.

                    Code:
                    #python script combinations.py
                    import sys
                    import itertools
                    import ast
                    
                    print(sys.argv)
                    
                    arguments = len(sys.argv) - 1
                    arguments
                    position = 1
                    
                    while (arguments >= position):
                        print("parameter %i: %s" % (position, sys.argv[position]))
                        position = position + 1
                        
                    numbers = str(sys.argv[1])
                    print(numbers)
                    
                    # seems to be list of strings
                    numbers = ast.literal_eval(numbers) 
                    
                    length = len(numbers)
                    print(length)
                    goal = float(sys.argv[2])
                    print(goal)
                    result = [seq for i in range(len(numbers), 0, -1) for seq in itertools.combinations(numbers, i) if sum(seq) == goal]
                    print(result)
                    Then in Stata,

                    Code:
                    . local numbers `"[0.02, 0.01, 0.18, 0.04, 0.03]"'
                    
                    . local target = 0.25
                     
                    . python script combinations.py, args(`"`numbers'"' "`target'")
                    ['combinations.py', '[0.02, 0.01, 0.18, 0.04, 0.03]', '.25']
                    parameter 1: [0.02, 0.01, 0.18, 0.04, 0.03]
                    parameter 2: .25
                    [0.02, 0.01, 0.18, 0.04, 0.03]
                    5
                    0.25
                    [(0.02, 0.01, 0.18, 0.04), (0.18, 0.04, 0.03)]
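A minimal sketch of the conversion step on its own, with error handling added as an assumption (the posted script does not include it). parse_list is a hypothetical helper name. ast.literal_eval only accepts Python literals, so malformed or non-literal input raises ValueError or SyntaxError rather than being executed.

```python
import ast


def parse_list(arg):
    """Convert a string such as '[0.02, 0.01]' into a list of floats."""
    try:
        value = ast.literal_eval(arg)
    except (ValueError, SyntaxError):
        raise ValueError("not a valid Python literal: %r" % arg)
    if not isinstance(value, (list, tuple)):
        raise ValueError("expected a list, got %s" % type(value).__name__)
    return [float(x) for x in value]


# Stand-in for sys.argv[1] as Stata would pass it.
numbers = parse_list("[0.02, 0.01, 0.18, 0.04, 0.03]")
print(numbers)  # [0.02, 0.01, 0.18, 0.04, 0.03]
```

This keeps the bracketed, comma-separated form of the Stata local unchanged, at the cost of one extra import in the script.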

                    For your crash problem, I cannot reproduce it on my Windows 10 Pro. Can you type

                    which tuples

                    to check whether you got the latest version?



                    • #11
                      Hi Xu

                      Sorry for the delay in responding

                      Code:
                      . which tuples
                      C:\Users\Richard Palmer-Jones\ado\plus\t\tuples.ado
                      *! 4.0.1 Joseph N. Luchman, daniel klein, & NJC 16 May 2020
                      
                      . ssc install tuples
                      checking tuples consistency and verifying not already installed...
                      all files already exist and are up to date.
                      I'll respond about xtivreg too shortly.

                      Thanks

                      Richard





                      • #12
                        Further - I tried to check the Python version. Note that I use ArcGIS and had updated from 10.6.1 to 10.8.1, but I did this on both workstation and laptop.

                        Code:
                        . python query
                        ------------------------------------------------------------------------------------------------------------------------------------
                            Python Settings
                              set python_exec      C:\Python27\ArcGISx6410.6\python.exe
                              set python_userpath  
                        
                            Python system information
                              initialized          no
                              version              2.7.14
                              architecture         64-bit
                              library path         python27.dll
                        . python set exec C:\Python27\ArcGIS10.8\python.exe
                        failed to set the specified Python version.
                        Unable to find the shared library.
                        On my laptop I have the same Python, but there I can run "python set exec C:\Python27\ArcGISx6410.6\python.exe" without problems.

                        Cheers

                        Richard



                        • #13
                          Hi Richard Palmer-Jones:

                          The error message issued by python set exec suggests that the shared library python27.dll is missing on your system. For Python 2, it is usually located in C:/Windows/System32 or C:/Windows/SysWOW64. Can you verify that?



                          • #14
                            Hello Zhao

                            python27.dll is present in both locations.

                            Richard



                            • #15
                              Hi Richard Palmer-Jones:

                              Can you open a Command Prompt Window first and then type

                              Code:
                              cd C:\Python27\ArcGIS10.8
                              to enter your Python environment locally?

                              After that, start the interpreter by typing python, and then type

                              Code:
                              import platform
                              platform.architecture()
                              within the environment to check your Python architecture? It will tell you whether you are using a 32-bit or 64-bit Python. Note that a 32-bit Python cannot be loaded by a 64-bit Stata.
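The check described above can also be run as a short script. This is a sketch, assuming it is executed with the interpreter in question; the struct-based pointer-size check is an addition that reports the same information a second way.

```python
import platform
import struct

# platform.architecture() returns a tuple such as ('64bit', 'WindowsPE').
bits, linkage = platform.architecture()

# Pointer size in bytes times 8 gives the interpreter's word size.
pointer_bits = struct.calcsize("P") * 8

print("architecture: %s" % bits)
print("pointer size: %d bits" % pointer_bits)
```

If either line reports 32 bits, that interpreter is a 32-bit build.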

