  • Stata vs Python. Why so many lines to do something so simple??

    Hi, I'm a long time python programmer, and I'm taking a course in econometrics and getting my first exposure to Stata.

    I do see the amazing power of Stata, but also a lot of strange stuff that should be simpler.

    python:

    Xbar = mean(sum(Y))


    Stata:

    quietly sum X, meanonly
    return list
    local Xbar = `r(mean)'


    Perhaps I'm not searching or reading the documentation properly? Or does this have an internal logic that will soon make sense?

    Thanks!

  • #2
    Well, you don't need the -quietly- there because the -meanonly- option automatically suppresses output to the Results window. You also don't need the -return list- command unless you are eager to see it for some reason. In Python, when you run that command, do you get the value of Xbar echoed to output, or is it just stored in Xbar? (I'm not a Python programmer, forgive my ignorance here.) And you don't need that = in the third command either. So it boils down to:
    Code:
    summ X, meanonly
    local Xbar `r(mean)'
    I should also point out that if, instead of storing the result in a local macro, you wanted to store it in a variable (which is not really recommended, because why waste all that storage on a single number), you could reduce it to one line similar to Python:
    Code:
    egen Xbar = mean(X)


    • #3
      Why Stata is designed the way it is designed is not for me to answer. To me, Stata is about bringing results to the screen first, working with those results second. That is what most users want. Put differently, Stata is programmable but it is not primarily a programming language. To me, Stata focuses more on making complex stuff easily accessible, not so much on assembling basic stuff to create more complex stuff. One of the key differences from most other languages, which are primarily programming languages, is that Stata is, in Stata terminology, based on commands, not functions. You will find that Mata, which is Stata's programming language, is a lot closer to Python:

      Code:
      mata : Xbar = mean(X)


      • #4
        OP, why don't you also show us how you do the following in Python: Ordinary Least Squares regression, Instrumental Variable regression and Seemingly Unrelated regressions? I am very curious to see the simplicity and elegance of Python in doing these! Here is how it is done in Stata:

        Code:
        reg y x
        
        ivreg y (x = z)
        
        sureg (y x) (x z)
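
For readers who want a rough idea of the Python side, here is a minimal sketch using only the standard library. It covers just the one-regressor, one-instrument case, reports slopes only (no standard errors, no SUR), and is in no way a substitute for -reg- or -ivreg-. The data and the helper names cov(), ols_slope() and iv_slope() are made up for illustration.

```python
# Illustrative sketch only: one regressor, one instrument, slopes only.
# Data and helper names are hypothetical, not taken from the thread.
from statistics import fmean


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    a_bar, b_bar = fmean(a), fmean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(y, x):
    """Slope from regressing y on x with a constant: cov(x, y) / var(x)."""
    return cov(x, y) / cov(x, x)


def iv_slope(y, x, z):
    """Simple IV slope using z as the instrument for x: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [2.0 * xi + 1.0 for xi in x]   # y = 2x + 1 exactly, so both slopes are 2

b_ols = ols_slope(y, x)
b_iv = iv_slope(y, x, z)
print(b_ols, b_iv)
```

In practice a Python user would more likely reach for a third-party package for this, which is an extra dependency rather than something built in.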
        As for your example, here is what Clyde said, rephrased in terms of what you want to do and avoiding locals, which are specific to Stata:

        1. You want to see the mean calculated and displayed to you?
        Code:
        summ Y
        2. You want to have the mean calculated, and assigned to a scalar?

        Code:
        summ Y
        sca Ybar = r(mean)
        3. You want the mean calculated and assigned to a vector of the same dimensionality as Y?

        Code:
        egen Ybar = mean(Y)

        I do not know what the Python code -Xbar = mean(sum(Y))- does either. In particular, is Xbar displayed to you, is Xbar a scalar or a vector, and is Xbar later accessible to you for further manipulations, and how? But one thing to note is that it does two things which are wrapped up in one line. The first is sum(Y), and the second is mean(). If it does what I think it does, Stata also does two things, but they are on two lines, see 2. above.
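
As a hedged illustration of how the three cases above might look in Python, here is a small sketch that assumes mean() comes from the standard library's statistics module and that Y is a plain list (both are assumptions, since the original post does not say). The name Ybar_column is made up.

```python
from statistics import mean

Y = [1, 2, 3, 4]   # made-up data standing in for a Stata variable

# 1. Calculate and display: in a script, nothing is shown unless printed.
print(mean(Y))

# 2. Calculate and keep as a scalar for later use; nothing is displayed.
Ybar = mean(Y)

# 3. A list as long as Y holding the mean in every position,
#    roughly what -egen Ybar = mean(Y)- creates in Stata.
Ybar_column = [Ybar] * len(Y)
```

At the interactive prompt, typing a bare expression such as mean(Y) echoes its value, while an assignment such as Ybar = mean(Y) stores the value silently.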

        Let's assume that your Python code does 2. above. Daniel says many things with which I thoroughly disagree, and then he says something which is crucial. In particular the disagreement points are: I think that many people toss around what is a "programming language" and what is not, pretty randomly. If anybody has a rigorous definition of what constitutes a "programming language", he or she should state the definition, we all should check whether we agree with it, and then we can proceed to check whether Stata and Python fit the definition... I do not know how Stata started, but when I joined in 2000, on Stata 7 (reading documentation for Stata 6), Stata had two modes, the interactive and the scripting mode. Hence I think it is very misleading to focus on the fact that Stata has an interactive mode.

        The crucial point Daniel makes is the following: Stata is centered on commands. Commands do something, and leave results behind which are later accessible. The logic of Stata is not so much about functions which take input and return output. The latter do exist in Stata, but they mostly do simple things which are used as building blocks in commands. This is illustrated by the difference between Stata and Python in 2. above. I guess that Python uses two nested functions to return the scalar, while Stata uses the command -summ- to do the calculation, and the second line then assigns one of the results left behind by -summ- to a scalar.
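
To make the command-versus-function distinction concrete for Python readers, here is a rough sketch. The dictionary r and the function summarize() are hypothetical stand-ins that imitate how -summarize- leaves r() results behind; they are not part of Python or Stata.

```python
from statistics import mean

Y = [1, 2, 3, 4]   # made-up data

# Function style: input goes in, a value comes out, and the caller names it.
Ybar = mean(Y)

# Command style (hypothetical imitation of -summarize-):
r = {}   # stands in for Stata's r() results


def summarize(values):
    """Do the calculation, return nothing, and leave the results in r."""
    r.clear()
    r["N"] = len(values)
    r["sum"] = sum(values)
    r["mean"] = r["sum"] / r["N"]


summarize(Y)              # the "command" runs and leaves results behind
Ybar_from_r = r["mean"]   # a separate step picks up one of those results
```

The second style needs two steps to get a named scalar, which mirrors the two Stata lines in 2. above.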
        Last edited by Joro Kolev; 07 Feb 2021, 01:58.


        • #5
          Originally posted by Joro Kolev
          Daniel says many things with which I thoroughly disagree, [...] I think that many people toss around what is "programming language" and what is not, pretty randomly.
          Many things? Really? Going through my post, there seems to be one sentence on which we apparently disagree:

          Originally posted by daniel klein
          Put differently, Stata is programmable but it is not primarily a programming language.
          Perhaps, I should have added the "To me" qualifier a third time (as I did in both surrounding sentences)?


          Otherwise,

          Originally posted by daniel klein
          To me, Stata focuses more on making complex stuff easily accessible, not so much on assembling basic stuff to create more complex stuff.
          is just a more abstract way of saying what Joro's question regarding different types of regression models is supposed to illustrate. Because Peter, despite his slightly provocative title, explicitly tells us that

          Originally posted by Peter Jakobsen
          I do see the amazing power of Stata
          I did not see any need to dive deeper into this.

          However, Joro's example does serve another purpose: it points to Peter's last question about internal logic. In my view, Stata is a very well designed language precisely because of its internal logic, which has a lot to do with what I would call Stata's grammar.


          • #6
            Please tell people who don't use Python (including me (*)) what the sum() does in mean(sum(Y)) ?


            (*) If people would just stop asking questions on Statalist I would have more time to learn some.


            • #7
              I think some context was needed with post #1. When I try to find the mean of a list, it returns an error.
              Code:
              python
              >>> x = [1,2,3,4]
              
              # Sum of the list
              >>> Xbar = sum(x)
              >>> Xbar
              10
              
              # When we try:
              >>> Xbar = mean(sum(x))
              Traceback (most recent call last):
                File "<stdin>", line 1, in <module>
              NameError: name 'mean' is not defined
              r(7102);
              >>>
              Therefore, it turns out that mean is a user-defined function, something that does not come with the Python installation. If that is the case, there are actually more lines of code in Python than in Stata.
              Code:
              >>> def mean(values):
              ...     return sum(values) / len(values)
              
              >>> Xbar = mean(x)
              >>> Xbar
              2.5
              >>>
              Last edited by Attaullah Shah; 07 Feb 2021, 08:13.
              Regards
              --------------------------------------------------
              Attaullah Shah, PhD.
              Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
              FinTechProfessor.com
              https://asdocx.com
              Check out my asdoc program, which sends outputs to MS Word.
              For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.


              • #8
                Mean is a function from the Python Standard Library module statistics:

                Code:
                >>> from statistics import mean
                >>> x = [1,2,3,4]
                >>> Xbar = mean(x)
                >>> Xbar
                2.5
                But -mean(sum(x))- returns an error because it's asking for the mean of an integer (in this case 10), not a list:

                Code:
                >>> Xbar = mean(sum(x))
                Traceback (most recent call last):
                  File "<pyshell#4>", line 1, in <module>
                    Xbar = mean(sum(x))
                  File "C:\Users\username\AppData\Local\Programs\Python\Python39\lib\statistics.py", line 311, in mean
                    if iter(data) is data:
                TypeError: 'int' object is not iterable
                Last edited by Ali Atia; 07 Feb 2021, 08:39.


                • #9
                  Well, mean() could also be from numpy, in which case it would not return an error:

                  Code:
                  >>> from numpy import mean
                  >>> X = [1, 2, 3, 4]
                  >>> mean(sum(X))
                  10.0
                  >>> end
                  Obviously, taking the mean of the sum (which is a scalar, in this case) will just return the sum. Peter Jakobsen is really the only person who can tell us how mean() is defined, how sum() is defined, and what X actually is. Terminology is quite different in Python and Stata which makes declaring the stuff we use even more important. Ideally, Peter will provide reproducible examples.
                  Last edited by daniel klein; 07 Feb 2021, 08:58.


                  • #10
                    Being careful, not to say rigid, about what a routine accepts and emits is often a great idea in abstraction.

                    Mata's -mean()- is very happy with being fed a constant.

                    Code:
                    . mata : mean(42)
                      42
                    There is a vacuous principle of contentment: Every language is long familiar to those to whom it is long familiar.
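
The strict and permissive behaviours discussed in #8 to #10 can be contrasted in plain Python. This is only a sketch: permissive_mean() is a hypothetical helper written for illustration, and it imitates the "a constant is a one-element sample" idea rather than reproducing how numpy or Mata actually implement it.

```python
from statistics import mean as strict_mean


def permissive_mean(data):
    """Hypothetical helper: treat a bare number as a one-element sample."""
    if isinstance(data, (int, float)):
        data = [data]
    return strict_mean(data)


x = [1, 2, 3, 4]

# Strict: statistics.mean insists on an iterable, so the int 10 is rejected.
try:
    strict_mean(sum(x))
    strict_result = "no error"
except TypeError as err:
    strict_result = f"TypeError: {err}"

# Permissive: the mean of the single value 10 is just 10.
permissive_result = permissive_mean(sum(x))

print(strict_result)
print(permissive_result)
```

Which behaviour is preferable depends on whether you would rather be told about a probable mistake or have the routine quietly accept whatever it is given.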




                    • #11
                      Thank you. I'm continually surprised how well organized the Stata documentation is. After posting it did not take much browsing of the docs to find something terse enough, while still being somewhat self documenting.

                      quietly regress Y X
                      local beta_hat = _b[X]


                      • #12
                        Pleased you solved your problem but you created puzzlement about what your Python is and does that is not yet resolved.


                        • #13
                          Thank you for clarifying the details of my post, and for the helpful comments. A general programming language will never be as powerful or as easy to use for a specific domain as a tool that has been purpose-built for that domain, especially when that tool has evolved over a long time. That's why programmers and sys admins still work in the bash shell, which has been around in some form or another since 1971.

                          I found a fantastic resource for Stata newbies with prior programming experience: Programming an Estimation Command in Stata. It is a well organized, clear piece of writing, and a very fast way to get up to speed. A+
                          Last edited by Peter Jakobsen; 07 Feb 2021, 18:54.
