What are the most advantageous points of Stata compared with Python or R?

Tatsuru Kikuchi

Join Date: Mar 2021

Posts: 12
#1


21 Mar 2021, 07:47

What are the most advantageous points of Stata compared with Python or R?

What are the advantages of Stata as a data analytics tool?

These are very simple but relevant questions for me as I proceed with my work. I would like to understand Stata more deeply, since it is required in my current working environment. Would you please comment on the questions above?

Note Added

I am now studying economics at the Graduate School of Economics at the University of Tokyo. Before joining, I worked as a data scientist, and hence I have used Fortran, C++, Python, R, and TensorFlow as data analytics tools. I have also used SQL to manipulate commercial databases.
Tags: analytics, python, software, stata
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

21 Mar 2021, 09:20

Different people will give you different answers. My answer is that:

Stata is not comparable to Python, because the latter is a general purpose programming language, and Stata is a specific statistical/regression language.

Stata is comparable to R, but they are very different. Stata's syntax is very similar to many statistical/regression languages such as TSP, RATS, Eviews, Gretl, etc. R is based on some (extinct now I think) statistical language called S, and has very different syntax.

I rarely write programs that other people are supposed to use. For my own research, Stata has the advantage over R that everything in Stata is simple and fast to do, and the syntax of Stata is natural to me, while in R everything is hard and slow to do, and the syntax of R is unnatural to me.

But I am saying these things as somebody who has used Stata constantly from year 2000 till now, and as somebody who has 4 months of experience with R.

I did my Ph.D. at Universitat Pompeu Fabra, and for a couple of years I was the Teaching Assistant of the great Catalan statistician Albert Satorra. Professor Satorra was very excited about R and often talked about it in our conversations, so one year I decided to see the greatness of R for myself. I took "Econometrics in R" by Grant V. Farnsworth and, over 4 months, slowly went through everything described in that guide. Well, I did not like R at all. I did not see anything in R that I could not do a lot more easily in Stata, and I never touched R again after that.

In the end, I think it all comes down to a matter of taste.
1 like
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#3

21 Mar 2021, 14:49

For my part, I find questions of the sort "Is software X better than software Y?" neither interesting nor useful. I also disagree that it is a simple question. Any such question implicitly assumes that you – the reader – understand and have some proficiency in X and Y. (Otherwise, the discussion is moot.) As a result, every answer is subjective and couched in the relative abilities of the responder. Whether that comparison means anything to you depends on unknown assumptions about you. For example, how fluent are you in each language? Which do you have access/permission to use? Is there a crucial aspect that needs to be optimized, like memory or speed, or will any reasonable solution to the problem at hand be sufficient? Are there external factors that need to be considered, such as the time cost of your training, the financial cost of the software or infrastructure, software licensing, colleagues who may need to debug/use the code, or processes that govern how or where the code is to be applied? All of these matter to you but may not matter to others.

Should that stop you from learning multiple languages? Certainly not. Will you spend more time in one language than another? Almost certainly. If you do learn multiple languages, then it will not hurt to learn about each more deeply as you gain experience.

The mantra "pick the right tool for the right job" comes to mind. I'm certain that you know this already, given your breadth of experience with programming languages: just about anything that can be done in one language can be implemented in another. That doesn't mean you can, or even should, reinvent the wheel. That said, you mention that you are required to use Stata in your current working environment, so that in itself is an answer.
3 likes
John Mullahy

Join Date: Dec 2016

Posts: 751
#4

21 Mar 2021, 16:11

In addition to the points raised by Joro and Leonardo, one additional thing to consider is that Stata's built-in matrix/object language, Mata, provides programming flexibility like that offered by Python. For my purposes programming with Stata together with Mata makes Stata quite powerful.

https://www.stata.com/features/matrix-programming-mata/
1 like
Oscar Ozfidan

Join Date: Sep 2018

Posts: 257
#5

22 Mar 2021, 00:10

I am fairly new to Stata (on and off for about 2 years), and my number 1 reason is how little there is to learn before implementing a brand-new estimation in Stata. Oftentimes, I don't know what tomorrow will bring. Using Stata, I was able to conduct a cluster analysis in less than 10 minutes, something I had never done before. In R, any time I tried to do something new (which has not been that many times, to be fair), I often found myself facing error messages, usually involving managing the environment (the size of this matrix, that matrix, etc.). When I tried to figure out the issue, I had a hard time getting my hands on a comprehensive overview of the procedure I was trying to implement. R is open source, but there is a corporation behind Stata. They make sure that what is required from the user is minimal and that all procedures are well documented. Of course, Stata is not free, but for me the price is worth many times over the benefit of not spending hours dealing with error messages while trying to figure out the right way.

Last edited by Oscar Ozfidan; 22 Mar 2021, 00:13.
1 like
Marc Kaulisch

Join Date: Jan 2016

Posts: 184
#6

22 Mar 2021, 01:11

I like to cite Asjad Naqvi from https://medium.com/the-stata-guide/w...m-4b9b9d00a172

Stata’s uniqueness lies in its ability to be a great platform for data management and statistical analysis with pre-defined and rigorously vetted packages.
1 like
JanDitzen

Join Date: Jan 2015

Posts: 350
#7

22 Mar 2021, 01:30

I agree with Asjad Naqvi and Marc Kaulisch.

Stata has a general syntax which is the same across a wide set of different commands. This makes implementing and altering applied work very easy. Comparing user-written programs in Stata to those in R, Matlab or Python, my experience (and I might be wrong!) is that Stata programs allow for much more flexible settings. Usually I spend less than 1/3 of the coding time for a package on the problem itself. The rest goes into making sure it is flexible enough to be applied to a wide set of applications. By flexibility I mean handling unbalanced data, data with gaps/missing values, if statements, post-estimation commands, different options, etc.

Finally, I believe the quality of the articles in, and the programs discussed in, The Stata Journal is very high. This helps to maintain a pool of high-quality software.
1 like
Brian Poi

Join Date: Feb 2021

Posts: 22
#8

22 Mar 2021, 08:28

I think the fact that Stata is purpose-built to be a statistical software package gives it distinct advantages over Python especially and R to a lesser degree. Yes, R is at its heart a statistics package, but it still makes doing simple things more complicated than they need to be in my opinion.

For example, say I want to run a plain linear regression and get heteroskedasticity-consistent robust standard errors. In Stata I type -regress y x1 x2 x3, vce(hc2)- and I am done. In R, I use lm() to fit a regression model, then I have to use a function from the sandwich package to get a robust VCE, then finally call a third function to print out the coefficient table and summary statistics in a sane way. Python, being a general purpose language, is even worse. Yes, I could do it with external libraries, but it's too much hassle for me in my old age.
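
For readers curious about what an option like vce(hc2) asks for, here is a minimal sketch in plain Python (standard library only). It is not Stata's implementation and not the sandwich package's; it assumes a single regressor plus an intercept, and the function name ols_hc2 and the data are made up for illustration. HC2 weights each squared residual by 1/(1 - h_i), where h_i is the observation's leverage, before forming the usual sandwich.

```python
import math


def ols_hc2(x, y):
    """Fit y = a + b*x by OLS; return ((a, b), (se_a, se_b)) with HC2 errors.

    Hypothetical helper for illustration only (one regressor plus intercept).
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    # Inverse of X'X for the design matrix whose rows are (1, x_i).
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]

    # OLS coefficients: (X'X)^-1 X'y.
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    a = inv[0][0] * sy + inv[0][1] * sxy
    b = inv[1][0] * sy + inv[1][1] * sxy

    # "Meat" of the sandwich: sum over i of e_i^2 / (1 - h_i) * x_i x_i'.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(x, y):
        row = (1.0, xi)
        resid = yi - a - b * xi
        # Leverage h_i = x_i' (X'X)^-1 x_i.
        h = sum(row[j] * inv[j][k] * row[k] for j in range(2) for k in range(2))
        w = resid * resid / (1.0 - h)
        for j in range(2):
            for k in range(2):
                meat[j][k] += w * row[j] * row[k]

    def matmul(p, q):
        # Product of two 2x2 matrices stored as nested lists.
        return [[sum(p[i][m] * q[m][j] for m in range(2)) for j in range(2)]
                for i in range(2)]

    # Sandwich: (X'X)^-1 * meat * (X'X)^-1; standard errors are the
    # square roots of its diagonal.
    cov = matmul(matmul(inv, meat), inv)
    return (a, b), (math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))


# Made-up data, roughly y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
coefs, ses = ols_hc2(x, y)
print("intercept and slope:", coefs)
print("HC2 standard errors:", ses)
```

The length of this sketch, for something a single option covers in Stata, is in line with the point being made above.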

In Stata time-series data is easy to work with. In R, I need to import libraries just to declare that I have time-series data, and operators like leads, lags, and differences are much clunkier in R than in Stata.
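
As a rough illustration of what lag, lead, and difference operators return, here is a small plain-Python sketch. The function names and the yearly series are hypothetical. It assumes the behaviour of Stata's L., F., and D. operators once the data are declared as time series: they work on the time index rather than on row position, so a gap in the data yields a missing value instead of silently pairing non-adjacent periods.

```python
def lag(series, k=1):
    """Value at time t - k for each time t in the series, or None if absent."""
    return {t: series.get(t - k) for t in series}


def lead(series, k=1):
    """Value at time t + k for each time t in the series, or None if absent."""
    return {t: series.get(t + k) for t in series}


def diff(series):
    """First difference y_t - y_(t-1); None where the previous period is absent."""
    lagged = lag(series)
    return {t: None if lagged[t] is None else series[t] - lagged[t]
            for t in series}


# Hypothetical yearly series with no observation for 2003.
y = {2001: 10.0, 2002: 12.0, 2004: 15.0}
print(lag(y))
print(lead(y))
print(diff(y))
```

Because 2003 is absent, the lag and the difference for 2004 come out as missing (None here) rather than being computed from the 2002 value.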

In short, for doing statistical analysis, a program like Stata (or as a distant second, Eviews) requires less programming to do everyday routine analyses. Of course, with any serious statistical analysis some amount of programming is necessary and advantageous. But half my code shouldn't be dedicated to re-implementing routine things or else using third-party libraries to do analyses built into sane software designed from the ground up for statistical analysis.
6 likes
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#9

22 Mar 2021, 09:36

Originally posted by Brian Poi (#8)

I agree with this. It's not just that I need to install and load various packages; that's not so much of an issue. The issue is that I need to search for the functionality I require, which is a non-trivial task. For instance, I know that poLCA will fit latent class models to binary or categorical indicators, and it can do latent class regression, and it can do the bootstrap LR test, but it can only handle those types of indicators and not other types. flexmix is a more general package that has at least the equivalent of gsem's functionality, but its syntax is more challenging to use.

About that: different package authors use different syntax. Across the native Stata commands, the syntax has considerable similarities. Once I know Stata in general, I know the syntax. R's syntax is more heterogeneous across the different packages, and some of them have other quirks. For example, R's mirt package (Phil Chalmers, brilliant guy) requires you to create a data frame containing just the questions when you are fitting an IRT model. If you pass it a general data frame, it will treat everything as a question, including anything you might consider to be a covariate (e.g. age, sex, gender). If you want to fit any IRT model using covariates (e.g. a differential item functioning model or an explanatory model), then you need to separately supply a vector or a data frame containing the covariates. I want to email Chalmers and tell him this makes things kind of tricky, but he did develop the whole package basically for free as part of his dissertation, and he takes time out of his presumably busy day to support the package's users, so I can't bring myself to do that.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
