R is the Open source counterpart of SAS, which has traditionally been used in academics and research
Anyone who thinks that to be an accurate summary might want to read on. (My own alternative would be much more mundane, stressing that the market "in academics [sic] and research" defies concise summary in any such terms.) Not my battle, but I'd be surprised at any R enthusiasts and experts signing up to the idea that R is any kind of open source imitation of SAS!
This blog seems very much focused on India. Perhaps Stata hasn't made as many inroads into the Indian market. Why do we need to take this particular blog seriously, anyway? (Aside from those at StataCorp who want to market to India.)
Moreover, they measure "market share" by looking at the number of job postings for SAS and R. That raises an obvious counter: maybe companies don't advertise for Stata programmers because it is readily accessible, both in terms of price and learning curve, to the average statistician or statistical programmer, regardless of background. You don't need to hire out what you can do yourself and if you are hiring a full time person, knowledge of Stata is not a prerequisite. (For example, I had 25 years of SAS experience and no Stata experience before being hired last year to be a Stata programmer.)
The way I summarize it for students asking "why Stata?" and "Why does she still teach SAS when we all use Stata?" is that SAS is the standard for what I call industrial statistics: the BLS, Census, large corporations. SAS is very good in an environment where you have 100 different sources of data, some (say web traffic logs or sales) being updated in real time, some with limited access (you can run summaries on payroll data, but can't update or query names) and so forth. Proc SQL is awesome.
However, for most people I work with, in academic jobs or hoping for academic jobs, the scenario is more that you have a single person, totally in control of their own data. Since in general Stata is faster and much easier to use, in the academic environment, it is vastly preferable for the stand-alone researcher. Since Stata can make ODBC connections and such, I suppose it's possible for it to work in an industrial statistics environment, but its capabilities in that regard seem to me like a talking chimp: it's cool that it can talk, but don't expect it to hold deep conversations. Also, with the infrastructure/investment/inertia of large organizations, SAS will be around for a long, long time, regardless of expanding capabilities of Stata or people learning R or Python. The memory limits in Stata and R are also a big problem with big data.
Then there's always SPSS. It's interesting what IBM has been trying to do with it (Python and R integration, ODBC support, etc.). I don't see SPSS reclaiming its crown in academia, and don't see it becoming a serious contender in industrial statistics. But IBM has occasionally devoted some resources to continued development, so ten years from now, it might come back. NOT!
As far as why SAS is still taught, I got the impression that it's because SAS needs to be taught, whereas the motivated student can pick up a working knowledge of Stata autodidactically in a couple of hours of playing around with it.
I'm curious, though, about, "Since Stata can make ODBC connections and such, I suppose it's possible for it to work in an industrial statistics environment, but its capabilities in that regard seem to me like a talking chimp: it's cool that it can talk, but don't expect it to hold deep conversations." I don't understand the point you're trying to make, here (the simile is obscure to me). Could you elaborate?
SAS has extensive libraries for working with Oracle, Teradata, and other data warehousing platforms large enterprises use; a single command file can pull from numerous tables and numerous sources in a secure, encrypted, vendor-supported manner, even without ODBC. Stata could do something like this if the database managers were willing to allow ODBC connections, but it would be pretty difficult to pull off, and still wouldn't be as powerful.
I'm not sure about Stata being something people can pick up on their own. For somebody with an extensive programming background, especially one like SAS, it must be refreshingly easy to use. But for a Sociologist/Psychologist/Poli-Sci grad student with no prior background with other languages, having somebody show them the ropes seems essential. Some people don't intuitively grasp strings vs. numeric data, or looping, or macros. The help files for most things in Stata I find immensely helpful (SPSS is perhaps the worst in my opinion, and SAS is a mixed bag), but for true newbies, just knowing what is optional and what is required, what is a reserved word and what is an arbitrary variable name can be confusing. Some seem to get it and are able to learn (mostly) on their own after a single-semester stats course; others never do move much beyond generate, replace, and regress.
Ben, thank you for clearing that up for me. I've read that there are some in-database features, accelerators and so on that SAS Institute and the various RDBMS vendors have co-developed for add-on licensing, but I take it that you're referring to SAS/ACCESS options for the various RDBMSs. I assume their functionality requires the DBA* to grant table/view access to the SAS user's log-in. Or can they still work through a layer of stored procedures?
I see your point about the population of potential users. I hadn't considered the situation of social science graduate students without any background to prepare them for data management and statistics.
* DBAs must relish statements like the following from a SAS whitepaper for SAS/ACCESS Interface to Teradata: "The SAS Explorer, with the familiar split-window display of libraries and the library files, permits novice users to operate on Teradata DBMS databases and tables without having to write SAS code." (Talking chimps indeed!)