Multiple varlists and foreach

Tobias

Join Date: Apr 2014

Posts: 9
#1

Multiple varlists and foreach

24 Apr 2014, 09:26

Hi all,
I'm trying to produce a few scatterplots with the foreach command. The idea is to have the first variable of varlist1 plotted against the first variable of varlist2, then the second variable of varlist1 against the second element of varlist2, and so on ... So I would like to end up with 4 plots (not 16).
I couldn't find any hints on how to do this and tried it like this:
local varlist1 A1 A2 A3 A4
local varlist2 B1 B2 B3 B4
foreach var of varlist1 & var2 of varlist2 {
scatter `var' `var2'
}

Stata returns "invalid syntax". Any ideas on how to do this would be highly appreciated!

Thanks, Tobias
John Bigelow

Join Date: Apr 2014

Posts: 1
#2

24 Apr 2014, 09:31

Tobias,

Try:

foreach sfx in 1 2 3 4 {
scatter A`sfx' B`sfx'
}

John
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35436
#3

24 Apr 2014, 09:46

John has the right idea. In this example it could be just

Code:

forval j = 1/4 {
    scatter A`j' B`j' , name(g`j')
}

I added a name() option: otherwise each plot just disappears as the next is drawn.
Comment
Joe Canner

Join Date: Mar 2014

Posts: 580
#4

24 Apr 2014, 09:46

Tobias,

This, IMO, is a weakness of Stata compared to SAS (for example): the inability to create an array from a list of variables and index the array numerically. It works fine if your variables have numerical suffixes, as in your example and John's solution. However, if your variables do not have numeric suffixes, you have to resort to some tricks. Here is one workaround (I think I have seen other solutions on Statalist in the past but I don't remember what they are):

Code:

sysuse auto, clear
local varlist1 "price mpg headroom trunk"
local varlist2 "weight length turn displacement"
forvalues i=1/4 {
    local var1 = word("`varlist1'", `i')
    local var2 = word("`varlist2'", `i')
    scatter `var1' `var2'
}

One could also use the unab command to populate the varlist1 and varlist2 macros if desired:

Code:

unab varlist1 : price-trunk
unab varlist2 : weight-displacement

This requires caution, however, to make sure that the number of variables in each list is the same and matches up in the desired fashion. In this example, for instance, there is an extra variable (rep78) in the first list that I don't really want.
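One way to guard against that, as a minimal sketch: count the words in each macro with the `: word count` extended macro function and only loop when the counts agree. The message text and the n1/n2 macro names are illustrative, not part of any official tool.

```stata
sysuse auto, clear
unab varlist1 : price-trunk
unab varlist2 : weight-displacement

* Count the variables in each list
local n1 : word count `varlist1'
local n2 : word count `varlist2'
display "varlist1: `n1' variables, varlist2: `n2' variables"

* Only pair the lists when they have the same length
if `n1' != `n2' {
    display as error "lists differ in length; fix them before pairing"
}
else {
    forvalues i = 1/`n1' {
        local var1 : word `i' of `varlist1'
        local var2 : word `i' of `varlist2'
        scatter `var1' `var2'
    }
}
```

With the two unab ranges above the counts are 5 and 4, because rep78 falls inside price-trunk, so the mismatch branch runs and nothing is plotted.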

Regards,
Joe
2 likes
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35436
#5

24 Apr 2014, 09:59

Oddly, processing parallel lists was exactly what the now undocumented (really, non-documented) for was designed to do, back in Stata 7 and a few versions earlier.

It may be a case of toolkits rather than problems, but I don't see such problems often in practice.

There are other solutions, but they won't please Joe that much. There was some discussion in http://www.stata-journal.com/sjpdf.h...iclenum=pr0009

Here's one technique:

Code:

sysuse auto, clear
tokenize "price mpg headroom trunk"
local varlist2 "weight length turn displacement"
forvalues i=1/4 {
    local var2 = word("`varlist2'", `i')
    scatter ``i'' `var2'
}
Comment
Tobias

Join Date: Apr 2014

Posts: 9
#6

24 Apr 2014, 10:05

Thanks everyone! Issue resolved. Since my actual variables do not all have numerical suffixes I used Joe's solution.
Best, Tobias
Comment
Joe Canner

Join Date: Mar 2014

Posts: 580
#7

24 Apr 2014, 10:14

Nick,

"Toolkits" is a good word for what is needed (absent any official solution from Stata). You are right that this problem doesn't arise very often in practice, nor is it that difficult to solve, but when it does come up it would be good to have a readily available toolkit of solutions at hand without having to search the internet or re-invent the wheel. I don't know where or how one would maintain such a thing, but it might be worth a separate discussion.

Regards,
Joe
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35436
#8

24 Apr 2014, 10:23

The benchmark for any improvement is that it must seem simpler than (mix of code and pseudocode)

J is length of each list

forval j = 1/J {
local a : word `j' of local A
local b : word `j' of local B
<other selections>

<something involving a, b, ...>
}
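
A concrete instance of that pattern, as a sketch using the auto dataset and the variable pairs from earlier in the thread. The macro names A, B and J and the graph names g1 to g4 are only illustrative.

```stata
sysuse auto, clear
local A "price mpg headroom trunk"
local B "weight length turn displacement"

* J is the length of each list
local J : word count `A'

forval j = 1/`J' {
    * Pick the j-th word of each list
    local a : word `j' of `A'
    local b : word `j' of `B'
    * Name each graph so it stays in memory when the next one is drawn
    scatter `a' `b', name(g`j', replace)
}
```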
Comment
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#9

24 Apr 2014, 13:44

Originally posted by Joe Canner

This, IMO, is a weakness of Stata compared to SAS (for example): the inability to create an array from a list of variables and index the array numerically. ......

Joe, arrays are an integral part of Stata, and e.g. the graphics subsystem would be completely impossible without them. You can perfectly well define an array of anything, and index it numerically, whether it is a number, string, point or graph. Unfortunately even the advanced Stata courses shy away from classes, arrays and plugins, preferring to deal with merge, append, and collapse.

Some 15 years ago Bill Gould posted an article on SAS-like arrays in Stata, which might also be of interest.

Best, Sergiy Radyakin
Comment
Joe Canner

Join Date: Mar 2014

Posts: 580
#10

24 Apr 2014, 14:37

Sergiy (and Nick),

Personally, I don't find anything particularly "SAS-like" in the syntax: local x : word `i' of `array'

Granted, it is not that difficult to do, once you are exposed to it, but it is not that intuitive and is somewhat clumsy (especially when you get beyond one dimension). I don't know what is going on under the hood, but given that Stata already uses array notation to refer to observations and already uses matrices to great effect, it doesn't seem like it would be that much of a stretch to actually use array/matrix notation in this context.

Code:

array A[4] var_one var_two var_three var_four
forval j=1/4 {
    replace A[`j']=0 // Stupid example
}

The problem, I suppose, is how to deal with conflicts with the existing notation for referring to specific observations (var[`i']) and specifying both a specific observation and a specific variable in an array.

All that said, I am not going to hold my breath waiting for this change, nor whine (too much) about the lack. My only point is that the workaround is not as intuitive and "SAS-like" as it's made out to be.

Regards,
Joe
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35436
#11

24 Apr 2014, 14:56

Being SAS-like in detail is no more important for Stata than (I guess wildly) being Stata-like is important for SAS. That's not meant to be deprecatory or dismissive in any direction, but it's the way it is.
Comment
Joe Canner

Join Date: Mar 2014

Posts: 580
#12

24 Apr 2014, 15:05

Agreed; Bill Gould said as much at the Stata Conference in New Orleans. On the other hand, if SAS does something in a way that is (arguably) more elegant or intuitive, there is nothing wrong with Stata adopting the idea, simply because it is a good idea, not because SAS does it. There are not many examples of where Stata falls short in this regard, but this is one of them, in my opinion.
Comment
Jeph Herrin

Join Date: Apr 2014

Posts: 332
#13

24 Apr 2014, 15:38

I barely know SAS from SPSS, but I do know that lookup tables are extremely common in programming, and Stata doesn't really support them at the macro level. This problem comes up often for me; in some contexts, judicious variable naming does the job; otherwise I usually use Joe's solution or create pairings using -char- (to give one variable the characteristic [pair] equal to another). But none of these is as satisfactory as a proper array would be - allowing matrices to contain strings would solve the problem very simply (from the user perspective!).
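
A minimal sketch of the -char- pairing, using the auto dataset. The characteristic name pair and the particular pairings are assumptions chosen for illustration.

```stata
sysuse auto, clear

* Record each y-variable's partner in a characteristic named "pair"
char price[pair] weight
char mpg[pair] length
char headroom[pair] turn
char trunk[pair] displacement

foreach y of varlist price mpg headroom trunk {
    * Look up the partner stored on the variable itself
    local x : char `y'[pair]
    scatter `y' `x', name(`y'_`x', replace)
}
```

The pairing travels with the dataset once it is saved, so the loop only needs the list of y-variables.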
Comment

Sergiy Radyakin

Join Date: Apr 2014
Posts: 1867

#14

24 Apr 2014, 16:07

Mata code:

Code:

  A=tokens("var_one var_two var_three var_four")
  for(j=1;j<=4;j++) {
    A[1,j] // or use otherwise
  }

Lookup tables in Stata are called associative arrays:

Code:

  // create population lookup table
  T=asarray_create()
  asarray(T,"US",313.9)
  asarray(T,"UK",63.23)
  asarray(T,"UA",45.59)
  // use
  asarray(T,"US")

For example, Tobias could solve his problem with the help of:

Code:

  G=asarray_create()
  asarray(G,"price","weight")
  asarray(G,"mpg","rep78")
  asarray(G,"length","trunk")
 
  for(g=1;g<=asarray_elements(G);g++) {
    y=asarray_keys(G)[g,1]
    x=asarray(G,y)
    stata(sprintf("twoway scatter %s %s",y,x))
  }

Comment

Alan Riley (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 168
#15

24 Apr 2014, 16:09

Originally posted by Joe Canner

Sergiy (and Nick),

Personally, I don't find anything particularly "SAS-like" in the syntax: local x : word `i' of `array'

Granted, it is not that difficult to do, once you are exposed to it, but it is not that intuitive and is somewhat clumsy (especially when you get beyond one dimension). I don't know what is going on under the hood, but given that Stata already uses array notation to refer to observations and already uses matrices to great effect, it doesn't seem like it would be that much of a stretch to actually use array/matrix notation in this context.

Code:

array A[4] var_one var_two var_three var_four
forval j=1/4 {
    replace A[`j']=0 // Stupid example
}

The problem, I suppose, is how to deal with conflicts with the existing notation for referring to specific observations (var[`i']) and specifying both a specific observation and a specific variable in an array.

Sergiy mentioned above that Stata does have arrays in the context of class programming. It does, and they can actually be used outside of formal Stata classes. Joe's pseudo-code example above can be accomplished in Stata like this:

Code:

.A = { "var_one", "var_two", "var_three", "var_four" }
forval j=1/4 {
    replace `.A[`j']' = 0
}

See [P] classman for more details -- search within it for 'array'.

Note that .A must exist as an array before individual elements can be assigned to it. That is, if you wish to assign things one at a time to it, the syntax would be

Code:

.A = {}
.A[1] = "var_one"
.A[2] = "var_two"
...
2 likes
Comment
