Hi,
I was responding to a participant's question earlier, and I just realized something that I think would be useful to have in the summarize command. As it currently is, when you use it for descriptive statistics of several variables, only the statistics of the last variable are stored in r(), and they are all scalars. When you pass more than one variable in the varlist, I think it would be more useful to have a vector for each statistic that holds the values for all the variables. Alternatively, you could create a scalar for each statistic-variable combination, and that would work too. If you use vectors, you could label them with row names or column names (depending on whether you make each one a single-column or a single-row vector), where each row or column name is the name of the variable, so the statistics would be easy to access. For example:
Code:
quietly summarize var1 var2
matrix sds = r(sd)        // this would now be a matrix, not a scalar
display sds["var1", 1]    // if you make it a single-column vector
If you decide to store them all as scalars, you could name each one with an abbreviation of the statistic, followed by an underscore and then the variable name. For example:
Code:
quietly summarize var1 var2
display r(sd_var1)
I would prefer the vector method, but either one would work. This way we wouldn't have to call summarize several times when programming, and we could keep the statistics in memory throughout the program. This is just a thought; maybe there is already another command that does this, and I apologize in advance for my possible ignorance. The good thing is that this could work in many different versions of Stata, so it could be added to older versions with just an update. Of course, the problem is that old code might need to be revised.
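Until something like this exists, the vector version can be approximated with a small wrapper program. This is only a sketch: msummarize is a hypothetical name I made up, not an official or existing command, and it ignores weights, if/in and the detail option. It loops over the varlist, calls summarize on each variable, builds each statistic in a temporary k x 1 matrix, and only posts r(N), r(mean), r(sd), r(min) and r(max) as column vectors (row names = variable names) after the last call to summarize.

```stata
* Hypothetical wrapper, NOT an official Stata command
capture program drop msummarize
program define msummarize, rclass
    syntax varlist(numeric)
    local k : word count `varlist'
    local stats N mean sd min max
    * One temporary k x 1 matrix per statistic, filled with missing values
    foreach s of local stats {
        tempname m_`s'
        matrix `m_`s'' = J(`k', 1, .)
    }
    * Summarize each variable and copy its results into row i
    local i = 0
    foreach v of local varlist {
        local ++i
        quietly summarize `v'
        foreach s of local stats {
            matrix `m_`s''[`i', 1] = r(`s')
        }
    }
    * Label rows with the variable names and post the matrices to r()
    foreach s of local stats {
        matrix rownames `m_`s'' = `varlist'
        matrix colnames `m_`s'' = `s'
        return matrix `s' = `m_`s''
    }
end

* Usage with the auto dataset shipped with Stata
sysuse auto, clear
msummarize price mpg weight
matrix sds = r(sd)
matrix list sds
display sds["mpg", 1]
```

Because the matrices are copied out of r() (matrix sds = r(sd)), they stay available across the rest of the program even after other r-class commands run.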