transplot package downloadable from SSC

Nick Cox

Join Date: Mar 2014

Posts: 35418
#1

03 Jul 2020, 15:09

Thanks as always to Kit Baum, a new transplot package is now downloadable from SSC using

Code:

ssc install transplot

Stata 8 is the minimum requirement. That said, I have not tested this in Stata 8 but I imagine that people may shout out if there are problems running it in any old version of Stata.

transplot draws plots for trying out transformations. It grows out of a long-term personal interest in transformations, an odd topic in that statistically experienced people seem to vary greatly in their willingness to transform.

That said, often a researcher just knows from experience or theory that a particular transformation (including, here, a link function) should make sense.

But sometimes you need to try out a transformation before you can be sure that it is a good idea -- or see that it is useless, or changes matters so little that it is pointless -- or even to spot that it is a bad idea.

The immediate stimulus for this command came when I focused on why I never (well, hardly ever) use any of the official commands ladder, gladder, qladder. What I wanted instead was typically a focused comparison of what the data look like on the original scale and using one or at most a few different transformed scales. Most commonly, the question can just be: is taking logarithms a good idea?

I gave a talk centred on transplot at the London Stata conference in September 2019 but did not release the code (or a help file, which I had not even written at the time). Marination has allowed some further modest extensions of functionality.

The slides are accessible at https://www.stata.com/meeting/uk19/slides/uk19_cox.pptx

The help file is fairly detailed so a few examples of the command at work should be enough for now.

First, the command is used in one-way mode, naming (a) a distribution plotting command, (b) variables and (c) transformations (@ is the symbol for a variable on its original scale). The unsurprising big picture here is that each variable shown is strongly positively skewed and would be easier to work with when logged.

Code:

set scheme s1color
webuse grunfeld, clear
transplot qnorm invest mvalue kstock, trans(@ log10) ms(Oh) combine(colfirst)

Second, an example in which we play with logarithmic and reciprocal versions of a response variable:

Code:

sysuse auto, clear
transplot scatter mpg weight, ytrans(@ log10 100/@) ms(Oh)

Third, you can try transforming the predictor too:

Code:

transplot scatter mpg weight, ytrans(@ log10 100/@) xtrans(@ log10) ms(Oh) combine(colfirst)

Nick Cox

Join Date: Mar 2014

Posts: 35418
#2

19 Jul 2020, 02:57

Here's another example. This repeats themes from above -- transformations might help and plotting with respect to some reference distribution might help too -- but adds the idea of comparing groups.

Do foreign and domestic cars in the auto data vary in mpg? Here you need to install qplot from the Stata Journal as well as transplot from SSC.

Code:

sysuse auto, clear
set scheme s1color
transplot qplot mpg, over(foreign) trans(@ sqrt log 1000/@) scheme(s1color) legend(pos(11) ring(0) order(2 1) col(1)) trscale(invnormal(@)) xtitle(standard normal deviate)

Here a commentary might run: normal quantile plots do show that foreign cars have higher mpg than domestic, but the comparison is more complicated than an additive shift, so is it (for example) multiplicative rather than additive? A logarithmic transformation does help -- noting along the way that a root transformation does not help much, so forget about it -- but we might as well keep going and use reciprocals. In fact it's a standard comment that gallons per so many miles (or litres per so many km!) is as natural a scale as the original, or more so. Evidently reciprocals flip the groups around so that domestic cars plot higher than foreign in the last panel -- being more inefficient.

An implication of the first panel is that a t test oversimplifies!

Although this example ends up as one line of code that does the Hogwarts stuff, I always find myself fooling around and building up to it bit by bit -- and sometimes digressing or deviating along the way. So it's more typical that several intermediate steps end up on the cutting room floor.
ericmelse

Join Date: May 2014

Posts: 422
#3

19 Jul 2020, 22:06

Dear Nick,

Thank you for this additional example. Being able to compare groups (better) is important for practical purposes.
When I compare the first panel (top left) with the fourth panel (bottom right), my (possibly) naive interpretation is that the reciprocal transformation should give quantile regression coefficients that are closer to each other across quantiles.

Indeed, comparing the result of:

Code:

sysuse auto, clear
gen mpg1000 = 1000/mpg
sqreg mpg foreign , quantile(.10 .25 .5 .75 .90)
*{results omitted}

with the result of:

Code:

sqreg mpg1000 foreign , quantile(.10 .25 .5 .75 .90)
*{results omitted}

shows that the difference between the coefficients, for example, of q50 and q10 as well as of q50 and q90, is reduced (from 3 to 1.07 and from 3 to 1.65).
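For anyone wanting to read off such differences directly rather than by eye, here is a minimal sketch using lincom after sqreg, which labels its equations q10, q25, q50 and so on. The seed value is an arbitrary assumption, there only to make the bootstrap standard errors reproducible; no results are shown because they should be checked by running the code.

```stata
sysuse auto, clear
gen mpg1000 = 1000/mpg

* arbitrary seed (an assumption), so the bootstrap SEs are reproducible
set seed 2020

* original scale: differences between quantile coefficients on foreign
sqreg mpg foreign, quantile(.10 .25 .5 .75 .90)
lincom [q50]foreign - [q10]foreign
lincom [q90]foreign - [q50]foreign

* reciprocal scale: the same differences
sqreg mpg1000 foreign, quantile(.10 .25 .5 .75 .90)
lincom [q50]foreign - [q10]foreign
lincom [q90]foreign - [q50]foreign
```

Note that the coefficients change sign on the reciprocal scale, so it is the absolute sizes of the differences that are comparable.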

Of course this is a 'toy' example, so I cannot assume that this finding would replicate with similar data, but I suppose the objective of using transplot is to investigate whether such an analytical improvement presents itself or not.

http://publicationslist.org/eric.melse
Nick Cox

Join Date: Mar 2014

Posts: 35418
#4

20 Jul 2020, 01:54

Indeed, the spirit of transplot is entirely descriptive or exploratory. What formal inferences follow -- in terms of, say, tests or model fits -- is a different question and I would not dream of including any such in the code.

I would want to stress the critical role of transplot in underlining which tests make most sense, or how far tests make sense. The most common pitfall I've seen is comparing two or more means without being clear that additive shift is the main story. I've not seen any text discourage graphics before or alongside such tests, but my impression is that most encourage drawing histograms or box plots, which are variously inefficient or even irrelevant for consideration of means and variation around them.
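As an illustration only (this is not part of transplot), one quick check of whether an additive shift is a fair summary can be sketched with official commands: split mpg by foreign with separate and compare the two groups in a quantile-quantile plot. If the groups differed by an additive shift alone, the points should lie close to a line parallel to the line of equality.

```stata
sysuse auto, clear

* separate creates mpg0 (domestic) and mpg1 (foreign)
separate mpg, by(foreign)

* quantiles of foreign mpg plotted against quantiles of domestic mpg;
* qqplot adds the line of equality for reference
qqplot mpg1 mpg0
```

The same plot can be repeated on a transformed scale, say after generating 1000/mpg, to see whether the groups then look closer to a simple shift.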