
  • Estimating adjusted mean difference by treatment group, following -regress-

    I am looking to calculate the adjusted mean difference in heart rate by treatment group (alloc), a binary 0/1 variable where 0 = placebo and 1 = intervention. I ran a regress command with finalhr as the dependent variable, i.alloc as the predictor, adjusting for the covariates basehr, x, y, and z. finalhr had to be log-transformed, so I used eform() to antilog the coefficients (not sure if this is correct either).

    The final code looks like: regress finalhr i.alloc basehr x y z

    The regress runs well and satisfies assumptions, but I would like to estimate the adjusted mean difference in heart rate, by treatment group.

    I tried -margins r.alloc, atmeans- but I don't believe this worked (it may have worked and I may be wrong).

    Any help on how to estimate this adjusted mean difference would be much appreciated. Apologies if the answer is something easy, I have very little experience in statistics and with a deadline fast approaching, my internet research has not got me far.

    Thanks.
    Last edited by Evan Hardy; 18 Apr 2023, 15:03. Reason: Wrong code for margins, adding tags

  • #2
    Exponentiating the coefficients of a linear regression that has a log-transformed outcome variable does not give you coefficients for the untransformed outcome. In fact, in this situation, I don't think it gives you anything that can be simply described or named.

    Be that as it may, -margins- doesn't care about how -regress- displays things. -margins- works with the actual coefficients and standard errors that Stata calculated, so that "mistake" has no impact on -margins-. But do understand that the metric of the -margins- results is the log-transformed heart rate. Moreover, because log is a non-linear function, exponentiating those margins will not get you the expected mean of the untransformed heart rate: it gets you the expected geometric mean, which is always less than or equal to the ordinary (arithmetic) mean. (And they are only equal when there is no variation.)
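
    The geometric-vs-arithmetic-mean point is easy to check numerically. A minimal sketch (in Python rather than Stata, using made-up heart-rate values) showing that exponentiating the mean of the logs recovers the geometric mean, which sits below the arithmetic mean:

```python
import numpy as np

rng = np.random.default_rng(0)
hr = rng.uniform(55, 110, size=1000)  # hypothetical heart-rate values

arith_mean = hr.mean()
# exponentiating the mean of the logs gives the geometric mean
geo_mean = np.exp(np.log(hr).mean())

# the geometric mean is <= the arithmetic mean, with equality
# only when every value is identical
print(arith_mean, geo_mean)
```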

    You don't explain why you chose to log-transform finalhr, and without knowing that, I can't advise you on how to proceed from here.



    • #3
      Hi Clyde, thanks for the advice. Apologies for my mistakes, I am trying to learn the underlying statistics at the same time as the software which is proving somewhat challenging.

      Initially, I log-transformed finalhr to ensure normality of the residuals. However, I ran the model again transforming the continuous covariates instead of finalhr, and the residuals were still normal. I believe this solves the problem with using margins, since the coefficients will now be for the untransformed finalhr?

      Again thank you for replying and continued advice is welcomed and much appreciated.



      • #4
        Initially, I log-transformed finalhr to ensure normality of the residuals.
        The reason I asked about this is that many, perhaps most, log transformations are done for bad reasons. Yours appears to be one of them. It is quite striking how people are taught to fuss over normality and heteroskedasticity in linear regression, yet they are rarely called upon to verify the single most important assumption for linear regression, namely linearity! Normality is widely misunderstood in connection with linear regression. People are often under the impression that the dependent variable must be normally distributed. This is wrong. It is the residuals that are supposed to be normally distributed. But even that requirement is not really necessary in most cases. In large samples, the central limit theorem will drive the sampling distribution of the coefficient estimates to normality regardless of the distribution of the residuals, and that will allow correct inference from the usual standard errors and t-statistics. So normality of residuals is really only necessary in small samples, where you can't count on the central limit theorem. On the other hand, small-sample tests of normality are usually grossly underpowered. There's a catch-22 here: it isn't clear that there are any samples small enough to need normality of residuals that are also large enough to enable testing for normality!
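
        The central limit theorem claim is easy to demonstrate by simulation. A small sketch (Python, synthetic data; the sample size and skewed error distribution are my own choices): even at n = 100, the sampling distribution of an OLS slope is centred on the true value and close to symmetric despite strongly skewed residuals.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, true_slope = 100, 2000, 2.0

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    # strongly right-skewed residuals: exponential, shifted to mean zero
    e = rng.exponential(scale=1.0, size=n) - 1.0
    y = 1.0 + true_slope * x + e
    # OLS slope via the usual covariance/variance formula
    slopes[r] = np.cov(x, y, bias=True)[0, 1] / x.var()

# centred on the true slope, with mean close to median (near symmetry)
print(slopes.mean(), np.median(slopes))
```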

        As for heteroskedasticity (which you don't raise as an issue, but I'm mentioning it here for completeness), while it does invalidate the standard error calculations, it does not bias the coefficient estimates. And, more important, you can fix the standard error problem by just using robust standard errors if you have heteroskedasticity.

        You don't say what your sample size is, but if it was more than 50 you certainly don't have to worry about normality unless the residual distribution is truly bizarre and highly skewed. Even at a sample size of 30 you are probably OK ignoring normality.

        So I would go back and look into the relationships between log heart rate and your predictor variables, and untransformed heart rate and your predictor variables. One of those dependent variables will show a more linear relationship than the other. Well, maybe not. A heart rate variable, assuming we're talking about human beings, is unlikely to be outside the range of 40-150, and most will be in the 50-100 range (maybe higher if measured under stress or exercise conditions). This range is narrow enough that the non-linearity of the log transformation is minor. So both the untransformed and log-transformed variable may look equally linearly related to predictors. In that case, since you want your answers in the metric of heart rate itself, not log heart rate, just go back and redo the regression using untransformed heart rate as the outcome variable and work with those results.

        However, I ran the model again transforming the continuous covariates instead of finalhr, and the residuals were still normal. I believe this solves the problem with using margins, since the coefficients will now be for the untransformed finalhr?
        Yes, the margins results would be in the untransformed heart rate metric. But is the regression valid this way? Let me be blunt: linearity is the most important thing for validity of a linear regression model. If log y is linear in x, you are guaranteed that y is not linear in log x, and vice versa. So if the original log transformation of the outcome was valid, then what you are proposing here is definitely not valid modeling. Of course, I suspect you have yet to actually explore the linearity of relationships, and it may well be that in your situation the y vs log x relationship is more linear--in which case, go with it. But you need to explore that first: look at y vs x, ln y vs ln x, ln y vs x, and y vs ln x. Graphical exploration is probably the best way. The -graph matrix- command will enable you to see all of these in juxtaposition. If none of these is even remotely linear, then you may need to modify your model by trying other transformations, adding interaction terms, etc.

        Added clarification: when I refer to x and y in the preceding paragraph, I am using those to refer generically to continuous independent and dependent variables, respectively. I do not mean them to be specifically the variables you called x and y in #1.
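
        One way to make that exploration concrete: fit a straight line under each of the four transform combinations and compare fit quality. A sketch with synthetic data in which log y is linear in x by construction (Python; in Stata you would instead eyeball the -graph matrix- panels):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 10, size=n)
# synthetic data: log(y) is exactly linear in x, plus mild noise
y = np.exp(0.5 + 0.4 * x + rng.normal(0, 0.1, size=n))

def r2(a, b):
    """R-squared of a simple straight-line fit of b on a."""
    return np.corrcoef(a, b)[0, 1] ** 2

fits = {
    "y vs x":       r2(x, y),
    "ln y vs x":    r2(x, np.log(y)),
    "y vs ln x":    r2(np.log(x), y),
    "ln y vs ln x": r2(np.log(x), np.log(y)),
}
best = max(fits, key=fits.get)
print(best, fits)  # the specification that matches the data wins
```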



        • #5
          Thanks again, Clyde.

          My sample size is way above 50 so I don't think I'd have to worry too much about normality, as you said, but I checked anyway and the residuals still fit a normal distribution well.

          As you suspected, there wasn't much of a difference in linearity between log heart rate and untransformed heart rate (if anything, untransformed heart rate was slightly better), so I shall stick with the untransformed variable.

          Please do let me know if you suspect I am still doing things incorrectly, but if not, could I double-check whether the original way I attempted to work out the adjusted mean difference, using -margins r.alloc, atmeans-, was in fact correct, or if this is also wrong?

          I appreciate the helpful information you've given so far, and even more so your patience with my rather basic knowledge of regression.



          • #6
            Please do let me know if you suspect I am still doing things incorrectly, but if not, could I double-check whether the original way I attempted to work out the adjusted mean difference, using -margins r.alloc, atmeans-, was in fact correct, or if this is also wrong?
            It was a correct way to work out the difference at means between the two groups, in the log heart rate metric. For the adjusted mean difference, leave out the -atmeans- option. Both are reasonable ways to present your findings. The difference between them can be thought of as follows:
            1. The difference at means is the expected difference in heart rate between two people, one in each group, both of whom are exactly at the full-sample mean on all model explanatory variables other than group.
            2. The adjusted mean difference is the expected difference in heart rate between any two people selected at random, one from each group, if each group had the same distribution for all model explanatory variables other than group.
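
            In a purely additive linear model these two quantities coincide numerically; they come apart once the model is non-linear in the covariates. A sketch (Python, with a made-up fitted model containing a hypothetical squared-covariate interaction):

```python
import numpy as np

rng = np.random.default_rng(3)
basehr = rng.normal(70, 10, size=10_000)  # hypothetical covariate

# made-up fitted model, non-linear in basehr for the treated group:
# E[y] = 10 + 5*group + 0.8*basehr + 0.001*group*basehr^2
def predict(group, b):
    return 10 + 5 * group + 0.8 * b + 0.001 * group * b**2

# 1. difference at means: both hypothetical people sit at the mean basehr
diff_at_means = predict(1, basehr.mean()) - predict(0, basehr.mean())

# 2. adjusted mean difference: average the per-person difference
#    over the observed covariate distribution
diff_adjusted = (predict(1, basehr) - predict(0, basehr)).mean()

# the two differ by 0.001 * Var(basehr), so they agree only when
# the covariate does not vary
print(diff_at_means, diff_adjusted)
```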
            I appreciate the helpful information you've given so far, and even more so your patience with my rather basic knowledge of regression.
            None of us was born knowing anything about regression. We were all beginners once.



            • #7
              Amazing. Thank you so much for all the help.
