I'm attempting to do a multiple linear regression of outcome calcium levels with ~10 predictor variables, one of which is a continuous variable and the rest binary.
On checking assumptions, the normality assumption is not met.
My data contains 1611 observations with mean 128.95, SD 355.61 and maximum 4930. It contains 768 observations that equal 0. The minimum positive calcium level value is 1.
A log transformation of the outcome fixes the normality issue but obviously does not work for the large share of observations (768 of 1611, about 48%) that equal 0. One of my main aims is for the model to be explanatory and easy to understand, as it is for medical purposes.
I've attached the rvfplot and normality plot below.
I understand there is a lot of controversy around using a log(x+1) transformation. Would it be appropriate to apply this transformation to my data, or are there better alternatives?
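To make the transformation I'm asking about concrete, here is a minimal sketch in Python (used purely for illustration; it is not the software I ran the regression in). The array `calcium` is a hypothetical handful of values picked from the summary statistics above, not my actual data.

```python
import numpy as np

# Illustrative values only (hypothetical, not the real dataset): zero, the
# smallest positive value (1), a value near the reported mean, and the
# reported maximum (4930).
calcium = np.array([0.0, 1.0, 129.0, 4930.0])

# log(x + 1): defined at zero, and maps 0 to exactly 0.
log_calcium = np.log1p(calcium)

# Inverse transform, exp(y) - 1, returns values to the original scale.
recovered = np.expm1(log_calcium)

# Share of zero outcomes reported above: 768 of 1611 observations.
zero_share = 768 / 1611

print(log_calcium)
print(recovered)
print(round(zero_share, 3))
```

Because log(0 + 1) = 0, all 768 zero observations still sit at a single value after the transformation, which is part of why I'm unsure it is appropriate here.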