Hi all,
I have a cross-sectional dataset containing firms' annual sales. I'm interested in a regression model to test the effect of R&D spending on a firm's sales. As is usual for income data, sales are positively skewed, so I want to log-transform them before running the regression.
I read a post (http://www.stata.com/statalist/archi.../msg00553.html) suggesting that I should not transform the data to solve the skewness problem; instead, glm may be a better choice. I then checked the Stata manual entry for glm, but I am not sure which family and link function fit my data best. Because my data are annual sales, they are probably not count data, so I wonder whether it is still proper to use poisson or nbreg. Also, sales are not a dummy, a ratio, or a rate, so logit and probit won't be suitable. Finally, I think the gamma or inverse Gaussian family might be suitable, but I am still not sure whether I am choosing the right regression.
Is there any guideline I can follow for choosing a regression command based on the distribution of my data when it is skewed? Also, since this is no longer a simple OLS, how can I use Stata to graph the results of a GLM regression? I have read some examples provided in Stata, but most of them involve counts, ratios, or rates, and few involve continuous data like annual sales. Therefore, some examples would be very helpful for improving my understanding of how to deal with skewed data.
Thank you for your attention and patience with this matter.
Best,
David