Dear all,
I am new to this forum, so please let me know if I am not making use of its users' help and expertise in the right way.
I am trying to run a regression of CEO salary on several variables, using 10 years of panel data.
One of the control variables will be the industry the CEO is active in, indicated by the SIC description.
I have tried to simply include it by using the following code:
encode SICDescription, gen(nSICDescription)
reg SalaryCEO variable1 variable2 variable3 i.nSICDescription
First of all, please let me know whether this is a legitimate way to use the description as a control variable. Secondly, as there are 354 different industries recognised across 2412 companies, would you recommend consolidating them into bigger groups, and is there a statistical way to organise or explain this grouping?
Another control variable I want to use for predicting the salary is the year in which it was earned. This is a 'continuous' variable ranging from 2010 to 2020, but I assume it has to be treated as a categorical variable too, as the numerical value of the year does not provide any information in itself. Would the following syntax be a good way to go about using it?
tostring Year, gen(stYear)
encode stYear, gen(nstYear)
reg SalaryCEO variable1 variable2 variable3 i.nSICDescription i.nstYear
The results do not seem unexpected, but my use of the syntax seems irregular and inefficient, to say the least.
A final question regarding this subject.
The profit/loss variable obviously cannot be normalised by taking its log, as it contains negative values. Would it be correct to generate a separate loss variable, move all negative values of the original variable into it as positive entries, take the logarithm of both variables, and then use the new logged values in the regression?
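To make the idea concrete, this is roughly what I have in mind. It is only a sketch: ProfitLoss stands in for my actual profit/loss variable, and Profit, Loss, lnProfit and lnLoss are placeholder names for the new variables.

```stata
* Placeholder names: ProfitLoss is the original variable, the rest are new.
* Profits only: missing where the firm made a loss or broke even
generate double Profit = ProfitLoss if ProfitLoss > 0 & !missing(ProfitLoss)
* Losses stored as positive amounts: missing where the firm did not make a loss
generate double Loss = -ProfitLoss if ProfitLoss < 0
* ln() of a missing value is missing, so each observation
* ends up with at most one of the two logs
generate double lnProfit = ln(Profit)
generate double lnLoss = ln(Loss)
```

Written this way, every observation has at most one of the two logged values non-missing, so I am not sure how both could enter the same regression without all observations being dropped.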
Thank you very much in advance for helping me understand these issues and, as mentioned before, do let me know if I should be using this board in a different way!
Best,
Luke