  • Question regarding turning numerical variables into factor variables

    Dear all,

    I am new to this forum, so do let me know if I'm not making use of its users' help and expertise in the right way.
    I am trying to run a regression of CEO salary on several variables, using 10 years of panel data.

    One of the control variables will be the industry the CEO is active in, indicated by the SIC description.
    I have tried to simply include it by using the following code:

    encode SICDescription, gen(nSICDescription)
    reg SalaryCEO variable1 variable2 variable3 i.nSICDescription


    Please let me know, first of all, whether this is a legitimate way to use the description as a control variable. Secondly, as there are 354 different industries recognised across 2412 companies, would you recommend consolidating them into bigger groups, and is there a statistical way to organise/explain this?

    Another control variable I want to use for predicting the salary is the year in which it was earned. This is a 'continuous' variable ranging from 2010 to 2020, but I assume it has to be used as a categorical variable too as the numerical value of the year does not provide any information in itself. Would the following syntax be a good way to go about using it?

    tostring Year, gen(stYear)
    encode stYear, gen(nstYear)
    reg SalaryCEO variable1 variable2 variable3 i.nSICDescription i.Year


    The results do not seem unexpected, but the syntax seems irregular and inefficient at best.

    A final question regarding this subject.
    The profit/loss variable obviously cannot be normalised by taking its logarithm, as it contains negative values. Would it be correct to generate a separate loss variable, move all negative values of the former variable into it as positive entries, take the logarithm of both variables, and use the new logged values in the regression?

    Thank you very much in advance for helping me understand these issues and, as mentioned before, do let me know if I should be using this board in a different way!

    Best,

    Luke
    Last edited by Luke Schreuder; 06 Mar 2022, 10:04. Reason: Factor variables

  • #2
    Luke:
    welcome to this forum.
    1) if you have panel data with a continuous regressand, -regress- should not be your first choice. See -xtreg- instead;
    2)
    Code:
    encode SICDescription, gen(nSICDescription)
    is correct if your original variable was in -string- format. In addition, see the cautionary tale in the -encode- entry of the Stata .pdf manual;
    3) if -Year- is already numeric, just go -i.Year-;
    4) normality is a weak requirement, and it concerns the residual distribution only. I would stick with your -profit_loss- in its original metric.
    Kind regards,
    Carlo
    (StataNow 18.5)
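
    Points 1) and 3) above could be combined roughly as follows. This is only a sketch under stated assumptions: CompanyID is a hypothetical name for a numeric firm identifier that does not appear in the thread, and the -fe- option is an illustrative choice, not something the reply prescribes.

    ```stata
    * Sketch only. CompanyID is a hypothetical numeric firm identifier (not named in the thread).
    * Declare the panel structure: firm as the panel variable, Year as the time variable.
    xtset CompanyID Year

    * Fixed-effects panel regression with year dummies via factor-variable notation.
    * i.Year works directly because Year is numeric; no tostring/encode step is needed.
    xtreg SalaryCEO variable1 variable2 variable3 i.Year, fe
    ```

    With -fe-, regressors that do not vary within a firm over time (an industry indicator such as i.nSICDescription would typically be one) are dropped as collinear with the firm effects; -re- keeps them, at the cost of stronger assumptions.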

    • #3
      Dear Carlo,

      Thank you for your quick and concise answer. I am quite unfamiliar with -xtreg-, so I will make sure to read the manual (and review the -encode- entry, as you advise).
      As I understand from your answer, Stata will no longer assign meaning to the numerical value of the observations if the i. prefix is used in the regression command, so I'll skip the tostring transformation on the Year variable.

      I'll continue working on the regression with this new information and return to the forum once I have reached my desired outcome or additional questions arise!

      Kind regards,
      Luke
