  • Categorical variables

    Dear everyone,

    I am new to Stata and am a little confused about categorical variables and how to use them in a logit regression. All my data is already in numerical form. My dependent variable y takes values from 0,1,2. Similarly, x1 takes values from 1-6. These variables already denote categories; for instance, for x1 the value 1 is allocated to those under 18. Does this mean that x and y are already categorical variables?

    I have read that in order to signal to Stata that a variable is categorical, I should put "i." in front of the variable when running the regression. Also, as my dependent variable is categorical, I tried the following command:

    logit i.BTAE i.MScat6 i.gender i.Exp i.Agecat i.Finback i.Prof

    but I get the following error: depvar may not be a factor variable r(198);

    Is it that my dependent variable is not categorical, or how do I signal to Stata that it is? Also, should a categorical variable have a column for each category, where it takes the value 1 if n is in that respective category and 0 otherwise (so for a categorical variable 0,1,2 I would have 3 columns)?

    Thank you very much in advance!

    Best regards,
    Eliss

  • #2
    Everything seems fine with your code, except that you cannot do a -logit- if you have more than 2 categories in the dependent variable.

    And yes, Stata does not accept factor variables in the place of the dependent variable.
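
    A minimal sketch of both points, using the nlsw88 example dataset shipped with Stata rather than the data from the question (union, race, and age are stand-ins, not the original BTAE, MScat6, and so on). It reproduces the r(198) error, fits the same model with the i. prefix on the regressor only, and then shows that hand-made 0/1 columns, one per category, give the same fit, so there is no need to build them by hand:

    ```stata
    * Example data shipped with Stata (an assumption; not the poster's data)
    sysuse nlsw88, clear

    * i. on the dependent variable is rejected with r(198)
    capture noisily logit i.union i.race age
    display "return code: " _rc

    * Binary outcome entered as is; i. marks the categorical regressor
    logit union i.race age
    matrix b_factor = e(b)

    * The by-hand equivalent: one 0/1 indicator column per category of race
    tabulate race, generate(race_)
    * Omit race_1 so that it serves as the base category, as i.race does
    logit union race_2 race_3 age
    matrix b_manual = e(b)

    * Difference between the two estimates for the second race category
    display b_factor[1,2] - b_manual[1,1]
    ```

    The i. prefix only ever goes on the right-hand side; the dependent variable is always typed as a plain variable name, whichever estimation command is used.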



    • #3
      Dear Joro Kolev,

      thank you for your response. I realise now I should have used the ologit regression. However, would it be possible to use a simple regression in my case just with using the variables as continuous variables? Would the results be meaningful to interpret?

      thank you!



      • #4
        Hi Eliss, I cannot tell you that because you never said what your dependent variable is. You said "dependent variable y takes values from 0,1,2", but you did not say what those numbers mean. There are (in my mind at least) three cases:

        a) The numbers 0,1,2 are meaningful numerical measurements, with some notion of distance on them, e.g., the distance from 0 to 1 is the same as the distance from 1 to 2. E.g., if your dependent variable is number of cars a family has, and your observations are families, then simple regression is fine. (Some pedants might rightfully point out that the range of your dependent variable is limited, so OLS is not the most appropriate method here... But fundamentally it is a possibility to estimate this by OLS.)

        b) The numbers 0,1,2 are ordered, but there is no notion of distance on them. E.g., 0 no car, 1 few cars, 2 many many cars. The distance here from 0 to 1 is not the same as the distance from 1 to 2. You need to use -oprobit- or -ologit-, that is, an ordered categorical model. OLS will not do.

        c) The numbers 0,1,2 denote completely arbitrary categories. E.g., 0 red cars, 1 blue cars, 2 green cars. Then you need -mprobit- or -mlogit- (the former is better because it makes fewer assumptions). -oprobit-, -ologit-, and OLS will not do in this case.
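
        The three cases map onto three estimation commands. A rough sketch on the auto dataset shipped with Stata, where rep3 is a hypothetical three-level outcome coded 0, 1, 2, built from rep78 purely for illustration (which command is appropriate still depends on what the codes mean in your own data):

        ```stata
        * Example data shipped with Stata (an assumption; not the poster's data)
        sysuse auto, clear

        * Hypothetical outcome coded 0, 1, 2, collapsed from the 1-5 repair record
        recode rep78 (1 2 = 0) (3 = 1) (4 5 = 2), generate(rep3)

        * a) codes are equally spaced measurements: linear regression
        regress rep3 mpg weight

        * b) codes are ordered, spacing unknown: ordered logit
        ologit rep3 mpg weight

        * c) codes are arbitrary labels: multinomial logit, category 0 as base
        mlogit rep3 mpg weight, baseoutcome(0)
        ```

        Replacing -ologit- with -oprobit- leaves the syntax unchanged.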



        Originally posted by Eliss Millen
        Dear Joro Kolev,

        thank you for your response. I realise now I should have used the ologit regression. However, would it be possible to use a simple regression in my case just with using the variables as continuous variables? Would the results be meaningful to interpret?

        thank you!



        • #5
          Thank you very much for the clarification, it has helped a lot!



          • #6
            Hello, I am trying to run a time series regression using the keep($controls) command. However, Stata tells me "factor-variable and time-series operators not allowed" even though none of my control variables are categorical. Why might this be? Below is my code:

            global controls lnprice lvwprice chinf L.chinf lrate lpratio

            est clear

            eststo: quietly xtreg lpbevreg c.eincentive1000##Hhy1000 $controls i.year, robust
            estadd local FE "No"

            eststo: quietly xtreg lpbevreg c.eincentive1000##Hhy1000 $controls i.year, fe robust
            estadd local FE "Yes"

            eststo: quietly xtreg lpbevreg c.incentive100##Hhy1000 $controls i.year, robust
            estadd local FE "No"

            eststo: quietly xtreg lpbevreg c.incentive100##Hhy1000 $controls i.year, fe robust
            estadd local FE "Yes"


            esttab,
            keep($controls)
            starl(* 0.10 ** 0.05 *** 0.01) label noobs nonotes nomtitle collabels(none) compress
            scalars("r2_a" "FE Fixed effects")


            esttab using "lpbevregfe.tex", replace
            keep($controls)
            starl(* 0.10 ** 0.05 *** 0.01)
            label booktabs noobs nonotes nomtitle collabels(none) compress alignment(D{.}{.}{-1})
            scalars("r2_a" "FE Panel fixed effects")
