  • OLS with categorical independent variables: assumptions

    Hi everybody!
    I'm running an OLS analysis with a continuous dependent variable and 8 categorical independent variables. Could you please help me with the assumptions of the method and how to check them in Stata? Also, if they are not met, how should I proceed?

    thanks!

  • #2
    Lina:
    welcome to the list.
    Unfortunately, your question is too vague to receive a helpful reply.
    Please take a look at the FAQ on how to post effectively: it's difficult to reply to your query if you do not post what you typed and what Stata gave you back.
    The assumptions of OLS are covered in any decent textbook on basic statistics and econometrics.


    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thanks for this, Mr. Lazzaro. The dependent variable is expenditure and the independent variables are demographics; some of these have 2 categories (e.g. male/female) whereas others have more than 2 categories (e.g. employment status). I know the assumptions of OLS, but first of all I'd like to ask whether I have to create dummy variables for the IVs with more than 2 categories, or whether I can run the regression in Stata without any transformations. Also, how do I check whether there is a linear relationship between the DV and the IVs, and what if the DV is not normally distributed? I have about 3,500 observations, if this is helpful.

      Best,
      Lina



      • #4
        Lina:
        thanks for providing more details, which allow some remarks on your query:
        - you don't have to bother creating indicator (dummy) variables by hand, as Stata has convenient factor-variable notation for that task: see -help fvvarlist-;
        - for checking the linear relationship between DV and IVs, Stata has lots of visual and analytical methods: please see -regress postestimation-;
        - you don't have to worry about the DV not being normally distributed, because the normality assumption in OLS concerns the residuals, not the DV.
        I find it hard to go further without seeing what you typed and what Stata gave you back (there's a FAQ explaining why this increases your likelihood of receiving helpful replies). The gist of the matter is that we know neither your code nor your results until you make them available to us (there's another FAQ on code delimiters, the best way to paste what you're going to post).
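As an illustration of what factor-variable notation does behind the scenes, here is a minimal Python sketch (not Stata code). The helper `make_indicators` is hypothetical, and the 1 to 4 coding of marital status is made up. Following Stata's default for `i.varname`, the sketch treats the lowest level as the omitted base category, so a variable with k levels becomes k - 1 indicators.

```python
def make_indicators(values, base=None):
    """Hypothetical helper: one 0/1 indicator per level, except the base level.

    By default the base is the lowest level, as with Stata's i.varname.
    Returns the non-base levels and one row of indicators per observation.
    """
    levels = sorted(set(values))
    if base is None:
        base = levels[0]
    kept = [lv for lv in levels if lv != base]
    rows = [[1 if v == lv else 0 for lv in kept] for v in values]
    return kept, rows


# Made-up coding: 1 = never married, 2 = married, 3 = widowed, 4 = divorced
status = [1, 2, 2, 3, 4, 1]
kept, rows = make_indicators(status)
print(kept)     # [2, 3, 4]: level 1 is the omitted base category
print(rows[0])  # [0, 0, 0]: a never-married respondent
print(rows[1])  # [1, 0, 0]: a married respondent
```

In Stata itself none of this is needed. Using hypothetical variable names, `regress y i.status` builds the indicators on the fly, and `ib4.status` would make level 4 the base instead.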
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          For categorical explanatory variables you don't have to check for linearity, as it is impossible to violate that assumption: categorical variables are turned into a set of indicator (dummy) variables if you use the factor-variable notation (see: help fvvarlist, as Carlo already mentioned). With each indicator variable you just compare two points (conditional means), and you can always connect two points with a straight line.
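Here is a quick numerical check of that argument, written as a Python sketch on made-up numbers (this is not Stata output). Regressing an outcome on a single 0/1 indicator gives an intercept equal to the mean of the 0 group and a slope equal to the difference between the two group means. The fitted "line" therefore passes exactly through both conditional means.

```python
# Made-up data: expenditure for two groups coded 0/1
male = [0, 0, 0, 1, 1, 1, 1]
spend = [120.0, 80.0, 100.0, 90.0, 70.0, 110.0, 50.0]


def simple_ols(x, y):
    """Intercept and slope of y = a + b*x by ordinary least squares."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


a, b = simple_ols(male, spend)

# Conditional means of each group
mean_f = sum(y for x, y in zip(male, spend) if x == 0) / male.count(0)
mean_m = sum(y for x, y in zip(male, spend) if x == 1) / male.count(1)

print(round(a, 6), round(b, 6))            # 100.0 -20.0
print(round(mean_f, 6), round(mean_m, 6))  # 100.0 80.0
```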
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------



          • #6
            Thank you so much for your helpful comments! So, first I use fvvarlist to create the dummy variables from the categorical IVs, then I run the regression, and then I check the assumptions through regress postestimation, right?
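The last step of that workflow checks the residuals, not the DV. Below is a Python sketch on simulated data (not Stata output) that illustrates why. The data-generating numbers are assumptions chosen to make the point. The DV is strongly bimodal because two groups differ a lot, yet the residuals from regressing it on the group indicator behave like the normal errors that generated them. The helper `share_near_center` is hypothetical. In Stata, the analogous check would be something like `predict res, residuals` followed by `qnorm res` or `rvfplot`.

```python
import random
import statistics

rng = random.Random(2015)            # fixed seed so the run is reproducible
n = 3500
d = [rng.randint(0, 1) for _ in range(n)]              # hypothetical 0/1 regressor
y = [200 + 800 * di + rng.gauss(0, 50) for di in d]   # normal errors, sd = 50

# OLS of y on d (closed form for a single regressor plus constant)
md, my = statistics.fmean(d), statistics.fmean(y)
b = (sum((di - md) * (yi - my) for di, yi in zip(d, y))
     / sum((di - md) ** 2 for di in d))
a = my - b * md
resid = [yi - (a + b * di) for di, yi in zip(d, y)]


def share_near_center(v, k=0.5):
    """Share of values within k standard deviations of their mean.

    For normally distributed values and k = 0.5 this is about 0.38.
    """
    m, s = statistics.fmean(v), statistics.stdev(v)
    return sum(abs(x - m) <= k * s for x in v) / len(v)


print(round(share_near_center(y), 3))      # near 0: the DV is bimodal, not normal
print(round(share_near_center(resid), 3))  # near 0.38: the residuals look normal
```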



            • #7
              Lina:
              yes, you're right.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Lina,

                I would strongly recommend a book by Berry (Berry, W. D. 1993. Understanding Regression Assumptions. Vol. 92. Sage), which I find very helpful. Once you get the idea behind the assumptions, Stata can provide you with all the tools necessary to test them.

                Anton



                • #9
                  Lina:
                  expanding on Anton's reference list, I would recommend another valuable (and pleasantly short) textbook on this topic: Allison, P. D. 1999. Multiple Regression: A Primer. Thousand Oaks, CA: Pine Forge Press.
                  Last edited by Carlo Lazzaro; 18 Jul 2015, 11:41.
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    Many, many thanks to all of you! I really appreciate it!



                    • #11
                      Hello again!
                      I hope all of you are fine.
                      Could you please let me know whether this result is OK and whether I interpret it correctly?
                      HEALTHRT is health expenditure and it is continuous (DV)
                      MB02 gender (male/female)
                      maritalst marital status (never married/married/widowed/divorced)
                      Is the command right for the OLS analysis?

                      If so, then as for the interpretation:
                      there is NO significant difference in the expenditure of the two genders (what about the minus sign "-" on the male coefficient?)
                      Married and never married have significant effects on health expenditure contrary to the widowed.
                      The effect of divorced is also significant, judging by the constant (setting everything else equal to zero in the model).
                      Married paid 528.09 more on health than divorced did while never married 322.10 less than the reference category.
                      The same for widowed.
                      Is it OK? Is the weight OK?




                      • #12
                        Lina:
                        your regress code seems right for the purpose of your analysis.
                        However, your interpretation of the coefficients is not always correct:
                        - other things being equal, males spend less than females, but there's no evidence that the difference is statistically significant;
                        - other things being equal, "Married and never married have significant effects on health expenditure contrary to the widowed", "Married paid 528.09 more on health than divorced did while never married 322.10 less than the reference category" (i.e. widowed); I'm not sure if your results are in line with the literature;
                        - the constant refers to divorced females only;
                        - I can't comment on the correctness of the weights.
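To see why the constant refers to divorced females, here is a Python sketch with made-up, noise-free data. These numbers are illustrative only and are not Lina's results. Female and divorced are the omitted base categories, so the constant is the predicted expenditure for a divorced female. Each marital-status coefficient is a difference from divorced, holding gender fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
male = rng.integers(0, 2, size=n)      # 1 = male, 0 = female (base)
status = rng.choice(["never", "married", "widowed", "divorced"], size=n)

# Made-up, noise-free additive model so OLS recovers the coefficients exactly;
# divorced is the base category for marital status
y = (900.0 - 60.0 * male
     + 400.0 * (status == "married")
     - 200.0 * (status == "never")
     + 150.0 * (status == "widowed"))

X = np.column_stack([
    np.ones(n),                # _cons
    male,                      # male vs female
    status == "married",       # married vs divorced
    status == "never",         # never married vs divorced
    status == "widowed",       # widowed vs divorced
]).astype(float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

names = ["_cons", "male", "married", "never married", "widowed"]
for name, b in zip(names, beta):
    print(f"{name:>14} {b:9.2f}")

# The constant is the fitted value for a divorced female: all indicators are 0
divorced_female = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
print(divorced_female @ beta)
```

A negative coefficient, such as the made-up -60 for male here, simply means that group's predicted expenditure is lower than the base group's, other things being equal.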

                        For the future, please post what you typed and what Stata gave you back using code delimiters (see the FAQ on this topic). Thanks.
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #13
                          Great! Many thanks for your response. I'm not interested in whether the results are in line with the literature (actually I expect them not to be, since the data are pooled from different sources); I simply want to see whether the command and the interpretation are correct. It was a trial. Does the p-value (e.g. for male) tell me whether the effect of being male on expenditure is significant? And what about the weights? The dataset is from a household budget survey and the weights are provided by the survey source; I haven't calculated anything myself. Do you have any suggestions on how I could use the weights?
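On the p-value question, here is a minimal Python sketch of how a two-sided p-value follows from a coefficient and its standard error. The sketch assumes a normal approximation. Stata's -regress- actually uses the t distribution with the residual degrees of freedom, but with about 3,500 observations the two are practically identical. The coefficient and standard error below are hypothetical.

```python
import math


def two_sided_p(coef, se):
    """Two-sided p-value for H0: coef = 0, using the normal approximation."""
    z = coef / se
    # 2 * (1 - Phi(|z|)), written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical male coefficient and standard error (made-up numbers)
p = two_sided_p(-40.0, 35.0)
print(f"p = {p:.3f}")  # p = 0.253: no evidence of a difference at the 5% level
```

A p-value below the chosen level (e.g. 0.05) is evidence that the coefficient differs from zero. A larger p-value, as in this hypothetical case, means there is no evidence of a difference, not evidence that there is none.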

                          Thank you,

                          Lina



                          • #14
                            One last query: if the F statistic is missing (a blue link appears in its place, but it is not clear what it means), what does that mean for my model?
                            I saw some posts on the forum but I couldn't understand them.



                            • #15
                              Lina:
                              the reason why your F-statistic is missing is well covered in this thread: http://www.stata.com/statalist/archi.../msg00685.html.
                              If your data come from a survey, you should take a look at the -svy- prefix (see -help svy-).
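On the weights question, here is a Python sketch, on made-up data, of what survey weights do to the point estimates. It assumes the weights are sampling (probability) weights. Weighted least squares with those weights gives the same coefficients as Stata's `regress y x [pweight=w]`, or `svy: regress` after `svyset` with that weight (variable names hypothetical). The sketch does not compute correct standard errors. Those depend on the survey design (strata, clusters), which is why the -svy- prefix is the recommended route.

```python
import numpy as np

# Made-up data: expenditure, a 0/1 indicator, and hypothetical sampling weights
y = np.array([100.0, 120.0, 80.0, 300.0, 260.0, 280.0])
d = np.array([0, 0, 0, 1, 1, 1])
w = np.array([1.0, 3.0, 1.0, 2.0, 1.0, 1.0])

X = np.column_stack([np.ones_like(y), d])   # constant + indicator

# Weighted least squares: solve (X'WX) b = X'Wy without forming diag(w);
# X.T * w multiplies each observation's column of X.T by its weight
XtW = X.T * w
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)

# Unweighted OLS for comparison
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_wls)  # constant = weighted mean of the d == 0 group, slope = weighted difference
print(beta_ols)  # constant = plain mean of the d == 0 group, slope = plain difference
```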
                              Kind regards,
                              Carlo
                              (Stata 19.0)
