  • What regression model is suitable for a dataset with mixed types of variables?

    Hello Statalist!

    As the title suggests, I'm unsure of how I can perform a regression analysis with different types of variables. My dataset contains answers from a self-constructed survey, where I have gathered information on the following independent variables:
    • gender (binary),
    • age (continuous),
    • occupational status (binary, it's simply 'student' or 'non-student'),
    • educational level (ordinal, 1 to 3),
    • adults in household (ordinal, 1 to 4),
    • household income (categorical, 1 to 7), and
    • a financial literacy score (continuous, within the range 0 to 1).
    Apart from these, I also have a measure of risk aversion for each individual; this is the dependent variable. This value (within the range 1 to 5) is the average of each individual's answers to three separate questions, and it is around 3 on average in my particular sample.

    So, to put it simply, I want to see how the independent variables affect the risk aversion measure, but I am not sure whether a regular OLS regression is appropriate given these variables.

    Do you have any advice? I apologize if the answer should be obvious. Have a good day.

  • #2
    John:
    welcome to this forum.
    Is your dependent variable ordered (something like: 1=worst; 2=reasonable.....; 5=best)?
    Kind regards,
    Carlo
    (StataNow 18.5)

    • #3
      Originally posted by Carlo Lazzaro
      John:
      welcome to this forum.
      Is your dependent variable ordered (something like: 1=worst; 2=reasonable.....; 5=best)?

      Thank you, Carlo.

      In short, yes. To provide some context, the values 1 to 5 represent different levels of constant relative risk aversion (CRRA) utility. A value of 1 indicates relatively low risk aversion, suggesting that an individual with such a value is inclined to take more risks. In contrast, a value of 5 indicates relatively high risk aversion.

      In the survey, I asked participants three questions that let me estimate their level of risk aversion. Each question has five alternatives, each corresponding to a level of risk aversion from 1 to 5. From these answers I have calculated an average level of risk aversion for each individual. While it is possible to run a separate regression for each of the three questions, I'm uncertain about the most appropriate approach. I would be grateful for any guidance on this.

      • #4
        It's the flavour of the outcome (dependent variable) that is crucial here. I guess most researchers looking at such data would start with ologit or oprobit. The use of plain or vanilla regression is hard to defend, as it would take those grades as equally spaced.
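
        A minimal sketch of that starting point, fitted to a single 1-5 item. All variable names here (risk_q1, female, age, student, educ, adults, income, finlit) are hypothetical stand-ins, since the thread does not show the actual dataset:

        ```stata
        * Hypothetical variable names; replace them with those in your dataset.
        * i. marks categorical predictors, c. marks continuous ones.
        ologit  risk_q1 i.female c.age i.student i.educ i.adults i.income c.finlit
        oprobit risk_q1 i.female c.age i.student i.educ i.adults i.income c.finlit
        ```

        Entering educ, adults and income with i. treats them as unordered categories, which avoids assuming equal spacing on the predictor side as well. That is a modelling choice made for this sketch, not something the thread prescribes.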

        (This looks a bit like an assignment. Please note our comments in the FAQ Advice on such matters. Also, this was cross-posted at https://www.reddit.com/r/stata/comme...for_a_dataset/ It's a rule there, and a request here, that you tell people about cross-posting.)

        • #5
          Originally posted by Nick Cox
          It's the flavour of the outcome (dependent variable) that is crucial here. I guess most researchers looking at such data would start with ologit or oprobit. The use of plain or vanilla regression is hard to defend, as it would take those grades as equally spaced.

          (This looks a bit like an assignment. Please note our comments in the FAQ Advice on such matters. Also, this was cross-posted at https://www.reddit.com/r/stata/comme...for_a_dataset/ It's a rule there, and a request here, that you tell people about cross-posting.)

          Hey Nick. Thank you for your input.

          This is not a regular assignment (homework) question. A colleague and I are writing a master's thesis on this subject and are unsure about the correct regression model. As for the cross-posting, I was unaware of these rules and I do apologize.

          • #6
            Telling us about cross-posting is, as said, requested here. The only rules are unwritten rules.

            https://www.statalist.org/forums/help is where to start, as every prompt advises.

            • #7
              The result of averaging over 3 Likert-type items is usually considered a quasi-interval scale, at least in the social sciences. Thus, a linear model might well be a reasonable starting point. You might want to run an ordered model as a kind of robustness check. This is what you would often do when writing a paper; I am not sure about a master's thesis.
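
              One way this could look in practice, as a sketch only. The variable names (risk_avg for the averaged score, plus the predictors) are hypothetical, and ologit is used here as the ordered model, although oprobit would serve equally well:

              ```stata
              * Hypothetical variable names; risk_avg is the average of the three items.
              * Linear model as the starting point.
              regress risk_avg i.female c.age i.student i.educ i.adults i.income c.finlit
              estimates store linear

              * Ordered model as a robustness check. ologit uses only the ordering
              * of the distinct values of risk_avg, not the distances between them.
              ologit risk_avg i.female c.age i.student i.educ i.adults i.income c.finlit
              estimates store ordered

              * Side-by-side table with significance stars.
              estimates table linear ordered, star
              ```

              The coefficients are on different scales (units of the score versus log odds), so the comparison is about signs and significance rather than magnitudes.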

              • #8
                I did some research on Likert data in the past and, as Dan said, the typical approach is to treat the combination of multiple Likert questions as continuous. Otherwise, you've got 3 models with an ordered DV, which might be tricky to interpret (I'd try it since it's your thesis, just to learn something and maybe offer something interesting to the literature).

                While I have no support for it, you might try summing the responses to the three questions if they all aim at measuring the same sort of thing.
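
                A rough sketch of how the sum could be built, with one common check of whether the items hang together. The item names risk_q1 to risk_q3 are hypothetical, and Cronbach's alpha is my suggestion for the check rather than anything taken from this thread:

                ```stata
                * Hypothetical item names for the three 1-5 questions.
                * Cronbach's alpha with item-level statistics.
                alpha risk_q1 risk_q2 risk_q3, item

                * Summed score; missing if any of the three items is missing.
                generate risk_sum = risk_q1 + risk_q2 + risk_q3

                * Averaged score; rowmean() averages over the nonmissing items.
                egen risk_avg = rowmean(risk_q1 risk_q2 risk_q3)
                ```

                With complete data the sum is just three times the average, so the two scores carry the same information.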

                Anything you do will be subject to criticism as ordered responses are not continuous. Look to the literature for guidance.

                • #9
                  John:
                  a bit off topic here, but if you go -regress- I would consider both the linear and the squared terms for -age- and search for a possible turning point.
                  Kind regards,
                  Carlo
                  (StataNow 18.5)
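
                  A sketch of the age suggestion above, using factor-variable notation for the squared term. Variable names are hypothetical:

                  ```stata
                  * Hypothetical variable names; c.age##c.age adds age and age squared.
                  regress risk_avg c.age##c.age i.female i.student i.educ i.adults i.income c.finlit

                  * Turning point of the quadratic in age: -b_age / (2 * b_agesq),
                  * with a delta-method standard error.
                  nlcom -_b[age] / (2 * _b[c.age#c.age])
                  ```

                  The turning point is only meaningful if it falls within the range of ages observed in the sample.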

                  • #10
                    Thank you for your insights, Carlo Lazzaro, Nick Cox, daniel klein and George Ford. My colleague and I have tried several methods, but it seems that an -oprobit- regression followed by -mfx, predict(outcome(n))- gives us a good foundation for our analysis.

                    • #11
                      mfx has not been a part of official Stata for over a decade now. Use margins instead.
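
                      A minimal sketch of the margins equivalent, assuming hypothetical variable names and picking outcome 3 purely for illustration:

                      ```stata
                      * Hypothetical variable names; risk_q1 is one 1-5 item.
                      oprobit risk_q1 i.female c.age i.student i.educ i.adults i.income c.finlit

                      * Average marginal effects on Pr(risk_q1 == 3).
                      margins, dydx(*) predict(outcome(3))

                      * Marginal effects evaluated at the means of the covariates instead.
                      margins, dydx(*) predict(outcome(3)) atmeans
                      ```

                      By default margins reports average marginal effects; the atmeans option gives effects at the covariate means, which is closer to what mfx reported.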

                      • #12
                        Originally posted by John Draper
                        To provide some context, the values 1 to 5 represent different levels of constant relative risk aversion (CRRA) utility.

                        Given the initialism, I guess that there is a body of literature involving this type of outcome in your field of study. You might want to start there for guidance about how to analyze this kind of outcome if a consideration is to gain acceptance of your approach among your peers.

                        Originally posted by John Draper
                        I . . . have a measure of risk aversion for each individual, this is the dependent variable. This value (within the range 1 to 5) is based on calculations from three separate questions where I have calculated an average . . .

                        On the other hand, if the convention in your field of study doesn't foreclose the possibility, then your colleague and you might want to consider fitting a MIMIC model using gsem, with each of the three individual questions' responses as an indicator variable.

                        Although it does require an adequate sample size, fitting such a MIMIC model is not that difficult mechanically: see Example 36g in the user's manual (Stata Structural Equation Modeling Reference Manual) for details about how to go about it.

                        The advantage of this approach, as opposed to averaging or summing the three questions' responses, is that you don't need to assume that each question's response weighs equally in determining the risk-aversion score.
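
                        A sketch of what such a model might look like. The item and predictor names are hypothetical, RiskAv is an arbitrary name for the latent variable (gsem expects latent names to begin with a capital letter), and the choice of ologit for the measurement part is an assumption of this sketch:

                        ```stata
                        * Hypothetical variable names. The three 1-5 items are ordinal
                        * indicators of the latent RiskAv, which is in turn regressed
                        * on the observed covariates.
                        gsem (risk_q1 risk_q2 risk_q3 <- RiskAv, ologit) ///
                             (RiskAv <- i.female c.age i.student i.educ i.adults i.income c.finlit)
                        ```

                        The loadings of the items on RiskAv are estimated rather than fixed to be equal (apart from the one constrained for identification), which is what relaxes the equal-weighting assumption.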
