  • Which regression? Continuous dependent variable between 0 and 1

    Dear Statalist community,

    I have two different dependent variables for which I need to run regressions.

    My questions are:
    - Which regression model is most suitable for each of the two cases?
    - What follow-up literature would you recommend?

    I am new to Statalist, so any feedback, including on my posting style, is appreciated.

    The original sample provided around 340 observations for each dependent variable. (For the distributions, see the histograms attached at the end of this post.)

    Unfortunately, after including several independent variables, the usable sample shrank to 29 observations in each case.

    1. The first dependent variable is continuous, with a lower bound of 0 and an upper bound of 1. An excerpt:

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float DepVar_1
    .009060956
    .007225433
    .007398274
    .00996016
    .008196721
    .008309846
    .03448276
    .00947672
    .023762377
    .0046189376
    end


    2. The second dependent variable is also continuous with a lower bound of 0, but it has no upper bound. An excerpt:

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float DepVar_2
    13.19481
    11.045174
    15.830535
    15.94977
    13.91649
    18.7512
    12.101977
    12.70576
    9.620095
    10.507575
    end


    I am really grateful for any kind of support!
    Helmut



    Attachments:
    Histo_DepVar_1.png (histogram of DepVar_1)
    Histo_DepVar_2.png (histogram of DepVar_2)

    Last edited by Helmut Siegfried; 01 Jul 2018, 13:31.

  • #2
    First off, if you are going from 340 cases down to 29, I would seriously reassess my data and model. Why are so many cases missing? Is there one variable in particular that has a huge amount of missing data (MD)? Would multiple imputation be an option? For basic and advanced MD techniques, see

    https://www3.nd.edu/~rwilliam/stats3/MD01.pdf

    https://www3.nd.edu/~rwilliam/stats3/MD02.pdf
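
    One way to see which variable costs the most cases is to tabulate missingness before fitting anything. The following is only a sketch: it uses the auto dataset that ships with Stata as a stand-in, and the variable list (price, mpg, rep78, weight) is an illustrative assumption, not the poster's model.

```stata
* Stand-in data: auto.dta ships with Stata; substitute your own variables
sysuse auto, clear

* Missing-value counts for each candidate variable
misstable summarize price mpg rep78 weight

* Which variables tend to be missing together
misstable patterns price mpg rep78 weight

* Number of missing values per observation across the intended regressors
egen n_missing = rowmiss(price mpg rep78 weight)
tabulate n_missing

* Complete cases that would survive listwise deletion
count if n_missing == 0
```

    If a single variable accounts for most of the loss, dropping or imputing that variable may recover far more observations than any change of estimator.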

    For your variable bounded between 0 and 1, some sort of fractional regression model may be appropriate. For some options, see

    https://www3.nd.edu/~rwilliam/stats3...onseModels.pdf
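
    As a rough illustration of the fractional-response idea, the sketch below fits a fractional logit to the ten DepVar_1 values posted in #1. The regressor x1 is a randomly generated placeholder, assumed here only so that the commands run; it is not a variable from the original data, and the estimates carry no substantive meaning.

```stata
* Sketch only: x1 is a hypothetical placeholder regressor
clear
input float DepVar_1
.009060956
.007225433
.007398274
.00996016
.008196721
.008309846
.03448276
.00947672
.023762377
.0046189376
end
set seed 12345
generate x1 = rnormal()

* Fractional logit: models E(DepVar_1 | x1) through a logit link;
* fracreg reports robust standard errors by default
fracreg logit DepVar_1 x1

* Average marginal effect of x1 on the conditional mean
margins, dydx(x1)

* The same quasi-likelihood fit through glm, for comparison
glm DepVar_1 x1, family(binomial) link(logit) vce(robust)
```

    fracreg probit is the analogous command with a probit link. Both keep fitted means inside the unit interval, which regress does not guarantee.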

    OLS regression may be OK for the second DV. It is hard to say without knowing more about it, but a lower bound of 0 does not automatically concern me.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam


    • #3
      Dear Richard,

      thank you for the quick reply and your feedback. I will certainly look into your recommendations.
      The sample-size problem originates from merging two datasets, where identifiers in the second dataset are very scarce.

      Again, thank you very much!
      Helmut
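
      When a merge is the source of the lost cases, the _merge variable shows how many observations matched. The sketch below builds two tiny hypothetical datasets (the names id, y, and x1 and all values are invented for illustration) and inspects the match rate.

```stata
* Hypothetical second dataset in which identifiers are sparse
clear
input id x1
2 5
4 7
end
tempfile second
save `second'

* Hypothetical main dataset
clear
input id y
1 .1
2 .2
3 .3
4 .4
5 .5
end

* One-to-one merge on the identifier; _merge records the match status
merge 1:1 id using `second'
tabulate _merge

* Keep matched observations only, as a regression on y and x1 would
keep if _merge == 3
count
```

      Because the tempfile macro is local, the block is meant to be run as a whole from a do-file. Tabulating _merge before dropping anything makes it clear how much of the sample loss comes from the merge itself rather than from missing values in individual variables.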


      • #4
        For your second case I would check out Poisson regression. You don't have response values very close to zero, but whenever a negative prediction would be absurd, there can be value in ruling it out. The functional form can make more sense anyway. The response doesn't have to be a count; that is a common myth.

        See e.g. https://blog.stata.com/2011/08/22/us...tell-a-friend/ (and some of the discussion).
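
        A minimal sketch of that suggestion, using the ten DepVar_2 values posted in #1. The regressor x1 is a randomly generated placeholder, assumed only so that the commands run; it is not part of the original data.

```stata
* Sketch only: x1 is a hypothetical placeholder regressor
clear
input float DepVar_2
13.19481
11.045174
15.830535
15.94977
13.91649
18.7512
12.101977
12.70576
9.620095
10.507575
end
set seed 12345
generate x1 = rnormal()

* Poisson regression with robust SEs: models E(DepVar_2 | x1) = exp(xb),
* so fitted means are always positive; Stata notes the noncount response
poisson DepVar_2 x1, vce(robust)
predict mu_hat, n

* Linear regression for comparison; its predictions are not bounded below
regress DepVar_2 x1, vce(robust)
```

        The vce(robust) option matters here: with a noncount response the Poisson variance assumption is not expected to hold, so only the conditional mean is being modeled.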


        • #5
          Dear Nick,

          thank you for this valuable additional feedback!

          Helmut


          • #6
            Dear Richard and Nick,

            please forgive this follow-up question.

            I understand that fractional probit/logit models are estimated by maximum likelihood, which has good large-sample properties.

            Is fracreg also superior to regress in a small sample setting like mine?


            Again, thank you very much for your response. Your help is greatly appreciated.

            Helmut


            • #7
              I don't see sample size as an issue here. I wouldn't recommend or advise against any of these methods as being particularly good or bad for large or small samples.


              • #8
                As usual, thanks for your valuable insights!
