  • Discrete explanatory variable

    Hello,

    I'm running a fractional response model. One of the explanatory variables of women's status is discrete in nature. Its values lie in the interval [0,6], and higher values imply a higher status. Can I treat it as a continuous variable?

  • #2
    Maybe. If you do treat it as a continuous variable, you will be constraining the model to the following assumptions: the difference between outcomes associated with, say, values 3 and 4, is the same as the difference between outcomes associated with values 1 and 2, or 2 and 3, or 4 and 5, or 5 and 6. Same here means same direction and same size. If this is true, or close to true, then treating it as a continuous variable is fine. One way to check this is to first use it as a discrete variable and then examine the coefficients you get for each level. If each coefficient differs from the preceding one by roughly the same amount, then treating the variable as continuous will be fine. At the other extreme, if going from one level to the next sometimes takes the outcome in different directions, you would be ill advised to treat the variable as continuous.
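
    As a minimal sketch of that check (assuming Stata 14+'s -fracreg-, and hypothetical names: y for the fractional outcome, wstatus for the 0-6 regressor), the two specifications might look like:

    Code:
     fracreg probit y i.wstatus    // one coefficient per level
     * if successive level coefficients differ by roughly equal amounts,
     * the linear-in-levels version below is defensible
     fracreg probit y c.wstatus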

    Comment


    • #3
      Sure, I'll check that. But I have four such variables: two of them take values from 0-6, one from 0-14, and the last one from 0-3. If I end up treating them as discrete (based on comparing the discrete and continuous specifications), there will be a great many categories and the model might not be parsimonious. What should I do?

      Comment


      • #4
        Well, first, parsimony is nice, but you shouldn't trade accuracy for parsimony. After all, the ultimate in parsimony is a model with only a constant term. It's the analog of the stopped clock that is right twice a day.

        The four variables you describe add up to 29 degrees of freedom, assuming you do not include interaction terms. There are various rules of thumb about how many observations you need per degree of freedom. The most lenient such rule I know of is 30. So if you had a little under 900 observations to fit the model with, you'd be OK. If your data set is smaller than that, and you can't get more data, and if the coefficients don't look good for treating any of them as continuous, you could consider things like combining categories within the variables. Maybe you don't lose much predictive accuracy by reducing that 0-14 variable to just four or five categories. Maybe you can get those 0-6 categories down to 0-3 or something like that.
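
        For example (a sketch only; mobility is a hypothetical name for the 0-14 variable), categories could be combined with -recode-:

        Code:
         recode mobility (0/2=0) (3/5=1) (6/8=2) (9/11=3) (12/14=4), gen(mobility5)
         tabulate mobility mobility5    // check the mapping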

        Also, try graphing the outcome variable against each of your discrete variables. You might see a non-linear relationship that you can nevertheless describe functionally with fewer degrees of freedom. For example, maybe the relationship between outcome and 0-6 fails the continuous variable trial, but on graphical exploration it looks U-shaped. Then maybe using the variable and its square will work.
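
        A quadratic specification along those lines might look like (again a sketch, with hypothetical names y and wstatus):

        Code:
         fracreg probit y c.wstatus##c.wstatus
         margins, at(wstatus=(0(1)6))    // trace the fitted curve over the levels
         marginsplot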

        At the end of the day, building a model is as much art as it is science. You have to try things out. Of course, there is always the risk that in your trials you will stumble upon a solution that beautifully fits the random noise in your data, only to have the model fail spectacularly when you try it on another data set. There's always that risk.

        Comment


        • #5
          Thank you for the insightful points, Clyde. I'll definitely work on them.

          Comment


          • #6
            Hello, Clyde. I tried graphing my outcome of interest against each of my discrete explanatory variables (2 of them attached below), but I'm not able to figure out anything meaningful from the graphs. I also combined categories within variables and obtained the marginal effects after running a fractional response model as follows:

            Code:
             margins, dydx(dm fm av fr)
            
            Average marginal effects                        Number of obs     =     48,202
            Model VCE    : Robust
            
            Expression   : Conditional mean of c_vector, predict()
            dy/dx w.r.t. : 2.dm 3.dm 2.fm 3.fm 2.av 3.av 4.av 5.av 2.fr
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                      dm |
                    2-3  |  -.0071423    .002056    -3.47   0.001     -.011172   -.0031127
                    4-6  |  -.0005733   .0029887    -0.19   0.848    -.0064311    .0052844
                         |
                      fm |
                    2-3  |  -.0050407   .0033678    -1.50   0.134    -.0116414      .00156
                    4-6  |  -.0076704   .0033267    -2.31   0.021    -.0141907   -.0011501
                         |
                      av |
                    3-5  |  -.0059851   .0043137    -1.39   0.165    -.0144398    .0024696
                    6-8  |  -.0093236   .0032793    -2.84   0.004     -.015751   -.0028962
                   9-11  |  -.0111638   .0037065    -3.01   0.003    -.0184284   -.0038993
                  12-14  |  -.0202523   .0030152    -6.72   0.000     -.026162   -.0143427
                         |
                      fr |
                    2-3  |  -.0284811   .0016918   -16.83   0.000     -.031797   -.0251651
            ------------------------------------------------------------------------------
            Note: dy/dx for factor levels is the discrete change from the base level.
            Can you please suggest something based on these results? Also, how do I interpret them, given that my dependent variable lies in the interval [0,1]?
            Attached Files

            Comment


            • #7
              The problem with the graphs is over-plotting: it is impossible to know whether any individual marker represents one observation or many.

              You'd get a better picture (literally) with (e.g.)

              Code:
              scatter c_vector attitude_violence, ms(Oh) jitter(2)
              where 2 is just a suggestion to be tuned.

              Beyond that, the conditional means can be seen through (e.g.)

              Code:
              tabstat c_vector, by(attitude_violence) 
              
              egen mean_viol = mean(c_vector), by(attitude_violence)
              egen tag = tag(attitude_violence) 
              line mean_viol attitude_violence if tag, sort
              It's not obvious to me why you would want to coarsen these predictor variables. That would just be throwing away information that might be helpful, and doing so arbitrarily.

              Comment


              • #8
                Hello, Nick.

                I followed the code for the scatter; it graphs the data well, but I'm still confused about the interpretation. I also obtained the graph for the conditional means, and I am not able to figure out a clear-cut pattern. I'm really confused as to how I should treat the variables in the regression analysis.

                Attached Files

                Comment


                • #9
                  Looks like there is no pattern.
                  ---------------------------------
                  Maarten L. Buis
                  University of Konstanz
                  Department of history and sociology
                  box 40
                  78457 Konstanz
                  Germany
                  http://www.maartenbuis.nl
                  ---------------------------------

                  Comment


                  • #10
                    Hello, Maarten. I'm getting some patterns for the other three variables, that is, for decision-making, freedom of mobility, and access to financial resources.
                    Attached Files

                    Comment
