  • Simulating a negatively skewed distribution conditional on covariates

    Hello! I am testing a method in a simulation study and I would like to test this method when the outcome variable is negatively skewed. This outcome must be predicted by baseline covariates (x1,...,x4).

    In the situation where the outcome is normally distributed, my model for the outcome looks like this:

    Code:
        set obs 300
        gen x1 = 10*rnormal(0, 1)
        gen x2 = rnormal(0, 1)
        gen x3 = rnormal(0, 1)
        gen x4 = rnormal(0, 1)
        gen u1 = rnormal(0, 1)
    
        scalar a1 = .4 //Coefficient for x1
        scalar a2 = -.4 //Coefficient for x2
        scalar a3 = -.4 //Coefficient for x3
        scalar a4 = .4 //Coefficient for x4
        scalar a5 = 0.1 //Coefficient for u1
        scalar a_sd = sqrt(1-(a1^2)-(a2^2)-(a3^2)-(a4^2)-(a5^2)) //Standard Deviation of error term
        gen e_y = rnormal(0,a_sd) //Generate error
        
        gen y  = a1*x1 + a2*x2 + a3*x3 + a4*x4 + a5*u1 + e_y
    My question is how can I code y so that it is a negatively skewed variable, but is still conditional on covariates and in such a way that I can change the coefficients?

    I would like the resulting y variable to resemble one drawn with the rbeta() function, except that its distribution is built from existing variables.
    Code:
    gen y = rbeta(6,1)
    Last edited by Mollie Paynee; 07 Mar 2024, 14:33.

  • #2
    So you want y to take values in [0,1] and have a skewed distribution? How about

    Code:
    gen alpha = exp(a0 + a1*x1 + a2*x2 + a3*x3 + a4*x4)
    gen y = rbeta(alpha,1)
    Here a0 is an intercept that you choose. The mean of y given x is alpha/(alpha + 1), which has the logistic form. You can replace the "1" in rbeta(alpha,1) to get different distributional shapes; in any case, the shape depends on x. If you choose a0 large enough, alpha = exp(a0 + a1*x1 + a2*x2 + a3*x3 + a4*x4) exceeds 1 most of the time, and a Beta(alpha,1) distribution with alpha > 1 has a negative skew.
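
    A minimal sketch of this suggestion, useful for checking the sign of the skew empirically. The seed, the value a0 = 2, and the use of unscaled standard-normal covariates are assumptions made for illustration; none of them come from the posts above. x1 is deliberately not multiplied by 10 here, because a wide linear predictor can push alpha outside the range that rbeta() supports, in which case rbeta() returns missing values (see the help for rbeta() for the exact limits).

    ```stata
    clear
    set seed 2024            // assumed seed, for reproducibility only
    set obs 300

    // Standard-normal covariates (x1 is NOT scaled by 10 in this sketch)
    forvalues j = 1/4 {
        gen x`j' = rnormal(0, 1)
    }

    scalar a0 = 2            // assumed intercept; larger values push alpha above 1
    scalar a1 = .4
    scalar a2 = -.4
    scalar a3 = -.4
    scalar a4 = .4

    // Shape parameter depends on the covariates through a log link
    gen alpha = exp(a0 + a1*x1 + a2*x2 + a3*x3 + a4*x4)
    gen y = rbeta(alpha, 1)

    count if alpha <= 1      // observations whose conditional skew is not negative
    count if missing(y)      // nonzero would mean alpha left rbeta()'s domain
    summarize y, detail
    display "skewness of y = " r(skewness)
    ```

    Raising a0 moves more observations into the alpha > 1 region, and changing a1 to a4 changes how strongly each covariate shifts the shape.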


    • #3
      Thank you so much, Jeff, that is really helpful. Y can take values in any range, as I am then categorizing the Y variable.


      • #4
        Originally posted by Mollie Paynee
        . . . .how can I code y so that it is a negatively skewed variable, but is still conditional on covariates and in such a way that I can change the coefficients?
        This change to your code, wrapping each random draw in -abs(), will give the distribution of y a negative skew:
        Code:
            set obs 300
            gen x1 = -abs(10*rnormal(0, 1))
            gen x2 = -abs(rnormal(0, 1))
            gen x3 = -abs(rnormal(0, 1))
            gen x4 = -abs(rnormal(0, 1))
            gen u1 = -abs(rnormal(0, 1))
        
            scalar a1 = .4 //Coefficient for x1
            scalar a2 = -.4 //Coefficient for x2
            scalar a3 = -.4 //Coefficient for x3
            scalar a4 = .4 //Coefficient for x4
            scalar a5 = 0.1 //Coefficient for u1
            scalar a_sd = sqrt(1-(a1^2)-(a2^2)-(a3^2)-(a4^2)-(a5^2)) //Standard Deviation of error term
            gen e_y = rnormal(0,a_sd) //Generate error
            
            gen y  = a1*x1 + a2*x2 + a3*x3 + a4*x4 + a5*u1 + e_y
        
        histogram y, scheme(stsj) ylabel( , nogrid)
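
        A short sketch for checking the skew numerically rather than only by eye. Because the terms in y are independent, the third central moment of y is the sum of the third central moments of the individual terms, and multiplying a variable by a negative coefficient flips the sign of its skew. Each -abs() draw is negatively skewed, so a term adds negative skew only when its coefficient is positive. With the coefficients above, the a1*x1 term is by far the largest and drives the result; with other coefficients the sign of the overall skew could change. The seed and the helper variables t1 and t2 are assumptions added for illustration.

        ```stata
        clear
        set seed 2024            // assumed seed, for reproducibility only
        set obs 300
        gen x1 = -abs(10*rnormal(0, 1))
        gen x2 = -abs(rnormal(0, 1))
        gen x3 = -abs(rnormal(0, 1))
        gen x4 = -abs(rnormal(0, 1))
        gen u1 = -abs(rnormal(0, 1))

        scalar a1 = .4
        scalar a2 = -.4
        scalar a3 = -.4
        scalar a4 = .4
        scalar a5 = 0.1
        scalar a_sd = sqrt(1-(a1^2)-(a2^2)-(a3^2)-(a4^2)-(a5^2))
        gen e_y = rnormal(0, a_sd)

        gen y = a1*x1 + a2*x2 + a3*x3 + a4*x4 + a5*u1 + e_y

        // Overall sample skewness of y
        summarize y, detail
        display "skewness of y = " r(skewness)

        // Skew of two individual terms (hypothetical helper variables):
        // a1 > 0 keeps the negative skew of x1, a2 < 0 reverses that of x2
        gen t1 = a1*x1
        gen t2 = a2*x2
        summarize t1, detail
        display "skewness of a1*x1 = " r(skewness)
        summarize t2, detail
        display "skewness of a2*x2 = " r(skewness)
        ```

        Note also that after this change the covariates no longer have mean zero or unit variance, so a_sd no longer scales y to unit variance.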