Simulating a negatively skewed distribution conditional on covariates

Mollie Paynee

Join Date: Jan 2023

Posts: 5
#1


07 Mar 2024, 14:02

Hello! I am testing a method in a simulation study and I would like to test this method when the outcome variable is negatively skewed. This outcome must be predicted by baseline covariates (x1,...,x4).

In the situation where the outcome is normally distributed, my model for the outcome looks like this:

Code:

    set obs 300
    gen x1 = 10*rnormal(0, 1)
    gen x2 = rnormal(0, 1)
    gen x3 = rnormal(0, 1)
    gen x4 = rnormal(0, 1)
    gen u1 = rnormal(0, 1)

    scalar a1 = .4 //Coefficient for x1
    scalar a2 = -.4 //Coefficient for x2
    scalar a3 = -.4 //Coefficient for x3
    scalar a4 = .4 //Coefficient for x4
    scalar a5 = 0.1 //Coefficient for u1
    scalar a_sd = sqrt(1-(a1^2)-(a2^2)-(a3^2)-(a4^2)-(a5^2)) //Standard Deviation of error term
    gen e_y = rnormal(0,a_sd) //Generate error

    gen y = a1*x1 + a2*x2 + a3*x3 + a4*x4 + a5*u1 + e_y

My question is how can I code y so that it is a negatively skewed variable, but is still conditional on covariates and in such a way that I can change the coefficients?

I would like the resulting y variable to resemble one created with the -rbeta()- function, except that its distribution is built from existing variables.

Code:

gen y = rbeta(6,1)

Last edited by Mollie Paynee; 07 Mar 2024, 14:33.
Tags: distribution, simulation, skewed
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2081
#2

07 Mar 2024, 15:50

So you want y to take values in [0,1] and have a skewed distribution? How about

Code:

    gen alpha = exp(a0 + a1*x1 + a2*x2 + a3*x3 + a4*x4)
    gen y = rbeta(alpha,1)

The mean of y has the logistic form: E(y|x) = alpha/(alpha + 1). You can replace the "1" in rbeta(alpha,1) with another value to get different distributional shapes. In any case, the shape depends on x. If you choose a0 large enough, then exp(a0 + a1*x1 + a2*x2 + a3*x3 + a4*x4) > 1 most of the time, and the distribution will have a negative skew. I think.
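A minimal sketch of how this suggestion could be tried out, under some assumptions that are not in the posts above: a0 = 2 is an arbitrary illustrative value, the seed is arbitrary, and all four covariates are drawn with unit variance (the original x1 is scaled by 10, which would make alpha span a much wider range; rbeta() returns missing when a shape parameter falls outside the range it accepts, so that is worth checking).

```stata
clear
set seed 20240307            // arbitrary seed, for reproducibility only
set obs 300

* Covariates: unit-variance normals (an assumption for this sketch)
gen x1 = rnormal(0, 1)
gen x2 = rnormal(0, 1)
gen x3 = rnormal(0, 1)
gen x4 = rnormal(0, 1)

scalar a0 = 2                // illustrative intercept, chosen so alpha > 1 for most rows
scalar a1 = .4
scalar a2 = -.4
scalar a3 = -.4
scalar a4 = .4

* Shape parameter depends on the covariates; second shape fixed at 1
gen alpha = exp(a0 + a1*x1 + a2*x2 + a3*x3 + a4*x4)
gen y = rbeta(alpha, 1)

* Implied conditional mean alpha/(alpha + 1), which is the inverse logit of the index
gen mu = invlogit(a0 + a1*x1 + a2*x2 + a3*x3 + a4*x4)

count if missing(y)          // should be 0 if every alpha is in range
count if alpha > 1           // rows where Beta(alpha, 1) is left-skewed
summarize y, detail
display "Sample skewness of y: " %6.3f r(skewness)
```

Lowering a0 toward 0 or below should push more rows to alpha < 1, where the Beta(alpha, 1) density piles up near 0 instead of near 1, so the sign of the skew is worth re-checking whenever the coefficients change.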
Mollie Paynee

Join Date: Jan 2023

Posts: 5
#3

14 Mar 2024, 06:55

Thank you so much, Jeff, that is really helpful. Y can take values in any range, as I am then categorizing the Y variable.

Joseph Coveney

Join Date: Apr 2014
Posts: 4351

14 Mar 2024, 08:45

Originally posted by Mollie Paynee

. . . .how can I code y so that it is a negatively skewed variable, but is still conditional on covariates and in such a way that I can change the coefficients?

This change to your code (wrapping each random draw in -abs(), originally highlighted in red) will give the distribution of y a negative skew:

Code:

    set obs 300
    gen x1 = -abs(10*rnormal(0, 1))
    gen x2 = -abs(rnormal(0, 1))
    gen x3 = -abs(rnormal(0, 1))
    gen x4 = -abs(rnormal(0, 1))
    gen u1 = -abs(rnormal(0, 1))

    scalar a1 = .4 //Coefficient for x1
    scalar a2 = -.4 //Coefficient for x2
    scalar a3 = -.4 //Coefficient for x3
    scalar a4 = .4 //Coefficient for x4
    scalar a5 = 0.1 //Coefficient for u1
    scalar a_sd = sqrt(1-(a1^2)-(a2^2)-(a3^2)-(a4^2)-(a5^2)) //Standard Deviation of error term
    gen e_y = rnormal(0,a_sd) //Generate error
    
    gen y  = a1*x1 + a2*x2 + a3*x3 + a4*x4 + a5*u1 + e_y

    histogram y, scheme(stsj) ylabel( , nogrid)
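Besides the histogram, the skew can be checked numerically. The sketch below regenerates the same data in compact form (the seed is an arbitrary assumption, and the coefficients are written inline rather than as scalars) and reads the sample skewness from -summarize, detail-; a negative value indicates a left skew.

```stata
clear
set seed 4351                // arbitrary seed, for reproducibility only
set obs 300

* Same data-generating process as above: negated half-normal draws
gen x1 = -abs(10*rnormal(0, 1))
foreach v in x2 x3 x4 u1 {
    gen `v' = -abs(rnormal(0, 1))
}

* Error SD as in the original code: sqrt(1 - sum of squared coefficients)
gen e_y = rnormal(0, sqrt(1 - 4*(.4^2) - .1^2))
gen y = .4*x1 - .4*x2 - .4*x3 + .4*x4 + .1*u1 + e_y

* r(skewness) is left behind by summarize with the detail option
summarize y, detail
display "Sample skewness of y: " %6.3f r(skewness)
```

One thing to keep in mind when changing coefficients: each negated half-normal draw is itself left-skewed, so a term with a negative coefficient (here x2 and x3) pulls y toward a positive skew. With these values the overall sign appears to be driven by the x1 term, which has by far the largest scale.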
