How to generate random values between 0 and 10 with mean 4.5

Jen Ward

Join Date: Apr 2021

Posts: 68
#1


25 Oct 2023, 09:52

Hi there,

I need to generate a variable that has values between 0 and 10, with a mean of 4.5 and SD = 2.

I tried using rnormal(), but it also generates negative values.

Code:

g a = rnormal(4.5, 2)

I then tried rpoisson but the range is from 0 to 12.

Code:

g a = rpoisson(4.5)

Lastly I tried runiformint(), which keeps the range from 0 to 10, but the mean is 5 with SD = 3.

Code:

g a = runiformint(0, 10)

Can anyone help?
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2389
#2

25 Oct 2023, 09:58

It's odd to specify that you want a random number with specific moments and not specify the distribution it should come from. Likely you will want the Normal, but you would need to accept truncation beyond [0, 10] or else specify what should happen in the tails (censored values perhaps?). Either way, your final distribution will not have the same SD and may not have exactly the same mean. What exactly are you trying to do?
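
To see how much the bounds matter, here is a minimal sketch that draws from a normal with mean 4.5 and SD 2 and then either censors or truncates at [0, 10]. The seed, sample size and variable names are arbitrary choices for illustration, not anything from the original question.

```stata
clear
set obs 10000
set seed 12345                        // arbitrary seed, for reproducibility only
generate double z = rnormal(4.5, 2)

* Censoring: out-of-range draws are piled onto the bounds 0 and 10
generate double censored = min(max(z, 0), 10)

* Truncation: out-of-range draws are discarded (left missing here)
generate double truncated = z if inrange(z, 0, 10)

* Compare the mean and SD of the three versions
summarize z censored truncated
```

Comparing the three rows of the summarize output shows how far the censored and truncated versions drift from the requested mean and SD.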
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35433
#3

25 Oct 2023, 11:06

Leonardo Guizzetti makes excellent points. But it’s possible that a beta distribution might help. That said, if your distribution must be discrete that should be the first desideratum. It's hard to work out what the criteria are if rnormal(), rpoisson() and runiformint() are being tried, as they are qualitatively as well as quantitatively different.
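
As a rough sketch of the beta idea: a beta variate rescaled to [0, 10] can be matched to a mean of 4.5 and an SD of 2 by the method of moments. The choice of method of moments, the seed and the variable names are assumptions for illustration. The result is continuous, so rounding it to integers would change the moments somewhat.

```stata
clear
set obs 10000
set seed 12345                        // arbitrary seed

* Method-of-moments shape parameters for a beta on [0, 1]
local m = 4.5/10                      // target mean rescaled to [0, 1]
local v = (2/10)^2                    // target variance rescaled to [0, 1]
local k = `m'*(1 - `m')/`v' - 1       // this is a + b
local a = `m'*`k'
local b = (1 - `m')*`k'
display "a = " `a' "   b = " `b'

* Rescale the beta draws from [0, 1] to [0, 10]
generate double x = 10*rbeta(`a', `b')
summarize x
```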
Comment
Jen Ward

Join Date: Apr 2021

Posts: 68
#4

26 Oct 2023, 03:47

Thanks both. I am trying to simulate/create a variable that captures a Likert scale with values from 0 to 10 which is also normally distributed.

I am trying to recreate values observed in the real dataset for simulation. Can you advise how this could be achieved?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35433
#5

26 Oct 2023, 04:36

The criteria are contradictory.

The query is like asking for a cat that is also a dog, in this sense: any distribution for an integer scale such as you specify is inherently discrete and bounded. It can't also be a normal distribution, as a normal distribution is unbounded and continuous. What is possible, but much looser, is some idea of being as you state and also approximately symmetric, specifically approximately bell-shaped.

If the possible values are 0(1)10 then a mean of 4.5 already implies slight skewness. A binomial with range 0 to 10 and mean close to 5 would be close to normal in shape but not normal in any strict sense. It would have an SD of

Code:

. di sqrt(10 * 0.45 * 0.55)
1.5732133

which is distinctly less than 2.

Here is a token simulation:

Code:

. clear

. set obs 10000
Number of observations (_N) was 0, now 10,000.

. set seed 2803

. gen wanted = rbinomial(10, 0.45)

. su wanted

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      wanted |     10,000      4.4963    1.552942          0         10

. tab wanted

     wanted |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         24        0.24        0.24
          1 |        202        2.02        2.26
          2 |        740        7.40        9.66
          3 |      1,660       16.60       26.26
          4 |      2,420       24.20       50.46
          5 |      2,396       23.96       74.42
          6 |      1,553       15.53       89.95
          7 |        756        7.56       97.51
          8 |        214        2.14       99.65
          9 |         31        0.31       99.96
         10 |          4        0.04      100.00
------------+-----------------------------------
      Total |     10,000      100.00

To get closer with a simulation, you need something more complicated, and I don't have simple suggestions on how to get it, but there could well be smarter ideas from someone else.

But what is the purpose here? It sounds as if you have data already, which is where your mean and SD come from, and want to get a handle on variability, in which case bootstrapping might be a much better answer.
Comment
Jen Ward

Join Date: Apr 2021

Posts: 68
#6

26 Oct 2023, 05:25

Hi Nick Cox, thanks for your reply and the worked examples. I plan to run simulations to estimate the sample size for a study where this Likert scale is a predictor in the model; I wanted its values to be close to the ones observed in the pilot.
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35433
#7

26 Oct 2023, 05:36

That helps; thanks. Mata has an rdiscrete() function for fully specified distributions. Here is a toy example.

Code:

. clear

. set obs 1000
Number of observations (_N) was 0, now 1,000.

. gen wanted = .
(1,000 missing values generated)

. mata : st_store(., "wanted", rdiscrete(1000, 1, (0.1, 0.2, 0.3, 0.2, 0.1,
>  0.1)))

. tab wanted

     wanted |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         93        9.30        9.30
          2 |        187       18.70       28.00
          3 |        303       30.30       58.30
          4 |        198       19.80       78.10
          5 |        116       11.60       89.70
          6 |        103       10.30      100.00
------------+-----------------------------------
      Total |      1,000      100.00

. replace wanted = wanted - 1
(1,000 real changes made)

. tab wanted

     wanted |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         93        9.30        9.30
          1 |        187       18.70       28.00
          2 |        303       30.30       58.30
          3 |        198       19.80       78.10
          4 |        116       11.60       89.70
          5 |        103       10.30      100.00
------------+-----------------------------------
      Total |      1,000      100.00

In your case, you'd need to specify 11 probabilities, not 6, as I guess will be clear.
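
For the 0 to 10 scale, a sketch along the same lines might look like this. The 11 probabilities here are purely illustrative: a normal density with mean 4.5 and SD 2 evaluated at the integers 0 to 10 and rescaled to sum to 1. In practice the pilot proportions could be used instead. The seed and sample size are arbitrary.

```stata
clear
set obs 1000
set seed 12345                        // arbitrary seed
generate wanted = .

mata:
// Illustrative probabilities only (an assumption, not pilot data):
// normal(4.5, 2) density at 0, 1, ..., 10, rescaled to sum to 1
p = normalden((0..10), 4.5, 2)
p = p :/ sum(p)
// rdiscrete() returns indices 1..11, so subtract 1 to get scores 0..10
st_store(., "wanted", rdiscrete(st_nobs(), 1, p) :- 1)
end

summarize wanted
tabulate wanted
```

Because the probabilities come from a density cut off at 0 and 10, the simulated mean and SD will not be exactly 4.5 and 2.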

Comment

Jen Ward

Join Date: Apr 2021

Posts: 68
#8

26 Oct 2023, 05:47

That's great, thanks Nick Cox

Would this approach also work for age? I am only interested in adults aged 18 to 65
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35433
#9

26 Oct 2023, 06:51

You'd need to specify 48 probabilities, but it could be done. How well it would work I can't predict. Depends in part on how far you need exactly the same distribution. Real distributions are lumpy, and age data can suffer from heaping, so that's another area of difficulty.
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2389
#10

26 Oct 2023, 09:51

Nick's given some excellent general advice. I would find it very unwieldy to have to specify (and justify) discrete probabilities for each age category. The point of simulating data is to capture the essential elements of the population you are trying to describe, and it's easy to get bogged down in the finer details. The activity you have described in #8 does not connect to your initial question. I'll assume it's the ages that you are most interested in.

A sensible starting point would be to describe your age distribution as a (censored) normal distribution with whatever mean and SD you observe in your dataset. Any data outside of 18-65 years can be ignored and replaced with a valid age inside the range. This would be reasonable if you were sampling from a single population of those adults. Some finer points that may be worth considering follow.

Age may be rounded to the nearest integer. This might reflect how data are originally recorded. You can investigate if this matters for your purposes.

"Lumpiness", as Nick has described, can be an issue. This can be described as a mixture population, where you randomly sample from 1 of 2 or more different distributions. Again, you'll need to investigate if that's relevant for your needs.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35433
#11

26 Oct 2023, 10:04

Thinking about it more: I would recommend taking the empirical probabilities and then smoothing them, and if necessary rescaling to sum to 1.
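
One way this could be sketched is below. The toy pilot data, the variable name score and the 3-point moving average with weights 1/4, 1/2, 1/4 are all assumptions for illustration; other smoothers may suit real data better.

```stata
clear
set obs 60
set seed 12345                        // arbitrary seed
generate score = rbinomial(10, 0.45)  // toy stand-in for pilot scores (assumption)

* Count each score 0..10 explicitly so that unobserved values get a zero
matrix freq = J(11, 1, 0)
forvalues k = 0/10 {
    local r = `k' + 1
    quietly count if score == `k'
    matrix freq[`r', 1] = r(N)
}

mata:
f = st_matrix("freq")                 // 11 x 1 column of counts
p = f :/ sum(f)                       // empirical probabilities
n = rows(p)
ext = p[1] \ p \ p[n]                 // repeat the end values as padding
// 3-point weighted moving average, weights 1/4, 1/2, 1/4
s = 0.25*ext[|1 \ n|] + 0.5*ext[|2 \ n+1|] + 0.25*ext[|3 \ n+2|]
s = s :/ sum(s)                       // rescale to sum to 1
st_matrix("psmooth", s)
end

matrix list psmooth
```

The smoothed vector psmooth could then be passed to rdiscrete() in place of hand-typed probabilities.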
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35433
#12

26 Oct 2023, 11:12

The full context is emerging only slowly, but the implication seems to be that there are several predictors -- in which case I would underline that matching each marginal distribution for each predictor won't reproduce their joint distribution unless -- as seems unlikely -- the predictors are independent.
Comment
Jen Ward

Join Date: Apr 2021

Posts: 68
#13

27 Oct 2023, 03:15

Thanks both for the considerations.

Leonardo Guizzetti - I like your suggestion that "any data outside of 18-65 years can be ignored and replaced with a valid age inside the range". How can this be achieved in Stata; would I need to identify generated values outside the range, set them to missing and sample again?
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#14

27 Oct 2023, 03:55

Originally posted by Jen Ward

I plan to run simulations to estimate the sample size for a study where this Likert scale is a predictor in the model; I wanted its values to be close to the ones observed in the pilot.

Unless your pilot study is really tiny so that not all of the available scores appear in the dataset, then for this purpose I'd go with Nick's suggestion in #5 of randomly sampling the data in-hand, that is, use the empirical distribution of the questionnaire item's ordered-categorical response.

And if you're including other respondent characteristics as predictors, e.g., respondent's age, then I'd sample the predictors rowwise for the reason Nick implied in #12.
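
A minimal sketch of rowwise sampling from data in hand follows. It uses the auto dataset shipped with Stata as a stand-in for the pilot data, with mpg and weight standing in for the predictors. Those names, the seed and the simulated sample size of 500 are assumptions for illustration. A random row index is used rather than bsample so that the simulated sample can be larger than the pilot.

```stata
sysuse auto, clear                    // stand-in for the pilot data (assumption)
set seed 12345                        // arbitrary seed
local Npilot = _N
local Nsim   = 500                    // hypothetical simulated sample size

* Make room for the simulated sample if it is larger than the pilot
if `Nsim' > _N set obs `Nsim'

* Pick a random pilot row, with replacement, for each simulated observation
generate long pick = runiformint(1, `Npilot')

* Using the same pick for every predictor copies whole rows,
* so the joint distribution of the predictors is carried over
generate sim_mpg    = mpg[pick]
generate sim_weight = weight[pick]

keep in 1/`Nsim'
keep sim_mpg sim_weight
summarize
correlate sim_mpg sim_weight
```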
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2389
#15

27 Oct 2023, 06:41

Originally posted by Jen Ward

Thanks both for the considerations.

Leonardo Guizzetti - I like your suggestion that "any data outside of 18-65 years can be ignored and replaced with a valid age inside the range". How can this be achieved in Stata; would I need to identify generated values outside the range, set them to missing and sample again?

Yes, that's one way. Another way is to generate two variables with the same distribution, one that you keep, and the other that you take from in the event that values in the first are out of range. Then drop the second variable.
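
The first way (identify out-of-range values and sample again) could be sketched as below. The mean of 40 and SD of 12 are hypothetical placeholders for whatever is observed in the pilot, and the seed and sample size are arbitrary.

```stata
clear
set obs 1000
set seed 12345                        // arbitrary seed
local mu = 40                         // hypothetical mean age
local sd = 12                         // hypothetical SD of age

generate double age = rnormal(`mu', `sd')

* Redraw any out-of-range ages until all lie in [18, 65]
quietly count if !inrange(age, 18, 65)
local bad = r(N)
while `bad' > 0 {
    quietly replace age = rnormal(`mu', `sd') if !inrange(age, 18, 65)
    quietly count if !inrange(age, 18, 65)
    local bad = r(N)
}

summarize age
```

The loop stops once no value is out of range. The resulting mean and SD will generally differ a little from the values fed to rnormal(), because the tails have been removed.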

In light of this though, go with Nick's suggestion first.
Comment
