Stata equivalent to "qbinom" in R?

Ariel Linden

Join Date: Apr 2014

Posts: 153
#1

Stata equivalent to "qbinom" in R?

11 Dec 2023, 14:03

Hi All,

R has a function called qbinom that returns the returns the value of the inverse cumulative density function (cdf) of the binomial distribution given a certain random variable q, number of trials (size) and probability of success on each trial (prob).

For example,

Code:

qbinom(0.005, 2000, 0.02241113)

returns the value 29. That is, it "backs in" to the number of successes k that are required in Stata's binomial (and invbinomial) functions.

Does anyone know how to get an equivalent result in Stata?

(I have figured out binomial() is equal to R's pbinom() and that binomialp() is equal to R's dbinom()).
Thanks!

Ariel
Tags: None

Scott Merryman

Join Date: Mar 2014
Posts: 895

11 Dec 2023, 21:18

There is no closed form for the inverse CDF of the binomial distribution - R uses the Cornish–Fisher Expansion (https://en.wikipedia.org/wiki/Cornis...sher_expansion) to calculate the value of k.

Here is a naïve approach:

Code:

capture program drop qbinom
program define qbinom, rclass
    syntax , q(numlist) size(numlist)  prob(numlist)
    local k = 0
    local p = 0
    while `p' -  `prob' <= -1e-10 {
        local ++k
        local p = invbinomial(`size', `k', `q')
    }
    disp " k = " `k'
    return scalar k = `k'
end

Using Haghish's -rcall-

Code:

. qbinom, q(.4) size(30) prob(.25)
 k = 7

. rcall: qbinom(.4, size=30, prob=.25)
program Rpath already defined
[1] 7

. qbinom, q(.005) size(2000) prob(.02241113)
 k = 29

. rcall: qbinom(.005, size=2000, prob=.02241113)
program Rpath already defined
[1] 29

Haghish, E. F. (2019). Seamless interactive language interfacing between R and Stata. The Stata Journal, 19(1), 61-82.

Comment

Joseph Coveney

Join Date: Apr 2014
Posts: 4374

12 Dec 2023, 00:49

If you're using a recent version of Stata with its Python integration, there's also a similar function in the SciPy library's stats subpackage that you could avail yourself of. Based on its speed, it seems to also use an approximation to the CDF in order to choose an entry point before beginning its search.

If you're doing one-offs, then Scott's naive approach is all you need, but if you're contemplating simulation or other repetitive use, it might be faster in the long run to call Python. If the example below, it's about 50-fold the speed of a naive search.

Code:

version 18.0

clear *

// seedem
set seed 421948205

mata:
mata set matastrict on

real scalar function qbinomial(real scalar q, real scalar n, real scalar pi) {

    real scalar k

    if (pi < 0.5) {
        k = 0
        do {
            k++
        } while (invbinomial(n, k, q) -  pi <= -1e-10)
    }
    else {
        k = n + 1
        while (invbinomial(n, k-1, q) - pi >= 1e-10) {
            k--
        }
    }

    return(k)
}
end

python:
from scipy.stats import binom
from sfi import Scalar, Macro

end

quietly set obs 1000
quietly generate int mata_icdf = .
quietly generate int python_icdf = .
quietly generate double q = .
quietly generate double pi = .

tempname mata_icdf python_icdf n q pi

timer clear
#delimit ;
forvalue i = 1/`=_N' {;

    scalar define `n' = runiformint(1000, 2000);
    scalar define `q' = runiform();
    scalar define `pi' = runiform();

    timer on 1;
    mata: st_numscalar(st_local("mata_icdf"),
        qbinomial(st_numscalar(st_local("q")),
        st_numscalar(st_local("n")), st_numscalar(st_local("pi"))));
    timer off 1;

    timer on 2;
    python: Scalar.setValue(Macro.getLocal('python_icdf'),
        binom.ppf(Scalar.getValue(Macro.getLocal('q')),
        Scalar.getValue(Macro.getLocal('n')),
        Scalar.getValue(Macro.getLocal('pi'))));
    timer off 2;

    quietly replace q = `q' in `i';
    quietly replace pi = `pi' in `i';
    quietly replace mata_icdf = `mata_icdf' in `i';
    quietly replace python_icdf = `python_icdf' in `i';
};
#delimit cr

assert mata_icdf == python_icdf

timer list
display in smcl as result %3.0f r(t1) / r(t2)

exit

Do-file and log file attached if you're interested.

Attached Files

Comment

daniel klein

Join Date: Mar 2014
Posts: 3824

12 Dec 2023, 03:06

This

Code:

version 18

mata :

real scalar cdfbinomial(
    
    real scalar n,
    real scalar x,
    real scalar p
    
    )
{
    real scalar mu
    real scalar sigma
    real scalar gamma
    real scalar z
    real scalar k
    
    
    mu    = n*p
    sigma = sqrt(n*p*(1-p))
    gamma = ((1-p)-p)/sigma
    
    z = invnormal(x)
    k  = floor(mu + sigma * (z + gamma*(z*z-1)/6) + 0.5)
    k  = min((k,n))
    
    if (binomial(n,k,p) >= x)
        while (binomial(n,k-1,p) > x)
            k--
    else
        while (binomial(n,k,p) <= x)
            k++
        
    return(k)
}

end

is (supposed to be) a copy of the core functionality of the respective R algorithm.

I can replicate Joseph's timings using his code in #3.

Code:

(output omitted)
. timer list
   1:      9.15 /     1000 =       0.0091
   2:      0.17 /     1000 =       0.0002

. display in smcl as result %3.0f r(t1) / r(t2)
 54

Using the normal approximation as an entry point is about 50(+) times faster in Mata, too.

Last edited by daniel klein; 12 Dec 2023, 03:12.

Comment

Ariel Linden

Join Date: Apr 2014

Posts: 153
#5

12 Dec 2023, 08:35

Thank you all for these thoughtful and very helpful responses!

Ariel
Comment

Ariel Linden

Join Date: Apr 2014
Posts: 153

13 Dec 2023, 16:48

Hi Daniel,

I am not sure how to get a return scalar from this mata code so that I can proceed to do something with the result. I added st_numscalar("k", k) to your code below to convert mata's k to Stata's k. But that does not appear to return the scalar to Stata.

I am guessing it is an easy fix that I am overlooking.

Thanks!

Ariel

Code:

version 17
clear all

mata :

real scalar cdfbinomial(
    
    real scalar n,
    real scalar x,
    real scalar p
    
    )
{
    real scalar mu
    real scalar sigma
    real scalar gamma
    real scalar z
    real scalar k
    
    
    mu    = n*p
    sigma = sqrt(n*p*(1-p))
    gamma = ((1-p)-p)/sigma
    
    z = invnormal(x)
    k  = floor(mu + sigma * (z + gamma*(z*z-1)/6) + 0.5)
    k  = min((k,n))
    
    if (binomial(n,k,p) >= x)
        while (binomial(n,k-1,p) > x)
            k--
    else
        while (binomial(n,k,p) <= x)
            k++

    return(k)

    st_numscalar("k", k)

}

end

mata :cdfbinomial(2000, 0.005, 0.02241113)

Code:

. di k
k not found
r(111);

Comment

Joseph Coveney

Join Date: Apr 2014
Posts: 4374

13 Dec 2023, 18:06

Originally posted by Ariel Linden View Post

I added st_numscalar("k", k) to your code below to convert mata's k to Stata's k. But that does not appear to return the scalar to Stata.

The function returns before it gets there: put the st_numscalar() line before the return().

If you're going to use the function in that way, you might want to make it a subroutine instead.

Code:

void function cdfbinomial(
    
    real scalar n,
    real scalar x,
    real scalar p
    
    )
{
    real scalar mu
    real scalar sigma
    real scalar gamma
    real scalar z
    real scalar k
    
    
    mu    = n*p
    sigma = sqrt(n*p*(1-p))
    gamma = ((1-p)-p)/sigma
    
    z = invnormal(x)
    k  = floor(mu + sigma * (z + gamma*(z*z-1)/6) + 0.5)
    k  = min((k,n))
    
    if (binomial(n,k,p) >= x)
        while (binomial(n,k-1,p) > x)
            k--
    else
        while (binomial(n,k,p) <= x)
            k++

    st_numscalar("k", k)

    // return(k)

}

Comment

daniel klein

Join Date: Mar 2014
Posts: 3824

14 Dec 2023, 06:07

I would not hardwire the name k into the function. In fact, I would not have the Mata function return a Stata scalar at all. It just complicates matters if you want to re-use the rather generic function in your next project in a different context. Instead, set up the scalar from within Stata, i.e., code

Code:

. mata : st_numscalar("k", invcdfbinomial(2000, 0.005, 0.02241113))
. display k
29

Here is the definition of invcdfbinomial(), which is a better name for the function, that adds some basic argument validation:

Code:

version 16.1

mata :

real scalar invcdfbinomial(
    
    real scalar n,
    real scalar x,
    real scalar p
    
    )
{
    real scalar mu
    real scalar sigma
    real scalar gamma
    real scalar z
    real scalar k
    
    
    if ( hasmissing((n,x,p)) )
        return(.)
    
    if (n < 0)
        _error(3300)
    
    if ( (x<0)|(x>1) )
        _error(3300)
    
    if ( (p<0)|(p>1) )
        _error(3300)
    
    mu    = n*p
    sigma = sqrt(n*p*(1-p))
    gamma = ((1-p)-p)/sigma
    
    z = invnormal(x)
    k  = floor(mu + sigma * (z + gamma*(z*z-1)/6) + 0.5)
    k  = min((k,n))
    
    if (binomial(n,k,p) >= x)
        while (binomial(n,k-1,p) > x)
            k--
    else
        while (binomial(n,k,p) <= x)
            k++

    return(k)
}

end

Comment

Ariel Linden

Join Date: Apr 2014

Posts: 153
#9

14 Dec 2023, 09:06

Thank you again, Joseph and Daniel! Mata is a completely different beast than Stata.

Ariel
Comment
Ariel Linden

Join Date: Apr 2014

Posts: 153
#10

02 Jan 2024, 09:46

Hi Daniel and Joseph,

Is it possible to change the input criteria of the mata function to accept a variable instead of the numeric value for n?

Code:

real scalar invcdfbinomial( real scalar n, // change to varname or varlist? real scalar x, real scalar p )

Similarly, can the line of code that is used to pull the value back into Stata be changed to reflect that it is a variable and not a value?

Code:

. mata : st_numscalar("k", invcdfbinomial(varname, 0.005, 0.02241113))

Thank you again! Ariel
Comment
daniel klein

Join Date: Mar 2014

Posts: 3824
#11

02 Jan 2024, 11:30

Do you mean changing the scalar argument to a vector (or even to a matrix)? As far as I can tell, that's not straightforward. Although normal() and binomial() support matrix arguments, you would need to modify the optimization part to work with matrices.

The obvious but (very) slow workaround is

Code:

generate result = . levelsof varname , local(levels) foreach level of levels { mata : st_numscalar("k", invcdfbinomial(`level', #, #)) replace result = scalar(k) if varname == `level' }
Comment
Ariel Linden

Join Date: Apr 2014

Posts: 153
#12

02 Jan 2024, 14:25

Hi Daniel,

That is actually the exact same code that I wrote to get around what I was hoping would be a simple change to the mata function you wrote.

Okay, if there is no more efficient approach, I'll stick with the foreach loop!

Thanks again for your help. You are a mata wizard!

Ariel
Comment
Ariel Linden

Join Date: Apr 2014

Posts: 153
#13

21 Jan 2024, 15:18

Hi All,

The same issue that exists with qbinom() not having a Stata equivalent also exists with R's qpois() function. Stata has invpoissontail() which returns values that are somewhat close to those of qpois() but not close enough. In the documentation for R's qpois(), it states that "qpois uses the Cornish–Fisher Expansion to include a skewness correction to a normal approximation, followed by a search." I am guessing that is why the results between R and Stata do not match.

Is there a way to get identical results in Stata?

Thanks in advance!

Ariel

Code:

qpois(0.001, 36 * 0.9801219) = 18

Code:

. di invpoissontail(36 * 0.9801219, 0.001) = 19.518189
Comment

daniel klein

Join Date: Mar 2014
Posts: 3824

#14

22 Jan 2024, 05:03

Adopting the code underlying the corresponding R algorithm, but omitting the more elaborated checks therein, this should work:

Code:

version 16.1

mata :

real scalar invcdfpoisson(
    
    real scalar m,
    real scalar x
    
    )
{
    real scalar mu
    real scalar sigma
    real scalar gamma
    real scalar z
    real scalar k
    
    
    if ( hasmissing((m,x)) )
        return(.)
        
    if (m < 0)
        _error(3300)
    
    if ( (x<0)|(x>1) )
        _error(3300)
    
    if (m == 0)
        return(0)
    
    if (x == 0)
        return(0)
    
    if (x == 1)
        return(.)
    
    mu    = m
    sigma = sqrt(m)
    gamma = 1/sigma
    
    z = invnormal(x)
    k  = round(mu + sigma * (z + gamma*(z*z-1)/6) + 0.5)
    
    if (poisson(m,k) >= x)
        while ( k & (poisson(m,k-1)>x) )
            k--
    else
        while (poisson(m,k) <= x)
            k++

    return(k)
}

end

Code:

. mata : invcdfpoisson(36 * 0.9801219, 0.001)
  18

I am not so sure about the edge cases, though.

Last edited by daniel klein; 22 Jan 2024, 05:13. Reason: corrected link; slight code polish

Comment

Ariel Linden

Join Date: Apr 2014

Posts: 153
#15

22 Jan 2024, 10:59

Thank you, again, Daniel! This works perfectly!
Comment

Announcement