Counting the fraction of (X, Z) pairs with the property that X > Z

Itzhak Rasooly

Join Date: Aug 2022

Posts: 23
#1

01 Feb 2024, 11:25

Consider two variables, X and Z (these can have different numbers of non-missing observations). I am trying to compute the fraction of all possible (X, Z) pairs that have the property that X > Z. (Actually, I am trying to do something a bit more complicated, but this should be a good warm-up!)

For example, suppose that my dataset is:
X Z
1 0
. 2

(Here, X has one missing observation.) In this case, there are two possible pairs, i.e. (1, 0) and (1, 2), and X > Z in 1/2 of the cases.

In Python, one could do this by writing something like:

Code:

pairs = 0
x_exceeds_z = 0
for x in x_list:
    for z in z_list:
        pairs += 1
        if x > z:
            x_exceeds_z += 1
print(x_exceeds_z/pairs)

However, I have no idea how to do this in Stata. Is it easy to do?

If I may ask a second question: I will ultimately want to bootstrap (a more complicated version of) this estimate. Is this also easy to do in Stata?

Thanks in advance for any suggestions or pointers.

Last edited by Itzhak Rasooly; 01 Feb 2024, 11:27.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

01 Feb 2024, 11:33

Code:

count if X > Z & !missing(X, Z)
local numerator `r(N)'
count
local denominator `r(N)'
display as text "Fraction of pairs with X > Z = " as result `=`numerator'/`denominator''

Added: Stata is really not like Python. It requires a different way of thinking about data and its organization. Its primitives are higher-order objects than those of Python. I suggest you take some time to get an overview of Stata's approach. Your Stata installation comes with PDF user manuals. You can find them in Stata's Help menu. Read the User's Guide [U] and Getting Started [GS] manuals to get a sense of how things work in Stata.

Last edited by Clyde Schechter; 01 Feb 2024, 11:38.
Itzhak Rasooly

Join Date: Aug 2022

Posts: 23
#3

01 Feb 2024, 11:43

Hi Clyde, many thanks for the suggestion! However, I think your code may only compare (X, Z) values in the same row? Indeed, when I checked X = (1, 2, 3), Z = (2, 3, 4), your code seemed to give the answer of 0, which is not correct. (To clarify, I want to consider all pairs X_i, Z_j where i and j can take all feasible values.)
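To make the distinction concrete, here is a minimal Python sketch (Python being the language used in #1) contrasting the row-wise comparison with the all-pairs comparison on the X = (1, 2, 3), Z = (2, 3, 4) example. The function names are hypothetical, and representing missing values as None is an assumption made only for illustration.

```python
def rowwise_fraction(xs, zs):
    """Fraction of rows with x_i > z_i (x_i is compared only with z_i)."""
    rows = list(zip(xs, zs))
    return sum(x > z for x, z in rows) / len(rows)


def all_pairs_fraction(xs, zs):
    """Fraction of all (x_i, z_j) pairs with x_i > z_j.

    Missing values are assumed to be encoded as None and are skipped.
    """
    xs = [x for x in xs if x is not None]
    zs = [z for z in zs if z is not None]
    wins = sum(x > z for x in xs for z in zs)
    return wins / (len(xs) * len(zs))


x_list = [1, 2, 3]
z_list = [2, 3, 4]
print(rowwise_fraction(x_list, z_list))    # 0.0
print(all_pairs_fraction(x_list, z_list))  # 1/9, about 0.111
```

Only the pair (3, 2) satisfies X > Z among the nine possible pairs, so the all-pairs answer is 1/9 rather than 0.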
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

01 Feb 2024, 12:01

"I want to consider all pairs X_i, Z_j where i and j can take all feasible values"

I did not understand your original question. I thought you wanted to only consider X_i with Z_i.

So it's a little more complicated:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(x z)
1 0
. 2
end

* Set the z values aside, then pair every x with every z
preserve
keep z
tempfile zs
save `zs'
restore
drop z
cross using `zs'

* Count only pairs in which both values are non-missing
count if x > z & !missing(x, z)
local numerator `r(N)'
count if !missing(x, z)
local denominator `r(N)'
display as text "Fraction of pairs with X > Z = " as result `=`numerator'/`denominator''

If x and z, when not missing, are always integers, then there is a nicer way. Post back if that's the case.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

01 Feb 2024, 13:45

On reflection, the approach in #4, while appropriate for a small data set, will be far too demanding of both memory and computation time in a large data set. It's a brute force counting method.

Here's something a bit more efficient:

Code:

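* Denominator: number of pairs in which both x and z are non-missing
count if !missing(x)
local n_x = r(N)
count if !missing(z)
local n_z = r(N)
local denominator = `n_x' * `n_z'

* Stack x and z into one variable; _j == 0 marks x values, _j == 1 marks z values
rename (x z) (x0 x1)
gen `c(obs_t)' obs_no = _n
reshape long x, i(obs_no)
drop if missing(x)

* For each value, count the z values at or below it, then remove ties
rangestat (sum) dominated_zs = _j, interval(x . x)
rangestat (sum) equal_zs = _j, interval(x 0 0)
replace dominated_zs = dominated_zs - equal_zs

summ dominated_zs if _j == 0, meanonly
local numerator `r(sum)'
display as text "Fraction of pairs with X > Z = " as result `=`numerator'/`denominator''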

-rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

This approach is quite similar to what I had in mind for use if x and z are always integers, and there would be no noticeable incremental benefit to the further modifications that could be made in that case.

This approach destroys the data initially in memory. So if you need to retain the original data, -preserve- it before doing this, and -restore- it at the end.
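For readers coming from Python, the counting idea in this post (for each x, count the z values strictly below it instead of enumerating every pair) can be sketched as follows. This is only an illustration of the idea, not a translation of the -rangestat- code; the function name is hypothetical and missing values are assumed to be encoded as None.

```python
from bisect import bisect_left


def fraction_x_gt_z(xs, zs):
    """Fraction of (x_i, z_j) pairs with x_i > z_j, without enumerating pairs.

    Missing values are assumed to be encoded as None and are dropped first.
    """
    xs = [x for x in xs if x is not None]
    zs = sorted(z for z in zs if z is not None)
    # For a sorted list, bisect_left(zs, x) is the number of z values
    # strictly less than x, so ties (x == z) are not counted as wins.
    wins = sum(bisect_left(zs, x) for x in xs)
    return wins / (len(xs) * len(zs))


print(fraction_x_gt_z([1, None], [0, 2]))      # 0.5
print(fraction_x_gt_z([1.5, 2.5], [2.0]))      # 0.5
```

Sorting once and doing a binary search per x costs roughly (n + m) log m operations rather than n * m, and nothing here requires the values to be integers.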

Bruce Weaver

Join Date: May 2014
Posts: 1133
#6

01 Feb 2024, 18:53

If I understand what Itzhak Rasooly is asking for, I think I would use -fillin-.

Code:

. * Read in the data from #1
. clear

. input byte (x z)

            x         z
  1. 1 0
  2. . 2
  3. end

. * Generate all x-z pairs
. fillin x z // generate all x-z pairs

. * Flag the pairs where x > z
. generate byte XgtZ = x > z if !missing(x, z)
(2 missing values generated)

. quietly summarize XgtZ // mean = p(x > z)

. generate pXgtZ = r(mean)

. list, clean noobs

    x   z   _fillin   XgtZ   pXgtZ  
    1   0         0      1      .5  
    1   2         1      0      .5  
    .   0         1      .      .5  
    .   2         0      .      .5  

. drop if _fillin // If you want to revert to the original dataset
(2 observations deleted)

.
. * Read in the data from #3
. clear

. input byte (x z)

            x         z
  1. 1 2
  2. 2 3
  3. 3 4
  4. end

. * Generate all x-z pairs
. fillin x z // generate all x-z pairs

. * Flag the pairs where x > z
. generate byte XgtZ = x > z if !missing(x, z)

. quietly summarize XgtZ // mean = p(x > z)

. generate pXgtZ = r(mean)

. list, clean noobs

    x   z   _fillin   XgtZ      pXgtZ  
    1   2         0      0   .1111111  
    1   3         1      0   .1111111  
    1   4         1      0   .1111111  
    2   2         1      0   .1111111  
    2   3         0      0   .1111111  
    2   4         1      0   .1111111  
    3   2         1      1   .1111111  
    3   3         1      0   .1111111  
    3   4         0      0   .1111111  

. drop if _fillin // If you want to revert to the original dataset
(6 observations deleted)

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
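One point that may be worth checking before relying on this with real data: as far as I understand -fillin-, it adds the combinations of distinct x and z values that are not already present, so repeated values are not paired once per observation. The Python sketch below models that behaviour under this assumption (the function names are hypothetical, and missing values are left out for simplicity) and compares it with a plain all-pairs count when x and z contain duplicates.

```python
from itertools import product


def all_pairs_fraction(xs, zs):
    """Fraction of all (x_i, z_j) pairs with x_i > z_j, one pair per (i, j)."""
    pairs = list(product(xs, zs))
    return sum(x > z for x, z in pairs) / len(pairs)


def fillin_style_fraction(xs, zs):
    """Assumed model of -fillin-: keep the original rows (x_i, z_i) and add
    every combination of distinct values that is not already present."""
    rows = list(zip(xs, zs))
    present = set(rows)
    for combo in product(sorted(set(xs)), sorted(set(zs))):
        if combo not in present:
            rows.append(combo)
    return sum(x > z for x, z in rows) / len(rows)


x_list = [1, 1, 3]
z_list = [0, 2, 2]
print(all_pairs_fraction(x_list, z_list))     # 5 of 9 pairs
print(fillin_style_fraction(x_list, z_list))  # 3 of 4 rows, i.e. 0.75
```

When all values are distinct, as in the examples from #1 and #3, the two calculations agree; with repeated values they can differ, so the -cross- approach in #4 may be the safer choice in that case.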


Itzhak Rasooly

Join Date: Aug 2022

Posts: 23
#7

02 Feb 2024, 01:57

Dear both, thanks so much for the code! @Clyde Schechter: unfortunately, X and Z need not be integers.
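On the bootstrap question from #1, which the replies above do not address: a minimal Python sketch of one possible scheme is given below. It assumes that X and Z are independent samples, so each is resampled separately with replacement; if the rows are paired observations, whole rows should be resampled instead. All names, the number of replications, and the seed are hypothetical choices for illustration.

```python
import random
from bisect import bisect_left


def fraction_x_gt_z(xs, zs):
    """Fraction of (x_i, z_j) pairs with x_i > z_j (inputs have no missing values)."""
    zs = sorted(zs)
    # bisect_left gives the number of z values strictly below x
    return sum(bisect_left(zs, x) for x in xs) / (len(xs) * len(zs))


def bootstrap_fractions(xs, zs, reps=1000, seed=12345):
    """Resample X and Z separately (assumed independent) and recompute the statistic."""
    rng = random.Random(seed)
    xs = [x for x in xs if x is not None]  # None is the assumed missing code
    zs = [z for z in zs if z is not None]
    estimates = []
    for _ in range(reps):
        xb = rng.choices(xs, k=len(xs))  # sample with replacement
        zb = rng.choices(zs, k=len(zs))
        estimates.append(fraction_x_gt_z(xb, zb))
    return estimates


def percentile_interval(estimates, level=0.95):
    """Simple percentile interval taken from the sorted bootstrap estimates."""
    s = sorted(estimates)
    lo_idx = int((1 - level) / 2 * (len(s) - 1))
    hi_idx = int((1 + level) / 2 * (len(s) - 1))
    return s[lo_idx], s[hi_idx]


x_list = [1.2, 2.5, 3.1, 0.7, None]
z_list = [0.9, 2.0, 2.8]
draws = bootstrap_fractions(x_list, z_list, reps=500)
print(percentile_interval(draws))
```

In Stata, the analogous route would presumably be to wrap the calculation in a program and pass it to a resampling command, but that is not worked out in this thread.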