Two arguments in the "rank" function of egen

FitzGerald Blindman

Join Date: Sep 2023

Posts: 32
#1

20 Feb 2025, 11:37

Hi friends,

I have data on parents and their children, structured like this:

id_parent - parent identifier
birth_year - the birth year of each of the parent's children
id_child - child identifier

So the number of observations with a given value of the variable "id_parent" equals the number of children that parent has. Every value of id_child appears twice: once linked to the father's ID and once linked to the mother's ID.

Now I want to rank the children according to their year of birth, so I wrote the following code:

Code:

bysort id_parent: egen sibiling_order = rank(birth_year), unique

The trouble is that with twins, one twin may be ranked 1 for the father and 2 for the mother, and the reverse for the other twin. That is a problem because I later want to keep only couples who had their first child together.
So I tried the following code:

Code:

bysort id_parent: egen sibiling_order = rank(birth_year id_child), unique

I thought this would solve it, since it is supposed to rank the children first by birth year and, in case of ties, by id_child. But I got an error message that says "birth_yearid_child invalid name", which made me think that the rank function may accept only one variable.

Is that true? And can someone suggest a way to solve my problem?

Thank you in advance,
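On the question above: in the documented syntax, egen's rank() takes a single expression, which is consistent with two variable names inside the parentheses being rejected. The unique option is documented as breaking ties arbitrarily, so nothing guarantees the same order for both parents. A minimal sketch of one alternative, assuming id_child is acceptable as an arbitrary but consistent tie-breaker (it does not reveal which twin was actually born first), sorts within parent by birth_year and then id_child and uses the observation number. The toy data are invented for illustration; the variable name follows the spelling used in the thread.

```stata
* Invented toy data: parents 1 and 2 share three children; 101 and 102 are twins
clear
input long id_parent long id_child int birth_year
1 101 2000
1 102 2000
1 103 2003
2 102 2000
2 101 2000
2 103 2003
end

* Sort within parent by birth year, then by child ID, and number the rows.
* Ties in birth_year are broken by id_child, identically for every parent.
bysort id_parent (birth_year id_child): gen sibiling_order = _n

list, sepby(id_parent)
```

Because the tie-breaker is a fixed variable rather than the current sort order, each twin gets the same rank under both parents.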

FitzGerald
George Ford

Join Date: Aug 2014

Posts: 3081
#2

20 Feb 2025, 11:49

would this work? bysort id_parent (id_child): egen sibiling_order = rank(birth_year), unique
FitzGerald Blindman

Join Date: Sep 2023

Posts: 32
#3

20 Feb 2025, 11:55

Originally posted by George Ford

would this work? bysort id_parent (id_child): egen sibiling_order = rank(birth_year), unique

Thank you George, I think it does.

I'm checking it thoroughly now and will update you.

Fitz
FitzGerald Blindman

Join Date: Sep 2023

Posts: 32
#4

20 Feb 2025, 12:00

Originally posted by George Ford

would this work? bysort id_parent (id_child): egen sibiling_order = rank(birth_year), unique

Hi George,

I'm sorry, but it doesn't work. In the first twin example I found, the twins were ranked in opposite order for each parent.

Do you have another idea?

Fitz
Nick Cox

Join Date: Mar 2014

Posts: 35240
#5

20 Feb 2025, 12:02

Data example please!
FitzGerald Blindman

Join Date: Sep 2023

Posts: 32
#6

20 Feb 2025, 12:06

Originally posted by Nick Cox

Data example please!

I'm sorry, but I work with restricted-access administrative data on a restricted computer, so I can't attach anything from there.
I can only describe it in the abstract from my PC.

Sorry,

Fitz
Nick Cox

Join Date: Mar 2014

Posts: 35240
#7

20 Feb 2025, 12:12

It is fine by me that you can't (shouldn't) post confidential data.

All we want to see are faked realistic data or even faked silly data sufficient to show the problem, as explained in our FAQ Advice.

Otherwise you are being optimistic about our capacity to read a word description and understand it (I am pretty poor at that myself)

-- or you are expecting that we read your story and then invent a data example to show you some code (I am quite good at Stata code but I often make very silly mistakes without having a data example to play with).

If you can type posts on Statalist you can type out data examples (and may even be able to copy and paste into Statalist).
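A faked example of the kind requested here can be typed directly with input. The sketch below uses invented IDs and years, reproduces the egen call from post #1, and flags children whose rank differs between their two parents. Whether any child is flagged can vary from run to run, since the unique option breaks ties arbitrarily. The variable name rank_differs is hypothetical.

```stata
* Invented data: parents 10 and 20 share three children; 501 and 502 are twins
clear
input long id_parent long id_child int birth_year
10 501 1998
10 502 1998
10 503 2001
20 501 1998
20 502 1998
20 503 2001
end

* The call from post #1
bysort id_parent: egen sibiling_order = rank(birth_year), unique

* Within each child, compare the smallest and largest rank received
bysort id_child (sibiling_order): gen byte rank_differs = sibiling_order[1] != sibiling_order[_N]

list, sepby(id_child)
```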
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2374
#8

20 Feb 2025, 12:35

I think you are framing this discussion in a problematic way, but maybe I haven't fully understood the problem.

There are some important aspects that seem to have been omitted in your description.
There is no mention of scenarios with single-parent adoptions, same-sex couples, or single mothers, so I assume that you will not encounter them.

You mention twins in your dataset, but if all you have is birth year, how would you separately identify and rank true (monozygotic or dizygotic) twins as opposed to two or more siblings born in the same year? And what about triplets, or other multiple births?

What about divorce and remarriage resulting in half-siblings or blended families?

Based on your description, you have either two separate datasets, one for the women and the other for the men, or perhaps one dataset in a long layout with one row per parent-child dyad. Do you have information linking parents together into dyads? In any case, this seems like an odd data representation.

What precisely is the goal of this new rank variable? Are you trying to identify birth order within parent? Or birth order within couple?

I would consider whether you could first arrange the data into triads, with one observation per child, parent 1 and parent 2. I can't offer further help without some example data to work with and some clarity relating to the issues above.
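The triad layout suggested above could be sketched as follows, assuming every child has exactly two parent records and that birth_year is constant within child. The data are invented, and the names parent_no and order_in_couple are hypothetical. Ranking within the couple rather than within each parent gives one order per pair, with id_child serving only as an arbitrary tie-breaker.

```stata
* Invented data: parent 1 has children with parent 2 (twins 101, 102) and parent 3 (103)
clear
input long id_parent long id_child int birth_year
1 101 2000
1 102 2000
1 103 2003
2 101 2000
2 102 2000
3 103 2003
end

* Number each child's parents 1, 2 in ascending ID order (not father/mother)
bysort id_child (id_parent): gen byte parent_no = _n

* One observation per child, with id_parent1 and id_parent2
reshape wide id_parent, i(id_child) j(parent_no)

* Birth order within couple, ties broken by child ID
bysort id_parent1 id_parent2 (birth_year id_child): gen order_in_couple = _n

list, sepby(id_parent1 id_parent2)
```

With one row per child, the twin problem cannot arise, because each child receives a single rank per couple instead of one rank per parent.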