  • How to create a within-group rank in a panel dataset

    Dear all,

    I am working with a three-year balanced panel (years: 2000, 2001 and 2002) of Brazilian municipalities and I would like to create a variable that ranks (descending order) the municipalities within the same microregion (a microregion is a set of municipalities) according to the variable IDHM2000, which measures the Human Development Index (HDI) of each municipality in the year 2000.

    In particular, the dataset looks like the following:
    year  municipality       microregion    IDHM2000
    2000  Armazém            Tubarão        .666
    2001  Armazém            Tubarão        .666
    2002  Armazém            Tubarão        .666
    2000  Braço do Norte     Tubarão        .687
    2001  Braço do Norte     Tubarão        .687
    2002  Braço do Norte     Tubarão        .687
    2000  Capivari de Baixo  Tubarão        .672
    2001  Capivari de Baixo  Tubarão        .672
    2002  Capivari de Baixo  Tubarão        .672
    2000  Grão-Pará          Tubarão        .634
    2001  Grão-Pará          Tubarão        .634
    2002  Grão-Pará          Tubarão        .634
    2000  Porto Alegre       Metropolitana  .75
    2001  Porto Alegre       Metropolitana  .75
    2002  Porto Alegre       Metropolitana  .75
    2000  Canoas             Metropolitana  .6
    2001  Canoas             Metropolitana  .6
    2002  Canoas             Metropolitana  .6
    2000  Gravataí           Metropolitana  .58
    2001  Gravataí           Metropolitana  .58
    2002  Gravataí           Metropolitana  .58

    Thus, my goal is a within-microregion rank that assigns 1 to the municipality with the highest value of IDHM2000, 2 to the municipality with the second-highest value, 3 to the municipality with the third-highest value, and so on.

    To make it clear, I manually created a new variable, 'rank', that does what I want:
    year  municipality       microregion    IDHM2000  rank
    2000  Armazém            Tubarão        .666      3
    2001  Armazém            Tubarão        .666      3
    2002  Armazém            Tubarão        .666      3
    2000  Braço do Norte     Tubarão        .687      1
    2001  Braço do Norte     Tubarão        .687      1
    2002  Braço do Norte     Tubarão        .687      1
    2000  Capivari de Baixo  Tubarão        .672      2
    2001  Capivari de Baixo  Tubarão        .672      2
    2002  Capivari de Baixo  Tubarão        .672      2
    2000  Porto Alegre       Metropolitana  .75       1
    2001  Porto Alegre       Metropolitana  .75       1
    2002  Porto Alegre       Metropolitana  .75       1
    2000  Canoas             Metropolitana  .6        2
    2001  Canoas             Metropolitana  .6        2
    2002  Canoas             Metropolitana  .6        2
    2000  Gravataí           Metropolitana  .58       3
    2001  Gravataí           Metropolitana  .58       3
    2002  Gravataí           Metropolitana  .58       3

    Can you help me create such a variable?

    Thank you very much in advance.

    Any help is greatly appreciated.


    Please find below the code to import the example dataset:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double year str33 municipality str66 microregion double IDHM2000
    2000 "Armazém"          "Tubarão"      .666
    2001 "Armazém"          "Tubarão"      .666
    2002 "Armazém"          "Tubarão"      .666
    2000 "Braço do Norte"   "Tubarão"      .687
    2001 "Braço do Norte"   "Tubarão"      .687
    2002 "Braço do Norte"   "Tubarão"      .687
    2000 "Capivari de Baixo" "Tubarão"      .672
    2001 "Capivari de Baixo" "Tubarão"      .672
    2002 "Capivari de Baixo" "Tubarão"      .672
    2000 "Porto Alegre"      "Metropolitana"  .75
    2001 "Porto Alegre"      "Metropolitana"  .75
    2002 "Porto Alegre"      "Metropolitana"  .75
    2000 "Canoas"            "Metropolitana"   .6
    2001 "Canoas"            "Metropolitana"   .6
    2002 "Canoas"            "Metropolitana"   .6
    2000 "Gravataí"         "Metropolitana"  .58
    2001 "Gravataí"         "Metropolitana"  .58
    2002 "Gravataí"         "Metropolitana"  .58
    end

  • #2
    The code below gives tied values the same rank. For example, if "Porto Alegre" and "Canoas" in "Metropolitana" both had 0.75, each would be ranked 1 and "Gravataí" would be ranked 3.

    Code:
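    * Sort by microregion, then by IDHM2000 from highest to lowest
    gsort microregion -IDHM2000 municipality
    * Running count of distinct municipalities within each microregion
    by microregion: gen rank = sum(municipality!=municipality[_n-1])
    * A row with the same IDHM2000 as the row above inherits that row's rank
    by microregion: replace rank = rank[_n-1] if IDHM2000 == IDHM2000[_n-1]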
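
    To see the tie handling in action, here is a minimal sketch on a made-up cross-section in which Canoas is assigned the same value as Porto Alegre. The tied value is hypothetical and is not in the original data; the three ranking commands are the ones above, unchanged.

    ```stata
    * Hypothetical data: Canoas is set to .75 only to illustrate a tie
    clear
    input str20 municipality str20 microregion double IDHM2000
    "Porto Alegre" "Metropolitana" .75
    "Canoas"       "Metropolitana" .75
    "Gravataí"     "Metropolitana" .58
    end

    * Same three commands as above
    gsort microregion -IDHM2000 municipality
    by microregion: gen rank = sum(municipality != municipality[_n-1])
    by microregion: replace rank = rank[_n-1] if IDHM2000 == IDHM2000[_n-1]

    * Inspect the result
    list municipality IDHM2000 rank, noobs
    ```

    The two tied municipalities should both be listed with rank 1, and the next municipality with rank 3 rather than 2, because the running count keeps advancing through the tie.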


    • #3
      Hey Fei Wang, thank you very much!

      It works like a charm!


      • #4
        egen, rank() allows a by: prefix. To rank from highest to lowest, negate the argument.
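
        A minimal sketch of that suggestion on the example data from #1. One assumption of mine, not stated above: because this is a panel, I rank within microregion and year, so that each municipality is counted once per group. Ranking within microregion alone would treat the three yearly rows of each municipality as ties. The field and track options are documented options of egen's rank() function; the variable names rank_field and rank_neg are just illustrative.

        ```stata
        * Self-contained copy of the example data from #1
        clear
        input double year str33 municipality str66 microregion double IDHM2000
        2000 "Armazém"           "Tubarão"       .666
        2001 "Armazém"           "Tubarão"       .666
        2002 "Armazém"           "Tubarão"       .666
        2000 "Braço do Norte"    "Tubarão"       .687
        2001 "Braço do Norte"    "Tubarão"       .687
        2002 "Braço do Norte"    "Tubarão"       .687
        2000 "Capivari de Baixo" "Tubarão"       .672
        2001 "Capivari de Baixo" "Tubarão"       .672
        2002 "Capivari de Baixo" "Tubarão"       .672
        2000 "Porto Alegre"      "Metropolitana" .75
        2001 "Porto Alegre"      "Metropolitana" .75
        2002 "Porto Alegre"      "Metropolitana" .75
        2000 "Canoas"            "Metropolitana" .6
        2001 "Canoas"            "Metropolitana" .6
        2002 "Canoas"            "Metropolitana" .6
        2000 "Gravataí"          "Metropolitana" .58
        2001 "Gravataí"          "Metropolitana" .58
        2002 "Gravataí"          "Metropolitana" .58
        end

        * field: the highest value is ranked 1; ties share the same rank
        bysort microregion year: egen rank_field = rank(IDHM2000), field

        * Negated argument, as suggested above; track ranks the lowest value 1,
        * so applying it to -IDHM2000 ranks the highest IDHM2000 first
        bysort microregion year: egen rank_neg = rank(-IDHM2000), track

        * The two routes should agree on every row
        assert rank_field == rank_neg

        sort microregion year rank_field
        list year municipality microregion IDHM2000 rank_field, sepby(microregion year) noobs
        ```

        Without field, track, or unique, rank() assigns tied observations their average rank, so one of those options is worth specifying if ties are possible.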
