Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to rearrange duplicates from rows to columns

    Hi all,

    I am currently working on this dataset and got a couple of queries please regarding rearranging the data. My initial dataset looks as below which has date of events by State-LGAs-Year, with the final column containing the date of events.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str28 state str46 LGA float year int Date_Start
    "New South Wales" "Blue Mountains"  2011 18890
    "New South Wales" "Blue Mountains"  2011 18924
    "New South Wales" "Blue Mountains"  2013 19611
    "New South Wales" "Blue Mountains"  2013 19644
    "New South Wales" "Blue Mountains"  2013 19647
    "New South Wales" "Blue Mountains"  2014 20028
    "New South Wales" "Blue Mountains"  2014 20041
    "New South Wales" "Blue Mountains"  2015 20302
    "New South Wales" "Blue Mountains" 2019 21792
    "Victoria"        "Blue Mountains" 2015 20159
    "Victoria"        "Blue Mountains" 2015 20376
    "Victoria"        "Blue Mountains" 2017 20862
    "Victoria"        "Peyton"          2017 21386
    end
    format %tdnn/dd/CCYY Date_Start

    I would like to convert the duplicate rows by State-LGA-Year into columns. For example, rows 1 and 2 from the above data has event information for the state New South Wales, LGA Blue Mountains for the year 2011. Date of the duplicate event have to be rearranged as a column for all the duplicate rows and the resulting output should look as below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str28 state str46 LGA float year int(Date_Start Date_Start_01 Date_Start_02)
    "New South Wales" "Blue Mountains"  2011 18890 18924     .
    "New South Wales" "Blue Mountains"  2013 19611 19644 19647
    "New South Wales" "Blue Mountains"  2014 20028 20041     .
    "New South Wales" "Blue Mountains"  2015 20302     .     .
    "New South Wales" "Blue Mountains" 2019 21792     .     .
    "Victoria"        "Blue Mountains" 2015 20159 20376     .
    "Victoria"        "Blue Mountains" 2017 20862     .     .
    "Victoria"        "Peyton"          2017 21386     .     .
    end
    format %tdnn/dd/CCYY Date_Start
    format %tdnn/dd/CCYY Date_Start_01
    format %tdnn/dd/CCYY Date_Start_02

    Finally, I need to find the most recent date of the last three columns (that provides the date of each event happened in a year), for which the resulting dataset would look as below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str28 state str46 LGA float year int(Date_Start Date_Start_01 Date_Start_02 recentdate)
    "New South Wales" "Blue Mountains"  2011 18890 18924     . 18924
    "New South Wales" "Blue Mountains"  2013 19611 19644 19647 19644
    "New South Wales" "Blue Mountains"  2014 20028 20041     . 20041
    "New South Wales" "Blue Mountains"  2015 20302     .     .     .
    "New South Wales" "Blue Mountains" 2019 21792     .     .     .
    "Victoria"        "Blue Mountains" 2015 20159 20376     . 20376
    "Victoria"        "Blue Mountains" 2017 20862     .     .     .
    "Victoria"        "Peyton"          2017 21386     .     . 21386
    end
    format %tdnn/dd/CCYY Date_Start
    format %tdnn/dd/CCYY Date_Start_01
    format %tdnn/dd/CCYY Date_Start_02
    format %tdnn/dd/CCYY recentdate

    Thanks for your help.

    Regards
    Vignesh.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str28 state str46 LGA float year int Date_Start
    "New South Wales" "Blue Mountains" 2011 18890
    "New South Wales" "Blue Mountains" 2011 18924
    "New South Wales" "Blue Mountains" 2013 19611
    "New South Wales" "Blue Mountains" 2013 19644
    "New South Wales" "Blue Mountains" 2013 19647
    "New South Wales" "Blue Mountains" 2014 20028
    "New South Wales" "Blue Mountains" 2014 20041
    "New South Wales" "Blue Mountains" 2015 20302
    "New South Wales" "Blue Mountains" 2019 21792
    "Victoria"        "Blue Mountains" 2015 20159
    "Victoria"        "Blue Mountains" 2015 20376
    "Victoria"        "Blue Mountains" 2017 20862
    "Victoria"        "Peyton"         2017 21386
    end
    format %tdnn/dd/CCYY Date_Start
    
    bys state LGA year: gen which=_n
    reshape wide Date_Start, i(state LGA year) j(which)
    egen wanted= rowmax(Date_Start*)
    format wanted %tdnn/dd/CCYY
    Res.:

    Code:
    
    
    . l, sepby(state)
    
         +--------------------------------------------------------------------------------------------+
         |           state              LGA   year   Date_St~1   Date_Sta~2   Date_Sta~3       wanted |
         |--------------------------------------------------------------------------------------------|
      1. | New South Wales   Blue Mountains   2011   9/20/2011   10/24/2011            .   10/24/2011 |
      2. | New South Wales   Blue Mountains   2013   9/10/2013   10/13/2013   10/16/2013   10/16/2013 |
      3. | New South Wales   Blue Mountains   2014   11/1/2014   11/14/2014            .   11/14/2014 |
      4. | New South Wales   Blue Mountains   2015    8/2/2015            .            .     8/2/2015 |
      5. | New South Wales   Blue Mountains   2019   8/31/2019            .            .    8/31/2019 |
         |--------------------------------------------------------------------------------------------|
      6. |        Victoria   Blue Mountains   2015   3/12/2015   10/15/2015            .   10/15/2015 |
      7. |        Victoria   Blue Mountains   2017   2/12/2017            .            .    2/12/2017 |
      8. |        Victoria           Peyton   2017   7/21/2018            .            .    7/21/2018 |
         +--------------------------------------------------------------------------------------------+
    Last edited by Andrew Musau; 03 Mar 2022, 05:43.

    Comment


    • #3
      Hi Andrew,

      Thanks for your kind help. It works perfectly.

      Regards
      Vignesh.

      Comment

      Working...
      X