  • Isolating three largest values in dataset with repeated times and multiple entities

    I would like to analyse individual countries from this data set and compare per108 over the last 20 or so years for the three biggest parties as measured by absseat. However, I am unable to isolate the three biggest parties per country per year. I have tried treating this as panel data, but because the dates repeat, I am unable to do so. Any help would be highly appreciated.

  • #2
    No data example here. However, try translating this to your set-up:

    Code:
    gen  OK = !missing(whatever) 
    
    bysort OK country year (whatever) : gen first = whatever[_N] if OK 
    
    by OK country year : gen second = whatever[_N-1] if OK 
    
    by OK country year : gen third = whatever[_N-2]  if OK
    See also

    https://journals.sagepub.com/doi/pdf...6867X221106436

    for a moderately systematic discussion.

    Code:
    SJ-22-2 dm0108  . . . Speaking Stata: The largest five - A tale of tail values
            . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
            Q2/22   SJ 22(2):446--459                                (no commands)
            shows basic calculations working with largest (or smallest)
            values in a tail of a distribution as a prelude to graphics,
            tables, and more detailed analysis

    • #3
      Could you please illustrate how I can drop those observations where whatever differs from the first, second, and third values? Thank you so much for the help.

      • #4
        Code:
        drop if !inlist(whatever, first, second, third)
        Caution: check the data storage type of variable whatever before doing this. The variables first, second, and third, as generated by the code in #2, will have float storage type. If whatever is stored as a double, then whatever will almost never be an exact match to any of the values of first, second, or third, even if they look like they agree in the Browser window or in a data listing. (They can differ in far out decimal places that don't show up visually when you look at the data.) So if whatever is stored as a double, then revise the code in #2 to generate first, second, and third as doubles also.
        Last edited by Clyde Schechter; 16 Dec 2024, 15:56.
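
To see the storage-type issue from #4 concretely, here is a minimal sketch on made-up data. The variable names x_double and x_float are illustrative and not from the thread, and the sketch assumes Stata's default storage type has not been changed with -set type double-. A double holding 0.1 and a float copy of it look the same in a listing but do not compare equal.

```stata
clear
set obs 1
generate double x_double = 0.1       // stored as double
generate x_float = x_double          // default storage type (float assumed)
list                                 // both display as .1
count if x_double == x_float         // 0: the float copy was rounded
count if float(x_double) == x_float  // 1: equal once x_double is rounded to float
```

The same mismatch is what would make -inlist()- fail to match a double against float copies of its own values.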

        • #5
          How can I revise the code once I have changed the storage type in the panel data?

          Also, I would like to create a graph that illustrates a political view of the three largest parties in a country over a period. However, because the data is organised as follows I can't use the command xtset country year, as the years repeat. Any suggestions?
          country  year  political party                 percentage vote  pro globalisation  peace
          Sweden   1994  Left Party                      10.3             0.0                1.9
          Sweden   1994  Social Democratic Labour Party  46.5             4.4                5.6

          • #6
            I didn't say anything about changing the storage types in the original data. What I said is that if the variable referred to by Nick as whatever in #2 is a double, then you have to make first, second, and third doubles as well. You do that by replacing -gen ...- with -gen double ...- throughout the code in #2.
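
Spelled out, that revision might look like the sketch below. As in #2, whatever stands in for your own variable, and the toy data are invented purely for illustration.

```stata
* Toy data (invented): one country-year with four parties
clear
input country year double whatever
1 2000 10.3
1 2000 46.5
1 2000 22.1
1 2000  5.7
end

gen OK = !missing(whatever)

* Same logic as #2, but with -gen double- so the new variables
* hold exactly the same values as whatever
bysort OK country year (whatever) : gen double first = whatever[_N] if OK
by OK country year : gen double second = whatever[_N-1] if OK
by OK country year : gen double third = whatever[_N-2] if OK

* The -drop- from #4 now compares double with double
drop if !inlist(whatever, first, second, third)
list
```

In this toy example only the observation with 5.7 is dropped.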

            As for the graph you want to create, I don't know what a political view of the three largest parties in a country over a period means. So I'm going to have to bow out of that one. As far as not being able to -xtset- your data because years repeat within country, there are two possibilities here. One is to -reshape- the data wide so you will have one observation per country year combination. The other is to ask yourself whether you need to include a time variable in your -xtset- command: most -xt- commands do not require a time variable in -xtset-, you can just -xtset country- and Stata will not bother you about repetitions of year within country. All you lose, if you go that route, is the ability to calculate lags, leads, and estimate autoregressive structure. But if you don't need those things, then you have nothing to lose.
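
A minimal sketch of the two routes just described, using invented toy data with two parties per country-year (the party codes 1 and 2 are made up; per108 is the variable name from the thread):

```stata
* Toy data (invented): years repeat within country
clear
input country year party per108
11 1994 1 0.0
11 1994 2 4.4
11 1998 1 1.2
11 1998 2 3.0
end

* With a time variable, -xtset- refuses the repeated years
capture xtset country year
display "return code with time variable: " _rc

* Route 1: declare only the panel variable
xtset country

* Route 2: reshape wide to one observation per country-year,
* then declare both panel and time variables
reshape wide per108, i(country year) j(party)
xtset country year
```

Route 1 leaves the data in long layout. Route 2 produces one per108 variable per party (here per1081 and per1082), which is only practical when the number of parties is modest.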

            Once you decide how you want to proceed, if you need help with coding, when posting back show example data using the -dataex- command, not a hand-crafted table, and give a clear explanation of what you want the graph to be: what goes on the x-axis, what goes on the y-axis, how many curves for each country? Do you want line graphs or scatter plots or what?

            If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

            • #7
              I don't follow what kind of graph you want either, but Clyde Schechter's point about storage types is naturally valid. If you want to keep the three largest values you need not create new variables.

              Code:
              gen  OK = !missing(whatever)   
              
              bysort OK country year (whatever) : keep if _n > _N - 3 & OK
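
As a quick check of how this behaves, here is a sketch on invented toy data that includes a country-year with fewer than three non-missing values. As before, whatever is a placeholder name.

```stata
* Toy data (invented): four parties in 2000, two plus a missing value in 2004
clear
input country year double whatever
1 2000 10.3
1 2000 46.5
1 2000 22.1
1 2000  5.7
1 2004 30.0
1 2004 12.5
1 2004    .
end

gen OK = !missing(whatever)

* Within each country-year, keep the last three observations after sorting
* on whatever; observations with missing whatever (OK == 0) are dropped
bysort OK country year (whatever) : keep if _n > _N - 3 & OK

list, sepby(country year)
```

A country-year with fewer than three non-missing values simply keeps whatever it has, and because this route never creates new variables, the float versus double question does not arise.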

              • #8
                [Attached image: Screenshot 2024-12-17 at 14.55.18.png]

                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input double(country edate party pervote per108 per110) float OK
                11 -5584 11810             13.6      0     0 1
                11 -5584 11620             15.8      0     0 1
                11 -5584 11320             46.5      0     0 1
                11 -4121 11810           12.385     .7     0 1
                11 -4121 11420           22.749    2.3     0 1
                11 -4121 11320           46.132     .8     0 1
                11 -2658 11620           14.373      0     0 1
                11 -2658 11420           24.442      0     0 1
                11 -2658 11320           46.047      0     0 1
                11 -1192 11620           17.108      0     0 1
                11 -1192 11420           23.807     .7     0 1
                11 -1192 11320           44.581      0     0 1
                11  -579 11420           18.209      2     0 1
                11  -579 11620           19.518      0     0 1
                11  -579 11320           46.216      0     0 1
                11   261 11620           16.557      0     0 1
                11   261 11420           17.492    2.2     0 1
                11   261 11320           47.789      0     0 1
                11  1724 11620           13.722      0     0 1
                11  1724 11420           16.975      0     0 1
                11  1724 11320           47.269    2.2     0 1
                11  3180 11420           14.256      0     0 1
                11  3180 11810           15.679     .2     0 1
                11  3180 11320           50.116      0     0 1
                11  3915 11420           16.211      0     0 1
                11  3915 11810           19.919      1     0 1
                11  3915 11320           45.343     .5     0 1
                11  5007 11620           14.294      0     0 1
                11  5007 11810           25.101      0     0 1
                11  5007 11320           43.559      0     0 1
                11  6106 11620           15.589      0     0 1
                11  6106 11810           24.085      0     0 1
                11  6106 11320           42.749      0     0 1
                11  7198 11810            18.07      0     0 1
                11  7198 11620           20.343     .8     0 1
                11  7198 11320           43.244      0     0 1
                11  8297 11810           15.476     .3     0 1
                11  8297 11620           23.644     .7     0 1
                11  8297 11320           45.606      0     0 1
                11  9389 11420           14.231      0     0 1
                11  9389 11620           21.328      1     0 1
                11  9389 11320           44.684      0     0 1
                11 10488 11420           12.202     .7     0 1
                11 10488 11620           18.297    4.9     0 1
                11 10488 11320           43.207    1.6     0 1
                11 11580 11420            9.128    7.5     0 1
                11 11580 11620           21.924  7.813     0 1
                11 11580 11320           37.705  3.077     0 1
                11 12679 11810            7.653   3.03  .606 1
                11 12679 11620            22.38 15.578     0 1
                11 12679 11320            45.25  3.398     0 1
                11 14143 11220            11.99      0 3.323 1
                11 14143 11620             22.9  9.281     0 1
                11 14143 11320            36.39  4.688     0 1
                11 15598 11420           13.394  6.061     0 1
                11 15598 11620           15.256  4.255     0 1
                11 15598 11320           39.854  1.099     0 1
                11 17061 11810            7.879   .955     0 1
                11 17061 11620            26.23   3.73     0 1
                11 17061 11320           34.994  1.018     0 1
                11 18524 11110             7.34      0  .763 1
                11 18524 11620            30.06   2.11  .053 1
                11 18524 11320            30.66      0     0 1
                11 19980 11710           12.857      0  2.54 1
                11 19980 11620           23.325  1.408     0 1
                11 19980 11320           31.015   .739     0 1
                11 21436 11710           17.534      0  3.03 1
                11 21436 11620           19.836   .995     0 1
                11 21436 11320           28.261  2.581     0 1
                11 22899 11620 19.1026142541736  1.348   .27 1
                11 22899 11710 20.5366981413734   .226 1.432 1
                11 22899 11320  30.326280829554   .852     0 1
                12 -5198 12420           13.793      0     0 1
                12 -5198 12620           17.008      0     0 1
                12 -5198 12320           41.027      0     0 1
                12 -3735 12420            14.75      0     0 1
                12 -3735 12620            18.43      0     0 1
                12 -3735 12320           45.694     .3     0 1
                12 -2272 12520           10.486     .6     0 1
                12 -2272 12620            18.78     .4     0 1
                12 -2272 12320           46.659    1.2     0 1
                12  -816 12520           10.231      0     0 1
                12  -816 12620            18.72     .3     0 1
                12  -816 12320           48.331     .3     0 1
                12   619 12520            10.37    1.1     0 1
                12   619 12620            21.28    1.3     0 1
                12   619 12320           46.762     .9     0 1
                12  2081 12420            10.56     .5     0 1
                12  2081 12620            20.99     .5     0 1
                12  2081 12320           43.144     .3     0 1
                12  3537 12810            10.21      0     0 1
                12  3537 12620            20.59     .9     0 1
                12  3537 12320           46.525      1     0 1
                12  5000 12520            13.52     .1     0 1
                12  5000 12620            19.61    3.2     0 1
                12  5000 12320           35.289    1.2     0 1
                12  6463 12520            11.13      0     0 1
                12  6463 12620            27.08     .8     0 1
                12  6463 12320           42.259     .7     0 1
                12  7927 12520            8.914     .1     0 1
                end
                format %td edate
                label values country country
                label def country 11 "sweden", modify
                label def country 12 "norway", modify
                label values party party
                label def party 11110 "swe: Green Ecology Party", modify
                label def party 11220 "swe: Left Party", modify
                label def party 11320 "swe: Social Democratic Labour Party", modify
                label def party 11420 "swe: Liberals", modify
                label def party 11620 "swe: Moderate Coalition Party", modify
                label def party 11710 "swe: Sweden Democrats", modify
                label def party 11810 "swe: Centre Party", modify
                label def party 12320 "nor: Labour Party", modify
                label def party 12420 "nor: Liberal Party", modify
                label def party 12520 "nor: Christian People’s Party", modify
                label def party 12620 "nor: Conservative Party", modify
                label def party 12810 "nor: Centre Party", modify
                I would like to have a graph similar to this one.

                I tried using xtline per108, i(country) t(edate) but it will not take into account the evolution of each party.

                I would thus like to have a connected line graph per country showing the positions of that country's parties over a selectable time period, with the y-axis showing a party's position on a certain matter (e.g., per108) and the x-axis showing time.

                • #9
                  You may want to further customize this graph with additional options. But the core of it is this:
                  Code:
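                  keep country edate party per108

                  * Store each party's label text in a local macro named lbl_<party code>,
                  * working in a separate frame so the main data are left untouched
                  frame put party, into(working)
                  frame change working
                  duplicates drop
                  forvalues i = 1/`=_N' {
                      local j = party[`i']
                      local lbl_`j' :label (party) `j'
                      * keep the text from character 5 onward, dropping the "swe:" or "nor:" prefix
                      local lbl_`j' = substr(`"`lbl_`j''"', 5, .)
                  }
                  * list the macros just created, as a check
                  macro dir
                  frame change default
                  
                  * One observation per country and election date, one per108 variable per party
                  levelsof party, local(parties)
                  reshape wide per108, i(country edate) j(party)
                  foreach p of local parties {
                      label var per108`p' `"`lbl_`p''"'
                  }
                  
                  * Election date as a decimal year: fractional years since 1 Jan 1940, plus 1940
                  gen latency1940 = datediff_frac(td(1jan1940), edate, "y") + 1940
                  * One panel per country, one connected line per party
                  graph twoway connect per108* latency1940, by(country, rows(2)) sort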
