
  • tag three observations with values nearest to the current one.

    Dear All, I have this data
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float gdp
    38.2203
    148.788
    160.813
    179.149
    228.366
    17.9307
    35.0746
    77.0319
    80.3094
    309.432
     75.864
     108.08
    end
    and I wish to tag, for each observation, the three other observations (in the same sample) whose values are nearest to the current value. Any suggestions? Thanks.
    Ho-Chuan (River) Huang
    Stata 17.0, MP(4)

  • #2
    Where do you get the current value from?



    • #3
      Dear Clyde, Thanks for the reply. Starting from the first observation, the value of `gdp' is 38.2203. I'd like to find the three other observations (excluding the first/current one) whose values are nearest to 38.2203, and likewise for every other observation. (Maybe we need to create three new variables for these?) Is this clear to you?
      Ho-Chuan (River) Huang
      Stata 17.0, MP(4)
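
To make the request concrete, here is a minimal sketch for the first observation only, using the data posted above. The variable names obs_no and dist_to_first are made up for this illustration. Working by hand, the three values nearest to 38.2203 are 35.0746, 17.9307, and 75.864, and the listing should show those observations first.

```stata
clear
input float gdp
38.2203
148.788
160.813
179.149
228.366
17.9307
35.0746
77.0319
80.3094
309.432
 75.864
 108.08
end

* keep track of the original position of each observation
gen long obs_no = _n

* distance from every other observation to observation 1;
* observation 1 itself is left missing so that it sorts last
gen double dist_to_first = abs(gdp - gdp[1]) if _n > 1

* the three smallest distances identify the three nearest values
sort dist_to_first
list obs_no gdp dist_to_first in 1/3
```

The actual task is to repeat this for every observation rather than only the first one.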



      • #4
        OK, now I understand.

        Code:
        gen long obs_no = _n
        tempfile copy
        preserve
        rename _all match_=
        save `copy'
        
        restore
        cross using `copy'
        
        drop if match_obs_no == obs_no
        gen delta = abs(match_gdp - gdp)
        by obs_no (delta), sort: keep if _n <= 3
        by obs_no (delta): gen rank = _n
        
        reshape wide match@_gdp match@_obs_no delta, i(obs_no) j(rank)
        
        order obs_no gdp, first
        If your data set is very large, this code may exceed your memory limits or take a very long time. Basically, if you are starting with N observations, the -cross- command will expand the data set to N^2 observations. So if that isn't viable, post back and let me know. There is another way that is a bit more complicated but uses only order N memory.
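
For readers who do run into the memory limit, one possible alternative is to loop over observations instead of using -cross-. This is only a sketch and not necessarily the approach alluded to above. It keeps the data at N observations but performs two sorts per observation, so it trades memory for time. It assumes at least four observations with non-missing gdp. The working variable dist and the scalar target are names chosen for this illustration, and the result names (match1_gdp, match1_obs_no, delta1, and so on) are chosen to mirror the -reshape- output above.

```stata
clear
input float gdp
38.2203
148.788
160.813
179.149
228.366
17.9307
35.0746
77.0319
80.3094
309.432
 75.864
 108.08
end

gen long obs_no = _n
gen double dist = .                 // working variable, overwritten on each pass
forvalues k = 1/3 {
    gen float  match`k'_gdp    = .
    gen long   match`k'_obs_no = .
    gen double delta`k'        = .
}

local N = _N
forvalues i = 1/`N' {
    quietly {
        * data are in obs_no order here, so this picks up the current observation
        scalar target = gdp[`i']
        * distance to the current value; the current observation gets missing
        replace dist = cond(obs_no == `i', ., abs(gdp - scalar(target)))
        * nearest observations first; missing distances sort last
        sort dist obs_no
        forvalues k = 1/3 {
            replace match`k'_gdp    = gdp[`k']    if obs_no == `i'
            replace match`k'_obs_no = obs_no[`k'] if obs_no == `i'
            replace delta`k'        = dist[`k']   if obs_no == `i'
        }
        * restore the original order before the next pass
        sort obs_no
    }
}
drop dist
scalar drop target
list obs_no gdp match1_gdp match2_gdp match3_gdp
```

Ties in distance are broken here by the lower observation number, which is an arbitrary choice.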



        • #5
          Dear Clyde, Thanks a lot. Actually, this is a question from someone else. I will pass your answer along to him/her and see what happens. If necessary, I will post back.
          Ho-Chuan (River) Huang
          Stata 17.0, MP(4)



          • #6
            I'm interested in what you tried, as that will also shed light on what you're trying to do.

            Here is an inelegant solution.
            I'm going to say that these are annual observations, just for the sake of the example.

            Code:
             
            g year = _n // remember the original order (treated as years in this example)
            sort gdp
            g rank = _n
            // once sorted, the three nearest values must be among the three neighbors on either side
            foreach num of numlist 1/3 {
                g lower`num' = gdp[_n-`num']
                g lower`num'dist = abs(gdp-lower`num')
                g higher`num' = gdp[_n+`num']
                g higher`num'dist = abs(gdp-higher`num')
            }
            list
            // at this point we have the three closest in each direction and we just need to pick the three that are closest
            // note: if two candidates are exactly tied in distance, both are used up in the same pass
            foreach i of numlist 1/3 {
                egen mindist = rowmin(*dist) // record the closest remaining distance
                g close`i'value = . // here we will hold the closest value
                foreach num of numlist 1/3 {
                    // store the matching candidate, then blank its distance so it is not picked again
                    replace close`i'value = higher`num' if higher`num'dist == mindist
                    replace higher`num'dist = . if mindist == higher`num'dist
                    replace close`i'value = lower`num' if lower`num'dist == mindist
                    replace lower`num'dist = . if mindist == lower`num'dist
                }
                drop mindist
            }
            drop lower* higher*
            sort year // back to the original order
            list
            Last edited by Arthur Morris; 27 Feb 2020, 21:56.



            • #7
              Dear Arthur, Thanks for the suggestion. I will give it a try soon.
              Ho-Chuan (River) Huang
              Stata 17.0, MP(4)

