Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a group ID based on multiple variables: no bysort & no weights allowed

    Hello Statalist,

    I am attempting to create a group ID which identifies each unique combination of the variables "investorid", "AnnounceDate2", and "uniqueinvestmentid" in the example data below.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(investorid AnnounceDate2) str11 uniqueinvestmentid
    1 17995 "16x3307" 
    1 17995 "16x3307" 
    1 17995 "16x3307" 
    1 17995 "16x3307" 
    1 18443 "16x13109"
    1 18567 "16x3990" 
    1 18604 "16x12503"
    1 18736 "16x6502" 
    1 19153 "16x7257" 
    1 19450 "16x2982" 
    1 19801 "16x11283"
    1 19844 "16x11559"
    1 19927 "16x11810"
    1 19996 "16x4271" 
    1 20032 "16x6827" 
    1 20187 "16x6907" 
    1 20206 "16x5897" 
    1 20502 "16x6593" 
    1 20545 "16x6809" 
    1 20683 "16x4722" 
    1 20698 "16x4745" 
    1 20759 "16x9229" 
    1 20836 "16x6372" 
    1 20957 "16x12705"
    1 20978 "16x2978" 
    1 21154 "16x103"  
    1 21172 "16x2978" 
    1 21202 "16x4231" 
    1 21535 "16x5931" 
    2 20226 "17x4326" 
    end
    format %td AnnounceDate2
    I can use the following code:

    Code:
    egen investmentid = group(investorid AnnounceDate2 uniqueinvestmentid)
    but this just makes a continuing count all the way to the end of the data, whereas I want the investmentid to start again at 1 when it moves to investorid==2, and so on. Unfortunately, I cannot use the following code:

    Code:
    bysort investorid: egen investmentid = group(AnnounceDate2 uniqueinvestmentid)
    because you cannot combine bysort with egen commands.

    I tried to do the following as well:

    Code:
    bysort investorid: gen investmentid=_n
    replace investmentid[_n]=investmentid [_n-1] if AnnounceDate2[_n]==AnnounceDate2[_n-1] & uniqueinvestmentid[_n]==uniqueinvestmentid[_n-1]
    But unfortunately this gives me the error code "weights not allowed", an issue that, based on my reserach, is fairly well-documented, because Stata interprets the [_n] in the initial statement as a weight and refuses to combine it with the replace command.

    Given all of the above - does anyone have any suggestions as to how I might achieve what I'd like to do here? It seems like a fairly easy issue in terms of logic but I cannot get there. Thanks in advance for any help you can provide!




  • #2
    I suggest that you give an example of how your desired identifier will look, I can't follow all your criteria. The requirement to start at 1 again every time there is a new investor id would seem to be satisfied by giving an identifier that is identically 1, which is hardly an identifier,

    Comment


    • #3
      The error message you received on the code
      Code:
      bysort investorid: gen investmentid=_n
      replace investmentid[_n]=investmentid [_n-1] if AnnounceDate2[_n]==AnnounceDate2[_n-1] & uniqueinvestmentid[_n]==uniqueinvestmentid[_n-1]
      is eliminated by changing the replace command to
      Code:
      replace investmentid=investmentid [_n-1] if AnnounceDate2[_n]==AnnounceDate2[_n-1] & uniqueinvestmentid[_n]==uniqueinvestmentid[_n-1]

      Comment


      • #4
        Hi Nick,

        Sure - sorry if that was unclear. I have manually created what I'd want for the ID variable in the example data below (see investmentid):

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(investorid AnnounceDate2) str11 uniqueinvestmentid float investmentid
        1 17995 "16x3307"   1
        1 17995 "16x3307"   1
        1 17995 "16x3307"   1
        1 17995 "16x3307"   1
        1 18443 "16x13109"  2
        1 18567 "16x3990"   3
        1 18604 "16x12503"  4
        1 18736 "16x6502"   5
        1 19153 "16x7257"   6
        1 19450 "16x2982"   7
        1 19801 "16x11283"  8
        1 19844 "16x11559"  9
        1 19927 "16x11810" 10
        1 19996 "16x4271"  11
        1 20032 "16x6827"  12
        1 20187 "16x6907"  13
        1 20206 "16x5897"  14
        1 20502 "16x6593"  15
        1 20545 "16x6809"  16
        1 20683 "16x4722"  17
        1 20698 "16x4745"  18
        1 20759 "16x9229"  19
        1 20836 "16x6372"  20
        1 20957 "16x12705" 21
        1 20978 "16x2978"  22
        1 21154 "16x103"   23
        1 21172 "16x2978"  24
        1 21202 "16x4231"  25
        1 21535 "16x5931"  26
        2 20226 "17x4326"   1
        2 20437 "17x6689"   2
        end
        format %td AnnounceDate2
        As you can see, for investorid==1, the ID remains the same for the first few observations because investorid, uniqueinvestmentid, and AnnounceDate2 are all identical. Once they are different it moves on to 2 and so on. Then it goes back to 1 when the investorid==2. Hope that helps.


        Also - William - thanks for your suggestion but I tried that and it still gives me the "weights not allowed" error message.

        Comment


        • #5
          I see now that there was a second error in your replace command that I overlooked. In post #3 I should have written that the error message you received on the code
          Code:
          bysort investorid: gen investmentid=_n
          replace investmentid[_n]=investmentid [_n-1] if AnnounceDate2[_n]==AnnounceDate2[_n-1] & uniqueinvestmentid[_n]==uniqueinvestmentid[_n-1]
          would be eliminated by changing the replace command to
          Code:
          replace investmentid=investmentid[_n-1] if AnnounceDate2[_n]==AnnounceDate2[_n-1] & uniqueinvestmentid[_n]==uniqueinvestmentid[_n-1]
          (notice the elimination of the incorrect space in front of [_n-1]). Note that each of the [_n] subscripts are unnecessary and can be removed. (See the output of help subscripting). Subscripts are allowed in expressions (and elsewhere) on the right of the equal sign, but not as part of the variable name on the left of the equal sign, in generate and replace commands.
          Code:
          replace investmentid=investmentid[_n-1] if AnnounceDate2==AnnounceDate2[_n-1] & uniqueinvestmentid==uniqueinvestmentid[_n-1]
          Last edited by William Lisowski; 08 Mar 2020, 11:32.

          Comment


          • #6
            That works! Thanks William!

            Comment

            Working...
            X