Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Establishing equivalence between sets of observations across years

    Hello, This is something I've been struggling with about a month now and am not making any headway. Here is the problem. I have identified groups across three different sets of years. I have a unique id for each group-year, but the group-ids are not consistent across time, and the group membership can change slightly over time. I need to have a way of identifying groups across years. For instance, A,B,C are group "1" in 2001, but in 2005, C spins off as a separate group "3", and A-B add D as a member and have group id number "2" in 2005. I need to find a way to capture the fact that Group 3 in 2005 is the same as group 1 in 2001 (since the majority of the members are the same). Any suggestions will be very much appreciated. See below the data structure:

    Individual Id; Group Id; Year
    A; 1; 2001
    B; 1; 2001
    C; 1; 2001
    A; 2; 2005
    B; 2; 2005
    D; 2; 2005
    C; 3; 2005
    F; 3; 2005
    A;6; 2009
    B; 6; 2009
    D; 6; 2009
    C; 7; 2009

    I've tried various iterations of the merge command and the bysort command, but haven't really made much headway.

    Thanks

  • #2
    From your description, you define "same group" as a majority of members retained. Therefore, you can loop through group members and calculate the proportion of a particular group's members in the other groups. Eliminating "similar" groups successively should reduce the search significantly depending on how large your dataset is. Here is a way to determine proportions for Group_Id=1 in your data example


    Code:
    input str5 Individual_Id float(Group_Id Year)
    "A" 1 2001
    "B" 1 2001
    "C" 1 2001
    "A" 2 2005
    "B" 2 2005
    "D" 2 2005
    "C" 3 2005
    "F" 3 2005
    "A" 6 2009
    "B" 6 2009
    "D" 6 2009
    "C" 7 2009
    end
    
    *DETERMINE TOTAL NUMBER OF GROUPS
    qui sum Group_Id
    local f= r(min)
    local l= r(max)
    
    *GET MEMBERS OF EACH GROUP
    
    forvalues i= `f'/`l'{
    levelsof Individual_Id if Group_Id ==`i', local(idlevel`i')
    local n`i': word count `idlevel`i''
    }
    
    *DETERMINE PROPORTIONS FOR GROUP 1
    foreach id in `idlevel1'{
    generate idlevel`id' = regexm( Individual_Id ,  "`id'")
    local fw1: word 1 of `idlevel1'
    local fwl: word `n1' of `idlevel1'
    }
    
    egen idlevel1= rowtotal(idlevel`fw1' - idlevel`fwl')
    bysort Group_Id: replace idlevel1= sum((1/`n1')*idlevel1)
    bysort Group_Id: replace idlevel1=idlevel1[_N]
    list, sepby(Group_Id)
    Code:
    . list, sepby(Group_Id)
    
         +------------------------------------------------------------------------+
         | Indivi~d   Group_Id   Year   idlevelA   idlevelB   idlevelC   idlevel1 |
         |------------------------------------------------------------------------|
      1. |        A          1   2001          1          0          0          1 |
      2. |        B          1   2001          0          1          0          1 |
      3. |        C          1   2001          0          0          1          1 |
         |------------------------------------------------------------------------|
      4. |        A          2   2005          1          0          0   .6666667 |
      5. |        B          2   2005          0          1          0   .6666667 |
      6. |        D          2   2005          0          0          0   .6666667 |
         |------------------------------------------------------------------------|
      7. |        C          3   2005          0          0          1   .3333333 |
      8. |        F          3   2005          0          0          0   .3333333 |
         |------------------------------------------------------------------------|
      9. |        A          6   2009          1          0          0   .6666667 |
     10. |        B          6   2009          0          1          0   .6666667 |
     11. |        D          6   2009          0          0          0   .6666667 |
         |------------------------------------------------------------------------|
     12. |        C          7   2009          0          0          1   .3333333 |
         +------------------------------------------------------------------------+
    
    .
    Group1 is 100% Group1 and two-thirds Groups 2 and 6.

    Comment


    • #3
      Thank you so much Andrew for the response. I am so grateful. Your code works great with the input data.
      However, I just cannot make this code work with my original data. After much trial and error, I think I've nailed down the problem. I think the reason it doesn't work with my original data is because my Individual_Ids are string variables with more than one word - example, "Agrabada Industries Ltd." However, I cannot figure out what the solution is since all the commands you've used seem to work, when used individually, with string variables with more than one word. Any advice will be very much appreciated. Below is my original data in case that helps.

      The other question I have is regarding the code. In the step "*DETERMINE PROPORTIONS FOR GROUP 1", it sounds like I will be able to get the proportions only for Group1. How do I get the proportions for all other groups?

      Thanks
      Dalhia Mani

      Individual_Id Year Group_Id
      3I Infotech Ltd. 2009 40001
      A B C Bearings Ltd. 2005 22067
      A B C Bearings Ltd. 2009 40004
      A B C India Ltd. 2001 2086
      A B C Paper Ltd. 2009 50318
      A B T Industries Ltd. 2005 20631
      A B T Ltd. 2001 668
      A C C Ltd. 2001 6
      A C C Ltd. 2005 20004
      A G K Computer Secure Prints Ltd. 2001 10
      A I A Engineering Ltd. 2005 20008
      A I A Engineering Ltd. 2009 40012
      A I Champdany Inds. Ltd. 2009 40013
      A K Capital Services Ltd. 2005 20009
      A P L Apollo Tubes Ltd. 2009 50231
      A P M Industries Ltd. 2005 20010
      A P M Industries Ltd. 2009 40015
      A T N International Ltd. 2005 30306
      A T N International Ltd. 2009 40026
      A T V Projects India Ltd. 2005 22072
      A V T Natural Products Ltd. 2001 14
      A V T Natural Products Ltd. 2005 20013
      A V T Natural Products Ltd. 2009 40021
      Aarti Drugs Ltd. 2001 16
      Aarti Drugs Ltd. 2005 22063
      Aarti Drugs Ltd. 2009 42009
      Aarti Industries Ltd. 2001 16
      Aarti Industries Ltd. 2005 22063
      Aarti Industries Ltd. 2009 42009
      Aarvee Denims & Exports Ltd. 2009 40022
      Aarya Global Shares & Securities Ltd. 2009 40023
      Aaswa Trading & Exports Ltd. 2001 2055
      Aaswa Trading & Exports Ltd. 2009 40027
      Aavishkar Finance & Trade Pvt. Ltd. 2005 20058
      Abee Info-Consumables Ltd. 2001 19
      Abhinandan Investments Ltd. 2005 20863
      Abhinav Capital Services Ltd. 2009 40030
      Accel Frontline Ltd. 2009 40032
      Accel Ltd. 2005 20018
      Accel Transmatic Ltd. 2001 23
      Accel Transmatic Ltd. 2005 20018
      Accel Transmatic Ltd. 2009 40032
      Accentia Technologies Ltd. 2001 24
      Acclaim Industries Ltd. 2009 42048
      Acknit Industries Ltd. 2009 40035
      Acme Resources Ltd. 2005 30306
      Acrow India Ltd. 2009 40036
      Adam Comsof Ltd. 2001 28
      Adani Enterprises Ltd. 2001 29
      Addi Industries Ltd. 2001 31
      Addi Industries Ltd. 2005 20022
      Addi Industries Ltd. 2009 40040
      Adhunik Synthetics Ltd. 2005 22080
      Adhunik Yarns Ltd. 2005 22080
      Adinath Bio-Labs Ltd. 2009 50295
      Adinath Exim Resources Ltd. 2009 40023
      Adinath Textiles Ltd. 2005 20024
      Adinath Textiles Ltd. 2009 40043
      Aditya Birla Finance Ltd. 2001 188
      Aditya Birla Money Ltd. 2005 22005
      Aditya Birla Nuvo Ltd. 2001 10307
      Aditya Birla Nuvo Ltd. 2005 30381
      Aditya Birla Nuvo Ltd. 2009 50340
      Aditya Electro Communications Ltd. 2001 111
      Aditya Electro Communications Ltd. 2005 20106
      Advance Multitech Ltd. 2009 40046
      Advance Petrochemicals Ltd. 2009 40046
      Advance Powerinfra Tech Ltd. 2005 22098
      Advance Powerinfra Tech Ltd. 2009 40047
      Advanced Micronic Devices Ltd. 2009 40048
      Aegis Logistics Ltd. 2009 40052
      Agro Dutch Inds. Ltd. 2001 44
      Agro Dutch Inds. Ltd. 2009 40055
      Ahlcon Parenterals (India) Ltd. 2001 31
      Ahlcon Parenterals (India) Ltd. 2005 20022
      Ahlcon Parenterals (India) Ltd. 2009 40040
      Ahluwalia Contracts (India) Ltd. 2001 31
      Ahluwalia Contracts (India) Ltd. 2005 20022
      Ahluwalia Contracts (India) Ltd. 2009 40040
      Ahmedabad Advance Mills Ltd. 2001 45
      Ahmedabad Advance Mills Ltd. 2005 20031
      Ahmedabad Advance Mills Ltd. 2009 40056
      Aimco Pesticides Ltd. 2009 40059
      Ajanta Mercantile Ltd. 2005 20471
      Ajanta Mercantile Ltd. 2009 40619
      Ajanta Soya Ltd. 2001 2073
      Ajanta Soya Ltd. 2005 20034
      Ajanta Tubes Ltd. 2001 149
      Ajcon Global Services Ltd. 2001 48
      Ajcon Global Services Ltd. 2009 40061
      Ajel Ltd. 2001 246
      Ajmera Realty & Infra India Ltd. 2009 40062
      Akar Tools Ltd. 2001 2092
      Akash Agro Inds. Ltd. 2005 20862
      Aksh Optifibre Ltd. 2005 20038
      Aksharchem (India) Ltd. 2009 40064
      Akzo Nobel India Ltd. 2005 20114
      Albert David Ltd. 2001 2089
      Albert David Ltd. 2005 20041
      Albert David Ltd. 2009 40065
      Alchemist Corporation Ltd. 2005 20468
      Alchemist Corporation Ltd. 2009 40066
      Alchemist Ltd. 2005 20042
      Alchemist Ltd. 2009 50315
      Alchemist Realty Ltd. 2001 54
      Alembic Ltd. 2001 2029
      Alembic Ltd. 2005 22103
      Alembic Ltd. 2009 40067
      Alfavision Overseas (India) Ltd. 2001 55
      Alfred Herbert (India) Ltd. 2005 20048
      Alkyl Amines Chemicals Ltd. 2001 2004
      Alkyl Amines Chemicals Ltd. 2009 40074
      Allana Frozen Foods Ltd. 2005 20051
      Allanasons Ltd. 2001 58
      Allsec Technologies Ltd. 2009 40004
      Alna Trading & Exports Ltd. 2001 58
      Alna Trading & Exports Ltd. 2005 20051
      Alna Trading & Exports Ltd. 2009 40077
      Alok Industries Ltd. 2009 42003
      Alpa Laboratories Ltd. 2009 40048
      Alpine Capital Services Ltd. 2001 2115
      Alpine Capital Services Ltd. 2005 20053
      Alps Industries Ltd. 2001 62
      Alufluoride Ltd. 2005 20057
      Alumeco India Extrusion Ltd. 2001 63
      Alumeco India Extrusion Ltd. 2005 22087
      Amal Ltd. 2005 20058
      Amal Ltd. 2009 42006
      Amalgamated Electricity Co. Ltd. 2009 40083
      Amani Trading & Exports Ltd. 2001 2055
      Amani Trading & Exports Ltd. 2009 40027
      Ambuja Cement Eastern Ltd. [Merged] 2001 6
      Ambuja Cement Eastern Ltd. [Merged] 2005 20004
      Ameer Trading Corpn. Ltd. 2005 20058
      Amines & Plasticizers Ltd. 2001 67
      Amines & Plasticizers Ltd. 2005 20064
      Amines & Plasticizers Ltd. 2009 40090
      Amit Commercial Co. Ltd. 2005 20571
      Amit Securities Ltd. 2005 30371
      Amit Spinning Inds. Ltd. 2005 20066
      Amit Spinning Inds. Ltd. 2009 40091
      Amluckie Investment Co. Ltd. 2001 2267
      Amluckie Investment Co. Ltd. 2005 30306
      Amradeep Industries Ltd. 2005 22015
      Amradeep Industries Ltd. 2009 42043
      Amrapali Industries Ltd. 2005 22015
      Amraworld Agrico Ltd. 2005 22015
      Amraworld Agrico Ltd. 2009 42043
      Amrit Agro Inds. Ltd. 2005 20067
      Amrit Banaspati Co. Ltd. 2001 74
      Amrit Banaspati Co. Ltd. 2005 20067
      Amrit Banaspati Co. Ltd. 2009 50318
      Amrit Corp. Ltd. 2001 74
      Amrit Corp. Ltd. 2005 20067
      Amrit Corp. Ltd. 2009 50318
      Amulya Leasing & Finance Ltd. 2005 20053
      Ancent Software International Ltd. 2005 30264
      Ancent Software International Ltd. 2009 50232
      Andhra Cements Ltd. 2001 10276
      Andhra Cements Ltd. 2009 42002
      Andhra Petrochemicals Ltd. 2005 20071
      Andhra Petrochemicals Ltd. 2009 40095
      Andhra Sugars Ltd. 2005 20071
      Andhra Sugars Ltd. 2009 40095
      Andrew Yule & Co. Ltd. 2005 22038
      Andrew Yule & Co. Ltd. 2009 40221
      Anik Industries Ltd. 2001 2041
      Anik Industries Ltd. 2005 30299
      Anik Industries Ltd. 2009 40436
      Anjani Dham Inds. Ltd. 2005 20073
      Anjani Portland Cement Ltd. 2005 20074
      Anjani Portland Cement Ltd. 2009 40097
      Anjani Synthetics Ltd. 2005 20073
      Anka India Ltd. 2005 20075
      Ankur Drugs & Pharma Ltd. 2001 79
      Anshuni Commercials Ltd. 2001 2052
      Apcotex Industries Ltd. 2001 2051
      Apex Intertech Ltd. 2001 87
      Apis India Ltd. 2001 368
      Aplab Ltd. 2009 40108
      Apollo Finance Ltd. 2005 20083
      Apollo Hospitals Enterprise Ltd. 2001 88
      Apollo Hospitals Enterprise Ltd. 2005 22005
      Apollo International Ltd. 2005 20083
      Apollo Tyres Ltd. 2005 20083
      Apollo Tyres Ltd. 2009 40110
      Apple Credit Corpn. Ltd. 2005 20084
      Apple Finance Ltd. 2005 20084
      Aqualand (India) Ltd. 2009 40108
      Arch Pharmalabs Ltd. 2009 40174
      Archway Investment Co. Ltd. 2001 10365
      Arco Leasing Ltd. 2005 20086
      Arco Leasing Ltd. 2009 40117
      Arihant Classic Finance Ltd. 2005 21019
      Arihant Enterprise Ltd. 2001 231
      Arihant Enterprise Ltd. 2005 21019
      Arihant Enterprises Ltd. 2005 21019
      Arihant Finlease (India) Ltd. 2005 20868
      Arihant Leasing & Holdings Ltd. 2005 20207
      Arihant Superstructures Ltd. 2005 22091
      Arlem Breweries Ltd. [Merged] 2001 2061
      Aro Granite Inds. Ltd. 2005 20042
      Aro Granite Inds. Ltd. 2009 40128
      Aroni Commercials Ltd. 2005 30262
      Aroni Commercials Ltd. 2009 42041
      Arora Fibres Ltd. 2001 98
      Arora Fibres Ltd. 2005 20093
      Arora Fibres Ltd. 2009 40129
      Art Leather Ltd. 2005 20484
      Artefact Projects Ltd. 2005 20095
      Artheon Finance Ltd. 2009 40133
      Artillegence Bio-Innovations Ltd. 2005 22018
      Artillegence Bio-Innovations Ltd. 2009 42020
      Arunoday Mills Ltd. 2005 20066
      Arvind International Ltd. 2001 2043
      Arvind International Ltd. 2009 42026
      Arvind Ltd. 2005 20058
      Aryan Pesticides Ltd. [Merged] 2001 105
      Asahi India Glass Ltd. 2001 10388
      Asahi Songwon Colors Ltd. 2009 40064
      Ashiana Ispat Ltd. 2001 10232
      Ashiana Ispat Ltd. 2005 20104
      Ashima Ltd. 2005 22013
      Ashirwad Capital Ltd. 2001 2062
      Ashirwad Capital Ltd. 2005 20105
      Ashirwad Capital Ltd. 2009 40142
      Ashirwad Steels & Inds. Ltd. 2005 22033
      Ashirwad Steels & Inds. Ltd. 2009 40143
      Ashish Polyplast Ltd. 2001 111
      Ashish Polyplast Ltd. 2005 20106
      Ashok Alco-Chem Ltd. 2009 40145
      Ashok Leyland Ltd. 2005 20362
      Ashok Leyland Ltd. 2009 40146
      Ashram Online.Com Ltd. 2001 10301
      Ashram Online.Com Ltd. 2005 30271
      Asia H R Technologies Ltd. 2005 20109
      Asia H R Technologies Ltd. 2009 42028
      Asia Pack Ltd. 2005 20110
      Asian Films Production & Distribution Ltd. 2001 311
      Asian Hotels (North) Ltd. 2001 114
      Asian Hotels (North) Ltd. 2005 20111
      Asian Paints Ltd. 2001 117
      Asian Paints Ltd. 2005 20114
      Asian Paints Ltd. 2009 42121
      Asian Star Co. Ltd. 2005 20115
      Asian Tea & Exports Ltd. 2001 2016
      Asian Tea & Exports Ltd. 2009 40152
      Asman Investments Ltd. 2001 2118
      Asman Investments Ltd. 2005 20058
      Assam Petrochemicals Ltd. 2005 20118
      Asutosh Enterprises Ltd. 2009 42102
      Asya Infrastructure & Tourism Corpn. Ltd. 2009 40159
      Atcom Technologies Ltd. 2005 20739
      Athena Financial Services Ltd. 2001 2002
      Atlas Cycles (Haryana) Ltd. 2005 20126
      Atlas Cycles (Haryana) Ltd. 2009 40161
      Atul Ltd. 2005 20058
      Atul Ltd. 2009 42006
      Aunde India Ltd. 2005 22021
      Aunde India Ltd. 2009 40163
      Aurangabad Paper Mills Ltd. 2005 20128
      Auroma Coke Ltd. 2005 20130
      Ausom Enterprise Ltd. 2001 280
      Autolite (India) Ltd. 2001 695
      Autolite (India) Ltd. 2005 20132
      Autolite (India) Ltd. 2009 50249
      Automobile Products Of India Ltd. 2009 40169
      Automotive Axles Ltd. 2005 30297
      Automotive Axles Ltd. 2009 42118
      Automotive Stampings & Assemblies Ltd. 2009 40170
      Autopal Industries Ltd. 2009 50249
      Autoriders Finance Ltd. 2005 20133
      Autoriders Finance Ltd. 2009 42030
      Autoriders International Ltd. 2005 20133
      Autoriders International Ltd. 2009 42030
      Avanti Feeds Ltd. 2001 126
      Avanti Feeds Ltd. 2009 40172
      Aviva Industries Ltd. 2001 10231
      Aviva Industries Ltd. 2005 20134
      Aviva Industries Ltd. 2009 42036
      Avon Mercantile Ltd. 2005 30306
      Avon Mercantile Ltd. 2009 40173
      Avon Organics Ltd. 2009 40174
      Awdhan Trading Co. Ltd. 2005 20480
      Axis Bank Ltd. 2009 42034
      Azimuth Investments Ltd. 2005 20376
      B & A Ltd. 2009 40178
      B & A Packaging India Ltd. 2009 40178
      B & B Realty Ltd. 2005 22100
      B 2 B Software Technologies Ltd. 2001 2113
      B 2 B Software Technologies Ltd. 2005 22115
      B 2 B Software Technologies Ltd. 2009 40179
      B D H Industries Ltd. 2001 132
      B F Utilities Ltd. 2001 10335
      B F Utilities Ltd. 2009 42117
      B G F L Corporate Finance Pvt. Ltd. 2001 320
      B I T S Ltd. 2009 40184
      B K V Industries Ltd. 2001 133
      B N K Capital Markets Ltd. 2001 10311
      B N R Udyog Ltd. 2001 1183
      B N Rathi Securities Ltd. 2001 136
      B P Capital Ltd. 2005 20934
      B P L Engineering Ltd. 2001 138
      B P L Ltd. 2001 138
      B S Appliances Ltd. 2001 138
      B S E L Infrastructure Realty Ltd. 2001 24
      B S E L Infrastructure Realty Ltd. 2009 40159
      B S L Ltd. 2001 2122
      B S L Ltd. 2005 30356
      B S L Ltd. 2009 42022
      B S Refrigerators Ltd. 2001 138
      B S T Ltd. 2001 138
      Bachhraj & Co. Pvt. Ltd. 2001 10319
      Bacil Pharma Ltd. 2005 30265
      Bacil Pharma Ltd. 2009 40192
      Badra Estates & Inds. Ltd. 2001 464
      Badra Estates & Inds. Ltd. 2005 20456
      Badra Estates & Inds. Ltd. 2009 40875
      Bagalkot Udyog Ltd. 2001 143
      Bahubali Services Ltd. 2001 526

      Comment


      • #4
        I think the reason it doesn't work with my original data is because my Individual_Ids are string variables with more than one word - example, "Agrabada Industries Ltd."
        Usually when faced with string variables with spaces, e.g., company names, you will want to create a numerical identifier whose values have a one-to-one correspondence with elements in the string variable. The command encode allows you to do this.


        The other question I have is regarding the code. In the step "*DETERMINE PROPORTIONS FOR GROUP 1", it sounds like I will be able to get the proportions only for Group1. How do I get the proportions for all other groups?
        The premise is that you have to start somewhere. I just happened to start with Group 1. You can perhaps identify the largest group and start there. You may do all groups at once but I think the way to go in your case is to start at one group and eliminate what you call "same" groups and then proceed with the groups that are left. Here are the adjustments to #2

        Code:
        input str5 Individual_Id float(Group_Id Year)
        "3I Infotech Ltd." 1 2001
        "A B C Bearings Ltd." 1 2001
        "Aarti Drugs Ltd." 1 2001
        "3I Infotech Ltd." 2 2005
        "A B C Bearings Ltd." 2 2005
        "Aditya Electro Communications Ltd" 2 2005
        "Aarti Drugs Ltd." 3 2005
        "Badra Estates & Inds. Ltd." 3 2005
        "3I Infotech Ltd." 6 2009
        "A B C Bearings Ltd." 6 2009
        "Aditya Electro Communications Ltd" 6 2009
        "Aarti Drugs Ltd." 7 2009
        end
        
        encode Individual_Id, gen(iid)
        *DETERMINE TOTAL NUMBER OF GROUPS
        qui sum Group_Id
        local f= r(min)
        local l= r(max)
        
        *GET MEMBERS OF EACH GROUP
        
        forvalues i= `f'/`l'{
        levelsof iid if Group_Id ==`i', local(iid`i')
        local n`i': word count `iid`i''
        }
        
        *DETERMINE PROPORTIONS FOR GROUP 1
        foreach id in `iid1'{
        generate iid`id' = inlist( iid,  `id')
        local fw1: word 1 of `iid1'
        local fwl: word `n1' of `iid1'
        }
        
        egen pid1= rowtotal(iid`fw1' - iid`fwl')
        bysort Group_Id: replace pid1= sum((1/`n1')*pid1)
        bysort Group_Id: replace pid1=pid1[_N]
        drop iid`fw1'- iid`fwl'
        list, sepby(Group_Id)
        Code:
        . list, sepby(Group_Id)
        
             +-----------------------------------------------+
             | Indivi~d   Group_Id   Year     iid       pid1 |
             |-----------------------------------------------|
          1. |    3I In          1   2001   3I In          1 |
          2. |    A B C          1   2001   A B C          1 |
          3. |    Aarti          1   2001   Aarti          1 |
             |-----------------------------------------------|
          4. |    3I In          2   2005   3I In   .6666667 |
          5. |    A B C          2   2005   A B C   .6666667 |
          6. |    Adity          2   2005   Adity   .6666667 |
             |-----------------------------------------------|
          7. |    Aarti          3   2005   Aarti   .3333333 |
          8. |    Badra          3   2005   Badra   .3333333 |
             |-----------------------------------------------|
          9. |    3I In          6   2009   3I In   .6666667 |
         10. |    A B C          6   2009   A B C   .6666667 |
         11. |    Adity          6   2009   Adity   .6666667 |
             |-----------------------------------------------|
         12. |    Aarti          7   2009   Aarti   .3333333 |
             +-----------------------------------------------+
        You have already defined elements in each group. For the data example, we can view these using the following

        Code:
        . forvalues i= `f'/`l'{
          2. di "iid`i'=" "`iid`i''"
          3. }
        iid1=1 2 3
        iid2=1 2 4
        iid3=3 5
        iid4=
        iid5=
        iid6=1 2 4
        iid7=3
        So we checked proportions for iid1 i.e. firms assigned values 1, 2, and 3 by encode. To check for iid2, i.e., values 1, 2, and 4, just simply change the last bit of code to


        Code:
        *DETERMINE PROPORTIONS FOR GROUP 2
        foreach id in `iid2'{
        generate iid`id' = inlist( iid,  `id')
        local fw1: word 1 of `iid2'
        local fwl: word `n2' of `iid2'
        }
        
        egen pid2= rowtotal(iid`fw1' - iid`fwl')
        bysort Group_Id: replace pid2= sum((1/`n2')*pid2)
        bysort Group_Id: replace pid2=pid2[_N]
        drop iid`fw1'- iid`fwl'
        list, sepby(Group_Id)

        We could loop through all groups but as I said before, I don't think that this is the best approach for what you want to achieve.
        Last edited by Andrew Musau; 17 May 2018, 08:28.

        Comment


        • #5
          Dear Andrew, Thank you!! The encode command does the trick, but the code still doesn't fully work. Here are the two issues:


          1) The code works for most groups but not all. After much trial and error, I think the problem is in cases when the iid values are on the higher side, but I am not sure how to fix this issue. See one example below where the code doesn't seem to recognize iid1134. I don't receive an error message, and the code runs through but doesn't seem to take into account some of the iids. Here AVT Natural Products Ltd. has iid of 15, Midland Rubber & Produce Co. Ltd. has iid of 1046 and Neelamalai Agro Inds Ltd. has iid of 1134.

          Year Group_Id iid iid1134 pid14
          2001 14 A V T Natural Products Ltd. 0 .6666667
          2001 14 Midland Rubber & Produce Co. Ltd. 0 .6666667
          2001 14 Neelamalai Agro Inds. Ltd. 1 .6666667
          2005 20013 A V T Natural Products Ltd. 0 .6666667
          2005 20013 Midland Rubber & Produce Co. Ltd. 0 .6666667
          2005 20013 Neelamalai Agro Inds. Ltd. 1 .6666667
          2009 40021 A V T Natural Products Ltd. 0 .6666667
          2009 40021 Midland Rubber & Produce Co. Ltd. 0 .6666667
          2009 40021 Neelamalai Agro Inds. Ltd. 1 .6666667


          Another example (Here Accel Transmatic has iid 25 and Toyama Electric has iid 1737:
          Individual_Id Year Group_Id iid pid23
          Accel Transmatic Ltd. 2001 23 Accel Transmatic Ltd. 1
          Toyama Electric Ltd. 2001 23 Toyama Electric Ltd. 1
          Accel Transmatic Ltd. 2005 20018 Accel Transmatic Ltd. .5
          Toyama Electric Ltd. 2005 21108 Toyama Electric Ltd. .5
          Accel Transmatic Ltd. 2009 40032 Accel Transmatic Ltd. .5


          2) I have 2605 groups, so I would like to explore the option looping through the groups. I thought I would run the loop only on the groups in 2001 - the group_ids are arranged by Year, and the first 1201 groups are the ones in 2001. And then once I have the pid values for each of these, I can then check if there are cases of conflict (i.e. if an iid has .67 in pid55 and .9 in pid67, then I place this iid in Group 67). I created the loop below, but I get the following error message (invalid syntax r(198);, and I am not sure if it has something to do with the issue mentioned above or some other issue.

          forvalues j= 1/1201{
          foreach id in `iid`j''{
          generate iid`id' = inlist( iid, `id')
          local fw1: word `j' of `iid`j''
          local fwl: word `n`i'' of `iid`j''
          }

          egen pid`j'= rowtotal(iid`fw`j'' - iid`fwl')
          bysort Group_Id: replace pid`j'= sum((`j'/`n`i'')*pid`j')
          bysort Group_Id: replace pid`j'=pid`j'[_N]
          /*drop iid`fw1'- iid`fwl'*/
          list, sepby(Group_Id)

          }



          Thank you again! I am very grateful to this statalist and to you in particular.




          Comment


          • #6
            I am away this week so if no one else gives you some suggestions, I will get back to you.

            Comment


            • #7
              Okay Thanks Andrew.

              Comment


              • #8
                1) The code works for most groups but not all. After much trial and error, I think the problem is in cases when the iid values are on the higher side, but I am not sure how to fix this issue. See one example below where the code doesn't seem to recognize iid1134. I don't receive an error message, and the code runs through but doesn't seem to take into account some of the iids. Here AVT Natural Products Ltd. has iid of 15, Midland Rubber & Produce Co. Ltd. has iid of 1046 and Neelamalai Agro Inds Ltd. has iid of 1134.

                You are absolutely right that there is an anomaly in the code. The weighting factor that we use for aggregation should be based on the size of the largest group so that comparisons are consistent. This was not apparent in my codes above because it just happened that my target groups had at least as many members as the other groups. I now correct for this by introducing a variable that counts the number of members per group.


                2) I have 2605 groups, so I would like to explore the option looping through the groups. I thought I would run the loop only on the groups in 2001 - the group_ids are arranged by Year, and the first 1201 groups are the ones in 2001. And then once I have the pid values for each of these, I can then check if there are cases of conflict (i.e. if an iid has .67 in pid55 and .9 in pid67, then I place this iid in Group 67). I created the loop below, but I get the following error message (invalid syntax r(198);, and I am not sure if it has something to do with the issue mentioned above or some other issue.
                Now I provide a code that loops across all groups. This will create all indicators at once and then determine the proportions thereafter. With a large data set, you may be hit with variable limits because the defaults are set at conservative levels for purposes of efficient memory management. Therefore, I suggest that you temporarily increase the maximum number of variables to the allowed limit before you open your data set. In Stata/SE, this is set at 32767 but see help maxvar for others.

                Here is the code for the data example. The main changes are highlighted. I think that this should do it but do report if you run into problems.

                Code:
                set maxvar 32767
                input str5 Individual_Id float(Group_Id Year)
                "3I Infotech Ltd." 1 2001
                "A B C Bearings Ltd." 1 2001
                "Aarti Drugs Ltd." 1 2001
                "3I Infotech Ltd." 2 2005
                "A B C Bearings Ltd." 2 2005
                "Aditya Electro Communications Ltd" 2 2005
                "Aarti Drugs Ltd." 3 2005
                "Badra Estates & Inds. Ltd." 3 2005
                "3I Infotech Ltd." 6 2009
                "A B C Bearings Ltd." 6 2009
                "Aditya Electro Communications Ltd" 6 2009
                "Aarti Drugs Ltd." 7 2009
                end
                
                encode Individual_Id, gen(iid)
                *DETERMINE TOTAL NUMBER OF GROUPS
                qui sum Group_Id
                local f= r(min)
                local l= r(max)
                
                *GEN COUNT OF ID BY GROUP
                bys Group_Id (Year): egen total= count(iid)
                
                *GET MEMBERS OF EACH GROUP
                
                forvalues i= `f'/`l'{
                levelsof iid if Group_Id ==`i', local(iid`i')
                local n`i': word count `iid`i''
                }
                
                *ID FIRST AND LAST DUMMIES TO DROP LATER
                local idf: word 1 of `iid`f''
                local idl: word `n`l'' of `iid`l''
                
                forvalues i= `f'/`l'{
                foreach id in `iid`i''{
                generate ii`i'd`id' = inlist( iid,  `id')
                }
                }
                
                *LOOP ACROSS GROUPS
                levelsof Group_Id, local(gid)
                foreach i in `gid'{
                egen pid`i'= rowtotal(ii`i'd*)
                *SPECIFY WEIGHTING CONDITION
                bysort Group_Id: replace pid`i'= cond(`n`i'' > total, sum((1/`n`i'')*pid`i'), sum((1/total)*pid`i'))
                bysort Group_Id: replace pid`i'=pid`i'[_N]
                }
                drop ii`f'd`idf'- ii`l'd`idl'
                list, clean
                Code:
                . list, clean
                
                       Indivi~d   Group_Id   Year     iid   total       pid1       pid2       pid3       pid6       pid7  
                  1.      3I In          1   2001   3I In       3          1   .6666667   .3333333   .6666667   .3333333  
                  2.      A B C          1   2001   A B C       3          1   .6666667   .3333333   .6666667   .3333333  
                  3.      Aarti          1   2001   Aarti       3          1   .6666667   .3333333   .6666667   .3333333  
                  4.      3I In          2   2005   3I In       3   .6666667          1          0          1          0  
                  5.      A B C          2   2005   A B C       3   .6666667          1          0          1          0  
                  6.      Adity          2   2005   Adity       3   .6666667          1          0          1          0  
                  7.      Aarti          3   2005   Aarti       2   .3333333          0          1          0         .5  
                  8.      Badra          3   2005   Badra       2   .3333333          0          1          0         .5  
                  9.      3I In          6   2009   3I In       3   .6666667          1          0          1          0  
                 10.      A B C          6   2009   A B C       3   .6666667          1          0          1          0  
                 11.      Adity          6   2009   Adity       3   .6666667          1          0          1          0  
                 12.      Aarti          7   2009   Aarti       1   .3333333          0         .5          0          1

                Comment


                • #9
                  Andrew, This works like a charm!! Thank you!!

                  I added some more code (which I found using past histories of statalist conversations: https://www.statalist.org/forums/for...value-in-a-row) to identify the Group_Id for the maximum match. So I am all set.

                  Thanks very much to this group and to you in particular.
                  Best

                  Comment

                  Working...
                  X