Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Error with summing/aggregating data

    Dear StataList.
    I've run into an awkward situation with my research and would appreciate feedback.


    I'm trying to sum/collapse my data using the following command:

    Code:
    collapse (sum) Industrial_FDI, by(Source Destination Mdate)
    On the following dataset:

    Code:
    . dataex
    
    ----------------------- copy starting from the next line -----------------------
    
    
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int Year str24 Source str32 Destination double Industrial_FDI float Mdate
    22280 "Switzerland"       "China"       1.47251911207098 731
    22280 "Japan"             "China"       1.47251911207098 731
    22280 "Switzerland"       "China"        .92032444504436 731
    22280 "Netherlands"       "Singapore"   174.739225454947 731
    22280 "USA"               "Vietnam"                  475 731
    22279 "Switzerland"       "China"       1.47039530321274 731
    22279 "Switzerland"       "China"       1.47039530321274 731
    22274 "France"            "China"       1.46200452935144 731
    22261 "Singapore"         "China"       1.45350408845809 731
    22260 "Germany"           "China"       72.8466128209823 731
    22257 "Switzerland"       "China"       1.45330910722439 731
    22257 "Switzerland"       "China"       1.45330910722439 731
    22252 "China"             "China"       13.3241748640299 731
    22251 "USA"               "Japan"       224.114838749136 731
    22250 "Japan"             "China"       13.1254967184639 731
    22246 "Japan"             "China"       1.42929172796481 730
    22243 "Japan"             "China"        71.037723977011 730
    22242 "Italy"             "China"       4.74306127060842 730
    22239 "Japan"             "China"       169.036413203002 730
    22231 "Switzerland"       "China"       .883324734312303 730
    22231 "USA"               "China"       52.7821794405179 730
    22229 "France"            "China"       219.378654424014 730
    22224 "Germany"           "Malaysia"    14.0660346472531 730
    22208 "USA"               "Thailand"    167.588638886432 729
    22206 "France"            "China"       14.6463989964518 729
    22206 "France"            "China"       23.4342383943229 729
    22206 "France"            "China"       70.3027151829686 729
    22206 "France"            "China"       46.8684767886458 729
    22202 "USA"               "China"       2.34892824286582 729
    22202 "USA"               "China"       5.87232060716454 729
    22202 "USA"               "China"       2.05531221250759 729
    22202 "USA"               "China"       14.2586920658836 729
    22201 "Switzerland"       "South Korea" 70.8870338156458 729
    22195 "Luxembourg"        "Australia"   4.69359059236302 729
    22194 "Norway"            "South Korea" 4.71514820495173 729
    22187 "France"            "Japan"       4.66782669589684 728
    22175 "Ireland"           "Singapore"                 50 728
    22173 "Austria"           "Singapore"   4.74752519872969 728
    22173 "Austria"           "Singapore"   47.4752519872969 728
    22173 "Austria"           "Singapore"   23.7376259936485 728
    22168 "Germany"           "Singapore"   168.158395321009 728
    22166 "Switzerland"       "Japan"       70.8985842377751 728
    22165 "USA"               "China"       168.625295555008 728
    22161 "Switzerland"       "China"       1.42248928655661 728
    22161 "Japan"             "China"       1.42248928655661 728
    22158 "Republic of Korea" "China"       1.43279989397281 727
    22158 "Switzerland"       "Malaysia"    170.025870396086 727
    22154 "Japan"             "Thailand"    11.8376334223717 727
    22148 "Republic of Korea" "China"       1.42387858174219 727
    22148 "Sweden"            "China"       220.271146293609 727
    22146 "Japan"             "Vietnam"     47.7593983106899 727
    22140 "USA"               "Vietnam"     14.7367186384904 727
    22137 "Taiwan, China"     "Singapore"   4.71078145259312 727
    22131 "France"            "Japan"       2.05815514449713 727
    22127 "Republic of Korea" "Vietnam"     4.73920004435891 726
    22124 "France"            "Japan"        1.4116033179393 726
    22123 "Switzerland"       "China"       1.39827383910223 726
    22123 "Republic of Korea" "China"       1.39827383910223 726
    22119 "Switzerland"       "China"       1.38844199585252 726
    22108 "France"            "Malaysia"    4.51854809448261 726
    22104 "United Kingdom"    "China"        4.5098558532441 726
    22102 "USA"               "Australia"   4.49807171944178 726
    22100 "Italy"             "China"       160.106778603857 726
    22100 "Italy"             "China"        67.460609045415 726
    22097 "USA"               "Singapore"   39.3226429292268 726
    22097 "India"             "Australia"   4.49401633476878 726
    22096 "Japan"             "Thailand"    159.459715772488 725
    22092 "Spain"             "Hong Kong"   67.3299900732994 725
    22089 "USA"               "Malaysia"     56.372859323183 725
    22087 "USA"               "Vietnam"     12.2963039270761 725
    22085 "Switzerland"       "Thailand"    .840660912894789 725
    22085 "Switzerland"       "Thailand"    1.34505746063166 725
    22084 "Sweden"            "Australia"   14.0565231354385 725
    22084 "Sweden"            "Australia"   4.49808740334031 725
    22084 "Sweden"            "Australia"   44.9808740334031 725
    22081 "Japan"             "Hong Kong"   22.4969863225535 725
    22077 "Switzerland"       "China"        4.5508071753774 725
    22070 "Sweden"            "Singapore"   4.49304503867099 725
    22069 "Switzerland"       "China"       159.111387540069 725
    22069 "Canada"            "Singapore"   13.5653338617733 725
    22068 "Germany"           "Malaysia"    11.1318554994703 725
    22067 "Japan"             "Australia"   6.65908449543908 725
    22067 "Switzerland"       "Japan"       .833456864420312 725
    22061 "France"            "Japan"       1.30743468534779 724
    22057 "United Kingdom"    "China"       21.9025158442112 724
    22043 "Singapore"         "Hong Kong"   1.30029837047334 724
    22042 "Luxembourg"        "Australia"   64.7895823469963 724
    22037 "Switzerland"       "China"       1.33080210809541 724
    22025 "United Kingdom"    "Australia"   4.34713583030131 723
    22025 "United Kingdom"    "Australia"   43.4713583030131 723
    22009 "USA"               "South Korea" 2.16100141027503 723
    22009 "USA"               "South Korea" 5.40250352568758 723
    22008 "Finland"           "China"       154.548600913529 723
    22008 "Finland"           "China"       21.7062280470122 723
    22004 "Japan"             "Indonesia"   22.2249731898926 722
    22004 "Japan"             "Indonesia"   44.4499463797852 722
    22004 "Germany"           "South Korea" 4.44499463797852 722
    22001 "Italy"             "China"       157.193985846178 722
    21997 "Israel"            "Singapore"   4.26362991706475 722
    21991 "USA"               "Australia"   44.6736730924153 722
    end
    format %tdnn/dd/CCYY Year
    format %tm Mdate
    ------------------ copy up to and including the previous line ------------------ Listed 100 out of 4847 observations Use the count() option to list more
    For some reason my result gives me the following where certain observations are a bit weird like Destination = Source which shouldn't be the case.
    Code:
    ----------------------- copy starting from the next line -----------------------
    
    
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str24 Source str32 Destination float Mdate double Industrial_FDI
    "Australia" "Australia"   563   101.458437994723
    "Australia" "Australia"   566   2.33208955223881
    "Australia" "Australia"   581   15.4619197076725
    "Australia" "Australia"   587    170.23691303731
    "Australia" "Australia"   593  19.78759430908783
    "Australia" "Australia"   596   74.2605824986821
    "Australia" "Australia"   605   4.90839704023658
    "Australia" "Australia"   608   104.960633704013
    "Australia" "Australia"   614   13.9347606063775
    "Australia" "Australia"   617   5.78119979840956
    "Australia" "Australia"   629   15.2241926410949
    "Australia" "Australia"   632   99.4388069644084
    "Australia" "Australia"   641     2.288999981688
    "Australia" "Australia"   647   5.51639984300326
    "Australia" "Australia"   664   .843091231433301
    "Australia" "Australia"   668   39.2105021408934
    "Australia" "Australia"   677   1.92692699075277
    "Australia" "Australia"   694   7.86361687264259
    "Australia" "Australia"   704   11.3541805578708
    "Australia" "Australia"   707    99.287435221592
    "Australia" "China"       684   13.0069508609361
    "Australia" "China"       691   168.388272675028
    "Australia" "China"       694   23.5422195353382
    "Australia" "Hong Kong"   527   75.7795824545007
    "Australia" "Japan"       530     48.89616898516
    "Australia" "Japan"       533   4.86198052776799
    "Australia" "Japan"       575   209.627289482439
    "Australia" "Japan"       599   2.52105077396259
    "Australia" "Japan"       689  31.46587462561203
    "Australia" "Malaysia"    719   1.93879521160795
    "Australia" "Thailand"    548   171.477712751225
    "Australia" "Thailand"    608    13.648001659597
    "Australia" "Thailand"    629   15.2850073752218
    "Australia" "Thailand"    632   15.6977881424658
    "Australia" "Thailand"    648   16.4092260752282
    "Australia" "Thailand"    654 32.831835630514206
    "Australia" "Thailand"    655   13.1880015561842
    "Australia" "Thailand"    691   168.388272675028
    "Australia" "Thailand"    697   173.927564908218
    "Australia" "Vietnam"     692   4.72239964015315
    "Austria"   "Australia"   696   98.9853658768075
    "Austria"   "Australia"   704  13.62732315842048
    "Austria"   "Australia"   708   1.99153102070734
    "Austria"   "Australia"   709  71.34996415377798
    "Austria"   "China"       530   174.070651296971
    "Austria"   "China"       536   151.448993013762
    "Austria"   "China"       539     13.62100904435
    "Austria"   "China"       542   184.606916265865
    "Austria"   "China"       560   5.06399625264277
    "Austria"   "China"       572   159.126625623167
    "Austria"   "China"       608    5.4592006638388
    "Austria"   "China"       611    5.3475935828877
    "Austria"   "China"       617   205.811055359469
    "Austria"   "China"       629   233.717707280495
    "Austria"   "China"       632    184.12350761939
    "Austria"   "China"       635   164.307827617231
    "Austria"   "China"       642    185.30757519347
    "Austria"   "China"       645   194.248159249396
    "Austria"   "China"       650   43.9310120403514
    "Austria"   "China"       653   194.490143026502
    "Austria"   "China"       665   4.38602105912134
    "Austria"   "China"       666   158.503256568763
    "Austria"   "China"       675  703.2588937655989
    "Austria"   "China"       697  39.70584188038487
    "Austria"   "China"       712  74.23658509929537
    "Austria"   "China"       713   164.194191650498
    "Austria"   "China"       715   4.43164483930187
    "Austria"   "Indonesia"   639   5.12369917635947
    "Austria"   "Malaysia"    660  77.42466321410795
    "Austria"   "Myanmar"     637   5.43506081144018
    "Austria"   "Philippines" 666   11.1308281438298
    "Austria"   "Singapore"   640  15.57551365648006
    "Austria"   "Singapore"   660    9.6780829017635
    "Austria"   "Singapore"   678   6.64654182128338
    "Austria"   "Singapore"   728  75.96040317967508
    "Austria"   "South Korea" 521   212.132991658096
    "Austria"   "South Korea" 647   256.012780670976
    "Austria"   "South Korea" 660   82.2637046649897
    "Austria"   "Vietnam"     669   4.46906589250447
    "Austria"   "Vietnam"     680   4.46438246233659
    "Austria"   "Vietnam"     695    7.0817637257623
    "Bahamas"   "China"       640   187.495531213074
    "Bahamas"   "China"       651   196.148602997146
    "Bahamas"   "Indonesia"   718   66.1299832911311
    "Bahamas"   "Malaysia"    717   87.7791343940589
    "Belarus"   "China"       663   152.923585054526
    "Belgium"   "China"       524   142.214169191331
    "Belgium"   "China"       659   4.97904296914879
    "Belgium"   "China"       664     208.1742517214
    "Belgium"   "China"       704   11.5760034079754
    "Belgium"   "Hong Kong"   659   4.97904296914879
    "Belgium"   "Japan"       667   4.39319118768444
    "Belgium"   "Japan"       702   4.67042817386196
    "Belgium"   "Malaysia"    518   65.3701585226344
    "Belgium"   "Malaysia"    661   4.51418865922954
    "Belgium"   "Malaysia"    670                114
    "Belgium"   "Malaysia"    691   17.7374992018125
    "Belgium"   "Singapore"   584   5.72120019337657
    "Belgium"   "Thailand"    611   190.374648395722
    "Bermuda"   "China"       649   192.097178997362
    end
    format %tm Mdate
    ------------------ copy up to and including the previous line ------------------ Listed 100 out of 2885 observations Use the count() option to list more
    Last edited by Nour Mohamed; 15 Jan 2025, 12:35.

  • #2
    I am not quite clear on what the question is, but

    Code:
    list if Source == Destination
    and even

    Code:
    drop if Source == Destination
    might be helpful.

    Comment


    • #3
      You say destination should not equal source, but there is a counter example in your example data. If I load the first data example, then run the following:

      Code:
      count if Destination == Source
      list if Destination == Source, clean noobs
      I see there is a place where source and destination are equal.

      Code:
      . count if Destination == Source
        1
      
      . list if Destination == Source, clean noobs
      
               Year   Source   Dxestin~n   Industr~I     Mdate  
          12/3/2020    China      China   13.324175   2020m12

      Comment


      • #4
        Nick Cox and Daniel Schaefer give good advice.

        I can only add a caution. Resist the temptation to do a quick fix by running -drop if Source == Destination-. Clearly it was your expectation that there should be no observations where Source == Destination. The fact that this expectation has proved wrong says that either you do not understand your data, or there was an error in the data management that created your data. Either way, you should resolve whatever caused that misunderstanding or data management problem before proceeding. If there was a problem in the data management, you should review it, beginning to end, because where one mistake shows up, others may lurk as yet unnoticed. Better to find and fix them now.

        Comment


        • #5
          Originally posted by Clyde Schechter View Post
          Nick Cox and Daniel Schaefer give good advice.

          I can only add a caution. Resist the temptation to do a quick fix by running -drop if Source == Destination-. Clearly it was your expectation that there should be no observations where Source == Destination. The fact that this expectation has proved wrong says that either you do not understand your data, or there was an error in the data management that created your data. Either way, you should resolve whatever caused that misunderstanding or data management problem before proceeding. If there was a problem in the data management, you should review it, beginning to end, because where one mistake shows up, others may lurk as yet unnoticed. Better to find and fix them now.
          Thank you
          I'll retrace my steps before dropping any Source == Destination

          Comment


          • #6
            Originally posted by Daniel Schaefer View Post
            You say destination should not equal source, but there is a counter example in your example data. If I load the first data example, then run the following:

            Code:
            count if Destination == Source
            list if Destination == Source, clean noobs
            I see there is a place where source and destination are equal.

            Code:
            . count if Destination == Source
            1
            
            . list if Destination == Source, clean noobs
            
            Year Source Dxestin~n Industr~I Mdate
            12/3/2020 China China 13.324175 2020m12
            I ran the code as suggested and have around 300 observations where Destination == Source.
            The data is supposed to be capital expenditure in the Industrial sector(which I'm proxying for industrial data since I don't have access to FDi Markets) from Orbis so I'm a bit confused.

            Comment

            Working...
            X