Good day Everyone,
Please, I have two cross-sectional datasets, df15 and df16. Both contain the same variables (Banks, Country, Year, etc.) but slightly different observations (banks): almost all banks appear in both datasets, and only a few appear in one but not the other. df15 has 1552 observations and df16 has 1634. The Year variable equals 2015 for every observation in df15 and 2016 for every observation in df16. The problem is that whenever I append the two datasets, the total number of observations rises to 3186 (1552 + 1634), which means Stata simply stacked one dataset on top of the other. Looking at the Data Editor in Stata after appending, I saw that the df16 observations start at observation 1553, immediately after the last observation (1552) of df15. In view of this, how can I accurately append these two datasets without just doubling the observations?
Please find below extracts from the two Stata datasets.
The first dataset, df15, is as follows:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str119 Banks str25 Country double(Interest_Revenue Interest_Expense) float Year
"BNP PARIBAS" "France" 22553000 22553000 2015
"CREDIT AGRICOLE SA" "France" 11558000 11558000 2015
"BANCO SANTANDER SA" "Spain" 32812000 32812000 2015
"SOCIETE GENERALE" "France" 9306000 9306000 2015
"DEUTSCHE BANK AG" "Germany" 15881000 15881000 2015
"CREDIT MUTUEL (COMBINED - IFRS)" "France" 7075000 7075000 2015
"INTESA SANPAOLO" "Italy" 9132000 9132000 2015
"ING BANK NV" "Netherlands" 12744000 12744000 2015
"BPCE SA" "France" 2841000 2841000 2015
"UNICREDIT SPA" "Italy" 10664004 10664004 2015
"LA BANQUE POSTALE" "France" 3124903 3124903 2015
"CAIXABANK, S.A." "Spain" 4352650 4352650 2015
"BANQUE FEDERATIVE DU CREDIT MUTUEL" "France" 3830000 3830000 2015
"BANCO BILBAO VIZCAYA ARGENTARIA SA" "Spain" 16022000 16022000 2015
"COOPERATIEVE RABOBANK U.A." "Netherlands" 9139000 9139000 2015
"DZ BANK AG DEUTSCHE ZENTRAL-GENOSSENSCHAFTSBANK, FRANKFURT AM MAIN" "Germany" 2755000 2755000 2015
"CREDIT AGRICOLE CORPORATE AND INVESTMENT BANK SA" "France" 1898000 1898000 2015
"NATIXIS SA" "France" 2370000 2370000 2015
"COMMERZBANK AG" "Germany" 5727000 5727000 2015
"ABN AMRO BANK NV" "Netherlands" 6077000 6077000 2015
end
The second dataset, df16, is as follows:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str119 Banks str25 Country double(Interest_Revenue Interest_Expense) float Year
"BNP PARIBAS" "France" 22376000 22376000 2016
"DEUTSCHER SPARKASSEN-UND GIROVERBAND EV (COMBINED)" "Germany" 31166000 31166000 2016
"CREDIT AGRICOLE SA" "France" 11739000 11739000 2016
"BANCO SANTANDER SA" "Spain" 31089000 31089000 2016
"SOCIETE GENERALE" "France" 9467000 9467000 2016
"DEUTSCHE BANK AG" "Germany" 14707000 14707000 2016
"CREDIT MUTUEL (COMBINED - IFRS)" "France" 6899000 6899000 2016
"INTESA SANPAOLO" "Italy" 8598000 8598000 2016
"ING BANK NV" "Netherlands" 13317000 13317000 2016
"BPCE SA" "France" 2996000 2996000 2016
"UNICREDIT SPA" "Italy" 10307011 10307011 2016
"LA BANQUE POSTALE" "France" 2827871 2827871 2016
"CAIXABANK, S.A." "Spain" 4156856 4156856 2016
"BANQUE FEDERATIVE DU CREDIT MUTUEL" "France" 3981000 3981000 2016
"BANCO BILBAO VIZCAYA ARGENTARIA SA" "Spain" 17060000 17060000 2016
"COOPERATIEVE RABOBANK U.A." "Netherlands" 8743000 8743000 2016
"DZ BANK AG DEUTSCHE ZENTRAL-GENOSSENSCHAFTSBANK, FRANKFURT AM MAIN" "Germany" 2547000 2547000 2016
"CREDIT AGRICOLE CORPORATE AND INVESTMENT BANK SA" "France" 2833000 2833000 2016
"NATIXIS SA" "France" 2654000 2654000 2016
"COMMERZBANK AG" "Germany" 4164000 4164000 2016
end
The code I used is as follows:
Code:
use df15.dta, clear
append using df16
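For reference, here is a minimal sketch of the two layouts the combined data could take. It assumes that df15.dta and df16.dta are in the working directory and that Banks uniquely identifies an observation within each file. The year-suffixed variable names and the tempfile name wide16 are illustrative and not part of my data. Any other shared variables (the "etc." above) would need the same renaming.
Code:
* Sketch only: assumes Banks is unique within each file.

* Long (panel) layout: one row per bank-year.
* -append- stacks the files, so 1552 + 1634 = 3186 rows is the expected count.
use df15, clear
append using df16
isid Banks Year        // stops with an error if any bank-year pair is repeated
count

* Wide layout: one row per bank, with year-specific columns.
use df16, clear
rename (Interest_Revenue Interest_Expense) (Interest_Revenue2016 Interest_Expense2016)
drop Year
tempfile wide16
save `wide16'

use df15, clear
rename (Interest_Revenue Interest_Expense) (Interest_Revenue2015 Interest_Expense2015)
drop Year
merge 1:1 Banks using `wide16'
tab _merge             // 1 = only in df15, 2 = only in df16, 3 = in both files
In the long layout the row count is the sum of the two files by construction. In the wide layout it is the number of distinct banks across both files, and _merge shows which banks are missing from one year.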