Hello,
I am new to Stata, so apologies for the convoluted question - I am not yet familiar with all of the language around the programme.
I am working with a dataset with 5 variables and around 200 million observations. I unfortunately cannot share an example of the data as it is confidential, but here is how it looks when I browse:
Each country is encoded as a number from 1-140 (there are 140 countries). I am able to generate variables counting the missing observations and total observations for each variable grouped by country using the following two commands:
egen missingvar = total(missing(var)) , by(country)
egen totalvar = total(var) , by(country)
I would like to create an Excel spreadsheet like the one below (going all the way to 140), to record the data quality of each variable per jurisdiction, disregarding the non-profit variable:
I feel I am missing a step (or 10) between the missing and total observation variables I generate using the above commands, and the resulting exported data in Excel.
Also, if I simply export the relevant variables to Excel I receive an error message stating this is not possible because I have too many observations, but this perhaps relates to the missing steps.
Thank you in advance for your help.
With my best wishes,
Clara
I am new to Stata, so apologies for the convoluted question - I am not yet familiar with all of the language around the programme.
I am working with a dataset with 5 variables and around 200 million observations. I unfortunately cannot share an example of the data as it is confidential, but here is how it looks when I browse:
countries | date 1 | date 2 | non-profit | type | |
USA | 20766 | . | true | Unincorporated Nonprofit Association (D) | |
France | 20467 | . | false | Corporation - Professional - Domestic | |
Italy | 20660 | . | false | Limited Partnership (D) | |
USA | 20453 | . | true | Unincorporated Nonprofit Association (D) | |
USA | 14770 | 20291 | false | For-profit Corporation | |
China | 11983 | 21886 | false | Professional Corporation | |
Russia | . | . | false | Limited Liability Company (D) | |
China | 12235 | 17217 | false |
|
|
China | . | . | true | Unincorporated Nonprofit Association (D) | |
China | . | . | false | Corporation - Professional - Domestic | |
UK | 14770 | 22410 | false | Limited Partnership (D) | |
Germany | 4336 | 9831 | false | Corporation - Professional - Domestic | |
France | 8068 | 10251 | true | Corporation - Nonprofit - Domestic | |
France | -8299 | 9817 | true | Corporation - Nonprofit - Domestic |
egen missingvar = total(missing(var)) , by(country)
egen totalvar = total(var) , by(country)
I would like to create an Excel spreadsheet like the one below (going all the way to 140), to record the data quality of each variable per jurisdiction, disregarding the non-profit variable:
country code | date 1 | date 2 | type |
total | missing | total | missing | total | missing | |
1 | ||||||
2 | ||||||
3 | ||||||
4 | ||||||
5 | ||||||
6 | ||||||
7 | ||||||
8 | ||||||
9 | ||||||
10 | ||||||
11 | ||||||
12 | ||||||
13 | ||||||
14 |
Also, if I simply export the relevant variables to Excel I receive an error message stating this is not possible because I have too many observations, but this perhaps relates to the missing steps.
Thank you in advance for your help.
With my best wishes,
Clara
Comment