Maximizing Stata's Metadata Capabilities

Peter Deffebach

Join Date: Jul 2017

Posts: 5
#1

Maximizing Stata's Metadata Capabilities

08 Nov 2017, 09:25

I am currently working on merging in data on country-years to produce a massive master dataset with every variable one could want for every country and year possible. This means my dataset has variables from the UN, WDI, OECD, etc.

I have been doing a good job keeping track of which data is from which source using Notes. Right now, if you are curious about the data source of a variable, you can just write `note variable` and it will tell you the data source.

I am wondering if I can go even further and make it easy to select variables based on which data source they come from. I don't want to simply add suffixes to names since, as anyone who has worked with WDI data can attest, the names are already complicated enough. Are there any other ways you are aware of to attach metadata to variables that would make it easier to keep track of this complicated dataset?

Thanks for the help.
Nick Cox

Join Date: Mar 2014

Posts: 35697
#2

08 Nov 2017, 09:41

In short, no. But there's more scope for using variable names, variable labels, value labels and notes to tag different kinds of variables differently than you may realise. ds offers some filters and findname (Stata Journal) offers more.
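As a rough illustration of the kind of filtering ds allows, the sketch below tags each variable's label with a source prefix and then selects on that prefix. The variable names and the "UN:" / "WDI:" label convention are made up for the example and are not from the original dataset.

```stata
* Sketch only: hypothetical variables and a hypothetical label-prefix convention
clear
set obs 1
generate pop = .
generate gdp = .
generate lifeexp = .
label variable pop     "UN: Total population"
label variable gdp     "WDI: GDP, current US$"
label variable lifeexp "UN: Life expectancy at birth"

* Keep only variables whose label matches the pattern;
* ds leaves the matching names in r(varlist)
ds, has(varlabel "UN:*")
summarize `r(varlist)'
```

The variable names themselves stay untouched; only the labels carry the source tag.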
Peter Deffebach

Join Date: Jul 2017

Posts: 5
#3

12 Nov 2017, 13:43

Thanks for the help. Being competent in ds is a super good skill to have in general with Stata.

It is a bit of a bummer that variables don't have more user-defined OOP properties. Would be nice to create a .source() field attached to variables.
wbuchanan

Join Date: Mar 2014

Posts: 1362
#4

29 Nov 2017, 01:32

Peter Deffebach I would suggest using characteristics. If you add a _dta[UN] characteristic listing all of the variables sourced from the UN records, you can easily retrieve that variable list with `: char _dta[UN]'. You could also define a standard characteristic for each variable:

Code:

char define varname[source] "UN"

and could then test whether a given variable comes from a specified source:

Code:

if `"`: char varname[source]'"' == "UN" { ... } else { ... }
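To select variables by source, one option is to loop over the variables and collect those whose characteristic matches. This is a minimal sketch; the variables pop, gdp and lifeexp, the source values, and the local name unvars are all hypothetical.

```stata
* Sketch only: hypothetical variables and source values
clear
set obs 1
generate pop = .
generate gdp = .
generate lifeexp = .
char define pop[source]     "UN"
char define gdp[source]     "WDI"
char define lifeexp[source] "UN"

* Build a list of every variable whose source characteristic is "UN"
local unvars
foreach v of varlist _all {
    if `"`: char `v'[source]'"' == "UN" {
        local unvars `unvars' `v'
    }
}

* The local can now be used wherever a varlist is expected
describe `unvars'
```

Because unvars is a local macro, the loop and any command that uses it need to run in the same do-file or program.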

Characteristics will probably get you closest to something like an OOP-type framework, and I've found them very useful. I've used similar techniques to write programs around similar kinds of datasets (like the one brewscheme uses to define color palettes) that look up specific attributes I define for those data (mostly properties related to the metadata in the ColorBrewer palettes).
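The dataset-level variant mentioned at the start of this post might look roughly like the following. Again, the variable names and the local name unvars are placeholders rather than anything from the actual dataset.

```stata
* Sketch only: hypothetical variables
clear
set obs 1
generate pop = .
generate gdp = .
generate lifeexp = .

* Store the list of UN-sourced variables on the dataset itself
char define _dta[UN] "pop lifeexp"

* Retrieve the list later and use it like any varlist
local unvars : char _dta[UN]
summarize `unvars'

* Show all dataset-level characteristics
char list _dta[]
```

One trade-off: the _dta[UN] list is plain text, so it has to be updated by hand if a variable is renamed or dropped, whereas a per-variable characteristic stays attached to its variable.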