Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Look-up Tables

    Hi,

    While I have used Stata for simple analysis for a while, I am starting to need to do more complicated tasks.

    I wish to make use of reference tables in a similar way that one would use vlookup in Excel. I have a few things I wish to do with this, but here is a simple example:

    In my research group we are asking a child’s country of birth, and the mother and father’s countries of birth. When I designed the REDCap form I made this a complete drop-down list with 197 countries and an “Other” option that unlocks a free-text field and an “Unknown” option. The idea was to avoid multiple spellings of the same country and allow for countries to then be processed into larger groups. It looks like this:
    Click image for larger version

Name:	image_14528.png
Views:	1
Size:	201.1 KB
ID:	1499044

    (filename: paeds.dta)

    As you can see the countries have appropriate value labels in Stata.

    Rather than analyse these on an individual country basis, I have decided to group them according to the system of regions and subregions used by the United Nations Statistical Division.

    I have another Stata file with all of the countries listed next to their matching number (as per the value label system above), with their UNSD subregion:
    Click image for larger version

Name:	countries.png
Views:	1
Size:	149.6 KB
ID:	1499045


    (filename countries.dta)

    I would like to write a .do file that generates a new variable in paeds.dta. The .do file would look up the country of birth in countries.dta and enter the appropriate subregion into the new variable. Can anyone tell me how this would be achieved? Or direct me to an explanation (as I am sure this has been addressed).

    Kind regards
    Last edited by Ryan Lucas; 19 May 2019, 16:52.

  • #2
    Welcome to Statalist.

    Added in edit: Unfortunately, your Stata attachements did not appear in your original post correctly.You subsequently uploaded screenshots but I have not taken the time to revise my text to match your screenshots.

    The general approach to the problem will be something like the following. Assume your paeds dataset has variables mcnum and fcnum with the country of birth numeric code for the mother and father respectively. Also assume your countries dataset has variables cnum and subr with country number and corresponding subregion.

    The following example is an untested attempt to start you in a useful direction.
    Code:
    use paeds, clear
    generate cnum = mcnum
    merge m:1 cnum using countries, keep(master match)
    rename subr msubr
    drop _merge
    replace cnum = fcnum
    merge m:1 cnum using countries, keep(master match)
    rename subr fsubr
    drop _merge
    To help with presentation of your problems on Statalist, take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

    If you are running version 15.1 or a fully updated version 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use dataex.
    Last edited by William Lisowski; 19 May 2019, 17:13.

    Comment


    • #3
      Hi William,

      Thank you so much - I have now taught myself merge.It worked perfectly, and it will really come in handy. By conceptualising it as a vlookup I don't think I ever would have figured this out.

      In the future I will use dataex, this is another code that I hadn't used before.

      Thanks again,

      Comment


      • #4
        Less flexible then doing this from first principle, but maybe convenient is the user written kountry command, which also has the ability to assign regions to countries. See ssc desc kountry
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        Comment

        Working...
        X