  • Keeping observations using lists or macros


    Hi,

    I'm trying to keep a certain set of observations from my dataset.

    I want to keep observations that have any of a defined set of values in any of a subset of variables.

    To be specific, each of my variables is a moment in an occupational trajectory (first job, second job, third job, etc.), and I would like to keep observations that at any point in their trajectory have had any of certain jobs (e.g., accountant, doctor, manager, teacher).

    I could do this manually with a very long keep command (I have 11 variables and 141 jobs I want to keep), but I get an "expression too long" error.

    (the actual code is 1800 lines long)

    Code:
    keep if job1 == 1120 ///
        | job1 == 1200 ///
        | job1 == 1210 ///
        | job2 == 1120 ///
        | job2 == 1200 ///
        | job2 == 1210 ///
        | job3 == 1120

    Is there a simpler way to do this? Can I use lists or macros for this? (I have never used them, but I would like to learn.)

    Thanks in advance

  • #2
    I can't figure out what you want here. From the code snippet you show it appears that each of these jobs corresponds to a variable that can take on certain values. And it looks like you want to keep only those observations where any one of these jobs takes on the value 1120, 1200, or 1210. But doing that the way you're doing it would only give us 141 * 3 = 423 lines of code. So I'm definitely missing something here.

    And in any case, unless you show an example of your data using the -dataex- command, there is very little chance that I would be able to correctly guess what your data looks like and come up with code that would actually work for your real data (as opposed to my imagined version of it).

    So please post back with an example of your data, created by -dataex-, and a clearer explanation of what you need.

    If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    When asking for help with code, always show example data. When showing example data, always use -dataex-.



    • #3
      The variables are "number of job" (first job, second job, third job, etc.) and the values are jobs (doctor, teacher, manager, etc.), coded as ISCO codes (https://www.ilo.org/public/english/b...co08/index.htm).

      I have 153 (not 141, sorry) jobs I would like to keep and 12 "number of job" variables, so I get 153 * 12 = 1836 lines of code.

      Here is the -dataex- output:

      v16 is current occupation
      v31ia is 1st occupation
      v31ib is 2nd occupation
      v31ic is 3rd occupation.. etc..


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int(v16 v31ia v31ib v31ic v31id v31ie v31if v31ig v31ih v31ii v31ij v31ik)
      2611 9998 9998 9998 9998 9998 9998 9998 9998 9998 9998 9998
      1300 4110 4110 4110 4110 4110 4110 9998 9998 9998 9998 9998
      5223 7533 5311 5221 9111 8131 8131 3133 5221 5221 9998 9998
      2632 4227 2421 9998 9998 9998 9998 9998 9998 9998 9998 9998
      2261 4311 5242 5242 4110 2310 9998 9998 9998 9998 9998 9998
      1311 9621 4110 9998 9998 9998 9998 9998 9998 9998 9998 9998
      2143 5312 4413 9998 9998 9998 9998 9998 9998 9998 9998 9998
      3352 8322 9998 9998 9998 9998 9998 9998 9998 9998 9998 9998
      2354 9998 2653 2355 9999 9998 9998 9998 9998 9998 9998 9998
      2611 4110 4312 4110 9998 9998 9998 9998 9998 9998 9998 9998
      2140 9998 9998 9998 9998 9998 9998 9998 9998 9998 9998 9998
      3431 5223 3435 3431 9998 9998 9998 9998 9998 9998 9998 9998
      5230 5230 2166 2166 9998 9998 9998 9998 9998 9998 9998 9998
      5153 5222 5132 4110 5132 9998 9998 9998 9998 9998 9998 9998
      2144 5246 5131 9998 9998 9998 9998 9998 9998 9998 9998 9998
      2631 9998 9998 4110 9998 9998 9998 9998 9998 9998 9998 9998
      2423 4313 2424 2424 9998 9998 9998 9998 9998 9998 9998 9998
      5221 2654 2300 9998 9998 9998 9998 9998 9998 9998 9998 9998
      2310 2341 2300 2359 9998 9998 9998 9998 9998 9998 9998 9998
      9998 9998 9998 9998 9998 9998 9998 9998 9998 9998 9998 9998
      5230 3251 3343 9998 9998 9998 9998 9998 9998 9998 9998 9998
      5223 5223 5244 9998 9998 9998 9998 9998 9998 9998 9998 9998
      5221 6111 8100 9998 9998 9998 9998 9998 9998 9998 9998 9998
      4110 9998 9998 9998 9998 9998 9998 9998 9998 9998 9998 9998
      2353 4110 5223 9998 9998 9998 9998 9998 9998 9998 9998 9998
      5312 4110 9998 9998 9998 9998 9998 9998 9998 9998 9998 9998
      2611 5223 9998 9998 9998 9998 9998 9998 9998 9998 9998 9998
      5223 5230 9621 7000 9998 9998 9998 9998 9998 9998 9998 9998
      2611 9998 9998 9998 9998 9998 9998 9998 9998 9998 9998 9998
      2611 5223 3344 4110 9998 9998 9998 9998 9998 9998 9998 9998
      end



      • #4
        OK, thanks. It's a one-liner, actually:

        Code:
        egen wanted = anymatch(v*), values(1120 1200 1210)
        Just replace the numbers 1120 1200 1210 by the actual list of the 153 values you are looking for. Also, this code assumes that all and only the relevant variables begin with v. If that is not the case, then you will have to rework the v* to some notation that actually characterizes all and only the relevant variables. Post back with more information about the variable names if you need help with that.

        If you wanted to place the 153 values, or for that matter, the list of variables, into a local macro and then refer to those macros in the -egen- command you could, but it doesn't serve any apparent purpose to do so.
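
The one-liner only creates the indicator; the original question also asks to keep the matching observations. A minimal sketch of that last step, assuming a tiny made-up dataset that borrows three variable names from the -dataex- example in #3 (the data values are invented for illustration):

```stata
* Tiny made-up dataset; variable names follow the -dataex- example in #3
clear
input int(v16 v31ia v31ib)
1120 9998 9998
2611 1210 9998
5223 4110 9998
end

* Flag observations where any listed variable equals any listed value
egen byte wanted = anymatch(v16 v31ia v31ib), values(1120 1200 1210)

* Keep only the flagged observations, then drop the helper variable
keep if wanted
drop wanted
```

Creating the indicator first and only then running -keep- makes it easy to inspect the flag (for example with -tabulate wanted-) before anything is discarded.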



        • #5
          Originally posted by Clyde Schechter
          OK, thanks. It's a one-liner, actually:

          Code:
          egen wanted = anymatch(v*), values(1120 1200 1210)
          Just replace the numbers 1120 1200 1210 by the actual list of the 153 values you are looking for. Also, this code assumes that all and only the relevant variables begin with v. If that is not the case, then you will have to rework the v* to some notation that actually characterizes all and only the relevant variables. Post back with more information about the variable names if you need help with that.

          If you wanted to place the 153 values, or for that matter, the list of variables, into a local macro and then refer to those macros in the -egen- command you could, but it doesn't serve any apparent purpose to do so.
          I think this worked perfectly in a quick test. I will try it later with the complete list of variables and values.

          Thanks!



          • #6
            Elaborating on Clyde's example: since your 12 variables are v16 and v31ia through v31ik, if there are no other variables beginning with v31i, your command will be something like
            Code:
            egen wanted = anymatch(v16 v31i*), values(1120 1200 1210)
            where you will replace 1120 1200 1210 with the 153 values you are looking for. You might find it easiest to do this using a few local macros.
            Code:
            local doctor 1234 2345 3456
            local manager 9876 8765
            egen wanted = anymatch(v16 v31i*), values(`doctor' `manager')
            I have confirmed that anymatch will not throw up its hands when confronted with a list of 153 4-digit values.
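
If some of the 153 codes form contiguous runs, the macros can be shortened further. As far as I know, values() takes an integer numlist, so range shorthand such as 2310/2359 should be accepted (see -help numlist-). A small sketch under that assumption; the group names, the code lists, and the three-variable dataset are all made up for illustration:

```stata
* Tiny made-up dataset; variable names follow the -dataex- example in #3
clear
input int(v16 v31ia v31ib)
2341 9998 9998
2611 2355 9998
5223 4110 9998
end

* Hypothetical group written as a range: every integer from 2310 to 2359
local teacher 2310/2359
* Hypothetical group listed value by value
local manager 1120 1200 1210

egen byte wanted = anymatch(v16 v31i*), values(`teacher' `manager')

* Quick check of how many observations were flagged
count if wanted
```

A range also matches integers that are not valid codes, which is harmless as long as no unwanted code falls inside it.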



            • #7
              Originally posted by William Lisowski
              Elaborating on Clyde's example: since your 12 variables are v16 and v31ia through v31ik, if there are no other variables beginning with v31i, your command will be something like
              Code:
              egen wanted = anymatch(v16 v31i*), values(1120 1200 1210)
              where you will replace 1120 1200 1210 with the 153 values you are looking for. You might find it easiest to do this using a few local macros.
              Code:
              local doctor 1234 2345 3456
              local manager 9876 8765
              egen wanted = anymatch(v16 v31i*), values(`doctor' `manager')
              I have confirmed that anymatch will not throw up its hands when confronted with a list of 153 4-digit values.
              Great, thanks to both of you!

              Could I generate "wanted" so that, instead of a binary variable, I get value 1 for observations that match "doctor", value 2 for observations that match "manager", etc.?

              (I know people could have jobs from more than one list, but I'm not that concerned about that; I'll figure it out.)

              Edit: maybe I could just make wanted1, wanted2, wanted3, etc., each with its own list of values, and then combine them in whichever way I like, but if there is a simpler way I would like to know.
              Last edited by Joaquin Carrascosa; 18 Jun 2019, 16:29.



              • #8
                Edit: maybe I could just make wanted1, wanted2, wanted3, etc., each with its own list of values, and then combine them in whichever way I like, but if there is a simpler way I would like to know.
                I think this would be a much better way to do it. It preserves the information in the data.
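
A minimal sketch of that approach, with one indicator per group and an optional combined variable. Everything here is illustrative: the group names, the code lists, the tiny dataset, and the rule that the first matching group wins in the combined variable are assumptions, not something settled in this thread.

```stata
* Tiny made-up dataset; variable names follow the -dataex- example in #3
clear
input int(v16 v31ia v31ib)
2211 9998 9998
2611 1120 9998
1200 2212 9998
5223 4110 9998
end

* Hypothetical code lists; substitute the real ISCO codes for each group
local doctor  2211 2212
local manager 1120 1200 1210

* One indicator per group, so overlaps between groups stay visible
egen byte wanted1 = anymatch(v16 v31i*), values(`doctor')
egen byte wanted2 = anymatch(v16 v31i*), values(`manager')

* Optional single summary variable; earlier groups win when several match
generate byte wanted = 0
replace wanted = 1 if wanted1
replace wanted = 2 if wanted2 & wanted == 0

* Keep observations that matched at least one group
keep if wanted > 0
```

The third observation matches both lists; wanted1 and wanted2 record that, while the combined variable can only show one of the two groups.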

