Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • To create variables if they do not exist in data

    I have 100 data sets, they have a different number of variables. For example, some data sets have 33 variables (pre_tos1-pre_tos33), but others have a smaller number of variables than them (e.g., 29 variables, or 31 variables). Here, I want to create variables if they do not exist in the data set, and assign them to zero.


    I tried this work using the following codes.

    HTML Code:
        quietly forvalues s=1/33 {
            gen p_tos`s' = 0
    
            foreach k of var pre_tos`s' {
                capture confirm var `k'
                if c(rc) == 0  {
                    replace p_tos`s' = `k' 
                }
            }
        }
    variable pre_tos7 not found
    r(111);

    For example, the first data set does not have variable pre_tos7. When I used the following codes, the following error appears:


    HTML Code:
    variable pre_tos7 not found
    r(111);

    Could you please provide any advice?




  • #2
    I don't follow your code. So I'm not sure exactly how to explain what's wrong with it. As I understand it, you want to check the file for variables pre_tos1 through pre_tos33, and if they do not exist, create them, with zero values. Here's how I would do that.

    Code:
    forvalues i = 1/33 {
        capture confirm var pre_tos`i'
        if c(rc) == 111 { // VARIABLE NOT FOUND
            gen pre_tos`i' = 0
        }
        else if c(rc) != 0 { // SOME OTHER ERROR; DISPLAY & STOP
            display in red "Unexpected error confirming pre_tos`i'
            exit(111)
        }
        // IF HERE, c(rc) == 0, VARIABLE FOUND, NOTHING TO DO
    }

    Comment


    • #3
      Thank you for your response. I tried your code, but I saw the following error message.

      HTML Code:
      varlist or in range required
      r(100);

      Comment


      • #4
        Please post an example of one of your data sets, and also copy/paste your code and Stata's response directly from your Results window or output log into a code block here on the forum. I tested the code in my own Stata and it ran fine. Either there is something different about your data from what this code assumes, or you have in some way miscopied the code.

        Be sure to use -dataex- to post your example data. If you don't already have it, run -ssc install dataex- to get it, and run -help dataex- to read the simple instructions for using it.

        Added: The error message "varlist or in range required" sounds like it comes from a -drop- command where you didn't specify anything to drop. There are no such commands in the code I suggested. I think this message is likely coming from someplace else in your code.
        Last edited by Clyde Schechter; 13 Apr 2017, 16:19.

        Comment


        • #5
          There is a bug in my code in #2 that could lead to incorrect results (but not to the error message you refer to in #3). Please use:

          Code:
          forvalues i = 1/33 {
              capture confirm var pre_tos`i', exact
              if c(rc) == 111 { // VARIABLE NOT FOUND
                  gen pre_tos`i' = 0
              }
              else if c(rc) != 0 { // SOME OTHER ERROR; DISPLAY & STOP
                  display in red "Unexpected error confirming pre_tos`i'
                  exit(111)
              }
              // IF HERE, c(rc) == 0, VARIABLE FOUND, NOTHING TO DO
          }
          The -exact- option is needed because, without it, when Stata checks for, say pre_tos2, if there is a pre_tos20 or something like that for which pre_tos2 is an abbreviation, then Stata will treat it as a successful identification of the variable being sought, which is not correct here. The -exact- option will assure that only an exact match to the variable name being searched for will be accepted.

          Comment


          • #6
            Oh, please ignore my comments. I did not remove my command drop. Thank you very much for your help!

            Comment


            • #7
              Hello,
              Very nice discussion! I am facing the similar problem. I have about 50 datasets that need to be appended for my analysis. However, some variables do not exist in all dataset which creates a problem during append. Therefore, I want to generate a conditional variable (s) to overcome the problem. As for example, I want to generate variable A = B if B exists in the dataset, otherwise (if B does not exist in the dataset), generate B=missing first and then generate (or replace) A=B. Could you please help me to write the program in STATA.

              Thank you.

              Comment


              • #8
                I don't understand your question. -append- does automatically what I think you are asking for. If we have a variable, call it x in one data set that does not exist in the other and we -append- the two data sets (in either order), the resulting dataset will contain the variable x, and it will have its original values in the observations that came from the dataset that had x, and it will have missing values in the observations that came from the dataset did not have x. So you don't need to create any new variable that has this behavior: x itself will be like this in the -append-ed dataset.

                Comment


                • #9
                  Thanks for your quick response. Sorry for confusing you. Actually, I am not going to use all variables for append. My dataset has about 2000 variables. First I will create a subset of the datasets by using required variables (say, x1 x2 ............x100) for each of 50 datasets. Usually, I am using "keep" command to make the subset database and finally try to append all the subset database. But If a variable does not exist in the dataset then keep command shows an error message. As for example, say variable x1 exist in dataset 1 but not exist in dataset 2, if I want to create a subset database (say, x1 x2 x3) by using "keep" command it works for dataset 1 but not for dataset 2 (as x1 does not belong to dataset 2). To overcome this situation, I want to create a variable y1=x1 if x1 exist (which is true for dataset 1), otherwise (for dataset 2), y1= missing value so that two subsets are like:
                  subset 1: y1 (=x1) , x2, x3
                  subset 2: y1 (missing), x2, x3
                  then I will append subset 1 and subset 2.
                  That means I will create 50 subset databases with equal no. of variables (with the same variable name) from 50 datasets. I will temporarily save each subset databases and finally run append command and save it permanently for my analysis. To create the subset with same and equal no. of variables I need this program that I mentioned earlier. Hope this clarifies you. Thank you once again.

                  Comment


                  • #10
                    First, you will need to define two local macros: one is the list of variables you want to retain, and the other is the list of files you want to process.
                    Code:
                    local variables_to_keep // NEEDS TO BE DEFINED AND INCLUDE y1
                    
                    local  filelist // NEEDS TO BE DEFINED
                    Then the following code will loop over the files, creating variable y1 from x1 if it exists, or with all missing values, keep the desired variables, and append the files as it goes. The final result is the appended results.
                    Code:
                    tempfile building
                    clear
                    save `building', emptyok
                    
                    foreach f of local filelist {
                        use `"`f'"', clear
                        capture confirm var x1, exact
                        if c(rc) == 0 {    // VARIABLE FOUND
                            rename x1 y1
                        }
                        else if c(rc) == 111 {    // VARIABLE NOT FOUND
                            gen y1 = .
                        }
                        else {
                            display in red `"Unexpected error processing file `f'"'
                            exit c(rc)
                        }
                        keep `variables_to_keep'
                        append using `building'
                        save `"`building'"', replace
                    }
                    
                    use `building', clear
                    save final_appended_file, replace
                    Note: Not tested. Beware of typos, unbalanced braces, etc. The code assumes that x1 (and y1) is a numeric variable; if it is a string, then instead of -gen y1 = .- it would be -gen y1 = ""-.

                    Comment


                    • #11
                      I am wondering: append, then keep (instead of keep, then append) would be a solution?

                      Comment


                      • #12
                        Romalpa,

                        You are quite right! -append- followed by -keep- does exactly what is asked for and is much simpler.

                        Comment


                        • #13
                          Many thanks, Clyde Shechter! I will try and get back to you if required. Thanks, Romalpa for your suggestions as well. I will try both approaches.

                          Comment

                          Working...
                          X