Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using Forvalue/Foreach to loop over a random series of number

    Dear Statalist I am trying to construct a loop that would help me extracting the data I need from an extremely large database.

    Precisely I am trying to extract a specific sub-list of codes from a much larger list. I have the list I need in Excel and it doesn't have any mathematical order. I will try to be more specific using an example:



    File in excel:
    CODES
    1111
    1214
    1317
    2324
    etc

    I can easily make the list going through a single row rather than the column (111 1213 1317 2324, etc)

    Because we are talking about a list of almost 1000 codes I was trying to write a loop with forvalues using the copy and past function from excel but because the numbers do not follow a specific interval I am not sure how to construct it.

    Obviously if I write something like:

    forvalue x = 111 1214 1317 2324 etc {
    generate newvar = .
    replace newvar = 1 if code ==`x'
    }
    drop if newvar!=1

    Stata won't run it as the forvalue is not correctly specified but my question is whether there is a way to construct the loop in a way that I might simply replace the list of values with a simple copy and past to make it work, even if the numbers do not follow any specific rationale.

    Thanks
    Last edited by Raffaele Palladino; 09 Feb 2016, 08:03. Reason: loop forvalue foreach

  • #2
    As you imply forvalues is quite wrong here as it doesn't support arbitrary lists of numbers.

    Try

    Code:
     
    local X 111 1214 1317 2324
    
    foreach x of local X { 
    
    }

    Comment


    • #3
      Also, the -generate newvar = .- command should be before the loop, not inside it. Otherwise, when you go through the loop the second time, Stata will complain that newvar already exists and will stop.

      You also might want to consider a different approach. Writing out this local macro with 1,000 numerical codes is going to be a real chore, and probably you will make mistakes along the way. Even copy/paste is not that reliable as precisely mousing the exact part of the spreadsheet you need is also error prone. Also, it seems like you don't really want the variable newvar, except to help you decide which observations in your data to keep. So you don't really need newvar for this. I would do it this way:

      Code:
      // FIRST CREATE A TEMPORARY STATA DATA SET WITH THE CODES IN IT
      import excel using my_excel_code_file.xls(x), firstrow case(lower) clear
      keep codes
      rename codes code
      duplicates drop
      tempfile code_file
      save `code_file'
      
      // NOW LOAD IN THE DATA SET THAT I WANTED TO CREATE newvar IN
      use my_dataset_for_newvar, clear
      merge m:1 code using `code_file', keep(match) nogenerate
      
      // DATA IN MEMORY NOW CONTAINS ONLY OBSERVATIONS WHERE 
      // VARIABLE code HAS ONE OF THE VALUES CONTAINED IN THE CODE 
      // COLUMN OF THE SPREADSHEET

      Comment

      Working...
      X