
  • After importing a CSV file, the variables are named v1, v2, etc.

    I import a CSV file into Stata with the command below. The problem is that the variables come in named v1, v2, v3, and so on, as you can see in my posted data example, while the real variable names sit in the first observation (the second row as I see it, under v1, v2). How can I make those the variable names? I need the v1, v2 names completely removed from the data.

    Code:
    import delimited using ig_nd.csv, clear
    After importing the CSV file, I also see this message multiple times:

    (encoding automatically selected: ISO-8859-1)
    Note: Unmatched quote while processing row 70754; this can be due to a formatting problem in the
    file or because a quoted data element spans multiple lines. You should carefully inspect your
    data after importing. Consider using option bindquote(strict) if quoted data spans multiple
    lines or option bindquote(nobind) if quotes are not used for binding data.



    dataex v1 v2

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input strL(v1 v2)
    "ig_id"      "sequence"
    "20210012690.0" "1.0"              
    "20200326458.0" "1.0"              
    "20200075033.0" "1.0"              
    "20200075034.0" "1.0"              
    "20200075035.0" "1.0"              
    "20200075035.0" "2.0"
    end
    Last edited by Tariq Abdullah; 22 Sep 2023, 18:26.

  • #2
    Best would be to start over with:
    Code:
    import delimited using ig_nd.csv, varnames(1) clear
    That will instruct Stata to import the information in the first row of the CSV file as variable names rather than data.

    If the real file is very large and takes a long time to read in, and you would rather patch what you already have, you can fix the file you have with
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input strL(v1 v2)
    "ig_id"      "sequence"
    "20210012690.0" "1.0"              
    "20200326458.0" "1.0"              
    "20200075033.0" "1.0"              
    "20200075034.0" "1.0"              
    "20200075035.0" "1.0"              
    "20200075035.0" "2.0"
    end
    
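    * Rename each variable to the string stored in its first observation
    foreach v of varlist _all {
        rename `v' `=`v'[1]'
    }
    * Drop the header row, which is now redundant
    drop in 1
    * Convert sequence from string to numeric
    destring sequence, replace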
    Note: If you want to -destring ig_id- as well, you can do that. Generally, identifiers aren't used in calculations. On the other hand, if you need to -xtset- your data with ig_id as the panel variable, then it must be numeric, and there may be other situations where a numeric variable would be necessary or more convenient. It all depends on what you plan to do.
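
    As a rough sketch of the numeric-identifier route mentioned in the note: the lines below assume the rename and drop steps above have already been run, and that ig_id and sequence together identify each observation. That second assumption is only a guess from the six example rows, so the -isid- check is there to confirm it before -xtset-.

    ```stata
    * Assumes ig_id is still a string variable such as "20210012690.0"
    destring ig_id, replace

    * Display the 11-digit IDs in full instead of in scientific notation
    format ig_id %12.0f

    * Assumption: ig_id and sequence jointly identify observations;
    * -isid- stops with an error if they do not
    isid ig_id sequence

    * Declare the panel structure (sequence as the time variable is hypothetical)
    xtset ig_id sequence
    ```

    The -format- line changes only how the values are displayed, not what is stored. If -isid- fails, the data are not uniquely identified by these two variables and the -xtset- line would need a different time variable.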



    • #3
      Variable names
      Use the option varnames(#), where # is the row of the CSV that holds the variable names; e.g., for variable names on row 2 of the CSV:
      Code:
      import delimited using ig_nd.csv, clear varnames(2)
      For commands used infrequently and/or with hard-to-navigate options, like import or twoway, I always use the GUI first (i.e., going to File > Import > Text data (delimited, .csv, ...)). The box that pops up will explicitly list "First row as variable names" as an option. After executing the import that way, with all the options set as needed, you can copy the code that pops up in the results window into your .do file.


      Error:
      The only way to figure out the correct response is to know the data you're importing. The message is on point here. As it suggests, look at the indicated rows of your CSV (using a text editor, for instance), then decide whether unmatched quotes are okay (option bindquote(nobind)), or whether they indicate that multiple rows need to be combined into single observations in Stata (option bindquote(strict)).
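
      A minimal sketch of how the two choices might be tried, assuming the variable names are on row 1 of ig_nd.csv (as in #2) and that you have already looked at the flagged rows in a text editor. Which option is right depends entirely on what those rows contain.

      ```stata
      * Quoted fields may span several lines: bind them into one observation
      import delimited using ig_nd.csv, varnames(1) bindquote(strict) clear

      * Alternative: treat quote characters as ordinary text, no binding
      * import delimited using ig_nd.csv, varnames(1) bindquote(nobind) clear

      * Compare the number of observations under each choice
      count
      ```

      Running -count- after each variant gives a quick check: if the observation count differs between the two imports, some quoted fields are being split or merged, and the rows named in the note are the place to look.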



      • #4
        Originally posted by Clyde Schechter
        Mr. Schechter,

        With your help I got rid of the issue. Thanks so much for taking the time to address my concern! I also appreciate the insight into handling the data with the right commands!



        • #5
          Originally posted by Matthew Holt
          Thanks so much for your kind response! I'll explore that option if I ever run into these issues again. Thank you for your time and feedback!
