After importing a CSV file, the variable names show up as v1, v2, etc.

Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#1


22 Sep 2023, 18:21

After importing a CSV file into Stata with the command below, the variable names show up as v1, v2, v3, and so on, as you can see in my data example. The actual variable names (ig_id, sequence, ...) end up in the first observation instead. How can I make the values in that row the variable names, so that v1, v2, ... are removed from the data completely?

Code:

import delimited using ig_nd.csv, clear

After importing the CSV file, I also see this message multiple times:

(encoding automatically selected: ISO-8859-1)
Note: Unmatched quote while processing row 70754; this can be due to a formatting problem in the
file or because a quoted data element spans multiple lines. You should carefully inspect your
data after importing. Consider using option bindquote(strict) if quoted data spans multiple
lines or option bindquote(nobind) if quotes are not used for binding data.

dataex v1 v2

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input strL(v1 v2)
"ig_id"         "sequence"
"20210012690.0" "1.0"
"20200326458.0" "1.0"
"20200075033.0" "1.0"
"20200075034.0" "1.0"
"20200075035.0" "1.0"
"20200075035.0" "2.0"
end

Last edited by Tariq Abdullah; 22 Sep 2023, 18:26.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

22 Sep 2023, 18:45

Best would be to start over with:

Code:

import delimited using ig_nd.csv, varnames(1) clear

That will instruct Stata to import the information in the first row of the CSV file as variable names rather than as data.
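To see what varnames(1) does without touching the real file, here is a minimal sketch on a tiny hypothetical CSV (varnames_demo.csv is an invented name, written and then erased in the working directory). The stringcols(1) option, which forces the first column to be imported as text, is an addition that is not part of the advice above; it is there only in case you prefer to keep ig_id as a string identifier, and you can drop it otherwise.

```stata
* Write a tiny hypothetical CSV whose first row holds the variable names
tempname fh
file open `fh' using "varnames_demo.csv", write text replace
file write `fh' `""ig_id","sequence""' _n
file write `fh' `""20210012690.0","1.0""' _n
file write `fh' `""20200326458.0","1.0""' _n
file close `fh'

* varnames(1): row 1 supplies the variable names, so data start on row 2
* stringcols(1): keep the first column (the identifier) as a string
import delimited using "varnames_demo.csv", varnames(1) stringcols(1) clear
describe

* Clean up the demo file
erase "varnames_demo.csv"
```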

If the real file is very large and takes a long time to read in, and you would rather patch what you already have, you can fix the data already in memory with:

Code:

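* Example generated by -dataex-. For more info, type help dataex
clear
input strL(v1 v2)
"ig_id"         "sequence"
"20210012690.0" "1.0"
"20200326458.0" "1.0"
"20200075033.0" "1.0"
"20200075034.0" "1.0"
"20200075035.0" "1.0"
"20200075035.0" "2.0"
end

* Rename each variable using the text stored in its first observation
foreach v of varlist _all {
    rename `v' `=`v'[1]'
}
* The first observation held the names, not data, so drop it
drop in 1
* Convert sequence from string to numeric
destring sequence, replace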
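The loop above assumes every entry in the first observation is already a legal Stata variable name. If the real header row contains something like a space, rename will fail. A minimal sketch of one workaround, assuming it is acceptable to let strtoname() clean the names; the "grant year" column is hypothetical and not in the posted data. Note that if two headers clean to the same name, rename will still stop with an error.

```stata
* Hypothetical data: the third header ("grant year") is not a legal Stata name
clear
input strL(v1 v2 v3)
"ig_id"         "sequence" "grant year"
"20210012690.0" "1.0"      "2021"
"20200326458.0" "1.0"      "2020"
end

foreach v of varlist _all {
    * strtoname() turns the header text into a legal name ("grant year" -> grant_year)
    local newname = strtoname(`v'[1])
    rename `v' `newname'
}
* The first observation held the names, not data
drop in 1
destring sequence grant_year, replace
```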

Note: If you want to -destring ig_id- as well, you can do that, although identifiers generally aren't used in calculations, so there is often no need. On the other hand, if you will need to -xtset- your data with ig_id as the panel variable, then it needs to be numeric, and there may be other situations where a numeric variable would be necessary or more convenient. It all depends on what you plan to do.
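If you do go the numeric route, a minimal sketch follows. It starts from hypothetical data representing the state after the patch above, with ig_id still a string. Treating sequence as the within-ID time index for -xtset- is an assumption about what the variable means. destring's default storage type is double, which holds these 11-digit IDs exactly; the format line only changes how they are displayed.

```stata
* Hypothetical data: the state after the patch, with ig_id still a string
clear
input str13 ig_id byte sequence
"20210012690.0" 1
"20200326458.0" 1
"20200075033.0" 1
"20200075034.0" 1
"20200075035.0" 1
"20200075035.0" 2
end

* Convert the identifier; the default double type keeps all 11 digits exact
destring ig_id, replace
* Display every digit instead of scientific notation
format ig_id %12.0f

* Declare the panel structure, assuming sequence is the time index within each ID
xtset ig_id sequence
```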
Matthew Holt

Join Date: Mar 2020

Posts: 5
#3

22 Sep 2023, 19:27

Variable names
Use the option varnames(#|nonames); for example, if the variable names are on row 2 of the CSV:

Code:

import delimited using ig_nd.csv, clear varnames(2)

For commands I use infrequently and/or that have hard-to-navigate options, like import or twoway, I always use the GUI first (i.e., File > Import > Text data (delimited, *.csv, ...)). The dialog box that opens explicitly lists "First row as variable names" as an option. After running the import that way, with all the options set as needed, you can copy the command that appears in the Results window into your .do file.

Error:
The only way to figure out the correct response is to know the data you're importing, and the message is on point here. As it suggests, look at the indicated rows of your CSV (in a text editor, for instance), then decide whether the unmatched quotes are okay (option bindquote(nobind)) or whether they indicate that multiple lines need to be combined into single observations in Stata (option bindquote(strict)).
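To see concretely what the two bindquote() choices do, here is a minimal sketch that writes a tiny hypothetical CSV (bindquote_demo.csv is an invented name) in which one quoted field spans two lines, imports it both ways, and compares the number of observations. It only illustrates the options; which one is right for ig_nd.csv still depends on what you find around row 70754.

```stata
* Write a tiny hypothetical CSV in which one quoted field spans two lines
tempname fh
file open `fh' using "bindquote_demo.csv", write text replace
file write `fh' "id,note" _n
file write `fh' `"1,"first line"' _n
file write `fh' `"second line""' _n
file write `fh' "2,plain" _n
file close `fh'

* bindquote(strict): the quoted note, newline included, stays in one observation
import delimited using "bindquote_demo.csv", varnames(1) bindquote(strict) clear
scalar n_strict = _N

* bindquote(nobind): quotes are ordinary characters, so every line becomes a row
import delimited using "bindquote_demo.csv", varnames(1) bindquote(nobind) clear
scalar n_nobind = _N

display "strict: " n_strict "   nobind: " n_nobind
erase "bindquote_demo.csv"
```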
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#4

23 Sep 2023, 04:12

Originally posted by Clyde Schechter

Mr. Schechter,

With your help I got rid of the issue. Thanks so much for taking the time to address my concern! I also appreciate the valuable insight into handling the data with the right command.
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#5

23 Sep 2023, 04:14

Originally posted by Matthew Holt

Thanks so much for your kind response! I'll explore that option if I ever run into these issues again. Thank you for your time and feedback!