  • Stata: I/O error writing .dta file

    Hi everyone!
    I've got a problem when trying to merge two big datasets, borrow.dta and return.dta.
    I got the following message (the master is borrow.dta):

    Code:
    merge 1:1 ID_time using return.dta
    I/O error writing .dta file
        Usually such I/O errors are caused by the disk or file system being full.
    Could you please tell me how to deal with this?
    Or do I just need a PC with more memory or a bigger disk?

    Thanks a lot!!


  • #2
    You wouldn't think the -merge- command would need to write to disk, but it does run a -save- behind the scenes if the data are not sorted. You could -set trace on- and rerun your program to find out where and what Stata was trying to write.
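
    A minimal sketch of that, assuming the same -merge- call as in the original post (-set tracedepth- is optional and just keeps the trace output readable):

    Code:
    set tracedepth 1      // optional: show only the top level of the trace
    set trace on          // echo each command Stata executes internally
    merge 1:1 ID_time using return.dta
    set trace off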



    • #3
      Originally posted by [email protected] View Post
      Thanks for your help. I tried -save- and it could not save either.
      What should I do to cope with this?



      • #4
        I hope this material helps you:
        https://back.nber.org/stata/efficient/bigio.html



        • #5
          I wasn't clear: I expect that if you made sure both datasets were sorted (in Stata, so that -merge- knew they were sorted) before the -merge- command, that might avoid the -save- that -merge- executes behind the scenes.
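
          A rough sketch of that, assuming ID_time is the merge key; note that the intermediate -save- itself needs disk space:

          Code:
          use return, clear
          sort ID_time
          save return, replace      // stores the sort order in the file
          use borrow, clear
          sort ID_time
          merge 1:1 ID_time using return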

          Have you thought about exactly how big the merged file will be? Will it fit in your core memory? Will you want to save it to disk? You should be able to estimate the size. -des- will return the size of one row of the dataset in core:

          Code:
          des
          di r(width)
          You can use that information, together with the number of rows in each dataset, to estimate the size of the merged file and compare it to the free space on your disk. If n1 and n2 are the numbers of rows and w1 and w2 the widths, then the worst case for the merged file size is (n1+n2)*(w1+w2) bytes, and each matched row reduces that by w1+w2 bytes.
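
          For example, a sketch of that estimate, using the dataset names from the original post (r(N) and r(width) are the row count and row width that -des- leaves behind):

          Code:
          use borrow, clear
          des
          local n1 = r(N)
          local w1 = r(width)
          use return, clear
          des
          local n2 = r(N)
          local w2 = r(width)
          * worst case: no rows match, so every row carries both sets of variables
          di "worst-case merged size: " (`n1' + `n2') * (`w1' + `w2') " bytes"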

          There is a posting somewhere saying that Stata needs core memory equal to the worst case, even if all the records match. I don't know if this is correct, or ever was, but if so you might divide the using dataset into pieces, merge the pieces, and append the results into one dataset.
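
          A sketch of that piecewise idea; the filenames are made up, and this version keeps only the matched rows (unmatched rows would need separate handling):

          Code:
          * split the using dataset into two halves by row number
          use return, clear
          keep if _n <= _N/2
          save return_part1, replace
          use return, clear
          keep if _n > _N/2
          save return_part2, replace
          * merge the master against each piece, keeping only matches
          forvalues i = 1/2 {
              use borrow, clear
              merge 1:1 ID_time using return_part`i', keep(match) nogenerate
              save merged_part`i', replace
          }
          * stack the two result files back into one dataset
          use merged_part1, clear
          append using merged_part2
          save merged, replace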

          When you have really big files, you have to be conscious of the size, and plan for it.



          • #6
            Originally posted by Chen Samulsion View Post
            Thanks so much, very helpful!



            • #7
              Originally posted by [email protected] View Post
              Thanks for your answer; I may need a little more time to digest it ^_^

