Dear members,
I am using Stata 12.1 for Mac.
My research requires a lot of data management before I can get to the statistical part, and this is where I am stuck.
I have two datasets: (-describe- shown at end of message)
Dataset 1 has the following variables:
- Ticker
- Group
- Year
- This dataset lists companies (ticker) and the risk factor group (group) each one belongs to in a given year. (A company can move from one group to another from year to year.)
- Number of observations: 4,082
Dataset 2 has:
- Group
- Date (YMD)
- Returns
- This dataset has daily returns from 02Jan2000 through 29Dec2012
- Number of observations: 17,712
I have tried a few things:
1) -merge-
I have learned that to use this command I would need a unique identifier. Because no single variable uniquely identifies the observations in these data, I did not find a way to use it.
2) -joinby-
joinby group using [dataset 2]
Result: the command combined the data without regard to date/year.
3) I ran these lines of code with dataset 1 in memory:
quietly levelsof year, local(levs)
quietly foreach lev of local levs {
joinby grupo using fipe_ajustes
}
Result: Stata stopped the operation and issued the following message:
"sum of expand values exceed 2,147,483,647
The dataset may not contain more than 2,147,483,647 observations."
Here, I am not sure whether this third piece of code does what I intend, or whether the problem really is the size of the dataset.
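One way to read the goal above is a join on both group and calendar year, rather than on group alone. The following is a minimal sketch of that idea, not a tested solution. It assumes the file and variable names in the -describe- output below, and that `grupo` in dataset 2 holds the same codes as `group` in dataset 1. The temporary file `daily` is a hypothetical name used only for illustration.

```stata
* Sketch only: file and variable names follow the -describe- output;
* the tempfile name (daily) is hypothetical. Run as a do-file.
tempfile daily
use fipe_ajustes, clear
* assumed: grupo holds the same codes as group in dataset 1
rename grupo group
* calendar year of each daily date, to match year in dataset 1
generate int year = year(period)
save "`daily'"
use stocks_ajustes, clear
* pair each ticker-year with the daily returns of its group in that year only
joinby group year using "`daily'"
```

Note also that the loop in attempt 3 never refers to `lev`, so each pass repeats the same -joinby- on data already expanded by the previous pass, which could explain the observation-limit message.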
I have read the posting guide, and I hope I have been clear enough.
Thanks,
Clarice
------------------------------------------------
Dataset 1:
. describe
Contains data from stocks_ajustes.dta
  obs:         4,082
 vars:             3                          20 Apr 2014 18:20
 size:        81,640
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
ticker          str16   %16s                  ticker
year            int     %10.0g                year
group           str2    %9s
-------------------------------------------------------------------------------
Dataset 2:
. describe
Contains data from fipe_ajustes.dta
  obs:        17,712
 vars:             3                          20 Apr 2014 17:55
 size:       247,968
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
grupo           str2    %9s                   grupo
ret_ajust       double  %10.0g
period          float   %td
-------------------------------------------------------------------------------
Sorted by:
     Note: dataset has changed since last saved