Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • merge using temp files

    i'm trying to run the following code to merge a load of datasets together.

    Code:
    preserve
    clear 
    tempfile etfdownload
    save `etfdownload', emptyok    
    
    local etf "BNO CAFE    CORN DBA DBB DBC DBE DBO DGL GLD GSG PALL PPLT SLV UGA URA USCI USO DXJ EPI EWA EWC EWG EWH EWI EWJ EWL EWM EWP EWS EWT EWU EWW EWY EWZ EZA FXI IDX PIN RSX FXB FXE FXY UUP ACWI DIA FEZ IEMG IWD IWF IWM    IWN    IWO    MDY    OEF    PFF    QQQ    SPY    VTI    GDX    GDXJ IBB ITB IYR KBE KRE OIH SMH VNQ XHB XLB XLE XLF XLI XLK XLP XLU XLV XLY XME XOP XRT XTL"
    foreach e of local etf{
        getsymbols `e', fm(12) fd(1) fy(1992) lm(12) ld(22) ly(2018) frequency(d) price (close) clear yahoo
        label variable p_close_`e' "`e'"
        rename p_close_`e' `e'
        drop r_`e' R_`e' volume_`e' 
        label variable period "date"
        rename period date
        format date %tdnn/dd/CCYY
        
        merge 1:1 date `etfdownload', keepusing(date) 
        save `"`etfdownload'"', replace
    }
    
    save "/Users/gilesgerman/Documents/stokpix/databasemodels/model2/STATA Data 5.dta", replace
    then error i get is invalid name. Can you help please?

  • #2
    Well, there is one problem I see right off the bat. There may be others that are less obvious, but:

    -merge 1:1 date `etfdownload', keepusing(date)- is invalid syntax. The file to be merged in has to be preceded by -using-. So try

    Code:
    merge 1:1 date using  `etfdownload', keepusing(date)
    As, I say, that one just popped out at me. There may be other errors. If you still get the same error message after fixing this, add -set tracedepth 1- and then -set trace on- to the code just before your -foreach- loop. That will probably produce a lot of annoying, long output, but at least it will enable you to see which statement inside the loop is throwing the error, and perhaps with that knowledge it will be obvious what the problem is.

    Comment


    • #3
      Seems to me like this data would be more useful if it was in a long layout so I would use append instead of merge. And even if you prefer the wide layout, you can easily use reshape to get there. I would also prefer code that does not have to redo all downloads just because you want to add another eft. Here's a solution that uses runby (from SSC) to perform the download for each symbol. It saves a successful download in a subdirectory called "getsymbols_data". If you rerun the example, it will simply load the stored dataset if it exists instead of downloading it again. With runby, what you leave in memory when the user-defined program terminates is considered results and stored. Once all symbols are processed, the stored results are appended and replace the data in memory.

      Code:
      clear all
      
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str4 symbol
      "CORN"
      "DBA" 
      "DBB" 
      "DBC" 
      "DBE" 
      end
      
      cap mkdir "getsymbols_data"
      
      program doit
          local s = symbol
          cap use  "getsymbols_data/`s'.dta", clear
          if _rc {
              getsymbols `s', fm(12) fd(1) fy(1992) lm(12) ld(22) ly(2018) frequency(d) price (close) clear yahoo
              gen symbol = "`s'"
              rename *_`s' *
              save "getsymbols_data/`s'.dta", replace
          }
      end
      runby doit, by(symbol) verbose
      tab symbol

      Comment


      • #4
        I’ve had a go at both of these and can’t get them to work. Clyde I amended the piece of code you recommended I can’t merge the files as the first file does not have date to merge it into as it is empty how do I start the code off please?

        Robert for your code I am unfamiliar with functions you have run here and have not managed to get them to work. I have appended the files but can’t get the reshape to work as per your other suggestion. How do I reshape a file with variables for each of the ETFs and date?

        Comment


        • #5
          Yes, I see the problem. The solution is to not save `etfdownload' when it is empty. Then inside the loop, you check to see if it exists yet or not; if so you merge it in, if not you just save what you have and move on.

          Code:
          preserve
          clear 
          tempfile etfdownload
          // save `etfdownload', emptyok    
          
          local etf "BNO CAFE CORN DBA DBB DBC DBE DBO DGL GLD GSG PALL PPLT SLV UGA URA USCI USO DXJ EPI EWA EWC EWG EWH EWI EWJ EWL EWM EWP EWS EWT EWU EWW EWY EWZ EZA FXI IDX PIN RSX FXB FXE FXY UUP ACWI DIA FEZ IEMG IWD IWF IWM    IWN    IWO    MDY    OEF    PFF    QQQ    SPY    VTI    GDX    GDXJ IBB ITB IYR KBE KRE OIH SMH VNQ XHB XLB XLE XLF XLI XLK XLP XLU XLV XLY XME XOP XRT XTL"
          foreach e of local etf{
              getsymbols `e', fm(12) fd(1) fy(1992) lm(12) ld(22) ly(2018) frequency(d) price (close) clear yahoo
              label variable p_close_`e' "`e'"
              rename p_close_`e' `e'
              drop r_`e' R_`e' volume_`e' 
              label variable period "date"
              rename period date
              format date %tdnn/dd/CCYY
              
              capture confirm file `etfdownload'
              if c(rc) == 0 {
                  merge 1:1 date `etfdownload', keepusing(date) 
              }
              save `"`etfdownload'"', replace
          }
          
          save "/Users/gilesgerman/Documents/stokpix/databasemodels/model2/STATA Data 5.dta", replace
          In the code above, I have commented out the offending -save- of the empty file, just to call attention to it. But you should actually just remove that line from the code. The changes inside the loop are italicized.

          With regard to the question you have directed at Robert Picard, it is very vague. "Not managed to get them to work" and "can't get reshape to work" don't provide any information about what is wrong. You either have to provide a much clearer explanation or, better, show an example of the data and the code and the results you are getting so anyone who wants to help can see just what is going wrong. (If when you show the results you get it isn't obvious why those results are not what you need, then provide an explanation of that, or, perhaps show what the results you want for that example would look like.)

          Comment


          • #6
            Clyde,

            I am having difficulties getting this one to work.....

            I have used the
            Code:
            set tracedepth 1
            set trace on
            trick as discussed.

            I'm getting the following error.

            merge 1:1 date /var/folders/9m/d6mv7qtj5p18vfrr2p0ssfr00000gn/T//S_00529.000002, keepusing(date)
            ---------------------------------------------------------------------------------------------------------------------------------- begin merge ---
            - version 11
            - gettoken mtype 0 : 0, parse(" ,")
            - if (!strpos("`mtype'", ":") & "`mtype'"!="") {
            = if (!strpos("1:1", ":") & "1:1"!="") {
            if (_caller()>=11) {
            di as smcl as txt "{p}"
            di as smcl "(note: you are using old"
            di as smcl "{bf:merge} syntax; see"
            di as smcl "{bf:{help merge:[D] merge}} for new syntax)"
            di as smcl "{p_end}"
            }
            merge_10 `mtype' `0'
            exit
            }
            - local origmtype `"`mtype'"'
            = local origmtype `"1:1"'
            - mata: map_mtype(`"mtype"')
            - gettoken token : 0, parse(" ,")
            - if ("`token'"=="_n") {
            = if ("date"=="_n") {
            if ("`mtype'"!="1:1") {
            error_seq_not11 "`mtype'" "`origmtype'"
            }
            gettoken token 0 : 0, parse(" ,")
            local mtype "_n"
            }
            - syntax [varlist(default=none)] using/ [, ASSERT(string) DEBUG GENerate(name) FORCE KEEP(string) KEEPUSing(string) noLabel NOGENerate noNOTEs REP
            > LACE noREPort SORTED UPDATE ]
            / invalid name
            ------------------------------------------------------------------------------------------------------------------------------------ end merge ---
            }
            save `"`etfdownload'"', replace
            }
            r(198);


            Can you help please something wrong with the merge 1:1 command I think?

            Giles

            Comment


            • #7
              The -merge- command requires the keyword -using- between the varlist and the filename.

              Code:
              merge 1:1 date using /var/folders/9m/d6mv7qtj5p18vfrr2p0ssfr00000gn/T//S_00529.000002, keepusing(date)
              I see that this is the same mistake pointed out in #2, but I also see that I failed to notice it myself in my response in #5 where I began by copying the code from #1.
              Last edited by Clyde Schechter; 05 Feb 2018, 15:25.

              Comment


              • #8
                Hello Clyde,

                For this piece of code I am ending up with a file with the date and the last ETF downloaded in this case it is XTL.

                I think maybe I should be using a file somewhere in the middle to merge to.

                Giles

                Comment


                • #9
                  I don't think I can help you out here. I don't know anything about -getsymbol-. Since it includes a -clear- option, I assume that it creates or retrieves some kind of data that it leaves in memory. Is it possible that the -drop r_`e' R_`e' volume_`e'- command is eliminating all of the data it gives you except the date?

                  Here's another possibility. If the -getsymbol- command always gives the same name to the variables (that is, the variable names don't change with different values of `e' from local etf, then the -merge- will not accomplish anything. When you -merge- two files, any variables that exist in both files are either left unchanged, or are updated from the using file to the data in memory (depending on options specified in the -merge- command), but no new variables (except _merge) are created. If you're expecting to have a series of *BNO variables followed by a series of *CAFE variables and then *CORN variables, etc, this is not the way to go about it. You need to explicit create those variables inside the loop by renaming the variables that -getsymbol- gives you to have the corresponding suffix. I can't suggest specific code since I have no idea what the data look like after -getsymbol-. But a series of -renames- or perhaps a group -rename- statement should solve that problem.

                  Comment


                  • #10
                    getsymbol (from SSC) replaces the data in memory with its results. For example, here's a sample run:

                    Code:
                    . getsymbols CORN, fm(12) fd(1) fy(1992) lm(12) ld(22) ly(2018) frequency(d) price (close) clear yahoo
                    You specified a future date, so the last date was changed to today's date
                    The data is being pulled from Yahoo.com ; the variables of this dataset are:
                         close volume
                    Period returns were calculated using the variable: close and only this variable was kept; if you want to keep all variables use the 
                    > option keepall
                    Symbol CORN was downloaded
                    1929 periods of 1 symbol(s) from Yahoo were downloaded.
                    Number of observations with valid prices for all symbols: 1929  out of  1929
                    
                    .
                    and here are the first 5 observations in memory:
                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input float period double p_close float(r R) long volume str4 symbol
                    18422              25.15           .           . 1700 "CORN"
                    18423          25.459999  .012250657  .012326004  200 "CORN"
                    18424          25.790001  .012878305  .012961587  500 "CORN"
                    18427          26.110001  .012331562   .01240791 2200 "CORN"
                    18428 25.969998999999998 -.005376434 -.005362007 7000 "CORN"
                    end
                    format %td period
                    Here are the first 5 observations if the DBA symbol is used:
                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input float period double p_close_DBA float(r_DBA R_DBA) long volume_DBA
                    17171              25.02            .           .  27800
                    17174              24.98 -.0016000003 -.001598721 143600
                    17175              24.66  -.012893007 -.012810248 136700
                    17176              24.73   .002834584  .002838605  76600
                    17177 25.099999999999998   .014850765  .014961585 159600
                    end
                    format %td period
                    So I'll urge Giles to go back to my #3 post, copy the example as is into a text file and run it as a whole. This will download the data for the 5 symbols I used in the example and append the results. I'll restate my suggestion that the results are probably more useful if left in a long layout. If a wide layout is desired, all you need to do is to perform a simple reshape:
                    Code:
                    reshape wide p_close r R volume, i(period) j(symbol) string

                    Comment


                    • #11
                      I'm confused by the outputs shown in #10. In the first output (second code block) it appears that -getsymbol- produces variables called period, p_close, r, R, volume, and symbol. In the second output (third code block), the variables appear to be customized to the symbol DBA, and there is no variable for symbol.

                      Has Robert Picard performed a -reshape long- before the first output, or a -reshape wide- before the second? Or is there an option in -getsymbol- that governs this and was specified differently?

                      While I agree with Robert that the long layout that his code in #3, which does serial -append-s, produces is very probably more useful than the wide layout produced by serial -merge-s, if the -getsymbol- output looks like the second output shown in #10, then the code in #3 would need to be modified so that every symbol produces the unadorned period, p_close, r, R, volume and symbol variables. (This would be most efficiently done with a -gen symbol = ...- command and some -rename-s.) However , if the -getsymbol- output does look like the second output in #10, then a loop of -merge-s might be simpler, followed by a -reshape long- at the end (or leave it alone if a final wide layout is really needed).

                      Comment


                      • #12
                        Clyde, you are correct that in #10, I mixed up the results from the example in #3 with running the getsymbols separately. Sorry about that. The first 5 observations after the following call
                        Code:
                        getsymbols CORN, fm(12) fd(1) fy(1992) lm(12) ld(22) ly(2018) frequency(d) price (close) clear yahoo
                        is
                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input float period double p_close_CORN float(r_CORN R_CORN) long volume_CORN
                        18422              25.15           .           . 1700
                        18423          25.459999  .012250657  .012326004  200
                        18424          25.790001  .012878305  .012961587  500
                        18427          26.110001  .012331562   .01240791 2200
                        18428 25.969998999999998 -.005376434 -.005362007 7000
                        end
                        format %td period
                        So some variable renaming is needed. That's what the code in #3 does. To make things absolutely clear, the following code downloads all 5 symbols in the examples, saves the data in separate datasets, appends data for all 5 symbols, and finally reshapes the data to a wide layout.

                        Code:
                        clear all
                        
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input str4 symbol
                        "CORN"
                        "DBA" 
                        "DBB" 
                        "DBC" 
                        "DBE" 
                        end
                        
                        cap mkdir "getsymbols_data"
                        
                        program doit
                            local s = symbol
                            cap use  "getsymbols_data/`s'.dta", clear
                            if _rc {
                                getsymbols `s', fm(12) fd(1) fy(1992) lm(12) ld(22) ly(2018) frequency(d) price (close) clear yahoo
                                gen symbol = "`s'"
                                rename *_`s' *
                                save "getsymbols_data/`s'.dta", replace
                            }
                        end
                        runby doit, by(symbol) verbose
                        tab symbol
                        
                        reshape wide p_close r R volume, i(period) j(symbol) string

                        Comment


                        • #13
                          Robert, thanks for clearing that up.

                          Comment


                          • #14
                            A big thanks to you Robert it took me a bit of time to get my head around this but I have got it working.

                            Giles

                            Comment


                            • #15
                              That's great, thanks for the closure.

                              Comment

                              Working...
                              X