  • Processing matrices in Mata after import from Stata within a Mata loop

    Dear all,

    I came across a relatively simple problem that, so far, I have been unable to solve on my own. The basic issue is the following: I have generated some datasets in MATLAB (saved as .txt files) that I want to import into Mata after loading them as datasets into Stata.

    For the purpose of exposition I have created the following minimal working example (MWE): Say I generate two datasets, which are exported and saved as .txt files. These datasets are then loaded into Mata from Stata so that I can process the matrices in Mata (in the MWE this means extracting vectors). Unfortunately, my code does not work:

    Code:
    clear
    set more off
    
    /* Generating these two data sets is needed for exposition and reproduction only */
    
    forvalue i = 1/2 {
     set obs 20
     set seed `i'
     gen v1 = runiform()
     gen v2 = runiform()
     export delimited using result_`i'.txt, novar replace
     capture clear
    }
    
    
    /* Mata */
    mata
    
    mata clear
    
    for (i=1; i<=2; i++) {
     stata("capture clear")
     stata(sprintf("import delimited result_%f.txt", i))
     stata("putmata vars = (v1 v2), replace")
     V1 = vars[|1,1 \ 20, 1|] 
    /* this is where Stata reports the error message:  <istmt>:  3499  vars not found r(3499); */
     }
    
    end
    I don't quite understand why Mata can't find the vars matrix. Replacing the Mata loop with the following gives the desired result, but unfortunately it does not solve my actual problem, which is more involved than the MWE:

    Code:
    for (i=1; i<=2; i++) {
     stata("capture clear")
     stata(sprintf("import delimited result_%f.txt", i))
     stata("putmata vars = (v1 v2), replace")
     stata("mata: V1 = vars[|1,1 \ 20, 1|]")
     stata("mata: V2 = vars[|1,2 \ 20, 2|]")
    }
    Since I reckon that the solution to this problem must be rather obvious, I very much appreciate any kind of help or comments.

    Thanks in advance,

    Toby

  • #2
    I don't know why, but apparently -stata("putmata vars = (v1 v2), replace")- fails to put the result in vars. However, since you are using Mata code already, it would make more sense to use a Mata function to retrieve the data from Stata instead of calling Stata from Mata to let Stata do the work. What you can use is -vars = st_data(.,"v1 v2")-.
    Although this will solve the error that you get, I think you will still end up with something that is not what you want, because at the end of the loop your variables only contain the data from the last data set that you loaded. The easiest way to solve this is to use Stata instead. You can e.g. use:
    Code:
    forvalues i=1/2{
     capture clear
     import delimited result_`i'.txt
     putmata vars`i' = (v1 v2), replace
    }
    Now vars1 contains the data of data set 1 and vars2 contains the data of data set 2.
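A minimal sketch of the -st_data()- route mentioned above, kept inside the Mata loop. It assumes that result_1.txt and result_2.txt from the first post are in the working directory and that -import delimited- names the two columns v1 and v2. One plausible reason for the original error (an assumption, not confirmed in this thread) is that Mata compiles the whole loop before running it, so a matrix that -putmata- only creates at run time is not yet known when the loop is compiled; assigning vars inside the loop sidesteps that.

```stata
mata:
for (i = 1; i <= 2; i++) {
    // load file i into Stata, replacing the data in memory
    stata(sprintf("import delimited result_%f.txt, clear", i))
    // copy all observations of both variables into a Mata matrix
    vars = st_data(., ("v1", "v2"))
    V1 = vars[., 1]     // first column
    V2 = vars[., 2]     // second column
}
end
```

As noted above, vars, V1 and V2 are overwritten on every pass, so after the loop they hold only the last file.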
    Best,

    Aljar


    • #3
      Aljar,

      thank you very much. That works perfectly fine. I'm still puzzled why putmata doesn't work in this case but your solution is the better one anyway.

      Best,

      Toby


      • #4
        Hi
        I do not understand why you try to do this using Stata code.
        If you really want to import the data in the files into Mata matrices, why not do this in Mata and nothing else?
        Of course, for Stata concepts like graphs and dialogs you have to use Stata, but not necessarily here.

        Of course, if you could save the data as Excel files, then you could use -xl- in Mata to get the data.
        I can't get an example to work, so I will have to get back to you on that one.

        Better still though is to use -mm_insheet- from moremata (type -ssc install moremata- to install).

        Code:
        : mm_insheet("result_1.txt", ";")
                          1             2
             +-----------------------------+
           1 |   0,66818523    0,94153506  |
           2 |   0,92796105    0,48994529  |
           3 |   0,48448139    0,50393951  |
           4 |   0,46973896    0,81581414  |
           5 |    0,9726482    0,16113926  |
           6 |    0,5563972    0,93271309  |
           7 |   0,81569326   0,035417296  |
           8 |   0,83644992     0,3993324  |
           9 |   0,58071828    0,77910513  |
          10 |   0,37679639    0,39831084  |
          11 |   0,11862344    0,46694052  |
          12 |   0,50558829    0,85333455  |
          13 |    0,5678066    0,89523721  |
          14 |  0,029267196    0,93861431  |
          15 |   0,83597755    0,47460064  |
          16 |   0,49913794    0,32111409  |
          17 |   0,71880579    0,32651082  |
          18 |   0,58769572     0,1882409  |
          19 |   0,02479526    0,62382007  |
          20 |  0,000307792    0,92227256  |
             +-----------------------------+
        Last edited by Niels Henrik Bruun; 17 Apr 2015, 03:31.
        Kind regards

        nhb


        • #5
          You can do everything in Mata, but then you have to abstract away the names of the matrices as well, e.g. by using a vector of pointers to the matrices. From a programming point of view this would be the way to go. However, if you want convenient names for the matrices for easy access later on, processing everything in Stata is the way to go, since Stata's macro capabilities make generating such names easy.


          • #6
            Hi Aljar
            You're absolutely right, but in this case the quest is to get data from files into Mata matrices.
            Hence you will have to handle names at the point where you move them from Stata to Mata anyway.
            Besides, if you look at the example data files, there are no variable names to worry about.

            Finally I think your comment leads to the conclusion that we need something like the concept of a dataframe from R or Python/Pandas in Mata.
            Kind regards

            nhb


            • #7
              Maybe we do agree but I do not fully understand your comments.
              If you want to have N matrices in Mata that hold some results and these matrices should have the names: result1, result2, ..., resultN, then the easiest way to create these is to use Stata, since with Stata you can create these names using macros. Maybe there is a way of doing this in Mata as well, but I am not aware of it. If you have an idea how to do this, please let me know because that would be interesting.
              However, if I have to write a program I do not want to worry about these variable names or let inputs determine my variable names. So I would do something like:
              Code:
              mata
              N = 10
              result = J(N, 1, NULL)
              for(idx = 1; idx <= N; ++idx){
                  result[idx] = &runiform(4,4)
              }
              // Access them with:
              *result[1]
              *result[2]
              *result[N]
              end
              If you mean that I still have to have a name for the vector of pointers, then yes, you are correct, you need to have some name. If you mean that I need to create unique names for all matrices, the above code shows that this is not the case.


              • #8
                Hi again
                I promised to return with the Excel case. I know I'm departing from the sample files and as such from the given case.
                But these days you can often save data in an Excel file instead of a csv-like file, so here is the solution in that case.

                The data:
                Code:
                clear
                set obs 20
                set seed 1
                gen v1 = runiform()
                gen v2 = runiform()
                export excel using "result_1.xlsx", replace
                If you know the number of rows to import (here it is 20):
                Code:
                : xl.get_number((1, 20),1::2)
                                  1             2
                     +-----------------------------+
                   1 |  .6681852341   .9415350556  |
                   2 |  .9279610515   .4899452925  |
                   3 |  .4844813943   .5039395094  |
                   4 |  .4697389603   .8158141375  |
                   5 |  .9726482034   .1611392647  |
                   6 |  .5563971996   .9327130914  |
                   7 |  .8156932592    .035417296  |
                   8 |  .8364499211   .3993324041  |
                   9 |  .5807182789   .7791051269  |
                  10 |  .3767963946   .3983108401  |
                  11 |  .1186234355   .4669405222  |
                  12 |  .5055882931   .8533345461  |
                  13 |  .5678066015   .8952372074  |
                  14 |  .0292671956   .9386143088  |
                  15 |  .8359775543   .4746006429  |
                  16 |   .499137938   .3211140931  |
                  17 |  .7188057899   .3265108168  |
                  18 |  .5876957178   .1882409006  |
                  19 |  .0247952603   .6238200665  |
                  20 |  .0003077923    .922272563  |
                     +-----------------------------+
                What I missed at first was that you only specify the top and bottom rows (and likewise the first and last columns) you want to import.

                If you do not know the number of rows to import but only a maximum number (here 200), you can do the following:
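The -get_number()- call above needs an instance of Mata's xl() class with the workbook already loaded, which the snippet does not show. A minimal sketch of that setup, assuming result_1.xlsx from the -export excel- step is in the working directory; the instance name b is a hypothetical choice (the post calls its instance xl), and "Sheet1" is assumed to be the sheet name -export excel- wrote by default:

```stata
mata:
b = xl()                             // new instance of the xl() class
b.load_book("result_1.xlsx")         // open the existing workbook
b.set_sheet("Sheet1")                // assumed default sheet name
X = b.get_number((1, 20), (1, 2))    // rows 1 to 20, columns 1 to 2
b.close_book()
rows(X), cols(X)                     // dimensions of the imported matrix
end
```

Both arguments to -get_number()- are (first, last) pairs, which matches the remark about only giving the top and bottom row.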
                Code:
                : dta_m = xl.get_number((1, 200),1::2)
                
                : slct = dta_m[.,1] :< .
                
                : select(dta_m, slct)
                                  1             2
                     +-----------------------------+
                   1 |  .6681852341   .9415350556  |
                   2 |  .9279610515   .4899452925  |
                   3 |  .4844813943   .5039395094  |
                   4 |  .4697389603   .8158141375  |
                   5 |  .9726482034   .1611392647  |
                   6 |  .5563971996   .9327130914  |
                   7 |  .8156932592    .035417296  |
                   8 |  .8364499211   .3993324041  |
                   9 |  .5807182789   .7791051269  |
                  10 |  .3767963946   .3983108401  |
                  11 |  .1186234355   .4669405222  |
                  12 |  .5055882931   .8533345461  |
                  13 |  .5678066015   .8952372074  |
                  14 |  .0292671956   .9386143088  |
                  15 |  .8359775543   .4746006429  |
                  16 |   .499137938   .3211140931  |
                  17 |  .7188057899   .3265108168  |
                  18 |  .5876957178   .1882409006  |
                  19 |  .0247952603   .6238200665  |
                  20 |  .0003077923    .922272563  |
                     +-----------------------------+
                As Aljar rightly points out, this way you do not have any information about row and column labels.
                But then again, you didn't save any such information in your data in the first place.
                Kind regards

                nhb


                • #9
                  Hi Aljar
                  As a comment to #7.
                  You said
                  If you want to have N matrices in Mata that hold some results and these matrices should have the names: result1, result2, ..., resultN, then the easiest way to create these is to use Stata, since with Stata you can create these names using macros.
                  And you're right about that if you have some sort of naming algorithm to use, e.g. "result`idx'", but in that case knowing the number is the same as knowing the name of the matrix.

                  So to me it is more logical to keep them in an array of pointers to the matrices, as you do in your example, using numbers as keys.

                  However, if you want to keep the matrices apart with names that can't easily be generated from the number, then I would use -asarray- from Mata.
                  This way you also avoid using pointers yourself, even though they are probably at play inside -asarray-.
                  An example of this:
                  Code:
                  : mata clear
                  
                  : arr = asarray_create("string", 1)
                  
                  : for (i=1; i<=10;i++) asarray(arr, sprintf("key%f", i), runiform(4,4))
                  
                  : asarray(arr, "key3")
                                   1             2             3             4
                      +---------------------------------------------------------+
                    1 |  .5293895865    .009314626   .8879408655   .9017141387  |
                    2 |  .1900565184   .1102146786   .7944666527   .2409055904  |
                    3 |  .6336910098   .4356327089   .1628101349   .5603298885  |
                    4 |  .1028475226   .5365545617   .9025998455   .7194821713  |
                      +---------------------------------------------------------+
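Applied to the files from the first post, the same idea might look like the sketch below. It assumes result_1.txt and result_2.txt are in the working directory and that -import delimited- names the columns v1 and v2; the key names "result1" and "result2" are hypothetical and could be any strings.

```stata
mata:
arr = asarray_create("string", 1)    // associative array with string keys
for (i = 1; i <= 2; i++) {
    // load file i into Stata, replacing the data in memory
    stata(sprintf("import delimited result_%f.txt, clear", i))
    // store the data matrix under a key built from the file number
    asarray(arr, sprintf("result%f", i), st_data(., ("v1", "v2")))
}
X2 = asarray(arr, "result2")         // retrieve the second data set by key
V1 = X2[., 1]                        // its first column
end
```

Unlike a single vars matrix overwritten in the loop, every data set stays available under its own key after the loop ends.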
                  Regarding my comments in #6 about dataframes, I admit that the connection is very far-fetched, sorry.
                  It was in part because at that point I was more concerned about keeping variable names together with the data.

                  I still think that we need some sort of dataframe type, but this case is hardly an argument for it.
                  Kind regards

                  nhb
