Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Multiply two data sets

    Hi everyone,

    I have two data set or matrices: matrix a (2x164) and matrix b (164x82). In these two data sets, the first row is the variable name. Would you mind helping me with the code for multiplying matrix a with matrix b? If matrix c (2x82 is the result from this multiplication, how can I name the column of vector c as the column name of vector b? I only see Stata command for multiplying smaller matrix. I read in a post that we can use matrix score to multiply matrix, but I don't understand (Re: st: multiply two data sets (stata.com). I really look forward to your help. Thank you.
    Attached Files

  • #2
    Before you post next time, please read the forum FAQ. Among the things you will learn there is that attachments are discouraged here. Many are reluctant to download files from people they do not know.

    Moreover, it is premature to post if you have not yet imported your data into Stata.

    The following code first creates toy data sets that have the 2x164 and 164x82 structure you describe and then shows how to make a matrix from each, and multiply them. The way Stata matrices work, the column names will automatically be generated the way you want them--you don't have to do anything specific to make that happen.

    Code:
    //  CREATE TWO TOY DATASETS, 2 x 164 and 164 x 82
    clear*
    set seed 1234
    set obs 2
    forvalues i = 1/164 {
        gen var`i' = rnormal()
    }
    tempfile matrix_a_dataset
    save `matrix_a_dataset'
    
    clear
    set obs 164
    forvalues i = 1/82 {
        gen var`i' = rnormal()
    }
    tempfile matrix_b_dataset
    save `matrix_b_dataset'
    
    //  OPEN BOTH DATA SETS AND MAKE EACH INTO A MATRIX
    use `matrix_a_dataset'
    mkmat _all, matrix(A)
    use `matrix_b_dataset', clear
    mkmat _all, matrix(B)
    
    //  MULTIPLY THEM
    matrix C = A*B
    
    //  SEE THE RESULT
    matrix list C
    In the future, when showing examples from your Stata data sets, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Hi Clyde Schechter,

      Thank you so much for spending your time with my question. I will be more careful in posting my question. I am new to Stata.

      Actually, my original data set is in excel file. So, I imported the file A into Stata as normal (choose the first row as the variable name) and create a matrix from that file using:
      mkmat _all, matrix(A) and then I I imported the file B into Stata as normal (choose the first row as the variable name) and create a matrix from that file using: mkmat _all, matrix(B) I then multiply two matrices: matrix C = A*B However, when I list matrix C, I only see one dot (. ) as value. My two data sets have very small value around '0.0000...." , Is that the reason for the dot . in value ? I change data values to bigger ones, and it is normal. 2. How can I export the value of matrix C in excel to see the value? Thank you so much.

      Comment


      • #4
        I can't answer that question without example data from A and B. Please use the -dataex- command for that, as described in #2.

        How can I export the value of matrix C in excel to see the value?
        I'm not sure what you're asking here. You won't see anything in an export to Excel that isn't already there in Stata. If you are just asking how to export the matrix to Excel, you can go one of two ways. You can save matrix C as a Stata data set using the -svmat- command and then use the -export excel- command. Or, you can just directly send the matrix to Stata using the -putexcel- command. I recommend you read the help files for those commands for the details most relevant to your situation.
        Last edited by Clyde Schechter; 16 Nov 2022, 11:56.

        Comment


        • #5
          Thank you Clyde Schechter. I really appreciate your help. I have found out the problems with my dataset. The command you suggested work well. By the way, your command
          mkmat _all, matrix(A) was really helpful for me. Before knowing this command, I need to type all variable names after "mkmat" to transfer a dataset into a matrix. Statalist is such a helpful website for me as a beginner of Stata.

          Comment


          • #6
            Hi Clyde Schechter . May I ask you a problem about the dot (.) value in matrix multiplication? I have this problem again, Here is one example of my data set.

            Data set 1: Matrix A: 63x6 (First row is variable name: (r_outputSA r_outputSB r_outputSC r_outputSD r_outputSE r_outputSF)

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input double(r_outputSA r_outputSB r_outputSC r_outputSD r_outputSE r_outputSF)
               .028010642156004906     .04961969703435898   .006265750154852867     .05790507793426514     .15201322734355927   .0009028613567352295
             .00006475767440861091 3.4551355838630116e-06                     .   .0023777915630489588  .00028061700868420303                      .
              .0002179262082790956                      .                     .   .0006858560955151916 .000044671727664535865                      .
              .0000156565620272886 .000032480897061759606                     .  .00018620112678036094  .00023494294146075845                      .
              .0005638853181153536   .0006631739088334143                     .    .006475695408880711     .01765397936105728                      .
             .00032361861667595804 1.9687381325184106e-07                     .    .001983100548386574  .00016585960111115128                      .
            .000010891028978221584                      .                     .   .0000739168535801582  .00005325684469426051                      .
            1.3328967725101393e-06                      .                     .  .00042762455996125937  5.277554009808227e-06                      .
               .002788304351270199  .00008890821482054889                     .  .00009249442518921569  .00013203399430494756                      .
              .0002936965611297637   .0005966047756373882                     .    .003852513153105974   .0024658881593495607                      .
              .0008872057660482824   .0036832778714597225  .0005053951754234731    .013263932429254055   .0003742418484762311 .000010627330993884243
              .0016202520346269012    .007983513176441193 9.045985279954039e-06    .002823522547259927    .004797215573489666  .00036815062048844993
              .0000663469108985737  3.412479543385416e-07 .00023996132949832827    .003263812279328704 .000047649096813984215  .00008316463208757341
               .018261132761836052     .03467252105474472   .002061000093817711    .015878455713391304   .0017163255251944065 3.7919771784800105e-06
               .000979466363787651     .02164074406027794  .0032485127449035645    .013500336557626724    .010991986840963364  .00022331386571750045
               .003613679204136133    .012249059043824673  .0020989789627492428    .019938014447689056    .008175709284842014  .00001736473677738104
               .006556142121553421   .0063374522142112255   .003201751969754696    .011582903563976288    .005307966377586126                      .
                .01236317865550518    .004625868517905474 .00004547600110527128    .009252256713807583     .07880105078220367                      .
               .013837406411767006    .030872199684381485   .016985127702355385    .004301629029214382    .013343626633286476     .03817717730998993
               .004153494257479906     .02780100889503956   .020726686343550682     .01002054288983345     .03137969598174095    .011641348712146282
               .027185386046767235    .026329636573791504  .0031723417341709137    .009665540419518948    .026034554466605186                      .
              .0021124668419361115     .02433040551841259   .002233925275504589    .002650490729138255   .0050885663367807865 .000017490669051767327
               .016227398067712784     .01663345843553543  .0002899888495448977    .013262263499200344    .003947804216295481                      .
              .0004544835537672043     .03461487591266632   .007620340678840876    .009460187517106533    .004250082187354565                      .
              .0007137019420042634    .005668018478900194   .004272565711289644    .014384723268449306   .0007043324294500053                      .
                .00748798344284296    .014425098896026611    .02469121664762497     .01438978686928749    .008932601660490036     .08216676861047745
               .013667752034962177   .0054094623774290085                     .    .028458306565880775    .006753254681825638                      .
              .0008649314404465258   .0009499737643636763                     .   .0075714620761573315   .0018148832023143768                      .
              .0007043302175588906  .00036685631494037807                     .    .013734820298850536   .0002579229767434299 5.0792905312846415e-06
              .0014226543717086315   .0010111505398526788                     .    .019549211487174034   .0003400412679184228                      .
               .004168923478573561    .025828540325164795  .0003339883987791836    .020782534033060074   .0008365141693502665                      .
               .005885549820959568    .007096793968230486  .0013488472905009985    .011814701370894909    .006832518614828587                      .
              .0030321197118610144    .012865406461060047    .00853660423308611     .03385758027434349    .005304514430463314                      .
               .008522095158696175    .003221618477255106   .003695306135341525     .04312318190932274   .0011185448383912444      .7830935120582581
              .0067201703786849976    .004648316185921431  .0006398517871275544    .043602701276540756    .005103618837893009                      .
              .0051711685955524445   .0009911464294418693                     .    .005146488547325134   .0001952583115780726                      .
               .025319445878267288   .0029402924701571465  .0012249802239239216    .011011653579771519    .007717293221503496  .00008068795432336628
              .0030174783896654844  .00041722238529473543                     .  .00022976644686423242  .00011689642997225747                      .
               .004260730464011431   .0020693785045295954  .0021095681004226208   .0006267566350288689    .004039441701024771                      .
              .0020780847407877445                      .                     .   .0007490666466765106  .00003309580642962828                      .
              .0017026773421093822   .0005363154341466725                     .   .0048517752438783646   .0009821925777941942 .000027782180040958337
               .001858784700743854  .00003264495899202302 .00010657485108822584    .003738557221367955   .0002526174357626587                      .
              .0004138555086683482                      .                     .    .004482151474803686 .000027148902518092655                      .
              .0011117756366729736    .002693448681384325                     .   .0009557530866004527  .00005396201231633313                      .
               .024134598672389984    .010375241748988628   .013538336381316185     .06860639154911041    .005809848662465811                      .
               .016299864277243614     .02929079905152321    .06266392022371292    .019734373316168785    .005338361021131277                      .
                .09303045272827148     .08437301963567734    .13159611821174622     .21291466057300568      .2592647969722748                      .
                .06959248334169388      .2171393483877182    .30940020084381104     .09714970737695694     .07293380796909332    .002513479208573699
                .03552258014678955    .014825730584561825    .02343806065618992    .012650213204324245    .021147556602954865      .0207003615796566
                .20571650564670563      .1887497901916504    .17607654631137848    .033439651131629944     .13530263304710388     .04721968621015549
                .05354158580303192    .030441761016845703    .06215858832001686    .029795316979289055     .03306756541132927   .0009570740512572229
               .028646841645240784    .013990260660648346   .032719217240810394   .0054548210464417934   .0007501738145947456                      .
              .0076539646834135056    .007869061082601547   .006071640178561211   .0015147371450439096   .0062455907464027405                      .
              .0023000845685601234   .0006383337895385921   .012543915770947933   .0008029405144043267   .0001474357268307358                      .
               .015031245537102222    .002487184014171362   .019613908603787422    .012023371644318104   .0013713749358430505                      .
                .05403594300150871   .0009577648597769439   .002725173719227314    .001142515684477985  .00044048269046470523  .00043702186667360365
               .014495433308184147   .0025080624036490917   .004891590680927038  .00004328033173806034   .0005282759084366262                      .
               .013650747016072273  8.055420153141313e-07   .011532423086464405   .0053956215269863606     .00095799082191661                      .
                .05401573330163956    .001178013626486063   .006145713850855827   .0028147411067038774    .003625937271863222    .011353242211043835
               .014303953386843205  .00040388008346781135   .008762874640524387  .00043748438474722207     .03154277801513672                      .
                 .0178863275796175   .0005467586452141404 .00045797013444826007  .00004458831244846806  .00016687257448211312                      .
               .015024268068373203  .00026940705720335245                     . .000024139153538271785   .0017468760488554835                      .
               .026384413242340088  4.209818598610582e-06                     . .000026297851945855655   .0008887496660463512                      .
            end
            Data set 2: Matrix B (6x6); The first row is variable name: SA SB SC SD SE SF

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input double(SA SB SC SD SE SF)
                .7960249343395248 .000025034530043559202 6.151659800563317e-06 .00013005344578610436 .009095352476671205  .0065184831467752615
             .0016265820206022863      .7996463517307073    .21986245735421178   .010688479576824844  .03462008856798449  .0016549733677926899
            .00006240822375249429    .012278529326338914      .506561511729278 .00009037061030655939 .004507717006949219                     0
             .0019553565952591515    .002851523034351657  .0003582074938300905     .7009852537878608  .06405552099468913 6.838907638512297e-06
              .045839396298961575    .011925505864878685    .03605330649918175   .024816788766830137   .6305878088848237 .00011056410185291905
               .02433679284008173    .008173752950849362   .006178806645480923   .048871363507919284 .018659096153617005     .9390123869001116
            end
            When I multiply matrix A and B, I get matrix C. When I list matrix C, there are only dot (.) values in some rows (for example row 2 to row 10). I do not understand why. I know that the dot (.) value can not mean zero. I want to see the true results (even if it is very small, for example 0.0000..). I really look forward to your help. Thank you for your time!

            Regards,

            Nhung

            Comment


            • #7
              The . is Stata's system-missing value. It means that the value is unknown, or somehow not ascertained. It is not zero.

              Now, let's look at the second row of matrix C, which has all missing values. How are the values in the second row of matrix C calculated. The entry in the first column of the second row of C is the dot product of the second row of A with the first column of B. But look at the second row of A. It contains a missing value in the third column. So the dot product of that row and that column cannot be calculated: its third term is missing. So the entry in the first column of the second row is missing. The same argument shows that all of the entries in the second row of C are missing, because the missing value in the second row of A precludes calculating a dot product of that row with any column of C.

              If you look at the entire matrix C you will see that a row of C contains all missing values if and only if the corresponding row of A contains at least one missing value. And the reasoning is exactly like that for the example just worked out.

              So all of those missing-values in C are the result of the missing values in A. You should look into why those cells in the A matrix have missing values.

              Comment


              • #8
                Hi Clyde Schechter !

                I understand where the error comes from now. Actually, the missing value in matrix A is zero, so when I fill in the missing value with zero, there is no (.) in matrix C. I really appreciate your help. Good luck and take care!

                Comment


                • #9
                  Hi Clyde Schechter,

                  Back to my first post in this forum. May I ask about #2 in this post? In the data set you created, for example, if dataset a has a variable " firm id" as the first column, having values "01" and "02". If we exclude variable "firm id", we have matrix a (2 x164). Meanwhile in dataset b, there is a variable "sector id" at the first column with values of sectors values "S1"... "S164" (not numeric values). If we exclude variable "sector id", we have matrix b (164x164). Here, I am wondering if there is any way we can multiply data set a with data set b without having to turn them into matrice.

                  If I use the matrix multiplication, I need to drop two variables "firm id" and "sector id" before turning data set a and b into matrice, which means I do not have the variable "firm id" in the result. I want to add this variable as the first column of resulting matrix (2x83) to let readers know that 2 rows mean 2 firms with id "01 and "02". In addition, my research includes a lot of data set multiplication, and it seems inconvenient to turn datasets into matric many times and have to delete other variables (For example, firm id or sector id) before turning that data set into a matrix.

                  I really look forward to from you. Thanks a lot!

                  Best regards,

                  Nhung

                  Comment


                  • #10
                    Well, you can retain the firm_id variable as the names of the product matrix rows.

                    But you cannot do the same with sector_id, because the rownames for the second matrix simply have no place in the resulting product. (That is, you can treat data matrix B the same way, but the names are lost in any case because the product matrix has nothing that directly corresponds to the rows of the second factor.

                    Code:
                    //  CREATE TWO TOY DATASETS, 2 x 164 and 164 x 82
                    clear*
                    set seed 1234
                    set obs 2
                    forvalues i = 1/164 {
                        gen var`i' = rnormal()
                    }
                    gen firm_id = string(_n, "%02.0f")
                    order firm_id, first
                    tempfile matrix_a_dataset
                    save `matrix_a_dataset'
                    
                    clear
                    set obs 164
                    forvalues i = 1/82 {
                        gen var`i' = rnormal()
                    }
                    gen sector_id = "S" + string(_n, "%02.0f")
                    order sector_id, first
                    tempfile matrix_b_dataset
                    save `matrix_b_dataset'
                    
                    //  OPEN BOTH DATA SETS AND MAKE EACH INTO A MATRIX
                    use `matrix_a_dataset'
                    ds firm_id, not
                    local vbles `r(varlist)'
                    mkmat `vbles', matrix(A) rownames(firm_id)
                    use `matrix_b_dataset', clear
                    ds sector_id, not
                    local vbles `r(varlist)'
                    mkmat `vbles', matrix(B) rownames(sector_id)
                    
                    //  MULTIPLY THEM
                    matrix C = A*B
                    
                    //  SEE THE RESULT
                    matrix list C
                    You can see that the names "01" and "02" do show up as the rownames in the product matrix. But the S# names do not, even though they have been incorporated into matrix B (run -matrix list B- to see that this is true).

                    Comment


                    • #11
                      Hi Clyde

                      Thank you for your reply. It seems that in this situation (the data set you created), we need to change data sets into matrices to multiply.

                      Comment


                      • #12
                        Yes, I am not aware of any simple way to do matrix multiplication of Stata data sets without converting them to matrices. You could write your own Stata code to do that, of course, but it would be much slower than passing through Stata matrices and availing yourself of the compiled algorithms for matrix multiplication.

                        Comment


                        • #13
                          Thanks a lot, Clyde!

                          Comment

                          Working...
                          X