Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Basic matrix question from unexperienced user

    Hello,

    I have only recently started using STATA, so forgive me for what will probably be something very straightforward.

    Let's say you have 2 matrices:

    A =
    a
    b
    c
    d

    B = (1,2,3)

    Now, I would like to merge these two into following matrix:

    C =
    a 1
    a 2
    a 3
    b 1
    b 2
    b 3
    c 1
    c 2
    c 3
    d 1
    d 2
    d 3

    So a new matrix, with
    - the number of rows = number of rows A * number of columns in B
    - number of columns = 2

    Which commands do I use to go from A to B?

    Thank you in advance,

    Willem

  • #2
    Are A and B the same type of matrix? In other words, are eltype(A) and eltype(B) equal? I ask because your A seems to be a string matrix. If the matrices are not the same type, then you generally cannot combine them in this way.

    I will assume they are the same type. The following code, used interactively, should give you what you want.
    Code:
    A = (1, 2, 3, 4)
    B = (1, 2, 3)
    
    lenA = length(A)
    lenB = length(B)
    
    C = J(lenA * lenB, 2, .)
    
    for (i = 1; i <= lenA; i++) {
        for (j = 1; j <= lenB; j++) {
            C[(i-1)*lenB + j, .] = A[i], B[j]
        }
    }
    You'll probably have to make changes to fit your needs. For example, if you have string matrices, use C = J(lenA * lenB, 2, ""). Also, this code assumes A and B will both be vectors (matrices with a single row or a single column).

    Finally, a minor point: it's actually "Stata", not "STATA".

    Comment


    • #3
      Originally posted by Willem Vanlaer View Post
      So a new matrix, with
      - the number of rows = number of rows A * number of columns in B
      - number of columns = 2

      Which commands do I use to go from A to B?
      As James indicated, a matrix in Mata must contain either all numeric or all string values, so we'll assume that both A and B are of the same type. In addition, we'll assume (as in your example) that A is n x 1 and B is 1 x m. Given this, you can obtain C with
      vec(J(cols(B),1,A')), J(rows(A),1,B')
      I haven't thought about whether this is the most efficient way to do this, but that probably won't matter unless your actual matrices are really large (in which case, optimizing the code would probably also involve considering what you are doing both upstream and downstream).
      Last edited by Phil Schumm; 12 May 2014, 12:44.

      Comment


      • #4
        If A and B are real, you can also use the Kronecker product (#). Mata: C = A#B' See help mata op_kronecker Christophe

        Comment


        • #5
          Dear all,

          Thank you for your reply's.
          I have gone into more detail on each reply.
          I will also say which is my concrete 'problem', but I will do that in the post below.

          Originally posted by James Fiedler View Post
          Are A and B the same type of matrix? In other words, are eltype(A) and eltype(B) equal? I ask because your A seems to be a string matrix. If the matrices are not the same type, then you generally cannot combine them in this way.

          I will assume they are the same type. The following code, used interactively, should give you what you want.
          Code:
          A = (1, 2, 3, 4)
          B = (1, 2, 3)
          
          lenA = length(A)
          lenB = length(B)
          
          C = J(lenA * lenB, 2, .)
          
          for (i = 1; i <= lenA; i++) {
          for (j = 1; j <= lenB; j++) {
          C[(i-1)*lenB + j, .] = A[i], B[j]
          }
          }
          You'll probably have to make changes to fit your needs. For example, if you have string matrices, use C = J(lenA * lenB, 2, ""). Also, this code assumes A and B will both be vectors (matrices with a single row or a single column).

          Finally, a minor point: it's actually "Stata", not "STATA".
          1. Both are indeed the same type of matrix. I should have stated that more clearly.
          2. When I copy your code, it does exactly what I want. However, it does seem a bit complicated for what should be a relatively basic issue.
          3. Thanks. I will refer to it as "Stata" from now on.

          Originally posted by Phil Schumm
          As James indicated, a matrix in Mata must contain either all numeric or all string values, so we'll assume that both A and B are of the same type. In addition, we'll assume (as in your example) that A is n x 1 and B is 1 x m. Given this, you can obtain C with
          vec(J(cols(B),1,A')), J(rows(A),1,B')
          I haven't thought about whether this is the most efficient way to do this, but that probably won't matter unless your actual matrices are really large (in which case, optimizing the code would probably also involve considering what you are doing both upstream and downstream).
          This is an easy way, which helped me get what I want.

          Originally posted by Christophe Kolodziejczyk
          If A and B are real, you can also use the Kronecker product (#). Mata: C = A#B' See help mata op_kronecker Christophe
          This does not seem to work..

          Comment


          • #6
            The concrete issue I am facing is the following:

            I have data on different variables in Excel.
            1 tab holds data one 1 variable.
            The way it is formatted is rows are countries and colomns are years.
            E.g.
            Countries/Years 1961 1962
            Belgium 0,44 0,46
            France 0,60 0,62

            What I want in Stata is to have a dataset for which the:
            - 1 row is 1 country for 1 year (e.g. France, 1962)
            - a different colomn for a different variable (e.g. 1 colomn for public debt, 1 colomn for private debt).
            E.g
            Countries Years Public Debt
            Belgium 1961 0,44
            Belgium 1962 0,46
            France 1961 0,60
            France 1962 0,62

            What I have done thus far is:
            1. Used Phil Schumm's suggestion to the colomns for the countries and the years ok
            2. To get the data for public debt like I want to I did the following:
            2.a. Import Data from Excel (automatically, each variable is public debt data for 1 year, for all the countries)
            2.b. putmata *
            2.c. mata
            2.d. PDMat = (A,B,C,D,etc.) [where A,B,C, etc. are the different variables]
            2.e. PubDebt = vec(PDMat)

            Now I have all the data in the format I want.

            I have 2 remaining questions:

            A. Is it really necessary to manually type in the names of all variables in part "2.d" of my code?
            B. How do I turn the "PubDebt" vector into a variable (on which I can later perform some regression analyses)?
            I want to perform the process I have mentioned above several times (i.e. as many times as I have tabs in Excel, which equals the amount of variables I have)

            I have tried to be as clear as possible.
            Your help is much appreciated.

            Willem


            Last edited by Willem Vanlaer; 15 May 2014, 10:55.

            Comment


            • #7
              It sounds like you're trying to reinvent the reshape command using Mata—a good programming exercise, perhaps, but probably not how you want to spend your time. Suppose you have an Excel sheet called "Public Debt" that looks like this:
              Code:
              country    year_1961    year_1962
              -------    ---------    ---------
              Belgium         0.44         0.46
               France          0.6         0.62
                 etc.
              and other similar sheets called "Private Debt", etc. Then the following code should give you what you want:
              Code:
              clear
              tempfile mydata
              save `"`mydata'"', emptyok
              
              loc sheets `""Public Debt" "Private Debt""'
              foreach sheet of local sheets {
                  import excel mydata.xlsx, sheet("`sheet'") clear first
                  gen sheet = subinstr("`sheet'"," ","",.)
                  append using `"`mydata'"'
                  save `"`mydata'"', replace
              }
              
              reshape long year_, i(country sheet) j(year)
              reshape wide year_, i(country year) j(sheet) string
              ren year_* *
              There are a few housekeeping details WRT variable naming, etc. that you may need to play around with, but in general, a "double reshape" is the way to handle this problem.

              Comment


              • #8
                My apology. I misunderstood what you intented to do. I understood that you wanted to make only one column.
                Anyway you can still use the kronecker product to produce the first column as demonstrated below
                Code:
                 .
                . mata:
                ------------------------------------------------- mata (type end to exit) ---------
                :
                :                 // There are 5 countries and 3 years of data
                :                 Country = (1,2,3,4,5)'
                  :                 year    = (1,2,3)
                  :                
                :                 id = Country#J(cols(year),1,1),J(rows(Country),1,year')
                  :                
                :                 id
                        1   2
                     +---------+
                   1 |  1   1  |
                   2 |  1   2  |
                   3 |  1   3  |
                   4 |  2   1  |
                   5 |  2   2  |
                   6 |  2   3  |
                   7 |  3   1  |
                   8 |  3   2  |
                   9 |  3   3  |
                  10 |  4   1  |
                  11 |  4   2  |
                  12 |  4   3  |
                  13 |  5   1  |
                  14 |  5   2  |
                  15 |  5   3  |
                     +---------+
                  :                
                : end
                -----------------------------------------------------------------------------------
                But as Phil points out given your problem at hand the use of the (Stata) reshape command is a better choice. If you want to do it the Mata way you can use colshape() or rowshape() function, but you have to be careful, especially if you have more than one variable you want to reshape. For example if you have two variables public and income in wide format and they are arranged in this manner

                public_debt1961, public_debt1962,public_debt1963,income1961,income1 962,income1963.
                ...

                then it will be easier to use rowshape or colshape if the variables are arranged in the following order

                public_debt1961,income1961, public_debt1962,income1962,public_debt1963,income1 963.
                ...

                In my opinion and given your problem reshape in Stata is easier to use.

                Best
                Christophe

                Comment


                • #9
                  Originally posted by Phil Schumm View Post
                  It sounds like you're trying to reinvent the reshape command using Mata—a good programming exercise, perhaps, but probably not how you want to spend your time. Suppose you have an Excel sheet called "Public Debt" that looks like this:
                  Code:
                  country year_1961 year_1962
                  ------- --------- ---------
                  Belgium 0.44 0.46
                  France 0.6 0.62
                  etc.
                  and other similar sheets called "Private Debt", etc. Then the following code should give you what you want:
                  Code:
                  clear
                  tempfile mydata
                  save `"`mydata'"', emptyok
                  
                  loc sheets `""Public Debt" "Private Debt""'
                  foreach sheet of local sheets {
                  import excel mydata.xlsx, sheet("`sheet'") clear first
                  gen sheet = subinstr("`sheet'"," ","",.)
                  append using `"`mydata'"'
                  save `"`mydata'"', replace
                  }
                  
                  reshape long year_, i(country sheet) j(year)
                  reshape wide year_, i(country year) j(sheet) string
                  ren year_* *
                  There are a few housekeeping details WRT variable naming, etc. that you may need to play around with, but in general, a "double reshape" is the way to handle this problem.
                  Phil, thank you for clearing that up. That is indeed what I have been trying to do and it seems the reshape function is exactly what I need.
                  It renders perfectly the results I am looking for.

                  I do have one small request left. Would it perhaps be possible to explain "in human language" what happens in the different parts of your code?
                  Of course, I have googled all the different components and I do understand most of the individual functions. However, I have some trouble comprehending the overall logic.
                  It does not have to be very elaborate, but it would sure help me a lot.


                  Originally posted by Christophe Kolodziejczyk
                  y apology. I misunderstood what you intented to do. I understood that you wanted to make only one column.
                  Anyway you can still use the kronecker product to produce the first column as demonstrated below Code:

                  . . mata: ------------------------------------------------- mata (type end to exit) --------- : : // There are 5 countries and 3 years of data : Country = (1,2,3,4,5)' : year = (1,2,3) : : id = Country#J(cols(year),1,1),J(rows(Country),1,year') : : id 1 2 +---------+ 1 | 1 1 | 2 | 1 2 | 3 | 1 3 | 4 | 2 1 | 5 | 2 2 | 6 | 2 3 | 7 | 3 1 | 8 | 3 2 | 9 | 3 3 | 10 | 4 1 | 11 | 4 2 | 12 | 4 3 | 13 | 5 1 | 14 | 5 2 | 15 | 5 3 | +---------+ : : end -----------------------------------------------------------------------------------
                  But as Phil points out given your problem at hand the use of the (Stata) reshape command is a better choice. If you want to do it the Mata way you can use colshape() or rowshape() function, but you have to be careful, especially if you have more than one variable you want to reshape. For example if you have two variables public and income in wide format and they are arranged in this manner

                  public_debt1961, public_debt1962,public_debt1963,income1961,income1 962,income1963.
                  ...

                  then it will be easier to use rowshape or colshape if the variables are arranged in the following order

                  public_debt1961,income1961, public_debt1962,income1962,public_debt1963,income1 963.
                  ...

                  In my opinion and given your problem reshape in Stata is easier to use.

                  Best
                  Christophe
                  Christophe, thank you for providing me with your advice. At least I have learnt some Mata functions now. It will probably prove to be useful one day.

                  Comment


                  • #10
                    Originally posted by Willem Vanlaer View Post
                    I do have one small request left. Would it perhaps be possible to explain "in human language" what happens in the different parts of your code?
                    Code:
                     1 clear
                     2 tempfile mydata
                     3 save `"`mydata'"', emptyok
                     4
                     5 loc sheets `""Public Debt" "Private Debt""'
                     6 foreach sheet of local sheets {
                     7     import excel mydata.xlsx, sheet("`sheet'") clear first
                     8     gen sheet = subinstr("`sheet'"," ","",.)
                     9     append using `"`mydata'"'
                    10     save `"`mydata'"', replace
                    11 }
                    12
                    13 reshape long year_, i(country sheet) j(year)
                    14 reshape wide year_, i(country year) j(sheet) string
                    15 ren year_* *
                    Lines 1–3 create an empty dataset, saved under the temporary name mydata, which will be used to store the data cumulatively as we read them in from the Excel sheets. Line 5 simply specifies a list of sheet names (you would add the remaining sheet names to this), stored in the local macro named sheets. Lines 6–11 loop through each sheet, read the data in, add a variable containing the sheet name (without spaces), and add these to the data from the previous sheets. Note that at the end of each loop, the cumulative data are again saved. After the loop terminates, we have all of the data from all of the sheets, so we're ready to reshape. Line 13 rearranges the data in long form, with multiple observations for each country for a given variable—one per year. Line 14 then moves the sheets up into columns (wide form with respect to sheet), as you originally specified. Finally, line 15 merely strips off the year_ prefix on each new column, so that the columns are now named according to the original sheet names (again, minus the spaces). As I mentioned in my original post, you may want to handle the variable naming differently.

                    Comment

                    Working...
                    X