  • How to pass Mata matrix back to Stata variables: st_store() fails when I have missing data

    Good morning,

    I wrote a Mata function that uses panelsetup() to do something by groups, and it seemed to be working, miraculously, after I had spent almost 40 hours debugging it.

    I tried it on another dataset, and it failed with the message

    Code:
                  st_store():  3200  conformability error
             myquantile_by():     -  function returned error
                     <istmt>:     -  function returned error
    r(3200);
    After scratching my head for another business day, I finally figured out what the problem is: when I have missing values in my data, I exclude them with my -mark- command, and from then on I cannot put my results back into my data, because my results vector has fewer rows than the original data has observations. So the question is: how do you put a Mata results matrix back into your Stata data when excluding missing values makes the dimensions differ?

    Here is an example of the problem. In the following Mata code I define a function that replicates what -egen, min()- and -egen, max()- do; that is, it calculates the minimum and maximum by group.

    The function works fine when I do not have missing data, and indeed produces the same results as -egen, min()- and -egen, max()-.

    Code:
    clear
    
    clear mata
    
    mata:
            void function mean_by_store(string scalar var, string scalar groupid, string scalar touse)
            {
                    real scalar     i, j, j0, j1, min, max
                    real colvector  id, y
                    real matrix info, result
                    string colvector minmax
                    
                    minmax = ("min" \ "max")
    
                    id = st_data(., groupid, touse)
                    y = st_data( ., var, touse)
                    
                    result = J(rows(y),2,.)
    
                    info=panelsetup(id, 1)
                    
                    for (i=1; i<=rows(info); i++) {
                            j0 = info[i, 1]
                            j1 = info[i, 2]
                            min = min(y[|j0\j1|])
                            max = max(y[|j0\j1|])
                            for (j=j0; j<=j1; j++) {
                            result[j,1] = min
                            result[j,2] = max
                            }
                    }
                    
                    for (j=1; j<=2; j++) st_store(., st_addvar("double", "my_"+minmax[j]), result[,j])
            }
    end
    
    sysuse auto
    
    keep price rep
    
    sort rep
    
    mark touse
    
    mata: mean_by_store("price", "rep", "touse")
    
    egen min = min(price), by(rep)
    egen max = max(price), by(rep)
    summ my_min min my_max max
    
    . summ my_min min my_max max
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
          my_min |         74    3589.203    261.2575       3291       4195
             min |         74    3589.203    261.2575       3291       4195
          my_max |         74    13178.01    2871.961       4934      15906
             max |         74    13178.01    2871.961       4934      15906
    So far so good.

    But now when I exclude some missing values with my -mark- statement, it all falls to pieces:

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . 
    . keep price rep
    
    . 
    . sort rep
    
    . 
    . mark touse if !missing(rep)
    
    . 
    . mata: mean_by_store("price", "rep", "touse")
                  st_store():  3200  conformability error
             mean_by_store():     -  function returned error
                     <istmt>:     -  function returned error
    r(3200);
    
    end of do-file
    
    r(3200);
    So what do we do here? How do you move results back from Mata to Stata when excluding missing values makes the dimensions of the objects differ?
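
The mismatch described above can be checked directly by comparing row counts. This is a minimal sketch, assuming the same auto data and a marker variable called touse as in the example; rep78 is written out in full here rather than abbreviated to rep.

```stata
sysuse auto, clear
keep price rep78
mark touse if !missing(rep78)

mata:
    // rows read from Stata when the selection variable touse is applied
    y = st_data(., "price", "touse")
    rows(y)        // observations with touse != 0
    st_nobs()      // all observations currently in memory
end
```

With the auto data the first count should be 69 and the second 74, so a 69-row result cannot be stored with st_store(., ...), which addresses all 74 observations.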







  • #2
    You used
    Code:
    st_data(., groupid, touse)
    to get the data from Stata into Mata for the subset of observations you wanted to use. The analogous function for returning data to a subset of observations would seem to be something like the following, perhaps.
    Code:
    st_store(., st_addvar("double", "my_"+minmax[j]), touse, result[,j])
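
Putting this suggestion together with the original function might look like the sketch below. The function name minmax_by_store and the check variables chk_min and chk_max are made up for illustration, and the code has only been reasoned through for the auto example, so treat it as a starting point rather than a finished implementation.

```stata
clear mata

mata:
void minmax_by_store(string scalar var, string scalar groupid, string scalar touse)
{
    real scalar    i, j0, j1
    real colvector id, y
    real matrix    info, result

    // read only the observations flagged by touse
    id = st_data(., groupid, touse)
    y  = st_data(., var, touse)

    result = J(rows(y), 2, .)
    info   = panelsetup(id, 1)

    for (i=1; i<=rows(info); i++) {
        j0 = info[i, 1]
        j1 = info[i, 2]
        // minmax() returns (min, max) as a 1 x 2 row vector
        result[|j0, 1 \ j1, 2|] = J(j1-j0+1, 1, minmax(y[|j0 \ j1|]))
    }

    // write back to the same subset by passing touse as the selection variable
    st_store(., st_addvar("double", ("my_min", "my_max")), touse, result)
}
end

sysuse auto, clear
keep price rep78
sort rep78
mark touse if !missing(rep78)

mata: minmax_by_store("price", "rep78", "touse")

// compare against egen on the same subset
egen chk_min = min(price) if touse, by(rep78)
egen chk_max = max(price) if touse, by(rep78)
assert my_min == chk_min & my_max == chk_max
```

Observations excluded by touse keep the missing values that st_addvar() fills in when it creates the new variables.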



    • #3
      Thank you, William, what you showed me worked beautifully!




      • #4
        Not directly related to the question, but note that you can cut some loops:

        Code:
        mata:
        
        void mean_by_store(
            string scalar var,
            string scalar groupid,
            string scalar touse
            )
        {
            real colvector y
            real matrix    info, result
            real colvector j0, j1
            real scalar    i
            
            y = st_data(., var, touse)
            
            info = panelsetup(st_data(., groupid, touse), 1)
            j0 = info[, 1]
            j1 = info[, 2]
            
            result = J(rows(y), 2, .)
            
            for (i=1; i<=rows(info); ++i)
                result[j0[i]::j1[i],] = J((j1[i]-j0[i]+1), 1, minmax(y[|j0[i]\ j1[i]|]))
            
            st_store(., st_addvar(J(1, 2, "double"), ("my_min", "my_max")), touse, result)
        }
        
        end
        Last edited by daniel klein; 18 Jun 2021, 01:44.



        • #5
          Thank you, Daniel. What you are showing looks better, and it also runs a bit faster than my code.



          • #6
            I find (most of) your original code slightly easier to read through. If you are worried about speed, then sort (in Stata, which is quite fast) by groupid and the target variable (price, in this example). You can then loop over panels and collect the first and last values to get the minimum and maximum. That approach, however, assumes no missing values in the target variable; you are probably not willing to make that assumption.
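
A rough sketch of that sort-based idea follows. The function name minmax_sorted and the variable names s_min, s_max, chk_min and chk_max are invented for illustration; the code assumes the data are already sorted by the group variable and the target variable, and that the target variable has no missing values among the selected observations.

```stata
clear mata

mata:
void minmax_sorted(string scalar var, string scalar groupid, string scalar touse)
{
    real scalar    i, j0, j1
    real colvector y
    real matrix    info, result

    // assumption: data sorted by groupid and var, no missing values in var
    y      = st_data(., var, touse)
    info   = panelsetup(st_data(., groupid, touse), 1)
    result = J(rows(y), 2, .)

    for (i=1; i<=rows(info); i++) {
        j0 = info[i, 1]
        j1 = info[i, 2]
        // within a sorted panel the first value is the minimum, the last the maximum
        result[|j0, 1 \ j1, 1|] = J(j1-j0+1, 1, y[j0])
        result[|j0, 2 \ j1, 2|] = J(j1-j0+1, 1, y[j1])
    }

    st_store(., st_addvar("double", ("s_min", "s_max")), touse, result)
}
end

sysuse auto, clear
keep price rep78
mark touse if !missing(rep78)
sort rep78 price

mata: minmax_sorted("price", "rep78", "touse")

// compare against egen on the same subset
egen chk_min = min(price) if touse, by(rep78)
egen chk_max = max(price) if touse, by(rep78)
assert s_min == chk_min & s_max == chk_max
```

If the target variable did contain missing values, they would sort last within each panel and the stored maximum would be missing, which is the caveat raised above.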



            • #7
              Hello! I need some help, please!
              I was estimating a GMM model with the xtabond2 command, but my panel is unbalanced and the result is sometimes "No observations" or
              Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
              Warning: Number of instruments may be large relative to number of observations.
              Warning: Two-step estimated covariance matrix of moments is singular.
              Using a generalized inverse to calculate robust weighting matrix for Hansen test.
              Difference-in-Sargan/Hansen statistics may be negative.
              estimates post: matrix has missing values
              stata(): 3598 Stata returned error
              xtabond2_mata(): - function returned error
              <istmt>: - function returned error
              r(3598);

              Please, I would like some support to get the results from my last estimation. Thank you!



              • #8
                Justin MUKUNDABANTU
                Your question is not related to the topic discussed in this thread. Please post your question as a new topic in the "General" subforum. Please also have a look at the FAQs before posting. From the information you have given us, there is not much we can say to help. You need to tell us more about the data: How many time periods do you have? What is the exact command line you have typed into the command window?

                In any case, please start a new topic.
                https://twitter.com/Kripfganz
