Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Mata function by group

    I calculate a real-valued scalar by using the following program:

    /* myproc.ado */
    capture program drop myproc
    program define myproc
    mata: myindexfunc()
    gen myindex=MYINDEX
    end

    mata:
    void myindexfunc() {
    xvar = st_data(.,"xvar")
    yvar = st_data(.,"yvar")
    lat = st_data(.,"lat")
    lon = st_data(.,"lon")
    n = length(xvar)
    d = J(n, n, .)
    for (i=1;i<=n;i++) {
    for (j=1;j<=n;j++) {
    d[i,j]=exp(-sqrt((lon[i]-lon[j])^2+(lat[i]-lat[j])^2))
    }
    }
    myindex = cross(cross(xvar,d)',yvar)/sum(xvar)/sum(yvar)
    st_numscalar("MYINDEX",myindex)
    }
    end

    The program above creates the "myindex" variable that contains a single value of "MYINDEX". The problem here is that my data file has a group variable (say, "group") along with xvar, yvar, lat, lon. My goal is to apply the mata function "myindexfunc()" by group. How do I have to modify the program to do the operation by group?

  • #2
    Code:
    help mf_panelsetup

    Comment


    • #3
      Dear Hua: Thanks for letting me know the function. I don't understand how it works at all, but I will take a look at it.

      Comment


      • #4
        First, this can be a pure Mata solution, the ado program in this case does not do anything interesting except store the result in a new variable, this can be easily archived in Mata using st_view() . It is always a good idea to make one function do just one thing, hence I separated the computation part to the following Mata function:


        Code:
        mata:
        real scalar indexfunc_wrk(real matrix data)
        {
            real matrix d
            real scalar n, i, j, myindex
            
            n = rows(data)
            d = J(n, n, .)
            
            for (i=1;i<=n;i++) {
                for (j=1;j<=n;j++) {
                    d[i,j]=exp(-sqrt((data[i,4]-data[j,4])^2+(data[i,3]-data[j,3])^2))
                }
            }
            
            myindex = cross(cross(data[.,1],d)',data[.,2])/sum(data[.,1])/sum(data[.,2])
            return(myindex)
        }
        end
        Notice that for the pairwise distance matrix d, only the lower or upper triangular part and diagonal part need be computed since it is symmetric. It can be further optimized to use a vectorized version which should be faster if n is large enough. But that's material for another post if you are interested.

        Now we can use indexfunc_wrk() to archive the functionality of your original myproc.ado

        Code:
        mata:
        void  indexfunc(string scalar xvar,
                        string scalar yvar,
                        string scalar latitude,
                        string scalar longitude,
                        string scalar index)
        {
            real matrix data, result
            real scalar myindex
            
            st_view(data, ., (xvar, yvar, latitude, longitude))
            st_view(result,.,index)
            result[.,.]=J(rows(data), 1, indexfunc_wrk(data))
        }
        end
        A quick test looks like:

        Code:
        .
        . set seed 12345
        
        . set obs 5
        number of observations (_N) was 0, now 5
        
        . gen xvar = runiform()
        
        . gen yvar = runiform()
        
        . gen lat  = runiform()
        
        . gen lon  = runiform()
        
        . gen index= .
        (5 missing values generated)
        
        . mata:indexfunc("xvar", "yvar", "lat", "lon", "index")
        
        .
        . list
        
             +------------------------------------------------------+
             |     xvar       yvar        lat        lon      index |
             |------------------------------------------------------|
          1. | .3576297   .2076905   .0039323   .4106361   .7108687 |
          2. | .4004426   .0286627   .0130297   .2607687   .7108687 |
          3. | .6893833   .6889245   .4204224   .0267289   .7108687 |
          4. | .5597356   .4693434   .6161914   .1079424   .7108687 |
          5. | .5744513   .2071526   .8948836    .366684   .7108687 |
             +------------------------------------------------------+
        Now we can deal with the group variable case. Mata offers panelsetup(), panelsubmatrix() and panelsubview() functions for working with group data.

        "
        HTML Code:
        These functions assist with the processing of panel data.  The idea is to
            make it easy and fast to write loops like
        
                        for (i=1; i<=number_of_panels; i++) {
                                X = matrix corresponding to panel i
                                ...
                                ...(calculations using X)...
                                ...
                        }
        "
        We now write a Mata function grpindexfunc() which works on each group.

        Code:
        mata:
        void grpindexfunc(string scalar grp,
                          string scalar xvar,
                          string scalar yvar,
                          string scalar latitude,
                          string scalar longitude,
                          string scalar index)
        {
            real matrix gid, data, info, grpdata, grpidx
            real scalar i, min, max
            real rowvector myindex
            
            st_view(gid=., ., grp)
            st_view(data=., ., (xvar, yvar, latitude, longitude))
        
            /* setup the panel information based on group variable*/
            info = panelsetup(gid, 1)
            min = max = 1
            myindex = J(1, rows(data), .)
            for (i=1; i<=rows(info); i++) {
                /* work on each group */
                panelsubview(grpdata=., data, i, info)
                if(i==1) {
                    min = 1
                }
                else {
                    min = info[i-1, 2]+1
                }
                max = info[i, 2]
                /* put the result to a vector */
                myindex[1,min..max] = J(1, max-min+1, indexfunc_wrk(grpdata))
            }
            st_view(grpidx=., ., index)
            /* put the result into the variable */
            grpidx[.,.] = myindex'
        }
        end
        A quick test shows this indeed matches the results of indexfunc() on each group:

        Code:
        . clear
        
        . set seed 12345
        
        . set obs 10
        number of observations (_N) was 0, now 10
        
        . gen group = 1
        
        . replace group = 2 if _n > 6
        (4 real changes made)
        
        .
        . gen xvar = runiform()
        
        . gen yvar = runiform()
        
        . gen lat  = runiform()
        
        . gen lon  = runiform()
        
        . gen index= .
        (10 missing values generated)
        
        . mata:grpindexfunc("group", "xvar", "yvar", "lat", "lon", "index")
        
        .
        . list
        
             +--------------------------------------------------------------+
             | group       xvar       yvar        lat        lon      index |
             |--------------------------------------------------------------|
          1. |     1   .3576297   .0039323   .0038868   .9342622   .6446639 |
          2. |     1   .4004426   .0130297   .9400924   .9856864   .6446639 |
          3. |     1   .6893833   .4204224   .6912759   .4312878   .6446639 |
          4. |     1   .5597356   .6161914   .9186656   .1280123   .6446639 |
          5. |     1   .5744513   .8948836   .4197324    .870147   .6446639 |
             |--------------------------------------------------------------|
          6. |     1   .2076905   .4106361   .8040955   .7715443   .6446639 |
          7. |     2   .0286627   .2607687   .5986309   .7001778   .6329346 |
          8. |     2   .6889245   .0267289   .0034476    .644223   .6329346 |
          9. |     2   .4693434   .1079424   .4883178   .6357448   .6329346 |
         10. |     2   .2071526    .366684   .8822047   .9978225   .6329346 |
             +--------------------------------------------------------------+
        
        .
        . /* now we use indexfunc() on each group to show the results match*/
        . preserve
        
        . keep if _n in 1/6
        (4 observations deleted)
        
        . mata:indexfunc("xvar", "yvar", "lat", "lon", "index")
        
        . list
        
             +--------------------------------------------------------------+
             | group       xvar       yvar        lat        lon      index |
             |--------------------------------------------------------------|
          1. |     1   .3576297   .0039323   .0038868   .9342622   .6446639 |
          2. |     1   .4004426   .0130297   .9400924   .9856864   .6446639 |
          3. |     1   .6893833   .4204224   .6912759   .4312878   .6446639 |
          4. |     1   .5597356   .6161914   .9186656   .1280123   .6446639 |
          5. |     1   .5744513   .8948836   .4197324    .870147   .6446639 |
             |--------------------------------------------------------------|
          6. |     1   .2076905   .4106361   .8040955   .7715443   .6446639 |
             +--------------------------------------------------------------+
        
        . restore
        
        . preserve
        
        . keep if _n in 7/10
        (6 observations deleted)
        
        . mata:indexfunc("xvar", "yvar", "lat", "lon", "index")
        
        . list
        
             +--------------------------------------------------------------+
             | group       xvar       yvar        lat        lon      index |
             |--------------------------------------------------------------|
          1. |     2   .0286627   .2607687   .5986309   .7001778   .6329346 |
          2. |     2   .6889245   .0267289   .0034476    .644223   .6329346 |
          3. |     2   .4693434   .1079424   .4883178   .6357448   .6329346 |
          4. |     2   .2071526    .366684   .8822047   .9978225   .6329346 |
             +--------------------------------------------------------------+
        
        . restore
        I attached my entire do file which contained checking of the results of new functions against your original code.
        Attached Files
        Last edited by Hua Peng (StataCorp); 30 Aug 2015, 11:13.

        Comment


        • #5
          Thank you very much, Peng. Your answer solves my problem. I appreciate it.

          Comment

          Working...
          X