  • #31
    Carlos Zambrana : My guess is that you do not have the latest -matchit-. SSC currently has version 1.3; I have just tested it and it should work. So my suggestion is to uninstall the package completely and reinstall it from SSC. If this fails, you can also install it from my employer's server:

    Code:
    net install matchit, from("http://www.wipo.int/esd/RePEc/wip/soft/") replace force
    Let me know if it works!



    • #32
      Thank you very much for your prompt reply. I did as you suggested and it still gives me the same error message. I have version 1.3 from May 2016. I tried on three different computers and the problem persisted. I also noticed that I do not get the error when running it with two variables of the same data set. It only seems to give me the error when matching two data sets.



      • #33
        OK, this is weird. At least we know it is not the version. If it works with the column syntax, then the Mata functions are there. I don't have the same problem here. Can you share a working example of the error?



        • #34
          Julio,

          This code produces the error for me.

          Code:
          clear
          sysuse auto
          
          tempfile temp
          gen id=_n
          save `temp'
          
          matchit id make using `temp', idusing(id) txtusing(make) sim(tokenwrap, "soundex_fk") di
          Thanks again!



          • #35
            Julio,

            Here's a better example. Apparently I also get the error when running it with two variables of the same file, but only when I specify the weights.

            Code:
            clear
            version
            which matchit
            which freqindex
            
            sysuse auto
            
            tempfile temp
            gen make1 = make
            gen id=_n
            save `temp'
            
            capture noisily matchit id make using `temp', idusing(id) txtusing(make) sim(tokenwrap, "soundex_fk") di w(simple)
            capture noisily matchit id make using `temp', idusing(id) txtusing(make) sim(tokenwrap, "soundex_fk") di w(log)
            capture noisily matchit id make using `temp', idusing(id) txtusing(make) sim(tokenwrap, "soundex_fk") di w(root)
            capture noisily matchit id make using `temp', idusing(id) txtusing(make) sim(tokenwrap, "soundex_fk") di 
            
            capture noisily matchit make make1 , sim(tokenwrap, "soundex_fk") di w(simple)
            capture noisily matchit make make1 , sim(tokenwrap, "soundex_fk") di w(log)
            capture noisily matchit make make1 , sim(tokenwrap, "soundex_fk") di w(root)
            
            capture noisily matchit make make1 , sim(tokenwrap, "soundex_fk") di
            Results:
            Code:
            . clear
            
            . version
            version 14.2
            
            . which matchit
            c:\ado\plus\m\matchit.ado
            *! 1.3 J.D. Raffo May 2016
            
            . which freqindex
            c:\ado\plus\f\freqindex.ado
            *! 1.1 J.D. Raffo April 2016
            
            . 
            . sysuse auto
            (1978 Automobile Data)
            
            . 
            . tempfile temp
            
            . gen make1 = make
            
            . gen id=_n
            
            . save `temp'
            file C:\Users\zambrana\AppData\Local\Temp\ST_06000001.tmp saved
            
            . 
            . capture noisily matchit id make using `temp', idusing(id) txtusing(make) sim(tokenwrap, "soundex_fk") di w(simple)
            Matching current dataset with C:\Users\zambrana\AppData\Local\Temp\ST_06000001.tmp
            Applying weights function: simple
            Similarity function: tokenwrap
             
            Performing preliminary diagnosis
            --------------------------------
             
            Analyzing Master file
            tokenwrap not found as a similarity function. Check spelling.
            Mata run-time error
            
            . capture noisily matchit id make using `temp', idusing(id) txtusing(make) sim(tokenwrap, "soundex_fk") di w(log)
            Matching current dataset with C:\Users\zambrana\AppData\Local\Temp\ST_06000001.tmp
            Applying weights function: log
            Similarity function: tokenwrap
             
            Performing preliminary diagnosis
            --------------------------------
             
            Analyzing Master file
            tokenwrap not found as a similarity function. Check spelling.
            Mata run-time error
            
            . capture noisily matchit id make using `temp', idusing(id) txtusing(make) sim(tokenwrap, "soundex_fk") di w(root)
            Matching current dataset with C:\Users\zambrana\AppData\Local\Temp\ST_06000001.tmp
            Applying weights function: root
            Similarity function: tokenwrap
             
            Performing preliminary diagnosis
            --------------------------------
             
            Analyzing Master file
            tokenwrap not found as a similarity function. Check spelling.
            Mata run-time error
            
            . capture noisily matchit id make using `temp', idusing(id) txtusing(make) sim(tokenwrap, "soundex_fk") di 
            Matching current dataset with C:\Users\zambrana\AppData\Local\Temp\ST_06000001.tmp
            Similarity function: tokenwrap
             
            Performing preliminary diagnosis
            --------------------------------
             
            Analyzing Master file
            tokenwrap not found as a similarity function. Check spelling.
            Mata run-time error
            
            . 
            . capture noisily matchit make make1 , sim(tokenwrap, "soundex_fk") di w(simple)
            Matching columns make and make1
            Applying weights function: simple
            Similarity function: tokenwrap
            tokenwrap not found as a similarity function. Check spelling.
            Mata run-time error
            
            . capture noisily matchit make make1 , sim(tokenwrap, "soundex_fk") di w(log)
            Matching columns make and make1
            Applying weights function: log
            Similarity function: tokenwrap
            tokenwrap not found as a similarity function. Check spelling.
            Mata run-time error
            
            . capture noisily matchit make make1 , sim(tokenwrap, "soundex_fk") di w(root)
            Matching columns make and make1
            Applying weights function: root
            Similarity function: tokenwrap
            tokenwrap not found as a similarity function. Check spelling.
            Mata run-time error
            
            . 
            . capture noisily matchit make make1 , sim(tokenwrap, "soundex_fk") di 
            Matching columns make and make1
            Similarity function: tokenwrap
            0%
            20%
            40%
            60%
            80%
            Done!
            
            . 
            end of do-file



            • #36
              Carlos Zambrana : I believe the issue is that, due to my mistake, -freqindex- does not include the function tokenwrap (and probably some of the other new ones). Unfortunately, I'm currently traveling, so I cannot fix the SSC package or the one on our server remotely. You can copy the function from the -matchit- ado file and paste it at the end of the -freqindex- ado file.

              Alternatively, you can use the one I'm attaching here.
              Last edited by Julio Raffo; 30 Mar 2017, 22:18.



              • #37
                Hi Julio,

                I tried your ado file and it gave me a different error:
                Code:
                . capture noisily matchit id make using `temp', idusing(id) txtusing(make) sim(tokenwrap, "soundex_fk") di w(log) override
                Matching current dataset with C:\Users\zambrana\AppData\Local\Temp\ST_0c000001.tmp
                Applying weights function: log
                Similarity function: tokenwrap
                 
                Performing preliminary diagnosis
                --------------------------------
                 
                Analyzing Master file
                There seems to be an error with the chosen optional argument(s): , soundex_fk
                (note: break is recommended. Press any key to ignore this and continue .
                                 <istmt>:  3499  soundex_fk not found
                Since you suggested making changes to the ado file, I hope you will forgive the impertinence of my trying other changes. I found that two more changes are needed to make it work:
                - In the definition of the syntax, at the beginning:
                Code:
                SIMilmethod(string)    ->     SIMilmethod(string asis)
                - When tokenizing the contents of similmethod:
                FROM
                Code:
                 tokenize "`similmethod'" , parse(",")
                 if ("`1'"!="") {
                  local similfunc `1'
                  macro shift
                  local similargs `*'
                 }
                TO
                Code:
                if ("`1'"!="") {
                  gettoken similfunc similargs : similmethod , parse(",") quotes
                 }
                And after that it seemed to work. I hope it's OK I made changes to it, and I sure hope they are the right ones.
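
                To compare what the two parsing approaches do with the quoted argument, the snippet below can be run on its own in a do-file. It is only an illustration outside of -matchit-: the local similmethod is filled in by hand here (an assumption standing in for what the syntax command would store), and the display lines are added just to show the results.

                ```stata
                * Illustration only: set by hand what the syntax command would store
                local similmethod `"tokenwrap, "soundex_fk""'

                * Original approach: tokenize, then rebuild the arguments
                * from the positional macros
                tokenize `"`similmethod'"', parse(",")
                local similfunc `1'
                macro shift
                local similargs `*'
                display `"tokenize: function=[`similfunc'] arguments=[`similargs']"'

                * Proposed approach: gettoken with the quotes option
                gettoken similfunc similargs : similmethod, parse(",") quotes
                display `"gettoken: function=[`similfunc'] arguments=[`similargs']"'
                ```

                Comparing the two displayed lines shows whether the double quotes around soundex_fk survive each approach.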

                Thanks again for the package and for being so helpful toward the users.



                • #38
                  Carlos Zambrana : Sorry, that was another mistake on my end, as I was working away from the office. I have now updated both -matchit- and -freqindex- on our server and have also sent them to our friends at SSC. To install from our server, you can run this code to be sure that you're replacing the old files with the new ones.

                  Code:
                  cap program drop matchit
                  ado uninstall matchit
                  ado dir matchit
                  net install matchit, from("http://www.wipo.int/esd/RePEc/wip/soft/") replace force
                  which matchit
                  
                  cap program drop freqindex
                  ado uninstall freqindex
                  ado dir freqindex
                  net install freqindex, from("http://www.wipo.int/esd/RePEc/wip/soft/") replace force
                  which freqindex
                  I made my changes before seeing your other suggestions, and it seems to work. But I will look at them and see whether to include them, particularly the asis one. Let me know if it works.



                  • #39
                    It works!

                    Below I will correct my code from Mar 27 in case anyone actually tries it, because once the package works I get a bunch of errors that are actually my fault.

                    Thanks again Julio!

                    Corrected code:
                    Code:
                    clear
                    version
                    which matchit
                    which freqindex
                    
                    sysuse auto
                    tempfile temp
                    gen make1 = make
                    gen id=_n
                    save `temp'
                    capture noisily matchit id make using `temp', idusing(id) txtusing(make) sim(tokenwrap, "soundex_fk") di w(simple)
                    
                    sysuse auto, clear
                    tempfile temp
                    gen make1 = make
                    gen id=_n
                    save `temp'
                    capture noisily matchit id make using `temp', idusing(id) txtusing(make) sim(tokenwrap, "soundex_fk") di w(log)
                    
                    sysuse auto, clear
                    tempfile temp
                    gen make1 = make
                    gen id=_n
                    save `temp'
                    capture noisily matchit id make using `temp', idusing(id) txtusing(make) sim(tokenwrap, "soundex_fk") di w(root)
                    
                    sysuse auto, clear
                    tempfile temp
                    gen make1 = make
                    gen id=_n
                    save `temp'
                    capture noisily matchit id make using `temp', idusing(id) txtusing(make) sim(tokenwrap, "soundex_fk") di 
                    
                    capture noisily matchit make make1 , sim(tokenwrap, "soundex_fk") di w(simple) gen(whatever1)
                    capture noisily matchit make make1 , sim(tokenwrap, "soundex_fk") di w(log) gen(whatever2)
                    capture noisily matchit make make1 , sim(tokenwrap, "soundex_fk") di w(root) gen(whatever3)
                    
                    capture noisily matchit make make1 , sim(tokenwrap, "soundex_fk") di gen(whatever4)



                    • #40
                      Julio, thank you for creating this amazing command called matchit. I have a question about the best way to match in the following case. In one table I have the company name "LSI International Inc" and in the other table "L S I International Inc", so there are blank spaces between L, S and I. When company names are written like this, the software does not work properly and instead returns other matching names that are not relevant. What is the best approach in this case? Unfortunately, I do not control how the company names are written or where the blanks are.



                      • #41
                        Yes, blanks within acronyms are tricky. What often works for me is removing all blanks from the names (i.e. "L S I International Inc" -> "LSIInternationalInc") and then using an ngram similarity function (e.g. sim(bigram) or sim(ngram, 3)). Using weights is probably a good idea too (as "INC" is less informative than "LSI"). If I were you, I would also put everything in the same case and remove everything that is not alphanumeric (e.g. regex: [^a-z0-9]).
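
                        A minimal sketch of that cleaning step, assuming Stata 14 or later (for ustrregexra()). The toy data and the variable names id, name and name_clean are made up for illustration, and the commented-out matchit line assumes a hypothetical second file other.dta containing variables id2 and name_clean:

                        ```stata
                        * Hypothetical toy data: two spellings of the same company name
                        clear
                        input int id str40 name
                        1 "LSI International Inc"
                        2 "L S I International Inc"
                        end

                        * Put everything in lower case, then drop every character
                        * that is not a letter or a digit (blanks included)
                        gen name_clean = lower(name)
                        replace name_clean = ustrregexra(name_clean, "[^a-z0-9]", "")
                        list id name name_clean

                        * Both rows should now hold the same cleaned string
                        assert name_clean[1] == name_clean[2]

                        * Sketch of the matching step (needs matchit and freqindex installed);
                        * other.dta, id2 and its name_clean variable are hypothetical
                        * matchit id name_clean using other.dta, idusing(id2) txtusing(name_clean) sim(bigram) w(log)
                        ```

                        The same cleaning has to be applied to the name variable in both files before matching, otherwise the cleaned and uncleaned spellings will still differ.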

                        Hope it helps



                        • #42
                          Dear Dr. Julio,

                          Your command is great! I've been looking for something similar for a long time, and it was perfect to find this post.

                          Thank you a lot!

                          Kind regards,

                          Larissa



                          • #43
                            Just in case, let me share here that the new version of matchit (v1.5.1) changes the ngram_circ similarity function. The previous function had a small bug that double counted the first gram, which (slightly) affected the similarity score. It shouldn't cause dramatic backward compatibility issues, but I wanted to have it documented here. My thanks to the user who spotted the issue.
                            J.

