  • extracting the values of a variable as a local macro in the order that they appear in the data

    Hi. I'm new to Statalist. Could anyone help me extract the values of a variable into a local macro in the order in which they appear in the data? levelsof doesn't work because it sorts the values alphabetically.

  • #2
    Code:
    frame put my_variable, into(new_frame)
    frame change new_frame
    duplicates drop
    local my_local
    forvalues j = 1/`=_N' {
        local my_local `my_local' `=my_variable[`j']'
    }
    frame change default // OR RETURN TO WHATEVER FRAME YOU WERE IN
    Note: Because no example data was provided, this code is not tested. Beware of typos or other errors, but this is the gist of it.
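
    For anyone who wants to try this, here is a minimal sketch on made-up data. The fruit names and the final -frame drop- are illustrative additions, not part of the code above, and the sketch assumes single-word values (values containing spaces would need quoting).

    ```stata
    // Made-up example data with repeated values in a non-alphabetical order
    clear
    input str6 my_variable
    "pear"
    "apple"
    "pear"
    "fig"
    "apple"
    end

    // Copy the variable to a scratch frame and keep one row per distinct value
    frame put my_variable, into(new_frame)
    frame change new_frame
    duplicates drop

    // Append each remaining value to the local, row by row
    local my_local
    forvalues j = 1/`=_N' {
        local my_local `my_local' `=my_variable[`j']'
    }

    // Return to the original frame and discard the scratch frame
    frame change default
    frame drop new_frame

    display "`my_local'"
    ```

    The local survives the frame change because macros belong to the do-file or session, not to a frame.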


    • #3
      This works perfectly! Thank you so much for your help! I didn't know that we could work with data frames in Stata. Also, is syntax of the kind `=my_variable[`j']' a sort of extended macro function?


      • #4
        Data frames are new as of Stata 16. Definitely read up on them in the PDF documentation: I find them to be the single most useful innovation in this version of Stata.

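        And yes, more or less: strictly speaking, `=expression' is a macro expansion operator, a close relative of the extended macro functions. Stata evaluates the expression and substitutes the result into the command, so you can put any valid Stata expression in there.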
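
        As a small illustration of the `=expression' syntax (a sketch on made-up numbers, not code from earlier in this thread): Stata evaluates the expression first and pastes the result into the command line before the command itself runs.

        ```stata
        // Three made-up observations: x = 10, 20, 30
        clear
        set obs 3
        generate x = _n * 10

        // `=_N' is replaced by the number of observations before -display- runs
        display "There are `=_N' observations"

        // Subscripted variable references work too
        local second `=x[2]'
        display "x in observation 2 is `second'"

        // So does arithmetic
        local half `=_N / 2'
        display "`half'"
        ```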


        • #5
          Thanks again!


          • #6
            I'd be curious about the context in which Arush wants to use the values of the variable in this way. My gut feeling (perhaps wrong) is that relying on that information might be a somewhat brittle way to do something that could be done better some other way, one that Arush would find helpful.


            • #7
              Hi Mike, I was basically trying to check whether two datasets contained the same data. Each had 199 variables, but the variable names and the order of the variables differed completely between them. The variable labels were the same in both datasets and unique to each variable, but too long to be used as variable names; I could not shorten them into variable names without losing their uniqueness.

              So I created two supplementary datasets, one for each of the original datasets. Each supplementary dataset contained two variables: one recording the variable names of the original dataset and the other recording the variable labels. I then merged the two supplementary datasets, using the variable containing the labels as the key. This gave me a one-to-one mapping between the variable names of the two original datasets. Next I wanted to extract these variable names into two local macros while preserving the order of the merged dataset. Finally, I opened one of the original datasets, renamed all its variables to match the other dataset using these local macros, and used the -cf- command to compare the two datasets.

              I know this sounds like a pretty cumbersome way of doing it, but I couldn't think of an alternative.
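
              For concreteness, the workflow described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the code actually used: the two tiny datasets, their labels, and the names first, second, map2, names1 and names2 are all made up, and -describe, replace- is assumed as the way to build the name/label lookup datasets.

              ```stata
              // Two made-up datasets: same labels, different names and order
              clear
              input a1 a2
              1 2
              3 4
              end
              label variable a1 "Household income in the last year"
              label variable a2 "Number of household members"
              tempfile first
              save "`first'"

              clear
              input b1 b2
              2 1
              4 3
              end
              label variable b1 "Number of household members"
              label variable b2 "Household income in the last year"
              tempfile second
              save "`second'"

              // Lookup of names and labels for the second dataset
              describe, replace clear
              keep name varlab
              rename name name2
              tempfile map2
              save "`map2'"

              // Same lookup for the first dataset, then merge on the label
              use "`first'", clear
              describe, replace clear
              keep name varlab
              rename name name1
              merge 1:1 varlab using "`map2'", assert(match) nogenerate

              // Collect the paired names in the row order of the merged data
              local names1
              local names2
              forvalues j = 1/`=_N' {
                  local names1 `names1' `=name1[`j']'
                  local names2 `names2' `=name2[`j']'
              }

              // Rename the second dataset to the first dataset's names and compare
              use "`second'", clear
              rename (`names2') (`names1')
              cf _all using "`first'", all
              ```

              Because both locals are built in the same loop, the k-th word of one always pairs with the k-th word of the other, which is what the group form of -rename- relies on.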


              • #8
                Thanks, Arush. The problem you describe is interesting. Just for the fun of it, here is a solution that renames the variables in each file to the same names, using a hash of their variable labels computed with Mata's -hash1()-. It takes more code to simulate your data than to do what you want <grin>.

                Code:
                clear
                set seed 98765
                local nvar = 10
                set obs 10
                forval i = 1/`nvar' {
                  gen x`i' = runiform()
                  // make a nonsense label
                  local text ""
                  local length = 10 + ceil(runiform() * 60)
                  forval j = 1/`length' {
                     local text = "`text'" + char(65 + ceil(30*runiform()))
                  }
                  label var x`i' "`text'"
                }
                tempfile file1
                save `file1'
                // Make a version with different varnames in a shuffled order
                rename x* y*
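                ds
                local vars `r(varlist)'   // keep the list in a local in case r() is overwritten
                forval i = 1/`nvar' {
                   // move a randomly chosen variable to the front
                   local choose : word `=ceil(`nvar' * runiform())' of `vars'
                   order `choose'
                }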
                tempfile file2
                save `file2'
                // end data simulation.
                //
                // Solution starts here.
                // Rename variables in file2 based on a hash of the var label
                foreach v of varlist * {
                   local vl: variable label `v'
                   mata: st_local("hashname", "z" + strofreal(hash1("`vl'"), "%20.0g"))
                   rename `v' `hashname'
                }
                save `file2', replace
                //
                // Rename variables in file1 based on a hash of the var label
                use "`file1'", clear
                foreach v of varlist * {
                   local vl: variable label `v'
                   mata: st_local("hashname", "z" + strofreal(hash1("`vl'"), "%20.0g"))
                   rename `v' `hashname'
                }
                // Compare
                cf * using "`file2'", all
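
                To see why the hash can serve as a common name, here is a minimal sketch (the label text and the macro names label1, label2, name1 and name2 are made up). -hash1()- is deterministic, so the same label yields the same number in either file, and the "z" prefix turns that number into a legal variable name. Two different labels could in principle hash to the same value, in which case -rename- would stop with an error on the duplicate name, so uniqueness is likely but not guaranteed.

                ```stata
                // The same made-up label, as it might appear in two different files
                local label1 "Household income in the last year"
                local label2 "Household income in the last year"

                // Build a name from each label exactly as in the solution above
                mata: st_local("name1", "z" + strofreal(hash1("`label1'"), "%20.0g"))
                mata: st_local("name2", "z" + strofreal(hash1("`label2'"), "%20.0g"))
                display "`name1'"
                display "`name2'"

                // Identical labels give identical names
                assert "`name1'" == "`name2'"

                // The result should be a legal Stata name
                confirm names `name1'
                ```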


                • #9
                  Thanks for the alternative solution, Mike! I'm not familiar with Mata, but this may be useful in the future.
