
  • Wish: To facilitate future replication of Stata results, a StataCorp utility to help users "freeze" a collection of user-contributed ADO, Mata and MLIB programs for publication/posting with the Stata DO files that call those programs

    Many more responsible journals require that referees and eventual readers be able to replicate the analytical results in submitted papers. Through support to the Stata Journal and periodic Stata user conferences, Stata also encourages and helps users to produce and publish Stata programs that extend Stata's capabilities in small and large ways. But in the future when a researcher attempts to replicate the results in a published paper, the community-contributed programs originally used might not be available in the same version, or at all.

    Thus a user wishing to enable future replication of a set of interlocking DO files and community-contributed ADO/Mata/Mlib files must figure out how to assemble and "freeze" the community-contributed ADO files used in a given research project. This is doable and many users are already doing it, each in his or her own way. But it would be great if there were a set of StataCorp supported conventions and utilities to standardize the process. (Ideally some journals, starting with Stata Journal, would even require that Stata users conform to such StataCorp-recommended conventions and use the recommended program-freezing utilities.)

    diana gold's SSC program -dependencies- seems to me to be an excellent model for a Stata-supported way to "freeze" (her word) a set of user-contributed Stata ADO/MATA/MLIB files in order to facilitate future replication of research results. I like the fact that it allows the future replicator to temporarily modify their -adopath- as they replicate and then undo this change and delete the replication-specific collection of ADO/Mata/MLIB programs at will. Other SSC programs that accomplish some of the same objectives include -zippkg-, -rqrs-, -which_version-, -copycode-, -adolist- and -usepackage-.
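
    As a rough illustration of the manual "freezing" idea, the sketch below uses only official commands (-sysdir-, -ssc install-) and is not meant to describe how -dependencies- or the other SSC programs work internally. The folder name ado and the package estout are placeholders chosen for the example. Run from a do-file, it points the PLUS directory at a project-local folder, so that installed community-contributed files land next to the DO files, and restores the original setting afterwards.

    Code:
    * Assumption: the working directory is the project folder
    local oldplus "`c(sysdir_plus)'"        // remember the current PLUS directory
    capture mkdir "`c(pwd)'/ado"            // hypothetical folder for the frozen files
    sysdir set PLUS "`c(pwd)'/ado"          // installs now go to the project folder
    ssc install estout, replace             // placeholder package; substitute your own
    * ... run the analysis DO files here ...
    sysdir set PLUS "`oldplus'"             // undo the change when done

    A replicator who receives the ado folder with the DO files would only need the -sysdir set PLUS- line, not the -ssc install- line, to pick up the archived versions.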

    I take the point that results produced using the updated community-contributed ADO files may differ from those originally published exactly because the ADO file's bugs have been fixed. The new results might be "better". But I think this is an argument in favor of, rather than against, requiring authors to publish their frozen ADO files as part of a journal submission. I think that replicators need to start with a script that reproduces as exactly as possible the published result, before they experiment to discover the sensitivity of those results to different approaches and/or data. It is the replicator's responsibility to discover that the newer version of the community-contributed program produces a different result.

    diana gold, daniel klein, Nick Cox, Sergio Correia and others have extensively discussed these issues on these threads:
    https://www.statalist.org/forums/for...lable-from-ssc
    https://www.statalist.org/forums/for...ge-require-ado
    https://www.statalist.org/forums/for...o-local-folder
    https://www.statalist.org/forums/for...os#post1523554
    https://www.statalist.org/forums/for...79#post1662079
    Last edited by Mead Over; 29 Apr 2022, 15:55.



    • A slightly bigger arrow on the Replace All button in the Do-file Editor, so that "replace all in selection" is easier to reach.



      • Extension of the existing -tabulate- and -table- commands with an option that would look at the value labels attached to the input variables and add every level defined in those labels to the tabulation. In the one-dimensional case, this is closest to Ben Jann's -fre- command with the -i()- option, which lets the user include specific values that would otherwise have zero frequency. One related request of mine has now been implemented as the new -table, zerocounts- option. However, this option is limited in scope to cases where zero counts are implied by the cross-tabulation.

        There is no support for this currently in any of the official commands, but this can be a useful feature when trying to create tabulations where you specifically want to show zero frequency counts.

        As a quick example to demonstrate this, consider the following.

        Code:
        tabi 0 1 2 \ 0 3 4
        The output eliminates the first column because there are no observations in any of those cells.

        Code:
        . tabi 0 1 2 \ 0 3 4
        
                   |          col
               row |         2          3 |     Total
        -----------+----------------------+----------
                 1 |         1          2 |         3
                 2 |         3          4 |         7
        -----------+----------------------+----------
             Total |         4          6 |        10



        • It'd be useful to have some documentation on the gr_edit command



          • `
            EDIT: Sorry--no intended post here. Cat walked on the keyboard and somehow that led to a save.



            • Seconding #364. At present it is easy enough to record actions in the graph editor and copy paste them into a gr_edit command, but it would be great to have documentation on how to write those commands from scratch.

              Better yet, it would be great to be able to do everything that can be done via the graph editor within the original graph or twoway command used to generate a graph, which sometimes seems impossible (recent example here: https://www.statalist.org/forums/for...bol-color-size)



                • It would be nice to have a convenient one-step command to copy value labels from one frame to another. You can accomplish it by copying any variable to which that value label is attached (assuming there is one, which isn't always the case) into the second frame, then applying that label to the desired other variable, and then dropping the one you brought in. Or you can -label save- the label from the original frame to a tempfile and -run- it in the other. But it would be convenient to be able to do it in a single command.
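
                The -label save- route described above might look like the sketch below when run from a do-file. The names source_frame, target_frame, lblname and myvar are placeholders, and the code assumes both frames already exist and that lblname is defined in source_frame.

                Code:
                tempfile lbldo
                * write the label definition to a temporary do-file
                frame source_frame: label save lblname using "`lbldo'"
                * execute that do-file in the other frame to define the label there
                frame target_frame: run "`lbldo'"
                * attach the copied label to a (hypothetical) variable
                frame target_frame: label values myvar lblname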



                • Originally posted by Clyde Schechter
                  It would be nice to have a convenient one-step command to copy value labels from one frame to another. You can accomplish it by copying any variable to which that value label is attached (assuming there is one, which isn't always the case) into the second frame, then applying that label to the desired other variable, and then dropping the one you brought in. Or you can -label save- the label from the original frame to a tempfile and -run- it in the other.
                  It could be even simpler than that. To copy a value label to the current frame:

                  Code:
                  frame other_frame : mata : st_vlload("lblname", values=., labels="")
                  mata : st_vlmodify("lblname", values, labels)
                  To copy a value label to another frame:

                  Code:
                  mata : st_vlload("lblname", values=., labels="")
                  frame other_frame : mata : st_vlmodify("lblname", values, labels)

                  I see how a more general and robust approach would be convenient.



                  • I've become quite fond of the -describe using-.

                    It would be nice when working with large datasets to peek into the first or the last rows.
                    I suggest options for -use- like -use var1 var2 using "dataset.dta" in 1/100, last- to see the last 100 rows for var1 and var2 in "dataset.dta".

                    If a min/max/missing report could be saved in the dataset as metadata, that would be nice too. Maybe this should be an option to -save-.
                    Kind regards

                    nhb



                    • Originally posted by Niels Henrik Bruun
                      I've become quite fond of the -describe using-.

                      It would be nice when working with large datasets to peek into the first or the last rows.
                      I suggest options for -use- like -use var1 var2 using "dataset.dta" in 1/100, last- to see the last 100 rows for var1 and var2 in "dataset.dta".

                      If a min/max/missing report could be saved in the dataset as metadata, that would be nice too. Maybe this should be an option to -save-.
                      The usual -in- range for the last observations would be as below (note that the final character is a lowercase L).

                      Code:
                      mycmd in -100/l
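
                      For peeking into a file without loading all of it, one possible sketch combines -describe using- with the existing -use ... in ... using- syntax. The file name dataset.dta and the variables var1 and var2 are taken from the suggestion quoted above; the local macro names are made up for the example. The observation count is read from r(N), so the ranges are computed explicitly and stay valid for files with fewer than 100 observations.

                      Code:
                      describe using "dataset.dta", short
                      local N    = r(N)                  // number of observations in the file
                      local to   = min(100, `N')         // end of the first block
                      local from = max(1, `N' - 99)      // start of the last block
                      * first 100 observations (or fewer)
                      use var1 var2 in 1/`to' using "dataset.dta", clear
                      list
                      * last 100 observations (or fewer)
                      use var1 var2 in `from'/`N' using "dataset.dta", clear
                      list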



                      • I would like -capture drop x y z- to work even if one of the variables does not exist. Currently, if there is no y, it does not drop x or z either. As a result, I have to use multiple lines of code when one should do.



                        • Veto #371. First, such a change would violate the (implicit) rule of "do it all or do nothing at all", which is implemented pretty much throughout Stata and ensures that you will never have to guess on the current state of the dataset.

                          More generally, while I see how the behavior can be frustrating in this situation, a change would necessarily lead to inconsistencies. If

                          Code:
                          capture drop x y z
                          would drop only existing variables then

                          Code:
                          drop x y z
                          would need to do the same. This is because capture does not change the behavior of any other command. If it did, we would need to look up the specific modifications capture did to each specific command. This is clearly way more inconvenient than the current (predictable) behavior.

                          If instead, we changed the way drop works, we would not even need capture. The problem with that is that whenever a command is referring to a varlist, it refers either to existing variables or to new variables, never to a mixture. Commands that may refer to existing and new variables, e.g., generate, have these groups of variables clearly separated in their syntax diagram.

                          Changing drop would also be inconsistent with keep because we can obviously not keep variables that do not exist.
                          Last edited by daniel klein; 19 May 2022, 07:55.



                          • Seconding #372. Also -- if a captured command "partially" succeeds, what is the return code stored in _rc? Is it 0, because of "partial" success, or is it (in this case) 111, because of "partial" failure? This would likely mess with a lot of the traditional usage of capture to lead in to a conditional argument based on the value of _rc.
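
                            To make the point about _rc concrete, here is a small sketch of the current all-or-nothing behavior, followed by the usual multi-line workaround. It assumes a dataset in which x and z exist and y does not.

                            Code:
                            * all-or-nothing: because y is missing, nothing is dropped and _rc is nonzero
                            capture drop x y z
                            if _rc {
                                display as text "drop failed with return code " _rc "; no variables were dropped"
                            }

                            * one variable at a time: each drop succeeds or fails on its own
                            foreach v in x y z {
                                capture drop `v'
                            }

                            With the loop, _rc only reflects the last variable tried, which is exactly the ambiguity a "partially successful" -capture drop x y z- would introduce.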



                            • Originally posted by Ali Atia
                              Seconding #372. Also -- if a captured command "partially" succeeds, what is the return code stored in _rc? Is it 0, because of "partial" success, or is it (in this case) 111, because of "partial" failure? This would likely mess with a lot of the traditional usage of capture to lead in to a conditional argument based on the value of _rc.
                              That would be a very strange behaviour for the capture command. I would rather have
                              Code:
                              drop var1 var2, force
                              as an option in the command. I still think this is not a very good option. If you absolutely need this behaviour you can also program it yourself with something along the lines of:

                              Code:
                              program define checkvars, rclass
                                syntax namelist
                              
                                unab varlist: *
                                return local newvars: list namelist - varlist
                                return local vars: list namelist & varlist
                              end
                              Code:
                              checkvars var1 var2 var3 var4
                              drop `r(vars)'



                              • Originally posted by Daniel Fernandes

                                That would be a very strange behaviour for the capture command. I would rather have
                                Code:
                                drop var1 var2, force
                                as an option in the command. I still think this is not a very good option. If you absolutely need this behaviour you can also program it yourself with something along the lines of:

                                Code:
                                program define checkvars, rclass
                                  syntax namelist

                                  unab varlist: *
                                  return local newvars: list namelist - varlist
                                  return local vars: list namelist & varlist
                                end
                                Code:
                                checkvars var1 var2 var3 var4
                                drop `r(vars)'
                                I believe you meant to respond to #371 -- we are in agreement that altering the behavior of capture in this way wouldn't be a good idea.

