Wishlist for Stata 18

wbuchanan

#466

03 Sep 2022, 06:54

[email protected] what you’re asking should be possible using Mata and the information in

Code:

help dta

What you’re asking for is something that parses everything up to the <data> tag in the .dta file spec (https://www.stata.com/help.cgi?dta#map).
Daniel Feenberg

#467

03 Sep 2022, 08:22

Make debugging easier with one or more of:

1) Provide the line number or context for error messages triggered inside a -foreach- loop without requiring the user to rerun the program with -set trace on- and -set tracedepth 1-.

2) Mark error messages with a distinctive tag so that they can be searched for in an editor.

3) Provide a way to suppress purely informative messages such as "NN real changes made" or "NN observations deleted" without suppressing actual error messages.

4) Add the variable name to the informative messages mentioned in (3) so that they can be related to the particular variable when executed in a -foreach- loop.

5) Add more context to error messages. If an option is improper, which string is not a valid option? If there is a type mismatch, which types don't match? If there is a syntax error, where did parsing stop? And so on. Little bits of information can save a lot of time. How long does it take you to understand the following code fragment and error message:

Code:

. list

     +-------+
     | x   y |
     |-------|
  1. | 3   2 |
     +-------+

. gen z=x*y
type mismatch
r(109);

How long for a beginning Stata user?

6) Provide a -trace- setting that doesn't trace Stata-provided code, only code in the running program and the current directory.
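Until something like items 1 and 4 exists, one partial workaround (a sketch, not an official feature) is to wrap the command inside the loop in -capture noisily-, which should still display the error message, and then print the loop variable whenever _rc is nonzero. The one-observation dataset below, with y deliberately stored as a string, is hypothetical:

```stata
* Hypothetical toy data: x is numeric, y is a string holding "2"
clear
input x str3 y
3 "2"
end

* -capture noisily- shows the command's output and error message but
* does not stop the loop; the -if _rc- line then names the culprit
foreach v of varlist x y {
    capture noisily generate `v'_sq = `v'^2
    if _rc display as error "  -> loop failed on variable `v' (r(" _rc "))"
}
```

The design choice here is to keep the loop running and report context, which also works in batch mode where the clickable r(109) link is not available.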

Last edited by Daniel Feenberg; 03 Sep 2022, 08:25.
daniel klein

#468

03 Sep 2022, 13:34

Originally posted by Daniel Feenberg

3) Provide a way to suppress purely informative messages such as "NN real changes made" or "NN observations deleted" without suppressing actual error messages.

See

Code:

help quietly
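A minimal sketch of how -quietly- applies to the loop case in items 3 and 4. The variables come from the shipped auto data and the threshold of 5000 is an arbitrary illustration. The "(N real changes made)" notes are hidden, while error messages, as far as I know, are still displayed and still stop the loop; a -count- plus -display- pair adds the variable name to the report:

```stata
sysuse auto, clear
foreach v of varlist price mpg weight {
    * r(N) = number of nonmissing values above the (arbitrary) threshold
    quietly count if `v' > 5000 & !missing(`v')
    display as text "`v': " r(N) " value(s) above 5000 set to missing"
    * the replace itself runs silently
    quietly replace `v' = . if `v' > 5000 & !missing(`v')
}
```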

Originally posted by Daniel Feenberg

5) Add more context to error message.
How long does it take you to understand the following code fragment and error message:

Code:

. list

     +-------+
     | x   y |
     |-------|
  1. | 3   2 |
     +-------+

. gen z=x*y
type mismatch
r(109);

How long for a beginning Stata user?

A click on r(109) yields:

In an expression, you attempted to combine a string and numeric
subexpression in a logically impossible way. For instance, you
attempted to subtract a string from a number or you attempted
to take the substring of a number.

Seems like a pretty accurate description of the problem to me.
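To see which operand is the string, a short check along these lines may help. The one-observation dataset mirrors the listing above (assuming y is the string variable), and -destring- is only appropriate if the string variable really holds numerals:

```stata
clear
input x str3 y
3 "2"
end

describe x y                        // the storage type column shows y is str3
capture confirm numeric variable y
if _rc display as error "y is stored as a string, hence r(109)"

destring y, replace                 // assumes y contains plain numerals
generate z = x*y
list
```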

Last edited by daniel klein; 03 Sep 2022, 13:36.
Daniel Feenberg

#469

03 Sep 2022, 15:25

No doubt Stata works better in interactive mode, but I don't.
Federico Bindi

#470

05 Sep 2022, 00:25

Introduce new data structures (such as lists and tuples). I know, there's Python for that, but still...
Maarten Buis

#471

05 Sep 2022, 04:56

Originally posted by Federico Bindi

Introduce new data structures (such as lists and tuples). I know, there's Python for that, but still...

Do you want those in Mata? I cannot imagine why you would want those in Stata (but that could also have something to do with my imagination...). If you present a use case for those data structures, your request will be more convincing. Everybody can say they want something, but resources are limited.

Right now it sounds like a common problem for people who migrate from language A to language B: they miss some aspect of language A but don't yet know that language B does things in another way that makes that aspect unnecessary or even inefficient. This is not criticism of you; it is a common process we all go through at some point.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
John Mullahy

#472

05 Sep 2022, 07:22

re: #470

Code:

ssc describe tuples

While not a new data structure, it may still prove useful.
Ali Atia

#473

05 Sep 2022, 11:43

Originally posted by Federico Bindi

Introduce new data structures (such as lists and tuples). I know, there's Python for that, but still...

In many ways, Stata's macros replicate (or can be made to replicate) the functionality of lists -- see here for more: pmacrolists.pdf (stata.com).
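A small sketch of the macro list extended functions (documented under -help macrolists-) that let an ordinary local behave much like a list; the macro names a and b are arbitrary:

```stata
local a "x y z"
local b "y w"

local both   : list a & b            // intersection: y
local either : list a | b            // union, B's new elements appended: x y z w
local onlya  : list a - b            // difference: x z
local n      : list sizeof a         // number of elements: 3
local pos    : list posof "z" in a   // position of "z" in a: 3

display "`both' | `either' | `onlya' | `n' | `pos'"
```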
Karen Strope (StataCorp)

StataCorp Employee

#474

06 Sep 2022, 07:10

Originally posted by Tom Dietz

This isn't a part of the code but a thought on policies. I am moving to emeriti status and so my university will no longer pay for Stata. With the end of perpetual licenses I will have to pay out of pocket for Stata in about two years. So after 30 years of using and teaching Stata almost exclusively I will reluctantly switch to R as I plan to continue doing research. I'm wondering if Stata could have a pricing policy of the sort used by many scientific societies--with a special rate for retirees.

Hi Tom! I wanted to reassure you that perpetual licenses are still available. You can upgrade your existing perpetual license online at https://www.stata.com/order. If you would like to purchase a new perpetual license, or if you have any questions, you can contact us at [email protected]. We are happy to go over licensing options, including licensing options for retirees, with you.
wbuchanan

#475

06 Sep 2022, 09:00

[email protected] Correction: this is not currently possible in Mata. That said, if StataCorp extended the buffer functions in Mata to read and write 8-byte signed and unsigned integers, it wouldn't be terribly difficult to do what you're asking in Mata, reading only the metadata and skipping the data and strLs entirely.
daniel klein

#476

06 Sep 2022, 09:18

Originally posted by wbuchanan

[email protected] correction, not possible to do in Mata, currently.

Perhaps I misunderstand the request, or there is a bug in one of my routines. I have a clumsy way of reading the variable names from the dta file in my usesome command, and I was under the impression that you could, in principle, read the other metadata fields in the same way.
wbuchanan

#477

07 Sep 2022, 07:07

daniel klein I just looked at the code you referenced and experimented with it to see whether it would recover the correct information, but it doesn't seem to parse 8-byte unsigned integers correctly (assuming I interpreted things correctly). My approach was going to be to read in enough bytes initially to get the <map> element, and then use those 8-byte unsigned integers to quickly locate all of the other elements in the file header. When I used the same method you use in your

Code:

hexread

function the result was definitely not correct:

Code:

. mata
------------------------------------------------- mata (type end to exit) ------------------------
: fh = fopen("Sample.dta", "r")
: test = fread(fh, 612)
: mapelem = ustrregexm(test, ".*(<map>.*</map>).*")
: mapstr= ustrregexs(1)
: map = ustrregexra(mapstr, "</?map>", "")
: x = ascii(substr(map, 9, 16))
: y = inbase(16, x)
: for(i = 2; i <= cols(y); ++ i) y[1] = y[1] + substr("0" + y[1], -2)
: frombase(16, y[1])
  3.18931e+38    <- This is a really small file, so it seems unlikely that the second value in the map element would be at this byte position
: x = ascii(strreverse(substr(map, 9, 16)))
: y = inbase(16, x)
: for(i = 2; i <= cols(y); ++ i) y[1] = y[1] + substr("0" + y[1], -2)
: frombase(16, y[1])
  0    <- The second value in <map> should be the byte value where the <map> element is located

Regardless, if the point of the buffer functions in Mata is to allow the reading/writing of files, it seems like it would be reasonable that they be able to read/write the same types of values used in a .dta file.
wbuchanan

#478

07 Sep 2022, 12:11

I'm unable to edit my previous comment, but there seem to be some inconsistencies in what -ustrregexm/s()- returns when using a different .dta file. I will continue trying to see if I can figure out what is wrong with the way I applied the logic of daniel klein's function.
Hua Peng (StataCorp)

StataCorp Employee

#479

08 Sep 2022, 16:54

ustrregexm(), ustrregexs(), and ustrregexra() are not going to work for what you intend here. All three functions are implemented using the ICU library, which works with well-formed text strings in UTF-16 encoding, not binary data. Hence these functions convert Stata/Mata strings from UTF-8 to UTF-16 before handing them to ICU, and the results are converted back to UTF-8. Any invalid Unicode sequence will be changed.

Code:

mapelem = ustrregexm(test, ".*(<map>.*</map>).*")

will not work consistently, depending on whether test contains a line feed. The following modified regular expression should handle line feeds:

Code:

mapelem = ustrregexm(test, "(.|\n)*(<map>(.|\n)*</map>)(.|\n)*")

After that,

Code:

mapstr= ustrregexs(1)

will convert the byte sequence between <map> and </map> to UTF-16 and then back to UTF-8, so any invalid Unicode sequences will be changed during the conversion; that is, you will very likely get back something different. Here I believe strpos() and substr() should work:

Code:

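mata:
fh = fopen("auto.dta", "r")
test = fread(fh, 612)
fclose(fh)                      // release the file handle
// test = "ab<map>cd</map>ef"   // toy string for checking the logic
p1 = strpos(test, "<map>")      // byte position of "<map>"
p1
p2 = strpos(test, "</map>")     // byte position of "</map>"
len = p2 - p1 - 5               // bytes between the tags (5 = length of "<map>")
len
start = p1 + 5
b = substr(test, start, len)    // raw map bytes, no Unicode conversion involved
b
end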
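Building on the strpos()/substr() approach, one possible interim workaround for the 8-byte problem discussed in this thread is to read each map entry as two 4-byte unsigned halves with bufget() and recombine them. This is a sketch under stated assumptions: the function name dta_map_offsets() is hypothetical, it assumes a format 117 to 119 file, it relies on the "%4bu" format and the bufbyteorder() codes as I recall them from -help mf_bufio-, and the recombined value is exact only while offsets stay below 2^53. It reads the same 612 bytes used above; by my reading of -help dta-, the header plus <map> should fit in about 600 bytes.

```stata
mata:
// Hypothetical helper (not part of Stata or Mata): returns the 14 byte
// offsets stored in the <map> element of a format 117/118/119 .dta file.
real rowvector dta_map_offsets(string scalar fn)
{
    real scalar    fh, p, i, lo, hi, lsf
    real colvector C
    real rowvector off
    string scalar  head, b

    fh   = fopen(fn, "r")
    head = fread(fh, 612)      // header plus <map>
    fclose(fh)

    // <byteorder> holds LSF (little-endian) or MSF (big-endian)
    lsf = (substr(head, strpos(head, "<byteorder>") + 11, 3) == "LSF")

    // anchor on "</header><map>" (14 bytes) so that a dataset label
    // containing the text "<map>" is not mistaken for the element
    p = strpos(head, "</header><map>")
    if (p == 0) _error(610, "no <map> found; is this a format 117-119 file?")
    b = substr(head, p + 14, 112)                  // 14 entries x 8 bytes
    if (substr(head, p + 126, 6) != "</map>") _error(610, "unexpected <map> length")

    C = bufio()
    bufbyteorder(C, lsf ? 2 : 1)                   // 2 = LOHI, 1 = HILO

    off = J(1, 14, .)
    for (i = 1; i <= 14; i++) {
        // the low half comes first in LSF files, the high half in MSF files
        lo = bufget(C, b, 8*(i-1) + (lsf ? 0 : 4), "%4bu")
        hi = bufget(C, b, 8*(i-1) + (lsf ? 4 : 0), "%4bu")
        off[i] = hi * 2^32 + lo                    // exact while < 2^53
    }
    return(off)
}
end
```

Usage, assuming a file saved by Stata 14 or later in the working directory: mata: dta_map_offsets("auto.dta"). According to the spec, the first entry should be 0 and the second the offset of <map> itself, which gives a quick sanity check on the byte-order handling.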

Last edited by Hua Peng (StataCorp); 08 Sep 2022, 17:12.
wbuchanan

#480

09 Sep 2022, 07:06

Hua Peng (StataCorp)
Thanks again, as always, for the insight. That said, is there any chance that the buffer functions in Mata could be extended to parse 8-byte unsigned integers?