My query is not about a specific Mata problem but about program design. I am writing a set of Stata programs that rely heavily on Mata routines for the main calculations, while Stata handles the data management. The datasets being processed may be quite large, and I have found that copying variables between Stata and Mata with the putmata and getmata commands can take a considerable amount of time. Under some circumstances there can be memory problems too. Hence, I have switched to using views of variables created with st_view() rather than copies of the variables. This appears to save both time and memory.
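To make the distinction concrete, here is a minimal sketch of a copy versus a view. It assumes the auto example dataset and its variables price and mpg purely for illustration; they are not part of my actual program.

```stata
sysuse auto, clear

mata:
// Copy: X is a new Mata matrix holding its own copy of the two variables
X = st_data(., ("price", "mpg"))

// View: V refers to the Stata data in place, so no second copy is stored
st_view(V = ., ., ("price", "mpg"))

// Both have the same dimensions; only the storage differs
rows(X), cols(X)
rows(V), cols(V)
end
```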
But ... the st_view help information contains a warning that worries me (a lot). It says:
"Cautions when using views 3 ... For faster data access, an st_view() connection accesses data using variable indices, not variable names. However, variable indices can change when variables are created or removed. If a variable is created or removed while your code is using a view connection, there is a chance the view will switch to another variable."
If the "chance" were to happen, this would be disastrous, because neither I nor the program would have any way of knowing that it had happened, and the results would be completely misleading. The whole point of organising my program as I have is that Stata is more suitable for managing and storing lots of data, whereas Mata is more efficient for carrying out optimisations and other calculations - my program relies heavily on the Mata linear programming class. My questions are:
A. Has anyone actually had this happen, and if so, under what circumstances? Is it very rare or a serious potential issue?
B. For now, I have built in protection by redeclaring all operative views before calling any Mata routine, but this seems like overkill. I believe that st_view is efficient because it relies upon index manipulation, but nonetheless there are processing and memory costs. What is the happy medium? William Gould's Mata book provides very little guidance on views. For small problems this does not matter but I am writing code that may take 1 or 2 hours of CPU time to run in actual use, even with 4 or 8 cores.
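As a minimal sketch of the kind of protection described in B, and of a cheaper check that might stand in for it, consider the following. The function names view_by_name() and view_is_current() are hypothetical, the variables price and mpg are only placeholders, and the views are assumed to be over numeric variables. The idea is to record the variable indices when the view is created and to recreate the view only if the names no longer map to those indices. Note that this detects only a change in variable indices; I do not know whether that covers every case the help file has in mind.

```stata
mata:
// Hypothetical helper: (re)create a view by variable name and return the
// variable indices the names resolved to at that moment.
real rowvector view_by_name(real matrix V, string scalar varlist)
{
    string rowvector names

    names = tokens(varlist)
    st_view(V, ., names)
    return(st_varindex(names))
}

// Hypothetical check: returns 1 if the names still map to the recorded
// indices, 0 otherwise. _st_varindex() returns missing for a variable
// that no longer exists, so a dropped variable also yields 0.
real scalar view_is_current(string scalar varlist, real rowvector idx)
{
    return(_st_varindex(tokens(varlist)) == idx)
}
end
```

In use, the view would be set up once with V = . followed by idx = view_by_name(V, "price mpg"), and before each heavy routine the code would call view_is_current("price mpg", idx) and rebuild the view with view_by_name() only when that returns 0.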