  • User written ado -savespss- produces malformed data file in case of long value labels

    Dear Statalisters,
    (dear Sergiy Radyakin),

    I recently found a shortcoming of -savespss-: When a data set contains a single value label longer than 255 characters (bytes?), the SPSS data file produced seems to be malformed.
    I'm running -savespss- version 1.61 in Stata 13.1, fully patched. The problem exists both under Ubuntu 14.04 and Windows Server 2008 R2 environments.

    Consider the following Stata code to produce an SPSS file:
    Code:
    local length 256
    quietly {
        clear
        set obs 1
        generate variable=1
        label define VALLAB 1 `"`=`length'*"a"'"'
        label values variable VALLAB
    }
    capture : confirm file `"test.sav"'
    if (_rc==0) erase `"test.sav"'
    savespss `"test.sav"'
    The resulting SPSS file cannot be read. SPSS answers the GET FILE statement with the following error message:
    Code:
    GET FILE='test.sav'.
    >Error.  Command name: GET FILE
    >Invalid SPSS Statistics data file: test.sav (DATA1204)
    >Execution of this command stops.
     
    >Error # 1405 in column 8.  Text: test.sav
    >Error when attempting to get a data file.
    PSPP produces a more detailed error message:
    Code:
    GET FILE="test.sav".
    
    error: `test.sav' near offset 0x1f8: Variable index record (type
    4) does not immediately follow value label record (type 3) as it should.
    Note that as soon as you change the local `length' in the Stata code chunk to 255 or less, everything works as expected.
    Is this a known limitation of -savespss-? Has anyone else encountered the same problem? And is there any known workaround other than manually truncating all value labels to 255 characters?

    Kind regards
    Bela

  • #2
    Hi again,

    Originally posted by Daniel Bela
    Is this a known limitation of -savespss-? Has anyone else encountered the same problem?
    I forgot to check this yesterday, but now I have: IBM SPSS Statistics 22 allows only 120 characters in a single value label when it is created via the GUI. PSPP does not seem to have such a limitation, but after saving the data set and re-opening it with either IBM SPSS Statistics or GNU PSPP, labels seem to get truncated to 134 characters.

    So this seems to be a limitation of the SAV data file format, not of -savespss- itself.

    Nonetheless, IMHO -savespss- should either issue an error when encountering long value labels or force-truncate the respective strings.
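
    Until -savespss- does one of these itself, the force-truncation could be done by hand before exporting. The following is only a minimal sketch: the Mata helper truncate_vallab() and the local maxlen are hypothetical names, the 120-byte cap is an assumption based on the SPSS limit discussed in this thread, and the demo data simply repeats the reproduction from the first post.

```stata
* Demo data: one variable with a 256-character value label (as in post #1)
clear
set obs 1
generate variable = 1
label define VALLAB 1 `"`=256*"a"'"'
label values variable VALLAB

* Hypothetical helper: cut every text of one value label to maxlen bytes
capture mata: mata drop truncate_vallab()
mata:
void truncate_vallab(string scalar name, real scalar maxlen)
{
    real colvector   values
    string colvector text

    st_vlload(name, values, text)                        // read all value/text pairs
    st_vlmodify(name, values, substr(text, 1, maxlen))   // write back the cut texts
}
end

local maxlen 120                 // assumed cap; use 255 to target the format limit instead
quietly label dir                // r(names) holds all value labels in memory
local names `r(names)'
foreach lbl of local names {
    mata: truncate_vallab("`lbl'", `maxlen')
}

* After this, the export can be attempted as before:
* savespss `"test.sav"'
```

    Note that substr() counts bytes, so with multi-byte characters (Stata 14 or later) a label might be cut in the middle of a character; the sketch does not guard against that.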

    Any other thoughts, anyone?

    • #3
      Thank you Daniel,

      indeed, the Stata format is now somewhat richer than the SPSS format, with longer strings (2 bln > 32767) and longer value labels (65535 > 120). It also has more discrete missing values. However, it has shorter variable labels (80 vs 120 in SPSS) and can't label strings or non-integer numbers. It also has neither missing intervals (all values within a certain range), which are present in SPSS, nor labelled intervals (present in CSPro). Would users be interested in an article for the Stata Journal comparing data file formats? (Let me know by liking the post).

      Verification of the length of value labels was added in version 1.71, which was not posted to the web, so Daniel is using v1.61. Based on the documentation, the threshold is 120 characters. I have never encountered a file with value labels 134 characters long, but this might depend on the SPSS version. Early versions of SPSS had a 60-char limit, as documented online here:
      Release History

      Release 14.0
      • The maximum length of a value label is extended to 120 bytes (previous limit was 60 bytes).
      The homepage of -savespss- had a poll on versions of SPSS. No user has reported versions prior to 14. The poll is now closed.

      The file format itself allows value labels up to 255 characters long, so 120 is an artificial restriction introduced by the SPSS product, not by the file format. With the introduction of Unicode, this effectively means that about 80 letters could potentially be stored without changing the format, while about 40 letters can be stored now.
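
      The letter counts above can be reproduced with a little arithmetic. This sketch assumes up to 3 bytes per letter in UTF-8, which is what the figures seem to imply but is not stated explicitly; the local names are made up for illustration.

```stata
* Assumption: a letter takes up to 3 bytes in UTF-8
local bytes_per_letter 3
local format_limit 255    // bytes per value label allowed by the file format
local spss_limit   120    // bytes per value label accepted by SPSS itself

local letters_format = floor(`format_limit' / `bytes_per_letter')
local letters_spss   = floor(`spss_limit'   / `bytes_per_letter')

display "file format: up to `letters_format' letters"   // 85, i.e. "about 80"
display "SPSS today:  up to `letters_spss' letters"     // 40
```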

      The current development version 1.73 has more features that Daniel and other users may like, such as:
      • automatic date/time variables conversion to SPSS date/time format;
      • a clear description of which missing values are used for numeric variables;
      • support for strLs in Stata 13;
      • better dialogs;
      • another command to save only part of the data (often requested, though I have no idea why this is needed!?).
      More importantly, however, v1.71 should tentatively fix a bug with long strings (the support of long strings was clearly marked as experimental). The problem occurred only if you used savespss v1.51-1.61, only if you output long string variables, and only from Stata 13. The problem is not immediately obvious (unlike the issue that Daniel wrote about), because SPSS opens the file, but several characters in the middle of the long strings disappear.

      I am grateful to Daniel for his attention to detail and his efforts in determining the source of the problem.
      I have sent him the link to version 1.73, which, if no other problems are found, will be submitted to SSC.

      Spot another bug in any of my programs? Let me know!

      Best, Sergiy Radyakin
      Last edited by Sergiy Radyakin; 25 Jul 2014, 14:27.
