Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Batch Import SAS Value Labels

    I am working with a Stata formatted dataset from an organization that sent me the value labels in SAS format. I have Stata data in one file and then a separate Excel spreadsheet that has four columns: value label, value name, data type (not useful for Stata) and current data field value. The variable names that use those value labels are not included in that spreadsheet.

    I am trying to create a .do file that will generate the value labels I want from the Excel sheet, so that I can label the variables with the appropriate values.
    For each label in column 2, label define based on columns 4 and 1. Thus, for the table example below:
    label define ABO 1 "O" 2 "A" 3 "B" 4 "AB"
    label define ABOMAT 1 "Identical" 2 "Compatible" 3 "Incompatible"
    And then assign them to the .dta file?

    I appreciate any advice!
    Thanks,
    Sarah
    Data Field Formatted Value SAS ANALYSIS FORMAT DATA TYPE Data Field Value
    O ABO C 1
    A ABO C 2
    B ABO C 3
    AB ABO C 4
    Identical ABOMAT C 1
    Compatible ABOMAT C 2
    Incompatible ABOMAT C 3

  • #2
    This should get you well on your way. Good luck sorting out which variables go with which formats; you didn't give any info on that aspect.
    Code:
    clear
    
    *========I'M ASSUMING RAW TEXT WITH NO HEADER ROW.   ADJUST ACCORDINGLY TO IMPORT EXCEL
    *========AND NAMES OF VARIABLES IF IT HAS A HEADER ROW
    import delimited C:\data\trash1\sasformats.txt, delimiter(space)
    
    *========PUT SECOND AND FOURTH COLUMNS (VALUE LABEL AND NUMERIC VALUE) FIRST
    order v2 v4
    *========DATA TYPE IRRELEVANT, DROP IT
    drop v3
    
    *========WRAP TEXT FOR LABEL IN QUOTATION MARKS; CHAR(34) IS A DOUBLE QUOTE
    replace v1=char(34)+v1+char(34)
    
    *========GET RID OF REPEATS IN VALUE LABEL NAME COLUMN, AND TACK ON "label define "
    bysort v2: replace v2="" if _n!=1
    replace v2="label define " + v2 if length(trim(v2))>0
    
    *=========ADD CONTINUATION MARKER WHERE NECESSARY
    replace v1=v1+" ///" if length(v2[_n+1])==0 & _n<_N
    
    *=========OUTPUT RESULTS!
    outfile using "C:\data\trash1\mylabels.do", noquote replace
    Last edited by ben earnhart; 15 Dec 2014, 18:40.

    Comment


    • #3
      Also, see http://www.stata.com/statalist/archi.../msg00349.html .
      David Radwin
      Senior Researcher, California Competes
      californiacompetes.org
      Pronouns: He/Him

      Comment


      • #4
        Brilliant, thanks so much! I tweaked it a bit to make it work.
        For anyone finding this thread in the future, for UNOS STAR File SAS data conversion to Stata, this should get you started. Note that C type variables (with string values) have to be dropped, but these usually don't need value labels anyway, they are self-explanatory.

        order SASANALYSISFORMAT DataFieldValue DataFieldFormattedValue
        replace DataFieldFormattedValue = char(34)+DataFieldFormattedValue+char(34)
        bysort SASANALYSISFORMAT DataFieldValue: replace DataFieldValue="" if _n!=1
        replace DataFieldValue ="." if DataFieldValue ==""
        replace DataFieldValue ="7777777" if DataFieldValue =="**OTHER**"
        replace DataFieldValue ="9999999" if DataFieldValue =="Null or Missing"
        drop if DATATYPE=="C"
        destring DataFieldValue, replace
        drop DATATYPE
        replace SASANALYSISFORMAT ="label define "+SASANALYSISFORMAT if length(trim(SASANALYSISFORMAT))>0
        gen str add = ", add"
        outfile using "Outfile.do", noquote replace wide

        Comment

        Working...
        X