  • Marginsplot at defined values

    Hi,

    I am running Stata 18 on Windows 10. The aim is to model the impact of filament length on failure. A dataex extract is pasted below, containing fail (0 = no fail, 1 = fail), length (mm) and weeks (age in weeks). There are numerous approaches to analysing these data. Here I am focusing on treating the continuous variable length as a categorical variable, because I stumbled on Stata output that I don’t understand. Code and questions appear below.

    Approach 1. Cut length into quartiles, generating a new variable “length_cat” and treat these as categorical.

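    xtile length_cat = length, nq(4)
    // fit the model with length_cat as a factor variable before calling margins
    logistic fail i.length_cat weeks
    margins length_cat
    marginsplot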


    So far so good.

    Approach 2. I don’t maintain this is in any way sensible, but suppose we keep length as quartiles, and treat “length_cat” as continuous (by omitting i.), noting the possible values of “length_cat” and the corresponding mean length of each level of length_cat:

    codebook length_cat
    bysort length_cat: su length


    Plot the predicted margins, specifying the values of the continuous variable at which to report them.

    // A.
    logistic fail length_cat weeks
    margins, at(length_cat=(1 2 3 4))
    marginsplot


    No questions here: the association is constrained to a monotonic relationship.

    //B.
    qui logistic fail length_cat weeks
    margins, at(length=(4.74 6.41 7.80 10.38))
    marginsplot


    How is it that Stata plots this, given that the variable “length” is not in the regression? If I change the variable name “length” to “mystery”, Stata understandably complains that “mystery” is not in the list of covariates and produces no margins.

    //C.
    qui logistic fail length_cat weeks
    margins, at(length =(1(2)14))
    marginsplot


    Same question as for B.

    //D.
    qui logistic fail length_cat weeks
    margins, at(length_cat=(1(2)14))
    marginsplot


    What has Stata plotted here? The variable “length_cat” only takes the values 1, 2, 3, 4. (I’m guessing the answer to this question will partly explain B. and C.)

    Thank you!

    Janine

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(partID length weeks) long fail float(pop totfail prop_fail) byte length_cat
      1 1.8 75 0  1 0         0 1
      2   2 71 1  1 1         1 1
      3 2.1 60 0  1 0         0 1
      4 2.2 48 0  1 0         0 1
      5 2.3 72 0  1 0         0 1
      6 2.4 26 0  1 0         0 1
      7 2.6 63 0  3 0         0 1
      8 2.6 45 0  3 0         0 1
      9 2.6 50 0  3 0         0 1
     10 2.7 73 1  3 2  .6666667 1
     11 2.7 69 0  3 2  .6666667 1
     12 2.7 70 1  3 2  .6666667 1
     13 2.8 43 0  7 1 .14285715 1
     14 2.8 68 0  7 1 .14285715 1
     15 2.8 41 0  7 1 .14285715 1
     16 2.8 74 0  7 1 .14285715 1
     17 2.8 60 0  7 1 .14285715 1
     18 2.8 67 1  7 1 .14285715 1
     19 2.8 36 0  7 1 .14285715 1
     20 2.9 69 0 10 1        .1 1
     21 2.9 53 0 10 1        .1 1
     22 2.9 66 1 10 1        .1 1
     23 2.9 56 0 10 1        .1 1
     24 2.9 70 0 10 1        .1 1
     25 2.9 48 0 10 1        .1 1
     26 2.9 52 0 10 1        .1 1
     27 2.9 60 0 10 1        .1 1
     28 2.9 55 0 10 1        .1 1
     29 2.9 65 0 10 1        .1 1
     30   3 35 0 10 3        .3 1
     31   3 52 1 10 3        .3 1
     32   3 63 1 10 3        .3 1
     33   3 62 0 10 3        .3 1
     34   3 67 0 10 3        .3 1
     35   3 52 0 10 3        .3 1
     36   3 67 0 10 3        .3 1
     37   3 16 0 10 3        .3 1
     38   3 42 0 10 3        .3 1
     39   3 62 1 10 3        .3 1
     40 3.1 64 0 12 5  .4166667 1
     41 3.1 68 0 12 5  .4166667 1
     42 3.1 32 0 12 5  .4166667 1
     43 3.1 51 0 12 5  .4166667 1
     44 3.1 70 0 12 5  .4166667 1
     45 3.1 76 1 12 5  .4166667 1
     46 3.1 67 1 12 5  .4166667 1
     47 3.1 63 1 12 5  .4166667 1
     48 3.1 75 1 12 5  .4166667 1
     49 3.1 63 0 12 5  .4166667 1
     50 3.1 59 0 12 5  .4166667 1
     51 3.1 66 1 12 5  .4166667 1
     52 3.2 69 1  5 2        .4 1
     53 3.2 41 0  5 2        .4 1
     54 3.2 67 1  5 2        .4 1
     55 3.2 37 0  5 2        .4 1
     56 3.2 63 0  5 2        .4 1
     57 3.3 61 0 15 4 .26666668 1
     58 3.3 57 1 15 4 .26666668 1
     59 3.3 33 1 15 4 .26666668 1
     60 3.3 59 0 15 4 .26666668 1
     61 3.3 47 0 15 4 .26666668 1
     62 3.3 60 0 15 4 .26666668 1
     63 3.3 38 0 15 4 .26666668 1
     64 3.3 34 0 15 4 .26666668 1
     65 3.3 62 0 15 4 .26666668 1
     66 3.3 76 1 15 4 .26666668 1
     67 3.3 68 0 15 4 .26666668 1
     68 3.3 62 0 15 4 .26666668 1
     69 3.3 78 1 15 4 .26666668 1
     70 3.3 33 0 15 4 .26666668 1
     71 3.3 66 0 15 4 .26666668 1
     72 3.4 68 0 17 5 .29411766 1
     73 3.4 65 1 17 5 .29411766 1
     74 3.4 56 0 17 5 .29411766 1
     75 3.4 42 1 17 5 .29411766 1
     76 3.4 64 0 17 5 .29411766 1
     77 3.4 68 0 17 5 .29411766 1
     78 3.4 57 0 17 5 .29411766 1
     79 3.4 69 0 17 5 .29411766 1
     80 3.4 74 1 17 5 .29411766 1
     81 3.4 66 0 17 5 .29411766 1
     82 3.4 64 0 17 5 .29411766 1
     83 3.4 56 1 17 5 .29411766 1
     84 3.4 60 0 17 5 .29411766 1
     85 3.4 50 0 17 5 .29411766 1
     86 3.4 80 0 17 5 .29411766 1
     87 3.4 50 0 17 5 .29411766 1
     88 3.4 48 1 17 5 .29411766 1
     89 3.5 48 0 22 9  .4090909 1
     90 3.5 59 0 22 9  .4090909 1
     91 3.5 62 0 22 9  .4090909 1
     92 3.5 57 0 22 9  .4090909 1
     93 3.5 59 1 22 9  .4090909 1
     94 3.5 75 1 22 9  .4090909 1
     95 3.5 34 0 22 9  .4090909 1
     96 3.5 66 0 22 9  .4090909 1
     97 3.5 69 1 22 9  .4090909 1
     98 3.5 77 1 22 9  .4090909 1
     99 3.5 74 1 22 9  .4090909 1
    100 3.5 50 1 22 9  .4090909 1
    end
    label values fail faillabel
    label def faillabel 0 "no fail", modify
    label def faillabel 1 "fail", modify

  • #2
    I am not sure what is going on in B and C. Stata should give an error message complaining that length is not in the regression. But I think what may be happening is that Stata is applying variable abbreviation here, and is interpreting length as an abbreviation for length_cat, which is a variable in the regression. This would even be appropriate behavior if there weren't also a variable whose exact name is length. But given that there is such a variable, Stata should not be allowing you to abbreviate length_cat to length here. I would consider this behavior a bug.

    As for D, there is no mystery here. Within the -at()- option you can specify any values you want for a variable; they don't have to be within the observed range of the variable in the data set. (Mind, applying -margins, at()- with values outside the range of the observed data is usually a bad idea, but it is legal to do.) What Stata does when you run -margins, at()- is, in effect, create new observations with the -at()- values replacing the actual values of the -at()- variables, and then apply the -predict- command to calculate the expected outcome conditional on those -at()- values. This kind of calculation is not at all constrained by the actual observed values in the data.
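
The description above can be checked by hand. The following is only a sketch under stated assumptions: it uses Stata's shipped auto data rather than the thread's data, with rep78 standing in for length_cat (an integer code entered as continuous), and the scalar names m_margins and m_hand are made up for illustration.

```stata
* Sketch only: auto.dta stands in for the thread's data (assumption).
sysuse auto, clear
quietly logistic foreign rep78 mpg

* rep78 only takes the values 1 to 5, yet at(rep78=7) is legal
margins, at(rep78=7)
matrix M = r(b)
scalar m_margins = M[1,1]

* By hand: set rep78 to 7 in every estimation-sample observation,
* predict the probability, and average the predictions
preserve
replace rep78 = 7 if e(sample)
predict double phat if e(sample), pr
summarize phat, meanonly
scalar m_hand = r(mean)
restore

display "margins: " m_margins "   by hand: " m_hand
```

If the two displayed numbers agree, that is consistent with -margins- simply averaging predictions at the substituted value, whether or not that value occurs in the data.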



    • #3
      Very illuminating, thank you Clyde!



      • #4
        The at() option parses variable names relative to the column stripe of e(b) instead of the variable names in the current dataset. If you have variable abbreviations on (the default for set varabbrev), then margins' parsing code will accept abbreviations in option at(). In the above example, length is a non-ambiguous abbreviation for length_cat since variable length is not among the independent variables in the currently fitted model.
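
A minimal sketch of this behaviour, under stated assumptions: it uses Stata's shipped auto data rather than the thread's data, and weight_cat is a hypothetical stand-in for length_cat, so that weight is in the dataset but not in the model. The outcome with abbreviation off is what the explanation above implies, not something verified here, hence the -capture noisily-.

```stata
* Sketch only: auto.dta and weight_cat are stand-ins (assumption).
sysuse auto, clear
xtile weight_cat = weight, nq(4)
quietly logistic foreign weight_cat mpg

* The column stripe of e(b): the names that at() is matched against
local stripe : colnames e(b)
display "`stripe'"

* weight is not in the stripe, so with abbreviation on it is
* expected to resolve to weight_cat
set varabbrev on
margins, at(weight=(1 2 3 4))

* With abbreviation off, the same call should be rejected
set varabbrev off
capture noisily margins, at(weight=(1 2 3 4))
display "return code with varabbrev off: " _rc

* Restore the default setting
set varabbrev on
```

Writing the full name, at(weight_cat=(1 2 3 4)), avoids relying on abbreviation at all.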



        • #5
          "The at() option parses variable names relative to the column stripe of e(b) instead of the variable names in the current dataset."
          That's good to know. Does that appear in the documentation anywhere? If so, I missed it. If not, it should probably be added.



          • #6
            Jeff,

            Thanks for your contribution!

            I take this as a warning to turn variable abbreviation off in the case that one has similarly named variables.

            I wonder how ubiquitous this parsing method is among Stata commands and options.



            • #7
              Here are some of the commands and elements where Stata parses variable names (varlists) relative to the column stripe elements in e(b).

              margins marginlist

              margins options dydx(), dyex(), eydx(), and eyex()

              test coeflist

              testparm varlist

              lincom exp

              pwcompare marginlist

              contrast termlist
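
To illustrate two entries from the list, here is a hedged sketch using Stata's shipped auto data, with weight_cat as a hypothetical stand-in for length_cat. The abbreviated calls are wrapped in -capture noisily- because their success depends on the varabbrev setting and is inferred from the list above rather than verified here.

```stata
* Sketch only: auto.dta and weight_cat are stand-ins (assumption).
sysuse auto, clear
xtile weight_cat = weight, nq(4)
quietly logistic foreign weight_cat mpg

* With varabbrev on, "weight" is expected to be read as the
* coefficient on weight_cat, because weight is not in e(b)
capture noisily test weight
capture noisily lincom weight

* Spelling the name in full does not depend on abbreviation
test weight_cat
lincom weight_cat
```

Comparing the output of the abbreviated and fully spelled calls shows whether both refer to the same coefficient.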
