Interpreting LOWESS Plots

April Kimm

Join Date: Mar 2021

Posts: 45
#1

10 Feb 2022, 04:03

These graphs are logit-transformed LOWESS smoothed curves. I am having a hard time interpreting the plots because they are presented in terms of log odds and all values are below zero.

Could anyone please help me identify the relationships (linear or nonlinear) and tell me if I need functional forms to grasp the structure of the relationships?

Thank you very much.
Attached Files

Last edited by April Kimm; 10 Feb 2022, 04:16.
Nick Cox

Join Date: Mar 2014

Posts: 35211
#2

10 Feb 2022, 04:21

You have some very small probabilities there. For example, a logit of -5 already corresponds to a probability below 0.01.

Code:

. mata : (0, -5, -10, -15)' , invlogit((0, -5, -10, -15)')
                 1             2
    +-----------------------------+
  1 |            0            .5  |
  2 |           -5   .0066928509  |
  3 |          -10   .0000453979  |
  4 |          -15   3.05902e-07  |
    +-----------------------------+

See mylabels on SSC for one way to label the y axis in terms of probabilities. See also https://www.stata-journal.com/articl...article=gr0032
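For anyone who wants to try probability labels without installing anything, here is a minimal sketch using only official Stata. It is not the mylabels approach itself: it places axis labels by hand at logit(p) and shows p as the label text. The data are simulated, the variable names y and time are made up, and the label positions 0.01, 0.02, 0.05 and 0.1 are arbitrary choices. Run it from a do-file because of the /// line continuations.

```stata
* Simulated rare binary outcome against time (all values are made up)
clear
set seed 2022
set obs 2000
gen time = 50 + mod(_n, 71)
gen byte y = runiform() < invlogit(-4 + 0.02 * (time - 50))

* A logit of -5 corresponds to a probability of about 0.0067
display invlogit(-5)

* Logit-scale lowess, with the y axis labelled in probabilities
lowess y time, logit ///
    ylabel(`=logit(0.01)' "0.01" `=logit(0.02)' "0.02" ///
           `=logit(0.05)' "0.05" `=logit(0.1)' "0.1", angle(horizontal)) ///
    ytitle("Probability (logit scale)")
```

The curve is unchanged; only the axis text differs, so the shape can still be judged on the logit scale while the values are read as probabilities.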

April Kimm
#3

10 Feb 2022, 04:30

Thank you very much for the quick response. I think the data show small probabilities because these are rare events. But I still need to analyze the relationships.

Could anyone please help me identify if I need functional forms or splines to capture the relationships?

Thank you again.

Last edited by April Kimm; 10 Feb 2022, 04:36.

Nick Cox
#4

10 Feb 2022, 05:16

Indeed; they are small for the best reasons.

Splines may or may not work better. The drop-off around 115 days looks quirky to me. I would want to get nearer to the data to understand that, not further away by fitting a function locally or globally, but the two impulses are different.

April Kimm
#5

12 Feb 2022, 03:50

I am still struggling with the Lowess plots. Can anyone tell me if I can estimate the above patterns using piecewise regressions with knots?

Thank you in advance.

Nick Cox
#6

12 Feb 2022, 04:55

As I understand it, you have three binary responses for terrorism (guerrilla activity? something else) versus time in days (or is it some other time units; principles are the same), so that the data for each variable could be reduced to a table of each outcome 0 or 1 as columns and time roughly 50(1)120. Hence you could show us such a table as in effect a data example. Are these panel or longitudinal data?

You could use splines in a regression. The questions are (1) what kind, (2) where to place the knots, and (3) whether they are going to work well (e.g. whether you can avoid artefacts such as the spline going negative). Difficulties with handling overall probabilities near 0 or near 1 are precisely why the variant smoother you've used was introduced. I don't have particular recommendations there, as spline choice seems to me a difficult art.
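As a rough sketch of what a piecewise (linear spline) regression with knots could look like in Stata, here is one possibility on simulated data. The knots at 80 and 100 are arbitrary and only for illustration, and the variable names y, time, t1, t2, t3, phat and prop are made up. Fitting the spline inside a logit model is one way around the concern that a fitted curve goes negative, because fitted probabilities are then bounded by 0 and 1.

```stata
* Simulated data: time 50..120, rare binary outcome (values are made up)
clear
set seed 2022
set obs 7100
gen time = 50 + mod(_n, 71)
gen byte y = runiform() < invlogit(-4 + 0.02 * (time - 50))

* Linear spline with knots at 80 and 100 (arbitrary choices)
mkspline t1 80 t2 100 t3 = time

* Piecewise linear in the log odds; each coefficient is the slope in its segment
logit y t1 t2 t3
predict phat, pr

* Compare fitted probabilities with the raw proportion at each time
egen prop = mean(y), by(time)
twoway (connected prop time, sort) (line phat time, sort)
```

Whether two knots, these positions, or a linear rather than cubic spline suit the real data is exactly the judgement call described above; the sketch only shows the mechanics.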

I see fairly flat curves here, but it's not my field. Either the downturn in two graphs around time 115 is an interesting finding to be interpreted substantively, or it's a quirk because there are so few data points for the highest values.
Rich Goldstein

Join Date: Mar 2014

Posts: 4408
#7

12 Feb 2022, 05:36

Lowess plots that do not include the data (i.e., a scatter plot) are hard to interpret because, especially in the tails, changes in the plot can be driven by a very small amount of data. For some reason, the "logit" option suppresses the scatter plot; you can either add it back by overlaying a scatter plot or drop the logit option.

Nick Cox
#8

12 Feb 2022, 06:01

Rich Goldstein: I am in full agreement that I would like to see the data too, as expressed in #4 and #6. But if the data are 0s and 1s, a scatter plot is often not informative, even with jittering. However, here 1s are in a tiny minority, so it would be helpful to know when they occur.

I'd like to see plots like, say,

Code:

egen mean = mean(terrorism), by(time)
egen count = count(terrorism), by(time)
twoway connected mean time, sort name(G1)
twoway connected count time, sort name(G2)
graph combine G1 G2, col(1) xcommon

as those two reductions are naturally equivalent to the original data. It is important to see any gaps too.
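A small sketch of what those two reductions contain may help when reading the graphs. It uses simulated data: the names terrorism and time follow the code above, while events and tag are made-up names. The point is that egen's count() gives the number of nonmissing observations at each time, not the number of 1s; the number of 1s is the total, which equals mean times count.

```stata
* Simulated data: 30 time points, 100 observations each (values are made up)
clear
set seed 12345
set obs 3000
gen time = 1 + mod(_n, 30)
gen byte terrorism = runiform() < 0.01

egen mean = mean(terrorism), by(time)
egen count = count(terrorism), by(time)

* Number of events (1s) at each time, for comparison with count
egen events = total(terrorism), by(time)

* One row per time: count is 100 throughout, events is much smaller
egen tag = tag(time)
sort time
list time count events mean if tag, noobs
```

So a count far larger than the number of events is expected whenever many observations share the same time value.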

April Kimm
#9

13 Feb 2022, 04:17

Yes, I have three binary dependent variables and the data are monthly panel data. For the graphs above, the big drop-off around 115 days could in part be attributed to the decrease in available information sources as the U.S. combat troops began to leave South Vietnam.

I created the plots as suggested, but the count plots do not make sense to me because the count for the dependent variables suggests that more than 15000 incidents happened at some points. According to the data, the maximum number of conventional war incidents was 28 (please see the attachment).

I am having a hard time interpreting the plots.
Attached Files

conv_WAR.docx (27.8 KB, 1 view)

Nick Cox

#10

13 Feb 2022, 04:34

It seems that your date variable is malformed, running 6701, ..., 6712, 6801, ..., 6812, and so on. That is readable to you as 1967 January, ..., 1967 December, 1968 January, and so on, but Stata sees only 11 gaps of 1 and 1 gap of 89 in any interval 100 long and takes what you give it literally.

Note that assigning a date display format can't make sense out of this.

Anything based on these "dates" -- sorry for the bad news -- is essentially useless until redone.

This script illustrates the problem and a solution.

Code:

. clear

. set obs 24
Number of observations (_N) was 0, now 24.

. gen problem = cond(_n <= 12, 6800 + _n, 6900 + _n - 12)

. gen mdate = ym(1900 + floor(problem/100), mod(problem, 100))

. format mdate %tm

. list, sep(12)

     +-------------------+
     | problem     mdate |
     |-------------------|
  1. |    6801    1968m1 |
  2. |    6802    1968m2 |
  3. |    6803    1968m3 |
  4. |    6804    1968m4 |
  5. |    6805    1968m5 |
  6. |    6806    1968m6 |
  7. |    6807    1968m7 |
  8. |    6808    1968m8 |
  9. |    6809    1968m9 |
 10. |    6810   1968m10 |
 11. |    6811   1968m11 |
 12. |    6812   1968m12 |
     |-------------------|
 13. |    6901    1969m1 |
 14. |    6902    1969m2 |
 15. |    6903    1969m3 |
 16. |    6904    1969m4 |
 17. |    6905    1969m5 |
 18. |    6906    1969m6 |
 19. |    6907    1969m7 |
 20. |    6908    1969m8 |
 21. |    6909    1969m9 |
 22. |    6910   1969m10 |
 23. |    6911   1969m11 |
 24. |    6912   1969m12 |
     +-------------------+
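The gaps described above can be checked directly. This is a small sketch on the same example dataset; gap_raw and gap_fixed are made-up names. Differences between successive raw values include a jump of 89 at the year boundary, while differences between successive monthly dates are all 1.

```stata
* Rebuild the example dataset from the script above
clear
set obs 24
gen problem = cond(_n <= 12, 6800 + _n, 6900 + _n - 12)
gen mdate = ym(1900 + floor(problem/100), mod(problem, 100))
format mdate %tm

* Successive differences: raw values versus proper monthly dates
gen gap_raw = problem - problem[_n-1]
gen gap_fixed = mdate - mdate[_n-1]

* gap_raw shows 22 gaps of 1 and 1 gap of 89; gap_fixed is always 1
tabulate gap_raw
tabulate gap_fixed
```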

I can't as yet help with your other puzzlements. If you have monthly data, where does an interpretation in terms of 115 days come from?

Please see https://www.statalist.org/forums/help#stata for advice about attachments. MS Word attachments are more or less problematic for very many members here and you are asked not to post them.

Last edited by Nick Cox; 13 Feb 2022, 04:37.


April Kimm
#11

15 Feb 2022, 11:02

I am trying to generate an appropriate monthly date variable. The original values of the year-month variable run from 6701 to 7212.
I revised the command from above, but the code below is not working. Stata says the following command is invalid: "gen problem = cond(_n <= 12, 6700 +_n, 6800 + _n, 6900 + _n, 7000 +_n, 7100 + _n, 7200 + _n - 12)".

clear .

set obs 66
Number of observations (_N) was 0, now 66.

gen problem = cond(_n <= 12, 6700 +_n, 6800 + _n, 6900 + _n, 7000 +_n, 7100 + _n, 7200 + _n - 12) .

gen mdate = ym(1900 + floor(problem/100), mod(problem, 100))

format mdate %tm

Nick Cox

#12

15 Feb 2022, 11:43

You don't have to create a problem variable; you have one already. I created an example dataset, because you haven't given one yet.

This code creating mdate should work generally for years 1967 onwards (in the 20th century). The fact that the example data in #10 are for 1968 and 1969 only doesn't bite.

I take it that your variable with values like 6701 to 7212 is called month. Here's a demo that the code will work generally.

Code:

 
* the first few commands are for me or anyone else to set up a dataset 
. clear

. set obs 12
Number of observations (_N) was 0, now 12.

. gen month = 100 * (67 + ceil(_n/2)) + cond(mod(_n, 2), 3, 9)

* so you should have a variable a bit like this 
. l month

     +-------+
     | month |
     |-------|
  1. |  6803 |
  2. |  6809 |
  3. |  6903 |
  4. |  6909 |
  5. |  7003 |
     |-------|
  6. |  7009 |
  7. |  7103 |
  8. |  7109 |
  9. |  7203 |
 10. |  7209 |
     |-------|
 11. |  7303 |
 12. |  7309 |
     +-------+

* and this is what to do 
. gen mdate = ym(1900 + floor(month/100), mod(month, 100))

. format mdate %tm

. list, sep(2)

     +----------------+
     | month    mdate |
     |----------------|
  1. |  6803   1968m3 |
  2. |  6809   1968m9 |
     |----------------|
  3. |  6903   1969m3 |
  4. |  6909   1969m9 |
     |----------------|
  5. |  7003   1970m3 |
  6. |  7009   1970m9 |
     |----------------|
  7. |  7103   1971m3 |
  8. |  7109   1971m9 |
     |----------------|
  9. |  7203   1972m3 |
 10. |  7209   1972m9 |
     |----------------|
 11. |  7303   1973m3 |
 12. |  7309   1973m9 |
     +----------------+


April Kimm
#13

15 Feb 2022, 12:07

It worked very well. Thank you so much for your help. I appreciate it!

April Kimm
#14

16 Feb 2022, 11:12

I plotted the graphs following the commands in #8, but I am not sure how to interpret them. For example, the count plots do not make sense to me because the count for the dependent variables suggests that approximately 15000 incidents happened at some points. However, the maximum number of conventional war incidents, for example, was 28.
Attached Files

Nick Cox
#15

16 Feb 2022, 11:41

Sorry, but it is hard to comment without knowing more. You seem to have a big gap in the middle round about early 1969.