Comparing correlation coefficients between two continuous variables by the third categorical variable

Pankaj Pokhrel

Join Date: May 2021

Posts: 38
#1

01 Apr 2024, 13:48

I want to compare the correlation between income rank and risk rank across the different vegetables grown. Income and risk are both coded from 1 to 5, with 1 being the highest level of income or risk and 5 the lowest. I want to examine the income vs risk relationship for each vegetable, and I would like a table with the vegetables in one column and the correlation coefficient between income and risk in the other. I could use the bysort prefix to run the correlation separately for each vegetable, but the list of vegetables is very long, and afterwards only the stored results for the last subset are available. Could anybody kindly suggest how this could be done? Thanks!!
Clyde Schechter

Join Date: Apr 2014

Posts: 29953
#2

01 Apr 2024, 14:29

I think the simplest way to do that, assuming your data look as I imagine them to, is:

Code:

rangestat (corr) income risk, by(vegetable) interval(risk . .)

-rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.
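
As a minimal sketch of what that command leaves behind and how to turn it into a one-row-per-vegetable table: the toy data below are made up purely for illustration, and the variable name one_per_veg is my own choice. To my understanding, -rangestat (corr)- adds two new variables, corr_nobs and corr_x, to every observation.

```stata
* Toy data, made up for illustration only (not the poster's data).
clear
input byte vegetable byte income byte risk
1 1 1
1 2 2
1 3 3
2 1 3
2 2 2
2 3 1
end

* Requires rangestat from SSC: ssc install rangestat
* interval(risk . .) leaves both bounds open, so the "window" for
* each observation is its whole by() group.
rangestat (corr) income risk, by(vegetable) interval(risk . .)

* corr_nobs (pairs used) and corr_x (the correlation) are repeated on
* every row of a vegetable; flag one row per vegetable and list those.
egen byte one_per_veg = tag(vegetable)
list vegetable corr_nobs corr_x if one_per_veg, noobs
```

Because the interval is open at both ends, the choice of key variable in -interval()- should not affect the result here; it only has to be a numeric variable.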

If this does not do what you want, when posting back, please supply more information. Most important, show example data, and use the -dataex- command to do it. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Then show the exact code you used (including any modifications of what was proposed here) and show the results you got. Then, unless it is blatantly obvious, explain how what you got differs from what you wanted.

Last edited by Clyde Schechter; 01 Apr 2024, 14:32.

Pankaj Pokhrel

Join Date: May 2021

Posts: 38
#3

01 Apr 2024, 23:37

Hi Clyde,
Thank you for letting me know about the rangestat command. However, my dataset has more than one observation for each vegetable type. Observations are nonetheless uniquely identified by farmer_id and vegetable. Below is an example dataset.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int farmer_id float income byte risk float vegetable
10305 2 3  5
11004 5 4 10
11202 5 4  5
11501 1 2  1
11901 3 2  2
12801 4 2 28
13305 5 2 23
20201 1 2  1
20204 5 4 10
20206 5 2  5
20302 5 4 23
20413 3 2 29
30102 4 3  3
30208 1 2  1
30410 2 3  3
end
label values risk A1_3
label def A1_3 2 "High Risk", modify
label def A1_3 3 "Medium Risk", modify
label def A1_3 4 "Low Risk", modify
label values vegetable A1_2
label def A1_2 1 "Tomato", modify
label def A1_2 2 "Potato", modify
label def A1_2 3 "Cauliflower", modify
label def A1_2 5 "Cabbage", modify
label def A1_2 10 "Beans", modify
label def A1_2 23 "Pumpkin", modify
label def A1_2 28 "Brinjal/Eggplant/Aubergine", modify
label def A1_2 29 "Corriander", modify


Clyde Schechter

Join Date: Apr 2014

Posts: 29953
#4

02 Apr 2024, 10:08

However, my dataset has more than one observation for each vegetable type.

Well, you need to think carefully about exactly what correlations you want to calculate. If you wish to correlate risk and income separately for each vegetable, and including the same farmer more than once is OK in this calculation, then the code shown in #2 is fine. If you wish a separate correlation for each farmer and vegetable, just change the -by()- option to -by(farmer_id vegetable)-.
Pankaj Pokhrel

Join Date: May 2021

Posts: 38
#5

03 Apr 2024, 02:24

Thank you Clyde. I figured it out. The way is to calculate the statistic (here, the correlation) with a key variable whose lower and upper bounds in the interval() option are both left missing (. .), so that every observation in the by() group is used.
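
For anyone who would rather stay with official commands, a different technique from the one in #2 is the -statsby- prefix, which collects the results -correlate- stores for each group and addresses the problem of only the last subset's results surviving. This is only a sketch under assumptions: the toy data are made up for illustration, and the result names rho and n are my own.

```stata
* Toy data, made up for illustration only (not the data from #3).
clear
input byte vegetable byte income byte risk
1 1 1
1 2 2
1 3 3
2 1 3
2 2 2
2 3 1
end

* Run -correlate- once per vegetable and keep r(rho) and r(N).
* The clear option replaces the data in memory with one row
* per vegetable, which is the table asked for in #1.
statsby rho=r(rho) n=r(N), by(vegetable) clear: correlate income risk

list vegetable n rho, noobs
```

In the 15-observation excerpt in #3, nearly every vegetable has either a single observation or no variation in income or risk, so most of the correlations would be expected to come out missing with either approach.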