Thesis data analysis - no observations/error

Jacopo Bertotti

Join Date: Jun 2022

Posts: 5
#1

22 Jun 2022, 13:14

Hello everybody, I am currently writing my thesis to finish my bachelor's in business economics. However, I am struggling with the data analysis. In my data, participants are randomly assigned to either condition 1 or condition 2 (each consisting of only one question). Consequently, in the columns for these questions some respondents have an answer and others don't (respondents answer the question in only one condition, not both). When I run a one-way ANOVA, Stata reports 'no observations', and when I compare means it reports error r(2000). I guess it has something to do with the missing data in the columns of both questions, but I am not sure. Could somebody help me out?

Thanks in advance!
Clyde Schechter

Join Date: Apr 2014

Posts: 29958
#2

22 Jun 2022, 13:34

It is hard to be certain from just a general description, without seeing actual example data and the actual code used and outputs received from Stata. But, if you have a dichotomous treatment condition, and you are then trying to look at differences in responses to a question between groups, and the question is such that only one group responds to it, then you will get only error messages. If you think about it, it isn't even meaningful to speak of between-group differences between responses if one group has no responses! So probably you need to think a bit longer and more clearly about what your research goals are and what analyses will fulfill them.

But it would probably be best if you post back showing example data (using the -dataex- command, please) and showing the actual code and output that you are concerned about. That way we can make sure that there isn't something else going wrong here. In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.

Jacopo Bertotti

Join Date: Jun 2022

Posts: 5
#3

23 Jun 2022, 04:25

Thanks a lot for your response, Clyde Schechter. My research question is: what is the effect of price exposure on purchase intention, and how is this effect influenced by domain knowledge? In other words, does the trick of hiding the price of a luxury product increase purchase intention? And does this trick still work if someone knows a lot about that luxury product? I assigned participants in my survey to either an exposed condition (shown a picture of a luxury watch WITH the price) or a non-exposed condition (shown a picture of a luxury watch WITHOUT the price). I asked them 'How likely would you be to buy this watch?' to measure purchase intention. So I have one column measuring purchase intention in the non-exposed condition and another column measuring it in the exposed condition. Below you will find the data, produced with -dataex- as you suggested. Columns 3 and 4 are the ones I'm talking about. If you want more information, let me know!

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(Introduction Sex Age NonExp Exp Q5_1 Q5_2 Q5_3 Q5_4 Q5_5 Q5_6 Q5_7 Q6)
1 1 21 . 2 1 1 1 2 4 2 4 2
1 2 21 . 4 1 1 2 4 4 2 4 4
1 1 21 2 . 4 3 3 3 5 4 4 5
1 1 21 3 . 1 1 1 3 3 4 4 4
1 2 22 . 1 2 1 1 1 3 1 4 1
1 1 21 2 . 3 4 2 3 4 3 4 3
1 2 58 . 2 4 4 3 5 5 5 4 2
1 2 27 . 1 3 2 4 2 2 4 4 3
1 1 24 3 . 2 2 2 4 4 2 4 3
1 1 55 . 1 1 1 1 1 1 1 4 1
1 1 60 1 . 1 1 1 1 1 1 4 2
1 2 47 . 1 1 1 1 3 1 1 4 5
1 2 32 1 . 1 1 1 3 4 4 4 3
1 2 20 2 . 2 2 1 4 1 1 4 1
1 2 28 . 2 1 1 1 2 2 1 4 1
1 2 52 1 . 1 1 1 3 1 1 4 3
1 1 61 1 . 3 1 4 2 3 4 4 5
1 2 60 . 1 1 1 1 1 1 1 4 3
1 2 23 . 1 1 1 1 1 1 2 4 2
1 1 57 2 . 2 3 4 1 3 4 4 3
1 2 58 4 . 1 2 1 1 4 4 4 2
1 2 32 . 1 1 1 2 2 1 4 4 3
1 2 55 1 . 1 1 1 2 2 2 4 3
1 2 21 . 1 1 1 1 4 4 4 4 4
1 1 24 4 . 2 1 2 3 4 3 4 4
1 1 21 . 2 1 1 1 4 2 1 4 2
1 1 20 . 2 1 1 1 2 2 1 4 2
1 2 21 2 . 1 1 2 1 1 1 4 2
1 1 21 4 . 3 1 1 1 1 1 4 1
1 1 19 3 . 1 2 1 3 4 1 4 3
1 1 20 4 . 1 1 2 2 3 1 4 2
1 1 21 . 1 1 1 1 3 3 3 4 2
1 1 36 . 3 1 2 2 5 4 3 4 3
1 1 16 . 4 1 1 2 4 3 4 4 3
1 1 22 2 . 1 2 1 3 4 2 4 3
1 1 27 . 1 2 4 3 4 4 4 4 1
1 1 23 3 . 2 1 1 4 3 2 4 3
1 2 16 . 1 1 1 1 2 1 1 4 1
1 2 25 . 4 2 3 2 4 4 4 4 4
1 1 19 3 . 1 1 1 4 4 2 4 3
1 2 19 . 1 1 2 1 4 1 1 4 3
1 2 19 . 1 1 1 1 4 2 1 4 4
1 1 29 1 . 1 1 1 2 2 1 4 2
1 1 27 3 . 1 1 2 2 4 3 4 4
1 1 22 2 . 1 1 1 1 2 1 4 2
1 1 21 4 . 1 1 1 1 1 1 4 3
1 1 20 . 4 3 4 2 2 4 4 4 4
1 2 21 1 . 3 1 1 2 2 3 4 2
1 2 21 4 . 3 2 1 3 4 3 4 1
1 1 54 1 . 4 5 5 5 4 5 4 5
1 1 21 . 1 4 2 3 5 4 3 4 5
1 1 21 . 1 1 3 3 3 5 4 4 4
1 1 21 2 . 1 1 2 3 4 2 4 4
1 1 28 . 2 1 1 1 1 1 1 4 2
1 2 54 1 . 3 1 3 1 3 2 4 4
1 2 51 1 . 1 1 1 1 1 3 4 4
1 1 56 . 1 1 1 3 1 3 1 4 4
1 1 60 . 1 2 3 2 3 2 2 4 4
1 1 21 2 . 1 1 2 1 2 1 4 4
1 1 22 . 1 1 1 1 1 1 1 4 3
1 2 26 2 . 1 1 1 3 1 1 4 2
1 2 63 1 . 1 1 1 2 1 1 4 3
1 2 63 . 1 1 1 1 4 2 4 4 2
1 1 21 . 4 4 4 3 2 4 4 4 4
1 1 21 3 . 1 1 1 3 2 4 4 2
1 1 14 . 2 3 1 1 3 2 3 4 2
1 1 22 . 2 2 4 3 3 4 4 4 4
1 2 22 3 . 2 3 2 3 3 1 4 2
1 1 20 1 . 2 2 2 4 4 5 4 4
1 2 28 . 2 2 2 3 2 2 4 4 3
1 1 27 1 . 1 1 2 2 1 2 4 3
1 1 24 . 4 2 2 2 5 2 3 4 4
1 1 20 . 1 1 1 1 1 1 4 4 4
1 2 56 1 . 1 1 1 1 1 1 4 3
1 2 20 . 4 2 2 1 3 2 4 4 3
1 2 22 4 . 2 4 2 5 3 5 4 3
1 1 24 . 1 2 1 1 3 2 2 4 1
1 2 20 1 . 2 1 1 1 1 3 4 3
1 2 21 . 1 3 1 1 4 3 2 4 2
1 2 21 1 . 1 1 1 2 1 2 4 1
1 2 21 . 1 1 1 1 2 4 2 4 5
1 2 21 1 . 1 1 2 4 3 1 4 1
1 1 22 . 1 2 1 2 3 1 4 4 2
1 1 22 . 1 1 1 1 2 1 1 4 2
1 1 20 2 . 1 1 1 1 1 1 4 3
1 2 19 4 . 1 1 1 5 5 4 4 1
1 1 19 2 . 4 4 4 2 4 1 4 3
1 1 22 . 2 4 3 3 2 4 1 4 1
1 1 16 4 . 1 1 1 1 2 2 4 1
1 1 18 . 5 1 2 2 2 4 4 4 4
1 1 16 . 1 1 1 1 4 4 4 4 2
1 2 23 4 . 1 1 1 1 1 2 4 5
1 1 13 . 3 2 3 4 5 3 2 4 2
1 1 29 2 . 1 1 1 2 4 1 4 3
1 1 21 . 1 1 1 1 1 1 1 4 1
1 1 15 . 3 2 4 2 5 4 2 4 3
1 1 19 1 . 2 4 2 2 4 2 4 3
1 1 17 1 . 4 4 4 3 5 4 4 5
1 1 27 3 . 3 3 3 3 3 3 4 2
1 1 19 . 1 1 3 2 3 4 2 4 4
end


Clyde Schechter

Join Date: Apr 2014

Posts: 29958
#4

23 Jun 2022, 10:20

I see. You didn't show the code and output that prompted you to start this thread, but I'm going to guess that it was something like -ttest Exp = NonExp-, as that would definitely get you a "no observations" error message. The reason is this: in every observation in your data set, either Exp or NonExp has a missing value. Observations with a missing value in any variable mentioned in a command are excluded from the calculations. So every observation in your data set was excluded, and Stata found no observations remaining to try to compare Exp with NonExp.
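A minimal sketch of the problem described above, using the first six rows of NonExp and Exp from the example data. It assumes, as guessed above, that the failing command was the paired form -ttest Exp == NonExp-; on data laid out like this, that command is expected to stop with the 'no observations' error mentioned in the first post.

```stata
* First six rows of NonExp and Exp from the example data in #3
clear
input byte(NonExp Exp)
. 2
. 4
2 .
3 .
. 1
2 .
end

* Observations in which BOTH variables are nonmissing: there are none
count if !missing(NonExp, Exp)

* A paired comparison needs both values in the same observation,
* so every observation is excluded and the command fails
capture noisily ttest Exp == NonExp
display "return code: " _rc
```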

What you have is, for Stata, an unworkable data layout. It has to be revised so that there is a single variable that designates, in each observation, which condition (exposed or unexposed) the person is in, and another variable that includes their responses.

I'm not sure what all the variables you show in the data are. You refer to the "third and fourth columns" but those are Age and NonExp, which doesn't seem right. Perhaps you meant Q5_3 and Q5_4. But those don't match your description of having values for only the exposed, or only the non-exposed people. And there is this "variable" Introduction, which, at least in the example, is really a constant, and so is pointless. I'm going to assume that what you mean is this:

NonExp has the intention response for those people who were not exposed to the price, and is missing for those that were. Exp has the intention response for those who were exposed to the price, and is missing for those that were not.

So I'm going to just ignore the Q* variables, Sex, Age, and Introduction on the assumption they have nothing to do with the immediate problem, and focus on re-organizing the information in Exp and NonExp:

Code:

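assert missing(Exp, NonExp)    // every observation lacks at least one of the two responses
label define exposed 0 "No Price Exposure" 1 "Price Exposure"
gen byte exposed:exposed = !missing(Exp)    // 1 if the respondent was shown the price
gen byte intention_response = max(Exp, NonExp)    // max() ignores missing values, so this picks the answer given
drop Exp NonExp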

After that, depending on whether you want to treat your intention response variable as discrete, ordinal, or continuous, here are four ways you might contrast intention between the two groups:

Code:

tab intention_response exposed, col chi2    // DISCRETE
ranksum intention_response, by(exposed)     // ORDINAL
ttest intention_response, by(exposed)       // CONTINUOUS
regress intention_response i.exposed        // ANOTHER WAY TO DO CONTINUOUS, EQUIVALENT
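Because the first post mentioned a one-way ANOVA, it may help to see that with only two groups the one-way ANOVA and the equal-variance t test are the same test: the ANOVA F statistic equals the square of the t statistic. The sketch below uses the first 20 observations of the posted example (NonExp and Exp only) and repeats the restructuring from #4; the scalar names t_stat and F_stat are my own.

```stata
* First 20 observations of NonExp and Exp from the example data in #3
clear
input byte(NonExp Exp)
. 2
. 4
2 .
3 .
. 1
2 .
. 2
. 1
3 .
. 1
1 .
. 1
1 .
2 .
. 2
1 .
1 .
. 1
. 1
2 .
end

* Same restructuring as in #4
gen byte exposed = !missing(Exp)
gen byte intention_response = max(Exp, NonExp)
drop Exp NonExp

* Two-sample t test (equal variances assumed by default)
ttest intention_response, by(exposed)
scalar t_stat = r(t)

* One-way ANOVA on the same two groups
oneway intention_response exposed
scalar F_stat = r(F)

* With two groups, F should equal t squared
display "t^2 = " scalar(t_stat)^2 "   F = " scalar(F_stat)
```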

Speaking in general terms, from the way you have organized the data and described it, I have the sense that you are thinking about Stata as if it were a spreadsheet. It emphatically is not a spreadsheet, and thinking that way will get you into the kind of trouble you have already encountered and more. Spreadsheets are designed to arrange data in ways that enhance their comprehensibility to the human eye and brain; their ability to actually analyze data is limited. Stata is a statistical package, with far greater analytic ability. But the data organizations that work best with Stata are often difficult for human visual understanding. When working in Stata, you should try to forget you have ever used a spreadsheet. The data browser/editor may look a bit like a spreadsheet, but always remember that it isn't one and that visually attractive data arrangements will often obstruct analysis.
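The same reorganization can also be done with -reshape long-, Stata's general tool for turning a wide, spreadsheet-style layout into a long one. This is a sketch on the first six rows of the example; the identifier variable id is a hypothetical addition, since -reshape- needs one and the posted data do not contain one.

```stata
* First six rows of NonExp and Exp from the example data in #3
clear
input byte(NonExp Exp)
. 2
. 4
2 .
3 .
. 1
2 .
end

* HYPOTHETICAL respondent identifier (required by reshape, absent from the posted data)
gen int id = _n

* Common stub plus a suffix that encodes the condition (0 = no price, 1 = price)
rename NonExp intention_response0
rename Exp intention_response1

* Wide to long: one row per respondent and condition
reshape long intention_response, i(id) j(exposed)

* Each respondent answered in only one condition, so drop the empty rows
drop if missing(intention_response)

label define exposed 0 "No Price Exposure" 1 "Price Exposure"
label values exposed exposed
list id exposed intention_response
```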
Jacopo Bertotti

Join Date: Jun 2022

Posts: 5
#5

23 Jun 2022, 13:21

Wow, you just solved all my problems, haha. Thanks! I really appreciate that you solved my problem despite my 'terrible' description of the issue. Now the data are really clear, and Stata no longer gives error messages. My thesis supervisor is not responding to my emails, so I don't have much support from his side. So, now that I am already in touch with someone with a lot of knowledge, I want to ask the following: what is the best way for me to test my hypotheses? H1: purchase intention for exclusive products will be higher when prices are not exposed; H2: this effect will disappear when domain knowledge is higher. My supervisor told me to do a one-way ANOVA for the main effect and a two-way ANOVA for the interaction. What are your thoughts on this?

For your information (since I was not clear enough), Q5_1 through Q5_6 are statements on a five-point Likert scale (strongly disagree to strongly agree) that test domain knowledge. Q6 is a question in which I presented participants with five different brands of luxury watches and asked how many of them they knew (1-5).
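Purely as an illustration, and assuming the six Q5 items are all worded so that higher agreement means more knowledge (this needs checking against the questionnaire), one simple operationalization is to average them after checking their internal consistency with Cronbach's alpha. The variable name knowledge and the choice of an unweighted mean are assumptions, not a recommendation; the data are the first 20 observations of the posted example.

```stata
* Q5_1-Q5_6 for the first 20 observations of the example data in #3
clear
input byte(Q5_1 Q5_2 Q5_3 Q5_4 Q5_5 Q5_6)
1 1 1 2 4 2
1 1 2 4 4 2
4 3 3 3 5 4
1 1 1 3 3 4
2 1 1 1 3 1
3 4 2 3 4 3
4 4 3 5 5 5
3 2 4 2 2 4
2 2 2 4 4 2
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 3 1 1
1 1 1 3 4 4
2 2 1 4 1 1
1 1 1 2 2 1
1 1 1 3 1 1
3 1 4 2 3 4
1 1 1 1 1 1
1 1 1 1 1 2
2 3 4 1 3 4
end

* Cronbach's alpha; asis keeps each item's sign as it is in the data
* instead of letting alpha reverse items on its own
alpha Q5_1-Q5_6, asis

* HYPOTHETICAL knowledge score: unweighted mean of the six items (range 1 to 5)
egen knowledge = rowmean(Q5_1-Q5_6)
summarize knowledge
```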

What would be the best way to answer the hypothesis in your opinion?

Thanks in advance!
Clyde Schechter

Join Date: Apr 2014

Posts: 29958
#6

23 Jun 2022, 13:51

Your question involves more knowledge of the substance of your problem than it does statistics. Both of these hypothesis tests would involve a model including interaction terms. But there is the question of how you decide which products are "exclusive" (and how that is represented in your data) and how you want to operationalize knowledge: presumably you will derive it from the answers to questions Q5_*, and maybe Q6 is also part of that, but there are many ways to do that. I do not have any content knowledge about marketing, and simply cannot advise you on these substantive issues.
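To make "a model including interaction terms" concrete, here is a sketch that assumes, hypothetically, that knowledge is scored as the mean of Q5_1 to Q5_6; the data are the first 20 observations of the posted example. The coefficient on 1.exposed#c.knowledge indicates whether the effect of price exposure changes with knowledge (H2). Because the interaction is in the model, the 1.exposed coefficient on its own is the exposure effect at knowledge = 0, which lies outside the 1 to 5 scale, so the -margins- step reports the exposure effect at values on the scale instead.

```stata
* First 20 observations of the example data in #3 (NonExp, Exp, Q5_1-Q5_6 only)
clear
input byte(NonExp Exp Q5_1 Q5_2 Q5_3 Q5_4 Q5_5 Q5_6)
. 2 1 1 1 2 4 2
. 4 1 1 2 4 4 2
2 . 4 3 3 3 5 4
3 . 1 1 1 3 3 4
. 1 2 1 1 1 3 1
2 . 3 4 2 3 4 3
. 2 4 4 3 5 5 5
. 1 3 2 4 2 2 4
3 . 2 2 2 4 4 2
. 1 1 1 1 1 1 1
1 . 1 1 1 1 1 1
. 1 1 1 1 3 1 1
1 . 1 1 1 3 4 4
2 . 2 2 1 4 1 1
. 2 1 1 1 2 2 1
1 . 1 1 1 3 1 1
1 . 3 1 4 2 3 4
. 1 1 1 1 1 1 1
. 1 1 1 1 1 1 2
2 . 2 3 4 1 3 4
end

* Condition indicator and a single response variable, as in #4
gen byte exposed = !missing(Exp)
gen byte intention_response = max(Exp, NonExp)

* HYPOTHETICAL knowledge score: mean of the six knowledge items
egen knowledge = rowmean(Q5_1-Q5_6)

* Main effects plus the exposure-by-knowledge interaction
regress intention_response i.exposed##c.knowledge

* Estimated effect of price exposure at knowledge = 1, 2, 3, 4, 5
margins, dydx(exposed) at(knowledge=(1(1)5))
```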