Hello,
I am dealing with cross-sectional data that has groups within the observations. For example, consider the dataset of an online firm registered in one country that allows multiple banks from different countries to sell the loans originated on their platforms. Each row in the dataset is one observation (one loan) with several characteristics: the start date, the interest rate, whether it has been paid back, etc. Hence, each observation is unique, as it represents a different loan originated by one of the banks. I do not regard the data as a time series, because a time series would have the same loans observed repeatedly over time (daily, monthly, and so on). I do not consider the data a panel either, because I am assessing the performance of one online business (there is only one wave of data). Am I correct in my inference?
Secondly, since banks from different countries sell their loans, I need to control for their fixed effects. Hence, I included some characteristics of each bank, such as size and age, as control variables. I also control for the location of the banks. I have over 40 banks in the data, spanning 20 countries. Instead of including countries, I classified them by geographical region, such as Asia, Africa, etc. I have 4 regions in my data, and I did this because including country as a factor variable instead of geographical region causes specification errors. I want to account for serial correlation by using cluster-robust standard errors. Which variable should I specify as the cluster variable in vce(cluster clustervar)? I usually see people clustering on geography. Is it okay to choose another bank characteristic that I already control for, for example size in my case? Size is different for all banks.
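To make the question concrete, here is a rough sketch of the kind of specification I have in mind. All variable names (interest_rate, loan_amount, bank_size, bank_age, region, bank_id) are hypothetical placeholders rather than the actual names in my dataset, and I am not sure which of the two cluster variables is appropriate.

```stata
* Hypothetical variable names throughout; one row per loan.
* bank_size and bank_age are bank-level controls; region has 4 categories.

* Option A: cluster on geography, as I usually see others do
regress interest_rate loan_amount bank_size bank_age i.region, vce(cluster region)

* Option B: cluster on a bank identifier (assumes a numeric bank_id exists)
regress interest_rate loan_amount bank_size bank_age i.region, vce(cluster bank_id)
```

Option A would give only 4 clusters, while Option B would give more than 40. Since size is different for every bank, I assume clustering on size would define the same groups as clustering on bank_id.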