Calculating Cronbach's Alpha with data in long format

Hanna Aileen

Join Date: Jul 2017

Posts: 7
#1

18 Jul 2017, 04:28

Hi everyone,

I am currently working with panel data in long format and am trying to calculate Cronbach's alpha for some z-standardized items. Not all of the items were observed in the same years, and there are some missing values (which are correctly coded as missing).
I have declared the data as panel data (tsset) and then used
alpha item1 item2 item3 ..., asis
For some combinations of items, the error "no observations" r(2000) occurs. If the problematic item is excluded (alpha item2 item3 ..., asis), Stata returns a result.
Then I checked alpha for the same dataset reshaped to wide format (item12000 item12001 etc.). I calculated alpha again (including the problematic items), but only with the years in which each item had been observed (alpha item12000 item22001 item32001 ..., asis). This time no error occurred, but the value of alpha differed from the one I got in long format (checked with the same items).

At first I thought that there were too few observations (or too many missing values) for some combinations of items, but in wide format that was not the case.

Do you know how the different values of alpha and the error can be explained?
Do I have to define the dataset differently, or is it not possible to calculate alpha across different years?

Thanks in advance for your help!!
Hanna
Clyde Schechter

Join Date: Apr 2014

Posts: 29846
#2

18 Jul 2017, 08:23

I find your description of the problem unclear. I think you should read the FAQ for excellent advice about how to post clear questions that are likely to attract a helpful, and timely, response. Pay particular attention to #12, which describes how to helpfully post example data (-dataex-) and how to post Stata code and output (code delimiters). Your question would be much more understandable if you provided example data and also showed the exact code that you ran and the corresponding Stata output.

Hanna Aileen

Join Date: Jul 2017

Posts: 7
#3

18 Jul 2017, 09:22

Hi Clyde,
thanks for your remarks! I hope the problem is now easier to understand.
My dataset looks like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(ID YEAR) float(item1 item2 item3 item4 item5 item6 item7)
201 2004         .         .         .          .         .         .         .
201 2005         . 1.1271297  -.924086   .6443151  .9743385 -1.801065 -.3586388
201 2006         .         .         .          .         .         .         .
201 2007         .         .         .          .         .         .         .
201 2008         .         .         .          .         .         .         .
203 2004         .         .         .          .         .         .         .
203 2005         .  .3799749  -.924086 -.04950336  .9743385 1.3115923 .30973095
203 2006         .         .         .          .         .         .         .
203 2007         .         .         .          .         .         .         .
203 2008         .         .         .          .         .         .         .
602 2004         .         .         .          .         .         .         .
602 2005         . -.3671799 -.3311043 -.04950336 .38599825  .6890609 -.3586388
602 2006         .         .         .          .         .         .         .
602 2007         .         .         .          .         .         .         .
602 2008         .         .         .          .         .         .         .
602 2009         .  .3799749 .26187727   .6443151 .38599825 .06652937 .30973095
602 2010 -1.262279         .         .          .         .         .         .
602 2011         .         .         .          .         .         .         .
602 2012         .         .         .          .         .         .         .
603 2004         .         .         .          .         .         .         .
end

Items 1-7 are standardized. It is an (unbalanced) panel dataset, so I used:

Code:

tsset ID YEAR

To compute the internal consistency of items 1-7, I used:

Code:

alpha item1 item2 item3 item4 item5 item6 item7, asis

I then get the error:

Code:

no observations
r(2000);

When I use the same code but without the first item I get a result.

Code:

alpha item2 item3 item4 item5 item6 item7, asis

After reshaping the data into this format:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double ID float(item12004 item22005 item32005 item42005 item52005 item62005 item72005)
 201 .  1.1271297  -.924086   .6443151   .9743385  -1.801065  -.3586388
 203 .   .3799749  -.924086 -.04950336   .9743385  1.3115923  .30973095
 602 .  -.3671799 -.3311043 -.04950336  .38599825   .6890609  -.3586388
 603 .          .         .          .          .          .          .
 604 .          .         .          .          .          .          .
 605 .          .         .          .          .          .          .
 901 . -1.1143347  -.924086 -.04950336  .38599825 -1.1785337 -1.0270085
1501 .   .3799749 -.3311043 -1.4371402 -.20234205 -.55600214 -1.0270085
1601 .  1.1271297  -.924086  -.7433218  -1.967363 -.55600214 -1.0270085
1602 .   .3799749  2.040822  1.3381335  -.7906823 -.55600214  1.6464704
1603 .  1.1271297  .8548589 -.04950336 -.20234205 -.55600214  -.3586388
1701 .   .3799749  .8548589 -2.1309586  1.5626788  1.3115923  1.6464704
1704 .   .3799749 1.4478406 -2.1309586   .9743385  1.3115923 -1.0270085
1705 .  1.1271297  .8548589 -.04950336   .9743385   .6890609  .30973095
1901 .          .         .          .          .          .          .
1903 .          .         .          .          .          .          .
2301 .   .3799749 .26187727 -1.4371402  -.7906823 -1.1785337  -.3586388
2302 .  -.3671799  .8548589 -.04950336   .9743385  .06652937   .9781007
2304 .  1.1271297 .26187727  -.7433218  -.7906823  .06652937   .9781007
2305 .          .         .          .          .          .          .
end

and using the same code again:

Code:

alpha item12004 item22005 item32005 item42005 item52005 item62005 item72005, asis

no error occurs. I then ran the same code without the first item (as before in the long format version):

Code:

alpha item22005 item32005 item42005 item52005 item62005 item72005, asis

The value of alpha now differs from what it was in the long format version with the same data.

Sorry for the confusion before!
Does anybody know how the difference can be explained?


Clyde Schechter

Join Date: Apr 2014

Posts: 29846
#4

18 Jul 2017, 10:08

Well, you're doing your -alpha-s on different data, so why would you expect to get the same result?

In your first -alpha item2 item3 item4 item5 item6 item7, asis-, the data includes all observations for which the variables item2 through item7 are all non-missing. That is mostly, in your example, data with YEAR == 2005, but not exclusively. For example, the 16th observation in your original data has non-missing values for all of these items, but it has YEAR == 2009. But in your -alpha item22005 item32005 item42005 item52005 item62005 item72005, asis- command, only values originally from YEAR == 2005 are included.
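
One way to see this in the long data, as a minimal sketch: count the complete cases by year, and then restrict -alpha- to a single year. The variable name n_missing is only an illustrative choice, and the range item2-item7 assumes the variables are stored in the order shown in the -dataex- output.

Code:

* number of missing values among item2-item7 in each observation
* (n_missing is a hypothetical name chosen for this sketch)
egen byte n_missing = rowmiss(item2-item7)

* years that contribute complete cases to the long-format -alpha-
tabulate YEAR if n_missing == 0

* long-format counterpart of the wide-format command: 2005 values only
alpha item2-item7 if YEAR == 2005, asis

In the 20-observation example above, the tabulation shows three complete observations in 2005 and one in 2009, which is why the two commands were not analysing the same data.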
Hanna Aileen

Join Date: Jul 2017

Posts: 7
#5

18 Jul 2017, 11:52

That is true in this case, because I reduced the dataset to show its structure.
However, the problem remains: when I use the wide format with each item only in the years for which it was observed, excluding all other years, e.g.:

Code:

alpha item12010 item22005 item22009 item32005 item32009 etc., asis

I get an alpha, but when I use the long format (which I prefer) with the same items across all years

Code:

alpha item1 item2 item3 etc., asis

I get the error mentioned above for some combinations of items:

Code:

no observations
r(2000);

And that is why I was wondering whether it is not possible to compute alpha when not all of the items are observed in the same years.
I would like to calculate alpha with the data in long format, without reshaping it to wide format.

Last edited by Hanna Aileen; 18 Jul 2017, 11:55.
Clyde Schechter

Join Date: Apr 2014

Posts: 29846
#6

18 Jul 2017, 12:08

"And that is why I was wondering whether it is not possible to compute alpha when not all of the items are observed in the same years."

No, it is not possible with the pattern of missing data you show.

And using the wide layout is not a satisfactory workaround. The alpha that you calculate that way fails to recognize that item22005 and item22009 are in fact the same item--it treats them as separate items, which is a mis-specification of what's actually going on here.
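
The missing-data pattern behind the r(2000) error can be checked directly. A minimal sketch, assuming the variable names and variable order from the -dataex- example earlier in the thread: count the observations in which item1 is observed together with each of the other items.

Code:

* observations in which item1 and each other item are both non-missing
foreach v of varlist item2-item7 {
    quietly count if !missing(item1, `v')
    display "item1 and `v': " r(N) " jointly observed"
}

In the example data every count is 0, because item1 is observed only in 2010, when none of the other items is. With no observation in which item1 appears alongside another item, there is nothing from which to estimate how item1 covaries with the rest of the scale, whatever the layout of the data.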
Hanna Aileen

Join Date: Jul 2017

Posts: 7
#7

25 Jul 2017, 04:18

Thanks Clyde for your help! I will then try to find another way.
