Summing over observations

Hulya Kocyigit

Join Date: Jun 2021

Posts: 17
#1

22 Apr 2024, 09:59

Hello,

I have data on job ads. I observe whether each ad was active in each month from January 2017 to December 2022.

I have the following variables in my dataset: id_ad type_ad location active_jan2017 active_feb2017... active_dec2022, where

id_ad: unique numeric id for each ad
type_ad: numeric ad group identifier
location: numeric location identifier
active_monyear: a dummy variable indicating whether the ad was active in the specified month.

I want to find the total number of active jobs for each ad type in each location for each of the 72 months. How can I do that?

Thank you for your attention!
Clyde Schechter

Join Date: Apr 2014

Posts: 29799
#2

22 Apr 2024, 10:12

This is burdensome to do with the data laid out wide, as you have it. It's a one-liner after -reshape-ing to long:

Code:

reshape wide active_, i(id_ad location) j(month) string
collapse (sum) active_, by(type_ad location month)

Note: The code assumes that the same id_ad may occur in multiple locations in the same month but that, regardless, in all its appearances it always has the same value of type_ad. If this is not true, the -reshape- command will fail.
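As a minimal sketch of the reshape-then-collapse approach, the block below builds a toy dataset with only two monthly dummies and made-up values (the ids, types, and locations are hypothetical) and uses -reshape long-, as the prose above describes. It is meant only to illustrate the shape of the result, not to stand in for the real data.

```stata
* Hypothetical toy data: 3 ads, 2 months
clear
input id_ad type_ad location active_jan2017 active_feb2017
1 1 10 1 0
2 1 10 1 1
3 2 20 0 1
end

* Wide -> long: one observation per ad, location, and month.
* j(month) string stores the suffixes "jan2017", "feb2017" as text.
reshape long active_, i(id_ad location) j(month) string

* Sum the 0/1 dummies within each ad type, location, and month
collapse (sum) active_, by(type_ad location month)

list, sepby(type_ad location)
```

Because month is a string here, it sorts alphabetically rather than chronologically; the counts themselves are unaffected.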

In the future, when asking for help with code, please use the -dataex- command and show example data. Although sometimes, as here, it is possible to give an answer that has a reasonable probability of being correct, this is usually not the case. Moreover, such answers are necessarily based on experience-based guesses or intuitions about the nature of your data. When those guesses are wrong, both you and the person trying to help you have wasted your time, as you end up with useless code. To avoid this, provide a -dataex- based example, which supplies all of the information needed to develop and test a solution.

If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
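For illustration, a -dataex- call for a question like this one might look as follows. The toy data are hypothetical and exist only so that the snippet runs on its own; with real data, only the last line would be needed.

```stata
* Hypothetical toy data standing in for the real file
clear
input id_ad type_ad location active_jan2017 active_feb2017
1 1 10 1 0
2 1 10 1 1
3 2 20 0 1
end

* Print a copy-and-paste-ready example: the identifiers and the first
* two monthly dummies, restricted to the first 3 observations
dataex id_ad type_ad location active_jan2017 active_feb2017 in 1/3
```

The output can be pasted directly into a forum post, and anyone reading it can run it to recreate the example data.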
Hulya Kocyigit

Join Date: Jun 2021

Posts: 17
#3

22 Apr 2024, 12:20

Thank you very much, Clyde. I think there is a typo in the code: as you said, the data are already in wide format, so we need to reshape them to long.

I have a fairly large dataset (about 1 million observations), and the -reshape- has been running for over 2 hours. Is there a way to speed up the process?
Clyde Schechter

Join Date: Apr 2014

Posts: 29799
#4

22 Apr 2024, 12:33

Yes, you are right, I meant -reshape long-.

If you are running the current version of Stata (18), you can try adding the -favor(speed)- option to the -reshape- command. There are also user-written commands that do -reshape-'s job and run faster: -greshape- is part of the -gtools- package. Both it and another command, -tolong-, are available from SSC. The syntax for either of those is almost identical to that of -reshape-.
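The speed-ups mentioned above can be sketched as follows, on hypothetical toy data. Options 1 and 2 are left commented out because they depend on Stata 18 and on -gtools- being installed, respectively. Option 3 is a further alternative that was not suggested in the thread: since only the totals by ad type and location are wanted, the dummies can be summed while the data are still wide, so that -reshape- only has to handle one observation per type-location pair rather than one per ad.

```stata
* Hypothetical toy data standing in for the real file
clear
input id_ad type_ad location active_jan2017 active_feb2017
1 1 10 1 0
2 1 10 1 1
3 2 20 0 1
end

* Option 1 (Stata 18, as suggested in the post above):
* reshape long active_, i(id_ad location) j(month) string favor(speed)

* Option 2 (-greshape- from -gtools-; install once with: ssc install gtools):
* greshape long active_, i(id_ad location) j(month) string

* Option 3 (assumption: only the type-location totals are needed):
* collapse first, so far fewer observations remain to be reshaped
collapse (sum) active_*, by(type_ad location)
reshape long active_, i(type_ad location) j(month) string

list, sepby(type_ad location)
```

Option 3 should give the same totals as reshaping first and collapsing afterwards, because summing within groups does not depend on whether the months are stored as variables or as observations.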