How to make arrays in Stata from HTML JavaScript arrays?

Lisa Spoelstra

Join Date: Jan 2018

Posts: 24
#16

06 Jun 2018, 09:18

For some locals I need to make 121 observations and for some I only have 15 observations. As you can see here: view-source:http://www.lifemath.net/cancer/breas...rapy/index.php
So if I start with this:

Code:

clear
set obs 100

What number do I need to use? I need to run it on another dta-file.

Sorry for the ambiguity.

Stephen Jenkins

Join Date: Apr 2014

Posts: 1424
#17

06 Jun 2018, 11:49

Lisa: your URL took me to a Breast Cancer Treatment Outcome Calculator and I couldn't see directly from there how your data look.

Taking up previous comments: how did you extract the data shown in the example arrays? Is it possible to extract the data differently so that, for example, you have a column of numbers for each of your arrays (as shown below)? The columns may be of different lengths. Your examples are helping (thank you), but not quite enough for observers like me to figure out an algorithm to solve your problem. You talk about numbers and "run it on another data file". How would you link the information in one file with the appropriate observation in the other data file? Suppose that alongside the 'deathrate' variable there was another column of distinct numbers; let's call this additional variable "age" (so I interpret deathrate as age-specific death rates). Now suppose that there is a third column which contains some other information (your "z" variable?) which is linked one-to-one with each distinct value of age. With data in this sort of format, there is probably some straightforward algorithm for deriving a new fourth variable.

Code:

deathrate 0 .006083 .000414 .000301 .000218 .000172 .000158 .000141 .000128 .000117 .000108 .000105 .00011 .000132 .000173 .000228 .000292 .000355 .000407 .00044 .000456 .000471 .000489 .000501 .000509 .000515 .000506 .000516 .000531 .000552 .000578 .000609 .000646 .000693 .000753 .000827 .000864 .000949 .001047 .001157 .001273 .001393 .001514 .001639 .001774 .001919 .002045 .002211 .002383 .00256 .002746 .002949 .003176 .003431 .003718 .004039 .004462 .004859 .005304 .005809 .006384 .00706 .007817 .008596 .00935 .0101 .011305 .012297 .013426 .014706 .016123 .017603 .019196 .021033 .023175 .025585 .028661 .031295 .034315 .037906 .042094 .04667 .051554 .057062 .063411 .070761 .079054 .087065 .095796 .105294 .115605 .126771 .138833 .151829 .165787 .180734 .196684 .213644 .231608 .25056 .270467 .29546625 .31785796 .3413658 .36598949 .25056 .41855346 .44646215 .47542285 .34136579 .53635949 .56824611 .60100582 .44646215 .66887997 .70384251 .73937455 .77538108 .60100582 .84840204 .88519205 .92200867 .73937455 .99521085 .81175992 .84840204 .88519205 .92200867 .95872534 .99521085
David Fisher

Join Date: Apr 2014

Posts: 407
#18

07 Jun 2018, 01:28

Ok I've re-read your earlier messages and have read through the source code (@Stephen Jenkins you need to follow her link and then "view the source" from within your browser).
I can see why you're confused. I really don't think Stata is a particularly good tool for this project. But if you want to see it through, here are some more thoughts.

Firstly, the problem of your arrays of differing sizes. The source code makes it clear that "nvsr_death_prob_yearly" and "nvsr_life_expect" are based on whole-number ages, starting at age=0. So although they are different sizes (121 and 124 by my count), it doesn't matter since they both start at age=0. You can read them into Stata as previously described. I would also create an "age" variable to clarify this as you go along, e.g. using "gen int age = _n - 1".

The other array, "L_breastcancer_distribution", represents estimated probabilities of death in each of the 15 years following diagnosis. Hence, its first entry needs to line up with `age_input' + 1, where `age_input' is the specific, single age inputted by the user. Assuming that nvsr_death_prob_yearly, nvsr_life_expect and age (i.e. the array of all possible ages from 0 to 123) are already present in the dataset as described above, here is one way of generating "L_breastcancer_distribution":

Code:

local age_input = 80
local L_breastcancer_distribution 0.031545 0.076709 0.094483 0.091899 0.083154 0.079074 0.073931 0.069022 0.066901 0.059433 0.061200 0.058224 0.054612 0.052216 0.047595
gen L_breastcancer_distribution = .
tokenize `L_breastcancer_distribution'
forvalues i=1/15 {
    qui replace L_breastcancer_distribution = ``i'' in `=`age_input' + 1 +`i''
}
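For comparison, the same alignment can be sketched in JavaScript, the language of the original calculator. This is only an illustration under stated assumptions: the age of 80 is the example value used in the Stata code, the helper name `alignYears` is hypothetical, and the lookup holds only the `nvsr_death_prob_yearly` entries for ages 81 to 95, copied from the source array. It shows that year i after diagnosis pairs `L_breastcancer_distribution[i-1]` with the death probability at age 80 + i.

```javascript
// Fraction of cancer deaths in each of the 15 years after diagnosis (from the source).
const L_breastcancer_distribution = [
  0.031545, 0.076709, 0.094483, 0.091899, 0.083154,
  0.079074, 0.073931, 0.069022, 0.066901, 0.059433,
  0.061200, 0.058224, 0.054612, 0.052216, 0.047595,
];

// Excerpt of nvsr_death_prob_yearly, keyed by age, for ages 81..95 only.
const deathProbByAge = new Map([
  [81, 0.046670], [82, 0.051554], [83, 0.057062], [84, 0.063411], [85, 0.070761],
  [86, 0.079054], [87, 0.087065], [88, 0.095796], [89, 0.105294], [90, 0.115605],
  [91, 0.126771], [92, 0.138833], [93, 0.151829], [94, 0.165787], [95, 0.180734],
]);

// Hypothetical helper: pair each year after diagnosis with the attained age.
function alignYears(ageInput) {
  return L_breastcancer_distribution.map((fraction, idx) => {
    const year = idx + 1;                 // years after diagnosis: 1..15
    const attainedAge = ageInput + year;  // the calculator indexes with i + age
    return { year, attainedAge, fraction, deathProb: deathProbByAge.get(attainedAge) };
  });
}

const rows = alignYears(80);
console.log(rows[0], rows[14]);
```

In Stata terms, `attainedAge` corresponds to the row whose `age` variable equals `age_input' + i, which is observation `age_input' + 1 + i when age is generated as _n - 1.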

Best wishes,

David.

Lisa Spoelstra

Join Date: Jan 2018
Posts: 24

#19

07 Jun 2018, 01:39

Stephen, thanks for your suggestion. I cannot share the data file, but it covers 9000 persons, with 100+ variables and 900000+ observations. With the JavaScript code we can calculate their life expectancy etc. What I need to do is transcribe the JavaScript code into Stata code so that we can run it in Stata for the 9000 persons and calculate their life expectancy etc.

With your help, I tried a lot of things, and I think I am nearly there. However, something goes wrong when I try to code this, for example; Stata says it is invalid syntax.

Code:

gen nvsr_death_prob_yearly = .
local nvsr_death_prob_yearly 0 0.006083 0.000414 0.000301 0.000218 0.000172 0.000158 0.000141 0.000128 0.000117 0.000108 0.000105 0.000110 0.000132 0.000173 0.000228 ///
0.000292 0.000355 0.000407 0.000440 0.000456 0.000471 0.000489 0.000501 0.000509 0.000515 0.000506 0.000516 0.000531 0.000552 0.000578 0.000609 0.000646 0.000693 ///
0.000753 0.000827 0.000864 0.000949 0.001047 0.001157 0.001273 0.001393 0.001514 0.001639 0.001774 0.001919 0.002045 0.002211 0.002383 0.002560 0.002746 0.002949 ///
0.003176 0.003431 0.003718 0.004039 0.004462 0.004859 0.005304 0.005809 0.006384 0.007060 0.007817 0.008596 0.009350 0.010100 0.011305 0.012297 0.013426 0.014706 ///
0.016123 0.017603 0.019196 0.021033 0.023175 0.025585 0.028661 0.031295 0.034315 0.037906 0.042094 0.046670 0.051554 0.057062 0.063411 0.070761 0.079054 0.087065 ///
0.095796 0.105294 0.115605 0.126771 0.138833 0.151829 0.165787 0.180734 0.196684 0.213644 0.231608 0.250560 0.270467 0.295466253 0.317857963 0.341365799 0.365989493 ///
0.39172287 0.418553461 0.44646215 0.475422854 0.50540224 0.53635949 0.568246115 0.601005825 0.634574456 0.66887997 0.703842512 0.73937455 0.775381082 0.811759916 0.848402039 ///
0.88519205 0.92200867 0.958725335 0.995210852
tokenize `nvsr_death_prob_yearly'
forvalues 1/121 {
    replace nvsr_death_prob_yearly = ``i'' in `i'
}

As already said, I need to do something with

Code:

clear
obs 121

But when I do that all the other 900000+ are deleted...

And thereafter I need to make a variable with only 15 observations, like this:

Code:

gen L_breastcancer_distribution = .
local L_breastcancer_distribution 0.031545 0.076709 0.094483 0.091899 0.083154 0.079074 0.073931 0.069022 0.066901 0.059433 0.061200 0.058224 0.054612 0.052216 0.047595
tokenize `L_breastcancer_distribution'
forvalues 1/15 {
replace L_breastcancer_distribution = ``i'' in `i'
}

Then I need to start again with this?:

Code:

clear
obs 15

I am sorry, but I am more used to SAS, and no one around me (including my supervisor) has done this before...

Last edited by Lisa Spoelstra; 07 Jun 2018, 01:43.


Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#20

07 Jun 2018, 02:04

I think you need to think through, and/or describe here, what the new lists of 15 and 121 observations are for.
If you have a dataset with info on your 90,000 individuals, I do not see the point of making two new variables with 15 and 121 observations and the rest missing. There also doesn't seem to be a relation between the first values of these two series and the first person in your set of person data.
My guess is that you should either be trying to create 15 or 121 new variables, or you should keep this as a separate dataset from your personal info dataset for the moment.

Lisa Spoelstra

Join Date: Jan 2018

Posts: 24
#21

07 Jun 2018, 02:13

David, thanks for your suggestion. However, the code you suggested does not work. "L_breastcancer_distribution" has nothing to do with the age of the person, only with the years after diagnosis. If I run your suggested code, I only get the values, in the order in which I put them in the local, at persons 82-96. The variable age is already in the data file.

Lisa Spoelstra

Join Date: Jan 2018
Posts: 24

#22

07 Jun 2018, 02:21

Jorrit, I need to transcribe this code from JavaScript:

Code:

/********************************************************************
* Array of probability of dying between year x and x+1, where x is age, starting at age x=0, taken from:
* -National Vital Statistics Reports Vol 54 No 14, April 19, 2006, United States Life Tables 2003,
* Table 3. Life table for females: United States, 2003.
* adjusted to exclude the probability of dying from breast cancer using data from:
* -National Vital Statistics Reports Vol 55, No 19, August 21, 2007, Deaths: Final Data for 2004,
* Table 3. Number of deaths and death rates by age, race, and sex: United States, 2004
* Table 10. Number of deaths from 113 selected causes by age: United States, 2004
* q(x)
****************************************************************************/

var nvsr_death_prob_yearly = new Array(0, 0.006083, 0.000414, 0.000301, 0.000218, 0.000172, 0.000158, 0.000141, 0.000128, 0.000117, 0.000108, 0.000105, 0.000110, 0.000132, 0.000173, 0.000228, 0.000292, 0.000355, 0.000407, 0.000440, 0.000456, 0.000471, 0.000489, 0.000501, 0.000509, 0.000515, 0.000506, 0.000516, 0.000531, 0.000552, 0.000578, 0.000609, 0.000646, 0.000693, 0.000753, 0.000827, 0.000864, 0.000949, 0.001047, 0.001157, 0.001273, 0.001393, 0.001514, 0.001639, 0.001774, 0.001919, 0.002045, 0.002211, 0.002383, 0.002560, 0.002746, 0.002949, 0.003176, 0.003431, 0.003718, 0.004039, 0.004462, 0.004859, 0.005304, 0.005809, 0.006384, 0.007060, 0.007817, 0.008596, 0.009350, 0.010100, 0.011305, 0.012297, 0.013426, 0.014706, 0.016123, 0.017603, 0.019196, 0.021033, 0.023175, 0.025585, 0.028661, 0.031295, 0.034315, 0.037906, 0.042094, 0.046670, 0.051554, 0.057062, 0.063411, 0.070761, 0.079054, 0.087065, 0.095796, 0.105294, 0.115605, 0.126771, 0.138833, 0.151829, 0.165787, 0.180734, 0.196684, 0.213644, 0.231608, 0.250560, 0.270467, 0.295466253,0.317857963,0.341365799,0.365989493,0.39172287,0.418553461,0.44646215,0.475422854,0.50540224,0.53635949,0.568246115,0.601005825,0.634574456,0.66887997,0.703842512,0.73937455,0.775381082,0.811759916,0.848402039,0.88519205,0.92200867,0.958725335,0.995210852);





/********************************************************************
* A 15-part step function (L_breastcancer_distribution) to represent the fraction
* of deaths that occurs in each of the 15 years after diagnosis.
* This is the probability density function of the SEER cohort from which we derived our parameters,
* normalized to 1 at 15 years.
***************************************************************************/

var L_breastcancer_distribution = new Array(0.031545, 0.076709, 0.094483, 0.091899, 0.083154, 0.079074, 0.073931, 0.069022, 0.066901, 0.059433, 0.061200, 0.058224, 0.054612, 0.052216, 0.047595);





/********************************************************************
* Expectation of life in years at age x, starting at age x=0, taken from:
* -National Vital Statistics Reports Vol 54 No 14, April 19, 2006, United States Life Tables 2003,
* Table 3. Life table for females: United States, 2003.
* adjusted to exclude the probability of dying from breast cancer using data from:
* -National Vital Statistics Reports Vol 55, No 19, August 21, 2007, Deaths: Final Data for 2004,
* Table 3. Number of deaths and death rates by age, race, and sex: United States, 2004
* Table 10. Number of deaths from 113 selected causes by age: United States, 2004
* e(x)
****************************************************************************/

var nvsr_life_expect = new Array(80.5, 80.0, 79.0, 78.0, 77.1, 76.1, 75.1, 74.1, 73.1, 72.1, 71.1, 70.1, 69.1, 68.1, 67.2, 66.2, 65.2, 64.2, 63.2, 62.3, 61.3, 60.3, 59.4, 58.4, 57.4, 56.4, 55.5, 54.5, 53.5, 52.6, 51.6, 50.6, 49.7, 48.7, 47.7, 46.8, 45.8, 44.8, 43.9, 42.9, 42.0, 41.1, 40.1, 39.2, 38.3, 37.3, 36.4, 35.5, 34.6, 33.6, 32.7, 31.8, 30.9, 30.0, 29.2, 28.3, 27.4, 26.5, 25.7, 24.8, 24.0, 23.1, 22.3, 21.5, 20.7, 19.9, 19.1, 18.4, 17.6, 16.9, 16.1, 15.4, 14.7, 14.0, 13.3, 12.7, 12.0, 11.4, 10.8, 10.2, 9.6, 9.0, 8.5, 8.0, 7.5, 7.0, 6.6, 6.2, 5.8, 5.4, 5.0, 4.7, 4.4, 4.1, 3.8, 3.5, 3.3, 3.1, 2.8, 2.6, 2.5, 2.3, 2.1, 2.0, 1.8, 1.7, 1.5, 1.4, 1.2, 0.8, 0.4, 0.2, 0.1, 0.05, 0.03, 0.01, 0.006, 0.003, 0.001, 0.002, 0.0008);

And

Code:

/*********************************************************************
* STEPs 2.c, 2.d, & 2.e calculate cancer death rate in each of the 15 years following diagnosis
* Calculates yearly lethalities due to breast cancer and other causes
***************************************************************************/
for (i=1; i<=15; i++) {
    //STEP 2.c calculates cancer death distribution by multiplying 15yr KM cancer death rate by expected BRCA yearly lethality
    //percentage of overall cancer deaths occuring in the given year is computed, and cumulatively summed
    cancer_death_dist_cumm[i] = cancer_death_dist_cumm[i-1] + L_breastcancer_distribution[i-1]*L_breastcancer_KM;

    //cancer-specific hazard is computed as the chance of cancer death divided by cancer-specific survival to that point
    cancer_death_hazard[i] = L_breastcancer_distribution[i-1]*L_breastcancer_KM / (1-cancer_death_dist_cumm[i-1]);

    L_breastcancer_death_yearly[i]=remaining_percentage[i-1] * cancer_death_hazard[i];

    //STEP 2.d calculates non-BRCA death rate by multiplying the fraction of patients not dying of cancer by the yearly risk of death due to non-cancer causes for the given age
    if (age==0){
        L_nonbreastcancer_prob[i]=0;
    } else {
        L_nonbreastcancer_prob[i]=nvsr_death_prob_yearly[i+age];
    }
    L_nonbreastcancer_death_yearly[i]=(remaining_percentage[i-1] - L_breastcancer_death_yearly[i]) *L_nonbreastcancer_prob[i];

    //STEP 2.e calculates overall death rate by adding breast cancer deaths to non-breast cancer deaths
    L_overall_death_yearly[i]=L_breastcancer_death_yearly[i]+L_nonbreastcancer_death_yearly[i];
    remaining_percentage[i]=remaining_percentage[i-1]-L_overall_death_yearly[i];
}

/********************************************************************
* STEP 2.f Calculate 15 values for cumulative breast cancer, non-breast cancer, and total death rates by summing the respective yearly values computed in the steps above.
****************************************************************************/

for(i=1;i<=15;i++) {
    L_cancer_death_cumm[i]=L_cancer_death_cumm[i-1]+L_breastcancer_death_yearly[i];
    L_noncancer_death_cumm[i]=L_noncancer_death_cumm[i-1]+L_nonbreastcancer_death_yearly[i];
    L_overall_death_cumm[i]=L_overall_death_cumm[i-1]+L_overall_death_yearly[i];
}

/********************************************************************
* STEP 3 Calculate the mean number of years of life left that can be expected for the cancer patient
****************************************************************************/

/********************************************************************
* STEP 3.a Calculate the life expectancy for the cancer patient by multiplying the chance of dying in each of the years 1-15 by the number of years survived to that point. Then add the NVSR life expectancy for people 15 years older than the patient's current age, multiplied by the patients chance of surviving 15 years.
****************************************************************************/

calc_life_expectation = 0;

for (i=1; i<=15; i++){
    calc_life_expectation = calc_life_expectation + L_overall_death_yearly[i] * (i-0.5);
}

calc_life_expectation = calc_life_expectation + (1 - L_overall_death_cumm[15]) * (nvsr_life_expect[age + 15] +15)

/********************************************************************
* STEP 3.b The program calculates the expected years of life lost due to cancer, by subtracting the calculated life expectancy (step 3.a) from the NVSR-given life expectancy for the specified age.
****************************************************************************/

expect_years_life_lost = nvsr_life_expect[age] - calc_life_expectation;

/***************************************************************
* Determine whether projections exceed 100 years of age, and remove such projections-- data is not projected to ages above 100
**************************************************************/

age_difference = 100-age;

if (age_difference<15){
    for (i=age_difference; i<=15; i++) {
        L_cancer_death_cumm[i]=L_cancer_death_cumm[age_difference];
        L_noncancer_death_cumm[i]=L_noncancer_death_cumm[age_difference];
        L_overall_death_cumm[i]=L_overall_death_cumm[age_difference];
    }
}

etc.

into Stata, which doesn't work so far.
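As a cross-check for any Stata translation, steps 2.c to 3.b can be run for a single person in plain JavaScript. The sketch below rests on assumptions that the excerpt does not confirm: the starting values `cancer_death_dist_cumm[0] = 0` and `remaining_percentage[0] = 1` are guesses, because the initialization is not shown; `L_breastcancer_KM = 0.2` and age 80 are hypothetical inputs; and the function name `lifeExpectancy` is made up for illustration. Only the array entries needed for age 80 are copied from the source.

```javascript
// L_breastcancer_distribution from the source.
const dist = [
  0.031545, 0.076709, 0.094483, 0.091899, 0.083154,
  0.079074, 0.073931, 0.069022, 0.066901, 0.059433,
  0.061200, 0.058224, 0.054612, 0.052216, 0.047595,
];
// Excerpt of nvsr_death_prob_yearly keyed by age (ages 81..95 only).
const qx = {
  81: 0.046670, 82: 0.051554, 83: 0.057062, 84: 0.063411, 85: 0.070761,
  86: 0.079054, 87: 0.087065, 88: 0.095796, 89: 0.105294, 90: 0.115605,
  91: 0.126771, 92: 0.138833, 93: 0.151829, 94: 0.165787, 95: 0.180734,
};
// Excerpt of nvsr_life_expect keyed by age (ages 80 and 95 only).
const ex = { 80: 9.6, 95: 3.5 };

// Hypothetical wrapper around steps 2.c-3.b for one person.
function lifeExpectancy(age, L_breastcancer_KM) {
  const cancerCumm = [0];    // ASSUMPTION: cancer_death_dist_cumm[0] = 0
  const remaining = [1];     // ASSUMPTION: remaining_percentage[0] = 1
  const overallYearly = [0];
  let overallCumm = 0;       // running L_overall_death_cumm
  for (let i = 1; i <= 15; i++) {
    const share = dist[i - 1] * L_breastcancer_KM;           // step 2.c
    cancerCumm[i] = cancerCumm[i - 1] + share;
    const hazard = share / (1 - cancerCumm[i - 1]);
    const cancerDeaths = remaining[i - 1] * hazard;
    const otherProb = age === 0 ? 0 : qx[i + age];           // step 2.d
    const otherDeaths = (remaining[i - 1] - cancerDeaths) * otherProb;
    overallYearly[i] = cancerDeaths + otherDeaths;           // step 2.e
    remaining[i] = remaining[i - 1] - overallYearly[i];
    overallCumm += overallYearly[i];                         // step 2.f
  }
  let expectation = 0;                                       // step 3.a
  for (let i = 1; i <= 15; i++) {
    expectation += overallYearly[i] * (i - 0.5);
  }
  expectation += (1 - overallCumm) * (ex[age + 15] + 15);
  const yearsLost = ex[age] - expectation;                   // step 3.b
  return { expectation, yearsLost, remaining, overallCumm };
}

const result = lifeExpectancy(80, 0.2);
console.log(result.expectation.toFixed(2), result.yearsLost.toFixed(2));
```

Running the Stata version for one person with the same inputs and comparing the printed values is one way to confirm that the translation indexes the arrays the same way.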


Stephen Jenkins

Join Date: Apr 2014
Posts: 1424

#23

07 Jun 2018, 02:31

Lisa: you have some typos in your forvalues syntax. The following (which also incorporates David Fisher's helpful suggestions) runs without error. But questions remain (see below)

Code:

clear all
set obs 150 // some large number >= length longest array

gen int age = _n - 1

gen nvsr_death_prob_yearly = .
local nvsr_death_prob_yearly 0 0.006083 0.000414 0.000301 0.000218 0.000172 0.000158 0.000141 0.000128 0.000117 0.000108 0.000105 0.000110 0.000132 0.000173 0.000228 ///
0.000292 0.000355 0.000407 0.000440 0.000456 0.000471 0.000489 0.000501 0.000509 0.000515 0.000506 0.000516 0.000531 0.000552 0.000578 0.000609 0.000646 0.000693 ///
0.000753 0.000827 0.000864 0.000949 0.001047 0.001157 0.001273 0.001393 0.001514 0.001639 0.001774 0.001919 0.002045 0.002211 0.002383 0.002560 0.002746 0.002949 ///
0.003176 0.003431 0.003718 0.004039 0.004462 0.004859 0.005304 0.005809 0.006384 0.007060 0.007817 0.008596 0.009350 0.010100 0.011305 0.012297 0.013426 0.014706 ///
0.016123 0.017603 0.019196 0.021033 0.023175 0.025585 0.028661 0.031295 0.034315 0.037906 0.042094 0.046670 0.051554 0.057062 0.063411 0.070761 0.079054 0.087065 ///
0.095796 0.105294 0.115605 0.126771 0.138833 0.151829 0.165787 0.180734 0.196684 0.213644 0.231608 0.250560 0.270467 0.295466253 0.317857963 0.341365799 0.365989493 ///
0.39172287 0.418553461 0.44646215 0.475422854 0.50540224 0.53635949 0.568246115 0.601005825 0.634574456 0.66887997 0.703842512 0.73937455 0.775381082 0.811759916 0.848402039 ///
0.88519205 0.92200867 0.958725335 0.995210852

di "`nvsr_death_prob_yearly'"

tokenize `nvsr_death_prob_yearly'


forvalues i = 1/121 {
    replace nvsr_death_prob_yearly = ``i'' in `i'
}

local age_input = 80
gen L_breastcancer_distribution = .
local L_breastcancer_distribution 0.031545 0.076709 0.094483 0.091899 0.083154 0.079074 0.073931 0.069022 0.066901 0.059433 0.061200 0.058224 0.054612 0.052216 0.047595
tokenize `L_breastcancer_distribution'

forvalues i = 1/15 {
replace L_breastcancer_distribution = ``i'' in `=`age_input' + 1 +`i''
}

describe
summarize
list in 1/121

The remaining mystery to me is the connection between these calculations and your 90,000 individuals (which I think is what Jorrit is also asking). I think you need to explain how you wish to make the connection. To return to a theme in my earlier message: we now have a dataset keyed on age, and I assume that in your dataset on individuals you have information on age. Can you not save the dataset just created and then -merge- it with the dataset on individuals?

Are you taking time out from your day job? https://www.imdb.com/name/nm4816771/


Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#24

07 Jun 2018, 02:45

In rough terms, I believe that you are trying to make a complicated calculation, using some personal data such as age, and to multiply (or otherwise combine) these with the corresponding factors from your arrays.
That means you should either create 15 and 121 new variables that repeat these factors for each of the observations in your dataset with personal data, or (probably better) keep the arrays in a separate dataset, as in the code Stephen and David have given. You then create a small number of new variables by merging m:1 the personal dataset with this new dataset, so that Stata retrieves, e.g., the risk factor corresponding to the age in each observation. When you have all the correct values for these factors in new variables in your personal dataset, you can write the calculation in terms of variables in your do-file.

This is of course still somewhat abstract, and not a literal translation of your JavaScript into Stata code, but at least I can say that you shouldn't create 2 new variables in your personal dataset with the arrays in long form, nor should you reduce the personal dataset to 15 or 121 observations, as this just leads to Stata throwing away a lot of data.
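The many-to-one idea can be sketched in JavaScript as a lookup keyed on age. This is an illustration only: the three person records are hypothetical, and the lookup holds just three ages, with values copied from the source arrays. Every person keeps their own row and simply gains the factors that belong to their age.

```javascript
// Lookup table keyed by age (values for ages 81..83 copied from the source arrays).
const lookupByAge = new Map([
  [81, { nvsr_death_prob_yearly: 0.046670, nvsr_life_expect: 9.0 }],
  [82, { nvsr_death_prob_yearly: 0.051554, nvsr_life_expect: 8.5 }],
  [83, { nvsr_death_prob_yearly: 0.057062, nvsr_life_expect: 8.0 }],
]);

// Hypothetical person-level records; several people may share one age (many-to-one).
const persons = [
  { id: 1, age: 82 },
  { id: 2, age: 81 },
  { id: 3, age: 82 },
];

// Attach the age-specific factors to every person; no person rows are dropped.
// An age missing from the lookup would leave the person without the extra fields.
const merged = persons.map((p) => ({ ...p, ...lookupByAge.get(p.age) }));
console.log(merged);
```

The number of rows in `merged` equals the number of persons, which is the property an m:1 merge on age should preserve in the personal dataset.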

Lisa Spoelstra

Join Date: Jan 2018
Posts: 24

#25

07 Jun 2018, 05:21

Stephen and Jorrit, I tried your suggestions for the arrays and now I have a new dataset looking like this:

age	nvsr_death_prob_yearly	L_breastcancer_distribution	nvsr_life_expect	L_breastcancer_percentage	L_nonbreastcancer_prob	.....
0	0	-	80.5	-	-
1	0.006083	-	80	-	-
2	0.000414	-	79	-	-
..	..	..	..	..	..
81	0.04667	0.031545	9	0	0
82	0.051554	0.076709	8.5	0	0
83	0.057062	0.094483	8	0	0
..	..	..	..	..	..
95	0.180734	0.047595	3.5	0	0
..	..	..	..	..	..
100	0.270467	-	2.5	-	-
101	0.2954662	-	2.3	-	-
....

This is the code that I used:

Code:

clear all
set obs 150 // some large number >= length longest array

gen int age = _n - 1

gen nvsr_death_prob_yearly = .
local nvsr_death_prob_yearly 0 0.006083 0.000414 0.000301 0.000218 0.000172 0.000158 0.000141 0.000128 0.000117 0.000108 0.000105 0.000110 0.000132 0.000173 0.000228 ///
0.000292 0.000355 0.000407 0.000440 0.000456 0.000471 0.000489 0.000501 0.000509 0.000515 0.000506 0.000516 0.000531 0.000552 0.000578 0.000609 0.000646 0.000693 ///
0.000753 0.000827 0.000864 0.000949 0.001047 0.001157 0.001273 0.001393 0.001514 0.001639 0.001774 0.001919 0.002045 0.002211 0.002383 0.002560 0.002746 0.002949 ///
0.003176 0.003431 0.003718 0.004039 0.004462 0.004859 0.005304 0.005809 0.006384 0.007060 0.007817 0.008596 0.009350 0.010100 0.011305 0.012297 0.013426 0.014706 ///
0.016123 0.017603 0.019196 0.021033 0.023175 0.025585 0.028661 0.031295 0.034315 0.037906 0.042094 0.046670 0.051554 0.057062 0.063411 0.070761 0.079054 0.087065 ///
0.095796 0.105294 0.115605 0.126771 0.138833 0.151829 0.165787 0.180734 0.196684 0.213644 0.231608 0.250560 0.270467 0.295466253 0.317857963 0.341365799 0.365989493 ///
0.39172287 0.418553461 0.44646215 0.475422854 0.50540224 0.53635949 0.568246115 0.601005825 0.634574456 0.66887997 0.703842512 0.73937455 0.775381082 0.811759916 0.848402039 ///
0.88519205 0.92200867 0.958725335 0.995210852

di "`nvsr_death_prob_yearly'"
tokenize `nvsr_death_prob_yearly'
forvalues i = 1/121 {
    replace nvsr_death_prob_yearly = ``i'' in `i'
}

Code:

local age_input = 80
gen L_breastcancer_distribution = .
local L_breastcancer_distribution 0.031545 0.076709 0.094483 0.091899 0.083154 0.079074 0.073931 0.069022 0.066901 0.059433 0.061200 0.058224 0.054612 0.052216 0.047595
tokenize `L_breastcancer_distribution'

forvalues i = 1/15 {
replace L_breastcancer_distribution = ``i'' in `=`age_input' + 1 +`i''
}

describe
summarize
list in 1/121

Code:

gen nvsr_life_expect = .
local nvsr_life_expect 80.5 80.0 79.0 78.0 77.1 76.1 75.1 74.1 73.1 72.1 71.1 70.1 69.1 68.1 67.2 66.2 65.2 64.2 63.2 62.3 61.3 60.3 59.4 58.4 57.4 56.4 55.5 ///
54.5 53.5 52.6 51.6 50.6 49.7 48.7 47.7 46.8 45.8 44.8 43.9 42.9 42.0 41.1 40.1 39.2 38.3 37.3 36.4 35.5 34.6 33.6 32.7 31.8 30.9 30.0 29.2 28.3 27.4 26.5 ///
25.7 24.8 24.0 23.1 22.3 21.5 20.7 19.9 19.1 18.4 17.6 16.9 16.1 15.4 14.7 14.0 13.3 12.7 12.0 11.4 10.8 10.2 9.6 9.0 8.5 8.0 7.5 7.0 6.6 6.2 5.8 5.4 5.0 ///
4.7 4.4 4.1 3.8 3.5 3.3 3.1 2.8 2.6 2.5 2.3 2.1 2.0 1.8 1.7 1.5 1.4 1.2 0.8 0.4 0.2 0.1 0.05 0.03 0.01 0.006 0.003 0.001 0.002 0.0008
di "`nvsr_life_expect'"
tokenize `nvsr_life_expect'
forvalues i = 1/121 {
    replace nvsr_life_expect = ``i'' in `i'
}

Code:

local age_input = 80 
gen L_breastcancer_percentage = . 
local L_breastcancer_percentage 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tokenize `L_breastcancer_percentage'
forvalues i = 1/16 {
replace L_breastcancer_percentage = ``i'' in `=`age_input' + 1 +`i''
}

local age_input = 80 
gen L_nonbreastcancer_prob = . 
local L_nonbreastcancer_prob 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tokenize `L_nonbreastcancer_prob'
forvalues i = 1/16 {
replace L_nonbreastcancer_prob = ``i'' in `=`age_input' + 1 +`i''
}

If I understand correctly, I need to merge it with the other dataset with 90000 observations and use the formulas as already described. I then can connect it with the other variables in the dataset with 90000 observations by using the formulas.

However, I do not understand why we use local age_input = 80. Yes, I understand that we need to have it somewhere in the dataset.. But..


Stephen Jenkins

Join Date: Apr 2014

Posts: 1424
#26

07 Jun 2018, 07:13

However, I do not understand why we use local age_input = 80. Yes, I understand that we need to have it somewhere in the dataset..

David and I thought that the values in this local also related to age, and were trying to align the array values with the right age. Now you tell us that the values refer to "years after diagnosis". It sounds as if you require 2 data files. Create one with "age" as the key as before. But now create the second with a "years_after_diagnosis" variable created in an analogous way to the way we/you created age. I assume your 90,000 individuals file contains both age and years_after_diagnosis variables. Now do 2 merges into that file. One using the age-keyed file; a second using the other file. Note well the responses to your more recent post about merging and also read help merge

Lisa Spoelstra

Join Date: Jan 2018
Posts: 24

#27

07 Jun 2018, 08:20

Okay, now I have nvsr_death_prob_yearly and nvsr_life_expect in my dataset with 90000 observations, matched to each person's age. Thanks for the help! However, I need to add these numbers too (from the JavaScript):

Code:

var L_breastcancer_distribution = new Array(0.031545, 0.076709, 0.094483, 0.091899, 0.083154, 0.079074, 0.073931, 0.069022, 0.066901, 0.059433, 0.061200, 0.058224, 0.054612, 0.052216, 0.047595);

These are the chances of death in each of the 15 years after diagnosis with breast cancer. Assuming that every person in my dataset started at the same time, I tried this:

Code:

replace L_breastcancer_distribution = .
local L_breastcancer_distribution 0.031545 0.076709 0.094483 0.091899 0.083154 0.079074 0.073931 0.069022 0.066901 0.059433 0.061200 0.058224 0.054612 0.052216 0.047595
di "`L_breastcancer_distribution'"
tokenize `L_breastcancer_distribution'
forvalues i=1/15 {
    qui replace L_breastcancer_distribution = ``i'' in `i'
}

Which seems to work.

But when I want to write the formulas, Stata says that this is invalid syntax:

Code:

replace cancer_death_dist_cumm = .
local n = _n
forvalues 1/`15' {
replace cancer_death_dist_cumm [_n] = cancer_death_dist_cumm [_n-1] + L_breastcancer_distribution [_n-1] * L_breastcancer_KM
}

What I need in the end is something like this:

ID	age	L_breastcancer_distribution_year1	L_breastcancer_distribution_year2	L_breastcancer_distribution_year3	..	L_breastcancer_distribution_year15
1
2
3
4
5
6

I need the same for cancer_death_dist_cumm. What is wrong in my syntax?

Last edited by Lisa Spoelstra; 07 Jun 2018, 08:23.


Stephen Jenkins

Join Date: Apr 2014

Posts: 1424
#28

07 Jun 2018, 08:31

You've repeated the forvalues syntax error you had in your earlier posts! In

Code:

forvalues 1/`15'

I'm guessing you need instead

Code:

forvalues i = 1/15

Note the missing "i = ". Also, if the number 15 is what you want, referring to it as if it were a macro will fail.
Also puzzling is the replace statement that follows -- there is no reference to "i" in there, so the whole forvalues statement is redundant! Will you achieve what you want if you replace "_n" by "i"?
[_n refers to the current observation, as in rows of the dataset. I think you want to refer to variable names.]
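The distinction between looping over rows and looping over variable names can be illustrated in JavaScript. In this sketch each person is one record, and the loop index `i` selects which column (`cancer_death_dist_cumm1` to `cancer_death_dist_cumm15`) is filled. The two persons and their `L_breastcancer_KM` values are hypothetical, and the starting column `cancer_death_dist_cumm0 = 0` is an assumption, since the excerpt does not show the initialization.

```javascript
// L_breastcancer_distribution from the source (zero-based: element i-1 is year i).
const dist = [
  0.031545, 0.076709, 0.094483, 0.091899, 0.083154,
  0.079074, 0.073931, 0.069022, 0.066901, 0.059433,
  0.061200, 0.058224, 0.054612, 0.052216, 0.047595,
];

// Hypothetical persons, each with a hypothetical 15-year cancer death rate.
const persons = [
  { id: 1, L_breastcancer_KM: 0.2 },
  { id: 2, L_breastcancer_KM: 0.5 },
];

for (const p of persons) {
  p.cancer_death_dist_cumm0 = 0; // ASSUMPTION: cumulative value before year 1
  for (let i = 1; i <= 15; i++) {
    // The index i picks the column name; every person gets all 15 columns.
    p[`cancer_death_dist_cumm${i}`] =
      p[`cancer_death_dist_cumm${i - 1}`] + dist[i - 1] * p.L_breastcancer_KM;
  }
}

console.log(persons[0].cancer_death_dist_cumm15, persons[1].cancer_death_dist_cumm15);
```

Because the distribution is normalized to 1 at 15 years, the year-15 column should come out very close to each person's `L_breastcancer_KM`, which gives a quick sanity check on a Stata version.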

Lisa Spoelstra

Join Date: Jan 2018
Posts: 24

#29

07 Jun 2018, 09:01

I've changed the first block of code to this, because it only filled in values for the first 15 observations:

Code:

replace L_breastcancer_distribution = .
local L_breastcancer_distribution 0.031545 0.076709 0.094483 0.091899 0.083154 0.079074 0.073931 0.069022 0.066901 0.059433 0.061200 0.058224 0.054612 0.052216 0.047595
di "`L_breastcancer_distribution'"
tokenize `L_breastcancer_distribution'
    qui replace L_breastcancer_distribution = ``i'' in `i'

I tried a lot of things, with this code

Code:

replace cancer_death_dist_cumm = .
local i = `i'
forvalues i=1/15 {
replace cancer_death_dist_cumm i = cancer_death_dist_cumm i-1 + L_breastcancer_distribution i-1 * L_breastcancer_KM
}

But it doesn't work. What is wrong? Sorry, I am a bit stuck.


Stephen Jenkins

Join Date: Apr 2014

Posts: 1424
#30

07 Jun 2018, 12:19

Lisa, I don't think you need the following at all.

Code:

local i = `i'

My suggestion was that you have a line within the forvalues call that is something like:

Code:

forvalues i=1/15 {
    replace cancer_death_dist_cumm`i' = cancer_death_dist_cumm`=`i'-1' + L_breastcancer_distribution`=`i'-1' * L_breastcancer_KM
}

The idea is to evaluate the local macro contents "on the fly" so that the relevant variable name is cited. Search for help on this trick