  • Print excel after calculating Shannon entropy

    Dear Statalists, I am calculating the Shannon entropy for a very large number of grid cells, but when I try to export the values for all cells to an Excel file, I only get the values for the last cell.
    Here is my code:
    global geo "A1 A2 A3 A....."
    foreach var of global geo {
    entropyetc `var'
    }
    I tried many ways to print all my results, for example adding these lines:
    matrix list r(entropyetc)
    putexcel A1=matrix(r(entropyetc))

    I have also tried the results export function in the main menu. But I always get only the last line of my results, with Shannon, exp_H, Simpson, rec_lambda and dissim for the last cell. Can you please help me?
    Many thanks.


  • #2
    What is shown on screen is not saved. For Stata to export something, that something has to be stored somewhere. So the first step is to store the different entropies somewhere. You could do that in a matrix or in a dataset, but here I will use the new feature in Stata 17: collect.

    On Statalist we will assume you have the latest version of Stata unless you tell us otherwise. This does not mean that it is bad to have an older version. It just means that we have to make an assumption when you tell us nothing, and the latest is often the most reasonable assumption. So I will assume you have Stata 17. If that is not the case, then what I tell you below does not apply to you.

    Code:
    // always start new
    clear all
    
    // open example data
    sysuse nlsw88
    
    // prepare the data
    gen byte marst = !never_married + married if !missing(never_married)
    label variable marst "marital status"
    label define marst 0 "never married"    ///
                       1 "widowed/divorced" ///
                       2 "married"
    label value marst marst
    
    gen byte urban = c_city + smsa
    label define urban 2 "central city" ///
                       1 "suburban"     ///
                       0 "rural"
    label value urban urban
    label variable urban "urbanicity"
    
    // collect the entropies
    local vars marst urban industry occupation race
    local i = 1
    foreach var of local vars {
        collect H=r(entropyetc)[1,1] : entropyetc `var'
        local labs `"`labs' `i++' "`var'""'
    }
    // tell collect which result corresponds to which variable
    collect label levels cmdset `labs'
    // tell collect what goes on the rows and columns
    collect layout (cmdset) (result)
    // export to excel
    collect export "c:/temp/foo.xlsx", replace
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Dear Maarten Buis,
      many thanks for your kind answer. I have STATA 2014.
      What can I use? Please note that I have more than 10,000 columns, which are my grid cells, and only 5-6 attributes per cell in my rows.
      I need to calculate and export the entropy for every cell in the grid. The cells are parts of my cities, and the entropy is based on the attributes found in each cell and their diversity.



      • #4
        Stata 2014 does not exist yet (we are at version 17 now; at about 2 years per version, it is going to take quite a while before we get there...). No new version came out in 2014: 13 came out in 2013 and 14 came out in 2015. So it is still a mystery which version of Stata you have. As you have noticed, this is important, because otherwise we end up giving you advice that is not helpful to you.

        Minor note: Stata is an invented word, not an acronym, so the correct spelling is Stata, not STATA.
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------



        • #5
          This will work in Stata 13

          Code:
          // always start new
          clear all
          
          // open example data
          sysuse nlsw88
          
          // prepare the data
          gen byte marst = !never_married + married if !missing(never_married)
          label variable marst "marital status"
          label define marst 0 "never married"    ///
                             1 "widowed/divorced" ///
                             2 "married"
          label value marst marst
          
          gen byte urban = c_city + smsa
          label define urban 2 "central city" ///
                             1 "suburban"     ///
                             0 "rural"
          label value urban urban
          label variable urban "urbanicity"
          
          // collect the entropies
          
          tempname memhold res
          tempfile results
          postfile `memhold' str20 varname double H using `results'
          
          local vars marst urban industry occupation race
          foreach var of local vars {
              entropyetc `var'
              matrix `res' = r(entropyetc)
              post `memhold' ("`var'") (`res'[1,1])
          }
          postclose `memhold'
          
          drop _all
          use `results'
          export excel using "c:\temp\foo.xlsx", replace
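
          The original question mentions five measures per cell (Shannon, exp_H, Simpson, rec_lambda and dissim), while the code above keeps only the first. Here is a minimal sketch of one way to keep all five. It assumes that r(entropyetc) is a single row holding the measures in that order; the order is inferred from the names listed in the question, so check it with matrix list r(entropyetc) first. The variable names H, expH, simpson, recip and dissim and the file name entropies.xlsx are made up for illustration.

          ```stata
          // sketch: store all five measures per variable, not only Shannon H
          clear all
          sysuse nlsw88

          tempname memhold res
          tempfile results
          postfile `memhold' str32 varname double(H expH simpson recip dissim) ///
              using `results'

          // these three variables exist in nlsw88; substitute your own varlist
          foreach var of varlist industry occupation race {
              quietly entropyetc `var'
              matrix `res' = r(entropyetc)
              // ASSUMED column order: Shannon, exp_H, Simpson, rec_lambda, dissim
              post `memhold' ("`var'") (`res'[1,1]) (`res'[1,2]) (`res'[1,3]) ///
                  (`res'[1,4]) (`res'[1,5])
          }
          postclose `memhold'

          // one row per variable, one column per measure
          use `results', clear
          export excel using "entropies.xlsx", firstrow(variables) replace
          ```

          With thousands of columns, a wildcard or range in the varlist (for example a hypothetical A*) saves typing every name. The firstrow(variables) option writes the variable names as the header row of the sheet.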
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------



          • #6
            Thanks to Maarten Buis for his helpful answers.

            entropyetc is a community-contributed command from SSC; you are asked to explain where such commands come from (https://www.statalist.org/forums/help#stata).

            Maarten's point about explaining the Stata version you use is at https://www.statalist.org/forums/help#version




            • #7
              Dear both,
              thank you very much for your support. It was very much appreciated. I managed it, thanks to your suggestions.
              It is the first time I have posted a question on Statalist, so I will certainly do better next time.
              Many thanks again.
              Kind regards
              Last edited by paola proietti; 28 Jun 2021, 06:38.



              • #8
                Sorry, I have two last questions on entropyetc. Can entropyetc give values for Shannon entropy bigger than 1? Should I input raw data (e.g. number of trees of type 1, number of trees of type 2, etc.), or should I input relative numbers (e.g. number of trees of type 1 over the total number of trees in the same cell)? Many thanks



                • #9
                  The entropy can easily be more than 1. See e.g. https://stats.stackexchange.com/ques...greater-than-1

                  The input is immaterial, as the calculation scales to probabilities anyway.


                  Here's an example showing that the same results come out whether you get the command to count first or you feed it probabilities.

                  Code:
                  . sysuse auto, clear
                  (1978 automobile data)
                  
                  . entropyetc rep78
                  
                  ----------------------------------------------------------------------
                            |  Shannon H      exp(H)     Simpson   1/Simpson     dissim.
                  ----------+-----------------------------------------------------------
                        all |      1.358       3.888       0.297       3.369       0.296
                  ----------------------------------------------------------------------
                  
                  . contract rep78
                  
                  . l
                  
                       +---------------+
                       | rep78   _freq |
                       |---------------|
                    1. |     1       2 |
                    2. |     2       8 |
                    3. |     3      30 |
                    4. |     4      18 |
                    5. |     5      11 |
                       |---------------|
                    6. |     .       5 |
                       +---------------+
                  
                  . su _freq if rep78 < .
                  
                      Variable |        Obs        Mean    Std. dev.       Min        Max
                  -------------+---------------------------------------------------------
                         _freq |          5        13.8    10.73313          2         30
                  
                  . gen _prob = _freq / r(sum) if rep78 < .
                  (1 missing value generated)
                  
                  . l
                  
                       +--------------------------+
                       | rep78   _freq      _prob |
                       |--------------------------|
                    1. |     1       2   .0289855 |
                    2. |     2       8    .115942 |
                    3. |     3      30   .4347826 |
                    4. |     4      18   .2608696 |
                    5. |     5      11   .1594203 |
                       |--------------------------|
                    6. |     .       5          . |
                       +--------------------------+
                  
                  . entropyetc rep78 [aw=_prob]
                  
                  ----------------------------------------------------------------------
                            |  Shannon H      exp(H)     Simpson   1/Simpson     dissim.
                  ----------+-----------------------------------------------------------
                        all |      1.358       3.888       0.297       3.369       0.296
                  ----------------------------------------------------------------------
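
                  For readers who want to see where the 1.358 comes from, here is a rough sketch that computes Shannon H by hand as minus the sum of p * ln(p). It assumes entropyetc uses natural logarithms, which is consistent with the exp(H) column above (exp(1.358) is about 3.888). The variable names p and plnp are made up for illustration.

                  ```stata
                  sysuse auto, clear

                  // one row per nonmissing category, with its count in _freq
                  contract rep78 if rep78 < .

                  // scale counts to probabilities
                  summarize _freq, meanonly
                  generate double p = _freq / r(sum)

                  // Shannon H = -sum of p * ln(p), ASSUMING natural logs
                  generate double plnp = -p * ln(p)
                  summarize plnp, meanonly
                  display "Shannon H = " %5.3f r(sum)
                  display "exp(H)    = " %5.3f exp(r(sum))
                  ```

                  If that assumption holds, the displayed values should agree with the table above. The formula also shows why values above 1 are unremarkable: k equally likely categories give H = ln(k), which already exceeds 1 for k = 3 (ln 3 is about 1.10).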



                  • #10
                    Nick Cox, thank you so much. Both answers were extremely clear and relevant for my work.
