MarkDoc Manual + GUI

haghish

Join Date: Aug 2014

Posts: 201
#1


05 Sep 2016, 11:30

The new release of MarkDoc (currently only on GitHub) has a dialog box that not only shows all of the package's functionality but also makes the package easier for students to use. Moreover, the MarkDoc manual is now available on the GitHub Wiki and is updated constantly with new tutorials. The updates are meant to facilitate using MarkDoc in classrooms and workshops, so that students can more easily use the package for taking notes and writing their projects within Stata.

To install the package from GitHub, type:

Code:

net install markdoc, replace from("https://raw.githubusercontent.com/haghish/markdoc/master/")

and to open the dialog box, type:

Code:

db markdoc

which opens the MarkDoc dialog box. For more details, see the GitHub Wiki or the release notes.

Last edited by haghish; 05 Sep 2016, 11:37.

Anders Alexandersson

Join Date: Apr 2014

Posts: 203
#2

27 Oct 2016, 13:21

I get no output from -markdoc- when used with your command -Rcall- in a do-file. I am using Stata 14.2 for Windows, -markdoc- version 3.8.3, -Rcall- version 1.3.5, and Microsoft Word 2013.

Basically, I want to do literate programming in Stata using -markdoc- but add a few R commands using -Rcall-. I can use -markdoc- and -Rcall- separately but not together. The program -Rcall- is discussed here: http://www.statalist.org/forums/foru...th-r-and-stata

This -markdoc- command works but stops producing output if I add any -Rcall- code to the do-file:

Code:

markdoc "F:\tmp_27oct.do" , markup(markdown) export(docx) replace style("stata")

I tried both interactive and vanilla mode of -Rcall-. The output is displayed in Stata but not in the Word document; the Word document is empty once you add -Rcall- code. Are -markdoc- and -Rcall- incompatible?
haghish

Join Date: Aug 2014

Posts: 201
#3

28 Oct 2016, 06:57

I don't have any problem including R code in a dynamic document with MarkDoc.

The current Rcall version is 1.4 and the current MarkDoc version is 3.8.5, so perhaps you could try updating your software. If that doesn't solve the problem, you could upload an example do-file so we can see what the problem is.
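A minimal sketch of that version check, assuming both packages are installed under the command names used in this thread (markdoc and Rcall). The which command prints the "*!" version lines of an ado-file; the capture prefix only keeps the sketch from stopping if a package is missing.

```stata
* Print the version lines of the installed ado-files
capture noisily which markdoc
capture noisily which Rcall

* To update MarkDoc, rerun the install command from post #1 (uncomment to use)
* net install markdoc, replace from("https://raw.githubusercontent.com/haghish/markdoc/master/")
```

If which reports an older version than the one named in the post, reinstalling with the replace option should bring the package up to date.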
Anders Alexandersson

Join Date: Apr 2014

Posts: 203
#4

28 Oct 2016, 07:35

I no longer have any problem including R code in a dynamic document with MarkDoc. Updating to the current versions, Rcall 1.4.0 and MarkDoc 3.8.5, fixed the problem. Thanks, Haghish!
Andrea Discacciati

Join Date: Feb 2016

Posts: 194
#5

28 Oct 2016, 15:02

Hi Haghish, I have one quick question.

I use a lot of loops in my code (-foreach- and -forval-). Is there a way to suppress the display of the Stata code from the loops and keep only the Stata output in the document generated by MarkDoc? Something similar to /**/ for single commands?

For example, given this code

Code:

qui log using "/Users/anddis/Desktop/a", smcl replace
webuse auto

/***
# title 1
some text
***/

foreach v of varlist length trunk {
    txt "## Variable summarized: `=strupper("`v'")'" _n
    su `v'
}

qui log c
markdoc "/Users/anddis/Desktop/a.smcl", export(html) pandoc(/usr/local/bin/pandoc)

I wish that the output file a.html didn't display the following part:

Code:

. foreach v of varlist length trunk {
.     su `v'
. }

Thank you for all the work you've been putting into this!
haghish

Join Date: Aug 2014

Posts: 201
#6

31 Oct 2016, 04:15

Anders Alexandersson glad to hear that. The great thing about GitHub is that you can simply follow the packages you are using and get notifications about their recent updates... plus many other benefits for reproducibility, which I am addressing in my current paper...

Andrea Discacciati
You can do that. For example, create a do-file as follows (also attached):

Code:

/***
Hiding loops
============

To hide a __loop__ simply hide each of the commands...
***/

/**/ forval n = 1/5 {
/**/ local n = `n'^2
/**/ display "print `n'"
/**/ }

Name the do-file, for example example.do, and then execute the MarkDoc command:

Code:

markdoc example.do, exp(pdf) replace

I have attached the do-file and the PDF I get on my computer.

Attached Files

example.pdf (13.2 KB, 1 view)

example.do (170 Bytes, 1 view)
Anders Alexandersson

Join Date: Apr 2014

Posts: 203
#7

31 Oct 2016, 07:16

The problem of the empty output in Word, as I reported in #2, has re-appeared but it is specific to the R read_csv() function of the package readr. I still use Rcall 1.4.0 and MarkDoc 3.8.5, and Microsoft Word 2013; my version of R is 3.3.1 and my version of readr is 1.0.0. Here is my do-file:

Code:

/***
Some comments.
***/

sysuse auto, clear
export delimited auto.txt, replace

capture program drop rprogram
program define rprogram
    syntax
    Rcall vanilla: ///
        library(readr); ///
        read_csv("auto.txt"); ///

end
rprogram

/***
Some more comments.
***/

read_csv() is a popular R function by Hadley Wickham for reading comma-delimited text files; it is similar in functionality to Stata's import delimited command. I need it for testing reproducibility between Stata and R. Here is my Stata code for calling the do-file:

Code:

markdoc "F:\tmp_31oct.do" , markup(markdown) export(docx) replace style("stata") noisily

As reported several times before, the warnings and errors are very difficult to report because they only appear in a quickly disappearing command prompt. I get this warning and error message in the Windows command prompt:

Code:

Warning message:
In if (class(.Last.value) != "try-error") {} :
  the condition has length >1 and only the first element will be used
}
pandoc.exe: Cannot decode byte '\xd7': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream

Attached are my two best screenshots (.PNG files) of the complete output in the command prompt. The problem is Halloween spooky.
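One way to keep such messages from vanishing with the command prompt is to run Pandoc from Stata and redirect its output to a file. The following is only a sketch: the file names example.md, example.docx, and pandoc_messages.log are hypothetical, and pandoc is assumed to be on the system PATH (MarkDoc itself may call a copy stored elsewhere).

```stata
* Hypothetical file names; adjust to the Markdown file that MarkDoc produced
local md  "example.md"
local out "example.docx"

* Send both standard output and standard error of Pandoc to a log file
shell pandoc "`md'" -o "`out'" > pandoc_messages.log 2>&1

* Show the captured warnings and errors in the Results window
type pandoc_messages.log
```

The log file stays on disk, so the full message can be copied into a forum post instead of being retyped from a screenshot.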

Attached Files

Last edited by Anders Alexandersson; 31 Oct 2016, 07:45. Reason: Updated R from 3.3.0 to 3.3.1 but the problem remains.
haghish

Join Date: Aug 2014

Posts: 201
#8

01 Nov 2016, 11:28

Anders Alexandersson

I could run your code without any problem on Mac 10.10! On Windows 10, however, I also get an empty file because Pandoc fails. I get the following error:

Code:

pandoc.exe: Cannot decode byte '\xd7': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream

So what we can do now is figure out how to convert the attached markdown file to other formats using Pandoc on Windows, and why Pandoc fails. You could perhaps post this on an online forum or maybe figure it out on your own. Perhaps an option is needed to take care of UTF-8, or perhaps updating Pandoc would solve the problem? I'm not sure; it's been ages since I read the Pandoc manual. But let us know if you figure it out. All I can say is that the error is specifically related to Pandoc and Windows, not Rcall or MarkDoc.
Attached Files

Markdown.txt (1.5 KB, 1 view)
Anders Alexandersson

Join Date: Apr 2014

Posts: 203
#9

01 Nov 2016, 15:12

haghish

Thanks. Of course, I will report here if I figure it out.

Updating pandoc did not solve the problem. I updated Rcall from 1.4.0 to current 1.4.2, and pandoc from 1.13.1 to 1.18. I am using Windows 7.

The Pandoc manual states that if your local character encoding is not UTF-8, you should pipe input and output through iconv. The equivalent Stata command is unicode convertfile but I was not able to progress using this command.

Interestingly, two alternative R commands for reading a comma-delimited file give a different perspective. The command fread produces the same warning message as in my post #7 but no error:

Code:

library(data.table); ///
fread("auto.txt"); ///

and read.csv works without even giving a warning message!

Code:

library(foreign); ///
read.csv("auto.txt"); ///

read_csv and fread are much faster than read.csv, so using read.csv instead is only a temporary fix. I know this is not a good solution but the additional info might help you or someone else interested.

Last edited by Anders Alexandersson; 01 Nov 2016, 15:14. Reason: fixed typo

haghish

Join Date: Aug 2014
Posts: 201

#10

02 Nov 2016, 06:08

Anders Alexandersson

So I figured it out. The error is "unavoidable" because the output returned by R includes a "diamond" character that is not valid UTF-8, which makes Pandoc fail. This is what Rcall returns to Stata:

Code:

          .  Rcall vanilla:  ///
           library(readr);  ///
           read_csv("auto.txt"); 
          
          # A tibble: 74 ◊ 12
                      make price   mpg rep78 headroom trunk weight length  turn
                     <chr> <int> <int> <int>    <dbl> <int>  <int>  <int> <int>
          1    AMC Concord  4099    22     3      2.5    11   2930    186    40
          2      AMC Pacer  4749    17     3      3.0    11   3350    173    40
          3     AMC Spirit  3799    22    NA      3.0    12   2640    168    35
          4  Buick Century  4816    20     3      4.5    16   3250    196    40
          5  Buick Electra  7827    15     4      4.0    20   4080    222    43
          6  Buick LeSabre  5788    18     3      4.0    21   3670    218    43
          7     Buick Opel  4453    26    NA      3.0    10   2230    170    34
          8    Buick Regal  5189    20     3      2.0    16   3280    200    42
          9  Buick Riviera 10372    16     3      3.5    17   3880    207    43
          10 Buick Skylark  4082    19     3      3.5    13   3400    200    42
          # ... with 64 more rows, and 3 more variables: displacement <int>,
          #   gear_ratio <dbl>, foreign <chr>

So the error is caused by "# A tibble: 74 ◊ 12", and the only way we can fix it is to convert the document to UTF-8. This is literally the first time I am getting this error in Pandoc. In particular, the author of read_csv() should have avoided that character. Basically, we are discussing something "rare", and I am not sure it is worth adding a safe converter to MarkDoc (ensuring the document is UTF-8) if that makes it slower for the whole community.

So the conclusion for now is simple. MarkDoc does not support documents which are not UTF-8!


Anders Alexandersson

Join Date: Apr 2014

Posts: 203
#11

02 Nov 2016, 09:18

I found a solution specifically for the file "Markdown.txt" in #8. I use Stata 14.2 for Windows.

Basically, my solution is to change the file from extended ASCII to UTF-8 before calling markdoc. Pandoc fails in Windows because the multiplication sign in "# A tibble: 74 × 12" from read_csv is written as extended ASCII, not as Unicode UTF-8. The multiplication sign is the code point U+00D7; UTF-8 encodes it as the two bytes C3 97, whereas Windows-1252 stores it as the single byte D7, according to http://www.fileformat.info/info/unic...r/d7/index.htm.
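The difference between the two encodings can be checked with Stata's string functions. This is a small sketch, assuming Stata 14 or later (where the Unicode functions exist); it only inspects the character and does not touch any file.

```stata
* The single byte D7 (decimal 215), as Windows-1252 stores the multiplication sign
display strlen(char(215))            // length in bytes of the raw byte
display ustrinvalidcnt(char(215))    // number of invalid UTF-8 sequences in it

* The same character encoded as UTF-8
display strlen(uchar(215))           // length in bytes of the UTF-8 form
display ustrlen(uchar(215))          // length in Unicode characters

* Converting the Windows-1252 byte to UTF-8 gives the UTF-8 form (displays 1 if equal)
display ustrfrom(char(215), "windows-1252", 1) == uchar(215)
```

A lone byte D7 is not a complete UTF-8 sequence, which is consistent with the "Cannot decode byte '\xd7'" message from Pandoc.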

You can reproduce the error using Pandoc from the Windows command prompt. Here is a Pandoc getting started guide. Here is a Pandoc forum. I put the file in the same location as Pandoc. In "C:\Windows\system32\cmd.exe", I typed

Code:

cd C:\ado\Plus\Weaver\Pandoc
pandoc Markdown.txt -o Word.docx

You can reproduce the invalid character also in Stata (14.2 for Windows).

Code:

. unicode convertfile "Markdown.txt" "Markdown_u.txt", dstencoding(UTF-8)
illegal character found in input file
Illegal character starts at byte position 351. Invalid byte is D7.
file "Markdown.txt" partially converted to file "Markdown_u.txt"
r(198);

Several postings suggested encoding the document to UTF-8 before using Pandoc. This can easily be done in my text editor Notepad++ under the menu "Encoding". It can also be done fairly easily in Stata:

Code:

unicode analyze "Markdown.txt"
unicode encoding set "Windows-1252"
unicode translate "Markdown.txt", transutf8

Pandoc now runs without an error, and the correct Word document is created.
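An alternative sketch uses unicode convertfile with an explicit source encoding, which writes a new file and leaves the original untouched. The file names md_raw.txt and md_utf8.txt are hypothetical, and Windows-1252 as the source encoding is an assumption carried over from the commands above. The example builds its own one-line test file so that it can be run on its own.

```stata
* Build a one-line test file containing the raw byte D7, as in the read_csv() output
tempname fh
file open `fh' using "md_raw.txt", write text replace
file write `fh' "# A tibble: 74 " (char(215)) " 12" _n
file close `fh'

* srcencoding() states what the bytes currently are (an assumption);
* dstencoding() is what Pandoc expects
unicode convertfile "md_raw.txt" "md_utf8.txt", ///
    srcencoding(Windows-1252) dstencoding(UTF-8) replace

* Read the converted line back and count invalid UTF-8 sequences (0 means valid)
file open `fh' using "md_utf8.txt", read text
file read `fh' line
file close `fh'
display ustrinvalidcnt(`"`macval(line)'"')
```

Naming the source encoding is the step that was missing from the failed unicode convertfile call shown earlier, where only dstencoding() was given.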

I am not sure how to change the file to UTF-8 in markdoc before calling pandoc. Perhaps the suggested unicode translate could somehow be applied only if the error appears and the OS, according to the creturn value c(os), is Windows?
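The idea can be sketched as a small wrapper program. Everything here is hypothetical and not part of MarkDoc: the program name md_fix_utf8, its options src(), dst(), and encoding(), and the default of Windows-1252. It re-encodes the file only on Windows and only when invalid UTF-8 is found, so documents that are already valid are merely copied.

```stata
* Hypothetical helper, not part of MarkDoc
capture program drop md_fix_utf8
program define md_fix_utf8, rclass
    version 14
    syntax , SRC(string) DST(string) [ENCoding(string)]
    if "`encoding'" == "" local encoding "Windows-1252"   // assumed default

    * Count invalid UTF-8 sequences line by line
    tempname fh
    local bad = 0
    file open `fh' using `"`src'"', read text
    file read `fh' line
    while r(eof) == 0 {
        local bad = `bad' + ustrinvalidcnt(`"`macval(line)'"')
        file read `fh' line
    }
    file close `fh'
    return scalar invalid = `bad'

    * Re-encode only on Windows and only if needed; otherwise copy the file as is
    if "`c(os)'" == "Windows" & `bad' > 0 {
        unicode convertfile `"`src'"' `"`dst'"', ///
            srcencoding(`encoding') dstencoding(UTF-8) replace
    }
    else {
        copy `"`src'"' `"`dst'"', replace
    }
end

* Example call with the file from #8 (uncomment to use):
* md_fix_utf8, src("Markdown.txt") dst("Markdown_u.txt")
```

The stored result r(invalid) reports how many invalid sequences were found, which gives a cheap test for whether a conversion was needed at all.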
Anders Alexandersson

Join Date: Apr 2014

Posts: 203
#12

02 Nov 2016, 09:19

Oops, I posted without seeing Haghish's answer. But I did outline a possible solution. Thanks!
Anders Alexandersson

Join Date: Apr 2014

Posts: 203
#13

02 Nov 2016, 13:02

I opened GitHub readr issue #547 for the problem with the invalid UTF-8 character in Windows from R's read_csv(). I will report back here on Statalist when the author, Hadley Wickham, responds.
Anders Alexandersson

Join Date: Apr 2014

Posts: 203
#14

10 Nov 2016, 08:56

I got another problem, error 198 with message "option --toc not allowed" when the exported document is tex.
I am still using updated Stata 14.2 and markdoc 3.8.5 from GitHub on Windows 7.

The problem is easily reproduced by tweaking the tutorial Markdown example as in this do-file code

Code:

qui log using example, replace

/***
This is a Markdown Heading
==========================

This is a Markdown subheading
-----------------------------

This is a text paragraph.
***/

qui log c

markdoc example, replace install title("How to write text in MarkDoc") ///
    export(docx) toc noisily

markdoc example, replace install title("How to write text in MarkDoc") ///
    export(tex) toc master style("stata") noisily

The output of the first markdoc is

Code:

"c:\ado\plus\Weaver\pandoc\pandoc.exe" --mathjax --toc -S --reference-docx="c:\ado\plus\m\markdoc_simple.docx" "example.md" -o "C:\Users\AALEXA~1\AppData\Local\Temp\ST_2g000001.tmp.docx"
(MarkDoc created example.docx)

The output of the second markdoc is

Code:

"c:\ado\plus\Weaver\pandoc\pandoc.exe" --mathjax --toc "example.md" -o "C:\Users\AALEXA~1\AppData\Local\Temp\ST_2g000001.tmp.tex"
option --toc not allowed
r(198);

It seems from the difference in output that Pandoc will need "--toc -S" rather than just "--toc" to produce the table of contents. According to the help file, the option toc should be allowed for tex (LaTeX):

toc creates table of content in PDF, Microsoft Word Docx, and LaTeX documents

Therefore, I think it is a bug in markdoc. The error exists also if I omit the install option because I already have the required third-party software.

I also noted a few minor documentation issues:
1. The versioning on GitHub at https://github.com/haghish/MarkDoc. The markdoc_ssc folder is versioned 3.8.8, but I think it should be marked as 3.8.3 since the markdoc.ado in the folder has version 3.8.3. I first mistakenly thought that 3.8.8 was a newer version.
2. The MarkDoc PDF should have updated syntax: texmaster instead of master, linesize() appears twice.
haghish

Join Date: Aug 2014

Posts: 201
#15

10 Nov 2016, 13:54

Thanks, I updated the package and the bug is fixed.

Regarding the second point, the syntax in the PDF is the "new" syntax. The old syntax will continue to work for now.
