Armin Goralczyk | 19 Dec 11:01 2007

Failure message in R on Mac with xmlTreeParse


In the following thread (R-help) the possibilities of analyzing
publications from PubMed via XML were discussed:

Using xmlTreeParse in a function produces an error message on my
Mac that does not occur in R for Windows:

> esearch <- function (term){
+ 	srch.stem <- ""
+ 	srch.mode <- "db=pubmed&retmax=10000&retmode=xml&term="
+ 	doc <-xmlTreeParse(paste(srch.stem,srch.mode,term,sep=""),isURL = TRUE,
+ 		useInternalNodes = TRUE)
+ 	sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue)
+ 	}
> term <- 'meyer'
> pmid <- esearch(term) # works fine
> term <- 'meyer[au]'
> pmid <- esearch(term)
Error in .Call("RS_XML_ParseTree", as.character(file), handlers,
as.logical(ignoreBlanks),  :
  error in creating parser for[au]
I/O warning : failed to load external entity

Duncan Temple Lang | 19 Dec 22:23 2007

Re: Failure message in R on Mac with xmlTreeParse

The [au] portion seems to be causing the problem.
So escape the [ and ] by mapping them to %5B and %5D respectively
_before_ handing the URL string to xmlTreeParse().  (The error message
suggests that the internals have already performed the conversion, but
doing it yourself should work: I can reproduce your error message, and
I get the desired result by escaping the [ and ] first.)
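
A minimal sketch of that escaping step, using only base R.  The helper
name escape_brackets is hypothetical (it is not part of the XML package);
it simply replaces the two characters literally.

```r
# Hypothetical helper (not part of the XML package): percent-encode the
# square brackets before the URL string is handed to xmlTreeParse().
escape_brackets <- function(term) {
  # fixed = TRUE treats "[" and "]" as literal characters, not regex syntax.
  term <- gsub("[", "%5B", term, fixed = TRUE)
  gsub("]", "%5D", term, fixed = TRUE)
}

escape_brackets("meyer[au]")  # "meyer%5Bau%5D"
```

In esearch() this would be applied to term before the paste() call.
Base R's URLencode(term, reserved = TRUE) is a more general option that
also encodes other reserved characters.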

There is more information about what needs to be escaped at

The HTTP/FTP code built into the xmlTreeParse(), htmlTreeParse() and
xmlEventParse() functions (specifically from libxml2) is minimalistic.
For better or worse, it is the code that is also in R to implement
url() connections.  It does not handle aspects of HTTP other than simple
requests.  So when I run into problems with xmlTreeParse() and a URL,
I first fetch the content of the document using the RCurl package.


does fetch the document and the result can be passed directly to

RCurl is an interface to libcurl, which is a very solid, stable
and feature-rich library for performing HTTP, HTTPS, FTP, ... client
queries.  It allows us to do, in R, pretty much anything a Web browser
can do, but programmatically.
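
As a rough sketch of the fetch-then-parse approach, assuming the RCurl and
XML packages are installed: the function name esearch_curl and its fetch
argument are hypothetical, and srch.stem (the base URL of the search
service) has to be supplied by the caller.  getURL() and the asText
argument of xmlTreeParse() are the only package features relied on.

```r
library(XML)

# Sketch only: esearch_curl and its `fetch` argument are hypothetical names.
# `srch.stem` is the base URL of the search service, supplied by the caller.
esearch_curl <- function(term, srch.stem, fetch = RCurl::getURL) {
  srch.mode <- "db=pubmed&retmax=10000&retmode=xml&term="
  # Escape [ and ] before building the URL.
  term <- gsub("[", "%5B", term, fixed = TRUE)
  term <- gsub("]", "%5D", term, fixed = TRUE)
  url <- paste(srch.stem, srch.mode, term, sep = "")
  # Let libcurl (via RCurl) perform the request and return the body as text.
  txt <- fetch(url)
  # Parse the text rather than a URL, so libxml2's HTTP code is not involved.
  doc <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)
  xpathSApply(doc, "//Id", xmlValue)
}
```

Called as esearch_curl('meyer[au]', srch.stem), this should return the
Id values as a character vector.  The fetch argument is there only so
that a different downloader (or a stub for testing) can be swapped in.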
