xml
Writing XML files for a KEGGREST database
Using the KEGGREST library, I'm trying to write 134 XML files, where each file represents a unique pathway. I first used a simple loop to write each file, then tried an lapply call, but both write (or rewrite) the XML into a single file rather than 134 files in my working directory, and in the end that file contains only the first pathway in the list (size 24 KB).

The code below is reproducible:

    library(KEGGREST)
    pathways <- names(keggList("pathway", "mdm"))

To get the 'KGML' (XML) file simply use:

    keggGet("pathway", "kgml")

where "pathway" takes a named pathway like "path:mdm04146", and outputs the kgml/xml file.

When writing each file, there is an error that says: No encoding supplied: defaulting to UTF-8. In the reference manual, there is no argument to change the encoding. However, I assume this is defaulting to a txt file format, and I know I can coerce it to the xml format.

To iterate over the 134 pathways, get each corresponding kgml pathway file, and save it as an xml file, I initially tried the first three pathways as a test, to no avail:

    for (i in pathways[1:3]){
      write.table(keggGet(i, "kgml"), file = paste(i, ".xml", sep = ""),
                  col.names=FALSE, row.names=FALSE, sep="\t", quote=FALSE)
    }

Then I thought to first read it as a large text file, then write the xml file:

    out.file<-""
    for(i in pathways[1:3]){
      file <- keggGet(i, "kgml")
      out.file <- rbind(out.file, file)
    }
    write.table(out.file, file = paste(pathways[i], ".xml", sep = ""), sep = "\t")

I know I can get the kgml file because it can open in R:

    > keggGet("path:mdm04933", "kgml")
    No encoding supplied: defaulting to UTF-8.
    [1] "<?xml version=\"1.0\"?>\n<!DOCTYPE pathway SYSTEM \"http://www.kegg.jp/kegg/xml/KGML_v0.7.2_.dtd\">\n<!-- Creation date: Nov 26, 2015 11:26:57 +0900 (GMT+9) -->\n<pathway name=\"path:mdm04933\" org=\"mdm\" number=\"04933\"\n title=\"AGE-RAGE signaling pathway in diabetic complications\"\n image=\"http://www.kegg.jp/kegg/pathway/mdm/mdm04933.png\"\n link=\"http://www.kegg.jp/kegg-bin/show_pathway?mdm04933\">\n <entry id=\"1\" name=\"path:mdm04933\" type=\"map\"\n link=\"http://www.kegg.jp/dbget-bin/www_bget?mdm04933\">\n <graphics name=\"TITLE:AGE-RAGE signaling pathway in diabetic complications\" fgcolor=\"#000000\" bgcolor=\"#FFFFFF\"\n
    ......................until the end of the file.

I tested parts of my loop to make sure they work:

    > for (i in pathways[1:3]){
    +   print(i)
    + }
    [1] "path:mdm00010"
    [1] "path:mdm00020"
    [1] "path:mdm00030"
    > for(i in pathways[1:3]){
    +   print(paste(pathways[i], ".xml", sep = ""))
    + }
    [1] "path:mdm00010.xml"
    [1] "path:mdm00020.xml"
    [1] "path:mdm00030.xml"

Although I do get a single XML file, I can see it iterate and process the 134 pathways into one file. However, after processing, it is size 24 KB, so it only saved the first mdm00010 pathway, and interestingly the file is not a named file. I can watch the KB size go up and down as it's 're-writing' the file...

Here is the lapply function I used to do this operation:

    lapply(pathways[1:3], function(i, pathways)
      write.table(keggGet(i, "kgml"), paste(pathways[i], ".xml", sep = ""),
                  col.names=FALSE, row.names=FALSE, sep="\t", quote=FALSE),
      pathways)

Lastly, I can save one xml file by using the XML package:

    h <- xmlInternalTreeParse(keggGet("path:mdm04933", "kgml"))
    saveXML(h, "try.xml")

However, when I try to loop it, it completes with no errors but no files are written:

    for( i in pathways[1:3]){
      XML::saveXML(XML::xmlInternalTreeParse(keggGet(i, "kgml"), paste(i, ".xml", sep = "")))
    }

Thanks for reading! Any help understanding what's going on here is greatly appreciated.
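For reference, here is a minimal sketch of one way the per-pathway writing could be done. It rests on a few observations about the code above. `pathways` is an unnamed character vector, so `pathways[i]` with `i` being a string such as "path:mdm00010" evaluates to `NA`, and `paste(NA, ".xml", sep = "")` gives "NA.xml" on every iteration, which would explain the single, oddly named file that keeps being overwritten. In the XML-package loop, the closing parenthesis appears to hand `paste(i, ".xml", sep = "")` to `xmlInternalTreeParse()` rather than to `saveXML()`, so `saveXML()` only returns the document as a string. The "No encoding supplied" line is a message, not an error. The helper name `write_kgml_files`, its `fetch` and `dir` parameters, and the replacement of ":" with "_" are illustrative assumptions, not part of KEGGREST; the substitution is there because ":" is not allowed in Windows file names.

```r
# Hypothetical helper (not part of KEGGREST): writes one KGML file per pathway ID.
# `fetch` defaults to the keggGet() call from the question; it is a parameter
# only so the file-writing logic can be tried without network access.
write_kgml_files <- function(ids,
                             fetch = function(id) KEGGREST::keggGet(id, "kgml"),
                             dir = ".") {
  vapply(ids, function(id) {
    # Build the file name from the ID itself (not pathways[id], which is NA)
    # and replace ":" so the name is also valid on Windows (assumption).
    out <- file.path(dir, paste0(gsub(":", "_", id, fixed = TRUE), ".xml"))
    # The KGML arrives as a single character string, so write it verbatim
    # instead of formatting it as a table with write.table().
    writeLines(fetch(id), out)
    out
  }, character(1), USE.NAMES = FALSE)
}

# Usage with the objects from the question (needs KEGGREST and network access):
# library(KEGGREST)
# pathways <- names(keggList("pathway", "mdm"))
# write_kgml_files(pathways[1:3])
```

With the XML package, the equivalent fix would be to move the file name out of the parse call: `XML::saveXML(XML::xmlInternalTreeParse(keggGet(i, "kgml")), file = paste0(gsub(":", "_", i, fixed = TRUE), ".xml"))`.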