Robert.Buergel | 16 Aug 2012 09:23

export from MS Excel to DocBook?

Hi all,

I've got a large number of text paragraphs in an Excel spreadsheet.

I am thinking about converting the entries of the spreadsheet cells to a DocBook file using the XML capabilities of
the newer Excel versions.

My Excel looks like this:

Heading 1	|		|		|		|
		| Text a	|		|		|
		| Text b	|		|		|
		|		| Heading 2	|		|	
		|		|		| Text c	|
		|		|		| Text d	|
		
This should turn into something like:

<section>
 <title>Heading 1</title>
 <para> Text a</para>
 <para> Text b</para>
 <section>
  <title>Heading 2</title>
  <para> Text c</para>
  <para> Text d</para>
 </section>
</section>

Does anybody have experience with this? Any pointers?
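A minimal sketch of the mapping above, written in Python rather than XSLT. It assumes each row holds exactly one non-empty cell, that a heading of nesting depth d sits in column 2d-1 (counting from 1), and that its paragraphs sit in the column to its right. The function name rows_to_docbook is hypothetical. It keeps a stack of open sections instead of grouping; in XSLT 2.0 the same job would more likely be done with xsl:for-each-group and group-starting-with.

```python
import xml.etree.ElementTree as ET


def rows_to_docbook(rows):
    """Build nested DocBook <section> elements from spreadsheet rows.

    Assumed layout (hypothetical, inferred from the example): one non-empty
    cell per row; a heading of depth d (1-based) sits in 0-based column
    2*(d-1), its paragraphs in the next column. Levels must not be skipped.
    """
    root = ET.Element("root")  # temporary container for top-level sections
    stack = [root]             # stack[d] is the open section at depth d
    for row in rows:
        cells = [(i, c.strip()) for i, c in enumerate(row) if c and c.strip()]
        if not cells:
            continue           # skip empty rows
        col, text = cells[0]
        if col % 2 == 0:       # even column: a heading opens a new section
            depth = col // 2 + 1
            del stack[depth:]  # close sections at the same or a deeper level
            section = ET.SubElement(stack[-1], "section")
            ET.SubElement(section, "title").text = text
            stack.append(section)
        else:                  # odd column: a paragraph of the open section
            ET.SubElement(stack[-1], "para").text = text
    return list(root)
```

With the sample rows, ET.tostring(rows_to_docbook(rows)[0], encoding="unicode") yields the nested section/title/para structure shown in the question.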

Jirka Kosek | 16 Aug 2012 10:02

Re: export from MS Excel to DocBook?

On 16.8.2012 9:23, Robert.Buergel <at> bmw.de wrote:

> Has anybody any experience with this? Any pointers?

XSLT 2.0 has a powerful grouping instruction, xsl:for-each-group. With
that instruction and the Excel file saved as .xlsx, it's fairly easy to
produce the output you want. Also, Saxon 9 (an XSLT 2.0 implementation) can
read the content of .xlsx files directly; there is no need to unpack them first.
Simply use something like

doc('jar:table.xslx!!/_rels/.rels')

to access parts of the .xlsx file.
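Because an .xlsx file is a ZIP package, the same kind of access is possible outside Saxon. A rough Python sketch using the standard zipfile module; it assumes a local file path in place of a full URL, and a single '!' separates the archive from the entry path. The function names are illustrative only.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of OPC relationship parts such as _rels/.rels
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def read_package_part(jar_url):
    """Return the bytes of one entry of a ZIP package addressed as
    'jar:<archive>!/<entry>'.

    Simplified assumption: <archive> is a local file path, not a full URL,
    and no percent-decoding is done.
    """
    prefix = "jar:"
    if not jar_url.startswith(prefix) or "!/" not in jar_url:
        raise ValueError("expected jar:<archive>!/<entry>")
    archive, entry = jar_url[len(prefix):].split("!/", 1)
    with zipfile.ZipFile(archive) as zf:
        return zf.read(entry)


def relationship_targets(rels_xml):
    """List the Target of every <Relationship> in a .rels part."""
    root = ET.fromstring(rels_xml)
    return [rel.get("Target") for rel in root.findall(REL_NS + "Relationship")]


# Example (hypothetical file): which part is the main workbook?
# print(relationship_targets(read_package_part("jar:table.xlsx!/_rels/.rels")))
```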

				Jirka

-- 
------------------------------------------------------------------
  Jirka Kosek      e-mail: jirka <at> kosek.cz      http://xmlguru.cz
------------------------------------------------------------------
       Professional XML consulting and training services
  DocBook customization, custom XSLT/XSL-FO document processing
------------------------------------------------------------------
 OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member
------------------------------------------------------------------

Christian Roth | 16 Aug 2012 12:07

Re: export from MS Excel to DocBook?

On 16.08.2012, at 10:02, Jirka Kosek wrote:

> doc('jar:table.xslx!!/_rels/.rels')

Why are there two '!' in this Jar URL? Is that a special mode?

-Christian

-- 
Christian Roth * Phone: +49 (0)89 89 04 32 95
infinity-loop GmbH * Neideckstr. 25 * 81249 München * Germany
HRB 136 783 (AG München) * Geschäftsführer: Dr. Stefan Hermann
Web: http://www.infinity-loop.de
Jirka Kosek | 16 Aug 2012 12:42

Re: export from MS Excel to DocBook?

On 16.8.2012 12:07, Christian Roth wrote:
> On 16.08.2012, at 10:02, Jirka Kosek wrote:
> 
>> doc('jar:table.xslx!!/_rels/.rels')
> 
> Why are there two '!' in this Jar URL? Is that a special mode?

Sorry, it should be just one. Silly typo.

Kerry, Richard | 16 Aug 2012 17:06

RE: export from MS Excel to DocBook?


Not knowing about the xlsx (sic) unpacking facility in Saxon, and having had no success trying to automate
Excel internally, I've recently written a simple C++ program to take an Excel spreadsheet file and
convert it to XML.  From there I'm using Xsl to generate the Xml I really want.

As it's a program I've written for work I can't publish it, though it is really quite simple.  Access Excel
using COM.  Scan all worksheets, scan all rows, scan all columns, and write the text in the cells to Xml using
Msxml.  Took about an afternoon to get it working.
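The scan described above (all worksheets, all rows, all columns, cell text written out as XML) might look roughly like this in Python. COM access is deliberately left out: the sketch assumes the worksheets have already been read into lists of rows, and the element names workbook, sheet, row and cell are hypothetical.

```python
import xml.etree.ElementTree as ET


def workbook_to_xml(sheets):
    """Write every non-empty cell of every worksheet to a simple XML tree.

    `sheets` maps a worksheet name to a list of rows (lists of cell strings);
    how those rows are obtained (COM, a file parser, ...) is not shown here.
    """
    workbook = ET.Element("workbook")
    for name, rows in sheets.items():
        sheet = ET.SubElement(workbook, "sheet", name=name)
        for r, cells in enumerate(rows, start=1):
            row = ET.SubElement(sheet, "row", n=str(r))
            for c, text in enumerate(cells, start=1):
                if text:  # skip empty cells but record the column number
                    ET.SubElement(row, "cell", col=str(c)).text = text
    return workbook
```

Recording the column number keeps the layout information that a later XSLT step needs to tell headings from paragraphs.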

Maybe now I'll see if I can get anything useful from the xlsx file directly using Saxon....

Unhelpfully,
Richard.

Jirka Kosek | 16 Aug 2012 21:48

Re: export from MS Excel to DocBook?

On 16.8.2012 17:06, Kerry, Richard wrote:

> Not knowing about the xlsx (sic) unpacking facility in Saxon, and
> having had no success trying to automate Excel internally, I've
> recently written a simple C++ program to take an Excel spreadsheet
> file and convert it to XML.  From there I'm using Xsl to generate the
> Xml I really want.
> 
> As it's a program I've written for work I can't publish it, though it
> is really quite simple.  Access Excel using COM.  Scan all
> worksheets, scan all rows, scan all columns, and write the text in
> the cells to Xml using Msxml.  Took about an afternoon to get it
> working.

Today I would stay away from COM if possible. You need Excel in order to
use it (which can be a problem in a server environment), and it is quite
slow on large documents.

> Maybe now I'll see if I can get anything useful from the xlsx file
> directly using Saxon....

Don't be discouraged by your first look at the OOXML internals. The format
is pretty convoluted, but once you understand the principles it is quite
easy to process if you just need to extract some data or automatically fill
some data into an existing template.
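As an illustration of those internals: worksheet cells usually store text as an index into the shared-string table (xl/sharedStrings.xml), and each cell's r attribute (such as B2) gives its position. A rough Python sketch, assuming Excel's default part names (xl/worksheets/sheet1.xml); a more robust reader would resolve sheet paths through xl/workbook.xml and its relationships. Numbers come back as their stored text; dates and formulas are not interpreted.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheets and the shared-string table
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def column_index(ref):
    """Convert a cell reference such as 'C7' to a 0-based column index (2)."""
    letters = re.match(r"[A-Z]+", ref).group(0)
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def rich_text(elem):
    """Join plain <t> and rich-text run <r><t> children (phonetic runs skipped)."""
    parts = elem.findall("m:t", NS) + elem.findall("m:r/m:t", NS)
    return "".join(t.text or "" for t in parts)


def read_sheet_rows(path, sheet_part="xl/worksheets/sheet1.xml"):
    """Return one worksheet as a list of rows, each a list of cell strings.

    Assumes Excel's default part names and that every <c> has an r attribute.
    Rows Excel did not store (completely empty ones) are simply absent.
    """
    with zipfile.ZipFile(path) as zf:
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            shared = [rich_text(si) for si in sst.findall("m:si", NS)]
        sheet = ET.fromstring(zf.read(sheet_part))

    rows = []
    for row in sheet.iterfind("m:sheetData/m:row", NS):
        cells = []
        for c in row.findall("m:c", NS):
            kind = c.get("t")
            if kind == "s":            # index into the shared-string table
                value = shared[int(c.findtext("m:v", namespaces=NS))]
            elif kind == "inlineStr":  # text stored in the cell itself
                inline = c.find("m:is", NS)
                value = rich_text(inline) if inline is not None else ""
            else:                      # numbers, booleans etc. as stored text
                value = c.findtext("m:v", default="", namespaces=NS)
            col = column_index(c.get("r"))
            cells.extend([""] * (col - len(cells)))  # pad skipped columns
            cells.append(value)
        rows.append(cells)
    return rows
```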

				Jirka

David Goss | 16 Aug 2012 17:31

RE: export from MS Excel to DocBook?

I did something like this recently to take some spreadsheets (or rather tab-delimited data from them) and
format them in LaTeX. I'm not a programmer, so my skills at this sort of thing are pretty crude. I used
this bit of Lua code to pull data from a tab-delimited text file produced by Excel and read it into a Lua
table. In your case, you could use the LuaXML library to write the data from the table to a DocBook file as you see fit.

	-- Split a string on a Lua pattern; returns a table of the fields.
	function string:split( inSplitPattern, outResults )
		outResults = outResults or { }
		local theStart = 1
		local theSplitStart, theSplitEnd = string.find( self, inSplitPattern, theStart )
		while theSplitStart do
			table.insert( outResults, string.sub( self, theStart, theSplitStart-1 ) )
			theStart = theSplitEnd + 1
			theSplitStart, theSplitEnd = string.find( self, inSplitPattern, theStart )
		end
		table.insert( outResults, string.sub( self, theStart ) )
		return outResults
	end
	
	-- table to hold data: one entry per row, each a table of cell strings
	local data = {}
	
	-- arg[2] is the second command-line argument; use arg[1] if the
	-- file name is the only argument passed to the script
	local file = assert(io.open(arg[2], "r"))
	-- file:lines() also copes with an empty file, unlike read-then-repeat
	for line in file:lines() do
		-- drop the carriage return left by Windows line endings
		line = line:gsub("\r$", "")
		table.insert(data, line:split("\t"))
	end
	file:close()
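A comparable sketch in Python, assuming the same tab-delimited export, uses the csv module for reading and ElementTree for writing a DocBook CALS informaltable in place of LuaXML. The function names are hypothetical, and the UTF-8 default is an assumption: Excel's plain "Text (Tab delimited)" format may use the system code page, so something like encoding="cp1252" may be needed.

```python
import csv
import xml.etree.ElementTree as ET


def read_tab_delimited(path, encoding="utf-8"):
    """Read a tab-delimited export into a list of rows (lists of strings).

    csv.reader also undoes the double-quote escaping applied to cells that
    contain tabs, quotes or line breaks, which a plain split on tabs misses.
    """
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f, delimiter="\t"))


def rows_to_informaltable(rows):
    """Render rows as a DocBook CALS <informaltable>, padding short rows."""
    cols = max((len(r) for r in rows), default=0)
    table = ET.Element("informaltable")
    tgroup = ET.SubElement(table, "tgroup", cols=str(cols))
    tbody = ET.SubElement(tgroup, "tbody")
    for cells in rows:
        row = ET.SubElement(tbody, "row")
        for i in range(cols):
            ET.SubElement(row, "entry").text = cells[i] if i < len(cells) else ""
    return table


# Example (file names are hypothetical):
# table = rows_to_informaltable(read_tab_delimited("data.txt", encoding="cp1252"))
# ET.ElementTree(table).write("table.xml", encoding="utf-8")
```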