Friday, 22 January 2010

Enter HeVeA

In my last post, we discussed problems with converting LaTeX documents into HTML. I briefly described DocBook, the most compelling alternative I found to LaTeX. I got a lot of response re: DocBook from people telling me I am crazy and what horrible experiences they'd had with DocBook. But I was undaunted, and I forged ahead. I have converted my ISMIR 2009 paper into DocBook and then to HTML. I did this in about half a day, mostly converting LaTeX commands to DocBook XML by hand using the Emacs nXML mode.


You can also download a tarball with my source. I applied the most basic docbook-xsl transform. On Ubuntu, you can apt-get install docbook-xsl, and MacPorts seems to include this stuff as well. I used xsltproc to apply the transform with some arguments:


xsltproc --output index.html \
--stringparam bibliography.numbered 1 \
--stringparam bibliography.collection \
./bib.xml \
/usr/share/xml/docbook/stylesheet/nwalsh/html/docbook.xsl \
db-ismir09.xml

We are using --stringparam to pass some optional parameters to docbook.xsl to get a numbered bibliography and to specify the external bibliography file. A complete reference list of these parameters is provided on the DocBook SourceForge page.

You may notice that not everything went according to plan. I had to spend a lot of time converting the bibliography by hand; the tools I found to do this didn't really work. So in the end, I left the bibliography a bit of a mess, with some left-over LaTeX in there.

Furthermore, the PDF version was a bit of a disaster. I used the dblatex utility to generate the PDF as follows:



dblatex --output=db-ismir09.pdf -T simple -L bib.xml db-ismir09.xml


While I was able to get the DocBook tool chain up and working rather quickly, I did have lots of questions. My posts to the mailing list and to the IRC channel were virtually ignored. (I had one pleasant reply from a jsmith on IRC pointing me to a master's thesis written in DocBook, which was helpful but includes no citations or bibliography; the thesis is really technical software documentation.) This was really off-putting for me and (sigh) has led me back to LaTeX.

Luckily, thanks to my long-time friend and colleague Ben Fields, I discovered HeVeA. This is a more modern and complete LaTeX to HTML package that seems very promising. I was able to convert the same ISMIR 2009 paper to HTML in a matter of minutes.
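A conversion like this can be sketched as a small shell function. This is only a minimal sketch under some assumptions: the helper name tex2html_hevea and any file names passed to it are made up for illustration, and the only HeVeA feature it relies on is the -o option for naming the output file.

```shell
# Convert a LaTeX file to HTML with HeVeA.
# tex2html_hevea is a hypothetical helper name; the paths are placeholders.
tex2html_hevea() {
    src="$1"
    out="${2:-${src%.tex}.html}"   # default: paper.tex -> paper.html

    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    if ! command -v hevea >/dev/null 2>&1; then
        echo "hevea not found on PATH" >&2
        return 127
    fi

    # -o names the output file; HeVeA reads the LaTeX source directly.
    hevea -o "$out" "$src"
}
```

With a source file called, say, ismir09.tex (a placeholder name), this would be run as tex2html_hevea ismir09.tex index.html.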


This actually looks much better than the DocBook version IMHO. The only major problem I see is that the URLs in my LaTeX don't seem to automagically become hrefs in the HTML as I had hoped. There was also some funkiness with the images that I had to fix by hand. But this package seems to be well documented and I am optimistic.

In addition to HeVeA, Ben suggested TtH, which I had tried before but had problems getting to work (probably user error, to be fair). I hope to give it a real try again this week.

So in conclusion, in my brief experience DocBook is not as horrible as everybody said. But the DocBook user community seemed to give me the cold shoulder (sorry I'm a lamer noob who has questions). It seems using DocBook instead of LaTeX would, in the end, create a lot more problems than it would solve. I still like the idea of DocBook - it seems so much more "future proof" than LaTeX - but I'm going to have to stick to the beaten path on this one. Looks like LaTeX wins.

UPDATE:

I've managed to use TtH to create an HTML version of the same paper.

http://docs.kurtisrandom.com/ismir2009-tth/

Looks pretty good, maybe the best yet, but not sure what it's doing with the figures - seems to create a link to the image instead of displaying it with an appropriate <img> tag. Not bad at all, but it seems HeVeA allows more control and is better documented.
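For comparison, a TtH run can be sketched the same way. Again, this is a hedged sketch rather than a record of the exact command used: the helper name tex2html_tth and the file names are hypothetical, and it assumes TtH is used in its filter mode, reading LaTeX on standard input and writing HTML to standard output.

```shell
# Convert a LaTeX file to HTML with TtH, used as a filter.
# tex2html_tth is a hypothetical helper name; the paths are placeholders.
tex2html_tth() {
    src="$1"
    out="${2:-${src%.tex}.html}"   # default: paper.tex -> paper.html

    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    if ! command -v tth >/dev/null 2>&1; then
        echo "tth not found on PATH" >&2
        return 127
    fi

    # TtH reads LaTeX on stdin and writes HTML to stdout.
    tth < "$src" > "$out"
}
```

With a placeholder source file ismir09.tex, this would be run as tex2html_tth ismir09.tex index.html.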
