Converting documents to mediawiki markup

From OpenWetWare

Introduction

The initial goal of this page is to work out what existing tools are available to automatically convert documents to mediawiki markup. Based on identified shortcomings in available tools, we'd like to propose some tools that should be developed to facilitate the growth of OpenWetWare.

There appear to be few existing tools for automatically converting other document formats (e.g. .doc, .xls, .ppt). The simplest approach right now appears to be converting documents to html and then converting the html to wiki markup, as there are good html->wiki converters. Here are some good links.

More tools and techniques, including some for converting PDFs, have been developed at Appropedia (the sustainable development wiki): Porting formatted content to MediaWiki and Help:Porting PDF files to MediaWiki.

HTML documents

html2wiki converter based on HTML::WikiConverter Perl module
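
To give a feel for what an html->wiki converter does, here is a toy sketch in Python. It is not the HTML::WikiConverter module or the html2wiki tool; the class name `MiniHtmlToWiki` and the function `html_to_wiki` are made up for illustration, and only headings, paragraphs, bold, italics, links, and lists are handled. Tables, images, and alignment are ignored.

```python
import re
from html.parser import HTMLParser


class MiniHtmlToWiki(HTMLParser):
    """Toy converter: a small subset of HTML to MediaWiki markup."""

    INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}
    HEADINGS = {"h1": "=", "h2": "==", "h3": "===", "h4": "===="}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []   # output fragments, joined at the end
        self.lists = []   # stack of list markers ("*" or "#") for nesting

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.parts.append("\n" + self.HEADINGS[tag] + " ")
        elif tag in ("ul", "ol"):
            self.lists.append("*" if tag == "ul" else "#")
        elif tag == "li":
            prefix = "".join(self.lists) or "*"
            self.parts.append("\n" + prefix + " ")
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            self.parts.append("[" + href + " ")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.parts.append(" " + self.HEADINGS[tag] + "\n")
        elif tag in ("ul", "ol"):
            if self.lists:
                self.lists.pop()
            if not self.lists:
                self.parts.append("\n")
        elif tag == "a":
            self.parts.append("]")
        elif tag == "p":
            self.parts.append("\n\n")

    def handle_data(self, data):
        # HTML treats any run of whitespace as a single space
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_wiki(html):
    parser = MiniHtmlToWiki()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"[ \t]+", " ", text)      # collapse repeated spaces
    text = re.sub(r" ?\n ?", "\n", text)     # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)   # at most one blank line in a row
    return text.strip()


if __name__ == "__main__":
    print(html_to_wiki("<h2>Protocol</h2><p>Mix <b>gently</b>.</p>"))
```

Anything beyond this subset (tables in particular) is where the real converters earn their keep.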

Word Documents

Saving a relatively simple Word document (no images or tables) to html and then running that through the converter here produced good mediawiki formatting. A document including images, tables, and centered text did not work as well: the images would need to be added to OWW separately, the table didn't come out quite right, and the centered text was no longer centered.

A direct converter can be found here.

A series of Word macros for doing simple conversions (including tables) is here; they seem to work reasonably well but aren't designed for sophisticated layouts.

Also, as of the release of OpenOffice 2.4, OpenOffice can export documents to mediawiki format. Since OpenOffice can also read MS Word documents, this allows OpenOffice to serve as a Word to MediaWiki converter.

Images

I have had good success with the following steps for porting images embedded in Word documents to MediaWiki format on a Mac:

  1. Click on the image in the Word document and choose Edit->Copy from the menu (cmd-C)
  2. Go to the application GraphicConverter and choose File->New->Image with clipboard (cmd-J)
  3. Choose File->Save as and save as a JPEG/JFIF format (.jpg) file with 100% Quality.

Alternatively, if you want to take an image which has associated text boxes, it seems to come out well if you take a screenshot of a selection with Grab (in the /Applications/Utilities folder), save as a .tiff (your only option) and then open in GraphicConverter and save as a JPEG as described above.

Excel Documents

  • If you can export the data in comma-separated values (CSV) format, then a converter exists.
  • Simpler, less feature-rich script supporting "copy and paste" conversion: Excel to Wiki Table Converter
  • The Wibbit extension currently installed has a table editor with an import from tab-separated values, so you do not need any external tools. On the edit page, click on the insert table button (second one from left) and click paste.
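
For anyone who would rather script the conversion, the CSV and tab-separated routes above can be sketched in a few lines of Python. This is only an illustration, not the code behind any of the tools listed; the function name `delimited_to_wikitable` and its parameters are assumptions. It emits standard MediaWiki table markup (`{|`, `|-`, `!`, `|`, `|}`) and does not escape cells that themselves contain a pipe character.

```python
import csv
import io


def delimited_to_wikitable(text, delimiter=",", header=True):
    """Convert CSV or tab-separated text into MediaWiki table markup."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip completely empty lines
    lines = ['{| class="wikitable"']
    for index, row in enumerate(rows):
        lines.append("|-")  # start a new table row
        # "!" marks a header cell, "|" marks an ordinary data cell
        marker = "!" if header and index == 0 else "|"
        for cell in row:
            lines.append(marker + " " + cell.strip())
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(delimited_to_wikitable("Gene,Length\nlacZ,3075\n"))
```

For data copied straight out of a spreadsheet, pass `delimiter="\t"`, since pasted cells are normally tab-separated.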

Powerpoint Documents

Not a lot here; even the .ppt->html converters are unimpressive. Based on the partial interchangeability of the .doc and .ppt formats, it might be possible to go .ppt->.doc->.html->wiki.

LaTeX Documents

Mediawiki has support for some TeX math formatting, so there are ways to do this. Most people who are active users of TeX are probably savvy enough to be able and willing to convert their documents into mediawiki format. There is still room for improvement, though.
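
As a rough illustration of the manual work involved, the sketch below wraps TeX math in the `<math>` tags that mediawiki uses for formulas and rewrites a handful of common commands. The function name `latex_to_wiki` is hypothetical, and the regular expressions cover only simple, non-nested cases of `\section`, `\subsection`, `\textbf`, and `\emph`; real documents with macros, environments, or nested braces would need a proper parser.

```python
import re


def latex_to_wiki(source):
    """Very small LaTeX-to-MediaWiki sketch for math and basic markup."""
    # Display math first, so "$$" is not read as two inline delimiters
    text = re.sub(r"\$\$(.+?)\$\$", r"<math>\1</math>", source, flags=re.DOTALL)
    # Inline math: unescaped "$ ... $" on a single line
    text = re.sub(r"(?<!\\)\$(.+?)(?<!\\)\$", r"<math>\1</math>", text)
    # Sectioning commands become wiki headings
    text = re.sub(r"\\section\{([^}]*)\}", r"== \1 ==", text)
    text = re.sub(r"\\subsection\{([^}]*)\}", r"=== \1 ===", text)
    # Bold and emphasis become wiki quote markup
    text = re.sub(r"\\textbf\{([^}]*)\}", r"'''\1'''", text)
    text = re.sub(r"\\emph\{([^}]*)\}", r"''\1''", text)
    return text


if __name__ == "__main__":
    print(latex_to_wiki("\\section{Kinetics}\nThe rate is $v = k S$."))
```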

PmWiki pages

Convert PmWiki to MediaWiki, java

Conclusions and deliverables

This page is in active development, but very initial conclusions will begin to appear here.
Based on what I've read so far, there are ways of converting document formats, and it helps if you are somewhat computer savvy, but there is definitely a need for a converter that handles a range of formats and is well integrated with the mediawiki software.