If you have ever tried to convert a Microsoft Word document to HTML using Microsoft Word itself, you've probably noticed that about 90% of the resulting HTML document consists of unnecessary markup. That problem aside, you probably didn't get exactly what you expected, especially when it came to tables. We recognized this as a crucial issue and have put a great deal of effort into ensuring that you get the results you expect.
The most important feature of Word2Html is that you don't need to have Microsoft Word or Office installed to read and convert documents. That's right, Word2html LT offers incredible accuracy, blistering speed, and low memory and disk requirements, all without needing the Microsoft Word objects.
Now you can painlessly convert your documents into HTML with just a few clicks. Even if you're stuck with an old PC, you can easily convert a dozen or more documents in less time than it takes to open a single document in Microsoft Word. Plus, the HTML file produced by Word2Html is very compact, normally 10-20 times smaller than the HTML document produced by Microsoft Word or by other components on the market.
Let's compare the HTML output between Microsoft Word, OpenOffice.org, and Word2html LT.
We created a test in which these three applications convert the same Microsoft Word document to HTML. All of the files used in this test can be downloaded here (24 KB). This package contains the Microsoft Word document and the output files from the three applications - Microsoft Word 2003 (file - msword.html), OpenOffice.org 1.0.1 (file - ooo.html) and Word2Html LT (file - word2html_lt.html).
We chose OpenOffice.org as a competitor in this test to give a fair, unbiased group of competitors. We feel that OpenOffice.org does an impressive job in its parsing of Microsoft Word documents.
The test document contained a large table with 63 columns and 40 rows, including cells merged both vertically and horizontally. To throw a little more detail into the test, we gave some of the merged cells colored backgrounds, and the central cell contains two numbered lists, the latter with some custom text before the Roman numerals.
The table below shows the differences between the HTML document conversions performed by the different applications.
|Original layout (MS Word)|OpenOffice.org 1.0.1|Word2html LT|
|---|---|---|
|It came as no big surprise to find that the HTML document produced by Microsoft Word completely retained the original document layout. The only downside to Microsoft Word's resulting HTML document was its size, at almost 600 KB. Because the vast majority of Word/HTML conversion applications on the market today use Microsoft Word for HTML extraction, this is the kind of result you can expect from such applications.|The HTML document created by OpenOffice.org was about 100 KB, but the result was disappointing. The cell merging was done incorrectly, as was the cell height. The custom numbered list did not contain the custom text, the number format was handled incorrectly, and the background colors were applied to the wrong cells.|While it is very hard to judge our own work, we were happy with the results. The size of the HTML document is about 28 KB, and while we do not use CSS, the rest of the document conversion was a true success. We feel confident that our libraries are the right choice.|
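
To put the file sizes from the comparison above side by side, the short Python sketch below works out how many times larger each output is than the Word2html LT output. The figures are the approximate sizes quoted in the table ("almost 600 KB", "about 100 KB", "about 28 KB"), not exact measurements, and the names `reported_sizes_kb` and `size_ratios` are made up for this illustration.

```python
# Approximate output sizes quoted in the comparison, in kilobytes.
# These are rounded figures from the text, not exact measurements.
reported_sizes_kb = {
    "Microsoft Word 2003": 600,
    "OpenOffice.org 1.0.1": 100,
    "Word2html LT": 28,
}


def size_ratios(sizes_kb, baseline):
    """Return how many times larger each output is than the baseline output."""
    base = sizes_kb[baseline]
    return {name: size / base for name, size in sizes_kb.items()}


ratios = size_ratios(reported_sizes_kb, "Word2html LT")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the size of the Word2html LT output")
```

With these rounded figures, the Microsoft Word output comes out at roughly 21 times the size of the Word2html LT output, which is close to the upper end of the 10-20 times range mentioned earlier, and the OpenOffice.org output at roughly 3.6 times.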
Copyright © 2003-2013, Wordcnv Software