Comprehensive TeX Archive Network

Directory tex-archive/support/word2x

$Id: README,v 1.7 1997/04/13 02:59:59 dps Exp $

What is new in version 0.005 of word2x

Update version to 0.005
Fix version number bug
Update config.guess and config.sub
Re-generate configure.in with newer autoconf
Fix various ANSI violations that g++ 2.95 disliked

What is new in version 0.004 of word2x

Squashed a stupid bug in word2x_junk_filter::filter_junk which ignored
the last character read.

Added German support from word2x port EX2.

What was new in version 0.003 of word2x

word2x-0.003 is word2x-0.002 with a major bug in strip.cc
eliminated. word2x-0.002 was 0.001 retro-fitted with some quite new
junk filtering code with lots of tunable parameters (i.e. all of
tune.h). This code is extracted from the evolving, and currently
incomplete, source tree of the next major release. (When that release
happens I will stop supporting or maintaining any 0.00x versions.)

The major change is much better junk filtering, which loses less text
and throws out more junk; Unicode documents should now work. Increasing
numbers of problem documents which have OLE junk in places that break
the code are appearing. Splitting the document with lls (from the LAOLA
package) and attacking the WordDocument stream works; at some point I
will have a usable library that can do this automagically. (In
word2x-0.002 you have a very good chance of tickling the strip.cc bug
and its (buggy) bug trap.)

Please send documents that still cause problems after the suggested
work-around to word2x@duncan.telstar.net. The immediate fix is to try
one of the other two programs. (Free software people are prepared to
co-operate with the "competition".) There are links to all the
"competition" I know of on the word2x home page at
http://word2x.alcom.co.uk (hosted by alcom.co.uk free of charge,
despite the fact that charges normally apply).

Installing word2x

You need a C++ compiler and a version of make that understands how to
make .o files from .cc files, for example GNU make. Ideally you have
getopt_long already in your C library, but you might not. If you do
not, set GETOPT to gopt.o in the Makefile. getopt_long is the version
supplied by the Free Software Foundation in glibc-1.09.

If your make does not know how to make .o files from .cc files, add a
rule. For GNU make the rule is

%.o: %.cc
	$(CPP) $(CPPFLAGS) -c -o $@ $<

Please note that a warning about a contravariance violation is normal.
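
To illustrate what getopt_long provides (and as a quick way to check
that your C library has it), here is a minimal sketch. The Options
struct, the parse_options function and the --format/--width option
names are assumptions made up for this example; they are not taken
from the word2x sources.

```cpp
#include <cassert>
#include <cstdlib>
#include <getopt.h>
#include <string>

// Hypothetical option set, for illustration only.
struct Options {
    std::string format = "text";  // assumed default output format
    int width = 80;               // assumed default line width
    bool ok = true;               // false if an unknown option was seen
};

// Parse long and short options with getopt_long.
Options parse_options(int argc, char *const argv[]) {
    assert(argc > 0 && argv != nullptr);
    static const struct option long_opts[] = {
        {"format", required_argument, nullptr, 'f'},
        {"width",  required_argument, nullptr, 'w'},
        {nullptr, 0, nullptr, 0}  // terminator required by getopt_long
    };
    Options opts;
    optind = 1;  // start scanning at the first real argument
    int c;
    while ((c = getopt_long(argc, argv, "f:w:", long_opts, nullptr)) != -1) {
        switch (c) {
        case 'f': opts.format = optarg; break;
        case 'w': opts.width = std::atoi(optarg); break;
        default:  opts.ok = false; break;  // getopt_long already complained
        }
    }
    return opts;
}
```

If this fails to compile or link because getopt_long is missing, that
is the case in which GETOPT should be set to gopt.o as described above.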

As this program has only recently escaped, YMMV. The main reason for
its escape was the incessant irritation that comp.os.linux.misc posters
express about word .doc files (IMHO this is justified). I wrote
this program for myself and my word problem; I let it run wild in the
hope that it is useful for others. [I now know it is helping some

Further information on other converters is available in the list of
converters available via <http://www.kfa-juelich.de/isr/1/texconv.html>
(word2x seems to have a monopoly on converters from word to latex not
requiring word and available on non-MS platforms).

The program has been compiled on (the first two by me personally):

Linux 2.1.30 (Unix)
SunOS (Unix)
DEC Alpha AXP under OSF/1 (Unix) 
IBM SP/2 (RS6000) under AIX (Unix) [SP/2s are heavy computing power...]

It is known not to compile with

Borland C++ 3.1 (PC version).

If anyone manages to compile it on a PC, please tell
Duncan Simpson <dps@duncan.telstar.net> and
W.Hennings <W.Hennings@kfa-juelich.de>

Limited flat (linear) memory might be lethal, esp. if your system
lacks alloca. If you have not learned to steal what is free then you
can send money (preferably UK funds), postcards, etc. to the author at

Frax House, Kingston Bagpuize, OXON OX13 5AW

or for the next couple of years

Flat 6, 93 Westridge Road, Southampton

I neither suggest that you do donate nor that you do not donate.


SunOS can be problematic. Setting LD to ./sunos_link and defining add
produced a binary that worked for me with one warning about
strncasecmp. I guess SUN's ld is incompatible with g++ or something;
using ar and ranlib, aka the sunos_link shell script, works. The
configuration script hopefully does this stuff for you.

Reported bugs

On some platforms it misses the first 3/4 of a page. If you are
afflicted get out your copy of hexdump and adjust the start offset in
word2x to the correct value. This should be fixed now.


This program is (c) D.P.Simpson 1997. The program is licensed under the
GPL version 2, or any later version (at your option). This means DOS
people must distribute source as per the GPL.

The stuff I did not write is:

config.guess and config.sub come from GNU autoconf and are thus
(c) The Free Software Foundation.

getopt.c, getopt1.c and getopt.h are (c) The Free Software Foundation.
I am fairly sure the LGPL requires these files to be distributed as

alloca.c is almost certainly also (c) The Free Software Foundation.

install-sh is probably (c) The X Consortium.

Introductory propaganda

Despite the fact that open formats like rtf are good and widely
available, far too many idiots seem to insist on using word .doc
format. This program is an attempt to limit the damage this causes
users of non-Microsoft systems and text processing systems, for
example LaTeX.

It is designed to be retargetable and to avoid some of the travesties
of proper typesetting committed by word, which is hobbled by the lack
of ligatures in TrueType fonts (and the lack of different design sizes
to some extent). There is quite a large amount of guesswork from
context to reduce the impact of my not understanding a document the way
word does. One even sees interesting things like
<Paragraph mode> 550* <eqn> \F(foo, bar) <end eqn> * 42 * (pixels per em)
<End paragraph>
which is not too good! There may be multiple bits of alternating roman
and equation, multiple items of text in brackets, etc. Fortunately the
reader converts these, in two stages, to a single maths insert. Maths
inserts with embedded newlines get rendered as eqnarray* in LaTeX
mode. All maths is just deleted in text mode (would someone like to add
this support?).

LaTeX mode sees the equation example above as <eqn insert> 550
* \F(foo, bar * baz) * 42 * (pixels per em) <end eqn insert> and
renders it as
% Some comments omitted for brevity
$$550 \times {\text{foo} \over \text{bar} \times \text{baz} } \times 42
\times \text{(pixels per em)}$$
which looks a lot better than word's own version, which uses awful
stars instead of proper times signs.
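
The kind of rewriting described above can be sketched roughly as
follows. This is an illustration only, not the code in the word2x
reader or LaTeX formatter: the function names are made up, and the
\text{...} wrapping shown in the real output is left out to keep the
sketch short. It turns each * into \times and each \F(a, b) field into
{a \over b}.

```cpp
#include <cassert>
#include <string>

// Strip leading and trailing blanks.
static std::string trim(const std::string &s) {
    const std::string ws = " \t\n";
    const std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Replace every '*' with the LaTeX times sign.
std::string stars_to_times(const std::string &s) {
    std::string out;
    for (char c : s) {
        if (c == '*') out += "\\times";
        else out += c;
    }
    return out;
}

// Rewrite \F(num, den) as {num \over den}; malformed fields are copied as-is.
std::string fractions_to_over(const std::string &s) {
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        if (s.compare(i, 3, "\\F(") != 0) { out += s[i++]; continue; }
        std::size_t j = i + 3;
        std::size_t comma = std::string::npos;
        int depth = 1;  // bracket nesting depth inside the field
        for (; j < s.size(); ++j) {
            if (s[j] == '(') {
                ++depth;
            } else if (s[j] == ')') {
                if (--depth == 0) break;  // found the matching bracket
            } else if (s[j] == ',' && depth == 1 && comma == std::string::npos) {
                comma = j;  // first top-level comma separates num and den
            }
        }
        if (j == s.size() || comma == std::string::npos) { out += s[i++]; continue; }
        const std::string num = trim(s.substr(i + 3, comma - (i + 3)));
        const std::string den = trim(s.substr(comma + 1, j - comma - 1));
        out += "{" + fractions_to_over(num) + " \\over " + fractions_to_over(den) + "}";
        i = j + 1;
    }
    return out;
}

// Wrap the converted maths insert in display-maths delimiters.
std::string to_latex(const std::string &insert) {
    return "$$" + fractions_to_over(stars_to_times(insert)) + "$$";
}
```

For example, to_latex applied to 550 * \F(foo, bar * baz) * 42 gives
$$550 \times {foo \over bar \times baz} \times 42$$.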

Text mode implements tables with real columns, unlike catdoc. Long
entries are folded automatically and there is some semi-intelligent
width reduction. Hyphenation is not supported, so if someone insists
on using supercalifragilistic... then an overlong line might result
(anyone care to fix this? I thought it was just overkill to implement
the hyphenation algorithm along with all the rest).
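
Folding a long table entry without hyphenation can be sketched as a
greedy word wrap. The fold_cell function below is a made-up
illustration, not the code in text-table.cc or wordwrap.cc; it shows
why a single word wider than the column ends up on an overlong line of
its own.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Fold one table entry into lines of at most `width` characters.
// Words are never split, so a word longer than `width` produces an
// overlong line (no hyphenation is attempted).
std::vector<std::string> fold_cell(const std::string &text, std::size_t width) {
    assert(width > 0);
    std::vector<std::string> lines;
    std::istringstream words(text);
    std::string word, line;
    while (words >> word) {
        if (line.empty()) {
            line = word;                      // first word on a fresh line
        } else if (line.size() + 1 + word.size() <= width) {
            line += ' ';                      // word still fits on this line
            line += word;
        } else {
            lines.push_back(line);            // line is full, start a new one
            line = word;
        }
    }
    if (!line.empty()) lines.push_back(line);
    return lines;
}
```

With a width of 10, the entry "a long table entry" folds into the
three lines "a long", "table" and "entry".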

Apart from the pictures and a little trailing junk the code does a
good job on the TrueType documents. The readme generates some error
messages about extra ^Us among other things due to a lack of
understanding of some of the inserts used in some documents. Anyone
who can decode more types of insert, please tell me about it and
preferably send a patch so I can avoid extra programming (got too much
real work to be doing).

If someone wishes to contribute *roff output I would include it. Extra
understanding of equations would also be gratefully received, as the
examples in the TrueType docs are rather limited. Information on
bibliographies and anything else you can tell me about will also be
gratefully listened to.

Duncan (-:


Name                Size  Date              Notes
INTERNALS           7312  1998-10-07 20:12
Makefile.in         3052  1998-12-28 22:17
Makefile.linux      1418  1998-10-07 20:12
README              7516  1999-08-06 02:20
alloca.c           13293  1998-10-07 20:12
col-align.cc        2072  1998-10-07 20:12
compat.c            1097  1998-10-07 20:12
config.guess       27344  1999-08-06 02:07
config.h.bot         122  1998-10-07 20:12
config.h.in         2217  1998-10-07 20:12
config.h.top          44  1998-10-07 20:12
config.sub         19849  1999-08-06 02:07
configure          67689  1999-08-06 02:12
configure.in        2130  1999-08-06 02:12
deHTMLdate.cc        915  1998-11-30 12:41
deL1date.cc          793  1998-11-30 12:41
dedate.cc            854  1998-11-30 12:41
fake_link             95  1998-10-07 20:12
fifo.h              4168  1999-08-06 02:09
fmt-html.h           995  1998-10-07 20:12
fmt-latex.h          920  1998-10-07 20:12
getopt.c           21957  1998-10-07 20:12
getopt.h            4539  1998-10-07 20:12
getopt1.c           4448  1998-10-07 20:12
html-embed.cc       7381  1998-10-07 20:12
html-fmt.cc        10893  1998-10-07 20:12
html-table.cc       5906  1999-08-06 02:12
html-table.h         520  1998-10-07 20:12
install-sh          4772  1998-10-07 20:12
interface.h         1391  1998-10-07 20:12
latex-embed.cc     10641  1998-10-07 20:12
latex-fmt.cc       11986  1998-10-07 20:12
latex-table.cc      6111  1998-10-07 20:12
latex-table.h        527  1998-10-07 20:12
lib.h               2564  1998-11-30 12:44
liboutfmt.a       154364  1999-08-06 03:06
map_chars.cc         864  1998-10-07 20:12
nullproc.cc          320  1998-10-07 20:12
num_unit_probe.c    1305  1998-10-07 20:12
part_num_probe.c     782  1998-10-07 20:12
reader.cc          25003  1998-10-07 20:12
reader.h            5590  1998-10-07 20:12
rtest2            131357  1999-08-06 03:06
rtest2.cc           1799  1998-10-07 20:12
scan_num.cc         1022  1998-10-07 20:12
shrink_width.cc      748  1998-10-07 20:12
strip.cc           12895  1999-05-09 16:50
strip.cc.orig      12817  1998-10-07 20:12
strip.h             2979  1999-08-06 02:09
tblock.cc           4433  1998-10-07 20:12
tblock.h             791  1998-10-07 20:12
tblock.h.orig        813  1998-10-07 20:12
text-fmt.cc         6502  1998-10-07 20:12
text-table.cc       6863  1998-10-07 20:12
text-table.h         453  1998-10-07 20:12
transcript          4014  1999-08-06 03:06
tune.h               808  1998-10-07 20:12
ukdate.cc           1122  1998-10-07 20:12
usdate.cc           1126  1998-10-07 20:12
use_getopt.h         210  1998-10-07 20:12
word2x            323122  1999-08-06 03:06
word2x.1            3051  1998-10-07 20:12
word2x.cc           7252  1999-08-06 02:14
word6.h              796  1998-10-07 20:12
wordwrap.cc         1340  1998-10-07 20:12

Download the contents of this package in one zip archive (291.6k).

word2x – Word 6 format converter

A Word 6 to anything converter, currently supporting output formatted as plain text or as LaTeX.

Package: word2x
License: GNU General Public License
Maintainer: Duncan Simpson
Topics: import files in a non-TeX (or different TeX) format
See also: catdoc