


tryaffix

The tryaffix shell script is used to estimate the effectiveness of a proposed prefix (-p switch) or suffix (-s switch, the default) with a given expanded-file. Only one affix can be tried with each execution of tryaffix, although multiple arguments can be used to describe varying forms of the same affix flag (for example, the D flag for English can add either D or ED depending on whether a trailing E is already present). Each word in the expanded dictionary that ends (or begins) with the chosen suffix (or prefix) has that suffix (prefix) removed; the dictionary is then searched for root words that match the stripped word. Normally, all matching roots are written to standard output, but if the -c (count) flag is given, only a statistical summary of the results is written. The statistics given are a count of words the affix potentially applies to and an estimate of the number of dictionary bytes that a flag using the affix would save. The estimate will be high if the flag generates words that are currently generated by other affix flags (for example, in English, bathers can be generated by either bath/X or bather/S).

The dictionary file, expanded-file, must already be expanded (using the -e switch of ispell) and sorted, and things will usually work best if uppercase has been folded to lower with tr.
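The case folding and sorting mentioned above might be sketched as follows. This is only an illustration: prepare_expanded is a hypothetical helper name, and the sample words stand in for a list that has already been expanded with ispell.

```shell
# Sketch only: fold case and sort an already-expanded word list.
# prepare_expanded is a hypothetical helper name, not part of ispell.
prepare_expanded() {
    # tr folds uppercase to lowercase; sort puts the result in order.
    tr '[:upper:]' '[:lower:]' | sort
}

# Stand-in input; in practice this would be an expanded dictionary.
printf '%s\n' Walking bake Liking like | prepare_expanded
```

The output of the helper could then be saved to a file and used as the expanded-file argument.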

The affix arguments are things to be stripped from the dictionary file to produce trial roots: for English, con (prefix) and ing (suffix) are examples. The addition part of an argument (the text after the +) gives the letters that would have been stripped off the root before adding the affix. For example, in English the affix ing normally strips e from words ending in that letter (for example, like becomes liking), so we might run


tryaffix ing ing+e

to cover both cases.
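The idea behind the two arguments ing and ing+e can be imitated with standard tools. The sketch below is not the tryaffix script; try_suffix_ing is a hypothetical helper and the word list is made up. It strips a trailing ing from each word and reports any root that matches either the bare stem or the stem plus e.

```shell
# Rough emulation of the idea behind "ing" and "ing+e" using standard tools.
# try_suffix_ing is a hypothetical helper; it is not the tryaffix script.
try_suffix_ing() {
    wordlist=$1
    # Strip a trailing "ing" from every word that has one...
    sed -n 's/ing$//p' "$wordlist" | while read -r stem; do
        # ...then look for the bare stem (ing) or the stem plus "e" (ing+e).
        grep -x -F -e "$stem" -e "${stem}e" "$wordlist" || :
    done
}

demo=$(mktemp)
printf '%s\n' bake baking like liking sing walk walking > "$demo"
try_suffix_ing "$demo"    # prints bake, like, and walk, one per line
rm -f "$demo"
```

Note that sing produces no root here, because neither s nor se is in the list.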

All of the shell scripts contain documentation as commentary at the beginning; sometimes these comments contain useful information beyond the scope of this manual page.

It is possible to install ispell in such a way as to only support ASCII range text if desired.

icombine

The icombine program is a helper for munchlist. It reads a list of words in dictionary format (roots plus flags) from the standard input, and produces a reduced list on standard output that combines common roots found on adjacent entries. Identical roots that have differing flags will have their flags combined, and roots that have differing capitalizations will be combined in a way that only preserves important capitalization information. The optional aff-file specifies a language file that defines the character sets used and the meanings of the various flags. The -T switch can be used to select among alternative string character types by giving a dummy suffix that can be found in an altstringtype statement.
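The flag-combining step can be pictured with a small awk sketch. It is a simplification under stated assumptions: combine_adjacent is a hypothetical helper, entries are assumed to look like root/FLAGS with single-letter flags, and the capitalization and character-set handling of the real icombine is ignored.

```shell
# Simplified illustration of merging flags for identical adjacent roots.
# combine_adjacent is a hypothetical helper; the real icombine also handles
# capitalization and language-specific character sets, which this ignores.
combine_adjacent() {
    awk -F/ '
        function emit() {
            if (seen) print (flags == "" ? root : root "/" flags)
        }
        # A new root starts a new output entry.
        !seen || $1 != root { emit(); root = $1; flags = ""; seen = 1 }
        {
            # Append each flag letter that has not been recorded yet.
            n = length($2)
            for (i = 1; i <= n; i++) {
                c = substr($2, i, 1)
                if (index(flags, c) == 0) flags = flags c
            }
        }
        END { emit() }
    '
}

printf '%s\n' bath/X bath/S bather cat/S | combine_adjacent
# prints: bath/XS, bather, cat/S (one per line)
```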

ijoin

The ijoin program is a reimplementation of join(1), which handles long lines and 8-bit characters correctly. The -s switch specifies that the sort(1) program used to prepare the input to ijoin uses signed comparisons on 8-bit characters; the -u switch specifies that sort(1) uses unsigned comparisons. All other options and behaviors of join(1) are duplicated as exactly as possible based on the manual page, except that ijoin will not handle newline as a field separator. See the join(1) manual page for more information.
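To decide between the two switches, it may help to check how the local sort(1) orders a byte with the high bit set. The following is a sketch only: probe_sort_order is a hypothetical helper, and it assumes that sorting in the C locale reflects the comparison the sort program uses on raw bytes.

```shell
# Sketch: report whether sort(1) treats 8-bit characters as unsigned or signed.
# probe_sort_order is a hypothetical helper name.
probe_sort_order() {
    # \351 is a byte with the high bit set; "z" is plain ASCII.
    first=$(printf 'z\n\351\n' | LC_ALL=C sort | head -n 1)
    if [ "$first" = "z" ]; then
        echo unsigned    # the 8-bit byte sorted after "z"
    else
        echo signed      # the 8-bit byte sorted before "z"
    fi
}

probe_sort_order
```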

ENVIRONMENT

DICTIONARY Default dictionary to use if no -d flag is given
WORDLIST Personal dictionary filename
INCLUDE_STRING Code for file inclusion under the -A option
TMPDIR Directory used for some of munchlist's temporary files
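As a sketch of how these variables might be set for a single shell session (every value below is a hypothetical placeholder, not a default taken from ispell):

```shell
# Sketch: set the variables described above for one shell session.
# All values below are hypothetical placeholders.
scratch=$(mktemp -d)                 # scratch directory for munchlist
export TMPDIR="$scratch"
export DICTIONARY="english"          # used when no -d flag is given
export WORDLIST="$HOME/.my_words"    # personal dictionary filename
```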

FILES

!!LIBDIR!!/!!DEFHASH!! Hashed dictionary (may be found in some other local directory, depending on the system)
!!LIBDIR!!/!!DEFLANG!! Affix-definition file for munchlist
/usr/dict/web2 or /usr/dict/words For the Lookup function (depending on the WORDS compilation option) User's private dictionary
.ispell_hashfile Directory-specific private dictionary


SEE ALSO

spell(1), egrep(1), look(1), join(1), sort(1), sq(1L), tib(1L), ispell(4L), english(4L)

BUGS

It takes several to many seconds for ispell to read in the hash table, depending on size.

When all options are enabled, ispell may take several seconds to generate all the guesses at corrections for a misspelled word; on slower machines this time is long enough to be annoying.

The hash table is stored as a quarter-megabyte (or larger) array, so a PDP-11 or 286 version does not seem likely.

Ispell should understand more troff syntax, and deal more intelligently with contractions.

Although small personal dictionaries are sorted before they are written out, the order of capitalizations of the same word is somewhat random.

When the -x flag is specified, ispell will unlink any existing .bak file.

There are too many flags, and many of them have non-mnemonic names.

munchlist does not deal very gracefully with dictionaries that contain nonword characters. Such characters ought to be deleted from the dictionary with a warning message.

findaffix and munchlist require tremendous amounts of temporary file space for large dictionaries. They do respect the TMPDIR environment variable, so this space can be redirected. However, a lot of the temporary space needed is for sorting, so TMPDIR is only a partial help on systems with an uncooperative sort(1). (Cooperative is defined as accepting the undocumented -T switch.) At its peak usage, munchlist takes 10 to 40 times the original dictionary's size in kilobytes. (The larger ratio is for dictionaries that already have heavy affix use, such as the one distributed with ispell.)

munchlist is also very slow; munching a normal-sized dictionary (15KB roots, 45KB expanded words) takes around an hour on a small workstation. (Most of this time is spent in sort(1), and munchlist can run much faster on machines that have a more modern sort that makes better use of the memory available to it.) findaffix is even worse; the smallest English dictionary cannot be processed with this script in a mere 50KB of free space, and even after specifying switches to reduce the temporary space required, the script will run for more than 24 hours on a small workstation.
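The 10 to 40 times figure quoted for munchlist translates into a quick estimate of the scratch space to set aside. The helper below is a hypothetical illustration of that arithmetic only; the 500 KB dictionary size is a made-up example.

```shell
# Rough estimate of munchlist's peak temporary space, using the 10x to 40x
# range quoted above. estimate_munch_space is a hypothetical helper name.
estimate_munch_space() {
    dict_kb=$1
    echo "$((dict_kb * 10)) to $((dict_kb * 40)) KB"
}

estimate_munch_space 500   # prints: 5000 to 20000 KB
```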

AUTHORS

Pace Willisson (pace@mit-vax), 1983, based on the PDP-10 assembly version. That version was written by R. E. Gorin in 1971, and later revised by W. E. Matson (1974) and W. B. Ackerman (1978). Collected, revised, and enhanced for the Usenet by Walt Buehring, 1987. Table-driven multilingual version by Geoff Kuenning, 1987-88. Large dictionaries provided by Bob Devine (vianet!devine). A complete list of contributors is too large to list here, but is distributed with the ispell sources in the file Contributors.

VERSION

The version of ispell described by this manual page is International Ispell version 3.1.00, October 8, 1993.

join

join - Join lines of two files on a common field

SYNOPSIS


join [-a 1|2] [-v 1|2] [-e empty-string] [-o field-list...] [-t char]
[-j[1|2] field] [-1 field] [-2 field] file1 file2

join {--help,--version}
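As a small illustration of the synopsis, the sketch below joins two sorted files on their first field, which is the default join field. The helper name join_demo and the file contents are made up for the example.

```shell
# Sketch: join two small sorted files on their first field.
# File names and contents are made up for illustration.
join_demo() {
    dir=$(mktemp -d)
    printf '%s\n' 'a 1' 'b 2' 'c 3' > "$dir/file1"
    printf '%s\n' 'a x' 'c y' 'd z' > "$dir/file2"
    # Extra arguments (such as -a 1) are passed through to join.
    join "$@" "$dir/file1" "$dir/file2"
    rm -rf "$dir"
}

join_demo          # prints "a 1 x" and "c 3 y"
join_demo -a 1     # also prints the unpairable line "b 2" from file1
```

Both inputs must be sorted on the join field for the pairing to work as shown.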

DESCRIPTION

This manual page documents the GNU version of join. join prints to the standard output a line for each pair of input lines, one each from file1 and file2, that have identical join fields. Either filename (but not both) can be -, meaning the standard
