-->


would replace the functions of the left and right curly braces with the left and right angle brackets for purposes of parsing TeX/LaTeX constructs, while retaining their functions for the tib bibliographic preprocessor. Note that the backslash, the left square bracket, and the right angle bracket must be escaped with a backslash.


opt-stmt : { cmpnd-stmt | aff-stmt }

cmpnd-stmt : compoundwords compound-opt

 aff-stmt : allaffixes on-or-off

 on-or-off : { on | off }

compound-opt : { on-or-off | controlled character }

An opt-stmt, used in the preceding code, controls certain ispell defaults that are best made language-specific. The allaffixes statement controls the default for the -P and -m options to ispell. If allaffixes is turned off (the default), ispell will default to the behavior of the -P flag: root/affix suggestions will only be made if there are no "near misses." If allaffixes is turned on, ispell will default to the behavior of the -m flag: root/affix suggestions will always be made.

The compoundwords statement controls the default for the -B and -C options to ispell. If compoundwords is turned off (the default), ispell will default to the behavior of the -B flag: run-together words will be reported as errors. If compoundwords is turned on, ispell will default to the behavior of the -C flag: run-together words will be considered as compounds if both components are in the dictionary. This is useful for languages such as German and Norwegian, which form large numbers of compound words. Finally, if compoundwords is set to controlled, only words marked with the flag indicated by character (which should not be otherwise used) will be allowed to participate in compound formation. Because this option requires the flags to be specified in the dictionary, it is not available from the command line.


flag-stmt : flagmarker character

The flagmarker statement describes the character that is used to separate affix flags from the root word in a raw dictionary file. This must be a character that is not found in any word (including in string characters; see following). The default is / because this character is not normally used to represent special characters in any language.
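As a rough illustration of what the flag marker does, the following Python sketch splits one raw dictionary line at the first occurrence of the marker. The function name split_entry and the sample entries are hypothetical, and the text after the marker is returned uninterpreted because its internal layout is not described here.

```python
def split_entry(line, flagmarker="/"):
    """Split a raw dictionary line into (root, flag_text).

    Illustrative model only: the line is cut at the first flag marker,
    and whatever follows is returned as an opaque string.
    """
    root, _, flag_text = line.strip().partition(flagmarker)
    return root, flag_text


# Hypothetical entries; the flag letters are invented for the example.
print(split_entry("create/DS"))
print(split_entry("plain"))
print(split_entry("and/or|X", flagmarker="|"))
```

The last call shows why the marker must not occur in any word: with the default / the entry and/or could not be told apart from a root with flags.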


num-stmt : compoundmin digit

The compoundmin statement sets the minimum length of the two components of a compound word. This only has an effect if compoundwords is turned on or if the -C flag is given to ispell. In that case, only words at least as long as the given minimum will be accepted as components of a compound. The default is 3 characters.
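The interaction between compoundwords and compoundmin can be pictured with a small Python model. This is a simplified sketch, not ispell's implementation: the function name accept_as_compound and the word list are hypothetical, and case handling and the controlled mode are ignored.

```python
def accept_as_compound(word, dictionary, compoundmin=3):
    """Return True if word splits into two dictionary words,
    each at least compoundmin characters long."""
    # Only try cut points that leave both parts long enough.
    for cut in range(compoundmin, len(word) - compoundmin + 1):
        head, tail = word[:cut], word[cut:]
        if head in dictionary and tail in dictionary:
            return True
    return False


# Hypothetical dictionary for the example.
words = {"haus", "boot", "ab", "fall"}
print(accept_as_compound("hausboot", words))
print(accept_as_compound("abfall", words))
print(accept_as_compound("abfall", words, compoundmin=2))
```

With the default minimum of 3, the two-letter component ab is too short, so abfall is only accepted once the minimum is lowered to 2.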


char-sets : norm-sets [ alt-sets ]

The character-set section describes the characters that can be part of a word, and defines their collating order. There must always be a definition of "normal" character sets; in addition, there may be one or more partial definitions of "alternate" sets that are used with various text formatters.


norm-sets : [ deftype ] charset-group

A "normal" character set may optionally begin with a definition of the file suffixes that make use of this set. Following this are one or more character-set declarations.


deftype : defstringtype name deformatter suffix*

The defstringtype declaration gives a list of file suffixes that should make use of the default string characters defined as part of the base character set; it is only necessary if string characters are being defined. The name parameter is a string giving the unique name associated with these suffixes; often it is a formatter name. If the formatter is a member of the troff family, nroff should be used for the name associated with the most popular macro package; members of the TeX family should use tex. Other names may be chosen freely, but they should be kept simple, as they are used in ispell's -T switch to specify a formatter type. The deformatter parameter specifies the deformatting style to use when processing files with the given suffixes. Currently, this must be either tex or nroff. The suffix parameters are a whitespace-separated list of strings which, if present at the end of a filename, indicate that the associated set of string characters should be used by default for this file. For example, the suffix list for the troff family typically includes suffixes such as .ms, .me, .mm, and so on.
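The suffix lookup described above can be sketched in Python as follows. This models only the "suffix at the end of the filename" rule; the function name string_type_for, the DECLARATIONS table, and the .tex and .bib suffixes are assumptions made for the example, and what ispell does when no suffix matches is not covered here.

```python
# Each entry: (name, deformatter, list of suffixes). Hypothetical data.
DECLARATIONS = [
    ("nroff", "nroff", [".ms", ".me", ".mm"]),
    ("tex", "tex", [".tex", ".bib"]),
]


def string_type_for(filename, declarations, default=None):
    """Return (name, deformatter) for the first declaration whose
    suffix list contains a suffix that ends the filename."""
    for name, deformatter, suffixes in declarations:
        if any(filename.endswith(suffix) for suffix in suffixes):
            return name, deformatter
    return default


print(string_type_for("report.ms", DECLARATIONS))
print(string_type_for("paper.tex", DECLARATIONS))
print(string_type_for("notes.txt", DECLARATIONS))
```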


charset-group : { char-stmt | string-stmt | dup-stmt}*

A char-stmt describes single characters; a string-stmt describes characters that must appear together as a string, and which usually represent a single character in the target language. Either may also describe conversion between uppercase and lowercase. A dup-stmt is used to describe alternate forms of string characters, so that a single dictionary may be used with several formatting programs that use different conventions for representing non-ASCII characters.


char-stmt : wordchars character-range
          | wordchars lowercase-range uppercase-range
          | boundarychars character-range
          | boundarychars lowercase-range uppercase-range

string-stmt : stringchar string
            | stringchar lowercase-string uppercase-string

Characters described with the boundarychars statement are considered part of a word only if they appear singly, embedded between characters declared with the wordchars or stringchar statements. For example, if the hyphen is a boundary character (useful in French), the string foo-bar would be a single word, but -foo would be the same as foo, and foo--bar would be two words separated by nonword characters.
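The boundary-character rule can be modeled with a regular expression in Python: a word is a run of word characters, optionally followed by repetitions of exactly one boundary character and another run of word characters. This is an illustrative sketch, not ispell's tokenizer; the function name split_words and the ASCII-only character range are assumptions, and string characters are left out.

```python
import re


def split_words(text, wordchars="A-Za-z", boundarychars="-"):
    """Split text into words, keeping a boundary character only when it
    stands alone between two runs of word characters."""
    word = "[" + wordchars + "]+"
    boundary = "[" + re.escape(boundarychars) + "]"
    # One run of word characters, then any number of
    # (single boundary character + run of word characters).
    pattern = word + "(?:" + boundary + word + ")*"
    return re.findall(pattern, text)


print(split_words("foo-bar"))   # hyphen embedded singly
print(split_words("-foo"))      # leading hyphen is not part of the word
print(split_words("foo--bar"))  # two hyphens in a row break the word
```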

If two ranges or strings are given in a char-stmt or string-stmt, the first describes characters that are interpreted as lowercase and the second describes uppercase. In the case of a stringchar statement, the two strings must be of the same length. Also, in a stringchar statement, the actual strings may contain both uppercase and lowercase characters themselves without difficulty; for instance, the statement:


stringchar "\\*(sS" "\\*(Ss"

is legal and will not interfere with (or be interfered with by) other declarations of "s" and "S" as lowercase and uppercase, respectively.

A final note on string characters: some languages collate certain special characters as if they were strings. For example, the German "a-umlaut" is traditionally sorted as if it were ae. ispell is not capable of this; each character must be treated as an individual entity. So in certain cases, ispell will sort a list of words into a different order than the standard "dictionary" order for the target language.


alt-sets : alttype [ alt-stmt*]

Because different formatters use different notations to represent non-ASCII characters, ispell must be aware of the representations used by these formatters. These are declared as alternate sets of string characters.


alttype : altstringtype name suffix*

The altstringtype statement introduces each set by declaring the associated formatter name and filename suffix list. This name and list are interpreted exactly as in the defstringtype statement. Following this header are one or more alt-stmts that declare the alternate string characters used by this formatter.


alt-stmt : altstringchar alt-string std-string

The altstringchar statement describes alternate representations for string characters. For example, the -mm macro package of troff represents the German "a-umlaut" as a\*:, while TeX uses the sequence \"a. If the troff versions are declared as the standard versions using stringchar, the TeX versions may be declared as alternates by using the statement:


altstringchar \\\"a a\\*:

When the altstringchar statement is used to specify alternate forms, all forms for a particular formatter must be declared together as a group. Also, each formatter or macro package must provide a complete set of characters, both uppercase and lowercase, and the character sequences used for each formatter must be completely distinct. Character sequences that describe uppercase and lowercase versions of the same printable character must also be the same length. It may be necessary to define some new macros for a given formatter to satisfy these restrictions. (The current version of buildhash does not enforce these restrictions, but failure to obey them may result in errors being introduced into files that are processed with ispell.)
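One way to picture the relationship between alternate and standard string characters is as a lookup table from each alternate sequence to its standard form. The Python sketch below rewrites TeX-style sequences into the troff -mm form for the a-umlaut example given earlier. It is a model for illustration only, not a description of how ispell processes files; the names TEX_TO_MM and to_standard are hypothetical, and the uppercase entry is assumed by analogy with the lowercase one.

```python
import re

# Alternate (TeX) sequence -> standard (troff -mm) sequence.
TEX_TO_MM = {
    r'\"a': r'a\*:',
    r'\"A': r'A\*:',  # assumed by analogy with the lowercase form
}


def to_standard(text, alt_to_std):
    """Replace every alternate sequence in text with its standard form,
    in a single pass so that replaced text is never rescanned."""
    if not alt_to_std:
        return text
    # Longest sequences first, so a short one cannot shadow a longer one.
    keys = sorted(alt_to_std, key=len, reverse=True)
    pattern = "|".join(re.escape(key) for key in keys)
    return re.sub(pattern, lambda m: alt_to_std[m.group(0)], text)


print(to_standard(r'M\"adchen', TEX_TO_MM))
```

A table like this only works cleanly when the restrictions above hold: each formatter supplies both cases for every character, and its sequences do not collide with one another.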

An important minor point is that ispell assumes that all characters declared as wordchars or boundarychars will occupy exactly one position on the terminal screen.
