
INTRODUCTION

DOS uses a different character code mapping from UNIX. Seven-bit characters still have the same meaning; only characters with the eighth bit set are affected. To make matters worse, several translation tables are available, depending on the country. The appearance of the characters is defined using code pages, and these code pages aren't the same for all countries. For instance, some code pages don't contain uppercase accented characters. On the other hand, some code pages contain characters that don't exist in UNIX, such as certain line-drawing characters or the accented consonants used in some Eastern European countries. This affects two things relating to filenames:

Uppercase characters: In short names, only uppercase characters are allowed. This also holds for accented characters. For instance, in a code page that doesn't contain accented uppercase characters, the accented lowercase characters are converted into their unaccented counterparts.
Long filenames: Microsoft has finally come to their senses and uses a more standard mapping for long filenames. They use Unicode, stored as 16-bit characters in the long-name directory entries. The first 256 Unicode characters are identical to the 8-bit character set used by UNIX (ISO 8859-1, also called Latin-1). Thus, the code page also affects the correspondence between the codes used in long names and those used in short names.
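The claim that the first 256 Unicode characters coincide with the UNIX 8-bit set can be checked directly. The minimal Python sketch below assumes that VFAT long names are stored as little-endian UTF-16 code units; the helper name `vfat_units` is hypothetical and is not part of mtools.

```python
def vfat_units(name: str) -> list[int]:
    """Return the 16-bit code units a VFAT long-name entry would store.

    Long names are stored little-endian, so we decode pairs of bytes.
    """
    raw = name.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# For the first 256 code points, each 16-bit unit equals the Latin-1 byte.
for code in range(256):
    ch = chr(code)
    assert vfat_units(ch) == [ch.encode("latin-1")[0]]

print(vfat_units("café"))  # [99, 97, 102, 233]
```

Because the values agree below 256, translating between long names and UNIX names is trivial for those characters; only the short names depend on the code page.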

mtools treats filenames entered on the command line as using the UNIX mapping and translates their characters to produce short names. By default, code page 850 is used with the Swiss uppercase/lowercase mapping. I chose this code page because its set of characters most closely matches UNIX's. Moreover, this code page covers most characters in use in the USA, Australia, and Western Europe. However, it is still possible to choose a different mapping. There are two methods: the COUNTRY variable and explicit tables.
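As a rough illustration of the short-name translation described above, the following Python sketch decodes a UNIX (Latin-1) name, uppercases it, and re-encodes it in code page 850 using Python's built-in `cp850` codec. The helper `short_name_bytes` is hypothetical. It uses Python's Unicode uppercasing rules rather than mtools' Swiss table, and it ignores 8.3 truncation and characters that are illegal in DOS names.

```python
def short_name_bytes(unix_name: bytes) -> bytes:
    """Uppercase a Latin-1 UNIX name and re-encode it in code page 850.

    Hypothetical helper: real short-name generation also truncates to
    8.3 form and rejects characters that are illegal in DOS names.
    """
    out = bytearray()
    for ch in unix_name.decode("latin-1").upper():
        try:
            out += ch.encode("cp850")
        except UnicodeEncodeError:
            out += b"_"  # no equivalent in code page 850
    return bytes(out)


print(short_name_bytes(b"caf\xe9"))  # b'CAF\x90'
```

Here the Latin-1 é (0xe9) becomes the code page 850 É (0x90), showing that the same accented letter has different codes on each side.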

CONFIGURATION USING COUNTRY

The COUNTRY variable is recommended for people who also have access to MS-DOS system files and documentation. If you don't have access to these, I'd suggest using explicit tables instead.

Syntax: COUNTRY="country[,[codepage], country.sys]"

This tells mtools to use a UNIX-to-DOS translation table that matches codepage and a lowercase-to-uppercase table for country, and to use the country.sys file to get the lowercase-to-uppercase table. The country code is most often the telephone prefix of the country. Refer to the DOS help page on COUNTRY for more details. The codepage and country.sys parameters are optional. Don't type in the square brackets; they only indicate which parameters are optional. The country.sys file is supplied with MS-DOS. In most cases you don't need it, because the most common translation tables are compiled into mtools, so don't worry if you run a UNIX-only box that lacks this file.

If codepage is not given, a per-country default code page is used. If the country.sys parameter isn't given, compiled-in defaults are used for the lowercase-to-uppercase table. This is useful on Unices other than Linux, which may have no country.sys file available.
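To make the optional fields and their fallbacks concrete, here is a minimal Python sketch that splits a COUNTRY value into its three parts. It is not mtools' parser. The sketch assumes the quotes have already been stripped and that an empty or missing field means "use the default" (a per-country code page, or the compiled-in case tables). The names `CountrySetting` and `parse_country` and the path `/dos/country.sys` are hypothetical.

```python
from typing import NamedTuple, Optional


class CountrySetting(NamedTuple):
    country: int                 # usually the telephone prefix, e.g. 41
    codepage: Optional[int]      # None -> per-country default code page
    country_sys: Optional[str]   # None -> compiled-in case tables


def parse_country(value: str) -> CountrySetting:
    """Split a COUNTRY value such as "41,850" into its three fields.

    Hypothetical helper for illustration; not mtools' own parser.
    """
    fields = [f.strip() for f in value.split(",")]
    if not 1 <= len(fields) <= 3 or not fields[0]:
        raise ValueError(f"malformed COUNTRY value: {value!r}")
    country = int(fields[0])
    codepage = int(fields[1]) if len(fields) > 1 and fields[1] else None
    country_sys = fields[2] if len(fields) > 2 and fields[2] else None
    return CountrySetting(country, codepage, country_sys)


print(parse_country("41,, /dos/country.sys"))
```

With "41,, /dos/country.sys", the code page field is empty, so the per-country default would apply while the case table comes from the named file.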

The UNIX-to-DOS tables are not contained in the country.sys file, so mtools always uses compiled-in defaults for those. Thus, only a limited number of code pages are supported. If your preferred code page is missing, or if you know the name of the Windows 95 file that contains this mapping, drop me a line at Alain.Knaff@inrialpes.fr.

The COUNTRY variable can also be set using the environment.

CONFIGURATION USING EXPLICIT TRANSLATION TABLES

Translation tables may be given as lines in the configuration file. Two tables are needed: first the DOS-to-UNIX table and then the lowercase-to-uppercase table. A DOS-to-UNIX table starts with the tounix keyword, followed by a colon and 128 hexadecimal numbers. A lowercase-to-uppercase table starts with the fucase keyword, followed by a colon and 128 hexadecimal numbers.
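The layout described above (keyword, colon, then 128 hexadecimal numbers) can be read with a short Python sketch. It assumes that the numbers are separated by whitespace and may span several lines. The function `parse_table` is hypothetical and only illustrates the format; it is not how mtools reads its configuration.

```python
import re


def parse_table(text: str, keyword: str) -> list[int]:
    """Return the 128 entries that follow `keyword:` in `text`.

    Entry i describes character code 128 + i.
    """
    match = re.search(rf"\b{re.escape(keyword)}\s*:", text)
    if match is None:
        raise ValueError(f"no {keyword}: table found")
    values = []
    for token in text[match.end():].split():
        if len(values) == 128:
            break  # the next table (or other config) starts here
        value = int(token, 16)  # accepts both "0xc7" and "c7"
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{keyword}: {token} is not an 8-bit code")
        values.append(value)
    if len(values) != 128:
        raise ValueError(f"{keyword}: expected 128 numbers, got {len(values)}")
    return values
```

Stopping after exactly 128 numbers lets both tables live in the same text, one after the other.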

The tables only show the translations for characters whose codes are 128 or greater, because translation for lower codes is trivial. Example:



tounix:
0xc7 0xfc 0xe9 0xe2 0xe4 0xe0 0xe5 0xe7
0xea 0xeb 0xe8 0xef 0xee 0xec 0xc4 0xc5
0xc9 0xe6 0xc6 0xf4 0xf6 0xf2 0xfb 0xf9
0xff 0xd6 0xdc 0xf8 0xa3 0xd8 0xd7 0x5f
0xe1 0xed 0xf3 0xfa 0xf1 0xd1 0xaa 0xba
0xbf 0xae 0xac 0xbd 0xbc 0xa1 0xab 0xbb
0x5f 0x5f 0x5f 0x5f 0x5f 0xc1 0xc2 0xc0
0xa9 0x5f 0x5f 0x5f 0x5f 0xa2 0xa5 0xac
0x5f 0x5f 0x5f 0x5f 0x5f 0x5f 0xe3 0xc3
0x5f 0x5f 0x5f 0x5f 0x5f 0x5f 0x5f 0xa4
0xf0 0xd0 0xc9 0xcb 0xc8 0x69 0xcd 0xce
0xcf 0x5f 0x5f 0x5f 0x5f 0x7c 0x49 0x5f
0xd3 0xdf 0xd4 0xd2 0xf5 0xd5 0xb5 0xfe
0xde 0xda 0xd9 0xfd 0xdd 0xde 0xaf 0xb4
0xad 0xb1 0x5f 0xbe 0xb6 0xa7 0xf7 0xb8
0xb0 0xa8 0xb7 0xb9 0xb3 0xb2 0x5f 0x5f

fucase:
0x80 0x9a 0x90 0xb6 0x8e 0xb7 0x8f 0x80
0xd2 0xd3 0xd4 0xd8 0xd7 0xde 0x8e 0x8f
0x90 0x92 0x92 0xe2 0x99 0xe3 0xea 0xeb
0x59 0x99 0x9a 0x9d 0x9c 0x9d 0x9e 0x9f
0xb5 0xd6 0xe0 0xe9 0xa5 0xa5 0xa6 0xa7
0xa8 0xa9 0xaa 0xab 0xac 0xad 0xae 0xaf
0xb0 0xb1 0xb2 0xb3 0xb4 0xb5 0xb6 0xb7
0xb8 0xb9 0xba 0xbb 0xbc 0xbd 0xbe 0xbf
0xc0 0xc1 0xc2 0xc3 0xc4 0xc5 0xc7 0xc7
0xc8 0xc9 0xca 0xcb 0xcc 0xcd 0xce 0xcf
0xd1 0xd1 0xd2 0xd3 0xd4 0x49 0xd6 0xd7
0xd8 0xd9 0xda 0xdb 0xdc 0xdd 0xde 0xdf
0xe0 0xe1 0xe2 0xe3 0xe5 0xe5 0xe6 0xe8
0xe8 0xe9 0xea 0xeb 0xed 0xed 0xee 0xef
0xf0 0xf1 0xf2 0xf3 0xf4 0xf5 0xf6 0xf7
0xf8 0xf9 0xfa 0xfb 0xfc 0xfd 0xfe 0xff

The first table maps DOS character codes to UNIX character codes. For example, DOS character number 129 is a u with two dots on top of it (ü). To translate it into UNIX, we look at entry number 1 in the first table (1 = 129 - 128), which is 0xfc. (Beware: numbering starts at 0.) The second table maps lowercase DOS characters to uppercase DOS characters. The same lowercase u with dots maps to character 0x9a, which is an uppercase U with dots in DOS.
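The lookup rule (entry = code - 128, codes below 128 unchanged) is shown in the Python sketch below. As an assumption for illustration, it derives both tables for code page 850 from Python's built-in `cp850` codec instead of typing in the 256 numbers. The first row of each derived table matches the example tables, but other entries can differ where the example approximates. For instance, the fucase table maps ÿ (0x98) to Y (0x59), whereas code page 850 has no uppercase ÿ, so the sketch leaves that code unchanged. All function names are hypothetical.

```python
def derive_tounix(codepage: str = "cp850") -> list[int]:
    """Build a 128-entry DOS-to-UNIX table from a Python codec.

    Entry i translates DOS code 128 + i; characters outside Latin-1
    become 0x5f ('_'), the value the example uses for most of them.
    """
    table = []
    for code in range(128, 256):
        ch = bytes([code]).decode(codepage)
        table.append(ord(ch) if ord(ch) < 256 else 0x5F)
    return table


def derive_fucase(codepage: str = "cp850") -> list[int]:
    """Build a 128-entry lowercase-to-uppercase table from a Python codec."""
    table = []
    for code in range(128, 256):
        upper = bytes([code]).decode(codepage).upper()
        try:
            # Keep the code unchanged if uppercasing yields several characters.
            table.append(upper.encode(codepage)[0] if len(upper) == 1 else code)
        except UnicodeEncodeError:
            table.append(code)  # uppercase form missing from this code page
    return table


def lookup(code: int, table: list[int]) -> int:
    """Translate one 8-bit code; codes below 128 are left unchanged."""
    return code if code < 128 else table[code - 128]


tounix = derive_tounix()
fucase = derive_fucase()
print(hex(lookup(129, tounix)))  # 0xfc, a u with two dots in Latin-1
print(hex(lookup(129, fucase)))  # 0x9a, an uppercase U with dots in DOS
```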

UNICODE CHARACTERS GREATER THAN 255

If an existing MS-DOS name contains Unicode characters greater than 255, these are translated to underscores or to characters that are close in visual appearance. For example, accented consonants are translated into their unaccented counterparts. This translation is used for mdir and for the UNIX filenames generated by mcopy. Linux does support Unicode too, but unfortunately too few applications support it yet for it to be worth supporting in mtools. Most importantly, xterm can't display Unicode yet. If there is sufficient demand, I might include support for Unicode in the UNIX filenames as well.
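To illustrate the kind of approximation described above, the Python sketch below keeps characters below 256 as they are. For other characters, it strips accents through Unicode compatibility decomposition (NFKD) and falls back to an underscore. This decomposition approach is a swapped-in technique, not mtools' actual translation table, and the name `to_unix_name` is hypothetical.

```python
import unicodedata


def to_unix_name(long_name: str) -> str:
    """Approximate a long name using only code points below 256."""
    out = []
    for ch in long_name:
        if ord(ch) < 256:
            out.append(ch)  # already representable in the UNIX 8-bit set
            continue
        # Decompose, e.g. "č" -> "c" + combining caron, and keep the base.
        base = "".join(
            c for c in unicodedata.normalize("NFKD", ch)
            if ord(c) < 256 and not unicodedata.combining(c)
        )
        out.append(base or "_")  # nothing usable left: use an underscore
    return "".join(out)


print(to_unix_name("čas.txt"))    # cas.txt
print(to_unix_name("Ωmega.txt"))  # _mega.txt
```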

Caution: When deleting files with mtools, the underscore matches all characters that can't be represented in UNIX. Be careful with mdel!
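The danger is that a single UNIX-side name can select several distinct DOS files. The Python sketch below models that rule in a simplified form: each underscore in the typed name matches a literal underscore or any character with a code of 256 or above. It ignores the visual approximations mentioned earlier and is not mtools' matching code. The function `underscore_matches` and the sample names are hypothetical.

```python
def underscore_matches(pattern: str, long_name: str) -> bool:
    """Hypothetical model: does `pattern` (typed on UNIX) select `long_name`?

    Each '_' in the pattern matches a literal '_' or any character that
    cannot be represented in UNIX (code point 256 or above).
    """
    if len(pattern) != len(long_name):
        return False
    for p, c in zip(pattern, long_name):
        if p == "_" and (c == "_" or ord(c) >= 256):
            continue
        if p != c:
            return False
    return True


names = ["Ωmega.txt", "€mega.txt", "_mega.txt", "omega.txt"]
print([n for n in names if underscore_matches("_mega.txt", n)])
```

Under this model, deleting "_mega.txt" would remove three different files, which is why the warning above matters.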

LOCATION OF CONFIGURATION FILES AND PARSING ORDER

The configuration files are parsed in the following order:

Compiled-in defaults
