Chapter 16
Using Internationalization

by Eric Ries

Introduction to Internationalization: Here you'll find an introduction to the concepts of Internationalization and an overview of the techniques used to implement it.
Advantages of Internationalization: You will learn about the many advantages Internationalization offers over traditional techniques.
Internationalization classes: In-depth coverage of the new Java 1.1 classes for implementing Internationalization.

Joe Programmer is a Java developer for Company X. His distributed sales application is a huge success in California, where his company is based, mainly because he follows good object-oriented design and implementation: keeping his objects portable, reusable, and independent. One day, Company X decides to start selling its product in Japan. Joe Programmer, who does not know Japanese, hires a Java-literate translator to go through his code and make all the necessary changes using a Japanese character set that Joe doesn't really understand. Still, he happily compiles this Japanese-language version of his code and sends it off to Japan, where it is a big success. Encouraged by this result, Company X starts moving into other markets; France and Canada are next. To Joe's dismay, he finds that he has to maintain several completely different versions of his code, because France and Canada, although they share a common language, have very different cultures! Poor Joe now has five compiled versions of his code: American English, Japanese, French, Canadian French, and Canadian English. Now, when he makes even the slightest change to his code, he has to make the same change five times, and then hire several translators to make language changes directly in the source code. Clearly, Joe is in an unacceptable situation.

What Is Internationalization?

In the previous scenario, Joe Programmer has written what is called a myopic program: one that is suited to only one locale. A locale is a region (usually geographic, but not necessarily so) that shares customs, culture, and language. Each of the five versions of Joe's program was localized for one specific locale and was unusable outside that locale without major alteration. This violates a fundamental principle of object-oriented design, because Joe's program is no longer portable or reusable. The process of isolating the culture-dependent code (text, pictures, and so on) from the culture-independent code (the actual functionality of the program) is called Internationalization. Once a program has been through this process, it can be adapted to any locale with minimal effort. Version 1.1 of the Java platform provides built-in support for Internationalization, which makes writing truly portable code easy.

Java Support for Internationalization

Java 1.1 introduces several changes to the Java class libraries that support Internationalization. In the past, writing internationalized code required extra effort and was substantially more difficult than writing myopic code. One of the design goals of Java 1.1 was to reverse this situation: Java seeks to make writing internationalized code easier than writing its locale-specific counterpart. The changes introduced with Java 1.1 mainly affect three packages:

java.util: Introduces the Locale class. A Locale encapsulates the information that identifies a locale, but does not itself perform any locale-specific operations. Rather, locale-sensitive methods can now be passed a Locale object as a parameter, which alters their behavior. If no Locale is specified, a default Locale taken from the host environment is used. This package also provides support for ResourceBundles, objects that encapsulate locale-sensitive data in a portable, independent way.
java.io: Most of the classes in java.io that worked with InputStreams and OutputStreams now have corresponding classes derived from Reader and Writer. Readers and Writers work like streams, except that they are designed to handle 16-bit Unicode characters instead of 8-bit bytes.
java.text: An entirely new package that provides support for manipulating various kinds of text. This includes collating (sorting) text, formatting dates and numbers, and parsing language-sensitive data.

The Locale class

A Locale object encapsulates information about a specific locale: just enough to uniquely identify the locale's region. When a locale-sensitive method is passed a Locale object as a parameter, it will attempt to modify its behavior for that particular locale. A Locale is initialized with a language code, a country code, and an optional "variant" code. Together these define a region, although you need not specify all three. For example, you could have a Locale object for American English, California variant. If you ask for the name of the first month of the year (month names are supplied by the DateFormatSymbols class, which the date-formatting classes use), Java will try to find a name suitable for Californian American English. Since month names are not affected by what state you are in, there is no built-in support for Californian English, so the lookup tries to find a best fit. It next tries American English, but since month names are the same in all English-speaking countries, this fails as well. Finally, the month name that corresponds to the English Locale is returned. This best-fit lookup procedure lets the programmer control the granularity of internationalized code precisely.
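
In the JDK, localized month names come from the java.text.DateFormatSymbols class, so the best-fit lookup described above can be observed there. The following is a minimal sketch; the class MonthNameDemo and its method firstMonthName() are hypothetical names used only for illustration. A Locale whose "CA" variant has no dedicated data should still yield English month names, because the lookup falls back to a less specific Locale.

```java
import java.text.DateFormatSymbols;
import java.util.Locale;

// Hypothetical demo class illustrating best-fit lookup of month names
class MonthNameDemo {

    // Returns the localized name of the first month of the year.
    // When no data exists for language_country_variant, the lookup
    // falls back to language_country, then to language alone.
    static String firstMonthName(Locale locale) {
        String[] months = new DateFormatSymbols(locale).getMonths();
        return months[0]; // index 0 corresponds to Calendar.JANUARY
    }

    public static void main(String[] args) {
        Locale californian = new Locale("en", "US", "CA"); // no California-specific month names exist
        Locale france = new Locale("fr", "FR");
        System.out.println(firstMonthName(californian)); // January
        System.out.println(firstMonthName(france));
    }
}
```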

You create a Locale object using the following syntax:

Locale theLocale = new Locale("en", "US");

where "en" specifies English and "US" specifies the United States. These two-letter codes are used internally by Java programs to identify languages and countries. They are defined by the ISO 639 and ISO 3166 standards, respectively. More information on these two documents can be found at:

http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt
http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html

Currently, the JDK supports the language and country combinations listed in Table 16.1 in all of its locale-sensitive classes, such as Calendar and NumberFormat. This list may change in the future, so be sure to check the latest documentation.
Table 16.1 Locales Supported by the JDK
Locale	Country	Language
da_DK	Denmark	Danish
de_AT	Austria	German
de_CH	Switzerland	German
de_DE	Germany	German
el_GR	Greece	Greek
en_CA	Canada	English
en_GB	United Kingdom	English
en_IE	Ireland	English
en_US	United States	English
es_ES	Spain	Spanish
fi_FI	Finland	Finnish
fr_BE	Belgium	French
fr_CA	Canada	French
fr_CH	Switzerland	French
fr_FR	France	French
it_CH	Switzerland	Italian
it_IT	Italy	Italian
ja_JP	Japan	Japanese
ko_KR	Korea	Korean
nl_BE	Belgium	Dutch
nl_NL	Netherlands	Dutch
no_NO	Norway	Norwegian (Nynorsk)
no_NO_B	Norway	Norwegian (Bokmål)
pt_PT	Portugal	Portuguese
sv_SE	Sweden	Swedish
tr_TR	Turkey	Turkish
zh_CN	China	Chinese (Simplified)
zh_TW	Taiwan	Chinese (Traditional)

Programmers can also create their own custom Locales simply by specifying a unique combination of language, country, and variant. Multiple variants can be separated by an underscore character. To create a variant of Californian American English running on a Windows machine, use the following code:

Locale theLocale = new Locale("en", "US", "CA_WIN");

Remember that methods that do not understand this particular variant will try to find a "best fit" match, in this case probably "en_US".
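
As a small illustration, the sketch below (the class VariantDemo and its formatFor() helper are hypothetical) builds the custom variant Locale shown above, prints its parts, and formats a number with it. Assuming no NumberFormat data exists for the CA_WIN variant, the formatter falls back to its best match, American English.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class showing how a custom variant behaves
class VariantDemo {

    // Formats a number with the NumberFormat that best matches the Locale
    static String formatFor(Locale locale, double value) {
        return NumberFormat.getInstance(locale).format(value);
    }

    public static void main(String[] args) {
        // Language "en", country "US", and the custom variant "CA_WIN"
        Locale theLocale = new Locale("en", "US", "CA_WIN");

        System.out.println(theLocale.getLanguage()); // en
        System.out.println(theLocale.getCountry());  // US
        System.out.println(theLocale.getVariant());  // CA_WIN
        System.out.println(theLocale);               // en_US_CA_WIN

        // No data exists for CA_WIN, so American English conventions apply
        System.out.println(formatFor(theLocale, 1234.5)); // 1,234.5
    }
}
```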

The two-letter abbreviations listed here are not meant to be displayed to the user; they are meant only for internal representation. For display, use one of the Locale methods listed in Table 16.2.
Table 16.2 Locale Display Methods
Method Name	Description
getDisplayCountry(), getDisplayCountry(Locale)	Country name, localized for the default Locale or the specified Locale
getDisplayLanguage(), getDisplayLanguage(Locale)	Language name, localized for the default Locale or the specified Locale
getDisplayName(), getDisplayName(Locale)	Name of the entire locale, localized for the default Locale or the specified Locale
getDisplayVariant(), getDisplayVariant(Locale)	Name of the Locale's variant. If the localized name is not found, this returns the variant code.

These methods are very useful when you need to present a Locale to the user. Here's an example of using the getDisplayLanguage() method:

Locale.setDefault( new Locale("en", "US") ); //Set default Locale to American English
Locale japanLocale = new Locale("ja", "JP"); //Create locale for Japan
System.out.println( japanLocale.getDisplayLanguage() );                //Localized for the default Locale
System.out.println( japanLocale.getDisplayLanguage( Locale.FRENCH ) ); //Localized for French

This code fragment prints the name of the language used by japanLocale. In the first case, the name is localized for the default Locale, which has been conveniently set to American English, so the output is "Japanese." The second print statement localizes the language name for display in French, which yields the output "Japonais." All of the Locale "display" methods follow this same pattern. Almost all Internationalization API methods let you explicitly control the Locale used for localization, but in most cases you'll just want to use the default Locale.
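
The sketch below extends the same idea to the other display methods in Table 16.2. The class DisplayNameDemo and its describe() helper are hypothetical. Passing the viewer's Locale explicitly, rather than relying on the default Locale, keeps the output independent of the machine the program runs on.

```java
import java.util.Locale;

// Hypothetical demo of the Locale "display" methods
class DisplayNameDemo {

    // Describes a Locale in the language of the given viewer Locale
    static String describe(Locale target, Locale viewer) {
        return target.getDisplayName(viewer);
    }

    public static void main(String[] args) {
        Locale japanLocale = new Locale("ja", "JP");

        System.out.println(japanLocale.getDisplayLanguage(Locale.US)); // Japanese
        System.out.println(japanLocale.getDisplayCountry(Locale.US));  // Japan
        System.out.println(describe(japanLocale, Locale.US));          // Japanese (Japan)

        // The same Locale described for a French-speaking user
        System.out.println(describe(japanLocale, Locale.FRENCH));
    }
}
```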

Another thing to note in the getDisplayLanguage() example is the use of the static constant Locale.FRENCH. The Locale class provides a number of these useful constants, each of which is a shortcut for the corresponding Locale object. A list of these objects is shown in Table 16.3:

Table 16.3 Locale Static Objects
Constant Name	Locale	Shortcut for
CANADA	English Canada	new Locale("en", "CA", "")
CANADA_FRENCH	French Canada	new Locale("fr", "CA", "")
CHINA, PRC, SIMPLIFIED_CHINESE	Chinese (Simplified)	new Locale("zh", "CN", "")
CHINESE	Chinese Language	new Locale("zh", "", "")
ENGLISH	English Language	new Locale("en", "", "")
FRANCE	France	new Locale("fr", "FR", "")
FRENCH	French Language	new Locale("fr", "", "")
GERMAN	German Language	new Locale("de", "", "")
GERMANY	Germany	new Locale("de", "DE", "")
ITALIAN	Italian Language	new Locale("it", "", "")
ITALY	Italy	new Locale("it", "IT", "")
JAPAN	Japan	new Locale("ja", "JP", "")
JAPANESE	Japanese Language	new Locale("ja", "", "")
KOREA	Korea	new Locale("ko", "KR", "")
KOREAN	Korean Language	new Locale("ko", "", "")
TAIWAN, TRADITIONAL_CHINESE	Taiwan (Traditional Chinese)	new Locale("zh", "TW", "")
UK	United Kingdom	new Locale("en", "GB", "")
US	United States	new Locale("en", "US", "")
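
Because each constant is an ordinary Locale object, it compares equal to a Locale constructed from the same codes. The following sketch checks this; the class LocaleConstantsDemo and its matches() helper are hypothetical. Note that the ISO 639 code for the Japanese language is "ja"; "jp" is not a language code, so a Locale built with it does not match Locale.JAPAN.

```java
import java.util.Locale;

// Hypothetical check that the Locale constants are ordinary Locale objects
class LocaleConstantsDemo {

    // True if the constant equals a Locale built from the given codes
    static boolean matches(Locale constant, String language, String country) {
        return constant.equals(new Locale(language, country));
    }

    public static void main(String[] args) {
        System.out.println(matches(Locale.JAPAN, "ja", "JP")); // true
        System.out.println(matches(Locale.JAPAN, "jp", "JP")); // false: "jp" is not a language code
        System.out.println(Locale.CANADA_FRENCH);              // fr_CA

        // Language-only constants have an empty country code
        System.out.println("[" + Locale.FRENCH.getCountry() + "]"); // []
    }
}
```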

Packaging Locale-sensitive Data

The Locale class makes it easy to handle locale-sensitive methods. However, most programs (especially applets and GUI-based applications) also use Strings, data, and other resources that need to be localized. For instance, most GUI programs have "OK" and "Cancel" buttons. These labels are fine for the United States, but other locales require different labels for these buttons. In Germany, for instance, you might use "Gut" and "Abbrechen" instead. Traditionally, information such as this was included in the source code of an application, which, as Joe Programmer found out earlier, leads to many problems when you try to support many localized versions of one program simultaneously. To solve this problem, Java provides a way to encapsulate this data into objects that are loaded on demand. These objects are called ResourceBundles.

ResourceBundles: Naming Conventions

ResourceBundle is an abstract class that must be extended in order to provide any functionality. ResourceBundles are loaded by name through a class loader, and must follow a strict naming convention in order to be loaded properly. This is best illustrated by example. Say you have a class called LabelBundle that extends ResourceBundle and contains the names of all GUI labels you use in an application. The class called LabelBundle provides default information, while LabelBundle_fr provides French labels, LabelBundle_de_DE provides German labels for Germany, and LabelBundle_en_US_MAC provides Macintosh-specific American English labels. You request a ResourceBundle using the following static method of the ResourceBundle class:

static ResourceBundle getBundle(String baseName, Locale locale)

This method uses a class loader to search for a class whose name matches baseName plus certain attributes of the specified Locale. A specific search pattern is used to find the "closest match" to the bundle you request:

baseName + "_" + localeLanguage + "_" + localeCountry + "_" + localeVariant
baseName + "_" + localeLanguage + "_" + localeCountry
baseName + "_" + localeLanguage
baseName + "_" + defaultLanguage + "_" + defaultCountry + "_" + defaultVariant
baseName + "_" + defaultLanguage + "_" + defaultCountry
baseName + "_" + defaultLanguage
baseName

In our example, if you request the baseName LabelBundle with a fr_FR_WIN (French language, France, Windows platform) Locale, the getBundle() method performs the following steps:

Search for the class LabelBundle_fr_FR_WIN, which fails because you have defined no such class.
Search for the class LabelBundle_fr_FR, which also fails because you did not define a France-only Bundle.
Search for the class LabelBundle_fr. This succeeds, and an instance of that class is returned. However, if this search had failed (if you had not supplied a French-language bundle), the search would have continued using the language, country, and variant codes of the default Locale.
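
To make the search order concrete, here is a hedged sketch that only generates the candidate names in the order listed above; it does not load anything and is not the JDK's own implementation (the real lookup also handles details such as .properties files). The class BundleSearchOrder and its methods are hypothetical, and the default Locale is passed in explicitly so the result is predictable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the candidate-name search order; illustration only
class BundleSearchOrder {

    // Appends the names derived from one Locale, most specific first,
    // skipping any name that would need an empty language, country, or variant
    static void addCandidates(List<String> out, String baseName, Locale loc) {
        String lang = loc.getLanguage();
        String country = loc.getCountry();
        String variant = loc.getVariant();
        if (lang.length() > 0 && country.length() > 0 && variant.length() > 0)
            out.add(baseName + "_" + lang + "_" + country + "_" + variant);
        if (lang.length() > 0 && country.length() > 0)
            out.add(baseName + "_" + lang + "_" + country);
        if (lang.length() > 0)
            out.add(baseName + "_" + lang);
    }

    // Requested Locale first, then the default Locale, then the base name
    static List<String> candidates(String baseName, Locale requested, Locale defaultLocale) {
        List<String> names = new ArrayList<>();
        addCandidates(names, baseName, requested);
        addCandidates(names, baseName, defaultLocale);
        names.add(baseName); // the base bundle is the last resort
        return names;
    }

    public static void main(String[] args) {
        Locale requested = new Locale("fr", "FR", "WIN");
        Locale defaultLocale = new Locale("en", "US");
        for (String name : candidates("LabelBundle", requested, defaultLocale))
            System.out.println(name);
    }
}
```

For the fr_FR_WIN request with an en_US default, this prints LabelBundle_fr_FR_WIN, LabelBundle_fr_FR, LabelBundle_fr, LabelBundle_en_US, LabelBundle_en, and finally LabelBundle.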

Creating ResourceBundles

Now that you understand the naming convention used with ResourceBundles, take a look at how they are created. The simplest kind of ResourceBundle extends the ResourceBundle class directly and implements its two abstract methods:

protected Object handleGetObject(String key)
public Enumeration getKeys()

The first method returns the Object that corresponds to the specified key, or null if the bundle has no such key; the second returns an Enumeration of all the keys the bundle supports. These keys are internal names for the content stored in the ResourceBundle, and should be the same for all localized versions of the same data. An extremely simple version of your LabelBundle might be defined as follows:

import java.util.Enumeration;
import java.util.ResourceBundle;
import java.util.Vector;

public class LabelBundle extends ResourceBundle {
   protected Object handleGetObject(String key) {
      if( key.equals("OK") )
         return "OK";
      else if( key.equals("Cancel") )
         return "Cancel";

      // Other labels could be handled here

      return null; // If the key has no matches, always return null
   }

   public Enumeration getKeys() {  // Lists every key handled above
      Vector keys = new Vector();
      keys.addElement("OK");
      keys.addElement("Cancel");
      return keys.elements();
   }
}

Other versions of the same bundle might return values translated into different languages. You can see, however, that handling key-value pairs this way becomes tedious if you have more than a few keys. Luckily, Java provides two subclasses of ResourceBundle that make life easier: ListResourceBundle and PropertyResourceBundle.

ListResourceBundles use an array of two-element arrays to store key-value pairs like the ones above. All you have to do is implement the abstract getContents() method, like this:

import java.util.ListResourceBundle;

public class LabelBundle extends ListResourceBundle {
   static final Object[][] labels = {   // Each inner array is one {key, value} pair
      {"OK", "OK"},
      {"Cancel", "Cancel"},
      {"AnotherKey", "Another Value"}
      //More key-value pairs can go here
   };

   public Object[][] getContents() {
      return labels;
   }
}
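
A localized version of the bundle keeps the same keys and changes only the values. The sketch below is a hypothetical French counterpart, LabelBundle_fr, plus a small demo class. It is constructed directly so that the example is self-contained; in a real program the bundle class would be declared public and obtained through ResourceBundle.getBundle(). Asking for a key the bundle does not contain throws a MissingResourceException.

```java
import java.util.Enumeration;
import java.util.ListResourceBundle;
import java.util.MissingResourceException;

// Hypothetical French version of LabelBundle: same keys, translated values
class LabelBundle_fr extends ListResourceBundle {
    static final Object[][] labels = {
        {"OK", "OK"},
        {"Cancel", "Annuler"},
        {"AnotherKey", "Une autre valeur"}
    };

    protected Object[][] getContents() {
        return labels;
    }
}

class LabelBundleDemo {
    public static void main(String[] args) {
        LabelBundle_fr bundle = new LabelBundle_fr();
        System.out.println(bundle.getString("Cancel")); // Annuler

        // ListResourceBundle derives getKeys() from getContents(); order is not guaranteed
        for (Enumeration<String> e = bundle.getKeys(); e.hasMoreElements(); )
            System.out.println(e.nextElement());

        try {
            bundle.getString("NoSuchKey");
        } catch (MissingResourceException ex) {
            System.out.println("Missing key: " + ex.getKey()); // Missing key: NoSuchKey
        }
    }
}
```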

You could also provide your own similar functionality using a Hashtable, but that's only worthwhile if you want the contents to change dynamically over time.

PropertyResourceBundles are created as needed from predefined "properties" files. These are usually used for system-wide settings, or when large amounts of data need to be stored as key-value pairs. A properties file has the same name the corresponding bundle class would have, but with the .properties extension. To provide the LabelBundle_de_DE bundle, for example, you might supply a file called LabelBundle_de_DE.properties with the following content:

OK=Gut
Cancel=Abbrechen
AnotherKey=This value has a lot of text stored within it. Of course, it really ought to be translated into German first...

Contents are always specified in the form "key=value", and the values are always Strings; if you need another type, convert the String yourself (for example, with Integer.parseInt()). This functionality is based on the java.util.Properties class. See Chapter 33 for more information on the java.util package.
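
PropertyResourceBundle can also be constructed directly from an InputStream, which makes it easy to experiment without creating a file. In the following sketch (the class PropertiesBundleDemo and its load() helper are hypothetical), the key=value text is held in a String and converted to bytes with the 8859_1 encoding label. Because the sample text is plain ASCII, the result does not depend on which encoding a particular Java release assumes when reading properties data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo: a PropertyResourceBundle built from in-memory text
class PropertiesBundleDemo {

    // Builds a bundle from key=value text instead of a .properties file
    static ResourceBundle load(String text) throws IOException {
        InputStream in = new ByteArrayInputStream(text.getBytes("8859_1"));
        try {
            return new PropertyResourceBundle(in);
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        ResourceBundle labels = load("OK=Gut\nCancel=Abbrechen\n");
        System.out.println(labels.getString("OK"));     // Gut
        System.out.println(labels.getString("Cancel")); // Abbrechen
    }
}
```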

NOTE: Although the examples given here all deal with String objects, class-based ResourceBundles (such as ListResourceBundles) can store Objects of any type, including Dates, Applets, GUI elements, or even other ResourceBundles!

Accessing ResourceBundles

As previously mentioned, you load ResourceBundles by name using the static method getBundle(). Assuming this succeeds (it throws a MissingResourceException otherwise), you can then query individual values within the bundle using the getObject() method. Of course, this usually also requires an explicit cast to the kind of Object you want, so you need to know this information ahead of time. As a matter of convenience, ResourceBundle also provides the following methods, which return already-cast Objects:

getString(String)
getStringArray(String)
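
Because both a missing bundle and a missing key are reported with a MissingResourceException, it can be convenient to wrap a lookup in a helper that supplies a fallback value. The helper labelOrDefault() below is a hypothetical sketch, not part of the ResourceBundle API; note that getString() also throws a ClassCastException if the stored value is not a String.

```java
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical helper: a String lookup with a fallback value
class LabelLookup {

    static String labelOrDefault(String baseName, String key, String fallback, Locale locale) {
        try {
            ResourceBundle bundle = ResourceBundle.getBundle(baseName, locale);
            return bundle.getString(key); // same as (String) bundle.getObject(key)
        } catch (MissingResourceException e) {
            // Thrown both when no bundle matches and when the key is absent
            return fallback;
        }
    }

    public static void main(String[] args) {
        // No bundle named "NoSuchLabels" exists, so the fallback is returned
        System.out.println(labelOrDefault("NoSuchLabels", "OK", "OK", Locale.FRENCH)); // OK
    }
}
```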

Other Changes to java.util

There have been some other changes to the java.util package that are mainly straightforward, but that differ substantially from the 1.0 Core API. The Date class should no longer be used for time manipulation (most of its methods for that purpose are deprecated); it is simply a wrapper for one particular instant in time. To construct dates from fields such as year, month, and day, or to manipulate them, you should now use the Calendar class. Calendar is an abstract class that provides culture-independent methods for manipulating the era, year, month, week, day, and time in various ways. Concrete subclasses implement the rules of a particular calendar system, and the static Calendar.getInstance() method returns one suited to the default Locale. The only subclass that (so far) comes with the JDK is the GregorianCalendar class, which provides very sophisticated functionality for the world's most widely used calendar system. Future releases may include support for various lunar, seasonal, or other calendar systems. An adjunct to the Calendar class, which is not usually used directly by the programmer, is the TimeZone (and SimpleTimeZone) class, which allows dates and times to be properly adjusted for other time zones.

The Date, Calendar, and TimeZone classes provide a huge amount of functionality that most programmers will never need to know about. You don't need to understand the intricacies of temporal arithmetic to make use of these classes; they all contain default methods that allow you to get the current time and date and display them in a Locale-sensitive way. By merely using the provided methods, your programs become localized by default, requiring no added effort on your part.
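
As a brief illustration of that division of labor, the following sketch does its date arithmetic with a GregorianCalendar and treats the Date only as the resulting instant. The class CalendarDemo and its helpers are hypothetical. Calendar months are zero-based, so the named constants such as Calendar.OCTOBER are used rather than raw numbers.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

// Hypothetical demo of Calendar field arithmetic
class CalendarDemo {

    // Day of the week (Calendar.SUNDAY ... Calendar.SATURDAY) for a date
    static int dayOfWeek(int year, int month, int day) {
        return new GregorianCalendar(year, month, day).get(Calendar.DAY_OF_WEEK);
    }

    // Adds whole months; the Calendar carries over into the next year as needed
    static Calendar plusMonths(int year, int month, int day, int months) {
        Calendar cal = new GregorianCalendar(year, month, day);
        cal.add(Calendar.MONTH, months);
        return cal;
    }

    public static void main(String[] args) {
        System.out.println(dayOfWeek(1997, Calendar.OCTOBER, 2) == Calendar.THURSDAY); // true

        Calendar later = plusMonths(1997, Calendar.OCTOBER, 2, 4);
        System.out.println(later.get(Calendar.YEAR));                       // 1998
        System.out.println(later.get(Calendar.MONTH) == Calendar.FEBRUARY); // true

        Date instant = later.getTime(); // the Date is only the resulting instant
        System.out.println(instant.getTime()); // milliseconds since the 1970 epoch
    }
}
```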

NOTE: There are many more methods in these few classes than are worth discussing here. If you are interested, a simple example of the Calendar and Date classes interacting is provided in the example at the end of the chapter. For a more complete discussion, you should consult the Java API documentation directly.

Input-Output (I/O) Changes

The old java.io package operated exclusively on byte streams: a continuous series of 8-bit quantities. However, Java's Unicode characters are 16 bits, which makes using them with byte streams difficult. Java 1.1 introduces a whole series of 16-bit character stream Readers and Writers, which correspond to the old InputStream and OutputStream. The two sets of classes can work together or separately, depending on whether your program needs to input or output text of any kind.

Character Set Converters

The way in which characters are represented as binary numbers is called an encoding scheme. A common scheme for English and other Western European text is the ISO Latin-1 encoding. The set of characters supported by any one encoding is called its character set. Usually, the first 128 codes of an encoding correspond to the almost universally accepted ASCII character set, which includes the standard English letters, digits, and punctuation marks. Beyond that range, however, encodings can vary radically, especially since some, like Chinese and Japanese encodings, have character sets that bear little resemblance to English!

Luckily, Java 1.1 provides classes for dealing with all of the most common encodings. The ByteToCharConverter and CharToByteConverter classes (found in the sun.io package rather than in the public API) perform the complex conversions to and from the Unicode characters used by Java. Each encoding scheme is identified by its own label. The encodings supported by JDK 1.1, and their labels, are shown in Table 16.4:
Table 16.4 JDK 1.1-supported Character Encodings
Label	Encoding Scheme
8859_1	ISO Latin-1
8859_2	ISO Latin-2
8859_3	ISO Latin-3
8859_4	ISO Latin-4
8859_5	ISO Latin/Cyrillic
8859_6	ISO Latin/Arabic
8859_7	ISO Latin/Greek
8859_8	ISO Latin/Hebrew
8859_9	ISO Latin-5
Big5	Big 5 Traditional Chinese
CNS11643	CNS 11643 Traditional Chinese
Cp1250	Windows Eastern Europe / Latin-2
Cp1251	Windows Cyrillic
Cp1252	Windows Western Europe / Latin-1
Cp1253	Windows Greek
Cp1254	Windows Turkish
Cp1255	Windows Hebrew
Cp1256	Windows Arabic
Cp1257	Windows Baltic
Cp1258	Windows Vietnamese
Cp437	PC Original
Cp737	PC Greek
Cp775	PC Baltic
Cp850	PC Latin-1
Cp852	PC Latin-2
Cp855	PC Cyrillic
Cp857	PC Turkish
Cp860	PC Portuguese
Cp861	PC Icelandic
Cp862	PC Hebrew
Cp863	PC Canadian French
Cp864	PC Arabic
Cp865	PC Nordic
Cp866	PC Russian
Cp869	PC Modern Greek
Cp874	Windows Thai
EUCJIS	Japanese EUC
GB2312	GB2312-80 Simplified Chinese
JIS	JIS
KSC5601	KSC5601 Korean
MacArabic	Macintosh Arabic
MacCentralEurope	Macintosh Latin-2
MacCroatian	Macintosh Croatian
MacCyrillic	Macintosh Cyrillic
MacDingbat	Macintosh Dingbat
MacGreek	Macintosh Greek
MacHebrew	Macintosh Hebrew
MacIceland	Macintosh Iceland
MacRoman	Macintosh Roman
MacRomania	Macintosh Romania
MacSymbol	Macintosh Symbol
MacThai	Macintosh Thai
MacTurkish	Macintosh Turkish
MacUkraine	Macintosh Ukraine
SJIS	PC and Windows Japanese
UTF8	Standard UTF-8

Java 1.1 also provides ways for developers to create their own encodings and to create converters for existing but unsupported encodings. The details of how character conversion is done are quite complex, and interested readers are referred to JavaSoft's Web pages.
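
Application code rarely touches the converter classes directly; instead, it passes an encoding label to String.getBytes(), the String(byte[], String) constructor, or InputStreamReader, and the conversion happens behind the scenes. The sketch below (the class EncodingDemo and its helpers are hypothetical) uses two labels from Table 16.4 to show that the same Unicode text can need a different number of bytes in different encodings.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;

// Hypothetical demo of converting between Unicode text and encoded bytes
class EncodingDemo {

    // Number of bytes the text occupies in the named encoding
    static int byteLength(String s, String encoding) throws UnsupportedEncodingException {
        return s.getBytes(encoding).length;
    }

    // Decodes bytes back into a String through an InputStreamReader
    static String decode(byte[] bytes, String encoding) throws IOException {
        Reader in = new InputStreamReader(new ByteArrayInputStream(bytes), encoding);
        StringBuffer sb = new StringBuffer();
        int c;
        while ((c = in.read()) != -1)
            sb.append((char) c);
        in.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String word = "caf\u00e9"; // "café", written with a Unicode escape
        System.out.println(byteLength(word, "8859_1")); // 4: one byte per character in ISO Latin-1
        System.out.println(byteLength(word, "UTF8"));   // 5: the accented letter takes two bytes
        System.out.println(decode(word.getBytes("UTF8"), "UTF8").equals(word)); // true
    }
}
```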

Readers and Writers

Character streams make heavy use of character set converters. Fortunately, they also hide the underlying complexity of the conversion process, so Java programs can be written without any knowledge of how the conversion works. Again, you see that programs are internationalized by default.

The advantages of using character streams over byte streams are many. Although they add the overhead of character conversion on top of byte reading, they also allow more efficient buffering. Byte streams are designed to read information a byte at a time, while character streams read a buffer at a time. According to JavaSoft, this, combined with a new, more efficient locking scheme, more than compensates for the speed lost to the conversion process. Most input and output stream classes in the old class hierarchy now have a corresponding Reader or Writer class that performs similar functions using character streams (see Table 16.5).
Table 16.5 Input/Output Streams and Corresponding Reader and Writer Classes (from JavaSoft)
Byte Stream Class (InputStream/OutputStream)	Corresponding Character Stream Class (Reader/Writer)	Function
InputStream	Reader	Abstract class for character-input streams
BufferedInputStream	BufferedReader	Provides a buffer for input operations
LineNumberInputStream	LineNumberReader	Keeps track of line numbers
ByteArrayInputStream	CharArrayReader	Reads from an array
N/A	InputStreamReader	Translates a byte stream into a character stream
FileInputStream	FileReader	Allows input from a file on disk
FilterInputStream	FilterReader	Abstract class for filtered input
PushbackInputStream	PushbackReader	Allows characters to be pushed back into the stream
PipedInputStream	PipedReader	Reads from a PipedWriter (a pipe between threads)
StringBufferInputStream	StringReader	Reads from a String
OutputStream	Writer	Abstract class for character-output streams
BufferedOutputStream	BufferedWriter	Buffers output, uses platform's line separator
ByteArrayOutputStream	CharArrayWriter	Writes to a character array
FilterOutputStream	FilterWriter	Abstract class for filtered character output
N/A	OutputStreamWriter	Translates a character stream into a byte stream
FileOutputStream	FileWriter	Writes characters to a file on disk
PrintStream	PrintWriter	Prints values and objects to a Writer
PipedOutputStream	PipedWriter	Writes to a PipedReader
N/A	StringWriter	Writes to a String

The impact of these changes is actually quite minor if you're developing new programs. All you have to do is remember to use Reader and Writer classes where you previously used InputStream and OutputStream. The biggest changes you'll have to worry about relate to DataInputStream and PrintStream, which used to be the classes of choice for text input and output. The DataInputStream.readLine() method has been deprecated; you should use BufferedReader.readLine() instead. Further, the PrintStream constructors have been deprecated, although you can still use existing PrintStreams (such as System.out) for debugging purposes. To output line-terminated strings, you should use the PrintWriter class instead. The main consequence is that all code that used the DataInputStream and PrintStream classes for text (which includes much Socket, File, and Piped code) will have to be updated to use the proper Reader and Writer classes. To make this easier, Java 1.1 provides the InputStreamReader and OutputStreamWriter classes, which create a new Reader or Writer from an existing byte stream. This makes the Reader/Writer system compatible with all of the other classes that currently use byte streams (like URL, Socket, File, and so on).
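
The following sketch shows the migration pattern just described: a PrintWriter layered on an OutputStreamWriter for output, and a BufferedReader layered on an InputStreamReader for input. In-memory byte arrays stand in for the byte streams you would get from a Socket or a file; the class ReaderWriterDemo and its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of wrapping byte streams in Readers and Writers
class ReaderWriterDemo {

    // Writes each line through a PrintWriter layered on an OutputStreamWriter,
    // which converts the characters to bytes in the named encoding
    static byte[] writeLines(List<String> lines, String encoding) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(bytes, encoding));
        for (String line : lines)
            out.println(line); // appends the platform's line separator
        out.close();           // flushes any buffered characters
        return bytes.toByteArray();
    }

    // Reads the lines back with BufferedReader.readLine(), the replacement
    // for the deprecated DataInputStream.readLine()
    static List<String> readLines(byte[] data, String encoding) throws IOException {
        BufferedReader in = new BufferedReader(
            new InputStreamReader(new ByteArrayInputStream(data), encoding));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null)
            lines.add(line);
        in.close();
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> original = new ArrayList<>();
        original.add("Gr\u00fc\u00dfe"); // "Grüße", written with Unicode escapes
        original.add("Hello");
        byte[] encoded = writeLines(original, "UTF8");
        System.out.println(readLines(encoded, "UTF8").equals(original)); // true
    }
}
```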

The New Package: java.text

The most advanced and complex Internationalization API features are found in the java.text package. It includes many classes for formatting and organizing text in a language-independent way. For instance, date formatting can be quite problematic for programmers. In America, dates are written in month-day-year order, but in much of Europe, dates are written in day-month-year order. This makes interpreting a date like 10/2/97 difficult: Does this represent October 2, 1997 or February 10, 1997? Locale-sensitive formatting classes resolve this ambiguity by formatting and parsing dates (and numbers) according to the conventions of a particular Locale. Most of these classes are not intended to be instantiated directly; instead, you obtain instances through static factory methods such as DateFormat.getDateInstance() and NumberFormat.getInstance().
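
A minimal sketch of these factory methods in use follows; the class FormatDemo and its helpers are hypothetical. It formats the same date for the United States and the United Kingdom, and the same number for the United States and Germany. The exact patterns come from the JDK's locale data and can differ between Java releases (two-digit versus four-digit years, for example), so only the order of the fields should be relied on.

```java
import java.text.DateFormat;
import java.text.NumberFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo of locale-sensitive date and number formatting
class FormatDemo {

    // Short date style chosen by the Locale (field order, separators)
    static String shortDate(Date date, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.SHORT, locale).format(date);
    }

    // Grouping and decimal separators chosen by the Locale
    static String number(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    public static void main(String[] args) {
        Date date = new GregorianCalendar(1997, Calendar.OCTOBER, 2).getTime();
        System.out.println(shortDate(date, Locale.US)); // month first, e.g. 10/2/97
        System.out.println(shortDate(date, Locale.UK)); // day first
        System.out.println(number(1234.5, Locale.US));      // 1,234.5
        System.out.println(number(1234.5, Locale.GERMANY)); // 1.234,5
    }
}
```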

Text collation is the process of sorting text according to a particular set of rules. In English, sorting in alphabetical order is relatively easy, because English has few special characters (such as accented letters) to complicate things. In French, however, things are not so simple. Two words that look very similar (like péché and pêche) have entirely different meanings. Which should come first alphabetically? And how should hyphens or punctuation be treated? The java.text.Collator class provides a way of defining language-specific sort criteria in a robust, consistent manner.
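The following minimal sketch (class and method names are hypothetical) shows why java.text.Collator is needed. Plain String comparison orders strings by their UTF-16 character codes, so an accented é (U+00E9) sorts after z (U+007A), whereas a French Collator treats é as a variant of e.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollationDemo {

    // Sorts a copy of 'words' using the collation rules of 'locale'.
    static String[] collatedSort(String[] words, Locale locale) {
        String[] sorted = words.clone();
        Collator collator = Collator.getInstance(locale);
        Arrays.sort(sorted, collator);   // Collator implements Comparator
        return sorted;
    }

    // Sorts a copy of 'words' by raw UTF-16 code unit values, for comparison.
    static String[] codeUnitSort(String[] words) {
        String[] sorted = words.clone();
        Arrays.sort(sorted);             // uses String.compareTo
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00e9lan" is "élan"; "z\u00e8bre" is "zèbre".
        String[] words = { "z\u00e8bre", "\u00e9lan", "abricot" };
        System.out.println("compareTo order: " + Arrays.toString(codeUnitSort(words)));
        System.out.println("Collator order:  " + Arrays.toString(collatedSort(words, Locale.FRENCH)));
    }
}
```

Finer questions, such as the relative order of péché and pêche, depend on the secondary (accent) rules of the particular Collator, which is exactly the kind of detail the class keeps out of application code.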

Text boundaries can also be ambiguous across languages. Where do words, sentences, and paragraphs begin and end? In English, a period generally marks the end of a sentence, but not always: consider an abbreviation such as "Mr." or a number such as 3.14. The BreakIterator class (named TextBoundary in early beta releases of JDK 1.1) works with the CharacterIterator interface to break text into sub-units such as characters, words, lines, and sentences, based on language-specific criteria. Java 1.1 comes with built-in support for some languages, and you can supply your own behavior by subclassing the abstract BreakIterator class. A BreakIterator works by returning the integer indexes of the boundaries that occur within a String, as demonstrated by the following example, which breaks a String up by words:

import java.text.BreakIterator;

String str = "This is a line of text. It contains many words, sentences, and formatting.";
BreakIterator byWord = BreakIterator.getWordInstance();
byWord.setText( str );               // the iterator must be given its text first
int from = byWord.first();
int to;
while( (to = byWord.next()) != BreakIterator.DONE ) {
   System.out.println( str.substring(from, to) );
   from = to;
}

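This snippet of code prints each piece of text between two word boundaries on its own line: every word, but also every space and punctuation mark, since those fall between word boundaries too. Although this example is trivial, text boundaries can be extremely important, especially in GUI applications that require text selection, intelligent word-wrapping, and so on.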
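A word BreakIterator reports the spaces and punctuation between words as segments of their own, so extracting just the words takes one more step. A minimal sketch, assuming that any segment containing at least one letter or digit counts as a word; the class and method names are hypothetical, and the code relies on java.text.BreakIterator plus Java 8 features (String.codePoints()).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordSplitter {

    // Returns only the words in 'text', skipping the whitespace and
    // punctuation segments that a word BreakIterator also reports.
    static List<String> words(String text, Locale locale) {
        BreakIterator boundary = BreakIterator.getWordInstance(locale);
        boundary.setText(text);
        List<String> result = new ArrayList<>();
        int start = boundary.first();
        for (int end = boundary.next(); end != BreakIterator.DONE;
                start = end, end = boundary.next()) {
            String segment = text.substring(start, end);
            // Keep the segment if it contains at least one letter or digit.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "This is a line of text. It contains many words, sentences, and formatting.";
        List<String> found = words(str, Locale.ENGLISH);
        System.out.println(found.size() + " words: " + found);
    }
}
```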

An Example: InternationalTest

To better understand how all of this fits together, take a look at this very simple Java application that makes use of several of the features discussed in this chapter. It is included on the CD-ROM accompanying this book, if you'd like to play with it yourself.

The application is a very simple one. It takes up to three command-line parameters (a language code, a country code, and a variant) that specify a locale. It uses this information to:

Display some information about the default locale and the one entered
Try to load a ResourceBundle corresponding to the specified locale, and print out what the bundle contains
Display the date and time, localized to the specified locale

Besides the main application class (InternationalTest), the program requires several other classes. Most are ResourceBundles that correspond to different locales (a ResourceBundle written as a class must be declared public, with a public no-argument constructor, so that it can be loaded by name). Another thing to note is that this application does not supply a ClassLoader of its own when it loads its ResourceBundle. Applications are loaded from the directories and archives listed in the CLASSPATH environment variable, so as long as the ResourceBundles are also available via CLASSPATH, you don't need a separate ClassLoader to find them. An applet, on the other hand, has its classes loaded across the Internet by its own ClassLoader. If you need a reference to that ClassLoader (for example, to pass to the getBundle() overload that accepts one), you can get the instance that loaded the applet like this:

ClassLoader loader = this.getClass().getClassLoader();

The complete listing of InternationalTest follows in Listing 16.1:

Listing 16.1 InternationalTest.java

import java.util.*;
import java.text.DateFormat;

class InternationalTest {

   public static void main(String[] args) {
      // A locale is built from up to three arguments:
      // language code, country code, and variant.
      String lang = "", country = "", var = "";

      try {
         lang = args[0];
         country = args[1];
         var = args[2];
      } catch(ArrayIndexOutOfBoundsException e) {
         // Missing country or variant arguments are fine; a missing language is not.
         if( lang.equals("") ) {
            System.out.println("You must specify at least one parameter");
            System.exit(1);
         }
      }

      Locale locale = new Locale(lang, country, var);
      Locale def = Locale.getDefault();

      System.out.println("Default Locale is: " + def.getDisplayName() );
      System.out.println("You have selected Locale: " + locale.getDisplayName() );
      System.out.println("Default language, localized for your locale is: " +
               def.getDisplayLanguage( locale ) );
      System.out.println("Default country name, localized: " +
               def.getDisplayCountry( locale ) );

      // Look up TestBundle (or a more specific variant such as TestBundle_fr_CA)
      // on the CLASSPATH; no explicit ClassLoader is supplied.
      try {
         ResourceBundle bundle = ResourceBundle.getBundle( "TestBundle", locale );
         System.out.println("Resources available are: ");
         System.out.println(" r1: " + bundle.getString("r1") );
         System.out.println(" r2: " + bundle.getString("r2") );
      } catch( MissingResourceException e ) {
         System.out.println("No resources available for that locale." );
      }

      // Format the current date and time in the fully spelled-out style of the chosen locale.
      DateFormat myFormat = DateFormat.getDateTimeInstance(DateFormat.FULL, DateFormat.FULL, locale);
      Calendar myCalendar = Calendar.getInstance( locale );
      System.out.println("The localized date and time is: " +
               myFormat.format( myCalendar.getTime() ) );
   }
}
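Listing 16.1 depends on TestBundle classes that are not shown. The following is a minimal sketch of what two of them might look like, assuming they extend java.util.ListResourceBundle and supply the keys r1 and r2 that the listing reads; the string values are hypothetical. Both classes are shown in one file for brevity; for getBundle() to load them by name, each would be declared public in its own source file (TestBundle.java and TestBundle_fr.java).

```java
import java.util.ListResourceBundle;

// Base bundle: used when no closer match for the requested locale exists.
// In a real project: "public class TestBundle" in TestBundle.java.
class TestBundle extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            { "r1", "Hello" },        // hypothetical value
            { "r2", "Goodbye" }       // hypothetical value
        };
    }
}

// French bundle: chosen by getBundle() when the requested locale's language
// is French and no more specific bundle (such as TestBundle_fr_CA) exists.
// In a real project: "public class TestBundle_fr" in TestBundle_fr.java.
class TestBundle_fr extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            { "r1", "Bonjour" },      // hypothetical value
            { "r2", "Au revoir" }     // hypothetical value
        };
    }
}
```

Adding a locale then means adding one more class of this shape (for example, a hypothetical TestBundle_de), with no change to InternationalTest itself.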

Figures 16.1, 16.2, and 16.3 show output from the InternationalTest program:

FIG. 16.1
American English locale.

FIG. 16.2
Canadian French and Canadian French Macintosh locales.

FIG. 16.3
Canadian English and German locales.

So where does this leave Joe Programmer? Well, he's got some work to do to convert his application to the Internationalization API. His labels, text, and other localized resources need to be encapsulated into ResourceBundles for each locale he supports. He also needs to adjust a few methods and objects to use localized date, time, and message formats. When this process is complete, he'll find not only that his program is localized for many locales, but also that he no longer needs to maintain multiple versions of the same program. Even better, when a new locale needs to be supported, he doesn't need to modify his source code at all; he just needs to have his locale-specific resources translated for the new language and customs. His program is now, once again, portable, reusable, and independent.
