Java 1.1 Unleashed
- 52 -
Minimal Name | Full Name | Language | Country |
ar | ar_EG | Arabic | Egypt |
be | be_BY | Belorussian | Belarus |
bg | bg_BG | Bulgarian | Bulgaria |
ca | ca_ES | Catalan | Spain |
cs | cs_CZ | Czech | Czech Republic |
da | da_DK | Danish | Denmark |
de | de_DE | German | Germany |
de_AT | de_AT | German | Austria |
de_CH | de_CH | German | Switzerland |
el | el_GR | Greek | Greece |
en_CA | en_CA | English | Canada |
en_GB | en_GB | English | United Kingdom |
en_IE | en_IE | English | Ireland |
(none) | en_US | English | United States |
es | es_ES | Spanish | Spain |
et | et_EE | Estonian | Estonia |
fi | fi_FI | Finnish | Finland |
fr | fr_FR | French | France |
fr_BE | fr_BE | French | Belgium |
fr_CA | fr_CA | French | Canada |
fr_CH | fr_CH | French | Switzerland |
hr | hr_HR | Croatian | Croatia |
hu | hu_HU | Hungarian | Hungary |
is | is_IS | Icelandic | Iceland |
it | it_IT | Italian | Italy |
it_CH | it_CH | Italian | Switzerland |
iw | iw_IL | Hebrew | Israel |
ja | ja_JP | Japanese | Japan |
ko | ko_KR | Korean | Korea |
lt | lt_LT | Lithuanian | Lithuania |
lv | lv_LV | Latvian | Latvia |
mk | mk_MK | Macedonian | Macedonia |
nl | nl_NL | Dutch | Netherlands |
nl_BE | nl_BE | Dutch | Belgium |
no | no_NO_B | Norwegian (Bokmål) | Norway |
no_NO_NY | no_NO_NY | Norwegian (Nynorsk) | Norway |
pl | pl_PL | Polish | Poland |
pt | pt_PT | Portuguese | Portugal |
ro | ro_RO | Romanian | Romania |
ru | ru_RU | Russian | Russia |
sh | sh_SP | Serbian (Latin) | Serbia |
sk | sk_SK | Slovak | Slovakia |
sl | sl_SI | Slovene | Slovenia |
sq | sq_AL | Albanian | Albania |
sr | sr_SP | Serbian (Cyrillic) | Serbia |
sv | sv_SE | Swedish | Sweden |
tr | tr_TR | Turkish | Turkey |
uk | uk_UA | Ukrainian | Ukraine |
zh | zh_CN | Chinese | China |
zh_TW | zh_TW | Chinese | Taiwan |
Resource bundles can be used to solve the structuring problem associated with internationalizing internal data. Using resource bundles, locale-sensitive data can be grouped together in one place--or just a few places--so that it can be localized easily.
Resource bundles implement special support for locales. In particular, the java.util.ResourceBundle class provides methods that search for bundles associated with a particular locale. The search process tries several variations, starting with the specified locale, falling back to the default locale, and finally settling for a nonlocalized bundle. The search does not fail unless there simply isn't a resource bundle by the requested name at all. That's convenient, and it is usually what you want: If the program hasn't been localized for a particular locale, it is probably better to continue running without localization than simply to fail.
As an example of how the search process works, assume that your program makes use of a resource bundle called MessageBundle. Also assume that the default locale is fr_CA, and the bundle is requested for the no_NO_NY locale (a situation that might occur if the user is a Norwegian living in Quebec). Now suppose that the application makes the following call:
ResourceBundle.getBundle("MessageBundle", new Locale("no", "NO", "NY"))
In response, the search tests for the existence of these bundles, in the following order:
MessageBundle_no_NO_NY
MessageBundle_no_NO
MessageBundle_no
MessageBundle_fr_CA
MessageBundle_fr
MessageBundle
In other words, the search starts with the specified locale and looks for progressively less-specific alternatives; if none are found, the search starts over again with the default locale; finally, if none of the bundles associated with the default locale is found, the default bundle is tried. If no bundle is found with any of those names, a MissingResourceException is thrown.
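The search order described above can be sketched as a small helper that only builds the candidate names; it does not load any bundles. The class and method names (BundleSearchOrder, candidateNames) are hypothetical, and the code assumes the simple three-part fallback described in this chapter rather than any additional rules a newer ResourceBundle implementation may apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BundleSearchOrder {
    // Appends base_lang_COUNTRY_VARIANT, base_lang_COUNTRY, and base_lang,
    // most specific first, skipping any part the locale does not define.
    static void addVariations(List<String> names, String base, Locale loc) {
        String lang = loc.getLanguage();
        String country = loc.getCountry();
        String variant = loc.getVariant();
        if (!variant.isEmpty()) {
            names.add(base + "_" + lang + "_" + country + "_" + variant);
        }
        if (!country.isEmpty()) {
            names.add(base + "_" + lang + "_" + country);
        }
        if (!lang.isEmpty()) {
            names.add(base + "_" + lang);
        }
    }

    // Requested locale first, then the fallback (default) locale, then the bare name.
    static List<String> candidateNames(String base, Locale requested, Locale fallback) {
        List<String> names = new ArrayList<String>();
        addVariations(names, base, requested);
        addVariations(names, base, fallback);
        names.add(base);
        return names;
    }

    public static void main(String[] args) {
        Locale requested = new Locale("no", "NO", "NY");
        Locale fallback = new Locale("fr", "CA");
        for (String name : candidateNames("MessageBundle", requested, fallback)) {
            System.out.println(name);
        }
    }
}
```

Passing the default locale as a parameter, instead of calling Locale.getDefault() inside the helper, keeps the sketch independent of the machine it runs on.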
Resource bundles can be used for any type of data, and the ResourceBundle class can be extended to support any kind of storage mechanism. That's a useful flexibility for internationalization because any kind of data used by a program may have to be localized, including messages, prompts, labels, icons, sound files, and images.
Programmers have developed many tricks for quickly dealing with various kinds of data. But it's surprising how often those tricks incorporate cultural or language assumptions. For example, how does one sort a list of words in a language that uses accents and diacritical marks? The problem usually comes as a surprise to English-speaking programmers who are unaccustomed to using accented letters. An even bigger surprise is the revelation that there is no one answer to the question! Different languages and cultures have different rules about how to alphabetize a, á, and â, for example. There are many such issues to be considered for text, dates and times, currency values, and other kinds of data.
Fortunately, the Java libraries provide assistance for handling dates, times, and textual information in a locale-independent way. The next two sections explain the relevant facilities.
The world doesn't have as many calendar systems as it does languages, but it has enough to be troublesome for the programmer writing international applications. In addition to the standard Gregorian calendar used by most of the world, Chinese, Hebrew, and Islamic calendars are in wide use.
In Java 1.1, dates are represented by instances of the java.util.Date class. Times are represented that way, too; after all, a time is nothing more than a very precise date.
To represent any date, some calendar system must be chosen. The Date class implements one particular calendar system, in which times are represented as the number of milliseconds since the first instant of January 1, 1970, GMT (Greenwich mean time). The number can be negative, to indicate times before that demarcation point. That may not seem like a particularly useful calendar to you; in fact, it is not meant for human consumption. The purpose of this calendar system is to be a simple, uniform calendar for representing times within the Java library.
As such, Date objects are really useful only for comparing with other Date objects. The class provides comparison methods so that it is easy to tell how one Date relates to another. More complicated operations, such as learning about months, years, and days of the week, are the job of the Calendar classes.
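As a minimal sketch of those comparison methods, the following class compares Date objects built directly from millisecond values. The class name DateCompare and the helper relation() are made up for illustration.

```java
import java.util.Date;

public class DateCompare {
    // Describes how the first Date relates to the second.
    static String relation(Date a, Date b) {
        if (a.before(b)) {
            return "before";
        }
        if (a.after(b)) {
            return "after";
        }
        return "same instant";
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);        // first instant of January 1, 1970, GMT
        Date earlier = new Date(-1000L);  // negative values fall before that point
        Date later = new Date(1000L);

        System.out.println("epoch is " + relation(epoch, later) + " later");
        System.out.println("epoch is " + relation(epoch, earlier) + " earlier");
        System.out.println("milliseconds apart: " + (later.getTime() - earlier.getTime()));
    }
}
```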
Calendar provides a generalized view of different calendar systems; particular calendar systems are supported by subclasses of Calendar. The primary purpose of Calendar objects is to convert between Date objects and integer values for year, month, day of month, day of week, hour, minute, second, and so on. (The documentation for the various calendar classes calls these values fields.) You can change the time a Calendar object represents by providing a new Date object, or by changing the integer values that represent the portions of the calendrical date specification. For example, given a Date object d1, you can calculate another date one week later by doing the following:
Calendar cal = Calendar.getInstance();        // acquire a localized Calendar object
cal.setTime(d1);                              // set the Calendar's time value
cal.set(Calendar.WEEK_OF_MONTH,               // set the "week-of-month" field to
        cal.get(Calendar.WEEK_OF_MONTH) + 1); // the current value plus one
Date d2 = cal.getTime();                      // now retrieve a new Date object
There are some interesting details in this code fragment, so I'll discuss the code step by step. First, note that I didn't just create a new instance of Calendar with a constructor. Instead, I used the "factory method" getInstance() to get a new Calendar object. The getInstance() method creates an object appropriate for the default locale. There is also a version of the method that takes a Locale object as a parameter and creates an object appropriate for the specified locale. This is a pattern common to most of the internationalization classes discussed in the rest of this chapter.
After creating the Calendar object, I had to set the time it represented, using my d1 object. That's a bit cumbersome; hopefully, some future version of Calendar will allow a Date object to be specified as a parameter to the getInstance() method for automatic initialization.
Next, I advanced the week by 1. First I queried the current week of the month, added 1, and reset the same field. The Calendar class can represent dates in terms of the following different fields, denoted by named constants:
Field Name | Description |
AM_PM | Before or after noon |
DATE | The day of the month |
DAY_OF_MONTH | The day of the month (synonym for DATE) |
DAY_OF_WEEK | The day of the week |
DAY_OF_WEEK_IN_MONTH | Occurrence of this day of the week in this month (for example, Tuesday the 10th is the second Tuesday of the month) |
DAY_OF_YEAR | The day of the year |
DST_OFFSET | Offset from UTC for daylight saving time in this time zone |
ERA | The era in which this date occurs (for example, A.D. or B.C.) |
HOUR | Hour in 12-hour clock |
HOUR_OF_DAY | Hour in 24-hour clock |
MILLISECOND | Milliseconds within the second |
MINUTE | The minute of the hour |
MONTH | The month of the year |
SECOND | The second within the minute |
WEEK_OF_MONTH | The week of the month |
WEEK_OF_YEAR | The week of the year |
YEAR | The year within the era |
ZONE_OFFSET | Offset from UTC in this time zone |
Any of these fields can be retrieved from a Calendar object with the get() method or be set with the set() method. All the fields are stored as integers; to translate to textual representations, use the DateFormat class described in "Formatted Output and Input," later in this chapter.
After changing the week of the month, I was able to retrieve the new Date object, which represents the moment exactly one week after the original Date. Notice that there are a couple of funny things going on in this step.
The first question you might ask is this: What if the WEEK_OF_MONTH field was already set to 5? By incrementing it, we set the new date to be in the sixth week of the month. But that doesn't make any sense because months don't have six weeks.
The answer is that Calendar objects, by default, are very permissive about the way dates are specified. If you specify a date as being in the seventh week of the twelfth month, the calendar assumes that you mean the second week of the first month of the following year. If you specify a time as 25:00, the resulting time is 1:00 A.M. of the following day. This behavior can be turned off by calling setLenient(false).
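The 25:00 case can be tried out with a short sketch like the one below. It assumes a GregorianCalendar fixed to GMT so that the result does not depend on the local time zone; the class name LenientDemo and the helper oddTime() are hypothetical.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class LenientDemo {
    // Sets the calendar to "25:00 on January 15, 1997" and forces it to
    // resolve the fields. A non-lenient calendar rejects the out-of-range hour.
    static Calendar oddTime(boolean lenient) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT"));
        cal.setLenient(lenient);
        cal.clear();
        cal.set(1997, Calendar.JANUARY, 15, 25, 0, 0);
        cal.getTime(); // recomputes the time; throws IllegalArgumentException if not lenient
        return cal;
    }

    public static void main(String[] args) {
        Calendar permissive = oddTime(true);
        System.out.println("day of month: " + permissive.get(Calendar.DAY_OF_MONTH));
        System.out.println("hour of day:  " + permissive.get(Calendar.HOUR_OF_DAY));

        try {
            oddTime(false);
        } catch (IllegalArgumentException e) {
            System.out.println("strict calendar rejected the value");
        }
    }
}
```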
Another question may have occurred to you: How does the calendar make sense of the fields after the WEEK_OF_MONTH field is incremented? After all, the calendar also knows about fields representing the day of the year and the day of the month. It seems as though changing just one of the fields would cause a conflict with some of the other fields.
The answer is that Calendar keeps track of which fields are explicitly set and gives them precedence over fields that have been inferred from the time value. Because we set only the WEEK_OF_MONTH field, that value has precedence over other fields that contain contradictory information. In fact, Calendar has rules for choosing between contradictory fields even if they have all been explicitly set, but in our example, those rules aren't required.
Java 1.1 includes one specialized calendar implementation, GregorianCalendar, which implements the standard calendar system used by most western countries.
Calendar can localize its operations based on locales, and it can also understand time zones. Instances of java.util.TimeZone represent time zones and incorporate knowledge about time zones around the world, including offsets from GMT and rules about daylight saving time. Other classes in the Java library in addition to the Calendar class make use of TimeZone, including DateFormat (discussed in "Formatted Output and Input," later in this chapter).
There are several ways to create TimeZone objects. To get the current time zone where the computer is running, use the getDefault() method. If you want an object for a particular zone, call getTimeZone() with a string containing that zone's ID (for example, United States Central Standard Time uses an ID of CST). Because TimeZone itself is an abstract class, you can also create an instance of its concrete subclass, SimpleTimeZone, using a constructor and set an explicit offset from GMT using the setRawOffset() method.
Most of the time, you don't have to query or manipulate TimeZone objects directly; you can just create the appropriate instances and pass them, as needed, to other objects (such as Calendar) which use the time zone information appropriately. Therefore, you can think of TimeZone objects as similar to Locale objects: they primarily serve as names or tokens.
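Here is a small sketch of passing TimeZone objects to a Calendar as tokens. It uses SimpleTimeZone, the concrete subclass of TimeZone, to build a zone with an explicit offset; the zone ID "Example/SixWest" and the names ZoneDemo and hourOfDay() are invented for this illustration.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.SimpleTimeZone;
import java.util.TimeZone;

public class ZoneDemo {
    // Returns the hour (24-hour clock) of the given instant as seen in the given zone.
    static int hourOfDay(Date when, TimeZone zone) {
        Calendar cal = new GregorianCalendar(zone);
        cal.setTime(when);
        return cal.get(Calendar.HOUR_OF_DAY);
    }

    public static void main(String[] args) {
        TimeZone gmt = TimeZone.getTimeZone("GMT");
        // A zone six hours west of GMT with no daylight saving rules (made-up ID).
        TimeZone sixWest = new SimpleTimeZone(-6 * 60 * 60 * 1000, "Example/SixWest");

        Date epoch = new Date(0L);
        System.out.println("GMT hour:      " + hourOfDay(epoch, gmt));
        System.out.println("six west hour: " + hourOfDay(epoch, sixWest));
        System.out.println("local zone ID: " + TimeZone.getDefault().getID());
    }
}
```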
Textual data presents an entirely different class of problems. As I write this chapter, I am (not surprisingly) using a word processing program. It is instructive to consider the various ways in which such an application uses and manipulates text, and what kinds of problems might be involved where internationalization is concerned.
For one thing, the interface for most word processors makes copious use of lists of names: styles, fonts, cross-reference tags, and so on. Such lists are ordered alphabetically. Sorting localized text is an internationalization problem, not merely because of different alphabets, but also because of different rules used across cultures.
When I use the arrow keys to move around in my document, the cursor moves one letter or line at a time. But by holding down various combinations of modifier keys, I can move (or, for that matter, delete or select text) by other units: words, sentences, or paragraphs. Furthermore, my word processor adjusts line breaks as I type. But Unicode complicates the task of recognizing boundaries between textual units, including acceptable places for line breaking. In addition, Unicode incorporates the concept of combining characters: two 16-bit Unicode characters (usually a base character and an accent mark) that combine to produce the appearance of one character. So even moving letter by letter is related to internationalization!
When I search for text in my document, I can choose whether the search should be sensitive to the difference between uppercase and lowercase letters. The program can also change the capitalization of text automatically, whether at my direction or in the process of formatting headings or entries in the table of contents. The difference between uppercase and lowercase is easy to deal with in ASCII, but the issue is much more complicated in an international character set like Unicode.
Java 1.1 provides features for dealing with all these issues. They are divided into three categories: collation, text boundaries, and character classification.
The term collation refers to the act of comparison. It can also refer to sorting, but the Java library uses it in the first sense. Java 1.1 does not provide sorting routines, but it does provide the collation facilities--the comparison facilities--required to implement sort routines on text objects.
The primary collation class is called java.text.Collator. As with many of the Java internationalization classes, Collator objects are not created directly using the constructor; instead, they are created using the static Collator.getInstance() and Collator.getInstance(Locale) methods, which create localized versions of Collator.
The basic collation functionality of Collator lies in the compare(String, String) method. As with similar facilities in other languages such as C, this method returns an integer value: zero if the strings are equal, less than or greater than zero if the first string is, respectively, less than or greater than the second. If you are interested only in whether the strings are equal, the convenience method equals(String, String) returns a boolean value indicating equality.
For one-time comparisons, calls to compare() and equals() are sufficient. If certain strings are to be compared multiple times (as might happen when sorting a list of strings), it is better to use CollationKey objects. The getCollationKey(String) method returns an instance of CollationKey that represents the given string for purposes of comparison within the Collator object's locale. In general, CollationKey objects can be compared more efficiently than String objects can be. Given two CollationKey objects called k1 and k2, you compare them in this way:
k1.compareTo(k2)
You can also retrieve the original source string from a CollationKey, using the getSourceString() method, so that you don't have to keep track of which keys represent which strings.
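The following sketch sorts a list of strings through their CollationKey objects and then recovers the strings with getSourceString(). It leans on Arrays.sort() and on CollationKey being Comparable, both of which come from JDK releases later than 1.1, so treat it as an illustration for a current JDK; the names CollationSort and sortWithKeys() are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollationSort {
    // Builds one CollationKey per string, sorts the keys, and returns the
    // source strings in collated order for the given locale.
    static String[] sortWithKeys(String[] words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]);
        }
        Arrays.sort(keys); // uses CollationKey.compareTo()
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    public static void main(String[] args) {
        String[] words = { "banana", "apple", "Cherry" };
        for (String w : sortWithKeys(words, Locale.US)) {
            System.out.println(w);
        }
    }
}
```

A plain String comparison would place "Cherry" first because uppercase letters precede lowercase ones in Unicode order; the collated order treats the difference in case as less important than the difference in letters.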
The way Collator objects decide on the relationship between two strings can be tuned to follow desired conventions. The strength of a comparison refers to how literal it is. For example, an extremely strong comparison considers a and A to be different characters; a weaker comparison might consider a, A, ä, and Å to be all the same. Four strength levels are provided. The precise meaning of the strength values depends on the locale, but here are some common definitions:
PRIMARY: Only differences between base letters are significant (a versus b).
SECONDARY: Differences between accented forms of a letter are also significant (a versus ä).
TERTIARY: Differences in case are also significant (a versus A).
IDENTICAL: All differences are significant.
It may seem that there is no difference between TERTIARY and IDENTICAL, but there is a difference. Unicode may provide more than one way to specify the same letter--for example, the character ë can be specified either as a single 16-bit character or as a combination of e and a combining dieresis mark. The two versions are considered equal under a TERTIARY comparison but are different under the rules for IDENTICAL comparisons.
The strength a Collator object uses for comparisons can be set with the setStrength() method; the strength can be queried using getStrength().
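To see the effect of setStrength(), a sketch such as the following compares the same pairs of strings at different strength levels. It assumes the US English collation rules and turns on canonical decomposition so that accented letters are handled consistently; the names StrengthDemo and sameAt() are made up.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthDemo {
    // Reports whether two strings are considered equal at the given strength.
    static boolean sameAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        collator.setStrength(strength);
        return collator.equals(a, b);
    }

    public static void main(String[] args) {
        String accented = "\u00e9"; // e with acute accent

        System.out.println("PRIMARY   abc/ABC: " + sameAt(Collator.PRIMARY, "abc", "ABC"));
        System.out.println("TERTIARY  abc/ABC: " + sameAt(Collator.TERTIARY, "abc", "ABC"));
        System.out.println("PRIMARY   e/accented e: " + sameAt(Collator.PRIMARY, "e", accented));
        System.out.println("SECONDARY e/accented e: " + sameAt(Collator.SECONDARY, "e", accented));
    }
}
```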
The BreakIterator class provides a locale-independent way to find the boundaries between certain kinds of textual elements, such as words and sentences. BreakIterator is even helpful for finding boundaries between printable characters; again, this is because two 16-bit Unicode characters can combine to produce one printed glyph. (It wouldn't be a good idea to try to put the insertion cursor between a letter and its accent!)
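Once again, you don't create instances of BreakIterator with a constructor. There are actually several different static factory methods for creating BreakIterator objects, depending on what kind of textual element you want to learn about:
getCharacterInstance(), for boundaries between printable characters
getWordInstance(), for boundaries between words
getSentenceInstance(), for boundaries between sentences
getLineInstance(), for acceptable places to break a line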
Each of these methods comes in two varieties: one with a Locale parameter, and one with no parameter (this form uses the default locale).
Once you have a BreakIterator object, how do you use it? First, you must inform the object about the text you want to examine, using the setText(String) method. Then you can move through the text looking at boundaries using the first(), last(), next(), and previous() methods. Each of these methods moves the current position of the BreakIterator to the requested boundary and returns the integer position of that boundary within the text. The next() and previous() methods return a special value, BreakIterator.DONE, if there are no more boundaries in the requested direction. Additionally, the next() method has a variant that takes an integer parameter for moving ahead by multiple boundaries in one step.
Iterating through text with a BreakIterator is a little more complicated than traversing a data structure using an Enumeration object. The reason is that the information you get back from a BreakIterator is usually not the complete element you are interested in; instead, it is the start or end point of that element within the text. You must save that value and then find the other boundary before you can do useful things. This is a case in which it makes sense to declare and use two loop variables in a for statement. For example, here is a code fragment that loops through all the words in a String called textBuffer, printing them as it goes:
BreakIterator words = BreakIterator.getWordInstance();
words.setText(textBuffer);
for (int start = words.first(), end = words.next();
     end != BreakIterator.DONE;
     start = end, end = words.next()) {
    System.out.println(textBuffer.substring(start, end));
}
Another method, current(), returns the index of the current boundary without changing or moving the current point. The following(int) method returns the first boundary after the specified position in the text.
What do you do if you need some of the BreakIterator facilities but can't afford to store your application's text in a String object? If efficiency or ease of access dictates that you use a specialized text data structure (such as a trie) instead of a string, you should define a CharacterIterator class for that data structure. CharacterIterator is an interface, similar in some ways to the java.util.Enumeration interface. It provides an abstract interface for iterating over the elements in a data structure. Classes that implement CharacterIterator don't have to contain all the data; they just have to know how to read it sequentially from the data structure and how to keep track of the current position. CharacterIterator differs from Enumeration in that it is specialized for character data and allows bidirectional scanning.
BreakIterator actually has two setText() methods: One takes a String parameter, and the other takes a CharacterIterator parameter. Once you build a CharacterIterator for your data structure, you can use the BreakIterator facilities to analyze the text stored there.
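As a rough sketch of the CharacterIterator route, the helper below accepts any CharacterIterator and collects the words it contains. StringCharacterIterator stands in here for an iterator over a custom data structure, and the names WordScan and words() are hypothetical. The helper hands the BreakIterator a clone of the iterator, on the assumption that the BreakIterator may move the iterator it is given.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordScan {
    // Collects the segments between word boundaries that start with a
    // letter or digit, skipping whitespace and punctuation segments.
    static List<String> words(CharacterIterator text) {
        BreakIterator breaks = BreakIterator.getWordInstance(Locale.US);
        breaks.setText((CharacterIterator) text.clone());

        List<String> found = new ArrayList<String>();
        int start = breaks.first();
        for (int end = breaks.next(); end != BreakIterator.DONE;
                start = end, end = breaks.next()) {
            // Read the characters in [start, end) through the iterator itself.
            StringBuffer piece = new StringBuffer();
            for (char c = text.setIndex(start); text.getIndex() < end; c = text.next()) {
                piece.append(c);
            }
            if (Character.isLetterOrDigit(piece.charAt(0))) {
                found.add(piece.toString());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        CharacterIterator text = new StringCharacterIterator("Hello, world");
        System.out.println(words(text));
    }
}
```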
Even with all the useful text-handling facilities already described, occasionally a program has to work at the level of individual characters. Operations that were trivial using the ASCII character set (such as converting between uppercase and lowercase) are much more complicated with Unicode.
The java.lang.Character class contains several methods for making various tests on character values and for converting them in some way. You can test for the case of a character, or learn whether it is a whitespace character, for example, and you can convert between uppercase and lowercase and back again.
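A brief sketch of these Character methods applied to non-ASCII letters follows. The class name CharClassify and its two helpers are invented for the example.

```java
public class CharClassify {
    // Classifies a character using the Unicode-aware tests in java.lang.Character.
    static String describe(char c) {
        if (Character.isWhitespace(c)) {
            return "whitespace";
        }
        if (Character.isUpperCase(c)) {
            return "uppercase letter";
        }
        if (Character.isLowerCase(c)) {
            return "lowercase letter";
        }
        if (Character.isDigit(c)) {
            return "digit";
        }
        return "other";
    }

    // Swaps the case of every cased letter in the string.
    static String swapCase(String s) {
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isUpperCase(c)) {
                out.append(Character.toLowerCase(c));
            } else if (Character.isLowerCase(c)) {
                out.append(Character.toUpperCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe('\u00c9'));  // Latin capital E with acute
        System.out.println(describe('\u03b1'));  // Greek small alpha
        System.out.println(describe('\u0663'));  // Arabic-Indic digit three
        System.out.println(swapCase("\u00c9t\u00e9"));
    }
}
```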
Manipulating data is all very well and good, but at some point, it comes down to reading that data from somewhere and writing (or displaying) it. Java provides facilities to help with several aspects of internationalized I/O, including Unicode streams and formatting and parsing classes.
In Java 1.1, the java.io package contains several I/O stream classes that read and write Unicode data. These classes are called readers and writers, and extend the Reader and Writer classes.
Because the String and char data types in Java represent Unicode data, Unicode I/O streams are the natural and preferred mechanism for text input and output. And it doesn't take any extra work to use them instead of the byte-oriented streams. I won't go into much detail about Unicode streams in this chapter; if you want more details about Unicode I/O streams, see Chapter 12, "The I/O Package."
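For a quick illustration, the sketch below writes a string through an OutputStreamWriter and reads it back through an InputStreamReader, using in-memory byte arrays in place of files. The encoding name "UTF-8" is the one accepted by current JDKs, and the class name UnicodeStreams and its helpers are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class UnicodeStreams {
    // Converts characters to bytes by writing them through a Writer.
    static byte[] encode(String text, String encoding) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            Writer out = new OutputStreamWriter(bytes, encoding);
            out.write(text);
            out.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("encoding failed: " + e);
        }
    }

    // Converts bytes back to characters by reading them through a Reader.
    static String decode(byte[] data, String encoding) {
        try {
            Reader in = new InputStreamReader(new ByteArrayInputStream(data), encoding);
            StringBuffer text = new StringBuffer();
            for (int c = in.read(); c != -1; c = in.read()) {
                text.append((char) c);
            }
            in.close();
            return text.toString();
        } catch (IOException e) {
            throw new IllegalStateException("decoding failed: " + e);
        }
    }

    public static void main(String[] args) {
        String original = "na\u00efve";
        byte[] data = encode(original, "UTF-8");
        System.out.println(original.length() + " chars, " + data.length + " bytes");
        System.out.println("round trip ok: " + original.equals(decode(data, "UTF-8")));
    }
}
```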
The hard part about internationalized input and output is formatting and parsing. Data that is intrinsically textual can be read, manipulated, and written again rather easily, but what about data that is represented as text but must be manipulated in some other form? How do you convert back and forth between the values and their textual representations in a locale-independent way?
That's the job of the Format classes. The java.text package contains the abstract Format class, which describes a generic facility for formatting and parsing textual data representations. Several subclasses of Format perform internationalized handling of data types such as Date and numeric values, plus messages for users.
The interface defined by the Format class consists of four methods: two for formatted output, and two for parsing formatted input. The intent is that strings generated by Format and its subclasses can also be reparsed by those same classes to generate an equivalent object or set of objects.
The primary formatting method is format(Object), which returns a formatted string. The other formatting method is more complicated: format(Object, StringBuffer, FieldPosition) returns the StringBuffer object that is passed as the second parameter, after appending the formatted value to it. Subclasses of Format use the FieldPosition object to communicate information about the formatting process. When the method is called, the FieldPosition parameter contains an integer, the field identifier, which the caller uses to express interest in a particular portion of the formatted string. When the method returns, the FieldPosition object has been updated to contain the beginning and ending positions of that field within the StringBuffer. Such information might be useful for choosing appropriate sizes for GUI elements, and this variation of the format() method is useful for building a formatted string in pieces.
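The FieldPosition mechanism is easier to follow with a concrete subclass. This sketch uses NumberFormat (covered later in this chapter) with the US locale and asks where the integer part of the number ended up in the buffer; the names FieldPositionDemo and integerFieldBounds() are hypothetical.

```java
import java.text.FieldPosition;
import java.text.NumberFormat;
import java.util.Locale;

public class FieldPositionDemo {
    // Appends the formatted value to the buffer and returns the begin and end
    // positions of the integer part within that buffer.
    static int[] integerFieldBounds(double value, StringBuffer buffer) {
        NumberFormat format = NumberFormat.getInstance(Locale.US);
        // The field identifier tells the formatter which portion we care about.
        FieldPosition position = new FieldPosition(NumberFormat.INTEGER_FIELD);
        format.format(value, buffer, position);
        return new int[] { position.getBeginIndex(), position.getEndIndex() };
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer();
        int[] bounds = integerFieldBounds(1234.5, buffer);
        System.out.println("formatted: " + buffer);
        System.out.println("integer part: " + buffer.substring(bounds[0], bounds[1]));
    }
}
```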
The two parsing methods parallel the formatting methods. parseObject(String) returns an Object created from the information in the string. parseObject(String, ParsePosition) also returns an Object representing the parsed value, but the ParsePosition parameter is used to control the current position when parsing a string piece by piece. Instances of ParsePosition contain an integer representing a position within a string. When the method is called, the ParsePosition parameter indicates where in the string the method should begin parsing; when the method returns, the parameter has been updated to point to the first character following the parsed value. Thus, it is ready to pass to the next parseObject() call to parse the next piece of the string.
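Piece-by-piece parsing with a ParsePosition might look like the following sketch, which pulls whitespace-separated numbers out of one string using a US-locale NumberFormat. The names ParsePieces and parseAll() are made up, and the policy of stopping at the first piece that fails to parse is just one reasonable choice.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ParsePieces {
    // Parses every whitespace-separated number in the text, reusing one
    // ParsePosition to carry the current position from call to call.
    static List<Number> parseAll(String text) {
        NumberFormat format = NumberFormat.getInstance(Locale.US);
        ParsePosition position = new ParsePosition(0);
        List<Number> values = new ArrayList<Number>();

        while (position.getIndex() < text.length()) {
            int index = position.getIndex();
            if (Character.isWhitespace(text.charAt(index))) {
                position.setIndex(index + 1); // skip the separator ourselves
                continue;
            }
            Object parsed = format.parseObject(text, position);
            if (parsed == null) {
                break; // nothing parseable here; getErrorIndex() says where
            }
            values.add((Number) parsed);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parseAll("12 3.5 1,000"));
    }
}
```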
Once again, Format is an abstract class and doesn't provide implementations of all these methods. It does provide default implementations of the simple, single-parameter versions, which work by calling the more general methods so that you can build a working Format object simply by supplying implementations for only two methods.
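To make the "only two methods" point concrete, here is a minimal sketch of a Format subclass for Boolean values. YesNoFormat is a hypothetical class, and its English-only "yes" and "no" strings are hard-coded purely to keep the example short; a real implementation would take them from a resource bundle.

```java
import java.text.FieldPosition;
import java.text.Format;
import java.text.ParsePosition;

public class YesNoFormat extends Format {
    // General formatting method: appends "yes" or "no" to the supplied buffer.
    public StringBuffer format(Object obj, StringBuffer toAppendTo, FieldPosition pos) {
        if (!(obj instanceof Boolean)) {
            throw new IllegalArgumentException("YesNoFormat can only format Boolean values");
        }
        return toAppendTo.append(((Boolean) obj).booleanValue() ? "yes" : "no");
    }

    // General parsing method: reads "yes" or "no" starting at the given
    // position and advances the position past the text that was consumed.
    public Object parseObject(String source, ParsePosition pos) {
        int index = pos.getIndex();
        if (source.startsWith("yes", index)) {
            pos.setIndex(index + 3);
            return Boolean.TRUE;
        }
        if (source.startsWith("no", index)) {
            pos.setIndex(index + 2);
            return Boolean.FALSE;
        }
        pos.setErrorIndex(index); // signal failure without moving the position
        return null;
    }

    public static void main(String[] args) {
        Format format = new YesNoFormat();
        // The single-parameter format(Object) is inherited from Format.
        System.out.println(format.format(Boolean.TRUE));
        System.out.println(format.parseObject("no thanks", new ParsePosition(0)));
    }
}
```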
The Java 1.1 library provides specialized subclasses of Format for handling two kinds of locale-sensitive data: dates and numbers. DateFormat and NumberFormat each provide the following facilities:
DateFormat can format and parse Date objects as dates, or times, or both. NumberFormat can interpret numbers--that is, values of any of the Java numeric types--as ordinary numbers, currency values, or percentages. When parsing numbers using the parse() method, numbers are returned as instances of java.lang.Number.
NOTE: NumberFormat can even handle the two arbitrary-precision numeric types, java.math.BigInteger and java.math.BigDecimal. However, an instance of one of these types is formatted by first calling its longValue() method and then formatting the result as a long. Therefore, the resulting output isn't accurate unless the actual value is within the range that can be represented as a long. Additionally, NumberFormat does not ever return a BigInteger or BigDecimal object when parsing.
You may be surprised to learn that, even though DateFormat and NumberFormat provide all those extra facilities on top of the interface defined by Format, they don't actually implement the parsing and formatting functions! DateFormat and NumberFormat are themselves abstract classes. You must create instances using getInstance() or one of its variants.
When you call getInstance() on one of these classes, you get a preconfigured instance of DecimalFormat (for numbers) or SimpleDateFormat (for dates). These classes can be configured with rules and patterns to format and parse values according to the conventions of a wide variety of locales. The configuration rules are documented in the API documentation, and it's possible to use these classes directly to handle specialized formatting and parsing needs. Unless you really need something special, though, it's best to just call one of the getInstance() methods on DateFormat or NumberFormat and use the object that is returned.
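The factory methods can be exercised with a sketch like this one. The locales are passed explicitly so that the number output does not depend on the machine's default locale; the class name FactoryDemo and its helpers are hypothetical, and the date line is printed without a stated result because date patterns differ between JDK releases.

```java
import java.text.DateFormat;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Date;
import java.util.Locale;

public class FactoryDemo {
    static String plain(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    static String currency(double value, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    static String percent(double value, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(plain(1234.5, Locale.GERMANY));
        System.out.println(currency(1234.5, Locale.US));
        System.out.println(percent(0.25, Locale.US));

        // The object handed back is an instance of a concrete subclass.
        NumberFormat numbers = NumberFormat.getInstance(Locale.US);
        System.out.println("DecimalFormat? " + (numbers instanceof DecimalFormat));

        DateFormat dates = DateFormat.getDateInstance(DateFormat.LONG, Locale.FRANCE);
        System.out.println(dates.format(new Date()));
    }
}
```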
Why? If DecimalFormat and SimpleDateFormat are so configurable, why aren't they the standard objects? Why the extra level of inheritance, with abstract classes that don't actually provide an implementation of the core functionality? The answer is that, as configurable as they are, DecimalFormat and SimpleDateFormat may not be flexible enough to handle all the locales in the world. Some locales may require entirely new formatting classes to be written if they are to be supported correctly. If you asked for a format object for such a locale, you would actually get an instance of one of the new classes. Keeping the basic interface definition and factory methods in a separate class, independent of the actual functionality, makes it easier to support such atypical locales without having to modify any of the classes in the core library.
Being able to format and parse dates, times, and numbers is nice, but such items rarely occur in isolation; usually they are embedded in other text. The MessageFormat class is designed for formatting and parsing textual strings that may incorporate other data items. Error messages are a prime example of what MessageFormat is good for, but the facility can be used to prepare any text meant for human consumption, including GUI elements and printed reports.
In many respects, MessageFormat is similar to C's printf() function. In fact, there is a static method, MessageFormat.format(String, Object[]), which is very similar indeed. One parameter is a format string that functions as a pattern, with embedded format specifiers indicating how the other parameters should be processed and substituted into the pattern string. Unlike printf(), however, the format() method doesn't actually print the formatted message; it just returns it as a String. Because Java doesn't permit methods to take a variable number of parameters, the additional items are passed as an Object array. Also, the syntax for format specifiers is different; among other things, they incorporate the number of the data item to which they refer. This is because localization of text often involves changing not only the words, but the structure of sentences. Thus, the item that occurs first in the English version of a sentence may have to come last in the German version. (Format patterns are usually taken from a localized resource bundle rather than being included directly in the code.)
Format specifiers within patterns are surrounded by curly braces. Here is a simple example:
MessageFormat.format("No entry for {0} in the database", new Object[] {name});
The format specifier {0} indicates that the first element of the array (with index 0) should be substituted into the string at that point. The element should be a String.
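Because each specifier carries the number of its data item, two patterns can present the same arguments in a different order. The sketch below shows this with two English patterns standing in for two localized versions of a message; the class name WordOrderDemo, the helper render(), and the sample values are all made up.

```java
import java.text.MessageFormat;

public class WordOrderDemo {
    // Substitutes the same two arguments into whatever pattern is supplied.
    static String render(String pattern, String sender, String file) {
        return MessageFormat.format(pattern, new Object[] { sender, file });
    }

    public static void main(String[] args) {
        // In a real program these patterns would come from a resource bundle.
        String senderFirst = "{0} sent the file {1}.";
        String fileFirst = "The file {1} was sent by {0}.";

        System.out.println(render(senderFirst, "Maria", "report.txt"));
        System.out.println(render(fileFirst, "Maria", "report.txt"));
    }
}
```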
What if one of your data items is not a String? One solution is to format it separately (for example, with NumberFormat or DateFormat) and use the resulting String. There's no need to do that, however, because you can include the fact that the element is a number or date in the format specifier. This example includes a number, a string, and a date:
// numAppt is an int, pName is a String, and apptDate is a Date
MessageFormat.format("{0,number} appointments for {1} on {2,date,medium}",
                     new Object[] {new Integer(numAppt), pName, apptDate});
There are four possibilities for the type selector after the comma: date, time, number, and choice. The date selector results in a call to DateFormat.getDateInstance() to do the formatting; the time selector results in a call to DateFormat.getTimeInstance(). The number selector by itself uses NumberFormat.getInstance(), but style options can be specified to modify that behavior. The choice selector is explained in the next section.
If the type selector is followed by another comma, then whatever follows (up to the curly brace that ends the format specifier) is a style option. The format specifier for the date in the preceding example includes the style option medium, which is one of the styles for dates and times. The others are short, long, and full, and they correspond to the SHORT, MEDIUM, LONG, and FULL constants that DateFormat provides for selecting styles with the getInstance() methods. Additionally, the style for a date or time format can be a valid configuration pattern understood by the SimpleDateFormat class.
The valid styles for number formats are currency, percent, and integer. The currency and percent styles format the number as a currency amount or a percentage for the locale, and the integer style formats it without a fractional part. The style can also be a configuration pattern accepted by the DecimalFormat class.
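The following sketch exercises a few of these style options. It pins the locale to Locale.US with setLocale() before applying the pattern so that the output is predictable; that choice, the helper name formatUS, and the sample values are assumptions made for illustration. Double.valueOf() is used for convenience and is newer than Java 1.1, where you would write new Double(...).

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class StyleDemo {
    // Builds a MessageFormat whose subformats are created for Locale.US.
    // The locale must be set before the pattern is applied.
    static String formatUS(String pattern, Object[] args) {
        MessageFormat mf = new MessageFormat("");
        mf.setLocale(Locale.US);
        mf.applyPattern(pattern);
        return mf.format(args);
    }

    public static void main(String[] args) {
        // A date built and formatted in the default time zone.
        Date due = new GregorianCalendar(1997, Calendar.MARCH, 15).getTime();

        // Predefined number style.
        System.out.println(formatUS("{0,number,percent} complete",
            new Object[] {Double.valueOf(0.25)}));
        // DecimalFormat pattern used as the style.
        System.out.println(formatUS("pi is about {0,number,#.##}",
            new Object[] {Double.valueOf(3.14159)}));
        // SimpleDateFormat pattern used as the style.
        System.out.println(formatUS("Due {0,date,yyyy-MM-dd}",
            new Object[] {due}));
    }
}
```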
So far, I've discussed only the static format(String, Object[]) method. But instances of MessageFormat are useful, too. In fact, the static method is implemented in terms of a throwaway instance:
public static String format(String pattern, Object[] arguments) {
    MessageFormat temp = new MessageFormat(pattern);
    return temp.format(arguments);
}
Why might you want to create an instance of MessageFormat? There are several possible reasons. For one thing, parsing and processing the format pattern is reasonably expensive, so if you have to print the same message many times with different data values, it makes sense to create one instance and reuse it. Another reason is that the static version of the method always uses the default locale, whereas an instance can be given a specific locale with setLocale(). Therefore, you should never use the static method in a multilingual program. You would also create an instance of MessageFormat if you were using it to parse text rather than to format it (more about that topic later).
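As a rough illustration of the first reason, the sketch below parses the pattern once and then formats it for several values. The class name, the report() helper, and the sample names are hypothetical.

```java
import java.text.MessageFormat;

public class ReuseDemo {
    // Formats one line per name, parsing the pattern only once.
    static String[] report(String[] names) {
        MessageFormat msg = new MessageFormat("No entry for {0} in the database");
        String[] lines = new String[names.length];
        for (int i = 0; i < names.length; i++) {
            lines[i] = msg.format(new Object[] {names[i]});
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] lines = report(new String[] {"alice", "bob"});
        for (int i = 0; i < lines.length; i++) {
            System.out.println(lines[i]);
        }
    }
}
```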
The final reason why MessageFormat instances are useful is that they give us more flexibility. With an instance, you can build a pattern programmatically. For example, the preceding example used three different objects; but you could have written it this way:
MessageFormat msg = new MessageFormat("{0} appointments for {1} on {2}");
msg.setFormat(0, NumberFormat.getInstance());
// format 1, for the string, is already set.
msg.setFormat(2, DateFormat.getDateInstance(DateFormat.MEDIUM));
msg.format(new Object[] {new Integer(numAppt), pName, apptDate});
That seems like an awful lot of trouble to go through when you could just encode the information in the format string. But it can be useful in some complicated situations--and it is particularly useful if you have to include some data that is not a number, date, time, or string. You can extend the Format class to handle locale-independent formatting of any data type you want, but you can't extend the MessageFormat format specifier syntax to support your new Format classes. You can, however, make use of the new classes by explicitly including them with the setFormat() method, as just shown.
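Here is one way that might look. YesNoFormat is a hypothetical Format subclass invented for this sketch; it hard-codes English words, whereas a real one would presumably look them up in a resource bundle. Parsing is left unimplemented to keep the example short.

```java
import java.text.FieldPosition;
import java.text.Format;
import java.text.MessageFormat;
import java.text.ParsePosition;

public class CustomFormatDemo {
    // Hypothetical Format subclass that renders a Boolean as "yes" or "no".
    static class YesNoFormat extends Format {
        public StringBuffer format(Object obj, StringBuffer toAppendTo,
                                   FieldPosition pos) {
            if (!(obj instanceof Boolean)) {
                throw new IllegalArgumentException("expected a Boolean");
            }
            return toAppendTo.append(((Boolean) obj).booleanValue() ? "yes" : "no");
        }

        public Object parseObject(String source, ParsePosition pos) {
            // Not supported in this sketch: returning null with the position
            // unchanged signals a parse failure.
            return null;
        }
    }

    static String describe(boolean enabled) {
        MessageFormat msg = new MessageFormat("Backups enabled: {0}");
        // Attach the custom format to the first (and only) specifier.
        msg.setFormat(0, new YesNoFormat());
        return msg.format(new Object[] {enabled ? Boolean.TRUE : Boolean.FALSE});
    }

    public static void main(String[] args) {
        System.out.println(describe(true));
        System.out.println(describe(false));
    }
}
```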
I mentioned earlier that the format specifiers include the ordinal number of the array element to which they refer, so that they can be reordered if necessary during the localization process. It's important to note that the numbers used to identify the specifiers in the setFormat() call are different from the numbers actually included in the format specifiers. The numbers used by the setFormat() method always refer to the specifiers in the order they occur in the pattern string, starting with zero. That's a problem for internationalization, because when the order of the specifiers changes during localization, the numbers used to set the format objects for the message have to change, too. I hope this problem is resolved in a future Java release.
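The sketch below shows the mismatch with two hypothetical patterns that use the same arguments in a different order. The amount is always element 0 of the argument array, but the index passed to setFormat() has to follow the position of its specifier within the pattern. The class name, the render() helper, and the sample data are assumptions for illustration.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.MessageFormat;
import java.util.Locale;

public class SetFormatIndexDemo {
    // specifierPosition is the position of the amount's specifier within the
    // pattern (0 for the first specifier, 1 for the second), which is not
    // necessarily the number written inside the braces.
    static String render(String pattern, int specifierPosition) {
        MessageFormat msg = new MessageFormat(pattern);
        DecimalFormat twoPlaces =
            new DecimalFormat("0.00", new DecimalFormatSymbols(Locale.US));
        msg.setFormat(specifierPosition, twoPlaces);
        // Element 0 is the amount and element 1 is the name in both cases.
        return msg.format(new Object[] {Double.valueOf(12.5), "Bob"});
    }

    public static void main(String[] args) {
        // {0} comes first in the pattern, so its format is number 0.
        System.out.println(render("{0} is owed by {1}", 0));
        // {0} comes second in the pattern, so its format is number 1.
        System.out.println(render("{1} owes {0}", 1));
    }
}
```

Passing 0 for the second pattern would attach the DecimalFormat to the {1} specifier, and formatting would fail with an IllegalArgumentException because the name is not a number.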
As you can with the other subclasses of Format, you can use MessageFormat for parsing as well as formatting. The parse() methods return arrays of Object, and the patterns are the same as for formatting. The intent is that a message formatted with a pattern can be parsed with the same pattern. Although this logic may fail in some situations (including some uses of ChoiceFormat, described in the next section), it works in general.
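A small round trip might look like the following sketch. The pattern is a shortened version of the appointment message used earlier; the class name, the roundTrip() helper, and the use of Locale.US are assumptions. Integer.valueOf() is used for convenience and is newer than Java 1.1.

```java
import java.text.MessageFormat;
import java.text.ParseException;
import java.util.Locale;

public class ParseDemo {
    // Formats a message and then parses it back with the same pattern.
    static Object[] roundTrip(int count, String name) {
        MessageFormat msg = new MessageFormat("");
        msg.setLocale(Locale.US);
        msg.applyPattern("{0,number} appointments for {1}");

        String text = msg.format(new Object[] {Integer.valueOf(count), name});
        try {
            // Element 0 comes back as a Number, element 1 as a String.
            return msg.parse(text);
        } catch (ParseException e) {
            throw new IllegalStateException("could not parse: " + text);
        }
    }

    public static void main(String[] args) {
        Object[] parsed = roundTrip(3, "Smith");
        System.out.println(parsed[0] + " / " + parsed[1]);
    }
}
```

Note that the parsed number is whatever Number subclass the underlying NumberFormat produces, not necessarily the Integer that was formatted.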
There are two other things to know about MessageFormat. There is currently an arbitrary limit of 10 format specifiers for a single pattern. There's no need for a limit at all, and a comment in the source code indicates that this limit may someday be removed. (On the other hand, if you are using format patterns with 83 specifiers, you should probably consider a different strategy.)
TIP: If you have to include a { or } character in a format pattern, you must enclose it within the string in single quotation marks. If you have to include a single quote character, double it. This quoting mechanism is, in my opinion, a little strange and inconsistent with the backslash-quoting mechanism used for the first level of quoting in Java strings. However, there are problems with using that mechanism in this case as well, so it is difficult to say whether another option may have been better. Just remember to be on the lookout for occasions when any of those three characters must appear in a string formatted by MessageFormat.
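The quoting rules are easier to see in action. This sketch uses three hypothetical patterns: one with literal braces, one with a doubled single quote, and one with a single quote that was mistakenly left undoubled.

```java
import java.text.MessageFormat;

public class QuoteDemo {
    static String render(String pattern, String value) {
        return MessageFormat.format(pattern, new Object[] {value});
    }

    public static void main(String[] args) {
        // Quoted braces are literal; the unquoted {0} is still a specifier.
        System.out.println(render("'{'{0}'}'", "id"));
        // A doubled single quote produces one apostrophe.
        System.out.println(render("It''s {0}", "done"));
        // Pitfall: a lone single quote starts a quoted section, so the
        // apostrophe disappears and {0} is treated as literal text.
        System.out.println(render("It's {0}", "done"));
    }
}
```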
Occasionally, some part of the text surrounding a value has to change depending on the value itself. There are many examples of this, but the classic example involves singular and plural forms of words. There was a time when computer users accepted messages like "There were 1 match(es) found," but today, people expect better.
Java 1.1 provides the ChoiceFormat class for solving this problem. Assuming that the variable numMatches contains the number of matches found, here's one simple way to format the preceding message:
// numMatches is an int
MessageFormat msg = new MessageFormat("There {0} found.");
double[] limits = {0, 1, 2};
String[] choices = {"were no matches", "was 1 match", "were {0,number} matches"};
ChoiceFormat choice = new ChoiceFormat(limits, choices);
msg.setFormat(0, choice);
msg.format(new Object[] {new Integer(numMatches)});
The ChoiceFormat object is created with two arrays: an array of limits and an array of choices. The two arrays should be the same size. The limits array consists of double values, sorted in ascending order. The choices array consists of String values. ChoiceFormat is invoked to format a number. It compares that number to the values in the limits array and formats the number using the element of the choices array that corresponds to the chosen limit. If there are N entries in each array, numbered here from 1 to N, and we call the number to be formatted x, the matching element is chosen this way:
Matching element | Condition |
1 | if x < limit[1] |
i | if limit[i] <= x < limit[i+1] |
N | if limit[N] <= x |
The selected choice is substituted into the pattern and processed recursively. Note that the third choice in the preceding example--used when numMatches is 2 or greater--has a format specifier embedded within it. That specifier is used to format the number into the message if the third choice is selected.
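To see the selection rule by itself, here is a sketch that uses ChoiceFormat directly, without MessageFormat. The choice strings are hypothetical and contain no format specifiers, because a stand-alone ChoiceFormat simply returns the selected string.

```java
import java.text.ChoiceFormat;

public class ChoiceLimitsDemo {
    static final ChoiceFormat MATCHES = new ChoiceFormat(
        new double[] {0, 1, 2},
        new String[] {"no matches", "one match", "several matches"});

    // Returns the choice string selected for x.
    static String describe(double x) {
        return MATCHES.format(x);
    }

    public static void main(String[] args) {
        double[] samples = {-3, 0, 1, 1.5, 2, 40};
        for (int i = 0; i < samples.length; i++) {
            System.out.println(samples[i] + " -> " + describe(samples[i]));
        }
    }
}
```

Values below the first limit fall into the first choice, values at or above the last limit fall into the last choice, and a value such as 1.5 belongs to the interval that starts at 1.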
That's a bit cumbersome, but fortunately there's an easier way. The ChoiceFormat class understands patterns of its own, and you can embed those patterns in the format specifiers for MessageFormat. For example, the previous message can also be specified like this:
MessageFormat.format("There {0,choice,0#were no matches|1#was 1 match|"
    + "2#were {0,number} matches} found.",
    new Object[] {new Integer(numMatches)});
ChoiceFormat can also be used to parse strings like this one.
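For completeness, here is the embedded choice pattern as a small runnable sketch. The class and method names are my own, and the locale is pinned to Locale.US only to keep the number formatting predictable; Integer.valueOf() is newer than Java 1.1, where you would write new Integer(...).

```java
import java.text.MessageFormat;
import java.util.Locale;

public class MatchMessageDemo {
    static final String PATTERN =
        "There {0,choice,0#were no matches|1#was 1 match|"
        + "2#were {0,number} matches} found.";

    static String describe(int numMatches) {
        MessageFormat msg = new MessageFormat("");
        msg.setLocale(Locale.US);
        msg.applyPattern(PATTERN);
        // The same argument selects the choice and fills the nested {0,number}.
        return msg.format(new Object[] {Integer.valueOf(numMatches)});
    }

    public static void main(String[] args) {
        System.out.println(describe(0));
        System.out.println(describe(1));
        System.out.println(describe(5));
    }
}
```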
Java 1.1 provides a new feature for the AWT package that is designed to help with internationalizing graphical interfaces. In Java 1.0, buttons and several other kinds of components displayed their names as a user-visible label. In some cases, the name was useful for determining which button was clicked and was hard-coded into the application, making the label text difficult to localize. Java 1.1 components have an explicit label, distinct from the name, that is used for the user-visible display if it has been set. This separation of values allows the label to be localized while enabling you to use the same component name in all locales.
In this chapter, you learned about internationalization and all the Java 1.1 features that support this task. Internationalization is primarily a problem of managing program structure so that locale-sensitive data and operations are easy to find, and of using appropriate facilities to manipulate certain kinds of data.
Java's resource bundles help with the structuring issues, and a wide variety of other facilities are included for manipulating, formatting, and parsing locale-sensitive data, including dates, times, numbers, and text, in an internationalized manner.
©Copyright, Macmillan Computer Publishing. All rights reserved.