Java 1.1 Unleashed
-15-
Class | Description |
BreakIterator | Finds boundary locations in text |
CharacterIterator | Interface for bidirectional text iteration |
ChoiceFormat | Attaches a format to a range of numbers |
CollationElementIterator | Walks through each character of an international string |
CollationKey | Compares strings that are part of a Collator class |
Collator | Abstract class that provides Unicode text-comparison services |
DateFormat | Abstract base class for date and time formatting |
DateFormatSymbols | Encapsulates date and time formatting functionality for changes across languages and countries |
DecimalFormat | Formats decimal numbers |
DecimalFormatSymbols | Represents symbols such as decimal separators and grouping separators required by DecimalFormat when formatting numbers |
FieldPosition | Aligns columns of formatted text |
Format | Abstract base class for all formats |
MessageFormat | Formats localizable concatenated messages |
NumberFormat | Abstract base class for all number formats |
ParsePosition | Records the parsing position for formatted strings |
SimpleDateFormat | Formats and parses dates in a localized way |
StringCharacterIterator | Implements the CharacterIterator interface for strings |
RuleBasedCollator | Simple Collator implementation |
ParseException | Exception thrown when an error occurs while parsing or formatting |
Formats provide a way for the programmer to easily handle the formatting of text, numbers, dates, times, and so on, in a localized way. For example, in the United States, the number 4.00 written as a monetary amount is $4.00, but in Germany, it is DM4,00. By using classes derived from java.text.Format, you are spared the details of how a particular locale writes its numbers or strings for any representation such as money, time, and so on. Instead, you can concentrate on writing the application at hand. The JDK 1.1 provides classes you can use to format numbers, dates and times, and text messages. Formats rely heavily on the use of java.util.Locale, so let's take a look at that before continuing. Table 15.2 lists the important methods supported by the Format class.
Method | Description |
format() | Formats the given object |
parseObject() | Parses the given string using the current format |
The java.text.NumberFormat class is the abstract base class for all classes that support number formatting and parsing. Code that uses a NumberFormat-derived class can be written to be completely independent of the current locale with respect to number conventions (that is, decimal sign, percent sign, separator for thousands, and so on).
The NumberFormat class provides a few static methods that return an appropriate number, currency, or percent format for a specific locale:
NumberFormat defFormat  = NumberFormat.getInstance();
NumberFormat defCurrFmt = NumberFormat.getCurrencyInstance();
NumberFormat defPctFmt  = NumberFormat.getPercentInstance();
NumberFormat frFormat   = NumberFormat.getInstance( Locale.FRENCH );
NumberFormat usCurrFmt  = NumberFormat.getCurrencyInstance( Locale.US );
NumberFormat usPctFmt   = NumberFormat.getPercentInstance( Locale.US );
The first three calls in this list return the NumberFormat objects for the default Locale for numbers, currencies, and percents, respectively.
The last three calls also return NumberFormat objects for numbers, currencies, and percents; however, they accept a Locale object (in these examples, Locale.FRENCH and Locale.US) as a parameter and return the NumberFormat for that Locale instead of the default one.
Once you retrieve a NumberFormat object, you can use it to generate a properly formatted number. The following line creates the string for 4.00 monetary units in the default locale:
String moneyString = NumberFormat.getCurrencyInstance().format( 4.00 );
You can also use the NumberFormat class to parse a string that you know is a representation of a number in the current locale. For example, the following statement parses the string "$4.00" and finds the value 4.00 to store in the variable myFour:
Number myFour = NumberFormat.getCurrencyInstance(Locale.US).parse( "$4.00" );
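The formatting and parsing statements above can be combined into a small round-trip sketch. The class name CurrencyRoundTrip and its two helper methods are illustrative only and are not part of the JDK. The sketch assumes the Locale.US currency format and reports unparseable input as NaN, which is just one possible way of handling the ParseException that parse() can throw.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class CurrencyRoundTrip {
    // Formats an amount with the US currency formatter.
    static String formatUs(double amount) {
        return NumberFormat.getCurrencyInstance(Locale.US).format(amount);
    }

    // Parses a US currency string; returns NaN if the text does not match the format.
    static double parseUs(String text) {
        try {
            return NumberFormat.getCurrencyInstance(Locale.US).parse(text).doubleValue();
        } catch (ParseException e) {
            return Double.NaN;
        }
    }

    public static void main(String[] args) {
        String money = formatUs(4.00);
        System.out.println(money);            // $4.00
        System.out.println(parseUs(money));   // 4.0
        System.out.println(parseUs("four"));  // NaN
    }
}
```

Note that parse() is given the same locale that produced the string; a currency string written for one locale is generally not parseable by the formatter of another.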
Finally, you can use the NumberFormat class in conjunction with the FieldPosition class to provide a simple way of aligning numbers on different fields, such as the decimal sign, the percent sign, and so on. Table 15.3 lists the important methods in the NumberFormat class.
Method | Description |
format() | Formats the given object into a string; overrides the Format class |
parseObject() | Parses a string and creates an Object object |
parse() | Parses a string and returns a Number object |
isParseIntegerOnly() | Returns true if this parser stops when it hits a decimal point |
setParseIntegerOnly() | Specifies whether the parser should read past the decimal point |
getInstance() | Gets the default NumberFormat object for a locale |
getNumberInstance() | Gets a general-purpose formatter for a locale |
getCurrencyInstance() | Gets a currency formatter for a locale |
getPercentInstance() | Gets a percent formatter for a locale |
getAvailableLocales() | Returns all locales supported by the NumberFormat class |
isGroupingUsed() | Returns true if grouping is used; example: The number 12345 would be 12,345 with grouping turned on |
setGroupingUsed() | Specifies whether grouping should be used |
get/setMaximumIntegerDigits() | Gets or sets the maximum number of integer digits to be used |
get/setMinimumIntegerDigits() | Gets or sets the minimum number of integer digits to be used |
get/setMaximumFractionDigits() | Gets or sets the maximum number of fraction digits to be used |
get/setMinimumFractionDigits() | Gets or sets the minimum number of fraction digits to be used |
import java.text.*;
import java.util.*;

public class NumberFormatExample {
    public static void main( String args[] ) {
        try {
            double averages[] = { 0.456, 0.78, 0.3, 1.0, .25, .345 };
            // padding used to right-align the numbers; must hold at least 20 spaces
            String spaces = "                    ";
            System.out.println( "Available Locales for NumberFormat" );
            Locale availLocs[] = NumberFormat.getAvailableLocales();
            for( int i=0; i<availLocs.length; i++ ) {
                System.out.println( "\t" + availLocs[i].getDisplayName() );
                DecimalFormat fmt = (DecimalFormat) NumberFormat.getInstance( availLocs[i] );
                // replace the last four characters of the locale's pattern
                // (assumed to be ".###") so exactly three fraction digits print
                String pattern = fmt.toPattern();
                int len = pattern.length();
                String newPattern = pattern.substring(0,len-4) + ".000";
                fmt = new DecimalFormat( newPattern, new DecimalFormatSymbols(availLocs[i]) );
                FieldPosition status = new FieldPosition( NumberFormat.FRACTION_FIELD );
                for( int j=0; j<averages.length; j++ ) {
                    StringBuffer sb = new StringBuffer();
                    fmt.format( averages[j], sb, status );
                    // pad on the left so the fraction fields end in the same column
                    System.out.println( spaces.substring(0, 20-status.getEndIndex()) + sb.toString() );
                }
                System.out.println("");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The output from this program looks like this:
Available Locales for NumberFormat
	Belorussian (Belarus)
               0,456
               0,780
               0,300
               1,000
               0,250
               0,345

	Bulgarian (Bulgaria)
               0,456
               0,780
               0,300
               1,000
               0,250
               0,345
As you can see, the same code block produces different output strings depending on the locale.
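The grouping and digit-count methods from Table 15.3 can be tried in isolation with a minimal sketch such as the following. The class DigitSettingsSketch and its helper methods are hypothetical names chosen for illustration, and Locale.US is assumed so that the separators are predictable.

```java
import java.text.NumberFormat;
import java.util.Locale;

class DigitSettingsSketch {
    // Formats with the US defaults, which have grouping turned on.
    static String withGrouping(double value) {
        return NumberFormat.getInstance(Locale.US).format(value);
    }

    // Formats without grouping and with a fixed number of fraction digits.
    static String plain(double value, int fractionDigits) {
        NumberFormat fmt = NumberFormat.getInstance(Locale.US);
        fmt.setGroupingUsed(false);
        fmt.setMinimumFractionDigits(fractionDigits);
        fmt.setMaximumFractionDigits(fractionDigits);
        return fmt.format(value);
    }

    public static void main(String[] args) {
        System.out.println(withGrouping(12345));   // 12,345
        System.out.println(plain(12345, 2));       // 12345.00
        System.out.println(plain(0.456, 1));       // 0.5
    }
}
```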
The java.text.DateFormat class is an abstract base class for all classes that parse and format dates and times in a localized manner. Like the NumberFormat class, the DateFormat class provides a number of static methods that retrieve default formats for dates and times:
DateFormat fmt;
fmt = DateFormat.getDateInstance();
fmt = DateFormat.getDateInstance( DateFormat.SHORT );
fmt = DateFormat.getDateInstance( DateFormat.LONG, Locale.FRENCH );
fmt = DateFormat.getDateTimeInstance();
fmt = DateFormat.getDateTimeInstance( DateFormat.LONG, DateFormat.SHORT );
fmt = DateFormat.getDateTimeInstance( DateFormat.LONG, DateFormat.SHORT, Locale.US );
fmt = DateFormat.getTimeInstance();
fmt = DateFormat.getTimeInstance( DateFormat.SHORT );
fmt = DateFormat.getTimeInstance( DateFormat.LONG, Locale.FRENCH );
The getDateInstance() method returns the default date format for the default locale. The second version of this method, getDateInstance(DateFormat.SHORT), returns the date format for the given style in the default locale. The style attribute can be SHORT, MEDIUM, LONG, FULL, or DEFAULT. The third form of this method, getDateInstance(DateFormat.LONG, Locale.FRENCH), returns the date format for the given style in the given locale.
The getDateTimeInstance() method returns the default date and time format for the default locale. The second version, getDateTimeInstance(DateFormat.LONG, DateFormat.SHORT), returns the date and time format for the given date and time formatting styles in the default locale. The third version, getDateTimeInstance(DateFormat.LONG, DateFormat.SHORT, Locale.US), returns the date and time format for the given date and time formatting styles in the given locale.
The getTimeInstance() method returns the default time format for the default locale. The second version of this method, getTimeInstance(DateFormat.SHORT), returns the time format with the given style for the default locale. The style attribute for the time methods can have the same values as the style attribute for the date methods: SHORT, MEDIUM, LONG, FULL, or DEFAULT. The third version of this method, getTimeInstance(DateFormat.LONG, Locale.FRENCH), returns the time format with the given style in the given locale.
Here are some examples of the DateFormat attributes:
Attribute | Example |
SHORT | 4/2/97 (completely numeric) |
MEDIUM | Apr 2, 1997 |
LONG | April 2, 1997 |
FULL | Wednesday, April 2, 1997 AD |
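To see what the four styles produce on a particular system, a short sketch like the following can be run. The class DateStyleSketch and its helpers are illustrative names, not JDK classes, and Locale.US is an assumption. The exact strings depend on the locale data shipped with the runtime, so they may differ in detail from the examples in the table.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class DateStyleSketch {
    // Builds a Date for midnight on the given day in the default time zone.
    static Date dateOf(int year, int month, int day) {
        return new GregorianCalendar(year, month, day).getTime();
    }

    // Formats a date with one of the DateFormat style constants, using US conventions.
    static String format(int style, Date when) {
        return DateFormat.getDateInstance(style, Locale.US).format(when);
    }

    public static void main(String[] args) {
        Date april2 = dateOf(1997, Calendar.APRIL, 2);
        int[] styles = { DateFormat.SHORT, DateFormat.MEDIUM, DateFormat.LONG, DateFormat.FULL };
        String[] names = { "SHORT", "MEDIUM", "LONG", "FULL" };
        for (int i = 0; i < styles.length; i++) {
            System.out.println(names[i] + "\t" + format(styles[i], april2));
        }
    }
}
```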
These field constants are also available in the DateFormat class: AM_PM_FIELD, DATE_FIELD, and DAY_OF_WEEK_FIELD, among others. They are used with the FieldPosition class to help properly align strings formatted with a DateFormat object. For example, if the formatted date is Friday, April 4, 1997, and the FieldPosition object was created with DAY_OF_WEEK_FIELD, the DateFormat object determines that the day of the week occupies string[0] through string[5]. The FieldPosition object's getBeginIndex() method then returns 0 and its getEndIndex() method returns 6, because the end index is the position just past the last character of the field.
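The day-of-week example can be checked with a minimal sketch. FieldPositionSketch and dayOfWeekSpan() are hypothetical names used only here, and the pattern and Locale.US are assumptions chosen so that the formatted text is Friday, April 4, 1997. Note that getEndIndex() reports the index one past the last character of the field.

```java
import java.text.DateFormat;
import java.text.FieldPosition;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class FieldPositionSketch {
    // Appends the formatted date to out and returns {begin, end} of the day-of-week field.
    static int[] dayOfWeekSpan(Date when, StringBuffer out) {
        SimpleDateFormat fmt = new SimpleDateFormat("EEEE, MMMM d, yyyy", Locale.US);
        FieldPosition pos = new FieldPosition(DateFormat.DAY_OF_WEEK_FIELD);
        fmt.format(when, out, pos);
        return new int[] { pos.getBeginIndex(), pos.getEndIndex() };
    }

    public static void main(String[] args) {
        Date friday = new GregorianCalendar(1997, Calendar.APRIL, 4).getTime();
        StringBuffer sb = new StringBuffer();
        int[] span = dayOfWeekSpan(friday, sb);
        // prints: Friday, April 4, 1997 -> [0, 6)
        System.out.println(sb + " -> [" + span[0] + ", " + span[1] + ")");
    }
}
```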
Table 15.4 lists the important methods in the DateFormat class.
Method | Description |
format() | Formats the given object into a string; overrides class Format |
parseObject() | Parses a string and creates an Object object |
parse() | Parses a string and returns a Date object |
getTimeInstance() | Gets a time formatter for a locale |
getDateInstance() | Gets a date formatter for a locale |
getDateTimeInstance() | Gets a date and time formatter for a locale |
getInstance() | Gets the default DateFormat for a locale |
getAvailableLocales() | Gets all the locales supported by DateFormat |
get/setCalendar() | Gets or sets the Calendar to be used by the DateFormat object |
get/setNumberFormat() | Gets or sets the NumberFormat to be used by the DateFormat object |
get/setTimeZone() | Gets or sets the TimeZone object for the calendar of this DateFormat object |
is/setLenient() | Gets or sets whether this object uses lenient parsing; if lenient parsing is used, the parser will use heuristics to interpret the input; if lenient parsing is off, the parser will use strict parsing rules |
import java.text.*;
import java.util.*;

public class DateFormatExample {
    public static void main( String args[] ) {
        try {
            System.out.println( "Available Locales for DateFormat" );
            Locale availLocs[] = DateFormat.getAvailableLocales();
            for( int i=0; i<availLocs.length; i++ ) {
                System.out.println( "\t" + availLocs[i].getDisplayName() );
            }
            SimpleDateFormat fmt = new SimpleDateFormat(
                "'It is now' H:mm 'on' EEEE',' MMMM d',' yyyy" );
            FieldPosition status = new FieldPosition( DateFormat.DAY_OF_WEEK_FIELD );

            // format today's date
            Date today = new Date();
            StringBuffer sbToday = new StringBuffer();
            fmt.format( today, sbToday, status );
            int todayOffset = status.getEndIndex();

            // format tomorrow's date
            Date tomorrow = new Date( today.getTime() + 86400000 );
            StringBuffer sbTmw = new StringBuffer();
            fmt.format( tomorrow, sbTmw, status );
            int tmwOffset = status.getEndIndex();

            // format tomorrow+1
            Date tp1 = new Date( tomorrow.getTime() + 86400000 );
            StringBuffer sbTp1 = new StringBuffer();
            fmt.format( tp1, sbTp1, status );
            int tp1Offset = status.getEndIndex();

            // align all dates in column 40 of the screen using the DAY_OF_WEEK;
            // the padding string must hold at least 40 spaces
            String spaces = "                                        ";
            System.out.println("Dates");
            System.out.print( spaces.substring(0, 40-todayOffset) );
            System.out.println( sbToday.toString() );
            System.out.print( spaces.substring(0, 40-tmwOffset) );
            System.out.println( sbTmw.toString() );
            System.out.print( spaces.substring(0, 40-tp1Offset) );
            System.out.println( sbTp1.toString() );

            // parse a date from a string (reverse-formatting)
            String dateStr = "It is now 16:26 on Tuesday, February 4, 1997";
            Date date = fmt.parse( dateStr );
            System.out.println("Parsing");
            System.out.println( "\t" + date.toString() );
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The output of this program looks something like the following. The actual output from your system may vary depending on which locales are installed.
Available Locales for DateFormat
	Arabic (Egypt)
	Belorussian (Belarus)
	Bulgarian (Bulgaria)
	Catalan (Spain)
	Czech (Czech Republic)
	Danish (Denmark)
	German (Germany)
	<more locales here>
Dates
               It is now 14:37 on Monday, April 7, 1997
              It is now 14:37 on Tuesday, April 8, 1997
            It is now 14:37 on Wednesday, April 9, 1997
Parsing
	Tue Feb 04 16:26:00 PST 1997
As you can see, the dates are aligned on the day of the week, and the string we made up is parsed into a java.util.Date object, just as we expected!
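The is/setLenient() entry in Table 15.4 is easiest to understand with an impossible date. The following is a minimal sketch, assuming a fixed MM/dd/yyyy pattern and Locale.US; the class LenientParseSketch and its parse() helper are illustrative names, not part of the JDK. With lenient parsing, February 30 rolls over into March; with strict parsing, the same text is rejected.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class LenientParseSketch {
    // Parses MM/dd/yyyy text and echoes it as yyyy-MM-dd, or "rejected" if parsing fails.
    static String parse(String text, boolean lenient) {
        SimpleDateFormat in = new SimpleDateFormat("MM/dd/yyyy", Locale.US);
        in.setLenient(lenient);
        try {
            Date parsed = in.parse(text);
            return new SimpleDateFormat("yyyy-MM-dd", Locale.US).format(parsed);
        } catch (ParseException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("02/30/1997", true));   // 1997-03-02
        System.out.println(parse("02/30/1997", false));  // rejected
    }
}
```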
The java.text.ChoiceFormat class is a formatter that allows you to attach a pattern to a range of numbers (of type double). Its most common use is in conjunction with the MessageFormat class to handle cases with plurals (for example, "zero objects," "one object," and "many objects"), although it is certainly not limited to such use.
A ChoiceFormat object is specified with a list of ascending numbers (doubles) that determine the limits to be used. A number X falls into a given interval between list[j] and list[j+1] if and only if list[j] <= X < list[j+1]. If X < list[0], then list[0] is used. Similarly, if X > list[N-1] (where there are N items in the list), list[N-1] is used.
For example, if the list is {1.0, 2.0, 3.0}, then the following are true:
0.5 maps to list[0] because 0.5 is less than 1.0 (list[0])
1.5 also maps to list[0] because 1.0 <= 1.5 < 2.0
2.5 maps to list[1] because 2.0 <= 2.5 < 3.0
3.5 maps to list[2] because 3.0 <= 3.5
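The interval rule can be verified directly with a ChoiceFormat built from the limits {1.0, 2.0, 3.0}. In this minimal sketch, each limit is paired with a label that simply names its slot; the class ChoiceIntervalSketch, the labels, and the slotFor() helper are made up for illustration.

```java
import java.text.ChoiceFormat;

class ChoiceIntervalSketch {
    static final double[] LIMITS = { 1.0, 2.0, 3.0 };
    // One label per limit; each label just names the slot it belongs to.
    static final String[] LABELS = { "list[0]", "list[1]", "list[2]" };

    // Returns the label of the interval that x falls into.
    static String slotFor(double x) {
        return new ChoiceFormat(LIMITS, LABELS).format(x);
    }

    public static void main(String[] args) {
        double[] samples = { 0.5, 1.5, 2.5, 3.5 };
        for (int i = 0; i < samples.length; i++) {
            System.out.println(samples[i] + " maps to " + slotFor(samples[i]));
        }
        // prints:
        // 0.5 maps to list[0]
        // 1.5 maps to list[0]
        // 2.5 maps to list[1]
        // 3.5 maps to list[2]
    }
}
```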
Along with the list of numbers that determine the limits is a list of objects. The list of objects has the same number of items as the list of limits and contains the items to be used as the formats for the corresponding limits. Although it sounds confusing, it is really very simple. Table 15.5 lists the important methods in the ChoiceFormat class.
Method | Description |
applyPattern() | Sets the pattern for a ChoiceFormat object |
toPattern() | Gets the pattern for a ChoiceFormat object |
setChoices() | Sets the limits and the corresponding formats for a ChoiceFormat object |
getLimits() | Gets the limits for a ChoiceFormat object |
getFormats() | Gets the formats for a ChoiceFormat object |
format() | Formats an object into a string |
parse() | Parses a string and creates a Number object |
nextDouble() | Finds the smallest double greater than a given value |
previousDouble() | Finds the largest double less than a given value |
import java.text.*;
import java.util.*;

public class SimpleChoiceFormatExample {
    public static void main( String args[] ) {
        try {
            double[] limits = { 1, 4, 7, 10 };
            String[] seasons = { "Winter", "Spring", "Summer", "Autumn" };
            ChoiceFormat fmt = new ChoiceFormat( limits, seasons );
            for (int i = 1; i <= 12; ++i) {
                System.out.println( "Month number " + i + " falls in " + fmt.format(i) );
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
This program prints the following output:
Month number 1 falls in Winter
Month number 2 falls in Winter
Month number 3 falls in Winter
Month number 4 falls in Spring
Month number 5 falls in Spring
Month number 6 falls in Spring
Month number 7 falls in Summer
Month number 8 falls in Summer
Month number 9 falls in Summer
Month number 10 falls in Autumn
Month number 11 falls in Autumn
Month number 12 falls in Autumn
By letting the formatter do the work, you save the effort of doing many comparisons to determine the season in which a particular month falls. The next section makes it clearer why the ChoiceFormat class is useful.
The java.text.MessageFormat class provides a simple way to get concatenated messages in a language-neutral (localized) way. A MessageFormat object has a specified pattern and, optionally, a list of Format objects associated with it. A MessageFormat object's specification is of the following form:
MessageFormat fmt = new MessageFormat( "The incoming fax from {0} has a total of {1} pages.");
The string passed in is called the pattern. The pattern is used by the MessageFormat object when formatting and is subject to the following set of rules:
Table 15.6 lists the important methods found in the MessageFormat class.
Method | Description |
get/setLocale() | Gets or sets the locale for the MessageFormat object |
applyPattern() | Sets the pattern for the object |
toPattern() | Gets the pattern for the object |
setFormats() | Sets all the formats to be used by the object |
setFormat() | Sets an individual format to be used by the object |
getFormats() | Gets all the formats for the object |
format() | Formats an object and returns a string |
parse() | Parses a string and returns an array of objects |
parseObject() | Parses a string and returns the next object |
import java.text.*;
import java.util.*;

public class MessageFormatExample {
    public static void main( String args[] ) {
        try {
            MessageFormat fmt = new MessageFormat( "The fax from {1} has {0} pages." );
            Object fmtArgs[] = { new Long(5), "Joe Schmo" };
            System.out.println( fmt.toPattern() + "; " + fmt.format(fmtArgs) );
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The output of this short code snippet is as follows:
The fax from {1} has {0} pages.; The fax from Joe Schmo has 5 pages.
Because we did not specify any formats for the particular arguments, the MessageFormat object simply substituted fmtArgs[0] for {0} and fmtArgs[1] for {1}. Listing 15.5 uses a ChoiceFormat object in conjunction with a MessageFormat object to create a formatted string. You can find this file on the CD-ROM that accompanies this book.
import java.text.*;

public class MessageFormatExample2 {
    public static void main( String args[] ) {
        try {
            // the limits to use for ChoiceFormat
            double[] limits = { 0, 1, 2};
            // strings for 0, 1, and >1 pages
            String[] pages = { "no pages", "one page", "{1,number} pages" };
            // a ChoiceFormat object based on the given limits
            ChoiceFormat chFmt = new ChoiceFormat( limits, pages );
            // senders of faxes
            String[] senders = { "Joe", "Fred", "Mary" };
            // formats to use for the arguments in the MessageFormat
            Format[] testFormats = { null, chFmt };
            MessageFormat messFmt = new MessageFormat( "The fax from {0} has {1}." );
            messFmt.setFormat( 1, chFmt );
            for (int i = 0; i < 3; ++i) {
                // an array of Objects to pass to the MessageFormat
                Object[] testArgs = { senders[i], new Long(i) };
                // format the arguments and print out the resulting string
                System.out.println( messFmt.toPattern() + " -> " + messFmt.format(testArgs) );
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The output of this code is as follows:
The fax from {0} has {1,choice,0.0#no pages|1.0#one page|2.0#{1,number} pages}. -> The fax from Joe has no pages.
The fax from {0} has {1,choice,0.0#no pages|1.0#one page|2.0#{1,number} pages}. -> The fax from Fred has one page.
The fax from {0} has {1,choice,0.0#no pages|1.0#one page|2.0#{1,number} pages}. -> The fax from Mary has 2 pages.
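As the printed pattern suggests, a choice can also be written directly inside the pattern string instead of being attached with setFormat(). The following is a minimal sketch of that variant. The class InlineChoiceSketch and the describe() helper are hypothetical, Locale.US is assumed for the number formatting, and the last limit is written as 1< (anything greater than 1) rather than 2# as in the listing. The MessageFormat(String, Locale) constructor used here was added in a later JDK release than 1.1.

```java
import java.text.MessageFormat;
import java.util.Locale;

class InlineChoiceSketch {
    // Builds the fax message; the choice format is embedded in the pattern itself.
    static String describe(String sender, long pages) {
        MessageFormat fmt = new MessageFormat(
            "The fax from {0} has "
            + "{1,choice,0#no pages|1#one page|1<{1,number,integer} pages}.",
            Locale.US);
        return fmt.format(new Object[] { sender, Long.valueOf(pages) });
    }

    public static void main(String[] args) {
        System.out.println(describe("Joe", 0));   // The fax from Joe has no pages.
        System.out.println(describe("Fred", 1));  // The fax from Fred has one page.
        System.out.println(describe("Mary", 2));  // The fax from Mary has 2 pages.
    }
}
```

Keeping the choice in the pattern means the whole sentence, plural handling included, can live in a single localizable string.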
The java.text.Collator class is an abstract class that provides a common interface for the language-sensitive comparison of strings, text searches, and alphabetical sorting. The Collator class hides from the developer the nuances of any individual locale so that you can use the same code in any local setting. Table 15.7 lists the important methods used in the Collator class.
Method | Description |
getInstance() | Gets a Collator for a locale |
compare() | Compares two strings according to the rules for a Collator |
getCollationKey() | Transforms a string into a set of bits that can be compared (bitwise) to other CollationKeys from the same Collator |
get/setStrength() | Gets or sets the strength of the Collator; legal values are PRIMARY, SECONDARY, and TERTIARY |
get/setDecomposition() | Gets or sets the decomposition mode for the Collator; legal values are NO_DECOMPOSITION, CANONICAL_DECOMPOSITION, and FULL_DECOMPOSITION |
getAvailableLocales() | Gets a list of locales supported by the Collator class |
Languages throughout the world differ with respect to both the characters they use and the way they treat those characters when comparing and sorting them. There are four areas that apply to correct string comparison and sorting: ordering characters, grouping characters, expanding characters, and ignoring characters. The following sections explain these areas in more detail.
NOTE: Java uses the Unicode representation of strings instead of using multibyte representation.
There are three types of orderings: primary, secondary, and tertiary. When you compare two strings, you first do so by comparing characters at the same positions in each string. The first difference in this primary ordering determines the order of the strings, regardless of the remaining characters. For example, "deed" is less than "definition". The first primary difference is in the third character, where "e" is less than "f". In languages such as English and German, the primary ordering is in base letters. For example, "a" is different than "b" but is not different from "A". Remember that Java uses the Unicode representation of characters, which is actually a superset of the ASCII character set. Punctuation such as spaces and quotation marks precede numbers, which precede letters in the ordering.
If the primary ordering shows that the strings are identical, the comparison then moves to secondary ordering. In English, the secondary ordering is in the case of the characters. Thus, "apple" is less than "Apple". In this example, the primary ordering of the characters is the same, but the first characters have a secondary difference. In languages such as Czech and French, the secondary ordering is in accents.
Finally, if all secondary orderings are also identical, the comparison moves to tertiary ordering. For example, in Czech, the secondary ordering is based on accent marks ("e" is less than "è") and the tertiary ordering is based on case.
Some languages stipulate that a certain sequence of characters should be treated as a single character. For example, in Spanish, "c" is less than "ch" which is less than "d", because "ch" is treated as a base character in itself and is placed in the ordering between the characters "c" and "d".
Note that the language normally determines when a set of characters should be treated as a single character (as in the Spanish "ch" example), but the programmer can also insert his or her own grouped characters, as we will see later in this chapter with the RuleBasedCollator example.
Some languages stipulate that a single character be treated as a sequence of characters. For example, in German, "s" is less than "ß" which is less than "t". In this case, the character "ß" is treated as though it is the characters "ss" for ordering purposes.
Most languages have certain characters that can be ignored when you compare and sort strings. That is, some characters are not significant unless there are no differences in the remainder of the string. In English, one such character is the dash (-). For example, "foobar" is less than "foo-bar" which is less than "foobars".
For any given collation operation, you can specify a strength (PRIMARY, SECONDARY, or TERTIARY). The strength of a collation is the highest level at which comparisons are made; differences in levels beyond the specified strength are ignored. For example, if you set the strength of a collation to SECONDARY, any characters that have tertiary differences are reported as being equal. The strength of a collator can be set using Collator.setStrength().
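The effect of the strength setting can be observed with a minimal sketch that compares two strings differing only in case. StrengthSketch and compareAt() are illustrative names, and the sketch assumes the collation rules that the Java runtime supplies for Locale.US. At PRIMARY strength the case difference is ignored; at TERTIARY strength it is reported.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthSketch {
    // Compares two strings with a US collator set to the given strength.
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // 0: the strings are equal when only base letters count
        System.out.println(compareAt(Collator.PRIMARY, "apple", "Apple"));
        // nonzero: the case difference is visible at the finest level
        System.out.println(compareAt(Collator.TERTIARY, "apple", "Apple"));
        // nonzero at any strength: "a" and "b" differ at the primary level
        System.out.println(compareAt(Collator.PRIMARY, "a", "b"));
    }
}
```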
It is simple to compare two strings using the Collator.compare() method. However, the comparison algorithm used by Collator.compare() is very complex. If you are sorting long lists of strings, the operation may be very slow because compare() repetitively compares the same strings. As an alternative, you can use the java.text.CollationKey class, which is a key representing a given string in a collation. You can generate CollationKey objects for all your strings and cache them for use in all comparisons instead of using the strings themselves. Because they are bit-ordered, CollationKey objects allow you to do bitwise comparisons; in addition, once the keys are generated, comparisons are faster than direct comparisons of the two strings.
Table 15.8 lists the important methods in the CollationKey class.
Method | Description |
compareTo() | Compares the CollationKey to another CollationKey object from the same Collator |
getSourceString() | Returns a reference to the actual string which maps to the CollationKey under the given Collator |
toByteArray() | Converts the CollationKey to a sequence of bits; used for bitwise comparison of keys |
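The caching idea can be sketched as follows: each string is converted to a CollationKey once, the keys are sorted, and the original strings are read back with getSourceString(). The class CollationKeySortSketch and its sorted() helper are hypothetical, Locale.US is an assumption, and Arrays.sort() comes from a JDK release later than 1.1.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollationKeySortSketch {
    // Returns the words in collation order, computing one key per word.
    static String[] sorted(String[] words) {
        Collator collator = Collator.getInstance(Locale.US);
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]);
        }
        // CollationKey implements Comparable, so the keys can be sorted directly.
        Arrays.sort(keys);
        String[] result = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = keys[i].getSourceString();
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = { "pear", "Apple", "banana", "apple" };
        // prints: [apple, Apple, banana, pear]
        System.out.println(Arrays.toString(sorted(words)));
    }
}
```

Keys are only comparable with other keys produced by the same Collator, so the sketch creates all of them from a single instance.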
Decomposing characters is another way of saying preparing characters for sorting. Decomposing characters involves the four attributes discussed earlier: ordering, grouping, expanding, and ignoring characters. Decomposing characters is the process of actually applying a language's rules to a given set of characters.
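When you are dealing with Unicode characters, there are three decomposition modes to consider: NO_DECOMPOSITION, CANONICAL_DECOMPOSITION, and FULL_DECOMPOSITION. These are the legal values for Collator.setDecomposition().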
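A minimal sketch of why the decomposition mode matters: the same accented letter can be stored either as one precomposed character or as a base letter followed by a combining accent. The class DecompositionSketch and its helper are illustrative names, and Locale.US is assumed. With CANONICAL_DECOMPOSITION set, the collator is expected to treat the two spellings as equal even though the strings differ character by character.

```java
import java.text.Collator;
import java.util.Locale;

class DecompositionSketch {
    // True if the two strings collate as equal once canonical variants are decomposed.
    static boolean sameUnderCanonical(String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String precomposed = "caf\u00e9";   // e with acute accent as a single character
        String decomposed  = "cafe\u0301";  // plain e followed by a combining acute accent
        System.out.println(precomposed.equals(decomposed));   // false
        System.out.println(sameUnderCanonical(precomposed, decomposed));
    }
}
```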
The java.text.RuleBasedCollator class is a concrete class that provides a very simple Collator implementation using tables (hence the name). This class uses a set of collation rules to determine the result of comparisons. Each rule takes one of three forms: <modifier>, <relation> <text argument>, or <reset> <text argument>.
The definitions for each component of these rules are as follows:
Component | Description |
modifier | There is only one modifier, "@", which indicates that all secondary differences are ordered in reverse. |
relation | There are four relations. The first three--"<", ";", and ","--mean "greater than" for primary, secondary, and tertiary differences, respectively. The fourth relation is "=", which means "equal." |
reset | There is only one reset, "&", which specifies that the next rule follows the position in which the reset argument would be sorted. The reset argument follows the "&", as in "a < b & a < c". In this case, "a" is the reset argument and "c" is placed in the list after text argument "a", yielding "a < c < b". |
text argument | Any sequence of characters, excluding "special" characters (those characters contained in whitespace or used as modifiers, relations, and resets). To use a special character within a string, place it in single quotation marks (for example, '&'). |
Valid example | Comment |
a < b < c | |
a < c & a < b | Equivalent to a < b < c |
a < b & b < c | Equivalent to a < b < c |
- < a < b < c | '-' can be ignored because it precedes the first relation |
Invalid example | Problem |
a < b & c < d | c has not been put into the rules and thus cannot be used as a reset argument |
w < x y < z | There is no relation between x and y |
w <, x | There is no text argument between the relations < and , |
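A few of these rule strings can be tried out with a minimal sketch. RuleSketch and its compare() helper are hypothetical names, and the rule strings are written with a leading "<" relation, which current Java runtimes appear to require at the start of a rule string. The first rule set reverses the usual alphabetical order of three letters; the second uses the "&" reset to place "c" directly after "a".

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class RuleSketch {
    // Compares a and b under the given rules; wraps the checked ParseException.
    static int compare(String rules, String a, String b) {
        try {
            return new RuleBasedCollator(rules).compare(a, b);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // custom order c, b, a: "c" now sorts before "a"
        System.out.println(compare("< c < b < a", "c", "a") < 0);       // true
        // reset: equivalent to a < c < b, so "c" sorts before "b"
        System.out.println(compare("< a < b & a < c", "c", "b") < 0);   // true
        System.out.println(compare("< a < b & a < c", "a", "c") < 0);   // true
    }
}
```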
Table 15.9 lists the important methods in the RuleBasedCollator class.
Method | Description |
getRules() | Returns a string representation of the rules for the object |
getCollationElementIterator() | Gets a CollationElementIterator object for a given string under the RuleBasedCollator |
compare() | Compares two strings based on the rules in the RuleBasedCollator |
getCollationKey() | Gets a CollationKey for a given string under the rules for the RuleBasedCollator |
import java.text.*;
import java.util.*;

public class RuleBasedCollatorExample {
    public static void main( String args[] ) {
        try {
            // make a collation with rules from the US
            RuleBasedCollator collUS =
                (RuleBasedCollator) Collator.getInstance(Locale.US);
            // provide the ordering for levels
            // no need to do the C's because they will have a primary difference
            String newRules = "< B1 < a1, A1 < a2, A2 < a3, A3 < B2 < B3";
            String sampleInput[] = { "B1", "a1", "A3", "A1", "B3", "B2", "a2", "A2", "B1" };
            RuleBasedCollator newColl = new RuleBasedCollator( newRules );
            newColl.setStrength( Collator.TERTIARY );
            CollationKey keys[] = new CollationKey[sampleInput.length];
            // print the original list
            for( int i=0; i<sampleInput.length; i++ ) {
                System.out.print( sampleInput[i] + " " );
                keys[i] = newColl.getCollationKey( sampleInput[i] );
            }
            System.out.println("");
            // sort the list
            for( int i=0; i<sampleInput.length-1; i++ ) {
                for (int j=i+1; j<sampleInput.length; j++ ) {
                    if( keys[i].compareTo(keys[j]) > 0 ) {
                        CollationKey tmpkey = keys[i];
                        keys[i] = keys[j];
                        keys[j] = tmpkey;
                        String tmp = sampleInput[i];
                        sampleInput[i] = sampleInput[j];
                        sampleInput[j] = tmp;
                    }
                }
            }
            // print the sorted list
            for( int i=0; i<sampleInput.length; i++ ) {
                System.out.print( sampleInput[i] + " " );
            }
            System.out.println("");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The output of the program is as follows:
B1 a1 A3 A1 B3 B2 a2 A2 B1
B1 B1 a1 A1 a2 A2 A3 B2 B3
There are two classes of iterators in the java.text package: the CollationElementIterator, which is used to iterate through each character of an international string; and StringCharacterIterator, which implements the CharacterIterator interface and is used for bidirectional iteration over a given string.
The java.text.CollationElementIterator class allows you to go through each character of an international string and return the ordering priority of the positioned character. The "key" of a character is an integer that comprises the primary, secondary, and tertiary orders for the character. The primary order is of type short (16 bits); the secondary and tertiary orders are of type byte (8 bits). This integer is formed internally to the iterator based on other characters in the string.
Table 15.10 lists the important methods in the CollationElementIterator class.
Method | Description |
reset() | Resets the iterator's marker to the beginning of the string |
next() | Gets the ordering of the next character from the iterator |
primaryOrder() | Gets the primary order of the given character |
secondaryOrder() | Gets the secondary order of the given character |
tertiaryOrder() | Gets the tertiary order of the given character |
import java.text.*;
import java.util.*;

public class CollationElementIteratorExample {
    public static void main( String args[] ) {
        try {
            // make a collation with rules from the US
            RuleBasedCollator collUS =
                (RuleBasedCollator) Collator.getInstance(Locale.US);
            // provide the ordering for levels
            String newRules = "< B1 < a1, A1 < a2, A2 < a3, A3 < B2 < B3";
            // sample list
            String sampleInput[] = { "B1a1A3A1B3B2a2A2B1" };
            RuleBasedCollator newColl =
                new RuleBasedCollator( collUS.getRules() + newRules );
            // walk through the orderings of each string in the list
            for( int i=0; i<sampleInput.length; i++ ) {
                CollationElementIterator iter =
                    newColl.getCollationElementIterator( sampleInput[i] );
                int next;
                int count = 0;
                while( (next = iter.next()) != CollationElementIterator.NULLORDER ) {
                    int pri = CollationElementIterator.primaryOrder( next );
                    int sec = CollationElementIterator.secondaryOrder( next );
                    int ter = CollationElementIterator.tertiaryOrder( next );
                    System.out.println( "orderings for character " + count +
                        " of string " + i + " are " + pri + "," + sec + "," + ter );
                    count++;
                }
                System.out.println("");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The output of this program is as follows:
orderings for character 0 of string 0 are 96,0,0
orderings for character 1 of string 0 are 97,0,0
orderings for character 2 of string 0 are 99,0,1
orderings for character 3 of string 0 are 97,0,1
orderings for character 4 of string 0 are 101,0,0
orderings for character 5 of string 0 are 100,0,0
orderings for character 6 of string 0 are 98,0,0
orderings for character 7 of string 0 are 98,0,1
orderings for character 8 of string 0 are 96,0,0
Notice that characters 6 and 7 have the same primary value (98); this is because a2 and A2 are the same as far as primary comparisons go. However, our rules separate a2 and A2 with a comma, which marks a tertiary (case) difference; thus, the third value differs (character 6 has value 0 and character 7 has value 1).
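The same primary-versus-tertiary distinction shows up when you compare whole strings. The following sketch uses the stock US collator rather than the custom rules from the example (an assumption made to keep it short); compareAt() is a hypothetical helper name. At PRIMARY strength, case is ignored and the two strings compare as equal; at TERTIARY strength, the case difference makes the result non-zero.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthSketch {
    // Hypothetical helper: compares two strings with a US collator
    // set to the given strength.
    static int compareAt(int strength, String a, String b) {
        Collator coll = Collator.getInstance(Locale.US);
        coll.setStrength(strength);
        return coll.compare(a, b);
    }

    public static void main(String[] args) {
        // Only primary differences count: "a2" and "A2" are equal.
        System.out.println("PRIMARY:  " + compareAt(Collator.PRIMARY, "a2", "A2"));
        // Case is a tertiary difference, so the result is non-zero.
        System.out.println("TERTIARY: " + compareAt(Collator.TERTIARY, "a2", "A2"));
    }
}
```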
The StringCharacterIterator class implements the CharacterIterator interface for strings. The CharacterIterator interface specifies a protocol for the bidirectional iteration over text, on a range of character positions bounded by startIndex and endIndex-1.
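As a small illustration of the bounded range, the sketch below builds an iterator over only part of a string and walks it in both directions. The class and the forward() and backward() helpers are hypothetical names; the four-argument constructor and first(), last(), next(), previous(), and DONE are part of java.text.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

public class SubRangeIterationSketch {
    // Collects the characters in [begin, end) by walking forward.
    static String forward(String text, int begin, int end) {
        CharacterIterator it = new StringCharacterIterator(text, begin, end, begin);
        StringBuffer sb = new StringBuffer();
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Collects the same characters by walking backward from end-1.
    static String backward(String text, int begin, int end) {
        CharacterIterator it = new StringCharacterIterator(text, begin, end, begin);
        StringBuffer sb = new StringBuffer();
        for (char c = it.last(); c != CharacterIterator.DONE; c = it.previous()) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "This is my string!";
        System.out.println(forward(text, 5, 10));   // prints "is my"
        System.out.println(backward(text, 5, 10));  // prints "ym si"
    }
}
```

Note that the character at index 10 is never visited: the end index is exclusive, so the last character returned is the one at endIndex-1.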
Table 15.11 shows four methods from this class which allow you to manipulate and query the indices of this iterator.
Method | Description |
getBeginIndex() | Retrieves the starting index for the given iterator |
getEndIndex() | Retrieves the ending index |
getIndex() | Retrieves the index of the character currently being used by the iterator |
setIndex() | Changes the current index |
Method | Description |
current() | Returns the character at the current index |
previous() | Decrements the index by 1 and returns the character at the new index |
next() | Increments the index by 1 and returns the character at the new index |
import java.text.*;
import java.util.*;

public class StringCharacterIteratorExample {
    public static void main( String args[] ) {
        try {
            String source = "This is my string! Hey there.";
            StringCharacterIterator iter = new StringCharacterIterator( source );
            int pos = 7;
            System.out.println( "Starting from position " + pos );
            // walk forward from pos to the end of the string
            System.out.print( "\t" );
            for (char c = iter.setIndex(pos);
                 c != CharacterIterator.DONE && iter.getIndex() <= iter.getEndIndex();
                 c = iter.next()) {
                System.out.print( c );
            }
            System.out.print( "\n" );
            // walk backward from pos to the beginning of the string
            System.out.print( "\t" );
            for (char c = iter.setIndex(pos);
                 c != CharacterIterator.DONE && iter.getIndex() >= iter.getBeginIndex();
                 c = iter.previous()) {
                System.out.print( c );
            }
            System.out.print( "\n" );
            // use the same iterator as the text source for a word-boundary scan
            BreakIterator bd = BreakIterator.getWordInstance();
            bd.setText( iter );
            int start = bd.first();
            for ( int end = bd.next(); end != BreakIterator.DONE; start = end, end = bd.next() ) {
                System.out.println( source.substring(start, end) );
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
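The output of this program is as follows:
Starting from position 7
         my string! Hey there.
         si sihT
This
is
my
string
!
Hey
there
.
The word iterator also returns each space as its own segment, so the program prints a line containing only a space for every space in the string; those whitespace-only lines are omitted here.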
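The word-boundary loop in the preceding example reports white space and punctuation as segments too. If you want only the words, one possible approach (a sketch, not the only way) is to keep a segment only when it starts with a letter or digit. The class and the words() helper are hypothetical names, and the sketch uses collections and generics that postdate JDK 1.1.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordsOnlySketch {
    // Hypothetical helper: returns only the segments that begin with a
    // letter or digit, dropping white space and punctuation.
    static List<String> words(String text) {
        BreakIterator bi = BreakIterator.getWordInstance(Locale.US);
        bi.setText(text);
        List<String> out = new ArrayList<String>();
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            if (Character.isLetterOrDigit(text.charAt(start))) {
                out.add(text.substring(start, end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [This, is, my, string, Hey, there]
        System.out.println(words("This is my string! Hey there."));
    }
}
```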
Writing inherently international code is not easy; many large software houses spend a lot of money in their efforts to make localization simpler. What the JDK 1.1 provides in the java.text package is a huge first step toward helping all Java developers create clean, international code. Clean, international code, in turn, will help make Java the global language we all want it to be. As the Web grows, we can no longer assume anything about the demographics of those who use what we publish.
This chapter has given you some insight not only into how to use the package, but also into what it means to make a program international and localizable. As a developer, you must shift your thought processes when you write international code; as you do that, the ideas behind the java.text package will become clearer and you will fully understand how to use all the classes described in this chapter. Good luck! Viel Glück! Bonne chance! Chuk li ho wan! Kou-unn wo Negattemasu! Buena suerte!
©Copyright, Macmillan Computer Publishing. All rights reserved.