Class com.oroinc.text.regex.Util
java.lang.Object
|
+----com.oroinc.text.regex.Util
- public final class Util
- extends Object
The Util class is a holder for useful static utility methods that can
be generically applied to Pattern and PatternMatcher instances.
This class cannot and is not meant to be instantiated.
The Util class currently contains versions of the split() and substitute()
methods inspired by Perl's split function and s operator,
respectively, although they are implemented in such a way as not to
rely on the Perl5 implementations of the OROMatcher package's regular
expression interfaces. They may operate on any interface implementations
conforming to the OROMatcher API specification for the PatternMatcher,
Pattern, and MatchResult interfaces. Future versions of the class may
include additional utility methods.
A grep method is not included for two reasons:
- The details of reading a line at a time from an input stream
differ in JDK 1.0.2 and JDK 1.1, making it difficult to
retain compatibility across both Java releases.
- Grep style processing is trivial for the programmer to implement
in a while loop. Rarely does anyone want to retrieve all
occurrences of a pattern and then process them. More often a
programmer will retrieve pattern matches and process them as they
are retrieved, which is more efficient than storing them all in a
Vector and then accessing them.
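The second reason above can be illustrated with a minimal sketch of grep-style processing in a while loop. It is not part of OROMatcher: it assumes a modern JDK and uses java.util.regex as a stand-in for the PatternMatcher and Pattern interfaces, and the GrepSketch class, its grep method, and the onMatch callback are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an OROMatcher class.
final class GrepSketch {
    /**
     * Reads the source one line at a time and hands each matching line to
     * onMatch as soon as it is found, without collecting the lines first.
     * Returns the number of matching lines.
     */
    static int grep(Pattern pattern, Reader source, Consumer<String> onMatch) {
        Matcher matcher = pattern.matcher("");
        int hits = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (matcher.reset(line).find()) {
                    onMatch.accept(line); // process the match immediately
                    hits++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Tank b123: 85\nTank empty\nTank b78: 22\n";
        int hits = grep(Pattern.compile("b\\d+:"), new StringReader(text), System.out::println);
        System.out.println(hits + " matching line(s)");
    }
}
```

Each matching line is handled inside the loop, so nothing needs to be stored in a Vector before processing begins.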
Copyright © 1997 Original Reusable Objects, Inc. All rights reserved.
- Author:
- Daniel F. Savarese
- See Also:
- Pattern, PatternMatcher
-
INTERPOLATE_ALL
- A constant passed to the substitute()
methods indicating that interpolation variables should be computed
relative to the most recent pattern match.
-
INTERPOLATE_NONE
- A constant passed to the substitute()
methods indicating that interpolation variables should be interpreted
literally, effectively disabling interpolation.
-
SPLIT_ALL
- A constant passed to the split() methods
indicating that all occurrences of a pattern should be used to
split a string.
-
SUBSTITUTE_ALL
- A constant passed to the substitute()
methods indicating that all occurrences of a pattern should be
substituted.
-
split(PatternMatcher, Pattern, String)
- Splits up a String instance into a Vector
of all its substrings using a regular expression as the delimiter.
-
split(PatternMatcher, Pattern, String, int)
- Splits up a String instance into strings contained in a Vector
of size not greater than a specified limit.
-
substitute(PatternMatcher, Pattern, String, String)
- Searches a string for a pattern and substitutes only the first
occurrence of the pattern.
-
substitute(PatternMatcher, Pattern, String, String, int)
- Searches a string for a pattern and substitutes only the first
numSubs occurrences of the pattern.
-
substitute(PatternMatcher, Pattern, String, String, int, int)
- Searches a string for a pattern and replaces occurrences
of the pattern with a substitution string, up to the number of
substitutions specified by the numSubs parameter.
SUBSTITUTE_ALL
public static final int SUBSTITUTE_ALL
- A constant passed to the substitute()
methods indicating that all occurrences of a pattern should be
substituted.
SPLIT_ALL
public static final int SPLIT_ALL
- A constant passed to the split() methods
indicating that all occurrences of a pattern should be used to
split a string.
INTERPOLATE_ALL
public static final int INTERPOLATE_ALL
- A constant passed to the substitute()
methods indicating that interpolation variables should be computed
relative to the most recent pattern match.
INTERPOLATE_NONE
public static final int INTERPOLATE_NONE
- A constant passed to the substitute()
methods indicating that interpolation variables should be interpreted
literally, effectively disabling interpolation.
split
public static Vector split(PatternMatcher matcher,
Pattern pattern,
String input,
int limit)
- Splits up a String instance into strings contained in a Vector
of size not greater than a specified limit. The
string is split with a regular expression as the delimiter.
The limit parameter means that the string is split on at most
the first limit - 1 occurrences of the pattern.
This method is inspired by the Perl split() function and behaves
identically to it when used in conjunction with the Perl5Matcher and
Perl5Pattern classes except for the following difference:
In Perl, if the split expression contains parentheses, the split()
function creates additional list elements from each of the matching
subgroups in the pattern. In other words:
split(/([,-])/, "8-12,15,18")
produces the list containing:
{ "8", "-", "12", ",", "15", ",", "18" }
The OROMatcher split method does not follow this behavior. The
following Vector would be produced by OROMatcher:
{ "8", "12", "15", "18" }
To obtain the Perl behavior, use the split method in the PerlTools
package available from
https://www.savarese.org/oro/ .
- Parameters:
- matcher - The regular expression matcher to execute the split.
- pattern - The regular expression to use as a split delimiter.
- input - The String to split.
- limit - The limit on the size of the returned Vector.
Values <= 0 produce the same behavior as using the
SPLIT_ALL constant, which causes the limit to be
ignored and splits to be performed on all occurrences of
the pattern. You should use the SPLIT_ALL constant
to achieve this behavior instead of relying on the default
behavior associated with non-positive limit values.
- Returns:
- A Vector containing the substrings of the input
that occur between the regular expression delimiter occurrences.
The input will not be split into any more substrings than the
specified limit. A way of thinking of this is that
only the first limit - 1 matches of the delimiting
regular expression will be used to split the input.
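The limit semantics described above can be sketched as follows. This is an illustration under stated assumptions, not the OROMatcher implementation: it uses java.util.regex in place of PatternMatcher and Pattern, returns a List rather than a Vector, and the SplitSketch class and the value chosen for its SPLIT_ALL constant are hypothetical (the documentation only says that values <= 0 behave like SPLIT_ALL).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not an OROMatcher class.
final class SplitSketch {
    // Assumed value; any non-positive limit means "split on every match".
    static final int SPLIT_ALL = 0;

    /** Splits input on at most the first (limit - 1) matches of pattern. */
    static List<String> split(Pattern pattern, String input, int limit) {
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int begin = 0; // start of the substring currently being collected
        while ((limit <= 0 || parts.size() < limit - 1) && m.find()) {
            parts.add(input.substring(begin, m.start()));
            begin = m.end(); // skip the delimiter; captured groups are not added
        }
        parts.add(input.substring(begin)); // remainder after the last split
        return parts;
    }

    public static void main(String[] args) {
        Pattern delimiter = Pattern.compile("([,-])");
        System.out.println(split(delimiter, "8-12,15,18", SPLIT_ALL)); // [8, 12, 15, 18]
        System.out.println(split(delimiter, "8-12,15,18", 2));         // [8, 12,15,18]
    }
}
```

With a limit of 2, only the first delimiter match is used, so the result holds two elements and the second keeps the remaining delimiters.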
split
public static Vector split(PatternMatcher matcher,
Pattern pattern,
String input)
- Splits up a String instance into a Vector
of all its substrings using a regular expression as the delimiter.
This method is inspired by the Perl split() function and behaves
identically to it when used in conjunction with the Perl5Matcher and
Perl5Pattern classes except for the following difference:
In Perl, if the split expression contains parentheses, the split()
function creates additional list elements from each of the matching
subgroups in the pattern. In other words:
split(/([,-])/, "8-12,15,18")
produces the list containing:
{ "8", "-", "12", ",", "15", ",", "18" }
The OROMatcher split method does not follow this behavior. The
following Vector would be produced by OROMatcher:
{ "8", "12", "15", "18" }
To obtain the Perl behavior, use the split method in the PerlTools
package available from
https://www.savarese.org/oro/ .
This method is identical to calling:
split(matcher, pattern, input, Util.SPLIT_ALL);
- Parameters:
- matcher - The regular expression matcher to execute the split.
- pattern - The regular expression to use as a split delimiter.
- input - The String to split.
- Returns:
- A Vector containing all the substrings of the input
that occur between the regular expression delimiter occurrences.
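To make the difference from Perl concrete, here is a rough sketch of the Perl-style behavior in which the text matched by each parenthesized group is inserted between the split substrings. It is neither the PerlTools implementation nor a complete emulation of Perl's split (zero-length matches and trailing empty fields, for example, are not treated as Perl treats them). It uses java.util.regex as a stand-in, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not the PerlTools split method.
final class PerlStyleSplitSketch {
    /** Splits on every match and also emits the text captured by each group. */
    static List<String> splitKeepingGroups(Pattern pattern, String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int begin = 0;
        while (m.find()) {
            parts.add(input.substring(begin, m.start()));
            for (int g = 1; g <= m.groupCount(); g++) {
                // Assumption: a group that did not participate is added as null.
                parts.add(m.group(g));
            }
            begin = m.end();
        }
        parts.add(input.substring(begin));
        return parts;
    }

    public static void main(String[] args) {
        Pattern delimiter = Pattern.compile("([,-])");
        for (String part : splitKeepingGroups(delimiter, "8-12,15,18")) {
            System.out.println("\"" + part + "\"");
        }
    }
}
```

Dropping the inner loop over the groups yields the behavior documented for the OROMatcher split method.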
substitute
public static String substitute(PatternMatcher matcher,
Pattern pattern,
String sub,
String input,
int numSubs,
int numInterpolations)
- Searches a string for a pattern and replaces occurrences
of the pattern with a substitution string, up to the number of
substitutions specified by the numSubs parameter. A
numSubs value of SUBSTITUTE_ALL will cause all occurrences
of the pattern to be replaced.
The substitution string may contain variable interpolations referring
to the saved parenthesized groups of the search pattern.
A variable interpolation is denoted by $1, or $2,
or $3, etc. If you want such expressions to be
interpreted literally, you should set the numInterpolations
parameter to INTERPOLATE_NONE . It is easiest to explain
what an interpolated variable does by giving an example:
Suppose you have the pattern b\d+: and you want to replace
the b's with a's and the colon with a dash in parts of
your input matching the pattern. You can do this by changing the
pattern to b(\d+): and using the substitution expression
a$1-. When a substitution is made, the $1 means
"Substitute whatever was matched by the first saved group of the
matching pattern." An input of b123: after substitution
would yield a result of a123-. But there's a little more
to be aware of. When using interpolations with the substitute()
method, if you set the numInterpolations parameter to
INTERPOLATE_ALL, then every time a match is found, the
interpolation variables are computed relative to that match.
But if numInterpolations is set to some positive integer, then
only the interpolation variables for the first numInterpolations
matches are computed relative to the most recent match. After that,
the remaining substitutions have their variable interpolations performed
relative to the numInterpolations-th match. So using the
previously mentioned pattern and substitution expression, if you have
an input of
Tank b123: 85 Tank b256: 32 Tank b78: 22
and use a numInterpolations value of INTERPOLATE_ALL and
numSubs value of SUBSTITUTE_ALL, then your result
will be:
Tank a123- 85 Tank a256- 32 Tank a78- 22
But if you set numInterpolations to 2 and keep
numSubs with a value of SUBSTITUTE_ALL, your result is:
Tank a123- 85 Tank a256- 32 Tank a256- 22
Notice how the last substitution uses the same value for $1
as the second substitution.
A final thing to keep in mind is that if you use an interpolation variable
that corresponds to a group not contained in the match, then it is
interpreted literally. So given the regular expression from the
example, and a substitution expression of a$2-, the result
of the last sample input would be:
Tank a$2- 85 Tank a$2- 32 Tank a$2- 22
Also, $0 is always interpreted literally.
Note, substitution patterns containing a $ character will take
longer to perform substitutions if INTERPOLATE_NONE isn't
used because group interpolation must be checked for.
- Parameters:
- matcher - The regular expression matcher to execute the pattern
search.
- pattern - The regular expression to search for and substitute
occurrences of.
- sub - The string used to substitute pattern occurrences.
- input - The String on which to perform substitutions.
- numSubs - The number of substitutions to perform. Only the
first numSubs pattern matches encountered are
substituted. If you want to substitute all occurrences,
set this parameter to SUBSTITUTE_ALL .
- numInterpolations - If set to INTERPOLATE_NONE, interpolation variables are
interpreted literally and not as references to the saved
parenthesized groups of a pattern match. If set to
INTERPOLATE_ALL, all variable interpolations
are computed relative to the pattern match responsible for
the current substitution. If set to a positive integer,
the first numInterpolations substitutions have
their variable interpolation performed relative to the
most recent match, but the remaining substitutions have
their variable interpolations performed relative to the
numInterpolations-th match.
- Returns:
- A String comprising the input string with the substitutions,
if any, made. If no substitutions are made, the return String
is a copy of the input String.
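The interaction between numSubs and numInterpolations described for this method can be sketched as below, reproducing the Tank example. This is a minimal illustration under stated assumptions rather than the OROMatcher implementation: it uses java.util.regex in place of PatternMatcher and Pattern, the SubstituteSketch class is hypothetical, and the numeric values given to its constants are assumptions, since the documentation does not state them. Leaving a reference to a group that did not participate in the match as literal text is also an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not an OROMatcher class.
final class SubstituteSketch {
    // Assumed values for illustration only.
    static final int SUBSTITUTE_ALL = -1;
    static final int INTERPOLATE_ALL = 0;
    static final int INTERPOLATE_NONE = -1;

    static String substitute(Pattern pattern, String sub, String input,
                             int numSubs, int numInterpolations) {
        Matcher m = pattern.matcher(input);
        StringBuilder out = new StringBuilder();
        int copiedUpTo = 0;  // end of the previous match in the input
        int done = 0;        // substitutions performed so far
        String frozen = sub; // last interpolated replacement, reused later
        while ((numSubs == SUBSTITUTE_ALL || done < numSubs) && m.find()) {
            String replacement;
            if (numInterpolations == INTERPOLATE_NONE) {
                replacement = sub; // $n is kept as literal text
            } else if (numInterpolations == INTERPOLATE_ALL || done < numInterpolations) {
                replacement = interpolate(sub, m); // relative to the current match
                frozen = replacement;
            } else {
                replacement = frozen; // relative to the numInterpolations-th match
            }
            out.append(input, copiedUpTo, m.start()).append(replacement);
            copiedUpTo = m.end();
            done++;
        }
        return out.append(input, copiedUpTo, input.length()).toString();
    }

    /** Expands $1, $2, ...; $0 and references to missing groups stay literal. */
    private static String interpolate(String sub, Matcher m) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < sub.length()) {
            char c = sub.charAt(i);
            if (c != '$') {
                sb.append(c);
                i++;
                continue;
            }
            int j = i + 1;
            int group = 0;
            // Read ASCII digits; stop once the number exceeds the group count.
            while (j < sub.length() && sub.charAt(j) >= '0' && sub.charAt(j) <= '9'
                    && group <= m.groupCount()) {
                group = group * 10 + (sub.charAt(j) - '0');
                j++;
            }
            boolean usable = j > i + 1 && group >= 1 && group <= m.groupCount()
                    && m.group(group) != null;
            sb.append(usable ? m.group(group) : sub.substring(i, j));
            i = j;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("b(\\d+):");
        String input = "Tank b123: 85 Tank b256: 32 Tank b78: 22";
        // Tank a123- 85 Tank a256- 32 Tank a78- 22
        System.out.println(substitute(p, "a$1-", input, SUBSTITUTE_ALL, INTERPOLATE_ALL));
        // Tank a123- 85 Tank a256- 32 Tank a256- 22
        System.out.println(substitute(p, "a$1-", input, SUBSTITUTE_ALL, 2));
        // Tank a$2- 85 Tank a$2- 32 Tank a$2- 22
        System.out.println(substitute(p, "a$2-", input, SUBSTITUTE_ALL, INTERPOLATE_ALL));
    }
}
```

The frozen variable is what makes the third substitution reuse the value from the second match when numInterpolations is 2.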
substitute
public static String substitute(PatternMatcher matcher,
Pattern pattern,
String sub,
String input,
int numSubs)
- Searches a string for a pattern and substitutes only the first
numSubs occurrences of the pattern.
This method is identical to calling:
substitute(matcher, pattern, sub, input, numSubs, Util.INTERPOLATE_ALL);
- Parameters:
- matcher - The regular expression matcher to execute the pattern
search.
- pattern - The regular expression to search for and substitute
occurrences of.
- sub - The string used to substitute pattern occurrences.
- input - The String on which to perform substitutions.
- numSubs - The number of substitutions to perform. Only the
first numSubs pattern matches encountered are
substituted. If you want to substitute all occurrences,
set this parameter to SUBSTITUTE_ALL .
- Returns:
- A String comprising the input string with the substitutions,
if any, made. If no substitutions are made, the return String
is a copy of the input String.
substitute
public static String substitute(PatternMatcher matcher,
Pattern pattern,
String sub,
String input)
- Searches a string for a pattern and substitutes only the first
occurrence of the pattern.
This method is identical to calling:
substitute(matcher, pattern, sub, input, 1, Util.INTERPOLATE_ALL);
- Parameters:
- matcher - The regular expression matcher to execute the pattern
search.
- pattern - The regular expression to search for and substitute
occurrences of.
- sub - The string used to substitute pattern occurrences.
- input - The String on which to perform substitutions.
- Returns:
- A String comprising the input string with the substitutions,
if any, made. If no substitutions are made, the return String
is a copy of the input String.