Splitting Strings


To split a String on a particular delimiting character or a regular expression, use the String.split() method, which has the following signature:

public String[] split(String regex)

Note that the delimiting character or regular expression is removed from the resulting String array.

Example using delimiting character:

String csvData = "John;Doe;67890;230317";
String[] csvCells = csvData.split(";");
// Result is csvCells = { "John", "Doe", "67890", "230317" };

Example using regular expression:

String inputLine = "Tell me something interesting!";
String[] wordArray = inputLine.split("\\s+"); // split on one or more whitespace characters
// Result is wordArray = {"Tell", "me", "something", "interesting!"};

You can even directly split a String literal:

String[] namesList = "Emma, John, Olivia, Ethan".split(", ");
// Result is namesList = {"Emma", "John", "Olivia", "Ethan"};

Warning: Do not forget that the parameter is always treated as a regular expression.

"aaa.bbb".split("."); // This returns an empty array

In the previous example, . is treated as the regular expression wildcard that matches any character. Since every character is a delimiter, the split produces only empty strings, and because trailing empty strings are removed by default, the result is an empty array.
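
As a minimal sketch of the difference, the snippet below contrasts the unescaped dot with an escaped one that matches only a literal dot. The class name SplitOnDot and its two helper methods are made up for illustration.

```java
import java.util.Arrays;

public class SplitOnDot {
    // Unescaped: "." is the regex wildcard, so every character is a delimiter.
    static String[] splitUnescaped(String s) {
        return s.split(".");
    }

    // Escaped: "\\." matches only a literal dot.
    static String[] splitEscaped(String s) {
        return s.split("\\.");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitUnescaped("aaa.bbb"))); // prints []
        System.out.println(Arrays.toString(splitEscaped("aaa.bbb")));   // prints [aaa, bbb]
    }
}
```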

Splitting based on a delimiter which is a regex meta-character

The following characters are considered special (also known as meta-characters) in regular expressions:

 < > - = ! ( ) [ ] { } \ ^ $ | ? * + .

To split a string on one of the above characters, either escape it with a backslash (written as \\ in a Java string literal) or use Pattern.quote():

  • Using Pattern.quote():

    // requires: import java.util.regex.Pattern;
    String s = "a|b|c";
    String regex = Pattern.quote("|");
    String[] arr = s.split(regex);

  • Escaping the special characters:

    String s = "a|b|c";
    String[] arr = s.split("\\|");

Split removes trailing empty strings

By default, split(delimiter) removes trailing empty strings from the result array. To turn this behavior off, use the overloaded version split(delimiter, limit) with limit set to a negative value, for example:

String[] split = data.split("\\|", -1);

split(regex) returns the same result as split(regex, 0).

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.

If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is negative, then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
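
The three cases can be compared side by side. The following is a small sketch, not taken from the original text: the class SplitLimitDemo, the helper splitWithLimit, and the sample input are all hypothetical and chosen so that the input ends in two empty fields.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Splits on a literal pipe using the given limit.
    static String[] splitWithLimit(String data, int limit) {
        return data.split("\\|", limit);
    }

    public static void main(String[] args) {
        String data = "a|b|c||"; // sample input with two trailing empty fields

        // Positive limit: at most 2 entries, the last one holds the rest of the input.
        System.out.println(Arrays.toString(splitWithLimit(data, 2)));  // prints [a, b|c||]

        // Zero: split everywhere, then discard trailing empty strings.
        System.out.println(Arrays.toString(splitWithLimit(data, 0)));  // prints [a, b, c]

        // Negative: split everywhere and keep trailing empty strings.
        System.out.println(Arrays.toString(splitWithLimit(data, -1))); // prints [a, b, c, , ]
    }
}
```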

Splitting with a StringTokenizer

Besides the split() method, Strings can also be split using a StringTokenizer.

StringTokenizer is more restrictive than String.split() and a bit harder to use. It is designed for pulling out tokens delimited by a fixed set of characters (given as a String): each character acts as a separator, and regular expressions are not supported. Because of this restriction, it is often faster than String.split(), by roughly a factor of two, although the actual difference depends on the input and the JVM.

The default delimiter set is " \t\n\r\f" (space, tab, newline, carriage return, and form feed). The following example prints each word on a separate line.

String newStr = "a quick brown dog leaped over the lazy cat";
StringTokenizer newTokenizer = new StringTokenizer(newStr);
while (newTokenizer.hasMoreTokens())
{
	System.out.println(newTokenizer.nextToken());
}

This will print out:

a
quick
brown
dog
leaped
over
the
lazy
cat

You can also supply your own set of delimiter characters.

String newStr = "brown fox jumps";
// In this case, `b`, `o`, and `s` will be used as delimiters
StringTokenizer newTokenizer = new StringTokenizer(newStr, "bos");
while (newTokenizer.hasMoreTokens())
{
	System.out.println(newTokenizer.nextToken());
}

This will print out:

r
wn f
x jump
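
One practical difference from split() is that StringTokenizer never returns empty tokens: consecutive delimiters are skipped as a single gap, whereas split() on the equivalent character class keeps the empty string between them. The sketch below illustrates this. The class TokenizerVsSplit, the helper tokenize, and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerVsSplit {
    // Collects all tokens produced by a StringTokenizer with the given delimiter set.
    static List<String> tokenize(String s, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, delims);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "a,b;;c"; // note the two consecutive delimiters

        // StringTokenizer skips the empty field between ";" and ";".
        System.out.println(tokenize(s, ",;"));                // prints [a, b, c]

        // split() with a character class keeps it as an empty string.
        System.out.println(Arrays.toString(s.split("[,;]"))); // prints [a, b, , c]
    }
}
```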

Basic Programs