Paul Kiddie

Regex to match lines containing multiple strings in Java

July 30, 2009

I spent a disproportionate amount of time today trying to get my head round some Java regular expression code that would match lines in a log file containing a given set of words. Thanks to this question on Stack Overflow, which asked something similar, I converted its code from C# to Java (adding extra escape characters, as Java doesn't support C#'s ‘@’ verbatim string literal prefix). I then wrapped it in a function that dynamically generates a regular expression from an array of strings (note that the following code uses the excellent Apache Commons Lang library for its StringUtils.join method, one of the gaps it fills in the standard Java libraries):

    import org.apache.commons.lang3.StringUtils;

    public static String constructRegexOr(String[] str) {
        return String.format("\\b(%s)\\w*\\b", StringUtils.join(str, "|"));
    }

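Essentially, the strings in the array are joined with the regex alternation operator |, so if a string contains any (one or more) of str[0], str[1], …, the regular expression matcher regards it as a match. Because of the \w* after the group, a word that merely starts with one of the strings (“andrew” for “and”, say) also counts.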
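To make the alternation concrete, here is a minimal, self-contained sketch. It assumes Java 8 or later and swaps StringUtils.join for the JDK's String.join so that it runs without Commons Lang; the class name RegexOrDemo and the sample words are made up for illustration. It prints the generated pattern and then tries it against a few single words:

```java
import java.util.regex.Pattern;

public class RegexOrDemo {
    // Same pattern as constructRegexOr above, but built with String.join (Java 8+)
    // instead of Commons Lang's StringUtils.join.
    static String constructRegexOr(String[] str) {
        return String.format("\\b(%s)\\w*\\b", String.join("|", str));
    }

    public static void main(String[] args) {
        String regex = constructRegexOr(new String[] {"apples", "and", "oranges"});
        System.out.println(regex); // \b(apples|and|oranges)\w*\b

        Pattern p = Pattern.compile(regex);
        // Each word is the entire input here
        for (String word : new String[] {"apples", "oranges", "andrew", "pears"}) {
            System.out.println(word + " -> " + p.matcher(word).matches());
        }
        // apples -> true, oranges -> true,
        // andrew -> true (starts with "and"; \w* consumes "rew"), pears -> false
    }
}
```

The \w* is what lets “andrew” through; without it, \b(apples|and|oranges)\b would only accept the exact words.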

Excellent, I thought. I tested the C# version first by creating an instance of the Regex class with the regular expression from Stack Overflow and calling its IsMatch(str) method; it worked, returning true or false as appropriate.

In Java, though, I ran into problems with both str.matches(regex) and the longer-winded way:

    ...
    Pattern p = Pattern.compile(regex);
    if (p.matcher(str).matches()) {
        return entry;
    }

Neither worked as I expected. While hunting for an explanation, I found this article, which says, with specific reference to Java's regular expression implementation:

“regex” is applied as if you had written “^regex$” with start and end of string anchors. This is different from most other regex libraries

This was a revelation. I made one small change, using find() instead of matches():

    ...
    Pattern p = Pattern.compile(regex);
    if (p.matcher(str).find()) {
        return entry;
    }

The regular expression now matched as expected! That was my ‘gotcha’ of the day. I’ve pasted the implementation below, with some test values:

    String[] str = new String[] { "apples", "and", "oranges" };
    String regex = String.format("\\b(%s)\\w*\\b", StringUtils.join(str, "|"));
    String strToTest = "oranges and lemons";

    Pattern p = Pattern.compile(regex);
    boolean matches = p.matcher(strToTest).find(); // true
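For comparison, here is a self-contained sketch (Java 8+, no Commons Lang) that runs the same test string through matches(), lookingAt() and find(), and then applies find() line by line, as in the original log-file use case. The class name LogFilter, the helpers anyWordPattern and filterLines, and the sample log lines are all hypothetical; escaping each word with Pattern.quote is an addition that is not in the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class LogFilter {
    // Hypothetical helper: builds \b(w1|w2|...)\w*\b, quoting each word with
    // Pattern.quote (an addition) so characters like '.' or '(' match literally.
    static Pattern anyWordPattern(String... words) {
        List<String> quoted = new ArrayList<>();
        for (String w : words) {
            quoted.add(Pattern.quote(w));
        }
        return Pattern.compile(String.format("\\b(%s)\\w*\\b", String.join("|", quoted)));
    }

    // Hypothetical helper: keeps the lines in which the pattern occurs anywhere.
    static List<String> filterLines(List<String> lines, Pattern p) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (p.matcher(line).find()) { // find(), not matches()
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Pattern p = anyWordPattern("apples", "and", "oranges");
        String strToTest = "oranges and lemons";

        System.out.println(p.matcher(strToTest).matches());   // false: the whole line must match
        System.out.println(p.matcher(strToTest).lookingAt()); // true: anchored at the start only
        System.out.println(p.matcher(strToTest).find());      // true: a match anywhere will do

        // Made-up log lines; a real file could be read with java.nio.file.Files.readAllLines
        List<String> log = Arrays.asList(
                "INFO oranges shipped",
                "DEBUG pears in stock",
                "WARN apples delayed");
        for (String hit : filterLines(log, p)) {
            System.out.println(hit); // prints the INFO and WARN lines
        }
    }
}
```

lookingAt() sits between the other two: like matches() it is anchored at the start of the input, but like find() it does not need to consume the whole line.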

👋 I'm Paul Kiddie, a software engineer working in London. I'm currently working as a Principal Engineer at trainline.