Regular expression exclude word

I want to use a regular expression to exclude a complete word. I need this for a particular situation, which I explain below.

Problem

As part of implementing a vulnerability waiver process for infected 3rd-party libraries, I have a Jira transition dialog that expects the user to set some values. There are two drop-down fields, or as Jira calls them, “Select List (single choice)”. These always present the value None in case nothing is selected.

Figure: An issue transition dialog with drop-down fields containing ‘None’ values

To ensure that a proper value is selected when doing a transition to a specific state, we use Jira validators. These validators support regular expressions, so the question is: how do I ensure that the selected value is not None?

Solution

After some searching on the web, I found a solution in the Regular Expressions Cookbook, whose sample is readable online. The solution is:

\b(?!None\b)\w+

The result is a proper evaluation of the value in the dialog:

Figure: Validation of the drop-down fields containing ‘None’ values

Explaining the details

As explained in the Regular Expressions Cookbook and on regular-expressions.info, you can understand why the above solution works:

  • Negated character classes

    Typing a caret ^ after the opening square bracket negates the character class. The result is that the character class matches any character that is not in the character class.

    The issue with this is that a character class works on single characters: [^None] matches one character that is not N, o, n or e. A repeated class such as [^None]+ therefore rejects anything containing one of those letters, but we care about the whole word.

  • Word boundaries

    \b allows you to perform a “whole words only” search using a regular expression in the form of \bword\b.

    The issue is that \b[^None]\w+\b still relies on the character class, which tests a single character rather than the whole word. For a one-word value, it rejects every word whose first letter is N, o, n or e (such as “enabled”) instead of rejecting exactly the word None.

  • Negative lookahead

    Similar to positive lookahead, except that negative lookahead only succeeds if the regex inside the lookahead fails to match.
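The two character-class pitfalls can be checked with a minimal Java sketch. It uses `java.util.regex` and assumes a full match of a single value. Whether the Jira validator behaves the same way is not stated here, and the sample values are made up for illustration.

```java
import java.util.regex.Pattern;

public class CharacterClassPitfalls {
    // Full-match helper used to compare candidate patterns on single values
    static boolean accepts(String regex, String value) {
        return Pattern.matches(regex, value);
    }

    public static void main(String[] args) {
        String negatedClass = "[^None]+";           // every character must avoid N, o, n, e
        String boundedClass = "\\b[^None]\\w+\\b";  // only the first character is tested
        for (String v : new String[] {"None", "Low", "enabled", "High"}) {
            System.out.println(v + ": " + accepts(negatedClass, v) + " " + accepts(boundedClass, v));
        }
    }
}
```

Neither pattern rejects exactly “None”. The first one also rejects “Low” because of its o, and the second rejects “enabled” because it starts with e.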

So the final solution combines the techniques mentioned above:

  • \b asserts the position at a word boundary
  • (?! not followed by
  • None the word we want to “ignore”, i.e. that should not match
  • \b asserts the position at a word boundary
  • ) ends the negative lookahead
  • \w+ still matches any other word
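Here is a minimal Java sketch of how the final expression treats typical drop-down values. It assumes the validator searches for the pattern (`Matcher.find`) rather than requiring a full match; the text above does not say which. The `isValidSelection` helper and the sample values are hypothetical.

```java
import java.util.regex.Pattern;

public class ExcludeNoneDemo {
    // \b(?!None\b)\w+ : a word that is not exactly "None"
    static final Pattern NOT_NONE = Pattern.compile("\\b(?!None\\b)\\w+");

    // Hypothetical helper: true if the value contains a word other than "None".
    // Uses find(), i.e. assumes the validator searches rather than full-matches.
    static boolean isValidSelection(String value) {
        return NOT_NONE.matcher(value).find();
    }

    public static void main(String[] args) {
        for (String v : new String[] {"None", "High", "Nonesuch", "Accept Risk"}) {
            System.out.println(v + " -> " + isValidSelection(v));
        }
    }
}
```

With a search, a value such as “None of these” would also pass, because “of” is a word other than None. A full match (`Matcher.matches`) would instead reject any value that contains a space, including “Accept Risk”.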

5.4. Find All Except a Specific Word

Problem

You want to use a regular expression to match any complete word except cat. Catwoman, vindicate, and other words that merely contain the letters “cat” should be matched, just not cat.

Solution

A negative lookahead can help you rule out specific words, and is
key to this next regex:

\b(?!cat\b)\w+
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
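Java is one of the listed flavors, so the recipe can be tried there directly. The sketch below applies the “case insensitive” option via `Pattern.CASE_INSENSITIVE` and collects every match; the sample sentence is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllExceptCat {
    // \b(?!cat\b)\w+ with the recipe's "case insensitive" option
    private static final Pattern NOT_CAT =
            Pattern.compile("\\b(?!cat\\b)\\w+", Pattern.CASE_INSENSITIVE);

    // Returns every whole word in the text except "cat" (in any case)
    static List<String> wordsExceptCat(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = NOT_CAT.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [Catwoman, saw, a, and, a, in, vindicate]
        System.out.println(wordsExceptCat("Catwoman saw a Cat and a cat in vindicate"));
    }
}
```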

Discussion

Although a negated character class (written as [^]) makes it easy to match anything except a specific character, you can’t just write [^cat] to match anything except the word cat. [^cat] is a valid regex, but it matches any character except c, a, or t. Hence, although \b[^cat]+\b would avoid matching the word cat, it wouldn’t match the word time either, because it contains the forbidden letter t. The regular expression \b[^c][^a][^t]\w* is no good either, because it would reject any word with c as its first letter, a as its second letter, or t as its third. Furthermore, that doesn’t restrict the first three letters to word characters, and it only matches words with at least three characters since none of the negated character classes are optional.
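Each failure mode described above can be reproduced with a small Java sketch that full-matches single words. The word list is chosen purely for illustration.

```java
import java.util.regex.Pattern;

public class WhyNegatedClassesFail {
    // Full match of one word against one candidate pattern
    static boolean accepts(String regex, String word) {
        return Pattern.matches(regex, word);
    }

    public static void main(String[] args) {
        String repeatedClass = "\\b[^cat]+\\b";        // rejects any word containing c, a or t
        String positionalClasses = "\\b[^c][^a][^t]\\w*"; // rejects by position, needs 3+ chars
        String lookahead = "\\b(?!cat\\b)\\w+";        // rejects only the word cat
        for (String w : new String[] {"cat", "time", "bat", "go"}) {
            System.out.println(w + ": " + accepts(repeatedClass, w) + " "
                    + accepts(positionalClasses, w) + " " + accepts(lookahead, w));
        }
    }
}
```

Only the lookahead accepts time, bat and go while still rejecting cat.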

With all that in mind, let’s take another look at how the regular expression shown at the beginning of this recipe solved the problem:

\b     # Assert position at a word boundary.
(?!    # Not followed by:
  cat  #   Match "cat".
  \b   #   Assert position at a word boundary.
)      # End the negative lookahead.
\w+ ...
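In Java, this commented, free-spacing layout can be compiled as-is with the `Pattern.COMMENTS` flag, which ignores whitespace and treats `#` up to the end of the line as a comment. The source text is cut off after `\w+`, so the comment on that last line was written for this sketch.

```java
import java.util.regex.Pattern;

public class FreeSpacingCat {
    // Pattern.COMMENTS ignores whitespace and "#" comments inside the pattern
    static final Pattern NOT_CAT = Pattern.compile(
            "\\b     # Assert position at a word boundary.\n" +
            "(?!     # Not followed by:\n" +
            "  cat   #   Match \"cat\".\n" +
            "  \\b   #   Assert position at a word boundary.\n" +
            ")       # End the negative lookahead.\n" +
            "\\w+    # Match one or more word characters.\n",
            Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        System.out.println(NOT_CAT.matcher("cat").matches());      // false
        System.out.println(NOT_CAT.matcher("Catwoman").matches()); // true
    }
}
```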

If you want to exclude a certain word or string in a search pattern, a good way to do this is to use regular expression assertions (lookaround). They are indispensable if you want to match something that is not followed by something else.

A Simple Example

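import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadExample {
    public static void main(String[] args) {
        String str = "programcreek";
        // Positive lookahead: "program" must be immediately followed by "creek"
        Pattern p = Pattern.compile(".*program(?=creek).*");
        Matcher m = p.matcher(str);

        if (m.matches()) {
            System.out.println("Match!");
        } else {
            System.out.println("No");
        }
    }
}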

1. Look Ahead

In the example above, suppose you want to match «programcreek» but not «programriver». You can use the pattern .*program(?=creek).* from the example:

programcreek matches
programriver doesn’t match

(?=...) is positive lookahead and (?!...) is negative lookahead.
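Here is a short Java sketch that runs the positive lookahead from the example next to its negative counterpart, `.*program(?!creek).*`, on both strings.

```java
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Positive lookahead: "program" must be followed by "creek"
    static final Pattern POSITIVE = Pattern.compile(".*program(?=creek).*");
    // Negative lookahead: "program" must NOT be followed by "creek"
    static final Pattern NEGATIVE = Pattern.compile(".*program(?!creek).*");

    public static void main(String[] args) {
        for (String s : new String[] {"programcreek", "programriver"}) {
            System.out.println(s + ": positive=" + POSITIVE.matcher(s).matches()
                    + ", negative=" + NEGATIVE.matcher(s).matches());
        }
    }
}
```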

2. Look Behind

Lookbehind is similar. We can use (?<=...) for positive lookbehind and (?<!...) for negative lookbehind.

programcreek matches
softwarecreek doesn’t match
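The lookbehind pattern itself is not shown above. The Java sketch below assumes `.*(?<=program)creek.*`, which produces the stated results, and shows the negative form alongside it.

```java
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Assumed pattern: "creek" must be immediately preceded by "program"
    static final Pattern POSITIVE = Pattern.compile(".*(?<=program)creek.*");
    // Negative lookbehind: "creek" must NOT be immediately preceded by "program"
    static final Pattern NEGATIVE = Pattern.compile(".*(?<!program)creek.*");

    public static void main(String[] args) {
        for (String s : new String[] {"programcreek", "softwarecreek"}) {
            System.out.println(s + ": positive=" + POSITIVE.matcher(s).matches()
                    + ", negative=" + NEGATIVE.matcher(s).matches());
        }
    }
}
```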

The Data Suite uses the full power of regular expressions to exclude elements from your results. That makes it easy for you to handle millions of files and folders.

You do not have to worry about upper and lower case, because matching is case-insensitive. In addition, the filter is always evaluated as a full match, which means the ^ and $ anchors are handled for you.
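To see what “case-insensitive full match” implies for writing filters, here is a Java sketch that emulates it. This is not The Data Suite’s code. The `isExcluded` helper is hypothetical and only mirrors the two rules stated above.

```java
import java.util.regex.Pattern;

public class FullMatchFilter {
    // Hypothetical helper, not the product's code: applies a filter
    // case-insensitively and as a full match, as if wrapped in ^...$
    static boolean isExcluded(String filter, String path) {
        return Pattern.compile(filter, Pattern.CASE_INSENSITIVE)
                .matcher(path)
                .matches();
    }

    public static void main(String[] args) {
        // Case does not matter: the upper-case filter also excludes "notes.txt"
        System.out.println(isExcluded(".*\\.TXT", "notes.txt"));       // true
        // Full match: "notes" alone does not exclude "my-notes.txt"
        System.out.println(isExcluded("notes", "my-notes.txt"));       // false
        // Surrounding the filter with .* turns it into a "contains" check
        System.out.println(isExcluded(".*notes.*", "my-notes.txt"));   // true
    }
}
```

Because of the full-match rule, most filters start with .* so they can match anywhere in a path.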

Pay attention to the following characters:

RegEx             Description
\                 The backslash is the escape indicator. You have to escape some characters in your pattern, such as . (dot), - (dash), \ (backslash) and $ (dollar).
.                 Matches any single character.
[]                Matches a single character that is contained within the brackets.
                  Example: [0-3a-c] matches only a single character of 0123abc.
[^]               Matches a single character that is not contained within the brackets.
*                 Matches the preceding element zero or more times.
.*                Combination of the two: matches any single character zero or more times.
(RegEx)|(RegEx)   Combines two or more patterns with a logical OR. Please put each pattern in parentheses to separate them from each other.
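A few rows of the table can be checked with the Java sketch below. It assumes the same case-insensitive full-match evaluation described above; the `fullMatch` helper and the file names are illustrative only.

```java
import java.util.regex.Pattern;

public class FilterSyntaxExamples {
    // Illustrative helper: case-insensitive full match
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Unescaped dot matches any character, so "reportXtxt" matches too
        System.out.println(fullMatch("report.txt", "reportXtxt"));            // true
        // Escaped dot matches only a literal "."
        System.out.println(fullMatch("report\\.txt", "reportXtxt"));          // false
        // [0-3a-c] matches exactly one character from 0123abc
        System.out.println(fullMatch("[0-3a-c]", "b"));                       // true
        System.out.println(fullMatch("[0-3a-c]", "d"));                       // false
        // (RegEx)|(RegEx): logical OR of two patterns
        System.out.println(fullMatch("(.*\\.tmp)|(.*\\.bak)", "data.BAK"));   // true
        System.out.println(fullMatch("(.*\\.tmp)|(.*\\.bak)", "data.txt"));   // false
    }
}
```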

For more details, please refer to a regular expression reference. There are also many online regex testers; if you are not familiar with regular expressions, give them a try. You can also contact our support team.

Exclude Path

If you add a path to the exclude list, you can do it in two ways:

  • with escaping (\\\\YourServer\\YourShare\\Folder\\)
  • without escaping (//YourServer/YourShare/Folder/)

A folder must always end with a slash; otherwise, it will be interpreted as a file!

We added some best practice examples as default filters:

Pattern                      Description
.*\\~snapshot\\              Exclude all folders named ~snapshot
                             (e.g. NetApp uses these folders to store the snapshots (backups) of all files in them.)
.*\\~snapshot                Exclude all files named ~snapshot
.*\\.svn\\                   Exclude all folders named .svn
                             (e.g. Subversion uses these folders to store synchronization information.)
.*/archive/.*.txt
.*\\archive\\.*.txt          Exclude all .txt files in all folders named archive and their subfolders
.*\\archive\\[^\\]*.txt      Exclude all .txt files in the archive folder but not in its subfolders
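The difference between the two archive filters, and the trailing-slash rule for folders, can be checked with a Java sketch. It assumes that each pattern is written with doubled backslashes as in the table, and that paths are full-matched case-insensitively as described above. The server, share and file names are made up.

```java
import java.util.regex.Pattern;

public class DefaultFilterCheck {
    // Assumed evaluation: case-insensitive full match of the whole path
    static boolean excluded(String filter, String path) {
        return Pattern.compile(filter, Pattern.CASE_INSENSITIVE).matcher(path).matches();
    }

    public static void main(String[] args) {
        // Java string literals double every backslash once more
        String snapshotFolder = ".*\\\\~snapshot\\\\";           // regex: .*\\~snapshot\\
        String archiveAll = ".*\\\\archive\\\\.*.txt";            // regex: .*\\archive\\.*.txt
        String archiveTopOnly = ".*\\\\archive\\\\[^\\\\]*.txt";   // regex: .*\\archive\\[^\\]*.txt

        String inArchive = "\\\\server\\share\\archive\\a.txt";
        String inSubfolder = "\\\\server\\share\\archive\\2019\\a.txt";

        System.out.println(excluded(archiveAll, inArchive));        // true
        System.out.println(excluded(archiveAll, inSubfolder));      // true
        System.out.println(excluded(archiveTopOnly, inArchive));    // true
        System.out.println(excluded(archiveTopOnly, inSubfolder));  // false
        // Trailing backslash marks a folder; without it the path reads as a file
        System.out.println(excluded(snapshotFolder, "\\\\server\\share\\~snapshot\\")); // true
        System.out.println(excluded(snapshotFolder, "\\\\server\\share\\~snapshot"));   // false
    }
}
```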
  • Question

  • Currently I am using a regex to accept some records from a file. That regex is ^[a-zA-Z0-9| ].*$. It accepts records starting with alphanumerics or a blank space.
    I want to modify this regex so that the previous conditions still apply, but records starting with the word ‘bbb’ are not accepted.
    Which regex should be used?

    • Edited by

      Tuesday, March 9, 2010 4:27 PM

    • Moved by
      OmegaManModerator
      Thursday, August 26, 2010 2:51 PM
      Regex question (From:Visual C# Language)

Answers

  • The pattern you have does not accurately match the description you provided. You want to allow alphanumerics or spaces, yet you’ve used the «or» pipe symbol within a character class. By doing so you are actually allowing the «|» character to be a valid match. To «or» things correctly you should use a group: ([A-Z]| ) but in your case this isn’t needed. Simply remove the «|» from the character class and it will work as you originally intended:

    ^[a-zA-Z0-9 ].*$


    To answer your question, you can use a negative lookahead to prevent matches that start with the word «sample». This would be:

    ^(?!sample)[a-zA-Z0-9 ].*$

    Bear in mind that you can shorten your pattern by using the \w metacharacter, which matches alphanumeric characters (as well as the underscore). The new pattern would be:

    ^(?!sample)[\w ].*$

    EDIT: you changed the word «sample» to «bbb» in your recent edit. Switch those words in the patterns given above.


    Document my code? Why do you think it’s called «code»?

    • Edited by
      Ahmad Mageed
      Tuesday, March 9, 2010 4:35 PM
      moved patterns to code block, forum was adding unintended spaces
    • Marked as answer by
      kkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
      Wednesday, March 10, 2010 8:22 AM

  • Using the pattern I mentioned earlier here’s a snippet to demonstrate:

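    using System;
    using System.Text.RegularExpressions;

    string[] inputs = { "abc1132456   1456621", "bbb1246661   164992" };

    // Reject records starting with "bbb"; otherwise allow a word character or space first
    string pattern = @"^(?!bbb)[\w ].*$";
    foreach (string input in inputs)
    {
        Console.WriteLine("{0}: {1}", Regex.IsMatch(input, pattern), input);
    }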


    • Marked as answer by
      kkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
      Wednesday, March 10, 2010 8:22 AM

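The first answer points out that the pipe inside the character class is a literal character. That claim can be checked with a Java sketch, since character classes behave the same way in `java.util.regex` as in .NET for this case. The “corrected” pattern combines the answer’s fix with the «bbb» exclusion. The record “|abc” is made up to expose the difference.

```java
import java.util.regex.Pattern;

public class PipeInCharacterClass {
    // Pattern from the question: "|" inside [...] is just another allowed character
    static final Pattern ORIGINAL = Pattern.compile("^[a-zA-Z0-9| ].*$");
    // Answer's fix plus the negative lookahead that rejects records starting with "bbb"
    static final Pattern CORRECTED = Pattern.compile("^(?!bbb)[a-zA-Z0-9 ].*$");

    public static void main(String[] args) {
        String[] records = {"abc1132456   1456621", "bbb1246661   164992", "|abc"};
        for (String r : records) {
            System.out.println(r + ": original=" + ORIGINAL.matcher(r).matches()
                    + ", corrected=" + CORRECTED.matcher(r).matches());
        }
    }
}
```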
