01 September 2016

RegExp: Constraint to prevent spaces at the beginning or end.

Even though a space is a regular character, the client didn't want spaces at the beginning or end of a string. Any spaces in the middle were fine.
Of course this could be handled by the application, but it must also be implemented in the database. Using a check constraint with a regular expression will prevent the end user from entering unwanted data.

To try things out, let's just start with a simple table with a single column.

   create table test
   (str varchar2(10));

Now the magic part: add a check constraint using a regular expression

   alter table test
   add constraint str_chk check (
      regexp_like (str, '^[^[:space:]].+[^[:space:]]$')
    );

The regular expression reads: The string should start (the first caret) with any character which is not in the character class of [:space:], followed by one or more characters (the period) and it should end with (the dollar) any character as long as it's not in the character class of [:space:].

UPDATE - 07-Sept-2016

The expression that is used in the above constraint will also prevent from a single allowed character or double character to be entered. This omission has been corrected by David Grimberg. The correct regular expression should be:

^[^[:space:]](.*[^[:space:]])?$

To test the constraint, the following insert statement were used.

   insert into test values ('hello');
   insert into test values ('hel lo');
   -- Not allowed:
   --   starting with a space
   insert into test values (' hello');
   --   ending with a space
   insert into test values ('hello ');
   --   just a space
   insert into test values (' ');
   --   multiple spaces
   insert into test values ('   ');
   --   Tab
   insert into test values (chr(9));
   --   Line feed
   insert into test values (chr(10));
   --   Carrige Return
   insert into test values (chr(13));

6 comments:

  1. Your regexp also enforces a three character minimum length. As such inserting either of the following strings will fail as well:

    'a', 'ab'.

    Change the regexp to use asterisk (*) instead of plus (+), and wrap make the last two patterns optional to resolve this:

    '^[^[:space:]](.*[^[:space:]])?$'

    ReplyDelete
    Replies
    1. Yes, you're absolutely right. I will amend the blogpost. Thank you very much.

      Delete
  2. Sorry for the late reaction, only just run into this post.
    Nice use of regexp.
    But as was shown, using regexp can easily lead to little bugs due to its inherent complexity.
    If you need it, or the alternative is even more complex, then of course you should use regexp.
    But in this case why not use a simple straight forward check constraint that does the job and doesn't need explanation of what is does, hence is more maintainable by other developers.
    I'm talking about a constraint like:

    alter table test
    add constraint str_chk check (
    coalesce(str,'X') = trim(coalesce(str,'X'))
    );

    ReplyDelete
    Replies
    1. Thanks for that idea, I like it when things are simpler and easier to maintain. However in this case the check constraint that you are proposing is not equivalent to the REGEXP variation. The REGEXP variation will also exclude all "space characters" like new line, line feed, tabs and so on.

      Delete
    2. Ah, I see. You meant whitespace rather than spaces.
      Then I didn't understand the real requirement correctly.
      In that case yours is the better solution.
      Though it could still be done with lots and lots of nested trims, that would fall in the category "more complex and error prone"

      Delete
    3. Yes, you're right, I should have made it more clear that whitespace was meant. Guess I was too much into the [:space:] character class :)

      Delete