Faced with a requirement to validate the format of email addresses in a CICS application, one developer used Java regular expression support (java.util.regex), which meant starting a JVM server in multiple CICS regions at about 90 MB of storage per region. I suspect the developer took the regular expression from the W3C HTML5 specification, which defines a pattern for valid email addresses. He was using something like
^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$
The JVM server storage overhead seemed excessive to me for such a small use, so I thought, why not use the C runtime regcomp() and regexec() functions for the regular expression support? This shouldn’t be any different from my earlier uses of C runtime functions from COBOL programs, except that this would run under CICS.
So I coded up a routine using
       77  ws-validate-email-pattern pic x(133) value
           z"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-
      -    "]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-
      -    "zA-Z0-9])?)*$".
and
           call 'regcomp' using by reference ws-preg
                                             ws-validate-email-pattern
                                by value     ws-regcomp-cflags
                          returning ws-rc
and
           call 'regexec' using by reference ws-preg
                                             ws-email-address
                                by value     ws-zero
                                             ws-null-ptr
                                             ws-regexec-eflags
                          returning ws-rc
           if ws-rc = zero then
               move 'Good email:' to ws-output-result
           else
               if ws-rc = 1 then
                   move 'Bad email:' to ws-output-result
               else
                   move 'REGEX error:' to ws-output-result
               end-if
           end-if
and, lo and behold, it didn’t work. After some trial and error, and calls to regerror() for diagnostics, I figured out that the C runtime regcomp()/regexec() support does not handle the (?: ) syntax for non-capturing groups. That syntax appears in the pattern the developer was using, and java.util.regex supports it, but it is a Perl-style extension that POSIX extended regular expressions do not define.
I was able to get a pattern such as this
       77  ws-validate-email-pattern pic x(50) value
           z"^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$".
and the earlier pattern with the ?: sequences removed
       77  ws-validate-email-pattern pic x(127) value
           z"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9]([a-zA-Z0-9-]{
      -    "0,61}[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-
      -    "9])?)*$".
to work.
So, if I want to support non-capturing groups, perhaps I could use the std::basic_regex class template (std::regex) from the Standard C++ Library <regex> header, whose default ECMAScript grammar supports them? Well, apparently not from a COBOL program. But it shouldn’t be too hard to write a C++ CICS program… No, not too hard.