VALIDATING E-MAIL ADDRESSES IN ASP.NET WITH THE REGULAREXPRESSIONVALIDATOR CONTROL
By Marco Bellinaso (mbellinaso@vb2themax.com)
Here is how we can declare, in an ASP.NET page, the TextBox and the validators
that ask the user to enter an e-mail address:
<asp:TextBox runat="server" CssClass="TextBox" ID="NewEmail" Width="100%" />
<asp:RequiredFieldValidator runat="server" ControlToValidate="NewEmail"
    Display="dynamic"><br>* Email is required</asp:RequiredFieldValidator>
<asp:RegularExpressionValidator runat="server"
    ValidationExpression=".*@.*\..*" ControlToValidate="NewEmail"
    Display="dynamic"><br>* This Email address is not valid</asp:RegularExpressionValidator>
The expression .*@.*\..* means that the string must begin with any number of
characters (.*, that is, zero or more occurrences of any character), then it
must contain an @ character, then any number of characters, a period (escaped as
\. because an unescaped . matches any character), and finally any number of
characters. For example, marco@thephile.com is a valid e-mail address, while
marco@thephile and marco.thephile.com are invalid addresses.
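To see how the validator treats a few inputs, here is a small sketch in Python. Python is only a stand-in: the validator itself evaluates the expression with the .NET regex engine on the server and with JavaScript's RegExp on the client, but these constructs behave the same way there. RegularExpressionValidator accepts the input only when the match covers the whole text, so the sketch uses re.fullmatch; the function name is_valid_email is hypothetical.

```python
import re

# Same expression as the ValidationExpression attribute above.
EMAIL_PATTERN = r".*@.*\..*"

def is_valid_email(text: str) -> bool:
    # The validator requires the match to span the entire input,
    # so fullmatch() (not search()) mimics its behavior.
    return re.fullmatch(EMAIL_PATTERN, text) is not None

for address in ["marco@thephile.com", "marco@thephile", "marco.thephile.com", "@."]:
    print(repr(address), is_valid_email(address))
# 'marco@thephile.com' True
# 'marco@thephile' False
# 'marco.thephile.com' False
# '@.' True
```

The last sample shows how permissive the expression is: "@." passes because every .* is allowed to match zero characters.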
The following tables summarize the most commonly used regular expression
constructs. First of all, let's see how to express the characters that we want
to match:
Character            Description
ordinary characters  characters other than . $ ^ { [ ( | ) * + ? \ match themselves
[\b]                 matches a backspace (outside a character class, \b matches
                     a word boundary instead)
\t                   matches a tab
\r                   matches a carriage return
\v                   matches a vertical tab
\f                   matches a form feed
\n                   matches a newline
\                    if followed by one of the special characters listed in the
                     first row, matches that character literally; for example,
                     \+ matches a + character
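A quick check of these escapes, sketched in Python as a stand-in for the .NET engine (Python's re module treats these particular escapes the same way). The helper name matches is hypothetical; it tests the whole string, as the validator does.

```python
import re

def matches(pattern: str, text: str) -> bool:
    # Whole-string match, mirroring RegularExpressionValidator.
    return re.fullmatch(pattern, text) is not None

print(matches(r"1\+1", "1+1"))     # True: \+ is a literal plus sign
print(matches(r"1+1", "1+1"))      # False: unescaped + means "one or more 1s"
print(matches(r"a\tb", "a\tb"))    # True: \t matches the tab character
print(matches(r"[\b]", "\b"))      # True: inside [], \b is a backspace
print(matches(r"\bcat\b", "cat"))  # True: outside [], \b is a word boundary
```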
In addition to single characters, we can specify a class or a range of
characters that can be matched at a given position. For example, we may want to
allow any digit or any vowel in a position and reject all other characters. The
following character classes allow you to do this:
Class          Description
.              matches any character except \n
[aeiou]        matches any single character specified in the set
[^aeiou]       matches any character not specified in the set
[3-7a-dA-D]    matches any character in the specified ranges (in the example,
               the ranges are 3-7, a-d and A-D)
\w             matches any word character, that is, any alphanumeric character
               or the underscore (_)
\W             matches any non-word character
\s             matches any whitespace character (space, tab, form feed, newline,
               carriage return or vertical tab)
\S             matches any non-whitespace character
\d             matches any decimal digit
\D             matches any character that is not a decimal digit
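The character classes can be tried out with a short Python sketch (Python's re is a stand-in for the .NET engine; both treat these classes alike for the ASCII samples used here). Each pattern is matched against the whole string, as the validator would do.

```python
import re

# (pattern, input) pairs; the comment gives the expected whole-string result.
checks = [
    (r"[aeiou]", "e"),        # True: e is in the set
    (r"[^aeiou]", "e"),       # False: the negated set excludes e
    (r"[3-7a-dA-D]", "C"),    # True: C falls in the A-D range
    (r"[3-7a-dA-D]", "8"),    # False: 8 is outside 3-7
    (r"\w+", "user_name42"),  # True: letters, digits and underscore
    (r"\d\d\s\D", "42 x"),    # True: two digits, a space, a non-digit
    (r".", "\n"),             # False: . does not match a newline
]
for pattern, text in checks:
    print(pattern, repr(text), re.fullmatch(pattern, text) is not None)
```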
We can also specify that a character or class of characters must be present at
least once, or between 2 and 6 times, and so on. Quantifiers are placed just
after a character or a class of characters and specify how many times the
preceding character or class must be matched:
Quantifier     Description
*              zero or more matches
+              one or more matches
?              zero or one match
{N}            exactly N matches
{N,}           N or more matches
{N,M}          between N and M matches
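A few quantifier checks, sketched in Python as a stand-in for the .NET engine (these quantifiers behave the same in both), again matching against the whole string:

```python
import re

cases = [
    (r"ab*c", "ac"),        # True: * allows zero b's
    (r"ab+c", "ac"),        # False: + needs at least one b
    (r"colou?r", "color"),  # True: ? makes the u optional
    (r"\d{3}", "123"),      # True: exactly three digits
    (r"\d{3}", "1234"),     # False: four digits is too many for {3}
    (r"\d{2,}", "12345"),   # True: two or more digits
    (r"x{2,4}", "xxxxx"),   # False: five x's exceed the maximum of 4
]
for pattern, text in cases:
    print(pattern, repr(text), re.fullmatch(pattern, text) is not None)
```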
To recap everything with another simple example, consider the expression
[aeiou]{2,4}\+[1-5]*: a string matches this expression only if it starts with
two to four lowercase vowels, continues with a + sign, and ends with zero or
more digits between 1 and 5.
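The recap expression can be checked against some sample strings with this Python sketch (a stand-in for the .NET engine; the samples are made up for illustration):

```python
import re

PATTERN = r"[aeiou]{2,4}\+[1-5]*"

samples = ["ae+", "aeio+1234", "a+12", "aeiou+1", "ea+16"]
for s in samples:
    print(s, re.fullmatch(PATTERN, s) is not None)
# ae+ True        (two vowels, +, zero digits)
# aeio+1234 True  (four vowels, +, digits 1-4)
# a+12 False      (only one vowel)
# aeiou+1 False   (five vowels, one too many)
# ea+16 False     (6 is outside the range 1-5)
```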
Regular expressions are a very powerful tool for validating the content of a
control, because they can describe very detailed and complex rules.
Furthermore, you can use them for other advanced work, such as replacing or
extracting the substrings that match the expression, as we'll see in practice in
the next chapter. Entire books have been written on how to use regular
expressions, for example "Sams Teach Yourself Regular Expressions in 24 Hours"
(Sams Press, ISBN 0-672319-36-5).
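As a rough illustration of extracting and replacing matches, here is a Python sketch (in .NET the corresponding methods are Regex.Matches and Regex.Replace in System.Text.RegularExpressions). The pattern \S+@\S+\.\w+ and the sample text are hypothetical, chosen only so that each match stops at whitespace.

```python
import re

text = "Write to marco@thephile.com or to info@example.org for details."
ADDRESS = r"\S+@\S+\.\w+"  # hypothetical, loose address pattern

# Extract every substring that matches the pattern.
addresses = re.findall(ADDRESS, text)
print(addresses)  # ['marco@thephile.com', 'info@example.org']

# Replace every match with a placeholder.
masked = re.sub(ADDRESS, "[hidden]", text)
print(masked)     # Write to [hidden] or to [hidden] for details.
```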
You have to be aware, however, that the regular expression used to validate the
e-mail address in the module only checks that the address has a plausible
format. Nothing currently prevents someone from subscribing with an address that
does not exist. To limit this, you can at least make the regular expression
stricter with other rules, such as requiring the domain name to be at least two
characters long and the extension to be a real one rather than a made-up one
(the set of valid extensions is limited; consult
http://archive.devx.com/devxpress/gurl.asp?i=1X3463868X68583 to get a complete
list), and so on. There are yet other rules, and although we kept our example
simple, you'll find that a complete regular expression can be much longer than
ours. If regular expressions are not enough for you (and they are not if, for
example, you want to check that a domain actually exists), you can use a
CustomValidator and write your own function to validate a value.
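The stricter checks described above might look roughly like this Python sketch. Everything here is an assumption for illustration: the pattern STRICT_EMAIL, the abridged extension set KNOWN_EXTENSIONS, and the helper names are hypothetical, and the pattern is far from a complete address grammar. In ASP.NET, a CustomValidator's ServerValidate handler could run equivalent logic and set args.IsValid accordingly.

```python
import re
import socket

# Hypothetical stricter rules (illustration only): a local part, optional
# subdomain labels, a domain label of at least two characters, and a final
# extension of letters captured in group 1.
STRICT_EMAIL = re.compile(
    r"[\w.%+-]+@(?:[A-Za-z0-9-]+\.)*[A-Za-z0-9-]{2,}\.([A-Za-z]{2,})"
)

# Abridged, hypothetical list of accepted extensions; a real list is much longer.
KNOWN_EXTENSIONS = {"com", "net", "org", "edu", "gov", "it", "uk"}

def domain_resolves(domain: str) -> bool:
    # Rough existence check: does the name resolve at all? Mail is routed
    # via MX records, so this is only an approximation (and needs network).
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False

def is_acceptable_email(address: str, check_dns: bool = False) -> bool:
    match = STRICT_EMAIL.fullmatch(address)
    if match is None:
        return False
    if match.group(1).lower() not in KNOWN_EXTENSIONS:
        return False
    if check_dns:
        return domain_resolves(address.split("@", 1)[1])
    return True

for addr in ["marco@thephile.com", "marco@t.com", "marco@thephile.xyzzy"]:
    print(addr, is_acceptable_email(addr))
```

The DNS lookup is off by default because it needs network access and is slow; as the text notes, even a resolving domain does not prove that the mailbox exists.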
However, even with the most complete expression and other checks, you can't be
100% sure that the address exists: the user can enter an address with a real
domain name and a real extension, but with a made-up name before the @. When the
messages are sent out, you won't get any error or exception, because the SMTP
server accepts them without telling you whether they were actually delivered.
However, messages sent to non-existent addresses usually come back to the sender
with an error message saying that the message couldn't be delivered because the
address does not exist. These error messages are sent to the server's
postmaster and then forwarded to the site's administrator. When you get such a
message, you can manually remove the address from the DB, or you could even
write a program that parses the incoming messages to find these error messages
and automatically deletes the non-existent e-mail addresses by using the
business classes of the MailingList module. This is of course beyond the scope
of this book, and it is usually only provided by professional modules that must
be able to handle tens or hundreds of thousands of e-mails and automate every
possible process. For most sites, though, our module is enough, and deleting the
erroneous addresses from the DB once a month or so is not a big issue (how often
you should do this actually depends on how frequently you send your
newsletters).
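For the curious, the parsing step might start out like this Python sketch. It assumes that the bounce arrives as a standard delivery status notification (RFC 3464: a multipart/report message containing a message/delivery-status part). Many mail servers send free-form bounces that this sketch would not recognize. The function name failed_recipients and the sample message are hypothetical, and deleting the returned addresses through the MailingList business classes is not shown.

```python
from email import message_from_string

def failed_recipients(raw_message: str) -> list:
    """Hypothetical helper: addresses reported as 'failed' in a DSN bounce."""
    msg = message_from_string(raw_message)
    failed = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        # Python's email package parses each header block of this part
        # (per-message block, then one block per recipient) into a Message.
        blocks = part.get_payload()
        if not isinstance(blocks, list):
            continue
        for block in blocks:
            action = (block.get("Action") or "").strip().lower()
            recipient = block.get("Final-Recipient")
            if action == "failed" and recipient:
                # The field looks like "rfc822; user@example.com".
                failed.append(recipient.split(";", 1)[-1].strip())
    return failed

BOUNCE = """\
From: MAILER-DAEMON@mail.example.com
To: newsletter@example.com
Subject: Undelivered Mail Returned to Sender
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain

The following address could not be delivered.

--BOUNDARY
Content-Type: message/delivery-status

Reporting-MTA: dns; mail.example.com

Final-Recipient: rfc822; nobody@example.com
Action: failed
Status: 5.1.1

--BOUNDARY--
"""

print(failed_recipients(BOUNCE))  # ['nobody@example.com']
```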