Email addresses have a local-part and a domain separated by an (unquoted) "@" symbol. The local-part must be either a dot-atom or a quoted string, and the domain must be either a domain name or a domain literal. A dot-atom can only contain letters, numbers, dots, and the following characters: ! # $ % & ' * + - / = ? ^ _ ` { | } ~. However, neither the first nor the last character can be a dot, and two or more consecutive dots are not allowed. A regular expression to match for a dot-atom local-part would be as follows:
// Dot-atom
"/^([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*$/iD"
A quoted string can only contain printable US-ASCII characters or the space character, all contained within double quotes. Double quotes and backslashes are allowed only if part of a quoted-pair (escaped with a backslash). A quoted string may be empty. A regular expression to match for a quoted string local-part would be as follows:
// Quoted string
'/^"(?>[ !#-\[\]-~]|\\\[ -~])*"$/iD'
A domain name consists of 1 to 127 labels (the 128th label being the (empty) root domain), separated by dots, each containing any combination of letters, numbers, or hyphens. However, neither the first nor the last character of a label can be a hyphen. A regular expression to match for a domain name would be as follows:
// Domain name
'/^([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?1)){0,126}$/iD'
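As a rough illustration, the three patterns so far can be tried out on their own. This is a minimal sketch: the function names are hypothetical helpers that do not appear in the original code, and the dot-atom pattern is re-quoted with single quotes (hence the escaped apostrophe).

```php
<?php
// Hypothetical helpers for trying out the patterns; the names are illustrative only.

// True if $localPart is a dot-atom (no leading, trailing, or doubled dots).
function isDotAtom(string $localPart): bool
{
    return preg_match('/^([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*$/iD', $localPart) === 1;
}

// True if $localPart is a quoted string (inner quotes and backslashes escaped).
function isQuotedString(string $localPart): bool
{
    return preg_match('/^"(?>[ !#-\[\]-~]|\\\[ -~])*"$/iD', $localPart) === 1;
}

// True if $domain is a dot-separated list of 1 to 127 labels.
function isDomainName(string $domain): bool
{
    return preg_match('/^([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?1)){0,126}$/iD', $domain) === 1;
}

var_dump(isDotAtom('john.doe'));        // bool(true)
var_dump(isDotAtom('john..doe'));       // bool(false)
var_dump(isQuotedString('"john doe"')); // bool(true)
var_dump(isDomainName('example-.com')); // bool(false)
```

Comparing the result of preg_match() against 1 turns its 1, 0, or false return value into a strict boolean.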
A domain literal is one of an IPv4 address, an IPv6 address, or an IPv4-mapped IPv6 address.
An IPv4 address consists of four groups, separated by dots, each containing a decimal value between 0 and 255. A regular expression to match for an IPv4 address would be as follows:
// IPv4 Address
'/^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?1)){3}$/D'
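The IPv4 pattern can be checked in isolation with a small sketch like the one below. The helper name isIPv4() is an assumption made for this example only.

```php
<?php
// Hypothetical helper; the pattern is the IPv4 expression from the text.
function isIPv4(string $address): bool
{
    return preg_match('/^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?1)){3}$/D', $address) === 1;
}

// Print a verdict for a few sample inputs.
foreach (['192.168.0.1', '256.1.1.1', '1.2.3', '01.2.3.4'] as $candidate) {
    printf("%-12s %s\n", $candidate, isIPv4($candidate) ? 'valid' : 'invalid');
}
```

Only the first sample is accepted: 256 is out of range, 1.2.3 has too few groups, and a group written with a leading zero such as 01 does not match any alternative of the pattern.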
An IPv6 address consists of eight groups, separated by colons, each containing a hexadecimal value between 0 and FFFF. One or more consecutive groups of 0 value can be represented as a double colon; however, this can only occur once. A regular expression to match for an IPv6 address would be as follows:
// IPv6 Address
'/^(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?1)(?>:(?1)){0,6})?::(?2)?)$/iD'
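A short sketch for experimenting with the IPv6 pattern follows. The helper name isIPv6() is hypothetical, and the two alternatives (full form and "::" form) are wrapped in a single group here so that both ^ and $ apply to each of them when the pattern is used on its own.

```php
<?php
// Hypothetical helper. The two alternatives of the IPv6 expression are
// wrapped in one atomic group so that ^ and $ anchor both of them.
function isIPv6(string $address): bool
{
    $pattern = '/^(?>([a-f0-9]{1,4})(?>:(?1)){7}'                              // eight full groups
             . '|(?!(?:.*[a-f0-9][:\]]){8,})((?1)(?>:(?1)){0,6})?::(?2)?)$/iD'; // one "::" shorthand
    return preg_match($pattern, $address) === 1;
}

// Print a verdict for a few sample inputs.
foreach (['2001:db8::1', '::1', '1:2:3:4:5:6:7:8', '1::2::3'] as $candidate) {
    printf("%-18s %s\n", $candidate, isIPv6($candidate) ? 'valid' : 'invalid');
}
```

The last sample is rejected because the double colon appears twice.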
An IPv4-mapped IPv6 address is an IPv6 address with the final two groups represented as an IPv4 address. One or more consecutive groups of 0 value can be represented as a double colon; however, this can only occur once. A regular expression to match for an IPv4-mapped IPv6 address would be as follows:
// IPv4-mapped IPv6 Address
'/^(?>([a-f0-9]{1,4})(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?2)?::(?>((?1)(?>:(?1)){0,4}):)?)(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?3)){3}$/iD'
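The IPv4-mapped form can be tried out the same way. In this sketch the helper name isIPv4MappedIPv6() is an assumption, and the hexadecimal prefix alternatives are grouped so that the trailing IPv4 part and the $ anchor apply to both of them.

```php
<?php
// Hypothetical helper. The hexadecimal prefix (either six full groups or a
// "::" shorthand) is grouped so that the IPv4 tail follows both alternatives.
function isIPv4MappedIPv6(string $address): bool
{
    $pattern = '/^(?>([a-f0-9]{1,4})(?>:(?1)){5}:'                              // six full groups
             . '|(?!(?:.*[a-f0-9]:){6,})(?2)?::(?>((?1)(?>:(?1)){0,4}):)?)'     // "::" shorthand
             . '(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?3)){3}$/iD';  // IPv4 tail
    return preg_match($pattern, $address) === 1;
}

var_dump(isIPv4MappedIPv6('::ffff:192.168.0.1')); // bool(true)
var_dump(isIPv4MappedIPv6('::ffff:256.1.1.1'));   // bool(false)
```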
When used as a domain literal in an email address, the IP address must be contained within square brackets, and IPv6 or IPv4-mapped IPv6 addresses must be preceded by (unquoted) "IPv6:". A regular expression check to match for a domain literal would be as follows:
// Domain literal
'/^\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?1)(?>:(?1)){0,6})?::(?2)?))|(?>(?>IPv6:(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))\]$/iD'
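The structure of a domain literal (square brackets, then an optional "IPv6:" tag, then the address) can also be illustrated without the regular expression. The sketch below is not the method used in this document: it swaps in PHP's built-in filter_var() for the address check, the function name looksLikeDomainLiteral() is hypothetical, and filter_var() may accept or reject some edge cases differently from the patterns above.

```php
<?php
// Hypothetical helper (not part of the original code): checks the bracket
// and tag structure, then delegates the address check to filter_var().
function looksLikeDomainLiteral(string $domain): bool
{
    // The literal must be wrapped in square brackets.
    if (strlen($domain) < 2 || $domain[0] !== '[' || substr($domain, -1) !== ']') {
        return false;
    }
    $inner = substr($domain, 1, -1);

    // An "IPv6:" tag (compared case-insensitively) selects IPv6 validation.
    if (strncasecmp($inner, 'IPv6:', 5) === 0) {
        return filter_var(substr($inner, 5), FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false;
    }

    // Without the tag, only a plain IPv4 address is acceptable.
    return filter_var($inner, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
}

var_dump(looksLikeDomainLiteral('[192.168.0.1]'));      // bool(true)
var_dump(looksLikeDomainLiteral('[IPv6:2001:db8::1]')); // bool(true)
var_dump(looksLikeDomainLiteral('[2001:db8::1]'));      // bool(false), the tag is missing
```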
By bringing these regexes together and separating the local-part from the domain with an (unquoted) "@" symbol, we are left with the following which matches for every valid RFC 5321 email address:
// Email address
'/^(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}))\])$/iD'
We can then create a function which returns the return value of a (case-insensitive) preg_match on the above regular expression:
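// Returns 1 if $emailAddress is a valid RFC 5321 address, 0 if it is not, or false if preg_match() fails.
function isValid5321($emailAddress)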
{
return preg_match('/^(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}))\])$/iD', $emailAddress);
}
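To make the way the pieces are joined easier to follow, here is a reduced sketch that combines only the dot-atom and domain name patterns. The function name isValidDotAtomAddress() is hypothetical. Note how the recursive reference in the domain name part becomes (?2), because its capturing group is now the second one in the combined pattern; the full expression renumbers its references in the same way.

```php
<?php
// Hypothetical reduced validator: dot-atom local-part and domain name only.
function isValidDotAtomAddress(string $emailAddress): bool
{
    $pattern = '/^([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*'                      // dot-atom, group 1
             . '@'
             . '([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}$/iD';  // domain name, group 2

    // preg_match() returns 1 on a match, 0 on no match, and false on error,
    // so compare against 1 to get a strict boolean.
    return preg_match($pattern, $emailAddress) === 1;
}

var_dump(isValidDotAtomAddress('first.last@example.com')); // bool(true)
var_dump(isValidDotAtomAddress('user..name@example.com')); // bool(false)
```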
An obsolete version of the local-part is also possible, allowing for a mixture of atoms and quoted strings, separated by dots. An obsolete quoted string allows any US-ASCII character when part of a quoted-pair, and any US-ASCII character except the null, horizontal tab, line feed, carriage return, backslash, and double quote characters when not. An obsolete local-part may only be empty if it is a single quoted string. A regular expression to match for an obsolete local-part would be as follows:
// Obsolete local-part
'/^([!#-\'*+\/-9=?^-~-]+|"(?>(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*")(?>\.(?1))*$/iD'
Comments and folding white spaces are also allowed in an email address: before and/or after the local-part, before and/or after the domain, and before and/or after any dot in a local-part and/or domain. Folding white space may also appear in a quoted string and/or in comments, and comments may nest. A comment is almost identical to a quoted string except that it is opened and closed with a left and a right parenthesis respectively, and that parentheses are only allowed as part of a quoted-pair (or as further comments), whereas double quotes may appear freely. Folding white space is a run of one or more space and/or horizontal tab characters, optionally followed by a carriage return and line feed pair that is itself followed by more spaces and/or horizontal tabs. Folding white spaces, where allowed, are optional and may occur repeatedly. A regular expression to match for comments and folding white spaces would be as follows:
// Comments and folding white spaces
'/^((?>(?>(?>((?>[ ]+(?>\x0D\x0A[ ]+)*)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)$/iD'
We can now include these where appropriate in the earlier function to give us the following which matches for RFC 5322 email addresses:
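// Returns 1 if $emailAddress is a valid RFC 5322 address, 0 if it is not, or false if preg_match() fails.
function isValid5322($emailAddress)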
{
return preg_match('/^((?>(?>(?>((?>[ ]+(?>\x0D\x0A[ ]+)*)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)([!#-\'*+\/-9=?^-~-]+|"(?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*(?2)")(?>(?1)\.(?1)(?4))*(?1)@(?1)(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>(?1)\.(?1)(?5)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?6)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?6)(?>:(?6)){0,6})?::(?7)?))|(?>(?>IPv6:(?>(?6)(?>:(?6)){5}:|(?!(?:.*[a-f0-9]:){6,})(?8)?::(?>((?6)(?>:(?6)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?9)){3}))\])(?1)$/isD', $emailAddress);
}
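Nested comments are handled by letting the comment group call itself recursively. The sketch below isolates that idea in a deliberately simplified pattern: it ignores folding white space and the exact character restrictions described above, and the function name isBalancedComment() is hypothetical.

```php
<?php
// Hypothetical, simplified illustration of nested-comment matching.
// Inside the parentheses each step is one of:
//   [^()\\]  any character except parentheses and backslash
//   \\.      a quoted-pair (backslash plus one character)
//   (?1)     a nested comment, matched by recursing into group 1
function isBalancedComment(string $comment): bool
{
    return preg_match('/^(\((?>[^()\\\\]|\\\\.|(?1))*\))$/D', $comment) === 1;
}

var_dump(isBalancedComment('(a (nested) comment)')); // bool(true)
var_dump(isBalancedComment('(a \) b)'));             // bool(true), the parenthesis is escaped
var_dump(isBalancedComment('(a ) b)'));              // bool(false)
```

The full expression follows the same shape, with the folding white space group inserted between the elements and a stricter character class.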
For a class which allows greater control over which type(s) of email address to validate, see EmailAddressValidator.php.
On creating the object, using either EmailAddressValidator::setEmailAddress($emailAddress) or new EmailAddressValidator($emailAddress), the default settings allow dot-atom@domain-name email addresses. If the second (optional) parameter is set to 5321 then a quoted string local-part and a domain literal domain are allowed. If the second (optional) parameter is set to 5322 then an obsolete local-part, a domain literal domain, and comments and folding white spaces are allowed. To add a format, call its associated method with either no parameter or a true parameter. To remove a format, call its associated method with a false parameter. To return the validation check (either 1 for valid or false for invalid), use the isValid() method. The following is a list of available settings:
set5321() // setQuotedString() and setDomainLiteral()
set5322() // setObsolete(), setDomainLiteral(), and setCFWS()
setQuotedString() // A quoted string local-part is allowed
setObsolete() // An obsolete local-part is allowed
setDomainLiteral() // A domain literal domain is allowed
setCFWS() // Comments and folding white spaces are allowed
If you pass a true parameter to the isValid() method then the _verifyDomain() method will be called to check whether the domain can be resolved to MX RRs, but only if the email address is syntactically valid. If the verification is successful then isValid() will return true; if the verification is unsuccessful then it will return 0.