The Daily WTF: Curious Perversions in Information Technology

The richness of language

Last post 01-17-2007 10:03 AM by savar. 12 replies.
Page 1 of 1 (13 items)
  • 01-15-2007 4:44 AM

    • Samus_
    • Not Ranked
    • Joined on 09-18-2006
    • Suckeland City
    • Posts 5

    The richness of language

    At least in human languages, a rich one allows you to say the same thing in many different ways... at least in human languages...

    Today I was wandering around some community and stopped at a post from someone asking for an explanation of this JavaScript code:

    function emailCheck(email)
    {
    var tmp = "" + email + "", s = tmp.replace(/^\s*|\s*$/g, "");
    var re = /^(\w|[^_]\.|[\-])+((\@){1}([^_]))(([a-z]|[\d]|[\-]|\.)+|([^_]\.[^_])*)+\.[a-z]{2,3}$/i
    if (!re.test(s))
    {
    return false;
    }
    re = /\.(a[c-gil-oq-uwz]|b[a-bd-jm-or-tvwyz]|c[acdf-ik-orsuvx-z]|d[ejkmoz]|e[ceghr-u]|f[i-kmorx]|g[abd-ilmnp-uwy]|h[kmnrtu]|i[delm-oq-t]|j[emop]|k[eg-imnprwyz]|l[a-cikr-vy]|m[acdghk-z]|n[ace-giloprtuz]|om|p[ae-hk-nrtwy]|qa|r[eouw]|s[a-eg-ort-vyz]|t[cdf-hjkm-prtvwz]|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[admrw]|com|edu|net|org|mil|gov|biz)$/i
    if (!re.test(s))
    {
    return false;
    }
    re = /\@\@/
    return(!re.test(s));
    }

    (please don't complain about the indentation or wrapping, this is the way that community's software shows it, in fact I'm copying it directly from the page's source)

    Well, I explained it and made my own version:

    function emailCheck(email) {
        // declare valid TLDs
        var TLD = 'aero|biz|cat|com|coop|info|jobs|mobi|museum|name|net|org|pro|travel|gov|edu|mil|int';
        var ccTLD = 'ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|' +
            'ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|' +
            'ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cu|cv|cx|cy|cz|' +
            'de|dj|dk|dm|do|dz|' +
            'ec|ee|eg|eh|er|es|et|eu|' +
            'fi|fj|fk|fm|fo|fr|' +
            'ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|' +
            'hk|hm|hn|hr|ht|hu|' +
            'id|ie|il|im|in|io|iq|ir|is|it|' +
            'je|jm|jo|jp|' +
            'ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|' +
            'la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|' +
            'ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|' +
            'na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|' +
            'om|' +
            'pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|' +
            'qa|' +
            're|ro|rs|ru|rw|' +
            'sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|' +
            'tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|' +
            'ua|ug|uk|um|us|uy|uz|' +
            'va|vc|ve|vg|vi|vn|vu|' +
            'wf|ws|' +
            'ye|yt|yu|' +
            'za|zm|zw';

        // define 'email-like' regexp; the backslashes are doubled because the
        // pattern is a string literal ('\.' alone would collapse to '.')
        var re = new RegExp('^[A-Z0-9._%-]+@[A-Z0-9-]+(\\.[A-Z0-9-]+)*(\\.(' + TLD + '|' + ccTLD + '))$', 'i');

        // trim parameter
        email = email.toString().replace(/^\s*|\s*$/g, '');

        return re.test(email);
    }
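
A side note on building a pattern from strings with `new RegExp`, as the version above does: inside a JavaScript string literal, `'\.'` is just `.`, so a literal dot has to be written `'\\.'` before it reaches the regexp engine. A minimal sketch of the difference (the variable names and test strings are made up for illustration):

```javascript
// '\.' in a string literal collapses to '.', which matches ANY character.
var loose = new RegExp('^example\.com$', 'i');
// '\\.' reaches the regexp engine as \. and matches only a literal dot.
var strict = new RegExp('^example\\.com$', 'i');

var looseMatchesTypo = loose.test('exampleXcom');   // dot acts as a wildcard
var strictMatchesTypo = strict.test('exampleXcom'); // literal dot required
var strictMatchesReal = strict.test('example.com');
```

With a regexp literal (`/^example\.com$/i`) the doubling is not needed; it only matters when the pattern passes through a string first.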

    I must admit that felt a bit like this, but it left me thinking -how good- it is to have lots of ways to do the same thing in programming languages...

    Samus_ (aka "B Tucker")
  • 01-15-2007 7:18 AM In reply to

    Re: The richness of language

    *shudders in inhuman terror and fear*

    *and then literally rolls on the floor laughing my flaming ass off for a while at the comic*


     

    http://jivlain.wordpress.com

    Yes, it does control my brain. Why do you ask?
  • 01-15-2007 7:31 AM In reply to

    Re: The richness of language

    Or just use a regexp to match the bit after the @ and then just use DNS to check that the domain exists. Better yet, send an email to that address asking for confirmation.

  • 01-15-2007 11:47 AM In reply to

    Re: The richness of language

    SpComb:

    Or just use a regexp to match the bit after the @ and then just use DNS to check that the domain exists. Better yet, send an email to that address asking for confirmation.

    You can't do that with JavaScript*. It's an open question why this is being done in JavaScript to begin with, mind you, but...

    (* Unless you have some sort of AJAX routine on your server there to help, or something like that. )
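
As a rough sketch of how that split might look: pulling out the bit after the @ is plain string work that can run anywhere, while the DNS lookup has to be done by a server-side helper. Both function names are hypothetical, and `resolveMx` is an assumed, caller-supplied function that takes a domain and returns a promise of its MX records.

```javascript
// Return the part after the last "@", lowercased, or null if there is none.
function domainOf(email) {
    var trimmed = String(email).replace(/^\s+|\s+$/g, '');
    var match = /@([^@\s]+)$/.exec(trimmed);
    return match ? match[1].toLowerCase() : null;
}

// Server-side half (assumption): resolveMx is any function that takes a
// domain and returns a promise of an array of MX records.
function domainAcceptsMail(email, resolveMx) {
    var domain = domainOf(email);
    if (domain === null) {
        return Promise.resolve(false); // nothing to look up
    }
    return resolveMx(domain).then(
        function (records) { return records.length > 0; },
        function () { return false; } // lookup failed: treat as "no such domain"
    );
}
```

In Node.js, `dns.promises.resolveMx` has this shape and could be passed in as `resolveMx`. Even a successful lookup only shows that the domain accepts mail, not that the mailbox exists, so the confirmation email is still the stronger check.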

    :(){ :|:& };:
  • 01-15-2007 12:43 PM In reply to

    Re: The richness of language

    Grr...thought there was a bug, then re-read. But the forum won't let me delete this post.

    TRWTF is Community Server
  • 01-15-2007 1:01 PM In reply to

    Re: The richness of language

    Looks like you aren't trying to eliminate all invalid email addresses, nor support all valid ones (! paths).

    Under the circumstances, I would have just used "/^.+\@.+\..+$/". Leaves a bit to be desired, but stops the idiots who think gmail.com is an e-mail address.  You also don't have to change it when there are new TLDs. :)
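
A quick sketch of what that pattern accepts and rejects, wrapped in a hypothetical `looksLikeEmail` helper (the name and the sample inputs are assumptions, the regexp is the one quoted above):

```javascript
// Deliberately loose: something, "@", something, ".", something.
function looksLikeEmail(input) {
    return /^.+\@.+\..+$/.test(String(input));
}

var bareDomain = looksLikeEmail('gmail.com');            // no "@" at all
var normalAddress = looksLikeEmail('someone@gmail.com');
var noDotAfterAt = looksLikeEmail('someone@gmail');
var newTld = looksLikeEmail('someone@example.museum');   // no TLD list to maintain
```

Here `bareDomain` and `noDotAfterAt` come out false and the other two true, which is about all this check promises.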

    (Reminds me, I just did the TEST problem in Sphere Online Judge with a regex. Overkill, no? The solution was "awk -e '/^42$/{exit}{print}'".)
     

  • 01-15-2007 7:44 PM In reply to

    Re: The richness of language

    But those are flawed!  They don't match valid RFC822/2822 email.  Don't use them.
  • 01-16-2007 2:26 AM In reply to

    Re: The richness of language

    At least he's thorough.  I always just look for a dot followed by at least two letters and count that as a TLD.  Of course if he was really thorough he would have used that page-long regexp posted here a while back.
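
For comparison, the "dot followed by at least two letters" check might look something like this; `endsWithTldLikeSuffix` is a made-up name, and anchoring the match at the end of the string is an assumption about what was meant:

```javascript
// Treat "a dot followed by at least two letters at the very end" as a TLD.
function endsWithTldLikeSuffix(input) {
    return /\.[a-z]{2,}$/i.test(String(input));
}

var dotCom = endsWithTldLikeSuffix('someone@example.com');
var oneLetter = endsWithTldLikeSuffix('someone@example.c');
var longTld = endsWithTldLikeSuffix('someone@example.travel');
var noDot = endsWithTldLikeSuffix('someone@example');
```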
  • 01-16-2007 6:06 AM In reply to

    Re: The richness of language

    realmerlyn:
    But those are flawed! They don't match valid RFC822/2822 email. Don't use them.
    On the other hand, RFC 822 and 2822 define many e-mail address schemes you're not likely to encounter (local network addresses and the like). To validate e-mail addresses coming from the intarwebs, one only needs to match against a subset of the RFC822 address space.
    "Well, take it from an old hand: the only reason it would be easier to program in C is that you can't easily express complex problems in C, so you don't." - Erik Naggum (in comp.lang.lisp)
  • 01-16-2007 8:03 AM In reply to

    Re: The richness of language

    So the almost infinite majority of non-existing email addresses would get through, but a minor subset of those with invalid suffixes wouldn't. All this only at the cost of development, maintenance and risk. What a bargain.

  • 01-16-2007 4:38 PM In reply to

    • RevEng
    • Top 500 Contributor
    • Joined on 03-03-2006
    • Saskatchewan, Canada
    • Posts 101

    Re: The richness of language

    That's a great point. What is the point of validating email addresses using regexes?  Especially client-side.

    Here are the problems:

    1. There are many possible email address schemes if you truly want to follow the RFCs.  Last I saw, somebody made a regexp that could almost match them all, but it was several pages long and almost impossible to read.
    2. You're only verifying that it could be an address according to the RFC.  That doesn't mean that the email address actually exists or belongs to the person entering it.
    3. If you're validating with JavaScript, it's trivial to disable JavaScript, edit the JavaScript locally, or just submit the thing manually, making it completely trivial to break by those who have a good reason to want to.

    I use a fake email address all the time.  It is no@body.com.  It's a completely valid email address (in fact, it's even a valid domain).  Short of sending email there with a confirmation, there's little that one could do to disprove my rightful ownership of it.

    The only real reason to verify it is for the user's sake (so they don't type in their home address instead of their email address), and that can be done simply and quickly by searching for an @.  If you really need to confirm the legitimacy of their email address, only a confirmation email will accomplish that. 
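
A minimal sketch of that kind of check, purely as a typo guard for the user. `hasAtSign` is a hypothetical name, and requiring at least one character on each side of the @ is an assumption that goes slightly beyond a bare search:

```javascript
// Client-side sanity check only: is there an "@" with something on both sides?
function hasAtSign(input) {
    var text = String(input);
    var at = text.indexOf('@');
    return at > 0 && at < text.length - 1;
}

var homeAddress = hasAtSign('12 Main Street');
var fakeButValid = hasAtSign('no@body.com');
var nothingAfterAt = hasAtSign('someone@');
```

Note that `no@body.com` passes, as it should: proving ownership is left to the confirmation email.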

     

    The reasonable man adapts himself to the world. The unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man. --GEORGE BERNARD SHAW, Maxims for Revolutionists
  • 01-16-2007 8:11 PM In reply to

    Re: The richness of language

     That's why I wrote a page about regular expression email address validation.
     

    masklinn:
    realmerlyn:
    But those are flawed! They don't match valid RFC822/2822 email. Don't use them.
    On the other hand, RFC 822 and 2822 define many e-mail address schemes you're not likely to encounter (local network addresses and the like). To validate e-mail addresses coming from the intarwebs, one only needs to match against a subset of the RFC822 address space.

     

  • 01-17-2007 10:03 AM In reply to

    Re: The richness of language

    RevEng:
    1. There are many possible email address schemes if you truly want to follow the RFCs.  Last I saw, somebody made a regexp that could almost match them all, but it was several pages long and almost impossible to read.
    2. You're only verifying that it could be an address according to the RFC.  That doesn't mean that the email address actually exists or belongs to the person entering it.

    I also wonder, given how complex the RFC is, if there aren't mail servers which accept addresses that are NOT compliant.

     
