The Daily WTF: Curious Perversions in Information Technology

Optimizing parsers can be tricky

Last post 02-25-2013 7:40 PM by mervgreen. 7 replies.
  • 02-23-2013 1:12 AM

    Optimizing parsers can be tricky

    //    - Matching and slicing on a huge input is often a cause of slowdowns.
    //      The solution is to chunkify the input into smaller strings.
    //      The chunks are stored in the `chunks` var...
    
    
    // Split the input into chunks.
    chunks = (function (chunks) {
        var j = 0,
            skip = /(?:@\{[\w-]+\}|[^"'`\{\}\/\(\)\\])+/g,
            comment = /\/\*(?:[^*]|\*+[^\/*])*\*+\/|\/\/.*/g,
            string = /"((?:[^"\\\r\n]|\\.)*)"|'((?:[^'\\\r\n]|\\.)*)'|`((?:[^`]|\\.)*)`/g,
            level = 0,
            match,
            chunk = chunks[0],
            inParam;
    
        for (var i = 0, c, cc; i < input.length;) {
            skip.lastIndex = i;
            if (match = skip.exec(input)) {
                if (match.index === i) {
                    i += match[0].length;
                    chunk.push(match[0]);
                }
            }
            c = input.charAt(i);
            comment.lastIndex = string.lastIndex = i;
    
            if (match = string.exec(input)) {
                if (match.index === i) {
                    i += match[0].length;
                    chunk.push(match[0]);
                    continue;
                }
            }
    
            if (!inParam && c === '/') {
                cc = input.charAt(i + 1);
                if (cc === '/' || cc === '*') {
                    if (match = comment.exec(input)) {
                        if (match.index === i) {
                            i += match[0].length;
                            chunk.push(match[0]);
                            continue;
                        }
                    }
                }
            }
            
            switch (c) {
                case '{': if (! inParam) { level ++;        chunk.push(c);                           break }
                case '}': if (! inParam) { level --;        chunk.push(c); chunks[++j] = chunk = []; break }
                case '(': if (! inParam) { inParam = true;  chunk.push(c);                           break }
                case ')': if (  inParam) { inParam = false; chunk.push(c);                           break }
                default:                                    chunk.push(c);
            }
            
            i++;
        }
        if (level != 0) {
            error = new(LessError)({
                index: i-1,
                type: 'Parse',
                message: (level > 0) ? "missing closing `}`" : "missing opening `{`",
                filename: env.filename
            }, env);
        }
    
        return chunks.map(function (c) { return c.join('') });
    })([[]]);
    

    A fork later...

    Speeds up compilation about 10x...
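
    (For the skim-readers: the point of all that machinery is to cut the stylesheet at top-level closing braces, so the parser's regexes later run against small strings instead of one huge one. Here's a much-simplified sketch of just that idea (mine, not the LESS code; it ignores the string/comment handling the real chunker does):

    // Much-simplified illustration of the chunking idea (not the LESS code):
    // split a stylesheet at top-level `}` so each rule is its own small string.
    function naiveChunk(input) {
        var chunks = [], chunk = '', level = 0;
        for (var i = 0; i < input.length; i++) {
            var c = input.charAt(i);
            chunk += c;
            if (c === '{') { level++; }
            if (c === '}' && --level === 0) {
                chunks.push(chunk);
                chunk = '';
            }
        }
        if (chunk) { chunks.push(chunk); }
        return chunks;
    }

    naiveChunk('a { x: 1 } b { c { y: 2 } }');
    // -> ['a { x: 1 }', ' b { c { y: 2 } }']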


  • 02-23-2013 3:05 AM In reply to

    Re: Optimizing parsers can be tricky

    Hah, this is from the LESS source. I just happened to read it the other day and noticed the remark about chunking. I didn't realize how much effort went into that single line. Do you have a link to that exact commit? I'd love to bookmark it for posterity.
  • 02-24-2013 3:42 PM In reply to

    Re: Optimizing parsers can be tricky

    TRWTF is JavaScript.


    Information technology not available until further notice. The political trolls won. Wake me up when the discussion is more interesting than YouTube comments.

  • 02-24-2013 5:58 PM In reply to

    Re: Optimizing parsers can be tricky

    MiffTheFox:


    TRWTF is JavaScript.

    Considering we have JavaScript libraries that can parse JavaScript itself, which is a good deal more complex than LESS, in acceptable time: no, TRWTF is the LESS guys.
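
    (For reference, Esprima is one such library; a single call gets you a full AST. Illustrative usage, nothing to do with the LESS code:)

    // Example usage of the esprima package (assumes it is installed):
    var esprima = require('esprima');
    var ast = esprima.parse('var answer = 6 * 7;');
    console.log(ast.body[0].type); // "VariableDeclaration"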

    Don't kid yourself either; the new implementation, while faster, is still a horrible piece of failure. For instance, it will fail to chunk correctly when the exact sequence "\n}\n" never happens to appear in your source code. (Poor Windows people stuck with "\r\n"...) That's not even the biggest flaw. The biggest flaw is using string.replace with a string parameter as the pattern: that will replace only the first occurrence instead of all occurrences. Nice; a chunker that will produce at most two chunks!
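
    If you don't believe that last part, try it in any console. This is plain JavaScript behaviour, nothing LESS-specific (illustrative input, not the actual commit's code):

    var src = 'a {\n}\nb {\n}\nc {\n}\n';
    // A *string* pattern replaces only the FIRST occurrence:
    src.replace('\n}\n', '\n}<SPLIT>\n').split('<SPLIT>').length;  // 2 (at most two chunks)
    // A global regex is needed to replace every occurrence:
    src.replace(/\n\}\n/g, '\n}<SPLIT>\n').split('<SPLIT>').length; // 4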

  • 02-24-2013 8:33 PM In reply to

    Re: Optimizing parsers can be tricky

    Ragnax:
    the new implementation, while faster, is still a horrible piece of failure. For instance, it will fail to chunk correctly when the exact sequence "\n}\n" never happens to appear in your source code. (Poor Windows people stuck with "\r\n"...) That's not even the biggest flaw. The biggest flaw is using string.replace with a string parameter as the pattern: that will replace only the first occurrence instead of all occurrences. Nice; a chunker that will produce at most two chunks!

    OK, I agree the fix is horribly broken, but if it actually succeeds in speeding up compilation 10x as it claims, then the original assertion that chunking the input is an optimization seems invalid, and maybe no chunking needs to be performed in the first place.

    Disclaimer: I'm going entirely by the code comments here, as I was too lazy to thoroughly read the chunking code.

    Signatures are stupid.
  • 02-25-2013 12:25 PM In reply to

    Re: Optimizing parsers can be tricky

     My only critique: why is this thread not titled "Blowing Chunks"?


    HardwareGeek:

    <blink> and you're dead!



    "Where is grumpy cat?"
    - Mozilla's MOST ADVANCED USER!
  • 02-25-2013 1:50 PM In reply to

    Re: Optimizing parsers can be tricky

    joe.edwards:
    Ragnax:
    the new implementation, while faster, is still a horrible piece of failure. For instance, it will fail to chunk correctly when the exact sequence "\n}\n" never happens to appear in your source code. (Poor Windows people stuck with "\r\n"...) That's not even the biggest flaw. The biggest flaw is using string.replace with a string parameter as the pattern: that will replace only the first occurrence instead of all occurrences. Nice; a chunker that will produce at most two chunks!
    OK, I agree the fix is horribly broken, but if it actually succeeds in speeding up compilation 10x as it claims, then the original assertion that chunking the input is an optimization seems invalid, and maybe no chunking needs to be performed in the first place.

    I'm going to go out on a limb here and wager that 'the fix' was only tested on a relatively simple & small input file where the cost of chunking somehow far outweighs the cost of everything else.
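
    Easy enough to sanity-check, in principle. A hypothetical harness (assumes a Node install of the less package and some non-trivial stylesheet; 'big-real-world.less' is a stand-in name, not a real file):

    var fs = require('fs'), less = require('less');
    var src = fs.readFileSync('big-real-world.less', 'utf8'); // hypothetical input file
    var t0 = Date.now();
    less.render(src, function (err, css) {
        console.log('compile took ' + (Date.now() - t0) + 'ms' + (err ? ' (failed)' : ''));
    });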

    Lorne Kates:
    My only critique: why is this thread not titled "Blowing Chunks"?
    A sadly missed opportunity, I agree.
  • 02-25-2013 7:40 PM In reply to

    Re: Optimizing parsers can be tricky

     Yes, I have the link. This duet of fail comes to you from https://github.com/julfers/less.js/commit/6f0a19e65a41cee89b272f0e405e369b6e31dd60

    Lorne Kates:

    My only critique: why is this thread not titled "Blowing Chunks"?


    Yep. Meta-fail. It's like an encore.