The Abyss of 64-bit Compatibility



  •  Out of university I was hired as an operating system developer at Acme Corp, a maker of embedded thingamabobs.  The first project that I worked on was development of the TMB 200, a bigger, stronger and faster version of the existing TMB 100.  The TMB 200 had a 64-bit processor while the TMB 100 was a 32-bit machine, and we had a *lot* of C/C++ code to port to the new processor.  That task was given to me.

    I started with the OS, which wasn't too bad.  The OS was third-party code of generally high quality, and already had full support for the new processor.  Acme had some customizations in the kernel, and of course there were a couple of minor 64-bit incompatibilities, but this was a couple of days' worth of work at most.  I then moved on to Acme's userland code.  This was a much bigger can of worms, as there was 10+ years' worth of code here, all of which had only ever been compiled for a 32-bit process before.  Fortunately, there wasn't much that was too terrible here.  There wasn't much mangling of pointers down to 32 bits (amusingly, the most common reason that code was casting pointers to integers was to log them in a printf variant).  The biggest problem ended up being the use of size_t, of all things.
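
    The post doesn't say exactly what went wrong with size_t, so what follows is only an illustrative sketch built on an assumption. The assumption is that the trouble was the usual one: size_t values and pointers were printed or stored as 32-bit ints, and that appeared to work while everything was 32 bits wide. The helper names (format_len, format_ptr, format_addr) are hypothetical. The format specifiers %zu, %p and PRIxPTR are standard C99.

```c
#include <assert.h>
#include <inttypes.h>  /* uintptr_t (via stdint.h) and the PRIxPTR macro */
#include <stddef.h>
#include <stdio.h>

/* Hypothetical logging helpers; each returns snprintf's character count. */

/* %zu matches size_t on ILP32 and LP64 alike. Printing it with %d or %u
 * only appeared to work while size_t was 32 bits wide. */
int format_len(char *out, size_t cap, size_t len)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "len=%zu", len);
}

/* %p prints an object pointer directly: no cast to int, no truncation. */
int format_ptr(char *out, size_t cap, const void *ptr)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "ptr=%p", ptr);
}

/* When a numeric address is genuinely needed, go through uintptr_t rather
 * than int, which cannot hold a 64-bit pointer. */
int format_addr(char *out, size_t cap, const void *ptr)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "addr=0x%" PRIxPTR, (uintptr_t)ptr);
}
```

    The C standard makes uintptr_t optional, although mainstream compilers provide it. Where it is missing, %p on its own is the portable choice.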

    That was complete after about a month, although the occasional bug slipped through the cracks over the next several months.  Unfortunately, there was one major component of our system not yet ported: the ABC module.  The ABC module consisted of some third-party, proprietary hardware drivers for some specific components on the TMB, along with quite a lot of Acme code that tried to abstract away the various hardware differences between the different TMB variants and present a consistent interface for configuring it.  I'm sure that I scarcely need tell you that the abstraction layer didn't really exist at this point -- it was the TMB 200 design that made the ABC experts realize that a rewrite of many subcomponents was necessary.  As they were unavailable for the 64-bit porting work, it fell to me to port ABC.

    We asked the driver vendors how difficult we should expect a 64-bit port to be.  "No problem," they assured us.  "We have customers running these drivers on 64-bit processors right now!"  Technically, they didn't lie to us.  Acme was indeed a customer, and we were running their drivers on the 64-bit processors of the TMB 200.  In 32-bit compatibility mode.

     My first step, of course, was to just get the damn thing to compile.  I suppose that it did, as long as you didn't enable the "there's no fucking way this is ever going to work in 64-bit mode" level of compiler errors. I had aspirations a little higher than "it crashes and burns 3 milliseconds into initialization", so I was in favour of keeping those compiler errors enabled.  And so I began my descent into some of the most god-awful code I have ever seen.

     First off, as this was C code, the developers had to declare their own versions of fixed-width integer types.  One vendor, however, felt that fixed-width types were for sissies, and decided to live more dangerously:

    typedef int foo_int32;

    typedef long foo_long32;
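
    For comparison, here is a minimal sketch of what those typedefs could look like on top of C99's <stdint.h>, assuming the vendor's toolchains provide it. Only the names foo_int32 and foo_long32 come from the vendor header; the definitions and checks below are hypothetical. The underlying problem is that long is 64 bits on LP64 targets (most 64-bit Unix-like systems). The compile-time checks turn that silent size change into a build error.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* int32_t, INT32_MAX */

/* Hypothetical replacements for the vendor typedefs quoted above. */
typedef int32_t foo_int32;
typedef int32_t foo_long32;   /* was "long": 64 bits on LP64 targets */

/* If someone "simplifies" these back to long, the build fails instead of
 * the hardware. */
static_assert(sizeof(foo_int32) * CHAR_BIT == 32, "foo_int32 must be 32 bits");
static_assert(sizeof(foo_long32) * CHAR_BIT == 32, "foo_long32 must be 32 bits");
```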

     

     Another vendor *almost* understood the concept of a null pointer:

    #define FOO_NULL 0
    #define FOO_NULLPTR (void*)0

    //...

    if((foo_int32)ptr == FOO_NULL)
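
    The check above converts the pointer to a 32-bit integer before comparing it, so on a 64-bit target only the low half of the address is tested. The sketch below models that truncation with explicit unsigned arithmetic and places the portable test next to it. The function names are hypothetical, and the sketch assumes addresses fit in 64 bits. The vendor's own FOO_NULLPTR would have worked, because ptr == FOO_NULLPTR compares pointers directly.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The model below treats addresses as 64-bit values. */
static_assert(sizeof(void *) <= sizeof(uint64_t), "pointers assumed to fit in 64 bits");

/* Models (foo_int32)ptr == FOO_NULL on a typical LP64 target: only the low
 * 32 bits of the address are compared. GCC documents the pointer cast as
 * discarding the most significant bits; the C standard leaves it undefined
 * when the value does not fit. Unsigned types keep this model well defined. */
bool vendor_null_test(uint64_t addr)
{
    return (uint32_t)addr == 0;
}

/* Portable: compare the pointer itself; no integer conversion at all. */
bool foo_is_null(const void *ptr)
{
    return ptr == NULL;
}
```

    With this model, a perfectly valid address such as 0x100000000 is reported as null.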

      And of course, there were endless instances of pointers being cast to and from integers for no particular purpose.  At one point a coworker resorted to grepping through the source for the number "4" -- and actually found a 64-bit bug this way.
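
    The post doesn't say what the bug behind that literal 4 was. One common shape, assumed here purely for illustration, is a hard-coded byte offset that was really sizeof(void *). In the hypothetical struct below, the id field sits at offset 4 on a 32-bit target but typically at offset 8 on an LP64 one. offsetof computes the real layout on whichever target the code is built for.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Hypothetical message layout: a pointer followed by a 32-bit id. */
struct foo_msg {
    void     *ctx;
    uint32_t  id;
};

/* 32-bit-era code would read the id at raw + 4, which on LP64 lands in the
 * upper half of ctx. offsetof follows the real layout on every target. */
uint32_t foo_msg_id(const unsigned char *raw)
{
    uint32_t id;
    memcpy(&id, raw + offsetof(struct foo_msg, id), sizeof id);
    return id;
}

/* Record the field-order assumption so the compiler rechecks it on each port. */
static_assert(offsetof(struct foo_msg, id) >= sizeof(void *),
              "id must come after the ctx pointer");
```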

     After several grueling weeks of changing int to void*, I still hadn't gotten ABC to compile successfully.  I knew that I was close, but then I got 65,000 compiler errors that confirmed that I had stared too long into the abyss, and now the abyss was gazing into me:

    foo.html: At top level
    foo.html:468: error: cast from pointer to integer of different size
    foo.html:475: error: cast from pointer to integer of different size
    foo.html:483: error: cast from pointer to integer of different size
    foo.html:490: error: cast from pointer to integer of different size
    foo.html:500: error: cast from pointer to integer of different size
    foo.html:503: error: cast from pointer to integer of different size
    foo.html:515: error: cast from pointer to integer of different size
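
    That diagnostic is GCC's wording for a cast from a pointer to a narrower integer type. Sometimes a pointer really must travel through an integer field. A sketch of the usual fix is to make that field uintptr_t, which C99 specifies can hold any void * and convert back to a pointer that compares equal to the original. The handle type and function names below are hypothetical. uintptr_t is technically optional, though mainstream compilers provide it.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>   /* uintptr_t */

/* Hypothetical opaque handle that carries a pointer through an integer.
 * Declared as int, it cannot hold a 64-bit pointer, and GCC reports
 * "cast from pointer to integer of different size". */
typedef uintptr_t foo_handle;

static_assert(sizeof(foo_handle) >= sizeof(void *), "foo_handle must hold a pointer");

foo_handle foo_handle_from_ptr(void *p)
{
    return (foo_handle)p;
}

/* Converting back yields a pointer that compares equal to the original. */
void *foo_ptr_from_handle(foo_handle h)
{
    return (void *)h;
}
```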

    Once I had finished gibbering in a corner, I had to go back and double-check what I had seen.  Unfortunately, the madness that had beset me was not temporary: my eyes were still telling me that the build process for a hardware driver had fed an HTML file into a C compiler, and that C compiler reported that the HTML would crash if run on a 64-bit process.  I immediately went back into the fetal position in the corner.

        Upon investigating, I discovered that foo.html was the strangest HTML file that I had ever seen.  It was almost valid HTML, with the exception of C code that was embedded in it here and there.  The driver build process fed foo.html into an executable that produced C code.  The executable was provided by the vendor in binary-only form -- no source code.  This binary was licensed from another company that actually made money selling this C-HTML fusion to hapless C-only shops.  Our vendor was apparently unhappy with the quality of the output, so they ran the C output through a Perl script that optimized it.  The C code, after being compiled and run, would produce HTML.

     The idea is that this was some C analogue of PHP or ASP.   The vendor provided a default HTML configuration interface for their hardware, powered by this crazy system.  The HTML interface was inextricably entwined with the driver, so we couldn't remove it without losing the ability to configure the hardware at all, even through their C interface.

     Fortunately, the 64-bit errors in the C/HTML code followed a pattern and were simple to fix.  And so I amended the build procedure: once the perl script had optimized the C code, a simple sed script fixed the errors before the compiler was run.



  •  OK. Here is where you put in an official request to pad the walls of your cubicle. Your gibbering is not unreasonable, and, if I were you, I would not leave the fetal position until provided with adequately strong chemicals. Preferably ones made some years ago in the Scottish Highlands, although the stuff made last week in Siberia would still work.



  • @Huor said:

    The driver build process fed foo.html into an executable that produced C code.  The executable was provided by the vendor in binary-only form -- no source code.  This binary was  licensed from another company that actually made money selling this C-HTML fusion to hapless C-only shops.  Our vendor was apparently unhappy with the quality of the output, so they ran the C output through a perl script that optimized it.  The C code, after being compiled and run, would produce HTML.

     The idea is that this was some C analogue of PHP or ASP.


    Can you do me one favour? Can you please tell me that the TMB-100 & TMB-200 weren't some safety or life critical device, like say a nuclear reactor safety system, or an X-ray machine?



  • @Vanders said:

    Can you do me one favour? Can you please tell me that the TMB-100 & TMB-200 weren't some safety or life critical device, like say a nuclear reactor safety system, or an X-ray machine?

    Not even close.  Remind me to tell you stories about the TMB-100 someday.  I don't even want to think about the body count that thing would have racked up if it were safety-critical.



  •  If it's a war device, it would probably save a lot of lives.



  • @Vanders said:

    Can you do me one favour? Can you please tell me that the TMB-100 & TMB-200 weren't some safety or life critical device, like say a nuclear reactor safety system, or an X-ray machine?

    Don't worry, we don't need the TMB-100 for that.
    @This report about the THERAC-25 said:
    Currently, AECL's primary business is the design and installation of nuclear reactors.



  • That didn't make me go "WTF" so much as it made me go "OMFGWTF!!!". With suitable butchering, that's front page material, right there.


  • @PSWorx said:

    This report about the THERAC-25

    Well, that was the most depressing thing I've read this month.




  • @Huor said:

    actually made money selling this C-HTML fusion
    It's FrankenHTML! And it's being used in production (instead of staying nicely in the IOCCC where it belongs)!

    @Huor said:
    And so I amended the build procedure: once the perl script had optimized the C code, a simple sed script fixed the errors before the compiler was run.
    So we'll be seeing that build process on the front page before long, submitted by one of your successors?



  • @dkf said:

    It's FrankenHTML! And it's being used in production

    What about BobX?  What about BobX???



  • I saw the "foo.html" and was like: oh! There goes Community Server again, breaking a nice cool story.

    This has to go on the front page, right now!

    Would a little snippet of that HTML be asking too much?



  • Just to be clear, was it the binary that introduced the errors or the Perl code "optimizations"?

    If it was in Perl, you couldn't change it?

