The DBA is dead; Long Live the DBA



  • I have posted countless articles about our "DBAs". Now it's finally happened. The lead DBA went to change something in a test environment, but accidentally grabbed a window with the production database open, and dropped a critical table for a system that has been under public customer scrutiny for serious performance issues. Then he went to another DBA to ask "how to undo it, but could we just keep this between us?". That DBA didn't know, so he went to the developers to find out. From there, it rocketed up hill.

    Management made an example of the (first) DBA and fired him.

    While I don't like to see anyone lose their job, perhaps this fellow might be better off in another profession.

    The person they assigned to replace him was tasked with promoting to production the change the first guy was supposed to be making in the test environment. It involved dropping some tables. Changing others. Running a whole bunch of scripts. In other words, the usual DB deployment.

    Somehow, instead of just running the provided scripts, he decided to take a shortcut by querying the system tables and then just looping over the matching tables.

    It might have worked if he hadn't forgotten the where-clause.

    He may as well have just entered: drop database

    28 hours later the restore operation finally finished. Then they still had to do the deployment. Since the DBAs had proven that they could not be trusted, management decided that the developers should make the change (since they had done it in assorted development environments). Other management stopped that because developers were not allowed to touch production (a reasonable stance). Unfortunately, this stalemate and the ensuing arguments lasted 6 hours. Monday had come and gone and the system was still down and the customers were a*n*g*r*y.

    Here we go again!
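
    The "loop over the system tables without a where-clause" failure described above can be guarded against in a few lines. This is only a minimal sketch, not what the DBA actually ran: it uses Python with SQLite's `sqlite_master` catalog as a stand-in for Oracle's system tables, and the names `plan_drops` and `apply_drops` are hypothetical. The idea is to build the statement list first, refuse a filter that matches everything, and check the count before executing anything.

```python
import sqlite3


def plan_drops(conn: sqlite3.Connection, name_pattern: str) -> list[str]:
    """Build (but do not run) DROP statements for tables matching a LIKE pattern."""
    # An empty or wildcard-only pattern is the "forgot the where-clause" case.
    if name_pattern.strip("%_") == "":
        raise ValueError("refusing to plan drops without a real filter")
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE ? ORDER BY name",
        (name_pattern,),
    ).fetchall()
    return [f'DROP TABLE "{name}"' for (name,) in rows]


def apply_drops(conn: sqlite3.Connection, statements: list[str], expected_count: int) -> None:
    """Run the planned statements only if their number matches what the operator expects."""
    if len(statements) != expected_count:
        raise RuntimeError(
            f"planned {len(statements)} drops, expected {expected_count}; nothing executed"
        )
    for statement in statements:
        conn.execute(statement)
    conn.commit()
```

    Printing the output of `plan_drops` and reading it before calling `apply_drops` is the cheap equivalent of running the SELECT on its own before wrapping a loop around it.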

     


  • Considered Harmful

    What. The. Fuck.

    You never fail to deliver, Snoofle. You could provide all the content for this site.

    Any idea how much money this cost the company or its clients?



  • You need your own site: TheDailyWTS



  • It's not so much the loss of revenue for one day, as the trashing of our reputation (we were trying to do something bleeding edge to get a jump on the competition).

    I actually fixed the performance problems under the hood prior to the original test. The botched deployment cost us not only this customer, but all the other prospects that we will now have to fight competitors to win over.



  • @snoofle said:

    Management made an example of the (first) DBA and fired him.

    Based on what he did I hope it was "out of a cannon" instead of just "from his job", but then again this is where snoofle works so I know that hope is in vain.



  • @snoofle said:

    developers were not allowed to touch production (a reasonable stance)

    Not where you work it isn't!

    What are the DBAs even for, FCOL?

     



  • @snoofle said:

    Then he went to another DBA to ask "how to undo it, but could we just keep this between us?". That DBA didn't know, so he went to the developers to find out.
     

    Perhaps your HR dept could include "Explain the purpose of Oracle Flashback and when you would use it" in their interview questions for these two's replacements.



  • So, you've got two DBA's who don't know how to restore a database to prior state? You've got a lead DBA who doesn't work with a rollback policy (yes, you can rollback on a drop)?

    I would have fired the DBA's and the persons who interviewed and hired these people.



  • @DaveK said:

    FCOL
    It wasn't immediately obvious to me what this meant, but in my mind I read it as "F..k All", which kind of applies.



  • @snoofle said:

    I have posted countless articles about our "DBAs". Now it's finally happened. The lead DBA went to change something in a test environment, but accidentally grabbed a window with the production database open, and dropped a critical table for a system that has been under public customer scrutiny for serious performance issues. ....
    When I started reading this I had to double check the date you posted it. It started off sounding eerily familiar to a previous post of yours regarding a sales guy (?) doing a demo and working on production directly. Now I don't know what is scarier .. that I remember .. or your company seems to hire people who are not the sharpest pencil in the drawer.



  • @ubersoldat said:

    So, you've got two DBA's who don't know how to restore a database to prior state? You've got a lead DBA who doesn't work with a rollback policy (yes, you can rollback on a drop)?

    I would have fired the DBA's and the persons who interviewed and hired these people.

    +1. Especially, the lead DBA doesn't know if/how to rollback/restore a dropped table, so asks a junior. Really?



  • More like lead-head DBA, amirite?



  • @ubersoldat said:

    So, you've got two DBA's who don't know how to restore a database to prior state? You've got a lead DBA who doesn't work with a rollback policy (yes, you can rollback on a drop)?
     

    Past WTsnoofles have implied that these people aren't actually DBAs. They're executive-protected fuckwits that are paid to find inventive ways of managing corporate data by offloading their responsibilities onto everyone else.

    @ubersoldat said:

    I would have fired the DBA's and the persons who interviewed and hired these people.

    One's been hung out to dry, so that's a start at least. Perhaps ripples will now be felt amongst those that contributed to this failure.

    @Zecc said:

    "F..k All"

    Are you trying to avoid the Summoning of Silentrunner?

    @OzPeter said:

    or your company seems to hire people who are not the sharpest pencil in the drawer.

    Snoofle will have to confirm this one, but ISTR the original company hired buttfuck useless walking wastes of atoms, many of which are now being viewed as discardable since the new Company() takeover. That right, snoofs?

     


  • Trolleybus Mechanic

    @Cassidy said:

    @snoofle said:

    Then he went to another DBA to ask "how to undo it, but could we just keep this between us?". That DBA didn't know, so he went to the developers to find out.
     

    Perhaps your HR dept could include "Explain the purpose of Oracle Flashback and when you would use it" in their interview questions for these two's replacements.

     

    Perhaps your HR dept could include "Explain the purpose of SELECT and when you would use it" in their interview questions for these two's replacements.

     



  • @Lorne Kates said:

    Perhaps your HR dept could include "Explain the purpose of SELECT and when you would use it" in their questions to the existing DBAs that wish to keep their jobs.
     

    .. would grab low-hanging fruit.

    @snoofle said:

    That DBA didn't know, so he went to the developers to find out.

    @snoofle: did any devs actually know the answer?



  • Somehow at some point while reading snoofle's post, Yakety Sax started playing in my head. And now I'm imagining the whole story as a sketch, the first DBA fucking up the database, getting panicked, going to the second DBA and, with a lot of gesturing, explaining what happened, the second just shrugging, both going to the developers, arms-a-waving, the developers going off the rails, throwing their arms in the air, the entire group of, by now a dozen or more, people going to the managers and everybody yelling at everybody. Sped-up. With Yakety Sax as background music. The first DBA is literally booted out the door, landing in a heap face-first on the ground; the second DBA calms everyone down, the music stops, the movie resumes at normal pace. Then he enters a delete without a where clause and here we go again.



  • Snoofle,

    Once everything calms down you will have to tell us what the long term ramifications are.



  • @Cassidy said:

    @Lorne Kates said:

    Perhaps your HR dept could include "Explain the purpose of SELECT and when you would use it" in their questions to the existing DBAs that wish to keep their jobs.
     

    .. would grab low-hanging fruit.

    @snoofle said:

    That DBA didn't know, so he went to the developers to find out.

    @snoofle: did any devs actually know the answer?

    That "dev" would be me, and yes, I told him what to do. I had provided undo scripts for each step of the installation, so even forgetting the built in features of Oracle-rollback, the screwup could have been undone by simply running the undo script. There still would have been visible downtime in production, but it would have worked.

    I've frequently said I am not a db-guy, so I tend to brute force DDL in the simplest possible way; there is invariably a better, more sophisticated/efficient way to do things. The second DBA recognized that, and tried to one-off a command to do it all (as opposed to just running my scripts). He just omitted the where clause and so applied the guts of my scripts to every table in the schema... whoops.



  • @Cassidy said:

    Perhaps your HR dept could include "Explain the purpose of Oracle Flashback and when you would use it" in their interview questions for these two's replacements.

    and when you cannot use it...

    both DROP TABLE foo PURGE; and TRUNCATE TABLE foo can (as far as I know) not be remediated via Flashback queries.

    But, if you have a RMAN backup (or anything else that backs up all your archive logs properly) you can do a point-in-time recovery to the point just before the drop and then either reapply the remaining commits in the archive logs selectively or copy the accidentally dropped table back into the production environment. (depending on how much else happened in the production environment while you did that).



  • @Cassidy said:

    Snoofle will have to confirm this one, but ISTR the original company hired buttfuck useless walking wastes of atoms, many of which are now being viewed as discardable since the new Company() takeover. That right, snoofs?
    To some degree. There is a new Boss+2 in town and in his first few weeks, he canceled a release due to stability issues (good), demanded daily reports of QA runs (none of which had been scheduled), and bug counts/explanations; all to force the thing to stabilize before turning it loose on customers. I view this as a good thing. So far, he's standing his ground.

    BUT.... this is just this one guy. The parent company doesn't seem to care all that much (some, but not much), and as I had guessed, seems to be an even bigger unending supply of WTF.

     



  • @mihi said:

    @Cassidy said:

    Perhaps your HR dept could include "Explain the purpose of Oracle Flashback and when you would use it" in their interview questions for these two's replacements.

    and when you cannot use it...

    both DROP TABLE foo PURGE; and TRUNCATE TABLE foo can (as far as I know) not be remediated via Flashback queries.

    But, if you have a RMAN backup (or anything else that backs up all your archive logs properly) you can do a point-in-time recovery to the point just before the drop and then either reapply the remaining commits in the archive logs selectively or copy the accidentally dropped table back into the production environment. (depending on how much else happened in the production environment while you did that).

    This particular database/schema contains all the reference and customer data - 60+TB, so (I'm told) Flashback is not a viable option. Fortunately, we keep parallel (just in case of tape-failure) copies of monthly full backups and daily delta backups. Many of the tables have row counts in excess of 5 billion, so SELECT'ing before doing isn't always a viable option.

    I always test my queries and undo's in a smaller DB, but others don't.

     



  • @Anketam said:

    Snoofle,

    Once everything calms down you will have to tell us what the long term ramifications are.

    Things don't calm down at Snoofle's place, they go directly from one crisis into another.



  • @blakeyrat said:

    @Anketam said:
    Snoofle,

    Once everything calms down you will have to tell us what the long term ramifications are.

    Things don't calm down at Snoofle's place, they go directly from one crisis into another.

    You are quite right. Without telling any of the senior developers (who do the interviewing) or tech managers, someone hired 22 neophyte developers, straight out of school, to come up to speed and start cranking on a major rewrite scheduled for this year.

    ... Because folks just out of school have the architectural chops to take into account all the usual things when designing stuff.



  • @snoofle said:

    He just omitted the where clause and so applied the guts of my scripts to every table in the schema... whoops.
     

    Holy shitting fuck. That's... no, never mind. I've got to remind myself we're talking about $snoofle->DBA.

    @mihi said:

    both DROP TABLE foo PURGE; and TRUNCATE TABLE foo can (as far as I know) not be remediated via Flashback queries.

    Yeah, okay, smartarse - you're right on that front. I was thinking of snoof's DBAs just running a simple "DROP TABLE" (provided the recycle bin was on).
     But you're right for that situation.

    @snoofle said:

    So far, he's standing his ground.
     

    That's good.

    @snoofle said:

    BUT.... this is just this one guy. The parent company doesn't seem to care all that much

    That's not. I got the impression that things would improve since the takeover... it's not looking promising.


  • @snoofle said:

    60+TB, so (I'm told) Flashback is not a viable option.
     

    It's only an option if you want some cover between backups and you have the storage (varies upon how volatile the data is).

    It's not needed if you've got fairly careful users and good DBAs.



  • @snoofle said:

    22 neophyte developers, straight out of school, to come up to speed and start cranking on a major rewrite scheduled for this year.
     

    .. and who's been tasked to bring them up to speed? Why, the senior developers that are currently assigned to the rewrite!



  • Admission:

    I've done something similar, but on a way smaller scale. Our document management system sometimes likes to bunge itself when unreserving a document via the interface. Vendor supported resolution is just to run a small update query on the database.

    Unfortunately, one time for whatever reason I had the where clause document id commented out in SQL Server Management Studio. A query that takes a split second turned into about 5 seconds or so as hundreds of documents were unreserved.

    Whoops.

    Thank god for transactions.
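
    The safety net here can be made explicit: run the statement inside a transaction and commit only if the affected row count is what you expected. The following is a minimal sketch using Python's `sqlite3` module rather than SQL Server; the helper name `guarded_update` and the `documents` table used with it are assumptions for illustration, not the vendor's schema.

```python
import sqlite3


def guarded_update(conn: sqlite3.Connection, sql: str, params=(), expected_rows: int = 1) -> int:
    """Run one UPDATE/DELETE and commit only if it touched the expected number of rows."""
    cur = conn.execute(sql, params)
    if cur.rowcount != expected_rows:
        # Too many (or too few) rows changed: undo everything and complain.
        conn.rollback()
        raise RuntimeError(
            f"statement touched {cur.rowcount} rows, expected {expected_rows}; rolled back"
        )
    conn.commit()
    return cur.rowcount
```

    With the where clause commented out, a call such as `guarded_update(conn, "UPDATE documents SET reserved = 0")` would touch every row, fail the count check, and roll back instead of committing.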



  • @Nexzus said:

    I've done something similar...

    Thank god for transactions.

     

    There's your safety-net.

    The WTF isn't what he did, it's about what impact it has upon the business.

    Everyone makes mistakes, it's just a matter of whether you can correct the mistake before it adversely affects the business.

    In this situation there didn't appear to be many safeguards in place (snoofle's incompetent DBAs are perceived as untouchable experts so their irresponsible behaviour is quite dangerous) yet all would have been well if the DBA(s) in question had the capability of undoing their actions.

    Sadly, not.

    @snoofs: anyone determined the cost of this downtime yet?

     



  • @snoofle said:

    I always test my queries and undo's in a smaller DB, but others don't.
    I think that this is the crux of the matter. There is some old adage about "Untested code should be considered broken". I know that no matter how smart I think I am that I'm not perfect and that I can and will screw up - so it's better to do that where it won't cause problems.



  • @snoofle said:

     

    ... Because folks just out of school have the architectural chops to take into account all the usual things when designing stuff.

    I've been on the other end of this. Fooled into thinking that 35k a year was an entry-level salary, had barely anything explained and working with people whose personal goal was to teach me as little as possible (for their job security).

    I'm not going to deny that I thought I was smarter than I was in the programming field (straight out of uni, everyone does), but I always shut up and listened. In the end, though, it's not a good first step into the industry. Hell, I'm a Service Desk monkey now, too scared after the handful of jobs I've gone through.

     


  • Garbage Person

    @Adanine said:

    I've been on the other end of this. Fooled into thinking that 35k a year was an entry-level salary
    I'm on the hiring side right now. $35k is $5k higher than our entry salary is. Honest to god, that's all we can afford to pay thanks to the insane fact that we have to recover from other business units for time spent in order to pay our bloody costs - and that recovery rate is capped by company policy at $80/hr. We figure we get 20 useful hours out of an entry level programmer in a week (we have metrics that prove this), so $40/hour. For an entry level, only about half of that will be billable time (factoring in the fact that they distract higher-end workers, need training, etc.). So they earn us on average $20/hour - or $40k/year. With benefits and costs, we're paying $60k for that person. We lose $20k/yr to have an average entry level employee on the books. They are a loss-making asset that needs to be covered by more experienced workers.

    Let's take me, earning $70k. I cost the business unit $100k to have on the books. Working full-on, 3/4 on billable, I pull in EXACTLY $120k of fake internal revenue (actual revenue I generate for the company as a whole is measured in millions - but by the time it hits my business unit, it's all evaporated). So I cover exactly one entry level employee. Since we also have to rent our offices from other business units, along with our servers, support, bandwidth, disk space (by the megabyte!), phone lines, desks, etc., we literally cannot afford to pay what anyone is supposedly worth.

    Oh, and those servers and desks and shit that we "rent". We had to buy them in the first place (or, rather, "fund their purchase"), but their ownership goes to other business units - who then charge us to use them. It's friggin' craptacular.
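
    The internal-billing arithmetic in this post can be checked with a short calculation. This is a sketch under one stated assumption: a 2,000-hour year (40 hours a week for 50 weeks), which is what the post's "$20/hour - or $40k/year" implies. The function names are hypothetical; the rate cap, fractions, and loaded costs are the figures quoted above.

```python
RATE_CAP = 80          # internal recovery rate cap in $/hour (from the post)
HOURS_PER_YEAR = 2000  # assumption: 40 h/week * 50 weeks, implied by "$20/hour - or $40k/year"


def internal_revenue(useful_fraction: float, billable_fraction: float) -> float:
    """Yearly internal revenue recovered for one employee."""
    return RATE_CAP * useful_fraction * billable_fraction * HOURS_PER_YEAR


def net_to_unit(useful_fraction: float, billable_fraction: float, loaded_cost: float) -> float:
    """Internal revenue minus what the employee costs the business unit."""
    return internal_revenue(useful_fraction, billable_fraction) - loaded_cost


# Entry level: 20 useful hours out of 40, half of that billable, $60k loaded cost.
entry_net = net_to_unit(0.5, 0.5, 60_000)
# The poster: working full-on, 3/4 billable, $100k loaded cost.
senior_net = net_to_unit(1.0, 0.75, 100_000)

print(entry_net, senior_net)
```

    Under that assumption the entry-level hire nets out at a $20k yearly loss and the poster at a $20k surplus, which matches the claim that one experienced worker covers exactly one entry-level employee.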




  • That's very... Wow.

    From what I've heard 40-45k was entry level in Australia, but I haven't heard much. I was recruited into a Service Desk role at 48k. I've worked in two major corporate environments, but neither was anything like what you've just described in terms of cost and purchasing.

    What would you put as an honest salary for an Entry Level position? And after 6 months of work, where the training and constant assistance is no longer an issue?



  • @Adanine said:

    From what I've heard 40-45k was entry level
    in Australia, but I haven't heard much.

    40k in Australia should be straight out of uni, since that is the threshold for repaying HECS (or whatever it is called these days). Then going up from there. This is including tax but not super, for a typical 38 hour work week.



  • @Weng said:

    We had to buy them in the first place (or, rather, "fund their purchase"), but their ownership goes to other business units - who then charge us to use them. It's friggin' craptacular.
     

    .. and nobody's seen a problem with this? Other than those paying, obv.


  • Garbage Person

    To give you an idea of the effects of this on IT infrastructure, I have in my hands the response to an infrastructure ticket. "Can you double the available storage on the following virtual servers:

    <list of six>

    from 8gb to 16gb"

     

    The gist of it is as follows:
    Those six virtual servers share a single direct attached storage box using 15000RPM 32gb SCSI Ultra320 drives (for the hardware illiterate, that standard dates from 2003, and the drives in that speed bracket are 300gb now). No additional storage is connected to the host. This storage array is fully provisioned. To accomplish this task, we will have to swap out all 32 disks for current 300gb models at a cost of $18000 billable to you and as you are the only one requesting the upgrade, we will provision all 8.5TB of new space to you at $8500 monthly. Estimated downtime fees for this upgrade are $5000 and install labor is estimated at $3200.

    I asked for 48gb of disk space. I was given an estimate of  $26200 plus $8500 monthly.

    When I inquired about the cost of moving to a newer VM cluster in the corporate datacenter with a real SAN, as opposed to the rinkydink local cluster and its decade-old DAS, the ballpark one-time fee was mid-six figures (for unspecified reasons that I suspect are "We don't actually have that cluster yet") and tens of thousands of dollars monthly, because they're production support servers and therefore need an SLA other than "We'll fix it if we feel like it".

    The last time we needed new servers, we actually procured physical servers, because VMs are more expensive.

    NOW I know why all of middle management is obsessed with Google Docs. It's literally impossible to store data in this fucking company.

     

    It's heralded as the best idea ever - every business unit in the company "has to" run at break even at worst. Somehow, of course, the company as a whole still loses money hand over fist. I think the current thinking to deal with that is to blame The Shareholders.
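
    For anyone wanting to check the numbers in the quote above, the sums can be written out as a quick calculation. All figures come from the post; only the variable names and the first-year total (one-time fees plus twelve monthly charges) are added for illustration.

```python
servers = 6
extra_gb_per_server = 16 - 8          # each server goes from 8gb to 16gb
requested_gb = servers * extra_gb_per_server

one_time_fees = {"disks": 18_000, "downtime": 5_000, "install labor": 3_200}
one_time_total = sum(one_time_fees.values())
monthly_fee = 8_500

raw_new_gb = 32 * 300                 # 32 replacement disks at 300gb each, before any RAID overhead
first_year_total = one_time_total + 12 * monthly_fee

print(requested_gb, one_time_total, raw_new_gb, first_year_total)
```

    The request is for 48gb, the one-time estimate adds up to the quoted $26200, and the monthly fee dominates within the first year.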



  • @Weng said:

    It's heralded as the best idea ever - every business unit in the company "has to" run at break even at worst.

    Hate to break it to those heralds, but that is definitely not the best idea ever.

     



  • @Cat said:

    @Weng said:
    It's heralded as the best idea ever - every business unit in the company "has to" run at break even at worst.
    Hate to break it to those heralds, but that is definitely not the best idea ever.

    The best idea was to separate the condenser from the cylinder, James Watt, 1765.



  • @Weng said:

    It's heralded as the best idea ever - every business unit in the company "has to" run at break even at worst.


    As you probably already know, it's a shitty idea. You don't run your cleaners on a cost recovery basis. At least, you shouldn't... Same with IT - it is an enabling branch of a company, not a profit-generating branch. Paying your IT department is exactly the same as paying your power bill - it is a cost of doing business that you factor into your prices.


  • Garbage Person

     @havokk said:

    As you probably already know, it's a shitty idea. You don't run your cleaners on a cost recovery basis. At least, you shouldn't... Same with IT - it is an enabling branch of a company, not a profit-generating branch. Paying your IT department is exactly the same as paying your power bill - it is a cost of doing business that you factor into your prices.
    Actually, someone figured out that the non-revenue parts of IT could be sold wholesale to another company - servers, datacenters, people, etc. and then paid for as a service. It's like cloud computing, but even fucking dumber!

    So the parts of IT we still do have are all either core software development groups (those we haven't outsourced yet, anyway) or customer-facing service groups that actually generate direct profit (like mine), and we have to pay extortionate rates to a single-source outside vendor masquerading as an internal group (to the point that we actually pay them through our internal charging system, not like a real vendor). But make no mistake - they ARE a for-profit company, and their fiscal results are usually better than ours.

     

    I really should get my own thread for this.



  • @snoofle said:

    The lead DBA went to change something in a test environment, but accidentally grabbed a window with the production database open, and dropped a critical table

    As others have pointed out, people make mistakes. Given this truth you would hope that a DBA who's about to drop tables would make very sure he was pointing to the right environment.

    Anyone whose self-preservation instinct is so poorly developed probably deserves to go, but you could also argue that your company has paid for his lesson so should keep him on to benefit from it. I'm going out on a limb here... but surely they wouldn't be dumb enough to make the same mistake again.
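
    One way to make "very sure he was pointing to the right environment" mechanical is to have the operator name the environment they think they are on and compare it with a marker stored in the database itself. The sketch below uses Python with SQLite; the `environment_marker` table, the `WrongEnvironmentError` class, and the `run_destructive` helper are all hypothetical conventions for illustration, not anything snoofle's company had in place.

```python
import sqlite3


class WrongEnvironmentError(RuntimeError):
    """Raised when the connection is not the environment the operator named."""


def current_environment(conn: sqlite3.Connection) -> str:
    # Hypothetical convention: every database carries a one-row marker table.
    row = conn.execute("SELECT name FROM environment_marker").fetchone()
    if row is None:
        raise WrongEnvironmentError("no environment marker found")
    return row[0]


def run_destructive(conn: sqlite3.Connection, sql: str, expected_env: str) -> None:
    """Execute a destructive statement only if the database is the expected environment."""
    actual = current_environment(conn)
    if actual != expected_env:
        raise WrongEnvironmentError(
            f"connected to {actual!r} but operator expected {expected_env!r}; nothing executed"
        )
    conn.execute(sql)
    conn.commit()
```

    A call like `run_destructive(conn, "DROP TABLE critical", expected_env="test")` made against the production window would fail before the drop is sent.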

