The Daily WTF: Curious Perversions in Information Technology

Anonymization

Last post 06-28-2013 5:43 PM by Ronald. 80 replies.
  • 06-25-2013 11:49 PM

    Anonymization

    Our production database contains lots of information that is legally required to be kept secured and confidential. As such, it is a mandatory part of our procedures that when a DBA copies production data to some other database, the relevant portions are replaced with anonymized data. You know: First123456, Last789012, 12345 Main Street, Anytown, USA, and such.

    Last week we needed data for one particular client copied to a test database in order to see if something we were trying would scale well. On Friday afternoon, the DBA copied the data, and kicked off the job to anonymize the data.

    Since he was replacing an entire table in the test database, there was no WHERE clause on the anonymizing script.

    Unfortunately, he accidentally pointed the script at the production database.

    And left the script running when he went home.

    And the script churned away, 10,000 rows at a time, overwriting production data.

    And replication dutifully pushed the changes across the network to our standby, backups and DR systems.

    And it was still grinding away this Monday morning.

    And do you have any idea how long it takes to restore a 70+TB database from tape?

    Unfortunately, they already fired the primary DBA for incompetence, and his boss quit in frustration. If they fire this guy, there will only be two DBAs left.

    Hmmmm.

  • 06-25-2013 11:55 PM In reply to

    Re: Anonymization

    snoofle:
    And do you have any idea how long it takes to restore a 70+TB database from tape?
    Three minutes?

    snoofle:
    Unfortunately, they already fired the primary DBA for incompetence, and his boss quit in frustration. If they fire this guy, there will only be two DBAs left.
    They shouldn't fire him, just make him anonymize each row, by-hand.

    Let the healing begin!

  • I may not agree with everything Morbs just said, but he expresses himself in a way that is dignified, respectful, polite and non-threatening!
  • 06-26-2013 2:02 AM In reply to

    Re: Anonymization

    snoofle:
    You know: First123456, Last789012, 12345 Main Street, Anytown, USA, and such.
     

    I was going to make a joke about "how do you know my address". But the above isn't anything at all like my address. Not even close. So the joke wouldn't be funny.


    HardwareGeek:

    <blink> and you're dead!



    "Where is grumpy cat?"
    - Mozilla's MOST ADVANCED USER!
  • 06-26-2013 2:06 AM In reply to

    Re: Anonymization

    99 DBAs fucking it all, 99 living in fear.
    When a DBA's nailed to the server room wall
    There'll be 98 DBAs fucking it all.

  • 06-26-2013 2:31 AM In reply to

    Re: Anonymization

    Lorne Kates:

    snoofle:
    You know: First123456, Last789012, 12345 Main Street, Anytown, USA, and such.

    I was going to make a joke about "how do you know my address". But the above isn't anything at all like my address. Not even close. So the joke wouldn't be funny.

    Ms. Lorn Keats
    π ½ Plodding Moose Pike
    Markham, ON
    Canada (the one living above the USA like a 30 year-old living above his parents' garage)

    Let the healing begin!

  • I may not agree with everything Morbs just said, but he expresses himself in a way that is dignified, respectful, polite and non-threatening!
  • 06-26-2013 4:55 AM In reply to

    Re: Anonymization

    morbiuswilters:
    snoofle:
    And do you have any idea how long it takes to restore a 70+TB database from tape?
    Three minutes?
    He would only be asking if the answer was very small or very large. I flipped a coin and came up with "10 years". I think this coin is fake.
    In Soviet Russia, Football watches you!
  • 06-26-2013 5:14 AM In reply to

    Re: Anonymization

    Shoreline:
    He would only be asking if the answer was very small or very large. I flipped a coin and came up with "10 years".
    Exactly. I realized that "very long" would make more sense, but nothing at snoofie's company ever makes sense. So by the power of deductive reasoning, that only left 3 minutes.

    Let the healing begin!

  • I may not agree with everything Morbs just said, but he expresses himself in a way that is dignified, respectful, polite and non-threatening!
  • 06-26-2013 6:14 AM In reply to

    Re: Anonymization

    snoofle:
    And replication dutifully pushed the changes across the network to our standby, backups and DR systems.

    And that is a real (non-DBA) WTF... For standby and DR, automated synchronization is reasonable, but for (non-versioned) backups the process needs to occur only on "vetted" data. There are definitely challenges in balancing the ability to restore quickly to a previous point in time (something neither standby nor DR is focused on) against keeping a reasonably granular set of restore points (e.g. restore back to any hour within the past 24, any day within the past 30). Ideally one should also have the ability to selectively apply incremental changes from that point in time to the present.
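
    The "any hour within the past 24, any day within the past 30" idea can be sketched as a retention rule. This is only an illustration: the function name and the bucketing by clock hour and calendar day are assumptions, not anything a particular backup product does.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshots, now):
    """Pick which snapshot timestamps to retain (hypothetical policy).

    Keeps the newest snapshot per clock hour for the last 24 hours and the
    newest snapshot per calendar day for the last 30 days.
    """
    keep = {}
    for ts in sorted(snapshots):
        age = now - ts
        if age < timedelta(0):
            continue  # ignore timestamps from the future
        if age <= timedelta(hours=24):
            bucket = ("hour", ts.strftime("%Y-%m-%d %H"))
        elif age <= timedelta(days=30):
            bucket = ("day", ts.strftime("%Y-%m-%d"))
        else:
            continue  # older than the retention window
        keep[bucket] = ts  # later timestamps replace earlier ones in a bucket
    return sorted(keep.values())


now = datetime(2013, 6, 26, 12, 0)
snaps = [
    datetime(2013, 6, 26, 11, 10),
    datetime(2013, 6, 26, 11, 50),
    datetime(2013, 6, 20, 1, 0),
    datetime(2013, 6, 20, 23, 0),
    datetime(2013, 4, 1, 0, 0),
]
kept = snapshots_to_keep(snaps, now)
```

    Anything not returned would be a candidate for expiry; applying incremental changes forward from a kept snapshot is a separate problem that this sketch does not touch.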

    I have been fortunate enough to work with some great teams (often in the financial area) and see great success in this area, and have also been around long enough to see lots of client pain when they don't have such a system....

  • 06-26-2013 6:41 AM In reply to

    • GreyWolf
    • Not Ranked
    • Joined on 12-03-2007
    • Where the Raspberry Pis grow
    • Posts 76

    Re: Anonymization

    Definitely time for a public execution.

    And I note that your score is two down, two to go.  Darwinian selection implies that the two remaining DBAs will be tougher to kill, leading to even more spectacular stories. We wait with bated breath.

    Yea verily quothing Boomzilla, "The demand for programming is greater than the actual supply of people who actually know what they're doing. "
  • 06-26-2013 6:49 AM In reply to

    Re: Anonymization

    snoofle:
    Unfortunately, they already fired the primary DBA for incompetence, and his boss quit in frustration. If they fire this guy, there will only be two DBAs left.
     

    Taking into account the WTFs you've told us, why is it unfortunate to fire DBAs?

  • 06-26-2013 6:51 AM In reply to

    Re: Anonymization

    snoofle:

    Unfortunately, they already fired the primary DBA for incompetence, and his boss quit in frustration. If they fire this guy, there will only be two DBAs left.

    Hmmmm.

    Would you rather have a DBA who doesn't know to check which environment his scripts are pointing to... or no DBA? Because a DBA who's no longer employed is a DBA who can't fuck up production data going forward.
    <stryhf> why do people persist in continuing to write SHIT APPLICATIONS IN JAVA
    <stryhf> writing SHIT in JAVA doesn't make it ANY LESS SHITTY
    <stryhf> IT JUST MAKES IT SLOW ASS SHIT
  • 06-26-2013 7:24 AM In reply to

    • TGV
    • Top 75 Contributor
    • Joined on 10-09-2005
    • Posts 705

    Re: Anonymization

    snoofle:
    And do you have any idea how long it takes to restore a 70+TB database from tape?
    But, name and address data should be stored once, and I don't think such a table could be 70TB. So, that means that you've got all these names and addresses duplicated all over the place. Now, that might be good for performance (although I fail to see how), but that also means you can rebuild those columns from the central name/address table, which can be restored much faster. Right? Or do you only have full db restore?

     

  • 06-26-2013 7:37 AM In reply to

    Re: Anonymization

    I believe y'all are a bit quick with the FIRE HIM (all the time, not just this thread). It's like a kid's view of employment. I'd just slash his pay to a lower grade. Such a person is obviously not ready for a prime time function and the salary that comes with it.

    Then if he leaves in a huff of misguided righteous indignation, you don't have to pay a severance package.

    See?


    In complex analysis, a meromorphic function on an open subset D of the complex plane is a function that is holomorphic on all D except a set of isolated points

  • 06-26-2013 8:10 AM In reply to

    Re: Anonymization

    But isn't the data that got copied to the test DB still intact because of this error?

    Can't you copy it back from the test DB (or even change the test server to become the production server)?

    Do things cheap and it will cost you dear
  • 06-26-2013 8:43 AM In reply to

    Re: Anonymization

    morbiuswilters:
    Shoreline:
    He would only be asking if the answer was very small or very large. I flipped a coin and came up with "10 years".
    Exactly. I realized that "very long" would make more sense, but nothing at snoofie's company ever makes sense. So by the power of deductive reasoning, that only left 3 minutes.
    Your logic is sound.
    In Soviet Russia, Football watches you!
  • 06-26-2013 8:48 AM In reply to

    Re: Anonymization

    ip-guru:

    But isn't the data that got copied to the test DB still intact because of this error?

    Can't you copy it back from the test DB (or even change the test server to become the production server)?

    Your point is correct and makes sense. There is no place for that sort of logic around here. There is an absolute, unbreakable rule: no data is EVER to be copied from non production systems to production systems. Ever. Period. Not even in this case.

    BTW: There was a glitch in one of the backup tapes, so they had to go back to the previous full back up and start again. They're still applying incremental updates. This means our production systems have been returning garbage data for 5 days and counting; all publicly visible.

  • 06-26-2013 8:57 AM In reply to

    • PJH
    • Top 10 Contributor
    • Joined on 02-14-2007
    • Newcastle, UK
    • Posts 3,916

    Re: Anonymization

    snoofle:
    5 days
    Oops.
    "Because you watched 'The Very Hungry Caterpillar,' we recommend 'The Human Centipede.'"
    --
    UED - Countryside: To kill Piers Morgan
  • Parp!
  • 06-26-2013 9:17 AM In reply to

    Re: Anonymization

    snoofle:
    Your point is correct and makes sense.
     

    Oh shit, I thought there was something sensible I was missing.

    snoofle:
    There is an absolute, unbreakable rule: no data is EVER to be copied from non production systems to production systems. Ever. Period. Not even in this case.

    So you guys have write access, so you can break the production database, but are not allowed to fix it even if it sends the company down the toilet for almost a whole week.

    I wonder how the bunch of incompetent fuckwits company you work for stays in business

    Do things cheap and it will cost you dear
  • 06-26-2013 9:24 AM In reply to

    Re: Anonymization

    I think, going forward, you may wish to create an anonymizing view of that table, and instead of copying the table directly export the data from the anonymizing view into the other environments.  You could use SELECT 'LASTNAME' || ROWNUM AS LASTNAME, 'FIRSTNAME' || ROWNUM AS FIRSTNAME, ... etc FROM ADDRESSTABLE;  Then there would be no need to explicitly manipulate the data in the tables themselves.
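
    A minimal sketch of that anonymizing view, run against an in-memory SQLite database so it can be tried anywhere. The column names and sample rows are made up for illustration, and the primary key stands in for ROWNUM from the query above, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresstable (
        id INTEGER PRIMARY KEY,
        firstname TEXT,
        lastname TEXT,
        street TEXT,
        balance REAL
    );
    INSERT INTO addresstable (firstname, lastname, street, balance) VALUES
        ('Alice', 'Smith', '1 Real Road', 10.5),
        ('Bob', 'Jones', '2 Actual Avenue', 99.0);

    -- The view replaces the confidential columns with synthetic values and
    -- passes the non-sensitive column through unchanged.
    CREATE VIEW addresstable_anon AS
    SELECT id,
           'FIRSTNAME' || id AS firstname,
           'LASTNAME' || id AS lastname,
           id || ' Main Street' AS street,
           balance
    FROM addresstable;
""")


def export_anonymized(source):
    """Read rows for a non-production copy from the view, never the table."""
    query = (
        "SELECT id, firstname, lastname, street, balance "
        "FROM addresstable_anon ORDER BY id"
    )
    return source.execute(query).fetchall()


rows = export_anonymized(conn)
```

    The export only ever issues a SELECT, so there is no UPDATE that could be pointed at the wrong database.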

    I > U
  • 06-26-2013 10:31 AM In reply to

    • Zecc
    • Top 25 Contributor
    • Joined on 06-12-2007
    • and hasn't left since.
    • Posts 2,063

    Re: Anonymization

    ip-guru:
    I wonder how the bunch of incompetent fuckwits company you work for stays in business
    They were recently acquired by a larger company, IIRC. Presumably only for their clients and nice offices.

    If mixed metaphors were illegal, I'd be having an indigestion.
    typeof NaN == 'number'
    var ò_ó, ಠ⁔ಠ, ᄒᆺᄒ, ᅙᅳᅙ, ᖛᨓᖜ, ꖴᅩꖴ, ఠᨋఠ; // Naming your variables is serious business
  • 06-26-2013 10:44 AM In reply to

    Re: Anonymization

    snoofle:

    If they fire this guy, there will only be two DBAs left.

    Hmmmm.

    Still two DBAs too many, sounds like!
    -= Quango =-
  • 06-26-2013 11:25 AM In reply to

    Re: Anonymization

    dhromed:

    I believe y'all are a bit quick with the FIRE HIM (all the time, not just this thread). It's like a kid's view of employment. I'd just slash his pay to a lower grade. Such a person is obviously not ready for a prime time function and the salary that comes with it.

    Then if he leaves in a huff of misguided righteous indignation, you don't have to pay a severance package.

    See?

    This notwithstanding, Snoofle says "only two DBAs left" like in this particular case it's a bad thing. On the contrary, it's a perfect opportunity to (attempt to) hire someone who's not incompetent.
  • 06-26-2013 11:41 AM In reply to

    Re: Anonymization

    dhromed:

    It's like a kid's view of employment. I'd just slash his pay to a lower grade. Such a person is obviously not ready for a prime time function and the salary that comes with it.

    Then if he leaves in a huff of misguided righteous indignation, you don't have to pay a severance package.

    See?

     

    Not necessarily; in the UK at least this might be considered "constructive dismissal", and the company would still be liable for a compensation claim.

    However, I agree that firing him is a bit harsh. He made a genuine mistake, and the circumstances suggest to me that there were not enough safeguards in place to prevent this.

    Sensible use of permissions & passwords on the database (different accounts on dev & prod servers, read-only access to the prod server for all users except the application...) could have reduced the possibility of this occurring considerably.

    As in a large number of snoofle's posts, it is the processes & management approach that is most to blame.
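
    One cheap safeguard in the same spirit, alongside separate accounts, is for the destructive script itself to refuse any target that is not on an explicit allow-list. A rough sketch follows; the host names and function names are hypothetical.

```python
class WrongEnvironmentError(RuntimeError):
    """Raised when a destructive job is aimed at a host that is not allowed."""


# Hypothetical host names; only non-production targets are listed.
ALLOWED_TARGETS = frozenset({"test-db.example.internal", "dev-db.example.internal"})


def run_destructive_job(target_host, job, allowed=ALLOWED_TARGETS):
    """Run job(target_host) only if the target is explicitly allowed."""
    if target_host not in allowed:
        raise WrongEnvironmentError(
            f"refusing to run against {target_host!r}: not in the allow-list"
        )
    return job(target_host)


result = run_destructive_job(
    "test-db.example.internal", lambda host: f"anonymized {host}"
)
```

    An allow-list fails closed: a mistyped or unexpected host is rejected rather than silently treated as a test system.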

     

    Do things cheap and it will cost you dear
  • 06-26-2013 12:58 PM In reply to

    Re: Anonymization

    On the bright side.. now your company has plenty of time to implement all the required functions from your new release that were "left for later".

  • 06-26-2013 2:28 PM In reply to

    Re: Anonymization

    We had a similar cock-up happen when someone fired off the obfuscation script on the production database. Luckily we only had 4 GB worth to recover.

  • 06-26-2013 3:02 PM In reply to

    Re: Anonymization

    Update: they just fired the guy. Our DBA staff now consists of one application mid-level DBA (who does not have admin privileges in production) and two junior DBAs who are responsible for the production stuff. We don't yet know if they'll be preserving the at-this-point three open slots for new (hopefully competent) DBAs.

  • 06-26-2013 3:51 PM In reply to

    Re: Anonymization

    ip-guru:

    Sensible use of permissions & passwords on the database (different accounts on dev & prod servers, read-only access to the prod server for all users except the application...) could have reduced the possibility of this occurring considerably.

    As in a large number of snoofle's posts, it is the processes & management approach that is most to blame.

     

    Um, he's a DBA, so he kinda needs admin-level rights to do his job (assuming his job involves more than running restore scripts pointing to the wrong DB).

    Regarding using different accounts with limited rights, I agree completely. Unfortunately DBAs are notoriously bad at this. Example: on a particular project, all devs had their own, restricted-access logins for the client's production DB server. But those logins didn't have enough rights for what we needed to do, and the client's lazy DBAs never got around to fixing them, so we ended up using the SQL account created for our applications... which had full dbo rights, even though our apps have no need to drop tables or kill connections.

    snoofle:

    Update: they just fired the guy. Our DBA staff now consists of one application mid-level DBA (who does not have admin privileges in production) and two junior DBAs who are responsible for the production stuff. We don't yet know if they'll be preserving the at-this-point three open slots for new (hopefully competent) DBAs.

    The mid-level DBA has fewer privileges than the juniors? I can't possibly imagine how that could be problematic...

    <stryhf> why do people persist in continuing to write SHIT APPLICATIONS IN JAVA
    <stryhf> writing SHIT in JAVA doesn't make it ANY LESS SHITTY
    <stryhf> IT JUST MAKES IT SLOW ASS SHIT
  • 06-26-2013 4:17 PM In reply to

    • Ronald
    • Top 25 Contributor
    • Joined on 05-16-2013
    • Flying on a jetplane
    • Posts 1,633

    Re: Anonymization

    The_Assimilator:
    snoofle:

    Update: they just fired the guy. Our DBA staff now consists of one application mid-level DBA (who does not have admin privileges in production) and two junior DBAs who are responsible for the production stuff. We don't yet know if they'll be preserving the at-this-point three open slots for new (hopefully competent) DBAs.

    The mid-level DBA has fewer privileges than the juniors? I can't possibly imagine how that could be problematic...

    Segregation of duties. You don't want developers (which includes their database guy) making mistakes in production; there is a different department for that.
  • 06-27-2013 12:06 AM In reply to

    • Kaslai
    • Not Ranked
    • Joined on 03-19-2013
    • Posts 6

    Re: Anonymization

    TGV:

    snoofle:
    And do you have any idea how long it takes to restore a 70+TB database from tape?
    But, name and address data should be stored once, and I don't think such a table could be 70TB. So, that means that you've got all these names and addresses duplicated all over the place. Now, that might be good for performance (although I fail to see how), but that also means you can rebuild those columns from the central name/address table, which can be restored much faster. Right? Or do you only have full db restore?

     

     

     

    Well let's think about this. An average set of customer information contains: First and Last names, address, city, state, postal code, date of birth, and some odds and ends.

    Let's assume that an average first name is 7 wchars, and an average last name is 6 wchars (Yes. Unicode). Date of birth can be represented with a 4 byte int, as can postal code. I'll assume addresses average to around 30 characters, including the name of the city. Any phone number can be represented in any standard form with 20 wchars. For good measure, I'll add 30 bytes for miscellaneous data.

     14 (First Name)
    +12 (Last Name)
    +4 (DoB)
    +4 (Postal Code)
    +60 (Address)
    +40 (Phone Number)
    +30 (Odds and ends)
    =164 bytes!

    So now that we have a conservative estimate of 164 bytes per customer entry, let's see how many fit into a 70TB database (taking 1TB as 2^40 bytes).

    469,303,743,562 entries.

    Now obviously, this database serves many clients. If the company that snoofle works for serves 1000 different clients, that would work out to 469,303,743 entries per client, or roughly 3/2 times the population of the United States. Obviously this number becomes more reasonable the more clients you add or the more liberal you make the estimates.
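
    The estimate is easy to re-run with different assumptions. The sketch below sums the field sizes listed above (they come to 164 bytes) and divides 70TB, taken here as 70 x 2^40 bytes; the 1000-client figure is the same guess used in the post.

```python
# Field sizes in bytes, as listed in the estimate above.
FIELD_BYTES = {
    "first_name": 14,
    "last_name": 12,
    "date_of_birth": 4,
    "postal_code": 4,
    "address": 60,
    "phone_number": 40,
    "odds_and_ends": 30,
}

record_bytes = sum(FIELD_BYTES.values())

# Assumption: "70TB" means 70 * 2**40 bytes, and row overhead is ignored.
database_bytes = 70 * 2**40

entries = database_bytes // record_bytes
entries_per_client = entries // 1000  # assumed client count from the post

print(f"{record_bytes} bytes per entry")
print(f"{entries:,} entries in total")
print(f"{entries_per_client:,} entries per client")
```

    Index overhead, row headers and free space would all push the real row count down, so this is an upper bound at best.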

    Before you go on about joining similar sections of tables together, that brings with it the risk of a potential glitch sending Client A's data to Client B in corner cases.

    There's also a very real chance of this database holding much more than just address data. It could just be a general data cloud that clients have access to. There's no reason why 70TB sounds unreasonable to me.

  • 06-27-2013 3:31 AM In reply to

    • TGV
    • Top 75 Contributor
    • Joined on 10-09-2005
    • Posts 705

    Re: Anonymization

    Kaslai:
    There's also a very real chance of this database holding much more than just address data. It could just be a general data cloud that clients have access to. There's no reason why 70TB sounds unreasonable to me.
    If that wasn't ironic, I'd recommend you to apply directly via snoofle as their new DBA.

     

  • 06-27-2013 3:44 AM In reply to

    Re: Anonymization

    edgsousa:
    snoofle:
    Unfortunately, they already fired the primary DBA for incompetence, and his boss quit in frustration. If they fire this guy, there will only be two DBAs left.
    Taking into account the WTFs you've told us, why is it unfortunate to fire DBAs?
    That's what I was thinking. From the stories, it would appear that letting a bunch of chimps administer the database would be both cheaper (basically, the cost of installing some trees and giving them daily fresh fruit) and offer superior service.

     

  • 06-27-2013 3:45 AM In reply to

    • Kaslai
    • Not Ranked
    • Joined on 03-19-2013
    • Posts 6

    Re: Anonymization

     

    TGV:

    Kaslai:
    There's also a very real chance of this database holding much more than just address data. It could just be a general data cloud that clients have access to. There's no reason why 70TB sounds unreasonable to me.
    If that wasn't ironic, I'd recommend you to apply directly via snoofle as their new DBA.

     

     

     

    You mean to say that you don't store images in databases? It's the cool thing to do!
  • 06-27-2013 3:54 AM In reply to

    Re: Anonymization

    TGV:

    snoofle:
    And do you have any idea how long it takes to restore a 70+TB database from tape?
    But, name and address data should be stored once, and I don't think such a table could be 70TB. So, that means that you've got all these names and addresses duplicated all over the place. Now, that might be good for performance (although I fail to see how), but that also means you can rebuild those columns from the central name/address table, which can be restored much faster. Right? Or do you only have full db restore?

    Presumably they have to restore the full database from tape to be able to grab the address data. My assumption is that these are backups of the binary data files and transactions logs, and not something where they can select a single table to restore.

    If that's the case, TRWTF is that their DR replica doesn't have some better system of managing backups. For example, put the database on a big SAN and then take periodic snapshots. You can restore one of those in a hurry since it's all online and copy-on-write. As it is, it sounds like their only mechanism for recovering from a bad update is to go to a tape backup, which takes nearly a week to restore. It sounds like whoever designed their DR system made it resilient to hardware error, but not user error, which is just as important, if not more so.
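
    To make the snapshot idea concrete, here is a toy model, not how any real SAN implements it: taking a snapshot freezes the current state, later writes land in a fresh layer on top, and reverting just throws the newer layers away. The class and method names are invented for the example.

```python
class LayeredVolume:
    """Toy block store with cheap snapshots (illustrative only)."""

    def __init__(self):
        # Oldest layer first; the last layer is the only writable one.
        self._layers = [{}]

    def write(self, block, data):
        self._layers[-1][block] = data

    def read(self, block):
        # Newest layer wins; fall through to older layers if absent.
        for layer in reversed(self._layers):
            if block in layer:
                return layer[block]
        return None

    def snapshot(self):
        """Freeze the current state and return an id for it."""
        self._layers.append({})
        return len(self._layers) - 1

    def revert(self, snapshot_id):
        """Discard everything written after the given snapshot."""
        del self._layers[snapshot_id:]
        self._layers.append({})


volume = LayeredVolume()
volume.write(0, "Alice Smith")
friday = volume.snapshot()
volume.write(0, "First123456 Last789012")  # the bad update
after_bad_update = volume.read(0)
volume.revert(friday)
after_revert = volume.read(0)
```

    Neither the snapshot nor the revert copies any data, which is why an online snapshot can be rolled back in a hurry while a tape restore cannot. Reverting also invalidates any snapshot taken after the one reverted to.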

    Let the healing begin!

  • I may not agree with everything Morbs just said, but he expresses himself in a way that is dignified, respectful, polite and non-threatening!
  • 06-27-2013 5:38 AM In reply to

    Re: Anonymization

    Kaslai:
    You mean to say that you don't store images in databases? It's the cool thing to do!

    Sitecore CMS and SharePoint do it. It makes sense in a CMS because it needs to go through the publishing workflow before being made public.

     

  • 06-27-2013 5:46 AM In reply to

    Re: Anonymization

    lucas:

    Kaslai:
    You mean to say that you don't store images in databases? It's the cool thing to do!

    Sitecore CMS and SharePoint do it. It makes sense in a CMS because it needs to go through the publishing workflow before being made public.

    Or it could just store the images in a normal file system and in the database keep a reference to that file which can then be published.
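
    Roughly what that could look like, as a sketch only: the image bytes go to the file system, and the database row holds the path plus a published flag. The table layout, function names and content-hash file naming are all assumptions made for the example.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_image(conn, root, name, data):
    """Write the bytes to disk and record an unpublished reference."""
    path = Path(root) / hashlib.sha256(data).hexdigest()
    path.write_bytes(data)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO images (name, path, published) VALUES (?, ?, 0)",
            (name, str(path)),
        )
    return path


def publish(conn, name):
    """Flip the workflow flag; the file itself is not touched."""
    with conn:
        conn.execute("UPDATE images SET published = 1 WHERE name = ?", (name,))


def load_published(conn, name):
    """Return the image bytes, or None if it is missing or unpublished."""
    row = conn.execute(
        "SELECT path FROM images WHERE name = ? AND published = 1", (name,)
    ).fetchone()
    return Path(row[0]).read_bytes() if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images ("
    "name TEXT PRIMARY KEY, path TEXT NOT NULL, published INTEGER NOT NULL)"
)
root = tempfile.mkdtemp()
store_image(conn, root, "logo", b"\x89PNG fake bytes")
before = load_published(conn, "logo")
publish(conn, "logo")
after = load_published(conn, "logo")
```

    The catch is that the file and the row are written separately, so a backup of one can be out of step with the other.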

    Let the healing begin!

  • I may not agree with everything Morbs just said, but he expresses himself in a way that is dignified, respectful, polite and non-threatening!
  • 06-27-2013 5:57 AM In reply to

    Re: Anonymization

    morbiuswilters:
    lucas:

    Kaslai:
    You mean to say that you don't store images in databases? It's the cool thing to do!

    Sitecore CMS and SharePoint do it. It makes sense in a CMS because it needs to go through the publishing workflow before being made public.

    Or it could just store the images in a normal file system and in the database keep a reference to that file which can then be published.

    It largely depends....

     1) If the file system is on a different machine than the database then additional network considerations come into play

     2) If the file system is on the same machine as the database then interactions between the I/O scheduling of the two elements come into play

     3) If the data is volatile then having 100% synchronized backups becomes very difficult.

     So, yes, for the vast majority of cases keeping images out of the database is a better idea. However, there are very specific conditions where there are major benefits to storing the images directly in the database.

  • 06-27-2013 6:12 AM In reply to

    • TGV
    • Top 75 Contributor
    • Joined on 10-09-2005
    • Posts 705

    Re: Anonymization

    morbiuswilters:
    If that's the case, TRWTF is that their DR replica doesn't have some better system of managing backups.
    That's one of the possible WTFs I was referring to. Because if restoring from tape takes 5 days (or 10 or whatever), backing up to tape will take approximately 5 or 10 days as well, which means you need a lot of extra incremental backups just to cover the period in which you were creating the backup, which seems, er, impractical. But if they have a more advanced backup, they should be able to restore just name and address information, so there is some other WTF lurking somewhere...

    I once worked on a project where the db was owned and administered by the client, and they had a backup procedure, but only for full recovery, and they had never tested it, and didn't want to test it either. So in all likelihood they were just wasting tapes. It added a little extra tension to pushing changes. Just what the doctor ordered.

     

  • 06-27-2013 8:21 AM In reply to

    Re: Anonymization

    morbiuswilters:
    As it is, it sounds like their only mechanism for recovering from a bad update is to go to a tape backup, which takes nearly a week to restore. It sounds like whoever designed their DR system made it resilient to hardware error, but not user error
    The DR backup was designed by the DBAs who have already been fired.

    re the 70+TB: it's not just one customer info table; it was a set of very Very VERY large tables, and no, there isn't a way to just pick out individual records from a backup; they need to restore it by table, then pick out what they need from the temp restored table back into the main table. Effectively, they need to restore the whole database just to retrieve one customer's data. The really stupid part is they don't have enough space to restore the whole thing, so they have to do it one table at a time. This means that while the restore is going on, some tables have sensible data and some don't. All while the system is live to customers (all of whom had to be informed that their queries would produce "interesting" results for a week or so until it could be straightened out). A total fiasco.
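
    The "restore to a temp table, then pick out what you need" step might look something like the sketch below, using two in-memory SQLite databases as stand-ins for the scratch restore and production. The schema, names and client ids are invented.

```python
import sqlite3


def make_db(rows):
    """Build a throwaway database with a tiny customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, client_id INTEGER, name TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def copy_back_client(scratch, prod, client_id):
    """Copy one client's rows from the restored table over the live ones."""
    rows = scratch.execute(
        "SELECT name, id FROM customers WHERE client_id = ?", (client_id,)
    ).fetchall()
    with prod:  # single transaction: either every row is fixed or none is
        prod.executemany("UPDATE customers SET name = ? WHERE id = ?", rows)
    return len(rows)


# scratch: table restored from backup; prod: the overwritten live table.
scratch = make_db([(1, 7, "Alice Smith"), (2, 7, "Bob Jones"), (3, 8, "Carol White")])
prod = make_db([(1, 7, "First1 Last1"), (2, 7, "First2 Last2"), (3, 8, "First3 Last3")])

fixed = copy_back_client(scratch, prod, client_id=7)
names = [row[0] for row in prod.execute("SELECT name FROM customers ORDER BY id")]
```

    Doing the copy inside one transaction per table at least avoids a half-fixed table, though it does nothing about some tables being restored while others are still garbage.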

     

     

  • 06-27-2013 2:23 PM In reply to

    • Ronald
    • Top 25 Contributor
    • Joined on 05-16-2013
    • Flying on a jetplane
    • Posts 1,633

    Re: Anonymization

    morbiuswilters:
    For example, put the database on a big SAN and then take periodic snapshots. You can restore one of those in a hurry since it's all online and copy-on-write. As it is, it sounds like their only mechanism for recovering from a bad update is to go to a tape backup, which takes nearly a week to restore. It sounds like whoever designed their DR system made it resilient to hardware error, but not user error, which is just as important, if not more so.
    SAN-level backup is never a good solution with high-volume systems because you need to quiesce the database to have a consistent snap, otherwise there is just too much uncertainty because of the heavy usage of memory buffers. SAN replication is fine for actual DR where it's all about quick recovery (business must keep going), but there is no way around application-level replication to implement people-proof solutions (those are more RPO-driven, i.e.: you don't want to ask the users to recreate manually everything they lost).
  • 06-27-2013 2:36 PM In reply to

    • Ronald
    • Top 25 Contributor
    • Joined on 05-16-2013
    • Flying on a jetplane
    • Posts 1,633

    Re: Anonymization

    snoofle:

    morbiuswilters:
    As it is, it sounds like their only mechanism for recovering from a bad update is to go to a tape backup, which takes nearly a week to restore. It sounds like whoever designed their DR system made it resilient to hardware error, but not user error
    The DR backup was designed by the DBAs who have already been fired.

    re the 70+TB: it's not just one customer info table; it was a set of very Very VERY large tables, and no, there isn't a way to just pick out individual records from a backup; they need to restore it by table, then pick out what they need from the temp restored table back into the main table. Effectively, they need to restore the whole database just to retrieve one customer's data. The really stupid part is they don't have enough space to restore the whole thing, so they have to do it one table at a time. This means that while the restore is going on, some tables have sensible data and some don't. All while the system is live to customers (all of whom had to be informed that their queries would produce "interesting" results for a week or so until it could be straightened out). A total fiasco.

     

     

    That's the kind of situation where people often discover that the laws of physics are very real for the hardware and that domino effects are not just something funny to watch on YouTube. Restoring backups is a very sequential process that does not sit well with parity calculation, so all those RAID-6 or RAID-50 volumes that looked so good on the utilization chart are now a huge bottleneck; if Murphy is around, this unusual pattern of I/O will cause a few spindles to die, putting even more pressure on the remaining ones. Plus the cache gets polluted, so the overall SAN is impacted by a lower cache hit ratio, causing even more pressure on the spindles. It's unlikely that in this scenario even a highly skilled SA will be able to predict how long it will take to fully recover.
  • 06-27-2013 2:38 PM In reply to

    Re: Anonymization

     Backups: perfect storm as-a-solution.


    In complex analysis, a meromorphic function on an open subset D of the complex plane is a function that is holomorphic on all D except a set of isolated points

  • 06-27-2013 3:05 PM In reply to

    Re: Anonymization

    Ronald:
    SAN-level backup is never a good solution with high-volume systems because you need to quiesce the database to have a consistent snap, otherwise there is just too much uncertainty because of the heavy usage of memory buffers.
    What the fuck are you talking about? Every modern database keeps a transaction log that is synced to the controller on every transaction. Either something is in the transaction log, or it's not. You'll never end up with partially-applied transactions, unless something is configured wrong or you have a hardware failure. Now on Linux you might have to quiesce the file system, but that's not usually a big problem, especially on a hot spare that isn't expected to respond to real-time queries.

    Ronald:
    ...but there is no way around application-level replication to implement people-proof solutions (those are more RPO-driven, i.e.: you don't want to ask the users to recreate manually everything they lost).
    Why would application-level replication fix this? The mistakes are just going to get replicated out.

    Anyway, I didn't say anything about SAN replication. I said the DR backup should be taking periodic snapshots of the SAN so it has the ability to easily revert to an earlier backup. I was assuming application-level replication to the DR server, plus the ability to keep multiple online snapshots, so that if a D(umb)BA ran a script against production accidentally, you could revert more readily.
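The "revert to an earlier snapshot" idea boils down to picking the newest snapshot taken before the bad script started. A minimal sketch, assuming snapshots are tracked by timestamp; the `latest_clean_snapshot` helper, the six-hour schedule and the incident time are all hypothetical, and no real SAN API is involved:

```python
from datetime import datetime
from typing import Optional, Sequence


def latest_clean_snapshot(
    snapshots: Sequence[datetime], incident_start: datetime
) -> Optional[datetime]:
    """Return the newest snapshot taken strictly before the incident began."""
    candidates = [taken for taken in snapshots if taken < incident_start]
    return max(candidates, default=None)


# Hypothetical schedule: one snapshot every six hours.
snapshots = [
    datetime(2013, 6, 21, 0, 0),
    datetime(2013, 6, 21, 6, 0),
    datetime(2013, 6, 21, 12, 0),
    datetime(2013, 6, 21, 18, 0),
]
incident = datetime(2013, 6, 21, 16, 30)  # hypothetical start of the bad script
print(latest_clean_snapshot(snapshots, incident))
```

Everything written between the chosen snapshot and the incident still has to be recovered some other way, which is why the snapshot interval matters.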

    Let the healing begin!

  • I may not agree with everything Morbs just said, but he expresses himself in a way that is dignified, respectful, polite and non-threatening!
  • 06-27-2013 3:47 PM In reply to

    • Ronald
    • Top 25 Contributor
    • Joined on 05-16-2013
    • Flying on a jetplane
    • Posts 1,633

    Re: Anonymization

    morbiuswilters:
    Ronald:
    SAN-level backup is never a good solution with high-volume systems because you need to quiesce the database to have a consistent snap, otherwise there is just too much uncertainty because of the heavy usage of memory buffers.
    What the fuck are you talking about? Every modern database keeps a transaction log that is synced to the controller on every transaction.
    You don't understand how this works. The transaction log is not automatically written to disk; depending on the RDBMS there is one or multiple writers that will use one of multiple buffers. The only thing you can say for sure is that the log of a specific operation will be persisted before the data itself, which makes recovery possible, but by no means does that imply that physical I/O is performed on the spot. The only way you can be sure that everything is on the disk is by letting the database engine know that it should stop using buffers because you need disk-level consistency, and this is called quiescing the database.

    Physical I/O latency is terrible. If every data operation was immediately persisted on disk the performance would be abysmal, it would be like working with 1MB of RAM and having the pagefile involved constantly.
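The distinction being argued over here, between a log record that sits in a memory buffer and one that has been forced to stable storage at commit, can be sketched with plain file I/O. This is a toy illustration, not how any particular RDBMS implements its log; `TinyWal` and its `sync_on_commit` switch are hypothetical names, and whether the device behind `os.fsync` really persists the data depends on the storage stack.

```python
import os


class TinyWal:
    """Toy append-only log that can force records to storage at commit time."""

    def __init__(self, path: str, sync_on_commit: bool = True) -> None:
        self._file = open(path, "ab")
        self.sync_on_commit = sync_on_commit

    def append(self, record: bytes) -> None:
        # The record lands in Python's user-space buffer; no physical I/O yet.
        self._file.write(record + b"\n")

    def commit(self) -> None:
        self._file.write(b"COMMIT\n")
        # flush() moves the buffered bytes into the OS page cache.
        self._file.flush()
        if self.sync_on_commit:
            # fsync() asks the OS to push the bytes down to the storage device.
            os.fsync(self._file.fileno())

    def close(self) -> None:
        self._file.close()
```

With `sync_on_commit=False` a commit returns as soon as the bytes reach the page cache, which is faster but leaves a window where a crash loses committed work.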

  • 06-27-2013 3:49 PM In reply to

    • Ronald

    Re: Anonymization

    morbiuswilters:
    Ronald:
    ...but there is no way around application-level replication to implement people-proof solutions (those are more RPO-driven, i.e.: you don't want to ask the users to recreate manually everything they lost).
    Why would application-level replication fix this? The mistakes are just going to get replicated out.

    Application-level means that the database engine is made aware of what is going on and is actively involved in the process. Database mirroring, log-shipping, AlwaysOn and other such technologies are application-level replication, as opposed to storage-level which is basically just ghosting volumes without concern for how they are used.
  • 06-27-2013 4:00 PM In reply to

    Re: Anonymization

    Ronald:
    The transaction log is not automatically written to disk; depending on the RDBMS there is one or multiple writers that will use one of multiple buffers. The only thing you can say for sure is that the log of a specific operation will be persisted before the data itself, which makes recovery possible, but by no means does that imply that physical I/O is performed on the spot.
    I don't know what the fuck you're talking about. In every database I've ever used, the transaction log was physically synced to disk on every commit, unless you specifically told it to buffer for a while.

    Ronald:
    The only way you can be sure that everything is on the disk is by letting the database engine know that it should stop using buffers because you need disk-level consistency, and this is called quiescing the database.
    What database are you talking about here?

    Ronald:
    Physical I/O latency is terrible.
    Unless you have, like, a battery-backed write cache. You know, a standard feature on every fucking server that costs more than $200. That will give you high IOPS. You can also use SSDs, which will also give high IOPS, although they still tend to be a bit flaky in my experience.

    Let the healing begin!

  • I may not agree with everything Morbs just said, but he expresses himself in a way that is dignified, respectful, polite and non-threatening!
  • 06-27-2013 4:02 PM In reply to

    Re: Anonymization

    Ronald:
    Application-level means that the database engine is made aware of what is going on and is actively involved in the process. Database mirroring, log-shipping, AlwaysOn and other such technologies are application-level replication, as opposed to storage-level which is basically just ghosting volumes without concern for how they are used.
    I know what application-level means. That should have been obvious from my question, which you did not answer: how is application-level replication going to help when someone runs a script against the wrong database? The undesired updates are going to replicate out to the slaves.
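The point about undesired updates replicating out can be shown with a toy model: a replica that faithfully replays the primary's log ends up with the same damage. The statement format and the `apply`/`replicate` helpers are hypothetical and stand in for no real replication product; the `First123456` placeholder is from the original post.

```python
from typing import Dict, List, Optional, Tuple

Statement = Tuple[str, Optional[str], str]  # (operation, key or None, value)


def apply(db: Dict[str, str], statement: Statement) -> None:
    """Apply a single logged statement to a key/value 'database'."""
    op, key, value = statement
    if op == "set" and key is not None:
        db[key] = value
    elif op == "set_all":  # an update with no WHERE clause
        for existing_key in db:
            db[existing_key] = value


def replicate(log: List[Statement], replica: Dict[str, str]) -> None:
    """Replay the primary's log on a replica, mistakes included."""
    for statement in log:
        apply(replica, statement)


primary = {"alice": "Alice Smith", "bob": "Bob Jones"}
replica = dict(primary)

log: List[Statement] = [("set_all", None, "First123456")]
for statement in log:
    apply(primary, statement)
replicate(log, replica)

print(replica)  # the replica now matches the damaged primary
```

Replication protects against losing a server, not against a valid but unwanted statement, which is the gap the question above is pointing at.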

    Let the healing begin!

  • I may not agree with everything Morbs just said, but he expresses himself in a way that is dignified, respectful, polite and non-threatening!
  • 06-27-2013 4:13 PM In reply to

    • Ronald

    Re: Anonymization

    morbiuswilters:
    Ronald:
    Physical I/O latency is terrible.
    Unless you have, like, a battery-backed write cache. You know, a standard feature on every fucking server that costs more than $200. That will give you high IOPS. You can also use SSDs, which will also give high IOPS, although they still tend to be a bit flaky in my experience.
    You confuse database buffers and storage subsystem buffers, which is like confusing server side and client side for a web developer. Talking about battery-backed cache and flaky SSDs is also a clear indicator that you have no experience with serious enterprise infrastructure, so whatever advice you can offer to an organization having to deal with high-volume transactions on 70TB of live data is at best worthless.
  • 06-27-2013 4:25 PM In reply to

    Re: Anonymization

    Ronald:
    You confuse database buffers and storage subsystem buffers...
    No, I didn't. Seriously, are you fucking illiterate?

    Ronald:
    Talking about battery-backed cache and flaky SSDs is also a clear indicator that you have no experience with serious enterprise infrastructure, so whatever advice you can offer to an organization having to deal with high-volume transactions on 70TB of live data is at best worthless.
    You're the idiot who doesn't even know how a fucking database works. And I'm sure I know more about enterprise infrastructure than you do, you fucking twat.

    I also like how you've ignored every single question I asked. Seriously, what fucking database are you talking about that doesn't sync the transaction log on every commit by default? How is application-level replication going to help a bad update? It's like you don't know the first fucking thing about databases and are just taking the piss.

    Let the healing begin!

  • I may not agree with everything Morbs just said, but he expresses himself in a way that is dignified, respectful, polite and non-threatening!
  • 06-27-2013 5:46 PM In reply to

    • Ronald

    Re: Anonymization

    As far as databases and storage are concerned, you fall in that category of people who know enough to cause problems but not enough to help. If you don't know it by now then in all likelihood it's a pattern that spreads in other areas of your expertise.

    No wonder you find everyone stupid, you clearly have extremely low metacognition skills and you have no clue when you're out of your depth.
  • 06-27-2013 6:04 PM In reply to

    Re: Anonymization

    Ronald:
    As far as databases and storage are concerned, you fall in that category of people who know enough to cause problems but not enough to help. If you don't know it by now then in all likelihood it's a pattern that spreads in other areas of your expertise.

    No wonder you find everyone stupid, you clearly have extremely low metacognition skills and you have no clue when you're out of your depth.
    I take your inability to answer a single question I posed as proof that you don't know what the fuck you're talking about.

    Seriously, what the fuck do you do for a living? I'm imagining a sad office tech who keeps getting called over to reboot the boss' SIP phone or clear out the malware on the demo laptops picked up from porn sites after Sales is done with the yearly trade show. You don't seem to know anything about storage or databases, at least.

    Let the healing begin!

  • I may not agree with everything Morbs just said, but he expresses himself in a way that is dignified, respectful, polite and non-threatening!