Important Data



  • Say you have crucial business data that absolutely has to be safe.

    So you do the right thing, and invest in a RAID6 system with double parity.

    Which works great. So great that you don't even notice when the first hard drive fails.

    And not even when the second hard drive fails.

    Unfortunately you do notice when the third hard disk fails.



  • Which reminds me.

    #zpool status


    looks good. Probably time to run a scrub. I should also set up a cron job to notify me of hard drive failures.
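
    A minimal sketch of the kind of check such a cron job could run. It assumes the `zpool` binary is on the PATH and relies on `zpool status -x` printing "all pools are healthy" when nothing is wrong. The function names are made up for illustration, and mail delivery is left to cron, which mails any output of a job when a recipient is configured.

```python
import subprocess

# Message printed by `zpool status -x` when no pool has a problem.
HEALTHY_MESSAGE = "all pools are healthy"


def needs_attention(status_output: str) -> bool:
    """Return True unless the output says that every pool is healthy."""
    return status_output.strip() != HEALTHY_MESSAGE


def check_pools() -> int:
    """Run `zpool status -x`; print the report and return 1 if anything looks wrong."""
    result = subprocess.run(
        ["zpool", "status", "-x"], capture_output=True, text=True
    )
    output = result.stdout + result.stderr
    if result.returncode != 0 or needs_attention(output):
        # Anything printed here ends up in the mail that cron sends.
        print(output.strip())
        return 1
    return 0
```

    A small wrapper script would call `check_pools()` and exit with its return value. It stays silent while the pool is healthy, so a mail only shows up when there is something to read.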


  • @Skywolf said:

    Say you have crucial business data that absolutely has to be safe.

    So you do the right thing, and

    BACK THE FUCKER UP EVERY DAY.

    Repeat after me: Even if you use RAID, you still need backup.

    Sysadmins who behave as if they don't understand this are TRWTF.



  • Also, WTF kind of half-assed RAID doesn't send notifications and/or sound a loud screechy alarm in the server closet on disk failure?



  • If they didn't notice that 2 HDDs had failed in their RAID, would you be surprised if I told you that they also didn't notice that their backup had fallen over 2 months ago?

    It's called WTF for a reason ;)
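
    A backup that fell over two months ago is just as easy to catch as a dead disk, provided something checks the age of the last good run. A rough sketch, assuming the time of the last successful backup is recorded somewhere; the function name and the two-day threshold are illustrative, not taken from any particular backup tool.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: daily backups, with one day of slack.
DEFAULT_MAX_AGE = timedelta(days=2)


def backup_is_stale(
    last_success: datetime,
    now: datetime,
    max_age: timedelta = DEFAULT_MAX_AGE,
) -> bool:
    """Return True if the last successful backup is older than max_age."""
    return now - last_success > max_age
```

    Run from the same cron job as the disk check, this turns "nobody noticed for two months" into a complaint on day three.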



  • I've made a post in the new forums pointing at this post. Hope you don't mind.

    http://what.thedailywtf.com/t/old-forum-is-still-alive-important-data/3163



  • yes yes, we are still alive here.

    Some of us don't want yet another login just to post to this site.
    Some of us don't want to drink this discord koolaid.
    Some of us know better; then again, some of us are so set in our ways that even if it were better we would still be here.
    Not sure which camp I am in.



  • Oh, in addition, reading codinghorror's posts just lets me know that Discord is really not someplace I want to be.
    I.e. he will often just call someone a "pussy" for not seeing things his way.
    People managing THAT forum seem to think that editing people's posts without notification or a footnote is perfectly all right. The clue bat says it isn't: posts are treated as quotes, and you have just changed the poster's words to your own version while keeping them attributed to the original poster. So maybe I can't trust my prior point any more.


  • Discourse touched me in a no-no place

    @KattMan said:

    Oh, in addition, reading codinghorror's posts just lets me know that Discord is really not someplace I want to be.
    I.e. he will often just call someone a "pussy" for not seeing things his way.
    People managing THAT forum seem to think that editing people's posts without notification or a footnote is perfectly all right.

    Our DC forum is somewhat differently moderated than meta.d - has been for a while...



  • @KattMan said:

    yes yes, we are still alive here.

    And to judge by the fact that today's Error'd uses /Comments/ rather than what.thedailywtf I wonder whether Alex et al. are losing the will to push it.



  • @pjt33 said:

    @KattMan said:
    yes yes, we are still alive here.

    And to judge by the fact that today's Error'd uses /Comments/ rather than what.thedailywtf I wonder whether Alex et al. are losing the will to push it.

    One can only hope.



  • Actually... in a system that can stand the failure of two disks but not of a third disk, there is an argument for shutting down everything when the second disk fails. That way, if you didn't notice the first one, you'll be sure to notice the second one, but you can still recover your data. Of course, you mustn't actually stop the disks, because the problem is probably old disks that have been spinning for years, and they will never start again...
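
    The shutdown-on-second-failure idea above can be sketched as a small state function. This is only an illustration of the policy under discussion, not how any real RAID controller behaves; the state names and function names are invented here, and `parity_disks=2` stands for a double-parity array such as RAID6.

```python
from enum import Enum


class ArrayState(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded, redundancy remaining"
    NO_REDUNDANCY = "data intact, no redundancy left"
    FAILED = "data lost"


def array_state(failed_disks: int, parity_disks: int = 2) -> ArrayState:
    """Classify an array that tolerates up to parity_disks failed disks."""
    if failed_disks < 0 or parity_disks < 0:
        raise ValueError("disk counts must not be negative")
    if failed_disks == 0:
        return ArrayState.HEALTHY
    if failed_disks < parity_disks:
        return ArrayState.DEGRADED
    if failed_disks == parity_disks:
        return ArrayState.NO_REDUNDANCY
    return ArrayState.FAILED


def should_go_offline(failed_disks: int, parity_disks: int = 2) -> bool:
    """Proposed policy: stop serving once the next failure would lose data."""
    return array_state(failed_disks, parity_disks) is ArrayState.NO_REDUNDANCY
```

    With double parity, `should_go_offline(1)` is false and `should_go_offline(2)` is true, so the array trades the last stretch of availability for a guaranteed chance to copy the data off.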



  • Next time you should build the same thing but use Triple Parity.



  • @Lawrence said:

    Actually... in a system that can stand the failure of two disks but not of a third disk, there is an argument for shutting down everything when the second disk fails. That way, if you didn't notice the first one, you'll be sure to notice the second one, but you can still recover your data. Of course, you mustn't actually stop the disks, because the problem is probably old disks that have been spinning for years, and they will never start again...

    But... the purpose of redundancy isn't data preservation, it's availability. Why bother investing in a system that is double redundant if that system voluntarily takes itself offline after the failure that your redundancy was supposed to be able to handle?


  • Discourse touched me in a no-no place

    @Jaime said:

    Why bother investing in a system that is double redundant if that system voluntarily takes itself offline after the failure that your redundancy was supposed to be able to handle?

    Because, in this example, the user is stupid enough to ignore the first failure. A second failure is indicative of a third to follow soon after, so self-preservation against stupid users?



  • Redundancy is designed to preserve the availability of the system. Taking the system offline after the second failure makes the system unavailable sooner than waiting for the third failure.

    If you're suggesting that taking the system offline will preserve the data, then the wrong system is being used for data preservation. If you need to be able to recover all of the data on a drive up to the point of failure, don't invest in RAID, use something like Live Vault that does real time continuous backups.



  • @Jaime said:

    real time continuous backups

    are undoubtedly a Good Thing, but any careful admin would set up traditional "do them regularly and park them offline and offsite" backups first and then leave that infrastructure in place even after the CDP stuff is all up and running.

    It's disturbingly easy for Continuous Data Protection to turn into Corrupted Data Propagation.

    Also, offsite CDP ought to be an addition to RAID, not a substitute for it. Properly monitored RAID is still the best way to achieve high local data availability. It just isn't a backup system and should never be thought of as one.

