The Daily WTF: Curious Perversions in Information Technology
Welcome to TDWTF Forums Sign in | Join | Help
in Search

We need duplicate tables all over the database

Last post 11-07-2012 5:48 PM by zipfruder. 29 replies.
Page 1 of 1 (30 items)
Sort Posts: Previous Next
  • 11-05-2012 9:59 PM

    • atipico
    • Top 500 Contributor
    • Joined on 12-30-2010
    • United Republic of Bananas
    • Posts 126

    We need duplicate tables all over the database

    Locked Reply Contact
    Disclameir: I don't understand much about databases so TRWTF might be my lack of knowledge.

    So there's a need for a sign up system for applicants and despite we're the information security team we're going to develop this (possibly another WTF, I guess, but I'm new here so what do I know). So according to my boss there's going to be a "user" table, and then an "applicant" table. The data from the "user" table (such as name, address, ID number, etc etc etc) is going to be copied* into the "applicant" table. Not referenced, but rather copied.

    *(there will be more data on the "applicant" table along with the copied data)

    The reason for this? The data will be used for rather serious and official stuff, and my boss wants to avoid situations like, let's say, hipothetically:

    [hypothetical]
    - applicant signs up,
    - official application papers are sent to the address provided,
    - applicant logs in and changes address,
    - applicant sues the institution because the official application papers were sent to the "wrong" address,
    - upon further inspection, the applicant address present on the database is different from the address the papers were sent to,
    - chaos ensues.
    [/hypothetical]

    (Other possible cases are name changes after marriages, etc).

    The solution to this? User fills a form when signing up, double-check his personal data, clicks "OK" and then cannot change it. Personal data gets saved to "user" table. This table will be used only for login purposes. Then the registered user can pick an open position (or more) and apply for it. For each new application, user has to re-enter his personal data -- name, address, all the same stuff again (I'm guessing it will be auto-populated from the user table) -- then double-check it again, clicks OK, and then, guess what, cannot change this set of data too. If the applicant ever needs to make changes or corrections, then he's got to manually fill a form and phisically deliver it, in person, with attached copies of personal ID's or some other bureacracy like that.

    Althouhg I can understand the bureacratic need for following the process, and my boss's needs for avoiding future f#$%-ups, is that how things are properly done? As I said, I don't know much about databases, but isn't the proposed approach just creating lots of duplicate, dereferenced, potentially inconsistent data? Is this the only way, or the best way, for dealing with this?
  • 11-05-2012 10:43 PM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact
    Lots of different options for tracking changes in address. I would probably do something along the lines of: User table: UserID, Name, etc. Application Table: UserID, ActiveFlag, positionAppliedFor, dateApplied Address Table: UserID, ActiveFlag, Added_Date, SeqNum, Type (Home, Business, etc), Street, etc Phone Number Table: UserID, ActiveFlag, Added_Date, Type (Cell, Home, etc) Make each UserID only have one "Active" application, address, phone number, etc. I've also seen "ActiveFrom" and "ActiveTo" fields used... change ActiveTo field to yesterdays date, add new record that's active today through 4001-01-01: AddressTable: UserID, ActiveFromDate, ActiveToDate, type, Street, etc
  • 11-05-2012 10:44 PM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

     There are places where data normalization needs to be relaxed. A common one is the 'invoicing application' that is taught in school. You know, items table, customers table, linked together through 'invoices' and 'line items' tables. To view an invoice you pull customers info from the customers table, and descriptions and prices from the th items table, right? Wrong! prices change, but invoices don't, so you have to copy the current price into each 'line items' entry. In some situations, the need to be able to produce a perfect copy of that invoice later means that you may need to copy alot of data into the invoices tables. 

    Perhaps a better way would be to maintain historical records: [customer.current_address]->[addresses.id], old addresses are retained, never altered (edits blocked by permissions or, maybe even constraints?), indexed on addresses.user_id, and letters are linked to both customers and addresses. That strikes me as cleaner, but I, like you, do not work in the database field.

     

    Extracted from EM radiation remaining from the end of previous universe -
    //Quantum Bogosort.cpp - Implement Quantum bogosort Algorythym
    // (c) Jamie Bean
    // Dedicated to my brillant mum, Paula
  • 11-05-2012 10:53 PM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

    atipico:
    Disclameir: I don't understand much about databases so TRWTF might be my lack of knowledge.

    There's the first clue: you're smart enought to assume you might be the problem. The realy jucy WTFs tend to come from people who don't ponder such things.

    If you need to keep historical records of address changes, why not do so in a history table? User ID links into the user's history of address changes. Every time the user changes address (or whatever other items need to be tracked) the "old" info is entered in the history table as a new entry. Much cleaner, doesn't involve massive duplication, etc. Ideally the date/time of change would be included in the history table, so in the hypothetical situation, there's a record allowing the institution to show they sent the papers to whatever address was current.

    I'm not a DB guy either, but I did stay at a Holiday Inn Express last night!

    (edit: boy I type slow in the evening...)

  • 11-06-2012 12:48 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

    Yes, the proper way is to have "versioned" tables. Addresses (for example) are never edited in place, but a new version record is created and referenced. You can store the date when each version became active, giving you a historical record of what address was current at such-and-such time, or you can have things like invoices, packing slips, audit tables etc reference a specific (then-current, but possibly not now-current) version of the address.

    You maintain referential integrity, and also audit-ability, and avoid redundant storage of data (a normalization no-no, one of the worst data sins is being able to ask the same question 2 ways and get 2 conflicting answers).

    Joins can get to be a pain in this model, but views can help with that.

    Signatures are stupid.
  • 11-06-2012 3:49 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact
    atipico:

    The solution to this? User fills a form when signing up, double-check his personal data, clicks "OK" and then cannot change it. Personal data gets saved to "user" table. This table will be used only for login purposes. Then the registered user can pick an open position (or more) and apply for it. For each new application, user has to re-enter his personal data -- name, address, all the same stuff again (I'm guessing it will be auto-populated from the user table) -- then double-check it again, clicks OK, and then, guess what, cannot change this set of data too.

    You don't explain exactly what happens here: an insert of the new records? an update with last_update date?

    I've found my self in this situation several times and the "best" solution have been to keep a historic table doing something like: insert into historic_user, update user set active = 0, insert into user new_user, delete from user old_user

    Why is it best? Basically because this historic table will be used only at very specific times and you don't want all that unused data in a table that's going to be hit very hard every day. And no, it's not a WTF since the requirement is very specific, you've got the tools to do it properly and it's a not uncommon case.

  • 11-06-2012 4:53 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

    The fact that your boss is thinking of the table that records the applications as an "applicant" table rather than an "applications" table says to me that WTF is on its way.

    Given your need for accountability and auditability, it seems to me that making a hard and fast rule that DB rows shall only ever get added, never updated or deleted, should push the design in the correct direction. That, along with consistent use of artificial primary keys generated strictly sequentially, should let you build something nicely normalized where it's still easy to query for the latest available data.

  • 11-06-2012 5:00 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact
    ubersoldat:
    Basically because this historic table will be used only at very specific times and you don't want all that unused data in a table that's going to be hit very hard every day.

    Is leaving unused historical data in a heavily used table really that much of an issue? I would have thought any halfway decent DBMS would have caching that works well enough to deal with 99% of it. I would also have thought that applicants are not actually very likely to change their contact information frequently enough to bloat the tables all that much. The separate historical table looks like premature optiimization to me.

  • 11-06-2012 7:23 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

     CREATE TRIGGER ON Users FOR UPDATE, DELETE AS {copy record to historical table here}

  • 11-06-2012 7:29 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

    flabdablet:
    I would have thought any halfway decent DBMS would have caching that works well enough to deal with 99% of it. I would also have thought that applicants are not actually very likely to change their contact information frequently enough to bloat the tables all that much. The separate historical table looks like premature optiimization to me.
    Agreed on all fronts. Also, as long as the table had a version number field that was added to the PK, proper indexing could easily take care of any performance issues.

     

  • 11-06-2012 7:33 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

    Sir Twist:
    CREATE TRIGGER ON Users FOR UPDATE, DELETE AS {copy record to historical table here}
    In general, triggers are fucking retarded. They solve the wrong problem while also introducing slowdowns. A properly designed system would eliminate the need for triggers.

     

  • 11-06-2012 8:09 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

    flabdablet:
    Is leaving unused historical data in a heavily used table really that much of an issue? I would have thought any halfway decent DBMS would have caching that works well enough to deal with 99% of it.
     

    Yes, they optimize, no, it's not enough. They'll still spread your data all over the disk, slowing some kinds of access. If those changes are enough of an issue, the correct thing to do is to create another table.

    flabdablet:
    I would also have thought that applicants are not actually very likely to change their contact information frequently enough to bloat the tables all that much. The separate historical table looks like premature optiimization to me.

    Now, I have the same impression here. I'd go with a history table and a view, and change to a current table with a trigger only if needed.

     

  • 11-06-2012 8:11 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

    lolwhat:
    In general, triggers are fucking retarded. They solve the wrong problem while also introducing slowdowns. A properly designed system would eliminate the need for triggers.
     

    In general the Universe is composed of dark energy and dark matter. Also, in general any random person is already dead.

  • 11-06-2012 8:22 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact
    atipico:
    So there's a need for a sign up system for applicants and despite we're the information security team we're going to develop this (possibly another WTF, I guess, but I'm new here so what do I know).
    Since it's a project which deals with manipulating all kinds of highly sensitive data, your team should be heavily involved in the design and review all code anyway. Getting you to write the code as well doesn't seem too bad, as long as the team's big enough that people won't be reviewing their own code.
    You'll probably find that the forum experience is improved by going to the "Site Options" tab of "Edit Profile" and turning off "Display User Signatures".
  • 11-06-2012 8:43 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact
    lolwhat:

    Sir Twist:
    CREATE TRIGGER ON Users FOR UPDATE, DELETE AS {copy record to historical table here}
    In general, triggers are fucking retarded. They solve the wrong problem while also introducing slowdowns. A properly designed system would eliminate the need for triggers.

     

    [Citation needed]
  • 11-06-2012 9:15 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact
    lolwhat:

    Also, as long as the table had a version number field that was added to the PK, proper indexing could easily take care of any performance issues.

    Given that user details won't be updated on anything like a regular schedule, surely a per-user version number field would serve only to allow that user's rows to be retrieved in historical order; the actual numbers wouldn't mean anything. What could an explicit version number achieve that a simple sequential integer PK could not?
  • 11-06-2012 9:20 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

    flabdablet:
    What could an explicit version number achieve that a simple sequential integer PK could not?
    Use the creation date of the new record.

     

  • 11-06-2012 9:47 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact
    lolwhat:

    flabdablet:
    What could an explicit version number achieve that a simple sequential integer PK could not?
    Use the creation date of the new record.

    As part of the PK? Why?
  • 11-06-2012 9:55 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

    flabdablet:
    As part of the PK? Why?
    Why does one use a PK?

  • 11-06-2012 10:01 AM In reply to

    • Renan
    • Top 75 Contributor
    • Joined on 08-10-2007
    • Fortaleza, Brazil
    • Posts 731

    Re: We need duplicate tables all over the database

    Locked Reply Contact
    Say, atipico, you don't happen to work for a company which has "talent" on its name, do you?
    snoofle

    That hideousness is what keeps you and I [sic] employed!
  • 11-06-2012 10:06 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact
    lolwhat:

    Why does one use a PK?

    Not sure if this is a trick question, so I'll bite. As I understand it, a PK exists to allow for unambigious identification of a given table row, so as to make it possible to join it reliably to related rows in other tables. It's also usually indexed, allowing fast lookup of rows by their PK. If the PK is an integer, and known to have been created by a sequence generator, it also lets you determine the order in which rows were added to the table, so you can do stuff like looking up the last-added row for any given user ID. So I'm still completely unclear on why you seem to be suggesting it would be a good idea to make a composite PK including an explicit per-user version number and/or timestamp, as opposed to a single sequential integer artificial PK. What am I missing here?
  • 11-06-2012 10:24 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

    I'm sure this will infuriate someone, but I'd go so far as to say a timestamp or datetime value is almost always a poor choice of PK. (Date without time might make sense in some uncommon circumstances.)

    Time in general is a pain-in-the-ass to work with properly, because of things like timezones, daylight savings, precision (seconds? milliseconds? microseconds? ticks?), etc. You don't want to have to think about those things every time you're doing a join.

    More importantly, you're just asking to run into concurrency problems and race conditions.

    Signatures are stupid.
  • 11-06-2012 10:54 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact
    joe.edwards:
    Time in general is a pain-in-the-ass to work with properly, because of things like timezones, daylight savings, precision (seconds? milliseconds? microseconds? ticks?), etc. You don't want to have to think about those things every time you're doing a join.

    Filed under: Would 5PM CST join successfully to 6PM EST?

    If it's a primary key then you either configure the database to local time or GMT and stick with that. Do timezone conversions in the presentation layer. Let the database use whatever internal format it likes at whatever precision it likes. Just don't change it after going live with production data. I've seen one example where a global company had a data centre in Chicago but no actual operations in Chicago or that entire time zone. The database used Chicago time and that was what you had to use for joins.

    The strange part was the backup data centre is in San Diego. I never worked out if it used local time or Chicago time. I expect that the people who looked after it had actually tested it.

    To answer the original question, I'd like to ask a few more questions: How big is this? (How many thousands of applications?) and How long do you expect it to be active? (Just until the next election?) That would help you decide to do the detailed history-logging or keep it simple with duplicated data. I've seen these sorts of problems for real (financial papers sent to the ex-wife's house) so it is certainly a real problem to be addressed. The "invoices" answer is probably the right one. Keep a copy of what was sent out, even if it's a paper copy in a physical file.

    10$ more and ill install windows 7 premiom on it
  • 11-06-2012 11:22 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

    lolwhat:
    A properly designed system would eliminate the need for triggers.
     

    Surely a properly-designed system wouldn't experience slowdown when using triggers.

    They're the most effective method I know of to add change auditing to a RDBMS.

  • 11-06-2012 11:23 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact
    Qwerty:
    Chicago time

    I've noticed operating systems ask me for the nearest major city instead of my timezone; I've always wondered why this is the case. I definitely have to think a little harder to figure out which city is geometrically closest to me (I live in a major city but it's rarely on their list), than I would to just pick my timezone. What's up with that?</tangent>

    Signatures are stupid.
  • 11-06-2012 11:32 AM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact
    joe.edwards:
    I've noticed operating systems ask me for the nearest major city instead of my timezone; I've always wondered why this is the case. I definitely have to think a little harder to figure out which city is geometrically closest to me (I live in a major city but it's rarely on their list), than I would to just pick my timezone. What's up with that?</tangent>

    Because "timezones" are vertical stripes on the map but different cities (usually states, but it doesn't always follow state boundaries) have different daylight-savings rules. Australian eastern time is split into 3 zones by the daylight-savings rules and there's at least 2 other countries with the same timezone - not even in the same hemisphere.

    And most users won't know what timezone they're in but they do know what city they're near.

    10$ more and ill install windows 7 premiom on it
  • 11-06-2012 2:12 PM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

    atipico:
    Althouhg I can understand the bureacratic need for following the process, and my boss's needs for avoiding future f#$%-ups, is that how things are properly done?
     

    No. Fundamentally it sounds like you want to track changes to data so that although the most recent record is displayed, historical changes to that record can also be displayed for auditing purposes.

    atipico:
    isn't the proposed approach just creating lots of duplicate, dereferenced, potentially inconsistent data? Is this the only way, or the best way, for dealing with this?

    It sounds like your boss has arrived at a solution without properly explaining (or defining) the problem. TRWTF is that he should really explain what the issue is and why it's an issue then leave it to the architects to propose a solution or two, explain the benefits and consequences of selecting one over the other than allow him to decide which should be implemented.

    Most databases have some feature to record changes to data - as someone mentioned, using a trigger is one method. Flicking on FLASHBACK in Oracle allows transaction auditing is another, but it builds up large (internal) tables of all transactions so could be a performance hit. One question should really be: how often is historical data queried, and why? If it's rarely then chucking the old record into a history table along with identifiable information (user, IP, timestamp, etc) should be sufficient to report against changes and show evidence that someone's taking the piss.

    But it does sound like the current method is going to cause everyone to do things in triplicate in the rare case that one piece of paper gets manually amended.

  • 11-06-2012 2:31 PM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

     

    lolwhat:
    flabdablet:
    As part of the PK? Why?
    Why does one use a PK?

    To uniquely identify the row.. but that doesn't explain the purpose of using the data types you propose.

    I'd settle with a sequential number PK and another column showing a timestamp when it changed.

  • 11-06-2012 4:20 PM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact
    Sir Twist:

     CREATE TRIGGER ON Users FOR UPDATE, DELETE AS {copy record to historical table here}

    Sir, I'm gonna have to ask you to step away from the keyboard...Your admin rights are revoked, effective immediately.

    “The whole modern world has divided itself into Conservatives and Progressives. The business of Progressives is to go on making mistakes. The business of Conservatives is to prevent mistakes from being corrected. Even when the revolutionist might himself repent of his revolution, the traditionalist is already defending it as part of his tradition. Thus we have two great types -- the advanced person who rushes us into ruin, and the retrospective person who admires the ruins. He admires them especially by moonlight, not to say moonshine. Each new blunder of the progressive or prig becomes instantly a legend of immemorial antiquity for the snob. This is called the balance, or mutual check, in our Constitution.”

    -- G. K. Chesterton
  • 11-07-2012 5:48 PM In reply to

    Re: We need duplicate tables all over the database

    Locked Reply Contact

     If the official applications were sent by registered mail it would solve the problem of tracking when the applications were mailed. More cost, but less hassle down the road.

    A sane policy applicatants would need to agree to would also reduce future hassle...

     

Page 1 of 1 (30 items)
Powered by Community Server (Non-Commercial Edition), by Telligent Systems