The Daily WTF: Curious Perversions in Information Technology
Welcome to TDWTF Forums Sign in | Join | Help
in Search

Speed Challenge 38 - What're your rights?

Last post 11-06-2007 10:54 PM by GeneWitch. 28 replies.
Page 1 of 1 (29 items)
Sort Posts: Previous Next
  • 10-02-2007 11:57 AM

    • flop
    • Top 500 Contributor
    • Joined on 04-27-2007
    • Posts 57

    Speed Challenge 38 - What're your rights?

    A real world problem - where something can be learned, I hope!

    You've just been assigned a major design task - storing rights to objects in your program.
    Imagine that there are some kinds of objects, with various actions associated to them; you can fetch the rights from your database, but have to store them in a efficient (space- and time-wise!) manner in your application.

    Example: Some person shall be administrating some hundred PCs in your organization; that consists of distributing software, changing meta-data, creating and deleting PCs in the database, and so on.
    Associating software consists of establishing a relationship between a PC and a software product; so it's a kind of m:n relationship, with some additional meta-data. Now having the "create" right on software products A, B and C for the machines X, Y, and Z should not automatically mean that machine W (which may have products D or E) may also get A, B or C.

    What's the best (easiest/fastest/most efficient) way to store such rights in your application? Hint: The simple way of using a multi-level-hash associating Machine->Product->Right is not efficient, if you assume that you've got several hundred PCs and several hundred products - then you'd get array of many thousand members, which have to be created and managed!

  • 10-02-2007 12:31 PM In reply to

    • flop
    • Top 500 Contributor
    • Joined on 04-27-2007
    • Posts 57

    Re: Speed Challenge 38 - What're your rights?

    To clarify that a bit -

    You've got the rights a and b (change name) on the objects A, B, C, D and E; rights c and d (eg. change IP address) on X, Y and Z; and right e (eg. create) on the relationships between (A,B,C) and (X,Y) and right f (eg. delete) on the relationships between (D,E) and (Y,Z).

    So the simple way to store that in RAM would be something like (in eg. PHP)
    $rights = Array(
        a => Array( A, B, C, D, E ),
        b => Array( A, B, C, D, E ),
        c => Array( X, Y, Z ),
        d => Array( X, Y, Z ),
        e => Array( 
          A => Array( X, Y),
          B => Array( X, Y),
          C => Array( X, Y),
          ),
        f => Array( 
          D => Array( Y, Z ),
          E => Array( Y, Z ),
          ),
        )
    );

    But that's a major array, that has to be created.
    How to do better?

  • 10-02-2007 3:13 PM In reply to

    Re: Speed Challenge 38 - What're your rights?

    I am puzzled as to why you think a multi-level hash would be inefficient. I could design something a little bit faster, or a little bit tighter on memory, but I don't think it would be as much as a factor of 2.
  • 10-03-2007 7:56 AM In reply to

    • flop
    • Top 500 Contributor
    • Joined on 04-27-2007
    • Posts 57

    Re: Speed Challenge 38 - What're your rights?

    Well, with eg. 100 PCs and 100 software products you'd get 100*100 = 10e4 array entries ... and that doesn't really scale up.

    I'm thinking about something on the order of O(n+m) for storage, although not for time ...

  • 10-03-2007 8:31 AM In reply to

    • JvdL
    • Top 200 Contributor
    • Joined on 01-26-2007
    • Spain
    • Posts 177

    Re: Speed Challenge 38 - What're your rights?

    flop:
    Well, with eg. 100 PCs and 100 software products you'd get 100*100 = 10e4 array entries ... and that doesn't really scale up.

    I'm thinking about something on the order of O(n+m) for storage, although not for time ...

    If you have an (almost) full matrix of relationships, the storage is bound to be O(n*m) no matter how you represent the data. If the matrix is sparse (each PC has only handful of software products, which is more likely), it will be O(n*h), where h<<m is the average number of products per PC. If you are know beforehand it will be almost full (h~m), you probably better store the reverse relationship: only the few PC/product combinations that do not exist.

    In any case, 10,000 data entries is not something memory can't handle. In the sparse situation, you should be able to administer tens of thousands of PC's without problem.

    BTW, I can't imagine a situation where you want to loose this kind of information when you crash, so persistence is likely to be a requirement. The obvious solution is storing the info in a DB, keyed on PC and product. Then you can easily administer billions of PCs and products without memory problems.

    flop:
    What's the best way to store such rights in your application?

    The answer to that question depends entirely what you want to do in the application. There is no general answer. My advice (besides "do your own homework") is: "you don't have to scale yet". Choose the solution that's easiest for you and not obviously WTF, like the double hash method. It will survive long enough and degrade only slowly when scaling up, giving you ample time to look for a better solution if and when it is needed.
     

     

  • 10-03-2007 8:45 AM In reply to

    • JvdL
    • Top 200 Contributor
    • Joined on 01-26-2007
    • Spain
    • Posts 177

    Re: Speed Challenge 38 - What're your rights?

    [edit time expired - new post]

    I overlooked that you mentioned that the data is already in a DB anyway. I really don't see what benefit you seek by replicating the DB in memory

    Any DB is equally fast as your double hash - many are faster (because they use query statistics - for example, they will dynamically decide whether to retrieve PC first than product or vice versa). DB's are smart enough to cache in-memory oft-requested information to avoid hitting disks, and smart enough to balance the memory/disk load without hurting themselves or other processes.

    You can't outperform hundreds of thousands of man-years of research and implementation in a speed challenge.

  • 10-03-2007 11:13 AM In reply to

    • flop
    • Top 500 Contributor
    • Joined on 04-27-2007
    • Posts 57

    Re: Speed Challenge 38 - What're your rights?

    Well, sorry - that's what I feared ... having to put a lot of details into a question, so that it's understandable, but no longer works as a challenge. I just thought that's a neat trick, that some would like to think of (or at least hear) ...

    BTW, the problem is solved ... that's why I wrote that "something can be learned".

    Yes, 10 000 data entries is not so much ... but if you've got a bit more, say 300x300, you're at 1e5 ... and that goes up rather than down.
    I've already done my homework, and we'd to scale ...

    Of course, the data is in a database, so there's always a way to query the information from it ... but that's the problem!
    If you want ask the database once and cache that information in the application, to avoid having to ask for each of the some hundred rows you're just about to print (because even at a small latency like 2ms, asking the DB a few hundred times kills your answer time), you need to store it in some way locally.
    You might be able to get that information via some other query that gives your result set ... but that depends on other design questions.

    In case someone's interested, I post my answer below.

    Warning - spoiler ahead.

    The idea was to use a hash that stores a reference to itself as values; so to allow relationships from (A, B, C) to (X, Y) you'd use some code like
    $members=Array( A, B, C, X, Y );

    $list = Array();
    foreach($members as $member)
    {
      $list[$member] &= $list;
    }

    $rights = Array(
        a => Array(
          $list,
    # Other groups for right 'a'
          ),
        b => Array(
    # and so on ...
          ),
        );

    function IsAllowed($right, $from, $to)
    {
      global $rights;

      return (isset($rights[$right][$from]) &&
          isset($rights[$right][$from][$to]));
    }

    That's only O(n+m) in memory, and very nice and quick to query ...

    So ... somebody else?

  • 10-03-2007 11:57 AM In reply to

    • flop
    • Top 500 Contributor
    • Joined on 04-27-2007
    • Posts 57

    Re: Speed Challenge 38 - What're your rights?

    BTW, that should've been
    function IsAllowed($right, $from, $to)
    {
      global $rights;

      foreach($rights[$right] as $index => $hash)
      {
        if (isset($hash[$from]) &&
            isset($hash[$from][$to]))
          return 1;
      }

      return 0;
    }

    Sorry - seems that this was too specific to be any use.
  • 10-03-2007 12:54 PM In reply to

    • JvdL
    • Top 200 Contributor
    • Joined on 01-26-2007
    • Spain
    • Posts 177

    Re: Speed Challenge 38 - What're your rights?

    flop:
    If you want ask the database once and cache that information in the application, to avoid having to ask for each of the some hundred rows you're just about to print [...] You might be able to get that information via some other query that gives your result set ... but that depends on other design questions.

    If want to print a few hundred rows out of a few thousand, there's bound to be some logic that identifies which rows you want. I'm hard pressed to believe that logic cannot be expressed in a single query, so this argument is void.
     

    flop:
    That's only O(n+m) in memory, and very nice and quick to query ...

    It's only O(n+m) if the relationship matrix is always reducible to diagonal block form like

    
      ABCDEF
    X 111000
    Y 111000
    Z 111000
    U 000111
    V 000111
    W 000111
    

    That's an unlikely assumption. In general, your solution's memory requirements are O(n*m). If on the other hand, your matrix is sparse and randomly populated, your solution is less memory-efficient than a double hash. In all case, your solution performs worse than the double hash: O(log(n)*m) instead of O(log(n)+log(m)).

  • 10-03-2007 5:40 PM In reply to

    Re: Speed Challenge 38 - What're your rights?

    Well, I'm not quite done with the code, but here's the structure I came up with. The effectiveness is quite dependent on how many different combinations of rights one has. It also needs to make clones of rights to map machine-product bridges.
     
    For Java, {} denotes a HashMap (with -> to separate keys from values) and () denotes a HashSet (a set where contains() executes in constant time).
     
    {
    	(A, B, C)->(a, b, e1, e2),
    	(D, E)->(a, b, f1, f2)
    }
    
    {
    	X->(c, d, e1),
    	Y->(c, d, e2, f1),
    	Z->(c, d, f2)
    }
    rpar PROTON all
  • 10-03-2007 5:43 PM In reply to

    Re: Speed Challenge 38 - What're your rights?

    I'm not sure how the structure in memory (for a language) could be fundamentally different from the normalised structure in DB, replacing FKs by references, keys and array indices.

    How to compress a rock?
     

    — Flurp.
  • 10-03-2007 6:57 PM In reply to

    • tster
    • Top 10 Contributor
    • Joined on 04-11-2006
    • Natick, MA
    • Posts 1,765

    Re: Speed Challenge 38 - What're your rights?

    You are assuming that the relationships are extremely dense and correlated so that basically what you can do is repetition compression.  When answering your question we have to assume that this is not the case and give you a sane solution.

     

    Solution:  use database relations.   Scales fine.

    If you have 1,000,000 computers (more computers than most organizations have) and each computer has 1,000 application permissions (more applications than almost any computer will have), your table of relationships with have 1 billion rows.  I don't see this as a problem.  I could put this database on my laptop.  If you wanted good performance you could easily fit the table into the memory of a server.


     

    The pig go. Go is to the fountain. The pig put foot. Grunt. Foot in what? ketchup. The dove fly. Fly is in sky. The dove drop something. The something on the pig. The pig disgusting... see bio for the earth shattering ending.
  • 10-03-2007 10:28 PM In reply to

    Re: Speed Challenge 38 - What're your rights?

    tster:

    You are assuming that the relationships are extremely dense and correlated so that basically what you can do is repetition compression.

    This also assumes that there are little or no updates, and an expensive precomputation phase can be performed after every update and before the next query.

    Given those assumptions, there are several ways to pack it; it's a variation on the compression problem, so there is no ideal algorithm.

    For practical purposes and plausible sizes, up to tens of thousands, perl's hashes are certainly good enough on modern hardware, and similar data structures in other languages are also likely to suffice. Memory is not so expensive that you can't throw a couple of gigabytes at the problem and then forget about it.

  • 10-04-2007 4:50 AM In reply to

    • JvdL
    • Top 200 Contributor
    • Joined on 01-26-2007
    • Spain
    • Posts 177

    Re: Speed Challenge 38 - What're your rights?

    Faxmachinen:
    Well, I'm not quite done with the code, but here's the structure I came up with. The effectiveness is quite dependent on how many different combinations of rights one has. It also needs to make clones of rights to map machine-product bridges.
     
    For Java, {} denotes a HashMap (with -> to separate keys from values) and () denotes a HashSet (a set where contains() executes in constant time).
     
    {
    (A, B, C)->(a, b, e1, e2),
    (D, E)->(a, b, f1, f2)
    }

    {
    X->(c, d, e1),
    Y->(c, d, e2, f1),
    Z->(c, d, f2)
    }

    Give up - that's not gonna work. How do you verify whether A has right e1 on X ? Only by iterating keys of the first hash map in linear sequence, testing whether A is a member of the key set (that's logarithmic BTW, not constant ), and if so, testing whether e1 is a member of the value set; then repeating this on the second hash map. In an unlucky constellation that's an O(n*m) algorithm - and I bet it's not hard to prove the unlucky inputs are dense in the set of all possible inputs.

  • 10-04-2007 5:50 AM In reply to

    • JvdL
    • Top 200 Contributor
    • Joined on 01-26-2007
    • Spain
    • Posts 177

    Re: Speed Challenge 38 - What're your rights?

    Off topic in response to Faxmachinen.

    In Java, using a HashSet (or any mutable collection) as a key in a HashMap not a good idea.

    Why? A hash map is able to look up a key value in near constant time as follows. When you apply map.get(key), it first computes key.hashCode(). Then it applies some bit magic to convert hash code to an index into the array of map entries. Then it will walk the list of entries with the same hash code and tests entry.key.equals(key) on each one until it finds the key.

    This only works if key.hashCode() executes in constant time. This is generally only true for immutable objects like strings. To get the hashCode() of a set, Java iterates all elements of the set to combine their hash codes. Than, it iterates it again to test equals(), multiple times. (allocating an iterator object every time)

    So the lookup time in a hash map keyed on sets is O(k) where k is the maximum possible size of the key sets (k=n in this particular problem). Not good.

     
     

  • 10-04-2007 7:18 AM In reply to

    Re: Speed Challenge 38 - What're your rights?

    Hello...

     It might sound like a stupid question...

    But is there a usecase that requiers you to show all rights at the same time? If there is no need to display all rights at once, then you can fare much better with only retrieving what you need. I think that no user is interested in all data at once, and hey wll usually only query for a fraction of the information. So only get the data that is currently requiered by the user to perform his work. Talk about "scaling" and "efficency" are often solved by reducing the amount of data. I recently read a blog post pointing out that most algorythems are fast for small N... So all you need to do is keep your N small...

    And a DB can hadle the storage very well... Model it exactly like you described. Well you could optimize this setup with the introduction of roles, which ould allow you to group lots of "PCs" together to a single group.

  • 10-05-2007 2:01 AM In reply to

    Re: Speed Challenge 38 - What're your rights?

    JvdL:
    This only works if key.hashCode() executes in constant time. This is generally only true for immutable objects like strings. To get the hashCode() of a set, Java iterates all elements of the set to combine their hash codes. Than, it iterates it again to test equals(), multiple times. (allocating an iterator object every time)
    Hmm, that's not quite what I wanted. I guess I could inherit the HashSet to make it return its own memory address instead.
    rpar PROTON all
  • 10-05-2007 3:33 AM In reply to

    • JvdL
    • Top 200 Contributor
    • Joined on 01-26-2007
    • Spain
    • Posts 177

    Re: Speed Challenge 38 - What're your rights?

    Faxmachinen:
    JvdL:
    This only works if key.hashCode() executes in constant time. This is generally only true for immutable objects like strings. To get the hashCode() of a set, Java iterates all elements of the set to combine their hash codes. Than, it iterates it again to test equals(), multiple times. (allocating an iterator object every time)
    Hmm, that's not quite what I wanted. I guess I could inherit the HashSet to make it return its own memory address instead.

    More generally, it's not a good idea to use mutable objects as key.

    HashSet set = new HashSet();
    HashMap map = new HashMap();
    set.add("Foo");
    map.put(set,"Bar");
    map.get(set); // returns bar
    set.add("WTF");
    map.get(set); // returns null

    It still can be useful sometimes to key on a collection (for example: ArrayList of fixed number of primary keys). As long as you're aware of the issues, and don't expose it to public.

     

     

  • 10-05-2007 8:41 AM In reply to

    Re: Speed Challenge 38 - What're your rights?

    Well, I haven't found any immutable alternative to the HashSet so far. Bah, if only Java had references.
     
    JvdL:
    How do you verify whether A has right e1 on X ? Only by iterating keys of the first hash map in linear sequence [...] then repeating this on the second hash map.
    Now why would I want to iterate the keys of the second HashMap?
    rpar PROTON all
  • 10-05-2007 8:54 AM In reply to

    Re: Speed Challenge 38 - What're your rights?

    Ofcourse, I realize that I look like an idiot just after the edit time expires. What I meant to say was, "Bah, if only Java had a reference operator."
    rpar PROTON all
  • 10-05-2007 1:01 PM In reply to

    • JvdL
    • Top 200 Contributor
    • Joined on 01-26-2007
    • Spain
    • Posts 177

    Re: Speed Challenge 38 - What're your rights?

    Faxmachinen:
    Well, I haven't found any immutable alternative to the HashSet so far.
    
    public class ReadOnlyCollection {
    	private Collection base;
    	private int hash;
    	public ReadOnlyCollection(Collection collection){
    		base = collection;
    		hash = collection.hashCode();
    	}
    	public int hashCode() { return hash; }
    	public boolean equals(Object other) {
    		if( other instanceof ReadOnlyCollection ) {
    			ReadOnlyCollection that = (ReadOnlyCollection) other;
    			// Note: fail fast on hash, because equals() is O(n)
    			return this.hash == that.hash && this.base.equals(that.base);
    		}
    		return false;
    	}
    	public int size() { return base.size(); }
    	public boolean contains(Object member) { return base.contains(member);}
    	public Iterator iterator() { return new ReadOnlyIterator(base.iterator());}
    	// Add more immutable collection methods as desired..
    }
    public class ReadOnlyIterator implements Iterator {
    	private Iterator base;
    	public ReadOnlyIterator(Iterator i) { this.base = i; }
    	public void remove() { throw new UnsupportedOperationException(); }
    	public boolean hasNext() { return base.hasNext(); }
    	public Object next() { return base.next(); }
    }
    
  • 10-05-2007 1:16 PM In reply to

    • JvdL
    • Top 200 Contributor
    • Joined on 01-26-2007
    • Spain
    • Posts 177

    Re: Speed Challenge 38 - What're your rights?

    Faxmachinen:
    Now why would I want to iterate the keys of the second HashMap?

    I assumed you would support like

    {
    (X,Y)->(c, d, e1),
    (Z)->(c, d, f2)
    }

     

  • 10-06-2007 1:19 PM In reply to

    Re: Speed Challenge 38 - What're your rights?

    Good stuff. Thanks.
    rpar PROTON all
  • 10-07-2007 4:34 PM In reply to

    • sobani
    • Not Ranked
    • Joined on 10-07-2007
    • Posts 8

    Re: Speed Challenge 38 - What're your rights?

    JvdL, you could go through all that trouble, or you could try:
    java.util.Collections.unmodifiableCollection(collection);
    Or, maybe even better:
    java.util.Collections.unmodifiableSet(set);
    :-p
  • 10-07-2007 6:08 PM In reply to

    • JvdL
    • Top 200 Contributor
    • Joined on 01-26-2007
    • Spain
    • Posts 177

    Re: Speed Challenge 38 - What're your rights?

    sobani:
    JvdL, you could go through all that trouble, or you could try:
    java.util.Collections.unmodifiableCollection(collection);
    Or, maybe even better:
    java.util.Collections.unmodifiableSet(set);

    :-p

    Except that these unmodifiable collections exhibit the same problem that was the reason for the trouble in the first place. Collections.unmodifiableSet(base).hashCode() does not execute in constant time, it calls the base.hashCode() unabridged, which iterates the base.
     

  • 10-08-2007 6:15 AM In reply to

    • ammoQ
    • Top 10 Contributor
    • Joined on 04-13-2005
    • Vienna.Austria.Europe.Earth
    • Posts 3,445

    Re: Speed Challenge 38 - What're your rights?

    I tend to use the "just look up in the database solution". If I wanted to use a hashtable (or hashset), it would be just one big instance. Nothing with "multi-level gadagada". I'd use classes like the following example to combine multible keys into one. The only obvious drawback is that it can't be used to list privileges on something without iterating through the whole table, but that should be done by looking into the database tables anyway. Depending on how often I have to check the same privileges again and again, the hashtable could be used as some kind of cache only - the first time (privilege not in DB), I check the database; next time, it's in the hashtable.

     

    public class MachineSoftwareRight {
        String machine;
        String software;
        String right; 

        public String getMachine() { return machine; }
        public String getSoftware() { return software; }
        public String getRight() { return right; }

        public MachineSoftwareRight(String machine, String software, String Right) {
           this.machine=machine;
           this.software=software;
           this.right=right;
        }

       public int hashCode() {
        return machine.hashCode()^software.hashCode()^right.hashCode();
       }

       public boolean equals(Object o) {
           if (o instanceof MachineSoftwareRight) {
              MachineSoftwareRight r = (MachineSoftwareRight) o;
              return machine.equals(r.getMachine()) && software.equals(r.getSoftware()) && right.equals(r.getRight());
           }
           else return false;
       }
    }
     

     

     


             

     

     



     

    beanbag girl 4ever
  • 11-06-2007 10:35 PM In reply to

    • GeneWitch
    • Top 100 Contributor
    • Joined on 12-23-2006
    • Orange County, CA
    • Posts 310

    Re: Speed Challenge 38 - What're your rights?

    Use a byte to represent each set of access restrictions, and an array where size=number of machines.

    then use bitmask flags (or the proper way of saying that).

    That will cover 255 different configurations of software.

    you are using 100 bytes + overhead for the whole thing.

     

    am i wrong here?

    According to the rules of engagement it appears i am not wrong. and if your "administrator" is too lazy to use numerical IDs, write a simple hash to write the bytes to an array that's still barely larger than the amount of computers you're administrating.

    sudo make me a sandwich
  • 11-06-2007 10:47 PM In reply to

    • GeneWitch
    • Top 100 Contributor
    • Joined on 12-23-2006
    • Orange County, CA
    • Posts 310

    Re: Speed Challenge 38 - What're your rights?

    This makes my solution O(m+(n/8)) bytes in memory, i believe. which per your agenda is better than O(m+n).

     

    Oh i forgot the edit timed out. if you do 8 bit masks (config "A" = 01000000 and config "B" =00100000) then it's O(m+(n/8)) otherwise it's O(m+(n/255)).

     take your pick.

    sudo make me a sandwich
  • 11-06-2007 10:54 PM In reply to

    • GeneWitch
    • Top 100 Contributor
    • Joined on 12-23-2006
    • Orange County, CA
    • Posts 310

    Re: Speed Challenge 38 - What're your rights?

    GeneWitch:

    This makes my solution O(m+(n/8)) bytes in memory, i believe. which per your agenda is better than O(m+n).

     

    Oh i forgot the edit timed out. if you do 8 bit masks (config "A" = 01000000 and config "B" =00100000) then it's O(m+(n/8)) otherwise it's O(m+(n/255)).

     take your pick.

    I of course meant O(m) for 1 byte (8 configurations). each set of 8 would increase m by 1 (2m, 3m, etc); the 255 option one is of course more efficient in memory, but harder on anyone trying to use it without a reference chart. O((n/8)*m)... i'm too tired to work this out. haha.

    sudo make me a sandwich
Page 1 of 1 (29 items)
Powered by Community Server (Non-Commercial Edition), by Telligent Systems