|
PL/SQL fuzzy logic
Last post 06-23-2009 10:33 AM by dhromed. 42 replies.
-
04-23-2009 3:33 PM
|
|
-
jlaiho


- Joined on 04-23-2009
- Posts 4
|
Yes, null does, for some cases, equal to an empty string in Oracle. That's old news. What's more interesting is that in boolean context null is neither true nor false, so we have a very nice fuzzy logic engine at our perusal. Combined with some PL/SQL features, this becomes entertaining. Consider the following example: declare -- declare a table of single-character elements type tab_t is table of varchar2(1); -- and declare a variable of that table type tab tab_t; -- write a procedure to tell whether an element exists in the table; -- check the result both with straightforward logic and with negated logic; -- it is expected that both checks print out the same result procedure check_member(elem varchar2, tab tab_t) as cond varchar2(20); begin cond := 'Non-negated'; if elem member of tab then dbms_output.put_line(cond||': Is member'); else dbms_output.put_line(cond||': Is not member'); end if; cond := 'Negated'; if not (elem member of tab) then dbms_output.put_line(cond||': Is not member'); else dbms_output.put_line(cond||': Is member'); end if; dbms_output.put_line(''); end check_member; begin -- first test; this is fine dbms_output.put_line('Is "a" member of list ("a")'); check_member('a',tab_t('a'));
-- second test; this is fine dbms_output.put_line('Is "a" member of list ("b")'); check_member('a',tab_t('b'));
-- third test; looks good dbms_output.put_line('Is "a" member of list ("a",null)'); check_member('a',tab_t('a',null));
-- fourth test; not so good dbms_output.put_line('Is "a" member of list ("b",null)'); check_member('a',tab_t('b',null)); end;
For the first three test cases the results for both negated and non-negated tests are equal and as expected; "Is member" for the first and third, and "Is not member" for the second test case. The fourth one is the oddity; it'll print out:
Is "a" member of list ("b",null) Non-negated: Is not member Negated: Is memberAfter some perusal of Oracle manuals, this appears to be caused by the null handling in boolean Oraclean logic. Result of a boolean operation where one of the operands is null, will be null. So, apparently, the presence of a null element in the list will, for the case where the searched element does not exist in the list, causes the "member of" operation to return null. And "not null" is "null" as well, and as a result for both tests the execution falls in the "else" branch (because null is not true). Curiously, for the case where the list does contain the searched element, the presence of null does not have any unexpected effects. And yes, I did also test different ordering of the list members, as well as lists with more than just two members; these things do not seem to change the results. Let's say it took some time to dig this out from a piece of misbehaving code.
|
|
-
-
LoztInSpace


- Joined on 01-11-2006
- Melbourne
- Posts 120
|
It does make sense, and the code is only 'misbehaving' because you have a boolean test for tri-state logic. i.e. you are missing some conditions.
If you consider null to be 'unknown value' and not some kind of 'special zero or blank' it makes more sense.
Is 'a' in the list ('a',null) - definately yes.
Is 'b' in the list ('a') - definately no.
Is 'b' in the list ('a',null) - well I'm not sure, because even though it's not in the ones I know about there is an unknown value.
You are missing "is null" in your tests. The behaviour is logical, you're just not checking correctly.
|
|
-
-
jlaiho


- Joined on 04-23-2009
- Posts 4
|
In a way your explanation does make sense. And that would bring a kind of solution, so instead of plain
if not (elem member of list)
it would be needed to write
if ((elem member of list) is null) or (not (elem member of list))
which just doesn't seem that much readable any more - unless I wrap it again into a function that really does return a boolean and not null or some other odd creature. Especially when I can write the reverse of the same condition without any special clause.I would accept failure of the comparision if the list itself was null, but not when a member of the list is null - because that's what the list operators are for - to allow me to work on sets of data without knowing in advance what is inside the list.
|
|
-
-
LoztInSpace


- Joined on 01-11-2006
- Melbourne
- Posts 120
|
I'm glad my explanation makes sense 'in a way'. Unfortunately for you it actually makes sense 'in a way that reflects the reality of SQL', so you have to live with it.
If you have unknowns in your data then you have to handle them. I think you'll find that you cannot dodge the issue by writing the reverse, assuming you mean get a non-null answer to "is this in my list" (see my last example).
What you've done so far is to establish if an element is not in a list or the list contains an unknown value, and this is the best you will be able to do.
What you need to do is determine what happens when the answer is 'I don't know' or completely remove this possibility by using a NOT NULL constraint on the underlying data.
Whether you wrap it up in a neat little function or not is irrelevent until you get past this part.
|
|
-
-
-
LoztInSpace


- Joined on 01-11-2006
- Melbourne
- Posts 120
|
"I think jlaiho's confusion is warranted"
no it isn't. Why not? Because:
"The same issues exist in every SQL implementation"
Tri-state is not illogical or stupid or anything, he just doesn't know it (which is quite a gap for a developer expected to use SQL but that's another story).
As for your assertion that SQL is odd, well how would you handle yes/no/unknown values is Java/C++/VB? You have to store and handle it somehow, it just happens SQL does it for you (once you understand it). If there is no requirement for unknown then simply ensure the column is not null and you're done.
|
|
-
-
Welbog


- Joined on 02-08-2007
- Posts 600
|
LoztInSpace:Tri-state is not illogical or stupid or anything, he just doesn't know it (which is quite a gap for a developer expected to use SQL but that's another story).
Illogical, no. Stupid, I would say yes. But that's subjective.
LoztInSpace:As for your assertion that SQL is odd, well how would you handle yes/no/unknown values is Java/C++/VB? You have to store and handle it somehow, it just happens SQL does it for you (once you understand it). If there is no requirement for unknown then simply ensure the column is not null and you're done.
I would do it the way nearly every other formal system does it: true and false. Two-state logic, two-state operations. No special logical operators that apply to nulls differently compared to other operands.
In the OP's example, `'a' member of ('b',null)` should return false. Because it's false. {'b',null} is a set, and the element 'a' is not a member of that set. That is how logic works in nearly every other formal system, because that is the intuitive and natural way logic works. If an operator is applied to a null and that operation doesn't make any sense, then the operation should either take a parameter to tell it what it should do if it sees a null (i.e. treat as true, treat as false, skip, fail) or have an exception mechanism. But if an operation does make sense to be applied to nulls (like member of), then regular two-state boolean return values should be used.
SQL's logic not only goes against established formal system tradition but also against human intuition regarding operations. That's an utter failure at usability, and I have yet to see a real benefit from treating logic and nulls this way in my years of dealing with SQL. In fact, I have seen quite the opposite: queries that are much longer than they would have been if nulls weren't treated as special magical values with their own operations and return values.
|
|
-
-
bstorer


- Joined on 02-01-2007
- Alexandria, VA
- Posts 4,131
|
LoztInSpace:"I think jlaiho's confusion is warranted"
no it isn't. Why not? Because:
"The same issues exist in every SQL implementation"
This is not a defense. Stupidity is not validated by repetition. LoztInSpace:Tri-state is not illogical or stupid or anything
It's illogical in the sense that it behaves contrary to the way everything else does. LoztInSpace:As for your assertion that SQL is odd, well how would you handle yes/no/unknown values is Java/C++/VB?
First of all, I wouldn't use a word like "NULL", which has distinct connotations, when I really ought to use "UNKNOWN". Hey, I could even use "UNK" and save everyone a character! Plus, UNK is fun to say. UNK, UNK, UNK. Second, I'd recognize that programmers want a yes/no answer. It's all well and good to have a ternary logic, but then why aren't the control structures designed for it? It's like they went out of their way to make SQL difficult to use. LoztInSpace:You have to store and handle it somehow, it just happens SQL does it for you (once you understand it).
Or, here's a novel idea, design your language on behalf of the developers, not the mathematical theory. Yes, unknowns are all well and good, but making the default behavior of a language tedious and prone to error isn't a good way to go about it. The times where I want a yes/no answer far outweigh the times when I want a yes/no/unknown answer, so design with that in mind. The time to be tedious is when I really need that extended functionality, not the simple, every-day stuff.
 These puppies are Nazis. They aren't being obvious about it, but I can tell.
|
|
-
-
bjolling


- Joined on 06-08-2008
- Belgium
- Posts 739
|
Welbog: Illogical, no. Stupid, I would say yes. But that's subjective.
LoztInSpace:As for your assertion that SQL is odd, well how would you handle yes/no/unknown values is Java/C++/VB? You have to store and handle it somehow, it just happens SQL does it for you (once you understand it). If there is no requirement for unknown then simply ensure the column is not null and you're done.
I would do it the way nearly every other formal system does it: true and false. Two-state logic, two-state operations. No special logical operators that apply to nulls differently compared to other operands.
As always, two-state and tri-state logic have their place and the OP needs to know about the difference. SQL offers both options - make your column non-nullable if you want two-state instead of complaining about it. I'm currently working on an application that processes insurance claims. Unlike bstorer, I have to deal with many questions that are not just yes/no but often are yes/no/don't know. Or not all questions are mandatory. If the answer to a question is unknown, you can't assume that it is false by default. Two-State driving under influence
isDriving? | and | isDrunk? -----------+-----+--------- Yes | Yes | Yes Yes | No | No No | No | Yes No | No | No
Tri-State driving under influence adds:
isDriving? | and | isDrunk? -----------+-----+--------- ? | ? | Yes ? | ? | No Yes | ? | ? No | ? | ? ? | ? | ?
If someone isDriving but you don't know if he isDrunk, then you also don't know if he was driving under influence. No fuzziness here - please move along :-)
 It's... Monkey Piston's Frying Circle!
|
|
-
-
LoztInSpace


- Joined on 01-11-2006
- Melbourne
- Posts 120
|
bstorer:
LoztInSpace:
"I think jlaiho's confusion is warranted"
no it isn't. Why not? Because:
"The same issues exist in every SQL implementation"
This is not a defense. Stupidity is not validated by repetition.
That's not whay I am saying. Confusion is not warranted because all databases behave like this. You don't have to like it but that's the fact and that's why it should not be confusing.
bstorer:
LoztInSpace:Tri-state is not illogical or stupid or anything
It's illogical in the sense that it behaves contrary to the way everything else does. LoztInSpace:As for your assertion that SQL is odd, well how would you handle yes/no/unknown values is Java/C++/VB?
First of all, I wouldn't use a word like "NULL", which has distinct connotations, when I really ought to use "UNKNOWN". Hey, I could even use "UNK" and save everyone a character! Plus, UNK is fun to say. UNK, UNK, UNK.
Agree. I'd love to see the distinction between unknown, not supplied, empty string (I kid - I kid!) etc. It's a bit overloaded.
bstorer:
Second, I'd recognize that programmers want a yes/no answer. It's all well and good to have a ternary logic, but then why aren't the control structures designed for it? It's like they went out of their way to make SQL difficult to use.
Of you want yes/no then use NOT NULL constraints. If you allow nulls in boolean then you need to deal with tri-state. How can you not? You have 3 values! As for the control structures, they are there - you just don't like them (assuming you mean using IS NULL)
bstorer:
LoztInSpace:You have to store and handle it somehow, it just happens SQL does it for you (once you understand it).
Or, here's a novel idea, design your language on behalf of the developers, not the mathematical theory. Yes, unknowns are all well and good, but making the default behavior of a language tedious and prone to error isn't a good way to go about it. The times where I want a yes/no answer far outweigh the times when I want a yes/no/unknown answer, so design with that in mind. The time to be tedious is when I really need that extended functionality, not the simple, every-day stuff.
Same as above. You use NOT NULL when you want a yes/no answer. This stuff is driven by business requirements, data models and designs that reflect those requirements. You have to choose - always. It's no more tedious or error prone than deciding whether a column or variable has to be a number, date, varchar or whatever (although some seem to get that wrong pretty regularly too). Are you seriously asserting that all these 'problems' are actually down to some arbitrary syntax of the CREATE TABLE statement?
By the way, the GUI table creator in SQL Server seems to default the 'allow nulls' tick box to be off, which tends to be correct.
|
|
-
-
Welbog


- Joined on 02-08-2007
- Posts 600
|
LoztInSpace:Of you want yes/no then use NOT NULL constraints. If you allow nulls in boolean then you need to deal with tri-state. How can you not? You have 3 values! As for the control structures, they are there - you just don't like them (assuming you mean using IS NULL)
The OP's example wasn't storing a null in a boolean-type column, it was a return value from a built-in function. I think that it should be up to the designer to decide how nulls in boolean columns should be handled using some kind of flag as part of the query (or some other mechanism) instead of globally saying "null has the meaning 'might be true or false'", which is an assumption that SQL forces upon you. Null doesn't always mean that, and in my experience it normally means something more like "I don't give a damn what the value is" I don't like that my database software imposes assumptions regarding the meaning of data this way.
But more in line with the OP's confusion, his "member of" function does not return false when asked if 'a' is a member of the set {'b',null}. This isn't rocket science, it's basic set theory: 'a' is not a member of {'b',null}. SQL is imposing the assumption that null "might be" 'a', so it returns null instead of false. That is a confusing imposition that very few formal systems make, because the majority of formal systems let you define for yourself what null means. Just because the imposition is well documented and easy to understand once you know what it is doesn't make it a good idea or even a straightforward idea. It's confusing and unnecessary. It's counterintuitive and usually results in larger, more complicated queries.
That's what this is. It's perfectly logical, understandable, sound, unnecessary and counterintuitive. Null, intuitively, has no meaning other than "no other value is stored". SQL is designed under the assumption that null means "this could represent anything". It is my opinion that this is a bad assumption, because that's not what null means in every scenario. You could argue that since SQL was designed this way, that if you're using null to mean something other than "this could represent anything" you're using SQL incorrectly. That's fair, but all of my other systems use nulls differently so then I would have to overengineer a solution to store my meaningless nulls meaningfully in SQL. To me, SQL nulls are designed wrong, plain and simple.
|
|
-
-
bstorer


- Joined on 02-01-2007
- Alexandria, VA
- Posts 4,131
|
Welbog:But more in line with the OP's confusion, his "member of" function does not return false when asked if 'a' is a member of the set {'b',null}. This isn't rocket science, it's basic set theory: 'a' is not a member of {'b',null}. SQL is imposing the assumption that null "might be" 'a', so it returns null instead of false. That is a confusing imposition that very few formal systems make, because the majority of formal systems let you define for yourself what null means. Just because the imposition is well documented and easy to understand once you know what it is doesn't make it a good idea or even a straightforward idea. It's confusing and unnecessary. It's counterintuitive and usually results in larger, more complicated queries.
This, I think, is the key point. Mathematically valid or not, consistent across all DBs or not, it's still unacceptable because it makes our jobs harder.
 These puppies are Nazis. They aren't being obvious about it, but I can tell.
|
|
-
-
jlaiho


- Joined on 04-23-2009
- Posts 4
|
Welbog:That's what this is. It's perfectly logical, understandable, sound, unnecessary and counterintuitive. Null, intuitively, has no meaning other than "no other value is stored". SQL is designed under the assumption that null means "this could represent anything". It is my opinion that this is a bad assumption, because that's not what null means in every scenario. You could argue that since SQL was designed this way, that if you're using null to mean something other than "this could represent anything" you're using SQL incorrectly. That's fair, but all of my other systems use nulls differently so then I would have to overengineer a solution to store my meaningless nulls meaningfully in SQL. To me, SQL nulls are designed wrong, plain and simple. It's the OP here; I've been silent, and listening - erm, reading. Thanks for all about the discussion so far, it's been an educating one. When reading the above, it brought to my mind some claims done before y2k that there are systems where "no year given" could be stored as "99". Somehow it starts to sound as an echo of the above: if you have a "not null" numeric column, then you'll have to use one of your real data values to signify "no data", because the null is forbidden, and even if allowed would only mean "there might or might not be data". But now I fear that I'll be attacked by data design purists insisting to tell me that there should not be that kind of column at all, but instead the data that does not occur on each row should be split off to a separate table - at which point lack of row in the other table would indicate lack of data. But this, of course, would be again futzed by some apps programmer doing an outer join of these two tables, and being able to retrieve a null for the row where there was no value. Oh my.. "no value" just changed into "might or might not be a value". Perhaps there really should also be a token for a real "no value - this variable has intentionally been left blank", in addition to the current "perhaps someone just forgot to jot down the value". And thanks for whoever it was who wrote the example where the "don't know" answer truly made sense; it helped to illustrate the value of three-value logic.
|
|
-
-
niik


- Joined on 05-06-2009
- Posts 11
|
bjolling:Tri-State driving under influence adds:
isDriving? | and | isDrunk? -----------+-----+--------- ? | ? | Yes ? | ? | No Yes | ? | ? No | ? | ? ? | ? | ?
If someone isDriving but you don't know if he isDrunk, then you also don't know if he was driving under influence. No fuzziness here - please move along :-)
In fact, the 3-state logic
according to relational theory is:
isDriving? | and | isDrunk? -----------+-----+--------- ? | ? | Yes
? | No | No Yes | ? | ? No | No | ? ? | ? | ?
and it's absolutely correct (if
driver was not driving, who cares if he was drunk, and if he's not
drank he's not drunk driver - driving or not).
I
believe the same logic is implemented in Oracle (SQL serve has it implemented since SQL 2000).
The basic and most common
misunderstanding of the meaning of NULL lays in the fact that most of the
people tend to apply one meaning to it. Unfortunately that's not right. Null
was introduced into relational theory to cover temporary unavailable information
(because touple - or row if you prefer - can not contain null values, but null
is necessary to allow existence of temporary incomplete touples). Later on even
Codd started experimenting with idea of having two types of null - one
representing simple missing information and other one that will represent
unknown and/or not applicable. Wrong, in my opinion. Null should be avoided as
much as possible, and should be used only to allow temporary missing values of
the attribute for witch we are sure has a value. E.g. DateOfBirth - it could be
unknown, but we are sure everyone was born on some date. Personally, I would
recommend to keep incomplete rows in a separate table, if it is possible of
course.
If you have column where null means
not aplicable (like SSN for non-US citizens), you have table that is not
normalized. If your table is fully normalized, you should have tables
Person(PersonID, Name, Surname,...) and PersonSSN(PersonID,SSN). But this does
not solve the problem, because view that join (outer) those two tables would
still have null for a non-US and we could not know if it means "not
applicable" or "missing". Or, maybe, we can?
PersonID | Name | Surname PersonID | SSN -----------+-----+--------- ----------+----- 1 | John | Smith 1 | aaa 2 | John | Doe 2 | NULL 3 | Johan | Strauss 5 | bbb 4 | Jean | Morris 5 | George| West
It's easy to make a query that will
tell you PersonID 3 and 4 are non-US with not applicable SSN, but for 2 SSN is
missing.
What happens if you want to list all
the people with SSN different than given one (or the range of SSN). This is
usual complain against three-state-logic, but it actually tells more about
question. Do you realy know what you want with this question?
- I want list of the people with SSN
different than given one, and I do not want foreigners in the list
- I want list of the people with SSN different than given one, with all the foreigners
in the list
But there is another question: how
do I want to treat Mr. John Doe?
- I do not know his SSN and I
do want him to appear in the list because I want list of all the people that do
not have this SSN together with a people that maybe don't have it (Mr. John Doe
IS on the list), or
- I do want a list of a people I'm 100% sure do not have this SSN (in which
case Mr. John Doe will not appear on the list)
Combining previous two cases, we can
write 4 different queries, providing 4 different result sets. Which one is
right, depends on the question.
You see, proper using of NULL values
gives you opportunity to provide right answer on the right (well defined)
question. What kind of two-state-logic could provide it? No one, except if you
define "special case SSN" with a meaning on "UNKNOWN" and
"NOTAPPLICABLE". That's fine, but you will still need to understand
question because you can provide 4 different answers even in this case.
Welbog:But more in line with the OP's
confusion, his "member of" function does not return false when asked
if 'a' is a member of the set {'b',null}. This isn't rocket science,
it's basic set theory: 'a' is not a member of {'b',null}. SQL is
imposing the assumption that null "might be" 'a', so it returns null
instead of false. That is a confusing imposition that very few formal systems
make, because the majority of formal systems let you define for yourself what
null means. Just because the imposition is well documented and easy to
understand once you know what it is doesn't make it a good idea or even a
straightforward idea. It's confusing and unnecessary. It's counterintuitive and
usually results in larger, more complicated queries.
Wrong.
I mean, I agree with "This
isn't rocket science, it's basic set theory", but I do not agree with the
statement: 'a' is not a member of {'b',null} .
Truth is - we do not know if a
is member of {b, null} because null, as I said before, means unknown. You have
set (basket?) containing 1 apple and one box. We don't know what is in the box
(aka null).Can you tell you are 100% sure there's no pear in the box? You can
not, because you don't know if inside box is another apple, pear, plum or box
is empty.
Why this is not obvious? Simple,
because people mix null with the empty - those two terms are not synonyms. The
statement "Pear is not a member of set containing apple and empty
box" is true. But, statement "Pear is not a member of set containing
apple and box" (missing word is empty, and box represents null, meaning we
do not know what's inside box) is neither true nor false, because we do not
know.
Three-state-logic. Like it or not,
but you need it if you want to give right answer on right question. Don't blame
RDBMS if you didn't understand question or customer didn't know what he
wants.
By the way, could anyone explain me
how all previous cases will be easy and simple to solve in any other
programming language?
I apologize if you find my grammar and/or vocabulary is poor and
wrong, but I'm not a native English speaker, and I do my best to improve.
|
|
-
-
bjolling


- Joined on 06-08-2008
- Belgium
- Posts 739
|
niik: bjolling:Tri-State driving under influence adds:
isDriving? | and | isDrunk? -----------+-----+--------- ? | ? | Yes ? | ? | No Yes | ? | ? No | ? | ? ? | ? | ?
If someone isDriving but you don't know if he isDrunk, then you also don't know if he was driving under influence. In fact, the 3-state logic
according to relational theory is:
isDriving? | and | isDrunk? -----------+-----+--------- ? | ? | Yes
? | No | No Yes | ? | ? No | No | ? ? | ? | ?
Heh, how did I miss that? Next time I'll engage my brain before posting, I promise.
 It's... Monkey Piston's Frying Circle!
|
|
-
-
Welbog


- Joined on 02-08-2007
- Posts 600
|
niik:Why this is not obvious? Simple, because people mix null with the empty - those two terms are not synonyms. The statement "Pear is not a member of set containing apple and empty box" is true. But, statement "Pear is not a member of set containing apple and box" (missing word is empty, and box represents null, meaning we do not know what's inside box) is neither true nor false, because we do not know.
Spoken like a true SQL programmer.
In most programming languages, null has the meaning "no value is associated with this variable or expression". Empty has the meaning "this variable of expression has an empty value". The distinction exists and is very real (because it is an important distinction). In parsing theory, the DFA that accepts the null string is different from the DFA that accepts the empty string. The DFA that accepts the null string doesn't accept any string. It not that it "maybe" accepts some string, it just doesn't accept any string. That is the nature of null as I see it. Null doesn't mean that this expression "might" have a value, it means that the expression doesn't have a value. If, like in your examples, null does mean that it "might" have a value, then it's up to you to treat null that way. In my experience null means "no value" more often than it means "might have a value".
It should not be the RDBMS that decides what null means; it should be up to me.
|
|
-
-
bstorer


- Joined on 02-01-2007
- Alexandria, VA
- Posts 4,131
|
niik:The basic and most common
misunderstanding of the meaning of NULL lays in the fact that most of the
people tend to apply one meaning to it. Unfortunately that's not right. Null
was introduced into relational theory to cover temporary unavailable information
(because touple - or row if you prefer - can not contain null values, but null
is necessary to allow existence of temporary incomplete touples). Later on even
Codd started experimenting with idea of having two types of null - one
representing simple missing information and other one that will represent
unknown and/or not applicable. Wrong, in my opinion. Null should be avoided as
much as possible, and should be used only to allow temporary missing values of
the attribute for witch we are sure has a value. E.g. DateOfBirth - it could be
unknown, but we are sure everyone was born on some date. Personally, I would
recommend to keep incomplete rows in a separate table, if it is possible of
course.
If you have column where null means
not aplicable (like SSN for non-US citizens), you have table that is not
normalized. If your table is fully normalized, you should have tables
Person(PersonID, Name, Surname,...) and PersonSSN(PersonID,SSN).
Look, I know the mathematics behind the relational model, and I'm certain Welbog does, too. But you're missing the point entirely: nobody cares what the mathematical basis is. The adherence to the mathematics are an impedence to getting work done. We have real jobs to do and we aren't interested in things that make those jobs harder. I can count on one hand the number of times I've had a situation where I was looking for '1234' and would accept NULL in its place. On the other hand, the number of times I need '1234' and only '1234' are too numerous to count. Over-normalized data and the plethora of joins that result are more often a hinderance than a help. There's a reason we don't write our applications in Prolog. If you want to have this behavior in some academic database package, knock yourself out, but get it out of my production system.
 These puppies are Nazis. They aren't being obvious about it, but I can tell.
|
|
-
-
Welbog


- Joined on 02-08-2007
- Posts 600
|
bstorer:Look, I know the mathematics behind the relational model, and I'm certain Welbog does, too.
I remember when I took database design in university and we covered tri-state logic. I thought, "Hmm, that's neat, I guess," when it was first introduced. Then we had an assignment that dealt with it and nulls and I quickly learned that all of the extra checks I had to do were annoying and tedious. Nothing about tri-state logic is difficult to use in practice, it's just annoying as shit.
|
|
-
-
niik


- Joined on 05-06-2009
- Posts 11
|
Welbog:It should not be the RDBMS that decides what null means; it should be up to me.
Look, we practicaly agree about everything, except the small issue you stated at the end. In fact, I agree with that sentence too, but it seems we read it in different way. It is up to you, You want all peeple with SSN different then 'aaa' including the people without SSN, you write WHERE SSN is Null or SSN<>'aaa'. You want list of people that have SSN, but different than 'aaa' you write WHERE SSN is Not Null and SSN<>'aaa'. It's not a big deal, it's clear on first sight, and at the end it was you that gave meaning to the NULL.
You have opportunity to choose meaning, thanks to the 3-state-logic and RDBMS
|
|
-
-
bstorer


- Joined on 02-01-2007
- Alexandria, VA
- Posts 4,131
|
niik:It is up to you, You want all peeple with SSN different then 'aaa' including the people without SSN, you write WHERE SSN is Null or SSN<>'aaa'. You want list of people that have SSN, but different than 'aaa' you write WHERE SSN is Not Null and SSN<>'aaa'. It's not a big deal, it's clear on first sight, and at the end it was you that gave meaning to the NULL.
You have opportunity to choose meaning, thanks to the 3-state-logic and RDBMS
Earlier in this thread I pointed out that it's asinine to use a ternary logic internally and then only provide binary control structures. Thanks for demonstrating that so nicely.
 These puppies are Nazis. They aren't being obvious about it, but I can tell.
|
|
-
-
niik


- Joined on 05-06-2009
- Posts 11
|
bstorer:nobody cares what the mathematical basis is. The adherence to the mathematics are an impedence to getting work done.
It's hard to imagine something more wrong that those sentencies. It's pure example of, somethimes very skilled and briliant, developers which creates applications that runs just thanks to the rawprocessor power, huge amount of RAM and fast IO. Because they are ignorant, because they tend to think that only application code and developement speed counts and mostly because they don't want to waste their precious time to stupid and uslessthings like design and modeling. They can do everything on the fly, just give them an IDE and they'll make tables as they need them They are wizards. Andat the end they blame RDBMS, underscaled hardware, slow OS or incompetent DBA. Typical. Work done and work well done are two different things. bstorer:We have real jobs to do and we aren't interested in things that make those jobs harder
I have a real job, and my duty last 25 years was and is to have jobs well done. bstorer: I can count on one hand the number of times I've had a situation
where I was looking for '1234' and would accept NULL in its place.
Where SSN is null orSSN='1234' bstorer:On
the other hand, the number of times I need '1234' and only '1234' are
too numerous to count.
Where SSN='1234' Where is the problem, you have shorter where clause in your more often case
bstorer:Over-normalized data and the plethora of joins
that result are more often a hinderance than a help.
Wrong. You can always make views that will hide "plethora of joins" from the application. Sentence "having lot of joins directly means worse performance" is wrong too (except in the case of incorrect joins and joins not supported with proper indexes). I had to prove that hundreds of times until now, and never failed, believe me. It's a wrong design, wrong understanding of requirements and customer needs and wrong implementation (including joins, views, queries, indexes etc) that is cause of hinderance in most of the cases, not "overnormalization". Btw. There's not such thing as "overnormalization" because your schema is normalized or not. OK, could be de-normalized, but this term has different meaning than not-normalized.
|
|
-
-
niik


- Joined on 05-06-2009
- Posts 11
|
bstorer: it's asinine to use a ternary logic internally and then only provide binary control structures. Here, I absolutely and fully agree with you :-)
|
|
-
-
bstorer


- Joined on 02-01-2007
- Alexandria, VA
- Posts 4,131
|
niik:Wrong. You can always make views that will hide "plethora of joins" from the application. Sentence "having lot of joins directly means worse performance" is wrong too (except in the case of incorrect joins and joins not supported with proper indexes). I had to prove that hundreds of times until now, and never failed, believe me.
Your whole post underlies a complete misunderstanding of getting work done. I'm not interested in code performance. Hell, I use Ruby, how could I be? I'm concerned about the amount of time developers spend doing the tedious and the redundant. SQL is tedious, and that's a black mark against it in my book. When I speak of a hinderance, I speak of a hinderance to me, the developer. niik:Btw. There's not such thing as "overnormalization" because your schema is normalized or not. OK, could be de-normalized, but this term has different meaning than not-normalized.
That depends: last I checked there were approximately fourteen million levels of normal forms (That might be a slight overestimation.). At what level does it become "normalized"?
 These puppies are Nazis. They aren't being obvious about it, but I can tell.
|
|
-
-
PJH


- Joined on 02-14-2007
- Newcastle, UK
- Posts 3,140
|
bstorer: niik:Btw. There's not such thing as "overnormalization" because your schema is normalized or not. OK, could be de-normalized, but this term has different meaning than not-normalized.
That depends: last I checked there were approximately fourteen million levels of normal forms (That might be a slight overestimation.). At what level does it become "normalized"?
At the point where you've examined the current schema/schemas/schemata*, and decided that 40 tables per record, and 12,000 'join tables' are a bit of an overkill, and effectively denormalized 13,999,995 levels to improve the performance of your application?
* where appropriate. While not an invitation for a sub-thread....
3 logicians go into a bar.; the barman says ‘Would you all like a drink?’. The first says 'I’m not sure', the second says 'I’m not sure', and the third says 'Yes'.
|
|
-
-
niik


- Joined on 05-06-2009
- Posts 11
|
PJH:
bstorer: niik:Btw. There's not such
thing as "overnormalization" because your schema is normalized or
not. OK, could be de-normalized, but this term has different meaning than
not-normalized.
That depends: last I checked there were approximately
fourteen million levels of normal forms (That might be a slight
overestimation.). At what level does it become "normalized"?
At the point
where you've examined the current schema/schemas/schemata*, and decided that 40
tables per record, and 12,000 'join tables' are a bit of an overkill, and
effectively denormalized 13,999,995 levels to improve the performance of your
application?
* where appropriate. While
not an invitation for a sub-thread....
Huh, I'm a bit confused
now. Sorry for off-topic, but discussion went in that direction.
First, talking about
normalization there are 6 normal forms only, as I know. And I've never seen in
my whole life database that is normalized to the 6th normal form. Or we are not
talking about the same forms :-) Definitively not. Could you explain me
what is denormalized level, please (taking in account that could be over 14
millions of them)?
I've never heard about term
"tables per record". Usualy it goes vice versa. Do you create new
tables for each record? If you do, that could be defined as a kind of
partitioning rather than normalization. Personally, I can't imagine reason why
it should be done in that way, but this does not mean it's wrong - I just don't
know "full story". The same is valid for 12000 joined tables.
Second, I've made a crucial
mistake when I mentioned performance, but this is a part of my background.
Performance counts as well as resource utilization, hardware requirements,
software cost and development time. And, as they are usually on opposite sides
of an equation (mats again ;) there's no universal truth. In some cases I'm
using plain text files to store and retrieve data. In some other cases
I'll use data in the spreadsheet and perform just order by column and
visual search to find the data I need, rather than create tables, indexes,
views. It depends on needs.
If performance is not your
concern that's just fine, but you can not generalize based on this starting premise.
If you buy Ferrari you can't complain it consumes too much fuel, maintenance
fees are high and insurance is killing just because you use it once per week to
go to the supermarket. Relational databases are, like it or not, the most efficient
systems to store and retrieve data, at least today (excluding some special
cases). And there are bunch of them with different prices, characteristics,
support options etc. You should choose one that fits you. And there are still
other options that could be evaluated if performance is not an issue.
The problem with "job
done" approach is that it is used too often as an excuse for making
application without any planning - just decide: we will use this for
data persistency and rush into the code. We will make design on the fly, we
will decide structure when need it, someone else will be in charge to tune it,
anyway. And, after a while, we start complaining about that not being
friendly to us. Of curse is not. You can't imagine how many times "the
huge development problem" that involved a lot of people writing complex
code with nested cursors and recursions (or loops, doesn't really matter),
involving functions and procedure calls - or say it in simple way -
spending a lot of developers time was solved by simple redesign and set instead
of row oriented operations. Once complex code, with thousands and thousands
lines, hard to debug, prone to errors, resource consuming with slow response, became
short, fast and reliable piece of code. I did that. A lot of times. And
developers, by accepting those changes and recommendations, were
increasing efficiency a lot. Every next job was done faster. Because it was designed
and created on the rock solid base of a proper application development
approach. First think, then code, and of course, lern from mistakes :-)
But this also depends on
needs, as I said before. I would not spend my time developing state-of-the art
application when it doesn't need to be, and I'll use small city car to go to a shopping
mall, not Ferrari.
|
|
-
-
PJH


- Joined on 02-14-2007
- Newcastle, UK
- Posts 3,140
|
niik:First, talking about
normalization there are 6 normal forms only, as I know. And I've never seen in
my whole life database that is normalized to the 6th normal form. Or we are not
talking about the same forms :-) Definitively not.
The 14 million was a joke. niik:Could you explain me
what is denormalized level, please
Denormalization is where, having taken your schema to, say, 5th NF, you take a positive decision for performance or other reasons, to take some of it back to 4th NF undoing some of the steps you took to take it to 5th NF. Note that this is totally different to simply stopping at 4th NF and not going any further. niik:I've never heard about term
"tables per record". Usualy it goes vice versa.
Again, a joke.
3 logicians go into a bar.; the barman says ‘Would you all like a drink?’. The first says 'I’m not sure', the second says 'I’m not sure', and the third says 'Yes'.
|
|
-
-
niik


- Joined on 05-06-2009
- Posts 11
|
PJH:Denormalization is where, having taken your schema to, say, 5th NF, you take a positive decision for performance or other reasons, to take some of it back to 4th NF undoing some of the steps you took to take it to 5th NF. Note that this is totally different to simply stopping at 4th NF and not going any further.
Thanks for better explanation of what I said: niik:OK, could be de-normalized, but this term has different meaning than not-normalized.
|
|
-
-
niik


- Joined on 05-06-2009
- Posts 11
|
How to minimize usage of NULL in DB
Just to try to clarify some thinghs, if I can (in fact I've just finished writing of the document (below) when I found this thread). How to minimize usage of NULL
Even NULL is allowed in both relational theory and practice its usage should
be avoided as much as possible. As NULL is defined as representation of unknown piece of data it should be
treated in that way. That means, strictly following relational theory, if an
attribute can be omitted it doesn't belong to the tuple because tuple contains
values and NULL is not a value. Why NULL is defined in the relational theory then? Because NULL represents missing data at the moment of processing. The question
is does sentence "tuple does not have this attribute" is the same
with "I do not know value of the attribute".
Put it in simpler way, if we have basic
personal data having just two attributes: name and job, what is the meaning of
null assigned to the job attribute? Does it mean we don't know the person's job
or he is unemployed? Or could be this specific person belongs to category where
employment status is not applicable, e.g. school and preschool kids. If we have
to send job offer to the unemployed we can't clearly distinguish it as well as
we can't distinguish missing attribute values if we want to make a list of
people with incomplete data.
How we can solve that problem?
Introducing predefined values for
"unemployed" and "not applicable" makes picture much
clearer. Introducing "unknown" can further make a difference between
really unknown and "not entered yet". Now we can do all the queries
we need, even we can sent a note to the relevant employees to keep up with
entering data if there are still a lot of NULL jobs, meaning there are a lot of data pending to be entered. This
could be particularly valuable in case of parent-child relations such as
catalogue tables (countries, organizations, locations etc.). It also adds value
to the information provided to the end user - displaying, on example, values
for location "not applicable" and "unknown" or "not
entered yet" for a NULL in most
cases is better than displaying just NULL or empty string. Of course, the "instead of NULL" values should be selected based on the business requirements,
e.g. "unknown" could be omitted and null can be used to show not
entered and unknown at the same time, or we can define column as a not null and
introduce separate value for unknown.
Advantage of covering NULL with predefined values, apart from more descriptive presentation of
data, lays in better coverage of two valued logic. In case of filters like
"not equal" the result set will never contain rows with a NULL.
WHERE
Location<>'Loc1' will return all rows with
locations different from 'Loc1'
but not the rows with NULL locations. To avoid that we have to write filtering clause like: WHERE Location is null or
Location<>'Loc1'.
If we don't care about what is a meaning of
NULL (we assume null means "no
job" for whatever reason - unemployed, not applicable or unknown) and
percentage of the rows containing NULL is high (usually high is more than 10% of rows containing NULL) then we have to normalize our table into two related tables - persons
and person_jobs. This approach, even if it looks like an unnecessary step that
leads to more complex queries on a first sight, in fact provides much efficient
retrieval of data, especially in the case when there is a filter based on the
job column. And it could easily solve the problem when some of them decide to get
second job.
As there is not such thing as universal
solution, I would recommend defining all columns as NOT NULL except in
the case of:
- Columns that could have missing
information during some period of time (e.g. forms that could be saved temporarily
with still missing values, tables that accept imports with not mandatory fields
that will be entered afterwards, etc.), which effectively covers
"temporarily missing data". This doesn't really violate normalization
rules as those rows could be treated as rows still in the process of inserting
data, although I would personally prefer to keep incomplete rows in separate
table and send them to the main one once data are complete;
- Columns where NULL always means "do not have a value" or "not
applicable" and amount of rows containing nulls does not go over 10% of
total number of rows (it's against basic relational principles but this is an
acceptable de-normalization step). Note that 10% is not a magic number and it
depends on the total number of rows. As number of rows goes up, the percentage
of "allowed null rows" should decrease in order to maintain
efficiency of the queries. Having tens
of thousands of additional "is null" operators or isnull(), coalesce()
or similar functions will always bring unnecessary load to the database server
(as well as additional application code);
- Columns representing hierarchy
structure could have null representing top of the hierarchy, although this
could cause problem if it could mean also "unknown" because, on
example, list of top managers defined as ManagerID is null
will contain people that are unassigned at the moment. A solution for it could
be a rule: "Top manager has his own id as ManagerID" In
that case null is avoided and logic to select top manager(s) is just shifted from Where ManagerID is Null to Where ManagerID=EmpID, which is not a big deal.
|
|
-
-
bjolling


- Joined on 06-08-2008
- Belgium
- Posts 739
|
Re: How to minimize usage of NULL in DB
Thanks for sharing the article.
niik:- Columns where NULL always means "do not have a value" or "not
applicable" and amount of rows containing nulls does not go over 10% of
total number of rows (it's against basic relational principles but this is an
acceptable de-normalization step). Note that 10% is not a magic number and it
depends on the total number of rows. As number of rows goes up, the percentage
of "allowed null rows" should decrease in order to maintain
efficiency of the queries. Having tens
of thousands of additional "is null" operators or isnull(), coalesce()
or similar functions will always bring unnecessary load to the database server
(as well as additional application code);
I don't feel too comfortable wih this section. Your queries don't care that only a certain percentage of rows can have columns where NULL could be "not yet entered" or "not applicable". If you allow this for just one row, you'll still have to write the convoluted version of your query. And if I hear someone giving advice on how to keep up the performance of something, I always expect to read about test results. It would've been nice if you had run some test cases and actually proved that for a certain amount of records there is indeed a threshold (like the 10% that you mention) where performance begins to drop.
 It's... Monkey Piston's Frying Circle!
|
|
-
-
bstorer


- Joined on 02-01-2007
- Alexandria, VA
- Posts 4,131
|
niik:If performance is not your
concern that's just fine, but you can not generalize based on this starting premise.
If you buy Ferrari you can't complain it consumes too much fuel, maintenance
fees are high and insurance is killing just because you use it once per week to
go to the supermarket. Relational databases are, like it or not, the most efficient
systems to store and retrieve data, at least today (excluding some special
cases). And there are bunch of them with different prices, characteristics,
support options etc. You should choose one that fits you. And there are still
other options that could be evaluated if performance is not an issue.
The criticism isn't of relational databases, but of SQL. To extend your analogy, I'm not complaining about my Ferrari; it's a beautiful machine. I'm complaining about the fact that I have to operate it using a trackball. It's needlessly difficult to control, regardless of how easy it is to understand. Still, the Ferrari is better than all the other notable cars because they all use the stupid trackball interface. I suppose I could take a bike, but then it's hard to carry groceries. I could walk, but that's awfully slow. At the end of the day, I'm forced to opt for a car, despite the trackball, because the alternatives are worse. But that doesn't mean I have to like the trackball. Don't fool yourself: the best tool readily available is not the same as the best tool. niik:The problem with "job
done" approach
I need to stop you right here: I've said nothing about having a "job done" approach. I've never advocated a strategy of getting the app out the door as quickly as we can. I'm talking about getting work done. "Work" here means real progress in converting a need for software into the actual application. That's not simply coding, either; it's the entire development process. In fact, a great deal of coding doesn't really qualify as doing work at all. For example, writing code to deal with SQL's tri-state logic. niik:But this also depends on
needs, as I said before. I would not spend my time developing state-of-the art
application when it doesn't need to be, and I'll use small city car to go to a shopping
mall, not Ferrari.
You're really wedded to this concept that I care about what car I'm driving, or whether I need a car at all. At some point I've already decided whether I need a car, and what kind of car I need. What I'm complaining about is that driving is already a hard task, and while I should be focusing on that truck swerving up ahead, I'm instead trying to get this goddamn trackball to make the car do what I need.
 These puppies are Nazis. They aren't being obvious about it, but I can tell.
|
|
-
-
niik


- Joined on 05-06-2009
- Posts 11
|
Re: How to minimize usage of NULL in DB
bjolling:I don't feel too comfortable wih this section.
Neither do I. I would like to delete this paragraph immediately, but in most application I've seen exist at least one table with a lot of columns where lot of columns could be null for whatever reason. Keeping all of them nullable was and is not a solution as well as making 50 or more additional tables just to enforce full normalization. For every new design I'm using "normalize first - de-normalize later" rule, but opportunities to start database design from scratch are rare unfortunately. Most of the time you have to deal with strong opposition from developers and managers, budget constraints and not enough time to perform all things that are really necessary. Then, you have to make compromise, like it or not. And compromise I suggest is what I said in my previous post. Performance related advices are tricky and have to be tested for every new case. Some of them have to be revised again and again for the same database (as data change and grow over time). And, yes, I did testing on select queries where filter criteria was defined over nullable column against the same data divided in two related tables. And I've got a bunch of different results in range of "one table is much better" to "two tables are much better" with all the gray levels in between. It doesn't depend only on percentage of rows with nulls, but on existence of index over that column (often columns don't have indexes because of faster insert/update operations), column statistic (aka index statistic), overall size and number of rows, and what is a primary operation over the table - select, insert or update (including the ratio between them).And, let's say - I had to come with some rule as a starting point - and I did it. I came with some numbers which were not correct but were the nearest approximation of my testing results. I'm not happy with that and if I have opportunity I rethink every of those decisions for introducing non mandatory columns and in some of the cases I'm not really assured I did a good job, whatever decision I made. But, that's life. bjolling: If you allow this for just one row, you'll still have to write the convoluted version of your query.
Sad, but true. Life is not fair. That's the reason I do not like non mandatory columns. That's te reason why I wrote the article, too.
|
|
-
-
niik


- Joined on 05-06-2009
- Posts 11
|
bstorer:to make the car do what I need Let's finish this discussion. Just explain me what's your proposal how to avoid ternary logic if null's are allowed? What Should be result of true and null false or null Solution could be to forbid nulls. I'd like that, but even with all mandatory columns you still could end up with null in outer joins. Let's forbid outer joins. Huh... How other, better than SQL, languages solve that problem? Whats happen in any other language when you try to do boolean comparison against variable with no assigned value (or has null if you prefer)? Do you want SQL to return you something like: "Execution canceled because attempt of comparison against null."? Of course not. You addvocate that null should be treated as NOVALUEASSIGNED. Fine with me. And you want to get all the people that have SSN different than '1234'? Great. From what group of the people? Just from people with SSN or from all of them? If you force boolen logic to have predefined answer to NULL state you don't allow developers to give meaning to NULL - you enforce meaning - one or another. And, as I could remember you were asking to have right to choose meaning, but it seems you want to choose meaning and to make all of us accept it as "the only truth". Amen.
|
|
-
-
bstorer


- Joined on 02-01-2007
- Alexandria, VA
- Posts 4,131
|
niik:Just explain me what's your proposal how to avoid ternary logic if null's are allowed? What Should be result of true and null false or null
What's 9 + pizza? What do you get when you concatenate a string to a man riding a unicycle? Those questions make as much sense as the ones you ask do in binary logic. The domain of values in boolean logic is {true, false}. I don't see null in that set, do you? niik:Solution could be to forbid nulls. I'd like that,
but even with all mandatory columns you still could end up with null in
outer joins. Let's forbid outer joins. Huh...
In my experience outer joins are frequently a way to do something that is more correctly done with multiple queries, but I'm sure others have found a good use for them. How about a simpler solution: don't forbid null. Just treat it as a indicator value, completely separate from any other value that type can hold. NULL == NULL but nothing else does, just as 'a' == 'a' and nothing else does. niik:How other, better than SQL, languages solve that problem? Hugh Darwen has a bold solution (see the handouts and lecture notes for "How to Handle Missing Information Without Using NULL"). It's fairly interesting, but perhaps a bit of overkill (Although distributed key constraints really ought to exist.). But you gotta admit, that's some normalized data, huh? I'm not sure how much of this is presently possible in any Tutorial D implementation. niik:Whats happen in any other language when you try to do boolean comparison against variable with no assigned value (or has null if you prefer)? Do you want SQL to return you something like: "Execution canceled because attempt of comparison against null."? Of course not.
'a' == NULL returns false, because 'a' is clearly not null. I don't see your problem with this. niik:You addvocate that null should be treated as NOVALUEASSIGNED. Fine with me. And you want to get all the people that have SSN different than '1234'? Great. From what group of the people? Just from people with SSN or from all of them?
SELECT * FROM people WHERE (ssn <> '1234') returns every row with a value other than '1234', including NULLs. It's what I asked for, it's what I get. If I don't want NULLs, I should be able to do something along the lines of SELECT * FROM people WHERE (ssn NOT IN {'1234', NULL}). niik:If you force boolen logic to have predefined answer to NULL state you don't allow developers to give meaning to NULL - you enforce meaning - one or another.
I'm enforcing no meaning. I'm leaving it up to the user to ascribe their own meaning to NULL. I'm just providing an optional symbol that exists within the domain of every type, but outside the value set of that type. Whether you take it to mean unknown or the color blue is completely up to you.
 These puppies are Nazis. They aren't being obvious about it, but I can tell.
|
|
-
-
niik


- Joined on 05-06-2009
- Posts 11
|
bstorer: niik:From what group of the people? Just from people with SSN or from all of them?
SELECT * FROM people WHERE (ssn <> '1234') returns every row with a value other than '1234', including NULLs. It's what I asked for, it's what I get. Huh, why I feel like talking with my son (teenager, age 17, thinks he knows everything)? Is it SSN<'1234' UNION SSN>'1234' same as SSN <> '1234' ?
In what result set rows with NULL belong? Is 9 bigger than 'pizza'? In fact, we agree about most of the things, at least according to your previous post. NULL is not value and can not be compared. Again:
bstorer:SELECT * FROM
people WHERE (ssn <> '1234') returns every row with a value other
than '1234', including NULLs. It's what I asked for, it's what I
get. It's vice versa, you have to include NULL rows if you want to use not equal, with UNION or with WHERE ssn is null or ssn<>'1234' Of course, it would hellp all of us if there is some more elagant way, something like SSN<>'1234' returns SSN<'1234' UNION SSN>'1234' and SSN != '1234' returns SSN<'1234' UNION SSN>'1234' UNION SSN is null If this is what you prefere to have, I would like to have it too. Maybe could cause confusion in the beginning, but in long terms or with better choosen operators it would be quite useful. About usfulness of NULLs in database, read my post 'How to minimize usage of NULL'. It's not complete and is a kind of compromise, but explains my point of view. Allowing nulls in database created probably more application problems then C pointers. I would not say pointers are wrong, esspecialy if your primary goal is performance. In the exactly same way I would not say NULL is useless, but it is dangerous and should be avoided as much as possible. And should be treated as NULL.
|
|
-
-
LoztInSpace


- Joined on 01-11-2006
- Melbourne
- Posts 120
|
niik:How other, better than SQL, languages solve that problem?
That's what I asked and nobody took up the challenge. No matter what your situation and why, C++, Java, C# or VB or SQL, if your variable might be null you can't use it like you can when it's not null.
bstorer:'a' == NULL returns false, because 'a' is clearly not null. I don't see your problem with this.
Null is not a value as such. Does a == something that is unknown ?. I don't and can't know. I don't see your problem with this.
Let me put it this way: Is your age the same as mine?
|
|
-
-
bstorer


- Joined on 02-01-2007
- Alexandria, VA
- Posts 4,131
|
niik:If this is what you prefere to have, I would like to have it too. Maybe could cause confusion in the beginning, but in long terms or with better choosen operators it would be quite useful.
I'm not going to address the entirety of your points, because they boil down to one thing: SQL, be it the standard version or mine, needs to be scrapped. Frankly, we ought to be able to work directly with the database programmatically instead of using an intermediate langauge anyway. niik:About usfulness of NULLs in database, read my post 'How to minimize usage of NULL'. It's not complete and is a kind of compromise, but explains my point of view.
I read it. You really should read Darwen's stuff I posted before. He gives some interesting ideas on how this could work.
 These puppies are Nazis. They aren't being obvious about it, but I can tell.
|
|
-
-
bstorer


- Joined on 02-01-2007
- Alexandria, VA
- Posts 4,131
|
LoztInSpace: bstorer:'a' == NULL returns false, because 'a' is clearly not null. I don't see your problem with this.
Null is not a value as such. Does a == something that is unknown ?. I don't and can't know. I don't see your problem with this.
My problem with this is that null is treated as if it were in a superposition, but with none of the advantages of such an arrangement, but with many of the logical inconsistencies it creates. Consider, for example, a query like "SELECT * FROM some_table WHERE (a = 0 or a = 1 or a = 2 or ..." wherein I enumerate every possible value for the type of column a. I won't get back the rows where a is null, despite the fact that they have to be one of those values. This is an outlandish example, though, so let's consider a simpler one: a = 0 or not a = 0. According to tri-state logic, this query doesn't give back nulls. That clause covers the entire domain of a's values. Those nulls are supposed to be unknown values, not impossible values. What I propose is something subtly different: instead null being an unknown value in the domain, it is completely outside the domain. Null becomes inapplicable to ordering and clauses that cover the domain. Granted, this isn't a perfect system, and inconsistencies are still possible. One of the big problems is that there has to be some way to handle ordering with nulls. There are a few different solutions, none of them ideal. The real solution, again, is to leave SQL behind and move on to something better.
 These puppies are Nazis. They aren't being obvious about it, but I can tell.
|
|
-
-
niik


- Joined on 05-06-2009
- Posts 11
|
bstorer:...
I know Hugh Darwen's work, as well as work of Data, Codd a lot of other theoreticians and practitioners of relational model. And I have my oppinion, which doesn't neccessary fits in one frame that fits all (as well as all others). The problem with you is, although you red some theory, you don't understand it. You just use it to show "your knowlege". I'll try to explain you how it looks like using relational theory (not SQL), once more. There is relation "people" that covers name, surname, date of birth - all mandatory fields. There is another relation with SSN numbers. It has less tuples because not every person has SSN. There's no nulls at all. There is relationship established between people and SSNs. There's still no nulls. Based on those two relations, two additional projections can be created: - people with SSN - people without SSN This could be done in different way too. You can have two relations: people with SSN and people without SSN, and two projections: people and SSNs - it doesn't matter. Projections and relations by relational theory must be interchangable without affecting nor data neither users of data. But this is not important. Important is that when you ask "I want list of people with SSN different than '1234' you are addresing relation or projection that contains only and only people with SSN. Why? Because tuple with null inside is not a tuple (check Darwen's work, or Codd or Data). You can not expect that people from other relation will appear. If you want all of them, you have to make union of two projections - People with SSN different than '1234' and people without SSN. There's no nulls. There's no ternary logic. There's no missinterpretations. You can do that in all existing RDBMSes. Where is the problem? It is in the fact that you are allowed to use nulls and you use them. And you do that even you are against them. Much worse is that you use them in a completly wrong manner and without understanding basic principles of relational theory. Computers are stupid. They do exactly what they are told to do. They can't read mind and they can't read between lines. They can not know there's difference in command "Give me the list of the people with SSN different than '1234'" based on who's asking. You or me. Or sameone else. They answer just following the logic: If you asked me to check people's SSNs, I'll check only relation representing people that has SSN. And that's the only right relation/projection/set to be checked. People without SSN does not belong here. They are "pizzas".Don't mix them with 9s. And, as you don't understand theory, your answer is - I need something better. There are flaws in implementations of relational model, especially in implemantations of SQL. But to say
bstorer: Frankly, we ought to be able to work directly with the database
programmatically instead of using an intermediate langauge anyway. just shows once more your deep not understanding of relational theory and Hugh Darwen's work. No one should be allowed to work directly with the database
programmatically, because in that case that's not database anymore. It's a file with bunch of data inside. If you want to do that way, you can. You just don't need database.
|
|
-
-
PJH


- Joined on 02-14-2007
- Newcastle, UK
- Posts 3,140
|
niik:Based on those two relations, two additional projections can be created: - people with SSN - people without SSN
Along with - People with more than more than one SSN, and - SSNs used by more than one person (Google 078-05-1120)
3 logicians go into a bar.; the barman says ‘Would you all like a drink?’. The first says 'I’m not sure', the second says 'I’m not sure', and the third says 'Yes'.
|
|
-
-
bstorer


- Joined on 02-01-2007
- Alexandria, VA
- Posts 4,131
|
niik: I know Hugh Darwen's work, as well as work of Data, Codd a lot of other theoreticians and practitioners of relational model. And I have my oppinion, which doesn't neccessary fits in one frame that fits all (as well as all others). The problem with you is, although you red some theory, you don't understand it. You just use it to show "your knowlege".
niik: You can do that in all existing RDBMSes.
I'm well aware of the theory, and I'm well aware that you can do that in existing systems. None of that excuses the manner in which SQL handles NULLs. That you don't need to use an aspect of a language doesn't defend said aspect from criticism. Further, I don't know where you came to the conclusion that my dislike of NULLs and my dislike of SQL are interchangable. NULLs are only a single complaint of many I have with SQL. By the way, his name is Date, not Data. niik:Where is the problem? It is in the fact that you are allowed to use nulls and you use them. And you do that even you are against them.
When did I say that I use them? I said I don't like them. You asked me how I'd fix them, I offered two solutions: a middle ground, revampling NULL so that it better dovetails with the way programming languages handle NULL, and one drastic but ideal, specifically moving to a language with no concept of NULL. niik:And, as you don't understand theory, your answer is - I need something better. There are flaws in implementations of relational model, especially in implemantations of SQL.
So why can't I say we need something better? We've moved away from ASM, we've moved away from C, we're moving (thank God) away from C++. Is it so much to suggest that we move away from SQL? niik:But to say
bstorer: Frankly, we ought to be able to work directly with the database
programmatically instead of using an intermediate langauge anyway. just shows once more your deep not understanding of relational theory and Hugh Darwen's work. No one should be allowed to work directly with the database
programmatically, because in that case that's not database anymore. It's a file with bunch of data inside. If you want to do that way, you can. You just don't need database.
There seems to be a misunderstanding on your part. I'm not advocating the violation of constraints, a lack of transactions, or the removal of privileges. I'm not proposing ad hoc access to the data with disregard for its structure. What I mean is that the constructs to manipulate relational data should exist in the programming languages. Instead of building my query into another language, it should be provided so as to fit the style and structure of the native language. LINQ is half-way to what I want. It still outputs SQL to be interpreted by the RDBMS, which is the step I want eliminated.
 These puppies are Nazis. They aren't being obvious about it, but I can tell.
|
|
-
-
-
bstorer


- Joined on 02-01-2007
- Alexandria, VA
- Posts 4,131
|
Jimmy Savile:The handling of nulls in databases is the first thing most people are taught. It beggars belief that some so called professional programmers do not understand how to deal with them!
Nobody is complaining that they don't know how to deal with them, but that having to deal with them at all is tedious and potentially error-prone. Jimmy Savile:I always get the impression that programmers that do not like sql and nulls are straight out of school and have not experienced proper business systems, if you want to earn real money from this career you have to deal with real rdms'es, SQL server,Oracle and the like, null handling is one of the simplest things you are going to have to deal with.
I always get the impression that programmers who just accept their lot and have no interest in questioning anything are little more than glorified code monkeys. It boggles my mind that you're so accepting of the numerous flaws that you have just acknowledged. If I can't rely on you to think critically about your tools, how can I rely on you to think critically about the task at hand? Yes, I'm pigeonholing, but you started it. Jimmy Savile:SQL might not be perfect but it is loads better than doing the whole data storage, retrieval thing yourself.
This is a false dichotomy. There exists a third alternative: creating something better. I honestly don't understand what makes you think SQL is sacred. We've been through a plethora of programming languages in SQL's lifetime, each an attempt to improve upon the past. Why is it proscribed to suggest that we think about replacing SQL with something better? Jimmy Savile:Its consistent and widely understood by nearly all professional developers, the real wtf is threads like this where people open display their lack of professional skill in front of their piers!
This is a moronic statement. QBASIC is consistent and widely understood, but that doesn't justify its use. I understand that there is nothing better, but I don't have to like it. Jimmy Savile:If you do not like SQL create data access layers, but for gods sake let someone that actually understands databases create them!
I'm sick of being accused of not understanding databases. If I didn't understand them, I'd be hard-pressed to recognize their flaws, wouldn't I? You see the problems, but you're too content to change anything. I'm glad I don't have to work with you; that complacency disgusts me.
 These puppies are Nazis. They aren't being obvious about it, but I can tell.
|
|
-
-
dhromed


- Joined on 04-13-2005
- Dutchland
- Posts 10,115
|
SQL is a strange language. I mean, if you've never seen SQL, and only have normal programming experience, and find yourself in front of a significant query (with a few joins, orders, subselects, ins, groups, havings) you cannot flawlessly predict the parse/process sequence of the elements in a significant query. One has to learn by heart that any query is executed in the almost -- but not quite -- opposite order from the syntax. Having it look like human-ish language sentences was a mistake. Before you know the order of every clause, many queries seem ambiguous. I've always found SQL cumbersome to use, but strangely delightful as well, as it appealed to my sense data control: you are like a god creating meaningful datasets from raw information. Ahem.
SQL, be it the standard version or mine, needs to be scrapped.
Frankly, we ought to be able to work directly with the database
programmatically instead of using an intermediate langauge anyway.
Agreed. It seems redundant that I'm writing code that writes code that performs an operation. That middle step can, and should, be eliminated. The same goes for HTML, really. I don't really feel I'm getting things done when I'm building strings of HTML. I should be able to have my business code construct a tree of Things, and have a separate formatter turn it into ML. I once made a small half-assed attempt to convert SQL syntax for common queries into a proper program structure, but as it was half-assed and my time was short, and my knowledge of SQL is not big enough (it's a pretty expansive language), it didn't go very far. Same for HTML output. Both attempts are filed away. Perhaps someday, something will come of it. ===
As far as the NULL discussion goes: I'm in the NULL = 'empty' camp, where we can say with absolute certainty that 'a' is not in {'b', null}. The fact that in SQL this is not the case is certainly explainable to new programmers, and usable in many practical circumstances, but that doesn't mean it really makes sense in a broader perspective. It's an odd presupposition on the part of its developers, because it adds arbitrary subjectivity to the meaning of NULL. The fact that SQL appears to have no effective constructs to deal with it just adds insult to injury.
 boomzilla: I think the obvious answer is for everyone to just stop programming.
|
|
Page 1 of 1 (43 items)
|
|
|