Apr 30 2008
Is “One Context Per Fact Table” a True Statement?
The other day I was talking to someone about contexts in a universe and they offered the typical response. You probably know the one… it says that you should set up one context per fact table. If you read many posts on BOB then you’ll see that phrase quoted quite a bit. The question is: Is this the truth, the whole truth, and nothing but the truth?
No, it is not.
Contexts are designed to show which joins can be used together. If you have a multi-fact datawarehouse then almost certainly your fact tables contain different information, and therefore should not be linked (joined) together. Thus, you separate them out with contexts. Unfortunately that’s an incomplete explanation at best, and misleading at worst. The implication that some folks might take away from this is that if they don’t have “fact” tables then they don’t need contexts.
That is a very dangerous assumption. To see why, let’s look at some of the math behind some relational database concepts.
Cartesian Products
Wikipedia defines a cartesian product as follows:
Specifically, the Cartesian product of two sets X (for example the points on an x-axis) and Y (for example the points on a y-axis), denoted X × Y, is the set of all possible ordered pairs whose first component is a member of X and whose second component is a member of Y (e.g. the whole of the x-y plane):
For example, the Cartesian product of the thirteen-element set of standard playing card ranks {Ace, King, Queen, Jack, 10, 9, 8, 7, 6, 5, 4, 3, 2} and the four-element set of card suits {â™ , ♥, ♦, ♣} is the 52-element set of playing cards {(Ace, â™ ), (King, â™ ), …, (2, â™ ), (Ace, ♥), …, (3, ♣), (2, ♣)}. The Cartesian product has 52 elements because that is the product of 13 times 4.
Hm. Well. Ok. What does that mean?
It essentially means that you match every element from one set with every element of the next set, resulting in a dramatic increase in the number of combinations or tuples. A tuple in database terms is a result row that can be obtained via a query.
If you replace the word “set” with “table” and the word “element” with “row” you get an application of a Cartesian Product that is associated with database queries. In this case you get every row from the first table related (joined) to every row in a second table. This is generally not a desirable result. 🙂
What About Partial Cartesian Products?
The SQL generator used by Business Objects allows you to check for and prevent (or warn against) Cartesian Product queries. This means that a user will likely never end up with a true Cartesian Product without at least being warned about it. But the SQL generator will not detect what I will call a “partial” product without help. Here’s an example.
Note: This simple ERD is from a database I use for a number of presentations, and eventually you’ll be able to download it from the site.
In this diagram you can see relationships from customers
to orders
, from orders
to lines
, from lines
to items
, and from items
to prices
. Each item can have multiple prices over time. Each item can also appear on more than one order line. A customer can have many orders, which can have many lines, and so on.
There are no loops in this diagram. There are no fact tables. Do I need any contexts?
Presenting the Correct Picture
The answer is “Yes”, I need a context in a universe that contains this simple model. There is a relationship between items
and prices
, and another relationship between items
and lines
. But those two relationships are separate and distinct from each other and as a result cannot be used together.
Let’s assume for a moment that there is an item with item_id
of 100. Let’s assume that the items has had only one price throughout its history, and that it’s $10. Finally, let’s assume that this item has appeared on 10 different order lines. If I run this SQL I will get the correct result:
SELECT items.item_name , prices.price , lines.qty FROM items i , prices p , lines l WHERE items.item_id = prices.item_id AND items.item_id = lines.item_id
I should get ten rows of data back from this query, and I will.
The problem is that if the item changes prices halfway through my ten orders I will get two results from the price table. Those two rows will be combined with ten rows from the lines table because the relationship through the items table is a partial Cartesian product. 😯 This error will not be detected or prevented by the SQL generator, and the user will be presented with incorrect results. There will be two price values for each and every item row, with a net of 2 * 10 or 20 result rows instead of the expected 10. I don’t get a full Cartesian product between every line and every price because of the joins through the items table, but I still have an inflated set of rows.
I might try to solve this by creating a join from lines
to prices
and matching by price but that has then created a loop from lines
to prices
to items
and back to lines
. And frankly it’s not the correct thing to do.
Fixing The Partial Cartesian Product
The proper solution is to isolate the incompatible relationships. The tool that we have to do that is a context. Since the item table has two functions in this universe I will separate out those two functions using two contexts:
- First Context:
Customers – Orders – Lines – Items - Second Context:
Items – Prices
The first context is designed to let me report on order history. The second context is designed to let me report on item price history. The pricing history is a completely separate subject area from order history, and by splitting them out with contexts I can ensure that a user will never be provided with an incorrect or misleading result set.
That should always be the goal of a universe designer. 🙂
Conclusion
I don’t have any fact tables in the ERD I showed above. I don’t have any loops. Yet despite this I need to set up contexts in order to generate the correct query results from my universe. Now here’s the fun part: you could provide me the exact same set of tables but number them one through five rather than give me the table name and some indication as to the purpose. If you do that, I can still set up the universe correctly, even though I don’t have a clue what is stored in each table.
How is that possible?
It’s all driven by the relationships between the tables. As long I know the type of relationship (one – one, one – many, many – many) I can create a correct universe. That’s the subject for my next post on Contexts where I will explain how it works and why the name “chasm trap” is appropriate. 😎
Related Links
When I was reading the blog then until the Conclusion part I screamed for myself “Hey Dave, you have a chasm trap there so you do need 2 contexts”.
Only then in the last paragraph I realized that you had written it this way on purpose and you would continue with another blog dedicated to chasm trap 🙂 Thanks for this “entertaining confusion” 🙂
You’re welcome. 🙂
Dave,
Its excellent example. When will the next will be out ? You the busy man…. of course.
Thanks for sharing.
Ha, Ahmar, thanks for your comment. I don’t publish a schedule because I never quite know when things are going to happen. 🙂 For example, I started writing this subject matter (contexts and fact tables) in March. I have started the next post, but I don’t know when it will be out.
There is an RSS feed feature for the blog software I use. My signature on BOB is automatically updated when I post a new entry as well. So hopefully one (or both) of those options will prove to be useful to you.
Dave,
This is an excellent article. I recently was at the receiving end of it, having to design a WebIntelligence report based on a universe with chasm traps as I’ve posted on my blog.
Hope you’ll include more of the universe design posts in your blog!
Dave,
This is an great example its really helpful to me
Hi Dave.
I love the many articles on your site. I was a little confused by something you said in this one though.
You say that there are no fact tables in this schema. I guess that depends on how you define fact table. I would have normally though of “lines” as a fact table and “prices” as a fact table.
In my mental model, whenever I see a table that has all outwardly-directed many-to-one relationships, then it’s a fact table.
(Those aren’t the only fact tables in my notion of what a fact table is, but “tables with all outwardly-directed many-to-one relationships” is a pretty big chunk of all fact tables. Occasionally I’ll see models where an individual table can play the role of a fact table or a dimension table, depending on how it’s used. Such dual-purpose tables usually have mostly outwardly-directed many-to-one relationships, but a small number of outwardly-directed one-to-many relationships.)
I’m curious to know what your definition of “fact table” is? I ask because, honestly, I don’t think I’ve ever come up with a satisfactory, rigorous definition of “fact table” myself, and I’m always on the lookout for better understanding.
Hi Joel, and welcome. First, thanks for your question. Just so you know, I approved the comment but removed your URL because it didn’t have anything to do with Business Intelligence. I hope you understand the reasoning.
To answer the question, the term “Fact table” always seems to belong to a datamart or warehouse schema rather than a normalized schema, at least that’s how I differentiate it in my head. That’s probably because nobody used the term before those schemas were proposed back during the early days. 🙂 Another definition that I’ve seen is any table with only keys and measures is a fact, but that also seems to be derived from how fact tables are used in a star schema as well.
Your definition of “outward joins” also works for the same reason. 🙂
In the schema I provided for this post, I don’t consider the items table a fact table primarily because I don’t use that term in a transactional model but that’s mostly habit rather than from any strict definition. The item table includes keys (item id, order id), measures (quantity sold), but also dimension (informational) values like price and (if partial shipments are allowed) ship date and so on.
After writing all of this, I checked to see what Wikipedia has, and their article also associates the term “fact table” with dimensional modeling. In fact (heh, pun) it echos quite a few of the comments I had already made before going to read it. 🙂
Hi Dave,
I came across your blogs through the BoB forums and I’ve found them to be a good and helpfull read.
In the last part of the blog above you mentioned you we’re going to write more about how you can create a correct universe by knowing relations and about Chasm Traps. I know you’ve written this post a while ago but I’ve not been able to find this new blog you mentioned. Could you please tell where this blog can be found?
I have several dozen posts waiting in “draft mode” to be completed. Part of what drives me to complete them is when people come back asking me where the next post is. 🙂 Thanks for the nudge.
Two quick questions…
Have you posted your article on Chasm Traps?
A minor quibble (in the interest of clarity)…
Your use of arrows is opposite the convention used in Universe Designer, which uses a standard functional meaning where the arrow points from the table containing the many foreign keys to the table containing the single matching primary key. Why do you use the opposite convention? For example order points to lines whereas Universe Designer would show lines pointing to order.
I always get that backwards because it never made sense to me. 🙂 I should have stuck with the crow’s foot graphic.
I have talked about chasm traps in some of my presentations that you can download. Try the “ZEN and the ART of Universe Design” as one example.
The crows foot graphic is like the arrow in that “arrow” goes in the same direction (away from the many to the one). One way to remember that the arrow goes from the many to the one is to think of it like a mathematical function f(x) = y, i.e. f(x) –> y. Many x can go to one y but you never have the same x going to more than one y.
I checked your Zen reference download. I am new to Universes and am looking for more formal definitions than seem available. Although your presentation is not very formalized, it provides a number of useful guidelines that illustrate many of the major concepts well. Thanks.
Dave,
May I know please whether you have posted on “creating a correct Universe if the type of relationship (one – one, one – many, many – many)” is known? I can create a correct universe. Please let me know the post date or the URL forthis topic. I need it urgently as I need to create a relation universe.
thanks,
uma
I don’t remember posting about cardinalities, specifically. But I have talked about how they can be used to guide loop resolution in a number of presentations.
“It’s all driven by the relationships between the tables. As long I know the type of relationship (one – one, one – many, many – many) I can create a correct universe. ”
It will be fantastic… it is worth to wait for 6 years and more… waiting for it’s coming