May 15 2009
Why Can’t I Average In My Universe?
A few months ago I wrote a post about the new Designer feature “Delegated Measures” and suggested it could be used to avoid issues with average projections. Today on BOB I answered a question about why this is important (why averages should not be projected within the universe) and decided I should capture a more thorough example here on my blog.
Problem Statement
I have a database with sales transactions. Users want to be able to create reports using averages and percentages. Can I put these objects into the universe and still guarantee the correct results?
Average Formulas
There are several generally accepted ways to calculate an average. One of the following forms will probably work in most cases:
sum(x) / count(x)
sum(x) / sum(y)
sum(x) / count(y)
In each of these cases there is an aggregate applied to the numerator and another (potentially the same) aggregate applied to the denominator. Once the aggregation is done the division takes place. Consider a transactional database used to take orders from customers. I might want to know the average size of an order (total dollars divided by a count of orders) which seems like a simple calculation. But does it work?
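As a rough sketch, the SELECT clause of such a universe object could be written as follows, where the ORDERS table and its columns are hypothetical names (one row per order) used only for illustration:
sum(ORDERS.SALES_AMOUNT) / count(ORDERS.ORDER_ID)
Both aggregates run on the database, and the division takes place only after they are complete.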
How Projection Works
When a universe designer creates a measure object there are two places to set up an aggregation process. The first and probably most obvious is in the SQL select clause. The second is on the properties tab and is called the Projection function. The SQL clause happens on the database, and the projection happens on the report. When a user drags elements on to (or off of) the report the measures will adjust (project) based on this secondary function.
With a total measure this process works quite well. If I create a query containing Region, Sales Year, Sales Month, and Total Sales I can project the sales amount to any combination of those dimension values without seeing an incorrect result. A more technical way to put this is to say that a “sum” operation is associative: the grouping of the operands doesn’t change the result. You will get the same answer if you do any of the following operations:
1+2+3+4+5
(1+2)+(3+4)+5
((1+2+3)+(4+5))
Any of those expressions will give you the same answer: 15. Unfortunately, when you mix operations (sums and divisions) the same is no longer true: (1+2)/(3+4) is not the same as 1+(2/3)+4. This is part of what causes problems for average projections, but it’s not the full story.
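A throwaway query makes the contrast concrete (this runs as-is on PostgreSQL, SQL Server, or MySQL; Oracle needs FROM DUAL added):
select 1+2+3+4+5      as sum_one_way,     -- 15
       (1+2)+(3+4)+5  as sum_regrouped,   -- still 15: regrouping a sum is harmless
       (1.0+2)/(3+4)  as mixed_one_way,   -- 0.4286
       1+(2.0/3)+4    as mixed_regrouped  -- 5.6667: regrouping across a division is not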
Aggregation Masks Source Data
Suppose I gave you a number, a nice round one like sales revenue of $100,000. Where did that number come from? How many possible ways are there for me to generate that value? It could have been one single order (at $100,000) or it could be the sum of 100 different orders at $100 each, or 10 orders for $10,000, or… there are an infinite number of ways that I can generate that number. If all I am concerned about is the total, I don’t really care. But averages require me to know more about the end result; I have to know two values rather than just one. Once the aggregation is done, I’ve lost the source data and cannot go any further.
Let me put some actual numbers in place to prove my point.
In January Widgets-R-Us had 1,000 orders for $100 each, so the total revenue for the month was $100,000. In February all of our production was tied up in one very large order, for a single client, for $100,000. This means my total revenue for the two-month period was $200,000. That number is interesting, but it doesn’t tell the full story: the two months had very different sales patterns, and I would like to see that. I can, by calculating the average order size per month.
In my universe I can create an object with this formula:
sum(sales) / count(orders)
This object is called Average Order Size and it has been defined with a projection function of Average. I also have an object called Total Sales with a projection function of Sum. With these objects along with Sales Month on a report the resulting numbers should come out as follows:
Month    Total Sales    Avg Order Size
Jan      $100,000       $100
Feb      $100,000       $100,000
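Behind the scenes, the SQL generated for that report block would look something like this sketch (again using the hypothetical ORDERS table with one row per order):
select SALES_MONTH,
       sum(SALES_AMOUNT)                   as TOTAL_SALES,     -- projection: Sum
       sum(SALES_AMOUNT) / count(ORDER_ID) as AVG_ORDER_SIZE   -- projection: Average
from   ORDERS
group by SALES_MONTH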
However, the resulting answer will be correct only for the initial grain of the question. If I (or any report user) were to drag the Sales Month object off of the report, here’s what I would get:
Total Sales    Avg Order Size
$200,000       $50,050
What happened? The total sales number projected correctly: I did have $200,000 in sales for the period shown on my report. The average order size, however, is certainly not $50,050. The average projection can only work with the data it has, and the source data was never pulled down from the database; only the aggregate values are on the report. Since there are two values on the report, the average is calculated as (100,000 + 100) / 2, and clearly that is wrong. There are far more orders (a much higher count) at $100 than at $100,000, so those values should carry more weight. They don’t, because the proper information isn’t available. This is how averages go “wrong” on reports, and the same thing can happen to percentages or any other calculation that involves a division operation on summed values.
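To see the failure in isolation, here is a self-contained query that re-aggregates the two month-level rows exactly the way the report projection does (the values table constructor below works on PostgreSQL and SQL Server; adapt for other databases):
select sum(TOTAL_SALES)    as TOTAL_SALES,    -- 200,000: sums re-aggregate safely
       avg(AVG_ORDER_SIZE) as AVG_ORDER       -- (100,000 + 100) / 2 = 50,050: wrong
from (values ('Jan', 100000.0, 100.0),
             ('Feb', 100000.0, 100000.0)
     ) as T (SALES_MONTH, TOTAL_SALES, AVG_ORDER_SIZE)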
How Do We Fix This?
There are two ways to fix this that are currently available, and a third way that is often talked about as an awesome new feature for universe designers. (hint hint to Business Objects developers, we’ve been talking about this for a very long time… 😉 )
The first (and oldest) solution is to provide the components of the average calculation (the sum and the count) as objects and let the report developer do the division. There is nothing wrong with summing a sum, and there is nothing wrong with summing a count. Once I perform these report-side calculations on the original components I can do the division and get the correct answer. This allows me to ensure the answer is always correct, but the cost is that I have to create (and recreate) report variables, often on more than one document. In my example above I would return the total sales (already on the report) and a count of orders (a new object), and the average would be calculated as sum(Total Sales) / sum(Order Count). The math behind the projected average order size (when I remove the month from the report) would then become (100,000 + 100,000) / (1,000 + 1), or 200,000 / 1,001, and the final answer is $199.80, a number that is much more believable.
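The same self-contained style shows the fix: carry both components forward and divide only after each has been summed (column names hypothetical, counts as in the example above):
select sum(TOTAL_SALES) / sum(ORDER_COUNT) as AVG_ORDER_SIZE  -- 200,000 / 1,001 = 199.80
from (values ('Jan', 100000.0, 1000),
             ('Feb', 100000.0, 1)
     ) as T (SALES_MONTH, TOTAL_SALES, ORDER_COUNT)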
The second option is to force the query to rerun (go back to the source data and recalculate) using the Database Delegated feature covered in my previous post. This also allows me to generate the correct answer, but the cost here is I have to be on the current version and I have to execute a new database query every time a user manipulates the report. Depending on the response time for the query, this may or may not be a viable option.
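Conceptually, a delegated measure simply regenerates the aggregate query at whatever grain the report block currently displays. With Sales Month removed, the new SQL (over the same hypothetical ORDERS table) collapses to a single group, so the division once again happens after the sums:
select sum(SALES_AMOUNT) / count(ORDER_ID) as AVG_ORDER_SIZE  -- 199.80
from   ORDERS
-- no group by: the whole two-month period is one group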
The third and currently unavailable option would be to allow a universe designer to create report variable objects in the universe. These objects would not be part of the query, but could be included in the universe and dropped onto a report. This way the universe designer could create the required formulas once and reuse them on multiple reports. Unfortunately, we don’t have this feature, but I am still hopeful. 🙂
Conclusion
In a nutshell, averages, percentages, and any other calculation that involves dividing projected values should be done on the report rather than in the universe, unless you plan to use the delegated feature that recently became available. The math just doesn’t work with a standard projection. The one case I can think of right now where I have used Average as a projection function was in a statistics universe where the sample size was always the same.
In my example above we had 1,000 sample data points for January and only one for February. If both months had exactly the same number of orders, then the average projection would have been correct. In most business scenarios this will not be the case, therefore the average projection function should rarely be used.
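A one-line check shows why equal sample sizes are the special case that works: with n orders in each of two months totaling a and b dollars, the average of the monthly averages is (a/n + b/n) / 2 = (a + b) / (2n), which is exactly the overall average. As soon as the counts differ, that identity breaks.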
Thanks for this post, it’s good to know that I am not alone in dealing with this issue on almost every project.
I have often questioned why Business Objects has not taken on this issue and solved it at the Designer level. I am not familiar enough with other products to know if they handle this any better, but I have often suggested the creation of a “report variable object” in Designer as this seems like the most efficient solution.
http://www.forumtopics.com/busobj/viewtopic.php?t=123342
I have been after this feature (I use the term “2-phase calculation in the universe”, or global report variables) for such a long time. Although “database delegated measures” (a.k.a. smart measures) are a step in the right direction, they fall amazingly short 🙁
It would be nice to see some really new functionality and features added to Universe Designer, but at this point I believe a lot of SAP’s resources will go into Xcelsius, the integration of the BusinessObjects tools with SAP BW, etc.
Hi,
Unfortunately none of the solutions above works for Polestar (Explorer). You cannot control the calculations the way you can in Crystal, and Polestar ignores Database Delegated measures.
-Mike
We just turn off projection for average or percentage objects and tell the users to run the report at the level they need. If they do happen to drop a column out of the report, the measures stay at the former grain, which makes the report obviously in need of attention. This is a better alternative than silently getting an incorrect projection. We are exploring cubes to address some user sets that have a high need for random slicing and dicing.
-Chris
Good point, Chris, I should have mentioned that setting the projection function to “None” on average measures does just that.
How about using the avg() database function in the object definition and projecting it as Sum? Any suggestions?
Hi Dave,
We have a tricky problem regarding percentage calculation in Explorer. The goal is to present a ratio as a percentage in Explorer. In IDT we calculated the ratio, applied a PERCENT (%) format, and set the projection to Sum. When viewing the values in Explorer, the percentages of each group are summed up and give a nonsense value, as expected. However, if we instead calculate the ratio in Explorer we simply get ratios, e.g. 0.84, which is not appealing for the C-level audience who are supposed to be the end users. All they want to see is percentages, e.g. 84%. Is there any clever work-around for that?
As you have probably observed, you can create only basic calculations in Explorer, and they’re limited to two levels. You could in theory do A / B * 100 to get from 0.84 to 84 (still without the percent sign), but you can’t do the third operation. What you could do is create a scaled version of the numerator, generated as A * 100 during the build process. That would allow you to divide it by B and get 84, but the exposure is that the new object is now larger by a factor of 100. Which is probably worse. 🙂 As of today I don’t have a work-around to share.
Hi Dave,
Really great blog. I was stuck here: all my measures were showing correct values except the average measures. In the beginning I thought it was a fault on the HANA side (as my project back end is HANA), but I was getting correct values from HANA to IDT. When I used the average measures in Webi they showed irrelevant values, so we created those average measures at the report level and they give sensible values. Thanks for the blog, and I am waiting for a solution to this kind of scenario in the universe.
I will keep on reading your valuable blog.
-Mathi