Jul 06 2009
SQL Aggregate Versus Universe Projection
Measure objects are – in my opinion – the most critical objects in your universe. Without them you can only make lists. While lists might be good for a grocery store or library, they don’t do much to help someone run their business. Measures give your universe purpose. They also are the source of most of the design challenges like fan or chasm traps.
In this post I’m going to go back to a basic concept and talk about SQL aggregation and universe projection and how they work together.
SQL Aggregation
An aggregate function is one that operates on a group of rows rather than a single row. Some SQL aggregate functions include sum(), count(), and max(). When you use an aggregate function you also have to use a GROUP BY clause to pull rows together. If the request is for Resort, Year, and a sum of the number of guests, then all of the rows are first grouped by the resort name and the year of the invoice and then the matching rows are summed up. This step happens on the database server.
Here is the code used in the Island Resorts universe for the Number of Guests object:
sum(Invoice_Line.nb_guests)
Here is the SQL generated for a query that includes Resort, Year, and Number of Guests:
SELECT Resort.resort, 'FY'+Format(Sales.invoice_date,'YYYY'), sum(Invoice_Line.nb_guests) FROM Resort, Sales, Invoice_Line, Service_Line, Service WHERE ( Sales.inv_id=Invoice_Line.inv_id ) AND ( Invoice_Line.service_id=Service.service_id ) AND ( Resort.resort_id=Service_Line.resort_id ) AND ( Service.sl_id=Service_Line.sl_id ) AND ( Service_Line.service_line = 'Accommodation' ) GROUP BY Resort.resort, 'FY'+Format(Sales.invoice_date,'YYYY')
Note: the condition on Accommodation is automatically applied for this object. It is not required for the SQL to be correct but is required for a business rule. The GROUP BY clause, however, is required for the SQL to be valid. The good news is that universe designers do not have to anticipate all of the potential combinations of objects a user might put together. This clause is created automatically by the query tools.
Measure Projection
Measures can also include a Projection setting, as shown in this screen shot:
Note that there are several projection functions available, including None. What is the purpose of the projection function?
When a user is viewing a report they might see this:
From there they might remove the Year object in order to get a total rolled up by Resort. Once the Year is removed the block on the report dynamically rolls up and presents this information:
That is projection in action. The function selected on the drop-down box for the measure properties is used by the report client to know which summary function to apply.
How do these functions work together?
Is Aggregation Required?
I have seen universes where the designer did not use any SQL aggregation functions at all. Instead they assigned projection functions only. With no aggregation the database returns the rows as they are extracted. The impact? The report client gets a lot more rows, as shown here:
The first execution of this query returned only nine rows. There are three resorts and three years of data so there are nine combinations, resulting in nine rows of data. The second execution did not have any database aggregation so 324 rows were returned instead. The report looks the same, but it has to deal with a lot more data. You can imagine what the performance would be like if there were several zeros after either (or both) of the numbers from this example.
Is Projection Required?
If the universe does have aggregation, does it have to include projection as well?
Without a projection function, the measures will no longer roll up after an object is removed from the block. Here is what happens to the same report when the Year object is removed; see how the resort names still appear more than once? And the number of guests does not get rolled up? This is what happens when the projection function of “None” is used.
Aggregation and Projection Work Together
Every measure object should have an aggregation function. (If a universe designer can’t figure out which aggregation function makes sense, then the object should probably be something other than a measure.) Each aggregation operation typically gets combined with an appropriate projection function as well. Most of the aggregation + projection function pairs make sense, but there are a few that might require a bit more explanation. First, here is the chart that shows the typical matches.
SQL Aggregate | Client Projection |
---|---|
Sum | Sum |
Min | Min |
Max | Max |
Count | Sum |
Avg | None or Average |
The first three entries are easy to see; a sum from the database is projected using a sum on the client. The settings for counts and averages might require a bit more explanation.
Counting
I created a new object in Island Resorts called Invoice Count with the SQL code count(Sales.inv_id)
and a projection function of Count. This is wrong, and I will show you why. 🙂 Here is a query with Country of Origin, Region, and Invoice Count as results.
Notice that in the raw data I do get a count of invoices by region and country. However, here’s what the block looks like.
The projection function of “count” is counting the number of rows for each combination. Oddly enough, there is one row of each. 🙂 In order to get the correct results, I need to use the Sum projection function for my count, which gives me the following corrected data.
Averages
I covered averages at some length in a prior blog post so I will skip covering the same ground here. To summarize (in case you don’t want to read the link) averages cannot be projected because the math doesn’t work. The only exception (where you can average an average) is when the sample size is exactly the same for all values.
Conclusion
Every measure object should have an aggregation function in order to group rows on the database and return a smaller dataset to the client. The projection function should coordinate with the SQL aggregation function in order to support “slice and dice” operations on the client.
Just three notes:
– For averages the projection should be set to “database delegated” (a.k.a smart measure)
– COUNT DISTINCT would also be a candiadte for “database delegated” projection.
– Semi-additive measures, or better: measures that require one kind of aggreation across one dimension and another type of aggregation on a different dimension, etc. cannot be implemented in an easy way at all using universe designer. One should use OLAP for such scenarios.
Thanks for the input, Andreas. You raise some good points.
Database delegated can certainly be an option, but isn’t available in earlier releases. And I would also suggest that the volume of data returned to the report also needs to be considered. Local calculations might be faster than rerunning a database query in cases of higher volume.
Hi David, By BOE roll Designer has this privillage to do. A report developer who has no access to universe by business process still this can be done by creating variables.For example we had a count in result dataset needs to be displayed with higher hierachy level, say state wise head count should be displayed to country level (result data set is ContryofOrigin,Region,InvoiceCount and BO Report should be [Country of Origin],[Inoice Count] ),To manage this in report level create a variable [Var.InoiceCount] has formulae =Sum([Inoice Count]). Now the Report uses this variable ([Country of Origin],[Var.InoiceCount]), This will work in report as you do in universe level
great post Dave, I find this is often something that is misunderstood by many Business Objects developers. the number of universes I’ve seen with no aggregation functions in the Select clause of measures are too many to count!
– Josh
Vajravelu, welcome and thank you for the comment. What you say is certainly true but it’s far from optimal in many cases. The universe design is supposed to ensure consistent results. If each report writer is required to reinvent the wheel (so to speak) and create variables for sums and counts, there is a chance for inconsistent results. In addition to that, without a SQL aggregate, the report is returning far more data than is required.
Josh, measures are the second thing I look at when called upon to review a universe. 🙂 Does anyone care to guess what the first (and most important, in my opinion) might be?
Hi Dave, heartily agree with this post. It is a timely reminder of the pitfalls of a poorly designed universe. I believe those universes without properly aggregated measures is sometimes done to successfully circumvent potential SQL traps and forces all the aggregation (and intelligence) onto the face of the reports. Indeed one of the real issues with not building universe measures is that you once you have gone down this road, you are stuck with. Web Intelligence queries do not react very kindly if you try to combine these “non-aggregated” measures with measures that attempt to something a little smarter eg the use of CASE statements in conjunction with @PROMPT to generate Year to Date and Month to Date figures
Hi Dave, I think the first thing I’d look for would be multiple unattached schemas (which would result in cartesian products) – I’ve seen that many times as well. 🙂
Hey Dave thanks for the posts…Aggregate Vs Universe Projection and the Database Delegated Measure.
BOth are just AMAZING.
Increase your frequency of posting topics. :p
Regards,
Shiva
Thanks Dave as usual your contributions to the community are stellar. I first check the joins for loops, fan/chasm traps, then I ensure measures are properly defined. How about you?
Dave,
I am a regular visitor to your blog!..
I disagree with this article of using database aggregation in measure object.
If the report require to drill down from top most aggregation to lowest grain level, then how to use the database aggregate function..?
Thus it is based on the requirement of the report, we should decide.
Thanks!
regards,
ilangovan
Hi, ilangovan, thanks for your comment. I am not sure I follow your question. Are you suggesting that in order to drill on a document that the database aggregation has to be turned off? If that is the case, then that’s not correct. When you drill on a report and set up a hierarchy the lowest level of detail is already retrieved on the report. Then the data is projected to the top level of the hierarchy until you drill. But imagine if you had millions of rows in your fact table, you would never want to return that level of detail to a report. Ever. 🙂
hi Dave,
I have a situation where in my user is trying to query an fully additive fact(revenue) with a semi Additive fact(Asset Value).
I have both stored in my database on a monthly level in the same table.
When he queries for an entire year I would like him to see the revnue aggregated for the entire year while the Asset value should be reported as of the last month of the year.
I have been looking into forums trying to find a solution.
Can you help me..
Hello Dave,
I found this post to be very informative – especially the projection part.
I had an issue with the aggregation :
If we have a group of employees working a certain number of hours
Using aggregation – number of hours will always be summed . But what if I need to work on the level on an employee and get the employees that have worked 0 hours for example.
The query gets per department – the number of employees who have worked 0 hours
But of course due to using the SUM ; it groups by department and gets the department that has working hours =0 ; is there any way to apply the condition on the employee /row level instead of the group level ?
Hi, thanks for the question. I don’t have time to write a long answer (and this is actually a blog post I have been meaning to write) but you can solve this with a sub-query. Your sub-query gets a list of employees with zero hours, and then you can get the departments and the total employee counts for employees on that list. I hope this is enough to get you started.
Hi Dave. Can you advise what to do with numeric objects that are non-aggregable? I have numerous objects that are different costs and they shouldn’t be aggregated. So at first I left them as dimensions. But because they don’t form part of the star schema as such, it causes me problems with multi-context queries (I get a synchronised query instead of a joined one). So I made them measures. If I aggregate them with a SUM, I get the correct answers as long as I ensure my query is written correctly, but it takes a long time. So I left them as measures without the SUM which I know is wrong. What would you set such objects as? Detailed objects?
Yes, cost is a classic example of a numeric value that should be a detail object. Cost is an attribute of an item, correct? So when you include that item (Item ID, part number, name, something that identifies the item) then cost can be a detail of that piece of information.
Ok so I’m going a bit off tangent here. And this question applies to merged queries as well. What if the object is a detail of more than one dimension, say it’s the cost for the product and customer. To which dimension do you make it a detail of? Thanks by the way.
First guess would be to make it a detail of the element from the same table. I don’t expect you have products and customers in the same table, correct? You might have a product cost and a customer cost but those would be two separate objects.
Thanks Dave for this great post.
I have a question to clarify: should project function and sql aggreation function be always the same? Can I have “count” in the SQL statement, and use a “Sum” in my the project?
Hello Dave, First of all .. Great post! I have one query.. Do we need Selection level function JUST to retrieve small dataset? I can see in SQL sum(revenue) returns 5000$ versus only country.revenue returns 3500$. Why is it so?
Chaitanya
If you don’t have an aggregate, then you end up with a measure amount in the “group by” clause, and if there are like values (same values on different rows) they are grouped before they are summed, resulting in a value that is too low.
James, absolutely! For example, you can count(x) in the SQL aggregation, which should be followed by a Sum as the projection function. You don’t generally want to count a count. 🙂
What is the first thing you look at (i.e. before measures)?
Measures are the first thing I look at. 😉
Hi Dave,
Always a loyal fan of your post. Thank you for all the high quality stuffs.
You said
“Josh, measures are the second thing I look at when called upon to review a universe. 🙂 Does anyone care to guess what the first (and most important, in my opinion) might be?”
and
“Measures are the first thing I look at.”
well… that make me confuse… are you just joking or there’s something else that are more important than measures?
Eh, you caught me. I wasn’t consistent in my answers, was I? Now you get to watch me wiggle out of this one like a politician. Or attempt to. 😉
Measures are the first thing I look at when examining objects. As I started off in the blog post, they’re the most important objects in a universe and need to be done correctly. That being said, the first thing I look at in a universe overall is the structure, making sure that join structures are done correctly. That means proper usage of shortcut joins, loop resolution, and context definitions. Even if measures are defined correctly, an unsolved fan or chasm trap can still cause them to return incorrect results.
So, how did I do? 🙂
Counting truly DISTINCT Items at the Universe level is still problematic. Using the “Database Delegated” for the Aggregate Projection sort of solves this until you start to have other measures in a report. I either get the #RECALC or #DATATYPE error in the other measures that were working fine until adding the “Database Delegated” count object. I have given up, and just expose a db id field as a dimension, then create a count object on it in the Webi.
My project had a child table that we were joining to, creating a many-to-one relationship to the parent table that I wanted a row count on.
Can we add multiple functions(SUM, MIN, AVG, MAX) for one object.
No, because SQL does not allow you to nest aggregate statements.
For example, you can’t do
select sum(max(x)) from ...