Aug 28 2009
Time-Sliced Measures Part II: Time Slice Calendar Table
In the first post in this series I defined what time-sliced measures are and why they can be useful in a universe. I also shared the design requirements for a particular project I was on. The requirements included:
- Each report is expected to have multiple time-sliced measures
- Users must be able to select the type of calendar during the refresh process; calendar types include Monthly and Fiscal
- Users can provide any possible date as the “to date” for the time slices
- The process of splitting each time-slice time period into its own SQL statement should be completely transparent
- The resulting SQL should be as efficient as possible
In this post I am going to cover the design of a special table that we built in order to support our solution for these requirements. After I talk about the table design I will cover how I use it in the universe, as well as provide a few pros and cons about this solution as I have outlined it so far. Just to prepare you, this post is a bit longer than most that I write, and gets fairly detailed. This post will cover slides 23 through 25 from my 2008 GBN Conference presentation. There is a link to download the file at the end of this post.
Time Slice Calendar Table Definition
Each time slice has a start date and an end date. The actual dates will change based on whether I am looking at a fiscal quarter or a calendar quarter. I need to be able to provide any reference date as my “to date” or input date for each time slice. Finally, I want to be able to compare the current year values with the matching prior year values. All of these requirements can be solved with the following table structure:
Column Name | Description |
---|---|
ref_cal_dt | Calendar date used as input for prompts |
cal_tmslc_mn_val | Time slice name |
cal_nm | Calendar name |
curr_yr_tmslc_strt_dt | Current year date range start date |
curr_yr_tmslc_end_dt | Current year date range end date |
prior_yr_tmslc_strt_dt | Prior year date range start date |
prior_yr_tmslc_end_dt | Prior year date range end date |
The ref_cal_dt
is the reference calendar date. Every time I use this table I need to provide that date, either via a prompt or condition of some kind. If you don’t need the flexibility of redefining the “to date” value in your environment this can be removed. The next two columns are codes for the time slice and calendar type. The last four columns contain the starting and ending dates for each time slice; the first two for the current year and the last two are the matching dates for the prior year. I will talk about each of these columns in a bit more detail in the following paragraphs. I am going to save the reference date for last.
The cal_tmslc_mn_val
contains values like MTD, QTD, YTD, FM, FQ, and FY. Each of these codes represents a time slice. For example, QTD is Quarter To Date and FQ is Full Quarter. The type of quarter (monthly or fiscal) is determined by the next column.
The cal_nm
or Calendar Name column is used to differentiate between the different types of calendars stored in the table. For our implementation we had two calendars, but there is nothing that says I could not create more if needed. I had to create fiscal time slices and calendar type slices to solve my requirement, and the names became Month Calendar and Period Calendar. I was asked at one point why we stored the full text representation rather than a code. If my memory is correct, we configured this column as a compressed character column in Teradata. Since there are only two values, the compression algorithm was very effective.
Here is some sample data from my table. I mentioned earlier that in order to use this table I have to provide an input or “reference” date. For the sample data shown below, the reference date was August 15, 2008. The resulting rows show the start date and end date for all of the available time slices for each calendar type (Month or Period). Both the current year (CY) and prior year (PY) date ranges are retrieved on the same row.
Code | Calendar | CY Start Date | CY End Date | PY Start Date | PY End Date |
---|---|---|---|---|---|
MTD | Month Calendar | 8/1/2008 | 8/15/2008 | 8/1/2007 | 8/15/2007 |
QTD | Month Calendar | 7/1/2008 | 8/15/2008 | 7/1/2007 | 8/15/2007 |
YTD | Month Calendar | 1/1/2008 | 8/15/2008 | 1/1/2007 | 8/15/2007 |
FM | Month Calendar | 8/1/2008 | 8/31/2008 | 8/1/2007 | 8/31/2007 |
FQ | Month Calendar | 7/1/2008 | 9/30/2008 | 7/1/2007 | 9/30/2007 |
FY | Month Calendar | 1/1/2008 | 12/31/2008 | 1/1/2007 | 12/31/2007 |
MTD | Period Calendar | 8/10/2008 | 8/15/2008 | 8/12/2007 | 8/17/2007 |
QTD | Period Calendar | 6/15/2008 | 8/15/2008 | 6/17/2007 | 8/17/2007 |
YTD | Period Calendar | 12/30/2007 | 8/15/2008 | 12/31/2006 | 8/17/2007 |
FM | Period Calendar | 8/10/2008 | 9/6/2008 | 8/12/2007 | 9/8/2007 |
FQ | Period Calendar | 6/15/2008 | 9/6/2008 | 6/17/2007 | 9/8/2007 |
FY | Period Calendar | 12/30/2007 | 12/27/2008 | 12/31/2006 | 12/29/2007 |
What does this tell me? I see that for August 15, 2008, using the Month Calendar, the month to date (MTD) range starts on 8/1/2008 (all dates in this blog post are in MM/DD/YYYY format) and ends on 8/15/2008. The full month starts on 8/1/2008 and runs until 8/31/2008. If I switch to the Period Calendar instead, I see that the MTD range runs from 8/10/2008 to 8/15/2008, and the full month (fiscal month rather than calendar month at this point) runs from 8/10/2008 to 9/6/2008.
Now, about that reference date… in our requirements we had to allow the user to redefine the “to date” for any time slice. It is fairly standard for the “to date” to be the last warehouse refresh date. Since we had to be able to run any report for any “to date” our table starts with the reference date. In the sample data posted above, the reference date was always 8/15/2008. I would see another set of rows for 8/16/2008, for 8/17/2008, for 8/18/2008, and so on. Every single possible date value was stored in the time slice table, and the starting / ending date ranges for all time slices and calendar types were computed and stored for each one.
There is one important note about performance that is appropriate at this point. Because we had to be able to roll values up by multiple calendars, we were required to store all of our data at a daily grain in our fact table. Since weeks do not fall on month boundaries, and since fiscal and monthly calendars do not share the same date ranges, the only possible way to solve the issue was to leave things at the daily grain. As you can imagine, this caused some challenges with query performance.
Join Fact Table to Time Slice Table
Now that I have thoroughly explained my special calendar table and its contents, how am I going to use it? This table is going to be the “driver” for my time-sliced measures, and drivers need roads. (That is, unless you have a nuclear powered DeLorean. 😉 ) I’ve posted before about contexts being paths or routes through a universe, and that’s what I need here too. But with only one calendar table and one fact table, where will I get different roads? First I need aliases.
I will create an alias of my special calendar table for each time slice definition. That means since I have six time slices (MTD, QTD, YTD, FM, FQ, and FY) and two years (Current Year CY and Prior Year PY) I need to create 12 aliases. After I do that, I create joins from each of these aliases to my fact table. Normally having multiple dimensions per fact is going to be fine, but in this case I want to ensure that every different time slice forces a separate SQL path… and now I’m back to where I need to create the contexts.
Here’s what a slightly simplified version of the universe looks like. I only created the “To Date” slices and left the “full” versions out for now.
Yes, that is a picture of the venerable Island Resorts Marketing universe, but with some aliases of my time-sliced calendar table added. Each of those aliases is joined to the “fact” table called SALES
. The original CAL_TIMESLICE
table is in the universe in the lower left corner. The six aliases are outlined in red on the right. Each alias has a join that looks like this:
SALES.INVOICE_DATE between cy_mtd.CURR_YR_TMSLC_STRT_DT and cy_mtd.CURR_YR_TMSLC_END_DT AND cy_mtd.CAL_TMSLC_MN_VAL = 'MTD' AND cy_mtd.CAL_NM = @Prompt('Please select Calendar Type','A','Time Slice Calendar\Calendar Name',mono,free) and cy_mtd.REF_CAL_DT = @Prompt('Please select Calendar Date','D',,mono,free)
I will break this down into steps.
The most direct part of the join takes the invoice date and compares it to the time slice start date and end date, using the BETWEEN
operator.
SALES.INVOICE_DATE between cy_mtd.CURR_YR_TMSLC_STRT_DT and cy_mtd.CURR_YR_TMSLC_END_DT
Next, I want to make sure I get exactly one date range, and that is done first by hard-coding the time slice mnemonic value…
cy_mtd.CAL_TMSLC_MN_VAL = 'MTD'
… and then prompting the user to select the calendar type.
cy_mtd.CAL_NM = @Prompt('Please select Calendar Type','A','Time Slice Calendar\Calendar Name',mono,free)
As an aside, I created a join with a prompt for time-slice code as well. This is a nice enhancement that lets the user pick a single measure for their report and on-the-fly swap it from MTD to QTD or YTD or any other defined time-slice. Of course they can only use one at a time, so that is a limitation of that technique.
The final part of the join in my case was a date prompt for the reference date. This is the “to date” that I talked about earlier in this post. Please note that there is no LOV for the reference date; I rarely provide a list of values for dates since there is not much value added by that overhead.
cy_mtd.REF_CAL_DT = @Prompt('Please select Calendar Date','D',,mono,free)
The join logic is the same for every alias except for the hard-coded mnemonic code for time slice.
Checking Join Structure
In this scenario as I have built it so far, have I introduced any loops? any fan traps? The answer to both of those questions is “No, I have not.” However, I have introduced incompatible joins.
For example, suppose that a user picked a MTD and YTD object at the same time. The SQL (abbreviated here) would include this:
cy_mtd.CAL_TMSLC_MN_VAL = 'MTD'
AND cy_mtd.CAL_TMSLC_MN_VAL = 'YTD'
This is a problem 🙂 but one that I anticipated. At the beginning of this process I said I wanted to create a way to dynamically and transparently generate separate SQL passes for each time slice. I explained the structure and data for my table, and now I have shown how I join that table to my sales fact. The last step is to isolate each one of these aliases into a separate context. Using my shortcut for copying a universe context this step takes only a few minutes to complete for the initial configuration. Each context includes all of the tables from the original Sales context plus exactly one alias of my special calendar table. Here are the joins for the current year MTD alias:
And then I create a context for each of the others as well; here is a list of the contexts from my universe at this point:
Why This Strategy?
As I have been sharing this design concept at various conferences or with other universe designers I invariably get questions like “What made you go this route?” or “Why not alias the fact table instead?” and so on. They are fair questions, and I don’t have time to answer all of them in this post. 🙂 I will try to quickly point out a few advantages to this technique.
If I alias a fact table I can almost certainly guarantee that I will introduce additional loops in my universe structure. There is a limit to the number of loops that can be detected / displayed by Designer, so adding extra loops when I don’t have to is not in my best interest. Plus, as blog readers will see soon when I start building objects on this structure, there is a big benefit to having all of my measures come from a single fact.
This technique does not add any new chasm or fan traps to the universe.
This technique allows me to support a potentially infinite number of time slices without doing any date calculations on the fly.
This technique allows me to support a potentially infinite number of calendar types. Here I have shown both fiscal and calendar types, but I could easily add more.
On the negative side, there are complexities associated with creating contexts in the universe. I mentioned above that creating the initial contexts is simple. Any future maintenance will be more complex as a result of all of the contexts in the universe. However, I submit that the complexities of the universe are more than paid for by the advantages to the users who will be creating reports. And whether I chose to alias the fact or the calendar table I would still have this issue so it’s really a moot point.
Next Steps
At this point I have accomplished the following:
- Created and populated my special calendar table
- Created aliases for each time slice / year combination (only “to date” versions are shown in this post, but I did create the “full” aliases and contexts as well in production)
- Created joins that link each time-slice calendar alias to my fact
- Created contexts to ensure that each time slice will appear in a separate query pass
What’s next? I have to create some objects so my users can make use of all of my hard work so far. That’s my next post in this series, which will cover slides 26 through 30 of my talk. Stay tuned. 😎
Related Links
- Conference presentation download: Universe Designer Essentials and Beyond
- Prior post in this series: Time-Sliced Measures Part I: Defining the Problem
- Other supplemental information: Teradata Compression Explained
Wow, this was really an excellent post! Very nice thinking of the universe developer behind all of this. Very instructive and useful.
Thank you.
Nice post Dave. However part 1 is not available, would it be possible to post it?
Hi, sri, the link was wrong. You could have found it by searching. 😉 I have fixed the link.
Thanks Dave….thought I searched “search is your best friend” 🙂
Good stuff! Anxiously awaiting Part III.
Thanks for the Good Post.
Is there any way where in I can eliminates the need to create and store complex time-based reporting tables. Are there function like TODATE and AGO functions found in OBIEE semantic layer (siebel analytics). It was just simple there, I had to use the function on the semantic layer and BI server would raise the query to get the answer.
The syntax was
Todate(measure, time_dim_attribute)
Ago(measure, Time_Dim_Attribute, period)
OBIEE issues time-series queries, in the form of AGO and TODATE. AGO gives you, for example, the value of sales one month ago or one quarter ago, whilst TODATE gives you the total of sales month-to-date, or quarter-to-date, or year-to-date.
Both of these time-series functions rely on the existence of a time dimension in your business model and mapping layer, with the period you can refer to in the AGO and TODATE functions being determined by what levels you have in this time dimension.
This eliminates the need to create and store complex time-based reporting tables.
I wish BO universe designer has this sort of functionality which is present in OBIEE semantic layer. where I do not need to create and store time-based reporting tables.
Regards
Ishaq
Hi Ishaq, and welcome to my blog. Thanks for your comment. It would be nice to have function calls for “ago” and “todate” as you describe but I have to wonder how well they perform on large data sets?
The implementation where we developed the basic model I am presenting in this series of posts has over 2B (that’s billion) rows in their fact table. We looked at some of the Teradata OLAP functions but they were not going to provide the speed and flexibility that we wanted and got from this system.
Hi Dave thanks for the reply, That’s right performance is an issue with these sort of functions. I have experienced it.
Hi Dave,
I am Anxiously waiting for Part III. Regards Ishaq
Hi, Ishaq, it’s about half done. Right now I have to focus on getting my presentations done for this year. They were due on Sept 7th and I have only turned one in right now. 😳
Hi Dave,
I am still Anxiously waiting for Part III. Thanks & Regards Ishaq
Hi, it’s still in the queue to be completed. Have you reviewed the presentation itself? I believe it contains all of the specifics required to implement this solution.
Dave,
Any chance you could repost part 1 (It seems to have been deleted) and/or point me to the full presentation? I think I’ve got the idea from this one post, but want to be sure I don’t skip anything that might be important. I also never did see Part 3 either 🙂
Part I is linked at the end of the post. All of the parts should be published for this topic, and the full presentation is on the conference presentations download page, which is a permanent link on the top-right corner of the page.
Thanks Dave, the link I tried was the one in the TOP of the post, seems to have a bad URL 🙂
One question however.. if say a person was using this universe (a non universe developer) and was creating an ad-hoc report.
They pull a few elements, but use the “Invoice Date” object in their query filter to display a list of clients who placed orders on a given date.
If they also wanted the person’s YTD order totals, and pulled in your objects, wouldn’t they end up with TWO prompts asking for dates? One from the Object “invoice date” and another generated from the Join itself?
What would you suggest as the best way to avoid this? or is it unavoidable to have this functionality?
I fixed the bad link, thanks.
The prompt built-in to the time-sliced measure is separate from any other date restrictions. Our users have learned to use it, and know that they don’t have to provide other date conditions because the time-slice does it for them. So if I wanted to show orders for the year to date (including order number) then I can certainly do that. I include any / all objects that I want to use, include the YTD Sales object, and everything is based on the date prompt embedded within that logic.
Thanks again.. this seems to work great if you’re pulling one point in time, but if you have a need for QTD and YTD by Month in a WEBI report, then it seems to fall apart. It can be pieced back together at the report layer.
Is there some solution that would let me use this method to generate a result set similar to:
Month – MTD Sales – QTD Sales – YTD Sales
1 – 10 – 10 – 10
2 – 10 – 20 – 20
3 – 10 – 30 – 30
4 – 10 – 10 – 40
5 – 10 – 20 – 50
etc…
When I try this method the QTD/YTD only sums up the sales for the timeframes in the given month and doesn’t include the previous months…
I’m wondering if i need to also alias the fact table in conjunction with this…
If you want YTD figures broken down by month, only include the YTD measure but also include the Month and Quarter result objects. Add a break on Quarter and do a running sum and you’re there.
The time-sliced objects defined here are good for, as you say, one specific point in time.
Hello Dave, am I correct in the assumption that this way the end-user will have only ONE measure object “Sale Revenue” and not multiple measure objects
such as “Sales Revenue – MTD”, “Sales Revenue – Current Year”, etc?
Andreas, you can do either. A user can pick the generic “Sales Revenue” object and combine it with calendar objects, but the user would have to include a date range condition. Or the user can include the “Sales Revenue YTD” object with the month and year calendar objects, and they’ll get YTD revenue broken down by month, without having to enter a date range condition.
U r great dave.. I like u..
Nice post but I have several questions.
1) I would like to know whether the table is sliced into days. Say for example assume there is only One calendar code MTD
1/1/2012, MTD, Month Calendar, 1/1/2012, 8/15/2012, 1/1/2011, 1/15/2011
1/2/2012, MTD, Month Calendar, 1/1/2012, 8/15/2012, 1/1/2011, 1/15/2011
1/3/2012, MTD, Month Calendar, 1/1/2012, 8/15/2012, 1/1/2011, 1/15/2011
…
1/31/2012, MTD, Month Calendar, 1/1/2012, 8/15/2012, 1/1/2011, 1/15/2011
2) There are 7 columns in the table shown above. It is supposed there should be N * 7 objects for date dimensions in universe.
3) Do you create a surrogate key in both CAL_TIMESLICE and SALES dimension?
3) How to handle if there are sales record in Month/Date? Those records are stored into different tables. For example, we have a monthly sales record that is stored every month.
Thank you.
1) I’m a bit confused. First, your data example is wrong. If you are looking at the MTD or Month to Date slice, then the data would look like this:
1/1/2012, MTD, Month Calendar, 1/1/2012, 1/1/2012, 1/1/2011, 1/1/2011
1/2/2012, MTD, Month Calendar, 1/1/2012, 1/2/2012, 1/1/2011, 1/2/2011
1/3/2012, MTD, Month Calendar, 1/1/2012, 1/3/2012, 1/1/2011, 1/3/2011
Month to Date starts at the beginning of the selected month and goes to the reference date value and no further. The prior year includes the matching dates from the prior year. I don’t get your question about slicing into “days” as there is no “day to date” or “full day” value. If you want a specific day, then request only that specific date.
2) There are no objects built on this table that the user sees. They’re not needed.
3) There is no need for a surrogate key in this design. Every join is based on dates.
4) (I’m sure you meant “4” and not “3” again 😉 ) A sales record that is stored monthly is not going to have “to date” summarized values, it’s only a point in time value. As such you don’t need to retrieve a collection of rows, you need to only retrieve a specific row. If you wanted to tie that to this solution, it would be easy enough. The time slice process is designed to prompt for a date… the reference date or input date. Simply use that to filter your monthly sales record table to find the expected row for that month.
Sorry, my bad that I have not done proof read before submission
1)
I copied data from the example you mentioned above. In your example, the fields are (Code, Calendar CY, Start Date CY, End Date PY, Start Date, PY End Date). So my question is whether we need to input all dates into the table. Say 1/1/2012, 1/2/2012, 1/3/2012, etc.
2)
We have to allow users to show period information on the block. In your example, the data input is done manually via prompt. Users do not like prompts and want to do that automatically. Also, we have some publications/schedules that allow user to select data within range (Say all records in last invoice month). Users should not choose explicit date for that purpose.
3) Understood
4) Understood