This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.
Successful warehouses grow -- get ready for the opportunities and obstacles.
So you've deployed your first dimensional data mart. Fortunately, the business loves it and is clamoring for more! They want to query and analyze integrated data from additional enterprise data sources. For some of you, this demand is a dream come true. For others, it may be the start of a slow-motion nightmare.
In this column, I'll use a case study to discuss the challenges you'll likely encounter when it's time to roll your next business process subject area or data mart into production. Sadly, there's no magic. Putting your next subject area into production is a difficult, though not intractable, problem. It's a good place to focus any fastidious tendencies you might have.
Recognize the Target
Let's start at the end-state goal. Your target enterprise information architecture consists of multiple interconnecting business process subject-area schemas (or data marts) that are dimensional in format. By dimensional, I mean there's a detailed fact table connected to multiple dimension tables. The dimension tables allow different subject areas to interconnect. You have to share dimensions between subject areas for this to work. In other words, if two schemas use a customer dimension, they have to use the same customer dimension. Previous Data Warehouse Designer columns have advocated the enterprise data warehouse bus-architecture approach, providing clear instructions for designing in an incremental and iterative fashion.
Using shared or "conformed" dimensions has several important implications. Dimension surrogate keys must be identical. How else can you join across subject areas? Additionally, you must have corporate agreement about which attributes are tracked through history and which are updated in place. How else can dimension keys be identical?
Case Study Scenario
Let's imagine you launched your enterprise information infrastructure with a successful retail sales data mart. The retail sales information is used daily by store management, as well as by many corporate users in marketing and finance. The dimensional design for retail sales is outlined in Figure 1. It captures data at the correct transactional grain, its daily update process is smooth, and system downtime is well within the service-level agreement (SLA). The retail sales data mart is a thing of grace and beauty.
Figure 1 Data model for retail sales process.
The success of the retail sales project has emboldened management to extend the enterprise warehouse to support additional CRM requirements, namely customers' calls into the call center. I won't detail this new schema, but it obviously includes the customer dimension. In theory, this decision is no problem: You'll reuse the date and customer dimensions from the first mart, add some new dimensions and new facts, and you're done, right? In practice, there will be a few wrinkles.
Not So Fast
Obviously, you'll need to figure out how to modify the data extract, transform, and load (ETL) process to also populate the new tables in the call center schema. But there may be some more complex implications.
Let's look a bit more closely at the customer dimension in the original retail sales schema. First, note that the definition of customer is actually for a subset of all customers; the only customers you know anything about are those with loyalty cards and corresponding account numbers. When you add the call center information, you can expect to get information about a whole bunch of customers you've never seen before. The customer dimension ETL process is going to have to change to accommodate the new source, as I'll discuss later. Adding new customers to the customer dimension shouldn't affect the existing data mart. The new customer rows won't join to any existing retail sales fact rows because there's no way to link a "nonloyalty" caller to a sales transaction. The existing sales schema should behave the same, or almost the same, as it always has.
What if the business users want to track more information for call-center customers than was necessary or available for retail sales? As a simple example, what if the caller's phone number is important? You need to add that new attribute to the customer dimension, again modifying the customer-dimension ETL process. As was the case with the new customers, adding the phone number shouldn't disrupt the original data mart; existing reports and analyses should continue to work as before.
Finally, let's tackle a harder problem. As shown in Figure 1, you have attributes about the customer's address, starting with postal code. What if the business users of the call-center data have a compelling need to track history for the customer's address? In other words, they want to be able to know that when a customer called last year she lived in Montana, but now lives in Hawaii. Dimensional modeling can handle this requirement with what's known as a type 2 slowly changing dimension technique. The issue is that you're changing a fundamental characteristic of the customer dimension in a way that could affect the existing retail sales subject area. I'll return to this dilemma in a bit.
We welcome your comments on this topic on our social media channels, or [contact us directly] with questions about the site.