Saturday, 27 September 2014

Data Vault - Data warehouse modeling

In last couple of months I am obsess with the Data Vault (DV) modeling technique, to design enterprise data warehouse (EDW). The inventor od DV is Dan Linstedt (www.danlinstedt.com). 

Below are some comments sourced from the Lucrum Business Intelligence (www.lucruminc.com), I think they are very good summary of the DV modelling.   
__________
.... The data vault is not a product.  It is not a magic pill that makes all your IT ills go away. It is a comprehensive approach to addressing real world issues with existing implementations.  It brings real flexibility and adaptability to the implementation and brings reliability and dependability to the business.  And with a team that understands the power of the data vault, you are now able to take your Business Intelligence environment upon which the tools that do the flash and splash can be sourced from. ....
__________        

... Modelling Rules for Each Part of the DV Entity:
FOR HUBS:
  • Hubs keys cannot migrate into other hubs (no parent/child like HUBS).
  • Hubs must be connected through links.
  • More than two hubs can be connected through links.
  • Surrogate keys may be used.
  • Business keys are 1 to 1 relationship with surrogate keys.
  • Hubs primary keys always migrate outward.
  • Hub business keys and primary keys never change.
  • If a hub has two or more satellites, then a point-in-time table can be built for ease of joins.
  • An ‘UNKNOWN’ business key record can be inserted into Hub that can be used to tie other data in links and sats that has no business keys in source. This kind of data is usually a bad/incomplete source data.
FOR LINKS:
  • Links can be connected to other links.
  • Links must have atleast two hubs associated with them in order to instantiated.
  • Surrogate keys may be used.
  • The combination of surrogate business keys made a unique key.
  • Does not contain descriptive data.
  • Does not contain begin and end dates.
FOR SATELLITES:
  • Satellites may be connected to hubs or links.
  • Have 1 and only 1 parent table.
  • Satellites always contain either a load date-time stamp, or a numeric reference to a stand-alone load date-time sequence table.
  • Primary key is a combination of ‘surrogate key’ from either hub or link and the load datetime stamp.
  • Surrogate keys may not be used.
  • Must have a Load End Date to indicate when the CHANGE to the data set has occurred.
  • Satellites are always delta driven. Duplicate rows should not appear.
  • Data is separated into satellite structures based on 1) type of information 2) rate of change.....



PS. The images was sourced from this web site  http://thefuturevalueofbusiness.com/index.php?s=Data+Vault , however this web site in not any more active. 


No comments:

Post a Comment