Data Warehousing (is) For Dummies

Good heavens. I have discovered that the social sciences are not the only jargon-bound fools. Data Warehousing has its own lexical morass of conflicting and unnecessary terminology. This is a symptom of intelligent people attacking problems which have already been solved, failing to do the research, and inventing new terms for what they think are new concepts. The worst effect of this is that they get the borders of the concepts wrong, and can never be sure exactly what they are talking about. See also LEAN, Six Sigma, Sociology, any field described as ________ Studies at a University, and so forth.

Here’s a big block of nonsense from the Wikipedia entry on Degenerate Dimensions in the context of Data Warehousing. Note that it is immediately followed up by four alternate nonsenses, because nobody knows what the Hell he is talking about.

The Kimball Definition
According to Ralph Kimball, in a data warehouse, a degenerate dimension is a dimension which is derived from the fact table and doesn’t have its own dimension table. Degenerate dimensions are often used when a fact table’s grain represents transactional level data and one wishes to maintain system specific identifiers such as order numbers, invoice numbers (Bill No) and the like without forcing their inclusion in their own dimension. The decision to use degenerate dimensions is often based on the desire to provide a direct reference back to a transactional system without the overhead of maintaining a separate dimension table.

Other Definitions
While many sources agree with Kimball, other definitions of degenerate dimension can be found both online and in other reference works:

  • “A degenerate dimension is data that is dimensional in nature but stored in a fact table.”
  • “Any values in the fact table that don’t join to dimensions are either considered degenerate dimensions or measures.”
  • “A degenerate dimension is when the dimension attribute is stored as part of fact table, and not in a separate dimension table.”
  • “A degenerate dimension acts as a dimension key in the fact table but does not join a corresponding dimension table because all its interesting attributes have already been placed in other analytic dimensions.”
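
For the sake of something concrete to point at, here is a minimal sketch of what the Kimball definition quoted above seems to describe, using Python's built-in sqlite3 module. The table and column names (dim_product, fact_sales, order_number) and the sample rows are my own assumptions for illustration; they come from no real system and from none of the sources quoted.

```python
import sqlite3

# Hypothetical star-schema fragment, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        order_number TEXT    NOT NULL,  -- the "degenerate dimension": no dim_order table exists
        product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
        quantity     INTEGER NOT NULL,  -- measure
        amount       REAL    NOT NULL   -- measure
    );
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO fact_sales VALUES
        ('SO-1001', 1, 2, 20.0),
        ('SO-1001', 2, 1, 15.0),
        ('SO-1002', 1, 5, 50.0);
""")


def order_totals(connection):
    """Group fact rows by order_number without joining any order dimension."""
    return connection.execute(
        "SELECT order_number, COUNT(*), SUM(amount) "
        "FROM fact_sales GROUP BY order_number ORDER BY order_number"
    ).fetchall()


totals = order_totals(conn)
print(totals)
```

Seen this way, order_number is simply a column you can group and filter by, carried in the fact table because there is nothing else worth saying about an order that would justify a table of its own.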

This is not to say that the subject matter itself is devoid of value. Indeed, Kimball’s concept of using a standard set of identical tables as a sort of meta-key to larger clusters of facts maintained as “data-marts” in their respective business units is part of the promise of normalization and a data directory. The issue I have with it is that this is all already inherent in the well-defined discipline of relational theory, and made possible, even easy, if you follow relational theory through to normalization of data. Developing derivative concepts is a right and noble thing to do, but without knowing the foundations, these folks cannot help but throw down buttresses every time they turn a corner. Who can clearly discern the shape of such an ugly agglomeration of concepts?

By defining made-up structures as buses, and re-naming dimensions, facts, and measures based on the context, people seeking to quantify and standardize are doing just the opposite.
