Sometimes Data is Really Different

Real data doesn’t cooperate with the antiseptic world of database and software designers. We’ve pointed out elsewhere how difficult it is for ordinary database systems to cope when input data formats vary over time, and how Spendata solves much of that problem with its Intelligent Loader.

But sometimes input data transformation requires more than column manipulation. The usual solution is to employ an “ETL” (“Extract, Transform, Load”) approach to modify the input data before it is loaded. This can take the form of an actual third-party ETL tool, or it may involve a custom transformation involving Excel or other data manipulation program.

A More Powerful Idea

At Spendata, we think conventional ETL thinking is flawed. At data load time, you have neither the scope nor the information required to make any decision that isn’t essentially trivial. All you have for inputs to your ETL function are the fields associated with the raw transaction. No rolled-up dimensions. No mapped categories. Just raw data. Problem is, the most powerful data transformations you can make — and the most useful from a business standpoint — are transformations that are fully informed by the data and by data relationships at a much higher level.

For example, what if transformation of an input data field is a function of what Commodity the transaction falls into? The transaction hasn’t even been mapped to a Commodity yet. So that transformation is unavailable to you. In fact, if any of your transformation decisions are based on mapped or even rolled-up data, you can’t make any of them at data load time.

So, rather than transforming data at load time, load it in as-is, and then put Spendata to work on the transformation. Whether the transforms are simple or complex, the tools at your fingertips vastly exceed anything you can do with conventional ETL. You can perform multi-pass derivations, and build dimensions and measures dependent on other dimensions and measures. You can also build any kind of computation necessary — and you can do it in javascript, the web language that’s used and understood by millions.

Part of the reason that data transformation is assumed to occur during data load is that the tools available for manipulating and transforming data at a higher level have been quite limited. That’s just not the case with Spendata, which, at its core, is a transformation engine, not merely a data display tool.

Don’t Forget the Auditors…

A final nail in the coffin of the “transform on input” idea is that there is no way to audit your work. Who knows what you did? You may have dropped transactions. You may have changed numbers or fields. Either way, your changes cannot be audited or tracked, so your analysis can’t be defended.

Contrariwise, if you performed your transformations inside Spendata, every computation, derivation, mapping rule, and script is recorded. The raw data that was entered is still available, in its original form, as is the transformed data. Now you can defend your results — and if your transformations are challenged, you can alter them in real time. There’s no longer any need to worry about “that person in the back of the room” (we’ve all met this person) who loves to find small errors and promote them into a Big Deal. Correct the error, move on, situation contained.