For this post, I thought I’d cover one of the biggest headaches in corporate IT environments today: data. Data is the lifeblood of software products and services. Without consistent, reliable data, applications are useless. I’m sure most of you have heard the saying “Garbage In, Garbage Out”. Data is often overlooked because it’s “not sexy” and is considered tedious or boring.
For a typical enterprise, the data lifecycle includes:
1) Sourcing the data
Who provides you with data? Do you manually enter it? Do you pay for feeds through data providers? Bloomberg is an example of a market data provider for the Financial Services industry. What are the SLAs (service level agreements) around receiving your data? What will happen when the data you require doesn’t arrive in time? Do you have processes in place to handle incorrect data?
2) Cleansing/Transforming the data
Chances are you need to “cleanse” the data. Typically, this has to happen before the data can be stored in your systems, and it can include reconciliation, validation and transformation.
3) Distributing the data
Who are the end users of the data? How will they access the data? Will it be via third-party applications (e.g. a reporting system)? Will the data be published to other constituents throughout your organization? What happens when there are disagreements on interpretations of the data? Is there a clear definition of the data you are distributing?
4) Maintaining the data
Now that you have “clean” data, how do you keep it that way? How do you prevent “data decay”? This is a tough problem to solve. Maintaining high quality requires significant effort and people, as well as accountability.
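To make the cleansing step (2) a bit more concrete, here is a rough Python sketch of validating, transforming and reconciling a price feed. Everything in it is an assumption made up for illustration: the record layout (symbol, close, date), the Price class, the cleanse and reconcile functions, and the 0.01 tolerance. Real feeds and rules will look different.

```python
import math
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class Price:
    """Hypothetical cleansed record; the fields are assumptions for this sketch."""
    symbol: str
    close: float
    as_of: date


def cleanse(raw: dict) -> Optional[Price]:
    """Validate and transform one raw feed record; return None if it is unusable."""
    # Transformation: normalize the identifier to trimmed upper case.
    symbol = str(raw.get("symbol", "")).strip().upper()
    if not symbol:
        return None  # Validation: a record without an identifier is rejected.
    try:
        close = float(raw["close"])
        as_of = datetime.strptime(raw["date"], "%Y-%m-%d").date()
    except (KeyError, TypeError, ValueError):
        return None  # Validation: missing or malformed price or date.
    if not math.isfinite(close) or close <= 0:
        return None  # Validation: prices must be positive, finite numbers.
    return Price(symbol, close, as_of)


def reconcile(ours: dict, theirs: dict, tolerance: float = 0.01) -> list:
    """Return the symbols both sources have whose prices differ by more than tolerance."""
    shared = ours.keys() & theirs.keys()
    return sorted(s for s in shared if abs(ours[s] - theirs[s]) > tolerance)
```

In practice you would probably log or quarantine rejected records rather than silently dropping them, so someone is accountable for following up with the provider.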
Why am I talking about data? For two reasons. First, maintaining accurate data is a problem that almost all organizations struggle with, in one form or another. Second, I’m hoping to spark discussion and debate around innovation in this area. What can we do better? Can we use some newer web technologies to address some of these issues? Comments/ideas?

