“Is it Time to Drain the Data Lake?”

(Dominic Wellington, Datanami, 15 October 2018)

“Collecting data from a data lake can be time-consuming and tedious. When fishing for valuable things, nets can get snagged on meaningless rubbish like old car tires or boat parts — completely missing anything valuable. However, Artificial Intelligence (AI) can help spot useful patterns and bring them to the attention of human specialists...”.

With the rise of the data lake, the term ‘Big Data’ is back in the spotlight. The “data lake” metaphor assumes that if you pour enough data into the lake, it will contain the information you are looking for. But the lake is so deep and dense with data that separating the useful signal from the noise takes time. One problem is the lack of true observability into the massive amounts of data we hold. AI can detect patterns in the data and flag them automatically, before anyone has to ask about them.

If you filter the data streams before they enter the data lake, you can apply analytic algorithms to the raw data as it arrives rather than to the lake after the fact. With that insight, you can get rid of age-old filtering dilemmas and focus on tools that remove just enough data, working directly at the network edge. AI-enabled filtering passes the relevant, actionable data to the next stage and leaves the irrelevant data behind. Such a process can combine a specific user request, the exact code path in use, and system resource utilization into a pattern that identifies a new type of problem in real time, leaving specialists to work out why the algorithm flagged that pattern and what it means. Subsidiary systems can search the data lake to determine whether the pattern is new or recurring. Over time, however, a growing share of incidents will be occurring for the first time, so they have no precedent in the lake and only fill it further, which erodes the value of the data lake as a whole. Because they cannot monitor in real time, data lakes are becoming less and less of a necessity.
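
To make the idea of filtering a stream before it reaches the lake more concrete, here is a minimal Python sketch. It is not the method described in the article: a simple repeat counter stands in for the AI-enabled filtering, and the event fields (`request`, `code_path`, `cpu_bucket`), the function names, and the `max_repeats` parameter are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator


def event_signature(event: dict) -> tuple:
    """Reduce an event to the fields that define its pattern.

    The three fields are assumptions that mirror the article's example:
    user request, code path, and resource utilization.
    """
    return (event["request"], event["code_path"], event["cpu_bucket"])


def filter_stream(
    events: Iterable[dict], seen: Counter, max_repeats: int = 2
) -> Iterator[dict]:
    """Forward only new or lightly repeated patterns; drop the rest.

    `seen` plays the role of the historical store: it records how often
    each pattern has already been observed.
    """
    for event in events:
        sig = event_signature(event)
        seen[sig] += 1
        if seen[sig] == 1:
            # First occurrence: flag it for a human specialist.
            yield {**event, "status": "new"}
        elif seen[sig] <= max_repeats:
            # Known pattern, still informative enough to keep.
            yield {**event, "status": "recurring"}
        # Otherwise the event is treated as repetitive noise and dropped.


cart_event = {"request": "GET /cart", "code_path": "cart.load", "cpu_bucket": "high"}
home_event = {"request": "GET /home", "code_path": "home.render", "cpu_bucket": "low"}

stream = [cart_event, cart_event, cart_event, cart_event, home_event]
forwarded = list(filter_stream(stream, Counter(), max_repeats=2))
statuses = [event["status"] for event in forwarded]
```

In this sketch the first occurrence of each pattern is labelled as new, the second as recurring, and later repeats never reach the next stage. A real system would replace the counter with a learned model and a persistent store.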
