“In a world where oil has been replaced by data as the most valuable resource, terms like big data, data warehouse and data lake are in the spotlight (…) This article defines data lakes and the need for them, explains the decline of traditional data lakes, and introduces an innovative solution: the data lake powered by interconnected data pools.”
Most people don’t know what a data lake is, or that it also provides compute power, forming the foundation for executing analytics, Big Data and machine-learning projects. Data lakes are storage repositories that collect, in raw format, vast amounts of data previously dispersed across diverse departments. As more companies adopted data lakes, problems emerged that began hindering digital transformation. Companies now realize that, due to poor data governance and management, they wasted time building vast data lakes that ended up as data swamps. Data lakes are rigid and expensive, and they hinder innovation more than they foster it. Only 8% of pilots accomplish their intended goals, because data lakes are: 1) Over-centralized: every data project has to use the same technology. 2) Over-generalized: they are built for the whole firm rather than for specific use cases. 3) Complex: teams must contend with Hadoop, advanced data management and data lineage systems. 4) Expensive and slow to implement. To make their data teams more agile, firms should let them build use case-specific projects, giving them more freedom to choose the cloud provider, region and data. Building and using data lakes is not easy, because companies face many barriers when adopting digital transformation strategies and becoming data-driven. Lentiq’s EdgeLake was born to help teams access data and create good environments for machine-learning projects. Lentiq believes that firms need a human-centric approach to machine learning to achieve transformative innovation.