Theses / Main Ideas

Agile Data & DataOps

In Advanced Analytics, Agile is necessary, taking into account that:

  • The Design Thinking and Agile concepts and methodologies of the web/mobile software development world cannot be applied, as we know them today, to the Data Analytics field, because of the multi-dimensional aspects involved in building:
    • DataMart, Dashboards, Machine Learning algorithms, Business Intelligence reports…
    • However, the underlying paradigm and the principles of Design Thinking and Agile still apply.
  • There is a segregation between Data Analytics and Web/Mobile development, because of:
    • Huge upfront investment in infrastructure.
    • Higher time to market.
    • Specialized skills required within the team.
    • Data sanity (crunching, availability and accuracy) being more critical than user experience.
    • The need to focus on Buy vs Build (out-of-the-box solutions over custom-built applications).
    • Large team sizes and multiple vendors.

What makes it so difficult to build a DataOps & DevOps* team?

What problems is the DataOps & DevOps team facing?

  • Any organization trying to optimize its Big Data stack faces a host of obstacles that block the DataOps & DevOps team's way.
  • Digital society produces great quantities of data, and the volume rises daily from:
    • Digital transactions and records
    • IoT
    • Ever-greater digitalization
    • Plus the ‘chips-with-everything’ mentality that connects many new categories of devices to the network each year.

This trend fuels a plethora of new applications, spanning an alphabet soup of technologies (ETL: Extract, Transform and Load; AI: Artificial Intelligence; IoT: Internet of Things; ML: Machine Learning; and more) and encompassing many business drivers.
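As a concrete anchor for the ETL part of that alphabet soup, here is a minimal Python sketch of the Extract-Transform-Load pattern; the file name sales.csv, the warehouse.db target and the column names are hypothetical, for illustration only.

```python
import csv
import sqlite3

# Minimal Extract-Transform-Load sketch. The source file "sales.csv", the
# target "warehouse.db" and the column names are hypothetical examples.

def extract(path):
    """Extract: stream raw rows from a CSV source."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    """Transform: coerce types and drop incomplete records."""
    for row in rows:
        if row.get("amount"):  # discard rows missing the amount field
            yield (row["order_id"], float(row["amount"]))

def load(records, db_path="warehouse.db"):
    """Load: write the cleaned records into a SQLite table."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, amount REAL)")
    con.executemany("INSERT INTO orders VALUES (?, ?)", records)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("sales.csv")))
```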

  • Some systems begin to creak or fall over when data volumes start pushing at their technical boundaries.
  • Data applications don’t exist in isolation from the underlying Big Data stack.
  • Top challenges for the DevOps & DataOps team:
    • Endlessly looking for more server storage space.
    • Reconfiguring clusters.
    • Ensuring that databases are optimized.
  • The barriers & blocks that get in the way of a strong Big Data stack and a good Analytics process go deeper:
    • Silos appear constantly, as most organizations build up data pools inside individual departments.
  • What really helps combat these blockages and barriers?
    • Merging data sources and cataloguing data.
  • The data pipeline is only as good as its weakest link.
  • What is time-intensive and complex for the team?
    • Troubleshooting.
    • Parsing through configuration variables.
  • Given the lack of freely available talent, it makes sense to:
    • Safeguard the teams you have.
    • Allow them to use their skills to best effect by:
      • Automating the routine parts of their roles so they can focus on their higher skill-set (a minimal sketch follows this list).
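To illustrate that last point, here is a minimal Python sketch of automating one routine chore mentioned above, cataloguing data sources, so the team does not do it by hand; the data_pools/ directory and the CSV layout are hypothetical.

```python
import csv
from pathlib import Path

# Minimal data-cataloguing sketch. The "data_pools" directory is a
# hypothetical stand-in for a department's pool of CSV extracts.

def catalogue_sources(root="data_pools"):
    """Map each CSV file to its column names and row count."""
    catalogue = {}
    for path in Path(root).glob("*.csv"):
        with path.open(newline="") as f:
            reader = csv.reader(f)
            header = next(reader, [])  # first line holds the column names
            rows = sum(1 for _ in reader)
        catalogue[path.name] = {"columns": header, "rows": rows}
    return catalogue

if __name__ == "__main__":
    for name, meta in catalogue_sources().items():
        print(f"{name}: {meta['rows']} rows, columns={meta['columns']}")
```

Run on a schedule, a script like this keeps the catalogue current without consuming engineer time.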

Adjuvant Technologies

  • Lentiq really has a unique idea:
    • It combines the concept of Data Lake with Edge Computing to create what it calls “Interconnected Micro Data Lakes”.
  • Data pools are micro Data Lakes: each functions as a full Data Lake while supporting apps such as Apache Spark, Apache Kafka and StreamSets.
    • Data pools exist independently across different clouds
    • Governance rules are enforced only when the data moves
    • So, each department will have the tools needed for:
      • Their use cases, and…
      • Access to the data they need.
    • Data Lakes are a newer take on mass repositories:
      • Like we first saw with data warehouses
      • Except they operate very differently
      • They hold unstructured data:
        • Images, PDFs, audio, logs, etc.
      • Data warehouses, by contrast, hold highly structured row & column data.
      • Data Lakes do not require special hardware or software, unlike a data warehouse.
      • The big difference (see the sketch after this list):
        • Data warehouse: you process the data before it goes into storage.
        • Data lake: you fill it with raw data and process it later, when you need it.
      • The Edge is supposed to act as a filter for unnecessary data.
  • Edge Computing devices will provide connectivity and protection for new and existing Edge devices.
    • Even though 5G will provide the Edge with far better connectivity and lower latency to cloud-based applications…
    • There is still the cost of processing and storing the data.
  • 5G opens up the chance for more applications to run at the Edge.
  • Edge computing allows new applications to function and be remotely managed with resiliency and integrity.
  • The combination of Edge and 5G will benefit the supply chain.
  • Edge platforms aggregate and analyze data over time, revealing insights to drive continuous improvement.
  • Oil and gas industry:
    • Edge computing can be deployed at remote pump sites
    • And connected to centralized automation systems
    • Through 5G networks.
  • New hybrid applications targeted by 5G have both Edge and cloud components.
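To make the warehouse-versus-lake difference flagged in the list above concrete, here is a minimal Python sketch of schema-on-write versus schema-on-read; the event payloads are invented for illustration.

```python
import json
import sqlite3

# The raw events below are invented; the second one is missing a field.
raw_events = ['{"user": "a", "amount": 10.0}', '{"user": "b"}']

# Data warehouse style (schema-on-write): shape and validate the data
# *before* storing it; malformed records are rejected at load time.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user TEXT NOT NULL, amount REAL NOT NULL)")
for line in raw_events:
    record = json.loads(line)
    if "user" in record and "amount" in record:  # enforce the schema up front
        con.execute("INSERT INTO events VALUES (?, ?)",
                    (record["user"], record["amount"]))

# Data lake style (schema-on-read): store everything untouched and apply
# structure only when a consumer finally reads the data.
lake = list(raw_events)                           # raw lines, stored as-is
parsed = [json.loads(line) for line in lake]      # interpreted at read time
print(sum(r.get("amount", 0.0) for r in parsed))  # tolerate missing fields
```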

Automation & AI

  • Applying AI algorithms to Analytics will prove transformative…
    • But the complex merger requires a roadmap.
  • Greater AI and ML in enterprise resource planning will empower smarter processes to drive cost savings.
  • Digital replicas of physical processes let humans interact with IoT sensors, automating asset management.
  • Data center AI and Analytics will deliver more real-time intelligence, anticipating problems and implementing fixes before costly breakdowns.
  • The combination of elements of VR (Virtual Reality) and AR (Augmented Reality) with Data Analytics will grow in 2019…
    • But will accelerate even faster over the next two or three years.
  • Blockchain uses a shared digital ledger that is extremely hard for hackers to tamper with undetected (see the hash-chain sketch after this list).
  • Thanks to advances in security, more organizations will embrace the cloud, generating huge new datasets to inform advanced ML.
  • Companies will scramble to hire full-stack engineers with AI and Analytics skills, making them the hottest careers of 2019.
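To see why a tampered ledger is detectable, here is a minimal Python hash-chain sketch; it illustrates only the linking idea behind a blockchain, not consensus or distribution.

```python
import hashlib
import json

# Each block commits to the hash of its predecessor, so silently altering
# any past entry breaks every later link in the chain.

def block_hash(block):
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append(chain, data):
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"data": data, "prev": prev})

def verify(chain):
    return all(chain[i]["prev"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
append(chain, "alice pays bob 5")
append(chain, "bob pays carol 2")
print(verify(chain))                     # True: the ledger is intact
chain[0]["data"] = "alice pays bob 500"  # tamper with history
print(verify(chain))                     # False: tampering is detected
```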

Teradata targets Chinese clients’ need for AI

  • Teradata provides data technologies and Deep Learning (DL) solutions that outperform rules-based and Machine Learning (ML) approaches, helping clients accelerate their AI initiatives in the areas of:
    • Fraud detection
    • Manufacturing performance optimization
    • Risk modeling.
  • Machine Learning (ML) and AI generate higher quality predictive insights.
  • Accuracy is important in Data Science and Machine Learning.
  • Improving recommendations and predictions is the name of the game.
  • Another factor is the explainability of predictions (see the sketch after this list).
    • Deep Learning’s explainability challenges will limit its use by firms:
      • You do not trust what you cannot understand.
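To make the explainability point concrete, here is a minimal sketch using scikit-learn's LogisticRegression: a linear model's coefficients state directly how each feature pushes a prediction, which is exactly where deep networks struggle. The tiny fraud-style dataset is invented for illustration.

```python
from sklearn.linear_model import LogisticRegression

# Invented toy data: each row is (amount in $1000s, foreign card?, night-time?),
# labelled 1 if the transaction was flagged as fraud.
features = ["amount_k", "foreign", "night"]
X = [[0.01, 0, 0], [2.0, 1, 1], [0.015, 0, 1],
     [3.0, 1, 0], [0.025, 0, 0], [2.5, 1, 1]]
y = [0, 1, 0, 1, 0, 1]

model = LogisticRegression().fit(X, y)

# Each coefficient says how strongly a feature pushes the score toward
# "fraud", giving a human-readable explanation of every decision.
for name, coef in zip(features, model.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```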

*DevOps (Development Operations) is a set of software development practices that combines software development (Dev) and information technology operations (Ops). Its aim is to shorten the systems development life cycle while delivering features, fixes and updates in a way that aligns closely with business objectives.

  • Design Thinking is all about ability and learning: how to explore and solve problems, awakening the team’s creativity to reveal new ideas. These ideas will be followed up with Agile.
  • Key concepts of Design Thinking:
    • User personas: identification of the users being targeted.
    • Granularity of data: the greater the granularity, the deeper the level of detail revealed.
    • User journeys: they describe at a high level of detail exactly what steps different users take to complete a specific task within a system, application or website.
  • Agile is about how we adapt to changing conditions using software.
    • Scrum: an Agile framework for managing knowledge work.
    • Basic concepts related to Scrum:
      • Product backlog: a list of requirements that a Scrum team maintains for a product.
      • Backlog grooming is the ongoing process of:
        • Reviewing product backlog items and…
        • Checking they are correctly prepared and…
        • Ordering them in a way that makes them clear and executable for teams once they enter Sprints via the Sprint planning activity.
      • User story: an informal, natural-language description of one or more features of a software system…
      • Product owner: the product owner represents the product’s stakeholders and the voice of the customer.
        • Responsible for the product backlog and…
        • Accountable for maximizing the value delivered by the team.
      • Scrum master: Scrum is facilitated by a Scrum master…
        • Accountable for removing impediments to the team’s ability to deliver the product goals and deliverables.
      • Development team: it is responsible for delivering potentially shippable product increments at every Sprint.
      • Weightage prioritization: once the development team has prepared its Sprint backlog…
        • It forecasts (by voting) which tasks will be delivered within the Sprint (a minimal backlog sketch follows this list).
      • Sprint cycle: the Sprint is a timeboxed effort restricted to a specific duration
        • Fixed in advance for each Sprint:
          • Between one week and one month…
          • Two weeks being the most common.
        • Sprint events: each day during a Sprint…
          • The team holds a daily Scrum (or stand-up) with specific guidelines…
          • And at the end of a Sprint
            • The team holds two events:
              • The Sprint review.
              • The Sprint retrospective.
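To tie the Scrum terms above together, here is a minimal Python sketch of a groomed product backlog and a Sprint planning cut; the story fields, titles and capacity figure are illustrative, not prescriptive.

```python
from dataclasses import dataclass

# Hypothetical backlog items; titles, priorities and points are invented.

@dataclass
class UserStory:
    title: str     # informal, natural-language feature description
    priority: int  # lower number = higher business value
    points: int    # team's effort estimate

backlog = [
    UserStory("Export dashboard to PDF", priority=2, points=5),
    UserStory("Single sign-on login", priority=1, points=8),
    UserStory("Dark mode", priority=3, points=3),
]

# Backlog grooming: keep items ordered so the most valuable come first.
backlog.sort(key=lambda story: story.priority)

# Sprint planning: the team forecasts what fits the Sprint's capacity.
capacity, sprint_backlog = 13, []
for story in backlog:
    if sum(s.points for s in sprint_backlog) + story.points <= capacity:
        sprint_backlog.append(story)

print([story.title for story in sprint_backlog])
# ['Single sign-on login', 'Export dashboard to PDF']
```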