Agile Data & DataOps

To apply Agile in Analytics, a business must start with the basics:

  • Design Thinking *
  • Agile *

* These basic concepts are explained in some detail elsewhere in this text

Firstly, regarding Design Thinking, the following ideas ought to be clear:

  • Personas (target users): defining these requires attention to complexity of analysis, granularity of data, and security access.
  • Journeys: focus on ‘decision-driven’ user journeys that guide users toward successful sales.
  • Ideation workshop (the creative process of developing an ideal work area/space) that needs to center on business KPIs (Key Performance Indicators) and data driven decision-making.
  • Ideation needs to focus on the selection of appropriate Analytics solutions.
  • Story definition: one or more features of a system written from the perspective of an archetypal user.
  • Backlog grooming: the ongoing process of reviewing product backlog items: adding, removing, or re-prioritizing them.
  • Key roles:
    • Product owner
    • Development team
    • Agile coach
  • Weightage prioritization: the team forecasts, usually by voting, which tasks will be delivered within the Sprint.
  • Sprint cycle time: a timeboxed effort, the shorter the better (usually between 1 and 4 weeks).
  • Automation Testing is a ‘must’.
  • Sprint Retro: Mid Sprint and End of the Sprint…
    • Each day, during a Sprint, the team holds a daily stand-up meeting and…
    • At the end of a Sprint, the team holds two meetings:
      • Sprint review and…
      • Sprint retrospective
    • User Acceptance Testing.
  • The volume of data can be a challenge: expanding gigabytes, terabytes and petabytes need some ‘tin’ to store them on.
  • Managing applications is highly complex: they require an end-to-end solution.
  • The hottest data technologies are still:
    • SQL
    • R
    • Python
    • Hadoop within Data Science
    • Kafka, Scala, Spark underpinned with Java within Data Engineering.
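
As a minimal illustration of the first technology on that list — SQL for KPI-style analytics — the sketch below uses Python's built-in sqlite3 module; the table and column names are invented for the example, not taken from any real system.

```python
import sqlite3

# In-memory database with an invented sales table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("north", 80.0), ("south", 250.0)],
)

# A KPI-style aggregate: revenue per region, largest first.
rows = conn.execute(
    "SELECT region, SUM(amount) AS revenue "
    "FROM sales GROUP BY region ORDER BY revenue DESC"
).fetchall()
print(rows)  # [('south', 250.0), ('north', 200.0)]
```

The same GROUP BY pattern scales from a laptop sketch like this to the warehouse engines these stacks are built on.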

Adjuvant Technologies

Use case: Five reasons why Credit Unions must be involved in data pooling:

  • Access to diverse data.
  • Affordable access to data scientists.
  • Data must be encrypted and secure.
  • Quantity of data for Predictive Analytics.
  • Near real time (less than 4 months old) industry data for peer to peer analysis.

Use case: when Lentiq combines Data Lake with Edge Computing to create “interconnected micro Data Lakes”, it aims to give access to data, for Analytics and Machine Learning, to as many people as possible in an organization.

In doing so, we need to keep in mind:

  • That it is a common tactic to combine two technologies for the sake of synergy.
  • You can use any device that supports a flat file system, even a mainframe.
  • The Edge is supposed to act as a filter for unnecessary data (everything which is not smart data).
    • An Edge system getting car data, for instance, doesn’t want sensor readings saying everything is normal, it wants the unusual or aberrant.
    • That’s what gets sent up to the main data center (that’s how a data warehouse operates).
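
The edge-as-filter idea above can be sketched in a few lines of Python; the "normal" band and the reading format are assumptions made for the example.

```python
# Edge-side filter: forward only aberrant sensor readings to the data center.
NORMAL_RANGE = (60.0, 100.0)  # assumed "everything is normal" band

def filter_at_edge(readings):
    """Keep only readings outside the normal band -- the 'smart data'
    worth sending upstream to the main data center."""
    low, high = NORMAL_RANGE
    return [r for r in readings if not (low <= r["value"] <= high)]

readings = [
    {"sensor": "engine_temp", "value": 82.0},   # normal, dropped at the edge
    {"sensor": "engine_temp", "value": 131.5},  # aberrant, sent upstream
    {"sensor": "engine_temp", "value": 55.0},   # aberrant, sent upstream
]
print(filter_at_edge(readings))  # keeps the 131.5 and 55.0 readings
```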

What is “Edge”?

‘Edge’ is short for ‘Edge Computing’. It is the hot buzzword when targeting departments and remote locations/offices.

  • Even though 5G will provide the edge with far better connectivity & lower latency to cloud-based applications, there is still the cost of processing & storing the data. A hybrid edge compute/5G solution will mitigate these costs.
  • While many monitoring sites have some form of limited connectivity, they cannot accommodate the large volume of rich data that 5G can handle…
  • So the Edge can help with these challenges.

Automation & AI

Why algorithms are the future of business success.

Firms must consider that:

  • They will need to create increasingly complex algorithms to maintain the level of performance of automation:
    • As the data increases and changes continuously, algorithms need to be adjusted at the same pace.
    • Insights can be modified according to new data, so the refinement of an algorithm is a never-ending process.
  • They need to know more about the exciting fields emerging at the frontier of this new world: Machine Learning (ML) and Deep Learning (DL).
    • These two fields emerged when computer performance advanced enough to run more complex algorithms.
  • Both ML and DL are now the doorway into a new era of Artificial Intelligence (AI).
    • Biotricity is developing personalized and predictive feedback for each patient through AI that learns how patients react differently, based on real-time data. Healthcare will thus shift toward preventive care.
  • Organizations that adapt to this evolution will come out on top.
  • Those who don’t, will quickly find themselves lagging behind.
  • Digital behavioral solutions and apps hold enormous promise for addressing the gaps in mental healthcare in the US and across the world.
  • Smartphone-based cognitive behavioral therapy and integrated group therapy are showing promise in treating conditions such as:
    • Depression
    • Eating disorders
    • Substance abuse.
  • More researchers are developing AI-based tools backed by randomized clinical trials, which are showing good results.
  • dotData develops a “white box” (a subsystem whose internals can be viewed but usually not altered) Data Science platform, which can automate a good chunk of the Machine Learning (ML) pipeline.
  • The software, which runs on Hadoop (a collection of open-source software utilities), leverages supervised and unsupervised ML algorithms.
  • This platform can automate the end-to-end process of preparing raw data for feature engineering (the process of using domain knowledge of the data to create features that make Machine Learning algorithms work) in ML.
  • The features are transparent and easy to understand by domain experts.
  • This transparency is the key to automated feature engineering.
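
dotData's internals are proprietary, so as a generic, hedged sketch of what automated feature engineering produces, the snippet below derives simple, human-readable aggregate features from raw transaction rows; the field names and the choice of aggregates are illustrative assumptions.

```python
from statistics import mean

def engineer_features(transactions):
    """Turn raw per-customer transactions into transparent aggregate
    features -- the kind a domain expert can read and sanity-check."""
    by_customer = {}
    for t in transactions:
        by_customer.setdefault(t["customer"], []).append(t["amount"])
    return {
        cust: {
            "txn_count": len(amts),   # how many transactions
            "txn_total": sum(amts),   # total spend
            "txn_mean": mean(amts),   # average transaction size
        }
        for cust, amts in by_customer.items()
    }

raw = [
    {"customer": "A", "amount": 10.0},
    {"customer": "A", "amount": 30.0},
    {"customer": "B", "amount": 5.0},
]
print(engineer_features(raw))
```

Each derived feature has a plain-language meaning, which is what makes such output easy for domain experts to understand and validate.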

Natural Language Processing and Analytics

  • Call center recordings can be analyzed and then made available as a data source in an enriched text format that can be used to deliver rich visualizations.
  • As a result, firms have access to insights from Natural Language Processing (NLP) which surfaces keywords and topics that make recorded content discoverable.
  • Combining this call-center data with other data sources yields more comprehensive and nuanced outputs for businesses to act upon.
  • A white box (or glass box, clear box, or open box) is a subsystem whose internals can be viewed but usually not altered.
    • Having access to the subsystem internals makes the subsystem easier to understand, but also easier to hack.
  • Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation.
  • Feature engineering is the process of using domain knowledge of the data to create features that make Machine Learning algorithms work.
    • It is fundamental to the application of Machine Learning, and it is both difficult and expensive.
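
The NLP flow described above — surfacing keywords and topics from call-center recordings — can be sketched minimally in Python. Real pipelines use full NLP toolkits; the tiny stopword list and the frequency-based ranking here are simplifying assumptions for illustration.

```python
from collections import Counter

# A tiny illustrative stopword list; real NLP pipelines use far larger ones.
STOPWORDS = {"the", "a", "i", "to", "my", "is", "and", "of",
             "it", "on", "was", "at", "again"}

def surface_keywords(transcripts, top_n=3):
    """Count non-stopword terms across call transcripts and return the
    most frequent ones -- a crude stand-in for NLP keyword surfacing."""
    counts = Counter()
    for text in transcripts:
        for word in text.lower().split():
            word = word.strip(".,!?")
            if word and word not in STOPWORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]

calls = [
    "My card was declined at the store.",
    "I want to dispute a charge on my card.",
    "The card payment failed again.",
]
print(surface_keywords(calls))  # 'card' should rank first
```

Surfaced keywords like these become the enriched text fields that make recorded content discoverable and visualizable downstream.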