Data First Architecture: simple data architecture steps that prep for machine learning

By Miriah Peterson

Elevator Pitch

In software development it is common practice to treat data as a second-class citizen. By bringing data to the forefront of our architecture and development, we make it easier to integrate ML into our products in ways that simplify the development process and yield a higher Return on Investment (ROI).

Description

Abstract

In software development it is common practice to treat data as a second-class citizen. Yet it is becoming more necessary to integrate data-heavy machine learning tools into products and platforms to gain a competitive edge in the market. This has traditionally been hard, due both to a lack of experience on engineering teams and to the high cost and low Return on Investment of hiring and building dedicated data teams. With advancements in, and the productization of, Artificial Intelligence, it is now easier for software engineers to leverage machine learning tools and models in product development. By bringing data to the forefront of our architecture and development, we make it easier to integrate ML into our products in ways that simplify the development process and yield a higher Return on Investment.

Talk Format

Types of Data

Data: Data (treated as singular, plural, or as a mass noun) is any sequence of one or more symbols given meaning by specific act(s) of interpretation.

Types:

  • Machine-generated data
  • User-generated data
  • Structured data: data with a schema
  • Unstructured data: raw data with no predefined schema
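To make the structured/unstructured distinction concrete, here is a minimal Python sketch that turns one unstructured log line into a structured record. The log format, the `LOG_PATTERN` regex, and the `LogRecord` fields are hypothetical and chosen only for illustration. Real logs will need their own parsing rules.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical log format, assumed for illustration:
#   "<timestamp> <LEVEL> <service> <free-text message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)$"
)


@dataclass
class LogRecord:
    """Structured form of one log line: every field is named."""
    timestamp: str
    level: str
    service: str
    message: str


def parse_log_line(line: str) -> Optional[LogRecord]:
    """Parse an unstructured log line; return None if it does not fit the assumed format."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # The named regex groups line up one-to-one with the dataclass fields.
    return LogRecord(**match.groupdict())


record = parse_log_line("2020-05-28T12:00:00Z ERROR payments timeout after 30s")
print(record)
```

After parsing, the data can be filtered, counted, or joined by field. That is what a data scientist needs before any model can use it.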

Machine-generated data

What types of machine-generated data does your software produce?

  • Event data
  • Error data
  • Network usage data
  • App action (click-stream) data
  • Database change capture (anomaly detection)
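One data-first habit is to emit machine-generated events with an explicit schema at the source, instead of parsing them out of logs later. Below is a minimal Python sketch of that idea. The click-stream event shape (`AppEvent` and its fields) is hypothetical and illustrative, not a standard.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class AppEvent:
    """Hypothetical click-stream event schema (illustrative field names)."""
    event_type: str                      # e.g. "button_click"
    user_id: str
    properties: Dict[str, Any] = field(default_factory=dict)
    # Filled in automatically so every event is uniquely identifiable and UTC time-stamped.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    schema_version: int = 1              # bump when the schema changes


def to_json(event: AppEvent) -> str:
    """Serialize an event to JSON with a stable key order, ready for a log, queue, or table."""
    return json.dumps(asdict(event), sort_keys=True)


print(to_json(AppEvent("button_click", "user-123", {"button": "checkout"})))
```

The `schema_version` field is an assumed convention. It lets downstream consumers, such as a training pipeline, handle the schema changing over time.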

User-generated data

What types of user-generated data does your software produce?

  • Account information
  • Personal information
  • Transactions (payment/currency)
  • Product data
  • Metadata

Ways to use your data

  • Directly in the product
  • Dashboard/Reports

Machine learning on stored data is generally an untapped resource for software teams.

Machine Learning Applications

  • The reason for data-first architecture: the easier the data is for a data scientist to consume, the quicker you see ROI.

Monitoring usage to trigger an action (predictive autoscaling).
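Here is a rough Python sketch of predictive autoscaling. It fits a straight-line trend to recent request rates, forecasts the next interval, and converts that forecast into a replica count. The function names, the per-replica capacity, and the min/max bounds are hypothetical parameters. A production system would use a real forecasting model and its platform's own autoscaling API.

```python
import math
from typing import Sequence


def linear_forecast(history: Sequence[float], steps_ahead: int = 1) -> float:
    """Fit y = a + b*t by ordinary least squares and extrapolate steps_ahead intervals."""
    n = len(history)
    if n == 0:
        raise ValueError("history must not be empty")
    if n == 1:
        return float(history[0])
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + steps_ahead)


def desired_replicas(
    history: Sequence[float],
    capacity_per_replica: float,
    min_replicas: int = 1,
    max_replicas: int = 20,
    steps_ahead: int = 1,
) -> int:
    """Scale on forecast load rather than current load, clamped to [min_replicas, max_replicas]."""
    forecast = max(0.0, linear_forecast(history, steps_ahead))
    needed = math.ceil(forecast / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))


print(desired_replicas([100, 200, 300, 400], capacity_per_replica=100))
```

Consider request rates of 100, 200, 300, and 400 per second with a capacity of 100 per replica. These forecast 500 for the next interval and therefore ask for 5 replicas. The design choice is that the scaler acts on where usage is heading rather than where it is now.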

Forecasting outages/high network usage
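One common technique for forecasting high network usage is Holt's linear (double exponential) smoothing, which tracks a smoothed level and trend. The sketch below uses it as an illustrative stand-in; the talk does not prescribe this method. The smoothing constants `alpha` and `beta`, the threshold, and the function names are all assumptions.

```python
from typing import Sequence


def holt_forecast(
    series: Sequence[float], alpha: float = 0.5, beta: float = 0.3, steps_ahead: int = 1
) -> float:
    """Holt's linear smoothing: update a level and a trend per observation, then extrapolate."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]  # simple initial trend estimate
    for value in series[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + steps_ahead * trend


def will_exceed(series: Sequence[float], threshold: float, steps_ahead: int = 3) -> bool:
    """Flag when forecast usage steps_ahead intervals out crosses the threshold."""
    return holt_forecast(series, steps_ahead=steps_ahead) > threshold


# Hypothetical throughput samples (e.g. Mbps per interval) trending upward.
print(will_exceed([10, 20, 30, 40], threshold=60))
```

A warning like this gives operators a chance to act before an outage rather than after. The constants would need tuning against your own historical data.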

Product enhancement (weave sentiment analysis)

Data Architecture as part of planning process

### Questions to ask when planning

  • Who is creating, consuming, or communicating the data?
    • Where is my data ultimately consumed?
  • How much does it cost to store the data?
  • What are availability needs?
    • Does the product need to consume this data?
    • Do other services need to consume this data?
    • How often is the data consumed?
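The storage-cost question can often be answered with back-of-the-envelope arithmetic before any infrastructure exists. Below is a minimal Python sketch. It assumes storage is billed per decimal GB per month, and it ignores replication, compression, and indexing overhead. The price is an input you would take from your provider.

```python
def monthly_storage_cost(
    events_per_day: float,
    bytes_per_event: float,
    retention_days: int,
    price_per_gb_month: float,
) -> float:
    """Steady-state monthly cost of keeping retention_days worth of events in storage."""
    stored_bytes = events_per_day * bytes_per_event * retention_days
    stored_gb = stored_bytes / 1e9  # decimal gigabytes
    return stored_gb * price_per_gb_month


# Hypothetical workload and price, for illustration only.
print(monthly_storage_cost(10_000_000, 500, 90, 0.02))
```

For example, 10 million events per day at roughly 500 bytes each, kept for 90 days, comes to 450 GB. At a hypothetical $0.02 per GB-month, that is about $9 per month.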

### Methods for adding Machine Learning Models

  • API calls
  • Static model
  • Part of an event stream
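The three integration methods can share one interface, so product code does not need to care where the model runs. The Python sketch below is illustrative only. All of its pieces are assumptions:

  • the `Model` protocol
  • the linear `StaticModel`
  • the HTTP endpoint and JSON contract in `ApiModel`
  • the in-memory iterable standing in for an event stream, such as a message-queue consumer

```python
import json
import urllib.request
from typing import Dict, Iterable, Iterator, Protocol

Features = Dict[str, float]


class Model(Protocol):
    def predict(self, features: Features) -> float: ...


class StaticModel:
    """Static model: weights ship with the service and are evaluated in-process (linear here)."""

    def __init__(self, weights: Features, bias: float = 0.0) -> None:
        self.weights = weights
        self.bias = bias

    def predict(self, features: Features) -> float:
        # Missing features count as 0.0.
        return self.bias + sum(w * features.get(name, 0.0) for name, w in self.weights.items())


class ApiModel:
    """API call: send features to a separately served model.

    The URL and the {"features": ...} -> {"prediction": ...} contract are hypothetical.
    """

    def __init__(self, url: str, timeout: float = 2.0) -> None:
        self.url = url
        self.timeout = timeout

    def predict(self, features: Features) -> float:
        body = json.dumps({"features": features}).encode("utf-8")
        request = urllib.request.Request(
            self.url, data=body, headers={"Content-Type": "application/json"}, method="POST"
        )
        with urllib.request.urlopen(request, timeout=self.timeout) as response:
            return float(json.load(response)["prediction"])


def score_stream(events: Iterable[Features], model: Model) -> Iterator[Features]:
    """Part of the event stream: attach a prediction to each event as it flows through."""
    for event in events:
        yield {**event, "prediction": model.predict(event)}


model = StaticModel({"cpu": 2.0, "mem": 0.5}, bias=1.0)
for scored in score_stream([{"cpu": 1.0}, {"cpu": 2.0, "mem": 2.0}], model):
    print(scored)
```

Swapping `StaticModel` for `ApiModel` changes where inference happens without touching `score_stream`. That flexibility is one argument for keeping the data contract, rather than the model, at the center of the design.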

Where do we start?

  • Pain points
  • Central Features
  • Product needs

Questions?

Resources

  • Wikipedia: Data (computing)
  • Software Engineering Daily, May 28, 2020
  • Machine-generated vs. human-generated data
  • Cloudera: autoscaling
  • Data Driven Applications
  • Kubernetes: autoscaling
  • Wikipedia: Autoscaling
  • AWS: predictive autoscaling

Notes

This talk was originally designed to be very interactive with heavy audience involvement.