Coffee Dashboard: Processing and Optimizing the Dataset

In the previous blog post, we prepared and uploaded an XML export from our Franke A400 coffee machine. As we’re working with machine data, the data prep journey isn’t over yet. Let’s talk about how to find out what’s in the data, what’s left to be done, and how to resolve the remaining issues.

One nasty thing about working with machine data: while it’s typically very clean and structured (it is, after all, written out by a machine following predefined rules), it isn’t designed to be read and understood by humans in the first place. In order to work with this data, we have to take a deeper look at what we loaded into Analytics and work out the meaning of the relevant fields.

Column view in the recipe builder.

Let’s start by building a recipe. The Dataset Recipe offers quite a number of helpful tools for our task. Switch to the “Column” view to see a list of column names and a status bar that indicates how well each column is populated with values. An important note of caution: only a sample of the rows is displayed, and the sample size defaults to 5,000. So before you decide to drop any columns, make sure you review the change with a larger sample size or by exploring the dataset itself.

For the machine data we’re using, I know that relevant changes were introduced with a newer software version, so I start by switching to the larger sample size of 10,000 rows.

Column profile widget: You can see that only about half of the rows are populated with a value, and there are only two different values in the data – a typical profile for a dimension such as a software version.

Let’s start by analyzing the first column. Its name is “Ver”, which gives us a little clue about its purpose, but we need further investigation. Using the column profile, we can see that more than half of the rows don’t have data, and that the rest of the sample contains only two values, which follow typical version naming conventions. It’s a software version, and “Ver” is obviously a lazy column name (or just an “efficient memory footprint”). Let’s switch to the “Attributes” widget and change the label to “System Version”.

A few more columns have obvious content: “Ti” and “TiUtc” have both been identified as date fields, and the names already hint at the content. While “Ti” is populated everywhere, “TiUtc” seems to have been filled only recently, and it contains redundant information, as we’re only checking the data from a single machine in a single timezone. Let’s change the label of “Ti” to “Datetime” and drop the UTC representation of the date and time (“TiUtc”).

Histogram for one of the low (no) variance columns

Next, let’s tackle two archenemies of any analytic approach: columns with low variance in their data, and columns that exist but are empty. Low variance can happen and should be examined, but no variance and no values point at columns that are irrelevant for analysis: they either hold the same value in every row (so you don’t need an analysis to get to that value), or they contain nothing at all. Either way, they are perfectly predictable and without any value.

Going through the list of fields and looking at their value histograms, we can determine for most of the fields whether they are potentially useful for an analysis or not.
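
If you want to double-check a column outside of the recipe UI, a quick SAQL query in the dataset explorer does the same job as the histogram. Here’s a minimal sketch for the “Ver” column, assuming the uploaded dataset carries the alias “e_xportNew2” (the same alias the dataflow below loads):

q = load "e_xportNew2";
q = group q by 'Ver';
q = foreach q generate 'Ver' as 'Ver', count() as 'count';
q = order q by 'count' desc;

A column that comes back with a single group (or only nulls) has no variance and is a candidate for dropping.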

And there’s more to find – let’s look at this histogram and the raw data, and go into the details of what we can see here:

Did you find the potential issues?

  • There’s a line “2 Espressi”. The issue is – quite obviously – that the quantity and the product are combined in one field. If we applied the Count(Rows) function, we’d end up with 2 (rows), not 3 (cups served). So one task in data prep is to split quantity and product name (see the SAQL sketch after this list).
  • Next: “Teewasser undosiert”. You won’t get to the core without translating the product name. It’s not a real issue, but a hint: the machine dispenses hot water (“Teewasser”) in whatever quantity the user chooses (“undosiert”, i.e. unmetered). The same option exists for hot milk, cold milk and steamed milk, and it will come in very handy for understanding the cryptically labeled telemetry fields later.
  • “rinsing”: these are lines we typically want to filter out, as the machine runs a lot of automated rinsing programs over the day, and those are definitely not drinks consumed by the team.
  • Did you spot the last important issue? It’s “Latte Macchiato” vs. “Latte macchiato” – the capitalization differs between rows.
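
To see how the combined field skews counts, compare a plain row count with a sum over an extracted quantity. Here’s a minimal SAQL sketch, assuming the 'Quantity' field and the “CoffeeData” dataset that the dataflow below creates:

q = load "CoffeeData";
q = group q by all;
q = foreach q generate count() as 'Rows', sum('Quantity') as 'Cups';

For the two rows “Espresso” and “2 Espressi”, 'Rows' returns 2 while 'Cups' returns the correct 3.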

Handling the Data Prep in a Dataflow

I decided to handle these adjustments in a dataflow because I want to do a few things that are not currently available in Recipes (soon: Data Prep):

  • I want to normalize the capitalization of the product names, which differs between firmware versions of the machine (that’s a replace function in Recipes and a ComputeExpression in Dataflow).
  • I want to extract the quantities from single and double shots – this can be combined into a case formula, which is available as a ComputeExpression in Dataflow.
  • I want to create canonical product names, so that “2 Espressi” resolves to a user interaction (pushed the “2 Espressi” button), a quantity (2) and a product (“Espresso”). Again, I’m going to use a ComputeExpression.
  • … and some buckets for Tea, Coffee, Chocolate and Milk.
  • Then, some transformations for the date – day, week number, month name, day name, and hour as a dimension.
  • And finally, we drop all unused columns and the auxiliary columns that were created in the process.

We end up with a dataflow like this:

The complete JSON definition looks like this:

{
  "filter rinsing": {
    "action": "filter",
    "parameters": {
      "source": "Load Machine Data",
      "saqlFilter": "'Na' != \"rinsing\""
    }
  },
  "Store Coffee Data": {
    "action": "sfdcRegister",
    "parameters": {
      "name": "Coffee Data",
      "alias": "CoffeeData",
      "source": "drop Fields"
    }
  },
  "create Fields": {
    "action": "computeExpression",
    "parameters": {
      "source": "filter rinsing",
      "mergeWithSource": true,
      "computedFields": [
        {
          "defaultValue": "0",
          "precision": 2,
          "name": "WeekNumber",
          "saqlExpression": "Ti_Week",
          "label": "Week Number",
          "type": "Text"
        },
        {
          "defaultValue": "0",
          "precision": 1,
          "saqlExpression": "day_in_week(toDate('Ti_sec_epoch'))",
          "name": "DayInWeek",
          "scale": 0,
          "label": "Day In Week",
          "type": "Numeric"
        },
        {
          "defaultValue": "0",
          "precision": 3,
          "saqlExpression": "day_in_year(toDate('Ti_sec_epoch'))",
          "name": "DayInYear",
          "scale": 0,
          "label": "Day In Year",
          "type": "Numeric"
        },
        {
          "defaultValue": "0",
          "precision": 2,
          "saqlExpression": "Ti_Hour",
          "name": "Hour",
          "label": "Hour",
          "type": "Text"
        },
        {
          "saqlExpression": "case \n  Ti_Month\n  when \"01\" then \"January\"\n  when \"02\" then \"February\"\n  when \"03\" then \"March\"\n  when \"04\" then \"April\"\n  when \"05\" then \"May\"\n  when \"06\" then \"June\"\n  when \"07\" then \"June\"\n  when \"08\" then \"August\"\n  when \"09\" then \"September\"\n  when \"10\" then \"October\"\n  when \"11\" then \"November\"\n  when \"12\" then \"December\"\nelse \"\"\nend",
          "name": "MonthName",
          "label": "Month",
          "type": "Text"
        },
        {
          "name": "DayName",
          "saqlExpression": "case \n  DayInWeek\n  when 1 then \"Sunday\"\n  when 2 then \"Monday\"\n  when 3 then \"Tuesday\"\n  when 4 then \"Wednesday\"\n  when 5 then \"Thursday\"\n  when 6 then \"Friday\"\n  when 7 then \"Saturday\"\nelse \"\"\nend",
          "label": "Day",
          "type": "Text"
        },
        {
          "type": "Text",
          "saqlExpression": "replace('Na',\"macchiato\",\"Macchiato\")",
          "name": "Name",
          "label": "Name"
        },
        {
          "type": "Numeric",
          "saqlExpression": "case when substr('Na',1,1) == \"2\" then 2\nelse 1\nend",
          "name": "Quantity",
          "label": "Quantity",
          "precision": 1,
          "defaultValue": "1",
          "scale": 0
        },
        {
          "type": "Text",
          "saqlExpression": "case 'Na'\nwhen \"Teewasser undosiert\" then \"Tea\"\nwhen \"Milchschaum undosiert\" then \"Milk\"\nwhen \"Warme Milch undosiert\" then \"Milk\"\nwhen \"Kalte Milch undosiert\" then \"Milk\"\nwhen \"Schokolade\" then \"Chocolate\"\nelse \"Coffee\"\nend\n",
          "name": "ProductType",
          "label": "Product Type"
        },
        {
          "type": "Text",
          "name": "ProductName",
          "label": "Product Name",
          "saqlExpression": "case \n  'Name'\n  when \"2 Cafés Crème\" then \"Café Crème\"\n  when \"2 Cappuccini\" then \"Cappuccino\"\n  when \"2 Espressi\" then \"Espresso\"\n  when \"2 Latte Macchiato\" then \"Latte Macchiato\"\n  when \"2 Milchkaffees\" then \"Milchkaffee\"\nelse  'Name'\nend"
        }
      ]
    }
  },
  "Load Machine Data": {
    "action": "edgemart",
    "parameters": {alias": "e_xportNew2"}
  },
  "drop Fields": {
    "action": "sliceDataset",
    "parameters": {
      "mode": "select",
      "fields": [
        {"name": "WeekNumber"},
        {"name": "Tl5"},
        {"name": "Tl4"},
        {"name": "Tl3"},
        {"name": "Ti"},
        {"name": "Tf"},
        {"name": "Tb3"},
        {"name": "Tb2"},
        {"name": "Tb1"},
        {"name": "Re"},
        {"name": "Quantity},
        {"name": "ProductType"},
        {"name": "ProductName"},
        {"name": "Pp"},
        {"name": "Pl"},
        {"name": "Ms"},
        {"name": "MonthName"},
        {"name": "Miv1"},
        {"name": "Max"},
        {"name": "Io"},
        {"name": "Hour"},
        {"name": "Fl"},
        {"name": "Dd1"},
        {"name": "DayName"},
        {"name": "DayInYear"},
        {"name": "DayInWeek"},
        {"name": "Bt"},
        {"name": "Bpt"},
        {"name": "Bi2"},
        {"name": "Bi1"},
        {"name": "Bam"},
        { "name": "Name"}
      ],
      "source": "create Fields"
    }
  }
}

As a last step, let’s open the newly created dataset by double-clicking it (or using the “Explore” action). Ignore the chart for now and just click on “Field” to bring up the Dataset XMD editor. There’s at least one column that could do with a human-readable name, and that’s “Ti”. Let’s call it “DateTime” by clicking on the name, entering the more readable label, hitting return, and clicking the save button. Now we’re set for the first analysis – stay tuned!
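
For reference, such a label change is stored in the dataset’s extended metadata (XMD). What follows is a minimal sketch of the fragment that might result – the keys are an assumption based on the XMD 2.0 format, so check the XMD documentation for your release:

{
  "dates": [
    {
      "alias": "Ti",
      "label": "DateTime"
    }
  ]
}

Since “Ti” was detected as a date field, its label would live in the “dates” section; plain text fields would get their labels in “dimensions” instead.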

Coffee Dashboard Series

Coffee Dashboard – Preparing the Data

When you begin to build anything related to data, you first have to get a clear understanding of the question(s) you want answered, and of the data you want to “ask” to find the answers. What sounds like a banality is in fact an essential technique: breaking down the questions on the one hand and understanding your data sources on the other, until you arrive at questions, data and a granularity that match – so that you can finally start building.

Continue reading “Coffee Dashboard – Preparing the Data”

Prologue: Building a Coffee Consumption Dashboard

When I started with Einstein Analytics, I had little business data at hand, and I wanted to build something I could demo to customers in a friendly and entertaining way that lets them understand what Einstein Analytics is capable of – without having to show our company’s data, or fictitious data that people may or may not relate to. Around the same time, we got a new, professional coffee maker that can export machine data detailing our coffee consumption.

Well – a coffee dashboard! Now that’s something everyone knows: isn’t the coffee maker the number one social space in every company? Isn’t great coffee more than a perk, but rather a necessity of work in the 21st century?

Long story short: I built this dashboard, I maintained it for a while, and it always served its purpose very well. But Einstein Analytics moved more and more to the center of our business intelligence, requiring me to take better care of the assets we hold in Analytics and, in particular, to declutter our org from time to time. I removed the old coffee dashboard to keep the org clean, and this could have been the end of the story…

… until I posted about passing the Einstein Analytics and Discovery Consultant certification. I mentioned the coffee dashboard, and people asked for it. So I decided to rebuild it, as it makes for a good tutorial on how to leverage the features of Analytics to consume external data, run some preparation, and build a fancy and fun dashboard. So – join me on this ride back to my Einstein Analytics roots, in what was then Wave Analytics. I plan to blog about each step on a bi- or tri-weekly basis.

  1. Data Prep – from XML to Analytics Dataset
  2. Optimizing and Processing the Data

CzechDreamin ’19: Two Gigs in Prague!

Another community conference debut in 2019 is CzechDreamin’ in Prague. Martin Humpolec and his team have crafted a wonderful four-track conference, and there’s plenty of interesting stuff to learn: Rikke Hovgaard will give an Analytics workshop, René Winkelmeyer and Philippe Özil will present on LWC, and much, much more. If your calendar allows, grab one of the last tickets and come to Prague on August 16th.

I’ll be hitting the stage twice:

  • In Klub A at 14:45, I’ll review how the Platform Cache can help you boost your app’s performance. Platform Cache was introduced in 2016, and a lot has changed since then – learn how (and why) to use it, and which use cases still exist after the advent of the heavily cached Lightning Experience.
  • And at 16:45, again in Klub A, Christian Szandor Knapp and I will have yet another gig together, but this time with a side note. While researching the topic we had planned (and how to give it a Kafka theme), we found ourselves in a kafkaesque story in which everything that seemed obvious looked less and less clear the closer we looked.

Again – get your tickets while stocks last – it’s going to be the #1 summer conference in Europe!

YeurDreamin ’19: Put the Wow! into your Flow

Christian Szandor Knapp and I ran a hands-on workshop at the first installment of the YeurDreamin’ conference in Amsterdam on June 14th, 2019. It was a wonderful summer day, and the Amsterdam crew – Paul Ginsberg, Kathryn Chlosta, André van Kampen, Kevin Jackson, and their team – really made it stand out.

[Slides: “Put the Wow! into your Flow” workshop deck on SlideShare]

Continue reading “YeurDreamin ’19: Put the Wow! into your Flow”

ForceAcademy LA ’19: Clicks, Code, Relationship Therapy

I was a returning speaker at ForceAcademy LA on May 13, 2019, with a presentation on the relation between clicks and code – the things that can possibly go wrong and how to find and fix them.

Clicks AND code, limits, performance and the Order of Execution – all of these are key topics that I’m doing a lot of research on.

Continue reading “ForceAcademy LA ’19: Clicks, Code, Relationship Therapy”

“Your admin toolbelt is not complete without Salesforce DX” – #FTD18

I hit the stage once again with Christian Szandor Knapp. This time, we were trying to make life better for Salesforce Admins. In our humble opinion, Salesforce DX and especially the CLI are the number one power tool for admins – and that’s the idea we promoted at French Touch Dreamin’ in Paris on November 24th, 2018.

Video to follow later.

Resource: https://github.com/open-force/sfcli-cookbook