Version: v2.6

Sample Data for the Databricks App

MEHRWERK provides sample data that you can use with the Databricks App. This tutorial will walk you through the steps to install the sample data and connect it to the mining app.

Note: Integrating the mined data from Databricks into your preferred BI platform is outside the scope of this section.

Prerequisites

To follow this guide, the following requirements must be met.

  • You have access to a Databricks Workspace in which you can search and get listings from the Marketplace.
  • The sample data sets and the mpmX Process Mining App have been shared with your account.
      • For this, we need your Databricks sharing identifier.

To find your Databricks sharing identifier, navigate to the Catalog page in your Databricks Workspace. Click the ⚙︎ symbol above the catalog browser and select Delta Sharing. From the dropdown in the upper right corner, copy your sharing identifier.

Copy sharing identifier
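
As an alternative to the UI steps above, the sharing identifier can also be read from a notebook with the Databricks SQL function `CURRENT_METASTORE()`. The following is a minimal sketch, not an official mpmX helper: the function names are hypothetical, and the format check assumes the identifier has the shape `<cloud>:<region>:<metastore-uuid>`.

```python
import re

# Assumed shape of a sharing identifier: <cloud>:<region>:<metastore-uuid>.
# This pattern is an illustration, not an official specification.
SHARING_ID_PATTERN = re.compile(
    r"^[a-z]+:[a-z0-9-]+:"
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def looks_like_sharing_identifier(value: str) -> bool:
    """Return True if the value matches the assumed identifier shape."""
    return SHARING_ID_PATTERN.match(value.strip()) is not None


def get_sharing_identifier(spark) -> str:
    """Query the current Unity Catalog metastore ID via a Spark session."""
    row = spark.sql("SELECT CURRENT_METASTORE() AS sharing_identifier").first()
    return row["sharing_identifier"]


# In a Databricks notebook, where `spark` is predefined:
# print(get_sharing_identifier(spark))
```

The format check is only a sanity test before you send the value on; the identifier copied from the Delta Sharing dialog remains the reference.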

Sample data

Get the shared sample data

Log in to your Databricks account. Make sure that you have permission to get listings shared with you. Navigate to Marketplace and find the mpmX - Sample Data listing. Open it and click the Get instant access button in the upper right corner.

Find the Sample Data in the Marketplace
Click Button

On the following screen, you may change some settings. In this tutorial, we will proceed with the default values. Check the box and click the Get instant access button. When everything has finished, the button in the upper right corner changes to Open.

Get instant access
Open Sample Data

Clicking Open takes you to the catalog browser. If you didn't change any settings in the dialog above, you will find a catalog named mehrwerk_gmbh_mpmx_sample_data with a structure as shown in the image below.

Sample Data Overview
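
The catalog can also be inspected from a notebook instead of the catalog browser. The sketch below builds the `SHOW SCHEMAS` and `SHOW TABLES` statements for the default catalog name; the helper names and the schema name `example_schema` are hypothetical placeholders, since the actual schema names depend on the listing.

```python
# Default catalog name from this tutorial; adjust it if you changed
# the settings when getting the listing.
CATALOG = "mehrwerk_gmbh_mpmx_sample_data"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def list_schemas_sql(catalog: str = CATALOG) -> str:
    """Build a statement that lists all schemas in the catalog."""
    return f"SHOW SCHEMAS IN {quote_identifier(catalog)}"


def list_tables_sql(schema: str, catalog: str = CATALOG) -> str:
    """Build a statement that lists all tables in one schema."""
    return f"SHOW TABLES IN {quote_identifier(catalog)}.{quote_identifier(schema)}"


# In a Databricks notebook, where `spark` is predefined:
# spark.sql(list_schemas_sql()).show(truncate=False)
# spark.sql(list_tables_sql("example_schema")).show(truncate=False)  # hypothetical schema
```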

Description of the provided data sets

The shared sample data comprises several data sets, each consisting of an event log and case dimensions, and in some cases event dimensions.

P2P sample data

This data set contains sample data for object-centric process mining of a typical P2P process. It has the following characteristics:

  • # Variants: 5,276
  • # Objects: 4
      • Order
      • Order Item
      • Goods
      • Invoice
  • # Cases: 422,413
  • # Activities: 42
  • # Events: 783,723

The data provided includes an event log and case dimensions.
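
To illustrate what the figures listed above mean, the following sketch derives the number of events, cases, activities and variants from a tiny, made-up event log. It assumes a simple case-centric log of `(case_id, activity, timestamp)` tuples and defines a variant as a distinct sequence of activities within a case; the activity names are hypothetical, and the mpmX app may compute these figures differently, in particular for object-centric logs.

```python
from collections import defaultdict


def summarize_event_log(events):
    """Summarize an iterable of (case_id, activity, timestamp) tuples."""
    traces = defaultdict(list)
    for case_id, activity, timestamp in events:
        traces[case_id].append((timestamp, activity))

    # A variant is the ordered sequence of activities of one case.
    variants = {
        tuple(activity for _, activity in sorted(trace))
        for trace in traces.values()
    }
    activities = {
        activity for trace in traces.values() for _, activity in trace
    }
    return {
        "events": sum(len(trace) for trace in traces.values()),
        "cases": len(traces),
        "activities": len(activities),
        "variants": len(variants),
    }


# Made-up toy log; not taken from the sample data.
toy_log = [
    ("c1", "Create Order", 1), ("c1", "Receive Goods", 2), ("c1", "Pay Invoice", 3),
    ("c2", "Create Order", 1), ("c2", "Receive Goods", 2), ("c2", "Pay Invoice", 3),
    ("c3", "Create Order", 1), ("c3", "Pay Invoice", 2),
]
print(summarize_event_log(toy_log))
```

In this toy log, cases c1 and c2 follow the same sequence of activities and therefore share one variant, while c3 skips a step and forms a second one.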

Ticket Log sample data

This data set contains sample data for object-centric process mining of a support ticket process. It has the following characteristics:

  • # Variants: 185
  • # Objects: 2
      • Customer
      • Support
  • # Cases: 9,156
  • # Activities: 10
  • # Events: 21,229

The data provided includes an event log, case dimensions and event dimensions.

Logistics sample data

This data set contains sample data for object-centric process mining of a logistics process. The data was taken from here and transformed into the shape our process mining app expects. It has the following characteristics:

  • # Variants: 35
  • # Objects: 7
      • Customer Order
      • Transport Document
      • Vehicle
      • Container
      • Handling Unit
      • Truck
      • Forklift
  • # Cases: 13,913
  • # Activities: 14
  • # Events: 35,372

The data provided includes an event log and case dimensions.