
Application Overview

To find the Process Mining App, log in to your Databricks workspace and navigate to Compute » Apps. Clicking on the app's name takes you to a page showing the app's URL. Click on the URL to open the app.

Please note that only users who have been granted permission can see and use the app.

App overview

App details

All Scenarios Page

The Databricks Process Mining App starts on the All Scenarios page. Here you can view all available process mining scenarios, create new ones, or edit and delete existing ones. Each scenario contains all the details needed for mining your data, including the data source definitions within Databricks and the process mining parameters.

The "✎" button takes you to the Data Source page, where you can start editing an existing Process Scenario.

The "⏿" button takes you to the Data Source page, where you can view the Process Scenario configuration. You cannot make any changes to the Process Scenario in this mode.

The "🗑️" button allows you to delete an existing Process Scenario. You must confirm the deletion.

All Scenarios

Create a Process Scenario

The "➕ Create a Process Scenario" button takes you to a new page.

Create a Process Scenario

Here you enter a short name and a description for the new Process Scenario in the Process scenario short name and Process scenario description fields.

You can also decide whether you want to create an OCPM Process Scenario.

Delta Lake time travel is not supported at the moment. See What is Delta Lake time travel?

Data Sources Page

Data sources are the starting point for process mining.

Each data source must adhere to a defined data structure, and our validation process ensures that everything is formatted correctly before moving forward. If you encounter any validation issues while configuring the data sources, see Troubleshooting Data Sources Validation for help.

Buttons

Select an Event Log

An event log is the central and mandatory input for process mining. It is a table that contains case and event information. To further enrich the event log with your domain-specific data, you can also add additional Case and Event Dimensions.

The "➕ Select Table as Event Log" button allows you to reference a table that you can access as an event log.

note

The Select a Table dialog displays only those tables for which you have both SELECT and MANAGE privileges. In addition, you must have at least USE CATALOG and USE SCHEMA permissions to view and select a table.

Select an Event Log

Click the "🗑️" button to remove an existing reference, which then allows the selection of a different Event Log.

Data Source page

Dimensions

The "➕ Select Table as Case Dimensions" and "➕ Select Table as Event Dimensions" buttons allow you to reference tables that you can access as additional case and event dimensions, respectively.

Click the "🗑️" button to remove an existing reference, which then allows the selection of different Case or Event Dimensions.

Passthrough Tables

The "➕ Select Table to pass through" button allows you to reference tables that are passed through the Databricks Process Mining App as-is, meaning a view is created for each passthrough table.

Click the "🗑️" button to remove an existing reference, which then allows the selection of a different Passthrough Table.

Data Filter Page

Data Filter page

The Data Filter page allows you to configure filters for the input data. These filters limit which events from the event log are included in the process mining analysis.

Select Time Range

Time Range filters limit which events from the event log are included based on their starting timestamps.

You can choose between the following filter types:

  • All time (no time filter) - All events from the event log are included in the process mining analysis.
  • Fixed from date until today - Only events with timestamps on or after a specified date are included. Use the From field to specify the starting date.
  • Fixed from date until fixed to date - Only events with timestamps within a specified date range are included. Use the From field to specify the start of the range and the To field to specify the end of the range. The end date must not be earlier than the start date.
  • Rolling window of last X Days/Weeks/Months - Only events from the most recent time period are included. Use the provided input fields to specify the period unit as Days, Weeks, or Months and how many of those periods should be included. If you choose Weeks, you can also specify which day is considered the start of a week.

Rolling window starting date calculation

When Rolling window of last X Days/Weeks/Months is selected, the given parameters are used to calculate a starting date. All events on or after that starting date are then considered. The starting date is calculated to be the start of a period and may change from one process mining execution to the next, since it is always based on the current execution date. The following examples make this clearer.

  • Days
    • 1 Day - All events that occurred any time yesterday (relative to mining execution) or later will be taken into account. So if the mining task runs on June 5, all events from the 4th and 5th will be processed. If the mining task runs on June 6, all events from the 5th and 6th will be processed, and so on.
  • Weeks - The calculation depends on the configured start of week. The calculated starting date will always be that weekday.
    • 1 Week with Monday as start of week - All example dates are in 2025. If the process mining task is executed on October 20, which is a Monday, the calculated starting day will be October 13 (Monday). On October 21 (Tuesday), it will also be the 13th, and also on October 26 (Sunday). On October 27, it will switch to October 20.
    • 1 Week with Sunday as start of week - All example dates are in 2025. If the process mining task is executed on October 20, which is a Monday, the calculated starting day will be October 12 (Sunday). On October 21 (Tuesday), it will also be the 12th. But on October 26, which is a Sunday, it will switch to the 19th.
  • Months
    • 1 Month - All events that occurred any time in the last month (relative to mining execution) or later will be taken into account. So if the mining task runs on October 22, the starting date will be September 1. If it runs on October 31, the starting date will also be September 1. If it runs on November 1, the starting date will be October 1, and it will continue to be October 1 throughout November.
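The rules above can be sketched in a few lines of Python. This is an illustrative reimplementation of the described behavior, not the app's actual code; the function name and parameters are assumptions for the sketch.

```python
from datetime import date, timedelta

def rolling_window_start(today: date, count: int, unit: str, week_start: int = 0) -> date:
    """Compute the rolling-window starting date as described above.
    unit: "days", "weeks" or "months"; week_start: 0 = Monday ... 6 = Sunday
    (matching datetime.weekday())."""
    if unit == "days":
        # "1 day" means yesterday and later, so simply go back `count` days.
        return today - timedelta(days=count)
    if unit == "weeks":
        # Snap back to the most recent configured start-of-week day,
        # then go back the remaining full weeks.
        days_since_week_start = (today.weekday() - week_start) % 7
        current_week_start = today - timedelta(days=days_since_week_start)
        return current_week_start - timedelta(weeks=count)
    if unit == "months":
        # Snap to the first of the month, going back `count` months.
        year, month = today.year, today.month - count
        while month < 1:
            month += 12
            year -= 1
        return date(year, month, 1)
    raise ValueError(f"unknown unit: {unit}")
```

Running the examples from the text: `rolling_window_start(date(2025, 10, 26), 1, "weeks", week_start=0)` yields October 13, and on October 27 the result switches to October 20.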

Filtering for whole cases

Simply removing all events outside the specified time range from the event log would result in incomplete cases and lead to incorrect variants and measures. For instance, deleting the initial events from certain cases would misidentify subsequent events as start events. To avoid this, the filtering algorithm identifies cases with events occurring within the specified time range. For these cases, all events from the event log are then processed during process mining execution. This reduces the amount of data to be processed to cases with activity during the specified time range, while ensuring consistency in the analyzed cases.
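The two-step logic can be pictured with a minimal sketch (in-memory tuples instead of Databricks tables; the event layout is a simplified assumption for illustration):

```python
from datetime import datetime

# Each event is a (case_id, activity, timestamp) tuple.
events = [
    ("C1", "Create Order",  datetime(2025, 5, 28)),  # before the range
    ("C1", "Approve Order", datetime(2025, 6, 2)),   # inside the range
    ("C2", "Create Order",  datetime(2025, 4, 1)),   # C2 has no activity in range
]
start = datetime(2025, 6, 1)

# Step 1: find cases with at least one event in the time range.
active_cases = {case for case, _, ts in events if ts >= start}

# Step 2: keep ALL events of those cases, even the ones before `start`,
# so start activities, variants and measures stay correct.
filtered = [e for e in events if e[0] in active_cases]
```

Here case C1 is kept in full, including its May 28 event, while case C2 is dropped entirely.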

Shifts and Holidays Page

Shifts and Holidays page

The Shifts and Holidays page allows you to configure workdays, holidays and shift times for net time calculations. These configurations are optional and enable precise net time calculations by considering working days, holidays and shift patterns. The configurations use a flexible KEY system that allows you to define different rules for different organizational units, regions or processes within a single analysis.

If you encounter any validation issues while configuring these data sources, see Troubleshooting Data Sources Validation for help.

Workdays Configuration

The workdays configuration defines which weekdays are considered working days for different organizational units or processes.

The "➕ Select Table as Workdays" button allows you to reference a table that contains workday definitions.

Click the "🗑️" button to remove an existing reference, which then allows the selection of a different Workdays Config table.

For more information on the required table structure, see The Workdays Config.

Holidays Configuration

The holidays configuration defines exception dates (holidays) that should be excluded from net time calculations for different regions or organizational units.

The "➕ Select Table as Holidays" button allows you to reference a table that contains holiday definitions.

Click the "🗑️" button to remove an existing reference, which then allows the selection of a different Holidays Config table.

For more information on the required table structure, see The Holidays Config.

Shifts Configuration

The shifts configuration defines working hours within days, allowing you to specify multiple shift patterns for different organizational units or processes.

The "➕ Select Table as Shift Times" button allows you to reference a table that contains shift time definitions.

Click the "🗑️" button to remove an existing reference, which then allows the selection of a different Shift Times Config table.

For more information on the required table structure, see The Shift Times Config.

info

To use these configurations for net time calculations, your event log must contain KEY_WORKDAY, KEY_HOLIDAY and KEY_SHIFT columns that link events to the respective configuration rules. The values in these columns must match the KEY values defined in your configuration tables.
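The KEY linkage can be pictured with a minimal sketch. The table layouts and KEY values below are simplified assumptions for illustration, not the exact mpmX config schemas (see the linked config pages for those):

```python
# Simplified stand-ins for the three config tables, keyed by their KEY values.
workdays_config = {"DE_PLANT": {"MON", "TUE", "WED", "THU", "FRI"}}
holidays_config = {"DE_PLANT": {"2025-10-03"}}          # exception dates
shifts_config   = {"EARLY":    ("06:00", "14:00")}      # shift start/end times

# An event log row carries one KEY_* column per configuration.
event = {
    "EL_ACTIVITY": "Goods Receipt",
    "KEY_WORKDAY": "DE_PLANT",
    "KEY_HOLIDAY": "DE_PLANT",
    "KEY_SHIFT":   "EARLY",
}

# Each KEY_* value must match a KEY defined in the corresponding config table;
# net time calculation then uses the rules looked up here.
workdays = workdays_config[event["KEY_WORKDAY"]]
holidays = holidays_config[event["KEY_HOLIDAY"]]
shift    = shifts_config[event["KEY_SHIFT"]]
```

A KEY_* value with no matching KEY in the configuration table would fail this lookup, which is the kind of mismatch the validation step is meant to catch.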

Grouping Page

Grouping page

By defining groups, you can combine related activities to get a better overview of your processes in the process analyzer. Groups can be defined in a hierarchy by configuring one group as the parent of another.

To add a new group, click the "➕" button.

Fill out the Name field with the name of the group.

The Parent Group field lets you define a parent group. Parent groups are optional. A group cannot reference itself as its parent.

Use the Activities field to define which activities belong to your group.

  • Click on the "˅" to open a list of activities to choose from.
  • Select as many activities as you like, but each activity can only belong to one group.
  • Click the "X" next to an activity name to remove it from the group.

To edit an existing group, simply change the field values as required.

To delete an existing group, click the "🗑️" button next to the group.

Subprocess Leadtime Page

Subprocess Leadtime page

Define subprocesses so that lead times and other time-related measures are calculated for partial processes.

To understand why this is useful, take a look at the Subprocess Leadtime » Use Cases.

Add a New Subprocess

To add a new subprocess, click the "➕" button and fill out the fields appropriately.

The Subprocess field lets you define a name for your subprocess.

Use the Start Activity field to select all activities that mark the start of the subprocess.

The Include Start Activity check mark below defines whether the duration of the activity will be included in or excluded from the calculated lead time.

Sometimes you may have more than one activity marked as the Start Activity show up in a process variant - either because two different start activities appear or a single start activity is repeated. The Min/Max Start Activity field is then used to define if the first or last matching activity will be used for lead time calculation. Min uses the first matching activity (resulting in a longer process leadtime), while Max uses the last matching activity (resulting in a shorter process leadtime).

Use the End Activity field to select all activities that mark the end of the subprocess.

The Include End Activity check mark below defines whether the duration of the activity will be included in or excluded from the calculated lead time.

By configuring the Min/Max End Activity field for the End Activity, you can define whether the first or last matching activity will be used for lead time calculation. Here, the resulting impact is the opposite of that of the Start Activity: Min uses the first matching activity (resulting in a shorter subprocess leadtime), while Max uses the last matching activity (resulting in a longer subprocess leadtime).

The Target Time [d] and Target Time Operator fields define the desired lead time for the subprocess. They are used to calculate whether the lead time has been missed.
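The Min/Max selection can be sketched as follows. This is a simplified illustration of the rule described above, not the mpmX calculation itself (it ignores, for instance, the Include Start/End Activity durations and net times):

```python
from datetime import datetime

def subprocess_leadtime(trace, start_acts, end_acts, min_start=True, min_end=True):
    """trace is a list of (activity, timestamp) pairs for one case.
    Min picks the first matching timestamp, Max the last."""
    start_hits = [ts for act, ts in trace if act in start_acts]
    end_hits = [ts for act, ts in trace if act in end_acts]
    if not start_hits or not end_hits:
        return None  # subprocess not present in this case
    start = min(start_hits) if min_start else max(start_hits)
    end = min(end_hits) if min_end else max(end_hits)
    return end - start

# Hypothetical trace with a repeated start activity.
trace = [
    ("Create PO",  datetime(2025, 6, 1)),
    ("Approve PO", datetime(2025, 6, 2)),
    ("Approve PO", datetime(2025, 6, 4)),   # the start activity repeats
    ("Send PO",    datetime(2025, 6, 5)),
]
```

With Min Start Activity the lead time from Approve PO to Send PO is 3 days (first match, longer leadtime); with Max it shrinks to 1 day (last match, shorter leadtime), mirroring the text above.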

End of Process Page

End of Process page

By default, the mpmX app shows leadtimes and other information from an average of all cases, but this can sometimes be misleading.

  • For example, cases that are not finished will have shorter leadtimes just because they have not reached the end.
  • The number of process variants (the specific path that each case takes) will be greater if you include both open and closed cases than if you count only closed cases.

On the End of Process page you can define when a process is considered to be completed.

  • If the End of process condition is set to None, all processes are considered to be completed.
  • Select using Activity to then select an activity which marks a process as completed.
    End of Process - using Activity
  • Select using Custom Field to define more complex conditions.
    • On the left, you can define which field of the activity log should be used.
    • In the middle, select the appropriate operator (= or IN).
    • On the right, define the value (in the case of =) or the comma-separated list of values that mark a process as completed.
      End of Process - using Custom Field

Time Travel Page

info

Delta Lake time travel is not supported at the moment. See What is Delta Lake time travel?

Additional Parameters Page

On the Additional Parameters page you can configure some rework and automation settings as well as some miscellaneous settings.

Additional Parameters page
Additional Parameters page

Rework

A rework event is any activity that indicates an unexpected change, such as a Purchase Order being adjusted or deleted.

By selecting a Rework definition option, you can define which kinds of events should be taken into account when calculating rework-related measures. The options are:

  • <empty> - rework-related measures will not be calculated
  • ReworkEvent - events are considered if they are defined as rework (see Rework Event Expression (by activity type) below)
  • RepeatedEvent - events are considered if they are repeated in a loop
  • ReworkAndRepeatedEvent - events are considered if they are defined as rework AND are repeated
  • ReworkOrRepeatedEvent - events are considered if they are defined as rework OR are repeated

An event is regarded as rework if its activity type matches the Rework Event Expression (by activity type).
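The four definitions can be sketched like this. The expression, activity names, and the simple "repeated = occurs more than once in the case" rule are illustrative assumptions, not the exact mpmX semantics:

```python
import re
from collections import Counter

rework_expression = r"(Change|Delete).*"   # example Rework Event Expression
trace = ["Create PO", "Change PO", "Change PO", "Approve PO", "Approve PO"]

counts = Counter(trace)
def is_rework(act):   return re.fullmatch(rework_expression, act) is not None
def is_repeated(act): return counts[act] > 1   # simplified repetition rule

rework_events = {
    "ReworkEvent":            [a for a in trace if is_rework(a)],
    "RepeatedEvent":          [a for a in trace if is_repeated(a)],
    "ReworkAndRepeatedEvent": [a for a in trace if is_rework(a) and is_repeated(a)],
    "ReworkOrRepeatedEvent":  [a for a in trace if is_rework(a) or is_repeated(a)],
}
```

In this example "Change PO" counts under every option, while the repeated "Approve PO" only counts under the Repeated and Or variants.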

Automation

  • The Automation limit [%] indicates the percentage of automated events that a case needs to have to be labeled as an automated case.
  • An event is regarded as automated if its user name field matches the Automated Event Expression (by user name).
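A minimal sketch of the automated-case rule, with an example expression and limit (both are made-up values, and the percentage comparison is an assumption about how the limit is applied):

```python
import re

automated_expression = r"BATCH_.*"   # example Automated Event Expression
automation_limit = 50.0              # example Automation limit [%]

# User names of the events belonging to one case.
case_users = ["BATCH_JOB_01", "alice", "BATCH_JOB_02", "BATCH_JOB_01"]

automated = [u for u in case_users if re.fullmatch(automated_expression, u)]
share = 100.0 * len(automated) / len(case_users)   # 75% automated events
is_automated_case = share >= automation_limit      # labeled as automated
```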

Resource Analysis

Resource Analysis is an optional feature which lets you use the Resource Analysis sheet.

  • If No resource analysis is selected, no resource analysis will be performed during the process mining execution.
  • Selecting Analyse users as resource will result in a resource analysis being performed with regard to the users who executed an event. This option only appears if the event log contains a column called EL_USERNAME. For more information on the input event log, see here.
  • Selecting the option Analyse general resource placeholder as resource will result in a resource analysis being performed based on the values given in the EL_RESOURCE column. This option only appears if the column is present in the event log. For more information on the input event log, see here.

Miscellaneous

  • If Reduce timestamps? is checked, the number of distinct timestamps is reduced by flooring them to the minute (seconds and milliseconds are dropped). The lead and process times are not influenced; only the final timestamp output format is shortened to minutes.
  • If Fold loops for activity repetitions is checked, process variants that differ only in the number of times a loop is executed are merged into a single looped variant. For example, A → B → B → D, A → B → B → B → D and A → B → B → B → B → D are treated as the same variant. All repetitions are kept in the event log, but there will be fewer distinct variants.
  • Check Run pareto analysis to enable Pareto/ABC Analysis.
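Loop folding can be sketched by collapsing consecutive repetitions of the same activity, so that variants differing only in loop count map to the same folded variant (an illustrative model of the behavior described above, not the mpmX implementation):

```python
from itertools import groupby

def fold_loops(variant):
    """Collapse consecutive repetitions: A,B,B,B,D -> A,B,D."""
    return tuple(activity for activity, _ in groupby(variant))

v1 = ("A", "B", "B", "D")
v2 = ("A", "B", "B", "B", "B", "D")
# Both fold to the same looped variant (A, B, D); the underlying
# event log still keeps every repetition.
```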

Conformance Checking Page

Conformance Checking is an optional module in mpmX and will unlock the analysis in the Conformance sheet.

On the Conformance Checking page you can define your Happy Paths, or ideal process paths, as well as configure other mining-related settings.

Conformance Checking

Conformance Checking Parameters

  • Use the Only finished cases? field to control whether only finished cases are considered.
  • The Optimization Potential Threshold defines which cases will be analyzed for process governance optimization potential by comparing the happy path fitness of the case to the specified threshold:
    • By default, the threshold is set to 0.8 (80%).
    • To customize the threshold, check the Adapt threshold checkbox, then enter a value between 0.0 and 1.0 in the Optimization Potential Threshold field. The value can be adjusted in increments of 0.01.

Happy Paths

This is where you define your happy paths.

To add a new happy path, click the "➕" button and fill out the fields appropriately.

  • Use the Name field to assign your happy path a meaningful name.
  • Use the Description field to describe what the happy path represents.
  • The Process Path defines the ideal sequence of activities. Build it interactively:
    • In the left column, browse the list of available activities. You can search and filter activities by type or object types.
    • Click the arrow button (→) next to an activity to add it to your happy path sequence.
    • In the right column, view your selected activity sequence. You can reorder activities using the up/down arrow buttons, or remove activities using the delete button.
  • The Condition field (optional) allows you to limit which cases the happy path is checked against:
    • Enable the condition by checking the Condition checkbox.
    • Enter a JSON Logic expression in the text area that appears. This expression defines which cases should be evaluated against this happy path.
  • If Active is checked, then this happy path will be used in the process mining.
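As an illustration of a condition, a JSON Logic expression that checks a happy path only against cases with certain values in a hypothetical COMPANY_CODE case field might look like this (the field name and values are made-up examples, not mpmX defaults):

```json
{ "in": [ { "var": "COMPANY_CODE" }, ["1000", "2000"] ] }
```

Cases where COMPANY_CODE is neither 1000 nor 2000 would then not be evaluated against this happy path.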

To edit an existing happy path, simply change the field values as required.

To delete an existing happy path, click the "🗑️" button next to the happy path.

Task Execution Page

Task Execution page

Task Execution SQL Warehouse

The "Overwrite default SQL warehouse" button allows you to specify a SQL warehouse that is used solely by the process mining tasks. You can specify a different SQL warehouse for each scenario, enabling you to select the most suitable SQL warehouse size for your dataset.

Clicking the "Use default SQL warehouse" button resets the SQL warehouse reference so that the default SQL warehouse is used by the process mining tasks once again.

The "Change SQL warehouse" button allows you to update the referenced SQL warehouse as your dataset evolves.

Task Execution Scheduling

  • This section allows you to configure a schedule for regular automatic process mining runs by defining a Cron expression and a time zone. To understand how a cron expression is structured, refer to the Quartz Cron Syntax documentation. Note that a time zone specification is required in addition to the Quartz Cron Expression.
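As a hedged illustration, Quartz cron expressions use the field order seconds, minutes, hours, day of month, month, day of week (with ? in exactly one of the two day fields); see the Quartz documentation linked above for the authoritative syntax:

```text
0 0 4 ? * MON-FRI    every weekday at 04:00 in the configured time zone
0 30 1 * * ?         every day at 01:30 in the configured time zone
```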

Ad-hoc Task Execution

  • The Start Mining Task button allows you to manually start a process mining task.

Mining Summary of last execution

  • This section shows the basic KPIs of the process model after a successful mining run.

Task Execution History

The Task Execution History section displays the status of the most recent process mining runs.

  • The Start (UTC) and End (UTC) columns contain the start and end timestamps of the mining run.
  • Duration (hh:mm:ss) contains the total time of the mining run.
  • The States column contains one of the following values:
    • Running - The process mining task is running.
    • Failed - The process mining task has failed. Error details can be found in the Error Message column.
    • Succeeded - The process mining task was successful.
    • Scheduled - The process mining task is scheduled to run at the displayed Start (UTC) time.
  • An Error Message will appear if anything went wrong.

Permissions Page

Permissions page

The Permissions page allows you to manage access control for the process scenario. You can grant or revoke permissions for Databricks users, groups, and service principals at the scenario level.

To grant permissions, click the "Grant" button and fill out the required fields:

  • The Principals field allows you to select the group, user, or service principal to grant permissions to.
  • The Privileges field lets you choose between CAN USE (data consumer access) and CAN MANAGE (administrative access).

Permissions Grant

Once you have configured the permission details, click the "Confirm" button to apply the changes.

To revoke existing permissions, select them using the checkbox next to the permission entry you want to remove, then click the "Revoke" button. You will be prompted to confirm the revocation.

Permissions Revoke

info

For detailed information about the different permission levels and their capabilities, see Security » Permission Levels.

App Settings Page

This page allows you to configure all operational parameters for the app.

App Settings

Default SQL Warehouse

The "Change default SQL warehouse" button allows you to change the referenced SQL warehouse. It is used as a fallback for the process mining task if a scenario specifies no other SQL warehouse. We recommend that you select a warehouse of size 2X-Small and overwrite the process mining warehouse per scenario with a SQL warehouse of an appropriate size for your dataset.

See also: Installation and Update » Compute

Permissions

The Permissions section allows you to manage access control for the application. You can grant or revoke permissions for Databricks users, groups, and service principals at the application level.

To grant permissions, click the "Grant" button and fill out the required fields:

  • The Principals field allows you to select the group, user, or service principal to grant permissions to.
  • The Privileges field currently only lets you choose CAN MANAGE (full administrative access).

Once you have configured the permission details, click the "Confirm" button to apply the changes.

Permissions Grant

To revoke existing permissions, select them using the checkbox next to the permission entry you want to remove, then click the "Revoke" button. You will be prompted to confirm the revocation.

Permissions Revoke

info

For detailed information about the different permission levels and their capabilities, see Security » Permission Levels.