14th International Workshop on Business Process Intelligence 2018

to be held in conjunction with BPM 2018 Sydney, Australia, September 9 - 14, 2018

Business Process Intelligence Challenge (BPIC)

Eighth International Business Process Intelligence Challenge (BPIC’18)


In this challenge, sponsored by Celonis, NWO's DeLiBiDa project and Minit, we provide participants with a real-life event log and ask them to analyze these data using whatever techniques are available, focusing on one or more of the process owner's questions or providing other unique insights into the process captured in the event log.

We strongly encourage people to use any tools, techniques and methods at their disposal. There is no need to restrict yourself to open-source tools; proprietary tools, as well as techniques developed or implemented specifically for this challenge, are welcome.

Our industrial sponsors provide access to their tools for use with the BPI Challenge dataset. If you would like to use Celonis on this data, please contact them directly at BPI2018@celonis.com. If you would like to try Minit on this dataset, please contact Minit at BPI2018@minit.io.

Important Dates

Publication of the data: Early February 2018
Abstract submission deadline: 2 June 2018
Report submission deadline: 16 June 2018
Presentation of the winners: At the BPI workshop 2018 in Sydney Australia
Workshop Days: 9-10 September 2018

The Challenge

Like last year, we have decided to have three categories, namely students, academics and professionals. Thanks to the sponsoring of both Celonis and Minit, we can invite winners in all three categories to join the workshop in Sydney to present their findings.

The Student Category

This category targets Bachelor, Master and PhD students or student teams. In this category, the focus is on the originality of the results, the validity of the claims and the depth of the analysis of specific issues identified. We expect participants to focus on a specific aspect of interest and to analyze it in great detail. Here, one can choose, for example, to focus on specific models, such as control-flow models, social network models, performance models, predictive models, etc.

The winner: Jarno Brils, Nina van den Elsen, Jan de Priester and Tom Slooff of the Honors Academy of Eindhoven University of Technology with their report entitled Analysis and Prediction of Undesired Outcomes

The Academic Category

This category targets academics. The focus in this category is much more on the novelty of the techniques applied than the actual results. This provides a great opportunity for BPI researchers to show the practical applicability of their tools and/or techniques on real-life data.

The winner: Stephen Pauwels and Toon Calders of the University of Antwerp with their report entitled Detecting and Explaining Drifts in Yearly Grant Applications

The Professional Category

This category targets professionals who want to show their skills in analyzing business processes. The submitted reports are judged on their level of professionalism. Participants are expected to report on a broader range of aspects, where each aspect does not have to be developed in full detail. Reports submitted in this category will be judged on the completeness of the analysis and its usefulness in a real-life business improvement setting.

The winner: Lalit Wangikar, Sumit Dhuwalia, Abhilasha Yadav, Bhavy Dikshit and Dikshant Yadav from Cognitio Analytics with their report entitled Faster Payments to Farmers: Analysis of the Direct Payments Process of EU's Agricultural Guarantee Fund

The winners were selected by a jury and the winners presented their findings at the workshop in Sydney, Australia!

The Process

The European Union spends a large fraction of its budget on the Common Agricultural Policy (CAP). Among these expenditures are direct payments, which mainly aim to provide a basic income for farmers, decoupled from production. The rest of the CAP budget is spent on market-related expenditures and rural development.

The processes that govern the distribution of these funds are subject to complex regulations captured in EU and national law. The member states are required to operate an Integrated Administration and Control System (IACS), which includes IT systems to support the complex processes of subsidy distribution.

The process considered in the BPI Challenge 2018 covers the handling of applications for EU direct payments for German farmers from the European Agricultural Guarantee Fund. The process repeats every year with minor changes due to changes in EU regulations. About 10% of the cases are subject to a more rigorous on-site inspection.

The Data

The data for this year's challenge is brought to you by the German company data experts, located in Neubrandenburg. They provide data from their Java Enterprise system profil c/s.

Profil c/s supports these processes at the level of federal ministries of agriculture and local departments. The system supports various kinds of administrative processes, but for this challenge, the focus is on the yearly allocation of direct payments, starting with the application and, if all goes well, finishing with the authorization of a payment.

The workflows in profil c/s can be understood in terms of documents, where each document has a state that allows for certain actions. These actions can be executed manually at any point in time through document-specific tools, or they can be scheduled automatically. The latter may be either explicitly stated in the log or implicitly apparent when a large number of actions is performed by the same user at around the same time (batch processing).

In total, the event log contains 2,514,266 events for 43,809 applications over a period of three years. The shortest case contains 24 events, the longest 2,973, and on average there are 57 events per case, referring to 14 activities. As mentioned, the data is centered around documents, and for your convenience, we provide both the complete log file and log files for each document type, in which each instance of a document is a case. We expect to publish the data in the 4TU datacenter soon!

There are nine different document types in the data, listed in the table below. From 2015 to 2016, the Parcel Document was succeeded by the Geo Parcel Document. In 2017, the Geo Parcel Document also replaced the Department control parcels document.

Document type | Sub Processes | Explanation
Control summary | Main | A document containing the summarized results of various checks (reference alignment, department control, inspections)
Department control parcels (before 2017) | Main | A document containing the results of checks regarding the validity of the parcels of a single applicant
Entitlement application | Main, Objection, Change | The application document for entitlements, i.e., the right to apply for direct payments, usually created once at the beginning of a new funding period
Inspection | On-Site, Remote | A document containing the results of on-site or remote inspections
Parcel Document (before 2016) | Main | The document containing all parcels for which subsidies are requested
Geo Parcel Document (since 2016) | Main, Declared, Reported | The document containing all parcels for which subsidies are requested; replaces the Parcel Document since 2016 and the Department control parcels document since 2017
Payment application | Main, Application, Objection, Change | The application document for direct payments, usually filed each year
Reference alignment | Main | A document containing the results of aligning the parcels as stated by the applicant with known reference parcels (e.g., a cadaster)

For each document type, one or more sub-processes can be found in the data. These sub-processes refer to the overall sequence of events that influence a document. The leading document is always an application, for which a number of other documents are created. Typically, documents are created by the initialize activity. Then, documents are edited, typically indicated by begin editing until finish editing or a similar pair. While editing, several things may be recorded, for example that some calculations were made or that the application was saved. The log shows the times at which these events were completed, and there are considerable dependencies between the sub-activities of different documents. For example, you will usually only be able to decide an application after all other documents are in a final state.
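The document lifecycle described above can be checked per document with a few lines of code. The following is a minimal sketch: the event dicts are hypothetical, simplified stand-ins for the real XES events and their attributes.

```python
from collections import defaultdict

# Hypothetical, simplified event records; the real log is an XES file
# with the attributes listed under "Event attributes" below.
events = [
    {"docid": "D1", "activity": "initialize", "ts": 1},
    {"docid": "D1", "activity": "begin editing", "ts": 2},
    {"docid": "D1", "activity": "calculate", "ts": 3},
    {"docid": "D1", "activity": "finish editing", "ts": 4},
]

# Group activities per document in timestamp order.
by_doc = defaultdict(list)
for ev in sorted(events, key=lambda e: e["ts"]):
    by_doc[ev["docid"]].append(ev["activity"])

# Check the typical lifecycle: initialize first, then a
# begin editing / finish editing pair in that order.
for doc, acts in by_doc.items():
    typical = (
        acts[0] == "initialize"
        and "begin editing" in acts
        and "finish editing" in acts
        and acts.index("begin editing") < acts.index("finish editing")
    )
    print(doc, "typical lifecycle" if typical else "unexpected order")
```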

Download

The data is made available through the 4TU Center for Research Data as usual. However, for your convenience, we have the data ready for download right now:

When you use this data, please cite it as “van Dongen, B.F. (Boudewijn); Borchert, F. (Florian) (2018) BPI Challenge 2018. Eindhoven University of Technology. Dataset. https://doi.org/10.4121/uuid:3301445f-95e8-4ff0-98a4-901f1f204972”. The BibTeX and other formats can be downloaded from https://doi.org/10.4121/uuid:3301445f-95e8-4ff0-98a4-901f1f204972/object/citation.

Trace attributes

The following attributes are recorded for each case, where each case represents one application of one applicant in a specific year.

Attribute Type Explanation
program-id literal Internal id of the funding program
concept:name (and application) literal Unique case id for the application
identity:id UUID Globally unique case id (UUID)
Department literal Id of the local department
application literal The applicant’s id, the same across years
year literal The current year
number_parcels* discrete The number of parcels
area continuous The total area of all parcels
basic_payment boolean Application for basic payment scheme
greening boolean Application for greening payment
redistribution boolean Application for re-distributive payment
small farmer boolean Application for small farmer scheme
young farmer boolean Application for payment for young farmers
applicant literal Anonymized identifier of applicants
Derived attributes
penalty_{xxx} boolean Indicates if a penalty was applied for a certain reason {xxx} (see also the business questions). The following reasons can be found in the log: JLP1, AVGP, C4, JLP3, JLP2, JLP5, JLP6, C9, AVJLP, V5, CC, AVUVP, GP1, B16, BGK, C16, AGP, B3, B2, AVBP, B5, B4, B6, ABP, AUVP, AJLP, BGKV, JLP7, B5F, BGP.
amount_applied{x}* continuous Amount (in Euro) applied for in the application. The number indicates the current payment subprocess, starting with zero. If a case requires changes by the department or due to objection by the applicant, this number is increased by 1 for each payment.
payment_actual{x}* continuous Amount (in Euro) actually received by the applicant. For the meaning of {x}, see above.
penalty_amount{x} continuous Penalty applied by the department, e.g., due to over-declaration of parcel sizes. For the meaning of {x}, see above. Only available if penalty_applied is true.
risk_factor continuous An optional, manually assigned risk assessment factor.
cross_compliance continuous A penalty term due to violation of cross-compliance rules.
selected_random boolean Has the application been selected for an inspection at random?
selected_risk boolean Has the application been selected for an inspection due to risk assessment?
selected_manually boolean Has the application been selected for an inspection manually?
rejected boolean Entire rejection of the application

* The marked attributes have been binned in groups of 100 for anonymization purposes, where each bin is identified by its minimum value. This means that if you encounter a value of 50 ha for “area”, you know that the actual area was at least 50 ha but not larger than the next largest value in the data set. Since the binning is done per year, there may be small differences in the attribute values for the same applicant across years, as these values indicate the lower bound of the interval. For instance, you may observe that an applicant got € 100 more in 2016 than in 2015, but this may only be due to the boundaries of the bins.
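The binning scheme described above amounts to equal-frequency binning: values are sorted, grouped, and each value is replaced by its group's minimum. The sketch below illustrates this under that assumption; the exact ordering and tie-handling used for the real data may differ.

```python
def bin_by_groups(values, group_size=100):
    """Replace each value by the minimum of its equal-frequency bin.

    Sketch of the anonymization scheme as described: sort the values,
    chunk them into groups of `group_size`, and report each value as
    its group's minimum (the bin's lower bound).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    binned = [0.0] * len(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        lo = values[group[0]]  # group minimum = bin lower bound
        for i in group:
            binned[i] = lo
    return binned

# Small illustration with groups of 2 instead of 100:
print(bin_by_groups([5, 1, 3, 2], group_size=2))  # [3, 1, 3, 1]
```

Note how the reported value is always a lower bound, which is why the same applicant can appear to gain or lose small amounts across years purely through bin boundaries.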

Event attributes

The following attributes are recorded for each event. All events are included in the application log and within each application, events are ordered by timestamp. It is important to realize that if two events have exactly the same timestamp, their ordering cannot be concluded from the order in which they appear in the file.
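The ordering caveat can be made concrete with Python's stable sort: events with identical timestamps keep their file order, but as noted above, that tie order carries no semantic meaning.

```python
# Hypothetical events; "a" and "b" share a timestamp.
events = [
    {"id": "a", "ts": 10},
    {"id": "b", "ts": 10},  # tie with "a": file order is arbitrary
    {"id": "c", "ts": 5},
]

# sorted() is stable, so ties keep their original (file) order,
# but no conclusion about the real ordering of "a" and "b" is possible.
ordered = sorted(events, key=lambda e: e["ts"])
print([e["id"] for e in ordered])  # ['c', 'a', 'b']
```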

Attribute Type Explanation
success boolean Indicates whether the event was successful.
concept:name (and activity) literal The activity whose completion this event indicates.
docid literal Internal id of the document the event refers to.
doctype literal Type of the document as indicated in the list of document types above.
eventid literal Internal id of an event (may be null in case of an inferred event)
lifecycle:transition literal Value is "complete" for all events. Included for compatibility with some tools that require it.
note literal Free text note included for the event. Defaults to "none" if no note is available
org:resource literal Indicates the resource responsible for the event.
subprocess literal Subprocess to which the event belongs. Each document is subdivided in a number of subprocesses
time:timestamp timestamp Time at which the event occurred. Note that ordering of events with identical timestamps cannot be concluded from the file. Also note that some timestamps are manually entered and may therefore contain spelling mistakes.
docid_uuid UUID Globally unique id for the document the event belongs to. There is a 1-to-1 correspondence between docid and docid_uuid.
identity:id UUID Globally unique id of each event. Supersedes the eventid attribute in case the eventid is not unique (e.g., null). Events have a unique identity:id attribute across all files.

Additional files

Applicants can file an application each year and within each application, multiple documents are kept. Hence there is a one-to-many relation between applications and documents. To study the documents independently, we provide separate log files for each document type. Within these files, the same events are included as in the original files, but the case id is based on the docid attribute of the event and only events with the correct doctype attribute are included in each file.

The identity:id attribute of each event in these files is globally unique, i.e., this UUID can be used to cross-reference the various log files. For each document, the traces also have an additional trace-level attribute “application”, referring to the application in the application log file to which this document belongs.
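Cross-referencing the files can then be done as a simple in-memory join. The sketch below uses hypothetical case and event ids: a document trace is linked back to its application via the “application” trace attribute, and individual events are matched across files via identity:id.

```python
# Hypothetical excerpts of two log files, keyed by case id.
application_log = {
    "A1": {"events": [{"identity:id": "e-001"}, {"identity:id": "e-002"}]},
}
document_log = {
    "D7": {"application": "A1", "events": [{"identity:id": "e-002"}]},
}

# Join each document trace to its application and find shared events.
for doc_id, trace in document_log.items():
    app = application_log[trace["application"]]  # trace-level link
    shared = (
        {e["identity:id"] for e in trace["events"]}
        & {e["identity:id"] for e in app["events"]}
    )
    print(doc_id, "shares events", sorted(shared))  # D7 shares events ['e-002']
```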

Business Questions

The company has formulated four business questions on the data and encourages participants to focus on one or more of these questions. However, any other insights that can be obtained from the data are welcome. In your reports, please indicate clearly which question you answered.

Undesired outcomes

A typical case is opened around May of the respective year and should be closed by the end of the year. By “closed”, we refer to the timely payment of granted subsidies. There are, however, several cases each year where this is not achieved:

  • Undesired outcome 1: The payment is late. A payment can be considered timely if there has been a “begin payment” activity by the end of the year that was not eventually followed by “abort payment”.
  • Undesired outcome 2: The case needs to be reopened, either by the department (subprocess “Change”) or due to a legal objection by the applicant (subprocess “Objection”). This may result in additional payments or reimbursements (“payment_actual{x}“ > 0, where x ≥ 1 refers to the xth payment after the initial one)
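The rule for undesired outcome 1 can be sketched directly in code. This is an illustrative reading of the rule, assuming an abort cancels the most recent begin; the event records are hypothetical stand-ins for the real log.

```python
from datetime import datetime

def payment_on_time(case_events, year):
    """Return True if a "begin payment" occurred before year end and
    was not eventually followed by "abort payment" (sketch of the rule
    for undesired outcome 1)."""
    deadline = datetime(year + 1, 1, 1)
    timely_begin = None
    for ev in sorted(case_events, key=lambda e: e["ts"]):
        if ev["activity"] == "begin payment" and ev["ts"] < deadline:
            timely_begin = ev["ts"]
        elif ev["activity"] == "abort payment" and timely_begin is not None:
            timely_begin = None  # the timely begin was eventually aborted
    return timely_begin is not None

case = [
    {"activity": "begin payment", "ts": datetime(2015, 12, 20)},
    {"activity": "abort payment", "ts": datetime(2016, 1, 5)},
    {"activity": "begin payment", "ts": datetime(2016, 2, 1)},
]
print(payment_on_time(case, 2015))  # False: the only timely begin was aborted
```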

Question: We would like to detect such cases as early as possible. Ideally, this should happen before a decision is made for this case (first occurrence of “Payment application+application+decide”). You may use data from previous years to make predictions for the current year.

Prediction of penalties (risk assessment)

The applicant may not receive the total amount applied for. This may occur for a variety of reasons, e.g., the stated size of the farmland did not match the actual size as determined by alignment with the reference or by a remote or on-site inspection. Other reasons include the violation of cross-compliance rules or non-compliance with the young farmer condition.

The occurrence of such a penalty is indicated by the cut amount (“penalty_amount{x}”) and a code for one or more reasons (“penalty_{xxx}”). Some of these are considered more severe (namely: B3, B4, B5, B6, B16, BGK, C16, JLP3, V5 and BGP, BGKV, B5F in Q2). A certain amount of applications is selected for the more rigorous (on-site) inspection. This may either happen due to an internal risk assessment (“selected_risk”) or randomly (“selected_random”).

We would expect the risk assessment to reveal a comparatively larger fraction of severe violations in the selected sample. However, we see room for improvement.

Question: Can you draw a better sample of the same size (about 5%) with a better recall in uncovering the severe cases (as defined above)?
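The quantity asked for, recall, is the fraction of all severe cases that end up in the inspection sample. A minimal sketch with illustrative case ids:

```python
def recall(selected, severe):
    """Fraction of severe cases uncovered by the selected sample."""
    severe = set(severe)
    return len(set(selected) & severe) / len(severe) if severe else 0.0

# Illustrative stand-ins: 100 cases, 4 of which carry severe penalty codes.
all_cases = [f"c{i}" for i in range(100)]
severe = ["c3", "c17", "c42", "c99"]
sample = ["c3", "c17"] + all_cases[50:53]  # a 5% sample (5 of 100 cases)
print(recall(sample, severe))  # 0.5 -- 2 of the 4 severe cases are sampled
```

A better sample of the same size is one that raises this number relative to the actual selected_risk/selected_random selection in the log.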

Note: You should only use events as predictors that happened before the remote and on-site inspections for the particular year have started (for example, the 27th of June 2015). You should also exclude the attributes in the table in the section “derived attributes”, as these are not known before the inspection has taken place and the application was processed.

You may however, use all data from previous years.

In fact, we would be very interested in discovering dependencies across years. However, we would also be interested in statistical evidence that the current year’s risk is independent of whatever happened in the past.

Differences between departments

Departments may have implemented their processes differently and the hypothesis is that there is a relationship between the different processes and the problems described in questions 1 and 2.

Question: How can one characterize the differences between departments and is there indeed a relation?

Differences across years

Usually, around the same number of applications from the same farmers is handled every year. The processes should be similar each year, but may differ due to changes in regulations or in their technical implementation (for instance, the document type “parcel document” has been replaced by the more sophisticated “geo parcel document” in 2016).

Question: How can one characterize these differences as a particular instantiation of concept drift?

Questions about the challenge

Like before, participants can post questions about the data/process in the ProM forum. The company monitors the messages there and will try to respond as soon as possible.

Submissions

Submissions should be made through EasyChair at https://www.easychair.org/conferences/?conf=bpi2018 where you indicate your submission to be a challenge submission. A submission should contain a PDF report of at most 30 pages, including figures, using the LNCS/LNBIP format (http://www.springer.com/computer/lncs?SGWID=0-164-6-791344-0) specified by Springer (available for both LaTeX and MS Word). Appendices may be included but should only support the main text.