From Interactive Data Analysis and Visual Process Design to Automated Machine Learning, Real-Time Data Streaming, and Machine Learning Model Deployment from an Easy-to-Use Graphical User Interface
Increasing amounts of available data, combined with advanced data analysis and machine learning, enable new insights, forecasts, automation, and other value-creating solutions for many use cases across many industries. The value of such solutions increases significantly if a wider variety of data sources can be integrated and if real-time predictions based on real-time data support decision processes and automation. Simplifying the design and deployment of data analysis and machine learning processes enables broader groups of users to leverage the power of machine learning for their use cases. The EU-funded R&D project INFORE (Interactive Extreme-Scale Analytics and Forecasting) addressed the challenges posed by huge datasets and data streams and paved the way for real-time, interactive extreme-scale analytics and forecasting. Today, industrial and scientific institutions increasingly need to deal with massive data flows streaming in from maritime surveillance applications, financial forecasting applications, cancer cell growth simulations, and a multitude of other sources. The ability to forecast, as early as possible, a good approximation of the outcome of a time-consuming and resource-demanding computational task makes it possible to identify undesired outcomes quickly and to save considerable time, effort, and computational resources. Since not everyone is a data scientist, and not everyone knows how to configure and program the tools needed for real-time data streaming, simplifying the design and deployment of data analysis processes is crucial for a wider adoption of these technologies and for enabling users in many industries to realize their value creation potential. 
Within the INFORE project, RapidMiner focused on providing a unified software platform that seamlessly integrates all data sources, data streaming technologies (Kafka, Flink, Spark Streaming, etc.), machine learning algorithms and libraries (Python, R, Google TensorFlow, DL4J, H2O, Keras, Weka, etc.), model validation schemes (cross-validation, sliding window validation, etc.), deployment options, visualizations, and model monitoring and operations, all with easy-to-use interfaces. Users can visually design data analysis workflows in an easy-to-use Graphical User Interface (GUI) without having to write code, and they can train and deploy machine-learned models locally on their computer or server, on distributed data streams, on Hadoop or Spark clusters, in the cloud, or on the edge, all from a single unified graphical user interface. The goal is to ease and accelerate the path from data and idea to productive analysis processes and value creation with machine learning for as many users as possible, including not only data scientists but also domain experts from various industries, such as beer brewers, electrical engineers, and manufacturers, as well as business analysts and managers. Within INFORE, RapidMiner and its project partners developed an easy-to-use framework for handling and integrating large data streams from various sources using standard technologies like Kafka, Flink, and Spark Streaming; for data preprocessing; for machine learning and parameter optimization (mostly offline); and for real-time model deployment on real-time data streams (online). The INFORE project demonstrated the applicability of this framework, including its time series and forecasting capabilities and its Complex Event Detection and Prediction (CEP) capabilities, on use cases in various domains, including maritime surveillance and issue detection, financial time series forecasting, and cancer cell growth simulations and predictions. 
This presentation first provides an overview of the RapidMiner data science platform and then focuses on the developed real-time data streaming and machine learning framework and its user interface in RapidMiner.
Ralf Klinkenberg, founder and head of research at RapidMiner, is a data-driven entrepreneur with more than 30 years of experience in machine learning, artificial intelligence, and advanced data analytics research, software development, consulting, and applications in the automotive, aviation, chemical, finance, healthcare, insurance, internet, manufacturing, pharmaceutical, retail, software, and telecom industries. He holds Master of Science degrees in computer science with a focus on artificial intelligence, machine learning, and predictive analytics from the Technical University of Dortmund, Germany, and from the Missouri University of Science and Technology (MST), Rolla, MO, USA. In 2001 he initiated the open source data mining software project RapidMiner, and in 2007 he founded the predictive analytics software company RapidMiner together with Dr. Ingo Mierswa. In 2008 he won the European Open Source Business Award, and in 2016 he received the European Data Innovator Award. In 2017 the German government invited him to join the steering committee of the “Plattform Lernende Systeme”, a German government initiative to promote the use of machine learning and artificial intelligence in industry and society, on which he has served ever since. In 2018 and 2020 he advised the German government on the formulation of its artificial intelligence strategy. Ralf Klinkenberg is co-organizer of the Industrial Data Science (IDS) conference series. He is passionate about learning in humans and machines, and about how organizations can use data mining and machine learning to become more data-driven, agile, efficient, effective, and successful, from both a business and a technical perspective. Today RapidMiner has more than 1 million registered users in more than 150 countries and is one of the most widely used predictive analytics platforms worldwide. 
Analysts at Forrester and Gartner regard RapidMiner as one of the world's leading software platforms for machine learning and data science.