Presented by

  • Eduardo Silva

    @edsiper
    https://edsiper.linuxchile.cl

    Eduardo is a Principal Engineer at ARM Treasure Data. He currently leads the effort to make logging and data processing friendlier and more scalable in embedded and containerized systems such as Kubernetes. He is the maintainer of Fluent Bit[0], a scalable log and stream processor for Linux and a sub-project of the Fluentd ecosystem. Eduardo began working in open source development around the year 2000, when he started Monkey HTTP Server[1], a Linux-focused web server written in C. He has since continued with parallel projects, sharing knowledge at OSS conferences, and working on Linux development at Oracle/Ksplice, Treasure Data, and now ARM.

    [0] https://fluentbit.io
    [1] http://monkey-project.com

Abstract

Logging is one of the oldest mechanisms for analyzing application and hardware behavior. In a new era of distributed systems at scale and connected embedded devices, data collection and processing have become a real challenge, and logging has been forced to evolve and adapt to new needs. In data analysis, logging is one of the key components for collecting and pre-processing data: a logging pipeline usually collects, parses, and filters logs, then centralizes them in a storage backend such as a database so that processing and analysis can be performed. That analysis usually happens after the data has been aggregated and stored, but when real-time analysis is needed, processing the data while it is still in motion brings many advantages; this approach is called Stream Processing. What if it were possible to query your data using aggregation functions, windowing, and grouping while the data was still in motion and in memory, but on the edge side? In this presentation, we will go further and present an extended approach called 'Stream Processing on the Edge', where data is processed on the edge service or device in a lightweight mode, enabling features such as anomaly detection (in the order of milliseconds) and Machine Learning in a distributed way, using purely Open Source software.
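
To make the idea of querying data in motion more concrete, the windowed and grouped aggregation mentioned in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Fluent Bit implements it: the function name `tumbling_avg`, the `(timestamp, key, value)` record shape, and the requirement that records arrive in timestamp order are all hypothetical choices made for the example.

```python
from collections import defaultdict


def tumbling_avg(records, window_seconds=5):
    """Yield (window_start, {key: average}) for each fixed, non-overlapping window.

    Assumption: `records` is an iterable of (timestamp, key, value) tuples
    that arrive in non-decreasing timestamp order.
    """
    current = None                        # start time of the window being filled
    acc = defaultdict(lambda: [0.0, 0])   # key -> [running sum, count]

    for ts, key, value in records:
        window_start = int(ts // window_seconds) * window_seconds
        if current is not None and window_start != current:
            # The previous window is complete: emit its per-key averages.
            yield current, {k: s / n for k, (s, n) in acc.items()}
            acc.clear()
        current = window_start
        acc[key][0] += value
        acc[key][1] += 1

    # Flush the last, possibly partial, window once the input ends.
    if current is not None and acc:
        yield current, {k: s / n for k, (s, n) in acc.items()}
```

Because results are emitted as soon as a window closes and only the current window's sums and counts are held in memory, a computation of this shape can run on a small edge device without first shipping the raw records to a central store.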