FPDM-SoSe2023 | Moodle

Enrollment options

Fachprojekt "Data-Mining und Datenanalyse", LSF, 040269

Data mining and data analysis have become an integral part of our everyday lives. Attention often falls on the data scientist, who prepares data and trains a variety of models in their day-to-day work. However, once a good model has been found for a specific application, the focus shifts from model training to continuous model application by a large number of end users. Because of this distribution of the model across many users, the energy consumed by model application is often higher than the energy consumed by the actual model training. Here is a quick example: the Tesla Autopilot uses a deep learning model for vehicle control, which runs on a custom-built chip that consumes about 57 W. The German Federal Motor Transport Authority (Kraftfahrt-Bundesamt) estimates that in 2018 (i.e., before the COVID-19 pandemic), all German drivers combined drove about 630 billion kilometers in their passenger cars. At an average speed of about 45 km/h, this results in a combined total driving time of about 14 billion hours. If Tesla's Autopilot were used for all of these journeys, the combined energy consumption would be approx. 0.79 TWh (57 W × 14 billion hours). This is roughly equivalent to the energy output of the largest hydroelectric power plant in Germany!
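The arithmetic in this example can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the three input figures are taken from the text above, while the function and constant names are chosen here for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the example above.
CHIP_POWER_W = 57        # power draw of the Autopilot chip (from the text)
DISTANCE_KM = 630e9      # kilometers driven in Germany in 2018 (from the text)
AVG_SPEED_KMH = 45       # average speed in km/h (from the text)


def driving_hours(distance_km: float, avg_speed_kmh: float) -> float:
    """Total driving time in hours: distance divided by average speed."""
    return distance_km / avg_speed_kmh


def energy_twh(power_w: float, hours: float) -> float:
    """Energy in terawatt-hours (1 TWh = 1e12 Wh)."""
    return power_w * hours / 1e12


hours = driving_hours(DISTANCE_KM, AVG_SPEED_KMH)
energy = energy_twh(CHIP_POWER_W, hours)

print(f"driving time: {hours / 1e9:.1f} billion hours")
print(f"energy: {energy:.3f} TWh")
```

The result of just under 0.8 TWh is consistent with the approximate figure quoted in the text.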

Therefore, this Fachprojekt focuses on efficient model application. The first step toward energy-efficient model application is choosing efficient model classes. For this reason, we focus in particular on classical ML methods, i.e., methods other than deep learning. To this end, students will first learn the basics of machine learning and implement their own ML solution on a sample dataset using scikit-learn. Subsequently, students will select a suitable model class from scikit-learn and implement the deployment of trained models from this model class to a small, low-power device (e.g., a Raspberry Pi).
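To give a rough idea of what deploying a classical model to a small device can look like, the sketch below turns a decision tree into a plain C function that could be compiled for a device such as a Raspberry Pi. It is an illustration under several assumptions: the tree is written by hand rather than trained with scikit-learn, the dictionary layout and all function names are hypothetical, and this is neither the course's prescribed approach nor the FastInference API.

```python
# A hand-written decision tree stands in for a trained model (hypothetical values).
# Inner nodes: {"feature": index, "threshold": t, "left": ..., "right": ...}
# Leaves:      {"label": class_id}
TREE = {
    "feature": 0, "threshold": 2.5,
    "left": {"label": 0},
    "right": {
        "feature": 1, "threshold": 1.75,
        "left": {"label": 1},
        "right": {"label": 2},
    },
}


def predict(node: dict, x: list) -> int:
    """Reference prediction in Python: walk the tree until a leaf is reached."""
    while "label" not in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


def to_c(node: dict, indent: int = 1) -> str:
    """Emit the (sub)tree as nested C if/else statements."""
    pad = "    " * indent
    if "label" in node:
        return f"{pad}return {node['label']};\n"
    threshold = float(node["threshold"])  # ensures a valid C float literal
    return (
        f"{pad}if (x[{node['feature']}] <= {threshold!r}f) {{\n"
        + to_c(node["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + to_c(node["right"], indent + 1)
        + f"{pad}}}\n"
    )


def generate_c_source(tree: dict, name: str = "predict") -> str:
    """Wrap the generated if/else body in a C function definition."""
    return f"int {name}(const float *x) {{\n{to_c(tree)}}}\n"


c_source = generate_c_source(TREE)
print(c_source)
```

The Python `predict` function serves as a reference: the generated C function can be tested against it on the same inputs before it is moved to the target device.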

Students who have successfully completed this Fachprojekt

- know the basic concepts of machine learning

- are familiar with the machine learning library scikit-learn

- know the basic concepts of model deployment, especially on small devices

The Fachprojekt is divided into two parts. The first part starts with a short lecture phase (about 2 sessions) that explains the basic concepts of machine learning and the main features of scikit-learn. Afterward, students work independently on an ML project. The second part focuses on model deployment: students are expected to independently develop and test a deployment pipeline for trained models. The results are to be integrated into FastInference and presented in a short final presentation.

Note: Knowledge of Python (for training the models) and C/C++ (for deployment) is helpful.

Instructor: Sebastian Buschjäger

Self-enrollment (participant)