\documentclass[a4paper,11pt]{article}
\usepackage[most]{tcolorbox}
\usepackage[margin=3.3cm, top=2.8cm]{geometry}
\usepackage{xcolor}
\usepackage{tikz}
\usepackage{fancyhdr} % for headers
\usepackage{fontspec}
\usepackage{enumitem}
\usepackage{array}
\usepackage[en,science]{ku-frontpage/ku-frontpage}
\usetikzlibrary{arrows.meta, positioning, calc, quotes}
\newcolumntype{P}[1]{>{\centering\arraybackslash}p{#1}}
\assignment{Bachelor's project}
\title{Network Event Triggers in an Event-based Workflow Scheduler}
\subtitle{}
\author{Nikolaj Ingemann Gade (\texttt{qhp695})}
\advisor{Advisor: David Marchant}
\date{June 2023}
\begin{document}
\maketitle{}
\setcounter{page}{1}
\section{Abstract}
This paper introduces a network event monitor to the Managing Event Oriented Workflows (MEOW) system, enabling it to respond to data transmitted over a network connection. The Python-based implementation uses the socket library, adds a new pattern type for network events, and reuses the existing infrastructure for file events. Performance tests show that events are handled robustly even with multiple listeners, demonstrating the viability of the enhancement. The design allows for future extensions, marking a step toward equipping scientific workflow management systems for the dynamic demands of data-intensive fields.
\item \texttt{sendall()}: Sends data to a socket.
\end{itemize}
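As background for how these calls combine, the following is a minimal sketch of a TCP receiver and sender on localhost. It is not taken from the MEOW implementation, and the port number is arbitrary.
\begin{verbatim}
import socket
import threading

ready = threading.Event()

def receiver():
    # Create a TCP socket, bind it to a local port, and accept one connection
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 8587))
        srv.listen()
        ready.set()                 # signal that the listener is accepting
        conn, _ = srv.accept()
        with conn:
            print(conn.recv(1024))  # read up to 1024 bytes from the sender

t = threading.Thread(target=receiver)
t.start()
ready.wait()  # do not connect before the listener is up

# Sender side: connect() to the listener, then push the payload with sendall()
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect(("127.0.0.1", 8587))
    cli.sendall(b"hello, monitor")

t.join()
\end{verbatim}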
\newpage
\section{Method}
\textit{Code available here: \autocite{Implementation}}
\subsection{Data Type Agnosticism}
An important aspect of the network monitor's operation is its data type agnosticism: the \texttt{NetworkMonitor} imposes no restrictions and performs no checks on the type of incoming data. While this approach keeps the implementation fast and simple, it places a corresponding responsibility on the recipes that work with the incoming data. Recipes, which define the actions taken when a job executes, must be designed with this versatility in mind: they should incorporate the necessary checks and handle any inconsistencies or anomalies that diverse incoming data might produce.
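As an illustration, a recipe that expects JSON input might begin with a defensive check along the following lines. This is a hypothetical sketch: the variable \texttt{incoming\_data} merely stands for the raw bytes handed to the recipe and is not part of the MEOW API.
\begin{verbatim}
import json
import sys

# 'incoming_data' stands for the raw bytes a network event hands to the
# recipe; the name is illustrative, not part of the MEOW API.
incoming_data = b'{"reading": 42}'

try:
    payload = json.loads(incoming_data.decode("utf-8"))
except (UnicodeDecodeError, json.JSONDecodeError):
    # Malformed input: fail this job only; the rest of the workflow continues
    sys.exit("received data was not valid JSON")

reading = payload.get("reading")
if not isinstance(reading, (int, float)):
    sys.exit("expected a numeric 'reading' field")
\end{verbatim}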
It is worth noting that this agnostic approach is not exclusive to the network event monitor; it is equally characteristic of the file event monitor within MEOW. The underlying philosophy is to keep the monitors simple and versatile while entrusting the recipes with interpreting and handling the data. This design choice avoids adding undue complexity to the monitor itself and aligns with the overall modularity of the system.

Furthermore, MEOW is a fault-tolerant system: if a job encounters an error due to incompatible or unexpected data, the error does not halt the entire workflow, and other jobs continue executing. This resilience limits the disruption that malformed or unexpected data can cause.

However, a future iteration of the system, particularly one with protocol-specific monitors for protocols such as HTTP or FTP, might perform more sophisticated checks on incoming data. This could involve validating the format or content of the data, or handling protocol-specific error conditions. Such checks would add a layer of robustness and improve reliability under more stringent or regulated data requirements.
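For instance, a hypothetical HTTP-aware monitor could discard anything that does not begin with a well-formed request line before an event is ever created. The following sketch is purely illustrative; no such monitor exists in the current implementation.
\begin{verbatim}
# Illustrative check a protocol-specific monitor could apply before
# emitting an event; not part of the current implementation.
HTTP_METHODS = {"GET", "POST", "PUT", "DELETE", "HEAD", "PATCH"}

def looks_like_http_request(data: bytes) -> bool:
    try:
        request_line = data.split(b"\r\n", 1)[0].decode("ascii")
    except UnicodeDecodeError:
        return False
    parts = request_line.split(" ")
    return (len(parts) == 3
            and parts[0] in HTTP_METHODS
            and parts[2].startswith("HTTP/"))

print(looks_like_http_request(b"GET /data HTTP/1.1\r\nHost: x\r\n\r\n"))  # True
print(looks_like_http_request(b"\x00\x01binary blob"))                    # False
\end{verbatim}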
\subsection{Testing}
The unit tests for the network event monitor were modeled on the existing tests for the file event monitor. Since the monitor aims to emulate the behavior of the file event monitor as closely as possible, reusing those tests with minimal changes was an effective way to stay close to that goal. The tests verify the following behavior:
The tests are done in isolation, without a runner. The events are verified by pulling them from the monitor-to-runner pipeline directly. The timing starts after all monitors have been started, but immediately before sending the messages, and ends when all of the events have been received in the runner pipeline.
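The following sketch shows the shape of this timing loop, assuming the Python socket library for sending. The names are placeholders: \texttt{runner\_pipe} stands in for the receiving end of the monitor-to-runner pipeline, and the port number is arbitrary; the actual MEOW identifiers differ.
\begin{verbatim}
import socket
import time

def time_events(n, runner_pipe, port=8587):
    """Send n events to a running listener and time until all of them
    have been pulled from the monitor-to-runner pipeline.

    'runner_pipe' stands in for the pipeline's receiving end; the real
    MEOW objects and ports are named differently.
    """
    start = time.perf_counter()      # timer starts just before sending
    for _ in range(n):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.connect(("127.0.0.1", port))
            s.sendall(b"event")
    for _ in range(n):               # block until every event has arrived
        runner_pipe.recv()
    return time.perf_counter() - start
\end{verbatim}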
\newpage
\subsubsection{Single Listener}
To assess how a single listener handles many events at once, I subjected a single listener in the monitor to a varying number of events, ranging from 1 to 1,000. For each quantity $n$, I sent $n$ network events to the monitor and recorded the response time. To ensure reliable results and mitigate the effect of outliers, each test was repeated 50 times.
\textbf{count} & Total & Per event & Total & Per event & Total & Per event & \textbf{deviation}\\ \hline\hline
\multicolumn{8}{|c|}{\textbf{Laptop}} \\ \hline
1 & 0.62ms & 0.62ms & 33ms & 33ms & 2.5ms & 2.5ms & 4.6ms \\\hline
10 & 5.5ms & 0.55ms & 2,036ms & 203ms & 218ms & 21ms & 495ms \\\hline
100 & 51ms & 0.52ms & 4,267ms & 42ms & 1,372ms & 13ms & 1,273ms \\\hline
1000 & 462ms & 0.46ms & 20,500ms & 20ms & 8,165ms & 8.2ms & 5,034ms \\\hline\hline
\multicolumn{8}{|c|}{\textbf{Desktop}} \\ \hline
1 & 0.42ms & 0.42ms & 5.3ms & 5.3ms & 1.2ms & 1.2ms & 0.75ms \\\hline
10 & 3.0ms & 0.30ms & 2,033ms & 203ms & 153ms & 15ms & 405ms \\\hline
100 & 27ms & 0.27ms & 6,221ms & 62ms & 1,394ms & 13ms & 1,516ms \\\hline
1000 & 297ms & 0.30ms & 16,848ms & 16ms & 4,011ms & 4.0ms & 3,011ms \\\hline
\end{tabular}
}
\caption{The results of the Single Listener performance tests.}
\textbf{count} & Total & Per event & Total & Per event & Total & Per event & \textbf{deviation}\\ \hline\hline
\multicolumn{8}{|c|}{\textbf{Laptop}} \\ \hline
1 & 0.61ms & 0.61ms & 16ms & 16ms & 2.2ms & 2.2ms & 0.8ms \\\hline
10 & 4.8ms & 0.48ms & 3,053ms & 305ms & 135ms & 14ms & 330ms \\\hline
100 & 46ms & 0.46ms & 7,233ms & 72ms & 1,230ms & 12ms & 1,225ms \\\hline
1000 & 422ms & 0.42ms & 37,598ms & 37ms & 8,853ms & 8.9ms & 6,543ms \\\hline\hline
\multicolumn{8}{|c|}{\textbf{Desktop}} \\ \hline
1 & 0.40ms & 0.40ms & 3.5ms & 3.5ms & 1.5ms & 1.5ms & 0.56ms \\\hline
10 & 2.9ms & 0.29ms & 2,036ms & 203ms & 149ms & 14ms & 364ms \\\hline
100 & 27ms & 0.27ms & 6,223ms & 62ms & 683ms & 6.8ms & 970ms \\\hline
1000 & 272ms & 0.27ms & 26,828ms & 26ms & 5,437ms & 5.4ms & 4,798ms \\\hline
\end{tabular}
}
\caption{The results of the second suite of Single Listener performance tests.}
500 & 663ms & 3,321ms & 928ms & 412ms \\\hline
1000 & 893ms & 3,592ms & 1,163ms & 380ms \\\hline\hline
\multicolumn{5}{|c|}{\textbf{Desktop}} \\ \hline
1 & 269ms & 24,828ms & 8,090ms & 6,177ms \\\hline
10 & 283ms & 19,655ms & 5,193ms & 4,253ms \\\hline
100 & 289ms & 7,911ms & 2,114ms & 2,026ms \\\hline
250 & 321ms & 5,890ms & 1,002ms & 1,085ms \\\hline
500 & 361ms & 475ms & 386ms & 26ms \\\hline
1000 & 441ms & 613ms & 462ms & 27ms \\\hline
\end{tabular}
\caption{The results of the Multiple Listeners performance tests.}
\end{table}
\begin{figure}[H]
In such cases, understanding the impact on system performance becomes crucial. This test helps evaluate how well the system handles the extra load and whether there are any unforeseen issues or bottlenecks when multiple monitors are active concurrently. This knowledge would be invaluable for any future improvements or enhancements in MEOW's design and implementation.
The test works similarly to the previous one, in that I maintain 1,000 events spread across a varying number of monitors, each with one listener associated with it. Each test is repeated 100 times.
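Sketched concretely, the event distribution looks as follows, assuming each monitor's listener is bound to its own port. The port numbers and counts here are illustrative, not the values used by the actual test harness.
\begin{verbatim}
import socket

TOTAL_EVENTS = 1000
MONITOR_COUNT = 10   # varied between 1 and 1000 across the test rows
BASE_PORT = 9000     # assumption: each monitor's listener gets its own port

# Spread the fixed event budget evenly across the monitors by sending
# each monitor's share to its own listener port.
per_monitor = TOTAL_EVENTS // MONITOR_COUNT
for i in range(MONITOR_COUNT):
    for _ in range(per_monitor):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.connect(("127.0.0.1", BASE_PORT + i))
            s.sendall(b"event")
\end{verbatim}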
\begin{table}[H]
\centering
500 & 508ms & 2,282ms & 867ms & 391ms \\\hline
1000 & 601ms & 2,893ms & 1,197ms & 661ms \\\hline\hline
\multicolumn{5}{|c|}{\textbf{Desktop}} \\ \hline
1 & 288ms & 13,259ms & 5,370ms & 3,338ms \\\hline
10 & 289ms & 9,542ms & 2,615ms & 2,059ms \\\hline
100 & 292ms & 7,703ms & 1,833ms & 1,485ms \\\hline
250 & 297ms & 5,563ms & 1,037ms & 1,210ms \\\hline
500 & 314ms & 424ms & 328ms & 19ms \\\hline
1000 & 342ms & 466ms & 357ms & 19ms \\\hline
\end{tabular}
\caption{The results of the Multiple Monitors performance tests.}
\end{table}
\begin{figure}[H]