✨
@ -29,11 +29,15 @@
\newcolumntype{P}[1]{>{\centering\arraybackslash}p{#1}}

\title{Adding Network Event Triggers to an Event-based Workflow Scheduler}
\author{Nikolaj Ingemann Gade (\texttt{qhp695})}
\date{June 2023}

\begin{document}

\maketitle{}

\section{Abstract}

\begin{tcolorbox}[colback=lightgray!30!white]
Explain briefly the paper and what it does.
\end{tcolorbox}

This paper introduces a network event monitor to the Managing Event Oriented Workflows (MEOW) system, enabling it to respond to data transmitted over a network connection. The Python-based implementation uses the socket library, incorporates a new pattern type for network events, and reuses the existing infrastructure for file events. Performance tests reveal robust handling of events with multiple listeners, demonstrating the viability of this enhancement. The design fosters future extensions, marking an essential step in advancing the capabilities of scientific workflow management systems to meet the dynamic demands of data-intensive fields.

\section{Introduction}
@ -97,6 +101,7 @@
In this report, I will walk through the design and implementation process of this feature, detailing the challenges encountered and how they were overcome.

\newpage
\subsection{Problem}

In its current implementation, MEOW is able to trigger jobs based on changes to monitored local files. This covers a range of scenarios where the data processing workflow involves the creation, modification, or removal of files. By monitoring file events, MEOW's event-based scheduler can dynamically execute tasks as soon as the required conditions are met, ensuring efficient and timely processing of the data. Since the file monitor is triggered by changes to local files, MEOW is limited to local workflows.
@ -183,6 +188,7 @@
\caption{The cycle of MEOW's file events}
\end{figure}

\newpage
\subsubsection{The \texttt{meow\_base} codebase}

\texttt{meow\_base}\autocite{MeowBase} is an implementation of MEOW written in Python. It is written to be modular, using base classes for each element in order to ease the implementation of additional handlers, monitors, etc.
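The base-class-driven design described above can be illustrated with a toy sketch. Note that `BaseMonitor`, `DummyMonitor`, and the queue-based hand-off below are hypothetical stand-ins chosen for illustration, not meow_base's actual API:

```python
from abc import ABC, abstractmethod
from queue import Queue

class BaseMonitor(ABC):
    """Hypothetical base class: watches for events and pushes them to a runner queue."""
    def __init__(self, to_runner: Queue):
        self.to_runner = to_runner

    @abstractmethod
    def start(self) -> None:
        """Begin monitoring; subclasses define what an 'event' is."""

class DummyMonitor(BaseMonitor):
    """A monitor that emits a single canned event, for illustration only."""
    def start(self) -> None:
        self.to_runner.put({"type": "dummy", "payload": "hello"})

q = Queue()
DummyMonitor(q).start()
event = q.get_nowait()       # a runner would consume events from this queue
print(event["type"])         # prints "dummy"
```

The point of the pattern is that a new monitor type only has to implement the abstract interface; the rest of the system interacts with it through the shared queue.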
@ -276,6 +282,7 @@
\item The monitor only initializes listeners for patterns with associated rules, and rules updated during runtime are applied.
\end{itemize}

\newpage
\section{Results}
The testing suite designed for the monitor comprised 26 distinct tests, all of which passed successfully. These tests were designed to assess the robustness, reliability, and functionality of the monitor. They evaluated the monitor's ability to manage network event patterns, detect network events, and communicate with the runner to send events to the event queue.
@ -292,6 +299,8 @@
\end{tabular}
\end{table}

The tests are done in isolation, without a runner. The events are verified by pulling them from the monitor-to-runner pipeline directly. The timing starts after all monitors have been started, but immediately before sending the messages, and ends when all of the events have been received in the runner pipeline.

\subsubsection{Single Listener}
To assess how a single listener handles many events at once, I implemented a procedure where a single listener in the monitor was subjected to a varying number of events, $n$, ranging from 1 to 1,000. For each quantity of events, I sent $n$ network events to the monitor and recorded the response time. To ensure reliability of the results and mitigate the effect of any outliers, each test was repeated 50 times.
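A harness in the spirit of this test can be sketched with Python's standard socket and threading modules. The listener loop, payloads, and timing boundaries below are illustrative assumptions, not the project's actual test code:

```python
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 0  # port 0: let the OS pick a free port

def run_listener(server: socket.socket, received: list, n: int) -> None:
    # Accept n connections and record each payload, mimicking one listener.
    for _ in range(n):
        conn, _ = server.accept()
        with conn:
            received.append(conn.recv(1024))

n = 50
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((HOST, PORT))
server.listen(n)
port = server.getsockname()[1]

received: list = []
t = threading.Thread(target=run_listener, args=(server, received, n))
t.start()

start = time.perf_counter()            # timing starts just before sending
for i in range(n):
    with socket.create_connection((HOST, port)) as c:
        c.sendall(b"event %d" % i)
t.join()                               # ends once every event was received
elapsed = time.perf_counter() - start

print(f"{n} events in {elapsed:.3f}s")
server.close()
```

In the real tests the events are pulled from the monitor-to-runner pipeline rather than a raw list, but the timing boundaries (after startup, before sending; until the last event arrives) match the description above.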
@ -319,7 +328,10 @@
\begin{figure}[H]
\centering
-    \includegraphics[width=\textwidth]{src/performance_results/single_listener.png}
+    \centerline{
+        \includegraphics[width=1.2\textwidth]{src/performance_results/single_listener.png}
+    }
\caption{The results of the Single Listener performance test plotted logarithmically.}
\end{figure}
@ -330,7 +342,7 @@
\subsubsection{Multiple Listeners}
The next performance test investigates how the introduction of multiple listeners affects the overall processing time. This test aims to understand the implications of distributing events across different listeners on system performance. Specifically, we're looking at how having multiple listeners in operation might impact the speed at which events are processed.

-In this test, I will maintain a constant total of 1000 events, but they will be distributed evenly across varying numbers of listeners: 1, 10, 100, and 1000. By keeping the total number of events constant while altering the number of listeners, I aim to isolate the effect of multiple listeners on system performance.
+In this test, I will maintain a constant total of 1000 events, but they will be distributed evenly across varying numbers of listeners between 1 and 1000. By keeping the total number of events constant while altering the number of listeners, I aim to isolate the effect of multiple listeners on system performance. Once again, each test will be performed 50 times.

A key expectation for this test is to observe whether, and by how much, the overall processing time increases as the number of listeners goes up. This gives insight into whether operating more listeners concurrently introduces additional overhead, thereby slowing down the process. The results of this test can then inform decisions about optimal listener counts in different usage scenarios, potentially leading to performance improvements in MEOW's handling of network events.
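The even-as-possible split of a fixed event budget across listeners can be sketched as follows; this is a simplified stand-in for the actual test driver, not code from the project:

```python
def distribute(total_events: int, listener_count: int) -> list:
    # Spread events as evenly as possible across listeners; the first
    # `total_events % listener_count` listeners take one extra event.
    base, extra = divmod(total_events, listener_count)
    return [base + (1 if i < extra else 0) for i in range(listener_count)]

for k in (1, 10, 100, 250, 500, 1000):
    load = distribute(1000, k)
    print(k, load[0], sum(load))  # total stays fixed at 1000 in every row
```

With 1000 listeners each receives exactly one event, which is the boundary case the discussion below examines.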
@ -338,16 +350,20 @@
\centering
\begin{tabular}{|p{1.5cm}||P{2.5cm}|P{2.5cm}|P{2.5cm}|}
\hline
-\textbf{Listener} & \textbf{Minimum time} & \textbf{Maximum time} & \textbf{Average time} \\ \hline\hline
+\textbf{Listener count} & \textbf{Minimum time} & \textbf{Maximum time} & \textbf{Average time} \\ \hline\hline
\multicolumn{4}{|c|}{\textbf{Laptop}} \\ \hline
1 & 0.63s & 17s & 5.6s \\\hline
10 & 0.46s & 25s & 7.6s \\\hline
100 & 0.42s & 20s & 7.1s \\\hline
+250 & 0.51s & 7.9s & 2.9s \\\hline
+500 & 0.59s & 1.6s & 0.72s \\\hline
1000 & 0.92s & 3.24s & 1.49s \\\hline\hline
\multicolumn{4}{|c|}{\textbf{Desktop}} \\ \hline
1 & 0.24s & 16s & 5.2s \\\hline
10 & 0.24s & 19s & 4.0s \\\hline
100 & 0.25s & 10s & 1.0s \\\hline
+250 & 0.27s & 12s & 0.90s \\\hline
+500 & 0.31s & 0.33s & 0.31s \\\hline
1000 & 0.38s & 0.42s & 0.40s \\\hline
\end{tabular}
\caption{The results of the Multiple Listeners performance tests with 2 significant digits.}
@ -355,11 +371,71 @@
\begin{figure}[H]
\centering
-    \includegraphics[width=\textwidth]{src/performance_results/multiple_listeners.png}
+    \centerline{
+        \includegraphics[width=1.2\textwidth]{src/performance_results/multiple_listeners.png}
+    }
\caption{The results of the Multiple Listeners performance test plotted logarithmically.}
\end{figure}
% \subsection{Discussion}
The results of the Multiple Listeners performance test provide insight into how the Network Monitor's performance scales with the number of listeners. From the data collected, I observe relatively minor fluctuation, or a slight decrease, in maximum and average processing time when distributing 1000 events across 1, 10, and 100 listeners. This implies that the system can handle increases in listener count up to a certain point without significantly impacting performance.

However, at 500 listeners, a noticeable drop in maximum and average processing time occurs, followed by a slight increase when each of the 1000 listeners receives a single event. This trend could be attributed to the efficiency of the system in handling smaller, more distributed loads, possibly due to better utilization of threading.

In contrast, the minimum processing time begins to increase once we reach 250 listeners, with further increases at 500 and 1000 listeners. This could suggest that while the system generally performs well under more distributed loads, the base overhead associated with managing multiple listeners starts to become more pronounced. Each listener requires some system resources to manage, so as the number of listeners increases, the minimum time necessary for processing increases accordingly.
\subsubsection{Multiple monitors}

The final test explores the performance of the system when multiple Network Event Monitors run simultaneously. Although the current design and usage of MEOW wouldn't typically involve running multiple instances of the same monitor, it's important to anticipate potential future scenarios. Given the ever-evolving nature of computational workflows and the potential for different types of network event monitors to be developed, it's plausible that more than one network event monitor could be active at the same time.

In such cases, understanding the impact on system performance becomes crucial. This test evaluates how well the system handles the extra load and whether there are any unforeseen issues or bottlenecks when multiple monitors are active concurrently. This knowledge would be invaluable for future improvements or enhancements to MEOW's design and implementation.

The test works similarly to the previous one, in that I maintain 1000 events spread across a number of monitors. Each monitor has one listener associated with it.
\begin{table}[H]
\centering
\begin{tabular}{|p{1.5cm}||P{2.5cm}|P{2.5cm}|P{2.5cm}|}
\hline
\textbf{Monitor count} & \textbf{Minimum time} & \textbf{Maximum time} & \textbf{Average time} \\ \hline\hline
\multicolumn{4}{|c|}{\textbf{Laptop}} \\ \hline
1 & 0.63s & 17s & 5.6s \\\hline
10 & 0.45s & 25s & 6.6s \\\hline
100 & 0.38s & 18s & 4.4s \\\hline
250 & 0.40s & 13s & 1.8s \\\hline
500 & 0.44s & 2.9s & 0.72s \\\hline
1000 & 0.52s & 2.3s & 0.70s \\\hline\hline
\multicolumn{4}{|c|}{\textbf{Desktop}} \\ \hline
1 & 0.24s & 16s & 5.2s \\\hline
10 & 0.23s & 20s & 6.5s \\\hline
100 & 0.24s & 18s & 2.9s \\\hline
250 & 0.25s & 7.6s & 0.80s \\\hline
500 & 0.26s & 0.30s & 0.27s \\\hline
1000 & 0.29s & 0.30s & 0.29s \\\hline
\end{tabular}
\caption{The results of the Multiple Monitors performance tests with 2 significant digits.}
\end{table}
\begin{figure}[H]
\centering
\centerline{
    \includegraphics[width=1.2\textwidth]{src/performance_results/multiple_monitors.png}
}
\caption{The results of the Multiple Monitors performance test plotted logarithmically.}
\end{figure}

The results are similar to those of the previous performance test: the computation time drops significantly as the number of events sent to each listener approaches 1. As with the previous test, the results show that having many threads each doing less work is more efficient than a single thread doing all the work, at a small cost in computational overhead.
\subsection{Discussion}
At the outset of this project, the primary objective was to integrate network event triggers into MEOW, enriching its capability to respond dynamically to network communications. I intended to design a solution that integrates seamlessly with the existing MEOW infrastructure and preserves the loose coupling between patterns and recipes.

Reflecting on this work, it is apparent that I have made significant strides toward this aim. The implemented network monitor successfully listens to network traffic and triggers the corresponding jobs upon detecting matching patterns. This has expanded the capability of MEOW, enabling it to react and adapt to network events in real time.

I ensured the solution would be a good fit within the existing codebase, leveraging existing structures where possible and introducing new components only when necessary. For instance, the network event patterns were designed with a similar structure to the file event patterns, ensuring compatibility with the current system. Furthermore, the data received by the network monitor is written to a temporary file, making the most of the infrastructure already in place for file events. This approach has been effective in achieving a seamless integration, preserving the flexibility of MEOW and allowing existing workflows to continue functioning without modification.
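The temporary-file hand-off can be sketched as below. The function name and directory layout are hypothetical illustrations of the idea, not meow_base's real implementation:

```python
import tempfile
from pathlib import Path

def stash_network_payload(payload: bytes, watched_dir: Path) -> Path:
    # Write the received bytes to a uniquely named file inside the directory
    # the file monitor already watches, so the existing file-event machinery
    # picks the data up as an ordinary file creation.
    fd, name = tempfile.mkstemp(dir=watched_dir, suffix=".received")
    with open(fd, "wb") as f:
        f.write(payload)
    return Path(name)

watched = Path(tempfile.mkdtemp())          # stand-in for a monitored directory
p = stash_network_payload(b"sensor reading: 42", watched)
print(p.read_bytes())
```

Because the payload surfaces as a file, no changes to the file-event handlers are needed; the network monitor only has to get the bytes onto disk in the right place.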
Finally, the performance tests conducted provide a reasonable level of assurance that the network event triggers can handle a significant number of events, demonstrating the practicality and scalability of the solution. The network monitor has proven its ability to handle many simultaneous events, and the introduction of multiple listeners did not significantly affect performance, attesting to the robustness of the implementation.

Given these achievements, it can be said that the project has met its initial objectives, making a valuable contribution to the further development and versatility of the MEOW system.

\subsection{Future Work}
\subsubsection{Use-cases for Network Events}
@ -1,6 +1,6 @@
import matplotlib.pyplot as plt

-plt.rcParams.update({'font.size':22})
+plt.rcParams.update({'font.size':35})

def single_listener():
    fig, (ax1,ax2) = plt.subplots(1,2)
@ -15,7 +15,7 @@ def single_listener():
    ax1.plot(x, y13, label="Average", linewidth=5)
    ax1.plot(x, y11, label="Minimum", linewidth=5)

-    ax1.legend()
+    ax1.legend(bbox_to_anchor=(0.2,1.4))
    ax1.grid(linewidth=2)
    ax1.set_title("Laptop")
@ -35,7 +35,6 @@ def single_listener():
    ax2.plot(x, y23, label="Average", linewidth=5)
    ax2.plot(x, y21, label="Minimum", linewidth=5)

-    ax2.legend()
    ax2.grid(linewidth=2)
    ax2.set_title("Desktop")
@ -46,25 +45,25 @@ def single_listener():
    ax2.set_yscale("log")

    fig.set_figheight(12)
-    fig.set_figwidth(25)
+    fig.set_figwidth(35)
    fig.set_dpi(100)

-    fig.savefig("performance_results/single_listener.png")
+    fig.savefig("performance_results/single_listener.png",bbox_inches='tight')
def multiple_listeners():
    fig, (ax1,ax2) = plt.subplots(1,2)

-    x = [1,10,100,1000]
+    x = [1,10,100,250,500,1000]

-    y11 = [00.63,00.46,00.42,00.92]
-    y12 = [17.00,25.00,20.00,03.24]
-    y13 = [05.60,07.60,07.10,01.49]
+    y11 = [00.63,00.46,00.42,0.51,0.59,00.92]
+    y12 = [17.00,25.00,20.00,7.90,1.60,03.24]
+    y13 = [05.60,07.60,07.10,2.90,0.72,01.49]

    ax1.plot(x, y12, label="Maximum", linewidth=5)
    ax1.plot(x, y13, label="Average", linewidth=5)
    ax1.plot(x, y11, label="Minimum", linewidth=5)

-    ax1.legend()
+    ax1.legend(bbox_to_anchor=(0.2,1.4))
    ax1.grid(linewidth=2)
    ax1.set_title("Laptop")
@ -76,15 +75,14 @@ def multiple_listeners():
    ###

-    y21 = [00.24,00.24,00.25,0.38]
-    y22 = [16.00,19.00,10.00,0.42]
-    y23 = [05.20,04.00,01.00,0.4]
+    y21 = [00.24,00.24,00.25,00.27,0.31,0.38]
+    y22 = [16.00,19.00,10.00,12.00,0.33,0.42]
+    y23 = [05.20,04.00,01.00,00.90,0.31,0.4]

-    ax2.plot(x, y12, label="Maximum", linewidth=5)
-    ax2.plot(x, y13, label="Average", linewidth=5)
-    ax2.plot(x, y11, label="Minimum", linewidth=5)
+    ax2.plot(x, y22, label="Maximum", linewidth=5)
+    ax2.plot(x, y23, label="Average", linewidth=5)
+    ax2.plot(x, y21, label="Minimum", linewidth=5)

-    ax2.legend()
    ax2.grid(linewidth=2)
    ax2.set_title("Desktop")
@ -95,11 +93,60 @@ def multiple_listeners():
    ax2.set_yscale("log")

    fig.set_figheight(12)
-    fig.set_figwidth(25)
+    fig.set_figwidth(35)
    fig.set_dpi(100)

-    fig.savefig("performance_results/multiple_listeners.png")
+    fig.savefig("performance_results/multiple_listeners.png",bbox_inches='tight')
+def multiple_monitors():
+    fig, (ax1,ax2) = plt.subplots(1,2)
+
+    x = [1,10,100,250,500,1000]
+
+    y11 = [00.63,00.45,00.38,00.40,0.44,0.52]
+    y12 = [17.00,25.00,18.00,13.00,2.90,2.30]
+    y13 = [05.60,06.60,04.40,01.80,0.72,0.70]
+
+    ax1.plot(x, y12, label="Maximum", linewidth=5)
+    ax1.plot(x, y13, label="Average", linewidth=5)
+    ax1.plot(x, y11, label="Minimum", linewidth=5)
+
+    ax1.legend(bbox_to_anchor=(0.2,1.4))
+    ax1.grid(linewidth=2)
+    ax1.set_title("Laptop")
+
+    ax1.set_xlabel("Monitor count")
+    ax1.set_ylabel("Time")
+
+    ax1.set_xscale("log")
+    ax1.set_yscale("log")
+
+    ###
+
+    y21 = [00.24,00.23,00.24,0.25,0.26,0.29]
+    y22 = [16.00,20.00,18.00,7.60,0.30,0.30]
+    y23 = [05.20,06.50,02.90,0.80,0.27,0.29]
+
+    ax2.plot(x, y22, label="Maximum", linewidth=5)
+    ax2.plot(x, y23, label="Average", linewidth=5)
+    ax2.plot(x, y21, label="Minimum", linewidth=5)
+
+    ax2.grid(linewidth=2)
+    ax2.set_title("Desktop")
+
+    ax2.set_xlabel("Monitor count")
+    ax2.set_ylabel("Time")
+
+    ax2.set_xscale("log")
+    ax2.set_yscale("log")
+
+    fig.set_figheight(12)
+    fig.set_figwidth(35)
+    fig.set_dpi(100)
+
+    fig.savefig("performance_results/multiple_monitors.png",bbox_inches='tight')

if __name__ == "__main__":
    single_listener()
    multiple_listeners()
+    multiple_monitors()
BIN
src/performance_results/multiple_monitors.png
Normal file