This commit is contained in:
NikolajDanger
2023-05-27 20:06:53 +02:00
parent 7fc9591ed7
commit 63eb4fe3c8
2 changed files with 94 additions and 63 deletions

Binary file not shown.


@ -8,6 +8,9 @@
\usepackage{biblatex}
\usepackage{float}
\usepackage{fontspec}
\usepackage{enumitem}
\usetikzlibrary{arrows.meta, positioning, calc, quotes}
% --- Configuration ---
\bibliography{src/references}
@ -71,62 +74,92 @@
\caption{An example of a heterogeneous workflow}
\end{figure}
The example workflow requires several "halting-points", in which data should be transferred between the instrument, the instrument storage, centralized storage, High Performance Computing (HPC) resources, and a human interaction point. Network events can, for the reasons outlined earlier in the section, be used to prevent the workflow from halting when these points are reached.
The example workflow requires several checkpoints in which data should be transferred between the instrument, the instrument storage, centralized storage, High Performance Computing (HPC) resources, and a human interaction point. Network events can, for the reasons outlined earlier in the section, be used to prevent the workflow from halting when these points are reached.
\subsection{Background}
\subsubsection{The structure of MEOW}
The MEOW event-based scheduler consists of four main components: \textit{monitors}, \textit{handlers}, \textit{the conductor}, and \textit{the runner}.
The MEOW event-based scheduler consists of four main components: \textit{monitors}, \textit{handlers}, \textit{conductors}, and \textit{the runner}.
Monitors listen for triggering events. They are initialized with a number of \textit{patterns}, which describe the triggering event. When a pattern's triggering event occurs, the monitor signals to the conductor that the pattern has been triggered, and schedules a job that has been associated with the pattern.
Monitors listen for triggering events. They are initialized with a number of \textit{rules}, which each include a \textit{pattern} and \textit{recipe}. \textit{Patterns} describe the triggering event. For file events, the patterns describe a path that should trigger the event when changed. \textit{Recipes} describe the specific action that should be taken when the rule is triggered. When a pattern's triggering event occurs, the monitor sends an event, which contains the rule and the specifics of the event, to the event queue.
Handlers manage the event queue. They unpack and analyze events in the event queue. If they are valid, a job is created from the recipe, which is then sent to the job queue.
Conductors manage the job queue. They execute the jobs that have been created by the handlers.
Finally, the runner is the main program that orchestrates all these components. Each instance of the runner incorporates at least one instance of a monitor, handler, and conductor, and it holds the event and job queues.
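As a minimal sketch of the flow described above (not the \texttt{meow\_base} implementation), the following Python passes one event through two standard-library queues. The function names, dictionary keys, and the example rule are hypothetical and chosen only for illustration.

```python
from queue import Queue

# Hypothetical stand-ins for illustration; none of these names come from meow_base.

def monitor(event_queue, rule, path):
    """Monitor: put an event (the rule plus the event specifics) on the event queue."""
    event_queue.put({"rule": rule, "path": path})

def handler(event_queue, job_queue):
    """Handler: pull one event, validate it, and schedule a job from its recipe."""
    event = event_queue.get()
    if "recipe" in event["rule"]:  # a deliberately simple validity check
        job_queue.put({"recipe": event["rule"]["recipe"], "path": event["path"]})

def conductor(job_queue):
    """Conductor: pull one job from the job queue and execute it."""
    job = job_queue.get()
    return job["recipe"](job["path"])

# The runner holds both queues and connects the three components.
event_queue, job_queue = Queue(), Queue()
rule = {"pattern": "data/*.txt", "recipe": lambda path: f"processed {path}"}

monitor(event_queue, rule, "data/sample.txt")
handler(event_queue, job_queue)
result = conductor(job_queue)
```

In this sketch each component only touches the queues, which mirrors how the runner decouples monitors, handlers, and conductors from one another.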
\begin{figure}[H]
\begin{center}
\includegraphics[width=0.6\textwidth]{src/monitor.png}
\end{center}
\caption{\textbf{Redo this to fit with the current version.} The monitor's role in MEOW's event-based system.}
\end{figure}
\begin{tikzpicture}[
element/.style={draw, rectangle, rounded corners, minimum height = 1cm},
arrow/.style={-Triangle, ultra thick,shorten >=4pt}
]
\node[element,text width=8cm,align=center,fill=orange!30!white] at (0,2) (run) {Runner};
\node[element,fill=cyan!30!white] at (-2,1.3) (eq) {Event Queue};
\node[element,fill=yellow!50!white] at (2,1.3) (jq) {Job Queue};
\node[element,fill=blue!30!white] at (-5,-1.5) (mon) {Monitor};
\node[text width=2cm,align=center] at (-5,-2.8) {Listens for triggering events};
\node[element,fill=green!30!white] at (0,-4) (han) {Handler};
\node[text width=2cm,align=center] at (0,-5.35) {Validates events and creates jobs};
\node[element,fill=red!40!white] at (5,-1.5) (con) {Conductor};
\node[text width=2cm,align=center] at (5,-2.6) {Executes jobs};
\begin{tcolorbox}[colback=blue!30!white]
I haven't used "Resources" to describe the job queue. Should I do that or should I rephrase the diagram to be more in line with the rest of the project?
\end{tcolorbox}
Handlers perform actions and jobs on behalf of the scheduler. They are initialized with a number of \textit{recipes}, which describe the action to be taken. The handler starts a job when signaled to do so by the conductor.
The conductor handles the job queue. It is initialized with a number of rules, each of which is a pattern paired with a recipe. When a monitor sends it a triggered pattern, the rules are checked for that pattern. If one or more rules contain that pattern, the corresponding recipes are triggered in their handler.
Finally, the runner is the main program that orchestrates all these components. Each instance of the runner incorporates at least one instance of a monitor, handler, and conductor.
\begin{figure}[H]
\begin{center}
\begin{tikzpicture}
\node[draw,rectangle,rounded corners,text width=8cm,align=center] at (0,2) (run) {Runner};
\node[draw,rectangle,rounded corners] at (0,0) (con) {Conductor};
\node[draw,rectangle,rounded corners] at (3,-2) (mon) {Monitor};
\node[draw,rectangle,rounded corners] at (-3,-2) (han) {Handler};
\draw[arrow] (mon) -- (eq) node[pos=0.5,above left=-10pt,text width=2cm, align=center] {Schedules events};
\draw[arrow] (eq) -- (han) node[pos=0.8,below left=-20pt,text width=2cm, align=center] {Pulls events};
\draw[arrow] (han) -- (jq) node[pos=0.2,right,text width=2cm, align=center] {Schedules job};
\draw[arrow] (jq) -- (con) node[pos=0.5,above right=-10pt,text width=2cm, align=center] {Pulls jobs};
\end{tikzpicture}
\end{center}
\caption{\textbf{WIP.} How the elements of MEOW interact.}
\caption{How the elements of MEOW interact}
\end{figure}
\begin{figure}[H]
\begin{center}
\begin{tikzpicture}[
element/.style={draw, rectangle, rounded corners, minimum height = 1cm, text width=2cm, align=center},
every edge/.style={-Triangle, draw, ultra thick, bend left, text width= 2cm, align=center,shorten >=5pt,shorten <=5pt},
bend angle = 15
]
\node[element,fill=blue!30!white,anchor=south] at (90:2.5) (mon) {\textbf{Monitor}};
\node[element,fill=cyan!30!white,anchor=south west] at (30:2) (eq) {\textbf{Event Queue}};
\node[element,fill=green!30!white,anchor=north west] at (330:2) (han) {\textbf{Handler}};
\node[element,fill=yellow!50!white,anchor=north] at (270:2.5) (jq) {\textbf{Job Queue}};
\node[element,fill=red!40!white,anchor=north east] at (210:2) (con) {\textbf{Conductor}};
\node[element,fill=lightgray!80!white,anchor=south east] at (150:2) (sto) {\textbf{Storage}};
\draw (mon) edge ["Schedules events on"] (eq);
\draw (eq) edge ["Events are interpreted by"] (han);
\draw (han) edge ["Schedules jobs to"] (jq);
\draw (jq) edge ["Jobs executed by"] (con);
\draw (con) edge ["Writes output to"] (sto);
\draw (sto) edge ["Events are seen by"] (mon);
\end{tikzpicture}
\end{center}
\caption{The cycle of MEOW's file events}
\end{figure}
\subsubsection{The \texttt{meow\_base} codebase}
\texttt{meow\_base}\autocite{MeowBase} is an implementation of MEOW written in Python. It is written to be modular, using base classes for each element in order to ease the implementation of additional handlers, monitors, etc.
\begin{tcolorbox}[colback=blue!30!white]
How much should I include here?
\end{tcolorbox}
\begin{tcolorbox}[colback=lightgray!30!white]
\begin{itemize}
\item The runner (brief)
\item Conductors (brief)
\item Recipes and handlers (brief)
\item File event monitor (Watchdog)
\item Events (important to clarify how file events work since I refer to it in the method section)
\item Testing (brief)
The relevant parts of the implementation are:
\begin{itemize}
\setlength{\itemsep}{-5pt}
\item \textbf{Events} are Python dictionaries, containing the following items:\begin{itemize}[topsep=-10pt]
\setlength{\itemsep}{-5pt}
\item \texttt{EVENT\_PATH}: The path of the triggering file.
\item \texttt{EVENT\_TYPE}: The type of event, e.g. \texttt{"watchdog"}.
\item \texttt{EVENT\_RULE}: The rule that triggered the event, which contains the recipe that the handler will turn into a job.
\item \texttt{EVENT\_TIME}: The timestamp of the triggering event.
\item Any extra data supplied by the monitor. File events are by default initialized with the base directory of the event and a hash of the event's triggering path.
\end{itemize}
\end{tcolorbox}
\item \textbf{The file event monitor} inherits from the \texttt{BaseMonitor} class. It uses the \texttt{Watchdog} module to monitor given directories for changes. The Watchdog monitor is initialized with an instance of the \texttt{WatchdogEventHandler} class as its event handler. When the Watchdog monitor is triggered by a file event, the \texttt{handle\_event} method is called on the event handler, which in turn creates an \texttt{event} based on the specifics of the triggering event. The event is then sent to the runner to be put in the event queue.
\item \textbf{The runner} is implemented as the class \texttt{MeowRunner}. When initialized with at least one instance of a monitor, handler, and conductor, it validates them. When started, all the monitors, handlers, and conductors it was initialized with are started. It also creates \texttt{pipes} for the communication between each element and the runner.
\item \textbf{Recipes} inherit from the \texttt{BaseRecipe} class. They mainly exist to contain data about a given recipe, but also contain validation checks.
\item \textbf{Handlers} inherit from the \texttt{BaseHandler} class. Each handler class is for a specific type of job, like the execution of bash scripts. When started, a handler enters an infinite loop in which it asks the runner for a valid event from the event queue, creates a job for the event's recipe, and sends the job to the runner to be put in the job queue.
\item \textbf{Conductors} inherit from the \texttt{BaseConductor} class. Each conductor class is for a specific type of job, like the execution of bash scripts. When started, a conductor enters an infinite loop in which it asks the runner for a valid job from the job queue and then attempts to execute it.
\end{itemize}
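To make the event structure listed above concrete, the following is a minimal sketch of how a file event dictionary could be built. The string values of the four constants, the keys used for the extra data, and the helper \texttt{create\_file\_event} are assumptions made for illustration; the actual values in \texttt{meow\_base} may differ. The sketch also assumes that the hash is computed over the path string itself.

```python
import hashlib
import time

# Assumed values: the key names appear in the text, their string values do not.
EVENT_PATH = "event_path"
EVENT_TYPE = "event_type"
EVENT_RULE = "rule"
EVENT_TIME = "event_time"

def create_file_event(path, rule, base_dir, event_type="watchdog"):
    """Build an event dictionary with the four core items plus extra data."""
    return {
        EVENT_PATH: path,         # path of the triggering file
        EVENT_TYPE: event_type,   # e.g. "watchdog" for file events
        EVENT_RULE: rule,         # the rule holding the recipe for the handler
        EVENT_TIME: time.time(),  # timestamp of the triggering event
        # Extra data supplied by the monitor (hypothetical key names).
        "base_dir": base_dir,
        "path_hash": hashlib.sha256(path.encode()).hexdigest(),
    }

event = create_file_event(
    "data/sample.txt", rule={"name": "example_rule"}, base_dir="data"
)
```

Because an event is a plain dictionary, a handler can validate it by checking for the expected keys before creating a job.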
\subsubsection{The \texttt{socket} library}
@ -162,56 +195,54 @@
\subsection{Design of the network event pattern}
In the implementation of a pattern for network events, a key consideration was to integrate it seamlessly with the existing MEOW codebase. This required designing the pattern to behave similarly to the file event pattern when interacting with other elements of the scheduler. A central principle in this design was maintaining the loose coupling between patterns and recipes, minimizing direct dependencies between separate components. While this might not be possible for every theoretical recipe and pattern, designing for it could greatly improve future compatibility.
Network event patterns are initialized with a triggering port, analogous to the triggering path used in file event patterns. This approach inherently limits the number of unique patterns to the number of ports that can be opened on the machine. However, given the large number of potential ports, this constraint is unlikely to present a practical issue. An alternative approach could have involved triggering patterns using a part of the sent message, essentially acting as a "header". However, this would complicate the process since the monitor is otherwise designed to receive raw data. To keep the implementation as straightforward as possible and to allow for future enhancements, I opted for simplicity over complexity in this initial design.
The \texttt{NetworkEventPattern} class is initialized with a triggering port, analogous to the triggering path used in file event patterns. This approach inherently limits the number of unique patterns to the number of ports that can be opened on the machine. However, given the large number of potential ports, this constraint is unlikely to present a practical issue. An alternative approach could have involved triggering patterns using a part of the sent message, essentially acting as a "header". However, this would complicate the process since the monitor is otherwise designed to receive raw data. To keep the implementation as straightforward as possible and to allow for future enhancements, I opted for simplicity and broad utility over complexity in this initial design.
Once the network monitor is started, it opens sockets that start listening on each of the ports specified in the patterns it was initialized with. This is consistent with the behavior of the file event monitor, which monitors the triggering paths of the patterns it was initialized with.
When the \texttt{NetworkMonitor} instance is started, it starts a number of \texttt{Listener} instances, equal to the number of distinct ports specified in its patterns. Patterns not associated with a rule are not considered, since they will not result in an event. Only one listener is started per port, so patterns with the same port use the same listener. Each listener opens a socket bound to its respective port. This is consistent with the behavior of the file event monitor, which monitors the triggering paths of the patterns it was initialized with.
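The one-listener-per-port behavior can be sketched as follows. This is an illustrative outline rather than the \texttt{NetworkMonitor} code: rules are represented as plain dictionaries, the helper names are hypothetical, and the use of TCP sockets on the loopback address is an assumption.

```python
import socket

def ports_to_listen_on(rules):
    """Collect the distinct ports of patterns that belong to a rule.

    Patterns without a rule never appear here, so they get no listener.
    """
    return sorted({rule["pattern"]["port"] for rule in rules})

def open_listener(port, host="127.0.0.1"):
    """Open a TCP socket bound to the given port and start listening on it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock

# Two rules share port 8080, so only two listeners would be needed in total.
rules = [
    {"pattern": {"port": 8080}, "recipe": "recipe_a"},
    {"pattern": {"port": 8080}, "recipe": "recipe_b"},
    {"pattern": {"port": 8081}, "recipe": "recipe_c"},
]
ports = ports_to_listen_on(rules)
```

Calling \texttt{open\_listener} once for each entry in \texttt{ports} would then yield one listening socket per distinct port.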
\subsection{Integrating network events into the existing codebase}
The data received by the network monitor is written to a temporary file, a design choice that serves two purposes.
The data received by the network monitor is written to a temporary file; this design choice serves three purposes:
Firstly, this method is a practical solution for managing memory usage during data transfer, particularly for large data sets. By writing received data directly to a file, we bypass the need to store the entire file in memory at once, effectively addressing potential memory limitations.
Secondly, this approach allows the leveraging of existing infrastructure built for file events. The newly written temporary file is passed as the "triggering path" of the event, mirroring the behavior of file events. This approach allows network events to utilize the recipes initially designed for file events without modification, preserving the principle of loose coupling. This integration maintains the overall flexibility and efficiency of MEOW while extending its capabilities to handle network events.
Secondly, the method allows the monitor to receive multiple files simultaneously, since each file is received by a separate thread. This means that a single large file will not block the network port while it is being transferred.
Lastly, this approach allows the leveraging of existing infrastructure built for file events. The newly written temporary file is passed as the "triggering path" of the event, mirroring the behavior of file events. This approach allows network events to utilize the recipes initially designed for file events without modification, preserving the principle of loose coupling. This integration maintains the overall flexibility and efficiency of MEOW while extending its capabilities to handle network events.
The method will be slower, since writing to storage takes longer than keeping the data in memory, but I have decided that the positives outweigh the negatives.
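A minimal sketch of this design is shown below, assuming TCP connections where the sender closes the connection to mark the end of the data. The function names and the chunk size are hypothetical and are not taken from the actual monitor.

```python
import socket
import tempfile
import threading

def receive_to_tempfile(conn, results, chunk_size=4096):
    """Stream everything sent on one connection into a temporary file.

    Only one chunk is held in memory at a time, however large the transfer is.
    """
    with conn, tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        while True:
            chunk = conn.recv(chunk_size)
            if not chunk:  # an empty read means the sender closed the connection
                break
            tmp.write(chunk)
    # This path would be passed on as the "triggering path" of the event.
    results.append(tmp.name)

def serve_once(listener, results):
    """Accept a single connection and hand it to a separate receiving thread."""
    conn, _ = listener.accept()
    worker = threading.Thread(target=receive_to_tempfile, args=(conn, results))
    worker.start()
    return worker
```

Since the receiving is done in the worker thread, the listener is free to accept the next connection while an earlier transfer is still being written to storage.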
\subsection{Testing}
The unit tests for the network event monitor were inspired by the already existing tests for the file event monitor. Since the aim of the monitor was to emulate the behavior of the file event monitor as closely as possible, using the already existing tests with minimal changes proved an effective way of staying close to that goal.
\section{Results}
\begin{tcolorbox}[colback=lightgray!30!white]
Does it work? How well?
\end{tcolorbox}
% \subsection{Testing}
The testing suite designed for the monitor comprised 26 distinct tests, all of which passed. These tests were designed to assess the robustness, reliability, and functionality of the monitor. They evaluated the monitor's ability to manage network event patterns, detect network events, and communicate with the runner to send events to the event queue.
\subsection{Discussion}
\begin{tcolorbox}[colback=lightgray!30!white]
With the hindsight of the results, what could I have done better?
With the hindsight of the results, what could I have done better?
\end{tcolorbox}
\subsection{Future Work}
\begin{tcolorbox}[colback=lightgray!30!white]
What should someone do if they want to fix my mistakes, or expand on them further?
\begin{itemize}
\item Implementation of the other options mentioned when discussing the socket library.
\item Triggering on a header item in addition to port
\end{itemize}
\end{tcolorbox}
\subsubsection{Use-cases for Network Events}
Since the purpose of the project was to add a feature to a workflow manager, it is important to consider how the feature integrates with real-life workflows, as well as future workflow designs that could capitalize on network events.
\begin{tcolorbox}[colback=lightgray!30!white]
Give context to following paragraph.
\end{tcolorbox}
One specific example of a use-case where network event triggers could prove useful is the workflow for The Brain Imaging Data Structure (BIDS). The BIDS workflow requires data to be sent between multiple machines and validated by a user. Network event triggers could streamline this process by automatically initiating data transfer tasks when specific conditions are met, thereby reducing the need for manual management. Additionally, network triggers could facilitate user validation by allowing users to manually prompt the continuation of the workflow through specific network requests, simplifying the user's role in the validation process.
One specific example of an application where network event triggers could prove useful is the workflow for The Brain Imaging Data Structure (BIDS). The BIDS workflow requires data to be sent between multiple machines and validated by a user. Network event triggers could streamline this process by automatically initiating data transfer tasks when specific conditions are met, thereby reducing the need for manual management. Additionally, network triggers could facilitate user validation by allowing users to manually prompt the continuation of the workflow through specific network requests, simplifying the user's role in the validation process.
\begin{figure}[H]
\begin{center}
\includegraphics[width=0.6\textwidth]{src/BIDS.png}
\end{center}
\caption{\textbf{Temp.} The structure of the BIDS workflow. Data is transferred to the user, and to the cloud.}
\caption{The structure of the BIDS workflow. Data is transferred to the user, and to the cloud.}
\end{figure}
\subsubsection{Additional Monitors}
The successful development and implementation of the network event monitor for MEOW serves as a precedent for the creation of additional monitors in the future. This framework could be utilized as a blueprint for developing new monitors tailored to meet specific demands, protocols, or security requirements.
For instance, security might play a crucial role in the processing and transfer of sensitive data across various workflows. The network event monitor developed in this project, which uses the Python socket library, might not satisfy the security requirements of all workflows, especially those handling sensitive data. In such cases, developing a monitor that leverages the \texttt{ssl} library could provide a solution, enabling encrypted communication and thus improving the security of data transfer. The architecture of the network event monitor can guide the development of an \texttt{ssl} monitor, taking advantage of the similarities between the \texttt{socket} and \texttt{ssl} libraries.
Similarly, we could envision monitors developed specifically for certain protocols. For example, a monitor designed to handle HTTP requests could be beneficial for workflows interacting with web services. As HTTP is a common protocol, this type of monitor would open up a vast array of potential interactions with external services, making MEOW even more versatile.
\section{Conclusion}
\begin{tcolorbox}[colback=lightgray!30!white]
Did I succeed in what I wanted to do?
\end{tcolorbox}
With the monitor performing effectively as tested, it can be anticipated that it will handle network event triggers correctly in live environments. This is a critical enhancement for MEOW, opening up possibilities for more complex, distributed, and heterogeneous workflows, as envisioned in the design objectives.
\newpage
\appendix