.. _tutorial: Tutorial: Modeling with Cimba ============================= Over the course of five different worked examples, we will go from a simple M/M/1 queue simulation with just two active processes, make it run as a designed experiment with parallel trials, and then gradually add more powerful simulation tools such as resource acquisition, timers, and condition variables. We will end the tutorial by demonstrating how to harness CUDA kernels for massively parallel simulated physics inside a simulation with hundreds of parallel trials and a thousand active processes inside each trial. .. _tut_1: A simple M/M/1 queue parallelized --------------------------------- In this section, we will walk through the development of a simple model from connecting basic entities and interactions to parallelizing the model on all available CPU cores and producing presentation-ready output. Our first simulated system is a M/M/1 queue. In queuing theory (Kendall) notation, this abbreviation indicates a queuing system where the arrival process is memoryless with exponentially distributed intervals, the service process is the same, there is only one server, and the queue has unlimited capacity. This is a mathematically well understood system. The simulation model will verify if the well-known formula for expected queue length is correct (or vice versa). Arrival, service, and the queue ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We model this in a straightforward manner: We need an arrival process that puts customers into the queue at random intervals, a service process that gets customers from the queue and services them for a random duration, and the queue itself. We are not concerned with the characteristics of each customer, just how many there are in the queue, so we do not need a separate object for each customer. We will use a :c:struct:`cmb_buffer` for this. In this first iteration, we will hard-code parameter values for simplicity, such as 75 % utilization, and then do it properly shortly. Let us start with the arrival and service processes. The code can be very simple: .. code-block:: c #include void *arrival_proc(struct cmb_process *me, void *ctx) { cmb_unused(me); struct cmb_buffer *bp = ctx; while (true) { const double rate = 0.75; const double mean = 1.0 / rate; const double t_ia = cmb_random_exponential(mean); cmb_process_hold(t_ia); uint64_t n = 1; cmb_buffer_put(bp, &n); } } void *service_proc(struct cmb_process *me, void *ctx) { cmb_unused(me); struct cmb_buffer *bp = ctx; while (true) { const double rate = 1.0; const double mean = 1.0 / rate; uint64_t m = 1; cmb_buffer_get(bp, &m); double t_srv = cmb_random_exponential(mean); cmb_process_hold(t_srv); } } The first statement in each function, ``cmb_unused(me)`` just states that the argument ``me`` will not be used. It has no other effect than suppressing any compiler warnings about that. The arrivals process generates an exponentially distributed random value with mean 1/0.75, holds for that amount of interarrival time, puts one new customer into the queue, and does it again. Note that the function runs in an infinite loop. Similarly, the service process gets a customer from the queue (waiting for one to arrive if there are none waiting), generates a random service time with mean 1.0, holds for the service time, and does it all over again. An average arrival rate of 0.75 and service rate of 1.0 gives the 0.75 utilization we wanted. This function also runs in an infinite loop. Note that the number of customers to ``put`` or ``get`` is given as a *pointer to a variable* containing the number, not just a value. In more complex scenarios than this, the process may encounter a partially completed put or get, and we need a way to capture the actual state in these cases. For now, just note that the amount argument to :c:func:`cmb_buffer_put()` and :c:func:`cmb_buffer_get()` is a pointer to an unsigned 64-bit integer variable. The process function signature is a function returning a pointer to void (i.e. a raw address to anything). It takes two arguments, the first one a pointer to a :c:struct:`cmb_process` (itself), the second a pointer to void that gives whatever context the process needs to execute. For now, we only use the context pointer as a pointer to the queue, and do not use the ``me`` pointer or the return value. Note also that all Cimba functions used here start with ``cmb_`` indicating that they belong in the namespace of things in the simulated world. There are functions from three Cimba modules here, :c:struct:`cmb_process`, :c:struct:`cmb_buffer`, and :c:struct:`cmb_random`. We will encounter other namespaces and modules soon. We also need a main function to set it all up, run the simulation, and clean up afterwards. Let us try this: .. code-block:: c int main(void) { const uint64_t seed = cmb_random_hwseed(); cmb_random_initialize(seed); cmb_event_queue_initialize(0.0); struct cmb_buffer *que = cmb_buffer_create(); cmb_buffer_initialize(que, "Queue", CMB_UNLIMITED); struct cmb_process *arr_proc = cmb_process_create(); cmb_process_initialize(arr_proc, "Arrival", arrival_proc, que, 0); cmb_process_start(arr_proc); struct cmb_process *serv_proc = cmb_process_create(); cmb_process_initialize(serv_proc, "Server", service_proc, que, 0); cmb_process_start(serv_proc); cmb_event_queue_execute(); cmb_process_terminate(serv_proc); cmb_process_destroy(serv_proc); cmb_process_terminate(arr_proc); cmb_process_destroy(arr_proc); cmb_buffer_terminate(que); cmb_buffer_destroy(que); cmb_event_queue_terminate(); cmb_random_terminate(); return 0; } The first thing it does is to get a suitable random number seed from a hardware entropy source and initialize our pseudo-random number generators with it. It then initializes the simulation event queue, specifying that our clock will start from the value 0.0. (It could be any other value.) Next, it creates and initializes the :c:struct:`cmb_buffer`, naming it "Queue" and giving it unlimited size. After that, it creates, initializes, and starts the arrival and service processes. They get a pointer to the newly created :c:struct:`cmb_buffer` as their context argument, and the event queue is ready, so they can just start immediately. Notice the pattern here: Objects are first *created*, then *initialized* in a separate call. The create method allocates heap memory for the object, the initialize method makes it ready for use. Some objects, such as the pseudo-random number generator and the event queue, already exist in pre-allocated memory and cannot be created. There are no ``cmb_event_queue_create()`` or ``cmb_event_queue_destroy()`` functions. In later examples, we will also see cases where some Cimba object is simply declared as a local variable and allocated memory on the stack. We still need to initialize it, since we are not in C++ with its "Resource Allocation Is Initialization" (RAII) paradigm. In C, resource allocation is *not* initialization (RAINI?), and we need to be very clear on each object's memory allocation and initialization status. We have tried to be as consistent as possible in the Cimba create/initialize/terminate/destroy object lifecycle. Having made it this far, ``main()`` calls :c:func:`cmb_event_queue_execute` to run the simulation before cleaning up. Note that there are matching ``_terminate()`` calls for each ``_initialize()`` and matching ``_destroy()`` for each ``_create()``. These functions un-initialize and and de-allocate the objects that were allocated and initialized. We can now run `our first simulation `_ and see what happens. If you have configured ``meson`` with build type ``debug``, the program will generate output like this: .. code-block:: none [ambonvik@Threadripper tutorial]$ ./tut_1_1 | more 0.0000 dispatcher cmb_event_queue_execute (331): Starting simulation run 0.0000 Arrival cmb_process_hold (333): Holding for 0.896663 time units 0.0000 Arrival cmb_process_timer_add (397): Scheduled timeout event at 0.896663 0.0000 Server cmb_buffer_get (207): Gets 1 from Queue, level 0 0.0000 Server cmb_buffer_get (244): Waiting for more, level now 0 0.0000 Server cmb_resourceguard_wait (158): Waits for Queue 0.89666 dispatcher wakeup_event_time (364): Wakes Arrival signal 0 0.89666 Arrival cmb_buffer_put (291): Puts 1 into Queue, level 0 0.89666 Arrival cmb_buffer_put (298): Success, found room for 1, has 0 remaining 0.89666 Arrival cmb_resourceguard_signal (228): Scheduling wakeup event for Server 0.89666 Arrival cmb_process_hold (333): Holding for 3.056723 time units 0.89666 Arrival cmb_process_timer_add (397): Scheduled timeout event at 3.953386 0.89666 dispatcher wakeup_event_resource (182): Wakes Server signal 0 0.89666 Server cmb_buffer_get (251): Returned successfully from wait 0.89666 Server cmb_buffer_get (207): Gets 1 from Queue, level 1 0.89666 Server cmb_buffer_get (214): Success, 1 was available, got 1 0.89666 Server cmb_process_hold (333): Holding for 0.508688 time units 0.89666 Server cmb_process_timer_add (397): Scheduled timeout event at 1.405351 1.4054 dispatcher wakeup_event_time (364): Wakes Server signal 0 1.4054 Server cmb_buffer_get (207): Gets 1 from Queue, level 0 1.4054 Server cmb_buffer_get (244): Waiting for more, level now 0 1.4054 Server cmb_resourceguard_wait (158): Waits for Queue 3.9534 dispatcher wakeup_event_time (364): Wakes Arrival signal 0 3.9534 Arrival cmb_buffer_put (291): Puts 1 into Queue, level 0 3.9534 Arrival cmb_buffer_put (298): Success, found room for 1, has 0 remaining 3.9534 Arrival cmb_resourceguard_signal (228): Scheduling wakeup event for Server 3.9534 Arrival cmb_process_hold (333): Holding for 1.896515 time units 3.9534 Arrival cmb_process_timer_add (397): Scheduled timeout event at 5.849901 3.9534 dispatcher wakeup_event_resource (182): Wakes Server signal 0 3.9534 Server cmb_buffer_get (251): Returned successfully from wait 3.9534 Server cmb_buffer_get (207): Gets 1 from Queue, level 1 3.9534 Server cmb_buffer_get (214): Success, 1 was available, got 1 3.9534 Server cmb_process_hold (333): Holding for 2.130072 time units 3.9534 Server cmb_process_timer_add (397): Scheduled timeout event at 6.083458 5.8499 dispatcher wakeup_event_time (364): Wakes Arrival signal 0 5.8499 Arrival cmb_buffer_put (291): Puts 1 into Queue, level 0 5.8499 Arrival cmb_buffer_put (298): Success, found room for 1, has 0 remaining 5.8499 Arrival cmb_process_hold (333): Holding for 0.272353 time units 5.8499 Arrival cmb_process_timer_add (397): Scheduled timeout event at 6.122255 6.0835 dispatcher wakeup_event_time (364): Wakes Server signal 0 6.0835 Server cmb_buffer_get (207): Gets 1 from Queue, level 1 6.0835 Server cmb_buffer_get (214): Success, 1 was available, got 1 6.0835 Server cmb_process_hold (333): Holding for 1.152526 time units 6.0835 Server cmb_process_timer_add (397): Scheduled timeout event at 7.235984 6.1223 dispatcher wakeup_event_time (364): Wakes Arrival signal 0 6.1223 Arrival cmb_buffer_put (291): Puts 1 into Queue, level 0 6.1223 Arrival cmb_buffer_put (298): Success, found room for 1, has 0 remaining 6.1223 Arrival cmb_process_hold (333): Holding for 2.188439 time units 6.1223 Arrival cmb_process_timer_add (397): Scheduled timeout event at 8.310693 7.2360 dispatcher wakeup_event_time (364): Wakes Server signal 0 7.2360 Server cmb_buffer_get (207): Gets 1 from Queue, level 1 7.2360 Server cmb_buffer_get (214): Success, 1 was available, got 1 7.2360 Server cmb_process_hold (333): Holding for 1.391523 time units 7.2360 Server cmb_process_timer_add (397): Scheduled timeout event at 8.627507 8.3107 dispatcher wakeup_event_time (364): Wakes Arrival signal 0 8.3107 Arrival cmb_buffer_put (291): Puts 1 into Queue, level 0 8.3107 Arrival cmb_buffer_put (298): Success, found room for 1, has 0 remaining 8.3107 Arrival cmb_process_hold (333): Holding for 3.128101 time units 8.3107 Arrival cmb_process_timer_add (397): Scheduled timeout event at 11.438795 8.6275 dispatcher wakeup_event_time (364): Wakes Server signal 0 8.6275 Server cmb_buffer_get (207): Gets 1 from Queue, level 1 8.6275 Server cmb_buffer_get (214): Success, 1 was available, got 1 8.6275 Server cmb_process_hold (333): Holding for 0.851718 time units 8.6275 Server cmb_process_timer_add (397): Scheduled timeout event at 9.479225 9.4792 dispatcher wakeup_event_time (364): Wakes Server signal 0 9.4792 Server cmb_buffer_get (207): Gets 1 from Queue, level 0 9.4792 Server cmb_buffer_get (244): Waiting for more, level now 0 9.4792 Server cmb_resourceguard_wait (158): Waits for Queue ... ...and will keep on doing that forever. We have to press Ctrl-C or similar to stop it. The good news is that it works. Each line in the trace contains the time stamp in simulated time, the name of the currently executing process, exactly what function and line number is logging, and a formatted message about what is happening. Our simulated processes seem to be doing what we asked them to do, but there is a wee bit too much information here. On the other hand, if you configured ``meson`` with build type ``release``, the program does not produce any output, just seems to hang there, again having to be interrupted with Ctrl-C. In the following, we will assume a ``debug`` build type until further notice. Ending a simulation ^^^^^^^^^^^^^^^^^^^ We will address stopping first. The processes are *coroutines*, executing concurrently on a separate stack for each process. Only one process can execute at a time. It continues executing until it voluntarily *yields* the CPU to some other coroutine. Calling :c:func:`cmb_process_hold()` will do exactly that, transferring control to the hidden dispatcher process that determines what to do next. However, the dispatcher only knows about *events*, not coroutines or processes. It will run as long as there are scheduled events to execute. Our little simulation will always have scheduled events, and the dispatcher will not stop on its own. These events originate from our two processes: To ensure that a process returns to the other end of its :c:func:`cmb_process_hold()` call, it will schedule a wakeup event at the expected time before it yields control to the dispatcher. When executed, this event will *resume* the coroutine where it left off, returning through the :c:func:`cmb_process_hold()` call with a return value that indicates normal or abnormal return. (We have ignored the return values for now in the example above.) So, whenever there are more than one process running, there may be future events scheduled in the event queue. To stop the simulation, we can schedule an "end simulation" event, which stops any running processes at that point. The dispatcher then ends the run. This is perhaps easier to do in code than to describe in text. We define a ``struct simulation`` that contains pointers to the entities of our simulated world and the function for an end simulation event: .. code-block:: c struct simulation { struct cmb_process *arr; struct cmb_buffer *que; struct cmb_process *srv; }; void end_sim(void *subject, void *object) { struct simulation *sim = object; cmb_process_stop(sim->arr, NULL); cmb_process_stop(sim->srv, NULL); } We then store pointers to the simulation entities in the ``struct simulation`` and schedule an ``end_sim`` event before executing the event queue: .. code-block:: c int main(void) { const uint64_t seed = cmb_random_hwseed(); cmb_random_initialize(seed); cmb_event_queue_initialize(0.0); struct simulation sim = {}; sim.que = cmb_buffer_create(); cmb_buffer_initialize(sim.que, "Queue", CMB_UNLIMITED); sim.arr = cmb_process_create(); cmb_process_initialize(sim.arr, "Arrival", arrival_proc, sim.que, 0); cmb_process_start(sim.arr); sim.srv = cmb_process_create(); cmb_process_initialize(sim.srv, "Server", service_proc, sim.que, 0); cmb_process_start(sim.srv); cmb_event_schedule(end_sim, NULL, &sim, 10.0, 0); cmb_event_queue_execute(); cmb_process_terminate(sim.srv); cmb_process_destroy(sim.srv); cmb_process_terminate(sim.arr); cmb_process_destroy(sim.arr); cmb_buffer_terminate(sim.que); cmb_buffer_destroy(sim.que); cmb_event_queue_terminate(); cmb_random_terminate(); return 0; } The arguments to :c:func:`cmb_event_schedule()` are the event function, its subject and object pointers, the simulated time when this event will happen, and an event priority. We have set end time 10.0 here. It could also be expressed as ``cmb_time() + 10.0`` for "in 10.0 time units from now". Note that we now have a mixed simulation world view model, freely combining the two processes with an event. The difference is that the event is instantaneous, occurs at one specific time only, while the processes can have a duration in simulated time. The simulation end event does not need to be at a predetermined time. It is equally valid for some process in the simulation to schedule an end simulation event at the current time whenever some condition is met, such as a certain number of customers having been serviced, a statistics collector having a certain number of samples, or something else. Or, even easier for this simple simulation, the arrival process could just stop generating new arrivals after a certain number of customers, the event queue would clear, and the simulation would stop. (See `benchmark/MM1_single.c `__ for an example doing exactly that.) We gave the end simulation event a default priority of 0 as the last argument to :c:func:`cmb_event_schedule()`. Priorities are signed 64-bit integers, ``int64_t``. The Cimba dispatcher will always select the scheduled event with the lowest scheduled time as the next event. The simulation clock then jumps to that time and that event will be executed. If several events have the *same* scheduled time, the dispatcher will execute the one with the highest priority first. If several events have the same scheduled time *and* the same priority, it will execute them in first-in, first-out (FIFO) order. Again, we ignored the event handle returned by :c:func:`cmb_event_schedule()`, since we will not be using it in this example. If we wanted to keep it, it is an unsigned 64-bit integer (``uint64_t``). When initializing our arrivals and service processes, we quietly set the last argument to :c:func:`cmb_process_initialize()`, the *priority*, to 0. This is the inherent process priority for scheduling any events pertaining to this process, its priority when waiting for some resource, and so on. The processes can adjust their own (or each other's) priorities during the simulation, dynamically moving themselves up or down in various queues. Cimba does not attempt to adjust any priorities by itself, it just acts on whatever the priorities are, and reshuffles any existing queues as needed if priorities change. We compile and run `our revised code `_ and get something like this: .. code-block:: none [ambonvik@Threadripper tutorial]$ ./tut_1_2 | more 0.0000 dispatcher cmb_event_queue_execute (331): Starting simulation run 0.0000 Arrival cmb_process_hold (333): Holding for 0.011652 time units 0.0000 Arrival cmb_process_timer_add (397): Scheduled timeout event at 0.011652 0.0000 Server cmb_buffer_get (207): Gets 1 from Queue, level 0 0.0000 Server cmb_buffer_get (244): Waiting for more, level now 0 0.0000 Server cmb_resourceguard_wait (158): Waits for Queue 0.011652 dispatcher wakeup_event_time (364): Wakes Arrival signal 0 0.011652 Arrival cmb_buffer_put (291): Puts 1 into Queue, level 0 0.011652 Arrival cmb_buffer_put (298): Success, found room for 1, has 0 remaining 0.011652 Arrival cmb_resourceguard_signal (228): Scheduling wakeup event for Server 0.011652 Arrival cmb_process_hold (333): Holding for 0.274450 time units 0.011652 Arrival cmb_process_timer_add (397): Scheduled timeout event at 0.286102 0.011652 dispatcher wakeup_event_resource (182): Wakes Server signal 0 0.011652 Server cmb_buffer_get (251): Returned successfully from wait 0.011652 Server cmb_buffer_get (207): Gets 1 from Queue, level 1 0.011652 Server cmb_buffer_get (214): Success, 1 was available, got 1 0.011652 Server cmb_process_hold (333): Holding for 0.577195 time units 0.011652 Server cmb_process_timer_add (397): Scheduled timeout event at 0.588847 0.28610 dispatcher wakeup_event_time (364): Wakes Arrival signal 0 0.28610 Arrival cmb_buffer_put (291): Puts 1 into Queue, level 0 0.28610 Arrival cmb_buffer_put (298): Success, found room for 1, has 0 remaining 0.28610 Arrival cmb_process_hold (333): Holding for 1.042802 time units 0.28610 Arrival cmb_process_timer_add (397): Scheduled timeout event at 1.328904 0.58885 dispatcher wakeup_event_time (364): Wakes Server signal 0 0.58885 Server cmb_buffer_get (207): Gets 1 from Queue, level 1 0.58885 Server cmb_buffer_get (214): Success, 1 was available, got 1 0.58885 Server cmb_process_hold (333): Holding for 0.812160 time units 0.58885 Server cmb_process_timer_add (397): Scheduled timeout event at 1.401007 1.3289 dispatcher wakeup_event_time (364): Wakes Arrival signal 0 1.3289 Arrival cmb_buffer_put (291): Puts 1 into Queue, level 0 1.3289 Arrival cmb_buffer_put (298): Success, found room for 1, has 0 remaining 1.3289 Arrival cmb_process_hold (333): Holding for 0.378551 time units 1.3289 Arrival cmb_process_timer_add (397): Scheduled timeout event at 1.707456 1.4010 dispatcher wakeup_event_time (364): Wakes Server signal 0 1.4010 Server cmb_buffer_get (207): Gets 1 from Queue, level 1 1.4010 Server cmb_buffer_get (214): Success, 1 was available, got 1 1.4010 Server cmb_process_hold (333): Holding for 0.342517 time units 1.4010 Server cmb_process_timer_add (397): Scheduled timeout event at 1.743524 1.7075 dispatcher wakeup_event_time (364): Wakes Arrival signal 0 1.7075 Arrival cmb_buffer_put (291): Puts 1 into Queue, level 0 1.7075 Arrival cmb_buffer_put (298): Success, found room for 1, has 0 remaining 1.7075 Arrival cmb_process_hold (333): Holding for 1.632981 time units 1.7075 Arrival cmb_process_timer_add (397): Scheduled timeout event at 3.340436 ... 7.4347 dispatcher wakeup_event_time (364): Wakes Server signal 0 7.4347 Server cmb_buffer_get (207): Gets 1 from Queue, level 5 7.4347 Server cmb_buffer_get (214): Success, 1 was available, got 1 7.4347 Server cmb_process_hold (333): Holding for 0.462803 time units 7.4347 Server cmb_process_timer_add (397): Scheduled timeout event at 7.897489 7.8975 dispatcher wakeup_event_time (364): Wakes Server signal 0 7.8975 Server cmb_buffer_get (207): Gets 1 from Queue, level 4 7.8975 Server cmb_buffer_get (214): Success, 1 was available, got 1 7.8975 Server cmb_process_hold (333): Holding for 0.306544 time units 7.8975 Server cmb_process_timer_add (397): Scheduled timeout event at 8.204032 8.2040 dispatcher wakeup_event_time (364): Wakes Server signal 0 8.2040 Server cmb_buffer_get (207): Gets 1 from Queue, level 3 8.2040 Server cmb_buffer_get (214): Success, 1 was available, got 1 8.2040 Server cmb_process_hold (333): Holding for 0.458015 time units 8.2040 Server cmb_process_timer_add (397): Scheduled timeout event at 8.662048 8.6620 dispatcher wakeup_event_time (364): Wakes Server signal 0 8.6620 Server cmb_buffer_get (207): Gets 1 from Queue, level 2 8.6620 Server cmb_buffer_get (214): Success, 1 was available, got 1 8.6620 Server cmb_process_hold (333): Holding for 0.776396 time units 8.6620 Server cmb_process_timer_add (397): Scheduled timeout event at 9.438443 9.3383 dispatcher wakeup_event_time (364): Wakes Arrival signal 0 9.3383 Arrival cmb_buffer_put (291): Puts 1 into Queue, level 1 9.3383 Arrival cmb_buffer_put (298): Success, found room for 1, has 0 remaining 9.3383 Arrival cmb_process_hold (333): Holding for 0.631378 time units 9.3383 Arrival cmb_process_timer_add (397): Scheduled timeout event at 9.969667 9.4384 dispatcher wakeup_event_time (364): Wakes Server signal 0 9.4384 Server cmb_buffer_get (207): Gets 1 from Queue, level 2 9.4384 Server cmb_buffer_get (214): Success, 1 was available, got 1 9.4384 Server cmb_process_hold (333): Holding for 1.983201 time units 9.4384 Server cmb_process_timer_add (397): Scheduled timeout event at 11.421645 9.9697 dispatcher wakeup_event_time (364): Wakes Arrival signal 0 9.9697 Arrival cmb_buffer_put (291): Puts 1 into Queue, level 1 9.9697 Arrival cmb_buffer_put (298): Success, found room for 1, has 0 remaining 9.9697 Arrival cmb_process_hold (333): Holding for 0.218656 time units 9.9697 Arrival cmb_process_timer_add (397): Scheduled timeout event at 10.188323 10.000 dispatcher cmb_process_stop (808): Stop Arrival value (nil) 10.000 dispatcher cmb_process_stop (808): Stop Server value (nil) 10.000 dispatcher cmb_event_queue_execute (334): No more events in queue Progress: It started, ran, and now also stopped. .. _tut_1_logging: Setting logging levels ^^^^^^^^^^^^^^^^^^^^^^ Next, the verbiage. Cimba has powerful and flexible logging that gives you fine-grained control of what to log. The core logging function is called :c:func:`cmb_logger_vfprintf()`. As the name says, it is similar to the standard function ``vfprintf()`` but with some Cimba-specific added features. You will rarely interact directly with this function, but instead call wrapper functions (actually macros) like :c:macro:`cmb_logger_user()` or :c:macro:`cmb_logger_error()`. The key concept to understand here is the *logger flags*. Cimba uses a 32-bit unsigned integer (``uint32_t``) as a bit mask to determine what log entries to print and which to ignore. Cimba reserves the top four bits for its own use, identifying messages of various severities, leaving the 28 remaining bits for the user application. There is a global bit field and a bit mask in each call. If a simple bitwise and (``&``) between the global bit field and the caller's bit mask gives a non-zero result, that line is printed, otherwise not. Initially, all bits in the global bit field are on, ``0xFFFFFFFF``. You can turn selected bits on and off with :c:func:`cmb_logger_flags_on()` and :c:func:`cmb_logger_flags_off()`. Again, it may be easier to show this in code than to explain. We add user-defined logging messages to the end event and the two processes. The messages take ``printf``-style format strings and arguments: .. code-block:: c #include #include #define USERFLAG1 0x00000001 struct simulation { struct cmb_process *arr; struct cmb_buffer *que; struct cmb_process *srv; }; void end_sim(void *subject, void *object) { cmb_unused(subject); cmb_assert_debug(object != NULL); struct simulation *sim = object; cmb_logger_user(stdout, USERFLAG1, "--- Game Over ---"); cmb_process_stop(sim->arr, NULL); cmb_process_stop(sim->srv, NULL); } void *arrival_proc(struct cmb_process *me, void *ctx) { cmb_unused(me); cmb_assert_debug(ctx != NULL); struct cmb_buffer *bp = ctx; while (true) { const double rate = 0.75; const double mean = 1.0 / rate; const double t_ia = cmb_random_exponential(mean); cmb_logger_user(stdout, USERFLAG1, "Holds for %f time units", t_ia); cmb_process_hold(t_ia); uint64_t n = 1; cmb_logger_user(stdout, USERFLAG1, "Puts one into the queue"); cmb_buffer_put(bp, &n); } } void *service_proc(struct cmb_process *me, void *ctx) { cmb_unused(me); cmb_assert_debug(ctx != NULL); struct cmb_buffer *bp = ctx; while (true) { const double rate = 1.0; const double mean = 1.0 / rate; uint64_t m = 1; cmb_logger_user(stdout, USERFLAG1, "Gets one from the queue"); cmb_buffer_get(bp, &m); double t_srv = cmb_random_exponential(mean); cmb_logger_user(stdout, USERFLAG1, "Got one, services it for %f time units", t_srv); cmb_process_hold(t_srv); } } .. _tut_1_assert: .. admonition:: Asserts and debuggers Notice the :c:macro:`cmb_assert_debug()` in this code fragment. It is a custom version of the standard ``assert()`` macro, triggering a hard stop if the condition evaluates to ``false``. Our custom asserts come in three flavors, :c:macro:`cmb_assert_debug()`, :c:macro:`cmb_assert_release()`, and :c:macro:`cmb_assert_always()`. The ``_debug`` assert behaves like the standard one and goes away if the preprocessor symbol ``NDEBUG`` is ``#defined``. The ``_release`` assert is still there, but also goes away if ``NASSERT`` is ``#defined``. The ``_always`` assert remains in the code, no matter what. See :ref:`the background section ` for more details. We will trip one and see how it looks. We temporarily replace the exponentially distributed service time with a normally distributed one, mean 1.0 and sigma 0.25. This will almost surely generate a negative value sooner or later, which will cause the service process to try to hold for a negative time, resuming in its own past. That should not be possible: .. code-block:: c // const double t_srv = cmb_random_exponential(t_srv_mean); const double t_srv = cmb_random_normal(1.0, 0.25); Sure enough:: /home/ambonvik/github/cimba/build/tutorial/tut_1_5 9359.5 Service cmb_process_hold (272): Fatal: Assert "dur >= 0.0" failed, source file cmb_process.c, seed 0x9bec8a16f0aa802a Process finished with exit code 134 (interrupted by signal 6:SIGABRT) The output line lists the simulated time, the process, the function and line of code, the condition that failed, the source code file where it blew up, and even the random number seed that was used to initialize the run in case you want to reproduce the error with additional logging enabled. If using a debugger, place a breakpoint in :c:func:`cmi_assert_failed()`. If some assert trips, control will always go there. You can then page up the stack and see exactly what happened. .. image:: ../images/debugger_assert.png We also suppress the (debug build only) Cimba informationals from the main function: .. code-block:: c cmb_logger_flags_off(CMB_LOGGER_INFO); We compile and run, and get something like this: .. code-block:: none [ambonvik@Threadripper tutorial]$ ./tut_1_3 0.0000 Arrival arrival_proc (31): Holds for 1.217860 time units 0.0000 Server service_proc (48): Gets one from the queue 1.2179 Arrival arrival_proc (34): Puts one into the queue 1.2179 Arrival arrival_proc (31): Holds for 1.473610 time units 1.2179 Server service_proc (51): Got one, services it for 3.621266 time units 2.6915 Arrival arrival_proc (34): Puts one into the queue 2.6915 Arrival arrival_proc (31): Holds for 4.047572 time units 4.8391 Server service_proc (48): Gets one from the queue 4.8391 Server service_proc (51): Got one, services it for 0.323147 time units 5.1623 Server service_proc (48): Gets one from the queue 6.7390 Arrival arrival_proc (34): Puts one into the queue 6.7390 Arrival arrival_proc (31): Holds for 0.154751 time units 6.7390 Server service_proc (51): Got one, services it for 1.834300 time units 6.8938 Arrival arrival_proc (34): Puts one into the queue 6.8938 Arrival arrival_proc (31): Holds for 0.097154 time units 6.9909 Arrival arrival_proc (34): Puts one into the queue 6.9909 Arrival arrival_proc (31): Holds for 0.830664 time units 7.8216 Arrival arrival_proc (34): Puts one into the queue 7.8216 Arrival arrival_proc (31): Holds for 0.681079 time units 8.5027 Arrival arrival_proc (34): Puts one into the queue 8.5027 Arrival arrival_proc (31): Holds for 1.061645 time units 8.5733 Server service_proc (48): Gets one from the queue 8.5733 Server service_proc (51): Got one, services it for 0.026343 time units 8.5997 Server service_proc (48): Gets one from the queue 8.5997 Server service_proc (51): Got one, services it for 0.090050 time units 8.6897 Server service_proc (48): Gets one from the queue 8.6897 Server service_proc (51): Got one, services it for 0.361272 time units 9.0510 Server service_proc (48): Gets one from the queue 9.0510 Server service_proc (51): Got one, services it for 0.426133 time units 9.4771 Server service_proc (48): Gets one from the queue 9.5643 Arrival arrival_proc (34): Puts one into the queue 9.5643 Arrival arrival_proc (31): Holds for 0.361377 time units 9.5643 Server service_proc (51): Got one, services it for 0.040219 time units 9.6046 Server service_proc (48): Gets one from the queue 9.9257 Arrival arrival_proc (34): Puts one into the queue 9.9257 Arrival arrival_proc (31): Holds for 0.879100 time units 9.9257 Server service_proc (51): Got one, services it for 2.498536 time units 10.000 dispatcher end_sim (17): --- Game Over --- Only our user-defined logging messages are printed, also for a debug build. Note how the simulation time, the name of the active process, the calling function, and the line number are automagically prepended to the user-defined message. The output from a release build will be similar. `The code for this version is here `_ We turn off our user-defined messages like this: .. code-block:: c cmb_logger_flags_off(CMB_LOGGER_INFO); cmb_logger_flags_off(USERFLAG1); We can also combine those two calls by a simple bitwise ``or`` (``|``) between the two bit patterns: .. code-block:: c cmb_logger_flags_off(CMB_LOGGER_INFO | USERFLAG1); As you would expect, this version of the program produces no output. Collecting and reporting statistics ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Which brings us to getting some *useful* output. By now, we are suitably convinced that our simulated M/M/1 queuing system is behaving as we expect, so we want it to start reporting some statistics on the queue length. It will be no surprise that Cimba contains flexible and powerful statistics collectors and reporting functions. There is actually one built into the :c:struct:`cmb_buffer` we have been using. We just need to turn it on from our ``main()``: .. code-block:: struct simulation sim = {}; sim.que = cmb_buffer_create(); cmb_buffer_initialize(sim.que, "Queue", CMB_UNLIMITED); cmb_buffer_start_recording(sim.que); After the simulation is finished executing the event queue, we can make it report its history like this: .. code-block:: cmb_buffer_stop_recording(sim.que); cmb_buffer_print_report(sim.que, stdout); We increase the running time from ten to one million time units by schedulign the ``end_sim``event at time 1.0e6, compile, and run. Very shortly thereafter, output appears: .. code-block:: none [ambonvik@Threadripper tutorial]$ ./tut_1_4 Buffer levels for Queue N 1313789 Mean 2.242 StdDev 3.257 Variance 10.61 Skewness 2.397 Kurtosis 9.070 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 2.400) |################################################## [ 2.400, 4.800) |##########- [ 4.800, 7.200) |#######= [ 7.200, 9.600) |##- [ 9.600, 12.00) |#- [ 12.00, 14.40) |= [ 14.40, 16.80) |- [ 16.80, 19.20) |- [ 19.20, 21.60) |- [ 21.60, 24.00) |- [ 24.00, 26.40) |- [ 26.40, 28.80) |- [ 28.80, 31.20) |- [ 31.20, 33.60) |- [ 33.60, 36.00) |- [ 36.00, 38.40) |- [ 38.40, 40.80) |- [ 40.80, 43.20) |- [ 43.20, 45.60) |- [ 45.60, 48.00) |- [ 48.00, Infinity) |- -------------------------------------------------------------------------------- The text-mode histogram uses the character ``#`` to indicate a full pixel, ``=`` for one that is more than half full, and ``-`` for one that contains something but less than half filled. We can also get a pointer to the :c:struct:`cmb_timeseries` object by calling :c:func:`cmb_buffer_history()` and doing further analysis on that. As an example, let's do the first 20 partial autocorrelation coefficients of the queue length time series and print a correlogram of that as well: .. code-block:: c struct cmb_timeseries *ts = cmb_buffer_history(sim.que); double pacf_arr[21]; cmb_timeseries_PACF(ts, 20, pacf_arr, NULL); printf("\nPartial autocorrelation coefficients for %s levels\n", cmb_buffer_name(sim.que)); cmb_timeseries_print_correlogram(ts, stdout, 20, pacf_arr); Output: .. code-block:: none Partial autocorrelation coefficients for Queue levels -1.0 0.0 1.0 -------------------------------------------------------------------------------- 1 0.953 |###############################- 2 0.340 |###########- 3 -0.171 =#####| 4 0.139 |####= 5 -0.088 =##| 6 0.086 |##= 7 -0.060 =#| 8 0.064 |##- 9 -0.045 =#| 10 0.051 |#= 11 -0.035 -#| 12 0.042 |#- 13 -0.028 =| 14 0.035 |#- 15 -0.027 =| 16 0.032 |#- 17 -0.022 =| 18 0.029 |= 19 -0.020 =| 20 0.027 |= -------------------------------------------------------------------------------- The code for this stage `can be found here. `_ This is not quite publication-ready graphics, but can be useful at the model development stage we are at now: We have numbers. Theory predicts an average queue length of 2.25 for a M/M/1 queue at 75 % utilization. We just got 2.242. Close, but is it close enough? We need more resolving power. Refactoring for parallelism ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Before we go there, we will clean up a few rough edges. We want to tidy up the hard-coded parameters to a proper data structure. We define a ``struct trial`` to contain parameters and output values, and bundle both our existing ``struct simulation`` and ``struct trial`` in a ``struct context``, and pass that between the functions. .. code-block:: c struct simulation { struct cmb_process *arr; struct cmb_buffer *que; struct cmb_process *srv; }; struct trial { /* Parameters */ double arr_rate; double srv_rate; double warmup_time; double duration; /* Results */ double avg_queue_length; }; struct context { struct simulation *sim; struct trial *trl; }; For now, we just declare these structs as local variables on the stack. We also need a pair of events to turn data recording on and off at specified times: .. code-block:: c static void start_rec(void *subject, void *object) { cmb_unused(subject); cmb_assert_debug(object != NULL); const struct context *ctx = object; const struct simulation *sim = ctx->sim; cmb_buffer_recording_start(sim->que); } static void stop_rec(void *subject, void *object) { cmb_unused(subject); cmb_assert_debug(object != NULL); const struct context *ctx = object; const struct simulation *sim = ctx->sim; cmb_buffer_recording_stop(sim->que); } As the last refactoring step before we parallelize, we move the simulation driver code from ``main()`` to a separate function ``run_MM1_trial()`` and call it from ``main()``. For reasons that soon will become clear, its argument is a single pointer to ``void``, even if we immediately cast this to our ``struct trial *`` once inside the function. We remove the call to :c:func:`cmb_buffer_report`, calculate the average queue length, and store it in the ``trial`` results field: .. code-block:: c struct cmb_wtdsummary wtdsum; const struct cmb_timeseries *ts = cmb_buffer_history(ctx.sim->que); cmb_timeseries_summarize(ts, &wtdsum); ctx.trl->avg_queue_length = cmb_wtdsummary_mean(&wtdsum); The ``main()`` function is now reduced to this: .. code-block:: c int main(void) { struct trial trl = {}; trl.arr_rate = 0.75; trl.srv_rate = 1.0; trl.warmup_s = 1000.0; trl.dur_s = 1e6; run_MM1_trial(&trl); printf("Avg %f\n", trl.avg_queue_length); return 0; } We will not repeat the rest of the code here. You can find it in `tutorial/tut_1_5.c `__. Instead, we compile and run it, receiving output similar to this: .. code-block:: none /home/ambonvik/github/cimba/build/tutorial/tut_1_5 Avg 2.234060 .. _tut_1_parallel: Parallelization ^^^^^^^^^^^^^^^ So far, we have been developing the model in a single thread, running on a single CPU core. All the concurrency between our simulated processes (coroutines) is happening inside this single thread. The simulation is still pretty fast, but a modern CPU has many cores, most of them idly watching our work so far with detached interest. Let's put them to work. Cimba is built from the ground up for coarse-grained parallelism. Depending on the viewpoint, parallelizing a discrete event simulation is either terribly hard or trivially simple. The *hard* way to do it is to try to parallelize a single simulation run. This is near impossible, since the outcome of each event may influence all future events in complex and model-dependent ways. The discrete event simulation algorthm itself is heavily serializing. The *easy* way is to realize that we rarely do a *single* simulation run. We want to run *many* to generate statistically significant answers to questions and/or to test many parameter combinations, perhaps in a full factorial experimental design. Even if we could answer a question by a single very long run, we may get a better understanding by splitting it into many shorter runs to not just get an average, but also a sense of the variability of our results. When we do multiple replications of a simulation, these are by design intended to be independent, identically distributed trials. Multiple parameter combinations are no less independent. This is what is called an "embarrassingly parallel" problem. There is no interaction between the trials, and they can be trivially parallelized by just running them at the same time and collecting the output. Cimba creates a pool of worker threads, one per (logical) CPU core on the system. You describe your experiment as an array of trials and the function to execute each trial, and pass these to :c:func:`cimba_run_experiment()`. The worker threads will pull trials from the experiment array and run them, storing the results back in your trial struct, before moving to the next un-executed trial in the array. This gives an inherent load balancing with minimal overhead. When all trials have been executed, it stops. Returning to our M/M/1 queue, suppose that we want to test the commonly accepted queuing theory by testing utilization factors from 0.025 to 0.975 in steps of 0.025, and that we want to run 10 replications of each parameter combination. We then want to calculate and plot the mean and 95 % confidence bands for each parameter combination, and compare that to the analytically calculated value in publication ready graphics. We can set up our experimental design like this: .. code-block:: c const unsigned n_rhos = 39; const double rho_start = 0.025; const double rho_step = 0.025; const unsigned n_reps = 10; const double srv_rate = 1.0; const double warmup_time = 1000.0; const double duration = 1.0e6; printf("Setting up experiment\n"); const unsigned n_trials = n_rhos * n_reps; struct trial *experiment = cmi_calloc(n_trials, sizeof(*experiment)); uint64_t ui_exp = 0u; double rho = rho_start; for (unsigned ui_rho = 0u; ui_rho < n_rhos; ui_rho++) { for (unsigned ui_rep = 0u; ui_rep < n_reps; ui_rep++) { experiment[ui_exp].arr_rate = rho * srv_rate; experiment[ui_exp].srv_rate = srv_rate; experiment[ui_exp].warmup_time = warmup_time; experiment[ui_exp].duration = duration; experiment[ui_exp].seed_used = 0u; experiment[ui_exp].avg_queue_length = 0.0; ui_exp++; } rho += rho_step; } We allocate the experiment array on the heap using ``cmi_calloc()``. This is a simple wrapper to the standard library ``calloc()`` function together with a test for valit result. This avoids the need to clutter simulation model code with repetitive error handling code. Here, we have hard-coded the parameters for the sake of the tutorial, but they would probably be given as an input file or as interactive input in real usage. .. note:: Do not use any writeable global variables in your model. The entire parallelized experiment exists in a shared memory space. Threads will be sharing CPU cores in unpredictable ways. A global variable accessible to several threads can be read and written by any thread, creating potential hard-to-diagnose bugs. Do not use any static local variables in your model either. Your model functions will be called by all threads. A static local variable remembers its value from the last call, which may have been a completely different thread. Diagnosing those bugs will not be any easier. Regular local variables, function arguments, and heap memory (``malloc()`` / ``free()``) is thread safe. If you absolutely *must* have a global or static variable, consider prefixing it by :c:macro:`CMB_THREAD_LOCAL` to make it global or static *within that thread only*, creating separate copies per thread. We can then run the experiment: .. code-block:: c cimba_run_experiment(experiment, n_trials, sizeof(*experiment), run_MM1_trial); The first argument is the experiment array, the last argument the simulation driver function we have developed earlier. It will take a pointer to a trial as its argument, but the internals of :c:func:`cimba_run_experiment()` cannot know the detailed structure of your ``struct trial``, so it will be passed as a ``void *``. We need to explain the number of trials and the size of each trial struct as the second and third arguments to :c:func:`cimba_run_experiment()` for it to do correct pointer arithmetic internally. When done, we can collect the results like this: .. code-block:: c ui_exp = 0u; FILE *datafp = fopen("tut_1_6.dat", "w"); fprintf(datafp, "# utilization\tavg_queue_length\tconf_interval\n"); for (unsigned ui_rho = 0u; ui_rho < n_rhos; ui_rho++) { const double ar = experiment[ui_exp].arr_rate; const double sr = experiment[ui_exp].srv_rate; const double rho_used = ar / sr; struct cmb_datasummary cds; cmb_datasummary_initialize(&cds); for (unsigned ui_rep = 0u; ui_rep < n_reps; ui_rep++) { cmb_datasummary_add(&cds, experiment[ui_exp].avg_queue_length); ui_exp++; } cmb_assert_debug(cmb_datasummary_count(&cds) == n_reps); const double sample_avg = cmb_datasummary_mean(&cds); const double sample_sd = cmb_datasummary_stddev(&cds); const double t_crit = 2.228; fprintf(datafp, "%f\t%f\t%f\n", rho_used, sample_avg, t_crit * sample_sd); cmb_datasummary_terminate(&cds); } fclose(datafp); cmi_free(experiment); write_gnuplot_commands(); system("gnuplot -persistent tut_1_6.gp"); We use a :c:struct:`cmb_datasummary` to simplify the calculation of confidence intervals, knowing that it will calculate correct moments in a single pass of the data. Once done, we use ``cmi_free()`` to deallocate the experiment array. We then write the results to an output file, write a gnuplot command file to plot the results, and spawn a persistent gnuplot window to display the chart. Also, we would like to know the progress of our experiment, so we define a separate level of logger messages like this: .. code-block:: c #define USERFLAG1 0x00000001 #define USERFLAG2 0x00000002 .. note:: The logging flags are bitmasks, not consecutive numbers. The next three values would be ``0x00000004``, ``0x00000008``, and ``0x00000010``. You can combine flag values bit-wise. For instance, a call to ``cmb_logger_user()`` with flag level 63 (``0xFF``) will print a line if *any* of the lowest 8 bits are set. We add a logging call to our ``run_MM1_trial()``: .. code-block:: c cmb_logger_user(stdout, USERFLAG2, "seed: 0x%016" PRIx64 " rho: %f", trl->seed_used, trl->arr_rate / trl->srv_rate); We use the macro ``PRIx64`` from ``#include `` for portable formatting of the ``uint64_t`` seed value. We add some code to measure run time and some extra ``printf()`` calls, compile and run, and get output similar to this: .. code-block:: none /home/ambonvik/github/cimba/build/tutorial/tut_1_6 Cimba version 3.0.0-beta Setting up experiment Executing experiment 0 0.0000 dispatcher run_MM1_trial (120): seed: 0xc81e7ac2d54abef1 rho: 0.025000 2 0.0000 dispatcher run_MM1_trial (120): seed: 0xb995a846d37f1522 rho: 0.025000 1 0.0000 dispatcher run_MM1_trial (120): seed: 0x246364a107f945e7 rho: 0.025000 5 0.0000 dispatcher run_MM1_trial (120): seed: 0xa7b5e743900ccc53 rho: 0.025000 4 0.0000 dispatcher run_MM1_trial (120): seed: 0x775e54d85c8760eb rho: 0.025000 3 0.0000 dispatcher run_MM1_trial (120): seed: 0x6d4f0bb78fab7321 rho: 0.025000 6 0.0000 dispatcher run_MM1_trial (120): seed: 0xa0d4d65c953e6ba9 rho: 0.025000 7 0.0000 dispatcher run_MM1_trial (120): seed: 0xc66885b9c3e01198 rho: 0.025000 10 0.0000 dispatcher run_MM1_trial (120): seed: 0x0d0324ac1ad47314 rho: 0.050000 9 0.0000 dispatcher run_MM1_trial (120): seed: 0x60ea3e25c23886cd rho: 0.025000 8 0.0000 dispatcher run_MM1_trial (120): seed: 0xb2cf2d84fb2cd36e rho: 0.025000 11 0.0000 dispatcher run_MM1_trial (120): seed: 0x04b83d03be4d5393 rho: 0.050000 12 0.0000 dispatcher run_MM1_trial (120): seed: 0x6a3c8c7d7657b5a2 rho: 0.050000 13 0.0000 dispatcher run_MM1_trial (120): seed: 0x79e878ab601d9ba9 rho: 0.050000 14 0.0000 dispatcher run_MM1_trial (120): seed: 0xcd50fbb55578f7d2 rho: 0.050000 15 0.0000 dispatcher run_MM1_trial (120): seed: 0xfabb1c5f934c9aad rho: 0.050000 ... 384 0.0000 dispatcher run_MM1_trial (120): seed: 0x8aa3bb76ccc324a6 rho: 0.975000 385 0.0000 dispatcher run_MM1_trial (120): seed: 0x9290801927a4f348 rho: 0.975000 386 0.0000 dispatcher run_MM1_trial (120): seed: 0xeaff225d8d61a4ad rho: 0.975000 387 0.0000 dispatcher run_MM1_trial (120): seed: 0xc2c20c0bef3959b7 rho: 0.975000 388 0.0000 dispatcher run_MM1_trial (120): seed: 0xc11445ad99b4a5c9 rho: 0.975000 389 0.0000 dispatcher run_MM1_trial (120): seed: 0x5faccc75f803deef rho: 0.975000 Finished experiment, writing results to file It took 1.31878 sec Note that the Cimba logger understands that it is now running multithreaded and prepends each logging line with the trial index in your experiment array. Note also that the trials may not be executed in strict sequence, since we do not control the detailed interleaving of the threads. That is up to the operating system. We also get an image like this in a separate window: .. image:: ../images/tut_1_6.png Evidently, we cannot reject the null hypothesis that conventional queuing theory may be correct. Nor can we reject the hypothesis that Cimba may be working correctly. Seed management ^^^^^^^^^^^^^^^ Since we used ``cmb_random_hwseed()`` to seed the pseudo-random number generators in each trial, our pseudo-random number generators are seeded with real entropy from the computer hardware. If we run the model again, we will get a slightly different result. This independence is of course a good thing, but we may also need reproducibility. We can make our simulation reproducible by providing a master seed as input, e.g., on the command line, and then use a hashing function to create different seeds for each trial in the experiment. We can also provide other simulation parameters in the same way: .. code-block:: c #include int main(const int argc, char *argv[]) { /* Can be set on command line */ bool timing_enabled = false; uint64_t master_seed = cmb_random_hwseed(); uint32_t n_reps = 10; double warmup_s = 1000.0; double duration_s = 1.0e6; /* Not yet added to command line params */ const unsigned n_rhos = 39; const double rho_start = 0.025; const double rho_step = 0.025; const double srv_rate = 1.0; /* Parse command line options, if any */ int opt; while ((opt = getopt(argc, argv, "d:n:s:tw:")) != -1) { switch (opt) { case 'd': { errno = 0; duration_s = strtod(optarg, NULL); if (errno != 0 || duration <= 0.0) { fprintf(stderr, "Invalid argument %s\n", optarg); return EXIT_FAILURE; } break; } case 'n': { errno = 0; n_reps = (uint32_t)strtoul(optarg, NULL, 0); if (errno != 0) { fprintf(stderr, "Invalid argument %s\n", optarg); return EXIT_FAILURE; } break; } case 's': { errno = 0; master_seed = (uint64_t)strtoull(optarg, NULL, 0); if (errno != 0 || master_seed == 0u) { fprintf(stderr, "Invalid argument %s\n", optarg); return EXIT_FAILURE; } break; } case 't': { timing_enabled = true; break; } case 'w': { errno = 0; warmup_s = (double)strtod(optarg, NULL); if (errno != 0 || warmup_time < 0.0) { fprintf(stderr, "Invalid argument %s\n", optarg); return EXIT_FAILURE; } break; } default: { fprintf(stderr, "Usage: %s [-d ][-n ][-s ][-t][-w ]\n", argv[0]); return EXIT_FAILURE; } } } By using the hash function ``cmb_random_fmix64()`` and a running trial counter as the unique nonce value for hashing the trial seed., we can now create suitable deterministic seeds for each trial in our experiment like this: .. code-block:: c printf("Setting up experiment\n"); printf("Master seed: 0x%" PRIx64 "\n", master_seed); const unsigned n_trials = n_rhos * n_reps; struct trial *experiment = cmi_calloc(n_trials, sizeof(*experiment)); uint64_t ui_exp = 0u; double rho = rho_start; for (unsigned ui_rho = 0u; ui_rho < n_rhos; ui_rho++) { for (unsigned ui_rep = 0u; ui_rep < n_reps; ui_rep++) { experiment[ui_exp].arr_rate = rho * srv_rate; experiment[ui_exp].srv_rate = srv_rate; experiment[ui_exp].warmup_s = warmup_s; experiment[ui_exp].dur_s = duration_s; experiment[ui_exp].seed_used = cmb_random_fmix64(master_seed, ui_exp); experiment[ui_exp].avg_queue_length = 0.0; ui_exp++; } rho += rho_step; } Each trial can now use its pre-assigned seed: .. code-block:: c cmb_random_initialize(trl->seed_used); We can then type, e.g.: .. code-block:: none [ambonvik@Threadripper tutorial]$ ./tut_1_7 -s 0x123456789abcd0 -d 1e7 -n 15 -w 1e6 -t Cimba version 3.0.0-beta Setting up experiment Master seed: 0x123456789abcd0 Executing experiment 0 0.0000 dispatcher run_MM1_trial (173): seed: 0xfdec9c60e9175db9 rho: 0.025000 2 0.0000 dispatcher run_MM1_trial (173): seed: 0x8c86cd3226e7c39a rho: 0.025000 1 0.0000 dispatcher run_MM1_trial (173): seed: 0xa0f0a732375fb48f rho: 0.025000 3 0.0000 dispatcher run_MM1_trial (173): seed: 0x6efb9a53ec801f99 rho: 0.025000 5 0.0000 dispatcher run_MM1_trial (173): seed: 0x7b67c6418c63aae2 rho: 0.025000 6 0.0000 dispatcher run_MM1_trial (173): seed: 0x8402740fc59daac0 rho: 0.025000 8 0.0000 dispatcher run_MM1_trial (173): seed: 0xa622550fd8b73be6 rho: 0.025000 4 0.0000 dispatcher run_MM1_trial (173): seed: 0x4074149906797ad1 rho: 0.025000 11 0.0000 dispatcher run_MM1_trial (173): seed: 0xf92ed9cb02c4ffad rho: 0.025000 16 0.0000 dispatcher run_MM1_trial (173): seed: 0x7965a67ab345d26e rho: 0.050000 7 0.0000 dispatcher run_MM1_trial (173): seed: 0x85c4ac085200f398 rho: 0.025000 14 0.0000 dispatcher run_MM1_trial (173): seed: 0x2681e145aa262b4d rho: 0.025000 9 0.0000 dispatcher run_MM1_trial (173): seed: 0xf80baffa542f3b03 rho: 0.025000 10 0.0000 dispatcher run_MM1_trial (173): seed: 0xee25017da603acb5 rho: 0.025000 ... 578 0.0000 dispatcher run_MM1_trial (173): seed: 0xc7d13810b2cd773b rho: 0.975000 579 0.0000 dispatcher run_MM1_trial (173): seed: 0x5d20f8af7ddbb0c4 rho: 0.975000 580 0.0000 dispatcher run_MM1_trial (173): seed: 0x4c79c5afe47c077a rho: 0.975000 581 0.0000 dispatcher run_MM1_trial (173): seed: 0x24e1643bbf3f2bef rho: 0.975000 582 0.0000 dispatcher run_MM1_trial (173): seed: 0x218448f76e55882c rho: 0.975000 583 0.0000 dispatcher run_MM1_trial (173): seed: 0x87cf1c7dd681deb2 rho: 0.975000 584 0.0000 dispatcher run_MM1_trial (173): seed: 0xb6c0c5fb0077fbe3 rho: 0.975000 Finished experiment, writing results to file It took 17.6253 sec The chart looks similar, but since we used longer run times and more replications, our confidence intervals are tighter: .. image:: ../images/tut_1_7.png This concludes our first tutorial. We have followed the development steps from a first minimal model with basic process interactions to a complete parallelized experiment with graphical output. The files ``tutorial/tut_1_*.c`` include working code for each stage of development. For additional variations of this theme, see `benchmark/MM1_multi.c `__ where the queue is modeled as a :c:struct:`cmb_objectqueue` with individual customers tracking their time in the system, and `test/test_cimba.c `__ modeling a M/G/1 queue with different utilizations and service time varibilities. .. _tut_2: Acquiring, preempting, holding, and releasing resources, with interruptions --------------------------------------------------------------------------- We will now introduce the Cimba methods for acquiring and releasing resources of various kinds. We will also show additional process interactions where the active process is acting directly on some other process. We will demonstrate these through a somewhat cartoonish example. First, some necessary background. Resources and resource pools ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Cimba provides two kinds of resources that a process can acquire, hold, and release. These are the :c:struct:`cmb_resource` and the :c:struct:`cmb_resourcepool`. The difference is that the :c:struct:`cmb_resource` is a single unit that only can be held by one process at a time, while the :c:struct:`cmb_resourcepool` consists of several units that each can be held by a process. A process may hold more than one unit from the :c:struct:`cmb_resourcepool`, but there is a certain upper limit for how many units are available simultaneously from the pool. In computer science terms, the :c:struct:`cmb_resource` is a *binary semaphore*, while the :c:struct:`cmb_resourcepool` is a *counting semaphore*. If the requested resource or number of resources is not available, the ``_acquire`` calls will wait in a priority queue until the requested amount becomes available. The ordering is determined first by the process priority, then FIFO based on the simulation timestamp when it entered the queue. Changing a process' priority with :c:func:`cmb_process_priority_set()` will also have effect on any priority queues it may be waiting in, with no need to explicitly reorder the queue from the application. The typical usage pattern is also the reason for the name :c:func:`cmb_process_hold()`: .. code-block:: c cmb_resource_acquire(res); cmb_process_hold(dur); cmb_resource_release(res); Or: .. code-block:: c cmb_resourcepool_acquire(respl, 6); cmb_process_hold(dur); cmb_resourcepool_release(respl, 3); cmb_process_hold(dur); cmb_resourcepool_release(respl, 3); Or even: .. code-block:: c cmb_resourcepool_acquire(respl, 6); cmb_resource_acquire(res); cmb_process_hold(dur); cmb_resource_release(res); cmb_resourcepool_release(respl, 6); Note that there is no timeout argument in these calls. We will show how to do this in :ref:`the next tutorial `, but will leave it as a (very small) cliffhanger until then. Note also that there are some differences between the ``_acquire()``/``_release()`` pairs and the similar ``_get()``/``_put()`` pairs for buffers and queues. Suppose that you have a ``cmb_buffer`` of maximal size 10. It is still possible to call ``cmb_buffer_put(bufp, 100)``. The call just puts in 10 to begin with, waits for someone to get one or more of them, and then keeps refilling the queue until all 100 are put in. The call only returns at that point (unless interrupted, which we will discuss in a moment). Similarly, ``cmb_buffer_get(bufp, 100)`` works as expected, and trying to ``_put()`` another item into a full :c:struct:`cmb_objectqueue` or :c:struct:`cmb_priorityqueue` just suspends the caller until space becomes available. Resources and resource pools are not like that. Requesting more from a resource pool than its maximum is an error. If we have a resource pool with maximum size 10, 5 of which already are in use, it is fine to call ``cmb_resourcepool_acquire(rpp, 10)``. The call just waits until all 10 are available, accumulating its holding whenever some become available until it has all, and then returns. On the other hand, calling ``cmb_resourcepool_acquire(rpp, 11)`` will not work. It is not a meaningful call, so Cimba will do the most helpful thing it can: Trip an `assert()` and crash your program on the spot, encouraging you to find and fix the error. (See :ref:`the explanatory section ` for more about the Cimba error handling philosophy.) Preemptions and interruptions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We have already explained that Cimba processes (and the coroutines they are derived from) execute atomically until they explicitly yield control. These yield (and resume) points are hidden inside functions like :c:func:`cmb_process_hold()` or :c:func:`cmb_resource_acquire()`. Inside the call, control may (or may not) be passed to some other process. The call will only return when control is transferred back to this process. To the calling process just sitting on its own stack, it looks very simple, just another function call, but a lot of things may be happening elsewhere in the model in the meantime. A yielded process does not have any guarantees for what may be happening to it before it resumes control. Other processes may act on this process, perhaps stopping it outright, waking it up early, or snatching a resource away from it. To be able to tell the difference, functions like :c:func:`cmb_process_hold()` and :c:func:`cmb_resource_acquire()` have an integer return value, an ``int64_t``. We have quietly ignored the return values in our earlier examples, but they are an important signalling mechanism between Cimba processes. Cimba has reserved a few possible return values for itself. Most importantly, :c:macro:`CMB_PROCESS_SUCCESS` (numeric value 0) means a normal and successful return. For example, if :c:func:`cmb_process_hold()` returns :c:macro:`CMB_PROCESS_SUCCESS`, the calling process knows that it was woken up at the scheduled time without any shenanigans. The second most important value is :c:macro:`CMB_PROCESS_PREEMPTED`. That means that a higher priority process just forcibly took away some resource held by this process. There are also :c:macro:`CMB_PROCESS_INTERRUPTED`, :c:macro:`CMB_PROCESS_STOPPED`, :c:macro:`CMB_PROCESS_CANCELLED`, and :c:macro:`CMB_PROCESS_TIMEOUT`. These predefined signals are all defined as small negative values, leaving an enormous number of available signal values to application-defined meanings. In particular, all positive integers are available to the application for coding various interrupt signals between processes. These signal values create a rich set of direct process interactions. As an example, suppose some process currently holds 10 units from some resource pool. It then calls :c:func:`cmb_resourcepool_acquire()` requesting 10 more units. At that moment, only 5 are available. The process takes these 5 and adds itself to the priority queue maintained by the resource guard before yielding. In effect, it asks to be woken up whenever some more is available, intending to return from its acquire call only when all 10 units have been collected. There are now three different outcomes for the ``_acquire()`` call: 1. All goes as expected, 5 more units eventually become available, the process takes them, and returns :c:macro:`CMB_PROCESS_SUCCESS`. It now holds 20 units. 2. Some higher priority process calls :c:func:`cmb_resourcepool_preempt()` and this process is targeted. The higher priority process takes *all* units held by the victim process. Its acquire call returns :c:macro:`CMB_PROCESS_PREEMPTED`. It now holds 0 units. 3. Some other process calls :c:func:`cmb_process_interrupt()` on this process. It excuses itself from the resource guard priority queue and returns whatever signal value was given to :c:func:`cmb_process_interrupt()`, perhaps :c:macro:`CMB_PROCESS_INTERRUPTED` or some other value that has an application-defined meaning. It unwinds the 5 units it collected during the call and returns holding the same amount as it held before calling :c:func:`cmb_resourcepool_acquire()`, 10 units. Preempt calls can themselves be preempted by higher priority processes or interrupted in the same way as acquire calls if the preempt was not immediately fulfilled and the process had to wait at the resource guard. Once there, it is fair game for preempts and interrupts. Another potential complexity: Suppose a process holds more than one type of resource, for example: .. code-block:: c cmb_resource_acquire(rp); cmb_resourcepool_acquire(rsp1, 10); cmb_resourcepool_acquire(rsp2, 15); int64_t signal = cmb_process_hold(100,0); if (signal == CMB_PROCESS_PREEMPTED) { /* ??? */ } In cases like this, the functions :c:func:`cmb_resource_held_by_process()` and :c:func:`cmb_resourcepool_held_by_process()` with a pointer to the process itself as the second argument can be useful to figure out which resource was preempted. If the caller does not have a pointer to itself handy (it is always the first argument to the process function), it can get one by calling :c:func:`cmb_process_current()`, returning a pointer to the currently executing process, i.e. the caller. Buffers and object queues, interrupted ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The semantics of buffers and object queues are different from the resources and resource pools. A process can acquire and hold a resource, making it unavailable for other processes until it is released. Preempting it naturally means taking the resource away from the process because someone else needs it more, right now. Buffers and their cousins act differently. Once something is put in, other processes can get it and consume it immediately. Preempting a put or get operation does not have any obvious meaning. If a buffer is empty, a process get call is waiting at the resource guard, and a higher priority process wants to get some first, it just calls :c:func:`cmb_buffer_get()` and goes first in the priority queue. However, waiting puts and gets can still be interrupted. For the :c:struct:`cmb_objectqueue` and :c:struct:`cmb_priorityqueue`, it is very simple. If the respective ``_put()`` or ``_get()`` call returned :c:macro:`CMB_PROCESS_SUCCESS` the object was successfully added to the queue or retrieved from it. If it returned anything else, it was not. The :c:struct:`cmb_buffer` is similarly intuitive. Recall from :ref:`our first tutorial ` that the ``amount`` argument is given as a pointer to a variable, not as a value. As the put and get calls get underway, the value at this location is updated to reflect the progress. If interrupted, this value indicates how much was placed or obtained. The call returns at this point with no attempt to roll back to the state at the beginning of the call. If successful, the put call will return :c:macro:`CMB_PROCESS_SUCCESS` and have a zero value in this location. Similarly, the `_get()` call will have the requested amount. If interrupted, it will return something else and the amount variable will contain some other value between zero and the requested amount. While the cat is away... ^^^^^^^^^^^^^^^^^^^^^^^^ It is again probably easier to demonstrate with code than explain in computer sciencey terms how all this works. On a table, we have some pieces of cheese in a pile. There are several mice trying to collect the cheese and hold it for a while. Each mouse can carry different numbers of cheese cubes. They tend to drop it again quite fast, inefficient hoarders as they are. Unfortunately, there are also some rats, bigger and stronger than the mice. The rats will preempt the cheese from the mice, but only if the rat has higher priority. Otherwise, the rat will politely wait its turn. There is also a cat. It sleeps a lot, but when awake, it will select a random rodent and interrupt whatever it is doing. Since we do not plan to run any statistics here, we simplify the context struct to just the simulation struct. We can then write something like: .. code-block:: c /* The busy life of a mouse */ void *mousefunc(struct cmb_process *me, void *ctx) { cmb_assert_release(me != NULL); cmb_assert_release(ctx != NULL); const struct simulation *simp = ctx; struct cmb_resourcepool *sp = simp->cheese; uint64_t amount_held = 0u; while (true) { /* Decide on a random amount to get next time and set a random priority */ const uint64_t amount_req = cmb_random_dice(1, 5); const int64_t pri = cmb_random_dice(-10, 10); cmb_process_set_priority(me, pri); cmb_logger_user(stdout, USERFLAG1, "Acquiring %" PRIu64, amount_req); int64_t sig = cmb_resourcepool_acquire(sp, amount_req); if (sig == CMB_PROCESS_SUCCESS) { /* Acquire returned successfully */ amount_held += amount_req; cmb_logger_user(stdout, USERFLAG1, "Success, new amount held: %" PRIu64, amount_held); } else if (sig == CMB_PROCESS_PREEMPTED) { /* The acquire call did not end well */ cmb_logger_user(stdout, USERFLAG1, "Preempted during acquire, all my %s is gone", cmb_resourcepool_name(sp)); amount_held = 0u; } else { /* Interrupted, but we still have the same amount as before */ cmb_logger_user(stdout, USERFLAG1, "Interrupted by signal %" PRIi64, sig); } /* Hold on to it for a while */ sig = cmb_process_hold(cmb_random_exponential(1.0)); if (sig == CMB_PROCESS_SUCCESS) { /* We still have it */ cmb_logger_user(stdout, USERFLAG1, "Hold returned successfully"); } else if (sig == CMB_PROCESS_PREEMPTED) { /* Somebody snatched it all away from us */ cmb_logger_user(stdout, USERFLAG1, "Someone stole all my %s from me!", cmb_resourcepool_name(sp)); amount_held = 0u; } else { /* Interrupted while holding. Still have the cheese, though */ cmb_logger_user(stdout, USERFLAG1, "Interrupted by signal %" PRIi64, sig); } /* Drop some amount */ if (amount_held > 1u) { const uint64_t amount_rel = cmb_random_dice(1, amount_held); cmb_logger_user(stdout, USERFLAG1, "Holds %" PRIu64 ", releasing %" PRIu64, amount_held, amount_rel); cmb_resourcepool_release(sp, amount_rel); amount_held -= amount_rel; } /* Hang on a moment before trying again */ cmb_logger_user(stdout, USERFLAG1, "Holding, amount held: %" PRIu64, amount_held); sig = cmb_process_hold(cmb_random_exponential(1.0)); if (sig == CMB_PROCESS_PREEMPTED) { cmb_logger_user(stdout, USERFLAG1, "Someone stole the rest of my %s, signal %" PRIi64, cmb_resourcepool_name(sp), sig); amount_held = 0u; } } } The rats are pretty much the same as the mice, just a bit hungrier and stronger (i.e. assigning themselves somewhat higher priorities), and using :c:func:`cmb_resourcepool_preempt()` instead of ``_acquire()``: .. code-block:: c /* Decide on a random amount to get next time and set a random priority */ const uint64_t amount_req = cmb_random_dice(3, 10); const int64_t pri = cmb_random_dice(-5, 15); cmb_process_set_priority(me, pri); cmb_logger_user(stdout, USERFLAG1, "Preempting %" PRIu64, amount_req); int64_t sig = cmb_resourcepool_preempt(sp, amount_req); The cats, on the other hand, are never interrupted and just ignore return values: .. code-block:: c void *catfunc(struct cmb_process *me, void *ctx) { cmb_unused(me); cmb_assert_release(ctx != NULL); struct simulation *simp = ctx; struct cmb_process **cpp = (struct cmb_process **)simp; const long num = NUM_MICE + NUM_RATS; while (true) { /* Nobody interrupts a sleeping cat, disregard return value */ cmb_logger_user(stdout, USERFLAG1, "Zzzzz..."); (void)cmb_process_hold(cmb_random_exponential(5.0)); do { cmb_logger_user(stdout, USERFLAG1, "Awake, looking for rodents"); (void)cmb_process_hold(cmb_random_exponential(1.0)); struct cmb_process *tgt = cpp[cmb_random_dice(0, num - 1)]; cmb_logger_user(stdout, USERFLAG1, "Chasing %s", cmb_process_name(tgt)); /* Send it a random interrupt signal */ const int64_t sig = (cmb_random_flip()) ? CMB_PROCESS_INTERRUPTED : cmb_random_dice(10, 100); cmb_process_interrupt(tgt, sig, 0); /* Flip a coin to decide whether to go back to sleep */ } while (cmb_random_flip()); } } The complete code is in `tutorial/tut_2_1.c `__ We compile and run, and get output similar to this: .. code-block:: none [ambonvik@Threadripper cimba]$ build/tutorial/tut_3_1 | more Create a pile of 20 cheese cubes Create 5 mice to compete for the cheese Create 2 rats trying to preempt the cheese Create 1 cats chasing all the rodents Schedule end event Execute simulation... 0.0000 Cat_1 catfunc (218): Zzzzz... 0.0000 Rat_2 ratfunc (151): Preempting 4 0.0000 Rat_2 ratfunc (156): Success, new amount held: 4 0.0000 Mouse_4 mousefunc (77): Acquiring 1 0.0000 Mouse_4 mousefunc (82): Success, new amount held: 1 0.0000 Rat_1 ratfunc (151): Preempting 8 0.0000 Rat_1 ratfunc (156): Success, new amount held: 8 0.0000 Mouse_1 mousefunc (77): Acquiring 5 0.0000 Mouse_1 mousefunc (82): Success, new amount held: 5 0.0000 Mouse_3 mousefunc (77): Acquiring 2 0.0000 Mouse_3 mousefunc (82): Success, new amount held: 2 0.0000 Mouse_5 mousefunc (77): Acquiring 1 0.0000 Mouse_2 mousefunc (77): Acquiring 3 0.23852 Mouse_1 mousefunc (99): Hold returned normally 0.23852 Mouse_1 mousefunc (115): Holds 5, releasing 5 0.23852 Mouse_1 mousefunc (122): Holding, amount held: 0 0.23852 Mouse_5 mousefunc (82): Success, new amount held: 1 0.23852 Mouse_2 mousefunc (82): Success, new amount held: 3 0.30029 Cat_1 catfunc (221): Awake, looking for rodents 0.46399 Mouse_2 mousefunc (99): Hold returned normally 0.46399 Mouse_2 mousefunc (115): Holds 3, releasing 1 0.46399 Mouse_2 mousefunc (122): Holding, amount held: 2 0.56088 Mouse_1 mousefunc (77): Acquiring 1 0.56088 Mouse_1 mousefunc (82): Success, new amount held: 1 0.58910 Mouse_4 mousefunc (99): Hold returned normally 0.58910 Mouse_4 mousefunc (122): Holding, amount held: 1 0.73649 Mouse_5 mousefunc (99): Hold returned normally 0.73649 Mouse_5 mousefunc (122): Holding, amount held: 1 0.74171 Mouse_3 mousefunc (99): Hold returned normally 0.74171 Mouse_3 mousefunc (115): Holds 2, releasing 2 0.74171 Mouse_3 mousefunc (122): Holding, amount held: 0 0.83936 Mouse_3 mousefunc (77): Acquiring 4 0.89350 Mouse_5 mousefunc (77): Acquiring 5 1.3408 Rat_2 ratfunc (173): Hold returned normally 1.3408 Rat_2 ratfunc (189): Holds 4, releasing 1 1.3408 Rat_2 ratfunc (196): Holding, amount held: 3 1.3408 Mouse_3 mousefunc (82): Success, new amount held: 4 1.4394 Mouse_4 mousefunc (77): Acquiring 5 1.8889 Mouse_2 mousefunc (77): Acquiring 1 1.8992 Mouse_3 mousefunc (99): Hold returned normally 1.8992 Mouse_3 mousefunc (115): Holds 4, releasing 4 1.8992 Mouse_3 mousefunc (122): Holding, amount held: 0 1.9260 Mouse_1 mousefunc (99): Hold returned normally 1.9260 Mouse_1 mousefunc (122): Holding, amount held: 1 2.5697 Mouse_3 mousefunc (77): Acquiring 3 3.1025 Mouse_1 mousefunc (77): Acquiring 4 3.7215 Rat_2 ratfunc (151): Preempting 6 3.7215 Mouse_4 mousefunc (86): Preempted during acquire, all my Cheese is gone 3.7215 Mouse_1 mousefunc (86): Preempted during acquire, all my Cheese is gone 4.2186 Mouse_1 mousefunc (99): Hold returned normally 4.2186 Mouse_1 mousefunc (122): Holding, amount held: 0 4.7152 Mouse_1 mousefunc (77): Acquiring 5 4.8393 Cat_1 catfunc (224): Chasing Mouse_1 4.8393 Cat_1 catfunc (221): Awake, looking for rodents 4.8393 Mouse_1 mousefunc (92): Interrupted by signal -2 5.3060 Cat_1 catfunc (224): Chasing Mouse_4 5.3060 Cat_1 catfunc (221): Awake, looking for rodents 5.3060 Mouse_4 mousefunc (109): Interrupted by signal 20 5.3060 Mouse_4 mousefunc (122): Holding, amount held: 0 5.8149 Mouse_1 mousefunc (99): Hold returned normally 5.8149 Mouse_1 mousefunc (122): Holding, amount held: 0 6.0788 Mouse_1 mousefunc (77): Acquiring 4 6.1803 Rat_1 ratfunc (173): Hold returned normally 6.1803 Rat_1 ratfunc (189): Holds 8, releasing 3 6.1803 Rat_1 ratfunc (196): Holding, amount held: 5 ...and so on. The interactions can get rather intricate, but hopefully intuitive: A :c:func:`cmb_resourcepool_preempt()` call will start from the lowest priority victim process and take *all* of its resource, but only if the victim has strictly lower priority than the caller. If the requested amount is not satisfied from the first victim, it will continue to the next lowest priority victim. If some amount is left over after taking everything the victim held, the resource guard is signalled to evaluate what process gets the remainder. If no potential victims with strictly lower priority than the caller process exists, the caller will join the priority queue and wait in line for some amount to become available. Cimba does not try to be "fair" or "optimal" in its resource allocation, just efficient and predictable. If the application needs different allocation rules, it can either adjust process priorities dynamically or create a derived class with a custom demand function. It is possible to create deadlocks if multiple processes are waiting for each other. If you build a model with that property, such as Dijkstra's famous Dining Philosophers, the purpose of the model could be to compare the performance of different deadlock prevention algorithms. That logic belongs in the user model, not hidden inside Cimba. Real world uses ^^^^^^^^^^^^^^^ The example above was originally written as part of the Cimba unit test suite to ensure that the library tracking of how many units each process holds from the resource pool always matches the expected values calculated here. We wanted to make very sure that this is correct in all possible sequences of events, hence this frantic stress test with preemptions and interruptions galore. The preempt and interrupt mechanisms are important in a range of real-world modeling applications, ranging from hospital emergency room triage operations to manufacturing job shops and machine breakdown processes. Building, validating, and parallelizing the simulation will follow the same pattern as in :ref:`our first tutorial `, so we will not repeat that here. This completes our second tutorial, demonstrating how to "``_acquire()`` and ``_release()`` resources, and to use direct process interactions like :c:func:`cmb_process_interrupt()` and :c:func:`cmb_resourcepool_preempt`. We also have mentioned, but not demonstrated :c:func:`cmb_process_wait_process()` and :c:func:`cmb_process_wait_event()`. We encourage you to look up these in `the API reference documentation `_ next. .. _tut_3: Agents balking, reneging, and jockeying in queues ------------------------------------------------- In :ref:`our first tutorial `, we modeled a M/M/1 queue as a simple buffer with just a numeric value for the queue length. This was sufficient to calculate queue length statistics. We have extended this in our benchmarking case, `benchmark/MM1_simple.c `__ and `benchmark/MM1_multi.c `__, where the customers were modeled as discrete objects containing their own arrival times to compute statistics for the time each customer spent in the system. In both cases, our simulated results were well aligned with known queuing theory formulas. We now extend this to a more realistic but analytically intractable case where the customers are active, opinionated agents in their own right. The customers generated by the arrival process will make their own decisions on whether to join the queue or not (balking), leave midway (reneging), or switch to another queue that seems to be moving faster (jockeying). Or patiently wait until they get served and then leave for new adventures. Since the active entities in Cimba are the instances of :c:struct:`cmb_process`, this requires each customer to be represented as a process or a class derived from the process. We will make our customers a derived class from the :c:struct:`cmb_process`, inheriting its properties and methods, and adding some more specifics. The use case is to model an amusement park with guests wanting to use various attractions. Our customer, the park operator, wants us to analyze ways of influencing customer behavior. The overall metric is the time spent in the park per visitor and the breakdown of this time between riding, waiting, and walking. The time unit is minutes. Classes derived from cmb_process ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The object-oriented paradigm is very natural for simulation modeling, and was first developed for this purpose in Simula67. Obviously, C does not directly support object-orientation as a language feature, but is flexible enough to support an object-oriented programming style. We will discuss this in more detail in the :ref:`background section `. In this tutorial we will just focus on how to use it. Basically, in object-oriented C, the ``struct`` becomes the class. By placing one struct as the first member of another, the outer struct effectively becomes a derived class from the inner one. A pointer to the outer struct will point to the same address as the inner struct, and one can freely cast the same pointer value between the two types. They exist at the same memory address. For example, we want our server process to also contain a pointer to the queue it gets visitors from, the number of visitors that can board for each run of the ride, and the parameters of the service time distribution. We could pass this to a plain :c:struct:`cmb_process` as part of its ``context`` initialization argument, but we can also create a derived ``struct server`` as a derived class from the :c:struct:`cmb_process`: .. code-block:: c struct server { struct cmb_process core; /* <= Note: The real thing, not a pointer */ struct cmb_priorityqueue *pq; unsigned batch_size; double dur_min, dur_mode, dur_max; }; It is important to note that the first member of the struct is the parent class struct itself, not a pointer to an object of that class. We can now define the core methods of the server class: .. code-block:: c struct server *server_create(void) { struct server *sp = malloc(sizeof(struct server)); cmb_assert_release(sp != NULL); return sp; } void server_initialize(struct server *sp, const char *name, struct cmb_priorityqueue *qp, const unsigned bs, const double dmin, const double dmod, const double dmax) { sp->pq = qp; sp->batch_size = bs; sp->dur_min = dmin; sp->dur_mode = dmod; sp->dur_max = dmax; cmb_process_initialize(&sp->core, name, serverfunc, NULL, 0u); } void server_start(struct server *sp) { cmb_process_start(&sp->core); } void server_terminate(struct server *sp) { cmb_process_terminate(&sp->core); } void server_destroy(struct server *sp) { free(sp); } The most important member function is the server process function itself. It gets a group of suspended ``visitor`` processes from the priority queue, loads them into the attraction, holds them for the duration of the ride, stores some statistics in them, before resuming them as active processes. It can look like this: .. code-block:: c void *serverfunc(struct cmb_process *me, void *vctx) { cmb_assert_debug(me != NULL); cmb_unused(vctx); const struct server *sp = (struct server *)me; while (true) { /* Prepare for the next batch of visitors */ unsigned cnt = 0u; struct visitor *batch[sp->batch_size]; /* Wait for the first visitor, then fill the ride as best possible */ cmb_logger_user(stdout, LOGFLAG_SERVICE, "Open for next batch"); do { struct visitor *vip; (void)cmb_priorityqueue_get(sp->pq, (void**)&(vip)); struct cmb_process *pp = (struct cmb_process *)vip; cmb_process_timers_clear(pp); cmb_logger_user(stdout, LOGFLAG_SERVICE, "Got visitor %s", cmb_process_name(pp)); batch[cnt++] = vip; } while ((cmb_priorityqueue_length(sp->pq) > 0) && (cnt < sp->batch_size)) ; /* Log the waiting times for this batch of visitors */ cmb_logger_user(stdout, LOGFLAG_SERVICE, "Has %u for %u slots", cnt, sp->batch_size); for (unsigned ui = 0; ui < cnt; ui++) { struct visitor *vip = batch[ui]; const double qt = cmb_time() - vip->entry_time_queue; vip->waiting_time += qt; } /* Run the ride with these visitors on board */ cmb_logger_user(stdout, LOGFLAG_SERVICE, "Starting ride"); const double dur = cmb_random_PERT(sp->dur_min, sp->dur_mode, sp->dur_max); cmb_process_hold(dur); cmb_logger_user(stdout, LOGFLAG_SERVICE, "Ride finished"); /* Unload and send the visitors on their merry way */ for (unsigned ui = 0u; ui < cnt; ui++) { struct visitor *vip = batch[ui]; vip->riding_time += dur; struct cmb_process *pp = (struct cmb_process *)vip; cmb_logger_user(stdout, LOGFLAG_SERVICE, "Resuming visitor %s", cmb_process_name(pp)); cmb_process_resume(pp, CMB_PROCESS_SUCCESS); } } } Since our processes are *asymmetric* coroutines, the :c:func:`cmb_process_resume()` call does not directly resume the target process (coroutine), but schedules an event that will make the dispatcher resume the target process (coroutine). That way, all control passes through the dispatcher, and all processes are resumed by the dispatcher only. For this server function, that implies that :c:func:`cmb_process_resume()` is a non-blocking call. It returns immediately and the server can unload all visitors before yielding the CPU in the :c:func:`cmb_priorityqueue_get()` or :c:func:`cmb_process_hold()` calls. Setting and clearing timers ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Similarly, we can define the ``struct visitor`` as a derived class from the ``cmb_process``: .. code-block:: c struct visitor { struct cmb_process core; /* <= Note: The real thing, not a pointer */ double patience; bool goldcard; double entry_time_park; double entry_time_queue; unsigned current_attraction; double riding_time; double waiting_time; double walking_time; unsigned num_attractions_visited; }; It contains some parameters, such as its ``patience`` and whether is a priority gold card holder, holds some state, such as the ``current_attraction``, and collects some history, such as the count of attractions visited and the accumulated riding, waiting, and walking times. Its creators and destructors are similar to the ``server``'s, but the process function brings some new features: .. code-block:: c void *visitor_proc(struct cmb_process *me, void *vctx) { cmb_assert_debug(me != NULL); cmb_assert_debug(vctx != NULL); struct visitor *vip = (struct visitor *)me; const struct context *ctx = vctx; const struct simulation *sim = ctx->sim; cmb_logger_user(stdout, LOGFLAG_VISITOR, "Started"); vip->current_attraction = IDX_ENTRANCE; while (vip->current_attraction != IDX_EXIT) { /* Select where to go next */ const unsigned ua = vip->current_attraction; const struct attraction *ap_from = &sim->park_attractions[ua]; const struct cmb_random_alias *qv = ap_from->quo_vadis; const unsigned nxt = cmb_random_alias_sample(qv); cmb_logger_user(stdout, LOGFLAG_VISITOR, "At %u, next %u", ua, nxt); /* Walk there */ const double mwt = transition_times[ua][nxt]; const double wt = cmb_random_PERT(0.5 * mwt, mwt, 2.0 * mwt); cmb_process_hold(wt); vip->walking_time += wt; vip->current_attraction = nxt; /* Join the shortest queue if several */ if (nxt != IDX_EXIT) { const struct attraction *ap_to = &sim->park_attractions[nxt]; uint64_t shrtlen = UINT64_MAX; unsigned shrtqi = 0u; for (unsigned qi = 0u; qi < ap_to->num_queues; qi++) { struct cmb_priorityqueue *pq = ap_to->queues[qi]; const unsigned len = cmb_priorityqueue_length(pq); if (len < shrtlen) { shrtlen = len; shrtqi = qi; } } cmb_logger_user(stdout, LOGFLAG_VISITOR, "At attraction %u, queue %u looks shortest (%" PRIu64 ")", nxt, shrtqi, shrtlen); /* Balking? */ if (shrtlen > (uint64_t)(vip->patience * balking_threshold)) { /* Too long queue, go to next instead */ cmb_logger_user(stdout, LOGFLAG_VISITOR, "Balked at attraction %u, queue %u", nxt, shrtqi); continue; } /* Set two timeouts */ const double jt = vip->patience * jockeying_threshold; cmb_process_timer_set(me, jt, TIMER_JOCKEYING); const double rt = vip->patience * reneging_threshold; cmb_process_timer_add(me, rt, TIMER_RENEGING); struct cmb_priorityqueue *q = sim->park_attractions[nxt].queues[shrtqi]; uint64_t pq_hndl; cmb_logger_user(stdout, LOGFLAG_VISITOR, "Joining queue %s", cmb_priorityqueue_name(q)); vip->entry_time_queue = cmb_time(); /* Not blocking, since the queue has unlimited size */ cmb_priorityqueue_put(q, (void *)vip, cmb_process_priority(me), &pq_hndl); /* Suspend ourselves until we have finished both queue and ride, * trusting the server to update our waiting and riding times */ while (true) { const int64_t sig = cmb_process_yield(); if (sig == TIMER_JOCKEYING) { cmb_logger_user(stdout, LOGFLAG_VISITOR, "Jockeying at attraction %u", nxt); const unsigned in_q = shrtqi; const unsigned mypos = cmb_priorityqueue_position(q, pq_hndl); shrtlen = UINT64_MAX; shrtqi = 0u; for (unsigned qi = 0u; qi < ap_to->num_queues; qi++) { struct cmb_priorityqueue *pq = ap_to->queues[qi]; const unsigned len = cmb_priorityqueue_length(pq); if (len < shrtlen) { shrtlen = len; shrtqi = qi; } } if (shrtlen < mypos) { cmb_logger_user(stdout, LOGFLAG_VISITOR, "Moving from queue %u to queue %u (len %" PRIu64 ")", in_q, shrtqi, shrtlen); const bool found = cmb_priorityqueue_cancel(q, pq_hndl); cmb_assert_debug(found == true); q = ap_to->queues[shrtqi]; const int64_t pri = cmb_process_priority(me); cmb_priorityqueue_put(q, (void *)vip, pri + 1, &pq_hndl); continue; } } else if (sig == TIMER_RENEGING) { cmb_logger_user(stdout, LOGFLAG_VISITOR, "Reneging at attraction %u", ua); const bool found = cmb_priorityqueue_cancel(q, pq_hndl); cmb_assert_debug(found == true); cmb_process_timers_clear(me); break; } else { cmb_assert_release(sig == CMB_PROCESS_SUCCESS); vip->num_attractions_visited++; cmb_logger_user(stdout, LOGFLAG_VISITOR, "Yay! Leaving attraction %u", vip->current_attraction); break; } } /* Out of the attraction, slightly dizzy. Do it again? */ } } /* No, enough for today */ cmb_logger_user(stdout, LOGFLAG_VISITOR, "Leaving"); cmb_objectqueue_put(sim->departeds, (void *)vip); cmb_process_exit(NULL); /* Not reached */ return NULL; } To model the jockeying and reneging behavior, we use additional :c:struct:`cmb_process` methods: The process can schedule a future timeout event for itself by calling :c:func:`cmb_process_timer_set()`. When the timer event occurs, the process will be interrupted from whatever it is doing with whatever signal the timer was set to send, with :c:macro:`CMB_PROCESS_TIMEOUT` as a predefined possibility. By setting a timeout before some ``_acquire()``, ``_get()``, or ``_wait()`` call, the process can abort the wait when patience runs out. Here, we set *two* different timers, one with the signal ``TIMER_JOCKEYING`` and one with ``TIMER_RENEGING``. After these are set, the visitor process suspends itself unconditionally by calling :c:func:`cmb_process_yield()` after entering the :c:struct:`cmb_priorityqueue`. The visitor process is then a passive object managed by the server process until it is resumed by the server, or until one of the timers goes off and it awakes with a start and shows signs of impatience. Note that the timers remain active past the call to :c:func:`cmb_priorityqueue_put()`. The timers are part of the process state, not the function call. They remain active until either going off or getting cancelled / cleared. In this case, we cancel the jockeying timer if the reneging timer goes off, but not the other way around, since they model two different behaviors. Note also that one of the first things the server process does after getting a visitor from the priority queue is to call :c:func:`cmb_process_timers_clear()` to make sure it does not suddenly wake up and walk off in the middle of the ride. Conversely, one of the first things the visitor process does after waking up on a timer signal is to call :c:func:`cmb_priorityqueue_cancel()` to make sure that it is not admitted on a ride when it actually is walking away from that attraction. The entire dynamic of having the same process object act both as an active opinionated agent in its simulated world, and as a passive object being handled by other processes is enabled by our processes being stackful coroutines. When suspended, their entire state is safely stored on their own execution stack until it is resumed from the same point where it left off. Our processes are first-class objects that can be passed around as function arguments and return values, stored and retrieved, created and destroyed at runtime. Also note that the ``visitor`` process ends by placing itself in the :c:struct:`cmb_objectqueue` of departed visitors before exiting. That way, a departure process can wait for it there, collect its statistics, and reclaim its allocated memory: .. code-block:: c void *departure_proc(struct cmb_process *me, void *vctx) { cmb_unused(me); cmb_assert_debug(vctx != NULL); const struct context *ctx = vctx; struct simulation *sim = ctx->sim; while (true) { /* Wait for a departing visitor */ struct visitor *vip = NULL; (void)cmb_objectqueue_get(sim->departeds, (void **)(&vip)); cmb_assert_debug(vip != NULL); struct cmb_process *pp = (struct cmb_process *)vip; cmb_assert_debug(cmb_process_status(pp) == CMB_PROCESS_FINISHED); cmb_logger_user(stdout, LOGFLAG_DEPARTURE, "%s departed", ((struct cmb_process *)vip)->name); /* Collect its statistics */ const double tsys = cmb_time() - vip->entry_time_park; cmb_datasummary_add(&sim->time_in_park, tsys); cmb_datasummary_add(&sim->riding_times, vip->riding_time); cmb_datasummary_add(&sim->waiting_times, vip->waiting_time); cmb_datasummary_add(&sim->num_rides, vip->num_attractions_visited); cmb_datasummary_add(&sim->walking_times, vip->walking_time); /* Reclaim its memory allocation */ visitor_terminate(vip); visitor_destroy(vip); } } Here, we have not bothered to define this as a derived class, but just pass it the pointer to the ``departeds`` object queue as part of the context argument. Alias sampling probabilities ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We model the amusement park attractions as nodes in a fully-connected network. Node 0 will be the entrance, while nodes 1 through *n* are the attractions, and node *n+1* is the exit. Each node (except the entrance and the exit) has one or more priority queues for waiting customers. For each node *i*, there is a certain probability that the customer goes to attraction *j* next, including both the possibility of exiting the park and the possibility of taking another go on the same attraction. This can be represented as a matrix of transition probabilities *p(i, j)*. However, the obvious sampling algorithm is rather slow for large *n*: Generate a random number *x* in *[0, 1]*. Starting *j* from 0, add *p(i, j)* until the sum is greater than *x*. The current *j* is then the selected value. (This is implemented in Cimba as the function :c:func:`cmb_random_loaded_dice()`.) Instead, we will use a particularly clever O(1) algorithm known as Vose alias sampling. It requires us to pre-compute a lookup table that we will store with each node by calling :c:func:`cmb_random_alias_create()`. We can then sample it as often as needed by calling :c:func:`cmb_random_alias_sample()`, and clean it up when no longer needed by calling :c:func:`cmb_random_alias_destroy()`. The overhead in construction and destruction pays for itself in overall efficiency already for *n > 7* or so. .. note:: The alert reader will notice that this collapses our usual four-part harmony of ``_create()``, ``_initialize()``, ``_terminate()``; and ``_destroy()`` into just the two functions ``_create()`` and ``_destroy()``. Since we are not planning to create any derived classes from the alias table, we found this a reasonable exception to make. It looks like this in the ``struct attraction``: .. code-block:: c struct attraction { unsigned num_queues; struct cmb_priorityqueue **queues; unsigned num_servers; struct server **servers; struct cmb_random_alias *quo_vadis; }; It is initialized like this in ``attraction_initialize`` from one row of the transition probability matrix: .. code-block:: c ap->quo_vadis = cmb_random_alias_create(NUM_ATTRACTIONS + 2, transition_probs[ui]); and, as we saw above, sampled like this by the ``visitor_proc`` when deciding where to go next. .. code-block:: c const unsigned nxt = cmb_random_alias_sample(qv); It eventually gets destroyed from ``attraction_terminate()``: .. code-block:: c cmb_random_alias_destroy(ap->quo_vadis); A day in the park ^^^^^^^^^^^^^^^^^ You can find the rest of the code in `tutorials/tut_3_1.c `__ Running it, we get output like this: .. code-block:: none [ambonvik@Threadripper cimba]$ build/tutorial/tut_3_1 | more 0.0000 Server_01_00_00 serverfunc (186): Open for next batch 0.0000 Server_02_00_00 serverfunc (186): Open for next batch 0.0000 Server_02_00_01 serverfunc (186): Open for next batch 0.0000 Server_02_00_02 serverfunc (186): Open for next batch 0.0000 Server_03_00_00 serverfunc (186): Open for next batch 0.0000 Server_03_00_01 serverfunc (186): Open for next batch 0.0000 Server_04_00_00 serverfunc (186): Open for next batch 0.0000 Server_04_01_00 serverfunc (186): Open for next batch 0.0000 Server_04_02_00 serverfunc (186): Open for next batch 0.0000 Server_05_00_00 serverfunc (186): Open for next batch 0.0000 Server_06_00_00 serverfunc (186): Open for next batch 0.0000 Server_07_00_00 serverfunc (186): Open for next batch 0.0000 Server_08_00_00 serverfunc (186): Open for next batch 0.0000 Server_09_00_00 serverfunc (186): Open for next batch 0.038977 Arrivals arrival_proc (529): Visitor_000001 arriving 0.038977 Visitor_000001 visitor_proc (336): Started 0.038977 Visitor_000001 visitor_proc (344): At 0, next 2 0.068032 Arrivals arrival_proc (529): Visitor_000002 arriving 0.068032 Visitor_000002 visitor_proc (336): Started 0.068032 Visitor_000002 visitor_proc (344): At 0, next 1 1.0330 Arrivals arrival_proc (529): Visitor_000003 arriving 1.0330 Visitor_000003 visitor_proc (336): Started 1.0330 Visitor_000003 visitor_proc (344): At 0, next 1 1.1268 Arrivals arrival_proc (529): Visitor_000004 arriving 1.1268 Visitor_000004 visitor_proc (336): Started 1.1268 Visitor_000004 visitor_proc (344): At 0, next 2 2.1046 Visitor_000002 visitor_proc (367): At attraction 1, queue 0 looks shortest (0) 2.1046 Visitor_000002 visitor_proc (387): Joining queue Queue_01_00 2.1046 Server_01_00_00 serverfunc (192): Got visitor Visitor_000002 2.1046 Server_01_00_00 serverfunc (197): Has 1 for 1 slots 2.1046 Server_01_00_00 serverfunc (206): Starting ride 2.6024 Arrivals arrival_proc (529): Visitor_000005 arriving 2.6024 Visitor_000005 visitor_proc (336): Started 2.6024 Visitor_000005 visitor_proc (344): At 0, next 4 3.9335 Visitor_000003 visitor_proc (367): At attraction 1, queue 0 looks shortest (0) 3.9335 Visitor_000003 visitor_proc (387): Joining queue Queue_01_00 4.2126 Arrivals arrival_proc (529): Visitor_000006 arriving 4.2126 Visitor_000006 visitor_proc (336): Started 4.2126 Visitor_000006 visitor_proc (344): At 0, next 1 4.4175 Visitor_000001 visitor_proc (367): At attraction 2, queue 0 looks shortest (0) 4.4175 Visitor_000001 visitor_proc (387): Joining queue Queue_02_00 4.4175 Server_02_00_00 serverfunc (192): Got visitor Visitor_000001 4.4175 Server_02_00_00 serverfunc (197): Has 1 for 5 slots 4.4175 Server_02_00_00 serverfunc (206): Starting ride 6.2520 Server_01_00_00 serverfunc (209): Ride finished 6.2520 Server_01_00_00 serverfunc (218): Resuming visitor Visitor_000002 6.2520 Server_01_00_00 serverfunc (186): Open for next batch 6.2520 Server_01_00_00 serverfunc (192): Got visitor Visitor_000003 6.2520 Server_01_00_00 serverfunc (197): Has 1 for 1 slots 6.2520 Server_01_00_00 serverfunc (206): Starting ride 6.2520 Visitor_000002 visitor_proc (434): Yay! Leaving attraction 1 6.2520 Visitor_000002 visitor_proc (344): At 1, next 2 8.9219 Visitor_000006 visitor_proc (367): At attraction 1, queue 0 looks shortest (0) 8.9219 Visitor_000006 visitor_proc (387): Joining queue Queue_01_00 9.2341 Visitor_000002 visitor_proc (367): At attraction 2, queue 0 looks shortest (0) 9.2341 Visitor_000002 visitor_proc (387): Joining queue Queue_02_00 9.2341 Server_02_00_01 serverfunc (192): Got visitor Visitor_000002 9.2341 Server_02_00_01 serverfunc (197): Has 1 for 5 slots 9.2341 Server_02_00_01 serverfunc (206): Starting ride ... ...and so on, until it finishes and reports its statistics for the day: .. code-block:: none Number of rides taken: N 477 Mean 2.696 StdDev 2.171 Variance 4.712 Skewness 1.504 Kurtosis 4.213 Time spent in park: N 477 Mean 58.68 StdDev 31.71 Variance 1006. Skewness 2.786 Kurtosis 13.48 Riding times: N 477 Mean 18.75 StdDev 16.99 Variance 288.5 Skewness 1.971 Kurtosis 7.142 Waiting times: N 477 Mean 4.877 StdDev 5.794 Variance 33.57 Skewness 1.985 Kurtosis 7.021 Walking times: N 477 Mean 33.30 StdDev 12.12 Variance 146.9 Skewness 2.914 Kurtosis 15.28 Detailed queue reports: Queue lengths for Queue_01_00: N 286 Mean 0.4110 StdDev 0.7480 Variance 0.5594 Skewness 2.007 Kurtosis 4.068 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 1.000) |################################################## [ 1.000, 2.000) |#############- [ 2.000, 3.000) |#####- [ 3.000, 4.000) |#- [ 4.000, 5.000) |- [ 5.000, Infinity) | -------------------------------------------------------------------------------- Queue lengths for Queue_02_00: N 195 Mean 0.02044 StdDev 0.1670 Variance 0.02788 Skewness 9.533 Kurtosis 105.4 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 1.000) |################################################## [ 1.000, 2.000) |= [ 2.000, 3.000) |- [ 3.000, 4.000) |- [ 4.000, Infinity) | -------------------------------------------------------------------------------- Queue lengths for Queue_03_00: N 226 Mean 0.07242 StdDev 0.3426 Variance 0.1174 Skewness 6.695 Kurtosis 61.08 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 1.000) |##################################################- [ 1.000, 2.000) |##- [ 2.000, 3.000) |- [ 3.000, 4.000) |- [ 4.000, 5.000) |- [ 5.000, 6.000) |- [ 6.000, Infinity) | -------------------------------------------------------------------------------- Queue lengths for Queue_04_00: N 127 Mean 0.7141 StdDev 0.5106 Variance 0.2608 Skewness -0.2920 Kurtosis -0.5959 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 1.000) |#######################= [ 1.000, 2.000) |################################################## [ 2.000, 3.000) |##- [ 3.000, Infinity) | -------------------------------------------------------------------------------- Queue lengths for Queue_04_01: N 97 Mean 0.6702 StdDev 0.4936 Variance 0.2436 Skewness -0.4400 Kurtosis -1.116 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 1.000) |##########################- [ 1.000, 2.000) |################################################## [ 2.000, 3.000) |= [ 3.000, Infinity) | -------------------------------------------------------------------------------- Queue lengths for Queue_04_02: N 47 Mean 0.6594 StdDev 0.6268 Variance 0.3929 Skewness 0.4075 Kurtosis -0.6762 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 1.000) |###########################################- [ 1.000, 2.000) |################################################## [ 2.000, 3.000) |########= [ 3.000, Infinity) | -------------------------------------------------------------------------------- Queue lengths for Queue_05_00: N 230 Mean 0.5964 StdDev 0.9796 Variance 0.9596 Skewness 1.978 Kurtosis 4.189 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 1.000) |################################################## [ 1.000, 2.000) |################= [ 2.000, 3.000) |#######- [ 3.000, 4.000) |###- [ 4.000, 5.000) |= [ 5.000, 6.000) |= [ 6.000, Infinity) | -------------------------------------------------------------------------------- Queue lengths for Queue_06_00: N 202 Mean 0.2753 StdDev 0.5940 Variance 0.3529 Skewness 2.572 Kurtosis 7.949 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 1.000) |################################################## [ 1.000, 2.000) |##########= [ 2.000, 3.000) |##- [ 3.000, 4.000) |- [ 4.000, 5.000) |- [ 5.000, Infinity) | -------------------------------------------------------------------------------- Queue lengths for Queue_07_00: N 214 Mean 0.2654 StdDev 0.5742 Variance 0.3297 Skewness 2.359 Kurtosis 5.642 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 1.000) |################################################## [ 1.000, 2.000) |##########- [ 2.000, 3.000) |##- [ 3.000, 4.000) |= [ 4.000, Infinity) | -------------------------------------------------------------------------------- Queue lengths for Queue_08_00: N 145 Mean 0.3593 StdDev 0.6692 Variance 0.4478 Skewness 2.041 Kurtosis 4.269 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 1.000) |################################################## [ 1.000, 2.000) |#############= [ 2.000, 3.000) |###= [ 3.000, 4.000) |= [ 4.000, 5.000) |- [ 5.000, Infinity) | -------------------------------------------------------------------------------- Queue lengths for Queue_09_00: N 188 Mean 0.2708 StdDev 0.6848 Variance 0.4689 Skewness 3.649 Kurtosis 17.04 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 1.000) |##################################################- [ 1.000, 2.000) |########= [ 2.000, 3.000) |#= [ 3.000, 4.000) |- [ 4.000, 5.000) |- [ 5.000, 6.000) |- [ 6.000, 7.000) |- [ 7.000, Infinity) | -------------------------------------------------------------------------------- ...and finally, it reports the trial outcomes that would be passed on to the experiment array if we parallelized this simulation in the same way as the first (and next) example: .. code-block:: none Trial outcomes: --------------- Average number of rides: 2.70 Average time in park: 58.68 Average time in rides: 18.75 Average time in queues: 4.88 Average time walking: 33.30 We will skip parallelizing this example, since it would be the same as :ref:`parallelizing the first tutorial `, and will instead move to the next tutorial. .. _tut_4: A LNG tanker harbor with complex resources and conditions --------------------------------------------------------- Once upon a time, a harbor simulation with tugs puttering about was the author's first exposure to Simula67, coroutines, and object-oriented programming. The essential *rightness* made a lasting impression. Building a beefed-up 21st century version will be our next Cimba tutorial. We will use the occasion to introduce the extremely powerful :c:struct:`cmb_condition` that allows our processes to make arbitrarily complex ``wait`` calls. We will also show how to use some of the Cimba internal building blocks, like the ``cmi_slist`` singly linked list and the ``cmi_hashheap`` for various collections of simulation objects. Since a simulation model only should be built in order to answer some specific question or set of questions, we will assume that our customer, the Simulated Port Authority (SPA), needs to decide whether to spend next year's investment budget on buying more tugs, building another berth, or dredging a deeper harbor channel. The relevant performance metric is to minimize the average time spent in the harbor for the ships. The ships come in two sizes, large and small, with different requirements to wind, water depth, tugs, and berths. Our model will help the SPA prioritize. The time unit in our simulation will be hours. An empty simulation template ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Our starting point will be an empty shell from :ref:`the first tutorial `, giving the correct initial structure to the model. You will find it in `tutorial/tut_4_0.c `__. It does not do anything, so there is no point in compiling it as it is, but you can use it as a starting template for your own models as well. The first functional version is in `tutorial/tut_4_1.c `__. We will not repeat all the code here, just point out some of the highlights. Processes, resources, and conditions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The simulated world is described in ``struct simulation``. Again, there is an arrival and a departure process generating and removing ships, and the ships themselves are again active processes. We have two pools of resources, the tugs and the berths (of two different sizes), and one single resource, the communication channel used to announce that a ship is moving. There are also two condition variables. One, the harbormaster, ensures that all necessary conditions (including tides and wind) are met and resources are available before it permits a ship to start acquiring resources and proceed towards docking. The other one, Davy Jones, just watches for ships leaving harbor and triggers the departure process. There is no guarantee that ships will be leaving in first in first out order, so we use a ``cmi_hashheap`` as a fast data set for active ships. Departed ships can be handled by simpler means, so a simple linked list with LIFO (stack) ordering will be sufficient. These two classes are not considered part of the external Cimba API, but happen to be available tools that fit the task. Our environment is next, describing the current wind and tide state for use by the harbormaster condition. Obviously, water depth will be different at different locations in a harbor, but we assume that the local topography is known and that a single tide gauge is sufficient. The ships' requirements are expressed against this tide gauge. Building our ships ^^^^^^^^^^^^^^^^^^ Our ships will come in two sizes, small and large. We define an ``enum`` for this, explicitly starting from zero to use it directly as an array index later. We then define a ``struct ship`` to be a derived class from :c:struct:`cmb_process` by placing a :c:struct:`cmb_process` as the first member of ``struct ship``. (Not a pointer to a :c:struct:`cmb_process` - the ship *is a* process.) The rest of the ship struct contains the characteristics of a particular ship object to be instantiated in the simulation. .. code-block:: c enum ship_size { SMALL = 0, LARGE }; /* A ship is a derived class from cmb_process */ struct ship { struct cmb_process core; /* <= Note: The real thing, not a pointer */ enum ship_size size; unsigned tugs_needed; double max_wind; double min_depth; }; .. note:: We do not use ``typedef`` for our object classes. It would only confuse matters by hiding the nature of the object. We want that to be very clear from the code. The only exception is for certain convoluted function prototypes like ``cmb_process_func`` and ``cmb_event_func``. These are ``typedef`` under those names to avoid complex and error-prone declarations and argument lists. Weather and tides ^^^^^^^^^^^^^^^^^ Weather and tides are modelled as simple processes that update the environment state once per hour, using a suitable stochastic and/or periodic model of the physical world. The weather process can look like this, only concerned about wind magnitude and direction: .. code-block:: c /* A process that updates the weather once per hour */ void *weather_proc(struct cmb_process *me, void *vctx) { cmb_unused(me); cmb_assert_debug(vctx != NULL); const struct context *ctxp = vctx; struct environment *envp = ctxp->env; const struct simulation *simp = ctxp->sim; while (true) { /* Wind magnitude in meters per second */ const double wmag = cmb_random_rayleigh(5.0); const double wold = envp->wind_magnitude; envp->wind_magnitude = 0.5 * wmag + 0.5 * wold; /* Wind direction in compass degrees, dominant from the southwest */ envp->wind_direction = cmb_random_PERT(0.0, 225.0, 360.0); /* Requesting the harbormaster to read the new weather bulletin */ cmb_condition_signal(simp->harbormaster); /* ... and wait until the next hour */ cmb_process_hold(1.0); } } Notice that just before holding, we ``signal`` the harbormaster condition, informing it that some state has changed, requiring it to re-evaluate its list of waiting ships. The tide process is similar, but combines the periodicity of astronomical tides with the randomness of weather-driven tide calculated from the environmental state left by the weather process. It also signals the harbormaster at the end of each iteration. You find it in the source code, ``void *tide_proc(struct cmb_process *me, void *vctx)``. The details of the weather and tide models are not important for this tutorial, only that: 1. We can calculate arbitrary state variables, such as the wind and tide here, using relevant mathematical methods. We could embed an AI model or some custom CUDA programming here if we needed to, as long as it is thread safe for our concurrent execution. Our simulated world just stands still until the calculation is done, possibly leaving the CPU to some other trial thread in the meantime if this thread is waiting for a response from a GPU, I/O from disk, or another blocking system call. 2. We *signal* a :c:struct:`cmb_condition` that the state has changed and that it needs to re-evaluate the requirements of any processes waiting for it. The :c:struct:`cmb_condition` is not busy-polling the state, but depends on being signaled by whatever process changes the state. In a discrete event simulation, state only changes due to some event, and no polling is needed between events. Resources and condition variables ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To understand the general :c:struct:`cmb_condition` class, it may be helpful to start with the :c:struct:`cmb_resource` as a special case. The :c:struct:`cmb_resource` is essentially a binary semaphore. Only one process can hold it at a time. If some process tries to acquire the resource while it is already in use, that process will have to wait in a priority queue. There is a :c:struct:`cmb_resourceguard` managing the priority queue. When adding itself to the queue, a process files its *demand* function with the guard. The demand function is a predicate returning a ``bool``, either ``true`` or ``false``, essentially saying to the guard "wake me up when this becomes true". For a :c:struct:`cmb_resource`, the demand function is internal and pre-defined, evaluating to ``true`` if the resource is available. When a process releases the resource, the guard is signaled, the predicate evaluates to ``true``, and the highest priority waiting process gets the resource and returns successfully from its :c:func:`cmb_resource_acquire()` (or :c:func:`cmb_resource_preempt()`) call as the new holder of the resource. Similarly, the :c:struct:`cmb_resourcepool` is a counting semaphore, where there is a certain number of resource items and a process can acquire and release more than one unit at a time. Again, if not enough is available, the process files its demand with the guard and waits. This demand function is also internal and pre-defined. It evaluates to true if *some* amount of the resource is available, since a waiting process may collect its request piecewise. The :c:struct:`cmb_condition` exposes the resource guard and demand mechanism to the user application. It does not provide any particular resource object, but lets a process wait until an arbitrary condition is satisfied. The demand function may even be different for each waiting process. The condition will evaluate them in turn, and will schedule a wakeup event at the current time for *every* waiting process whose demand function evaluates to ``true``. What to do next is up to the user application. Making this mechanism even more powerful, the :c:struct:`cmb_resourceguard` class maintains a list of other resource guards observing this one. This is a signal forwarding mechanism. When signaled, a resource guard will evaluate its own priority queue, possibly schedule wakeup events for waiting processes, and then forward the signal to each observing resource guard for them to do the same. This makes it possible for the :c:struct:`cmb_condition` to provide complex combinations of "wait for all", "wait for any", and so forth by registering itself as an observer. If some observed resource guard gets signaled, the :c:struct:`cmb_condition` will also be signaled. The :c:struct:`cmb_condition` is still a passive object, not an active process. It only responds to calls like :c:struct:`cmb_condition_wait` and :c:func:`cmb_condition_signal` from some active :c:struct:`cmb_process`. If not signaled from a process, it does nothing. Armed with this knowledge, we can now define the demand predicate function for a ship requesting permission from the harbormaster to dock: .. code-block:: c /* The demand predicate function for a ship wanting to dock */ bool is_ready_to_dock(const struct cmb_condition *cvp, const struct cmb_process *pp, const void *vctx) { cmb_unused(cvp); cmb_assert_debug(pp != NULL); cmb_assert_debug(vctx != NULL); const struct ship *shpp = (struct ship *)pp; const struct context *ctxp = vctx; const struct environment *envp = ctxp->env; const struct simulation *simp = ctxp->sim; if (envp->water_depth < shpp->min_depth) { cmb_logger_user(stdout, USERFLAG1, "Water %f m too shallow for %s, needs %f", envp->water_depth, pp->name, shpp->min_depth); return false; } if (envp->wind_magnitude > shpp->max_wind){ cmb_logger_user(stdout, USERFLAG1, "Wind %f m/s too strong for %s, max %f", envp->wind_magnitude, pp->name, shpp->max_wind); return false; } if (cmb_resourcepool_available(simp->tugs) < shpp->tugs_needed) { cmb_logger_user(stdout, USERFLAG1, "Not enough available tugs for %s", pp->name); return false; } if (cmb_resourcepool_available(simp->berths[shpp->size]) < 1u) { cmb_logger_user(stdout, USERFLAG1, "No available berth for %s", pp->name); return false; } cmb_logger_user(stdout, USERFLAG1, "All good for %s", pp->name); return true; } The ship demands all this to hold true *before* it starts acquiring any tugs or other resources. This prevents ships from grabbing any tugs before their berth is available and the conditions permit docking. Cimba's observer pattern for signal forwarding ensures that this function is evaluated for each ship whenever the environmental state is updated or some resource has been released. The life of a ship ^^^^^^^^^^^^^^^^^^ Our ships are derived from the :c:struct:`cmb_process` class, inherit all properties from there, and adds some of its own. The function to execute in a ship process has the same signature as for the :c:struct:`cmb_process`. It can look like this: .. code-block:: c void *ship_proc(struct cmb_process *me, void *vctx) { cmb_assert_debug(me != NULL); cmb_assert_debug(vctx != NULL); struct ship *shpp = (struct ship *)me; const struct context *ctxp = vctx; /* Unpack some convenient shortcut names */ struct simulation *simp = ctxp->sim; struct cmb_condition *hbmp = simp->harbormaster; const struct trial *trlp = ctxp->trl; /* Note ourselves as active */ cmb_logger_user(stdout, USERFLAG1, "%s arrives", me->name); const double t_arr = cmb_time(); const uint64_t hndl = cmi_hashheap_enqueue(simp->active_ships, shpp, NULL, NULL, NULL, t_arr, 0u); /* Wait for suitable conditions to dock */ while (!is_ready_to_dock(NULL, me, ctxp)) { /* Loop to catch any spurious wakeups, such as several ships waiting for * the tide and one of them grabbing the tugs before we can react. */ cmb_condition_wait(hbmp, is_ready_to_dock, ctxp); } /* Resources are ready, grab them for ourselves */ cmb_logger_user(stdout, USERFLAG1, "%s cleared to dock", me->name); cmb_resourcepool_acquire(simp->berths[shpp->size], 1u); cmb_resourcepool_acquire(simp->tugs, shpp->tugs_needed); /* Announce our intention to move */ cmb_resource_acquire(simp->comms); cmb_process_hold(cmb_random_gamma(5.0, 0.01)); cmb_resource_release(simp->comms); /* It takes a while to move into position */ const double docking_time = cmb_random_PERT(0.4, 0.5, 0.8); cmb_process_hold(docking_time); /* Safely at the quay to unload cargo, dismiss the tugs for now */ cmb_logger_user(stdout, USERFLAG1, "%s docked, unloading", me->name); cmb_resourcepool_release(simp->tugs, shpp->tugs_needed); /* Unloading also takes a while */ const double tua = trlp->unloading_time_avg[shpp->size]; const double unloading_time = cmb_random_PERT(0.75 * tua, tua, 2 * tua); cmb_process_hold(unloading_time); /* Need the tugs again to get out of here */ cmb_logger_user(stdout, USERFLAG1, "%s ready to leave", me->name); cmb_resourcepool_acquire(simp->tugs, shpp->tugs_needed); /* Announce our intention to move */ cmb_resource_acquire(simp->comms); cmb_process_hold(cmb_random_gamma(5.0, 0.01)); cmb_resource_release(simp->comms); /* Gently move out again, assisted by tugs */ const double undocking_time = cmb_random_PERT(0.4, 0.5, 0.8); cmb_process_hold(undocking_time); /* Cleared berth, done with the tugs */ cmb_logger_user(stdout, USERFLAG1, "%s left harbor", me->name); cmb_resourcepool_release(simp->berths[shpp->size], 1u); cmb_resourcepool_release(simp->tugs, shpp->tugs_needed); /* This is a one-pass process, remove ourselves from the active set */ cmi_hashheap_remove(simp->active_ships, hndl); /* List ourselves as departed instead */ cmi_list_push(&(simp->departed_ships), shpp); /* Inform Davy Jones that we are coming his way */ cmb_condition_signal(simp->davyjones); /* Store the time we spent as an exit value in a separate heap object. * The exit value is a void*, so we could store anything there, but for this * tutorial, we keep it simple. */ const double t_dep = cmb_time(); double *t_sys_p = malloc(sizeof(double)); *t_sys_p = t_dep - t_arr; cmb_logger_user(stdout, USERFLAG1, "%s arr %g dep %f time in system %f", me->name, t_arr, t_dep, *t_sys_p); /* Note that returning from a process function has the same effect as calling * cmb_process_exit() with the return value as argument. */ return t_sys_p; } Note the loop on :c:func:`cmb_condition_wait()`. The condition will schedule a wakeup event for all waiting processes with a satisfied demand, but it is entirely possible that some other ship wakes first and grabs the resources before control passes here. Therefore, we test and wait again if it is no longer satisfied. For readers familiar with POSIX condition variables, there is a notable lack of a protecting mutex here. It is not needed in a coroutine-based concurrency model. Once control is back in this process, it will not be interrupted before we yield the execution through some call like :c:func:`cmb_process_hold()`. In particular, this is safe: .. code-block:: c /* Wait for suitable conditions to dock */ while (!is_ready_to_dock(NULL, me, ctxp)) { /* Loop to catch any spurious wakeups, such as several ships waiting for * the tide and one of them grabbing the tugs before we can react. */ cmb_condition_wait(hbmp, is_ready_to_dock, ctxp); } /* Resources are ready, grab them for ourselves */ cmb_logger_user(stdout, USERFLAG1, "%s cleared to dock", me->name); cmb_resourcepool_acquire(simp->berths[shpp->size], 1u); cmb_resourcepool_acquire(simp->tugs, shpp->tugs_needed); /* Announce our intention to move */ cmb_resource_acquire(simp->comms); cmb_process_hold(cmb_random_gamma(5.0, 0.01)); cmb_resource_release(simp->comms); We know that tugs and berths are available from the :c:func:`cmb_condition_wait()`, so the ``_acquire()`` calls will return immediately and successfully. On the other hand, this is not safe at all: .. code-block:: c /* Wait for suitable conditions to dock */ while (!is_ready_to_dock(NULL, me, ctx)) { /* Loop to catch any spurious wakeups, such as several ships waiting for * the tide and one of them grabbing the tugs before we can react. */ cmb_condition_wait(hbm, is_ready_to_dock, ctx); } /* Do NOT do this: Hold and/or request a resource not part of the condition * predicate, possibly yielding execution to other processes that may invalidate * our condition before we can act on it. */ cmb_resource_acquire(simp->comms); cmb_process_hold(cmb_random_gamma(5.0, 0.01)); cmb_resource_release(simp->comms); /* Who knows what happened to the resources in the meantime? */ cmb_logger_user(stdout, USERFLAG1, "%s cleared to dock", me->name); cmb_resourcepool_acquire(simp->berths[shpp->size], 1u); cmb_resourcepool_acquire(simp->tugs, shpp->tugs_needed); A mutex is not needed, but only because a coroutine has atomic execution between explicit yield points. It is the application program's own responsibility to avoid doing something that could invalidate the condition before acting on it. If your code needs a simulated mutex for some reason, a simple :c:struct:`cmb_resource` will do, since it is a binary semaphore that only can be released by the process that acquired it. We next write the arrival process generating ships: .. code-block:: c /* The arrival process generating new ships */ void *arrival_proc(struct cmb_process *me, void *vctx) { cmb_unused(me); cmb_assert_debug(vctx != NULL); const struct context *ctxp = vctx; const struct trial *trlp = ctxp->trl; const double mean = 1.0 / trlp->arrival_rate; const double p_large = trlp->percent_large; uint64_t cnt = 0u; while (true) { cmb_process_hold(cmb_random_exponential(mean)); /* The ship class is a derived sub-class of cmb_process, we malloc it * directly instead of calling cmb_process_create() */ struct ship *shpp = malloc(sizeof(struct ship)); /* We started the ship size enum from 0 to match array indexes. If we * had more size classes, we could use cmb_random_dice(0, n) instead. */ shpp->size = cmb_random_bernoulli(p_large); /* We would probably not hard-code parameters except in a demo like this */ shpp->max_wind = 10.0 + 2.0 * (double)(shpp->size); shpp->min_depth = 8.0 + 5.0 * (double)(shpp->size); shpp->tugs_needed = 1u + 2u * shpp->size; /* A ship needs a name */ char namebuf[20]; snprintf(namebuf, sizeof(namebuf), "Ship_%06" PRIu64 "%s", ++cnt, ((shpp->size == SMALL) ? "_small" : "_large")); cmb_process_initialize((struct cmb_process *)shpp, namebuf, ship_proc, vctx, 0); /* Start our brand new ship heading into the harbor */ cmb_process_start((struct cmb_process *)shpp); cmb_logger_user(stdout, USERFLAG1, "%s started", namebuf); } } As we see, Cimba processes can create other processes as needed. These become additional asymmetric coroutines executing on their own stacks with no special handling needed. There are no "function coloring" issues. The processes switch seamlessly between being active agents and passive objects as needed. For the programmer, mentally placing oneself in a process and just focusing on what that process does is a very powerful encapsulation of complexity. In this example, we just did the ship allocation and initialization inline. If we were to create and/or initialize ships from more than one place in the code, or just wanted to be tidy, we would wrap these in proper ``ship_create()`` and ``ship_initialize()`` functions to avoid repeating ourselves, but there is nothing that forces us to write pro forma constructor and destructor functions. For illustration and code style, we do this "properly" in the next iteration of the example, `tutorial/tut_4_2.c `__, where the ship class looks like this: .. code-block:: C * A ship is a derived class from cmb_process */ struct ship { struct cmb_process core; /* <= Note: The real thing, not a pointer */ enum ship_size size; unsigned tugs_needed; double max_wind; double min_depth; }; /* We'll do the object lifecycle properly with constructors and destructors. */ struct ship *ship_create(void) { struct ship *shpp = malloc(sizeof(struct ship)); return shpp; } /* Process function to be defined later, for now just declare that it exists */ void *ship_proc(struct cmb_process *me, void *vctx); void ship_initialize(struct ship *shpp, const enum ship_size sz, uint64_t cnt, void *vctx) { cmb_assert_release(shpp != NULL); shpp->size = sz; /* We would probably not hard-code parameters except in a demo like this */ shpp->max_wind = 10.0 + 2.0 * (double)(shpp->size); shpp->min_depth = 8.0 + 5.0 * (double)(shpp->size); shpp->tugs_needed = 1u + 2u * shpp->size; char namebuf[20]; snprintf(namebuf, sizeof(namebuf), "Ship_%06" PRIu64 "%s", ++cnt, ((shpp->size == SMALL) ? "_small" : "_large")); /* Done initializing the child class properties, pass it on to the parent class */ cmb_process_initialize((struct cmb_process *)shpp, namebuf, ship_proc, vctx, 0); } void ship_terminate(struct ship *shpp) { /* Nothing needed for the ship itself, pass it on to parent class */ cmb_process_terminate((struct cmb_process *)shpp); } void ship_destroy(struct ship *shpp) { free(shpp); } The departure process is reasonably straightforward, capturing the exit value from the ship process and then recycling the entire ship. A :c:struct:`cmb_condition` is used to know that one or more ships have departed, triggering the departure process to do something. This actually does exactly the same as using the :c:struct:`cmb_objectqueue` in the previous example, but demonstrates a different way of doing it (effectively exposing the internal workings of a :c:struct:`cmb_objectqueue`). .. code-block:: c void *departure_proc(struct cmb_process *me, void *vctx) { cmb_unused(me); cmb_assert_debug(vctx != NULL); const struct context *ctxp = vctx; struct simulation *simp = ctxp->sim; const struct trial *trlp = ctxp->trl; struct cmi_list_tag **dep_head = &(simp->departed_ships); while (true) { /* We do not need to loop here, this is the only process waiting */ cmb_condition_wait(simp->davyjones, is_departed, vctx); /* There is one, collect its exit value */ struct ship *shpp = cmi_list_pop(dep_head); double *t_sys_p = cmb_process_exit_value((struct cmb_process *)shpp); cmb_assert_debug(t_sys_p != NULL); cmb_logger_user(stdout, USERFLAG1, "Recycling %s, time in system %f", ((struct cmb_process *)shpp)->name, *t_sys_p); if (cmb_time() > trlp->warmup_time) { /* Add it to the statistics */ cmb_dataset_add(simp->time_in_system[shpp->size], *t_sys_p); } ship_terminate(shpp); ship_destroy(shpp); /* The exit value was malloc'ed in the ship process, free it as well */ free(t_sys_p); } } Running a trial ^^^^^^^^^^^^^^^ Our simulation driver function ``run_trial()`` does the same as in :ref:`our first tutorial `: Sets up the simulated world, runs the simulation, collects the results, and cleans up everything after itself. There are more objects involved this time, so we will not reproduce the entire function here, just call your attention to two sections. The first one is this: .. code-block:: c /* Create weather and tide processes, ensuring that weather goes first */ sim.weather = cmb_process_create(); cmb_process_initialize(sim.weather, "Wind", weather_proc, &ctx, 1); cmb_process_start(sim.weather); sim.tide = cmb_process_create(); cmb_process_initialize(sim.tide, "Depth", tide_proc, &ctx, 0); cmb_process_start(sim.tide); Since the calculations of tide level depends on the weather state, we give the weather process a higher priority than the tide process. It will then always execute first, giving the tide process guaranteed updated information rather than possibly acting on the previous hour's data. As an efficiency optimization, we can now also remove the :c:func:`cmb_condition_signal()` call from the weather process, since we know that the harbormaster will be signaled by the tide process immediately thereafter, saving one set of demand recalculations per simulated hour. The second section to note is where the resource guard observer/signal forwarding becomes useful: .. code-block:: c /* Create the harbormaster and Davy Jones himself */ sim.harbormaster = cmb_condition_create(); cmb_condition_initialize(sim.harbormaster, "Harbormaster"); cmb_resourceguard_register(&(sim.tugs->guard), &(sim.harbormaster->guard)); for (int i = 0; i < 2; i++) { cmb_resourceguard_register(&(sim.berths[i]->guard), &(sim.harbormaster->guard)); } sim.davyjones = cmb_condition_create(); cmb_condition_initialize(sim.davyjones, "Davy Jones"); The harbormaster registers itself as an observer at the tugs and berths to receive a signal whenever one is released by some other process. Otherwise, it would need to wait until the top of the next hour when it is signaled by the weather and tide processes before it noticed. Building and running our new harbor simulation, we get output similar to this: .. code-block:: none 1.5696 Arrivals arrival_proc (335): Ship_000001_large started 1.5696 Ship_000001_large ship_proc (227): Ship_000001_large arrives 1.5696 Ship_000001_large is_ready_to_dock (209): All good for Ship_000001_large 1.5696 Ship_000001_large ship_proc (240): Ship_000001_large cleared to dock, acquires berth and tugs 2.1582 Ship_000001_large ship_proc (253): Ship_000001_large docked, releases tugs, unloading 3.2860 Arrivals arrival_proc (335): Ship_000002_small started 3.2860 Ship_000002_small ship_proc (227): Ship_000002_small arrives 3.2860 Ship_000002_small is_ready_to_dock (209): All good for Ship_000002_small 3.2860 Ship_000002_small ship_proc (240): Ship_000002_small cleared to dock, acquires berth and tugs 3.9669 Ship_000002_small ship_proc (253): Ship_000002_small docked, releases tugs, unloading 4.7024 Arrivals arrival_proc (335): Ship_000003_small started 4.7024 Ship_000003_small ship_proc (227): Ship_000003_small arrives 4.7024 Ship_000003_small is_ready_to_dock (209): All good for Ship_000003_small 4.7024 Ship_000003_small ship_proc (240): Ship_000003_small cleared to dock, acquires berth and tugs 5.1600 Arrivals arrival_proc (335): Ship_000004_small started 5.1600 Ship_000004_small ship_proc (227): Ship_000004_small arrives 5.1600 Ship_000004_small is_ready_to_dock (209): All good for Ship_000004_small 5.1600 Ship_000004_small ship_proc (240): Ship_000004_small cleared to dock, acquires berth and tugs 5.2328 Ship_000003_small ship_proc (253): Ship_000003_small docked, releases tugs, unloading 5.7241 Arrivals arrival_proc (335): Ship_000005_small started 5.7241 Ship_000005_small ship_proc (227): Ship_000005_small arrives 5.7241 Ship_000005_small is_ready_to_dock (209): All good for Ship_000005_small 5.7241 Ship_000005_small ship_proc (240): Ship_000005_small cleared to dock, acquires berth and tugs 5.7273 Ship_000004_small ship_proc (253): Ship_000004_small docked, releases tugs, unloading 6.3406 Ship_000005_small ship_proc (253): Ship_000005_small docked, releases tugs, unloading 10.614 Ship_000002_small ship_proc (260): Ship_000002_small ready to leave, requests tugs [...] 330.08 Ship_000145_small ship_proc (227): Ship_000145_small arrives 330.08 Ship_000145_small is_ready_to_dock (189): Wind 10.491782 m/s too strong for Ship_000145_small, max 10.000000 330.26 Arrivals arrival_proc (335): Ship_000146_small started 330.26 Ship_000146_small ship_proc (227): Ship_000146_small arrives 330.26 Ship_000146_small is_ready_to_dock (189): Wind 10.491782 m/s too strong for Ship_000146_small, max 10.000000 330.92 Ship_000140_small ship_proc (260): Ship_000140_small ready to leave, requests tugs 330.92 Ship_000140_small is_ready_to_dock (189): Wind 10.491782 m/s too strong for Ship_000145_small, max 10.000000 331.00 Depth is_ready_to_dock (189): Wind 10.258885 m/s too strong for Ship_000145_small, max 10.000000 331.00 Depth is_ready_to_dock (189): Wind 10.258885 m/s too strong for Ship_000146_small, max 10.000000 331.39 Ship_000140_small ship_proc (272): Ship_000140_small left harbor, releases berth and tugs 331.39 Ship_000140_small is_ready_to_dock (189): Wind 10.258885 m/s too strong for Ship_000145_small, max 10.000000 331.39 Ship_000140_small is_ready_to_dock (189): Wind 10.258885 m/s too strong for Ship_000145_small, max 10.000000 331.39 Departures departure_proc (374): Recycling Ship_000140_small, time in system 9.899306 331.48 Ship_000143_small ship_proc (260): Ship_000143_small ready to leave, requests tugs 331.48 Ship_000143_small is_ready_to_dock (189): Wind 10.258885 m/s too strong for Ship_000145_small, max 10.000000 332.00 Depth is_ready_to_dock (209): All good for Ship_000145_small 332.00 Depth is_ready_to_dock (209): All good for Ship_000146_small 332.00 Ship_000145_small is_ready_to_dock (209): All good for Ship_000145_small 332.00 Ship_000145_small ship_proc (240): Ship_000145_small cleared to dock, acquires berth and tugs [...] 434.49 Ship_000189_large is_ready_to_dock (203): No available berth for Ship_000191_large 434.87 Ship_000198_small ship_proc (253): Ship_000198_small docked, releases tugs, unloading 434.87 Ship_000198_small is_ready_to_dock (203): No available berth for Ship_000191_large 435.00 Depth is_ready_to_dock (203): No available berth for Ship_000191_large 435.00 Depth is_ready_to_dock (203): No available berth for Ship_000193_large 435.07 Ship_000189_large ship_proc (272): Ship_000189_large left harbor, releases berth and tugs 435.07 Ship_000189_large is_ready_to_dock (209): All good for Ship_000191_large 435.07 Ship_000189_large is_ready_to_dock (209): All good for Ship_000193_large 435.07 Ship_000191_large is_ready_to_dock (209): All good for Ship_000191_large 435.07 Ship_000191_large ship_proc (240): Ship_000191_large cleared to dock, acquires berth and tugs 435.07 Ship_000193_large is_ready_to_dock (203): No available berth for Ship_000193_large 435.07 Departures departure_proc (374): Recycling Ship_000189_large, time in system 12.530678 435.16 Ship_000190_large ship_proc (260): Ship_000190_large ready to leave, requests tugs 435.16 Ship_000190_large is_ready_to_dock (203): No available berth for Ship_000193_large 435.59 Ship_000191_large ship_proc (253): Ship_000191_large docked, releases tugs, unloading 435.59 Ship_000191_large is_ready_to_dock (203): No available berth for Ship_000193_large 435.78 Ship_000190_large ship_proc (272): Ship_000190_large left harbor, releases berth and tugs 435.78 Ship_000190_large is_ready_to_dock (209): All good for Ship_000193_large 435.78 Ship_000193_large is_ready_to_dock (209): All good for Ship_000193_large 435.78 Ship_000193_large ship_proc (240): Ship_000193_large cleared to dock, acquires berth and tugs 435.78 Departures departure_proc (374): Recycling Ship_000190_large, time in system 12.268849 436.42 Ship_000193_large ship_proc (253): Ship_000193_large docked, releases tugs, unloading 436.68 Ship_000184_large ship_proc (260): Ship_000184_large ready to leave, requests tugs ...and so on. It looks rather promising, so we turn off the logging and rerun. Output: .. code-block:: none System times for small ships: Min 7.126 First_Q 9.097 Median 10.39 Third_Q 12.11 Max 32.92 N 3298 Mean 10.89 StdDev 2.552 Variance 6.512 Skewness 1.775 Kurtosis 6.340 -------------------------------------------------------------------------------- ( -Infinity, 7.126) | [ 7.126, 8.416) |##########################- [ 8.416, 9.706) |################################################## [ 9.706, 11.00) |#################################################= [ 11.00, 12.29) |##################################- [ 12.29, 13.58) |#####################= [ 13.58, 14.87) |############= [ 14.87, 16.16) |#####- [ 16.16, 17.44) |##= [ 17.44, 18.73) |#= [ 18.73, 20.02) |#- [ 20.02, 21.31) |= [ 21.31, 22.60) |- [ 22.60, 23.89) |- [ 23.89, 25.18) |- [ 25.18, 26.47) |- [ 26.47, 27.76) |- [ 27.76, 29.05) | [ 29.05, 30.34) |- [ 30.34, 31.63) | [ 31.63, 32.92) | [ 32.92, Infinity) |- -------------------------------------------------------------------------------- System times for large ships: Min 10.30 First_Q 13.21 Median 15.37 Third_Q 18.31 Max 48.60 N 1081 Mean 16.49 StdDev 4.832 Variance 23.35 Skewness 2.000 Kurtosis 6.453 -------------------------------------------------------------------------------- ( -Infinity, 10.30) | [ 10.30, 12.22) |###################################- [ 12.22, 14.13) |#################################################= [ 14.13, 16.05) |################################################## [ 16.05, 17.96) |######################################= [ 17.96, 19.88) |########################- [ 19.88, 21.79) |############- [ 21.79, 23.71) |##########- [ 23.71, 25.62) |####= [ 25.62, 27.54) |#####- [ 27.54, 29.45) |##- [ 29.45, 31.36) |= [ 31.36, 33.28) |= [ 33.28, 35.19) |- [ 35.19, 37.11) |#- [ 37.11, 39.02) |= [ 39.02, 40.94) |- [ 40.94, 42.85) | [ 42.85, 44.77) |- [ 44.77, 46.68) | [ 46.68, 48.60) | [ 48.60, Infinity) |- -------------------------------------------------------------------------------- Utilization of small berths: Min 0.000 First_Q 3.000 Median 4.000 Third_Q 5.000 Max 6.000 N 5859 Mean 3.839 StdDev 1.671 Variance 2.792 Skewness -0.2813 Kurtosis -0.9255 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 1.000) |####- [ 1.000, 2.000) |################- [ 2.000, 3.000) |################################= [ 3.000, 4.000) |#######################################- [ 4.000, 5.000) |##########################################= [ 5.000, 6.000) |###################################- [ 6.000, 7.000) |################################################## [ 7.000, Infinity) | -------------------------------------------------------------------------------- Utilization of large berths: Min 0.000 First_Q 1.000 Median 2.000 Third_Q 3.000 Max 3.000 N 1853 Mean 1.816 StdDev 1.031 Variance 1.063 Skewness -0.3223 Kurtosis -1.102 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 1.000) |###################- [ 1.000, 2.000) |#######################################- [ 2.000, 3.000) |###########################################- [ 3.000, 4.000) |################################################## [ 4.000, Infinity) | -------------------------------------------------------------------------------- Utilization of tugs: Min 0.000 First_Q 0.000 Median 0.000 Third_Q 1.000 Max 10.00 N 16456 Mean 0.8756 StdDev 1.305 Variance 1.703 Skewness 1.815 Kurtosis 3.444 -------------------------------------------------------------------------------- ( -Infinity, 0.000) | [ 0.000, 1.000) |################################################## [ 1.000, 2.000) |######################= [ 2.000, 3.000) |####= [ 3.000, 4.000) |########- [ 4.000, 5.000) |###- [ 5.000, 6.000) |= [ 6.000, 7.000) |= [ 7.000, 8.000) |- [ 8.000, 9.000) |- [ 9.000, 10.00) |- [ 10.00, 11.00) |- [ 11.00, Infinity) | -------------------------------------------------------------------------------- Avg time in system, small ships: 10.890823 Avg time in system, large ships: 16.488005 You can find the code for this single-threaded stage in `tutorial/tut_4_1.c `__. Turning up the power ^^^^^^^^^^^^^^^^^^^^ We still find it fascinating to see our simulated ships and tugs scurrying about, but our paying customer, the Simulated Port Authority, reminds us that next year's budget is due and that they would prefer getting answers to their questions soon. And, by the way, could we add scenarios where traffic increases by 10 % and 25 % above today's baseline levels? Time to fire up more computing power. Setting up our experiment, we have the factors dredging depth, number of tugs, and number of small and large berths. To ensure that the SPA also has numbers it can use beyond next year's budget, we try five levels of each parameter, dredging in steps of 0.5 meters and adding tugs and berths in steps of one. We again run ten replications of each parameter set. This gives us 4 * 5 * 3 = 60 parameter combinations and 60 * 10 = 600 trials. We will run each trial for one year of simulated time, i.e. 365 * 24 = 8760 time units, allowing 30 days' warmup time before we start collecting data. Writing the ``main()`` function is straightforward, albeit somewhat tedious. It does the same as in :ref:`the previous tutorial `: Sets up the experiment as an array of trials, executes it in parallel on the available CPU cores, assembles the output as a data file, and plots it in a separate gnuplot window. We compile and run, and this chart appears, showing our 60 parameter combinations, the average time in the system for small (blue) and large ships (red) under each set of parameters, and tight 95 % confidence intervals based on our 10 replications of each parameter combination: .. image:: ../images/tut_4_2.png We see that we can tell our client, the SPA, that they have enough tugs and do not need to dredge, but that they really should consider building one more large berth, especially if traffic is expected to increase. However, building more than one does not make much sense even at the highest traffic scenario. The SPA should rather consider building another one or two small berths next. This concludes our fourth tutorial. The code is in `tutorial/tut_4_2.c `_. We have demonstrated the very powerful :c:struct:`cmb_condition` allowing processes to wait for arbitrary combinations of conditions in the simulated world. We also used the internal :c:struct:`cmi_slist` and :c:struct:`cmi_hashheap` as handy building blocks for other purposes, and leave it to you to discover additional possibilities. Adding CUDA GPU power for simulation physics -------------------------------------------- So far, our models have mostly used simple math. Important simulation tasks may require even more computing power than what we have used so far. For example, there might be complex physical and/or geometrical calculations, optimization algorithms to guide the actions of active agents in the simulated world, or even artificial intelligence and machine learning capabilities. Or we might want to have a hybrid simulation with hardware-in-the-loop for special cases. Cimba does not include direct support for all possible combinations of GPUs and other hardware, but it does provide the necessary hooks to do this. In this tutorial, we will illustrate this by adding GPU processing power from Nvidia CUDA devices to handle model physics. Moreover, we will distribute the workload across all available CUDA devices in the system by assigning each POSIX pthread to a specific CUDA stream. This creates three distinct levels of concurrency within our simulation: The POSIX pthreads executing trials, the Cimba processes (asymmetric coroutines) that are the active entities inside each trial, and the CUDA cores providing massively parallel physics numbercrunching within our simulated processes. The AWACS scenario on a single CPU ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Our simulation scenario is an Airborne Early Warning and Control System flying over some landscape. There are many ground targets to be detected and tracked. For each target and each timestep, we need to figure out if the target is in the beam of the radar, if it is hidden by terrain or earth curvature, and if the reflected signal is strong enough to be detected among atmospheric attenuation and ground clutter. We want to analyze some aspect of the AWACS operation. For the purposes of this tutorial, we assume that the relevant measure of outcome is target detection as a function of AWACS altitude. We use `ParaView `_ for visualization: .. image:: ../images/tut_5_1c.png In this snapshot, we see our simulated AWACS (red) flying a racetrack pattern over a simulated terrain. The terrain model is a synthetic 1000 x 1000 nautical mile patch generated by Perlin noise with 1 arcsecond (~30 m) resolution. It is divided in three altitude-dependent biomes with different clutter and reflectivity parameters, representing farmland and urban areas in the valleys, forested hill slopes in middle altitudes, and bare mountain at high altitude. The altitude for any particular coordinates in the model is calculated by bi-directional linear interpolation between the terrain grid points. The terrain map is very large, nearly 3,6 billion points and 13,4 GB. Our visualization program `ParaView `_ is not able to handle this large data sets without crashing. For that reason, we decimate the terrain by a factor of 32 before writing it to a `HDF5 file `_. There are 1000 targets distributed randomly over the landscape. Each target is an active ``cmb_process`` (coroutine) looping through four distinct states: Hiding (low detectability), staging (medium detectability), firing (high detectability), and driving (medium detectability). This represents some kind of mobile missile launcher operating in a shoot-and-scoot pattern. After firing, it selects a random direction, speed, and distance; and drives there before hiding stationary at the new location. Target coordinates are recalculated on each sensor update. The target sizes in the visualization represent the current radar cross section (on a logarithmic scale). The target color in the visualization represents the detection state, where the blue targets are beyond the radar horizon due to Earth's curvature, the orange ones are shielded by terrain, the yellow ones are too weak to be detected in elevation-dependent ground clutter, and the white ones are successful detections. Just underneath the red AWACS indicator with the sensor lobe pointing towards the right, there are a handful of purple targets. These are in the "nadir hole" under the sensor platform, demonstrating the three-dimensional nature of this simulation example. The airborne sensor, indicated by the red sphere with an arrow indicating the current beam direction, is a scanning, horizon-limited surveillance radar evaluated per dwell. The dwell interval is 0.04 seconds, aggregated to one-second updates for the animation program `ParaView `_ via `HDF5 files `_. Features included in this sensor model: * 3D sensor-target geometry on a local tangent plane (WGS84 radii of curvature at the reference point) * Platform on a racetrack orbit with coordinated-turn bank angle. * Tropospheric refraction via an effective-Earth-radius k(h) derived from an exponential refractivity profile * Line-of-sight terrain masking by ray-marching the refracted ray against the terrain model, plus radar-horizon, elevation-limit, and nadir-cone gating. * Detection scaling from the radar equation (range^4, RCS), referenced to a calibrated range and RCS, with per-target-state RCS. * Surface clutter from a constant-gamma model, incidence angle dependent, gamma chosen per terrain biome and integrated over the resolution cell. * Cell-averaging CA-CFAR detection over non-coherently integrated pulses, above a thermal noise floor. * Doppler processing with a target state dependent ability to distinguish a target from the surrounding clutter. Hiding targets with velocity zero are difficult to distinguish from clutter, moving targets less so. * Specular multipath with a per-biome reflection coefficient attenuated by Rayleigh surface roughness. * Probabilistic detection drawn independently per dwell. Deliberately omitted from the model: * Antenna detail: a hard azimuth beam-gate replaces the main-beam pattern; no sidelobes, sidelobe clutter, elevation pattern, or monopulse. * Waveform detail: no pulse-compression range sidelobes, range/Doppler ambiguities, or eclipsing; range resolution is a parameter. * Target fluctuation: fixed per-state RCS with a detection draw, not a Swerling case or a detailed geometric model per target. * Knife-edge diffraction (masking is hard geometric LOS), diffuse multipath, polarization, gaseous/rain attenuation, ducting, and ECM/ECCM. * Tracking: detections are per-dwell only. There is no association, M-of-N, or track formation. The omitted features could of course also be added, but the chosen level of detail leaves a sufficiently realistic radar model to give reasonable and compute-intensive physics, while limiting the amount of technical detail not relevant to our current purpose. Our aim here is to provide a somewhat realistic geometry and and detection model to demonstrate physics modelling with Cimba and CUDA, not to build a complete radar model for actual military operations research. The code can be found in `tutorial/tut_5_1.c `_. Note that the model output is written by a separate ``cmb_process`` that takes a snapshot every second and appends it to the output file for off-line animation. This decouples the output intervals (here 1 second) from the simulation events (e.g., sensor dwells) and avoids cluttering simulation entities (like the sensor or targets) with unrelated code. Similarily, there is a separate ``cmb_process`` that only updates a progress bar and an estimated completion time to keep the user informed during a long-running program. The Cimba coroutines are computationally very efficient, so this is a logical way to do any model housekeeping tasks that need to happen in simulation time. The important characteristic of this version of the model is that it is written single-threaded, for sequential execution on a single processor. In this case, the processor happens to be an AMD Threadripper 3970x with 128 GB RAM, but the program only uses one of the 64 CPU cores. Still, the model is relatively fast, and finishes each run of 6 hours simulated time in about 10 min 40 sec. Of this, the terrain generation takes about 530 seconds, while the simulation itself takes about 100 seconds. CUDA to the rescue ^^^^^^^^^^^^^^^^^^ CUDA can be described as a C++ based programming interface to the massively parallel computing capabilities of modern graphics processing units (GPUs). This is not intended to be a tutorial in CUDA programming, only an illustration of how to harness this computing power in a Cimba discrete event simulation. For more detail about CUDA programming, see, e.g., `the Nvidia CUDA Toolkit `_. For our purposes, the most important CUDA concept is the *kernel*. This is function that can be launched on a large number of GPU cores in parallel, all running the same code but acting on different data. A modern GPU can have 10000 or more cores. Structuring the program to use this efficiently without too much branching and serializing interdepency is one important concern in writing an efficient CUDA program. Another concern is memory usage. The GPU has onboard VRAM with caches at several levels. The proessing cores may be faster than the memory, and certainly much faster than the data bus connecting CPU and GPU. Minimizing the need to copy large amounts of memory back and forth between CPU and GPU will be an inmportant determinant for the resulting performance. Looking at our simulation in `tut_5_1.c`, there are two clear opportunities for CUDA use: 1. The synthetic terrain generation. This calculates the terrain altitudes for each of nearly 3,6 billion points, about 13.4 GB. (If your GPU has less memory, you may want to reduce the terrain size before building this tutorial.) Importantly, the altitude of each point is calculated independently and does not depend on the values of neighboring points. This is what is called an *embarassingly parallel* problem. In principle, we can just throw all the points to the GPU, ask it to calculate the map, and let the GPU scheduler do the load balancing. 2. The sensor detection logic. For each time step (radar dwell, here every 0.04 s), the sensor needs to check each of the 1000 targets for beam coverage, radar horizon, terrain shielding, clutter masking, and so forth. This is not quite as obviously parallel as the terrain generation, since the logic also contains a fair amount of branching, but still a clear candidate for parallelization. As one might expect, there is a bit more to it. A naïve implementation - take the C code and rebrand it as CUDA - will most likely run slower than the CPU-only program. We will address the terrain first. We do not want to shuffle a 13.4 GB data object back and forth between the GPU and CPU. Once generated on the GPU, the terrain map stays there, in the VRAM of the GPU that generated it. Code that needs to access the terrain map must run on the GPU. For visualization output, we also decimate the map on the GPU and only pass back the exact data that need to be written to file, nothing more. Moreover, it turns out that CUDA has a specialized type of memory object, a *texture*, that happens to be optimized for the kind of ray tracing we will be doing to check for intersections between the radar beam and terrain. So, the terrain map is implemented as a CUDA texture in VRAM. Getting the best out of the detection logic also requires some thought. At each sensor update, we can run a first triage kernel to check each target against the easy conditions, such as in beam, in horizon, etc. The surviving candidates can then be fed to a second kernel to do the actual ray-marching from target to sensor to check for line-of-sight (suitably accounting for atmospheric diffraction), and finally the probabilistic detection attempt for the potentially visible candidates. Again, we want to reduce the amount of data that goes to the visualization, so we let the output process take snapshots of the targets array once per second and append those to the events file. This version of the program consists of the three files `tut_5_2.c `_, `tut_5_2.h `_, and `tut_5_2.cu `_. It runs in about 12 seconds, 50 times faster than the CPU-only version. The most time-consuming part of the CPU-only version, terrain generation, now only takes 0.7 seconds (about 750 times faster), while the simulation itself takes 11 seconds (9.4 times faster). The output looks the same as before. But we are still running on just one CPU core, and there is actually another GPU in the rig, so we are far from firing on all cylinders here. Let's put them all to work. Pthreads, coroutines, and CUDA kernels ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We will in principle do the same as in our first tutorial: Set up an experiment array of trial structs, each one initialized with the relevant simulation parameters, where the Cimba worker threads can help themselves to the next trial and store the results. The only complication is handling the terrain map between trials. Since generating it is so fast on CUDA, we could just generate a new one for each trial at very little cost. However, for the purposes of this simulation, we will not do that. Instead, we will count up how many GPUs we have available, generate one terrain map per GPU, and let the terrain map become a parameter for the simulation. We can do this from ``main()`` like this: .. code-block:: C /* What compute horsepower do we have available? */ int n_gpus = 0; CUDA_CHECK(cudaGetDeviceCount(&n_gpus)); if (n_gpus < 1) { fprintf(stderr, "No CUDA devices found\n"); return EXIT_FAILURE; } const uint32_t n_terrains = (uint32_t)n_gpus; /* one terrain per GPU */ struct terrain **terrains = cmi_calloc(n_terrains, sizeof(*terrains)); for (uint32_t t = 0u; t < n_terrains; t++) { CUDA_CHECK(cudaSetDevice((int)t)); terrains[t] = terrain_create(); /* Deterministic, distinct seed per terrain */ const uint64_t terr_seed = cmb_random_fmix64(master_seed, SEED_TERRAIN + t); terrain_initialize(terrains[t], fwidth_nm, fheight_nm, ref_lat, ref_lon, terr_seed); /* Set biome tables per GPU stream */ sensor_gpu_load(); } ``CUDA_CHECK`` is a macro to check the outcome of the CUDA calls, see ``tut_5_3.h``. The last function call ``sensor_gpu_load()`` creates a lookup-table for the GPUs to use, defined as C-callable CUDA code in ``tut_5_3.cu``. It saves a few compute cycles on each trial since these lookup values are unchanged across trials. We want to find out the effect of AWACS altitude on the detection ability. We will set up our experiment as a series of trials for flight levels from FL100 (feet x 100, about 3000 m) up to FL400 (about 12100 m) in 15 steps of 25 each. We have two GPU's available, so we will try each flight level on two different terrain maps. And we want 10 replications for each combination of flight level and map, in total 300 trials. We can then create the experiment in the usual way: .. code-block:: C printf("Setting up experiment\n"); const double fl_start = 100.0; const double fl_step = 25.0; const uint32_t n_levels = 15u; const uint32_t n_trials = n_levels * n_terrains * n_replications; struct trial *experiment = cmi_calloc(n_trials, sizeof(*experiment)); uint32_t ui_trl = 0u; double flt_lvl = fl_start; for (uint32_t ul = 0u; ul < n_levels; ul++) { for (uint32_t ut = 0u; ut < n_terrains; ut++) { for (uint32_t ur = 0u; ur < n_replications; ur++) { experiment[ui_trl].flight_level = flt_lvl; experiment[ui_trl].terrain = terrains[ut]; experiment[ui_trl].seed = cmb_random_fmix64(master_seed, SEED_TRIAL + ui_trl); experiment[ui_trl].dur_s = dur_h * 3600.0; /* Sentinel value to detect aborted trials */ experiment[ui_trl].num_found = -1; ui_trl++; } } flt_lvl += fl_step; } Note how we use ``cmb_random_fmix64()`` to generate unique-but-deterministic seeds for each trial. We could simply call ``cmb_random_hwseed()`` to obtain unique seeds from CPU entropy as in earlier tutorial examples, but it would not be reproducible. Instead, ``cmb_random_fmix64()`` hashes the master seed (from CPU entropy or command line argument) with a per-trial value to set the seed to use per trial. Only one thing remains: The worker threads need to know what CUDA streams to use. Ciba provides this critical connection by the function ``cimba_thread_hooks_set()``. It takes three arguments. First a pointer to an initializing function, second an argument to be passed to that function when called, and third a pointer to an exit function. For our AWACS model, the init function could look like this: .. code-block:: C static void *gpu_thread_init(uint64_t tid, void *usrarg) { cmb_unused(tid); const int n_gpus = (int)(intptr_t)usrarg; struct gpu_thread_ctx *ctx = cmi_malloc(sizeof(*ctx)); ctx->n_gpus = n_gpus; ctx->streams = cmi_malloc((size_t)n_gpus * sizeof(cudaStream_t)); for (int d = 0; d < n_gpus; d++) { CUDA_CHECK(cudaSetDevice(d)); CUDA_CHECK(cudaStreamCreate(&ctx->streams[d])); } return ctx; } The first argument, ``tid``, is the thread identifier, not used here. The second argument is a ``void *`` that can point to anything the application needs to pass to a worker thread. In this case, we simply pass the number of GPUs in the system. We could also obtain that number directly by calling ``CUDA_CHECK(cudaGetDeviceCount(&n_gpus))`` from here, but we already did it, and we wanted to show how to pass an argument to this thread initializing function. The function creates a thread context for the worker thread's use. Here, it is a lookup table of CUDA streams, one per GPU. Each worker thread running this will set up a CUDA stream per GPU in the system. It then returns a ``void *``, again basically anything the application wants to store as a thread context. The return value will be stored in thread local memory once the thread starts running, and can be obtained from user code by calling ``cimba_thread_context()``. Internally in ``cimba.c`` the following happens: .. code-block:: C /* User-defined context per thread */ CMB_THREAD_LOCAL void *cmi_thread_context = NULL; .. code-block:: C /* Any user-defined initialization needed? */ if (cmg_thread_init_func != NULL) { cmi_thread_context = cmg_thread_init_func(tid, cmg_thread_init_usrarg); } One example of using this is in ``tut_5_3.c``'s ``run_trial()`` function. The first thing it does is to look up what device the terrain map to use is on, and then to look up its CUDA stream for that device: .. code-block:: C void run_trial(void *vtrl) { cmb_assert_release(vtrl != NULL); struct trial *trl = vtrl; struct terrain *terp = trl->terrain; const int dev = terp->device; CUDA_CHECK(cudaSetDevice(dev)); const struct gpu_thread_ctx *ctx = cimba_thread_context(); cudaStream_t stream = ctx->streams[dev]; /* ... */ The corresponding exit function cleans up the memory allocations before exiting the worker thread, receiving the stored ``cmi_thread_context`` as its argument. Whatever the return value from the init function is, the exit function will receive it as its argument. .. code-block:: C static void gpu_thread_exit(void *vctx) { struct gpu_thread_ctx *ctx = vctx; if (ctx == NULL) return; for (int d = 0; d < ctx->n_gpus; d++) { cudaSetDevice(d); cudaStreamDestroy(ctx->streams[d]); } cmi_free(ctx->streams); cmi_free(ctx); } Our ``main()`` can then run the experiment, ensuring that the Cimba worker thread associate each trial with its correct terrain map: .. code-block:: C printf("Baiting the hooks\n"); cimba_thread_hooks_set(gpu_thread_init, (void *)(intptr_t)n_gpus, gpu_thread_exit); printf("Running the simulation ...\n"); cimba_run(experiment, n_trials, sizeof(*experiment), run_trial); This mechanism to set callback functions that execute at the beginning and end of each worker thread and store an application-defined thread local state is a very flexible (and somewhat abstract) tool. Here, we use it as the only real integration point between the Cimba worker threads and the CUDA streams. That is why it was created. However, it is not CUDA specific, and can also be used for other purposes such as integration to other GPGPU APIs or even hardware-in-the-loop purposes. We also remove the per-trial visualization output and add a Gnuplot chart, compile and run, and get output like this: .. code-block:: none [ambonvik@Threadripper cimba]$ time build/tutorial/tut_5_3 Setting up experiment Using 64 threads Baiting the hooks Running the simulation ... 0 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 0 Seed: 0xaabc464bf835ba01 1 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 0 Seed: 0x0187615b9b563892 2 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 0 Seed: 0xfdf3e1d69dc021f1 3 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 0 Seed: 0x166396ef78583e12 4 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 0 Seed: 0xbe4c7beb66bfa52d 5 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 0 Seed: 0x53665865db06f479 6 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 0 Seed: 0x4e7eaff0ecc53516 7 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 0 Seed: 0x4836e23a98da025c 8 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 0 Seed: 0xc7fcca2edd7e7672 9 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 0 Seed: 0x4f08ba1ab652f797 11 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 1 Seed: 0x1b6791020c37e265 13 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 1 Seed: 0x24f4a0c7a2b7fc85 14 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 1 Seed: 0x8ca5527f505a41cc 16 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 1 Seed: 0xe91fddb0016691b5 12 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 1 Seed: 0xd32c5262396d4a28 21 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 125 GPU: 0 Seed: 0x5ae7ed646f78f29c 10 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 1 Seed: 0x54a883eb38e018ca 17 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 1 Seed: 0xd4c7af89365af186 18 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 1 Seed: 0x79ee62007b579a97 19 00:00:00.0 dispatcher run_trial (755): Start trial: Flight level: 100 GPU: 1 Seed: 0x6aef2b4d821bf4ba ... 263 06:00:00.0 dispatcher run_trial (834): End trial: Flight level: 425 Found: 220 261 06:00:00.0 dispatcher run_trial (834): End trial: Flight level: 425 Found: 209 262 06:00:00.0 dispatcher run_trial (834): End trial: Flight level: 425 Found: 231 285 06:00:00.0 dispatcher run_trial (834): End trial: Flight level: 450 Found: 221 283 06:00:00.0 dispatcher run_trial (834): End trial: Flight level: 450 Found: 215 289 06:00:00.0 dispatcher run_trial (834): End trial: Flight level: 450 Found: 244 281 06:00:00.0 dispatcher run_trial (834): End trial: Flight level: 450 Found: 213 284 06:00:00.0 dispatcher run_trial (834): End trial: Flight level: 450 Found: 205 280 06:00:00.0 dispatcher run_trial (834): End trial: Flight level: 450 Found: 248 287 06:00:00.0 dispatcher run_trial (834): End trial: Flight level: 450 Found: 220 286 06:00:00.0 dispatcher run_trial (834): End trial: Flight level: 450 Found: 227 282 06:00:00.0 dispatcher run_trial (834): End trial: Flight level: 450 Found: 221 288 06:00:00.0 dispatcher run_trial (834): End trial: Flight level: 450 Found: 232 Finished, calculating stats real 1m13.137s user 34m35.338s sys 7m0.024s *We just ran 300 trials of our AWACS simulation in 73 seconds of wall clock time on a desktop PC.* Each trial ran for 6 simulated hours, involved 1000 active targets on a 1000 x 1000 nautical mile map with 1 arcsecond resolution, and an active radar with dwell time of 0.04 seconds doing raymarching against the terrain map for each dwell. The experiment produced this chart for the number of targets detected as a function of the AWACS flight level: .. image:: ../images/tut_5_3.png By default, Cimba will put one worker thread per CPU core reported by the operating system. That is often twice the number of physical CPU cores. Cimba also provides functions to manage this. One can get the number of Cimba threads by calling ``uint32_t cimba_threads_num()`` and set the number of threads to be used by calling ``uint32_t cimba_threads_use(uint32_t n_threads)``. The argument ``0`` means "use default". It will return the number of threads actually used used, i.e., the number of CPU cores if given the argument 0. The optimal number of threads to use will depend on the task. In general, using a number of threads between 1.5 to 2.5 times the number of physical CPU cores is close to optimal, with a rather flat curve through that region and random run-to-run variation swamping any meaningful differences. The Cimba default number of threads is a reasonable choice. Summary ------- In these five tutorials, we have gone from a basic M/M/1 queue simulation to a near military grade AWACS scenario running on 64 CPU cores and two GPUs (10496 cores each) in parallel, efficiently utilizing all the compute power available on a recent multi-core, multi-GPU desktop PC for discrete event simulation. For more detailed information about the various features of Cimba, please see the in-depth :ref:`the background section ` and the detailed :doc:`API reference pages `.