Ruminations of J.net idle rants and ramblings of a code monkey

How StreamInsight Data is Different – Part II

StreamInsight

In a previous post, I outlined how StreamInsight adds an additional dimension to data – the dimension of time – and how events in StreamInsight correlate to events that we see and experience every day. I’ll now dig a little more into this topic.

As things happen in the world around us, seconds, minutes, hours go by and our brain processes and interprets these events. All the while things are happening, time is passing by at a regular, known pace … but that’s not how we experience it. Sometimes time flies by (having a good, interesting discussion over lunch with our friend) and sometimes it drags (waiting around to be seated at a very busy restaurant when you’re hungry). If, however, we use a video camera to record the lunch (because we’re strange like that), the camera will see the events at the regular, known pace of seconds. So, depending on how we are getting the information in, time may “pass” differently. It may be based on information from our senses that we are processing or it may be external and ticking by regularly like the video camera.

Now, back to lunch. My friend and I have placed our orders and our waiter brings our food out and places our orders in front of us. Did he place them down in front of us at the same time? It depends on how we are looking at it and what our time reference is for “at the same time”. Technically, unless he is some sort of super-waiter that can handle an order in each hand and perfectly synchronize his hands to put them both down at the exact same nanosecond, they aren’t happening at the same time. But we don’t usually split hairs like that and we don’t mentally experience simultaneity like that. Still, that’s a pretty limited time window. If the waiter brings out, say, my order first (mmmm … I has cheezburger) and then my friend’s order 5 minutes later (ewwww … chicken livers), we aren’t going to say that he brought our orders out at the same time.

In StreamInsight, Current Time Increments (CTIs) provide a way to handle time windows and to understand what “at the same time” means within the context of the application. Like our experience, CTIs may or may not happen at regular intervals and they may move quickly or slowly. They may be controlled programmatically, or declaratively by the input adapter (IDeclareAdvanceTimeSettings), by the query (AdvanceTimeGenerationSettings) or even by another stream, by importing its CTIs. Our experience of time really isn’t much different. It is not uncommon to say that two events happened “at the same time” when, in fact, they were off by a few seconds (or more). When we say “at the same time”, we really mean that the events happened within a window that we understand to be simultaneous. And it is the CTIs that allow us to define that window in StreamInsight.

When StreamInsight does joins and unions, it joins and unions those events that are happening at the same time … that is, events that are valid within the current CTI window. For our lunch example, if we had a stream of the “open hours” interval events, a stream for our “starting/ending lunch” edge events and a stream for the individual point events of our lunch, they would all participate in the join or union. If an event’s valid time falls outside the CTI window, it will not participate in the join or union. That valid time is determined by the start and end times on the event. Furthermore, these joins are happening on the data as it is happening … in a way very much like we perceive events happening in time during our daily experience. The human brain is, after all, a massively parallel complex event processing engine.
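
To make the “valid at the same time” idea a bit more concrete, here is a minimal sketch in plain C# (9 or later). It is not the StreamInsight API; TimedEvent, Point, Overlaps and Join are names I made up for illustration, and I am assuming that a point event can be modelled as an interval that is one tick long. All it shows is the overlap test at the heart of a temporal join: two events pair up only when their valid-time intervals intersect.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an event: a name plus a valid-time interval [Start, End).
public record TimedEvent(string Name, DateTimeOffset Start, DateTimeOffset End)
{
    // Assumption: a point event is modelled as an interval lasting a single tick.
    public static TimedEvent Point(string name, DateTimeOffset at) =>
        new TimedEvent(name, at, at.AddTicks(1));
}

public static class TemporalJoinSketch
{
    // Two events are "at the same time" when their half-open intervals intersect.
    public static bool Overlaps(TimedEvent a, TimedEvent b) =>
        a.Start < b.End && b.Start < a.End;

    // Pair every left event with every right event whose valid time overlaps it.
    public static IEnumerable<(TimedEvent Left, TimedEvent Right)> Join(
        IEnumerable<TimedEvent> left, IEnumerable<TimedEvent> right) =>
        from l in left
        from r in right
        where Overlaps(l, r)
        select (l, r);
}
```

With an “open hours” interval from 11:00 to 14:00 on the left and two point events on the right (food arriving at 12:15, a stray event at 15:00), Join returns only the pairing with the 12:15 event. The 15:00 event has no overlapping partner, so it never shows up in the output.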

Going back to lunch, let’s also say that there is a thunderstorm outside during lunch. It’s raining cats and dogs, lightning flashing, thunder rolling. We’re all familiar with first seeing the lightning and then hearing the thunder. We know that these events occur at the same time but we don’t “get” them at the same time. Instead, the thunder has latency when compared to the lightning; the event is actually simultaneous with the lightning but its arrival is some time later. In StreamInsight, we have ways of handling this. First, when you handle your CTIs declaratively, either using IDeclareAdvanceTimeSettings or using AdvanceTimeGenerationSettings, you can specify a delay for late-arriving events that then allows these events to be processed with the appropriate, on-time arriving events. For events that arrive even later than that, you can specify dropping or adjusting those events. An example of IDeclareAdvanceTimeSettings is below. In this example, we’re saying “Everything that happens in a 5 second window is considered simultaneous. Wait for a second before that window is closed. Anything that comes in late should be adjusted to the current application time.”

    var advanceTimeSettings = new AdapterAdvanceTimeSettings(
        new AdvanceTimeGenerationSettings(
            TimeSpan.FromSeconds(5),   // Frequency of the CTI
            TimeSpan.FromSeconds(1)),  // Delay for late-arriving events
        AdvanceTimePolicy.Adjust);     // Adjust late events instead of dropping them
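
If it helps to see what those three knobs do to an incoming event, here is a rough simulation in plain C# (9 or later). It is a simplified model of my own, not the engine’s actual implementation: CtiSimulator, LatePolicy and Outcome are made-up names, and the rules are my assumptions. A CTI trails the triggering event’s start time by the delay, a new CTI is issued once it would be at least one frequency past the previous one, and under the adjust policy a late event is moved up to the last CTI only if it is still valid at that point (otherwise it is dropped).

```csharp
using System;

public enum LatePolicy { Drop, Adjust }
public enum Outcome { Accepted, Adjusted, Dropped }

// Simplified, hypothetical model of declarative CTI generation; not the StreamInsight API.
public sealed class CtiSimulator
{
    private readonly TimeSpan _frequency;
    private readonly TimeSpan _delay;
    private readonly LatePolicy _policy;

    public CtiSimulator(TimeSpan frequency, TimeSpan delay, LatePolicy policy)
    {
        _frequency = frequency;
        _delay = delay;
        _policy = policy;
    }

    // Timestamp of the most recent CTI, or null before the first one is issued.
    public DateTimeOffset? LastCti { get; private set; }

    // Offer an event with valid time [start, end) and report what happened to it.
    public Outcome Enqueue(DateTimeOffset start, DateTimeOffset end)
    {
        var outcome = Outcome.Accepted;

        // An event is late when it starts before the most recent CTI.
        if (LastCti is DateTimeOffset cti && start < cti)
        {
            // Dropped by policy, or already over by the time of the CTI.
            if (_policy == LatePolicy.Drop || end <= cti)
                return Outcome.Dropped;

            start = cti; // Adjust: pull the start time up to the CTI.
            outcome = Outcome.Adjusted;
        }

        // The CTI trails the event by the delay, leaving room for stragglers.
        var candidate = start - _delay;
        if (LastCti is null || candidate - LastCti.Value >= _frequency)
            LastCti = candidate;

        return outcome;
    }
}
```

Feed it a 5 second frequency, a 1 second delay and LatePolicy.Adjust. Point events at 0s and 6s are accepted and leave the last CTI at 5s. The “thunder” at 5.5s is still accepted because the CTI trails by the delay. An interval event from 3s to 8s is adjusted to start at 5s, and a point event at 3s is dropped because it is already over.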

One caveat with using IDeclareAdvanceTimeSettings … if your adapter is not enqueuing events, the StreamInsight engine will not keep enqueuing CTIs. The CTIs are enqueued only while your adapter is sending events into the queue. So … don’t think that it’s going to keep happily chugging away on, say, a reference stream adapter when you aren’t pushing events in.

If you are programmatically enqueuing your CTI events, you have even more control – you can really do whatever you want/need/desire to do since you enqueue the CTI time yourself.

Now … we’ve been talking a lot about time and you may think that it’s always happening now and related to the system clock. While that can be true, it’s not necessarily true. StreamInsight runs with application time, not system clock time. What does that mean? Well, it means, first of all, that events don’t have to be enqueued with a timestamp of DateTimeOffset.Now. You can enqueue events with any time that you want. You could, for example, re-run events from a log file using the original timestamp … and you would need to enqueue your CTIs with the proper (in the past) timestamps. You could call this “playback” mode … you’re reviewing events in the past as they happened; for example, watching a video of an event in your life. Like that video, you don’t have to watch it at 1x speed … you can fast forward through events. Your application time does not have to be now … it can be 10 years ago. When you enqueue your events and CTIs, you enqueue them with the appropriate timestamp 10 years ago as well.
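
Here is a small sketch of that “playback” idea in plain C# (9 or later). Again, this is not the StreamInsight API: StreamItem, ItemKind and PlaybackSketch.Replay are hypothetical names, and I am assuming a log that can be read as (timestamp, payload) pairs. The point is simply that every timestamp comes from the log and nothing ever asks the system clock what time it is. Each CTI carries the timestamp of the record it follows, on the understanding that a CTI at time T promises that nothing starting before T will arrive afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ItemKind { Insert, Cti }

// Hypothetical stand-in for what an adapter would enqueue: an event or a CTI.
public record StreamItem(ItemKind Kind, DateTimeOffset Timestamp, string Payload = "");

public static class PlaybackSketch
{
    // Turn log records into a stream of inserts and CTIs in application time.
    public static IEnumerable<StreamItem> Replay(
        IEnumerable<(DateTimeOffset Timestamp, string Payload)> log)
    {
        // Sort first so that a CTI never promises more than the log can deliver.
        foreach (var (timestamp, payload) in log.OrderBy(entry => entry.Timestamp))
        {
            // The event keeps its original (possibly years-old) timestamp.
            yield return new StreamItem(ItemKind.Insert, timestamp, payload);

            // Advance application time to that same original timestamp.
            yield return new StreamItem(ItemKind.Cti, timestamp);
        }
    }
}
```

Because nothing here waits on a clock, the replay runs as fast as the log can be read, which is the “fast forward” part. If you wanted 1x speed, you would have to add the pauses between items yourself.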