Ruminations of idle rants and ramblings of a code monkey

How StreamInsight Data Is Different – Part I


One of the key challenges that developers (from what I’ve seen) face when getting started with StreamInsight is understanding how data is handled. The concepts of input and output adapters are relatively simple and familiar to developers everywhere. The language used by SI to handle the data – LINQ – is also pretty familiar to developers; in this case, deceptively familiar. But once they get into writing the queries, things become more difficult.

Let’s start first with what developers are used to – 3-dimensional data. Data is identified and “found” using 3 pieces of information: source, row and column. This tells you where it is. We’re used to seeing things in 3 dimensions; it’s how our world is shaped. Given 3 spatial values, you can determine either size or location. Yes, these things may change – new buildings are constructed, existing buildings are reconfigured or renovated, wrecking balls take buildings out. But finding something right now only requires those 3 pieces of information. Using this information, you can go to a restaurant. Or you can go on walkabout and explore to see what’s there right now. It is information that is stored and then retrieved on demand. From an experiential perspective, it’s memory and recall.

StreamInsight is different. There is an additional dimension added to things – the dimension of time. If, say, you are meeting a friend to have lunch, you need to know more than just where to meet him. You need to know when to meet him. You don’t need to know the “when” dimension to find the restaurant. And this is exactly the dimension that StreamInsight adds to our data. But it’s actually more than that. Saying when to meet our friend is still using time as an attribute, not as a dimension. Instead, it is like how we actually experience the lunch as it is occurring, how our brain processes the events and happenings around us. Like our own daily experience of events in time, data comes to us and passes by. Once a moment has passed, that moment is gone and lost to history. We may remember things, write them down, take a picture and so on … but then we are back to stored data that is a snapshot in time, not something happening in time. As things (events) are happening, our brain processes them, correlates them, stores some away in memory and tosses the rest out. StreamInsight, as a CEP engine, does much the same thing that our brains do, approaching the world in time. It wouldn’t, IMHO, be unreasonable to say that our brains are the most complex CEP engines there are.

Let’s take this a little further and use it to explain some of the core StreamInsight concepts. Like the real world, SI has events and these events have different temporal characteristics.

Let’s go back to meeting our friend for lunch. We go to the restaurant at a specific time to meet our friend. When we get there, we open the door and walk in. It is an event in time, but it’s not something that we’d typically remember later, nor would we care to. This is a point event; there and then gone. There could be something unusual about this event … say, for example, the door falls off its hinges when we open it … but we’ll touch on that a little later.

This restaurant that we’re going to isn’t open 24 hours; it is open from 10:00 AM to 11:00 PM on some days and from 10:00 AM to 2:00 AM on others. But we know what those hours are going to be and that the restaurant will have a status of “open” during each of those periods. That is an interval event: its start, its end and its data are all known up front. While the restaurant is open, other things happen … customers come in, place orders and pay their bills. At the end of the day, we have a total number of customers, a total number of orders and a total amount of money that passed through the register. This, too, is an interval with associated data, one that is just like an aggregate database query. We can have both of these types of intervals in StreamInsight – one where the data associated with it is known in the beginning and one that has data that is calculated based on a time window. Either way, there is a known and definite start and end.
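To make the two kinds of interval concrete, here is a toy Python model. It is not StreamInsight’s API (which is .NET and LINQ); the names `Order`, `Interval`, `opening_hours` and `daily_summary` are hypothetical. One interval carries data known when it starts; the other carries data aggregated from the events that fall inside its time window.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Order:
    """Hypothetical event: one paid order and when it was paid."""
    time: datetime
    amount: float


@dataclass
class Interval:
    """An interval with a known start and end plus its associated data."""
    start: datetime
    end: datetime
    data: dict


def opening_hours(start: datetime, end: datetime) -> Interval:
    # Data known up front: the restaurant is "open" for the whole interval.
    return Interval(start, end, {"status": "open"})


def daily_summary(orders: List[Order], start: datetime, end: datetime) -> Interval:
    # Data calculated from the events that fall in the window [start, end).
    inside = [o for o in orders if start <= o.time < end]
    return Interval(start, end, {
        "orders": len(inside),
        "total": sum(o.amount for o in inside),
    })


# Usage: one day's opening hours and the summary of orders paid during them.
day_open = datetime(2024, 5, 1, 10, 0)
day_close = datetime(2024, 5, 1, 23, 0)
hours = opening_hours(day_open, day_close)
orders = [
    Order(datetime(2024, 5, 1, 12, 30), 25.0),
    Order(datetime(2024, 5, 1, 19, 0), 40.0),
    Order(datetime(2024, 5, 2, 11, 0), 15.0),  # next day, outside the window
]
summary = daily_summary(orders, day_open, day_close)
```

Both intervals share the same start and end; what differs is when their data becomes available.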

Now, back to our lunch. We know we are at lunch and we know who our friend is when it starts, but we don’t know, at the time we start, when it’s going to end. We know it will end, but a lot of factors come into play – how quickly the kitchen is running, how busy our waitress is, a lengthy discussion with our friend about the kids and wives or yesterday’s baseball game. Once our lunch is done, we can mark down the end time and move on with our day. This is an edge event: you have a start and, sometime later, an end, but when the event starts you don’t know when that end will be.

If you want to get technical, everything happens over a time span, even if it’s a very short one. Likewise, StreamInsight handles all of these events internally as interval events. Point events have a time span of one tick, and edge events start out with an infinite time span and have their end time set when the event ends. Interval events are, I think, self-explanatory.
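The idea that every event is “really” an interval can be sketched as a toy Python model. The names `IntervalEvent`, `point_event`, `interval_event`, `edge_start`, `edge_end` and the `TICK` constant are hypothetical, not StreamInsight types. As a simplification, this model represents an edge event as one interval whose open end is filled in later, rather than as the separate start and end edges that StreamInsight itself works with.

```python
import math
from dataclasses import dataclass, replace

TICK = 1  # hypothetical smallest unit of time in this toy model


@dataclass(frozen=True)
class IntervalEvent:
    start: int
    end: float  # math.inf while an edge event is still open
    payload: str


def point_event(t: int, payload: str) -> IntervalEvent:
    # A point event is an interval exactly one tick long.
    return IntervalEvent(t, t + TICK, payload)


def interval_event(start: int, end: int, payload: str) -> IntervalEvent:
    # Start and end are both known when the event is created.
    if end <= start:
        raise ValueError("an interval must end after it starts")
    return IntervalEvent(start, end, payload)


def edge_start(t: int, payload: str) -> IntervalEvent:
    # The end is unknown when an edge event starts, so it is open-ended.
    return IntervalEvent(t, math.inf, payload)


def edge_end(event: IntervalEvent, t: int) -> IntervalEvent:
    # When the end arrives, the open end is replaced with the real end time.
    if event.end != math.inf:
        raise ValueError("edge event has already ended")
    if t <= event.start:
        raise ValueError("an edge event must end after it starts")
    return replace(event, end=t)


# Usage: walking through the door versus the lunch itself.
door = point_event(100, "door opened")
hours = interval_event(0, 780, "restaurant open")
lunch = edge_start(120, "lunch with a friend")
finished_lunch = edge_end(lunch, 185)
```

After normalization, queries only ever see a start and an end; the event kind just determines when the end becomes known.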

Where do all of these events come from? In our daily experiences, we learn about things that are happening through our 5 senses. Or, perhaps more accurately, we can learn about them through our 5 senses; we could close our eyes or plug our ears. Our senses are our input adapters. Most of these events come and go with little or even no memory of their occurring (it’s a MASSIVE amount of data that our brains routinely handle), but unusual or extraordinary events are remembered, written down, photographed or video recorded. We can then review these events later, whether it’s to relive them or analyze them more deeply. And this is exactly what our output adapters are for. But, just like the events that occur around us, not all events are sent to output adapters, only those events that meet certain conditions. In StreamInsight, we use LINQ queries to make this determination. In many cases, you won’t want to store all of these events and, besides, StreamInsight can handle and analyze far more events than we could reasonably send to a disk. A traditional database is always a replay of the past … you select things that have already happened from tables, you analyze them in cubes, create reports, etc. StreamInsight can help you get the appropriate, important information into these storage-based data sources.
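The flow from senses to memory can be sketched as a small Python pipeline. This is only an analogy under stated assumptions: `SensedEvent`, `input_adapter`, `query` and `output_adapter` are hypothetical names, and the generator filter merely plays the role that a LINQ `where` clause plays in a StreamInsight query. Events stream through one at a time, and only the unusual one is written down.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class SensedEvent(NamedTuple):
    time: int
    description: str
    unusual: bool


def input_adapter(readings: Iterable[Tuple[int, str, bool]]) -> Iterator[SensedEvent]:
    # Like our senses: turn raw readings into events as they arrive.
    for time, description, unusual in readings:
        yield SensedEvent(time, description, unusual)


def query(events: Iterable[SensedEvent]) -> Iterator[SensedEvent]:
    # Stand-in for a LINQ "where" clause: only unusual events pass through.
    return (e for e in events if e.unusual)


def output_adapter(events: Iterable[SensedEvent], journal: List[SensedEvent]) -> None:
    # Like writing things down: only what reaches this point is stored.
    for e in events:
        journal.append(e)


# Usage: three events happen, but only one is worth remembering.
readings = [
    (1, "door opened", False),
    (2, "door fell off its hinges", True),
    (3, "order placed", False),
]
journal: List[SensedEvent] = []
output_adapter(query(input_adapter(readings)), journal)
```

Because each stage is a generator, no event is held longer than it takes to decide whether it should be kept.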

In the next article, I’ll go further on this and talk about how events are joined and unioned in time.