Ruminations of J.net idle rants and ramblings of a code monkey

StreamInsight: The basic components

In my last post, I announced my new job and shared some information about it. In short, I’m working with a new technology from Microsoft called StreamInsight, a platform for Complex Event Processing (CEP). CEP is about handling and processing (near) real-time streaming data in ways that allow “the business” to monitor, record and slice/dice this information on the fly. That last part is the key thing … there is no disk involved, no store-and-evaluate; it’s all about evaluating, in interesting ways, data that is coming in at VERY high rates (thousands of events per second) with very low (sub-second) latency. I say near real time because we cannot escape the latency of the wire feeding StreamInsight, at least until we can manipulate quantum coupling and utilize “spooky action at a distance”. That’s the 50,000-foot view. I’m going to start by focusing on the basic components and then, step by step, go lower and deeper into StreamInsight.

So … simply put, there are only 3 key components: Input Adapters, Output Adapters and Queries. I’ll talk about each.
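To make the division of labor concrete, here is a minimal sketch of the three pieces wired together as a plain pipeline. An input adapter turns raw readings into timestamped events, a query does the actual work, and an output adapter delivers the results somewhere. It is written in Python purely for illustration and is not StreamInsight’s API (real StreamInsight adapters and queries are .NET code). Every name in it (`Event`, `input_adapter`, `query`, `ListOutputAdapter`) is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Event:
    """A point event: a payload stamped with the time it occurred."""
    timestamp: datetime
    value: float


def input_adapter(readings: Iterable[Tuple[datetime, float]]) -> Iterator[Event]:
    """Hypothetical input adapter: converts raw (time, value) readings from
    some external source into events the query can consume."""
    for ts, value in readings:
        yield Event(ts, value)


def query(events: Iterable[Event], threshold: float) -> Iterator[Event]:
    """Hypothetical query: the part that does the real work (here, a filter)."""
    return (e for e in events if e.value > threshold)


class ListOutputAdapter:
    """Hypothetical output adapter: delivers query results to a sink
    (a Python list here; a database, a dashboard or BizTalk in real life)."""

    def __init__(self) -> None:
        self.delivered: List[Event] = []

    def consume(self, results: Iterable[Event]) -> None:
        for event in results:
            self.delivered.append(event)


# Wire the chain together: input adapter -> query -> output adapter.
start = datetime(2010, 1, 1, 12, 0, 0)
raw = [(start + timedelta(seconds=i), v) for i, v in enumerate([3.0, 9.5, 4.2, 12.1])]
sink = ListOutputAdapter()
sink.consume(query(input_adapter(raw), threshold=5.0))
print([e.value for e in sink.delivered])  # [9.5, 12.1]
```

Note that the adapters know nothing about the filtering logic. They only move data in and out, which is the sense in which they exist to serve the query.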

Adapters:

Input and output adapters provide similar capabilities with different targets; I’ll touch on that later. But … when they hear “adapters”, many developers think of BizTalk, since adapters are a core component of BizTalk as well. Adapters in both products follow the same pattern (the Adapter Pattern, as their name indicates) … both exist to connect to external systems and move data between those systems and the underlying engine. But the similarities end there. BizTalk is designed to enable integration between multiple disparate systems, and it hands the data from those systems to the BizTalk engine for processing. It is not, by any stretch of the imagination, a real-time engine, and it doesn’t pretend to be. Very deliberately, BizTalk allows things like long-running transactions that are simply antithetical to StreamInsight. This isn’t to say that BizTalk is not a great platform; it’s simply a recognition of where it lives in the ecosystem compared to StreamInsight. StreamInsight is all about real-time processing with very low latency: tens of thousands of events per second with sub-millisecond response time. BizTalk is about evaluating and processing transactions between multiple systems, enforcing rules on those transactions and providing information about them. For BizTalk, latency is, at the end of the day, not a big deal. And yes, BizTalk can be the target of an output adapter.

Input Adapters

As the name implies, these are components that feed information to the StreamInsight engine from various underlying sources. Where is the data coming from? OPC? Databases? OSI PI? PerfMon counters? Other? There are typed and untyped adapters but, at the engine level, there is little difference between the two.

I’ll stop there for now. There will be more stuff on input adapters later.

Output Adapters

Where is the data going? Output Adapters are at the end of the chain … they deliver the results of queries to any number of destinations. That’s all they do.

Again, I’ll stop there for now. More will be coming later.

Queries:

Queries are written in LINQ. If you are a .NET developer, you know (or should know) LINQ. There is no other syntax. Period. “Cool”, you say … “I know LINQ and I can bend, twist and, when necessary, pervert LINQ to my needs.” That’s good and I’m happy for you. But StreamInsight introduces a new dimension to LINQ (really, to querying in general) that few developers are used to. Yes, knowing LINQ helps. Understanding LINQ deeply, including the composability inherent in LINQ queries, helps even more. But there’s this extra dimension of time. With OLTP and OLAP databases, time is merely an attribute. With StreamInsight, time and the timeline of events are not a mere attribute; they are the centerpiece of everything. Time is an additional dimension to all queries, all joins, all unions, all aggregates … everything. This is the paradigm shift. This is what the average .NET developer needs to get his/her head around. The queries … and their timelines … are the centerpiece of StreamInsight. The adapters are incidental; they live only to serve the queries. And the queries are temporally bound. Don’t feel bad if you don’t “get it”. The paradigm shift required is, honestly, revolutionary. I can’t say that I have my head fully around it yet.
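To give a feel for what “time as a dimension” means, here is a minimal sketch of a tumbling-window aggregate. Events are summed per fixed, non-overlapping five-second window, and the window an event lands in is decided by the event’s own timestamp, not by the order in which it arrives. This is plain Python for illustration only, not StreamInsight’s LINQ syntax, and `tumbling_window_sum` is a hypothetical name. A real engine also has to decide when a window is complete and can be emitted. This sketch sidesteps that by working over a finite list.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def tumbling_window_sum(events: Iterable[Tuple[datetime, float]],
                        origin: datetime,
                        size: timedelta) -> Dict[datetime, float]:
    """Sum event values per fixed, non-overlapping window of length `size`,
    keyed by each window's start time (windows are aligned to `origin`)."""
    totals: Dict[datetime, float] = defaultdict(float)
    for ts, value in events:
        index = (ts - origin) // size  # timedelta // timedelta -> int (floored)
        totals[origin + index * size] += value
    return dict(totals)


t0 = datetime(2010, 1, 1, 12, 0, 0)
readings = [
    (t0 + timedelta(seconds=1), 10.0),
    (t0 + timedelta(seconds=4), 5.0),
    (t0 + timedelta(seconds=6), 7.0),
]
result = tumbling_window_sum(readings, origin=t0, size=timedelta(seconds=5))
print(result[t0], result[t0 + timedelta(seconds=5)])  # 15.0 7.0

# Feed the same events in reverse arrival order: the totals are unchanged.
shuffled = tumbling_window_sum(reversed(readings), origin=t0, size=timedelta(seconds=5))
print(shuffled == result)  # True
```

The point of the second call is that the aggregate is defined over the timeline of the events, not over the sequence in which the code happens to see them.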

I can wax philosophical about this … time and the essence of time have, for a long time, been a philosophical interest of mine, dating back to my college days as a Lit major/Philosophy minor. I also found relativity and its view of time fascinating, to say the least. Letting go of a hard definition of time and accepting a more relativistic view will help you understand StreamInsight … but only to a (limited) point. It helps because your everyday, preconceived notions about time are already in question, and you are willing to think outside our typical view of time as it relates to data. Like I said, it’s a paradigm shift.

All of that said, I’ll leave you with a quote from my mostest favoritest poem from my mostest favoritest poet:

Time present and time past
Are both perhaps present in time future,
And time future contained in time past.
If all time is eternally present
All time is unredeemable.
What might have been is an abstraction
Remaining a perpetual possibility
Only in a world of speculation.
What might have been and what has been
Point to one end, which is always present.

Bonus points if you can name the writer and the poem. No cheating/Googling allowed.