One use case for StreamInsight in an enterprise environment is to have a hub and spoke architecture. In such an architecture, you would have multiple downstream StreamInsight instances (the spokes) that sit close to the data source to do event detection and processing on very high speed data that, very simply, can’t be sent to a centralized server due to network latency and limited bandwidth. This server would also downsample and (likely) filter data that is of interest to a centralized operations center where there is an aggregating StreamInsight server (the hub). In the case of an “interesting” event or set of events, additional information can be “turned on” at the downstream server to add to the existing feed, thereby optimizing the use of limited bandwidth while preserving the ability to view and collect data that is of critical interest. The aggregating StreamInsight server can then provide end-user interfaces with data as well as do additional aggregated analysis across all of the spokes.
Two example use cases for this:
Oil & Gas Production: In O&G, you would have a downstream StreamInsight server that sits at the drill site, whether that be an offshore rig or an onshore platform. In both of these scenarios, it is entirely likely that there are very limited pipes back to a central operations center. Furthermore, since these installations can have thousands or tens of thousands of sensors, not all of this information will be useful – or even desirable – in the onshore operations center but still needs to be processed and analyzed onsite. Aggregated, filtered data and even calculated data based on the raw sensor feeds would be forwarded to a central StreamInsight server back “on the beach” for analysis across platforms and monitoring. Cross-platform aggregation would be interesting and useful when multiple platforms are using the same pipelines to make sure that there is capacity in the pipeline for current production as well as to optimize capacity usage.
Utilities – SmartGrid: This scenario provides an even better use case for this kind of architecture. A single utility company will have millions of smart meters installed across their service area. Utility companies are also installing “smart transformers” that provide data related to transformer performance. StreamInsight may be fast (it is) and may be able to handle a lot of data at high frequency (it can) but it can only do so much. Having a single StreamInsight server processing the data from all of a utlity company’s smart meters and transformers simply isn’t realistic. Like the O&G scenario above, downstream StreamInsight servers would collect information from individual meters and transformers, downsample and aggregate and then send to a centralized server. In fact, with utilities, there may be a couple of layers of this, depending on the size of the service area and utility provider. Initial aggregate (at the source) by substation that is then fed to the hub server would be useful and interesting. From there, aggregation across substations can provide the information required to ensure that there is enough capacity on the grid for current usage as well as optimize the grid’s current capacity so there isn’t too much capacity that isn’t required. Aggregation by substation – or, for example, zip code – can help utility providers optimize and target certain areas for rolling brown/blackouts when necessary with the minimum impact required to keep the grid balanced.
My team has developed some adapters that are specifically designed for and intended to be used in these scenarios. As we’ve been doing this, we’ve also had a lot of discussion around how these should work, especially in cases where the inbound stream has a different shape than the target input adapter (e.g. Edge output –> Point Input). How are these translated from one to the other? What kind of event shapes are actually valid in these scenarios? Here’s what we’ve come up with:
|Output (Source) Adapter Shape||Input (Target) Adapter Shape||Valid Use Case||Comments|
|Edge||Edge||Yes||Events should be enqueued as they arrive. No translation should be done unless the Start Edge time is before the last issued CTI.|
|Edge||Interval||No||Typically, this is invalid. With an inbound interval from a downstream StreamInsight server, the start time is just about guaranteed to be before the last issued CTI. Because interval events aren’t released to the output adapter until the end time, it is also possible for there to be different start times in one inbound package.|
|Edge||Point||Yes||The point input adapter should only enqueue the End Edge with an event timestamp equal to the EndTime of the end edge. If the end time is before the last issued CTI, |
Alternatively, you could enqueue a point at both the start and the end or just the start, using the corresponding timestamp. If these alternatives are required use cases, it should be configurable.
|Interval||Edge||No||The only time this would be possible is to enqueue on the end edge event as it is only then that the end time (and total interval) is known. Since the start time is virtually guaranteed to be after the last issued CTI and since it is likely that different edge events that arrive together also have completely different start times, it is impossible to enqueue them correctly in the application timeline.|
We have determined, therefore, that this is an invalid use case.
|Interval||Interval||No||While it seems that there would be no translation required, that is not the case. As with the edge target above, the start time of the interval is virtually guaranteed to be before the last issued CTI.|
We have determined, therefore, that this an invalid use case.
|Interval||Point||Yes||Again, the start time is virtually guaranteed to be before the last issued CTI and start times in the same “group” will likely have different start times. Therefore, the point should be enqueued with the End Time of the inbound interval event.|
|Point||Point||Yes||No translation necessary. Enqueue the point with the original timestamp.|
|Point||Interval||No||While possible, it doesn’t really make much sense. You could enqueue the interval with an end time 1 tick past the start time but what would be the point?|
|Point||Edge||No||Just like with the interval target above, there isn’t a very good way to handle this that makes sense logically. Having a start edge and then an end edge with a 1 tick difference between start and end time defeats the purpose of an edge event.|
As you can see above, it’s not as straightforward as one would initially think. Because of their very nature, interval events are particularly problematic when coming in from a downstream StreamInsight server because their start times are typically in the past according to application time. Point events as a destination are universally valid use cases.
What about CTI’s? As I’m sure you are aware, CTI’s advance application time and are not necessarily based on the system clock - though they certainly can be if that is desired. In the use cases above, they would be at the source, downstream servers and the destination, upstream server would advance application time based on the source servers. Due to latency, this would like be a touch behind the system clock. Depending on the protocol used, it is possible that events may arrive out-of-order so, in some cases, this CTI should have a configurable time span to account for this. You may need to issue a CTI for, say 10:00:00 at 10:00:02 – but using the 10:00:00 timestamp. This will also ensure that any queries that span multiple downstream, input servers are synchronized. One thing to also note with all of the timestamps – it may be necessary, in some cases, to account for differences in the source server application clock with the target server application clock. This can happen in cases where the source application clock is advanced according to the source system clock and and the application clocks across the source servers aren’t fully synchronized. In both the O&G and Utilities scenarios above, this is an entirely possible use case. If that is the case, the target adapter will need to do any translation necessary between the systems – the target, upstream server will need to do this to fully and accurately coordinate the clocks between the multiple downstream servers.