Secundus Posting (of sorts)
It’s been a while since there’s been any new content on here (well, let’s be honest, a very long time). My role at Microsoft has shifted a bit in the last year, going down a slightly different direction and focusing on a new technology area. I shifted over to covering StreamInsight on the SQL Customer Advisory Team, focusing on helping customers and partners with building large scale event driven applications.
After working on some incredible projects, and with the launch of StreamInsight as part of SQL Server 2008 R2, finally carving out enough time to talk about some of the best practices we’ve learned through launching the product, and share some (hopefully) valuable information about how to roll out your own StreamInsight implementations.
So, now that I’ve rambled on about a role change, let’s set the stage for the things of things that will wend their way to the web through this humble blog:
What is StreamInsight?
Let’s think about the wild world of analytics, and the value of impatience with an analogy. Anybody who has ever been subjected to a presentation from me is familiar with my love of a good analogy (well, love of a really bad analogy, let’s be honest here), so this blog won’t really be much different. Note: this scenario is completely contrived; we’ll get to the real ones later.
Suppose I had to answer a question “how many red cars are in the parking lot behind your office”. Let’s look at how we’d answer this question, both in a physical manner, and how this problem would be addressed in a relational database:
|
|
|
|
Seems pretty straightforward. We can think about optimizing these queries by doing things such as parking all of the cars on one side of the lot, then parking them by color (or, in the relational world, adding an index). For this class of questions, we have a great platform for delivering answers. However, there’s another class of problems that aren’t well addressed by relational data engines.
Suppose I were asked a slightly different questions – how many cars have passed your office on the highway in the last 20 minutes. If we were to approach answering this problem in a pure relational sense our analogy would be:
|
|
|
|
All of a sudden this doesn’t seem like the most efficient way to provide the answer in either realm. This is where StreamInsight steps in to help fill the gap – by providing a stream processing platform with the capabilities to address delivering insight about streaming data at scale. In the StreamInsight world our new analogy is:
|
|
|
|
var x = from e in cars
where e.Type == "AUTO" && e.Color == "RED"
select e;
var count = from e in x.HoppingWindow(
TimeSpan.FromMinutes(2), TimeSpan.FromSeconds(30),
HoppingWindowOutputPolicy.ClipToWindowEnd)
select new
{
count = e.Count()
};
The LINQ statement above is a StreamInsight query that asks the question:
- Look at a stream of data containing observations about vehicles (the cars stream).
- Filter out any events that aren’t AUTO (mobiles) and painted RED.
- Create a window in time, look at the last 2 minutes, and advance the window every 30 seconds.
- In each window, count the number of events (i.e. the number of red cars in that 2 minute window), and report the aggregate.
Err.. wait.. how does this LINQ stuff magically solve my problem of observing data in flight? Well, that’s some of the magic of StreamInsight. It provides a platform for:
- In-memory stream processing engine.
- Connecting streaming data sources (adapters) to continuously running standing queries (queries).
- Asking questions about temporal and relational data (windows in time, relationships in time), including dynamically chaining and composing queries.
- Extensibility for queries, operators, aggregates and data connectivity.
What is it good for?
StreamInsight is designed to enable you to develop powerful event processing applications at scale. This lends itself to a range of scenarios across various industries, such as:
Manufacturing |
|
Financial Services |
|
Operational Analytics |
|
Utilities |
|
Basically, anything where you have large volumes of streaming data and need to process results with low latency.
Sounds cool, how do I get started?
To get started you’ll want to be familiar with C# and LINQ, and have a copy of Visual Studio (2008 or 2010). Once you’ve got the baseline in place:
- Download and install StreamInsight from the StreamInsight product page (or MSDN/TechNet if you’re a subscriber)
- Download the reference samples from codeplex onto your local hard drive.
- Since codeplex downloads (along with most internet downloads) are not trusted by default, you’ll need to “Unblock” the .zip file (best to do this before unzipping, doing it file by file is grossly annoying).
- Right click on the StreamInsightProductTeamSamples zip file, then click on Properties.
- Click on Unblock to “trust” the file.
- From Visual Studio, open up the Applications\TrafficJoinQuery\TrafficJoinQuery.sln solution.
- This application uses simulated traffic data (from two .csv files contained in the solution), with the query listed in the CreateQueryTemplate function (in Program.cs).
- Press F5 to debug the application, and execute the query against the two simulated data streams. This will result in output similar to the following:
Creating CEP Server Creating CEP Application Registering LINQ query template Registering Adapter Factories Registering bound query Start query INSERT 6/25/2009 12:00:00 AM +00:00 1 1001 18 INSERT 6/25/2009 12:00:00 AM +00:00 1 1003 19 -- snip -- NSERT 6/25/2009 12:18:20 AM +00:00 2 1004 21 INSERT 6/25/2009 12:18:20 AM +00:00 2 1005 19 INSERT 6/25/2009 12:18:20 AM +00:00 2 1006 21 INSERT 6/25/2009 12:18:20 AM +00:00 3 1007 22 CTI 12/31/9999 11:59:59 PM +00:00 Diagnostic View for 'cep:/Server/EventManager': AllEventsCount: 18 AllEventsMemory: 417792 Diagnostic View for 'cep:/Server/PlanManager': -- snip -- QueryTotalIncomingEventCount: 497 QueryTotalConsumedEventCount: 497 QueryTotalProducedEventCount: 192 QueryTotalOutgoingEventCount: 192 -- snip -- |
Showing the StreamInsight engine starting up, activating the query, pumping some output, then dumping diagnostic information.
Where else can I learn about this stuff?
I’ll be pumping out a series of blog posts over the coming months addressing specific aspects of building StreamInsight applications, including adapter development, query authoring, hosting StreamInsight, connecting with other technology platforms (such as AppFabric Cache). Just to kick things off in the interim here’s a few great links:
- StreamInsight product blog
- StreamInsight forum
- StreamInsight documentation on MSDN
- Hitchhiker’s guide to StreamInsight Queries
Comments
- Anonymous
May 17, 2012
I tried the sample with v2.0. The sample did not initially compile. To fix:
- Set the target frameowork to .NET Framework 4.0. Right-click the project name (SimpleTextFileReader, SimpleTextFileWriter, TrafficJoinQuer) and click Properties. In the Application tab, reseet the Target framework.
- Delete the old references and reset using the new references For these projects: SimpleTextFileReader, SimpleTextFileWriter Open the References folder. Delete these two references: Microsoft.ComplexEventProcessing Microsoft.ComplexEventProcessing.Adapters Right-click the References folder and click Add. In the .NET Framework tab, re-add the two namespaces that were deleted.