Analyzing Security Trends Using RSA Exhibitor Descriptions

The data used for this post is available here. A word of warning: I only have complete data sets for 2014 and 2015. For 2008-2013, I have what I consider to be representative samples. So please take the results with a big bucket of salt.

After going through this analysis, the big question I find myself asking out loud is:

How can vendors differentiate from each other and stand above the crowd when everyone is using the same words to describe themselves?

The annual security conference, RSA 2015, is right around the corner. Close to 30,000 attendees will descend on San Francisco's Moscone Center to attend 400+ sessions, listen to 600+ speakers, and talk to close to 600 vendors and exhibitors.

For me, the most interesting aspect of RSA is walking the expo floor, and listening to how vendors describe their products. Intuitively, the vendor marketing messages should have a high degree of correlation to what customers care about, even if the messages trail the actual pain points slightly.

This post highlights some of the unsurprising findings from analyzing 8 years' worth of RSA Conference exhibitor descriptions.

It is interesting that almost all vendor descriptions use the same set of words, and that these words have mostly not changed over the past 8 years. For example, the following table shows the top 10 words used in RSA Conference exhibitor descriptions for each of the past 8 years. You can find the complete word list at …

#    2008         2009      2010          2011         2012         2013         2014        2015
1    secure       secure    secure        secure       secure       secure       secure      secure
2    solution     solution  solution      solution     solution     solution     solution    solution
3    network      manage    manage        network      provide      provide      provide     provide
4    provide      provide   network       provide      manage       manage       network     data
5    manage       data      protect       manage       network      service      manage      network
6    enterprise   network   provide       data         information  more         data        protect
7    data         company   data          information  software     software     protect     threat
8    product      service   organization  enterprise   enterprise   information  threat      manage
9    technology   software  information   technology   data         enterprise   service     service
10   application  busy      risk          product      more         customer     enterprise  enterprise

Here’s a word cloud that shows the 2015 top words. You can also find word clouds for 2008, 2009, 2010, 2011, 2012, 2013, 2014.
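The rankings above were, per the notes at the end of this post, generated with the Word Cloud tool. For the curious, here is a minimal sketch of how such a per-year ranking can be computed with nothing but the standard library. The stop-word list and sample descriptions are made up for illustration, and the real analysis appears to apply stemming (solution, manage), which this sketch omits.

```python
from collections import Counter
import re

# Hypothetical stop-word list; a real analysis would use a longer one.
STOP = {"the", "and", "for", "with", "that", "our", "your", "from", "are", "this"}

def top_words(descriptions, n=10):
    """Rank the most frequent words across a list of exhibitor descriptions."""
    counts = Counter()
    for text in descriptions:
        for word in re.findall(r"[a-z]+", text.lower()):
            if len(word) > 2 and word not in STOP:
                counts[word] += 1
    return [word for word, _ in counts.most_common(n)]

# Toy input; the real corpus is one list of descriptions per year.
sample = [
    "Acme delivers secure cloud solutions for the enterprise.",
    "Widgets Inc provides secure network solutions.",
]
print(top_words(sample, 3))
```

Run once per year's corpus, this yields the kind of ranking shown in the table.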

Compliance Down, Threats Up

While the macro trend has not changed dramatically for the exhibitor descriptions, there have been some micro trends. Here are a couple of examples.

First, use of the word compliance has declined over the years, while use of the word threat has risen. After 2013, the two swapped places.

This finding is probably not surprising. At the end of 2013, one of the biggest breaches to date, Target, happened. And over the next two years we’ve seen major breaches at Sony, Anthem, Home Depot, Premera and many others. Threats to both corporate infrastructure and top executive jobs (just ask Target’s CEO Gregg Steinhafel, or Sony’s Co-Chairwoman Amy Pascal) are becoming real. So it seems natural for marketers to start using the word threat to highlight their solutions.

Compliance was a big use case in security for many years, and many vendors have leveraged the need for compliance to build their companies and revenue pipelines since the mid-2000s. However, use cases can only remain in fashion for so long before customers get sick of hearing about them and vendors need new ways of selling their wares. So it looks like compliance finally went out of fashion around 2011, when it started declining in exhibitor descriptions.
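The crossover is easy to spot mechanically once you have a ranked word list per year. A small sketch, with toy rankings standing in for the real data:

```python
def rank_of(word, ranked_words):
    """Return the 1-based rank of word, or None if it missed the list."""
    try:
        return ranked_words.index(word) + 1
    except ValueError:
        return None

# Toy per-year rankings, illustrative only -- not the real data.
rankings = {
    2012: ["secure", "solution", "compliance", "threat"],
    2014: ["secure", "solution", "threat", "compliance"],
}
for year in sorted(rankings):
    words = rankings[year]
    print(year, rank_of("compliance", words), rank_of("threat", words))
```

One word's rank climbing past another's between years is exactly the kind of swap described above.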

Mobile and Cloud Up

The words mobile and cloud have gained dramatically in the rankings over the past 8 years. In fact, they have consistently been among the top words used in the last four. For anyone who hasn’t been hiding under a rock in the past few years, this is completely unsurprising.

The cloud war started to heat up back in 2009, when most major service providers felt the Amazon Web Services threat and wanted to build their own clouds. In fact, I joined VMware in 2009 to build out their emerging cloud infrastructure group, specifically to help service providers build their cloud infrastructures. Eventually, in 2011, VMware decided to get into the game itself, and I built the initial product and engineering team that developed what is now known as vCloud Air (I still have no idea why that name was chosen).

As more and more workloads move to the cloud, requirements for protecting cloud workloads quickly appeared, and vendors naturally started to position their products for the cloud. So the rise in cloud rankings matches what I’ve experienced.

Around the same time (2010 or 2011 or so), more and more corporations were providing their employees smartphones, and workers were becoming more and more mobile. Mobile security became a major requirement, and a whole slew of mobile security startups came onto the scene. So naturally the word mobile rose in the rankings.

Virtual and Real-Time Regaining Ground

The words virtual and real-time dropped dramatically in the rankings for a couple of years (2010, 2011) but have since regained all the lost ground and more. I have no precise explanation for why that’s the case, but I have some theories. These theories are probably completely wrong, and if you have better explanations I would love to hear from you.

  • Virtual lost to cloud during that timeframe, as every vendor was trying to position their products for the cloud era. However, virtual infrastructures haven’t gone away and in fact continue to experience strong growth. So in the past couple of years, marketers have been covering their bases and messaging both virtual and cloud.
  • The drop in rankings for real-time is potentially due to the peak of the compliance use case, which is usually report-based and has no real-time requirements. I also suspect another reason: SIEM, for which real-time is critical, is going out of fashion somewhat due to the high cost of ownership and a lack of trust in the tools. However, given the recent rise of threats, real-time naturally becomes critical again.

Other Findings

The word cyber has gained huge popularity in the past 3 years, likely due to the U.S. government’s focus on cyber security. The word malware has been fairly consistently in the top 100 words since 2010.

The words product and service switched places in 2013, likely due to the increase in the number of security software-as-a-service plays.


  • The word clouds and word rankings are generated using Word Cloud.
  • The actual vendor descriptions are gathered from the RSA web site as well as press releases from Business Wire and others.
  • Charts are generated using Excel, which continues to be one of a data analyst’s best friends (not that I consider myself one).

Papers I Read: 2015 Week 8

Random Ramblings

Another week, another report of hacks. This time, The Great Bank Robbery, where up to 100 financial institutions have been hit. Total financial losses could be as high as $1bn. You can download the full report and learn all about it.

Sony spent $15M to clean up and remediate their hack. I wonder how much these banks are going to spend on tracing the footsteps of their intruders and trying to figure out exactly where they have gone, what they have done and what they have taken.

I didn’t make much progress this week on either sequence or surgemq because of a busy work schedule and my son getting sick AGAIN!! But I did merge the few surgemq pull requests that the community has graciously contributed. One of them actually got it tested on a Raspberry Pi! That’s pretty cool.

I also did manage to finish the experimental JSON scanner that I’ve been working on for the past couple of weeks. I will write more about it in the next sequence article.

Actually, I am starting to feel a bit overwhelmed by having both projects. Both of them are very interesting, and I can see both moving forward in very positive ways. Lots of ideas in my head but not enough time to act on them. Now that I am getting feature requests, issues and pull requests, I feel even worse because I haven’t spent enough time on them. <sigh>

Papers I Read

Memory is rapidly becoming a precious resource in many data processing environments. This paper introduces a new data structure called a Compressed Buffer Tree (CBT). Using a combination of buffering, compression, and lazy aggregation, CBTs can improve the memory efficiency of the GroupBy-Aggregate abstraction which forms the basis of many data processing models like MapReduce and databases. We evaluate CBTs in the context of MapReduce aggregation, and show that CBTs can provide significant advantages over existing hash-based aggregation techniques: up to 2× less memory and 1.5× the throughput, at the cost of 2.5× CPU.

Stream processing has become a key means for gaining rapid insights from webserver-captured data. Challenges include how to scale to numerous, concurrently running streaming jobs, to coordinate across those jobs to share insights, to make online changes to job functions to adapt to new requirements or data characteristics, and for each job, to efficiently operate over different time windows. The ELF stream processing system addresses these new challenges. Implemented over a set of agents enriching the web tier of datacenter systems, ELF obtains scalability by using a decentralized “many masters” architecture where for each job, live data is extracted directly from webservers, and placed into memory-efficient compressed buffer trees (CBTs) for local parsing and temporary storage, followed by subsequent aggregation using shared reducer trees (SRTs) mapped to sets of worker processes. Job masters at the roots of SRTs can dynamically customize worker actions, obtain aggregated results for end user delivery and/or coordinate with other jobs.

Not just a paper, it’s a whole book w/ 800+ pages.

The purpose of this book is to help you program shared-memory parallel machines without risking your sanity. We hope that this book’s design principles will help you avoid at least some parallel-programming pitfalls. That said, you should think of this book as a foundation on which to build, rather than as a completed cathedral. Your mission, if you choose to accept, is to help make further progress in the exciting field of parallel programming—progress that will in time render this book obsolete. Parallel programming is not as hard as some say, and we hope that this book makes your parallel-programming projects easier and more fun.

Papers I Read: 2015 Week 7

Random Ramblings

Well, another week, another big data breach. This time it is Anthem, one of the nation’s largest health insurers. Ok, maybe it was last week that it happened. But this week they revealed that hackers had access … going back as far as 2004. WSJ blamed Anthem for not encrypting the data, though I have to agree with Rich Mogull over at Securosis that “even if Anthem had encrypted, it probably wouldn’t have helped”.

I feel bad for saying this but there’s one positive side effect from all these data breaches. Security is now officially a boardroom topic. Anthem’s CEO, Joseph Swedish, is now under the gun because top level executives are no longer immune to major security breaches that affect the company’s top line. Just ask Target’s CEO Gregg Steinhafel, or Sony’s Co-Chairwoman Amy Pascal.

Brian Krebs wrote a detailed piece analyzing the various pieces of information available relating to the Anthem hack. Quite an interesting read.

One chart in the article that Brian referred to shows the difference between the “time to compromise” and the “time to discovery”, taken from Verizon’s 2014 Data Breach Investigations Report. As Brian summarizes, “TL;DR: That gap is not improving, but instead is widening.”

What this really says is that you will get hacked. So how do you shorten the time between getting hacked and finding out that you’ve been hacked, so you can quickly remediate the problem before worse things happen?

The time difference between the “time to compromise” and the “time to discovery.”

With all these data breaches as backdrop, this week we also saw “President Barack Obama signed an executive order on Friday designed to spur businesses and the Federal Government to share with each other information related to cybersecurity, hacking and data breaches for the purpose of safeguarding U.S. infrastructure, economics and citizens from cyber attacks.” (Gigaom)

In general, I don’t really think government mandates like this will work. The industry has to feel enough pain to be willing to participate; otherwise it’s just a waste of paper and ink. Facebook seems to be taking a lead in security information sharing and launched its ThreatExchange security framework this week, along with Pinterest, Tumblr, Twitter, and Yahoo. Good for them! I hope this is not a temporary PR thing, and that they keep funding and supporting the framework.

Papers I Read

Another great resource for computer science papers is Adrian Colyer’s the morning paper. He selects and summarizes “an interesting/influential/important paper from the world of CS every weekday morning”.

I read this paper when I was trying to figure out how to make the FSAs smaller for the Effective TLD matcher I created. The FSM I generated is 212,294 lines long. That’s just absolutely crazy. This paper seems to present an interesting way of compressing them.

I am not exactly sure if PublicSuffix uses a similar representation, but it basically represents an FSA as an array of bytes, and then walks the bytes like a binary search tree. It’s interesting for sure.
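To illustrate the general idea (a toy sketch, not PublicSuffix’s actual encoding, and using Python lists rather than raw bytes): flatten the automaton so that each node’s children sit in one contiguous, sorted block, then walk a query string by binary-searching each block.

```python
from bisect import bisect_left

def build_flat_fsa(words):
    """Flatten a trie into one array of [char, is_final, first_child,
    n_children] nodes; each node's children occupy a contiguous,
    sorted block, so lookups can binary-search within a block."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[""] = True  # end-of-word marker

    nodes = []

    def emit(children):
        keys = sorted(k for k in children if k)  # skip the "" marker
        start = len(nodes)
        nodes.extend([ch, "" in children[ch], 0, 0] for ch in keys)
        for i, ch in enumerate(keys):
            nodes[start + i][2:] = emit(children[ch])  # fill child block info
        return [start, len(keys)]

    first, count = emit(root)
    return nodes, first, count

def lookup(fsa, word):
    nodes, start, n = fsa
    final = False
    for ch in word:
        block = [nodes[i][0] for i in range(start, start + n)]
        j = bisect_left(block, ch)  # binary search the children block
        if j == n or block[j] != ch:
            return False
        _, final, start, n = nodes[start + j]
    return final

fsa = build_flat_fsa(["com", "co.uk", "org"])
print(lookup(fsa, "co.uk"), lookup(fsa, "co"))  # True False
```

A byte-packed version of the same layout (one byte per transition label, fixed-width child offsets) is presumably what keeps the real structure compact.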

This paper is a follow-up to Jan Daciuk’s experiments on space-efficient finite state automata representation that can be used directly for traversals in main memory [4]. We investigate several techniques of reducing the memory footprint of minimal automata, mainly exploiting the fact that transition labels and transition pointer offset values are not evenly distributed and so are suitable for compression. We achieve a size gain of around 20–30% compared to the original representation given in [4]. This result is comparable to the state-of-the-art dictionary compression techniques like the LZ-trie [12] method, but remains memory and CPU efficient during construction.

This work presents an integrated model for active security response. The proposed model introduces an Active Response Mechanism (ARM) for tracing anonymous attacks in the network back to their source. This work is motivated by the increased frequency and sophistication of denial-of-service attacks and by the difficulty in tracing packets with incorrect, or “spoofed”, source addresses. Within the proposed model, the paper presents two tracing approaches based on:
  • Sleepy Watermark Tracing (SWT) for unauthorized access attacks.
  • Probabilistic Packet Marking (PPM) in the network for Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks.

Here we introduce the design of Dapper, Google’s production distributed systems tracing infrastructure, and describe how our design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale system were met. Dapper shares conceptual similarities with other tracing systems, particularly Magpie [3] and X-Trace [12], but certain design choices were made that have been key to its success in our environment, such as the use of sampling and restricting the instrumentation to a rather small number of common libraries.

Not a paper, but a good write up nonetheless.

Some people call it stream processing. Others call it Event Sourcing or CQRS. Some even call it Complex Event Processing. Sometimes, such self-important buzzwords are just smoke and mirrors, invented by companies who want to sell you stuff. But sometimes, they contain a kernel of wisdom which can really help us design better systems. In this talk, we will go in search of the wisdom behind the buzzwords. We will discuss how event streams can help make your application more scalable, more reliable and more maintainable.