A Modern App Developer and An Old-Timer System Developer Walk Into a Bar

Note: Thanks for all the great comments and feedback here and on Hacker News. Please keep them coming. I learned a ton and I am sure others will also.

Happy Valentine’s Day!

A modern app developer and an old-timer system developer walk into a bar. They had a couple of drinks and started talking about the current state of security on the Internet. In a flash of genius, they both decided it would be useful to map the Internet and see which IPs have vulnerable ports open. After some discussion, they decided on the following:

  • They will port scan all of the IPv4 addresses (2^32=4,294,967,296) on a monthly basis
  • They will focus on a total of 20 ports including some well-known ports such as FTP (20, 21), telnet (23), ssh (22), SMTP (25), etc
  • They will use nmap to scan the IPs and ports
  • They need to store the port states as open, closed, filtered, unfiltered, open|filtered and closed|filtered
  • They also need to store whether the host is up or down. One of the following two conditions must be met for a host to be “up”:
    • If the host has any of the ports open, it would be considered up
    • If the host responds to ping, it would be considered up
  • They will store the results so they can post process to generate reports, e.g.,
    • Count the Number of “Up” Hosts
    • Determine the Up/Down State of a Specific Host
    • Determine Which Hosts are “Up” in a Particular /24 Subnet
    • Count the Number of Hosts That Have Each of the Ports Open
    • How Many Total Hosts Were Seen as “Up” in the Past 3 Months?
    • How Many Hosts Changed State This Month (was “up” but now “down”, or was “down” but now “up”)
    • How Many Hosts Were “Down” Last Month But Now It’s “Up”
    • How Many Hosts Were “Up” Last Month But Now It’s “Down”

Modern App Developer vs Old-Timer Developer

Let’s assume 300 million IPs are up, and each has an average of 3 ports open.

  • How would you architect this?
  • Which approach is faster for each of these tasks?
  • Which approach is easier to extend, e.g., add more ports?
  • Which approach is more resource (cpu, memory, disk) intensive?

Disclaimer: I don’t know ElasticSearch all that well, so feel free to correct me on any of the following.

Choose a Language

Modern App Developer:

I will use Python. It’s quick to get started and easy to understand/maintain.

Old-Timer Developer:

I will use Go. It’s fast, performant, and easy to understand/maintain!

Store the Host and Port States

Modern App Developer:

I will use JSON! It’s human-readable, easy to parse (i.e., built in libraries), and everyone knows it!

{
	"ip": "10.1.1.1",
	"state": "up",
	"ports": {
		"20": "closed",
		"21": "closed",
		"22": "open",
		"23": "closed",
		.
		.
		.
	}
}

For each host, I will need approximately 400 bytes to represent the host, the up/down state and the 20 port states.

For 300 million IPs, it will take me about 112GB of space to store all host and port states.

Old-Timer System Developer:

I will use one bit array (memory mapped file) to store the host state, with 1 bit per host. If the bit is 1, then the host is up; if it’s 0, then the host is down.

Given there are 2^32 IPv4 addresses, the bit array will be 2^32 / 8 = 536,870,912 bytes, or 512MB.

I don’t need to store the IP address separately since the IPv4 address will convert into a number, which can then be used to index into the bit array.
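Here is a minimal sketch of that indexing idea, in Python for brevity (the Old-Timer would write it in Go). The helper names `ip_to_index`, `set_up` and `is_up` are made up for illustration, and the demo array covers only a single /24 starting at a hypothetical `BASE` address so it stays tiny; the real array would cover all 2^32 addresses with no base offset.

```python
import ipaddress

def ip_to_index(ip: str) -> int:
    """Convert a dotted-quad IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))

def set_up(bits: bytearray, index: int) -> None:
    """Mark the host at this bit index as up (bit = 1)."""
    bits[index >> 3] |= 1 << (index & 7)

def is_up(bits: bytearray, index: int) -> bool:
    """Return True if the bit for this host index is set."""
    return bool(bits[index >> 3] & (1 << (index & 7)))

# Demo: a 256-bit array covering only 10.1.1.0/24 (assumption, to keep it small).
BASE = ip_to_index("10.1.1.0")
bits = bytearray(256 // 8)
set_up(bits, ip_to_index("10.1.1.1") - BASE)

print(is_up(bits, ip_to_index("10.1.1.1") - BASE))
print(is_up(bits, ip_to_index("10.1.1.2") - BASE))
```

The byte is found with `index >> 3` (divide by 8) and the bit within it with `index & 7`, so no IP address ever has to be stored.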


I will then use a second bit array (memory mapped file) to store the port states. Given there are 6 port states, I will use 3 bits to represent each port state, and 60 bits to represent the 20 port states. I will basically use one uint64 to represent the port states for each host.

For all 4B IPs, I will need approximately 32GB of space to store the port states. Together, it will take me about 33GB of space to store all host and port states.
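For anyone who wants to check the storage math for both approaches, here is the arithmetic as a small Python sketch. The 400 bytes per JSON document is the estimate from above, and sizes are reported in binary gigabytes (2^30 bytes).

```python
HOSTS_UP = 300_000_000        # assumed number of "up" hosts
JSON_BYTES_PER_HOST = 400     # estimated size of one JSON document
ALL_IPV4 = 2**32              # every IPv4 address
GIB = 2**30

json_total = HOSTS_UP * JSON_BYTES_PER_HOST   # one document per up host
host_bits = ALL_IPV4 // 8                     # 1 bit per address
port_words = ALL_IPV4 * 8                     # one 8-byte uint64 per address

print(f"JSON documents:   {json_total / GIB:.1f} GiB")
print(f"Host bit array:   {host_bits / GIB:.1f} GiB")
print(f"Port state array: {port_words / GIB:.1f} GiB")
print(f"Bit arrays total: {(host_bits + port_words) / GIB:.1f} GiB")
```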

I can probably use EWAH bitmap compression to gain some space efficiency, but let’s assume we are not compressing for now. Also if I do EWAH bitmap compression, I may lose out on the ability to do population counting (see below).

Count the Number of “Up” Hosts

Modern App Developer:

This is a big data problem. Let’s use Hadoop!

I will write a map/reduce hadoop job to process all 300 million host JSON results (documents), and count all the IPs that are “up”.


Maybe this is a search problem. Let’s use ElasticSearch!

I will index all 300M JSON documents with ElasticSearch (ES) on the “state” field. Then I can just run a query that counts the results of the search where “state” is “up”.

I do realize there’s additional storage required for the ES index. Let’s assume it’s 1/8 of the original document size. This means there’s possibly another 14GB of index data, bringing the total to 126GB.

Old-Timer System Developer:

This is a bit counting, or popcount(), problem. It’s just simple math. I can iterate through the array of uint64’s (2^32 / 64 = ~67M uint64’s), count the bits for each, and add them up!

I can also split the work by creating multiple goroutines (assuming Go), similar to map/reduce, to gain faster calculation.
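A minimal popcount sketch in Python, assuming the host bit array is available as a bytes-like object. The function name `count_up_hosts` and the chunk size are arbitrary choices of mine. Python has no goroutines, so this version is single-threaded; the chunks are independent, which is what would make the work easy to split.

```python
def count_up_hosts(bits, chunk_size=1 << 16):
    """Count the 1 bits in a bytes-like bit array, one chunk at a time."""
    total = 0
    for start in range(0, len(bits), chunk_size):
        chunk = bits[start:start + chunk_size]
        # Treat the chunk as one big integer and count its set bits.
        total += bin(int.from_bytes(chunk, "little")).count("1")
    return total

# Demo: 1 KB array with 2 bits set in the first byte and 8 in the last.
bits = bytearray(1024)
bits[0] = 0b00000101
bits[1023] = 0xFF
print(count_up_hosts(bits))
```

On Python 3.10 and later, `int.bit_count()` does the same job as `bin(x).count("1")` without building a string.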

Determine the Up/Down State of a Specific Host

Modern App Developer:

I know, this is a search problem. Let’s use ElasticSearch!

I will have ElasticSearch index the “ip” field, in addition to the “state” field from earlier. Then for any IP, I can search for the document where “ip” equals the requested IP. From that document, I can then find the value of the “state”.

Old-Timer System Developer:

This should be easy. I just need to index into the bit array using the integer value of the IPv4, and find out if the bit value is 1 or 0.
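As a sketch of what that lookup might look like against a memory-mapped file, here is a Python version using the standard `mmap` module. The file name, the `base` offset and the tiny 8KB demo file (covering only 10.1.0.0/16) are assumptions to keep the example small; the real file would be 512MB with a base of 0.

```python
import ipaddress
import mmap
import os
import tempfile

def host_is_up(mm, ip, base=0):
    """Look up one host's bit in a memory-mapped bit array."""
    index = int(ipaddress.IPv4Address(ip)) - base
    return bool(mm[index >> 3] & (1 << (index & 7)))

# Demo file: 2^16 bits, covering 10.1.0.0/16 only (assumption).
BASE = int(ipaddress.IPv4Address("10.1.0.0"))
path = os.path.join(tempfile.mkdtemp(), "hosts.bits")
with open(path, "wb") as f:
    f.truncate(2**16 // 8)          # zero-filled file, every host "down"

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)   # map the whole file
    idx = int(ipaddress.IPv4Address("10.1.1.1")) - BASE
    mm[idx >> 3] |= 1 << (idx & 7)  # mark 10.1.1.1 as up
    up = host_is_up(mm, "10.1.1.1", BASE)
    down = host_is_up(mm, "10.1.1.2", BASE)
    mm.close()

print(up, down)
```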

Determine Which Hosts are “Up” in a Particular /24 Subnet

Modern App Developer:

This is similar to searching for a single IP. I will search for documents where IP is in the subnet (using CIDR notation search in ES) AND the “state” is “up”. This will return a list of search results which I can then iterate and retrieve the host IP.

Or

This is a map reduce job that I can write to process the 300 million JSON documents and return all the host IPs that are “up” in that /24 subnet.

Old-Timer System Developer:

This is just another bit iteration problem. I will use the first IP address of the subnet to determine where in the bit array I should start. Then I calculate the number of IPs in that subnet. From there, I just iterate through the bit array and for every bit that’s 1, I convert the index of that bit into an IPv4 address and add to the list of “Up” hosts.
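The subnet walk could be sketched like this in Python. `up_hosts_in_subnet` is a hypothetical helper, and the `base` parameter exists only so the demo can use a 32-byte array for one /24 instead of the full 512MB array.

```python
import ipaddress

def up_hosts_in_subnet(bits, cidr, base=0):
    """List the IPs in a subnet whose bit is set in the host bit array."""
    net = ipaddress.IPv4Network(cidr)
    start = int(net.network_address)        # first IP of the subnet
    found = []
    for addr in range(start, start + net.num_addresses):
        index = addr - base
        if bits[index >> 3] & (1 << (index & 7)):
            # Convert the integer back into dotted-quad form.
            found.append(str(ipaddress.IPv4Address(addr)))
    return found

# Demo: array covers only 10.1.1.0/24; mark .1 and .22 as up.
BASE = int(ipaddress.IPv4Address("10.1.1.0"))
bits = bytearray(256 // 8)
for last_octet in (1, 22):
    bits[last_octet >> 3] |= 1 << (last_octet & 7)

print(up_hosts_in_subnet(bits, "10.1.1.0/24", BASE))
```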

Count the Number of Hosts That Have Each of the Ports Open

For example, the report could simply be:

20: 3,023
21: 3,023
22: 1,203,840
.
.
.

Modern App Developer:

This is a big data problem. I will use Hadoop and write a map/reduce job. The job will return the host count for each of the ports.

This can probably also be done with ElasticSearch. It would require the port states to be indexed, which will increase the index size. I can then count the results of the search for port 22 = “open”, and repeat for each port.

Old-Timer System Developer:

This is a simple counting problem. I will walk through the host state bit array, and for every host that’s up, I will use the bit index to index into the port state uint64 array and get the uint64 that represents all the port states for that host. I will then walk through each of the 3-bit bundles for the ports, and add up the counts if the port is “open”.

Again, this can easily be parallelized by creating multiple goroutines (assuming Go).
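Here is a small Python sketch of that walk. It assumes the 3-bit code for “open” is 1 and that counts are kept per port slot (0 to 19) rather than per port number, since the post doesn’t list all 20 ports; both are my assumptions, as is the name `count_open_ports`.

```python
OPEN = 1            # assumed 3-bit code for the "open" state
BITS_PER_PORT = 3
PORT_MASK = 0b111

def count_open_ports(host_bits, port_words, num_ports=20):
    """For each port slot, count the up hosts that have that port open."""
    counts = [0] * num_ports
    for index, word in enumerate(port_words):
        # Skip hosts whose up/down bit is 0.
        if not (host_bits[index >> 3] & (1 << (index & 7))):
            continue
        # Walk the 3-bit bundles for this host.
        for slot in range(num_ports):
            if ((word >> (slot * BITS_PER_PORT)) & PORT_MASK) == OPEN:
                counts[slot] += 1
    return counts

# Demo: 4 hosts, hosts 0-2 are up, host 3 is down.
host_bits = bytearray([0b00000111])
port_words = [
    OPEN << 6,                # host 0: slot 2 open
    (OPEN << 6) | OPEN,       # host 1: slots 0 and 2 open
    0,                        # host 2: up (responds to ping), nothing open
    0,                        # host 3: down
]
counts = count_open_ports(host_bits, port_words)
print(counts[:4])
```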

How Many Total Hosts Were Seen as “Up” in the Past 3 Months

Modern App Developer:

I can retrieve the “Up” host list for each month, and then go through all 3 lists and dedup into a single list. This would require quite a bit of processing and iteration.

Old-Timer System Developer:

I can perform a simple OR operation on the 3 monthly bit arrays, and then count the number of “1” bits.

Note: I fixed the original AND to OR based on a comment from HN. Not sure what I was thinking when I typed AND…duh!
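A tiny Python sketch of the OR-then-count idea, treating each month’s bit array as one big integer. `seen_up` is a made-up name, and for real 512MB arrays you would want to do this chunk by chunk rather than in one shot.

```python
def seen_up(*months):
    """Count hosts that were up in at least one of the monthly bit arrays."""
    combined = 0
    for month in months:
        combined |= int.from_bytes(month, "little")   # OR the months together
    return bin(combined).count("1")                   # then count the 1 bits

# Demo with 8 hosts per month (bit i = host i).
jan = bytes([0b00000011])   # hosts 0 and 1 up
feb = bytes([0b00000110])   # hosts 1 and 2 up
mar = bytes([0b00000100])   # host 2 up
print(seen_up(jan, feb, mar))
```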

How Many Hosts Changed State This Month (was “up” but now “down”, or was “down” but now “up”)

Modern App Developer:

Hm…I am not sure how to do that easily. I guess I can just iterate through last month’s hosts, and for each host check to see if it changed state this month. Then for each host that I haven’t checked this month, iterate and check that list against last month’s result.

Old-Timer System Developer:

I can perform a simple XOR operation on the bit arrays from this and last month. Then count the number of “1” bits of the resulting bit array.

How Many Hosts Were “Up” Last Month But Now It’s “Down”

Modern App Developer:

I can retrieve the “Up” hosts from last month from ES, then for each “Up” host, search for it with the state equal to “Down” this month, and accumulate the results.

Old-Timer System Developer:

I can perform this operation: (this_month XOR last_month) AND last_month. This will return a bit array that has the bit set if the host was “up” last month but now it’s “down”. Then count the number of “1” bits of the resulting bit array.

How Many Hosts Were “Down” Last Month But Now It’s “Up”

Modern App Developer:

I can retrieve the “Down” hosts from last month from ES, then for each “Down” host, search for it with the state equal to “Up” this month, and accumulate the results.

Old-Timer System Developer:

I can perform this operation: (this_month XOR last_month) AND this_month. The XOR marks every host that changed state, and the AND keeps only those that are up this month, so the result has the bit set if the host was “down” last month but now it’s “up”. Then count the number of “1” bits of the resulting bit array.
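The month-over-month comparisons can be sketched in a few lines of Python, using plain integers as stand-in bit arrays (bit i = host i). The 4-host sample values are invented for illustration. The “down then up” mask here is computed as (this_month XOR last_month) AND this_month.

```python
def popcount(x):
    """Count the 1 bits in a non-negative integer."""
    return bin(x).count("1")

# Hypothetical 4-host example.
last_month = 0b1100   # hosts 3 and 2 were up
this_month = 0b1010   # hosts 3 and 1 are up

changed = this_month ^ last_month        # any host whose state flipped
up_then_down = changed & last_month      # flipped, and was up last month
down_then_up = changed & this_month      # flipped, and is up this month

print(popcount(changed), popcount(up_then_down), popcount(down_then_up))
```

Every changed host lands in exactly one of the two masks, so the two counts always add up to the total number of changed hosts.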

2016: Analyzing Security Trends Using RSA Exhibitor Descriptions

The data used for this post is available here. A word of warning: I only have a complete data set for 2014-2016. For 2008-2013, I have what I consider to be representative samples. So please take the result set with a big bucket of salt.


Continuing my analysis from last year, this post analyzes the exhibitors’ descriptions from the annual security conference, RSA 2016. Intuitively, the vendor marketing messages should have a high degree of correlation to what customers care about, even if the messages trail the actual pain points slightly.

Some interesting findings:

  • The word hunt has appeared for the first time since 2008. It only appeared 6 times and ranked pretty low, but that’s a first nonetheless.
  • The word iot jumped 881 spots to 192 in 2016, after showing up for the first time in 2015. This may indicate the strong interest in IoT security.
  • There’s no mention of docker in any of the years, and only 5 mentions of container in 2016. This is a bit of a surprise given the noise docker/container is making. This is likely because most customers care more about management than security when it comes to new technologies.
  • The word firewall dropped 151 spots in 2016. I want to speculate that it is due to the dissolving of perimeters, but I can’t be sure of that.
  • As much noise as blockchain is making, there’s no mention of the word.
  • The word behavior (as in behavioral analysis) has also gained drastically over the past few years, going from #370 in 2012 to #78 in 2016.
  • See other findings below.

Top Words

The top words that vendors use to describe themselves haven’t changed much. The following table shows the top 10 words used in RSA conference exhibitor descriptions since 2008. You can find the complete word list here.

#   2008          2009          2010          2011          2012          2013          2014          2015          2016
1   secure        secure        secure        secure        secure        secure        secure        secure        secure
2   solution      solution      solution      solution      solution      solution      solution      solution      solution
3   network       manage        manage        network       provide       provide       provide       provide       provide
4   provide       provide       network       provide       manage        manage        network       data          data
5   manage        data          protect       manage        network       service       manage        network       threat
6   enterprise    network       provide       data          information   more          data          protect       network
7   data          company       data          information   software      software      protect       threat        protect
8   product       service       organization  enterprise    enterprise    information   threat        manage        manage
9   technology    software      information   technology    data          enterprise    service       service       enterprise
10  application   busy          risk          product       more          customer      enterprise    enterprise    service

Here’s a word cloud that shows the 2016 top words. You can also find word clouds for 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015.

Endpoint vs. Network

While the word network has mostly maintained its top 10 position (except 2013 when it fell to #11), the big gainer is the word endpoint, which improved drastically from #266 in 2012 to 2016’s #50. This may indicate that enterprises are much more accepting of endpoint technologies.

I also speculate that there might be a correlation between the increase in cloud and the increase in endpoint. As the perimeters get dissolved due to the move to cloud, it’s much more difficult to use network security technologies. So enterprises are looking at endpoint technologies to secure their critical assets.

Compliance vs. Threat

Not surprisingly, the use of the word compliance continues to go down, and the word threat continues to go up.

The number of mentions for threat intelligence remained at 22 for both 2015 and 2016, after jumping from 12 in 2014.

Mobile, Cloud, Virtual and IoT

While the words mobile and cloud maintained their relative positioning in 2016, we can also see virtual continues its slight downward trend.

Interestingly, the word iot made a big jump, going from position #1073 in 2015 to #193 in 2016. This potentially indicates a strong interest in security for the Internet of Things. In general, the IoT space has seen some major activity, including Cisco’s recent acquisition of Jasper.

Cyber, Malware and Phishing

The word cyber has continued to gain popularity over the past 4 years; however, the word malware has fallen out of the top 100, a position it had maintained since 2010.

The word phishing has made drastic gains since 2014, jumping from #807 to #193 in 2016. This may indicate that enterprises are seeing more attacks from phishing, and vendors are targeting that specific attack vector.

It’s all about Behavior!

The word behavior (as in behavioral analysis) has also gained drastically over the past few years, going from #370 in 2012 to #78 in 2016.

Credits

  • The word clouds and word rankings are generated using Word Cloud.
  • The actual vendor descriptions are gathered from the RSA web site as well as press releases from Business Wire and others.
  • Charts are generated using Excel, which continues to be one of the best friends for data analysts (not that I consider myself one).

Installing Windows 7 on Macbook Late 2008

Over the weekend I wanted to install Windows in a bootcamp partition so the kids can use it to do their Chinese homework. The Chinese homework CD unfortunately only works in Windows so I had no choice!! I guess I could have taken other routes, like installing Windows in a VM or something, but I figure that Mac has this awesome tool called bootcamp, why not use that?

Well, how wrong I was! I went through a whole day of head-scratching, temper-inducing, word-cussing, USB-swapping and machine-rebooting to get Windows installed in the bootcamp partition. I almost went as far as buying a replacement superdrive for the macbook, but in the end I was finally able to get Windows 7 onto the Macbook.

To start, my laptop is a Macbook, Aluminum, Late 2008 (MB467LL/A) with a busted optical drive (superdrive). I originally had Mavericks running on it but before this exercise I wiped it clean and installed Yosemite on it. Because the optical drive is busted, I cannot use the Windows 7 DVD, so I had to do this using a USB flash drive.

Below are the steps I took to make this work. I can’t guarantee that these steps will work for you, but it’s probably good as a reference. Having seen a ton of articles on the problems people had with bootcamp, I hope no one has to go through the troubles I went through.

  1. It took me a while to figure this out (after reading numerous online posts): if your Mac has an optical drive, Boot Camp Assistant will NOT create a USB flash drive-based install disk. The only way to trick the system into doing that is the following. (Though it turned out in the end that this step is quite useless, since the USB install disk created by Boot Camp Assistant couldn’t boot! So you could really skip this step.)
    1. Modify Boot Camp Assistant’s Info.plist as described here.
    2. After the modification, you need to re-sign Boot Camp Assistant, or else it will keep crashing. To do that, follow the instructions here. For the impatient, run the command sudo codesign -fs - /Applications/Utilities/Boot\ Camp\ Assistant.app.
  2. Start “Boot Camp Assistant”, and select the options “Download the latest Windows Support”, and “Install Windows 7 or later versions”.
    • Note I am not selecting the option to create a Windows install disk. It turned out the USB install disk didn’t boot. I kept getting the “non-system disk, press any key to continue” error, and basically that was the end.
    • In any case, these two tasks should download the bootcamp drivers onto a USB drive, and also partition the Mac’s HD into two partitions. One of the partitions is the BOOTCAMP partition, which will be used to install Windows 7.
  3. Once that’s done, I needed to create a bootable Windows 7 USB Flash drive.
    • If you search the web, you will find that most people run into two problems. The first is the bootcamp-created flash drive giving the “non-system disk” error, and the second is the boot up hanging with a blank screen and a flashing underscore cursor at the top left corner. I ran into both. You will also find some articles that explain how to make the flash drives bootable using fdisk, but that didn’t work for me either.
    • Finally I found a post online that pointed to the Windows USB/DVD Download Tool. It’s a Windows program that can create a bootable USB flash drive from a Windows 7 or 8 ISO file.
    • Note though, not all the USB flash drives are created equal. The PNY 16GB drive I used didn’t work. WUDT ended with an error that says it couldn’t run bootsect to create the boot sectors on the flash drive. The one that worked for me was Kingston Data Traveler 4GB.
  4. Now that I have the bootable USB flash drive, I plugged that into the Mac and started it up. This time the installation process got started.
  5. When Boot Camp Assistant created the BOOTCAMP partition, it did not format it to NTFS. So the first thing I noticed was that when I select the BOOTCAMP partition, the installer said it cannot be used because it’s not NTFS.
    • The option to format the partition is not immediately obvious, but I had to click on “Drive options (advanced)” and select the option to format the partition.
    • Once that was done, I encountered another error that said the drive may not be bootable and I needed to change the BIOS setting. Yeah, at this point I was pretty ticked and the computer heard a few choice words from me. No matter what I did, it wouldn’t let me get past this point.
    • I did a bunch more reading and research, but nothing seemed to work. I finally decided to turn the computer off and come back to it. Magically it worked the second time I tried to install it. I was no longer getting the non-bootable disk error. My guess is that after the NTFS formatting, the installer needs to be completely restarted.
  6. In any case, at this point, it was fairly smooth sailing. The installation process took a bit of time but overall everything seemed to have worked.
  7. After the installation, I plugged in the bootcamp flash drive with the WindowsSupport files, and installed them.

I am still not at 100% yet. The trackpad still doesn’t behave like it does on the Mac. For example, I can’t use the two finger drag to scroll the windows, and for the life of me, I cannot figure out how to easily (and correctly) set the brightness of the display. But at least now I have a working Windows 7 laptop!
