Zen 3.1 - Product. Data. Code

Log Management Requirements for MSPs

Mon, Nov 29, 2004

I spent five years at one of the largest MSSPs as an architect and development manager. We had a couple thousand firewall, VPN, NIDS and HIDS devices that we manage for various hosting and managed service customers. We needed to aggregate all the logs generated by these devices and be able to provide reports and analysis for our customers.

We spent quite a bit of time looking at various COTS products and services, including ArcSight, Intellitactics, netForensics, and others. However, none met our requirements.

At the end, we built our own solution using open source tools such as MySQL, GD::Graph, RRD Tool, Cricket, etc.

Below are the top ten requirements that any MSP should consider when building their log management solution.

1. Segregation of Logs by Customer

As an MSSP, one of the biggest concern we had was the segregation of logs for the many customers we had. We didn’t want any of the customer data to be mixed in the same files or database tables as other customers. This requirement drove many of the design decisions we made during the building of the support infrastructure.

Imagine two competitors had their firewalls managed by the MSP and their data were mixed in the same files, what happens if the data of one competitor were shown to the other because of a bug in the software that’s used to filter logs?

2. Raw Log Retention Is Required

One of the requirements we had was that we needed to provide the raw logs to our customers so that they can do their own analysis. We can do all the necessary analysis on our side, but sometimes the customers may want other information from the logs that we don’t. Also, customers sometime have information that are proprietary and they can use that information to correlate with the logs.

Remember, MSPs built everything based on scale of economy and sometimes it’s difficult to customize the solution for individual customers. Obviously if the customer wanted to pay for professional services, there’s customization we can do. But not every customer wants to or have the budget to do that.

3. Reporting is #1

Customers wanted to reports on their security infrastructure. They wanted to see how much traffic (bytes, connections, etc) the firewalls are passing so they can properly plan for the future. They wanted to see how often users are VPN’ing into the infrastructure so they can identify any mis-use. They want to see trend reports to show how their infrastructure is holding up. They wanted to see correlated reports of the various devices (firewalls, VPN, IDS, etc) to see if there are any anomalies.

Our #1 priority was to provide all these reports and more to our customers.

4. Then Comes Real-Time Analysis/Alerting

Alerting is another important feature that our customers wanted. They wanted to know when something really bad is happening to their infrastructure. They don’t want to get alerted every time some script kiddie scans their network, but they do want to know if their network all the sudden has a huge increase in traffic and continues to increase for a long period of time. They want to know when an unauthorized connection is made to their production database. They want to know when a successful attack has been happened.

However, the MSPs are the first to receive these alerts and will filter the alerts based on the internal knowledge and SLA. Then the MSP will pass the alerts onto the customer if they are determined to be real.

5. Support All My Log Sources

Most MSPs support an array of devices and applications as part of their service. For examples, most MSPs will support firewalls such as PIX, Netscreen, Check Point or IP Tables, Network IDSes such as ISS or Cisco, Host-based IDS such as ISS, Okena, or Trip Wire. Some MSPs will support SSL accelerators or proxies, vulnerability scanning tools, servers, and many other applications.

At Cable & Wireless, we had firewalls, Network or Host based IDSes, scanning tools and authentication services. We needed a solution that will support all these different devices or applications. Most of the logs are sent via syslog except for Check Point firewall, Cisco IDS and Nessus scanning results. The Check Point logs were aggregated at the Provider-1 boxes and they need to be off loaded using LEA. The Cisco IDS alerts need to be retrieved via RDEP. The Nessus scan results were XML files and they needed to be parsed.

6. Web-based GUI Only

We needed a web-based GUI so that the customers can access reports, alerts, raw logs and device policies. It’s difficult to support any GUI that requires installation on the customer’s desktop. Non-web applications will eventually have conflicts with other applications and will require support from the MSP. That model is not scalable.

We also needed to embed this interface in our own web based interface. It is much easier to embed another web interface than an application. It is also much easier to skin a web interface as we wanted to have our corporate color scheme and logos there.

7. Distributed Collection Points

Cable & Wireless had over 40 data centers all over the world. Most MSPs probably have distributed environments as well. The devices that we managed were spread all over the data centers. They can be in UK or NY or SF or even HK or Japan. We needed a solution that can collect logs in all these different locations. We needed to be able to collect logs close to the log source so that the chance of dropped logs is minimized. We also wanted to keep the cost of these remote collectors relatively low comparing to the central archive.

8. Secure Connections All The Way

As a security organization, everything we did that can be secured must be secured. Unfortunately we cannot secure protocols such as syslog from a PIX, but we must support SSL for the web interface, Cisco RDEP, Check Point LEA, and encryption between remote collectors and the central storage. We also needed to secure the connection between the database and other components if they are in different networks.

9. No Agents Whatsoever

Due to the risk (performance, application conflicts, etc) of installing additional software (custom agents) on servers, we needed a solution that doesn’t require agents. This requirement is generally not a problem for most devices since they send logs via syslog. However, it’s a big issue for Windows based operating systems. There are several methods that one can collect Windows events, as I have written previously. However, all these methods have their problems such as requirements of agents or performance issues or none-real-time logging. Since we didn’t manage many Windows devices, this was not our major concern.

10. Open API for Integration

Another requirement was that we needed to run reports or search logs via scripts or web applications. We ran many background processes that sent out reports to customers or create custom reports based on some other condition. All these needed to be automated due to the need for scale of economy. This required that the solution to provide some type of open API that can be used within other programs.

11. Granular Permission Model

Since we needed to provide alerts, reports and raw logs to customers, we need to make sure customers have access to ONLY their own information. It was critical that customers don’t see other customers’ data. We needed the solution to have a granular permission model that can determine which devices belong to which customer, which reports the customers/users can see, which log files can be viewed by customers, etc.

These are some of the requirements for an MSP. I hope this will help you evaluate and build your own log management infrastructure.