Logging: getting the right balance

Logging: how much is too much? How little is too little? How long should logs be kept? This post discusses factors to consider when configuring logging.

Access logs, authentication logs, application logs, service logs - there are logs for all sorts of things in computing.  A record of "things that happened", and when they happened, can be very useful for troubleshooting and incident response, but how do you know what to log and at what level of detail?

This is a complicated question and there's no globally correct answer.  What's appropriate in your use case may not match mine, and vice versa, so, as always, context is key here.  In this post I'll discuss some considerations when configuring systems to log.

Why log anyway?

I've already alluded to two reasons, troubleshooting and incident response, but there may also be regulatory requirements that apply to your industry or organisation.  If so, those requirements will usually also set minimum (and possibly maximum) retention periods (retention is covered later in this post).

When it comes to troubleshooting, logs can be a fantastic resource.  Earlier today I was working on a new feature for eVitabu and there were definite problems.  As this excerpt from my web server logs shows, the server is returning an HTTP 500 (internal server error), so my new code has broken the web application in some way:

[04/May/2019:11:16:44 +0000] "GET /api/announcement HTTP/1.1" 500 193 "-" "eVitabu 1.3.4a"

(Incidentally, that excerpt includes the new custom eVitabu user agent string.  Coming soon :) ).
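Spotting a 500 by eye is easy in a single line, but in a busy log it helps to let a script do the filtering.  The sketch below is a minimal Python example, assuming lines look like the excerpt above (the common "combined" access log layout, shown here without the leading client fields).  The regular expression and the `server_errors` name are my own illustration rather than part of any tool.

```python
import re
from collections import Counter

# Matches the part of a combined-format access log line shown in the excerpt:
#   [time] "METHOD path PROTOCOL" status size "referer" "user agent"
# re.search() skips anything before the '[', such as client IP/ident/user fields.
LOG_PATTERN = re.compile(
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def server_errors(lines):
    """Count HTTP 5xx responses per request path."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and match.group("status").startswith("5"):
            counts[match.group("path")] += 1
    return counts


sample = [
    '[04/May/2019:11:16:44 +0000] "GET /api/announcement HTTP/1.1" 500 193 "-" "eVitabu 1.3.4a"',
]
print(server_errors(sample))  # Counter({'/api/announcement': 1})
```

Grouping by path makes it obvious whether the errors are confined to the new endpoint or spread across the whole application.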

Logs can also be used to point out a software configuration issue, as shown here in this Nginx log snippet:

2019/05/04 07:27:39 [crit] 21553#21553: *147985 SSL_do_handshake() failed (SSL: error:14209102:SSL routines:tls_early_post_process_client_hello:unsupported protocol) while SSL handshaking

In this particular case the problem isn't on my side: my web server is deliberately configured to offer only the protocol versions I deem appropriate, and the client asked for one it doesn't support.  I suspect someone was attempting a downgrade attack.
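If you want to know how often this happens, and for what reasons, the OpenSSL reason text can be pulled out of the Nginx error log.  This is a rough Python sketch that assumes the message layout shown in the excerpt; the exact format varies between Nginx and OpenSSL versions (OpenSSL 3, for example, leaves the function name empty), so treat the pattern and the `handshake_failure_reasons` helper as a starting point rather than a finished tool.

```python
import re
from collections import Counter

# Captures the final, human-readable OpenSSL reason (e.g. "unsupported protocol")
# from Nginx "SSL_do_handshake() failed" messages.  Assumes the
# "error:<hex code>:<library>:<function>:<reason>" layout seen in the excerpt;
# an empty function field (as in OpenSSL 3) is also accepted.
HANDSHAKE_FAILURE = re.compile(
    r"SSL_do_handshake\(\) failed \(SSL: error:[0-9A-Fa-f]+:[^:]*:[^:]*:(?P<reason>[^)]*)\)"
)


def handshake_failure_reasons(lines):
    """Count failed TLS handshakes in Nginx error log lines, grouped by reason."""
    reasons = Counter()
    for line in lines:
        match = HANDSHAKE_FAILURE.search(line)
        if match:
            reasons[match.group("reason")] += 1
    return reasons
```

A sudden rise in "unsupported protocol" failures from one source is worth a closer look; a steady trickle is usually just scanners probing for old protocol versions.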

Incident response is the process of containing an incident in the most effective manner.  After containment has been achieved (for example stopping ransomware progressing further) the logs can also be examined to determine how an incident took place.  Incident response and digital forensics are often linked.

Importantly for incident response, logs are often written in near real time, so they can be used to track an incident as it unfolds.  A client suffered a ransomware attack some years ago and it was the file server logs that allowed me to determine which device the ransomware was running on.  As a result we contained the incident (by shutting down the machine) and could then start our cleanup.
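The reasoning behind that kind of investigation can be sketched in a few lines: ransomware encrypts files rapidly, so the device responsible tends to stand out as the source of an unusual burst of writes and renames.  The event tuples below are a hypothetical, simplified format of my own invention, and `busiest_writers` is a made-up name; real file server audit logs look quite different and would need parsing first.

```python
from collections import Counter
from datetime import timedelta

# Hypothetical, simplified file server audit events:
#   (timestamp: datetime, client: str, action: str, path: str)


def busiest_writers(events, window=timedelta(minutes=10), actions=("write", "rename")):
    """Rank clients by file modifications in the `window` before the latest event.

    Returns a list of (client, count) pairs, most active first.
    """
    if not events:
        return []
    latest = max(ts for ts, _, _, _ in events)
    counts = Counter(
        client
        for ts, client, action, _ in events
        if action in actions and latest - ts <= window
    )
    return counts.most_common()
```

The top entry is a lead to investigate, not proof: a backup job or a bulk copy can produce a similar burst of activity.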

Another possibility is that your logs will be used as input to another system.  PSLogonFailures uses the Windows Security log to actively block attempts to brute force a remote desktop server.  Fail2Ban is another fantastic tool that can protect many services and be heavily customised; it also relies on logs to provide that protection.
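Both tools work on broadly the same principle: count failures per source address and block a source once it exceeds a threshold within a time window.  Fail2Ban exposes this through its `maxretry` and `findtime` settings.  The class below is a simplified Python illustration of that idea, not code taken from either tool, and `FailureTracker` is a made-up name.

```python
from collections import defaultdict, deque


class FailureTracker:
    """Flag a source once it has `max_retry` failures within `find_time` seconds.

    Loosely modelled on the maxretry/findtime idea used by Fail2Ban.
    """

    def __init__(self, max_retry=5, find_time=600):
        self.max_retry = max_retry
        self.find_time = find_time
        # Per-source queue of failure timestamps (seconds), oldest first.
        self._failures = defaultdict(deque)

    def record_failure(self, source, timestamp):
        """Record a failed logon; return True if `source` should now be blocked."""
        window = self._failures[source]
        window.append(timestamp)
        # Drop failures that have fallen outside the time window.
        while window and timestamp - window[0] > self.find_time:
            window.popleft()
        return len(window) >= self.max_retry
```

Acting on the result (adding a firewall rule, for instance, and removing it after a ban period) is left out here; that is where the real tools do most of their work.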

If you don't have any of these requirements (or any others) then you could, potentially, disable logs altogether.

Space

The biggest problem with logs is that they take up space, lots of it.  You could argue that space is cheap these days, and that's true in the home market: a single 4TB SATA drive costs as little as £90 and provides space for a lot of logs, even for a small to medium sized business.  When considering business IT requirements, though, it's important to realise that a single disk is rarely used on its own: data is stored on RAID [1] arrays made up of multiple disks.  Add to that the fact that enterprise storage is of higher quality (and higher price) than your average home grade product, and the cost of storing your logs starts to reach hundreds of pounds.

As an example, one of our former web filters logged to a PostgreSQL database.  The storage for those logs, and the indexes that made them searchable, came to about 600GB over eight months.  That would have been cheap on a single disk at home, but in the SAN [2] it occupied multiple disks (for redundancy & speed).  Remember those logs are also backed up (more disks), and that space can't be used for anything else, so there are resourcing implications too.
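The arithmetic is worth doing for your own environment.  The sketch below uses the 600GB over eight months figure from above (75GB a month); the RAID overhead and number of backup copies are assumptions for illustration only (mirroring, for instance, doubles the raw capacity needed), so substitute your own values.

```python
def raw_storage_gb(log_gb, raid_overhead, backup_copies):
    """Estimate the raw disk consumed by a given volume of logs.

    raid_overhead: raw capacity needed per usable GB (assumed 2.0 for mirroring).
    backup_copies: full copies held on backup storage (assumed uncompressed).
    """
    return log_gb * raid_overhead + log_gb * backup_copies


monthly_gb = 600 / 8  # web filter example: 600GB over eight months
print(monthly_gb)  # 75.0
print(raw_storage_gb(600, raid_overhead=2.0, backup_copies=1))  # 1800.0
```

Under those assumptions, 600GB of logs ties up roughly three times that much physical disk.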

In situations where the level of logging is configurable, it's sensible to use verbose or debug level logging only when looking into a particular problem.  These log levels generate a lot more data, so they should be avoided for day-to-day use.
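In your own applications this is easy to arrange.  Here is a minimal sketch using Python's standard `logging` module, where INFO is the day-to-day default and DEBUG is only switched on while investigating.  The `APP_LOG_LEVEL` environment variable and the `myapp.api` logger name are hypothetical names chosen for this example.

```python
import logging
import os

# Levels we accept; anything unrecognised falls back to INFO.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure_logging():
    """Set the root log level from the (hypothetical) APP_LOG_LEVEL variable."""
    level = LEVELS.get(os.environ.get("APP_LOG_LEVEL", "INFO").upper(), logging.INFO)
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,
    )
    return level


if __name__ == "__main__":
    configure_logging()
    log = logging.getLogger("myapp.api")  # hypothetical logger name
    log.debug("Full request details")  # only written when APP_LOG_LEVEL=DEBUG
    log.info("Announcement list served")
```

Because the level comes from the environment, debug logging can be enabled for a single run and dropped again without a code change.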

Do you look at them?

Have you heard the phrase "if a tree falls in a forest and no-one is around to hear it, does it still make a sound?"  The same question can be asked of logs: if they show a major data breach but no-one looks at them, do you know about the breach?  The answer, clearly, is no.

If you're not going to look at your logs there isn't much point in having them in the first place.

Retention

Compliance rules will often dictate retention periods, for example mandating that you can investigate incidents from six months ago.  If you need to refer to older entries, it's worth noting that logs can be archived to cheaper storage.  Note, however, that having the logs "in an older backup" isn't the same as archive storage!

Setting up a retention schedule prevents your logs hogging all your disk space, as older entries are deleted from disk.  Given that some logs contain personally identifiable information, not keeping logs for longer than necessary also helps protect that data from accidental breach, and reduces the amount of data (and therefore the cost) involved in responding to a subject access request.

Not every system automatically purges old logs (consider Microsoft's IIS web server, which seems to store logs indefinitely in its default configuration) so this should be checked.  Tools like logrotate on Linux help with retention.
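logrotate is the right tool on most Linux systems, but where nothing equivalent is available the idea is simple enough to sketch.  This Python example assumes logs are plain `*.log` files in a single directory: anything older than 30 days is compressed into an archive directory (standing in for cheaper storage), and archives older than 180 days are deleted.  Both periods are placeholders, `apply_retention` is a hypothetical helper, and your compliance requirements should set the real values.

```python
import gzip
import os
import shutil
import time
from pathlib import Path

DAY = 24 * 60 * 60  # seconds in a day


def apply_retention(log_dir, archive_dir, archive_after_days=30,
                    delete_after_days=180, now=None):
    """Archive, then eventually delete, old log files (ages use file mtime).

    - *.log files in log_dir older than archive_after_days are gzip-compressed
      into archive_dir and the originals removed.
    - *.gz files in archive_dir older than delete_after_days are deleted.
    Returns (archived_paths, deleted_paths).
    """
    now = time.time() if now is None else now
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    archived, deleted = [], []

    for log_file in Path(log_dir).glob("*.log"):
        mtime = log_file.stat().st_mtime
        if now - mtime > archive_after_days * DAY:
            target = archive / (log_file.name + ".gz")
            with log_file.open("rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            # Keep the original timestamp so the archive ages from when the log was written.
            os.utime(target, (mtime, mtime))
            log_file.unlink()
            archived.append(target)

    for old_archive in archive.glob("*.gz"):
        if now - old_archive.stat().st_mtime > delete_after_days * DAY:
            old_archive.unlink()
            deleted.append(old_archive)

    return archived, deleted


# Example (hypothetical paths), run daily from a scheduler:
# apply_retention("/var/log/myapp", "/mnt/archive/myapp")
```

A log that is already past the deletion age is archived and then removed in the same run, which keeps the behaviour simple at the cost of a little wasted compression.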

Securing logs

Logs should be considered an important tool in the cybersecurity arsenal and protected accordingly.  If a malicious actor can change (or delete) your logs, they become of little use.  Logs can often be sent "off box" to another device with additional security controls (perhaps even an audit log ... ), isolating them from more vulnerable environments.
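For your own applications, Python's standard library can send log records off box to a central syslog server.  This is a minimal sketch, assuming a collector is listening on UDP port 514; `make_offbox_logger` and the host name in the usage comment are hypothetical.  Plain UDP syslog is neither encrypted nor guaranteed to arrive, so in practice you may prefer TCP (`SysLogHandler` accepts `socktype=socket.SOCK_STREAM`) or a local relay that forwards over TLS.

```python
import logging
import logging.handlers


def make_offbox_logger(host, port=514, name="offbox"):
    """Return a logger that sends its records to a remote syslog collector over UDP.

    Calling this twice with the same name adds a second handler, so call it once.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Example (hypothetical collector):
# log = make_offbox_logger("logs.example.internal")
# log.warning("Failed logon for admin")
```

Once the records have left the machine, an attacker who compromises it can no longer quietly edit the copies held on the collector.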

Once logs have been archived they still need appropriate protection, otherwise it's easy to hide a historic incident from detection.

Conclusions

There's no easy answer when it comes to logs; your context is important.  Start by working out which of your systems can log, and then determine what reasons you would have for reviewing those logs.  If no reasons come to mind immediately, it's worth thinking long and hard before you disable logging altogether.

Whatever logging you choose, I can almost guarantee you'll not have logged something you need in the event of an investigation.  Once everything has calmed down, evaluate what additional logging would have been beneficial and consider adding it.  There will always be an edge case, so don't try to log everything unless you've got unlimited storage and time.


Banner image: an extract from the logs for this blog.

[1] RAID - Redundant Array of Independent Disks, a method of protecting against disk failure.  It does not replace a backup.

[2] SAN - Storage Area Network, a storage array often found in business and enterprise environments.