A smarter way to read log files

By | August 16, 2006

All system administrators and database maintainers know that reading log files is one of the most tedious, yet most crucial, tasks: having to go through all the logs to see who got DHCP leases, which user account has been (ab)used, etc…

I know that for me it started to become a pain in the ass, especially since the damn computer I was reading logs from was a hobby project. So I started fooling around with simple ideas on how to process all of this information automatically. And yes, I know that there are a lot of tools out there that can do this, but they either cost a shitload of money or they can only do one specific task.

I wanted something that could read log files from DHCP, system logs, web servers and various other applications.

Since my server is Windows-based, I decided to write something small in VB.Net. And again, I know that most servers run Linux and so on. But I don’t care; after all, how many desktops run it? And this tool was meant to run from a desktop.

So what do you do first when you find yourself in my situation? Well, I was kinda familiar with the format of the logs, which helps, trust me on this! So the first thing I did was make a list of what information could be found and why it could be useful. For web servers this would be any type of failure and why, how and when it occurred. For this post I will focus a bit on Apache, but it’s easy to implement something similar for every log file you can imagine.

If you know the structure of the log file, then you can write an easy parser for it. Which sounds somewhat easier than it is. I love using regular expressions for this because of their power. So for a typical Apache log file this would look something like this:

Date regex: [0-9]{2}/\w*/[0-9]{4}(:[0-9]{2}){3}\s(\+|-)[0-9]{4}

State:  "((GET)|(HEAD))\s/[\w/._0-9,-]*\.\w*\s(HTTP)/[0-9]\.[0-9]"

Status: \s[0-9]{3}\s

As you might have guessed, this part takes some time, or if you’re anything like me, trial and error. Note that these three regular expressions only fetch the date, state and status (unfortunately accompanied by some noise). For testing the regular expressions I often turn to the RegexLib website with a snippet from an actual log file.
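To show these patterns in action, here is a minimal Python sketch (the original tool was VB.Net, but the regex logic carries over unchanged); the log line is a made-up example in Apache’s common log format:

```python
import re

# A hypothetical entry in Apache "common" log format.
line = '127.0.0.1 - - [16/Aug/2006:12:34:56 +0200] "GET /index.html HTTP/1.1" 404 209'

# The three patterns from above, with the backslashes that escape
# whitespace (\s) and word characters (\w) intact.
date_re   = re.compile(r'[0-9]{2}/\w*/[0-9]{4}(:[0-9]{2}){3}\s(\+|-)[0-9]{4}')
state_re  = re.compile(r'"((GET)|(HEAD))\s/[\w/._0-9,-]*\.\w*\s(HTTP)/[0-9]\.[0-9]"')
status_re = re.compile(r'\s[0-9]{3}\s')

date   = date_re.search(line).group()
state  = state_re.search(line).group().strip('"')  # drop the surrounding quotes
status = status_re.search(line).group().strip()    # drop the surrounding spaces

print(date)    # 16/Aug/2006:12:34:56 +0200
print(state)   # GET /index.html HTTP/1.1
print(status)  # 404
```

The `.strip()` calls deal with the “noise” mentioned above: the status pattern, for instance, deliberately matches the surrounding spaces so it doesn’t grab three digits out of the IP address.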

Now that you have all this raw information, it’s simply a matter of formatting it in a way that pleases you. For me this was an HTML e-mail format. Easy and nice, as you can have the tool e-mail the results of the logs to you, or at least a summary of sorts.
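As an illustration of that formatting step, here is a hypothetical Python sketch that turns a handful of status counts into an HTML table ready to drop into an e-mail body (the counts and the markup are made up for the example):

```python
# Hypothetical summary data pulled out of the logs earlier.
summary = {"200": 845, "404": 12, "500": 3}

rows = "".join(
    f"<tr><td>{status}</td><td>{count}</td></tr>"
    for status, count in sorted(summary.items())
)
html = (
    "<html><body>"
    "<h2>Daily log summary</h2>"
    "<table border='1'><tr><th>Status</th><th>Count</th></tr>"
    f"{rows}"
    "</table></body></html>"
)

# Handing `html` to smtplib/MIMEText would do the actual mailing;
# that part is omitted so the sketch stays self-contained.
print(html)
```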

Why did I not just pick a standard package like Webalizer for this? Easy: I wanted it to send specific types of information, like which pages cause the most problems, how often bad requests are made and how many visits there are on average.
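Those summary numbers are easy to compute once the parser has run. A hedged Python sketch using `collections.Counter`, with made-up (page, status) pairs standing in for real parsed log entries:

```python
from collections import Counter

# Hypothetical (page, status) pairs extracted from one day of logs.
hits = [
    ("/index.html", "200"), ("/missing.gif", "404"),
    ("/missing.gif", "404"), ("/cgi-bin/app", "500"),
    ("/index.html", "200"), ("/index.html", "200"),
]

# Count only the requests that went wrong.
errors = Counter(page for page, status in hits if status != "200")
bad_requests = sum(errors.values())

print(errors.most_common(1))  # the page causing the most problems
print(bad_requests)           # how many bad requests in total
```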

Would I suggest that you all write your own tools for analyzing massive amounts of logs and sending yourself a summarized version? Yes I would! It does require some time and you need to learn a programming language, but it far outweighs having to sift through all the log files day in, day out.

