Building internal IT systems is as much about the data that passes through your application as it is about the logic that operates on it. I’ve scanned web-server log files for years, often using a combination of grep, sed, and other command-line tools to dig into specific problems. These are usually one-off curiosities that don’t require a GUI.
Since before I took over management of this team in July, our app has had some long-running cases where ERROR-level items appear in our logs. For some, I question whether they should be logged as errors at all; others are genuinely disconcerting. The application works well overall, but in the spirit of No Broken Windows I’d like to clean up some of the sources of these errors.
Here’s a little snippet of the JBoss log file, showing some standard log output from Hibernate:
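The original snippet isn’t reproduced here, but JBoss-style Hibernate output looks roughly like this (hypothetical classnames and messages, shown only to illustrate the format the commands below assume):

```
2014-01-02 10:00:01,412 INFO  [org.hibernate.cfg.Configuration] configured SessionFactory
2014-01-02 10:00:02,089 ERROR [com.example.FooService] unexpected null customer reference
2014-01-02 10:00:03,557 ERROR [com.example.BarDao] connection refused to inventory system
```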
After a minute or two I formulated this little command-line string to give me a count of the classes that had the most errors:
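The exact command isn’t shown above, so here is a sketch of a pipeline that matches the description: filter to ERROR lines, slice the bracketed classname out of each, then sort and count. The log format, filename, and classnames are assumptions for illustration.

```shell
# Build a tiny sample log in the hypothetical format (three lines, two classes).
cat > server.log <<'EOF'
2014-01-02 10:00:01 ERROR [com.example.FooService] unexpected null customer reference
2014-01-02 10:00:02 INFO  [com.example.FooService] request handled
2014-01-02 10:00:03 ERROR [com.example.BarDao] connection refused to inventory system
2014-01-02 10:00:04 ERROR [com.example.FooService] unexpected null customer reference
EOF

# Keep ERROR lines, extract the [classname], sort, count duplicates, rank by count.
grep 'ERROR' server.log \
  | sed -E 's/.*\[([A-Za-z0-9._$]+)\].*/\1/' \
  | sort \
  | uniq -c \
  | sort -rn
# →      2 com.example.FooService
#        1 com.example.BarDao
```

Note that `uniq -c` only counts adjacent duplicate lines, which is why the `sort` before it matters; the final `sort -rn` puts the noisiest classes on top.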
This cuts through the log file, locates only errors, slices out the classname from the log message, sorts the remaining list, and then counts up the number of times each classname appears in the sorted list.
Interpreting these results can be a bit tricky, and I take them informally. The counts don’t represent the number of bugs in the modules, or even identify the classes with the most software defects. For me, this is simply a list showing, among the classes that do log errors, which ones most frequently hit the conditions that trigger those log statements. In our setup, that usually means data coming in from another system doesn’t meet some assumption. Other causes include intermittent connection problems to other systems that just happen to surface in one part of the code.
I’m not certain of the exact outcomes that will result from this little discovery exercise, but I’m betting that our code will be a bit cleaner and a bit more robust for it.