What Can You Do With WebSphere Log Codes?

I’m finally getting back into IBM BPM and it seems that my memories of slow-boating along the Mekong River have squeezed out one or two things that I used to know inside-out. Evernote is my favourite solution to this problem and it’s my hope that this post will serve as a reminder to someone else one day.

I’ve been playing around with an IBM BPM virtual machine and, with my usual obsession with digital cleanliness, I wanted to cut the Process Server startup time down to roughly how long it takes to make a cup of tea (rather than long enough to drink it as well).

I’m a big fan of operational visibility using tools like Splunk and (before visiting Laos) had built reports that surfaced key operational metrics.

In this instance I don’t have Splunk installed (yet) and I just want to get the total startup time from the logs.

It turns out that IBM are nicely disciplined around log codes and that WebSphere churns out some entries that contain the information that I need.

The files I’m interested in live in the install_root/profiles/profile_name/logs/server_name directory and are called SystemOut.log, SystemErr.log, startServer.log and stopServer.log.

I can watch the logs in Windows using a simple PowerShell command:

Get-Content -Path C:\IBM\WebSphere\AppServer\profiles\qbpmaps\logs\server1\SystemOut.log -Tail 1 -Wait

I’m really only interested in the following entries...

[5/5/16 22:03:09:906 EST] 00000001 ManagerAdmin I TRAS0017I: The startup trace state is *=info.
[5/5/16 22:07:44:093 EST] 00000001 WsServerImpl A WSVR0002I: Server server1 open for e-business
[11/29/15 4:37:41:297 EST] 00000038 ServerCollabo A WSVR0024I: Server server1 stopped

So I can run...

Get-Content -Path C:\IBM\WebSphere\AppServer\profiles\qbpmaps\logs\server1\SystemOut.log | where { $_ -match "WSVR0002I|TRAS0017I" }

This is all very basic, but it was enough for me to quickly learn that giving my VM 8 GB of RAM instead of 4 GB reduced the startup time by 25%.
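If you'd rather have PowerShell do the subtraction as well, here is a minimal sketch. It assumes the timestamps always look like the sample entries above (US-style `M/d/yy H:mm:ss:fff` followed by a time zone abbreviation, which it ignores), and that each TRAS0017I marks the start of a boot that finishes at the next WSVR0002I. The function names `ConvertFrom-WasTimestamp` and `Get-StartupDuration` are hypothetical, and the path is simply the one used in the commands above.

```powershell
# Sketch: report how long each WebSphere startup took, based on SystemOut.log.
# Assumes US-style timestamps such as "[5/5/16 22:03:09:906 EST]" and treats
# TRAS0017I as "startup began" and the next WSVR0002I as "startup finished".
$logPath = 'C:\IBM\WebSphere\AppServer\profiles\qbpmaps\logs\server1\SystemOut.log'

# Hypothetical helper: pull the timestamp out of a log line (the time zone is ignored).
function ConvertFrom-WasTimestamp {
    param([string]$Line)
    if ($Line -match '^\[(\d{1,2}/\d{1,2}/\d{2} \d{1,2}:\d{2}:\d{2}:\d{3})') {
        return [datetime]::ParseExact($Matches[1], 'M/d/yy H:mm:ss:fff',
            [System.Globalization.CultureInfo]::InvariantCulture)
    }
    return $null
}

# Hypothetical helper: pair each TRAS0017I with the WSVR0002I that follows it.
function Get-StartupDuration {
    param([string[]]$Lines)
    $start = $null
    foreach ($line in $Lines) {
        if ($line -match 'TRAS0017I') {
            $start = ConvertFrom-WasTimestamp -Line $line
        }
        elseif ($line -match 'WSVR0002I' -and $null -ne $start) {
            $end = ConvertFrom-WasTimestamp -Line $line
            [pscustomobject]@{
                Started  = $start
                Open     = $end
                Duration = $end - $start
            }
            $start = $null  # wait for the next startup
        }
    }
}

if (Test-Path -Path $logPath) {
    Get-StartupDuration -Lines (Get-Content -Path $logPath) | Format-Table -AutoSize
}
```

Run against the two sample entries above (22:03:09:906 and 22:07:44:093), this works out to a startup time of 4 minutes 34.187 seconds. Because the time zone is ignored, a daylight-saving change in the middle of a startup would skew the result.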

Hopefully it’s easy for the reader to see how much value a tool like Splunk could add, for example:

Generate alerts when the server stops.
Automatically generate server uptime charts and statistics.
Report on error frequency (grouped by code).
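To get a feel for the last idea without Splunk, here is a rough PowerShell sketch that counts log codes of one severity. It assumes codes follow the shape seen above (four or five capital letters, four digits, then a severity letter such as E, W or I, followed by a colon); `Get-LogCodeFrequency` is a hypothetical name, and this is nowhere near a real dashboard.

```powershell
# Sketch: count log codes of a given severity, a rough stand-in for an
# "error frequency grouped by code" report. Assumes codes look like
# TRAS0017I: four or five capitals, four digits, a severity letter, then ':'.
$logPath = 'C:\IBM\WebSphere\AppServer\profiles\qbpmaps\logs\server1\SystemOut.log'

# Hypothetical helper; Severity is the code's last letter (E = error, W = warning, I = info).
function Get-LogCodeFrequency {
    param(
        [string[]]$Lines,
        [string]$Severity = 'E'
    )
    $Lines |
        ForEach-Object {
            # -cmatch keeps the match case-sensitive so only upper-case codes qualify.
            if ($_ -cmatch '\b([A-Z]{4,5}\d{4}[A-Z]):') { $Matches[1] }
        } |
        Where-Object { $_.EndsWith($Severity) } |
        Group-Object |
        Sort-Object -Property Count -Descending |
        Select-Object -Property Count, Name
}

if (Test-Path -Path $logPath) {
    Get-LogCodeFrequency -Lines (Get-Content -Path $logPath) -Severity 'E' | Format-Table -AutoSize
}
```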

NOTE: I don’t work for Splunk but I do think it’s awesome. I’m reasonably confident that Sumo Logic (and probably a bunch of other tools) could do all of this just as easily. I’m pitching the approach, not the tools.

Here’s a table of noteworthy log entry codes (which I will endeavour to add to; please make suggestions in the comments):

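| Code | Description |
|---|---|
| TRAS0017I | The Process Server has begun starting ("The startup trace state is ..."). |
| WSVR0002I | The Process Server has finished starting ("open for e-business"). |
| WSVR0024I | The Process Server has stopped ("Server ... stopped"). |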