As much as I love and lean on logging, it's not immune from problems. In fact, it can be the source of some serious headaches. A recent snafu at work prompted me to write about some situations where logging can bring your application performance to a screeching halt, or crash your application altogether.
Here's what happened...
An incredibly complex system of hardware and software had been running smoothly for months; as part of an instrumentation layer we opted to change the rolling log strategy from 50 10MB text files:
...
<appender name="TextFile" type="log4net.Appender.RollingFileAppender">
  <file value="logs\log.txt" />
  <appendToFile value="true" />
  <rollingStyle value="Size" />
  <maxSizeRollBackups value="50" />
  <maximumFileSize value="10MB" />
  <layout type="log4net.Layout.XMLLayout">
    <prefix value="" />
  </layout>
</appender>
...
to 500 1MB XML files:
...
<appender name="XmlFile" type="log4net.Appender.RollingFileAppender">
  <file value="logs\log.xml" />
  <appendToFile value="true" />
  <rollingStyle value="Size" />
  <maxSizeRollBackups value="500" />
  <maximumFileSize value="1MB" />
  <staticLogFileName value="false" />
  <countDirection value="1" />
  <layout type="log4net.Layout.XMLLayout">
    <prefix value="" />
  </layout>
</appender>
...
As an ardent log4net user, I am aware of the performance impact of rolling a large number of files: if the CountDirection setting of the RollingFileAppender is less than 0 (which it is by default), the appender renames every backup file each time the log rolls over. This is costly, and in our product configuration it would mean up to 500 file renames on every roll, which is why the new configuration sets countDirection to 1.
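The renaming cost is easy to see in a toy model. This Python sketch is a simplified simulation under stated assumptions, not log4net's RollingFileAppender code: files are dict entries, the naming scheme (`log`, `log.1`, `log.2`, ...) is simplified, and `roll_newest_lowest`, `roll_newest_highest`, and `simulate` are hypothetical names.

```python
# Toy model of size-based rolling; NOT log4net's actual implementation.
# Files are modeled as dict entries: name -> generation that filled the file.


def roll_newest_lowest(files: dict[str, int], max_backups: int) -> int:
    """CountDirection < 0: log.1 is always the newest backup.

    Every roll shifts each existing backup up by one number.
    Returns the number of renames this roll performed.
    """
    renames = 0
    # The oldest backup is deleted (not renamed) once the cap is reached.
    files.pop(f"log.{max_backups}", None)
    # Shift from the highest number down so no file is overwritten.
    for i in range(max_backups - 1, 0, -1):
        src = f"log.{i}"
        if src in files:
            files[f"log.{i + 1}"] = files.pop(src)
            renames += 1
    # The active file becomes the newest backup.
    files["log.1"] = files.pop("log")
    return renames + 1


def roll_newest_highest(files: dict[str, int], max_backups: int, gen: int) -> int:
    """CountDirection >= 0: each backup keeps its number for life.

    Only the active file is renamed; the backup that falls outside the
    newest `max_backups` is deleted. Returns the number of renames (always 1).
    """
    files[f"log.{gen}"] = files.pop("log")
    files.pop(f"log.{gen - max_backups}", None)
    return 1


def simulate(count_direction: int, max_backups: int, rolls: int) -> tuple[int, int]:
    """Run `rolls` rollovers; return (worst renames in one roll, backups kept)."""
    files: dict[str, int] = {}
    worst = 0
    for gen in range(1, rolls + 1):
        files["log"] = gen  # the active file has just filled up
        if count_direction < 0:
            cost = roll_newest_lowest(files, max_backups)
        else:
            cost = roll_newest_highest(files, max_backups, gen)
        worst = max(worst, cost)
    return worst, len(files)


print(simulate(-1, 500, 600))  # (500, 500)
print(simulate(1, 500, 600))   # (1, 500)
```

In this model a full roll with CountDirection < 0 costs one rename per retained backup (500 here), while CountDirection >= 0 always costs a single rename plus at most one deletion.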
"Good thing I know what I'm doing...."
Several hours after I fired up the new configuration, a colleague asked me to come look at the device. It had slowed to a wounded crawl. I went to dig into the logs: I popped over to the log drive and started looking for the most recent one.
... but there were 2000 log files, not 500. The 2GB drive dedicated to the log was completely full, and the application was still trying to write new log entries, which meant multiple threads were continuously throwing IOExceptions.
"Um, I think I may have found the problem..."
I removed the oldest 1999 log files and the device immediately recovered.
So what happened?
The configuration XML is 100% correct. The problem was that I had accidentally deployed the software to the device with an old beta version of log4net (1.2.9); that version contains a bug in the RollingFileAppender code that prevents MaxSizeRollBackups from being honored when CountDirection is >= 0. With nothing limiting the number of log files, the software eventually filled the log drive with log entries.
Which brings me to my first death-by-logging mantra...
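Because this kind of bug only shows up as a slowly growing file count, a cheap sanity check on a deployed device is to compare the number of log files against the configured cap. A minimal Python sketch, assuming the active file and all backups share the configured base name as a prefix (e.g. `log.xml`, `log.xml.1`, ...); actual naming depends on `staticLogFileName` and related settings, and `count_log_files` and `check_backup_cap` are hypothetical helpers.

```python
from pathlib import Path


def count_log_files(log_dir: str, base_name: str) -> int:
    """Count regular files in log_dir whose names start with base_name.

    Assumes (hypothetically) that the active file and every backup share
    base_name as a prefix, e.g. log.xml, log.xml.1, log.xml.2, ...
    """
    return sum(1 for p in Path(log_dir).glob(base_name + "*") if p.is_file())


def check_backup_cap(log_dir: str, base_name: str, max_backups: int) -> tuple[bool, int, int]:
    """Return (within_cap, files_found, limit).

    The limit allows the active file plus max_backups backups.
    """
    found = count_log_files(log_dir, base_name)
    limit = max_backups + 1
    return found <= limit, found, limit


# Example: the 500-backup configuration from this post, logging to "logs".
ok, found, limit = check_backup_cap("logs", "log.xml", 500)
if not ok:
    print(f"log rolling is not honoring the cap: {found} files, expected <= {limit}")
```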
Logs Consume Space
It sounds silly, I know, but this is the single most prevalent antipattern I see in any logging implementation. There is a finite amount of storage space, and you need to make sure your logging activity doesn't consume more than its share.
I frequently see this when apps use FileAppender: this beast has no chains and, as I've stated elsewhere, you should never ever use it. Ever. Even in "little" applications it can cause massive trouble, because the log keeps growing across process lifetimes and application runs with nothing to check it. I've seen a 1KB application with 3GB of logs spanning almost a year of activity.
But don't think the problem is limited to the file-based appenders. Remember....
- memory is finite...
- event logs fill up more often than you think...
- a database can be configured to boundlessly expand as needed...
- mail servers generally put caps on the size of an inbox...
Whichever appender strategies you choose, you should carefully consider the following:
... How much persistent storage are you allocating to the log? Your answer should be a firm number, like "500MB", and not "the rest of the disk". If you can, base this on the amount of information you need to have access to. If a typical run of your application produces 10MB of logs, you can base the allocated size on the number of runs you want to keep. If your application runs continuously, like a web site or Windows service, you can plot its logging activity over time, then gauge the allocation size from the time span of activity you want to keep.
... How will you cope when the allocated storage is used up? Some appenders, like the RollingFileAppender, can handle this for you by freeing up space used by old log entries. Others, like the EventLogAppender or the AdoNetAppender, blithely log without regard to the amount of space being consumed, and it's up to you to manage the size of the log in other ways. For example, I've seen SQL jobs dedicated to removing log records older than N hours, or truncating the log table to the N newest records.
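To put numbers on the first question, and to sketch the kind of cleanup job the second one describes, here is a minimal Python example. The footprint arithmetic assumes a size-based rolling log holds at most the active file plus `maxSizeRollBackups` full backups; the `log` table schema, its `ts` column, and every function name are hypothetical, and SQLite stands in for whatever database an AdoNetAppender actually writes to.

```python
import sqlite3
import time
from typing import Optional

MB = 1024 * 1024


def rolling_footprint(max_backups: int, max_file_size: int) -> int:
    """Approximate worst-case bytes for a size-based rolling log: the active
    file plus max_backups full backups. Files can overshoot the size limit
    slightly before rolling, so leave some headroom beyond this figure."""
    return (max_backups + 1) * max_file_size


def backups_for_budget(budget_bytes: int, max_file_size: int) -> int:
    """Largest backup count whose footprint fits within budget_bytes."""
    if budget_bytes < max_file_size:
        raise ValueError("budget cannot hold even the active log file")
    return budget_bytes // max_file_size - 1


def create_log_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: ts is a Unix timestamp in seconds.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log ("
        "id INTEGER PRIMARY KEY, ts REAL NOT NULL, message TEXT)"
    )


def prune_older_than(conn: sqlite3.Connection, hours: float,
                     now: Optional[float] = None) -> int:
    """Delete records older than `hours`; return how many were removed."""
    now = time.time() if now is None else now
    cur = conn.execute("DELETE FROM log WHERE ts < ?", (now - hours * 3600,))
    conn.commit()
    return cur.rowcount


def prune_to_newest(conn: sqlite3.Connection, keep: int) -> int:
    """Keep only the `keep` newest records; return how many were removed."""
    cur = conn.execute(
        "DELETE FROM log WHERE id NOT IN "
        "(SELECT id FROM log ORDER BY ts DESC, id DESC LIMIT ?)",
        (keep,),
    )
    conn.commit()
    return cur.rowcount


# The two rolling configurations from this post, in MB:
print(rolling_footprint(50, 10 * MB) // MB)  # 510
print(rolling_footprint(500, 1 * MB) // MB)  # 501
```

By this arithmetic both configurations budget roughly 500MB, comfortably inside a 2GB log drive, as long as the backup cap is actually enforced.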
... What happens when you log to a full repository? Plan for success, but understand the causes of failure. As I recently learned, our application slows down significantly when the log drive is full, so checking the free space on the log drive is now a checklist item under "Application Performance Issues" in our troubleshooting guide. Test your application under limited logging resources to understand how it will behave.
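Both halves of that advice can be scripted. A minimal sketch in Python rather than log4net: `has_log_headroom` uses the standard library's `shutil.disk_usage` for the troubleshooting-guide check, while `FullDiskStream`, `CountingHandler`, and `run_full_disk_drill` are hypothetical test helpers that make every log write fail with `ENOSPC`, so you can observe how the logging path reacts without actually filling a drive.

```python
import errno
import logging
import shutil


def has_log_headroom(path: str, min_free_bytes: int) -> bool:
    """Checklist item: does the volume holding `path` still have space?"""
    return shutil.disk_usage(path).free >= min_free_bytes


class FullDiskStream:
    """Hypothetical test double: every write fails as on a full volume."""

    def write(self, data):
        raise OSError(errno.ENOSPC, "No space left on device")

    def flush(self):
        pass


class CountingHandler(logging.StreamHandler):
    """StreamHandler that counts failed writes instead of reporting them."""

    def __init__(self, stream) -> None:
        super().__init__(stream)
        self.failures = 0

    def handleError(self, record: logging.LogRecord) -> None:
        # StreamHandler.emit routes write exceptions here.
        self.failures += 1


def run_full_disk_drill(entries: int) -> int:
    """Log `entries` errors into a 'full disk'; return how many failed."""
    logger = logging.getLogger("full-disk-drill")
    logger.propagate = False
    handler = CountingHandler(FullDiskStream())
    logger.addHandler(handler)
    try:
        for i in range(entries):
            logger.error("entry %d", i)
    finally:
        logger.removeHandler(handler)
    return handler.failures


print(has_log_headroom(".", 100 * 1024 * 1024))
print(run_full_disk_drill(3))  # 3
```

Python's `logging.StreamHandler` catches the write error and passes it to `handleError`, so the caller keeps running; whatever logging stack you use, a drill like this is a cheap way to confirm its actual behavior before a full drive does it for you.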
The most important thing to remember is that logging, like any other subsystem of your application, needs to be planned, tested, and verified.