This article describes a case study in which the Smarter Monitoring approach helped reduce the time for initial problem analysis from hours to minutes. The additional information that the smarter instrumentation adds to the logs also helped shorten the analysis of performance problems. An implementation approach that ingests 500 million log entries, or 70 GB of data, per day is briefly described.
Note: this article was originally published on IBM's developerWorks website on 7 September 2016 and is republished here within IBM's publishing guidelines.
Almost two years ago I demonstrated a 10 MHz accelerator card for the Commodore PET at the Classic Computing exhibition. For this presentation I wrote a small demo program that draws a 3-D cube on the PET's screen, first at 1 MHz, then at 10 MHz. As I was concentrating on the 10 MHz speedup, I didn't have enough time to look deeply into the demo program itself, so the question came up why it was so slow at 1 MHz compared to other demos. I have finally found the time to look into this, and I'll use it as an example of a performance optimization job. Read on to learn how the proper approach to performance optimization even helps fix performance problems in an old Commodore BASIC program.
I have often been called into projects to investigate performance problems or to optimize a system for performance. What I find is that performance problems can have many causes. Some are design problems, for example using the wrong algorithm for a problem, or processing items one by one instead of as a collection. Others are database problems caused by bad access patterns. Network problems add their share. With the growing complexity of modern distributed systems (think cloud!) the situation is getting even worse. More and more systems and services are being combined to provide business functionality, and each one could be the cause of a performance problem.
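To make the "one by one instead of as a collection" design problem concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `FakeStore`, `fetch_one` and `fetch_many` are hypothetical names standing in for any remote service or database, and the round-trip counter stands in for the fixed network or query overhead that each call would pay in a real system.

```python
class FakeStore:
    """Hypothetical stand-in for a remote service or database.

    Every call counts as one round trip, regardless of how many
    items it carries. In a real system each round trip would cost
    network latency or query overhead.
    """

    def __init__(self):
        self.round_trips = 0

    def fetch_one(self, key):
        # One round trip per single item.
        self.round_trips += 1
        return key * 2

    def fetch_many(self, keys):
        # One round trip for the whole collection.
        self.round_trips += 1
        return [key * 2 for key in keys]


def load_one_by_one(store, keys):
    # The problematic pattern: a separate call for every item.
    return [store.fetch_one(key) for key in keys]


def load_by_collection(store, keys):
    # The alternative: hand over the whole collection at once.
    return store.fetch_many(keys)


keys = list(range(1000))

slow_store = FakeStore()
fast_store = FakeStore()

slow_result = load_one_by_one(slow_store, keys)
fast_result = load_by_collection(fast_store, keys)

# Both variants return the same data, but with very different call counts.
print(slow_store.round_trips, fast_store.round_trips)  # prints "1000 1"
```

Both functions return identical results; the difference is only in how often the fixed per-call cost is paid. Whether batching actually helps in a given system still has to be measured, since the per-call overhead and the cost of handling large collections vary from case to case.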