For sure, transaction volumes and response times are the first two metrics to consider.
But that is not what makes APM a complex undertaking.
Everybody, for years, has been MONITORING volumes and response time. Actually DOING SOMETHING with the data – this is the real gap.
There are many organizational impediments that conspire to bog down the sharing of this critical information. The real focus of APM (where the ‘M’ == management) is to facilitate collaboration on the data. Getting the data to the right people in the organization and giving them a mechanism to effect the change needed to improve or restore performance – this is the gap that the APM vendors still need to work on addressing.
Monitoring without paying attention to collaboration – and you miss the APM value. Nobody picks up a “hammer” and instantly knows how to build a house! The tools today are great ‘hammers’…
Monitoring + Organizational Processes (Collaboration) – and APM delivers. No matter whose ‘hammer’ you happen to have.
Don’t count out Synthetics from your APM strategies
For sure, synthetics are an important strategy, especially after hours when traffic is too low for real-transaction volume to be consistent. But if synthetics are all you have – you’re going to be in trouble! Those marketing sites are apparently not being managed. There isn’t much point in monitoring if there is no one to respond to the incident!
Any APM strategy has to allow for those apps/services that are initially unmanaged and could quickly be illuminated with log monitoring or synthetics. For tier-2 and tier-3 apps – this could be all they ever need. But for any app/service of significance you need to quickly follow up with deeper visibility – to whatever that implementation can support. Our first job is to get some visibility, any way we can. Our second job is to start using this information to improve the software service – not simply alert that it has ‘hit the fan’. Otherwise, you end up with a practice that measures “craters” with great precision – but never learns how to avoid the accidents in the first place.
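A minimal synthetic check really can be this small, which is why it's the fastest way to illuminate an unmanaged app. The sketch below is illustrative, not any vendor's API: it fires one transaction, records pass/fail and latency, and accepts an injectable `fetch` callable (a hypothetical parameter, added here so the probe can be exercised without a live endpoint).

```python
import time
import urllib.request

def synthetic_check(url, fetch=None, timeout=10):
    """Fire one synthetic transaction; return pass/fail plus latency.

    fetch: optional callable taking a URL and returning an HTTP status
    code. Defaults to a real request via urllib; inject a stub to test
    the check itself without network access.
    """
    if fetch is None:
        fetch = lambda u: urllib.request.urlopen(u, timeout=timeout).status
    start = time.monotonic()
    try:
        status = fetch(url)
        ok = 200 <= status < 300
    except Exception:
        # Any failure (timeout, DNS, connection refused) counts as a miss.
        ok = False
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return {"url": url, "ok": ok, "latency_ms": elapsed_ms}
```

Run on a schedule and shipped to whoever owns the incident queue, even this much gives a tier-3 app a heartbeat. The deeper visibility comes later.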
Completely Automatic Performance Regression!
I always use a load profile that ramps up for 5 minutes, holds steady state for 10-20 minutes, and then ramps down for 5 minutes. I need to focus my analysis on that steady-state period. How do I achieve that with your tool set?
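If the tool set won't window the data for you, the filter is trivial to do yourself. A sketch, assuming the profile above (5-minute ramps on either end) and samples as `(seconds_from_start, response_time_ms)` pairs:

```python
def steady_state(samples, ramp_up_s=300, ramp_down_s=300):
    """Keep only the steady-state window of a load-test run.

    samples: list of (t_seconds_from_start, response_time_ms) tuples.
    Drops the ramp-up at the start and the ramp-down at the end, so
    analysis (averages, percentiles) isn't skewed by transient load.
    """
    if not samples:
        return []
    end = max(t for t, _ in samples)
    lo, hi = ramp_up_s, end - ramp_down_s
    return [(t, rt) for t, rt in samples if lo <= t <= hi]
```

Averaging over the whole run would blend the ramps into the steady state and understate (or overstate) the numbers you actually care about; trimming first is the point.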
Log Data Outranks Traditional Data Sources for Network Operations Management
Key phrase: Network Operations – and that should be no surprise.
While the network is an important component of an APM initiative, it is usually the last area to participate collaboratively. (I’ve seen things!) And logs are always part of the APM toolset. But logs alone only help a few folks, leaving nothing for the application stakeholders.
Successful APM is achieved when all of the ‘traditional IT silos’ are contributing and participating – network, security, middleware, mainframe, operations. APM focuses on transaction and application characteristics but you cannot get to “root cause” unless the other silos participate – otherwise, full-on “blame game”. Getting those silos to collaborate – it’s all about the process – and not the technology – as this article reminds us.
So if your network group lives on logs – that’s cool. Can they baseline performance? Do they share the KPIs? Do they proactively notify when things ‘go south’? Do they post configuration changes so that you can correlate your transaction degradations with ongoing network issues?
Building data landfills, based on logs, and consumed by a few specialists – I don’t get the benefit. But reduce the data to KPIs and baselines, update proactively and contribute to the single view of end-2-end service performance… that’s what I would put some budget chips on! 😉
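Reducing a landfill of log-derived latencies to a shareable baseline is a small amount of code. A sketch, not any particular product's feature – the 1.5× alert multiplier is my own illustrative assumption:

```python
import statistics

def baseline_kpis(latencies_ms):
    """Reduce raw latency measurements (e.g. parsed out of network logs)
    into shareable KPIs: mean, p95, and a simple alert threshold.

    The 1.5x-over-p95 threshold is an arbitrary example, not a standard.
    """
    ordered = sorted(latencies_ms)
    # Nearest-rank p95: index at 95% of the sample count, clamped.
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    return {
        "mean_ms": statistics.fmean(ordered),
        "p95_ms": p95,
        "alert_above_ms": p95 * 1.5,
    }

def degraded(sample_ms, kpis):
    """Proactive check of a new measurement against the shared baseline."""
    return sample_ms > kpis["alert_above_ms"]
```

Publish the KPI dict instead of the raw logs, and the network team's data becomes something the application stakeholders can actually correlate against.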
From what angle should we be looking at APM as it relates to IT strategy?
How about the angle of “value to the business”? It doesn’t matter if you are monitoring infrastructure, real-time performance or end-user experience – or all three – if the business is not deriving a value from that investment. DevOps especially is really about software/service quality. Monitoring is just a means of getting there.
So does infrastructure monitoring help me improve software/service quality? Nope. Just lets me measure the size (and frequency) of the crater. A lot of folks cluster applications not to get performance but because they can’t fix the memory leaks or other instabilities. So the gap is in getting code fixed – not measuring the craters!
So does real-time performance monitoring improve software/service quality? Nope. I get to see what is breaking, in great detail. But I don’t have a business-impact context to let me prioritize where I should be spending my spare software-fix budget. So I throw a few more instances into the mix… and hope I meet my SLA.
So does end-user-experience monitoring improve software quality? Nope. I finally get to see the business context, specifically which transactions and what business impact – but I don’t get much of a clue to what is broken – other than it is transaction #47… and it’s costing us a ton!
To make sense, APM needs all of the info. So let’s throw in network traffic monitoring (different from infrastructure) and logs (for those apps we can’t instrument or are using RPCs, whatever). And hopefully I’ll have visibility into what needs to get fixed. But then you need processes with which to employ all these measurements, and when you catch problems, a mechanism to correct them. And this is something best done before you even hit production – but that’s a lesson for another day.