External Post – What is a Performance Monitor

Originally Posted on Quora

In the IT (Information Technology) domain, a Performance Monitor is a software system that assesses the Availability, Performance, and Capacity of the various IT subsystems, such as Mainframe, Database, Middleware, Web and Application Servers, and Network infrastructure (Routers, Firewalls, Switches, etc.).

The software system can be any combination of technologies: agent-based, agent-less (packet filtering), SNMP pings, transaction simulators, or logfiles.

Performance itself consists of various Response Time measurements: component interactions, database calls, web services – essentially any kind of transaction that has a significant volume and a measurable response time.

The challenge in using Performance Monitoring is knowing how to distinguish what is normal from what is not. The process is to characterize the application under load and then survey to find the more frequent transactions that have a significant response time (> 1 msec). Putting these key transactions into a monitoring group lets you establish normal behavior. And when the response time is too short to measure reliably, use a capacity measure like invocation count instead.
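The baselining idea above can be sketched in a few lines – a hypothetical example, assuming you already collect per-interval response-time samples for each key transaction (the transaction history and sigma threshold here are made up for illustration):

```python
# Sketch of establishing a "normal" baseline for a key transaction from
# historical response-time samples (ms), then flagging outliers.
from statistics import mean, stdev

def build_baseline(samples_ms):
    """Return (mean, stddev) of historical response times."""
    return mean(samples_ms), stdev(samples_ms)

def is_abnormal(value_ms, baseline, n_sigma=3.0):
    """Flag a new measurement that strays n_sigma beyond the baseline."""
    mu, sigma = baseline
    return abs(value_ms - mu) > n_sigma * sigma

history = [12.0, 11.5, 13.2, 12.8, 11.9, 12.4]   # prior week, hypothetical
baseline = build_baseline(history)

print(is_abnormal(12.6, baseline))   # within the normal band -> False
print(is_abnormal(40.0, baseline))   # well outside the band  -> True
```

Real monitoring products do this with percentiles and time-of-day seasonality rather than a single mean, but the principle – characterize first, then compare – is the same.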

Finding these key transactions can be accomplished over a week or two of production experience, but it is better done during QA performance testing, where you have better control of inter-system variables and can potentially load the application until failure. A load-to-failure lets you identify the bottlenecks in the application, and very often the key transactions are different under crush load than under nominal load.
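As a rough illustration of that survey, one hypothetical approach ranks candidate transactions by total time consumed (invocation count × average response time), filtering out anything too fast to baseline usefully – the names, numbers, and thresholds below are invented:

```python
# Hypothetical sketch: pick "key transactions" from load-test results by
# ranking on total time consumed, keeping only those above a minimal
# response-time floor (per the "> 1 msec" guidance above).
def key_transactions(stats, min_avg_ms=1.0, top_n=3):
    """stats: {name: (invocations, avg_response_ms)} from a test run."""
    eligible = {n: (c, r) for n, (c, r) in stats.items() if r >= min_avg_ms}
    # Rank by total time = count * avg response; the biggest consumers
    # are the ones worth putting into a monitoring group.
    ranked = sorted(eligible, key=lambda n: eligible[n][0] * eligible[n][1],
                    reverse=True)
    return ranked[:top_n]

load_test = {                        # made-up numbers from a QA run
    "login":       (5_000, 45.0),
    "search":      (40_000, 12.0),
    "healthcheck": (90_000, 0.3),    # too fast to baseline usefully
    "checkout":    (2_000, 180.0),
}
print(key_transactions(load_test))   # ['search', 'checkout', 'login']
```

Running the same ranking at nominal load and at crush load often produces different lists – which is exactly why load-to-failure is worth the effort.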

In a modern enterprise, there are potentially hundreds of IT components that comprise a complex application or service. Figuring out which components are responsible for a degradation of service or an outright failure can be difficult or impossible without the visibility that Performance Monitoring provides. Not all applications need full performance monitoring; it is usually reserved for revenue-bearing systems or Tier-1 applications.

For many other web services/applications, especially those that are dynamically clustered and multi-site, Performance Monitoring is nice to have but not mandatory. In these situations, losing a few instances here and there is no big deal. But if you want to optimize your clustering costs, or enhance service reliability and customer experience – then Performance Visibility is an essential tool.

You manage what you measure. APM Best Practices

 

What is the main difference between IT and software engineering?

This topic is a useful jump point for conversations around identifying Best Practices, and making their application reliable and consistent across the enterprise.  This is what The APM Practice is focused on – helping to ensure that your organization’s DNA lives on, as new tools and technologies are brought to bear in the evolution of your enterprise.

Originally posted on Quora

IT (Information Technology) is about managing the application of computing technology (hardware, software, and networking) to business problems and environments. In the not-so-distant past, absolutely everybody had some IT capability, ranging from the college student who kept your PCs up to date (software and hardware) to hundreds of individuals overseeing and contributing to the implementation and operation of the computing resources of a large enterprise. It required significant budgets, planning, and project management to keep up with changing technology, development and testing, and operation of the resulting physical plant.

Software Engineering is about reliably building systems of software that fulfill business/commercial and personal requirements. It is largely independent of any specific hardware or environment (platform independence), and a single individual can literally “change the world” with a novel application, with little more investment than the time to gain the expertise.

With the world literally moving to cloud-based computing resources and applications, the need for private computing centers is fading, and along with it the ‘profession’ of IT. The efficiencies and automation of cloud operations make a single individual responsible for the same work that required literally hundreds of staff – and that was only a few years ago.

Even the concept of a computer workstation is shifting to mobile devices and assistants like Alexa. Still early days, for sure, but the glory days of the CIO and an army of IT staff are over and done.

What remains is to preserve the useful practices of IT as a foundation for enhanced automation – the lessons learned and tools developed (that are not already commercialized or open-sourced). Everything that IT performed, as individual agents in a large and hierarchical organization, becomes subsumed into the practice and domain of the Software Engineer, and their specialist roles in Systems, Architecture and Security.

External Post – Building the “Death Star” of legacy APM tech?

If you are going to do Enterprise APM, you need a full APM stack: logs, commands, synthetics, real user and byte code instrumentation. There is always an advantage in going with a single vendor vs. ‘best-of-breed’, and the consolidation is just a reflection of industry maturation. Sure, it might look like the “Death Star”, but it is still smaller than the hundreds of apps present in an established enterprise – even if you limit your APM to tier-1 services.

Regarding the “declining shares” of the big players, I think this has more to do with the decline of infrastructure management (IM) than with the growth of APM. You become proactive in your performance management by catching problems pre-production – rather than by precisely measuring the crater you make in production.

External Post – Requirements – do you have to start at the beginning of the application lifecycle?

Requirements are nice, but an APM initiative need not begin at the start of a new application life-cycle. The reality is that a menu of apps is available: some mature, some problematic, some of little consequence – and a very few ready to begin with a ‘greenfield’ of clean requirements. You need only assess what proportions are on the ‘menu’, and you can devise a program that will bring them all into good alignment.

Most important is to actually begin with the ‘stable’ applications, so that you can practice your deployment and configuration of APM and learn what your stakeholders really want in performance metrics. Then move on to the problematic and ‘greenfield’ apps.

Every client environment is different, and simply taking an academic and rational approach to performance requirements is going to leave a lot of gaps – and support will evaporate. Better to get going now, implement some visibility quickly, and show what you can do – rather than what would be ideal to have.

What is the motivation to write about Best practices?

The very last thing I did at CA – and I mean right before I turned in my badge – was a Q&A about why I wrote the APM Book and what it means for CA, customers, and the industry at large.  I never checked if it was actually used… and now I find it on the YouTube!  CA Press Author Series: Mike Sydor

So if you have missed having me on a conference call or leading a customer initiative, you can re-live the magic.  All done in one take with the video team pleading: “please don’t move – we keep getting glare from your glasses” – Arg!  I can still feel the cramps.

You can find more links on the book and related articles and publications here.

Cheers.

External Post – Two principal metrics you must monitor for any application

Two principal metrics you must monitor for any Application

For sure, transaction volumes and response times are the first two metrics to consider.

But that is not what makes APM a complex undertaking.

Everybody, for years, has been MONITORING volumes and response times. Actually DOING SOMETHING with the data – this is the real gap.

There are many organizational impediments that conspire to bog down the sharing of this critical information. The real focus of APM (where the ‘M’ == management) is to facilitate collaboration on the data. Getting the data to the right people in the organization and giving them a mechanism to effect the change needed to improve or restore performance – this is the gap that the APM vendors still need to work on addressing.

Monitoring without paying attention to collaboration – and you miss the APM value. Nobody picks up a “hammer” and instantly knows how to build a house! The tools today are great ‘hammers’…

Monitoring + Organizational Processes (Collaboration) – and APM delivers. No matter whose ‘hammer’ you happen to have.

External Post – Don’t count out Synthetics from your APM strategies

Don’t count out Synthetics from your APM strategies

For sure, synthetics are an important strategy, especially after hours when traffic is too low for real-transaction volume to be consistent. But if synthetics are all you have – you’re going to be in trouble! Those marketing sites are apparently not being managed. There isn’t much point in monitoring if there is no one to respond to the incident!

Any APM strategy has to allow for those apps/services that are initially unmanaged and could quickly be illuminated with log monitoring or synthetics. For tier-2 and tier-3 apps – this could be all they ever need. But for any app/service of significance you need to quickly follow up with deeper visibility – to whatever that implementation can support. Our first job is to get some visibility, any way we can. Our second job is to start using this information to improve the software service – not simply alert that it has ‘hit the fan’. Otherwise, you end up with a practice that measures “craters” with great precision – but never learns how to avoid the accidents in the first place.
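A synthetic check need not be elaborate. Here is a minimal sketch of a single scheduled probe – the endpoint is hypothetical, and a real deployment would run probes from several locations on a timer and feed the results to a dashboard or alerting system:

```python
# Minimal synthetic-transaction sketch: probe an endpoint once and
# record whether it succeeded and how long it took.
import time
import urllib.request

def probe(url, timeout=5.0):
    """Return (ok, elapsed_seconds) for one synthetic request."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except OSError:          # covers URLError, timeouts, refused connections
        ok = False
    return ok, time.monotonic() - start

# ok, elapsed = probe("https://example.com/health")   # hypothetical endpoint
```

Feeding the elapsed times into the same baselining used for real transactions gives tier-2/tier-3 apps a cheap, always-on signal even when real traffic is too low to trend.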

External Post – Log Data Outranks Traditional Data Sources for Network Operations Management

Log Data Outranks Traditional Data Sources for Network Operations Management

Key phrase: Network Operations – and that should be no surprise.

While the network is an important component of an APM initiative, it is usually the last area to participate collaboratively. (I’ve seen things!) And logs are always part of the APM toolset. But logs alone only help a few folks, leaving nothing for the application stakeholders.

Successful APM is achieved when all of the ‘traditional IT silos’ are contributing and participating – network, security, middleware, mainframe, operations. APM focuses on transaction and application characteristics but you cannot get to “root cause” unless the other silos participate – otherwise, full-on “blame game”. Getting those silos to collaborate – it’s all about the process – and not the technology – as this article reminds us.

So if your network group lives on logs – that’s cool. Can they baseline performance? Do they share the KPIs? Do they proactively notify when things ‘go south’? Do they post configuration changes so that you can correlate your transaction degradations with ongoing network issues?
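That last question – correlating your transaction degradations with posted configuration changes – can start as something very simple: check whether each degradation falls within a window after a change event. A hypothetical sketch, with made-up timestamps and events:

```python
# Hypothetical sketch: given posted network change events, find which
# ones occurred shortly before a transaction degradation (timestamps
# in epoch seconds; the data is invented for illustration).
def changes_near(degradation_ts, change_events, window_s=1800):
    """Return change events within window_s seconds before a degradation."""
    return [c for c in change_events
            if 0 <= degradation_ts - c["ts"] <= window_s]

changes = [
    {"ts": 1000, "what": "firewall rule update"},
    {"ts": 9000, "what": "router firmware upgrade"},
]
print(changes_near(1900, changes))   # firewall change 900 s earlier
print(changes_near(5000, changes))   # nothing in the prior 30 minutes -> []
```

Trivial as it is, this only works if the network group actually posts its changes somewhere the application stakeholders can read them – which is the collaboration point, not the code.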

Building data landfills based on logs, consumed by a few specialists – I don’t get the benefit. But reduce the data to KPIs and baselines, update proactively, and contribute to a single view of end-to-end service performance… that’s what I would put some budget chips on! 😉