Originally Posted on Quora
In the IT (Information Technology) domain, Performance Monitoring is a software system that assesses the Availability, Performance, and Capacity of the various IT subsystems, such as Mainframe, Database, Middleware, Web and Application Servers, and Network infrastructure (Routers, Firewalls, Switches, etc.).
The software system can be any combination of technologies: agent-based, agent-less (packet filtering), SNMP polling, transaction simulators, or log files.
Performance itself consists of various Response Time measurements: component interactions, database calls, web services; essentially any kind of transaction that has significant volume and a measurable response time.
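To make the idea of a response-time measurement concrete, here is a minimal agent-style sketch in Python, assuming the code can be instrumented directly. The `monitored` decorator, the `response_times_ms` store, and the `db.lookup_customer` transaction are hypothetical names chosen for illustration; the `time.sleep` call stands in for a real database call.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-process store: transaction name -> list of response times (ms).
response_times_ms = defaultdict(list)


def monitored(transaction_name):
    """Record the wall-clock response time of each call under transaction_name."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record even if the call raised, so failures still count as invocations.
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                response_times_ms[transaction_name].append(elapsed_ms)
        return wrapper
    return decorator


@monitored("db.lookup_customer")  # hypothetical transaction name
def lookup_customer(customer_id):
    time.sleep(0.002)  # stand-in for a ~2 ms database call
    return {"id": customer_id}


for i in range(5):
    lookup_customer(i)

samples = response_times_ms["db.lookup_customer"]
print(f"invocations={len(samples)} avg={sum(samples) / len(samples):.1f} ms")
```

The same list of samples gives you both a response-time measure (the average) and a volume measure (the invocation count).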
The challenge in using Performance Monitoring is knowing how to distinguish what is normal from what is not. The process for this is to characterize the application under load and then survey it to find the more frequent transactions that have a significant response time (> 1 msec). Putting these key transactions into a monitoring group lets you establish normal behavior. When a transaction's response time is too short to be meaningful, use a capacity measure such as invocation count instead.
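The survey-and-baseline process above could be sketched as follows. Only the 1 msec threshold comes from the text; the sample data, the `MIN_INVOCATIONS` cutoff for "frequent", and the mean-plus-three-standard-deviations rule for "abnormal" are assumptions made for illustration.

```python
import statistics

# Hypothetical survey data: transaction -> response times (ms) observed under load.
survey = {
    "checkout.submit": [120.0, 135.0, 128.0, 140.0, 122.0, 131.0],
    "catalog.search": [45.0, 52.0, 48.0, 50.0, 47.0, 49.0],
    "session.touch": [0.2, 0.3, 0.2, 0.2, 0.3, 0.2],  # too fast to baseline on time
    "admin.report": [900.0],                           # slow but rare
}

MIN_RESPONSE_MS = 1.0  # threshold from the text (> 1 msec)
MIN_INVOCATIONS = 5    # assumed cutoff for "frequent"


def build_baselines(samples_by_txn):
    """Split key transactions into response-time baselines and invocation counts."""
    rt_baselines, count_metrics = {}, {}
    for name, samples in samples_by_txn.items():
        if len(samples) < MIN_INVOCATIONS:
            continue  # not frequent enough to be a key transaction
        mean = statistics.mean(samples)
        if mean > MIN_RESPONSE_MS:
            rt_baselines[name] = (mean, statistics.stdev(samples))
        else:
            # Response time is too short to be meaningful; track volume instead.
            count_metrics[name] = len(samples)
    return rt_baselines, count_metrics


def is_abnormal(name, response_ms, baselines, k=3.0):
    """Flag a response time more than k standard deviations above the baseline mean."""
    mean, stdev = baselines[name]
    return response_ms > mean + k * stdev


baselines, counts = build_baselines(survey)
print(sorted(baselines))                                 # ['catalog.search', 'checkout.submit']
print(counts)                                            # {'session.touch': 6}
print(is_abnormal("checkout.submit", 250.0, baselines))  # True
```

When response times are skewed, a percentile-based threshold (for example, the 95th percentile) is a common alternative to mean and standard deviation.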
Finding these key transactions can be accomplished over a week or two of production experience, but it is better done during QA performance testing, where you have better control of inter-system variables and can potentially load the application until failure. A load-to-failure test lets you identify the bottlenecks in the application, and very often the key transactions under crush load are different from those under nominal load.
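A load-to-failure run can be sketched as a simple step ramp: raise the offered load in fixed increments until a failure criterion trips. In this sketch the system under test is replaced by a hypothetical M/M/1 queueing model (mean response time R = S / (1 - U), where U is utilization) with an assumed 10 ms service time, and the 40 ms SLA is also an assumption; in a real test, `measure` would drive a load generator against the application.

```python
# Hypothetical stand-in for the system under test: an M/M/1 queue with a
# 10 ms service time, so it saturates at 100 requests/second.
SERVICE_TIME_S = 0.010


def simulated_response_ms(load_rps):
    """Mean response time R = S / (1 - U) for an M/M/1 queue; None once saturated."""
    utilization = load_rps * SERVICE_TIME_S
    if utilization >= 1.0:
        return None  # the queue grows without bound: treat as failure
    return SERVICE_TIME_S / (1.0 - utilization) * 1000.0


def ramp_to_failure(measure, start_rps, step_rps, sla_ms):
    """Increase load in fixed steps until the SLA is breached or the system saturates."""
    history = []
    load = start_rps
    while True:
        response = measure(load)
        history.append((load, response))
        if response is None or response > sla_ms:
            return load, history
        load += step_rps


failure_load, history = ramp_to_failure(simulated_response_ms, 10, 10, sla_ms=40.0)
for load, response in history:
    print(load, "failed" if response is None else f"{response:.1f} ms")
print("breaking point:", failure_load, "rps")
```

Response time stays nearly flat at low load and then climbs steeply as utilization approaches 100%, which is why the set of transactions that matter at crush load can look quite different from the set at nominal load.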
In a modern enterprise, there are potentially hundreds of IT components that make up a complex application or service. Figuring out which components are responsible for a degradation of service, or an outright failure, can be difficult or impossible without the visibility that Performance Monitoring provides. Not all applications need full performance monitoring; it is usually reserved for revenue-bearing systems or Tier-1 applications.
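One first step in narrowing down a degradation is to compare each component's current response time with its baseline and rank the largest deviations. This sketch assumes per-component baselines (mean, standard deviation) already exist; the component names, the numbers, and the z-score threshold of 3 are hypothetical.

```python
# Hypothetical baselines (mean ms, stdev ms) per component and a current snapshot.
baselines = {
    "web-frontend": (40.0, 5.0),
    "order-service": (80.0, 10.0),
    "inventory-db": (12.0, 2.0),
    "payment-gateway": (150.0, 30.0),
}
current = {
    "web-frontend": 95.0,
    "order-service": 85.0,
    "inventory-db": 30.0,
    "payment-gateway": 160.0,
}


def rank_suspects(baselines, current, threshold=3.0):
    """Return components whose z-score exceeds threshold, worst first."""
    scores = []
    for component, observed in current.items():
        mean, stdev = baselines[component]
        z = (observed - mean) / stdev
        if z > threshold:
            scores.append((component, round(z, 1)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


print(rank_suspects(baselines, current))  # [('web-frontend', 11.0), ('inventory-db', 9.0)]
```

Here web-frontend ranks first, but a caller's response time includes the time it spends waiting on its callees, so part of its slowdown may come from inventory-db. A ranking like this is a triage aid; pinning down the root cause also needs the call topology between components.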
For many other web services and applications, especially those that are dynamically clustered and multi-site, Performance Monitoring is nice to have but not mandatory. In these situations, losing a few instances here and there is no big deal. But if you want to optimize your clustering costs, or enhance service reliability and customer experience, then Performance Visibility is an essential tool.
You manage what you measure.