By Patrick Hubbard, SolarWinds Head Geek
I’m at an interesting point in my career. I’ve ridden all the rides and managed all the technology things like most ops engineers with some miles. Eventually, you realize—with a few notable exceptions—IT technology is a cycle in which enterprises reimplement the same business functions over and over again. And this realization forces a change. You pivot from a focus on mastery of new technology and instead learn to identify and share successes from previous cycles to reduce risk and cost in the current one.
For example, at a macro level, aren’t IBM® LPARs, VMware, and Kubernetes the same thing? Gross technical oversimplification aside, they allow organizations to share infrastructure, apportioning and orchestrating common resources to service multiple workloads. Even their design goals are the same: efficiency (decreased costs through better utilization), flexibility (nimbleness), and standardization (reduced complexity). Make no mistake, ops is getting better each go-round, especially for web applications. Smart teams now lead more by understanding the needs of the business and less by chasing shiny new tech feature promises.
But what happens when a technology refuses to stay within the guardrails of the platforms that have always contained it? Web performance monitoring is a good example. The applications themselves aren’t the radical change; web already evolves faster than any other front-end technology. Instead, the back ends serving them are beginning to transform nearly as rapidly as UX design fashion. This change flies in the face of decades of application delivery assumptions, forcing web teams to scramble to keep up.
In my early days as a developer, web was some of the most exciting work, especially replacing the console front ends on midrange systems. Users were astounded when tedious field tabbing across console screens was gone, replaced with clean web forms and images. Conveyance? In my analytics app? Yup. Little to no training required.
Even better, the web unlocked trapped value for less technical business experts. As a bonus, it was relatively easy to migrate apps over time to new back ends as those systems evolved. Service-oriented architectures and APIs have been boons to web app owners since the beginning. Eventually, the internet’s disaggregation effect, breaking apart bespoke client-server topologies, would drive modern data centers toward service-first architectures, too.
But however much infrastructure evolved, two constants remained. First, endpoints were (for the most part) relatively homogeneous. Generally, users sat at PCs on campus networks or VPNs. Second, we monitored with root access. Because we owned the data center, we could collect performance metrics from every layer of the application stack, including the network. User endpoint monitoring was useful but usually complex and expensive. In response, operations learned how to read the tea leaves of servers, storage, firewalls, routers, and switches and infer users’ performance and experience from how the back end was behaving. The human mind is great at these kinds of associations, assuming you can keep confirmation bias at bay.
Even as web applications were overwhelmingly adopted by enterprises’ real audience—customers on the internet—the monitoring approach was largely the same. If an app team possessed enough detail, they could reliably infer the experience of 95% of users. Happy people reveled in the novel convenience of online access to brands they trusted while IT operations teams kept close watch on performance with metrics they knew like the back of their hands. This golden age survived multiple server technology eras.
Now, though, times are changing. While operations is quick to blame cloud and hybrid topologies for increasing complexity and making web performance monitoring more challenging, it’s really a series of conspiring circumstances that frustrates web owners.
First, cloud is eating everything, including root access. If you think of Amazon, Azure®, and GCP™ as hyper-scale MSPs, they do what you’d expect from an MSP; they offload work from you. This is great and as intended. However, like any MSP, they won’t let you access the underlying plumbing because they’re multi-tenant. Partly for safety and partly for competitive reasons, your ops team loses visibility into inter-instance network monitoring, detailed storage performance, firewall transit, and other utilization and performance details when application components move to the cloud. There are far fewer tea leaves to read, which means operations teams have limited data to guesstimate real user experience. This came as an unwelcome surprise for many web application owners who chose cloud for other reasons.
Simultaneously, a second factor has more dramatically changed enterprises’ options for web performance monitoring: users. Web’s genius is its versatility, and we’ve supported the development of not only mobile browsers but API-bound native apps, which to ops look more like other machines than people. Users are also far more experienced and have much higher expectations. They don’t care about old-school SLAs—they expect to be delighted and reward brands that delight them. They expect your application to work well from a browser at work, at home, or on the road; on all native versions for iOS and Android; over 3G or 5G; and anywhere on earth. Oh, and all these experiences must be great while the endpoint simultaneously runs a hundred other applications you’re unaware of.
Fortunately, web application teams are no longer alone at the vanguard of this change, and battle-tested best practices are being shared. Teams are extending the enterprise web app performance monitoring armory to include three previously “nice to have” capabilities. First, they’re configuring synthetic user monitoring from many locations to provide composite views of application performance. When transactions are monitored from multiple locations simultaneously, it’s much easier to differentiate important application or systems issues from those originating on the network or device.
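In practice, a multi-location synthetic check reduces to simple triangulation: if every probe sees the same slowdown, suspect the application or its back end; if only one region does, suspect that region’s network path. A minimal sketch of that logic follows; the probe names, baseline, and 75% threshold are illustrative assumptions, not settings from any real product.

```python
# Sketch: separating application-wide slowness from regional/network issues
# using synthetic transaction timings gathered from several probe locations.
# Probe names, the baseline, and the 75% threshold are illustrative only.

def classify_slowdown(timings_ms, baseline_ms=800, slow_factor=2.0):
    """Given {location: transaction_time_ms}, decide where a slowdown lives.

    If most probe locations are slow, suspect the application itself;
    if only a few are, suspect the network path or region.
    """
    slow = {loc for loc, t in timings_ms.items() if t > baseline_ms * slow_factor}
    if not slow:
        return "healthy", slow
    if len(slow) / len(timings_ms) >= 0.75:
        return "application", slow
    return "network-or-regional", slow

# Every probe slow: points at the app or its back end.
verdict, _ = classify_slowdown(
    {"us-east": 2400, "eu-west": 2200, "ap-south": 2600, "us-west": 2100})
print(verdict)  # application

# One probe slow: points at that region's network path.
verdict, slow = classify_slowdown(
    {"us-east": 300, "eu-west": 310, "ap-south": 2900, "us-west": 280})
print(verdict, sorted(slow))  # network-or-regional ['ap-south']
```

Real synthetic monitors add scripted multi-step transactions and scheduling, but the diagnostic value comes from exactly this kind of cross-location comparison.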
Second, they’re investing more in real user monitoring, effectively turning the actual humans the business cares about into agents. Despite the walled gardens of mobile OSs, application owners are working around device visibility limitations to better monitor the performance of individual users in a broad collection of configurations. Cloud- and services-backed app teams are using techniques like automated script injection. By monitoring UI performance and tracking device and network data, they’re able to compare performance and behavior across users.
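On the collection side, an injected script typically reports timings from browser APIs such as Navigation Timing back to a beacon endpoint; the interesting part for operations is comparing those reports across cohorts of device and network. A minimal sketch of that server-side aggregation, assuming hypothetical beacon fields `device`, `network`, and `load_ms`:

```python
# Sketch: comparing real-user page-load times across device/network cohorts.
# The beacon field names (device, network, load_ms) are hypothetical; real
# payloads would come from browser timing APIs via an injected script.
from collections import defaultdict

def cohort_p95(beacons):
    """Group beacons by (device, network) and return the p95 load time per cohort."""
    cohorts = defaultdict(list)
    for b in beacons:
        cohorts[(b["device"], b["network"])].append(b["load_ms"])
    result = {}
    for key, samples in cohorts.items():
        samples.sort()
        idx = max(0, round(0.95 * len(samples)) - 1)  # nearest-rank percentile
        result[key] = samples[idx]
    return result

beacons = (
    [{"device": "ios", "network": "5g", "load_ms": t} for t in (310, 290, 350, 400)]
    + [{"device": "android", "network": "3g", "load_ms": t} for t in (1200, 1500, 900, 2100)]
)
print(cohort_p95(beacons))
```

Percentiles rather than averages matter here: a handful of users on congested mobile networks can look fine in the mean while their actual experience is poor.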
Finally, they’re instrumenting APIs—the longtime friend of web development teams everywhere—as thoroughly as they’ve always kept an eye on HTTP requests. Just like the traditional application stacks we’ve relied on for years, “API stacks” are only as strong as their weakest member. We may not be shopping for prepackaged services from the “API economy” store, but operations teams rely on APIs more than ever. At the same time, APIs enable greater reuse and sharing, leading teams to monitor the performance of the services they publish for other teams as closely as the APIs their own front ends depend on.
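The weakest-member point has a simple arithmetic behind it: for serial calls, availability multiplies and latency adds, so one shaky service bounds the whole chain. A hedged sketch with illustrative service names and numbers:

```python
# Sketch: treating a chain of dependent APIs as a stack whose end-to-end
# behavior is bounded by its weakest member. Service names and numbers
# are illustrative, not measurements from a real system.

def api_stack_health(services):
    """services: {name: (availability, p95_latency_ms)} for each API called in series.

    Serial availability is the product of each service's availability;
    serial latency is (roughly) the sum of their p95 latencies.
    """
    availability = 1.0
    latency = 0.0
    weakest = None
    for name, (avail, p95) in services.items():
        availability *= avail
        latency += p95
        if weakest is None or avail < services[weakest][0]:
            weakest = name
    return {"availability": availability, "p95_latency_ms": latency, "weakest": weakest}

stack = {
    "auth": (0.999, 40),
    "catalog": (0.995, 120),
    "pricing": (0.97, 300),  # the weak link drags the whole stack down
}
print(api_stack_health(stack))
```

Three individually respectable services here combine to roughly 96.4% availability, which is why teams increasingly instrument every member of the chain rather than only the front door.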
The pessimistic, glass-half-empty assessment of this period of web application evolution says transitioning web performance monitoring to a users-and-APIs-first, infrastructure-second model must be an unwelcome change. It’s understandable. Addressing systems changes by slowing innovation on the new features users are clamoring for isn’t web’s charter. No technology team enjoys a plumbing-upgrade time-out. But many teams on the other side of this transition are quick to share a more glass-half-full assessment. In many cases, they’ve been able to close existing monitoring gaps more easily than they expected, with real benefits for users, application owners, and the business. It’s no longer the heavy lift it was just a couple of years ago.
Maybe this isn’t such a strange time after all, at least if you’ve worked with web engineers for a few years. CMOs, application owners, and operations are finding the web more nimble and adaptable than most applications in the data center. And if you’re in operations, that’s no surprise at all.