Agents of Kaos in IT Infrastructure Monitoring
In the world of IT infrastructure monitoring and management solutions, there has been a lot of debate about agent-based vs. agentless architectures. The choice between the two can have a big impact on the efficiency and management of your day-to-day operations, as well as on your ability to protect your IT environment in the future. It’s important to understand the difference and be fully aware of the tradeoffs involved in this decision.
Software agents are ubiquitous in IT infrastructures. They live on servers, whether hosted on-prem or in the cloud, and even on your laptop. These servers run applications for all sorts of reasons, as well as operating system tasks. Sometimes agents are unavoidable, for instance if you need the servers to perform common functions like ‘let me connect my application to you’.
Agents are small programs that perform tasks, often repetitively, on behalf of the server (or OS) so that a person doesn’t have to manually ‘run’ a function (e.g., list all the processes running on this server: “ps -ef”).
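As a rough illustration of that definition, here is a minimal Python sketch of an agent that runs a task at a fixed interval. The function names (list_processes, run_agent) and the loop structure are assumptions made for this example, not taken from any particular monitoring product, and list_processes assumes a Unix-like host where “ps -ef” is available.

```python
import subprocess
import time
from typing import Callable, List


def list_processes() -> List[str]:
    """Run `ps -ef` and return its output lines (assumes a Unix-like host)."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def run_agent(
    task: Callable[[], List[str]], interval_seconds: float, iterations: int
) -> List[int]:
    """Call `task` repeatedly, sleeping between runs.

    Returns the number of lines each run produced. A real agent would
    usually loop forever and ship results somewhere; this sketch stops
    after `iterations` runs so it can be tried safely.
    """
    line_counts = []
    for i in range(iterations):
        line_counts.append(len(task()))
        if i < iterations - 1:
            time.sleep(interval_seconds)
    return line_counts
```

Calling run_agent(list_processes, 60, 3) would take three process listings one minute apart. Note that even this tiny loop is itself a process that uses memory and CPU on the host for as long as it runs.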
Considering Agents in IT Infrastructure Monitoring
Some agents help us keep our systems updated and avoid configuration drift. Others roam the server to keep everything safe from threats. Software agents are designed to make our job easier by automatically performing a function at some interval.
There are, however, some things to think about before giving in to the euphoria of ‘let’s run agents for everything’ that we want our IT infrastructure monitoring and management solutions to check at some interval.
- Deploying and managing hundreds or thousands of agents on multiple servers or ‘instances’ can be a real hassle. The difficulty isn’t simply deploying all these agents, but creating the policies that roll out the correct agent version to the proper OS versions and testing each agent against the app it’s meant to work with. For instance, an agent that runs on Windows Server 2012 to collect info for application-A version 1 likely won’t work on Windows Server 2016 to collect info for application-A version 3. Deployment products are then needed to handle rollouts according to server inventory.
- Getting approval to install agents on servers can involve bureaucratic red tape. Most managers who are responsible for IT environments don’t like change. Change can mean that something that was rolling along smoothly gains some hiccups, at the least. Agents are also processes. That means they need memory and CPU resources, which can affect the server’s performance. Worse, a poorly written agent can spawn more copies of itself.
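To make the rollout problem in the first bullet concrete, here is a small Python sketch that picks an agent build per server from an inventory. The compatibility table, the agent build names (“agent-1.0”, “agent-3.0”), and the Server type are all hypothetical, invented for illustration; only the OS and application versions come from the example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical compatibility matrix: (OS, application-A version) -> agent build.
COMPATIBILITY: Dict[Tuple[str, str], str] = {
    ("Windows Server 2012", "1"): "agent-1.0",
    ("Windows Server 2016", "3"): "agent-3.0",
}


@dataclass(frozen=True)
class Server:
    name: str
    os: str
    app_version: str


def plan_rollout(inventory: List[Server]) -> Dict[str, Optional[str]]:
    """Map each server name to a compatible agent build.

    A value of None means no tested agent build exists for that
    OS/application combination, so nothing should be deployed there.
    """
    return {
        server.name: COMPATIBILITY.get((server.os, server.app_version))
        for server in inventory
    }
```

Every new OS release or application version adds rows to a table like this, and each row needs testing before rollout. That upkeep is the management burden the bullet describes.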
Given these issues, the question becomes: who’s watching the agent?
The Difficulty with Some Cloud ITIM
With the advent of cloud infrastructures, some cloud providers, or even the companies whose apps are hosted on these cloud instances, have policies that forbid the use of agents in their systems. In the case of a provider, they don’t want a rogue agent running in THEIR environment. For both corporation and provider, it could be fear of an agent compromising performance, but more than likely the concern is about compromising security! Agents are programs that perform functions, some of which include pulling information from your servers or apps and, in many cases, writing that info to a local file on the server or, of more concern, to an external server or data repository (as is the case with many data-collection-type monitoring systems). Therefore, they represent a security risk. Is the data they’re gathering something you want gathered? Is the data gathered something you want sent to a file or external repository? Are you absolutely SURE that what is being collected or written is ALL there is … or is what you are aware of only what was documented?
Many agents require permission to connect to open ports in order to perform their function. Many need another port open to the server to which they report. That communication is usually bidirectional. In summary, an agent can compromise a host and hold a passkey to the estate. An agent can also be compromised through its own code: the way today’s programming languages use inheritance for functionality can get you extra functionality that you personally didn’t write. You can also inherit a compromising little bug or two unless you vet your supply chain very carefully, and even then, life happens.
Bad Actors in IT Infrastructure Monitoring
In large IT infrastructure monitoring environments, there is sometimes a group or team dedicated to deployments and upgrades. That team has people, called system admins, who have root or admin authority on each and every server in the organization. This allows those admins to shell or remote shell (a shell is a command window) into any server. Root is particularly dangerous because it allows ANY function to be performed on the server. Some agents require root access to be installed. Some agents require root access to run! Aahhh! The answer to that is: if an agent requires root to run, DON’T run it.
If there is a bad actor in your environment, that’s bad enough, but it becomes even worse if they can leave an agent behind in your IT infrastructure monitoring solution to do their dirty work or, worse still, to deliver sensitive data to outside repositories and external bad actors. There’s likely no way to prevent damage in that case outside of diligent monitoring of the agent’s activity (which is resource intensive, to say the least).
In contrast to these data-collection-type monitoring solutions, there are other IT infrastructure monitoring technologies that do not use agents. These “agentless” technologies only require active communication with the server, so that an application can ask that server (or app) a question and get an answer back. While a bad actor can still compromise that communication, there is no process left behind to keep working, impair the performance of the server, or create other compromising services. Agents require similar communication with the server, but they also stay running on the server even when not requested to do work.
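As a simple illustration of the “ask a question, get an answer” model, the Python sketch below checks from the monitoring side whether a service is accepting TCP connections. The function name probe_tcp is an assumption for this example. Real agentless tools typically ask richer questions over protocols such as SSH, SNMP, or an application’s own management API, so treat this only as the smallest possible case and not as how any specific product works.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    The check runs entirely on the monitoring side: the connection is
    opened, the answer is recorded, and the connection is closed. Nothing
    is installed on the target and nothing keeps running there afterwards.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, probe_tcp("db01.example.internal", 5432) would report whether a (hypothetical) database host is accepting connections on that port. Once the call returns, no process is left behind on the target.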
A quick note here about appliances: some servers are considered appliances, black boxes where no one other than the vendor who created the appliance is allowed to install code. In this case, whatever functionality you may want your agent to perform for you isn’t possible because the appliance prohibits it. So, you cannot use your agent at all, not even to monitor those appliances.
Other than running an appliance, how can you be confident that agents haven’t compromised your server(s)? Well, you can’t unless you have a way to monitor THEM at a deep level. Kind of like leaving the 5-year-old in charge of the 2-year-old! This level of monitoring would be very data intensive and correspondingly expensive.
In my own long programming history, I’ve written many agents, but in 2006 I started thinking about IoT, appliances, and devices, and began to take the agentless path. Knowing what I know about both techniques, I much favor agentless for all of the aforementioned reasons. While it doesn’t 100% eliminate risk, it does drop the odds greatly. This is the approach we took with our agentless infrastructure and application monitoring solution, Infrared360®.
You can learn a little more about an agentless approach to cloud infrastructure and application monitoring in a whitepaper we recently released.
If you’d like to see an agentless monitoring and management solution for your IT infrastructure and applications (on-prem, in the cloud, or both) on a real, live network (we’ll even customize the demo to loosely resemble your own environment), click here to get a live demo of our Infrared360® solution.