User Agents Have Existed for Years
Yep. So why have companies started bringing them up as a selling point now? To sell you stuff, the same reason most decisions are made. Companies have to make money; that’s just how it works. It would be a poor company that doesn’t make money. That is their purpose, despite any nice words in their mission statement.
There’s nothing wrong with companies making money—somebody has to—but it does become annoying when this behavior starts to twist the meaning of things. I don’t think this is done maliciously; it just happens because there are perverse incentives to leap on tech trends. Even smart companies fall into this trap, as misusing words can draw eyes.
Those of us in the tech sector roll our eyes and continue on. For those outside our little bubble, the real meaning of the word is changed, and now we are talking about two different things. Jargon is sacrificed to the buzzword gods, and alignment is decreased. So let’s start with some level-setting about agents.
What Is an Agent?
In software, an agent is an element that acts on behalf of a user. Agents are a sub-pattern of the client pattern. If that sounds vague, you’re right. Software engineers like these things called patterns. Patterns are little bits of logic that crop up repeatedly and have some general usefulness. They are abstract, can be generalized, and prevent you from reinventing the wheel. There are whole books on the topic of software patterns. Read a couple, and now you’re fun at parties. Well, very specific kinds of parties, mostly with other people who read books on software patterns.
If you need a clearer picture of an agent, I have the perfect example for you. In fact, the odds are very high that you are using this agent right now. May I present: the Web Browser.
A web browser is a user agent because it acts as the intermediary between a website and a user. When you type a URL into your web browser or click on a link, the browser sends a request to the server hosting the website. To help the server respond appropriately with the correct version of the content (e.g., HTML, CSS, JavaScript) tailored for different devices, browsers send User-Agent strings. These strings contain information about the browser’s name, version, platform, operating system, and architecture, among other details.
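As a rough illustration of that header, here is a minimal Python sketch that builds (but does not send) an HTTP request carrying a User-Agent string. The URL, the function name, and the string itself are made-up examples, not values from any particular browser release.

```python
from urllib.request import Request

# Hypothetical User-Agent string in the general shape browsers use.
EXAMPLE_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "ExampleBrowser/1.0"
)


def build_request(url: str, user_agent: str = EXAMPLE_USER_AGENT) -> Request:
    """Build an HTTP request that identifies the client via User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


request = build_request("https://example.com/cats")
# urllib normalizes header names, so the stored key is "User-agent".
print(request.get_header("User-agent"))
```

Nothing goes over the network here; the point is only that the identifying string travels as an ordinary request header.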
The browser takes your intent—looking at cute cat pictures—and performs all the necessary tasks to get that cat picture on your screen at the right size and resolution, just as cute as a button.
Isn’t That What All Software Does?
Well, kinda.
Oh, you’re still interested in the difference? The key part is that an agent takes action on behalf of a user. Most (all?) software takes actions, but it just does the action and doesn’t care who it’s for. An agent knows who it is taking the action for and has the details about how the user wants the action to occur. This is a subtle difference, but an important one. Let’s compare a browser to an FTP client.
Web browsers, as user agents, fetch content like cat pictures by sending HTTP requests that include a User-Agent string. This lets the server tailor its response to the specific device and browser version. An FTP client, in contrast, logs in and retrieves files exactly as they are stored. It sends credentials, but nothing that describes the client software or how the user wants the content delivered, so the server has nothing to tailor. Both browsers and FTP clients fetch content from servers, but the browser carries details about the user’s situation along with each request.
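To make the idea of a tailored response concrete, here is a toy sketch of what a server might do with that header. The function name, the substring check, and the file names are all assumptions for illustration; real servers use far more careful detection.

```python
def pick_image_variant(user_agent: str) -> str:
    """Choose an image size from a User-Agent string (toy heuristic)."""
    ua = user_agent.lower()
    # Many mobile browsers include the token "Mobile" in their User-Agent.
    if "mobile" in ua:
        return "cat-small.jpg"
    return "cat-large.jpg"


phone = "Mozilla/5.0 (Linux; Android 14) Mobile ExampleBrowser/1.0"
desktop = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"
print(pick_image_variant(phone))
print(pick_image_variant(desktop))
```

A server handling a plain FTP download has no comparable hint to branch on; it simply hands over the file that was asked for.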
So Why Are People Excited About Agents?
People are excited because LLMs allow for smarter agents—agents that can do more, agents that can use semantic reasoning to figure out what you mean when you say “cute cat pictures,” a more general-purpose agent.
Agents in the past have been very limited in what they can do and how flexible they can be. LLMs enable the translation of context into different forms, making them flexible in how they act as agents. We’ll walk through an example of an LLM-enabled agent retrieving cat pictures in a different way than current agents, but first, let’s talk about how they work.
How LLM-Enabled Agents Work
From this point on, when I say “agent,” I mean LLM-enabled agents. Here’s my definition:
Agents are pieces of software that use semantic reasoning and tools to act on behalf of a user.
Semantic reasoning refers to the agent’s ability to understand and process the meaning behind a user’s query or the data it interacts with. This involves interpreting language, making logical inferences, and using contextual understanding to determine what steps to take and how to act effectively. This is made possible by the pattern matching and completion provided by the LLM.
For example:
- Understanding the Query: When a user asks, “Find pictures of black kittens,” the agent must recognize that “black” is a color and “kittens” refers to young cats. This understanding is semantic reasoning, where the agent grasps the meaning of the words and their relationships.
- Interpreting Results: When the agent retrieves data, it must analyze the results to ensure they align with the query. For instance, if some retrieved pictures show adult black cats instead of kittens, the agent needs to refine its actions.
- Making Informed Decisions: Semantic reasoning helps the agent decide what to do next, such as querying the API differently, combining data, or explaining its findings to the user.
Tools are external resources or systems the agent can use to gather information, perform actions, or solve specific tasks. Tools act like the agent’s “hands” to interact with the world beyond just reasoning.
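One common way to picture tools is as a small registry of named functions the agent is allowed to call. The sketch below is an assumption about how that might look, not any particular framework's API: `search_cat_images`, `TOOLS`, and `call_tool` are hypothetical names, and the tool returns placeholder file names instead of calling a real service.

```python
from typing import Callable, Dict, List

# A tool is just a named function the agent is allowed to call.
Tool = Callable[..., List[str]]


def search_cat_images(color: str, age: str) -> List[str]:
    """Stand-in for a real image API; returns placeholder file names."""
    return [f"{color}-{age}-{i}.jpg" for i in range(1, 4)]


TOOLS: Dict[str, Tool] = {"search_cat_images": search_cat_images}


def call_tool(name: str, **arguments: str) -> List[str]:
    """Look up a tool by name and run it with the given arguments."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


print(call_tool("search_cat_images", color="black", age="kitten"))
```

In an LLM-enabled agent, the model would be the one choosing the tool name and arguments; here they are hardcoded to keep the sketch runnable.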
Example of an Agent at Work
Let’s consider our case of getting cute cats. We’ll send the agent this text:
“Can you find me pictures of cute black kittens?”
Step 1: Understand the Query
The agent interprets the query:
- “Pictures” indicates the user wants images, not text or data.
- “Cute black kittens” specifies criteria:
  - The subject: cats.
  - Attributes: black (color), kitten (age group), cute (implied, so no need to explicitly filter).
Reasoning
“To fulfill this request, I need to find a source of cat pictures and filter by color (black) and age (kitten).”
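The output of this step can be thought of as a small structured record. The sketch below shows one possible shape for it. To be loud about the assumption: `parse_query` is a plain keyword matcher standing in for the LLM, so it does no real semantic reasoning, and `CatQuery` and `KNOWN_COLORS` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatQuery:
    media: str            # what kind of result the user wants
    color: Optional[str]  # requested coat color, if any
    age: Optional[str]    # requested age group, if any


KNOWN_COLORS = ("black", "white", "orange", "gray")


def parse_query(text: str) -> CatQuery:
    """Keyword stand-in for the LLM's semantic reasoning step."""
    words = text.lower().replace("?", " ").split()
    color = next((w for w in words if w in KNOWN_COLORS), None)
    age = "kitten" if any(w.startswith("kitten") for w in words) else None
    media = "image" if any(w.startswith("picture") for w in words) else "unknown"
    return CatQuery(media=media, color=color, age=age)


print(parse_query("Can you find me pictures of cute black kittens?"))
```

A keyword matcher breaks as soon as the user says “tiny baby cats” instead of “kittens,” which is exactly the gap an LLM is meant to close.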
Step 2: Decide to Use a Tool
The agent determines:
- An appropriate tool (e.g., a Cat Image API) must be used to retrieve this specific data.
Step 3: Construct the API Query
The agent builds the query:
- Identify the API endpoint (e.g., /v1/cats).
- Add parameters for color=black and age=kitten.
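Building that query is ordinary string work once the criteria are known. Here is a minimal sketch using Python's standard library. The host is the made-up one from the example request in Step 4, and `build_cat_query` and its `limit` default are assumptions for illustration.

```python
from urllib.parse import urlencode

# Made-up host from this article's example; it is not a real service.
BASE_URL = "https://api.cute-cat-pictures.com"


def build_cat_query(color: str, age: str, limit: int = 5) -> str:
    """Assemble the endpoint path and query string for the cat API."""
    params = {"color": color, "age": age, "limit": limit}
    return f"{BASE_URL}/v1/cats?{urlencode(params)}"


print(build_cat_query("black", "kitten"))
```

Using `urlencode` rather than string concatenation means values with spaces or special characters are escaped properly.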
Step 4: Execute the Query
The agent sends the request to the API:
- Example request: https://api.cute-cat-pictures.com/v1/cats?color=black&age=kitten&limit=5
Step 5: Interpret the API Response
The API responds with data:
{
  "results": [
    {"image_url": "https://catpics.com/black-kitten1.jpg"},
    {"image_url": "https://catpics.com/black-kitten2.jpg"},
    {"image_url": "https://catpics.com/black-kitten3.jpg"}
  ]
}
The agent extracts the image_url values.
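The extraction itself can be sketched in a few lines. The response body below is copied from the example above; `extract_image_urls` is a hypothetical helper, and skipping entries without an `image_url` is a design choice of this sketch rather than something the API promises.

```python
import json
from typing import List

# Response body copied from the example above.
RESPONSE_BODY = """
{
  "results": [
    {"image_url": "https://catpics.com/black-kitten1.jpg"},
    {"image_url": "https://catpics.com/black-kitten2.jpg"},
    {"image_url": "https://catpics.com/black-kitten3.jpg"}
  ]
}
"""


def extract_image_urls(body: str) -> List[str]:
    """Pull every image_url out of the API's JSON response."""
    data = json.loads(body)
    # Skip any result that lacks an image_url rather than crashing.
    return [
        item["image_url"]
        for item in data.get("results", [])
        if "image_url" in item
    ]


for url in extract_image_urls(RESPONSE_BODY):
    print(url)
```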
(BTW, those links don’t work. This is an especially sad example of hallucination, but we will soldier on.)
Step 6: Provide the Answer
The agent presents the URLs to the user as hyperlinks so the user can view the images.
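If the agent's output is displayed as HTML, turning URLs into hyperlinks could look roughly like this. The `render_links` name and the “Kitten N” link text are assumptions; an agent in a chat interface might emit a different link format instead.

```python
from html import escape
from typing import List


def render_links(urls: List[str]) -> str:
    """Turn a list of URLs into HTML anchor tags, one per line."""
    lines = []
    for number, url in enumerate(urls, start=1):
        # Escape the URL so stray quotes or ampersands cannot break the tag.
        safe_url = escape(url, quote=True)
        lines.append(f'<a href="{safe_url}">Kitten {number}</a>')
    return "\n".join(lines)


print(render_links([
    "https://catpics.com/black-kitten1.jpg",
    "https://catpics.com/black-kitten2.jpg",
]))
```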
Wrapping It All Up: Cats, Agents, and Buzzwords
So there you have it: agents are not new, but the LLM-powered ones are like the overly ambitious interns of the software world—eager to prove themselves by doing everything on your behalf, even if they sometimes hallucinate cat pictures that don’t exist. The buzzwordification of “agent” isn’t inherently evil (corporate slogans aside); it’s just a symptom of the tech industry’s habit of taking something old, slapping a shiny new label on it, and pretending they invented fire.
But hey, at least this time it’s genuinely exciting. LLM-enabled agents have the potential to be as transformative as they claim—assuming we can keep their semantically reasoning little brains focused on actual tasks and not their existential crises about whether that cat is cute enough.
So next time you hear someone going on about “the future of agents,” just smile, nod, and maybe send them this article. Then go back to using your browser, your real MVP agent, to look up more cute cat pictures—the most noble use of technology we’ve invented to date. 🐾