Language Decision Processes (LDP)
LDP is a software framework for enabling modular interchange of language agents, environments, and optimizers. A language decision process is a partially-observable Markov decision process (POMDP) in which the actions and observations consist of natural language. The full definition is given in the Aviary paper.
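As a rough sketch (using the standard POMDP tuple rather than the paper's exact notation), an LDP specializes a POMDP so that actions and observations are natural-language strings:

$$
\mathrm{LDP} = (\mathcal{S}, \mathcal{A}, T, R, \Omega, O, \gamma), \qquad \mathcal{A},\ \Omega \subseteq \mathcal{V}^{*}
$$

where $\mathcal{S}$ is the state space, $T$ the transition function, $R$ the reward function, $\Omega$ the observation space, $O$ the observation function, $\gamma$ the discount factor, and $\mathcal{V}^{*}$ the set of finite strings over a token vocabulary $\mathcal{V}$.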
See the following for an example of how to run an LDP agent.
Check out our new notebook on running an LDP agent in an Aviary environment!
The Aviary paper has been posted to arXiv! Further updates forthcoming!
A pictorial overview of the language decision process (LDP) framework together with five implemented Aviary environments.
To install `ldp`, run `pip install ldp`.
To install `aviary` (published on PyPI as `fhaviary`) and the `nn` (neural network) module required for the tutorials, run `pip install fhaviary "ldp[nn]"`.
If you plan to export Graphviz visualizations, the `graphviz` library is required:
- Linux: `apt install graphviz`
- macOS: `brew install graphviz`
The minimal example below illustrates how to run a language agent on an Aviary environment (Aviary is LDP's sister library for defining language agent environments: https://github.com/Future-House/aviary).
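A minimal sketch of such a rollout (assuming the `aviary.core.DummyEnv` and `ldp.agent.SimpleAgent` import paths and the reset/step signatures shown here; actually running it requires LLM API credentials):

```python
import asyncio

from aviary.core import DummyEnv  # assumed import path
from ldp.agent import SimpleAgent


async def main() -> None:
    env = DummyEnv()
    agent = SimpleAgent()

    # Reset the environment to get the first observations and the available tools.
    obs, tools = await env.reset()
    agent_state = await agent.init_state(tools=tools)

    done = False
    while not done:
        # The agent returns an action, its next state, and a value estimate.
        action, agent_state, value = await agent.get_asv(agent_state, obs)
        # Step the environment with the chosen tool call(s).
        obs, reward, done, truncated = await env.step(action.value)


asyncio.run(main())
```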
Below we elaborate on the components of LDP.
An agent is a language agent that interacts with an environment to accomplish a task. Agents may use tools (calls to external APIs, e.g. Wolfram Alpha) in response to observations returned by the environment. Below we illustrate LDP's `SimpleAgent`, which relies on a single LLM call per step. The main bookkeeping involves appending messages received from the environment and passing the environment's tools to the LLM.
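The snippet below is an illustrative sketch of that pattern rather than the actual `SimpleAgent` source; the `aviary.core` import path and the `call_llm` helper are placeholders for the real message/tool types and LLM client:

```python
from aviary.core import Message, Tool, ToolRequestMessage  # assumed import path
from ldp.agent import Agent


async def call_llm(messages: list[Message], tools: list[Tool]) -> ToolRequestMessage:
    """Hypothetical stand-in for a tool-calling LLM client."""
    raise NotImplementedError


class SingleCallState:
    """Illustrative state: the running message history plus the available tools."""

    def __init__(self, tools: list[Tool]):
        self.tools = tools
        self.messages: list[Message] = []


class SingleCallAgent(Agent[SingleCallState]):
    """Simplified stand-in for SimpleAgent: one LLM call per step."""

    async def init_state(self, tools: list[Tool]) -> SingleCallState:
        return SingleCallState(tools=tools)

    async def get_asv(
        self, agent_state: SingleCallState, obs: list[Message]
    ) -> tuple[ToolRequestMessage, SingleCallState, float]:
        # Bookkeeping: append the latest observation messages to the history.
        agent_state.messages.extend(obs)
        # A single LLM call, passing the tools so the model can emit a tool request.
        action = await call_llm(agent_state.messages, agent_state.tools)
        agent_state.messages.append(action)
        return action, agent_state, 0.0  # value estimate defaults to 0
```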
An agent has two methods:
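Continuing the rollout sketch above (these calls run inside an async function):

```python
# Build the initial agent state from the environment's tool definitions.
agent_state = await agent.init_state(tools=tools)

# Choose an action conditioned on the observations; this also returns the next
# agent state and a value estimate.
new_action, new_agent_state, value = await agent.get_asv(agent_state, obs)
```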
The `get_asv(agent_state, obs)` method chooses an action (`a`) conditioned on the observation messages, returning the next agent state (`s`) and a value estimate (`v`).
The first argument, `agent_state`, is an optional container for environment-specific objects (e.g. documents for PaperQA or lookup results for HotpotQA), as well as more general objects such as memories, which could include a list of previous actions and observations. `agent_state` may be set to `None` if memories are not being used.
The second argument, `obs`, is not the complete list of all prior observations, but rather the value returned by `env.step`.
The returned `value` is the agent's state/action value estimate, which is used for reinforcement learning training. It may default to 0.
For more advanced use cases, LDP features a stochastic computation graph (SCG) that enables differentiation with respect to agent parameters (including the weights of the LLM). The example computation graph below illustrates this functionality:
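A sketch of such a graph, assuming `FxnOp` and `compute_graph` are importable from `ldp.graph` (the exact API may differ):

```python
from ldp.graph import FxnOp, compute_graph  # assumed import path

# An op that doubles its input; ops are the nodes of the computation graph.
op_a = FxnOp(lambda x: 2 * x)

# Executing ops inside a compute_graph() context records the graph structure.
# (Top-level await as in a notebook cell; wrap in asyncio.run(...) for a script.)
async with compute_graph():
    op_result = await op_a(3)
```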
The code cell above creates and executes a computation graph that doubles the input. The computation graph gradients and executions are saved in a context for later use, such as in training updates. For example:
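(A rough sketch: only the `value` and `call_id` attributes assumed on the returned `OpResult` are shown; the real context API may expose more.)

```python
# Inspect the recorded execution: the result carries the computed value and a
# call_id that keys this call's inputs/outputs (and later gradients) in the context.
print(op_result.value)    # 6, the doubled input
print(op_result.call_id)  # identifier for this execution in the op's context
```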
A more complex example is given below for an agent that possesses memory.
We use differentiable ops to ensure there is an edge in the compute graph from the LLM result (action) to components such as memory retrieval as well as the query used to retrieve the memory.
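A minimal instantiation sketch (assuming `MemoryAgent` is exported from `ldp.agent` and can be constructed with default settings):

```python
from ldp.agent import MemoryAgent  # assumed import path

# MemoryAgent augments the single-call agent with a retrieval step: relevant past
# steps are fetched through differentiable ops and included in the LLM prompt.
agent = MemoryAgent()
```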
Why use an SCG? Aside from the ability to take gradients, using the SCG enables tracking of all inputs/outputs to the ops and serialization/deserialization of the SCG such that it can be easily saved and loaded. Input/output tracking also makes it easier to perform fine-tuning or reinforcement learning on the underlying LLMs.
The `Agent` class (as well as the classes in `agent.ops`) is generic, which means:

- `Agent` is designed to support arbitrary types
- Subclasses can precisely specify state types, making the code more readable
If you are new to Python generics (`typing.Generic`), please read about them in the Python documentation. Below is how to specify an agent with a custom state type:
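A sketch of a custom state type (assuming `Agent` is exported from `ldp.agent`):

```python
from dataclasses import dataclass, field
from datetime import datetime

from ldp.agent import Agent  # assumed import path


@dataclass
class MyComplexState:
    """Custom state carrying whatever the agent needs between steps."""

    vector: list[float]
    timestamp: datetime = field(default_factory=datetime.now)


class MyAgent(Agent[MyComplexState]):
    """An agent whose state is now type-checked as MyComplexState."""
```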