Language Decision Processes (LDP)
LDP is a software framework for enabling modular interchange of language agents, environments, and optimizers. A language decision process is a partially-observable Markov decision process (POMDP) in which the actions and observations consist of natural language. The full definition is given in the Aviary paper.
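As a rough sketch (using the standard POMDP tuple rather than the paper's exact notation), an LDP specializes a POMDP so that actions and observations are natural-language strings:

$$
\mathrm{LDP} = (\mathcal{S}, \mathcal{A}, T, R, \Omega, O, \gamma), \qquad \mathcal{A},\ \Omega \subseteq \mathcal{V}^{*}
$$

where $\mathcal{S}$ is the state space, $T$ the transition function, $R$ the reward function, $\Omega$ the observation space, $O$ the observation function, $\gamma$ the discount factor, and $\mathcal{V}^{*}$ the set of finite strings over a token vocabulary $\mathcal{V}$.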
See the following for an example of how to run an LDP agent.
Check out our new notebook on running an LDP agent in an Aviary environment!
The Aviary paper has been posted to arXiv! Further updates forthcoming!
A pictorial overview of the language decision process (LDP) framework together with five implemented Aviary environments.
To install `ldp`, run `pip install ldp`.
To install `aviary` (published on PyPI as `fhaviary`) and the `nn` (neural network) module required for the tutorials, run `pip install fhaviary "ldp[nn]"`.
If you plan to export Graphviz visualizations, the `graphviz` library is required:
- Linux: `apt install graphviz`
- macOS: `brew install graphviz`
The minimal example below illustrates how to run a language agent on an Aviary environment (Aviary is LDP's sister library for defining language agent environments: https://github.com/Future-House/aviary).
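A minimal sketch of such a rollout (assuming the `aviary.core.DummyEnv` and `ldp.agent.SimpleAgent` import paths and the reset/step signatures shown here; actually running it requires LLM API credentials):

```python
import asyncio

from aviary.core import DummyEnv  # assumed import path
from ldp.agent import SimpleAgent


async def main() -> None:
    env = DummyEnv()
    agent = SimpleAgent()

    # Reset the environment to get the first observations and the available tools.
    obs, tools = await env.reset()
    agent_state = await agent.init_state(tools=tools)

    done = False
    while not done:
        # The agent returns an action, its next state, and a value estimate.
        action, agent_state, value = await agent.get_asv(agent_state, obs)
        # Step the environment with the chosen tool call(s).
        obs, reward, done, truncated = await env.step(action.value)


asyncio.run(main())
```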
Below we elaborate on the components of LDP.
An agent is a language agent that interacts with an environment to accomplish a task. Agents may use tools (calls to external APIs, e.g. Wolfram Alpha) in response to observations returned by the environment. Below we illustrate LDP's `SimpleAgent`, which relies on a single LLM call per step. The main bookkeeping involves appending messages received from the environment and passing the environment's tools to the LLM.
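The snippet below is an illustrative sketch of that pattern rather than the actual `SimpleAgent` source; the `aviary.core` import path and the `call_llm` helper are placeholders for the real message/tool types and LLM client:

```python
from aviary.core import Message, Tool, ToolRequestMessage  # assumed import path
from ldp.agent import Agent


async def call_llm(messages: list[Message], tools: list[Tool]) -> ToolRequestMessage:
    """Hypothetical stand-in for a tool-calling LLM client."""
    raise NotImplementedError


class SingleCallState:
    """Illustrative state: the running message history plus the available tools."""

    def __init__(self, tools: list[Tool]):
        self.tools = tools
        self.messages: list[Message] = []


class SingleCallAgent(Agent[SingleCallState]):
    """Simplified stand-in for SimpleAgent: one LLM call per step."""

    async def init_state(self, tools: list[Tool]) -> SingleCallState:
        return SingleCallState(tools=tools)

    async def get_asv(
        self, agent_state: SingleCallState, obs: list[Message]
    ) -> tuple[ToolRequestMessage, SingleCallState, float]:
        # Bookkeeping: append the latest observation messages to the history.
        agent_state.messages.extend(obs)
        # A single LLM call, passing the tools so the model can emit a tool request.
        action = await call_llm(agent_state.messages, agent_state.tools)
        agent_state.messages.append(action)
        return action, agent_state, 0.0  # value estimate defaults to 0
```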
An agent has two methods:
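Continuing the rollout sketch above (these calls run inside an async function):

```python
# Build the initial agent state from the environment's tool definitions.
agent_state = await agent.init_state(tools=tools)

# Choose an action conditioned on the observations; this also returns the next
# agent state and a value estimate.
new_action, new_agent_state, value = await agent.get_asv(agent_state, obs)
```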
The `get_asv(agent_state, obs)` method chooses an action (`a`) conditioned on the observation messages, returning the next agent state (`s`) and a value estimate (`v`).
The first argument, `agent_state`, is an optional container for environment-specific objects (e.g. documents for PaperQA or lookup results for HotpotQA), as well as more general objects such as memories, which could include a list of previous actions and observations. `agent_state` may be set to `None` if memories are not being used.
The second argument, `obs`, is not the complete list of all prior observations, but rather the value returned by `env.step`.
The returned `value` is the agent's state/action value estimate, which is used for reinforcement learning training. It may default to 0.
For more advanced use cases, LDP features a stochastic computation graph (SCG) that enables differentiation with respect to agent parameters (including the weights of the LLM). The example computation graph below illustrates this functionality:
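A sketch of such a graph, assuming `FxnOp` and `compute_graph` are importable from `ldp.graph` (the exact API may differ):

```python
from ldp.graph import FxnOp, compute_graph  # assumed import path

# An op that doubles its input; ops are the nodes of the computation graph.
op_a = FxnOp(lambda x: 2 * x)

# Executing ops inside a compute_graph() context records the graph structure.
# (Top-level await as in a notebook cell; wrap in asyncio.run(...) for a script.)
async with compute_graph():
    op_result = await op_a(3)
```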
The code cell above creates and executes a computation graph that doubles the input. The computation graph gradients and executions are saved in a context for later use, such as in training updates. For example:
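(A rough sketch: only the `value` and `call_id` attributes assumed on the returned `OpResult` are shown; the real context API may expose more.)

```python
# Inspect the recorded execution: the result carries the computed value and a
# call_id that keys this call's inputs/outputs (and later gradients) in the context.
print(op_result.value)    # 6, the doubled input
print(op_result.call_id)  # identifier for this execution in the op's context
```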
A more complex example is given below for an agent that possesses memory.
We use differentiable ops to ensure there is an edge in the compute graph from the LLM result (action) to components such as memory retrieval as well as the query used to retrieve the memory.
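A minimal instantiation sketch (assuming `MemoryAgent` is exported from `ldp.agent` and can be constructed with default settings):

```python
from ldp.agent import MemoryAgent  # assumed import path

# MemoryAgent augments the single-call agent with a retrieval step: relevant past
# steps are fetched through differentiable ops and included in the LLM prompt.
agent = MemoryAgent()
```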
Why use an SCG? Aside from the ability to take gradients, using the SCG enables tracking of all inputs/outputs to the ops and serialization/deserialization of the SCG such that it can be easily saved and loaded. Input/output tracking also makes it easier to perform fine-tuning or reinforcement learning on the underlying LLMs.
The `Agent` class (as well as the classes in `agent.ops`) is generic, which means:

- `Agent` is designed to support arbitrary types
- Subclasses can precisely specify state types, making the code more readable
If you are new to Python generics (`typing.Generic`), please read about them in the Python documentation. Below is how to specify an agent with a custom state type:
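A sketch of a custom state type (assuming `Agent` is exported from `ldp.agent`):

```python
from dataclasses import dataclass, field
from datetime import datetime

from ldp.agent import Agent  # assumed import path


@dataclass
class MyComplexState:
    """Custom state carrying whatever the agent needs between steps."""

    vector: list[float]
    timestamp: datetime = field(default_factory=datetime.now)


class MyAgent(Agent[MyComplexState]):
    """An agent whose state is now type-checked as MyComplexState."""
```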