Needs-based AI: part 1, needs and the main AI loop

(This note is a part of the needs-based AI series)

Main AI Loop

Let’s get right to the meat of the matter: the main AI loop. There are many ways to drive an agent; many games use finite-state machines, or behavior trees, or other approaches. Needs-based AI is an alternative that works as follows.

To pick something to do, the agent looks around the world and works out what can be done with the objects nearby. It makes a list of the possible activities and scores how beneficial each one is. Finally, it picks a good one based on the score, finds the actions that make it up, and pushes them onto its action queue.

The highest-level AI loop looks like this:

Main AI loop:

  • While there are actions in the queue, pop the next one off, perform it, and get the reward
  • If you run out of actions, perform action selection based on current needs, to find more actions
  • If you still have nothing to do, do some fallback actions

That second step, the action selection point, is where the actual choice happens. It decomposes as follows.

Need-based action selection:

  1. Examine objects around you, and find out what they advertise
  2. Score each advertisement based on your current needs
  3. Pick the best advertisement, get its action sequence
  4. Push the action sequence on your queue

The following sections will delve more deeply into each of those steps.
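As a rough illustration, the two lists above could be sketched in Python as follows. The names here (Agent, tick, select_actions, fallback_actions) are made up for this sketch and are not taken from any particular engine. An action is assumed to be a plain callable that performs itself and returns its reward as a dictionary of need name to amount, and action selection is passed in as a function, since its details are covered in the following sections.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

# Hypothetical convention: an action performs itself and returns its
# reward as {need name: amount to refill}.
Action = Callable[[], Dict[str, float]]


class Agent:
    def __init__(self,
                 select_actions: Callable[["Agent"], List[Action]],
                 fallback_actions: Callable[["Agent"], List[Action]]) -> None:
        self.needs: Dict[str, float] = {"hunger": 50.0}  # range [0, 100]
        self.queue: Deque[Action] = deque()
        self.select_actions = select_actions
        self.fallback_actions = fallback_actions

    def apply_reward(self, reward: Dict[str, float]) -> None:
        # Refill each rewarded need, clamped to the [0, 100] range.
        for need, amount in reward.items():
            value = self.needs.get(need, 0.0) + amount
            self.needs[need] = max(0.0, min(100.0, value))

    def tick(self) -> None:
        """One pass of the main loop; performs at most one action."""
        if not self.queue:
            # Ran out of actions: do need-based action selection.
            self.queue.extend(self.select_actions(self))
        if not self.queue:
            # Still nothing to do: use fallback actions.
            self.queue.extend(self.fallback_actions(self))
        if self.queue:
            action = self.queue.popleft()
            self.apply_reward(action())


def eat() -> Dict[str, float]:
    return {"hunger": 30.0}


def idle() -> Dict[str, float]:
    return {}


agent = Agent(
    select_actions=lambda a: [eat] if a.needs["hunger"] < 80 else [],
    fallback_actions=lambda a: [idle],
)
agent.tick()  # selects and performs "eat"
agent.tick()  # nothing worth selecting, so the agent idles
```

Performing one action per tick is a design choice of this sketch; a real game would more likely let a single action run over many frames before popping the next one.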

 

Needs

Needs correspond to individual behavior drivers; for example, the need to eat, drink, rest, and so on. The choice of needs depends very much on the game. The Sims, a simulator of everyday people, borrowed heavily from Maslow’s hierarchy and ended up with a mix of biological and emotional drivers. A different game should pick a set of drivers specific to its own setting.

Inside the engine, needs are typically represented as simple numeric values that decay over time. In this discussion we use the range [0, 100]. Depending on the context, we use the term ‘need’ to describe either the driver itself (e.g. hunger) or its numeric value (e.g. 50).

Needs typically have the semantics of “lower is worse and more urgent”, so that hunger=30 means “I’m pretty hungry,” while hunger=90 means “I’m satiated.” Performing an action then “refills” the need to a higher value.

To simulate needs getting worse and more urgent over time, their values decay at some rate. For example, decreasing the hunger value over time simulates the agent getting hungry if they don’t eat. Performing the “eat” action then refills the value, which makes hunger less urgent again.
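Here is a minimal sketch of such a need in Python. The Need class, its field names, and the linear decay in points per second are assumptions made for illustration; the only things taken from the text are the [0, 100] range, the “lower is more urgent” convention, and the idea of decay and refill.

```python
from dataclasses import dataclass

NEED_MIN, NEED_MAX = 0.0, 100.0


@dataclass
class Need:
    name: str
    value: float       # lower is worse and more urgent
    decay_rate: float  # assumed unit: points lost per second

    def decay(self, dt: float) -> None:
        # Linear decay is an assumption; any decreasing curve would do.
        self.value = max(NEED_MIN, self.value - self.decay_rate * dt)

    def refill(self, amount: float) -> None:
        # Called when an action pays out its reward.
        self.value = min(NEED_MAX, self.value + amount)


hunger = Need("hunger", value=90.0, decay_rate=0.5)
hunger.decay(dt=120.0)  # two minutes without eating: 90 drops to 30
hunger.refill(30.0)     # the "eat" action pays out +30: back up to 60
```

Clamping at both ends keeps the value inside the range no matter how long the agent goes without eating or how generous a reward is.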

 

Advertisements and Discovery/Selection Decoupling

When the time comes to pick a new set of actions, the agent looks at what can be done in the environment around them, and evaluates the effect each option would have on their needs.

Each object in the world advertises a set of action/reward tuples – some actions to be taken, with a promise that they will refill some needs by some amount. For example, a fridge might advertise a “prepare food” action with a reward of +30 hunger, and “clean” with the reward of +10 environment. I will write these as tuples: < “prepare food”, +30 hunger > or < “clean”, +10 environment >, respectively. 

To pick an action, the agent examines the various objects around them, and finds out what they advertise. Once the available advertisements are known, each one is scored, as described in the next section. The agent then picks the best advertisement using the score, and adds its actions to their pending action queue.
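The discovery-then-selection flow might look roughly like this in Python. Advertisement, WorldObject, and pick_advertisement are hypothetical names for this sketch. The scoring function in particular is only a stand-in that counts how much of each promised refill the agent could actually absorb; it is not the scoring described in the next section.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Advertisement:
    action: str               # e.g. "prepare food"
    reward: Dict[str, float]  # promised refill per need, e.g. {"hunger": 30.0}


@dataclass
class WorldObject:
    name: str
    ads: List[Advertisement]

    def advertise(self) -> List[Advertisement]:
        return list(self.ads)


def placeholder_score(ad: Advertisement, needs: Dict[str, float]) -> float:
    # ASSUMPTION: stand-in scoring only. A refill is worth as much as the
    # need has room for, so a nearly full need makes an ad less attractive.
    return sum(min(amount, 100.0 - needs.get(need, 100.0))
               for need, amount in ad.reward.items())


def pick_advertisement(objects: List[WorldObject],
                       needs: Dict[str, float]) -> Optional[Advertisement]:
    # Discovery: ask every nearby object what it advertises.
    ads = [ad for obj in objects for ad in obj.advertise()]
    if not ads:
        return None
    # Selection: score what is available and take the best.
    return max(ads, key=lambda ad: placeholder_score(ad, needs))


fridge = WorldObject("fridge", [
    Advertisement("prepare food", {"hunger": 30.0}),
    Advertisement("clean", {"environment": 10.0}),
])
hungry = pick_advertisement([fridge], {"hunger": 30.0, "environment": 95.0})
full = pick_advertisement([fridge], {"hunger": 95.0, "environment": 50.0})
```

With these numbers the hungry agent goes for “prepare food”, while the well-fed agent in a messier environment prefers “clean”.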

Note that the discovery of what actions are available is decoupled from choosing among them: the agent “asks” each object what it advertises, and only then scores what’s available. The object completely controls what it advertises as available, so it’s easy to enable or disable actions based on object state. This provides great flexibility. For example, a working fridge might advertise “prepare food” by default; once it has been used several times, it also starts advertising “clean me”; finally, once it breaks, it stops advertising anything other than “fix me” until it’s repaired.
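The fridge example could be sketched like this in Python. The Fridge class, the threshold of three uses, and the reward attached to “fix me” are all assumptions made up for illustration; the text only says “several times” and gives no reward for fixing. How the fridge comes to break is left out, so the sketch just exposes a flag.

```python
from typing import Dict, List, Tuple

# An advertisement written as the tuple < action, reward >.
Advertisement = Tuple[str, Dict[str, float]]


class Fridge:
    DIRTY_AFTER_USES = 3  # assumed; the text only says "several times"

    def __init__(self) -> None:
        self.uses_since_cleaning = 0
        self.broken = False  # set by whatever game logic breaks things

    def use(self) -> None:
        self.uses_since_cleaning += 1

    def clean(self) -> None:
        self.uses_since_cleaning = 0

    def repair(self) -> None:
        self.broken = False

    def advertise(self) -> List[Advertisement]:
        if self.broken:
            # A broken fridge offers nothing but its own repair.
            # The reward value here is a placeholder.
            return [("fix me", {"environment": 10.0})]
        ads: List[Advertisement] = [("prepare food", {"hunger": 30.0})]
        if self.uses_since_cleaning >= self.DIRTY_AFTER_USES:
            ads.append(("clean me", {"environment": 10.0}))
        return ads


fridge = Fridge()
for _ in range(3):
    fridge.use()
dirty_ads = [action for action, _ in fridge.advertise()]
fridge.broken = True
broken_ads = [action for action, _ in fridge.advertise()]
```

The agent code never needs to know about any of this state; it only ever sees the list that advertise() returns.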

Without this decoupling, all those choices and possibilities would have to be coded into the agent itself, not just for the fridge but for every possible object in the world. That would be a disaster, and impossible to maintain.

 
