Advertisement Scoring
Once we have an object’s advertisement, we need to score it and rank it against all the other advertisements from this and other objects. We score an advertisement based on the reward it promises (e.g. +10 environment) and the agent’s current needs. Of course, it’s not strictly necessary that those rewards actually be granted as promised; this is known as false advertising, and can be used to interesting effect, as described later.
Here are some common scoring functions, from the simplest to the more sophisticated:
a. Trivial scoring
future_value_need = current_value_need + advertised_delta_need
score = ∑_{all needs} future_value_need
Under this model, we go through each need, look up the promised future need value, and add them up. For example, if the agent’s hunger is at 70, an advertisement of +20 hunger means the future value of hunger will be 90; the final score is the sum of all future values.
This model is trivially easy, and has significant drawbacks: it’s only sensitive to the magnitude of changes, and doesn’t differentiate between urgent and non-urgent needs. So increasing hunger from 70 to 90 has the same score as increasing thirst from 10 to 30 – but the latter should be much more important, considering the agent is very thirsty!
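To make the drawback concrete, here is a minimal sketch of trivial scoring. The dictionary layout and the names `trivial_score`, `needs`, `eat` and `drink` are assumptions made up for this illustration, not part of any particular engine.

```python
def trivial_score(needs, advertisement):
    """Sum the future value of every need after applying the advertised deltas.

    needs:         hypothetical dict of need name -> current value
    advertisement: hypothetical dict of need name -> promised delta
    """
    return sum(
        current + advertisement.get(name, 0)
        for name, current in needs.items()
    )


needs = {"hunger": 70, "thirst": 10}
eat = {"hunger": 20}    # promises +20 hunger
drink = {"thirst": 20}  # promises +20 thirst

print(trivial_score(needs, eat))    # 100
print(trivial_score(needs, drink))  # 100
```

Both advertisements come out identical, even though the agent is far thirstier than it is hungry.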
b. Attenuated need scoring
Needs at low levels should be much more urgent than those at high levels. To model this, we introduce a non-linear attenuation function for each need. So the score becomes:
score = ∑_{all needs} A_need(future_value_need)
where A_need is the attenuation function for that need, mapping from a need value to some numeric value. The attenuation function is commonly non-linear and non-increasing: it starts out high when the need level is low, then drops quickly as the need level increases.
For example, consider the attenuation function A(x) = 10/x. An action that increases hunger to 90 will score 1/9, while an action that increases thirst to 30 will score 1/3, so three times higher, because low thirst is much more important to fulfill.
These attenuation functions are a major tuning knob in needs-based AI. We will have more to say about various attenuation functions later.
You might also notice one drawback: under this scheme, improving hunger from 30 to 90 would have the same score as improving it from 50 to 90. Worse yet, worsening hunger from 100 to 90 would have the same score as well! This detail may not be noticeable in a running system, but it’s easy to fix, by examining the need delta as well.
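The numbers in this subsection can be reproduced with a short sketch. It assumes positive integer need values (so that 10/x is defined and exact fractions can be used), and it assumes that only the needs named in an advertisement contribute to its score, which is how the worked example above is computed. The function names are hypothetical.

```python
from fractions import Fraction


def attenuate(value):
    """Example curve A(x) = 10/x; assumes value is a positive integer."""
    return Fraction(10, value)


def attenuated_score(needs, advertisement, attenuation=attenuate):
    # Assumption: only needs named in the advertisement are scored,
    # matching the worked example in the text.
    return sum(
        attenuation(needs[name] + delta)
        for name, delta in advertisement.items()
    )


needs = {"hunger": 70, "thirst": 10}
print(attenuated_score(needs, {"hunger": 20}))  # 1/9
print(attenuated_score(needs, {"thirst": 20}))  # 1/3

# The drawback: only the future value matters, not where we started.
print(attenuated_score({"hunger": 30}, {"hunger": 60}))    # 1/9
print(attenuated_score({"hunger": 50}, {"hunger": 40}))    # 1/9
print(attenuated_score({"hunger": 100}, {"hunger": -10}))  # 1/9
```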
c. Attenuated need-delta scoring
It’s better to eat a filling meal than a snack, especially when you’re hungry, and worse to eat something that leaves you hungrier than before. To model this, we can score based on need level difference:
score = ∑_{all needs} ( A_need(current_value_need) - A_need(future_value_need) )
For example, let’s consider our attenuation function A(x) = 10/x again. Increasing hunger from 30 to 90 will now score 1/3 – 1/9 = 2/9, while increasing it from 60 to 90 will score 1/6 – 1/9 = 1/18, so only a quarter as high. Also, decreasing hunger from 100 to 90 will have a negative score, so it will not be selected unless there is nothing else to do.
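As a rough sketch, need-delta scoring could look like the following. It again assumes positive integer need values and uses exact fractions so the results line up with the arithmetic above; `delta_score` and the dictionary layout are hypothetical names for this illustration.

```python
from fractions import Fraction


def attenuate(value):
    """Example curve A(x) = 10/x; assumes value is a positive integer."""
    return Fraction(10, value)


def delta_score(needs, advertisement, attenuation=attenuate):
    """Sum over all needs of A(current) - A(future)."""
    total = Fraction(0)
    for name, current in needs.items():
        future = current + advertisement.get(name, 0)
        # Needs the advertisement does not touch contribute exactly zero.
        total += attenuation(current) - attenuation(future)
    return total


print(delta_score({"hunger": 30}, {"hunger": 60}))    # 2/9
print(delta_score({"hunger": 60}, {"hunger": 30}))    # 1/18
print(delta_score({"hunger": 100}, {"hunger": -10}))  # -1/90
```

Unlike the previous scheme, the starting level now matters, and an action that leaves the agent worse off gets a negative score.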
Action Selection
Once we know the scores, it’s easy to pick the best one. Several approaches for arbitration are standard: in a winner-takes-all approach, the highest-scoring action gets picked; in a weighted-random approach, we do a weighted random selection from the top n (e.g. top 3) highest-scoring advertisements; other approaches are easy to imagine, such as a priority-based behavior stack.
In everyday implementation, weighted-random selection is a good compromise between having some predictability about what will happen, and not having the agent look unpleasantly deterministic.
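Here is one way the two main arbitration approaches might be sketched. The `(action, score)` tuple layout and the function names are assumptions for this example; so is the choice to clamp negative scores to zero weight, since the text does not say how negative scores should be weighted.

```python
import random


def winner_takes_all(scored):
    """Return the action with the highest score."""
    return max(scored, key=lambda pair: pair[1])[0]


def weighted_random_top_n(scored, n=3, rng=random):
    """Pick randomly among the n best actions, weighted by score."""
    top = sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]
    actions = [action for action, _ in top]
    # Assumption: negative scores get zero weight.
    weights = [max(score, 0.0) for _, score in top]
    if sum(weights) <= 0:
        # Nothing has a positive score, so fall back to the best of a bad lot.
        return actions[0]
    return rng.choices(actions, weights=weights, k=1)[0]


scored = [("eat", 0.5), ("drink", 0.3), ("sleep", 0.1), ("dance", 0.05)]
print(winner_takes_all(scored))       # eat
print(weighted_random_top_n(scored))  # eat, drink or sleep, never dance
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which helps when debugging agent behavior.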
Action Selection Additions
The model described above can be extended in many directions, to add more flexibility or nuance. Here are a few additions that I’ve used in the past, and how they have fared:
Attenuating score based on distance
Given two objects with identical advertisements, an agent should tend to pick the one closer to them. We can do this by attenuating each object’s score based on distance or containment:
score = D( ∑_{all needs} ( … ) )
where D is some distance-based attenuation function, commonly a non-increasing one, such as the physically-inspired D(x) = x / |distance|^2
However, distance attenuation can be difficult to tune, because a distant object’s advertisement will be lowered not just compared to other objects of this type, but also compared to all other advertisements. This may lead to a “bird in hand” kind of behavior, where the agent prefers a much worse action nearby rather than a better one further away.
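A small sketch shows how easily the “bird in hand” effect appears with inverse-square attenuation. The `min_distance` clamp is an assumption added here to avoid dividing by zero when the agent stands on top of the object; the scores and distances are made-up numbers.

```python
def distance_attenuate(score, distance, min_distance=1.0):
    """Inverse-square attenuation, D(x) = x / |distance|^2."""
    # Assumption: clamp very small distances so the score cannot blow up.
    d = max(abs(distance), min_distance)
    return score / (d * d)


feast_far = distance_attenuate(0.9, distance=10.0)    # good action, far away
snack_near = distance_attenuate(0.05, distance=2.0)   # poor action, close by

# The nearby snack now outranks the distant feast.
print(snack_near > feast_far)  # True
```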
Filtering advertisements before scoring
It’s useful to add prerequisites to advertisements: for example, kids should not be able to operate stoves, so the stove should not advertise the “cook” action to them. This can be implemented in several ways, from simple attribute tests, to a full expressive language for Boolean predicates.
From personal experience, I would recommend starting out with something simple, because complex prerequisites are more difficult to debug when there are many agents running around. An easy prerequisites system could be as simple as setting Boolean attributes on characters (e.g. is-adult, etc.), and adding an attribute mask on each advertisement; action selection would only consider advertisements whose mask matches up against the agent’s attributes.
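A simple attribute-mask filter along these lines might look like the sketch below. The `Attr` flags, the `(name, required mask)` tuple layout and the rule that every required bit must be set on the agent are all assumptions for illustration.

```python
from enum import Flag, auto


class Attr(Flag):
    """Hypothetical Boolean attributes that a character can have."""
    IS_ADULT = auto()
    CAN_SWIM = auto()


NO_REQUIREMENTS = Attr(0)


def is_allowed(agent_attrs, required):
    """True if the agent has every attribute the advertisement requires."""
    return (agent_attrs & required) == required


def filter_ads(agent_attrs, advertisements):
    """Keep only the advertisements this agent is allowed to consider."""
    return [name for name, required in advertisements
            if is_allowed(agent_attrs, required)]


ads = [("cook", Attr.IS_ADULT), ("play", NO_REQUIREMENTS)]
kid = Attr(0)
adult = Attr.IS_ADULT

print(filter_ads(kid, ads))    # ['play']
print(filter_ads(adult, ads))  # ['cook', 'play']
```

Filtering happens before scoring, so the stove’s “cook” advertisement never even enters the kid’s list of candidates.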
Attenuation function tuning
Attenuation functions map from low need levels to high scores. Each need can be attenuated differently, since some needs are more urgent than others. As such, they are a major tuning knob in games, but a delicate one, because their effects are global, affecting all agents. Getting them right takes many design iterations, but analytic functions (e.g. A(x) = 10/x) are not easy for designers to tweak or reason about.
I have found the happiest medium to be defining attenuation functions as piecewise-linear functions (i.e. point pairs, rather than analytic formulas) in a spreadsheet file, and loading them during the game.
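One possible shape for this, assuming the spreadsheet is exported as CSV with one `need level, score` pair per row, is sketched below. The file layout, the function names and the choice to hold the curve flat outside the first and last points are all assumptions; the sample points are invented.

```python
import bisect
import csv
import io


def load_points(csv_text):
    """Parse 'need_level,score' rows into a list of points sorted by need level."""
    rows = csv.reader(io.StringIO(csv_text))
    return sorted((float(row[0]), float(row[1])) for row in rows if row)


def make_attenuation(points):
    """Build a piecewise-linear function through the given (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def attenuation(value):
        # Assumption: hold the curve flat outside the defined range.
        if value <= xs[0]:
            return ys[0]
        if value >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_right(xs, value)  # xs[i - 1] <= value < xs[i]
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        t = (value - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

    return attenuation


# Stand-in for a designer-edited spreadsheet, exported as CSV.
HUNGER_CSV = "0,10\n25,4\n50,1\n100,0\n"
hunger_curve = make_attenuation(load_points(HUNGER_CSV))

print(hunger_curve(12.5))  # 7.0
print(hunger_curve(75))    # 0.5
```

A designer can then reshape the curve by moving a handful of points, without touching a formula.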
Tuning need decay
Agents’ need levels should decay over time; this causes agents to change their priorities over time. We can tweak this system by modifying how quickly those needs decay. For example, if an agent’s hunger doesn’t decay as quickly, they will not need to eat as often, and will have more time for other pursuits.
We can use this to model a very bare-bones personality profile, e.g. whether someone needs to eat/drink/entertain themselves more or less often. It can also be used for difficulty tuning: agents whose needs decay more quickly are harder to please.
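As a minimal sketch, per-agent decay could be a table of rates applied on every update. Linear decay, the `floor` clamp and all names and numbers here are assumptions; the text does not prescribe a particular decay curve.

```python
def decay_needs(needs, decay_rates, dt, floor=0.0):
    """Return new need levels after dt seconds of linear decay.

    decay_rates: hypothetical per-agent dict of need name -> points lost per second
    """
    return {
        name: max(floor, value - decay_rates.get(name, 0.0) * dt)
        for name, value in needs.items()
    }


needs = {"hunger": 80.0, "thirst": 60.0}
big_eater = {"hunger": 2.0, "thirst": 1.0}    # gets hungry quickly
light_eater = {"hunger": 0.5, "thirst": 1.0}  # gets hungry slowly

print(decay_needs(needs, big_eater, dt=10.0))    # {'hunger': 60.0, 'thirst': 50.0}
print(decay_needs(needs, light_eater, dt=10.0))  # {'hunger': 75.0, 'thirst': 50.0}
```

After the same ten seconds, the big eater is noticeably hungrier, so food advertisements will start winning sooner for that agent.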
Tuning advertisement scores
The scoring function can also simulate simple personality types directly, by tuning down particular advertisement scores. To do this, we would have each agent contain a set of tuning parameters, one for each need, which modify that need’s score:
new_score_{agent, need} = old_score_{agent, need} * tuning_{agent, need}
For example, by tuning down the +hunger advertisement’s score, we’ll get an agent that has a stronger preference for high-quality food; tuning up a +thirst advertisement will produce an agent that will opt for less satisfying drinks, and so on.
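The effect is easiest to see with a few made-up numbers. In this sketch each advertisement has already been scored per need, and each agent carries a hypothetical `tuning` dictionary with one multiplier per need (defaulting to 1.0); the names and values are assumptions for illustration.

```python
def tuned_score(per_need_scores, tuning):
    """Scale each need's score by the agent's multiplier, then sum."""
    return sum(
        score * tuning.get(need, 1.0)
        for need, score in per_need_scores.items()
    )


def best_action(advertisements, tuning):
    """Winner-takes-all over the tuned scores."""
    return max(advertisements,
               key=lambda name: tuned_score(advertisements[name], tuning))


ads = {
    "snack": {"hunger": 0.4},
    "feast": {"hunger": 0.9},
    "read book": {"fun": 0.3},
}
picky_eater = {"hunger": 0.5}  # +hunger scores tuned down

# Untuned, even a snack beats reading a book.
print(tuned_score(ads["snack"], {}) > tuned_score(ads["read book"], {}))  # True
# Tuned down, the snack drops below the book, but the feast still wins.
print(tuned_score(ads["snack"], picky_eater) < tuned_score(ads["read book"], picky_eater))  # True
print(best_action(ads, picky_eater))  # feast
```

With +hunger tuned down, mediocre food loses out to unrelated activities, so only high-quality food remains competitive.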