Complex self-driving behaviours emerging from affordance competition in layered control architectures

The deployment of autonomous driving technology is hindered by "corner cases": unusual, nuanced conditions that the self-driving software cannot fully understand and act upon. We argue that some corner cases originate from a "narrow AI" approach, which lacks the general knowledge that humans exploit when dealing with these cases. We propose an alternative that can be seen as a step toward features of Artificial General Intelligence. We exploit the biological principle of affordance competition in layered control architectures to create an artificial agent that realizes emergent, adaptive, and logical behaviors without programming case-specific rules or algorithms. We give six different examples of simple and complex emergent behaviors. For the case study of merge scenarios, we contrast the approach of this paper with an algorithmic solution from the literature. The ideas presented here (if not the whole agent's sensorimotor organization) could be used to improve the robustness and flexibility of self-driving technology.


Introduction
The notion of self-driving vehicles is an old one, but it is still an active field of industrial research. A CB Insights report (CB Insights, 2015-2020) listed more than 40 companies working on autonomous vehicles in 2020.
The main motivations for automating driving are: (1) safety, under the assumption that humans are comparatively poor drivers and that driving can indeed be automated; (2) providing new mobility services, to reduce traffic congestion, energy consumption, and pollution, and to serve people who cannot drive.
For decades, the achievement of self-driving vehicles has been a mirage; it has become closer recently (but not without issues), in parallel with the rapid advancement of Artificial Intelligence (AI) powered by deep neural networks.

Autonomous vehicles within AI
It is natural to ask what role AI plays in research on autonomous vehicles (AV); more specifically, whether AV technology can be considered a step toward general intelligence or remains an instance of narrow AI. To meet the safety goal (point 1), these systems should be significantly more reliable than human drivers. That would raise the bar to the level of one death every 1-10 billion miles (which still means about 400 deaths per year in the EU), or even more.
However, the recent development of autonomous driving technology appears to be impeded by the continued emergence of many edge and corner cases (Adams, 2020; Anderson, 2020; Arieff, 2021; Eliot, 2021; The New York Times, 2021a, 2021b). The prototypes of driverless vehicles show a long tail of unexpected situations in which cars are unable to act (Boggs, Arvin, & Khattak, 2020; Dixit, Chand, & Nair, 2016; Lv et al., 2018), or act dangerously (Marshall & Davies, 2019; US National Transport Safety Board, 2019).
The difference between narrow and general intelligence becomes crucial when dealing with edge and corner cases. During unexpected situations and emergencies, the most appropriate reaction might come from a broad knowledge of objects in the world and an intuitive cognition of the physics that rules their motion and interactions (we give an example in Section 2.1). However, note that there is no interest in developing AVs capable of performing other tasks in addition to driving. For this reason, AV technology may never fully belong to a strict definition of AGI. Still, we believe that advanced autonomous driving systems should sit at an intermediate point of the intelligence spectrum, between narrowly specialized AI models and AGI.

Paper contribution
The work presented here attempts to move from narrow AI toward AGI by adopting certain architectural principles of AGI, with the objective of better coping with driving situations that require high-level intelligence.
The industrial approach to dealing with corner cases relies on large-scale research and development programs (Connected Automated Driving, 2021) that combine experiments, simulation, and standardization activities; for example, the PEGASUS (Project PEGASUS, 2019) and HEADSTART (EU project HEADSTART, 2019) projects in the EU.
This paper presents an agent's sensorimotor architecture that produces polite behavior as an emergent feature of the sensorimotor system. We speculate that such a property may ease the burden of dealing with corner cases.
Among the various approaches to AGI (Goertzel, 2014), the agent presented here loosely embraces three of them. First, the agent takes inspiration from several features of brain organization regarding perception and action selection. Second, we are in agreement with the ''emergentist approach'', in which internal representations of concepts and behavioral schemes emerge from lower-level dynamics. Third, our agent follows the embodied perspective, in which intelligence is something that physical moving agents do in physical environments.
The novel contribution is found in Section 4. It is preceded by a summary of the agent's architecture in Section 3, which was published previously, but is summarized here to the extent necessary to explain the contribution of Section 4.

Paper organization
In the next Section, we review the two main approaches to AV that follow the narrow AI account: in Section 2.1, the approach based on an engineering decomposition of the overall system into sequential modules (e.g., sense-think-act); in Section 2.2, the opposite approach known as end-to-end. We argue that both are brittle and harbor seeds for corner cases.
In Section 2.3 we present the layered control architecture, a biological solution capable of generating emergent adaptive behavior, with successful robotics applications. We clarify how our layered control architecture with affordance competition works in Section 3. Finally, Section 4 constitutes the core of the paper. It gives six different examples of emergent behaviors, in most cases for non-trivial scenarios (when possible, recalling examples of the same scenarios handled in a traditional way). Multimedia materials support the demonstrations.
The sense-think-act approach

From an engineering perspective, the scheme of Fig. 1 is a convenient decomposition of a system's functions, which works fine when the system states can be modeled perfectly beforehand (which is the case for many engineered systems).
However, it shows weaknesses when faced with situations that cannot be perfectly specified in advance. In this model, a designer decides how to ''represent'' the world and, in particular, the symbols used for that representation (for example, the classes of objects, such as cars, pedestrians, etc.). These symbols are abstract labels (Keijzer, 2002) that do not contain any information about how real things work. The self-driving behaviors (what to do with given classes) are programmed by a human designer.
The reasons why we argue that the scheme of Fig. 1, as an artificial cognitive system, is brittle are at least twofold: (1) the inherent deficiency of narrow-AI and (2) the complexity of programming every behavior.

Inherent narrow-AI deficiency
Fig. 2 gives an example of a typical outcome of a narrow-AI perceptual system that turns out to lack important information. It shows the output of a state-of-the-art semantic segmentation network, where the object labels lack the kind of inferential meaning representation that an AGI system would provide.
The front minivan is classified as a vehicle no different from the others. Information about the dangerousness of the minivan's load, which could fall onto the road through the open door, is lost. The reason is that the neural network's output symbols have been predefined without considering this unusual situation (and many others). Even if we could retrain the network with one additional class for the special vehicle (which would be a considerable undertaking), a quick Internet search reveals an endless variety of hanging-load situations, some requiring urgent action and others posing little danger. Therefore, the class in itself still says nothing about which behavior is required: the meaning of the symbols and, consequently, action planning, is left to a human programmer with limited cues about what to do for virtually infinite variants of unforeseen situations.

Complexity of behavior programming
Even if the output coding of the perceptual system included some meaningful content about objects, decision-making could still be problematic. Behavior selection and optimization are typically non-convex, and many special maneuvers are often handled by systems of rules or ad hoc algorithms. For example, negotiation in a merge scenario is solved with a specific algorithm (which requires the creation of a phantom obstacle) in Kreutz and Eggert (2021). For the same courtesy-merge example, the complexity related to behavior generation may also be glimpsed in a recent Ph.D. thesis (Menéndez-Romero, 2021), which shows how the function breaks down into a plethora of subcases.

End-to-end architectures
A recent alternative to the traditional sequential decomposition of autonomous driving tasks lies at the extreme opposite. End-to-end learning approaches train a (deep) neural network controller from expert driver demonstrations. A notable demonstration by NVIDIA is the lane-keeping function (Bojarski et al., 2016), where a deep neural network realizes a sensorimotor loop that produces control commands in response to raw camera data. The network learns patterns that could be considered symbols created for the purpose, rather than a rigid set of symbols as in the sense-think-act paradigm (Bojarski et al., 2017).
However, because of the narrow AI scope, the most appealing feature of the end-to-end strategy (dispensing with programmed internal representations and algorithms) is also a major source of trouble. It is impossible to acquire implicit knowledge of objects and world phenomena from the single narrow task of driving. The end-to-end approach struggles with the vastness and diversity of the training set necessary ''to train a generalizable model which can drive in all different environments'' (Bansal & Ogale, 2018), which is related to another recently discovered issue: so-called causal confusion, i.e., the inability to grasp the ''causal structure of the interaction between the expert and the environment'' (De Haan, Jayaraman, & Levine, 2019).
Finally, human actions are a superintended choice between different affordances (Marti, Morice, & Montagne, 2015; Pezzulo & Cisek, 2016). This level of explanation is lacking in end-to-end approaches: it remains unclear which alternative actions have been evaluated and why they have been discarded.
Layered control architectures

At any given time, the many potential actions that an agent detects in the environment (also called affordances; Gibson, 1986) are simultaneously primed by the violet stack, creating a large pool of opportunities (the light orange circle).
The optimal action is then selected from the pool (action selection) through a robust centralized competition process (Cisek, 2007). The pool of actions instantiated before the final choice explains which alternative affordances were considered.

Our implementation of layered control
We have realized a self-driving agent with a layered control architecture, which is described in detail in Da Lio, Donà, Rosati Papini, and Gurney (2020).
For the reader's convenience, we summarize its operation here. However, we remark that this summary should not be considered a complete, self-contained description of the agent. The description is limited to what is necessary to understand the novel contributions of the paper, which are presented in Section 4. Appendix A gives further details linked to the previous literature.
Technically speaking, the layers of the subsumption stack are ''inverse models'' that link the sensory effect to the motor command that causes it. In principle, they can be learned via generalized motor babbling, i.e., by producing exploratory motor commands and observing the sensory effects. Examples of bootstrapping subsumption architectures in this way were studied in the European projects COSPAL (EU project COSPAL, 2007) and DIPLECS (EU project DIPLECS, 2010). In biology, the learning of cerebellar inverse models from sensory-motor pairs is explained in Porrill, Dean, and Anderson (2012).
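As a purely illustrative sketch of this idea (and not of the agent's actual models), the following Python snippet performs motor babbling on a hypothetical one-dimensional plant, records the command-effect pairs, and fits a regressor that maps a desired sensory effect back to the command that produces it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plant: a lateral command u produces a lateral displacement y
# through a mildly nonlinear gain (a stand-in for the real vehicle dynamics).
def plant(u):
    return 1.8 * u - 0.6 * u ** 3

# Motor babbling: issue exploratory commands and observe their sensory effects.
u_babble = rng.uniform(-1.0, 1.0, size=500)
y_babble = plant(u_babble) + rng.normal(0.0, 0.01, size=500)

# Inverse model: fit the mapping y -> u (any function approximator would do;
# a low-order polynomial is enough for this toy plant).
inverse_model = np.poly1d(np.polyfit(y_babble, u_babble, deg=5))

# Query the inverse model: which command yields a desired displacement?
y_goal = 0.5
u_cmd = inverse_model(y_goal)
print(f"command {u_cmd:.3f} -> achieved effect {plant(u_cmd):.3f} (goal {y_goal})")
```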
In driving environments, testing actions in the real world may be dangerous. In the EU Dreams4Cars project (EU project Dreams4Cars, 2019), we used an indirect approach that synthesizes the inverse models via ''mental simulation'' processes (Da Lio, Donà, Rosati Papini, Biral, & Svensson, 2020a), that is, by interacting in a safe sandbox created with pre-learned forward models (the agent's offline operation). The process is inspired by human mental synthesis, which occurs in various mental states, including when thinking about new actions and during sleep and dreams.
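A minimal sketch of the sandbox idea, under invented models and quantities, is given below: a pre-learned forward model stands in for the real world, candidate commands are rolled out inside it at no risk, and an inverse mapping (desired effect to command) is distilled from the simulated pairs.

```python
import numpy as np

# Pretend this forward model was learned offline from safely collected data:
# it predicts the lateral offset reached after one second for a curvature-rate
# command r at a given speed (a hypothetical kinematic-like relation).
def learned_forward_model(r, speed=20.0, horizon=1.0):
    return 0.5 * speed * r * horizon ** 2

# "Mental simulation": roll out many candidate commands inside the sandbox,
# with no real-world risk, and record the predicted sensory effects.
r_candidates = np.linspace(-0.01, 0.01, 201)
y_predicted = learned_forward_model(r_candidates)

# Distil an inverse model from the simulated pairs: for a desired offset,
# return the command whose predicted effect is closest to it.
def inverse_model(y_goal):
    return r_candidates[np.argmin(np.abs(y_predicted - y_goal))]

print(inverse_model(0.05))   # curvature-rate command for a 5 cm offset in 1 s
```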

Online operation
During action (the agent's online operation), the inverse models in the subsumption stack respond to different affordances offered by the environment. For example, in Fig. 4, the sensorimotor model labeled ''lane follow'' reacts to the affordance corresponding to traveling in the lane ($a_1$). The inverse model labeled ''lane change left'' reacts to the possible action of traveling in both the current and the left lane ($a_2$). Each inverse model estimates the salience $S_a(r_0, j_0)$ for affordance $a$ as:

$$S_a(r_0, j_0) = \max_{\substack{r(t),\, j(t) \\ r(t_0)=r_0,\; j(t_0)=j_0}} R_a\!\left[r(t), j(t)\right] \qquad (1)$$

where $r(t)$ and $j(t)$ are the lateral and longitudinal controls, $t$ is the discrete time, and $R_a[\cdot]$ is the reward for using $r(t), j(t)$ for affordance $a$, beginning with the environment and the state of the vehicle at time $t_0$.

Eq. (1) emphasizes the choice of the instantaneous control $r_0, j_0$ over the future $r(t), j(t)$: $S_a(r_0, j_0)$ is a topographic encoding of action values that gives the value of choosing $r_0, j_0$ for each affordance $a$.
Fig. 4 shows that each inverse model is made up of excitatory and inhibitory loops. The former activate regions of the $r_0, j_0$ plane corresponding to free navigable spaces. The latter suppress subregions that correspond to close encounters (yellow) or collisions with obstacles (red); see Da Lio et al. (2020, Section III.C).
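To make Eq. (1) and the excitatory/inhibitory loops more concrete, the following minimal Python sketch builds a toy salience map for a single ''lane follow'' affordance on a grid of instantaneous controls $(r_0, j_0)$: an excitatory term favors staying centred while gently approaching the speed limit, and a predicted lead vehicle carves an inhibited region out of the accelerating part of the map. The grid ranges echo those of Fig. 5, but the functional shapes and constants are illustrative assumptions, not the published implementation.

```python
import numpy as np

# Grid of instantaneous controls: lateral curvature rate r0 and longitudinal jerk j0.
r0 = np.linspace(-0.009, 0.009, 61)   # 1/m/s
j0 = np.linspace(-2.0, 2.0, 41)       # m/s^3
R0, J0 = np.meshgrid(r0, j0)

# Excitation for the "lane follow" affordance: highest for controls that keep
# the vehicle centred (r0 ~ 0) and gently increase speed (small positive jerk).
excitation = np.exp(-(R0 / 0.004) ** 2) * np.exp(-(J0 - 0.5) ** 2)

# Inhibition: a predicted lead vehicle makes every non-decelerating choice
# unsafe, so the corresponding region of the control plane is suppressed.
inhibited = J0 > -0.25
salience_lane_follow = np.where(inhibited, 0.0, excitation)

# The surviving peak is the best instantaneous control for this affordance:
# a decelerating choice emerges although no "car following" rule was written.
k = np.unravel_index(np.argmax(salience_lane_follow), salience_lane_follow.shape)
print("chosen control: r0 =", round(R0[k], 4), "1/m/s, j0 =", round(J0[k], 2), "m/s^3")
```

Maps of this kind are computed for every affordance in the stack before the aggregation step described next.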
For action selection, the individual $S_a$ are aggregated into a single $S$ function:

$$S(r_0, j_0) = \max_a \left( w_a\, S_a(r_0, j_0) \right), \quad a \in \text{affordances}, \qquad (2)$$

where $w_a$ are weights that can steer affordance selection. We have shown how this weighting mechanism can be used to navigate the landscape of affordances proactively and implement legal rules in Da Lio et al. (2020). The final action selection is carried out with the multi-hypothesis sequential probability ratio test (MSPRT) algorithm (Baum & Veeravalli, 1994; Dragalin, Tartakovsky, & Veeravalli, 1999), which maximizes the probability of making the highest-salience choice in noisy situations and within a maximum decision time constraint.
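The aggregation of Eq. (2) and the final selection stage can be sketched as follows. The per-affordance saliences, the weights, the noise model, and the stopping rule below are illustrative assumptions; in particular, the loop is only a bare-bones, race-style approximation of an MSPRT, stopping when the leading option outgrows the runner-up by a fixed margin or when the decision time limit expires.

```python
import numpy as np

rng = np.random.default_rng(2)

# Eq. (2): per-affordance salience maps S_a(r0, j0) are combined with weights
# w_a into a single decision map S(r0, j0) by a weighted maximum.
S_lane_follow = np.array([[0.20, 0.60], [0.10, 0.50]])   # toy 2x2 control grids
S_change_left = np.array([[0.30, 0.40], [0.70, 0.20]])
w = {"lane_follow": 1.0, "change_left": 0.8}
S = np.maximum(w["lane_follow"] * S_lane_follow, w["change_left"] * S_change_left)
print("aggregated map S:\n", S)

# Simplified MSPRT-style race between the candidate actions: accumulate noisy
# evidence for each option and stop as soon as the leader's margin over the
# runner-up exceeds a threshold, or when the maximum decision time is reached.
candidates = {"keep_lane": 0.60, "change_left": 0.56}    # mean noisy salience
threshold, max_steps, noise = 1.0, 200, 0.2
evidence = {name: 0.0 for name in candidates}
for step in range(max_steps):
    for name, mean in candidates.items():
        evidence[name] += rng.normal(mean, noise)
    ranked = sorted(evidence, key=evidence.get, reverse=True)
    if evidence[ranked[0]] - evidence[ranked[1]] > threshold:
        break
print("selected:", ranked[0], "after", step + 1, "steps")
```

If no option reaches the margin within the time limit, the current leader is taken, mirroring the maximum decision time constraint mentioned above.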

Emergent behaviors
This section demonstrates emergent behaviors. These can be quite complex and resemble what one might logically expect. The section presents six examples (simulated in IPG CarMaker and post-processed in Wolfram Mathematica): merge, follow and overtake, reaction to cut-in, incorrect pedestrian crossing, traffic not giving way, and navigating an unusually wide lane with parallel traffic and an obstacle. These examples are presented in a supplementary video, and the salient moments are commented on here. When possible, we compare them with traditional sense-think-act solutions.

Format of videos
Fig. 5 shows the format of the videos. There are several data visualizations organized in panes.

On the top left, the bird's eye view pane shows the legal corridors (the portion of the road that the self-driving vehicle is legally allowed to use) and other objects on the road. On the top right, there is the corresponding camera view showing the whole road (in this example, the legal corridor is the rightmost lane) and the same objects present in the bird's eye view.

In the upper middle, a panel representing the longitudinal intention/state of the self-driving agent is shown. It shows four alternative conditions: (a) free flow, if the vehicle's longitudinal control is not limited by objects, legal speed limits, or curves; (b) car-following, when the longitudinal control is constrained by the need to comply with an obstacle ahead (not necessarily a car); (c) speed limit, when the speed must be adapted to the next legal limit; (d) curve, when the speed must be reduced for an upcoming curve. The current state is highlighted.

At the bottom left, there is a panel concerned with the lateral state/intention. The panel shows three distinct affordances: remaining in the current legal corridor or moving to an adjacent corridor to the left or right. In the example, there are no legally affordable corridors on either side; hence, the squares are gray. When there are choices (see Section 4.3, frame labeled ''3''), they are shown here. The chosen affordance is highlighted and the others are dimmed.¹ For every affordable corridor, there are two sub-panels. The bottom one shows a representation of the 3D salience map (the top one is an intermediate step in the computation of the lateral salience map, which is related to the estimated probability of time-to-lane-crossing (TTLC) given a particular lateral control choice; see Appendix).

On the right, there are two final panes. In the middle, the lateral salience visualizes the range of lateral control for remaining in the corridor. At the bottom, a chart shows the inhibitions (absolute in red, partial in yellow) produced by the obstacles shown in the bird's eye and camera views. The inhibited regions indicate the combinations of lateral and longitudinal control at risk of collision. The lateral salience map and the inhibition map refer to the selected affordance (highlighted in the lateral state pane). The lateral salience is the lateral cross section of the 3D salience map. The inhibition map is a contour density map of the 3D salience, where red means zero salience. The green-white boundary on the inhibition map visualizes the position of the maximum longitudinal salience.
A tiny blue circle indicates the instantaneous longitudinal and lateral control selected by the agent. This choice, propagated backward in the lateral state and longitudinal state panels, identifies which loop in the subsumption architecture corresponds to the chosen control, i.e., the agent's longitudinal and lateral intentions (the reactivation of the subsumed loops permits the generation of behavior in every detail).
An interesting observation related to the inhibition chart is that the inhibited neurons (in a neural implementation) corresponding to the obstacles may influence the agent's decision: if the inhibitions are close enough to a candidate choice, they reduce its likelihood of being selected (close actions reinforce each other, whereas distant actions compete against each other). Between choices with close inhibitions and similarly valued options without inhibitions in the neighborhood, the agent will tend to prefer the latter, achieving robust behavior.
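One simple way to picture this effect, under an exponential discount law assumed here purely for illustration, is to penalize each candidate action by its proximity to the nearest inhibited control, so that an option with inhibitions nearby loses ground to an equally valued option surrounded by free space.

```python
import numpy as np

# Two candidate lateral controls with identical raw salience; one lies next to
# an inhibited (collision-risk) region of the control space, the other does not.
inhibited_r0 = np.array([0.008, 0.009])            # inhibited lateral controls
candidates = {"near_inhibition": 0.007, "in_free_space": 0.002}
raw_salience = 0.80

def discounted_salience(r0, scale=0.005):
    # Hypothetical proximity penalty: the closer the nearest inhibited control,
    # the more the candidate's salience is reduced.
    d = np.min(np.abs(inhibited_r0 - r0))
    return raw_salience * (1.0 - np.exp(-d / scale))

for name, r0 in candidates.items():
    print(name, round(discounted_salience(r0), 3))
# The free-space option wins although the raw saliences are identical.
```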

Emergent merge behavior
Fig. 6 shows three salient moments of the merge video example. In frame ''1'', two vehicles on the motorway (not yet shown in the camera view) are on a collision course with the self-driving vehicle: they produce inhibitions that approach the action chosen by the agent. In frame ''2'', the inhibitions of the obstacles overlap the range of lateral control that permits the vehicle to remain in the lane. The only option for the agent to stay in the lane is to choose a longitudinal control toward the bottom of the chart, which means decelerating. The selected action is also shown in the 3D salience chart. The chosen longitudinal control is at the bottom edge of the partial inhibition, automatically producing an emergent car-following behavior. As the gap opens, the self-driving vehicle resumes speed, approaching the speed limit, and merges onto the motorway (''3''). Similar examples in the literature are Kreutz and Eggert (2021) and Menéndez-Romero (2021), which use elaborate algorithms to produce analogous behavior.
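The mechanism can be caricatured in a few lines of Python: at every step the agent picks the largest longitudinal control that is not inhibited, and the inhibition cast by the predicted position of the lead vehicle alone forces deceleration when the gap closes and releases it when the gap opens. The dynamics, prediction horizon, and safety margin below are invented for illustration and are far simpler than the agent's models.

```python
import numpy as np

dt = 0.5
speed_ego, speed_lead, gap = 25.0, 20.0, 60.0    # m/s, m/s, m
accels = np.linspace(-3.0, 2.0, 26)              # candidate longitudinal controls

for _ in np.arange(0.0, 15.0, dt):
    # Inhibition: forbid any control whose 3 s extrapolation (lead at constant
    # speed, ego at constant acceleration) would shrink the gap below 10 m.
    horizon, safe_gap = 3.0, 10.0
    future_gap = gap + (speed_lead - speed_ego) * horizon - 0.5 * accels * horizon ** 2
    allowed = accels[future_gap > safe_gap]

    # Excitation: prefer the largest allowed control (approach the speed limit).
    a = allowed.max() if allowed.size else accels.min()

    speed_ego = max(0.0, speed_ego + a * dt)
    gap += (speed_lead - speed_ego) * dt

print(f"ego speed {speed_ego:.1f} m/s vs lead {speed_lead:.1f} m/s, gap {gap:.1f} m")
# The ego settles near the lead's speed at a safe gap: following emerges even
# though no car-following rule was ever written.
```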

Follow and overtake
Fig. 7 shows three salient frames of the follow and overtake example. At ''1'', after completing the approach phase, the self-driving vehicle enters the car-following state (the selected action touches the obstacle inhibition and it would not be possible to travel faster). The two vehicles ahead proceed at the same speed (their longitudinal inhibitions have a matching bottom), and there is no added value in moving to the left lane. The change-left affordance exists but is not selected (dimmed colors). At ''2'', the obstacle on the left accelerates. Immediately, its inhibition in the control space moves upward and the change-left affordance wins (the winner peak is shown in the lateral state panes). Therefore, the intention of the agent changes to the left lane change (the planned trajectory is shown in the bird's eye view), while the longitudinal state remains ''car following'' (now following the left car). When the lane change is completed at ''3'', the agent has two options: staying in the left lane or returning to the right. The latter wins (lanes to the right are slightly biased to favor the legal recommendation to return to the right when possible). The winner peak in the lateral intention pane corresponds to a path that will reenter the right lane after clearing the obstacle to the right, as can be seen in the video. Note that the intention of returning to the right after completing the overtake is formed very early, well before the right obstacle is cleared.

Fig. 5. Format of videos. This graphical interface summarizes the agent's computations using several real-time data charts. The bird's eye view shows a scheme of the relevant objects in the environment; its axis labels are ''x (m)'', the forward direction in meters (range −50 m to 150 m), and ''y (m)'', the lateral direction in meters (range −30 m to +30 m). The longitudinal states box shows the four conditions that affect the speed choice: free flow, speed limit, curve, and car-following. The inhibition map is the aggregated salience chart $S(r_0, j_0)$ of Fig. 4; its axis labels are ''r (1/m/s)'', the lateral control expressed as trajectory curvature rate (range −0.009 to 0.009 m⁻¹/s), and ''j (m/s³)'', the longitudinal (jerk) control (range −2 to +2 m/s³). The lateral states box shows $S_a(r_0, j_0)$ for the three legal lanes. The probabilistic TTLC (time-to-lane-crossing) subgraph shows the time at which the probability of leaving the corridor in the absence of corrections exceeds two thresholds (see Appendix); its axis labels are ''r (1/m/s)'' and ''T (s)'' (range 0 to 6 s). The 3D salience map shows $S_a(r_0, j_0)$ with the elevation axis (labeled ''1/J'') being the salience (see Appendix). Finally, the lateral salience map shows the lateral salience as a function of $r_0$ (see Appendix).

Reacting to cut-in
Fig. 8 illustrates the reaction to a cut-in maneuver. At ''1'', a vehicle arriving from the rear in the fast left lane is detected first. It blocks the lane change option (only the trajectories that do not complete the lane change survive on the salience map of the left panel².) At ''2'', the fast vehicle frees the lane: the salience map for the left lane change widens, now including complete lane change maneuvers. However, the agent does not move leftwards, as the current lane is free. At ''3'', the obstacle decelerates and moves toward our lane (cut-in). The winning action is now the lane change: at ''3'', as soon as the predicted obstacle path indicates a cut-in, the agent plans to overtake. The rest of the video shows the completion of the overtake maneuver with a return to the right lane, similar to the example in Section 4.3.

Pedestrian crossing incorrectly
Corner cases may arise from the incorrect behavior of other road users (it is difficult to imagine every possible incorrect behavior). In this example, a pedestrian crosses the road when the traffic light turns red for the pedestrian and green for the self-driving vehicle. The salient frames are shown in Fig. 9. At ''1'', the self-driving vehicle stops because of the inhibition produced by the red traffic light. Crossing traffic produces additional inhibition in the same motor space region. When the traffic light turns green (''2'') and the crossing traffic stops, that motor region would become free, allowing the vehicle to start. However, the movement of the pedestrian keeps the region inhibited (in fact, the path of a moving entity is predicted regardless of whether its behavior is correct, as explained in the Appendix). Thus, the self-driving vehicle does not start despite the green traffic light. We had the chance to compare the same situation with traditional self-driving software; the comparison is shown in EU project Dreams4Cars (2018). The traditional system starts when the traffic light turns green and brakes when the pedestrian is about to enter the lane. This example does not mean that every traditional self-driving system will behave the same way in that situation. However, it indicates that rule-based systems may be fragile and rigid.

Traffic not giving way
Fig. 10 shows a similar case: the crossing traffic does not give way. A safe stop maneuver emerges from the inhibition caused by the crossing vehicles.

Unusually wide lane with parallel traffic and an obstacle
The situation shown in Fig. 11 combines a few unusual elements. The lane is 25 m wide. There is a stationary vehicle far ahead in the lane (with plenty of room to pass on either side). Two vehicles follow the self-driving vehicle closely on both sides, at slightly different distances. The traditional self-driving software used for the pedestrian example (Section 4.5) would stop the car behind the stopped vehicle because the lane is occupied (without considering the unusual width of the lane). Such behavior is dangerous because people would not expect it (again, this holds for the particular implementation that we could use for Section 4.5 and here). Common sense would suggest letting the closest rear car on the left pass and then moving to the left to pass the stopped obstacle. This behavior is emergent and is explained with Fig. 11 and Fig. 12. The vehicles start from rest. At low speed (time 5 s in Fig. 12) the entire action space is affordable. The agent accelerates under free-flow conditions, remaining in the center of the lane (the vehicle speed is visible on the tachometer in the camera view). At about 8.4 s, the nearby vehicles, with increased speed, restrict the affordable control choices. At about 8.5 s the front obstacle is detected, which divides the affordable space into two parts ($a_1$ and $a_2$). Between 8.5 s and 8.8 s (frame ''1'' in Fig. 11 lies between the two times) the agent opts for the faster $a_1$, corresponding to passing in front of the right car and to the right of the obstacle. However, the vehicle on the right is getting closer until, at time 9.6 s, $a_1$ vanishes. At about time 10 s the agent reverts the decision to $a_2$, which is tailing (and stopping) behind the front obstacle (frame ''2''). As the speed decreases, at time 12 s the left vehicle is fast enough to create another affordance, $a_3$ (frame ''3''), which corresponds to tailing the left vehicle and passing to the left of the obstacle (note that the left vehicle is still on the side, but the motor space is based on future predicted positions and therefore anticipates possible actions).

Discussion and conclusion
Section 3 presented a self-driving agent characterized by the creation of emergent behaviors, and we gave six examples of emergent behaviors (from Section 4.2 to Section 4.7). The principles used here are borrowed from biology and can be seen as a proposed step toward AGI, an alternative to the traditional narrow AI approaches mentioned in Section 2. The implementation of the agent, as described in Section 3.1, is limited to the strict extent necessary for this paper; a complete description is found in Da Lio et al. (2020). Here, we want to comment on two points: (1) the biological principles exploited in the agent and their implications for the agent's operation;
(2) how this approach compares to those based on narrow AI, and what the advantages might be for development/maintenance and scalability.

Biological principles exploited in the agent
Cisek's Affordance Competition hypothesis (Cisek, 2007) is the main idea. Parallel action priming and selection through a competitive process is the theoretical element that informs the agent's sensorimotor system (compare Fig. 4 with Cisek (2007, Fig. 1)). Continuous competition between affordances forms the basis for emergent adaptive behavior.
Another important idea is the topographic organization of the motor space, where affordances are encoded by hills of activity whose heights encode the action value (or salience, Eq. (2)). Trajectories that begin with similar controls lie close to each other in the motor space and reinforce each other. Hence, by selecting one particular instantaneous control, the agent commits to a family of future maneuvers (beginning with a similar control and possibly diverging later) rather than selecting just one. This minimum-commitment principle further maximizes adaptability because alternative maneuvers with high values may exist close to the highest-valued one.
Finally, the whole system is organized hierarchically (Fig. 4), with different branches dedicated to high-level goals. The branches are aggregated into a unique action-value map $S(r_0, j_0)$ with weights that allow one to navigate the affordance landscape according to Pezzulo's hypothesis (Pezzulo & Cisek, 2016). This allows the agent's behavior to be guided by high-level directives, effectively modulating the value function $S(r_0, j_0)$. This flexibility (i.e., a modulable reward function) does not normally exist in traditional reinforcement learning implementations. Even if not shown here, we must mention that this mechanism was used for the implementation of traffic regulations (Da Lio et al., 2020), long-term cautious behavior (Rosati Papini et al., 2022), and human-agent interactions (Da Lio et al., 2022b). Thus, the emergent behavior characteristic of this agent does not prevent controlling the agent's higher-level behaviors with directives (or rules), for example, to comply with a highway code.
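A toy illustration of this biasing mechanism is given below: the same per-affordance saliences, combined with different weights $w_a$ of Eq. (2), yield different high-level choices, so a directive such as ''keep right when possible'' can be encoded without touching the sensorimotor loops. All names and numbers are invented.

```python
# Peak saliences computed by the sensorimotor stack for each affordance (toy values).
salience = {"keep_right_lane": 0.70, "change_to_left": 0.72, "change_to_right": 0.0}

def select(weights):
    scored = {a: weights.get(a, 1.0) * s for a, s in salience.items()}
    best = max(scored, key=scored.get)
    return best, scored

# Neutral weights: the marginally better left-lane option wins.
print(select({}))

# A high-level directive ("keep right when possible") reshapes the weights;
# the very same salience maps now yield the rule-compliant choice.
print(select({"keep_right_lane": 1.05, "change_to_left": 0.90}))
```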

How does this approach compare to narrow AI ones
Most traditional motion planning methods work by selecting one trajectory from a pool of alternatives, although without the sophistication of the above principles. In particular, while local trajectory optimization is obtained with various methods (among which optimal control), the choice among alternative higher-level behaviors is typically produced with ad hoc algorithms.
As an example, consider the merge scenario of Section 4.2. An algorithm to deal with this situation was mentioned earlier (Kreutz & Eggert, 2021). It adapts the Intelligent Driver Model (IDM) to modulate the speed of a vehicle on a ramp entering the main road. The paper proposes a modified IDM (GIDM) and shows, with various performance indicators, that it regulates vehicle entrance effectively. Although the ramp is adjacent to the road, the vehicle enters the road only at its end; the situation is therefore equivalent to that of Section 4.2. We replicated the study with our agent, performing simulations spanning the same conditions described in the paper (Kreutz & Eggert, 2021, Section VI.B), and found that a fine adaptation of the speed on the ramp is automatically obtained for all the conditions considered. For example, Fig. 13 shows the time-to-collision (TTC) after the merge obtained with different relative longitudinal positions of the vehicle and the obstacle and with different relative speeds. The graph is directly comparable to Kreutz and Eggert (2021, Fig. 7). The top-left region corresponds to the agent merging in front of the obstacle. The bottom-right region corresponds to the agent merging behind (this latter part is not considered in Kreutz and Eggert (2021)). The white color occurs when the tailing vehicle (either the agent or the obstacle) is slower than the leading vehicle (so that a TTC does not exist), or when the TTC is greater than 20 s.
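For reference, the TTC metric underlying Fig. 13 can be computed as in the sketch below. It reproduces only the conventions stated above (TTC is undefined when the tailing vehicle is not faster than the leading one, and values above 20 s are left blank); the function name and the sampled values are hypothetical and do not represent the agent's controller.

```python
import numpy as np

def ttc_after_merge(gap, closing_speed, cap=20.0):
    """Time-to-collision between tailing and leading vehicle after the merge.

    gap: bumper-to-bumper distance (m) once the merge is completed.
    closing_speed: tailing-vehicle speed minus leading-vehicle speed (m/s).
    Returns np.nan (left blank in the map) when the tailing vehicle is not
    faster, or when the TTC exceeds the cap.
    """
    if closing_speed <= 0.0:
        return np.nan
    ttc = gap / closing_speed
    return ttc if ttc <= cap else np.nan

# Sweep a few relative positions and relative speeds, as in the comparison map.
for gap in (5.0, 15.0, 30.0):
    for dv in (-2.0, 1.0, 4.0):
        print(f"gap={gap:5.1f} m  dv={dv:+.1f} m/s  TTC={ttc_after_merge(gap, dv)}")
```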
This example shows that the principles commented on in Section 5.1 may be an alternative to programmed algorithms. Algorithms are based on schematic situations and may have limitations. For example, as discussed in Kreutz and Eggert (2021, Section IV), the algorithm is an adaptation of one-dimensional dynamics to reality and needs various schematizations, which are explained in its subsections B to E. Of course, more complex algorithms are possible, for example Menéndez-Romero (2021), but they are more difficult to develop, test, and validate. Thus, in our view, having behaviors that emerge from biological principles is an attractive perspective.

Limitations of this study and future plans
The main limitation of this study is its anecdotal nature. On the one hand, we show nontrivial emergent behaviors. On the other hand, the limited number of examples given here does not permit the claim that emergent behaviors will be correct in every possible situation (but this problem also holds for algorithms). The testing and validation of self-driving software for long-term driving is a major undertaking for current autonomous driving development programs. For example, the recent EU project SUNRISE (Safety assUraNce fRamework for connected, automated mobIlity SystEms) (EU project SUNRISE, 2022) aims at developing test scenarios and test cases to address validation in the long term, inheriting the findings of many previous projects.
Note that, as explained in the Introduction, the presented work proposes a bridge between narrow AI and AGI, although it cannot fully belong to the latter. This is because the agent must give importance to the restricted set of affordances that are meaningful in the driving context. In the case of a human driver, on the other hand, behaviors can emerge from knowledge of a much wider variety of contexts, in which the social aspect carries significant weight. Despite this limitation, the agent architecture shown here produces behaviors that would otherwise require careful programming (Section 5.2, and Sections 4.5 and 4.7, where the outcome of a traditional software is mentioned).
So what if we discover situations where the behaviors are not acceptable? Algorithmic solutions would call for modification of the algorithms and re-testing. In contrast, the agent described above can exploit the affordance navigation and biasing mechanism (Section 5.1) to learn better high-level choices. An example of how this may work was given in Rosati Papini et al. (2022) for learning the speed choice in the case of distracted pedestrians.

Fig. 2. A driving scenario in which a vehicle with a dangerously open door is confused with the others.

Fig. 4. Dreams4Cars implementation of layered control with affordance competition. In this example, the green car has two options: traveling in the lane ($a_1$) or moving to the left lane ($a_2$). For each option, a map is created, which represents the reward $R_a(r_0, j_0)$ for using the instantaneous control $r_0, j_0$ (respectively, lateral and longitudinal control). The contour lines on the reward map are filled with green. The more intense the color, the higher the value, which peaks around the optimal control choice. Red and yellow represent regions inhibited by obstacles (with zero or diminished value). The value maps are aggregated into a final decision map $S(r_0, j_0)$, weighted by weights $w_a$ that can be used for a variety of purposes described in the text and Appendix A.

Fig. 10. Stop in traffic that does not yield.

Fig. 13. Speed adaptation on ramps obtained as an emergent behavior: time-to-collision (TTC) after merging, with different relative longitudinal positions and different relative speeds. The graph is directly comparable to Kreutz and Eggert (2021, Fig. 7).