Weisfeiler-Lehman goes Dynamic: An Analysis of the Expressive Power of Graph Neural Networks for Attributed and Dynamic Graphs

Graph Neural Networks (GNNs) are a large class of relational models for graph processing. Recent theoretical studies on the expressive power of GNNs have focused on two issues. On the one hand, it has been proven that GNNs are as powerful as the Weisfeiler-Lehman test (1-WL) in their ability to distinguish graphs. Moreover, it has been shown that the equivalence enforced by 1-WL equals unfolding equivalence. On the other hand, GNNs turned out to be universal approximators on graphs modulo the constraints enforced by 1-WL/unfolding equivalence. However, these results only apply to Static Attributed Undirected Homogeneous Graphs (SAUHGs) with node attributes. In contrast, real-life applications often involve a much larger variety of graph types. In this paper, we conduct a theoretical analysis of the expressive power of GNNs for two other graph domains that are particularly interesting in practical applications, namely dynamic graphs and SAUHGs with edge attributes. Dynamic graphs are widely used in modern applications; hence, the study of the expressive capability of GNNs in this domain is essential for practical reasons and, in addition, it requires a new analytical approach due to the difference in the architecture of dynamic GNNs compared to static ones. On the other hand, the examination of SAUHGs is of particular relevance since they act as a standard form for all graph types: it has been shown that all graph types can be transformed without loss of information to SAUHGs with both attributes on nodes and edges. This paper considers generic GNN models and appropriate 1-WL tests for those domains. Then, the known results on the expressive power of GNNs are extended to the mentioned domains: it is proven that GNNs have the same capability as the 1-WL test, that the 1-WL equivalence equals the unfolding equivalence, and that GNNs are universal approximators modulo 1-WL/unfolding equivalence.


Introduction
Graph data is becoming pervasive in many application domains, such as biology, physics, and social network analysis [1,2]. Graphs are handy for complex data since they allow for naturally encoding information about entities, their links, and their attributes. In modern applications, several different types of graphs are commonly used and possibly combined: graphs can be homogeneous or heterogeneous, directed or undirected, have attributes on nodes and/or edges, and be static or dynamic, hyper- or multigraphs [3]. Considering the diversity of graph types, it has recently been shown that Static Attributed Undirected Homogeneous Graphs (SAUHGs) with both attributes on nodes and edges can act as a standard form for graph representation, namely that all the common graph types can be transformed into those SAUHGs without losing their encoded information [3].
In the field of graph learning, Graph Neural Networks (GNNs) have become a prominent class of models used to process graphs and address different learning tasks directly. For this purpose, most GNN models adopt a computational scheme based on a local aggregation mechanism. It recursively updates the local information of a node, stored in an attribute vector, by aggregating the attributes of neighboring nodes. After k iterations, the attributes of a node capture the local structural information received from its k-hop neighborhood. At the end of the iterative process, the obtained node attributes can address different graph-related tasks by applying a suitable readout function.
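The local aggregation scheme described above can be sketched in a few lines. This is an illustrative toy instantiation, not the paper's formal model: the sum-based AGGREGATE and the additive COMBINE are our own minimal choices, and attributes are plain Python tuples.

```python
# Minimal sketch of message passing; the AGGREGATE/COMBINE choices are
# illustrative stand-ins for the generic functions used in the paper.

def message_passing(adj, attrs, num_iters):
    """adj: dict node -> list of neighbors; attrs: dict node -> tuple attribute."""
    h = {v: attrs[v] for v in adj}  # iteration 0: raw node attributes
    for _ in range(num_iters):
        new_h = {}
        for v in adj:
            # AGGREGATE: componentwise sum over neighbor representations
            if adj[v]:
                agg = tuple(sum(xs) for xs in zip(*(h[u] for u in adj[v])))
            else:
                agg = tuple(0 for _ in h[v])
            # COMBINE: add own state and the aggregate componentwise
            new_h[v] = tuple(a + b for a, b in zip(h[v], agg))
        h = new_h
    return h
```

After k iterations, h[v] depends exactly on the k-hop neighborhood of v, which is the locality property the text refers to.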
The Graph Neural Network model of [4], which is called the Original GNN (OGNN) in this paper, was the first model capable of facing both node/edge-focused and graph-focused tasks, utilizing suitable aggregation and readout functions, respectively. In subsequent research, a large number of new applications and models have been proposed, including Neural Networks for graphs [5], Gated Graph Sequence Neural Networks [6], Spectral Networks [7], Graph Convolutional Neural Networks [8], GraphSAGE [9], Graph Attention Networks [10], and Graph Networks [11]. Another extension of GNNs consisted of the proposal of several new models capable of dealing with dynamic graphs [2], which can be used, e.g., to classify sequences of graphs, to classify sequences of nodes in a dynamic graph, or to predict the appearance of an edge in a dynamic graph.
Recently, a great effort has been dedicated to studying the theoretical properties of GNNs [12]. Such a trend is motivated by the attempt to derive the foundational knowledge required to design reasonable solutions efficiently for the many possible applications of GNNs [13]. A particular interest lies in the study of the expressive power of GNN models since, in many application domains, the performance of a GNN depends on its capability to distinguish different graphs. For example, in bioinformatics, the properties of a chemical molecule may depend on the presence or absence of small substructures; in social network analysis, the count of triangles allows us to evaluate the presence and the maturity of communities. Similar examples exist in dynamic domains: the distinction of substructures contributes significantly to the successful analysis of molecular conformations [14]; in studies of the evolution of social networks [15], substructures in the form of contextual knowledge can help to improve performance.
Formally, a central result on the expressive power of GNNs has shown that GNNs are at most as powerful as the Weisfeiler-Lehman graph isomorphism test (1-WL) [16,17,18]. The 1-WL test iteratively divides graphs into groups of possibly isomorphic graphs using a local aggregation schema. Therefore, the 1-WL test exactly defines the classes of graphs that GNNs can recognize as non-isomorphic.
Furthermore, it has been proven that the equivalence classes induced by the 1-WL test are equivalent to the ones obtained from the unfolding trees of nodes on two graphs [19,20], and hence, the 1-WL and the unfolding equivalences can be used interchangeably. An unfolding tree rooted at a node is constructed by starting at the root node and unrolling the graph along the neighboring nodes until it reaches a certain depth. If the unfolding trees of two nodes are equal in the limit, the nodes are called unfolding tree equivalent. Fig. 1 visualizes the relations between the 1-WL equivalence, the unfolding tree equivalence, and the GNN expressiveness.
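The unfolding-tree construction just described can be made concrete with a small sketch. Encoding a tree as a nested tuple (root attribute, sorted child subtrees) is our own choice; sorting the children makes equality of the tuples correspond to equality of the (unordered) trees.

```python
# Hypothetical encoding of unfolding trees as nested tuples; equality of
# the tuples then corresponds to equality of the unfolding trees.

def unfolding_tree(adj, attrs, v, depth):
    """Unroll the graph from node v down to the given depth."""
    if depth == 0:
        return (attrs[v],)
    children = tuple(sorted(unfolding_tree(adj, attrs, u, depth - 1)
                            for u in adj[v]))
    return (attrs[v], children)
```

On the path 0-1-2 with identical attributes, the two endpoints have equal unfolding trees at every depth (they are unfolding equivalent), while the middle node differs from them already at depth 1.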
Another research goal concerning the expressive power of GNNs is the study of their approximation capability. Formally, it has been proven in [22] that OGNNs can approximate in probability, up to any degree of precision, any measurable function f(G, v) → ℝ^m that respects the unfolding equivalence. Such a result has recently been extended to a large class of message-passing GNNs [20].

Figure 1: a) In Thm. 4.1.6, we prove the equivalence of the attributed unfolding tree equivalence (AUT) and the attributed 1-WL equivalence (1-AWL) for SAUHGs. Afterward, in Thm. 5.1.3, we show a result on the approximation capability of static GNNs for SAUHGs (SGNN) using the AUT equivalence. b) Analogously to the attributed case, we show similar results for Dynamic GNNs (DGNN), which can be used on temporal graphs.
Despite the availability of the mentioned results on the expressive power of GNNs, their application is still limited to undirected static graphs with attributes on nodes. This limitation is particularly restrictive since modern applications usually involve more complex data structures, such as heterogeneous, directed, or dynamic graphs and multigraphs. In particular, the ability to process dynamic graphs is progressively gaining significance in many fields, such as social network analysis [15], recommender systems [23,24], traffic forecasting [25], and knowledge graph completion [26,27]. Several surveys discuss the usage of dynamic graphs in other application domains [1,28,29,30,31].
Although GNNs are considered universal approximators on the extended domains, it is uncertain which GNN architectures contribute to such universality. Moreover, an open question is how the definition of the 1-WL test has to be modified to cope with novel data structures and whether the universality results fail for particular graph types.
In this paper, we propose a study on the expressive power of GNNs for two domains of particular interest, namely dynamic graphs and static attributed undirected homogeneous graphs (SAUHGs) with node and edge attributes. On the one hand, dynamic graphs are interesting from a practical and a theoretical point of view and are used in several application domains [2]. Moreover, dynamic GNN models are structurally different from GNNs for static graphs, and the results and methodology required to analyze their expressive power cannot be directly deduced from the existing literature. On the other hand, SAUHGs with node and edge attributes are interesting because, as mentioned above, they act as a standard form for several other types of graphs, which can all be transformed to SAUHGs [3].
First, we introduce appropriate versions of the 1-WL test and the unfolding equivalence to construct the fundamental theory for the domains of SAUHGs and dynamic graphs, and discuss their relation afterward. Then, we consider generic GNN models that can operate on both domains and prove their universal approximation capability modulo the aforementioned 1-WL/unfolding equivalences. More precisely, the main contributions of this paper are as follows.
• We present new versions of the 1-WL test and of the unfolding equivalence appropriate for dynamic graphs and SAUHGs with node and edge attributes, and we show that they induce the same equivalences on nodes. Such a result makes it possible to use them interchangeably to study the expressiveness of GNNs.
• We show that generic GNN models for dynamic graphs and SAUHGs with node and edge attributes are capable of approximating, in probability and up to any precision, any measurable function on graphs that respects the 1-WL/unfolding equivalence.
• The result on approximation capability holds for graphs with unconstrained real-valued attributes and for general target functions. Thus, most of the domains used in practical applications are included.
• Moreover, the proof is based on space partitioning, which allows us to deduce information about the GNN architecture that can achieve the desired approximation.
• We validate our theoretical results experimentally. Our setups show that 1) sufficiently powerful DGNNs can approximate well dynamic systems that preserve the unfolding equivalence, and 2) using non-universal architectures can lead to poor performance.
The rest of the paper is organized as follows. Section 2 illustrates the related literature. In Section 3, the notation used throughout the paper is described, and the main definitions are introduced. In Section 4, we introduce the 1-WL test and unfolding equivalences suitable for dynamic graphs and SAUHGs with node and edge attributes, and prove that those equivalences are equal. In Section 5, the approximation theorems for GNNs on both graph types are presented. We support our theoretical findings with synthetic experiments in Section 6. Finally, Section 7 includes our conclusions and directions for future research. All of the proofs are collected in Appendix A.

Related Work
In the seminal work [18], it has been proven that standard GNNs have the same expressive power as the 1-WL test. To overcome this limitation, new GNN models and variants of the WL test have been proposed. For example, in [32], a model is introduced where node identities are directly injected into the aggregation functions. In [33], the k-WL test has been taken into account to develop a more powerful GNN model, given its greater capability to distinguish non-isomorphic graphs. In [34], a simplicial Weisfeiler-Lehman test is introduced and proven to be strictly more powerful than the 1-WL test and no less powerful than the 3-WL test; a similar test (called the cellular WL test), proposed in [35], is proven to be more powerful than the simplicial version. Nevertheless, none of these tests deals with edge-attributed graphs or with dynamic graphs.
In [36], the authors consider a GNN model for multirelational graphs where edges are labeled with types: it is proven that such a model has the same computational capability as a corresponding WL test. The result is similar to our result on SAUHGs, but the way we aggregate the messages on the edges is different. Additionally, our work extends [36] in several respects: the approximation capability of GNNs is studied, a relationship between the 1-WL test and unfolding trees is established, and edge attributes are generalized to vectors of reals. Moreover, all those studies are extended to dynamic graphs.
Moreover, the WL test mechanism applied to GNNs has also been studied within the paradigm of unfolding trees [37,38]. An equivalence between the two concepts has been established by [19], but it is limited to static graphs without edge attributes.
Some studies have been dedicated to the approximation and generalization properties of Graph Neural Networks. In [22], the authors proved the universal approximation properties of the original Graph Neural Network model, modulo the unfolding equivalence. Universal approximation is shown for GNNs with random node initialization in [39], while, in [18], GNNs are shown to be able to encode any graph with countable input features. Moreover, the authors of [40] proved that GNNs provided with a colored local iterative procedure (CLIP) can be universal approximators for functions on graphs with node attributes. The approximation property has also been extended to Folklore Graph Neural Networks in [41], and to Linear Graph Neural Networks and general GNNs in [42,43], both in the invariant and the equivariant case. Recently, the universal approximation theorem has been proved for modern message-passing graph neural networks in [20], giving hints about the properties of the network architecture (e.g., the number of layers and the characterization of the aggregation function). A relation between the graph diameter and the computational power of GNNs has been established in [44], where GNNs are assimilated to the so-called LOCAL models and it is proved that a GNN with more layers than the diameter of the graph can compute any Turing-computable function of the graph. Nevertheless, no information on the characterization of the aggregation function is given there.
Despite the availability of universal approximation theorems for static graphs with node attributes, the theory lacks results on the approximation capability of GNNs for other types of graphs, such as dynamic graphs and graphs with attributes on both nodes and edges. Therefore, this paper aims to extend the results on the expressive power of GNNs to dynamic graphs and SAUHGs with node and edge attributes.

Notation and Preliminaries
Before extending the results on the expressive power of GNNs to dynamic and edge-attributed graph domains, this section introduces the mathematical notation and preliminary definitions. In this paper, only finite graphs are considered.

• ℕ — natural numbers
• ℕ₀ — natural numbers starting at 0
• ℝ — real numbers
• ℝⁿ — ℝ vector space of dimension n
• ℕⁿ — ℕ vector space of dimension n
• ℤⁿ — ℤ vector space of dimension n
• 0 = (0, …, 0)⊤ — zero vector of the corresponding size

The following definition introduces a static, node- and edge-attributed, undirected, homogeneous graph, called SAUHG. The motivation for defining and using it comes from [3], where it is shown that every graph type is bijectively transformable into each other. Therefore, SAUHGs will be used as a standard form for all structurally different graph types, such as directed or undirected simple graphs, hypergraphs, multigraphs, heterogeneous or attributed graphs, and any composition of those. For a detailed introduction, see [3].
Definition 3.1 (SAUHG). A static attributed undirected homogeneous graph (SAUHG) is a tuple G′ = (V′, E′, α′, ω′), where V′ is a finite set of nodes, E′ ⊆ {{u, v} | u, v ∈ V′} is a finite set of edges, and node and edge attributes are determined by the mappings α′: V′ → A and ω′: E′ → B that map into the arbitrary node attribute set A and edge attribute set B. The domain of SAUHGs will be denoted by 𝒟′.

Remark 3.2 (Attribute sets). In the above definition of the SAUHG, the node and edge attribute sets A and B can be arbitrary. However, without loss of generality, we can assume that the attribute sets are equal, because they can be arbitrarily extended to A′ = A ∪ B. Additionally, any arbitrary attribute set A′ can be embedded into the k-dimensional vector space ℝᵏ. Since the attribute sets in general do not matter for the theory in this paper, and to support better readability, in what follows we consider α′: V′ → ℝᵏ and ω′: E′ → ℝᵏ for every SAUHG.
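As a concrete illustration of the standard-form idea, the following sketch encodes a directed graph as a SAUHG by storing direction flags in the edge attributes. The container and the (forward, backward) flag encoding are our own assumptions for illustration; the general, information-preserving transformations are those of [3].

```python
# Hedged sketch: a SAUHG container and one example transformation.
# The (forward, backward) edge attribute is an illustrative encoding
# of direction, not necessarily the construction used in [3].
from dataclasses import dataclass, field

@dataclass
class SAUHG:
    nodes: set
    edges: set                                       # frozensets {u, v}
    node_attr: dict = field(default_factory=dict)    # node -> attribute
    edge_attr: dict = field(default_factory=dict)    # edge -> attribute

def directed_to_sauhg(nodes, directed_edges):
    g = SAUHG(nodes=set(nodes), edges=set())
    for (u, v) in directed_edges:
        e = frozenset((u, v))
        a, b = sorted((u, v))                        # canonical orientation
        fwd, bwd = g.edge_attr.get(e, (0, 0))
        if (u, v) == (a, b):
            fwd = 1                                  # edge follows canonical order
        else:
            bwd = 1                                  # edge opposes canonical order
        g.edges.add(e)
        g.edge_attr[e] = (fwd, bwd)
    return g
```

A pair of antiparallel directed edges thus becomes a single undirected edge with attribute (1, 1), so no direction information is lost.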
All the aforementioned graph types are static, but temporal changes play an essential role in learning on graphs representing real-world applications; thus, dynamic graphs are defined in the following. In particular, the dynamic graph definition used here is a discrete-time representation.

Definition 3.3 (Discrete Dynamic Graph). A discrete dynamic graph is a finite sequence G = (G_t)_{t∈T} of graph snapshots G_t = (V_t, E_t, α_t, ω_t) over a discrete set of timesteps T.

To prove the approximation theorems for SAUHGs and dynamic graphs, we need to specify the GNN architectures capable of handling those graph types. The standard Message-Passing GNN for static node-attributed undirected homogeneous graphs is given in [18]. Here, the node attributes are used as the initial representation and input to the GNN. The update is executed by aggregation over the representations of the neighboring nodes.
Given that a SAUHG acts as a standard form for all graph types, the ordinary GNN architecture will be extended to also take edge attributes into account. This can be done by including the edge attributes, analogously to the node information, in the aggregation of the general GNN framework as follows.

Definition 3.4 (SGNN). For a SAUHG G′ = (V′, E′, α′, ω′), let u, v ∈ V′ and e = {u, v}. The SGNN propagation scheme for iteration t ∈ [T], T > 0, is defined as

h_v^(0) = α′(v),
h_v^(t) = COMBINE^(t)( h_v^(t−1), AGGREGATE^(t)( {{ (h_u^(t−1), ω′({u, v})) | u ∈ ne(v) }} ) ).

Figure 2: Illustration of the statification of a dynamic graph. On the left, the temporal evolution of a graph, including non-existent nodes and edges (gray), is given; on the right, the corresponding statified graph with the total set of nodes and edges, together with the concatenated attributes, is shown.
The output for a node-specific learning problem after the last iteration T is given by y_v = READOUT(h_v^(T)), using a selected aggregation scheme and a suitable READOUT function, and the output for a graph-specific learning problem is determined by y_G = READOUT({{ h_v^(T) | v ∈ V′ }}).
For the dynamic case, we chose a widely used GNN model that is consistent with the theory we have built. Based on [1], the discrete dynamic graph neural network (DGNN) uses a GNN to encode each graph snapshot. Here, the model is modified by using the previously defined SGNN in place of the standard one.

Definition 3.5 (Discrete DGNN). Given a discrete dynamic graph G = (G_t)_{t∈T}, a discrete DGNN using a continuously differentiable recursive function f for temporal modelling can be expressed as

x_v(t) = SGNN(G_t, v),   h_v(t) = f( x_v(t), h_v(t−1) ),

where h_v(t) ∈ ℝ^s is the hidden representation of node v at time t of dimension s, x_v(t) ∈ ℝ^q is the q-dimensional hidden representation of node v produced by the SGNN, and f: ℝ^q × ℝ^s → ℝ^s is a neural architecture for temporal modeling (in the methods surveyed in [1], f is almost always an RNN or an LSTM).
The stacked version of the discrete DGNN is then

X(t) = SGNN(G_t),   H(t) = F( X(t), H(t−1) ),

where N is the number of nodes, and q and s are the dimensions of the hidden representation of a node produced by the SGNN and by f, respectively, so that X(t) ∈ ℝ^{N×q} and H(t) ∈ ℝ^{N×s}. Applying F corresponds to applying f componentwise for each node [1].
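The snapshot-wise scheme above can be sketched as follows. The one-step sum encoder standing in for a full SGNN and the scalar tanh recurrence standing in for f are illustrative assumptions (a real DGNN would use trained networks), and the node set is assumed constant across snapshots for brevity.

```python
# Hedged sketch of a discrete DGNN: an SGNN-like encoder per snapshot,
# followed by a recurrent cell f merging the result with the previous
# hidden state, node by node. All components are toy stand-ins.
import math

def sgnn_encode(adj, attrs):
    # one aggregation step standing in for a full SGNN:
    # own attribute plus the sum of neighbor attributes
    return {v: attrs[v] + sum(attrs[u] for u in adj[v]) for v in adj}

def discrete_dgnn(snapshots):
    """snapshots: list of (adj, attrs) pairs, one per timestep."""
    h = None
    for adj, attrs in snapshots:
        x = sgnn_encode(adj, attrs)
        if h is None:
            h = {v: 0.0 for v in adj}  # initial hidden state
        # recurrent update f: here a scalar tanh cell per node
        h = {v: math.tanh(x[v] + h[v]) for v in adj}
    return h
```

Applying the same cell to every node mirrors the componentwise application of f in the stacked formulation.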
To conclude, a function READOUT_dyn takes H(T) as input and gives a suitable output for the considered task, so that altogether the DGNN is described as DGNN(G, v, T) = READOUT_dyn(H(T)). The expressivity of GNNs is studied in terms of their capability to distinguish two non-isomorphic graphs.

Definition 3.7 (Graph Isomorphism). Let G₁ = (V₁, E₁) and G₂ = (V₂, E₂) be two static graphs. Then G₁ and G₂ are isomorphic, G₁ ≈ G₂, if and only if there exists a bijective function φ: V₁ → V₂ such that {u, v} ∈ E₁ if and only if {φ(u), φ(v)} ∈ E₂. If the two graphs are dynamic, they are called isomorphic if and only if the static graph snapshots at each timestep are isomorphic.
Graph isomorphism (GI) gained prominence in the theory community when it emerged as one of the few natural problems in the complexity class NP that could neither be classified as hard (NP-complete) nor shown to be solvable by an efficient (polynomial-time) algorithm [45]; indeed, it lies in the class NP-intermediate. In practice, however, the so-called Weisfeiler-Lehman (WL) test is used to at least recognize non-isomorphic graphs [37]. If the WL test outputs two graphs as isomorphic, they are likely, but not guaranteed, to be isomorphic.
The expressive power of GNNs can also be approached from the point of view of their approximation capability. This line of work analyzes the capacity of different GNN models to approximate arbitrary functions [46]. Different universal approximation theorems can be formulated depending on the model, the considered input data, and the sets of functions. This paper will focus on the set of functions that preserve the unfolding tree equivalence, defined as follows.
Since the results in this section hold for undirected and unattributed graphs, we aim to extend the universal approximation theorem to GNNs working on SAUHGs (cf. Def. 3.1) and dynamic graphs (cf. Def. 3.3). For this purpose, in the next sections, we introduce a static attributed and a dynamic version of both the WL test and the unfolding trees, and show that the graph equivalences regarding the attributed/dynamic WL test and the attributed/dynamic unfolding trees are equivalent. With these notions, we define the set of functions that are attributed/dynamic unfolding tree preserving and reformulate the universal approximation theorem for the attributed and dynamic cases (cf. Thm. 5.1.3 and Thm. 5.2.4).

Weisfeiler-Lehman and Unfolding Trees
There are many different extensions of the WL test, e.g., the n-dim. WL test, the n-dim. folklore WL test, or the set n-dim. WL test [37]. In general, the 1-WL test is based on a graph coloring algorithm. The coloring is applied in parallel on the nodes of the two input graphs, and at the end, the number of colors used in each graph is counted. Two graphs are then detected as non-isomorphic if these numbers do not coincide, whereas when the numbers are equal, the graphs are possibly isomorphic (WL equivalent). Another way to check the isomorphism of two graphs, and therefore to compare GNN expressivity, is to consider the so-called unfolding trees of all their nodes. An unfolding tree is constructed by a breadth-first visit of the graph, starting from a given node. Two graphs are possibly isomorphic if all their unfolding trees are equal. From [19], it is known that for static undirected and unattributed graphs, the unfolding tree and the Weisfeiler-Lehman approaches for testing the isomorphism of two graphs are equivalent.
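The color-refinement procedure just described can be sketched directly. Refining the disjoint union of the two graphs keeps the palettes comparable, and the tuple signature plays the role of an injective HASH. Note the classic limitation the text alludes to: a 6-cycle and two disjoint triangles receive identical color histograms, so 1-WL reports them as possibly isomorphic although they are not.

```python
# Minimal 1-WL color refinement for unattributed graphs; the tuple
# signature stands in for an injective HASH over (color, neighbor colors).

def wl_colors(adj, num_iters):
    colors = {v: 0 for v in adj}  # all nodes start with the same color
    for _ in range(num_iters):
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: palette[sigs[v]] for v in adj}
    return colors

def wl_test(adj1, adj2, num_iters):
    """True if the graphs are possibly isomorphic (1-WL cannot separate them)."""
    # refine both graphs jointly (disjoint union) so colors are comparable
    union = {('a', v): [('a', u) for u in nbrs] for v, nbrs in adj1.items()}
    union.update({('b', v): [('b', u) for u in nbrs] for v, nbrs in adj2.items()})
    colors = wl_colors(union, num_iters)
    hist1 = sorted(c for (side, _), c in colors.items() if side == 'a')
    hist2 = sorted(c for (side, _), c in colors.items() if side == 'b')
    return hist1 == hist2
```

Refining the disjoint union, rather than each graph separately, avoids the subtle bug of comparing color identifiers drawn from two unrelated palettes.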
In this section, we extend the unfolding tree (UT) and the Weisfeiler-Lehman (WL) tests to the domains of SAUHGs and dynamic graphs, respectively. Using these, we show that the extended versions of the UT-equivalence and the WL-equivalence are also equivalent.

Equivalence for Attributed Static Graphs
The extended result on SAUHGs is formalized and proven in Thm. 4.1.6. The original WL test and unfolding tree notions cover all graph properties except edge attributes. Thus, the notions of unfolding trees and of the Weisfeiler-Lehman test have to be extended to an attributed version.

Definition 4.1.1 (Attributed Unfolding Tree). The attributed unfolding tree T_v^d of node v ∈ V′ up to depth d ∈ ℕ₀ is defined as

T_v^0 = Tree(α′(v)),
T_v^d = Tree(α′(v), { (T_u^{d−1}, ω′({u, v})) | u ∈ ne(v) }),   d > 0,

where Tree(α′(v)) is a tree constituted only of node v with attribute α′(v), and Tree(α′(v), ·) is the tree consisting of the root node v with attribute α′(v) and the subtrees T_u^{d−1}, whose connecting edges carry the attributes ω′({u, v}).

Definition 4.1.2 (Attributed Unfolding Equivalence). Two nodes u, v ∈ V′ are attributed unfolding tree equivalent, u ∼_AUT v, if T_u^d = T_v^d holds for all d ∈ ℕ₀.

Using the definition of the attributed unfolding equivalence on graphs, the 1-Weisfeiler-Lehman (1-WL) test provided in [20] is extended to attributed graphs. This allows for the definition of the attributed 1-WL equivalence on graphs, the subsequent Lem. 4.1.5, and the resulting Thm. 4.1.6, which establish the relation between the attributed unfolding equivalence and the attributed 1-WL test.
Definition 4.1.3 (Attributed 1-WL test). Let HASH be a bijective function that codes every possible node attribute with a color from a color set C, and let G′ = (V′, E′, α′, ω′) be a SAUHG. The attributed 1-WL (1-AWL) test is defined recursively through the following.

• At iteration t = 0, the color is set to the hashed node attribute:
c_v^(0) = HASH(α′(v)).

• At iteration t > 0, the HASH function is extended to the edge attributes:
c_v^(t) = HASH( c_v^(t−1), {{ (c_u^(t−1), ω′({u, v})) | u ∈ ne(v) }} ).

In the following, the 1-WL equivalence of graphs and nodes is extended by using the attributed version of the 1-WL test (cf. Def. 4.1.3).

Definition 4.1.4 (Attributed 1-WL equivalence). Two nodes u, v ∈ V′ are attributed 1-WL equivalent, u ∼_1-AWL v, if their colors resulting from the attributed 1-WL test are equal at every iteration. Analogously, two SAUHGs G₁′ and G₂′ are attributed 1-WL equivalent, G₁′ ∼_1-AWL G₂′, if and only if for all nodes v₁ ∈ V₁′ there exists a corresponding node v₂ ∈ V₂′ with the same colors at every iteration, and vice versa.
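A sketch of the refinement defined above: compared with plain 1-WL, the signature now pairs each neighbor color with the attribute of the connecting edge. The tuple encoding stands in for an injective HASH function, and attributes are assumed hashable and mutually comparable.

```python
# Hedged sketch of attributed 1-WL (1-AWL) refinement on a single SAUHG.
# edge_attr maps frozenset({u, v}) -> edge attribute.

def awl_colors(adj, node_attr, edge_attr, num_iters):
    # iteration t = 0: the color is the (hashed) node attribute
    colors = {v: node_attr[v] for v in adj}
    for _ in range(num_iters):
        # signature: own color plus the multiset of (edge attribute,
        # neighbor color) pairs, here sorted for a canonical form
        sigs = {v: (colors[v],
                    tuple(sorted((edge_attr[frozenset((v, u))], colors[u])
                                 for u in adj[v])))
                for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: palette[sigs[v]] for v in adj}
    return colors
```

On a path whose two edges carry different attributes, the endpoints get different colors after one round, although plain 1-WL (which ignores edge attributes) would color them identically.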
Finally, to complete the derivation of the equivalence between the attributed unfolding equivalence (cf. Def. 4.1.2) and the attributed WL equivalence (cf. Def. 4.1.4), the following auxiliary lemma, Lem. 4.1.5, is given, which directly leads to Thm. 4.1.6. The lemma states the equivalence between the attributed unfolding tree equivalence of nodes and the equality of their attributed unfolding trees up to a specific depth. In [22], it has been shown that unfolding trees of infinite depth need not be considered for this equivalence. Instead, the larger of the two graphs' numbers of nodes is sufficient as the depth of the unfolding trees, which is finite since the graphs are bounded. The following lemma establishes the equivalence between the attributed unfolding trees of two nodes and their colors resulting from the attributed 1-WL test.
The proof can be found in Apx. A.2. Directly from Lem. 4.1.5, the equivalence of the attributed unfolding tree equivalence and the attributed 1-WL equivalence of two nodes belonging to the same graph can be formalized.
Proof. The proof follows from the proof of Lem. 4.1.5.

Equivalence for Dynamic Graphs
In this section, the previously introduced concepts of unfolding tree and WL equivalences are extended to the dynamic case. Note that Lem. 4.1.5 and, therefore, Thm. 4.1.6 also hold in case G′ is the SAUHG (cf. Def. 3.1) resulting from the transformation of a dynamic graph G = (G_t)_{t∈T} to its static attributed version. However, GNNs working on dynamic graphs usually use a significantly different architecture than those working on static attributed graphs. Therefore, the following derives the various equivalences on dynamic graphs separately.
First, dynamic unfolding trees are introduced as a sequence of unfolding trees, one for each graph snapshot. Afterward, the equivalence of two dynamic graphs with respect to their dynamic unfolding trees is presented.

Definition 4.2.1 (Dynamic Unfolding Tree). Let G = (G_t)_{t∈T} with G_t = (V_t, E_t, α_t, ω_t) be a dynamic graph. The dynamic unfolding tree T_v^d(t) at time t ∈ T of node v ∈ V_t up to depth d ∈ ℕ₀ is defined as

T_v^0(t) = Tree(α_t(v)),
T_v^d(t) = Tree(α_t(v), { (T_u^{d−1}(t), ω_t({u, v})) | u ∈ ne_t(v) }),   d > 0,

where Tree(α_t(v)) is a tree constituted only of node v with attribute α_t(v), and Tree(α_t(v), ·) is the tree with root node v with attribute α_t(v) and the subtrees T_u^{d−1}(t).

The dynamic 1-WL test is defined snapshot-wise, analogously to the attributed case:

• At iteration d = 0, the color is set to the hashed node attribute, or to a fixed color c_⊥ for non-existent nodes.

• Then, for d > 0, the aggregation mechanism is defined by a bijective function HASH_t:
c_v^(d)(t) = HASH_t( c_v^(d−1)(t), {{ (c_u^(d−1)(t), ω_t({u, v})) | u ∈ ne_t(v) }} ).

Note that c_v^(d−1)(t) = c_⊥ holds for a node that does not exist at time t. Further, the neighbor set of such a node is empty, so the other inputs of HASH_t are empty, and together with c_⊥ this always yields the same color for non-existent nodes.

Definition 4.2.4 (Dynamic 1-WL equivalence). Two nodes u, v ∈ V in a dynamic graph G are said to be dynamic WL equivalent, denoted u ∼_DWL v, if their colors resulting from the WL test are pairwise equal per timestep. Analogously, let G₁ and G₂ be dynamic graphs. Then G₁ ∼_DWL G₂ if and only if for all nodes v₁ ∈ V_t^(1) there exists a corresponding node v₂ ∈ V_t^(2) with pairwise equal colors per timestep, and vice versa.
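The dynamic refinement defined above can be sketched per snapshot. String concatenation stands in for the injective HASH_t, the previous color is carried into each snapshot's initial color, and nodes absent at a timestep keep a fixed "bottom" color, mirroring the note on non-existent nodes. This is an assumption-laden illustrative sketch, not the formal test.

```python
# Hedged sketch of dynamic 1-WL: refine each snapshot, carrying each
# node's previous color forward; absent nodes keep the bottom color.
BOTTOM = 'bottom'

def dynamic_wl(snapshots, all_nodes, iters=1):
    """snapshots: list of (adj, attrs) per timestep; all_nodes: union of node sets."""
    colors = {v: BOTTOM for v in all_nodes}
    history = []
    for adj, attrs in snapshots:
        # snapshot start: "hash" of (current attribute, previous color),
        # or the fixed bottom color for nodes absent at this timestep
        colors = {v: f'({attrs[v]};{colors[v]})' if v in adj else BOTTOM
                  for v in all_nodes}
        for _ in range(iters):
            colors = {v: colors[v] + '['
                         + ','.join(sorted(colors[u] for u in adj[v])) + ']'
                      if v in adj else BOTTOM
                      for v in all_nodes}
        history.append(dict(colors))
    return history
```

Two nodes are then dynamic-WL equivalent exactly when their entries in `history` coincide at every timestep.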

Approximation Capability of GNNs
In this section, the results from Sec. 4.1 and Sec. 4.2 are brought together in the formulation of a universal approximation theorem for GNNs working on SAUHGs and dynamic graphs, with respect to the sets of functions that preserve the attributed or dynamic unfolding equivalence, respectively.

GNNs for Attributed Static Graphs
Since the goal is to show the attributed extension of the universal approximation theorem, it is necessary to define the corresponding family of attributed unfolding-equivalence-preserving functions. A function preserves the attributed unfolding equivalence if its outputs on two nodes are equal whenever the nodes are attributed unfolding equivalent.

Definition 5.1.1. Let 𝒟′ be the domain of bounded SAUHGs. A function f preserves the attributed unfolding equivalence if f(G′, u) = f(G′, v) holds whenever u ∼_AUT v. All functions that preserve the attributed unfolding equivalence are collected in the set 𝓕(𝒟′).
Analogously to the argumentation in [22], there exists a relation between the unfolding-equivalence-preserving functions and the unfolding trees for attributed graphs, as follows.

Proposition 5.1.2 (Functions of attributed unfolding trees).
A function f belongs to 𝓕(𝒟′) if and only if there exists a function κ defined on trees such that, for any graph G′ ∈ 𝒟′, it holds f(G′, v) = κ(T_v) for any node v ∈ V′.
The proof works analogously to the proof of the unattributed version presented in [22] and can be found in Apx. A.1.
Considering the previously defined concepts and statements for SAUHGs in Sec. 4.1, the following theorem finally states the universal approximation capability of SGNNs on bounded SAUHGs.

Theorem 5.1.3 (Universal Approximation Theorem by SGNN). Let 𝒟′ be the domain of bounded SAUHGs and let N be the maximal number of nodes of a graph in 𝒟′. For any measurable function f ∈ 𝓕(𝒟′) preserving the attributed unfolding equivalence (cf. Def. 5.1.1), any norm ‖·‖ on ℝ, any probability measure P on 𝒟′, and any reals ε, λ with ε, λ > 0, there exists an SGNN defined by continuously differentiable functions COMBINE^(t), AGGREGATE^(t) at iterations t ≤ 2N − 1 and by a function READOUT, with hidden dimension q = 1, i.e., h_v^(t) ∈ ℝ for all v, t, such that the function φ (realized by the SGNN) computed after 2N − 1 steps satisfies, for all G′ ∈ 𝒟′,

P( ‖ f(G′, v) − φ(G′, v) ‖ ≤ ε ) ≥ 1 − λ.

The corresponding proof can be found in Apx. A.4.
As in [22], we now want to study the case when the employed components (COMBINE, AGGREGATE, READOUT) are sufficiently general to be able to approximate any function preserving the unfolding equivalence. We call this class of networks, 𝒬, the SGNN models with universal components. To simplify our discussion, we introduce the transition function TRANS^(t) to indicate the combination of AGGREGATE^(t) and COMBINE^(t), i.e., TRANS^(t)(h_v^(t−1), ·) = COMBINE^(t)(h_v^(t−1), AGGREGATE^(t)(·)). Then, we can formally define the class 𝒬.

Definition 5.1.4. A class 𝒬 of SGNN models is said to have universal components if, for any ε > 0 and any continuously differentiable target functions TRANS^(t) and READOUT, there is an SGNN in 𝒬, with functions TRANS_θ^(t), READOUT_θ and parameters θ, such that ‖TRANS^(t)(x) − TRANS_θ^(t)(x)‖ ≤ ε and ‖READOUT(x) − READOUT_θ(x)‖ ≤ ε hold for all inputs x in the considered compact domain.
The following result shows that Theorem 5.1.3 still holds even for SGNNs with universal components.

Theorem 5.1.5 (Approximation by Neural Networks). Assume that the hypotheses of Theorem 5.1.3 are fulfilled and that 𝒬 is a class of SGNNs with universal components. Then, there exists a parameter set θ and functions COMBINE_θ^(t), AGGREGATE_θ^(t), READOUT_θ, implemented by Neural Networks in 𝒬, such that the thesis of Theorem 5.1.3 holds.
Proof. The proof is identical to the one contained in [22]; for a hint on the methodology, we refer to the more complex proof of the analogous Theorem 5.2.6 for DGNNs.

GNNs for Dynamic Graphs
Suitable functions that preserve the unfolding equivalence on dynamic graphs are dynamic systems. Before this statement is formalized and proven in Prop. 5.2.3, dynamic systems and their property of preserving the dynamic unfolding equivalence are defined in the following. A dynamic system produces an output dyn(G, v, t) = g(s_v(t)), where g: ℝ^s → ℝ^m is an output function and the state function s_v(t) is determined recursively over the snapshots of the dynamic graph. The class of dynamic systems that preserve the unfolding equivalence on 𝒟 will be denoted by 𝓕(𝒟). A characterization of 𝓕(𝒟) is given by the following result (following the work in [22]).

Proposition 5.2.3 (Functions of dynamic unfolding trees).
A dynamic system dyn belongs to 𝓕(𝒟) if and only if there exists a function κ defined on attributed trees such that, for all (G, v, t) ∈ 𝒟, it holds dyn(G, v, t) = g( κ( T_v(t) ) ).
The proof can be found in Apx. A.3. Finally, the universal approximation capability of the Message-Passing GNN for dynamic graphs is determined as follows.
Theorem 5.2.4 (Universal Approximation Theorem by DGNN). Let G = (G_t)_{t∈T} be a discrete dynamic graph in the graph domain 𝒟 and let N = max_{t∈T} |V_t| be the maximal number of nodes in the domain. Let dyn(G, v, t) ∈ 𝓕(𝒟) be any measurable dynamical system preserving the unfolding equivalence, ‖·‖ be a norm on ℝ, P be any probability measure on 𝒟, and ε, λ be any real numbers with ε, λ > 0.
Then, there exists a DGNN, composed of SGNNs with 2N − 1 layers and hidden dimension 1 and of a Recurrent Neural Network with state dimension 1, such that the function realized by this model satisfies the approximation bound. The proof can be found in Apx. A.5. Theorem 5.2.4 intuitively states that, given a dynamical system dyn, there is a DGNN that approximates it. The functions of which the DGNN is a composition (such as the dynamical function, COMBINE, AGGREGATE, etc.) are supposed to be continuously differentiable but otherwise generic and unconstrained. This situation does not correspond to practical cases, where the DGNN adopts particular architectures and those functions are Neural Networks or, more generally, parametric models, for example, made of layers of sums, max, averages, etc. Thus, it is of fundamental interest to clarify whether the theorem still holds when the components of the DGNN are parametric models.
Definition 5.2.5. A class of discrete DGNN models is said to have universal components if the employed SGNNs have universal components as defined in Def. 5.1.4 and the employed recurrent model is designed such that, for any ε₁, ε₂ > 0 and any continuously differentiable target functions f, READOUT_dyn, there is a discrete DGNN in the class, with parametric functions and parameters, such that the corresponding approximation bounds hold for any input vectors. The following result shows that Theorem 5.2.4 still holds even for discrete DGNNs with universal components.

Theorem 5.2.6 (Approximation by Neural Networks). Assume that the hypotheses of Thm. 5.2.4 are fulfilled and   is a class of discrete DGNNs with universal components.
Then, there exist a parameter set and functions f, READOUT_dyn, implemented by Neural Networks in the class, such that the thesis of Thm. 5.2.4 holds.
The proof can be found in Apx. A.6.

Discussion
The following remarks may further help to understand the results proven in the previous paragraphs: • Thm. 5.1.3 suggests an alternative approach to process several graph domains with a universal SGNN model. Actually, almost all graph types, including, e.g., hypergraphs, multigraphs, and directed graphs, can be transformed without loss of information into SAUHGs with node and edge attributes [3]. Then, we can use a universal GNN model on such a domain by adopting sufficiently expressive AGGREGATE and COMBINE functions.
• The proofs of Thms. 5.1.3 and 5.2.4 are based on a space-partitioning argument. Unlike techniques based on the Stone-Weierstrass theorem [42], which are existential in nature, this approach allows us to deduce information about the characteristics of the networks that reach the desired approximation. In fact, the theorems point out that the approximation can be obtained with a minimal hidden dimension of 1 both in SGNNs and DGNNs, and with state dimension 1 in the Recurrent Neural Network of DGNNs.
Such a result may appear surprising, but the proofs show that GNNs can encode unfolding trees with a single real number.
• Moreover, Thms. 5.1.3 and 5.2.4 specify that GNNs can obtain the approximation with 2N − 1 layers. One might incorrectly presume that the maximum number of layers required to reach a desired approximation depends on the diameter of the graph, which can be smaller than the number of nodes N, since a GNN can move information from one node to another in a number of iterations equal to the diameter. However, that many layers are not always sufficient to distinguish all the nodes of a graph. In fact, it has been proven that N − 1 is a lower bound on the number of iterations that the 1-WL algorithm has to carry out to distinguish any pair of 1-WL-distinguishable graphs [47], and 2N − 1 is a lower bound for the 1-WL algorithm to distinguish pairs of nodes in two different graphs [19]. So, overall, 2N − 1 is also a lower bound on the GNN computation time needed to approximate any function for either graph- or node-focused tasks (see [20] for a detailed discussion).
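The role of the iteration count can be illustrated with a minimal sketch of 1-WL color refinement (hedged: the signature tuples below are an injective stand-in for the paper's bijective HASH function; graph encodings are illustrative):

```python
from collections import Counter

def wl_histogram(adj, attrs, iters):
    """Run `iters` rounds of 1-WL refinement and return the color
    histogram of the graph; two graphs with different histograms
    are 1-WL distinguishable."""
    colors = {v: ("init", attrs[v]) for v in adj}
    for _ in range(iters):
        # new color = (own color, sorted multiset of neighbour colors)
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                  for v in adj}
    return Counter(colors.values())

# Two N = 3 graphs with identical node attributes, refined for 2N - 1 rounds:
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
attrs = {0: 0, 1: 0, 2: 0}
rounds = 2 * 3 - 1
distinguishable = wl_histogram(triangle, attrs, rounds) != wl_histogram(path, attrs, rounds)
```

Running more rounds than needed never hurts: once the partition stabilizes, further refinement leaves the induced equivalence unchanged.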
• Thms. 5.1.3 and 5.2.4 specify that the approximation is modulo unfolding equivalence or, correspondingly, modulo WL equivalence. It can be observed that, in the dynamic case, only a part of the architecture affects the equivalence. Actually, a dynamic GNN contains two modules: the first one, an SGNN, produces an embedding of the input graph at each time instance; the second one is a Recurrent Neural Network that processes the sequence of embeddings. The dynamic unfolding equivalence is defined by sequences of unfolding trees, which are built independently for each node and time instance by the SGNN. Similarly, the dynamic WL equivalence is defined by sequences of colors defined independently at each time step. Intuitively, the Recurrent Neural Network does not affect the equivalence, since Recurrent Neural Networks can be universal approximators and can implement any function of the sequence without introducing constraints beyond those already introduced by the SGNN.
• Thm. 5.2.4 does not hold for every Dynamic GNN, since we consider a discrete recurrent model working on graph snapshots (also known as a Stacked DGNN). Nevertheless, several DGNNs of this kind are listed in [2], such as GCRN-M1 [48], RgCNN [49], PATCHY-SAN [50], DyGGNN [51], and others. Still, the approximation capability depends on the AGGREGATE and COMBINE functions designed for each GNN working on the single snapshot and on the implemented Recurrent Neural Network. For example, the most general model, the original RNN, has been proven to be a universal approximator [52].

Experimental Validation
In this Section, we support our theoretical findings with an experimental study. For this purpose, we carry out two sets of experiments, described as follows. E1: We show that a DGNN with universal components can approximate a function that models the 1-DWL test; this target function assigns to the input graph a label representing its class of 1-DWL equivalence. E2: In the same approximation task, we compare DGNNs with different GNN modules from the literature to show how the universality of the components affects the approximation capability.
We focus on the ability of the DGNN to approximate the 1-DWL target function, so only training performance is considered; i.e., we do not investigate the generalization capabilities over a test set.
Since the 1-DWL test provides the finest partition of graphs reachable by a DGNN, the mentioned tasks experimentally evaluate the expressive power of DGNNs.
Dataset. The dataset consists of dynamic graphs, i.e., vectors of static graph snapshots of fixed length T. Each static snapshot is one of the graphs in Fig. 4. Since the dataset is composed of all possible combinations of the four graphs, it contains 4^T dynamic graphs. Given that the graphs in Fig. 4 are pairwise 1-WL equivalent (a) is 1-WL equivalent to b), and c) is 1-WL equivalent to d)), the number of classes is 2^T, with 4^T / 2^T = 2^T graphs in each class. For each dynamic graph, the target is the corresponding 1-DWL output, represented as a natural number. For training purposes, the targets are normalized between 0 and 1 and uniformly spaced in the interval [0, 1]; therefore, the distance between consecutive class labels is δ = 1/2^T. A dynamic graph 𝐆 with target y_𝐆 is said to be correctly classified if, given y = DGNN(𝐆), we have |y − y_𝐆| < δ/2.
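The dataset construction above can be sketched as follows (hedged: the base-graph identifiers "a"–"d" and the class ordering are illustrative; the actual 1-DWL outputs are produced by the test itself):

```python
from itertools import product

def build_targets(T):
    """Enumerate all 4**T snapshot sequences; since a/b and c/d are
    1-WL equivalent, each timestep contributes one bit, giving 2**T
    classes with targets uniformly spaced in [0, 1)."""
    wl_class = {"a": 0, "b": 0, "c": 1, "d": 1}
    n_classes = 2 ** T
    delta = 1.0 / n_classes          # spacing between class labels
    data = []
    for seq in product("abcd", repeat=T):
        cls = int("".join(str(wl_class[s]) for s in seq), 2)
        data.append((seq, cls / n_classes))
    return data, delta

data, delta = build_targets(3)       # 64 dynamic graphs, 8 classes
```

A prediction y for a graph with target t then counts as correct exactly when |y − t| < delta / 2, matching the classification criterion above.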

E1. For the first set of experiments, the Dynamic Graph Neural Network is composed of two modules: a Graph Isomorphism Network (GIN) [18] and a Recurrent Neural Network (RNN), which implement the static GNN and the temporal network of Eq. (1), respectively. Since it has been proven that GIN is a universal architecture [18] and that RNNs are universal approximators for dynamical systems on vector sequences [52], the architecture used in the experiments fits the hypotheses of Thm. 5.2.4; thus, it can approximate any dynamical system on the temporal graph domain.
The model hyperparameters for the experiments are set as follows. The GIN includes 6 layers. The MLP in the GIN network contains one hidden layer with a hyperbolic tangent activation function and batch normalization. Hidden layers of different sizes, namely 1, 4, and 8 neurons, have been tested. For the sake of simplicity, the output network has one hidden layer with the same number of neurons as the MLP in the GIN. Furthermore, the hidden state of the RNN has size 8.
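The stacked architecture described above can be sketched in a few lines of numpy (hedged: this is an untrained forward pass with random weights that only mirrors the stated shapes; the real experiments use trained GIN/RNN modules with batch normalization and an output network, omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def gin_layer(H, A, W1, W2, eps=0.0):
    # GIN update: two-layer tanh MLP applied to (1+eps)*h_v + sum over neighbours
    M = (1.0 + eps) * H + A @ H
    return np.tanh(np.tanh(M @ W1) @ W2)

def sgnn_embed(A, X, weights):
    H = X
    for W1, W2 in weights:               # 6 stacked GIN layers
        H = gin_layer(H, A, W1, W2)
    return H.sum(axis=0)                 # graph-level sum readout

def dgnn(snapshots, weights, Wx, Wh, b):
    h = np.zeros(Wh.shape[0])            # RNN hidden state, size 8
    for A, X in snapshots:               # one graph snapshot per timestep
        g = sgnn_embed(A, X, weights)
        h = np.tanh(Wx @ g + Wh @ h + b) # vanilla RNN state update
    return h

hs = 8
weights = [(rng.normal(size=(hs, hs)), rng.normal(size=(hs, hs))) for _ in range(6)]
Wx, Wh, b = rng.normal(size=(8, hs)), rng.normal(size=(8, 8)), rng.normal(size=8)
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)   # triangle snapshot
X = np.ones((3, hs))
out = dgnn([(A, X)] * 5, weights, Wx, Wh, b)             # T = 5 snapshots
```

The SGNN embeds each snapshot independently, and only the RNN carries information across timesteps, mirroring the two-module decomposition discussed in Sec. 5.3.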

E2
In the second set of experiments, we test DGNNs composed of different static GNN modules and an RNN module (analogously to E1). In particular, we compare DGNNs with the GNN module taken from the following list: - GIN, as mentioned before; - Graph Convolutional Network (GCN) [8]; - the GNN presented in [53] (see also [33]), where the aggregation function is the sum of the hidden features of the neighbours (sum aggregation); - the GNN presented in [53] with the mean of the hidden features of the neighbours as aggregation function (mean aggregation); - GAT [10].
Here, the hyperparameters are: hidden dimension 8, number of layers 4, and time length T = 5.
In both experimental cases, the model is trained for 300 epochs using the Adam optimizer with a learning rate of 10^−3. Each configuration is evaluated over 10 runs. The training is performed on an Intel(R) Core(TM) i7-9800X processor running at 3.80 GHz with 31 GB of RAM and a GeForce GTX 1080 Ti GPU. The code used to run the experiments can be found at https://github.com/AleDinve/dyn-gnn.
Results. The results of the experiments confirm our theoretical statements. More precisely, the DGNNs performed as follows during training. E1: In Fig. 5, the evolution of the training accuracy over the epochs is presented for different GIN hidden layer sizes and for dynamic graphs up to time lengths T = 4 (Fig. 5a) and T = 5 (Fig. 5b). All architectures reach 100% accuracy for both time lengths; even setting the hidden size to 1 leads to perfect classification, although at a slower rate. It may appear surprising that, even with a hidden representation of size 1, the DGNN can approximate the 1-DWL target function well.
However, as already pointed out in Sec. 5.3, the possibility of reaching universal approximation with features of dimension 1 is confirmed by Thm. 5.2.4.

E2
The DGNN with the GIN module achieves the best performance in terms of learning accuracy and speed of convergence, as illustrated in Fig. 6. The DGNN with the sum-aggregation module is able to learn the task, although learning is unstable (see Fig. 6b). This is not surprising, since this module has been proven to match the expressive power of the 1-WL test [33]. The other DGNNs are incapable of learning the objective function, a consequence of their weaker expressive power, widely investigated in the literature [18,20].
Thus, overall, our theoretical expectations were met by both experiments.

Conclusion and Future Work
This paper provides two extensions of the 1-WL isomorphism test and of the definition of unfolding trees of nodes and graphs. First, we introduced WL test notions for attributed and dynamic graphs; second, we introduced extended concepts of unfolding trees for attributed and dynamic graphs. Further, we proved the existence of a bijection between the dynamic 1-WL equivalence of dynamic graphs and the attributed 1-WL equivalence of their corresponding static versions, which are bijective regarding their encoded information; we proved the same result w.r.t. the unfolding tree equivalence. Moreover, we extended the strong connection between unfolding trees and the (dynamic/attributed) 1-WL tests, proving that they give rise to the same equivalence between nodes in the attributed and the dynamic case, respectively. Note that GNNs working on static graphs usually have a different architecture from those working on dynamic graphs; therefore, we have proved that both GNN types can approximate any function that preserves the (attributed/dynamic) unfolding equivalence, i.e., the (attributed/dynamic) 1-WL equivalence.
Note that the dynamic GNN considered in this paper is given in a discrete-time representation, i.e., as a sequence of static graph snapshots without actual timestamps; as discussed above, Thm. 5.2.4 therefore applies to discrete recurrent models working on graph snapshots (Stacked DGNNs), and the approximation capability depends on the AGGREGATE and COMBINE functions of the snapshot GNN and on the implemented Recurrent Neural Network.
As future work, it would be interesting to extend all our results to graphs in continuous-time representation. One difficulty in this context is deciding in which sense two continuous-time dynamic graphs should be called WL equivalent, since there are many possibilities for dealing with the given timestamps. The investigation of equivalences of dynamic graphs requires determining how to handle dynamic graphs that are equal in their structure but differ in their temporal occurrence; i.e., depending on the commitment to the WL equivalence or to the unfolding tree equivalence, it must be decided whether the concepts need to be time-invariant. For a time-invariant equivalence, the present concepts hold as they are. In case two graphs with the same structure should be distinguished when they appear at different times, the node and edge attributes can be extended by an additional dimension carrying the exact timestamp; thereby, the unfolding trees of two structurally equal nodes would differ, having different timestamps in their attributes. Then, all dynamic graphs in the domain are defined over the same time interval; without loss of generality, this assumption can be made since the set of timestamps of a dynamic graph can be padded by including missing timestamps, and the graph sequence can be padded with empty graphs. Furthermore, this paper considers extensions of the usual 1-WL test and of the commonly known unfolding trees. Further future work could investigate extensions such as the n-dimensional attributed/dynamic WL test or other versions of unfolding trees, covering GNN models not considered by the frameworks used in this paper. These extensions might result in a finer classification of the expressive power of different GNN architectures.
Moreover, the proposed results mainly focus on the expressive power of GNNs. However, GNNs with the same expressive power may differ in other fundamental properties, e.g., computational and memory complexity and generalization capability. Understanding how the architecture of AGGREGATE, COMBINE, and READOUT impacts those properties is of fundamental importance for practical applications of GNNs.

A.1. Proof of Prop. 5.1.2
A function f belongs to the considered class if and only if there exists a function κ defined on trees such that, for any graph in the domain, f applied to the graph and a node equals κ applied to the unfolding tree of that node, for every node of the graph.

A.2. Proof of Lem. 4.1.5
Consider the SAUHG resulting from the transformation of an arbitrary static graph, with nodes u, v and their corresponding attributes. Then Eq. (4) holds. Proof. The proof is carried out by induction on d, which represents both the depth of the unfolding trees and the iteration step of the WL coloring. For d = 0, the claim holds by definition. For d > 0, suppose that Eq. (4) holds for d − 1, and prove that it also holds for d.
- By definition, the equality of the colors of u and v at step d is equivalent to Eq. (5).
- Applying the induction hypothesis, Eq. (5) can be restated in terms of the colors at step d − 1.
- Given the definition of the unfolding trees and their construction, this is equivalent to Eq. (6).
- By the induction hypothesis, Eq. (7) is equivalent to the corresponding equality of trees at depth d − 1.
- Putting together Eqs. (6), (7), and the fact that the HASH function is bijective, we obtain the claimed equality, which, by definition, is equivalent to the equality of the unfolding trees of u and v at depth d.

A.3. Proof of Prop. 5.2.3
A dynamic system dyn belongs to the class if and only if there exists a function κ defined on attributed trees such that, for all triplets (𝐆, v, t) in the domain, it holds dyn(𝐆, v, t) = κ(𝐅_v(t)), where 𝐅_v(t) denotes the dynamic unfolding tree of node v up to time t.
Proof. We show the proposition by proving both directions of the equivalence. ⇒: If dyn(𝐆, v, t) = κ(𝐅_v(t)) for all triplets (𝐆, v, t), then any pair of nodes with equal dynamic unfolding trees receives the same output, i.e., dyn(𝐆, v₁, t) = dyn(𝐆, v₂, t); hence dyn preserves the unfolding equivalence. ⇐: Conversely, if dyn preserves the unfolding equivalence, then we can define κ by κ(𝐅_v(t)) = dyn(𝐆, v, t). Note that this is a correct specification of a function: if two nodes have the same dynamic unfolding tree, preservation of the unfolding equivalence implies that dyn takes the same value on them, so κ is uniquely defined.

A.4. Sketch of the proof of Thm. 5.1.3
Let  ′ be the domain of bounded SAUHGs with the maximal number of nodes  = max For any measurable function  ∈  ( ′ ) preserving the attributed unfolding equivalence (cf.Def.5.1.1),any norm ‖ ⋅ ‖ on ℝ, any probability measure  on  ′ , for any reals ,  where ,  > 0, there exists a SGNN defined by the continuously differentiable functions COMBINE () , AGGREGATE () , at iteration  ≤ 2 − 1, and by the function READOUT, with hidden dimension  = 1, i.e, ℎ   ∈ ℝ ∀, such that the function  (realized by the SGNN) computed after 2 − 1 steps for all  ′ ∈  ′ satisfies the condition Proof.Since the proof proceeds analoguously to the one in [20], we will only sketch the proof idea here and refer to the original paper for further details.First, we need a preliminary lemma which is an extension of [22,Lem. 1] to the domain of SAUHGs  ′ .Intuitively, this lemma suggests that a domain of SAUHG graphs with continuous attributes can be partitioned into small subsets so that the attributes of the graphs are almost constant in each partition.Moreover, in probability, a finite number of partitions is sufficient to cover a large part of the domain.

2. for each j, all the graphs in the j-th set have the same structure, i.e., they differ only in the values of their attributes;
3. for each set, there exists a hypercube containing the attribute vectors of its graphs; here, the attribute vector of a graph denotes the vector obtained by concatenating all the attribute vectors of both its nodes and edges, i.e., the concatenation of all node attributes followed by the concatenation of all edge attributes;
4. for any two different sets, their graphs have different structures, or their hypercubes are disjoint;
5. for each set and each pair of graphs G₁, G₂ in it, the inequality ‖a_{G₁} − a_{G₂}‖_∞ ≤ ε holds, where a_G denotes the attribute vector of G;
6. for each graph in the domain, the covering inequality holds in probability.
Proof. The proof is similar to the one contained in [22]. The only remark needed here is that we can consider the concatenation of all attributes from both nodes and edges without loss of generality; indeed, if we considered the node and edge attributes separately, we would need separate conditions on the hypercubes. Stacking those attribute vectors, as in the statement, allows us to exploit the same proof contained in [22].
The following theorem, where the domain contains a finite number of graphs and the attributes are integers, is equivalent to Thm. 5.1.3. Therefore, the GNN has to encode the attributed unfolding tree into the node attributes, i.e., for each node v, we want its hidden feature to equal ▿(T_v), where ▿ is an encoding function that maps attributed unfolding trees into real numbers. The existence and injectivity of ▿ are ensured by construction. More precisely, the encodings are constructed recursively by the AGGREGATE and COMBINE functions using the neighborhood information, i.e., the node and edge attributes. Consequently, the theorem can be proven given that there exist appropriate functions ▿, AGGREGATE, COMBINE, and READOUT. For this purpose, AGGREGATE and COMBINE must satisfy the encoding recursion at every iteration k ≤ 2N − 1.
In a simple solution, AGGREGATE decodes the attributed trees of the neighbors of v from their encodings and stores them in a data structure to be accessed by COMBINE. The detailed construction of the appropriate functions is given in [20].
Adopting an argument similar to that in [22], it can be proven that the previous theorem is equivalent to Thm. 5.1.3, which concludes the proof.

A.5. Proof of Thm. 5.2.4
Let 𝐆 be a discrete dynamic graph in the graph domain 𝒟, and let N be the maximal number of nodes in the domain. Let dyn be any measurable dynamical system preserving the unfolding equivalence, let ‖·‖ be a norm on ℝ, let μ be any probability measure on 𝒟, and let ε, λ > 0 be real numbers. Proof. To prove the theorem above, we need some preliminary results. Using the same argument as for SAUHGs in Thm. 5.1.3, we need, as a preliminary lemma, the extension of [22, Lem. 1] to the domain of dynamic graphs, analogous to the extension to the domain of SAUHGs in A.4.1.
Proof. Taking into account the argument in [3], one can establish a bijection between the domain of dynamic graphs and the domain of SAUHGs; on the latter, we can directly apply A.4.1.
Thm. 5.2.4 is equivalent to the following theorem, where the domain contains a finite number of elements and the attributes are integers.
The equivalence between Thm. 5.2.4 and Thm. A.5.2 is formally proven by the following lemma. Proof. The proof is similar to the one contained in [22]. Nevertheless, we highlight that, in this case, the patterns are taken from triplets of graph, node, and timestep, as we are working in the context of dynamic graphs. Now, we can proceed to prove Thm. A.5.2.
Proof of Thm. A.5.2. The proof assumes that the output dimension is 1, i.e., dyn(𝐆, v, t) ∈ ℝ, but the result can be extended to the general case by concatenating the corresponding results. As a consequence of Prop. 5.2.3, there exists a function κ such that dyn(𝐆, v, t) = κ(𝐅_v(t)) = κ((T_v(s))_{s∈[t]}), where T_v(s) is an attributed unfolding tree. Given the number of nodes of the graph at timestep t, in order to store the graph information, an attributed unfolding tree of depth 2N_t − 1 is required for each node, in such a way that κ can be satisfied.
The main idea behind the proof of Theorem A.5.2 is to design a DGNN that can encode the sequence of attributed unfolding trees (T_v(s))_{s∈[t]} into the node attributes at each timestep t. This is achieved by using a coding function that maps sequences of t + 1 attributed trees into real numbers. To implement an encoding that fits the definition of the DGNN, two coding functions are needed: the ∇ function, which encodes the attributed unfolding trees, and the family of coding functions #_t. The composition of these functions defines the node attributes, and the DGNN produces the desired output by using this encoded information as in Eq. (10), where the auxiliary function APPEND_t and the coding functions ∇ and #_t are defined in the following.
APPEND_t. Consider the domain of the attributed unfolding trees with root v, up to a certain depth. Intuitively, the function APPEND_t appends the unfolding tree snapshot of node v at time t to the sequence of the unfolding trees of that node at the previous t − 1 timesteps.
In the following, the coding functions are defined; their existence and injectivity are ensured by construction.

The ∇ Coding Function
Let ∇ := b_∇ ∘ a_∇ be the composition of two injective functions a_∇ and b_∇ with the following properties: a_∇ is an injective function from the domain of static unfolding trees, calculated on the nodes of the graph, to the Cartesian product ℕ × ℕ^M × ℤ^M = ℕ^{M+1} × ℤ^M, where M is the maximum number of nodes a tree can have.
Intuitively, in the Cartesian product, ℕ represents the tree structure and ℕ^M the node numbering, while, for each node, an integer vector is used to encode the node attributes. Notice that a_∇ exists and is injective, since the maximal information contained in an unfolding tree is given by the union of all its node attributes and all its structural information, which equals the dimension of the codomain of a_∇. b_∇ is an injective function from ℕ^{M+1} × ℤ^M to ℝ, whose existence is guaranteed by cardinality theory, since the two sets have the same cardinality. Since a_∇ and b_∇ are injective, the existence and injectivity of ∇ are also ensured.
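The cardinality argument behind such coding functions can be made concrete with a standard pairing construction (hedged: this is a textbook injective encoding of fixed-length integer tuples into a single natural number, offered as an illustration of why a function like ∇ exists, not as the paper's actual construction):

```python
def cantor_pair(a, b):
    """Cantor pairing: an injective map from pairs of naturals to naturals."""
    return (a + b) * (a + b + 1) // 2 + b

def to_nat(z):
    """Injective map from integers to naturals (for attribute values)."""
    return 2 * z if z >= 0 else -2 * z - 1

def encode_tuple(xs):
    """Fold a fixed-length tuple of naturals into one natural number;
    injective because each pairing step is injective."""
    acc = xs[0]
    for x in xs[1:]:
        acc = cantor_pair(acc, x)
    return acc
```

Composing `to_nat` and `encode_tuple` over the flattened (structure, numbering, attributes) description of a tree yields a single number per tree, which is exactly the kind of injection whose existence the proof invokes.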

The #_t Coding Family
Similarly to ∇, the functions #_t := b_# ∘ a_# are composed of two functions a_# and b_# with the following properties: a_# is an injective function from the domain of the dynamic unfolding trees {(T_v(s))_{s∈[t]}} to the Cartesian product ℕ^t × ℕ^{M_t} × ℤ^{M_t} = ℕ^{(M_t + t)} × ℤ^{M_t}, where M_t is the maximum number of nodes a tree can have at time t. b_# is an injective function from that Cartesian product to ℝ, whose existence is guaranteed by cardinality theory, since the two sets have the same cardinality. Since a_# and b_# are injective, the existence and injectivity of #_t are also ensured.

The AGGREGATE and COMBINE Functions
The recursive function f has to satisfy the encoding recursion, where h_v(t) is the hidden representation of node v at time t extracted from the t-th SGNN, i.e., h_v(t) = SGNN_t(G_t, v).
In particular, at each iteration k, the functions AGGREGATE and COMBINE, following the proof in [20], must satisfy the corresponding encoding equations. For example, the trees can be collected into the coding of a new tree, where the union operator constructs a tree, whose root has void attributes, from a set of subtrees (see Fig. 7). Then, COMBINE assigns the correct attributes to the root by extracting them from the previous encoding, where ATTACH is an operator that returns a tree constructed by replacing the attributes of the root of the latter tree with those of the former, and the other argument is the result of the AGGREGATE function. Notice that, with this definition, AGGREGATE, COMBINE, and READOUT may not be differentiable. Nevertheless, Eq. (9) has to be satisfied only for a finite number of graphs. Thus, we can specify other functions AGGREGATE, COMBINE, and READOUT that produce exactly the same computations when applied to those graphs, but that can be extended to the rest of their domain so that they are continuously differentiable. Such an extension obviously exists, since those functions are only constrained to interpolate a finite number of points 5 .

The READOUT dyn function
Eventually, READOUT_dyn must satisfy the final output equation. This concludes the proof via Lem. A.5.3. 5 Notice that a similar extension can also be applied to the coding function ▿ and to the decoding function ▿⁻¹. In this case, the coding function is not injective on the whole domain, but only on the graphs mentioned in the theorem.

A.6. Proof of Thm. 5.2.6
Assume that the hypotheses of Thm. 5.2.4 are fulfilled and that the considered class is a class of discrete DGNNs with universal components. Then, there exist a parameter set and functions, implemented by Neural Networks in the class, such that the thesis of Thm. 5.2.4 holds.
Proof. The idea of the proof follows the same reasoning adopted in [20]. Intuitively, since the discrete DGNN of Thm. 5.2.4 is implemented by continuously differentiable functions, its output depends continuously on changes in the DGNN implementation: small changes in the function implementation cause small changes in the DGNN outputs. Therefore, the functions of the DGNN of Thm. 5.2.4 can be replaced by Neural Networks, provided that those networks are suitable approximators.
As in the proof of the dynamic version of the approximation theorem (cf. Thm. 5.2.4), without loss of generality, we assume that the attribute dimension is 1.
Considering that the theorem has to hold only in probability, we can also assume that the domain is bounded to a finite set of patterns (as in Theorem A.5.2). As a result, the functions κ, f, and READOUT_dyn are bounded and have bounded Jacobians; we take the maximum norm of these Jacobians, which is used as a Lipschitz constant in the following.

Remark 3.6. The DGNN is a Message-Passing model because the SGNN is one, by definition.

Figure 4: The four static graphs used as components to generate the synthetic dataset. Graphs a) and b) are equivalent under the static 1-WL test; the same holds for c) and d).

Figure 5: Experimental framework E1. Training accuracy over the epochs for a DGNN trained on the dataset containing dynamic graphs up to time length T = 4 (a) and T = 5 (b).

Figure 7: The ATTACH operator on trees.

‖φ(𝐆, v, t) − φ_θ(𝐆, v, t)‖_∞ = ‖READOUT_dyn(q(t)) − READOUT_dyn,θ(q_θ(t))‖_∞ ≤ ‖READOUT_dyn(q(t)) − READOUT_dyn(q_θ(t))‖_∞ + ‖READOUT_dyn(q_θ(t)) − READOUT_dyn,θ(q_θ(t))‖_∞ ≤ ε(ε_f, ε_1, ε_2). Thus, we choose ε_f, ε_1, ε_2 such that ε(ε_f, ε_1, ε_2) ≤ ε/2; going back in probability, we obtain P(‖φ(𝐆, v, t) − φ_θ(𝐆, v, t)‖ ≤ ε/2) ≥ 1 − λ for all patterns in the domain, which, along with Eq. (11), proves the result.

Silvia Beddar-Wiesing is a Ph.D. student in Computer Science at the University of Kassel. After her Bachelor's in Mathematics with a specialization in Mathematical Optimization in 2017 at the University of Kassel, she changed to Computer Science and completed her Master's Degree with a specialization in Computational Intelligence and Data Analytics in 2020. In her research, she focuses on Machine Learning on structural-dynamic graphs, considering graph preprocessing, representation, and embedding techniques. Her further research interests cover topics from Time-series Analysis, Graph Theory, Machine Learning, and Deep Neural Networks.

Giuseppe Alessio D'Inverno is currently a Ph.D. student in Information and Engineering Science at the University of Siena. He took a Bachelor's Degree in Mathematics in 2018 and a Master's Degree cum Laude in Applied Mathematics in 2020. His research interests are mainly focused on the mathematical properties of Deep Neural Networks, the application of Deep Learning to PDEs, graph theory, and geometric modeling.

Caterina Graziani is currently a Ph.D. student in Information Engineering and Science at the University of Siena. In 2018 she received the Bachelor's Degree cum Laude in Mathematics from the University of Siena, and two years later she obtained the Master's Degree cum Laude in Applied Mathematics at the University of Siena, defending a thesis called "LSTM for the prediction of translation speed based on Ribosome Profiling". Her research interests are in Graph Theory, Graph Neural Networks, and Bioinformatics, with a particular interest in the mathematical foundations of Deep Neural Networks.

Veronica Lachi is a Ph.D. student in Information and Engineering at the University of Siena. She graduated in Applied Mathematics with Honors in 2020; in 2018, she took a bachelor's degree in Economics with Honors. Her research interests mainly focus on Graph Theory, Graph Neural Networks, Community Detection in Graphs and Networks, and mathematical properties of Deep Neural Networks.

Alice Moallemy-Oureh is currently a Ph.D. candidate at the University of Kassel, Department of Electrical Engineering and Computer Science. Her topic addresses developing Graph Neural Networks designed to master the handling of attribute-dynamic graphs in continuous-time representation. Prior, she graduated with an M.Sc. in Mathematics and its applications to Computer Science from the University of Kassel. Her specialization in Mathematics lies in Algorithmic Commutative
of the neighbors of v. Moreover, the attributed unfolding tree of v, determined as the limit of its depth-d unfolding trees for d → ∞, is obtained by merging all unfolding trees of v of any depth d.

are corresponding subtrees with edge attributes at time t. If a node v does not exist at time t, the corresponding tree is empty, there is no tree of depth d > 0 for this timestep, and v does not occur in any neighborhood of other nodes. In total, the dynamic unfolding tree of v at time t is the limit of its depth-d trees. Two nodes u, v are said to be dynamic unfolding equivalent if their dynamic unfolding trees coincide at every timestep t. Analogously, two dynamic graphs are said to be dynamic unfolding equivalent if there exists a bijection between the nodes of the graphs that respects the partition induced by the unfolding equivalence on the nodes. (Dynamic 1-WL test). Let 𝐆 be a dynamic graph given as a sequence of attributed snapshots, and let HASH_0 be a bijective function encoding every node attribute of each snapshot with a color from a color set. The dynamic 1-WL test (1-DWL) generates a vector of color sets, one for each timestep t; two graphs are dynamic WL equivalent if their color sets coincide for all t. Theorem 4.2.5 (Equivalence of Dynamic WL Equivalence and Dynamic UT Equivalence for nodes). Let 𝐆 be a dynamic graph and u, v two of its nodes. Then, u and v are dynamic unfolding equivalent if and only if they are dynamic WL equivalent. Proof. Two nodes are dynamic unfolding tree equivalent iff they are attributed unfolding tree equivalent at each timestep (Def. 4.2.2). Further, as a consequence of Thm. 4.1.6, for all timesteps the two nodes are then attributed WL equivalent, and thus dynamic WL equivalent by Def. 4.2.4. In case of the non-existence of a node at a certain timestep, the theorem still holds.
a function that processes the graph snapshot at time t and provides an internal state representation for each node v. Finally, a recursive function, called the state update function, combines the current snapshot representation with the previous state.
1. Let G and H be connected graphs and u, v be nodes of G and H, respectively. The infinite unfolding trees of u and v are equal if and only if they are equal up to depth 2N − 1.