Accuracy-diversity trade-off in recommender systems via graph convolutions

Graph convolutions, in both their linear and neural network forms, have reached state-of-the-art accuracy on recommender system (RecSys) benchmarks. However, recommendation accuracy is tied to diversity in a delicate trade-off, and the potential of graph convolutions to improve the latter is unexplored. Here, we develop a model that learns joint convolutional representations from a nearest neighbor graph and a furthest neighbor graph to establish a novel accuracy-diversity trade-off for recommender systems. The nearest neighbor graph connects entities (users or items) based on their similarities and is responsible for improving accuracy, while the furthest neighbor graph connects entities based on their dissimilarities and is responsible for diversifying recommendations. The information between the two convolutional modules is balanced already in the training phase through a regularizer inspired by multi-kernel learning. We evaluate the joint convolutional model on three benchmark datasets with different degrees of sparsity. The proposed method can either trade accuracy to substantially improve the catalog coverage or the diversity within the list, or improve both by a lesser amount. Compared with accuracy-oriented graph convolutional approaches, the proposed model shows diversity gains of up to seven times by trading as little as 1% in accuracy. Compared with alternative accuracy-diversity trade-off solutions, the joint graph convolutional model retains the highest accuracy while offering a handle to increase diversity. To our knowledge, this is the first work proposing an accuracy-diversity trade-off with graph convolutions, and it opens the door to learning-over-graphs approaches for improving this trade-off.


Introduction
Although accuracy is still the dominant criterion guiding the design and evaluation of recommender systems (RecSys), numerous studies have shown that recommendation diversity, i.e., decreasing the similarity of the items in the recommended list, significantly improves user satisfaction (Aggarwal et al., 2016; Bradley & Smyth, 2001; Kaminskas & Bridge, 2016). However, accuracy and diversity do not always go hand in hand, and developing a recommender system that considers both criteria typically requires dealing with an accuracy-diversity trade-off, a.k.a. balance or dilemma (Kunaver & Požrl, 2017; Wu et al., 2019).
The thin balance between accuracy and diversity is tied to the complexity and irregularity of the user-item relationships. Dealing with this complexity and irregularity has produced creative adaptations of existing RecSys paradigms, such as modifying accuracy-oriented algorithms into diversity-oriented counterparts (Gan & Jiang, 2013; Said, Kille, Jain, & Albayrak, 2012). As an example, nearest neighbor (NN) collaborative filtering connects entities (users, items) based on pairwise similarities and leverages these connections to interpolate the missing values from proximal entities. To secure accuracy for the target user, the system learns from the preferences of the most similar (nearest) neighboring entities. However, this typically leads to low recommendation diversity, as the NNs are too similar to the target user. In the search for a better accuracy-diversity trade-off, Said et al. (2012) proposed to look at the furthest neighbors (FNs) instead, i.e., a subset of k users that are most dissimilar to the target user in terms of preferences. The assumption here is that recommending the items FNs disliked most could bring more diversity while preserving an acceptable level of accuracy. Other RecSys algorithms focusing on improving diversity by connecting entities based on their dissimilarity include (Anelli, Di Noia, Di Sciascio, Ragone, & Trotta, 2019; Zeng, Shang, Zhang, Lü, & Zhou, 2010; Zhou et al., 2010). Alternative approaches aiming at trading accuracy for diversity include re-ranking (Adomavicius & Kwon, 2008; Zhang & Hurley, 2008), leveraging side information (Hurley, 2013; Panniello, Tuzhilin, & Gorgoglione, 2014), or merging different models operating with different criteria (Zeng et al., 2010; Zhou et al., 2010).
We believe that, in order to understand the accuracy-diversity trade-off in sufficient depth, RecSys approaches are needed that can fully capture the aforementioned complex and irregular user-item relationships. Graphs have proved to be excellent tools to develop such approaches (Ortega, Frossard, Kovačević, Moura, & Vandergheynst, 2018), which has made graph-based RecSys one of the most rapidly developing areas. Examples of graph-based RecSys approaches are diffusion-based recommendations (Nikolakopoulos, Berberidis, Karypis, & Giannakis, 2019), random walks (Abbassi & Mirrokni, 2007), and graph neural network-based recommendations (Monti, Bronstein, & Bresson, 2017; Sun et al., 2019; Ying et al., 2018), to name a few.
In parallel to the increasing importance of graphs in the RecSys domain, the signal processing and machine learning communities have developed processing tools for data over graphs (Ortega et al., 2018; Wu et al., 2020). The quintessential tool in these areas is the graph convolution. Graph convolutions extend to graphs the convolution operation used to process temporal and spatial signals and serve as the building block for graph convolutional neural networks (GCNNs) (Gama, Isufi, Leus, & Ribeiro, 2020). Graph convolutions, in both their linear and GCNN forms, have been successfully applied to RecSys, reaching state-of-the-art accuracy (Yang, 2019; Ying et al., 2018). Despite this promise, graph convolutions have only been used to optimize accuracy, leaving unexplored their ability to diversify recommendations and, ultimately, improve the accuracy-diversity trade-off.
In this work, we explore the potential of graph convolutions to improve the accuracy-diversity trade-off for recommender systems. We conduct this exploration by developing a novel model composed of two graph convolutional components, one providing accuracy-oriented recommendations from a NN graph, and one providing diversity-oriented recommendations from a FN graph. Differently from current works, we train a single joint model to fit the data, rather than using two separate models. Our specific contributions in this paper can be summarized as follows: i) We propose a novel accuracy-diversity trade-off framework for RecSys via graph convolutions. The model operates on a NN graph to improve accuracy and on a FN graph to improve diversity. Each graph can capture user-user or item-item relationships, which also allows hybrid settings, such as a user-NN and an item-FN graph. To the best of our knowledge, this is the first contribution providing an accuracy-diversity trade-off by using these hybrid setups. The proposed model relies only on the available ratings, which we find important since side information, such as metadata or context, can be unavailable or require extensive work to be used accurately. ii) We develop design strategies that estimate the joint model parameters in view of both accuracy and diversity. These design strategies are versatile to both rating and ranking frameworks. When the joint model is composed of linear graph convolutional filters, we analyze the optimality of the design problem and provide solutions for it. iii) We analyze the joint model in the graph-spectral domain to provide an alternative interpretation of how the proposed approach balances accuracy with diversity. The joint model presents a band-stop behavior on both the NN and the FN graph, and builds recommendations by focusing on the extremely low and high graph frequencies. iv) We evaluate two types of trade-offs: a) an accuracy-diversity trade-off w.r.t. catalog coverage (i.e., aggregated diversity), and b) an accuracy-diversity trade-off w.r.t. list diversity (i.e., individual diversity). The first trade-off shows the models' ability to recommend niche items and personalize recommendations. The second trade-off shows the models' ability to diversify items in the list. The proposed models can either trade accuracy to substantially boost one diversity metric, or improve both by a lesser amount.
The remainder of this paper is organized as follows. Section 2 places our contribution in the context of the current literature. Section 3 reviews NN collaborative filtering from a graph convolutional learning perspective. Section 4 provides a high-level overview of the proposed approach. Sections 5 and 6 contain the design strategies for rating and ranking, respectively. Section 7 provides the graph-spectral analysis of the joint models, while Section 8 contains the numerical results. Section 9 discusses our findings.

Related work
Accuracy-diversity trade-off. Along with the initial work (Smyth & McClave, 2001), also Bridge and Kelly (2006) promotes the accuracy-diversity trade-off as a joint objective for effective RecSys. A popular direction to tweak this trade-off is through two-step approaches, in which re-ranking is applied to a retrieved list to boost diversity (Adomavicius & Kwon, 2009; Zhang & Hurley, 2008). The work in Adomavicius and Kwon (2008) re-ranks items based on rating variance in the neighborhood of a user, while Adomavicius and Kwon (2011) uses re-ranking to cover a larger portion of the catalog. Also Eskandanian and Mobasher (2020) and Hamedani and Kaedi (2019) diversify items to improve coverage in a user-personalized manner. The work in Hamedani and Kaedi (2019) optimizes the recommendation list to improve accuracy and diversity while reducing item popularity, whereas related work uses matching problems to improve coverage while minimizing the accuracy loss. Instead, Hurley and Zhang (2011) proposes a new metric to quantify diversity within a list and develops an optimization algorithm to improve it. Methods and algorithms in this category rely heavily on the initial recommendation list, which makes it difficult to attribute to which extent the improved trade-off is due to re-ranking or to the properties of the list.
Another category of approaches considers a single algorithm and leverages side information, such as metadata or context, to improve diversity. The work in Hurley (2013) builds an item-item dissimilarity graph from features and uses this graph in a learning-to-rank framework. Also the work in Gogna and Majumdar (2017) uses item features to provide a single method for matrix completion. Differently, Panniello et al. (2014) leverages context and evaluates different pre-filtering, post-filtering, and modeling schemes in terms of accuracy and diversity. Our approach, instead, balances accuracy with diversity without relying on side information, in both a learning-to-rate and a learning-to-rank framework.
A third category of approaches modifies conventional accuracy-oriented algorithms to improve diversity. Authors in Liu, Shi, and Guo (2012) build similarities by avoiding the influence of popular objects or high-degree users on the direction of the random walk, which is shown heuristically to improve diversity. The work in Gan and Jiang (2013) adjusts the calculation of user similarities in the classic NN approach to improve diversity. A broader analysis following this line was recently presented in Anelli et al. (2019). The work in Wasilewski and Hurley (2016) follows up on Hurley (2013) and uses the regularizer in the learning-to-rank loss to improve diversity. In our view, the latter overloads the regularizer with an additional objective. Since the primary goal of the regularizer is to generalize the model to unseen data, leveraging it also to improve diversity leads to a triple accuracy-diversity-generalization trade-off, which is challenging to handle. Likely, one of the three objectives will be treated as a byproduct, which reduces the possibility of steering the optimization of the accuracy-diversity trade-off. Differently, Said, Fields, Jain, and Albayrak (2013) and Said et al. (2012) connect users based on their dissimilarities and propose the so-called furthest neighbor (FN) collaborative filtering, in contrast to the vanilla NN collaborative filtering. By using information from neighbors a user disagrees with, this approach was shown to improve diversity while affecting accuracy only slightly. Yet, the degree to which FNs affect the accuracy-diversity trade-off remains insufficiently investigated. In our approach, we leverage both the NNs and the FNs in a joint convolutional model to better understand this influence.
While changing the inner working of a single model can improve diversity, a single model often lacks the ability to capture the complex relationships contained in highly sparse RecSys datasets. A fourth category of approaches overcomes this issue by working with an ensemble of models, also referred to as joint or hybrid models. These models have a higher descriptive power that can better balance accuracy with diversity at the expense of complexity, which often remains of the same order of magnitude. Authors in Zeng et al. (2010) propose a joint collaborative filtering algorithm that leverages the influence of both similar and dissimilar users. The dissimilarity is computed by counting the items two users have consumed individually but not jointly. The predicted ratings from the similar and dissimilar users are merged into a final score, and the influence of each group is controlled by a scalar. This way of computing dissimilarity ignores the fact that users may have consumed the same item but rated it differently. Also, building dissimilarities from non-consumed items ignores the fact that a user may also like an item other users have consumed separately. To avoid the latter, we account for the ratings when building dissimilarities between entities. The authors of Zhou et al. (2010) follow a similar strategy to Zeng et al. (2010) and mix heat spreading with a random walk to provide an accuracy-diversity trade-off. A probabilistic model to balance accuracy with diversity is further proposed in Javari and Jalili (2015). The latter considers the order in which items are consumed and proposes a joint model in which one branch maximizes accuracy while the other maximizes diversity. In contrast, we train the whole model jointly.
Graph convolutions in RecSys. Graph convolutions have been introduced to the RecSys domain only recently (Berg, Kipf, & Welling, 2017; Huang, Marques, & Ribeiro, 2017; Monti et al., 2017). The approach proposed in Huang et al. (2017), subsequently extended in Huang, Marques, and Ribeiro (2018), showed that the NN collaborative filter is a non-parametric graph convolutional filter of order one. This work also showed that higher-order parametric graph convolutional filters improve rating prediction. These graph convolutional filters are the basis of GCNNs (Gama et al., 2020), and we will use them to balance accuracy with diversity. The work in Monti et al. (2017) merges GCNNs with a recurrent neural network to complete the user-item matrix. Instead, Berg et al. (2017) completes the matrix with a variational graph autoencoder, in which graph convolutions are performed by an order-one graph convolutional filter. The work in Chen, Wu, Hong, Zhang, and Wang (2020) uses the same graph convolution as Huang et al. (2018), but in a learning-to-rank setting. Although starting from different standpoints and naming the method differently, the two approaches are identical from a technical perspective. Taken together, Huang et al. (2018) and Chen et al. (2020) showed that linear graph convolutions may often suffice in highly sparse RecSys datasets. We shall corroborate this behavior also in the accuracy-diversity trade-off setting.
The authors in Sun et al. (2019) deployed a GCNN with filters of order one on an augmented graph comprising the user-item bipartite interaction graph and also user-user and item-item proximal graphs. The work in Wang, He, Wang, Feng, and Chua (2019b) also learns from the user-item bipartite interactions through an order-one GCNN, but augments the propagation rule to promote exchanges between similar items. Authors in Ying et al. (2018) combine random walks with graph convolutions to perform recommendations in large-scale systems containing millions of nodes. Authors in Wang et al. (2019a) first build a user-specific knowledge graph and then apply graph neural networks to compute personalized recommendations. They also regularize the loss to enforce similar scores on adjacent items. Lastly, GCNNs have been used for location recommendation in Zhong et al. (2020), where two GCNNs are run over two graphs, a point-of-interest graph and a social relationship graph, to identify relevant points-of-interest for a user.
Altogether, these works show the potential of graph convolutions to change the RecSys landscape. However, all of these approaches focus only on accuracy and ignore recommendation diversity. In this work, we use graph convolutions to establish an accuracy-diversity trade-off in both a learning-to-rate and a learning-to-rank setup.

Learning from similar nearest neighbors
Consider a recommender system setting comprising a set of users U = {1, …, U} and a set of items I = {1, …, I}. Ratings are collected in the user-item matrix X ∈ R U×I , in which entry X ui contains the rating of user u to item i. Ratings are mean-centered 1 , so that we can adopt the common convention X ui = 0 if value (u, i) is missing. The objective is to populate matrix X by exploiting the relationships between users and items contained in the available ratings. We capture these relationships through a graph, which is built following the principles of NN collaborative filtering. This graph is used to predict ratings and the k items with the highest predicted rating form the recommendation list.
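As a small illustration of this preprocessing convention, the sketch below (our own minimal Python, not the authors' code) mean-centers each user's observed ratings and zero-fills the missing entries, so that X_ui = 0 can denote a missing value:

```python
import numpy as np

def mean_center(X_raw, mask):
    """Mean-center observed ratings per user and zero-fill missing ones.

    X_raw: (U, I) raw ratings; mask: boolean (U, I), True where a rating
    is observed. After centering, X_ui = 0 can safely denote a missing
    entry without conflating it with a typical observed rating.
    """
    X = np.zeros_like(X_raw, dtype=float)
    for u in range(X_raw.shape[0]):
        obs = mask[u]
        if obs.any():
            X[u, obs] = X_raw[u, obs] - X_raw[u, obs].mean()
    return X
```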
In user-based NNs, relationships are measured by the Pearson correlation coefficient. Consider a matrix B ∈ R^{U×U}, in which entry B_uv measures the correlation between users u and v. Matrix B is symmetric and can be seen as the adjacency matrix of a user-correlation graph G_u = (U, E_u). The vertex set of G_u is the user set U and the edge set E_u contains an edge (u, v) ∈ E_u only if B_uv ≠ 0. Each item i is treated separately and user ratings are collected in a vector x_i ∈ R^U (corresponding to the ith column of X). Vector x_i can be seen as a signal on the vertices of G_u, whose uth entry [x_i]_u := X_ui is the rating of user u to item i, or zero otherwise (Shuman, Narang, Frossard, Ortega, & Vandergheynst, 2013); see Fig. 2(a). Predicting ratings for item i translates into interpolating the missing values of the graph signal x_i. These values are estimated by shifting available ratings to neighboring users. First, we transform the global graph B into an item-specific graph B_i, which contains only the top-n positively correlated edges per user, and normalize their weights; see Fig. 2(b). The NN shift of ratings to the immediate neighbors can then be written as

x̂_i = B_i x_i, (1)

which holds true because matrix B_i respects the sparsity of the user graph adapted to item i. In item-based NNs, the procedure follows likewise. First, we construct an item-item correlation matrix C ∈ R^{I×I} in which entry C_ij is the Pearson correlation coefficient between items i and j. Matrix C is symmetric and it is the adjacency matrix of an item-correlation graph G_i = (I, E_i). The vertex set of G_i matches the item set I and the edge set E_i contains an edge (i, j) ∈ E_i only if C_ij ≠ 0. Then, we consider the complementary scenario and treat each user u separately. We collect the ratings of user u to all items in the graph signal x_u ∈ R^I (corresponding to the uth row of X). Finally, item-based NN interpolates the missing values in x_u through shifts to neighboring items.
Building a user-specific graph C_u from C, keeping only the top-n positively correlated edges per item, and normalizing the weights, we predict the ratings as

x̂_u = C_u x_u. (2)

In either scenario, matrices B, {B_i}_i, C, and {C_u}_u can be regarded as instances of a general graph adjacency matrix variable S of a graph G = (V, E) containing |V| nodes and |E| edges. We denote the available rating signal by x and the estimated rating signal by x̂, so that we can write estimators (1) and (2) with the unified notation

x̂ = Sx. (3)

As it follows from (3), NN estimators rely only on ratings present in the immediate surrounding of a node. But higher-order neighbors carry information that can improve prediction, and their information should be accounted for accordingly to avoid destructive interference. Graph convolutional filters have proven to be the tool for effectively capturing multi-resolution neighbor information when learning over graphs (Ortega et al., 2018), including recent success in multi-resolution NN collaborative filtering (Huang et al., 2018; Monti et al., 2017). We detail in the sequel the graph convolutional filter and the respective GCNN extension.
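As a toy illustration of the one-shift estimator x̂ = Sx, the following sketch (our own simplification, not the authors' code: it uses one global user graph rather than the item-adapted graphs B_i) builds a top-n positive-correlation NN graph with right-stochastic rows and shifts the ratings once:

```python
import numpy as np

def nn_shift_predict(X, n=2):
    """One-shift NN prediction x_hat = S x on a user-correlation graph.

    X: (U, I) mean-centered rating matrix with 0 for missing entries.
    Illustrative sketch: keeps the top-n positively correlated users
    per row and normalizes the kept weights to sum to one.
    """
    U = X.shape[0]
    C = np.corrcoef(X)                  # user-user Pearson correlations
    np.fill_diagonal(C, 0.0)
    S = np.zeros_like(C)
    for u in range(U):
        top = np.argsort(C[u])[-n:]     # candidates: n largest correlations
        pos = [v for v in top if C[u, v] > 0]
        if pos:
            S[u, pos] = C[u, pos] / C[u, pos].sum()   # right-stochastic row
    return S @ X                        # x_hat = S x for every item at once
```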

Nearest neighbor graph convolutional filters
Estimator (3) accounts for the immediate neighbors to predict ratings. Similarly, we can account for the two-hop neighbors via the second-order shift S²x. Writing S²x = S(Sx) shows the second-order shift builds a NN estimator S(⋅) on top of the previous one, Sx. We can also consider neighbors up to K hops away as S^K x = S(S^{K−1} x). To balance the information coming from the different resolutions Sx, S²x, …, S^K x, we consider a set of parameters h = [h_0, …, h_K]^⊤ and build the Kth order NN predictor

x̂ = H(S)x = ∑_{k=0}^{K} h_k S^k x, (4)

where H(S) = ∑_{k=0}^{K} h_k S^k is referred to as a graph convolutional filter of order K (Ortega et al., 2018; Shuman et al., 2013). 2 The ratings x̂ in (4) are built as a shift-and-sum of the available ratings x. Particularizing G to G_u, (4) becomes a graph convolutional filter estimator over the user NN graph with estimated ratings for item i

x̂_i = ∑_{k=0}^{K} h_k B_i^k x_i. (5)

Particularizing G to G_i, (4) becomes a graph convolutional filter over the item NN graph with estimated ratings for user u

x̂_u = ∑_{k=0}^{K} h_k C_u^k x_u. (6)

Moreover, the vanilla NN collaborative filters (1) and (2) are the particular cases of (5) and (6), respectively, obtained for K = 1, h_0 = 0, and h_1 = 1.
Graph convolutional filters are defined by the K + 1 parameters h. Further, since the shift operator matrix S matches the NN structure, it is sparse; therefore, obtaining the output x̂ in (4) amounts to a complexity of order O(|E|K). These properties are important to deal with data scarcity and scalability.
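The shift-and-sum structure of (4) can be sketched as follows (our own minimal NumPy illustration): each extra tap reuses the previous diffusion step, so a sparse S yields the O(|E|K) cost mentioned above.

```python
import numpy as np

def graph_conv_filter(S, x, h):
    """Graph convolutional filter output H(S)x = sum_k h_k S^k x.

    Computed with K repeated shifts z <- S z, so for a sparse S the
    cost is O(|E| K): one sparse matrix-vector product per order.
    """
    x = np.asarray(x, dtype=float)
    out = h[0] * x
    z = x
    for hk in h[1:]:
        z = S @ z            # raise the shift order by one: S^k x
        out = out + hk * z
    return out
```

With a `scipy.sparse` matrix in place of the dense `S`, the same code exploits the sparsity directly.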

Nearest neighbor graph convolutional neural networks
Besides numerical efficiency, graph convolutional filters have high mathematical tractability and are the building block for graph convolutional neural networks (Gama et al., 2020). To build a GCNN with the filter in (4), consider the composition of a set of L layers.
The first layer ℓ = 1 comprises a bank of F_1 filters H^f_1(S), each defined by coefficients {h^f_{1k}}_k. Each of these filters outputs a graph signal u^f_1 = H^f_1(S)x, which is subsequently passed through a pointwise nonlinearity σ(⋅) to produce the collection of F_1 features x^f_1 that constitute the output of layer ℓ = 1, i.e.,

x^f_1 = σ( H^f_1(S)x ), f = 1, …, F_1. (7)

At subsequent intermediate layers ℓ = 2, …, L − 1, the output features {x^g_{ℓ−1}}_g of the previous layer ℓ − 1 become inputs to a bank of F_ℓ F_{ℓ−1} filters H^{fg}_ℓ(S). The filter outputs obtained from a common input x^g_{ℓ−1} are aggregated and the result is passed through a nonlinearity σ(⋅) to produce the F_ℓ output features

x^f_ℓ = σ( ∑_{g=1}^{F_{ℓ−1}} H^{fg}_ℓ(S) x^g_{ℓ−1} ), f = 1, …, F_ℓ. (8)

Operation (8) is the propagation rule of a generic layer ℓ of the GCNN, whose final outputs are the F_L features x^1_L, …, x^{F_L}_L. These final convolutional features are passed through a shared per-node multi-layer perceptron that maps the F_L features of node n, [x^1_{Ln}; …; x^{F_L}_{Ln}], into the output estimate x̂_n.

Fig. 2. Rating prediction with similar and dissimilar graphs. We construct a NN graph G_s capturing similarities between entities and a FN graph G_d capturing dissimilarities between entities. On each graph, we run a graph convolutional module Φ(⋅) with its respective parameter set. The estimated outputs are combined through a parameter α to obtain the final joint estimate X̂.
The GCNN can be seen as a map Φ(⋅) that takes as input a rating graph signal x, an entity-specific graph S, and a set of parameters H = {h^{fg}_{ℓk}} for all layers ℓ, orders k, and feature pairs (f, g). This map produces the estimate

x̂ = Φ(x; S; H). (9)

The GCNN leverages the coupling between the rating and the NN graph in the input layer to learn higher-order representations in the intermediate layers. This coupling is captured by the filter banks as per (4). Consequently, the GCNN inherits the numerical benefits of the graph convolutional filter. Denoting by F = max_ℓ F_ℓ the maximum number of features over all layers, the number of parameters defining the GCNN is of order O(KF²L). The latter is of the same order of magnitude as for the graph convolutional filter [cf. (4)]. The filter can in fact be viewed as a particular GCNN map Φ(⋅) [cf. (9)] limited to linear aggregations; refer also to Gama et al. (2020) for more details. In the remainder of this paper, we will denote by Φ(⋅) both the filter and the GCNN and refer to them with the common terminology graph convolutions.
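To make the propagation rule concrete, here is a minimal NumPy sketch of a single GCNN layer (our own illustration with hypothetical shapes; a real implementation would use an automatic-differentiation framework to train the taps):

```python
import numpy as np

def gcnn_layer(S, X_in, H, sigma=np.tanh):
    """One GCNN layer: the propagation rule with a bank of filters.

    S: (N, N) graph shift operator; X_in: (N, F_in) input features;
    H: (F_out, F_in, K+1) filter taps h^{fg}_{lk}. Returns the
    (N, F_out) features sigma(sum_g H^{fg}(S) x^g).
    """
    F_out, F_in, K1 = H.shape
    N = X_in.shape[0]
    X_out = np.zeros((N, F_out))
    for f in range(F_out):
        acc = np.zeros(N)
        for g in range(F_in):
            z = X_in[:, g]
            acc = acc + H[f, g, 0] * z
            for k in range(1, K1):
                z = S @ z               # next diffusion step S^k x^g
                acc = acc + H[f, g, k] * z
        X_out[:, f] = sigma(acc)        # pointwise nonlinearity
    return X_out
```

Stacking L such layers and a shared per-node perceptron reproduces the map Φ(x; S; H) described above.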

Accounting for dissimilar furthest neighbors
We work with a NN similarity graph G_s, built as described in Section 3, and a FN dissimilarity graph G_d. The dissimilarity graph is built by following the opposite principle of NNs, i.e., connecting each entity to its top-n most negatively related ones. To illustrate the latter, consider that C captures item-item correlations while vector x_u contains the ratings of user u to all items. The user-specific dissimilarity graph has the adjacency matrix C_u obtained from C by: i) removing any edge starting from an item not rated by user u; ii) keeping for each item i the n most negative connections; iii) normalizing the resulting matrix to make C_u right stochastic. In other words, defining N_iu as the set containing the n most dissimilar items to i rated by user u, we have

[C_u]_ij = |C_ij| / ∑_{j′∈N_iu} |C_ij′| if j ∈ N_iu, and [C_u]_ij = 0 otherwise. (10)

This procedure for building the user-specific FN graphs differs from the NN approach only in step ii). The normalization step ensures a similar magnitude of signal shifting [cf. (3)] on both the NN and FN graphs and implies the entries of C_u are positive, i.e., a larger value indicates a stronger dissimilarity. In the considered datasets, positive correlations go up to 1.0 while negative correlations go down to −0.2.
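The three construction steps can be sketched as follows (our own minimal Python; the restriction to the items rated by user u is applied to the candidate neighbors, which is the functionally relevant part of step i)):

```python
import numpy as np

def fn_item_graph(C, rated, n=2):
    """Build a user-specific FN item graph C_u from item correlations C.

    C: (I, I) item-item Pearson correlations; rated: indices of the
    items rated by user u. Keeps the n most negative connections per
    item (among rated items), stores their magnitudes, and normalizes
    the rows to be right stochastic, mirroring steps i)-iii). Larger
    entries then indicate stronger dissimilarity.
    """
    I = C.shape[0]
    Cu = np.zeros((I, I))
    rated = np.asarray(rated)
    for i in range(I):
        cand = rated[rated != i]
        neg = cand[C[i, cand] < 0]          # only negative correlations
        if neg.size == 0:
            continue
        order = neg[np.argsort(C[i, neg])[:n]]  # n most negative ones
        w = np.abs(C[i, order])
        Cu[i, order] = w / w.sum()          # right-stochastic row
    return Cu
```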
On each graph G_s and G_d we run a convolutional module, Φ_s(x; S_s; H_s) and Φ_d(x; S_d; H_d), outputting estimates X̂_s and X̂_d of the user-item matrix, respectively. We combine the two outputs in the joint estimate

X̂ = (1 − α) X̂_s + α X̂_d, (11)

where scalar α ∈ ]0, 1[ balances the influence of similar and dissimilar connections; see Fig. 2. Each graph G_s or G_d can be a user or an item graph, and the graph convolutional modules Φ(⋅) can be linear [cf. (4)] or nonlinear [cf. (9)]. This framework yields eight combinations to investigate the trade-off. We limit ourselves to situations where the graph convolutional modules are the same on both graphs and focus on the four combinations in Table 1. To ease exposition, we shall discuss the theoretical methods with the hybrid combination of a user NN graph (i.e., G_s,u with adjacency matrix B_i for item i) and an item FN graph (i.e., G_d,i with adjacency matrix C_u for user u). This setting implies we predict rating X_ui by learning, on one side, from the coupling (G_s,u, x_i), and, on the other side, from the coupling (G_d,i, x_u). Joint models like the one we consider are popular beyond the RecSys literature. The works in Hua et al. (2019) and Sevi, Rilling, and Borgnat (2018) consider two different shift operators of the same graph to model signal diffusion with graph convolutional filters [cf. (4)]. This strategy is subsequently extended to GCNNs in Dehmamy, Barabási, and Yu (2019). Instead, Chen, Niu, Lan, and Liu (2019) and Ioannidis, Marques, and Giannakis (2019) exploit different relationships between data to build GCNNs. The common motivation in all these works is that a model based on a single graph (often capturing similarities between nodes (Mateos, Segarra, Marques, & Ribeiro, 2019)) or a single shift operator is insufficient to represent the underlying relationships. Therefore, we argue that a joint model capturing different interactions helps to better represent the data. A model based only on NNs fails to give importance to items that differ from the main trend.
FNs account for this information and aid diversity. However, the information from FNs should be accounted for properly during training to keep the accuracy at the desired level. We detail this aspect in the upcoming two sections.
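The combination step and the subsequent list construction can be sketched as follows (our own illustration; it assumes the convex-combination form of the joint estimate, with α as the trade-off knob):

```python
import numpy as np

def joint_estimate(X_s, X_d, alpha=0.1):
    """Blend NN (accuracy) and FN (diversity) module outputs.

    Assumes the convex combination X_hat = (1 - alpha) X_s + alpha X_d;
    alpha in ]0, 1[: values closer to 0 favor accuracy, values closer
    to 1 inject more FN (diversity) information.
    """
    assert 0.0 < alpha < 1.0
    return (1.0 - alpha) * np.asarray(X_s) + alpha * np.asarray(X_d)

def top_k_list(x_hat_user, k=5, exclude=()):
    """Recommendation list: the k items with highest predicted rating."""
    order = np.argsort(x_hat_user)[::-1]
    return [i for i in order if i not in set(exclude)][:k]
```

Passing the already-consumed items through `exclude` keeps the list restricted to unseen items.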

Learning for rating
In this section, we estimate the joint model parameters w.r.t. the mean squared error (MSE) criterion. Analyzing the MSE also quantifies the trade-off for all items in the dataset (unbiasing the results from the user preferences in the list). 3 The MSE also provides insights into the role played by the FNs. To this end, consider a training set of user-item pairs T = {(u, i)} for the available ratings in X. Consider also the user-similarity graph G_s,u, the item-dissimilarity graph G_d,i, and their respective graph convolutional modules Φ_s(x_i; B_i; H_s) and Φ_d(x_u; C_u; H_d). We estimate parameters H_s and H_d by solving the regularized problem

min_{H_s, H_d} MSE_{(u,i)∈T}(X̂_T; X_T) + μ( ‖H_s‖²₂/(1 − α) + ‖H_d‖²₂/α ), (12)

where MSE_{(u,i)∈T}(⋅; ⋅) measures the fitting error w.r.t. the available ratings X_T, while the second term acts as an accuracy-diversity regularizer. 4 Scalar α controls the information flow from the NN and the FN graphs.
For α → 0, the term ‖H_d‖²₂/α dominates, and problem (12) forces parameters H_d to a smaller norm rather than using them to fit the data. Hence, this setting mainly leverages information from the similarity module Φ_s(⋅), ultimately reducing the ensemble to an accuracy-oriented NN graph convolutional model. For α → 1, instead, the regularizer is dominated by ‖H_s‖²₂/(1 − α), which implies the information from the similarity graph plays little role in fitting since parameters H_s are forced towards zero. Hence, problem (12) mainly exploits information from the FNs to reduce the MSE. Intermediate values of α closer to zero than to one lead to models where most information is leveraged from the NNs to keep the MSE low, while some information is taken from the FNs to add diversity. We refer to α as the trade-off parameter. Scalar μ balances the fitting error with the overall regularization and allows generalizing the model to unseen data.

Graph convolutional filter
Recall the graph convolutional filter in (4) and consider that graphs G_s,u and G_d,i can have different numbers of nodes. To account for this technicality in the design phase, we first transform the filters into a more manageable form. The filter output on the user-similarity graph B_i can be written as

x̂_i = ∑_{k=0}^{K} h_{s,k} B_i^k x_i = [x_i, B_i x_i, …, B_i^K x_i] h_s, (13)

i.e., as a linear function of the filter parameters h_s. Collecting the entries [x̂_i]_u of all training tuples (u, i) ∈ T into a matrix M_s ∈ R^{|T|×(K+1)}, the τth row of M_s corresponds to the τth (u, i) tuple. Denoting by x_T = vec(X_T) the |T| × 1 vector of available ratings, we can write the filter output for all training samples as x̂_s,T = M_s h_s. Likewise, we can write the filter output over the item-dissimilarity graph as

x̂_d,T = M_d h_d, (14)

with matrix M_d built analogously from C_u and x_u. Particularizing (12) to this setting yields

min_{h_s, h_d} ‖x_T − M_s h_s − M_d h_d‖²₂ + μ( ‖h_s‖²₂/(1 − α) + ‖h_d‖²₂/α ), (15)

which is a regularized least-squares problem in the filter coefficients h_s and h_d. The closed-form solution for (15) can be found by setting the gradient to zero, i.e.,

M_s^⊤(M_s h_s + M_d h_d − x_T) + (μ/(1 − α)) h_s = 0 and M_d^⊤(M_s h_s + M_d h_d − x_T) + (μ/α) h_d = 0, (16)

or equivalently solving the linear system of equations

[ M_s^⊤M_s + (μ/(1 − α)) I , M_s^⊤M_d ; M_d^⊤M_s , M_d^⊤M_d + (μ/α) I ] [ h_s ; h_d ] = [ M_s^⊤ x_T ; M_d^⊤ x_T ]. (17)

If the matrix inversion in (17) is ill-conditioned, we can always solve (15) with off-the-shelf iterative methods. The above procedure leads to an optimal balance between the information coming from the NNs and the FNs.
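As a sanity check of this design, the following sketch (our own Python, assuming the rational regularizer μ(‖h_s‖²/(1−α) + ‖h_d‖²/α)) solves the regularized normal equations by stacking M = [M_s, M_d]:

```python
import numpy as np

def fit_joint_filters(M_s, M_d, x_T, mu=0.1, alpha=0.1):
    """Closed-form joint filter design.

    Minimizes ||x_T - M_s h_s - M_d h_d||^2
              + mu * (||h_s||^2 / (1 - alpha) + ||h_d||^2 / alpha)
    by solving the regularized normal equations for [h_s; h_d].
    """
    M = np.hstack([M_s, M_d])
    Ks, Kd = M_s.shape[1], M_d.shape[1]
    # Block-diagonal regularizer: stronger shrinkage on h_d for small alpha.
    reg = np.diag(np.r_[np.full(Ks, mu / (1 - alpha)),
                        np.full(Kd, mu / alpha)])
    h = np.linalg.solve(M.T @ M + reg, M.T @ x_T)
    return h[:Ks], h[Ks:]
```

For ill-conditioned systems, `np.linalg.lstsq` or an iterative solver such as conjugate gradient can replace the direct solve.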

Graph convolutional neural network
We now consider that models Φ_s(x_i; B_i; H_s) and Φ_d(x_u; C_u; H_d) are GCNNs running over graphs B_i and C_u, respectively. Particularizing (12) to this setting implies solving

min_{H_s, H_d} MSE_{(u,i)∈T}( [Φ_s(x_i; B_i; H_s)]_u + [Φ_d(x_u; C_u; H_d)]_i ; X_T ) + μ( ‖H_s‖²₂/(1 − α) + ‖H_d‖²₂/α ), (18)

where [Φ_s(x_i; B_i; H_s)]_u is the user-similarity GCNN output for user u and [Φ_d(x_u; C_u; H_d)]_i is the item-dissimilarity GCNN output for item i. Problem (18) preserves the trade-offs of the general version (12), but it is non-convex and little can be said about its global optimality. However, because of the compositional form of the GCNN, we can estimate parameters H_s and H_d via standard backpropagation, since the graph convolutional filters are linear operators in the respective parameters (Goodfellow, Bengio, & Courville, 2016). The following remark is in order.

Remark 1.
In (18), we considered the accuracy-diversity parameter α only in the regularizer and not also in the fitting term as in (11). We found that including α in the MSE term leads to a solution that is more conservative towards diversity, and we have consistently seen that keeping α only in the regularizer allows for a better trade-off. Furthermore, the regularizer in (18) need not be rational in α; it can take any form as long as it balances the NNs with the FNs. An alternative is the linear form α‖H_s‖₂² + (1 − α)‖H_d‖₂².

Learning for ranking
This section designs the joint model for ranking. We consider Bayesian personalized ranking (BPR), a state-of-the-art learning-to-rank framework (Rendle, Freudenthaler, Gantner, & Schmidt-Thieme, 2009). BPR considers the rating difference a user u has given to two items i and j. Let i ≻_u j indicate user u rated item i higher than item j and augment the training set T ⊆ U × I × I to contain triples of the form T = {(u, i, j) | i ≻_u j}. For each available tuple (u, i) we created four triples {(u, i, j)}_j such that X_ui > X_uj, following Rendle et al. (2009). The estimated ratings for tuples (u, i) and (u, j) define the utility function, which expresses the rating difference as a parametric relationship between user u, item i, and item j. The utility function is used to estimate the parameters H_s, H_d by maximizing the likelihood (21), where σ(x) = (1 + e^(−x))^(−1) is the logistic sigmoid function (Rendle et al., 2009). By applying the natural logarithm (monotonically increasing) to (21) and regularizing it, we can estimate the joint convolutional model parameters by solving the regularized optimization problem (22). Differently from (5), the regularizer in (22) is linear in α; we opted for this choice because the linear form was more robust to the choice of μ. Nevertheless, the regularizer in (22) follows the same trend as that in (5): for α → 0, the NNs are mainly used for fitting since α‖H_s‖₂² → 0; vice versa, for α → 1 the FNs are mainly used for fitting since (1 − α)‖H_d‖₂² → 0.
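The triple construction and the BPR log-likelihood can be sketched as follows. The function names and sampling details are our assumptions; the paper specifies only that four triples are created per observed tuple:

```python
import numpy as np

def bpr_triples(X, per_pair=4, rng=None):
    """Build training triples (u, i, j) such that X[u, i] > X[u, j] > 0.

    For each observed (u, i), sample up to `per_pair` items j that the
    same user rated strictly lower, following the i >_u j convention."""
    rng = rng or np.random.default_rng(0)
    triples = []
    for u, i in zip(*np.nonzero(X)):
        lower = np.nonzero((X[u] > 0) & (X[u] < X[u, i]))[0]
        if lower.size:
            for j in rng.choice(lower, size=min(per_pair, lower.size),
                                replace=False):
                triples.append((int(u), int(i), int(j)))
    return triples

def bpr_log_likelihood(utilities):
    """Sum of log sigma(X_uij) over the triples' utility values."""
    z = np.asarray(utilities, dtype=float)
    return float(np.sum(np.log(1.0 / (1.0 + np.exp(-z)))))
```

Maximizing this log-likelihood (plus the α-weighted regularizer) over the convolutional parameters yields problem (22).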

Graph convolutional filter
Particularizing the convolutional models to filters [cf. (4)], (19) becomes (24). The function −ln σ(X_uij(h_s, h_d)) is convex since it involves the log-sum-exp of an affine function (Boyd & Vandenberghe, 2004). Consequently, problem (24) is convex in h_s and h_d. Convexity guarantees we can find a minimizer for (24), but not a closed-form solution; in fact, finding an analytical solution for logistic fitting problems is notoriously difficult except in particular instances (Lipovetsky, 2015). However, we can obtain the optimal parameters for (24) through stochastic gradient descent updates with stepsize γ. These optimal parameters guarantee the best learning-to-rank solution for any balance between the NNs and FNs (α) and between fitting and generalization (μ).
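A minimal sketch of one stochastic gradient step for a convex problem of the form (24). We assume the utility is linear in the taps, X_uij = a_s·h_s + a_d·h_d, with hypothetical feature vectors a_s, a_d summarizing the sampled triple, and a linear-in-α regularizer; both abstractions and the function name are ours:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bpr_sgd_step(h_s, h_d, a_s, a_d, alpha, mu, gamma):
    """One SGD step on a single triple for the convex BPR filter design.

    Objective on the triple: -ln sigma(a_s @ h_s + a_d @ h_d)
                             + mu * (alpha ||h_s||^2 + (1 - alpha) ||h_d||^2).
    """
    z = a_s @ h_s + a_d @ h_d
    g = sigmoid(z) - 1.0                      # d/dz of -ln sigma(z)
    h_s = h_s - gamma * (g * a_s + 2 * mu * alpha * h_s)
    h_d = h_d - gamma * (g * a_d + 2 * mu * (1 - alpha) * h_d)
    return h_s, h_d
```

Because the objective is convex in (h_s, h_d), iterating such steps with a diminishing stepsize γ converges to the global minimizer.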

Graph convolutional neural network
When Φ_s(x_i; B_i; H_s) and Φ_d(x_u; C_u; H_d) are GCNNs, the BPR optimization problem is that in (22). Because of the nonlinearity, it is difficult to establish whether a global minimum exists, so we settle for a satisfactory local minimum. Since cost (22) is differentiable w.r.t. H_s and H_d, we can reach this local minimum through conventional backpropagation.
Whether estimated for rating or for ranking, the coefficients of the joint model dictate the filter behavior (either directly or within the GCNN layers) on the NN and FN graphs. Besides analyzing the filter behavior in the node domain (as multi-hop rating aggregation) and in the respective cost functions (as an accuracy-diversity trade-off), we can also gain insight into the trade-off by analyzing the graph convolutional modules in the graph spectral domain (Ortega et al., 2018). We discuss this aspect next.

Spectral explanation
We conduct here a spectral analysis of graph convolutions to show they act as band-stop filters on both the NN and FN graphs. First, we recall the concept of Fourier transform for signals on directed graphs (Sandryhaila & Moura, 2014). Assuming the shift operator S is diagonalizable, we can write S = UΛU⁻¹ with eigenvector matrix U = [u_1, …, u_N] and complex eigenvalues Λ = diag(λ_1, …, λ_N). The graph Fourier transform (GFT) of a signal x is x̂ = U⁻¹x. The ith GFT coefficient x̂_i quantifies the contribution of the ith eigenvector u_i to expressing the variability of x over the graph. The GFT is analogous to the discrete Fourier transform for temporal or spatial signals; in this analogy, the complex eigenvalues λ_i ∈ Λ are referred to as the graph frequencies (Sandryhaila & Moura, 2013; Shuman et al., 2013). The inverse transform is x = Ux̂.
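The GFT and the frequency ordering can be sketched as follows, assuming a diagonalizable shift operator; the function names are ours:

```python
import numpy as np

def gft(S, x):
    """Graph Fourier transform for a (possibly directed) diagonalizable
    shift operator S: returns x_hat = U^{-1} x, the eigenvalues, and U."""
    lam, U = np.linalg.eig(S)
    return np.linalg.inv(U) @ x, lam, U
    # inverse transform: x = U @ x_hat

def frequency_order(lam):
    """Order eigenvalue indices by distance from the largest-magnitude
    eigenvalue: from low to high graph frequency, following the
    total-variation-based ordering of Sandryhaila and Moura."""
    lam_max = lam[np.argmax(np.abs(lam))]
    return np.argsort(np.abs(lam - lam_max))
```

Eigenvalues close to λ_max come first (smooth eigenvectors, low TV); eigenvalues far from λ_max come last (fast-varying eigenvectors, high TV).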
To measure graph signal variability, we follow Sandryhaila and Moura (2014) and order the graph frequencies λ_i based on their distance from the maximum eigenvalue λ_max(S). This ordering is based on the notion of total variation (TV), which for the eigenpair (λ_i, u_i) is defined as TV(u_i) = ‖u_i − (1/|λ_max(S)|) S u_i‖_1 = |1 − λ_i/|λ_max(S)|| ‖u_i‖_1, where ‖·‖_1 is the ℓ_1-norm. The closer λ_i is to the maximum eigenvalue λ_max(S), the smoother the corresponding eigenvector u_i over the graph (i.e., values on neighboring nodes are similar). If signal x changes little over the graph (e.g., similar users have similar ratings), the GFT x̂ has nonzero values mostly in entries x̂_i whose index i corresponds to a low graph frequency λ_i → λ_max(S) (low TV). Contrarily, if signal x varies substantially between connected nodes, the GFT x̂ has nonzero values also in entries x̂_i whose index i corresponds to a high graph frequency λ_i far from λ_max(S) (high TV); refer to Ortega et al. (2018), Sandryhaila and Moura (2014) for further detail. With this analogy in place, we substitute the eigendecomposition S = UΛU⁻¹ into the graph convolutional filter (4) and obtain the filter input-output relationship in the spectral domain (28), in which the GFT of the output equals the transfer function H(Λ) applied pointwise to the GFT of the input x̂ = U⁻¹x, with H(Λ) = Σ_{k=0}^{K} h_k Λ^k containing the filter frequency response on its main diagonal. Relation (28) shows, first, that graph convolutional filters respect the convolution theorem: they act as a pointwise multiplication between the filter transfer function H(Λ) and the input GFT x̂. Therefore, analyzing H(Λ) reveals how the filter processes the input ratings to produce the estimates. We evaluate the frequency responses of the filter and the respective GCNN when deployed on the similarity NN and dissimilarity FN graphs for the MovieLens100k dataset, which allows for a direct comparison with the vanilla NN and the graph convolutional NN filter (Huang et al., 2018).
Substituting the eigendecompositions B_i = U_s,i Λ_s,i U_s,i⁻¹ and C_u = U_d,u Λ_d,u U_d,u⁻¹, we can write the outputs in the graph frequency domain respectively as in (29).

Fig. 3. Frequency responses of a user-NN filter H(B_i) [cf. (13)] and an item-FN filter H(C_u) [cf. (14)].
In (29), H(Λ_s,i) and H(Λ_d,u) are the frequency responses of filters H(B_i) and H(C_u), respectively. To estimate the responses, we first obtain the parameters from (15) for rating or (22) for ranking and order the eigenvalues λ_n,i (resp. λ_n,u) of each B_i (resp. C_u) according to the total variation in (27). Subsequently, for each B_i (resp. C_u) we record the frequency response H(Λ_s,i) (resp. H(Λ_d,u)) and average over all items I (resp. users U) to obtain a single frequency response over the user-NN graph (resp. item-FN graph). The frequency responses are shown in Fig. 3 for different values of α.
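A sketch of how such per-entity responses could be estimated and averaged. Mapping each graph's spectrum onto a common normalized grid before averaging is our simplification of the TV-based ordering described above; the function names are ours:

```python
import numpy as np

def frequency_response(lam, h):
    """Pointwise filter response H(lambda) = sum_k h[k] * lambda^k."""
    return sum(hk * lam ** k for k, hk in enumerate(h))

def average_response(shift_operators, h, n_bins=20):
    """Average |H(lambda)| over a collection of graphs (one per entity)
    on a common normalized-frequency grid in [0, 1]."""
    grid = np.linspace(0.0, 1.0, n_bins)
    acc = np.zeros(n_bins)
    for S in shift_operators:
        lam = np.sort(np.linalg.eigvals(S).real)
        resp = np.abs(frequency_response(lam, h))
        # map this graph's spectrum onto [0, 1] before averaging
        pos = (lam - lam.min()) / max(lam.max() - lam.min(), 1e-12)
        acc += np.interp(grid, pos, resp)
    return acc / len(shift_operators)
```

A flat region of near-zero averaged response in the middle of the grid is the band-stop behavior discussed next.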
In all cases, we observe a band-stop behavior since more than 90% of the response in the middle frequencies is zero. The latter corroborates the behavior of the vanilla and graph convolutional NN filter (Huang et al., 2018). Another behavior inherited from the NN/FN graphs is that filters preserve the extreme low and high graph frequencies. Low graph frequencies are signals with a small total variation [cf. (27)], while high graph frequencies are signals with a high total variation.
- In the user-NN graph, low frequencies represent signals where similar users give similar ratings. This part captures the global trend of preferences among similar users, which is leveraged to predict ratings. High frequencies represent discordant ratings between similar users for a particular item and can be seen as a primitive source of diversity.
- In the item-FN graph, the spectral behavior is the same but the implications differ. Low frequencies represent ratings with a small difference between dissimilar neighboring items, implying a user u gave similar ratings to dissimilar items. These low frequencies may also arise when users rate negatively one subset of dissimilar connected items and positively another subset. The high-frequency components represent ratings changing significantly between neighboring dissimilar items; e.g., one of two dissimilar items sharing an edge is rated positively while the other negatively. This part contributes to keeping the recommendation accuracy high while relying on negative correlations between items.
These insights show the joint linear models eliminate irrelevant features (band-stop behavior), smooth out ratings (low frequencies), and preserve discriminative features to aid diversity (high frequencies). This phenomenon is observed for different values of α (importance of NNs vs. FNs) and design criteria (MSE [cf. (15)] vs. BPR [cf. (24)]). The frequency response changes less with α in the MSE design (lines differ by 10⁻³) than in BPR. This might be because the MSE focuses on the average rating prediction for all items (preferred or not), while the BPR prioritizes a subset of most preferred items. In BPR, we also observed a stronger band-stop behavior for α → 1, meaning the joint model focuses even more on the extreme frequencies to predict ratings. This suggests the model relies on the average trend on both graphs (lower frequencies) and on highly dissimilar values in adjacent entities (higher frequencies).

Graph convolutional neural networks
We now analyze the frequency responses of the filters in the GCNN (8). Fig. 4 illustrates them for a one-layer GCNN with F = 2 filters over each graph. We observe again the strong band-stop behavior. In the NN graph, the stopped band is narrower than in the FN graph, and it is narrower when the GCNN is learned for ranking than for rating. The band-stop behavior and the increased focus on the extremely low and high graph frequencies suggest the GCNN leverages the information in a similar way as its linear counterpart; we refer to the previous section to avoid repetition. Lastly, we remark that the band-stop behavior is also observed in the vanilla NN filter.

Numerical experiments
This section corroborates the proposed schemes through experiments on four real datasets of different sparsity: MovieLens100k and MovieLens1M (Harper & Konstan, 2015), Douban (Ma, Zhou, Liu, Lyu, & King, 2011), and Flixster (Jamali & Ester, 2010). Table 2 summarizes their features. We evaluate the trade-offs of the joint models for all combinations in Table 1. We considered both the linear [cf. (4)] and the nonlinear graph convolutional models [cf. (8)] designed for rating [cf. (12)] and ranking [cf. (22)], leading to 16 combinations. We considered the same data pre-processing and train-test split as in Monti et al. (2017). The code to reproduce our results and apply the model to other data is available as open-source software.
We quantified accuracy through the root MSE (RMSE) -the lower the better- and the normalized discounted cumulative gain @k (NDCG@k) -the higher the better- and diversity through the aggregated diversity @k (AD@k) and the individual diversity @k (ID@k) -both the higher the better (Herlocker, Konstan, Terveen, & Riedl, 2004; Zhang & Hurley, 2008; Ziegler, McNee, Konstan, & Lausen, 2005). The RMSE measures the fitting error over all ratings, while the NDCG@k accounts also for the item relevance in a list of length k. The AD@k is a global, at-the-dataset metric and measures the fraction of all items included in all recommendation lists of length k. The ID@k is a local, at-the-user metric and measures the average diversity within each recommendation list. A high ID@k does not imply a high AD@k and vice versa (Adomavicius & Kwon, 2011; Wang & Yin, 2013). Appendix A provides further detail.
We considered a GCNN architecture composed of a single hidden layer with two parallel filters. We trained the GCNN using the ADAM optimizer with the default parameters (Kingma & Ba, 2014) and searched over different learning rates γ and fitting-regularizer parameters μ. To limit the hyperparameter search, we proceeded with the following rationale.
First, we performed an extensive parameter analysis on the MovieLens100k dataset, since this dataset is common to the two most similar graph convolutional works (Huang et al., 2018; Monti et al., 2017) and to the accuracy-diversity trade-off works (Adomavicius & Kwon, 2009; Zeng et al., 2010). We then used the best-performing setting from this dataset and corroborated the trade-offs on the remaining three. Second, we chose the hyperparameters of the similarity graph (number of nearest neighbors, filter order, length of the recommendation list) from the linear graph convolutional filter optimized for rating [cf. (15)] (Huang et al., 2018). Besides being a faster design method for exploring different parameters, this strategy also allowed evaluating the accuracy-diversity trade-off of the graph convolutional NN filter. Finally, we kept these parameters fixed for the NN graph and evaluated different combinations on the FN graph.

Accuracy-diversity trade-off for rating
We first study the trade-off when the joint models are trained for rating [cf. Section 5]. For the NN module, we used the parameters derived in Appendix B. For the FN module, we fixed the number of neighbors to 40, a common value, evaluated different filter orders K ∈ {1, 2, 3}, and report the best results. Fig. 5 shows the results for all combinations in Table 1 as a function of α ∈ [0.1, 0.9]. As we increase the influence of the FNs (α → 1), the RMSE increases. The linear filters are more robust to α than the GCNN, which we attribute to the convexity of their design problem. Increasing α increases diversity, although the AD and ID exhibit opposite behaviors. Values of α up to 0.5 offer a good trade-off, as the RMSE remains essentially unaffected while diversity increases substantially.
To further quantify the trade-off, we allow the RMSE to deteriorate by at most 3% w.r.t. the NN setup [cf. Appendix B] and pick a value of α that respects this constraint. Table 3 compares the different models. For a user NN graph, the joint models (i.e., UU and UI) substantially boost one diversity metric. We believe this is because models built only on user-NN graphs are conservative w.r.t. both diversity metrics [cf. Fig. B.11]; therefore, the margin for improvement is larger. Contrarily, for an item NN graph, the joint models (i.e., IU and II) are conservative and improve both diversity metrics by little. We also highlight the case of the II-GCNN, which improves the RMSE and AD while keeping the same ID.
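The α-selection rule used here (maximize diversity subject to a bounded accuracy loss) can be sketched as follows; the function name and interface are ours:

```python
import numpy as np

def pick_alpha(alphas, accuracy, diversity, baseline,
               budget=0.03, lower_is_better=True):
    """Pick the alpha maximizing diversity while keeping the accuracy loss
    w.r.t. a baseline within `budget` (e.g. a 3% RMSE deterioration)."""
    alphas = np.asarray(alphas, dtype=float)
    accuracy = np.asarray(accuracy, dtype=float)
    diversity = np.asarray(diversity, dtype=float)
    if lower_is_better:                 # RMSE-style accuracy metric
        ok = accuracy <= baseline * (1 + budget)
    else:                               # NDCG-style accuracy metric
        ok = accuracy >= baseline * (1 - budget)
    if not ok.any():
        return None                     # no alpha respects the budget
    best = np.argmax(np.where(ok, diversity, -np.inf))
    return float(alphas[best])
```

The same rule applies later to the ranking setting by passing the NDCG with `lower_is_better=False`.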

Accuracy-diversity trade-off for ranking
With the same setting as the last section, we now evaluate the trade-off when the joint models are optimized for ranking [cf. Section 6]. These results are shown in Fig. 6. A higher importance to the FNs (α → 1) reduces the NDCG@20 but improves diversity. Both the filter and the GCNN are less sensitive to α when designed for ranking. While for the filter we may still attribute this robustness to the optimality of the design problem, the results for the GCNN suggest the BPR better leverages the information from the FNs. Note also that the filter on the UI combination pays little in NDCG but gains substantially in AD and ID. (We have also evaluated the models with additional metrics: the mean absolute error (MAE), a surrogate of the RMSE for rating; precision and recall @k, which are ranking-oriented accuracy metrics; and the entropy diversity, which measures the models' ability to recommend items in the long tail. These metrics respect the accuracy-diversity trade-off we report and are omitted for conciseness.)
To further quantify these results, Table 4 shows the diversity gain when reducing the NDCG by at most 3%. We note that it is often sufficient to deteriorate the NDCG by 1% to gain substantially in diversity. Bigger diversity improvements are achieved when one of the two graphs is item-based. Lastly, we notice the joint GCNN models gain less in diversity compared with the linear filters; the GCNN could be further improved by tuning its hyperparameters.

Fig. 5. RMSE, AD@20 and ID@20 as a function of the accuracy-diversity parameter α for models optimized for rating. As more information from the dissimilar connections is included, the RMSE deteriorates but diversity improves. The RMSE of the GCNN is more sensitive to α as its hyperparameters are not tuned.

Comparisons with accuracy-oriented models
In this section, we analyze how the trade-offs of the joint models compare with those of five accuracy-oriented alternatives: the state-of-the-art user NN filter [cf. (5)], the item NN filter [cf. (6)], and the multi-graph convolutional neural network (MGCNN) (Monti et al., 2017), as well as the conventional methods of low-rank matrix completion (LR-MC) (Mazumder, Hastie, & Tibshirani, 2010) and matrix factorization optimized w.r.t. BPR (MF-BPR) (Rendle et al., 2009). All but the last are designed for rating. We first compare the models on the MovieLens100k dataset and then on Douban and Flixster. We consider only the UI combination.

Fig. 6. NDCG@20, AD@20 and ID@20 as a function of the accuracy-diversity parameter α for ranking-optimized models. As more information from the FNs is included, the NDCG@20 deteriorates but diversity improves. The NDCG is less sensitive to α compared with the RMSE for both the joint graph convolutional filter and the GCNN.

Fig. 7 contrasts the RMSE and NDCG@20 with the diversity metrics AD@20 (left) and ID@20 (right) for α ∈ [0.1, 0.9]. The accuracy of the GCNN is more sensitive to α than that of the other models. The GCNN also gives more importance to diversity within the list (ID) than to covering the catalog (AD), indicating it recommends few items that are, however, different from each other. Contrarily, the joint linear filters are more robust to accuracy losses and gain in AD but pay in ID. Contrasting the proposed approaches with the other alternatives, we observe:

- Rating-optimized models (MGCNN, user NN filter, item NN filter, and LR-MC) achieve a lower RMSE but face problems in AD. The item NN filter achieves a reasonable AD but its ID is very low. The MGCNN over-fits the RMSE by recommending a few popular items to all users, as shown by its low AD and high ID. The joint linear filter can substantially improve the AD by paying little in RMSE, while the GCNN requires additional tuning. The improved AD often comes at the expense of ID, yet values of α ≈ 0.3 offer a good balance between the two. We can further improve the ID with the IU combination [cf. Fig. 5].

Table 4
NDCG@20, AD@20 and ID@20 for the models working on the NN graph (which generalize the vanilla NN collaborative filter) and for the joint models optimized for ranking. In brackets, the standard deviation of NDCG@20 and ID@20.
- The ranking-optimized method (MF-BPR) achieves a high NDCG, but still lower than the rating-optimized user NN filter. This high accuracy is again linked to filling the list with a small group of different items. The joint models optimized for ranking overcome this limitation by making the list slightly more homogeneous (lowering ID) but increasing the catalog coverage (improving AD). This strategy keeps the NDCG high.
Overall, we conclude that a high accuracy from the NNs is tied to an increased list diversity (ID) but also to a scarce catalog coverage (AD). The proposed joint models can keep a reasonable accuracy while contributing to a higher diversity.

Fig. 7. RMSE and NDCG@20 versus AD@20 and ID@20 as a function of parameter α. Results are shown for the joint models on the UI graph combination and for five baselines on MovieLens100k. Increasing α for the joint model degrades the accuracy, with the GCNN method being more sensitive. Contrarily, diversity increases and we see an opposite behavior between the AD@20 and ID@20.

Fig. 8. Accuracy vs. diversity comparisons of the proposed approach for different values of α and five baselines on MovieLens1M. We see again that increasing α increases both AD@20 and ID@20 while the NDCG@20 reduces. The GCNN is again more sensitive than the linear counterpart.
MovieLens1M. Fig. 8 shows the performance on the MovieLens1M dataset. We illustrate only the NDCG, but a similar trend holds for the RMSE. Overall, we see the same trend and trade-off as above. While the linear counterpart scales and preserves the trade-off without needing parameter tuning, this is not the case for the GCNN, especially when targeting a trade-off w.r.t. the individual diversity.
Douban & Flixster. We now compare the different models on two datasets containing fewer interactions than MovieLens100k; see Table 2. The sparsity of these datasets brings additional challenges when evaluating the NDCG@k: for a list of length 20 there is only one test user for Flixster and none for Douban. To obtain statistically meaningful results, we measured the NDCG for a list of length k = 5, which leaves 1,373 test users for Douban and 126 for Flixster. However, to have a unified diversity comparison with the MovieLens100k dataset, we computed the diversity for a list of length 20.
Table 5 shows the performance for the Douban and Flixster datasets. For our models, we report the extreme values α = 0.1 and α = 0.9 and a hand-picked value of α. As α increases, the joint models lose in accuracy but gain in diversity. We see again the sensitivity of the GCNNs to α, for which the RMSE may even reach unacceptable values. The joint models optimized for ranking always provide a better NDCG than MF-BPR while offering a higher diversity. In general, the best trade-off among the joint models is achieved by the GCNN designed for rating and by the filters designed for ranking.

Comparison with accuracy-diversity algorithms
In this final section, we compare the joint UI linear model with two alternatives that propose a similar accuracy-diversity trade-off (Zeng et al., 2010; Zhou et al., 2010). The hybrid approach in Zhou et al. (2010) merges a heat diffusion with a random walk to balance accuracy with diversity over item-item graphs; Zeng et al. (2010) follow a similar hybrid strategy. Both approaches control the influence of each model through a scalar α ∈ [0, 1], similarly to our method. Both works predict the probability of an item being consumed by a user rather than the rating; therefore, we compare the accuracy w.r.t. the NDCG. As a baseline, we also consider the vanilla user FN and item FN collaborative filters.
In Fig. 9, we show the trade-offs of the different methods for all three datasets. The proposed joint model consistently achieves the highest NDCG while offering a margin to improve diversity. This behavior is best highlighted on the MovieLens100k dataset, for which the method's hyperparameters were chosen. We attribute this to the fact that the joint model learns its parameters to improve ranking accuracy rather than being a simple fusion of two separate entities. The hybrid strategy from Zhou et al. (2010) focuses entirely on catalog coverage, as seen by its high AD; this heavily affects both the NDCG and the ID, for which this approach performs the worst. The hybrid strategy from Zeng et al. (2010) offers a trade-off in both diversity metrics, but the role of the two graphs depends largely on the dataset sparsity. In Flixster, for instance, this strategy offers little trade-off, as the performance is nearly the same for all values of α. To some extent, this trend is also present in our joint model, yet it has more control over diversity while retaining the highest NDCG.

Discussion
The accuracy-diversity trade-off represents a crucial factor for improving user satisfaction when personalizing recommendations. However, achieving the 'right' trade-off is challenging, not only because of its subjective aspects, which are difficult to quantify, but also because of the complex and irregular user-item relationships that influence both accuracy and diversity. This paper focused on the latter and investigated the potential of graphs, which have a proven history as core mathematical tools for representing such data. More specifically, it focused on graph convolutions as the means to deal with the data complexity and irregularity and to achieve an effective accuracy-diversity trade-off for recommender systems. The overall conclusion of this paper is that graph convolutions have a large potential to learn an accuracy-diversity trade-off from the ratings in the user-item matrix without relying on side information. Results on the considered datasets showed graph convolutions attained the highest accuracy while improving diversity compared with other alternatives operating in a similar setting. The proposed approach relies on the joint information from the nearest and the furthest neighbors in both a learning-to-rate and a learning-to-rank setting. We analyzed how this information is leveraged during parameter design and from a graph spectral domain perspective. We formulated a learning problem that has accuracy as its main focus but accounts for the trade-off through a regularizer. When the graph convolutional model is composed only of linear filters, we proved the learning problem is convex and provided solutions for it. Convexity rendered the linear model more robust to hyperparameter choices, while the nonlinear model required careful tuning. In the graph spectral domain, we showed graph convolutions operate as band-stop filters on both the nearest neighbor and the furthest neighbor graphs.
This analysis showed the joint model exploits both the general agreement about preferring an item or not and the complete disagreements between connected nearest and furthest neighbors.
We developed an accuracy-to-coverage trade-off, in which accuracy is traded to recommend niche items, and an accuracy-to-individual-diversity trade-off, in which accuracy is traded to improve the diversity within the list. The joint convolutional model offers a balance in each setting that is difficult to achieve with a single model. Comparisons with nearest neighbor accuracy-oriented approaches -including state-of-the-art graph convolutional RecSys methods but also vanilla and graph convolutional nearest neighbor collaborative filtering- showed a diversity improvement of up to seven times while paying about 1% in accuracy. Comparisons with the vanilla furthest neighbor collaborative filtering showed a consistently higher accuracy because of the information from the nearest neighbors. The trend in these findings is in line with that in Said et al. (2012), Zeng et al. (2010). Overall, we have seen graph convolutions can trade accuracy to improve substantially one diversity criterion or improve both by a lesser amount.
The current manuscript also has open aspects. One main question we left unaddressed is why the nonlinear GCNN models do not outperform their linear counterparts. Although we have seen a GCNN case that improves accuracy and both diversity metrics, most of the results suggested that the sparser the dataset, the less complex the joint model should be. This is not entirely surprising, as also shown in Chen et al. (2020), Huang et al. (2018). Second, we considered the information coming from the furthest neighbors to improve diversity. However, this is only one choice and may require careful learning to keep accuracy at satisfactory levels. Alternatively, we may consider safer choices for the dissimilarity graph, which in turn require revisiting the learning problem to still obtain diversity gains. Third, an extensive analysis is needed to contrast the proposed approach with re-ranking-based solutions. It would be of interest to identify to what extent the dissimilarity graph reduces the need for re-ranking and which re-ranking method yields a better trade-off when applied to the lists retrieved by the joint convolutional approaches. Lastly, more research is needed toward explainability. The graph spectral analysis helps in this regard, but research is still needed to identify the link between the different spectral components and the items included in the recommendation lists.

Fig. 9. Accuracy-diversity trade-off of the joint UI filter optimized for ranking [cf. (24)] and of the hybrid approaches from Zeng et al. (2010) and Zhou et al. (2010). Results are shown for different values of the single parameter α controlling the trade-off in all methods. We also show the vanilla user-FN and item-FN for reference. For all methods, changing α degrades accuracy while improving diversity. The proposed approach pays the least in terms of NDCG while offering an improvement in diversity.
Nonetheless, to the best of our knowledge this is the first work showcasing the potential of graph convolutions to establish an accuracy-diversity trade-off for recommender systems.

Appendix A. Metrics
Denote the test set by T s and the length of recommendation list by k.

RMSE.
For X_ui the true rating and X̂_ui the estimated rating for tuple (u, i) ∈ T_s, the RMSE is defined as

RMSE = sqrt( (1/|T_s|) Σ_{(u,i)∈T_s} (X_ui − X̂_ui)² ).

A lower RMSE indicates a better fit; hence, a better performance.

NDCG@k. Denote by I_uk = {i_u1, …, i_uk} the set of k items predicted with the highest ratings for user u, i.e., X̂_{u,i_u1} ≥ X̂_{u,i_u2} ≥ … ≥ X̂_{u,i_uk}. We first define the discounted cumulative gain (DCG), for which we consider the true ratings X_ui := rel_i (also called the relevance of item i) for items i ∈ I_uk ordered w.r.t. the predicted order in I_uk, i.e., rel_1 ≥ rel_2 ≥ … ≥ rel_k. The DCG for user u and a list of length k is defined as

DCG_u@k = Σ_{i=1}^{k} rel_i / log₂(i + 1).

The DCG_u@k accounts for the ordering of the true values in the predicted list I_uk. This ordering can at most be the ideal one, i.e., when the algorithm orders the items in the predicted list I_uk following the true order.
In this instance, we refer to it as the ideal DCG for user u (iDCG_u@k). The NDCG@k for a list of length k over all users U is then defined as

NDCG@k = (1/|U|) Σ_{u∈U} DCG_u@k / iDCG_u@k.

A higher NDCG@k indicates a better recommendation in the list of length k; hence, a better performance.

Aggregated diversity. The aggregated diversity measures the fraction of the items I included in the union of all recommendation lists I_uk, i.e.,

AD@k = |∪_{u∈U} I_uk| / |I|.   (A.4)

A higher aggregated diversity indicates the algorithm recommends a larger portion of the items in the catalog; consequently, a better performance.
Individual diversity. The individual diversity for a list of length k (ID@k) measures the average diversity within the recommendation lists of all users. For d(i, j) a distance metric quantifying the dissimilarity of two items i and j, the individual diversity is computed as

ID@k = (1/|U|) Σ_{u∈U} [ 2/(k(k − 1)) Σ_{i,j∈I_uk, i<j} d(i, j) ],

where the inner sum computes the individual diversity of the list I_uk of user u and the outer sum averages across all users. A higher ID indicates the average recommendation list is more diverse. Notice the ID requires computing a distance between items (often based on item features). To use this metric also with featureless items, we followed Kunaver, Dobravec, and Košir (2015) and computed the Euclidean distance based on the first seven SVD latent features of items i and j.
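A minimal implementation of the four metrics, assuming a linear relevance with the discount rel_i / log₂(i + 1) in the DCG (a standard choice; the paper's exact discount may differ) and an arbitrary pairwise item distance for the ID:

```python
import numpy as np

def rmse(true, pred):
    true, pred = np.asarray(true, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((true - pred) ** 2)))

def ndcg_at_k(rel_in_pred_order, k):
    """NDCG@k for one user: the input holds the true relevances ordered
    by the predicted ranking."""
    rel = np.asarray(rel_in_pred_order, float)[:k]
    disc = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = np.sum(rel * disc)
    idcg = np.sum(np.sort(rel)[::-1] * disc)   # ideal (true-order) DCG
    return float(dcg / idcg) if idcg > 0 else 0.0

def aggregated_diversity(lists, n_items):
    """AD@k: fraction of the catalog covered by the union of all lists."""
    return len(set().union(*map(set, lists))) / n_items

def individual_diversity(lists, dist):
    """ID@k: average pairwise item distance within each user's list."""
    scores = []
    for lst in lists:
        pairs = [(i, j) for a, i in enumerate(lst) for j in lst[a + 1:]]
        scores.append(np.mean([dist(i, j) for i, j in pairs]))
    return float(np.mean(scores))
```

The NDCG@k over all users is the average of `ndcg_at_k` across the user population, as in the definition above.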

Appendix B. Filter Order Analysis
In this section, we analyze the role of the filter orders in the accuracy-diversity trade-off. We first start with the similarity graph to fix the parameters for this setting, and then we discuss the impact of the two filter orders on the joint model.
Similarity graph. We consider the user-NN and item-NN graphs [cf. Section 3]. For each graph, we evaluated different numbers of nearest neighbors n ∈ {5, 10, …, 40}, filter orders K ∈ {1, 2, 3}, and list lengths k ∈ {10, 20, …, 100}.
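As one plausible construction of the NN (or FN) graphs evaluated here, the sketch below builds an adjacency from the rating matrix using cosine similarity between rating rows; the paper's exact similarity measure and normalization may differ:

```python
import numpy as np

def knn_graph(X, n_neighbors=5, furthest=False):
    """Build a user NN (or FN, with furthest=True) adjacency from a
    rating matrix X (users x items) via cosine similarity of rating rows."""
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    sim = Xn @ Xn.T
    # exclude self-loops from both the top and the bottom of the ranking
    np.fill_diagonal(sim, np.inf if furthest else -np.inf)
    A = np.zeros_like(sim)
    for u in range(sim.shape[0]):
        order = np.argsort(sim[u])                 # ascending similarity
        idx = order[:n_neighbors] if furthest else order[-n_neighbors:]
        A[u, idx] = np.abs(sim[u, idx])            # nonnegative edge weights
    return A
```

Varying `n_neighbors` reproduces the sparsity sweep n ∈ {5, 10, …, 40} discussed in this appendix.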
NN and filter order. We first analyzed combinations of different numbers of NNs and filter orders. We fixed the length of the list to k = 10, which is a common choice in the literature (Adomavicius & Kwon, 2011; Karypis, 2001; Niemann & Wolpers, 2013; Wang & Yin, 2013; Ziegler et al., 2005). Fig. B.10 shows the RMSE, the AD@10, and the ID@10 for both scenarios. The number of NNs plays a role in the trade-off. More NNs reduce the RMSE but degrade both diversity metrics. This is because each entity gets connected with more similar entities, whose combined effect smooths the ratings. For almost all numbers of NNs, there is always an order K > 1 that improves both the accuracy and the diversity of the vanilla NN collaborative filter [cf.
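The smoothing role of the order K can be illustrated with a linear graph filter of the form $\hat{X} = \sum_{k=0}^{K} h_k S^k X$. The shift operator `S` and the coefficients `h` below are hypothetical placeholders (in the model, `S` comes from the NN graph and the coefficients are learned):

```python
import numpy as np

def graph_filter(S, X, h):
    """Apply the polynomial graph filter sum_k h[k] * S^k to X.

    S: (n, n) graph shift operator (e.g., a normalized NN adjacency)
    X: (n, f) ratings signal on the graph nodes
    h: coefficients h_0, ..., h_K, so the filter order is K = len(h) - 1
    """
    out = h[0] * X
    P = X
    for hk in h[1:]:
        P = S @ P                 # one more hop over the NN graph
        out = out + hk * P
    return out
```

With K = 0 the filter reduces to $h_0 X$ (no graph information); each additional order mixes ratings from neighbors one hop further away, which is the smoothing effect discussed above.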
Length of the recommendation list. In Fig. B.11, we show the effect the length of the recommendation list has on the NDCG@k-AD@k and NDCG@k-ID@k trade-offs. A longer list improves diversity but reduces accuracy. This is expected because the chances of including different items increase in a longer list. At the same time, a longer list makes it more challenging to identify the correct order, therefore reducing the NDCG@k; see also Valcarce, Bellogín, Parapar, and Castells (2018).
Comparing the user NN with the item NN, we see little difference in terms of NDCG, while the difference in AD and ID is larger.
- User NN achieves a lower AD but a higher ID. This implies the algorithm prioritizes a few relevant items in the catalog but diversifies the list of each user. In our opinion, this is because the user NN has a narrow view of the item catalog, as it exploits user similarities. The model fails to account for the broad range of items (each item is treated individually) and prioritizes popular choices, which differ from each other. The plateau the user NN reaches at a relatively low ID further corroborates this explanation.
- Item NN achieves a higher AD but a lower ID. That is, the model covers a larger portion of the catalog (recommends different items to different users) but, to a specific user, it recommends similar items. The item NN is less user-centric since it leverages item similarities and ignores the influence of similar users. Consequently, the model has a broader view of the items to build recommendations; this explains an AD that is up to four times higher compared with the user NN. Nevertheless, since each user is treated individually, less importance is given to diversifying the items within the list. We can see this model as highly personalizing the user list: different items are recommended to different users, but these items are highly similar.
Based on these results, we set the list length to k = 20 since this value achieves the highest NDCG for both the user NN and the item NN. While a longer list could improve diversity further, searching within it is unsatisfactory for the user.
Similar-Dissimilar Graphs. Next, we analyze the impact of the filter orders in the user-item linear filter combination. Fig. B.12 shows the heatmap of the different metrics (RMSE, NDCG@20, AD@20, and ID@20) as a function of the filter order on the user-similarity graph and the filter order on the item-dissimilarity graph. We can see two main trade-offs.
First, when limiting the effect of the dissimilarity graph by setting its order to one, we see the accuracy increases with the similarity filter order. This is expected because a higher similarity order implies exploiting the similar connections more when fitting the accuracy metrics; see also Huang et al. (2018). In addition, the individual diversity also improves, whereas the aggregated diversity deteriorates. Thus, prioritizing the information from the similarity graph benefits the accuracy-to-individual-diversity trade-off at the cost of a worse catalog coverage. The latter may be interpreted as providing less personalized recommendations and working with a few niche items that are diverse among themselves. Second, increasing the order of the dissimilarity filter reduces the accuracy but increases diversity. In this instance, the trade-off is in favor of the aggregated diversity. That is, the more the furthest neighbors are included, the more different items are recommended to different users. All this comes at the expense of having less diversity within the list of a user. Remark that, in the latter case, the gain in aggregated diversity is substantial compared with the loss in individual diversity.

Fig. B.12. Performance metrics for accuracy (RMSE and NDCG@20) and diversity (AD@20 and ID@20) as a function of the filter order on the user-similarity graph and the filter order on the item-dissimilarity graph. Results are normalized to ease comparison.
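The joint model discussed here combines one polynomial filter per graph. A minimal sketch of the joint filtering step, where the shift operators and coefficients are again hypothetical placeholders (in the model they come from the similarity/dissimilarity graphs and from training):

```python
import numpy as np

def poly_filter(S, X, h):
    """Polynomial graph filter sum_k h[k] * S^k applied to X."""
    out, P = h[0] * X, X
    for hk in h[1:]:
        P = S @ P
        out = out + hk * P
    return out

def joint_filter(S_sim, S_dis, X, h_sim, h_dis):
    """Sum of a similarity-graph filter (order len(h_sim) - 1) and a
    dissimilarity-graph filter (order len(h_dis) - 1)."""
    return poly_filter(S_sim, X, h_sim) + poly_filter(S_dis, X, h_dis)
```

Growing `len(h_sim)` while keeping the dissimilarity order at one corresponds to the first trade-off above; growing `len(h_dis)` instead corresponds to the second.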