Deep Learning has been used in many traditional machine learning tasks such as object detection, video processing, speech recognition,etc. The data used in these tasks however are generated from Euclidean space. Machine learning models however do not perform well With tasks that uses data generated from non-euclidean space and represented in graphs, thus the need for deep learning solutions. Graph Neural Networks(GNNs) inspired from Recursive Neural Network and Markov chains was first introduced in 2005. GNNs are used from Computer vision tasks such as scene graph generation, point clouds classification, and action recognition to Natural Learning Tasks such as generating a semantic, knowledge graph given a sentence and recommendation tasks. In this work we examine the various advances made in GNNs for Recommendation.

Session-based Recommendation with Graph Neural Networks

The goal of a session based recommendation sysytem is to predict which item the user will click next based on user’s current session data and without accessing long term data. Session-based Recommendation with Graph Neural Networks (SR-GNN) proposed in 2019 is used to obtain accurate item embedding and take complex transitions of items into account in session based recommendation tasks. In the graph a node represents an item and an edge represents an interaction of the user with the item. Since an item may occur in a sequence multiple times, each edge is weighted which is calculated as the occurrence of the edge divided by the outdegree of that edge’s start node.


The model is fed with the item embeddings and the concatenation of two adjacency matrices which is the weighted connections of outgoing and incoming edges in the sessions. As seen from the figure above SR-GNN uses a variant of Gated Recurrent Unit (GRU) layer.


The local embedding is represented by the embedding of the last item in the session. Since the embedding may contain information of different priority an attention mechanism is used on the aggregation of node vectors to get the global embedding. The final session embedding is obtained by applying a linear layer on the concatenation of the local and global embedding vectors. To obtain the final prediction scores for each session, the session representation is multiplied with item embeddings for each item flollowed by softmax.

An Efficient and Effective Framework for Session-based Social Recommendation

Social information for an user is usually available and the user’s interest could be influenced by their friends, thus this work imporves the recommendation by imporving the node embeddings using the social information. To use the social information a heterogenous graph is used by a Heterogenous Graph Neural Network(HGNN) with attention mechanism to refine the user and item embeddings. The embeddings relevant to a session are then passed to a non-social aware model to capture the user’s current interest. Thus the model is able to capture both the influence of friends and user’s interest in the final user embeddings.


The figure above shows the architecture of the model Social-aware Efficient Recommender used in the work. The heterogenous knowledge graph consists of the session and social data. It uses 4 types of edges:

  • User-User(UU): “is followed by” users are more influenced by other users they follow with weight 1
  • User-Item(UI): represents user item interactions with weight number of times the interaction occurs
  • Item-User(IU): represents user item interactions with weight number of times the interaction occurs
  • Item-Item(II): represents transition of item from one to another with weight number of times the transition occurs

A HGNN with attention mechanism with L layers is used to learn the node embeddings. User v that user u follows and item that user u interacted with is used to message the user u embeddings. Since the number neighboring users and items may be unbalanced, a attention mechanism is used for weight allocation during aggregation.


As seen above the edge features are first calculated using the source and destination nodes. For each layer the edges are embedded into a dense vector and added to previously calculated edge features. After passing through a sigmoid layer these features are multiplied with a learnable parameter q. After nomalizing the scores, the influence from the neighboring nodes are computed as weighted sum. The new node representation is calculated by applying relu on the sum of aggregated information and old representation after linear layer. For items the influence of neighboring transition and user interaction is considered.

The personalized session component extracts the node embeddings relevant to current session. The item embeddings in the session are initialized with user embeddings added with each of the item embeddings. Both the incomming and outgoing nodes are considered when computing the update information, these two vectors are concatenated and a gating mechanism is used to update the information. Session specific are obtained by using the last item to select important items.


The above formula is used to calculate the attention weights using the user, items and last item embeddings. After applying the softmax layer weighted sum is calculated as the session embeddings. These seesion embeddings are then concatenated with the user embeddings and multiplied with item embeddings to calculate the prediction.

Neural Graph Collaborative Filtering

A drawback of earlier works for recommendation systems is that they obtain a user’s embedding from pre-existing features, thus collaborative signal present in user-item interactions is not encoded in the embedding process. Neural Graph Collaborative Filtering aims to exploit this feature by propagating embeddings in user item graphs, thus injecting the collaboratve signal in an explicit manner.


As seen from the figure above the user and item embeddings are initialized from a lookup table, they are then propagated on the user-item interaction graph to refine them. The intuition behind this is that interacted item indicates an user’s preference and an user that consumes an item can be treated as the item’s features. Embedding propagation based on these assumptions would require two main operations i.e message construction and message aggregation.

For message construction between the neighbor(i) and node(j) the neighbor is multiplied with the node embeddings and passed through linear layer, the resultant is then added to node embeddings after passing through another linear layer. Normalization is done to account for the decay with path length, this is done by dividing the message by square root of the product of the number of one-hop neighbor of the node and neighbor. For node aggregation the node embeddings are passed through same linear layer during message construction, added with the message and passed through a leakyrelu activation. Thus the node is updated with positive and some negative signals while retaining the original features. Thus information from one-hop neighbors are eoncorporated, to include higher order connectivity information multiple such layers are stacked over one-another. The different layers would conatin information from passed over different connections, thus they are concatenated for final node representation.

A Heterogeneous Information Network based Cross Domain Insurance Recommendation System for Cold Start Users

Recommendation system has problem recommending items for users with little past activity. This work uses the user item interactions from another(source) domain and trains a network to map these user embedding from source domain to target domain. This network is trained on users present in both source and target doomain.


PingAn Jinguanjia dataset has user item interactions from two domain insurance(target) and non-financial items such as clothes, skincare, etc. Thus the network needs to learn the latent feature in both source and target domain.

As shown in the figure to learn the features in source domain a heterogenous graph is constructed with nodes user(U), agent(A), insurance product(I), insurance property(P) and relations

  • Purchase U->I, Purchased by I->U
  • served by U->A, be served by A->U
  • posses I->P, be possesd by P->I


Relational Neighbor Aggregation is used to calculate the importance of one-hop neighors. As shown above to calculate the attention scores the source embedding is passed through a linear layer and multiplied with the neighbor embeddings, pass the output through a linear layer and softmax to calculate the attention score. The attention scores are multiplied with the node embeddings, and Relu, layernormalization and dropout layers to calculate the final representation.

Meta path based aggregation is used to add information from nodes connected via meta paths, however only neighbors having same node type are considered. The node and neighbor embeddings multipled and passed through a linear layer followed by softmax to calculate the attention scores. These scores are multipled with node embeddings, added and to calculate the final presentation. These embeddings are stacked to get multiple node embeddings for each node.

Multiple node embeddings are fused using Semantic Attention. A linear, tanh and linear with softmax layer followed by average is used to calculate the attention score. These scores are multipled with the node embeddings and summed to get final representation.

The output of Semantic attention and regional attention are concatenated and passed through a linear layer. The resultant is concatenated with original embeddings, passed through a linear layer with relu activation and layernormalization to get final node representation.

Each item in source domain has a description, word2vec is used on each word, concatenated and a maxpoooling layer is used to get final representation. To get user’s representation GRU layer is used. A MLP is used to map source user embeddings to target user embeddings. This network is then used to get user embeddings for user present in source domain but not in target domain. Thus this work is able to resolve the cold start issue.

Attention-Based Recommendation On Graphs

For any collaborative filtering task an important aspect of the mdoel would be to figure out how much informative a node would be for predicting the future behavior of a target user. Thus this work proposes GARec which uses an attention mechanism along with a spatial GCN to extract user and item embeddings.


The model is trained in an end to end manner and is fed with the initial feature vectors and the adjacency matrix. A bipartite graph is constrctued using the adjacency matrix and the weight of each node is the user’s feedback. The edge embeddings are calculated based on the connecting node embeddings. A spatial GCN is used to extract the node embeddings based on two types of indirect neighborhood for each node, for users these are

  • co-rated users: extracted from user-item-user path with weight the product of the rating of the two users; for items those that have got interaction from the same user are used

  • provided feedback: users(y) who provided feedback for target(i) item with feedback as the edge weight

One-hot encoding is not used to obtain the initial node embeddingssince since it is computationally expensive than NMF. NMF is fed with the adjacency matrix to output two other non-negative matrix F1 and F2. Where each row of F1 and are used to represent the feature vectors of user and item respectively.

To aggeregate the information from neighbors, weighted mean is used. To calculate the weight of neighbour(y) for the target node(u) they are passed through a linear layer to get query(q) and key(k) respectively, then the dot product of q and k is multiplied with weight of the edge between the two nodes. The output is passed through relu and softmax, further to avoid long tail problem scores less than average score is set to negative infinity followed by normalization. This process is repeated for all the neighbors and added to calculate the aggregated information.

The aggeregated information and the original node representation is passed through a linear layer and dot product with q to calculate their relevance with the target node respectively. After multiplying these weights with the vectors repectively they are added and passed through relu to update the node representation.

The user and item embeddings are given to a MLP for prediction. RMSE loss is used to train the model to predict the known user-item edge weights.