2.1 Introduction

The importance of high-order complex network modeling has been discussed in Chap. 1. In this chapter, we introduce the basics of hypergraphs. Different from a graph, whose edges have degree exactly two and can therefore model only pairwise connections, a hypergraph has hyperedges whose degree is usually higher than two, so it can model correlations in practical data that are much more complex than pairwise relationships. Owing to this versatility in modeling complex correlations, machine learning on hypergraphs has attracted increasing attention.

Machine learning methods on hypergraphs have been used in many real-world applications due to these advantages. A wide variety of computer vision tasks have been addressed with hypergraphs, including image retrieval [1], 3D object classification [2], video segmentation [3], person re-identification [4], hyper-spectral image analysis [5], landmark retrieval [6], and visual tracking [7]. In these tasks, a wide range of subjects can be embedded into a hypergraph structure, and the hypergraph formulates the correlation among them. In image retrieval [1], the correlation among different images can be modeled in a hypergraph, where each vertex denotes an image and hyperedges can be generated by finding similar image features. In 3D object classification [2], the correlation among different 3D objects can be modeled in a hypergraph, where each vertex denotes a 3D object and hyperedges can be generated based on the similarity among these objects. In person re-identification [4], a hypergraph can be constructed where each vertex represents a personal image and hyperedges are generated from similarities in the feature space. Similar modeling attempts have been deployed in medical image analysis and bioinformatics to identify genes [8, 9], predict diseases [10, 11], identify sub-types [12], and analyze functional networks [13].

Before a detailed introduction of the hypergraph computation paradigm, hypergraph modeling, and other related methods and applications, this chapter first presents preliminary knowledge of hypergraphs and multiple hypergraph representations. We also compare the hypergraph structure with the graph structure from four aspects.

2.2 Preliminary Knowledge of Hypergraph

The basic concepts of hypergraph are briefly discussed here. Table 2.1 provides the main notations and definitions for hypergraphs used throughout this chapter. We first introduce the undirected hypergraph and the directed hypergraph, and then introduce the k-uniform hypergraph, the probabilistic hypergraph, the relationship between hypergraph and bipartite graph, and the weights on hypergraph.

Table 2.1 Notations and definitions of hypergraphs

\(\mathbb{G}\): a hypergraph; \(\mathbb{V}\): the vertex set; \(\mathbb{E}\): the hyperedge set; W: the diagonal matrix of hyperedge weights; H: the incidence matrix; w(e): the weight of hyperedge e; d(v): the degree of vertex v; δ(e): the degree of hyperedge e; \(\mathbf{D}_v\), \(\mathbf{D}_e\): the diagonal matrices of vertex and hyperedge degrees

2.2.1 Undirected Hypergraph

Let \(\mathbb {G}\) denote a hypergraph (an undirected hypergraph), which consists of a set of vertices \(\mathbb {V}\) and a set of hyperedges \(\mathbb {E}\). In a weighted hypergraph, each hyperedge \(e \in \mathbb {E}\) is assigned a weight w(e), indicating the importance of that connection within the whole hypergraph. Let W denote the diagonal matrix of hyperedge weights, i.e., \(\operatorname {diag}(\mathbf {W})=\left [w\left (e_{1}\right ), w\left (e_{2}\right ), \ldots , w\left (e_{|\mathbb {E}|}\right )\right ]\). Given a hypergraph \(\mathbb {G}=(\mathbb {V}, \mathbb {E}, \mathbf {W})\), the structure of the hypergraph is usually represented by an incidence matrix \(\mathbf {H} \in \{0,1\}^{|\mathbb {V}| \times |\mathbb {E}| }\), with each entry H(v, e) indicating whether vertex v is in hyperedge e:

$$\displaystyle \begin{aligned} \mathbf{H}(v, e)= \begin{cases}1 & \text{ if } v \in e \\ 0 & \text{ if } v \notin e,\end{cases} \end{aligned} $$
(2.1)

where H(v, e) indicates whether vertex v belongs to hyperedge e; in the weighted variants introduced later, it can instead encode the possibility of vertex v being assigned to hyperedge e, or the importance of vertex v for hyperedge e. The degree of hyperedge e and the degree of vertex v are defined as follows:

$$\displaystyle \begin{aligned} \delta(e)=\sum_{v \in \mathbb{V}} \mathbf{H}(v, e), \end{aligned} $$
(2.2)

and

$$\displaystyle \begin{aligned} d(v)=\sum_{e \in \mathbb{E}} w(e) \, \mathbf{H}(v, e). \end{aligned} $$
(2.3)

The traditional hypergraph structure creates associations among vertices, with a single hyperedge connecting multiple associated vertices. All vertices on the same hyperedge are given the value 1 in the incidence matrix H. The incidence matrix H is calculated as in Eq. (2.1), and its elements take values 0 or 1. Each row of H corresponds to a vertex of the hypergraph, and each column corresponds to a hyperedge, listing the set of vertices on that hyperedge.

Figure 2.1 shows an undirected hypergraph, including the hypergraph itself, the incidence matrix H, the vertex set \(\mathbb {V}\), the hyperedge set \(\mathbb {E}\), and the weight matrix W. The illustrated hypergraph has 3 hyperedges \(e_1\), \(e_2\), and \(e_3\) and 6 vertices. The degree of hyperedge \(e_3\), which contains vertices \(\{v_3, v_4, v_6\}\), is 3, and the other elements of \(\mathbf{D}_e\) can be inferred by the same token. Vertex \(v_3\) belongs to hyperedges \(e_2\) and \(e_3\), so its degree is 2. The incidence matrix H, shown on the right side of Fig. 2.1, is readily obtained from these construction rules. A minimal sketch of this example follows.
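To make the notation concrete, the following NumPy sketch (an illustration, not code from the original text) encodes the hypergraph of Fig. 2.1 with unit hyperedge weights and computes the degrees of Eqs. (2.2) and (2.3); indices are 0-based, so v1..v6 map to rows 0..5:

```python
import numpy as np

# Incidence matrix H (|V| x |E|): e1={v1,v2,v5}, e2={v1,v3}, e3={v3,v4,v6}
H = np.array([
    [1, 1, 0],  # v1
    [1, 0, 0],  # v2
    [0, 1, 1],  # v3
    [0, 0, 1],  # v4
    [1, 0, 0],  # v5
    [0, 0, 1],  # v6
])
w = np.ones(3)           # unit hyperedge weights, diag(W)

delta = H.sum(axis=0)    # hyperedge degrees, Eq. (2.2)
d = H @ w                # vertex degrees, Eq. (2.3)

D_e = np.diag(delta)     # 3 x 3 hyperedge degree matrix
D_v = np.diag(d)         # 6 x 6 vertex degree matrix

print(delta)  # [3 2 3]             -> delta(e3) = 3
print(d)      # [2. 1. 2. 1. 1. 1.] -> d(v3) = 2
```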

Fig. 2.1 An example of an undirected hypergraph: \(e_1\) connects \(v_1\), \(v_2\), and \(v_5\); \(e_2\) connects \(v_1\) and \(v_3\); \(e_3\) connects \(v_3\), \(v_4\), and \(v_6\). The figure also shows the incidence matrix H (6 × 3) and the degree matrices \(\mathbf{D}_v\) (6 × 6) and \(\mathbf{D}_e\) (3 × 3)

As calculated in Eq. (2.1), all elements of the incidence matrix H take the value 0 or 1, so every vertex in a hyperedge is treated equally. In practice, however, the connection strengths of different vertices on a hyperedge can differ: some vertices are strongly connected in the hyperedge and deserve high weights, while others deserve low weights. In that case, the entries of H are real-valued and represent the importance of each vertex for the hyperedge, and each column of H may or may not be normalized to sum to 1, depending on the application and objective.

There are various rules for determining whether vertices are associated with one another. For data with a graph structure, hyperedge groups can be generated using pairwise edges and k-hop neighbors; for data without a graph structure, they can be generated using neighbors in the feature space, as sketched below. A detailed description of these methods is provided in Chap. 4.
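As a hedged illustration of the feature-space strategy, the sketch below builds one hyperedge per vertex from its k nearest neighbors, assuming Euclidean distance; the helper name `knn_incidence` and the random features are hypothetical:

```python
import numpy as np

def knn_incidence(X: np.ndarray, k: int) -> np.ndarray:
    """Build a |V| x |V| incidence matrix; column j is the hyperedge
    centered at vertex j (itself plus its k nearest neighbors)."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all vertex features
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    H = np.zeros((n, n))
    for j in range(n):
        nearest = np.argsort(dist[j])[: k + 1]  # includes j itself (distance 0)
        H[nearest, j] = 1
    return H

X = np.random.rand(6, 4)   # 6 vertices with 4-dim features (made-up data)
H = knn_incidence(X, k=2)  # each hyperedge connects 3 vertices
```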

2.2.2 Directed Hypergraph

Traditional undirected hypergraphs cannot represent many real-world correlations, because hyperedges may be directional. Therefore, a representation for directed hypergraph structures is important. In each hyperedge, the vertices can be divided into two sets: the source vertex set and the target vertex set. For a directed hypergraph, a straightforward definition [14] of the incidence matrix is as follows:

$$\displaystyle \begin{aligned} \hat{\mathbf{H}}(v, e)= \begin{cases}-1 & \text{ if } v \in T(e) \\ 1 & \text{ if } v \in S(e) \\ 0 & \text{ otherwise,}\end{cases} \end{aligned} $$
(2.4)

where T(e) and S(e) are the target and source vertex sets of hyperedge e, respectively. The incidence matrix can equivalently be split into two matrices, \(\mathbf{H}_s\) and \(\mathbf{H}_t\), describing the source and target vertices of all hyperedges, respectively. When passing messages with these two incidence matrices, it is important to maintain the directional information: unlike in the undirected hypergraph, message passing in the directed hypergraph is guided by the two incidence matrices \(\mathbf{H}_s\) and \(\mathbf{H}_t\). The average aggregation of messages is normalized by two matrices, \(\mathbf{D}_s\) and \(\mathbf{D}_t\), formulated as follows:

$$\displaystyle \begin{aligned} \left\{ \begin{array}{ll} {\mathbf{D}}_{\mathbf{s}} = \operatorname{diag}(\operatorname{col\_sum}({\mathbf{H}}_s))\\ {\mathbf{D}}_{\mathbf{t}} = \operatorname{diag}(\operatorname{col\_sum}({\mathbf{H}}_t)),\\ \end{array} \right. \end{aligned} $$
(2.5)

where diag(v) is a function that converts a vector v into a diagonal matrix, and col_sum(⋅) sums each column of a matrix.

Figure 2.2 shows an example of a directed hypergraph, including the directed hypergraph itself, the incidence matrix H, the source incidence matrix \(\mathbf{H}_s\), and the target incidence matrix \(\mathbf{H}_t\). The illustrated directed hypergraph contains six vertices and two hyperedges \(e_1\) and \(e_2\): \(e_1\) connects four vertices, and \(e_2\) connects three vertices. In hyperedge \(e_1\), the source vertices are \(v_1\) and \(v_2\), and the target vertices are \(v_4\) and \(v_5\). As for hyperedge \(e_2\), the source vertices are \(v_2\) and \(v_3\), and the only target vertex is \(v_6\). A minimal sketch of this example follows.
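The following sketch (illustrative, with 0-based indices) encodes the directed hypergraph of Fig. 2.2, assembles \(\hat{\mathbf{H}}\), \(\mathbf{H}_s\), and \(\mathbf{H}_t\), and applies the normalization of Eq. (2.5) in one message-passing step; the feature matrix X is a made-up placeholder:

```python
import numpy as np

# Rows v1..v6, columns e1, e2.
# S(e1) = {v1, v2}, T(e1) = {v4, v5}; S(e2) = {v2, v3}, T(e2) = {v6}.
H_s = np.array([[1, 0], [1, 1], [0, 1], [0, 0], [0, 0], [0, 0]])
H_t = np.array([[0, 0], [0, 0], [0, 0], [1, 0], [1, 0], [0, 1]])

H_hat = H_s - H_t               # signed incidence matrix, Eq. (2.4)

D_s = np.diag(H_s.sum(axis=0))  # col_sum -> source count per hyperedge
D_t = np.diag(H_t.sum(axis=0))  # col_sum -> target count per hyperedge

# Average aggregation of source-vertex features onto hyperedges:
X = np.random.rand(6, 8)             # vertex features (placeholder)
E = np.linalg.inv(D_s) @ H_s.T @ X   # one normalized message-passing step
```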

Fig. 2.2 An example of a directed hypergraph: \(e_1\) connects \(v_1\), \(v_2\), \(v_4\), and \(v_5\); \(e_2\) connects \(v_2\), \(v_3\), and \(v_6\). The figure also shows the incidence matrices H, \(\mathbf{H}_s\), and \(\mathbf{H}_t\)

2.2.3 Probabilistic Hypergraph

In real-world correlations, the intensity of a connection is not necessarily binary; it can be a continuous number between zero and one. Consequently, the incidence matrix may be a real-valued matrix with elements ranging from 0 to 1, which is adopted to denote a probabilistic hypergraph.

As shown in Fig. 2.3, the probabilistic hypergraph consists of six vertices and three hyperedges. The hyperedge \(e_1\) connects three vertices \(v_1\), \(v_2\), and \(v_5\), and the intensity of the connection within this hyperedge is not uniform: as shown on the right side of the figure, \(e_1\) connects \(v_1\) with an intensity of 0.3, \(v_2\) with an intensity of 0.8, and \(v_5\) with an intensity of 0.5. The degrees of vertices and hyperedges in this type of hypergraph are computed as the row and column sums of the incidence matrix H, respectively, as shown at the bottom of Fig. 2.3 and in the sketch below.
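A minimal sketch of the probabilistic case, filling in only the \(e_1\) intensities given above (the \(e_2\) and \(e_3\) intensities are not specified in the text, so they are left as a placeholder):

```python
import numpy as np

H = np.zeros((6, 3))
H[[0, 1, 4], 0] = [0.3, 0.8, 0.5]  # e1 -> v1, v2, v5 with given intensities
# ... the entries of e2 and e3 would be filled in likewise

d = H.sum(axis=1)      # vertex degrees: row sums of the real-valued H
delta = H.sum(axis=0)  # hyperedge degrees: column sums
```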

Fig. 2.3 An example of a probabilistic hypergraph: \(e_1\) connects \(v_1\), \(v_2\), and \(v_5\); \(e_2\) connects \(v_1\) and \(v_3\); \(e_3\) connects \(v_3\), \(v_4\), and \(v_6\), with real-valued incidence entries. The figure also shows the matrices H, \(\mathbf{D}_v\), and \(\mathbf{D}_e\)

2.2.4 K-Uniform Hypergraph

In many applications, all hyperedges in a hypergraph may connect the same number of vertices; such a hypergraph is known as a k-uniform hypergraph. In a k-uniform hypergraph, each hyperedge contains precisely k vertices, as shown in Fig. 2.4. Under this definition, a simple graph can be regarded as a special case of hypergraph, namely a 2-uniform hypergraph, where each hyperedge connects exactly two vertices.

Fig. 2.4 An example of a 3-uniform hypergraph: \(e_1\) connects \(v_1\), \(v_2\), and \(v_5\); \(e_2\) connects \(v_1\), \(v_2\), and \(v_3\); \(e_3\) connects \(v_3\), \(v_4\), and \(v_6\). The figure also shows the matrices H, \(\mathbf{D}_v\), and \(\mathbf{D}_e\)

Figure 2.4 illustrates an example of a 3-uniform hypergraph. The hypergraph consists of six vertices and three hyperedges, and each hyperedge contains precisely 3 vertices. Hyperedge \(e_1\) connects vertices \(v_1\), \(v_2\), and \(v_5\), and hyperedge \(e_2\) connects vertices \(v_1\), \(v_2\), and \(v_3\). In this type of hypergraph, every hyperedge has the same degree k.

2.2.5 Hypergraph and Bipartite Graph

A bipartite graph can be denoted by \(\mathbb {G} = \{ \mathbb {U}, \mathbb {V}, \mathbb {E}\}\). Unlike in a simple graph, the vertices of a bipartite graph can be divided into two disjoint and independent sets \(\mathbb {U}\) and \(\mathbb {V}\), and every edge connects one vertex in \(\mathbb {U}\) to one vertex in \(\mathbb {V}\). Obviously, an undirected hypergraph can be regarded as a bipartite graph if the hyperedges are treated as a second vertex set, as shown in Fig. 2.5.

Fig. 2.5 The relationship between hypergraph and bipartite graph: the same hypergraph is shown as two bipartite graphs, one with the hyperedges as \(\mathbb{U}\) and one with the hyperedges as \(\mathbb{V}\), with the incidence matrix H mapped to the corresponding biadjacency matrices B

Figure 2.5 illustrates examples of converting a hypergraph to a bipartite graph. The bipartite graph can be generated by two strategies: the vertices and hyperedges are treated as \(\mathbb {U}\) and \(\mathbb {V}\), respectively (as illustrated in the left part), or as \(\mathbb {V}\) and \(\mathbb {U}\), respectively (as illustrated in the right part). Similarly, a bipartite graph can be transformed into an undirected hypergraph by taking the set \(\mathbb {U}\) (or \(\mathbb {V}\)) as the hyperedges. This does not mean that the hypergraph is the same as, or can be replaced with, the bipartite graph: the transformation exists only for undirected and probabilistic hypergraphs, and for more complex hypergraphs, such as directed hypergraphs, it is invalid. A sketch of the conversion follows.
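A small sketch of this correspondence, assuming the Fig. 2.1 structure: the incidence matrix H serves directly as the biadjacency block of the bipartite graph.

```python
import numpy as np

H = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1],
              [0, 0, 1], [1, 0, 0], [0, 0, 1]])  # Fig. 2.1 structure

# Full bipartite adjacency over U = vertices and V = hyperedges:
n, m = H.shape
A_bip = np.block([[np.zeros((n, n)), H],
                  [H.T, np.zeros((m, m))]])

# The reverse direction reads the biadjacency block back as an incidence
# matrix, which is why this works for undirected/probabilistic hypergraphs.
```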

2.2.6 The Weights on Hypergraph

It is noted that different weights can be placed on a hypergraph, providing additional information for the hypergraph structure. This is a more semantically expressive way of representing a hypergraph, as different components of a hypergraph, such as a vertex, a hyperedge, or even a sub-hypergraph, may have different impacts on the relationship modeling. For example, in a recommender system, the weights in a user profile influence the categorization of user attributes; if the attributes are not categorized accurately, the accuracy of the recommendations and marketing based on the profile becomes questionable. The main types of weight information on a hypergraph are hyperedge weights and vertex weights, with the magnitude of the values indicating the relative importance of hyperedges and vertices, respectively.

First, let us show how vertex weights can be used. Different vertices may have varying importance in hypergraph modeling, and vertex weights are used to quantify the importance of different vertices. If a vertex is strongly connected on the hypergraph (with high correlations), it should have a large vertex weight; otherwise, it should have a small one. A vertex with a 0 entry in the incidence matrix can be regarded as being connected to the corresponding hyperedge with a weight of 0. Here, the diagonal elements of U represent the weights of the vertices, which are between 0 and 1 and reveal the relative importance of these vertices. Figure 2.6 shows an example hypergraph with vertex weights, where the weight of each vertex is denoted by the size of the vertex node. Vertex \(v_6\) has a weight of 0.9, which is larger than that of all other vertices, while vertex \(v_2\) has the smallest weight among the six vertices.

Fig. 2.6 An example of a vertex-weighted hypergraph: \(e_1\) connects \(v_1\), \(v_2\), and \(v_5\); \(e_2\) connects \(v_1\) and \(v_3\); \(e_3\) connects \(v_3\), \(v_4\), and \(v_6\); the vertex weights are 0.8, 0.2, 0.4, 0.5, 0.7, and 0.9. The figure also shows the matrices H, \(\mathbf{D}_v\), and \(\mathbf{D}_e\)

Then, let us focus on hyperedge weights, which reflect the importance of different hyperedges in a hypergraph. As different hyperedges may have different importance in representing connections among vertices, it is crucial that hyperedges be weighted according to their representative capabilities. In some cases, part of the hyperedges are more reliable due to their generation method or the features employed in the task, and these hyperedges should be given large weights during the learning process. Here, the diagonal elements of W represent the weights of the hyperedges, which are between 0 and 1 and reveal their relative importance. Figure 2.7 shows an example of a hyperedge-weighted hypergraph, in which the three hyperedges have weights 0.3, 0.9, and 0.5, respectively.

Fig. 2.7 An example of a hyperedge-weighted hypergraph: \(e_1\) connects \(v_1\), \(v_2\), and \(v_5\); \(e_2\) connects \(v_1\) and \(v_3\); \(e_3\) connects \(v_3\), \(v_4\), and \(v_6\); the hyperedge weights in W are 0.3, 0.9, and 0.5. The figure also shows the matrices H, \(\mathbf{D}_v\), and \(\mathbf{D}_e\)

2.3 Comparison Between Graph and Hypergraph

Since hypergraph is a generalization of graph, the relationship between graph and hypergraph is a fundamental question. In this part, we introduce this relationship in detail from four aspects: the order of correlations, the representation methods, the structure transformation, and random walks on both structures.

2.3.1 Low-Order Versus High-Order Correlations

First, we define an interaction as a set \(I = \{p_0, p_1, \cdots, p_{k-1}\}\) containing k basic elements of the system being studied, which can also be called vertices or nodes. Various real-world relationships can be described by such interactions, such as co-authors of a scientific paper, genes required to perform a specific function, neurons co-activating during a specific task, and more. We denote the order (or dimension) of an interaction among vertices as follows: an order-0 interaction is a vertex interacting with itself only, an order-1 interaction involves two vertices interacting with each other, an order-2 interaction involves three vertices, and so on. Interactions of order k ≥ 2 are considered high-order interactions, while those of order k ≤ 1 are low-order interactions.

Figure 2.8 compares hypergraph and graph in modeling correlations of different orders. A graph can only represent order-1 interactions between two vertices, whereas a hypergraph can represent interactions of any order k through its flexible hyperedges. In this respect, hypergraph is more effective than graph at modeling high-order correlations among subjects.

Fig. 2.8 The expressive ability comparison of graph and hypergraph: a graph is limited to low-order (k = 2) connections, in which each entity connects to a single group, while a hypergraph supports high-order connections (k = 2, 3, 4, 5), in which the same entity can connect to different groups

2.3.2 Adjacency Matrix Versus Incidence Matrix

A graph with N vertices can be described by an adjacency matrix \(\mathbf{A} \in \{0, 1\}^{N \times N}\), where \(\mathbf{A}_{i,j} = 1\) denotes that there is an edge connecting vertex \(v_i\) and vertex \(v_j\). In most cases, the adjacency matrix A is a symmetric matrix.

A hypergraph with N vertices and M hyperedges can be described by an incidence matrix \(\mathbf{H} \in \{0, 1\}^{N \times M}\), where \(\mathbf{H}_{i,j} = 1\) denotes that hyperedge \(e_j\) connects vertex \(v_i\).

Comparing the adjacency matrix with the incidence matrix, a graph can be regarded as a 2-uniform hypergraph in which each hyperedge connects exactly two vertices. The possible degree-2 hyperedges of a 2-uniform hypergraph can be directly projected onto the N × N elements of the adjacency matrix A, and the hypergraph incidence matrix and the simple graph adjacency matrix can be transformed into each other as follows:

$$\displaystyle \begin{aligned} \mathbf{H}{\mathbf{H}}^\top = \mathbf{A} + \mathbf{D}. \end{aligned} $$
(2.6)
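Equation (2.6) holds because the (u, v) entry of \(\mathbf{H}\mathbf{H}^\top\) counts the hyperedges containing both u and v: the off-diagonal part equals A, and the diagonal part equals the vertex degree matrix D. The sketch below checks this numerically on a small, made-up 2-uniform example:

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]   # a small simple graph
N, M = 4, len(edges)

H = np.zeros((N, M))
for j, (u, v) in enumerate(edges):         # each edge = a degree-2 hyperedge
    H[u, j] = H[v, j] = 1

A = np.zeros((N, N))
for u, v in edges:
    A[u, v] = A[v, u] = 1
D = np.diag(A.sum(axis=1))                 # diagonal vertex degree matrix

assert np.allclose(H @ H.T, A + D)         # Eq. (2.6) holds
```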

The adjacency matrix of a graph and the incidence matrix of a hypergraph lead to different processing styles when confronting multi-modal data or multiple types of connections. Given m adjacency matrices representing m graphs \(\mathbb {G}_1,\mathbb {G}_2,\ldots ,\mathbb {G}_m\), there are two typical ways to combine such data with graphs. The first is to combine the different graphs into one graph \(\mathbb {G}\) and then conduct the task; the second is to conduct the task on each graph individually and then combine the results. Figures 2.9 and 2.10 show these two types of methods. Either way, fusion must be performed, in the graph structure or in the results. In recent years, a series of graph fusion methods [15, 16] have been introduced, yet optimally combining different graphs remains challenging. Moreover, multi-modal graph fusion has high computational complexity, which may limit applications on multi-modal data.

Fig. 2.9 An example of graph structure fusion for multi-modal data: graphs 1, 2, and 3 over vertices \(v_1\) to \(v_6\) are combined into a fused graph on which tasks 1 to T are conducted; correspondingly, the 6 × 6 adjacency matrices \(\mathbf{A}_1\), \(\mathbf{A}_2\), and \(\mathbf{A}_3\) are combined into \(\mathbf{A}_{all}\)

Fig. 2.10 An example of result fusion for multi-modal data: tasks 1 to T are conducted on each of the three graphs individually, and the three sets of results are then merged into a single set of results for tasks 1 to T

Different from the processing methods for graphs, a hypergraph can handle such different types of connections in an easy and direct way, thanks to its flexible hyperedges. As shown in Fig. 2.11, when multiple types of connections are available, multiple hyperedge groups with m incidence matrices \(\mathbf{H}_1, \mathbf{H}_2, \ldots, \mathbf{H}_m\) can be generated, and these m incidence matrices can be directly concatenated to generate the overall hypergraph structure H, as sketched below. In this way, all the multi-modal data or multiple types of connections can be modeled in one hypergraph, and all further processing can be deployed directly on this hypergraph structure. Under such circumstances, explicit multi-modal fusion is not required; it is jointly included in the hypergraph computation process.
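A sketch of this concatenation, with made-up random hyperedge groups standing in for real modalities:

```python
import numpy as np

n = 6
H1 = np.random.randint(0, 2, (n, 3))   # hyperedge group from modality 1
H2 = np.random.randint(0, 2, (n, 4))   # ... from modality 2
H3 = np.random.randint(0, 2, (n, 2))   # ... from modality 3

H_all = np.hstack([H1, H2, H3])        # one hypergraph with 9 hyperedges
assert H_all.shape == (n, 3 + 4 + 2)
```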

Fig. 2.11 An example of hypergraph structure fusion for multi-modal data: hypergraphs 1, 2, and 3 over vertices \(v_1\) to \(v_6\) are combined into a fused hypergraph, on which tasks 1 to T are conducted; correspondingly, the incidence matrices \(\mathbf{H}_1\), \(\mathbf{H}_2\), and \(\mathbf{H}_3\) are concatenated into \(\mathbf{H}_{all}\)

2.3.3 Structure Transformation from Hypergraph to Graph

Compared to a simple graph, where every edge has degree 2, a hypergraph can encode high-order (beyond pairwise) data correlations using its degree-free hyperedges. In this sense, a simple graph can be viewed as the special case in which all hyperedges have degree 2; hypergraph and graph are thus interconvertible. Currently, there are a number of methods for converting a hypergraph to a simple graph; the common ones are clique expansion, star expansion, and line expansion, shown in Figs. 2.12, 2.13, and 2.14, respectively.

Fig. 2.12 An example of transforming a hypergraph to a graph with clique expansion: in the hypergraph, \(e_1\) connects \(v_1\), \(v_2\), and \(v_5\); \(e_2\) connects \(v_1\) and \(v_3\); \(e_3\) connects \(v_3\), \(v_4\), and \(v_6\); after clique expansion, \(v_1\), \(v_2\), and \(v_5\) are pairwise connected, \(v_3\), \(v_4\), and \(v_6\) are pairwise connected, and \(v_1\) and \(v_3\) are connected

Fig. 2.13 An example of transforming a hypergraph to a graph with star expansion: the same hypergraph is converted to a graph in which virtual vertices created from \(e_1\), \(e_2\), and \(e_3\) connect \(v_1\) to \(v_6\)

Fig. 2.14 An example of transforming a hypergraph to a graph with line expansion: the same hypergraph is converted into a graph whose vertices are the incident vertex-hyperedge pairs of \(v_1\) to \(v_6\) and \(e_1\), \(e_2\), and \(e_3\)

(1) Clique Expansion

Figure 2.12 shows an example of transforming a hypergraph to a graph with clique expansion. The clique expansion algorithm constructs a graph \(\mathbb {G}^{x}\left (\mathbb {V}, \mathbb {E}^{x}\right )\) from the original hypergraph \(\mathbb {G}(\mathbb {V}, \mathbb {E})\) by replacing each hyperedge e with degree-2 edges connecting every pair (u, v) of vertices in the hyperedge [17]: \(\mathbb {E}^{x}=\{(u, v): u, v \in e, e \in \mathbb {E}\}\).

It is interesting to note that the vertices in hyperedge e form a clique in the graph \(\mathbb {G}^{x}\), which is exactly where the name comes from. \(\mathbb {G}^{x}\) preserves the vertex structure of \(\mathbb {G}\), so the edge weights should preserve, as far as possible, the high-order associations carried by the hyperedges. That is, for any two vertices u and v, the difference between the weight of the edge (u, v) in \(\mathbb {G}^{x}\) and the weight of each hyperedge connecting both u and v should be as small as possible. Thus, the following formula is used when assigning weights \(w^{x}(u, v)\) to the edges of \(\mathbb {G}^{x}\):

$$\displaystyle \begin{aligned} w^{x}(u, v)=\underset{w^{x}(u, v)}{\arg \min } \sum_{e \in \mathbb{E}: u, v \in e}\left(w^{x}(u, v)-w(e)\right)^{2} . \end{aligned} $$
(2.7)

Hence, clique expansion uses the discriminative model, where every edge in the clique of \(\mathbb {G}^{x}\) associated with hyperedge e has weight w(e). This criterion has the following minimizer:

$$\displaystyle \begin{aligned} w^{x}(u, v)=\mu \sum_{e \in \mathbb{E}: u, v \in e} w(e)=\mu \sum_{e} \mathbf{H}(u, e) \mathbf{H}(v, e) w(e) , \end{aligned} $$
(2.8)

where μ is a fixed scalar. Equivalently, from the viewpoint of edges, the weight between two vertices u and v is the sum of the weights of the hyperedges that contain both of them, as sketched below.
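In matrix form, Eq. (2.8) says that the off-diagonal entries of \(\mu \mathbf{H} \mathbf{W} \mathbf{H}^\top\) give the clique-expansion weights. A sketch, reusing the Fig. 2.12 structure with illustrative hyperedge weights:

```python
import numpy as np

H = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1],
              [0, 0, 1], [1, 0, 0], [0, 0, 1]])  # Fig. 2.12's hypergraph
w = np.array([0.3, 0.9, 0.5])                    # illustrative hyperedge weights
mu = 1.0                                         # fixed scalar of Eq. (2.8)

Wx = mu * (H @ np.diag(w) @ H.T)
np.fill_diagonal(Wx, 0)   # keep only the pairwise edge weights
# Wx[u, v] = mu * sum of w(e) over hyperedges containing both u and v
```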

(2) Star Expansion

Figure 2.13 shows an example of transforming a hypergraph to a graph with star expansion. By star expansion, a graph \(\mathbb {G}^{*}\left (\mathbb {V}^{*}, \mathbb {E}^{*}\right )\) can be constructed from hypergraph \(\mathbb {G}(\mathbb {V}, \mathbb {E})\) by regarding every hyperedge \(e \in \mathbb {E}\) as a new vertex, thus \(\mathbb {V}^{*}=\mathbb {V} \cup \mathbb {E}\) [17]. Each vertex in the hyperedge is connected to the new graph vertex e, i.e., \(\mathbb {E}^{*}=\{(u, e): u \in e, e \in \mathbb {E}\}\).

There are different types of vertices in the graph \(\mathbb {G}^{*}\), and each hyperedge in \(\mathbb {E}\) corresponds to a star in \(\mathbb {G}^{*}\). With star expansion, a scaled hyperedge weight \(w^{*}(u, e)\) is assigned to each graph edge corresponding to a hyperedge in \(\mathbb {E}\) as follows:

$$\displaystyle \begin{aligned} w^{*}(u, e)=w(e) / \delta(e) . \end{aligned} $$
(2.9)

For each vertex representing a hyperedge, the weights of the edges connected to it are all equal: the hyperedge weight is divided evenly into δ(e) parts, as in the sketch below.
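A sketch of star expansion under these definitions, where the new hyperedge-vertices are indexed after the original vertices (an arbitrary but convenient convention):

```python
import numpy as np

H = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1],
              [0, 0, 1], [1, 0, 0], [0, 0, 1]])
w = np.ones(3)              # hyperedge weights (illustrative)
delta = H.sum(axis=0)       # hyperedge degrees

n, m = H.shape
star_edges = {}
for e in range(m):
    for u in range(n):
        if H[u, e]:
            # hyperedge e becomes the new vertex n + e; Eq. (2.9) weight
            star_edges[(u, n + e)] = w[e] / delta[e]
```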

(3) Line Expansion

Figure 2.14 shows an example of transforming a hypergraph to a graph with line expansion. The line expansion algorithm constructs the vertices of the graph \(\mathbb {G}^{l}=\left (\mathbb {V}^{l}, \mathbb {E}^{l}\right )\) by restructuring the incidence information of the hypergraph \(\mathbb {G}=(\mathbb {V}, \mathbb {E})\). Each line vertex (u, e) in \(\mathbb {G}^{l}\) can be viewed as a vertex in the context of a hyperedge, or as a hyperedge in the context of a vertex [18]. A line vertex is created for each incident vertex-hyperedge pair, i.e., \(\mathbb {V}^{l}=\{(u, e): u \in e, u \in \mathbb {V}, e \in \mathbb {E}\}\). This means that \(\left |\mathbb {V}^{l}\right |=\sum _{e} \delta (e)\).

Two line vertices in \(\mathbb {G}^{l}\) are defined as neighbors if they contain the same vertex or the same hyperedge. Both kinds of connection are considered equally important, so \(W^{l} = \operatorname{diag}(1, \ldots, 1)\) with \(|W^{l}| = |\mathbb {V}^{l}| \times |\mathbb {V}^{l}|\). Under this construction, the mapping between a hypergraph \(\mathbb {G}\) and its line expansion \(\mathbb {G}^{l}\) is bijective. A sketch of the construction follows.
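A sketch of line expansion under the stated neighborhood rule, enumerating the line vertices and building the unit-weight adjacency:

```python
import numpy as np

H = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1],
              [0, 0, 1], [1, 0, 0], [0, 0, 1]])
n, m = H.shape

# One line vertex per incident (vertex, hyperedge) pair
line_vertices = [(u, e) for e in range(m) for u in range(n) if H[u, e]]
L = len(line_vertices)                  # = sum of hyperedge degrees

A_line = np.zeros((L, L))
for i, (u, e) in enumerate(line_vertices):
    for j, (x, f) in enumerate(line_vertices):
        if i != j and (u == x or e == f):   # shared vertex or shared hyperedge
            A_line[i, j] = 1                # unit weights: W^l = diag(1, ..., 1)
```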

2.3.4 Random Walks on Graph and Hypergraph

Random walks propagate the information stored in the vertices along the links among the vertices of a graph or hypergraph; these links constitute paths between vertices. In a hypergraph, each vertex aggregates the messages of its neighboring vertices to update itself, based on the "path" between the central vertex and each vertex in its neighborhood. A path between vertices \(v_1\) and \(v_k\) in a hypergraph is defined as a sequence called a hyperpath [19]:

$$\displaystyle \begin{aligned} P(v_1, v_k) = (v_1,e_1,v_2,e_2,\dots,v_{k-1},e_{k-1},v_k), \end{aligned} $$
(2.10)

where \(v_j\) and \(v_{j+1}\) both belong to the vertex subset described by hyperedge \(e_j\); that is, in a hyperpath, any two neighboring vertices are separated by a hyperedge. In a hypergraph, messages between vertices are propagated through hyperedges, which carry higher-order relationships than the edges in graphs. For message propagation from vertex to hyperedge and from hyperedge to hyperedge along a hyperpath, it is first necessary to extend the neighbor relation among vertices to the Inter-Neighbor Relation N over the vertex set \(\mathbb {V}\) and the hyperedge set \(\mathbb {E}\).

Definition 1

The Inter-Neighbor Relation \(N \subset \mathbb {V} \times \mathbb {E}\) on a hypergraph \(\mathbb {G}=(\mathbb {V}, \mathbb {E}, \mathbf {W})\) with incidence matrix \(\mathbf {H} \in \{0, 1 \}^{|\mathbb {V}| \times |\mathbb {E}|}\) is defined as

$$\displaystyle \begin{aligned} N = \{ (v, e) \mid \mathbf{H}(v, e) = 1, \, v\in \mathbb{V} \text{ and } e\in \mathbb{E} \, \} . \end{aligned} $$
(2.11)

The hyperedge inter-neighbor set \(N_e(v)\) of vertex v and the vertex inter-neighbor set \(N_v(e)\) of hyperedge e are defined based on the Inter-Neighbor Relation; a sketch of both follows the definitions.

Definition 2

The hyperedge inter-neighbor set of vertex \(v \in \mathbb {V}\) is defined as

$$\displaystyle \begin{aligned} N_e(v) = \{ e \mid vNe, \, v \in \mathbb{V} \text{ and } e \in \mathbb{E} \, \} . \end{aligned} $$
(2.12)

Definition 3

The vertex inter-neighbor set of hyperedge \(e \in \mathbb {E}\) is defined as

$$\displaystyle \begin{aligned} N_v(e) = \{ v \mid vNe, \, v \in \mathbb{V} \text{ and } e \in \mathbb{E} \, \} . \end{aligned} $$
(2.13)
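These two sets can be read directly from the rows and columns of the incidence matrix H. A sketch using the Fig. 2.1 structure (0-based indices):

```python
import numpy as np

H = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1],
              [0, 0, 1], [1, 0, 0], [0, 0, 1]])

def N_e(v):  # hyperedges incident to vertex v, Eq. (2.12)
    return set(np.flatnonzero(H[v]))

def N_v(e):  # vertices incident to hyperedge e, Eq. (2.13)
    return set(np.flatnonzero(H[:, e]))

print(N_e(2))  # {1, 2}: v3 lies in e2 and e3
print(N_v(0))  # {0, 1, 4}: e1 contains v1, v2, v5
```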

In contrast to graph learning, hypergraph learning correlates data at a higher order, expanding the correlation model and often improving performance in practice. This, however, is only the surface of the relationship between graph and hypergraph. Next, we delve deeper into this relationship from the viewpoint of mathematical derivation, with the help of random walks [20] and Markov chains [21], and provide a mathematical comparison between hypergraph and graph. The proofs conclude that, from the random walk perspective, a hypergraph with edge-independent vertex weights is equivalent to a weighted graph, while a hypergraph with edge-dependent vertex weights cannot be reduced to a weighted graph.

Two types of hypergraphs can be constructed to accurately represent real-world correlations: hypergraphs with edge-independent vertex weights and hypergraphs with edge-dependent vertex weights. With the binary hypergraph incidence matrix \(\mathbf {H}\in \{0, 1\}^{|\mathbb {V}|\times |\mathbb {E}|}\), where the vertices in each hyperedge share the same weight, a hypergraph with edge-independent vertex weights (\(\mathbb {G}_{\text{in}}=\{\mathbb {V}, \mathbb {E}, \mathbf {W} \}\)) can model beyond-pairwise correlations. Alternatively, the weighted hypergraph incidence matrix \(\mathbf {R} \in \mathbb {R}^{|\mathbb {V}|\times |\mathbb {E}|}\) is used to model the variable correlation intensity within each hyperedge for the hypergraph with edge-dependent vertex weights (\(\mathbb {G}_{\text{de}} = \{\mathbb {V}, \mathbb {E}, \mathbf {W}, \gamma \}\)). For a hyperedge e that includes vertex v, \(\gamma_e(v)\) denotes the connection intensity and w(e) the weight of hyperedge e.

In a hypergraph with edge-independent vertex weights, the definitions of the binary incidence matrix H, the vertex degree d(v), and the hyperedge degree δ(e) are the same as in Sect. 2.2.1. In a hypergraph with edge-dependent vertex weights, d(v) and δ(e) are defined as follows:

$$\displaystyle \begin{aligned} \left\{ \begin{array}{l} d(v) = \sum_{\beta \in \mathbb{N}_e(v)}w(\beta) \\ \delta(e) = \sum_{\alpha \in \mathbb{N}_v(e)}\gamma_e(\alpha), \end{array} \right. \end{aligned} $$
(2.14)

where \(\mathbb {N}_e(\cdot )\) and \(\mathbb {N}_v(\cdot )\) are defined in Eqs. (2.12) and (2.13), respectively.

Then, we introduce random walks and the Markov chain on hypergraphs. First, we define the random walk on a hypergraph following [20–23]. At time t, a random walker at vertex \(v_t\) does the following:

  • Pick a hyperedge e containing vertex \(v_t = v\), with probability \(p_{v \to e}\).

  • Pick a vertex u from e, with probability \(p_{e \to u}\).

  • Move to vertex \(v_{t+1} = u\), at time t + 1.

We then define the transition probability \(p_{v,u}\) of the corresponding Markov chain on \(\mathbb {V}\) as \(p_{v, u} = \sum _{e \in \mathbb {N}_e(v, u)} p_{v \to e} p_{e \to u} \), where \(\mathbb {N}_e(v, u) = \mathbb {N}_e(v) \cap \mathbb {N}_e(u)\) denotes the set of hyperedges containing both vertices v and u. In a hypergraph with edge-independent vertex weights, we have \(p_{v \to e} = w(e)/d(v)\) and \(p_{e \to u} = 1/\delta(e)\), so the transition probability can be written as \(p_{v, u} = \sum _{\beta \in \mathbb {N}_e(v, u)} \frac {w(\beta )}{d(v)} \cdot \frac {1}{\delta (\beta )} \). In a hypergraph with edge-dependent vertex weights, we have \(p_{v \to e} = w(e)/d(v)\) and \(p_{e \to u} = \gamma_e(u)/\delta(e)\), so the transition probability can be written as \(p_{v, u} = \sum _{\beta \in \mathbb {N}_e(v, u)} \frac {w(\beta )}{d(v)} \cdot \frac {\gamma _\beta (u)}{\delta (\beta )} \). Both transition matrices are sketched below.
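Rearranging the sums gives the matrix form \(\mathbf{P} = \mathbf{D}_v^{-1}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^\top\) for the edge-independent walk; the edge-dependent walk replaces the last factor with the weighted incidence \(\mathbf{R}^\top\). A sketch with illustrative weights and made-up intensities:

```python
import numpy as np

H = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1],
              [0, 0, 1], [1, 0, 0], [0, 0, 1]], dtype=float)
w = np.array([0.3, 0.9, 0.5])          # hyperedge weights (illustrative)
W = np.diag(w)

D_v_inv = np.diag(1 / (H @ w))         # d(v) = sum of w(e) over incident e

# Edge-independent: p_{e -> u} = 1 / delta(e), delta(e) = |e|
D_e_inv = np.diag(1 / H.sum(axis=0))
P_in = D_v_inv @ H @ W @ D_e_inv @ H.T

# Edge-dependent: p_{e -> u} = gamma_e(u) / delta(e), delta(e) = sum of gamma
R = H * np.random.rand(*H.shape)       # hypothetical intensities gamma_e(v)
D_e_dep_inv = np.diag(1 / R.sum(axis=0))
P_de = D_v_inv @ H @ W @ D_e_dep_inv @ R.T

# Rows of both matrices are probability distributions
assert np.allclose(P_in.sum(axis=1), 1) and np.allclose(P_de.sum(axis=1), 1)
```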

The following lemmas and definitions are used to compare the graph and the two types of hypergraphs [21].

Definition 4

Let M be a Markov chain with state space S and transition probabilities \(p_{x,y}\) for x, y ∈ S. M is said to be reversible if there exists a probability distribution π over S such that \(\pi_x p_{x, y} = \pi_y p_{y, x}\).

Lemma 5

Let M be an irreducible Markov chain with finite state space S and transition probabilities \(p_{x,y}\) for x, y ∈ S. M is reversible if and only if there exists a weighted undirected graph \(\mathbb {G}\) with vertex set S such that random walks on \(\mathbb {G}\) and M are equivalent.

Proof of Lemma 5

Note that π indicates the stationary distribution [21, 24] of a given edge-independent/edge-dependent hypergraph. The transition probability \(p_{v,u}\) between vertices in a hypergraph with edge-independent vertex weights is defined as

$$\displaystyle \begin{aligned} p_{v, u} = \sum_{\beta \in \mathbb{N}_e(v, u)} \left( \frac{w(\beta)}{d(v)} \right) \left( \frac{1}{\delta(\beta)} \right) . \end{aligned} $$
(2.15)

Moreover, the transition probability \(p_{v,u}\) between vertices in a hypergraph with edge-dependent vertex weights is defined as

$$\displaystyle \begin{aligned} p_{v, u} = \sum_{\beta \in \mathbb{N}_e(v, u)} \left( \frac{w(\beta)}{d(v)} \right) \left( \frac{\gamma_\beta(u)}{\delta(\beta)} \right) . \end{aligned} $$
(2.16)

“⇒”: Suppose M is reversible with transition probabilities \(p_{x,y}\). We construct a graph \(\mathbb {G}\) with vertex set S and edge weights \(w_{x,y} = \pi_x p_{x,y}\). Because M is irreducible, \(\pi_x \neq 0\) and \(p_{x,y} \neq 0\) for all states x and y; thus the edge weights \(w_{x,y} \neq 0\) and the graph \(\mathbb {G}\) is connected. By the reversibility of M, \(w_{x,y} = \pi_x p_{x,y} = \pi_y p_{y,x} = w_{y,x}\), so the constructed graph \(\mathbb {G}\) is undirected. A random walk on \(\mathbb {G}\) moving from x to y in one time step satisfies the following:

$$\displaystyle \begin{aligned} \frac{w_{x, y}}{\sum_{z \in S} w_{x, z}} = \frac{\pi_x p_{x, y}}{\sum_{z\in S} \pi_x p_{x,z}} = \frac{p_{x, y}}{\sum_{z\in S}p_{x, z}} = p_{x, y}, \end{aligned} $$
(2.17)

since \(\sum_{z \in S} p_{x,z} = 1\). Thus, if M is reversible, the stated claim holds.

“⇐”: Random walks on an undirected graph are always reversible.

Definition 6

A Markov chain is reversible if and only if its transition probability satisfies

$$\displaystyle \begin{aligned} p_{v_1, v_2}p_{v_2, v_3} \cdots p_{v_n, v_1} = p_{v_1, v_n}p_{v_n, v_{n-1}} \cdots p_{v_2, v_1} \end{aligned} $$
(2.18)

for any finite sequence of states \(v_1, v_2, \cdots, v_n \in S\). This condition is also known as Kolmogorov’s criterion. For more detailed proofs, please refer to [25].

Theorem 1

Let \(\mathbb {G}_{\mathit{\text{in}}} = \{\mathbb {V}, \mathbb {E}, \mathbf {W} \}\) be a hypergraph with edge-independent vertex weights. Then there exists a weighted undirected graph \(\mathbb {G}\) such that a random walk on \(\mathbb {G}\) is equivalent to a random walk on \(\mathbb {G}_{\mathit{\text{in}}}\).

Proof of Theorem 1

The probability \(p_{v,u}\) of \(\mathbb {G}_{\text{in}}\) is defined in Eq. (2.15). By Definition 6, the following equation can be deduced:

$$\displaystyle \begin{aligned} &p_{v_1, v_2}p_{v_2, v_3}\cdots p_{v_n, v_1} \\ &\quad = \sum_{\beta \in \mathbb{N}_e(v_1, v_2)} \left( \frac{w(\beta)}{d(v_1)} \cdot \frac{1}{\delta(\beta)} \right) \cdots \sum_{\beta \in \mathbb{N}_e(v_n, v_1)} \left( \frac{w(\beta)}{d(v_n)} \cdot \frac{1}{\delta(\beta)} \right) \\ &\quad = \left(\frac{1}{d(v_1)} \sum_{\beta \in \mathbb{N}_e(v_1, v_2)} \frac{w(\beta)}{\delta(\beta)} \right) \cdots \left(\frac{1}{d(v_n)} \sum_{\beta \in \mathbb{N}_e(v_n, v_1)} \frac{w(\beta)}{\delta(\beta)} \right) \\ &\quad = \frac{1}{d(v_2)} \sum_{\beta \in \mathbb{N}_e(v_1, v_2)} \frac{w(\beta)}{\delta(\beta)} \cdots \frac{1}{d(v_1)} \sum_{\beta \in \mathbb{N}_e(v_n, v_1)} \frac{w(\beta)}{\delta(\beta)} . \end{aligned} $$
(2.19)

For any \(v_i\) and \(v_j\), \(\sum _{\beta \in \mathbb {N}_e(v_i, v_j)} \frac {w(\beta )}{\delta (\beta )} = \sum _{\beta \in \mathbb {N}_e(v_j, v_i)} \frac {w(\beta )}{\delta (\beta )}\). Thus, the reversibility can be proven by

$$\displaystyle \begin{aligned} &p_{v_1, v_2}p_{v_2, v_3}\cdots p_{v_n, v_1} \\ &\quad = \frac{1}{d(v_2)} \sum_{\beta \in \mathbb{N}_e(v_2, v_1)} \frac{w(\beta)}{\delta(\beta)} \cdots \frac{1}{d(v_1)} \sum_{\beta \in \mathbb{N}_e(v_1, v_n)} \frac{w(\beta)}{\delta(\beta)} \\ &\quad = p_{v_2, v_1}p_{v_3, v_2}\cdots p_{v_1, v_n} \\ &\quad = p_{v_1, v_n}p_{v_n, v_{n-1}}\cdots p_{v_2, v_1} . \end{aligned} $$
(2.20)

We say that a random walk on \(\mathbb {G}_{\text{in}}\) is reversible. Furthermore, by Lemma 5, a random walk on \(\mathbb {G}_{\text{in}}\) is equivalent to a random walk on a weighted undirected graph \(\mathbb {G}\).

The proof of Theorem 1 can be summarized as follows:

  1. A random walk on \(\mathbb {G}_{\text{in}}\) is equivalent to a random walk on a reversible Markov chain (according to Definition 6).

  2. A random walk on a reversible Markov chain is equivalent to a random walk on a weighted undirected graph \(\mathbb {G}\) (according to Lemma 5).

Theorem 2

Let \(\mathbb {G}_{\mathit{\text{de}}} = \{\mathbb {V}, \mathbb {E}, \mathbf {W}, \gamma \}\) be a hypergraph with edge-dependent vertex weights. Then there does not exist a weighted undirected graph \(\mathbb {G}\) such that a random walk on \(\mathbb {G}\) is equivalent to a random walk on \(\mathbb {G}_{\mathit{\text{de}}}\).

Proof of Theorem 2

Figure 2.15 provides an example showing that a random walk on \(\mathbb {G}_{\text{de}}\) is not equivalent to a random walk on a reversible Markov chain. By the second step of Theorem 1's proof, Theorem 2 therefore holds.

Fig. 2.15 An example of two types of random walks on the hypergraph with edge-independent vertex weights and the hypergraph with edge-dependent vertex weights: hypergraphs (a) and (b) connect \(e_0\), \(e_1\), and \(e_2\) to \(v_0\), \(v_1\), \(v_2\), and \(v_3\); the incidence matrix of (a) is binary, while that of (b) holds decimal intensities, and the cumulative transition probabilities of reverse cycles agree in (a) but differ in (b). This figure is from [26]

A simple illustration is given in Fig. 2.15 to make this easier to understand. The two hypergraphs share exactly the same connection structure but differ in the intensities of the connections. For the two types of hypergraphs, the transition probabilities \(p_{v,u}\) can be computed accordingly. Two random walks from vertex \(v_0\) are then conducted: “\(v_0 \to v_1 \to v_2 \to v_0\)” and “\(v_0 \to v_2 \to v_1 \to v_0\).” Having obtained \(p_{v_0, v_1}\cdot p_{v_1, v_2}\cdot p_{v_2, v_0}\) and \(p_{v_0, v_2}\cdot p_{v_2, v_1}\cdot p_{v_1, v_0}\) for the two paths, the cumulative transition probabilities can be calculated. The hypergraph with edge-independent vertex weights is reversible according to Theorem 1 and Lemma 5, so the two reverse paths yield the same cumulative transition probability. In contrast, in the hypergraph with edge-dependent vertex weights, the two reverse paths yield two different cumulative transition probabilities, as checked numerically in the sketch below.
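The sketch below reproduces this comparison with Kolmogorov's criterion on a made-up four-vertex, three-hyperedge structure (the exact intensities of Fig. 2.15 are not given in the text, so illustrative values are used):

```python
import numpy as np

def transition(H, R, w):
    """Transition matrix P = D_v^{-1} H W D_e^{-1} R^T (R = H if independent)."""
    d = H @ w                    # vertex degrees
    delta = R.sum(axis=0)        # hyperedge degrees
    return np.diag(1 / d) @ H @ np.diag(w) @ np.diag(1 / delta) @ R.T

# e0 = {v0, v1, v3}, e1 = {v1, v2, v3}, e2 = {v0, v2, v3}
H = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]], dtype=float)
w = np.ones(3)

P_in = transition(H, H, w)       # edge-independent: R = H
R = H * np.array([[0.9, 1, 0.2], [0.4, 0.8, 1],
                  [1, 0.3, 0.7], [0.5, 0.6, 0.1]])  # illustrative gamma
P_de = transition(H, R, w)       # edge-dependent intensities

def cycle(P, path):              # cumulative transition probability of a cycle
    return np.prod([P[a, b] for a, b in zip(path, path[1:])])

fwd, bwd = [0, 1, 2, 0], [0, 2, 1, 0]
print(np.isclose(cycle(P_in, fwd), cycle(P_in, bwd)))  # True  (Theorem 1)
print(np.isclose(cycle(P_de, fwd), cycle(P_de, bwd)))  # False (Theorem 2, generically)
```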

2.4 Summary

In this chapter, we present the mathematical definitions of the foundations of hypergraphs and their interpretation. We also show the representation of the directed hypergraph, which, different from the undirected hypergraph, represents directional relationships between the vertices within a hyperedge. Finally, we discuss the relationship between graph and hypergraph from the perspectives of conversion and expressive ability. The most intuitive differences between graph and hypergraph lie in low-order versus high-order representations and the adjacency matrix versus the incidence matrix. Clique expansion, star expansion, and line expansion are methods for converting a hypergraph into a simple graph. We also examine the relationship between graph and hypergraph from the random walk view: regarding the information propagation process on graph/hypergraph, a hypergraph with edge-independent vertex weights is equivalent to a weighted graph, whereas a hypergraph with edge-dependent vertex weights cannot be reduced to one.