Where do graphs fit into the lecture?
What are basic elements of a graph?
nodes
directed edges
undirected edges
weighted edges
unweighted edges
Formal description of graphs?
G = (N, E): a set of nodes N and a set of edges E connecting them
How can one get from open street map to a routable graph?
OSM provides geometry and speed limits
-> have nodes, edges and speeds on edges
augmented graph -> calculate edge weights like distances and travel times (based on speed limits…)
compressed graph
e.g. combine paths where a node with only two edges lies in between… (see the sketch below)
=> reduces complexity…
include turn restrictions and turn penalties
turn restrictions derived from OSM tags
turn penalties might be introduced for time-optimal routing
=> basically, split each undirected edge into two directed edges, one per direction
-> assign a different time to each… (to e.g. model the time needed to turn…)
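A minimal sketch of the degree-2 contraction idea, assuming an undirected graph stored as a dict of dicts {node: {neighbor: weight}}; the representation and the function name are illustrative, not from the lecture:

```python
# Graph compression sketch: contract intermediate nodes that have exactly
# two neighbors, merging their two edges into one with summed weight.
def compress(graph):
    changed = True
    while changed:
        changed = False
        for node in list(graph):
            neighbors = list(graph[node])
            if len(neighbors) != 2:
                continue
            a, b = neighbors
            if b in graph[a]:
                continue  # keep it simple: don't overwrite an existing edge
            w = graph[a][node] + graph[node][b]
            # bypass the intermediate node with a single merged edge
            del graph[a][node], graph[b][node], graph[node]
            graph[a][b] = graph[b][a] = w
            changed = True
    return graph
```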
What assumptions did we make in the lecture?
only non-negative edge weights
edges are undirected
nodes are given by their coordinates in two-dimensional space
edge weights defined by a distance function -> the Euclidean distance is used… (based on the node coordinates…)
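Written out, the assumed edge-weight function is the standard Euclidean distance over the node coordinates:

```latex
w(u,v) = \sqrt{(x_u - x_v)^2 + (y_u - y_v)^2}
```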
What two approaches to find a path can be used? With which algorithms?
find a path
depth first
breadth first
(best first)
find optimal path
Dijkstra
A*
What is the workflow of routing?
generate routable graph
determine origin / destination
find path
How to construct search trees from graphs?
choose a root (e.g. origin in path…)
create children from one-step extensions in the graph (e.g. one-step neighbors)
=> trace out all possible paths until no further extension is possible…
When is a search algorithm complete?
if it is guaranteed to find an existing solution in a finite amount of time…
=> true for all algorithms covered in the lecture
What rules for creating search trees exist?
paths never bite their own tail (no loops)
ties resolved lexically
How does depth first work?
begin with a one-element queue holding the zero-length path (root only)
until the first path in the queue terminates at the goal node or the queue is empty:
remove the first path from the queue; create new paths by extending it to all neighbors of its terminal node
reject all new paths with loops
add the new paths, if any, to the front of the queue
if the goal node is found, announce success; otherwise announce failure
=> basically: go as deep as possible, then backtrack until a previous node has another unexplored child… (see the sketch below)
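A minimal sketch of this queue-of-paths procedure, assuming an adjacency dict {node: [neighbors]} with comparable node labels; the representation and names are mine, not from the lecture:

```python
from collections import deque

def depth_first(graph, root, goal):
    queue = deque([[root]])                     # one-element queue, zero-length path
    while queue:
        path = queue.popleft()                  # remove first path from queue
        node = path[-1]
        if node == goal:
            return path                         # announce success
        extensions = [path + [n] for n in sorted(graph[node])
                      if n not in path]         # reject loops, ties resolved lexically
        queue.extendleft(reversed(extensions))  # add new paths to the FRONT
    return None                                 # announce failure
```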
What do depth first trees look like?
narrow and deep
How does breadth first work?
Form a one-element queue consisting of a zero-length path that contains only the root node
Until the first path in the queue terminates at the goal node or the queue is empty,
Remove the first path from the queue; create new paths by extending the first path to all the neighbors of the terminal node
Reject all new paths with loops
Add the new paths, if any, to the back of the queue
If the goal node is found, announce success; otherwise announce failure
=> basically, extend nodes and add the child paths to the end of the queue
=> level by level (see the sketch below)
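The same skeleton as depth_first above works here; the only change is that new paths go to the back of the queue:

```python
from collections import deque

def breadth_first(graph, root, goal):
    queue = deque([[root]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        queue.extend(path + [n] for n in sorted(graph[node])
                     if n not in path)  # back of the queue: breadth first
    return None
```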
What do breadth first trees look like?
wide and shallow
How does Dijkstra work?
greedy algorithm
-> always extend the path with the lowest cumulative cost…
-> terminate once the goal node is reached and no cheaper path is available anymore
If two or more paths reach a common node, delete all those paths except the one that reaches the common node with the minimum cost.
Add the new paths, if any, to the queue
Sort all paths by their accumulated costs (see the sketch below)
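A heap-based sketch that is equivalent to the sort-the-queue description above, assuming a weighted adjacency dict {node: {neighbor: cost}}; names are illustrative:

```python
import heapq

def dijkstra(graph, origin, goal):
    queue = [(0, [origin])]        # (accumulated cost, path), kept sorted by cost
    best = {}                      # cheapest known cost per node
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == goal:
            return cost, path      # no cheaper path can appear later
        if node in best and best[node] <= cost:
            continue               # a cheaper path already reached this node
        best[node] = cost
        for neighbor, w in graph[node].items():
            if neighbor not in path:
                heapq.heappush(queue, (cost + w, path + [neighbor]))
    return None
```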
What can graphs be used for (as general purpose data structure)?
store and organize data with strong relationships between data objects
restriction that nodes have fixed coordinates is no longer valid
graphs can be used to link data together and, in combination with specific algorithms, can be referred to as knowledge graphs
Diagram “what is knowledge”
What are the “levels” of graphs w.r.t. knowledge?
(un)directed graph
labelled graph
property graph
How can we label a graph?
add labels to the edges that indicate relations
-> compared with SQL: relations in graphs are more flexible…
How can we then generate a knowledge graph?
make property graphs
=> nodes are entities with types and attributes…
-> e.g. node santiago:
type: capital city
attributes:
latitude
longitude
-> edges also have types and attributes
-> e.g. edge LA380
type: Flight
attributes:
company
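The card's example as plain Python dicts; the representation, the destination city and the attribute values are illustrative assumptions (real systems would use a graph database):

```python
# Property graph sketch: nodes and edges both carry a type and attributes.
# The coordinate, destination and company values are made-up placeholders.
nodes = {
    "Santiago": {
        "type": "Capital City",
        "attributes": {"latitude": -33.45, "longitude": -70.67},
    },
}
edges = {
    "LA380": {
        "type": "Flight",
        "from": "Santiago",
        "to": "Arica",                        # hypothetical destination
        "attributes": {"company": "LATAM"},   # hypothetical value
    },
}
```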
What does graph embedding do? What is its goal? How is its metric defined?
map nodes to a lower-dimensional space
-> find an embedding so that “similar” nodes in the graph are close to each other in the embedding space
-> usage of the graph structure to define similarity
What is the approach to graph embedding?
find a suitable encoding from node to embedding space
enc(u) = z_u
define a similarity function on the original network
similarity(u,v) ≈ z_v ⋅ z_u (dot product in the embedding space)
optimize the parameters of the encoder so that the embedding reflects the similarity function
What is shallow embedding?
lookup table with one column per node; the number of rows equals the dimensionality of the embedding space
=> like a one-hot encoding, but with more than one dimension per element… (see the sketch below)
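A minimal sketch of the lookup idea: encoding a node is a matrix-vector product of the embedding matrix with the node's one-hot vector (sizes are illustrative):

```python
import numpy as np

num_nodes, dim = 5, 2
Z = np.random.rand(dim, num_nodes)  # one column per node, one row per dimension

def enc(u):
    one_hot = np.zeros(num_nodes)
    one_hot[u] = 1.0
    return Z @ one_hot              # equivalent to the lookup Z[:, u]

z_u = enc(3)                        # embedding of node 3
```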
What are the methods to define similarity?
adjacency
-> usage of the edge weights in the graph
SkipGram
usage of pure graph structure
How to calculate adjacency based similarity?
create adjacency matrix A (nxn)
-> holds neighborhood information of each node
=> values are the edge weights between nodes…
consider whether edges are unidirectional or bidirectional (A is symmetric only in the undirected case)…
How to calculate the loss in adjacency based similarity?
loss is the sum over all embedded node pairs of
the squared difference between
the dot product of the embedding pair
and the pair's entry in the adjacency matrix (see the formula below)
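Reconstructed as a formula (my reading of the card, with embedding vectors z and adjacency matrix A):

```latex
\mathcal{L} = \sum_{(u,v) \in N \times N} \left( \mathbf{z}_u^{\top}\mathbf{z}_v - A_{u,v} \right)^2
```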
What is the optimization goal in adjacency based embedding?
Goal:
find the embedding matrix that minimizes the loss function
-> … called A on the slides, but not the adjacency matrix (unlucky naming scheme…)
Method:
numerical optimization like (stochastic) gradient descent (see the sketch below)
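A minimal gradient-descent sketch for this loss; the toy graph, embedding dimension and learning rate are illustrative assumptions:

```python
import numpy as np

A = np.array([[0., 1., 0.],
              [1., 0., 2.],
              [0., 2., 0.]])          # weighted adjacency matrix, 3 nodes
dim, lr = 2, 0.01
Z = np.random.rand(dim, A.shape[0])   # embedding matrix, one column per node

for _ in range(500):
    D = Z.T @ Z - A                   # pairwise dot products vs. adjacency
    Z -= lr * 2 * Z @ (D + D.T)       # gradient of the squared loss w.r.t. Z
```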
What are drawbacks of adjacency-based similarity embedding?
runtime and number of parameters
-> embedding matrix can become quite big… -> computationally heavy…
considers only direct connections
-> e.g. two nodes that share all their neighbors but are not directly connected are treated as completely dissimilar…
What is the similarity definition of SkipGram?
chance of the nodes u and v both appearing on a random walk over the graph…
What is the approach to SkipGram based embedding?
use a random walk algorithm to generate a set of node sequences (fixed length)
estimate the probability of visiting node u on a random walk starting from node v
optimize the embedding to encode the statistics generated in step 1
First step random walk?
choose a fixed path length (e.g. 4)
for each node -> do the random walk n times, e.g. with length 4
-> yields n paths of length 4
=> collect the multiset N_R(u) of all nodes visited by the random walks starting from node u (repeats allowed; see the sketch below)
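A sketch of the walk generation, assuming an adjacency dict {node: [neighbors]} in which every node has at least one neighbor; parameter values are illustrative:

```python
import random

def random_walks(graph, n=10, length=4):
    N_R = {u: [] for u in graph}      # multiset of visited nodes per start node
    for u in graph:
        for _ in range(n):
            node = u
            for _ in range(length):
                node = random.choice(graph[node])  # uniform random step
                N_R[u].append(node)
    return N_R
```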
Second step random walk?
calculate the loss (see the formula below)
-> outer sum over all nodes
-> inner sum over the nodes in the multiset of that node (nodes visited by random walks from that node)
-> each term is a fraction of exponentials
-> numerator => e^(similarity of start node and visited node)
-> denominator => sum of e^(similarity of start node and n), over all nodes n in the graph
=> the fraction yields, more or less, the predicted probability of visiting that node…
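Reconstructed as a formula (the standard random-walk softmax loss this card appears to describe; “distance” on the card corresponds to the dot-product similarity of the embeddings):

```latex
\mathcal{L} = \sum_{u \in N} \sum_{v \in N_R(u)} -\log \frac{\exp(\mathbf{z}_u^{\top} \mathbf{z}_v)}{\sum_{n \in N} \exp(\mathbf{z}_u^{\top} \mathbf{z}_n)}
```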
What is the idea of SkipGram?
train a simple NN with one hidden layer
=> given a node as input -> the network outputs, for every other node, the probability of appearing together with it on a random walk
=> basically, train the neural network so that the output is a
probability vector with dimensionality corresponding to the number of nodes
-> containing the pairwise probabilities that we visit each node from the initial node…
=> use the hidden-layer weight vectors as the node embeddings…
What is needed for SkipGram?
do lots and lots of random walks
-> to generate the training probabilities…
What is the skipgram NN architecture?
input:
vector of length |N| -> all zeros except at the position of the current node (i.e. one-hot encoded)
hidden layer:
number of neurons equals the number of dimensions of the embedding space, no activation function
does not have to be |N| -> we can decide what dimensionality our embedding space should have…
output layer:
probability of each node z appearing together with the input node in a random walk sequence
use softmax (see the sketch below)
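A numpy sketch of this architecture's forward pass; all sizes are illustrative:

```python
import numpy as np

num_nodes, dim = 6, 3
W_in = np.random.rand(num_nodes, dim)    # input -> hidden weights (the embeddings)
W_out = np.random.rand(dim, num_nodes)   # hidden -> output weights

def forward(u):
    x = np.zeros(num_nodes)
    x[u] = 1.0                           # one-hot input for node u
    h = x @ W_in                         # hidden layer, no activation (= W_in[u])
    scores = h @ W_out
    return np.exp(scores) / np.exp(scores).sum()  # softmax over all nodes

probs = forward(2)     # P(each node co-occurs with node 2 on a random walk)
embedding = W_in[2]    # node 2's row of the lookup table
```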
How is the lookup table structured?
rows: nodes
columns: embedding dimensions (i.e. the hidden-layer neuron weights in SkipGram)
What elements are in the graph neural network model?
G = (N,E) -> Graph G
consists of N nodes
and E edges
ne(n)
set of neighbors of node n
co(n)
set of edges connected to n
l_n, l_(n1, n2)
label/properties of node n or the edge n1-n2
What is the general approach to Graph NN?
graph used as model architecture
model trained on graph
output represented in the graph as additional property per node
What is the core idea of graph NN?
propagate the node information over the network
to combine data features with the logical structure and data relationships
in addition to the label/property vector, each node gets a hidden state vector h assigned
h dependent on time step t
h can be initialized with the feature vector for t=0
-> update each state incrementally by considering the features of adjacent neighbors (passed through an activation function…) with learnable weights…
How is the update step in the hidden layer performed in graph convolutional neural networks?
state at time step t+1
is the activation function of
adjacency matrix
old hidden state
and weight matrix
What was not yet considered w.r.t. the adjacency matrix?
we need to include self-loops (A + I) so each node keeps using its own hidden state -> otherwise it basically plays no role anymore after many iterations
What has to be considered w.r.t. the size of the hidden state?
we should normalize the output (new hidden state)
-> otherwise, due to the summation, it will become very large…
How do we scale the hidden state updates we consider?
normalize each neighbor's hidden state by 1/(number of own neighbors)
or:
1/sqrt(number of own neighbors × number of neighbors of that neighbor) (see the matrix form below)
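In matrix notation (my reconstruction; D is the diagonal degree matrix):

```latex
\hat{A} = D^{-1} A \ \text{(mean over neighbors)} \qquad \text{or} \qquad \hat{A} = D^{-1/2} A D^{-1/2}, \quad \hat{A}_{uv} = \frac{A_{uv}}{\sqrt{\deg(u)\,\deg(v)}}
```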
What are the (mathematical) steps of GCN?
input graph
-> propagation (h(t+1) = A h(t))
-> linear weights (h(t+1) = h(t+1) W)
-> nonlinear activation function
h(t+1) = activation function(A h(t) W)
=> use as output or do an additional iteration… (see the sketch below)
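A numpy sketch of one such update with self-loops and the symmetric normalization from above; the toy graph, sizes and the choice of ReLU are illustrative:

```python
import numpy as np

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])             # adjacency matrix, 3 nodes
A_hat = A + np.eye(3)                    # self-loops: keep the own state
deg = A_hat.sum(axis=1)
A_hat /= np.sqrt(np.outer(deg, deg))     # 1/sqrt(deg(u) * deg(v)) scaling

H = np.random.rand(3, 4)                 # hidden states, feature dimension 4
W = np.random.rand(4, 4)                 # learnable weight matrix

H_next = np.maximum(0.0, A_hat @ H @ W)  # propagate, weight, activate (ReLU)
```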
What is a problem so far?
the adjacency-based update does not support edge features… (we only have the adjacency matrix but no edge feature vectors…)
idea: message passing
=> the message is adjusted by the edge itself…
What is the hidden state update formula for message passing?
new hidden state for node v
is the sum over the neighboring nodes
of a function of
the own feature vector
the neighbor's feature vector
the connecting edge's feature vector
the own current hidden state
the neighbor's current hidden state (see the formula below)
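Written out with the notation from the model-elements card (f being a learnable function):

```latex
\mathbf{h}_v^{(t+1)} = \sum_{w \in ne(v)} f\left( \mathbf{l}_v,\, \mathbf{l}_w,\, \mathbf{l}_{(v,w)},\, \mathbf{h}_v^{(t)},\, \mathbf{h}_w^{(t)} \right)
```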
What is an example of how a GNN can be realized (so far without message passing)?
new hidden state is average of previous hidden state and passed message
passed message is the sum of the neighboring hidden states
=> no message passing
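As a formula, this simple realization reads (my reconstruction):

```latex
\mathbf{h}_v^{(t+1)} = \frac{1}{2}\left( \mathbf{h}_v^{(t)} + \sum_{w \in ne(v)} \mathbf{h}_w^{(t)} \right)
```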
What is important in terms of temporal relationship of different nodes?
fixed iteration steps
=> always use the previous hidden state of the other nodes, even when their new one has already been calculated…
=> in theory: calculate all the new hidden states concurrently before actually updating any of them
=> otherwise not deterministic…
How does information propagation happen in GNN?
first iteration: take neighbor information
second iteration: take neighbor information (containing info from the neighbors' neighbors due to the first iteration…)
Final approach GNN?
aggregate messages from neighbors
feed into neural network
repeat…
Final formula GNN?
first hidden state is the initial feature vector
update hidden state by
activation function of
weight matrix times
sum over neighbors of
normalization factor (1/number of neighbors)
times the neighbor's old hidden state
plus
B matrix times the own old hidden state (B is a second weight matrix…)
output = hidden state after k time steps… (see the formula below)
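Reconstructed (x_v = initial feature vector, K = number of iterations, σ = activation function):

```latex
\mathbf{h}_v^{(0)} = \mathbf{x}_v, \qquad \mathbf{h}_v^{(k+1)} = \sigma\!\left( W_k \sum_{u \in ne(v)} \frac{\mathbf{h}_u^{(k)}}{|ne(v)|} + B_k \mathbf{h}_v^{(k)} \right), \qquad \mathbf{z}_v = \mathbf{h}_v^{(K)}
```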
What is the loss function for our GNN?
sum over all nodes of
ground truth times
log(activation function(output · classification weights))
+
(1 − ground truth) times
log(1 − activation function(output · classification weights))
-> basically meaning:
ground truth × log(classification)
+ (1 − ground truth) × log(1 − classification)
-> if the label is 1 and the classification output is 1 -> loss is 0, since log(1) = 0 and the other term vanishes due to (1 − 1)
-> if the label is 1 and the classification output is very small -> log(0.001) -> negative value whose magnitude grows for smaller outputs… (see the formula below)
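As a formula, as I read the card (y_v = ground truth, θ = classification weights, σ = activation function; the usual binary cross-entropy carries a leading minus sign so that it is minimized):

```latex
\mathcal{L} = \sum_{v \in N} y_v \log\!\big(\sigma(\mathbf{z}_v^{\top}\theta)\big) + (1 - y_v) \log\!\big(1 - \sigma(\mathbf{z}_v^{\top}\theta)\big)
```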
What has to be defined for GNN?
message aggregation function
-> how to aggregate the hidden states from the neighbors
define loss function
train on set of nodes
What is shared between nodes in GNN?
W_k and B_k are shared across all nodes…
(aggregation parameters)
Applications of GNN?
node classification
link prediction
graph clustering
graph classification
computer vision
Specific applications of GNN?
travel time estimation
social graph analysis
scene understanding