GraphFrame shortest path fails with vertex ID not found error

Posted by af7jpaap on 2021-05-29 in Spark


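My Spark/Databricks job uses GraphFrames to find the shortest paths between vertices within a connected component. The algorithm fails after a few minutes with org.graphframes.NoSuchVertexException: GraphFrame algorithm given vertex ID which does not exist in Graph. Vertex ID 1 not contained in GraphFrame.
The error message seems completely unrelated: the graph's vertices do not contain id == 1, and neither do the edges' src or dst columns. The algorithm should not be looking for such an id at all. I wonder whether there is a size limit beyond which shortestPaths fails, or whether I am missing some other part of the definition.
The code is very simple: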

val sp14754224 = g54.shortestPaths.landmarks("14754224").run

The graph structure is also quite basic:

e54:org.apache.spark.sql.DataFrame
    src:integer
    dst:integer
    edgeRevenue:double
    edgeAgreements:double

v54:org.apache.spark.sql.DataFrame
    id:integer
    Name:string
    vertexRevenue:double
    vertexDealss:long

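The graph itself is relatively large (31342 vertices and 1027724 edges), but it is just a single component previously identified by connectedComponents. Memory consumption does not seem to be a problem either (the observed peak was about 20 GB, while each worker has 64 GB).
Any suggestions?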

apache-spark Algorithm Graph

Source: https://stackoverflow.com/questions/62314893/graphframe-shortest-path-fails-with-vertex-id-not-found-error

2 Answers


wnavrhmk #1

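I believe landmarks should be a sequence, so try this:
val sp14754224 = g54.shortestPaths.landmarks(Seq("14754224")).run
I wonder whether some conversion is going on, so that your string is being turned into a Seq[Char], which would lead to the "Vertex ID 1" error.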
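The suspected conversion can be reproduced in plain Scala without Spark. The sketch below assumes Scala 2.12 or 2.13; `LandmarkDemo` and its `landmarks` method are hypothetical stand-ins for any method that accepts a `Seq[Any]`, not the GraphFrames API itself. It shows that a bare String passed where a sequence is expected is implicitly wrapped as a sequence of its characters.

```scala
object LandmarkDemo {
  // Hypothetical stand-in for a method that takes Seq[Any].
  // This is NOT the GraphFrames API, only an illustration of the typing.
  def landmarks(value: Seq[Any]): Seq[Any] = value

  def main(args: Array[String]): Unit = {
    // A bare String is implicitly wrapped as a sequence of its characters.
    val fromString = landmarks("14754224")
    // An explicit Seq keeps the whole string as a single element.
    val fromSeq = landmarks(Seq("14754224"))

    println(s"bare String: ${fromString.length} elements, first = ${fromString.head}")
    println(s"explicit Seq: ${fromSeq.length} elements, first = ${fromSeq.head}")
  }
}
```

With the bare String the first element is the character '1', which would be consistent with the "Vertex ID 1" in the error message. Since the schema in the question lists id as integer, passing the landmark as an Int inside the Seq may be the closer match, but that is an assumption that has not been checked against the asker's data.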

Posted 2021-05-29

mwg9r5ms #2

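I never found a solution in Spark/Scala, but there is a simple workaround using SparkR:
Switch to R (either create an R notebook or use %r; either way, the input data has to be read from storage)
Install the igraph and SparkR libraries on the cluster
Collect the vertices and edges to get R data.frames, then apply igraph methods (e.g. shortest paths)
The graph data has to fit in the driver's memory, which was the case for my task.
Code example: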

%r
# Install each package if it is missing, then load it
if (!require(SparkR)) { install.packages("SparkR"); library(SparkR) }
if (!require(igraph)) { install.packages("igraph"); library(igraph) }

processRoot <-  "abfss://yourAccount@fdestorageuat.dfs.core.windows.net/YourDataPath/"

# Data Intake

inEdgesPath     <- paste(processRoot, "STG_Edges/", sep = "")
inVerticesPath  <- paste(processRoot, "STG_Vertices/", sep = "")

inEdges     <- collect(read.parquet(inEdgesPath))
inVertices  <- collect(read.parquet(inVerticesPath))

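# Graph Definition (inVertices is optional; it is added to capture the names and sizes of vertices)
# The first two columns of inEdges are taken as endpoints; the first column of inVertices as the vertex id

g <- graph_from_data_frame(inEdges, directed = FALSE, vertices = inVertices)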

# Example of iGraph (defining connected components and communities)

clu <- components(g)
fg <- fastgreedy.community(g)

# Outputs from both commands have the same order and length as the vertices, so they can be added to the vertex data

inVertices$componentId <- clu[["membership"]]
inVertices$communityId <- as.numeric(membership(fg))

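igraph offers a much broader range of functionality than GraphX/GraphFrames. As long as the graph data fits in memory, it also executes faster. If the graph is too large, consider first running GraphFrames' connected components and then processing each component separately, calling igraph functions via gapply keyed by component id.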
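The "split by component id, then process each component on its own" idea can be sketched with plain Scala collections. This is only an illustration under stated assumptions: `PerComponentDemo`, `Edge`, `hopsFrom` and `hopsPerComponent` are hypothetical names, the in-memory `groupBy` stands in for a distributed grouped apply such as gapply, and a simple breadth-first search stands in for whatever igraph routine would be called per component.

```scala
object PerComponentDemo {
  final case class Edge(src: Int, dst: Int)

  // Breadth-first search over an undirected edge list:
  // returns the hop count from `landmark` to every reachable vertex.
  def hopsFrom(landmark: Int, edges: Seq[Edge]): Map[Int, Int] = {
    val neighbours: Map[Int, Seq[Int]] =
      edges.flatMap(e => Seq(e.src -> e.dst, e.dst -> e.src))
        .groupBy(_._1)
        .map { case (v, pairs) => v -> pairs.map(_._2) }

    @annotation.tailrec
    def loop(frontier: Set[Int], depth: Int, dist: Map[Int, Int]): Map[Int, Int] = {
      // Vertices adjacent to the frontier that have not been reached yet.
      val next = frontier
        .flatMap(v => neighbours.getOrElse(v, Seq.empty[Int]))
        .filterNot(dist.contains)
      if (next.isEmpty) dist
      else loop(next, depth + 1, dist ++ next.map(_ -> (depth + 1)))
    }

    loop(Set(landmark), 0, Map(landmark -> 0))
  }

  // componentOf: vertex id -> component id (e.g. from a connected-components pass).
  // landmarkOf: component id -> landmark vertex chosen for that component.
  def hopsPerComponent(
      componentOf: Map[Int, Int],
      edges: Seq[Edge],
      landmarkOf: Map[Int, Int]): Map[Int, Map[Int, Int]] =
    edges.groupBy(e => componentOf(e.src)).collect {
      case (componentId, componentEdges) if landmarkOf.contains(componentId) =>
        componentId -> hopsFrom(landmarkOf(componentId), componentEdges)
    }

  def main(args: Array[String]): Unit = {
    val edges = Seq(Edge(1, 2), Edge(2, 3), Edge(10, 11))
    val componentOf = Map(1 -> 0, 2 -> 0, 3 -> 0, 10 -> 1, 11 -> 1)
    val landmarkOf = Map(0 -> 1, 1 -> 10)
    hopsPerComponent(componentOf, edges, landmarkOf).foreach(println)
  }
}
```

Each group only ever sees the edges of its own component, so the per-group work stays small even when the whole graph does not fit in one process.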

Posted 2021-05-29
