Published July 30, 2023
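A First Exploration of Network Mining (3): NEO4J Graph Algorithms

I. Advantages of Graphs

The previous article presented data as a graph. Graph visualization has strong visual impact and makes relationship information easy to inspect. Visualization is only the most obvious advantage, though. Graphs have several less visible ones, covered one by one below.

1. Extracting key features

Relationship information is often a key indicator for behavior prediction. Traditional machine learning extracts vectors and matrices from tables, and in the process can lose much of the association information between objects. With a graph, the feature information of related entities can be extracted by following the target object's relationships. Unsupervised or supervised algorithms applied to the graph structure (centrality, community detection, relationship inference and so on, which are the main subject of this article) describe the graph as a whole and supply additional input features for machine learning.

2. Accelerating machine learning

Use graph queries in place of table joins.
Use a graph in place of a sparse-matrix representation.
Use subgraphs (Subgraph) to speed up learning, for example by building multi-layer subgraph mappings with queries and algorithms so that anomalies or candidate recommendations can be located quickly. This is common in recommendation systems and risk detection systems with strict real-time requirements.

3. Interpretability (Credibility)

Data interpretability: a graph represents not only the information of each entity but also the relationships between entities.
Prediction interpretability: a traversable network connecting the target objects is built on a labeled knowledge graph.
Algorithm interpretability: tensors built from weighted relationships can be used to train interpretable neural network algorithms.

II. Common Algorithms

The graph database NEO4J provides specialized analysis algorithms. The most commonly used kinds are similarity computation, centrality computation, and community partitioning. The algorithm categories are:

Centrality algorithms (Centralities)
Community detection algorithms (Community detection)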
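Path finding and search algorithms (Path Finding & Search), including Path Expanding (the path expansion procedure)
Similarity algorithms (Similarity)
Link prediction algorithms (Link Prediction)
Preprocessing algorithms (Preprocessing)

Path Finding & Search is generally used to find the shortest path between Nodes. Commonly used algorithms:
Dijkstra - edge weights must not be negative
Floyd - edge weights may be negative (provided there is no negative cycle); works on directed and undirected graphs
Bellman-Ford
SPFA

Centrality measures how central each node in the graph is, and is used to find the more important Nodes. There are many kinds of centrality, for example:
Degree Centrality
Weighted Degree Centrality
Betweenness Centrality
Closeness Centrality

Community Detection (see "Interpreting Game of Thrones with Community Detection Algorithms and Neo4j Graph Analysis") finds groups of Nodes that are locally tightly connected, much like the clustering algorithms we have already studied:
Strongly Connected Components
Weakly Connected Components (Union Find)
Label Propagation
Louvain Modularity
Triangle Count and Average Clustering Coefficient

III. Code Demonstration

This chapter uses NEO4J's algorithm package directly (I only carry code over; none of it is my own).

1. Installing the algorithm package

1. Download the package: download the jar for your version (e.g. graph-algorithms-algo-3.5.4.0) and put it in the C:4jDesktopneo4jDatabasesdatabase-<database ID>installation-3.5.6plugins directory.
2. Configuration file: in the configuration file under C:4jDesktopneo4jDatabasesdatabase-<database ID>installation-3.5.6/conf/, add dbms.security.procedures.unrestricted=algo.*
3. Restart neo4j.
4. Check that the installation succeeded by running: CALL algo.list()

2. Code

Centrality

// Closeness Centrality (algo.closeness)
CALL algo.closeness.stream("Node", "LINK")
YIELD nodeId, centrality
MATCH (n:Node) WHERE id(n) = nodeId
RETURN n.id AS node, centrality  // assumes each node carries an "id" property
ORDER BY centrality DESC
LIMIT 20;

// Betweenness Centrality (algo.betweenness)
CALL algo.betweenness.stream("User", "MANAGES", {direction: "out"})
YIELD nodeId, centrality
MATCH (user:User) WHERE id(user) = nodeId
RETURN user.id AS user, centrality  // assumes each User carries an "id" property
ORDER BY centrality DESC;

// PageRank (algo.pageRank)
CALL algo.pageRank.stream("Page", "LINKS", {iterations: 20})
YIELD nodeId, score
MATCH (node) WHERE id(node) = nodeId
RETURN node.name AS page, score  // assumes each Page carries a "name" property
ORDER BY score DESC;

Community partitioning

// Louvain (algo.louvain)
// Signature, from the source code
CALL algo.louvain(label: STRING, relationship: STRING, {
  write: BOOLEAN,
  writeProperty: STRING
  // additional configuration
})
YIELD nodes, communities, modularity, loadMillis, computeMillis, writeMillis

// Example
CALL algo.louvain.stream("User", "FRIEND", {})
YIELD nodeId, community
MATCH (user:User) WHERE id(user) = nodeId
RETURN user.id AS user, community  // assumes each User carries an "id" property
ORDER BY community;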
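The Louvain procedure reports a modularity value alongside the communities it finds. As a rough illustration of what that number measures, here is a minimal Python sketch that scores a given partition of an undirected, unweighted graph. It is not the NEO4J implementation and does not search for communities. The function name `modularity`, the toy edge list, and the community labels are all made up for this example.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges        -- list of (u, v) pairs, one entry per undirected edge
    community_of -- dict mapping every node to its community label
    """
    m = len(edges)
    if m == 0:
        return 0.0

    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # sum of node degrees inside the community

    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1

    # Q = sum over communities of (L_c / m) - (d_c / 2m)^2
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge c-d.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_triangles = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_block = {node: 0 for node in "abcdef"}

print(modularity(edges, two_triangles))
print(modularity(edges, one_block))
```

Splitting the toy graph into its two triangles scores higher than putting every node into one community. Louvain searches for a partition that pushes this score up.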
// Label Propagation (algo.labelPropagation)
CALL algo.labelPropagation("User", "FOLLOWS", {direction: "OUTGOING", iterations: 10})

Paths

// Shortest Path (algo.shortestPath)
MATCH (start:Loc {name: "A"}), (end:Loc {name: "F"})
CALL algo.shortestPath.stream(start, end, "cost")
YIELD nodeId, cost
MATCH (other:Loc) WHERE id(other) = nodeId
RETURN other.name AS name, cost

// Single Source Shortest Path (algo.shortestPath.deltaStepping)
MATCH (n:Loc {name: "A"})
CALL algo.shortestPath.deltaStepping.stream(n, "cost", 3.0)
YIELD nodeId, distance
MATCH (destination) WHERE id(destination) = nodeId
RETURN destination.name AS destination, distance
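To make the single-source shortest-path idea concrete outside the database, the following is a minimal Python sketch of Dijkstra's algorithm on a small weighted graph. It is only an illustration, not what the NEO4J procedures run internally (the second query uses delta-stepping, a different algorithm). The `roads` data and its costs are hypothetical, loosely modelled on the `Loc` nodes and `cost` property used in the queries.

```python
import heapq


def dijkstra(graph, source):
    """Return the cheapest known cost from source to every reachable node.

    graph -- dict: node -> dict of neighbor -> non-negative edge cost
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]  # (cost so far, node)

    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a cheaper route was already found
        for neighbor, cost in graph.get(node, {}).items():
            if cost < 0:
                raise ValueError("Dijkstra requires non-negative edge costs")
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))

    return dist


# Hypothetical directed road network with a "cost" per edge.
roads = {
    "A": {"B": 50, "C": 50, "D": 100},
    "B": {"D": 40},
    "C": {"D": 40, "E": 80},
    "D": {"E": 30, "F": 80},
    "E": {"F": 40},
    "F": {},
}

distances = dijkstra(roads, "A")
for name in sorted(distances):
    print(name, distances[name])
```

The check for negative costs reflects the restriction noted earlier for Dijkstra: once a node is popped from the heap its distance is treated as final, which only holds when no edge can reduce a path's cost.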
Posted by: admin. Please credit the source when reposting: http://www.yc00.com/web/1690721922a407709.html