Abstract
We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, on the Deep Memory Retrieval (DMR) benchmark. In addition, Zep excels in evaluations that are more comprehensive and challenging than DMR and better reflect real-world enterprise use cases. While existing retrieval-augmented generation (RAG) frameworks for large language model (LLM) applications are limited to static document retrieval, enterprise applications require dynamic integration of knowledge from multiple sources, including ongoing conversations and business data. Zep addresses this fundamental limitation with its core component, Graphiti, a temporally-aware knowledge graph engine that dynamically integrates unstructured conversational data and structured business data while maintaining historical relationships. On the DMR benchmark established by the MemGPT team, Zep demonstrated superior performance (94.8% vs. 93.4%). Beyond DMR, Zep's capabilities were further validated on the more challenging LongMemEval benchmark, which better reflects enterprise use cases through complex temporal reasoning tasks. In this evaluation, Zep improved accuracy by up to 18.5% while reducing response latency by 90% compared to the baseline implementation. These results are particularly pronounced on enterprise-critical tasks such as cross-session information synthesis and long-term context maintenance, demonstrating Zep's effectiveness for real-world applications.
1. Introduction
In recent years, the impact of Transformer-based large language models (LLMs) on industry and research has attracted considerable attention [1]. A major application of LLMs is the development of chat-based agents. However, the capabilities of these agents are limited by the LLM's context window, effective context utilization, and the knowledge acquired during pre-training. Additional context is therefore needed to provide out-of-domain (OOD) knowledge and reduce hallucinations.
Retrieval-augmented generation (RAG) has become an important area of interest in LLM applications. RAG leverages information retrieval (IR) techniques developed over the last fifty years [2] to provide the necessary domain knowledge to the LLM.
Current RAG approaches focus on broad domain knowledge and relatively static corpora, that is, corpora whose document contents rarely change. For agents to become pervasive in our daily lives, capable of autonomously solving problems ranging from the trivial to the highly complex, they will need access to an ever-evolving large corpus generated by user-agent interactions, along with relevant business and world data. We believe that giving agents this kind of extensive and dynamic "memory" is a key component of realizing this vision, and we do not believe current RAG approaches are suited to this future. Since entire dialog histories, business datasets, and other domain-specific content cannot effectively fit within LLM context windows, new approaches to agent memory need to be developed. Adding memory to LLM-driven agents is not a new idea; the concept was explored previously in MemGPT [3].
Recently, knowledge graphs (KGs) have been used to augment RAG architectures to address many of the shortcomings of traditional IR techniques [4]. In this paper, we introduce Zep [5], a memory layer service powered by Graphiti [6], a dynamic, temporally-aware knowledge graph engine. Zep ingests and synthesizes unstructured message data and structured business data. The Graphiti engine updates the knowledge graph with new information in a non-lossy manner, maintaining a timeline of facts and relationships, including their periods of validity. This approach enables the knowledge graph to represent a complex, evolving world.
Since Zep is a production system, we place great importance on the accuracy, latency, and scalability of its memory retrieval mechanisms. We use two existing benchmarks to evaluate the effectiveness of these mechanisms: the Deep Memory Retrieval task (DMR) in MemGPT [3] and the LongMemEval benchmark [7].
2. Knowledge graph
In Zep, memory is backed by a temporally-aware dynamic knowledge graph 𝒢 = (𝒩, ℰ, φ), where 𝒩 is the set of nodes, ℰ is the set of edges, and φ: ℰ → 𝒩 × 𝒩 is a formal incidence function. The graph consists of three hierarchical tiers of subgraphs: the episode subgraph, the semantic entity subgraph, and the community subgraph.
2.1 Episodes
Zep's graph construction starts by ingesting raw data units called episodes. Episodes can be one of three core types: message, text, or JSON. While each type requires specific processing during graph construction, this paper focuses on the message type because our experiments concentrate on conversational memory. In our context, a message consists of a relatively short piece of text (several messages can fit within an LLM context window) together with the participant who produced the utterance.
Each message includes a reference timestamp t_ref indicating when the message was sent. This temporal information allows Zep to accurately recognize and resolve relative or partial dates mentioned in the message content (e.g., "next Thursday", "in two weeks", or "last summer"). Zep implements a bi-temporal model: timeline T represents the chronological order of events, and timeline T′ represents the order in which Zep ingested the data. While the T′ timeline serves the traditional purpose of database auditing, the T timeline provides an additional dimension for modeling the dynamic nature of conversational data and memory. This dual-timeline approach represents a novel advance in LLM-based knowledge graph construction and underlies Zep's unique capabilities compared to previous graph-based RAG proposals.
Episode edges ℰ_e connect episodes to the entity nodes extracted from them. Episodes and their derived semantic edges maintain bidirectional indices that track the relationship between an edge and its source episode. This design reinforces the non-lossy nature of Graphiti's episode subgraph by enabling forward and backward traversal: semantic artifacts can be traced back to their sources for citation or reference, and episodes can be quickly retrieved through their associated entities and facts. While these connections are not directly examined in the experiments of this paper, they will be explored in future work.
2.2 Semantic entities and facts
2.2.1 Entities
Entity extraction is the first phase of episode processing. During ingestion, the system processes the current message content together with the last n messages to provide context for named entity recognition. For this paper and for Zep's general implementation, n = 4, providing two complete conversation turns for contextual evaluation. Given our focus on message processing, the speaker is automatically extracted as an entity. After the initial entity extraction, we employ a reflection technique inspired by reflexion [12] to minimize hallucinations and increase extraction coverage. The system also extracts an entity summary from the episode to facilitate subsequent entity resolution and retrieval operations.
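The two-pass extraction described above can be sketched as follows. This is a minimal illustration, not Zep's implementation: the `llm` callable, prompt wording, and return format are all assumptions, standing in for the real prompts shown in the Appendix.

```python
from typing import Callable, List

def extract_entities(
    llm: Callable[[str], List[str]],
    current_message: str,
    previous_messages: List[str],
    n: int = 4,
) -> List[str]:
    """Two-pass entity extraction: an initial LLM pass, then a reflection
    pass asking the model which entities it missed, to reduce omissions."""
    context = "\n".join(previous_messages[-n:])  # last n messages for context
    prompt = (
        f"<PREVIOUS MESSAGES>\n{context}\n</PREVIOUS MESSAGES>\n"
        f"<CURRENT MESSAGE>\n{current_message}\n</CURRENT MESSAGE>"
    )
    entities = llm("Extract entities from the current message:\n" + prompt)
    # Reflection pass: surface missed entities, then merge the two lists.
    missed = llm(
        "Which entities were missed?\n" + prompt + f"\nFound so far: {entities}"
    )
    return list(dict.fromkeys(entities + missed))  # dedupe, preserve order
```

The merge at the end keeps first-seen order, so the speaker extracted in the first pass remains the first node.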
After extraction, the system embeds each entity name into a 1024-dimensional vector space. This embedding enables retrieval of similar existing entity nodes via cosine similarity search. The system also performs a separate full-text search over existing entity names and summaries to identify additional candidate nodes. These candidate nodes, along with the episode context, are then passed to an LLM with our entity resolution prompt. When the system identifies a duplicate entity, it generates an updated name and summary.
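The cosine similarity step for surfacing resolution candidates can be sketched as below. The threshold and top-k values are illustrative assumptions, not Zep's published parameters.

```python
import numpy as np

def candidate_nodes(query_vec, node_vecs, node_ids, top_k=5, min_sim=0.7):
    """Return ids of existing entity nodes whose name embeddings are most
    similar to a new entity's embedding, ranked by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    m = node_vecs / np.linalg.norm(node_vecs, axis=1, keepdims=True)
    sims = m @ q  # cosine similarity of each existing node to the query
    order = np.argsort(-sims)[:top_k]
    return [node_ids[i] for i in order if sims[i] >= min_sim]
```

In practice these candidates would be unioned with full-text search hits before the LLM resolution step.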
After entity extraction and resolution, the system merges the data into the knowledge graph using predefined Cypher queries. We chose this approach over LLM-generated database queries to ensure a consistent schema and reduce the possibility of hallucinations.
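A predefined, parameterized query in this spirit might look like the following. Zep's production queries are not published, so the template and property names here are purely illustrative; the point is that the query shape is fixed and only values vary.

```python
# Hypothetical Cypher template in the spirit of Zep's predefined queries.
# MERGE matches an existing node by uuid or creates it, keeping the schema
# fixed regardless of what the LLM extracted.
ENTITY_MERGE = """
MERGE (n:Entity {uuid: $uuid})
SET n.name = $name,
    n.summary = $summary
"""

def entity_params(uuid: str, name: str, summary: str) -> dict:
    """Bind extracted values as query parameters rather than interpolating
    them into the query text, so the LLM never writes Cypher."""
    return {"uuid": uuid, "name": name, "summary": summary}
```

A driver session would then execute `session.run(ENTITY_MERGE, entity_params(...))`.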
Selected prompts for graph construction are provided in the Appendix.
2.2.2 Facts
The system extracts facts from the current message as semantic edges between entities, with each fact containing its key predicate. The same fact can be extracted multiple times across different entities, enabling Graphiti to model complex multi-entity facts by implementing hyperedges.
After extraction, the system generates embeddings for the facts in preparation for graph integration, then performs edge deduplication through a process similar to entity resolution. The hybrid search for related edges is restricted to edges between the same entity pair as the proposed new edge. This restriction not only prevents incorrect merging of similar edges between different entity pairs, but also significantly reduces the computational cost of deduplication by limiting the search space to the subset of edges associated with a particular entity pair.
2.2.3 Time extraction and edge invalidation
A key differentiating feature of Graphiti compared to other knowledge graph engines is that it manages dynamic information updates through temporal extraction and edge invalidation processes.
The system uses t_ref to extract temporal information about facts from the episode context. This enables accurate extraction and datetime representation of both absolute timestamps (e.g., "Alan Turing was born on June 23, 1912") and relative timestamps (e.g., "I started my new job two weeks ago"). Consistent with our bi-temporal modeling approach, the system tracks four timestamps: t′_created and t′_expired ∈ T′ record when a fact is created or invalidated in the system, while t_valid and t_invalid ∈ T track the time range during which the fact holds. These timestamps are stored on the edge alongside the other fact information.
The introduction of new edges can invalidate existing edges in the database. The system uses an LLM to compare new edges against semantically related existing edges to identify potential contradictions. When a temporal contradiction is identified, the system invalidates the affected edge by setting its t_invalid to the t_valid of the invalidating edge. Following the transaction timeline T′, Graphiti always prioritizes new information when determining edge invalidation.
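The bi-temporal bookkeeping above can be made concrete with a small sketch. The field names mirror the four timestamps defined in this section, but the data structure itself is an assumption for illustration, not Zep's storage schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class FactEdge:
    fact: str
    t_valid: Optional[datetime] = None    # timeline T: when the fact became true
    t_invalid: Optional[datetime] = None  # timeline T: when it stopped being true
    t_created: datetime = field(          # timeline T': when Zep ingested it
        default_factory=lambda: datetime.now(timezone.utc))
    t_expired: Optional[datetime] = None  # timeline T': when Zep invalidated it

def invalidate(old: FactEdge, new: FactEdge) -> None:
    """New information wins: close the old edge's validity interval at the
    point the contradicting fact became true, and mark it expired now."""
    old.t_invalid = new.t_valid
    old.t_expired = new.t_created
```

The invalidated edge is kept, not deleted, which is what preserves the history of how a relationship evolved.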
This integrated approach allows data to be dynamically added to Graphiti as conversations evolve, while maintaining the current state of the relationship and a history of its evolution over time.
2.3 Community
After building the episode and semantic subgraphs, the system constructs the community subgraph through community detection. While our community detection approach builds on the techniques described in GraphRAG [4], we employ a label propagation algorithm [13] instead of the Leiden algorithm [14]. This choice is motivated by label propagation's simple dynamic extension, which allows the system to maintain accurate community representations for longer as new data enters the graph, postponing the need for a full community refresh.
The dynamic extension implements the logic of a single recursive step of label propagation. When the system adds a new entity node n_i ∈ 𝒩_s to the graph, it surveys the communities of the node's neighbors. The system assigns the new node to the community held by the majority of its neighbors, then updates the community summary and graph accordingly. While this dynamic updating allows communities to scale efficiently as data flows into the system, the resulting communities gradually drift from those a full label propagation run would produce, so periodic community refreshes remain necessary. Nevertheless, this strategy provides a practical heuristic that significantly reduces latency and LLM inference costs.
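The single recursive step reduces to a majority vote over neighbor labels, sketched below; tie-breaking behavior is an implementation detail left unspecified in the text, so here it simply follows counting order.

```python
from collections import Counter

def assign_community(neighbor_communities):
    """Single recursive step of label propagation: a newly added node joins
    the community held by the majority of its existing neighbors."""
    if not neighbor_communities:
        return None  # no neighbors yet; the caller would start a new community
    counts = Counter(neighbor_communities)
    return counts.most_common(1)[0][0]
```

A full label propagation run iterates this vote over every node until labels stabilize, which is exactly the periodic refresh the dynamic step postpones.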
Following [4], our community nodes contain summaries derived from an iterative map-reduce-style summarization of member nodes. However, our retrieval approach differs substantially from GraphRAG's map-reduce approach [4]. To support it, we generate community names containing key terms and related subjects from the community summaries; these names are embedded and stored to enable cosine similarity search.
3. Memory retrieval
Zep's memory retrieval system provides powerful, sophisticated, and highly configurable functionality. Overall, the Zep graph search API implements a function f: S → S that accepts a text query α ∈ S as input and returns a text context β ∈ S as output. The output β contains formatted data from nodes and edges that an LLM agent needs to generate an accurate response to query α. The process f(α) → β consists of three distinct steps:
- Search (φ): the process first identifies candidate nodes and edges that may contain relevant information. While Zep employs several distinct search methods, the overall search function can be expressed as φ: S → ℰ_s^n × 𝒩_s^n × 𝒩_c^n. Thus, φ transforms a query into a 3-tuple containing lists of semantic edges, entity nodes, and community nodes, the three graph object types that carry relevant textual information.
- Rerank (ρ): the second step reorders the search results. A reranker function or model takes the list of search results and produces a reordered version of those results: ρ: ℰ_s^n × 𝒩_s^n × 𝒩_c^n → ℰ_s^n × 𝒩_s^n × 𝒩_c^n.
- Construct (χ): in the final step, the constructor converts the relevant nodes and edges into a textual context: χ: ℰ_s^n × 𝒩_s^n × 𝒩_c^n → S. For each e_i ∈ ℰ_s, χ returns the fact together with the t_valid and t_invalid fields; for each n_i ∈ 𝒩_s, the name and summary fields; and for each n_i ∈ 𝒩_c, the summary field.
With these definitions, f can be expressed as the composition of the three components: f(α) = χ(ρ(φ(α))) = β.
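The composition f = χ ∘ ρ ∘ φ maps directly onto a small pipeline; the function signatures below are an illustrative rendering of the three-step definition, not Zep's API.

```python
def zep_retrieve(query, search, rerank, construct):
    """f(α) = χ(ρ(φ(α))): search for candidate graph objects, rerank them,
    then format the survivors into a context string for the LLM."""
    # φ: query -> (semantic edges, entity nodes, community nodes)
    edges, entity_nodes, community_nodes = search(query)
    # ρ: reorder each result list with respect to the query
    edges, entity_nodes, community_nodes = rerank(
        query, edges, entity_nodes, community_nodes)
    # χ: render facts (with validity dates), names/summaries into text
    return construct(edges, entity_nodes, community_nodes)
```

Because each stage only consumes the previous stage's output, search methods and rerankers can be swapped independently, which is what makes the API "highly configurable".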
Sample context string template:

FACTS and ENTITIES represent context relevant to the current conversation.

These are the most relevant facts and the date ranges over which they are valid. If a fact refers to an event, the event occurred within this time range.

Format: FACT (Date range: from - to)

<FACTS>
{facts}
</FACTS>

These are the most relevant entities

ENTITY_NAME: entity summary

<ENTITIES>
{entities}
</ENTITIES>
3.1 Search
Zep implements three search functions: cosine semantic similarity search (φ_cos), Okapi BM25 full-text search (φ_bm25), and breadth-first search (φ_bfs). The first two utilize Neo4j's implementation of Lucene [15] [16]. Each search function offers a different capability for identifying relevant documents, and together they provide comprehensive coverage of candidate results before reranking. The search fields differ by object type: for ℰ_s we search the fact field; for 𝒩_s, the entity name; and for 𝒩_c, the community name, which comprises relevant keywords and phrases covered in the community. Although our community search method was developed independently, it parallels LightRAG's high-level key search [17]. Combining LightRAG's approach with graph-based systems such as Graphiti is a promising direction for future research.
While cosine similarity and full-text search methods are well established in RAG [18], breadth-first search over knowledge graphs has received limited attention in the RAG domain, with notable exceptions in graph-based systems such as AriGraph [9] and Distill-SynthKG [19]. In Graphiti, breadth-first search enriches initial search results by identifying additional nodes and edges within n hops. Moreover, φ_bfs can accept nodes as search parameters, allowing finer control over the search. This capability proves particularly valuable when recent episodes are used as seeds for the breadth-first search, letting the system pull recently mentioned entities and relationships into the retrieved context.
Each of the three search methods targets a different aspect of similarity: full-text search identifies lexical similarity, cosine similarity captures semantic similarity, and breadth-first search reveals contextual similarity, in that nodes and edges closer together in the graph appear in more similar conversational contexts. This multifaceted approach to candidate identification maximizes the likelihood of discovering the best context.
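The n-hop expansion used by φ_bfs can be sketched as a plain breadth-first traversal from seed nodes; the adjacency-dict graph representation here is an illustrative simplification of the actual graph store.

```python
from collections import deque

def bfs_candidates(adjacency, seeds, max_hops=2):
    """Breadth-first expansion from seed nodes (e.g. entities mentioned in
    recent episodes), collecting every node within max_hops as candidates."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for nbr in adjacency.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                found.append(nbr)
                frontier.append((nbr, depth + 1))
    return found
```

Seeding with recent-episode entities is what lets contextual similarity complement the lexical and semantic candidates before reranking.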
3.2 Reorderer
While the initial search stage aims for high recall, the reranker improves precision by prioritizing the most relevant results. Zep supports established reranking methods such as Reciprocal Rank Fusion (RRF) [20] and Maximal Marginal Relevance (MMR) [21]. In addition, Zep implements a graph-based episode-mentions reranker that prioritizes results by how frequently an entity or fact is mentioned, making frequently cited information more accessible. The system also includes a node distance reranker, which reorders results by their graph distance from a designated center node, providing content localized to a specific region of the knowledge graph. The system's most sophisticated reranking capability employs cross-encoders, LLMs that score the relevance of nodes and edges to a query using cross-attention, although this approach incurs the highest computational cost.
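Of the supported rerankers, RRF is the simplest to show concretely: it fuses the rankings produced by the different search functions using only rank positions. The sketch below uses the standard RRF formula with the conventional k = 60; Zep's actual parameter choices are not stated in this paper.

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over input rankings of
    1 / (k + rank_d), with 1-based ranks. Higher fused score ranks first,
    so items ranked well by several searches rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Here each input ranking would come from one of φ_cos, φ_bm25, or φ_bfs, making RRF a natural way to combine the three candidate lists.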
4. Experiments
This section analyzes two experiments conducted against LLM memory benchmarks. The first evaluation uses the Deep Memory Retrieval (DMR) task developed in [3], which uses a 500-conversation subset of the multi-session chat dataset introduced in "Beyond Goldfish Memory: Long-Term Open-Domain Conversation" [22]. The second evaluation uses the LongMemEval benchmark from [7]. Specifically, we used the LongMemEval_S dataset, which provides extensive conversational contexts averaging 115,000 tokens in length.
For both experiments, we ingested the conversation histories into the Zep knowledge graph via Zep's APIs. We then retrieved the 20 most relevant edges (facts) and entity nodes (entity summaries) using the techniques described in Section 3. The system reformats this data into a context string, mirroring the functionality of Zep's memory API.
While these experiments demonstrate Graphiti's key search capabilities, they represent a subset of the system's complete search functionality. This focused scope allows for clear comparisons with existing benchmark tests, while preserving space for future work to explore additional knowledge graph capabilities.
4.1 Model selection
Our experimental implementation uses BAAI's BGE-m3 model for reranking and embedding tasks [23] [24]. For graph construction we use gpt-4o-mini-2024-07-18, and for the chat agents that generate responses from the provided context we use gpt-4o-mini-2024-07-18 and gpt-4o-2024-11-20.
To ensure direct comparability with MemGPT's DMR results, we also performed a DMR evaluation using gpt-4-turbo-2024-04-09.
Our experiment notebooks will be made publicly available through our GitHub repository, and the associated experiment prompts are included in the Appendix.
Table 1: Deep Memory Retrieval

| Memory | Model | Score |
| --- | --- | --- |
| Recursive summarization† | gpt-4-turbo | 35.3% |
| Conversation summaries | gpt-4-turbo | 78.6% |
| MemGPT† | gpt-4-turbo | 93.4% |
| Full conversation | gpt-4-turbo | 94.4% |
| Zep | gpt-4-turbo | 94.8% |
| Conversation summaries | gpt-4o-mini | 88.0% |
| Full conversation | gpt-4o-mini | 98.0% |
| Zep | gpt-4o-mini | 98.2% |

† Results as reported in [3].
4.2 Deep Memory Retrieval (DMR)
The Deep Memory Retrieval evaluation was introduced by [3] and consists of 500 multi-session conversations, each containing 5 chat sessions with up to 12 messages per session. Each conversation includes a question/answer pair for memory evaluation. The MemGPT framework [3] currently leads reported performance with an accuracy of 93.4%, a significant improvement over the 35.3% baseline achieved with recursive summarization.
To establish comparison baselines, we implemented two common LLM memory methods: full conversation context and session summaries. Using gpt-4-turbo, the full conversation baseline achieves 94.4% accuracy, slightly above the result reported for MemGPT, while the session summary baseline achieves 78.6%. Both methods perform better with gpt-4o-mini: 98.0% for the full conversation and 88.0% for session summaries. We were unable to reproduce MemGPT's results using gpt-4o-mini due to insufficient methodological detail in the published work.
We then evaluated Zep's performance by ingesting the conversations and using its search function to retrieve the top 10 most relevant nodes and edges. An LLM judge compared the agent's response against the provided correct answer. Zep achieves 94.8% accuracy with gpt-4-turbo and 98.2% with gpt-4o-mini, marginal improvements over MemGPT and the corresponding full conversation baselines. However, these results must be placed in context: each conversation contains only about 60 messages and fits easily within current LLM context windows.
The limitations of the DMR evaluation extend beyond its small scale. Our analysis reveals significant weaknesses in the benchmark's design. It relies exclusively on single-turn fact retrieval questions and cannot assess complex memory understanding. Many questions use vague wording, referring to concepts such as a "favorite relaxation drink" or a "strange hobby" that are never described in those terms in the conversation. Crucially, the dataset is not representative of real-world enterprise use cases for LLM agents. The strong performance achieved by the simple full-context approach with modern LLMs further highlights the benchmark's inadequacy for evaluating memory systems.
This shortcoming is underscored by the findings in [7], which show that LLM performance on the LongMemEval benchmark degrades rapidly as conversation length increases. The LongMemEval dataset [7] addresses these shortcomings by providing longer, more coherent conversations that better reflect enterprise scenarios, along with more diverse evaluation questions.
4.3 LongMemEval (LME)
We evaluated Zep using the LongMemEval_S dataset, whose conversations and questions are representative of real-world business applications of LLM agents. LongMemEval poses a significant challenge to existing LLMs and commercial memory solutions [7], with conversations averaging roughly 115,000 tokens in length. This length, while substantial, still fits within the context windows of current frontier models, allowing us to establish a meaningful baseline against which to evaluate Zep.
The dataset contains six question types: single-session user, single-session assistant, single-session preference, multi-session, knowledge update, and temporal reasoning. These categories are not evenly distributed in the dataset; for details, we refer the reader to [7].
We conducted all experiments between December 2024 and January 2025, running tests from consumer laptops at a residential location in Boston, MA, connected to a Zep service hosted in AWS us-west-2. This distributed setup introduces additional network latency into the Zep evaluations; this latency is not present in our baseline evaluations.
For answer evaluation, we used GPT-4o with the question-type-specific prompts provided in [7], which have been shown to correlate strongly with human evaluators.
4.3.1 LongMemEval and MemGPT
To establish a comparison between Zep and the current state-of-the-art MemGPT system [3], we attempted to evaluate MemGPT on the LongMemEval dataset. Because the current MemGPT framework does not support direct ingestion of an existing message history, we implemented a workaround that adds the conversation messages to archival memory; however, we were unable to complete question answering successfully with this approach. We look forward to seeing this benchmark evaluated by other research teams, as comparable performance data would benefit the broader development of LLM memory systems.
4.3.2 LongMemEval results
Zep demonstrates significant improvements in both accuracy and latency over the baselines. Using gpt-4o-mini, Zep improves accuracy by 15.2% over the baseline, and with gpt-4o by 18.5%. The much smaller prompt also yields a substantial reduction in latency and cost compared to the baseline implementation.
Table 2: LongMemEval

| Memory | Model | Score | Latency | Latency IQR | Avg. context tokens |
| --- | --- | --- | --- | --- | --- |
| Full context | gpt-4o-mini | 55.4% | 31.3 s | 8.76 s | 115k |
| Zep | gpt-4o-mini | 63.8% | 3.20 s | 1.31 s | 1.6k |
| Full context | gpt-4o | 60.2% | 28.9 s | 6.01 s | 115k |
| Zep | gpt-4o | 71.2% | 2.58 s | 0.684 s | 1.6k |
Analysis by question type shows that gpt-4o-mini with Zep improves on four of the six categories, with the largest gains on the more complex question types: single-session preference, multi-session, and temporal reasoning. With gpt-4o, Zep additionally improves on the knowledge update category, indicating that it is more effective with more capable models. Additional development may be required to help less capable models make use of Zep's temporal data.
Table 3: LongMemEval breakdown by question type

| Question type | Model | Full context | Zep | Delta |
| --- | --- | --- | --- | --- |
| Single-session preference | gpt-4o-mini | 30.0% | 53.3% | 77.7%↑ |
| Single-session assistant | gpt-4o-mini | 81.8% | 75.0% | 9.06%↓ |
| Temporal reasoning | gpt-4o-mini | 36.5% | 54.1% | 48.2%↑ |
| Multi-session | gpt-4o-mini | 40.6% | 47.4% | 16.7%↑ |
| Knowledge update | gpt-4o-mini | 76.9% | 74.4% | 3.36%↓ |
| Single-session user | gpt-4o-mini | 81.4% | 92.9% | 14.1%↑ |
| Single-session preference | gpt-4o | 20.0% | 56.7% | 184%↑ |
| Single-session assistant | gpt-4o | 94.6% | 80.4% | 17.7%↓ |
| Temporal reasoning | gpt-4o | 45.1% | 62.4% | 38.4%↑ |
| Multi-session | gpt-4o | 44.3% | 57.9% | 30.7%↑ |
| Knowledge update | gpt-4o | 78.2% | 83.3% | 6.52%↑ |
| Single-session user | gpt-4o | 81.4% | 92.9% | 14.1%↑ |
These results demonstrate Zep's ability to improve performance across model scales, with the most significant improvements observed on complex and nuanced question types when paired with more capable models. The latency improvements are particularly notable, with Zep reducing response times by approximately 90% while maintaining higher accuracy.
The performance drop on single-session assistant questions (17.7% for gpt-4o and 9.06% for gpt-4o-mini) is a notable exception to Zep's otherwise consistent improvements and suggests the need for further research and engineering work.
5. Conclusion
We have presented Zep, a graph-based approach to LLM memory that combines semantic and episodic memory with entity and community summaries. Our evaluations show that Zep achieves state-of-the-art performance on existing memory benchmarks while reducing token costs and operating at significantly lower latency.
While the results achieved by Graphiti and Zep are impressive, they may only be preliminary advances in graph-based memory systems. Multiple avenues of research could build on these two frameworks, including the integration of other GraphRAG approaches into the Zep paradigm, as well as novel extensions of our work.
Research has demonstrated the value of fine-tuning models for LLM entity and edge extraction in the GraphRAG paradigm, improving accuracy while reducing cost and latency [19] [25]. Similarly, models fine-tuned for Graphiti's prompts may enhance knowledge extraction, especially for complex conversations. Furthermore, while current research on LLM-generated knowledge graphs operates primarily without formal ontologies [9] [4] [17] [19] [26], domain-specific ontologies hold significant potential. Graph ontologies, fundamental to pre-LLM knowledge graph work, deserve further exploration within the Graphiti framework.
Our search for suitable memory benchmarks revealed limited options; existing benchmarks typically lack robustness and sophistication, often defaulting to simple needle-in-a-haystack fact retrieval questions [3]. The field needs additional memory benchmarks, especially those reflecting business applications such as customer experience tasks, to effectively evaluate and differentiate memory approaches. Notably, existing benchmarks are insufficient to assess Zep's ability to process and synthesize conversation history together with structured business data. Although Zep focuses on LLM memory, its conventional RAG capabilities should also be evaluated against the benchmarks established in [17] [27] [28].
The current literature on LLM memories and RAG systems does not adequately address the issues of production system scalability in terms of cost and latency. We include latency benchmarking of retrieval mechanisms to begin to address this gap, following the example of the authors of LightRAG in prioritizing these metrics.
6. Appendix
6.1 Prompts for graph construction
6.1.1 Entity extraction
<PREVIOUS MESSAGES>
{previous_messages}
</PREVIOUS MESSAGES>
<CURRENT MESSAGE>
{current_message}
</CURRENT MESSAGE>

Given the above conversation, extract entity nodes that are explicitly or implicitly mentioned in the CURRENT MESSAGE:

Guidelines:
1. ALWAYS extract the speaker/actor as the first node. The speaker is the part before the colon in each line of dialogue.
2. Extract other significant entities, concepts, or actors mentioned in the CURRENT MESSAGE.
3. DO NOT create nodes for relationships or actions.
4. DO NOT create nodes for temporal information such as dates, times, or years (these will be added later as edges).
5. Be as explicit as possible in node names, using full names.
6. DO NOT extract entities mentioned only in the PREVIOUS MESSAGES.
6.1.2 Entity resolution
<PREVIOUS MESSAGES>
{previous_messages}
</PREVIOUS MESSAGES>
<CURRENT MESSAGE>
{current_message}
</CURRENT MESSAGE>
<EXISTING NODES>
{existing_nodes}
</EXISTING NODES>

Given the above EXISTING NODES, MESSAGE, and PREVIOUS MESSAGES, determine whether the NEW NODE extracted from the conversation is a duplicate of any of the EXISTING NODES.

<NEW NODE>
{new_node}
</NEW NODE>

Task:
1. If the NEW NODE represents the same entity as any node in EXISTING NODES, return `is_duplicate: true` in the response.
Otherwise, return `is_duplicate: false`.
2. If is_duplicate is true, also return the uuid of the duplicate node in the response.
3. If is_duplicate is true, return the most complete full name of the node as its name.

Guidelines:
1. Use both the name and summary of a node to determine whether it is a duplicate; duplicate nodes may have different names.
6.1.3 Fact extraction
<PREVIOUS MESSAGES>
{previous_messages}
</PREVIOUS MESSAGES>
<CURRENT MESSAGE>
{current_message}
</CURRENT MESSAGE>
<ENTITIES>
{entities}
</ENTITIES>
Given the above MESSAGES and ENTITIES, extract all facts from the CURRENT MESSAGE pertaining to the listed ENTITIES.

Guidelines:
1. Extract facts only between the provided entities.
2. Each fact should represent a clear relationship between two **distinct nodes**.
3. The relation_type should be a concise, all-caps description of the relationship (e.g., LOVES, IS_FRIENDS_WITH, WORKS_FOR).
4. Provide a more detailed fact description containing all relevant information.
5. Consider temporal aspects of relationships when relevant.
6.1.4 Fact resolution
Given the following context, determine whether the New Edge represents the same information as any edge in the list of Existing Edges.
<EXISTING EDGES>
{existing_edges}
</EXISTING EDGES>
<NEW EDGE>
{new_edge}
</NEW EDGE>
Task:
1. If the New Edge expresses the same factual information as any edge in Existing Edges, return `is_duplicate: true` in the response; otherwise return `is_duplicate: false`.
2. If `is_duplicate` is true, also return the uuid of the existing edge in the response.

Guidelines:
1. The facts need not be phrased identically to be duplicates; they only need to express the same information.
6.1.5 Time extraction
<PREVIOUS MESSAGES>
{previous_messages}
</PREVIOUS MESSAGES>
<CURRENT MESSAGE>
{current_message}
</CURRENT MESSAGE>
<REFERENCE TIMESTAMP>
{reference_timestamp}
</REFERENCE TIMESTAMP>
<FACT>
{fact}
</FACT>

IMPORTANT: Only extract time information if it is part of the provided fact; otherwise ignore any times mentioned.

Make sure to use the reference timestamp to determine exact dates wherever possible (relative times such as "10 years ago" or "2 minutes ago" must also be converted to exact datetimes).

If the relationship is not ongoing but a date can still be determined, set only the valid_at field.

Definitions:
- valid_at: the datetime at which the relationship described by the fact first became true or was established.
- invalid_at: the datetime at which the relationship described by the fact stopped being true or ended.

Task:
Analyze the conversation and determine whether there is date information related to this relationship fact. Only set the fields when the dates explicitly concern the establishment or change of the relationship.

Guidelines:
1. Use ISO 8601 format (YYYY-MM-DDTHH:MM:SS.SSSSSSZ) for datetimes.
2. Use the reference timestamp as the current time when reasoning about dates.
3. If the fact is stated in the present tense, use the reference timestamp as the valid_at date.
4. If there is no temporal information that establishes or changes the relationship, leave the fields empty (null).
5. Do not infer dates from related events. Only use dates that directly establish or change the relationship.
6. If a relative time directly related to the relationship is mentioned, calculate the actual datetime from the reference timestamp.
7. If only a date is mentioned without a specific time, default to 00:00:00 (midnight) on that day.
8. If only a year is mentioned, default to 00:00:00 on January 1st of that year.
9. Always include the timezone offset (use Z for UTC if no specific timezone is mentioned).
Reference:
https://arxiv.org/pdf/2501.13956
Zep: A Temporal Knowledge Graph Architecture for Agent Memory