Reinitialize project
commit 2db2cee569

@ -0,0 +1,9 @@
# LLM API settings (used to extract structured information from meeting minutes)
LLM_API_KEY=sk-your-api-key
LLM_BASE_URL=https://api.deepseek.com/v1
LLM_MODEL=deepseek-chat

# Embedding settings (used by the LlamaIndex vector store)
EMBEDDING_API_KEY=sk-your-embedding-key
EMBEDDING_BASE_URL=https://api.openai.com/v1
EMBEDDING_MODEL=text-embedding-3-small

@ -0,0 +1,4 @@
__pycache__/
obsidian_vault/
vector_store_data/
.env

@ -0,0 +1,4 @@
{
    "python-envs.defaultEnvManager": "ms-python.python:conda",
    "python-envs.defaultPackageManager": "ms-python.python:conda"
}

@ -0,0 +1,203 @@
# Meeting Minutes Long-Term Memory System

A long-term memory system for meeting minutes built on an LLM, a LlamaIndex vector store and an Obsidian knowledge graph, with **action-item status tracking** and **two-stage content deduplication**.

## Workflow

```
meeting-minutes.md
  │
  ├─ ① content-hash check ─────────── hit → skip
  ├─ ② semantic-similarity check ──── hit → [s] skip / [o] overwrite
  ├─ LLM structured extraction
  ├─ ③ title + date check ─────────── hit → [s] skip / [o] overwrite
  ├─ state merge ──→ meeting_state.json
  │                  (action items / metric history / content hashes)
  ├─ Obsidian Vault
  │    ├── Raw/
  │    ├── Meetings/
  │    ├── Entities/
  │    └── Graphs/
  └─ vector-index persistence
```
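
The ordering in the diagram is what saves API calls: checks ① and ② run before the LLM, so a duplicate never costs a request. A minimal sketch of that early-exit ordering follows; `run_pipeline`, the check callables and `extract` are hypothetical placeholders, not the project's actual functions.

```python
from typing import Callable, List, Optional, Tuple

# A check returns a reason string to stop early, or None to let the text through.
Check = Callable[[str], Optional[str]]


def run_pipeline(
    text: str,
    checks: List[Tuple[str, Check]],
    extract: Callable[[str], dict],
) -> Tuple[str, Optional[dict]]:
    """Run the cheap duplicate checks in order; call the expensive extractor only if all pass."""
    for name, check in checks:
        reason = check(text)
        if reason is not None:
            # Stop at the first check that fires; the extractor is never called.
            return f"stopped at {name}: {reason}", None
    return "extracted", extract(text)
```

Keeping the checks in an ordered list makes the cost ordering explicit. Check ③ (title + date) cannot join this list because it needs the title and date that only the LLM extraction produces, which is why it runs afterwards.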

## Quick start

```bash
cd meeting_memory

# 1. Install dependencies
# (Windows paths shown; on macOS/Linux use .venv/bin/pip and .venv/bin/python)
python -m venv .venv
.venv\Scripts\pip install -r requirements.txt

# 2. Configure the APIs
cp .env.example .env
# Edit .env and fill in your LLM and embedding API credentials

# 3. Process a meeting-minutes file
.venv\Scripts\python main.py process meeting-file.md

# 4. Open obsidian_vault/ in Obsidian to browse the knowledge graph
```

## Usage

### Interactive mode (recommended)

```bash
.venv\Scripts\python main.py
```

Once started, you can type a question directly to query. Supported commands:

| Command | Description |
|------|------|
| `query <question>` | Semantic search over meeting memory |
| `process <file path>` | Process a new meeting file |
| `stats` | Show statistics |
| `help` | Show help |
| `exit` / `quit` / `q` | Exit |

Any text that is not a command is treated as a query.
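
The interactive loop's dispatch rule (a known command prefix, otherwise a query) can be sketched as a pure function, which also makes it easy to test. `parse_command` and its `(command, argument)` return shape are hypothetical; the command words mirror the table above and `main.py`.

```python
from typing import Tuple

EXIT_WORDS = {"exit", "quit", "q"}


def parse_command(line: str) -> Tuple[str, str]:
    """Map one line of interactive input to (command, argument).

    Known commands are matched exactly or by prefix; anything else is a query.
    """
    line = line.strip()
    if not line:
        return ("noop", "")
    if line in EXIT_WORDS:
        return ("exit", "")
    if line in ("help", "stats"):
        return (line, "")
    for prefix in ("process ", "query "):
        if line.startswith(prefix):
            return (prefix.strip(), line[len(prefix):].strip())
    # Fallback: free text is treated as a semantic query.
    return ("query", line)
```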

### Command-line mode

```bash
# Process a meeting file (prompts to skip/overwrite if it is a duplicate)
python main.py process meeting_example.md

# Force overwrite (no prompt; old data is removed before reprocessing)
python main.py process meeting_example.md -f

# Semantic query (--top-k N sets the number of results, default 3)
python main.py query "What is the target value for the weak-light metric?"

# Show statistics
python main.py stats

# Process text passed directly on the command line
python main.py text "Today's meeting discussed..."

# Batch processing (duplicates still prompt interactively; -f is recommended to skip confirmation)
python main.py batch "meetings/*.md" -f
```

## Architecture

```
meeting_memory/
├── config.py              Configuration (LLM / embedding / Obsidian / vector store / state path)
├── extractor.py           LLM extraction of structured information from meeting minutes
│   ├── title, date, participants
│   ├── entities (people / organizations / metrics / concepts)
│   ├── relations (subject-predicate-object)
│   ├── action_items (task + assignee + deadline)
│   ├── metrics (metric + value + trend)
│   └── decisions (decision records)
├── meeting_state.py       ★ Cross-meeting state-tracking engine
│   ├── ActionItem: matched by a hash of task + assignee
│   ├── Metric: matched by a hash of metric_name + owner
│   ├── Change history (timeline)
│   ├── Automatic meeting-series detection (issue-number suffix stripped)
│   └── ★ Content-hash registry (content_hashes) to block duplicates
├── vector_store.py        LlamaIndex vector-index management
│   ├── Custom embedding adapter (works with any OpenAI-compatible API)
│   ├── Vector storage of meeting documents (including change history)
│   ├── Semantic retrieval (similarity_top_k)
│   ├── ★ Duplicate lookup + delete/overwrite by meeting_id
│   └── ★ Semantic-similarity duplicate check on the raw text (find_similar_text)
├── obsidian_manager.py    Obsidian vault generator
│   ├── Raw/: unprocessed source text (status: unprocessed/processed)
│   ├── Meetings/: full meeting notes + YAML frontmatter
│   ├── Entities/: entity notes (with action-item timelines)
│   └── Graphs/: knowledge-graph overview (MOC)
├── meeting_processor.py   Main pipeline orchestration
│   ├── content-hash check → semantic-similarity check → LLM extraction → state merge → Obsidian → vector store
│   ├── ★ Deduplication runs before the LLM call, avoiding wasted API calls
│   └── ★ skip/overwrite choice when a duplicate is found
├── main.py                CLI entry point (interactive mode + subcommands, -f to force overwrite)
├── requirements.txt       Dependencies
├── .env                   API keys and endpoints
├── vector_store_data/     Persisted vector index
└── obsidian_vault/        Obsidian knowledge base (open it directly in Obsidian)
    ├── .obsidian/         Obsidian settings (app.json, core-plugins.json)
    ├── meeting_state.json ★ Cross-meeting state (action items / metric history / content-hash registry)
    ├── Raw/               ★ Unprocessed source text (saved before processing)
    ├── Meetings/          Meeting notes *.md
    ├── Entities/          Entity notes *.md (with history timelines)
    └── Graphs/            Knowledge-graph overview
```
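
## Core capabilities

### 1. LLM structured extraction

Given raw meeting minutes, the LLM extracts:

- **Meeting metadata**: title, date, participants, agenda, summary
- **Entities**: people, departments, projects, KPI metrics, concepts and policies
- **Relations**: subject-predicate-object (e.g. `建维部 → 负责 → 网络运维`, i.e. Construction & Maintenance Dept. → is responsible for → network O&M)
- **Action items**: task description + assignee + deadline + priority
- **Metrics**: metric name + current value + target value + trend (`向好` / `持平` / `恶化`: improving / flat / worsening)
- **Decisions**: decision content + proposer + status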
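
In practice the reply that fills these fields is not always bare JSON: models often wrap it in a Markdown code fence or add a sentence around it. A minimal, standard-library sketch of a tolerant parser, assuming the reply contains a single top-level JSON object; `parse_llm_json` and the list-field defaulting are illustrative, not the project's `extractor.py`:

```python
import json
import re

# List-valued fields that should default to [] when the model omits them (assumed set).
LIST_FIELDS = ("participants", "agenda", "entities", "relations",
               "action_items", "decisions", "metrics")


def parse_llm_json(reply: str) -> dict:
    """Parse an LLM reply that should contain one JSON object."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        # Fall back to the span from the first "{" to the last "}",
        # which skips a surrounding code fence or explanatory prose.
        match = re.search(r"\{.*\}", reply, re.DOTALL)
        if match is None:
            raise ValueError("no JSON object found in LLM reply")
        data = json.loads(match.group())
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for field in LIST_FIELDS:
        data.setdefault(field, [])
    return data
```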

### 2. LlamaIndex vector retrieval

- Meeting content is stored as vectors
- Natural-language semantic queries are supported
- The index is persisted and reloaded automatically on restart
- Works with any OpenAI-compatible embedding API
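
Semantic retrieval ranks stored chunks by how close their embedding is to the query embedding and keeps the best `similarity_top_k`. This dependency-free sketch shows only that ranking step with cosine similarity, assuming the vectors are already computed; it does not reproduce LlamaIndex's retriever, node storage or persistence:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          store: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k stored ids most similar to the query, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```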
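
### 3. Cross-meeting action-item tracking

- Each action item gets a stable hash ID derived from `task + assignee`
- Matching uses the exact task text and assignee across all meetings, so a task the LLM rewords later counts as a new item; each item also records its meeting series (the title with the "第X期" issue suffix removed automatically)
- The full status history is kept: `待办` (to do) → `进行中` (in progress) → `已完成` (done)
- Obsidian notes show the complete timeline
- `meeting_state.json` persists all history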
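
The tracking rules above fit in a few lines: strip the issue suffix to get a series name, hash `task|assignee` into a short ID (MD5, first 8 hex characters, as `meeting_state.py` does), and keep one history entry per meeting. The sketch uses a simplified, hypothetical `track` function and a plain dict instead of `MeetingStateStore`; its single combined regex is an assumption that approximates the four patterns in `_detect_series`. Status values are the Chinese literals the extractor uses by default.

```python
import hashlib
import re
from typing import Dict, List


def series_name(title: str) -> str:
    """Drop an issue marker such as "(2026第X期)" so recurring meetings share one series."""
    cleaned = re.sub(r"[((]?\d{4}第\w+期[))]?", "", title)
    return cleaned.strip("-_ ") or title


def item_id(task: str, assignee: str) -> str:
    """Stable 8-hex-character ID: identical task text + assignee gives an identical ID."""
    return hashlib.md5(f"{task}|{assignee}".encode("utf-8")).hexdigest()[:8]


def track(state: Dict[str, dict], task: str, assignee: str,
          status: str, meeting: str, date: str) -> List[dict]:
    """Record one sighting of an action item and return its history.

    A repeated (task, assignee) pair extends the existing history; seeing the
    same meeting again replaces that meeting's entry instead of duplicating it.
    """
    iid = item_id(task, assignee)
    entry = {"date": date, "meeting": meeting, "status": status}
    record = state.setdefault(iid, {"task": task, "assignee": assignee, "history": []})
    record["history"] = [h for h in record["history"] if h["meeting"] != meeting]
    record["history"].append(entry)
    record["latest"] = entry
    return record["history"]
```

Because the ID hashes the exact text, widening the match (for example by normalizing task wording before hashing) would trade missed matches for the risk of merging distinct tasks.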
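
### 4. Two-stage content deduplication

Two duplicate checks run before the LLM is called, so repeated content does not pollute the memory store; a third check acts as a fallback after extraction:

- **① Content-hash fingerprint**: exact match on the SHA-256 of the normalized text (surrounding whitespace stripped, CRLF converted to LF). It catches identical files or text cheaply and without false positives, but any edit, even a single character, changes the hash.
- **② Semantic similarity**: triggered when the cosine similarity between embeddings of the raw texts exceeds 0.92; catches different transcriptions of the same meeting
- **③ Title + date check** (fallback): after LLM extraction, the system looks for a meeting with the same title and date in the vector store or among existing Obsidian notes
- A ① hit is skipped without asking; on a ② or ③ hit you choose **[s] skip** or **[o] overwrite**
- Overwrite mode deletes the old vector nodes, the old Obsidian note and the old hash registration, then reprocesses
- The `-f / --force` flag skips all confirmations: checks ① and ② are bypassed and a ③ hit is overwritten automatically, which suits batch processing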
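
Checks ① and ② reduce to a small decision function. This sketch mirrors the normalization in `meeting_processor.py` (`strip()` plus CRLF to LF before SHA-256) and the "above 0.92" rule; `dedup_decision`, its return labels and the `best_similarity` argument (the highest cosine score against stored meetings, supplied by the caller) are hypothetical:

```python
import hashlib
from typing import Optional, Set

SIMILARITY_THRESHOLD = 0.92  # the threshold meeting_processor.py passes to find_similar_text


def content_hash(text: str) -> str:
    """SHA-256 of the text after stripping outer whitespace and converting CRLF to LF."""
    normalized = text.strip().replace("\r\n", "\n")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedup_decision(text: str, known_hashes: Set[str],
                   best_similarity: Optional[float]) -> str:
    """Return "skip", "ask" or "process" for a newly submitted text.

    best_similarity is None when there is nothing to compare against.
    """
    if content_hash(text) in known_hashes:
        return "skip"  # check ①: exact duplicate, skipped without asking
    if best_similarity is not None and best_similarity > SIMILARITY_THRESHOLD:
        return "ask"   # check ②: probably another transcript of the same meeting
    return "process"
```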
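
### 5. Obsidian knowledge graph

- Generates a complete Obsidian vault automatically
- Every entity gets its own note, connected through `[[Wiki Link]]`s (Obsidian shows the backlinks)
- Action items in entity notes show the **latest status + change history**
- Open Obsidian's Graph View to see the entity-relation network
- The knowledge-graph overview note provides a global index
- The `.obsidian/` settings are generated automatically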
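
A generated entity note boils down to YAML frontmatter plus `[[...]]` links, which Graph View draws as edges. The layout below (the `type` and `tags` keys and the "Related" / "Mentioned in" sections) is an assumption for illustration; `obsidian_manager.py` is only partially shown here, so its real note format may differ:

```python
import json
from typing import List


def render_entity_note(name: str, entity_type: str, description: str,
                       related: List[str], meetings: List[str]) -> str:
    """Render one entity as a Markdown note with YAML frontmatter and wiki links."""
    lines = [
        "---",
        # json.dumps quotes the value, which keeps the YAML valid even if it contains ':'.
        f"type: {json.dumps(entity_type, ensure_ascii=False)}",
        "tags: [entity]",
        "---",
        f"# {name}",
        "",
        description,
        "",
        "## Related",
    ]
    # Each [[...]] link becomes an edge in Obsidian's Graph View.
    lines += [f"- [[{other}]]" for other in related]
    lines += ["", "## Mentioned in"]
    lines += [f"- [[{meeting}]]" for meeting in meetings]
    return "\n".join(lines) + "\n"
```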

## Configuration

Edit `.env`:

```ini
# LLM API (used for structured extraction)
LLM_API_KEY=sk-xxx
LLM_BASE_URL=https://api.deepseek.com/v1
LLM_MODEL=deepseek-chat

# Embedding API (used for vector retrieval)
EMBEDDING_API_KEY=sk-xxx
EMBEDDING_BASE_URL=https://api.openai.com/v1
EMBEDDING_MODEL=text-embedding-3-small
```
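
`config.py` falls back to an empty string when a key is missing, so a forgotten key or an unedited placeholder only shows up once the OpenAI client is created or the first API call is rejected. A small fail-fast check is one option; `missing_settings` is not part of the project, and the placeholder prefixes are assumptions taken from the `sk-xxx` values above and the `sk-your-...` values in the template copied to `.env`:

```python
import os
from typing import List, Mapping

REQUIRED_KEYS = ("LLM_API_KEY", "EMBEDDING_API_KEY")
# Assumed placeholder prefixes, based on the template values in this repository.
PLACEHOLDER_PREFIXES = ("sk-your-", "sk-xxx")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return required keys that are unset, blank, or still hold a template placeholder."""
    problems = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        if not value or value.startswith(PLACEHOLDER_PREFIXES):
            problems.append(key)
    return problems
```

Call it right after `load_dotenv()` and stop with a clear message when the returned list is non-empty.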

## Dependencies

- `openai`: LLM calls
- `pydantic`: structured data models
- `llama-index`: vector index and semantic retrieval
- `chromadb`: vector database backend
- `python-dotenv`: environment-variable management
- `pyvis`: graph visualization (optional extension)

@ -0,0 +1,44 @@
import os

from pydantic import BaseModel, Field
from dotenv import load_dotenv

# Load .env before the defaults below read os.environ.
load_dotenv()

PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))


class LLMConfig(BaseModel):
    api_key: str = Field(default=os.getenv("LLM_API_KEY", ""))
    base_url: str = Field(default=os.getenv("LLM_BASE_URL", "https://api.deepseek.com/v1"))
    model: str = Field(default=os.getenv("LLM_MODEL", "deepseek-chat"))
    # deepseek-chat accepts at most 8192 output tokens; raise this only for models that allow more.
    max_tokens: int = Field(default=8192)
    temperature: float = Field(default=0.95)


class EmbeddingConfig(BaseModel):
    api_key: str = Field(default=os.getenv("EMBEDDING_API_KEY", ""))
    api_base: str = Field(default=os.getenv("EMBEDDING_BASE_URL", "https://api.openai.com/v1"))
    model: str = Field(default=os.getenv("EMBEDDING_MODEL", "text-embedding-3-small"))


class ObsidianConfig(BaseModel):
    vault_path: str = Field(default=os.path.join(PROJECT_ROOT, "obsidian_vault"))
    meetings_dir: str = Field(default="Meetings")
    entities_dir: str = Field(default="Entities")
    graphs_dir: str = Field(default="Graphs")
    raw_dir: str = Field(default="Raw")


class VectorStoreConfig(BaseModel):
    persist_dir: str = Field(default=os.path.join(PROJECT_ROOT, "vector_store_data"))


class ProjectConfig(BaseModel):
    llm: LLMConfig = Field(default_factory=LLMConfig)
    embedding: EmbeddingConfig = Field(default_factory=EmbeddingConfig)
    obsidian: ObsidianConfig = Field(default_factory=ObsidianConfig)
    vector_store: VectorStoreConfig = Field(default_factory=VectorStoreConfig)
    # Cross-meeting state (action items, metric history, content hashes) lives inside the vault.
    state_path: str = Field(default=os.path.join(PROJECT_ROOT, "obsidian_vault", "meeting_state.json"))


config = ProjectConfig()

@ -0,0 +1,157 @@
import json
import logging
import re
from typing import List

from pydantic import BaseModel
from openai import OpenAI

from config import config

logger = logging.getLogger(__name__)

client = OpenAI(
    api_key=config.llm.api_key or None,
    base_url=config.llm.base_url or None,
)


class Entity(BaseModel):
    name: str
    entity_type: str
    description: str = ""


class Relation(BaseModel):
    subject: str
    subject_type: str
    predicate: str
    object: str
    object_type: str
    description: str = ""


class ActionItem(BaseModel):
    task: str
    assignee: str = ""
    deadline: str = ""
    status: str = "待办"  # "to do"
    priority: str = "中"  # "medium"


class Decision(BaseModel):
    content: str
    proposer: str = ""
    status: str = "已决"  # "decided"


class MeetingMetric(BaseModel):
    metric_name: str
    value: str
    target: str = ""
    owner: str = ""
    trend: str = ""


class MeetingExtraction(BaseModel):
    title: str
    date: str = ""
    participants: List[str] = []
    agenda: List[str] = []
    entities: List[Entity] = []
    relations: List[Relation] = []
    action_items: List[ActionItem] = []
    decisions: List[Decision] = []
    metrics: List[MeetingMetric] = []
    summary: str = ""


EXTRACTION_SYSTEM_PROMPT = """
You are an expert at extracting information from meeting minutes. Your task is to extract structured information from Chinese meeting records and return it strictly in the required JSON format.

## What to extract

### 1. Entities
- People: attendees and anyone mentioned
- Organizations/departments: companies, departments, teams
- Projects/tasks: ongoing projects and tasks
- Metrics/KPIs: key performance indicators (e.g. conversion rate, order-cancellation rate)
- Concepts/policies: management concepts and policy requirements
- Locations: meeting venues and project sites

### 2. Relations (subject-predicate-object)
Extract factual relations, for example:
- {"subject": "建维部", "subject_type": "组织", "predicate": "负责", "object": "网络运维", "object_type": "任务", "description": ""}
- {"subject": "弱光指标", "subject_type": "指标", "predicate": "目标值", "object": "0.5以下", "object_type": "数值", "description": ""}

### 3. Action items
Who is responsible for which task, the deadline and the priority

### 4. Decisions
Decisions made and conclusions reached

### 5. Metric data
Concrete numeric metrics: current value, target value, owner, trend (向好/持平/恶化, i.e. improving/flat/worsening)

## Rules
- Extract factual information only
- Filter out metaphors, hypotheticals and subjective judgments
- Extract numeric metrics exactly
- Return an empty array for entities, relations, action_items, decisions or metrics when there are none
- Keep extracted values in the language of the source minutes
"""


def _call_llm(system: str, user: str) -> str:
    """Send one system + user message pair and return the reply text."""
    response = client.chat.completions.create(
        model=config.llm.model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        max_tokens=config.llm.max_tokens,
        temperature=config.llm.temperature,
    )
    content = response.choices[0].message.content
    if content is None:
        raise ValueError("LLM returned an empty response")
    return content


def extract_meeting_info(text: str) -> MeetingExtraction:
    user_prompt = f"""
Extract structured information from the following meeting record.

JSON fields:
- title: meeting title
- date: meeting date
- participants: list of participants
- agenda: list of agenda items
- entities: list of entities, each with name, entity_type and description
- relations: list of relations, each with subject, subject_type, predicate, object, object_type and description
- action_items: list of action items, each with task, assignee, deadline, status and priority
- decisions: list of decisions, each with content, proposer and status
- metrics: list of metrics, each with metric_name, value (current value), target (target value), owner and trend
- summary: meeting summary

Return the JSON object only, with no additional explanation.

Meeting record:
{text}
"""
    content = _call_llm(EXTRACTION_SYSTEM_PROMPT, user_prompt)
    data = _try_parse_json(content)
    return MeetingExtraction(**data)


def _try_parse_json(content: str) -> dict:
    """Parse the reply as JSON, falling back to the outermost {...} span
    (e.g. when the model wraps the JSON in a Markdown code fence or adds prose)."""
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        logger.warning("JSON parsing failed, trying to recover...")

    match = re.search(r"\{.*\}", content, re.DOTALL)
    if not match:
        # Previously this case fell through and returned None, causing a TypeError later.
        raise ValueError("No JSON object found in the LLM response")
    try:
        return json.loads(match.group())
    except json.JSONDecodeError as e:
        logger.error(f"Recovered JSON still could not be parsed: {e}")
        raise

@ -0,0 +1,226 @@
import argparse
import logging
import os
import sys

# Windows consoles often default to GBK; switch to UTF-8 so emoji and Chinese output print correctly.
if (sys.stdout.encoding or "").lower() == "gbk":
    sys.stdout.reconfigure(encoding="utf-8")

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    datefmt="%H:%M:%S",
)
logger = logging.getLogger(__name__)

HELP_TEXT = """Available commands:
  query <question>   semantic search over meeting memory
  process <path>     process a meeting Markdown file
  stats              show system statistics
  help               show this help
  exit/quit          exit"""


def cmd_process(args):
    from meeting_processor import meeting_processor

    filepath = args.file
    if not os.path.exists(filepath):
        print(f"Error: file not found: {filepath}")
        sys.exit(1)

    print(f"Processing meeting file: {filepath}")
    vault_path = meeting_processor.process_meeting_file(filepath, force=getattr(args, "force", False))

    if vault_path:
        print("\n✅ Meeting processed!")
        print(f"📝 Obsidian note: {vault_path}")
        print(f"📂 Notes folder: {os.path.dirname(vault_path)}")
    else:
        print("\n❌ Meeting was skipped or failed to process")
        sys.exit(1)


def cmd_text(args):
    from meeting_processor import meeting_processor

    print("Processing meeting text...")
    vault_path = meeting_processor.process_meeting_text(args.text, force=getattr(args, "force", False))

    if vault_path:
        print("\n✅ Meeting processed!")
        print(f"📝 Obsidian note: {vault_path}")
    else:
        print("\n❌ Meeting was skipped or failed to process")
        sys.exit(1)


def cmd_query(args):
    from meeting_processor import meeting_processor

    question = args.question
    print(f"🔍 Query: {question}")
    print("-" * 40)
    result = meeting_processor.query(question, top_k=args.top_k)
    if result:
        print(result)
    else:
        print("No relevant information found")


def cmd_stats(args):
    from meeting_processor import meeting_processor

    stats = meeting_processor.stats()
    print("📊 Meeting memory statistics")
    print("-" * 40)
    print(f"Obsidian meeting notes: {stats.get('obsidian_meetings', 0)}")
    print(f"Obsidian entity notes: {stats.get('obsidian_entities', 0)}")
    print(f"Vector index nodes: {stats.get('vector_index', {}).get('node_count', 0)}")
    print(f"Vault path: {stats.get('vault_path', '')}")


def cmd_batch(args):
    import glob

    from meeting_processor import meeting_processor

    pattern = args.pattern
    files = glob.glob(pattern, recursive=True)
    force = getattr(args, "force", False)

    if not files:
        print(f"No files matched: {pattern}")
        sys.exit(1)

    print(f"Found {len(files)} files, starting batch processing...")
    success = 0
    for f in files:
        print(f"\nProcessing: {f}")
        try:
            # process_meeting_file returns None when a file is skipped as a duplicate or fails,
            # so only count files that actually produced a note.
            if meeting_processor.process_meeting_file(f, force=force):
                success += 1
        except Exception as e:
            logger.error(f"Failed to process {f}: {e}")

    print(f"\n✅ Batch processing finished: {success}/{len(files)} processed")


def cmd_interactive(args=None):
    from meeting_processor import meeting_processor

    print("📋 Meeting Minutes Long-Term Memory System: interactive mode")
    print("=" * 50)
    print(HELP_TEXT)
    print("=" * 50)

    while True:
        try:
            line = input("\n> ").strip()
        except (EOFError, KeyboardInterrupt):
            print()
            break

        if not line:
            continue
        if line in ("exit", "quit", "q"):
            break
        if line == "help":
            print(HELP_TEXT)
            continue
        if line == "stats":
            stats = meeting_processor.stats()
            print(f"📊 Meetings: {stats.get('obsidian_meetings', 0)} | "
                  f"Entities: {stats.get('obsidian_entities', 0)} | "
                  f"Vector nodes: {stats.get('vector_index', {}).get('node_count', 0)}")
            continue
        if line.startswith("process "):
            filepath = line[len("process "):].strip()
            if not os.path.exists(filepath):
                print(f"❌ File not found: {filepath}")
                continue
            print(f"Processing: {filepath}")
            vault_path = meeting_processor.process_meeting_file(filepath)
            if vault_path:
                print(f"✅ Done: {vault_path}")
            else:
                print("❌ Skipped or failed")
            continue

        # Anything that is not a command is treated as a query.
        question = line[len("query "):].strip() if line.startswith("query ") else line
        print("🔍 Searching...", end="", flush=True)
        result = meeting_processor.query(question, top_k=3)
        print("\r" + " " * 30 + "\r", end="")  # erase the "Searching..." indicator
        if result:
            print(result[:2000])
            if len(result) > 2000:
                print("... (result truncated)")
        else:
            print("No relevant information found")
    print("bye!")


def main():
    parser = argparse.ArgumentParser(
        description="📋 Meeting Minutes Long-Term Memory System",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python main.py process meeting_example.md
  python main.py query "What is the target value for the weak-light metric?"
  python main.py stats
  python main.py text "Today's meeting discussed..."

Runs in interactive mode when no arguments are given.

Powered by LlamaIndex + Obsidian + LLM
""",
    )
    subparsers = parser.add_subparsers(dest="command", help="subcommands")

    p_process = subparsers.add_parser("process", help="process a meeting Markdown file")
    p_process.add_argument("file", help="path to the meeting-minutes Markdown file")
    p_process.add_argument("-f", "--force", action="store_true", help="overwrite duplicates without asking")

    p_text = subparsers.add_parser("text", help="process meeting text given on the command line")
    p_text.add_argument("text", help="meeting text")
    p_text.add_argument("-f", "--force", action="store_true", help="overwrite duplicates without asking")

    p_query = subparsers.add_parser("query", help="semantic search over meeting memory")
    p_query.add_argument("question", help="question to ask")
    p_query.add_argument("--top-k", type=int, default=3, help="number of results to return")

    subparsers.add_parser("stats", help="show system statistics")

    p_batch = subparsers.add_parser("batch", help="process meeting files in bulk")
    p_batch.add_argument("pattern", help="file glob pattern, e.g. 'meetings/*.md'")
    p_batch.add_argument("-f", "--force", action="store_true", help="overwrite duplicates without asking")

    args = parser.parse_args()

    # No subcommand (args.command is None) falls back to interactive mode.
    handlers = {
        "process": cmd_process,
        "text": cmd_text,
        "query": cmd_query,
        "stats": cmd_stats,
        "batch": cmd_batch,
    }
    handlers.get(args.command, cmd_interactive)(args)


if __name__ == "__main__":
    main()

@ -0,0 +1,75 @@
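
# Meeting Minutes

Topic: Hechuan Branch Weekly Meeting (2026 Issue X)

Time: May 6, 2026, 13:37 to 14:23

Venue: Branch conference room

Chair: AlanPaine

Attendees: Branch leadership, department managers and relevant staff

Agenda:

1. Department reports

2. Instructions and assignments from branch leadership

---

## Meeting Content

### 1. Department reports

The heads of the Construction & Maintenance Department, the General Affairs Department and the Business Customer Market reported in the order set by the agenda. Construction & Maintenance: broadband installation is behind schedule because of the weather; the weak-light metric is at 0.51 and keeps improving; third-generation terminals must keep being reduced toward the annual target of 5.5; the monthly conversion rate of Project 90 (九零工程) is 87.35%, close to the 90% target, with an order-cancellation rate of 6.53%; a rate-limiting mechanism is being coordinated for the school-egress problem on PCDN dedicated lines; removal of secondary base stations is expected to be completed by mid-April. General Affairs: the task list and investment plan for the Construction Committee have been reported; printing equipment has been arranged to support bidding needs; union funds have been cut and must be spent under strict review; the canteen renovation and the plan to introduce self-service drinking machines are in progress; recruitment and rehearsals for the formation team of the 4th Sports and Culture Festival have been arranged. Business Customer Market: February revenue was 885,000 yuan, an increase; Phase II demolition of the Phase III project has been completed for 1,145 households; 5 cleaning-service sessions have been held at communities and organizations, but the number of signed contracts needs to improve.

---

### 2. Key points and assignments

#### Head of the Construction & Maintenance Department:

1. **Network operations and metric control:**
   - The weak-light metric at 0.51 keeps improving; third-generation terminals must keep being reduced toward the annual target of 5.5; FPTR has met its target, but proactive crossing (主动过境) at 0.3 ranks near the bottom.
   - Project 90's monthly conversion rate of 87.35% is close to the 90% target; the 6.53% cancellation rate is mainly due to customer reasons and rescheduled appointments, and the construction team has been advised to streamline the BtoC review and order-cancellation process.
   - PCDN dedicated lines keep deteriorating because of school egress-bandwidth problems, and a rate-limiting mechanism is being coordinated; the over-frequency base-station fault was handled promptly; dedicated-line inspections are proceeding as planned.

2. **Work feedback and execution requirements:**
   - Build the habit of "clearing each day's work the same day"; work replies must be quantified and include results and measures, and tasks must never sit untouched for months.
   - Because key volume-growth metrics lack supporting measures, a concrete, workable plan with named owners must be delivered within this week.

---

#### Head of the General Affairs Department:

1. **Union funds and logistics:**
   - Union funds have been cut across the board and must be spent under strict review, doing more with less. The "soft" projects (locker rooms, canteen renovation) have been confirmed and reported to the municipal company, and self-service drinking machines are to be introduced to solve the drinking-water problem and save costs.

2. **Award application strategy:**
   - For the Hechuan District "Advanced Collective and Advanced Individual for Taking Responsibility" application, most criteria are qualitative; consult the district leaders or the leaders in charge first to confirm their intentions before applying, rather than submitting blindly and wasting resources.

---

#### Head of the Market Department:

1. **Closing the quarter and planning Q2:**
   - The Market Department must plan ahead for the quarter close and Q2 campaigns, break with the off-season mindset, and push hard to heat up business-customer, H-business and AI Legion activities.
   - Within this week, report on hiring, rural channel progress and the marketing plan; this Saturday morning, give a video report on preparations for the sports meet.

2. **Satisfaction and assessment control:**
   - Reflect seriously on the inadequate groundwork before the satisfaction survey, and have supervisors take charge personally. Clarify the satisfaction and complaint assessment standards; dissatisfied customers must be handled and escalated strictly at the 5:30 and 5:35 checkpoints.
   - Business-customer managers must send a daily report via WeChat and follow up on assessment details so that metrics stay under control.

---

#### Principal branch leader:

1. **Strengthen execution and work style:**
   - All departments and front-line staff must drop the attitude of "knowing how to do it but not doing it" and keep going until the job is done. Leaders in charge must step up supervision of departments such as the Government & Enterprise Department and communicate in person when necessary.

2. **Investigate annual assessments early:**
   - For the four companies' annual assessment and the improvement of group-level metrics, every department must study the assessment rules, and any factors that could have a major negative impact, in advance and report them promptly, rather than reacting after the rules are finalized.
   - The municipal company will take the branch's overall situation into account; plan ahead and win from the starting line.
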
@ -0,0 +1,171 @@
import hashlib
import logging
import os
from typing import Optional

from extractor import extract_meeting_info, MeetingExtraction
from vector_store import meeting_vector_store
from obsidian_manager import obsidian_manager
from meeting_state import MeetingStateStore
from config import config

logger = logging.getLogger(__name__)

state_store = MeetingStateStore(config.state_path)


class MeetingProcessor:
    def process_meeting_file(self, filepath: str, force: bool = False) -> Optional[str]:
        with open(filepath, "r", encoding="utf-8") as f:
            text = f.read()
        return self.process_meeting_text(text, force=force)

    def process_meeting_text(self, text: str, force: bool = False) -> Optional[str]:
        """Run the full pipeline; return the Obsidian note path, or None if skipped or failed."""
        content_hash = self._compute_content_hash(text)

        # ① Exact duplicate: skip before spending an LLM call.
        if not force and state_store.has_content_hash(content_hash):
            print("\n⚠️ Duplicate content detected (content fingerprint match), skipping")
            logger.info(f"Duplicate content hash, skipping: {content_hash[:12]}")
            return None

        # ② Near-duplicate, e.g. another transcript of the same meeting.
        if not force:
            similar = meeting_vector_store.find_similar_text(text, threshold=0.92)
            if similar:
                meta = similar["metadata"]
                title = meta.get("title", "")
                print(f'\n⚠️ Found a highly similar existing meeting: "{title}" '
                      f'({meta.get("date", "")}), similarity {similar["score"]:.2%}')
                if not self._ask_overwrite():
                    logger.info(f"Skipping similar meeting: {title}")
                    return None
                logger.info("Overwriting and reprocessing the similar meeting")
                force = True

        meeting_data = self._extract(text)
        if not meeting_data:
            logger.error("Meeting information extraction failed")
            return None

        data_dict = meeting_data.model_dump()
        meeting_title = data_dict.get("title", "")
        meeting_date = data_dict.get("date", "")
        data_dict["_content_hash"] = content_hash

        # ③ A meeting with the same title + date is already stored.
        if self._handle_duplicate(data_dict, force):
            return None

        raw_path = obsidian_manager.save_raw_text(text, title=meeting_title, date=meeting_date)
        data_dict["_original_text"] = text
        data_dict["_original_text_path"] = raw_path

        meeting_filename = obsidian_manager._meeting_filename(data_dict)
        data_dict["action_items"] = state_store.merge_action_items(
            data_dict.get("action_items", []), meeting_title, meeting_date, meeting_filename,
        )
        data_dict["metrics"] = state_store.merge_metrics(
            data_dict.get("metrics", []), meeting_title, meeting_date, meeting_filename,
        )

        vault_path = obsidian_manager.add_meeting(data_dict, text)
        meeting_vector_store.add_meeting(data_dict)

        # Register the hash and mark the raw text processed only after every write succeeded,
        # so a crash midway does not make the meeting look already processed on the next run.
        state_store.add_content_hash(content_hash, meeting_title, meeting_date, meeting_filename)
        state_store.save()
        obsidian_manager.mark_raw_processed(raw_path)

        logger.info(f"Meeting processed: {meeting_data.title}")
        return vault_path

    @staticmethod
    def _ask_overwrite() -> bool:
        """Prompt until the user picks [s]kip (default) or [o]verwrite; True means overwrite."""
        while True:
            choice = input("  Choose [s] skip / [o] overwrite (default s): ").strip().lower() or "s"
            if choice in ("s", "o"):
                return choice == "o"
            print("  Please enter s (skip) or o (overwrite)")

    def _handle_duplicate(self, data_dict: dict, force: bool) -> bool:
        """Return True if processing should stop because a same-title/date meeting exists."""
        title = data_dict.get("title", "")
        date = data_dict.get("date", "")

        existing = meeting_vector_store.find_meeting(title, date)
        file_exists = obsidian_manager.meeting_file_exists(data_dict)
        if not existing and not file_exists:
            return False

        if force:
            logger.info(f'Duplicate meeting "{title}" found, overwriting (--force)')
            self._remove_old(data_dict)
            return False

        print(f'\n⚠️ Duplicate meeting found: "{title}" ({date})')
        if self._ask_overwrite():
            logger.info(f"Overwriting and reprocessing: {title}")
            self._remove_old(data_dict)
            return False
        logger.info(f"Skipping duplicate meeting: {title}")
        return True

    def _remove_old(self, data_dict: dict):
        """Delete the previous version: vector nodes, Obsidian note and content-hash registrations."""
        meeting_id = meeting_vector_store._meeting_id(data_dict)
        meeting_vector_store.remove_meeting(meeting_id)
        obsidian_manager.remove_meeting_note(data_dict)

        # Drop the current text's hash plus any hash registered for the same meeting note,
        # so the old version's fingerprint does not outlive its data.
        filename = obsidian_manager._meeting_filename(data_dict)
        stale = [h for h, info in state_store._state["content_hashes"].items()
                 if info.get("filename") == filename]
        content_hash = data_dict.get("_content_hash", "")
        if content_hash:
            stale.append(content_hash)
        for h in stale:
            state_store.remove_content_hash(h)
        logger.info(f"Old data removed: {data_dict.get('title', '')}")

    def _compute_content_hash(self, text: str) -> str:
        # Normalize outer whitespace and line endings so CRLF/LF copies hash identically.
        normalized = text.strip().replace("\r\n", "\n")
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def _extract(self, text: str) -> Optional[MeetingExtraction]:
        try:
            return extract_meeting_info(text)
        except Exception as e:
            logger.error(f"LLM extraction failed: {e}")
            return None

    def query(self, question: str, top_k: int = 3) -> str:
        return meeting_vector_store.query_as_context(question, top_k=top_k)

    def stats(self) -> dict:
        vault = config.obsidian.vault_path
        meetings_dir = os.path.join(vault, config.obsidian.meetings_dir)
        entities_dir = os.path.join(vault, config.obsidian.entities_dir)

        meeting_files = [f for f in os.listdir(meetings_dir) if f.endswith(".md")] if os.path.exists(meetings_dir) else []
        entity_files = [f for f in os.listdir(entities_dir) if f.endswith(".md")] if os.path.exists(entities_dir) else []

        return {
            "obsidian_meetings": len(meeting_files),
            "obsidian_entities": len(entity_files),
            "vector_index": meeting_vector_store.get_stats(),
            "state": state_store.get_stats(),
            "vault_path": vault,
        }


meeting_processor = MeetingProcessor()

@ -0,0 +1,189 @@
|
|||
import hashlib
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
from datetime import datetime
|
||||
from typing import Dict, List, Optional
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def _item_id(task: str, assignee: str) -> str:
|
||||
raw = f"{task}|{assignee}"
|
||||
return hashlib.md5(raw.encode("utf-8")).hexdigest()[:8]
|
||||
|
||||
|
||||
def _metric_id(metric_name: str, owner: str) -> str:
|
||||
raw = f"{metric_name}|{owner}"
|
||||
return hashlib.md5(raw.encode("utf-8")).hexdigest()[:8]
|
||||
|
||||
|
||||
class MeetingStateStore:
|
||||
def __init__(self, state_path: str):
|
||||
self.state_path = state_path
|
||||
self._state = self._load()
|
||||
|
||||
def _load(self) -> dict:
|
||||
if os.path.exists(self.state_path):
|
||||
try:
|
||||
with open(self.state_path, "r", encoding="utf-8") as f:
|
||||
return json.load(f)
|
||||
except Exception as e:
|
||||
logger.warning(f"加载状态文件失败,将创建新状态: {e}")
|
||||
return {
|
||||
"action_items": {},
|
||||
"metrics": {},
|
||||
"meeting_series": {},
|
||||
"content_hashes": {},
|
||||
}
|
||||
|
||||
def save(self):
|
||||
os.makedirs(os.path.dirname(self.state_path), exist_ok=True)
|
||||
with open(self.state_path, "w", encoding="utf-8") as f:
|
||||
json.dump(self._state, f, ensure_ascii=False, indent=2)
|
||||
|
||||
def _ensure_series(self, meeting_title: str, meeting_date: str) -> str:
|
||||
series_name = self._detect_series(meeting_title)
|
||||
series = self._state["meeting_series"].get(series_name)
|
||||
if not series:
|
||||
series = {"latest_date": meeting_date, "processed_titles": []}
|
||||
self._state["meeting_series"][series_name] = series
|
||||
if meeting_date > series.get("latest_date", ""):
|
||||
series["latest_date"] = meeting_date
|
||||
if meeting_title not in series["processed_titles"]:
|
||||
series["processed_titles"].append(meeting_title)
|
||||
return series_name
|
||||
|
||||
def _detect_series(self, title: str) -> str:
|
||||
import re
|
||||
cleaned = re.sub(r"(\d{4}第\w+期)", "", title)
|
||||
cleaned = re.sub(r"\(\d{4}第\w+期\)", "", cleaned)
|
||||
cleaned = re.sub(r"\d{4}第\w+期", "", cleaned)
|
||||
cleaned = re.sub(r"\d{4}年第\w+次", "", cleaned)
|
||||
cleaned = cleaned.strip("-_ ")
|
||||
return cleaned or title
|
||||
|
||||
def merge_action_items(
|
||||
self,
|
||||
new_items: List[dict],
|
||||
meeting_title: str,
|
||||
meeting_date: str,
|
||||
meeting_filename: str,
|
||||
) -> List[dict]:
|
||||
series_name = self._ensure_series(meeting_title, meeting_date)
|
||||
merged = []
|
||||
|
||||
for item in new_items:
|
||||
task = item.get("task", "")
|
||||
assignee = item.get("assignee", "")
|
||||
iid = _item_id(task, assignee)
|
||||
|
||||
history_entry = {
|
||||
"date": meeting_date,
|
||||
"meeting": meeting_filename,
|
||||
"status": item.get("status", "待办"),
|
||||
"priority": item.get("priority", "中"),
|
||||
"deadline": item.get("deadline", ""),
|
||||
}
|
||||
|
||||
existing = self._state["action_items"].get(iid)
|
||||
if existing:
|
||||
existing["history"].append(history_entry)
|
||||
existing["latest"] = history_entry
|
||||
latest = existing["history"][-1]
|
||||
item["_item_id"] = iid
|
||||
item["_history"] = list(existing["history"])
|
||||
item["status"] = latest["status"]
|
||||
item["priority"] = latest["priority"]
|
||||
item["deadline"] = latest["deadline"]
|
||||
else:
|
||||
self._state["action_items"][iid] = {
|
||||
"item_id": iid,
|
||||
"task": task,
|
||||
"assignee": assignee,
|
||||
"series": series_name,
|
||||
"created_meeting": meeting_filename,
|
||||
"history": [history_entry],
|
||||
"latest": history_entry,
|
||||
}
|
||||
item["_item_id"] = iid
|
||||
item["_history"] = [history_entry]
|
||||
|
||||
merged.append(item)
|
||||
|
||||
return merged
|
||||
|
||||
    def merge_metrics(
        self,
        new_metrics: List[dict],
        meeting_title: str,
        meeting_date: str,
        meeting_filename: str,
    ) -> List[dict]:
        merged = []

        for m in new_metrics:
            metric_name = m.get("metric_name", "")
            owner = m.get("owner", "")
            mid = _metric_id(metric_name, owner)

            history_entry = {
                "date": meeting_date,
                "meeting": meeting_filename,
                "value": m.get("value", ""),
                "target": m.get("target", ""),
                "trend": m.get("trend", ""),
            }

            existing = self._state["metrics"].get(mid)
            if existing:
                # Replace any entry this meeting recorded earlier, so re-processing the
                # same file does not duplicate the history.
                existing["history"] = [
                    h for h in existing["history"] if h.get("meeting") != meeting_filename
                ]
                existing["history"].append(history_entry)
                existing["latest"] = history_entry
                m["_metric_id"] = mid
                m["_history"] = list(existing["history"])
            else:
                self._state["metrics"][mid] = {
                    "metric_id": mid,
                    "metric_name": metric_name,
                    "owner": owner,
                    "history": [history_entry],
                    "latest": history_entry,
                }
                m["_metric_id"] = mid
                m["_history"] = [history_entry]

            merged.append(m)

        return merged

    def get_action_item_history(self, item_id: str) -> Optional[dict]:
        return self._state["action_items"].get(item_id)

    def get_metric_history(self, metric_id: str) -> Optional[dict]:
        return self._state["metrics"].get(metric_id)

    def get_series_info(self, title: str) -> Optional[dict]:
        series_name = self._detect_series(title)
        return self._state["meeting_series"].get(series_name)

    def has_content_hash(self, content_hash: str) -> bool:
        return content_hash in self._state["content_hashes"]

    def add_content_hash(self, content_hash: str, title: str, date: str, filename: str):
        self._state["content_hashes"][content_hash] = {
            "title": title,
            "date": date,
            "filename": filename,
        }

    def remove_content_hash(self, content_hash: str):
        self._state["content_hashes"].pop(content_hash, None)

    def get_stats(self) -> dict:
        return {
            "action_items_tracked": len(self._state["action_items"]),
            "metrics_tracked": len(self._state["metrics"]),
            "meeting_series": len(self._state["meeting_series"]),
            "content_hashes": len(self._state["content_hashes"]),
        }
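The content-hash methods above only store and look up a digest; how the digest is computed is not part of this hunk. A minimal sketch of the ① content-hash check from the README, assuming SHA-256 over whitespace-normalized text (the normalization rules and the `content_hash` / `check_and_record` names are illustrative, not the project's code), so that a copy of a note differing only in line endings or trailing spaces is still caught:

```python
import hashlib
import re


def content_hash(text: str) -> str:
    """Hash meeting text after normalizing whitespace (hypothetical helper)."""
    # Unify line endings, strip trailing spaces per line, collapse runs of blank lines
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    normalized = re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


seen: dict = {}  # stands in for state["content_hashes"]


def check_and_record(text: str, title: str, date: str, filename: str) -> bool:
    """Return True (and record the hash) if the text is new, False if it is a duplicate."""
    h = content_hash(text)
    if h in seen:
        return False
    seen[h] = {"title": title, "date": date, "filename": filename}
    return True


first = check_and_record("# 周会\r\n\r\n- 进度正常  \r\n", "周会", "2024-05-01", "a.md")
again = check_and_record("# 周会\n\n- 进度正常\n", "周会", "2024-05-01", "a-copy.md")
print(first, again)  # True False
```

How aggressive the normalization is decides what counts as "the same content"; copies with real wording changes are left to the ② semantic-similarity check.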
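`merge_action_items` and `merge_metrics` keep one history entry per meeting under a stable item ID. The sketch below isolates that pattern in self-contained form; `merge_history` and the plain-dict `store` are hypothetical stand-ins for `self._state[...]`. Replacing the entry from the same meeting, rather than always appending, is what keeps a re-processed file (the README's `[o]` overwrite path) from adding duplicates:

```python
from typing import Dict, List


def merge_history(store: Dict[str, dict], key: str, entry: dict) -> List[dict]:
    """Record `entry` under `key`, replacing any earlier entry from the same meeting.

    Hypothetical helper: each value in `store` mirrors the "history"/"latest"
    layout used by the state manager; it is not the project's actual API.
    """
    record = store.setdefault(key, {"history": [], "latest": None})
    # Drop a previous entry for the same meeting so re-processing is idempotent
    record["history"] = [h for h in record["history"] if h["meeting"] != entry["meeting"]]
    record["history"].append(entry)
    record["latest"] = entry
    return list(record["history"])


store: Dict[str, dict] = {}
merge_history(store, "task-1", {"meeting": "a.md", "date": "2024-05-01", "status": "待办"})
merge_history(store, "task-1", {"meeting": "b.md", "date": "2024-05-08", "status": "进行中"})
# Re-processing b.md replaces its entry instead of adding a third one
history = merge_history(store, "task-1", {"meeting": "b.md", "date": "2024-05-08", "status": "已完成"})
print(len(history), history[-1]["status"])  # 2 已完成
```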
|
|
@@ -0,0 +1,416 @@
import logging
import os
import shutil
from datetime import datetime
from typing import Dict, List, Optional, Set

from config import config

logger = logging.getLogger(__name__)


def _sanitize_filename(name: str) -> str:
    if not name:
        return "未命名"
    invalid = '<>:"/\\|?*'
    for c in invalid:
        name = name.replace(c, "")
    name = name.replace(" ", "_").strip("._")
    if not name:
        return "未命名"
    return name


def _safe_filename(name: str, max_len: int = 60) -> str:
    safe = _sanitize_filename(name)
    if len(safe) > max_len:
        safe = safe[:max_len]
    return safe


class ObsidianVaultManager:
    def __init__(self):
        self.vault_path = config.obsidian.vault_path
        self.meetings_dir = os.path.join(self.vault_path, config.obsidian.meetings_dir)
        self.entities_dir = os.path.join(self.vault_path, config.obsidian.entities_dir)
        self.graphs_dir = os.path.join(self.vault_path, config.obsidian.graphs_dir)
        self.raw_dir = os.path.join(self.vault_path, config.obsidian.raw_dir)
        self._ensure_dirs()

    def _ensure_dirs(self):
        for d in [self.meetings_dir, self.entities_dir, self.graphs_dir, self.raw_dir]:
            os.makedirs(d, exist_ok=True)

    def save_raw_text(self, text: str, title: str = "", date: str = "") -> str:
        date_str = date or datetime.now().strftime("%Y-%m-%d")
        safe_title = _safe_filename(title or "未命名", 40)
        filename = f"{date_str}_{safe_title}.md"
        filepath = os.path.join(self.raw_dir, filename)

        if os.path.exists(filepath):
            with open(filepath, "r", encoding="utf-8") as f:
                existing = f.read()
            if "status: processed" in existing:
                logger.warning(f"原文文件已存在且已处理过,将被覆盖: {filepath}")

        content = f"""---
title: "{title}"
date: "{date_str}"
tags: [raw]
status: unprocessed
---

# {title or "未命名"}

**日期**: {date_str}

## 原文

{text}
"""
        with open(filepath, "w", encoding="utf-8") as f:
            f.write(content)
        logger.info(f"原文已保存: {filepath}")
        return filepath

    def mark_raw_processed(self, raw_filepath: str):
        if not os.path.exists(raw_filepath):
            return
        with open(raw_filepath, "r", encoding="utf-8") as f:
            content = f.read()
        # Replace only the first occurrence (the front-matter field), so a transcript
        # that happens to contain the same text is left untouched.
        content = content.replace("status: unprocessed", "status: processed", 1)
        with open(raw_filepath, "w", encoding="utf-8") as f:
            f.write(content)

    def _ensure_obsidian_config(self):
        obsidian_config = os.path.join(self.vault_path, ".obsidian", "app.json")
        if not os.path.exists(obsidian_config):
            os.makedirs(os.path.dirname(obsidian_config), exist_ok=True)
            with open(obsidian_config, "w", encoding="utf-8") as f:
                f.write('{\n "alwaysUpdateLinks": true,\n "newFileLocation": "current",\n "useMarkdownLinks": true\n}')

        core_plugins = os.path.join(self.vault_path, ".obsidian", "core-plugins.json")
        if not os.path.exists(core_plugins):
            with open(core_plugins, "w", encoding="utf-8") as f:
                f.write('{\n "file-explorer": true,\n "graph": true,\n "backlink": true,\n "tag-pane": true,\n "page-preview": true,\n "templates": true,\n "search": true\n}')

    def _meeting_filename(self, data: dict) -> str:
        # `or` rather than a .get() default, so an empty-string date also falls back to today
        date_str = data.get("date") or datetime.now().strftime("%Y-%m-%d")
        title = data.get("title", "未命名会议")
        safe_title = _safe_filename(title, 40)
        return f"{date_str}_{safe_title}.md"

    def _entity_path(self, name: str) -> str:
        safe = _safe_filename(name, 60)
        return os.path.join(self.entities_dir, f"{safe}.md")

    def _entity_link(self, name: str) -> str:
        safe = _safe_filename(name, 60)
        return f"[[Entities/{safe}|{name}]]"

    def _meeting_link(self, data: dict) -> str:
        # splitext strips only the trailing extension; str.replace(".md", "")
        # would also remove ".md" occurring inside a title
        fname = os.path.splitext(self._meeting_filename(data))[0]
        title = data.get("title", "未命名会议")
        return f"[[Meetings/{fname}|{title}]]"

    def meeting_filepath(self, meeting_data: dict) -> str:
        filename = self._meeting_filename(meeting_data)
        return os.path.join(self.meetings_dir, filename)

    def meeting_file_exists(self, meeting_data: dict) -> bool:
        return os.path.exists(self.meeting_filepath(meeting_data))

    def raw_filepath(self, meeting_data: dict) -> str:
        # Mirror save_raw_text(): an empty date falls back to today
        date_str = meeting_data.get("date") or datetime.now().strftime("%Y-%m-%d")
        title = meeting_data.get("title", "未命名")
        safe_title = _safe_filename(title, 40)
        filename = f"{date_str}_{safe_title}.md"
        return os.path.join(self.raw_dir, filename)

    def remove_meeting_note(self, meeting_data: dict):
        paths = [
            self.meeting_filepath(meeting_data),
            self.raw_filepath(meeting_data),
        ]
        for p in paths:
            if os.path.exists(p):
                os.remove(p)
                logger.info(f"已删除: {p}")

    def add_meeting(self, meeting_data: dict, original_text: str) -> str:
        self._ensure_obsidian_config()
        filename = self._meeting_filename(meeting_data)
        filepath = os.path.join(self.meetings_dir, filename)
        content = self._render_meeting_note(meeting_data, original_text)
        with open(filepath, "w", encoding="utf-8") as f:
            f.write(content)
        logger.info(f"会议笔记已生成: {filepath}")

        self._create_all_entity_notes(meeting_data)
        self._update_graph_moc()

        return filepath

    def _render_meeting_note(self, data: dict, original_text: str) -> str:
        lines = []
        lines.append("---")
        lines.append(f'title: "{data.get("title", "")}"')
        lines.append(f'date: "{data.get("date", "")}"')
        content_hash = data.get("_content_hash", "")
        if content_hash:
            lines.append(f'content_hash: "{content_hash}"')
        lines.append("tags: [meeting]")
        lines.append("---")
        lines.append("")

        lines.append(f"# {data.get('title', '')}")
        lines.append("")

        if data.get("date"):
            lines.append(f"**日期**: {data['date']}")
        if data.get("participants"):
            participants_links = [self._entity_link(p) for p in data["participants"]]
            lines.append(f"**参会人**: {', '.join(participants_links)}")
        lines.append("")

        if data.get("summary"):
            lines.append("## 摘要")
            lines.append(data["summary"])
            lines.append("")

        lines.append("## 原文")
        lines.append(original_text)
        lines.append("")

        if data.get("entities"):
            lines.append("## 涉及实体")
            for e in data["entities"]:
                name = e.get("name", "")
                if not name:
                    continue
                lines.append(f"- {self._entity_link(name)} ({e.get('entity_type', '')}): {e.get('description', '')}")
            lines.append("")

        if data.get("action_items"):
            lines.append("## 行动项")
            for item in data["action_items"]:
                task = item.get("task", "")
                assignee_link = self._entity_link(item["assignee"]) if item.get("assignee") else "待确认"
                # merge_action_items() stores missing fields as "", so fall back with `or`
                deadline = item.get("deadline") or "未指定"
                priority = item.get("priority") or "中"
                status_emoji = "✅" if item.get("status") == "已完成" else "🔄"
                lines.append(f"- {status_emoji} **{task}** | 负责人: {assignee_link} | 截止: {deadline} | 优先级: {priority}")
                history = item.get("_history", [])
                if len(history) > 1:
                    for h in history:
                        icon = "✅" if h.get("status") == "已完成" else "🔄"
                        # Two-space indent nests each history line under its action item
                        lines.append(f"  - {h.get('date', '')}: {icon} {h.get('status', '')} (优先级: {h.get('priority', '')})")
            lines.append("")

        if data.get("metrics"):
            lines.append("## 指标跟踪")
            lines.append("| 指标 | 当前值 | 目标值 | 趋势 | 负责人 |")
            lines.append("|------|--------|--------|------|--------|")
            for m in data["metrics"]:
                trend_icon = {"向好": "📈", "持平": "➡️", "恶化": "📉"}.get(m.get("trend", ""), "")
                # Inside a table the wikilink alias pipe must be escaped, or it splits the cell
                owner_link = self._entity_link(m["owner"]).replace("|", "\\|") if m.get("owner") else "-"
                lines.append(f"| {m.get('metric_name', '')} | {m.get('value', '')} | {m.get('target', '')} | {trend_icon}{m.get('trend', '')} | {owner_link} |")
            lines.append("")

        if data.get("decisions"):
            lines.append("## 决策记录")
            for d in data["decisions"]:
                proposer_link = self._entity_link(d["proposer"]) if d.get("proposer") else "-"
                status_badge = "✅ 已决" if d.get("status") == "已决" else "⏳ 待定"
                lines.append(f"- {status_badge} {d.get('content', '')} ({proposer_link})")
            lines.append("")

        if data.get("relations"):
            lines.append("## 关系图谱")
            for r in data["relations"]:
                sub = r.get("subject", "")
                obj = r.get("object", "")
                pred = r.get("predicate", "")
                if sub and obj:
                    lines.append(f"- {self._entity_link(sub)} → **{pred}** → {self._entity_link(obj)}")
            lines.append("")

        lines.append("---")
        date_tag = (data.get("date") or "").replace("-", "/")
        # Without a date, emit only #meeting instead of a bare "#"
        lines.append(f"#meeting #{date_tag}" if date_tag else "#meeting")

        return "\n".join(lines)

    def _create_all_entity_notes(self, data: dict):
        seen: Set[str] = set()

        for entity in data.get("entities", []):
            name = entity.get("name", "")
            if name and name not in seen:
                self._upsert_entity_note(name, entity.get("entity_type", "实体"), entity.get("description", ""), data)
                seen.add(name)

        for participant in data.get("participants", []):
            if participant and participant not in seen:
                self._upsert_entity_note(participant, "人物", f"{participant} (参会人)", data)
                seen.add(participant)

        for rel in data.get("relations", []):
            for key in ["subject", "object"]:
                name = rel.get(key, "")
                if name and name not in seen:
                    etype = rel.get(f"{key}_type", "实体")
                    self._upsert_entity_note(name, etype, "", data)
                    seen.add(name)

        for item in data.get("action_items", []):
            name = item.get("assignee", "")
            if name and name not in seen:
                self._upsert_entity_note(name, "人物", f"{name} (行动项负责人)", data)
                seen.add(name)

        for m in data.get("metrics", []):
            name = m.get("owner", "")
            if name and name not in seen:
                self._upsert_entity_note(name, "人物", f"{name} (指标负责人)", data)
                seen.add(name)

        for d in data.get("decisions", []):
            name = d.get("proposer", "")
            if name and name not in seen:
                self._upsert_entity_note(name, "人物", f"{name} (决策提出人)", data)
                seen.add(name)

    def _upsert_entity_note(self, name: str, entity_type: str, description: str, meeting_data: dict):
        filepath = self._entity_path(name)
        meeting_link = self._meeting_link(meeting_data)

        if os.path.exists(filepath):
            with open(filepath, "r", encoding="utf-8") as f:
                existing = f.read()

            if meeting_link not in existing:
                header = "## 相关会议"
                idx = existing.find(header)
                if idx >= 0:
                    section_end = existing.find("\n## ", idx + len(header))
                    if section_end < 0:
                        section_end = len(existing)
                    new_section = existing[idx:section_end].rstrip() + f"\n- {meeting_link}\n"
                    existing = existing[:idx] + new_section + existing[section_end:]
                else:
                    existing = existing.rstrip() + f"\n\n{header}\n- {meeting_link}\n"

            self._upsert_entity_action_items(existing, meeting_data, name, meeting_link, filepath)
            return

        rel_lines = []
        for r in meeting_data.get("relations", []):
            if r.get("subject") == name and r.get("object"):
                rel_lines.append(f"- → **{r.get('predicate', '')}** → {self._entity_link(r['object'])}")
            elif r.get("object") == name and r.get("subject"):
                rel_lines.append(f"- {self._entity_link(r['subject'])} → **{r.get('predicate', '')}** →")

        action_lines = []
        for item in meeting_data.get("action_items", []):
            if item.get("assignee") == name:
                task = item.get("task", "")
                status_emoji = "✅" if item.get("status") == "已完成" else "🔄"
                action_lines.append(f"- {status_emoji} {task} (状态: {item.get('status', '待办')}, 源自: {meeting_link})")
                history = item.get("_history", [])
                if len(history) > 1:
                    for h in history:
                        icon = "✅" if h.get("status") == "已完成" else "🔄"
                        action_lines.append(f"  - {h.get('date', '')}: {icon} {h.get('status', '')}")

        content = f"""---
type: {_sanitize_filename(entity_type)}
entity_type: "{entity_type}"
tags: [entity, {_sanitize_filename(entity_type)}]
---

# {name}

**类型**: {entity_type}
**描述**: {description}

## 相关会议
- {meeting_link}

## 关系
{chr(10).join(rel_lines) if rel_lines else "(暂无)"}

## 行动项
{chr(10).join(action_lines) if action_lines else "(暂无)"}
"""
        with open(filepath, "w", encoding="utf-8") as f:
            f.write(content)

    def _upsert_entity_action_items(self, existing: str, meeting_data: dict, entity_name: str, meeting_link: str, filepath: str):
        action_lines = []
        for item in meeting_data.get("action_items", []):
            if item.get("assignee") == entity_name:
                task = item.get("task", "")
                status_emoji = "✅" if item.get("status") == "已完成" else "🔄"
                action_lines.append(f"- {status_emoji} {task} (状态: {item.get('status', '待办')}, 源自: {meeting_link})")
                history = item.get("_history", [])
                if len(history) > 1:
                    for h in history:
                        icon = "✅" if h.get("status") == "已完成" else "🔄"
                        action_lines.append(f"  - {h.get('date', '')}: {icon} {h.get('status', '')}")

        header = "## 行动项"
        idx = existing.find(header)
        if idx >= 0:
            section_end = existing.find("\n## ", idx + len(header))
            if section_end < 0:
                section_end = len(existing)
            # Keep entries contributed by other meetings instead of overwriting the whole
            # section. An entry is a top-level "- " line plus the indented history lines
            # under it; entries citing this meeting (and "(暂无)") are regenerated below.
            kept = []
            keep_entry = False
            for line in existing[idx + len(header):section_end].splitlines():
                if line.startswith("- "):
                    keep_entry = meeting_link not in line
                if keep_entry and line.strip() and line.strip() != "(暂无)":
                    kept.append(line)
            action_section = "\n".join(kept + action_lines) or "(暂无)"
            existing = existing[:idx] + f"{header}\n{action_section}\n" + existing[section_end:]
        else:
            action_section = "\n".join(action_lines) or "(暂无)"
            existing = existing.rstrip() + f"\n\n{header}\n{action_section}\n"

        with open(filepath, "w", encoding="utf-8") as f:
            f.write(existing)

    def _update_graph_moc(self):
        meetings = [f for f in os.listdir(self.meetings_dir) if f.endswith(".md")]
        entities = [f for f in os.listdir(self.entities_dir) if f.endswith(".md")]

        lines = []
        lines.append("---")
        lines.append("tags: [moc, graph]")
        lines.append("---")
        lines.append("")
        lines.append("# 知识图谱总览")
        lines.append("")
        lines.append("## 统计")
        lines.append(f"- **会议数量**: {len(meetings)}")
        lines.append(f"- **实体数量**: {len(entities)}")
        lines.append("")
        lines.append("## 最近会议")
        # Meeting files are named "YYYY-MM-DD_<title>.md", so a reverse sort lists the newest first
        for m in sorted(meetings, reverse=True)[:10]:
            name = m[:-3]  # drop only the trailing ".md"
            # Strip the 11-character "YYYY-MM-DD_" prefix for the display text
            link_text = name[11:] if len(name) > 11 else name
            lines.append(f"- [[Meetings/{name}|{link_text}]]")
        lines.append("")
        lines.append("## 实体索引")
        for e in sorted(entities):
            name = e[:-3]
            lines.append(f"- [[Entities/{name}|{name}]]")

        with open(os.path.join(self.graphs_dir, "知识图谱总览.md"), "w", encoding="utf-8") as f:
            f.write("\n".join(lines))

    def rebuild_vault(self, meetings_data: List[dict]):
        # Deletes the whole vault, including .obsidian/ customizations (themes, workspace)
        # and the Raw/ originals; only a minimal .obsidian config and the Meetings/,
        # Entities/ and Graphs/ notes are regenerated below.
        if os.path.exists(self.vault_path):
            shutil.rmtree(self.vault_path)
        self._ensure_dirs()
        self._ensure_obsidian_config()

        for md in meetings_data:
            self.add_meeting(md, md.get("_original_text", ""))


obsidian_manager = ObsidianVaultManager()
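`_upsert_entity_note` and `_upsert_entity_action_items` edit an existing entity note by locating a `## ` heading and rewriting the text up to the next `## `. A standalone sketch of that section rewrite, which keeps entries contributed by other meetings and replaces only those tagged with the current meeting's link. `replace_section_entries` is a hypothetical helper, and it assumes each entry is a top-level `- ` line with any history lines indented beneath it:

```python
from typing import List


def replace_section_entries(note: str, header: str, new_lines: List[str], source_link: str) -> str:
    """Rewrite one '## ' section, replacing only the entries that cite `source_link`.

    Hypothetical helper. An entry is a top-level "- " line plus any indented
    lines below it; entries from other sources are kept in their original order.
    """
    idx = note.find(header)
    if idx < 0:
        # Section missing: append it at the end of the note
        body = "\n".join(new_lines) or "(暂无)"
        return note.rstrip() + f"\n\n{header}\n{body}\n"
    end = note.find("\n## ", idx + len(header))
    if end < 0:
        end = len(note)
    kept: List[str] = []
    keep = False
    for line in note[idx + len(header):end].splitlines():
        if line.startswith("- "):
            keep = source_link not in line  # a new entry starts here
        if keep and line.strip() and line.strip() != "(暂无)":
            kept.append(line)
    body = "\n".join(kept + new_lines) or "(暂无)"
    return note[:idx] + f"{header}\n{body}\n" + note[end:]


note = (
    "# 张三\n\n"
    "## 行动项\n"
    "- 🔄 写周报 (源自: [[Meetings/m1|周会1]])\n"
    "  - 2024-05-01: 🔄 待办\n\n"
    "## 其他\n- x\n"
)
link = "[[Meetings/m2|周会2]]"
new_entry = [f"- ✅ 上线 (源自: {link})"]
updated = replace_section_entries(note, "## 行动项", new_entry, link)
# Running it again for the same meeting leaves the note unchanged
again = replace_section_entries(updated, "## 行动项", new_entry, link)
print(updated)
```

Because entries are matched by the meeting link they cite, the rewrite is idempotent per meeting, which is the property a re-processed file needs.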
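The notes above build YAML front matter by interpolating values into `title: "..."`, so a title containing a double quote or backslash yields invalid YAML. One way to avoid that, sketched here assuming all values are strings or lists of strings, is to quote each value with `json.dumps`: JSON's string escapes are a subset of the escapes allowed in a YAML double-quoted scalar. `front_matter` is a hypothetical helper, not part of the project:

```python
import json


def front_matter(fields: dict) -> str:
    """Render a YAML front-matter block with every scalar safely double-quoted."""
    lines = ["---"]
    for key, value in fields.items():
        if isinstance(value, list):
            # Flow-style list, each item quoted the same way
            rendered = "[" + ", ".join(json.dumps(v, ensure_ascii=False) for v in value) + "]"
        else:
            rendered = json.dumps(value, ensure_ascii=False)
        lines.append(f"{key}: {rendered}")
    lines.append("---")
    return "\n".join(lines)


block = front_matter({"title": 'Q3 "OKR" 复盘', "date": "2024-05-01", "tags": ["meeting"]})
print(block)
# ---
# title: "Q3 \"OKR\" 复盘"
# date: "2024-05-01"
# tags: ["meeting"]
# ---
```

`ensure_ascii=False` keeps Chinese titles readable in the note instead of turning them into `\uXXXX` escapes.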
|
|
@@ -0,0 +1,10 @@
{
  "alwaysUpdateLinks": true,
  "newFileLocation": "current",
  "useMarkdownLinks": true,
  "defaultViewMode": "preview",
  "livePreview": false,
  "promptDelete": false,
  "readableLineLength": true,
  "showInlineTitle": true
}
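`_ensure_obsidian_config` writes `app.json` and `core-plugins.json` as hand-built strings, and only when the file is missing, so later changes to the defaults never reach an existing vault. A minimal sketch of a merge-style alternative using the standard `json` module: keys already in the file win and missing keys are filled from the defaults. `ensure_json_settings` is hypothetical; since Obsidian itself rewrites these files when settings change, it is probably safest to run something like this while the vault is closed.

```python
import json
import os
import tempfile


def ensure_json_settings(path: str, defaults: dict) -> dict:
    """Fill missing keys of a JSON settings file from `defaults`, keeping existing values."""
    current: dict = {}
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            current = json.load(f)
    merged = {**defaults, **current}  # values already on disk take precedence
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(merged, f, ensure_ascii=False, indent=2)
    return merged


with tempfile.TemporaryDirectory() as vault:
    app_json = os.path.join(vault, ".obsidian", "app.json")
    os.makedirs(os.path.dirname(app_json))
    # Simulate a user who has switched back to wikilinks in the settings UI
    with open(app_json, "w", encoding="utf-8") as f:
        json.dump({"useMarkdownLinks": False}, f)
    result = ensure_json_settings(app_json, {"alwaysUpdateLinks": True, "useMarkdownLinks": True})
    with open(app_json, "r", encoding="utf-8") as f:
        on_disk = json.load(f)

print(result)  # {'alwaysUpdateLinks': True, 'useMarkdownLinks': False}
```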
|
|
@@ -0,0 +1,3 @@
{
  "cssTheme": "Atom"
}
|
|
@@ -0,0 +1,34 @@
{
  "file-explorer": true,
  "graph": true,
  "backlink": true,
  "tag-pane": true,
  "page-preview": true,
  "templates": true,
  "search": true,
  "global-search": true,
  "switcher": true,
  "canvas": true,
  "outgoing-link": true,
  "footnotes": false,
  "properties": true,
  "daily-notes": true,
  "note-composer": true,
  "command-palette": true,
  "slash-command": false,
  "editor-status": true,
  "bookmarks": true,
  "markdown-importer": false,
  "zk-prefixer": false,
  "random-note": false,
  "outline": true,
  "word-count": true,
  "slides": false,
  "audio-recorder": false,
  "workspaces": false,
  "file-recovery": true,
  "publish": false,
  "sync": true,
  "bases": true,
  "webviewer": false
}
|
|
@@ -0,0 +1,22 @@
{
  "collapse-filter": true,
  "search": "",
  "showTags": false,
  "showAttachments": false,
  "hideUnresolved": false,
  "showOrphans": true,
  "collapse-color-groups": true,
  "colorGroups": [],
  "collapse-display": true,
  "showArrow": false,
  "textFadeMultiplier": 0,
  "nodeSizeMultiplier": 1,
  "lineSizeMultiplier": 1,
  "collapse-forces": true,
  "centerStrength": 0.518713248970312,
  "repelStrength": 10,
  "linkStrength": 1,
  "linkDistance": 250,
  "scale": 0.7132754626224356,
  "close": true
}
|
|
@@ -0,0 +1,6 @@
{
  "name": "Atom",
  "version": "0.0.0",
  "minAppVersion": "0.16.0",
  "author": "kognise"
}
|
|
@@ -0,0 +1,349 @@
/* Base colors - TODO: are grey 1 and grey 2 used? */
.theme-dark {
  --accent-h: 219;
  --accent-s: 56%;
  --accent-l: 55%;

  --background-primary: #272b34;
  --background-primary-alt: #20242b;
  --background-secondary: #20242b;
  --background-secondary-alt: #1a1e24;
  --background-accent: #000;
  --background-modifier-border: #424958;
  --background-modifier-form-field: rgba(0, 0, 0, 0.3);
  --background-modifier-form-field-highlighted: rgba(0, 0, 0, 0.22);
  --background-modifier-box-shadow: rgba(0, 0, 0, 0.3);
  --background-modifier-success: #539126;
  --background-modifier-error: #3d0000;
  --background-modifier-error-rgb: 61, 0, 0;
  --background-modifier-error-hover: #470000;
  --background-modifier-cover: rgba(0, 0, 0, 0.6);
  --text-accent: #61afef;
  --text-accent-hover: #70bdfc;
  --text-normal: #dcddde;
  --text-muted: #888;
  --text-faint: rgb(81, 86, 99);
  --text-error: #e16d76;
  --text-error-hover: #c9626a;
  --text-highlight-bg: rgba(255, 255, 0, 0.4);
  --text-selection: rgba(0, 122, 255, 0.2);
  --text-on-accent: #dcddde;
  --interactive-normal: #20242b;
  --interactive-hover: #353b47;
  --interactive-accent-hover: hsl(var(--accent-h), calc(var(--accent-s) + 5%), calc(var(--accent-l) - 10%));
  --scrollbar-active-thumb-bg: rgba(255, 255, 255, 0.2);
  --scrollbar-bg: rgba(255, 255, 255, 0.05);
  --scrollbar-thumb-bg: rgba(255, 255, 255, 0.1);
  --panel-border-color: #18191e;
  --gray-1: #5C6370;
  --gray-2: #abb2bf;
  --red: #e06c75;
  --orange: #d19a66;
  --green: #98c379;
  --aqua: #56b6c2;
  --purple: #c678dd;
  --blue: #61afef;
  --yellow: #e5c07b;

  --nav-item-color-active: #ffffff;

  --background-modifier-hover: hsla(var(--accent-h), calc(var(--accent-s) - 35%), var(--accent-l), 0.06);
  --divider-color-hover: #404754;
}

.theme-light {
  --accent-h: 230;
  --accent-s: 83%;
  --accent-l: 64%;

  --background-primary: #fafafa;
  --background-primary-alt: #eaeaeb;
  --background-secondary: #eaeaeb;
  --background-secondary-alt: #dbdbdc;
  --background-accent: #fff;
  --background-modifier-border: #dbdbdc;
  --background-modifier-form-field: #fff;
  --background-modifier-form-field-highlighted: #fff;
  --background-modifier-box-shadow: rgba(0, 0, 0, 0.1);
  --background-modifier-success: #A4E7C3;
  --background-modifier-error: #e68787;
  --background-modifier-error-rgb: 230, 135, 135;
  --background-modifier-error-hover: #FF9494;
  --background-modifier-cover: rgba(0, 0, 0, 0.8);
  --text-accent: #1592ff;
  --text-accent-hover: #097add;
  --text-normal: #383a42;
  --text-muted: #8e8e90;
  --text-faint: #999999;
  --text-error: #e75545;
  --text-error-hover: #f86959;
  --text-highlight-bg: rgba(255, 255, 0, 0.4);
  --text-selection: rgba(0, 122, 255, 0.15);
  --text-on-accent: #f2f2f2;
  --interactive-normal: #eaeaeb;
  --interactive-hover: #dbdbdc;
  --interactive-accent-rgb: 21, 146, 255;
  --interactive-accent-hover: hsl(var(--accent-h), calc(var(--accent-s) - 10%), calc(var(--accent-l) - 4%));
  --scrollbar-active-thumb-bg: rgba(0, 0, 0, 0.2);
  --scrollbar-bg: rgba(0, 0, 0, 0.05);
  --scrollbar-thumb-bg: rgba(0, 0, 0, 0.1);
  --panel-border-color: #dbdbdc;
  --gray-1: #383a42;
  --gray-2: #383a42;
  --red: #e75545;
  --green: #4ea24c;
  --blue: #3d74f6;
  --purple: #a625a4;
  --aqua: #0084bc;
  --yellow: #e35649;
  --orange: #986800;

  --nav-item-color-active: var(--text-normal);
}

.theme-dark, .theme-light {
  --ribbon-background: var(--background-primary);
  --drag-ghost-background: var(--background-secondary-alt);
  --background-modifier-message: var(--background-secondary-alt);

  --tab-outline-color: transparent;
  --divider-color: transparent;

  --prompt-border-color: var(--panel-border-color);
  --modal-border-color: var(--panel-border-color);

  --background-modifier-border-hover: var(--interactive-hover);
  --background-modifier-border-focus: var(--interactive-hover);

  --checkbox-color: var(--text-accent);
  --checkbox-color-hover: var(--text-accent-hover);

  --nav-item-background-active: var(--interactive-accent);

  --tag-color: var(--yellow);
  --tag-background: var(--background-primary-alt);
  --tag-color-hover: var(--yellow);
  --tag-background-hover: var(--background-primary-alt);
  --tag-padding-x: 4px;
  --tag-padding-y: 2px;
  --tag-radius: 4px;

  --inline-title-weight: var(--bold-weight);
  --link-decoration: none;
  --link-external-decoration: none;
  --embed-padding: 0 0 0 var(--size-4-4);
}

/* Search */
.search-result .search-result-file-title {
  cursor: pointer;
}

.search-result .collapse-icon {
  cursor: var(--cursor);
}

.search-result:not(.is-collapsed) .search-result-file-title {
  color: var(--blue);
}

/* File tab separators */
.workspace .mod-root .workspace-tab-header-inner::after {
  right: unset;
  left: -0.5px;
}

.workspace .mod-root .workspace-tab-header:last-child .workspace-tab-header-inner::before {
  position: absolute;
  right: -0.5px;
  width: 1px;
  background-color: var(--tab-divider-color);
  content: '';
  height: 20px;
}

.workspace .mod-root .workspace-tab-header.is-active .workspace-tab-header-inner::after,
.workspace .mod-root .workspace-tab-header.is-active .workspace-tab-header-inner::before,
.workspace .mod-root .workspace-tab-header:first-child .workspace-tab-header-inner::after,
.workspace .mod-root .workspace-tab-header.is-active + .workspace-tab-header .workspace-tab-header-inner::after {
  opacity: 0;
}

/* Editor and output */
.markdown-rendered blockquote {
  padding: var(--embed-padding);
}

mjx-container {
  text-align: left !important;
}

.math-block {
  font-size: 1.3em;
}

.theme-light :not(pre)>code,
.theme-light pre {
  background: var(--background-primary);
  box-shadow: inset 0 0 0 1px var(--background-primary-alt);
  border-radius: 4px;
}

.markdown-preview-section > div h1,
.markdown-preview-section > div h2,
.markdown-preview-section > div h3,
.markdown-preview-section > div h4,
.markdown-preview-section > div h5,
.markdown-preview-section > div h6 {
  margin-top: 40px;
}

.mod-header + div h1,
.mod-header + div h2,
.mod-header + div h3,
.mod-header + div h4,
.mod-header + div h5,
.mod-header + div h6 {
  margin-top: 30px;
}

.cm-sizer > .inline-title {
  margin-bottom: 20px;
}

/* Miscellaneous */
.theme-dark .dropdown:hover {
  background-color: var(--background-modifier-form-field);
}

.tooltip {
  color: var(--text-muted);
}

.nav-file, .nav-folder {
  padding: 1px 2px;
}

body:not(.is-grabbing) .nav-file-title.is-being-dragged,
body:not(.is-grabbing) .nav-folder-title.is-being-dragged,
.nav-file-title.is-being-dragged,
.nav-folder-title.is-being-dragged {
  background-color: var(--background-primary-alt);
  color: var(--nav-item-color);
}

.view-header-title {
  text-decoration: underline;
  text-decoration-color: var(--text-muted);
  text-underline-offset: 1.5px;
}

.status-bar {
  border-color: var(--panel-border-color);
  border-width: 1px;
  padding: 4px 8px;
}

.theme-dark button.mod-warning {
  --background-modifier-error: #d42020;
  --background-modifier-error-hover: #b01515;
}

.theme-light button.mod-warning {
  --background-modifier-error: #f23f3f;
  --background-modifier-error-hover: #d72020;
}

/* Code syntax highlighting */
code[class*='language-'], pre[class*='language-'] {
  text-align: left !important;
  white-space: pre !important;
  word-spacing: normal !important;
  word-break: normal !important;
  word-wrap: normal !important;
  line-height: 1.5 !important;
  tab-size: 4 !important;
  hyphens: none !important;
}

/* Wrap code when exporting PDF */
@media print {
  code[class*='language-'] {
    white-space: pre-wrap !important;
  }
}

pre[class*='language-'] {
  /* Code blocks */
  padding: 1em !important;
  margin: .5em 0 !important;
  overflow: auto !important;
}

:not(pre)>code[class*='language-'] {
  /* Inline code */
  padding: .1em !important;
  border-radius: .3em !important;
  white-space: normal !important;
}

.token.comment, .token.prolog, .token.doctype, .token.cdata,
.HyperMD-codeblock .cm-comment {
  color: var(--gray-1) !important;
}

.token.punctuation,
.HyperMD-codeblock .cm-hmd-codeblock, .HyperMD-codeblock .cm-bracket {
  color: var(--gray-2) !important;
}

.token.selector, .token.tag,
.HyperMD-codeblock .cm-tag, .HyperMD-codeblock .cm-property, .HyperMD-codeblock .cm-meta, .HyperMD-codeblock .cm-qualifier, .HyperMD-codeblock .cm-header, .HyperMD-codeblock .cm-quote, .HyperMD-codeblock .cm-hr, .HyperMD-codeblock .cm-link {
  color: var(--red) !important;
}

.token.property, .token.boolean, .token.number, .token.constant, .token.symbol, .token.attr-name, .token.deleted,
.HyperMD-codeblock .cm-number, .HyperMD-codeblock .cm-atom, .HyperMD-codeblock .cm-attribute {
  color: var(--orange) !important;
}

.token.string, .token.char, .token.attr-value, .token.builtin, .token.inserted,
.HyperMD-codeblock .cm-string, .HyperMD-codeblock .cm-builtin {
  color: var(--green) !important;
}

.token.operator, .token.entity, .token.url, .language-css .token.string, .style .token.string,
.HyperMD-codeblock .cm-string-2, .HyperMD-codeblock .cm-operator {
  color: var(--aqua) !important;
}

.token.atrule, .token.keyword,
.HyperMD-codeblock .cm-keyword {
  color: var(--purple) !important;
}

.token.function, .token.macro.property,
.HyperMD-codeblock .cm-def, .HyperMD-codeblock .cm-variable {
  color: var(--blue) !important;
}

.token.class-name,
.HyperMD-codeblock .cm-variable-2, .HyperMD-codeblock .cm-variable-3 {
  color: var(--yellow) !important;
}

.token.regex, .token.important, .token.variable {
  color: var(--purple) !important;
}

.token.important, .token.bold {
  font-weight: bold !important;
}

.token.italic {
  font-style: italic !important;
}

.token.entity {
  cursor: help !important;
}
|
|
@@ -0,0 +1,11 @@
{
  "name": "Shimmering Focus",
  "author": "pseudometa aka Chris Grieser",
  "description": "A minimalistic and opinionated Obsidian theme for the keyboard-centric user.",
  "version": "5.74.0.0",
  "minAppVersion": "1.6.0",
  "authorUrl": "https://chris-grieser.de/",
  "fundingUrl": {
    "PayPal": "https://github.com/sponsors/chrisgrieser"
  }
}
File diff suppressed because one or more lines are too long
@@ -0,0 +1,219 @@
{
  "main": {
    "id": "f2deca1cbdbb5274",
    "type": "split",
    "children": [
      {
        "id": "db01802d2d6d7dd1",
        "type": "tabs",
        "children": [
          {
            "id": "503bd095373c5a16",
            "type": "leaf",
            "state": {
              "type": "graph",
              "state": {},
              "icon": "lucide-git-fork",
              "title": "关系图谱"
            }
          }
        ]
      }
    ],
    "direction": "vertical"
  },
  "left": {
    "id": "06b9ad1d92cf867a",
    "type": "split",
    "children": [
      {
        "id": "d3f9cb7cb1d78709",
        "type": "tabs",
        "children": [
          {
            "id": "e5684088f3ceec73",
            "type": "leaf",
            "state": {
              "type": "file-explorer",
              "state": {
                "sortOrder": "alphabetical",
                "autoReveal": false
              },
              "icon": "lucide-folder-closed",
              "title": "文件列表"
            }
          },
          {
            "id": "72982fd9edce03df",
            "type": "leaf",
            "state": {
              "type": "search",
              "state": {
                "query": "",
                "matchingCase": false,
                "explainSearch": false,
                "collapseAll": false,
                "extraContext": false,
                "sortOrder": "alphabetical"
              },
              "icon": "lucide-search",
              "title": "搜索"
            }
          },
          {
            "id": "9e2bc0e842ce8810",
            "type": "leaf",
            "state": {
              "type": "bookmarks",
              "state": {},
              "icon": "lucide-bookmark",
              "title": "书签"
            }
          }
        ]
      }
    ],
    "direction": "horizontal",
    "width": 230.5
  },
  "right": {
    "id": "6c168d5302964694",
    "type": "split",
    "children": [
      {
        "id": "568ef49d4c3bb1c2",
        "type": "tabs",
        "children": [
          {
            "id": "dca44025f45b23bc",
            "type": "leaf",
            "state": {
              "type": "backlink",
              "state": {
                "file": "Meetings/2026-05-06_合川分公司周例会(2026第X期).md",
                "collapseAll": false,
                "extraContext": false,
                "sortOrder": "alphabetical",
                "showSearch": false,
                "searchQuery": "",
                "backlinkCollapsed": false,
                "unlinkedCollapsed": true
              },
              "icon": "links-coming-in",
              "title": "2026-05-06_合川分公司周例会(2026第X期) 的反向链接列表"
            }
          },
          {
            "id": "9a092d05427830e0",
            "type": "leaf",
            "state": {
              "type": "outgoing-link",
              "state": {
                "file": "Meetings/2026-05-06_合川分公司周例会(2026第X期).md",
                "linksCollapsed": false,
                "unlinkedCollapsed": true
              },
              "icon": "links-going-out",
              "title": "2026-05-06_合川分公司周例会(2026第X期) 的出链列表"
            }
          },
          {
            "id": "a79abdd5394475f3",
            "type": "leaf",
            "state": {
              "type": "tag",
              "state": {
                "sortOrder": "frequency",
                "useHierarchy": true,
                "showSearch": false,
                "searchQuery": ""
              },
              "icon": "lucide-tags",
              "title": "标签"
            }
          },
          {
            "id": "f1ffa28736fd176e",
            "type": "leaf",
            "state": {
              "type": "all-properties",
              "state": {
                "sortOrder": "frequency",
                "showSearch": false,
                "searchQuery": ""
              },
              "icon": "lucide-archive",
              "title": "添加笔记属性"
            }
          },
          {
            "id": "c89d13371ebc85d3",
            "type": "leaf",
            "state": {
              "type": "outline",
              "state": {
                "file": "Meetings/2026-05-06_合川分公司周例会(2026第X期).md",
                "followCursor": false,
                "showSearch": false,
                "searchQuery": ""
              },
              "icon": "lucide-list",
              "title": "2026-05-06_合川分公司周例会(2026第X期) 的大纲"
            }
          }
        ]
      }
    ],
    "direction": "horizontal",
    "width": 300,
    "collapsed": true
  },
  "left-ribbon": {
    "hiddenItems": {
      "switcher:打开快速切换": false,
      "graph:查看关系图谱": false,
      "canvas:新建白板": false,
      "daily-notes:打开/创建今天的日记": false,
      "templates:插入模板": false,
      "command-palette:打开命令面板": false,
      "bases:新建数据库": false
    }
  },
  "active": "503bd095373c5a16",
  "lastOpenFiles": [
    "Entities/二级基站拆除.md",
    "Entities/4月中旬.md",
    "Entities/各部门.md",
    "Graphs/知识图谱总览.md",
    "Entities/分公司主要领导.md",
    "Entities/商客经理.md",
    "Entities/市场部主管.md",
    "Entities/市场部.md",
    "Entities/1145户.md",
    "Entities/三期项目二期拆迁.md",
    "Entities/88.5万元.md",
    "Entities/商客市场.md",
    "Entities/工会经费与后勤保障.md",
    "Entities/6.53%.md",
    "Entities/退单率.md",
    "Entities/90%.md",
    "Entities/87.35%.md",
    "Entities/5.5.md",
    "Entities/三代终端年度目标.md",
    "Entities/0.51.md",
    "Entities/网络运维与指标管控.md",
    "Entities/市场部负责人.md",
    "Entities/综合部负责人.md",
    "Entities/建维部负责人.md",
    "Entities/相关人员.md",
    "Entities/各部门经理.md",
    "meeting_state.json",
    "未命名.canvas",
    "Raw",
    "Graphs",
    "Entities",
    "Meetings",
    "Raw/未命名.base",
    "未命名.base"
  ]
}
@@ -0,0 +1,8 @@
openai>=1.0.0
pydantic>=2.0.0
llama-index>=0.10.0
llama-index-embeddings-openai>=0.1.0
llama-index-vector-stores-chroma>=0.1.0
chromadb>=0.5.0
python-dotenv>=1.0.0
pyvis>=0.3.0
@@ -0,0 +1,259 @@
import hashlib
import json
import logging
import os
import re
from typing import List, Optional

from openai import OpenAI as OpenAI_Client
from llama_index.core import (
    Document,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.embeddings import BaseEmbedding
from llama_index.core.settings import Settings

from config import config

logger = logging.getLogger(__name__)


class CustomOpenAIEmbedding(BaseEmbedding):
    """Embedding model that calls an OpenAI-compatible /embeddings endpoint directly."""

    def __init__(
        self,
        model: str = "text-embedding-ada-002",
        api_key: Optional[str] = None,
        api_base: Optional[str] = None,
        **kwargs,
    ):
        super().__init__(model_name=model, **kwargs)
        # Fall back to a placeholder: openai.OpenAI raises if no key is passed and OPENAI_API_KEY is unset.
        self._client = OpenAI_Client(
            api_key=api_key or "not-needed",
            base_url=api_base,
        )
        self._model = model

    # The async variants call the blocking client, so requests are not overlapped.
    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_embedding(query)

    async def _aget_text_embedding(self, text: str) -> List[float]:
        return self._get_embedding(text)

    def _get_query_embedding(self, query: str) -> List[float]:
        return self._get_embedding(query)

    def _get_text_embedding(self, text: str) -> List[float]:
        return self._get_embedding(text)

    def _get_embedding(self, text: str) -> List[float]:
        resp = self._client.embeddings.create(
            model=self._model,
            input=text,
        )
        return resp.data[0].embedding


class MeetingVectorStore:
    """LlamaIndex-backed store that indexes each meeting as one Document per extracted field."""

    def __init__(self):
        embed_model = CustomOpenAIEmbedding(
            model=config.embedding.model,
            api_key=config.embedding.api_key or None,
            api_base=config.embedding.api_base if config.embedding.api_base else None,
        )
        # Global LlamaIndex setting: indexes built in this process use this embedding model.
        Settings.embed_model = embed_model

        self.persist_dir = config.vector_store.persist_dir
        self._index: Optional[VectorStoreIndex] = None
        self._load_or_create_index()

    def _load_or_create_index(self):
        # docstore.json is written by StorageContext.persist(), so it marks a previously saved index.
        if os.path.exists(os.path.join(self.persist_dir, "docstore.json")):
            try:
                storage_context = StorageContext.from_defaults(persist_dir=self.persist_dir)
                self._index = load_index_from_storage(storage_context)
                logger.info(f"从磁盘加载向量索引: {self.persist_dir}")
                return
            except Exception as e:
                logger.warning(f"加载向量索引失败,将创建新索引: {e}")

        self._index = VectorStoreIndex.from_documents([])
        logger.info("创建新的向量索引")

    def _save(self):
        if self._index:
            os.makedirs(self.persist_dir, exist_ok=True)
            self._index.storage_context.persist(persist_dir=self.persist_dir)

    def _meeting_id(self, meeting_data: dict) -> str:
        # Deterministic ID from date + title, so re-processing the same meeting yields the same doc_ids.
        title = meeting_data.get("title", "")
        date = meeting_data.get("date", "")
        raw = f"{date}_{title}"
        return f"meeting_{hashlib.md5(raw.encode('utf-8')).hexdigest()[:12]}"

    def find_meeting(self, title: str, date: str = "") -> Optional[dict]:
        """Return the stored metadata of a meeting with this title (and date, when given), else None."""
        if not self._index:
            return None
        query_text = f"会议标题: {title}"
        if date:
            query_text += f" 日期: {date}"
        try:
            results = self.query(query_text, top_k=3)
            for r in results:
                meta = r.get("metadata", {})
                meta_title = meta.get("title", "")
                # Require the title to match: a date match alone would flag any other meeting held that day.
                if meta_title == title and (not date or meta.get("date") == date):
                    return meta
            return None
        except Exception as e:
            logger.warning(f"会议查重查询失败: {e}")
            return None

    def find_similar_text(self, text: str, threshold: float = 0.92) -> Optional[dict]:
        """Return the first of the top-3 retrieved nodes whose score exceeds `threshold`, else None."""
        if not self._index:
            return None
        try:
            retriever = self._index.as_retriever(similarity_top_k=3)
            nodes = retriever.retrieve(text)
            for node in nodes:
                if node.score is not None and node.score > threshold:
                    return {
                        "metadata": node.metadata,
                        "score": node.score,
                    }
            return None
        except Exception as e:
            logger.warning(f"文本相似度查重失败: {e}")
            return None

    def remove_meeting(self, meeting_id: str) -> bool:
        if not self._index:
            return False
        try:
            for field in self._FIELD_TYPES:
                # delete_from_docstore=True also drops the stored nodes, not only their vectors.
                self._index.delete_ref_doc(f"{meeting_id}_{field}", delete_from_docstore=True)
            self._save()
            logger.info(f"已从向量索引移除会议: {meeting_id}")
            return True
        except Exception as e:
            logger.warning(f"移除向量索引失败: {e}")
            return False

    # Must stay in sync with the doc_id suffixes produced by _build_field_docs.
    _FIELD_TYPES = ["header", "summary", "action_items", "metrics", "decisions", "relations", "entities"]

    def add_meeting(self, meeting_data: dict) -> bool:
        try:
            meeting_id = self._meeting_id(meeting_data)
            original_text_path = meeting_data.get("_original_text_path", "")
            original_text = meeting_data.get("_original_text", "")

            # Metadata shared by every per-field document of this meeting.
            base_metadata = {
                "title": meeting_data.get("title", ""),
                "date": meeting_data.get("date", ""),
                "participants": ", ".join(meeting_data.get("participants", [])),
                "type": "meeting",
                "content_hash": meeting_data.get("_content_hash", ""),
                "original_text_path": original_text_path,
                "original_text_excerpt": original_text[:500] if original_text else "",
                "meeting_id": meeting_id,
            }

            docs = self._build_field_docs(meeting_data, base_metadata, meeting_id)

            if self._index:
                for doc in docs:
                    self._index.insert(doc)
                self._save()
            logger.info(f"会议 '{meeting_data.get('title')}' 已添加到向量索引 (id={meeting_id}, 字段数={len(docs)})")
            return True
        except Exception as e:
            logger.error(f"添加会议到向量索引失败: {e}")
            return False

    def _build_field_docs(self, data: dict, base: dict, meeting_id: str) -> List[Document]:
        """Build one Document per field, with doc_id "<meeting_id>_<field>".

        The header document is always built; the other fields only when non-empty.
        """
        docs = []

        header = f"# {data.get('title', '')}"
        if data.get("date"):
            header += f"\n日期: {data['date']}"
        if data.get("participants"):
            header += f"\n参会人: {', '.join(data['participants'])}"
        docs.append(Document(text=header, metadata={**base, "field": "header"}, doc_id=f"{meeting_id}_header"))

        if data.get("summary"):
            docs.append(Document(text=data["summary"], metadata={**base, "field": "summary"}, doc_id=f"{meeting_id}_summary"))

        if data.get("action_items"):
            lines = []
            for item in data["action_items"]:
                status = item.get('status', '待办')
                lines.append(f"- [{status}] {item.get('task', '')} (负责人: {item.get('assignee', '')}, 截止: {item.get('deadline', '')}, 优先级: {item.get('priority', '')})")
                # Append the status history (date and status per entry) when it has more than one entry.
                history = item.get("_history", [])
                if len(history) > 1:
                    lines.append(" 演变: " + " → ".join(f"{h.get('date','')}({h.get('status','')})" for h in history))
            docs.append(Document(text="\n".join(lines), metadata={**base, "field": "action_items"}, doc_id=f"{meeting_id}_action_items"))

        if data.get("metrics"):
            lines = []
            for m in data["metrics"]:
                lines.append(f"- {m.get('metric_name', '')}: {m.get('value', '')} (目标: {m.get('target', '')}, 趋势: {m.get('trend', '')})")
            docs.append(Document(text="\n".join(lines), metadata={**base, "field": "metrics"}, doc_id=f"{meeting_id}_metrics"))

        if data.get("decisions"):
            lines = [f"- {d.get('content', '')}" for d in data["decisions"]]
            docs.append(Document(text="\n".join(lines), metadata={**base, "field": "decisions"}, doc_id=f"{meeting_id}_decisions"))

        if data.get("relations"):
            lines = [f"- {r.get('subject', '')} --{r.get('predicate', '')}--> {r.get('object', '')}" for r in data["relations"]]
            docs.append(Document(text="\n".join(lines), metadata={**base, "field": "relations"}, doc_id=f"{meeting_id}_relations"))

        if data.get("entities"):
            lines = [f"- [{e.get('entity_type', '')}] {e.get('name', '')}: {e.get('description', '')}" for e in data["entities"]]
            docs.append(Document(text="\n".join(lines), metadata={**base, "field": "entities"}, doc_id=f"{meeting_id}_entities"))

        return docs

    def query(self, question: str, top_k: int = 5) -> List[dict]:
        if not self._index:
            return []
        try:
            retriever = self._index.as_retriever(similarity_top_k=top_k)
            nodes = retriever.retrieve(question)
            results = []
            for node in nodes:
                results.append({
                    "text": node.text,
                    "score": node.score,
                    "metadata": node.metadata,
                })
            return results
        except Exception as e:
            logger.error(f"查询向量索引失败: {e}")
            return []

    def query_as_context(self, question: str, top_k: int = 3) -> str:
        results = self.query(question, top_k=top_k)
        if not results:
            return ""
        parts = []
        for i, r in enumerate(results):
            metadata = r.get("metadata", {})
            parts.append(f"[{i+1}] {metadata.get('title', '未知会议')} ({metadata.get('date', '')})\n{r['text']}\n")
        return "\n".join(parts)

    def get_stats(self) -> dict:
        if not self._index:
            return {"doc_count": 0, "node_count": 0}
        try:
            docstore = self._index.docstore
            nodes = list(docstore.docs.values()) if hasattr(docstore, 'docs') else []
            # insert() may split one Document into several nodes, so count the
            # distinct source documents (one per meeting field) separately.
            ref_doc_ids = {n.ref_doc_id for n in nodes if n.ref_doc_id}
            return {
                "doc_count": len(ref_doc_ids),
                "node_count": len(nodes),
            }
        except Exception:
            return {"doc_count": 0, "node_count": 0}


meeting_vector_store = MeetingVectorStore()
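The deduplication and removal paths in `vector_store.py` share one ID scheme: `_meeting_id` hashes `"{date}_{title}"`, `_build_field_docs` stores each field under `"{meeting_id}_{field}"` (the same suffixes `remove_meeting` deletes), and `find_similar_text` treats a retrieved node as a duplicate when its score is strictly above the default threshold of 0.92. The standalone sketch below reproduces that scheme without LlamaIndex so it can be checked in isolation. It is not part of the repository: `meeting_id`, `field_doc_ids`, `cosine` and `is_near_duplicate` are hypothetical helper names, and it assumes the retriever score is cosine similarity (what LlamaIndex's default in-memory `SimpleVectorStore` computes; other vector stores may score differently).

```python
import hashlib
import math
from typing import List, Sequence

# Copied from MeetingVectorStore._FIELD_TYPES in vector_store.py.
FIELD_TYPES = ["header", "summary", "action_items", "metrics", "decisions", "relations", "entities"]


def meeting_id(title: str, date: str) -> str:
    """Hypothetical helper mirroring _meeting_id: md5 of "date_title", first 12 hex digits."""
    raw = f"{date}_{title}"
    return f"meeting_{hashlib.md5(raw.encode('utf-8')).hexdigest()[:12]}"


def field_doc_ids(mid: str) -> List[str]:
    """Every doc_id _build_field_docs can emit for one meeting (and remove_meeting deletes)."""
    return [f"{mid}_{field}" for field in FIELD_TYPES]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero and of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_near_duplicate(a: Sequence[float], b: Sequence[float], threshold: float = 0.92) -> bool:
    """Same strict comparison as find_similar_text: score > threshold."""
    return cosine(a, b) > threshold


mid = meeting_id("周例会", "2026-05-06")
print(mid, field_doc_ids(mid)[0])
```

For example, `is_near_duplicate([1.0, 0.0], [0.99, 0.1])` is true (cosine is about 0.995), while orthogonal vectors score 0.0 and are kept as distinct meetings. Because the ID depends only on date and title, a recurring meeting with the same title on a different date gets a different ID.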