# imeeting Recording AI Analysis: Technical Design

## Requirements Analysis

Users upload a complete meeting recording, and the system must then perform the following processing:

1. **Speaker separation**: identify and separate the different speakers
2. **Speech-to-text**: transcribe the audio content into text
3. **Timestamping**: mark each segment of dialogue with a timestamp
4. **Smart summary**: generate a meeting summary and extract key information

## Technology Comparison

### 1. Speech-to-Text (ASR)

#### Option A: OpenAI Whisper
**Pros:**
- Open source and free; can be deployed locally
- Multilingual (including Chinese)
- High accuracy, with particularly good results on Chinese
- Several model sizes (tiny, base, small, medium, large)
- Can output timestamps
- Active community and thorough documentation

**Cons:**
- Needs GPU acceleration for best performance
- Model files are large (the large model is about 3GB)
- Relatively slow processing

**Deployment:**
```python
import whisper

model = whisper.load_model("medium")
result = model.transcribe("audio.mp3", language="zh")
```

#### Option B: Azure Speech Services
**Pros:**
- Microsoft cloud service, stable and reliable
- Supports both real-time and batch transcription
- High accuracy on Chinese
- Supports speaker identification
- Automatic punctuation and formatting

**Cons:**
- Paid service
- Depends on a network connection
- Data privacy concerns

#### Option C: Baidu Speech Recognition API
**Pros:**
- Optimized for Chinese
- High recognition accuracy
- Offers a free quota
- Stable service within China

**Cons:**
- Paid once the free quota is exceeded
- API rate limits
- Data must be uploaded to Baidu servers

### 2. Speaker Diarization

#### Option A: pyannote-audio
**Pros:**
- Open source and free
- A dedicated speaker diarization library
- Supports real-time and batch processing
- Integrates well with Whisper
- Outputs detailed per-speaker time ranges

**Cons:**
- Requires pretrained models
- Sensitive to audio quality
- Limited accuracy in complex scenes (several people talking at once)

**Usage example:**
```python
from pyannote.audio import Pipeline

# The pipeline is gated on Hugging Face, so an access token is required
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=True,
)
diarization = pipeline("audio.wav")
```

#### Option B: Azure Speaker Recognition
**Pros:**
- Integrated into Azure Speech Services
- Can identify specific speakers
- Runs in the cloud, no local resources needed

**Cons:**
- Paid service
- Speaker voiceprints must be enrolled in advance

### 3. Smart Summary Generation

#### Option A: OpenAI GPT API
**Pros:**
- Strong text understanding and summarization
- Supports Chinese
- Can produce structured summaries
- Simple API calls

**Cons:**
- Paid service
- Requires a network connection
- Token limits
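
The token limit matters in practice because the transcript of an hour-long meeting can exceed a model's context window. A common workaround is to split the transcript into chunks, summarize each chunk, and then summarize the partial summaries. The following is a minimal sketch of the splitting step only. It assumes a character budget as a stand-in for a token budget, and the name `split_transcript` and the 2000-character default are illustrative choices, not part of any library.

```python
def split_transcript(lines, max_chars=2000):
    """Group transcript lines into chunks of at most max_chars characters.

    A single line longer than max_chars is kept whole in its own chunk
    rather than being cut mid-sentence.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        # +1 accounts for the newline that joins lines inside a chunk
        added = len(line) + (1 if current else 0)
        if current and size + added > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
            added = len(line)
        current.append(line)
        size += added
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Splitting on line boundaries keeps each utterance intact, which tends to matter more for summary quality than filling every chunk to the limit.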

#### Option B: Local LLM (e.g. Llama2/ChatGLM)
**Pros:**
- Can be deployed locally
- Data stays private
- Deploy once, use long term

**Cons:**
- Needs a large amount of GPU memory
- High deployment complexity
- Chinese output may be weaker than a specialized API

## Recommended Technical Solutions

### Solution 1: Fully Open-Source (Recommended)
```
Audio file → Whisper (speech-to-text) → pyannote-audio (speaker diarization) → Local LLM (summary generation)
```

**Tech stack:**
- **ASR**: OpenAI Whisper (medium model)
- **Speaker diarization**: pyannote-audio
- **Summary generation**: ChatGLM-6B or Llama2-7B-Chat
- **Audio processing**: librosa, pydub
- **Backend**: Python FastAPI
- **Task queue**: Celery + Redis

**Pros:**
- Fully open source, no API fees
- Data stays private
- Highly controllable and customizable

**Challenges:**
- Requires GPU server resources
- Higher deployment complexity
- Models need optimization and tuning

### Solution 2: Hybrid
```
Audio file → Whisper (speech-to-text) → pyannote-audio (speaker diarization) → Qwen3 (summary generation)
```

**Pros:**
- ASR and speaker diarization run locally, which protects audio privacy
- Summary generation uses a mature API with good results
- Balances cost and quality

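## Implementation Architecture

### Processing Flow
1. **File upload**: the user uploads an audio file to the server
2. **Preprocessing**: audio format conversion and noise reduction
3. **Speaker diarization**: pyannote-audio identifies the speaker segments
4. **Speech recognition**: Whisper transcribes each speaker segment
5. **Post-processing**: merge the transcripts and add timestamps and speaker labels
6. **Summary generation**: generate a meeting summary from the transcript
7. **Result storage**: store the processing results in the database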
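
Step 5 can be sketched in plain Python. The example below takes a slightly different route from cutting the audio per speaker segment: it assumes the whole file was transcribed once and then labels each transcript segment with the speaker whose diarization turn overlaps it most. The function names and the `"UNKNOWN"` label are illustrative assumptions; the dict keys mirror the `start`/`end`/`speaker` fields used elsewhere in this document.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript_segments, speaker_turns):
    """Label each transcript segment with the most-overlapping speaker.

    Both inputs are lists of dicts with "start" and "end" in seconds;
    speaker_turns additionally carry "speaker". Segments that overlap
    no turn are labelled "UNKNOWN".
    """
    labelled = []
    for seg in transcript_segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in speaker_turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        labelled.append({**seg, "speaker": best_speaker})
    return labelled
```

The nested loop is quadratic in the number of segments, which is acceptable for a single meeting; a sorted two-pointer sweep would be the natural refinement for very long recordings.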
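
### Technical Requirements
- **Hardware**: NVIDIA GPU recommended (4GB+ VRAM as a floor; the Whisper medium model alone needs roughly 5GB, so plan for 8GB or more when using it)
- **Memory**: 16GB+ RAM
- **Storage**: SSD storage for model files and temporary audio files
- **Network**: a stable connection is required if cloud APIs are used

### Performance Estimates
For a 1-hour meeting recording:
- **Whisper transcription**: about 5-15 minutes (depending on GPU performance)
- **Speaker diarization**: about 2-5 minutes
- **Summary generation**: about 30 seconds to 2 minutes
- **Total processing time**: about 8-22 minutes when the stages run one after another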
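
The total is simply the sum of the per-stage ranges. A small sketch of that arithmetic follows; the stage figures are the estimates listed above, while the assumption that processing time scales linearly with recording length is mine and would need to be confirmed by measurement.

```python
# Per-stage (low, high) estimates in minutes for one hour of audio
STAGE_MINUTES = {
    "whisper_transcription": (5.0, 15.0),
    "speaker_diarization": (2.0, 5.0),
    "summary_generation": (0.5, 2.0),
}


def total_range(stages, audio_hours=1.0):
    """Sum the stage ranges, scaled linearly by recording length (assumption)."""
    low = sum(lo for lo, _ in stages.values()) * audio_hours
    high = sum(hi for _, hi in stages.values()) * audio_hours
    return low, high


print(total_range(STAGE_MINUTES))  # (7.5, 22.0)
```

The lower bound works out to 7.5 minutes, which the list rounds up to 8.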
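
## Future Optimization Directions

1. **Real-time processing**: support streaming audio
2. **Multilingual support**: extend to English and other languages
3. **Speaker identification**: build a voiceprint library to recognize who is speaking
4. **Keyword extraction**: automatically extract meeting keywords and topics
5. **Sentiment analysis**: analyze the sentiment of each speaker
6. **Meeting insights**: produce statistics such as participation and speaking time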
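
The speaking-time statistic in item 6 can be derived directly from diarization output. The sketch below assumes a list of turns shaped like the `speakers_info` dicts used in this document (`speaker`, `start`, `end`); the function name `speaking_stats` is illustrative.

```python
from collections import defaultdict


def speaking_stats(speaker_turns):
    """Return total seconds and share of total speaking time per speaker."""
    totals = defaultdict(float)
    for turn in speaker_turns:
        totals[turn["speaker"]] += turn["end"] - turn["start"]
    grand_total = sum(totals.values())
    return {
        speaker: {
            "seconds": seconds,
            # Guard against division by zero when there is no speech at all
            "share": seconds / grand_total if grand_total else 0.0,
        }
        for speaker, seconds in totals.items()
    }
```

Note that overlapping speech is counted once per speaker, so the per-speaker totals can add up to more than the recording length.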

## Environment Deployment Guide

### System Requirements

**Recommended configuration (16GB RAM + T4 GPU + Ubuntu):**
- **Operating system**: Ubuntu 20.04 LTS or later
- **GPU**: NVIDIA T4 (16GB VRAM) or a card of equivalent performance
- **Memory**: 16GB RAM minimum, 32GB recommended
- **Storage**: 100GB+ of SSD space
- **CPU**: 8 cores or more
- **Python**: 3.8-3.11

### 1. Whisper Speech-to-Text Environment

#### 1.1 Install CUDA and cuDNN
```bash
# Install the NVIDIA driver.
# CUDA 11.8 ships with driver 520.61.05, so use a 520-series or newer driver
# (535 is one example; `ubuntu-drivers devices` lists what your system recommends).
sudo apt update
sudo apt install nvidia-driver-535

# Install CUDA 11.8
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo chmod +x cuda_11.8.0_520.61.05_linux.run
sudo ./cuda_11.8.0_520.61.05_linux.run

# Configure environment variables
echo 'export PATH=/usr/local/cuda-11.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# cuDNN: the PyTorch cu118 wheels installed in step 1.3 bundle their own cuDNN,
# so a separate system-wide cuDNN install is only needed for other software.
```

#### 1.2 Create a Python Virtual Environment
```bash
# Install conda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# Create the virtual environment
conda create -n imeeting python=3.10
conda activate imeeting
```

#### 1.3 Install Whisper and Dependencies
```bash
# Install PyTorch (CUDA build)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install Whisper
pip install openai-whisper

# Whisper decodes audio through the ffmpeg command-line tool
sudo apt install -y ffmpeg

# Install audio processing libraries
pip install librosa pydub soundfile

# Verify the installation
python -c "import whisper; print('Whisper installed successfully')"
```

#### 1.4 Download and Test the Whisper Model
```bash
# Pre-download the model from Python
python -c "
import whisper
# Download the medium model (works well for Chinese, about 1.5GB)
model = whisper.load_model('medium')
print('Medium model downloaded successfully')
"
```

#### 1.5 Whisper Usage Example
```python
import whisper
import time

# Load the model
model = whisper.load_model("medium")

# Transcribe audio
def transcribe_audio(audio_path):
    start_time = time.time()
    result = model.transcribe(
        audio_path,
        language="zh",         # Chinese
        task="transcribe",
        verbose=True,
        word_timestamps=True   # word-level timestamps
    )

    processing_time = time.time() - start_time
    print(f"Processing time: {processing_time:.2f}s")

    return result

# Usage example
# result = transcribe_audio("meeting.wav")
# print(result["text"])
```
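
Besides `result["text"]`, the dictionary returned by `transcribe` carries a `segments` list whose entries have `start`, `end` (both in seconds), and `text`. The sketch below turns that list into timestamped lines, which covers the timestamping requirement. The helper names are illustrative, and the sample dictionary is hand-written so the example runs without loading a model.

```python
def format_timestamp(seconds):
    """Render a number of seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(result):
    """Build one '[start - end] text' line per Whisper segment."""
    return [
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in result["segments"]
    ]


# Hand-written stand-in for a Whisper result
sample = {"segments": [{"start": 0.0, "end": 4.2, "text": " Hello everyone"}]}
print(segments_to_lines(sample))  # ['[00:00:00 - 00:00:04] Hello everyone']
```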

### 2. pyannote-audio Speaker Diarization Environment

#### 2.1 Install pyannote.audio
```bash
# Activate the virtual environment
conda activate imeeting

# Install pyannote.audio
pip install pyannote.audio

# Install the extra dependency
pip install pytorch-lightning
```

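#### 2.2 Obtain Hugging Face Access
```bash
# Install huggingface-hub
pip install huggingface-hub

# Log in to Hugging Face (register an account at https://huggingface.co first)
huggingface-cli login
```

**Note**: access to the pyannote models must be requested on Hugging Face:
1. Visit https://huggingface.co/pyannote/speaker-diarization. If you load a versioned pipeline such as `pyannote/speaker-diarization-3.1`, also open that model's page and the page of its segmentation model, `pyannote/segmentation-3.0`
2. Click "Request access" (or accept the user conditions) on each page
3. Wait until access is granted (allow 1-2 days, although gating that only requires accepting conditions often takes effect immediately)
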
#### 2.3 pyannote.audio Usage Example
```python
from pyannote.audio import Pipeline
import torch

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Load the speaker diarization pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=True  # reuse the token stored by `huggingface-cli login`
                         # (pyannote.audio 3.x argument name; check your installed version)
)

# Move the pipeline to the GPU
pipeline = pipeline.to(device)

def speaker_diarization(audio_path):
    """Run speaker diarization and return one dict per speaker turn."""
    diarization = pipeline(audio_path)

    # Collect the results
    speakers_info = []
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        speakers_info.append({
            "speaker": speaker,
            "start": turn.start,
            "end": turn.end,
            "duration": turn.end - turn.start
        })

    return speakers_info

# Usage example
# speakers = speaker_diarization("meeting.wav")
# for info in speakers:
#     print(f"Speaker {info['speaker']}: {info['start']:.2f}s - {info['end']:.2f}s")
```
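
Diarization output often contains many short consecutive turns by the same speaker. Before transcription or display it can help to merge them. The sketch below operates on the `speakers_info` list returned above; the function name `merge_turns` and the 0.5-second gap threshold are illustrative assumptions to be tuned on real recordings.

```python
def merge_turns(speakers_info, max_gap=0.5):
    """Merge consecutive turns by the same speaker separated by at most max_gap seconds.

    Returns a new list; the input dicts are not modified.
    """
    merged = []
    for turn in sorted(speakers_info, key=lambda t: t["start"]):
        last = merged[-1] if merged else None
        if (
            last
            and last["speaker"] == turn["speaker"]
            and turn["start"] - last["end"] <= max_gap
        ):
            # Extend the previous turn instead of starting a new one
            last["end"] = max(last["end"], turn["end"])
            last["duration"] = last["end"] - last["start"]
        else:
            merged.append(dict(turn))
    return merged
```

A turn by another speaker always breaks the run, so interjections are preserved even when they are very short.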

### 3. Ollama + Local LLM Summary Generation Environment

#### 3.1 Install Ollama
```bash
# Download and install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Start the Ollama service
sudo systemctl start ollama
sudo systemctl enable ollama

# Verify the installation
ollama --version
```

#### 3.2 Download and Configure a Chinese-Capable LLM
```bash
# Download the Qwen2.5 model (recommended for Chinese summaries)
ollama pull qwen2.5:7b

# Or download the ChatGLM3 model
# (confirm that this tag is published in the Ollama model library before relying on it)
ollama pull chatglm3:6b

# Or download Llama 3.1 (a general multilingual model, not a Chinese-specific build)
ollama pull llama3.1:8b

# List the downloaded models
ollama list
```

#### 3.3 Configure the Ollama Service
```bash
# Create the Ollama drop-in configuration directory
sudo mkdir -p /etc/systemd/system/ollama.service.d

# Create the configuration file
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<EOF
[Service]
# 0.0.0.0 listens on all interfaces; protect the port with a firewall,
# or use 127.0.0.1 if remote access is not needed
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_NUM_PARALLEL=2"
Environment="OLLAMA_MAX_LOADED_MODELS=2"
# OLLAMA_GPU_MEMORY_FRACTION may not be a setting Ollama recognizes;
# verify it against the Ollama documentation before relying on it
Environment="OLLAMA_GPU_MEMORY_FRACTION=0.8"
EOF

# Reload the configuration and restart the service
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

#### 3.4 Install the Python Client
```bash
# Install the Ollama Python client
pip install ollama

# Install other dependencies
pip install requests
```

#### 3.5 LLM Summary Generation Example
```python
import ollama

class MeetingSummarizer:
    def __init__(self, model_name="qwen2.5:7b"):
        self.model_name = model_name
        self.client = ollama.Client()

    def generate_summary(self, transcript_text, speakers_info):
        """Generate a meeting summary.

        speakers_info is accepted for future use; the prompt below
        currently relies on transcript_text only.
        """

        # Build the prompt
        prompt = f"""
Write a detailed meeting summary based on the following meeting transcript:

Transcript:
{transcript_text}

Format the summary as follows:
1. Meeting overview
2. Main discussion points
3. Decisions
4. Action items
5. Key points from each participant

Answer in Chinese and keep it concise.
"""

        try:
            response = self.client.chat(
                model=self.model_name,
                messages=[
                    {
                        'role': 'user',
                        'content': prompt
                    }
                ],
                options={
                    'temperature': 0.3,
                    'top_p': 0.9,
                    'num_predict': 2000  # Ollama's name for the max output tokens
                }
            )

            return response['message']['content']

        except Exception as e:
            print(f"Summary generation error: {e}")
            return None

# Usage example
# summarizer = MeetingSummarizer()
# summary = summarizer.generate_summary(transcript_text, speakers_info)
# print(summary)
```
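
For the summary to attribute points to participants, the transcript passed in as `transcript_text` needs speaker labels. The following sketch renders speaker-labelled segments into prompt-ready text. It assumes each segment is a dict with `start` (seconds), `speaker`, and `text`; that shape and the function name `build_labelled_transcript` are illustrative, not something the libraries above produce directly.

```python
def build_labelled_transcript(segments):
    """Render segments as '[MM:SS] SPEAKER: text' lines joined by newlines."""
    lines = []
    for seg in segments:
        # Minutes are not wrapped at 60, so a 75-minute mark shows as 75:00
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(
            f"[{minutes:02d}:{seconds:02d}] {seg['speaker']}: {seg['text'].strip()}"
        )
    return "\n".join(lines)


example = [
    {"start": 0, "speaker": "SPEAKER_00", "text": " Let's begin."},
    {"start": 75.4, "speaker": "SPEAKER_01", "text": "Agreed."},
]
print(build_labelled_transcript(example))
```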

### 4. Complete Integrated Deployment Script

#### 4.1 Create the Deployment Script
```bash
# Create the deployment script
cat > setup_imeeting_ai.sh << 'EOF'
#!/bin/bash

set -e

echo "Starting imeeting AI environment deployment..."

# Update the system
sudo apt update && sudo apt upgrade -y

# Install required system dependencies
sudo apt install -y wget curl git build-essential

# Install the NVIDIA driver (if not already installed)
if ! command -v nvidia-smi &> /dev/null; then
    echo "Installing NVIDIA driver..."
    # CUDA 11.8 builds expect a 520-series or newer driver; 535 is one example
    sudo apt install -y nvidia-driver-535
    echo "Please reboot, then run this script again"
    exit 1
fi

# Install Miniconda
if [ ! -d "$HOME/miniconda3" ]; then
    echo "Installing Miniconda..."
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh -b
    rm Miniconda3-latest-Linux-x86_64.sh
fi

# Initialize conda
source ~/miniconda3/bin/activate

# Create the virtual environment
echo "Creating Python virtual environment..."
conda create -n imeeting python=3.10 -y
conda activate imeeting

# Install PyTorch and related libraries
echo "Installing PyTorch..."
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install AI libraries
echo "Installing AI libraries..."
pip install openai-whisper pyannote.audio ollama
pip install librosa pydub soundfile huggingface-hub

# Install Ollama
echo "Installing Ollama..."
curl -fsSL https://ollama.ai/install.sh | sh

# Start the Ollama service
sudo systemctl start ollama
sudo systemctl enable ollama

# Download the model
echo "Downloading model..."
ollama pull qwen2.5:7b

echo "Environment deployment complete!"
echo "Activate the environment with:"
echo "conda activate imeeting"

EOF

chmod +x setup_imeeting_ai.sh
```

#### 4.2 Run the Deployment Script
```bash
./setup_imeeting_ai.sh
```

### 5. Performance Optimization Suggestions

#### 5.1 GPU Memory Optimization
```python
import torch
import whisper

# Release cached GPU memory held by PyTorch's allocator
def clear_gpu_cache():
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

# Use half precision for inference.
# Whisper applies fp16 itself through the `fp16` argument of transcribe(),
# so no explicit autocast context is needed.
model = whisper.load_model("medium")
result = model.transcribe("meeting.wav", fp16=torch.cuda.is_available())
```

#### 5.2 Batch Processing Optimization
```python
# Process audio files in batches.
# process_single_audio is a placeholder for the per-file pipeline
# (diarization + transcription + summary); clear_gpu_cache comes from 5.1.
def batch_process_audio(audio_files, batch_size=2):
    results = []
    for i in range(0, len(audio_files), batch_size):
        batch = audio_files[i:i+batch_size]
        # Process the batch
        for audio_file in batch:
            result = process_single_audio(audio_file)
            results.append(result)
        # Clear the cache after each batch
        clear_gpu_cache()
    return results
```

### 6. Monitoring and Logging

#### 6.1 System Monitoring
```bash
# Install monitoring tools
pip install nvidia-ml-py3 psutil

# GPU monitoring script (the nvidia-ml-py3 package is imported as pynvml)
python -c "
import pynvml as nvml
import psutil
import time

nvml.nvmlInit()
handle = nvml.nvmlDeviceGetHandleByIndex(0)

while True:
    # GPU utilization
    util = nvml.nvmlDeviceGetUtilizationRates(handle)
    print(f'GPU utilization: {util.gpu}%')

    # GPU memory usage
    mem_info = nvml.nvmlDeviceGetMemoryInfo(handle)
    print(f'GPU memory: {mem_info.used/1024**3:.1f}GB / {mem_info.total/1024**3:.1f}GB')

    # CPU and RAM
    print(f'CPU utilization: {psutil.cpu_percent()}%')
    print(f'RAM utilization: {psutil.virtual_memory().percent}%')
    print('-' * 40)

    time.sleep(5)
"
```

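### 7. Troubleshooting

#### 7.1 Common Problems
1. **CUDA version mismatch**: make sure the PyTorch build matches the installed CUDA version
2. **Out of GPU memory**: use a smaller model or fall back to CPU mode
3. **Hugging Face access**: make sure access to the pyannote models has been granted
4. **Ollama service not running**: check the service status with `sudo systemctl status ollama`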
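
For the out-of-memory case, the choice of a smaller model can be automated. The sketch below picks the largest Whisper model that fits in the free VRAM. The requirement figures are the approximate values published in the Whisper README and should be treated as rough guidance; the function name `pick_whisper_model` and the 1GB headroom default are illustrative assumptions.

```python
# Approximate VRAM needs in GB per model size (rough guidance from the
# Whisper README), ordered from largest to smallest
WHISPER_VRAM_GB = [
    ("large", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def pick_whisper_model(free_vram_gb, headroom_gb=1.0):
    """Return the largest model whose need plus headroom fits, else None.

    None signals that the caller should fall back to CPU mode.
    """
    for name, needed in WHISPER_VRAM_GB:
        if needed + headroom_gb <= free_vram_gb:
            return name
    return None
```

With PyTorch installed, the free-memory figure could come from `torch.cuda.mem_get_info()`, which returns free and total bytes for the current device.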

#### 7.2 Test Script
```python
# Environment test script
def test_environment():
    """Check that the environment is configured correctly."""

    # Test CUDA
    import torch
    print(f"CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"GPU device: {torch.cuda.get_device_name(0)}")
        print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f}GB")

    # Test Whisper
    try:
        import whisper
        print("✓ Whisper installed")
    except ImportError:
        print("✗ Whisper is not installed")

    # Test pyannote
    try:
        from pyannote.audio import Pipeline
        print("✓ pyannote.audio installed")
    except ImportError:
        print("✗ pyannote.audio is not installed")

    # Test Ollama
    try:
        import ollama
        client = ollama.Client()
        models = client.list()
        print(f"✓ Connected to Ollama, available models: {len(models['models'])}")
    except Exception as e:
        print(f"✗ Failed to connect to Ollama: {e}")

if __name__ == "__main__":
    test_environment()
```

## Development Phases

### Phase 1: Basic Transcription
- Integrate Whisper for basic speech-to-text
- Implement audio file upload and a processing queue
- Basic display of transcription results

### Phase 2: Speaker Diarization
- Integrate pyannote-audio
- Implement multi-speaker identification and labeling
- Optimize the audio preprocessing flow

### Phase 3: Smart Summary
- Integrate a large language model or API
- Implement automatic meeting summary generation
- Add key information extraction

### Phase 4: System Optimization
- Performance optimization and error handling
- Polish the user interface
- Automate deployment and operations