227 lines
6.0 KiB
Markdown
227 lines
6.0 KiB
Markdown
|
|
# 前言
|
|||
|
|
## 视频
|
|||
|
|
- 【【搬运】M4 Mac Mini 集群【 Alex Ziskind】】 https://www.bilibili.com/video/BV17UzcYqEsc/?share_source=copy_web&vd_source=fe8142e8e12816535feaeabd6f6cdc8e
|
|||
|
|
|
|||
|
|
## 方法
|
|||
|
|
- 官方 mlx.launch(推荐,轻量可控)
|
|||
|
|
- Exo(可视化,一键集群)
|
|||
|
|
|
|||
|
|
# 豆包回答
|
|||
|
|
如何使用MLX部署MacStudio群集?
|
|||
|
|
|
|||
|
|
在 Mac Studio 集群上用 MLX 做分布式推理 / 训练,核心是**统一环境、免密 SSH、低延迟网络、mlx.launch 调度**,推荐优先用 Thunderbolt 5 + RDMA 以获得最佳性能。
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### 一、前置准备(所有节点必须一致)
|
|||
|
|
|
|||
|
|
1. **系统与硬件**
|
|||
|
|
|
|||
|
|
- macOS 16.2+(支持 RDMA over Thunderbolt 5)
|
|||
|
|
- 网络:Thunderbolt 5 直连或交换机(推荐);千兆 / 万兆以太网兜底
|
|||
|
|
- 每台 Mac Studio 启用 RDMA(仅需一次):
|
|||
|
|
|
|||
|
|
关机 → 长按电源键进 “启动选项”→ 选项 → 继续 → 实用工具 → 终端:
|
|||
|
|
|
|||
|
|
bash
|
|||
|
|
|
|||
|
|
运行
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
rdma_ctl enable
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
重启后验证:`rdma_ctl status`
|
|||
|
|
2. **免密 SSH 与主机名**
|
|||
|
|
|
|||
|
|
- 所有节点设静态主机名(如 `mac-studio-01`),编辑 `/etc/hosts` 写入所有节点的 IP 与主机名
|
|||
|
|
- 在主节点生成密钥并分发至所有节点:
|
|||
|
|
|
|||
|
|
bash
|
|||
|
|
|
|||
|
|
运行
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
ssh-keygen -t ed25519 -N "" -f ~/.ssh/mlx-cluster
|
|||
|
|
for node in mac-studio-01 mac-studio-02; do
|
|||
|
|
ssh-copy-id -i ~/.ssh/mlx-cluster "$node"
|
|||
|
|
done
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
- 测试:`ssh -i ~/.ssh/mlx-cluster mac-studio-02 hostname`
|
|||
|
|
3. **统一 Python 环境**
|
|||
|
|
|
|||
|
|
- 所有节点安装依赖:
|
|||
|
|
|
|||
|
|
bash
|
|||
|
|
|
|||
|
|
运行
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
brew install python@3.12 uv openmpi # openmpi 为 MPI 后端所需
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
- 用 uv 创建并同步环境:
|
|||
|
|
|
|||
|
|
bash
|
|||
|
|
|
|||
|
|
运行
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# 主节点创建
|
|||
|
|
uv venv ~/mlx-cluster-venv
|
|||
|
|
uv pip install mlx mlx-lm mpi4py
|
|||
|
|
# 同步至其他节点(示例)
|
|||
|
|
for node in mac-studio-02 mac-studio-03; do
|
|||
|
|
rsync -av ~/mlx-cluster-venv "$node":~/
|
|||
|
|
done
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
- 内存优化(按需):
|
|||
|
|
|
|||
|
|
bash
|
|||
|
|
|
|||
|
|
运行
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
sudo sysctl iogpu.wired_limit_mb=180000 # 192GB 机型示例,重启失效
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### 二、快速部署(两种方式)
|
|||
|
|
|
|||
|
|
#### 方式 A:官方 mlx.launch(推荐,轻量可控)
|
|||
|
|
|
|||
|
|
1. **生成主机配置文件**用 `mlx.distributed_config` 自动检测 Thunderbolt 接口并生成配置(主节点执行):
|
|||
|
|
|
|||
|
|
bash
|
|||
|
|
|
|||
|
|
运行
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
source ~/mlx-cluster-venv/bin/activate
|
|||
|
|
mlx.distributed_config \
|
|||
|
|
--hosts mac-studio-01,mac-studio-02,mac-studio-03 \
|
|||
|
|
--over thunderbolt \
|
|||
|
|
--auto-setup \
|
|||
|
|
--output cluster_hosts.json
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
手动配置示例(`cluster_hosts.json`):
|
|||
|
|
|
|||
|
|
json
|
|||
|
|
|
|||
|
|
```json
|
|||
|
|
[
|
|||
|
|
{"ssh": "mac-studio-01", "ips": ["192.168.2.101"]},
|
|||
|
|
{"ssh": "mac-studio-02", "ips": ["192.168.2.102"]}
|
|||
|
|
]
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
2. **运行分布式任务**
|
|||
|
|
|
|||
|
|
- 推理(以 mlx-lm 为例,张量并行):
|
|||
|
|
|
|||
|
|
bash
|
|||
|
|
|
|||
|
|
运行
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
mlx.launch \
|
|||
|
|
--hostfile cluster_hosts.json \
|
|||
|
|
--backend mpi \
|
|||
|
|
~/mlx-cluster-venv/bin/python -m mlx_lm.generate \
|
|||
|
|
--model mlx-community/DeepSeek-V2.5-1210-3bit \
|
|||
|
|
--tensor-parallel 2 \
|
|||
|
|
--prompt "Hello, cluster!"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
- 训练(数据并行,需修改脚本支持分布式):
|
|||
|
|
|
|||
|
|
bash
|
|||
|
|
|
|||
|
|
运行
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
mlx.launch \
|
|||
|
|
--hostfile cluster_hosts.json \
|
|||
|
|
--backend ring \
|
|||
|
|
train_script.py
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
|
|||
|
|
#### 方式 B:Exo(可视化,一键集群)
|
|||
|
|
|
|||
|
|
适合快速上手,自动做拓扑感知与 RDMA 优化:
|
|||
|
|
|
|||
|
|
1. 所有节点安装:
|
|||
|
|
|
|||
|
|
bash
|
|||
|
|
|
|||
|
|
运行
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
brew install uv node macmon
|
|||
|
|
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
|
|||
|
|
rustup toolchain install nightly
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
2. 克隆并启动:
|
|||
|
|
|
|||
|
|
bash
|
|||
|
|
|
|||
|
|
运行
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
git clone https://github.com/exo-explore/exo && cd exo
|
|||
|
|
cd dashboard && npm install && npm run build && cd ..
|
|||
|
|
uv run exo
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
3. 访问 `http://localhost:52415`,选择 MLX 模型与并行策略,一键启动。
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### 三、关键配置与优化
|
|||
|
|
|
|||
|
|
表格
|
|||
|
|
|
|||
|
|
|场景|推荐后端|网络|并行策略|说明|
|
|||
|
|
|---|---|---|---|---|
|
|||
|
|
|大模型推理|MPI|Thunderbolt 5 + RDMA|张量并行|低延迟,分片模型权重|
|
|||
|
|
|小批量训练|Ring|万兆以太网|数据并行|梯度聚合快,易部署|
|
|||
|
|
|超大规模模型|MPI|Thunderbolt 5 交换机|流水线 + 张量并行|结合 mlx-lm 的 pipeline_generate.py|
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### 四、验证与排错
|
|||
|
|
|
|||
|
|
1. **验证集群**
|
|||
|
|
|
|||
|
|
bash
|
|||
|
|
|
|||
|
|
运行
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# 主节点执行,测试所有节点环境
|
|||
|
|
mlx.launch --hostfile cluster_hosts.json --backend mpi python -c "import mlx.core as mx; print(f'Rank {mx.distributed.rank()}/{mx.distributed.size()}')"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
2. **常见问题**
|
|||
|
|
- SSH 连接失败:检查密钥、`/etc/hosts`、防火墙
|
|||
|
|
- RDMA 未启用:重新进入恢复模式执行 `rdma_ctl enable`
|
|||
|
|
- 内存不足:降低模型量化精度(4bit)、增大 `iogpu.wired_limit_mb`、使用流水线并行
|
|||
|
|
- 性能差:确认使用 Thunderbolt 5 端口,后端选 MPI,关闭节点间其他网络负载
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### 五、生产化建议
|
|||
|
|
|
|||
|
|
1. 用 LaunchAgent 自动启动 Exo 或训练脚本
|
|||
|
|
2. 共享模型与数据目录(NFS 或 SMB),避免重复下载
|
|||
|
|
3. 监控:`top -l 1 -o cpu`、`nvtop`(Apple Silicon 版)、Exo 仪表盘
|
|||
|
|
4. 备份:定期同步环境配置与训练脚本
|
|||
|
|
|
|||
|
|
需要我提供一个可直接运行的分布式训练脚本模板(含梯度 all-reduce 和数据分片)吗?
|