How to Insert an Image After Specific Text in a Word Document with Apache POI

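This article walks through a complete way to use Apache POI (XWPF) to insert an image after every occurrence of a keyword (such as "word") in an MS Word document. It focuses on three core problems: invisible images, ConcurrentModificationException, and failed matches caused by text being split across runs.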

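When working with Word (.docx) documents through Apache POI, many developers iterate over the XWPFRun objects of a paragraph and call insertNewRun() to add an image, only to find that the image "does not show up" or that the program throws ConcurrentModificationException. The root causes are that Word's underlying text structure is highly fragmented, that the image size unit is easy to misread, and that inserting runs while iterating invalidates the iterator. The fixes are described below.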

✅ Core problems and fixes

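Wrong image size unit (the most common hidden bug)
The width and height parameters of XWPFRun.addPicture(..., width, height) are in EMU (English Metric Units), not pixels (px) or points (pt). Passing 200 directly means 200 EMU ≈ 0.000556 cm, far smaller than a single pixel (9,525 EMU), so the image is effectively invisible.
✅ Correct approach: use Units.pixelToEMU(200) if the size is in pixels, or Units.toEMU(200) if it is in points: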
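```
import java.io.FileInputStream;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
// ...
// imageRun is an XWPFRun; imgFile is a java.io.File pointing at a PNG
try (FileInputStream in = new FileInputStream(imgFile)) {
    imageRun.addPicture(
        in,
        XWPFDocument.PICTURE_TYPE_PNG,
        imgFile.getName(),        // third argument is a String file name, not a File
        Units.pixelToEMU(200),    // ✅ width converted to EMU
        Units.pixelToEMU(150)     // ✅ height converted the same way
    );
}
```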
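
To make the size arithmetic concrete, here is a minimal sketch that reproduces the conversions in plain Java. EmuSketch and its methods are hypothetical helpers written for illustration, not part of Apache POI; the constants assume 914,400 EMU per inch and a 96 DPI screen, which to my knowledge is what Units.pixelToEMU also assumes.

```java
// Illustrative only: plain-Java version of the EMU arithmetic.
// EmuSketch is a hypothetical helper, not an Apache POI class.
final class EmuSketch {
    static final int EMU_PER_INCH = 914_400;
    static final int EMU_PER_POINT = EMU_PER_INCH / 72; // 12,700 (72 points per inch)
    static final int EMU_PER_PIXEL = EMU_PER_INCH / 96; // 9,525 (assumes 96 DPI)
    static final int EMU_PER_CM = 360_000;

    // Pixels to EMU, assuming 96 DPI.
    static int pixelsToEmu(int pixels) {
        return pixels * EMU_PER_PIXEL;
    }

    // Points to EMU, rounded to the nearest whole EMU.
    static int pointsToEmu(double points) {
        return (int) Math.rint(points * EMU_PER_POINT);
    }

    // EMU to centimetres, useful for sanity-checking a size.
    static double emuToCm(long emu) {
        return (double) emu / EMU_PER_CM;
    }

    public static void main(String[] args) {
        // A raw 200 is a tiny fraction of a millimetre.
        System.out.println("200 EMU in cm: " + emuToCm(200));
        // 200 px converted properly is a little over 5 cm.
        System.out.println("200 px in EMU: " + pixelsToEmu(200));
        System.out.println("200 px in cm:  " + emuToCm(pixelsToEmu(200)));
    }
}
```

Checking a value with emuToCm before calling addPicture is a quick way to catch a size that was passed in the wrong unit.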

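Avoiding ConcurrentModificationException
If paragraph.insertNewRun() is called inside an enhanced for loop such as for (XWPFRun run : runs), a new element is added to the run list while it is being iterated, which invalidates the underlying Iterator.
✅ Safe alternative: iterate by index instead, and re-fetch the run list after insertNewRun() because the paragraph's internal structure has changed: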

```
List<XWPFRun> runs = paragraph.getRuns();
for (int r = 0; r < runs.size(); r++) {
    XWPFRun run = runs.get(r);
    String text = run.getText(0);
    if (text != null && text.contains("word")) {
        // ✅ Safe insert: the new run goes right after the current index
        XWPFRun imageRun = paragraph.insertNewRun(r + 1);
        // ... add the picture to imageRun
        // ⚠️ Re-fetch the list defensively after the structural change
        runs = paragraph.getRuns();
        r++; // skip the image run that was just inserted
    }
}
```
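
The same failure can be reproduced without POI. The sketch below uses an ordinary ArrayList of strings as a stand-in for a paragraph's runs; InsertWhileIterating and the "[image]" placeholder are made up for illustration. It shows that inserting inside an enhanced for loop throws, while an index loop that skips the inserted element does not.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Illustrative only: a plain list stands in for the runs of a paragraph.
final class InsertWhileIterating {

    // Inserts inside an enhanced for loop; returns true if the iterator failed.
    static boolean forEachInsertFails(List<String> items, String keyword) {
        try {
            for (String item : items) {
                if (item.contains(keyword)) {
                    // Structural change while the iterator is still in use.
                    items.add(items.indexOf(item) + 1, "[image]");
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Index-based loop: inserts after each match and steps over the new element.
    static List<String> insertAfterMatches(List<String> items, String keyword) {
        List<String> result = new ArrayList<>(items);
        for (int i = 0; i < result.size(); i++) {
            if (result.get(i).contains(keyword)) {
                result.add(i + 1, "[image]");
                i++; // skip the placeholder that was just inserted
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> runs = new ArrayList<>(List.of("a word", "b", "word"));
        System.out.println(forEachInsertFails(new ArrayList<>(runs), "word"));
        System.out.println(insertAfterMatches(runs, "word"));
    }
}
```

Stepping the index past the inserted element mirrors the r++ in the POI loop: without it, the loop would inspect the freshly inserted run on the next pass.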

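Matching keywords that are split across runs (the Word text fragmentation trap)
Word may split a single word such as "word" across several XWPFRun objects (for example "wo" and "rd"). In that case no single run contains the whole keyword, and run.getText(0).contains("word") is false for every run.
✅ Recommended approach: use XWPFParagraph.searchText() together with TextSegment, which searches the paragraph text across run boundaries: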

```
PositionInParagraph startPos = new PositionInParagraph(0, 0, 0);
TextSegment segment;
while ((segment = paragraph.searchText("word", startPos)) != null) {
    // Index of the run in which the match ends
    int endRunIndex = segment.getEndRun();

    // Insert a new run after that run, i.e. after the keyword.
    // If the run has more text after the keyword, the picture follows that text too.
    int imageRunIndex = endRunIndex + 1;
    XWPFRun imageRun = paragraph.insertNewRun(imageRunIndex);
    // ... add the picture to imageRun

    // Resume the search at the inserted picture run so the same match is not found again
    startPos = new PositionInParagraph(imageRunIndex, 0, 0);
}
```
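
To see why a cross-run search finds what a per-run contains() check misses, the following sketch models runs as plain strings. It is not how searchText() is implemented in POI; CrossRunSearch and findAcrossRuns are hypothetical names, and the approach (concatenate the run texts, remember which run each character came from) is just one simple way to get start and end run indexes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: runs are modelled as plain strings, not XWPFRun objects.
final class CrossRunSearch {

    // Returns {startRun, endRun} for the first occurrence of keyword,
    // or null if the keyword is empty or does not occur.
    static int[] findAcrossRuns(List<String> runTexts, String keyword) {
        if (keyword.isEmpty()) {
            return null;
        }
        StringBuilder joined = new StringBuilder();
        List<Integer> runOfChar = new ArrayList<>(); // run index for each character
        for (int r = 0; r < runTexts.size(); r++) {
            String text = runTexts.get(r);
            joined.append(text);
            for (int c = 0; c < text.length(); c++) {
                runOfChar.add(r);
            }
        }
        int at = joined.indexOf(keyword);
        if (at < 0) {
            return null;
        }
        int lastChar = at + keyword.length() - 1;
        return new int[] { runOfChar.get(at), runOfChar.get(lastChar) };
    }

    public static void main(String[] args) {
        List<String> runs = List.of("A wo", "rd", " here");
        // No single run contains the keyword...
        System.out.println(runs.stream().anyMatch(t -> t.contains("word")));
        // ...but the cross-run search reports where it starts and ends.
        int[] hit = findAcrossRuns(runs, "word");
        System.out.println(hit[0] + " to " + hit[1]);
    }
}
```

The end-run index plays the same role as segment.getEndRun() in the POI snippet: it tells the caller after which run the picture run should be inserted.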

✅ Complete runnable example (recommended)

```
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.List;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.*;

public class ImageWord {
    public static void main(String[] args) throws Exception {
        // ✅ Input must be .docx (not .doc); prefer relative paths or Path.of()
        try (FileInputStream in = new FileInputStream("./image.docx");
             XWPFDocument doc = new XWPFDocument(in)) {
            List<XWPFParagraph> paragraphs = doc.getParagraphs();
            File[] images = new File("./images/").listFiles((dir, name) ->
                name.toLowerCase().endsWith(".png") ||
                name.toLowerCase().endsWith(".jpg"));

            if (images == null || images.length == 0) {
                throw new RuntimeException("No images found in ./images/");
            }

            int imageCounter = 0;
            String keyword = "word";

            for (XWPFParagraph paragraph : paragraphs) {
                PositionInParagraph pos = new PositionInParagraph(0, 0, 0);
                TextSegment segment;

                while ((segment = paragraph.searchText(keyword, pos)) != null) {
                    // The picture run goes right after the run in which the match ends
                    int insertIndex = segment.getEndRun() + 1;
                    XWPFRun imageRun = paragraph.insertNewRun(insertIndex);
                    File imgFile = images[imageCounter % images.length];

                    // Pick the picture type from the extension (.png or .jpg)
                    boolean isPng = imgFile.getName().toLowerCase().endsWith(".png");
                    int pictureType = isPng
                        ? XWPFDocument.PICTURE_TYPE_PNG
                        : XWPFDocument.PICTURE_TYPE_JPEG;

                    try (FileInputStream fis = new FileInputStream(imgFile)) {
                        imageRun.addPicture(
                            fis,
                            pictureType,
                            imgFile.getName(),
                            Units.pixelToEMU(200),
                            Units.pixelToEMU(150)
                        );
                    }
                    imageRun.addBreak(); // optional: line break after the picture

                    // Resume at the picture run to avoid re-matching the same text.
                    // A further match inside the same end run is skipped by this.
                    pos = new PositionInParagraph(insertIndex, 0, 0);
                    imageCounter++;
                }
            }

            // ✅ try-with-resources closes the output stream
            try (FileOutputStream out = new FileOutputStream("./images_output.docx")) {
                doc.write(out);
            }
        }
    }
}
```

⚠️ Notes and best practices

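File format: make sure the input is .docx (OOXML). The XWPF classes used here do not read the legacy binary .doc format, which POI handles through a separate API (HWPF).
Dependency version: the original recommendation is Apache POI 5.2.4 or later. searchText and TextSegment are also present in older XWPF releases, but a recent version brings bug fixes.
Picture name: pass the file name (for example imgFile.getName()) as the third argument of addPicture rather than an absolute path, so the document stays portable.
Error handling: always wrap FileInputStream in try-with-resources to avoid leaking file handles, and check the images array for null to avoid a NullPointerException.
Performance: image bytes are embedded in the document as they are, so for very large documents or many pictures, downscale or compress the image files before inserting them.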

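With these three safeguards (EMU unit conversion, safe index-based iteration, and cross-run search with TextSegment), images can be inserted reliably and precisely after keywords in Word documents, which is what automated document generation needs in production.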
