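Deduplicating data in MySQL means picking a method that fits the scenario: use GROUP BY or DISTINCT to find duplicates; use ROW_NUMBER() (8.0+) or a self-join to delete them; and to prevent them, add a unique index combined with INSERT IGNORE or ON DUPLICATE KEY UPDATE.
There is no one-click cleanup for duplicates in MySQL. The right method depends on the goal: checking for duplicates ad hoc, keeping one row per group, deleting duplicates outright, or stopping duplicate writes in the first place. Two ideas cover most cases: use GROUP BY or DISTINCT to get the unique values, and use ROW_NUMBER() (8.0+) or a self-join/subquery to delete the redundant rows.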
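First confirm which column combinations are duplicated, then decide how to handle them. Common queries:
Count how often each duplicated (name, email) pair occurs:
SELECT name, email, COUNT(*) FROM users GROUP BY name, email HAVING COUNT(*) > 1;
List the full rows behind those duplicates:
SELECT * FROM users WHERE (name, email) IN (SELECT name, email FROM users GROUP BY name, email HAVING COUNT(*) > 1);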
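The two lookup queries can be tried without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere; both queries use syntax the two engines share). The sample rows and the helper names sample_users_db, find_duplicates and duplicate_rows are hypothetical.

```python
import sqlite3


def sample_users_db():
    """Build an in-memory users table with some repeated (name, email) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
        [
            (1, "Ann", "a@x.com"),
            (2, "Ann", "a@x.com"),
            (3, "Bob", "b@x.com"),
            (4, "Ann", "a@x.com"),
            (5, "Cid", "c@x.com"),
            (6, "Bob", "b@x.com"),
        ],
    )
    return conn


def find_duplicates(conn):
    """Return (name, email, count) for every pair that occurs more than once."""
    return conn.execute(
        "SELECT name, email, COUNT(*) AS cnt FROM users "
        "GROUP BY name, email HAVING COUNT(*) > 1 "
        "ORDER BY name, email"
    ).fetchall()


def duplicate_rows(conn):
    """Return the full rows whose (name, email) pair is duplicated."""
    return conn.execute(
        "SELECT id, name, email FROM users WHERE (name, email) IN ("
        " SELECT name, email FROM users GROUP BY name, email HAVING COUNT(*) > 1"
        ") ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    db = sample_users_db()
    print(find_duplicates(db))
    print(duplicate_rows(db))
```

The first helper reports which pairs are duplicated and how often; the second returns the individual rows, which is what is needed to decide which id to keep.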
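To remove duplicates that already exist while keeping one row per group (cleaning up historical data), the window-function form (MySQL 8.0+) is the safer and clearer choice; a self-join works on older versions.
Self-join, where the comparison on id decides which row survives (with t1.id > t2.id, every row that has a same-key row with a smaller id is deleted, so the smallest id is kept):
DELETE t1 FROM users t1 INNER JOIN users t2 ON t1.name = t2.name AND t1.email = t2.email WHERE t1.id > t2.id;
Window function, keeping the row with the largest id in each (name, email) group:
DELETE FROM users WHERE id IN (SELECT id FROM (SELECT id, ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id DESC) AS rn FROM users) t WHERE rn > 1);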
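Before running a destructive DELETE, it helps to preview which ids the ROW_NUMBER() ranking marks as redundant. The sketch below does that and then deletes inside a transaction. It again assumes SQLite (3.25 or newer, for window functions) via Python's sqlite3 as a stand-in for MySQL 8.0+; the sample data and the names build_dup_db, preview_redundant_ids and delete_redundant are hypothetical.

```python
import sqlite3

# Rows ranked 2 or later inside a (name, email) group are redundant;
# ORDER BY id DESC gives rank 1 to the largest id, so that row is kept.
REDUNDANT_IDS = (
    "SELECT id FROM ("
    " SELECT id, ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id DESC) AS rn"
    " FROM users"
    ") AS t WHERE rn > 1"
)


def build_dup_db():
    """Build an in-memory users table containing duplicated pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
        [
            (1, "Ann", "a@x.com"),
            (2, "Ann", "a@x.com"),
            (3, "Bob", "b@x.com"),
            (4, "Ann", "a@x.com"),
            (5, "Cid", "c@x.com"),
            (6, "Bob", "b@x.com"),
        ],
    )
    return conn


def preview_redundant_ids(conn):
    """Return the ids that the delete would remove, without changing anything."""
    return [row[0] for row in conn.execute(REDUNDANT_IDS + " ORDER BY id")]


def delete_redundant(conn):
    """Delete the redundant rows in one transaction and return how many went."""
    with conn:  # commits on success, rolls back if the statement raises
        cur = conn.execute("DELETE FROM users WHERE id IN (" + REDUNDANT_IDS + ")")
    return cur.rowcount


if __name__ == "__main__":
    db = build_dup_db()
    print("would delete:", preview_redundant_ids(db))
    print("deleted:", delete_redundant(db))
```

In MySQL the extra derived table (the alias t) is not decoration: a DELETE cannot select directly from its own target table in a subquery, and wrapping the ranking in a derived table avoids that restriction.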
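Deduplicating at query time suits read-only scenarios such as reports and API responses. It is simple and efficient, and the stored rows are left untouched:
Return each distinct (name, email) pair once:
SELECT DISTINCT name, email FROM users;
Return one row per pair together with its most recent creation time:
SELECT name, email, MAX(created_at) AS latest_time FROM users GROUP BY name, email;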
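The difference between the two read-only forms is easiest to see on a tiny table. This is a sketch under the same assumption as before (SQLite through Python's sqlite3 standing in for MySQL; both statements are plain SQL that either engine accepts). The rows and the names build_report_db, distinct_pairs and latest_per_pair are hypothetical.

```python
import sqlite3


def build_report_db():
    """Build an in-memory users table with a created_at column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY, name TEXT, email TEXT, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (id, name, email, created_at) VALUES (?, ?, ?, ?)",
        [
            (1, "Ann", "a@x.com", "2024-01-01 09:00:00"),
            (2, "Ann", "a@x.com", "2024-03-01 09:00:00"),
            (3, "Bob", "b@x.com", "2024-02-01 09:00:00"),
        ],
    )
    return conn


def distinct_pairs(conn):
    """DISTINCT collapses identical (name, email) pairs into one result row."""
    return conn.execute(
        "SELECT DISTINCT name, email FROM users ORDER BY name, email"
    ).fetchall()


def latest_per_pair(conn):
    """GROUP BY also collapses pairs, and lets an aggregate ride along."""
    return conn.execute(
        "SELECT name, email, MAX(created_at) AS latest_time FROM users "
        "GROUP BY name, email ORDER BY name, email"
    ).fetchall()


if __name__ == "__main__":
    db = build_report_db()
    print(distinct_pairs(db))
    print(latest_per_pair(db))
```

DISTINCT is enough when only the unique values are needed; GROUP BY is the form to reach for once a per-group aggregate such as the latest timestamp is wanted as well.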
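Preventing duplicate writes matters more than cleaning up afterwards. It comes down to constraints plus application logic:
Add a unique index so that a duplicate insert fails with an error (existing duplicates must be removed first, otherwise the ALTER itself fails):
ALTER TABLE users ADD UNIQUE INDEX uk_name_email (name, email);
Use INSERT IGNORE or ON DUPLICATE KEY UPDATE to handle the conflict instead of raising an error, for example:
INSERT INTO users (name, email) VALUES ('张三','z@x.com') ON DUPLICATE KEY UPDATE updated_at = NOW();
In application logic, check with SELECT first and then insert, while keeping the unique index on the table as the backstop, because two concurrent requests can both pass the check.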
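The three prevention options can be compared side by side in a small sketch. It assumes SQLite (3.24 or newer) through Python's sqlite3 as a stand-in, and SQLite spells the statements differently from MySQL: INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE, and ON CONFLICT ... DO UPDATE plays the role of ON DUPLICATE KEY UPDATE. The helper names and the explicit timestamp parameter are hypothetical.

```python
import sqlite3


def build_guarded_db():
    """Create a users table protected by a unique index on (name, email)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY, name TEXT, email TEXT, updated_at TEXT)"
    )
    conn.execute("CREATE UNIQUE INDEX uk_name_email ON users (name, email)")
    return conn


def insert_strict(conn, name, email):
    """Plain INSERT: returns False when the unique index rejects the row."""
    try:
        with conn:
            conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))
        return True
    except sqlite3.IntegrityError:
        return False


def insert_ignore(conn, name, email):
    """Silently skip the row if the pair already exists (MySQL: INSERT IGNORE)."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO users (name, email) VALUES (?, ?)", (name, email)
        )


def upsert(conn, name, email, now):
    """Insert, or refresh updated_at on conflict (MySQL: ON DUPLICATE KEY UPDATE)."""
    with conn:
        conn.execute(
            "INSERT INTO users (name, email, updated_at) VALUES (?, ?, ?) "
            "ON CONFLICT (name, email) DO UPDATE SET updated_at = excluded.updated_at",
            (name, email, now),
        )


def insert_if_absent(conn, name, email):
    """Check-then-insert; the unique index still catches a racing writer."""
    found = conn.execute(
        "SELECT 1 FROM users WHERE name = ? AND email = ?", (name, email)
    ).fetchone()
    if found:
        return False
    return insert_strict(conn, name, email)


if __name__ == "__main__":
    db = build_guarded_db()
    print(insert_strict(db, "张三", "z@x.com"))
    print(insert_strict(db, "张三", "z@x.com"))
    upsert(db, "张三", "z@x.com", "2024-05-01 12:00:00")
    print(db.execute("SELECT name, email, updated_at FROM users").fetchall())
```

The check in insert_if_absent only saves a round of error handling in the common case. The guarantee comes from the index, which is why insert_if_absent still routes the write through insert_strict.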