Skip to content

Mysql 特定场景下批量选择时 between 的性能要好于 limit offset #39

Open
@Shellbye

Description

@Shellbye

作为技术人员,一个良好的技术环境是非常重要的,一个人闭门造车往往容易误入歧途。比如在之前的业务中,需要使用Logstash从MySQL中读取数据( #8 )并导入到ElasticSearch中,由于数据量非常大(7千万),没法一次全部读取到内存中,所以就只能使用limit一次一点的去读取(一次一万),使用的是大约如下的SQL

sql> select * from table_name limit 1000000,2;

然后经过简单处理之后导入到MySQL中,程序刚启动时,读取一次非常快,后面就渐渐变慢了,

[2018-12-27 11:43:56] 2 rows retrieved starting from 1 in 7m 25s 157ms (execution: 7m 25s 99ms, fetching: 58ms)

不过因为这个整体的导入是一个非常低频的操作,所以也就没有深究。

后来因为人员变动,我的工作要交接给另一个大牛,他发现了这个问题,然后把我的limit换成了between,产生了奇效!

sql> select * from table_name where id BETWEEN 1457581 and 1457582
[2018-12-27 11:44:03] 2 rows retrieved starting from 1 in 92ms (execution: 88ms, fetching: 4ms)

不得不说虽然基本上实现了同样的功能,但是他们之间的差距真的是非常的大,原来limit并不是我想象的那种根据索引直接查询,而且要全表走一遍。以下是两个sql语句的详细分析

mysql> explain select * from table_name limit 1000,1;
+----+-------------+------------+------------+------+---------------+------+---------+------+-----------+-------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows      | Extra |
+----+-------------+------------+------------+------+---------------+------+---------+------+-----------+-------+
|  1 | SIMPLE      | table_name | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  23730731 | NULL  |
+----+-------------+------------+------------+------+---------------+------+---------+------+-----------+-------+
1 row in set, 1 warning (0.01 sec)

mysql> explain select * from table_name where id between 1000 and 1010;
+----+-------------+------------+------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+------------+------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | table_name | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |    1 | Using where |
+----+-------------+------------+------------+-------+---------------+---------+---------+------+------+-------------+
1 row in set, 1 warning (0.00 sec)

对比上面的expalin的输出结果,我们可以看到不同的地方主要是typerows。下面就结合官方文档来看看不一样的地方到底有啥差别:

type

The type column of EXPLAIN output describes how tables are joined.
range
Only rows that are in a given range are retrieved, using an index to select the rows. The key column in the output row indicates which index is used. The key_len contains the longest key part that was used.
ALL
A full table scan is done for each combination of rows from the previous tables.

rows

The rows column indicates the number of rows MySQL believes it must examine to execute the query.
For InnoDB tables, this number is an estimate, and may not always be exact.

从上面的解释可以看出来,当type是range的时候,只有符合条件的数据才会被拉取,而且还利用到了索引,但是如果是ALL,需要做一个全表扫描。所以就慢了非常非常多。而且rows也给出了大约的需要访问的数据条数,这也是一个重要的参考指标。

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions