Mysql 特定场景下批量选择时 between 的性能要好于 limit offset

作为技术人员，一个良好的技术环境是非常重要的，一个人闭门造车往往容易误入歧途。比如在之前的业务中，需要使用Logstash从MySQL中读取数据（ #8 ）并导入到ElasticSearch中，由于数据量非常大（7千万），没法一次全部读取到内存中，所以就只能使用limit一次一点的去读取（一次一万），使用的是大约如下的SQL

```sql
sql> select * from table_name limit 1000000,2;
```

然后经过简单处理之后导入到MySQL中，程序刚启动时，读取一次非常快，后面就渐渐变慢了，

```sql
[2018-12-27 11:43:56] 2 rows retrieved starting from 1 in 7m 25s 157ms (execution: 7m 25s 99ms, fetching: 58ms)
```

不过因为这个整体的导入是一个非常低频的操作，所以也就没有深究。

后来因为人员变动，我的工作要交接给另一个大牛，他发现了这个问题，然后把我的limit换成了between，产生了奇效！

```sql
sql> select * from table_name where id BETWEEN 1457581 and 1457582
[2018-12-27 11:44:03] 2 rows retrieved starting from 1 in 92ms (execution: 88ms, fetching: 4ms)
```
不得不说虽然基本上实现了同样的功能，但是他们之间的差距真的是非常的大，原来limit并不是我想象的那种根据索引直接查询，而且要全表走一遍。以下是两个sql语句的详细分析

```sql
mysql> explain select * from table_name limit 1000,1;
+----+-------------+------------+------------+------+---------------+------+---------+------+-----------+-------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows      | Extra |
+----+-------------+------------+------------+------+---------------+------+---------+------+-----------+-------+
|  1 | SIMPLE      | table_name | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  23730731 | NULL  |
+----+-------------+------------+------------+------+---------------+------+---------+------+-----------+-------+
1 row in set, 1 warning (0.01 sec)

mysql> explain select * from table_name where id between 1000 and 1010;
+----+-------------+------------+------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+------------+------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | table_name | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |    1 | Using where |
+----+-------------+------------+------------+-------+---------------+---------+---------+------+------+-------------+
1 row in set, 1 warning (0.00 sec)
```
对比上面的expalin的输出结果，我们可以看到不同的地方主要是`type`和`rows`。下面就结合[官方文档](https://dev.mysql.com/doc/refman/8.0/en/explain-output.html)来看看不一样的地方到底有啥差别：

 ##### type
> The type column of EXPLAIN output describes how tables are joined.
*range*
Only rows that are in a given range are retrieved, using an index to select the rows. The key column in the output row indicates which index is used. The key_len contains the longest key part that was used.
*ALL*
A full table scan is done for each combination of rows from the previous tables. 

##### rows
> The rows column indicates the number of rows MySQL believes it must examine to execute the query.
For InnoDB tables, this number is an estimate, and may not always be exact.

从上面的解释可以看出来，当`type`是range的时候，只有符合条件的数据才会被拉取，而且还利用到了索引，但是如果是ALL，需要做一个全表扫描。所以就慢了非常非常多。而且`rows`也给出了大约的需要访问的数据条数，这也是一个重要的参考指标。

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Mysql 特定场景下批量选择时 between 的性能要好于 limit offset #39

type

rows

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Mysql 特定场景下批量选择时 between 的性能要好于 limit offset #39

Description

type

rows

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions