fix: offline load data & show table status (#3349)
* fix: offline load data & show table status

* fix

* fix ut

* fix

* fix
vagetablechicken authored Jul 12, 2023
1 parent 0b974ba commit 0902a74
Showing 10 changed files with 181 additions and 118 deletions.
36 changes: 20 additions & 16 deletions docs/en/reference/sql/ddl/SHOW_TABLE_STATUS.md
@@ -13,22 +13,26 @@ For example, `'%'` means all databases, including the hidden ones.

 ## Output Information

-| Column | Note |
-| ----------------- | ---- |
-| Table_id | It shows the unique id of the table. |
-| Table_name | It shows the name of the table. |
-| Database_name | It shows the name of the database that the table belongs to. |
-| Storage_type | It shows the storage type of the table. There are three possible values: `memory`, `ssd` and `hdd`. |
-| Rows | It shows the number of rows in this table. |
-| Memory_data_size | It shows the memory usage of the table in bytes. |
-| Disk_data_size | It shows the disk usage of the table in bytes. |
-| Partition | It shows the number of partitions of the table. |
-| Partition_unalive | It shows the number of unalive partitions of the table. |
-| Replica | It shows the number of replicas of the table. |
-| Offline_path | It shows the path of the offline data for this table and is valid only for offline tables. The `NULL` value means the path is not set. |
-| Offline_format | It shows the offline data format of the table and is valid only for offline tables. The `NULL` value means it is not set. |
-| Offline_deep_copy | It indicates whether deep copy is used on the table and is valid only for offline tables. The `NULL` value means it is not set. |
-| Warnings | Warnings related to the table, including the following four types:<br/>1) `leader/follower mode inconsistent`: the leader/follower information from the nameserver is inconsistent with that in the tablet<br/>2) `state is kNotFound/kTableUndefined/kTableLoading`: the partition is unavailable; `kNotFound` means the partition does not exist, `kTableUndefined` means the partition was not loaded successfully, and `kTableLoading` means the partition is being loaded<br/>3) `real replica number xx does not match the configured replicanum xx`: the number of replicas != `replicanum`<br/>4) `not connected to leader`: the follower is not connected to the leader, which usually occurs together with 3) |
+| Column | Note |
+| ---------------------- | ---- |
+| Table_id | It shows the unique id of the table. |
+| Table_name | It shows the name of the table. |
+| Database_name | It shows the name of the database that the table belongs to. |
+| Storage_type | It shows the storage type of the table. There are three possible values: `memory`, `ssd` and `hdd`. |
+| Rows | It shows the number of rows in this table. |
+| Memory_data_size | It shows the memory usage of the table in bytes. |
+| Disk_data_size | It shows the disk usage of the table in bytes. |
+| Partition | It shows the number of partitions of the table. |
+| Partition_unalive | It shows the number of unalive partitions of the table. |
+| Replica | It shows the number of replicas of the table. |
+| Offline_path | It shows the path of the offline data for this table and is valid only for offline tables. The `NULL` value means the path is not set. |
+| Offline_format | It shows the offline data format of the table and is valid only for offline tables. The `NULL` value means it is not set. |
+| Offline_symbolic_paths | It shows the paths of data loaded with `deep_copy=false`. The `NULL` value means it is not set. |
+| Warnings | Warnings related to the table, including the following four types:<br/>1) `leader/follower mode inconsistent`: the leader/follower information from the nameserver is inconsistent with that in the tablet<br/>2) `state is kNotFound/kTableUndefined/kTableLoading`: the partition is unavailable; `kNotFound` means the partition does not exist, `kTableUndefined` means the partition was not loaded successfully, and `kTableLoading` means the partition is being loaded<br/>3) `real replica number xx does not match the configured replicanum xx`: the number of replicas != `replicanum`<br/>4) `not connected to leader`: the follower is not connected to the leader, which usually occurs together with 3) |
+
+```{note}
+In versions <= 0.8.1, `Offline_symbolic_paths` does not exist; `Offline_deep_copy` occupies that position instead.
+```

 ## Example

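The version note in the hunk above means that code reading `SHOW TABLE STATUS` output by column name may need to pick the name according to the server version. A minimal Python sketch of that rule follows, assuming versions are plain dotted numeric strings such as `0.8.1`. The helpers `parse_version` and `offline_column_name` are hypothetical names for illustration and are not part of OpenMLDB.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '0.8.1' into (0, 8, 1)."""
    return tuple(int(part) for part in version.split("."))


def offline_column_name(version: str) -> str:
    """Return the column expected in this position for a given version.

    Per the note: up to and including 0.8.1 the position holds
    `Offline_deep_copy`; later versions hold `Offline_symbolic_paths`.
    """
    if parse_version(version) <= (0, 8, 1):
        return "Offline_deep_copy"
    return "Offline_symbolic_paths"


# Hypothetical usage: choose the column name before reading a result row.
print(offline_column_name("0.8.1"))
print(offline_column_name("0.8.2"))
```

Comparing integer tuples rather than raw strings keeps a version such as `0.10.0` ordered after `0.8.1`, which a plain string comparison would get wrong.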
2 changes: 1 addition & 1 deletion docs/zh/deploy/conf.md
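@@ -301,7 +301,7 @@ In local mode the Spark job runs locally (on the TaskManager host); this
 "yarn" and "yarn-cluster" are the same mode: the Spark job runs on the Yarn cluster. This mode requires more configuration, mainly:
 - **Before starting TaskManager**, set the environment variable `HADOOP_CONF_DIR` to the directory that holds the Hadoop and Yarn configuration files. The directory should contain Hadoop's `core-site.xml` and `hdfs-site.xml`, Yarn's `yarn-site.xml`, and similar files; see the [official Spark documentation](https://spark.apache.org/docs/3.2.1/running-on-yarn.html#launching-spark-on-yarn).
 - `spark.yarn.jars` sets the location of the Spark runtime jars that Yarn reads; it must be an `hdfs://` address. You can upload the `jars` directory from the unpacked [OpenMLDB Spark distribution](../../tutorial/openmldbspark_distribution.md) to HDFS and set this parameter to `hdfs://<hdfs_path>/jars/*` (note the wildcard). [If this parameter is not set, Yarn packages, uploads, and distributes `$SPARK_HOME/jars`, and does so for every offline job](https://spark.apache.org/docs/3.2.1/running-on-yarn.html#preparations), which is inefficient, so setting it is recommended.
-- `batchjob.jar.path` must be an HDFS path. Upload the batchjob jar to HDFS and set this parameter to its address so that every Worker in the Yarn cluster can fetch the batchjob jar.
+- `batchjob.jar.path` must be an HDFS path (down to the jar file name). Upload the batchjob jar to HDFS and set this parameter to its address so that every Worker in the Yarn cluster can fetch the batchjob jar.
 - `offline.data.prefix` must be an HDFS path so that every Worker in the Yarn cluster can read and write data. Use the address of the Hadoop cluster configured through the `HADOOP_CONF_DIR` environment variable above.

 ##### yarn-client mode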
4 changes: 2 additions & 2 deletions docs/zh/openmldb_sql/ddl/SET_STATEMENT.md
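@@ -148,7 +148,7 @@ CREATE TABLE t1 (col0 STRING, col1 int, std_time TIMESTAMP, INDEX(KEY=col1, TS=s

 ### Offline command configuration details

-- Run offline commands synchronously; the sync timeout is set automatically
+- Run offline commands synchronously; the sync timeout is set automatically to the gflag `sync_job_timeout`, 30 min by default

 ```sql
 > SET @@sync_job = "true";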
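@@ -160,7 +160,7 @@ CREATE TABLE t1 (col0 STRING, col1 int, std_time TIMESTAMP, INDEX(KEY=col1, TS=s
 - Configure the client's `--sync_job_timeout`; it must not exceed `server.channel_keep_alive_time`. The SDK does not support changing it yet.
 ```

-- Set the wait time for asynchronous offline commands or offline management commands (in milliseconds):
+- Offline commands that run asynchronously also have a timeout, which can be set manually. Set the wait time for asynchronous offline commands or offline management commands (in milliseconds):
 ```sql
 > SET @@job_timeout = "600000";
 ```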
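The two timeout rules in the hunks above can be sketched in Python: the sync timeout must not exceed `server.channel_keep_alive_time`, and `job_timeout` is given in milliseconds. This is an illustration only. The helpers `check_sync_job_timeout` and `job_timeout_statement` are hypothetical, and the sketch assumes both timeout values are passed in the same unit, since the source does not state the unit of the gflag.

```python
def check_sync_job_timeout(sync_job_timeout: int, channel_keep_alive_time: int) -> None:
    """Raise ValueError if the sync timeout exceeds the keep-alive time.

    Both arguments are assumed to be in the same unit.
    """
    if sync_job_timeout > channel_keep_alive_time:
        raise ValueError(
            "sync_job_timeout must not exceed server.channel_keep_alive_time"
        )


def job_timeout_statement(minutes: float) -> str:
    """Build the SET statement for an asynchronous job timeout given in minutes."""
    millis = int(minutes * 60 * 1000)  # job_timeout is expressed in milliseconds
    return f'SET @@job_timeout = "{millis}";'


# Ten minutes corresponds to the 600000 ms used in the statement above.
print(job_timeout_statement(10))  # SET @@job_timeout = "600000";
```

Building the statement from minutes keeps the unit conversion in one place, which may help avoid passing seconds where milliseconds are expected.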