fix: offline load data & show table status (#3349)

* fix: offline load data & show table status
* fix
* fix ut
* fix
* fix
4paradigm · Jul 12, 2023 · 0902a74
1 parent 0b974ba
commit 0902a74

Showing 10 changed files with 181 additions and 118 deletions.
diff --git a/docs/en/reference/sql/ddl/SHOW_TABLE_STATUS.md b/docs/en/reference/sql/ddl/SHOW_TABLE_STATUS.md
@@ -13,22 +13,26 @@ For example, `'%'` means all databases, including the hidden ones.
 
 ## Output Information
 
-| Column            | Note                                                                                                                                   |
-| ----------------- |----------------------------------------------------------------------------------------------------------------------------------------|
-| Table_id          | It shows the unique id of the table.                                                                                                   |
-| Table_name        | It shows the name of the table.                                                                                                        |
-| Database_name     | It shows the name of the database, which the table belongs to.                                                                         |
-| Storage_type      | It shows the storage type of the table. There are three types of value: `memory`,`ssd` and `hdd`.                                      |
-| Rows              | It shows the number of rows in this table.                                                                                             |
-| Memory_data_size  | It shows the memory usage of the table in bytes.                                                                                       |
-| Disk_data_size    | It shows the disk usage of the table in bytes.                                                                                         |
-| Partition         | It shows the number of partitions of the table.                                                                                        |
-| Partition_unalive | It shows the number of the unalive partitions of the table.                                                                            |
-| Replica           | It shows the number of replicas of the table.                                                                                              |
-| Offline_path      | It shows the path of the offline data for this table and is valid only for offline tables. The `NULL` value means the path is not set. |
-| Offline_format    | It shows the offline data format of the table and is valid only for offline tables. The `NULL` value means it is not set.              |
-| Offline_deep_copy | It indicates whether deep copy is used on the table and is valid only for offline tables. The `NULL` value means it is not set.        |
-| Warnings          | Warnings related to the table, including the following four types：<br/>1) `leader/follower mode inconsistent`: the leader/follower information from nameserver is not consistent with those in tablet<br/>2) `state is kNotFound/kTableUndefined/kTableLoading`：the partition is unavailable, `kNotFound` means the partition does not exist; `kTableUndefined` means the partition is not loaded successfully; `kTableLoading` means the partition is being loaded<br/>3) `real replica number xx does not match the configured replicanum xx`：the number of replicas != `replicanum`<br/>4) `not connected to leader`：follower is not connected to the leader, which usually occurs together with 3) |
+| Column                 | Note                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
+| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Table_id               | It shows the unique id of the table.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
+| Table_name             | It shows the name of the table.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
+| Database_name          | It shows the name of the database, which the table belongs to.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+| Storage_type           | It shows the storage type of the table. There are three types of value: `memory`,`ssd` and `hdd`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+| Rows                   | It shows the number of rows in this table.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+| Memory_data_size       | It shows the memory usage of the table in bytes.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+| Disk_data_size         | It shows the disk usage of the table in bytes.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+| Partition              | It shows the number of partitions of the table. |
+| Partition_unalive      | It shows the number of the unalive partitions of the table.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+| Replica                | It shows the number of replicas of the table.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+| Offline_path           | It shows the path of the offline data for this table and is valid only for offline tables. The `NULL` value means the path is not set.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+| Offline_format         | It shows the offline data format of the table and is valid only for offline tables. The `NULL` value means it is not set.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
+| Offline_symbolic_paths | It shows the paths from which data was loaded with `deep_copy=false`. The `NULL` value means it is not set. |
+| Warnings               | Warnings related to the table, including the following four types：<br/>1) `leader/follower mode inconsistent`: the leader/follower information from nameserver is not consistent with those in tablet<br/>2) `state is kNotFound/kTableUndefined/kTableLoading`：the partition is unavailable, `kNotFound` means the partition does not exist; `kTableUndefined` means the partition is not loaded successfully; `kTableLoading` means the partition is being loaded<br/>3) `real replica number xx does not match the configured replicanum xx`：the number of replicas != `replicanum`<br/>4) `not connected to leader`：follower is not connected to the leader, which usually occurs together with 3) |
+
+```{note}
+In versions <= 0.8.1, `Offline_symbolic_paths` does not exist; `Offline_deep_copy` occupies that column position instead.
+```
 
 ## Example
 

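The note above means that a client reading `SHOW TABLE STATUS` rows by position needs to know which server version produced them. A minimal Python sketch of that decision follows; the helper names `parse_version` and `offline_column_name` are hypothetical and not part of OpenMLDB, and the version is assumed to be a plain dotted numeric string.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '0.8.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def offline_column_name(server_version: str) -> str:
    """Name of the column that sits between Offline_format and Warnings.

    Per the note above: versions <= 0.8.1 report Offline_deep_copy in that
    position, later versions report Offline_symbolic_paths.
    """
    if parse_version(server_version) <= (0, 8, 1):
        return "Offline_deep_copy"
    return "Offline_symbolic_paths"
```

Callers that index columns by name rather than by position are unaffected by the rename of the slot, but still need to handle whichever of the two names is present.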
diff --git a/docs/zh/deploy/conf.md b/docs/zh/deploy/conf.md
@@ -301,7 +301,7 @@ local模式即Spark任务运行在本地（TaskManager所在主机），该模
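 "yarn" and "yarn-cluster" are the same mode: the Spark job runs on a Yarn cluster. This mode requires more configuration, mainly:
 - **Before starting TaskManager**, set the environment variable `HADOOP_CONF_DIR` to the directory that holds the Hadoop and Yarn configuration files. The directory should contain Hadoop's `core-site.xml` and `hdfs-site.xml`, Yarn's `yarn-site.xml`, and similar files; see the [official Spark documentation](https://spark.apache.org/docs/3.2.1/running-on-yarn.html#launching-spark-on-yarn).
 - `spark.yarn.jars` sets the location of the Spark runtime jars that Yarn reads; it must be an `hdfs://` address. You can upload the `jars` directory from the unpacked [OpenMLDB Spark distribution](../../tutorial/openmldbspark_distribution.md) to HDFS and set the value to `hdfs://<hdfs_path>/jars/*` (note the wildcard). [If this parameter is not set, Yarn packages, uploads, and distributes `$SPARK_HOME/jars`, and repeats this for every offline job](https://spark.apache.org/docs/3.2.1/running-on-yarn.html#preparations), which is inefficient, so setting it is recommended.
-- `batchjob.jar.path` must be an HDFS path. Upload the batchjob jar to HDFS and set this option to its address, so that every Worker in the Yarn cluster can fetch the batchjob jar.
+- `batchjob.jar.path` must be an HDFS path (down to the jar file name). Upload the batchjob jar to HDFS and set this option to its address, so that every Worker in the Yarn cluster can fetch the batchjob jar.
 - `offline.data.prefix` must be an HDFS path, so that every Worker in the Yarn cluster can read and write data. Use the Hadoop cluster address from the `HADOOP_CONF_DIR` environment variable configured above.
 
 ##### yarn-client mode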

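The yarn-mode requirements above can be checked before TaskManager is started. The following Python sketch is illustrative only: `check_yarn_conf` is a hypothetical helper, and it assumes that an "HDFS path" begins with `hdfs://` and that the batchjob jar file name ends in `.jar`. Neither assumption is confirmed by the documentation beyond `spark.yarn.jars` requiring an `hdfs://` address.

```python
def check_yarn_conf(conf: dict) -> list:
    """Return a list of problems found in a yarn-mode config; empty means none."""
    problems = []

    # spark.yarn.jars is optional but, when set, must be an hdfs:// address.
    jars = conf.get("spark.yarn.jars")
    if jars is not None and not jars.startswith("hdfs://"):
        problems.append("spark.yarn.jars must be an hdfs:// address")

    # Assumption: an HDFS path starts with hdfs:// and the jar name ends in .jar.
    jar_path = conf.get("batchjob.jar.path", "")
    if not jar_path.startswith("hdfs://"):
        problems.append("batchjob.jar.path must be an HDFS path")
    elif not jar_path.endswith(".jar"):
        problems.append("batchjob.jar.path must point at the jar file itself")

    prefix = conf.get("offline.data.prefix", "")
    if not prefix.startswith("hdfs://"):
        problems.append("offline.data.prefix must be an HDFS path")

    return problems
```

An empty result only means the sketch found nothing wrong; it does not verify that the paths exist or that the Workers can reach them.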
diff --git a/docs/zh/openmldb_sql/ddl/SET_STATEMENT.md b/docs/zh/openmldb_sql/ddl/SET_STATEMENT.md
@@ -148,7 +148,7 @@ CREATE TABLE t1 (col0 STRING, col1 int, std_time TIMESTAMP, INDEX(KEY=col1, TS=s
 
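 ### Offline command configuration details
 
-- Run offline commands synchronously; the synchronous timeout is set automatically:
+- Run offline commands synchronously; the synchronous timeout is automatically set to the gflag `sync_job_timeout`, 30 min by default: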
 
 ```sql
 > SET @@sync_job = "true";
@@ -160,7 +160,7 @@ CREATE TABLE t1 (col0 STRING, col1 int, std_time TIMESTAMP, INDEX(KEY=col1, TS=s
 - Configure the client's `--sync_job_timeout`; it must not exceed `server.channel_keep_alive_time`. The SDK does not yet support changing it.
 ```
 
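-- Set the wait time (in milliseconds) for asynchronous offline commands or offline management commands:
+- Offline commands executed asynchronously also have a timeout, which can be configured manually. Set the wait time (in milliseconds) for asynchronous offline commands or offline management commands: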
 ```sql
 > SET @@job_timeout = "600000";
 ```
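Because `job_timeout` is given in milliseconds, the value `"600000"` in the statement above corresponds to 10 minutes. A small Python sketch of the conversion, with `job_timeout_statement` as a hypothetical helper that only builds the SQL text and does not talk to OpenMLDB:

```python
def job_timeout_statement(minutes: float) -> str:
    """Build the SET statement for an asynchronous job timeout given in minutes."""
    milliseconds = int(minutes * 60 * 1000)  # job_timeout is expressed in ms
    return f'SET @@job_timeout = "{milliseconds}";'
```

Calling `job_timeout_statement(10)` yields the statement shown in the diff.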