fix: offline load data & show table status #3349

Merged
merged 6 commits into from
Jul 12, 2023
36 changes: 20 additions & 16 deletions docs/en/reference/sql/ddl/SHOW_TABLE_STATUS.md
@@ -13,22 +13,26 @@ For example, `'%'` means all databases, including the hidden ones.

## Output Information

| Column | Note |
| ----------------- |----------------------------------------------------------------------------------------------------------------------------------------|
| Table_id | It shows the unique id of the table. |
| Table_name | It shows the name of the table. |
| Database_name | It shows the name of the database that the table belongs to. |
| Storage_type | It shows the storage type of the table. Possible values are `memory`, `ssd`, and `hdd`. |
| Rows | It shows the number of rows in this table. |
| Memory_data_size | It shows the memory usage of the table in bytes. |
| Disk_data_size | It shows the disk usage of the table in bytes. |
| Partition | It shows the number of partitions of the table. |
| Partition_unalive | It shows the number of the unalive partitions of the table. |
| Replica | It shows the number of replicas of the table. |
| Offline_path | It shows the path of the offline data for this table and is valid only for offline tables. The `NULL` value means the path is not set. |
| Offline_format | It shows the offline data format of the table and is valid only for offline tables. The `NULL` value means it is not set. |
| Offline_deep_copy | It indicates whether deep copy is used on the table and is valid only for offline tables. The `NULL` value means it is not set. |
| Warnings | Warnings related to the table, including the following four types:<br/>1) `leader/follower mode inconsistent`: the leader/follower information from the nameserver is inconsistent with that in the tablet<br/>2) `state is kNotFound/kTableUndefined/kTableLoading`: the partition is unavailable; `kNotFound` means the partition does not exist, `kTableUndefined` means the partition was not loaded successfully, and `kTableLoading` means the partition is being loaded<br/>3) `real replica number xx does not match the configured replicanum xx`: the actual number of replicas differs from `replicanum`<br/>4) `not connected to leader`: the follower is not connected to the leader, which usually occurs together with 3) |
| Column | Note |
| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Table_id | It shows the unique id of the table. |
| Table_name | It shows the name of the table. |
| Database_name | It shows the name of the database that the table belongs to. |
| Storage_type | It shows the storage type of the table. Possible values are `memory`, `ssd`, and `hdd`. |
| Rows | It shows the number of rows in this table. |
| Memory_data_size | It shows the memory usage of the table in bytes. |
| Disk_data_size | It shows the disk usage of the table in bytes. |
| Partition | It shows the number of partitions of the table. |
| Partition_unalive | It shows the number of the unalive partitions of the table. |
| Replica | It shows the number of replicas of the table. |
| Offline_path | It shows the path of the offline data for this table and is valid only for offline tables. The `NULL` value means the path is not set. |
| Offline_format | It shows the offline data format of the table and is valid only for offline tables. The `NULL` value means it is not set. |
| Offline_symbolic_paths | It shows the paths of data loaded with `deep_copy=false`. The `NULL` value means it is not set. |
| Warnings | Warnings related to the table, including the following four types:<br/>1) `leader/follower mode inconsistent`: the leader/follower information from the nameserver is inconsistent with that in the tablet<br/>2) `state is kNotFound/kTableUndefined/kTableLoading`: the partition is unavailable; `kNotFound` means the partition does not exist, `kTableUndefined` means the partition was not loaded successfully, and `kTableLoading` means the partition is being loaded<br/>3) `real replica number xx does not match the configured replicanum xx`: the actual number of replicas differs from `replicanum`<br/>4) `not connected to leader`: the follower is not connected to the leader, which usually occurs together with 3) |

```{note}
In versions 0.8.1 and earlier, `Offline_symbolic_paths` does not exist; its position in the output is occupied by `Offline_deep_copy` instead.
```
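
Client code that reads this output by position may want to account for the version difference described in the note. The following is a minimal Python sketch, assuming the result columns arrive in the same order as the table above; `status_columns` and `label_row` are hypothetical helper names, not part of any OpenMLDB SDK.

```python
from typing import Dict, List, Sequence, Tuple

# Columns shared by all versions, in the order listed in the table above.
# Assumption: the server returns columns in exactly this order.
_LEADING_COLUMNS: List[str] = [
    "Table_id", "Table_name", "Database_name", "Storage_type", "Rows",
    "Memory_data_size", "Disk_data_size", "Partition", "Partition_unalive",
    "Replica", "Offline_path", "Offline_format",
]


def status_columns(version: Tuple[int, int, int]) -> List[str]:
    """Return the column names for a server version (hypothetical helper)."""
    # Versions <= 0.8.1 report Offline_deep_copy in this position.
    slot = "Offline_deep_copy" if version <= (0, 8, 1) else "Offline_symbolic_paths"
    return _LEADING_COLUMNS + [slot, "Warnings"]


def label_row(row: Sequence[object], version: Tuple[int, int, int]) -> Dict[str, object]:
    """Pair each value of one result row with its column name."""
    columns = status_columns(version)
    if len(row) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(row)}")
    return dict(zip(columns, row))
```

Keying on the column name rather than on a fixed index keeps callers from misreading a boolean `Offline_deep_copy` value as a path list when talking to an older server.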

## Example

2 changes: 1 addition & 1 deletion docs/zh/deploy/conf.md
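@@ -301,7 +301,7 @@ In local mode, the Spark job runs locally (on the host where TaskManager runs); this mode
"yarn" and "yarn-cluster" are the same mode: the Spark job runs on a Yarn cluster. This mode requires more configuration, mainly:
- **Before starting TaskManager**, set the environment variable `HADOOP_CONF_DIR` to the directory that holds the Hadoop and Yarn configuration files. The directory should contain Hadoop's `core-site.xml` and `hdfs-site.xml`, Yarn's `yarn-site.xml`, and other configuration files; see the [official Spark documentation](https://spark.apache.org/docs/3.2.1/running-on-yarn.html#launching-spark-on-yarn).
- `spark.yarn.jars` sets the location of the Spark runtime jars that Yarn reads; it must be an `hdfs://` address. You can upload the `jars` directory from the unpacked [OpenMLDB Spark distribution](../../tutorial/openmldbspark_distribution.md) to HDFS and set this option to `hdfs://<hdfs_path>/jars/*` (note the wildcard). [If this option is not set, Yarn packages, uploads, and distributes `$SPARK_HOME/jars`, and does so for every offline job](https://spark.apache.org/docs/3.2.1/running-on-yarn.html#preparations), which is inefficient, so setting it is recommended.
- `batchjob.jar.path` must be an HDFS path. Upload the batchjob jar to HDFS and set this option to the corresponding address, so that all Workers in the Yarn cluster can obtain the batchjob jar.
- `batchjob.jar.path` must be an HDFS path (down to the jar file name). Upload the batchjob jar to HDFS and set this option to the corresponding address, so that all Workers in the Yarn cluster can obtain the batchjob jar.
- `offline.data.prefix` must be an HDFS path, so that all Workers in the Yarn cluster can read and write data. It should use the address of the Hadoop cluster configured through the `HADOOP_CONF_DIR` environment variable set earlier.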
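
The path requirements in the list above can be checked before TaskManager is started. The following is a rough Python sketch under stated assumptions: HDFS paths are written as `hdfs://` URIs, "down to the jar file name" is read as "the path ends in `.jar`", and `check_yarn_conf` is a hypothetical helper rather than an OpenMLDB tool.

```python
from typing import Dict, List


def check_yarn_conf(conf: Dict[str, str]) -> List[str]:
    """Return the problems found in a yarn-mode configuration (hypothetical helper)."""
    problems: List[str] = []

    # spark.yarn.jars is optional, but when set it must be an hdfs:// address.
    jars = conf.get("spark.yarn.jars")
    if jars is not None and not jars.startswith("hdfs://"):
        problems.append("spark.yarn.jars must be an hdfs:// address")

    jar_path = conf.get("batchjob.jar.path", "")
    if not jar_path.startswith("hdfs://"):
        problems.append("batchjob.jar.path must be an HDFS path")
    elif not jar_path.endswith(".jar"):
        # Assumption: the path must name the jar file itself, not its directory.
        problems.append("batchjob.jar.path should point at the jar file itself")

    prefix = conf.get("offline.data.prefix", "")
    if not prefix.startswith("hdfs://"):
        problems.append("offline.data.prefix must be an HDFS path")

    return problems
```

An empty list means that none of these checks failed; it does not prove that the cluster can actually reach the paths.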

##### yarn-client mode
4 changes: 2 additions & 2 deletions docs/zh/openmldb_sql/ddl/SET_STATEMENT.md
@@ -148,7 +148,7 @@ CREATE TABLE t1 (col0 STRING, col1 int, std_time TIMESTAMP, INDEX(KEY=col1, TS=s

### Offline command configuration details

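- Run offline commands synchronously; the synchronous timeout is set automatically
- Run offline commands synchronously; the synchronous timeout is automatically set to the value of the gflag `sync_job_timeout`, which defaults to 30 minutes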

```sql
> SET @@sync_job = "true";
@@ -160,7 +160,7 @@ CREATE TABLE t1 (col0 STRING, col1 int, std_time TIMESTAMP, INDEX(KEY=col1, TS=s
- Configure the client option `--sync_job_timeout`; it must not exceed `server.channel_keep_alive_time`. The SDK does not yet support changing it.
```

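- Set the wait time for asynchronous offline commands or offline management commands (in milliseconds):
- Offline commands executed asynchronously also have a timeout, which can be configured manually. Set the wait time for asynchronous offline commands or offline management commands (in milliseconds):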
```sql
> SET @@job_timeout = "600000";
```
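
Because `job_timeout` is given in milliseconds, a value such as `600000` corresponds to 10 minutes (10 × 60 × 1000). The following is a small Python sketch of that conversion; `job_timeout_statement` is a hypothetical helper that only builds the statement text and does not send it to any server.

```python
def job_timeout_statement(minutes: float) -> str:
    """Build a SET @@job_timeout statement from a duration in minutes (hypothetical helper)."""
    if minutes <= 0:
        raise ValueError("timeout must be positive")
    # job_timeout is expressed in milliseconds.
    milliseconds = int(minutes * 60 * 1000)
    return f'SET @@job_timeout = "{milliseconds}";'
```

For example, `job_timeout_statement(10)` produces the statement shown above.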