docs: add English version of install_deploy #1516

Merged on Apr 7, 2022 (10 commits)
132 changes: 66 additions & 66 deletions docs/en/install_deploy.md
@@ -1,6 +1,6 @@
# Install and deploy
# Install and Deploy

## Software and hardware environment requirements
## Software and Hardware Environment Requirements

* Operating system: CentOS 7, Ubuntu 20.04, or macOS >= 10.15. On Linux, glibc >= 2.17 is required. Other operating system versions have not been fully tested and are not guaranteed to function correctly.
* Memory: Depends on the amount of data, 8 GB and above is recommended.
@@ -9,14 +9,14 @@
* The number of cores is recommended to be no less than 4 cores. If the CPU does not support the AVX2 instruction set in the Linux environment, the deployment package needs to be recompiled from the source code.


## Deployment package preparation
## Deployment Package Preparation
This documentation uses the precompiled OpenMLDB deployment package by default ([Linux](https://github.com/4paradigm/OpenMLDB/releases/download/v0.4.3/openmldb-0.4.3-linux.tar.gz), [macOS](https://github.com/4paradigm/OpenMLDB/releases/download/v0.4.3/openmldb-0.4.3-darwin.tar.gz)). The supported operating systems are CentOS 7, Ubuntu 20.04, and macOS >= 10.15. If you prefer to compile OpenMLDB yourself (for example, for OpenMLDB source code development, or because your operating system or CPU architecture is not supported by the precompiled package), you can compile inside a Docker container or directly from the source code. For details, please refer to our [compile documentation](compile.md).

## Configure environment (Linux)
## Configure Environment (Linux)

### Disable system swap

Check if the swap area is disabled.
Check the status of the swap area.

```bash
$ free
@@ -25,15 +25,15 @@ Mem: 264011292 67445840 2230676 3269180 194334776 191204160
Swap: 0 0 0
```

If the swap item is all 0, it means it has been closed, otherwise run the following command to disable all swap
If all values in the Swap row are 0, swap is already disabled. Otherwise, run the following command to disable all swap.

```
$ swapoff -a
```
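
The check above can be scripted. The following is a minimal sketch, assuming `free` prints a row that starts with `Swap:` as in the sample output; the function name `swap_status` is hypothetical and not part of OpenMLDB.

```bash
# Hypothetical helper (not part of OpenMLDB): reads `free` output on stdin
# and prints "disabled" when the Swap total is 0, "enabled" when it is not,
# and "unknown" when no "Swap:" row is present.
swap_status() {
    awk '$1 == "Swap:" { found = 1; total = $2 }
         END {
             if (!found)          print "unknown"
             else if (total == 0) print "disabled"
             else                 print "enabled"
         }'
}
```

A possible use is `free | swap_status`, running `swapoff -a` only when the result is `enabled`.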

### Disable THP (Transparent Huge Pages)

See if THP is off
Check the status of THP.

```
$ cat /sys/kernel/mm/transparent_hugepage/enabled
@@ -42,7 +42,7 @@ $ cat /sys/kernel/mm/transparent_hugepage/defrag
[always] madvise never
```

If "never" is not surrounded by square brackets in the above two configurations, it needs to be set
If "never" is not enclosed in square brackets in both of the outputs above, set it as follows.

```bash
$ echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
@@ -58,21 +58,21 @@ $ cat /sys/kernel/mm/transparent_hugepage/defrag
always madvise [never]
```
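
Reading the bracketed value by eye is easy to get wrong, so here is a minimal sketch of extracting it. It assumes the kernel marks the active mode with square brackets, as in the output above; the function name `thp_active_mode` is hypothetical.

```bash
# Hypothetical helper: prints the active THP mode, i.e. the word inside
# square brackets in a line such as "always madvise [never]".
# Prints nothing if the line contains no bracketed word.
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z+]*\)].*/\1/p'
}
```

For example, `thp_active_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"` should print `never` once THP has been disabled.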

### Time and Zone settings
### Time and zone settings

The OpenMLDB data expiration deletion mechanism relies on the system clock. If the system clock is incorrect, the expired data will not be deleted or the data that has not expired will be deleted.

```bash
$ date
Wed Aug 22 16:33:50 CST 2018
```
Please make sure the time is correct
Please make sure the time is correct.
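
Because data expiration depends on the system clock, it can help to quantify the drift rather than eyeball the `date` output. The sketch below assumes you can obtain a trusted Unix timestamp from some other source (how you obtain it is outside this document); the function name `clock_skew_seconds` is hypothetical.

```bash
# Hypothetical helper: prints the absolute difference, in seconds, between
# the local clock and a reference Unix timestamp passed as the first argument.
clock_skew_seconds() {
    local now diff
    now=$(date +%s)        # local clock as seconds since the epoch
    diff=$(( now - $1 ))
    echo "${diff#-}"       # strip a leading minus sign to get the absolute value
}
```

A large result indicates that the clock should be corrected before deploying.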

## Deploy standalone version
## Deploy Standalone Version

OpenMLDB standalone version needs to deploy a nameserver and a tablet. The nameserver is used for table management and metadata storage, and the tablet is used for data storage. APIServer is optional. If you want to interact with OpenMLDB in http, you need to deploy this module
The OpenMLDB standalone version requires deploying a nameserver and a tablet. The nameserver is used for table management and metadata storage, and the tablet is used for data storage. APIServer is optional; deploy it if you want to interact with OpenMLDB over HTTP.

**Notice:** It is best to deploy different components in different directories for easy upgrades individually
**Notice:** It is best to deploy different components in different directories so that each can be upgraded individually.

### Deploy tablet

@@ -85,16 +85,16 @@ mv openmldb-0.4.3-linux openmldb-tablet-0.4.3
cd openmldb-tablet-0.4.3
```

#### 2 Modify the configuration file conf/standalone_tablet.flags
#### 2 Modify the Configuration File: conf/standalone_tablet.flags

* Modify endpoint. endpoint is the deployment machine ip/domain name and port number separated by colons
* Modify `endpoint`. The endpoint is the deployment machine ip/domain name and port number separated by colons.

```
--endpoint=172.27.128.33:9527
```

**Notice:**
* The endpoint cannot use 0.0.0.0 and 127.0.0.1
* The endpoint cannot use 0.0.0.0 and 127.0.0.1.
* If a domain name is used here, every machine that runs an OpenMLDB client must be able to resolve it (for example, through a corresponding hosts entry). Otherwise, the endpoint will not be accessible.
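
The endpoint rules above can be checked before a flags file is edited. This is a minimal sketch under stated assumptions: the function name `check_endpoint` is hypothetical, and the 1 to 65535 port range check is an addition that the documentation does not spell out.

```bash
# Hypothetical helper: prints "ok" for a usable host:port endpoint and a
# short reason otherwise. The port range check (1-65535) is an assumption.
check_endpoint() {
    local endpoint=$1 host port
    case "$endpoint" in
        *:*) ;;
        *) echo "missing port"; return ;;
    esac
    host=${endpoint%:*}      # everything before the last colon
    port=${endpoint##*:}     # everything after the last colon
    case "$host" in
        "") echo "missing host"; return ;;
        0.0.0.0|127.0.0.1) echo "disallowed host"; return ;;
    esac
    case "$port" in
        ""|*[!0-9]*) echo "invalid port"; return ;;
    esac
    if [ "$port" -ge 1 ] && [ "$port" -le 65535 ]; then
        echo "ok"
    else
        echo "invalid port"
    fi
}
```

For instance, `check_endpoint 172.27.128.33:9527` prints `ok`, while `check_endpoint 127.0.0.1:9527` prints `disallowed host`.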

#### 3 Start the service
@@ -103,9 +103,9 @@ cd openmldb-tablet-0.4.3
sh bin/start.sh start standalone_tablet
```

**Notice**: After the service is started, the standalone_tablet.pid file will be generated in the bin directory, and the process number at startup will be saved in it. If the pid inside the file is running, the startup will fail
**Notice**: After the service starts, a standalone_tablet.pid file is generated in the bin directory and stores the process ID from startup. If the process with that pid is still running, the startup will fail.
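
If a start attempt fails, it may be worth checking whether the pid file points at a live process. The following is a minimal sketch using `kill -0`, which only tests for the existence of a process and sends no signal; the function name `pid_file_status` is hypothetical, and the check can misreport a process owned by another user as stale.

```bash
# Hypothetical helper: prints "running" when the pid stored in the given
# file belongs to a live process, "stale" when it does not, and "absent"
# when the file does not exist.
pid_file_status() {
    local pid_file=$1 pid
    [ -f "$pid_file" ] || { echo "absent"; return; }
    pid=$(cat "$pid_file")
    # kill -0 sends no signal; it only reports whether the process can be signalled.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}
```

For example, `pid_file_status bin/standalone_tablet.pid` run from the deployment directory reports which of the three cases applies.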

### Deploy nameserver
### Deploy Nameserver

#### 1 Download the OpenMLDB deployment package

@@ -116,25 +116,25 @@ mv openmldb-0.4.3-linux openmldb-ns-0.4.3
cd openmldb-ns-0.4.3
```

#### 2 Modify the configuration file conf/standalone_nameserver.flags
#### 2 Modify the Configuration File: conf/standalone_nameserver.flags

* Modify endpoint. endpoint is the deployment machine ip/domain name and port number separated by colons
* The tablet configuration item needs to be configured with the address of the tablet that was started earlier
* Modify `endpoint`. The endpoint is the deployment machine ip/domain name and port number separated by colons.
* The `tablet` configuration item needs to be configured with the address of the tablet that was started earlier.

```
--endpoint=172.27.128.33:6527
--tablet=172.27.128.33:9527
```

**Notice**: endpoint cannot use 0.0.0.0 and 127.0.0.1
**Notice**: The endpoint cannot use 0.0.0.0 and 127.0.0.1.

#### 3 Start the service

```
sh bin/start.sh start standalone_nameserver
```

#### 4 Check if the service is started
#### 4 Verify the running status of the service

```bash
$ ./bin/openmldb --host=172.27.128.33 --port=6527
@@ -145,7 +145,7 @@ $ ./bin/openmldb --host=172.27.128.33 --port=6527
0 row in set
```

### Deploy apiserver
### Deploy APIServer

APIServer is responsible for receiving http requests, forwarding them to OpenMLDB and returning results. It is stateless and is not a must-deploy component of OpenMLDB.
Before running, make sure that the OpenMLDB cluster has been started, otherwise APIServer will fail to initialize and exit the process.
@@ -159,10 +159,10 @@ mv openmldb-0.4.3-linux openmldb-apiserver-0.4.3
cd openmldb-apiserver-0.4.3
```

#### 2 Modify the configuration file conf/standalone_apiserver.flags
#### 2 Modify the Configuration File: conf/standalone_apiserver.flags

* Modify endpoint. endpoint is the deployment machine ip/domain name and port number separated by colons
* Modify nameserver to be the address of nameserver
* Modify `endpoint`. The endpoint is the deployment machine ip/domain name and port number separated by colons.
* Modify `nameserver` to be the address of Nameserver.

```
--endpoint=172.27.128.33:8080
@@ -179,15 +179,15 @@ cd openmldb-apiserver-0.4.3
sh bin/start.sh start standalone_apiserver
```

## Deploy cluster version
## Deploy Cluster Version

OpenMLDB cluster version needs to deploy zookeeper, nameserver, tablet and other modules. Among them, zookeeper is used for service discovery and saving metadata information. The nameserver is used to manage the tablet, achieve high availability and failover. Tablets are used to store data and synchronize data between master and slave. APIServer is optional. If you want to interact with OpenMLDB in http, you need to deploy this module
The OpenMLDB cluster version requires deploying Zookeeper, Nameserver, Tablet and other modules. Zookeeper is used for service discovery and for storing metadata. The Nameserver manages the tablets and provides high availability and failover. Tablets store data and synchronize it between master and slave. APIServer is optional; deploy it if you want to interact with OpenMLDB over HTTP.

**Notice:** It is best to deploy different components in different directories for easy upgrades individually. If multiple tablets are deployed on the same machine, they also need to be deployed in different directories
**Notice:** It is best to deploy different components in different directories for easy upgrades individually. If multiple tablets are deployed on the same machine, they also need to be deployed in different directories.

### Deploy zookeeper
### Deploy Zookeeper

It is recommended to deploy version 3.4.14. If there is an available zookeeper cluster, you can skip this step
It is recommended to deploy version 3.4.14. If there is an available zookeeper cluster, you can skip this step.

#### 1. Download the zookeeper installation package

@@ -197,9 +197,9 @@ cd zookeeper-3.4.14
cp conf/zoo_sample.cfg conf/zoo.cfg
```

#### 2. Modify the configuration file
#### 2. Modify the Configuration File

Open the file `conf/zoo.cfg` and modify `dataDir` and `clientPort`
Open the file `conf/zoo.cfg` and modify `dataDir` and `clientPort`.

```
dataDir=./data
@@ -212,7 +212,7 @@ clientPort=7181
sh bin/zkServer.sh start
```
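
The `dataDir` and `clientPort` changes described above can also be applied non-interactively. This is a minimal sketch, not the documented procedure; the function name `set_cfg_value` is hypothetical, and it writes through a temporary file instead of using `sed -i` so that it behaves the same on Linux and macOS.

```bash
# Hypothetical helper: sets key=value in a config file, replacing an
# existing "key=" line or appending a new one. Values containing "|", "&"
# or backslashes would need escaping and are not handled here.
set_cfg_value() {
    local file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" \
            && mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

For example, `set_cfg_value conf/zoo.cfg dataDir ./data` followed by `set_cfg_value conf/zoo.cfg clientPort 7181` produces the values shown in the configuration above.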

Deploy the zookeeper cluster [refer to here](https://zookeeper.apache.org/doc/r3.4.14/zookeeperStarted.html#sc_RunningReplicatedZooKeeper)
To deploy a Zookeeper cluster, [refer to here](https://zookeeper.apache.org/doc/r3.4.14/zookeeperStarted.html#sc_RunningReplicatedZooKeeper).

### Deploy tablet

@@ -225,40 +225,40 @@ mv openmldb-0.4.3-linux openmldb-tablet-0.4.3
cd openmldb-tablet-0.4.3
```

#### 2 Modify the configuration file conf/tablet.flags
#### 2 Modify the Configuration File: conf/tablet.flags

* Modify endpoint. endpoint is the deployment machine ip/domain name and port number separated by colons
* Modify zk_cluster to the already started zk cluster address
* If you share zk with other OpenMLDB, you need to modify zk_root_path
* Modify `endpoint`. The endpoint is the deployment machine ip/domain name and port number separated by colons.
* Modify `zk_cluster` to the already started zk cluster address.
* If you share zk with other OpenMLDB, you need to modify `zk_root_path`.

```
--endpoint=172.27.128.33:9527
--role=tablet

# if tablet run as cluster mode zk_cluster and zk_root_path should be set
# If the tablet runs in cluster mode, zk_cluster and zk_root_path must be set:
--zk_cluster=172.27.128.33:7181,172.27.128.32:7181,172.27.128.31:7181
--zk_root_path=/openmldb_cluster
```

**Notice:**
* The endpoint cannot use 0.0.0.0 and 127.0.0.1
* The endpoint cannot use 0.0.0.0 and 127.0.0.1.
* If a domain name is used here, every machine that runs an OpenMLDB client must be able to resolve it (for example, through a corresponding hosts entry). Otherwise, the endpoint will not be accessible.
* The configuration of zk_cluster and zk_root_path is consistent with that of nameserver
* The configuration of zk_cluster and zk_root_path is consistent with that of Nameserver.

#### 3 Start the service

```
sh bin/start.sh start tablet
```

Repeat the above steps to deploy multiple tablets
Repeat the above steps to deploy multiple tablets.

**Notice:**
* After the service is started, the tablet.pid file will be generated in the bin directory, and the process number at startup will be saved in it. If the pid inside the file is running, the startup will fail
* Cluster version needs to deploy at least 2 tablets
* If you need to deploy multiple tablets, deploy all the tablets before deploying the nameserver
* After the service starts, a tablet.pid file is generated in the bin directory and stores the process ID from startup. If the process with that pid is still running, the startup will fail.
* Cluster version needs to deploy at least 2 tablets.
* If you need to deploy multiple tablets, deploy all the tablets before deploying the Nameserver.

### Deploy nameserver
### Deploy Nameserver

#### 1 Download the OpenMLDB deployment package

@@ -269,11 +269,11 @@ mv openmldb-0.4.3-linux openmldb-ns-0.4.3
cd openmldb-ns-0.4.3
```

#### 2 Modify the configuration file conf/nameserver.flags
#### 2 Modify the Configuration File: conf/nameserver.flags

* Modify endpoint. endpoint is the deployment machine ip/domain name and port number separated by colons
* Modify zk_cluster to the address of the zk cluster that has been started. ip is the ip of the machine where zk is located, and port is the port number configured by clientPort in the zk configuration file. If zk is in cluster mode, separate it with commas, and the format is ip1:port1,ip2:port2, ip3:port3
* If you share zk with other OpenMLDB, you need to modify zk_root_path
* Modify `endpoint`. The endpoint is the deployment machine ip/domain name and port number separated by colons.
* Modify `zk_cluster` to the address of the zk cluster that has been started. The ip is the IP of the machine where zk is located, and the port is the port number set by `clientPort` in the zk configuration file. If zk is in cluster mode, separate the addresses with commas, in the format ip1:port1,ip2:port2,ip3:port3.
* If you share zk with other OpenMLDB, you need to modify `zk_root_path`.

```
--endpoint=172.27.128.31:6527
@@ -282,17 +282,17 @@ cd openmldb-ns-0.4.3
--enable_distsql=true
```
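
The comma-separated `zk_cluster` format described above is easy to mistype, for example by leaving a space after a comma. The following is a minimal sketch of a format check; the function name `split_zk_cluster` is hypothetical, and it validates only the shape of each entry, not whether the address is reachable.

```bash
# Hypothetical helper: for a zk_cluster value such as ip1:port1,ip2:port2
# prints one "host port" pair per line, or "invalid: <entry>" for any entry
# that is not of the form host:port with a numeric port.
split_zk_cluster() {
    local entry
    local IFS=','            # split the argument on commas only
    for entry in $1; do
        case "$entry" in
            *" "*|*:*:*|:*|*:) echo "invalid: $entry" ;;
            *:*[!0-9]*)        echo "invalid: $entry" ;;
            *:*)               echo "${entry%:*} ${entry##*:}" ;;
            *)                 echo "invalid: $entry" ;;
        esac
    done
}
```

For example, `split_zk_cluster 172.27.128.31:7181,172.27.128.32:7181` prints two host and port pairs, while an entry with a leading space is reported as invalid.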

**Notice:** endpoint cannot use 0.0.0.0 and 127.0.0.1
**Notice:** The endpoint cannot use 0.0.0.0 and 127.0.0.1.

#### 3 Start the service

```
sh bin/start.sh start nameserver
```

Repeat the above steps to deploy multiple nameservers
Repeat the above steps to deploy multiple nameservers.

#### 4 Check if the service is started
#### 4 Verify the running status of the service

```bash
$ ./bin/openmldb --zk_cluster=172.27.128.31:7181,172.27.128.32:7181,172.27.128.33:7181 --zk_root_path=/openmldb_cluster --role=ns_client
@@ -302,7 +302,7 @@ $ ./bin/openmldb --zk_cluster=172.27.128.31:7181,172.27.128.32:7181,172.27.128.3
172.27.128.31:6527 leader
```

### Deploy apiserver
### Deploy APIServer

APIServer is responsible for receiving http requests, forwarding them to OpenMLDB and returning results. It is stateless and is not a must-deploy component of OpenMLDB.
Before running, make sure that the OpenMLDB cluster has been started, otherwise APIServer will fail to initialize and exit the process.
@@ -316,10 +316,10 @@ mv openmldb-0.4.3-linux openmldb-apiserver-0.4.3
cd openmldb-apiserver-0.4.3
```

#### 2 Modify the configuration file conf/apiserver.flags
#### 2 Modify the Configuration File: conf/apiserver.flags

* Modify endpoint. endpoint is the deployment machine ip/domain name and port number separated by colons
* Modify zk_cluster to the zk cluster address of OpenMLDB to be forwarded to
* Modify `endpoint`. The endpoint is the deployment machine ip/domain name and port number separated by colons.
* Modify `zk_cluster` to the zk cluster address of OpenMLDB to be forwarded to.

```
--endpoint=172.27.128.33:8080
@@ -356,14 +356,14 @@ cd openmldb-taskmanager-0.4.3

#### 2 Modify the configuration file conf/taskmanager.properties

* Modify server.host. host is the ip/domain name of the deployment machine.
* Modify server.port. port is the port number of the deployment machine.
* Modify zk_cluster to the address of the zk cluster that has been started. ip is the ip of the machine where zk is located, and port is the port number configured by clientPort in the zk configuration file. If zk is in cluster mode, it is separated by commas, and the format is ip1:port1,ip2:port2,ip3:port3.
* Modify `server.host`. The host is the ip/domain name of the deployment machine.
* Modify `server.port`. The port is the port number of the deployment machine.
* Modify `zk_cluster` to the address of the zk cluster that has been started. The ip is the IP of the machine where zk is located, and the port is the port number set by `clientPort` in the zk configuration file. If zk is in cluster mode, separate the addresses with commas, in the format ip1:port1,ip2:port2,ip3:port3.
* If you share zk with other OpenMLDB, you need to modify zookeeper.root_path.
* Modify batchjob.jar.path to the BatchJob Jar file path. If it is set to empty, it will search in the upper-level lib directory. If you use Yarn mode, you need to modify it to the corresponding HDFS path.
* Modify offline.data.prefix to the offline table storage path. If Yarn mode is used, it needs to be modified to the corresponding HDFS path.
* Modify spark.master to run in offline task mode, currently supports local and yarn modes.
* Modify spark.home to the Spark environment path. If it is not configured or the configuration is empty, the configuration of the SPARK_HOME environment variable will be used. It needs to be set as the directory where the spark-optimized package is extracted in the first step, and the path is an absolute path.
* Modify `batchjob.jar.path` to the BatchJob Jar file path. If it is set to empty, it will search in the upper-level lib directory. If you use Yarn mode, you need to modify it to the corresponding HDFS path.
* Modify `offline.data.prefix` to the offline table storage path. If Yarn mode is used, it needs to be modified to the corresponding HDFS path.
* Modify `spark.master` to set the mode in which offline tasks run. The `local` and `yarn` modes are currently supported.
* Modify `spark.home` to the Spark environment path. If it is not configured or the configuration is empty, the configuration of the SPARK_HOME environment variable will be used. It needs to be set as the directory where the spark-optimized package is extracted in the first step, and the path is an absolute path.
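
The `spark.home` fallback described above can be previewed before starting TaskManager. This is a minimal sketch that mimics the documented rule (a non-empty `spark.home` wins, otherwise `SPARK_HOME` is used); both function names are hypothetical, and the parser handles only plain `key=value` lines, not the full Java properties syntax.

```bash
# Hypothetical helper: prints the value of a key from a .properties file.
# Only plain "key=value" lines are handled; the last occurrence wins.
get_property() {
    local file=$1 key=$2
    awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2) }' "$file" \
        | tail -n 1
}

# Prints the Spark home that would apply under the documented rule:
# the spark.home property when it is non-empty, otherwise $SPARK_HOME.
effective_spark_home() {
    local value
    value=$(get_property "$1" spark.home)
    if [ -n "$value" ]; then
        echo "$value"
    else
        echo "${SPARK_HOME:-}"
    fi
}
```

For example, `effective_spark_home conf/taskmanager.properties` shows which directory would be used; per the text above, it should be the absolute path where the spark-optimized package was extracted.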

```
server.host=0.0.0.0
@@ -382,7 +382,7 @@ spark.home=
bin/start.sh start taskmanager
```

#### 4 Check if the service is started
#### 4 Verify the running status of the service

```bash
$ ./bin/openmldb --zk_cluster=172.27.128.31:7181,172.27.128.32:7181,172.27.128.33:7181 --zk_root_path=/openmldb_cluster --role=sql_client