Explanation of the overall steps for creating a new UDF
ahaoyao committed Jun 21, 2023
1 parent 02d10e5 commit 0c10baa
Showing 26 changed files with 702 additions and 17 deletions.
2 changes: 1 addition & 1 deletion docs/user-guide/datasource-client.md
@@ -1,6 +1,6 @@
---
title: DataSource Client SDK
-sidebar_position: 4
+sidebar_position: 6
---

> ```
2 changes: 1 addition & 1 deletion docs/user-guide/datasource-manual.md
@@ -1,6 +1,6 @@
---
title: Data Source Manual
-sidebar_position: 7
+sidebar_position: 4
---

> Introduce how to use the new feature function data source of version 1.1.0
2 changes: 1 addition & 1 deletion docs/user-guide/dynamic-variables.md
@@ -1,6 +1,6 @@
---
title: built-in time variable
-sidebar_position: 6
+sidebar_position: 7
---

## 1. Overview
2 changes: 1 addition & 1 deletion docs/user-guide/linkiscli-manual.md
@@ -1,6 +1,6 @@
---
title: Shell Scripts Manual
-sidebar_position: 3
+sidebar_position: 2
---

## 1.Introduction
137 changes: 137 additions & 0 deletions docs/user-guide/udf-function.md
@@ -0,0 +1,137 @@
---
title: UDF Function
sidebar_position: 5
---

## 1 Introduction to UDF
UDF stands for User Defined Function. In some scenarios we need Hive functions to process data: functions such as count() and sum() are built in, but when we need functionality that is not built in, we have to define the function ourselves by writing a UDF.


## 2 Overview of the overall steps for creating a UDF
### 1 Universal type UDF functions
Overall steps:
- Write the UDF locally in the UDF format and package it as a jar file
- Upload it to the corresponding directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Function】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Write the jar package locally**

UDF example: https://help.aliyun.com/apsara/agile/v_3_6_0_20210705/odps/ase-user-guide/udf-example.html
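For orientation, the sketch below shows the general shape of such a jar's source. It is a minimal, hedged example rather than code from this commit: it assumes Hive's classic `org.apache.hadoop.hive.ql.exec.UDF` base class and borrows the `ToUpperCase` class name from the registration example in Step3.

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// Minimal sketch of a universal-type UDF (assumes Hive's classic UDF API).
// Hive calls evaluate() once per input row; compile this class and package
// it into the jar that Step2 uploads.
class ToUpperCase extends UDF {
  def evaluate(s: String): String =
    if (s == null) null else s.toUpperCase
}
```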

**Step2 Upload the jar package in 【Scriptis >> Workspace】**
Select the corresponding folder, right-click, and choose Upload.

**Step3 Create the UDF in 【Management Console >> UDF Function】**
- Function name: any name that conforms to the rules, such as test_udf_jar; this is the name used in SQL and other scripts
- Function type: General
- Script path: select the shared directory path where the jar package is stored, such as ../../../wds_functions_1_0_0.jar
- Registration format: package name + class name, such as com.webank.wedatasphere.willink.bdp.udf.ToUpperCase
- Usage format: the input type and return type must be consistent with the definition in the jar package
- Classification: select from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

Note: a newly created UDF function is loaded by default and can be viewed on the 【Scriptis >> UDF Functions】 page, which makes it easy to check while editing a Scriptis task; a checked UDF function will be loaded and used.

**Step4 Use the UDF function**

Use the UDF function created in the steps above in a task; the function name is the [Create UDF] function name.
In pyspark:

    print(sqlContext.sql("select test_udf_jar(name1) from stacyyan_ind.result_sort_1_20200226").collect())

### 2 UDF functions of Spark type
Overall steps:
- Create a new Spark script file in the desired directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Functions】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Create a new scala script in dss-scriptis**

    def helloWorld(str: String): String = "hello, " + str
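Conceptually, the "default loading" of a Spark-type UDF amounts to registering this scala function under the [Create UDF] function name so that SQL can call it. The hedged sketch below uses the standard Spark API; the `spark` session variable is an assumption here, not something this page defines:

```scala
// Assumes a SparkSession named `spark` is in scope.
def helloWorld(str: String): String = "hello, " + str

// Registering the function makes it callable from SQL under the given name,
// which is the effect that loading a Spark-type UDF has on a new engine.
spark.udf.register("helloWorld", helloWorld _)
spark.sql("select helloWorld('linkis')").show()
```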

**Step2 Create the UDF**
- Function name: any name that conforms to the rules, such as test_udf_scala
- Function type: spark
- Script path: ../../../b
- Registration format: the input type and return type must be consistent with the definition, and the registration format must strictly match the defined function name, such as helloWorld
- Classification: select an existing first-level directory under dss-scriptis > UDF Functions > Personal Functions from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

**Step3 Use the UDF function**

Use the UDF function created in the steps above in a task; the function name is the [Create UDF] function name.
- In scala:

      val s = sqlContext.sql("select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226")
      show(s)

- In pyspark:

      print(sqlContext.sql("select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226").collect())

- In sql:

      select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226;

### 3 Python functions
Overall steps:
- Create a new Python script file in the desired directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Functions】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Create a new pyspark script in dss-scriptis**

    def addation(a, b):
        return a + b

**Step2 Create the UDF**
- Function name: any name that conforms to the rules, such as test_udf_py
- Function type: spark
- Script path: ../../../a
- Registration format: must strictly match the defined function name, such as addation
- Usage format: the input type and return type must be consistent with the definition
- Classification: select an existing first-level directory under dss-scriptis > UDF Functions > Personal Functions from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

**Step3 Use the UDF function**
Use the UDF function created in the steps above in a task; the function name is the [Create UDF] function name.
- In pyspark:

      print(sqlContext.sql("select test_udf_py(pv,impression) from neiljianliu_ind.alias where entityid=504059 limit 50").collect())

- In sql:

      select test_udf_py(pv,impression) from neiljianliu_ind.alias where entityid=504059 limit 50

### 4 scala functions
Overall steps:
- Create a new Spark Scala script file in the desired directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Functions】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Create a new scala script in dss-scriptis**

    def hellozdy(str: String): String = "hellozdy,haha " + str

**Step2 Create the function**
- Function name: must strictly match the defined function name, such as hellozdy
- Function type: Custom function
- Script path: ../../../d
- Usage format: the input type and return type must be consistent with the definition
- Classification: select an existing first-level directory under dss-scriptis > Method Functions > Personal Functions from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

**Step3 Use the function**
Use the function created in the steps above in a task; the function name is the [Create UDF] function name.

    val a = hellozdy("abcd");
    print(a)

### 5 Common usage issues
#### 5.1 UDF function loading failed
"FAILED: SemanticException [Error 10011]: Invalid function xxxx"

![](/Images/udf/udf_10.png)

- First, check whether the UDF function is configured correctly:

![](/Images/udf/udf_11.png)

- The registration format is the function path name:

![](/Images/udf/udf_12.png)

- Check 【Scriptis >> UDF Functions】 to see whether the function is checked for loading; if it is not checked, the UDF will not be loaded when an engine starts:

![](/Images/udf/udf_13.png)

- Check whether the engine has loaded the UDF; if not, start another engine or restart the current one.
Note: a UDF is only loaded when an engine initializes; if a UDF is added midway, the current engine cannot detect or load it.
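One way to confirm from inside a task whether the current engine has the function is to list functions by name. The hedged sketch below uses standard Spark SQL (`SHOW USER FUNCTIONS`), the `test_udf_scala` name from the examples above, and the same `show` helper as the scala snippets on this page:

```scala
// An empty result means the current engine started before the UDF was
// created or checked; start another engine or restart this one.
val fns = sqlContext.sql("show user functions like 'test_udf*'")
show(fns)
```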
@@ -1,4 +1,4 @@
{
"label": "管理台的使用",
"position": 7
"position": 8
}
@@ -1,6 +1,6 @@
---
title: DataSource SDK
-sidebar_position: 5
+sidebar_position: 6
---

> Linkis DataSource provides a Client SDK that is convenient to call from JAVA and SCALA; you only need to import the linkis-datasource-client module to use it.
@@ -1,6 +1,6 @@
---
title: Built-in time variables
-sidebar_position: 6
+sidebar_position: 7
---

## 1. Overview
@@ -0,0 +1,137 @@
---
title: UDF Function
sidebar_position: 5
---

> A detailed introduction to how to use the UDF feature.
## 1. Overview of the overall steps for creating a UDF
### 1 Universal type UDF functions
Overall steps:
- Write the UDF locally in the UDF format and package it as a jar file
- Upload it to the corresponding directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Functions】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Write the jar package locally**

UDF example: https://help.aliyun.com/apsara/agile/v_3_6_0_20210705/odps/ase-user-guide/udf-example.html

**Step2 Upload the jar package in 【Scriptis >> Workspace】**
Select the corresponding folder, right-click, and choose Upload.

**Step3 Create the UDF in 【Management Console >> UDF Functions】**
- Function name: any name that conforms to the rules, such as test_udf_jar; this is the name used in SQL and other scripts
- Function type: General
- Script path: select the shared directory path where the jar package is stored, such as ../../../wds_functions_1_0_0.jar
- Registration format: package name + class name, such as com.webank.wedatasphere.willink.bdp.udf.ToUpperCase
- Usage format: the input type and return type must be consistent with the definition in the jar package
- Classification: select from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

Note: a newly created UDF function is loaded by default and can be viewed on the 【Scriptis >> UDF Functions】 page, which makes it easy to check while editing a Scriptis task; a checked UDF function will be loaded and used.

**Step4 Use the UDF function**

Use the UDF function created in the steps above in a task; the function name is the 【Create UDF】 function name.
In pyspark:

    print(sqlContext.sql("select test_udf_jar(name1) from stacyyan_ind.result_sort_1_20200226").collect())

### 2 UDF functions of Spark type
Overall steps:
- Create a new Spark script file in the desired directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Functions】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Create a new scala script in dss-scriptis**

    def helloWorld(str: String): String = "hello, " + str

**Step2 Create the UDF**
- Function name: any name that conforms to the rules, such as test_udf_scala
- Function type: spark
- Script path: ../../../b
- Registration format: the input type and return type must be consistent with the definition, and the registration format must strictly match the defined function name, such as helloWorld
- Classification: select an existing first-level directory under dss-scriptis > UDF Functions > Personal Functions from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

**Step3 Use the UDF function**

Use the UDF function created in the steps above in a task; the function name is the 【Create UDF】 function name.
- In scala:

      val s = sqlContext.sql("select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226")
      show(s)

- In pyspark:

      print(sqlContext.sql("select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226").collect())

- In sql:

      select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226;

### 3 Python functions
Overall steps:
- Create a new Python script file in the desired directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Functions】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Create a new pyspark script in dss-scriptis**

    def addation(a, b):
        return a + b

**Step2 Create the UDF**
- Function name: any name that conforms to the rules, such as test_udf_py
- Function type: spark
- Script path: ../../../a
- Registration format: must strictly match the defined function name, such as addation
- Usage format: the input type and return type must be consistent with the definition
- Classification: select an existing first-level directory under dss-scriptis > UDF Functions > Personal Functions from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

**Step3 Use the UDF function**
Use the UDF function created in the steps above in a task; the function name is the 【Create UDF】 function name.
- In pyspark:

      print(sqlContext.sql("select test_udf_py(pv,impression) from neiljianliu_ind.alias where entityid=504059 limit 50").collect())

- In sql:

      select test_udf_py(pv,impression) from neiljianliu_ind.alias where entityid=504059 limit 50

### 4 scala functions
Overall steps:
- Create a new Spark Scala script file in the desired directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Functions】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Create a new scala script in dss-scriptis**

    def hellozdy(str: String): String = "hellozdy,haha " + str

**Step2 Create the function**
- Function name: must strictly match the defined function name, such as hellozdy
- Function type: Custom function
- Script path: ../../../d
- Usage format: the input type and return type must be consistent with the definition
- Classification: select an existing first-level directory under dss-scriptis > Method Functions > Personal Functions from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

**Step3 Use the function**
Use the function created in the steps above in a task; the function name is the 【Create UDF】 function name.

    val a = hellozdy("abcd");
    print(a)

### 5 Common usage issues
#### 5.1 UDF function failed to load
"FAILED: SemanticException [Error 10011]: Invalid function xxxx"
![](/Images/udf/udf_10.png)

- First, check whether the UDF function is configured correctly:

![](/Images/udf/udf_11.png)

- The registration format is the function path name:

![](/Images/udf/udf_12.png)

- Check 【Scriptis >> UDF Functions】 to see whether the function is checked for loading; if it is not checked, the UDF will not be loaded when an engine starts:

![](/Images/udf/udf_13.png)

- Check whether the engine has loaded the UDF; if not, start another engine or restart the current one.
Note: a UDF is only loaded when an engine initializes; if a UDF is added midway, the current engine cannot detect or load it.