Explanation of the overall steps for creating a new UDF
ahaoyao committed Jun 21, 2023
1 parent 02d10e5 commit 0c10baa
Showing 26 changed files with 702 additions and 17 deletions.
2 changes: 1 addition & 1 deletion docs/user-guide/datasource-client.md
@@ -1,6 +1,6 @@
---
title: DataSource Client SDK
-sidebar_position: 4
+sidebar_position: 6
---

> ```
2 changes: 1 addition & 1 deletion docs/user-guide/datasource-manual.md
@@ -1,6 +1,6 @@
---
title: Data Source Manual
-sidebar_position: 7
+sidebar_position: 4
---

> Introduce how to use the new feature function data source of version 1.1.0
2 changes: 1 addition & 1 deletion docs/user-guide/dynamic-variables.md
@@ -1,6 +1,6 @@
---
title: built-in time variable
-sidebar_position: 6
+sidebar_position: 7
---

## 1. Overview
2 changes: 1 addition & 1 deletion docs/user-guide/linkiscli-manual.md
@@ -1,6 +1,6 @@
---
title: Shell Scripts Manual
-sidebar_position: 3
+sidebar_position: 2
---

## 1.Introduction
137 changes: 137 additions & 0 deletions docs/user-guide/udf-function.md
@@ -0,0 +1,137 @@
---
title: UDF Function
sidebar_position: 5
---

## 1 Introduction to UDF
UDF stands for User Defined Function. In some scenarios we need Hive functions to process data: functions such as count() and sum() are built in, but when we need functionality that is not built in, we have to define the function ourselves by writing a UDF.


## 2 Overview of the overall steps for creating a UDF
### 1 Universal type UDF functions
Overall steps:
- Write the UDF locally in the UDF format and package it as a jar file
- Upload it to the corresponding directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Function】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Write the jar package locally**

UDF example: https://help.aliyun.com/apsara/agile/v_3_6_0_20210705/odps/ase-user-guide/udf-example.html
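For orientation, the sketch below shows the general shape of such a jar's source. It is a minimal, hedged example rather than code from this commit: it assumes Hive's classic `org.apache.hadoop.hive.ql.exec.UDF` base class and borrows the `ToUpperCase` class name from the registration example in Step3.

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// Minimal sketch of a universal-type UDF (assumes Hive's classic UDF API).
// Hive calls evaluate() once per input row; compile this class and package
// it into the jar that Step2 uploads.
class ToUpperCase extends UDF {
  def evaluate(s: String): String =
    if (s == null) null else s.toUpperCase
}
```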

**Step2 Upload the jar package in 【Scriptis >> Workspace】**
Select the corresponding folder, right-click, and choose Upload.

**Step3 Create the UDF in 【Management Console >> UDF Function】**
- Function name: any name that conforms to the rules, such as test_udf_jar; this is the name used in SQL and other scripts
- Function type: General
- Script path: select the shared directory path where the jar package is stored, such as ../../../wds_functions_1_0_0.jar
- Registration format: package name + class name, such as com.webank.wedatasphere.willink.bdp.udf.ToUpperCase
- Usage format: the input type and return type must be consistent with the definition in the jar package
- Classification: select from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

Note: a newly created UDF function is loaded by default and can be viewed on the 【Scriptis >> UDF Functions】 page, which makes it easy to check while editing a Scriptis task; a checked UDF function will be loaded and used.

**Step4 Use the UDF function**

Use the UDF function created in the steps above in a task; the function name is the [Create UDF] function name.
In pyspark:

    print(sqlContext.sql("select test_udf_jar(name1) from stacyyan_ind.result_sort_1_20200226").collect())

### 2 UDF functions of Spark type
Overall steps:
- Create a new Spark script file in the desired directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Functions】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Create a new scala script in dss-scriptis**

    def helloWorld(str: String): String = "hello, " + str
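Conceptually, the "default loading" of a Spark-type UDF amounts to registering this scala function under the [Create UDF] function name so that SQL can call it. The hedged sketch below uses the standard Spark API; the `spark` session variable is an assumption here, not something this page defines:

```scala
// Assumes a SparkSession named `spark` is in scope.
def helloWorld(str: String): String = "hello, " + str

// Registering the function makes it callable from SQL under the given name,
// which is the effect that loading a Spark-type UDF has on a new engine.
spark.udf.register("helloWorld", helloWorld _)
spark.sql("select helloWorld('linkis')").show()
```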

**Step2 Create the UDF**
- Function name: any name that conforms to the rules, such as test_udf_scala
- Function type: spark
- Script path: ../../../b
- Registration format: the input type and return type must be consistent with the definition, and the registration format must strictly match the defined function name, such as helloWorld
- Classification: select an existing first-level directory under dss-scriptis > UDF Functions > Personal Functions from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

**Step3 Use the UDF function**

Use the UDF function created in the steps above in a task; the function name is the [Create UDF] function name.
- In scala:

      val s = sqlContext.sql("select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226")
      show(s)

- In pyspark:

      print(sqlContext.sql("select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226").collect())

- In sql:

      select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226;

### 3 Python functions
Overall steps:
- Create a new Python script file in the desired directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Functions】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Create a new pyspark script in dss-scriptis**

    def addation(a, b):
        return a + b

**Step2 Create the UDF**
- Function name: any name that conforms to the rules, such as test_udf_py
- Function type: spark
- Script path: ../../../a
- Registration format: must strictly match the defined function name, such as addation
- Usage format: the input type and return type must be consistent with the definition
- Classification: select an existing first-level directory under dss-scriptis > UDF Functions > Personal Functions from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

**Step3 Use the UDF function**
Use the UDF function created in the steps above in a task; the function name is the [Create UDF] function name.
- In pyspark:

      print(sqlContext.sql("select test_udf_py(pv,impression) from neiljianliu_ind.alias where entityid=504059 limit 50").collect())

- In sql:

      select test_udf_py(pv,impression) from neiljianliu_ind.alias where entityid=504059 limit 50

### 4 scala functions
Overall steps:
- Create a new Spark Scala script file in the desired directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Functions】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Create a new scala script in dss-scriptis**

    def hellozdy(str: String): String = "hellozdy,haha " + str

**Step2 Create the function**
- Function name: must strictly match the defined function name, such as hellozdy
- Function type: Custom function
- Script path: ../../../d
- Usage format: the input type and return type must be consistent with the definition
- Classification: select an existing first-level directory under dss-scriptis > Method Functions > Personal Functions from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

**Step3 Use the function**
Use the function created in the steps above in a task; the function name is the [Create UDF] function name.

    val a = hellozdy("abcd");
    print(a)

### 5 Common usage issues
#### 5.1 UDF function loading failed
"FAILED: SemanticException [Error 10011]: Invalid function xxxx"

![](/Images/udf/udf_10.png)

- First, check whether the UDF function is configured correctly:

![](/Images/udf/udf_11.png)

- The registration format is the function path name:

![](/Images/udf/udf_12.png)

- Check 【Scriptis >> UDF Functions】 to see whether the function is checked for loading; if it is not checked, the UDF will not be loaded when an engine starts:

![](/Images/udf/udf_13.png)

- Check whether the engine has loaded the UDF; if not, start another engine or restart the current one.
Note: a UDF is only loaded when an engine initializes; if a UDF is added midway, the current engine cannot detect or load it.
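One way to confirm from inside a task whether the current engine has the function is to list functions by name. The hedged sketch below uses standard Spark SQL (`SHOW USER FUNCTIONS`), the `test_udf_scala` name from the examples above, and the same `show` helper as the scala snippets on this page:

```scala
// An empty result means the current engine started before the UDF was
// created or checked; start another engine or restart this one.
val fns = sqlContext.sql("show user functions like 'test_udf*'")
show(fns)
```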
@@ -1,4 +1,4 @@
{
"label": "管理台的使用",
"position": 7
"position": 8
}
@@ -1,6 +1,6 @@
---
title: DataSource SDK
-sidebar_position: 5
+sidebar_position: 6
---

> Linkis DataSource provides a Client SDK that is convenient to call from JAVA and SCALA; you only need to import the linkis-datasource-client module to use it.
@@ -1,6 +1,6 @@
---
title: Built-in time variables
-sidebar_position: 6
+sidebar_position: 7
---

## 1. Overview
@@ -0,0 +1,137 @@
---
title: UDF Function
sidebar_position: 5
---

> A detailed introduction to how to use the UDF feature.
## 1. Overview of the overall steps for creating a UDF
### 1 Universal type UDF functions
Overall steps:
- Write the UDF locally in the UDF format and package it as a jar file
- Upload it to the corresponding directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Functions】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Write the jar package locally**

UDF example: https://help.aliyun.com/apsara/agile/v_3_6_0_20210705/odps/ase-user-guide/udf-example.html

**Step2 Upload the jar package in 【Scriptis >> Workspace】**
Select the corresponding folder, right-click, and choose Upload.

**Step3 Create the UDF in 【Management Console >> UDF Functions】**
- Function name: any name that conforms to the rules, such as test_udf_jar; this is the name used in SQL and other scripts
- Function type: General
- Script path: select the shared directory path where the jar package is stored, such as ../../../wds_functions_1_0_0.jar
- Registration format: package name + class name, such as com.webank.wedatasphere.willink.bdp.udf.ToUpperCase
- Usage format: the input type and return type must be consistent with the definition in the jar package
- Classification: select from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

Note: a newly created UDF function is loaded by default and can be viewed on the 【Scriptis >> UDF Functions】 page, which makes it easy to check while editing a Scriptis task; a checked UDF function will be loaded and used.

**Step4 Use the UDF function**

Use the UDF function created in the steps above in a task; the function name is the 【Create UDF】 function name.
In pyspark:

    print(sqlContext.sql("select test_udf_jar(name1) from stacyyan_ind.result_sort_1_20200226").collect())

### 2 UDF functions of Spark type
Overall steps:
- Create a new Spark script file in the desired directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Functions】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Create a new scala script in dss-scriptis**

    def helloWorld(str: String): String = "hello, " + str

**Step2 Create the UDF**
- Function name: any name that conforms to the rules, such as test_udf_scala
- Function type: spark
- Script path: ../../../b
- Registration format: the input type and return type must be consistent with the definition, and the registration format must strictly match the defined function name, such as helloWorld
- Classification: select an existing first-level directory under dss-scriptis > UDF Functions > Personal Functions from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

**Step3 Use the UDF function**

Use the UDF function created in the steps above in a task; the function name is the 【Create UDF】 function name.
- In scala:

      val s = sqlContext.sql("select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226")
      show(s)

- In pyspark:

      print(sqlContext.sql("select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226").collect())

- In sql:

      select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226;

### 3 Python functions
Overall steps:
- Create a new Python script file in the desired directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Functions】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Create a new pyspark script in dss-scriptis**

    def addation(a, b):
        return a + b

**Step2 Create the UDF**
- Function name: any name that conforms to the rules, such as test_udf_py
- Function type: spark
- Script path: ../../../a
- Registration format: must strictly match the defined function name, such as addation
- Usage format: the input type and return type must be consistent with the definition
- Classification: select an existing first-level directory under dss-scriptis > UDF Functions > Personal Functions from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

**Step3 Use the UDF function**
Use the UDF function created in the steps above in a task; the function name is the 【Create UDF】 function name.
- In pyspark:

      print(sqlContext.sql("select test_udf_py(pv,impression) from neiljianliu_ind.alias where entityid=504059 limit 50").collect())

- In sql:

      select test_udf_py(pv,impression) from neiljianliu_ind.alias where entityid=504059 limit 50

### 4 scala functions
Overall steps:
- Create a new Spark Scala script file in the desired directory in 【Scriptis >> Workspace】
- Create the UDF in 【Management Console >> UDF Functions】 (loaded by default)
- Use it in task code (takes effect only for newly started engines)

**Step1 Create a new scala script in dss-scriptis**

    def hellozdy(str: String): String = "hellozdy,haha " + str

**Step2 Create the function**
- Function name: must strictly match the defined function name, such as hellozdy
- Function type: Custom function
- Script path: ../../../d
- Usage format: the input type and return type must be consistent with the definition
- Classification: select an existing first-level directory under dss-scriptis > Method Functions > Personal Functions from the drop-down, or enter a custom directory (which creates a new first-level directory under Personal Functions)

**Step3 Use the function**
Use the function created in the steps above in a task; the function name is the 【Create UDF】 function name.

    val a = hellozdy("abcd");
    print(a)

### 5 Common usage issues
#### 5.1 UDF function failed to load
"FAILED: SemanticException [Error 10011]: Invalid function xxxx"
![](/Images/udf/udf_10.png)

- First, check whether the UDF function is configured correctly:

![](/Images/udf/udf_11.png)

- The registration format is the function path name:

![](/Images/udf/udf_12.png)

- Check 【Scriptis >> UDF Functions】 to see whether the function is checked for loading; if it is not checked, the UDF will not be loaded when an engine starts:

![](/Images/udf/udf_13.png)

- Check whether the engine has loaded the UDF; if not, start another engine or restart the current one.
Note: a UDF is only loaded when an engine initializes; if a UDF is added midway, the current engine cannot detect or load it.