Flink

Hive 函数

Tue Aug 25, 2020

Hive Functions

Hive 函数

通过 HiveModule 使用 Hive 内置功能 #

HiveModule 将 Hive 内置函数作为 Flink 系统（内置）函数提供给 Flink SQL 和 Table API 用户。

具体信息请参考 HiveModule。

Scala

val name            = "myhive"
val version         = "2.3.4"

tableEnv.loadModue(name, new HiveModule(version));

YAML

modules:
   - name: core
     type: core
   - name: myhive
     type: hive

注意，旧版本中的一些 Hive 内置功能存在线程安全问题。我们建议用户给自己的 Hive 打上补丁来修复它们。

Hive 用户定义的函数 #

用户可以在 Flink 中使用他们现有的 Hive 用户定义函数。

支持的 UDF 类型包括:

UDF
GenericUDF
GenericUDTF
UDAF
GenericUDAFResolver2

在查询规划和执行时，Hive 的 UDF 和 GenericUDF 会自动翻译成 Flink 的 ScalarFunction，Hive 的 GenericUDTF 会自动翻译成 Flink 的 TableFunction，Hive 的 UDAF 和 GenericUDAFResolver2 会翻译成 Flink 的 AggregateFunction。

要使用 Hive 的用户定义函数，用户必须做到:

设置一个由 Hive Metastore 支持的 HiveCatalog 作为会话的当前目录，其中包含该函数。
在 Flink 的 classpath 中加入一个包含该函数的 jar。
使用 Blink 计划器。

使用 Hive 用户定义函数 #

假设我们在 Hive Metastore 中注册了以下 Hive 函数。

/**
 * Test simple udf. Registered under name 'myudf'
 */
public class TestHiveSimpleUDF extends UDF {

	public IntWritable evaluate(IntWritable i) {
		return new IntWritable(i.get());
	}

	public Text evaluate(Text text) {
		return new Text(text.toString());
	}
}

/**
 * Test generic udf. Registered under name 'mygenericudf'
 */
public class TestHiveGenericUDF extends GenericUDF {

	@Override
	public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
		checkArgument(arguments.length == 2);

		checkArgument(arguments[1] instanceof ConstantObjectInspector);
		Object constant = ((ConstantObjectInspector) arguments[1]).getWritableConstantValue();
		checkArgument(constant instanceof IntWritable);
		checkArgument(((IntWritable) constant).get() == 1);

		if (arguments[0] instanceof IntObjectInspector ||
				arguments[0] instanceof StringObjectInspector) {
			return arguments[0];
		} else {
			throw new RuntimeException("Not support argument: " + arguments[0]);
		}
	}

	@Override
	public Object evaluate(DeferredObject[] arguments) throws HiveException {
		return arguments[0].get();
	}

	@Override
	public String getDisplayString(String[] children) {
		return "TestHiveGenericUDF";
	}
}

/**
 * Test split udtf. Registered under name 'mygenericudtf'
 */
public class TestHiveUDTF extends GenericUDTF {

	@Override
	public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
		checkArgument(argOIs.length == 2);

		// TEST for constant arguments
		checkArgument(argOIs[1] instanceof ConstantObjectInspector);
		Object constant = ((ConstantObjectInspector) argOIs[1]).getWritableConstantValue();
		checkArgument(constant instanceof IntWritable);
		checkArgument(((IntWritable) constant).get() == 1);

		return ObjectInspectorFactory.getStandardStructObjectInspector(
			Collections.singletonList("col1"),
			Collections.singletonList(PrimitiveObjectInspectorFactory.javaStringObjectInspector));
	}

	@Override
	public void process(Object[] args) throws HiveException {
		String str = (String) args[0];
		for (String s : str.split(",")) {
			forward(s);
			forward(s);
		}
	}

	@Override
	public void close() {
	}
}

从 Hive CLI 中，我们可以看到他们已经注册了。

hive> show functions;
OK
......
mygenericudf
myudf
myudtf

然后，用户可以在 SQL 中使用它们作为。

Flink SQL> select mygenericudf(myudf(name), 1) as a, mygenericudf(myudf(age), 1) as b, s from mysourcetable, lateral table(myudtf(name, 1)) as T(s);

原文链接: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/hive/hive_functions.html