Reynold Xin edited this page May 7, 2013 · 8 revisions

We document the patches we made to Hive below. We try to avoid patching Hive as much as possible, but the patches documented below are necessary, either for performance or for correctness.

  1. Fixed TextConverter inefficiency.

    This greatly improves performance on UDFs involving strings. It has been incorporated into Hive trunk and is part of the Hive 0.9.0 release. See also: https://issues.apache.org/jira/browse/HIVE-2891

    https://github.com/amplab/hive/commit/bc144113f14d448bd035b8fc8b6282022700dd13

  2. Fixed concurrency issue with LazyBinaryUtils.

    Hive uses a static variable in a non-thread-safe way that can cause concurrency problems. This only affects the LazyBinarySerDe, and only when it is deserializing a row that contains a string while more than one task is running on the same node. Hive itself does not have this problem because it runs each task in a separate JVM, so the static variable is never shared between tasks.

    Hive JIRA tracking: https://issues.apache.org/jira/browse/HIVE-3772

    For Hive 0.7:

    https://github.com/amplab/hive/commit/17e1c3dd2f6d8eca767115dc46d5a880aed8c765 and https://github.com/amplab/hive/commit/37eb3e4edaf99be2e7d66448d2582b16a15033a6

    For Hive 0.9:

    https://github.com/amplab/hive/commit/17e1c3dd2f6d8eca767115dc46d5a880aed8c765
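
The kind of problem described in item 2 can be illustrated with a minimal Java sketch. This is not the Hive code and not the patch itself: the class `ScratchBufferSketch`, its methods, and the buffer size are hypothetical names chosen for illustration, and the use of `ThreadLocal` is just one common way to avoid sharing mutable static state, assumed here rather than taken from the commits above. The sketch decodes strings through a scratch buffer that is either shared by all threads in the JVM or kept per thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names come from Hive.
class ScratchBufferSketch {

    // Unsafe pattern: one mutable scratch buffer shared by every thread in the JVM.
    private static final byte[] SHARED_SCRATCH = new byte[64];

    // Safer pattern (an assumption, not necessarily what the patch does):
    // each thread lazily gets its own scratch buffer.
    private static final ThreadLocal<byte[]> LOCAL_SCRATCH =
        ThreadLocal.withInitial(() -> new byte[64]);

    // Decodes through the shared buffer; racy when called from several threads.
    static String decodeUnsafe(byte[] input) {
        return roundTrip(input, SHARED_SCRATCH);
    }

    // Decodes through the calling thread's private buffer.
    static String decodeSafe(byte[] input) {
        return roundTrip(input, LOCAL_SCRATCH.get());
    }

    // Copies the input into the scratch buffer, then builds a String from it.
    // Inputs must fit in the 64-byte scratch buffer.
    private static String roundTrip(byte[] input, byte[] scratch) {
        System.arraycopy(input, 0, scratch, 0, input.length);
        Thread.yield(); // widen the window in which another thread can overwrite the buffer
        return new String(scratch, 0, input.length, StandardCharsets.UTF_8);
    }

    // Runs `tasks` threads in one JVM, each decoding its own row repeatedly,
    // and returns how many decoded rows did not match the row that was encoded.
    static int countCorruptedRows(boolean safe, int tasks, int rowsPerTask) {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final String row = "row-from-task-" + t;
                final byte[] bytes = row.getBytes(StandardCharsets.UTF_8);
                Callable<Integer> task = () -> {
                    int corrupted = 0;
                    for (int i = 0; i < rowsPerTask; i++) {
                        String decoded = safe ? decodeSafe(bytes) : decodeUnsafe(bytes);
                        if (!decoded.equals(row)) {
                            corrupted++;
                        }
                    }
                    return corrupted;
                };
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `ScratchBufferSketch.countCorruptedRows(false, 4, 100000)` may report a nonzero count on a multi-core machine, although a race is never guaranteed to show up in any given run. With `true` as the first argument the count is always 0, because no buffer is shared between threads. Running each task in its own JVM, as Hive does, avoids the problem for the same reason.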