Dowemo
0 0 0 0

About hive dynamic partition

In relational databases, such as oracle, the database will automatically insert the data into the corresponding partition according to the value of the partition field.

Scene description

Existing test, partition table test_partitioned, need to be I & erted into a partitioned table by month, day

test

day url
2015-05-10Url1
2015-05-10Url2
2015-06-14Url1
2015-06-14Url2
2015-06-15Url1
2015-06-15Url2

Target table: test_partitioned

CREATETABLE t_lxw1234_partitioned (
 url STRING
) PARTITIONED BY (month STRING,day STRING) 
stored AS textfile;

Requirements

Data in test follows the month ( month ), date ( day ), and inserted into the appropriate partition of the target table test partitioned.
This requirement isn't easy to implement if you follow the insert data in the previous section described earlier.
It's time to use dynamic partitions.

Set parameters

Use dynamic partitioning to be aware of setting the following paramete:

  • Hive. Exec. Dynamic. Partition.

    Default: false
    Do you want to turn on dynamic partition features, default false.
    When using dynamic partitions, this parameter must be set to true;

  • Hive. Exec. Dynamic. Partition. Mode.

    Default: strict,
    Dynamic partition mode, default strict, indicating that at least one partition must be specified as static partition, nonstrict mode indicates that all partition fields can be used for dynamic partitioning.
    General needs to be set to nonstrict

  • Hive. Exec. Max. Dynamic. Partitions. Pernode.

    Default: 100, the maximum number of dynamic partitions can be created on each node on which the mr is executed.
    This parameter needs to be set according to actual data.
    For example, the source data contains a year of data, that's, the day field has 365 values, then the parameter needs to be set to greater than 365, if the default value 100.

  • Hive. Exec. Max. Dynamic. Partitions.

    Default: 1000,
    In all the nodes that execute mr, the maximum total number of dynamic partitions can be created.
    argument.

  • Hive. Exec. Max. Create. Files.

    Default: 100000,
    In the entire mr job, the maximum number of hdfs files can be created.
    General default is sufficient, unless your amount of data is large, the number of files you need to create is greater than 100000, and can be adjusted according to the actual situation.

  • Hive. Error. On. Empty. Partition.

    Default: false,
    Whether an exception is thrown when an empty partition is generated.
    No need to set up.

Execute code

SET hive.exec.dynamic.partition=true;SET hive.exec.dynamic.partition.mode=nonstrict;SET hive.exec.max.dynamic.partitions.pernode = 1000;SET hive.exec.max.dynamic.partitions=1000;INSERT overwrite TABLE t_lxw1234_partitioned PARTITION (month,day) SELECT url,substr(day,1,7) ASmonth,dayFROM t_lxw1234;

Co ideratio

Note: the partition field name is specified in partition ( month, day ).
In the last two fields of the select clause, you must correspond to the partition fields specified in the previous partition ( month, day ), including order.




Copyright © 2011 Dowemo All rights reserved.    Creative Commons   AboutUs