- example of a document
{ "_id" : ObjectId("5302f29ad7945f76575d2c75"), "a" : 2, "b" : 1 }
- CRUD
db.test.aggregate([{$project:{a:1,b:1,isTrue:{$gt:['$a','$b']}}},{$match:{isTrue:true}}])
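To make it easier to see what the pipeline above computes, here is a minimal sketch in plain JavaScript that runs without MongoDB. It only mimics the two stages. The helper names project and match, the shortened _id values, and the second and third sample documents are made up for illustration.

```javascript
// Hypothetical sample data: only the a/b values of the first document
// come from the post; _id values are shortened placeholders.
const docs = [
  { _id: 1, a: 2, b: 1 },
  { _id: 2, a: 1, b: 3 },
  { _id: 3, a: 5, b: 5 },
];

// Mimics {$project: {a: 1, b: 1, isTrue: {$gt: ['$a', '$b']}}}:
// keep _id, a and b, and add a computed boolean field.
function project(doc) {
  return { _id: doc._id, a: doc.a, b: doc.b, isTrue: doc.a > doc.b };
}

// Mimics {$match: {isTrue: true}}: keep only documents where a > b.
function match(doc) {
  return doc.isTrue === true;
}

const result = docs.map(project).filter(match);
console.log(result);
```

Only the document with a greater than b survives, which is the point of projecting the comparison into a field first and then matching on that field.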
Wednesday, February 26, 2014
Tuesday, February 25, 2014
Cloudera 4 - Hive (permanent functions)
- download tarball of hive source
- http://archive.cloudera.com/cdh4/cdh/4/hive-0.10.0-cdh4.5.0.tar.gz (2014. 02. 14)
- tar xfvz hive-0.10.0-cdh4.5.0.tar.gz
- cd hive-0.10.0-cdh4.5.0/src/ql/src/java/org/apache/hadoop/hive/ql/exec/
- vi FunctionRegistry.java
- find the line 'registerGenericUDF("split", GenericUDFSplit.class);'
- insert 'registerGenericUDF("yourFunctionName", yourClassName.class);' below that line
- add 'import org.apache.hadoop.hive.ql.udf.generic.yourClassName;' to the imports
- cd ../udf/generic/
- copy yourClassName.java here
- change the package declaration in yourClassName.java to 'package org.apache.hadoop.hive.ql.udf.generic;'
- cd back to hive-0.10.0-cdh4.5.0/src
- ant
- cd build/ql (that is, hive-0.10.0-cdh4.5.0/src/build/ql)
- copy hive-exec-0.10.0-cdh4.5.0.jar to /opt/cloudera/parcels/CDH/lib/hive/lib
- restart Hive
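The two manual edits to FunctionRegistry.java in the steps above could also be scripted. The following is a rough sketch only, assuming the file contains the split registration line exactly as quoted and at least one import line. patchRegistry and the tiny sample source are hypothetical stand-ins, and yourFunctionName / yourClassName are the placeholders from the steps.

```javascript
// Hypothetical helper: returns the FunctionRegistry.java text with the
// new registration line and the matching import added.
function patchRegistry(source, functionName, className) {
  const anchor = 'registerGenericUDF("split", GenericUDFSplit.class);';
  const lines = source.split('\n');
  const anchorIdx = lines.findIndex((l) => l.includes(anchor));
  if (anchorIdx === -1) throw new Error('split registration line not found');

  // Reuse the anchor line's indentation for the new registration.
  const indent = lines[anchorIdx].match(/^\s*/)[0];
  lines.splice(anchorIdx + 1, 0,
    `${indent}registerGenericUDF("${functionName}", ${className}.class);`);

  // Add the import right after the last existing import line.
  let lastImport = -1;
  lines.forEach((l, i) => { if (l.startsWith('import ')) lastImport = i; });
  if (lastImport === -1) throw new Error('no import lines found');
  lines.splice(lastImport + 1, 0,
    `import org.apache.hadoop.hive.ql.udf.generic.${className};`);

  return lines.join('\n');
}

// Tiny stand-in for the real file, just to show the effect.
const sample = [
  'package org.apache.hadoop.hive.ql.exec;',
  'import org.apache.hadoop.hive.ql.udf.generic.GenericUDFSplit;',
  'public final class FunctionRegistry {',
  '  static {',
  '    registerGenericUDF("split", GenericUDFSplit.class);',
  '  }',
  '}',
].join('\n');

const patched = patchRegistry(sample, 'yourFunctionName', 'yourClassName');
console.log(patched);
```

To use it on the real file you would read FunctionRegistry.java into a string, pass it through the helper, and write the result back before running ant.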
Monday, February 24, 2014
Cloudera Manager 4.8.1 - Configuration (good to know)
1. hdfs
1.1. Default
- dfs.permissions
- Automatically Restart Process
- heap size
1.2. Monitoring
- Enable Health Alerts for this Role
2. mapreduce
- mapred.userlog.retain.hours
- heap size
3. hive
- ZooKeeper Service
- heap size
- Hive Service Configuration Safety Valve for hive-site.xml
<property>
<name>hive.support.concurrency</name>
<value>false</value>
</property>
<property>
<name>hive.exec.compress.intermediate</name>
<value>true</value>
</property>
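If more settings need to go into the safety valve, the property entries above can be generated instead of typed by hand. A small sketch, assuming names and values contain no characters that need XML escaping; toProperties is a hypothetical helper, not part of Cloudera Manager.

```javascript
// Hypothetical helper: renders a settings object as hive-site.xml
// <property> entries, one name/value pair per entry.
function toProperties(settings) {
  return Object.entries(settings)
    .map(([name, value]) =>
      `<property>\n<name>${name}</name>\n<value>${value}</value>\n</property>`)
    .join('\n');
}

// The two settings listed in the post.
const snippet = toProperties({
  'hive.support.concurrency': false,
  'hive.exec.compress.intermediate': true,
});
console.log(snippet);
```

The printed text can be pasted into the "Hive Service Configuration Safety Valve for hive-site.xml" field.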
4. Cloudera Management Services
- Navigator Server Data Expiration Period
- Host Monitor Data Expiration Period
- Service Monitor Data Expiration Period
- Purge Activities Data at This Age
- Purge Attempts Data at This Age
- Purge MapReduce Service Data at This Age