Wednesday, February 26, 2014

Mongo 2.4.9 - comparison of two fields

- example of a document

{ "_id" : ObjectId("5302f29ad7945f76575d2c75"), "a" : 2, "b" : 1 }

- query (returns only the documents where a > b)

db.test.aggregate({$project:{a:1,b:1,isTrue:{$gt:['$a','$b']}}},{$match:{isTrue:true}})
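
To run the same comparison from a script instead of the interactive shell, something like the following may work. It is only a sketch: it assumes a `mongo` client on the PATH and a database named `test`, and the function name `compare_fields` is made up for illustration.

```shell
# Sketch: run the two-field comparison non-interactively.
# Assumptions: `mongo` is on PATH; the collection is called "test".
compare_fields() {
    cf_db="${1:-test}"   # database name, defaults to "test" (assumed)
    # Single quotes stop the shell from expanding $a, $b, $project, $match.
    # printjson is needed because --eval does not print return values itself.
    mongo --quiet "$cf_db" --eval '
        printjson(db.test.aggregate(
            {$project: {a: 1, b: 1, isTrue: {$gt: ["$a", "$b"]}}},
            {$match: {isTrue: true}}
        ))'
}
```

Calling `compare_fields test` should print the aggregation result for the `test` database; with the example document above, that document matches because a (2) is greater than b (1).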

Tuesday, February 25, 2014

Cloudera 4 - Hive (permanent functions)

- download tarball of hive source
- http://archive.cloudera.com/cdh4/cdh/4/hive-0.10.0-cdh4.5.0.tar.gz (2014. 02. 14)
- tar xfvz hive-0.10.0-cdh4.5.0.tar.gz
- cd hive-0.10.0-cdh4.5.0/src/ql/src/java/org/apache/hadoop/hive/ql/exec/
- vi FunctionRegistry.java
- find the line 'registerGenericUDF("split", GenericUDFSplit.class);'
- insert 'registerGenericUDF("yourFunctionName", yourClassName.class);' directly below it
- add 'import org.apache.hadoop.hive.ql.udf.generic.yourClassName;' to the imports
- cd ../udf/generic/
- copy yourClassName.java here
- change the package declaration in yourClassName.java to 'package org.apache.hadoop.hive.ql.udf.generic;'
- cd back to the source root, hive-0.10.0-cdh4.5.0/src
- ant
- cd build/ql (i.e. hive-0.10.0-cdh4.5.0/src/build/ql)
- copy hive-exec-0.10.0-cdh4.5.0.jar to /opt/cloudera/parcels/CDH/lib/hive/lib
- restart Hive
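
The manual edit of FunctionRegistry.java above could also be scripted. The following is a rough sketch, not a tested part of the procedure: the function name `register_udf` and its arguments are hypothetical, and it assumes the file contains at least one `import` line and the "split" registration line exactly as quoted above.

```shell
# Sketch: patch FunctionRegistry.java with an extra import and registration.
# Usage (hypothetical): register_udf <file> <functionName> <ClassName>
register_udf() {
    ru_file="$1"
    awk -v udf_name="$2" -v udf_class="$3" '
        BEGIN { target = "registerGenericUDF(\"split\", GenericUDFSplit.class);" }
        { print }
        # Add the import once, right after the first existing import line.
        /^import / && !imported {
            print "import org.apache.hadoop.hive.ql.udf.generic." udf_class ";"
            imported = 1
        }
        # Add the registration below the "split" line, reusing its indentation.
        index($0, target) > 0 {
            match($0, /^[ \t]*/)
            indent = substr($0, 1, RLENGTH)
            print indent "registerGenericUDF(\"" udf_name "\", " udf_class ".class);"
        }
    ' "$ru_file" > "$ru_file.tmp" && mv "$ru_file.tmp" "$ru_file"
}
```

For example, `register_udf hive-0.10.0-cdh4.5.0/src/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java yourFunctionName yourClassName` would make both edits in one pass. Copying the class file, fixing its package, running ant and copying the jar still follow the steps listed above.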

Monday, February 24, 2014

Cloudera Manager 4.8.1 - Configuration (good to know)

1. hdfs

1.1. Default

- dfs.permissions
- Automatically Restart Process
- heap size

1.2. Monitoring

- Enable Health Alerts for this Role

2. mapreduce

- mapred.userlog.retain.hours
- heap size

3. hive

- ZooKeeper Service
- heap size
- Hive Service Configuration Safety Valve for hive-site.xml

<property>
    <name>hive.support.concurrency</name>
    <value>false</value>
</property>

<property>
    <name>hive.exec.compress.intermediate</name>
    <value>true</value>
</property>

4. Cloudera Management Services

- Navigator Server Data Expiration Period
- Host Monitor Data Expiration Period
- Service Monitor Data Expiration Period
- Purge Activities Data at This Age
- Purge Attempts Data at This Age
- Purge MapReduce Service Data at This Age