- bin/flume-ng: line X: syntax error in conditional expression (flume)
- add " symbol. for instance,
- if [[ $line =~ "^java\.library\.path=(.*)$" ]]; then
- or, update bash.
- check the bash version with 'bash --version'
- add " symbol. for instance,
- warning: unprotected private key file! / permissions 0xxx for 'path/to/id_rsa' are too open (OpenSSH)
- chmod 600 path/to/id_rsa
- ask for password, even authorized_keys file exists (SSH)
- chmod 700 .ssh
- chmod 644 .ssh/authorized_keys
- failed to recv file (R - scp)
- check the file path
- no lines available in input / '입력에 가능한 라인들이 없습니다' in a Korean locale (R - read.table with pipe)
- check the file path
- java.net.MalformedURLException: unknown protocol: hdfs (java)
- add 'URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());'
- argument list too long (curl)
- the command below causes the error, because 'cat base64.txt' expands to very long content
- curl -XPOST "http://localhost:9200/test/person" -d '
{
"file" : {
"_content" : "'`cat base64.txt`'"
}
}'
- use '@-' to solve the problem
- curl -XPOST "http://localhost:9200/test/person" -d @- <<CURL_DATA
{
"file" : {
"_content" : "`cat base64.txt`"
}
}
CURL_DATA
- note that you can use any string instead of CURL_DATA as the heredoc delimiter, and that there are no single quotes inside the value of _content this time
- invalid target release: 1.7 (maven)
- export JAVA_HOME=<java home path>
Monday, August 31, 2015
Help 4 Error - Old
Behemoth
- 2015.08.18
- prerequisites
- java 1.6
- apache maven 2.2.1
- internet connection
- compiling
- git clone https://github.com/DigitalPebble/behemoth.git
- cd behemoth
- mvn install
- mvn test
- mvn package
- generate a corpus
- hadoop jar tika/target/behemoth-tika-*-job.jar com.digitalpebble.behemoth.util.CorpusGenerator -i <file or dir> -o output1
- ./behemoth importer
- extract text
- hadoop jar tika/target/behemoth-tika-*-job.jar com.digitalpebble.behemoth.tika.TikaDriver -i output1 -o output2
- ./behemoth tika
- inspect the corpus
- hadoop jar tika/target/behemoth-tika-*-job.jar com.digitalpebble.behemoth.util.CorpusReader -i output2 -a -c -m -t
- hadoop fs -libjars tika/target/behemoth-tika-*-job.jar -text output2/part-00000
- hadoop fs -libjars tika/target/behemoth-tika-*-job.jar -text output2/part-00001
- ./behemoth reader
- extract content from seq files
- hadoop jar tika/target/behemoth-tika-*-job.jar com.digitalpebble.behemoth.util.ContentExtractor -i output2 -o output3
- hadoop jar tika/target/behemoth-tika-*-job.jar com.digitalpebble.behemoth.util.ContentExtractor -i output2/part-00000 -o output4
- hadoop jar tika/target/behemoth-tika-*-job.jar com.digitalpebble.behemoth.util.ContentExtractor -i output2/part-00001 -o output5
- ./behemoth exporter
ElasticSearch
- 2015.08.14
- mapper attachments type for elasticsearch
- each node
- bin/plugin install elasticsearch/elasticsearch-mapper-attachments/2.4.3
- note that 2.4.3 is for ES 1.4
- restart
- DELETE /test
- PUT /test
- PUT /test/person/_mapping
{
"person" : {
"properties" : {
"file" : {
"type" : "attachment",
"fields" : {
"file" : {"term_vector" : "with_positions_offsets", "store": true},
"title" : {"store" : "yes"},
"date" : {"store" : "yes"},
"author" : {"store" : "yes"},
"keywords" : {"store" : "yes"},
"content_type" : {"store" : "yes"},
"content_length" : {"store" : "yes"},
"language" : {"store" : "yes"}
}
}
}
}
}
- curl -XPOST "http://localhost:9200/test/person" -d '
{
"file" : {
"_content" : "... base64 encoded attachment ..."
}
}'
- for long base64
- curl -XPOST "http://localhost:9200/test/person" -d @- <<CURL_DATA
{
"file" : {
"_content" : "`base64 my.pdf | perl -pe 's/\n/\\n/g'`"
}
}
CURL_DATA
- GET /test/person/_search
{
"fields": [ "file.date", "file.title", "file.name", "file.author", "file.keywords", "file.language", "file.cotent_length", "file.content_type", "file" ],
"query": {
"match": {
"file.content_type": "pdf"
}
}
}
- 2015.03.03
- bashrc
- export INNERIP=`hostname -i`
export ES_HEAP_SIZE=8g
export ES_CLASSPATH=/etc/hadoop/conf:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-yarn/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*
- configuration
- cluster.name: test
- node.name: ${HOSTNAME}
- transport.host: ${INNERIP}
- discovery.zen.ping.multicast.enabled: false
- discovery.zen.ping.unicast.hosts: ["10.0.2.a", "10.0.2.b", "10.0.2.c"]
- indices.fielddata.cache.size: 40%
- 2015.03.02
- snapshot and restore
- repository register
- PUT _snapshot/hdfs
{
"type": "hdfs",
"settings": {
"path": "/backup/elasticsearch"
}
}
- repository verification
- POST _snapshot/hdfs/_verify
- snapshot
- PUT _snapshot/hdfs/20150302
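- the same call as a curl one-liner (a sketch assuming a node on localhost:9200; wait_for_completion=true makes the request block until the snapshot finishes)
- curl -XPUT 'localhost:9200/_snapshot/hdfs/20150302?wait_for_completion=true'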
- monitoring snapshot/restore progress
- GET _snapshot/hdfs/20150302/_status
- GET _snapshot/hdfs/20150302
- snapshot information and status
- GET _snapshot/hdfs/20150302
- GET _snapshot/hdfs/_all
- GET _snapshot/_status
- GET _snapshot/hdfs/_status
- GET _snapshot/hdfs/20150302/_status
- restore
- POST _snapshot/hdfs/20150302/_restore
- snapshot deletion / stopping currently running snapshot and restore operations
- DELETE _snapshot/hdfs/20150302
- repository deletion
- DELETE _snapshot/hdfs
- reference
- rolling update
- Disable shard reallocation
- curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.enable" : "none" } }'
- Shut down a single node within the cluster
- curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
- Confirm that all shards are correctly reallocated to the remaining running nodes
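- one way to check this (a sketch, not part of the original notes) is the cat shards API
- curl 'localhost:9200/_cat/shards'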
- Download newest version
- Extract the zip or tarball to a new directory
- Copy the configuration files from the old Elasticsearch installation’s config directory to the new Elasticsearch installation’s config directory
- Move data files from the old Elasticsearch installation’s data directory to the new installation’s data directory
- Install plugins
- Start the now upgraded node
- Confirm that it joins the cluster
- Re-enable shard reallocation
- curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" } }'
- Observe that all shards are properly allocated on all nodes
- Repeat this process for all remaining nodes
- Reference
- 2015.02.13
- MySQL Slow Query Log Mapping
PUT msql-2015
{
  "mappings": {
    "log": {
      "properties": {
        "@timestamp": { "type": "date", "format": "dateOptionalTime" },
        "@version": { "type": "string" },
        "host": { "type": "string", "index": "not_analyzed" },
        "ip": { "type": "string", "index": "not_analyzed" },
        "lock_time": { "type": "double" },
        "message": { "type": "string", "index": "not_analyzed" },
        "query": { "type": "string" },
        "query_time": { "type": "double" },
        "rows_examined": { "type": "double" },
        "rows_sent": { "type": "double" },
        "type": { "type": "string" },
        "user": { "type": "string" }
      }
    }
  }
}
- MySQL Slow Query Dump Mapping
PUT msqld-2015
{
  "mappings": {
    "dump": {
      "properties": {
        "@timestamp": { "type": "date", "format": "dateOptionalTime" },
        "@version": { "type": "string" },
        "count": { "type": "double" },
        "host": { "type": "string", "index": "not_analyzed" },
        "ip": { "type": "string", "index": "not_analyzed" },
        "lock": { "type": "double" },
        "message": { "type": "string", "index": "not_analyzed" },
        "query": { "type": "string" },
        "rows": { "type": "double" },
        "time": { "type": "double" },
        "type": { "type": "string" },
        "user": { "type": "string" }
      }
    }
  }
}
- 2015.02.12
- MySQL Slow Query Log & Dump Mappings
PUT msqld-2015
{
  "mappings": {
    "log": {
      "properties": {
        "@timestamp": { "type": "date", "format": "dateOptionalTime" },
        "@version": { "type": "string" },
        "host": { "type": "string", "index": "not_analyzed" },
        "ip": { "type": "string", "index": "not_analyzed" },
        "lock_time": { "type": "double" },
        "message": { "type": "string", "index": "not_analyzed" },
        "query": { "type": "string" },
        "query_time": { "type": "double" },
        "rows_examined": { "type": "double" },
        "rows_sent": { "type": "double" },
        "type": { "type": "string" },
        "user": { "type": "string" }
      }
    },
    "dump": {
      "properties": {
        "@timestamp": { "type": "date", "format": "dateOptionalTime" },
        "@version": { "type": "string" },
        "count": { "type": "double" },
        "host": { "type": "string", "index": "not_analyzed" },
        "ip": { "type": "string", "index": "not_analyzed" },
        "lock": { "type": "double" },
        "message": { "type": "string", "index": "not_analyzed" },
        "query": { "type": "string" },
        "rows": { "type": "double" },
        "time": { "type": "double" },
        "type": { "type": "string" },
        "user": { "type": "string" }
      }
    }
  }
}
- 2015.01.19
- restart script
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
sleep 1s
curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
sleep 1s
bin/elasticsearch -d
sleep 10s
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
- ~ 2015.01.01
- Command
- curl 'http://localhost:9200/?pretty'
- curl -XPOST 'http://localhost:9200/_shutdown'
- curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
- curl -XPOST 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/_shutdown'
- curl -XPOST 'http://localhost:9200/_cluster/nodes/_master/_shutdown'
- Configuration
- config/elasticsearch.yml
- cluster.name
- node.name
- node.master
- node.data
- path.*
- path.conf: -Des.path.conf
- path.data
- path.work
- path.logs
- discovery.zen.ping.multicast.enabled: false
- discovery.zen.ping.unicast.hosts
- gateway.recover_after_nodes: n
- discovery.zen.minimum_master_nodes: (n/2) + 1
- action.disable_delete_all_indices: true
- action.auto_create_index: false
- action.destructive_requires_name: true
- index.mapper.dynamic: false
- script.disable_dynamic: true
- indices.fielddata.cache.size: 40%
- dynamic
- discovery.zen.minimum_master_nodes
curl -XPUT localhost:9200/_cluster/settings -d '{ "persistent" : { "discovery.zen.minimum_master_nodes" : (n/2) + 1 } }'
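- for example, a 3-node cluster would use (3/2) + 1 = 2
curl -XPUT localhost:9200/_cluster/settings -d '{ "persistent" : { "discovery.zen.minimum_master_nodes" : 2 } }'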
- disable _all
PUT /my_index/_mapping/my_type { "my_type": { "_all": { "enabled": false } } }
- include_in_all
PUT /my_index/my_type/_mapping { "my_type": { "include_in_all": false, "properties": { "title": { "type": "string", "include_in_all": true }, ... } } }
- _alias, _aliases
PUT /my_index_v1
PUT /my_index_v1/_alias/my_index
POST /_aliases { "actions": [ { "remove": { "index": "my_index_v1", "alias": "my_index" }}, { "add": { "index": "my_index_v2", "alias": "my_index" }} ] }
- refresh_interval (bulk indexing)
PUT /my_logs { "settings": { "refresh_interval": "30s" } }
POST /my_logs/_settings { "refresh_interval": -1 }
POST /my_logs/_settings { "refresh_interval": "1s" }
- flush
POST /blogs/_flush
POST /_flush?wait_for_ongoing
- optimize
POST /logstash-old-index/_optimize?max_num_segments=1
- field length norm (for logging)
PUT /my_index { "mappings": { "doc": { "properties": { "text": { "type": "string", "norms": { "enabled": false } } } } } }
- tune cluster and index recovery settings (test the value)
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.node_initial_primary_recoveries":25}}'
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.node_concurrent_recoveries":5}}' ?
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.max_bytes_per_sec":"100mb"}}'
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":20}}'
- logging.yml
- use node.name instead of cluster.name
file: ${path.logs}/${node.name}.log
- elasticsearch.in.sh
- disable HeapDumpOnOutOfMemoryError
#JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError"
- ES_HEAP_SIZE: 50% (< 32g)
- export ES_HEAP_SIZE=31g
- no swap
- bootstrap.mlockall = true
- ulimit -l unlimited
- thread pools
- thread pool size
- search - 3 * # of processors (3 * 64 = 192)
- index - 2 * # of processors (2 * 64 = 128)
- bulk - 3 * # of processors (3 * 64 = 192)
- queues - set the size to -1 to prevent rejections from ES
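- a sketch of applying these sizes dynamically on a 64-processor node (assumes ES 1.x, where the threadpool.* sizes are updatable through the cluster settings API)
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"threadpool.search.size":192,"threadpool.search.queue_size":-1,"threadpool.index.size":128,"threadpool.bulk.size":192,"threadpool.bulk.queue_size":-1}}'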
- buffers
- increased indexing buffer size to 40%
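- a sketch of the corresponding elasticsearch.yml line, using the 40% figure above
echo 'indices.memory.index_buffer_size: 40%' >> config/elasticsearch.yml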
- dynamic node.name
- ES script
export ES_NODENAME=`hostname -s`
- elasticsearch.yml
node.name: "${ES_NODENAME}"
- Hardware
- CPU
- core
- disk
- SSD
- noop / deadline scheduler
- better IOPS
- cheaper with respect to IOPS
- manufacturing tolerance can vary
- RAID
- do not necessarily need
- ES handles redundancy
- Monitoring
- curl 'localhost:9200/_cluster/health'
- curl 'localhost:9200/_nodes/process'
- max_file_descriptors: 30000?
- curl 'localhost:9200/_nodes/jvm'
- version
- mem.heap_max
- curl 'localhost:9200/_nodes/jvm/stats'
- heap_used
- curl 'localhost:9200/_nodes/indices/stats'
- fielddata
- curl 'localhost:9200/_nodes/indices/stats?fields=created_on'
- fields
- curl 'localhost:9200/_nodes/http/stats'
- http
- GET /_stats/fielddata?fields=*
- GET /_nodes/stats/indices/fielddata?fields=*
- GET /_nodes/stats/indices/fielddata?level=indices&fields=*
- Scenario
- adding nodes
- disable allocation to stop shard shuffling until ready
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
- increase speed of transfers
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":6,"indices.recovery.max_bytes_per_sec":"50mb"}}'
- start new nodes
- enable allocation
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
- removing nodes
- exclude the nodes from the cluster, this will tell ES to move things off
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude._name":"node-05*,node-06*"}}'
- increase speed of transfers
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":6,"indices.recovery.max_bytes_per_sec":"50mb"}}'
- shutdown old nodes after all shards move off
curl -XPOST 'localhost:9200/_cluster/nodes/node-05*,node-06*/_shutdown'
- upgrades / node restarts
- disable auto balancing if doing rolling restarts
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
- restart
- enable auto balancing
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
- re-indexing / bulk indexing
- set replicas to 0
- increase after completion
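- a sketch of the two settings calls (my_index is a placeholder; number_of_replicas is a dynamic index setting)
curl -XPUT localhost:9200/my_index/_settings -d '{"index":{"number_of_replicas":0}}'
curl -XPUT localhost:9200/my_index/_settings -d '{"index":{"number_of_replicas":1}}'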
- configure heap size
- heap size setting
- export ES_HEAP_SIZE=9g
- curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
- curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
- bin/elasticsearch -d
- curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
Zeppelin installation (HDP)
- echo $JAVA_HOME
- git clone https://github.com/apache/incubator-zeppelin.git
- cd incubator-zeppelin
- mvn clean install -DskipTests -Pspark-1.3 -Dspark.version=1.3.1 -Phadoop-2.6 -Pyarn
- hdp-select status hadoop-client | sed 's/hadoop-client - \(.*\)/\1/'
- 2.3.0.0-2557
- vim conf/zeppelin-env.sh
- export HADOOP_CONF_DIR=/etc/hadoop/conf
- export ZEPPELIN_PORT=10008
- export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.0.0-2557"
- cp /etc/hive/conf/hive-site.xml conf/
- su hdfs -l -c 'hdfs dfs -mkdir /user/zeppelin;hdfs dfs -chown zeppelin:hdfs /user/zeppelin'
- bin/zeppelin-daemon.sh start
- http://$host:10008
Maven
- 2015.08.12
- install maven using yum
- wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
- yum -y install apache-maven
- installation
- echo $JAVA_HOME
- cd /opt
- wget http://apache.tt.co.kr/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.zip
- wget http://www.apache.org/dist/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.zip.asc
- wget http://www.apache.org/dist/maven/KEYS
- gpg --import KEYS
- gpg --verify apache-maven-3.3.3-bin.zip.asc apache-maven-3.3.3-bin.zip
- unzip apache-maven-3.3.3-bin.zip
- export PATH=/opt/apache-maven-3.3.3/bin:$PATH
- mvn -v
Help 4 HDP - Old
- caused by: unrecognized locktype: native (solr)
- vim /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml
- search lockType
- set it to hdfs
- /opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig -confname myCollConfigs -confdir /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf
- caused by: direct buffer memory (solr)
- vim /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml
- search solr.hdfs.blockcache.direct.memory.allocation
- set it to false
- /opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig -confname myCollConfigs -confdir /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf
- restart solr
- or try 'caused by: java heap space (solr)' directly
- caused by: java heap space (solr)
- vim /opt/lucidworks-hdpsearch/solr/bin/solr.in.sh
- search solr_heap
- increase it
- restart solr
- error: not found: value StructType (spark)
- import org.apache.spark.sql.types._
- note that you need to import 'import org.apache.spark.sql.types._' even if 'import org.apache.spark.sql._' is already imported
- no response from namenode UI / 50070 is bound to a private IP (hadoop)
- ambari web -> HDFS -> configs -> custom core-site -> add property
- key: dfs.namenode.http-bind-host
- value: 0.0.0.0
- save it and restart related services
- note that there are 'dfs.namenode.rpc-bind-host', 'dfs.namenode.servicerpc-bind-host' and 'dfs.namenode.https-bind-host' properties which can solve similar issue
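- a quick check after the restart (not from the original notes): confirm the UI port is now bound to 0.0.0.0
- netstat -tlnp | grep 50070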
- root is not allowed to impersonate <username> (hadoop)
- ambari web -> HDFS -> configs -> custom core-site -> add property
- key: hadoop.proxyuser.root.groups
- value: *
- key: hadoop.proxyuser.root.hosts
- value: *
- save it and restart related services
- note that you should change root to the user name who runs/submits the service/job
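- a quick way to confirm the effective value after the restart (a sketch using hdfs getconf)
- hdfs getconf -confKey hadoop.proxyuser.root.hosts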
- option sql_select_limit=default (ambari)
- use latest jdbc driver
- cd /usr/share
- mkdir java
- cd java
- wget http://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-5.1.36.zip
- unzip mysql-connector-java-5.1.36.zip
- cp mysql-connector-java-5.1.36/mysql-connector-java-5.1.36-bin.jar .
- ln -s mysql-connector-java-5.1.36-bin.jar mysql-connector-java.jar
HDP Search installation on HDP 2.3
- prerequisites
- CentOS v6.x / Red Hat Enterprise Linux (RHEL) v6.x / Oracle Linux v6.x
- JDK 1.7 or higher
- Hortonworks Data Platform (HDP) v2.3
- installation
- note that solr should be installed on each node that runs HDFS
- each node
- export JAVA_HOME=/usr/jdk64/jdk1.8.0_40/
- ls /etc/yum.repos.d/HDP-UTILS.repo
- yum -y install lucidworks-hdpsearch
Help 4 CDH
- failed to start name node (hadoop)
- check the permissions of the commands used by name node processes, such as /bin/df (see the example below)
- modify permission(s) properly, if necessary
- delete the name node dir
- retry
- it is recommended to check and fix these command permissions before installing HDFS; otherwise HDFS may not be installed correctly
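- for example (a sketch; adjust to whatever command the name node log complains about)
- ls -l /bin/df
- chmod 755 /bin/df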
- line length exceeds max (flume)
- increase deserializer.maxlinelength
- agent01.sources.source01.deserializer.maxLineLength
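- for instance (204800 is just an illustrative value)
- agent01.sources.source01.deserializer.maxLineLength = 204800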
- the channel is full (flume)
- increase memory capacity
- agent01.channels.channel01.capacity
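- for instance (100000 is just an illustrative value)
- agent01.channels.channel01.capacity = 100000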
- fail to extract date information by using specified date format (flume)
- check LANG configuration
- LANG="en_US.UTF-8"
- failed parsing date from field (logstash)
- set locale to en
date { locale => en match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ] }
- could not load ffi provider (logstash)
- configure it to use a directory other than /tmp, or
- mount -o remount,exec /tmp
Kerberos
- 2015.08.03
- Introduction
- Kerberos is a computer network authentication protocol which works on the basis of 'tickets' to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner
- Install a new MIT KDC
- yum -y install krb5-server krb5-libs krb5-workstation
- vi /etc/krb5.conf
- Change the [realms] section of this file by replacing the default “kerberos.example.com” setting for the kdc and admin_server properties with the Fully Qualified Domain Name of the KDC server host
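- a sketch of the resulting [realms] section (kdc.example.com stands in for the real KDC FQDN)
[realms]
  EXAMPLE.COM = {
    kdc = kdc.example.com
    admin_server = kdc.example.com
  }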
- kdb5_util create -s
- kadmin.local -q "addprinc admin/admin"
- /etc/rc.d/init.d/krb5kdc start
- /etc/rc.d/init.d/kadmin start
- chkconfig krb5kdc on
- chkconfig kadmin on
Lucidworks - Connectors
- 2015.08.04
- hive serde
- introduction
- The Lucidworks Hive SerDe allows reading and writing data to and from Solr using Apache Hive
- example
- hive
- CREATE TABLE books (id STRING, cat STRING, title STRING, price FLOAT, in_stock BOOLEAN, author STRING, series STRING, seq INT, genre STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
- LOAD DATA LOCAL INPATH '/opt/lucidworks-hdpsearch/solr/example/exampledocs/books.csv' OVERWRITE INTO TABLE books;
- ADD JAR /opt/lucidworks-hdpsearch/hive/lucidworks-hive-serde-2.0.3.jar;
- CREATE EXTERNAL TABLE solr (id STRING, cat_s STRING, title_s STRING, price_f FLOAT, in_stock_b BOOLEAN, author_s STRING, series_s STRING, seq_i INT, genre_s STRING) STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler' LOCATION '/tmp/solr' TBLPROPERTIES('solr.server.url' = 'http://10.0.2.104:8983/solr', 'solr.collection' = 'myCollection');
- INSERT OVERWRITE TABLE solr SELECT b.* FROM books b;
- solr UI -> core selector -> myCollection_shard1_replica1 -> query -> execute query