- Introduction
- Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search engine library
- Elasticsearch is a real-time distributed search and analytics engine
- It allows you to explore your data at a speed and at a scale never before possible
- It is used for full text search, structured search, analytics, and all three in combination
- Why ES
- reason 1
- Unfortunately, most databases are astonishingly inept at extracting actionable knowledge from your data
- Can they perform full-text search, handle synonyms and score documents by relevance?
- Can they generate analytics and aggregations from the same data?
- Most importantly, can they do this in real-time without big batch processing jobs?
- reason 2
- A distributed real-time document store where every field is indexed and searchable
- A distributed search engine with real-time analytics
- Capable of scaling to hundreds of servers and petabytes of structured and unstructured data
- Installation
- curl -L -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.3.2.zip
- unzip elasticsearch-1.3.2.zip
- cd elasticsearch-1.3.2
- ./bin/elasticsearch -d
- Cluster
- 3 lower resource master-eligible nodes in large clusters
- lightweight client nodes (node role settings are sketched below)
- bare metal is more configurable
- bare metal can utilize SSDs
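- A minimal sketch of these node roles in elasticsearch.yml, using the node.master / node.data settings listed under Configuration below (values are illustrative, not prescriptive)
# dedicated master-eligible node
node.master: true
node.data: false
# data node
node.master: false
node.data: true
# lightweight client (coordinating-only) node
node.master: false
node.data: false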
- Command
- curl 'http://localhost:9200/?pretty'
- curl -XPOST 'http://localhost:9200/_shutdown'
- curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
- curl -XPOST 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/_shutdown'
- curl -XPOST 'http://localhost:9200/_cluster/nodes/_master/_shutdown'
- Configuration
- config/elasticsearch.yml (a combined example follows this list)
- cluster.name
- node.name
- node.master
- node.data
- path.*
- path.conf: -Des.path.conf
- path.data
- path.work
- path.logs
- discovery.zen.ping.multicast.enabled: false
- discovery.zen.ping.unicast.hosts
- gateway.recover_after_nodes: n
- discovery.zen.minimum_master_nodes: (n/2) + 1
- action.disable_delete_all_indices: true
- action.auto_create_index: false
- action.destructive_requires_name: true
- index.mapper.dynamic: false
- script.disable_dynamic: true
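- A combined elasticsearch.yml sketch of the static settings above; host names, paths, and the node count behind (n/2) + 1 are placeholders
cluster.name: my_cluster
node.name: "node-01"
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2", "host3"]
discovery.zen.minimum_master_nodes: 2   # (3/2) + 1 with 3 master-eligible nodes
gateway.recover_after_nodes: 2
action.destructive_requires_name: true
action.auto_create_index: false
script.disable_dynamic: true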
- dynamic
- discovery.zen.minimum_master_nodes
curl -XPUT localhost:9200/_cluster/settings -d '{ "persistent" : { "discovery.zen.minimum_master_nodes" : (n/2) + 1 } }'
- disable _all
PUT /my_index/_mapping/my_type { "my_type": { "_all": { "enabled": false } } }
- include_in_all
PUT /my_index/my_type/_mapping { "my_type": { "include_in_all": false, "properties": { "title": { "type": "string", "include_in_all": true }, ... } } }
- _alias, _aliases
PUT /my_index_v1 PUT /my_index_v1/_alias/my_index
POST /_aliases { "actions": [ { "remove": { "index": "my_index_v1", "alias": "my_index" }}, { "add": { "index": "my_index_v2", "alias": "my_index" }} ] }
- refresh_interval (bulk indexing)
PUT /my_logs { "settings": { "refresh_interval": "30s" } }
POST /my_logs/_settings { "refresh_interval": -1 } POST /my_logs/_settings { "refresh_interval": "1s" }
- flush
POST /blogs/_flush POST /_flush?wait_for_ongoing
- optimize
POST /logstash-old-index/_optimize?max_num_segments=1
- field length norm (for logging)
PUT /my_index { "mappings": { "doc": { "properties": { "text": { "type": "string", "norms": { "enabled": false } } } } } }
- tune cluster and index recovery settings (test the value)
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.node_initial_primary_recoveries":25}}' curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.node_concurrent_recoveries":5}}' ? curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.recovery.max_bytes_per_sec":"100mb"}}' curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.recovery.concurrent_streams":20}}'
- logging.yml
- use node.name instead of cluster.name
file: ${path.logs}/${node.name}.log
- elasticsearch.in.sh
- disable HeapDumpOnOutOfMemoryError
#JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError"
- ES_HEAP_SIZE: 50% of physical RAM (see the sketch after this list)
- heaps < 32GB
- no swap
- bootstrap.mlockall = true
- ulimit -l unlimited
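- A sketch of the heap and memory-lock settings above, assuming a machine with 64GB of RAM so that a 31GB heap is both ~50% of RAM and under 32GB
# environment, before starting Elasticsearch
export ES_HEAP_SIZE=31g
ulimit -l unlimited
# elasticsearch.yml
bootstrap.mlockall: true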
- thread pools
- thread pool size
- search - 3 * # of processors (3 * 64 = 192)
- index - 2 * # of processors (2 * 64 = 128)
- bulk - 3 * # of processors (3 * 64 = 192)
- queues - set the size to -1 to prevent rejections from ES
- buffers
- increase the indexing buffer size (indices.memory.index_buffer_size) to 40% (see the sketch below)
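- An elasticsearch.yml sketch of the thread pool and buffer sizing above, assuming a 64-core machine; verify the exact setting names against your 1.x release
threadpool.search.size: 192   # 3 * 64
threadpool.search.queue_size: -1
threadpool.index.size: 128    # 2 * 64
threadpool.index.queue_size: -1
threadpool.bulk.size: 192     # 3 * 64
threadpool.bulk.queue_size: -1
indices.memory.index_buffer_size: 40%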
- dynamic node.name
- ES script
export ES_NODENAME=`hostname -s`
- elasticsearch.yml
node.name: "${ES_NODENAME}"
- Hardware
- CPU
- core
- disk
- SSD
- noop / deadline scheduler (see the example after this list)
- better IOPS
- cheaper in terms of cost per IOPS
- manufacturing tolerance can vary
- RAID
- do not necessarily need
- ES handles redundancy
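- For the noop / deadline scheduler item above, a quick way to check and switch the scheduler on a data disk (sda is an assumption; adjust to your device)
cat /sys/block/sda/queue/scheduler
echo noop | sudo tee /sys/block/sda/queue/scheduler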
- Monitoring
- curl 'localhost:9200/_cluster/health'
- curl 'localhost:9200/_nodes/process'
- max_file_descriptors: 30000 or higher
- curl 'localhost:9200/_nodes/jvm'
- version
- mem.heap_max
- curl 'localhost:9200/_nodes/jvm/stats'
- heap_used
- curl 'localhost:9200/_nodes/indices/stats'
- fielddata
- curl 'localhost:9200/_nodes/indices/stats?fields=created_on'
- fields
- curl 'localhost:9200/_nodes/http/stats'
- http
- GET /_stats/fielddata?fields=*
- GET /_nodes/stats/indices/fielddata?fields=*
- GET /_nodes/stats/indices/fielddata?level=indices&fields=*
- Scenario
- adding nodes
- disable allocation to stop shard shuffling until ready
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
- increase speed of transfers
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":6,"indices.recovery.max_bytes_per_sec":"50mb"}}'
- start new nodes
- enable allocation
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
- removing nodes
- exclude the nodes from the cluster; this tells ES to move shards off them
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude._name":"node-05*,node-06*"}}'
- increase speed of transfers
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":6,"indices.recovery.max_bytes_per_sec":"50mb"}}'
- shutdown old nodes after all shards move off
curl -XPOST 'localhost:9200/_cluster/nodes/node-05*,node-06*/_shutdown'
- upgrades / node restarts
- disable auto balancing if doing rolling restarts
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
- restart
- re-enable allocation
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
- re / bulk indexing
- set number_of_replicas to 0 before bulk indexing
- increase it again after completion (see the example below)
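- Example of the replica toggle above, in the same style as the refresh_interval commands (my_index is a placeholder)
PUT /my_index/_settings { "number_of_replicas": 0 }
PUT /my_index/_settings { "number_of_replicas": 1 }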
- Restoration
- snapshot and restore (example below)
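- A minimal snapshot/restore sketch with the 1.x snapshot API; the repository name, location, and snapshot name are placeholders, and the location must be reachable from every node
PUT /_snapshot/my_backup { "type": "fs", "settings": { "location": "/mount/backups/my_backup" } }
PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
POST /_snapshot/my_backup/snapshot_1/_restore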
- Reference
- Elasticsearch - The Definitive Guide
- http://www.elasticsearch.org/webinars/elasticsearch-pre-flight-checklist/
- http://www.elasticsearch.org/webinars/elk-stack-devops-environment/
- http://www.elasticsearch.org/videos/moloch-elasticsearch-powering-network-forensics-aol/
- http://www.elasticsearch.org/videos/elastic-searching-big-data/
Wednesday, December 31, 2014
ElasticSearch
Apache Sqoop
- Introduction
- Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
- Creating a password file (used by the import example below)
- echo -n password > .password
- hdfs dfs -put .password /user/$USER/
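- A sketch of an import that uses the password file above; the MySQL host, database, table, and user are hypothetical
sqoop import \
  --connect jdbc:mysql://mysql_host/example_db \
  --username example_user \
  --password-file /user/$USER/.password \
  --table example_table \
  --target-dir /user/$USER/example_table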
- Installing the MySQL JDBC driver in CDH
- mkdir -p /var/lib/sqoop
- chown sqoop:sqoop /var/lib/sqoop
- chmod 755 /var/lib/sqoop
- download the JDBC driver from http://dev.mysql.com/downloads/connector/j/5.1.html
- sudo cp mysql-connector-java-version/mysql-connector-java-version-bin.jar /var/lib/sqoop/
- Reference
Apache Solr
- Solr Introduction
- A standalone enterprise search server with a REST-like API
- Documents are indexed via XML, JSON, CSV, or binary over HTTP
- Searches are sent as HTTP GET requests
- Results are returned as XML, JSON, CSV, or binary
- Solr Features
- Advanced full-text search capabilities
- Optimized for high-volume web traffic
- Standards-based open interfaces: XML, JSON, and HTTP
- Comprehensive HTML administration interface
- Server statistics exposed for monitoring
- Linearly scalable, automatic index replication, automatic failover and recovery
- Near-real-time indexing
- Flexible and adaptable, with XML configuration
- Extensible plugin architecture
- Usage
- Installation and startup
- Requires Java 1.7 or later
- wget http://apache.mirror.cdnetworks.com/lucene/solr/4.9.0/solr-4.9.0.tgz
- tar -xvf solr-4.9.0.tgz
- cd solr-4.9.0/
- cd example/
- java -jar start.jar &
- Access the Solr Admin UI at the address below
- http://211.49.227.178:8983/solr (replace 211.49.227.178 with the actual IP)
- (screenshot of the Solr Admin screen)
- Indexing data
- cd exampledocs/
- Index solr.xml and monitor.xml with the command below
- java -jar post.jar solr.xml monitor.xml
- Search for the term "solr" as follows
- Core Selector -> collection1 -> Query -> enter solr in q -> select xml for wt -> Execute Query
- Or search directly with the HTTP request below
- http://211.49.227.178:8983/solr/collection1/select?q=solr&wt=xml
- Index every XML file in exampledocs with the command below
- java -jar post.jar *.xml
- Updating data
- Although solr.xml and monitor.xml were indexed twice (by 'java -jar post.jar solr.xml monitor.xml' and then 'java -jar post.jar *.xml'), they are not duplicated; the later content replaces the earlier
- This is because indexing uses the unique key 'id', and data updates work the same way
- Deleting data
- Delete the document whose id is SP2514N with the command below
- java -Ddata=args -Dcommit=false -jar post.jar "<delete><id>SP2514N</id></delete>"
- Delete documents whose name contains the term DDR with the command below
- java -Dcommit=false -Ddata=args -jar post.jar "<delete><query>name:DDR</query></delete>"
- Because the delete commands above use the -Dcommit=false option, the command below must be run to commit before the documents stop appearing in searches
- java -jar post.jar -
- The reason is that commits are expensive, so it is recommended to commit all the work at once
- Querying data
- The request below returns only the name and id fields of documents matching "video"
- http://211.49.227.178:8983/solr/collection1/select/?indent=on&q=video&fl=name,id
- The request below returns every field of documents matching "video"
- http://211.49.227.178:8983/solr/collection1/select/?indent=on&q=video&fl=*
- The request below returns the name, id, and price fields, sorted by price in descending order
- http://211.49.227.178:8983/solr/collection1/select/?indent=on&q=video&sort=price%20desc&fl=name,id,price
- The request below returns matching documents as JSON
- http://211.49.227.178:8983/solr/collection1/select/?indent=on&q=video&wt=json
- The request below sorts by inStock ascending, then price descending
- http://211.49.227.178:8983/solr/collection1/select/?indent=on&q=video&sort=inStock%20asc,%20price%20desc
- Functions can also be used in the sort, as in the request below
- http://211.49.227.178:8983/solr/collection1/select/?indent=on&q=video&sort=div(popularity,add(price,1))%20desc
- Reference
Kafka Web Console
- Introduction
- Kafka Web Console is a Java web application for monitoring Apache Kafka. With a modern web browser, you can view the following from the console:
- Registered brokers
- Topics, partitions, log sizes, and partition leaders
- Consumer groups, individual consumers, consumer owners, partition offsets and lag
- Graphs showing consumer offset and lag history as well as consumer/producer message throughput history
- Latest published topic messages (requires web browser support for WebSocket)
- Prerequisite
- Typesafe Activator
- download
- unzip typesafe-activator-1.2.10.zip
- export PATH=$PATH:/relativePath/to/activator
- activator -help
- Play Framework
- download
- unzip play-2.2.5.zip
- export PATH=$PATH:/relativePath/to/play
- installation
- wget https://github.com/claudemamo/kafka-web-console/archive/master.zip
- unzip master.zip
- cd kafka-web-console-master
- play start
- error
- Database 'default' needs evolution!
- play "start -DapplyEvolutions.default=true"
- org.jboss.netty.channel.ChannelException: Failed to bind to: /0.0.0.0:9000
- play "start -Dhttp.port=8080"
- http://hostname:8080/
- Register
- Zookeepers -> Register Zookeeper
- Reference
Kafka Offset Monitor
- Introduction
- This is an app to monitor your Kafka consumers and their position (offset) in the queue
- You can see the current consumer groups, and for each group the topics it is consuming and its position in each topic queue
- This is useful for understanding how quickly you are consuming from a queue and how fast the queue is growing
- It allows debugging Kafka producers and consumers, or simply getting an idea of what is going on in your system
- The app keeps a history of queue position and consumer lag, so you can get an overview of what has happened over the last days
- Installation
- Download
- run it
java -cp KafkaOffsetMonitor-assembly-0.2.0.jar \
     com.quantifind.kafka.offsetapp.OffsetGetterWeb \
     --zk zk-server1,zk-server2 \
     --port 8080 \
     --refresh 10.seconds \
     --retain 2.days
- Arguments
- zk: the ZooKeeper hosts
- port: on what port will the app be available
- refresh: how often should the app refresh and store a point in the DB
- retain: how long should points be kept in the DB
- dbName: where to store the history (default 'offsetapp')
- Reference
Apache Kafka
- Introduction
- Kafka is a distributed, partitioned, replicated commit log service
- It provides the functionality of a messaging system, but with a unique design
- Feature
- Fast
- A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients
- Scalable
- Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization
- It can be elastically and transparently expanded without downtime
- Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers
- Durable
- Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact
- Distributed by Design
- Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees
- Installation
- wget http://mirror.apache-kr.org/kafka/0.8.1.1/kafka_2.9.2-0.8.1.1.tgz
- tar -xzf kafka_2.9.2-0.8.1.1.tgz
- cd kafka_2.9.2-0.8.1.1
- bin/kafka-server-start.sh config/server.properties
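- The quickstart above assumes a reachable ZooKeeper; for testing, the single-node ZooKeeper bundled in the tarball can be started first
- bin/zookeeper-server-start.sh config/zookeeper.properties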
- Configuration
- broker
- broker.id
- log.dirs
- zookeeper.connect
- host.name
- topic-level
- bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic --partitions 1 --replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1
- bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --config max.message.bytes=128000
- bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --deleteConfig max.message.bytes
- controlled.shutdown.enable=true
- auto.leader.rebalance.enable=true
- consumer
- group.id
- zookeeper.connect
- producer
- metadata.broker.list
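- A quick way to exercise the broker, consumer, and producer settings above with the console clients shipped in 0.8 (topic name and ports are illustrative defaults)
- bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
- bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning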
- production server configuration
# Replication configurations
num.replica.fetchers=4
replica.fetch.max.bytes=1048576
replica.fetch.wait.max.ms=500
replica.high.watermark.checkpoint.interval.ms=5000
replica.socket.timeout.ms=30000
replica.socket.receive.buffer.bytes=65536
replica.lag.time.max.ms=10000
replica.lag.max.messages=4000
controller.socket.timeout.ms=30000
controller.message.queue.size=10
# Log configuration
num.partitions=8
message.max.bytes=1000000
auto.create.topics.enable=true
log.index.interval.bytes=4096
log.index.size.max.bytes=10485760
log.retention.hours=168
log.flush.interval.ms=10000
log.flush.interval.messages=20000
log.flush.scheduler.interval.ms=2000
log.roll.hours=168
log.cleanup.interval.mins=30
log.segment.bytes=1073741824
# ZK configuration
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
# Socket server configuration
num.io.threads=8
num.network.threads=8
socket.request.max.bytes=104857600
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576
queued.max.requests=16
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
- Operations
- bin/kafka-topics.sh --zookeeper zk_host:port/chroot --create --topic my_topic_name --partitions 20 --replication-factor 3 --config x=y
- bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name --partitions 40
- bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name --config x=y
- bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name --deleteConfig x
- bin/kafka-topics.sh --zookeeper zk_host:port/chroot --delete --topic my_topic_name
- bin/kafka-preferred-replica-election.sh --zookeeper zk_host:port/chroot
- bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect localhost:2181 --group test
- Reference
Apache Flume
- Introduction
- Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
- It has a simple and flexible architecture based on streaming data flows.
- It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms.
- It uses a simple extensible data model that allows for online analytic application.
- System Requirements
- Java Runtime Environment - Java 1.6 or later (Java 1.7 Recommended)
- Memory - Sufficient memory for configurations used by sources, channels or sinks
- Disk Space - Sufficient disk space for configurations used by channels or sinks
- Directory Permissions - Read/Write permissions for directories used by agent
- Features
- complex flows
- Flume allows a user to build multi-hop flows where events travel through multiple agents before reaching the final destination. It also allows fan-in and fan-out flows, contextual routing and backup routes (fail-over) for failed hops.
- reliability
- The events are staged in a channel on each agent. The events are then delivered to the next agent or terminal repository (like HDFS) in the flow. The events are removed from a channel only after they are stored in the channel of next agent or in the terminal repository. This is a how the single-hop message delivery semantics in Flume provide end-to-end reliability of the flow. Flume uses a transactional approach to guarantee the reliable delivery of the events. The sources and sinks encapsulate in a transaction the storage/retrieval, respectively, of the events placed in or provided by a transaction provided by the channel. This ensures that the set of events are reliably passed from point to point in the flow. In the case of a multi-hop flow, the sink from the previous hop and the source from the next hop both have their transactions running to ensure that the data is safely stored in the channel of the next hop.
- recoverability
- The events are staged in the channel, which manages recovery from failure. Flume supports a durable file channel which is backed by the local file system. There’s also a memory channel which simply stores the events in an in-memory queue, which is faster but any events still left in the memory channel when an agent process dies can’t be recovered.
- Installation
- download
- tar xvf apache-flume-1.5.0.1-bin.tar.gz
- cd apache-flume-1.5.0.1-bin
- bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template
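- The command above points at a template; a minimal single-agent configuration in the spirit of the Flume user guide, with an agent named a1, a netcat source on port 44444, a memory channel, and a logger sink (all names and ports are illustrative)
# conf/example.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
# run it
bin/flume-ng agent -n a1 -c conf -f conf/example.conf -Dflume.root.logger=INFO,console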
- CDH
- Installation
- CDH5-Installation-Guide.pdf (P.155)
- http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh5ig_flume_installation.html
- Security Configuration
- CDH5-Security-Guide.pdf (P.53)
- http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Security-Guide/cdh5sg_flume_security.html
- CM
- Service
- Managing-Clusters-with-Cloudera-Manager.pdf (P.49)
- http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Managing-Clusters/cm5mc_flume_service.html
- Properties: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Configuration-Properties/cm5config_cdh500_flume.html
- Health Tests
- Metrics
- Metrics: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Metrics/flume_metrics.html
- Channel Metrics: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Metrics/flume_channel_metrics.html
- Sink Metrics: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Metrics/flume_sink_metrics.html
- Source Metrics: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Metrics/flume_source_metrics.html
- Reference
Talend Open Studio for Big Data
- Introduction
- Talend provides a powerful and versatile open source big data product that makes the job of working with big data technologies easy and helps drive and improve business performance, without the need for specialist knowledge or resources.
- Features
- Integration at Cluster Scale
- Talend’s big data product combines big data components for MapReduce 2.0 (YARN), Hadoop, HBase, Hive, HCatalog, Oozie, Sqoop and Pig into a unified open source environment so you can quickly load, extract, transform and process large and diverse data sets from disparate systems.
- Big Data Without The Need To Write / Maintain Code
- Ready to Use Big Data Connectors
- Talend provides an easy-to-use graphical environment that allows developers to visually map big data sources and targets without the need to learn and write complicated code. Running 100% natively on Hadoop, Talend Big Data provides massive scalability. Once a big data connection is configured the underlying code is automatically generated and can be deployed remotely as a job that runs natively on your big data cluster - HDFS, Pig, HCatalog, HBase, Sqoop or Hive.
- Big Data Distribution and Big Data Appliance Support
- Talend's big data components have been tested and certified to work with leading big data Hadoop distributions, including Amazon EMR, Cloudera, IBM PureData, Hortonworks, MapR, Pivotal Greenplum, Pivotal HD, and SAP HANA. Talend provides out-of-the-box support for big data platforms from the leading appliance vendors including Greenplum/Pivotal, Netezza, Teradata, and Vertica.
- Open Source
- Using the Apache software license means developers can use the Studio without restrictions. As Talend’s big data products rely on standard Hadoop APIs, users can easily migrate their data integration jobs between different Hadoop distributions without any concerns about underlying platform dependencies. Support for Apache Oozie is provided out-of-the-box, allowing operators to schedule their data jobs through open source software.
- Pull Source Data from Anywhere Including NoSQL
- With 800+ connectors, Talend integrates almost any data source so you can transform and integrate data in real-time or batch. Pre-built connectors for HBase, MongoDB, Cassandra, CouchDB, Couchbase, Neo4J and Riak speed development without requiring specific NoSQL knowledge. Talend big data components can be configured to bulk upload data to Hadoop or other big data appliance, either as a manual process, or an automatic schedule for incremental data updates.
- Products
- https://www.talend.com/products/big-data/matrix
- Feature availability by edition (Open Studio = Talend Open Studio for Big Data, Enterprise = Talend Enterprise Big Data, Platform = Talend Platform for Big Data)
- Job Designer: Open Studio, Enterprise, Platform
- Components for HDFS, HBase, HCatalog, Hive, Pig, Sqoop: Open Studio, Enterprise, Platform
- Hadoop Job Scheduler: Open Studio, Enterprise, Platform
- NoSQL Support: Open Studio, Enterprise, Platform
- Versioning: Open Studio, Enterprise, Platform
- Shared Repository: Enterprise, Platform
- Reporting and Dashboards: Platform
- Hadoop Profiling, Parsing and Matching: Platform
- Indemnification/Warranty and Talend Support: Enterprise, Platform
- License: Open Studio is Open Source; Enterprise and Platform are Subscription
- Reference
Tuesday, September 2, 2014
LogStash - 'Could not load FFI Provider'
1. version
- 1.4.2
2. Error
- LoadError: Could not load FFI Provider: (NotImplementedError) FFI not available: null
3. Solution
- cd logstash-1.4.2/bin
- vim logstash.lib.sh
- add 'JAVA_OPTS="$JAVA_OPTS -Djava.io.tmpdir=/path/to/somewhere"' to logstash.lib.sh
- mkdir /path/to/somewhere
- start logstash
Thursday, July 31, 2014
PostgreSQL - Installation
1. Version Info
- PostgreSQL 9.3
- CentOS base 6.4 64-bit
2. Installation
- vi /etc/yum.repos.d/CentOS-Base.repo
- add 'exclude=postgresql*' at '[base]'
- yum localinstall http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/pgdg-centos93-9.3-1.noarch.rpm
- yum list postgres*
- yum install postgresql93-server.x86_64
- service postgresql-9.3 initdb
- cd /var/lib/pgsql/9.3/data/
- vi postgresql.conf
- add "listen_addresses=’*’"
- vi pg_hba.conf
- add "host all all 'your network info'/24 trust"
- service postgresql-9.3 start
- connect to it and create a user by using command or pgAdmin
- vi pg_hba.conf
- change "host all all 'your network info'/24 trust" to "host all all 'your network info'/24 md5"
- service postgresql-9.3 restart
Thursday, April 3, 2014
MongoDB 2.4 - sharding
- killall mongod
- killall mongos
- cd
- mkdir a0 b0 cfg0
- mongod --shardsvr --dbpath a0 --logpath log.a0 --fork --logappend --smallfiles --oplogSize 50
- Note that with --shardsvr specified the default port for mongod becomes 27018
- mongod --configsvr --dbpath cfg0 --fork --logpath log.cfg0 --logappend
- Note with --configsvr specified the default port for listening becomes 27019 and the default data directory /data/configdb. Wherever your data directory is, it is suggested that you verify that the directory is empty before you begin.
- mongos --configdb your_host_name:27019 --fork --logappend --logpath log.mongos0
- mongo
- sh.addShard('your_host_name:27018')
- sh.enableSharding('databaseName')
- db.collectionName.ensureIndex({a:1,b:1})
- sh.shardCollection('databaseName.collectionName',{a:1,b:1})
- use config
- db.chunks.find()
- db.chunks.find({}, {min:1,max:1,shard:1,_id:0,ns:1})
- exit
- mongod --shardsvr --dbpath b0 --logpath log.b0 --fork --logappend --smallfiles --oplogSize 50 --port 27000
- mongo
- sh.addShard('your_host_name:27000')
- sh.status()
- db.getSisterDB("config").shards.count()
Wednesday, March 26, 2014
Mongo 2.4.9 - getSisterDB
- show dbs
test
test1
- db
test
- db.getSisterDB('test1').collection.count()
123
Mongo 2.4.9 - connect
- var c = connect('localhost:27002/database')
- db
- show dbs
- c.isMaster()
- c.getMongo().setSlaveOk()
- c.collection.count()
- c.collection.find().limit(3)
Monday, March 24, 2014
MongoDB 2.4 - Replica Set for Testing and Developing
1. start a single mongod as a standalone server
- cd
- mkdir 1
- mongod --dbpath 1 --port 27001 --smallfiles --oplogSize 50 --logpath 1.log --logappend --fork
- mongo --port 27001
2. convert the mongod instance to a single server replica set
- exit
- killall mongod
- mongod --replSet rstest --dbpath 1 --port 27001 --smallfiles --oplogSize 50 --logpath 1.log --logappend --fork
- mongo --port 27001
- Note: use hostname command to check hostname of the server
-
cfg =
{
"_id" : "rstest",
"members" : [
{
"_id" : 0,
"host" : "localhost:27001"
}
]
}
- rs.initiate(cfg)
3. add two more members to the set
- exit
- cd
- mkdir 2 3
- mongod --replSet rstest --dbpath 2 --port 27002 --smallfiles --oplogSize 50 --logpath 2.log --logappend --fork
- mongod --replSet rstest --dbpath 3 --port 27003 --smallfiles --oplogSize 50 --logpath 3.log --logappend --fork
- mongo --port 27001
3.1.
- cfg = rs.conf()
- cfg.members[1] = { "_id" : 1, "host" : "localhost:27002" }
- cfg.members[2] = { "_id" : 2, "host" : "localhost:27003" }
- rs.reconfig(cfg)
3.2.
- rs.add('localhost:27002')
- rs.add('localhost:27003')
4. retire the first member from the set
- rs.stepDown(300)
- exit
- terminate the mongod process for member 1
- go to the new primary of the set
- cfg = rs.conf()
- cfg.members.shift()
- rs.reconfig(cfg)
Thursday, March 20, 2014
Eclipse - Specified VM install not found: type Standard VM, name XXX
Solution 1
- suppose the path of workspace of eclipse is 'D:\eclipse\workspace'
- then, go to 'D:\eclipse\workspace\.metadata\.plugins\org.eclipse.debug.core\.launches'
- find 'projectName build.xml.launch' file and delete it
Solution 2
- Right Click on build.xml
- Go to "Run As" >> "External Tools Configurations..."
- It will open a new window
- Go to JRE tab
- Select proper JRE if missing
MongoDB 2.4 - Java (Fail over)
import java.net.UnknownHostException;
import java.util.Arrays;
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.MongoException;
import com.mongodb.ServerAddress;
public class MongoDB {
public static void main(String[] args) throws UnknownHostException,
InterruptedException {
MongoClient client = new MongoClient(Arrays.asList(new ServerAddress(
"localhost", 27017), new ServerAddress("localhost", 27018),
new ServerAddress("localhost", 27019)));
DBCollection test = client.getDB("database")
.getCollection("collection");
test.drop();
for (int i = 0; i < Integer.MAX_VALUE; i++) {
for (int retries = 0; retries < 3; retries++) {
try {
test.insert(new BasicDBObject("_id", i));
System.out.println("Inserted document: " + i);
break;
} catch (MongoException.DuplicateKey e) {
System.out.println("Document already inserted " + i);
} catch (MongoException e) {
System.out.println(e.getMessage());
System.out.println("Retrying");
Thread.sleep(5000);
}
}
Thread.sleep(500);
}
}
}
MongoDB 2.4 - Java (connecting to a replica set)
import java.net.UnknownHostException;
import java.util.Arrays;
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;
public class MongoDB {
public static void main(String[] args) throws UnknownHostException,
InterruptedException {
MongoClient client = new MongoClient(Arrays.asList(new ServerAddress(
"localhost", 27017), new ServerAddress("localhost", 27018),
new ServerAddress("localhost", 27019)));
DBCollection test = client.getDB("database")
.getCollection("collection");
test.drop();
for (int i = 0; i < Integer.MAX_VALUE; i++) {
test.insert(new BasicDBObject("_id", i));
System.out.println("Inserted document: " + i);
Thread.sleep(500);
}
}
}
Wednesday, March 19, 2014
Ubuntu 12.04 - link /var/lib/postgresql to another partition's directory
- sudo service postgresql stop
- sudo cp -rf /var/lib/postgresql /anotherPartition/postgresql
- sudo chown -R postgres:postgres /anotherPartition/postgresql
- sudo service postgresql start
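- Note: the steps above copy the data but never replace /var/lib/postgresql with a link; presumably, before the final service start, something like the following is intended (the backup directory name is an assumption)
- sudo mv /var/lib/postgresql /var/lib/postgresql.orig
- sudo ln -s /anotherPartition/postgresql /var/lib/postgresql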
Thursday, March 13, 2014
Mongo 2.4.9 - MapReduce
1. Map
function map_closest() {
var pitt = [-80.064879, 40.612044];
var phil = [-74.978052, 40.089738];
function distance(a, b) {
var dx = a[0] - b[0];
var dy = a[1] - b[1];
return Math.sqrt(dx * dx + dy * dy);
}
if (distance(this.loc, pitt) < distance(this.loc, phil)) {
emit("pitt", 1);
} else {
emit("phil", 1);
}
}
2. Reduce
2.1. Array.sum()
function reduce_closest(city, counts) {
return Array.sum(counts);
}
2.2. for()
function reduce_closest(city, counts) {
var total = 0;
var length = counts.length;
for (var i = 0; i < length; i++) {
total += counts[i];
}
return total;
}
2.3. forEach()
function reduce_closest(city, counts) {
var total = 0;
counts.forEach(function(count) {
total += count;
});
return total;
}
3. mapReduce
db.zips.mapReduce(
map_closest,
reduce_closest,
{
query: { state: 'PA' },
out: { inline: 1 }
}
)
Mongo 2.4.9 - $elemMatch
- db.policies.find( { status : { $ne : "expired" }, coverages : { $elemMatch : { type : "liability", rates : { $elemMatch : { rate : { $gte : 100 }, current : true } } } } } ).pretty()
{
"_id" : "1024850AB",
"status" : "draft",
"insured_item" : {
"make" : "Cessna",
"model" : "Skylane",
"year" : 1982,
"serial" : "AB1783A"
},
"insured_parties" : [
ObjectId("5097f7351d9a5941f5111d61")
],
"coverages" : [
{
"type" : "liability",
"limit" : 1000000,
"rates" : [
{
"rate" : 200,
"staff_id" : ObjectId("5097f7351d9a5999f5111d69"),
"date" : ISODate("2012-11-05T17:29:54.561Z"),
"current" : true
}
]
},
{
"type" : "property",
"deductible" : 5000,
"limit" : 100000,
"rates" : [
{
"rate" : 300,
"staff_id" : ObjectId("5097f7351d9a5999f5111d69"),
"date" : ISODate("2012-11-05T17:29:56.561Z"),
"current" : true
}
]
}
],
"underwriting" : {
"staff_id" : ObjectId("5097f84cf8dd729bc7273068"),
"action" : "approved",
"date" : ISODate("2012-11-05T17:33:00.693Z")
}
}
Mongo 2.4.9 - Command
- mongoimport -d database -c collection --drop example.json
- mongo example.js
- mongo --shell example.js
- mongo --shell database example.js
Wednesday, March 12, 2014
Mongo 2.4.9 - $substr
- db.zips.findOne()
{
"city" : "ACMAR",
"loc" : [
-86.51557,
33.584132
],
"pop" : 6055,
"state" : "AL",
"_id" : "35004"
}
- db.zips.count()
29467
- db.zips.aggregate({$project:{_id:{$substr:['$city',0,1]}}},{$group:{_id:'$_id',n:{$sum:1}}},{$sort:{n:-1}})
{
"result" : [
{
"_id" : "S",
"n" : 2871
},
{
"_id" : "C",
"n" : 2692
},
{
"_id" : "M",
"n" : 2348
},
{
"_id" : "B",
"n" : 2344
},
{
"_id" : "W",
"n" : 1834
},
{
"_id" : "L",
"n" : 1738
},
{
"_id" : "P",
"n" : 1681
},
{
"_id" : "H",
"n" : 1621
},
{
"_id" : "A",
"n" : 1398
},
{
"_id" : "G",
"n" : 1304
},
{
"_id" : "R",
"n" : 1284
},
{
"_id" : "D",
"n" : 1162
},
{
"_id" : "N",
"n" : 1128
},
{
"_id" : "F",
"n" : 1091
},
{
"_id" : "E",
"n" : 1050
},
{
"_id" : "T",
"n" : 955
},
{
"_id" : "O",
"n" : 767
},
{
"_id" : "K",
"n" : 630
},
{
"_id" : "J",
"n" : 391
},
{
"_id" : "V",
"n" : 381
},
{
"_id" : "I",
"n" : 288
},
{
"_id" : "U",
"n" : 165
},
{
"_id" : "Y",
"n" : 112
},
{
"_id" : "Q",
"n" : 68
},
{
"_id" : "Z",
"n" : 48
},
{
"_id" : "3",
"n" : 22
},
{
"_id" : "6",
"n" : 20
},
{
"_id" : "4",
"n" : 19
},
{
"_id" : "5",
"n" : 15
},
{
"_id" : "2",
"n" : 13
},
{
"_id" : "7",
"n" : 10
},
{
"_id" : "9",
"n" : 8
},
{
"_id" : "8",
"n" : 3
},
{
"_id" : "0",
"n" : 3
},
{
"_id" : "X",
"n" : 2
},
{
"_id" : "1",
"n" : 1
}
],
"ok" : 1
}
- db.zips.aggregate({$project:{_id:{$substr:['$city',0,1]}}},{$group:{_id:'$_id',n:{$sum:1}}},{$match:{_id:{$in:['0','1','2','3','4','5','6','7','8','9']}}}, {$group:{_id:null,count:{$sum:'$n'}}})
{ "result" : [ { "_id" : null, "count" : 114 } ], "ok" : 1 }
- db.zips.remove({city:/^[0-9]/})
- db.zips.count()
29353