Mungeol Heo: Hadoop Cluster Administration

Monday, June 12, 2017

Hadoop Cluster Administration

Guide
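- During routine operational work
  - Switch the related services to maintenance mode
  - Prefer rolling restarts
  - Stop alert notifications before the work and start them again once it is complete
    - This eliminates unnecessary notifications
    - When the related services are in maintenance mode, no alerts are raised, so no notifications are sent
    - However, for some tasks the operator does not know which services are affected unless they are identified beforehand, so those services are not switched to maintenance mode
    - This step is therefore still necessary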
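
The maintenance-mode switch in the guide can also be scripted rather than clicked through the UI. The sketch below only builds the request and does not send it. It assumes Ambari's REST endpoint `PUT /api/v1/clusters/<cluster>/services/<service>` with a `ServiceInfo/maintenance_state` body, which should be checked against your Ambari version. The host, cluster, and service names are hypothetical placeholders, and authentication is left out.

```python
import json
import urllib.request


def build_maintenance_request(base_url, cluster, service, enable):
    """Build (without sending) a PUT request that toggles maintenance mode.

    Assumption: the Ambari REST endpoint for services accepts a
    ServiceInfo/maintenance_state of "ON" or "OFF".
    """
    state = "ON" if enable else "OFF"
    payload = {
        "RequestInfo": {"context": f"Turn {state} maintenance mode for {service}"},
        "Body": {"ServiceInfo": {"maintenance_state": state}},
    }
    url = f"{base_url}/api/v1/clusters/{cluster}/services/{service}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        # Ambari expects this header on modifying requests.
        headers={"X-Requested-By": "ambari"},
    )


if __name__ == "__main__":
    # Hypothetical host, cluster, and service names.
    req = build_maintenance_request(
        "http://ambari.example.com:8080", "mycluster", "HDFS", True
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Building the request separately from sending it makes it easy to review the URL and body before anything touches the cluster. To send it, pass the request to `urllib.request.urlopen` after adding credentials.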
Starting and Stopping the Ambari Alert Notification
- Ambari UI → Alerts → Actions → Manage Notifications → select the notification method → click the settings icon → Edit → Groups
- To stop notifications
  - Custom → Clear All
- To start notifications
  - All
- Save
Decommissioning and Re-commissioning a Worker Node
- Decommissioning and recommissioning is a multi-step process. Worker nodes normally run both a DataNode and a NodeManager, and both are typically commissioned or decommissioned together.
- With the replication level set to three, HDFS is resilient to individual DataNode failures. However, there is a high chance of data loss when you terminate multiple DataNodes without decommissioning them first. Decommissioning multiple DataNodes should follow a schedule that leaves time for the blocks residing on the DataNodes being taken out of service to be re-replicated elsewhere. For additional data safety, consider decommissioning a single DataNode at a time.
- Decommissioning a NodeManager is different. If a NodeManager is shut down, the ResourceManager reschedules its tasks on other NodeManagers in the cluster. However, decommissioning a NodeManager might be required when you want it to stop accepting new tasks, or when tasks take a long time to execute but you still want to be agile in your cluster management.
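
Outside the Ambari UI, the standard Hadoop mechanism behind decommissioning is an exclude file. The host is added to the file named by `dfs.hosts.exclude` (HDFS) or `yarn.resourcemanager.nodes.exclude-path` (YARN), and the NameNode or ResourceManager is then told to re-read it with `hdfs dfsadmin -refreshNodes` or `yarn rmadmin -refreshNodes`. The following is a minimal sketch of the file handling only. The helper names, file path, and hostname are hypothetical, and the refresh commands are listed rather than executed. On an Ambari-managed cluster these files are normally maintained by Ambari itself.

```python
import tempfile
from pathlib import Path


def add_to_exclude_file(exclude_path, hostname):
    """Add hostname to an exclude file if absent; return True if the file changed."""
    path = Path(exclude_path)
    hosts = path.read_text().split() if path.exists() else []
    if hostname in hosts:
        return False
    hosts.append(hostname)
    path.write_text("".join(h + "\n" for h in hosts))
    return True


def remove_from_exclude_file(exclude_path, hostname):
    """Remove hostname from an exclude file (recommissioning); return True if changed."""
    path = Path(exclude_path)
    hosts = path.read_text().split() if path.exists() else []
    if hostname not in hosts:
        return False
    remaining = [h for h in hosts if h != hostname]
    path.write_text("".join(h + "\n" for h in remaining))
    return True


# Commands that make the NameNode / ResourceManager re-read their exclude files.
REFRESH_COMMANDS = {
    "datanode": ["hdfs", "dfsadmin", "-refreshNodes"],
    "nodemanager": ["yarn", "rmadmin", "-refreshNodes"],
}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        exclude = Path(tmp) / "dfs.exclude"  # hypothetical path
        add_to_exclude_file(exclude, "worker01.example.com")  # hypothetical host
        print(exclude.read_text(), end="")
        print(" ".join(REFRESH_COMMANDS["datanode"]))
```

Adding one host at a time and waiting for the DataNode to finish decommissioning before adding the next matches the single-DataNode approach suggested above.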
Deleting Worker Nodes
