- Guide
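- During routine operational work
- Switch the affected services to maintenance mode
- Eliminates inaccurate alerts
- Alert history is used for periodic cluster health checks and for preventing problems, so it should stay accurate
- Eliminates unnecessary notifications
- Prefer rolling restarts
- Even when the work has been shared and announced, it is carried out by people, so prepare for the unexpected
- Stop alert notifications before the work and start them again after it is complete
- Eliminates unnecessary notifications
- When the affected services are in maintenance mode, no alerts are raised, so no notifications are sent either
- However, for some tasks the operator cannot know which services are affected without checking in advance, and so does not switch those services to maintenance mode
- This step is therefore still needed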
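
Switching a service to maintenance mode can also be scripted against the Ambari REST API. The following is a minimal Python sketch under stated assumptions, not a tested procedure: it only builds the request and never sends it. The host `ambari.example.com`, the cluster name `mycluster`, the `admin` credentials, and the helper name `build_maintenance_request` are hypothetical placeholders. The `maintenance_state` field and the `X-Requested-By` header reflect the Ambari REST API as commonly documented, so check them against your Ambari version before use.

```python
import base64
import json
import urllib.request


def build_maintenance_request(base_url, cluster, service, state, user, password):
    """Build (but do not send) a PUT request that sets a service's maintenance state.

    All argument values are supplied by the caller; nothing here is read
    from a real cluster.
    """
    if state not in ("ON", "OFF"):
        raise ValueError("state must be 'ON' or 'OFF'")

    url = f"{base_url}/api/v1/clusters/{cluster}/services/{service}"
    body = {
        # Free-text label shown in Ambari's operation history.
        "RequestInfo": {"context": f"Turn {state} maintenance mode for {service}"},
        "Body": {"ServiceInfo": {"maintenance_state": state}},
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            # Ambari rejects modifying requests that lack this header.
            "X-Requested-By": "ambari",
        },
    )


if __name__ == "__main__":
    # Placeholder values (assumptions for illustration only).
    req = build_maintenance_request(
        "http://ambari.example.com:8080", "mycluster", "HDFS", "ON", "admin", "admin"
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Calling the helper with `"ON"` before the work and `"OFF"` afterwards mirrors the manual steps. Keeping request construction separate from sending makes it easy to review the payload before anything touches the cluster.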
- During routine operational work
- Starting and stopping Ambari alert notifications
- Ambari UI → Alerts → Actions → Manage Notifications → select the notification method → click the settings icon → Edit → Groups →
- For stopping
- Custom → Clear All
- For starting
- All
- → Save
- Decommissioning and Re-commissioning a Worker Node
- Decommissioning and recommissioning is a multi-step process. Worker nodes normally run both a DataNode and a NodeManager, and both are typically commissioned or decommissioned together.
- With the replication level set to three, HDFS is resilient to individual DataNode failures. However, there is a high chance of data loss when you terminate multiple DataNodes without decommissioning them first. Decommission multiple DataNodes on a schedule that permits replication of the blocks residing on the DataNodes being taken out of service. For additional data safety, consider decommissioning a single DataNode at a time.
- Decommissioning a NodeManager is different. If a NodeManager is shut down, the ResourceManager will reschedule the tasks on other NodeManagers in the cluster. However, decommissioning a NodeManager might be required in situations where you want a NodeManager to stop accepting new tasks, or when the tasks take time to execute but you still want to be agile in your cluster management.
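
To illustrate the "one DataNode at a time" advice, here is a hedged Python sketch of the underlying HDFS mechanism: the host is added to the exclude file referenced by `dfs.hosts.exclude`, and the NameNode is then told to re-read it with `hdfs dfsadmin -refreshNodes`. The helper name `add_to_exclude_file`, the file name `dfs.exclude`, and the `worker1.example.com` hostname are hypothetical. The sketch never runs the `hdfs` command itself, and on an Ambari-managed cluster the Decommission action in the UI would normally take care of these steps.

```python
import tempfile
from pathlib import Path

# Command that asks the NameNode to re-read its include/exclude files.
# It is only defined here, never executed by this sketch.
REFRESH_COMMAND = ["hdfs", "dfsadmin", "-refreshNodes"]


def add_to_exclude_file(exclude_path, hostname):
    """Append hostname to the exclude file unless it is already listed.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(exclude_path)
    hosts = path.read_text().split() if path.exists() else []
    if hostname in hosts:
        return False
    hosts.append(hostname)
    path.write_text("\n".join(hosts) + "\n")
    return True


if __name__ == "__main__":
    # Use a throwaway file so the demo never touches a real cluster config.
    exclude_file = Path(tempfile.mkdtemp()) / "dfs.exclude"
    if add_to_exclude_file(exclude_file, "worker1.example.com"):
        print("Next, run:", " ".join(REFRESH_COMMAND))
        print("Then wait until the node is reported as decommissioned "
              "before adding the next host.")
```

Handling one host per invocation and waiting for the node to finish decommissioning (for example by checking `hdfs dfsadmin -report`) before adding the next host gives HDFS time to re-replicate the affected blocks.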
- Deleting Worker Nodes
Monday, June 12, 2017
Hadoop Cluster Administration