Flume: DataNode failures in the cluster cause HDFS write failures (repost)
Published: 2019-06-27


Source: http://www.geedoo.info/dfs-client-block-write-replace-datanode-on-failure-enable.html

 

These past few days the Hangzhou cluster has been in a transitional period mid-upgrade: the task load is heavy and the cluster is small (only 4 DataNodes), so it kept running into problems, which caused errors in Flume's data collection and, in turn, data loss.

Since data was being lost, the first suspect to put on the chopping block was data collection. Fine, let's look at Flume's error log:

Caused by: java.io.IOException: Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[10.0.2.163:50010, 10.0.2.164:50010], original=[10.0.2.163:50010, 10.0.2.164:50010])
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:817)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:877)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:983)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:780)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)

The error:

Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT

From the log, the client failed to add a DataNode, and the message says the feature can be turned off by setting dfs.client.block.write.replace-datanode-on-failure.policy. But I never added any nodes? Clearly the problem is not that simple.

Checking the official configuration documentation for the parameters above:

Parameter: dfs.client.block.write.replace-datanode-on-failure.enable
Default: true
Description: If there is a datanode/network failure in the write pipeline, DFSClient will try to remove the failed datanode from the pipeline and then continue writing with the remaining datanodes. As a result, the number of datanodes in the pipeline is decreased. The feature is to add new datanodes to the pipeline. This is a site-wide property to enable/disable the feature. When the cluster size is extremely small, e.g. 3 nodes or less, cluster administrators may want to set the policy to NEVER in the default configuration file or disable this feature. Otherwise, users may experience an unusually high rate of pipeline failures since it is impossible to find new datanodes for replacement. See also dfs.client.block.write.replace-datanode-on-failure.policy.

Parameter: dfs.client.block.write.replace-datanode-on-failure.policy
Default: DEFAULT
Description: This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. ALWAYS: always add a new datanode when an existing datanode is removed. NEVER: never add a new datanode. DEFAULT: Let r be the replication number. Let n be the number of existing datanodes. Add a new datanode only if r is greater than or equal to 3 and either (1) floor(r/2) is greater than or equal to n; or (2) r is greater than n and the block is hflushed/appended.
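To make the DEFAULT rule concrete, here is a minimal Java sketch of the condition quoted above. The class and method names are mine; this is only a restatement of the documented rule, not the actual DFSClient code:

// A sketch of the documented DEFAULT rule -- illustrative only,
// not the real DFSClient implementation.
public class ReplaceDatanodeOnFailureSketch {

    // r = the block's replication factor,
    // n = datanodes still alive in the write pipeline,
    // hflushedOrAppended = whether the block was hflushed/appended.
    static boolean addReplacement(int r, int n, boolean hflushedOrAppended) {
        if (r < 3) {
            return false;                      // two replicas or fewer: never replace
        }
        return (r / 2) >= n                    // (1) pipeline shrank to half or less
            || (r > n && hflushedOrAppended);  // (2) short pipeline on hflush/append
    }

    public static void main(String[] args) {
        // r=3, one node lost (n=2), no hflush: keep writing with 2 nodes.
        System.out.println(addReplacement(3, 2, false)); // false
        // r=3, one node lost (n=2), block hflushed: try to add a replacement.
        System.out.println(addReplacement(3, 2, true));  // true
    }
}

This matches the log above: the pipeline was down to two DataNodes and, likely because the block had been hflushed (Flume's HDFS sink flushes as it writes batches), clause (2) applied, so the client had to look for a replacement DataNode and could not find one on a busy 4-node cluster.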


I then located the relevant source in DFSClient: the failure happens while the client is writing a block through the pipeline. That also turned up the two related parameters:

dfs.client.block.write.replace-datanode-on-failure.enable
dfs.client.block.write.replace-datanode-on-failure.policy
The former controls whether the client applies a replacement policy at all when a write fails; the default of true is fine as-is.
The latter chooses the concrete replacement policy; the default is DEFAULT.
Under DEFAULT, with 3 or more replicas the client will try to swap a new DataNode into the pipeline, while with two replicas it does not replace the DataNode and just keeps writing.

Since my cluster only has 4 nodes, once the load gets high enough that two or more DNs stop responding at the same time, HDFS writes start to fail. When the cluster is this small, we can turn this feature off.
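Concretely, that means setting the policy to NEVER (or disabling the feature) on the client side, which is what the documentation quoted above suggests for clusters of 3 nodes or fewer. A minimal hdfs-site.xml sketch; it belongs in the Hadoop configuration that the HDFS client (here, the Flume agent) reads:

<!-- hdfs-site.xml read by the HDFS client (e.g. the Flume agent) -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <!-- NEVER: on pipeline failure, keep writing with the remaining
       datanodes instead of searching for a replacement. Suggested by
       the docs for very small clusters (3 nodes or fewer). -->
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>

Alternatively, setting the enable property to false switches replacement off entirely. Either way, the trade-off is that an open file keeps writing with fewer replicas than configured; the NameNode re-replicates under-replicated blocks after the file is closed.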

References:

Flume JIRA discussion:

https://issues.apache.org/jira/browse/FLUME-2261

