HDFS Performance Testing

 

DFSIO test:

 

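Run the following commands to test HDFS write performance. The -nrFiles parameter sets the number of files to write, and -fileSize sets the size of each file in MB. The read test takes the same parameters.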

-- Write

hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.1.0-tests.jar TestDFSIO -D test.build.data=/tmp/benchmark -write -nrFiles 10 -fileSize 100
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.1.0-tests.jar TestDFSIO -D test.build.data=/tmp/benchmark -write -nrFiles 1000 -fileSize 100

-- Read

hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.1.0-tests.jar TestDFSIO -D test.build.data=/tmp/benchmark -read -nrFiles 10 -fileSize 100
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.1.0-tests.jar TestDFSIO -D test.build.data=/tmp/benchmark -read -nrFiles 1000 -fileSize 100

-- Clean up test data

hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.1.0-tests.jar TestDFSIO -D test.build.data=/tmp/benchmark -clean
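
The commands above differ only in the mode and the file count, so they can be wrapped in a small helper. This is a minimal sketch, not part of Hadoop: the function name dfsio and the DRY_RUN switch are made up for illustration, and the jar path is simply the one used above. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
# Jar path and data directory copied from the commands above;
# adjust them to match your CDH version.
JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.1.0-tests.jar
DATA_DIR=/tmp/benchmark

# dfsio MODE [NR_FILES FILE_SIZE_MB]   (hypothetical helper)
# MODE is write, read, or clean. With DRY_RUN=1 (the default in this
# sketch) the command is only printed; with DRY_RUN=0 it is executed.
dfsio() {
  mode=$1 nr_files=$2 size_mb=$3
  set -- hadoop jar "$JAR" TestDFSIO -D "test.build.data=$DATA_DIR" "-$mode"
  if [ "$mode" != clean ]; then
    set -- "$@" -nrFiles "$nr_files" -fileSize "$size_mb"
  fi
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Same sequence as above: write, read, then clean up.
for n in 10 1000; do dfsio write "$n" 100; done
for n in 10 1000; do dfsio read "$n" 100; done
dfsio clean
```

Running the read test before the clean step matters, because the read test reads the files that the write test created.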

 

A report file, TestDFSIO_results.log, is generated in the current directory.
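
Each run appends a block to that file, so after a few runs it helps to condense it to one line per run. The sketch below assumes the log uses the "label: value" layout of the results shown in this post; dfsio_summary is a hypothetical helper name, not a Hadoop tool.

```shell
# dfsio_summary [FILE]   (hypothetical helper)
# Prints one line per run: operation, number of files, throughput in MB/s.
# FILE defaults to TestDFSIO_results.log in the current directory.
dfsio_summary() {
  awk -F': *' '
    /TestDFSIO/       { op = $NF }              # header line, e.g. "... : write"
    /Number of files/ { files = $2 }
    /Throughput/      { print op, files, $2 }   # one summary line per run
  ' "${1:-TestDFSIO_results.log}"
}
```

Call it as `dfsio_summary` from the directory where the tests were run, or pass the path to the log as the first argument.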

Results:

----- TestDFSIO ----- : write
Date & time: Wed Jan 09 02:08:46 EST 2019
Number of files: 10
Total MBytes processed: 1000
Throughput mb/sec: 49.04
Average IO rate mb/sec: 49.35
IO rate std deviation: 3.89
Test exec time sec: 36.42

----- TestDFSIO ----- : write
Date & time: Wed Jan 09 02:27:51 EST 2019
Number of files: 1000
Total MBytes processed: 100000
Throughput mb/sec: 34.27
Average IO rate mb/sec: 43.23
IO rate std deviation: 22.3
Test exec time sec: 802.94

----- TestDFSIO ----- : read
Date & time: Wed Jan 09 02:33:27 EST 2019
Number of files: 10
Total MBytes processed: 1000
Throughput mb/sec: 525.21
Average IO rate mb/sec: 554.81
IO rate std deviation: 129.88
Test exec time sec: 29.33

----- TestDFSIO ----- : read
Date & time: Wed Jan 09 02:42:51 EST 2019
Number of files: 1000
Total MBytes processed: 100000
Throughput mb/sec: 161.28
Average IO rate mb/sec: 176.63
IO rate std deviation: 51.12
Test exec time sec: 527.89
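
As far as I understand TestDFSIO, the "Throughput mb/sec" figure divides the total data by the sum of the individual tasks' I/O times, so it describes a single map task rather than the whole cluster. A rough cluster-wide figure is "Total MBytes processed" divided by "Test exec time sec"; note that the exec time also includes job startup, so this underestimates short runs. The helper name wall_clock_rate is made up for this sketch.

```shell
# wall_clock_rate TOTAL_MB EXEC_SECONDS   (hypothetical helper)
# Prints total data divided by wall-clock time, in MB/s.
wall_clock_rate() {
  awk -v mb="$1" -v sec="$2" 'BEGIN { printf "%.2f\n", mb / sec }'
}

# Values taken from the four result blocks above.
wall_clock_rate 1000   36.42    # write, 10 files
wall_clock_rate 100000 802.94   # write, 1000 files
wall_clock_rate 1000   29.33    # read, 10 files
wall_clock_rate 100000 527.89   # read, 1000 files
```

For the runs above this gives roughly 27, 125, 34, and 189 MB/s. By this measure the 1000-file runs move more data per second overall, even though their per-task throughput is lower than in the 10-file runs.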

 

 

TeraSort test:

 

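Hadoop's TeraSort is a widely used benchmark whose goal is to sort data as fast as possible with MapReduce. TeraSort partitions the map output across the reduce tasks in a way that keeps the overall output globally sorted. Because it stresses every stage of the MapReduce framework, it provides a reasonable reference for tuning and configuring a Hadoop cluster.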

 

time hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 10000000 /tmp/terasort-input
time hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort /tmp/terasort-input /tmp/terasort-output
time hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teravalidate /tmp/terasort-output /tmp/terasort-validate
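
The first argument to teragen is a row count, not a size. TeraGen rows are 100 bytes each, so the 10000000 rows used here come to about 1 GB. The sketch below converts a target size into a row count; teragen_rows is a hypothetical helper name, and sizes are decimal gigabytes.

```shell
# Each TeraGen row is 100 bytes.
ROW_BYTES=100

# teragen_rows SIZE_GB   (hypothetical helper)
# Prints the number of rows needed for a dataset of SIZE_GB decimal gigabytes.
teragen_rows() {
  echo $(( $1 * 1000000000 / ROW_BYTES ))
}

teragen_rows 1      # row count for 1 GB, the size used in this test
teragen_rows 1000   # row count for 1 TB
```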

 

Results:

The terasort command took 42.787s.
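
To compare this against other clusters or data sizes, the elapsed time can be turned into a sort rate. The sketch below assumes 100-byte TeraGen rows and decimal megabytes; sort_rate_mb is a hypothetical helper name.

```shell
# sort_rate_mb ROWS SECONDS   (hypothetical helper)
# Prints the sort rate in MB/s, assuming 100-byte rows.
sort_rate_mb() {
  awk -v rows="$1" -v sec="$2" \
    'BEGIN { printf "%.2f\n", rows * 100 / 1000000 / sec }'
}

# 10000000 rows sorted in 42.787 seconds, as measured above.
sort_rate_mb 10000000 42.787
```

For this run the rate is about 23 MB/s. With only 1 GB of input, job startup accounts for a noticeable share of the elapsed time, so a larger dataset would give a more representative number.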
