BigFuzz: Efficient Fuzz Testing for Data Analytics using Framework Abstraction
Published in ASE 20, 2020
Recommended citation: Qian Zhang, Jiyuan Wang, Muhammad Ali Gulzar, Rohan Padhye, Miryung Kim. (2020). "BigFuzz: Efficient Fuzz Testing for Data Analytics using Framework Abstraction." ASE 2020. 1(2). https://conf.researchr.org/details/ase-2020/ase-2020-papers/86/BigFuzz-Efficient-Fuzz-Testing-for-Data-Analytics-using-Framework-Abstraction
We propose BigFuzz, a novel coverage-guided fuzz testing tool for big data analytics. Our approach rests on two ideas: (a) we focus on exercising application logic, as opposed to increasing framework code coverage, by abstracting the data-intensive scalable computing (DISC) framework using specifications; BigFuzz performs automated source-to-source transformations to construct an equivalent DISC application suitable for fast test generation; and (b) we design schema-aware data mutation operators based on our in-depth study of DISC application error types. BigFuzz speeds up fuzzing by 78X to 1477X compared to random fuzzing, improves application code coverage by 20% to 271%, and achieves a 33% to 157% improvement in detecting application errors. Compared to the state of the art, which uses symbolic execution to test big data analytics, BigFuzz is applicable to twice as many programs and finds 81% more bugs.
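To make the idea of schema-aware mutation concrete, the following is a minimal sketch of how mutation operators might respect or deliberately violate a row schema, rather than flipping arbitrary bytes. It is not BigFuzz's implementation: the schema, the operator names (`change_value`, `break_type`, `change_delimiter`, `drop_column`, `empty_column`), and the comma-separated row format are all assumptions made for illustration.

```python
import random

# Hypothetical schema for a comma-separated input row: (column name, type).
SCHEMA = [("zip", int), ("age", int), ("city", str)]


def change_value(row, col, rng):
    """Replace one column with a new value of the SAME type (row stays well-typed)."""
    fields = row.split(",")
    if SCHEMA[col][1] is int:
        fields[col] = str(rng.randint(-10**6, 10**6))
    else:
        fields[col] = "".join(rng.choice("abcxyz") for _ in range(rng.randint(0, 5)))
    return ",".join(fields)


def break_type(row, col):
    """Violate the declared type of an int column by making it look like a float."""
    fields = row.split(",")
    if SCHEMA[col][1] is int:
        fields[col] = fields[col] + ".5"
    return ",".join(fields)


def change_delimiter(row, col):
    """Replace the delimiter that follows column `col` (requires col < last column)."""
    fields = row.split(",")
    return ",".join(fields[: col + 1]) + "~" + ",".join(fields[col + 1 :])


def drop_column(row, col):
    """Remove a column so the row has fewer fields than the schema expects."""
    fields = row.split(",")
    del fields[col]
    return ",".join(fields)


def empty_column(row, col):
    """Blank out a column while keeping the column count intact."""
    fields = row.split(",")
    fields[col] = ""
    return ",".join(fields)


def mutate(row, rng):
    """Apply one randomly chosen operator to one randomly chosen column."""
    col = rng.randrange(len(SCHEMA) - 1)  # skip last column so change_delimiter is valid
    ops = [
        lambda: change_value(row, col, rng),
        lambda: break_type(row, col),
        lambda: change_delimiter(row, col),
        lambda: drop_column(row, col),
        lambda: empty_column(row, col),
    ]
    return rng.choice(ops)()
```

In a coverage-guided loop, a mutated row such as `mutate("90095,20,LA", random.Random(0))` would be fed to the application under test and retained as a new seed only if it reached previously uncovered application code. That feedback loop is omitted here.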