Programming Languages & Software Engineering

Programming Languages & Software Engineering

Empirically Revisiting the Test Independence Assumption Sai Zhang, Darioush Jalali, Jochen Wuttke, Kvan Mulu, Wing Lam, Michael D. Ernst, David Notkin University of Washington Order dependence Dependent test Two tests: createFile(foo) ... readFile(foo) ... Executing them in default order: (the intended test results)

Executing them in a different order: 2 Visible test result rather than internal program state Use the default execution order as baseline Dependent test A test that yields a different test result than Executing them in default order: the(testdefault result results by design) in a reordered subsequence Executing

them in different orders: of the original test suite. createFile(foo) ... readFile(foo) ... Execute real tests rather than contrived ones 3 Why should we care about test dependence? Makes test behaviors inconsistent Affects downstream testing techniques Test prioritization Test parallelization Test selection

CPU 1 CPU 2 4 Conventional wisdom: test dependence is not a significant issue Test independence is assumed by: Test selection Test prioritization Test parallel execution Test factoring Test generation

31 papers in ICSE, FSE, ISSTA, ASE, ICST, TSE, and TOSEM (2000 2013) 5 Conventional wisdom: test dependence is not a significant issue Consider test dependence Test independence is assumed by:

Test selection Test prioritization Test parallel execution Test factoring Test generation Assume test independence without justification 1 As a threat to validity 3 31 papers in ICSE, FSE, ISSTA, ASE, ICST, TSE,

27 and TOSEM (2000 2013) 6 Is the test independence No! assumption valid? Does test dependence arise in practice? both human-written and automatically-genera What repercussions does test dependence have? onsistent results: missed alarms and false alar ffecting downstream testing techniques How to detect test dependence? of: the general problem is NP-complete proximate algorithms based on heuristics work 7

Is the test independence No! Implications: assumption valid? Does test dependence arise in practice? both human-written and automatically-genera Test independence should no longer be assumed What repercussions does test dependence have? onsistent results: missed alarms and false alar ffecting downstream testing techniques How to detect test dependence?

New challenges in designing of: the general problem is NP-complete testing techniques proximate algorithms based on heuristics work 8 Is the test independence assumption valid? Does test dependence arise in practice? both human-written and automatically-genera What repercussion does test dependence have ? onsistent results: missed alarms and false alar ffecting downstream testing techniques How to detect test dependence? general problem is NP-complete proximate algorithms based on heuristics work 9

Methodology Reported dependent tests 5 issue tracking systems New dependent tests 4 real-world projects 10 Methodology Reported dependent tests 5 issue tracking systems Search for 4 key phrases: (dependent test, test dependence,

test execution order, dierent test outcome)erent test outcome Manually inspect 450 matched bug reports Identify 96 distinct dependent tests Characteristics: Manifestation Root cause Developers action 11 Manifestation Number of tests involved to yield a different result

(default order) (run in isolation) #Tests = 1 (run after another) #Tests = 2 12 Manifestation Number of tests involved to yield a different result 96 dependent tests 13 Manifestation Number of tests involved to yield a different result

#Tests = 1 Unknown 6 15 #Tests = 3 2 82% can be revealed by no more than 2 tests 73 #Tests = 2 14 Root cause 96 dependent tests

15 Root cause Unknown 23 database 10 4 59 at least 61% are due to side-eectingecting access to static variables. static variable file system

16 Developers action 98% of the reported tests are marked as major or minor issues 91% of the dependence has been fixed Improving documents Fixing test code or source code 17 Methodology New dependent tests Human-written test suites 4176 tests 29 dependent tests

Automatically-generated test suites use Randoop [Pacheco07] 6330 tests 354 dependent tests Ran dependent test detection algorithms (details later) 4 real-world projects 18 Characteristics Manifestation: number of tests to yield a different result 29 manual dependent tests 19

Characteristics Manifestation: number of tests to yield a different result #Tests = 3 #Tests = 2 4 2 29 manual dependent tests 354 auto-generated dependent tests 23 #Tests= 1 20 Characteristics

Manifestation: number of tests to yield a different result #Tests = 3 #Tests = 2 #Tests 2 4 2 29 manual dependent tests 186 168 23 #Tests= 1 #Tests = 1 21

Characteristics Manifestation: number of tests to yield a different result #Tests = 3 #Tests = 2 #Tests 2 4 2 29 manual dependent tests 186 168 23 #Tests= 1 #Tests = 1

Root cause All because of side-effecting access of static variables 22 Developers actions Confirm all manual dependent tests tests should always stand alone, that is test engineering 101 Merged two tests to remove the dependence Opened a bug report to fix the dependent test Wont fix the dependence, since it is due to the library design 23 Is the test independence assumption valid? Does test dependence arise in practice? both human-written and automatically-genera What repercussion does test dependence have ?

onsistent results: missed alarms and false alar ffecting downstream testing techniques How to detect test dependence? general problem is NP-complete proximate algorithms based on heuristics work 24 Reported dependent tests 96 dependent tests 5 issue tracking systems 25 Reported dependent tests Missed alarms 2

96 dependent tests 94 5 issue tracking systems False alarms 26 Example false alarm void testDisplay() { //create a Display object //dispose the Display object } void testShell() { //create a Display object }

In Eclipse, only one Display object is allowed. In default order: testDisplay In a non-default order: testShell testShell testDisplay Led to a false bug report that took developers 3 months to resolve. 27 Example missed alarm public final class OptionBuilder { static String argName = null; static void reset() { argName = arg; }

Need to be set to arg before a client calls any method in the class. } BugTest.test13666 validates correct behavior. This test should fail, but passes when running in the default order Another test calls reset() before this test Hid a bug for 3 years. 28 Example missed alarm public final class OptionBuilder { static String argName = null; static void reset() {

argName = arg; } Need to be set to arg before a client calls any method in the class. } BugTest.test13666 validates correct behavior. This test should fail, but passes when running in the default order Another test calls reset() before this test Hid a bug for 3 years. 29

Example missed alarm public final class OptionBuilder { static String argName = null; static void reset() { } static { Bug argName = arg; fix } } Need to be set to arg before a client calls any method in the class. BugTest.test13666 validates correct behavior. This test should fail,

but passes when running in the default order Another test calls reset() before this test Hid a bug for 3 years. 30 Test prioritization A test execution order A new test execution order Achieve coverage faster Improve fault detection rate Each test should yield the same result. 31

Five test prioritization techniques [Elbaum et al. ISSTA 2000] Test prioritization technique Randomized ordering Prioritize on coverage of statements Prioritize on coverage of statements not yet covered Prioritize on coverage of methods Prioritize on coverage of methods not yet covered 4 real-world projects Total: 4176 manual tests Record the number of tests yielding different results 32 Evaluating test prioritization techniques Test prioritization technique Number of tests that

yield different results Randomized ordering 12 Prioritize on coverage of statements 11 Prioritize on coverage of statements not yet covered 17 Prioritize on coverage of methods 11 Prioritize on coverage of methods not yet covered 12

Implication: Total: 4176 manual tests Existing techniques are not aware of test dependence 33 Is the test independence assumption valid? Does test dependence arise in practice? both human-written and automatically-genera What repercussion does test dependence have ? onsistent results: missed alarms and false alar ffecting downstream testing techniques How to detect test dependence? general problem is NP-complete

proximate algorithms based on heuristics work 34 General problem of test dependence detection A test suite All dependent tests NP-Complete Proof: reducing the Exact Cover problem to the dependent test detection problem 35 Detecting dependent tests in a test suite A test suite

All dependent tests Approximate algorithms Reversal algorithm Randomized execution Exhaustive bounded algorithm Dependence-aware bounded algorithm All algorithms are sound but incomplete 36 Approximate algorithms by heuristics

Reversal algorithm Randomized execution Exhaustive bounded algorithm Dependence-aware bounded algorithm Intuition: changing order of each pair may expose dependences 37 Approximate algorithms by heuristics Reversal algorithm Randomized execution Exhaustive bounded algorithm

Dependence-aware bounded algorithm Shuffle the execution order multiple times 38 Approximate algorithms by heuristics Reversal algorithm Randomized execution Exhaustive bounded algorithm Dependence-aware bounded algorithm Executes all k-permutations for a bounding parameter k k= 2

Most dependent tests can be found by running short test subsequences (82% of the dependent tests are revealed by no more than 2 tests) Approximate algorithms by heuristics Reversal algorithm Randomized execution Exhaustive bounded algorithm Dependence-aware bounded algorithm x read y write

write k= 2 Record read/write info for each test Filter away unnecessary permutations Evaluating approximate algorithms Finding New dependent tests Human-written test suites 4176 tests 29 dependent tests Automatically-generated test suites use Randoop [Pacheco07] 6330 tests 354 dependent tests 4 real-world projects

41 Evaluating approximate algorithms 400 300 200 100 Re ve rs al R an do m iz ed kbo D

ep un en de de d nc eAw ar e 0 Shuffle 1000 times 100000000 10000000 1000000 100000 10000 1000

100 10 Estimated cost Actual cost Re ve rs R al an do m iz ed kD bo ep

un en de de d nc eAw ar e Number of dependent tests Time cost (seconds) k=2 (did not finish for some programs) 42 Evaluating approximate algorithms 400 300

200 100 Re ve rs al R an do m iz ed kbo D ep un en de de d

nc eAw ar e 0 Time cost (seconds) 100000000 10000000 1000000 100000 10000 1000 100 10 Re ve rs R al

an do m iz ed kD bo ep un en de de d nc eAw ar e Number of dependent tests CheapFind

andDetects detects halfmost of the dependent tests! all dependences the dependent within a bound, tests. but computationally infeasible. 43 Related work Existing definitions of test dependence Based on program state change [Kapfhammer03] Informal definitions [Bergelson06] Our definition focuses on the concrete test execution result.

Program state change may not affect test execution result. Flaky tests [Luo et al14, Google testing blog] Tests revealing inconsistent results Dependent test is a special type of flaky test. Tools supporting to execute tests in different orders JUnit 4.1: executing tests in alphabetical order by name DepUnit, TestNg: supporting specifying test execution order Do not support detecting test dependence. 44 Contributions Revisiting the test independence assumption Test dependence arises in practice Test dependence has non-trivial repercussions

Test dependence detection is NP-complete Heuristic algorithms are effective in practice Test independence should no longer be assumed! Our tool implementation http://testisolation.googlecode.com 45 [Backup slides] 46 Why not run each test in a separate process? Implemented in JCrasher Supported in Ant + JUnit Unacceptably high overhead 10 138 X slowdown Recent work merges tests running in separate processes

into a single one [Bell & Kaiser, ICSE 2014] 47 Why more dependent tests in automatically-generated test suites? Manual test suites: Developers understanding of the code and their testing goals help build well-structured tests Developers often try to initialize and destroy the shared objects each unit test may use Auto test suites: Most tools are not state-aware The generated tests often misuse APIs, e.g., setting up the environment incorrectly Most tools can not generate environment setup / destroy code 48 What is the default test execution order?

The intended execution order as designed Specified by developers Such as, in make file, ant file, or TestAll.java Lead to the intended results as developers want to see 49 Dependent tests vs. Nondeterministic tests Nondeterminism does not imply dependence A program may execute non-deterministically, but its tests may deterministically succeed. Test dependence does not imply nondeterminism A program may have no sources of nondeterminism, but its tests can still be dependent on each other 50 Controlled Regression Testing Assumption (CRTA) [Rothermel et al., TSE 1996] A stronger assumption than determinism, forbidding:

Porting to another system Nondeterminism Time-dependencies Interaction with the external environment (implicitly) test dependence The authors commented CRTA is not necessarily impossible to employ. Our paper has a more practical focus on the overlooked issue of test dependence 51

Recently Viewed Presentations

  • Sabbath Questions

    Sabbath Questions

    Test each new campaign element added to gauge audience appeal. IDEA . Seek to appeal to different age groups, genders. FOLLOW-UP . Timeframe still to be determined. Resume commercials (pre-paid) Establish a better web presence, platform, & analytics access. Run...
  • Findings from the INANE Member Survey on Student Papers ...

    Findings from the INANE Member Survey on Student Papers ...

    by Janice E Hawkins; June 2015, 25(2) Converting a DNP Scholarly Project into a Manuscript . by Heather Carter-Templeton, March 2015, 25, (1) Student Faculty Authorship: Challenges and Solutions . by Jessica Nishikawa, Estelle Codier, Debra Mark, & Maureen Shannon;...
  • Macbeth Act II - Woodland Hills School District

    Macbeth Act II - Woodland Hills School District

    Lady Macbeth's Soliloquy (Mr. Baun Version) Come to my breast, and replace the milk with envy and hatred. They will fuel my murderous plot. Make me like other cold-blooded killers, who secretly serve evil forces.. Please give me the darkest...
  •  Context Free Grammars  Definition of a grammar G

    Context Free Grammars Definition of a grammar G

    Context Free Grammars Definition of a grammar G Deriving strings and defining L(G) Context-Free Language definition * * Context-Free Grammars Definition * Definition A context-free grammar G = (V, Σ, S, P) V: finite set of variables (nonterminals) Σ: finite...
  • Focusing on Google Plug-ins & Csiszer, M. ED.

    Focusing on Google Plug-ins & Csiszer, M. ED.

    GoAnimate for Schools. This is a great animated-video creation tool. Students can create their own videos with drag-and-drop, then add your music and backgrounds and engage your students! Would be great to use as an alternate assessment tool.
  • MAKING COPY SHINE WITH EDITING (Lessons 4-7) Sabrina

    MAKING COPY SHINE WITH EDITING (Lessons 4-7) Sabrina

    Lesson 4: Quotes and Transitions. Objectives - In this lesson you will: ... Blues Brothers 2000 has a PG 13 rating and is showing at Westport cinema. 9. I had a bag of chips and a coca-cola for lunch. 10....
  • Berkeley County Schools Principals Meeting Tuesday, September 3,

    Berkeley County Schools Principals Meeting Tuesday, September 3,

    WVEIS Training for Secretaries (October 14, February 17 and April 21) Berkeley County Schools . 401 South Queen Street . Martinsburg, WV 25401. Principal Workshops. STAR Workshop for Principals at Aux. Meeting Room on Wednesday, September 11.
  • Mario Kart Wii by Nintendo - University of Michigan

    Mario Kart Wii by Nintendo - University of Michigan

    Mario Kart is a game worth purchasing (especially with the inclusion of the Wii Wheel) Overall Strengths. 1 to 4 players. The ability to throw objects at the leader to give you a chance to come from behind. Courses reflect...