Laboratory methods: A review - Stony Brook

Experimental methods: A review
(Source: Tombaugh & Dillon's A Practical Introduction to Experimental Design in CHI Research, 1993)

Empirical methods in HCI
- Heuristic evaluation
- Controlled laboratory experiments
- Quasi-experiments
- Ethnographic observation
- Task analysis
- User studies/user testing

Experiment design is similar to user interface design
- Iterative: conceptualize, pilot, improve, run
- The perfect experiment does not exist!
- Experiments involve tradeoffs:
  - cost vs. running the ideal experiment
  - saving time vs. running the ideal experiment
  - sometimes there are conflicting guidelines

What IS an experiment?
- All experiments have control conditions (comparisons are made)
- Hypotheses are tested
- Assignment is random (if assignment is not random, then it's a quasi-experiment)
- In the successful experiment, conclusions about causality can be made (no bias or confounds)
- Can be replicated!!

Criteria for expt design

- External validity
- Internal validity
- Reliability

Overview, experiment design
- Determine research problem
- Design experiment (consider validity, etc.)
- Pilot!!
- Is design ok? If NO, revise the design and pilot again; if YES, collect data

Choices in experiment design

- Variables (to manipulate and measure)
- Design (within-Ss, between-Ss, mixed)
- Controls (what will be compared)
- Sample (how subjects will be chosen)
- Task (many considerations)
- Stimuli

Variables
- IV: Independent variable (what is manipulated; also called a factor or treatment)
- DV: Dependent variable (what is measured)
- Extraneous variable: any variable, other than the IV, that might affect the DV
- Confound: an extraneous variable that covaries with the IV

Confounds
Example from handout:

- Compare Screen A vs. Screen B
- Screen A is used in a room w/ windows
- Screen B is used in a room w/out windows
- What can you conclude if performance is better with Screen B than Screen A?
Another example: the dreaded "end of the semester" effect

Between subjects designs
- Each person is tested in one condition
- Avoiding confounds:
  - Subjects are randomly assigned to conditions.
  - If individual differences are likely to be important, subjects can be matched on important characteristics.

Within subjects designs (also known as repeated measures)
- Each person is tested in all conditions
- This avoids effects of individual differences!
- Order of conditions is randomized or counterbalanced
- But you can get unexpected order effects!

Advantages of Within-Ss
- fewer subjects needed
- statistical tests are more powerful
- control for individual differences
- best way to study learning or forgetting or the effects of expertise (longitudinal designs)

Disadvantages of Within-Ss
- Order effects can ruin the results (practice, fatigue, learning, boredom); counterbalancing is necessary!
- More testing materials are required.
- Order of presentation of materials must be controlled (counterbalanced).
- It can be difficult to get subjects to return for repeated testing.
(Between-Ss designs are an alternative.)
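The two assignment schemes described above (random assignment for between-Ss designs, counterbalanced orders for within-Ss designs) can be sketched in a few lines of Python. This is a minimal illustration, not a procedure from the source: the function names and subject IDs are hypothetical, and full counterbalancing (every possible order) is assumed, which is only practical for a small number of conditions.

```python
import random
from itertools import permutations


def random_assignment(subjects, conditions, seed=None):
    """Between-Ss: shuffle the subjects, then deal them round-robin into conditions."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, subject in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups


def counterbalanced_orders(subjects, conditions, seed=None):
    """Within-Ss: every subject gets all conditions, in one of the possible orders.

    Orders are used equally often when the number of subjects is a multiple
    of the number of possible orders.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # so the order a subject gets is not tied to sign-up order
    orders = list(permutations(conditions))  # all possible condition orders
    return {subject: orders[i % len(orders)] for i, subject in enumerate(shuffled)}


subjects = [f"S{i}" for i in range(1, 9)]  # hypothetical subject IDs
conditions = ["Screen A", "Screen B"]      # labels borrowed from the confound example

groups = random_assignment(subjects, conditions, seed=1)
orders = counterbalanced_orders(subjects, conditions, seed=1)

for condition, members in groups.items():
    print(condition, members)
for subject in subjects:
    print(subject, "->", " then ".join(orders[subject]))
```

With 8 subjects and 2 conditions, each between-Ss group gets 4 people and each within-Ss order is used 4 times. With more conditions the number of possible orders grows quickly, so a partial scheme such as a Latin square is a common substitute.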

Mixed designs
- Combine between- and within-subjects comparisons
- One or more comparisons are between two groups of different people
- One or more comparisons are within the same group of people

What is an interaction?
Example: Which type of dialog is better, commands or menus?
(When the answer is "it depends!", that suggests an interaction.)

Example of an interaction

                 DIALOG STYLE
USER        Command    Menu
Novice         42        28
Expert         16        20
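One way to make "it depends" concrete is to compute the effect of dialog style separately for each user group and then take the difference of those differences. The sketch below uses the four cell values from the table above, read as task times (lower is better). The variable and function names are illustrative, and no significance test is implied.

```python
# Cell values from the example table (task time; units are not given in the source).
times = {
    ("Novice", "Command"): 42,
    ("Novice", "Menu"): 28,
    ("Expert", "Command"): 16,
    ("Expert", "Menu"): 20,
}


def simple_effect(user):
    """Command time minus Menu time for one user group (positive means menus are faster)."""
    return times[(user, "Command")] - times[(user, "Menu")]


novice_effect = simple_effect("Novice")
expert_effect = simple_effect("Expert")
interaction_contrast = novice_effect - expert_effect  # difference of differences

print("Novice: Command - Menu =", novice_effect)   # 14
print("Expert: Command - Menu =", expert_effect)   # -4
print("Interaction contrast   =", interaction_contrast)  # 18
```

The two simple effects have opposite signs, so neither dialog style is better overall. A contrast of zero would mean the effect of dialog style is the same for both groups, that is, no interaction.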

(Here, the DV is the time it takes to do the task. The IVs are Dialog Style and User's Expertise.)

Interaction
For example: Which type of dialog is better, commands or menus?
Answer: Commands are faster for experts & menus are faster for novices.
(If the answer is "it depends", that is, it depends on levels of another independent variable such as expertise, then there is an interaction.)

[Figure: two plots of the same data: Time (axis 0 to 45) for Command vs. Menu, with separate Novice and Expert series]
(Which representation is better?)

Between vs within vs mixed designs

Dialog Style as a between-Ss variable (a different group in each cell):

                 DIALOG STYLE
USER        Command    Menu
Novice       10 Ss     10 Ss
Expert       10 Ss     10 Ss

Dialog Style as a within-Ss variable (each group is tested with both styles):

                 DIALOG STYLE
USER        Command    Menu
Novice           10 Ss
Expert           10 Ss
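The two layouts above differ in how many people must be recruited. A minimal sketch of the arithmetic, assuming 10 Ss per group as in the tables; the function name and the dictionary format are hypothetical conveniences, not part of the source.

```python
from math import prod


def subjects_needed(design, ss_per_group):
    """Total subjects = (number of between-Ss cells) x (subjects per group).

    `design` maps each IV to (number_of_levels, "between" or "within").
    Within-Ss variables add conditions per person, not people.
    """
    between_cells = prod(levels for levels, kind in design.values() if kind == "between")
    return between_cells * ss_per_group


all_between = {"Dialog Style": (2, "between"), "User Expertise": (2, "between")}
mixed = {"Dialog Style": (2, "within"), "User Expertise": (2, "between")}

print(subjects_needed(all_between, 10))  # 40
print(subjects_needed(mixed, 10))        # 20
```

Making Dialog Style a within-Ss variable halves the number of subjects here, at the price of the order effects listed earlier.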

(What are the advantages and disadvantages of having Dialog Style as a within-Ss vs. a between-Ss variable?)

What can go wrong
Two types of errors:
- Type 1: Your data show a statistical effect, but it's not real.
- Type 2: Your data fail to show any statistical effect, but the effect is out there in the world.
Avoid Type 1 errors by replicating your effects. Avoid Type 2 errors by increasing your power.

Power
- Run more subjects
- Include more observations/items/tasks
- Eliminate noise (reduce variance)
- Try to achieve better control
(These pairs of distributions show the same differences in means, but very different variances.)

Common ways to reduce variance and increase power
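Why sample size and variance matter for power can be sketched numerically. The function below is a rough approximation, not a method from the source: it assumes two independent groups of equal size, normally distributed scores with a known, equal standard deviation, and a two-sided z-test at alpha = 0.05. The mean difference, SDs, and group sizes are made-up numbers chosen only to show the direction of the effect.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(mean_diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test (assumes known, equal SDs)."""
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n_per_group)
    shift = mean_diff / standard_error  # true difference in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Same difference in means (5 units), different variances and sample sizes.
for sd in (5.0, 15.0):
    for n in (10, 40):
        power = approx_power(mean_diff=5.0, sd=sd, n_per_group=n)
        print(f"sd={sd:>4}, n per group={n:>2}: power ~ {power:.2f}")
```

For a fixed difference in means, power rises as the SD shrinks or as the number of subjects grows, and 1 minus power is the Type 2 error rate. When the true difference is zero, the function returns alpha, the Type 1 error rate.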

- Random assignment
- Minimize differences in subjects & items
- Try to use a within-Ss design
- Match subjects' characteristics
- Remove confounds by making the comparison groups or items similar
- Counterbalance

Quasi-Experiments
When no random assignment is possible, e.g. in the study of:
- Gender effects
- Bilingualism
- Experts vs. novices (unless expertise can be acquired in the course of the expt.)
You cannot conclude anything about causality in a quasi-expt, due to potential confounds!
Many HCI expts are quasi-expts.

When is it appropriate to do an expt?
- when a direct comparison of two or more systems or variables is required
- when it's feasible to achieve some measure of control
- when you want to show causality
- when you want to test predictions; when precise understanding is needed
- when you need data for establishing the parameters of a model

Advantages of experiments

- provide comparative data
- enable strong statements of causality
- a wide variety of designs are available
- excellent conceptual match between experimental problems and statistical tests (always know how you're going to analyze the data before you collect it!!)

Limitations of experiments
Experiments often
- are time consuming
- are expensive
- can be ineffective for comparing complex systems (what is causing differences?)
- can result in weak generalization (if the task is overly simplified, or if materials and setting aren't varied enough)

R&D (Research vs. Development)
- Scientific research is strongly associated with experiments (hypothesis-testing)
- Science also includes a descriptive component
- Discuss: Applied vs. basic research; development

Summary
- Designing an expt involves many tradeoffs
- There is no perfect expt.
- Pilot! (Expt design is iterative!)
- Applied research makes tradeoffs differently than academic research: it's more timely, more generalizable, more descriptive, less controlled, and more relevant to real-world problems.

A note about questionnaires:

Questionnaire design - critique!
Sample questionnaire, CGB, Fig. 14.3
- Overall, the system was easy to use. (strongly disagree to strongly agree) [Biased!]
- The system was quick and efficient.
- The system had the capabilities I expected.
- What are the top 2 suggestions you could make to improve the system?
- What are the top 3 things you liked about the system?

Review of user studies (Gomoll's article)

User studies (Gomoll, 1990)
- Set up observation (tasks, users, situation)
- Describe the evaluation's purpose
- Tell user she can quit at any time
- Introduce equipment
- Explain how to think aloud
- Explain that you will not provide help
- Describe tasks and system
- Ask for questions
- Conduct the observations (debrief the subject)
- Summarize results
