Why SDC is important - - Government Statistical Service


Applying Cell-Key Perturbation to 2021 Census Outputs
Iain Dove, 12/07/17
Disclosure Control Branch

Overview
Statistical Disclosure Control
What happened for Census 2011
What could happen for Census 2021
The cell-key method (cell perturbation)
Benefits & trade-offs

Why SDC is important
The Statistics and Registration Service Act (SRSA, 2007) defines personal information as information that identifies a particular person if the identity of that person
(a) is specified in the information,
(b) can be deduced from the information, or
(c) can be deduced from the information taken together with any other published information.
It is a criminal offence to release personal information.

Disclosure Control Aims
Comply with relevant legislation (SRSA)
Protect confidentiality of responses
Preserve utility of data
Maintain trust in ONS and official statistics

Census 2011
Targeted record swapping applied

Record Swapping
[Diagram: households A, B and C across Output Area A and Output Area B, with records labelled "Age: 30-34, Ethnic group: Other" and "Age: 30-34, Ethnic group: Mixed"]

1) Identify risky or unique households
2) Find a nearby match for that household
3) Swap the records

Targeted Record Swapping
Swap risky households between geographies:

Before:

Output area 1   15-24   25-34   35-44
White              23      18      20
Mixed               3       0       2
Other               0       0       1

Output area 2   15-24   25-34   35-44
White              25      19      23
Mixed               2       3       4
Other               1       0       1

After:

Output area 1   15-24   25-34   35-44
White              23      18      20
Mixed               3       0       3
Other               0       0       0

Output area 2   15-24   25-34   35-44
White              25      19      23
Mixed               2       3       3
Other               1       0       2

Census 2011
Targeted record swapping applied
Tables had to be checked
Disclosive tables had to be redesigned

Differencing
Different sources of information can be combined to aid disclosure. Two tables for the same area with slightly different age bands can be differenced to give counts for single years of age:

Output area 1    0-4    5-9   10-14
White             12     15      20
Mixed              3      2       2
Other              2      0       1

Output area 1    0-5   6-10   11-15
White             15     14      20
Mixed              4      1       2
Other              2      0       2

Difference:

Output area 1      5     10      15
White              3      2       2
Mixed              1      0       0
Other              0      0       1

Census 2021
Investigating several methods of protection, including use of targeted record swapping plus cell perturbation
Tables would be available from an online system
Little or no checking would be needed

The Cell-key Method

1) Assign each record a random number (record-key)

Record   r1   r2    r3   ...   rN
Rkey     54   104   93   ...   26

2) Create a frequency table. For each cell, sum record keys and take the modulo to get the cell key

Age by Sex   0-15   16-24   25-34
Male            .       .       .
Female          .       4       .

Records in the cell:

Record   Rkey
r2        104
r4         61
r56         7
r72        90

Sum Rkey = 262
Cell key = 262 mod 200 = 62

3) Use perturbation table to get perturbation value from cell value and cell key

[Ptable: rows are cell keys (1-200), columns are cell values (1, 2, 3, 4, 5, ...), entries are perturbation values such as +1 and -1. In the example, cell value 4 with cell key 62 gives a perturbation of +1.]

4) Apply the chosen perturbation to the cell

Age by Sex   0-15   16-24   25-34
Male            .       .       .
Female          .       5       .

Differencing after perturbation
If perturbation has been applied, differences could either be real, or introduced at random

Output area 1    0-4    5-9   10-14
White             13     15      20
Mixed              3      2       2
Other              2      0       1

Output area 1    0-5   6-10   11-15
White             15     14      20
Mixed              4      1       1
Other              2      0       2

Difference:

Output area 1      5     10      15
White              2      2       2
Mixed              1      0      -1
Other              0      0       1

Benefits
Protection from both record swapping and cell key method:
Easier access to data
Much earlier access to data
More flexibility

Trade Offs
Some inconsistencies between different tables:
1) High geography table vs aggregate of low geography tables
2) The same marginal totals appearing in different tables

Example of Inconsistencies
Between high and low geographies

At Local Authority level (columns are occupation groups 1-9):

Ethnicity       1      2      3      4      5      6      7      8      9   Total
White        3885   6014   4325   5726   5202   4172   4394   4282   6784   44784
Mixed          27     55     64     44     47     53     70     34     70     464
Asian         198    357    134    168    171    135    292    233    294    1982
Black          23     93     42     49     34     76     44     38     97     496
Chinese        23     40     11      6     25     15     23     15     33     191
Total        4156   6559   4576   5993   5479   4451   4823   4602   7278   47917

Aggregated from Output area level (columns are occupation groups 1-9):

Ethnicity       1      2      3      4      5      6      7      8      9   Total
White        3886   6012   4324   5730   5205   4171   4391   4276   6784   44779
Mixed          27     57     66     43     48     52     68     35     70     466
Asian         196    359    133    166    172    137    292    231    291    1977
Black          23     93     41     48     33     76     44     41     94     493
Chinese        22     40     11      5     25     15     24     15     32     189
Total        4154   6561   4575   5992   5483   4451   4819   4598   7271   47904

Example of Inconsistencies
Between marginal totals in different tables

Ethnicity    0-15   16-24   25-34   35-44   45-54   55-64   65-74    75+   Total
White       10146    6946    6895    7810    8378    7518    5612   4856   58161
Mixed         420     186     145      87      83      37      15     13     986
Asian        1179     679     822     581     393     227     111     54    4046
Black         233     106     158     178      96      29      26     22     848
Other         196     635     129      60      37      20       8      4    1089
Total       12174    8552    8149    8716    8987    7831    5772   4949   65130

Marital status    0-15   16-24   25-34   35-44   45-54   55-64   65-74    75+   Total
Single           12176    8313    5249    2886    1517     613     308    274   31336
Married              0     207    2439    4468    5355    5312    3769   2030   23580
Separated            0      11     241     403     432     223     105     38    1453
Divorced             0      13     199     914    1521    1257     659    251    4814
Widowed              0       8      20      44     160     426     930   2357    3945
Total            12176    8552    8148    8715    8985    7831    5771   4950   65128

Trade Offs
Small inconsistencies between different tables:
High geography table vs aggregate of low geography tables
The same marginal totals appearing in different tables
Aim to provide:
Population counts unperturbed
Tables at higher geography unperturbed

Summary and next steps
Use of the cell key method could allow better and earlier access to data
Does this outweigh the inconsistencies?
Further testing to be carried out, including trial implementation
Gather as much user feedback as possible on the proposed trade-offs
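The four cell-key steps described in the slides can be sketched in Python. This is a minimal illustration, not the ONS implementation: the function names, the 1-200 range for record keys, the mapping of a zero remainder to 200, and the one-entry perturbation table are all assumptions made for the example. Only the record keys (104, 61, 7, 90), the modulus of 200 and the 4 to 5 outcome come from the slides.

```python
import random

KEY_RANGE = 200  # the slides use cell keys in the range 1-200


def assign_record_keys(record_ids, seed=0):
    """Step 1: give every record a random record key (range 1-200 is an assumption)."""
    rng = random.Random(seed)
    return {rid: rng.randint(1, KEY_RANGE) for rid in record_ids}


def cell_key(record_keys):
    """Step 2: sum the record keys in a cell and reduce modulo KEY_RANGE."""
    key = sum(record_keys) % KEY_RANGE
    # Assumption: a remainder of 0 is mapped to 200 so keys stay in 1-200.
    return key if key != 0 else KEY_RANGE


def perturb(cell_value, key, ptable):
    """Steps 3-4: look up the perturbation and apply it (0 if there is no entry)."""
    return cell_value + ptable.get((cell_value, key), 0)


# Worked example from the slides: four records fall in the Female, 16-24 cell.
cell_records = {"r2": 104, "r4": 61, "r56": 7, "r72": 90}
key = cell_key(cell_records.values())            # 262 mod 200 = 62
ptable = {(4, 62): +1}                           # hypothetical one-entry ptable
perturbed = perturb(len(cell_records), key, ptable)
print(key, perturbed)                            # 62 5
```

Because the cell key depends only on which records are in the cell, the same group of records produces the same key, and so the same perturbation, every time this sketch is run on it.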
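The differencing arithmetic on the two Output area 1 tables can also be written out. The sketch below assumes both tables cover the same population and start at age 0, so subtracting cumulative counts isolates ages 5, 10 and 15. The function name `difference` is hypothetical; the counts are the ones shown in the slides.

```python
from itertools import accumulate

# Unperturbed tables from the Differencing slide.
bands_a = {"White": [12, 15, 20], "Mixed": [3, 2, 2], "Other": [2, 0, 1]}  # 0-4, 5-9, 10-14
bands_b = {"White": [15, 14, 20], "Mixed": [4, 1, 2], "Other": [2, 0, 2]}  # 0-5, 6-10, 11-15


def difference(row_a, row_b):
    """Counts at ages 5, 10 and 15: cumulative(0-5, 0-10, 0-15) minus cumulative(0-4, 0-9, 0-14)."""
    return [b - a for a, b in zip(accumulate(row_a), accumulate(row_b))]


single_ages = {eth: difference(bands_a[eth], bands_b[eth]) for eth in bands_a}
print(single_ages)
# {'White': [3, 2, 2], 'Mixed': [1, 0, 0], 'Other': [0, 0, 1]}

# Mixed row after perturbation, as shown on the later slide.
print(difference([3, 2, 2], [4, 1, 1]))
# [1, 0, -1]
```

The negative count in the perturbed case shows why a difference can no longer be read as a real count: it may have been introduced by the perturbation.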
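To gauge the size of the inconsistency between the high-geography table and the aggregate of low-geography tables, the Total rows of the two occupation tables can be compared column by column. This is only arithmetic on the figures shown in the slides; the variable names are illustrative.

```python
# Total rows for occupation groups 1-9, taken from the two example tables.
la_totals = [4156, 6559, 4576, 5993, 5479, 4451, 4823, 4602, 7278]  # Local Authority level
oa_totals = [4154, 6561, 4575, 5992, 5483, 4451, 4819, 4598, 7271]  # aggregated from Output areas

gaps = [la - oa for la, oa in zip(la_totals, oa_totals)]
print(gaps)                                 # [2, -2, 1, 1, -4, 0, 4, 4, 7]
print(sum(la_totals) - sum(oa_totals))      # 13, i.e. 47917 - 47904
```

In this example each column total differs by at most 7 on counts in the thousands, which matches the slides' description of the inconsistencies as small.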
