Each Sunday Andy Kriebel (VizWiz) and Andy Cotgreave (Gravy Anecdote) post a link to a chart and a file with the underlying data. Then people build makeovers and share them on Twitter and Pinterest. Find more info here.
This week’s starting point comes from Information is Beautiful and is powered by Viz Sweet. It’s huge, and has a lot of interactivity that my screenshot doesn’t capture. Go check it out for yourself!
Oh God Why are they moving?! (I know, I know… I’m such a Grinch.) Once the bubbles settle they overlap, making it hard to see overall trends and compare breaches.
The “Method of Leak” coloring is slick. I LOVE how the color key combines with the filter. Unfortunately, the other color option isn’t. When you click “Color by year” (as seen above), Important Stories (how is this even determined?) become orange, and the leftover breaches take on a subtle gradient shading by year. The “Important Stories” color overwhelms the shading by year, and years are already denoted by vertical placement, so either call it “Color by importance” or remove it entirely.
Some of the older “more info” links are broken.
The scale of the graphic makes you scroll down and down and down to look at trends over time. Because of this, I think the visual is not designed to focus on trends over time, but I’m not sure what the actual focus could be. What question is this viz trying to answer? (Okay – Unfocused, sandboxy, “explore the data” vizes are fine, I just don’t like them.)
My plan for this makeover was to clearly show the total number of breaches and affected records by year. It’s a simple visual, but my inability to suss this information out of the original made me curious. Immediately I saw that the number of records affected in 2010 is really low. Outside-a-95%-confidence-interval low. New plan: investigate 2010.
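For anyone who wants to try the same sanity check outside of Tableau, here is a rough sketch of the idea: total the affected records per year, then flag any year that falls below a 95% band built from the other years. This is my guess at one reasonable way to do it, not the exact calculation behind the chart. The rows are invented placeholders (not the real breach data), the function names are made up for this example, and the band is a simple normal approximation (mean minus 1.96 standard deviations).

```python
from collections import defaultdict
from statistics import mean, stdev

# Placeholder rows of (year, records_affected). These numbers are invented
# purely for illustration and are NOT taken from the real data set.
breaches = [
    (2007, 90), (2007, 110),
    (2008, 95), (2008, 100),
    (2009, 105), (2009, 98),
    (2010, 10), (2010, 5),
    (2011, 102), (2011, 97),
    (2012, 100), (2012, 104),
]

def yearly_totals(rows):
    """Sum affected records per year."""
    totals = defaultdict(int)
    for year, records in rows:
        totals[year] += records
    return dict(totals)

def low_outliers(totals, z=1.96):
    """Return years whose total is below mean - z * sd of all OTHER years.

    Leaving the candidate year out keeps one extreme value from
    inflating the band it is being compared against.
    """
    flagged = []
    for year, value in totals.items():
        others = [v for y, v in totals.items() if y != year]
        lower_bound = mean(others) - z * stdev(others)
        if value < lower_bound:
            flagged.append(year)
    return sorted(flagged)

totals = yearly_totals(breaches)
print(low_outliers(totals))
```

With the placeholder rows above, only 2010 gets flagged. With only a handful of years, a band like this is a rough screen at best, so treat a flagged year as a prompt to go digging rather than as proof.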
Looking over the original viz a second time I noticed the subtitle:
It’s possible that security improved, sentencing got harsher, hackers ran out of easy targets, or some combination of these and other factors led to 2010’s dramatic decline. It’s more likely that the selected incidents are incomplete, especially considering that 2004-06 is sparse and the sources listed at the bottom of the viz don’t include an official comprehensive database. Researching 2010 data breaches led me to several significant breaches that weren’t in the spreadsheet, and checking them against the original data set drew my attention to a few duplicate records.
There are definitely issues with the underlying data – but does correcting those issues make 2010 less of an outlier? By cross-referencing the original data with records from Privacy Rights Clearinghouse I built a more complete version of the original spreadsheet.
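If you want to script this kind of cross-referencing instead of doing it by hand, a minimal sketch might look like the one below. I’m assuming each breach is a row with an entity name, a year, and a record count. Those field names, the sample rows, and the matching rule (same normalized name plus same year) are all assumptions for illustration, not the actual columns of either data set.

```python
def normalize(name):
    """Lowercase and drop punctuation/whitespace so near-identical names match."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def merge_breaches(original, extra):
    """Combine two lists of breach rows, keeping the first row per (entity, year).

    Rows from `original` win over rows from `extra` when both describe
    the same breach, and duplicates inside either list are collapsed.
    """
    merged = {}
    for row in original + extra:
        key = (normalize(row["entity"]), row["year"])
        merged.setdefault(key, row)
    return list(merged.values())

# Hypothetical rows; the names and numbers are placeholders.
original = [
    {"entity": "Example Corp", "year": 2010, "records": 1000},
    {"entity": "Example Corp.", "year": 2010, "records": 1000},  # duplicate entry
    {"entity": "Sample Bank", "year": 2009, "records": 500},
]
extra = [
    {"entity": "example corp", "year": 2010, "records": 1000},  # already present
    {"entity": "Demo Health", "year": 2010, "records": 750},    # missing from original
]

combined = merge_breaches(original, extra)
for row in combined:
    print(row["year"], row["entity"], row["records"])
```

Matching on name and year is deliberately crude. It will miss a company that was breached twice in one year, and it won’t catch renamed companies, so the merged list still deserves a skim by eye.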
It turned out that adding data made 2010 even more of an outlier. It looks like 2010 legitimately had significantly fewer massive breaches than expected. I have no idea why that is. (I’ve already spent significantly more time on this week’s makeover than I had budgeted.) Still, this has been a great adventure!
Check out my completed viz here!
Quick tips from this week’s viz:
- When you are making stacked bar charts, order matters! Add your big category (Method) and sort it before adding your individual sections (Companies).
- Add borders to your stacked bar chart through the Edit Color box. (Not Formatting, even though formatting controls all the other borders.)
- Vertical reference bands don’t work if your dates are discrete!