Welcome to GraphGraph.
Read more about our site on the About page.
Graph masters Mekko put out a slide the other day that shows the most recent work experience of every US President, divided neatly into 5 categories. It’s interesting to look at, but I couldn’t shake the feeling that it could be improved.
Specifically, it was the color that bothered me. I wondered if the hodge-podge of colors might even detract from the visualization, as it seems to have themes (blues, greens, etc…) that don’t actually signify anything. Furthermore, US politics has a well-established color code. Wouldn’t using it make this chart more informative?
Thankfully Mekko left the slide downloadable and editable, so it took just a few minutes and hey-presto, party affiliation is baked right in! I went with the standard Democrats in blue and Republicans in red, made Whigs a dark slate, Democratic-Republicans purple, and left the relatively non-affiliated Washington and Adams white and grey, respectively.
So, dear graph enthusiast: have I improved the chart, or added an unnecessary detail? Does reinforcing the political divide take away from the intended message? Are there any other enhancements that come to mind?
However, with 128 teams in the FBS, that means the percentage of teams making the playoffs is a whopping 3.125%.
Meanwhile, the Men’s Basketball playoff is upon us, which lets in a field of 68 teams across 351 teams, for a percentage of 19%.
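The two percentages above are simple division, and the same calculation applies to every sport in the dataset. Here is a minimal sketch; the function name is my own, and the 4-team FBS playoff field is inferred from the 3.125% figure (4 / 128).

```python
def playoff_share(playoff_teams: int, total_teams: int) -> float:
    """Return the percentage of teams that make the postseason field."""
    return 100 * playoff_teams / total_teams


# FBS: a 4-team playoff (inferred from 3.125% of 128 teams).
fbs = playoff_share(4, 128)
# Men's Basketball: a 68-team field drawn from 351 teams.
mens_basketball = playoff_share(68, 351)

print(f"FBS Football: {fbs:.3f}%")
print(f"Men's Basketball: {mens_basketball:.1f}%")
```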
What about every other NCAA sport? If you were to play a different sport, what percentage of teams have a shot at the end of the season of playing for the National Title?
Information was gathered about all of the NCAA sports where championships are offered. The focus was on Division I, but some sports have a “combined” championship which spans divisions, like Skiing and Women’s Bowling. Source data was originally collected in January 2013 and updated in March 2015 where new information was available.
For Men’s sports, on average 28% of teams make it in. For Women’s sports, 21% of teams. Men’s, despite the low representation percentage in Football, get a boost from Wrestling, Fencing, and Gymnastics.
Men’s and Women’s basketball have a similar number of competing teams. In fact, there are only two schools which offer Men’s but not Women’s, The Citadel and VMI.
Also, take a look at the difference between the Football Bowl Subdivision (1-A) and the Football Championship Subdivision (1-AA).
One could argue that every week during the regular season in the FBS is an “elimination game” on the road to the playoffs, and there are certainly the non-playoff bowl games to consider, but I will leave that debate to the masses.
The other way to “split” this data is to look at it by the type of sport that it is. Some sports are truly team sports, like Football, Basketball, and Soccer.
There are also a few sports which are at their core individual sports, but the structure of the event brings a team element into play and the team as a whole qualifies for the event. Examples include Cross Country, Golf, and Women’s Bowling.
Other sports are based on individual qualification and the “team” component only comes into play if you as an individual have qualified for the Championships. Examples of this include Fencing, Wrestling, Swimming, and Track. Hence, sports like these might have a higher number of teams representing them, but may only have a single athlete or two from that school that have qualified.
The bar charts give you a percentage, but the following scatter plots should help illustrate the volume of teams participating and making the playoffs.
And here’s that same chart colored by sport type:
So if you want to be playing for a national championship, Men’s FBS Football may not be the best sport to do it in. Have you considered Men’s Gymnastics?
Source information here: NCAATeamsAndPlayoffs_2015.csv
I am also happy to hear comments and corrections on information that I might have missed.
Category: ‘Game Shows’
Answer: Between ‘Jeopardy!’ and ‘Wheel of Fortune’, this program is shown first each day.
Question: What is…well, it depends on where you live.
I grew up in New Jersey, and every weeknight, ‘Jeopardy!’ started at 7:00pm and ‘Wheel of Fortune’ came on directly after at 7:30pm. One year, when I was visiting family in Virginia, I entered a Bizarro world where ‘Wheel’ was on FIRST, and ‘Jeopardy!’ second.
Many years later, I had a thought. What do most Americans see first? ‘Jeopardy!’ or ‘Wheel’?
There are 210 different media markets in the United States. From the research I’ve gathered from the ‘Jeopardy!’ and ‘Wheel’ websites, 206 of the 210 media markets have a local TV affiliate which airs the shows. Now I will admit that the four missing markets may indeed get these shows. Perhaps the listings on the show websites were not complete, or perhaps they receive these broadcasts from neighboring markets. I’m not quite sure.
129 of the markets show ‘Jeopardy!’ first, and 77 of the markets show ‘Wheel’ first. There’s also information available about the approximate number of televisions in a given media market, and to that end 74 million televisions get ‘Jeopardy!’ first, and 40 million get ‘Wheel’ first.
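To put those counts on a common footing, here is a quick sketch that turns them into shares. The variable names are mine; the numbers are the ones quoted above.

```python
# Markets and approximate televisions (in millions) by which show airs first.
markets = {"Jeopardy!": 129, "Wheel": 77}
tvs_millions = {"Jeopardy!": 74, "Wheel": 40}

total_markets = sum(markets.values())   # 206 markets with listings
total_tvs = sum(tvs_millions.values())  # 114 million televisions

jeopardy_market_share = 100 * markets["Jeopardy!"] / total_markets
jeopardy_tv_share = 100 * tvs_millions["Jeopardy!"] / total_tvs

print(f"Markets showing 'Jeopardy!' first: {jeopardy_market_share:.1f}%")
print(f"Televisions getting 'Jeopardy!' first: {jeopardy_tv_share:.1f}%")
```

Either way you slice it, a bit under two-thirds of the country sees ‘Jeopardy!’ first.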
However, just seeing the numbers isn’t the full picture. What does this information look like on a map? Well, here you go.
It’s interesting to take a geographic look at markets which show ‘Wheel’ first. There’s a concentration on the east coast, and pockets across the nation.
For the majority of the nation, when you watch these shows, they come one right after the other. That’s not the case across the entire nation, as this next set of graphs will show.
‘Jeopardy!’ tends to get a much earlier start time overall. A number of markets will choose to show ‘Jeopardy!’ early on in the day, especially those markets in the mid-west which tend to show ‘Jeopardy!’ before the local news, and ‘Wheel’ a few hours later right before Prime Time. If you want to be the first in the nation to see the show, I recommend moving to the Montgomery-Selma, Alabama market. They show new episodes at 9:30am local time. The last market in the nation to air new episodes is the Lafayette, Louisiana market, which starts the show at 12:36 AM. KATC-3 airs ABC Prime Time shows, Local News, Jimmy Kimmel, Nightline, Inside Edition, and then finally good ol’ ‘Jeopardy!’.
Wheel of Fortune is a much different story. There are four time slots: 6:00, 6:30, 7:00, and 7:30. That’s it. No deviation. No late nights or early mornings.
Here’s a look at the build over time, with respect to local time. You can see ‘Jeopardy!’ gradually building up through the day, and then in the ‘Power Hours’ between 6:00 and 8:00, ‘Wheel’ is shown for everyone. And finally, our friends in the Lafayette market get to see ‘Jeopardy!’ in the late late evening.
And here’s what this looks like on a map. The lighter the color, the earlier in the day it’s shown. For ‘Jeopardy!’, you’ll notice that most of the early showings happen in the Central time zone. Interestingly, most of the largest markets in the US show ‘Jeopardy!’ closer to prime time. However, Chicago shows it at 2:30pm local time on the station WLS.
‘Wheel’, as mentioned before, is much more uniform. Earlier 6pm to 7pm times in the Central and Mountain time zones, with Eastern and Pacific tending to air in the 7pm to 8pm hour.
Now, what happens if we take a look at time as it relates to a single time zone? A show may air at 7:00pm in the East, but when it’s shown at 7:00pm on the west coast, it’ll be 10:00pm back east. These graphs show the build over time with time zone shifts applied as they relate to the Eastern time zone. So, when something is shown at 7:30pm Eastern and 6:30pm Central, they’re actually on at the same time.
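The shift described above can be sketched in a few lines. This is only an illustration, not the process I actually used to build the charts: the offset table and function name are assumptions, and it uses fixed hour offsets for the four continental zones, ignoring daylight-saving quirks.

```python
from datetime import datetime, timedelta

# Hours to add to a local airtime to express it in Eastern time.
# Fixed offsets are an assumption; they ignore daylight-saving exceptions.
HOURS_BEHIND_EASTERN = {"Eastern": 0, "Central": 1, "Mountain": 2, "Pacific": 3}


def to_eastern(local_time: str, zone: str) -> str:
    """Convert an 'H:MMam/pm' local airtime to the equivalent Eastern time."""
    parsed = datetime.strptime(local_time, "%I:%M%p")
    shifted = parsed + timedelta(hours=HOURS_BEHIND_EASTERN[zone])
    return shifted.strftime("%I:%M%p").lstrip("0").lower()


# 6:30pm Central and 7:30pm Eastern are the same moment.
print(to_eastern("6:30pm", "Central"))
# A 7:00pm west coast airing happens at 10:00pm back east.
print(to_eastern("7:00pm", "Pacific"))
```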
Here’s that build over time, with Montgomery kicking things off at 10:30am and Lafayette shutting it down at 1:36am Eastern Time the following morning. ‘Wheel’ is more spread out in this case, with the final showing at 11:30pm Eastern Time in the Honolulu market.
Here’s what those time shifts look like on a map, with the gradients scaled to show later times in a darker hue. First, ‘Jeopardy!’.
When we look at the difference in times between shows within the same market, the majority of airings have one show directly after the other. When that is not the case, ‘Jeopardy!’ will often be shown first, then a gap, then ‘Wheel’ later on in the day. In fact, there is only one market in the US which shows Wheel first and then doesn’t show ‘Jeopardy!’ right after, and that’s our friends in Lafayette who show ‘Wheel’ at 6:30pm and wait until 12:36am to show ‘Jeopardy!’.
Here’s those time differences shown geographically as well. Blue hues are ‘Jeopardy!’, and Red hues are ‘Wheel’. Notice there are only two red hues, since ‘Wheel’ is always followed directly by ‘Jeopardy!’ in those markets, save for Lafayette.
Finally, I wondered which networks aired the shows, as in ABC, CBS, FOX, NBC, MYTV, or Independents. The results were actually quite surprising and extremely spread out, but skewed in favor of ABC, CBS, and NBC. Here’s a look at the number of markets plus the number of televisions within those markets for both shows.
You’ll notice that in terms of number of markets, it’s fairly even between ABC, CBS, and NBC. However, ABC affiliates have coverage in the top four markets (New York, Los Angeles, Chicago, and Philadelphia) and six of the top eight, which skews the number of TVs highly in their favor.
Fun Fact: In 23 of the 206 markets, the two shows are actually shown on DIFFERENT networks. Most of these cases tend to be in the Central and Mountain time zones.
Overall, I hope you enjoyed. If you want the tl;dr version:
Update: May 6, 2014
Not surprisingly, I’ve gotten some of the data points wrong. Reddit user RAS310 asked me:
Just the other day I was thinking about which affiliate airs the shows the most. Do you know which market is the sole one that airs Wheel at 6:30 Eastern? I thought none of them aired the show before 7.
This caused me to look back into some of my original data points. Well, it seems that the question has revealed a problem with the Wheel of Fortune website and with the KML files I used to draw the maps.
The airtime at 6:30 Eastern came back as the ROCHESTER, MN-MASON CITY, IA-AUSTIN, MN market. This is wrong for two reasons.
Wheel recently changed how you can look up airtimes. Before, it was a clickable map of the US that showed you the TV markets and what time they aired Wheel. They went to a newer version based on ZIP code lookup. I looked up a sample ZIP code for Austin, MN (55912). When I plugged it into the Wheel website, KXAN-TV, a station in Austin, TX, showed up. I didn’t realize I was looking at a Texas station, so I picked up the wrong airtime.
That didn’t explain the time zone shift though, as TX and MN are both Central time. It also looks like there’s a mix-up in the KML file of the TV Markets I obtained that switched the labels for Rochester, MN and Rochester, NY. I did time-zone shifts based on the codes for those, so Rochester, MN is EST for the color shifts, and Rochester, NY is CST for their color shifts.
So, a number of errors on my part in gathering the data.
Happy to hear any additional thoughts in the comments.
The winner of this month’s award for “Unexpected Achievement in the World of Graphs” is Sean Taylor of the Facebook Data Science Team. Rather than describe what has been done, I’ll just leave a link here and say it’s Super Bowl related. It’s better explained by Sean anyway.
We here at GraphGraph appreciate a good graph, but what is getting our spreadsheets all in a pivot right now is dreaming of the amount of data that the good folks* at Facebook have at their disposal.
I have one problem with their presentation, and that would be the use of grey as a color. I understand that with 32 teams, there are only so many color options, and I can’t at this moment say how I would have done it differently. Nevertheless, to my eye, grey always looks like it represents “neutral” or “no data available,” not “Patriots or Colts or maybe even Cowboys.” Oh well.
There is a series of maps that shows the support for each remaining team as this year’s postseason progressed. I immediately wished there was an animated version, so I created a gif for your internetting consumption. Enjoy.
I’d like to see this map redone with the map weighted by population, like they do around election time. I also wouldn’t mind seeing this for other sports, like baseball and basketball and curling. Finally, I would be remiss not to mention two things:
1) Sean, Corey, and I all share the same alma mater.
2) Go Ravens.
*I sincerely hope that they are good, given all of the embarrassing pictures they have of young graph enthusiasts.
Monopoly’s a great game, isn’t it? Did you know that the properties in the US version of the game are named after actual streets in Atlantic City, NJ?
We had a thought: What would it look like if you drew lines on the ACTUAL streets of Atlantic City? Well, here you go:
We made this using the ‘Custom Maps’ feature of Google Maps, and the embedded map is below:
Some interesting notes:
I recently leased a 2012 Kia Optima Hybrid. I’ve been doing a lot of driving for work lately, so I decided to get a mid-size car that could handle a lot of highway miles plus give me decent MPGs.
Being the numbers nerd that I am, I’ve been keeping track of various different stats.
The EPA estimates for the car when I bought it were 40 Highway & 35 City for an average of 37.
After the first nine fill-ups, I’ve been disappointed. I was averaging 29.9 MPGs, WAY below the EPA estimates.
Here’s a graph showing the numbers through the first nine fill-ups:
My numbers are extremely off. Why is this? There are a few options:
Let’s examine each of these:
“The brand of car doesn’t actually give the MPGs promised.”
Right after I leased the car, Kia (and parent company Hyundai) was docked by the EPA for overstating their MPG numbers. The new estimates were 39 Highway and 34 City, for an average of 36. Kia is trying to make it right, though, through partial reimbursements that you can read about here.
Even with the “new” estimates, however, I’m still way off the mark.
How do I compare with other Kia Optima Hybrid owners? My co-worker showed me an amazing site called Fuelly, which is essentially a fuel stat-tracking website. The added value of the website, however, is that you can look at all other owners of the same model and see how you compare to them.
Here’s the link to the list of other 2012 Kia Optima Hybrid owners.
Here’s some basic statistical data about the Optima, as of December 20, 2012:
MPG City: 34
MPG Highway: 39
Standard Deviation: 4.91
The mean is right at the MPG City number, and the median and mode are to the left of that.
For comparison’s sake, here’s a screenshot and some basic statistical data about the 2012 Toyota Prius from December 20, 2012.
MPG City: 51
MPG Highway: 48
Standard Deviation: 4.41
The mean, median, and mode all fall within the EPA estimates.
Perhaps my sample size is too small? My personal number of 30 MPGs is -.81 Standard Deviations off the mean, so perhaps the problem is multi-faceted: The MPGs for the brand are not what was promised, AND there are problems with my particular car. Let’s examine the second half of that point next.
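For anyone who wants to check the -.81 figure, it’s a plain z-score: my number minus the mean, divided by the standard deviation. In this sketch I’m assuming a mean of 34 MPG, since the Fuelly mean sits right at the MPG City number; the function name is my own.

```python
def z_score(value: float, mean: float, std_dev: float) -> float:
    """Return how many standard deviations a value sits from the mean."""
    return (value - mean) / std_dev


my_mpg = 30.0         # my rounded average
fuelly_mean = 34.0    # assumed: the mean is right at the MPG City number
fuelly_std_dev = 4.91

z = z_score(my_mpg, fuelly_mean, fuelly_std_dev)
print(f"z-score: {z:.2f}")
```

A negative value means I’m below the average owner, which matches what the fill-up log has been telling me.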
“My specific car doesn’t actually give the MPGs promised” and “I’m a terrible driver who drives inefficiently”
According to my rough statistics, I’m -.81 Standard Deviations off the mean. So, what’s wrong with my particular car? Is it a problem with the car or with the driver? Or both?
In regards to the car, it’s a brand-new lease, so I would hope that there is nothing wrong with it. After the latest fill-up, I decided to check the tire pressure. The tires are meant to have 44 psi, but each tire was hovering between 30 and 34. Yikes! I’ll have to see if this gives me an improvement.
Looking at various sites about fuel-efficient driving, I stumbled upon this post specifically about the Kia Optima Hybrid, including this video about “best practices”.
So perhaps the problem is my individual car (which I will have to continue investigating), but perhaps the problem is my driving? I feel that I’ve tried to adjust to the hybrid, but perhaps I can still do better?
Instead of a tachometer, the Optima Hybrid has an “efficiency” gauge that gives instant feedback of “good”, “medium”, and “poor” driving. There’s even an “Eco Score” that gives you points for driving “efficiently”. I throw that in quotes because it’s based on what the car thinks is ideal, but from a gamification point-of-view it creates an incentive for me to try to drive better to earn virtual “points”, which should (in theory) correspond with better MPGs. I didn’t start tracking my Eco Points until my 7th fuel-up, and in a future post I’ll see if there’s a relationship between tank MPG and Eco Points.
[Image from CNET.]
Another potentially contributing factor would be where I live and the current weather. I’m doing a lot of travel between Pennsylvania and New Jersey, and it involves a lot of up and down through rolling hills. We’re also moving into winter, and as a result the car is very cold in the morning and has frost, meaning I need to burn fuel to defrost and to heat the car up when I start driving. Will other seasons be better for my MPGs?
“The gallons being dispensed at the pump are not the gallons actually being put into the car”
The last thing we’ll look at today is about the trust you have at the pump about the gallons that you purchased. At the end of November, I had two fuel-ups at the same station in New Jersey where my MPGs seemed really low compared to the average. When I fueled up the second time there, the tank was about 1/6 full but I noticed that they dispensed nearly a full tank’s worth of fuel. Something seemed off here.
Fortunately there are consumer-protection groups such as local Weights and Measures departments. I placed a call and they’re investigating, so we’ll see if I was dealing with a crooked gas station, or perhaps my car really did take a full tank of fuel.
From an overall point of view for MPGs, since it’s a ratio of miles divided by gallons, you have to assume that the gallons being dispensed at the pump are the actual gallons being put into your tank. If not, any calculation you do will be suspect.
So what’s next? I see a few actions:
Got any tips for fuel-efficient driving? Leave them in the comments!
Sometimes at work we get requests to visualize data on maps. It’s a really cool feature, but the challenge is that the calculation driving the chart is generally a straight sum, so states like California, New York, and Texas always seem to have the highest values.
XKCD had a great comic the other day that I 100% agree with: the problem with geographic heat maps is that they’re essentially just population maps.
This is why anytime you’re putting together a heat map, it’s best to normalize the data as best as you can with a per-capita calculation.
Compare the following two calculations:
This very simple switch allows you to make a much more effective comparison of large states like California to small states like Rhode Island.
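Here is a small sketch of that switch from a straight sum to a per-capita rate. Every number in it is made up for illustration (including the rounded populations), and the names are my own.

```python
# Hypothetical event counts and rounded, illustrative populations.
counts = {"California": 7600, "Rhode Island": 420}
population = {"California": 38_000_000, "Rhode Island": 1_050_000}


def per_100k(count: int, people: int) -> float:
    """Normalize a raw count to a rate per 100,000 residents."""
    return count / people * 100_000


rates = {state: per_100k(counts[state], population[state]) for state in counts}

for state in counts:
    print(f"{state}: raw={counts[state]}, per 100k={rates[state]:.1f}")
```

With these made-up numbers California wins on the raw count, but Rhode Island has the higher rate, which is exactly the kind of story a straight sum hides.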
USA Today offers a graph every morning in the bottom-left corner of the front page.
Take a look at this first one:
Why am I not surprised by these results? It’s essentially a “top 5 population” map. The only thing that seems off is that Arizona and Georgia are showing up here, but I’m bringing outside knowledge that the population is not very high there, so I can assume that it must be an outlier.
However, a few weeks later I picked up the paper and was pleasantly surprised to see this map:
Much better! Now that they’ve switched to a percentage-focused view, I get a much better sense that in these states the proportions are indeed larger when compared to other states.
Map visualizations can be very powerful, but a little simple division can help you get a greater wealth of information from the same pixel space!
During the course of the 2012 U.S. presidential election you’ve no doubt seen lots of maps of the United States.
The maps that most people saw on election night (and in the weeks running up to it) had a very simple binary look: Blue for Obama, and Red for Romney. Usually, they just have one color per state because that’s what matters in the Electoral College.
However, it’s interesting to split that data out into counties as well.
This map (found on Gawker) takes it three steps beyond just the standard red/blue state map. The second map shows counties with a binary red/blue scheme. The third map shows each individual county on a red to purple to blue scale. The final map changes the transparency of any given county based on the population of that county; the brighter the county the more people that live there.
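I don’t know exactly how the mapmaker built the color scale, but the idea behind the third and fourth maps can be sketched roughly like this. The linear blend, the function name, and the population scaling are all my own assumptions.

```python
def county_color(dem_share: float, population: int, max_population: int):
    """Blend red to blue by vote share, and set opacity by population.

    dem_share is the Democratic share of the two-party vote (0.0 to 1.0).
    Returns an (R, G, B, alpha) tuple; a 50/50 county comes out purple.
    """
    red = round(255 * (1 - dem_share))
    blue = round(255 * dem_share)
    alpha = population / max_population  # more people, more opaque
    return (red, 0, blue, alpha)


# A hypothetical evenly split county with half the population of the largest.
print(county_color(0.5, 500_000, 1_000_000))
```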
I think this gives a great visualization because it gives a truer perspective of where the votes fell in this election.
Another way to show this data is through a cartogram. Since the presidential election is decided by electoral votes, it makes sense to scale the US appropriately. This cartogram mashes up the two concepts nicely; the shape still resembles the United States, but gives you a more accurate representation of each state’s contribution to the electoral vote total.
A great deal has been made this year about election spending. This video courtesy of NPR gives fascinating insight as to where the money in this election was being spent, represented in maps.
What other interesting maps did you find in this past election?
Last year I decided the best way to have fun on Halloween was to make graphs. It was so much fun, I decided to do it again this year.
When the night was over, we had a whole lot more leftover candy than last year. Did we buy too much candy? Did not enough Trick-or-Treaters visit this year? Why didn’t we run out of candy like we did last year?
The basic premise was the same:
Last Year’s Stats – 2011
This Year’s Stats – 2012
What a difference! We bought about the same amount of Treats as the year prior, yet we had a LOT more leftover candy, even though there were more Trick-or-Treaters.
Because of this, we only ran out of two types of candy: M&M’s Peanut and Skittles, and we ran out of those types in the final 15 minutes.
Here’s a graph showing the starting and ending percentages of the different candies:
Purple marks if it was taken LESS relative to other candies.
Orange marks if it was taken MORE relative to other candies.
Let’s also group the candy types together and see if there’s a trend:
Candy that was in Bar form (for example, Hershey’s and Snickers) was less popular than candy in Bit form (for example, M&Ms and Starburst).
Sugar-based candies (Skittles and Starburst) were more popular than Chocolate and Nut candies. This is a departure from last year, when we had a bunch of Starburst left over!
In last year’s post I noted that trying to put all candies individually on a line chart would make it messy, and very hard to get any information out of the chart. This year, I decided to use a Trellis chart to help alleviate that problem.
In this chart, each brand gets its own view. However, for the candy types where I didn’t have a large starting amount, it’s hard to discern differences. If we start each brand at 100% and work downwards from there, we can see trends of how quickly (or slowly) a particular type of candy was taken, and since we counted at the mid-point, we can see which types went faster earlier or later in the evening.
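Starting each brand at 100% is just a matter of dividing every count by that brand’s starting count. Here’s a minimal sketch; the counts are hypothetical placeholders, not my actual tallies.

```python
# Hypothetical piece counts at the start, mid-point, and end of the night.
counts = {
    "Skittles": [40, 10, 0],
    "Snickers": [20, 15, 10],
}


def percent_remaining(series: list) -> list:
    """Rescale a series of counts so the first observation equals 100%."""
    start = series[0]
    return [100 * count / start for count in series]


normalized = {brand: percent_remaining(series) for brand, series in counts.items()}

for brand, series in normalized.items():
    print(brand, series)
```

Once every brand starts at 100%, a candy with a small starting pile can be compared directly against one with a big pile.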
Here is where this gets REALLY geeky.
Last year I put together a basic line chart highlighting the inverse relationship between number of Trick-or-Treaters and the number of pieces they took. As it turns out, my formula for calculating that number was flawed.
Here’s last year’s chart:
What was flawed about it was the way I calculated the number of pieces per Trick-or-Treater. Last year I took a full count of the candy at specific times:
Here’s a screenshot of my Excel sheet from last year. Any cell with a gray background means an actual observation, and white cells represent a formula to approximate what the best-guess of the remaining amount was.
The problem was with the old formula. Last year I had assumed that the amount between observation points should have been:
S = Second Observation Point
F = First Observation Point
RR = Number of 15 Minute Intervals Remaining until Second Observation Point
TR = Total Number of 15 Minute Intervals between First and Second Observation Point
S + ((F-S) * RR/TR) = Candy Remaining for a Given Interval
This is pretty similar to a standard depreciation formula as you move from Date A to Date B.
However, that assumes the same amount of Trick-or-Treaters in each 15-minute interval, which was NOT the case. Given that I knew how many kids visited within each 15-minute interval, I could better refine the formula to approximate the number of pieces remaining within each time block.
S = Second Observation Point
F = First Observation Point
RTT = Number of Trick-or-Treaters remaining until Second Observation Point
TTT = Total Number of Trick-or-Treaters between First and Second Observation Point
S + ((F-S) * RTT/TTT) = Candy Remaining for a Given Interval.
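Both formulas are the same interpolation with a different weight: the old one weights by time intervals remaining, the new one by Trick-or-Treaters remaining. Here’s a sketch of the two side by side. The candy counts and visitor numbers are made up for illustration, and the function name is my own.

```python
def candy_remaining(first: float, second: float, remaining: float, total: float) -> float:
    """Interpolate between two observations: S + ((F - S) * remaining / total)."""
    return second + (first - second) * remaining / total


first_obs, second_obs = 200, 100   # hypothetical counts at the two observation points
visitors = [5, 10, 25, 10]         # hypothetical Trick-or-Treaters per 15-minute interval

total_intervals = len(visitors)
total_visitors = sum(visitors)

old_estimates, new_estimates = [], []
seen = 0
for i, count in enumerate(visitors, start=1):
    seen += count
    # Old formula: weight by intervals remaining (RR / TR).
    old_estimates.append(
        candy_remaining(first_obs, second_obs, total_intervals - i, total_intervals)
    )
    # New formula: weight by Trick-or-Treaters remaining (RTT / TTT).
    new_estimates.append(
        candy_remaining(first_obs, second_obs, total_visitors - seen, total_visitors)
    )

print("Old:", old_estimates)
print("New:", new_estimates)
```

With these made-up numbers the old formula drops 25 pieces every interval, while the new one drops the most during the busy third interval. Both still land exactly on the second observation.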
This leads to a much more refined formula. Here’s last year’s chart again, with the old and new formulas:
So now that we’ve established a new (and hopefully better) formula, we can compare this year to last year using the same methodology:
Interesting things to note:
This year we picked up seven different types of multi-pack bags. In every instance, we received more candy than what was promised on the bag, which was a nice plus.
Favorites and Non-Favorites
Last year we saw that sugar-based candies were the least likely to be taken by the Trick-or-Treaters. What about this year?
This is a relatively boring graph, in that there’s very little movement. That in itself tells a story, though, in the fact that for the most part, kids were taking candy in roughly equal proportions. Was this because we broke the three types out into three separate buckets? Were kids just evenly grabbing from each?
Compared to last year, the increase of sugar candies compared to the average is the most interesting. What was the difference? Did more kids take a liking to Skittles and Starburst? Did we do a better job preventing the smaller Starburst packages from falling to the bottom of the bucket? These are the mysteries of life that elude us all.
Just for kicks, here’s the graph comparing Bar candies to Bit candies:
Again, very little movement!
Intervals for Trick-or-Treating
We had 216 total Trick-or-Treaters. I was able to track two things with each group:
There were 60 total groups of Trick-or-Treaters, with an average size of 3.6.
Here’s the distribution of groups:
This makes for interesting visualizations when you decide the time interval to split it by:
Planning for Next Halloween
Now, who wants to help me eat this leftover candy?