1.3 Exercises

The following exercises refer to a dataset containing information about video games sold between 2004 and 2018. A few rows of the data are shown below.

| number | game | release_date | price | owners | developer | publisher | average_playtime | median_playtime | metascore | release_year |
|---|---|---|---|---|---|---|---|---|---|---|
| 316 | Decisive Campaigns: Case Blue | 2012-07-16 | 39.99 | 0 .. 20,000 | VR Designs | Slitherine Ltd. | 0 | 0 | | 2012 |
| 270 | Clones | 2010-11-18 | 0.99 | 100,000 .. 200,000 | Tomkorp Computer Solutions Inc. | Tomkorp Computer Solutions Inc. | 0 | 0 | | 2010 |
| 379 | Midvinter | 2016-05-05 | 4.99 | 0 .. 20,000 | Talecore Studios | Talecore Studios, Valiant Game Studio AB | 0 | 0 | | 2016 |
| 6929 | Quest room: Hanon | 2018-10-25 | 2.99 | 0 .. 20,000 | BURRIK | BURRIK | 0 | 0 | | 2018 |
| 7271 | ReHack | 2018-03-23 | 1.99 | 0 .. 20,000 | EasyWays Team | EasyWays | 0 | 0 | | 2018 |

Exercise 1.3.1

The following bar plot shows how many games were published in each of the years in this dataset:

  1. Which of the following would be a reasonable conclusion from this plot? (Check all that apply)
  • Half of all games in the dataset were published in 2018.
  • The number of video games published has been increasing over the years.
  • There were probably more games published in 2019 than in 2018.
  • There were more games published in 2010 than in 2009.

Hint: Remember that this dataset is a sample that we use to draw conclusions about the population.

  2. Which of the following is a way we could improve this plot?
  • Take away the axis labels.
  • Take away the title.
  • Put the years in order from most games to least games.
  • Change the y-axis from counts to percents.
  • Label all the bars, not just 2004, 2008, 2012, and 2016.
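
As a rough illustration of what switching from counts to percents involves, the sketch below tallies the `release_year` values from the five example rows shown at the start of this section and converts each count to a percent of the total. The variable names are made up for this example, and five rows are far too few to say anything about the full dataset.

```python
from collections import Counter

# release_year values copied from the five example rows shown earlier
release_years = [2012, 2010, 2016, 2018, 2018]

# Count how many games fall in each year
counts = Counter(release_years)
total = sum(counts.values())

# Convert each count to a percent of all games
percents = {year: 100 * n / total for year, n in sorted(counts.items())}

for year, pct in percents.items():
    print(f"{year}: {counts[year]} game(s), {pct:.0f}%")
```

Because every count is divided by the same total, the bars keep the same relative heights; only the units on the axis change.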

Exercise 1.3.2

Consider the following plot, showing the publisher variable, i.e., which company released the game. Only the top 10 most common publishers are shown.

  1. Which of the following would be a way to improve the y-axis of this plot?
  • Change the label from “count” to “counts of games”
  • Remove the label entirely
  • Change the scale from counts to percents
  • Make the axis start at 50 instead of 0, so we don’t waste space
  2. Which of the following would be a way to improve the x-axis of this plot?
  • Change the label from “publisher” to “video game publisher”
  • Remove the label entirely
  • Reorder the publishers so the bars are from biggest to smallest
  3. Which of the following would be a way to improve the color choices of this plot?
  • Make the background white instead of grey
  • Make each bar a different color, from a colorblind-friendly palette
  • Add a background color to the white area around the plot
  • Change the title to be an eye-catching color
  4. Which of the following would be a way to improve the titles and captions of this plot?
  • Remove the title
  • Include a subtitle stating that these are the top 10 publishers only
  • Include a caption stating that the data is from 2004 to 2018
  • Include a caption with a link to the original data source

Exercise 1.3.3

Consider the following histogram, which shows the metascore (i.e., the aggregate critic review score from Metacritic) for the games in this dataset:

  1. What can we conclude from the histogram? (Check all that apply)
  • The mean metascore is around 73.
  • The median metascore is around 73.
  • The median is probably bigger than the mean.
  • This variable is right-skewed.
  • There are four modes in this variable.

Now consider the following histogram, showing the average playtime of games in this dataset.

  2. It appears that there are only three bins in the histogram: one tall one around zero, and two short ones next to it. However, the x-axis goes all the way to 6000. What is going on here?
  • We need to use a smaller binwidth, so we can see more bins.
  • We need to use a bigger binwidth, so we can capture more observations in each bin.
  • We should cut off the part of the plot to the right, because none of our data is there.
  • This variable is extremely right-skewed, and the bins at the high numbers are too small to see.

Now consider the same plot, but with all the games with an average playtime of 0 (meaning nobody has played them yet) removed:

  3. What can we conclude from the histogram? (Check all that apply)
  • The mean playtime is around 1000.
  • The median playtime is around 1000.
  • The median is probably bigger than the mean.
  • This variable is right-skewed.
  • There are no modes in this variable.
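
One way to check your reasoning about the mean and the median in the questions above is to compute both on a small sample that contains one very large value. The numbers below are invented for illustration and are not taken from the video games dataset; the sketch only assumes Python's built-in `statistics` module.

```python
from statistics import mean, median

# Hypothetical playtimes (in minutes), invented for illustration only:
# most values are small and one is very large
playtimes = [10, 20, 30, 40, 50, 60, 5000]

# The mean uses every value, so the single large value pulls it upward
print("mean:  ", round(mean(playtimes), 1))

# The median only depends on the middle of the sorted values
print("median:", median(playtimes))
```

Try replacing 5000 with 70 and rerunning to see how each summary reacts when the extreme value is removed.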

Exercise 1.3.4

Consider the following visualization, from the webcomic xkcd:

  1. Which of the following is part of the aesthetic (or mapping) of the plot?
  • There is no title
  • The x-axis is age
  • The y-axis is “Urge to run up the down escalator”
  • The y-axis ranges from “Weak” to “Strong”
  • This is a line graph
  • The dots probably represent an average over the year he was a certain age.
  • There are two lines: one for “What I was supposed to feel” and one for “What I’ve actually felt”
  • The lines are labeled with text (“What I was supposed to feel” and “What I’ve actually felt”)
  • Only even ages are labeled
  • Stick figure people are sliding down the line
  2. Which of the following is part of the geometry of the plot?
  • There is no title
  • The x-axis is age
  • The y-axis is “Urge to run up the down escalator”
  • The y-axis ranges from “Weak” to “Strong”
  • The dots probably represent an average over the year he was a certain age.
  • This is a line graph
  • There are two lines: one for “What I was supposed to feel” and one for “What I’ve actually felt”
  • The lines are labeled with text (“What I was supposed to feel” and “What I’ve actually felt”)
  • Only even ages are labeled
  • Stick figure people are sliding down the line
  3. Which of the following is part of the facets of the plot?
  • There is no title
  • The x-axis is age
  • The y-axis is “Urge to run up the down escalator”
  • The y-axis ranges from “Weak” to “Strong”
  • The dots probably represent an average over the year he was a certain age.
  • This is a line graph
  • There are two lines: one for “What I was supposed to feel” and one for “What I’ve actually felt”
  • The lines are labeled with text (“What I was supposed to feel” and “What I’ve actually felt”)
  • Only even ages are labeled
  • Stick figure people are sliding down the line
  4. Which of the following is part of the statistics of the plot?
  • There is no title
  • The x-axis is age
  • The y-axis is “Urge to run up the down escalator”
  • The y-axis ranges from “Weak” to “Strong”
  • The dots probably represent an average over the year he was a certain age.
  • This is a line graph
  • There are two lines: one for “What I was supposed to feel” and one for “What I’ve actually felt”
  • The lines are labeled with text (“What I was supposed to feel” and “What I’ve actually felt”)
  • Only even ages are labeled
  • Stick figure people are sliding down the line
  5. Which of the following is part of the coordinates or scale of the plot?
  • There is no title
  • The x-axis is age
  • The y-axis is “Urge to run up the down escalator”
  • The y-axis ranges from “Weak” to “Strong”
  • The dots probably represent an average over the year he was a certain age.
  • This is a line graph
  • There are two lines: one for “What I was supposed to feel” and one for “What I’ve actually felt”
  • The lines are labeled with text (“What I was supposed to feel” and “What I’ve actually felt”)
  • Only even ages are labeled
  • Stick figure people are sliding down the line
  6. Which of the following is part of the theme of the plot?
  • There is no title
  • The x-axis is age
  • The y-axis is “Urge to run up the down escalator”
  • The y-axis ranges from “Weak” to “Strong”
  • The dots probably represent an average over the year he was a certain age.
  • This is a line graph
  • There are two lines: one for “What I was supposed to feel” and one for “What I’ve actually felt”
  • The lines are labeled with text (“What I was supposed to feel” and “What I’ve actually felt”)
  • Only even ages are labeled
  • Stick figure people are sliding down the line

Exercise 1.3.5

Recall the Star Wars dataset from Exercises 1.2:

| name | height | mass | hair_color | eye_color | age | gender | homeworld | from_tatooine |
|---|---|---|---|---|---|---|---|---|
| Luke Skywalker | 172 | 77.0 | blond | blue | 19.0 | masculine | Tatooine | 1 |
| Darth Vader | 202 | 136.0 | none | yellow | 41.9 | masculine | Tatooine | 1 |
| Leia Organa | 150 | 49.0 | brown | brown | 19.0 | feminine | Alderaan | 0 |
| Owen Lars | 178 | 120.0 | brown | blue | 52.0 | masculine | Tatooine | 1 |
| Beru Whitesun Lars | 165 | 75.0 | brown | blue | 47.0 | feminine | Tatooine | 1 |
| Biggs Darklighter | 183 | 84.0 | black | brown | 24.0 | masculine | Tatooine | 1 |
| Obi-Wan Kenobi | 182 | 77.0 | auburn | blue-gray | 57.0 | masculine | Stewjon | 0 |
| Wilhuff Tarkin | 180 | | auburn | blue | 64.0 | masculine | Eriadu | 0 |
| Han Solo | 180 | 80.0 | brown | brown | 29.0 | masculine | Corellia | 0 |
| Wedge Antilles | 170 | 77.0 | brown | hazel | 21.0 | masculine | Corellia | 0 |
| Palpatine | 170 | 75.0 | grey | yellow | 82.0 | masculine | Naboo | 0 |
| Boba Fett | 183 | 78.2 | black | brown | 31.5 | masculine | Kamino | 0 |
| Lando Calrissian | 177 | 79.0 | black | brown | 31.0 | masculine | Socorro | 0 |
| Lobot | 175 | 79.0 | none | blue | 37.0 | masculine | Bespin | 0 |
| Mon Mothma | 150 | | auburn | blue | 48.0 | feminine | Chandrila | 0 |
| Arvel Crynyd | | | brown | brown | | masculine | | |
| Raymus Antilles | 188 | 79.0 | brown | brown | | masculine | Alderaan | 0 |

Match each of the following research questions about this dataset to one of the plots below:

  1. Do characters from the planet Tatooine have different eye colors than characters not from Tatooine?
  2. Is a blue-eyed character more likely to be from Tatooine, or not?
  3. Are there more blue-eyed Tatooine natives, or more brown-eyed Tatooine natives?
  4. Are more characters from Tatooine or not from Tatooine?
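
The four questions above differ mainly in which counts or proportions they compare. As a sketch of that difference, the code below tabulates `eye_color` against `from_tatooine` using only the rows displayed above, which may or may not be the full dataset. The variable names are made up for this example, and Arvel Crynyd is left out because no `from_tatooine` value is shown for that row.

```python
from collections import Counter

# (eye_color, from_tatooine) pairs copied from the rows displayed above;
# Arvel Crynyd is omitted because the from_tatooine value is missing
rows = [
    ("blue", 1), ("yellow", 1), ("brown", 0), ("blue", 1), ("blue", 1),
    ("brown", 1), ("blue-gray", 0), ("blue", 0), ("brown", 0), ("hazel", 0),
    ("yellow", 0), ("brown", 0), ("brown", 0), ("blue", 0), ("blue", 0),
    ("brown", 0),
]

# Joint counts: one entry per (eye_color, from_tatooine) combination
joint = Counter(rows)

# Marginal counts: how many characters are from Tatooine (1) or not (0)
by_origin = Counter(origin for _, origin in rows)

for (eye, origin), n in sorted(joint.items()):
    # Share of this eye color within its from_tatooine group
    share = n / by_origin[origin]
    print(f"from_tatooine={origin}  {eye:<10} count={n}  share={share:.2f}")
```

Some of the questions are about raw counts (`joint` or `by_origin`), while others are about shares within a group, which is a useful distinction to keep in mind when matching each question to a plot.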

Exercise 1.3.6

The following plot shows information from the video games dataset:

Comment on this plot. Include:

  • What research question it addresses, and how you would answer the question.
  • Descriptions of the shape of the quantitative variable in each category.
  • Any improvements you would make to the plot.