Chapter 3 Example datasets

3.1 Edgar Anderson’s Iris Data

In R:
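Below is a minimal sketch for loading and inspecting the data. It assumes only base R, where iris is part of the datasets package:

```r
# iris ships with base R (datasets package), so no extra packages are needed
data(iris)

dim(iris)            # 150 rows (flowers) and 5 columns
str(iris)            # four numeric measurements plus the Species factor
table(iris$Species)  # number of flowers per species
```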


From the iris manual page:

This famous (Fisher’s or Anderson’s) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
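To make the "50 flowers from each of 3 species" structure concrete, the sketch below averages each measurement (in centimeters) within each species using base R's `aggregate()`. The variable name `species_means` is just an illustrative choice:

```r
# Mean of each of the four measurements (cm), computed separately per species;
# ". ~ Species" means "every other column, grouped by Species"
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
species_means
```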

Figure: Iris setosa, Iris versicolor and Iris virginica (images credit Wikipedia).


For more details, see ?iris.

3.2 Motor Trend Car Road Tests

In R:
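As with iris, a minimal base-R sketch for loading and inspecting the data:

```r
# mtcars is also part of base R's datasets package
data(mtcars)

dim(mtcars)   # 32 cars and 11 variables
head(mtcars)  # car model names are stored as row names, not as a column
```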


From the ?mtcars manual page:

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
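In the data frame, fuel consumption is the `mpg` column (miles per US gallon) and the remaining 10 columns are the design and performance variables. The sketch below lists those 10 columns and compares average fuel consumption across cylinder counts; `cyl_means` is an illustrative name:

```r
# The 10 design/performance variables are every column except mpg
setdiff(names(mtcars), "mpg")

# Average miles per gallon for 4-, 6- and 8-cylinder cars
cyl_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
cyl_means
```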


For more details, see ?mtcars.

3.3 Sub-cellular localisation

The hyperLOPIT2015 data are used to demonstrate t-SNE and to compare it with PCA. These data provide the sub-cellular localisation of proteins in mouse E14TG2a embryonic stem cells, as published in Christoforou et al. (2016).

The data come as an MSnSet object from the MSnbase package, which was specifically developed for such quantitative proteomics data. Alternatively, comma-separated files containing a somewhat simplified version of the data can also be found here.

These data are only used to illustrate some concepts; to avoid installing numerous dependencies, they are not loaded or used directly.

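They are available through the Bioconductor project and can be installed with BiocManager:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("MSnbase", "pRoloc")) ## software
BiocManager::install("pRolocdata")           ## data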

3.4 The diamonds data

The diamonds data set ships with the ggplot2 package. It records the price (in US dollars) and other attributes of about 54,000 round-cut diamonds and is used to predict price.

In R:
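Below is a minimal sketch for loading and inspecting the data. It assumes ggplot2 is already installed, because unlike iris and mtcars, diamonds is not part of base R:

```r
# diamonds is a data set exported (lazily loaded) by ggplot2
library(ggplot2)

dim(diamonds)            # number of diamonds and number of variables
summary(diamonds$price)  # distribution of prices, in US dollars
```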
