In this problem, we will find numerical and graphical summaries of the Titanic dataset
using R. The dataset consists of information on all the passengers of the ill-fated
Atlantic ship Titanic. We are looking at only 3
columns (variables) in the dataset, namely
Age, Sex and Survival status of the passengers. The dataset “titanic_as.RData” (.RData
is a convenient R data format) is available in the homework folder on Carmen. Load the
file using the command:
Note: if you are not in the same directory as the .RData file, then you need to put the
filepath in front of titanic_as.RData. Once you load the data you should see a data.frame
object with the name “titanic_as”. The data.frame has 3 columns/variables, Age, Sex
and Survived. In the variable “Sex”, 0 indicates female and 1 indicates male. In the
variable “Survived”, 0 indicates did not survive and 1 indicates survived. Note that the
data does not contain information on all the passengers and may not match the version
of the same dataset available elsewhere. Use only this dataset to answer the following
questions. You should report all R code used to obtain the answers (at the end of your
homework as a script). Do NOT print the data file.
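The loading step described above can be sketched as follows. This is a minimal illustration on a made-up toy data frame (the name titanic_toy and all of its values are hypothetical, not taken from the homework file); it assumes only base R's save(), load() and str().

```r
# Toy stand-in for the homework data (values are made up for illustration).
# Coding follows the text: Sex 0 = female, 1 = male;
# Survived 0 = did not survive, 1 = survived.
titanic_toy <- data.frame(
  Age      = c(22, 38, 26, 35, 54, 2),
  Sex      = c(1, 0, 0, 0, 1, 1),
  Survived = c(0, 1, 1, 1, 0, 0)
)

# Write the object to a temporary .RData file so the example is self-contained.
toy_path <- file.path(tempdir(), "titanic_toy.RData")
save(titanic_toy, file = toy_path)
rm(titanic_toy)

# load() restores the saved objects under their original names and
# invisibly returns those names as a character vector.
loaded_names <- load(toy_path)
print(loaded_names)

# Inspect the structure and size without printing the whole data set.
str(titanic_toy)
dim(titanic_toy)
```

For the homework file, the analogous call would presumably pass "titanic_as.RData" (prefixed by its file path if it is not in the working directory, which getwd() reports) to load(), after which str(titanic_as) gives a compact overview without printing the data.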
(a) What fraction of people survived the crash?
(b) Report the summary statistics for the variable Age. Your summary statistics
should at least contain the mean, median and standard deviation.
(c) Report the summary statistics for the variable Age only for those passengers who
survived the crash (i.e., whose value in the Survived column is 1).
(d) Plot the histogram for the variable Sex. Then, plot the histogram for the same
variable, but only for people who survived.
(e) What comments can you make about the proportions of male and female passengers on
the entire ship, and among those who survived, on the basis of the two
histograms you generated in part (d)?
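As a hint for the numerical questions above, the following is a minimal sketch of how a survival fraction and subset summaries can be computed in base R. It runs on a made-up toy data frame (toy and the helper age_summary are hypothetical names, and the values are not from the homework file), so the numbers it prints say nothing about the real data.

```r
# Toy stand-in for the homework data (made-up values, same 0/1 coding).
toy <- data.frame(
  Age      = c(22, 38, 26, 35, 54, 2),
  Sex      = c(1, 0, 0, 0, 1, 1),
  Survived = c(0, 1, 1, 1, 0, 0)
)

# The mean of a 0/1 indicator equals the fraction of 1s.
frac_survived <- mean(toy$Survived)

# Hypothetical helper: basic summaries, ignoring missing values if any.
age_summary <- function(x) {
  c(n      = sum(!is.na(x)),
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE))
}

# Summaries for everyone, then only for rows where Survived is 1.
age_all  <- age_summary(toy$Age)
age_surv <- age_summary(toy$Age[toy$Survived == 1])

# Recode Sex as a labeled factor so tables are readable in a report.
sex_lab  <- factor(toy$Sex, levels = c(0, 1), labels = c("Female", "Male"))
prop_all  <- prop.table(table(sex_lab))
prop_surv <- prop.table(table(sex_lab[toy$Survived == 1]))

print(frac_survived)
print(round(rbind(All = age_all, Survivors = age_surv), 2))
print(round(rbind(All = prop_all, Survivors = prop_surv), 2))
```

Logical indexing such as toy$Age[toy$Survived == 1] is one of several ways to restrict a variable to a subgroup; subset() would work equally well. The built-in summary() also reports the mean and median, but not the standard deviation, which is why sd() is called separately here.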
Note on R Problems:
For full credit on the R problems,
make sure that you are displaying
labeled tables and graphs that are easy to read. Your results should be presented as they
would be in a formal report. You are also required to attach the R code at the end of the
homework (R code only without any results mixed in).
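To illustrate what a labeled, report-ready graph might look like, here is a minimal sketch using base R graphics on made-up toy data (toy, sex_lab and the output file name are hypothetical). It draws bar plots of frequency tables with barplot() rather than calling hist(), on the assumption that a bar chart of counts is an acceptable form of histogram for a two-category variable.

```r
# Toy stand-in for the homework data (made-up values, same 0/1 coding).
toy <- data.frame(
  Sex      = c(1, 0, 0, 0, 1, 1),
  Survived = c(0, 1, 1, 1, 0, 0)
)

# Labeled factor: bars are named "Female"/"Male" instead of 0/1.
sex_lab <- factor(toy$Sex, levels = c(0, 1), labels = c("Female", "Male"))
counts_all  <- table(sex_lab)
counts_surv <- table(sex_lab[toy$Survived == 1])

# Draw both plots side by side into a PDF file in a temporary directory.
out_file <- file.path(tempdir(), "sex_barplots.pdf")
pdf(out_file, width = 8, height = 4)
par(mfrow = c(1, 2))
barplot(counts_all,
        main = "All passengers",
        xlab = "Sex", ylab = "Number of passengers")
barplot(counts_surv,
        main = "Survivors only",
        xlab = "Sex", ylab = "Number of passengers")
dev.off()
```

Giving every plot a title and axis labels (main, xlab, ylab) and recoding 0/1 values to names are the main steps toward readable output. Because the factor keeps both levels, a category with zero survivors still appears as an empty bar, which keeps the two panels comparable.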