In this problem, we will find numerical and graphical summaries of the Titanic dataset
using R. The dataset consists of information on all the passengers of the ill-fated
trans-Atlantic ship Titanic. We are looking at only 3 columns (variables) in the dataset,
namely Age, Sex and Survival status of the passengers. The dataset “titanic_as.RData”
(.RData is a convenient R data format) is available in the homework folder on Carmen.
Load the file using the command:

load("titanic_as.RData")

Note: if you are not in the same directory as the .RData file, then you need to put the
filepath in front of titanic_as.RData. Once you load the data, you should see a data.frame
object with the name “titanic_as”. The data.frame has 3 columns/variables: Age, Sex
and Survived. In the variable “Sex”, 0 indicates female and 1 indicates male. In the
variable “Survived”, 0 indicates did not survive and 1 indicates survived. Note that the
data does not contain information on all the passengers and may not match the version
of the same dataset available elsewhere. Use only this dataset to answer the following
questions. You should report all R code used to obtain the answers (at the end of your
homework as a script). Do NOT print the data file.
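
As a rough illustration of the loading step described above, the sketch below saves a
made-up data frame under the same object name to a temporary .RData file and loads it
back. The toy values and the temporary path are assumptions for the example only; with
the real file you would pass its path on your machine to load().

```r
# Made-up stand-in with the same three columns; NOT the course data.
titanic_as <- data.frame(
  Age      = c(22, 38, 26),
  Sex      = c(1, 0, 0),   # 0 = female, 1 = male
  Survived = c(0, 1, 1)    # 0 = did not survive, 1 = survived
)

# Save it to a temporary location, then remove it from the workspace.
path <- file.path(tempdir(), "titanic_as.RData")
save(titanic_as, file = path)
rm(titanic_as)

# load() restores objects under the names they were saved with and
# invisibly returns those names as a character vector.
loaded <- load(path)
print(loaded)

# Inspect the structure without printing the whole data file.
str(titanic_as)
```

Because load() restores objects under their saved names, its result does not need to be
assigned; the returned names are only useful for checking what was restored.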


(a) What fraction of people survived the crash?

(b) Report the summary statistics for the variable Age. Your summary statistics
should at least contain the mean, median and standard deviation.

(c) Report the summary statistics for the variable Age only for those passengers who
survived the crash (i.e., whose value in the Survived column is 1).

(d) Plot the histogram for the variable Sex. Then, plot the histogram for the same
variable, but only for people who survived.

(e) What comments can you make about the proportions of male-female passengers in the
entire ship, and among those who survived, on the basis of the two histograms you
generated in part (d)?
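
The kinds of computations the questions above ask for can be sketched on made-up data.
This is a minimal sketch and not a model answer: the data frame toy, the helper functions
age_summary() and sex_hist(), and the output file name are all hypothetical, and the
numbers have nothing to do with the course dataset.

```r
# Made-up stand-in with the same three columns; NOT the course data.
toy <- data.frame(
  Age      = c(22, 38, 26, 35, 54, 2, 27, 14),
  Sex      = c(1, 0, 0, 0, 1, 1, 0, 0),   # 0 = female, 1 = male
  Survived = c(0, 1, 1, 1, 0, 0, 1, 1)    # 0 = did not survive, 1 = survived
)

# The mean of a 0/1 indicator is the fraction of ones.
frac_survived <- mean(toy$Survived)

# Mean, median and standard deviation, ignoring missing values.
age_summary <- function(x) {
  c(mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE))
}
age_all <- age_summary(toy$Age)

# Logical row indexing keeps only the rows where Survived equals 1.
survivors <- toy[toy$Survived == 1, ]
age_surv  <- age_summary(survivors$Age)

# Histogram of a 0/1 variable: one bin per code, with readable labels.
sex_hist <- function(x, main) {
  h <- hist(x, breaks = c(-0.5, 0.5, 1.5), xaxt = "n",
            main = main, xlab = "Sex", ylab = "Number of passengers")
  axis(1, at = c(0, 1), labels = c("Female", "Male"))
  invisible(h)
}

# Draw both plots into a PDF in the temporary directory.
pdf(file.path(tempdir(), "sex_histograms.pdf"))
h_all  <- sex_hist(toy$Sex, "Sex of all passengers (toy data)")
h_surv <- sex_hist(survivors$Sex, "Sex of survivors (toy data)")
dev.off()

print(frac_survived)
print(round(age_all, 2))
print(round(age_surv, 2))
```

Setting the breaks at -0.5, 0.5 and 1.5 places each code (0 and 1) in its own bin, so the
two bar heights are simply the counts of females and males.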

Note on R Problems:

For full credit on the R problems, make sure that you are displaying labeled tables and
graphs that are easy to read. Your results should be presented as they would be in a
formal report. You are also required to attach the R code at the end of the homework
(R code only without any results mixed in).

