Titanic Survival Predictor

About the Project

The Titanic Survival Predictor is a machine learning model that predicts the likelihood that a passenger survived, based on factors such as age, gender, and ticket class. The project shows how data science can be applied to a historical event to draw meaningful insights from its records.

Journey to Predict Titanic Survival

Step 1: Gathering the Passenger Manifest

We start with a list of passengers and their information, such as age, gender, and ticket class.

Analogy: Imagine you're the ship's captain, collecting boarding passes as passengers embark.

Step 2: Cleaning Up the Passenger List

We make sure the list is complete and accurate, filling in any missing information.

Analogy: It's like making sure everyone's name is spelled correctly on the guest list for a big party.
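
As a rough illustration of the cleaning step, the sketch below fills missing ages with the median of the known ages. The README does not say which columns had gaps or how they were filled, so median imputation, the field names, and the sample rows are all assumptions made for this example.

```python
from statistics import median

# Hypothetical sample rows; field names and values are made up for illustration.
passengers = [
    {"name": "A", "age": 22.0, "sex": "male", "pclass": 3},
    {"name": "B", "age": None, "sex": "female", "pclass": 1},
    {"name": "C", "age": 38.0, "sex": "female", "pclass": 1},
    {"name": "D", "age": 30.0, "sex": "male", "pclass": 2},
]


def fill_missing_ages(rows):
    """Return copies of the rows with missing ages replaced by the median known age."""
    known_ages = [row["age"] for row in rows if row["age"] is not None]
    if not known_ages:
        raise ValueError("cannot impute: no known ages")
    fallback = median(known_ages)
    # Build new dicts so the original rows are left untouched.
    return [
        {**row, "age": fallback if row["age"] is None else row["age"]}
        for row in rows
    ]


cleaned = fill_missing_ages(passengers)
```

The median is used here rather than the mean because it is less sensitive to a few very old or very young passengers; other strategies would be equally valid.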

Step 3: Dividing Passengers into Groups

We split the passenger list into two groups: one for training the prediction model and one for testing it.

Analogy: Think of it as dividing a deck of cards into two piles: one to practice card tricks with, and one to perform the tricks for an audience.
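
A minimal, dependency-free sketch of such a split follows. The 80/20 ratio, the fixed seed, and the helper name `split_passengers` are assumptions for illustration; the README does not state the proportions actually used.

```python
import random


def split_passengers(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for real passenger records: ten made-up passenger IDs.
passenger_ids = list(range(1, 11))
train_ids, test_ids = split_passengers(passenger_ids, test_fraction=0.2)
```

Shuffling before splitting matters because passenger lists are often ordered (for example by ticket class), and an unshuffled split could put one kind of passenger almost entirely in one group.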

Step 4: Teaching the Model

We use the training group to teach the model how different factors affected survival rates.

Analogy: It's like teaching a new crew member how to spot which passengers might need extra help in an emergency.
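
To make "teaching" concrete, here is a small sketch of logistic regression trained by batch gradient descent. It is written from scratch so it runs without extra packages, and it is not the project's actual training code. The two binary features, the toy data, the learning rate, and the epoch count are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Estimated survival probability for one passenger."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_logistic_regression(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Step against the average gradient over the training set.
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


# Made-up toy data: [is_female, is_first_class] -> survived (1) or not (0).
X_train = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0], [1, 0], [0, 0]]
y_train = [1, 1, 1, 0, 1, 0, 1, 0]
weights, bias = train_logistic_regression(X_train, y_train)
```

Each learned weight indicates how strongly its feature pushes the predicted probability up or down, which is one reason logistic regression is a convenient first model for this kind of question.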

Step 5: Testing the Model

We use the testing group to see how well the model predicts survival, comparing its predictions to what actually happened.

Analogy: Imagine running a lifeboat drill with the passengers we set aside, seeing if the crew member correctly identifies who needs help.
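
The simplest form of this comparison is accuracy: the share of test passengers whose predicted outcome matches the recorded one. The sketch below uses made-up labels, not the project's results, and the helper name `accuracy` is an assumption.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the recorded outcomes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels for illustration: 1 = survived, 0 = did not survive.
actual_outcomes = [1, 0, 0, 1, 0, 1, 0, 0]
model_predictions = [1, 0, 1, 1, 0, 0, 0, 0]
test_accuracy = accuracy(model_predictions, actual_outcomes)
```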

Step 6: Improving the Model

Based on how well the model performed, we make adjustments to improve its accuracy.

Analogy: It's like fine-tuning emergency procedures after the drill, making sure we're as prepared as possible for the real thing.


The Model

We used logistic regression to build the prediction model. The process involved data preprocessing, feature engineering, and model training. Here are some visualizations of the data and the model:

Correlation Heatmap

[Figure: Correlation Heatmap]

This heatmap shows the correlation between different features in the dataset. Darker colors indicate stronger correlations, helping us identify which factors might be most important for predicting survival.
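
The values behind a heatmap like this are pairwise correlation coefficients. The sketch below computes Pearson correlations for a few made-up columns. The column names and values are assumptions, and the README does not say which correlation measure was plotted.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations, keyed by column name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up columns for illustration only.
columns = {
    "survived": [1, 0, 1, 1, 0, 0],
    "is_female": [1, 0, 1, 0, 0, 0],
    "pclass": [1, 3, 1, 2, 3, 3],
}
matrix = correlation_matrix(columns)
```

A coefficient near +1 or -1 marks a strong linear relationship and a value near 0 a weak one; the sign shows the direction, so a negative value between `survived` and `pclass` would mean higher class numbers go with lower survival in the sample.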

Feature Distribution

[Figure: Feature Distribution Subplots]

These subplots show the distribution of various features in the dataset, separated by survival outcome. They help us visualize how different factors might influence a passenger's chances of survival.

Results and Evaluation

After training the model, we evaluated its performance using various metrics. Here are the key results:

Confusion Matrix

[Figure: Confusion Matrix]

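The confusion matrix shows how well the model predicts survival and non-survival. It breaks the predictions down into correct and incorrect cases, so we can see not only the model's accuracy but also its false positives (predicted to survive but did not) and false negatives (predicted not to survive but did).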
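
The four cells of a binary confusion matrix can be counted directly from predictions and recorded outcomes, as in this sketch. The labels are made up for illustration and are not the project's results; treating "survived" (1) as the positive class is an assumption.

```python
def confusion_counts(predicted, actual):
    """Count true/false positives and negatives, with 1 (survived) as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, a in zip(predicted, actual):
        if p not in (0, 1) or a not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if p == 1 and a == 1:
            counts["tp"] += 1  # predicted survival, did survive
        elif p == 1 and a == 0:
            counts["fp"] += 1  # predicted survival, did not survive
        elif p == 0 and a == 0:
            counts["tn"] += 1  # predicted non-survival, did not survive
        else:
            counts["fn"] += 1  # predicted non-survival, did survive
    return counts


# Made-up labels for illustration: 1 = survived, 0 = did not survive.
actual_outcomes = [1, 0, 0, 1, 0, 1, 0, 0]
model_predictions = [1, 0, 1, 1, 0, 0, 0, 0]
counts = confusion_counts(model_predictions, actual_outcomes)
```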

Learning Curve

[Figure: Learning Curve]

The learning curve shows how the model's performance changes as it is trained on more data. It helps us determine whether more training data would help or whether the model is overfitting.

Model Performance

  • Accuracy:
  • Precision:
  • Recall:
  • F1-score:
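
For reference, the four metrics listed above can all be derived from confusion-matrix counts. The sketch below shows the standard definitions; the counts passed in are placeholders for illustration, not this project's results, and the function name `classification_metrics` is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    Any metric whose denominator is zero is reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the passengers predicted to survive, how many did.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the passengers who survived, how many the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Placeholder counts, not the project's actual confusion matrix.
example_metrics = classification_metrics(tp=2, fp=1, tn=4, fn=1)
```

Reporting precision and recall alongside accuracy is useful here because the two outcomes are not equally common in passenger data, so accuracy alone can look better than the model really is.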