Surviving the Titanic: What Are Your Chances?

Have you ever wondered what your chances of surviving the Titanic disaster might have been? This project uses data science and statistical modeling to provide an answer. Rather than guessing, our model predicts survival probabilities based on historical Titanic passenger data and detailed analysis. Let’s explore how it works.

The Big Idea

The goal was to build a tool that predicts the likelihood of surviving the Titanic sinking based on key passenger details. Using a logistic regression model trained on historical Titanic data, we can estimate a passenger’s survival odds from just a few pieces of information.

  • Data Formats:
    The model takes both numeric and text inputs. Numeric inputs include passenger age, fare, and number of family members aboard. Text inputs are categorical: passenger class, sex, and port of embarkation. The output is a set of survival probabilities returned in text form.

  • Different Use Cases:
    This model can be used for education and data science demonstrations. While it predicts Titanic survival chances based on historical data, its primary value lies in teaching statistical modeling and machine learning concepts. It helps students and enthusiasts understand how various passenger features influence survival outcomes.

  • Pipeline Stages:
    (Link to the documentation blog will go here)

  • Tooling/Libraries:
    This project uses scikit-learn (LogisticRegression and DecisionTreeClassifier) for modeling and pandas for data manipulation. It has a Python backend and a JavaScript frontend. The Titanic dataset is loaded through the Seaborn library, which provides a cleaned copy of the historical passenger data.

  • Data Source Type:
    The training data is a publicly available Titanic passenger dataset, commonly used in machine learning education. The application also collects data each time a prediction is submitted; at present, those submissions come from students.
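
To make the data formats above concrete, here is a minimal sketch of how mixed text and numeric inputs might be turned into the numeric features a model needs. The field names, the feature order, and the function `encode_passenger` are assumptions for illustration, not the project's actual schema.

```python
# Hypothetical encoding of one passenger. Field names and feature order
# are assumptions for illustration, not the project's real schema.
EMBARK_PORTS = ("C", "Q", "S")


def encode_passenger(passenger):
    """Turn mixed text/numeric form inputs into a flat list of numbers."""
    features = [
        float(passenger["pclass"]),                  # 1, 2, or 3
        1.0 if passenger["sex"] == "male" else 0.0,  # text -> 0/1 flag
        float(passenger["age"]),
        float(passenger["sibsp"]),                   # siblings/spouses aboard
        float(passenger["parch"]),                   # parents/children aboard
        float(passenger["fare"]),
        1.0 if passenger["alone"] else 0.0,
    ]
    # One-hot encode the port of embarkation: one 0/1 column per port.
    features += [1.0 if passenger["embarked"] == port else 0.0
                 for port in EMBARK_PORTS]
    return features


# Made-up passenger, used only to show the shape of the output.
example = {
    "pclass": 2, "sex": "female", "age": 30, "sibsp": 0,
    "parch": 0, "fare": 16.0, "embarked": "S", "alone": True,
}
print(encode_passenger(example))
```

Categorical text values become 0/1 columns so that every input can take part in the same weighted sum.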

The Model

To train our model, we used logistic regression — a statistical technique ideal for binary classification problems such as predicting whether a passenger survived or not. Here’s why logistic regression works well for this task:

  • It predicts probabilities between 0 and 1, providing survival likelihoods.
  • It is interpretable, allowing us to see which features influence the outcome.
  • It performs well on smaller datasets with binary outcomes.
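
The first two points can be sketched in a few lines of plain Python. This is not the project's code: the weights and bias below are invented purely to show how a logistic model turns a weighted sum of features into a probability, and how the sign of a weight tells you which way a feature pushes the outcome.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def survival_probability(features, weights, bias):
    """Weighted sum of the features, squashed into a probability."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Invented weights for two illustrative features: [is_female, is_first_class].
# A positive weight raises the predicted probability of survival.
weights = [2.5, 1.0]
bias = -1.5

p_survive = survival_probability([1.0, 1.0], weights, bias)
p_die = 1.0 - p_survive
print(f"survive={p_survive:.3f}, die={p_die:.3f}")
```

In the real project the weights are learned from the training data by scikit-learn rather than set by hand.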

This model is based on open-source code by Mr. Mortenson, who provided the original implementation.

Making a Prediction

When you use the web form, you input:

  • Passenger class (1st, 2nd, or 3rd)
  • Age
  • Sex
  • Number of siblings/spouses aboard
  • Number of parents/children aboard
  • Fare paid
  • Port of embarkation (C, Q, or S)
  • Whether the passenger is traveling alone

Your inputs are sent to the backend, where they are fed into our Titanic survival prediction model.

from flask import request, jsonify
from flask_restful import Api, Resource

# titanic_api (the Flask Blueprint) and TitanicModel are defined elsewhere in the project.
api = Api(titanic_api)

class TitanicAPI:
    class _Predict(Resource):

        def post(self):
            """Semantics: In HTTP, POST requests are used to send data to the server for processing.
            Sending passenger data to the server to get a prediction fits the semantics of a POST request.

            POST requests send data in the body of the request...
            1. which can handle much larger amounts of data, and more data types, than URL parameters
            2. using an HTTPS request, the data is encrypted, making it more secure
            3. a JSON-formatted body is easy to read and write between JavaScript and Python, great for Postman testing
            """
            # Get the passenger data from the JSON body of the request
            passenger = request.get_json()

            # Get the singleton instance of the TitanicModel
            titanicModel = TitanicModel.get_instance()
            # Predict the survival probability of the passenger
            response = titanicModel.predict(passenger)

            # Return the response as JSON
            return jsonify(response)

    # Register the resource so that POST /predict reaches _Predict.post
    api.add_resource(_Predict, '/predict')

The model returns two probabilities: one for death and one for survival.
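
As a rough illustration of the exchange, the sketch below builds a JSON body a client might POST to the endpoint and formats a reply. The field names, the helpers `build_request_body` and `describe`, and the `die`/`survive` response keys are all assumptions; the response values are placeholders, not model output.

```python
import json


def build_request_body(pclass, sex, age, sibsp, parch, fare, embarked, alone):
    """Serialize the form inputs into a JSON string for the POST body."""
    passenger = {
        "pclass": pclass, "sex": sex, "age": age, "sibsp": sibsp,
        "parch": parch, "fare": fare, "embarked": embarked, "alone": alone,
    }
    return json.dumps(passenger)


def describe(response):
    """Format a reply assumed to hold 'die' and 'survive' probabilities."""
    return f"Survive: {response['survive']:.1%}, die: {response['die']:.1%}"


body = build_request_body(2, "female", 30, 0, 0, 16.0, "S", True)
print(body)

# Placeholder reply, used only to show the formatting step.
print(describe({"die": 0.25, "survive": 0.75}))
```

The frontend does the equivalent in JavaScript; the point here is only the shape of the data crossing the wire.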

User Interface

We designed the frontend with simplicity and clarity in mind. Key features include:

  • A clean form with clearly labeled input fields for each passenger attribute
  • Instant survival probability results displayed after submission
  • A Chart.js-powered pie chart to visually represent survival vs. non-survival probability
  • A “What does this mean?” tooltip to help users interpret their prediction
  • A public predictions section where users can opt to save and display their results, which are stored in a SQLite table
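
For the public predictions section, saving and listing results could look something like the sketch below, which uses Python's built-in sqlite3 module. The table name, its columns, and the helper functions are hypothetical; the project's real schema may differ.

```python
import sqlite3


def create_table(conn):
    """Create a hypothetical table for saved predictions."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               name TEXT NOT NULL,
               survive REAL NOT NULL,
               die REAL NOT NULL
           )"""
    )


def save_prediction(conn, name, survive, die):
    """Insert one result; '?' placeholders keep user input out of the SQL."""
    conn.execute(
        "INSERT INTO predictions (name, survive, die) VALUES (?, ?, ?)",
        (name, survive, die),
    )
    conn.commit()


def list_predictions(conn):
    """Return all saved results in insertion order."""
    cursor = conn.execute(
        "SELECT name, survive, die FROM predictions ORDER BY id"
    )
    return cursor.fetchall()


# In-memory database for the demo; a real app would use a file path.
conn = sqlite3.connect(":memory:")
create_table(conn)
save_prediction(conn, "demo_user", 0.75, 0.25)  # placeholder values
print(list_predictions(conn))
```

Parameterized queries matter here because the saved name comes straight from a public form.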

Future Improvements

Looking ahead, we aim to make the Titanic survival prediction tool more user-focused and open-source-friendly:

  • Improve user authentication so that only registered users can submit predictions
  • Allow users to view and manage their past predictions, creating a more personalized experience
  • Refactor the codebase to be more modular and beginner-friendly, making it easier for contributors to understand and build upon
  • Add inline documentation and usage guides to support new members of the open source coding society

These enhancements will not only make the tool more secure and engaging, but also position it as a valuable resource for collaborative learning and development.