Purpose Learn how to predict a student’s chance of college acceptance using a Linear Regression model. Machine learning is flexible! It can be used to evaluate college readiness just like this College Predictor, or for much larger tasks like admissions algorithms or career forecasting.

Data Collection This predictor was trained using the self-created dataset below. Data can be taken from multiple sources such as school APIs, university admissions reports, or self-reported results; it does not need to be freshly created.

data = {
    'gpa': [3.0, 3.2, 3.5, 3.8, 4.0, 3.9, 3.6, 3.1, 2.9],
    'sat': [1100, 1200, 1300, 1400, 1500, 1480, 1350, 1250, 1000],
    'act': [21, 24, 27, 30, 33, 32, 28, 23, 19],
    'apCount': [2, 3, 5, 7, 10, 9, 6, 4, 1],
    'extracurriculars': [3, 4, 6, 8, 10, 9, 7, 5, 2],
    'chance': [30, 40, 55, 70, 90, 88, 65, 45, 20]
}
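
If the data came from an external source instead, it can be loaded the same way. Here is a minimal sketch, assuming a hypothetical college_data.csv whose columns match the keys above:

import pandas as pd

# Hypothetical file name; the column names must match the keys used above
df = pd.read_csv("college_data.csv")
print(df.head())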

Processing The data is processed using a Linear Regression model. This is done by splitting the data into the features (GPA, SAT, ACT, AP Count, and Extracurriculars), stored in the variable X, and the target (Chance of Acceptance), stored in the variable y.
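
In pandas, that split is just column selection. A minimal sketch, using the data dictionary above loaded into a DataFrame:

import pandas as pd

df = pd.DataFrame(data)  # the data dictionary defined above

features = ['gpa', 'sat', 'act', 'apCount', 'extracurriculars']
X = df[features]   # inputs the model learns from
y = df['chance']   # target the model tries to predict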

Linear regression is mathematical: X and y are plotted, and a line of best fit is computed to most accurately predict the acceptance chance. It can be written as an equation like this: Chance = (a⋅gpa) + (b⋅sat) + (c⋅act) + (d⋅apCount) + (e⋅extracurriculars) + intercept. Variables a–e are the weights the model learns, and the intercept is a baseline value added to every prediction.

Here’s an example: if the model learns that a = 10 and b = 0.02, and the user inputs a GPA of 3.7 and an SAT of 1450, it will calculate part of the equation as (10 × 3.7) + (0.02 × 1450) = 37 + 29 = 66, then combine that with the remaining terms and the intercept to get the final prediction.
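
As a rough sketch of how those learned weights combine (the coefficient values here are made up for illustration, not the ones the model below actually learns):

import numpy as np

# Hypothetical weights for [gpa, sat, act, apCount, extracurriculars]
weights = np.array([10.0, 0.02, 0.3, 0.5, 0.5])
intercept = 2.0

student = np.array([3.7, 1450, 31, 6, 8])

# Prediction = dot product of features and weights, plus the intercept
chance = np.dot(student, weights) + intercept
print(round(chance, 2))  # 84.3 with these made-up numbers

With a trained scikit-learn model, the same numbers are available as model.coef_ and model.intercept_.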

Below is the full code for loading the data, splitting it into features and target, and training the model.

from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
import pandas as pd

class CollegePredictor:
    _instance = None

    def __init__(self):
        self.model = None
        self.dt = None
        self.features = ['gpa', 'sat', 'act', 'apCount', 'extracurriculars']
        self.target = 'chance'
        self.college_data = self._load_data()

    def _load_data(self):
        # Simulated training data
        data = {
            'gpa': [3.0, 3.2, 3.5, 3.8, 4.0, 3.9, 3.6, 3.1, 2.9],
            'sat': [1100, 1200, 1300, 1400, 1500, 1480, 1350, 1250, 1000],
            'act': [21, 24, 27, 30, 33, 32, 28, 23, 19],
            'apCount': [2, 3, 5, 7, 10, 9, 6, 4, 1],
            'extracurriculars': [3, 4, 6, 8, 10, 9, 7, 5, 2],
            'chance': [30, 40, 55, 70, 90, 88, 65, 45, 20]
        }
        return pd.DataFrame(data)

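    # Train both models on the full dataset: LinearRegression for predictions,
    # DecisionTreeRegressor only for reporting feature importance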
    def _train(self):
        X = self.college_data[self.features]
        y = self.college_data[self.target]

        self.model = LinearRegression()
        self.model.fit(X, y)

        self.dt = DecisionTreeRegressor()
        self.dt.fit(X, y)

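    # Singleton accessor: create and train the predictor once, then reuse the same instance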
    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            cls._instance = cls()
            cls._instance._train()
        return cls._instance

    def predict(self, data):
        # Reorder the incoming columns to match the training features before predicting
        df = pd.DataFrame([data])[self.features]
        prediction = self.model.predict(df)[0]
        # Cast to a plain float so the result serializes cleanly to JSON
        return {'predicted_chance': round(float(prediction), 2)}

    def importance(self):
        importances = self.dt.feature_importances_
        # Cast numpy floats to plain floats so they serialize cleanly to JSON
        return {feature: round(float(importance), 2) for feature, importance in zip(self.features, importances)}

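# Quick smoke test: train the singleton, predict for a sample student, and print feature importances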
def testCollege():
    print("Step 1: Sample Student Data")
    student = {
        'gpa': 3.7,
        'sat': 1450,
        'act': 31,
        'apCount': 6,
        'extracurriculars': 8
    }
    print("\t", student)
    model = CollegePredictor.get_instance()

    print("Step 2: Predict Acceptance Chance")
    prediction = model.predict(student)
    print("\t Predicted Acceptance Chance:", prediction)

    print("Step 3: Feature Importance")
    importances = model.importance()
    for k, v in importances.items():
        print(f"\t {k}: {v}")

if __name__ == "__main__":
    testCollege()

Using this code, a test run of the program can be done, showing how the regression model and the feature importances are working. When executed, it prints the sample input (student stats), the predicted acceptance chance, and the importance of each feature. The next step is to create the API endpoint.

API Setup The trained model is exposed through a Flask REST API:
from flask import Blueprint, request, jsonify
from flask_restful import Api, Resource
from model.college import CollegePredictor

college_api = Blueprint('college_api', __name__, url_prefix='/api/college')
api = Api(college_api)

class CollegeAPI:
    class _Predict(Resource):
        def post(self):
            data = request.get_json()
            model = CollegePredictor.get_instance()
            prediction = model.predict(data)
            return jsonify(prediction)

    class _FeatureImportance(Resource):
        def get(self):
            model = CollegePredictor.get_instance()
            importance = model.importance()
            return jsonify(importance)

    api.add_resource(_Predict, '/predict')
    api.add_resource(_FeatureImportance, '/features')

The CollegeAPI class is split into sub-classes, one per endpoint; each Resource class maps to the route it is registered with (_Predict handles /predict and _FeatureImportance handles /features). Remember to register the college_api blueprint in main.py, as sketched below.
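
A minimal sketch of that registration, assuming the API file above is saved as api/college.py and main.py creates the Flask app (adjust the import path to match your project layout):

from flask import Flask
from api.college import college_api  # hypothetical path; use your project's module layout

app = Flask(__name__)
app.register_blueprint(college_api)  # exposes /api/college/predict and /api/college/features

if __name__ == "__main__":
    app.run(debug=True)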

Output A simple frontend can be created by calling the API endpoint (/api/college/predict). To display different images based on the result returned by the Linear Regression model, we can use conditional statements.

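// chance is the predicted_chance value parsed from the /api/college/predict response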
if (chance > 80) {
  imageSrc = "/images/accepted.png";
  message = "You have a strong chance! 🎉";
} else if (chance < 40) {
  imageSrc = "/images/uncertain.png";
  message = "Consider adding safety schools.";
}

If the predicted chance is over 80, show an “accepted” image. If it is below 40, show a caution image. These ranges can be customized by the programmer.
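
To check the endpoint without a frontend, a request can also be sent straight from Python. A minimal sketch, assuming the Flask app from main.py is running locally on the default port 5000:

import requests

student = {
    'gpa': 3.7,
    'sat': 1450,
    'act': 31,
    'apCount': 6,
    'extracurriculars': 8
}

# POST the student stats to the prediction endpoint and print the JSON response
response = requests.post("http://localhost:5000/api/college/predict", json=student)
print(response.json())  # e.g. {'predicted_chance': ...}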

Next Steps What can we do next with this? How can we improve? Here are some suggestions:

Add the ability to switch between different universities.

Include more features like essay score, class rank, and recommendation strength.

Learn about other data models (like classification) to improve accuracy; a rough sketch follows below.
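
A classification model would predict an accept/reject label instead of a numeric chance. Here is a minimal sketch, assuming any training example with a chance of 50 or more counts as an acceptance:

from sklearn.linear_model import LogisticRegression
from model.college import CollegePredictor

predictor = CollegePredictor.get_instance()
X = predictor.college_data[predictor.features]
y_label = (predictor.college_data['chance'] >= 50).astype(int)  # 1 = likely accepted, 0 = not

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y_label)

# predict_proba returns [P(reject), P(accept)] for each row; column 1 is the acceptance probability
print(clf.predict_proba(X[:1]))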

Conclusion This project shows how linear regression can predict a student’s college acceptance chance based on stats like GPA and test scores. By combining a trained model with a simple frontend, we can make machine learning both educational and helpful.

It’s a great example of how data science can be used creatively. With more detailed data and refined models, this predictor could evolve into a powerful tool for future applicants.