Assignment 3: Implementing Linear and Logistic Regression

This post walks through predicting house prices and customer churn with regression and classification techniques, from preprocessing the datasets to building and evaluating models, and highlights the practical lessons from the assignment.

Table of Contents

  1. Overview of the Problem Statement
  2. Folder Structure
  3. Approach and Methodology
  4. Results and Learnings
  5. Final Thoughts

Overview of the Problem Statement

Machine learning is all about extracting insights and making predictions from data. As part of the Innoquest Cohort-1 Machine Learning module, I tackled an exciting assignment focusing on regression and classification techniques. Here’s a quick overview:

  1. Regression Task: Explore the Ames Housing dataset to predict house prices.
    • Implement Simple Linear Regression, Multiple Linear Regression, and Polynomial Regression.
    • Preprocess the data thoroughly for accurate model performance.
  2. Classification Task: Work with two datasets to apply Logistic Regression and Multinomial Logistic Regression.
    • Focus on binary and multiclass classification problems.
    • Evaluate model performance using metrics like accuracy and confusion matrices.

Folder Structure

To keep the project organized and streamlined, I followed a structured approach:

Main Directory

  • Text Files:
    • requirements.txt: Listed libraries and dependencies.
    • encoding.txt: Documented how columns were encoded for the regression task.
    • data.json: Contained custom Seaborn palettes for visualization.
  • Datasets:
    • Ames Housing Dataset:
      • raw_data, without_na (handled missing values), without_ols (removed outliers), encoded_unscaled, scaled, and most_imp_39_features (features selected after correlation analysis).
    • Customer Churn Dataset:
      • raw_data, encoded_unscaled (dataset prepared for modeling).
  • Notebooks:
    • Preprocessing: Data cleaning, encoding, scaling, and feature selection.
    • Model Building: Notebooks for each regression and classification model.
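
The dataset stages listed above (without_na, without_ols, scaled, most_imp_39_features) suggest a pipeline that could look roughly like the sketch below. It is an illustration under assumptions, not the code from the notebooks: the post does not say which outlier rule or correlation threshold was used, so the IQR rule, the helper names, and the synthetic data are all hypothetical, and only NumPy is used.

```python
import numpy as np

def drop_rows_with_nan(X, y):
    """Keep only rows with no missing values (one simple way to handle NaNs)."""
    mask = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    return X[mask], y[mask]

def remove_iqr_outliers(X, y, k=1.5):
    """Drop rows whose target lies outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    mask = (y >= q1 - k * iqr) & (y <= q3 + k * iqr)
    return X[mask], y[mask]

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - mean) / std

def top_k_by_correlation(X, y, k):
    """Return indices of the k columns with the largest |correlation| with y."""
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    corr = np.nan_to_num(corr)  # constant columns give NaN; treat them as 0
    return np.argsort(corr)[::-1][:k]

# Synthetic stand-in data: 5 features, only columns 0 and 3 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)
X[0, 1] = np.nan   # one missing value
y[1] = 1000.0      # one obvious outlier

X, y = drop_rows_with_nan(X, y)
X, y = remove_iqr_outliers(X, y)
X_scaled = standardize(X)
selected = top_k_by_correlation(X_scaled, y, k=2)
X_selected = X_scaled[:, selected]
print("rows kept:", X.shape[0], "selected columns:", sorted(selected.tolist()))
```

In the actual project the same idea would be applied with k=39 on the encoded Ames features; the choice of k=2 here only fits the toy data.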

Approach and Methodology

1. Regression Task: Predicting House Prices

The Ames Housing dataset was a challenging yet rewarding choice, given its complexity: after encoding, it had over 250 features. Here’s how I tackled it:

  • Data Preprocessing:
    • Handled missing values.
    • Removed outliers using statistical techniques.
    • Scaled features to ensure model compatibility.
    • Reduced feature size to 39 by analyzing correlations.
  • Model Building:
    • Explored Simple Linear Regression for single-feature predictions.
    • Implemented Multiple Linear Regression for multivariate analysis.
    • Enhanced predictions with Polynomial Regression to capture non-linear patterns.
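
To make the difference between the three regression variants concrete, here is a minimal NumPy sketch that fits each one by ordinary least squares on synthetic data. The feature names (area, quality), the coefficients, and the helper functions are assumptions made up for illustration; they are not taken from the Ames dataset or the assignment notebooks.

```python
import numpy as np

def fit_least_squares(X, y):
    """Fit y ~ X @ w + b by ordinary least squares; returns (w, b)."""
    A = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

def predict(X, w, b):
    return X @ w + b

def polynomial_features(x, degree):
    """Expand a single feature x into columns [x, x^2, ..., x^degree]."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical data: price depends quadratically on area and linearly on quality.
rng = np.random.default_rng(42)
area = rng.uniform(0.5, 3.0, size=300)
quality = rng.uniform(1.0, 10.0, size=300)
price = 50 + 20 * area ** 2 + 5 * quality + rng.normal(scale=2.0, size=300)

# Simple linear regression: one feature.
X_simple = area.reshape(-1, 1)
# Multiple linear regression: several features.
X_multi = np.column_stack([area, quality])
# Polynomial regression: add area^2 so a linear fit can capture the curve.
X_poly = np.column_stack([polynomial_features(area, 2), quality])

rmse_simple = rmse(price, predict(X_simple, *fit_least_squares(X_simple, price)))
rmse_multi = rmse(price, predict(X_multi, *fit_least_squares(X_multi, price)))
rmse_poly = rmse(price, predict(X_poly, *fit_least_squares(X_poly, price)))
print(rmse_simple, rmse_multi, rmse_poly)
```

Polynomial regression is still linear in its coefficients, which is why the same least-squares routine works once the extra columns are added. The errors here are training errors; a held-out split would be needed to judge generalization.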

2. Classification Task: Predicting Customer Churn

For the classification task, I chose the Customer Churn dataset, which was slightly more manageable:

  • Binary Classification: Applied Logistic Regression to predict whether customers would churn.
  • Multiclass Classification: Used Multinomial Logistic Regression for multiclass problems (though less relevant for this dataset).
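
As a rough illustration of both classifiers, the sketch below trains multinomial (softmax) logistic regression with batch gradient descent in NumPy. With two classes it makes the same predictions as binary logistic regression, so one routine covers both cases. The function names, learning rate, iteration count, and the synthetic blobs are assumptions for the example, not the setup used on the Customer Churn data.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax_regression(X, y, n_classes, lr=0.1, n_iter=2000):
    """Multinomial logistic regression trained with batch gradient descent."""
    n, d = X.shape
    A = np.column_stack([X, np.ones(n)])   # intercept column
    W = np.zeros((d + 1, n_classes))
    Y = np.eye(n_classes)[y]               # one-hot targets
    for _ in range(n_iter):
        P = softmax(A @ W)
        W -= lr * A.T @ (P - Y) / n        # gradient of the mean cross-entropy
    return W

def predict_classes(X, W):
    A = np.column_stack([X, np.ones(len(X))])
    return np.argmax(A @ W, axis=1)

def make_blobs(centers, n_per_class, rng):
    """Hypothetical data: one unit-variance Gaussian blob per class."""
    X = np.vstack([rng.normal(loc=c, scale=1.0, size=(n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y

rng = np.random.default_rng(7)

# Binary case (for example churn vs. no churn).
X2, y2 = make_blobs([(-2, -2), (2, 2)], 100, rng)
W2 = fit_softmax_regression(X2, y2, n_classes=2)
acc_binary = float(np.mean(predict_classes(X2, W2) == y2))

# Multiclass case with three classes.
X3, y3 = make_blobs([(-3, 0), (3, 0), (0, 4)], 100, rng)
W3 = fit_softmax_regression(X3, y3, n_classes=3)
acc_multi = float(np.mean(predict_classes(X3, W3) == y3))

print("binary accuracy:", acc_binary, "multiclass accuracy:", acc_multi)
```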

Visualizations and Business Insights

Although business exploration for the Ames Housing dataset was limited, I visualized key trends in the Customer Churn dataset, uncovering patterns like:

  • Features contributing to churn likelihood.
  • Insights into customer demographics and subscription behavior.

Results and Learnings

Regression Task Results

  • RMSE: ~18,000
  • R² Score: 0.89

Classification Task Results

  • Binary Classification Accuracy: ~79%
    • Class imbalance was not addressed due to time constraints, so accuracy alone may overstate performance on the minority class.
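
The reason unaddressed class imbalance matters can be shown with a toy confusion matrix. In the sketch below, a trivial baseline that always predicts the majority class still scores high accuracy while catching none of the positive cases. The 80/20 split is a made-up ratio for illustration, not the actual class balance of the Customer Churn dataset.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def accuracy(tn, fp, fn, tp):
    return (tp + tn) / (tn + fp + fn + tp)

def recall(tn, fp, fn, tp):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical labels: 80 customers stay (0), 20 churn (1).
y_true = [0] * 80 + [1] * 20
# Baseline that always predicts "no churn".
y_pred = [0] * 100

cm = confusion_matrix(y_true, y_pred)
baseline_accuracy = accuracy(*cm)  # 80 correct out of 100
baseline_recall = recall(*cm)      # 0 of the 20 churners found
print(cm, baseline_accuracy, baseline_recall)
```

Reporting recall or the full confusion matrix alongside accuracy makes this kind of failure visible.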

Key Takeaways

  • Technical Growth:
    • Gained hands-on experience with large datasets, feature engineering, and model evaluation.
    • Improved proficiency in preprocessing techniques like scaling, encoding, and handling missing data.
  • Time Management:
    • Balancing deep dives into datasets with practical deliverables was a critical lesson.

Final Thoughts

This assignment was not just an exercise in implementing machine learning techniques but also a lesson in real-world problem-solving, where time and data complexity often dictate the scope of exploration. I’m eager to further refine these models and delve deeper into business insights in future projects!
