Overview
Logistic Regression is a fundamental supervised learning algorithm used for binary classification tasks. In this lesson, we’ll cover the basics of Logistic Regression, how it works, and practical considerations when applying it using Python.
Learning Objectives
- Understand the concept and application of Logistic Regression for binary classification.
- Implement Logistic Regression using Python’s Scikit-Learn library.
- Explore practical applications and considerations for Logistic Regression.
What is Logistic Regression?
Logistic Regression predicts the probability that an instance belongs to a particular class. Despite its name, it’s used for classification rather than regression tasks.
How Logistic Regression Works
Logistic Regression models the probability of the positive class by passing a linear combination of the features, z = w·x + b, through the logistic function (sigmoid function) σ(z) = 1 / (1 + e^(-z)), which maps any real number to a value between 0 and 1:
import numpy as np
import matplotlib.pyplot as plt
# Sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
# Generate data
z = np.linspace(-10, 10, 100)
sigma_z = sigmoid(z)
# Plot sigmoid function
plt.figure(figsize=(8, 6))
plt.plot(z, sigma_z, label='Sigmoid Function')
plt.title('Sigmoid Function')
plt.xlabel('z')
plt.ylabel(r'$\sigma(z)$')
plt.legend()
plt.grid(True)
plt.show()
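In practice, a prediction is made by computing a weighted sum of the features, z = w·x + b, converting it to a probability with σ(z), and assigning class 1 when that probability is at least 0.5 (equivalently, when z ≥ 0). The toy example below, which reuses the sigmoid function defined above, illustrates this with made-up weights and a made-up instance; the numbers are purely illustrative.
# Toy example: hypothetical weights, bias, and instance (not fitted to any data)
w = np.array([1.5, -0.8])
b = 0.2
x = np.array([2.0, 1.0])

z = np.dot(w, x) + b     # linear combination of the features
p = sigmoid(z)           # probability of the positive class
label = int(p >= 0.5)    # threshold at 0.5 (equivalent to z >= 0)
print(f'z={z:.2f}, p={p:.3f}, predicted class={label}')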
Implementing Logistic Regression in Python
Here’s how you can implement Logistic Regression using Python’s Scikit-Learn library:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X = iris.data[:, :2]  # Use only the first 2 features for simplicity
y = (iris.target != 0) * 1  # Binary target: 0 = Setosa, 1 = not Setosa (Versicolor or Virginica)
# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Initialize and train Logistic Regression model
model = LogisticRegression(random_state=0)
model.fit(X_train_scaled, y_train)
# Predictions
y_pred = model.predict(X_test_scaled)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
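To connect this back to the sigmoid function from earlier, the short check below is a sketch that reuses model, X_test_scaled, and the sigmoid function defined above: the probabilities reported by predict_proba are simply the sigmoid applied to the model's linear combination of the (scaled) features.
# Continuing from the snippets above
z_test = model.decision_function(X_test_scaled)            # z = w·x + b for each test instance
proba_manual = sigmoid(z_test)                             # sigmoid from the earlier snippet
proba_sklearn = model.predict_proba(X_test_scaled)[:, 1]   # probability of class 1
print(np.allclose(proba_manual, proba_sklearn))            # expected: True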
Practical Considerations
- Feature Scaling: the solvers behind Logistic Regression converge faster and more reliably when features are on comparable scales, and scaling also makes the fitted coefficients directly comparable.
- Interpretability: each coefficient is the change in the log-odds of the positive class for a one-unit increase in that feature; exponentiating a coefficient gives an odds ratio (see the sketch after this list).
- Regularization: Scikit-Learn’s LogisticRegression class applies L2 regularization by default, controlled by the inverse-strength parameter C (smaller C means stronger regularization), which helps prevent overfitting.
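As a rough illustration of the last two points, the sketch below continues from the training snippet above: it prints each fitted coefficient alongside its odds ratio, then refits the model with a few values of C. The particular C values (and the reg_model name) are purely illustrative.
# Continuing from the snippet above (model, X_train_scaled, y_train, iris)
for name, coef in zip(iris.feature_names[:2], model.coef_[0]):
    print(f'{name}: coefficient={coef:.3f}, odds ratio={np.exp(coef):.3f}')

# Smaller C means stronger L2 regularization (C is the inverse of the regularization strength)
for C in [0.01, 1.0, 100.0]:
    reg_model = LogisticRegression(C=C, random_state=0)
    reg_model.fit(X_train_scaled, y_train)
    print(f'C={C}: coefficients={reg_model.coef_[0]}')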
Applications and Limitations
- Applications: Logistic Regression is used in various fields such as healthcare (disease diagnosis), finance (credit scoring), and marketing (customer segmentation).
- Limitations: Assumes a linear relationship between the features and the log-odds of the outcome; it may not perform well when relationships are complex and non-linear unless the features are transformed or a non-linear model is used (see the sketch after this list).
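One common way around the linearity limitation is to add non-linear transformations of the features before fitting. The sketch below reuses the train/test split from the earlier snippet and combines PolynomialFeatures with LogisticRegression in a pipeline; the degree and the poly_model name are just illustrative choices, not a recommendation.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Continuing from the earlier snippet (X_train, X_test, y_train, y_test, StandardScaler, LogisticRegression)
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # add squared and interaction terms
    StandardScaler(),
    LogisticRegression(random_state=0),
)
poly_model.fit(X_train, y_train)
print(f'Accuracy with polynomial features: {poly_model.score(X_test, y_test):.2f}')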
Conclusion
Logistic Regression is a versatile algorithm for binary classification tasks, offering simplicity and interpretability. By implementing it in Python, you can leverage its strengths for practical machine learning applications.