Writing · April 16, 2026
Decision Trees vs Random Forests: Which is Better for Classification? (2026)
Decision Trees vs Random Forests — a complete comparison for classification tasks. How each algorithm works, when to use which, bias-variance tradeoff, feature importance, and Python examples.
How Decision Trees Work {#decision-trees}
A decision tree learns a series of if/else rules from training data. At each node, it finds the feature and threshold that best separates the classes.
TRAINING DATA: Predict if a cloud instance should be rightsized
Features: [cpu_avg, memory_avg, network_avg, age_days]
Label: 1 = rightsize, 0 = keep
LEARNED TREE:
┌─────────────────────┐
│ cpu_avg < 20%? │
└──────────┬──────────┘
│
┌────────────────┴────────────────┐
YES NO
│ │
▼ ▼
┌───────────────────┐ ┌───────────────────┐
│ memory_avg < 30%?│ │ Keep instance │
└─────────┬─────────┘ │ (well utilised) │
│ └───────────────────┘
┌────┴────┐
YES NO
│ │
▼ ▼
┌─────────┐ ┌─────────┐
│Rightsize│ │ Keep │
│ (waste) │ │(memory │
└─────────┘ │ bound) │
└─────────┘
How splits are chosen — Gini Impurity:
Gini Impurity = 1 - Σ(pᵢ²)
Pure node (all one class): Gini = 1 - (1² + 0²) = 0
Perfectly mixed (50/50): Gini = 1 - (0.5² + 0.5²) = 0.5
The algorithm picks the split that minimises weighted Gini
across the two resulting child nodes.
Problems with Single Decision Trees {#problems}
Decision trees have one critical weakness: they overfit.
TRAINING DATA: TEST DATA:
───────────── ──────────
Accuracy: 99% Accuracy: 71%
The tree memorised the training data.
It learned noise, not signal.
Why this happens:
A deep decision tree can create a unique leaf
for every training example — perfect training
accuracy, terrible generalisation.
Depth 1: Underfits (too simple)
Depth 5: Good generalisation
Depth 20: Overfits (memorises training data)
Unlimited: Perfect training accuracy, poor test accuracy
How Random Forests Work {#random-forests}
Random Forest fixes overfitting by building many trees and averaging their predictions. Two sources of randomness make each tree different:
RANDOM FOREST TRAINING
Original dataset (N rows, M features)
│
├── Bootstrap sample 1 (random N rows with replacement)
│ + Random subset of features (√M features)
│ → Train Tree 1
│
├── Bootstrap sample 2 (different random N rows)
│ + Different random subset of features
│ → Train Tree 2
│
├── Bootstrap sample 3
│ → Train Tree 3
│
└── ... repeat for 100-500 trees
PREDICTION:
New sample → run through all trees
Classification: majority vote
Regression: average of all predictions
Tree 1 says: RIGHTSIZE
Tree 2 says: KEEP
Tree 3 says: RIGHTSIZE
Tree 4 says: RIGHTSIZE
Tree 5 says: KEEP
...
Tree 100 says: RIGHTSIZE
Final prediction: RIGHTSIZE (majority vote)
The magic: each tree overfits to its bootstrap sample, but their errors are uncorrelated — they overfit in different directions. Averaging cancels out the noise.
Head-to-Head Comparison {#comparison}
| Property | Decision Tree | Random Forest |
|---|---|---|
| Accuracy | Moderate | High |
| Overfitting | High risk | Low risk |
| Interpretability | ✅ Fully interpretable | ❌ Black box |
| Training speed | Fast | Slower (100s of trees) |
| Prediction speed | Very fast | Slower (100s of trees) |
| Feature importance | Yes | Yes (more reliable) |
| Handles missing data | Needs imputation | More robust |
| Hyperparameters | depth, min_samples | n_estimators, max_features, depth |
| Memory usage | Low | High |
| Best for | Explainability, small data | Production ML, tabular data |
Bias-Variance Tradeoff {#bias-variance}
ERROR = BIAS² + VARIANCE + IRREDUCIBLE NOISE
HIGH BIAS (underfitting):
Model too simple, misses patterns
Training error: high
Test error: high
HIGH VARIANCE (overfitting):
Model too complex, memorises noise
Training error: low
Test error: high
SWEET SPOT:
Training error: low-medium
Test error: low
┌─────────────────────────────────────────────┐
│ Error │
│ │ │
│ │ Total Error │
│ │ ╲ ╱ │
│ │ ╲ ╱ │
│ │ ╲ ╭──────────╮╱ │
│ │ ╲ ╱ Variance ╲ │
│ │ ╲╱ ╲ │
│ │ ╱╲ Bias² ╲ │
│ │ ╱ ╲──────────────► │
│ └────────────────────────── Model │
│ Simple Complex │
└─────────────────────────────────────────────┘
Decision Tree (deep): Low bias, HIGH variance
Random Forest: Low bias, LOW variance ← ideal
Feature Importance {#feature-importance}
Random forests provide reliable feature importance scores — how much each feature contributes to predictions:
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np
# Cloud instance rightsizing dataset
feature_names = ['cpu_avg_pct', 'memory_avg_pct', 'network_mbps',
'disk_iops', 'age_days', 'instance_family']
X = np.random.rand(1000, 6)
y = (X[:, 0] < 0.2) & (X[:, 1] < 0.3) # rightsize if low CPU and memory
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)
importance = pd.Series(rf.feature_importances_, index=feature_names)
importance_sorted = importance.sort_values(ascending=False)
print("Feature Importance:")
for feat, score in importance_sorted.items():
bar = '█' * int(score * 50)
print(f" {feat:<20} {bar} {score:.3f}")
# Output:
# cpu_avg_pct ████████████████████████ 0.481
# memory_avg_pct ████████████████ 0.312
# age_days ████ 0.089
# network_mbps ███ 0.062
# disk_iops ██ 0.038
# instance_family █ 0.018
Python Implementation {#code}
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import classification_report
import numpy as np
# Generate synthetic cloud rightsizing data
np.random.seed(42)
n = 2000
X = np.column_stack([
np.random.uniform(0, 100, n), # cpu_avg
np.random.uniform(0, 100, n), # memory_avg
np.random.uniform(0, 1000, n), # network_mbps
np.random.randint(1, 730, n), # age_days
])
y = ((X[:, 0] < 25) & (X[:, 1] < 35)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Decision Tree
dt = DecisionTreeClassifier(max_depth=5, random_state=42)
dt.fit(X_train, y_train)
dt_cv = cross_val_score(dt, X, y, cv=5).mean()
# Random Forest
rf = RandomForestClassifier(
n_estimators=100,
max_depth=10,
max_features='sqrt', # √M features per split
random_state=42,
n_jobs=-1 # use all CPU cores
)
rf.fit(X_train, y_train)
rf_cv = cross_val_score(rf, X, y, cv=5).mean()
print(f"Decision Tree — CV Accuracy: {dt_cv:.3f}")
print(f"Random Forest — CV Accuracy: {rf_cv:.3f}")
print()
print("Random Forest Classification Report:")
print(classification_report(y_test, rf.predict(X_test),
target_names=['Keep', 'Rightsize']))
When to Use Which {#when-to-use}
Use a Decision Tree when:
- You need to explain every prediction (loan approval, medical diagnosis, compliance)
- Stakeholders need to understand the model logic
- Dataset is very small (< 1,000 rows) — random forest may not help much
- Inference speed is critical and you can't afford 100 trees
Use a Random Forest when:
- Accuracy matters more than interpretability
- You have tabular data (the domain where random forests excel)
- You want reliable feature importance scores
- You need a robust baseline before trying deep learning
- Dataset is medium to large (1,000+ rows)
Consider alternatives when:
- Data is images or text → use neural networks
- You need maximum accuracy on tabular data → try XGBoost or LightGBM
- You need full interpretability with high accuracy → try SHAP values on random forest
FAQ {#faq}
What is the difference between a decision tree and a random forest? A decision tree is a single model making splits on features. A random forest is an ensemble of 100–500 trees, each trained on a random data subset with random feature subsets. The forest votes on the final prediction, dramatically reducing overfitting.
When should you use a decision tree instead of a random forest? When interpretability is critical — you need to explain exactly why a prediction was made. Decision trees produce human-readable rules. Random forests are more accurate but are essentially black boxes.
Does random forest always outperform decision trees? Almost always on accuracy. The tradeoff is interpretability and training time. On very small datasets, a well-tuned decision tree may generalise just as well.
What is the best number of trees in a random forest? Start with 100. Accuracy improves up to ~200–500 trees, then plateaus. More trees increase training time without meaningful accuracy gains. Use cross-validation to find the sweet spot for your dataset.
Working on ML classification problems? Let's connect on LinkedIn.
Comments & Reactions