Writing · April 10, 2026

What is AI? Algorithms, Neural Networks & Predictive Models Explained (2026)

A complete guide to AI algorithms — regression, classification, clustering, neural networks, and time series. With code examples, flow diagrams, and real-world use cases for engineers.

AI · MLOps · Cloud · AWS · Azure


What is AI? {#what-is-ai}

Artificial Intelligence is the field of building systems that perform tasks that normally require human intelligence — recognising images, understanding language, making decisions, and predicting outcomes.

The term covers a wide spectrum:

| Term | Definition | Example |
|------|------------|---------|
| AI | Machines that simulate intelligent behaviour | Chess engines, recommendation systems |
| Machine Learning (ML) | Systems that learn patterns from data | Spam filters, fraud detection |
| Deep Learning (DL) | ML using multi-layer neural networks | Image recognition, GPT models |
| Generative AI | Models that create new content | ChatGPT, Stable Diffusion, GitHub Copilot |

The key insight of modern AI: instead of programming rules, you feed data and let the algorithm find the rules itself.
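To make that concrete, here is a toy contrast between a hand-written rule and a learned one. The caps-ratio data is invented for illustration; the learned side uses scikit-learn's LogisticRegression:

```python
from sklearn.linear_model import LogisticRegression

# Rule-based: a developer hard-codes the threshold
def spam_rule(caps_ratio):
    return caps_ratio > 0.5   # a guess, fixed forever

# ML-based: the threshold is learned from labelled examples
X = [[0.9], [0.8], [0.7], [0.2], [0.1], [0.05]]   # caps ratio per email
y = [1, 1, 1, 0, 0, 0]                            # 1 = spam

model = LogisticRegression().fit(X, y)
print(model.predict([[0.85]])[0], model.predict([[0.05]])[0])
```

Both approaches draw a decision boundary; the difference is whether a human or the data writes it.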


Types of Machine Learning {#types-of-ml}

                    ┌─────────────────────────────┐
                    │      Machine Learning        │
                    └──────────────┬──────────────┘
                                   │
          ┌────────────────────────┼────────────────────────┐
          │                        │                        │
          ▼                        ▼                        ▼
  ┌───────────────┐      ┌─────────────────┐      ┌────────────────┐
  │  Supervised   │      │  Unsupervised   │      │ Reinforcement  │
  │   Learning    │      │   Learning      │      │   Learning     │
  └───────┬───────┘      └────────┬────────┘      └───────┬────────┘
          │                       │                        │
  Labelled data            Unlabelled data          Reward signals
  Regression               Clustering               Game playing
  Classification           Association rules        Robotics
  Time series              Dimensionality           Self-driving cars
                           reduction

Predictive Models {#predictive-models}

Predictive models estimate or classify future outcomes based on historical data.

Regression — Predicting Continuous Values {#regression}

Regression answers: "How much?" or "How many?"

Linear regression fits a straight line through data points to predict a continuous output.

y = mx + b

y  = predicted value (e.g. house price)
x  = input feature (e.g. square footage)
m  = slope (weight learned from data)
b  = intercept (bias term)

Multiple regression extends this to many input features:

y = w₁x₁ + w₂x₂ + w₃x₃ + ... + b

Python example — predicting cloud infrastructure cost:

from sklearn.linear_model import LinearRegression
import numpy as np

# Features: [num_instances, storage_tb, data_transfer_gb]
X_train = np.array([
    [10, 2, 500],
    [50, 10, 2000],
    [100, 25, 5000],
    [200, 50, 10000],
])
y_train = np.array([1200, 5800, 11500, 23000])  # monthly cost in USD

model = LinearRegression()
model.fit(X_train, y_train)

# Predict cost for a new environment
new_env = np.array([[75, 15, 3000]])
predicted_cost = model.predict(new_env)
print(f"Predicted monthly cost: ${predicted_cost[0]:,.0f}")
# Output: Predicted monthly cost: $8,700

Real-world use cases: cloud cost forecasting, capacity planning, server-load estimation, and house-price prediction.


Classification — Categorising Data {#classification}

Classification answers: "Which category does this belong to?"

Decision Trees

A decision tree splits data by asking yes/no questions at each node:

                        Email received
                              │
                  ┌───────────▼────────────┐
                  │ Contains "FREE MONEY"? │
                  └───────────┬────────────┘
                      Yes ────┴──── No
                       │             │
                       ▼             ▼
                   ┌──────┐  ┌────────────────────┐
                   │ SPAM │  │ From known sender? │
                   └──────┘  └─────────┬──────────┘
                                Yes ───┴─── No
                                 │           │
                                 ▼           ▼
                             ┌───────┐  ┌───────────────────────┐
                             │ INBOX │  │ Has suspicious links? │
                             └───────┘  │ Check domain          │
                                        │ reputation            │
                                        └───────────────────────┘

Python example, training a decision tree on simple spam features:

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Features: [has_free_money, unknown_sender, suspicious_links, caps_ratio]
X = [[1,1,1,0.8], [0,0,0,0.1], [1,0,1,0.6], [0,1,0,0.2], [1,1,0,0.7]]
y = [1, 0, 1, 0, 1]  # 1=spam, 0=not spam

clf = DecisionTreeClassifier(max_depth=3)
clf.fit(X, y)

new_email = [[1, 1, 0, 0.5]]
print("Spam?", "Yes" if clf.predict(new_email)[0] else "No")

Support Vector Machines (SVM)

SVM finds the maximum margin hyperplane — the widest possible boundary between classes:

     Class A (●)    │    Class B (○)
                    │
    ●    ●          │          ○    ○
         ●    ●─────┼─────○    ○
    ●         ●     │     ○         ○
                    │
              ← margin →
              (maximised)

Best for: high-dimensional data, text classification, image recognition with small datasets.
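A minimal sketch of that maximum-margin boundary with scikit-learn's SVC. The 2-D points are invented for illustration, and a linear kernel is used since the toy classes are linearly separable:

```python
from sklearn.svm import SVC

# Toy 2-D data: class 0 clusters bottom-left, class 1 top-right
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel='linear', C=1.0)   # maximum-margin linear boundary
clf.fit(X, y)

# Only the margin-defining points (support vectors) shape the boundary
print(len(clf.support_vectors_))
print(clf.predict([[2, 2], [7, 9]]))
```

Swapping `kernel='linear'` for `kernel='rbf'` handles classes that are not linearly separable.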

Neural Networks for Classification

Neural networks learn non-linear decision boundaries that decision trees and SVMs struggle with.

from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# Two interleaving half-moons: not separable by a straight line
X_train, y_train = make_moons(n_samples=200, noise=0.1, random_state=42)

# Multi-layer perceptron: 2 hidden layers of 64 and 32 neurons
clf = MLPClassifier(
    hidden_layer_sizes=(64, 32),
    activation='relu',
    max_iter=500,
    random_state=42
)
clf.fit(X_train, y_train)

Algorithm comparison for classification:

| Algorithm | Best for | Interpretable? | Training speed | Handles non-linearity |
|-----------|----------|----------------|----------------|-----------------------|
| Decision Tree | Small datasets, explainability | ✅ Yes | Fast | Limited |
| Random Forest | Tabular data, production ML | Partial | Medium | Good |
| SVM | High-dimensional, small data | ❌ No | Slow on large data | Yes (with kernel) |
| Neural Network | Images, text, complex patterns | ❌ No | Slow | Excellent |
| Logistic Regression | Binary classification baseline | ✅ Yes | Very fast | No |
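The last row is the usual starting point for any classification task. A quick baseline sketch on the same spam-style features used earlier (the data is invented for illustration):

```python
from sklearn.linear_model import LogisticRegression

# Features: [has_free_money, unknown_sender, suspicious_links, caps_ratio]
X = [[1, 1, 1, 0.8], [0, 0, 0, 0.1], [1, 0, 1, 0.6],
     [0, 1, 0, 0.2], [1, 1, 0, 0.7], [0, 0, 1, 0.3]]
y = [1, 0, 1, 0, 1, 0]

baseline = LogisticRegression(max_iter=1000).fit(X, y)

# Coefficients are directly interpretable: positive pushes towards spam
for name, coef in zip(["free_money", "unknown_sender", "links", "caps"],
                      baseline.coef_[0]):
    print(f"{name:>14}: {coef:+.2f}")
```

If a more complex model cannot beat this baseline on held-out data, the extra complexity is not earning its keep.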

Time Series Forecasting {#time-series}

Time series models analyse data ordered by time to forecast future values.

  Value
    │
 95 ┤                                    ╭──── Forecast
 90 ┤              ╭──╮                 ╱
 85 ┤         ╭───╯  ╰──╮           ╭─╯
 80 ┤    ╭───╯           ╰──╮    ╭─╯
 75 ┤───╯                    ╰──╯
    └────────────────────────────────── Time
    Jan  Feb  Mar  Apr  May  Jun  Jul →
         ◄── Historical data ──►  ◄─ Predicted ─►

Python example — forecasting API request volume:

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Daily API request counts over 30 days
requests = [1200, 1350, 1100, 1400, 1600, 1800, 1750,
            1900, 2100, 1950, 2200, 2400, 2300, 2500,
            2600, 2450, 2700, 2900, 2800, 3000, 2950,
            3100, 3300, 3200, 3400, 3600, 3500, 3700,
            3900, 3800]

series = pd.Series(requests)

# ARIMA(p=2, d=1, q=2) — autoregressive integrated moving average
model = ARIMA(series, order=(2, 1, 2))
result = model.fit()

# Forecast next 7 days
forecast = result.forecast(steps=7)
print("Next 7 days forecast:", forecast.round(0).tolist())

Common time series algorithms:

| Algorithm | Use case | Handles seasonality |
|-----------|----------|---------------------|
| ARIMA | Stationary trends, financial data | No (use SARIMA) |
| SARIMA | Seasonal patterns (weekly, yearly) | ✅ Yes |
| Prophet (Meta) | Business metrics with holidays | ✅ Yes |
| LSTM (deep learning) | Complex non-linear sequences | ✅ Yes |
| Transformer (Temporal Fusion) | Multi-variate, long horizons | ✅ Yes |

Descriptive Models {#descriptive-models}

Descriptive models reveal hidden structure in data — no prediction, no labels. They answer: "What patterns exist?"

Clustering {#clustering}

Clustering groups similar data points together without predefined categories.

K-Means Clustering

  Step 1: Place K centroids randomly
  Step 2: Assign each point to nearest centroid
  Step 3: Move centroids to mean of their cluster
  Step 4: Repeat until centroids stop moving

  Before:                After (K=3):
  · · · · · ·            ● ● ▲ ▲ ■ ■
  · · · · · ·            ● ● ▲ ▲ ■ ■
  · · · · · ·            ● ● ▲ ▲ ■ ■
  (no structure)         (3 clusters found)
from sklearn.cluster import KMeans
import numpy as np

# Customer data: [monthly_spend, support_tickets, login_frequency]
customers = np.array([
    [500, 1, 30], [480, 2, 28], [520, 1, 32],   # High-value, low-touch
    [100, 8, 5],  [90, 10, 3], [110, 7, 6],      # Low-value, high-support
    [300, 3, 15], [280, 4, 12], [320, 3, 18],    # Mid-tier
])

kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(customers)

labels = kmeans.labels_
print("Cluster assignments:", labels)
# e.g. [0 0 0 1 1 1 2 2 2] (cluster numbering is arbitrary; what matters
# is that the high-value, high-support, and mid-tier groups separate)
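The four steps listed above can also be written out in plain NumPy. This is a minimal sketch that assumes well-separated data and ignores empty-cluster edge cases:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=42):
    rng = np.random.default_rng(seed)
    # Step 1: pick k random data points as initial centroids
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Step 2: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its cluster
        new = np.array([X[labels == c].mean(axis=0) for c in range(k)])
        # Step 4: stop when centroids no longer move
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], float)
labels, centroids = kmeans(X, k=2)
print(labels)
```

Production K-means (including scikit-learn's) adds smarter initialisation (k-means++) and multiple restarts, but the core loop is exactly these four steps.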

Hierarchical Clustering

Builds a tree (dendrogram) of nested clusters — no need to specify K upfront:

  Distance
    │
  5 ┤                    ┌──────────┐
    │                    │          │
  3 ┤          ┌─────────┤     ┌────┤
    │          │         │     │    │
  1 ┤    ┌─────┤    ┌────┤  ┌──┤ ┌──┤
    │    │     │    │    │  │  │ │  │
    └────┴─────┴────┴────┴──┴──┴─┴──┴──
         A     B    C    D  E  F G  H

Cut the dendrogram at any height to get different numbers of clusters.
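A sketch of that cut with SciPy, on toy data; the threshold of 3.0 is chosen here to yield three clusters:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Three tight pairs of points
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)

# Agglomerative clustering: repeatedly merge the closest clusters
Z = linkage(X, method='ward')

# "Cut" the dendrogram: the distance threshold decides the cluster count
labels = fcluster(Z, t=3.0, criterion='distance')
print(labels)
```

Raising `t` merges more clusters together; lowering it splits them apart, all from the same linkage matrix.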


Association Rules — Finding Item Relationships {#association}

Association rules find items that frequently appear together. The classic example is market basket analysis.

Key metrics:

Support    = P(A and B)           — how often A and B appear together
Confidence = P(B | A)             — given A, how likely is B?
Lift       = Confidence / P(B)    — is the association stronger than random?

The Apriori algorithm:

  Transaction data:
  T1: {bread, butter, milk}
  T2: {bread, butter}
  T3: {bread, milk, eggs}
  T4: {butter, milk}
  T5: {bread, butter, milk, eggs}

  Step 1 — Find frequent individual items (min support = 60%):
  bread:  4/5 = 80% ✅
  butter: 4/5 = 80% ✅
  milk:   4/5 = 80% ✅
  eggs:   2/5 = 40% ❌

  Step 2 — Find frequent pairs:
  {bread, butter}: 3/5 = 60% ✅
  {bread, milk}:   3/5 = 60% ✅
  {butter, milk}:  3/5 = 60% ✅

  Step 3 — Generate rules:
  bread → butter  (confidence: 3/4 = 75%, lift: 0.75/0.8 = 0.94)
  butter → milk   (confidence: 3/4 = 75%, lift: 0.75/0.8 = 0.94)
  (a lift below 1 means the pairing is slightly weaker than random)

from mlxtend.frequent_patterns import apriori, association_rules
import pandas as pd

# One-hot encoded transaction data
data = {
    'bread':  [1,1,1,0,1],
    'butter': [1,1,0,1,1],
    'milk':   [1,0,1,1,1],
    'eggs':   [0,0,1,0,1],
}
df = pd.DataFrame(data)

frequent_items = apriori(df, min_support=0.6, use_colnames=True)
rules = association_rules(frequent_items, metric="lift", min_threshold=0.9)
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

Real-world applications: product recommendations ("customers also bought"), store layout and catalogue placement, and cross-sell bundling.


Summarization & Sequence Discovery {#summarization}

Summarization condenses large datasets into representative patterns, such as summary statistics, cluster profiles, or a handful of principal components.

Sequence discovery finds recurring patterns in ordered data, such as common click paths or repeated event sequences in logs.

A Python example of summarization via PCA, compressing 20 metrics into 2 components:

from sklearn.decomposition import PCA
import numpy as np

# 100 cloud metrics with 20 dimensions — reduce to 2 for visualisation
np.random.seed(42)
metrics = np.random.randn(100, 20)

pca = PCA(n_components=2)
reduced = pca.fit_transform(metrics)

print(f"Variance explained: {pca.explained_variance_ratio_.sum():.1%}")
# For purely random 20-dimensional data, the first 2 components capture
# only a small share of the variance; on real, correlated metrics the
# share is typically much higher

How Neural Networks Work {#neural-networks}

A neural network is a system of interconnected nodes (neurons) organised in layers. Each connection has a weight — a number that gets adjusted during training.

Architecture

  INPUT LAYER        HIDDEN LAYER 1     HIDDEN LAYER 2     OUTPUT LAYER
  (raw features)     (64 neurons)       (32 neurons)       (prediction)

  ┌───┐              ┌───┐              ┌───┐
  │x₁ │──────────────│   │──────────────│   │──────────────┐
  └───┘         ╱────│   │         ╱────│   │              │   ┌───────┐
  ┌───┐        ╱     └───┘        ╱     └───┘              ├──▶│  ŷ   │
  │x₂ │───────╱      ┌───┐       ╱      ┌───┐              │   └───────┘
  └───┘       ╲──────│   │──────╱  ╱────│   │──────────────┘   (output)
  ┌───┐        ╲     └───┘      ╲ ╱     └───┘
  │x₃ │─────────╲────┌───┐      ╲╱      ┌───┐
  └───┘           ╲──│   │──────╱╲──────│   │
  ┌───┐               └───┘              └───┘
  │x₄ │               ...                ...
  └───┘
  (features)         (learn low-level    (learn high-level
                      patterns)           abstractions)

Forward Pass — How a Prediction is Made

  1. Input features enter the network
  2. Each neuron computes: z = Σ(weight × input) + bias
  3. Activation function applied: a = ReLU(z) = max(0, z)
  4. Output flows to next layer
  5. Final layer produces prediction ŷ

A single neuron's forward pass in NumPy:

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Single neuron forward pass
inputs  = np.array([0.5, 0.3, 0.8])   # x
weights = np.array([0.4, 0.7, 0.2])   # w (learned)
bias    = 0.1                           # b (learned)

z = np.dot(weights, inputs) + bias     # weighted sum
a = relu(z)                            # activation
print(f"Neuron output: {a:.4f}")       # 0.6700

Backpropagation — How the Network Learns

  Forward pass  →  Compute prediction ŷ
                ↓
  Compute loss  →  Loss = (ŷ - y)²   (mean squared error)
                ↓
  Backward pass →  Compute gradient of loss w.r.t. each weight
                ↓
  Update weights → w = w - learning_rate × gradient
                ↓
  Repeat for all training examples (one epoch)
  Repeat for many epochs until loss converges
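The loop above, shrunk to the smallest possible case: one weight, one training example, squared-error loss, and plain gradient descent. A from-scratch sketch, not how you'd train in practice:

```python
# Learn w in y = w * x from one example (x = 2.0, target = 8.0)
x, y_true = 2.0, 8.0
w = 0.5                                # initial guess
lr = 0.05                              # learning rate

for epoch in range(100):
    y_hat = w * x                      # forward pass
    loss = (y_hat - y_true) ** 2       # squared-error loss
    grad = 2 * (y_hat - y_true) * x    # dLoss/dw via the chain rule
    w -= lr * grad                     # gradient descent update

print(round(w, 3))   # converges to 4.0, since 4.0 * 2.0 = 8.0
```

Backpropagation in a real network is this same chain-rule gradient, applied layer by layer from the output back to the input.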

Full Training Loop in PyTorch

import torch
import torch.nn as nn

# Define a simple 3-layer network
class CloudCostPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 64),   # 3 input features → 64 neurons
            nn.ReLU(),
            nn.Linear(64, 32),  # 64 → 32 neurons
            nn.ReLU(),
            nn.Linear(32, 1),   # 32 → 1 output (cost prediction)
        )

    def forward(self, x):
        return self.net(x)

model     = CloudCostPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn   = nn.MSELoss()

# Toy training data (the cloud-cost examples from the regression section)
X_train_tensor = torch.tensor(
    [[10., 2., 500.], [50., 10., 2000.], [100., 25., 5000.], [200., 50., 10000.]]
)
y_train_tensor = torch.tensor([[1200.], [5800.], [11500.], [23000.]])

# Training loop
for epoch in range(100):
    predictions = model(X_train_tensor)
    loss        = loss_fn(predictions, y_train_tensor)

    optimizer.zero_grad()   # clear previous gradients
    loss.backward()         # compute gradients (backprop)
    optimizer.step()        # update weights

    if epoch % 10 == 0:
        print(f"Epoch {epoch:3d} | Loss: {loss.item():.4f}")

Activation Functions Compared

| Function | Formula | Use case | Output range |
|----------|---------|----------|--------------|
| ReLU | max(0, x) | Hidden layers (default) | [0, ∞) |
| Sigmoid | 1/(1+e⁻ˣ) | Binary classification output | (0, 1) |
| Softmax | eˣᵢ / Σⱼ eˣʲ | Multi-class output | (0, 1), sums to 1 |
| Tanh | (eˣ−e⁻ˣ)/(eˣ+e⁻ˣ) | RNNs, some hidden layers | (-1, 1) |
| GELU | x·Φ(x) | Transformers (BERT, GPT) | (≈-0.17, ∞) |
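The table's functions written out in NumPy (GELU in its widely used tanh approximation):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())            # subtract max for numerical stability
    return e / e.sum()

def gelu(x):                           # tanh approximation of x * Phi(x)
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))            # negatives clipped to 0
print(np.tanh(z))         # squashed into (-1, 1)
print(softmax(z).sum())   # probabilities that sum to 1
```

Note the max-subtraction in softmax: without it, large inputs overflow `np.exp`.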

Running AI on AWS & Azure in 2026 {#cloud-ai}

As a cloud engineer, you don't always train models from scratch. Here's the practical landscape:

AWS AI/ML Stack

  ┌─────────────────────────────────────────────────────┐
  │                   AWS AI Services                    │
  ├─────────────────┬───────────────────────────────────┤
  │  Pre-built APIs │  Rekognition (vision)              │
  │  (no ML needed) │  Textract (document OCR)           │
  │                 │  Comprehend (NLP)                  │
  │                 │  Forecast (time series)            │
  ├─────────────────┼───────────────────────────────────┤
  │  Foundation     │  Bedrock (Claude, Llama, Titan)    │
  │  Models         │  SageMaker JumpStart               │
  ├─────────────────┼───────────────────────────────────┤
  │  Custom         │  SageMaker Training Jobs           │
  │  Training       │  SageMaker Pipelines (MLOps)       │
  │                 │  SageMaker Endpoints (inference)   │
  └─────────────────┴───────────────────────────────────┘

Deploy a SageMaker endpoint in a few lines:

import sagemaker
from sagemaker.sklearn import SKLearn

role    = sagemaker.get_execution_role()
session = sagemaker.Session()

estimator = SKLearn(
    entry_point="train.py",
    role=role,
    instance_type="ml.m5.xlarge",
    framework_version="1.2-1",
)
estimator.fit({"train": "s3://my-bucket/train-data/"})

predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.t2.medium",
)
print(predictor.predict([[75, 15, 3000]]))

Azure ML Stack

from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="ml-rg",
    workspace_name="ml-workspace",
)

job = command(
    code="./src",
    command="python train.py --data ${{inputs.training_data}}",
    inputs={"training_data": "azureml:my-dataset:1"},
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    compute="cpu-cluster",
)
returned_job = ml_client.jobs.create_or_update(job)
print(f"Job submitted: {returned_job.name}")

FAQ {#faq}

What is the difference between AI, machine learning, and deep learning? AI is the broad field. Machine learning is a subset where systems learn from data. Deep learning is a subset of ML using multi-layer neural networks. Every deep learning model is ML, and every ML model is AI — but not vice versa.

What is the best algorithm for beginners to learn first? Linear regression. It is simple, interpretable, and the mathematical foundation for everything else. Once you understand gradient descent on linear regression, neural networks make intuitive sense.

How do I run machine learning models on AWS in 2026? Use AWS SageMaker for custom training and deployment. Use AWS Bedrock for foundation models (Claude, Llama) without managing infrastructure. Use pre-built services like Rekognition, Textract, and Comprehend for common tasks without any ML code.

What is the difference between supervised and unsupervised learning? Supervised learning trains on labelled data — you provide inputs and correct outputs. Unsupervised learning finds hidden structure in unlabelled data. Clustering and association rules are unsupervised. Regression and classification are supervised.

How many layers does a neural network need? Minimum 3: input, one hidden, output. Deep learning uses many hidden layers — ResNet-50 has 50, and large transformer language models stack on the order of a hundred. More layers learn more abstract representations but require more data and compute to train.

What is overfitting and how do I prevent it? Overfitting is when a model memorises training data instead of learning general patterns — it performs well on training data but poorly on new data. Prevent it with: more training data, dropout layers, regularisation (L1/L2), early stopping, and cross-validation.
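Two of those levers in scikit-learn's MLPClassifier: L2 regularisation via `alpha`, and built-in early stopping on a held-out validation split. The dataset and parameter values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification problem
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

clf = MLPClassifier(
    hidden_layer_sizes=(64, 32),
    alpha=1e-3,               # L2 regularisation strength
    early_stopping=True,      # hold out 10% and stop when it stops improving
    validation_fraction=0.1,
    max_iter=500,
    random_state=42,
)
clf.fit(X, y)
print(f"Training accuracy: {clf.score(X, y):.2f}")
```

Early stopping halts training when validation score plateaus, so the model never gets the chance to memorise the training set.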


Building AI infrastructure on AWS or Azure? Let's connect on LinkedIn.

