SimpleNNs.jl
SimpleNNs.jl is heavily inspired by SimpleChains.jl, which showed that micro-optimisations can matter a great deal for small neural networks (see the blog post). This project aims to expand upon SimpleChains.jl by providing both CPU and GPU support with zero-allocation inference and training.
Key Features
- Zero-allocation inference: Pre-allocated buffers eliminate memory allocations during forward and backward passes
- GPU acceleration: Full CUDA support for both training and inference
- High performance: Optimised implementations for small to medium-sized networks
- Memory efficient: Flat parameter vectors and pre-allocated caches minimise memory usage
Package Goals
As the name suggests, this is not a fully featured neural network library; most notably, it does not include automatic differentiation. The specific goals of this package are:
- Simple architectures: Build neural networks with dense and convolutional layers
- Flat parameters: All model parameters stored in a single vector for easy manipulation
- Pre-allocated computation: Zero-allocation forward and backward passes using pre-allocated buffers
- Cross-platform: Execution on both CPU and GPU (CUDA)
- High performance: Optimised for small to medium neural networks where micro-optimisations matter
Supported Features
Layer Types
- Dense layers: Fully connected layers with customisable activation functions
- Convolutional layers: 2D convolutions with ReLU, tanh, and sigmoid activations
- Pooling layers: Max pooling with configurable pool sizes and strides
- Utility layers: Static input specification and flattening layers
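As an illustrative sketch of how these layers might be composed into a small convolutional classifier: only `chain`, `Static`, and `Dense` appear in the Quick Start below, so the `Conv`, `MaxPool`, and `Flatten` constructor names and their arguments are assumptions here rather than confirmed API.
```julia
using SimpleNNs

# Hypothetical sketch of a small convolutional classifier.
# NOTE: `Conv`, `MaxPool` and `Flatten` (and their arguments) are assumed names
# for the convolutional, pooling and flattening layers described above; check
# the package source for the exact constructors.
cnn = chain(
    Static((28, 28, 1)),                  # 28x28 single-channel image input (assumed shape spec)
    Conv((3, 3), 8, activation_fn=relu),  # 3x3 convolution with 8 output channels
    MaxPool((2, 2)),                      # 2x2 max pooling
    Flatten(),                            # flatten feature maps for the dense head
    Dense(32, activation_fn=relu),
    Dense(10, activation_fn=identity)     # logits for 10 classes
)
```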
Activation Functions
- ReLU (`relu`)
- Hyperbolic tangent (`tanh`, `tanh_fast`)
- Logistic sigmoid (`sigmoid`)
- Identity (`identity`)
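Activations are supplied per layer through the `activation_fn` keyword used in the Quick Start below, for example:
```julia
using SimpleNNs

# Any of the supported activations can be chosen layer by layer.
mlp = chain(
    Static(16),
    Dense(32, activation_fn=tanh_fast),  # fast approximation of tanh
    Dense(32, activation_fn=sigmoid),
    Dense(1, activation_fn=identity)     # linear output layer
)
```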
Loss Functions
- Mean Squared Error (`MSELoss`)
- Cross Entropy Loss (`LogitCrossEntropyLoss`)
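Losses are constructed from the target data and passed to `backprop!` (see the Quick Start below). `MSELoss(targets)` appears in that example; the sketch below also assumes `LogitCrossEntropyLoss` is constructed the same way from integer class labels, which is an assumption rather than confirmed API.
```julia
# Regression: mean squared error against continuous targets (as in the Quick Start).
regression_targets = randn(Float32, 1, 32)
mse = MSELoss(regression_targets)

# Classification: cross entropy on raw logits. Constructing it directly from
# integer class labels is an assumption; consult the package docs for the
# expected target format.
class_labels = rand(1:10, 32)
xent = LogitCrossEntropyLoss(class_labels)

# Either loss is then used as:
# backprop!(backward_cache, forward_cache, model, xent)
```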
GPU Support
- Full CUDA acceleration through `CUDA.jl`, `cuDNN.jl`, and `NNlib.jl`
- Seamless CPU/GPU model transfer with the `gpu()` function
- Optimised GPU kernels for convolution and dense operations
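The `gpu()` function mentioned above moves a model to the GPU. A minimal sketch, assuming the caches are created from the GPU model (so their buffers live on the device) and the inputs are `CuArray`s; check the package docs for the exact GPU workflow.
```julia
using SimpleNNs, CUDA

model = chain(Static(4), Dense(8, activation_fn=relu), Dense(1, activation_fn=identity))
gpu_model = gpu(model)  # transfer parameters to the GPU

batch_size = 32
gpu_inputs = CUDA.randn(Float32, 4, batch_size)

# Assumption: preallocate on a GPU model yields device-side buffers.
forward_cache = preallocate(gpu_model, batch_size)
set_inputs!(forward_cache, gpu_inputs)
forward!(forward_cache, gpu_model)
outputs = get_outputs(forward_cache)
```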
Quick Start
Here's a minimal example to get you started:
```julia
using SimpleNNs

# Create a simple neural network
model = chain(
    Static(4),                         # 4 input features
    Dense(8, activation_fn=relu),      # Hidden layer with ReLU
    Dense(1, activation_fn=identity)   # Output layer
)

# Generate some data
batch_size = 32
inputs = randn(Float32, 4, batch_size)
targets = randn(Float32, 1, batch_size)

# Pre-allocate computation buffers
forward_cache = preallocate(model, batch_size)
backward_cache = preallocate_grads(model, batch_size)

# Set inputs and run forward pass
set_inputs!(forward_cache, inputs)
forward!(forward_cache, model)
outputs = get_outputs(forward_cache)

# Define loss and run backward pass
loss = MSELoss(targets)
total_loss = backprop!(backward_cache, forward_cache, model, loss)

# Access gradients
grads = gradients(backward_cache)
```
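From here, a simple training loop can reuse the same caches on every iteration. The sketch below assumes a `parameters(model)` accessor returning the flat parameter vector (and that it aliases the model's storage); the accessor name is an assumption, while the rest follows the Quick Start API above.
```julia
# Hypothetical SGD loop that reuses the pre-allocated caches from above.
# `parameters(model)` is an *assumed* accessor for the flat parameter vector;
# check the package docs for the actual name.
params = parameters(model)
learning_rate = 0.01f0

for epoch in 1:100
    forward!(forward_cache, model)
    backprop!(backward_cache, forward_cache, model, loss)
    # In-place gradient-descent step on the flat parameter vector.
    params .-= learning_rate .* gradients(backward_cache)
end
```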
Performance Philosophy
SimpleNNs.jl is designed around the principle that, for small to medium neural networks, careful memory management and micro-optimisations can provide significant performance benefits. Key design decisions include:
- Pre-allocation: All memory is allocated upfront, eliminating allocations during computation
- Flat parameters: Single parameter vector enables efficient gradient updates and serialisation
- Type stability: Careful type design ensures fast, predictable performance
- GPU optimisation: Custom CUDA kernels and NNlib integration for accelerated computation
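Because all buffers are created up front, the hot loop itself should not allocate. Continuing from the Quick Start example, one way to check this on the CPU (after a warm-up call to exclude compilation) is Julia's built-in `@allocated` macro:
```julia
# Warm up so compilation is not counted in the measurement.
forward!(forward_cache, model)
backprop!(backward_cache, forward_cache, model, loss)

# Both passes should report zero bytes allocated.
@show @allocated forward!(forward_cache, model)
@show @allocated backprop!(backward_cache, forward_cache, model, loss)
```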
When to Use SimpleNNs.jl
SimpleNNs.jl is ideal for:
- Small to medium networks where performance matters
- Embedded applications requiring minimal memory footprint
- Research applications needing fine control over memory and computation
- GPU-accelerated inference with minimal overhead
- Applications requiring many small models (e.g., ensemble methods)
Consider other frameworks like Flux.jl or Lux.jl for:
- Very large deep learning models
- Complex architectures requiring auto-differentiation
- Research requiring cutting-edge layer types
- Applications where development speed matters more than runtime performance
Installation
Add the package using Julia's package manager:
```julia
using Pkg
Pkg.add(url="https://github.com/JamieMair/SimpleNNs.jl")
```
For GPU support, also install the CUDA ecosystem:
```julia
Pkg.add(["CUDA", "cuDNN", "NNlib"])
```
Contributing
Contributions are welcome! Please see the GitHub repository for issue tracking and pull requests.
Acknowledgments
This package is inspired by and builds upon the excellent work in:
- `SimpleChains.jl` - The original high-performance "small" neural network package
- `NNlib.jl` - Provides the CUDA implementations for some of the supported layers
- `CUDA.jl` - Allows seamless GPU support in this package