visage-mojo
Foundational ML library written in Mojo from first principles
Overview
visage-mojo is a foundational machine learning library written entirely in Mojo, implementing neural networks and training algorithms from first principles. No PyTorch wrappers, no TensorFlow bindings—just pure, high-performance Mojo code.
The goal: Combine Python's expressiveness with C-level performance, while understanding every line of the stack.
⚠️ Early Development
This is a learning project and research experiment. Expect incomplete implementations, breaking changes, and plenty of exploration. Not production-ready—yet.
What's Implemented
- Basic neural network layers (Dense, Conv2D, etc.)
- Backpropagation from scratch
- Common activation functions (ReLU, Sigmoid, Softmax)
- Loss functions (MSE, CrossEntropy)
- SGD and Adam optimizers
- SIMD-optimized matrix operations
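For a flavor of what the SIMD-optimized operations look like, here is a minimal illustrative sketch in plain Mojo. It is not the library's actual kernel, and the name dot4 and the values are made up for this example:

# Illustrative sketch, not visage-mojo's real kernel: multiply four float32
# lanes with a single SIMD operation, then horizontally sum the products.
fn dot4(a: SIMD[DType.float32, 4], b: SIMD[DType.float32, 4]) -> Float32:
    return (a * b).reduce_add()

fn main():
    var a = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    var b = SIMD[DType.float32, 4](0.5, 0.5, 0.5, 0.5)
    print(dot4(a, b))  # 5.0

A full matrix kernel works over whole buffers rather than four fixed lanes, but the principle is the same: do the arithmetic lane-parallel, then reduce at the end.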
Why Mojo?
Mojo promises Python-like syntax with C/C++ performance. For machine learning, this could be game-changing:
- No Python-C++ boundary overhead
- Native SIMD support built into the language
- Compile-time optimizations without sacrificing readability
- Gradual performance tuning: start high-level, optimize later
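To make the compile-time point concrete, here is a hypothetical width-generic kernel; axpy and all values are invented for illustration and are not part of the visage API. The bracketed width is a compile-time parameter, so each instantiation is specialized and vectorized when the program is compiled rather than dispatched at runtime:

fn axpy[width: Int](a: Float32, x: SIMD[DType.float32, width],
                    y: SIMD[DType.float32, width]) -> SIMD[DType.float32, width]:
    # Computes a*x + y across `width` lanes at once; SIMD[...](a) splats the
    # scalar coefficient to every lane.
    return SIMD[DType.float32, width](a) * x + y

fn main():
    var x = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    var y = SIMD[DType.float32, 4](10.0, 10.0, 10.0, 10.0)
    print(axpy[4](2.0, x, y))  # [12.0, 14.0, 16.0, 18.0]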
visage-mojo explores what's possible when you don't compromise between expressiveness and speed.
Example
Here's what a simple neural network looks like in visage-mojo:
from visage import Sequential, Dense, ReLU, Adam
# Define model
model = Sequential(
    Dense(784, 128),
    ReLU(),
    Dense(128, 10)
)
# Train
optimizer = Adam(lr=0.001)
model.fit(X_train, y_train, optimizer, epochs=10)

Familiar syntax, but running at native speed with SIMD-optimized operations under the hood.
Performance Goals
visage-mojo aims to match or exceed PyTorch/TensorFlow performance for core operations:
- Matrix multiplication: Competitive with BLAS libraries
- Backpropagation: Minimal overhead compared to frameworks
- Training: Fast iteration times without JIT warm-up
- Inference: Single-digit millisecond latency on CPU
Early benchmarks show promising results, but there's a long way to go.
Roadmap
- Complete core layer implementations
- Add more optimizers (AdamW, RMSProp)
- Implement automatic differentiation engine
- GPU acceleration via Metal/CUDA
- Model serialization/loading
- Benchmarking suite vs PyTorch
Learning from This Project
visage-mojo is built for learning. The code is heavily commented, explaining not just what it does, but why and how. Perfect if you want to understand:
- How neural networks actually work (no black boxes)
- What backpropagation really does mathematically (see the sketch after this list)
- How to optimize matrix operations with SIMD
- Performance tradeoffs in ML framework design
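As a taste of the "no black boxes" approach, the sketch below (written for this explanation, not copied from the library) shows what backpropagation boils down to for a single neuron with a squared-error loss: apply the chain rule to each step of the forward pass, then nudge the weight against its gradient.

# Illustrative only: one neuron, squared-error loss, chain rule by hand.
# Forward:  y = relu(w*x + b),  loss = (y - target)^2
fn relu(z: Float32) -> Float32:
    if z > 0.0:
        return z
    return 0.0

fn main():
    var w: Float32 = 0.5
    var b: Float32 = 0.1
    var x: Float32 = 2.0
    var target: Float32 = 1.5
    var lr: Float32 = 0.1

    # Forward pass.
    var z = w * x + b                        # pre-activation
    var y = relu(z)                          # activation
    var loss = (y - target) * (y - target)   # squared error

    # Backward pass: chain rule, one local derivative at a time.
    var dloss_dy = 2.0 * (y - target)        # d(loss)/dy
    var dy_dz: Float32 = 0.0                 # ReLU derivative: 1 if z > 0 else 0
    if z > 0.0:
        dy_dz = 1.0
    var dz_dw = x                            # d(z)/dw
    var dloss_dw = dloss_dy * dy_dz * dz_dw  # chain rule product

    # One SGD step on the weight.
    w = w - lr * dloss_dw
    print(loss, w)

A Dense layer does the same thing with vectors and matrices instead of scalars, which is where the SIMD-optimized kernels come in.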
Contributing
This project is in early stages and welcomes contributors. Whether you're learning Mojo, exploring ML fundamentals, or just want to see what's possible—check out the repository.