visage-mojo
Foundational ML library written in Mojo from first principles
Overview
visage-mojo is a foundational machine learning library written entirely in Mojo, implementing neural networks and training algorithms from first principles. No PyTorch wrappers, no TensorFlow bindings—just pure, high-performance Mojo code.
The goal: Combine Python's expressiveness with C-level performance, while understanding every line of the stack.
⚠️ Early Development
This is a learning project and research experiment. Expect incomplete implementations, breaking changes, and plenty of exploration. Not production-ready—yet.
What's Implemented
- Basic neural network layers (Dense, Conv2D, etc.)
- Backpropagation from scratch
- Common activation functions (ReLU, Sigmoid, Softmax)
- Loss functions (MSE, CrossEntropy)
- SGD and Adam optimizers
- SIMD-optimized matrix operations
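For a flavor of what the SIMD-optimized operations look like, here is a minimal illustrative sketch in plain Mojo. It is not the library's actual kernel, and the name dot4 and the values are made up for this example:

# Illustrative sketch, not visage-mojo's real kernel: multiply four float32
# lanes with a single SIMD operation, then horizontally sum the products.
fn dot4(a: SIMD[DType.float32, 4], b: SIMD[DType.float32, 4]) -> Float32:
    return (a * b).reduce_add()

fn main():
    var a = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    var b = SIMD[DType.float32, 4](0.5, 0.5, 0.5, 0.5)
    print(dot4(a, b))  # 5.0

A full matrix kernel works over whole buffers rather than four fixed lanes, but the principle is the same: do the arithmetic lane-parallel, then reduce at the end.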
Why Mojo?
Mojo promises Python-like syntax with C/C++ performance. For machine learning, this could be game-changing:
- No Python-C++ boundary overhead
- Native SIMD support built into the language
- Compile-time optimizations without sacrificing readability
- Gradual performance tuning: start high-level, optimize later
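To make the compile-time point concrete, here is a hypothetical width-generic kernel; axpy and all values are invented for illustration and are not part of the visage API. The bracketed width is a compile-time parameter, so each instantiation is specialized and vectorized when the program is compiled rather than dispatched at runtime:

fn axpy[width: Int](a: Float32, x: SIMD[DType.float32, width],
                    y: SIMD[DType.float32, width]) -> SIMD[DType.float32, width]:
    # Computes a*x + y across `width` lanes at once; SIMD[...](a) splats the
    # scalar coefficient to every lane.
    return SIMD[DType.float32, width](a) * x + y

fn main():
    var x = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    var y = SIMD[DType.float32, 4](10.0, 10.0, 10.0, 10.0)
    print(axpy[4](2.0, x, y))  # [12.0, 14.0, 16.0, 18.0]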
visage-mojo explores what's possible when you don't compromise between expressiveness and speed.
Example
Here's what a simple neural network looks like in visage-mojo:
from visage import Sequential, Dense, ReLU, Adam
# Define model
model = Sequential(
    Dense(784, 128),
    ReLU(),
    Dense(128, 10)
)
# Train
optimizer = Adam(lr=0.001)
model.fit(X_train, y_train, optimizer, epochs=10)

Familiar syntax, but running at native speed with SIMD-optimized operations under the hood.
Performance Goals
visage-mojo aims to match or exceed PyTorch/TensorFlow performance for core operations:
- Matrix multiplication: Competitive with BLAS libraries
- Backpropagation: Minimal overhead compared to frameworks
- Training: Fast iteration times without JIT warm-up
- Inference: Single-digit millisecond latency on CPU
Early benchmarks show promising results, but there's a long way to go.
Roadmap
- Complete core layer implementations
- Add more optimizers (AdamW, RMSProp)
- Implement automatic differentiation engine
- GPU acceleration via Metal/CUDA
- Model serialization/loading
- Benchmarking suite vs PyTorch
Learning from This Project
visage-mojo is built for learning. The code is heavily commented, explaining not just what it does, but why and how. Perfect if you want to understand:
- How neural networks actually work (no black boxes)
- What backpropagation really does mathematically (see the sketch after this list)
- How to optimize matrix operations with SIMD
- Performance tradeoffs in ML framework design
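As a taste of the "no black boxes" approach, the sketch below (written for this explanation, not copied from the library) shows what backpropagation boils down to for a single neuron with a squared-error loss: apply the chain rule to each step of the forward pass, then nudge the weight against its gradient.

# Illustrative only: one neuron, squared-error loss, chain rule by hand.
# Forward:  y = relu(w*x + b),  loss = (y - target)^2
fn relu(z: Float32) -> Float32:
    if z > 0.0:
        return z
    return 0.0

fn main():
    var w: Float32 = 0.5
    var b: Float32 = 0.1
    var x: Float32 = 2.0
    var target: Float32 = 1.5
    var lr: Float32 = 0.1

    # Forward pass.
    var z = w * x + b                        # pre-activation
    var y = relu(z)                          # activation
    var loss = (y - target) * (y - target)   # squared error

    # Backward pass: chain rule, one local derivative at a time.
    var dloss_dy = 2.0 * (y - target)        # d(loss)/dy
    var dy_dz: Float32 = 0.0                 # ReLU derivative: 1 if z > 0 else 0
    if z > 0.0:
        dy_dz = 1.0
    var dz_dw = x                            # d(z)/dw
    var dloss_dw = dloss_dy * dy_dz * dz_dw  # chain rule product

    # One SGD step on the weight.
    w = w - lr * dloss_dw
    print(loss, w)

A Dense layer does the same thing with vectors and matrices instead of scalars, which is where the SIMD-optimized kernels come in.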
Contributing
This project is in early stages and welcomes contributors. Whether you're learning Mojo, exploring ML fundamentals, or just want to see what's possible—check out the repository.