Author: Chaim Rand

Optimizing highly parallel AI algorithm execution
11 min read

Capturing and reproducing failures in PyTorch training with Lightning
10 min read

Efficient Metric Collection in PyTorch: Avoiding the Performance Pitfalls of TorchMetrics
Machine Learning: Metric collection is an essential part of every machine learning project, enabling us to track…
13 min read

How PyTorch NestedTensors, FlashAttention2, and xFormers can Boost Performance and Reduce AI Costs
17 min read

Increasing Transformer Model Efficiency Through Attention Layer Optimization
Artificial Intelligence: How paying “better” attention can drive ML cost savings
16 min read

Accelerating AI/ML Model Training with Custom Operators – Part 4
14 min read

Tips for accelerating ML with AWS Neuron SDK
11 min read

Accelerating AI/ML Model Training with Custom Operators – Part 3.A
13 min read

Accelerating AI/ML Model Training with Custom Operators – Part 3
17 min read

Revisiting CPU for ML in an Era of GPU Scarcity
16 min read