🟢 Open to opportunities

Mykolas Perevicius

Full-Stack Engineer

I thrive at the intersection of backend rigor and frontend empathy, shipping production code that helps everyday people get more done with less friction.

Resume.doc

Microsoft Word 2003 • Double-click to open

class Engineer:
  def __init__(self):
    self.mode = "ship"
    self.focus = "impact"


Experience

Oct 2025 - Present

Software Engineer

UserAuthGuard by Asan Digital
  • Leading test-driven development initiative for production Django SaaS serving 17,000+ students across multiple school districts
  • Achieved 90% test coverage (from 51%) across 840+ Python files with comprehensive pytest-django behavioral test suites
  • Architected multi-tenant Chromebook management system with Google Workspace, Stripe, and Dell Warranty API integrations
  • Implemented real-time features using Django Channels (WebSockets) with Redis/Celery for async task processing
Django REST • pytest-django • PostgreSQL • Redis • Celery • Docker
Jun 2023 - Aug 2023

Software Engineer (Internship)

Bessemer Trust
  • Built securities analysis platform that reduced reconciliation time by 60% for 20+ wealth advisors
  • Optimized SQL Server queries and indexes, cutting report generation latency by 45%
  • Delivered AI Tech Talk to 100+ employees on practical ML applications
  • Developed and tested full-stack C#/.NET & React features for critical internal securities management platform
C# • .NET Core • React 18 • SQL Server • Redis
Dec 2022 - Present

Curriculum Developer & Instructor

The Coding Place
  • Created 117 Jupyter notebooks covering Python from basics to GPU programming
  • Achieved 90% student certification rate (PCEP/PCAP) across 120+ students
  • Designed and taught 30+ hours of comprehensive coursework covering Web Development, OOP, and Data Structures & Algorithms
  • Built automated testing pipeline with GitHub Actions for student code evaluation
Python • Jupyter • GitHub Actions • JavaScript • TypeScript
Dec 2021 - Present

Software Engineer

Project Innovate Newark
  • Developed and launched event-planning suite using React, Node.js, and PostgreSQL, reducing scheduling conflicts by 30% across 9+ programs
  • Implemented Docker for containerization, significantly reducing release cycles from days to hours
  • Established role-based access control and JWT authentication, achieving zero critical findings in penetration testing
  • Contributed to increasing data reporting efficiency by ~30% through modular UI elements and RESTful API integration
React • Redux • Node.js • PostgreSQL • Docker • AWS
Sep 2021 - Sep 2022

Research Intern

Bergen Community College
  • Prototyped embedded systems incorporating GUIs and machine vision capabilities for robotics applications
  • Engineered CNN-powered recycling-bin prototype attaining 95% material-detection accuracy
  • Built Arduino-based EV GUI that boosted converted pickup truck range by approximately 20%
  • Systematically collected, cleaned, processed, and analyzed experimental data using Python libraries
Python • Arduino • CNN • Computer Vision

Featured Projects

Internet Explorer - Projects Portfolio
https://perevici.us/projects/all

๐Ÿจ Koala's Forge

The free, powerful alternative to Ninite for 2025. Cross-platform system installer supporting 100+ applications. Born from losing access to all three machines in one weekend.

100+ Apps • Cross-platform
PowerShell • Bash • Python


✨ Key Features

Automated package management across Windows, Mac, and Linux
Silent installations with customizable configurations
Dependency resolution and conflict detection
Rollback support for failed installations

⚡ Distributed AlexNet

Custom CUDA kernels and MPI orchestration for AlexNet. Achieved 4.6× speedup over baseline PyTorch on DGX-1 cluster.

4.6× Faster • Multi-GPU
CUDA C++ • MPI • PyTorch

🚀 Ultimate System Setup

AI Lab + Complete Dev Environment automation. Hardware detection, system optimizations, and 100+ app installations.

100+ Tools • Auto-config
Shell • Docker • Linux

🎓 Education Playground

Interactive Python learning platform with 117 Jupyter notebooks covering basics to GPU programming. Live learning platform with 90% student pass rate.

117 Notebooks • 🚀 LIVE
Python • Jupyter • Education

🎵 Melody Matcher

Interactive music puzzle game built for GirlHacks 2024. Listen to song snippets, group them together, and find the right song!

Canvas Game • 🚀 LIVE
Paper.js • Howler.js • Game Dev

โ™ป๏ธ Smart Recycling Bin

CNN-powered recycling bin prototype with 95% material detection accuracy. Integrated Arduino-based sorting mechanism.

95% Accuracy • Real-time
TensorFlow • Arduino • Computer Vision

Technical Arsenal

Languages

Python
C/C++
Java
C#
JavaScript/TypeScript
Go
Rust

Frameworks

Django REST
Spring Boot
.NET Core
React 18
Node.js
Express

Infrastructure

Docker
Kubernetes
AWS (Lambda, ECS, RDS)
GitHub Actions
Jenkins
Linux

Specialized

CUDA Programming
MPI
PyTorch
TensorFlow
Machine Learning
Computer Vision

Education

New Jersey Institute of Technology

Sep 2021 - Dec 2025

B.S. Computer Science • Dean's List

Key Coursework: GPU Cluster Programming, Compiler Design, Machine Learning, Operating Systems, Advanced Data Structures & Algorithms, Database Systems Design, Programming Languages Concepts

Technical Writing

Achieving 4.6× Speedup: Custom CUDA Kernels for AlexNet

December 2024 • GPU Computing • CUDA

When PyTorch's default kernels aren't fast enough, you write your own. Here's how I optimized AlexNet training on an NVIDIA DGX-1 cluster using custom CUDA kernels and MPI orchestration, the architectural decisions that mattered most, and what I learned about memory coalescing the hard way.

⚡ 4.6× faster than baseline PyTorch
🔧 Custom memory management patterns
📊 Multi-GPU orchestration with MPI

Why I Built Yet Another System Installer (And Why You Might Need One Too)

November 2024 • DevOps • Automation

Losing access to three machines in one weekend taught me something: your development environment should be reproducible in under an hour. Koala's Forge is my answer to Ninite's $30/year subscription: a free, cross-platform installer supporting 100+ apps with silent installations, dependency resolution, and rollback support.

๐Ÿจ 100+ applications supported
๐Ÿ”„ Automatic dependency resolution
โช Rollback failed installations

The Hidden Cost of Convenience: When Abstractions Leak Performance

October 2024 • Performance • Systems

High-level frameworks are amazing, until they're not. A deep dive into when PyTorch's conveniences become bottlenecks, why dropping to CUDA gave me 4.6× speedup, and how to know when it's time to stop using abstractions and start writing assembly (or close to it). Plus: the mental model I use to decide when optimization is premature vs. necessary.

🎯 When to optimize (and when not to)
⚙️ Understanding abstraction overhead
🔍 Profiling strategies that actually work

Breaking Things to Understand Them: A Weekend with Distributed Consensus

September 2024 • Distributed Systems • Learning

I spent a weekend intentionally breaking Raft consensus to understand how it works. What happens when you introduce network partitions? What if followers lie about their log indices? Can you make the cluster elect two leaders? Turns out, theoretical correctness and practical resilience are two very different things.

🔨 Breaking Raft in creative ways
🧪 Network partition experiments
💡 What textbooks don't tell you

Teaching Python by Building Games: Why Education Playground Works

August 2024 • Education • Python

117 interactive Jupyter notebooks covering Python from basics to GPU programming. The secret? Every concept is taught through building something you can actually see and interact with. No "hello world" tutorials here; we're building games, visualizations, and tools from day one. Here's why learning by building beats learning by reading.

🎮 Learn by building games
📚 117 hands-on lessons
🚀 Basics to GPU programming

Achieving 4.6× Speedup: Custom CUDA Kernels for AlexNet

December 2024 • GPU Computing • CUDA • MPI

The Problem: PyTorch Was Too Slow

Training AlexNet on ImageNet should be fast: we're running on an NVIDIA DGX-1 with 8× V100 GPUs. But PyTorch's default kernels weren't cutting it. Profiling showed we were spending 40% of our time in convolution operations, and another 30% just moving data between GPUs.

That's when I decided: if the framework won't give me the performance I need, I'll write my own kernels.

Memory Coalescing: The 10× Difference

The first major optimization came from fixing memory access patterns. GPU memory is fast, but only if you access it correctly. Here's the problem with naive implementations:

// Simulating bad vs good memory access patterns
// Run this to see the performance difference!
console.log("=== Memory Access Pattern Demo ===\n");

// BAD: Non-coalesced access - writes every element, but jumps through
// memory in strides of 8 instead of sequentially
function badMemoryAccess(size) {
  const arr = new Float32Array(size);
  const stride = 8;
  const start = performance.now();
  for (let offset = 0; offset < stride; offset++) {
    for (let i = offset; i < size; i += stride) {
      arr[i] = i * 2;
    }
  }
  return performance.now() - start;
}

// GOOD: Coalesced access (sequential)
function goodMemoryAccess(size) {
  const arr = new Float32Array(size);
  const start = performance.now();
  for (let i = 0; i < size; i++) { // Sequential
    arr[i] = i * 2;
  }
  return performance.now() - start;
}

const size = 10000000;
const badTime = badMemoryAccess(size);
const goodTime = goodMemoryAccess(size);

console.log(`Bad (strided): ${badTime.toFixed(2)}ms`);
console.log(`Good (sequential): ${goodTime.toFixed(2)}ms`);
console.log(`Speedup: ${(badTime / goodTime).toFixed(2)}x\n`);
console.log("On a GPU, this difference is even more extreme!");
console.log("Coalesced memory = 10x faster in real CUDA code.");

Custom Convolution Kernel

The core of the speedup came from a custom convolution kernel that:

  • Uses shared memory to reduce global memory accesses by 80%
  • Coalesces memory reads for 10× bandwidth improvement
  • Optimizes thread block size for V100's SM architecture
  • Fuses operations to eliminate intermediate buffers

Here's a simplified version of the core convolution logic:

// Simplified 2D convolution (concept demo in JS)
// Real CUDA version is in C++ with __shared__ memory
function conv2D(input, kernel, inputSize, kernelSize) {
  const outputSize = inputSize - kernelSize + 1;
  const output = new Float32Array(outputSize * outputSize);

  console.log(`Input: ${inputSize}x${inputSize}`);
  console.log(`Kernel: ${kernelSize}x${kernelSize}`);
  console.log(`Output: ${outputSize}x${outputSize}\n`);

  // Convolution operation
  for (let oy = 0; oy < outputSize; oy++) {
    for (let ox = 0; ox < outputSize; ox++) {
      let sum = 0;
      // Apply kernel
      for (let ky = 0; ky < kernelSize; ky++) {
        for (let kx = 0; kx < kernelSize; kx++) {
          const ix = ox + kx;
          const iy = oy + ky;
          sum += input[iy * inputSize + ix] * kernel[ky * kernelSize + kx];
        }
      }
      output[oy * outputSize + ox] = sum;
    }
  }
  return output;
}

// Example: 5x5 input with 3x3 kernel
const input = new Float32Array([
  1, 2, 3, 4, 5,
  2, 3, 4, 5, 6,
  3, 4, 5, 6, 7,
  4, 5, 6, 7, 8,
  5, 6, 7, 8, 9
]);

const kernel = new Float32Array([
  1, 0, -1,
  1, 0, -1,
  1, 0, -1
]);

const result = conv2D(input, kernel, 5, 3);
console.log("Convolution result:");
for (let y = 0; y < 3; y++) {
  const row = [];
  for (let x = 0; x < 3; x++) {
    row.push(result[y * 3 + x].toFixed(1));
  }
  console.log(row.join(" "));
}
console.log("\nIn CUDA: This runs in parallel across 1000s of threads!");

Multi-GPU with MPI

Once a single GPU was optimized, scaling to 8 GPUs required careful orchestration with MPI (Message Passing Interface). The key challenges:

"The fastest code in the world is useless if you spend all your time waiting for data to transfer between GPUs."

My solution used:

  • Asynchronous communication - overlap compute with transfers
  • Ring-allreduce for gradient synchronization
  • NCCL for fast GPU-to-GPU communication
  • Pipeline parallelism to keep all GPUs busy
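The ring-allreduce step above can be sketched in miniature. Below is a toy, single-process JavaScript model of it; the real version runs over NCCL/MPI across physical GPUs, so the array-of-arrays "GPUs" here are purely illustrative. Each node holds a gradient vector, and after a reduce-scatter phase and an all-gather phase every node holds the element-wise sum, while each step only exchanges one chunk with its ring neighbor.

```javascript
// Toy single-process sketch of ring-allreduce (illustrative only).
// grads: one gradient array per "GPU"; length must be divisible by n.
function ringAllreduce(grads) {
  const n = grads.length;            // number of GPUs in the ring
  const chunk = grads[0].length / n; // each GPU "owns" one chunk
  const buf = grads.map(g => g.slice());
  const idx = c => ((c % n) + n) % n;

  // Reduce-scatter: n-1 steps. At step s, GPU i sends chunk (i - s)
  // to its right neighbor, which accumulates it into its copy.
  for (let s = 0; s < n - 1; s++) {
    const msgs = buf.map((b, i) => {
      const c = idx(i - s);
      return { to: idx(i + 1), c, data: b.slice(c * chunk, (c + 1) * chunk) };
    });
    for (const m of msgs) {
      for (let k = 0; k < chunk; k++) buf[m.to][m.c * chunk + k] += m.data[k];
    }
  }

  // All-gather: n-1 steps. Fully reduced chunks circulate around the
  // ring, overwriting stale copies, until every GPU has all of them.
  for (let s = 0; s < n - 1; s++) {
    const msgs = buf.map((b, i) => {
      const c = idx(i + 1 - s);
      return { to: idx(i + 1), c, data: b.slice(c * chunk, (c + 1) * chunk) };
    });
    for (const m of msgs) {
      for (let k = 0; k < chunk; k++) buf[m.to][m.c * chunk + k] = m.data[k];
    }
  }
  return buf;
}

// 3 "GPUs" with 3-element gradients: every GPU ends with [12, 15, 18]
const reduced = ringAllreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]);
console.log(reduced);
```

The appeal of the ring layout is that per-step traffic per link is constant regardless of cluster size, which is why it overlaps so well with compute.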

Results: 4.6× Faster

Final benchmarks on ImageNet training:

  • Baseline PyTorch: 3.2 hours/epoch
  • Custom CUDA: 0.7 hours/epoch
  • Speedup: 4.6×

The biggest lesson? Frameworks are great until they're not. When you need maximum performance, understanding what's happening at the hardware level and being willing to drop down to custom kernels makes all the difference.

Try It Yourself

Want to explore the full implementation? Check out the repository:

View Full Project on GitHub →

Why I Built Yet Another System Installer (And Why You Might Need One Too)

November 2024 • DevOps • Automation • PowerShell

The Weekend Everything Broke

Friday evening: Laptop dies. Saturday morning: Desktop BSOD. Sunday afternoon: My GPU workstation refused to boot. Three machines, one weekend, zero backups of my development environment.

I spent 18 hours reinstalling tools. VSCode. Python. Node. Docker. Postgres. Git. CUDA toolkit. The list went on. By hour 12, I was searching for "Ninite alternatives 2024" and discovering that the free version only supports 15 apps, while Ninite Pro costs $30/year. For something I might use twice a year.

That's when I decided: I'm building my own.

What Is Koala's Forge?

Koala's Forge is a free, open-source system installer that supports 100+ applications across Windows, macOS, and Linux. Think Ninite, but cross-platform, completely free, and designed for developers who need GPU drivers, compilers, and dev tools that "mainstream" installers ignore.

// Simplified version of the core installation logic
// Real implementation is in PowerShell/Bash with robust error handling
class PackageInstaller {
  constructor() {
    this.packages = new Map();
    this.dependencies = new Map();
  }

  // Register a package with its metadata
  addPackage(name, config) {
    this.packages.set(name, {
      name,
      url: config.url,
      installer: config.installer,
      depends: config.depends || [],
      postInstall: config.postInstall || null
    });

    // Build dependency graph
    (config.depends || []).forEach(dep => {
      if (!this.dependencies.has(dep)) {
        this.dependencies.set(dep, []);
      }
      this.dependencies.get(dep).push(name);
    });
  }

  // Topological sort for dependency resolution
  resolveDependencies(packageName) {
    const visited = new Set();
    const stack = [];

    const visit = (pkg) => {
      if (visited.has(pkg)) return;
      visited.add(pkg);

      const packageInfo = this.packages.get(pkg);
      if (!packageInfo) {
        throw new Error(`Package ${pkg} not found`);
      }

      // Visit dependencies first
      packageInfo.depends.forEach(dep => visit(dep));
      stack.push(pkg);
    };

    visit(packageName);
    return stack;
  }

  // Install packages in correct order
  async installPackages(packageNames) {
    const allPackages = new Set();

    // Collect all packages and dependencies
    packageNames.forEach(name => {
      const deps = this.resolveDependencies(name);
      deps.forEach(pkg => allPackages.add(pkg));
    });

    console.log(`Installing ${allPackages.size} packages (including dependencies)...\n`);

    // Install in dependency order
    for (const pkgName of allPackages) {
      const pkg = this.packages.get(pkgName);
      console.log(`[${Array.from(allPackages).indexOf(pkgName) + 1}/${allPackages.size}] Installing ${pkg.name}...`);

      // Simulate installation
      await this.simulateInstall(pkg);

      if (pkg.postInstall) {
        console.log(`  Running post-install script for ${pkg.name}`);
      }
    }

    console.log(`\n✓ All packages installed successfully!`);
  }

  async simulateInstall(pkg) {
    // In real implementation: download, verify checksum, run installer
    return new Promise(resolve => setTimeout(resolve, 100));
  }
}

// Example usage
const installer = new PackageInstaller();

// Register packages with dependencies
installer.addPackage('python', {
  url: 'https://python.org/downloads/latest',
  installer: 'python-installer.exe',
  postInstall: () => console.log('Adding Python to PATH')
});

installer.addPackage('pip', {
  url: 'https://bootstrap.pypa.io/get-pip.py',
  depends: ['python']
});

installer.addPackage('jupyter', {
  url: 'pypi://jupyter',
  depends: ['python', 'pip'],
  postInstall: () => console.log('Configuring Jupyter kernel')
});

installer.addPackage('vscode', {
  url: 'https://code.visualstudio.com/download',
  installer: 'vscode-installer.exe'
});

// Install Jupyter (automatically resolves Python + pip)
installer.installPackages(['jupyter', 'vscode']);

Key Features

What makes Koala's Forge different from existing solutions?

1. Dependency Resolution

Want to install Jupyter? The installer knows you need Python and pip first. It builds a dependency graph and installs everything in the correct order. No more "ERROR: Python not found" halfway through your setup.

2. Silent Installations

Every package supports silent installation flags. Start the script, go make coffee, come back to a fully configured system. No clicking "Next" 47 times.
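As a sketch of what "silent" means in practice, here is a hypothetical command builder. The package IDs and the mapping to a manager are illustrative, not Koala's Forge's actual tables; the flags themselves are each package manager's documented non-interactive flags (winget `--silent`, Chocolatey `-y`, apt-get `-y`; Homebrew installs non-interactively by default).

```javascript
// Hypothetical sketch: build a non-interactive install command per platform.
// Package names below are examples, not a real Koala's Forge catalog.
function silentInstallCommand(pkg, platform) {
  switch (platform) {
    case 'windows-winget':
      return `winget install --id ${pkg} --silent --accept-package-agreements`;
    case 'windows-choco':
      return `choco install ${pkg} -y`;
    case 'linux-apt':
      return `sudo apt-get install -y ${pkg}`;
    case 'macos-brew':
      return `brew install ${pkg}`; // brew prompts are rare; default is non-interactive
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}

console.log(silentInstallCommand('Git.Git', 'windows-winget'));
console.log(silentInstallCommand('docker', 'linux-apt'));
```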

3. Rollback Support

If an installation fails (corrupted download, disk full, dependency conflict), Koala's Forge can roll back to the previous state. Checkpoints are created before each package installation.

// Rollback mechanism (simplified)
class InstallationManager {
  constructor() {
    this.checkpoints = [];
    this.installedPackages = [];
  }

  createCheckpoint() {
    const checkpoint = {
      timestamp: new Date(),
      packages: [...this.installedPackages],
      state: 'saved'
    };
    this.checkpoints.push(checkpoint);
    console.log(`Checkpoint created: ${checkpoint.packages.length} packages`);
    return checkpoint;
  }

  async installWithRollback(packageName) {
    // Create checkpoint before installation
    const checkpoint = this.createCheckpoint();

    try {
      console.log(`Installing ${packageName}...`);

      // Simulate installation that might fail
      const success = Math.random() > 0.3; // 70% success rate for demo
      if (!success) {
        throw new Error('Installation failed: Download corrupted');
      }

      this.installedPackages.push(packageName);
      console.log(`✓ ${packageName} installed successfully`);
    } catch (error) {
      console.error(`✗ Installation failed: ${error.message}`);
      console.log('Rolling back to previous checkpoint...');

      // Rollback to checkpoint
      this.rollback(checkpoint);
    }
  }

  rollback(checkpoint) {
    const packagesToRemove = this.installedPackages.filter(
      pkg => !checkpoint.packages.includes(pkg)
    );
    packagesToRemove.forEach(pkg => {
      console.log(`  Removing ${pkg}...`);
    });
    this.installedPackages = [...checkpoint.packages];
    console.log(`✓ Rolled back to checkpoint (${checkpoint.packages.length} packages)`);
  }
}

// Demo: Try installing packages with potential failures
const manager = new InstallationManager();

async function demo() {
  await manager.installWithRollback('Git');
  await manager.installWithRollback('Docker');
  await manager.installWithRollback('Node.js');
  await manager.installWithRollback('VSCode');
  console.log(`\nFinal installed packages: ${manager.installedPackages.join(', ')}`);
}

demo();

Real-World Impact

Since releasing Koala's Forge:

  • Setup time: 18 hours → 45 minutes for a full dev environment
  • 100+ packages supported across Windows, macOS, Linux
  • Zero manual downloads - everything is scripted
  • Reproducible environments - same setup on every machine
"Your development environment should be code, not a 47-step manual process you hope you remember correctly."

Try It Yourself

Koala's Forge is completely free and open source. If you're tired of reinstalling your dev environment from scratch every time you get a new machine, give it a try:

View Project on GitHub →

The Hidden Cost of Convenience: When Abstractions Leak Performance

October 2024 • Performance • Systems • Optimization

The Performance Paradox

High-level frameworks are amazing. PyTorch lets you prototype a neural network in 10 lines of Python. Django abstracts away SQL injection nightmares. React handles DOM updates for you.

But here's the catch: every abstraction has a cost. Sometimes that cost is invisible. Sometimes it's 4.6× slower than it needs to be.

"Premature optimization is the root of all evil." โ€” Donald Knuth

Everyone quotes Knuth. Few people finish the quote: "Yet we should not pass up our opportunities in that critical 3%." The trick is knowing when you're in that 3%.

When Abstractions Break Down

Let me tell you about the time PyTorch cost me 18 hours of compute time per epoch.

I was training AlexNet on ImageNet using PyTorch's built-in convolution layers. Standard stuff. The code was clean, the API was beautiful, and the performance was... terrible.

Profiling showed 40% of time in convolution operations. Not unexpected. But when I dug deeper, I found PyTorch was using a generic convolution kernel that worked for any input size, any stride, any padding. Generality has a price.

// Generic vs Specialized: A Simple Example
// This demonstrates why specialized code beats generic code
console.log("=== Generic vs Specialized Performance ===\n");

// GENERIC: Works for any array, any operation
function genericMapReduce(arr, mapFn, reduceFn, initial) {
  return arr.map(mapFn).reduce(reduceFn, initial);
}

// SPECIALIZED: Only sums squares, but optimized
function sumOfSquares(arr) {
  let sum = 0;
  for (let i = 0; i < arr.length; i++) {
    const val = arr[i];
    sum += val * val; // Inlined, no function calls
  }
  return sum;
}

// Benchmark
const data = Array.from({length: 1000000}, (_, i) => i);

console.time('Generic approach');
const result1 = genericMapReduce(
  data,
  x => x * x,
  (acc, x) => acc + x,
  0
);
console.timeEnd('Generic approach');

console.time('Specialized approach');
const result2 = sumOfSquares(data);
console.timeEnd('Specialized approach');

console.log(`\nBoth compute the same result: ${result1 === result2}`);
console.log(`\nThe specialized version is faster because:`);
console.log(`  - No intermediate array allocation`);
console.log(`  - No function call overhead`);
console.log(`  - Better cache locality`);
console.log(`  - Compiler can optimize the simple loop`);

My Mental Model for Optimization

Over the years, I've developed a framework for deciding when to optimize:

Phase 1: Make It Work

Use the highest-level abstraction available. PyTorch, not CUDA. React, not vanilla DOM manipulation. Correctness first. You can't optimize code that doesn't work.

Phase 2: Measure Everything

Profile before optimizing. Not guessing, not hunches. Data.

  • cProfile for Python CPU profiling
  • nvidia-smi for GPU utilization
  • PyTorch's built-in profiler for kernel-level metrics
  • perf for low-level CPU counters

If your code spends 90% of time in function A and 10% in function B, optimizing B is wasted effort.

// Profiling: Where Does The Time Go?
class Profiler {
  constructor() {
    this.timings = new Map();
  }

  profile(name, fn) {
    const start = performance.now();
    const result = fn();
    const elapsed = performance.now() - start;

    if (!this.timings.has(name)) {
      this.timings.set(name, []);
    }
    this.timings.get(name).push(elapsed);
    return result;
  }

  report() {
    console.log("\n=== Profiling Report ===\n");
    const totals = new Map();
    let grandTotal = 0;

    for (const [name, times] of this.timings) {
      const total = times.reduce((a, b) => a + b, 0);
      totals.set(name, total);
      grandTotal += total;
    }

    // Sort by total time
    const sorted = [...totals.entries()].sort((a, b) => b[1] - a[1]);

    sorted.forEach(([name, total]) => {
      const percentage = ((total / grandTotal) * 100).toFixed(1);
      const avgTime = (total / this.timings.get(name).length).toFixed(2);
      const calls = this.timings.get(name).length;
      console.log(`${name}:`);
      console.log(`  Total: ${total.toFixed(2)}ms (${percentage}%)`);
      console.log(`  Calls: ${calls}`);
      console.log(`  Avg: ${avgTime}ms/call\n`);
    });

    console.log(`Grand Total: ${grandTotal.toFixed(2)}ms`);
  }
}

// Example: Profile a simulated application
const profiler = new Profiler();

function simulateDataProcessing() {
  // Simulate expensive database query
  profiler.profile('Database Query', () => {
    const arr = new Array(1000000);
    for (let i = 0; i < arr.length; i++) arr[i] = Math.random();
    return arr;
  });

  // Simulate data transformation
  profiler.profile('Data Transform', () => {
    return new Array(500000).fill(0).map(Math.random);
  });

  // Simulate rendering (fast)
  profiler.profile('Rendering', () => {
    return Math.sqrt(Math.random());
  });
}

// Run multiple times
for (let i = 0; i < 5; i++) {
  simulateDataProcessing();
}

profiler.report();
console.log("\n💡 Key Insight: Focus optimization on the top time consumers!");

Phase 3: Optimize The Hot Path

Once you know where the time goes, you have options:

  1. Algorithmic improvement - O(n²) to O(n log n) beats any micro-optimization
  2. Remove abstraction layers - Drop from framework to library to raw implementation
  3. Specialize - Trade generality for speed
  4. Parallel execution - Use multiple cores / GPUs

For AlexNet, I did #2 and #3: custom CUDA kernels specialized for the exact layer dimensions I was using. No generality tax. 4.6× speedup.
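To make option #1 concrete, here is a small benchmark of the same task ("does this array contain a duplicate?") at O(n²) and O(n). No micro-tuning of the nested loop will ever catch the hash-set version on large inputs:

```javascript
// Option 1 in practice: same task, two complexity classes.
// O(n^2): compare every pair
function hasDuplicateQuadratic(arr) {
  for (let i = 0; i < arr.length; i++) {
    for (let j = i + 1; j < arr.length; j++) {
      if (arr[i] === arr[j]) return true;
    }
  }
  return false;
}

// O(n): remember what we've seen in a hash set
function hasDuplicateLinear(arr) {
  const seen = new Set();
  for (const x of arr) {
    if (seen.has(x)) return true;
    seen.add(x);
  }
  return false;
}

// Worst case for both: no duplicates at all
const values = Array.from({ length: 20000 }, (_, i) => i);

console.time('O(n^2) nested loop');
hasDuplicateQuadratic(values);
console.timeEnd('O(n^2) nested loop');

console.time('O(n) hash set');
hasDuplicateLinear(values);
console.timeEnd('O(n) hash set');
```

The gap widens quadratically with input size, which is why the algorithmic fix comes before any of the lower-level options.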

When NOT to Optimize

Here's what I've learned about when to leave abstractions alone:

  • It's not the bottleneck - If profiling shows it's 2% of runtime, don't touch it
  • You're still prototyping - Code that might be deleted tomorrow doesn't need optimization
  • The abstraction prevents bugs - parameterized queries > raw string concatenation, always
  • Maintenance cost is high - Custom CUDA kernels are 10× harder to debug than PyTorch
"Make it work, make it right, make it fast โ€” in that order."

The Takeaway

Abstractions are incredible. They let us build faster, with fewer bugs, standing on the shoulders of giants. But when performance matters, you need to know:

  • Where your code spends time (profiling)
  • Why it's slow (understanding the abstraction cost)
  • When to drop down a level (is this the critical 3%?)

The best engineers aren't the ones who write the fastest code. They're the ones who know when fast enough is fast enough, and when to roll up their sleeves and write CUDA kernels.

Want to see the full CUDA optimization story? Check out my AlexNet project:

View Full CUDA Project →

Breaking Things to Understand Them: A Weekend with Distributed Consensus

September 2024 • Distributed Systems • Raft • Learning

The Best Way to Learn: Break It On Purpose

Reading the Raft paper is one thing. Understanding how distributed consensus actually works when networks partition, nodes lie, and Murphy's Law is in full effect? That requires getting your hands dirty.

So I spent a weekend intentionally breaking Raft to see what would happen.

"If you want to understand how something works, try to break it. If you want to master it, try to break it in creative ways."

Experiment 1: Can We Elect Two Leaders?

Raft's safety property guarantees at most one leader per term. But what if we try really hard to break it?

I modified a Raft implementation to introduce a "malicious node" that lies about its log index during elections. The node claims to have a longer log than it actually does, trying to win elections it shouldn't.

// Simplified Raft Election Simulation
// This demonstrates leader election with a malicious node
class RaftNode {
  constructor(id, logLength) {
    this.id = id;
    this.logLength = logLength;
    this.currentTerm = 0;
    this.votedFor = null;
    this.isMalicious = false;
  }

  // Request vote from this node
  requestVote(candidateId, candidateTerm, candidateLogLength) {
    // Update term if candidate has higher term
    if (candidateTerm > this.currentTerm) {
      this.currentTerm = candidateTerm;
      this.votedFor = null;
    }

    // Already voted in this term
    if (this.votedFor !== null && this.votedFor !== candidateId) {
      return false;
    }

    // Candidate's log must be at least as up-to-date
    if (candidateLogLength >= this.logLength) {
      this.votedFor = candidateId;
      return true;
    }
    return false;
  }

  // Try to become leader (malicious nodes lie about log length)
  runForLeader(nodes) {
    this.currentTerm++;
    this.votedFor = this.id;

    const reportedLogLength = this.isMalicious
      ? 9999            // LIE: claim to have huge log
      : this.logLength; // Tell the truth

    let votes = 1; // Vote for self
    nodes.forEach(node => {
      if (node.id !== this.id) {
        const granted = node.requestVote(this.id, this.currentTerm, reportedLogLength);
        if (granted) votes++;
      }
    });

    const majority = Math.floor(nodes.length / 2) + 1;
    return votes >= majority;
  }
}

// Create cluster: 5 nodes with different log lengths
const nodes = [
  new RaftNode('A', 10),
  new RaftNode('B', 8),
  new RaftNode('C', 12), // Most up-to-date
  new RaftNode('D', 7),
  new RaftNode('E', 5)   // Malicious with short log
];
nodes[4].isMalicious = true; // Node E lies about log

console.log("=== Raft Election with Malicious Node ===\n");

// Honest node C tries to become leader
console.log("Node C (log=12, honest) runs for leader:");
const cWins = nodes[2].runForLeader(nodes);
console.log(`  Result: ${cWins ? 'ELECTED' : 'FAILED'}`);

// Reset votes
nodes.forEach(n => { n.votedFor = null; n.currentTerm = 0; });

// Malicious node E tries to become leader
console.log("\nNode E (log=5, MALICIOUS claims log=9999) runs for leader:");
const eWins = nodes[4].runForLeader(nodes);
console.log(`  Result: ${eWins ? 'ELECTED (BAD!)' : 'FAILED'}`);

console.log("\n💡 A lie about log length can win the election itself,");
console.log("   but replication exposes it: followers reject entries that");
console.log("   don't match their own logs, forcing a new election.");
console.log("   But what if we had network partitions...?");

Result: Raft held up. The malicious node won elections because other nodes believed its lie, but as soon as it tried to replicate log entries, followers rejected them because the logs didn't match. The cluster detected the inconsistency and held another election.

Lesson learned: Raft's safety doesn't rely on nodes being honest. It relies on detecting inconsistencies.

Experiment 2: Network Partitions (Split Brain)

The classic distributed systems nightmare: what happens when the network splits the cluster in half?

I simulated a 5-node cluster with nodes [A, B, C, D, E]. Then introduced a network partition that split them into:

  • Partition 1: [A, B, C] (3 nodes, can form quorum)
  • Partition 2: [D, E] (2 nodes, cannot form quorum)
// Network Partition Simulation
class Cluster {
  constructor(nodeIds) {
    this.nodes = nodeIds.map(id => ({
      id,
      term: 0,
      leader: false,
      canCommunicate: new Set(nodeIds) // Initially all can talk
    }));
  }

  // Introduce network partition
  partition(group1, group2) {
    console.log(`\n🔪 NETWORK PARTITION:`);
    console.log(`  Group 1: [${group1.join(', ')}]`);
    console.log(`  Group 2: [${group2.join(', ')}]`);

    this.nodes.forEach(node => {
      if (group1.includes(node.id)) {
        node.canCommunicate = new Set(group1);
      } else {
        node.canCommunicate = new Set(group2);
      }
    });
  }

  // Try to elect leader in a group
  electLeader(candidateId) {
    const candidate = this.nodes.find(n => n.id === candidateId);
    const reachableNodes = this.nodes.filter(n =>
      candidate.canCommunicate.has(n.id)
    );

    candidate.term++;
    let votes = 1; // Self-vote

    reachableNodes.forEach(node => {
      if (node.id !== candidateId) {
        // Simple voting: grant if same partition
        if (node.canCommunicate.has(candidateId)) {
          votes++;
        }
      }
    });

    const totalClusterSize = this.nodes.length;
    const majority = Math.floor(totalClusterSize / 2) + 1;
    const elected = votes >= majority;

    console.log(`\n${candidateId} election attempt:`);
    console.log(`  Votes: ${votes}/${totalClusterSize}`);
    console.log(`  Majority needed: ${majority}`);
    console.log(`  Result: ${elected ? '✓ ELECTED' : '✗ FAILED'}`);

    if (elected) {
      candidate.leader = true;
    }
    return elected;
  }
}

// Create 5-node cluster
const cluster = new Cluster(['A', 'B', 'C', 'D', 'E']);

console.log("=== Distributed Consensus Under Partition ===");
console.log("\nInitial: All nodes can communicate");

// No partition: A wins
cluster.electLeader('A');

// Introduce partition
cluster.partition(['A', 'B', 'C'], ['D', 'E']);

// Partition 1 (3 nodes) can elect leader
cluster.electLeader('B');

// Partition 2 (2 nodes) CANNOT elect leader
cluster.electLeader('D');

console.log("\n💡 Key Insight: Majority quorum prevents split-brain!");
console.log("   Minority partition can't make progress = safety preserved");

Result: Partition 1 elected a new leader and continued operating. Partition 2 could not reach quorum and remained stuck in follower state, unable to elect a leader or accept writes.

Key insight: This is by design. Raft sacrifices availability in the minority partition to preserve consistency. Better to have no leader than two leaders.

Experiment 3: Log Conflicts

What happens when network partitions heal and nodes have conflicting logs?

I set up a scenario where:

  1. Cluster operates normally, leader is A
  2. Network partition splits [A, B] from [C, D, E]
  3. Partition [C, D, E] elects new leader C in term 2
  4. Both partitions accept writes to their logs
  5. Network partition heals

Now we have divergent logs. Nodes A and B have entries from term 1. Nodes C, D, E have entries from term 2.

"When partitions heal, Raft doesn't try to merge divergent histories. It picks one truth and makes everyone agree."

The node with the higher term number (C with term 2) becomes the authority. Nodes A and B discard their uncommitted entries from term 1 and sync with C's log.

This is brutal but correct. Raft guarantees linearizability: if a client got an acknowledgment, that write is durable. If they didn't get an ack, it might be lost. No maybes.
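The reconciliation step above can be sketched in a few lines. This is my own illustration of the core idea behind Raft's AppendEntries consistency check, not the exact mechanism: the follower walks forward until its log disagrees with the leader's, truncates everything from the first conflict, and adopts the leader's remaining entries. The `{ term, cmd }` entry shape is an assumption for the demo.

```javascript
// Sketch of log reconciliation after a partition heals (illustrative only).
// The leader's log is authoritative; the follower truncates its conflicting
// suffix and appends the leader's tail.
function reconcile(leaderLog, followerLog) {
  let i = 0;
  // Walk forward while both logs agree (same term at the same index)
  while (
    i < leaderLog.length &&
    i < followerLog.length &&
    leaderLog[i].term === followerLog[i].term
  ) {
    i++;
  }
  // Drop the follower's conflicting suffix, copy the leader's tail
  return followerLog.slice(0, i).concat(leaderLog.slice(i));
}

// A's log: one shared entry plus an uncommitted entry from term 1
const logA = [{ term: 1, cmd: 'x=1' }, { term: 1, cmd: 'x=2' }];
// C's log (new leader, term 2): the authoritative history
const logC = [{ term: 1, cmd: 'x=1' }, { term: 2, cmd: 'y=9' }];

console.log(reconcile(logC, logA));
// A keeps the shared prefix and adopts C's term-2 entry;
// its uncommitted term-1 entry 'x=2' is discarded.
```

This is exactly the "brutal but correct" behavior: the unacknowledged `x=2` write simply vanishes.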

What Textbooks Don't Tell You

After a weekend of breaking Raft, here's what I learned that wasn't obvious from the paper:

  • Raft is paranoid by design — It assumes networks are unreliable, nodes can crash at any moment, and messages can be delayed, duplicated, or lost (it does not, however, tolerate Byzantine nodes that actively lie)
  • Safety beats liveness — Raft would rather stop making progress than violate consistency
  • Term numbers are everything — They're the global logical clock that orders events across the cluster
  • Log matching is the key invariant — If two logs have the same entry at the same index, everything before that must match
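The log matching invariant is easy to check mechanically. Here's a small sketch of my own (not from the paper or the post) that verifies the property for a pair of logs, again assuming `{ term, cmd }` entries:

```javascript
// If two logs agree at some index, the Log Matching property says every
// earlier entry must be identical too. This helper checks that prefix.
function logsMatchUpTo(logX, logY, index) {
  for (let i = 0; i <= index; i++) {
    if (logX[i].term !== logY[i].term || logX[i].cmd !== logY[i].cmd) {
      return false;
    }
  }
  return true;
}

const log1 = [{ term: 1, cmd: 'a' }, { term: 1, cmd: 'b' }, { term: 2, cmd: 'c' }];
const log2 = [{ term: 1, cmd: 'a' }, { term: 1, cmd: 'b' }, { term: 2, cmd: 'c' }];

// Matching entry at index 2 implies indexes 0..1 match as well
console.log(logsMatchUpTo(log1, log2, 2)); // true
```

In a real implementation this invariant is never checked after the fact; it's maintained by construction, because a follower only accepts an entry when the preceding entry matches.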

Try Breaking It Yourself

The best way to truly understand distributed consensus is to implement it and break it. Some ideas to try:

  • What happens if messages are delayed by 10 seconds?
  • Can you create a livelock where no leader is ever elected?
  • What if a follower's disk fails and it loses its log?
  • Can a minority partition ever commit a write?
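For the livelock question, a good starting point is Raft's actual defense: randomized election timeouts. Here's a rough Monte Carlo sketch of my own (the timing constants are made up) showing why identical timeouts cause perpetual split votes while randomization breaks the tie:

```javascript
// Estimate how often multiple nodes become candidates simultaneously.
// With fixed timeouts every node fires at once, splitting the vote forever;
// randomized timeouts make a lone first candidate overwhelmingly likely.
function splitVoteProbability(nodeCount, randomized, trials = 10000) {
  let splits = 0;
  for (let t = 0; t < trials; t++) {
    // Each node picks an election timeout (values are illustrative)
    const timeouts = Array.from({ length: nodeCount }, () =>
      randomized ? 150 + Math.floor(Math.random() * 150) : 150
    );
    const first = Math.min(...timeouts);
    // Ties at the minimum mean simultaneous candidacies → split vote
    if (timeouts.filter(x => x === first).length > 1) splits++;
  }
  return splits / trials;
}

console.log('fixed timeouts:     ', splitVoteProbability(5, false)); // always 1
console.log('randomized timeouts:', splitVoteProbability(5, true));  // small
```

The simulation is deliberately crude (real elections involve RPC latency, not just timer ticks), but it captures why the paper insists on randomization.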

Distributed systems are hard because the failure modes are creative and surprising. The only way to build intuition is to see them fail, repeatedly, until the patterns become clear.

Teaching Python by Building Games: Why Education Playground Works

August 2024 Education Python Teaching

The Problem with Traditional Tutorials

Most programming tutorials start the same way:

print("Hello, World!")

Then they move to variables, then loops, then... students fall asleep. Why? Because there's no payoff. No visible result. No game, no animation, no "wow, I built that!"

Education Playground takes a different approach: build something cool from day one.

Learning By Building

The platform consists of 117 interactive Jupyter notebooks covering Python from absolute basics to GPU programming with CUDA. But unlike traditional courses, every concept is taught through building something you can see.

Lesson 1: Not "Hello World", but "Make a Game"

Instead of printing text, the first lesson builds a simple number guessing game:

// Python number guessing game (translated to JS for demo)
// Students see this in Lesson 1
function playGuessingGame() {
  const secret = Math.floor(Math.random() * 100) + 1;
  let attempts = 0;
  const maxAttempts = 7;

  console.log("🎮 Welcome to Guess the Number!");
  console.log(`I'm thinking of a number between 1 and 100`);
  console.log(`You have ${maxAttempts} attempts\n`);

  // Simulated guesses for the demo
  const guesses = [50, 75, 87, 93, 96, 94, 95];
  for (const guess of guesses) {
    attempts++;
    if (guess === secret) {
      console.log(`\n🎉 Correct! You found it in ${attempts} attempts!`);
      return; // stop once the number is found
    } else if (guess < secret) {
      console.log(`Attempt ${attempts}: ${guess} is too low! ⬆️`);
    } else {
      console.log(`Attempt ${attempts}: ${guess} is too high! ⬇️`);
    }
    if (attempts >= maxAttempts) {
      console.log(`\n😞 Out of attempts! The number was ${secret}`);
    }
  }
}

playGuessingGame();

console.log("\n💡 In Lesson 1, students learn:");
console.log("   - Variables (secret, attempts)");
console.log("   - Conditionals (if/else)");
console.log("   - Loops (while)");
console.log("   - Input/Output");
console.log("\nAll by building a playable game!");

By the end of Lesson 1, students have a working game. They've learned variables, conditionals, and loops without realizing they were learning syntax. They were too busy having fun.

Progressive Complexity: From Games to GPU Programming

The 117 notebooks follow a carefully designed progression:

Weeks 1-4: Fundamentals Through Games

  • Guess the Number (variables, loops, conditionals)
  • Hangman (strings, lists, functions)
  • Tic-Tac-Toe (2D arrays, game state)
  • Snake Game (classes, object-oriented programming)

Weeks 5-8: Data Structures Through Visualizations

  • Sorting Visualizer (algorithms, complexity)
  • Graph Explorer (trees, graphs, search algorithms)
  • Maze Generator (recursion, backtracking)
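The maze generator lesson leans on recursion and backtracking. As a taste of what students build, here's a minimal recursive-backtracking carve of my own (illustrative, not the actual notebook code):

```javascript
// Classic recursive-backtracker maze carve on a grid.
// 1 = wall, 0 = open passage. Works best with an odd grid size.
function carveMaze(size) {
  const grid = Array.from({ length: size }, () => Array(size).fill(1));

  function carve(r, c) {
    grid[r][c] = 0;
    // Visit neighbors two cells away, in random order
    const dirs = [[0, 2], [0, -2], [2, 0], [-2, 0]]
      .sort(() => Math.random() - 0.5);
    for (const [dr, dc] of dirs) {
      const nr = r + dr, nc = c + dc;
      if (nr >= 0 && nr < size && nc >= 0 && nc < size && grid[nr][nc] === 1) {
        grid[r + dr / 2][c + dc / 2] = 0; // knock down the wall between
        carve(nr, nc); // recurse; returning here is the "backtrack"
      }
    }
  }

  carve(0, 0);
  return grid;
}

const maze = carveMaze(9);
maze.forEach(row => console.log(row.map(x => (x ? '█' : ' ')).join('')));
```

Printing a different maze on every run is exactly the kind of instant visual payoff the curriculum is built around.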

Weeks 9-12: Real-World Applications

  • Web Scraper (requests, BeautifulSoup)
  • Data Analysis Dashboard (pandas, matplotlib)
  • Machine Learning Basics (scikit-learn)
  • GPU Acceleration (CUDA, parallel programming)
// Example: Teaching sorting through visualization
// Students see the algorithm AND the result
function bubbleSort(arr) {
  const n = arr.length;
  const steps = [];
  // Make a copy to sort
  const sorted = [...arr];

  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n - i - 1; j++) {
      // Capture state for visualization
      steps.push({
        array: [...sorted],
        comparing: [j, j + 1],
        action: 'compare'
      });
      if (sorted[j] > sorted[j + 1]) {
        // Swap
        [sorted[j], sorted[j + 1]] = [sorted[j + 1], sorted[j]];
        steps.push({
          array: [...sorted],
          swapped: [j, j + 1],
          action: 'swap'
        });
      }
    }
  }
  return steps;
}

// Visualize sorting
const unsorted = [64, 34, 25, 12, 22, 11, 90];
console.log("🎯 Teaching Bubble Sort Visually\n");
console.log(`Initial array: [${unsorted.join(', ')}]\n`);

const steps = bubbleSort(unsorted);

// Show key steps
const keySteps = [0, 5, 10, 15, steps.length - 1];
keySteps.forEach(i => {
  if (i < steps.length) {
    const step = steps[i];
    console.log(`Step ${i + 1}: [${step.array.join(', ')}]`);
    if (step.action === 'swap') {
      console.log(`  → Swapped positions ${step.swapped[0]} and ${step.swapped[1]}`);
    }
  }
});

console.log(`\n✓ Sorted: [${steps[steps.length - 1].array.join(', ')}]`);
console.log(`\nStudents learn:`);
console.log(`  - Nested loops`);
console.log(`  - Algorithm complexity (O(n²))`);
console.log(`  - By WATCHING it work, not just reading about it`);

Results: 90% Certification Pass Rate

After implementing this curriculum with 120+ students, the results speak for themselves:

  • 90% pass rate on PCEP (Certified Entry-Level Python Programmer) exams
  • 30+ hours of comprehensive coursework
  • 100+ students mentored through hands-on projects
  • 117 interactive notebooks covering basics to GPU programming

Why It Works: The Science of Learning

Education Playground's approach is backed by learning science research:

1. Active Learning Beats Passive Reading

Students don't just read about loops; they write loops that make characters move on screen. The brain encodes "doing" much more strongly than "reading about."

2. Immediate Feedback

Jupyter notebooks give instant visual feedback. Change the code, run the cell, see the result. No waiting, no context switching.

3. Progressive Complexity

Each lesson builds on the last. By week 12, students are writing GPU-accelerated code using concepts from week 1, without realizing how far they've come.

4. Intrinsic Motivation

Students aren't completing exercises for a grade. They're building games they can show their friends. That's powerful motivation.

"Tell me and I forget. Teach me and I remember. Involve me and I learn." โ€” Benjamin Franklin

Try It Yourself

Education Playground is completely free and open source. All 117 notebooks are available online:

Try Education Playground Live โ†’

Whether you're a teacher looking for curriculum, a student learning Python, or just curious how to make programming education more engaging, the platform is ready to use. No installation required.

Because learning to code shouldn't be boring. It should be building games, solving puzzles, and creating things that make you say "I can't believe I built that."

Let's Connect

I'm currently exploring opportunities where deep systems work meets impactful user experience. If your team is uniting robust infrastructure with intuitive applications, let's talk.

mykolas@perevici.us:~$ whoami
> Full-stack engineer shipping code that matters
mykolas@perevici.us:~$ ls projects/
> koalas-forge/ distributed-alexnet/ ultimate-setup/ education-playground/
mykolas@perevici.us:~$ cat philosophy.txt
> "Ship fast. Test everything. Impact users."
mykolas@perevici.us:~$ echo $SKILLS
> Python | Django | React | AWS | CUDA | Docker | PostgreSQL
mykolas@perevici.us:~$ cat contact.txt
> Email: Perevicius.Mykolas@gmail.com
> GitHub: github.com/mykolas-perevicius
> LinkedIn: linkedin.com/in/mykolasperevicius

Get In Touch

Have a project in mind or want to chat? Let's connect!