Training YOLO on a Laptop GPU Over Summer Break

June 2022. Board exams were behind me. Strato Foods still needed occasional fixes. And I had a laptop GPU, a Kaggle dataset, and the arrogant belief that I could reproduce a YOLO demo in a weekend.

It took the whole month. This is what that actually looked like for a teenager building through Strato Inc, not a research lab with labeled forensic data.

Why I Picked Object Detection

I had already played with image classifiers. Detection felt like the next level: boxes, classes, something you could point at on a screen and say “see, it works.”

I was not solving fraud. I was not building verification for anyone’s pipeline. I wanted a side project that might eventually ship as a silly demo app or teach me enough CV to stop copy-pasting Stack Overflow answers into Flutter plugins.

YOLOv5 was the default recommendation on ML Twitter that summer. Ultralytics made it easy to pretend the hard parts were solved.

The Hardware Reality

My laptop had a consumer NVIDIA GPU. Not datacenter. Not even particularly new.

What the tutorials skip:

The first time training ran overnight and Windows Update rebooted the machine, I considered switching to philosophy.

What Actually Worked

flowchart LR
    subgraph data [Data]
        K[Kaggle dataset]
        L[Label cleanup by hand]
    end

    subgraph train [Train]
        Y[YOLOv5 small]
        G[Laptop GPU]
    end

    subgraph ship [Ship]
        E[Export ONNX]
        T[Test on phone photos]
    end

    K --> L --> Y
    G --> Y
    Y --> E --> T

Start tiny. YOLOv5s, not YOLOv5x. I did not need SOTA. I needed a box around a cup that did not flicker every third frame.

Label hygiene matters more than architecture. Half my early failures were mislabeled training images I was too lazy to fix.
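The mechanical part of label cleanup is easy to automate. Below is a minimal sketch, not the script from that summer, that assumes labels are in the YOLO text format: one box per line as `class x_center y_center width height`, with coordinates normalized to the 0 to 1 range. The function name and `num_classes` parameter are illustrative.

```python
from typing import List


def check_yolo_label_lines(lines: List[str], num_classes: int) -> List[str]:
    """Return human-readable problems found in one YOLO-format label file.

    Assumes each non-blank line is: class x_center y_center width height,
    with the four coordinates normalized to [0, 1].
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # blank lines are harmless
        parts = text.split()
        if len(parts) != 5:
            problems.append(f"line {lineno}: expected 5 fields, got {len(parts)}")
            continue
        try:
            class_id = int(parts[0])
            x, y, w, h = (float(p) for p in parts[1:])
        except ValueError:
            problems.append(f"line {lineno}: could not parse fields as numbers")
            continue
        if not 0 <= class_id < num_classes:
            problems.append(f"line {lineno}: class id {class_id} out of range")
        if not all(0.0 <= v <= 1.0 for v in (x, y, w, h)):
            problems.append(f"line {lineno}: coordinates not normalized to [0, 1]")
        elif w == 0.0 or h == 0.0:
            problems.append(f"line {lineno}: zero-area box")
    return problems
```

A check like this only catches malformed lines. A box drawn around the wrong object, or tagged with the wrong class, still passes, so it complements looking at the images rather than replacing it.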

Export early. Getting a .pt file to run in Python is not the same as making it usable anywhere else. I learned that before I learned mAP.

Record your hyperparameters. Not because I was rigorous. Because I forgot what worked and re-ran bad configs twice.
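Recording runs does not need a tracking service. Here is a minimal sketch using only the standard library: it appends one JSON line per run and can tell you whether a config was already tried. The file layout, function names, and dictionary keys are assumptions for illustration, not anything YOLOv5 writes by itself.

```python
import json
import time
from pathlib import Path


def log_run(log_path, hyperparams, metrics, note=""):
    """Append one training run to a JSON Lines file and return the record."""
    record = {
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "hyperparams": hyperparams,
        "metrics": metrics,
        "note": note,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_runs(log_path):
    """Read every recorded run; a missing file means no runs yet."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def already_tried(log_path, hyperparams):
    """True if an identical hyperparameter dict was logged before."""
    return any(run["hyperparams"] == hyperparams for run in load_runs(log_path))
```

Calling `already_tried("runs.jsonl", config)` before starting a long job is the cheap guard against re-running a config that has already failed.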

What Failed

Real-time on phone. The exported model ran on my laptop's webcam feed at an acceptable FPS. On a mid-range Android phone through a naive pipeline, it did not. That gap between “demo” and “product” would follow me for years.
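The gap is easier to reason about when "acceptable FPS" is a measured number. The sketch below is a small timing harness under stated assumptions: `infer` is a hypothetical callable wrapping whatever runtime you use, preprocessing included, and `frames` is any sequence of inputs. It covers only the Python side; a phone needs the equivalent measurement in its own runtime.

```python
import time


def measure_fps(infer, frames, warmup=3):
    """Average frames per second of `infer` over `frames`, after a warmup.

    `infer` is a hypothetical callable taking one frame; time the whole
    pipeline (resize, normalize, model call), not just the model call.
    """
    frames = list(frames)
    if len(frames) <= warmup:
        raise ValueError("need more frames than warmup iterations")
    for frame in frames[:warmup]:
        infer(frame)  # warmup calls are not timed
    timed = frames[warmup:]
    start = time.perf_counter()
    for frame in timed:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed if elapsed > 0 else float("inf")
```

The warmup matters because the first few calls often include one-time setup cost that would drag the average down.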

Custom classes with too little data. I tried adding a class with forty images. The model confidently detected everything as that class. Classic overfit. I was not special; every beginner hits this wall.
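A quick count per class makes this kind of imbalance visible before any training time is spent. This is a minimal sketch, assuming YOLO-format label lines where the first field is the integer class id. The class names and the `min_instances` threshold are illustrative assumptions, not recommended values.

```python
from collections import Counter


def count_instances(label_files):
    """Count boxes per class id across label files.

    `label_files` is an iterable of files, each given as a list of lines.
    """
    counts = Counter()
    for lines in label_files:
        for raw in lines:
            parts = raw.split()
            if parts:  # skip blank lines
                counts[int(parts[0])] += 1
    return counts


def underrepresented(counts, class_names, min_instances):
    """Map class name -> box count for classes below `min_instances`."""
    return {
        name: counts.get(class_id, 0)
        for class_id, name in enumerate(class_names)
        if counts.get(class_id, 0) < min_instances
    }
```

This counts boxes rather than images, so a class that appears several times per image will look healthier here than its image count suggests.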

Expecting tutorial metrics to mean product quality. High mAP on a clean validation set does not mean robustness to bad lighting, motion blur, or a user who holds the phone at an angle that would make a cinematographer cry.

Connection to Strato Inc

None of this shipped as a Strato Foods feature. It was parallel learning — the same pattern as my GPT-3 Telegram experiments that month: build at night, learn something real, maybe reuse later.

That reuse would come in random places: better intuition for camera pipelines in Android apps, less trust in “99% accuracy” slides, patience when ML Kit did something dumb in production.

Takeaway

If you are a student with one GPU and infinite YouTube confidence: shrink the problem until it fits your hardware and your patience. Detection is not magic. It is labels, compute, and the humility to fix your dataset before you swap model architectures.

Summer break ended. I went back to classes, Strato Foods, and GPT-3 plumbing. The YOLO weights lived in a folder I would rediscover months later and delete to free space.

The lessons stuck longer than the checkpoints.
