Blog

Thoughts on infrastructure, AI, high-load systems, and building things that work.

Nemotron 8B Embedding Model on RTX 5090 / 4090 / 3090: An Honest Benchmark (My First Results Were 10x Wrong)

Real throughput numbers for the nvidia/llama-embed-nemotron-8b embedding model on RTX 5090, 4090, 3090, and a $60 P102-100 mining card. Why naive padding and torch.compile mistakes inflate TPS by 10x, how Nemotron 8B compares to bge-m3 and OpenAI text-embedding-3 for local LLM inference, and where the real bottleneck is.

llm embeddings embedding-model benchmark nvidia nemotron gpu rtx-5090 rtx-4090 rtx-3090 pytorch torch-compile local-llm rag

Best AI Coding Stack 2026: $31/mo Beats $200 Subscriptions (Claude Code, NotebookLM, Gemini CLI, GLM-4.7)

Stop paying $200/mo for Claude Max, ChatGPT Pro, Cursor, or GitHub Copilot. How to build a multi-agent AI coding stack for $31/mo using Claude Pro ($24) for architecture, GLM-4.7 + Kilo Code ($7) for bulk parsing, NotebookLM Pro for research, and Gemini CLI + Google Antigravity as a free fallback shield.

ai llm claude claude-code claude-pro gemini gemini-cli notebooklm glm kilo-code ai-coding-tools cursor github-copilot multi-agent developer-tools productivity

Oracle Cloud Free Tier 2026: 4-Core / 24 GB ARM Ampere A1 VPS Benchmarked vs AWS, Hetzner, DigitalOcean

The best free VPS in 2026 is hiding in the Oracle Cloud Infrastructure (OCI) Always Free Tier: 4 OCPU ARM Ampere A1, 24 GB RAM, 200 GB block storage, 10 TB monthly egress. Real sysbench, dd, and speedtest numbers vs the AMD micro instance, plus the Pay-As-You-Go trick that bypasses out-of-capacity errors and idle reclamation. 5-year savings vs AWS t4g.medium: $4,000+.

oracle-cloud oci free-tier free-vps arm ampere-a1 cloud devops benchmark aws-alternatives hetzner vps infrastructure sysadmin cloud-native cost-optimization indie-hacker

Lightning AI Studio Review: A Free Google Colab Alternative With NVIDIA T4, 30 GB RAM, and Real SSH (75 Hours of Cloud GPU)

Free cloud GPU for ML and LLM work without a discrete graphics card on your laptop. 15 starting credits on Lightning AI Studio buy ~75 hours of NVIDIA T4 (16 GB VRAM) with 30 GB system RAM and fast NVMe. The killer feature over Google Colab, Kaggle, Paperspace Gradient, RunPod, and Modal: full native SSH access, so VS Code, PyCharm, and the Claude CLI all run on remote hardware as if it lived inside your laptop.

ml llm cloud-gpu lightning-ai lightning-ai-studio nvidia-t4 free-tier free-gpu google-colab-alternative colab-alternative kaggle paperspace runpod modal ssh vscode-remote pytorch mlops ai-development remote-development