❤️ Donate

Brainy is a tiny, open-source research sidekick that lives at askbrainy.com and in Telegram — built with free tools, a shoestring budget, and a lot of love. It currently runs on a hilariously old Mac mini A1347 (2012, MD387D/A) I rescued off eBay for €56 (plus delivery). That old champ keeps the lights on, but it can't run Brainy's language models locally.

It has 16 GB RAM (nice) and an SSD, but the Intel Core i5-3210M and Intel HD Graphics 4000 mean local LLMs are… let’s say “historical reenactments.” So Brainy leans on Together AI for inference. That works, but:

  • Context is tight on the free endpoints (8,193 tokens, input and output combined), so complex multi-document research and long chat histories hit hard limits (see the trimming sketch after this list).
  • RPM/TPM caps kick in quickly as more people use Brainy.
  • Free model pools get congested and sometimes refuse to serve.
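
To stay under that combined budget, Brainy has to trim old chat turns before every call. Here's a minimal sketch of one way to do it, assuming a rough 4-characters-per-token estimate; `trim_history` and the constants are illustrative, not Brainy's actual code:

```python
# Hypothetical sketch: keep chat history inside Together's combined
# input+output budget. The 4-chars-per-token ratio is a rough heuristic,
# not an exact tokenizer; trim_history is illustrative, not Brainy's code.

BUDGET = 8193           # free-endpoint limit (input + output combined)
RESERVED_OUTPUT = 1024  # leave room for the model's reply

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def trim_history(messages: list[dict]) -> list[dict]:
    """Drop the oldest turns until the prompt fits the input budget.
    Keeps the first message (system prompt) and as many recent turns as fit."""
    input_budget = BUDGET - RESERVED_OUTPUT
    system, rest = messages[0], messages[1:]
    used = estimate_tokens(system["content"])
    kept: list[dict] = []
    for msg in reversed(rest):  # newest-to-oldest, so recent context survives
        cost = estimate_tokens(msg["content"])
        if used + cost > input_budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))
```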

Also: the old Mini still draws power 24/7 (≈ €10/month electricity). Your donation makes an immediate, measurable difference.

Brainy will remain free and open-source forever. Donations prevent paywalls and keep the code open.


🎯 Funding goals

1) Goal 1 — $50 (micro)
Top up Together AI for higher throughput. Target: Build Tier 2, which cuts “429 busy” errors and queue time and lets Brainy serve more users concurrently.

2) Goal 2 — $750 (macro)
Buy a Mac mini (M4, 10-core CPU / 10-core GPU, 16 GB unified) — for example: www.computeruniverse.net
Why? It’s shockingly efficient and fast enough to run ~14B models locally (quantized), offloading a lot of traffic from Together while keeping Together for bigger contexts and specialty models. Apple lists 4 W idle / 65 W max for the base M4 Mini; the old 2012 Mini peaks at 85 W.
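
For a feel of how that local/remote split could work, here's a hypothetical routing sketch; `route()`, `LOCAL_CTX_LIMIT`, and the model names are placeholders, not Brainy's actual code:

```python
# Hypothetical routing sketch: answer small-context requests on the local M4,
# fall back to Together for long contexts or models the Mini can't hold.
# route(), LOCAL_CTX_LIMIT, and the model names are illustrative placeholders.

LOCAL_CTX_LIMIT = 8192             # what a quantized 14B handles comfortably
LOCAL_MODELS = {"qwen2.5-14b-q4"}  # small enough for 16 GB unified memory

def route(prompt_tokens: int, model: str) -> str:
    """Return which backend should serve this request."""
    if model in LOCAL_MODELS and prompt_tokens <= LOCAL_CTX_LIMIT:
        return "local"      # M4 Mini: no API cost, no shared rate limits
    return "together"       # bigger contexts and 70B-class specialty models
```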


⏳ Live progress

Target: $750

$0   [>-----------------------] 0%

Ads/sponsorships/collabs will also count toward the goal.
✉️ Contact: [email protected]


🧠 Why the M4 upgrade

1) Model throughput: old Mini vs. M4 Mini vs. RTX 3060 (14B, quantized)

Let’s use a popular 14B family (e.g., Qwen 2.5 14B in Q4/Q5 quantization) and llama.cpp/MLX-style local inference as the yardstick.

| Machine | Stack | Model / Quant | Tokens/s (TG) | Notes |
|---|---|---|---|---|
| Old Mac mini (2012), i5-3210M | llama.cpp (CPU-only) | Q4 | ~0.5–1.5 t/s (estimated) | CPU-only community reports for 13–34B land at ~1.5–4 t/s on much newer multi-channel CPUs; an older dual-core Ivy Bridge is substantially slower. Order-of-magnitude only. |
| Mac mini (M4, 16 GB, 10-core GPU) | Metal/MLX | Q4/Q5 | ~15–20 t/s (estimated) | Community M4 Pro (64 GB) reports 30–35 t/s for Qwen 2.5 14B (MLX + speculative decoding). The base M4 (fewer GPU cores, less memory headroom) should land lower; a conservative estimate is shown. |
| PC w/ RTX 3060 (12 GB) | llama.cpp (CUDA) | Qwen2 14B Q5_K_M | 28.9 t/s (measured) | An example benchmark shows 28.88 t/s with a ~9.8 GiB model file, which fits easily in 12 GB VRAM. |

Takeaway: the M4 Mini should be roughly an order of magnitude faster than the 2012 Mini and competitive with a 3060-class PC for 14B-ish INT4/INT5 workloads—while using a fraction of the power and heat budget.

Why this is realistic (a minimal measurement sketch follows the list):
- llama.cpp has a first-class Metal backend; Apple Silicon is explicitly optimized. Most local LLM pipelines on macOS (Ollama, MLX) run compute on GPU via Metal.
- Measured data points exist for M-series (e.g., M3 Max): LLaMA-3-70B Q4_K_M text-gen ≈ 7.5 t/s, showing how far Metal has come; 14B is far lighter than 70B.
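
If you want to reproduce such numbers yourself, here's a rough way to measure text-generation speed with llama-cpp-python; the model path is a placeholder, and the timing includes prompt eval, so it slightly understates pure generation speed:

```python
# Rough tokens/s measurement with llama-cpp-python. The model path is a
# placeholder; n_gpu_layers=-1 offloads all layers to the GPU backend
# (Metal on Apple Silicon, CUDA on an RTX 3060).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen2.5-14b-instruct-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,
    n_ctx=4096,
    verbose=False,
)

start = time.perf_counter()
out = llm("Explain speculative decoding in two sentences.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
# Note: elapsed includes prompt eval, so this slightly understates pure TG speed.
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} t/s")
```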


2) Energy & thermals (tokens per watt)

Apple publishes official power numbers:

  • Mac mini (M4, base): 4 W idle / 65 W max
  • Mac mini (Late 2012): up to 85 W max

And for the PC baseline:

  • RTX 3060 TGP ≈ 170 W (GPU alone; system draw is higher).

Very rough efficiency math (generation phase, not prompt eval; a quick sanity check in code follows the list):

  • Old Mini (2012, CPU-only): ~1 t/s ÷ 85 W ≈ 0.012 t/s/W (pain).
  • M4 Mini: ~18 t/s ÷ 65 W ≈ 0.28 t/s/W (quiet + cool).
  • RTX 3060 PC (GPU only): 28.9 t/s ÷ 170 W ≈ 0.17 t/s/W (doesn’t include CPU/system overhead).
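
The same arithmetic as a quick script, with the figures copied from the list above:

```python
# Tokens-per-watt sanity check, numbers copied from the list above.
machines = {
    "2012 Mini (CPU-only)": (1.0, 85),    # ~1 t/s at 85 W max
    "M4 Mini":              (18.0, 65),   # ~18 t/s at 65 W max
    "RTX 3060 (GPU only)":  (28.9, 170),  # measured t/s vs. ~170 W TGP
}
for name, (tps, watts) in machines.items():
    print(f"{name}: {tps / watts:.3f} t/s/W")
# -> 0.012, 0.277, 0.170 t/s/W respectively
```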

Bottom line: The M4 Mini trades a small performance delta vs. a 3060 for ~1.6× better tokens/W (GPU-only) and dramatically lower whole-system draw. For a 24/7 community tool, that’s greener and cheaper to run.


🔧 Together AI: what’s great, what hurts

I love Together’s model buffet and pricing, but Brainy’s usage runs into tier limits and per-model caps fast. Some free endpoints enforce stricter, per-model caps (e.g., 70B-class “free” endpoints), and congestion can throttle you below your nominal tier—meaning you see 429s even when you think you’re under the limit.

Real-world example I see frequently:

together.error.RateLimitError: Error code: 429
{"message":"You have reached the rate limit specific to this model meta-llama/Llama-3.3-70B-Instruct-Turbo-Free.
The maximum rate limit for this model is 6.0 queries and 180000 tokens per minute."}

This happens even if Brainy isn’t hammering at 6 RPM—because the per-model pool for the free 70B is tiny and often saturated. (Topping up to Build Tier 2 gives Brainy a much less crowded lane.)
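
When a 429 does land, the standard mitigation is retrying with exponential backoff. A minimal sketch using the Together Python SDK; the model name and retry settings are illustrative:

```python
# Minimal retry-with-backoff sketch around the Together Python SDK. The model
# name and retry parameters are illustrative, not Brainy's production settings.
import time
from together import Together
from together.error import RateLimitError

client = Together()  # reads TOGETHER_API_KEY from the environment

def chat_with_backoff(messages,
                      model="meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
                      max_retries=5):
    delay = 2.0
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise           # give up after the last attempt
            time.sleep(delay)   # wait out the congested per-model pool
            delay *= 2          # exponential backoff: 2 s, 4 s, 8 s, ...
```

Backoff smooths over transient congestion, but it can't raise the pool's ceiling; that's what the Build Tier 2 top-up is for.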


📦 What donations unlock for Brainy

  • $50 (Build Tier 2 top-up):
      • Jump to 60 RPM for free models → dramatically fewer 429s and bigger concurrent-user headroom.
  • $750 (M4 Mini):
      • Local 14B inference at usable speeds (see the table above), cutting API calls dramatically.
      • Way better energy profile than GPU PC builds, and silent enough to run next to my coffee.
      • Still keep Together for 70B+ needs.

📦 What you’ll unlock for Everyone

  • More context memory for longer, more thoughtful answers.
  • Higher reliability during traffic spikes (fewer rate-limit hiccups).
  • Faster iteration on new and existing features.

💸 How to donate (crypto only for now)

  • USDT (TRC-20): TK5uyyAbuchtBS4hwWwtQA4G15MA54RDkG
  • USDT (BSC): 0x942891F9a02632d67C496305c9746ACedfC0eb2D
  • USDT (SOL): 5yzcNUo8r7goHZMzwF9hPS8MVqXevwuyT4S8hhyHQVqK

If you’d like to run an ad, sponsor, or collaborate—those count as donations too.
✉️ [email protected]


📝 About Brainy

  • Brainy is free, open-source, and built on free-tier tooling wherever possible. Donations are voluntary and go 100% into compute credits & hardware for Brainy.
  • I will keep this page updated with live progress and receipts (screenshots/links) as milestones are hit.
  • If you want to earmark funds (e.g., “Only for Build Tier 2 top-ups”), add a memo in your email and I’ll honor it.
  • Donations are typically not tax-deductible and are generally non-refundable. See the Terms of Service on the site.

Thanks for reading this far ❤️ If you can chip in—even a few USDT—it directly translates into fewer 429s, faster responses, and a greener Brainy that we can all use.