❤️ Donate

Brainy is a tiny, open-source research sidekick that lives at askbrainy.com and in Telegram — built with free tools, a shoestring budget, and a lot of love. It currently runs on a hilariously old Mac mini A1347 (2012, MD387D/A) I rescued off eBay for €56 (plus delivery). That old champ keeps the lights on, but it can't run Brainy's language models locally.

It has 16 GB RAM (nice) and an SSD, but the Intel Core i5-3210M and Intel HD Graphics 4000 mean local LLMs are… let’s say “historical reenactments.” So Brainy leans on Together AI for inference. That works, but:

  • Context is tight on the free endpoints (8,193 tokens, input and output combined), so complex multi-document research and long chat histories hit hard limits (see the trimming sketch after this list).
  • RPM/TPM caps kick in quickly as more people use Brainy.
  • Free model pools get congested and sometimes refuse to serve.
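
To stay under that combined budget, Brainy has to trim old chat turns before every call. Here's a minimal sketch of one way to do it, assuming a rough 4-characters-per-token estimate; `trim_history` and the constants are illustrative, not Brainy's actual code:

```python
# Hypothetical sketch: keep chat history inside Together's combined
# input+output budget. The 4-chars-per-token ratio is a rough heuristic,
# not an exact tokenizer; trim_history is illustrative, not Brainy's code.

BUDGET = 8193           # free-endpoint limit (input + output combined)
RESERVED_OUTPUT = 1024  # leave room for the model's reply

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def trim_history(messages: list[dict]) -> list[dict]:
    """Drop the oldest turns until the prompt fits the input budget.
    Keeps the first message (system prompt) and as many recent turns as fit."""
    input_budget = BUDGET - RESERVED_OUTPUT
    system, rest = messages[0], messages[1:]
    used = estimate_tokens(system["content"])
    kept: list[dict] = []
    for msg in reversed(rest):  # newest-to-oldest, so recent context survives
        cost = estimate_tokens(msg["content"])
        if used + cost > input_budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))
```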

Also: the old Mini still draws power 24/7 (≈ €10/month electricity). Your donation makes an immediate, measurable difference.

Brainy will remain free and open-source forever. Donations prevent paywalls and keep the code open.


🎯 Funding goals

1) Goal 1 — $50 (micro)
Top up Together AI for higher throughput. Target: Build Tier 2, which cuts “429 busy” errors and queue time and lets Brainy serve more users concurrently.

2) Goal 2 — $750 (macro)
Buy a Mac mini (M4, 10-core CPU / 10-core GPU, 16 GB unified) — for example: www.computeruniverse.net
Why? It’s shockingly efficient and fast enough to run ~14B models locally (quantized), offloading a lot of traffic from Together while keeping Together for bigger contexts and specialty models. Apple lists 4 W idle / 65 W max for the base M4 Mini; the old 2012 Mini peaks at 85 W.
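
For a feel of how that local/remote split could work, here's a hypothetical routing sketch; `route()`, `LOCAL_CTX_LIMIT`, and the model names are placeholders, not Brainy's actual code:

```python
# Hypothetical routing sketch: answer small-context requests on the local M4,
# fall back to Together for long contexts or models the Mini can't hold.
# route(), LOCAL_CTX_LIMIT, and the model names are illustrative placeholders.

LOCAL_CTX_LIMIT = 8192             # what a quantized 14B handles comfortably
LOCAL_MODELS = {"qwen2.5-14b-q4"}  # small enough for 16 GB unified memory

def route(prompt_tokens: int, model: str) -> str:
    """Return which backend should serve this request."""
    if model in LOCAL_MODELS and prompt_tokens <= LOCAL_CTX_LIMIT:
        return "local"      # M4 Mini: no API cost, no shared rate limits
    return "together"       # bigger contexts and 70B-class specialty models
```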


⏳ Live progress

Target: $750

$0   [>-----------------------] 0%

Ads/sponsorships/collabs will also count toward the goal.
✉️ Contact: [email protected]


🧠 Why the M4 upgrade

1) Model throughput: old Mini vs. M4 Mini vs. RTX 3060 (14B, quantized)

Let’s use a popular 14B family (e.g., Qwen 2.5 14B in Q4/Q5 quantization) and llama.cpp/MLX-style local inference as the yardstick.

| Machine | Stack | Model / Quant | Tokens/s (TG) | Notes |
|---|---|---|---|---|
| Old Mac mini (2012), i5-3210M | llama.cpp (CPU-only) | Q4 | ~0.5–1.5 t/s (estimated) | CPU-only community reports for 13–34B land at ~1.5–4 t/s on much newer multi-channel CPUs; an older dual-core Ivy Bridge is substantially slower. Order-of-magnitude only. |
| Mac mini (M4, 16 GB, 10-core GPU) | Metal/MLX | Q4/Q5 | ~15–20 t/s (estimated) | Community M4 Pro (64 GB) reports 30–35 t/s for Qwen 2.5 14B (MLX + speculative decoding). The base M4 (fewer GPU cores, less memory headroom) should land lower; a conservative estimate is shown. |
| PC w/ RTX 3060 (12 GB) | llama.cpp (CUDA) | Qwen2 14B Q5_K_M | 28.9 t/s (measured) | An example benchmark shows 28.88 t/s with a ~9.8 GiB model file, which fits easily in 12 GB VRAM. |

Takeaway: the M4 Mini should be roughly an order of magnitude faster than the 2012 Mini and competitive with a 3060-class PC for 14B-ish INT4/INT5 workloads—while using a fraction of the power and heat budget.

Why this is realistic (a minimal measurement sketch follows the list):
- llama.cpp has a first-class Metal backend; Apple Silicon is explicitly optimized. Most local LLM pipelines on macOS (Ollama, MLX) run compute on GPU via Metal.
- Measured data points exist for M-series (e.g., M3 Max): LLaMA-3-70B Q4_K_M text-gen ≈ 7.5 t/s, showing how far Metal has come; 14B is far lighter than 70B.
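
If you want to reproduce such numbers yourself, here's a rough way to measure text-generation speed with llama-cpp-python; the model path is a placeholder, and the timing includes prompt eval, so it slightly understates pure generation speed:

```python
# Rough tokens/s measurement with llama-cpp-python. The model path is a
# placeholder; n_gpu_layers=-1 offloads all layers to the GPU backend
# (Metal on Apple Silicon, CUDA on an RTX 3060).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen2.5-14b-instruct-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,
    n_ctx=4096,
    verbose=False,
)

start = time.perf_counter()
out = llm("Explain speculative decoding in two sentences.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
# Note: elapsed includes prompt eval, so this slightly understates pure TG speed.
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} t/s")
```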


2) Energy & thermals (tokens per watt)

Apple publishes official power numbers:

  • Mac mini (M4, base): 4 W idle / 65 W max
  • Mac mini (Late 2012): up to 85 W max

And for the PC baseline:

  • RTX 3060 TGP ≈ 170 W (GPU alone; system draw is higher).

Very rough efficiency math (generation phase, not prompt eval; a quick sanity check in code follows the list):

  • Old Mini (2012, CPU-only): ~1 t/s ÷ 85 W ≈ 0.012 t/s/W (pain).
  • M4 Mini: ~18 t/s ÷ 65 W ≈ 0.28 t/s/W (quiet + cool).
  • RTX 3060 PC (GPU only): 28.9 t/s ÷ 170 W ≈ 0.17 t/s/W (doesn’t include CPU/system overhead).
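
The same arithmetic as a quick script, with the figures copied from the list above:

```python
# Tokens-per-watt sanity check, numbers copied from the list above.
machines = {
    "2012 Mini (CPU-only)": (1.0, 85),    # ~1 t/s at 85 W max
    "M4 Mini":              (18.0, 65),   # ~18 t/s at 65 W max
    "RTX 3060 (GPU only)":  (28.9, 170),  # measured t/s vs. ~170 W TGP
}
for name, (tps, watts) in machines.items():
    print(f"{name}: {tps / watts:.3f} t/s/W")
# -> 0.012, 0.277, 0.170 t/s/W respectively
```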

Bottom line: The M4 Mini trades a small performance delta vs. a 3060 for ~1.6× better tokens/W (GPU-only) and dramatically lower whole-system draw. For a 24/7 community tool, that’s greener and cheaper to run.


🔧 Together AI: what’s great, what hurts

I love Together’s model buffet and pricing, but Brainy’s usage runs into tier limits and per-model caps fast. Some free endpoints enforce stricter, per-model caps (e.g., 70B-class “free” endpoints), and congestion can throttle you below your nominal tier—meaning you see 429s even when you think you’re under the limit.

Real-world example I see frequently:

together.error.RateLimitError: Error code: 429
{"message":"You have reached the rate limit specific to this model meta-llama/Llama-3.3-70B-Instruct-Turbo-Free.
The maximum rate limit for this model is 6.0 queries and 180000 tokens per minute."}

This happens even if Brainy isn’t hammering at 6 RPM—because the per-model pool for the free 70B is tiny and often saturated. (Topping up to Build Tier 2 gives Brainy a much less crowded lane.)
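
When a 429 does land, the standard mitigation is retrying with exponential backoff. A minimal sketch using the Together Python SDK; the model name and retry settings are illustrative:

```python
# Minimal retry-with-backoff sketch around the Together Python SDK. The model
# name and retry parameters are illustrative, not Brainy's production settings.
import time
from together import Together
from together.error import RateLimitError

client = Together()  # reads TOGETHER_API_KEY from the environment

def chat_with_backoff(messages,
                      model="meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
                      max_retries=5):
    delay = 2.0
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise           # give up after the last attempt
            time.sleep(delay)   # wait out the congested per-model pool
            delay *= 2          # exponential backoff: 2 s, 4 s, 8 s, ...
```

Backoff smooths over transient congestion, but it can't raise the pool's ceiling; that's what the Build Tier 2 top-up is for.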


📦 What donations unlock for Brainy

  • $50 (Build Tier 2 top-up):
      • Jump to 60 RPM for free models → dramatically fewer 429s and bigger concurrent-user headroom.
  • $750 (M4 Mini):
      • Local 14B inference at usable speeds (see the table above), cutting API calls dramatically.
      • Way better energy profile than GPU PC builds, and silent enough to run next to my coffee.
      • Still keep Together for 70B+ needs.

📦 What you’ll unlock for Everyone

  • More context memory for longer, more thoughtful answers.
  • Higher reliability during traffic spikes (fewer rate-limit hiccups).
  • Faster iteration on new and existing features.

💸 How to donate (crypto only for now)

  • USDT (TRC-20): TK5uyyAbuchtBS4hwWwtQA4G15MA54RDkG
  • USDT (BSC): 0x942891F9a02632d67C496305c9746ACedfC0eb2D
  • USDT (SOL): 5yzcNUo8r7goHZMzwF9hPS8MVqXevwuyT4S8hhyHQVqK

If you’d like to run an ad, sponsor, or collaborate—those count as donations too.
✉️ [email protected]


📝 About Brainy

  • Brainy is free, open-source, and built on free-tier tooling wherever possible. Donations are voluntary and go 100% into compute credits & hardware for Brainy.
  • I will keep this page updated with live progress and receipts (screenshots/links) as milestones are hit.
  • If you want to earmark funds (e.g., “Only for Build Tier 2 top-ups”), add a memo in your email and I’ll honor it.
  • Donations are typically not tax-deductible and are generally non-refundable. See the Terms of Service on the site.

Thanks for reading this far ❤️ If you can chip in—even a few USDT—it directly translates into fewer 429s, faster responses, and a greener Brainy that we can all use.