
TRELLIS.2 image-to-3D now runs on Mac (Apple Silicon) - no NVIDIA GPU needed
I ported Microsoft's TRELLIS.2 to run on Apple Silicon via PyTorch MPS. The original depends on five CUDA-only compiled extensions (flex_gemm, flash_attn, o_voxel, cumesh, nvdiffrast) that have no Mac equivalent. I wrote replacement backends from scratch: a pure-PyTorch sparse 3D convolution (replacing flex_gemm), Python mesh extraction using spatial hashing (replacing the CUDA hashmap ops in o_voxel), SDPA attention for sparse transformers (replacing flash_attn), and GPU-accelerated trilinear interpolation (replacing cumesh and nvdiffrast).
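To make the spatial-hashing idea concrete, here is a minimal sketch in plain Python. It is not the port's actual code: the helper names (build_voxel_index, exposed_faces) are hypothetical, and it only illustrates the general technique of hashing integer voxel coordinates so that neighbor lookups take constant time, which is the kind of query a mesh extractor needs when deciding which voxel faces lie on the surface.

```python
from typing import Dict, Iterable, List, Tuple

Voxel = Tuple[int, int, int]

# The six face-adjacent offsets in a 3D grid.
FACE_OFFSETS: List[Voxel] = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def build_voxel_index(voxels: Iterable[Voxel]) -> Dict[Voxel, int]:
    """Hash each occupied voxel coordinate to its position in the input.

    Hypothetical helper: a Python dict stands in for the CUDA hashmap.
    """
    return {coord: i for i, coord in enumerate(voxels)}


def exposed_faces(voxels: List[Voxel]) -> List[Tuple[int, Voxel]]:
    """Return (voxel_index, face_offset) pairs for faces with no occupied neighbor.

    These are the faces a surface mesh would need to emit.
    """
    index = build_voxel_index(voxels)
    faces: List[Tuple[int, Voxel]] = []
    for i, (x, y, z) in enumerate(voxels):
        for dx, dy, dz in FACE_OFFSETS:
            # O(1) hash lookup instead of scanning every other voxel.
            if (x + dx, y + dy, z + dz) not in index:
                faces.append((i, (dx, dy, dz)))
    return faces
```

For example, exposed_faces([(0, 0, 0), (1, 0, 0)]) returns 10 faces: each of the two cubes has six faces, and the one face they share is dropped from both. A real backend would presumably vectorize this with tensor ops rather than loop in Python.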
Tech Blogger Take
Your M5 Pro with 24GB RAM just became an AI powerhouse — here's the quantization sweet spot
Someone with an M5 Pro and 24GB of RAM is trying to figure out the best way to run Qwen 3.6 locally via Ollama, specifically whether Q4 or Q3 quantization makes more sense for their setup. This is exactly the kind of question that shows how far we've come: we're debating which large language model runs best on a laptop. That this is even a conversation suggests the democratization of AI is moving faster than many expected. For 24GB of unified memory, Q4_K_M is your goldilocks zone. It should stay close to full-precision quality while leaving enough headroom for your system to breathe. Q3 quantization exists, but it's harder to find, and the quality drop isn't worth the marginal memory savings when you've got 24GB to work with. What's wild is that this person is casually running a model that, not long ago, would have called for dedicated server hardware.
Action
Paying per API call, waiting for cloud responses, and losing your internet connection all kill your AI workflow
Instant responses, unlimited usage, and AI that works on airplanes — all running silently on your laptop
AI Analysis
Software Development
High: Test Q4_K_M quantization first; it's the sweet spot for 24GB setups, and you'll know within minutes if it works
The M5 Pro's unified memory architecture means your 24GB is shared between system and model, so you're really working with ~20GB for the actual model
You can now run enterprise-grade AI models on your laptop instead of burning through API credits or waiting for cloud responses
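The memory reasoning above can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the bits-per-weight figures are rounded approximations for llama.cpp-style quantization formats, the 2GB overhead for context and runtime buffers is an assumed placeholder, and the function names are hypothetical. The source does not say which Qwen 3.6 size is meant, so the parameter count is left as an input.

```python
# Approximate bits per weight for llama.cpp-style quant formats.
# These are rounded, assumed values; real files vary by model architecture.
APPROX_BITS_PER_WEIGHT = {
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.85,
    "Q8_0": 8.5,
}


def weights_size_gb(params_billions: float, quant: str) -> float:
    """Estimate the size of the quantized weights in GB (10^9 bytes)."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    # (params_billions * 1e9 weights) * bits / 8 bytes, divided by 1e9 bytes per GB.
    return params_billions * bits / 8


def fits_in_budget(
    params_billions: float,
    quant: str,
    budget_gb: float = 20.0,   # the ~20GB usable figure quoted in the text
    overhead_gb: float = 2.0,  # assumed allowance for KV cache and buffers
) -> bool:
    """Return True if weights plus assumed overhead fit in the memory budget."""
    return weights_size_gb(params_billions, quant) + overhead_gb <= budget_gb
```

Under these assumptions, a hypothetical 27B-parameter model needs about 16.4GB of weights at Q4_K_M, so fits_in_budget(27, "Q4_K_M") is True, while the same model at Q8_0 needs about 28.7GB and does not fit. Longer context windows raise the overhead, so treat the result as a first filter rather than a guarantee.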
Job Impact Analysis
AI Engineer
Role Shift: Local model inference means you can prototype and iterate without API costs or latency concerns
Download Ollama today and test Qwen 3.6 Q4_K_M; with no API round trips, your development cycle could get noticeably faster
Data Scientist
Opportunity: 24GB RAM opens up models that were previously cloud-only, giving you offline analysis capabilities
Start with Q4 quantization for your next project; you may be surprised how close to full precision it performs