Ireland's first custom LLM creation studio is here.
Fine-tune frontier models up to 235B parameters using Tinker distributed training, powered by Thinking Machines Lab. Build domain-specific intelligence layers that you own and deploy anywhere. Your data. Your model. Our GPUs.
Why Custom Models? Why Now?
The AI landscape has fundamentally shifted. Custom models are no longer a luxury—they're a strategic necessity. The cost of training large models is dropping 10x every 18 months. Open-source frontier models now rival proprietary alternatives. Enterprises demand private, compliant AI—not shared APIs.
Your custom model is your competitive edge in the AI era. For high-volume use cases, custom models eliminate per-token API costs—often 95% cheaper than OpenAI at scale. You get your weights, your VPC, and full control. No data leakage. No vendor lock-in. Your domain knowledge becomes the model's expertise.
But here's the problem: most businesses can't build custom models. They don't have GPU clusters, ML research teams, or distributed training infrastructure. That's where LeemerLabs Model Foundry comes in.
What We're Launching
LeemerLabs Model Foundry is Ireland's first custom LLM creation studio. We're a premium, end-to-end service where companies pay us to create custom fine-tuned LLMs, domain-specific knowledge models, enterprise private chat models, industry-specialized agents, distilled small models from big ones, hosted inference endpoints, and white-label chat apps powered by their model.
Everything is powered by Tinker—the distributed training platform built by Thinking Machines Lab, founded by former OpenAI CTO Mira Murati and a team of ex-OpenAI researchers including co-founder John Schulman. Instead of managing clusters, GPUs, and training jobs, you write a simple Python training loop, and Tinker turns it into fault-tolerant distributed training on their GPU infrastructure.
We can fine-tune models from 1B parameters all the way up to massive 235B MoE architectures like Qwen3-235B-A22B. Switching models is often as easy as changing a single string. Under the hood, Tinker uses LoRA (Low-Rank Adaptation) based on their 'LoRA Without Regret' research, which shows that with the right setup—correct learning-rate scaling, rank selection, and layer coverage—LoRA can match full fine-tuning for many post-training tasks, especially reinforcement learning.
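To make the "simple Python training loop" idea concrete, here is a minimal sketch of the pattern. Note this is an illustration only: `MockTrainingClient`, `forward_backward`, and `optim_step` are stand-in names, not Tinker's real API, and the loss here is simulated rather than computed by a model.

```python
# Illustrative sketch: a plain Python training loop where the distributed
# heavy lifting hides behind a client object. All class and method names
# are stand-ins for illustration, NOT Tinker's actual API.
import random

class MockTrainingClient:
    """Stands in for a remote distributed-training client."""
    def __init__(self, base_model: str, lora_rank: int):
        self.base_model = base_model
        self.lora_rank = lora_rank
        self.step = 0

    def forward_backward(self, batch) -> float:
        # Simulate a loss that decays as training proceeds.
        self.step += 1
        return 2.0 / (1 + 0.1 * self.step) + random.random() * 0.01

    def optim_step(self, learning_rate: float) -> None:
        pass  # in a real system, the platform applies the update remotely

# Switching base models is just a string change:
BASE_MODEL = "Qwen/Qwen3-235B-A22B"   # or a Llama-family model ID

client = MockTrainingClient(base_model=BASE_MODEL, lora_rank=32)
dataset = [{"prompt": f"q{i}", "completion": f"a{i}"} for i in range(8)]

losses = []
for epoch in range(3):
    for batch in dataset:
        loss = client.forward_backward(batch)
        client.optim_step(learning_rate=1e-4)
        losses.append(loss)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The point of the pattern: the loop reads like single-machine code, while checkpointing, scheduling, and multi-node execution live behind the client.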
Why Tinker Matters
Tinker compresses an entire AI infrastructure team into an API. It handles GPU scheduling, checkpointing, fault tolerance, and multi-node training—so we focus on data, objectives, and evaluation instead of cluster babysitting.
It gives us access to frontier-class models with open weights. Support for modern Llama and Qwen families—including huge MoE models—means we can train models competitive with proprietary labs while letting you own and export your weights.
Most importantly, Tinker is built by people who've shipped frontier models before. Thinking Machines is stacked with former OpenAI leaders who've actually built and deployed frontier-scale systems. For LeemerLabs Model Foundry, that means we can reliably offer serious, research-grade training loops instead of 'just another wrapper around someone else's API.'
Why We're Ireland's Only Tinker Beta Partner
We chose to partner with Thinking Machines and join their early Tinker beta because it gives our clients something most agencies simply cannot offer: access to cutting-edge training infrastructure. We get the same style of distributed training stack that powered frontier models—exposed through a clean API—so we can fine-tune everything from compact 1B experts to MoE giants like Qwen3-235B for your domain.
Instead of treating LoRA as a hack, we use it the way the 'LoRA Without Regret' team intended: correct learning-rate scaling, rank selection, and layer coverage. That means better sample efficiency, better RL behavior, and faster iteration on your custom model.
Because Tinker is built for open-weight bases (Llama, Qwen, etc.), we hand you exportable weights at the end of a project. You're not locked into our infra—or anyone else's. Thinking Machines' mission is to make advanced AI more understandable and customizable, not more opaque. That lines up perfectly with what LeemerLabs Model Foundry stands for: 'Your data. Your model. Your intelligence layer.'
What Makes LeemerLabs Different
We're not another agency—we're a full AI lab. Most 'AI agencies' wrap OpenAI and call it a day. We build models, agents, pipelines, infrastructure, and entire platforms. LeemerLabs is the research arm behind LeemerChat, Warren.wiki, ExamMate, HeyCouncil, DeepThis, and more—real systems used by real users every day.
We've been in the AI game since 2023—long before it was cool. We were training models, distilling Qwen, orchestrating multi-model workflows, and building agents before GPT-4o, before Gemini, before the hype cycle. We've lived through everything from LLaMA-1 to LLaMA-3 and saw the entire open-weights revolution unfold. We didn't jump on the wave—we were here when the wave began.
We've built our own models—not just glued APIs together. We've fine-tuned Qwen, LLaMA, Gemma, Mistral, Mixtral, and multiple small models. We've built internal bilingual Bengali/English models, distilled models for production apps, and even crafted custom Orchestrator → Worker model chains inside LeemerChat. We understand models from the inside, not just the prompt.
LeemerChat alone has processed over 1 billion tokens for real users. That's a billion tokens of model reasoning, user queries, and real-world edge cases, teaching us firsthand what breaks and what scales. This is battle-tested experience, not theoretical knowledge.
Multi-model orchestration is our native language. We built Leemer Heavy, Leemer Heavy Fast, Leemer Research, and multi-agent pipelines using Qwen, Groq LPU models, GPT-4.1/4o, Claude, Kimi, LLaMA, and DeepSeek. We design architectures where small, large, and domain models collaborate—so your system is always fast, accurate, and cheap.
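The Orchestrator → Worker pattern can be sketched in a few lines. This is a hypothetical toy, not LeemerChat's actual routing logic: the workers are stubs, and a real orchestrator would typically use a cheap classifier model rather than keyword matching.

```python
# Hypothetical sketch of an Orchestrator -> Worker pipeline: a cheap router
# decides which specialised worker handles each request. Worker functions
# are stubs; in production each would call a different model.

def math_worker(q):     return f"[math model] {q}"
def code_worker(q):     return f"[code model] {q}"
def general_worker(q):  return f"[general model] {q}"

ROUTES = {
    "math": (math_worker, ("equation", "solve", "integral")),
    "code": (code_worker, ("function", "bug", "python")),
}

def orchestrate(query: str) -> str:
    q = query.lower()
    for worker, keywords in ROUTES.values():
        if any(k in q for k in keywords):
            return worker(q)          # dispatch to the matching specialist
    return general_worker(q)          # fall back to a general model

print(orchestrate("Solve this equation for x"))
```

The design benefit is the one named above: small, cheap models handle routing and easy queries, while large or domain models are invoked only when needed.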
We deliver full-stack AI deployment—not just training. We don't just hand you weights. We deliver the entire intelligence layer: private APIs, white-label chat apps, internal agents, custom embeddings, RAG pipelines, Slack/Teams/WhatsApp bots, on-prem deployment, monitoring, rate-limits, logging, and analytics. Most agencies 'fine-tune a model.' We deploy your entire AI system.
The Foundry Pipeline
Our process is straightforward and transparent. Week 1: Data Forge—we help create, clean, or synthesize datasets. Domain distillation from frontier models, multi-turn conversation generation, and dataset labeling pipelines.
Week 2: Model Crafting—we fine-tune models up to 235B parameters using Tinker's distributed infrastructure. LoRA adapters, instruction tuning, and domain-specific behaviors.
Week 3: Evaluation—comprehensive benchmarks (TruthfulQA, MMLU, GSM8K, HumanEval), real client data evals, safety tests, hallucination analysis, and benchmark reports.
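A custom eval on client data boils down to a harness like the following. This is a toy sketch: `model_fn` is a stub standing in for the fine-tuned model, and real evaluation adds safety tests and hallucination analysis on top of accuracy metrics.

```python
# Toy evaluation harness: run a model over labelled examples and report
# exact-match accuracy. model_fn is a stub; a real harness would call the
# fine-tuned model's endpoint.

def model_fn(prompt: str) -> str:
    # Stub standing in for the fine-tuned model.
    return {"2+2=": "4", "capital of Ireland?": "Dublin"}.get(prompt, "?")

eval_set = [
    ("2+2=", "4"),
    ("capital of Ireland?", "Dublin"),
    ("3*3=", "9"),
]

def exact_match_accuracy(examples, predict) -> float:
    hits = sum(predict(p).strip() == gold for p, gold in examples)
    return hits / len(examples)

acc = exact_match_accuracy(eval_set, model_fn)
print(f"exact match: {acc:.2f}")   # 2 of 3 correct -> 0.67
```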
Week 4: Deployment—private API endpoints, downloadable model weights, SDKs for JS/TS, Python, and cURL. Hosted inference with custom rate limiting and analytics. White-label chat apps, RAG pipelines with vector databases, custom retrievers, and evaluation frameworks.
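The retrieval half of a RAG pipeline can be shown in miniature. This sketch scores documents by simple word overlap; production pipelines use vector embeddings and a vector database, but the shape of the flow (retrieve context, then build the prompt) is the same.

```python
# Minimal RAG retrieval sketch: pick the document with the most word
# overlap with the query, then prepend it to the prompt as context.
# Real pipelines replace the overlap score with embedding similarity.

DOCS = [
    "LeemerLabs is based in Waterford, Ireland.",
    "LoRA trains low-rank adapters on frozen weights.",
    "Qwen3-235B-A22B is a mixture-of-experts model.",
]

def retrieve(query: str, docs) -> str:
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str) -> str:
    context = retrieve(query, DOCS)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("Where is LeemerLabs based?"))
```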
From initial data handoff to production-ready deployment, typical projects take 4 weeks. Complex enterprise projects may take 6-8 weeks. We provide progress updates throughout the process.
Who This Is For
Startups: Build your AI moat early. Custom models give you a defensible advantage over competitors using generic APIs.
Agencies: Offer AI services to your clients with white-label models. Become the AI partner they need.
Enterprise: Deploy private intelligence layers with full compliance, security, and governance controls. GDPR-ready, ISO-friendly architecture, on-prem deployment available, data sovereignty, private inference, and exportable weights.
Pricing & What's Included
We offer three tiers. Starter Fine-Tune (€1,200–€3,000) for small businesses: small dataset (<20k samples), 7B–8B model, LoRA fine-tune, hosted API, basic eval report, and 30-day support.
Business Model (€5,000–€12,000) for startups and agencies: 7B–32B models, dataset creation, multi-turn training, RAG pipeline, API + SDK, white-label chat (optional add-on), and 3 months support.
Enterprise Intelligence Layer (€15,000–€50,000) for serious clients: 32B–235B models, domain datasets + RL dataset, full eval suite, safety tuning, hosted inference + rate limits, dedicated Slack, white-label end-user app, and 6 months support.
We also offer monthly retainer packages (€499–€2,500/month) for model improvements, dataset expansion, retraining, API uptime monitoring, and new feature development. This ensures your model stays current with your evolving needs.
The Bottom Line
We handle your data and training loop. Tinker handles the heavy GPU lifting. You walk away with a model that feels like your company's own OpenAI—without giving up control.
We're builders, not consultants. Everything we sell, we use in our own products. We're not theorizing—we're operating. LeemerChat, Warren.wiki, HeyCouncil, ExamMate… these are full AI platforms built on the same systems we deliver to clients. If we didn't build real things, we wouldn't be here.
We believe in open models—and we give YOU ownership. We back open weights. We support local hosting. And at the end of the engagement, you own the model, the weights, and the intelligence layer. You're not renting intelligence from a Silicon Valley API; you own your model outright.
Built in Waterford, scaling globally. We're proud of where we come from. We build world-class AI—in Ireland. No Silicon Valley ego, no bloated teams, no fluff. Just pure engineering, research, and delivery.
Ready to Forge Your Model?
Talk to our AI architects today. Whether you're testing custom models, deploying enterprise intelligence layers, or fine-tuning with your own datasets, we keep the experience cohesive—one foundry, multiple specialized models.