These are my favorite AI models to run locally on my MacBook right now.
These models keep your conversations private by never transmitting them off your local computer. They're less capable than the modern online models, but they're a nice option when you need total privacy or when you don't have a reliable internet connection, such as in a rural area or while traveling.
Llama 3.1 is a good general purpose chat model with very good coding support. I use the 8B (latest) parameter size; larger sizes are also available.
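For reference, here's the kind of thing I do with it through the Ollama CLI. This is a sketch that assumes Ollama is installed and the model has been pulled (e.g. `ollama pull llama3.1:8b`); the `ask` helper name is just mine.

```shell
#!/bin/sh
# Send a one-off prompt to a local model and print the reply.
# Assumes the `ollama` CLI is installed and the model is pulled.
ask() {
  model="$1"
  shift
  ollama run "$model" "$*"
}

# Example (uncomment to use):
# ask llama3.1:8b "Write a one-line awk command to sum a column of numbers."
```

Running `ollama run llama3.1:8b` with no prompt starts an interactive chat session instead.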
Llama 3.2 is a newer version but focuses on smaller model sizes (1B and 3B parameters), which may make it more appropriate for certain phone or embedded use cases.
Llama 3.3 is also newer but focuses on a larger model (70B parameters), so it's less suitable for laptop use.
Llama 4, also newer, is much too big for use on my laptop.
Gemma 3 is a good general purpose chat model with good coding support. It handles text and image input and generates text output. It's a little more conservative than Llama 3.1, which increases factual accuracy and safety; that's useful in certain situations. Sometimes I test Llama 3.1 and Gemma 3 against each other on specific tasks.
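When I pit the two against each other, it's usually just a small loop like this. A sketch assuming the Ollama CLI with both models already pulled; `gemma3` is Ollama's tag for the default Gemma 3 size, and `compare` is my own helper name.

```shell
#!/bin/sh
# Run the same prompt through two local models so the answers can be
# compared side by side. Assumes `ollama` is installed and both models
# have been pulled.
compare() {
  prompt="$1"
  for model in llama3.1:8b gemma3; do
    echo "=== $model ==="
    ollama run "$model" "$prompt"
  done
}

# Example (uncomment to use):
# compare "Summarize the trade-offs between TCP and UDP in two sentences."
```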
Gemma 3n is a related model that is designed for smaller devices like mobile phones and laptops. It might be worth a look, but I haven't tested it yet.
Phi 4 is a set of small, efficient reasoning models from Microsoft. It's good for lightweight deployment on edge or mobile systems.
I don't use Phi regularly, but I've used Phi 3 to experiment with tasks that suit very small models. Specifically, I've used it in creative cases where I actually wanted the model to be a little bit "dumber" so I could work that into some (unreleased) creative writing. I also have some ideas for trying Phi on small embedded systems.
Phi 3 is a 3.8B model (with a 14B option).
Phi 4 is a larger 14B model.
Phi4-mini is a 3.8B model that supports function calling.
Phi4-mini-reasoning is a 3.8B model tuned for reasoning.
There are some limited use cases where I need an uncensored model, often to ask questions that the primary models are too sensitive about to answer. I've reached for the Llama 2 Uncensored series in these cases.
I've used Little Snitch to test Ollama's network access. I've spent a few hours running various tests on two different occasions. I'm not an expert in network monitoring, so the following observations should be taken with a grain of salt.
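If you want a quick sanity check without Little Snitch, here's one way to look at it. This is a sketch that assumes Ollama's default port, 11434, and a system with `lsof` available (macOS has it built in); `check_ollama_listener` is just my name for the helper.

```shell
#!/bin/sh
# Show which address the Ollama server is listening on. By default it
# binds to 127.0.0.1:11434 (localhost only), meaning other machines on
# your network can't reach it.
check_ollama_listener() {
  lsof -nP -iTCP:11434 -sTCP:LISTEN
}

# Example: check_ollama_listener
# A line containing 127.0.0.1:11434 means localhost-only;
# *:11434 would mean it's exposed to the local network.
```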
Written by Joel Dare on July 17, 2025.
Want to see my personal updates, tech projects, and business ideas? Join the mailing list.
JoelDare.com © Dare Companies Dotcom LLC