The tool above lets you explore open models and see which closed models they most closely match. The method uses each model's scores on various benchmarks sourced from artificialanalysis.ai. Cosine similarity first identifies a panel of closed models whose score profiles have the most similar 'shape', and an error-minimization function then finds the minimal deviation between the open model and each candidate. Models are compared only on the benchmarks they have in common, and the number of benchmarks matched and used in the evaluation is noted on the closed model card. Three ranked matches are provided, giving you an idea of the open model's capabilities.
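The two-stage matching described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names, the panel size, the minimum overlap, the use of mean absolute difference as the error measure, and the benchmark names and scores are all assumptions made for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two score vectors (compares profile shape)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def mean_abs_error(u, v):
    """Average absolute score gap (an assumed stand-in for the error function)."""
    return sum(abs(x - y) for x, y in zip(u, v)) / len(u)

def rank_matches(open_scores, closed_models, panel_size=5, top_n=3, min_overlap=2):
    candidates = []
    for name, scores in closed_models.items():
        # Compare only on benchmarks both models have scores for.
        shared = sorted(set(open_scores) & set(scores))
        if len(shared) < min_overlap:
            continue
        u = [open_scores[k] for k in shared]
        v = [scores[k] for k in shared]
        candidates.append(
            (name, cosine_similarity(u, v), mean_abs_error(u, v), len(shared))
        )
    # Stage 1: keep the panel of closed models with the most similar shape.
    panel = sorted(candidates, key=lambda c: c[1], reverse=True)[:panel_size]
    # Stage 2: within that panel, rank by smallest deviation.
    ranked = sorted(panel, key=lambda c: c[2])[:top_n]
    return [(name, n_shared, round(err, 2)) for name, _, err, n_shared in ranked]

# Hypothetical scores, for illustration only.
open_scores = {"bench_a": 70, "bench_b": 40, "bench_c": 60}
closed_models = {
    "closed-a": {"bench_a": 72, "bench_b": 42, "bench_c": 58},
    "closed-b": {"bench_a": 90, "bench_b": 70, "bench_c": 95},
    "closed-c": {"bench_a": 71, "bench_b": 39},
    "closed-d": {"bench_z": 80},  # no shared benchmarks, so it is skipped
}
matches = rank_matches(open_scores, closed_models)
```

Each entry in `matches` carries the closed model's name, the number of shared benchmarks, and the deviation, which mirrors the benchmark count shown on the closed model card. A match built on only a couple of shared benchmarks deserves less trust than one built on many.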
These two tools can be used to orient yourself within the world of local and open source LLMs.
The tool below offers a heuristic lens through which to view local models. The models are plotted by Size and AA Intelligence Score, and a larger bubble indicates a higher ratio of intelligence to size. Models with larger bubbles are great candidates for local inference, as they will be more performant on constrained hardware. General hardware requirements can be seen along the top, while vibes-based generalizations of a model's potential performance can be seen on the right. This tool is useful if you are thinking about your hardware and which models may be a good fit.
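As a rough illustration of the bubble sizing, the ratio can be computed like this. It is a sketch under assumptions: the function names, the square-root scaling (so that bubble area rather than radius tracks the ratio), the `scale` constant, and the model figures are all hypothetical and not taken from the tool.

```python
import math

def efficiency(intelligence_score, params_billion):
    """Intelligence per billion parameters: the ratio the bubble size reflects."""
    return intelligence_score / params_billion

def bubble_radius(intelligence_score, params_billion, scale=10.0):
    """Assumed scaling: radius grows with the square root of the ratio."""
    return scale * math.sqrt(efficiency(intelligence_score, params_billion))

# Hypothetical models: (name, intelligence score, size in billions of parameters).
models = [("small-model", 40, 8), ("mid-model", 50, 32), ("large-model", 60, 120)]

# Sort so the most efficient model (largest bubble) comes first.
by_efficiency = sorted(models, key=lambda m: efficiency(m[1], m[2]), reverse=True)
```

In this made-up data the smallest model has the lowest raw score but the highest ratio, which is the kind of model the chart is meant to surface.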
Each memory size appears in two bands. In the first band for a given memory size, you can expect to be able to run a higher quantization, such as Q6 or Q8, which will yield a higher-quality experience for that model. The second band indicates models that will need to be run at a lower quantization, such as Q4. This tradeoff is often acceptable, but it should be noted. Hover over the bubbles to see the expected token generation speed on full-VRAM-offload systems, or on systems with unified memory like the DGX Spark or various Mac models. Your real speeds will vary - these are just averages sourced from localmaxxing.com and my own personal tests. Keep in mind that people who publicly benchmark their models are more likely to have higher-end hardware.
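To get a feel for why the same memory size splits into two bands, here is a back-of-the-envelope estimate of which quantization fits. It is only a sketch: the bits-per-weight values are approximations that vary by model and quantization variant, and the `headroom` factor reserving memory for context and the rest of the system is an assumption, not a figure from the chart.

```python
# Approximate bits per weight; real quantized files vary by model and variant.
BITS_PER_WEIGHT = {"Q8": 8.5, "Q6": 6.6, "Q4": 4.8}

def weights_gb(params_billion, quant):
    """Rough size of the quantized weights in GB."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8

def best_quant(params_billion, memory_gb, headroom=0.8):
    """Highest quantization whose weights fit in the assumed usable memory."""
    budget = memory_gb * headroom  # leave room for context and overhead
    for quant in ("Q8", "Q6", "Q4"):
        if weights_gb(params_billion, quant) <= budget:
            return quant
    return None  # does not fit even at Q4

# Hypothetical model sizes checked against 16 GB of memory.
fits = {size: best_quant(size, 16) for size in (8, 14, 20, 70)}
```

Under these assumptions, smaller models land in the higher-quantization band and larger ones fall into the Q4 band or do not fit at all. Treat the output as a starting point for reading the chart rather than a guarantee.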