Before we dive into testing Kimi K1.5, let's first understand what it is.
What is Kimi K1.5?
Kimi K1.5 is a multimodal AI model, meaning it can process both text and images. Unlike DeepSeek R1, it is not open-source; however, the developers have published its technical report, and the model is freely available on their platform with no rate limits.
This model launched on the same day as R1 and claims impressive performance across various benchmarks. It is particularly noted for its short and long chain-of-thought reasoning, with its short-CoT results outperforming models like GPT-4o and Claude 3.5 Sonnet on AIME, MATH 500, and LiveCodeBench. Its long-thinking mode reportedly matches OpenAI's o1 across multiple modalities, including math (AIME), visual reasoning (MathVista), and competitive programming (Codeforces).
Training and Capabilities
- Long Context Scaling: Supports up to 128K token context, improving performance with longer inputs.
- Reinforcement Learning: Uses an advanced optimization method called online mirror descent.
- Multimodal Processing: Jointly trained on both text and vision data, allowing it to analyze images and text together.
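The technical report does not spell out its full objective here, but generic online mirror descent (shown below as a standard textbook sketch, not Kimi's exact loss) updates the parameters by balancing the current gradient against a divergence penalty that keeps each step close to the previous iterate:

```latex
w_{t+1} = \arg\min_{w} \; \eta \, \langle \nabla f_t(w_t), \, w \rangle + D_{\psi}(w, \, w_t)
```

Here $f_t$ is the loss at step $t$, $\eta$ is the step size, and $D_{\psi}$ is the Bregman divergence induced by a convex mirror map $\psi$; choosing $\psi(w) = \tfrac{1}{2}\|w\|^2$ recovers plain gradient descent.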
Kimi K1.5 is available in two versions on their platform:
- Base Kimi Model – A simpler version without chain-of-thought reasoning.
- Long Thinking Mode – A more advanced version that can process complex reasoning tasks.
For this test, I used the Long Thinking Mode to evaluate its capabilities.
Secret API Access for Free
A few days after DeepSeek's release, another Chinese model, Kimi K1.5 Long Thinking, gained attention. It is said to perform similarly to OpenAI's o1 at a fraction of the cost. Interestingly, there is a way to access Kimi K1.5's API for free for an extended period.
How to Get the Free API Key
The Kimi K1.5 team has released a Google Form where users can apply for free API access. Here’s what you need to do:
- Fill in your first name, last name, email address, job title, and a link to your social profile.
- Mention your research interests and usage scenario (e.g., running an AI content YouTube channel).
- State your expected API usage (5-10 requests initially).
- Approximate token usage (mention 10,000 for safety).
- Estimated duration of API access (suggest 1-2 months).
- Your country.
- Submit the form.
After submission, the team will verify your application. Within a week, you should receive an email from Moonshot AI, the parent company of Kimi K1.5, granting API access. The email contains a free API key with a 20 million token quota—which is huge for research and development!
How to Use the Free API Key
Once you receive your API key, here’s how you can start using it:
```python
from openai import OpenAI

# Point the standard OpenAI client at Moonshot AI's endpoint.
# (The legacy `openai.api_base` style was removed in openai >= 1.0.)
client = OpenAI(
    api_key="YOUR_API_KEY_HERE",
    base_url="https://api.moonshot.ai/v1",
)

# Ask a question and stream the answer as it is generated.
response = client.chat.completions.create(
    model="kimi-k1.5-preview",
    messages=[
        {
            "role": "user",
            "content": "Find the hypotenuse of a right triangle with legs 3 cm and 4 cm.",
        }
    ],
    stream=True,
)

for chunk in response:
    # Each streamed chunk carries an incremental piece of the reply.
    print(chunk.choices[0].delta.content or "", end="")
```
This setup is very similar to DeepSeek’s API, so if you have used that before, transitioning will be seamless. The model processes the query and provides answers efficiently.
Testing the Model
I ran Kimi K1.5 through 13 different tasks, covering logic, math, language processing, and coding. Below are the results:
| Task | Expected Answer | Result |
|---|---|---|
| Name a country ending in -lia and its capital | Australia, Canberra | ✅ Pass |
| Number that rhymes with a tall plant | Three | ✅ Pass |
| Haiku where second letters spell simple | Custom haiku | ✅ Pass |
| English adjective of Latin origin (11 letters, vowels in order) | Transparent | ✅ Pass |
| Correcting overstated count (48 people, 20% over) | 40 | ✅ Pass |
| Apple counting word problem | 2 | ✅ Pass |
| Counting Sally’s sisters logically | 1 | ✅ Pass |
| Hexagon diagonal calculation | 73.9 | ✅ Pass |
| HTML page with a confetti button | Functional | ✅ Pass |
| Playable synth keyboard (HTML, CSS, JS) | Functional | ❌ Fail |
| Generate SVG of a butterfly | Correct shape | ❌ Fail |
| 3D moving circle in HTML, CSS, JS | Functional | ✅ Pass |
| Game of Life in Python (Terminal) | Functional | ✅ Pass |
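The arithmetic behind the "overstated count" task is easy to sanity-check yourself (this is my own illustration, not the model's output): if 48 is 20% above the true number, then dividing by 1.2 recovers it.

```python
# A reported headcount of 48 is stated to be 20% above the true value:
# reported = true * 1.2, so the true count is reported / 1.2.
reported = 48
true_count = reported / 1.2
print(true_count)  # 40.0
```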
Analysis: Strengths & Weaknesses
Strengths
- Logical & Math Problems: It handled logic puzzles and math calculations with ease.
- Long Chain-of-Thought Reasoning: The model successfully tackled complex, multi-step problems.
- General Knowledge & Language Tasks: It produced accurate results for trivia and creative writing tasks.
- Web Development Tasks: It successfully created a functional HTML page with interactivity.
Weaknesses
- Coding Performance: It struggled with advanced programming tasks. Unlike DeepSeek R1, which excels at coding, Kimi K1.5 had difficulty with a playable synth keyboard and generating a correct butterfly SVG.
- Token Repetition: Occasionally, it repeated words and tokens, which affected fluency.
- Limited Uniqueness: While it’s a solid model, it doesn’t offer anything groundbreaking beyond what’s available with models like DeepSeek R1.
Final Verdict
Kimi K1.5 is a good multimodal model with solid reasoning capabilities, but it doesn’t bring anything significantly new to the table. While it outperforms certain models in reasoning tasks, its lack of open-source availability and struggles with coding tasks make it less appealing compared to models like DeepSeek R1. Additionally, since its API is not yet widely available, its practical use remains limited.
That said, with free access to 20 million tokens, it is a fantastic opportunity for those who need a high-performance reasoning model at no cost.