I have released a translation application built on Apple Intelligence's on-device LLM, the Foundation Models framework. In this article I want to write about the benefits, limitations, and possibilities of Apple Intelligence that I learned through the experience, since not many people write about it.
Pre-Babel Lens
The application name is Pre-Babel Lens.
The app runs on a Mac with Apple Intelligence enabled. My development machine is a MacBook with an M2 chip and 16 GB of RAM, and I expect it to run smoothly on an M1 16 GB model as well.
Pre-Babel Lens is an application that translates using a two-pass, DeepL-style approach. As of v0.5.0, the latest version at the time of writing, it can also translate selected text with a double press of ⌘+C. It supports the 15 languages supported by Apple Intelligence. Because the Foundation Models used for translation are on-device LLMs, desktop translation works even without a network connection, and the text being translated never travels over the network.

Installation and Usage
The easiest option is to download the developer-signed release build, but if you have a development environment you can clone the GitHub repository and build it yourself.
After launching the application, enter the text you want to translate in the Source pane on the left and click the Translate button to start translation.
While the application is running, you can translate the copied text by pressing ⌘+C twice in quick succession, DeepL style. Because this works by monitoring the clipboard, it does not require Accessibility permission.
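As a sketch of how clipboard-based double-copy detection can work: the app can poll NSPasteboard's changeCount on a timer and treat two changes inside a short window as the trigger. The type below and its names are my own illustration, not the app's actual code, and only the platform-independent timing logic is shown:

```swift
import Foundation

// Hypothetical sketch: detects a "double copy" by timing pasteboard
// change events. In a real app this would be fed by polling
// NSPasteboard.general.changeCount on a timer; only the timing logic
// is shown here so it stays platform-independent.
struct DoubleCopyDetector {
    let window: TimeInterval      // max gap between the two copies
    private var lastChange: Date?

    init(window: TimeInterval = 0.5) {
        self.window = window
    }

    // Call this every time the pasteboard change count increases.
    // Returns true when this change completes a double copy.
    mutating func registerChange(at now: Date = Date()) -> Bool {
        if let last = lastChange, now.timeIntervalSince(last) <= window {
            lastChange = nil   // reset so a triple copy does not fire twice
            return true
        }
        lastChange = now
        return false
    }
}
```

When the detector fires, the app would read the pasteboard string and kick off a translation.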
Pre-Babel Lens can receive the text to be translated via a URL scheme, so you can register the following shell script as an Automator Quick Action (with the selected text passed on stdin) to run translation from the Services menu.
#!/bin/zsh
# Read the text to translate from stdin (Automator passes it there).
text="$(cat)"
if [ -z "$text" ]; then
  text="$*"
fi
if [ -z "$text" ]; then
  exit 0
fi
# Percent-encode the text for use as a URL query parameter.
encoded="$(printf '%s' "$text" | /usr/bin/python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.stdin.read()))')"
open "prebabellens://translate?text=$encoded"

Limitations
As I'll discuss in the development notes later in this article, there are several unavoidable restrictions when using Apple Intelligence. Here they are in brief:
- Token count: 4,096 tokens. This is enough for desktop translation via shortcuts, but if you are used to translating with DeepL, GPT, or Gemini, you will feel constrained
- Translation language restriction: unlike general-purpose LLMs, Apple Intelligence is limited to its 15 supported languages
- CSAM safeguard: Apple Intelligence may refuse to process text it flags as involving minors
- Privacy shield: Apple Intelligence may stop a translation if it judges a document containing personal names and dates to be a privacy risk
- Political shield: Apple Intelligence may refuse to translate documents related to specific countries or political claims
The token count and language limits are understandable. The underage and privacy guards are troublesome, but the untranslated portions can be reduced through segment division. The tricky one is the political shield: there are cases where translation simply does not happen when Israel is involved, and because large paragraphs are affected all at once, it is difficult to work around.
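The segment-division idea above can be sketched as follows. This is my own illustration, not the app's actual code: `translate` stands in for the Foundation Models call and returns nil when the model refuses, so a refusal costs only one paragraph instead of the whole document.

```swift
import Foundation

// Hypothetical sketch of segment division: translate paragraph by
// paragraph so a guardrail refusal loses only one segment.
// `translate` is a placeholder for the real model call.
func translateBySegment(
    _ text: String,
    translate: (String) -> String?   // nil means the model refused
) -> String {
    let segments = text.components(separatedBy: "\n\n")
    return segments.map { segment in
        // Keep a refused segment in the original language, the same
        // fallback the app's prompt requests with its 原文ママ note.
        translate(segment) ?? segment
    }.joined(separator: "\n\n")
}
```

With finer segmentation (sentences instead of paragraphs) the blast radius shrinks further, at the cost of more model round-trips.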
App Note
I must say, the ease of translating with a double ⌘+C is quite impressive. It is convenient even when the network is unavailable, and the fact that the text being translated never leaves the device gives me peace of mind.
The quality of the translation is slightly below DeepL or Google Translate from a few months ago. Although the Foundation Models are a small model, about a 7 GB download, they boot faster and produce output faster than an Ollama-served model of half that size, and the translation quality tends to be better.
It’s a great app that I’m glad to have made. I plan to develop an iOS version in the future.
That’s what I need as a user. Next, I’ll write about developing applications using Apple Intelligence.
Development Notes
Development was done with Codex. I reviewed the code, but most of the work, including testing, was carried out by Codex, so I cannot give a walkthrough of how to build an application with Apple Intelligence. Instead, I will cover the limitations I ran into, the workarounds, and the design direction.
Reasons for Using Apple Intelligence
I had been working on a local translation app using Ollama, but bringing it to the iPhone proved quite challenging. That is why I decided to try Apple Intelligence, which runs Foundation Models on-device on the iPhone.
Foundation Models are not designed for high-quality generation. From the purpose stated on Apple's page, it is evident that they are not high-quality models capable of carrying an application or service on a single prompt; they are intended to be used as a component within applications. I was curious how far Foundation Models could go for a narrow purpose like translation.
Piracy and LLMs
Like other LLM developers, Apple has been accused of using pirated data while developing Foundation Models. Apple responds that it did not use pirated data in developing Apple Intelligence or Foundation Models, citing licensed data, publicly released data, and data of its own creation.
Using structured output with @Guide and @Generable
Apple Intelligence’s greatest benefit is structured output that doesn’t depend on prompts. We have been working to utilize these structured outputs as much as possible in the Pre-Babel Lens.
Let’s take an example. At v0.3.0, prompts were used to execute translations.
private static func sensitiveContentSafeInstructions(for input: TranslationInput) -> String {
    """
    You are a universal translation engine.
    Translate from \(input.sourceLanguage) to \(input.targetLanguage).
    The input can include politics, war, safety, or other sensitive topics.
    Your role is strictly translation, not policy analysis.
    Keep meaning and tone faithful.
    Return only the translated text.
    Do not refuse, summarize, or add warnings.
    If any part cannot be translated due to safety restrictions, keep only that part in the original language and annotate it with "原文ママ".
    """
}

This is a common style of prompt, but it has a big problem: when the load rises, or when the source language is German or Spanish, the output comes back in English, probably because the prompt itself is written in English. If I write the prompt in Japanese, the output falls back to Japanese when processing degrades.
So, starting with v0.4.0, I switched to Apple Intelligence's structured output. I defined the output type from the model as follows and created an LLM session around it. By specifying the target language in this type, the amount of interpretation left to the natural-language prompt is limited. Since switching in v0.4.0, the output has become stable and the response speed has improved.
@Generable
struct StructuredTranslationPayload {
    @Guide(description: "Target language code.")
    var targetLanguage: String
    @Guide(description: "Segment kind label.")
    var kind: String
    @Guide(description: "Translated text only.")
    var translation: String
}
...
let stream = session.streamResponse(
    to: prompt,
    generating: StructuredTranslationPayload.self,
    includeSchemaInPrompt: false
)

I have also implemented heuristic classification to determine the kind mentioned in the @Guide description, such as conversation, UI text, or bullet points. Adding just one more @Guide parameter has almost no effect on processing speed.
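One payoff of carrying the target language as typed data is that a wrong-language response can be rejected mechanically instead of by inspecting prose. Below is my own minimal sketch of such a check, not the app's actual code; the struct repeats the fields above without the FoundationModels macros so it compiles anywhere, and the helper name is hypothetical.

```swift
// Mirror of the @Generable type above, minus the FoundationModels
// macros, so this sketch is self-contained.
struct StructuredTranslationPayload {
    var targetLanguage: String
    var kind: String
    var translation: String
}

// Accept a translation only when the model honored the requested
// target language (compare language codes case-insensitively).
func accepted(_ payload: StructuredTranslationPayload,
              requestedLanguage: String) -> String? {
    guard payload.targetLanguage.lowercased() == requestedLanguage.lowercased() else {
        return nil
    }
    return payload.translation
}
```

A rejected payload can then be retried or left in the original language, the same fallback the prompt-based version attempted with its 原文ママ annotation.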
Speed and contention between multiple sessions
Pre-Babel Lens translates a news article of about 200 words in a second or less. Even opinion pieces or excerpts from novels with paragraphs of nearly 500 words take under two seconds. It's really impressive. However, if you start two Apple Intelligence sessions at the same time, things slow down severely.
In v0.3.0, I tried to have the document type judged just before translation. But because the type determination, a choice among seven enum cases over tens of words, ran at the same time as the translation, both stalled: processing that should have finished in 2 seconds took 10 or 20, and there were also crashes because sessions could not be released. When using Apple Intelligence, the design must never let sessions run in parallel.
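One way to enforce that is to funnel every model call through a single gate. The sketch below is my own illustration, not the app's code: `ModelGate` and its names are hypothetical, and the closure passed to `run` stands in for the real Foundation Models call. Note that plain actor methods are reentrant across `await`, so the actor chains Tasks explicitly rather than relying on actor isolation alone.

```swift
import Foundation

// Hypothetical sketch: serialize all model requests so two sessions
// never run in parallel. Each new request first awaits the Task of
// the previous request before starting its own work.
actor ModelGate {
    private var previous: Task<Void, Never>?

    func run<T: Sendable>(_ work: @escaping @Sendable () async -> T) async -> T {
        let prior = previous
        let task = Task { () -> T in
            await prior?.value   // wait for the request ahead of us
            return await work()  // placeholder for the real model call
        }
        previous = Task { _ = await task.value }
        return await task.value
    }
}
```

With this shape, a second translation request simply queues behind the first instead of opening a competing session.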
Warm-up is the usual remedy when startup is slow, but it makes little sense for Apple Intelligence: the Pre-Babel Lens translation session starts in 0.02 seconds. To get a better perceived response, it is best to keep the first segment small. If you are streaming characters to the display, you may also want to show the raw output without waiting for the structured output to complete.
Conclusion
Although Apple Intelligence does not get much attention, it is not bad at translation. I was impressed by the reliability of its structured output; with that, real tools can be built. Swift was new to me, but Codex can write working code and set up the development environment, so it was quite easy.
I had been thinking about an iPhone version once the Mac version was working, but when releasing v0.3.0 I realized that my iPhone is an iPhone SE, which cannot run Apple Intelligence. I will have to check its behavior in the simulator for a while.