We have developed a node-based, no-code AI test environment that may be useful for anyone thinking about developing Apple AI, local AI, or physical AI.

It is a playground where you can try Apple Intelligence's Foundation Models (the on-device LLM) simply by connecting nodes and entering prompts. I set up chat, grammar judgment, and proper noun extraction, and each was running within a few minutes; A/B testing of prompts and input data is just as easy.
There is no user guide yet, but I think you will figure it out once you play with it, so please give it a try.
The other day, we released zen Babel, a local translation app for macOS and iOS.
The translation engines are the Apple Translation Framework, a machine learning model, and Apple Intelligence's Foundation Models, a local LLM. Both are system-standard machine learning and AI, and they are what I worked with for this AI development.
Apple Intelligence's local LLM, Foundation Models, has about 3B parameters. With that, everything from translation, composition, and proofreading to image generation can be done with a local model; optimization, including distillation, is probably at work. You'll want to use it, right? However, there is no Web API, and there is no playground like OpenAI's. The only way to try it is to build an app, so many people don't know what it can actually do. In fact, I didn't know whether it could translate properly until I built zen Babel (and its predecessor, Babel Lens).
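For reference, this is roughly the minimum Swift you have to write just to ask the model anything. A minimal sketch using the FoundationModels framework; it assumes an Apple Intelligence-capable device, and availability checks and error handling are omitted:

```swift
import FoundationModels

// The shortest path to a Foundation Models response.
// Must run in an async context (e.g. inside a Task).
let session = LanguageModelSession()
let response = try await session.respond(to: "Translate 'こんにちは' into English.")
print(response.content)
```

There is no endpoint to curl; even this much already requires an Xcode project.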
When I actually incorporated it into the app, it was overwhelmingly slower and less reliable than the machine learning model, but for a lightweight local LLM of about 3B it was usable. Even if the output is occasionally suspect, it can at least translate, and once the warm-up is over, a light task may return a result in about 0.2 seconds. That speed is attractive. Depending on how the session is set up, it may even be possible to build heuristic processing into an app.
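If you want to reproduce that warm-up effect in code, the framework exposes a prewarm call; a sketch (timings will of course vary by device and prompt):

```swift
import FoundationModels

let session = LanguageModelSession()
session.prewarm() // start loading model resources before the first request

let clock = ContinuousClock()
let start = clock.now
let response = try await session.respond(to: "Is this sentence grammatical: 'He go home.'?")
print("took \(clock.now - start)") // light tasks can come back in well under a second once warm
```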
However, the hurdle to trying it is high: after all, you can't experiment unless you write an app. For example, I'm thinking of adding an Apple AI session that extracts proper nouns to zen Babel, the local translation app I'm building now, to improve translation quality. But that means embedding a new AI session into the processing flow of an already running app and coordinating it with the other AI sessions (serializing processing is extremely important in local AI; more on this later), and discovering only then that "it's useless after all" is too damaging. That's why I wanted a playground where I could test how much proper noun extraction costs, whether reliable output can be obtained in the first place, and whether it becomes usable with preprocessing.
So we decided to build a playground for Apple AI. I'm not a programmer myself, so I wanted a no-code node modeler. A session is driven by a schema that maps to Apple Intelligence's @Guide/@Generable, and node edges pass a Schema or an Object.
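In Swift terms, a Schema card corresponds to a @Generable type whose fields carry @Guide annotations, and a Session card corresponds to a respond call that targets that type. A minimal sketch; the NamedEntities name and its field are my illustrative choices, not something FM-Deck prescribes:

```swift
import FoundationModels

// What a Schema card models: a Generable type with guided fields.
@Generable
struct NamedEntities {
    @Guide(description: "Proper nouns found in the input text")
    var names: [String]
}

// What a Session card does: message + schema + prompt in,
// a schema-conforming object out.
let session = LanguageModelSession()
let result = try await session.respond(
    to: "Extract proper nouns from: Apple opened a new store in Osaka.",
    generating: NamedEntities.self
)
print(result.content.names)
```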
FM-Deck
I built it with Codex in about four days; it came together faster than I expected.
So far, there are the following seven types of cards:
- Input: A card that receives input to the session (it also serves as the argument when the flow is called from the parent Swift project)
- Schema: A modeler that defines an LLM output schema
- Session: An LLM session. A card that receives a message, a Schema, and a prompt, and emits JSON as defined by the schema
- Constant: A card that defines a constant Object
- Output: The card that receives the final output (the part that returns to the parent Swift project)
- Split-router: A card that extracts individual parameters from an Object or Schema
- Merge-router: A card that bundles multiple Objects into one
If you connect the nodes and click the run button, the sessions execute and the result appears at whatever is connected to the Output.
The Schema modeler has an interface like this, and I'm also thinking of adding JSON import. The Input and Constant definitions reuse the same interface.

When you run a session, the Session card measures its duration (for benchmarking). You can also specify a timeout on the Session card. LLM processing is different from deterministic algorithms: processing time can suddenly balloon with a slight difference in prompt or input data. In Apple AI's case, you often have to wait 10 or 20 seconds when a guardrail is tripped, so you can also use FM-Deck to design how long a wait your application's user experience can tolerate.
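As far as I know the framework itself has no timeout parameter, so in an app you end up racing the request against a deadline yourself. A sketch of that pattern (TimeoutError is my own illustrative type, and strict-concurrency annotations are omitted for brevity):

```swift
import FoundationModels

struct TimeoutError: Error {}

// Race a session call against a deadline; whichever finishes first wins.
func respond(_ session: LanguageModelSession,
             to prompt: String,
             within limit: Duration) async throws -> String {
    try await withThrowingTaskGroup(of: String.self) { group in
        group.addTask { try await session.respond(to: prompt).content }
        group.addTask {
            try await Task.sleep(for: limit)
            throw TimeoutError()
        }
        let first = try await group.next()! // first child decides the outcome
        group.cancelAll()                   // cancel the loser
        return first
    }
}
```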

"If a process is heavy, split it up; there are plenty of processor cores, so run the pieces in parallel with a thread-safe design and wait asynchronously for the results to come back." Another scary thing about LLMs is that this common sense of traditional programming does not work.
Apple Intelligence's Foundation Models belongs to the lightest category of practically usable local LLMs, but if I run two sessions in parallel on my MacBook M2, no matter how light they are, one of them is forced into a timeout. If you approach it in the same mood as a machine learning model, even 3B is not light at all. Looking at it in Instruments, you can see that while a session runs, ANE and GPU resources are held exclusively in comb-like bursts. There are gaps, but if two sessions overlap by mistake, one falls over.
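One way to keep sessions from colliding is to funnel every model call through a single serializer. A sketch of that pattern using an actor that chains requests (this is my illustration, not FM-Deck's actual implementation; note that plain actor isolation alone is not enough, because actors are reentrant at await points):

```swift
import FoundationModels

// Serialize all model calls so two sessions never contend for ANE/GPU.
actor ModelGate {
    private let session = LanguageModelSession()
    private var previous: Task<Void, Never>?

    func ask(_ prompt: String) async throws -> String {
        let prior = previous
        let task = Task { [session] () async throws -> String in
            _ = await prior?.value // wait for the previous call to finish
            return try await session.respond(to: prompt).content
        }
        previous = Task { _ = try? await task.value }
        return try await task.value
    }
}
```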
For example, suppose you extract proper nouns from a news article and sort them by their importance in the original context (this is also useful for displaying news information). I think many people would first extract the names into an array and then rearrange it: if the second step runs right after the first, the original text can stay in the cache, and keeping track of the order while extracting seems cumbersome. Yes, that is a trap. An LLM is not a cute thing whose load grows linearly with the length of the input and output you give it.
We have prepared two proper noun extraction sessions here.

Green is the news, blue-green (the color Apple's development tools call teal) is the schema, and light blue is the AI processing session. As you can see, both sessions take the same news as input and output with the same schema. The only difference is the prompt. Once the warm-up is over, processing completes in about 1.5 to 2.2 seconds, and there is almost no session loading time, so you could expect to use this for small heuristic processing. However, that expectation is betrayed.
The prompt of the upper session:

> You are a proper noun extraction device. Extract proper nouns from the userMessage statement and output them to names:["name","name2"].

The prompt of the lower session:

> You are a proper noun extraction device. Extract proper nouns from the userMessage statement and output them to names:["name","name2"]. Name is arranged in order of importance mentioned in the userMessage.
As you can see, the lower session additionally sorts by importance. Thinking about it normally, the lower one should be heavier. Yet with Apple AI, the lower one finishes faster, and its result is also generally more correct than the upper one's. I still don't know why; no matter how many times I try, the sorting variant is faster. Maybe it's because the context needed to reach a decision is more complete, just as with a human being.
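If you want to reproduce this A/B comparison in plain Swift rather than in FM-Deck, each variant becomes a session created with its own instructions; a sketch (the Extraction type name is mine, and the news text is a placeholder):

```swift
import FoundationModels

@Generable
struct Extraction {
    @Guide(description: "Proper nouns, ordered as the prompt requests")
    var names: [String]
}

let plain = LanguageModelSession(instructions:
    "You are a proper noun extraction device. Extract proper nouns from the userMessage statement and output them to names.")
let sorted = LanguageModelSession(instructions:
    "You are a proper noun extraction device. Extract proper nouns from the userMessage statement and output them to names. Name is arranged in order of importance mentioned in the userMessage.")

let news = "…" // the same news text goes to both variants

// Run the variants one after another (not in parallel; see above) and compare.
let a = try await plain.respond(to: news, generating: Extraction.self).content.names
let b = try await sorted.respond(to: news, generating: Extraction.self).content.names
print(a, b)
```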
Release 0.1.1 also includes a sample of this session model, so please play with it.
Going forward, FM-Deck will add sessions based on Apple Intelligence's distinctive @Guide/@Generable format as well as machine learning models such as the Apple Vision Framework, so that the models and Schemas of sessions can be output as Swift. Look forward to it!

