Shikisha: Building LLM Workflows in Swift

I have been building more AI-assisted tools in Swift lately, and I kept running into the same awkward edge.

Calling an LLM from Swift is easy.

Keeping the surrounding workflow understandable and maintainable is the hard part.

That difficulty begins once the feature becomes an actual system: prompts, structured output, retries, streaming, tools, retrieval, memory, tests, and a UI that should still feel like a normal Apple-platform app.

That is why I started building Shikisha, a Swift-native take on LangChain-style workflows for macOS and iOS. It is still early, currently 0.1.0, but it has reached the point where I am using it in real projects instead of just sketching examples around it.

The Problem Was the Glue

Most LLM demos begin in a reasonable place: send a few messages, get text back, print the answer.

That is enough for a toy chatbot. It is not enough for the kind of tools I have been building.

JPResume, for example, turns a Western resume into Japanese 履歴書 and 職務経歴書 formats. The model is useful there, but only in certain places. I still want deterministic parsing where possible. I want validation between stages. I want structured JSON back from the model, not a nice paragraph that looks plausible. I want to inspect intermediate artifacts before they become a PDF I might send to a company.

Once a tool has those constraints, the model call becomes one step in a larger system.
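
As a rough illustration of treating a model reply as data rather than prose, here is a minimal sketch built on Foundation's JSONDecoder. The WorkHistoryEntry type, its fields, and the date-range rule are hypothetical; they are not JPResume's real schema or Shikisha's structured output API.

```swift
import Foundation

// Hypothetical shape for one stage's output (an assumption for illustration).
struct WorkHistoryEntry: Codable, Equatable {
    let company: String
    let startYear: Int
    let endYear: Int?   // nil means the role is ongoing
}

enum StageError: Error, Equatable {
    case malformedJSON
    case invalidDateRange(company: String)
}

// Decode the model's reply, then validate it before the next stage sees it.
func parseWorkHistory(from reply: String) throws -> [WorkHistoryEntry] {
    let data = Data(reply.utf8)
    guard let entries = try? JSONDecoder().decode([WorkHistoryEntry].self, from: data) else {
        // A plausible-looking paragraph ends up here instead of flowing downstream.
        throw StageError.malformedJSON
    }
    for entry in entries {
        if let end = entry.endYear, end < entry.startYear {
            throw StageError.invalidDateRange(company: entry.company)
        }
    }
    return entries
}
```

The point of the split is that decoding failures and validation failures surface as distinct errors, so a later stage never has to guess whether its input is trustworthy.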

Before Shikisha, I could still build that system in Swift, but each project carried its own small pile of glue code:

one-off provider wrappers
handwritten JSON parsing
ad hoc retry logic
duplicated streaming adapters
separate mock implementations for tests

None of it was individually hard, but it made each new project feel like starting over.
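
To make one item on that list concrete, ad hoc retry logic tends to end up as some variation of the helper below. This is a generic sketch, not Shikisha's implementation; the name withRetry and its parameters are made up for illustration.

```swift
// Hypothetical helper: run an async operation up to maxAttempts times,
// doubling the delay between attempts.
func withRetry<T>(
    maxAttempts: Int,
    initialDelayNanoseconds: UInt64 = 0,
    operation: () async throws -> T
) async throws -> T {
    precondition(maxAttempts >= 1, "need at least one attempt")
    var delay = initialDelayNanoseconds
    var attempt = 1
    while true {
        do {
            return try await operation()
        } catch {
            // Out of attempts: surface the last error to the caller.
            if attempt == maxAttempts { throw error }
            attempt += 1
            try await Task.sleep(nanoseconds: delay)
            delay *= 2
        }
    }
}
```

A dozen lines like these are easy to write once. The cost comes from rewriting them, slightly differently, in every project.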

With a shared workflow shape, those pieces become composable parts instead of per-project scaffolding.

I wanted the same basic vocabulary to work across small chains, RAG pipelines, tool-using agents, and app UI.

A Swift Shape for LangChain Ideas

Shikisha borrows ideas from LangChain, but the implementation is written for Swift rather than translated line by line.

Most AI tooling ecosystems assume Python. That works well for experiments and backend services, but Apple-platform apps eventually need the workflow to coexist with SwiftUI, structured concurrency, app state, testing, sandboxing, and platform-native APIs.

I also wanted workflows that could live directly inside Apple-platform apps instead of assuming a separate Python orchestration backend.

That is the part I wanted in Swift: a way to make LLM workflows feel like normal typed application code instead of a separate scripting layer bolted onto the side.

The core idea is one protocol:

public protocol Runnable<Input, Output>: Sendable {
    associatedtype Input: Sendable
    associatedtype Output: Sendable
    func invoke(_ input: Input) async throws -> Output
}

Prompt templates, chat models, output parsers, retrievers, and whole chains can all be Runnable. That means a simple prompt-model-parser pipeline can read top to bottom:

let chain = ChatPromptTemplate.fromTuples([
    .system("Answer concisely."),
    .human("{question}")
])
.pipe(model)
.pipe(StringOutputParser())

let answer = try await chain.invoke(["question": "What is Swift Concurrency?"])

The important part is not the pipe syntax. It is the contract.

If a prompt produces messages, and a model accepts messages, the compiler can check that connection. If a parser expects an AIMessage, the model has to provide one. When a chain works, it can be invoked, batched, retried, wrapped with callbacks, or tested with a fake model in the same style as any smaller piece.
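
To show how the compiler can enforce that connection, here is a self-contained sketch. It repeats the protocol so it compiles on its own; the Piped type, this pipe extension, and the two toy steps are assumptions for illustration and may not match how Shikisha implements composition.

```swift
// Stand-in copy of the protocol so this sketch compiles on its own.
protocol Runnable<Input, Output>: Sendable {
    associatedtype Input: Sendable
    associatedtype Output: Sendable
    func invoke(_ input: Input) async throws -> Output
}

// Hypothetical composition type: only exists when First's output
// is exactly Second's input.
struct Piped<First: Runnable, Second: Runnable>: Runnable
where First.Output == Second.Input {
    let first: First
    let second: Second

    func invoke(_ input: First.Input) async throws -> Second.Output {
        let intermediate = try await first.invoke(input)
        return try await second.invoke(intermediate)
    }
}

extension Runnable {
    // The where clause is the contract: mismatched steps do not compile.
    func pipe<Next: Runnable>(_ next: Next) -> Piped<Self, Next>
    where Next.Input == Output {
        Piped(first: self, second: next)
    }
}

// Two toy steps standing in for a prompt, model, or parser.
struct Uppercase: Runnable {
    func invoke(_ input: String) async throws -> String { input.uppercased() }
}

struct CharacterCount: Runnable {
    func invoke(_ input: String) async throws -> Int { input.count }
}

func shoutedLength(of text: String) async throws -> Int {
    let chain = Uppercase().pipe(CharacterCount())
    // CharacterCount().pipe(Uppercase()) would be rejected at compile time,
    // because Int is not String.
    return try await chain.invoke(text)
}
```

Because Piped is itself a Runnable, a composed chain can be piped again, which is what lets small and large pieces share one shape.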

flowchart LR
    prompt[Prompt template]
    model[Chat model]
    parser[Output parser]
    tools[Typed tools]
    memory[Memory]
    stateGraph[StateGraph]
    callbacks[Callbacks]

    prompt --> model --> parser
    model --> tools
    tools --> model
    model --> memory
    memory --> model
    parser --> stateGraph
    callbacks -.-> model
    callbacks -.-> tools
    callbacks -.-> stateGraph

Shikisha also uses the Swift tools I would expect to reach for in 2026: structured concurrency, Sendable types, AsyncSequence streaming, Codable shapes for provider payloads, and Swift Package Manager as the normal installation path.
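
For readers who have not used AsyncSequence for streaming, the pattern looks roughly like this. The sketch uses only the standard library's AsyncThrowingStream; fakeTokenStream and collect are hypothetical names, and a real provider adapter would yield tokens as they arrive over the network.

```swift
// Hypothetical token source standing in for a provider's streaming response.
func fakeTokenStream(_ tokens: [String]) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream<String, Error> { continuation in
        for token in tokens {
            continuation.yield(token)
        }
        continuation.finish()
    }
}

// Consume any AsyncSequence of strings. A UI would append each token to
// visible text instead of accumulating it in a local variable.
func collect<S: AsyncSequence>(_ stream: S) async throws -> String
where S.Element == String {
    var text = ""
    for try await token in stream {
        text += token
    }
    return text
}
```

Since the consumer is written against AsyncSequence rather than a concrete stream type, the same loop works for a fake source in tests and a live one in the app.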

What Is Included So Far

The current release covers the pieces I keep needing.

Models and Prompts

chat model adapters for OpenAI, Anthropic, Google Gemini, and Ollama
prompt templates
output parsers, including structured output helpers

Retrieval and Indexing

document loaders, text splitters, embeddings, and vector stores
retrievers for RAG, including vector, BM25, hybrid, MMR, multi-query, parent-document, time-weighted, self-querying, and contextual compression
incremental indexing so changed documents do not always require re-embedding everything
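
The simplest of those retrievers, plain vector search, reduces to ranking documents by cosine similarity against a query embedding. The sketch below assumes embeddings are already computed; EmbeddedDocument and topMatches are hypothetical names, not Shikisha types.

```swift
import Foundation

// Hypothetical pairing of a text chunk with its precomputed embedding.
struct EmbeddedDocument {
    let text: String
    let vector: [Double]
}

func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have equal length")
    let dot = zip(a, b).reduce(0.0) { $0 + $1.0 * $1.1 }
    let normA = sqrt(a.reduce(0.0) { $0 + $1 * $1 })
    let normB = sqrt(b.reduce(0.0) { $0 + $1 * $1 })
    // Treat a zero vector as matching nothing rather than dividing by zero.
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA * normB)
}

// Return the `limit` documents most similar to the query, best first.
func topMatches(
    for query: [Double],
    in documents: [EmbeddedDocument],
    limit: Int
) -> [EmbeddedDocument] {
    documents
        .map { (document: $0, score: cosineSimilarity(query, $0.vector)) }
        .sorted { $0.score > $1.score }
        .prefix(limit)
        .map { $0.document }
}
```

This is a linear scan, which is fine for small corpora. The other retriever styles in the list change how candidates are scored or combined, not the basic query-in, documents-out shape.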

Agents and Workflows

memory for conversations
typed tools and a tool-calling agent loop
StateGraph for cyclic or stateful workflows
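
"Typed tools" is easiest to see in code. In the sketch below, the model's JSON arguments are decoded into a Swift type before the tool body runs. The TypedTool protocol and AddTool are hypothetical and exist only to show the idea; Shikisha's tool API may be shaped differently.

```swift
import Foundation

// Hypothetical protocol: each tool declares the argument type it accepts.
protocol TypedTool {
    associatedtype Arguments: Decodable
    var name: String { get }
    func run(_ arguments: Arguments) async throws -> String
}

extension TypedTool {
    // The model supplies arguments as JSON text. Decoding fails fast if the
    // shape is wrong, so run(_:) only ever sees well-formed input.
    func call(jsonArguments: String) async throws -> String {
        let data = Data(jsonArguments.utf8)
        let arguments = try JSONDecoder().decode(Arguments.self, from: data)
        return try await run(arguments)
    }
}

// A toy tool with two integer arguments.
struct AddTool: TypedTool {
    struct Arguments: Decodable {
        let a: Int
        let b: Int
    }

    let name = "add"

    func run(_ arguments: Arguments) async throws -> String {
        String(arguments.a + arguments.b)
    }
}
```

An agent loop built on something like this can hand a decoding error back to the model as a tool result, which gives the model a chance to correct its own arguments.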

Observability and Testing

callbacks for tracing, usage, cost tracking, and other side effects
FakeChatModel and local embeddings for offline examples and tests

That list sounds large, but the useful thing is that the pieces share a shape. A RAG chain and a basic one-question prompt are not separate worlds. They are different arrangements of the same few ideas.

You can run the example suite without an API key:

swift run ShikishaExamples
swift run ShikishaExamples basicChain
swift run ShikishaExamples ragPipeline

Tests and demos should not depend on a live provider whenever the behavior under test is the workflow itself.
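
The idea behind a fake model is small enough to sketch. The TextModel protocol, ScriptedModel, and summarize below are hypothetical stand-ins, not Shikisha's ChatModel or FakeChatModel; they only show how a workflow can be exercised with canned replies.

```swift
import Foundation

// Hypothetical minimal model interface (an assumption for illustration).
protocol TextModel: Sendable {
    func complete(_ prompt: String) async throws -> String
}

// Test double that replays canned replies in order and records each prompt.
actor ScriptedModel: TextModel {
    private var replies: [String]
    private(set) var prompts: [String] = []

    init(replies: [String]) {
        self.replies = replies
    }

    func complete(_ prompt: String) async throws -> String {
        prompts.append(prompt)
        return replies.isEmpty ? "" : replies.removeFirst()
    }
}

// The workflow under test: build a prompt, call the model, tidy the reply.
func summarize(_ text: String, using model: some TextModel) async throws -> String {
    let reply = try await model.complete("Summarize in one line:\n\(text)")
    return reply.trimmingCharacters(in: .whitespacesAndNewlines)
}
```

A test can then assert on both the cleaned-up result and the prompt the workflow actually sent, with no network and no API key involved.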

The Documentation Is Part of the Project

I recently added full DocC documentation and tutorials, which changed how the project feels.

The docs now include conceptual guides for Swift developers who are new to LLM app patterns, plus feature articles for chat models, prompts, output parsers, structured output, documents, embeddings, retrievers, memory, agents, graphs, indexing, observability, and resilience.

The tutorials are more practical. They walk through building:

a streaming chatbot with memory
a RAG app that answers from your own documents
a tool-using agent
a small coding agent with file tools
a universal SwiftUI app around that agent

The coding-agent tutorial is a good example of why I wanted this framework in Swift. A surprising amount of an AI coding assistant is just a model, a few file tools, and a loop. The details still matter: sandbox paths, typed tool arguments, tool specs, memory, callbacks, and deciding when the model can act. But once those pieces have names, the workflow is easier to reason about.
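
Sandbox paths are a good example of a detail that matters. One minimal way to keep a model-supplied path inside a project root is to normalize it component by component and reject anything that climbs out. The function below is a hypothetical sketch, not the tutorial's code, and it does not account for symlinks, which a real file tool would also need to resolve.

```swift
// Returns the path components relative to the sandbox root, or nil if the
// path would climb above the root. Leading slashes are ignored, so an
// absolute-looking path is treated as relative to the root.
func sandboxedComponents(of relativePath: String) -> [String]? {
    var stack: [String] = []
    for part in relativePath.split(separator: "/").map(String.init) {
        switch part {
        case ".":
            // Current directory: nothing to do.
            continue
        case "..":
            // Going up from the root itself means escaping the sandbox.
            guard !stack.isEmpty else { return nil }
            stack.removeLast()
        default:
            stack.append(part)
        }
    }
    return stack
}
```

A file tool would join the returned components onto its root directory and refuse the call when the result is nil.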

Even the agent loop stays small enough to read:

let agent = ToolCallingAgent(
    model: model,
    tools: [readFileTool, editFileTool],
    maxIterations: 8
)

let result = try await agent.run([
    SystemMessage(content: "You are a coding assistant working in this project."),
    HumanMessage(content: "Summarize the TODOs in this project.")
])

It also becomes easier to move from a command-line experiment into a SwiftUI app without rewriting the whole mental model in another language.

Where I Am Already Using It

JPResume is the first public project where Shikisha fits naturally.

The resume pipeline needs model calls, structured output, validation, and reviewable intermediate files. Shikisha gives me a cleaner way to express those stages without every stage owning its own provider wrapper or parser convention. It does not remove the hard product decisions, like what the model should be allowed to infer, or how strict validation should be. It just gives the workflow a better foundation.

I am also using Shikisha in another project that I will write about later. That project pushed on a different side of the library: agent behavior, tools, and app integration. I am not ready to describe it yet, but it has been useful as a second check that Shikisha is not only shaped around one resume tool.

That is usually the moment when abstractions start proving themselves. Not when the first demo works, but when the same abstractions survive a second project with different pressure.

What Still Feels Early

Shikisha is not a mature ecosystem. It is a young Swift package, and the API may still move as the real use cases sharpen.

There are also places where skipping the framework is still the right call. If all you need is one provider call behind one button, adding a framework might be more structure than the feature deserves. A small direct URLSession call can be the right answer.

Shikisha starts to matter when the workflow becomes more than the call: when you need a repeatable chain, retrieval, structured output, tool execution, memory, tracing, retries, streaming, or tests that should run without hitting a provider.

I am not trying to make every AI feature look elaborate. I am trying to make the elaborate ones less fragile.

What Changed

The interesting part for me is not that Shikisha can call models.

It is that the same abstractions now survive across:

command-line experiments
SwiftUI apps
retrieval pipelines
typed tool execution
local testing

That is the point where a library starts becoming infrastructure instead of just a demo.

The documentation is here:

krisbaker.com/Shikisha/documentation/shikisha

The repo is here:

github.com/KristopherGBaker/Shikisha

If you build Swift apps and have been curious about adding LLM workflows without leaving the Apple-platform toolchain, Shikisha is my current answer. It is still early, but it is already past the point of being just an experiment.