Tarot Core: Caching Strategy for AI Tarot Service

The Problem

AI-generated content can be fresh on every call, but every call to the API costs money, even when the input is the same. Tarot reading makes this tension especially sharp: even when the same card comes up, users should see a different interpretation each time to stay entertained, yet calling the OpenAI API on every request quickly becomes expensive.

Tarot Core resolves this dilemma with a bucket system. Cache keys span 78 cards x 2 directions x 10 buckets = 1,560 unique combinations, and a request whose combination is already cached is served immediately from Valkey, which addresses both cost and latency. Separately, Structured Outputs enforce a JSON response format.

Buckets: Balancing Diversity and Efficiency

The key space is 78 cards x 2 directions x 10 buckets = 1,560 unique combinations, and each request randomly selects one of them. The cache key has the form tarot:read:{card}:{direction}:{bucket}; when that key is already cached, the stored reading is returned from Valkey immediately. The OpenAI API is called only on a cache miss.
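
The random draw and key construction described above can be sketched as follows. Only the key format comes from the text; the type names, the numeric card index, and the "upright"/"reversed" strings are assumptions made for illustration, and the actual service may encode cards and directions differently.

```typescript
// Hypothetical names throughout; only the key format is taken from the article.
type Direction = "upright" | "reversed";

const CARD_COUNT = 78;
const BUCKET_COUNT = 10;

interface Draw {
  card: number; // assumed numeric card index, 0..77
  direction: Direction;
  bucket: number; // 1..10
}

// Returns an integer in [0, n). The random source is injectable so the
// draw can be made deterministic in tests.
function randomInt(n: number, rng: () => number = Math.random): number {
  return Math.floor(rng() * n);
}

// Picks one of the 78 x 2 x 10 = 1,560 combinations uniformly at random.
function drawRandom(rng: () => number = Math.random): Draw {
  return {
    card: randomInt(CARD_COUNT, rng),
    direction: randomInt(2, rng) === 0 ? "upright" : "reversed",
    bucket: randomInt(BUCKET_COUNT, rng) + 1,
  };
}

// Builds the cache key in the form tarot:read:{card}:{direction}:{bucket}.
function cacheKey(draw: Draw): string {
  return `tarot:read:${draw.card}:${draw.direction}:${draw.bucket}`;
}
```

Injecting the random source is a design choice for this sketch only; it keeps the selection logic testable without changing how the key is formed.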

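Keywords are not included in the cache key. They shape the reading only when it is generated on a cache miss; once a reading is stored under a key, later requests for that key receive the stored reading regardless of which keywords were drawn. Variety for the same card and direction therefore comes from the 10 buckets, and leaving keywords out of the key keeps the cache small while still preventing monotony.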

flowchart LR
    subgraph "Random Selection"
        Card[78 Cards]
        Dir[Upright/Reversed]
        Bucket[Bucket 1~10]
        Keywords[4 Keywords]
    end

    subgraph "Cache Key"
        Key["tarot:read:{card}:{dir}:{bucket}"]
    end

    Card --> Key
    Dir --> Key
    Bucket --> Key
    Key --> Valkey[(Valkey)]
    Keywords -.->|Reading Direction Setup| OpenAI[OpenAI API]

Structured Outputs: Consistency in Response Format

One of the most troublesome aspects of using the OpenAI API is keeping the response format consistent. Tarot Core avoids this problem with the Structured Outputs API: given a Zod schema, the OpenAI API constrains its output to the matching JSON format, eliminating format-related parsing errors on the client.

import { z } from "zod";

// Shape of a single reading returned by the model.
export const ReadResponseSchema = z.object({
  title: z.string().min(1), // English card name
  titleKR: z.string().min(1), // Korean card name
  keywords: z.array(z.string()).min(1), // At least one keyword
  advice: z.string().min(1), // Advice message
});

The system prompt includes constraints such as “cold and natural like a fortune teller” and “no special characters allowed.” The card information and 4 keywords are passed as JSON in the user message, giving the model the context it needs.
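
As a rough illustration of that message layout, the sketch below assembles a system message and a JSON user message. The field names, the `buildMessages` helper, and the exact prompt wording are hypothetical; the article only states that card information and 4 keywords are sent as JSON.

```typescript
// Hypothetical shapes; the real service may carry more card fields.
interface CardInfo {
  name: string;
  nameKR: string;
  direction: "upright" | "reversed";
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Placeholder wording that paraphrases the constraints quoted in the article.
const SYSTEM_PROMPT =
  "You are a tarot reader. Be cold and natural like a fortune teller. " +
  "No special characters allowed.";

// Builds the two-message conversation: constraints in the system message,
// card info and keywords serialized as JSON in the user message.
function buildMessages(
  card: CardInfo,
  keywords: string[],
): [ChatMessage, ChatMessage] {
  if (keywords.length !== 4) {
    throw new Error(`expected 4 keywords, got ${keywords.length}`);
  }
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: JSON.stringify({ card, keywords }) },
  ];
}
```

Keeping the user message as plain JSON means the same payload can be logged or replayed without re-deriving it from the prompt text.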

A cache server failure does not interrupt the service. All cache calls are wrapped in try/catch: a failed lookup is treated as a cache miss, and a failed store is silently ignored. When Valkey is down, the service falls back to calling OpenAI directly.
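
The fallback behavior can be sketched roughly as below. The `CacheClient` interface and the `readWithFallback` and `generate` names are assumptions for illustration, not the service's actual API; a real Valkey client exposes a much larger surface than the two methods used here.

```typescript
interface Reading {
  title: string;
  titleKR: string;
  keywords: string[];
  advice: string;
}

// Minimal, hypothetical cache surface: just what this sketch needs.
interface CacheClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<unknown>;
}

// Returns the cached reading if one can be read, otherwise generates a new
// one. Cache errors never propagate to the caller.
async function readWithFallback(
  cache: CacheClient,
  key: string,
  generate: () => Promise<Reading>,
): Promise<Reading> {
  try {
    const hit = await cache.get(key);
    if (hit !== null) return JSON.parse(hit) as Reading;
  } catch {
    // A failed lookup (or an unparsable entry) is treated as a cache miss.
  }

  const reading = await generate();

  try {
    await cache.set(key, JSON.stringify(reading));
  } catch {
    // A failed store is ignored; the reading is still returned.
  }
  return reading;
}
```

Note that errors thrown by `generate` are deliberately not caught here: only cache failures are swallowed, so a failing OpenAI call still surfaces to the caller.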

sequenceDiagram
    participant Client as Client
    participant Service as TarotService
    participant Cache as Valkey
    participant AI as OpenAI

    Client->>Service: Request for tarot reading
    Service->>Service: Random selection of card/direction/bucket/keywords
    Service->>Cache: Cache lookup

    alt Cache Hit
        Cache-->>Service: Return stored result
    else Cache Miss
        Service->>AI: Request Structured Output
        AI-->>Service: {title, titleKR, keywords, advice}
        Service->>Cache: Store result (ignore on failure)
    end

    Service-->>Client: Reading result

Module Design and Deployment

The design is intentionally simple. There is no database, and the card deck and keyword pool are all hardcoded in memory. The NestJS module structure consists of ConfigModule → ValkeyModule (global) → TarotModule, with TarotService handling all business logic. This simplicity reduces code volume and makes testing easier.

Configuration is managed in two layers: YAML files and environment variables. Base settings are loaded with js-yaml and then overridden by 6 environment variables, including OPENAI_API_KEY. Zod schemas handle default values and validation in one step.
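
The layering order can be illustrated with a dependency-free sketch. It assumes the YAML file has already been parsed into a plain object and uses hand-written checks in place of the Zod schema; apart from OPENAI_API_KEY, the field names, environment variable names, and default values are hypothetical.

```typescript
// Hypothetical config shape; the real service defines its own fields.
interface AppConfig {
  openaiApiKey: string;
  valkeyUrl: string;
  port: number;
}

type Env = Record<string, string | undefined>;

// Defaults apply when neither the file nor the environment sets a value.
const DEFAULTS: AppConfig = {
  openaiApiKey: "",
  valkeyUrl: "redis://localhost:6379",
  port: 3000,
};

// Precedence, lowest to highest: defaults, file values, environment variables.
function loadConfig(fileValues: Partial<AppConfig>, env: Env): AppConfig {
  const merged: AppConfig = { ...DEFAULTS, ...fileValues };

  const apiKey = env["OPENAI_API_KEY"];
  if (apiKey !== undefined) merged.openaiApiKey = apiKey;
  const valkeyUrl = env["VALKEY_URL"]; // hypothetical variable name
  if (valkeyUrl !== undefined) merged.valkeyUrl = valkeyUrl;
  const port = env["PORT"]; // hypothetical variable name
  if (port !== undefined) merged.port = Number(port);

  // Validation runs on the merged result, after all overrides.
  if (merged.openaiApiKey === "") {
    throw new Error("OPENAI_API_KEY is required");
  }
  if (!Number.isInteger(merged.port) || merged.port <= 0) {
    throw new Error(`invalid port: ${merged.port}`);
  }
  return merged;
}
```

Validating after the merge means a bad value is rejected regardless of which layer it came from.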

The Dockerfile uses a three-stage build, and the Helm chart runs 2 replicas by default, with an HPA scaling automatically between 2 and 10. The security context runs the container as non-root, applies a seccomp profile, and drops all capabilities.

Room for Improvement

Because keywords are not part of the cache key, a cache hit can return a reading that was generated for different keywords than the ones drawn for the current request. This trade-off is intentional. A separate issue is that there is no cache warming, so OpenAI calls are concentrated during cold starts. We are considering pre-generating popular combinations in the future.
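
One possible shape for such pre-generation is sketched below. It enumerates the key space and fills entries one at a time; the `Combo` type, the numeric card index, and the `fill` callback are hypothetical, and how "popular" combinations would be chosen is left open because the article does not specify it.

```typescript
interface Combo {
  card: number; // assumed numeric card index, 0..77
  direction: "upright" | "reversed";
  bucket: number; // 1..10
}

// Enumerates every card x direction x bucket combination.
function allCombos(cards = 78, buckets = 10): Combo[] {
  const directions: Array<"upright" | "reversed"> = ["upright", "reversed"];
  const combos: Combo[] = [];
  for (let card = 0; card < cards; card++) {
    for (const direction of directions) {
      for (let bucket = 1; bucket <= buckets; bucket++) {
        combos.push({ card, direction, bucket });
      }
    }
  }
  return combos;
}

// Warms the given combinations sequentially so the upstream API is not hit
// with a burst of parallel calls. `fill` is expected to generate and store
// the reading for one key. Returns the number of keys processed.
async function warm(
  combos: Combo[],
  fill: (key: string) => Promise<void>,
): Promise<number> {
  let warmed = 0;
  for (const c of combos) {
    await fill(`tarot:read:${c.card}:${c.direction}:${c.bucket}`);
    warmed++;
  }
  return warmed;
}
```

Warming a filtered subset, for example `allCombos().filter(...)`, would keep the startup cost bounded instead of generating all 1,560 readings up front.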

Conclusion

Tarot Core balances caching against AI generation, uses Structured Outputs for a reliable response format, and deploys with NestJS and Kubernetes. Small as it is, the design accounts for both cost and performance in a real operating environment rather than remaining a toy.