S177 — Architecture Design Document: Lark Base Controlled CRUD Gateway
Status: DRAFT for Huyên review → then Sprint 1 implementation
Date: 2026-05-19
Source of truth: knowledge/dev/lark/s177-controlled-crud-gateway-requirements-v2.md (requirements brief v2.2 FINAL, GPT R3 8.8/10)
Author: Claude Code (Opus 4.7)
Survey basis: KB architecture contract (knowledge/dev/lark/README.md rev3, lark-client-architecture.md S176, lark-base-registry.md, snapshot 88-phai-cu-base-dem.md) + live @larksuiteoapi/lark-mcp tool surface observed in-session. Live source /opt/incomex/lark-client/ was NOT directly read — see §J Open Question OQ-1.
Readback provenance (S177-DESIGN-E1): original written to VPS `/opt/incomex/docs/mcp-writes/s177-architecture-design.md`; `write_file` reported 31407 bytes; byte-identical readback ⇒ SHA-256 `0440ef92ee9f5355c16902aaf417a346b1b2a97adbd7dded360cf320763639e5`, 502 lines. NOT git-committed; final repo path not yet populated.
A. Executive Summary
Scope
Add controlled write capability (records + fields + later tables/views) to the existing read-only lark-client v1.0.0, behind a mandatory 8-layer SafetyLayer, exposed through two tracks that share one Application Service Layer:
- Track B (CLI, production-grade, built first): `lark-tool records ...` / `lark-tool fields ...`
- Track A (MCP, Cowork interactive): adapter over the same service layer; production delete forbidden, delete permitted on Base đệm (the staging/buffer base) only.
Architecture (target)
Cowork / MCP (Track A) Claude Code CLI / Cron (Track B)
│ │
└────────────────┬─────────────────┘
▼
Application Service Layer (lark_client/service.py)
— single write entrypoint, no duplicated write logic —
▼
SafetyLayer (lark_client/safety.py)
dry-run → approval → backup(GPG) → audit-pre → lock →
rate-limit → PII-scan → Lark API call → audit-post
▼
LarkCore (existing — GSM token, whitelist, retry)
▼
Lark Open API https://open.larksuite.com
Sprint plan (from requirements §10, unchanged)
| Sprint | Deliverable | Track |
|---|---|---|
| 1 | writer.py + SafetyLayer core + GPG backup + 2-phase audit + tests | B |
| 2 | MCP adapter over service layer + record.get / record.delete MCP | A |
| 3 | field_manager.py (Text/Number/SingleSelect/Checkbox) + ApprovalProvider interface + Directus prototype | B |
| 4 | table/base schema ops + monitoring + full integration test | A+B |
Top risks identified during survey
| # | Risk | Severity | Mitigation |
|---|---|---|---|
| R-1 | Design not validated against live source code (could not read /opt/incomex/lark-client/) | HIGH | Sprint 1 step 0 = code-reconcile checklist (§J OQ-1) before any new code |
| R-2 | Audit-post failure after a successful destructive API call → write with no trail | HIGH | 2-phase audit + emergency fallback log to a second sink (§C.6) |
| R-3 | PII leaking into audit/backup logs | HIGH | 2 parallel PII layers + metadata-only audit + GPG-encrypted backup |
| R-4 | GPG private key on VPS → backups decryptable by an attacker who roots the box | HIGH | Public-key-only on VPS; private key offline with Huyên (§E) |
| R-5 | Lark batch partial failure leaves data half-written | MED | Stop-and-report, no auto-rollback, auto-generated manual rollback cmd (req §12.10) |
| R-6 | Track A lark-mcp cannot be extended via config → needs custom MCP server | MED | Gap analysis §F decides this in Sprint 2; service layer makes either path cheap |
| R-7 | Cowork/MCP accidentally hitting production Base instead of Base đệm | MED | Hard allowlist: MCP write path rejects any app_token ≠ Base đệm token (§F, §H) |
B. Application Service Layer
File: lark_client/service.py
Principle (req §12.8): CLI and MCP MUST call this layer. No write logic anywhere else. Never import requests; all HTTP via existing LarkCore.
B.1 Interface
```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Literal, TypedDict

# WriteContext, ReadContext, FieldSpec and TableSpec are defined elsewhere in lark_client.


class WriteOutcome(TypedDict):
    status: Literal["dry_run", "success", "partial_failure", "failed", "aborted"]
    operation: str                 # "record.create" | "record.delete" | "field.create" ...
    base_key: str
    table_id: str
    targets: list[str]             # record_ids / field_ids affected (or planned)
    idempotency_key: str           # UUID v4
    rollback_command: str | None   # auto-generated, printed to stdout
    audit_pre_id: str              # id of the pre-execution audit entry
    audit_post_id: str | None
    pii: dict                      # metadata only (see §D.4)
    error: str | None


class LarkWriteServiceABC(ABC):
    @abstractmethod
    def create_record(self, ctx: WriteContext, fields: dict) -> WriteOutcome: ...
    @abstractmethod
    def batch_create_records(self, ctx: WriteContext, records: list[dict]) -> WriteOutcome: ...
    @abstractmethod
    def get_record(self, ctx: ReadContext, record_id: str) -> dict: ...
    @abstractmethod
    def update_record(self, ctx: WriteContext, record_id: str, fields: dict) -> WriteOutcome: ...
    @abstractmethod
    def batch_update_records(self, ctx: WriteContext, records: list[dict]) -> WriteOutcome: ...
    @abstractmethod
    def delete_record(self, ctx: WriteContext, record_id: str) -> WriteOutcome: ...
    @abstractmethod
    def batch_delete_records(self, ctx: WriteContext, record_ids: list[str]) -> WriteOutcome: ...

    # Sprint 3+
    @abstractmethod
    def create_field(self, ctx: WriteContext, spec: FieldSpec) -> WriteOutcome: ...
    @abstractmethod
    def update_field(self, ctx: WriteContext, field_id: str, spec: FieldSpec) -> WriteOutcome: ...
    @abstractmethod
    def delete_field(self, ctx: WriteContext, field_id: str) -> WriteOutcome: ...

    # Sprint 4
    @abstractmethod
    def create_table(self, ctx: WriteContext, spec: TableSpec) -> WriteOutcome: ...
    @abstractmethod
    def delete_table(self, ctx: WriteContext) -> WriteOutcome: ...
    @abstractmethod
    def list_views(self, ctx: ReadContext) -> list[dict]: ...
```
WriteContext carries the intent, not the credential:
```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass(frozen=True)
class WriteContext:
    base_key: str                 # registry key, resolved to app_token via bases.yaml (NEVER hardcoded)
    table_id: str
    operation: str                # canonical op id, drives approval defaults + wildcard policy
    agent: str                    # $LARK_AGENT ∈ {claude-code, cowork-mcp, cron}
    approval_id: str
    dry_run: bool = True          # default ON (req §5.1)
    confirmed: bool = False       # --confirm; required for update/delete on production
    idempotency_key: str = field(default_factory=lambda: str(uuid4()))
    is_buffer_base: bool = False  # True iff base_key resolves to Base đệm token
```
B.2 LarkWriteService (concrete, Sprint 1)
- Constructed with dependency-injected collaborators (no internal `new`): `LarkWriteService(core: LarkCore, safety: SafetyLayer, registry: Registry)`.
- Resolves `base_key → app_token` through the existing `Registry` / `bases.yaml` SSOT. Raises `UnknownBaseError` if not in registry (req §12.1, no hardcoded `app_token`).
- Builds the Lark request, then delegates the entire mutating call to `SafetyLayer.guard(...)`; the service never calls `LarkCore` write methods directly.
- Returns `WriteOutcome`; never raises for an expected guarded rejection (returns `status="aborted"` + `error`); raises only on programming errors.
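The registry-resolution rule can be sketched as follows. This is a minimal illustration that assumes the registry is an in-memory mapping; `resolve_app_token` and the sample key/token are hypothetical, and the real `Registry` loader still has to be reconciled against live code (OQ-1).

```python
class UnknownBaseError(KeyError):
    """Raised when a base_key is not present in the registry."""


def resolve_app_token(registry: dict[str, str], base_key: str) -> str:
    """Map a registry key to its app_token; callers never pass a raw token."""
    try:
        return registry[base_key]
    except KeyError:
        raise UnknownBaseError(base_key) from None


# Placeholder contents standing in for bases.yaml (not real values).
registry = {"demo-base": "appTokenPlaceholder"}
print(resolve_app_token(registry, "demo-base"))
```

Keeping the lookup in one function means a pre-commit grep for token literals only has to allow `bases.yaml` and `tests/`.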
B.3 Error handling strategy
| Class | Example | Behaviour |
|---|---|---|
| ApprovalError | missing/expired/scope-mismatch/used one-time approval | abort before API, status=aborted, exit code 3 |
| SafetyViolation | dry-run not run, lock held, PII block, audit-pre fail | abort before API, exit code 3 |
| LarkApiError | 4xx/5xx from Lark after retries | status=failed, audit-post records failure, exit 4 |
| PartialFailureError | batch: some items ok, some not | status=partial_failure, no auto-rollback, print manual rollback cmd, exit 5 (req §12.10) |
| AuditError (post) | audit-post sink down after API success | status=success + warning, emergency fallback log written (§C.6), exit 0 with warning |
All errors derive from existing lark_client.exceptions base; add the new subclasses there (do not invent a parallel hierarchy).
B.4 Rate limit
Reuse the existing LarkCore global file lock /var/lock/lark-api.lock @ 10 req/s (README §4). The service adds batch sizing: split any batch >500 into ≤500-record chunks (req §5.7), each chunk a separate guarded call with its own idempotency sub-key {idempotency_key}#{chunk_index}. Chunk failure → stop, report which chunks committed (R-5).
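A minimal sketch of the chunking rule, assuming a 0-based `chunk_index` (the requirements do not say which base is used); `chunk_batch` is a hypothetical helper name.

```python
from typing import Iterator

MAX_BATCH = 500  # configured ceiling, see OQ-5


def chunk_batch(
    records: list[dict], idempotency_key: str, size: int = MAX_BATCH
) -> Iterator[tuple[str, list[dict]]]:
    """Yield (sub_key, chunk) pairs; each chunk becomes one guarded call."""
    for index, start in enumerate(range(0, len(records), size)):
        yield f"{idempotency_key}#{index}", records[start:start + size]


chunks = list(chunk_batch([{"n": i} for i in range(600)], "key"))
print([(sub_key, len(chunk)) for sub_key, chunk in chunks])
# -> [('key#0', 500), ('key#1', 100)]
```

Because the caller iterates chunk by chunk, it can stop at the first failed chunk and report the sub-keys that already committed.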
C. SafetyLayer Design
File: lark_client/safety.py
Single public method: guard(ctx: WriteContext, payload, api_call: Callable) -> WriteOutcome
api_call is a zero-arg closure that performs exactly one LarkCore mutating request; SafetyLayer decides if/when to invoke it.
C.1 Execution order (req §4 invariant)
1 dry-run gate 2 approval check 3 backup (GPG) 4 audit-pre
5 lock acquire 6 rate-limit 7 PII scan 8 → api_call()
9 audit-post
10 lock release
approval_exempt_bases bypasses layer 2 only; layers 1,3,4,5,6,7,8,9 always run (req §5 note, §13.3).
C.2 Per-layer behaviour & failure mode
| # | Layer | Pass condition | Failure mode |
|---|---|---|---|
| 1 | dry-run gate | if `ctx.dry_run`: build + validate payload, return `status=dry_run` WITHOUT calling API. Real run requires `dry_run=False`; update/delete on a non-buffer base also requires `confirmed=True` | not confirmed → SafetyViolation, abort |
| 2 | approval | `ApprovalProvider.check(ctx)` → valid, unexpired, scope covers base_key+table_id, op allowed, wildcard policy ok, one-time not yet consumed | invalid → ApprovalError, abort |
| 3 | backup | for update/delete: get_record(s) BEFORE mutation, serialize, GPG-encrypt, write to backups dir, fsync | encryption/write fail → SafetyViolation, abort (never mutate without a backup) |
| 4 | audit-pre | append phase=planned JSONL entry, fsync, capture entry id | write fail → ABORT, do not call API (req §9) |
| 5 | lock | acquire per-record advisory lock `lark-write:{base_key}:{table_id}:{record_id}` (and the global rate lock) | lock held → SafetyViolation (concurrent write), abort |
| 6 | rate-limit | token-bucket 10 req/s via existing global lock; batch ≤500 | exceeded → block/wait, then proceed |
| 7 | PII scan | run FieldPIIRegistry + PatternPIIDetector over payload; compute redaction metadata; policy: detection never blocks the write itself, it only controls what the audit and backup record (audit is metadata-only) | scanner crash → fail-closed: abort with SafetyViolation |
| 8 | api_call | invoke the closure once (with idempotency/client_token) | Lark error after retries → LarkApiError, jump to audit-post(failed) |
| 9 | audit-post | append phase=success or phase=failed JSONL entry | write fail → emergency fallback sink (§C.6) |
| 10 | release | always release locks in finally | n/a |
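The ordering and failure behaviour above can be illustrated with a stripped-down `guard`. This is a sketch under simplifying assumptions: each layer is a zero-argument callable supplied in a dict, the `trace` list exists only to make the order observable, and none of these names are the real `safety.py` API.

```python
from typing import Callable


class SafetyViolation(Exception):
    """Raised by a layer to abort before the API call."""


def guard(steps: dict[str, Callable[[], None]], api_call: Callable[[], object],
          dry_run: bool, trace: list[str]) -> str:
    """Run the layers in fixed order and return a WriteOutcome-like status."""
    if dry_run:
        trace.append("dry-run")
        return "dry_run"              # layer 1: never reaches the API
    for name in ("approval", "backup", "audit-pre"):
        trace.append(name)
        steps[name]()                 # an exception here aborts before the API
    trace.append("lock")
    steps["lock"]()
    try:
        for name in ("rate-limit", "pii-scan"):
            trace.append(name)
            steps[name]()
        trace.append("api_call")
        try:
            api_call()                # invoked exactly once
            status = "success"
        except Exception:
            status = "failed"         # still recorded by audit-post
        trace.append("audit-post")
        steps["audit-post"]()
        return status
    finally:
        trace.append("release")
        steps["release"]()            # locks are always released
```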
C.3 ApprovalProvider — dependency-injected (req §7, §13.11)
```python
from abc import ABC, abstractmethod

# WriteContext and ApprovalDecision are defined elsewhere in lark_client.


class ApprovalProvider(ABC):
    @abstractmethod
    def check(self, ctx: "WriteContext") -> "ApprovalDecision": ...

    @abstractmethod
    def consume(self, ctx: "WriteContext", approval_id: str) -> None: ...  # one-time-use marking


class YamlApprovalProvider(ApprovalProvider):  # Sprint 1–2
    def __init__(self, path="config/write-approvals.yaml"): ...


class DirectusApprovalProvider(ApprovalProvider):  # Sprint 3+ prototype
    ...
```
SafetyLayer.__init__(self, *, approval_provider: ApprovalProvider, ...) — SafetyLayer never imports YamlApprovalProvider. Wiring happens in a composition root (lark_client/factory.py or CLI bootstrap). Swapping YAML→Directus must not touch safety.py.
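A minimal sketch of the composition-root idea, assuming a simplified `check(approval_id)` signature instead of the full `WriteContext`; `InMemoryApprovalProvider` and `build_safety_layer` are hypothetical names used only for illustration.

```python
from abc import ABC, abstractmethod


class ApprovalProvider(ABC):
    @abstractmethod
    def check(self, approval_id: str) -> bool: ...


class InMemoryApprovalProvider(ApprovalProvider):
    """Stand-in for the YAML or Directus provider."""

    def __init__(self, valid_ids):
        self._valid = set(valid_ids)

    def check(self, approval_id: str) -> bool:
        return approval_id in self._valid


class SafetyLayer:
    """Depends only on the abstract interface, never on a concrete provider."""

    def __init__(self, *, approval_provider: ApprovalProvider):
        self._approvals = approval_provider

    def is_approved(self, approval_id: str) -> bool:
        return self._approvals.check(approval_id)


def build_safety_layer(provider: ApprovalProvider) -> SafetyLayer:
    """Composition root: the only place that picks a concrete provider."""
    return SafetyLayer(approval_provider=provider)


layer = build_safety_layer(InMemoryApprovalProvider({"APR-001"}))
print(layer.is_approved("APR-001"), layer.is_approved("APR-999"))
```

Swapping providers then means changing one argument at the composition root, which is what the provider-swap test in Sprint 3 would exercise.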
C.4 Wildcard / first-write policy (req §7, §13.8, §13.10)
Enforced inside layer 2 from a static table:
| Operation | Wildcard table allowed? |
|---|---|
| record.create | ✅ (within scope+expiry) |
| record.update / delete | ❌ |
| field.create/update | ❌ |
| field/table delete | ❌ (break-glass, explicit) |
First write to any specific base/table → approval scope MUST name explicit base_key+table_id; wildcard rejected regardless of operation.
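The static policy can be sketched as a single predicate. This assumes the wildcard table scope is written as `"*"` (the marker is not specified in the requirements), and `wildcard_permitted` is a hypothetical helper name.

```python
WILDCARD_ALLOWED = {"record.create"}  # every other operation needs an explicit table


def wildcard_permitted(operation: str, table_scope: str, first_write: bool) -> bool:
    """Return True if the approval's table scope is acceptable for this operation."""
    if table_scope != "*":
        return True            # explicit table_id: the wildcard policy does not apply
    if first_write:
        return False           # first write must name base_key + table_id explicitly
    return operation in WILDCARD_ALLOWED
```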
C.5 Approval defaults (req §8, baked into provider validation)
record.create reusable-within-expiry (narrow scope only); record.update one-time; record.delete one-time mandatory; field.create/update one-time mandatory; field/table delete break-glass one-time mandatory. Reusable must be explicit and is forbidden for delete/schema ops.
C.6 Audit 2-phase + emergency fallback (req §9)
- Primary sink: `/var/log/lark-ops/YYYYMMDD.jsonl` (existing audit stream, append+fsync).
- Phase 1 (pre): entry `{phase:"planned", op, base_key, table_id, targets, agent, approval_id, idempotency_key, ts}`. Fsync fail → abort, API not called.
- Phase 3 (post): entry `{phase:"success"|"failed", ...same id..., lark_response_meta, pii:{...}}`.
- Emergency fallback: if phase-3 write fails after a successful API call, write to an independent sink `/var/log/lark-ops/EMERGENCY/<ts>-<idempotency_key>.json` (separate file, separate fd) AND emit `WriteOutcome.status="success"` with `error="audit_post_degraded"`. Never silently swallow. If even the emergency sink fails → also stderr-print a structured `LARK-AUDIT-LOST` line so cron/CI capture it.
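The post-phase fallback chain can be sketched as below. It is an illustration only: `append_jsonl` and `write_post_audit` are hypothetical names, the sink paths are passed in rather than hardcoded, and a Unix timestamp is assumed for `<ts>`.

```python
from __future__ import annotations

import json
import os
import sys
import time
from pathlib import Path


def append_jsonl(path: Path, entry: dict) -> None:
    """Append one JSON line and fsync so the entry survives a crash."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def write_post_audit(primary: Path, emergency_dir: Path, entry: dict) -> str | None:
    """Return None if the primary sink worked, else 'audit_post_degraded'."""
    try:
        append_jsonl(primary, entry)
        return None
    except OSError:
        pass  # fall through to the independent sink
    try:
        name = f"{int(time.time())}-{entry['idempotency_key']}.json"
        append_jsonl(emergency_dir / name, entry)
    except OSError:
        # Last resort: a structured line on stderr for cron/CI to capture.
        print("LARK-AUDIT-LOST " + json.dumps(entry), file=sys.stderr)
    return "audit_post_degraded"
```

The returned string maps directly onto `WriteOutcome.error`, so the caller keeps `status="success"` while still surfacing the degraded audit.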
D. PII Protection (req §6)
Two layers run in parallel (both always active — 18 bases built by many people, unknown PII fields).
D.1 FieldPIIRegistry (whitelist)
- Structure: `config/pii-fields.yaml`

```yaml
bases:
  "65-yeu-cau-thanh-toan":
    tblXXXX:
      fldYYYY: { type: national_id, label: "CMND/CCCD" }
      fldZZZZ: { type: bank_account }
```

- Loaded once at service init; keyed by `(base_key, table_id, field_id)`: field_id, never name (req §12.7). Seed from the S176 schema snapshots; growable by PR.
D.2 PatternPIIDetector (regex, for unknown/legacy fields)
VN-specific patterns (ordered, longest/most-specific first to reduce false positives):
| Type | Pattern (anchored on token boundaries) | Note |
|---|---|---|
| national_id_cccd | `\b\d{12}\b` | CCCD 12 digits; match before phone |
| national_id_cmnd | `\b\d{9}\b` | CMND 9 digits |
| passport | `\b[A-Z]{1,2}\d{7}\b` | e.g. B1234567, C12345678 |
| phone_vn | `\b(?:+84\|0)(?:3\|` … | |
| bank_account | `\b\d{8,16}\b` | heuristic, high false-positive rate; only flag, never auto-mutate |
| email | RFC-lite `\b[\w.+-]+@[\w-]+\.[\w.-]+\b` | optional, low risk |
Detector returns types + counts only, never the matched substrings.
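A minimal sketch of an ordered detector that returns counts only. It uses the patterns from the table except `phone_vn`, whose pattern is incomplete in this document; `detect_pii` is a hypothetical name, and blanking out each match so that later patterns cannot recount it is one possible reading of "most-specific first".

```python
import re
from collections import Counter

# Ordered most-specific first; phone_vn omitted (pattern incomplete in this doc).
PATTERNS = [
    ("national_id_cccd", re.compile(r"\b\d{12}\b")),
    ("national_id_cmnd", re.compile(r"\b\d{9}\b")),
    ("passport", re.compile(r"\b[A-Z]{1,2}\d{7}\b")),
    ("bank_account", re.compile(r"\b\d{8,16}\b")),
    ("email", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
]


def detect_pii(text: str) -> dict[str, int]:
    """Return {pii_type: count}; matched substrings are never returned."""
    counts: Counter = Counter()
    remaining = text
    for pii_type, pattern in PATTERNS:
        # Replace each hit with a space so a later, broader pattern skips it.
        remaining, hits = pattern.subn(" ", remaining)
        if hits:
            counts[pii_type] += hits
    return dict(counts)


print(detect_pii("CCCD 012345678901, acct 12345678901234, mail a@b.vn"))
```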
D.3 Pipeline integration
Both layers feed SafetyLayer layer 7. Union of (registry hits ∪ pattern hits) → redaction_types, redacted_fields_count. Policy decision (matches req §6): PII presence does NOT block the write — it governs what gets logged (metadata-only audit) and ensures the GPG backup (which does contain raw old values) is encrypted. A --pii-strict mode (off by default) MAY be added to abort on detection; default = log+proceed. Flag this default in §J OQ-3 for Huyên confirmation.
D.4 Audit redaction format (req §6)
{ "pii_redacted": true,
"redaction_types": ["national_id_cccd","bank_account"],
"redacted_fields_count": 3,
"detector": ["registry","pattern"] }
Raw values appear ONLY inside the GPG-encrypted backup blob — never in JSONL, never in stdout, never in rollback command (rollback cmd references the encrypted backup file path, not inline values).
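One way to derive the metadata block from the two detectors is sketched below. It assumes each detector reports `field_id -> set of PII types` (a shape chosen for illustration, not taken from the requirements), and `build_pii_metadata` is a hypothetical helper.

```python
def build_pii_metadata(registry_hits: dict[str, set[str]],
                       pattern_hits: dict[str, set[str]]) -> dict:
    """Union both detectors' findings into the metadata-only audit block."""
    fields = set(registry_hits) | set(pattern_hits)
    types = set().union(*registry_hits.values(), *pattern_hits.values())
    detectors = [name for name, hits in (("registry", registry_hits),
                                         ("pattern", pattern_hits)) if hits]
    return {
        "pii_redacted": bool(fields),
        "redaction_types": sorted(types),
        "redacted_fields_count": len(fields),
        "detector": detectors,
    }


meta = build_pii_metadata(
    {"fldYYYY": {"national_id_cccd"}},
    {"fldYYYY": {"national_id_cccd"}, "fldAAAA": {"bank_account"}},
)
print(meta)
```

Only type names and counts leave this function, which keeps raw values out of the JSONL stream by construction.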
E. GPG Backup Design (req §6, §13.6 — mandatory from Sprint 1)
E.1 Key source — GSM, public-key-only on VPS
Consistent with the golden rule "1 credential, GSM SSOT, never hardcode":
- New GSM secret in project `github-chatgpt-ggcloud`: `LARK_BACKUP_GPG_PUBKEY` = ASCII-armored public key.
- VPS fetches it via the same `LarkCore`/secret path used for `LARK_APP_*` (do not read GSM directly in business code; extend the existing secret accessor).
- Private key is NEVER on the VPS. Held offline by Huyên (hardware token / offline keyring). VPS can encrypt, cannot decrypt → a rooted VPS still cannot read PII backups (mitigates R-4).
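Encrypt-only operation on the VPS could look like the sketch below. It assumes the public key fetched from GSM has already been imported into the local keyring and that the `gpg` binary is on `PATH`; `gpg_encrypt_argv` and `encrypt_backup` are hypothetical helper names.

```python
import subprocess


def gpg_encrypt_argv(recipient_fpr: str, src: str, dst: str) -> list[str]:
    """Build a non-interactive public-key encryption command line."""
    return [
        "gpg", "--batch", "--yes",
        "--trust-model", "always",      # key comes from GSM, no web of trust on the VPS
        "--recipient", recipient_fpr,
        "--output", dst,
        "--encrypt", src,
    ]


def encrypt_backup(recipient_fpr: str, src: str, dst: str) -> None:
    """Encrypt src to dst; raises CalledProcessError so the caller can abort."""
    subprocess.run(gpg_encrypt_argv(recipient_fpr, src, dst), check=True)
```

A non-zero `gpg` exit surfaces as an exception, which fits the layer-3 rule of never mutating without a backup.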
E.2 Rotation policy
- Rotate annually, or immediately on suspected compromise.
- Procedure: generate new keypair offline → publish new public key as a new GSM version of `LARK_BACKUP_GPG_PUBKEY` → service picks up `latest` on next start → old backups remain decryptable with the retired private key (retain old private keys offline, indexed by fingerprint). Each backup file records the encrypting key fingerprint in its sidecar metadata.
E.3 File naming & storage
/var/log/lark-ops/writes/<YYYYMMDD>/
<base_key>__<table_id>__<record_id>__<idempotency_key>__pre.json.gpg
<base_key>__<table_id>__<record_id>__<idempotency_key>__pre.meta.json # unencrypted: key fp, ts, op, NO pii
Batch: one .json.gpg per chunk, records concatenated as JSON lines before encryption.
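The naming scheme for a single-record backup can be sketched as a pure function; `backup_paths` is a hypothetical helper and the sample identifiers are placeholders.

```python
from datetime import date
from pathlib import PurePosixPath

ROOT = PurePosixPath("/var/log/lark-ops/writes")


def backup_paths(day: date, base_key: str, table_id: str, record_id: str,
                 idempotency_key: str) -> tuple[PurePosixPath, PurePosixPath]:
    """Return (encrypted blob path, unencrypted sidecar path)."""
    stem = "__".join([base_key, table_id, record_id, idempotency_key, "pre"])
    folder = ROOT / day.strftime("%Y%m%d")
    return folder / f"{stem}.json.gpg", folder / f"{stem}.meta.json"


blob, sidecar = backup_paths(date(2026, 5, 19), "demo-base", "tblXXXX", "recAAAA", "key1")
print(blob)
print(sidecar)
```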
E.4 Recovery procedure
- Locate backup file by `idempotency_key` (also recorded in the audit-pre entry).
- Read sidecar `.meta.json` → confirm key fingerprint.
- On Huyên's offline machine: `gpg --decrypt <file>.json.gpg > restored.json`.
- Re-apply via `lark-tool records update ... --data @restored.json --approval <new APR> --no-dry-run --confirm` (recovery is itself a guarded write: fully audited, no special bypass).
F. Track A — MCP Plugin
F.1 Survey result: @larksuiteoapi/lark-mcp
Could not run npm list on the VPS (no exec tool). Authoritative substitute: the live lark-mcp tool surface bound to this very session, which is the same @larksuiteoapi/lark-mcp plugin. Exactly 9 bitable tools exposed:
| Available now (9) | Category |
|---|---|
| bitable_v1_app_create | Base app |
| bitable_v1_appTable_create, bitable_v1_appTable_list | Table |
| bitable_v1_appTableField_list | Field (read) |
| bitable_v1_appTableRecord_search | Record (read) |
| bitable_v1_appTableRecord_create, _batchCreate | Record (create) |
| bitable_v1_appTableRecord_update, _batchUpdate | Record (update) |
This exactly matches requirements §2 ("MCP plugin 9 tools — Create + Read + Update, NO Delete, NO field management").
F.2 Gap analysis
| Needed | In plugin? | Decision |
|---|---|---|
| record.get by id | ❌ (only search) | must add |
| record.delete / batchDelete | ❌ | must add |
| field.create/update/delete | ❌ (only field_list) | must add |
| appTable.update/delete | ❌ (only create/list) | must add (Sprint 4) |
| view.list/create/delete | ❌ | must add (Sprint 4) |
The published @larksuiteoapi/lark-mcp exposes a fixed tool set; missing operations are not togglable via config (the plugin simply does not implement delete/field-mgmt tools). Conclusion: a thin custom MCP server is required for Track A — but it must NOT re-implement Lark calls. It is an adapter that imports lark_client.service.LarkWriteService and exposes new MCP tools.
F.3 Custom MCP adapter design (Sprint 2)
- New small server `lark_client/mcp_adapter/` (Python, MCP SDK) OR extend if an internal MCP host exists; adapter only, zero write logic.
- Tools exposed (Sprint 2 scope): `lark_record_get`, `lark_record_delete`, `lark_record_create`, `lark_record_update`. Sprint 4: field/view tools.
- Every tool builds a `WriteContext` with `agent="cowork-mcp"` and calls `LarkWriteService`.
- Hard guard (R-7): the adapter resolves `base_key`; if the resolved app_token ≠ Base đệm token `Nf2bb1ExXaYnlksgoyQl72GNgAc`, any `delete` / schema op is rejected at the adapter boundary with a clear error (req §11, §13.4: Cowork/MCP delete = Base đệm only). Production writes via MCP are adapter-only and still pass full SafetyLayer.
- Auth: reuses GSM `LARK_APP_*` through `LarkCore`. No new bot, no new credential (req §12.2).
- The existing 9-tool `@larksuiteoapi/lark-mcp` may remain mounted for read/create/update interactive use; the custom adapter only fills the gap. Final mount topology = §J OQ-4.
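The adapter-boundary guard can be sketched as one check run before any `WriteContext` is built. The buffer token is passed in rather than hardcoded; `enforce_buffer_only`, `AdapterRejected` and the `table.` / `view.` operation prefixes are assumptions for illustration (the document only names `record.*` and `field.*` ops explicitly).

```python
SCHEMA_PREFIXES = ("field.", "table.", "view.")  # assumed canonical op prefixes


class AdapterRejected(Exception):
    """Raised at the MCP adapter boundary, before the service layer is called."""


def enforce_buffer_only(operation: str, app_token: str, buffer_token: str) -> None:
    """Reject delete and schema operations unless the target is the buffer base."""
    restricted = operation.endswith(".delete") or operation.startswith(SCHEMA_PREFIXES)
    if restricted and app_token != buffer_token:
        raise AdapterRejected(
            f"{operation} via MCP is allowed on the buffer base only"
        )
```

Operations that pass this check still go through the full SafetyLayer; the guard narrows what MCP may attempt, it does not replace any layer.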
G. Track B — CLI Write Module
G.1 lark_client/writer.py
```python
class LarkWriter:
    def __init__(self, service: LarkWriteService): ...  # DI, no write logic here
    def create(self, ctx, fields: dict) -> WriteOutcome: ...
    def batch_create(self, ctx, records: list[dict]) -> WriteOutcome: ...
    def get(self, ctx, record_id: str) -> dict: ...
    def update(self, ctx, record_id: str, fields: dict) -> WriteOutcome: ...
    def batch_update(self, ctx, records: list[dict]) -> WriteOutcome: ...
    def delete(self, ctx, record_id: str) -> WriteOutcome: ...
    def batch_delete(self, ctx, record_ids: list[str]) -> WriteOutcome: ...
```
writer.py is a typed façade over service.py (keeps CLI thin, satisfies req §12.8 single-layer rule). Return type always WriteOutcome.
G.2 lark_client/field_manager.py (Sprint 3)
```python
class LarkFieldManager:
    SUPPORTED = {"Text", "Number", "SingleSelect", "Checkbox"}  # req §10 note

    def create(self, ctx, name: str, ftype: str, options: dict | None) -> WriteOutcome: ...
    def update(self, ctx, field_id: str, spec: FieldSpec) -> WriteOutcome: ...
    def delete(self, ctx, field_id: str) -> WriteOutcome: ...  # field_id only, never name (req §12.7)
```
Complex types (Formula, Lookup, Link) → explicit UnsupportedFieldType until Sprint 4+.
G.3 CLI commands (Click)
cli/commands/records.py:
lark-tool records create <base-key> <table-id> --data '{...}|@file' --approval APR-xxx [--no-dry-run]
lark-tool records get <base-key> <table-id> <record-id>
lark-tool records update <base-key> <table-id> <record-id> --data '{...}' --approval APR-xxx --no-dry-run --confirm
lark-tool records delete <base-key> <table-id> <record-id> --approval APR-xxx --no-dry-run --confirm
lark-tool records batch-create/-update/-delete <base-key> <table-id> --data @file.jsonl --approval APR-xxx --no-dry-run [--confirm]
cli/commands/fields.py:
lark-tool fields create <base-key> <table-id> --name "X" --type Text --approval APR-xxx --no-dry-run
lark-tool fields update <base-key> <table-id> <field-id> ... --approval APR-xxx --no-dry-run --confirm
lark-tool fields delete <base-key> <table-id> <field-id> --approval APR-xxx --no-dry-run --confirm "tôi hiểu không thể undo"
Conventions: --dry-run default ON (omit --no-dry-run ⇒ dry run); update/delete on non-buffer base require --confirm; field/table delete require the literal acknowledgement string. $LARK_AGENT mandatory for batch (req §12.5), defaults to claude-code for interactive CLI. Exit codes per §B.3. Registered into the existing cli/lark_tool.py Click group (do not fork a new entrypoint).
G.4 config/write-approvals.yaml schema
```yaml
approvals:
  - id: APR-001
    operation: record.update              # canonical op id
    scope:
      base_key: "65-yeu-cau-thanh-toan"   # explicit; wildcard table only for record.create
      table_id: "tblXXXX"                 # required for first write & all update/delete/schema
    one_time_use: true                    # default true; reusable must be explicit + not delete/schema
    used: false
    reason: "Fix incorrect amount on row 42 per Accounting request"
    created_by: "Huyên"                   # human-created (req §13.2)
    created_at: "2026-05-19T10:00:00Z"
    expires_at: "2026-05-20T10:00:00Z"

approval_exempt_bases:                    # bypass approval CHECK only; all other layers apply
  - "88-phai-cu-base-dem"
```
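How a provider might validate one entry of this schema is sketched below. It works on an already-parsed dict (no YAML library is assumed), `validate_approval` is a hypothetical helper, and wildcard handling is left out for brevity.

```python
from datetime import datetime, timezone
from typing import Optional


def validate_approval(approval: dict, operation: str, base_key: str,
                      table_id: str, now: datetime) -> Optional[str]:
    """Return None if the approval is usable, else a short rejection reason."""
    if approval["operation"] != operation:
        return "operation mismatch"
    scope = approval["scope"]
    if scope["base_key"] != base_key or scope["table_id"] != table_id:
        return "scope mismatch"
    # The schema uses a trailing "Z"; normalise it for datetime.fromisoformat.
    expires = datetime.fromisoformat(approval["expires_at"].replace("Z", "+00:00"))
    if now >= expires:
        return "expired"
    if approval.get("one_time_use", True) and approval.get("used", False):
        return "already used"
    return None


approval = {
    "id": "APR-001",
    "operation": "record.update",
    "scope": {"base_key": "65-yeu-cau-thanh-toan", "table_id": "tblXXXX"},
    "one_time_use": True,
    "used": False,
    "expires_at": "2026-05-20T10:00:00Z",
}
now = datetime(2026, 5, 19, 12, 0, tzinfo=timezone.utc)
print(validate_approval(approval, "record.update", "65-yeu-cau-thanh-toan", "tblXXXX", now))
```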
G.5 Integration with existing LarkCore
- Reuse `LarkCore` for: GSM token, retry (3× backoff on 429/503/network), global rate lock, endpoint whitelist.
- Whitelist additions to `config/allowed_endpoints.yaml` (each = 1 reviewed change, req README §5.2 "write endpoints initially EMPTY"):
  - `POST /open-apis/bitable/v1/apps/:app_token/tables/:table_id/records`
  - `GET .../records/:record_id`
  - `PUT .../records/:record_id`
  - `DELETE .../records/:record_id`
  - `POST .../records/batch_create | batch_update | batch_delete`
  - `POST .../tables/:table_id/fields` · `PUT/DELETE .../fields/:field_id` (Sprint 3)
  - `POST /open-apis/bitable/v1/apps/:app_token/tables` · `DELETE .../tables/:table_id` · view endpoints (Sprint 4)
- Idempotency: pass `client_token`/`UUID` per Lark write API where supported (record create/batch).
H. Testing Strategy
H.1 Base đệm (test target — CONFIRMED not production)
- Name: `88 - Phái cử (Base đệm)`; registry row 8, role "staging/buffer".
- app_token: `Nf2bb1ExXaYnlksgoyQl72GNgAc`
- Tables: `TTS` = `tblPQ6N79EeOmnTm` (7 fields, PK `STT`); `Đơn hàng` = `tblaU7kxyPTNBSrR` (5 fields, PK `STT`). Duplex link fields between them.
- Production Base 88 is a DIFFERENT token: `YSIkb8PxOaNaozs2vwalOOcagkf` (80 tables, "Core"). Tests MUST NOT use this token. A test-time assertion rejects any app_token other than the Base đệm token (req §H, §13.4).
H.2 12 test cases (Base đệm only)
| # | Case | Expect |
|---|---|---|
| T1 | record.create dry-run default | no API call, status=dry_run |
| T2 | record.create --no-dry-run valid approval | record created, audit pre+post present |
| T3 | record.update without --confirm on non-buffer | aborted SafetyViolation |
| T4 | record.update on Base đệm with confirm | updated, GPG backup of old value exists |
| T5 | record.delete one-time approval, reuse same approval | 2nd call → ApprovalError (consumed) |
| T6 | batch_create 600 records | split 500+100, both chunks audited |
| T7 | batch partial failure (1 bad record) | partial_failure, no auto-rollback, rollback cmd printed |
| T8 | approval scope mismatch (wrong table_id) | ApprovalError |
| T9 | wildcard table on record.delete | rejected by wildcard policy |
| T10 | PII payload (CCCD + bank acct) | write proceeds, audit shows metadata only, raw only in GPG backup |
| T11 | audit-pre sink unwritable | API NOT called, abort |
| T12 | audit-post fails after success (inject) | status=success + emergency fallback file written |
H.3 Isolation & mocking
- Unit tests: mock `LarkCore` HTTP layer (no real API); assert SafetyLayer ordering, approval logic, PII metadata, GPG invoked, audit phases. This is the bulk; mirrors existing 19/19 + 8/8 mocked style.
- Integration tests (T2,T4,T6 subset): real Lark API against Base đệm only, gated behind env `LARK_TEST_INTEGRATION=1`, hard-asserting the Base đệm token. Base đệm reset by Claude Code is permitted but the reset itself must be audited (req §12.12).
- `app_token` literal allowed only in `tests/` and `bases.yaml` (README §6).
I. Sprint Breakdown
Sprint 1 — Track B core (CLI records + safety)
Deliverables: service.py (record ops), writer.py, safety.py (8 layers), ApprovalProvider ABC + YamlApprovalProvider, GPG backup module, 2-phase audit, cli/commands/records.py, config/write-approvals.yaml, config/pii-fields.yaml, PII registry+pattern, whitelist record endpoints.
Acceptance: T1–T12 (record scope) green; existing 19/19+8/8 still pass; no import requests; no hardcoded app_token (pre-commit grep); dry-run default verified; GPG backup decryptable offline with the private key; audit-pre-fail aborts before API.
Sprint 2 — Track A MCP adapter
Deliverables: lark_client/mcp_adapter/ exposing lark_record_get/delete/create/update; Base-đệm hard guard for delete; wired to LarkWriteService.
Acceptance: Cowork can get/delete a record on Base đệm via MCP; MCP delete on a production token is rejected at adapter boundary; all MCP writes show in the same audit stream with agent=cowork-mcp; no write logic duplicated (adapter imports service).
Sprint 3 — Field operations + ApprovalProvider swap
Deliverables: field_manager.py (Text/Number/SingleSelect/Checkbox), cli/commands/fields.py, field endpoints whitelisted, DirectusApprovalProvider prototype injected without touching safety.py.
Acceptance: create/update/delete a Text+Number field on Base đệm; complex types rejected with UnsupportedFieldType; Directus provider passes the same approval contract tests as YAML provider (provider-swap test).
Sprint 4 — Schema ops + monitoring + full integration
Deliverables: table/base create/delete + view list/create/delete, maintenance-window + staging gate for schema ops (req §13.5), monitoring (audit volume / failure-rate alarms, e.g. uptime-kuma push), full end-to-end integration suite.
Acceptance: schema op refuses to run outside declared maintenance window; full T1–T12 + schema cases green on Base đệm; monitoring fires on injected audit-loss; documentation + README §3/§8 updated.
J. Open Questions (resolve before Sprint 1 coding)
- OQ-1 (BLOCKER, R-1): This design was built from the KB architecture contract, not from reading `/opt/incomex/lark-client/` (no shell/file access to that path in this environment). Sprint 1 must begin with a code-reconcile checklist: confirm actual module names/signatures of `LarkCore` (token method, retry, rate-lock API), `Registry` / `bases.yaml` loader, `lark_client.exceptions` base classes, the Click group in `cli/lark_tool.py`, and existing test harness conventions. Any deviation from this doc's assumed names is an implementation detail to adjust, not a redesign, but it must be checked first.
- OQ-2: GPG key: confirm the public-key-only on VPS / private-key-offline model (§E) and who custodies the private key + GSM secret name `LARK_BACKUP_GPG_PUBKEY`. If Huyên wants on-VPS decryption capability, R-4 mitigation weakens and needs explicit sign-off.
- OQ-3: PII default: confirm "detect → log metadata → proceed" (NOT block) is the intended behaviour (matches req §6 wording). Decide whether `--pii-strict` (abort on detection) ships in Sprint 1 or later.
- OQ-4: Track A topology: keep the existing 9-tool `@larksuiteoapi/lark-mcp` mounted alongside the custom adapter, or replace it entirely with the custom server? (§F.3)
- OQ-5: Lark batch hard limit: requirements say 500/batch; confirm against Lark Open API current limit for `batch_delete` specifically (some endpoints cap lower). Sprint 1 will treat 500 as the configured ceiling, overridable in config.
- OQ-6 (process): This file was written to `/opt/incomex/docs/mcp-writes/s177-architecture-design.md` (the only VPS write-allowlisted dir) and was not git-committed (no exec tool / repo not in scope). Huyên or an agent with repo access must move it to the intended path and run the `S177-DESIGN:` commit.
End of S177 Architecture Design Document — DRAFT awaiting Huyên review on OQ-1…OQ-6.