Plainfile Family History

An operating spec for a durable, file-first family-history archive with an AI research assistant layered on top.

This project stemmed from one idea: for a hundred years, genealogy lived in a filing cabinet, and anyone could open the drawer. No login, no subscription, no schema migration. A century later a curious descendant could still pull the folder or open the book and read it. Modern genealogy software and workflows have lost that virtue.

Plainfile is that filing cabinet, built to last. Plain files at the foundation, with search, structured claims, and an AI research layer stacked on top of the files, never instead of them. Delete every layer above and the archive still works, the way the drawer still works.

NOTE: This is a specification and scaffold, not finished software.

It is the blueprint for a simple, future-proof system for family research. The goal of this repo is to establish simple standards that can be maintained with or without tooling. It also provides the spec for building the tooling from scratch, if you wish, and sample tools to get you started.

Repo, tools, and your archive

Three things live at arm’s length from each other, by design:

  1. This repo (public): the spec (SPEC.md, TOOLING.md, AGENTS.md), the docs, the generic fha tools (once built), an empty archive-template/, and a fictional example-archive/ fixture.
  2. The tools (public, in tools/): generic — they operate on any conforming archive and hold no family data. Publishing them is the manifestation of the spec. Tools are replaceable glue, regenerable from the spec.
  3. Your archive (private, separate repo): your real family’s records, created from archive-template/, depending on this repo’s spec and tools but never living inside it. Public examples stay fictional; your groceries don’t go in the cookbook.

Public examples must remain fictional. Do not open issues or PRs containing real records about living people, private family documents, raw DNA files, or identifying photos. See PRIVACY.md.

How the two repos relate

In practice you end up with two separate repositories — one public, one private:

plainfile-family-history/   ← PUBLIC:  the spec + the generic tools (this repo)
my-family-archive/          ← PRIVATE: your real family's records

They are not technically linked. The only relationship is that your private archive uses the tools that live in this public repo. There are two ways to get those tools to your archive (decide once the tools are built — you don’t need to now):

Either way, your private family data never enters this public repo. The public repo is the cookbook and the appliances; your private repo is your kitchen with your food in it.


What this is

Plainfile is an archive-first system. The durable archive — plain text and standard file formats on disk — is the source of truth. Every other moving part (the search index, the AI assistant, any genealogy app or website) is an optional, replaceable helper built from the archive and rebuildable from scratch.

It is designed to be operated with an AI coding agent. You open the archive in Claude Code (or any agent that reads AGENTS.md), and the agent helps you process records, draft sourced claims, build family trees, and surface research leads — while a set of small deterministic tools (the fha command suite, specified in TOOLING.md) does the mechanical work. The spec is written so that all of that tooling can be regenerated from the documents, in any language, if it is ever lost.

What this is not

How it works

Your existing photos and documents, plus five record types, all plain Markdown/YAML on disk:

| Type | What it is |
| --- | --- |
| Person P- | A human: identity, flags, and prose. |
| Source S- | A piece of evidence: a record, document, photo, interview. |
| Claim C- | A single sourced assertion (a date, place, relationship) living inside its source record, moving through a suggested → accepted review lifecycle. |
| Place L- | A physical location, identified by coordinates, with a dated name/jurisdiction history. |
| Hypothesis H- | An unsourced working theory: a guess, never a fact, until evidence promotes it to a claim. |
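
The suggested → accepted lifecycle of a claim can be sketched in a few lines of Python. This is only an illustration, not the format SPEC.md defines: the field names, the example IDs such as C-0001, and the accept helper are all hypothetical. It shows one rule only, that a claim starts as suggested and becomes accepted through an explicit review step with a named reviewer.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Status(Enum):
    SUGGESTED = "suggested"
    ACCEPTED = "accepted"


@dataclass(frozen=True)
class Claim:
    # All field names and ID formats here are hypothetical, not taken from SPEC.md.
    claim_id: str
    source_id: str   # the source record this claim lives inside
    person_id: str
    assertion: str
    status: Status = Status.SUGGESTED
    reviewed_by: Optional[str] = None


def accept(claim: Claim, reviewer: str) -> Claim:
    """Return an accepted copy of a suggested claim; a named reviewer is required."""
    if not reviewer.strip():
        raise ValueError("a human reviewer is required to accept a claim")
    if claim.status is not Status.SUGGESTED:
        raise ValueError(f"{claim.claim_id} is already {claim.status.value}")
    return replace(claim, status=Status.ACCEPTED, reviewed_by=reviewer)


draft = Claim("C-0001", "S-0001", "P-0001", "born 1890 in Exampleville")
final = accept(draft, reviewer="jane")
print(draft.status.value, "->", final.status.value)  # suggested -> accepted
```

The claim is frozen, so accepting it produces a new value instead of mutating the draft; the original suggestion stays intact for comparison.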

Around those, a rebuildable index (SQLite, regenerated from the files) powers search, family-tree generation, contradiction detection, and a research report — none of it authoritative, all of it disposable. The operating loop is simple: capture → file → process → review → report, with human review the only gate to an accepted fact.
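
To make "rebuildable" concrete, here is a rough sketch of an index that is thrown away and regenerated from the files on every run. The real index schema is specified in TOOLING.md; everything below is an assumption made for illustration, including the records table, the rebuild_index name, and the idea that a record's filename begins with its ID prefix. Claims are not indexed separately here because they live inside their source records.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical mapping from filename prefix to record type.
PREFIXES = {"P": "person", "S": "source", "L": "place", "H": "hypothesis"}


def rebuild_index(archive: Path, db_path: Path) -> int:
    """Delete any existing index and regenerate it from the files on disk."""
    db_path.unlink(missing_ok=True)  # the index is disposable, so start clean
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE records (id TEXT PRIMARY KEY, type TEXT, path TEXT, body TEXT)"
        )
        count = 0
        for path in sorted(archive.rglob("*.md")):
            prefix = path.stem.split("-", 1)[0]
            if prefix not in PREFIXES:
                continue  # not a record file (e.g. a README)
            conn.execute(
                "INSERT INTO records VALUES (?, ?, ?, ?)",
                (
                    path.stem,
                    PREFIXES[prefix],
                    path.relative_to(archive).as_posix(),
                    path.read_text(encoding="utf-8"),
                ),
            )
            count += 1
        conn.commit()
        return count
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp)
    (archive / "people").mkdir()
    (archive / "people" / "P-0001.md").write_text("Ada Example\n", encoding="utf-8")
    (archive / "S-0001.md").write_text("Fictional census page\n", encoding="utf-8")
    (archive / "README.md").write_text("not a record\n", encoding="utf-8")
    db = archive / "index.sqlite"
    first = rebuild_index(archive, db)
    second = rebuild_index(archive, db)  # rebuilding again gives the same result
    print(first, second)  # 2 2
```

Because the function deletes the database before writing, losing or corrupting the index costs nothing: the files remain the only authority.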

Repository layout

plainfile-family-history/
├── README.md            ← you are here
├── SPEC.md              ← the law: philosophy, data model, physical format, governance
├── TOOLING.md           ← implementation design for every supporting tool (the fha suite)
├── AGENTS.md            ← canonical operating instructions for the AI agent
├── CLAUDE.md            ← Claude Code entry point (defers to AGENTS.md)
├── docs/                ← supporting documentation
│   ├── GETTING_STARTED.md
│   ├── GLOSSARY.md
│   └── FAQ.md
├── archive-template/    ← empty skeleton (+ fha.yaml) to copy when starting your own (private) archive
├── example-archive/     ← a small, fully fictional worked example (+ its own fha.yaml)
├── tools/               ← the generic fha command suite (skeletal in v1; see TOOLING.md)
├── tests/               ← fixtures for the linter (skeletal in v1)
├── PRIVACY.md           ← example-data policy
└── .github/             ← issue templates, contributing guide

Quick start

You need an AI coding agent that can read project instructions and run shell commands — Claude Code is the reference harness. The spec is harness-agnostic; anything that reads AGENTS.md works.

  1. Clone this repo and read SPEC.md end to end. It is the contract; everything else serves it.
  2. Open the folder in your agent. It will read CLAUDE.md → AGENTS.md and know the rules before you say anything.
  3. Build the tools. Declare tool-building mode and point the agent at the build order in TOOLING.md §15. The first milestone is the linter running clean on example-archive/.
  4. Start your own archive. Copy archive-template/ into a new private repo, drop your first scan or note into inbox/, and ask the agent to process it.

See docs/GETTING_STARTED.md for the full walkthrough.
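
Step 4 amounts to a folder copy, which can be done by hand or with a few lines of Python. The sketch below is an assumption-laden illustration: start_archive is a hypothetical helper, the stand-in template is built in a temporary directory, and the placeholder content written to fha.yaml is invented for the demo. Whether the real template ships an inbox/ folder is not stated here, so the helper creates one if it is missing.

```python
import shutil
import tempfile
from pathlib import Path


def start_archive(template: Path, dest: Path) -> Path:
    """Copy the empty template to a new location and return its inbox folder."""
    shutil.copytree(template, dest)  # raises FileExistsError if dest already exists
    inbox = dest / "inbox"
    inbox.mkdir(exist_ok=True)  # assumption: create inbox/ if the template lacks it
    return inbox


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    template = root / "archive-template"  # stand-in for the repo's template folder
    template.mkdir()
    (template / "fha.yaml").write_text("# placeholder config\n", encoding="utf-8")

    dest = root / "my-family-archive"
    inbox = start_archive(template, dest)
    (inbox / "first-note.txt").write_text("Fictional interview note\n", encoding="utf-8")

    created = sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*"))
    print(created)  # ['fha.yaml', 'inbox', 'inbox/first-note.txt']
```

Refusing to overwrite an existing destination is deliberate: a second run cannot clobber an archive that already holds real records.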

The documents

| Document | Read it for |
| --- | --- |
| SPEC.md | The complete specification: what exists, how it lives on disk, and the rules that never bend. Start here. |
| TOOLING.md | How every tool is built, in enough detail to rewrite it from scratch. The fha command suite, the index schema, the linter rules. |
| AGENTS.md | What an AI agent may and may not do inside the archive: the contract, the operating modes, the workflows. |
| docs/GETTING_STARTED.md | A practical first-session walkthrough. |
| docs/GLOSSARY.md | Every term and ID type defined. |
| docs/FAQ.md | Why files, why not a database, why AI, how durable is this really. |

Design principles

  1. The archive is the source of truth; tools are replaceable.
  2. Durable, plain formats. .md, .txt, .csv, .jsonl, .yaml, .jpg, .tiff, embedded IPTC/XMP.
  3. Every important fact traces to a source. Uncited prose is story or context, never fact.
  4. AI suggestions are not facts. They enter a review queue and stay there until a human accepts them.
  5. Nothing generated is load-bearing. Index, search, trees, the website — all rebuildable from the files.
  6. Folder location is for human browsing; metadata carries meaning.
  7. Stay light. Long-term durability beats short-term convenience.
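
Principle 2 is easy to check mechanically. The sketch below flags any path whose extension is outside the list given in that principle. It is not one of the linter rules from TOOLING.md; the non_durable function and the sample paths are hypothetical, and a real check would presumably also need to exempt generated artifacts such as the index.

```python
from pathlib import PurePosixPath

# Extensions copied from design principle 2.
DURABLE_EXTENSIONS = {".md", ".txt", ".csv", ".jsonl", ".yaml", ".jpg", ".tiff"}


def non_durable(paths):
    """Return the paths whose extension is not on the durable-formats list."""
    return [
        p for p in paths
        if PurePosixPath(p).suffix.lower() not in DURABLE_EXTENSIONS
    ]


sample = [
    "people/P-0001.md",
    "sources/scan.tiff",
    "sources/letter.docx",
    "notes/todo.TXT",
]
print(non_durable(sample))  # ['sources/letter.docx']
```

Extensions are compared case-insensitively, so a scanner that writes todo.TXT or SCAN.TIFF does not trigger a false alarm.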

Status & roadmap

Current: spec v1.2 — milestone 1 complete.

The first implementation milestone is done: fha lint runs on the example archive with no errors. The intended build sequence (detailed in TOOLING.md §15):

A complementary project

A related project worth studying: if your interest is the research half — autonomous AI research loops, archive guides for specific countries, prompt templates for pushing a family tree backward — see autoresearch-genealogy by Matt Prusak. It and Plainfile arrived independently at the same files-first, Claude-Code-driven philosophy from different angles: that project is a research playbook, this one is the filing system the findings live in. They complement each other well.

Contributing

This is an early-stage spec and feedback is genuinely useful — especially from genealogists and from anyone building the tools against it. See .github/CONTRIBUTING.md. Issues and discussion welcome.

License

MIT. The spec, the documents, and any code built from them are free to use, adapt, and build on.