Introduction

Purpose and Scope

GENEALOGIX defines a human-readable, version-controlled archive format for family history data. It addresses the limitations of existing genealogy formats by providing:

  • Git-native architecture for reliable collaboration and version control
  • Evidence-first model where all claims are backed by documented sources
  • Human-readable YAML files instead of binary or proprietary formats
  • Structured data validation with JSON Schema compliance
  • Complete provenance tracking from repository to conclusion
  • Flexible organization - archives can be a single file, many files, or any combination

The specification covers:

  • 9 core entity types for comprehensive family history documentation
  • Universal file format with entity type keys
  • Evidence hierarchy from physical repositories to specific claims
  • Git workflow integration for collaborative research
  • Extensible schema system for future enhancements
  • Validation tools and conformance testing

File Format: All GENEALOGIX files use the same structure with top-level entity type keys (persons, sources, etc.) containing maps of entities. Parsers collate all entities of each type across all .glx files in the repository.
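
The collation step described above can be sketched in a few lines. This is a minimal illustration, not the reference parser: it assumes each .glx file has already been parsed from YAML into a dictionary, the list of top-level keys beyond `persons` and `sources` is a guess at plural forms of the nine entity types, and rejecting duplicate IDs is an assumed policy rather than something stated here. The function name `collate`, the entity IDs, and the field names are hypothetical.

```python
# Top-level keys assumed for illustration; only "persons" and "sources"
# are named explicitly in the text.
ENTITY_TYPES = (
    "persons", "relationships", "events", "places", "sources",
    "citations", "repositories", "assertions", "media",
)


def collate(parsed_files):
    """Merge entity maps from several already-parsed .glx documents.

    parsed_files maps a file name to the dict a YAML parser produced for it.
    Raises ValueError on an unknown top-level key or a duplicate entity ID
    (an assumed policy, not taken from the specification).
    """
    archive = {entity_type: {} for entity_type in ENTITY_TYPES}
    origin = {}  # (entity_type, entity_id) -> file that first defined it
    for filename, document in parsed_files.items():
        for entity_type, entities in document.items():
            if entity_type not in archive:
                raise ValueError(f"{filename}: unknown key {entity_type!r}")
            # An empty YAML key parses to None, so treat it as an empty map.
            for entity_id, entity in (entities or {}).items():
                key = (entity_type, entity_id)
                if key in origin:
                    raise ValueError(
                        f"{filename}: {entity_id!r} already defined in {origin[key]}"
                    )
                origin[key] = filename
                archive[entity_type][entity_id] = entity
    return archive


# Two hypothetical files, each holding part of the archive.
files = {
    "people.glx": {"persons": {"person-john-smith": {"name": "John Smith"}}},
    "census.glx": {
        "persons": {"person-mary-smith": {"name": "Mary Smith"}},
        "sources": {"source-1851-census": {"title": "1851 Census"}},
    },
}
archive = collate(files)
print(sorted(archive["persons"]))  # ['person-john-smith', 'person-mary-smith']
```

Because entities are keyed by ID within each type, the same archive can be split across files in any way without changing the collated result.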

Design Principles

Clarity and Simplicity

  • YAML-based files that are readable and editable in any text editor
  • Consistent naming conventions with structured ID formats
  • Hierarchical organization following standard archival practices
  • Minimal required fields with rich optional metadata

Evidence-First Architecture

  • Source-backed assertions - every claim must reference evidence
  • Quality assessment - structured evaluation of evidence reliability
  • Citation specificity - exact references to source locations
  • Multiple evidence support - corroboration from multiple sources
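
As a rough illustration of the "every claim must reference evidence" rule, the check below flags assertions that cite nothing or cite an unknown citation. It is a sketch under stated assumptions: the archive is a plain dictionary of entity maps, and the `citations` field on an assertion, the function name, and all IDs are hypothetical rather than taken from the schema.

```python
def unsupported_assertions(archive):
    """Return sorted IDs of assertions with no citations or dangling ones.

    Assumes each assertion carries a "citations" list of citation IDs
    (a hypothetical field name used only for this example).
    """
    known_citations = archive.get("citations", {})
    problems = []
    for assertion_id, assertion in archive.get("assertions", {}).items():
        cited = assertion.get("citations", [])
        if not cited or any(c not in known_citations for c in cited):
            problems.append(assertion_id)
    return sorted(problems)


# Hypothetical archive: one supported claim, one unsupported, one dangling.
archive = {
    "citations": {"citation-birth-entry": {"source": "source-parish-register"}},
    "assertions": {
        "assertion-birth": {"citations": ["citation-birth-entry"]},
        "assertion-occupation": {"citations": []},
        "assertion-residence": {"citations": ["citation-missing"]},
    },
}
print(unsupported_assertions(archive))
# ['assertion-occupation', 'assertion-residence']
```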

Provenance and Traceability

  • Complete audit trails from repository to genealogical conclusion
  • Author attribution for all changes and contributions
  • Timestamp tracking for research chronology
  • Change documentation through Git commit history

Git-Native Collaboration

  • Branch-based research - isolate investigations in feature branches
  • Merge conflict resolution for conflicting evidence
  • Pull request reviews for quality assurance
  • Tag-based releases for milestone preservation

Terminology

Core Concepts

  • Archive: A complete GENEALOGIX repository containing all family history data
  • Entity: A typed record representing a person, event, place, or other genealogical concept
  • Assertion: A discrete, evidence-backed claim about a person, event, or relationship
  • Evidence Chain: The complete path from physical repository through source and citation to conclusion

Entity Types

  • Person: Individual human being with biographical information
  • Relationship: Connection between people (parent-child, marriage, etc.)
  • Event: Life events and facts (birth, marriage, occupation, residence, death)
  • Place: Geographic locations with hierarchical organization
  • Source: Original materials (books, records, certificates, websites)
  • Citation: Specific reference within a source with quality assessment
  • Repository: Physical or digital archive holding sources
  • Assertion: Evidence-based conclusion or claim
  • Media: Supporting photos, documents, and multimedia files

Evidence Hierarchy

  • Repository: Physical location (archive, library, church, government office)
  • Source: Document or record (parish register, census, certificate)
  • Citation: Specific reference (page number, entry number, URL)
  • Assertion: Claim supported by citations (person born on specific date)
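
The four levels above form a chain that can be walked from an assertion back to the repository holding the evidence. The following sketch assumes an archive represented as a dictionary of entity maps, with hypothetical reference fields (`citations` on an assertion, `source` on a citation, `repository` on a source). The actual field names are defined by the entity specifications, not by this example.

```python
def evidence_chains(archive, assertion_id):
    """Return one (repository, source, citation, assertion) ID tuple
    for each citation that supports the given assertion.

    Field names "citations", "source", and "repository" are assumptions
    made for illustration.
    """
    assertion = archive["assertions"][assertion_id]
    chains = []
    for citation_id in assertion["citations"]:
        source_id = archive["citations"][citation_id]["source"]
        repository_id = archive["sources"][source_id]["repository"]
        chains.append((repository_id, source_id, citation_id, assertion_id))
    return chains


# Hypothetical archive containing a single complete chain.
archive = {
    "repositories": {"repository-county-archive": {"name": "County Archive"}},
    "sources": {
        "source-parish-register": {"repository": "repository-county-archive"}
    },
    "citations": {
        "citation-birth-entry": {"source": "source-parish-register"}
    },
    "assertions": {
        "assertion-birth": {"citations": ["citation-birth-entry"]}
    },
}
for chain in evidence_chains(archive, "assertion-birth"):
    print(" -> ".join(chain))
# repository-county-archive -> source-parish-register -> citation-birth-entry -> assertion-birth
```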

Quality Assessment

  • Quality Rating: 0-3 scale indicating evidence reliability
    • 3 = Primary, direct evidence (birth certificate)
    • 2 = Secondary, direct evidence (census record)
    • 1 = Primary, indirect evidence (family Bible notation)
    • 0 = Secondary, indirect evidence (published biography)
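
The scale combines two independent judgments: whether the information is primary or secondary, and whether the evidence is direct or indirect. A small lookup makes the mapping explicit. The function name and boolean parameters are illustrative assumptions; only the four rating values come from the list above.

```python
def quality_rating(primary, direct):
    """Map the two evidence judgments onto the 0-3 quality scale."""
    table = {
        (True, True): 3,    # primary, direct (e.g. birth certificate)
        (False, True): 2,   # secondary, direct (e.g. census record)
        (True, False): 1,   # primary, indirect (e.g. family Bible notation)
        (False, False): 0,  # secondary, indirect (e.g. published biography)
    }
    return table[(bool(primary), bool(direct))]


print(quality_rating(primary=True, direct=True))    # 3
print(quality_rating(primary=False, direct=False))  # 0
```

Note that on this scale directness outweighs primacy: secondary, direct evidence (2) ranks above primary, indirect evidence (1).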

Use Cases

Individual Research

Family historians maintaining personal archives with:

  • Complete family trees with source documentation
  • Research notes and evidence evaluation
  • Photo and document organization
  • Version-controlled research progress

Collaborative Projects

Research teams working together on:

  • Extended family documentation across multiple branches
  • Surname studies and one-name studies
  • Local history projects
  • Genealogical society publications

Institutional Archives

Libraries and archives preserving:

  • Community genealogy collections
  • Historical society records
  • Government genealogy databases
  • Academic research projects

Migration and Integration

Converting from existing formats:

  • GEDCOM format compatibility guidance (manual conversion)
  • Legacy database migration patterns
  • Paper record digitization guidance
  • Integration with existing genealogy software workflows

Comparison with Existing Formats

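| Feature         | GENEALOGIX            | GEDCOM             | Gramps XML   |
|-----------------|-----------------------|--------------------|--------------|
| Format          | YAML (human-readable) | Custom tagged text | XML          |
| Version Control | Git-native            | Difficult          | Manual       |
| Evidence Model  | Built-in citations    | Basic sources      | Complex      |
| Collaboration   | Git workflows         | File sharing       | Database     |
| Validation      | JSON Schema           | Syntax only        | Partial      |
| Extensibility   | Schema-based          | Limited            | Plugin-based |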

Getting Started

The quickest way to understand GENEALOGIX is through examples:

  1. Quick Start: Follow the 5-minute tutorial
  2. Complete Examples: Explore the complete family example
  3. Specification Details: Read the detailed entity specifications in sections 4-8
  4. Implementation: Use the CLI tool for validation and management

Community and Support

GENEALOGIX is an open-source project that welcomes contributions.

Version History

This specification follows semantic versioning:

  • Version 0.0.0-beta.1: Beta release
  • Version 1.0: Initial stable release (future)
  • Version 1.1+: Backwards-compatible enhancements (future)
  • Version 2.0: May include breaking changes with migration path (future)

See CHANGELOG.md for detailed version history.

Licensed under Apache License 2.0