
Migration from GEDCOM Guide

This guide explains how to convert GEDCOM files to GENEALOGIX (GLX) format using the automated import tool.

Key Differences

Understanding how GENEALOGIX differs from GEDCOM helps you get the most out of your migration.

| Aspect | GEDCOM | GENEALOGIX |
| --- | --- | --- |
| Format | Custom tag-based text format | YAML |
| Evidence model | Source citations attached to facts | Full evidence chains (Source → Citation → Assertion) |
| Version control | Monolithic file | Git-native, multi-file archives |
| IDs | Sequential references (@I1@, @F1@) | Typed, descriptive IDs (person-a3f8d2c1) |
| Places | Flat comma-separated strings | Hierarchical entities with coordinates |
| Media | File paths or embedded BLOBs | Media entities with MIME types and metadata |
| Notes | Inline or shared text blocks | First-class entity notes |
| Extensibility | Underscore-prefixed custom tags | Custom vocabularies |

Before You Start

Prerequisites

Supported GEDCOM Versions

| Version | Support Level |
| --- | --- |
| GEDCOM 5.5.1 | Full support |
| GEDCOM 7.0 | Full support |

The importer auto-detects the GEDCOM version from the HEAD.GEDC.VERS tag. Unknown versions are treated as GEDCOM 5.5.1.

Check your version

Open your .ged file in a text editor and look near the top for a line like 2 VERS 5.5.1 or 2 VERS 7.0.
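If you would rather check the version programmatically, the lookup of HEAD.GEDC.VERS can be sketched in a few lines of Python. This is an illustrative sketch, not the importer's own code: the function name `detect_gedcom_version` is hypothetical, and the parsing assumes the usual `LEVEL [XREF] TAG [VALUE]` line layout.

```python
import re

# Matches "LEVEL [@XREF@] TAG [VALUE]" lines (assumed layout).
LINE = re.compile(r"^(\d+)\s+(?:@[^@]+@\s+)?(\w+)(?:\s+(.*))?$")

def detect_gedcom_version(lines):
    """Return the value of HEAD.GEDC.VERS, or None if it is absent."""
    in_head = False
    in_gedc = False
    for raw in lines:
        match = LINE.match(raw.lstrip("\ufeff").strip())
        if not match:
            continue
        level, tag, value = int(match.group(1)), match.group(2), match.group(3)
        if level == 0:
            if in_head:          # header ended without a version
                return None
            in_head = tag == "HEAD"
            in_gedc = False
        elif in_head and level == 1:
            in_gedc = tag == "GEDC"
        elif in_gedc and level == 2 and tag == "VERS":
            return value
    return None
```

A caller could mirror the documented fallback with `detect_gedcom_version(lines) or "5.5.1"`, where `lines` is the file read with `encoding="utf-8"`.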

Automated Import

Basic Usage

bash
# Import to multi-file archive (default)
glx import family.ged -o family-archive/

# Import to single-file archive
glx import family.ged -o family.glx --format single

CLI Flags

| Flag | Short | Default | Description |
| --- | --- | --- | --- |
| --output | -o | (required) | Output file or directory |
| --format | -f | multi | Output format: multi or single |
| --no-validate | | false | Skip validation before saving |
| --verbose | -v | false | Show detailed import progress |
| --show-first-errors | | 10 | Number of validation errors to show (0 for all) |

Single-File vs Multi-File

Single-file (--format single): All entities in one .glx file. Best for small archives or sharing.

Multi-file (--format multi): One entity per file in a directory structure. Best for Git tracking, collaboration, and large archives.

bash
# Single-file output
glx import family.ged -o family.glx --format single

# Multi-file output creates a directory structure:
# family-archive/
# ├── persons/
# ├── events/
# ├── relationships/
# ├── places/
# ├── sources/
# ├── citations/
# ├── repositories/
# ├── media/
# ├── assertions/
# └── vocabularies/
glx import family.ged -o family-archive/

Import Statistics

After a successful import, the CLI prints a summary of what was created:

✓ Successfully imported to family.glx

Import statistics:
  Persons:       31
  Events:        77
  Relationships: 49
  Places:        5
  Sources:       3
  Citations:     12
  Repositories:  1
  Media:         0
  Assertions:    150

What Gets Imported

The importer processes records in dependency order across multiple passes, handling all standard GEDCOM record types.

Individuals: Names (with parsed components), gender, 20+ event types, 8+ property types, external IDs, notes, media references

Events: Each imported event receives an auto-generated title field for human readability. The format varies by event type:

  • Individual events: "Birth of Robert Webb (1815)"
  • Couple events: "Marriage of John Smith and Jane Doe (1850)"
  • Date-only: "Census (1860)"
  • Name-only: "Death of Jane Miller"

Unknown event types fall back to Title Case of the snake_case type (e.g., military_service → "Military Service"). Titles are generated from participant names and the event date — they are not extracted from the GEDCOM source.
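The title patterns above can be approximated with a small function. This is a sketch under stated assumptions: `event_title` and its parameters are hypothetical names, and the real importer may use its own labels for known event types rather than the Title Case fallback used here for everything.

```python
def event_title(event_type, participants=(), year=None):
    """Build a readable title from the event type, names, and year."""
    # Fallback described above: snake_case type rendered in Title Case.
    title = event_type.replace("_", " ").title()
    if participants:
        title += " of " + " and ".join(participants)
    if year:
        title += f" ({year})"
    return title
```

For example, `event_title("marriage", ["John Smith", "Jane Doe"], 1850)` yields "Marriage of John Smith and Jane Doe (1850)", and `event_title("military_service")` yields "Military Service".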

Families: Spouse relationships, parent-child relationships (with pedigree types), 9 family event types, media

Sources and evidence: Source records with metadata, inline citations, evidence chain construction (Source → Citation → Assertion)

Places: Hierarchical place entities built from comma-separated GEDCOM places, with coordinates from MAP/LATI/LONG tags

Repositories: Repository records with address, contact info, and type detection

Media: Media entities with MIME type resolution, file path rewriting, and BLOB decoding

Notes: Both shared notes (GEDCOM 5.5.1 NOTE records, GEDCOM 7.0 SNOTE records) and inline notes

Media File Handling

The importer handles three types of media references:

  • Relative file paths: Copied to media/files/ in your archive, with paths rewritten automatically. Duplicate filenames are deduplicated (e.g., photo.jpg → photo-2.jpg).
  • URLs and absolute paths: Preserved as-is in the media entity's URI field.
  • BLOB data (GEDCOM 5.5.1): Binary data is decoded and written to files in media/files/.
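The filename deduplication for copied media can be pictured with the following sketch. The helper name `unique_name` is hypothetical, and the numbering beyond `-2` is an assumption; the guide only documents the photo.jpg → photo-2.jpg case.

```python
from pathlib import PurePosixPath

def unique_name(filename, taken):
    """Return the file's base name, or a '-N' variant not yet in `taken`."""
    path = PurePosixPath(filename)
    candidate, counter = path.name, 1
    while candidate in taken:
        counter += 1
        # Assumed scheme: stem, dash, counter, then the original extension.
        candidate = f"{path.stem}-{counter}{path.suffix}"
    taken.add(candidate)
    return candidate
```

Calling it twice with `photos/photo.jpg` and `scans/photo.jpg` against the same `taken` set gives `photo.jpg` and then `photo-2.jpg`.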

Media source directory

Relative file paths in the GEDCOM are resolved from the directory containing the .ged file. Make sure media files are accessible at those paths before importing.

Field Mapping

Individual Records (INDI)

Events

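| GEDCOM Tag | GLX Event Type | Notes |
| --- | --- | --- |
| BIRT | birth | Date/place also propagated to person properties |
| DEAT | death | Date/place also propagated to person properties |
| CHR | christening | |
| BURI | burial | |
| CREM | cremation | |
| ADOP | adoption | |
| BAPM | baptism | |
| BARM | bar_mitzvah | |
| BATM | bat_mitzvah | |
| BLES | blessing | |
| CHRA | adult_christening | |
| CONF | confirmation | |
| FCOM | first_communion | |
| ORDN | ordination | |
| NATU | naturalization | |
| EMIG | emigration | |
| IMMI | immigration | |
| PROB | probate | |
| WILL | will | |
| GRAD | graduation | |
| RETI | retirement | |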

Properties

| GEDCOM Tag | GLX Property | Notes |
| --- | --- | --- |
| NAME | name | Parsed into structured fields (see Name Conversion) |
| SEX | gender | M→male, F→female, U→unknown, X→other |
| OCCU | occupation | Temporal property |
| RELI | religion | |
| EDUC | education | |
| NATI | nationality | |
| CAST | caste | |
| SSN | ssn | |
| TITL | title | Handles CONT/CONC for long values |
| RESI | residence | Temporal property with date and place |
| FACT | (varies) | Mapped to properties or generic events based on content |
| EXID | external_ids | GEDCOM 7.0 external identifiers |
| NOTE | notes | Inline or shared note text |
| OBJE | media | Media references or embedded media |

Special Handling

| GEDCOM Tag | GLX Mapping | Notes |
| --- | --- | --- |
| CENS | Temporal properties + synthetic source | Census records create residence properties and synthetic sources/citations, not events |
| FAMC | Parent-child relationship | Deferred until all individuals are processed; uses PEDI for relationship type |
| FAMS | Spouse link | Used during family processing |
| NO | Negative assertion | GEDCOM 7.0 only; creates assertion with no_ prefix |

Family Records (FAM)

Participants

| GEDCOM Tag | GLX Mapping | Notes |
| --- | --- | --- |
| HUSB | Participant (role: spouse) | |
| WIFE | Participant (role: spouse) | |
| CHIL | (via INDI FAMC) | Parent-child relationships created from individual records |

Events

| GEDCOM Tag | GLX Event Type | Notes |
| --- | --- | --- |
| MARR | marriage | Also sets start_event on the relationship |
| DIV | divorce | Also sets end_event on the relationship |
| ENGA | engagement | |
| MARB | marriage_banns | |
| MARC | marriage_contract | |
| MARL | marriage_license | |
| MARS | marriage_settlement | |
| ANUL | annulment | |
| DIVF | divorce_filed | |
| EVEN | event | Generic family event |

Source Records (SOUR)

| GEDCOM Tag | GLX Field | Notes |
| --- | --- | --- |
| TITL | source.title | |
| AUTH | source.authors | |
| PUBL | source.properties.publication_info | |
| ABBR | source.properties.abbreviation | |
| REPO | source.repository | With CALN stored as call_number |
| TEXT | source.description | |
| NOTE | source.notes | |
| DATA.EVEN | source.properties.events_recorded | Event types this source records |
| DATA.AGNC | source.properties.agency | Responsible agency |
| DATA.DATE | source.date | |
| TYPE | source.type | Mapped via source type vocabulary |
| OBJE | source.media | |
| EXID | source.properties.external_ids | GEDCOM 7.0 |

Source type inference

If no TYPE tag is present, the importer infers the source type from keywords in the title: "census" → census, "birth certificate" → vital_record, "parish register" → church_register, "newspaper" → newspaper, and so on.
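The keyword-based inference can be sketched as a simple ordered lookup. This is a hypothetical illustration: only the four keyword pairs quoted above come from this guide, the name `infer_source_type` is invented for the example, and the case-insensitive substring match and the `None` result for unmatched titles are assumptions.

```python
# Only the keyword pairs named in this guide; the importer's full list is longer.
KEYWORD_TYPES = [
    ("census", "census"),
    ("birth certificate", "vital_record"),
    ("parish register", "church_register"),
    ("newspaper", "newspaper"),
]

def infer_source_type(title):
    """Return a source type guessed from the title, or None if nothing matches."""
    lowered = title.lower()
    for keyword, source_type in KEYWORD_TYPES:
        if keyword in lowered:
            return source_type
    return None
```

With this sketch, a source titled "1851 Census of England" would be typed as `census`, while a title with no known keyword is left untyped.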

Citation Subrecords (SOUR within events)

| GEDCOM Tag | GLX Field | Notes |
| --- | --- | --- |
| PAGE | citation.properties.locator | Location within the source |
| DATA.DATE | citation.properties.source_date | When the source recorded the information |
| DATA.TEXT | citation.properties.text_from_source | Transcription from source |
| TEXT | citation.properties.text_from_source | GEDCOM 5.5.1 direct text |
| QUAY | citation.notes | Quality assessment preserved as note |
| NOTE | citation.notes | |
| OBJE | citation.media | |

Repository Records (REPO)

| GEDCOM Tag | GLX Field | Notes |
| --- | --- | --- |
| NAME | repository.name | |
| ADDR | repository.address, .city, .state_province, .postal_code, .country | Full address with subfields |
| PHON | repository.properties.phones | |
| EMAIL | repository.properties.emails | |
| WWW | repository.website | |
| NOTE | repository.notes | |
| TYPE | repository.type | GEDCOM 7.0 |
| EXID | repository.properties.external_ids | GEDCOM 7.0 |

Repository deduplication

Repositories are automatically deduplicated by name, city, and country. If two GEDCOM REPO records match on these fields, they are merged into a single repository entity.
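As a rough illustration of what matching on name, city, and country means, the sketch below groups repository records by those three fields. The function `dedupe_repositories` is hypothetical; the case-insensitive, whitespace-trimmed comparison and the "first record wins" merge policy are assumptions, since this guide does not describe how the importer combines the remaining fields.

```python
def dedupe_repositories(repos):
    """Keep one repository per (name, city, country) key."""
    merged = {}
    for repo in repos:
        # Assumed normalization: trim whitespace and ignore case.
        key = tuple(
            (repo.get(field) or "").strip().casefold()
            for field in ("name", "city", "country")
        )
        merged.setdefault(key, repo)  # first record seen is kept
    return list(merged.values())
```

Two records for "County Archive" in Leeds, England would collapse into one entity, while a "County Archive" in York would remain separate.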

Media Records (OBJE)

| GEDCOM Tag | GLX Field | Notes |
| --- | --- | --- |
| FILE | media.uri | Relative paths rewritten to media/files/ |
| FILE.FORM | (MIME inference) | GEDCOM 5.5.1 format |
| FILE.MIME | media.mime_type | GEDCOM 7.0 explicit MIME |
| FILE.TITL | media.title | |
| FORM | (MIME inference) | GEDCOM 5.5.1 format at OBJE level |
| FORM.MEDI | media.properties.medium | Medium type (photo, document, etc.) |
| TITL | media.title | |
| CROP | media.properties.crop | GEDCOM 7.0 crop coordinates |
| NOTE | media.notes | |
| BLOB | (decoded to file) | GEDCOM 5.5.1 deprecated binary data |

Common Challenges

Name Conversion

GEDCOM names use slash delimiters for surnames and quotes for nicknames. The importer parses these into structured name fields.

GEDCOM:

1 NAME Dr. John "Jack" /von Smith/ Jr.

GENEALOGIX:

yaml
properties:
  name:
    value: "Dr. John \"Jack\" von Smith Jr."
    fields:
      prefix: "Dr."
      given: "John"
      nickname: "Jack"
      surname_prefix: "von"
      surname: "Smith"
      suffix: "Jr."

The importer also handles GEDCOM name substructure tags (NPFX, GIVN, NICK, SPFX, SURN, NSFX) which override the parsed values when present.

Multiple NAME records on a single individual are imported as a temporal name list. The TYPE subrecord (e.g., birth, married, aka) is preserved as the type field. See Name Variations for all supported type values.

Recognized surname prefixes include: von, van, de, der, den, del, della, di, da, le, la, du, des, af, av.
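To make the parsing rules concrete, here is a minimal sketch of splitting a GEDCOM NAME value into the fields shown above. It is not the importer's implementation: `parse_name` is a hypothetical name, the `TITLE_PREFIXES` set is an assumption (this guide does not say how prefixes such as "Dr." are detected), and name substructure tags such as NPFX or SURN are not considered.

```python
import re

SURNAME_PREFIXES = {"von", "van", "de", "der", "den", "del", "della",
                    "di", "da", "le", "la", "du", "des", "af", "av"}
TITLE_PREFIXES = {"Dr.", "Rev.", "Sir"}  # assumption, for illustration only

def parse_name(raw):
    """Split 'Prefix Given "Nick" /prefix Surname/ Suffix' into fields."""
    match = re.match(r"^(.*?)/(.*?)/(.*)$", raw)
    before, surname_part, after = match.groups() if match else (raw, "", "")
    fields = {}
    nick = re.search(r'"([^"]+)"', before)
    if nick:
        fields["nickname"] = nick.group(1)
        before = before.replace(nick.group(0), " ")
    given = before.split()
    if given and given[0] in TITLE_PREFIXES:
        fields["prefix"] = given.pop(0)
    if given:
        fields["given"] = " ".join(given)
    words = surname_part.split()
    lead = []
    # Peel recognized prefixes off the front, always leaving a surname word.
    while len(words) > 1 and words[0].lower() in SURNAME_PREFIXES:
        lead.append(words.pop(0))
    if lead:
        fields["surname_prefix"] = " ".join(lead)
    if words:
        fields["surname"] = " ".join(words)
    if after.strip():
        fields["suffix"] = after.strip()
    return fields
```

Applied to `Dr. John "Jack" /von Smith/ Jr.`, the sketch produces the same six fields as the YAML example above.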

Place Hierarchy

GEDCOM stores places as flat, comma-separated strings (specific to general). The importer builds a proper hierarchy of Place entities with parent references.

GEDCOM:

2 PLAC Leeds, Yorkshire, England

GENEALOGIX:

yaml
places:
  place-3:
    name: "England"
    type: country

  place-2:
    name: "Yorkshire"
    type: county
    parent: place-3

  place-1:
    name: "Leeds"
    type: city
    parent: place-2

Place types are inferred from hierarchy depth (city, county, state, country) and keyword detection (cemetery, church, hospital, etc.). Coordinates from MAP/LATI/LONG subrecords are preserved.

Places are deduplicated by name and parent, so "Leeds, Yorkshire, England" appearing in multiple records creates only one set of place entities.
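The hierarchy building and deduplication can be sketched as follows. This is an illustration only: `build_places` is a hypothetical helper, type inference and coordinates are omitted, and the sequential IDs are assigned from the most general level down, so they will not match the numbering in the YAML example above.

```python
def build_places(place_strings):
    """Turn 'Leeds, Yorkshire, England' strings into linked place entities."""
    places = {}   # place id -> entity
    index = {}    # (name, parent id) -> place id, used for deduplication
    for text in place_strings:
        parent = None
        parts = [part.strip() for part in text.split(",") if part.strip()]
        # GEDCOM lists specific to general, so walk from the general end.
        for name in reversed(parts):
            key = (name, parent)
            if key not in index:
                place_id = f"place-{len(places) + 1}"
                index[key] = place_id
                entity = {"name": name}
                if parent is not None:
                    entity["parent"] = parent
                places[place_id] = entity
            parent = index[key]
    return places
```

Feeding it "Leeds, Yorkshire, England" twice plus "York, Yorkshire, England" creates four entities in total: England, Yorkshire, Leeds, and York.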

Date Formats

GEDCOM dates are converted to ISO 8601 format where possible. Qualified and range dates preserve GEDCOM keywords.

| GEDCOM | GENEALOGIX | Description |
| --- | --- | --- |
| 15 JAN 1850 | 1850-01-15 | Exact date |
| JAN 1850 | 1850-01 | Month precision |
| 1850 | 1850 | Year precision |
| ABT 1850 | ABT 1850 | Approximate |
| BEF 1920 | BEF 1920 | Before |
| AFT 15 MAR 1900 | AFT 1900-03-15 | After |
| CAL 1850 | CAL 1850 | Calculated |
| BET 1849 AND 1851 | BET 1849 AND 1851 | Between range |
| FROM 1900 TO 1950 | FROM 1900 TO 1950 | Period range |

See Core Concepts - Data Types for the complete date format specification.
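The conversions in the table can be reproduced with a short sketch that keeps GEDCOM keywords and rewrites each plain date between them. The names `convert_date` and `convert_simple` are hypothetical, only the keywords listed in the table are recognized, and anything the sketch does not understand is passed through unchanged.

```python
MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}
KEYWORDS = {"ABT", "BEF", "AFT", "CAL", "BET", "AND", "FROM", "TO"}

def convert_simple(tokens):
    """Convert ['15', 'JAN', '1850'] style tokens to ISO 8601, else rejoin."""
    if (len(tokens) == 3 and tokens[1] in MONTHS
            and tokens[0].isdigit() and tokens[2].isdigit()):
        return f"{int(tokens[2]):04d}-{MONTHS[tokens[1]]:02d}-{int(tokens[0]):02d}"
    if len(tokens) == 2 and tokens[0] in MONTHS and tokens[1].isdigit():
        return f"{int(tokens[1]):04d}-{MONTHS[tokens[0]]:02d}"
    return " ".join(tokens)

def convert_date(gedcom_date):
    """Rewrite each plain date in the string, preserving GEDCOM keywords."""
    out, pending = [], []
    for token in gedcom_date.upper().split():
        if token in KEYWORDS:
            if pending:
                out.append(convert_simple(pending))
                pending = []
            out.append(token)
        else:
            pending.append(token)
    if pending:
        out.append(convert_simple(pending))
    return " ".join(out)
```

For instance, `convert_date("AFT 15 MAR 1900")` returns "AFT 1900-03-15", matching the corresponding table row.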

Evidence Chains

GEDCOM attaches source citations directly to facts. The importer expands these into complete evidence chains with separate Source, Citation, and Assertion entities.

GEDCOM:

0 @I1@ INDI
1 BIRT
2 DATE 15 JAN 1850
2 SOUR @S1@
3 PAGE Page 23
3 DATA
4 TEXT "Born January 15, 1850"

GENEALOGIX:

yaml
sources:
  source-1:
    title: "Birth Certificate"
    type: vital_record

citations:
  citation-1:
    source: source-1
    properties:
      locator: "Page 23"
      text_from_source: "Born January 15, 1850"

assertions:
  assertion-1:
    subject:
      person: person-1
    property: born_on
    value: "1850-01-15"
    citations: [citation-1]

Assertions require citations

The importer only creates assertions when citations exist. Properties without source citations are stored directly on the entity without an assertion wrapper.

Pedigree Types

GEDCOM FAMC.PEDI tags specify the nature of parent-child relationships. The importer maps these to relationship types.

| PEDI Value | GLX Relationship Type |
| --- | --- |
| birth | biological_parent_child |
| adopted | adoptive_parent_child |
| foster | foster_parent_child |
| (empty or unknown) | parent_child |
| Any other value | parent_child |
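The mapping above amounts to a lookup with a default. In this sketch the name `relationship_type` is hypothetical, and lowercasing the PEDI value before the lookup is an assumption made so that differently cased values are treated alike.

```python
PEDI_TO_RELATIONSHIP = {
    "birth": "biological_parent_child",
    "adopted": "adoptive_parent_child",
    "foster": "foster_parent_child",
}

def relationship_type(pedi=None):
    """Map a FAMC.PEDI value to a relationship type, defaulting to parent_child."""
    normalized = (pedi or "").strip().lower()  # assumed normalization
    return PEDI_TO_RELATIONSHIP.get(normalized, "parent_child")
```

Empty, missing, and unrecognized values all fall through to the generic `parent_child` type.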

Address Handling

GEDCOM ADDR records with subfields (ADR1, ADR2, CITY, STAE, POST, CTRY) are handled in two ways:

  1. Full address text: Preserved in the entity's address properties
  2. Place hierarchy fallback: When no PLAC tag is present on an event, ADDR subfields (CITY, STAE, CTRY) are used to build a place hierarchy

Census Records

Census records receive special handling. Instead of creating events, the importer:

  1. Creates a synthetic Source and Citation (titled "Census of {date}" or using the TYPE subrecord)
  2. Creates temporal residence properties on the person when a place is present
  3. Links everything through assertions backed by the census citation
  4. For family-level CENS records, applies the same data to both spouses

Post-Migration Workflow

Validate Your Import

bash
# Validate the imported archive
glx validate family-archive/

# Or for single-file
glx validate family.glx

Fix any reported errors and re-validate. Use --show-first-errors 0 to see all errors at once.

Review Import Results

After importing, review the results:

  • Entity counts: Check that the import statistics match expectations for your tree
  • Relationships: Verify parent-child and spouse relationships were created correctly
  • Places: Review the inferred place hierarchy and types
  • Evidence chains: Spot-check that source citations created proper assertions

Enhancement Opportunities

The automated import creates a solid foundation. Consider enhancing:

  • Confidence levels: Add confidence: high/medium/low to assertions
  • Assertion status: Add status: proven/speculative/disproven to track research verification
  • Transcriptions: Add text_from_source to citations
  • Place details: Add coordinates and refine place types
  • Research notes: Add notes to entities documenting your analysis
  • Custom vocabularies: Extend vocabulary files for domain-specific event or relationship types

Git Tracking

Initialize version control for your archive:

bash
cd family-archive/
git init
git add .
git commit -m "Import from GEDCOM: family.ged"

Troubleshooting

Common Issues

"Failed to open GEDCOM file": Check that the file path is correct and the file is readable.

Validation errors after import: Run with --no-validate to skip validation and import anyway, then fix issues manually. Use --show-first-errors 0 to see all errors.

Missing media files: Media files referenced by relative paths must exist alongside the .ged file. The importer copies them to media/files/ in your archive. Check that the files exist at the paths specified in your GEDCOM.

Garbled text or encoding issues: The parser handles UTF-8 (with or without BOM) and standard line endings (LF, CRLF, CR). If your file uses a different encoding, convert it to UTF-8 first.
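One way to do that conversion is a few lines of Python. This is a sketch, not part of the glx tooling: `convert_to_utf8` is a hypothetical helper, and the default source encoding of Windows-1252 is only a guess at what an older GEDCOM export might use, so pass the encoding your file actually has.

```python
from pathlib import Path

def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode a text file as UTF-8, leaving line endings untouched."""
    # Working on bytes avoids any newline translation by the runtime.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))

# Example (file names are placeholders):
# convert_to_utf8("family.ged", "family-utf8.ged", source_encoding="cp1252")
```

The sketch does not touch the header's CHAR line, so you may want to update that line by hand to reflect the new encoding.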

Large GEDCOM files: The parser uses a line buffer of up to 1 MB, so very large files, including those with long continuation lines, should import without issues.

Extension tags not imported: Custom tags starting with _ (e.g., _MARNM, _PRIM) are recognized but not stored in the GLX output. These are logged as warnings in verbose mode.
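To see in advance which extension tags a file contains, you could scan it yourself before importing. The sketch below is independent of the importer: `count_extension_tags` is a hypothetical helper that assumes the usual `LEVEL [XREF] TAG` line layout and simply counts tags beginning with an underscore.

```python
import re
from collections import Counter

# Captures the tag of a line when it starts with an underscore.
EXTENSION_TAG = re.compile(r"^\s*\d+\s+(?:@[^@]+@\s+)?(_\w+)")

def count_extension_tags(lines):
    """Count occurrences of each underscore-prefixed tag in GEDCOM lines."""
    counts = Counter()
    for line in lines:
        match = EXTENSION_TAG.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Running it over the lines of your .ged file gives a per-tag tally of data that will not appear in the GLX output.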

GEDCOM Version Differences

Most differences are handled transparently by the importer, but it helps to know what to expect.

| Feature | GEDCOM 5.5.1 | GEDCOM 7.0 |
| --- | --- | --- |
| Shared notes | NOTE records with XRef | SNOTE tag |
| Media MIME types | Inferred from FORM tag | Explicit MIME tag |
| External IDs | Not supported | EXID tag with optional TYPE |
| Negative assertions | Not supported | NO tag (e.g., NO BIRT) |
| Crop coordinates | Not supported | CROP tag on media |
| Extension schemas | Convention only (_ prefix) | SCHMA tag with URI definitions |
| Binary data | BLOB tag (deprecated) | Not supported |
| Void pointers | Not used | @VOID@ for embedded structures |

See Also

Licensed under Apache License 2.0