Documentation for Neo4j Data Migration and Validation

This documentation covers migrating fauna data from CSV files into a Neo4j graph database and then validating the integrity of the migration with automated tests. The process is divided into three main components: utility functions for setting up and managing the database and creating nodes (fauna_taxon_migration.py), a migration script that creates the relationships between those nodes (fauna_taxon_migration_relationships.py), and a test script that validates the data migration (test_taxon_data_migration.py).

1. Database Management: fauna_taxon_migration.py

This script contains utility functions for managing the graph database structure, such as creating indexes, creating nodes and processing batches of node data from CSV files.
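The three tasks named above (creating indexes, creating nodes, and batching CSV rows) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in fauna_taxon_migration.py. The helper names, the `run(query, **params)` calling convention (which matches `Session.run` in the official neo4j Python driver), and the label and key names used in examples (`FaunaTaxon`, `taxon_id`) are all hypothetical.

```python
import csv
from itertools import islice
from typing import Callable, Iterable, Iterator

# Assumed calling convention: run(query, **params), as in neo4j's Session.run.
RunFn = Callable[..., object]


def create_index(run: RunFn, label: str, prop: str) -> str:
    """Create an index on label.prop if it does not already exist."""
    # Labels and property names cannot be Cypher parameters, so they are
    # interpolated here: pass only trusted, hard-coded names.
    name = f"idx_{label.lower()}_{prop}"
    query = f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON (n.{prop})"
    run(query)
    return query


def batched(rows: Iterable[dict], size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def create_nodes(run: RunFn, lines: Iterable[str], label: str, key: str,
                 batch_size: int = 1000) -> int:
    """MERGE one node per CSV row, sending the rows to Neo4j in batches."""
    query = (
        "UNWIND $rows AS row "
        f"MERGE (n:{label} {{{key}: row.{key}}}) "
        "SET n += row"
    )
    total = 0
    for batch in batched(csv.DictReader(lines), batch_size):
        run(query, rows=batch)  # one round trip per batch instead of per row
        total += len(batch)
    return total
```

With a live driver, `session.run` could be passed as `run` and an open CSV file as `lines`. Using MERGE on the key makes a re-run update existing nodes rather than duplicate them, and creating the index first lets each MERGE look the key up efficiently. Note that csv.DictReader yields every value as a string.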

There are 7 node types:

Functionalities

Execution

Run the script directly to perform database setup tasks or node data migrations as needed. Modify the main() function to include or exclude specific operations.
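As an alternative to editing main() for each run, the operations could be selected on the command line. The sketch below is hypothetical: the step names and the placeholder step functions are not taken from fauna_taxon_migration.py, and only the argparse usage is standard library behaviour.

```python
import argparse
from typing import Callable, Dict, List, Optional, Sequence


def create_indexes() -> None:
    print("creating indexes")  # placeholder for the real index setup


def migrate_taxon_nodes() -> None:
    print("migrating fauna_taxon nodes")  # placeholder for the real node migration


# Dict order is execution order: indexes first, so node MERGEs can use them.
STEPS: Dict[str, Callable[[], None]] = {
    "indexes": create_indexes,
    "taxon-nodes": migrate_taxon_nodes,
}


def main(argv: Optional[Sequence[str]] = None) -> List[str]:
    """Run all steps, or only those listed after --only; return what ran."""
    parser = argparse.ArgumentParser(description="Fauna taxon node migration")
    parser.add_argument("--only", nargs="+", choices=list(STEPS),
                        help="run only these steps (default: all)")
    args = parser.parse_args(argv)
    selected = set(args.only or STEPS)
    ran = []
    for name, step in STEPS.items():
        if name in selected:
            step()
            ran.append(name)
    return ran
```

Calling main() from an if __name__ == "__main__": guard would then allow, for example, python fauna_taxon_migration.py --only indexes. Keeping execution order in the STEPS dict (rather than in the argument order) keeps index creation ahead of node creation whatever the user types.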

2. Data Relationship Migration: fauna_taxon_migration_relationships.py

Overview

TaxonRelationshipBuilder is a Python class that migrates taxonomic relationships from CSV files into a Neo4j graph database. It manages the database connection, processes CSV files in batches, and creates the various types of relationships between the node types described above.
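The batched relationship creation described above can be sketched as a pair of helpers. This is a hedged illustration, not the class's actual implementation: the `source`/`target` CSV column names, the default `id` key, and example names such as `OCCURS_IN` and `Region` are assumptions made for the sketch.

```python
import csv
from itertools import islice
from typing import Callable, Iterable

# Assumed calling convention: run(query, **params), as in neo4j's Session.run.
RunFn = Callable[..., object]


def relationship_query(src_label: str, rel_type: str, dst_label: str,
                       key: str = "id") -> str:
    """Cypher that links two existing nodes per row; names must be trusted."""
    return (
        "UNWIND $rows AS row "
        f"MATCH (a:{src_label} {{{key}: row.source}}) "
        f"MATCH (b:{dst_label} {{{key}: row.target}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )


def create_relationships(run: RunFn, lines: Iterable[str], src_label: str,
                         rel_type: str, dst_label: str, key: str = "id",
                         batch_size: int = 500) -> int:
    """Send CSV rows (assumed 'source'/'target' columns) to Neo4j in batches."""
    query = relationship_query(src_label, rel_type, dst_label, key)
    reader = csv.DictReader(lines)
    total = 0
    while batch := list(islice(reader, batch_size)):
        # Forward only the two endpoint columns; other CSV columns are ignored.
        params = [{"source": r["source"], "target": r["target"]} for r in batch]
        run(query, rows=params)
        total += len(batch)
    return total  # rows sent, not relationships actually created
```

Because the endpoints are matched with MATCH, a row whose source or target node does not exist is silently skipped, so the returned count is an upper bound. This is one reason the node migration has to run before the relationship migration, and why a separate validation step is useful.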

There are 9 relationship types that connect the fauna_taxon nodes with other nodes:

Key Features

Usage

  1. Initialization: Create an instance of TaxonRelationshipBuilder with the Neo4j connection URI and optional credentials.
  2. Build Relationships: Call build_all_taxon_relationships() to start processing predefined CSV files and creating relationships in the database.
  3. Clean-up: The close() method is automatically called to close the database connection once operations are complete.
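The three usage steps above (initialize, build, clean up) can be sketched as a class skeleton. Only the names TaxonRelationshipBuilder, build_all_taxon_relationships() and close() come from this document; the query list, the return value, and the driver_factory hook (added so the class can be exercised without a database) are assumptions. GraphDatabase.driver(uri, auth=...) is the official neo4j Python driver's entry point.

```python
from typing import Any, Callable, Optional


class TaxonRelationshipBuilder:
    """Lifecycle sketch; the query list and driver_factory hook are assumptions."""

    # Hypothetical placeholder: the real class works from predefined CSV files.
    RELATIONSHIP_QUERIES = (
        "MATCH (c:FaunaTaxon), (p:FaunaTaxon) WHERE c.parent_id = p.taxon_id "
        "MERGE (c)-[:CHILD_OF]->(p)",
    )

    def __init__(self, uri: str, user: Optional[str] = None,
                 password: Optional[str] = None,
                 driver_factory: Optional[Callable[..., Any]] = None) -> None:
        if driver_factory is None:
            # Imported lazily so the sketch can be tested without the package.
            from neo4j import GraphDatabase
            driver_factory = GraphDatabase.driver
        auth = (user, password) if user is not None else None
        self._driver = driver_factory(uri, auth=auth)

    def build_all_taxon_relationships(self) -> int:
        """Run every relationship query, then close the connection regardless."""
        executed = 0
        try:
            with self._driver.session() as session:
                for query in self.RELATIONSHIP_QUERIES:
                    session.run(query)
                    executed += 1
        finally:
            self.close()  # runs even if a query raises
        return executed

    def close(self) -> None:
        self._driver.close()
```

Typical use would be builder = TaxonRelationshipBuilder("bolt://localhost:7687", "neo4j", "<password>") followed by builder.build_all_taxon_relationships(). Calling close() from a finally block is one way to make the clean-up step automatic even when a batch fails; implementing __enter__/__exit__ would be an equally valid choice.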

3. Data Migration Validation: test_taxon_data_migration.py

This script uses pytest to define and run tests that validate the integrity of the data migration process.
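One kind of integrity check such a test could perform is comparing the identifiers in a source CSV with those stored in the graph. The helpers below are a hypothetical sketch, not the contents of test_taxon_data_migration.py; the column name `taxon_id` in the examples is an assumption. They use plain assert statements, which is what pytest expects.

```python
import csv
from typing import Callable, Iterable, Set


def csv_keys(lines: Iterable[str], key: str) -> Set[str]:
    """Collect the identifier column from a source CSV."""
    return {row[key] for row in csv.DictReader(lines)}


def missing_keys(expected: Iterable[str],
                 fetch_db_keys: Callable[[], Iterable[object]]) -> Set[str]:
    """Identifiers present in the CSV but absent from the graph."""
    # CSV values are strings; normalise database values so 1 and "1" match.
    return set(expected) - {str(k) for k in fetch_db_keys()}


def check_migration(lines: Iterable[str], key: str,
                    fetch_db_keys: Callable[[], Iterable[object]]) -> None:
    """Fail with a short, readable message if any CSV row was not migrated."""
    missing = missing_keys(csv_keys(lines, key), fetch_db_keys)
    assert not missing, f"{len(missing)} rows not migrated, e.g. {sorted(missing)[:5]}"
```

Inside a pytest test function, fetch_db_keys could be a lambda that runs a query such as MATCH (n:FaunaTaxon) RETURN n.taxon_id AS taxon_id through a driver session and returns [r["taxon_id"] for r in result]. Comparing sets rather than counts reports which rows are missing, not just how many.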

Key Components

Running Tests

Run the tests with pytest, for example: pytest test_taxon_data_migration.py. Before running them, make sure Neo4j is running and reachable at the URI configured in the tests.
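Whether Neo4j is reachable at the configured URI can be checked up front with the standard library alone. This sketch only tests that the Bolt TCP port accepts connections (7687 is Neo4j's default Bolt port); it does not check credentials. With the official driver, driver.verify_connectivity() performs a fuller check. The helper names are hypothetical.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit

DEFAULT_BOLT_PORT = 7687  # Neo4j's default Bolt port


def bolt_address(uri: str) -> Tuple[str, int]:
    """Extract (host, port) from a bolt:// or neo4j:// style URI."""
    parts = urlsplit(uri)
    return parts.hostname or "localhost", parts.port or DEFAULT_BOLT_PORT


def neo4j_reachable(uri: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URI's host and port succeeds."""
    host, port = bolt_address(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False
```

In a pytest module, a check like this could feed pytest.skip("Neo4j not reachable", allow_module_level=True), so that the suite is reported as skipped rather than failing with connection errors when the database is down.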

Note: development branch name: feature_taxon_data_migration