This documentation covers migrating fauna data from CSV files into a Neo4j graph database and then validating the integrity of the migration with automated tests. The process has three main components: data migration for creating nodes and relationships (fauna_taxon_migration_relationships.py), utility scripts for setting up and managing the database (fauna_taxon_migration.py), and test scripts for validating the data migration (test_taxon_data_migration.py).
fauna_taxon_migration.py contains utility functions for managing the graph database structure, such as creating indexes, creating nodes, and processing batches of node data from CSV files.
There are 7 node types:
create_indexs_for_graphs() and drop_indexes_for_graphs() manage indexes for faster query performance.
Batch transaction functions exist for each node type (batch_taxon_transaction, batch_area_transaction, etc.), allowing for flexible and organized data migration.
Run the script directly to perform database setup tasks or node data migrations as needed. Modify the main() function to include or exclude specific operations.
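The index and batch steps described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the fauna_taxon label, the taxon_id property, the index name, and the helper names read_batches, write_taxon_batch, migrate_rows, and migrate_csv are all assumptions. It also assumes a Neo4j Python driver version that provides session.execute_write (older versions call it write_transaction).

```python
import csv
from itertools import islice

# Assumed names: the fauna_taxon label, taxon_id property and index name
# are placeholders, not taken from the project.
INDEX_QUERY = (
    "CREATE INDEX fauna_taxon_id IF NOT EXISTS "
    "FOR (t:fauna_taxon) ON (t.taxon_id)"
)
MERGE_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (t:fauna_taxon {taxon_id: row.taxon_id}) "
    "SET t += row"
)


def read_batches(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def write_taxon_batch(tx, batch):
    """Transaction function: upsert one batch of rows with a single query."""
    tx.run(MERGE_QUERY, rows=batch)


def migrate_rows(session, rows, batch_size=1000):
    """Create the index, then write rows batch by batch; return the row count."""
    session.run(INDEX_QUERY).consume()
    total = 0
    for batch in read_batches(rows, batch_size):
        session.execute_write(write_taxon_batch, batch)
        total += len(batch)
    return total


def migrate_csv(session, csv_path, batch_size=1000):
    """Stream a CSV file through migrate_rows without loading it all at once."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return migrate_rows(session, csv.DictReader(handle), batch_size)
```

Sending each batch as one UNWIND query keeps the number of round trips to the database low, and creating the index first lets MERGE look up existing nodes without scanning every node with that label.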
TaxonRelationshipBuilder is a Python class that migrates taxonomic relationships from CSV files into a Neo4j graph database. It manages connections to the database, processes CSV files in batches, and creates the various types of relationships between the nodes mentioned above.
There are 9 relationships that connect the fauna_taxon nodes with other nodes:
Uses ThreadPoolExecutor for parallel processing of data batches, improving efficiency.
Uses the logging module for informative and debuggable output.
Instantiate TaxonRelationshipBuilder with the Neo4j connection URI and optional credentials.
Call build_all_taxon_relationships() to start processing the predefined CSV files and creating relationships in the database.
The close() method is called automatically to close the database connection once operations are complete.
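The parallel batch pattern described above might look roughly like the sketch below. It is not the real TaxonRelationshipBuilder: the class name RelationshipBuilderSketch, the build method, the HAS_PARENT relationship type, and the taxon_id and parent_id columns are hypothetical. To stay self-contained, the sketch takes an already created driver object instead of a URI and credentials, and it uses a context manager as one possible way to have close() called automatically.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)

# Hypothetical relationship query: type and property names are assumptions.
PARENT_QUERY = (
    "UNWIND $rows AS row "
    "MATCH (child:fauna_taxon {taxon_id: row.taxon_id}) "
    "MATCH (parent:fauna_taxon {taxon_id: row.parent_id}) "
    "MERGE (child)-[:HAS_PARENT]->(parent)"
)


class RelationshipBuilderSketch:
    """Writes relationship batches in parallel using a shared driver."""

    def __init__(self, driver, max_workers=4):
        self.driver = driver
        self.max_workers = max_workers

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, traceback):
        # Leaving the "with" block closes the connection automatically.
        self.close()

    def close(self):
        self.driver.close()

    def _write_batch(self, query, batch):
        # Each task opens its own session: the driver can be shared
        # between threads, but a session should not be.
        with self.driver.session() as session:
            session.run(query, rows=batch).consume()
        return len(batch)

    def build(self, query, batches):
        """Submit every batch to the thread pool and return the row count."""
        total = 0
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            futures = [pool.submit(self._write_batch, query, b) for b in batches]
            for future in as_completed(futures):
                total += future.result()  # re-raises any worker exception
                logger.info("Processed %d rows so far", total)
        return total
```

With a real database, the driver would come from something like `GraphDatabase.driver(uri, auth=(user, password))`, and the builder would be used as `with RelationshipBuilderSketch(driver) as builder: builder.build(PARENT_QUERY, batches)`. Parallel writes that touch the same nodes can contend for locks, so a small worker count is a reasonable starting point.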
test_taxon_data_migration.py uses pytest to define and run tests that validate the integrity of the data migration process.
Defines a pytest fixture, neo4j_driver, that sets up and tears down the Neo4j connection for tests.
The test_node_types_exist function checks for the existence of expected node labels in the database.
The test_relationship_types_exist function verifies that all expected relationship types are present.
Execute the tests using the pytest command. Ensure Neo4j is running and accessible at the specified URI.
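A test module along these lines could look like the sketch below. The fixture and test names follow the description above, but everything else is an assumption: the environment variable names, the default URI bolt://localhost:7687, the fetch_column helper, and the expected label and relationship type, which are placeholders to be replaced with the project's real values.

```python
import os

import pytest

# Assumed configuration: variable names and the default URI are placeholders.
NEO4J_URI = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
NEO4J_USER = os.environ.get("NEO4J_USER", "neo4j")
NEO4J_PASSWORD = os.environ.get("NEO4J_PASSWORD")

# Hypothetical expectations: replace with the project's real names.
EXPECTED_LABELS = {"fauna_taxon"}
EXPECTED_RELATIONSHIP_TYPES = {"HAS_PARENT"}


@pytest.fixture(scope="module")
def neo4j_driver():
    """Open one driver for the whole module and close it afterwards."""
    neo4j = pytest.importorskip("neo4j")  # skip if the driver is not installed
    auth = (NEO4J_USER, NEO4J_PASSWORD) if NEO4J_PASSWORD else None
    driver = neo4j.GraphDatabase.driver(NEO4J_URI, auth=auth)
    try:
        driver.verify_connectivity()
    except Exception as exc:
        driver.close()
        pytest.skip(f"Neo4j not reachable at {NEO4J_URI}: {exc}")
    yield driver
    driver.close()


def fetch_column(driver, query, column):
    """Run a query and collect one column of the result into a set."""
    with driver.session() as session:
        return {record[column] for record in session.run(query)}


def test_node_types_exist(neo4j_driver):
    labels = fetch_column(
        neo4j_driver, "CALL db.labels() YIELD label RETURN label", "label"
    )
    missing = EXPECTED_LABELS - labels
    assert not missing, f"missing node labels: {sorted(missing)}"


def test_relationship_types_exist(neo4j_driver):
    rel_types = fetch_column(
        neo4j_driver,
        "CALL db.relationshipTypes() YIELD relationshipType "
        "RETURN relationshipType",
        "relationshipType",
    )
    missing = EXPECTED_RELATIONSHIP_TYPES - rel_types
    assert not missing, f"missing relationship types: {sorted(missing)}"
```

In this sketch the fixture skips the tests when the database cannot be reached. If an unreachable database should count as a failure instead, the try/except around verify_connectivity() can be removed.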
Note: the development branch is feature_taxon_data_migration.