docs Add spec for Companies House UK's 2025-05-01 data product (CSV file). 2025-06-01 12:45:53 +01:00
infra Add docker-compose for ArangoDB setup. 2025-06-01 12:44:36 +01:00
src Rename source file. 2025-06-01 16:15:57 +01:00
.gitignore Add configs. 2025-06-01 12:15:09 +01:00
.npmrc Add configs. 2025-06-01 12:15:09 +01:00
config.ts Add configs. 2025-06-01 12:15:09 +01:00
LICENSE.md Update project description; add README and LICENSE files. 2025-06-01 16:11:28 +01:00
package-lock.json Add configs. 2025-06-01 12:15:09 +01:00
package.json Update project description; add README and LICENSE files. 2025-06-01 16:11:28 +01:00
prettier.config.mts Update Prettier configs - override for Markdown files. 2025-06-01 16:14:09 +01:00
README.md Modify README file: move badges. 2025-06-01 16:17:46 +01:00
tsconfig.json Add configs. 2025-06-01 12:15:09 +01:00


Companies House UK Bulk Data Processor

A Node.js script for processing bulk data obtained from Companies House UK in CSV format. Useful for normalising and filtering the data and uploading it to databases such as ArangoDB or Google BigQuery.

Provides a solid foundation for company analysis projects (e.g., augmenting company information with web scraping) for market research purposes. Can be further enhanced with an AI interface (e.g., an LLM interface or an MCP server to power agents).

Features

  • Designed for bulk operations on Companies House datasets (CSV files)

  • CSV data processing and normalisation

  • Custom filtering and transformation capabilities

  • Database integration (ArangoDB, Google BigQuery) for efficient batch inserts

  • Coming Soon: Extensible architecture for additional processing steps

Installation

  1. Clone this repository:

    git clone https://github.com/yourusername/companies-house-processor.git
    cd companies-house-processor
    
  2. Install project dependencies:

    npm install
    
  3. Configure your environment by copying the example env file:

    cp .env.example .env
    

Usage

  1. (Optional) Place your Companies House CSV files in the input directory.

  2. Configure your processing parameters in config.ts.

  3. Ensure the .env file matches the processing parameters set in config.ts.

  4. Run the processor:

    npm start
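
Step 3 can be checked in code before a run starts. The following is a minimal sketch, not the project's actual validation: the target names and every environment variable name below are hypothetical, so consult .env.example and config.ts for the real ones. It reports which variables required by the chosen database target are missing or empty.

```typescript
// Hypothetical database targets; config.ts may name them differently.
type Target = "arangodb" | "bigquery";

// Hypothetical variable names for illustration only; see .env.example for the real ones.
const REQUIRED_ENV: Record<Target, string[]> = {
  arangodb: ["ARANGO_URL", "ARANGO_DATABASE", "ARANGO_USERNAME", "ARANGO_PASSWORD"],
  bigquery: ["BIGQUERY_PROJECT_ID", "BIGQUERY_DATASET"],
};

// Returns the names of required variables that are unset or blank for the given target.
function missingEnvVars(
  target: Target,
  env: Record<string, string | undefined>,
): string[] {
  return REQUIRED_ENV[target].filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example with an in-memory environment; the password is deliberately left empty.
const missing = missingEnvVars("arangodb", {
  ARANGO_URL: "http://localhost:8529",
  ARANGO_DATABASE: "companies",
  ARANGO_USERNAME: "root",
  ARANGO_PASSWORD: "",
});
```

In a real run the second argument would be `process.env`, and a non-empty result would be a reason to stop before opening the CSV file.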
    

Data Flow

See the processCsvFile(...) call in the body of main() (src/index.ts):

  1. The database client is initialised.

  2. A read stream is opened on the input CSV file, whose path is specified in config.ts.

  3. The CSV file is traversed to parse records according to the Record schema (src/models/record.ts).

  4. The callback is executed for each parsed record. It contains the logic for any filtering, transformation, and enqueuing for database insertion.

  5. After all records have been processed, the database client is flushed.
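
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in src/index.ts: `CompanyRecord`, `BatchingClient`, `parseCsvLine`, and `processCsvLines` are hypothetical names, the three-column header is a simplified stand-in for the real Companies House columns, the sample rows are made up, and an in-memory array of lines stands in for the read stream.

```typescript
// Hypothetical record shape; the real schema lives in src/models/record.ts.
interface CompanyRecord {
  companyName: string;
  companyNumber: string;
  companyStatus: string;
}

// Stand-in for a database client that buffers records and writes them in batches.
class BatchingClient {
  readonly written: CompanyRecord[][] = [];
  private buffer: CompanyRecord[] = [];

  constructor(private readonly batchSize: number) {}

  enqueue(record: CompanyRecord): void {
    this.buffer.push(record);
    if (this.buffer.length >= this.batchSize) this.flush();
  }

  // Writes whatever is still buffered; called once more after the last record.
  flush(): void {
    if (this.buffer.length === 0) return;
    this.written.push(this.buffer);
    this.buffer = [];
  }
}

// Splits one CSV line into fields, honouring double-quoted fields and "" escapes.
function parseCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        current += '"';
        i++; // skip the second quote of the escape pair
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Skips the header line, parses each remaining line, and hands the record to the callback.
function processCsvLines(
  lines: string[],
  onRecord: (record: CompanyRecord) => void,
): number {
  let count = 0;
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i] ?? "";
    if (line.trim() === "") continue;
    const fields = parseCsvLine(line);
    onRecord({
      companyName: fields[0] ?? "",
      companyNumber: fields[1] ?? "",
      companyStatus: fields[2] ?? "",
    });
    count++;
  }
  return count;
}

// Made-up sample rows for illustration.
const lines = [
  "CompanyName,CompanyNumber,CompanyStatus",
  '"Acme, Ltd",01234567,Active',
  "Example Trading Ltd,07654321,Dissolved",
  "Sample Holdings Plc,00000001,Active",
  "Demo Services Ltd,00000002,Active",
];

const client = new BatchingClient(2); // step 1: initialise the client
const total = processCsvLines(lines, (record) => {
  if (record.companyStatus !== "Active") return; // filter
  client.enqueue({ ...record, companyName: record.companyName.toUpperCase() }); // transform, enqueue
});
client.flush(); // step 5: write the final partial batch
```

Keeping the filter and transformation inside the callback means the parsing loop stays unchanged when the processing rules change, and the final `flush()` ensures a partial last batch is not lost.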

Contribution

Pull requests are welcome!

For major changes, please open an issue first to discuss what you would like to change.

License

MIT