Companies House UK Bulk Data Processor
A Node.js script for processing bulk data obtained from Companies House UK in CSV format. Useful for normalising, filtering, and uploading to databases such as ArangoDB or Google BigQuery.
Provides a solid foundation for company analysis projects for market research purposes (e.g., augmenting company information with web scraping). It can be extended with an AI interface (e.g., an LLM interface or an MCP server to power agents).
Features
- Designed for bulk operations on Companies House datasets (CSV files)
- CSV data processing and normalisation
- Custom filtering and transformation capabilities
- Database integration (ArangoDB, Google BigQuery) for efficient batch inserts
- Coming Soon: Extensible architecture for additional processing steps
Installation
- Clone this repository:
      git clone https://github.com/yourusername/companies-house-processor.git
      cd companies-house-processor
- Install project dependencies:
      npm install
- Configure your environment by copying the example env file:
      cp .env.example .env
Usage
- (Optional) Place your Companies House CSV files in the input directory.
- Configure your processing parameters in config.ts.
- Ensure the .env file matches the processing parameters set in config.ts.
- Run the processor:
      npm start
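As a rough illustration of how config.ts and .env relate, the sketch below checks that the environment variables needed by the selected database are present before a run. Everything named here is an assumption made for illustration: the ProcessorConfig shape, the missingEnvVars helper, the example input path, and the environment variable names are hypothetical and may differ from the project's actual config.ts and .env.example.

```typescript
// Hypothetical config shape; the real config.ts may use different names.
type DatabaseTarget = "arangodb" | "bigquery";

interface ProcessorConfig {
  inputPath: string;
  database: DatabaseTarget;
  batchSize: number;
}

// Hypothetical variable names, for illustration only.
// Check .env.example for the names the project actually expects.
const REQUIRED_ENV: Record<DatabaseTarget, string[]> = {
  arangodb: ["ARANGO_URL", "ARANGO_DATABASE"],
  bigquery: ["BIGQUERY_PROJECT_ID", "BIGQUERY_DATASET"],
};

// Returns the names of required variables that are missing or empty.
function missingEnvVars(
  config: ProcessorConfig,
  env: Record<string, string | undefined>,
): string[] {
  return REQUIRED_ENV[config.database].filter((name) => !env[name]);
}

const exampleConfig: ProcessorConfig = {
  inputPath: "input/companies.csv", // hypothetical path
  database: "arangodb",
  batchSize: 500,
};

// In a real run, pass process.env as the second argument.
const missing = missingEnvVars(exampleConfig, {
  ARANGO_URL: "http://localhost:8529",
});
// missing is ["ARANGO_DATABASE"]
```

Failing fast on a non-empty result avoids discovering a missing credential only after a large CSV file has been partly processed.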
Data Flow
See the processCsvFile(...) call in the body of main() (src/index.ts):
- The database client is initialised.
- A read stream is opened to the input CSV file whose path is specified in config.ts.
- The CSV file is traversed to parse records according to the Record schema (src/models/record.ts).
- For each record, the callback is executed; it contains the logic for filtering, transformation, and enqueuing for database insertion.
- After all records have been processed, the database client is flushed.
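The flow above can be sketched in a few dozen lines. This is a simplified illustration, not the code in src/index.ts: the CompanyRecord fields, the BatchingClient class, the parseRecord helper, and the "Active" status filter are all hypothetical, and the parser is deliberately naive (it splits on commas and does not handle quoted fields). The sketch takes lines as an AsyncIterable so it stays dependency-free; in Node.js, readline.createInterface({ input: fs.createReadStream(path) }) yields lines in that form.

```typescript
// Hypothetical record shape; the real schema lives in src/models/record.ts.
interface CompanyRecord {
  companyName: string;
  companyNumber: string;
  status: string;
}

// Hypothetical stand-in for a database client that buffers rows
// and writes them in batches.
class BatchingClient {
  readonly written: CompanyRecord[][] = [];
  private buffer: CompanyRecord[] = [];
  private readonly batchSize: number;

  constructor(batchSize: number) {
    this.batchSize = batchSize;
  }

  enqueue(record: CompanyRecord): void {
    this.buffer.push(record);
    if (this.buffer.length >= this.batchSize) this.flush();
  }

  // Writes whatever is buffered; a real client would issue a bulk insert here.
  flush(): void {
    if (this.buffer.length === 0) return;
    this.written.push(this.buffer);
    this.buffer = [];
  }
}

// Naive parser: splits on commas and does not handle quoted fields.
function parseRecord(line: string): CompanyRecord | null {
  const [companyName, companyNumber, status] = line
    .split(",")
    .map((field) => field.trim());
  if (!companyName || !companyNumber || !status) return null;
  return { companyName, companyNumber, status };
}

// Traverses the lines, skips the header row, and hands each parsed record
// to the callback.
async function processCsvFile(
  lines: AsyncIterable<string>,
  onRecord: (record: CompanyRecord) => void,
): Promise<void> {
  let isHeader = true;
  for await (const line of lines) {
    if (isHeader) {
      isHeader = false;
      continue;
    }
    const record = parseRecord(line);
    if (record) onRecord(record);
  }
}

async function run(
  lines: AsyncIterable<string>,
  batchSize: number,
): Promise<BatchingClient> {
  const client = new BatchingClient(batchSize); // initialise the client
  await processCsvFile(lines, (record) => {
    if (record.status !== "Active") return; // filter
    client.enqueue({
      ...record,
      companyName: record.companyName.toUpperCase(), // transform, then enqueue
    });
  });
  client.flush(); // write any records still buffered
  return client;
}
```

The final flush matters because the last batch is usually smaller than the batch size and would otherwise never be written.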
Contribution
Pull requests are welcome!
For major changes, please open an issue first to discuss what you would like to change.