Skip to content
This repository has been archived by the owner on Sep 28, 2022. It is now read-only.

Datasources

Brian Evans edited this page Dec 2, 2021 · 4 revisions

Datasources

Where the data comes from

Companies

A list of companies with basic information is available at http://download.companieshouse.gov.uk/en_output.html.

The file format is CSV.

Import to postgres

  • use xsv command line tool to split the CSV into partitions of 2 million rows each.
xsv split -s 2000000 split_dir companies.csv
  • make sure database datestyle is set to dmy
  • use PGAdmin or psql to import each CSV using script below
psql --username=postgres --command "\copy public.companies_basic_info (\"CompanyName\", \"CompanyNumber\", \"RegAddress.CareOf\", \"RegAddress.POBox\", \"RegAddress.AddressLine1\", \"RegAddress.AddressLine2\", \"RegAddress.PostTown\", \"RegAddress.County\", \"RegAddress.Country\", \"RegAddress.PostCode\", \"CompanyCategory\", \"CompanyStatus\", \"CountryOfOrigin\", \"DissolutionDate\", \"IncorporationDate\", \"Accounts.AccountRefDay\", \"Accounts.AccountRefMonth\", \"Accounts.NextDueDate\", \"Accounts.LastMadeUpDate\", \"Accounts.AccountCategory\", \"Returns.NextDueDate\", \"Returns.LastMadeUpDate\", \"Mortgages.NumMortCharges\", \"Mortgages.NumMortOutstanding\", \"Mortgages.NumMortPartSatisfied\", \"Mortgages.NumMortSatisfied\", \"SICCode.SicText_1\", \"SICCode.SicText_2\", \"SICCode.SicText_3\", \"SICCode.SicText_4\", \"LimitedPartnerships.NumGenPartners\", \"LimitedPartnerships.NumLimPartners\", \"URI\", \"PreviousName_1.CONDATE\", \"PreviousName_1.CompanyName\", \"PreviousName_2.CONDATE\", \"PreviousName_2.CompanyName\", \"PreviousName_3.CONDATE\", \"PreviousName_3.CompanyName\", \"PreviousName_4.CONDATE\", \"PreviousName_4.CompanyName\", \"PreviousName_5.CONDATE\", \"PreviousName_5.CompanyName\", \"PreviousName_6.CONDATE\", \"PreviousName_6.CompanyName\", \"PreviousName_7.CONDATE\", \"PreviousName_7.CompanyName\", \"PreviousName_8.CONDATE\", \"PreviousName_8.CompanyName\", \"PreviousName_9.CONDATE\", \"PreviousName_9.CompanyName\", \"PreviousName_10.CONDATE\", \"PreviousName_10.CompanyName\", \"ConfStmtNextDueDate\", \"ConfStmtLastMadeUpDate\") FROM 'C:\Users\bme\projects\companies-house-database-manager\samples\companies\split\0.csv' DELIMITER ',' CSV HEADER QUOTE '\"' ENCODING 'UTF8';"

Creation script can be made by getting headers with xsv and then transforming in NodeJS to be SQL.

FAQ

SIC codes

A CSV of all SIC codes can be downloaded from companies house at https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/527619/SIC07_CH_condensed_list_en.csv/preview

Accounts

Company accounts by month: http://download.companieshouse.gov.uk/en_monthlyaccountsdata.html

These files are in iXBRL format, and need to be scanned with [Arelle] to be converted to CSV or database format.

Persons with significant control

PSC bulk download: http://download.companieshouse.gov.uk/en_pscdata.html

The file is ndjson format (new line deliminated JSON).

Officers data

Officers bulk downloads are not available for download on a website. You have to send them a message on their API forum and they send a link via email to download current officers data.

The file is a mixture of fixed width fields and < deliminated values. I have written a C program to convert this to pure < deliminated values which can be ingested directly to postgres with psql.

Post codes

Detailed information about every post code. Available from doogal.co.uk/files/postcodes.zip.

CSV format.

Streams

Companies stream

Any change to company information, when a confirmation statement is filed. Many events do not contain a change, and there are duplicates. Available on /companies as long running HTTP request.

PSC stream

Updates to person with significant control, long running HTTP request.

Officers stream

Update to directory/secretary.

Filing stream

Any filings, usually also triggers another event such as company, psc or officer.

Insolvency stream

A change to a companies insolvency status. If they go into administration or are liquidated etc.

Charges stream

Debentures and fixed charges.

Officers update file

A FTP file transfer once a day to keep up to date the bulk officers file.