-
Notifications
You must be signed in to change notification settings - Fork 0
Datasources
Where the data comes from
A list of companies with basic information is available at http://download.companieshouse.gov.uk/en_output.html.
The file format is CSV.
- use
xsv
command line tool to split the CSV into partitions of 2 million rows each.
xsv split -s 2000000 split_dir companies.csv
- make sure database
datestyle
is set todmy
- use PGAdmin or
psql
to import each CSV using script below
psql --username=postgres --command "\copy public.companies_basic_info (\"CompanyName\", \"CompanyNumber\", \"RegAddress.CareOf\", \"RegAddress.POBox\", \"RegAddress.AddressLine1\", \"RegAddress.AddressLine2\", \"RegAddress.PostTown\", \"RegAddress.County\", \"RegAddress.Country\", \"RegAddress.PostCode\", \"CompanyCategory\", \"CompanyStatus\", \"CountryOfOrigin\", \"DissolutionDate\", \"IncorporationDate\", \"Accounts.AccountRefDay\", \"Accounts.AccountRefMonth\", \"Accounts.NextDueDate\", \"Accounts.LastMadeUpDate\", \"Accounts.AccountCategory\", \"Returns.NextDueDate\", \"Returns.LastMadeUpDate\", \"Mortgages.NumMortCharges\", \"Mortgages.NumMortOutstanding\", \"Mortgages.NumMortPartSatisfied\", \"Mortgages.NumMortSatisfied\", \"SICCode.SicText_1\", \"SICCode.SicText_2\", \"SICCode.SicText_3\", \"SICCode.SicText_4\", \"LimitedPartnerships.NumGenPartners\", \"LimitedPartnerships.NumLimPartners\", \"URI\", \"PreviousName_1.CONDATE\", \"PreviousName_1.CompanyName\", \"PreviousName_2.CONDATE\", \"PreviousName_2.CompanyName\", \"PreviousName_3.CONDATE\", \"PreviousName_3.CompanyName\", \"PreviousName_4.CONDATE\", \"PreviousName_4.CompanyName\", \"PreviousName_5.CONDATE\", \"PreviousName_5.CompanyName\", \"PreviousName_6.CONDATE\", \"PreviousName_6.CompanyName\", \"PreviousName_7.CONDATE\", \"PreviousName_7.CompanyName\", \"PreviousName_8.CONDATE\", \"PreviousName_8.CompanyName\", \"PreviousName_9.CONDATE\", \"PreviousName_9.CompanyName\", \"PreviousName_10.CONDATE\", \"PreviousName_10.CompanyName\", \"ConfStmtNextDueDate\", \"ConfStmtLastMadeUpDate\") FROM 'C:\Users\bme\projects\companies-house-database-manager\samples\companies\split\0.csv' DELIMITER ',' CSV HEADER QUOTE '\"' ENCODING 'UTF8';"
Creation script can be made by getting headers with xsv
and then transforming in NodeJS to be SQL.
A CSV of all SIC codes can be downloaded from companies house at https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/527619/SIC07_CH_condensed_list_en.csv/preview
Company accounts by month: http://download.companieshouse.gov.uk/en_monthlyaccountsdata.html
These files are in iXBRL format, and need to be scanned with [Arelle] to be converted to CSV or database format.
PSC bulk download: http://download.companieshouse.gov.uk/en_pscdata.html
The file is ndjson format (new line deliminated JSON).
Officers bulk downloads are not available for download on a website. You have to send them a message on their API forum and they send a link via email to download current officers data.
The file is a mixture of fixed width fields and <
deliminated values. I have written a C program to convert this to pure <
deliminated values which can be ingested directly to postgres with psql.
Detailed information about every post code. Available from doogal.co.uk/files/postcodes.zip.
CSV format.
Any change to company information, when a confirmation statement is filed. Many events do not contain a change, and there are duplicates. Available on /companies
as long running HTTP request.
Updates to person with significant control, long running HTTP request.
Update to directory/secretary.
Any filings, usually also triggers another event such as company, psc or officer.
A change to a companies insolvency status. If they go into administration or are liquidated etc.
Debentures and fixed charges.
A FTP file transfer once a day to keep up to date the bulk officers file.