Using compressed data files by default
02 February 2023
Using compressed files is now the default recommendation in documentation and templates. On the backends, where datasets can be very large, using uncompressed files significantly slows execution and consumes more disk space.
The research-template has been updated to generate csv.gz files from cohortextractor by default, and the examples and Getting Started documentation have been updated to match.
In addition, the recommendations for using compressed formats for further data files in Python, R and Stata have been updated.
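As a rough illustration of what working with compressed files can look like in Python, here is a minimal sketch that writes and reads a gzip-compressed CSV using only the standard library. The helper names (`write_csv_gz`, `read_csv_gz`), the file name and the column names are made up for this example and do not come from the documentation or the research-template.

```python
import csv
import gzip
import os
import tempfile


def write_csv_gz(path, header, rows):
    """Write a header and rows to a gzip-compressed CSV file (hypothetical helper)."""
    # "wt" opens the gzip stream in text mode so csv.writer can write strings.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv_gz(path):
    """Read a gzip-compressed CSV file into a list of dicts (hypothetical helper)."""
    # "rt" decompresses on the fly; the file is never unpacked onto disk.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Illustrative file and column names only.
    path = os.path.join(tempfile.mkdtemp(), "example.csv.gz")
    write_csv_gz(path, ["patient_id", "age"], [[1, 42], [2, 57]])
    for row in read_csv_gz(path):
        print(row["patient_id"], row["age"])
```

If you use pandas instead, `pandas.read_csv` and `DataFrame.to_csv` infer gzip compression from a `.gz` file extension by default, so in many cases changing the filename is enough.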
Because of the change in filename, if you have a workspace with a large amount of data in uncompressed CSV files, ask tech-support about moving to compressed CSVs, and we can help do this efficiently.