NEWS


parquetize 0.5.7 (2024-03-04)

This release includes:

parquetize 0.5.6.1 (2023-05-10)

This release includes:

fst_to_parquet function

A new fst_to_parquet function converts a fst file to parquet format.
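
A minimal usage sketch, assuming fst_to_parquet() follows the same path_to_file / path_to_parquet convention as the package's other converters (the input path below is a placeholder):

library(parquetize)

# Convert a fst file to a parquet file written to a temporary directory
fst_to_parquet(
  path_to_file = "path/to/data.fst", # placeholder: point to a real fst file
  path_to_parquet = tempdir()
)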

Other

parquetize 0.5.6

This release includes:

Possibility to use an RDBMS as a source

You can convert to parquet any query you want on any DBI-compatible RDBMS:

dbi_connection <- DBI::dbConnect(RSQLite::SQLite(),
  system.file("extdata","iris.sqlite",package = "parquetize"))
  
# Reading the iris table from a local sqlite database
# and converting it to a single parquet file:
dbi_to_parquet(
  conn = dbi_connection,
  sql_query = "SELECT * FROM iris",
  path_to_parquet = tempdir(),
  parquetname = "iris"
)

You can find more information in the dbi_to_parquet documentation.

check_parquet function

A new check_parquet function checks the validity of a parquet file or dataset.
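
A minimal sketch, assuming check_parquet() just takes the path of the parquet file or dataset to validate (the path below is a placeholder):

library(parquetize)

# Check that a parquet file is valid and readable
check_parquet("path/to/file.parquet") # placeholder: point to a real parquet file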

Deprecations

Two arguments are deprecated to avoid confusion with arrow concepts and to keep consistency.

Other

parquetize 0.5.5 (2023-03-28)

This release includes :

A very important new contributor to parquetize!

Due to numerous contributions, @nbc is now officially one of the project authors!

Three arguments deprecation

After a big refactoring, three arguments are deprecated:

They will raise a deprecation warning for the moment.

Chunking by memory size

table_to_parquet() can now chunk parquet output by memory size: given a chunk_memory_size argument, it converts the input file into parquet files of roughly chunk_memory_size Mb each when the data are loaded in memory.

Argument by_chunk is deprecated (see above).

Example using the chunk_memory_size argument:

table_to_parquet(
  path_to_table = system.file("examples","iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  chunk_memory_size = 5000 # this will create files of around 5Gb when loaded in memory
)

Passing arguments like compression to write_parquet when chunking

When chunking, users can now pass arguments to write_parquet() through the ellipsis (...). This can be used, for example, to pass compression and compression_level.

Example:

table_to_parquet(
  path_to_table = system.file("examples","iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  compression = "zstd",
  compression_level = 10,
  chunk_memory_size = 5000
)

A new download_extract function

This function downloads a file and, if needed, extracts it from its zip archive.

file_path <- download_extract(
  "https://www.nomisweb.co.uk/output/census/2021/census2021-ts007.zip",
  filename_in_zip = "census2021-ts007-ctry.csv"
)
csv_to_parquet(
  file_path,
  path_to_parquet = tempdir()
)

Other

Under the hood, this release has hardened tests.

parquetize 0.5.4 (2023-03-13)

This release fixes an error when converting a SAS file by chunk.

parquetize 0.5.3 (2023-02-20)

This release includes:

parquetize 0.5.2

This release includes:

parquetize 0.5.1 (2023-01-30)

This release removes the duckdb_to_parquet() function on the advice of Brian Ripley from CRAN.
Indeed, DuckDB's storage format is not yet stable; it will be stabilized when version 1.0 is released.

parquetize 0.5.0 (2023-01-13)

This release includes corrections for CRAN submission.

parquetize 0.4.0

This release includes an important feature:

The table_to_parquet() function can now convert tables to parquet format with less memory consumption, which is useful for huge tables and for computers with little RAM (#15). A vignette has been written about it.
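
A minimal sketch of the low-memory conversion, assuming it is driven by the by_chunk flag mentioned in the 0.5.5 notes above, together with a chunk_size argument giving the number of rows per chunk (an assumption, as this file does not show the 0.4.0 interface):

library(parquetize)

# Convert the table 50 rows at a time so the whole table
# never has to be loaded into memory at once
# (by_chunk and chunk_size are assumed arguments, see lead-in above)
table_to_parquet(
  path_to_table = system.file("examples", "iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  by_chunk = TRUE,
  chunk_size = 50
)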

parquetize 0.3.0

parquetize 0.2.0

parquetize 0.1.0