Exporting Parquet files with Snappy compression

shivkumr Posts: 1
edited March 17, 2020 4:27AM in Data Prep Q&A
I tried to export a dataset of more than 3 million rows as a Parquet file to HDFS to feed a Hive external table. The Parquet file comes out to around 6 GB, while the same dataset exported as CSV is 5.8 GB.

I would like to apply some compression when exporting to Parquet, because I believe Paxata applies compression when it stores files in its library in Parquet format; otherwise it could not hold all of these files if Parquet took more space than CSV, and I have many more datasets that are huge in both rows and columns. Can the same compression be applied to exports? If it is, will an external Hive table created on top of the exported file still be able to read the data? Also, is there a REST API command for this? The version of Paxata I use is Release 2018.2.7.11.2697.
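For context on the Hive question: a Hive external table can generally read Snappy-compressed Parquet without any special table properties, because Parquet records the compression codec per column chunk in the file metadata. A minimal sketch of writing such a file with pyarrow (an assumption; this is not Paxata's export path, and the file name and columns are hypothetical):

import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical table standing in for an exported dataset.
table = pa.table({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# Snappy is pyarrow's default Parquet codec; it is spelled out here for clarity.
# The codec choice is written into the file footer, which is why Hive's
# Parquet reader can decompress the data without extra configuration.
pq.write_table(table, "dataset.snappy.parquet", compression="snappy")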

Answers

  • sayyar Posts: 25 admin
    Hi @shivkumr,

    Paxata currently does not support compression during export. We will add it to the backlog of requested features. Thank you for the feedback! (A possible workaround is sketched below.)

    Regards,
    Shyam Ayyar
    Product Manager
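Until export-time compression is available, one possible workaround is to re-compress the exported file before pointing the Hive external table at it. A sketch using pyarrow (an assumption, not a Paxata feature; file names are hypothetical), processing one row group at a time so a multi-gigabyte export does not have to fit in memory:

import pyarrow.parquet as pq

# Re-write an uncompressed Parquet export with Snappy compression,
# streaming row group by row group to keep memory use bounded.
src = pq.ParquetFile("paxata_export.parquet")
with pq.ParquetWriter("paxata_export.snappy.parquet",
                      src.schema_arrow,
                      compression="snappy") as writer:
    for i in range(src.num_row_groups):
        writer.write_table(src.read_row_group(i))

The re-compressed file can then be copied into the HDFS directory backing the external table, for example with hdfs dfs -put.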