
Exporting parquet files with snappy compression

shivkumr Posts: 1
edited March 17, 2020 4:27AM in Data Prep Q&A
I tried to export a dataset with 3 million-plus rows as a Parquet file to HDFS to feed a Hive external table. It comes out at around 6 GB, while the same dataset is 5.8 GB when exported as CSV.

I would like to apply some compression when exporting as Parquet. I believe Paxata already applies compression when storing files in the Library in Parquet format; otherwise it could not hold all these files, since my Parquet exports take more space than CSV, and I have many more files that are huge in rows and columns. Can the same compression be applied to exports as well? If so, will an external Hive table created on top of the export still be able to read the data? Also, is there a REST API command for this? The Paxata version I use is Release 2018.2.7.11.2697.
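While waiting for a native option, a possible workaround outside Paxata itself is to re-write the exported files with snappy compression using Spark before pointing Hive at them. The sketch below is a minimal example, not a Paxata feature: the HDFS paths, table name, and column list are placeholders that would need to match the real export and schema.

```python
# Minimal PySpark sketch (not a Paxata feature): re-write an uncompressed
# Parquet export with snappy compression, then expose it to Hive.
# All paths, the table name, and the column list are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("recompress-paxata-export")
    .enableHiveSupport()  # required so spark.sql() can issue Hive DDL
    .getOrCreate()
)

# Read the Parquet files Paxata exported to HDFS.
df = spark.read.parquet("hdfs:///exports/paxata/my_dataset")

# Re-write the same data with snappy compression.
(
    df.write
    .option("compression", "snappy")
    .mode("overwrite")
    .parquet("hdfs:///warehouse/my_dataset_snappy")
)

# Point an external Hive table at the compressed files. Replace the
# placeholder columns with the dataset's actual schema.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS my_dataset (id BIGINT, value STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///warehouse/my_dataset_snappy'
""")
```

Because Parquet records the compression codec in each file's footer, a Hive external table reads snappy-compressed Parquet transparently, so no extra table properties should be needed.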

Answers

  • sayyar Posts: 24 ✭✭
    Hi @shivkumr,

    Paxata currently does not support compression during export. We will add it to the backlog of requested features. Thank you for the feedback!

    Regards,
    Shyam Ayyar
    Product Manager