Visit the official Paxata Documentation portal for all of your doc needs.

XML/JSON/AVRO files

XML, JSON, and AVRO files don't look like most other standard file formats. The structure is not a simple row and column layout; instead it is hierarchical. Within any piece of data can be another set of data nested in it. The product that generates the file will determine the details of how the data is laid out. Because of this, it is not always easy to use these formats in more traditional products. Paxata will provide a simple way to "flatten out" the data into a row and column structure that is easily understood by less technical users. Below are the first two records of a simple contacts file from CSV, JSON, and XML to compare.

CSV
"Name","Email","Company_Name","City","State","Age","Gender","Status","Date","Score"
" HOUNDESOME, ADDY ","[email protected]","Visiting Nurse Service of NY","FLINT","MICHIGAN","38","","true","2018-10-01T03:57:00.000Z","70"
" KINSEY, PETERUS ","[email protected]","Morgan Stanley","TALLAHASSEE","FLORIDA","45","MALE","true","2018-10-01T07:00:00.000Z","85"

JSON
{  "data" : [ {    "Name" : " HOUNDESOME, ADDY ",    "Email" : "[email protected]",    "Company_Name" : "Visiting Nurse Service of NY",    "City" : "FLINT",    "State" : "MICHIGAN",    "Age" : 38,    "Gender" : null,    "Status" : true,    "Date" : "2018-10-01T03:57:00.000Z",    "Score" : 70  }, {    "Name" : " KINSEY, PETERUS ",    "Email" : "[email protected]",    "Company_Name" : "Morgan Stanley",    "City" : "TALLAHASSEE",    "State" : "FLORIDA",    "Age" : 45,    "Gender" : "MALE",    "Status" : true,    "Date" : "2018-10-01T07:00:00.000Z",    "Score" : 85  }

XML
<?xml version="1.0" encoding="UTF-8"?><Table>  <Record>    <Name> HOUNDESOME, ADDY </Name>    <Email>[email protected]</Email>    <Company_Name>Visiting Nurse Service of NY</Company_Name>    <City>FLINT</City>    <State>MICHIGAN</State>    <Age>38</Age>    <Gender></Gender>    <Status>true</Status>    <Date>2018-10-01T03:57:00.000Z</Date>    <Score>70</Score>  </Record>  <Record>    <Name> KINSEY, PETERUS </Name>    <Email>[email protected]</Email>    <Company_Name>Morgan Stanley</Company_Name>    <City>TALLAHASSEE</City>    <State>FLORIDA</State>    <Age>45</Age>    <Gender>MALE</Gender>    <Status>true</Status>    <Date>2018-10-01T07:00:00.000Z</Date>    <Score>85</Score>  </Record>

Excel

Excel provides the "Power Query" add-on that is included only in the Windows edition. If you are using Excel for Mac, you will have to find a third-party tool to work with these file types. Also of note, Excel for Windows only support XML and JSON; there is no support for AVRO files which is a binary compressed JSON file. The instructions below are a bit different for XML than with JSON but this will give you the idea of how to do it.

From the Data Ribbon, select "Get Data", "From File", "From XML" (or "From JSON")
Select your file, click "Import"
On the navigator, select the node level where you want the file to be flattened out
Click "Load"
For a simple file, it shouldn't take any more effort than this. For more complex files you may need to work the other options in the menus or the programming language that it built into Power Query. As an example, here is the code that was generated by this simple task:

let    Source = Xml.Tables(File.Contents("C:\Documents\Contacts - Oct2018.xml")),    Table0 = Source{0}[Table],    #"Changed Type" = Table.TransformColumnTypes(Table0,{{"Name", type text}, {"Email", type text}, {"City", type text}, {"State", type text}, {"Status", type logical}, {"Date", type datetime}, {"Score", Int64.Type}})in    #"Changed Type

Paxata

Paxata provides a very consistent interface for accessing your datasources. Regardless of file format or system you will always go to the same place.
In the library, click the import button




If the file is on a server (Hadoop, S3, Azure Blob Storage, SFTP site, Network File Share, etc) click the "Select Data Source" dropdown, otherwise click "Upload Local File" to pull a file from your workstation.




Once you find and select your file, or multiple files, you will be presented with the following four quadrant user interface




  • The upper left will let you bring in more files
  • The upper right will let you override the file format and remove files if you change you mind
  • The bottom left will let you change properties of the file. Some of these properties will be different depending on the source
  • The bottom right will let you see a preview of the data and manage the columns (rename, reorder, remove)

Focusing on the properties in the bottom left quadrant, you may decide to adjust how Paxata chose to flatten out the data. Simply click the "Override" button, then click on the node in the datasource display. With just a simple point and click, you can change how Paxata will interpret the data.




Once you are satisfied with the settings, click the "Finish" button and Paxata will bring the data into the environment.




Tagged:

Comments

Sign In or Register to comment.