- to create a personal view: you want to hide columns from the grid so they don’t clutter your view but you need to keep them in your data because they are used for calculating Project Steps. Example: your work in a Project is primarily with numeric columns. In this case, create your personal view of the data that hides all column types other than numeric.
- to publish an AnswerSet that includes only specific columns: you may want to publish a custom AnswerSet that only displays some columns from the data. First create a lens on the Step where you want to create the AnswerSet, then hide the columns you do not want to show in the AnswerSet. Save the lens and your selections are saved. When you publish from the lens, only the columns you selected to show are published in the AnswerSet.
For more details, see the official documentation for Hide Columns in a Project.
Pro Tip: if you share a Project and also want to share your personal view of the data, use the lens tool to capture your view. The lens captures your personal view of the grid, and anyone else working in the Project can see that view by clicking the lens in the Steps panel.
|Connection UI||Connection Settings|
Allows you to connect to a JDBC source for Data Library imports. The following fields are used to define the connection parameters.
Export Batch Size: the batch size used when exporting data.
You have to decide whether you want to build a table. If you do, make sure the account used in the credentials has the rights to create tables (or that specific privilege).
Now, if you want to retain some of the records or the structure, I would suggest using the pre-export and post-export SQL. The pre-export statement deletes what you want to remove but retains the database structure. A post-export statement can then follow up with activities such as rebuilding an index.
Pre-export SQL: SQL statement to execute before beginning export, after the table is created if auto-create is enabled.
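As a rough illustration of the order of operations described above (pre-export SQL, batched export, post-export SQL), here is a minimal sketch using Python's built-in sqlite3 module. The table, index, rows, SQL statements, and the export_rows helper are all hypothetical; this is not how the JDBC connector itself is implemented.

```python
import sqlite3

def export_rows(conn, rows, batch_size, pre_sql=None, post_sql=None):
    """Hypothetical helper: pre-export SQL, batched inserts, post-export SQL."""
    cur = conn.cursor()
    if pre_sql:
        cur.execute(pre_sql)  # e.g. delete old records but keep the table structure
    batches = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cur.executemany("INSERT INTO shows (name, season) VALUES (?, ?)", batch)
        batches += 1
    if post_sql:
        cur.execute(post_sql)  # e.g. rebuild an index after the load
    conn.commit()
    return batches

# An in-memory database stands in for the export target (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shows (name TEXT, season INTEGER)")
conn.execute("CREATE INDEX idx_shows_season ON shows (season)")
conn.execute("INSERT INTO shows VALUES ('Old Show', 1)")

new_rows = [("Show A", 1), ("Show B", 1), ("Show C", 2)]
batches = export_rows(
    conn,
    new_rows,
    batch_size=2,
    pre_sql="DELETE FROM shows",          # removes 'Old Show', keeps the table
    post_sql="REINDEX idx_shows_season",  # follow-up maintenance
)
remaining = conn.execute("SELECT name FROM shows ORDER BY name").fetchall()
```

With a batch size of 2, the three rows are written in two batches, and only the newly exported rows remain because the pre-export statement cleared the old record without dropping the table.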
User authentication can be done through a Shared Account or an Individual Account. If you choose to authenticate with an Individual Account, the user is prompted to enter a username and password to access this Data Source. If you choose to authenticate with a Shared Account, the following fields are required.
This is a common data prep scenario and you can capture that Filtergram view in four easy Steps.
In this example, here's a Filtergram view of the Season1 column:
To capture the view above in an AnswerSet:
1. Perform a Group by on the column (in this example, the column named Season 1). The resulting new aggregated column becomes the Count – Show Name column:
2. Sort the new Count – Show Name column by descending:
3. Now create a Lens for this view: Click the Lens tool, provide a Lens name, and Save the Lens to the Step:
4. Publish the lens for export. Note that when you mouse over the lens icon on the Step, the Publish button displays:
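Outside the product, the group-and-sort part of the steps above (steps 1 and 2) amounts to counting rows per distinct value and ordering the counts from highest to lowest. A minimal sketch in Python follows; the sample rows and the group_count_desc helper are invented for illustration and do not come from the original data.

```python
from collections import Counter

# Hypothetical (Show Name, Season 1) rows, invented for illustration.
rows = [
    ("Show A", "Yes"),
    ("Show B", "Yes"),
    ("Show C", "No"),
    ("Show D", "Yes"),
    ("Show E", "No"),
    ("Show F", "Unknown"),
]

def group_count_desc(rows):
    """Group by the second field and count rows per group, highest count first."""
    counts = Counter(season for _name, season in rows)
    return counts.most_common()  # sorted by count, descending

result = group_count_desc(rows)
# result == [("Yes", 3), ("No", 2), ("Unknown", 1)]
```

Each pair in the result corresponds to one row of the grouped view: the distinct value and its count of show names.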
Intelligent Automation through the Automatic Project Flows (APF) feature allows you to intelligently operationalize curated data flows. With a single click, APF computes the entire sequence of data prep Steps across Paxata Projects, datasets and AnswerSets to produce an end-to-end, automated output Flow for your data. You can set the Flow to run on a recurring time-based schedule, or run it just once to produce an end-result AnswerSet. All runs can then be easily managed through the APF Monitoring Interface. For complete details, see Automatic Project Flows.
What are the key benefits of APF over the Automation feature?
Business Analysts and Data Engineers can simplify complex data flows by breaking them into smaller groups of Paxata Projects that can be operationalized—with each Project focused on performing a related or cohesive set of Steps for improved readability and limited complexity. When you're finished creating your Projects, simply select the final Project in the sequence as your "target" Project. APF takes care of the rest—sequencing, preparing and automating the entire end-to-end flow without any manual stitching required.
For teams that require input from both business and IT leaders, the data prep process can be simplified when members build Paxata Projects that depend on output AnswerSets created by others. Everyone completes their data prep work in their own Paxata Project, and then the entire sequence is operationalized from a single "target" Project. APF takes care of the rest with no manual stitching required, regardless of who created or owns the Projects and AnswerSets. Members of the team can then use the APF Monitoring Interface to view how their Projects and AnswerSets participate in the Flow's final output.
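One way to picture the sequencing that APF performs from a "target" Project is as an ordering of a dependency graph, where every dataset and Project runs before anything that consumes it. The sketch below uses Python's standard graphlib module; the node names and the dependency map are hypothetical, and the source does not say that APF is implemented this way.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each item maps to the items it consumes.
depends_on = {
    "target_project": {"orders_project", "customers_project"},
    "orders_project": {"orders_dataset"},
    "customers_project": {"customers_dataset"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())

# The target Project always comes last because everything else feeds into it.
assert order[-1] == "target_project"
```

Selecting a different target Project would simply mean starting from a different node; only the upstream items it depends on need to be sequenced.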
How do I migrate my legacy Automation job to APF?
The Paxata Customer Success team has built a utility that uses information about current Automation schedules and lists of Projects in a tenant, and then identifies all of the “Target Projects” for which an APF needs to be created. Once the APFs are created, users can set up their schedules, custom import options, and export configurations.
Can a customer have both Legacy Automation and Intelligent Automation?
No. Automation and APF cannot co-exist in the same tenant.
I don't see the APF feature in my software. What am I missing?
APF is behind a feature flag. Contact Paxata Customer Success to enable the feature.
My Flow has a lot of input datasets and output projects. How do I see those details?
You can see metadata statistics for a dataset input by hovering your mouse over the dataset name in the DATASETS column. A pop-up window displays the dataset's version, its creation date, the user who added it to the Library, and the number of columns and rows. The same is true for Projects (Outputs). From the pop-up, you can click a button to navigate to the dataset or Project. For complete details, see Automatic Project Flows.
Legacy Automation allows me to configure import options. What about APF?
All of the datasets that are identified as part of a Flow are displayed on the Inputs tab. If a dataset was previously imported from a Connector data source, there is an option to re-import the dataset. When that option is selected, you’re provided with the ability to configure the re-import options, just like Legacy Automation.
With Automation, I can export to 3rd Party Data Sources. What about APF?
Yes! Exporting Lens output to data sources works just as it does in Legacy Automation. The Outputs tab provides a list of all the output AnswerSets that are published from the Flow. Click the Configure Lens button for the lens and the Export panel opens at the bottom of the page. By default, AnswerSets are published to the Paxata Library. To also publish to an external data source, click the drop-down for the Export Lens field and select "Library and Export". You can then specify the output location details and any export formatting options for that AnswerSet.
I have created an APF, but my Project script has changed since then. How do I update the Projects in a Flow?
You can update to the latest version of any of the Projects as long as the two Project versions have the same inputs and lenses. You can also update a Flow to use the latest version of all Projects. This can be done from the "Actions" drop-down while on the Outputs tab. Select "Update Projects" and you are prompted to confirm your selection. For complete details, see Automatic Project Flows.
My Project consumes an AnswerSet that it produces. Can I create a Flow for it?
Yes. A Project consuming its own AnswerSet is called a self-loop and this is supported. When such a Project is involved in the Flow, the AnswerSet that causes the loop is brought in as a special input to the Project and is depicted by a dotted line for the looped input.
Does APF have REST API support?
Yes. Please contact Paxata Customer Success for the REST API documentation.
Do existing job limits/quotas apply for APF?
Yes. In the APF world, we consider each Dataset Import and Project Execution a "chore". Before the execution of a chore begins, a check is performed to determine whether the group/tenant has sufficient quota available. If the quota has been exhausted, we do not run the chore and the run of the Flow fails. Note: the quotas from the Legacy Automation feature are applied to APF.
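The check-before-each-chore behavior can be modeled with a short sketch. It assumes, purely for illustration, that quota is a single counter and that each chore consumes one unit; the actual quota accounting is not described here, and the run_flow function and chore names are hypothetical.

```python
def run_flow(chores, quota):
    """Hypothetical model: check quota before each chore; fail the run if exhausted."""
    completed = []
    for chore in chores:
        if quota <= 0:
            # The chore is not run and the whole run of the Flow fails.
            return {"status": "failed", "completed": completed, "blocked": chore}
        quota -= 1  # assumption: one quota unit per chore
        completed.append(chore)
    return {"status": "succeeded", "completed": completed, "blocked": None}

chores = ["import orders", "run orders project", "run target project"]
ok = run_flow(chores, quota=3)
short = run_flow(chores, quota=2)
# ok["status"] == "succeeded"
# short["status"] == "failed" and short["blocked"] == "run target project"
```

The point of the model is that the check happens per chore, so a run can complete some chores and still fail when a later chore finds the quota exhausted.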
When you visit a Project, you will see an icon in the top right-hand corner of the header next to the “Create Project Flow” button. Click this icon to find all the Flows in which the Project participates. Please note that if you don't have sufficient permissions to View a particular Flow, the search results will exclude it.
How do I share a Project Flow with my team?
On the Project Flows list page, every Flow that you have permission to share has a clickable Permission button. Click that button to adjust the permissions on the Flow. You are taken to a page that works much like the permissions page for Projects and datasets.
I have a Flow and there are many lenses in the Outputs tab that are enabled by default, and I am not able to disable them. Why?
If your lens has an indicator like the one displayed below, it means that the output from the lens is used by a Project in the Flow. Disabling such a lens would cause a failure when the Flow runs. So, we have prevented users from disabling these required lenses. You can, however, decide whether to publish the output to just Library or to also export it to a Data Source.
What happens if, during the run of a Flow, one of the imports or Projects fails?
If any chore (Dataset Import or Project Execution) fails during the run of a Flow, the entire run fails. You can see the chore status in the Run Details tab. Any chore that failed will have a “display errors and warnings” link, which, when clicked, opens a panel to display the errors encountered during execution.
Can I create more than one Flow for a Project?
Yes. However, best practices recommend that you determine if a Project is already participating in other Flows. If so, then you will want to be clear which versions of the Project each Flow is using.
How do I identify if a dataset/AnswerSet was produced as part of an APF Run?
In the Library, a new menu item, “Go to Run”, has been added to every dataset (or AnswerSet). If the dataset was produced by a Project Flow, the menu item is enabled. Clicking it takes you to the Run Details tab of the Run that produced the dataset.
Can I cancel the run of a Flow?
Yes, you can. While a run is in progress, visit the Run Details page for that run and click the "Stop" button. This puts the run into a cancel mode and a confirmation message is displayed. Any chores currently in progress will complete, and then the run will halt.
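The stop behavior described above (the chore in progress completes, and no further chores start) can be sketched as a simple loop with a stop flag. This is a simplified, sequential model with hypothetical names; it does not reflect how runs are actually scheduled.

```python
def run_with_cancel(chores, stop_during=None):
    """Hypothetical model: Stop is requested while the chore named stop_during runs."""
    stop_requested = False
    completed = []
    for chore in chores:
        if stop_requested:
            break  # no new chore starts once a stop has been requested
        if chore == stop_during:
            stop_requested = True  # "Stop" clicked while this chore is in progress
        completed.append(chore)  # the in-progress chore still completes
    status = "cancelled" if stop_requested else "succeeded"
    return status, completed

cancelled = run_with_cancel(["import", "project A", "project B"], stop_during="project A")
finished = run_with_cancel(["import", "project A"])
# cancelled == ("cancelled", ["import", "project A"])
# finished == ("succeeded", ["import", "project A"])
```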
My Flow just started running with wrong configurations. What can I do now?
You can cancel the run of the Flow by clicking on the "Stop" button from the Run Details page. Once the UI acknowledges that a request to cancel has been made, you can run the Flow again.