High Performance Multi-System Analytics using Teradata QueryGrid

June 1, 2015, 3:14 pm

≫ Next: The Many Uses of the Teradata Querybanding Feature

≪ Previous: Teradata Application Performance Tuning: Tips and Techniques

Course Number:

53556

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Today's analytic environments incorporate multiple technologies and systems. Getting value from these systems often requires that they work together to address business needs.

Users should have self-service access to the systems and data they need in a timely manner. This allows them to leverage all available analytics and data sources to make well-informed decisions.

Teradata QueryGrid provides:

Self-service access to heterogeneous systems and data
Push-down processing capabilities to allow processing the data where it resides
Bi-directional parallel data transfer
Easy access through SQL using the skills and tools users already have
Flexibility to pick the best-of-breed technologies that meet business requirements

Key Points

Understand how Teradata QueryGrid works and the value it can provide
Live use case showing syntax and functionality
When to use QueryGrid vs. a single system

Presenter: Andy Sanderson, Product Marketing Manager - Teradata Corporation

Audience:

Project Managers / Business Roles, Data Warehouse Architects, Data Warehouse Technical Specialists, CIOs, CTOs, DBAs, System Administrators

Price:

$195

Credit Hours:

Channel:

Tags:

↧

The Many Uses of the Teradata Querybanding Feature

June 2, 2015, 11:43 am

≫ Next: Oracle to Teradata 101

≪ Previous: High Performance Multi-System Analytics using Teradata QueryGrid

Course Number:

43042

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Querybanding provides the capability to wrap metadata around a query. This presentation will focus on the many uses of the Querybanding feature.

Key Points

Understanding the Query Band feature
Learn how to use Query Banding with Query Logging
Learn how to use Query Banding with workload management
Learn how to use Query Banding Reserved Names

Note: Updated and re-recorded in April 2015

Presenters:
Debbie Galeazzi, Teradata Developer - Teradata Corporation
Omer Ahamd, PS Consultant - Teradata Corporation

Price:

$195

Credit Hours:

Channel:

database

Tags:

query band

query banding

↧

Oracle to Teradata 101

June 10, 2015, 3:57 pm

≫ Next: Teradata’s Path to the Cloud - Secure Zones

≪ Previous: The Many Uses of the Teradata Querybanding Feature

Course Number:

22381

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

If you've worked with Oracle for years and now find yourself the DBA for a Teradata Data Warehouse, you'll want to get productive as soon as possible.

You already have a good understanding of relational databases and SQL. You just need to learn the ins and outs of this new database. This session will review the Teradata Database in Oracle terms. Basic architectural differences and the similarities between the two databases will be discussed. Database setup and database structures will be covered. Armed with this comparison, you'll have the introductory information you'll need to get started with Teradata.

Key Points:

Understanding Oracle will give you a head start in learning Teradata.
There are some differences you will need to learn about the two databases.
You will be able to use your Oracle knowledge to understand Teradata.

Updated and re-recorded in June 2015.

Presenter: Dawn McCormick, Senior Database Consultant - Teradata Corporation

Price:

$195

Credit Hours:

Channel:

Tags:

↧

Teradata’s Path to the Cloud - Secure Zones

June 24, 2015, 11:38 am

≫ Next: Teradata Express for VMware Player

≪ Previous: Oracle to Teradata 101

Short teaser:

Multi-tenant secure zones enable Cloud paradigms

Cover Image:

Database users are increasingly becoming more comfortable with storing their data in database systems located in public or private clouds. For example, Amazon RDS (relational database service) is used by approximately half of all customers of Amazon’s AWS cloud. Given that AWS has over a million active customers, this implies that there are over a half a million users that are willing to store their data in Amazon’s public cloud database services.

A key feature of cloud computing --- a feature that enables efficient resource utilization and reduced overall costs --- is multi-tenancy. Many different users, potentially from different organizations, share the same physical resources in the cloud. Since many database-backed applications are not able to fully utilize the CPU, memory, and I/O resources of a single server machine 24 hours a day, database users can leverage the mutli-tenant nature of the cloud in order to reduce costs.

The most straightforward way to implement database multi-tenancy in the cloud is to acquire a virtual machine in the cloud (e.g. via Amazon EC2), install the database system on the virtual machine, and load data into it and access it as one would any other database system. As an optimization, many cloud providers offer specialized virtual machines with the database preinstalled and preconfigured in order to accelerate the process of setting up the database and making it ready to use. Amazon RDS is one example of a specialized virtual machine of this kind.

The “database system on a virtual machine” approach is a clean, elegant, and general way to implement multi-tenancy. Multiple databases running on different virtual machines can be mapped to the same physical machine, with negligible concern for any security problems arising from the resulting multi-tenancy. This is because the hypervisor effectively shields each virtual machine from being able to access data from other virtual machines located on the same physical machine.

For general cloud environments such as Amazon AWS, this general approach of achieving database multi-tenancy is a good solution, since it dovetails with the EC2 philosophy of giving each user his own virtual machine. However, for specific database-as-a-service / database-in-the-cloud solutions, supporting multi-tenancy via installing multiple database instances in separate virtual machines is inefficient for several reasons. First, storage, memory, and cache space must be consumed for each virtual machine. Second, the same database software must be installed on each virtual machine, thereby duplicating the storage, memory and cache space needed for the redundant instances of the same database software. These redundant copies also reduce instruction cache locality, since even if the same parts of the database system codebase are being accessed by different instances, since each instance has its own separate codebase, other instances running on other virtual machines that access the same part of the code cannot benefit from the fact that it is already in the instruction cache of a different virtual machine.

To summarize the above points:

(1) Allowing multiple database users to share the same physical hardware (“multi-tenancy”) helps optimize resource utilization in the cloud, and therefore reduce costs.

(2) Secure mutli-tenancy can be easily achieved via giving each user a separate virtual machine and mapping multiple virtual machines to the same physical machine.

(3) When the different virtual machines are all running the same OS and database software, the virtual machine approach results in inefficient redundancy.

If all tenants of a multi-tenant system are using the same software, it is far more efficient to install a single instance of that software on the system, and allow all tenants to share the same software instance. However, a major concern with this approach is security: for example, in a database system, it is totally unacceptable for different tenants to have access to each other’s data. Even metadata should not be visible across tenants --- they should not be aware of each other’s table names and data profiles. In other words, each tenant should have a view of the database as if they are using an instance of that database installed and prepared specifically for that tenant, and any data and metadata associated with other tenants should be totally invisible.

Furthermore, even the performance of database queries, transactions, and other types of requests should not be harmed by the requests of other tenants. For example, if tenant A is running a long and resource-intensive query, tenant B should not observe slow-downs of the requests it is concurrently making of the database system.

Teradata’s recent announcement of its secure zones feature is thus a major step towards a secure, multi-tenant version of Teradata. Each tenant exists with its own “Secure Zone”, and each zone has its own separate set of users that can only access database objects within that zone. The view that a user has of the database is completely local to the zone in which that user is defined --- even the database metadata (“data dictionary tables” in Teradata lingo) is local to the zone, such that user queries of this metadata only return results for the metadata associated with the zone in which the user is defined. Users are not even able to explicitly grant permissions to view database objects of their zone to users of a different zone --- each zone is 100% isolated from the other secure zones[1].

Figure 1: Secure zones contain database uses, tables, profiles, and views

A key design theme in Teradata’s Secure Zones feature is the separation of administrative duties from access privileges. For example, in order to create a new tenant, there needs to be a way to create a new secure zone for that tenant. Theoretically, the most straightforward mechanism for accomplishing this would be via a “super user” analogous to the linux superuser / root that has access to the entire system and can create new users and data on the system at will. This Teradata superuser could then add and remove new secure zones, create users for those zones, and access data within those zones.

Unfortunately, this straightforward “superuser” solution is fundamentally antithetical to the general “Secure Zones” goal of isolating zones from each other, since the zone boundaries have no effect on the superuser. In fact, the presence of a superuser would violate regulatory compliance requirements in certain multi-tenant application scenarios.

Therefore, Teradata’s Secure Zones feature includes the concept of a “Zone Administrator” --- a special type of user that can perform high level zone administration duties, but has no discretionary access rights on any objects or data within a zone. For example, the Zone Administrator has the power to create and drop zones, and to grant limited access to the zone for specific types of users. Furthermore, the Zone Administrator determines the root object of a zone. However, the Zone Administrator cannot read or write that root object, nor any of its descendants.

Analogous to a Zone Administrator is a special kind of user called a “DBA User”. Just as a Zone Administrator can perform administrative zone management tasks without discretionary access rights in the zones that it manages, a DBA User can perform administrative tasks for a particular zone without superuser discretionary access rights in that zone. In particular, DBA Users only receive automatic DDL and DCL rights within a zone, along with the power to create and drop users and objects. However, they must be directly assigned DML rights for any objects within a zone that they do not own in order to be able to access them. Thus, if every zone in a Teradata system is managed by a DBA User, then the resulting configuration has complete separation of administrative duties from access privileges --- the Zone Administrator and DBA Users perform the administration without any automatic discretionary access rights on the objects in the system.

The immediate use case for secure zones is Teradata’s new “Software Defined Warehouse” which is basically a Teradata private cloud within an organization. It consists of a single Teradata system that is able to serve multiple different Teradata database instances from the same system. If the organization develops a new application that can be served from a Teradata database, instead of acquiring the hardware and software package that composes a new Teradata system, the organization can instead serve this application from the software defined warehouse. Multiple existing Teradata database instances can also be consolidated into the software defined warehouse.

Figure 2: Software-Defined Warehouse Workloads

The software defined warehouse is currently intended for use cases where all applications / database instances that it is managing belong to the same organization. Nonetheless, in many cases, different parts of an organization are not allowed access to data for other parts of that organization. This is especially true for multinational or conglomerate companies with multiple subsidiaries where access to subsidiary data must be tightly controlled and restricted to users of the subsidiary or citizens of a specific country. Therefore, each database instance that the software defined warehouse is managing exists within a secure zone.

In addition to secure zones, the other major Teradata feature that makes efficient multi-tenancy possible is Teradata Workload Management. Without workload management, it is possible for system resources to get hogged by a single database instance that is running a particularly resource intensive task, while users of the other instances see significantly increased latencies and overall degraded performance. For the multiple virtual-machine implementation of the cloud mentioned above, the hypervisor implements workload management --- ensuring that each virtual machine gets a guaranteed amount of important system resources such as CPU and memory. Teradata’s “virtual partitions” works the same way --- the system resources are divided up so that each partition is guaranteed a fixed amount of system resources. By placing each Teradata instance inside its own virtual partition, the Teradata workload manager can thus ensure that the database utilization of one instance does not affect the observed performance of other instances.

When you combine Teradata Secure Zones and Teradata Workload Management, you end up with a cloud-like environment, where multiple different Teradata databases can be served from a single system. Additional database instances can be created “on demand”, backed by this same system, without having to wait for procurement of an additional Teradata system. However, this mechanism of “cloudifying” Teradata is much more efficient that installing the Teradata database software in multiple different virtual machines, since all instances are served from a single version of the Teradata codebase, without redundant operating system and database system installations.

Since I am not a full-time employee of Teradata and have not been briefed on future plans for Teradata in the cloud, I can only speculate about the next steps for Teradata’s plans for cloud. Obviously, Teradata’s main focus for secure zones and virtual partitions have been the software-defined warehouse, so that organizations can implement a private cloud or consolidate multiple Teradata instances onto a single system. However, I do not see any fundamental limitations to prevent Teradata from leveraging these technologies in order to build a public Teradata cloud, where Teradata instances from different organizations share the same physical hardware, just like VMs from different organizations share the same hardware in Amazon’s cloud. Whether or not Teradata chooses to go in this direction is likely a business decision that they will have to make, but it’s interesting to see that with secure zones and workload management, they already have the major technological components to proceed in this direction and build a highly-efficient database-as-a-service offering.

[1] There is a concept of a special type of user called a “Zone Guest”, which is not associated with any zone, and can have guest access to objects in multiple zones, but the details of this special type of user is outside the scope of this post.

___________________________________________________________________________________________________________

Daniel Abadi is an Associate Professor at Yale University, founder of Hadapt, and a Teradata employee following the recent acquisition. He does research primarily in database system architecture and implementation. He received a Ph.D. from MIT and a M.Phil from Cambridge. He is best known for his research in column-store database systems (the C-Store project, which was commercialized by Vertica), high performance transactional systems (the H-Store project, commercialized by VoltDB), and Hadapt (acquired by Teradata). http://twitter.com/#!/daniel_abadi.

Tags:

secure zones cloud

Ignore ancestor settings:

Channel:

database

Apply supersede status to children:

↧

Teradata Express for VMware Player

June 30, 2015, 10:33 am

≫ Next: Extract and analyze SHOWBLOCKS/SHOWWHERE data as easy as pie

≪ Previous: Teradata’s Path to the Cloud - Secure Zones

Tags:

vmware

express

Channel:

database

Full user details required:

Full user details required

Download Teradata Express for VMware, a free, fully-functional Teradata database, that can be up and running on your system in minutes. For TDE 14.0, please read the launch announcement, and the user guide. For previous versions, read the general introduction to the new Teradata Express family, or learn how to install and configure Teradata Express for VMware.

There are multiple versions of Teradata Express for VMware available: for Teradata 13.0 and 13.10 and 14.0. More information on each package is available on our main Teradata Express page.

Note that in order to run this VM, you'll need to install VMware Player or VMware Server on your system. Also, please note that your system must have 64-bit support. For more details, see how to install and configure Teradata Express for VMware.

For feedback, discussion, and community support, please visit the Cloud Computing forum.

Download sizes, space requirements and MD5 checksums

Package	Version	Initial Disk Space	Download size (bytes)	MD5 checksum
Teradata Express 14.0 for VMware (4 GB)	14.00.00.01		`3,105,165,305`	`F8EFE3BBE29F3A3504B19709F791E17A`
Teradata Express 14.0 for VMware (40 GB)	14.00.00.01		`3,236,758,640`	`B6C81AA693F8C3FB85CC6781A7487731`
Teradata Express 14.0 for VMware (1 TB)	14.00.00.01		`3,484,921,082`	`2D335814C61457E0A27763F187842612`
Teradata Express 13.10 for VMware (1 TB)	13.10.00.10	15 GB	`3,002,848,127`	`04e6cb9742f00fe8df34b56733ade533`
Teradata Express 13.10 for VMware (40 GB)	13.10.00.10	10 GB	`2,943,708,647`	`ab1409d8511b55448af4271271cc9c46`
Teradata Express 13.0 for VMware (1 TB)	13.00.00.19	64 GB	`3,072,446,375`	`91665dd69f43cf479558a606accbc4fb`
Teradata Express 13.0 for VMware (40 GB)	13.00.00.19	10 GB	`2,018,812,070`	`5cee084224343a01de4ae3879ada9237`
Teradata Express 13.0 for VMware (40 GB, Japanese)	13.00.00.19	10 GB	`2,051,920,372`	`8e024743aeed8e5de2ade0c0cd16fda9`
Teradata Express 13.0 for VMware (4 GB)	13.00.00.19	10 GB	`2,002,207,401`	`5a4d8754685e80738e90730a0134db9c`
Teradata Tools and Utilities 13.10 Windows Client Install Package	13.10.00.00		`409,823,322`	`8e2d5b7aaf5ecc43275e9679ad9598b1`

Ignore ancestor settings:

Respect ancestor settings

Apply supersede status to children:

Ignore children

Download License:

Teradata Express Agreement (+SLES 10)

Package version:

13.10.00.05

↧

Extract and analyze SHOWBLOCKS/SHOWWHERE data as easy as pie

July 2, 2015, 3:11 pm

≫ Next: Teradata Partitioning

≪ Previous: Teradata Express for VMware Player

Short teaser:

SQL SHOWBLOCKS & SQL SHOWWHERE

Cover Image:

Systemwide summary of number of cylinders of various temperatures

Tired of parsing the output from Ferret SHOWBLOCKS and SHOWWHERE commands? You don’t need to do that anymore with the release of the SQL SHOWBLOCKS and SQL SHOWWHERE features. The SQL SHOWBLOCKS feature is available in Teradata Database 15.0, and SQL SHOWWHERE is available in 15.10. These new features allow you to easily extract SHOWBLOCKS and SHOWWHERE data, which can subsequently be queried for analysis using SQL. Not only is this information simpler to view, but the table-based data can easily be exported to third party tools for better view/analysis of the system-level information.

Unlike the Ferret utility, the SQL SHOWBLOCKS and SQL SHOWWHERE macros do not require special console privileges (DBCCONS or CNS supervisor window) to run. These SQL macros can be run through many user sessions at the same time, whereas the number of sessions that one can initiate through Ferret is limited to the number of CNS Supervisor window sessions that can be started.

SQL SHOWBLOCKS and SQL SHOWWHERE both employ the same two new system macros. CreateFsysInfoTable creates a target table to hold the file system information and PopulateFsysInfoTable populates the target table with system-level information for SHOWBLOCKS or SHOWWHERE. Once the target tables are populated with SHOWBLOCKS or SHOWWHERE rows, normal SQL queries can be run on those target tables and several system level details, such as Data Block Size Statistics, Cylinder and Block Level Compression related stats, Temperature and grade of the storage, can be obtained.

The example below shows how to create a target table to hold SHOWBLOCKS information:

EXEC DBC.CreateFsysInfoTable (‘SYSTEMINFO’, ‘SHOWBLOCKS_M’, ‘PERM’, ‘Y’, ‘SHOWBLOCKS’, ‘M’ );

- Creates the permanent target table ‘SHOWBLOCKS_M’ with fallback in the target database ‘SYSTEMINFO’ for capturing the SHOWBLOCKS’s medium (‘M’) display rows.

The example below shows how to populate the target table created above:

EXEC DBC.PopulateFsysInfoTable (‘PRODUCTION’, ‘CALLLOG_2015, ‘SHOWBLOCKS’, ‘M’ , ‘SYSTEMINFO’, ‘SHOWBLOCKS_M’);

- Populates the target table ‘SYSTEMINFO.SHOWBLOCKS_M’ with SHOWBLOCKS medium (‘M’) display rows of the input table ‘PRODUCTION.CALLLOG_2015’.

For more details on syntax, invocation and other requirements refer to the SQL Functions, Operators, Expressions and Predicates manual and the SQL SHOWBLOCKS and SQL SHOWWHERE Orange Book listed in Appendix A.

Figure1 shows the sample rows from a target table “SYSTEMINFO.SHOWBLOCKS_M” which stored SHOWBLOCKS information for a source table “PRODUCTION.CALLOG_2015”.

Figure 1: Sample SQL SHOWBLOCKS rows from a target table

The corresponding SHOWBLOCKS output collected from Ferret is shown in Figure 2.

Figure 2: SHOWBLOCKS /M output from Ferret

The sample SQL used to extract data from the target table, which was then used to create the graphs shown in Figure3 and Figure4, is presented below:

Note that these graphs represent multiple invocations of the PopulateFsysInfoTable macro over time into the target table ‘SYSTEMINFO.SHOWBLOCKS_M’ before running the above SQL.

Figure3: Estimated Compression Ratio by Date Figure 4 : Min, Max and Average DB Size values (in units of sectors) by Date

Key Points

Compared to the traditional mechanism (the Ferret utility) for extracting SHOWBLOCKS and SHOWWHERE data for analysis of system level information, the new SQL SHOWBLOCKS and SQL SHOWWHERE methods are much easier, and have fewer limitations.
System-level data can be captured at multiple points over time, making it easy to collect historical data for long-term analysis
Graphs are easier to produce and data interpretation is more straightforward.

Appendix A: Reference Material

Teradata Database Manual “SQL Functions, Operators, Expressions and Predicates, Release 15.10”
Teradata Orange Book “SQL SHOWBLOCKS and SQL SHOWWHERE in Teradata Database” , Book# 541-0010699-A02, April 2015
Teradata Orange Book “Block Level Compression, in Teradata 14.0, including Temperature Based and Independent Sub-table Compression; 2011-10”

Tags:

PopulateFsysInfoTable

Ignore ancestor settings:

Channel:

database

Apply supersede status to children:

↧

Teradata Partitioning

August 6, 2015, 3:20 pm

≫ Next: Teradata Database 15.10 Overview

≪ Previous: Extract and analyze SHOWBLOCKS/SHOWWHERE data as easy as pie

Course Number:

53716

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Teradata partitioning was originally released in V2R5.0. This presentation reviews partitioning and the changes to this feature (from DPE to columnar) that have occurred over the last twelve years.

Additional insights into the feature are provided. If you thought you knew all about partitioning from V2R5.0, this presentation brings you up to date and also provides some insight into the usage of the feature.

Presenter: Paul Sinclair - Teradata Corporation

Price:

$195

Credit Hours:

Channel:

database

Tags:

partitioning

partitioning expressions

partition

columnar

↧

Teradata Database 15.10 Overview

August 7, 2015, 12:43 pm

≫ Next: Teradata Basics: Parallelism

≪ Previous: Teradata Partitioning

Course Number:

54172

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Come hear all about the newest features of Teradata Database 15.10!

Learn about the exciting and new enhancements to our Columnar Tables offer, the latest on In-Memory Optimizations, the New Load Isolation feature, a new table level locking feature: Partition Level Locking (PLL) and Secure Zones for multi-tenancy.

The Teradata Database 15.10 release represents the next major step in the evolution of Data Warehousing excellence.

Presenter: Richard Charucki - Teradata Corporation

Price:

$195

Credit Hours:

Channel:

database

Tags:

td 15.10

teradata database 15.10

teradata database release 15.10

↧

Teradata Basics: Parallelism

August 10, 2015, 8:37 am

≫ Next: High Performance Multi-System Analytics using Teradata QueryGrid

≪ Previous: Teradata Database 15.10 Overview

Cover Image:

This “Teradata Basics” posting describes the current dimensions of parallelism in the Teradata Database, and how they work together.

What is Query Parallelism?

Executing a single SQL statement in parallel means breaking the request into components, and working on all components at the same time, with one single answer delivered. Parallel execution can incorporate all or part of the operations within a query, and can significantly reduce the response time of an SQL statement, particularly if the query reads and analyzes a large amount of data.

With a design goal of eliminating single-threaded operations, the original architects of the Teradata Database parallelized everything, from the entry of SQL statements to the smallest detail of their execution. The database’s entire foundation was constructed around the idea of giving each component in the system many look-alike counterparts. Not knowing where the future bottlenecks might spring up, early developers weeded out all possible single points of control and effectively eliminated the conditions that breed gridlock in a system.

Limitless interconnect pathways, and multiple optimizers, host channel connections, gateways, and units of parallelism are supported in Teradata, increasing flexibility and control over performance that is crucial to large-scale data analytics today.

Teradata’s Unit of Parallelism

As you probably know, Teradata’s basic unit of parallelism is the AMP. From system configuration time forward all queries, data loads, backups, index builds, in fact everything that happens in a Teradata system, is shared across a pre-defined number of AMPs. The parallelism is predictable and understandable.

Each AMP acts like a microcosm of the database, supporting such things as data loading, reading, writing, journaling and recovery for all the data that it owns. The parallel units also have knowledge of each other and work cooperatively together behind the scenes. This teamwork among parallel units is an unusual strength of the Teradata Database, driving higher performance with minimal overhead.

Teradata’s Dimensions of Query Parallelism

While the AMP is the fundamental unit of apportionment, and delivers basic query parallelism to the work in the system, there are two additional parallel dimensions woven into the Teradata Database, specifically for query performance. These are referred to here as “within-a-step” parallelism, and “multi-step” parallelism. A description of all three dimensions of parallelism that Teradata applies to a query follows:

Query Parallelism.Query parallelism is usually enabled in Teradata by hash-partitioning the data across all the AMPs defined in the system. One exception is No Primary Index tables, which use other mechanisms for assigning data to AMPs. Once data is assigned to an AMP, the AMP provides all the database services on its allocation of data blocks. All relational operations such as table scans, index scans, projections, selections, joins, aggregations, and sorts execute in parallel across the AMPs simultaneously. Each operation is performed on an AMP’s data independently of the data associated with the other AMPs.

Within-a-Step Parallelism. A second dimension of parallelism that will naturally unfold during query execution is an overlapping of selected database operations referred to here as within-a-step parallelism. The optimizer splits an SQL query into a small number of high level database operations called “steps” and dispatches these distinct steps for execution to the AMPs, one after another.

A step can be a small piece or a large chunk of work. It can be simple, such as "scan a table and return the result" or complex, such as "scan two tables and apply predicates to each, join the two tables, redistribute the join result on specified columns, sort the redistributed rows, and place the redistributed rows in an intermediate table".

Within each of these potentially large chunks of work that we call steps, multiple relational operations can be processed in parallel by pipelining. While a table scan is taking place, rows that are selected can be pipelined into a join process immediately. Pipelining is the ability to begin one task before its predecessor task has completed and will take place whenever possible within each distinct step.

This dynamic execution technique, in which a second operation jumps off of a first one to perform portions of the step in parallel, is key to increasing the basic query parallelism. The relational-operator mix of a step is carefully chosen by the Teradata optimizer to avoid stalls within the pipeline.

Multi-Step Parallelism. Multi-step parallelism is enabled by executing multiple “steps” of a query simultaneously, across all the participating units of parallelism in the system. One or more tasks are invoked for each step on each AMP to perform the actual database operation. Multiple steps for the same query can be executing at the same time to the extent that they are not dependent on results of previous steps.

Below is a representation of how all of these three types of parallelism might appear in a query’s execution.

The above figure shows four AMPs supporting a single query’s execution, and the query has been optimized into 7 steps. Step 1.2 and Step 2.2 demonstrate within-a-step parallelism, where two different tables are scanned and joined together (three different operations are performed). The result of those three operations are pipelined into a sort and then a redistribution, all in one step. Step 1.1 and 1.2 together (as well as 2.1 and 2.2 together) demonstrate multi-step parallelism, as two distinct steps are chosen to execute at the same time, within each AMP.

And Even More Parallel Possibilities

In addition to the three dimensions of parallelism shown above, the Teradata Database offers an SQL extension called a “multi-statement request” that allows several distinct statements to be bundled together and sent from the client to the database as if they were one. These SQL statements will then be executed in parallel.

When the multi-statement request feature is used, any sub-expressions that the different SQL statements have in common will be executed once and the results shared among them. Known as “common sub-expression elimination,” this means that if six select statements were bundled together and all contained the same subquery, that subquery would only be executed once. Even though these SQL statements are executed in an inter-dependent, overlapping fashion, each query in a multi-statement request will return its own distinct answer set.

This multi-faceted parallelism is not easy to choreograph unless it is planned for in the early stages of product evolution. An optimizer that generates three dimensions of parallelism for one query such as described here must be intimately familiar with all the parallel capabilities that are available and know how and when to use them. But most importantly, the Teradata Database applies these multiple dimensions of parallelism automatically, without user intervention or special setup.

Ignore ancestor settings:

Tags:

Apply supersede status to children:

↧

High Performance Multi-System Analytics using Teradata QueryGrid

August 29, 2015, 8:27 am

≫ Next: Hands-On with Teradata QueryGrid™ - Teradata-to-Teradata

≪ Previous: Teradata Basics: Parallelism

Course Number:

54300

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Today's analytic environments incorporate multiple technologies and systems. Teradata QueryGrid™ Teradata-to-Hadoop allows you to access data and processing on Hadoop from your Teradata data warehouse.

This course gives you an in-depth understanding how QueryGrid Teradata-to-Hadoop works including querying metadata, partition pruning, push-down processing, importing data and joins. Using live queries we explain the syntax and functionality of QueryGrid in order to combine data and analytics across your analytic ecosystem.

Pre-Requisite: Course #53556, High Performance Multi-System Analytics using Teradata QueryGrid (Webcast)

Price:

$195

Credit Hours:

Channel:

Tags:

↧

Hands-On with Teradata QueryGrid™ - Teradata-to-Teradata

September 11, 2015, 5:49 pm

≫ Next: Workload Management with SLES 11 – Tips and Techniques

≪ Previous: High Performance Multi-System Analytics using Teradata QueryGrid

Course Number:

54299

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Teradata QueryGrid enables high performance multi-system analytics across various data platforms. Integrating the data and analytics of multiple Teradata systems can help organizations to economically scale their Teradata environment by adding systems with different characteristics ...

... Or they can take advantage of existing investments by leveraging idle resources. The QueryGrid: Teradata-to-Teradata connector has been available as of Q1 2015 and provides bi-directional parallel data transfer, ad-hoc query capabilities and push-down processing between Teradata databases. This course is a deep dive into both the functionality and example use cases for this connector. We use live queries to explain these capabilities and how to use them as well as discuss various applications for your analytic environment.

Presenter: Andy Sanderson, Product Marketing Manager - Teradata Corporation

Prerequisite: Course #53556 High Performance Multi-System Analytics using Teradata QueryGrid (Webcast)

Price:

$195

Credit Hours:

Channel:

Tags:

↧

Workload Management with SLES 11 – Tips and Techniques

October 13, 2015, 2:21 pm

≫ Next: Implementing Temporal on Teradata

≪ Previous: Hands-On with Teradata QueryGrid™ - Teradata-to-Teradata

Course Number:

53853

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

This session covers the main topics of Workload Management on SLES 11 systems.

Typical concerns that arise while configuring workload management for SLES 11 are discussed and strategies for many of these situations are provided.

Presenter: Dan Fritz - Teradata Corporation

Price:

$195

Credit Hours:

Channel:

Tags:

↧

Implementing Temporal on Teradata

October 16, 2015, 1:41 pm

≫ Next: Defense in Depth - Best Practices for Securing a Teradata Data Warehouse

≪ Previous: Workload Management with SLES 11 – Tips and Techniques

Course Number:

53912

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

This presentation explains issues and recommended design patterns for implementing a database design in Teradata using in-database Temporal capabilities.

Presenter: Steve Molini - Teradata Corporation

Price:

$195

Credit Hours:

Channel:

database

Tags:

Temporal; Time Series; Slowly Changing Dimension; Data Model; Historical Reporting

↧

Defense in Depth - Best Practices for Securing a Teradata Data Warehouse

October 20, 2015, 3:12 pm

≫ Next: Teradata Virtual Machine Community Edition for VMware

≪ Previous: Implementing Temporal on Teradata

Course Number:

36675

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Defense in depth is the practice of implementing multiple layers of security to protect business-critical information.

This session helps security administrators and database administrators better understand the types of attacks that can be used to compromise the security of a data warehouse. The session describes proven best practices that can be used to build a comprehensive, layered defense against these attacks and demonstrates how to use the security features within the Teradata Database and associated platforms to implement a strategy that balances security requirements with other important cost, performance, and operational considerations.

Key Points:

Understand the major security threats to a data warehouse
Proven best practices in 24 key areas for implementation of a defense in depth security strategy

Presenter: Jim Browning, Enterprise Security Architect - Teradata Corporation

Audience:

Database Administrator, Data Warehouse Architect

Price:

$195

Credit Hours:

Channel:

Tags:

↧

Teradata Virtual Machine Community Edition for VMware

November 4, 2015, 3:19 pm

≫ Next: Hands-on with Teradata QueryGrid™ - Teradata-to-Oracle

≪ Previous: Defense in Depth - Best Practices for Securing a Teradata Data Warehouse

Tags:

vmware

express

Channel:

database

Full user details required:

Full user details required

Community Edition is a SUSE Linux Enterprise Server (SLES) operating system and Teradata Database packaged into a virtual container that runs in a VMware vSphere ESXI virtualized environment on third-party hardware. Community Edition software consists of a template and associated property files and scripts. When a Community Edition virtual machine is deployed, it operates as a fully functional instance of the configured Teradata Database. Once deployed by the VMware administrator, Community Edition can be used to evaluate Teradata Virtual Machine and the Teradata Database.

Note: You must have administrative privileges on the destination VMware environment to install and configure Community Edition virtual machines.

Getting Started

The first task is to download the TVM Community Edition template, install scripts, and user manual from DEV/x.

You will need the following technical requirements and components to run Community Edition.

Component Requirements

VMware ESXi version ESXi 5.5 or 6.0

Datastore space for template deployment: Teradata Database: 30 GB minimum

vCenter management

• VMware administrator privileges

• Running vCenter Management instance (verified using the Microsoft

Services control within the vCenter image)

• vCenter Standard version (must support ESXi 5.5 or 6.0 servers)

• vCenter 5.5 or 6.0

vSphere Client

• Windows OS for vSphere Client

• vSphere PowerCLI version 5.5 release 2 or version 6.0

PowerCLI • Windows .Net Framework 4.5

• Windows PowerShell V2 or latest

• vSphere PowerCLI toolkit

Ignore ancestor settings:

Respect ancestor settings

Apply supersede status to children:

Ignore children

Download License:

Teradata Express Agreement (+SLES 10)

Package version:

13.10.00.05

↧

Hands-on with Teradata QueryGrid™ - Teradata-to-Oracle

November 6, 2015, 4:07 pm

≫ Next: Teradata Secure Zones: Implementation Basics

≪ Previous: Teradata Virtual Machine Community Edition for VMware

Course Number:

54628

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Teradata QueryGrid enables high performance multi-system analytics across various data platforms. QueryGrid: Teradata-to-Oracle provides powerful SQL based functionality allowing you to join datasets across Teradata and Oracle systems.

Using bi-directional data transfer, ad-hoc query capabilities and push-down processing you can do queries and find new insights with data that may not be integrated yet while minimizing data duplication and movement. It also enables you to consolidate data marts that have DB links to other Oracle systems in your environment, saving you money. This course is a deep dive into both the functionality and example use cases for this connector. We use live queries to explain these capabilities and how to use them as well as discuss various applications for your analytic environment.

Prerequisite: Course #53556 High Performance Multi-System Analytics using Teradata QueryGrid (Webcast)

Presenter: Andy Sanderson, Product Marketing Manager - Teradata Corporation

Price:

$195

Credit Hours:

Channel:

Tags:

↧

Teradata Secure Zones: Implementation Basics

November 13, 2015, 3:10 pm

≫ Next: Things that Run in the Session Workload

≪ Previous: Hands-on with Teradata QueryGrid™ - Teradata-to-Oracle

Course Number:

54294

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

This session provides a technical overview of Teradata Secure Zones and shows you how easy it is to implement a zone through a hands-on demo.

Version: Teradata 15.10

Presenter: Youko Watari, Technical Product Marketing Manager - Teradata Corporation

Price:

$195

Credit Hours:

Channel:

database

Tags:

Secure Zones

security

Software-Defined Warehouse

multi-tenant

↧

Things that Run in the Session Workload

November 13, 2015, 3:27 pm

≫ Next: Hands-on with Teradata QueryGrid™ - Teradata-Aster

≪ Previous: Teradata Secure Zones: Implementation Basics

Cover Image:

You’ll probably want to skip over this posting.

Unless you’re curious about some of the more obscure workload classification conventions used by TASM and TIWM.

In a previous blog posting titled “A New, Simplified Approach to Parsing Workload Selection” I explained how a session workload is determined at the time a session logs onto the database. Once established, the session workload is used to support the activities required for either logging on a session or for parsing queries within the session.

If you have set up a parsing-only workload to serve as a single, high priority session workload for all of your users, and you happen to monitor that workload, you may find things that are not part of session logons or query parsing that are running in that workload.

This blog explains what these other activities are that may show up in a session workload, and some of the oddities in how the session workload is used by workload management routines.

Session Workload

First, let’s be clear on what a session workload does: The session workload manages the access to resources and the priority of activities that take place on the parsing engine, such as logging on a session or parsing a request.

A session workload that was responsible for parsing a query is reported in a field in the DBQLogTbl named SessionWDID. SessionWDID may be the same workload as the WDID for that request. If those two fields are the same, that means that parsing and optimizing for the query took place in the same workload as the AMP execution. Sometimes SessionWDID and WDID are the same workload, sometimes they are not. Read the blog posting mentioned above for more background, and for detail on the session workload and how it is selected.

Now let’s consider some of the activities that you may observe running in SessionWDID.

Stored Procedure CALL Statement

Stored procedure CALL statements always run in the SessionWDID workload for the session. When the dispatcher sees a CALL statement it will ensure it gets put in its SessionWDID so the SessionWDID and the WDID will always be the same for the CALL.

Here’s how a stored procedure CALL would look in DBQLogTbl:

The stored procedure CALL statement incorporates the code and logic, if any, contained in the stored procedure. It is usually a very, very small amount of CPU usage that comes before, between, and after the SQL statements within the stored procedure. A CALL stored procedure is given its own row in the DBQLogTbl, apart from the SQL contained within it.

Below is a different example of DBQL output that illustrates a CALL statement and the three SQL statements within the stored procedure. Note that DBQL in this example reports zero AMP CPU Time for the CALL, and a slight amount of CPU on the parsing engine. The request number for all of the SQL statements within the stored procedure will be the same as the request number for the CALL. However, each SQL statement that is part of the stored procedure will have a distinct internal request number that simply increments the original request number, as shown below.

In earlier releases stored procedure calls ran in the default workload, called “WD-Default”. Starting in 14.10 that changed, as some Teradata sites were concerned when they saw activity in WD-Default after having been careful to set up workload classification in way to avoid anything failing into the default workload. Consequently, there was a change, and today all CALL statements execute in the SessionWDID for the session.

If you would prefer CALL statements to execute in a workload of your choice, and not in the session workload, you can set up classification criteria on the workload where you want the CALL statements to execute. Use the Target Criteria named "Stored Procedure". You can name one or more stored procedures you want to run in that workload, or you can design it so that all stored procedures run in a single workload by using the wild card (*). But remember, we’re talking about only the CALL statement there, the framework of the stored procedure that encapsulates the SQL statements. All the SQL statements within the stored procedure will run in whatever workload they classify to, independent of the CALL statement.

CHECK WORKLOAD Statements

CHECK WORKLOAD statements were introduced in Teradata Database 13.10 to support the load utility unit of work. There are two such statements:

CHECK WORKLOAD FOR: Precedes the utility unit of work. Passes object information, such as database and table, into the database.
CHECK WORKLOAD END: Begins the utility unit of work. Determines the number of AMP sessions, triggers workload classification, and performs the utility throttle and resource limit checks.

The CHECK WORKLOAD FOR statement works the same way as the CALL statement in terms of workload classification. CHECK WORKLOAD FOR will always run in SessionWDID (which will be the parsing-only workload if you have one defined). This is very light work, as all that statement does is get the names of the objects involved in the load utility from parser memory, where they are initially read into the database, and pass them to the dispatcher. Similar to a CALL stored procedure, there is no AMP activity involved in the Check Workload For statement.

A CHECK WORKLOAD END statement classifies to the workload where the load utility will execute. It uses a different approach to workload classification than does a CHECK WORKLOAD FOR. A CHECK WORKLOAD END goes through normal request classification and matches to a single workload. Once that workload is selected, the workload becomes the load uiltity’s session workload as well If you look at DBQL output for a load utility, you will see WDID = SessionWDID for the duration of the load.

Here is an example of how CHECK WORKLOAD FOR and CHECK WORKLOAD END look in DBQL:

CHECK WORKLOAD FOR finds SessionWDID and moves it to the WDID, making them the same. CHECK WORKLOAD END does the opposite. It finds the WDID and moves it to the SessionWDID, making them the same for the duration of the load utility.

HELP and SHOW Commands

HELP and SHOW commands are short administrative commands that execute on the parsing engine but may go to the AMPs to get data dictionary information. Typical HELP/SHOW commands include:

HELP DATABASE
HELP SESSION
HELP STATISTICS
SHOW STATISTICS
SHOW TABLE
SHOW VIEW

Most of the time these types of commands will use express requests if they require AMP data. Express requests are streamlined requests that bypass the parser and optimizer and go directly to the data dictionary.

However, if any of these administrative commands are consider large, they will be executed as normal SQL and will be parsed, optimized, and dispatched to the AMPs. This would be the case if a spool file needed to be built to return the data.

If an express request is used to satisfy one of these commands, as is likely the case with a HELP DATABASE or HELP SESSION, then you will see only a SessionWDID in DBQL data. The WDID will be NULL. An express request bypasses the dispatcher, which is the module where the classification to a WDID is performed. So it never undergoes workload classification.

If, however, the administrative command is satisfied by issuing SQL, which might be the case when a lengthy histogram is returned from a SHOW STATISTICS command, it will appear in DBQL with both a SessionWDID and a WDID. They may be the same workload or a different workload.

Here is how DBQL output might look for a series of HELP and SHOW commands:

Notice that some of these commands have a WDID and some do not.

Throttles Can Delay HELP/SHOW Commands

There’s something else to be aware of concerning these administrative commands. Sometimes they end up in the throttle delay queue.

Throttle rules are enforced within the dispatcher before the first step in a request is sent to the AMPs. Because express requests bypass the dispatcher, they are never subject to being delayed by a workload management throttle rule. This means that most HELP/SHOW commands, at least the ones that only display a SessionWDID but not a WDID, are not at risk of being placed in the delay queue.

However, the HELP/SHOW commands that run as regular SQL (the ones that do display a WDID) could be sent to the delay queue. Whether this happens or not depends on whether such commands classify to a system throttle rule, and whether that system throttle is at its limit at the time the SQL reaches the dispatcher. Since a virtual partition throttle is a variation of a system throttle, such an unexpected delay could also happen as a result of a virtual partition throttle.

You can see this use of SQL instead of express requests yourself by running an EXPLAIN on a HELP or a SHOW command:

This HELP command will use SQL and can tentatively be delayed:

EXPLAIN HELP DATABASE cab;

1) First, we lock DBC.TVM for access.

2) Next, we do an all-AMPs RETRIEVE step from DBC.TVM by way of an

all-rows scan with a condition of ("DBC.TVM.Field_1 = '00002F04'XB")

into Spool 1 (group_amps), which is built locally on the AMPs.

Then we do a SORT to order Spool 1 by the sort key in spool field1

(DBC.TVM.NameI).

3) Finally, we send out an END TRANSACTION step to all AMPs involved

in processing the request.

-> The contents of Spool 1 are sent back to the user as the result of statement 1.

This SHOW command uses an express request and will not be delayed:

EXPLAIN SHOW TABLE nation;

-> The row is sent directly back to the user as the result of statement 1.

Preventing Delays

It is simple to prevent these administrative types of commands from ever being delayed. Refine the classification on any system throttles that could be the cause of such a delay by adding critieria that these commands cannot match.

The chart below is a subset of the earlier chart, displaying only the HELP/SHOW commands that have both a SessionWDID and a WDID, an indication that they were converted to SQL:

One thing they have in common is an estimated processing time of zero. Knowing that, you can add classification criteria to any system throttles that could cause these commands to be delayed. Just include this additional specification: “EstProcTime > 0”. That way only requests with an actual estimate will be under the control of the throttle, but HELP and SHOW commands will not.

A future enhancement will automatically bypass throttles for such commands.

Conclusion

A session classifies to a session workload to do the logon processing and also to do the parsing engine work of all requests that execute within that session. However, there are additional types of very light work that will run in the session workload as well.

If you have set up a parsing-only workload, the additional commands and requests that execute at the session priority should not be a performance concern.

Ignore ancestor settings:

Tags:

Apply supersede status to children:

↧

Hands-on with Teradata QueryGrid™ - Teradata-Aster

December 3, 2015, 12:22 pm

≫ Next: Explaining the Aster Explain

≪ Previous: Things that Run in the Session Workload

Course Number:

54756

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Teradata QueryGrid enables high performance multi-system analytics across various analytic platforms. QueryGrid: Teradata-to-Aster allows Aster’s powerful multi-genre analytics to be used as extensions of the Teradata database.

Using bi-directional data transfer, ad-hoc query capabilities and push-down processing you can seamlessly run queries using data and analytics on both Teradata and Aster. This allows you to leverage the strengths and of both environments to discover new insights. It also allows easy ad-hoc data transfer between the platforms for discovery and analysis. This course covers the main functionality and an example use case for using this QueryGrid connector.

Prerequisite: Course #53556 High Performance Multi-System Analytics using Teradata QueryGrid (Webcast)

Presenter: Andy Sanderson, Product Marketing Manager - Teradata Corporation

Price:

$195

Credit Hours:

Channel:

Tags: