
ARRAY Data Type Scenario

Store matrices of single- or multi-dimensional data in a single field with array data types.

SQL ARRAY/VARRAY Data Type

This feature was introduced in Teradata Database 14.00.

Description

ARRAY is a type of user-defined data type (UDT) that can represent up to five dimensions, each with a user-defined maximum number of individual elements. All of the elements of an ARRAY must be of the same data type.

The one-dimensional ARRAY type is largely compatible with the VARRAY type provided by Oracle Database. You can create a one-dimensional ARRAY data type using the Teradata database ARRAY keyword, or by using the VARRAY keyword and syntax for Oracle compatibility.


Benefits

  • The ARRAY/VARRAY data types permit many values of the same data type to be stored sequentially or in a matrix-like format, extending the number of values of the same data type that can be stored in a row.
  • The values of an ARRAY/VARRAY column are stored within the row itself, not in a Large Object (LOB). Although this limits the size of ARRAY data types to 64K bytes, it provides major performance benefits because you can access and modify individual elements without having to create and write a new LOB each time.
  • Teradata Database provides many built-in functions and operators, and a constructor expression, to support the ARRAY and VARRAY data types. The elements of an ARRAY/VARRAY data type can be accessed individually or by using one of the built-in functions provided to support these data types as a sequential subset. Because these built in functions are provided as Teradata Database embedded services, the performance is much better than if you were to create your own UDFs for array manipulations.

Considerations

  • You can update only one element of the array at a time to a unique value, or all elements of the array to the same value. You cannot update several elements to different unique values in a single operation.
  • You cannot specify an ARRAY column in the ORDER BY, GROUP BY, or HAVING clauses of a SELECT request.
  • You cannot specify an ARRAY column in an index.
  • Casting functionality is provided to cast an ARRAY data type to a VARCHAR formatted string, and to cast a VARCHAR formatted string to an ARRAY data type. You can create your own user-defined function (UDF) to perform additional casting functionality. (A sketch of these casts follows this list.)
  • The one-dimensional ARRAY data type is partially compliant with the ANSI SQL:2008 standard. There is no current ANSI SQL:2008 standard for multidimensional ARRAY data types.
  • ARRAY data types are stored in the SYSUDTLIB database together with their respective automatically generated constructors. In order to create an ARRAY data type, a user must have appropriate UDT access rights on the SYSUDTLIB database.
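
As noted in the casting consideration above, ARRAY values can be converted to and from VARCHAR strings. The statements below are a minimal sketch, assuming the clothing_inv_ARY type and Clothing table created in the scenario that follows, and assuming the VARCHAR form uses the same parenthesized, comma-separated format shown in the display examples later in this article; the 'Socks' row is hypothetical.

/* Display each array as a VARCHAR string (string format assumed to match the display output shown later) */
SELECT Clothing_Type, CAST(Clothing_Inventory AS VARCHAR(500)) FROM Clothing;

/* Build an array value from a formatted string (hypothetical row) */
INSERT INTO Clothing VALUES ('Socks', CAST('(5,10,15)' AS clothing_inv_ARY));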

Scenario: Managing a Clothing Inventory

This scenario creates a simple three-dimensional array to represent aspects of a clothing vendor inventory. A Clothing table has a row for each type of clothing. The inventory information for all the different styles, sizes, and colors of each clothing type is stored in a single field: the array column of that row.

The scenario demonstrates how to:

  • Create the ARRAY UDT
  • Create a table with a column defined to contain ARRAY values
  • Insert rows containing ARRAY data into the table
  • Display ARRAY data from the table
  • Update a value stored in an ARRAY
  • Use an ARRAY system function to find the inventory of all colors of a given style and size of one type of clothing

For More Information About the ARRAY and VARRAY Data Types

For more information about the ARRAY and VARRAY data types, and about the SQL used in these examples, see:

  • SQL Data Definition Language - Detailed Topics, B035-1184: Provides information on creating ARRAY and VARRAY objects, and discusses creating and using arrays.
  • SQL Data Definition Language - Syntax and Examples, B035-1144: Shows SQL syntax for creating and altering UDT objects.
  • SQL Data Types and Literals, B035-1143: Describes the ARRAY and VARRAY data types.
  • SQL Functions, Operators, Expressions, and Predicates, B035-1145: Describes functions that operate on ARRAY and VARRAY data.
  • SQL External Routine Programming, B035-1147: Describes how to define C and C++ functions that use ARRAY and VARRAY data types.
  • Orange Book: Working with User Defined Types: Creation, Usage and Potential Pitfalls, Document #: 541-0005614: Detailed discussion and application examples of using UDTs.

Example: Define a Three-Dimensional Array

The following CREATE TYPE statement defines an ARRAY data type as a distinct UDT:

CREATE TYPE clothing_inv_ARY AS INTEGER ARRAY [2][4][3];

Note the following:

  • This array consists of three dimensions of sizes 2, 4, and 3, which represent styles, sizes, and colors of clothing items, respectively.
  • An ARRAY must contain data of a single data type. In this case the array contains INTEGER data, the inventory quantities.
  • The data stored in an array should be of the same kind. For example, our array has three dimensions, representing style, size, and color, but the values stored in the individual array elements will all be inventory quantities for the different combinations of style, size, and color. Mixing different kinds of data in the array, such as values for style, size, and color is not the recommended use of arrays.
  • The maximum number of values the array can hold is the product of the sizes of all dimensions. In this example, our clothing_inv_ARY array can hold up to (2x4x3)=24 values.
  • To identify and address any single piece of data (element) of the array, you specify its position within each dimension. Because this is a three-dimensional array, every element is identified by a three-number address, and each address corresponds to a specific combination of style, size, and color for the clothing item. The first number can be 1 or 2 (two possible styles), the second 1 through 4 (four possible sizes), and the third 1 through 3 (three possible colors). The elements of the array have addresses like [1][1][1], [1][1][2], [1][1][3], [1][2][1], [1][2][2], [1][2][3], [1][3][1], [1][3][2], and so forth through [2][4][3].
Note: An alternative syntax for creating ARRAY data types uses two signed integers to specify the size of each dimension. The following CREATE TYPE statement defines the same ARRAY as above using this syntax:
CREATE TYPE clothing_inv_ARY AS INTEGER ARRAY [1:2][1:4][1:3];
The parameters of the form [n:m] represent the minimum and maximum bounds of each dimension, which are used to identify and address the individual array elements. This two-number specification also allows negative numbers to be used when addressing a particular array element. For example, we could create an analogous 24-element array using this syntax:
CREATE TYPE clothing_inv_ARY AS INTEGER ARRAY [-1:0][-4:-1][1:3];

The individual dimensions would have the same sizes as our array above; however, to address the elements you would use the following sequence of locations: [-1][-4][1], [-1][-4][2], [-1][-4][3], [-1][-3][1], [-1][-3][2], [-1][-3][3], and so forth through [0][-1][3].

Example: Create a Table Referencing an Array UDT

The following example creates a table that includes an array column. The Clothing_Inventory column is defined to be of type clothing_inv_ARY, our ARRAY UDT. For each row, which represents a type of clothing, the Clothing_Inventory column stores the entire inventory for all styles, sizes, and colors of that clothing type.

CREATE TABLE Clothing (
   Clothing_Type VARCHAR(10),
   Clothing_Inventory clothing_inv_ARY);

Example: Insert to a Table That has an ARRAY Column

The following example demonstrates how to add array data to a table that has a column defined to hold ARRAY type data.

Assume the table was defined to have two columns: Clothing_Type and Clothing_Inventory, where Clothing_Inventory is an ARRAY UDT that defines a three-dimensional array to hold inventory quantities of the clothing items. The array addresses are indexed by the style, size, and color of the clothing type represented by each row. The following example populates the table, and demonstrates inserting data into the ARRAY values:

INSERT INTO Clothing VALUES
   ('T-Shirt', NEW clothing_inv_ARY(20, 8, 4)
   );


INSERT INTO Clothing VALUES
   ('Sweatpants', NEW clothing_inv_ARY( 11,   2,  23,
                                         0,  10,  25,
                                       130,  62,  17,
                                        77, 142,  22,
                                        81,   0, 215,
                                        30,  27,  31,
                                        12,  23,   0,
                                         4,  55,  11)

   );

The data in an ARRAY must all be of the same data type. In this example, the data is all INTEGERs. Each ARRAY has 24 elements (2x4x3). The two rows inserted into the table each represent a different type of clothing.

The three dimensions of the array correspond to:

  • Styles:
    • For shirts: long sleeve or short sleeve, represented by addresses 1 and 2, respectively, in the first dimension
    • For sweatpants: elastic waist or drawstring waist, represented by addresses 1 and 2, respectively, in the first dimension
  • Sizes: small, medium, large, or extra large, represented by addresses 1, 2, 3, and 4, respectively, in the second dimension for both shirts and sweatpants
  • Colors: blue, gray, or white, represented by addresses 1, 2, and 3, respectively, in the third dimension for both shirts and sweatpants

The assignment of array dimension address values to specific dimension values maps the array and allows us to associate a specific three-number array address with a specific style, size, and color of the clothing item represented by each row of the table.

Data is added to the array in row-major order. In row-major order, the first dimension (leftmost in the array specification) varies most slowly and the last (rightmost) dimension varies fastest as consecutive values are stored. Our array values in this example would fill the clothing_inv_ARY array addresses for sweatpants in this way:

Array Address  Quantity Stored |Array address  Quantity Stored
===============================|==============================
[1][1][1]       11             |[1][3][1]       130                
[1][1][2]        2             |[1][3][2]        62
[1][1][3]       23             |[1][3][3]        17
[1][2][1]        0             |[1][4][1]        77
[1][2][2]       10             |[1][4][2]       142
[1][2][3]       25             |[1][4][3]        22 ... and so forth.

Although the array can hold quantities for up to 24 different kinds of each type of clothing, each a unique combination of style, size, and color, neither the T-Shirt array nor the Sweatpants array is fully populated.

Example: Display Array Contents

This example displays the entire contents of the array for each type of clothing.

SELECT clothing_type, clothing_inventory FROM clothing;

Result:

Clothing_Type Clothing_inventory
------------- -------------------------------------------------------------
Sweatpants    (11,2,23,0,10,25,130,62,17,77,142,22,81,0,215,30,27,31,12,23,
T-Shirt       (20,8,4)

To display the inventory quantity for a particular clothing item, use the array address that corresponds to the combination of style, size, and color of the item of interest. For example, the following query returns the inventory quantity of sweatpants having a drawstring waist (address 2 in the first dimension), medium size (address 2 in the second dimension), and white color (address 3 in the third dimension).

SELECT clothing_inventory [2][2][3]
   FROM clothing
   WHERE clothing_type = 'Sweatpants';

Result:

Clothing_inventory[2][2][3]
---------------------------
                         31

Example: Update the Quantity for One Item in the Array

The table in the previous example shows the array address that corresponds to each stored quantity. The following statements update the quantity of drawstring waist, medium, white sweatpants, and then return the new value:

UPDATE clothing
   SET clothing_inventory [2][2][3] = 999
   WHERE clothing_type = 'Sweatpants';

To verify the change, we can display the new contents of the array at the [2][2][3] address:

SELECT clothing_inventory [2][2][3]
   FROM clothing
   WHERE clothing_type = 'Sweatpants';

Result:

Clothing_inventory[2][2][3]
---------------------------
                        999

Example: Display the Sum of a Subset of Array Contents Using ARRAY_SUM

The ARRAY data type feature includes several embedded service system functions for manipulating arrays. This example uses the ARRAY_SUM function, which returns the sum of the values in the array for a given subset of array addresses.

To query the inventory for elastic waist, small size sweatpants, summing the quantities for all colors, you would use the following query, which specifies the addresses for elastic and small, and the minimum and maximum bounds of the color dimension:

SELECT ARRAY_SUM(clothing_inventory, NEW arrayVec(1,1,1), NEW arrayVec(1,1,3))
   FROM clothing 
   WHERE clothing_type = 'Sweatpants';

Result:

ARRAY_SUM(clothing_inventory, NEW ARRAYVEC(1, 1, 1), NEW ARR
------------------------------------------------------------
                                                          36
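
ARRAY_SUM can also total a larger contiguous block of elements by widening the arrayVec bounds. The following query is a sketch that assumes the bounds may span whole dimensions in the same row-major fashion as the example above; it sums the quantities for all sizes and colors of the drawstring-waist style, which, given the values inserted earlier, should be 489.

/* Sum every element from [2][1][1] through [2][4][3]: the entire drawstring-waist portion of the array */
SELECT ARRAY_SUM(clothing_inventory, NEW arrayVec(2,1,1), NEW arrayVec(2,4,3))
   FROM clothing
   WHERE clothing_type = 'Sweatpants';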

Period Data Type and Derived Period Column Scenario

Use Period data types and derived period columns to query and manipulate durations.

Overview: Period Data Types and Derived Period Columns

Teradata Database supports storage and manipulation of durations, spans of time defined by beginning and ending bounds:

  • Period data types allow creation of single values to represent durations, where the beginning and ending bounds can be any pairs of standard Teradata Database DateTime values (DATE, TIME, and TIMESTAMP data types).
  • Derived period columns allow you to define a duration using values from two existing DATE or TIMESTAMP columns.

Both Period data types and derived period columns define durations that are inclusive of the beginning duration bound, and exclusive of the ending bound.

Teradata Database provides a rich variety of functions, operators, and predicates that can be used on Period data type and derived period columns.

Period Data Types

This feature was introduced in Teradata Database 13.10.

Description

Period data types represent durations. A Period data type value includes the beginning and ending bound that defines the duration. Teradata Database supports the following Period data types, where the differences are based on the type of DateTime data used to define the duration bounds:
  • PERIOD(DATE)
  • PERIOD(TIME[(n)])
  • PERIOD(TIME[(n)] WITH TIME ZONE)
  • PERIOD(TIMESTAMP[(n)])
  • PERIOD(TIMESTAMP[(n)] WITH TIME ZONE)

Benefits

  • Extends the temporal capabilities of Teradata Database with a data type that represents a duration.
  • Period data type columns are defined using standard SQL CREATE TABLE statements, and can be added to or dropped from tables using standard ALTER TABLE statements.
  • Teradata Database provides a rich variety of functions, operators, and predicates that can be used on PERIOD data.
  • Allows faster development of efficient applications that use and manipulate durations.
  • Period data types can be used to define Teradata temporal tables.

Considerations

  • Constraint definitions cannot be specified for Period data type columns.
  • Column storage attributes cannot be specified for Period data type columns.
  • Distinct UDTs cannot be created as Period data types.
  • Window functions are not allowed on Period data types.
  • Indexes cannot be specified for Period data type columns.

Derived Period Column

This feature was introduced in Teradata Database 14.10.

Description

A derived period column provides a way to represent a span of time in a single table column. The derived period column combines a pair of defined DateTime type columns that specify the beginning and ending bounds of a duration. Under most circumstances, derived period columns are treated by Teradata Database similarly to columns defined to have true Period data types.

Benefits

  • Derived period columns are defined using standard SQL CREATE TABLE statements, and can be added to or dropped from tables using standard ALTER TABLE statements.
  • Derived period columns can be used with all functions, operators, and predicates that support true Period data types.
  • The EXPAND clause of SELECT statements supports expansion on derived period columns.
  • Derived period columns can be used to define temporal tables.
  • Provides Teradata Database compatibility with existing temporal applications that use two columns to define durations.
  • The two component columns of a derived period column can be used anywhere a regular DateTime column can be used, for example in index definitions, partition expressions, projections, and check constraints.
  • A check constraint that ensures the beginning bound column value is a DateTime value prior to the corresponding ending bound column value is automatically enforced for all DML operations on the component columns.
  • Any constraint definition or function allowed on a DateTime column is allowed on the component begin and end columns of a derived period column, including hash indexes, unique or primary key constraints, check constraints, and reference constraints.
  • Statistics can be collected on begin and end bound columns that define a derived period column, which can improve query performance. Consequently, temporal tables that use derived period columns rather than Period data type columns for their VALIDTIME and TRANSACTIONTIME columns can display better performance.
  • Although derived period columns share the restrictions of true Period data types, the component begin and end bound columns that comprise a derived period column are regular DateTime columns, and can be used and manipulated, for the most part, as any regular DateTime column.

Considerations

  • Derived period columns cannot be:
    • part of primary or secondary index
    • included in a partition expression
    • named in a SELECT list
  • Derived period columns must always be defined as NOT NULL.
  • The component begin and end bound columns that define a derived period column cannot have a TIME data type.
  • The only column attributes that can be defined for a derived period column are AS VALIDTIME and AS TRANSACTIONTIME, used to create temporal tables.
  • Each derived period column defined for a table counts against the system limit for columns per table.

Differences Between Period Data Type Columns and Derived Period Columns

Period data type columns and derived period columns are largely analogous and can be used for similar purposes. However, there are some noteworthy differences. In most cases, restrictions on Period data type columns also apply to derived period columns; however, the component columns that comprise derived period columns are generally not so restricted.

  • Period data type columns support many column attributes, including data type attributes (such as NULL and NOT NULL, FORMAT 'format string', TITLE, DEFAULT value), column constraint attributes (such as UNIQUE and CHECK), and column storage attributes (such as COMPRESS). Derived period columns support only the VALIDTIME and TRANSACTIONTIME data type attributes, and only if the derived period column is part of a temporal table.
  • Statistics cannot be collected on Period data type columns. Statistics also cannot be collected on derived period columns; however, statistics can be collected on the component columns that function as the begin and end bounds of a derived period column.
  • Constraint definitions cannot be specified for Period data type columns. Constraint definitions can be specified on the component columns of a derived period column.
  • Period data type columns cannot be part of secondary indexes. The component columns of a derived period column can be part of secondary indexes.
  • Period data type columns can be defined to allow NULL values. The component begin and end bound DateTime columns of a derived period column must be defined as NOT NULL.
  • Period data type column data can use TIME data types in the PERIOD data type constructors, unless the period data type column is a VALIDTIME or TRANSACTIONTIME column in a temporal table. The component begin and end bound DateTime columns of a derived period column cannot be of type TIME.
  • Period data type columns can be included in column specifications for SELECT statements (can be projected in SELECT lists). Derived period columns cannot be included in SELECT lists; however, the component columns of a derived period column can be.
  • Begin and end bound values of Period data type columns can be used in partition expressions. Derived period columns cannot be part of a partition expression; however, their component begin and end bound columns can be used in partition expressions.

Scenario: Determining Length of Service for Employees

This scenario creates two versions of an employee data table. One table uses a Period data type column to contain the start and end employment dates for each employee. The second table uses a derived period column based on two columns that define the start and end employment dates for each employee. These tables allow querying the duration of service for any employee. The scenario also demonstrates inserting employee records having period data into the tables and updating the start date for one employee.

For More Information about Period Data Types and Derived Period Columns

For more information about Period data types and derived period columns, and about the SQL used in these examples, see:

  • SQL Data Types and Literals, B035-1143: Describes the Period data types.
  • SQL Data Definition Language - Syntax and Examples, B035-1144: Describes defining, adding, and removing Period data type columns and derived period columns.
  • SQL Functions, Operators, Expressions, and Predicates, B035-1145: Describes functions, operators, and predicates that are used with Period data types and on derived period columns.
  • SQL Data Manipulation Language, B035-1146: Provides examples of DML operations on Period data types and derived period columns.

Example: Create a Table with a PERIOD(DATE) Data Type Column

The following CREATE TABLE statement defines an employee table that includes a PERIOD(DATE) data type column with a default value set using a Period literal.

CREATE TABLE Service_Period
(
employee_id INTEGER,
employee_name CHARACTER(15),
employee_duration PERIOD(DATE)
DEFAULT PERIOD '(2005-02-03, 2006-02-03)'
);
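
Because employee_duration has a DEFAULT clause, a row inserted without a value for that column receives the period PERIOD '(2005-02-03, 2006-02-03)'. A minimal sketch (the employee values are hypothetical):

INSERT INTO Service_Period (employee_id, employee_name) VALUES (99, 'Pat');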

Example: Create a Table with a Derived Period Column

The following CREATE TABLE statement defines an employee table that includes a derived period column using DATE type values to define the beginning and ending bounds of the derived period.

CREATE TABLE Service_Derived_Period
(
employee_id INTEGER,
employee_name CHARACTER(15),
employee_beg DATE NOT NULL FORMAT 'YYYY-MM-DD',
employee_end DATE NOT NULL FORMAT 'YYYY-MM-DD',
PERIOD FOR employee_duration(employee_beg, employee_end)
);
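
As noted under the derived period column benefits, statistics cannot be collected on the derived period column itself, but they can be collected on its component bound columns. A minimal sketch using the standard COLLECT STATISTICS syntax:

COLLECT STATISTICS ON Service_Derived_Period COLUMN (employee_beg);
COLLECT STATISTICS ON Service_Derived_Period COLUMN (employee_end);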

Example: Insert Into a Table with a Period Data Type Column

The following INSERT statements insert employee data into a PERIOD(DATE) column.

INSERT INTO Service_Period VALUES (1,'John',PERIOD(DATE '2005-02-03',DATE '2013-02-04'))
;INSERT INTO Service_Period VALUES (2,'Mary',PERIOD(DATE '2000-12-15',DATE '2013-02-04'))
;INSERT INTO Service_Period VALUES (3,'Adam',PERIOD(DATE '2001-06-30',DATE '2013-02-04'))
;INSERT INTO Service_Period VALUES (4,'Simon',PERIOD(DATE '2002-03-15',DATE '2013-02-04'));

Example: Insert Into a Table With a Derived Period Column

The following INSERT statements insert employee data into the DATE type columns that are used to define the bounds of a derived period column.

INSERT INTO Service_Derived_Period VALUES (1,'John',(DATE '2005-02-03'),(DATE '2013-02-04'))
;INSERT INTO Service_Derived_Period VALUES (2,'Mary',(DATE '2000-12-15'),(DATE '2013-02-04'))
;INSERT INTO Service_Derived_Period VALUES (3,'Adam',(DATE '2001-06-30'),(DATE '2013-02-04'))
;INSERT INTO Service_Derived_Period VALUES (4,'Simon',(DATE '2002-03-15'),(DATE '2013-02-04'));
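
As described under the derived period column benefits, Teradata Database automatically enforces a check that the begin bound precedes the end bound. The following hypothetical insert is shown only to illustrate that enforcement; it should be rejected because the end date precedes the begin date.

/* Expected to fail the automatic begin-before-end check (hypothetical row) */
INSERT INTO Service_Derived_Period VALUES (5,'Rita',(DATE '2014-01-01'),(DATE '2010-01-01'));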

Example: Select an Interval From Period Data

This SELECT statement returns the number of years of service for all employees using Period data.

SELECT employee_name, INTERVAL (employee_duration) YEAR FROM Service_Period
   ORDER BY 2 DESC;

Result:

employee_name    INTERVAL(employee_duration) YEAR
---------------  --------------------------------
Mary                                           13
Adam                                           12
Simon                                          11
John                                            8

Example: Select an Interval From Derived Period Data

This SELECT statement returns the number of years of service for all employees using a derived period column.

SELECT employee_name, INTERVAL (employee_duration) YEAR FROM Service_Derived_Period 
   ORDER BY 2 DESC;

Result:

employee_name    INTERVAL(employee_duration) YEAR
---------------  --------------------------------
Mary                                           13
Adam                                           12
Simon                                          11
John                                            8

Example: Update Period Data

This UPDATE statement changes the start date for John in the Period data type column.

UPDATE Service_Period SET employee_duration = PERIOD(DATE '2006-02-01',DATE '2013-02-04')
   WHERE employee_id = 1;

This SELECT statement returns the number of years of service for John from the Period data type column.

SELECT employee_name, INTERVAL (employee_duration) YEAR
   FROM Service_Period 
   WHERE employee_id = 1;

Result:

employee_name    INTERVAL(employee_duration) YEAR
---------------  --------------------------------
John                                            7

Example: Update Derived Period Column

This UPDATE statement changes the start date for John, which defines the start bound of the employee_duration derived period column.

UPDATE Service_Derived_Period SET employee_beg = (DATE '2006-02-01')
   WHERE employee_id = 1;

This SELECT statement queries the derived period column to return the number of years of service for John.

SELECT employee_name, INTERVAL (employee_duration) YEAR 
   FROM Service_Derived_Period 
   WHERE employee_id = 1;

Result:

employee_name    INTERVAL(employee_duration) YEAR
---------------  --------------------------------
John                                            7
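
The period functions and predicates mentioned earlier also apply to these columns. The query below is a sketch that assumes the BEGIN and END bound functions and the OVERLAPS predicate behave as they do for standard period expressions; given the data above, it should return Mary and Adam.

SELECT employee_name, BEGIN(employee_duration), END(employee_duration)
   FROM Service_Period
   WHERE employee_duration OVERLAPS PERIOD(DATE '2001-01-01', DATE '2002-01-01');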

Defense in Depth - Best Practices for Securing a Teradata Data Warehouse


Defense in depth is the practice of implementing multiple layers of security to protect business-critical information.

This session will help security administrators and database administrators better understand the types of attacks that can be used to compromise the security of a data warehouse. The session will describe proven best practices that can be used to build a comprehensive, layered defense against these attacks and will demonstrate how to use the security features within the Teradata Database and associated platforms to implement a strategy that balances security requirements with other important cost, performance, and operational considerations.

Key Points:

  • Understand the major security threats to a data warehouse 
  • Proven best practices in 21 key areas for implementation of a defense in depth security strategy 

Presenter: Jim Browning, Enterprise Security Architect - Teradata Corporation
 

Audience: Database Administrator, Data Warehouse Architect

Training details
Course Number: 36675
Training Format: Recorded webcast
Price: $195
Credit Hours: 2

R14.10 Temporal Feature Deep Dive


When Teradata released its new temporal capabilities in 2010, it caused us all to start thinking differently about the element of time in the Data Warehouse by providing powerful tools for managing and navigating temporal & bitemporal data structures. With the release of Teradata R14.10, we will be challenged still further.

The session will provide a deep dive into Teradata’s latest DBMS release. It will focus on the following temporal enhancements and best practices for implementation of:

  • Derived or Pseudo Periods
  • Join Elimination for SEQUENCED Views
  • Enhanced Normalize function
  • SEQUENCED VALIDTIME SELECT Support for Aggregate Functions
  • Sequenced View Support for Teradata ILDMs
  • Temporal Table Timestamps Granularity Settings

We will use a case study approach to highlight these features to solve real world problems. Further, we will discuss implications for loading & accessing data as well as the impact on resource consumption & database administration.

Teradata Release Information: Version 14.10

Presenters:
Greg Sannik, PS Principal Consultant – Teradata Corporation
Fred Daniels, Pre-Sales Senior Consultant – Teradata Corporation

Audience: Database Administrator, Designer/Architect, Application Developer

Training details
Course Number: 50787
Training Format: Recorded webcast
Price: $195
Credit Hours: 2

Easing Into Using the New AutoStats Feature


Everyone with a Teradata Database collects statistics.  No way around it.  Good query plans rely on it, and we’ve all gotten used to it.

But starting in Teradata 14.10 there are some big changes.  Numerous enhancements to the optimizer, some important database changes, a new set of APIs, an added data dictionary table, and a brand new Viewpoint portlet all combine to produce the Automated Statistics (AutoStats) feature.  Used fully, AutoStats can identify statistics the optimizer looked for and found missing, flag unused statistics that you no longer need to collect on, prioritize and execute the stats collection statements, and organize/run your statistics collection jobs in the most efficient way possible.

There's a lot going on and a lot of choices involved.  So what's the easiest way to start getting familiar with AutoStats?  And what's required with the new DBQL logging options?  If you like to build up confidence in a new feature at your own pace, this blog posting offers a few simple steps designed to introduce you to the key features and benefits of AutoStats.

Sources of Information

AutoStats leverages a variety of new statistics enhancements available in Teradata 14.10.  Many of these new capabilities can be used directly within customized statistics management routines that already exist at many customer sites.

For more information, see the orange book Teradata Database 14.10 Statistics Enhancements by Rama Krishna Korlapati.  Major features discussed in this orange book include automatic downgrading to sampled statistics, automatic skipping of unnecessary recollections based on thresholds, tracking update-delete-insert counts, and collecting statistics on single table expressions.

However, to gain the full benefit of an automated closed-loop architecture that incorporates these enhancements, take advantage of the statistics collection management and tuning analysis opportunities of AutoStats.  For a solid understanding of this feature, read the orange book Automated Statistics Management by Louis Burger and Eric Scheie.  It discusses what AutoStats does, explains how to use the Viewpoint portlet screens, and provides guidelines on using AutoStats successfully.

Here in Developer Exchange, take a look at the article from Shrity dated May 2013 titled "Teradata Viewpoint 14.10 Release".  It describes and illustrates the Viewpoint portlet that supports AutoStats, which you will be using when you get to 14.10.  This Viewpoint portlet is named "Stats Manager".

This blog posting is intended to help you ease into using the new AutoStats feature.  Subsequent related postings will explore individual features of 14.10 statistics management, such as the Threshold option, and suggest ways you can use them outside of the AutoStats architecture.

Version 6 of the Statistics Histogram

One underpinning of the Teradata Database 14.10 enhancements to statistics collection is a new version of the statistics histogram.  Version 6 is necessary to support some of these new enhancements, including a system-derived sampling option for the statistics collection statement and the capture of insert, delete, and update counts.

In order to use Version 6, the system must be committed to not backing down to any Teradata Database release earlier than 14.10.  The DBS Control general field #65, NoDot0Backdown, must be set to true to indicate the no-back-down decision.  This change does not require a restart.  NoDot0Backdown is an internal DBS Control field.

If you have turned on DBQL USECOUNT logging for a database (discussed in the next section), and no back down is committed, the Version 6 histogram will automatically be used the next time you recollect statistics for the tables within that database.

The database supports backward compatibility.  Statistics whose histograms use the previous versions (Version 1 to Version 5) can be imported into the system that supports Version 6. However, the Version 6 statistics cannot be imported to systems that support Version 5 or lower versions.  After migrating to Teradata 14.10, it’s a good idea to accelerate the recollection of current statistics.  The recollection will move your histograms to the new version, which in turn will support a wider use of the new enhancements.
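
Recollecting at the table level refreshes every statistic already defined on that table, so no column list is needed. A minimal sketch (the database and table names are placeholders):

COLLECT STATISTICS ON SandBoxDB.SalesFact;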

New DBQL Options - What They Do and When to Use Them

AutoStats functionality relies on your making more information available to the database about how statistics are used.  Some of this information it digs up itself, such as which stats you have already collected, and which options you have specified on your current statistics collection statements.

But to provide the most interesting (and helpful) new functionality, such as determining which statistics you have missed collecting on, or no longer need, additional DBQL logging will be required.  While this extra DBQL logging is not absolutely necessary if you choose to use AutoStats only for collection and management purposes, things like the new re-collection threshold functionality will not work without DBQL USECOUNT logging taking place.  AutoStats uses a “submit often” architecture, where statistics collections are submitted frequently, making the 14.10 threshold logic important for determining which statistics actually need to be refreshed at run time.

DBQL USECOUNT Logging

If you enable DBQL logging with the USECOUNT option, and then run an AutoStats Analyze job from the Stats Manager portlet, you can identify unused statistics as well as detect which statistics are being used.   Although you turn on USECOUNT by a BEGIN QUERY LOGGING statement, USECOUNT is not a normal query-based DBQL option, and it is not something you enable at the user level.  You turn USECOUNT on at the database level in a separate logging statement from the DBQL logging targeted to users.   

BEGIN QUERY LOGGING WITH USECOUNT ON SandBoxDB;

Once you enable logging, USECOUNT tracks usage of all objects for that database.  USECOUNT is the new mechanism for turning on Object Use Count (OUC).  

One type of object tracking that USECOUNT performs has to do with statistics.  In 14.10 each individual statistic is considered an "object" and has a row in a new table, DBC.ObjectUsage.  With USECOUNT on, the database counts how often statistics objects were used by the optimizer.  Whenever the optimizer uses a particular statistic, the access count in that row of the DBC.ObjectUsage table is incremented.  If a particular statistic has no usage over a period of time, routines will flag it as being unused, and subsequent executions of Analyze jobs scoped to that database will recommend de-activating it.

USECOUNT also tracks inserts, deletes and updates made to tables within the database you have selected.  This provides the extrapolation process with more accurate information on changes in table size, and also allows enforcement of the Threshold Change Percent limits that you can apply to individual stats collection statements starting in 14.10.  The AutoStats feature incorporates threshold-based information in determining the ranking, and therefore the submission priority, of the individual statistic collections within the scripts it builds.
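
If a database later no longer needs this tracking, the logging can be ended with a statement that mirrors the BEGIN syntax. A sketch, assuming END QUERY LOGGING accepts the same WITH USECOUNT form:

END QUERY LOGGING WITH USECOUNT ON SandBoxDB;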

Here are a few tips when logging USECOUNT:

  1. Before you turn on USECOUNT, and in order for the optimizer to use the use counts, set the DBS Control field NoDot0Backdown to true.
  2. Keep USECOUNT logging on for databases whose tables are being analyzed on an ongoing basis.
  3. In general, it is good to keep USECOUNT logging on for all important databases.  In particular, USECOUNT logging is important for those databases where the SYSTEM change-based THRESHOLD is in effect. New optimizer statistics fields in DBS Control allow the user to enable and set system-level THRESHOLD defaults.   A future blog posting will discuss the new THRESHOLD functionality in more detail. 

DBQL STATSUSAGE and XMLPLAN Logging

These additional DBQL logging options create XML documents that identify which statistics were used and which statistics the optimizer looked for but did not find.  Unlike USECOUNT, these two logging types should be turned on temporarily, only as needed in preparation for analysis.  Both provide input to Analyze jobs whose evaluation method has been specified as "DBQL".  Both logging options store the XML that they produce in the DBC.DBQLXMLTbl.

They basically do the same thing.  However, XMLPLAN is more granular, provides more detail, and comes with more overhead than STATSUSAGE logging.

The difference is that STATSUSAGE logs the usage of existing statistics and recommendations for new statistics (that it found to be missing) into the DBQLXMLTbl without any relationship to the query steps.  The relationship to the query steps can only be made if XMLPLAN logging is also enabled.  XMLPLAN logs detailed step data in XML format.  If you enable XMLPLAN by itself without STATSUSAGE, there is no stats-related data logged into the DBQLXMLTbl.  STATSUSAGE provides all the statistics recommendations, and if both are being logged, then those statistical recommendations can be attached to specific steps.  So if you plan to log XMLPLAN, log STATSUSAGE at the same time.

If both types of logging are on, both options log into a single XML document for a given request.  But you must explicitly request logging for them, because the ALL option in DBQL does not include STATSUSAGE and XMLPLAN.  Logging for both is enabled at the user level, as is STEP or OBJECT logging.

When you begin logging with these new DBQL options, take into consideration any logging that is already taking place.   In the statement below, although SQL and OBJECTs are not used by AutoStats, they may be included in the same logging statement with STATSUSAGE and XMLPLAN.

BEGIN QUERY LOGGING WITH SQL, OBJECTS, STATSUSAGE, XMLPLAN LIMIT SQLTEXT=0 ON USER1;

The orange book Teradata Database 14.10 Statistics Enhancements provides an example of the XML output captured by DBQL STATSUSAGE logging.  See pages 65-67.

Here are a few tips when logging STATSUSAGE and XMLPLAN:

  1. For first time tuning or analysis, or after changes to the database, enable both XMLPLAN and STATSUSAGE options for those users who access the affected tables.
  2. Log with both options for at least one week in advance of running an Analyze job, or however long it takes to get good representation for the different queries.
  3. Consider using the more detailed XMLPLAN only for focused tuning, or briefly for new applications, as it incurs some overhead and requires more space.
  4. Turn off logging of both options during non-tuning periods of time or when no Analyze jobs on affected data are being planned in the near future.
  5. Keep the scope of logging to just the users accessing the databases and applications undergoing analysis and tuning.

Note:  You can enable DBQL logging for STATSUSAGE and XMLPLAN based on account or user, or whatever DBQL logging conventions currently exist at your site.   Keep in mind the other DBQL logging rules already in place.

Becoming Familiar with AutoStats a Step at a Time

Because AutoStats jobs can incorporate just a single table’s statistics or be an umbrella for  all of your databases, you want to keep careful track of how you scope your initial jobs.  Pick your boundaries and start off small. 

Initial steps to take when first starting out with AutoStats include:

  1. Read the orange book Automated Statistics Management  and understand how it works.
  2. Select a small, non-critical database with just a few tables, perhaps in your sandbox area, for your initial AutoStats interactions.
  3. Try to pick a database that represents a new application area that has no current statistics, or a database with some level of statistics, but whose statistics are not well tuned.
  4. Set up DBQL logging options so that new logging options are focused on the database tables you have scoped.
    1. Turn on DBQL USECOUNT logging just for that one database.
    2. Turn on DBQL logging that includes STATSUSAGE AND XMLPLAN only for users who access that database.
  5. Run the application’s queries against the database for several days or even a week.
  6. Enable the Viewpoint Stats Manager data collector in Admin->Teradata Systems portlet.  This enables the Stats Manager portlet.
  7. To run Stats Manager Collect and Analyze jobs, the TDStats database in DBC must have permission to collect statistics on the objects in the job's scope.  For example:
GRANT STATISTICS ON <database> TO TDStats;
  8. Select the database in the Statistics tab within the Viewpoint Stats Manager portlet and select the Automate command, in order to bring the existing statistics in that database under the control of AutoStats.
  9. Create a new Analyze job.
    1. Pick the DBQL option and limit queries by log date to incorporate just the last week when you were focused on logging that database.
    2. Select Require Review before applying recommendations.
  10. Run the Analyze job, then review and approve all recommendations.
  11. Create a new Collect job that incorporates your chosen database tables; run the Collect job.
    1. Specify Automatically generate the collect list (the default).
  12. Preview the list of statistics to be collected before the collection job runs.

Starting off simply with AutoStats allows you to get familiar with the automated functionality and job scheduling capabilities, as well as the Stats Manager portlet screens.  Over time, you can give control over statistics analysis and collection to AutoStats for additional databases. 


C/C++ Coded User-Defined Function Scenario

Add your own functions to Teradata Database with UDFs.

User-Defined Functions

This feature was introduced in Teradata Database V2R5.1.

Description

Teradata Database includes a wide variety of standard SQL functions that can be used in SQL queries. User-defined functions (UDFs) allow you to create your own functions and add them to Teradata Database to extend its built-in capabilities. You can create new functions in C/C++, Java, or in SQL itself and use them in SQL queries just like Teradata Database built-in functions.

There are three broad categories of UDFs that you can create:

  • Scalar functions accept zero or more input arguments and return a single value result. CURRENT_DATE and TRIM are examples of built-in scalar SQL functions.
  • Aggregate functions accept sets of input argument values (such as all or some of the values in a particular table column) and return a single value result of processing the input values. AVG, MIN, and MAX are examples of built-in aggregate functions.
  • Table functions are invoked in the FROM clause of a SELECT statement, and return a derived table to the calling statement, one row at a time. They can be used for special-purpose processing, such as shredding an XML document into SQL rows in a relational database table.
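
For illustration, a table function is invoked through the TABLE keyword in the FROM clause. A minimal sketch, where shred_orders is a hypothetical table UDF and t is the alias for the derived table it returns:

SELECT *
FROM TABLE (shred_orders('orders.xml')) AS t;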

You can use UDFs for external I/O, complex business calculations, statistical calculations that use aggregate functions, and almost any type of data conversion. UDFs can help you enforce business rules and aid in data transformations.


Benefits

UDFs:

  • Can be written in popular programming languages C/C++, Java, or in SQL itself.
  • Take full advantage of the Teradata Database parallel environment. When the UDF is referenced in a SQL request, a copy of the UDF executes on every AMP vproc.
  • Become part of the Teradata SQL language and are accessible to anyone who has the appropriate user access rights.
  • Can be used nearly any place within SQL expressions that Teradata built-in functions can be used.
  • Can be obtained from third-party vendors as packaged collections for special purposes.

Considerations

  • UDFs should be thoroughly tested and debugged prior to adding them to Teradata Database.
  • If the function requires access to specific operating system resources that ordinary OS users would not be able to access, you must use CREATE/REPLACE AUTHORIZATION to create a context that allows the UDF to perform I/O by running as a separate process under the authorization of that OS user.
  • If the UDF implements data transformations, ordering, or cast functionality for a user-defined data type (UDT), you must use appropriate DDL to register the UDF in TD (CREATE or REPLACE statements for TRANSFORM, ORDERING, or CAST).

Performance Implications of Protected and Unprotected Mode

By default, all UDFs are created to run in protected mode. In protected mode, each instance of the UDF runs in a separate process. This protects Teradata Database from programming errors that could cause the database to crash and restart, however, protected mode places additional processing burdens on the system that often result in performance degradation. Protected mode should be used only for UDF development and testing. The best practice is to develop and test your UDFs on a non-production test system.

In unprotected mode, a UDF is called directly by Teradata Database rather than running as a separate process. UDFs run much faster and place less of a performance burden on Teradata Database, but this mode should be used only for UDFs that have undergone rigorous testing. You must explicitly use ALTER FUNCTION to change a UDF from protected to unprotected mode.
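
For example, after a UDF has been rigorously tested, it can be switched out of protected mode with ALTER FUNCTION (my_udf is a placeholder name); the same statement with EXECUTE PROTECTED switches it back for further debugging:

ALTER FUNCTION my_udf EXECUTE NOT PROTECTED;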

Note: Java UDFs always run in protected mode. Therefore, they are slower than equivalent C or C++ UDFs.

Benefits and Considerations for C/C++ User-Defined Functions

Benefits of C/C++ UDFs

  • You can write complex functions using all the richness of the C/C++ programming language, and have those functions treated as if they were native Teradata Database functions.
  • You can build and distribute C/C++ UDFs without revealing the source code to users or purchasers.

Considerations for C/C++ UDFs

  • If you are providing C/C++ source code for your UDF, there must be a C/C++ compiler on all TD system nodes that have PEs. If your Teradata Database system runs on Linux nodes, the gcc compiler is already included on every node.
  • You should write, fully test, and debug your C/C++ code in an appropriate C/C++ code development environment.
  • The file sqltypes_td.h needs to be included in your C/C++ development environment for definition of the SQL data types and UDF interface APIs. For example, when using an IDE such as Microsoft Visual Studio, you must specify the path to sqltypes_td.h as an additional include directory.
  • The CREATE FUNCTION statement that creates and registers the function in Teradata Database must specify where the source or object code for the function exists. This is accomplished by means of the EXTERNAL clause of the CREATE FUNCTION statement.

About the EXTERNAL Clause of CREATE FUNCTION

The EXTERNAL clause of CREATE FUNCTION declares that the function originates external to Teradata Database, and optionally identifies the location of components required to compile and run the function. In most cases, you will be creating the source code for a single function on a client system and allowing Teradata Database to copy the source code to a Teradata system and compile the source there at the time the UDF is created.

Used without options, EXTERNAL causes Teradata Database to assume the C source code is in the current working directory of your client, and that the name of the .c source code file matches that of the function in the C source file, and matches the name of the UDF defined by the CREATE FUNCTION function_name statement. In this case, Teradata Database also assumes that no special header or include files are required to compile the C source.

You can use the EXTERNAL NAME clause to specify more details, such as the name and location of C source or object file that comprises the function, and other files required to compile or create the UDF, such as C header files and libraries. The EXTERNAL NAME clause takes a string literal value that uses letter codes to identify specific UDF components. The general format of a parameter entry in the string is one or two letters followed immediately by a delimiter character of your choosing, followed by a file name or location. Several parameters can be specified in the string.

The EXTERNAL NAME string parameters that can be used for C and C++ UDFs are described below. The exclamation mark (!) represents the delimiter character used to separate entries within the string.

CS!name_on_server!source_file_name
SS!name_on_server!source_file_name

Identifies the name of the C source code file. Use either the CS or SS prefix. CS indicates the file is located on the client machine. SS indicates the file is located on the Teradata server.

name_on_server specifies the name given to the source code file when it is copied from its original location to the Teradata server UDF compile directory. This can be any name that is unique in the Teradata Database system, and need not match the name of the source file itself. name_on_server should not include the .c file name extension.

source_file_name specifies the name of the original C source code file, including a file path if the file is not located in the current working directory. source_file_name must include the .c file extension.

Example: 'CS!myudfsource!C:\mycode\myudfsource.c'
CI!name_on_server!include_file_name
SI!name_on_server!include_file_name

Identifies the name of a C header file that is included with a #include directive in the C source file. CI indicates the file is located on the client machine. SI indicates the file is located on the Teradata server.

name_on_server specifies the name given to the header file when it is copied from its original location to the Teradata server UDF compile directory. This can be any name that is unique in the Teradata Database system, and need not match the name of the header file itself. name_on_server should not include the .h file name extension.

include_file_name specifies the name of the header file, including a file path if the file is not located in the current working directory. include_file_name must include the .h file extension.

Example: 'CI!myudfinclude!C:\mycode\headers\myinclude.h'
CO!name_on_server!object_file_name
SO!name_on_server!object_file_name

Identifies the name of a compiled C object file that implements the function. Use either the CO or SO prefix. CO indicates the file is located on the client machine. SO indicates the file is located on the Teradata server.

name_on_server specifies the name given to the object file when it is copied from its original location to the Teradata server UDF directory. This can be any name that is unique in the Teradata Database system, and need not match the name of the object file itself.

object_file_name specifies the name of the object file, including a file path if the file is not located in the current working directory. Although the object module can be located on either the client or the server, the object itself must have been compiled either on the Teradata Database server or on a system that generates an object compatible with the server.

Example: 'SO!myudfobject!/usr/jsmith/mycode/bin/myfunction'
SL!library_file_name

Identifies the name of a nonstandard C library that is required to compile the function. Standard libraries do not need to be specified.

library_file_name specifies the name of the nonstandard library file, including a file path if the file is not located in the current working directory.

Example: 'SL!/usr/jsmith/mycode/libs/mylibrary'
SP!package_file_name

Identifies the name of a package that contains the function to be implemented as a UDF.

package_file_name specifies the name of the package file, including a file path if the file is not located in the current working directory.

Example: 'SP!/opt/TDMiner/libTeraMiner.so'
F!function_source_name

Identifies the name of the function to be used for the UDF as that function is defined in the C source code. This parameter is required only if the name of the function defined in the C source file differs from the function_name declared by the CREATE FUNCTION function_name statement.

function_source_name specifies the name of the function defined in the C source file.

Example: 'F!plusudf'

Example

In the following EXTERNAL NAME clause string, the CS specification indicates that the C source file for the function my_fun is located in the tests directory on the client system. The CI specification indicates that the function requires the inclusion of the header file my_header.h, also located on the client machine in the tests directory. The delimiter character used in this example is the exclamation mark (!).

EXTERNAL NAME 'CS!my_fun!c:\tests\my_fun.c!CI!my_header!c:\tests\my_header.h'

Scenario: Creating a C/C++ User-Defined Function

This scenario creates one simple scalar UDF and one more complex aggregate UDF, both coded in C.

The scenario demonstrates the following tasks that are involved in creating and using C and C++ UDFs:

  • Write and debug the C code for your UDF using the C development environment of your choice.
  • Use the CREATE FUNCTION statement to install and register a new UDF on Teradata Database. This statement identifies the function name, declares the function parameters and return value type, specifies the programming language used to create the function, and identifies the location of the UDF source or object code and other ancillary files required to create the UDF.
    Note: To use the CREATE FUNCTION statement, you must have been granted the CREATE FUNCTION privilege (or the FUNCTION privilege, which grants both the CREATE FUNCTION and DROP FUNCTION privileges). A sketch of this GRANT follows the list.
  • Use the UDF in queries just as you would use any built-in TD SQL function.
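
A sketch of the GRANT mentioned in the note above, using placeholder database and user names:

GRANT CREATE FUNCTION ON dev_db TO udf_developer;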

For More Information About C/C++ UDFs

For more information about C and C++ UDFs, and about the SQL used in these examples, see:

  • SQL Data Definition Language - Detailed Topics, B035-1184: Provides detailed information on the CREATE FUNCTION statement.
  • SQL Data Definition Language - Syntax and Examples, B035-1144: Describes SQL syntax for CREATE FUNCTION.
  • SQL External Routine Programming, B035-1147: Describes creating C and C++ UDFs.
  • Orange Book: Teradata Database User Defined Function User's Guide, Document #: 541-0004361: Detailed discussion and application examples of using UDFs.

Example: Create a Simple Scalar C UDF

The following C code creates a simple scalar function to add two integers passed to the function and return the sum.

void plusudf(int *a, int *b, int *result, char sqlstate[6])
{
    *result = *a + *b;
}

The final parameter, char sqlstate[6], allows the UDF to return an SQLSTATE error code to Teradata.

Assume the C source file is saved on the client machine as C:\myC\UDF\plusudf.c.

The following SQL creates a UDF named plusudf based on the simple C code above.

CREATE FUNCTION plusudf(
    a   INTEGER,
    b   INTEGER
) RETURNS INTEGER
LANGUAGE C
NO SQL
EXTERNAL NAME 'CS!plusudf!C:\myC\UDF\plusudf.c'
PARAMETER STYLE TD_GENERAL;

After the UDF has been created, it can be used like any Teradata Database function. In this case, it can even be run by itself from a tool like BTEQ:

SELECT plusudf(5,10);

plusudf(5,10)
-------------
           15
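
The UDF can also be applied to table columns like any other scalar function. The following query is a sketch only; the order_detail table and its columns are hypothetical:

SELECT order_id,
       plusudf(qty_ordered, qty_backordered) AS total_qty
FROM order_detail;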

Example: Create an Aggregate C UDF

The following C code creates an aggregate function that calculates the standard deviation of a column of values.

#define SQL_TEXT Latin_Text
#include <sqltypes_td.h>
#include <string.h>
#include <math.h>
typedef struct agr_storage {
        FLOAT count;
        FLOAT x_sq;
        FLOAT x_sum;
} AGR_Storage;

void STD_DEV( FNC_Phase phase,
              FNC_Context_t *fctx,
              FLOAT *x,
              FLOAT *result,
              int *x_i,
              int *result_i,
              char sqlstate[6],
              SQL_TEXT fncname[129],
              SQL_TEXT sfncname[129],
              SQL_TEXT error_message[257] )
{
     /* pointers to intermediate storage areas */
     AGR_Storage *s1 = fctx->interim1;
     AGR_Storage *s2 = fctx->interim2;

     /* The standard deviation function described here is: */
     /*                                                    */
     /* s = sqrt(sum(x^2)/N - (sum(x)/N)^2)                */
     /*                                                    */
     /* sum(x^2) :> x_sq, N :> count, sum(x) :> x_sum      */

     /* switch to determine the aggregation phase          */
     switch (phase)
     {

     /* This case selection is called once per group and */
     /* allocates and initializes all intermediate       */
     /* aggregate storage                                */
     
     case AGR_INIT:
     /* Get some storage for intermediate aggregate values. */
     /* FNC_DefMem returns zero if it cannot get the requested */
     /* memory. The amount of storage required is the size of */
     /* the structure used for intermediate aggregate values */

     s1 = FNC_DefMem(sizeof(AGR_Storage));
     if (s1 == NULL)
     {
        /* could not get storage */
        strcpy(sqlstate, "U0001"); /* see SQLSTATE table */
        return;
     }

     /* Initialize the intermediate aggregate values */
     s1->count = 0;
     s1->x_sq = 0;
     s1->x_sum = 0;

     /**************************************************  */
     /* Fall through to detail phase, because the         */
     /* AGR_INIT call passes in the first set of          */
     /* values for the group                              */
     /**************************************************  */
     /* This case selection is called once for each       */
     /* selected row to aggregate. x is the column the    */
     /* std_dev is being calculated for.                  */
     /* One copy will be running on each AMP              */

     case AGR_DETAIL:
     if (*x_i != -1)
     {
     s1->count++;
     s1->x_sq += *x * *x;
     s1->x_sum += *x;
     }
     break;

     /* This case selection combines the results of ALL */
     /* individual AMPs for each group                  */

     case AGR_COMBINE:
     s1->count += s2->count;
     s1->x_sq += s2->x_sq;
     s1->x_sum += s2->x_sum;
     break;

     /* This case selection returns the final standard */
     /* deviation. It is called once for each group.   */

     case AGR_FINAL:
     {
     FLOAT term2 = s1->x_sum/s1->count;
     FLOAT variance = s1->x_sq/s1->count - term2*term2;
     /* Adjust for deviations close to zero */
     if (fabs(variance) < 1.0e-14)
        variance = 0.0;
     *result = sqrt(variance);
     break;
     }

     case AGR_NODATA:
          /* return null if no data */
          *result_i = -1;
          break;

     default:
             /* If it gets here there must be an error because this */
             /* function does not accept any other phase options */
             strcpy(sqlstate, "U0005");
     }
     return;
}

Assume the C source file is saved on the client machine as C:\myC\UDF\std_dev.c.

The following SQL creates a UDF named STD_DEV based on the C code above.

REPLACE FUNCTION SYSLIB.STD_DEV(x FLOAT)
RETURNS FLOAT
CLASS AGGREGATE
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
EXTERNAL NAME 'CS!STD_DEV!C:\myC\UDF\std_dev.c';

After the UDF has been created, it can be used like any Teradata Database function. The BTEQ session shown below creates a table, loads it with some data, and then runs the STD_DEV UDF to calculate the standard deviation of the data in one of the table columns.

.logon testsys/sysdba,sysdba

DATABASE activity_db;

CREATE TABLE PRODUCT_LIFE 
       (Product_Id SMALLINT, 
        Product_Class VARCHAR (25),     
        Hours INTEGER)
;

INSERT INTO PRODUCT_LIFE  VALUES (1,'Bulb',2)
;INSERT INTO PRODUCT_LIFE  VALUES (2,'Bulb',4)
;INSERT INTO PRODUCT_LIFE  VALUES (3,'Bulb',4)
;INSERT INTO PRODUCT_LIFE  VALUES (4,'Bulb',4)
;INSERT INTO PRODUCT_LIFE  VALUES (5,'Bulb',5)
;INSERT INTO PRODUCT_LIFE  VALUES (6,'Bulb',5)
;INSERT INTO PRODUCT_LIFE  VALUES (7,'Bulb',7)
;INSERT INTO PRODUCT_LIFE  VALUES (8,'Bulb',9);


SELECT AVG(Hours),
       STD_DEV(Hours) 
FROM PRODUCT_LIFE 
WHERE PRODUCT_CLASS = 'Bulb'
;


*** Total elapsed time was 1 second.

Average(Hours)          STD_DEV(Hours)
--------------  ----------------------
             5   2.00000000000000E 000

.quit
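
As a quick sanity check, the UDF result can be compared with the built-in STDDEV_POP aggregate, which also computes the population standard deviation; for this data the two values should match. This is a sketch only, run against the same PRODUCT_LIFE table:

SELECT STD_DEV(Hours)    AS udf_std_dev,
       STDDEV_POP(Hours) AS builtin_std_dev
FROM PRODUCT_LIFE
WHERE Product_Class = 'Bulb';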

AMP Worker Tasks

This session will give you a close-up picture of what AMP worker tasks are and how they support user work.

You'll learn what happens when you run out of AMP worker tasks, and how to manage AMP worker task exhaustion and flow control. We'll also share the latest recommendations for monitoring this important resource, and wrap up with some insights on when you might want to increase the number of AMP worker tasks per AMP.


Presenter: Carrie Ballinger, Sr. Technical Advisor - Teradata

Audience: Data Warehouse Administrator, Data Warehouse Application Specialist, Data Warehouse Technical Specialist

Training details
Course Number: 49781
Training Format: Recorded webcast
Price: $195
Credit Hours: 2

Teradata Database 14.10 SQL Solutions for Real-World Temporal Problems

Data warehouses are known for storing data over time, both historical point-in-time transactions and temporal data that is valid over a range of time...

Storing this data is one thing, but retrieving it to answer critical business questions can be a Structured Query Language (SQL) challenge. For example, an insurance company may want to determine exactly how many active policies a customer has had over periods of time, and whether there were periods in which the customer had no active policies. These types of temporal problems can be tricky to solve, but this session focuses on Teradata 14.10 SQL features that both simplify the SQL coding and perform well. Key features to be used include 1) the NORMALIZE clause and 2) sequenced aggregation. The business problems in this session span industries, so you are bound to see problems that apply to your organization and learn how to solve them.
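
As a taste of the first of those features, the following query is a minimal sketch of the NORMALIZE clause; the policy table, its customer_id column, and its PERIOD(DATE) column policy_validity are hypothetical. It coalesces, per customer, rows whose validity periods meet or overlap, which makes gaps in coverage easier to spot:

SELECT NORMALIZE ON MEETS OR OVERLAPS
       customer_id, policy_validity   -- one PERIOD column is required in the select list
FROM policy;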

Presenter: Darrin Gaines, Senior Consultant – Teradata Corporation

Audience: Data Analyst, Application Developer, Architect/Designer

Training details
Course Number: 51028
Training Format: Recorded webcast
Price: $195
Credit Hours: 2

Migrating Your Oracle Database to a Teradata Data Warehouse

Your company is new to Teradata and wants to move Oracle managed user data into the new data warehouse. As a DBA, your assignment is to move this data into the Teradata system.

On the surface, this sounds like a straightforward effort, but if you've been around databases long enough, you know that nothing is as straightforward as you'd like. Decisions have to be made about the purpose of the migration: is it simply to re-host the data "as is" on the Teradata system, or is it to provide new functionality to the user as well as re-host the data? Usually, migrations fall somewhere in between a "simple" re-hosting and adding new functionality.

Along with the effort to determine the business goals of the migration, there are technical issues involved in designing and carrying out a migration. In order to be successful, all aspects need to be understood.

"Migrating Your Oracle Database to a Teradata Data Warehouse" takes a DBA-centric view of the migration effort. We will examine those factors that affect the migration of Oracle database structures and user data to a Teradata environment. The type of migration to be performed as well as the technical aspects of the migration of database objects will be discussed.

Key Points:

  • One must understand the business goals of a data migration in order to be successful.
  • A synthesis of the migration goals, the database inventory, and the data sources is necessary in order to perform the migration.
  • Oracle databases can be migrated to Teradata.

Presenter: Dawn McCormick - Teradata Corporation

Training details
Course Number: 23279
Training Format: Recorded webcast
Price: $195
Credit Hours: 1

Determining Precedence Among TASM Planned Environments

Most Teradata systems support different processing windows, each of which has a somewhat different set of business priorities. From 8 AM to 12 noon the most important work may be the Dashboard application queries. But from 12 noon to 6 PM it could be the ad hoc business users. At night maybe it's the batch loads and reporting. Planned Environments function within the TASM state matrix to support the ability to automatically manage changes to workload management setup during those different windows of time.

The state matrix represents more than just processing windows, however.  It intersects your Planned Environments (time-based processing windows) with any system Health Conditions you may have defined (such as high AMP worker task usage levels, or hardware disabilities).  TASM moves you from one Planned Environment to another or one Health Condition to another based on events that you have defined being triggered (such as time of day).

At the intersection of the two dimensions are the TASM states, which contain the actual workload management setup definitions.  A state may contain different values for some of the TASM settings, such as throttle limits or workload priorities. So when you change state, you are changing some of the rules that TASM uses when it manages your workloads.

Example of a State Matrix

The figure below illustrates a simple 4 x 3 state matrix.  The Planned Environments go along the top, and the Health Conditions go along the left side.   The same state can be, and often is, used at multiple intersection points.

How TASM Decides Which State to Switch to  

At each event interval, all system events are checked to see if a Health Condition needs to be changed.   After that is done, all defined Planned Environments are checked to see if a different Planned Environment should now be in effect.   It is possible for both the Planned Environment and the Health Condition to change on a given event interval.   Once the correct ones are established, their intersection points to the state that they are associated with. 

If more than one Planned Environment meets the current conditions (for example, one planned environment is for Monday and the other is for the end of the month and the end of the month falls on a Monday), then the Planned Environment with the highest precedence number wins. This will usually be the one in the rightmost position in the state matrix. 

When the Planned Environments are evaluated to see which one should be active, internally the search is performed from the right-most state (the one with the highest precedence) and moves leftwards, stopping at the first Planned Environment that fits the criteria.  For Health Conditions, the search starts at the bottom, with the one with the highest severity, stopping at the first Health Condition that qualifies.   That way, if more than one of those entities is a candidate for being active, the one that is deemed most important or most critical will be selected first. 

Changing Precedence

Viewpoint Workload Designer allows you to set the precedence order you wish by dragging and dropping individual planned environments. In the state matrix shown above, assume that the Planned Environments are in the order in which they were created, with Monday being the right-most because it was added most recently. But if your business dictated that MonthEnd processing was more important and should be in effect during the times when Monday ended up being at the end of the month (a two-way tie), then you would want to drag EndOfMonth over to the right-most position. Your state matrix would then look like this:

Note that when you drag a Planned Environment to a different position, the cells below it that represent the states associated with that Planned Environment are moved as well.

In addition to looking at your Workload Designer screens, you can also see which Planned Environment has the highest precedence amongst two that might overlap by looking at tdwmdmp -a output.  

Under the heading "State and Event Information" tdwmdmp output will show you each Operating Environment (AKA Planned Environment) that is available, with their precedence order.  If more than one is eligible, the one with the higher precedence wins.  In the output below, the EndOfMonth Planned Environment will win if another Planned Environment is triggered at the same time.

Tdwmdmp -a also tells you the current Planned Environment.

State and Event Information: 

Operational Environments:

 PRECE  OPENV     OPERATING ENVIRONMENT 
 DENCE    ID             NAME               ENAB  CURR  DFLT
 -----  ------  --------------------------  ----  ----  ----
     4      77  EndOfMonth                   Yes            
     3      74  Monday                       Yes   Yes      
     2      75  Weekday                      Yes            
     1      76  Always                       Yes         Yes

DBQL Scripts

The DBQL setup and maintenance documents provide scripts, as well as information on how to perform DBQL logging, maintenance, and upgrades, now including Teradata 14.0.

Teradata Star-Schema Designs

This presentation explains how to physically implement a classic Star design, with a central fact table surrounded by several de-normalized dimension tables, on Teradata.

Key Points:

  • Choosing the primary index columns for the fact table
  • Exploiting AJIs
  • Columnar Tables
  • What to do when no primary index choices look optimal
  • Slowly Changing Dimensions

We assume that you already have some Teradata physical design knowledge before taking this session.

Presenter: Steve Molini - Teradata Corporation

Audience: Those with a role of Database Administrator (DBA), Data Modeler, ETL Developer, Application Developer, Programmer, Performance Analyst, or Data Warehouse Architect.

Training details
Course Number: 26536
Training Format: Recorded webcast
Price: $195
Credit Hours: 2

How Teradata Works – the Extended Version

For anyone new to Teradata, this session is to acquaint you with the Teradata technology.

We will explore Teradata’s parallel environment and take a look inside a Node at the Parsing Engines (PEs), the BYNET, and the Access Module Processors (AMPs). We'll also discuss how the File System and Disk Sub-system work and what happens when you want to add Nodes. Also included is an intro to Teradata’s Indexes, Columnar, compression, and some of the amazing optimizations that are automatically performed by Teradata when you submit a query.

Note: This is an expanded version of “Inside a Teradata Node” and many of the concepts are repeated for continuity and covered in more depth.

Presenter: Alison Torres, Director Teradata Warehouse Consulting - Teradata Corporation

Training details
Course Number: 51198
Training Format: Recorded webcast
Price: $195
Credit Hours: 2

Teradata Express for VMware Player

Download Teradata Express for VMware, a free, fully-functional Teradata database, that can be up and running on your system in minutes. For TDE 14.0, please read the launch announcement, and the user guide. For previous versions, read the general introduction to the new Teradata Express family, or learn how to install and configure Teradata Express for VMware.

There are multiple versions of Teradata Express for VMware available, for Teradata 13.0, 13.10, and 14.0. More information on each package is available on our main Teradata Express page.

Note that in order to run this VM, you'll need to install VMware Player or VMware Server on your system. Also, please note that your system must have 64-bit support. For more details, see how to install and configure Teradata Express for VMware.

For feedback, discussion, and community support, please visit the Cloud Computing forum.

Download sizes, space requirements and MD5 checksums

Package                                                            Version      Initial Disk Space  Download size (bytes)  MD5 checksum
Teradata Express 14.0 for VMware (4 GB)                            14.00.00.01                      3,105,165,305          F8EFE3BBE29F3A3504B19709F791E17A
Teradata Express 14.0 for VMware (40 GB)                           14.00.00.01                      3,236,758,640          B6C81AA693F8C3FB85CC6781A7487731
Teradata Express 14.0 for VMware (1 TB)                            14.00.00.01                      3,484,921,082          2D335814C61457E0A27763F187842612
Teradata Express 13.10 for VMware (1 TB)                           13.10.00.10  15 GB               3,002,848,127          04e6cb9742f00fe8df34b56733ade533
Teradata Express 13.10 for VMware (40 GB)                          13.10.00.10  10 GB               2,943,708,647          ab1409d8511b55448af4271271cc9c46
Teradata Express 13.0 for VMware (1 TB)                            13.00.00.19  64 GB               3,072,446,375          91665dd69f43cf479558a606accbc4fb
Teradata Express 13.0 for VMware (40 GB)                           13.00.00.19  10 GB               2,018,812,070          5cee084224343a01de4ae3879ada9237
Teradata Express 13.0 for VMware (40 GB, Japanese)                 13.00.00.19  10 GB               2,051,920,372          8e024743aeed8e5de2ade0c0cd16fda9
Teradata Express 13.0 for VMware (4 GB)                            13.00.00.19  10 GB               2,002,207,401          5a4d8754685e80738e90730a0134db9c
Teradata Tools and Utilities 13.10 Windows Client Install Package  13.10.00.00                      409,823,322            8e2d5b7aaf5ecc43275e9679ad9598b1

 

Package version: 13.10.00.05

Teradata’s New Priority Scheduler for SLES 11

Join this in-depth presentation that covers the basics of the New Teradata Priority Scheduler available with Linux SLES 11 operating system. 

We’ll discuss operating system background that will help you step through how the feature works, and usability pointers to help you manage your database priorities when you migrate to SLES 11. You’ll get familiar with the architecture of this new feature, and with the new priority groupings and their characteristics. You will learn how resources are divided up by the new Priority Scheduler, and examine the special benefits it will make available to your tactical workloads.

Presenter: Carrie Ballinger - Teradata Corporation

Audience: Data Warehouse Administrator, Data Warehouse Architect/Designer

Training details
Course Number: 51215
Training Format: Recorded webcast
Price: $195
Credit Hours: 2

Teradata 101 The Strategy and Technology Basics

A database is a database, so if I know one, then I know them all, right?  Not really...

As with all technology, Teradata brings with it some fundamental differences, and it is better to understand them up front. This session is geared toward those new to the Teradata environment; it covers the philosophy of why things are different and then goes into the basics of how Teradata is different.

The majority of the information is geared to those in the database administration or development arenas.  Simply put, what is covered is the “shared nothing, evenly distributed, self-managed” aspects that are the foundations of the Teradata database.

Presenter: Rob Armstrong - Teradata Corporation

Training details
Course Number: 51377
Training Format: Recorded webcast
Price: $195
Credit Hours: 1

Controlling the Mix: What's New in Teradata Active Syst Mgmt for Teradata 14.0 & 14.10

Workload Management is a critical component of balancing application demands and ensuring consistent application performance.

This session covers Teradata Active System Management (TASM) features introduced in Teradata 14.0 and Teradata 14.10. TASM continues to evolve to greater levels of automation and usability to regulate different classes of work and overall system health. Come hear what new TASM features are available and how you can best leverage these new features to control your workload mix.

Presenters:
Ruth Fenwick - Teradata Corporation
Brian Mitchell - Teradata Corporation

Training details
Course Number: 51192
Training Format: Recorded webcast
Price: $195
Credit Hours: 1

How Resources are Shared in the SLES 11 Priority Scheduler

The SLES 11 priority scheduler implements priorities and assigns resources to workloads based on a tree structure.   The priority administrator defines workloads in Viewpoint Workload Designer and places the workloads on one of several different available levels in this hierarchy. On some levels the admin assigns an allocation percent to the workloads, on other levels not.

How does the administrator influence who gets what?  How do the tier level and the presence of other workloads on the same tier impact what resources are actually allocated?  What happens when some workloads are idle and others are not?

This posting gives you a simple explanation of how resources are shared in SLES 11 priority scheduler and what happens when one or more workloads are unable to consume what they have been allocated.

The Flow of Resources within the Hierarchy

Conceptually, resources flow from the top of the priority hierarchy through to the bottom.  Workloads near the top of the hierarchy will be offered all the resources they are entitled to receive first.  What they cannot use, or what they are not entitled to, will flow to the next level in the tree.  Workloads at the bottom of the hierarchy will receive resources that either cannot be used by workloads above them, or resources that workloads above them are not entitled to.

What does “resources a workload is entitled to” mean? 

Tactical is the highest level where workloads can be placed in the priority hierarchy.  A workload in tactical is entitled to a lot of resources, practically all of the resources on the node if it is able to consume that much.  However, tactical workloads are intended to support very short, very highly-tuned requests, such as single-AMP queries, or few-AMP queries.  Tactical is automatically given a very large allocation of resources to boost its priority, so that work running there can enjoy a high level of consistency.  Tactical work is expected to use only a small fraction of what it is entitled to.

If recommended design approaches have been followed, the majority of the resources that flow into the tactical level will flow down to the level below.  If you are on an Active EDW platform, the next level down will be SLG Tier 1.   If you are on an Appliance platform, it will be Timeshare.

The flow of resources to Service Level Goal (SLG) Tiers

SLG Tiers are intended for workloads where there is a service level goal, whose requests have an expected elapsed time and where their elapsed time is critical to the business.  Up to five SLG Tiers may be defined, although one, or maybe two, are likely to be adequate for most sites.  Multiple workloads may be placed on each SLG Tier.  The figure below shows an example of what SLG Tier 1 might look like.

In looking back at the priority hierarchy figure, shown first, note that the tactical tier and each SLG Tier include a workload labeled “Remaining”.  That workload is created internally by priority scheduler.  It doesn’t have any tasks or use any resources.  Its purpose is to connect to and act as a parent to the children in the tier below.  The Remaining workload passes unused or unallocated resources from one tier to another.

The administrator assigns an allocation percent to each user-defined workload on an SLG Tier.  This allocation represents a percent of resources the workload is entitled to from among the resources that flow into the tier.  If 80% of the node resources flow into SLG Tier 1, the Dashboard workload (which has been assigned an allocation of 15%) is entitled to 12% of the node resources (80% of 15% = 12%).

The Remaining workload on an SLG tier is automatically assigned an allocation that is derived by summing all the user-defined workload allocations on that tier and subtracting that sum from 100%.  Remaining in the figure above gets an allocation of 70% because 100% - (15% + 10% + 5%) = 70%.  Remaining’s allocation of 70% represents the percent of the resources that flow into SLG Tier 1 that the tiers below are entitled to.  You will be forced by Workload Designer to always leave some small percent to Remaining on an SLG Tier so work below will  never be in danger of starving.

Sharing unused resources within the SLG Tier

An assigned allocation percent could end up providing a larger level of node resources than a workload ever needs.   Dashboard may only ever consume 10% of node resources at peak processing times.   Or there may be times of day when Dashboard is not active.  In either of those cases, unused resources that were allocated to one workload will be shared by the other user-defined workloads on that tier, based on their percentages.  This is illustrated in the figure below.  

Note that what the Remaining workload is entitled to remains the same.   The result of Dashboard being idle is that WebApp1 and WebApp2 receive higher run-time allocations.  Only if the two of them are not able to use that spare resource will it go to Remaining and flow down to the tiers below.  

Unused resources on a tier are offered to sibling workloads (workloads on the same tier) first.  What is offered to each is based on the ratio of their individual workload allocations.  WebApp1 gets offered twice as much unused resource originally intended for Dashboard as WebApp2, because WebApp1 has twice as large a defined allocation.

Priority scheduler uses the same approach to sharing unused resources if the tiers below cannot use what flows to them.  The backflow that comes to an SLG tier from the tier below will be offered to all active workloads on the tier, proportional to their allocations. However, this situation would only occur if Timeshare workloads were not able to consume the resources that flowed down to them.  All resources flow down to the base of the hierarchy first.  Only if they cannot be used by the workloads at the base will they be available to other workloads to consume.   Just as in SLES 10 priority scheduler, no resource is wasted as long as someone is able to use it.

Sharing resources within Timeshare

Timeshare is a single level in the hierarchy that is expected to support the majority of the work running on a Teradata platform.  The administrator selects one of four access levels when a workload is assigned to Timeshare:  Top, High, Medium, and Low.  The access level determines the level of resources that will be assigned to work running in that access level's workloads.  Each access level comes with an access rate that determines the actual contrast in priority among work running in Timeshare.  Top has an access rate of 8, High 4, Medium 2, and Low 1.  Access rates cannot be altered.

Priority Scheduler tells the operating system to allocate resources to the different Timeshare requests based on the access rates of the workloads they have classified to.  This happens in such a way that any Top query will always receive eight times the resources of any Low query, four times the resources of any Medium query, and two times the resources of any High query. 

This contrast in resource allocation is maintained among queries within Timeshare no matter how many are running in each access level.  If there are four queries running in Top, each will get 8 times the resource of a single query in Low. If there are 20 queries in Top, each will get 8 times the resource of a single query in Low.  In this way, high concurrency in one access level will not dilute the priority differences among queries active in different access levels at the same time.

Conclusions

When using the SLES 11 priority scheduler, the administrator can influence the level of resources assigned to various workloads by several means.  The tier (or level) in the priority hierarchy where a workload is placed will identify its general priority.  If a workload is placed on an SLG Tier, the highest SLG Tier will offer a more predictable level of resources than the lowest SLG Tier.

The allocation percent given to SLG Tier workloads will determine the minimum percent those workloads will be offered.   How many other workloads are defined on the same SLG Tier and their patterns of activity and inactivity can tell you whether sibling sharing will enable a workload to receive more than its defined allocation.

Workloads placed in the Timeshare level may end up with the least predictable stream of resources, especially on a platform that supports SLG Tiers that use more at some times and less at others.  This is by design, because Timeshare work is intended to be less critical and not generally associated with service levels.  When there is low activity above Timeshare in the hierarchy, more unused resources will flow into Timeshare workloads.  But if all workloads above Timeshare are consuming 100% of their allocations, Timeshare will get less. 

However, there is always an expected minimum amount of resources you can count on Timeshare receiving.  This can be determined by looking at the allocation percent of the Remaining workload in the tier just above.  That Remaining workload is the parent of all activity that runs in Timeshare, so whatever is allocated to that Remaining will be shared across Timeshare requests.   

You can route more resources to Timeshare, should you need to do that, by ensuring that the SLG Tier Remaining workloads that are in the parent chain above Timeshare in the tree have adequate allocations associated with them.  (To accomplish this you may need to reduce some of the allocation percentages of the user-defined workloads on the various SLG Tiers.)  What Timeshare is entitled to, based on the Remaining workloads above it, is honored by the SLES 11 priority scheduler in the same way as the allocation of any other component higher up in the tree is honored.  But since Timeshare is able to get all the unused resources that no one else can use, it is likely that Timeshare workloads will receive much more than they are entitled to most of the time.

 


Explaining the EXPLAIN – New Functionality in 14.0 & 14.10

This presentation builds on Recorded Webcast # 49911 Explaining the EXPLAIN, which did just that (if you have not viewed, please do so before this class).

Now, having completed that course, you should already have familiarity with the following Teradata EXPLAIN Facility features:   how and where the text is generated; how to read the output; what spool sizes and estimated time (cost) really represent; choices that can be made to affect the optimizer; confidence levels; and joins.

In this session, we will cover new features available in TD 14 and TD 14.10, including what to look for when accessing a columnar table, Incremental Planning and Execution (IPE), and the Partial Redistribution Partial Duplication join strategy for skewed data.

Presenter: Alison Torres, Director, Teradata Labs Customer Briefing Team - Teradata Corporation

 

Training details
Course Number: 51203
Training Format: Recorded webcast
Price: $195
Credit Hours: 2

In-Memory the Intelligent Way with Teradata Intelligent Memory

Teradata Intelligent Memory keeps the most frequently used, or hottest, data in memory for in-memory performance without the need to purchase enough memory for all of your data.

It is automated and integrated into the Teradata Database for ease of use and dynamic adaptation to changes in workload patterns. Learn how Intelligent Memory works to improve system performance and get the most benefit possible from the larger memory capacities available in new platform models.

Note: This was a 2013 Teradata Partners Conference session.

Presenter: Phil Benton - Teradata Corporation

Training details
Course Number: 51283
Training Format: Recorded webcast
Price: $195
Credit Hours: 1