Expediting Express Requests

June 3, 2014, 1:40 pm

If I told you there was a way you might be able to speed up parsing time for your queries, would you be interested?

In Teradata Database 14.10.02 there is a new capability that allows you to expedite express requests, and I’d like to explain how that works, describe when it can help you, and make some suggestions about how you can use it to get the best performance you can from parsing when the system is under stress. But first a little background.

What is an express request?

When data dictionary information is needed by modules in the parsing engine (PE), and it cannot be found in the data dictionary cache on that PE, an express request is issued. That streamlined request goes directly to the AMPs and attempts to find that data dictionary information. Because the database code itself is issuing express requests (rather than an end user), and the database can trust itself to do the right thing, these very short requests are allowed to bypass the resolver, security, parsing and optimization modules and are sent directly to the AMPs.

Most express requests are single-AMP requests. If there is a lock on the row hash of the dictionary row they are trying to access, an express request will be resent to that AMP with an access lock applied. Data read from the data dictionary under an access lock is not cached, because it is the result of a dirty read and the row or rows may be in the process of changing.

Several different modules in the parsing engine can submit express requests. The figure below lists some of the dictionary information that express requests access, and which modules issue the requests. Even a simple query may require 30 or more express requests to be issued, and they will be issued serially. Things like the number of database objects referenced in the SQL, the number of statistics that the optimizer looks for, and the complexity of access rights can influence the number of separate requests for data that will be required.

Assessing parsing time spent on express requests

Starting in Teradata Database 14.0 you can see the wall clock time that was spent in processing express requests. Usually this number is close to zero. This data is in a new column in DBQLogTbl named ParserExpReq. Below is a sample of columns from several rows of DBQLogTbl output showing ParserExpReq, intentionally selected to show variation. The unit reported in ParserExpReq is seconds.

NumSteps	AMPCPUTime	ParserCPUTime	ParserExpReq
6	0.52	0.02	0.01
955	0.32	1.51	27.47
15	0.6	0.04	19.26
12	30.41	0.02	0.01
9	268.85	0.04	0
4	0.07	0.02	?
26	55.02	0.38	1.96

In many cases ParserExpReq will be NULL (displayed as ? in the sample above). You will see a NULL when no express requests were issued because all the data was found in the data dictionary cache. Zero means that the wall clock time for express requests was less than 0.01 seconds. 99.9% of the DBQLogTbl rows from my shared test system showed ParserExpReq values of either zero or NULL. I would expect that to be the same on your platform. But as you can see from the data above, taken from that same system, there were occasional times when ParserExpReq was reporting some number of seconds (close to half a minute in the worst case above), even when the CPU time for parsing was very low.

ParserExpReq reports wall clock time for the execution of all express requests combined on behalf of a query, and will not correlate directly to ParserCPUTime. The usual reason for ParserExpReq to be a higher number of seconds is that one or more of the express requests were blocked once they got to the AMP. This could happen if the AMP has exhausted AMP worker tasks.
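
As a rough illustration of how the sample rows above might be read, here is a small Python sketch that classifies each ParserExpReq value. The rows are copied from the sample output; the one-second threshold and the function names are assumptions made for this illustration only, not Teradata guidance.

```python
# Rows from the DBQLogTbl sample above:
# (NumSteps, AMPCPUTime, ParserCPUTime, ParserExpReq); None stands for NULL.
SAMPLE_ROWS = [
    (6, 0.52, 0.02, 0.01),
    (955, 0.32, 1.51, 27.47),
    (15, 0.60, 0.04, 19.26),
    (12, 30.41, 0.02, 0.01),
    (9, 268.85, 0.04, 0.0),
    (4, 0.07, 0.02, None),
    (26, 55.02, 0.38, 1.96),
]

# Hypothetical threshold (seconds) above which a row is flagged for a closer look.
DELAY_THRESHOLD_SECONDS = 1.0


def classify_parser_exp_req(value):
    """Describe what a ParserExpReq value suggests about express requests."""
    if value is None:
        return "no express requests issued"
    if value == 0:
        return "express requests took under 0.01 seconds"
    if value >= DELAY_THRESHOLD_SECONDS:
        return "possible express request delay"
    return "express requests completed quickly"


def delayed_rows(rows):
    """Return the rows whose ParserExpReq meets or exceeds the threshold."""
    return [
        row for row in rows
        if row[3] is not None and row[3] >= DELAY_THRESHOLD_SECONDS
    ]


if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        print(row, "->", classify_parser_exp_req(row[3]))
```

With the sample data, three of the seven rows are flagged, and all three have low parsing CPU time, which matches the point that ParserExpReq does not track ParserCPUTime.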

What does expediting a request do?

Expediting a request marks it for special performance advantages. In SLES11 and current SLES10 releases, all queries within a tactical workload are automatically expedited. As of 14.10.02 you have the capability of expediting express requests as well.

Here’s why that might make a difference for you. When a request is expedited it is able to use special reserve pools of AMP worker tasks (AWTs), intended for tactical queries only. If there is a shortage of AWTs on your platform, use of these reserve pools can speed up the elapsed time of a request, as the request no longer has to wait for another request to complete and free up an AWT so that it can begin to execute.

See this blog posting on reserving AMP worker tasks for more information on how expedited requests take advantage of reserve pools:

http://developer.teradata.com/blog/carrie/2010/01/expedite-your-tactical-queries-whether-you-think-they-need-it-or-not

In addition to being given access to special reserve pools of AWTs, expedited requests are given other small internal boosts that are coded into the database. While probably not noticeable in most cases, these slight performance advantages can contribute to completing work more quickly, especially on a platform with a high degree of contention.

The standard way that express requests are assigned to AMP worker tasks

Prior to taking advantage of this enhancement in 14.10.02, express requests were sent to the AMP in a message classified as a Work01 work type message. Message work types are used to indicate the importance of the work that is contained in the message. Work01 is used for spawned work on behalf of a user-initiated query (such as the receiver task during row redistribution). It is a step up from new work (which runs in Work00).

If there is a delay in getting an AWT, Work01 messages queue up in the message queue ahead of Work00 messages (new work sent from the dispatcher), but behind all the other work types. If there are no AWTs available in the unassigned AWT pool at the time the message arrives, and the three reserves for Work01 are in use, the message will wait on the queue. This can increase the time for an express request to complete.

How express requests are assigned to AMP worker tasks with this enhancement

If you take advantage of this enhancement, then messages representing express requests that arrive on the AMPs may be able to use the Work09 work type and reserve pool. Work09 is a more elevated work type and is the work type assigned to spawned work from an expedited request. Use of Work09 for express requests only happens, however, if there are AMP worker tasks reserved for the Work09 reserve pool. If there are no reserves in Work09, then Work01 will continue to be used.

For more information on work types and reserve pools read this posting:

http://developer.teradata.com/blog/carrie/2011/10/reserving-amp-worker-tasks-don-t-let-the-parameters-confuse-you

The important point in all of this is that if you are often, or even occasionally, out of AWTs, then making sure your express requests are not going to be impacted when that condition arises could provide better query performance for parsing activities during stressful times.

Steps you have to take

The default behavior for express requests will remain the same when you upgrade to 14.10.02 or 15.0. In order to expedite express requests you will need to put in a request to the support center or your account team asking them to change an internal DBS Control parameter called: EnableExpediteExp.

The EnableExpediteExp parameter has three possible settings:

0 = Use current behavior, all express requests go to Work01 (the default)

1 = Parser express requests will be expedited and use Work09 for all requests if AWTs have been reserved

2 = Parser express requests will be expedited and use Work09 only if the current workload is expedited by workload management and AWTs have been reserved

If you set EnableExpediteExp = 1, then all express requests for all queries will be expedited, even when the request undergoing parsing is running in a workload that itself is not expedited. If you set EnableExpediteExp = 2, then only express requests issued on behalf of an expedited workload will be expedited.
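
The three settings and the reserve-pool condition can be summarized as a small decision function. This is only a sketch of the rules as described in this post; the function and argument names are hypothetical and do not correspond to any Teradata interface.

```python
WORK01 = "Work01"  # standard work type for express requests
WORK09 = "Work09"  # work type for spawned work from expedited requests


def express_request_work_type(enable_expedite_exp, work09_awts_reserved,
                              workload_is_expedited):
    """Return the work type an express request would use under the rules above.

    enable_expedite_exp   -- the EnableExpediteExp setting (0, 1 or 2)
    work09_awts_reserved  -- True if AWTs are reserved for the Work09 pool
    workload_is_expedited -- True if the query being parsed runs in a
                             workload that workload management expedites
    """
    if enable_expedite_exp not in (0, 1, 2):
        raise ValueError("EnableExpediteExp must be 0, 1 or 2")
    # Without reserves in Work09, Work01 continues to be used.
    if not work09_awts_reserved:
        return WORK01
    # Setting 1: every parser express request is expedited.
    if enable_expedite_exp == 1:
        return WORK09
    # Setting 2: only express requests for expedited workloads are expedited.
    if enable_expedite_exp == 2 and workload_is_expedited:
        return WORK09
    # Setting 0 (the default), or setting 2 with a non-expedited workload.
    return WORK01


if __name__ == "__main__":
    print(express_request_work_type(2, True, False))
```

Note that in this reading neither setting has any effect unless a Work09 reserve exists.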

Setting EnableExpediteExp to 1 or 2 is going to provide a benefit primarily in situations where there is some level of AWT exhaustion and where an AMP worker task reserve pool for tactical work has been set up. You can look at ParserExpReq in DBQL to check if you are experiencing longer parsing times due to express request delays. If you are, have a conversation with the support center about whether this is the right change for you.

↧

How to Design Complex Views

June 26, 2014, 6:05 pm

This presentation teaches you how to design effective semantic view layers on Teradata. After briefly explaining the mechanics of how the Teradata Parser processes views, it dissects a number of semantic view examples that have been successfully implemented in production environments to help you understand what works and which constructs are problematic. The emphasis is on practical issues, ranging from how to design a single view to design patterns appropriate for virtual star schemas.

Key Points

Which Query Rewrites are particularly important for the view designer to understand?
The role of constraints versus statistics in semantic view design.
What syntactic forms should you stay away from when designing a complex view?
Under what conditions does a UNION view work well for a semantic layer?

Because a variety of topics are crammed into a single session, it is strongly encouraged (but not required) that you have already taken the standard instructor-led courses on Teradata Physical Database Design and Physical Database Tuning before attending this session.

Note: This presentation is part of the Data Modeling-Intro to Implementation Series.

Presenter: Stephen Molini, Principal Consultant – Teradata Corporation

Audience:

Those with a role such as Database Administrator (DBA), Data Modeler, ETL Developer, Application Developer, Programmer, Performance Analyst, or Data Warehouse Architect

Training details

Course Number:

46183

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

↧

Teradata Express 15 for VMware Player Available Now

June 30, 2014, 8:24 am

Teradata Express 15.0 is now available for download so that you can evaluate the new features of Teradata. Teradata Database 15.0 is the newest release of Teradata’s industry-leading RDBMS. This release extends Teradata’s leadership in several key areas, and continues to position Teradata Database as the database of choice for Integrated Warehousing, Business Analytics, and participation in the Unified Data Architecture.

The Teradata Express program offers customers a free evaluation version of the Teradata database as a download from Teradata.com. While the Express edition is not licensed for production usage and is not officially supported, this is a fully functional Teradata database that is a great tool for developers and testers working on their Teradata projects, or for others who want hands-on learning with Teradata. You can download Teradata Express and many other Teradata products for free here: http://downloads.teradata.com/download

What's New:

The key categories and features of Teradata 15.0 include:

• Big data & Analytics

• JSON Integration

• SQL-H Enhancements* (not available in 15.00.00 base)

• 3D Geospatial Data Type

• Geospatial Performance Enhancements

• Table Operator Enhancements

• Scripting and Language Support

• Performance Optimization

• Light Weight Redistribution

• Software Efficiency Improvements

• Ecosystem

• TASM - Virtual Partitions Enhancement (SLES11 only)

• TASM/TIWM – New Classification (by Datablock Selectivity)

• TIWM - Operating Periods

• Other Workload Management Enhancements

• Priority Scheduler Enhancements (SLES11 Only)

• Unity Callback Support

• Utility Statistics Logging

• SQL Interface For ShowBlocks

• Industry Compatibility

• ANSI Temporal Syntax Support

• Sequenced Aggregation Enhancements

• Aggregate JI for Temporal

• Teradata Directory Services Enhancements

• 1MB Phase 2

• Quality & supportability

• Faster time to support

• HSN Health

• Onsite System & Dump Analysis

• Intelligent Parser Retries

• Unresponsive Node Isolation

• New PI On Access Rights Table

• Roles Access Rights Improvements

• Space COD w/o Restart

• Debugging & Diagnosis

• DBQL – Show Parameters

More Information

We have additional articles on Developer Exchange for the new Teradata Express family:

Please note that while the Teradata Express family of products is not officially supported, you can talk to other users and get help in the Cloud Computing forum.

Channel:

database

↧

Teradata Database Enhancement Requests

July 3, 2014, 12:04 pm

Do you have a Teradata Database enhancement request? You can submit your Teradata Database enhancement request at: http://www.teradata-partners.com/partners-user-group/enhancement-requests

The submitted Enhancement Requests are reviewed monthly by the Product Advisory Council (PAC), which is supported by the Partners User Group.

The PAC is an advisory council of 15 Teradata Customer members and 6 Teradata members from the Engineering, Product Management, Customer Services, and Marketing organizations. The PAC Customer members work closely with the Teradata members to review and discuss the Enhancement Requests (ERs) and then determine the validity of each ER submitted.

The Enhancement Request Process:

An ER is submitted and received by the PAC
The ER will automatically be assigned a temporary ID number and a confirmation email will be sent to the submitter
On a monthly basis, the PAC reviews all submitted ERs
Each ER will be dispositioned as one of the following:
- Accepted
- Duplicate
- Rejected

If the ER is Accepted, it will be assigned an official ER number. The submitter will need to log into the ER website to check the disposition of the ER.

The PAC Customers also help to prioritize the enhancements to the Teradata Database. At least once a year, the PAC Customers create a PAC Top 10 ER list for Teradata. Teradata will assess and discuss the status of the accepted ERs and will advise the PAC on the progress of the Top ERs.

For more information on the PAC and the Partners User Group:

Teradata Product Advisory Council (PAC): http://www.teradata-partners.com/partners-user-group/product-advisory-council

Partners User Group: http://www.teradata-partners.com/partners-user-group/user-group-overview

Channel:

database

↧

Teradata Terminology Deciphered

July 8, 2014, 10:23 am

How did Teradata get its name? Does mistletoe grow in parse trees? If there are “virtual AMPs” are there “virtual Watts” as well?

Since there are pdisks and vdisks, do we configure a, b, and cdisks? Do Parsing Engines need spark plugs? What kind of string is used to bind a request parcel? What kind of fish can you catch with a BYNET? Does the Statistics Wizard do Data Mining? What kind of taxes does the Syntaxer collect? Does Locking Logger come in bottles and draft?

These questions and more will be answered in this informative, rudimentary introduction to Teradata Terminology.

Key points:

Hardware Platform Evolution
Communications Layers
Data Storage Terminology
Data Access Terminology
Data Protection

Presenter: Mark Jauss, Education Consultant - Teradata Corporation

Audience:

IT Professionals

Training details

Course Number:

21408

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

↧

Teradata Database 15 Overview

July 18, 2014, 4:21 pm

Come and hear all about the newest features of Teradata Database 15.00!

Learn about the exciting Teradata integration of semi-structured JSON data, SQL Ferret Visualization, Parameter Query Logging, Lightweight Redistribution, 3D-Geospatial capability, Temporal Aggregate Join Indexes, the latest TASM enhancements and of course, the newest SQL-H capabilities. Teradata Database 15.00 represents the next major step in the evolution of Data Warehousing excellence.

Teradata Release Information: 15

Presenter: Rich Charucki - Teradata Corporation

Audience:

Database Administrator, Application Developer

Training details

Course Number:

51524

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

Tags:

td 15

teradata database 15

teradata database release 15

Channel:

database

↧

Statistics Collection Recommendations – Teradata Database 14.10

September 11, 2014, 10:37 am

Statistical information is vital for the optimizer when it builds query plans. But collecting statistics can involve time and resources. By understanding and combining several different statistics gathering techniques, users of Teradata can find the correct balance between good query plans and the time required to ensure adequate statistical information is always available.

This recently updated compilation of statistics collection recommendations is intended for sites that are on any of the Teradata Database 14.10 software release levels. Some of these recommendations apply to releases earlier than Teradata Database 14.10 and some rely on new features available only in Teradata Database 14.10.

For greater detail on collecting statistics for Teradata Database 14.10, see the orange book titled: Teradata Database 14.10 Statistics Enhancements by Rama Krishna Korlapati.

Contributors: Carrie Ballinger, Rama Krishna Korlapati, Paul Sinclair, September 11, 2014

Collect Full Statistics

Non-indexed columns used in predicates
All NUSIs
USIs/UPIs if used in non-equality predicates (range constraints)
Most NUPIs (see below for a fuller discussion of NUPI statistic collection)
Full statistics always need to be collected on relevant columns and indexes on small tables (less than 100 rows per AMP)
PARTITION for all partitioned tables undergoing upward growth
Partitioning columns of a row-partitioned table

Can Rely on Dynamic AMP Sampling

USIs or UPIs if only used with equality predicates
NUSIs with an even distribution of values
NUPIs that display even distribution, and if used for joining, conform to assumed uniqueness (see Point #2 under “Other Considerations” below)
See “Other Considerations” for additional points related to dynamic AMP sampling

Collect Multicolumn Statistics

Groups of columns that often appear together with equality predicates. These statistics are used for single-table estimates.
Groups of columns used for joins or aggregations, where there is either a dependency or some degree of correlation among them. With no multicolumn statistics collected, the optimizer assumes complete independence among the column values. The more strongly the actual value combinations are correlated, the greater the value of collecting multicolumn statistics in this situation.
Specify a name for such statistics, for ease of recollection, viewing, and/or dropping.
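
To see why an independence assumption can mislead, consider this small Python illustration. The table contents and column names (city, state) are invented for the example, and the arithmetic is a generic textbook model of independent selectivities, not the optimizer's actual costing formulas.

```python
from collections import Counter

# Hypothetical rows of (city, state); city strongly determines state.
ROWS = [
    ("San Diego", "CA"), ("San Diego", "CA"), ("San Diego", "CA"),
    ("Los Angeles", "CA"), ("Los Angeles", "CA"),
    ("Dayton", "OH"), ("Dayton", "OH"), ("Dayton", "OH"),
]


def independent_estimate(rows, city, state):
    """Estimate matching rows by multiplying single-column selectivities."""
    total = len(rows)
    city_selectivity = sum(1 for c, _ in rows if c == city) / total
    state_selectivity = sum(1 for _, s in rows if s == state) / total
    return total * city_selectivity * state_selectivity


def actual_count(rows, city, state):
    """Count the rows that really match both predicates."""
    return Counter(rows)[(city, state)]


if __name__ == "__main__":
    print("independent estimate:", independent_estimate(ROWS, "San Diego", "CA"))
    print("actual rows:         ", actual_count(ROWS, "San Diego", "CA"))
```

Here the independent estimate is 1.875 rows while 3 rows actually match. The gap grows as the correlation between the columns strengthens, which is the situation where a multicolumn statistic supplies the combined value counts directly.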

General Suggestions for the Statistics Collection Process

When multiple statistics on a table are collected for the first time, group all statistics with the same USING options into a single request.
After first time collections, collect all statistics being refreshed on a table into one statement, and if all are being refreshed, re-collect at the table level.
Do not rely on copied or transferred SUMMARY statistics or PARTITION statistics between tables within a system or across systems. Recollect them natively. This ensures that changes to the configuration and other internal details (such as how internal partitions are arranged) are available to the optimizer.
Recollect table-level SUMMARY statistics after data loading events in order to provide the optimizer with current table row counts and other detail. This operation runs very quickly and supports more effective extrapolations of statistics that were not able to be recollected after updates.
Do not drop and then recollect statistics, as history records for the statistic are lost when the statistic is dropped, making it less likely that the optimizer skips statistics collection or downgrades to sampling.

New Recommendations for Teradata Database 14.10

If migrating from a previous release, set the DBS Control internal field NoDot0Backdown to true in order to make use of the new Version 6 statistics histograms, which can carry update, delete and insert counts.
Enable DBQL USECOUNT logging for all important databases whose table row counts may change over time. This provides information about updates, deletes and inserts performed on each table within the logged databases and contributes to better extrapolations.
Benefit from the default system threshold option, which may allow some submitted statistics to be skipped, by turning on DBQL USECOUNT logging and building up statistic history records. Skipping can potentially reduce the resources required by statistics recollections.
Expect to perform several full collections before statistic skipping or automatic downgrade to sampling is considered.
Use the collection statement-level THRESHOLD option only for cases where there is a specific need to override the global threshold default.
Consider collecting statistics (and providing a name) on SQL expressions if they are frequently used in queries and if they reference columns from a single table.

Other Considerations

Optimizations such as nested join, partial GROUP BY, and dynamic partition elimination are not chosen unless statistics have been collected on the relevant columns.
NUPIs that are used in join steps in the absence of collected statistics are assumed to be 75% unique, and the number of distinct values in the table is derived from that. A NUPI that is far off from being 75% unique (for example, it’s 90% unique, or on the other side, it’s 60% unique or less) benefits from having statistics collected, including a NUPI composed of multiple columns regardless of the length of the concatenated values. However, if it is close to being 75% unique, dynamic AMP samples are adequate. To determine what the uniqueness of a NUPI is before collecting statistics, you can issue this SQL statement:

EXPLAIN SELECT DISTINCT NUPI-column FROM table;
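
Once you have an estimate of the number of distinct values and the table row count, the comparison against the assumed 75% can be sketched as follows. The 10-point tolerance for "close to 75%" is an assumption chosen for this example (the recommendation above only gives 90% and 60% as examples of "far off"), and the function names are hypothetical.

```python
ASSUMED_NUPI_UNIQUENESS = 0.75  # what is assumed without collected statistics
DEFAULT_TOLERANCE = 0.10        # hypothetical cutoff for "close to 75%"


def nupi_uniqueness(distinct_values, row_count):
    """Return the fraction of rows that carry a distinct NUPI value."""
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    return distinct_values / row_count


def should_collect_nupi_statistics(distinct_values, row_count,
                                   tolerance=DEFAULT_TOLERANCE):
    """True when the NUPI is far enough from 75% unique to justify statistics."""
    uniqueness = nupi_uniqueness(distinct_values, row_count)
    return abs(uniqueness - ASSUMED_NUPI_UNIQUENESS) > tolerance


if __name__ == "__main__":
    for distinct in (900_000, 760_000, 600_000):
        print(distinct, should_collect_nupi_statistics(distinct, 1_000_000))
```

For a one-million-row table, 900,000 or 600,000 distinct values would suggest collecting statistics under this tolerance, while 760,000 would suggest that dynamic AMP samples are adequate.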

For a partitioned table, it is recommended that you always collect statistics on:

PARTITION. This tells the optimizer how many row partitions are empty, a histogram of how many rows are in each row partition, and the compression ratio for column partitions. This statistic is used for optimizer costing.
Any partitioning columns. This provides cardinality estimates to the optimizer when the partitioning column is part of a query’s selection criteria.

For a partitioned primary index table, consider collecting these statistics if the partitioning column is not part of the table’s primary index (PI):

(PARTITION, PI). This statistic is most important when a given PI value may exist in multiple partitions, and can be skipped if a PI value only goes to one partition. It provides the optimizer with the distribution of primary index values across the partitions. It helps in costing the sliding-window and rowkey-based merge join, as well as dynamic partition elimination.
(PARTITION, PI, partitioning column). This statistic provides the combined number of distinct values for the combination of PI and partitioning columns after partition elimination. It is used in rowkey join costing.

Dynamic AMP sampling has the option of pulling samples from all AMPs, rather than from a single AMP (the default). Dynamic all-AMP sampling has these particular advantages:

It provides a more accurate row count estimate for a table with a NUPI. This benefit becomes important when NUPI statistics have not been collected (as might be the case if the table is extraordinarily large), and the NUPI has an uneven distribution of values.
Statistics extrapolation for any column in a table is not attempted for small tables or tables whose primary index is skewed (based on full statistics having been collected on the PI), unless all-AMP dynamic AMP sampling is turned on. Because a dynamic AMP sample is compared against the table row count in the histogram as the first step in the extrapolation process, an accurate dynamic AMP sample row count is critical for determining if collected statistics are stale, or not.

For temporal tables, follow all collection recommendations made above. Currently, statistics are not supported on BEGIN and END period types. That capability is planned for a future release.

[1] Any column which is over 95% unique is considered as a nearly-unique column.

[2] Dynamic AMP sampling is sometimes referred to as random AMP sampling.

↧

Teradata 15.0 JSON Integration

September 15, 2014, 9:17 am

Come learn how the new JSON Integration feature will incorporate and use semi-structured data within the TD 15.0 Database.

The capabilities discussed include:

The ability to store semi-structured data inside the new JSON data type
The ability to query, via SQL, the data within a JSON data type
New indexing capabilities on the new JSON data type
The ability to add/modify attributes within a JSON data type without performing ETL operations or schema changes.
New facility to shred and publish JSON documents
The ability to use "wildcards" to navigate complex JSON documents.
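
For readers new to the terms, "shredding" means turning one JSON document into flat relational rows, and wildcard navigation means finding values without spelling out the full path. The sketch below illustrates both ideas in plain Python with an invented order document; it does not use or represent Teradata's JSON functions or syntax.

```python
import json

# Hypothetical semi-structured document, invented for this example.
DOCUMENT = (
    '{"order_id": 17, "customer": {"name": "Ann"}, '
    '"items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}'
)


def shred_order(document_text):
    """Turn one JSON order into flat (order_id, customer, sku, qty) rows."""
    doc = json.loads(document_text)
    return [
        (doc["order_id"], doc["customer"]["name"], item["sku"], item["qty"])
        for item in doc["items"]
    ]


def find_all(node, key):
    """Wildcard-style lookup: collect every value stored under key at any depth."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            found.extend(find_all(value, key))
    elif isinstance(node, list):
        for element in node:
            found.extend(find_all(element, key))
    return found


if __name__ == "__main__":
    print(shred_order(DOCUMENT))
    print(find_all(json.loads(DOCUMENT), "sku"))
```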

Presenter: Rich Charucki, Engineering Fellow - Teradata Corporation

Training details

Course Number:

51925

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

Channel:

database

Tags:

JSON

JavaScript Object Notation

Semi-Structured Data

JSON Integration

SQL JSON

↧

Teradata Express Support for SQL H -- Install Instructions SQL H

September 19, 2014, 9:40 am

The following instructions should help you set up and configure the Teradata SQL H capability with your Hadoop instance.

100 Resolve Hostname conflicts

Teradata SQL-H uses node hostnames to resolve network addressing. However, typical systems configured by Teradata Staging will likely have conflicting or duplicate hostnames between Teradata nodes and Hadoop nodes. This conflict must be addressed regardless of whether the connectivity is via customer/external LAN or via BYNET/InfiniBand. This section describes the process to change the Hadoop node hostnames. This section may be skipped if there is no conflict in hostname between the Teradata and Hadoop nodes.

START OF SYSTEM OUTAGE. Note date and time: ___________________

100.1 Shutdown all Hadoop services. On the master node:

# hcli system stop

100.2 Change the hostname.

100.2.1 On the Hadoop master node, add the new hostname for each Hadoop node to its existing entry in the /etc/hosts file. Example:

Before: 39.64.8.2 byn002-8

After: 39.64.8.2 hdp002-8 byn002-8

Note that the new hostname must precede (appear to the left of) the original hostname.
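
If there are many Hadoop nodes, the edit to each /etc/hosts entry can be expressed as a small text transformation. The following Python sketch shows the idea on a single line; the function name is hypothetical, and it only builds the new line in memory rather than modifying any file.

```python
def add_leading_hostname(hosts_line, old_name, new_name):
    """Insert new_name immediately before old_name in one /etc/hosts entry.

    Lines that do not list old_name as a hostname, or that already list
    new_name, are returned unchanged.
    """
    fields = hosts_line.split()
    hostnames = fields[1:]  # the first field is the IP address
    if old_name not in hostnames or new_name in hostnames:
        return hosts_line
    position = fields.index(old_name, 1)
    fields.insert(position, new_name)
    return " ".join(fields)


if __name__ == "__main__":
    print(add_leading_hostname("39.64.8.2 byn002-8", "byn002-8", "hdp002-8"))
```

Applied to the "Before" line in the example, this produces the "After" line, and applying it a second time leaves the line unchanged.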

100.2.2 Validate the new hostname is recognized. On the master node:

# hostname -f

100.3 On the Hadoop Master node, run the following command:

# hcli support modifyHostname <oldname> <newname>

100.4 If Virtual IP addressing is used at the site, perform the necessary modifications for the new hostnames.

100.5 Make sure the Teradata system can resolve Hadoop node names (either through DNS or via /etc/hosts on the Teradata nodes).

100.6 Start Hadoop services. On the master node:

# hcli system start

END OF SYSTEM OUTAGE. Note date and time: ___________________

101 Add Hadoop IP addresses to the Teradata hosts file

Perform the following steps if Teradata hostnames and Hadoop hostnames are not conflicting and are not resolved through local DNS.

101.1 Save a copy of /etc/hosts file on all Teradata TPA nodes:

# cp /etc/hosts /etc/orig.hosts

101.2 Add Hadoop nodes to the /etc/hosts file of every Teradata TPA node. Example:

192.168.135.100 hdp002-8

192.168.135.101 hdp002-9

102 Hadoop Namenode configuration for Teradata proxy user setup

In order for Teradata SQL-H to work with a Hadoop system, a Teradata super user must be configured on the Hadoop namenode so that it is allowed to access HDFS from Teradata nodes on behalf of another Hadoop user in a secure way. The Teradata super user used for this setting is tdatuser. On the Hadoop side, the following configurations are required in core-site.xml to add ‘tdatuser’ as a trusted super user. The first property, hadoop.proxyuser.tdatuser.groups, determines the file system groups that tdatuser is allowed to impersonate. The second property, hadoop.proxyuser.tdatuser.hosts, determines the hosts from which tdatuser is allowed to access HDFS. If these configurations are not present, impersonation will not be allowed and Teradata queries will fail with a security error.

Add the following two properties to the Hadoop namenode configuration file /etc/hadoop/conf/core-site.xml, change their values based on your Teradata/Hadoop environment setup, save the file, and restart the namenode.

<property>

<name>hadoop.proxyuser.tdatuser.groups</name>

<value>users</value>

<description>

Allow the superuser tdatuser to impersonate any member of the listed HDFS group(s). For example, ‘users’ is the HDFS group whose members tdatuser is allowed to impersonate.

</description>

</property>

<property>

<name>hadoop.proxyuser.tdatuser.hosts</name>

<value>host1,host2</value>

<description>

The superuser can connect only from host1 and host2 to impersonate a user. Here host1 and host2 represent Teradata nodes. All nodes of the Teradata system need to be listed here in order for a SQL-H query to be processed. It is recommended to use the IP addresses of the Teradata nodes.

</description>

</property>
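
Before restarting the namenode, it can be worth checking that both proxy user properties are present and non-empty. The sketch below does that with Python's standard XML parser. It assumes the usual Hadoop layout in which the properties sit inside a `<configuration>` root element; the sample text and function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

REQUIRED_PROPERTIES = (
    "hadoop.proxyuser.tdatuser.groups",
    "hadoop.proxyuser.tdatuser.hosts",
)

# Hypothetical core-site.xml fragment that is missing the hosts property.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>hadoop.proxyuser.tdatuser.groups</name>
    <value>users</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> in a Hadoop config."""
    root = ET.fromstring(xml_text.strip())
    properties = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            properties[name.strip()] = (prop.findtext("value") or "").strip()
    return properties


def missing_proxyuser_settings(xml_text):
    """List the required tdatuser proxy properties that are absent or empty."""
    properties = read_properties(xml_text)
    return [name for name in REQUIRED_PROPERTIES if not properties.get(name)]


if __name__ == "__main__":
    print(missing_proxyuser_settings(SAMPLE_CORE_SITE))
```

To check a real file, read its contents and pass the text to missing_proxyuser_settings; an empty list means both properties were found.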

103 Grant user privileges

Certain privileges are required for the Teradata database user to run the setup successfully, and if needed, they can be granted by the admin user using the following statements:

GRANT EXECUTE PROCEDURE ON SQLJ TO [user] WITH GRANT OPTION;

GRANT CREATE EXTERNAL PROCEDURE ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT DROP PROCEDURE ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT DROP FUNCTION ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT EXECUTE FUNCTION ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT CREATE FUNCTION ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT SELECT ON SYSLIB TO [user] WITH GRANT OPTION;

104 Run setup script

104.1 On a Teradata node, go to the /opt/teradata/sqlh/<version> directory and run the BTEQ setup script (tdsqlh_td.bteq) to do the setup:

# bteq .logon localhost/[user],[password] < tdsqlh_td.bteq > output.txt 2>&1

where [user] and [password] are to be filled in.

104.2 On a Teradata node, go to the /opt/teradata/tdsqlh_hdp/<version> directory and run the BTEQ setup script (tdsqlh_hdp.bteq) to do the setup:

# bteq .logon localhost/[user],[password] < tdsqlh_hdp.bteq > output.txt 2>&1

where [user] and [password] are to be filled in.

104.3 Check the output file (output.txt in the above example) for possible errors. Upon successful execution, the script will install the required JAR files and create the load_from_hcatalog table operator function.

Notes:

You may run into the following error if there isn’t enough room in the SYSLIB database. Installing the Hadoop JAR files requires about 19 megabytes of space, so you may need to increase the size of the SYSLIB database.

call sqlj.install_jar('cj!pig-withouthadoop.jar','pig', 0);

*** Failure 2644 No more room in database SYSLIB.

Running the setup script for the first time may return the following SQL failure messages. These errors are benign and can safely be ignored. Examples:

DROP FUNCTION SYSLIB.load_from_hcatalog;

*** Failure 5589 Function 'load_from_hcatalog' does not exist.

call sqlj.remove_jar('SQLH', 0);

*** Failure 7972 Jar 'SYSLIB.SQLH' does not exist.

*** Warning: 9241 Check output for possible warnings encountered in Installing or Replacing a JAR.
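
Because the benign failures are mixed in with any real ones, it can help to filter the output file before reading it. The sketch below is an illustration only, and the function name is hypothetical. It lists lines containing "*** Failure" and skips codes 5589 and 7972, the two first-run codes shown above. Other benign codes may appear on your system, so treat the result as a starting point for review.

```shell
#!/bin/sh
# Hypothetical helper: list "*** Failure" lines from a BTEQ output file,
# skipping the two first-run codes shown above (5589 and 7972).
# Each reported line is prefixed with its line number in the file.
# Exit status is 0 when unexpected failures are listed, 1 when none remain.
list_unexpected_failures() {
    grep -n -F '*** Failure' "$1" | grep -v -E 'Failure (5589|7972) '
}
```

Run against the examples above, list_unexpected_failures output.txt would report the Failure 2644 line but not the 5589 or 7972 lines.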

105 Validate SQL-H install

The steps below help validate that the Teradata/Hadoop setup is correct and ready for SQL-H queries.

105.1 Create hcatalog table with data:

105.1.1 Download the files tdsqlh_example.hive and tdsql_data.csv from the directory /opt/teradata/tdsqlh_hdp/<version> on a Teradata node, and copy them to the /tmp directory on the Hadoop namenode.

105.1.2 Log in to the Hadoop namenode and go to the /tmp directory.

105.1.3 Change the file permissions on the copied files.

# chmod 777 tdsqlh_example.hive tdsql_data.csv

105.1.4 Change user to hive.

# su hive

105.2 Run sample hive script.

# hive < tdsqlh_example.hive

Make sure the script completes and returns a row count of 805.

Total MapReduce CPU Time Spent: 4 seconds 580 msec

805

Time taken: 33.76 seconds

On successful completion, this will create a table tdsqlh_test with 14 columns and populate it with 805 rows.

105.3 Run SQL-H query to import rows from tdsqlh_test table

105.3.1 Using SQL Assistant or BTEQ, log on to the Teradata system as user dbc. If user dbc is not available, you may choose a different user.

105.3.2 Run the following SQL query with the proper HCatalog server address and port, and check that it returns a count(*) of 805.

SELECT count(*)

FROM SYSLIB.load_from_hcatalog(USING

server('153.64.29.98') -- Hcatalog server address

port('9083') -- hcatalog server port

username('hive')

dbname('default')

tablename('tdsqlh_test')

columns('*')

templeton_port('50111') -- Web Hcatalog port

) as D1;

Results should come back with:

*** Query completed. One row found. One column returned.

*** Total elapsed time was 2 seconds.

Count(*)

-----------

805

If the above query returns an error instead of a row count of 805, the Teradata/Hadoop setup may be wrong and will require manual troubleshooting to isolate the problem.

Channel:

database

↧

Move The Process, Not The Data – Accelerating Analytics With In-Database Processing

September 23, 2014, 10:13 am

≫ Next: Performance Diagnostic Methodology

≪ Previous: Teradata Express Support for SQL H -- Install Instructions SQL H

In today’s fast-changing world, companies are constantly looking for better, cheaper, and faster ways to turn their data into competitive advantage.

At the same time, data is becoming more detailed and arriving more frequently. The traditional process of moving data from platform to platform and from process to process only leads to higher cost, more data latency, confusion as to where the correct data is, and oftentimes bad decisions that affect the end customer.

It is apparent that the old ways of moving data around are no longer feasible. It is time to embrace what the leaders have already known for years: you need to stop moving data and start moving process. Fortunately, Teradata has been the company enabling just that evolution, and you can quickly leverage the same capability.

This session highlights the in-database functions that have been introduced since Teradata 12.0. These include Temporal, Geospatial, OLAP integration, and data labs. It also covers some of the new features in Teradata 15.0, including JSON, Script Programs, and QueryGrid, which allows you to seamlessly cross platforms to integrate data and leverage processing from Teradata, Aster, Hadoop and other environments.

Presenter: Rob Armstrong - Teradata Corporation

Training details

Course Number:

51944

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

Tags:

Accelerated Analytics

↧

Performance Diagnostic Methodology

October 6, 2014, 11:22 am

≫ Next: Teradata Active System Management (TASM) with SLES 11 Priority Scheduler

≪ Previous: Move The Process, Not The Data – Accelerating Analytics With In-Database Processing

Performance Diagnostic Method and Tools is targeted to uncover and resolve performance related issues in three primary areas...

Costly SQL Operations --- this will go beyond finding and resolving the “Killer Query” and the highly skewed queries which are quite easy to identify. Focus on finding costly scripts and repeating patterns which tend to have lots of smaller operations well below the radar of the traditional performance tuning effort.
Excessive Statistics Collection --- Statistics collection is very important in that it gives the optimizer critical demographic information with which to build efficient join plans. But statistics collection has a cost and it can be easily overdone. There are some systems where over 20% of the resources are consumed in statistics collection. Add the cost of ETL and there isn’t very much system left to perform analytic work. A reasonable target is five to ten percent.
Index and Partition Tuning --- it is well known that Teradata is extremely effective at full table scans and, other than the Primary Index needed for data distribution, is not overly dependent on indexes. What is less well known is that Teradata has a full suite of index options that can improve efficiency and performance. The methods and tools shown here have proven helpful in identifying where index efforts should be focused.

Presenter: Arnie VanWyk

Training details

Course Number:

51945

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

↧

Teradata Active System Management (TASM) with SLES 11 Priority Scheduler

October 9, 2014, 5:11 pm

≫ Next: OLAP Analytics in Action

≪ Previous: Performance Diagnostic Methodology

A new database priority scheduler is introduced with the Linux SLES 11 operating system. This changes how workload priorities are managed within Teradata Active System Management (TASM).

This presentation introduces the new database priority scheduler and TASM Workload Designer changes. We'll review the TASM migration options going from Linux SLES 10 to Linux SLES 11. We'll also discuss priority scheduler configuration and performance measurement.

Workshop outline: Introduce the new database priority scheduler using Viewpoint Workload Designer, review Linux SLES 11 database priority scheduler fundamentals, compare SLES 11 Pre-migration Tool and automatic migration upgrade approaches, describe Teradata Appliance platform features for the new scheduler, and discuss priority scheduler configuration and measurement techniques. This workshop brings together the pieces of the TASM and SLES 11 puzzle to help you with a smooth transition to the new priority scheduler.

Presenter: Dan Fritz - Teradata Corporation

Training details

Course Number:

52182

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

↧

OLAP Analytics in Action

October 15, 2014, 2:38 pm

≫ Next: Putting Teradata Geospatial On The Map

≪ Previous: Teradata Active System Management (TASM) with SLES 11 Priority Scheduler

This presentation demystifies the use of OLAP analytic functions in TD 15.0 and previous releases.

It explains how to use all OLAP keywords, such as OVER, PARTITION BY, ORDER BY, ROWS UNBOUNDED, FOLLOWING, PRECEDING, CURRENT ROW and RESET WHEN. The material includes simple academic examples for clarity and real life SQL to understand how to apply OLAP in day to day situations.

Key Points:

Demystifies OLAP syntax
Understand how OLAP can answer business questions
Day to Day coding solution
Write better SQL

Note: This is a TD 15.0 updated presentation.

Presenter: Patrice Berube, Solution Architect - Teradata

Audience:

Data Warehouse Technical Specialist, Database Administrator, Data Warehouse Analytical Modeler, Data Warehouse Application Specialist

Training details

Course Number:

40140

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

↧

Putting Teradata Geospatial On The Map

October 16, 2014, 4:05 pm

≫ Next: Best Practices for Writing and Tuning Teradata SQL

≪ Previous: OLAP Analytics in Action

The Teradata Database provides a powerful and scalable platform for storing and processing geospatial data.

This presentation covers some of the key Teradata geospatial capabilities, features, and performance enhancement techniques and how they can be leveraged. It also touches on visualization and GIS tools.

Note: This was a 2011 Teradata Partners Conference session, updated and re-recorded in September 2014.

Presenter: Igor Machin, Senior Product Manager – Teradata

Audience:

Data Warehouse Architect/Designer, Data Warehouse Modeler, Data Warehouse Analytical Modeler

Training details

Course Number:

50111

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

↧

Best Practices for Writing and Tuning Teradata SQL

November 14, 2014, 3:39 pm

≫ Next: A Structured Method for Tuning Teradata SQL

≪ Previous: Putting Teradata Geospatial On The Map

This session is intended for those already familiar with writing SQL but who are somewhat new to Teradata.

Teradata’s parallel architecture is best utilized with a basic understanding of concepts such as hashed access, set vs. row-at-a-time processing, and Teradata’s cost-based optimizer, with its reliance on statistics.

This course gives insight into Teradata performance considerations and helps attendees write efficient queries by providing tools and techniques for measuring and improving query performance.

Presenter: Robert Smith - Teradata

Training details

Course Number:

52811

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

↧

A Structured Method for Tuning Teradata SQL

November 17, 2014, 3:38 pm

≫ Next: Teradata Database Architecture Overview

≪ Previous: Best Practices for Writing and Tuning Teradata SQL

SQL performance is vital for driving value from a data warehouse. With today's growing query complexity, optimizing SQL can be a daunting task ...

This session provides a fundamental method for SQL tuning that has been used and refined through thousands of tuning exercises. The structure and process in this method generates a context that minimizes the complexities and gets to the root of the performance issues, which enables faster remediation. Further, the method provides a consistent, repeatable process that can be leveraged across the organization from the end user to database administrator. In addition to the structured method, the presenters will share insight on typically seen performance problems and provide sample remedies. This provides the attendee with insight to commence their tuning efforts.

Key Points:

SQL tuning can be a daunting task. This method leads an analyst to diagnose & see the root cause of a performance issue in the process of developing a solution.
The attendee will receive a proven tuning method that has been used to systematically diagnose & fix 1000s of SQL tuning issues.
SQL tuning diagnostics with typical problem types & sample remedies are provided in this session to the attendee.

Presenters:
Arnie Van Wyck – Teradata
David H. Gardner – Teradata

Audience:

Data Warehouse Administrator; Data Warehouse Application Specialist; Data Warehouse Architect/ Designer; Data Warehouse Modeler; Data Warehouse Technical Specialist

Training details

Course Number:

45703

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

↧

Teradata Database Architecture Overview

November 21, 2014, 5:13 pm

≫ Next: Statistics Enhancements in Teradata Database 14.0/14.10

≪ Previous: A Structured Method for Tuning Teradata SQL

The Teradata Database system is different from other databases in a number of fundamental ways. If you want to know what these differences are and how they make it possible for Teradata to deliver unlimited scalability in every dimension, high performance, simple management, QueryGrid to Hadoop and other databases, and all the other requirements of the Active Data Warehouse and structured Big Data, then this is the session for you.

How the latest additions to Teradata extend and change the architecture will be highlighted. We will discuss the architecture of Teradata in detail and how it enables you to quickly, efficiently and flexibly deliver value to your business.

With a plethora of database products on the market and an emerging market of new players today, it is increasingly difficult to understand how any of them are differentiated. It is challenging for evaluators and decision makers to get below the marketing slides to understand the technical differences.

Key product requirements needed to support scalable SQL analytics will be defined. An in depth look at the architecture will show what makes Teradata a differentiated product and how it responds to those requirements. Recent changes to extend the Teradata architecture will be included.

Evaluators and decision makers will have better information to use when choosing technology for a data warehouse. Architects, DBAs and developers will gain a better understanding of how Teradata works which they can use to get the best from their investment. Experienced users will get a view into the latest additions.

Note: This was a 2012 Teradata Partners Conference session, updated in November 2014.

Presenter: Todd Walter, Chief Technologist and Distinguished Fellow – Teradata

Audience:

Data Warehouse Administrator, Data Warehouse Architect/Designer, Data Warehouse Program Manager

Training details

Course Number:

49873

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

Tags:

multi-temperature storage

Channel:

database

↧

Statistics Enhancements in Teradata Database 14.0/14.10

December 4, 2014, 2:51 pm

≫ Next: Introduction to Teradata Active System Management (TASM)

≪ Previous: Teradata Database Architecture Overview

This session focuses on statistics enhancements in Teradata Database 14.10, particularly the new threshold functionality, skipping statistics, and downgrade to sampling.

How to find out which statistics are not being used and which ones are missing will be discussed, as well as the new ability to collect statistics on expressions. Some background on the statistics enhancements that became available in the earlier 14.0 release will be provided as well.

Presenter: Carrie Ballinger - Teradata Corporation

Training details

Course Number:

52772

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

↧

Introduction to Teradata Active System Management (TASM)

December 11, 2014, 1:18 pm

≫ Next: Data Collection for a Solid Performance Management Foundation

≪ Previous: Statistics Enhancements in Teradata Database 14.0/14.10

TASM gives customers the ability to manage Teradata resource usage and performance on platforms executing diverse types of work.

This session discusses how to get started using TASM, as well as what workload management rules are available within TASM to control concurrency, to determine what queries are allowed to run, or to differentiate the priorities of active workloads. Examples of commonly used options from current TASM users are shared.

Key Points

Overview of TASM
TASM Features
Frequently used TASM options

Presenter: Carrie Ballinger, Sr. Technical Consultant – Teradata Corporation

Audience:

Data Warehouse Architects, Data Warehouse System Administrators, DBAs, Enterprise or IT Architects

Training details

Course Number:

39365

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

↧

Data Collection for a Solid Performance Management Foundation

December 15, 2014, 3:38 pm

≫ Next: Teradata 15.0 QueryGrid

≪ Previous: Introduction to Teradata Active System Management (TASM)

This presentation will review the benefits of collecting performance data on the Teradata platform.

We’ll discuss the major subject areas of performance data, the uses of each, and best practices around performance data retention. Teradata’s PDCR data collection tool will also be reviewed.

Presenter: Dan Fritz - Teradata Corporation

Training details

Course Number:

52524

Training URL:

http://www.teradata.com/t/ten/

Training Format:

Recorded webcast

Price:

$195

↧