Channel: Teradata Developer Exchange - Database

Teradata 15.0 QueryGrid

Course Number: 
52330
Training Format: 
Recorded webcast

QueryGrid provides access to multiple data sources within UDA for Teradata Database 15.0. Learn what QueryGrid provides, how to set it up, and how users access remote data in a self-service manner.

Presenter: Rich Charucki, Engineering Fellow, Teradata R & D

Price: 
$195
Credit Hours: 
1

Identifying Used, Unused and Missing Statistics


New DBQL logging options USECOUNT and STATSUSAGE, introduced in Teradata Database 14.10, enable the logging of used and missing statistics.  The output of this logging can be utilized to find used, unused, and missing statistics globally (for all queries) or for just a subset of queries.

The USECOUNT logging option, which starting in Teradata Database 14.10 supports the object use count (OUC) feature, is associated with the database that owns the table.  USECOUNT logs usage from all requests against that table irrespective of the user who submitted the query.  This option logs the object usage from such objects as databases, tables, indexes, join indexes, and statistics. Additionally, it logs insert, update, and delete counts against the database’s tables.  The STATSUSAGE logging option, similar to other DBQL logging options, is associated directly to a user and logs the used and missing statistics at the query level within an XML document.

In this article, we focus on the logging of used statistics: how to retrieve them and join them to other dictionary tables to find the global list of used and unused statistics, and how to find the missing statistics at the query level.

Enabling the DBQL USECOUNT Option

Each statistic you define becomes an object that USECOUNT logging can track.  Each time the optimizer actually makes use of a specific statistic, an access counter in the new DBC.ObjectUsage table gets incremented.  So at any point in time you can look in the data dictionary and evaluate how frequently (or even if ever) a particular statistic is being used.

The statements that enable object use count (OUC) logging on a given database or user are described below.  It is especially useful to enable the USECOUNT option on the databases that own permanent tables.  It can also be enabled on users or user accounts, but you would do that only when these accounts have tables on which you want the logging enabled.

To enable on a database (which is not a user; note that other DBQL options are not allowed to be enabled on a database):

BEGIN QUERY LOGGING WITH USECOUNT ON <Database Name>;

To enable on a user with no DBQL options enabled:

BEGIN QUERY LOGGING WITH USECOUNT ON <User Name>;

To enable on a user having some existing DBQL options enabled (use SHOW QUERY LOGGING ON <User> to find the current DBQL options):

REPLACE QUERY LOGGING WITH <current options>, USECOUNT ON <User Name>;
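For example, here is a minimal sketch of enabling USECOUNT on a hypothetical database named SALES_DB and then confirming the rule (the database name is illustrative only):

BEGIN QUERY LOGGING WITH USECOUNT ON SALES_DB;  /* SALES_DB is a hypothetical database name     */
SHOW QUERY LOGGING ON SALES_DB;                 /* confirm that the rule now includes USECOUNT  */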

 

The following are exceptions and special scenarios in OUC logging:

  1. OUC logging is not applicable for DBC tables.  In other words, the OUC feature doesn’t log the use counts for statistics you may be collecting on dictionary objects.
  2. OUC logging is not applicable for temporary tables.
  3. Dropping statistics also drops the corresponding entries in DBC.ObjectUsage table.  So, if you are dropping and recollecting statistics, the use counts reflect usage from the last collection only.
  4. Unlike other DBQL options, disabling USECOUNT invalidates (resets the counts and timestamp) the corresponding rows in DBC.ObjectUsage. It is not recommended to disable this option unless there is a specific reason to do so.

The following sections describe how to find these statistics for each category along with examples.

Identifying Used Statistics

After enabling DBQL USECOUNT and letting some time pass, the following query against the dictionary table DBC.ObjectUsage retrieves the use counts of the existing statistics.  The number of days a statistic has been continuously under the control of USECOUNT logging is given by the column DaysStatLogged; a value of -1 for this column indicates that the USECOUNT option has been disabled after being enabled for some time in the past.  You need to consider the number of days a statistic was a candidate for logging along with the use counts to normalize the comparison across different statistics.

SELECT DBC.DBase.DatabaseName AS DatabaseName
      ,DBC.TVM.TVMName        AS TableName
      ,COALESCE(DBC.StatsTbl.StatsName
               ,DBC.StatsTbl.ExpressionList
               ,'SUMMARY')    AS StatName
      ,OU.UserAccessCnt       AS UseCount
      ,CAST(OU.LastAccessTimeStamp AS DATE) AS DateLastUsed
      ,CASE
       WHEN DBC.DBQLRuleTbl.TimeCreated IS NULL
       THEN -1  -- Logging disabled
       WHEN DBC.DBQLRuleTbl.TimeCreated > DBC.StatsTbl.CreateTimeStamp
       THEN CURRENT_DATE - CAST(DBC.DBQLRuleTbl.TimeCreated AS DATE)
       ELSE CURRENT_DATE - CAST(DBC.StatsTbl.CreateTimeStamp AS DATE)
       END AS DaysStatLogged
FROM DBC.ObjectUsage OU
    ,DBC.Dbase
    ,DBC.TVM
    ,DBC.StatsTbl LEFT JOIN DBC.DBQLRuleTbl
           ON DBC.StatsTbl.DatabaseId = DBC.DBQLRuleTbl.UserID
          AND DBQLRuleTbl.ExtraField5 = 'T'   /* Comment this line if TD 15.0 or above   */
        /*AND DBQLRuleTbl.ObjectUsage = 'T'*/ /* Uncomment this line if TD 15.0 or above */
WHERE DBC.Dbase.DatabaseId    = OU.DatabaseId
  AND DBC.TVM.TVMId           = OU.ObjectId
  AND DBC.StatsTbl.DatabaseId = OU.DatabaseId
  AND DBC.StatsTbl.ObjectId   = OU.ObjectId
  AND DBC.StatsTbl.StatsId    = OU.FieldId
  AND OU.UsageType            = 'STA'
ORDER BY 1, 2, 3;

Customize the above query for your databases and tables of interest, the number of days a statistic has been logged, and so on.  A sample of its output is given below.

DatabaseName   TableName        StatName                                                        UseCount   DateLastUsed   DaysStatLogged
DBA_TEST       PARTY_DATA       CURR_FLAG                                                       209        11/15/2014     -1
PNR_DB         CODE_SHARE       FK_ITIODPAD_ITINERARY_INFO_ODP                                  253        11/1/2014      34
PNR_DB         ITINERARY_TBL    AD_ITINERARY_INFO_ODP                                           203        11/1/2014      34
PNR_DB         ITINERARY_TBL    CD_ID_PRODUCT                                                   203        11/1/2014      34
PNR_DB         ITINERARY_TBL    CD_ORIGINAL_STATUS_CODE,CD_STATUS,FK_PNRODP_AD_PNR_HEADER_ODP   203        11/2/2014      34
PNR_DB         PNR_HEADER_ODP   AD_PNR_HEADER_ODP                                               600        11/28/2014     34
PNR_DB         PNR_HEADER_ODP   AD_PNR_HEADER_ODP,CD_CONTROL_NUMBER_LOC                         700        11/28/2014     34

Identifying Unused Statistics

Using USECOUNT to find unused statistics requires consideration of multiple time dimensions.  The basic question being answered here is:  Am I collecting statistics that have not been used for some duration?  To answer this question, you need to consider not only the usage counts, but also how long USECOUNT logging was active, the most recent usage timestamp, and the age of the statistics.  For example, if USECOUNT logging has been enabled only for the past few days, some statistics which get used only by a monthly or quarterly workload may not yet have been logged as used.  Similarly, if you collect new statistics (not re-collections), you need to let them get exposed to different workloads for a certain period of time to properly identify whether they are being used or not.

The following query is designed to consider these aspects and to list out the statistics that meet the following criteria.

  1. The age of the statistic is more than N days (first collected more than N days ago).
  2. DBQL USECOUNT logging is active for more than N days on the owning database.
  3. The statistic has not been used in the last N days.

The value of N can be customized based on your requirements. In the example query given below, the value of N is set to 30 (note that N is used in two predicates; both need to be updated if you customize this value).

SELECT DBC.DBase.DatabaseName AS DatabaseName
      ,DBC.TVM.TVMName        AS TableName
      ,COALESCE(DBC.StatsTbl.StatsName
               ,DBC.StatsTbl.ExpressionList
               ,'SUMMARY')    AS StatName
      ,CURRENT_DATE - CAST(DBC.StatsTbl.CreateTimeStamp AS DATE) AS StatAge
      ,CASE
       WHEN DatabaseName = 'DBC'       THEN -2  -- Logging Not Applicable
       WHEN DBC.StatsTbl.StatsType IN ('B', 'M')
       THEN -2  -- Logging Not Applicable on Temp tables (base and materialized)
       WHEN DBC.DBQLRuleTbl.TimeCreated IS NULL
       THEN -1  -- Logging Not Enabled
       WHEN DBC.DBQLRuleTbl.TimeCreated > DBC.StatsTbl.CreateTimeStamp
       THEN CURRENT_DATE - CAST(DBC.DBQLRuleTbl.TimeCreated AS DATE)
       ELSE CURRENT_DATE - CAST(DBC.StatsTbl.CreateTimeStamp AS DATE)
       END AS DaysStatLogged
FROM   DBC.StatsTbl LEFT JOIN  DBC.DBQLRuleTbl
           ON DBC.StatsTbl.DatabaseId = DBC.DBQLRuleTbl.UserID
          AND DBQLRuleTbl.ExtraField5 = 'T'    /* Comment this line if TD 15.0 or above   */
        /*AND DBQLRuleTbl.ObjectUsage = 'T'*/  /* Uncomment this line if TD 15.0 or above */
      ,DBC.Dbase
      ,DBC.TVM      
WHERE DBC.StatsTbl.DatabaseId = DBC.DBASE.DatabaseId
  AND DBC.StatsTbl.ObjectId   = DBC.TVM.TVMId
  AND NOT EXISTS (SELECT 100 FROM DBC.ObjectUsage OU
                  WHERE OU.UsageType  = 'STA'
                    AND OU.DatabaseId = DBC.StatsTbl.DatabaseId
                    AND OU.ObjectId   = DBC.StatsTbl.ObjectId
                    AND OU.FieldId    = DBC.StatsTbl.StatsId
                    AND CURRENT_DATE - CAST(OU.LastAccessTimeStamp AS DATE) < 30
                 )
  AND DaysStatLogged > 30
  AND DBC.StatsTbl.StatsId <> 0  -- Do not qualify table-level SUMMARY statistics as unused
                                 -- May get implicitly used but not recorded as used
ORDER BY 1, 2, 3;

 

You can customize the above query to adjust the number of days a statistic has gone unused, restrict it to the databases and tables of your interest, change the minimum age of the statistic, and so on.

DatabaseName   TableName        StatName                               StatAge   DaysStatLogged
PNR_DB         ITINERARY_TBL    PARTITION                              34        34
PNR_DB         ITINERARY_TBL    ST_281020293276_0_ITINERARY_INFO_ODP   34        34
PNR_DB         ITINERARY_TBL    TS_ULT_ALZ                             34        34
PNR_DB         PNR_HEADER_ODP   CD_CREATION_OFFICE_ID                  34        34
PNR_DB         PNR_HEADER_ODP   CD_CREATOR_IATA_CODE                   34        34
PNR_DB         PNR_HEADER_ODP   PARTITION                              34        34

Identifying Missing Statistics

Two additional DBQL logging options, STATSUSAGE and XMLPLAN, create XML documents that identify which statistics were used and which ones the optimizer looked for but did not find.  Unlike USECOUNT, these two logging options should be turned on temporarily and only as needed for analysis.  These two options are enabled the same way as other DBQL logging is, by User or Account.  Their output can be found in DBC.DBQLXMLTbl.

STATSUSAGE logs the usage of existing statistics within a query, as well as recommendations for new statistics that were found to be missing when the query was optimized.  It does this without tying the recommendation to a particular query step in the plan.  The relationship to query steps can only be seen if XMLPLAN logging is also enabled.  XMLPLAN logs detailed step data in XML format.

Always enable STATSUSAGE when looking for missing statistics, with or without XMLPLAN.  If you enable XMLPLAN by itself without STATSUSAGE, no statistics-related information of any kind is logged into DBQLXMLTbl.  STATSUSAGE provides all the statistics recommendations and, if both are being logged, those statistics recommendations can be attached to specific steps in the plan.

Because XMLPLAN comes with increased overhead, for the purposes of identifying missing statistics it usually is sufficient to start with STATSUSAGE logging without XMLPLAN.  The step information available in XMLPLAN is more useful when in analysis mode.
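As a minimal sketch, assuming a hypothetical user named ETL_USER that has no other DBQL options enabled, STATSUSAGE can be turned on for the analysis period and then turned off again:

BEGIN QUERY LOGGING WITH STATSUSAGE ON ETL_USER;              /* ETL_USER is a hypothetical user name                  */
/* BEGIN QUERY LOGGING WITH STATSUSAGE, XMLPLAN ON ETL_USER;     use this form instead when step-level detail is needed */
END QUERY LOGGING ON ETL_USER;                                /* remove the rule once the analysis is complete         */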

There is more information about logging to the DBQLXMLTbl in the DBQL chapter in the Database Administration manual, as well as in the orange book Teradata Database 14.10 Statistics Enhancements.

At the end of the XML document that represents the query, there is a series of entries with the <StatsMissing> label that look like this:

<StatsMissing Importance="High">
  <RelationRef Ref="REL1"/>
  <FieldRef Ref="REL1_FLD1035"/>
</StatsMissing>
<StatsMissing Importance="High">
  <RelationRef Ref="REL2"/>
  <FieldRef Ref="REL2_FLD1025"/>
</StatsMissing>

Each FieldRef value indicates a relation and field that together represent a missing statistic.  It is composed of a relation ID (REL1, for example) and a field ID (FLD1035, for example).  If the statistic is composed of multiple columns there is a FieldList label under which a FieldRef for each individual column appears.

In order to discover the actual table and column name, look higher in the XML document to find the Relation elements, and match each relation ID (REL1, REL2, and so on) to the RelationRef that appears in the StatsMissing data.  That tells you the name of the table.  Below each relation description is a label for each field which provides both the FieldID and the FieldName.  This allows you to easily match the ID values from the StatsMissing section to the Relation and Field sections higher up.  Here are two Field sections from the top of the same XML document that match the StatsMissing sections illustrated above.

<Field FieldID="1035" FieldName="L_SHIPDATE" Id="REL1_FLD1035" JoinAccessFrequency="0" 
  RangeAccessFrequency="0" RelationId="REL1" ValueAccessFrequency="0"/>
<Field DataLength="4" FieldID="1025" FieldName="O_ORDERKEY" FieldType="I" Id="REL2_FLD1025" JoinAccessFrequency="0" 
  RangeAccessFrequency="0" RelationId="REL2" ValueAccessFrequency="0"/>

Note that the optimizer assigns an importance to each missing statistic it reports on.

Once you have identified missing statistics for one query, you can consider how many other queries also list those statistics as missing, and then make the decision whether or not to start collecting them.  Using these new functionalities, you can streamline your statistics collection routines so they are more efficient and more focused.
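To gather the XML documents for that kind of cross-query review, you can pull them out of DBC.DBQLXMLTbl by QueryID.  The following is only a sketch: it assumes the default DBQL tables, a hypothetical user name, and the XMLTextInfo column name from the standard dictionary definition (verify the column names against your release, since the XML for one query may be split across multiple rows):

SELECT L.QueryID
      ,L.UserName
      ,L.QueryText
      ,X.XMLTextInfo             /* XML document containing the <StatsMissing> entries (column name assumed) */
FROM DBC.DBQLogTbl  L
    ,DBC.DBQLXMLTbl X
WHERE X.QueryID  = L.QueryID
  AND L.UserName = 'ETL_USER'    /* hypothetical user whose queries are being analyzed */
ORDER BY L.QueryID;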


New, Simplified Approach to Parsing Workload Selection


As of Teradata Database 14.10.05 and 15.00.03, a simplified approach to determining which workload will support session management and parsing has been adopted.  This posting describes this more straightforward approach, which will be used starting in these 14.10 and 15.0 releases and going forward in all future releases.

An earlier posting titled “A Closer Look at How to Setup a Parsing-Only Workload” explains how session and parsing workload assignment takes place prior to this simplified approach.  If you are on an earlier release, please see that posting from December 2014.

Background:  Session Classification

“Session classification” happens at session logon time.  Session classification determines which TASM or TIWM workload will be used by tasks on the PE that get the session established before any queries are issued.  These parsing engine tasks do things such as validate the user who is logging in, and they execute under the control of a workload, just like AMP work does.  Once a session begins to submit queries, all parsing and optimizing work on behalf of the queries will run at the session priority that has already been established for this session at logon time.

The workload that a session classifies to can be identified in DBQLogTbl in field SessionWDID.  The workload used for session handling is the same workload that will support all parsing and optimizing activity for the queries that are executed within that session.

How a Session Classifies to a Workload

A session classifies to a single workload, based strictly on the “Who” classification criteria (referred to as Request Source criteria in Viewpoint) of the workload.  Information about a session (and subsequently the queries within that session) is available at session logon time and includes Who classification criteria such as Account string, Application, Username and Profile.

 A workload often has other classification criteria in addition to Who criteria.  All other non-Who classification criteria that the workload might have (such as estimated processing time) are ignored at session classification time. That non-Who criteria will be used only to determine which workload is chosen when a request executes on the AMPs.

TASM and TIWM determine the workload that will support session management (and secondarily, request parsing for the session’s requests) by taking the following steps:

  1. Sort all workloads by priority (highest priority workloads first).   In SLES 10 priority is expressed by relative weight.  In SLES 11 priority is based on tier first, and within each tier the highest global weight.  If there are multiple virtual partitions, all Tier 1 workloads will sort above all Tier 2 workloads, etc.  In Timeshare, all Timeshare Top workloads will sort above all Timeshare High workloads, etc.
  2. Select the first workload encountered from within the sorted list where there is a match between the session logon information and a Who criteria of the workload.   This selection process does not take into consideration the number of matches between a session and a workload’s Who criteria. The first workload that has any of its Who criteria matched by the session information will be selected.

Consider these five workloads in SLES 11.  For simplicity’s sake, assume there is a single virtual partition defined:

Example:

WD-Critical:   Account = ‘$M1$MSI&I’, Est Time <= 20 Sec.                  ==>  SLG Tier 1, Alloc = 25%
WD-Short:      Account = ‘$M1$MSI&I’, Est Time <= 500 Sec.                 ==>  SLG Tier 1, Alloc = 15%
WD-Medium:     Account = ‘$M1$MSI&I’, User = ‘RVP’, Est Time > 500 Sec.    ==>  SLG Tier 2, Alloc = 30%
WD-Long:       Account = ‘$M1$MSI&I’, User = ‘DXA’, Est Time > 2000 Sec.   ==>  Timeshare High
WD-VeryLong:   Account = ‘$M1$BOJ&I’, User = ‘DXA’, Est Time > 2000 Sec.   ==>  Timeshare Low

 

These five workloads are displayed in the order that they will be sorted in for session workload selection, highest priority first. 

All requests whose Account = $M1$MSI&I will find a match on Who criteria to the first four workloads.  Because WD-Critical is first in the sorted order, that workload will become the session workload and will be used for parsing for all requests whose Account = $M1$MSI&I.

When User DXA logs in with Account = $M1$MSI&I, his session and parsing priority will take place in workload WD-Critical.  But if he switches accounts and logs in with Account = $M1$BOJ&I, his session management and parsing workload will be WD-Long, based on the highest-priority match on User = DXA.

Setting up Special Parsing-Only Workloads

With this background, you can set up one, several or even many workloads specifically for parsing if you wish.  Or you can just use the existing workloads and observe the information in the SessionWDID field in DBQLogTbl output using this new understanding to make better sense of the priority at which parsing is taking place.
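For example, the following sketch summarizes which workloads are handling session management and parsing versus AMP work, assuming default DBQL logging into DBC.DBQLogTbl (adjust the date predicate and columns to your needs):

SELECT L.UserName
      ,L.AcctString
      ,L.SessionWDID                /* workload used for session handling and parsing */
      ,L.WDID                       /* workload used for the AMP work of the request  */
      ,COUNT(*) AS RequestCount
FROM DBC.DBQLogTbl L
WHERE CAST(L.CollectTimeStamp AS DATE) >= CURRENT_DATE - 7
GROUP BY 1, 2, 3, 4
ORDER BY RequestCount DESC;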

If you decide to set up a single higher-priority workload for all parsing activities for all requests, then you will want to create a new workload and give it a very high priority.

To make sure this workload is chosen for the SessionWDID of all requests, consider including a Who criteria (such as Username, or Account) using the broadest possible matching scope (Username = * or Account = *).  Only a single wild card Who classification is required.    

It is recommended that Profile classification criterion not be used in defining a parsing-only workload, unless all users accessing the platform have an actual Profile assigned.   When there is wild card classification on Profile (Profile = *), any session that logs on without providing a Profile will be disqualified from the workload.

A Second Important Step

Because you only want parsing work to run in the high-priority parsing-only workload, and you are using wild card Who criteria, an important second step needs to be taken so you can prevent AMP work from running there (it is a parsing-only workload after all).  You must put in dummy secondary classification for that parsing-only workload so that no query will ever successfully map to the parsing-only workload for its AMP work.  

For example, you could create a special dummy table just for this purpose that you are confident no one actually accesses, and then specify that table in the parsing workload’s secondary classification criteria.  Or you could set up multiple criteria that are contradictory and therefore highly unlikely to ever be represented together in a query: for example, query characteristic criteria that mandate that only single- or few-AMP queries will run in the workload, combined with very large plan estimates.

Establishing a parsing-only workload is not going to reduce the resources required to do parsing.  If the plan is complex or many decisions have to be made by the optimizer in producing the plan, the same number of CPU seconds will be required to accomplish parsing. However, a high priority parsing-only workload may speed up the time to do that parsing when the system is busy, because parsing may now be running at a higher priority.

You could consider placing this parsing workload on the SLES 11 Tactical Tier, but be very careful in doing that.   If some of the parsing activity consumes more than 2 CPU seconds per node, the parsing task risks being demoted to a lower priority workload due to the automatic tactical exception that all tactical workloads come with.  If you wish parsing to take place at a tactical priority, consider increasing the CPU threshold for demotion in the parsing-only workload’s tactical exception definition, so that it is greater than any expected parser CPU times.

Of course, as is true with all workload management decisions, you need to examine the tradeoffs involved.   When you increase the priority of one type of work, by definition you reduce the priority of some other work.  So make sure you are keeping an eye on the overall balance of priorities on your platform.  As a result, you may want to only apply this parsing-only workload technique to just a critical set of queries, rather than all active queries.

 

 


Big Blocks: Usability Tips & Tradeoffs

Course Number: 
52691
Training Format: 
Recorded webcast

What are big blocks? When does it make sense to use them? How do I get started?

This session will describe the 1 MB Data Block feature available in Teradata Database 14.10, and offer tips and techniques for using larger block sizes successfully. We’ll discuss the different recommendations for the Active EDW platform compared to the Appliance, and look at some test results that highlight where larger block sizes can make the most difference. In addition, we’ll point out tradeoffs and special cases to be aware of. And finally, you’ll hear specific suggestions for how to get started, alternative ways of expressing larger block sizes, and the key parameters supporting larger spool and base table block sizes. Reduce I/O and improve platform throughput with informed use of big blocks.

Presenter: Carrie Ballinger

Price: 
$195
Credit Hours: 
2

Now You See It, Now You Can't - How to Use Encryption in Teradata Systems

Course Number: 
49890
Training Format: 
Recorded webcast

Many industry regulations, standards, and policies mandate the use of strong encryption to meet various security requirements.

Fortunately, encryption is widely supported within Teradata systems to secure access to systems and to protect sensitive information during transmission when stored in the database and when archived to tape. This presentation identifies key issues and considerations for the use of strong encryption technology, describes how to enable and use the various encryption features provided with the Teradata Database and platforms, and offers some important best practices learned from customer experiences to ensure the effective operation of systems where encryption is deployed.

Teradata Release information: Through Teradata 13.10

Note: This was a 2011 Teradata Partners Conference session, originally recorded in 2012 and updated in 2015.

Presenter: Jim Browning, Architect – Teradata

Audience: 
Data Warehouse Administrator, Data Warehouse Architect/Designer, Data Warehouse Technical Specialist
Price: 
$195
Credit Hours: 
1

Teradata QueryGrid and Adaptive Optimization


Teradata QueryGrid and Adaptive Optimization:

Not Your Typical DBMS-Hadoop Connector

Dr. Daniel Abadi  October 17, 2014

This is a guest post by Dr. Abadi, also posted at http://blogs.teradata.com/data-points/teradata-querygrid-and-adaptive-optimization/

 

For those readers who followed my writings for the Hadapt blog before it was acquired by Teradata http://hadapt.com/blog/, one of my common refrains concerned the architectural flaws inherent in database connectors to Hadoop. My problems with connectors centered on the following issues:

(1)   Big data may mean different things to different people, but just about everybody agrees that it includes the property of “bigness” --- i.e. large scales of data. As Hadoop emerges as one of the major players in the Big Data space, enterprises are becoming increasingly comfortable storing very large data sets in Hadoop.  Shipping a petabyte of data over a network to other systems for processing is challenging and expensive to do at high performance, and violates the general “Big Data” architectural principle of pushing the processing to the data.  

(2)   Queries that involved joining data located inside Hadoop with data inside the database system generally required loading the data from one system to the other, before running the query in a single system (either all in Hadoop, or all in the database system).

(3)   For connectors that enabled queries to access data stored in a remote system (i.e. access DBMS data from Hadoop, or access Hadoop data from the DBMS), the lack of shared metadata makes optimizing these queries complex, and scan-based plans (over petabytes of data) are generally the only option.

 

In this post, I will discuss Teradata’s QueryGrid technology, and how, when used to “connect” Teradata and Hadoop, it functions quite differently from the traditional database-Hadoop connectors discussed above.

First, a little bit of history. When DBMS-Hadoop connectors first started being developed, Hadoop’s native query processing abilities were extremely poor. Hive still required every query to use the MapReduce data processing engine, Impala didn’t exist yet, Hadapt was still in its nascent stages, and Presto, Shark, and Parquet were not even conceptualized yet. Connectors to database systems were therefore created to enable Hadoop data to be processed by far more advanced query processing engines.

The world is quite different today, and Hadoop’s native query processing engines have grown by leaps and bounds, performing many more query processing operations at much improved performance. While more work still needs to be done for Hadoop’s engines to match the decades of effort of query optimizer refinement and SQL support that traditional database systems have been able to achieve, Hadoop’s native query processing support is finally rounding into shape.

 

Teradata QueryGrid over Infiniband

 

Therefore, Teradata’s QueryGrid works in conjunction with the native query processing execution engines of Hadoop rather than attempting to replace them with Teradata. For example, let’s say a retail company has an error log file from their web server stored in Hadoop, and it was recently discovered that a bug in the application code sometimes prevented customers from being able to add to their shopping cart a quantity of more than one item of some particular widget. We therefore want to go through our error log and find customers that ran into this bug, and send them an email inviting them to increase their order quantity. While the error log contains information about when this bug was encountered, the email address and other contact information is stored in a Teradata database. In order to perform this query, we therefore need to join data in Hadoop with data inside Teradata.

Teradata’s QueryGrid technology allows users to express such queries over Teradata and Hadoop data in a single SQL query, and enables Hadoop (e.g. via Hive/Tez) to process the part of the query that accesses Hadoop-local data before performing the join. In our example, Hadoop’s native query processing engine performs (1) the scan of the error log for entries matching the particular bug that prevented quantities of more than one from being added to the shopping cart, and (2) the extraction of customer identifier information associated with these log entries --- either directly from the error log or via a join with another table in Hadoop that can be used to map session identifiers to customer identifiers.  This list of matching customer identifiers is thus the only data that is sent over the “grid” to Teradata, where processing of the query (extracting the email addresses of these customers) is completed. By pushing down most of the query processing to Hadoop, Teradata thus avoids the “Big Data” network bottleneck of traditional DBMS-Hadoop connectors.
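As a rough sketch of what such a query can look like in Teradata 15.0 SQL, assume the error log lives in a Hive table reachable through a QueryGrid foreign server named web_hdp and that the contact data sits in a local Teradata table; all object and column names here are illustrative, and the exact syntax depends on how the foreign server was defined:

SELECT c.email_address
FROM customer_contacts   AS c           /* Teradata-resident table                          */
    ,error_log@web_hdp   AS e           /* Hadoop-resident table accessed through QueryGrid */
WHERE e.error_code  = 'CART_QTY_BUG'    /* log entries produced by the shopping-cart bug    */
  AND e.customer_id = c.customer_id
GROUP BY c.email_address;

The predicate on error_code and the extraction of customer_id are the pieces QueryGrid can push down to Hive/Tez, so only the matching customer identifiers travel over the grid.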

Even for those queries where the processing performed by Hadoop does not reduce the input data size, QueryGrid’s parallel data transfer technology enables a petabyte of data to be transferred from Hadoop to Teradata in a few minutes.  (This assumes the transfer is done from a 1000-node Hadoop cluster to a Teradata cluster with 1000 AMPs over an Infiniband connection. Clusters that are 10 times smaller will take approximately 10 times as long). Each HDFS data node is automatically mapped to a Teradata AMP node and data is transferred directly over a socket connection. Thus, QueryGrid uses the network as efficiently as possible. QueryGrid leverages Hadoop’s processing engines for those queries where pushing down parts of query execution to Hadoop will reduce network data transfer, and for cases where this is not possible, can transfer data between Hadoop and Teradata at enormously high rates.   

However, perhaps the coolest part of QueryGrid is actually what happens after Hadoop has finished with its part of query processing and the query is completed in Teradata. In general, it is impossible to predict the statistical profile of the data being sent to Teradata after initial processing in Hadoop. For our example above, it is unknown in advance whether 5 customers or 100,000 customers encountered this web server bug, and it is unknown if the same customers experienced the same bug repeatedly, or whether the bug was encountered randomly across the customer base. This problem is known as the predicate selectivity estimation problem in the database systems research community, and while it is notoriously difficult to solve in any database system, the complexity of having data spread over multiple locations without a centralized catalog renders this problem an order of magnitude more difficult.

Teradata QueryGrid therefore actively calculates statistics on data as it is sent from Hadoop to Teradata, and incrementally plans the query once it has a clear perspective of the statistical profile of the received data. This greatly improves Teradata’s ability to optimize the query, choosing the right query processing plan for the particular data set it ends up processing. This is an extremely important feature. At the end of the day, the success of any Big Data query processing platform is extremely dependent on the query optimizer. By optimizing and dynamically reoptimizing queries, Teradata QueryGrid overcomes a major shortfall of traditional federated systems.

QueryGrid is thus not your traditional DBMS-Hadoop connector. By pushing down query processing to Hadoop’s increasingly advanced query processing engines, it overcomes the network bottleneck of the early DBMS-Hadoop connectors. By enabling SQL queries to span both DBMS-local and Hadoop-local data (all managed with Teradata’s permissions and access control framework), far more advanced query analysis becomes possible. And with a dynamic optimization framework, query performance for multi-system queries is significantly improved.

It’s important to note that the same technology used to connect Teradata and Hadoop, can be used to connect Teradata with many other systems (e.g. MongoDB, Teradata, Aster, etc.). The basic principles of query pushdown, multi-system SQL, and dynamic optimization provide an excellent foundation upon which a flexible and powerful data processing platform can be created. 

===============================================================================================================================================

Daniel Abadi is an Associate Professor at Yale University, founder of Hadapt, and a Teradata employee following the recent acquisition. He does research primarily in database system architecture and implementation. He received a Ph.D. from MIT and a M.Phil from Cambridge. He is best known for his research in column-store database systems (the C-Store project, which was commercialized by Vertica), high performance transactional systems (the H-Store project, commercialized by VoltDB), and Hadapt (acquired by Teradata). http://twitter.com/#!/daniel_abadi.


Controlling the Mix: What’s New in Teradata Active Systems Management for TD15.0

Course Number: 
52629
Training Format: 
Recorded webcast

Workload Management is a critical component of balancing application demands and ensuring consistent application performance ...

This session covers Teradata Active System Management (TASM) features introduced in Teradata 14.10 and Teradata 15.0. TASM continues to evolve toward higher levels of automation and usability, and continues to participate in the Unified Data Architecture (UDA). Come hear what new TASM features are available and how you can best leverage them to control your workload mix.

Presenters:
Doug Brown - Teradata Corporation
Brian Mitchell - Teradata Corporation

Price: 
$195
Credit Hours: 
1

DBQL - the Ultimate Query Tracker - Are You Up-to-Date?

Course Number: 
52912
Training Format: 
Recorded webcast

Have you ever wished for a magic wand that could quickly point out the missing, stale, and unused statistics for you?

Wouldn't it be nice if something could dynamically collect and drop stats? The Database Query Log (DBQL) is essential for bringing the new AutoStats feature to life in Teradata 14.10, and it also enables a new capability to log utility job statistics in Teradata 15.0. DBQL is the key tool to drive query performance analysis and workload management setup and adjustments. We will discuss the basics of DBQL, review DBQL enhancements, discuss query performance metrics, and review how to trend performance data. We’ll help you understand how to prioritize optimization efforts and ensure resources are assigned to the proper workload definitions. To get the most out of your DBQL data, make sure you are completely informed and up-to-date to ensure faster problem solving and performance optimization.

Presenters:  Barbara Christjohn & Steven Tagai - Teradata Corporation

Price: 
$195
Credit Hours: 
2

Using IOTAs to Determine the I/O Utilization of Your Workloads


I/O Token Allocations (IOTAs) are used in Teradata Database 14.10 with SLES 11 systems to enforce the I/O side of Workload Management Capacity on Demand (WM COD) and other SLES 11 hard limits.

This posting describes what IOTAs are and what role they play in enforcing I/O limits with WM COD.   In addition, and maybe more importantly, we’ll be looking at how IOTA numbers being reported in ResUsage can help you assess the I/O utilization of your workloads, and the I/O utilization of your entire platform, whether or not you have WM COD enabled.

Background on IOTAs and WM COD

The I/O side of WM COD is enforced at the individual disk drive level.  Incoming I/O requests are prioritized and their cost is calculated before they are issued.  Costing takes place in the TDMeter module, which is itself part of a larger piece of code, called TDSched, that resides on each disk drive.

I/O costing is based on the physical bandwidth that is expected to be transferred when the I/O is executed.  Different types of I/O (read vs. write) and differently-sized data blocks have different costs assigned to them.

In order to enforce COD at the prescribed level, the TDMeter module accumulates and disperses tokens.  Each token represents a unit of I/O bandwidth, and a set number of these tokens are made available to the disk drive at intervals of a few milliseconds.  Only when enough tokens have been accumulated to match the cost of a particular I/O request will that request be executed.

When WM COD is enabled, tokens (or groups of tokens referred to as IOTAs) are accumulated on a periodic time basis into a WM COD bucket that exists for each drive.  Once there are enough tokens in the bucket for an I/O, the required tokens are removed from the bucket as payment for the I/O, and the I/O is performed.

The number of tokens issued per interval is determined based on the WM COD setting.  Tokens are only issued to a disk drive if there is actually an I/O that is ready to run.  Consequently, the number of tokens does not build up over time in the absence of I/O demand.  You use them at the time they are issued or you lose them.  

When there is an I/O waiting to run it will wait until enough tokens have arrived in the bucket to allow that I/O to be released. When that happens, the number of tokens that reflect the expected bandwidth of the I/O are taken out of the bucket, and any remaining tokens are discarded, unless there are additional I/Os waiting to run on that device.

If the number of tokens distributed to the bucket on a given device has been exhausted, and there are still I/Os waiting to run, those I/O requests will have to wait until the next interval for more tokens to be provided. Tokens are issued in such a way that total I/O bandwidth from that disk drive will conform to the COD settings that have been established.

IOTAs and  ResUsageSPS

Whether or not you have WM COD enabled, tokens are available for viewing in some of the ResUsage tables.   And they can provide a very helpful function when it comes to assessing I/O usage.

The two most useful fields related to tokens that appear in ResUsageSPS are FullPotentialIota and UsedIota.  When you manipulate them appropriately you can derive the percent of the total system I/O bandwidth each workload on a node consumed during that logging interval.   

The FullPotentialIota field reports the maximum IOTAs available across all disks on the node and represents the total I/O bandwidth across all drives on that node.  Since it is a node-level metric, be mindful when you look at ResUsageSPS output that the FullPotentialIota column will be reported as the same number for each workload on that node, similar to NodeID.    It is actually a single value, but is repeated in each SPS table row to make it easy to perform calculations.

This single-node excerpt of output from ResUsageSPS was captured during a test where four workloads were active, one in each of the four Timeshare access levels. Notice that FullPotentialIota is the same value for all the workloads on the node.

UsedIota, the other field mentioned above, reports the number of IOTAs consumed by the four different workloads in a single logging interval.  To translate these two IOTA fields into a percent of I/O utilization for each workload, divide Used by Potential:  UsedIota / FullPotentialIota.
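A minimal sketch of that calculation directly against the SPS data is shown below (the workload identifier column is assumed to be WDID, and the date/time columns follow the standard ResUsage layout):

SELECT TheDate
      ,TheTime
      ,NodeID
      ,WDID                                              /* workload definition id (column name assumed) */
      ,100 * UsedIota / NULLIFZERO(FullPotentialIota) AS PctNodeIOBandwidthByWD
FROM DBC.ResUsageSPS
WHERE TheDate = CURRENT_DATE
ORDER BY 1, 2, 3, 4;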

The next graphic illustrates the result of that calculation.

In this example, four workloads were actively performing very I/O-intensive work.  Each of the four  SLES 11 Timeshare access levels had a single workload active.  The four access levels have a priority difference of 1:2:4:8, which correlates to the differences in the I/O utilization among the four workloads reported in the output above.

The ResSpsView does this calculation for you, in a field named Used_FullPotential_ByWd.

IOTAs and  ResUsageSPMA

ResUsageSPMA is just as useful as ResUsageSPS is when it comes to IOTA data, with or without WM COD.

You can derive system I/O utilization numbers in a similar way using the ResUsageSPMA table or its views.   In 14.10 the IOTA fields are held in Spare fields within the SPMA table.  In the ResSpmaView view, the spare fields have been renamed in this way:

Spare07 AS FullPotentialIota,
Spare08 AS CodPotentialIota,
Spare09 AS UsedIota

To assess system I/O utilization, you can use this calculation:   UsedIota / FullPotentialIota.  Below is some output from ResSpmaView that shows this calculation.
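For example, a sketch of the node-level calculation using the renamed fields in the view (date/time column names follow the standard ResUsage layout):

SELECT TheDate
      ,TheTime
      ,NodeID
      ,100 * UsedIota / NULLIFZERO(FullPotentialIota) AS PctNodeIOBandwidth
FROM DBC.ResSpmaView
WHERE TheDate = CURRENT_DATE
ORDER BY 1, 2, 3;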

IOTAs in ResUsage With WM COD Enabled

There is a third IOTA-related metric that appears in both ResUsage tables, CodPotentialIota. This field represents the tokens available for use when COD is enabled.  The tokens that get reported in this column are used for internal purposes in order to control the I/O requests in a way that honors the WM COD setting.  This metric does not lend itself to general purpose reporting, nor is it easy for someone without special internal knowledge to gain insights from it.  For general purpose reporting, it is recommended that the focus be on the Full and Used IOTA fields.

CodPotentialIota values may appear somewhat confusing to the casual observer.  With WM COD enabled, token counts are accumulated only when an I/O is pending, so they are demand-driven.  In addition, any unused IOTAs are discarded immediately if no further I/O’s are waiting to execute on the drive, so not all tokens that were initially made available to a drive are reported.  

CodPotentialIota only plays a role when COD is enabled.  As a result, CodPotentialIota will show up in ResUsage output as being equal to UsedIota if there is no COD on the system.  This can be seen in the ResUsageSPS output below, collected when there were four active Timeshare workloads on a system with no COD.

100% WM COD

Because the ResSpsView view was used to produce this output, the calculated Used_FullPotential_ByWd field is provided.  That tells you the I/O utilization for each of the four workloads.  Notice that on each node the individual workload I/O utilization numbers add up to almost 100%.  This may not always be the case, but here it is a consequence of the very I/O-intensive work that was being executed in this test.

Contrast that usage example to one with 50% WM COD defined on the platform.  The same workloads are active and the same work is being executed. 

50% WM COD

In this case the I/O utilization within each workload is approximately cut in half, and the sum of the I/O utilization values is almost 50%, reflecting the WM COD setting.  This validates that the disk sub-system is appropriately limiting I/O requests so that only 50% of the platform’s disk bandwidth will be used when 50% WM COD has been enabled.

Conclusion

IOTAs were designed as an internal mechanism to enforce the I/O side of capacity on demand and other SLES 11 hard limits within the disk subsystem.  However, some of the IOTA fields that appear in ResUsageSPS and ResUsageSPMA provide an easy and accurate method of determining workload I/O utilization, or system-level I/O usage at the node level.

Using the simple formula of dividing used IOTAs by potential IOTAs, you now have a new tool in understanding your I/O usage, no matter how complex or how simple your workload.

 


Implementing Non-Tactical Queries in a Mixed Workload Environment

Course Number: 
53293
Training Format: 
Recorded webcast

This presentation will focus on techniques that have proven successful when implementing non-tactical query applications.

Topics such as optimizing few-AMP and all-AMP accesses utilizing various index choices will be addressed. We’ll also discuss other performance techniques.

Key points:

  • Understand the characteristics of non-tactical queries
  • Understand how to apply techniques for efficient all-AMP and few-AMP access
  • Learn key implementation techniques

Audience targeted: Experienced Teradata User, Business or IT Professional or DBA, Application Designer

Presenter: Ray Newell, Learning Consultant IV - Teradata Corporation

Audience: 
Experienced Teradata User, Business or IT Professional or DBA, Application Designer
Price: 
$195
Credit Hours: 
1

Dealing with Natural Data Skew

Course Number: 
48984
Training Format: 
Recorded webcast

Everyone is taught how to collect statistics in order to make Teradata queries run well but what if you've collected all the statistics that you can and you STILL have bad performance?

In this presentation, we explain a number of SQL and design techniques, proven to work in some of the largest production environments in the world, that make severely skewed queries run dramatically better.  This is a fast-paced session geared to experienced practitioners.

Note: Updated and re-recorded in February 2015

Presenter:
Steve Molini, Principal Consultant - Teradata Corporation

Price: 
$195
Credit Hours: 
1

Explaining the EXPLAIN for Business Users

Course Number: 
38085
Training Format: 
Recorded webcast

 

Whether you write your own code or it is generated for you, this is a topic that is essential for business users who run SQL queries.

This session examines the English text that appears when an EXPLAIN is requested and why it is useful in your environment. You will learn what the Optimizer EXPLAIN Facility is, where the text is generated, how to read the EXPLAIN output, and what spool sizes and estimated times really represent. We will explore topics such as confidence levels, table scans, join preparation and processing including Merge, Exclusion and Product joins and multi-table joins, and the newest optimization features. At the end of this session you will be able to understand essential elements of the EXPLAIN and be able to use it to gain more value from your Teradata system and your queries.

Don’t miss this opportunity to learn about the value and usefulness of EXPLAIN in any Teradata environment.  

Through this course, you will learn about the EXPLAIN facility, where it is generated, key phrases, and how to read the output.

Key Points:

  • Inside a Teradata Node and what the VProcs do for your query
  • The EXPLAIN Facility syntax and what it means
  • EXPLAIN Examples
  • Teradata Visual EXPLAIN

Updated and re-recorded in 2015.

Presenter: Alison Torres, Director, Teradata Labs Customer Briefing Team – Teradata Corporation

Audience: 
Business users, especially those who write their own SQL
Price: 
$195
Credit Hours: 
2

Workload Management Capacity on Demand and other SLES11 Hard Limits

Course Number: 
52773
Training Format: 
Recorded webcast

This session will explore the new options available for setting hard limits at different levels in the SLES11 priority hierarchy.

The key focus of this technical implementation session will be on Workload Management Capacity on Demand (WM COD), which limits resource consumption of CPU and I/O at the database level.  In addition, virtual partition hard limits and workload hard limits will be covered.  Learn how all three levels of hard limits in SLES11 interact when applied in combination, and how to set them within Viewpoint Workload Designer.  The session will also cover tradeoffs and considerations in using SLES11 hard limits.

WM COD is available for SLES11 Teradata sites as a means of reducing the resource capacity for an entire database, usually after a hardware expansion or following a new installation.  Please note that configuration guidelines, contract-based procedures or sales agreements related to WM COD will not be included in this session.

Presenters:
Carrie Ballinger - Teradata Corporation
Brian Mitchell - Teradata Corporation

Audience: 
Database Administrator, Designer/Architect, Application Developer
Price: 
$195
Credit Hours: 
1

Optimizing Disk IO and Memory for Big Data Vector Analysis

Short teaser: 
Teradata 15.10 exploits Intel CPU SIMD instructions

Optimizing Disk IO and Memory for Big Data Vector Analysis

Daniel Abadi, Nov 11, 2014.  This is a guest post by Dr. Abadi, also posted at http://blogs.teradata.com/data-points/optimizing-disk-io-and-memory-for-big-data-vector-analysis/

At Yale, every spring I usually teach an advanced database systems implementation class that covers both traditional and more modern database system architectures. I often like to test my students with questions like the following:

 

Let’s say the following SQL query is issued to a data warehouse for a retail store (the query requests the total revenue generated by a particular store on October 20th, 2014):

SELECT SUM(transaction_amount)
FROM transactions
WHERE store_id = 37
  AND date = '2014-10-20';

Assume that we either do not have indexes on store_id and date, or the query optimizer chooses not to use them to process this query (e.g. because the predicates are not restrictive enough). The query processor thus executes this query by scanning every record in the transactions table, performing two predicate checks (one to see if store_id is 37, and one to see if date is 2014-10-20), and, if both checks pass, extracting the transaction amount and adding it to a cumulative sum.

What is the performance bottleneck of this query?

I obviously have not given the students enough information about the hardware configuration of the machine (or cluster of machines) on which this query is executed, nor about the software implementation of the database system, for students to be able to answer the above question definitively. But you can tell a lot about how much students understand the principles of database system architecture by how they go about describing what the answer depends on.

Before describing some potential bottlenecks, let’s eliminate one possibility: network bandwidth between query processing servers. It doesn’t matter whether the transactions table is small and fits easily on a single machine or is very large and is partitioned across thousands of machines --- this query is almost perfectly partitionable --- each machine can calculate its own cumulative sum for the partition of transaction records located locally, and the only communication across query processing servers that needs to happen is at the very end of the query, when each of the subtotals for each partition are added together.

However, several other bottlenecks may exist. To understand where they may come from, let’s examine the path of a single record from the transactions table through the database query processing engine. Let’s assume for now that the transactions table is larger than the size of memory, and the record we will track starts off on stable storage (e.g. on magnetic disk or SSD).

First, this record needs to be read from stable storage into the memory of the server that will process this record. Second, this record needs to be transferred from memory to the local cache/registers of the processing core that will end up processing this record. Third (finally), the processing core needs to perform two predicate evaluations on the record (on store_id and date), extract (if necessary) the transaction amount, and add it to the cumulative sum.

Each of these three steps will have to be performed on each record of the transactions table, and each may become a bottleneck for query processing. The first step is a potential “disk bandwidth bottleneck” --- the rate at which data can be transferred from stable storage to memory is simply too slow to keep up with subsequent parts of query processing. The second step is a potential “memory bandwidth bottleneck”, and the third step is a potential “CPU processing bottleneck”.

For “big data” datasets where the size of data is significantly larger than the size of memory, the most common bottleneck is disk bandwidth. Disk bandwidth of the highest-end disks remains on the order of hundreds of megabytes per second, while memory bandwidth is usually at least an order of magnitude faster. Furthermore, very little work is required of the CPU per record (just two predicate evaluations and a sum) --- database queries tend to be far less CPU-intensive than other domains (such as graphics rendering or scientific simulations). Hence, step 1 is often a bottleneck.

There are three techniques commonly used to eliminate the disk-bandwidth bottleneck in database systems. First, you can increase the memory available to store data on each machine, and thus decrease the amount of data that must be read from disk at query time. Second, you can use storage arrays of large numbers of disks and a fat pipe to transfer data from this array of disks to the processing server. In other words, even if each disk can only transfer data at 100MB/sec, 20 disks combined can transfer data at an aggregate rate of 2GB/sec.  Third, you can leverage software to increase the efficiency of data transfer. For example, data compression allows for an effectively larger number of records to be packed per bit transferred. Column-store database systems allow only the columns accessed by the query (in our example --- store_id, date, and transaction_amount) to be read off of disk (so disk bandwidth need not be wasted reading in irrelevant data for a query).   With some amount of effort, it is usually possible to eliminate the disk-bandwidth bottleneck through some combination of the three techniques described above.

 

Figure 1. Optimizing Disk IO, Memory, and CPU Usage

The next bottleneck to present itself is usually memory bandwidth.  Unlike the disk bandwidth bottleneck, the memory bandwidth bottleneck cannot typically be solved via modifying the hardware configuration. Instead, the database system software needs to be intelligent about how to efficiently use memory bandwidth, so that every drop of memory bandwidth is utilized to the maximum possible extent. The two main techniques used for this are the same two software techniques mentioned above: compression and column-orientation.

 

In order for compression to alleviate the memory bandwidth bottleneck, data must remain compressed in memory. Many database systems decompress data in the buffer pool after it is brought in from disk --- this simplifies the system code, and makes data modifications easier to handle. Unfortunately this exacerbates the memory bandwidth bottleneck, as the full, uncompressed data is sent to the CPU whenever it needs to be processed. In contrast, well optimized systems (especially systems optimized for high performance data analysis) will keep data compressed in memory, and either operate directly on the compressed data, or only decompress in the CPU immediately prior to processing. 

Column-oriented page layout is a particularly important technique to eliminate the memory bandwidth bottleneck. Assume that store_id, date, and transaction_amount each take up 4 bytes per record (12 bytes), and that each record in the transactions table is 300 bytes. Furthermore, assume a CPU core cache line is 128 bytes. In the best case scenario for a traditional row-oriented data layout, store_id, date, and transaction_amount are all in the same 128-byte subset of the 300-byte record, and the 128-byte cache line containing these three attributes is sent from memory to the cache of the processing core that will process this record. In such a scenario 12 / 128 = 9.4% of the data transferred from memory to cache will be accessed by the CPU. In the worst case scenario, these three attributes are located within different cache lines of the 300-byte record, and as a result, only 12/300 = 4% of the data transferred from memory to cache will be accessed by the CPU.

In contrast, if data is laid out on a page column-by-column rather than row-by-row, each cache line consists entirely of data from a single attribute of the table. Only cache lines corresponding to the three relevant attributes are sent from memory to the processing core cache, and hence nearly 100% of the data transferred from memory to cache will be accessed by the CPU. Hence, column-orientation usually improves the memory bandwidth efficiency by an order of magnitude, significantly alleviating the memory-bandwidth bottleneck.  
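In Teradata terms, this column-by-column page layout corresponds to a column-partitioned (Teradata Columnar) table. A minimal sketch, with illustrative table and column names, looks like this:

CREATE TABLE transactions_cp
( store_id           INTEGER
, sales_date         DATE
, transaction_amount DECIMAL(18,2)
)
NO PRIMARY INDEX
PARTITION BY COLUMN;   /* each column (or column group) is stored in its own containers */

A query that touches only store_id, sales_date, and transaction_amount then reads only those column containers from disk into memory and cache, which is exactly the bandwidth saving described above.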

In summary, only with effort and intelligence (especially around compression and column-orientation) can the first two bottlenecks --- disk-bandwidth and memory-bandwidth --- be eliminated, and the bottleneck shifted to the CPU.

To alleviate the CPU bottleneck, techniques must be used to maximize the efficiency of the processor. One clever way to accomplish this is to leverage the SIMD instruction set on modern CPUs, such as Intel's Haswell with AVX2 vector processing. To understand how SIMD works, let’s first examine how the predicate of our example query (store_id = 20) would be evaluated normally, without leveraging SIMD instructions. For each record, the store_id would be extracted from the 128-byte cache line and sent to a 128-bit CPU register to do the comparison with the value “20”. This process happens sequentially --- each store_id is extracted and sent to the CPU register for comparison to “20” one after the other.

 

Figure 2. Single CPU addition versus four with 128-bit vector registers

 

Note that a 128-bit register can actually hold 128/8 = 16 bytes of data. Since each store_id takes up 4 bytes, you can actually store 4 store_ids in a single register. In such a situation where you can fit a “vector” of values within a single CPU register, the SIMD instruction set allows a programmer to perform the same exact operation in parallel to each element of the vector. In other words, it is possible to compare 4 different store_ids to 20 in parallel in a single CPU step. This effectively quadruples the CPU efficiency.
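
As a concrete illustration, here is a small C sketch using SSE2 intrinsics (128-bit registers, matching the numbers above); AVX2 simply widens the same pattern to 256-bit registers, or 8 store_ids per instruction. This is an illustrative example, not Teradata's implementation, and it assumes a GCC/Clang-style compiler for the popcount builtin.

#include <emmintrin.h>   /* SSE2: 128-bit integer SIMD */
#include <stdint.h>
#include <stdio.h>

/* Count how many store_ids equal 20, four values per compare instruction. */
int count_matches(const int32_t *store_id, int n) {
    const __m128i twenty = _mm_set1_epi32(20);      /* [20, 20, 20, 20] */
    int matches = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i v  = _mm_loadu_si128((const __m128i *)(store_id + i));
        __m128i eq = _mm_cmpeq_epi32(v, twenty);            /* lanes: all-1s where equal */
        int mask   = _mm_movemask_ps(_mm_castsi128_ps(eq)); /* one bit per 32-bit lane */
        matches   += __builtin_popcount(mask);              /* GCC/Clang builtin */
    }
    for (; i < n; i++)                 /* scalar tail for the last few values */
        matches += (store_id[i] == 20);
    return matches;
}

int main(void) {
    int32_t ids[] = {20, 7, 20, 13, 20, 20, 5, 9, 20};
    printf("%d\n", count_matches(ids, 9));   /* prints 5 */
    return 0;
}

Each iteration of the SIMD loop evaluates the store_id = 20 predicate for four records with a single compare instruction --- exactly the four-way parallelism described above.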

Perhaps we will go into more detail on how to leverage SIMD features for database operations in future blog posts. The main takeaway I want readers to get from this post is that leveraging SIMD instructions on modern CPUs (also known as “vectorized processing”) is entirely a CPU optimization. For such an optimization to make a difference in actual query performance, the disk-bandwidth and memory-bandwidth bottlenecks have to be removed first. Teradata Database builds cache-line-friendly columnar table structures in memory to exploit Intel AVX2 vector processing.


Figure 3. The Pipeline to L1 Cache and CPU Cores

Teradata recently announced support for vectorized processing. While this is great news by itself, what is more interesting to me is the announcement that vectorized processing improves performance on real-life Teradata queries by 10-20%, and that, using columnar arrays in memory, individual steps within a query run up to 50% faster. A straightforward reading of these performance claims would suggest that the improvement is simply a testament to the quality of this particular new feature, but I hope that anyone who has read this post carefully will understand that it means much more than that: it is a testament to the quality of the rest of the Teradata product in removing the disk-bandwidth and memory-bandwidth bottlenecks so that Intel CPU optimizations can actually make a bottom-line difference. It is an indication that Teradata has a legitimate column-store --- a column-store that improves the efficiency of disk transfer by only transferring the relevant columns, improves the efficiency of memory transfer by avoiding cache pollution from irrelevant attributes, and keeps data in columns inside the CPU registers for parallel processing at the hardware level.

 

_________________________________________________________________________

Daniel Abadi is an Associate Professor at Yale University, founder of Hadapt, and a Teradata employee following the recent acquisition. He does research primarily in database system architecture and implementation. He received a Ph.D. from MIT and a M.Phil from Cambridge. He is best known for his research in column-store database systems (the C-Store project, which was commercialized by Vertica), high performance transactional systems (the H-Store project, commercialized by VoltDB), and Hadapt (acquired by Teradata). http://twitter.com/#!/daniel_abadi.


Hybrid Row/Column-stores: A General and Flexible Approach

Short teaser: 
Why both row and columnar storage together?
Cover Image: 

Guest post by Dr. Daniel Abadi, March 6, 2015

During a recent meeting with a post-doc in my lab at Yale, he reminded me that this summer will mark the 10-year anniversary of the publication of C-Store in VLDB 2005. C-Store was by no means the first ever column-store database system (the column-store idea has been around since the 70s --- nearly as long as relational database systems), but it was quite possibly the first proposed architecture of a column-store designed for petabyte-scale data analysis. The C-Store paper has been extremely influential: nearly every major database vendor has developed column-oriented extensions to its core database product in the past 10 years, and most of them cite C-Store (along with other influential systems) in their research white papers about those column-oriented features.

Given my history with the C-Store project, I surprised a lot of people when some of my subsequent projects such as HadoopDB/Hadapt did not start with a column-oriented storage system from the beginning. For example, industry analyst Curt Monash repeatedly made fun of me on this topic (see, e.g. http://www.dbms2.com/2012/10/16/hadapt-version-2/).

In truth, my love and passion for column-stores has not diminished since 2005. I still believe that every analytical database system should have a column-oriented storage option. However, it is naïve to think that column-oriented storage is always the right solution. For some workloads --- especially those that scan most rows of a table but only a small subset of the columns --- column-stores are clearly preferable. On the other hand, there are many workloads that contain very selective predicates and require access to the entire tuple for those rows which pass the predicate. For such workloads, row-stores are clearly preferable.

 

 

There is thus general consensus in the database industry that a hybrid approach is needed. A database system should have both column-oriented and row-oriented storage options, and the optimal storage can be utilized depending on the expected workload.

Despite this consensus around the general idea of the need for a hybrid approach, there is a glaring lack of consensus about how to implement it. There have been many different proposals for how to build hybrid row/column-oriented database systems in the research and commercial literature. A sample of such proposals includes:

(1)    A fractured mirrors approach where the same data is replicated twice --- once in a column-oriented storage layer and once in a row-oriented storage layer. For any particular query, data is extracted from the optimal storage layer for that query, and processed by the execution engine.

(2)    A column-oriented simulation within a row-store. Let’s say table X contains n columns. X gets replaced by n new tables, where each new table contains two columns --- (1) a row-identifier column and (2) the column values for one of the n columns in the original table.  Queries are processed by joining together on the fly the particular set of these two-column tables that correspond to the columns accessed by that query.

(3)    A “PAX” approach where each page/block of data contains data for all columns of a table, but data is stored column-by-column within the page/block.

(4)    A column-oriented index approach where the base data is stored in a row-store, but column-oriented storage and execution can be achieved through the use of indexes.

(5)    A table-oriented hybrid approach where a database administrator (DBA) is given a choice to store each table row-by-row or column-by-column, and the DBA makes a decision based on how they expect the tables to be used.

In the rest of this post, I will give an overview of Teradata’s elegant hybrid row/column-store design and attempt to explain why I believe it is more flexible than the above-mentioned approaches.

The flexibility of Teradata’s approach is characterized by three main contributions.

 

1: Teradata views the row-store vs. column-store debate as two extremes in a more general storage option space.

The row-store extreme stores each row contiguously on storage, and the column-store extreme stores each column contiguously on storage. In other words, row-stores maintain locality of horizontal access to a table, and column-stores maintain locality of vertical access to a table. In general, however, the optimal access locality may be a rectangular region of a table.

 

 

Figure 1:  Row and Column Stores (uncompressed)

 

To understand this idea, take the following example. Many workloads have frequent predicates on date attributes. By partitioning the rows of a table according to date (e.g. one partition per day, week, month, quarter, or year), queries that contain predicates on date can be accelerated by eliminating all partitions corresponding to dates outside the range of the query, so that I/O is spent reading only those partitions that can contain data in the requested date range. However, different queries may analyze different table attributes for a given date range. For example, one query may examine the total revenue brought in per store in the last quarter, while another query may examine the most popular pairs of widgets bought together in each product category in the last quarter. The optimal storage layout for such queries would be to have the store and revenue columns stored together in one partition, and the product and product category columns stored together in another partition. Therefore we want both column-partitions (store and revenue in one partition; product and product category in a different partition) and row-partitions (by date).

 

This arbitrary partitioning of a table by both rows and columns results in a set of rectangular partitions, each partition containing a subset of rows and columns from the original table.  This is far more flexible than a “pure” column-store that enforces that each column be stored in a different physical or virtual partition.

 

Note that allowing arbitrary rectangular partitioning of a table is a more general approach than a pure column-store or a pure row-store. A column-store is simply a special type of rectangular partitioning where each partition is a long, narrow rectangle around a single column of data. Row-oriented storage is likewise a special case, where each rectangle spans every column for some subset of rows. Therefore, by supporting arbitrary rectangular partitioning, Teradata is able to support “pure” column-oriented storage, “pure” row-oriented storage, and many other types of storage between these two extremes.
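
As a rough way to picture why rectangles generalize both extremes, consider the following C sketch; the structure and field names are invented for illustration and are not Teradata internals. A partition is modeled as a row range plus a column range, and a scan has to read a partition only if it overlaps both the columns the query touches and the rows that survive row-partition elimination.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Conceptual sketch only: a rectangular partition covers a contiguous
   range of rows restricted to a contiguous range of columns. */
typedef struct {
    int     first_col, last_col;    /* vertical extent (column range)  */
    int64_t first_row, last_row;    /* horizontal extent (row range)   */
} Rect;

/* A partition must be scanned only if it overlaps both the columns the
   query reads and the rows that survive row-partition elimination. */
bool must_scan(Rect p, int qcol_lo, int qcol_hi, int64_t qrow_lo, int64_t qrow_hi) {
    bool col_overlap = p.first_col <= qcol_hi && p.last_col >= qcol_lo;
    bool row_overlap = p.first_row <= qrow_hi && p.last_row >= qrow_lo;
    return col_overlap && row_overlap;
}

int main(void) {
    Rect narrow = { 2, 2, 0, 999999 };       /* one column, all rows     */
    Rect wide   = { 0, 99, 500000, 999999 }; /* all columns, newer rows  */
    /* Query reads columns 2..3 and rows 0..100: prints "1 0". */
    printf("%d %d\n", must_scan(narrow, 2, 3, 0, 100), must_scan(wide, 2, 3, 0, 100));
    return 0;
}

In this model, a “pure” column-store is the configuration in which every rectangle is one column wide, and a “pure” row-store is the configuration in which every rectangle spans all columns; the elimination test is the same either way.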

 

2: Teradata can physically store each rectangular partition in “row-format” or “column-format”.

One oft-cited advantage of column-stores is that for columns containing fixed-width values, the entire column can be represented as a single array of values. The row id for any particular element in the array can be determined directly by the index of the element within the array. Accessing a column in an array-format can lead to significant performance benefits, including reducing I/O and leveraging the SIMD instruction set on modern CPUs, since expression or predicate evaluation can occur in parallel on multiple array elements at once.

 

Another oft-cited advantage of column-stores (especially within my own research --- see e.g. http://db.csail.mit.edu/projects/cstore/abadisigmod06.pdf ) is that column-stores compress data much better than row-stores, because there is more self-similarity (lower entropy) within a column than across columns: each value within a column is drawn from the same attribute domain. Furthermore, it is not uncommon to see the same value repeat multiple times consecutively within a column, in which case the column can be compressed using run-length encoding --- a particularly useful type of compression since it yields high compression ratios and is trivial to operate on directly, without requiring decompression of the data.
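
To see why run-length encoding is so convenient to operate on directly, here is a minimal C sketch of a query operator working straight on the runs; this is a generic illustration of RLE, not Teradata’s actual storage format.

#include <stdint.h>
#include <stdio.h>

/* Run-length encoded column: each run is (value, repeat count). */
typedef struct {
    int32_t value;
    int32_t count;
} Run;

/* SUM(col) evaluated directly on the compressed runs: one multiply and
   one add per run instead of one add per row, and no decompression. */
int64_t sum_rle(const Run *runs, int n_runs) {
    int64_t total = 0;
    for (int i = 0; i < n_runs; i++)
        total += (int64_t)runs[i].value * runs[i].count;
    return total;
}

int main(void) {
    /* 5,5,5,5, 7,7, 5,5,5  -> encoded as three runs */
    Run runs[] = { {5, 4}, {7, 2}, {5, 3} };
    printf("%lld\n", (long long)sum_rle(runs, 3));   /* 4*5 + 2*7 + 3*5 = 49 */
    return 0;
}

The better the data compresses, the fewer runs there are and the less CPU work the query does --- without ever materializing the uncompressed column.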

 

Both of these advantages of column-stores are supported in Teradata when the column-format is used for storage within a partition. In particular, multiple values of a column (or a small group of columns) are stored continuously in an array within a Teradata data structure called a “container”. Each container comes with a header indicating the row identifier of the first value within the container, and the row identifiers of every other value in the container can be deduced by adding their relative position within the container to the row identifier of the first value. Each container is automatically compressed using the optimal column-oriented compression format for that data, including run-length encoding the data when possible.
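
The following simplified C model sketches the container idea described above; the field names, sizes, and layout are invented for illustration and do not reflect the actual Teradata container format. The point is that row ids are implicit: the header carries the row id of the first value, and every other row id is derived from its position.

#include <stdint.h>
#include <stdio.h>

/* Simplified container model (illustrative only): fixed-width column
   values are stored as a plain array, and row ids are implicit. */
typedef struct {
    int64_t first_rowid;     /* row id of values[0], kept in the header */
    int32_t n_values;
    int32_t values[1024];    /* the column values themselves            */
} Container;

/* Derive the row id of the value at a given position --
   no per-value row id is stored. */
int64_t rowid_at(const Container *c, int32_t pos) {
    return c->first_rowid + pos;
}

/* Conversely, locate a value by row id with simple positional offsetting. */
int32_t value_for_rowid(const Container *c, int64_t rowid) {
    return c->values[rowid - c->first_rowid];
}

int main(void) {
    Container c = { .first_rowid = 5000, .n_values = 3, .values = {42, 17, 99} };
    printf("%lld %d\n", (long long)rowid_at(&c, 2), value_for_rowid(&c, 5001));  /* 5002 17 */
    return 0;
}

With compression added, the positional offset can require decoding part of the container rather than simple indexing, which is exactly the trade-off discussed next.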

 

 

Figure 2:  Column-format storage using containers.

 

However, one disadvantage of not physically storing the row identifier next to each value is that extracting a value given a row identifier requires more work, since additional calculations must be performed to locate the correct value within the container. In some cases these calculations involve just positional offsetting; in others, the compressed bits of the container have to be scanned in order to extract the correct value. Therefore Teradata also supports traditional row-format storage within each partition, where the row identifier is explicitly stored alongside any column values associated with that row. When partitions are stored in this “row format”, Teradata’s traditional mechanisms for quickly finding a row given a row identifier can be leveraged.

 

In general, when the rectangular partitioning scheme results in wide rectangles, row format storage is recommended, since the overhead of storing the row id with each row is amortized across the breadth of the row, and the benefits of array-oriented iteration through the data are minimal. But when the partitioning scheme results in narrow rectangles, column-format storage is recommended, in order to get the most out of column-oriented array iteration and compression. Either way --- having a choice between row format and column format for each partition further improves the flexibility of Teradata’s row/columnar hybrid scheme.

 

3: Teradata enables traditional primary indexing for quick row-access even when column-oriented partitioning is used.

 

Many column-stores do not support primary indexes due to the complexity involved in moving records around as a result of new inserts into the index. In contrast, Teradata Database 15.10 supports two types of primary indexing when the rows of a table have been distributed to AMPs (logical servers) by the hash value of the primary index attribute. The first, called CPPI, maintains all row and column partitions on an AMP sorted by the hash value of the primary index attribute. These hash values are stored within the row identifier for the record, which enables each column partition to independently maintain the same sort order without explicitly communicating with the others. Since the data is sorted by the hash of the primary index attribute, finding particular records for a given value of the primary index attribute is extremely fast. The second, called CPPA, does not sort by the hash of the primary index attribute. The AMP that contains a particular record can still be quickly identified given a value of the primary index attribute, but further searching is necessary within the AMP to find the particular record; this searching is limited to the non-eliminated, nonempty column and row partitions. Finding a particular record given a row id is extremely fast for both CPPI and CPPA since, in either case, the records are kept in row id order.
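
To make the CPPI lookup path a bit more concrete, here is a very rough C sketch; the structures are invented for illustration and are not Teradata internals. Because the row identifier begins with the hash of the primary index value and partitions are kept in row-id order, an equality search on the primary index reduces to a binary search over sorted row ids within each non-eliminated partition.

#include <stdint.h>
#include <stdio.h>

/* Illustrative row id: a hash bucket derived from the primary index value,
   followed by a uniqueness value within that bucket. */
typedef struct {
    uint32_t hash;     /* hash of the primary index value */
    uint32_t uniq;     /* uniqueness value within the bucket */
} RowId;

/* Return the index of the first row whose hash is >= target, or n if none.
   Rows are assumed to be sorted by hash (the CPPI property). */
int lower_bound_hash(const RowId *rows, int n, uint32_t target) {
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (rows[mid].hash < target) lo = mid + 1;
        else                         hi = mid;
    }
    return lo;
}

int main(void) {
    RowId rows[] = { {10,1}, {10,2}, {35,1}, {90,1}, {90,2}, {90,3} };
    int first = lower_bound_hash(rows, 6, 90);
    printf("first candidate at position %d\n", first);   /* position 3 */
    return 0;
}

From that position, only the rows sharing the target hash need to be checked against the actual primary index value, rather than scanning the whole partition.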

 

Combined, these three features make Teradata’s hybrid solution to the row-store vs. column-store tradeoff extremely general and flexible. In fact, it’s possible to argue that there does not exist a more flexible hybrid solution from a major vendor on the market. Teradata has also developed significant flexibility inside its execution engine --- adapting to column-format vs. row-format input automatically, and using optimal query execution methods depending on the format-type that a particular query reads from.   

====================================================================================

 

Daniel Abadi is an Associate Professor at Yale University, founder of Hadapt, and a Teradata employee following the recent acquisition. He does research primarily in database system architecture and implementation. He received a Ph.D. from MIT and a M.Phil from Cambridge. He is best known for his research in column-store database systems (the C-Store project, which was commercialized by Vertica), high performance transactional systems (the H-Store project, commercialized by VoltDB), and Hadapt (acquired by Teradata). http://twitter.com/#!/daniel_abadi.

 


Inside a Teradata Node

Course Number: 
45262
Training Format: 
Recorded webcast

Alternate Title: Teradata 101. For anyone new to Teradata, this session will acquaint you with the Teradata technology.

We will explore Teradata’s parallel environment and take a look inside a Node at the Parsing Engines (PEs), the BYNET, and the Access Module Processors (AMPs). We'll also discuss how the File System works and what happens when you want to add Nodes – parallelism, Nodes, and Virtual Processors.

Presenter: Alison Torres, Director, Teradata Labs Customer Briefing Team

Price: 
$195
Credit Hours: 
2
Channel: 

The Basics of Using JSON Data Types

Course Number: 
52483
Training Format: 
Recorded webcast

This presentation provides the basics of using the JSON data type, including defining and using columns of that type; using the various functions and methods created for it; basic shredding and publishing functionality; and usage with Array data.

Presenter: David Micheletto

Price: 
$195
Credit Hours: 
1
Channel: 

Tables without a Primary Index

Course Number: 
45842
Training Format: 
Recorded webcast

This session will provide information on the basic concepts of No Primary Index (NoPI) tables.

What are the advantages of using a NoPI table, and what are some of its common uses? How do you define a NoPI table? How are rows distributed in the system for SQL requests and for the FastLoad utility when a table has no primary index? Get the answers, and more details about this performance-enhancing capability, in this session, which covers how to define and use a NoPI table (a new Teradata enhancement) as well as the mechanics of how NoPI tables are implemented, their features, and their limitations. This topic is of particular interest to physical database designers and database performance analysts.

Updated and re-recorded in April 2015.

Presenter: Larry Carter, Senior Learning Consultant - Teradata

Audience: 
Data Warehouse Administrators, Data Warehouse Architect/ Designers, and Data Warehouse Technical Specialists
Price: 
$195
Credit Hours: 
1
Channel: 

Priority Scheduler in SLES 11 for Teradata Database 15.0

Course Number: 
53751
Training Format: 
Recorded webcast

Join this in-depth presentation that covers the basic functionality of the SLES 11 Teradata Priority Scheduler.

The presentation will step you through how the feature works, how to assign priorities to workloads, and what the different options and choices are. It provides usability pointers and recommendations to help you get started, or tweak what you already have. You will learn how resources are divided up, how they can optionally be limited, and the importance of global weights.

Presenter: Carrie Ballinger, Sr. Technical Consultant - Teradata Corporation

Audience: 
Data Warehouse Administrators, Data Warehouse Architect/Designers
Price: 
$195
Credit Hours: 
1
Channel: 

Teradata Application Performance Tuning: Tips and Techniques

Course Number: 
52894
Training Format: 
Recorded webcast

This session will discuss Application Query Tuning on the Teradata platform. 

We’ll go through a step-by-step process designed to help you approach query tuning in the most effective manner. Specific tuning tips will be covered to help jumpstart your tuning success.

Presenter: Dan Fritz - Teradata Corporation

Price: 
$195
Credit Hours: 
1
Channel: 