Channel: Teradata Developer Exchange - Database

Teradata Express Support for SQL-H -- Install Instructions

Short teaser: 
Teradata Express Support for SQL H -- Resolve Hostname conflicts and other installation activities

The following instructions should help you set up and configure the Teradata SQL H capability with your Hadoop instance.

100   Resolve Hostname conflicts

Teradata SQL-H uses node hostnames to resolve network addressing. However, systems configured by Teradata Staging will often have conflicting/duplicate hostnames between Teradata nodes and Hadoop nodes. This conflict must be addressed regardless of whether the connectivity is via the customer/external LAN or via BYNET/InfiniBand. This section describes the process for changing the Hadoop node hostnames; it may be skipped if there is no hostname conflict between the Teradata and Hadoop nodes.

 

START OF SYSTEM OUTAGE. Note date and time: ___________________

 

100.1           Shut down all Hadoop services. On the master node:

# hcli system stop

100.2           Change the hostname.

100.2.1    On the Hadoop master node, append the new hostname for each Hadoop node to its existing entry in the /etc/hosts file. Example:

Before: 39.64.8.2  byn002-8

After:  39.64.8.2  hdp002-8 byn002-8

Note that the new hostname must precede (appear to the left of) the original hostname.
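The append step above can be sketched as a small shell helper (a hypothetical illustration using this section's example names, operating on a temporary copy rather than the live /etc/hosts):

```shell
# Sketch: prepend the new hostname to an existing /etc/hosts entry, keeping
# the original name as an alias. The new name must come first on the line.
hosts_file=$(mktemp)
printf '39.64.8.2  byn002-8\n' > "$hosts_file"

old=byn002-8      # original hostname
new=hdp002-8      # new hostname, inserted to the left of the old one
sed -i "s/\b${old}\b/${new} ${old}/" "$hosts_file"

cat "$hosts_file"
```

On a real system you would edit /etc/hosts itself; keep a backup copy first.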

100.2.2    Validate the new hostname is recognized. On the master node:

# hostname -f

100.3           On the Hadoop Master node, run the following command:

# hcli support modifyHostname <oldname> <newname>

100.4           If Virtual IP addressing is used at the site, perform the necessary modifications for the new hostnames.

100.5           Make sure the Teradata system can resolve Hadoop node names (either through DNS or via /etc/hosts on the Teradata nodes).
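Resolution can be checked with getent, which consults both DNS and /etc/hosts; a minimal sketch (localhost is a stand-in here — substitute your Hadoop node names, e.g. hdp002-8):

```shell
# Sketch: report whether a hostname resolves from this node.
check_host() {
    if getent hosts "$1" > /dev/null; then
        echo "resolves: $1"
    else
        echo "missing:  $1"
    fi
}

check_host localhost
```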

100.6           Start Hadoop services. On the master node:

# hcli system start

 

END OF SYSTEM OUTAGE. Note date and time: ___________________

 

101   Add Hadoop IP addresses to the Teradata hosts file

Perform the following steps if the Teradata and Hadoop hostnames do not conflict and are not resolved through local DNS.

101.1           Save a copy of the /etc/hosts file on all Teradata TPA nodes:

# cp /etc/hosts /etc/orig.hosts

101.2           Add Hadoop nodes to the /etc/hosts file of every Teradata TPA node. Example:

192.168.135.100  hdp002-8

192.168.135.101  hdp002-9
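On a multi-node Teradata system the same entries must land on every TPA node. A sketch that builds this section's example entries; the commented lines assume your site has the Teradata parallel shell (psh) available — an assumption, so substitute your own distribution tool if needed:

```shell
# Sketch: build the Hadoop /etc/hosts entries from this section's example.
entries='192.168.135.100  hdp002-8
192.168.135.101  hdp002-9'

# On a real system (assumes psh is available at this site):
#   psh "cp /etc/hosts /etc/orig.hosts"
#   psh "printf '%s\n' \"$entries\" >> /etc/hosts"

echo "$entries"
```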

102   Hadoop Namenode configuration for Teradata proxy user setup

For Teradata SQL-H to work with a Hadoop system, a Teradata superuser must be configured on the Hadoop NameNode that is allowed to access HDFS from the Teradata nodes on behalf of other Hadoop users in a secure way. The Teradata superuser used for this setting is tdatuser. On the Hadoop side, the following configuration is required in core-site.xml to add tdatuser as a trusted superuser. The first property, hadoop.proxyuser.tdatuser.groups, determines the HDFS groups whose members tdatuser is allowed to impersonate. The second property, hadoop.proxyuser.tdatuser.hosts, determines the hosts from which tdatuser is allowed to access HDFS. If these properties are not present, impersonation will not be allowed and Teradata queries will fail with a security error.

Add the following two properties to the Hadoop NameNode configuration file /etc/hadoop/conf/core-site.xml, change their values to match your Teradata/Hadoop environment, save the file, and restart the NameNode.

 

<property>

  <name>hadoop.proxyuser.tdatuser.groups</name>

  <value>users</value>     

  <description>

          Allow the superuser tdatuser to impersonate members of the listed HDFS group(s). For example, with the value 'users', tdatuser may impersonate any user belonging to the 'users' group.

  </description>

</property>

 

<property>

  <name>hadoop.proxyuser.tdatuser.hosts</name>

  <value>host1,host2</value>

  <description>

          The superuser can connect only from host1 and host2 to impersonate a user. Here host1 and host2 represent Teradata nodes. All nodes of the Teradata system must be listed here for SQL-H queries to be processed. It is recommended to use the IP addresses of the Teradata nodes.

   </description>

</property>

               

103   Grant user privileges

Certain privileges are required for the Teradata database user to run the setup successfully, and if needed, they can be granted by the admin user using the following statements:

GRANT EXECUTE PROCEDURE ON SQLJ TO [user] WITH GRANT OPTION;

GRANT CREATE EXTERNAL PROCEDURE ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT DROP PROCEDURE ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT DROP FUNCTION ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT EXECUTE FUNCTION ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT CREATE FUNCTION ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT SELECT ON SYSLIB TO [user] WITH GRANT OPTION;

 

104   Run setup script

104.1           On a Teradata node, go to the /opt/teradata/sqlh/<version> directory and run the BTEQ setup script (tdsqlh_td.bteq) to do the setup: 

# bteq .logon localhost/[user],[password] < tdsqlh_td.bteq > output.txt 2>&1

where [user] and [password] are to be filled in. 

104.2           On a Teradata node, go to the /opt/teradata/tdsqlh_hdp/<version> directory and run the BTEQ setup script (tdsqlh_hdp.bteq) to do the setup: 

# bteq .logon localhost/[user],[password] < tdsqlh_hdp.bteq > output.txt 2>&1

where [user] and [password] are to be filled in. 

104.3           Check the output file (output.txt in the above example) for possible errors.  Upon successful execution, the script will install the required JAR files and create the load_from_hcatalog table operator function.

Notes:

You may run into the following error if there is not enough room in the SYSLIB database. Installing the Hadoop JAR files requires about 19 megabytes of space, so you may need to increase the SYSLIB database size.

call sqlj.install_jar('cj!pig-withouthadoop.jar','pig', 0);

 *** Failure 2644 No more room in database SYSLIB.
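To add space, a DBA can raise SYSLIB's permanent space allocation. A hedged sketch — the 50 MB figure is illustrative, not a documented requirement; size it for your system:

```sql
-- Run as a user with MODIFY DATABASE rights on SYSLIB.
-- 50000000 bytes (~50 MB) comfortably covers the ~19 MB of Hadoop JARs.
MODIFY DATABASE SYSLIB AS PERM = 50000000;
```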

 

Running the setup script for the first time may return the following SQL Failure messages. These errors are benign and can safely be ignored. Examples:

DROP FUNCTION SYSLIB.load_from_hcatalog;

 *** Failure 5589 Function 'load_from_hcatalog' does not exist.

 

call sqlj.remove_jar('SQLH', 0);

 *** Failure 7972 Jar 'SYSLIB.SQLH' does not exist.

 

*** Warning: 9241 Check output for possible warnings encountered in Installing or Replacing a JAR.

 

105   Validate SQL-H install

The following steps validate that the Teradata/Hadoop setup is correct and ready for SQL-H queries.

105.1           Create an HCatalog table with data:

105.1.1    Download the files tdsqlh_example.hive and tdsql_data.csv from the /opt/teradata/tdsqlh_hdp/<version> directory on a Teradata node, and copy them to the Hadoop NameNode under the /tmp directory.

105.1.2    Log into the Hadoop NameNode and go to the /tmp directory.

105.1.3    Change the permissions on the copied files.

# chmod 777 tdsqlh_example.hive  tdsql_data.csv

105.1.4    Change user to hive.

# su hive

105.2           Run sample hive script.

# hive < tdsqlh_example.hive 

 

Make sure the script completes and returns a row count of 805.

Total MapReduce CPU Time Spent: 4 seconds 580 msec

OK

805

Time taken: 33.76 seconds

On successful completion, this creates a table tdsqlh_test with 14 columns, populated with 805 rows.

 

105.3           Run a SQL-H query to import rows from the tdsqlh_test table

105.3.1    Using SQL Assistant or BTEQ, log on to the Teradata system as user dbc. If user dbc is not available, you may choose a different user.

105.3.2    Run the following SQL query with the proper HCatalog server address and port and verify that it returns a count(*) of 805.

 

SELECT count(*)

FROM SYSLIB.load_from_hcatalog(USING

       server('153.64.29.98')        -- HCatalog server address

        port('9083')                        -- HCatalog server port

        username('hive')

        dbname('default')

        tablename('tdsqlh_test')

        columns('*')

        templeton_port('50111')         -- WebHCat port

) as D1;

Results should come back with:

 *** Query completed. One row found. One column returned.

 *** Total elapsed time was 2 seconds.

 

   Count(*)

-----------

        805

 

If the above query returns an error instead of the row count 805, the Teradata/Hadoop setup may be incorrect and will require manual troubleshooting to isolate the problem.
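A useful first check is whether the HCatalog ports are reachable from a Teradata node at all. A sketch using bash's /dev/tcp (the address and ports are the example values from the query above):

```shell
# Sketch: probe the metastore (9083) and WebHCat (50111) ports.
hcat_host=153.64.29.98   # example HCatalog server address from this document

probe() {
    if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo "port $2 on $1: reachable"
    else
        echo "port $2 on $1: NOT reachable"
    fi
}

probe "$hcat_host" 9083
probe "$hcat_host" 50111
```

If either port is not reachable, check firewalls and the Hadoop service status before debugging the SQL-H setup itself.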

 

 
