Initial configuration v2
Before creating the extension and the database objects that use the extension, you must modify the Postgres host, providing the location of the supporting libraries.
After installing Postgres, modify postgresql.conf
, located in:
/var/lib/edb/as_version/data
Modify the configuration file, adding the hdfs_fdw.jvmpath
parameter to the end of the configuration file and setting the value to specify the location of the Java virtual machine (libjvm.so
). Set the value of hdfs_fdw.classpath
to indicate the location of the Java class files used by the adapter. Use a colon (:) as a delimiter between each path. For example:
Note
Copy the jar files (hive-jdbc-1.0.1-standalone.jar
and hadoop-common-2.6.4.jar
) from the respective Hive and Hadoop sources or website to the PostgreSQL instance where Hadoop Foreign Data Wrapper is installed.
If you're using EDB Postgres Advanced Server and have a DATE
column in your database, you must set edb_redwood_date = OFF
in the postgresql.conf
file.
After setting the parameter values, restart the Postgres server. For detailed information about controlling the service on an EDB Postgres Advanced Server host, see the EDB Postgres Advanced Server documentation.
Before using the Hadoop Foreign Data Wrapper:
- Use the CREATE EXTENSION command to create the extension on the Postgres host.
- Use the CREATE SERVER command to define a connection to the Hadoop file system.
- Use the CREATE USER MAPPING command to define a mapping that associates a Postgres role with the server.
- Use the CREATE FOREIGN TABLE command to define a table in the EDB Postgres Advanced Server database that corresponds to a database that resides on the Hadoop cluster.
CREATE EXTENSION
Use the CREATE EXTENSION
command to create the hdfs_fdw
extension. To invoke the command, use your client of choice (for example, psql) to connect to the Postgres database from which you will be querying the Hive or Spark server, and invoke the command:
Parameters
IF NOT EXISTS
Include the IF NOT EXISTS
clause to instruct the server to issue a notice instead of returning an error if an extension with the same name already exists.
schema_name
Optionally specify the name of the schema in which to install the extension's objects.
Example
The following command installs the hdfs_fdw
Hadoop Foreign Data Wrapper:
CREATE EXTENSION hdfs_fdw;
For more information about using the foreign data wrapper CREATE EXTENSION
command, see the PostgreSQL documentation.
CREATE SERVER
Use the CREATE SERVER
command to define a connection to a foreign server. The syntax is:
The role that defines the server is the owner of the server. Use the ALTER SERVER
command to reassign ownership of a foreign server. To create a foreign server, you must have USAGE
privilege on the foreign-data wrapper specified in the CREATE SERVER
command.
Parameters
server_name
Use server_name
to specify a name for the foreign server. The server name must be unique in the database.
FOREIGN_DATA_WRAPPER
Include the FOREIGN_DATA_WRAPPER
clause to specify for the server to use the hdfs_fdw
foreign data wrapper when connecting to the cluster.
OPTIONS
Use the OPTIONS
clause of the CREATE SERVER
command to specify connection information for the foreign server. You can include:
Option | Description |
---|---|
host | The address or hostname of the Hadoop cluster. The default value is localhost . |
port | The port number of the Hive Thrift Server or Spark Thrift Server. The default is 10000 . |
client_type | Specify hiveserver2 or spark as the client type. To use the ANALYZE statement on Spark, you must specify a value of spark . The default value is hiveserver2 . |
auth_type | The authentication type of the client. Specify LDAP or NOSASL . If you don't specify an auth_type , the data wrapper decides the auth_type value on the basis of the user mapping. If the user mapping includes a user name and password, the data wrapper uses LDAP authentication. If the user mapping doesn't include a user name and password, the data wrapper uses NOSASL authentication. |
connect_timeout | The length of time before a connection attempt times out. The default value is 300 seconds |
enable_aggregate_pushdown | Similar to the table-level option but configured at the server level. If true , pushes the aggregate operations to the foreign server instead of fetching rows from the foreign server and performing the operations locally. You can also set this option for an individual table. The table-level value of the option takes precedence over the server-level option value. Default is true . |
enable_join_pushdown | Similar to the table-level option but configured at the server level. If true , pushes the join between two foreign tables from the same foreign server instead of fetching all the rows for both the tables and performing a join locally. You can also set this option for an individual table and, if any of the tables involved in the join has set the option to false , then the join isn't pushed down. The table-level value of the option takes precedence over the server-level option value. Default is true . |
enable_order_by_pushdown | Similar to the table-level option but configured at the server level. If true , pushes the order-by operation to the foreign server instead of fetching rows from the foreign server and performing the sort locally. You can also set this option for an individual table. The table-level value of the option takes precedence over the server-level option value. Default is true . |
fetch_size | Provided as a parameter to the JDBC API setFetchSize . The default value is 10,000 . |
log_remote_sql | If true , logging includes SQL commands executed on the remote Hive server and the number of times that a scan is repeated. The default is false . |
query_timeout | Use query_timeout to provide the number of seconds after which a request times out if it isn't satisfied by the Hive server. Query timeout is not supported by the Hive JDBC driver. |
use_remote_estimate | Include use_remote_estimate to instruct the server to use EXPLAIN commands on the remote server when estimating processing costs. By default, use_remote_estimate is false , and remote tables are assumed to have 1000 rows. |
Example
The following command creates a foreign server named hdfs_server
that uses the hdfs_fdw
foreign data wrapper to connect to a host with an IP address of 170.11.2.148
:
The foreign server uses the default port (10000) for the connection to the client on the Hadoop cluster. The connection uses an LDAP server.
For more information about using the CREATE SERVER
command, see the PostgreSQL documentation.
CREATE USER MAPPING
Use the CREATE USER MAPPING
command to define a mapping that associates a Postgres role with a foreign server:
You must be the owner of the foreign server to create a user mapping for that server.
Note
The Hadoop Foreign Data Wrapper supports NOSASL and LDAP authentication. If you're creating a user mapping for a server that uses LDAP authentication, use the OPTIONS
clause to provide the connection credentials (the user name and password) for an existing LDAP user. If the server uses NOSASL authentication, omit the OPTIONS
clause when creating the user mapping.
Parameters
role_name
Use role_name
to specify the role to associate with the foreign server.
server_name
Use server_name
to specify the name of the server that defines a connection to the Hadoop cluster.
OPTIONS
Use the OPTIONS
clause to specify connection information for the foreign server. If you're using LDAP authentication, provide:
username
— The name of the user on the LDAP server.
password
— the password associated with the username.
If you don't provide a user name and password, the data wrapper uses NOSASL authentication.
Example
The following command creates a user mapping for a role named enterprisedb
. The mapping is associated with a server named hdfs_server
:
CREATE USER MAPPING FOR enterprisedb SERVER hdfs_server;
If the database host uses LDAP authentication, provide connection credentials when creating the user mapping:
The command creates a user mapping for a role named enterprisedb
that is associated with a server named hdfs_server
. When connecting to the LDAP server, the Hive or Spark server authenticates as alice
, and provides a password of 1safepwd
.
For detailed information about the CREATE USER MAPPING
command, see the PostgreSQL documentation.
CREATE FOREIGN TABLE
A foreign table is a pointer to a table that resides on the Hadoop host. Before creating a foreign table definition on the Postgres server, connect to the Hive or Spark server and create a table. The columns in the table map to columns in a table on the Postgres server. Then, use the CREATE FOREIGN TABLE
command to define a table on the Postgres server with columns that correspond to the table that resides on the Hadoop host. The syntax is:
column_constraint
is:
table_constraint
is:
Parameters
table_name
Specify the name of the foreign table. Include a schema name to specify the schema in which the foreign table resides.
IF NOT EXISTS
Include the IF NOT EXISTS
clause to instruct the server not to return an error if a table with the same name already exists. If a table with the same name exists, the server issues a notice.
column_name
Specifies the name of a column in the new table. Each column corresponds to a column described on the Hive or Spark server.
data_type
Specify the data type of the column. When possible, specify the same data type for each column on the Postgres server and the Hive or Spark server. If a data type with the same name isn't available, the Postgres server attempts to cast the data type to a type compatible with the Hive or Spark server. If the server can't identify a compatible data type, it returns an error.
COLLATE collation
Include the COLLATE
clause to assign a collation to the column. The column data type's default collation is used by default.
INHERITS (parent_table [, ... ])
Include the INHERITS
clause to specify a list of tables from which the new foreign table automatically inherits all columns. Parent tables can be plain tables or foreign tables.
CONSTRAINT constraint_name
Specify an optional name for a column or table constraint. If not specified, the server generates a constraint name.
NOT NULL
Include the NOT NULL
keywords to indicate that the column isn't allowed to contain null values.
NULL
Include the NULL
keywords to indicate that the column is allowed to contain null values. This is the default.
CHECK (expr) [NO INHERIT]
Use the CHECK
clause to specify an expression that produces a Boolean result that each row in the table must satisfy. A check constraint specified as a column constraint can reference that column's value only, while an expression appearing in a table constraint can reference multiple columns.
A CHECK
expression can't contain subqueries or refer to variables other than columns of the current row.
Include the NO INHERIT
keywords to specify that a constraint can't propagate to child tables.
DEFAULT default_expr
Include the DEFAULT
clause to specify a default data value for the column whose column definition it appears in. The data type of the default expression must match the data type of the column.
SERVER server_name [OPTIONS (option 'value' [, ... ] ) ]
To create a foreign table that allows you to query a table that resides on a Hadoop file system, include the SERVER
clause and specify the server_name
value of the foreign server that uses the Hadoop data adapter.
Use the OPTIONS
clause to specify the following options and their corresponding values:
Option | Value |
---|---|
dbname | The name of the database on the Hive server. The database name is required. |
table_name | The name of the table on the Hive server. The default is the name of the foreign table. |
enable_aggregate_pushdown | Similar to the server-level option but configured at the table level. If true , pushes the aggregate operations to the foreign server instead of fetching rows from the foreign server and performing the operations locally. You can also set this option for an individual table. The table-level value of the option takes precedence over the server-level option value. Default is true . |
enable_join_pushdown | Similar to the server-level option but configured at the table level. If true , pushes the join between two foreign tables from the same foreign server instead of fetching all the rows for both the tables and performing a join locally. You can also set this option for an individual table. If any of the tables involved in the join has set the option to false , then the join isn't pushed down. The table-level value of the option takes precedence over the server-level option value. Default is true . |
enable_order_by_pushdown | Similar to the server-level option but configured at the table level. If true , pushes the order-by operation to the foreign server instead of fetching rows from the foreign server and performing the sort locally. You can also set this option for an individual table. The table-level value of the option takes precedence over the server-level option value. Default is true . |
Example
To use data that is stored on a distributed file system, you must create a table on the Postgres host that maps the columns of a Hadoop table to the columns of a Postgres table. For example, for a Hadoop table with the following definition:
Execute a command on the Postgres server that creates a comparable table on the Postgres server:
Include the SERVER
clause to specify the name of the database stored on the Hadoop file system (webdata
) and the name of the table (weblogs
) that corresponds to the table on the Postgres server.
For more information about using the CREATE FOREIGN TABLE
command, see the PostgreSQL documentation.
Data type mappings
When using the foreign data wrapper, you must create a table on the Postgres server that mirrors the table that resides on the Hive server. The Hadoop data wrapper automatically converts the following Hive data types to the target Postgres type:
Hive | Postgres |
---|---|
BIGINT | BIGINT/INT8 |
BOOLEAN | BOOL/BOOLEAN |
BINARY | BYTEA |
CHAR | CHAR |
DATE | DATE |
DOUBLE | FLOAT8 |
FLOAT | FLOAT/FLOAT4 |
INT/INTEGER | INT/INTEGER/INT4 |
SMALLINT | SMALLINT/INT2 |
STRING | TEXT |
TIMESTAMP | TIMESTAMP |
TINYINT | INT2 |
VARCHAR | VARCHAR |
DROP EXTENSION
Use the DROP EXTENSION
command to remove an extension. To invoke the command, use your client of choice (for example, psql) to connect to the Postgres database from which you're dropping the Hadoop server, and run the command:
Parameters
IF EXISTS
Include the IF EXISTS
clause to instruct the server to issue a notice instead of returning an error if an extension with the specified name doesn't exist.
name
Optionally, specify the name of the installed extension.
CASCADE
Automatically drop objects that depend on the extension. It drops all the other dependent objects too.
RESTRICT
Don't allow to drop extension if any objects, other than its member objects and extensions listed in the same DROP
command, depend on it.
Example
The following command removes the extension from the existing database:
DROP EXTENSION hdfs_fdw;
For more information about using the foreign data wrapper DROP EXTENSION
command, see the PostgreSQL documentation.
DROP SERVER
Use the DROP SERVER
command to remove a connection to a foreign server. The syntax is:
The role that drops the server is the owner of the server. Use the ALTER SERVER
command to reassign ownership of a foreign server. To drop a foreign server, you must have USAGE
privilege on the foreign-data wrapper specified in the DROP SERVER
command.
Parameters
IF EXISTS
Include the IF EXISTS
clause to instruct the server to issue a notice instead of returning an error if a server with the specified name doesn't exist.
name
Optionally, specify the name of the installed server.
CASCADE
Automatically drop objects that depend on the server. It drops all the other dependent objects too.
RESTRICT
Don't allow to drop the server if any objects depend on it.
Example
The following command removes a foreign server named hdfs_server
:
DROP SERVER hdfs_server;
For more information about using the DROP SERVER
command, see the PostgreSQL documentation.
DROP USER MAPPING
Use the DROP USER MAPPING
command to remove a mapping that associates a Postgres role with a foreign server. You must be the owner of the foreign server to remove a user mapping for that server.
Parameters
IF EXISTS
Include the IF EXISTS
clause to instruct the server to issue a notice instead of returning an error if the user mapping doesn't exist.
user_name
Specify the user name of the mapping.
server_name
Specify the name of the server that defines a connection to the Hadoop cluster.
Example
The following command drops a user mapping for a role named enterprisedb
. The mapping is associated with a server named hdfs_server
:
DROP USER MAPPING FOR enterprisedb SERVER hdfs_server;
For detailed information about the DROP USER MAPPING
command, see the PostgreSQL documentation.
DROP FOREIGN TABLE
A foreign table is a pointer to a table that resides on the Hadoop host. Use the DROP FOREIGN TABLE
command to remove a foreign table. Only the owner of the foreign table can drop it.
Parameters
IF EXISTS
Include the IF EXISTS
clause to instruct the server to issue a notice instead of returning an error if the foreign table with the specified name doesn't exist.
name
Specify the name of the foreign table.
CASCADE
Automatically drop objects that depend on the foreign table. It drops all the other dependent objects too.
RESTRICT
Don't allow to drop foreign table if any objects depend on it.
Example
For more information about using the DROP FOREIGN TABLE
command, see the PostgreSQL documentation.