Denodo Admin Guide
- 1 Onboarding Users
- 1.1 Data Sources
- 1.2 Managing AD Groups
- 1.3 VDB Creation
- 1.4 Auditing Privileges
- 2 License Management
- 3 Denodo Scheduler
- 4 Box
- 5 Tableau
- 5.1 Tableau Server
- 6 Server Architecture
- 6.1 Authentication
- 6.2 Server List
- 6.3 Load Balancer
- 6.4 URL Matrix
- 7 Backend
- 8 Log Aggregation
- 9 SSL Certificates
- 10 Backups
- 10.1 Development VQL Backups
- 10.2 VM Backups
- 11 Version Control
- 12 Monitoring
- 12.1 Denodo Monitoring
- 12.2 Health Checks
- 13 On-Call Schedule
- 14 Automated OS Patching
- 15 How To’s
- 15.1 Adding a user created function to Denodo via JAR file
- 15.2 Changing Denodo User-Agent for HTTP requests
- 15.3 Updating Denodo
- 15.4 Get a list of all data source connections in Denodo
- 15.5 Add an index to a cache table
- 15.6 ORA-00932: inconsistent datatypes: expected - got CLOB
- 15.7 Recover from a DROP CASCADE
- 15.8 See who has completed the Denodo training
- 15.9 See who has access to views or columns (and their dependents)
- 16 User Guide
- 16.1 Tips
Onboarding Users
Determine access needshttps://tufts.service-now.com/now/nav/ui/classic/params/target/%24oc_workbench.do%3Fsysparm_redirect%3D%2524oc_groups.do%26sysparm_group_id%3D4576677c4f720340512ea3928110c72a
If VDB exists, do one of the following:
assign to existing AD groups in grouper
create new AD group with custom permissions, assign, import into Denodo and deploy to production
If new VDB needed, do the following:
create and import AD groups following Managing AD Groups
create VDB following VDB Creation
Provide users with Denodo User Guide
Data Sources
Denodo Architecture Scratchwork
Managing AD Groups
How-To
Log in to Grouper using SSO
Search for the DS folder
When creating a group
use the prefix grp_ds_denodo
description should refer to a data steward
members should all be run by the data steward before addition
Add grp_ds_denodo-admin to the group with ADMIN privileges. This gives Denodo admins the ability to administer the group.
Create separate
-dev
and-read
groupsAdd the dev group to the read group so all dev group members are synced with the read group automatically. This will ensure read access for devs will exist beyond dev, since only the read group should be promoted outside of dev.
If developers need scheduler access, create a
-scheduler
group and associated that group with a scheduler project after importing into Denodo.If the developers need solution manager access, add the
-dev
group as a member ofgrp_ds_denodo-solution-manager-deploy
.If a service account is going to be used, create
-service
group and give it the appropriate permissions after importing. Do not assign it thetts-facstaff
role as this will create view clutter in applications like DBeaver and Tableau.
Add members to groups
Only add members that have been approved by the data steward
Import groups into denodo. Note: You may need to wait up to 30 minutes for grouper changes to get synced to AD before importing.
Role base: DC=tufts,DC=ad,DC=tufts,DC=edu Attribute with role name: sAMAccountName Attribute with description: description Role search pattern: (&(cn=*denodo*)(objectcategory=group))
Assign roles and permissions as necessary. Add
tts-facstaff
to give metadata access to all views.
Denodo LDAP Configuration
User Base: DC=tufts,DC=ad,DC=tufts,DC=edu
Attribute with User Name: sAMAccountName
User search pattern: (&(objectcategory=user))
Role Base: DC=tufts,DC=ad,DC=tufts,DC=edu
Attribute with role name: sAMAccountName
Role search pattern: (&(member=@{USERDN})(objectcategory=group))
Once a role is imported, be sure to add the tts-facstaff
role to it.
VDB Creation
In the Design Studio, under Administration, go to Database management.
Inside Database Management, click New
Enter a name and description, keep all the other settings at defaults.
Folder Structure
Execute the VQL below to create the basic folder structure for a VDB. Make sure your new database is selected in the top left corner of the shell before executing.
CREATE OR REPLACE FOLDER '/01-base_layer' ;
CREATE OR REPLACE FOLDER '/01-base_layer/01-connections' ;
CREATE OR REPLACE FOLDER '/01-base_layer/02-base_views' ;
CREATE OR REPLACE FOLDER '/02-transformation' ;
CREATE OR REPLACE FOLDER '/03-integration' ;
CREATE OR REPLACE FOLDER '/04-interface_fulfillment' ;
CREATE OR REPLACE FOLDER '/05-interfaces' ;
CREATE OR REPLACE FOLDER '/06-web_services' ;
CREATE OR REPLACE FOLDER '/07-associations' ;
Permissions
Once a VDB is created, make sure you create and import a dev and read group by following the above AD group guide. Also, make sure you update the tts-facstaff role to include metadata access to the new VDB.
Auditing Privileges
In Denodo, you can run a query with privileges based on one or more roles by adding CONTEXT('impersonate_roles'='role1,role2,role3,...')
to the end of any query.
See Privilege Auditing — Virtual DataPort Administration Guide 8.0
Note that since users get their roles from AD and don’t actually exist in Denodo, we can’t impersonate actual users.
License Management
Updating Licenses in Solution Manager
Updating the license for Denodo is easily done through the Solution Manager web app.
Navigate to the Install License Page and drag the new license file into the UI dialog, then click Install.
Configuring the VDP Server
As long as Solution Manager is being used to manage licenses, each VDP server needs to be configured to connect to Solution Manager for acquiring it’s license.
Edit
<DENODO_HOME>/conf/SolutionManager.properties
to have the following values (assumes SSL is setup):Restart the VDP server
Check the logs to confirm that the VDP server has started successfully and test connecting with a client.
Denodo Scheduler
SMTP Configuration
As found in /opt/denodo/conf/scheduler/ConfigurationParameters.properties
Box
Connecting Denodo to a Box file is a two step process. The first step involves creating an app that will give Denodo access to Box via OAuth and that people can share files with. This only needs to be done once and can be used for all files. This is outlined in the section below. The second step involves creating a connection to a Box file in Denodo using the credentials of the app.
Creating a Box App
This outlines how to create a Box App for Denodo. Since an app has already been created for Denodo, this guide is just for reference. The link to the app and it’s assigned email address are below.
App Link | |
---|---|
Email (used for sharing files) |
Go to the Box developer console: Content APIs for Web and Mobile Apps | Box Platform
Click Create a New App
Select Custom App
Select Server Authentication and give the app the name Denodo
Request authorization of the app in Box and create a TechConnect ticket requesting authorization of the Box app.
Once authorized, note down the email address/Service Account ID of the app, accessible from the General Settings tab. Use this email address to grant the app access to files.
Connecting Denodo to Box
Consult the Box section of the Denodo User Guide for a VQL-based approach. The steps below cover how to create a connection using the Design Studio GUI.
Create a new Data Source file connection (Delimited file, JSON, XML, Excel)
Alternatively, you can use the following VQL:
Select ‘HTTP Client’ as the Data Route
Fill in the Base URL using this format, https://api.box.com/2.0/files/{FILE_ID}/content, replacing {FILE_ID} with the ID of the file in box.
Download file - API Reference - Box Developer DocumentationUnder the Authentication section, fill in the fields as follows. The Token endpoint URL is https://api.box.com/oauth2/token. You’ll need to click on Extra parameters of the refresh token requests and fill in the additional parameters that Box requires (screenshot , as outlined in the linked documentation below
Client Credentials Grant - Box Developer DocumentationFill out the bottom section to match your file configuration. The screenshot below shows a common configuration for a CSV
Test the connection, confirm it works, then click Save.
Click Create Base View to create a base view for the file.
Interacting with the Denodo Service Account
File can be uploaded to the Denodo Box service account to prevent dependency on the Box accounts of individuals. While you can’t log in to Box and interact with the service account using the traditional UI, you can perform all essential tasks via the API. The sections below outline how to perform essential Box tasks.
The API can be accessed through a number of different methods including cURL, Postman, and Python, with Postman being the easiest method for most.
Files uploaded to the Denodo service account are, by design, accessible by the Denodo service account and do not need to go through the share process outlined above.
Data Strategy Drop Folder
A general Data Strategy Drop Folder has been created to that members of the Data Strategy team can request access to. Sub folders can be created in here and shared with other teams within the university for uploading of shared data files.
Sharing Files with the Service Account
In the event that files can’t be uploaded to the drop folder, files can be shared with Denodo Service Account by using the service account email in the Share dialog for a file and selecting Invite as Viewer.
See Creating a Box App for the email address.
Importing API Collection into Postman
An API collection for Postman has been created and can be downloaded here.
To import into Postman, click Import, then find and select the Box.postman_collection.json file.
Once imported, you’ll need to obtain the client ID and client secret for the service account and add them to the collection variables. You can obtain these from the app configuration page here (if you’ve been added as a collaborator): https://tufts.app.box.com/developers/console/app/1958391/configuration, or by sending a request to denodo-devs-request@elist.tufts.edu.
Once you have the client ID and secret, set them in the collection variables page and you’re ready to start making API requests.
Obtaining an Access Token
Request access token - API Reference - Box Developer Documentation
Access tokens are temporary and typically only valid for an hour.
Creating a Folder
Create folder - API Reference - Box Developer Documentation
Once a folder is created, you can add people as collaborators so that they interact with the folder using the Box UI, as outlined in Adding Collaborators.
To make uploading files to a folder easy, it’s recommended that you enable the folder upload email and make the access open.
When people upload files to the folder, they should receive a response from Box like the one below indicating that the upload was successful.
If a file being uploaded via email already exists, rather than add it as a new version of that file, it will create a new file with a randomly generated string as part of of the filename.
Listing Folder Items
List items in folder - API Reference - Box Developer Documentation
Get Folder Information
Get folder information - API Reference - Box Developer Documentation
Shared Links
Add shared link to file - API Reference - Box Developer Documentation
Add shared link to folder - API Reference - Box Developer Documentation
Adding Collaborators
Create collaboration - API Reference - Box Developer Documentation
Deleting a File
Delete file - API Reference - Box Developer Documentation
Tableau
Tableau Server
In order for workbooks using a Denodo JDBC data source to be published to a Tableau Server, the server needs to be updated to include the Denodo JDBC connector and driver. The steps below outline how to accomplish this.
Add the Denodo JDBC Connector (https://tufts.box.com/s/6fxx0xx3lv4gk6v7k3ymdr1rzgznye6h) under C:\Program Files\Tableau\Connectors
Add the Denodo JDBC Driver (denodo-vdp-jdbcdriver-9.0.4.jar ) under C:\Program Files\Tableau\Drivers
Restart Tableau Server using
tsm pending-changes apply
in the command line, enteringy
when prompted to restart.
References
Connectors Built with the Tableau Connector SDK
Taco package signature verification failure
It’s a known issue that verification of Taco-based package signatures can fail in Tableau. This affects both Tableau Desktop and Tableau Server. To get around the issue, you can disable package signature verification. How to do this is outlined below. Reference: https://community.denodo.com/kb/en/view/document/How%20to%20update%20the%20Tableau%20Denodo%20JDBC%20connector%20to%20support%20OAuth%20Authentication?tag=Consumers
Tableau Desktop
Tableau Server
Server Architecture
Authentication
Server List
Denodo 9
denodo-solution-manager-prod-10.it.tufts.edu
denodo-dev-10.it.tufts.edu
denodo-stage-10.it.tufts.edu
denodo-prod-10.it.tufts.edu
denodo-prod-11.it.tufts.edu
Load Balancer
The load balancer can be found here: https://tssdc-lb-float-vip.net.tufts.edu/
You need to create a ticket requesting access if you want read or write access.
This page outlines how to create a health check monitor that can be used to remove inactive nodes from a pool: https://community.denodo.com/kb/en/view/document/Configuring%20a%20Denodo%20Cluster%20with%20BigIP%20F5
The load balancer currently uses the same health check defined for denodo_vdp
define here for stage and prod. If the health checks fails for a VM, it’s removed from the pool and added back in once it starts passing.
iRules
We can use iRules to direct the traffic to specific nodes based on URL.
Endpoints
URL Matrix
Backend
A database for each environment will be used for the purpose of holding the Denodo cache, as well as serve as the backend for the Scheduler and Data Catalog.
A cache in Denodo is useful for when you want to optimize performance for API-based data sources or complex views.
Denodo supports a variety of databases for caching, but Postgres is the recommended database.
The Denodo backend is currently colocated with the future cloud data warehouse in test, stage and prod Aurora Postgres clusters in the Prod AWS Landing Zone.
Database Setup
Deploy cluster via CloudFormation template
Run database setup script
Setup Cache Data Source in Denodo
The below screenshot illustrates the setup of the cache connection in the Denodo Design Studio. It should be placed in the admin VDB. The purpose of this connection is to query the cache during troubleshooting and to make setting up the VDP cache and making updates easier.
Once the connection is created, do the following:
Go to the Cache section of the Server Configuration
Set Cache Status to On.
Set Maintenance to off (see Cache Maintenance)
Select Specify an existing data source as cache, under Database select admin, and for Data source select denododb_cache.
While the connection can be promoted to the other environments via Solution Manager, the server configuration must be set manually, after the connection has been deployed, you only need to manually set it once, then you can deploy changes to the connection via Revisions. When deploying, make sure the set the environment variables for the connection appropriately.
Cache Migration
If the cache database ever needs to be migrated, make updates either directly in Dev, or deploy a change to admin.denododb_cache
. If deploying through solution manager, make sure to update the environment properties for the connection before deploying. Once the connection changes are made, restart the VDP servers.
After updating the cache connection, run the results of the following query on each server in an environment to reset the cache on all tables, then re-run the cache jobs for any views that have them.
Configure Scheduler
Configure Data Catalog
Maintenance
Cache Maintenance Tasks
A job called clean_all_database_caches
is deployed in Denodo Scheduler in each environment. It runs at 4 AM on Sunday each week.
You can see the results of each run with the following query:
Database Maintenance
Running VACUUM FULL
on Postgres can return unused space freed up by cache maintenance back to the OS.
This readme outlines the use of cron for postgres on the Denodo DB servers.
Monitoring
Cost Estimates
On-demand: Co-locates data warehouse and Denodo onto one set of clusters
Log Aggregation
Logs across all Denodo servers are aggregated into Amazon CloudWatch using the CloudWatch Agent.
Logs can be viewed here.
Logs prior to 2023-10-06 can be accessed in the DLT account. Logs from 2023-10-06 to 2023-12-20 can be accessed in the LZ Dev account. Logs after 2023-12-20 can be accessed in the LZ Prod account. The above link can be used to view Denodo logs in each account.
Logs are grouped by file into log groups starting with the prefix /denodo/
. After the /denodo/
prefix comes the server type prefix (such as /vdp/
or /solution-manager/
), followed by the log type.
Within each log group is a stream with a name that represents where the stream is coming from (such as denodo-dev-01.it.tufts.edu
).
When viewing logs, be sure that your log viewer is set to show columns and details. Don’t select View in plain text, since this will keep you from properly filtering on the timestamp.
The code used to setup and configure the CloudWatch Agent on the Denodo servers can be found here
SSL Certificates
Leveraging the work of Ned Harvey and team, we have a CA-signed certificate generated by LetsEncrypt automatically delivered to our Denodo servers every 90 days.
Certbot will start attempting to generate and deploy new certificates approximately 1 month prior to the expiration of existing certificates. Generated and deployment attempts are set to take place on Tuesday, Wednesday and Thursday at 6:40 am. With generation and deployment centrally managed in one workflow, there’s little control over the date and time, but there is an option to have the deployment script executed separately which would allow control over the specific day and time for that part of the process.
The certificate files will be deployed by certbot across all Denodo serversm, storing them at /home/certbot/certdir
. A cron job (managed by Ned’s team via puppet) runs every Tuesday and Wednesday and Thursday at 5AM which runs /home/denod/denodo/scripts/certbot_install_cert.sh
. Certificate deployment will only occur when the certificates are detected in the certbot directory.
All scripts can be found here.
Backups
Development VQL Backups
Every hour, a cron job on the dev server will query the webservice admin.get_all_vql
and store the resulting VQL file in tufts-denodo-prod in the PROD LZ. The bucket has versioning enabled, so each copy operation will create a new version of the file. A policy is in place to delete versions older than 30 days. This will allow us to access a copy of the servers VQL at every hour in the past 30 days. In addition to an hourly backup, a copy of the VQL at the beginning of each day will be saved as well, and kept for 365 days.
Backups prior to 2023-10-19 can be accessed in the denodo-backups bucket in the DLT as long as that account is active.
Backup script code
S3 bucket code
VM Backups
Handled by ESCP
All Systems: Nightly backup retained for 7 days
Prod Systems: Nightly file system backed up using networker, retained for 3 months
Oracle/DB Systems: Data dumped and saved to networker for 3 months
Version Control
Enabling VCS
Go to VCS Management in Design Studio
In the Databases tab, find the desired VDB and click the gear next to it under the Configure column
In the configuration window, do the following:
Check Use version control
Un-check Use default configuration
Enter the VDB name under Branch
Enter the token for tufts-denodo under Password. You can get the token from CyberArk.
Create a branch for the VDB name in the denodo-vcs repo using the the GitHub web UI or the following command if you have the repo on your machine.
Migrating VDB’s Into Their Own Branch
Our initial VCS setup had all VDB’s using the same branch in the same Github repository (denodo-vcs). This resulted in progressively degrading VCS performance in Denodo, as well as very a dense file system footprint since each VDB repository had the commits and files for every VDB.
To resolve this, each VDB should be assigned it’s own branch in the denodo-vcs repo. This process will require a migration that will result in a loss of commit history, but will result in improved performance for all.
The process is as follows:
Commit and push the whole VDB
Export the VDB, be sure to uncheck Enter Single User Mode for Import
Delete the local repository for the VDB from the dev VDP server
Create an empty branch in denodo-vcs that matches the VDB name
Update the VCS configuration for the VDB to use the new branch (you’ll need need the tufts_denodo token from CyberArk)
Perform a Pull for the VDB
Run the VQL script from the VDB export, be sure to remove
ENTER SINGLE USER MODE
andEXIT SINGLE USER MODE
, these commands don't work properly and cause the whole VDP server to freeze and require a restart.Create a commit for the VDB
Push the VDB
Monitoring
Denodo Monitoring
Global Monitoring has been enabled in the Solution Manager to export monitoring data for the VDP servers across all environments to the Aurora prod cluster. Monitoring was enabled on 2024-04-12 so data is available from that point onward.
The data is contained in the following tables in the denodo schema:
cache_notification
connection_notification
datasource_polling
query_datasources_notification
queryblock_notification
request_notification
resource_polling
thread_polling
Base views have been created for the above under
admin/01-base_layer/02-base_views/denodo_monitor
In addition to writing to the database, text log files are kept on the Solution Manager server at /opt/denodo/resources/solution-manager-monitor/work
as outlined here https://community.denodo.com/docs/html/browse/8.0/en/solution_manager/administration/monitoring/monitoring#:~:text=The%20Denodo%20Monitor%20generates%20the,solution-manager-monitor%2Fwork.
Log retention of 5 days has been put in place by following this guide: https://community.denodo.com/kb/en/view/document/How%20to%20configure%20the%20Denodo%20Monitor%20to%20delete%20old%20log%20files#:~:text=When%20Monitoring%20is%20started%20via,requirements%20and%20the%20OS%20used.
If you need to restart or update Solution Manager, be sure to start the monitors for each environment. This should also be done whenever there’s a certificate deployment.
Health Checks
Health checks for all Denodo services are performed every five minutes on each Denodo server. If a service is down or unresponsive an email will be sent to denodo-devs@elist.tufts.edu and it will be restarted.
Health check logs can be found in the AWS LZ Prod account here.
Health check script is available here.
On-Call Schedule
The On-Call Schedule is maintained in ServiceNow here. This schedule allows members of the scheduler roster to be notified, in order, whenever a ticket with urgency of High is created within ServiceNow.
If it’s your first time being on an on-call schedule in ServiceNow, you’ll want to fill the contact form here.
When an urgent incident is created in ServiceNow, ServiceNow will cycle through the roster, starting with the Primary, and send an SMS message like the following below:
SMS messages come from +18446360241, so make sure that you create a contact for that number so that it doesn’t get blocked.
The On-Call Calendar can be viewed here.
If you need to schedule time off, go the On-Call Calendar, select the first shift where your time off is occurring and click Manage Shift
In the window that appears, select Schedule time off and select the dates for your time-off.
Those who are on-call can refer to the HSST doc for Denodo shared with ESCP to help with troubleshooting: https://tuftswork.atlassian.net/wiki/spaces/EnterpriseSystems/pages/1301184579
Automated OS Patching
The OS for the Denodo VM’s will be automatically patched each month on the following schedule:
Non-prod: 1st Sunday 1AM
Prod: 3rd Sunday 12AM
In the event that patching needs to happen manually, run the following command as root:
This will patch the OS and restart the VM.
How To’s
Adding a user created function to Denodo via JAR file
https://github.com/Tufts-Technology-Services/denodo/blob/main/custom-functions/README.md
Changing Denodo User-Agent for HTTP requests
The VDP server needs to be restarted to take affect.
Updating Denodo
https://github.com/Tufts-Technology-Services/denodo/tree/main/scripts/denodo#updating-denodo
Get a list of all data source connections in Denodo
You can view a list of all data source connections on a Denodo instance with the following query:
To get additional details on certain types of connection, run this query:
Add an index to a cache table
Denodo allows you to add indexes to cache tables for improved performance. However, there appears to be a bug in Denodo where index creation will succeed on the DB-side but Denodo will throw an error or warning. While this warning can be ignored when creating indexes in Design Studio, it can cause schedule cache jobs to fail if a PK index needed to be created as part of the job (invalidating rows based on PK enabled). To get around this, simply create the indexes in Design Studio before running the job.
Note that adding indexes to views will cause an additional error when trying to save changes to the view via the Edit UI panel in the UI. This error prevents the view changes from persisting in Denodo, limiting you to making changes via VQL.
To add an index to a cache table, do the following:
Enable caching for a view
Add the index under Cache table indexes
Click Save (you’ll get a warning saying there was issue create the index)
ORA-00932: inconsistent datatypes: expected - got CLOB
Problem:
When running:
Get error:
Cause:
Solution:
Possible Solution 1: Casting
https://community.denodo.com/answers/question/details?questionId=90670000000XcumAAC
Leads to error:
Possible Solution 2: Server Config
Open Data Source (only for VDP Administration Tool) > Configuration > Source Configuration > Delegate group by clause = No
This would slow down performance as well as create extra load on the denodo servers by not pushing the query to oracle.
Recover from a DROP CASCADE
DROP CASCADE
is a very dangerous command that will not only drop the view listed, but all if it’s dependents as well. The dependents list can be very long, and recovery can be very cumbersome. Try to avoid the DROP
command at all costs, especially DROP CASCADE
.
In the event that you accidentally execute DROP CASCADE
, follow these steps to recover.
Get a list of all dependents of the view dropped and their cache status
Get the latest VQL backup for the environment where the drop occured. This could be a development backup in S3, or a deployment backup. See the sections above for their locations.
Assemble a VQL file containing the relevent
CREATE, ALTER, CHOWN, and ALTER ROLE
statements for each view. Be sure to includeCONNECT DATABASE
andCLOSE;
statements for each database.Create a revision in Solution Manager, validate and deploy.
See who has completed the Denodo training
Run the following query in the VQL shell, or access the view in the Data Catalog using the link below.
See who has access to views or columns (and their dependents)
The view admin.execute_permission_view
has been created to help audit who has execute access to to a view and/or their dependents.
For example, to see who has execute access to views with columns containing visa
or citizen
that isn’t a Denodo admin, you can run the following query:
Example result:
This view has been deployed in all environments, and be access in either Design Studio or Data Catalog
https://denodo-dev.it.tufts.edu:9443/denodo-data-catalog/#/view/admin/execute_permission_audit
User Guide
A user-facing guide has been created that contains all information relevant to non-Admins. The guide is available here: https://tuftswork.atlassian.net/wiki/spaces/DS/pages/534347809
The guide has a public link that should be used to share with the rest of the university
Tips
Lots of screenshots
Focus on Tufts-specific information then link to Denodo docs
Don’t include any sensitive information (secrets). The page has a public link that we use for sharing.