Installing Tagetik Data Loader with CloudFormation

Niilo Remes
Dec 13, 2022


Sending data from an S3 bucket to Tagetik SaaS made easy

What is Tagetik Data Loader?

Tagetik Data Loader (TDL) is a piece of software that loads data from a database or file into Tagetik SaaS. TDL is installed on a Windows or Linux server, and this post shows how an S3 bucket is mounted into a directory inside a Linux instance. That directory is then specified as an endpoint in the Tagetik Data Loader Agent, and after installation Tagetik SaaS can fetch the files from the S3 bucket.
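The mounting itself is handled by the CloudFormation template, so you don't have to do it by hand. Purely as an illustration of the idea, mounting the data bucket into a local directory could be done with a tool such as s3fs-fuse; the bucket name and mount path below are assumptions, and the template's actual mechanism may differ.

# Illustrative only: mount an S3 bucket into a local directory with s3fs-fuse
# (bucket name and mount path are placeholders; the template may use another mechanism)
sudo mkdir -p /TagetikDataLoader/data
sudo s3fs my-tagetik-data-bucket /TagetikDataLoader/data -o iam_role=auto -o allow_other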

Architecture of the solution

The solution consists of six elements: two S3 buckets (one for installation files, one for files used by Tagetik SaaS), one EC2 instance, an Elastic IP for the EC2 instance, a Secrets Manager secret for storing passwords, and Tagetik SaaS.

Of these, the S3 bucket for installation files, the Secrets Manager secret, and the Elastic IP have to be created manually; everything else is automated through the CloudFormation template. The template requires a VPC with a connection to the internet, and the default VPC works fine for that.

Architecture of the solution. Resources outside the box have to be created before creating the stack.

Prerequisites

- An existing S3 bucket with the installation package and HTTPS certificates at its root

- An existing Elastic IP that is whitelisted on the Tagetik side. Send the IP to Tagetik Support for whitelisting, and use the EIP address as a stack parameter.

- Passwords for the certificate, keystore, and connection stored in AWS Secrets Manager.

The installation package, certificate, keystore, and their passwords are provided to you by Tagetik Support; the connection password you have to create yourself.
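If you create the installation bucket manually, the files can be placed at its root with the AWS CLI, for example like this (the bucket and file names are placeholders; use the ones provided by Tagetik Support):

# Upload the installation package and HTTPS certificates to the bucket root
aws s3 cp tagetik-data-loader-install.zip s3://my-tdl-install-bucket/
aws s3 cp certificate.p12 s3://my-tdl-install-bucket/
aws s3 cp keystore.jks s3://my-tdl-install-bucket/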

Adding Secrets to Secrets Manager

Secrets can be created either in the AWS Console or with a CloudFormation template. In the console, go to the Secrets Manager service, press the “Store a new secret” button, and select “Other type of secret”. The template expects the keys “tagetikkeystorepassword”, “tagetikcertificatepassword” and “tagetikconnectionpassword” as key/value pairs, and it is possible to use the same secret for all of them by adding multiple key/value rows to it.

Console view of creating a new secret in AWS Secrets Manager
Use the secret names as parameters for the CFN template
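If you prefer the CLI over the console, the same secret can be created with one command. The secret name below is just an example, but the three keys are the ones the template expects:

# Store all three passwords as key/value pairs in a single secret
# (secret name and password values are placeholders)
aws secretsmanager create-secret \
  --name tagetik/tdl-passwords \
  --secret-string '{"tagetikkeystorepassword":"<keystore password>","tagetikcertificatepassword":"<certificate password>","tagetikconnectionpassword":"<connection password>"}'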

Permissions to use secrets stored in Secrets Manager can be granted in two different ways: either on the secret itself, or on the resource that queries the secret. This template uses the latter approach by attaching a policy to the EC2 role, which allows the EC2 instance to fetch the passwords from Secrets Manager. Note that using a custom KMS key requires additional decryption permissions; the template assumes you're using the default KMS key (aws/secretsmanager)!
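With that policy in place, the instance role can read the secret values. If you want to verify this yourself from a shell on the instance, something along these lines should work (the secret name is a placeholder and jq is assumed to be installed):

# Fetch the connection password from Secrets Manager using the instance role
aws secretsmanager get-secret-value \
  --secret-id tagetik/tdl-passwords \
  --query SecretString --output text | jq -r '.tagetikconnectionpassword'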

Installation of the stack

Verify that the prerequisites are fulfilled and create the stack from the template found at the bottom of the page. Fill in the required parameters.
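The console works fine for this, but as a rough sketch the stack can also be created with the AWS CLI. Only the parameters connectionpassword, s3databucketname and createcloudwatchmonitoring are named in this article, so treat the snippet below as illustrative and check the template itself for the full parameter list:

# Create the stack from the downloaded template
# (CAPABILITY_NAMED_IAM may be required instead, depending on how the role is named)
aws cloudformation create-stack \
  --stack-name tagetik-data-loader \
  --template-body file://tagetik-data-loader.yaml \
  --capabilities CAPABILITY_IAM \
  --parameters \
    ParameterKey=s3databucketname,ParameterValue=my-tagetik-data-bucket \
    ParameterKey=connectionpassword,ParameterValue=tagetik/tdl-passwords \
    ParameterKey=createcloudwatchmonitoring,ParameterValue=true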

After the installation, connect to the instance to run the required manual steps: Apache Karaf needs manual intervention on its first run, but once that is done and TDL is set up as a Linux service, the service should handle server restarts without problems. Note that you have to run bin/karaf as root!

Using AWS Systems Manager Session Manager is the easiest way to connect to the newly created instance. Go to the EC2 console, select the instance, press Connect, choose Session Manager, and you're in.

Connecting to the EC2 instance from the EC2 console
Session Manager requires the SSM Agent to be installed and running on the instance, as well as the correct permissions on the EC2 IAM role
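The same session can also be opened from your own terminal with the AWS CLI, provided the Session Manager plugin for the CLI is installed; the instance ID below is a placeholder:

# Open a Session Manager shell to the instance from your workstation
aws ssm start-session --target i-0123456789abcdef0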

After connecting to the instance, run the following commands to start Karaf, install the TDL wrapper so TDL can run as a service, enable the service, and start it so it keeps running even after the instance restarts.

# Start the Karaf console (run as root)
/TagetikDataLoader/dev/apache-karaf/bin/karaf
# Inside the Karaf console: generate the service wrapper, then exit
wrapper:install --name "TDL" --display "TDL" --description "TDL"
logout
# Back in the shell: enable the generated systemd unit and start the service
systemctl enable /TagetikDataLoader/dev/apache-karaf/bin/TDL.service
systemctl start TDL
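To check that the service really is registered and running, and to follow its logs, something like this should do:

# Verify that the TDL service is enabled and running
systemctl is-enabled TDL
systemctl status TDL
# Follow the service logs
journalctl -u TDL -f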

Setting up the connection to TDL inside Tagetik SaaS

To set up the connection to TDL, you must first add the option for specifying the endpoint to the UI, because by default TDL is not visible there.

Create a new icon with the + button, and save the view with the Save icon
Select Tagetik Data Loader from the list
After adding the Tagetik Data Loader, your view should look like this

After adding the TDL tile, click it and open the settings panel. Click + and add the information used during the TDL installation. The “Tagetik Data Loader ID” is generated by the software and can be found in tgk.agent.cfg inside your installation folder (in my example “/TagetikDataLoader/dev/apache-karaf/”). The password inside that file is encrypted when Karaf starts, so you cannot use the file content as the password: as the Authentication code, use the password stored under the Secrets Manager path you specified in the CloudFormation parameter “connectionpassword”. As the “Tagetik Data Loader Name”, use the name of the service registered with the operating system, which by default is “TDL”.

Contents of the tgk.agent.cfg file on the EC2 instance. Note that the secret you see here is encrypted!
Adding the Tagetik Data Loader into Tagetik SaaS. Use the TDL ID found in tgk.agent.cfg, the name from the wrapper you created earlier (TDL), and the connectionpassword from Secrets Manager as the Authentication code
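If you prefer the command line over the UI screenshot, the ID can also be read straight from the file in a Session Manager shell; the exact subdirectory of tgk.agent.cfg may vary, so searching for it is the safest bet:

# Print tgk.agent.cfg wherever it lives under the installation folder
sudo find /TagetikDataLoader/dev/apache-karaf -name tgk.agent.cfg -exec cat {} \;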

After setting up the connection for TDL, you have to specify an endpoint for ETL operations. Create the endpoint tile the same way you added the TDL tile, and use it to add TDL as the endpoint. After creating the endpoint, you can use it in ETL jobs.

Adding the endpoint to Tagetik SaaS. As the path, you should see the path specified in the CloudFormation parameter “s3databucketname”
Download the YAML to your computer and use it to create the stack

CloudWatch Alarms

If you set the “createcloudwatchmonitoring” parameter to “true”, the stack creates metrics and alarms for instance memory, disk, and CPU usage. The software should clear the tmp folder automatically after transmitting the files, but in case of a memory leak or software malfunction, alarms are sent to the escalation topic listed in the stack outputs. Subscribe to the topic (the AWS documentation walks you through it) to be notified of the alarms.
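Subscribing an e-mail address to the topic is a single CLI call; the topic ARN comes from the stack outputs and the values below are placeholders (remember to confirm the subscription from the e-mail AWS sends you):

# Subscribe an e-mail address to the escalation topic from the stack outputs
aws sns subscribe \
  --topic-arn arn:aws:sns:eu-west-1:123456789012:tdl-escalation-topic \
  --protocol email \
  --notification-endpoint you@example.com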

Note: I’m not associated with Tagetik or working for them.
