Installing Tagetik Data Loader with CloudFormation
Sending data from an S3 bucket to Tagetik SaaS made easy
What is Tagetik Data Loader?
Tagetik Data Loader (TDL) is software that loads data from a database or a file into Tagetik SaaS. TDL is installed on a Windows or Linux server; this article shows how an S3 bucket is mounted as a directory inside a Linux instance. That directory is specified as an endpoint in the Tagetik Data Loader Agent, so after installation Tagetik SaaS can fetch the files from the S3 bucket.
Architecture of the solution
The solution consists of six elements: two S3 buckets (one for installation files, one for files used by Tagetik SaaS), an EC2 instance, an Elastic IP for the instance, a Secrets Manager secret for storing passwords, and Tagetik SaaS itself.
Of these, the S3 bucket for installation files, the Secrets Manager secret and the Elastic IP have to be created manually; everything else is automated through the CloudFormation template. The template requires a VPC with a connection to the internet, and the default VPC works fine for that.
Prerequisites
- An existing S3 bucket with the installation package and HTTPS certificates at its root
- An existing Elastic IP that has been whitelisted on the Tagetik side. Send the IP to Tagetik Support for whitelisting, and use the EIP address as a stack parameter.
- Passwords for the certificate, the keystore and the connection, stored in AWS Secrets Manager.
The installation package, the certificate, the keystore and their passwords are provided to you by Tagetik Support. The connection password you have to create yourself.
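The manual prerequisites can be sketched with the AWS CLI as below. The bucket name, region and file names are placeholders, not values from the actual installation; substitute your own.

```shell
# Create the installation bucket and upload the files Tagetik Support provided
# (file names here are illustrative placeholders).
aws s3 mb s3://my-tdl-install-bucket --region eu-west-1
aws s3 cp tagetik-data-loader.zip s3://my-tdl-install-bucket/
aws s3 cp certificate.p12 s3://my-tdl-install-bucket/

# Allocate the Elastic IP; note the PublicIp in the output, send it to
# Tagetik Support for whitelisting, and use it as a stack parameter.
aws ec2 allocate-address --domain vpc
```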
Adding Secrets to Secrets Manager
Secrets can be created either in the AWS Console or with a CloudFormation template. To add the secrets via the console, open the Secrets Manager service, press “Store a new secret” and select “Other type of secret”. The template expects the keys “tagetikkeystorepassword”, “tagetikcertificatepassword” and “tagetikconnectionpassword” as key/value pairs, and it is possible to use a single secret by adding multiple key/value rows to it.
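The same single secret can also be created from the command line. The secret name below is a placeholder; the key names are the ones the template expects.

```shell
# Build the secret value locally first so you can sanity-check the key names
# before uploading (the password values are placeholders).
secret_json='{"tagetikkeystorepassword":"REPLACE_ME","tagetikcertificatepassword":"REPLACE_ME","tagetikconnectionpassword":"REPLACE_ME"}'
echo "$secret_json" | python3 -m json.tool

# Then store it (requires AWS credentials; the secret name is a placeholder):
# aws secretsmanager create-secret --name tagetik/tdl-passwords --secret-string "$secret_json"
```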
Permissions to read secrets stored in Secrets Manager can be granted in two different ways: either on the secret itself (a resource policy), or on the resource that queries the secret (an identity policy). This template uses the latter approach, specifying a policy on the EC2 role that allows the instance to fetch the passwords from Secrets Manager. Note that using a custom KMS key requires additional decryption permissions; the template assumes that you’re using the default KMS key (aws/secretsmanager)!
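A minimal sketch of what such an identity policy on the EC2 role can look like in CloudFormation. This fragment is illustrative only: the resource names and the parameter reference are assumptions, not copied from the actual template.

```yaml
# Illustrative fragment -- logical names and the TagetikSecretArn parameter
# are assumed for this example.
EC2Role:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal: { Service: ec2.amazonaws.com }
          Action: sts:AssumeRole
    Policies:
      - PolicyName: ReadTagetikSecrets
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action: secretsmanager:GetSecretValue
              Resource: !Ref TagetikSecretArn  # stack parameter (assumed name)
```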
Installation of the stack
Verify that the prerequisites are fulfilled and create the stack from the template found at the bottom of the page. Fill in the required parameters.
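Stack creation can also be scripted with the AWS CLI. Apart from “connectionpassword” (mentioned later in this article), the parameter keys and values below are illustrative assumptions; use the names defined in the template.

```shell
# Template file name and most parameter keys are placeholders.
aws cloudformation create-stack \
  --stack-name tagetik-data-loader \
  --template-body file://tdl-template.yaml \
  --capabilities CAPABILITY_IAM \
  --parameters \
    ParameterKey=installbucket,ParameterValue=my-tdl-install-bucket \
    ParameterKey=elasticip,ParameterValue=203.0.113.10 \
    ParameterKey=connectionpassword,ParameterValue=tagetik/tdl-passwords
```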
After the installation, connect to the instance to run the required manual steps. Apache Karaf requires manual intervention on the first run, but after that, and after setting up TDL as a Linux service, the service should survive server restarts without problems. Note that you have to run bin/karaf as root!
Using AWS Systems Manager Session Manager is the easiest way to connect to the newly created instance: go to the EC2 console, select the instance, press Connect, choose Session Manager, and you’re in.
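The session can also be opened from the CLI, assuming the Session Manager plugin is installed; the instance ID is a placeholder. Session Manager sessions start as the ssm-user account, so switch to root before running Karaf.

```shell
# Requires the session-manager-plugin; instance ID is a placeholder.
aws ssm start-session --target i-0123456789abcdef0

# Inside the session: become root, since bin/karaf must run as root.
sudo su -
```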
After connecting to the instance, run the following commands to start Karaf, install the TDL wrapper so TDL can run as a service, then start the service and enable it so that it keeps running even after the instance restarts.
# Start the Karaf console (as root)
/TagetikDataLoader/dev/apache-karaf/bin/karaf
# Inside the Karaf console: install the service wrapper, then exit
wrapper:install --name "TDL" --display "TDL" --description "TDL"
logout
# Back in the shell: enable and start the generated service unit
systemctl enable /TagetikDataLoader/dev/apache-karaf/bin/TDL.service
systemctl start TDL
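To confirm that the service is running and will come back after a reboot, a quick check such as this should suffice (standard systemd commands, nothing TDL-specific):

```shell
systemctl status TDL        # should report "active (running)"
systemctl is-enabled TDL    # should print "enabled"
```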
Setting up the connection to TDL inside Tagetik SaaS
To set up the connection to TDL, you must first add a tile for specifying the endpoint, because by default TDL is not visible in the UI.
After adding the TDL tile, click it and open the settings panel. Click + and add the information used during the TDL installation. The “Tagetik Data Loader ID” is generated by the software and can be found in tgk.agent.cfg inside your installation folder (in this example, “/TagetikDataLoader/dev/apache-karaf/”). The password inside that file is encrypted when Karaf starts, so you can’t copy it from the file; as the authentication code, use the password stored at the Secrets Manager path you specified in the CloudFormation parameter “connectionpassword”. As the “Tagetik Data Loader Name”, use the service name registered with the operating system, which is “TDL” by default.
After setting up the connection to TDL, you have to specify an endpoint for ETL operations. Create the endpoint tile the same way you added the TDL tile, and use it to register TDL as the endpoint. Once created, the endpoint is available in ETL jobs.
CloudWatch Alarms
If you set the “createcloudwatchmonitoring” parameter to “true”, the stack creates metrics and alarms for instance memory, disk and CPU usage. The software should clear the tmp folder automatically after transmitting the files, but in case of a memory leak or a software malfunction, alarms are sent to the escalation topic listed in the Stack Outputs. Subscribe an endpoint to the topic (see the AWS documentation) to be notified of the alarms.
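Subscribing an email endpoint to the escalation topic can be done with one CLI call. The topic ARN below is a placeholder; copy the real one from the Stack Outputs.

```shell
# Topic ARN and email address are placeholders; SNS sends a confirmation
# email that must be accepted before notifications are delivered.
aws sns subscribe \
  --topic-arn arn:aws:sns:eu-west-1:123456789012:tdl-escalation \
  --protocol email \
  --notification-endpoint ops@example.com
```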
Note: I’m not associated with Tagetik or working for them.