Lambda Integration
Project Plan: KualiBuild Document Submission Data Pipeline with AWS Lambda and Denodo
Project Goal: Develop a data pipeline to extract document submission data from KualiBuild using an AWS Lambda function, parse it with Python, and surface the data as a table in Denodo.
Point of Contact: Joe Strinka (jstrin01@tufts.edu)
Project Team:
AWS + Denodo Admin:
Sr Analytics Engineer: Joe Strinka
Python/Denodo Developer:
Sr Software Engineer: Nick Gotch
Software Engineering Co-op: Andrew Laurence
KualiBuild Admin:
Applications Architect: Erol Irez
Subject Matter Experts:
Assistant Director of Knowledge Strategy & Operations: Phillippa Lehar (Affiliation Agreements)
Input/Output:
Given a list of KualiBuild application IDs,
Surface one Denodo table per application ID, with one row per document and the document's fields as columns.
Ex:
Table bv_kb_applicationName
document_id | col1  | col2    | … |
1234        | lorem | ipsum   | … |
5678        | text  | numbers | … |
Project Phases:
Phase 1: Design & Development
Data Definition:
List of Applications to start with:
Affiliation Agreement
Computer Request
AWS Lambda Development:
Develop a Python script that queries KualiBuild for document submission data, using a library such as requests or the KualiBuild API (if available).
Parse the retrieved data (e.g., JSON, XML) and extract document IDs, column names, and their corresponding values.
Structure the extracted data into a tabular format (e.g., pandas DataFrame).
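The parsing and structuring steps above might look like the sketch below. It assumes each KualiBuild document arrives as a JSON object with an `id` and a `data` mapping of field names to values. That shape, the sample payload, and the function name `documents_to_rows` are illustrative assumptions, not the confirmed KualiBuild response format.

```python
import json

# Hypothetical payload shape; the real KualiBuild response may differ.
SAMPLE_PAYLOAD = """
[
  {"id": "1234", "data": {"col1": "lorem", "col2": "ipsum"}},
  {"id": "5678", "data": {"col1": "text", "col3": 7}}
]
"""


def documents_to_rows(documents):
    """Flatten documents into uniform rows: one row per document,
    one column per field name seen in any document."""
    # Collect column names in first-seen order, document_id first.
    columns = ["document_id"]
    for doc in documents:
        for name in doc.get("data", {}):
            if name not in columns:
                columns.append(name)

    # Build one row per document; fields a document lacks become None.
    rows = []
    for doc in documents:
        data = doc.get("data", {})
        row = {"document_id": doc["id"]}
        for name in columns[1:]:
            row[name] = data.get(name)
        rows.append(row)
    return columns, rows


if __name__ == "__main__":
    columns, rows = documents_to_rows(json.loads(SAMPLE_PAYLOAD))
    print(columns)
    for row in rows:
        print(row)
```

If a DataFrame is preferred, the returned list of dictionaries can be passed directly to `pandas.DataFrame(rows)`.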
Denodo Virtual DataPort (VDP) Design:
Design a Denodo VDP to consume the data produced by the Lambda function.
Define the schema of the VDP to match the tabular format created in the Lambda function.
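One way to keep the Denodo schema aligned with the Lambda output is to derive a column-to-type summary from the rows and compare it against the view definition. The sketch below is a hypothetical helper: the function name `infer_schema` and the type labels (`text`, `int`, `double`, `boolean`) are illustrative and would need to be mapped to whatever types the Denodo view actually uses.

```python
def _kind(value):
    """Map a Python value to an illustrative type label."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    return "text"


def infer_schema(rows):
    """Return {column_name: type_label} for a list of row dictionaries.

    Columns with mixed types, or with no non-null values, fall back to "text".
    """
    schema = {}
    for row in rows:
        for name, value in row.items():
            if value is None:
                # Remember the column, but let a later non-null value decide.
                schema.setdefault(name, None)
                continue
            kind = _kind(value)
            previous = schema.get(name)
            if previous is None:
                schema[name] = kind
            elif previous != kind:
                schema[name] = "text"
    return {name: kind or "text" for name, kind in schema.items()}
```

Falling back to `text` for mixed or empty columns is a conservative choice for this sketch; a stricter pipeline might instead reject the batch.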
Phase 2: Deployment & Testing
AWS Lambda Deployment:
Deploy the developed Lambda function to AWS.
Configure the Lambda function to be triggered by a scheduled event (e.g., daily) or an external trigger (if applicable).
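A minimal sketch of the Lambda entry point is shown below. The `lambda_handler(event, context)` signature is the standard Python Lambda handler form. Everything else is an assumption for illustration: the `application_ids` event key, the `APPLICATION_IDS` environment variable, and the `fetch_documents` placeholder, which stands in for the KualiBuild query that this plan does not yet specify.

```python
import json
import os


def fetch_documents(application_id):
    """Placeholder for the KualiBuild query (hypothetical; not specified in this plan)."""
    raise NotImplementedError("KualiBuild query not implemented in this sketch")


def lambda_handler(event, context, fetch=fetch_documents):
    """Return one list of rows per application ID as a JSON body.

    Application IDs come from the event (assumed key: application_ids) or,
    failing that, from a comma-separated APPLICATION_IDS environment variable.
    The extra `fetch` parameter exists only so the sketch can be tested
    without calling KualiBuild.
    """
    event = event or {}
    ids = event.get("application_ids")
    if ids is None:
        raw = os.environ.get("APPLICATION_IDS", "")
        ids = [item.strip() for item in raw.split(",") if item.strip()]

    tables = {app_id: fetch(app_id) for app_id in ids}

    # Response shape used by HTTP-style Lambda integrations.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(tables),
    }
```

With a scheduled trigger, the list of IDs could be supplied as a constant JSON input on the schedule, so new applications can be added without redeploying the function.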
Denodo Consumption Infrastructure:
Create the consumption infrastructure in Denodo based on the design from Phase 1.
Configure the Denodo VDP to call the deployed Lambda function as its data source.
Testing & Validation:
Test the Lambda function to ensure it retrieves and parses data correctly.
Validate the Denodo VDP by querying the exposed data and verifying the structure and content.
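The structural checks in this step could be automated along the lines of the sketch below. It assumes rows are dictionaries keyed by column name with a `document_id` column, as in the example table; the function name `validate_rows` and the specific checks are illustrative, not an agreed test plan.

```python
def validate_rows(columns, rows):
    """Return a list of human-readable problems; an empty list means the rows pass."""
    problems = []
    seen_ids = set()
    expected = set(columns)
    for index, row in enumerate(rows):
        # Every row should expose exactly the expected columns.
        if set(row) != expected:
            problems.append(f"row {index}: columns {sorted(row)} do not match {sorted(expected)}")
        # Every row needs a unique, non-empty document_id.
        doc_id = row.get("document_id")
        if doc_id is None or doc_id == "":
            problems.append(f"row {index}: missing document_id")
        elif doc_id in seen_ids:
            problems.append(f"row {index}: duplicate document_id {doc_id!r}")
        else:
            seen_ids.add(doc_id)
    return problems
```

The same function can be run against the Lambda output and against rows queried back from Denodo, so both sides of the pipeline are held to one definition of a valid table.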
Phase 3: Documentation & Handover
Documentation:
Document the entire data pipeline, including the Lambda function code, the Denodo VDP configuration, and the data schema definitions.
Include instructions for deployment, testing, and ongoing maintenance.
Deliverables:
Deployed AWS Lambda function for data extraction and parsing
Denodo exposing the document submission data as a set of tables
bv_kb_applicationName
Project documentation
Instructions on adding new application IDs
Instructions on troubleshooting
Instructions on deploying the Lambda function
Success Criteria:
Successful retrieval and parsing of document submission data from KualiBuild using the Lambda function.
Denodo successfully surfaces the data in a tabular format with all associated columns and values.
Data pipeline runs smoothly on a scheduled basis.
End-user queries return in under 2 seconds.
If queries take longer, discuss caching or a daily precomputation.
Queries return up-to-date information.
Real time is ideal, depending on performance.
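The 2-second criterion can be checked with a small timing wrapper such as the sketch below. The helper name `within_budget` and the `run_query` callable are hypothetical; `run_query` stands in for whatever client call an end user would make against Denodo.

```python
import time


def within_budget(run_query, budget_seconds=2.0):
    """Run a query callable and report whether it finished within the budget.

    Returns (result, elapsed_seconds, ok).
    """
    start = time.perf_counter()
    result = run_query()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed < budget_seconds
```

A single measurement is noisy, so repeating the check several times, including right after a cold start, would give a fairer basis for the caching discussion.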