Lambda Integration

Project Plan: KualiBuild Document Submission Data Pipeline with AWS Lambda and Denodo

Project Goal: Develop a data pipeline to extract document submission data from KualiBuild using an AWS Lambda function, parse it with Python, and surface the data as a table in Denodo.

Point of Contact: Joe Strinka (jstrin01@tufts.edu)

Project Team:

  • AWS + Denodo Admin:

    • Sr Analytics Engineer: Joe Strinka

  • Python/Denodo Developer: 

    • Sr Software Engineer: Nick Gotch

    • Software Engineering Co-op: Andrew Laurence

  • KualiBuild Admin:

    • Applications Architect: Erol Irez

  • Subject Matter Experts:

    • Assistant Director of Knowledge Strategy & Operations: Phillippa Lehar (Affiliation Agreements)

Input/Output:

Given a list of KualiBuild Application IDs, surface one Denodo table per Application ID, with one row per submitted document and that document's fields as columns (a sample input mapping follows the example below).

Ex:

Table bv_kb_applicationName

document_id | col1  | col2
----------- | ----- | --------
1234        | lorem | ipsum
5678        | text  | numbers
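For illustration, the list of applications could be passed to the pipeline as a simple mapping of KualiBuild Application IDs to target Denodo view names. The IDs and view names below are placeholders, not real values.

Example (Python):

# Hypothetical configuration: KualiBuild Application ID -> Denodo view name.
# Both the IDs and the view names are placeholders.
APPLICATIONS = {
    "5f0000000000000000000001": "bv_kb_affiliation_agreement",
    "5f0000000000000000000002": "bv_kb_computer_request",
}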


Project Phases:

Phase 1: Design & Development

  1. Data Definition: 

    • List of Applications to start with: 

      1. Affiliation Agreement

      2. Computer Request

  2. AWS Lambda Development:

    • Develop a Python script that queries KualiBuild for document submission data, using the requests library against the KualiBuild API (if available).

    • Parse the retrieved data (e.g., JSON, XML) and extract document IDs, column names, and their corresponding values.

    • Structure the extracted data into a tabular format (e.g., a pandas DataFrame); a sketch of this step follows at the end of this phase.

  3. Denodo Virtual DataPort (VDP) Design:

    • Design a Denodo VDP base view to consume the data produced by the Lambda function.

    • Define the schema of the view to match the tabular format created in the Lambda function.
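As referenced above, a minimal sketch of the extraction handler is shown below. It assumes the KualiBuild API returns document submissions as JSON over REST and that the host and token are supplied via environment variables; the endpoint path, the "data" field name, and the pandas-based flattening are illustrative assumptions, not the final design. (requests and pandas would need to ship with the deployment package or a Lambda layer.)

Example (Python):

import json
import os

import pandas as pd
import requests

# Assumed configuration; the actual host and auth scheme come from the KualiBuild admin.
KUALIBUILD_BASE_URL = os.environ.get("KUALIBUILD_BASE_URL", "https://example.kualibuild.com")
KUALIBUILD_TOKEN = os.environ.get("KUALIBUILD_TOKEN", "")


def fetch_submissions(app_id):
    """Query KualiBuild for one application's document submissions (endpoint path is hypothetical)."""
    response = requests.get(
        f"{KUALIBUILD_BASE_URL}/api/apps/{app_id}/documents",
        headers={"Authorization": f"Bearer {KUALIBUILD_TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def to_table(submissions):
    """Flatten each submission into one row: document_id plus one column per form field."""
    rows = []
    for doc in submissions:
        row = {"document_id": doc.get("id")}
        row.update(doc.get("data", {}))  # assumed key holding the form's field values
        rows.append(row)
    return pd.DataFrame(rows)


def lambda_handler(event, context):
    """Lambda entry point: extract one application's submissions and return tabular JSON."""
    app_id = event["application_id"]
    table = to_table(fetch_submissions(app_id))
    return {
        "statusCode": 200,
        "body": json.dumps(table.to_dict(orient="records"), default=str),
    }

The VDP base view from step 3 would then mirror the keys of these JSON records: document_id plus one column per form field.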

Phase 2: Deployment & Testing

  1. AWS Lambda Deployment:

    • Deploy the developed Lambda function to AWS.

    • Configure the Lambda function to be triggered by a scheduled event (e.g., daily) or an external trigger (if applicable).

  2. Denodo Consumption Infrastructure:

    • Create the consumption infrastructure in Denodo based on the design from Phase 1.

    • Configure the Virtual DataPort data source to call the deployed Lambda function.

  3. Testing & Validation:

    • Test the Lambda function to ensure it retrieves and parses data correctly.

    • Validate the Denodo VDP by querying the exposed data and verifying the structure and content.
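One way to cover the testing step is a small smoke test that invokes the deployed function and checks the shape of its payload; the function name, test Application ID, and expected columns below are assumptions matching the sketch in Phase 1.

Example (Python):

import json

import boto3

# Hypothetical function name and Application ID; substitute the deployed values.
FUNCTION_NAME = "kualibuild-document-extract"
TEST_EVENT = {"application_id": "5f0000000000000000000001"}


def smoke_test():
    """Invoke the deployed Lambda synchronously and verify the tabular response shape."""
    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=FUNCTION_NAME,
        Payload=json.dumps(TEST_EVENT).encode("utf-8"),
    )
    payload = json.loads(response["Payload"].read())
    records = json.loads(payload["body"])

    assert payload["statusCode"] == 200
    assert isinstance(records, list) and records, "expected at least one document row"
    assert "document_id" in records[0], "every row should carry a document_id column"
    print(f"OK: {len(records)} rows, columns: {sorted(records[0].keys())}")


if __name__ == "__main__":
    smoke_test()

Running the same checks against the exposed Denodo view then validates that the VDP schema matches the Lambda output.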

Phase 3: Documentation & Handover

  1. Documentation:

    • Document the end-to-end data pipeline, including the Lambda function code, the Denodo Virtual DataPort configuration, and the data schema definitions.

    • Include instructions for deployment, testing, and ongoing maintenance.

Deliverables:

  • Deployed AWS Lambda function for data extraction and parsing

  • Denodo exposing the document submission data as a set of tables

    • bv_kb_applicationName

  • Project documentation

    • Instructions on adding new Application IDs

    • Instructions on troubleshooting

      • Instructions on deploying the Lambda function

Success Criteria:

  • Successful retrieval and parsing of document submission data from KualiBuild using the Lambda function.

  • Denodo successfully surfaces the data in a tabular format with all associated columns and values.

  • Data pipeline runs smoothly on a scheduled basis.

    • Queries return in under 2 seconds when consumed by end users.

      • If longer, let's discuss caching or a daily computation.

    • Queries return up-to-date information.

      • Real-time is ideal, depending on performance.
