Hello, I'm

Chris John Agarap

Data Engineer

About Me

Data Engineer with experience building robust web applications, database models, and ETL pipelines. A constant learner who aims to help develop innovative solutions that positively impact the industry. I am good at:

  • Python
  • Django
  • Pandas
  • MySQL
  • Apache Spark
  • GCP
  • Hadoop
  • Kafka
  • Flask

What I do

Data Analysis

I use Python's Pandas, Matplotlib, NumPy, and Scikit-learn libraries, along with Orange, for data analysis. I can also use Google Cloud Platform services such as AutoML, BigQuery, and Looker Studio.

Database Management

I work with both relational and NoSQL databases, namely PostgreSQL, MySQL, SQL Server, SQLite, Oracle, and MongoDB. I also have experience querying and managing a Hive data warehouse.

Data Engineering

As a data engineer, I have experience with three programming languages: Python, Java, and Scala. The Google Cloud services I’ve used for large-scale data processing and management are Cloud Functions, Pub/Sub, Dataproc, and Cloud Composer.

Education

FEU Institute of Technology

Bachelor of Science in Information Technology with specialization in Service Management and Business Analytics

2015-2019
Achievements:

  • Academic Scholar, 2015-2019
  • Top Performing Student, 2018-2019

Immaculate Heart of Mary Academy

High School

2011-2015
Achievements:

  • Student Council Leadership Award, 2015
  • Campus Journalism Award, 2014-2015
  • Top Performing Student, 2012-2014

Work Experience

Catch AU

Data Engineer

February 2024 - present
Responsibilities:

  • Building ETL pipelines in PySpark
  • Data Ingestion using AWS Glue
  • Data Storage using S3
  • Data Warehousing with Redshift, Athena, and Snowflake

Collabera Digital

Streaming Developer

April 2023 - January 2024
Responsibilities:

  • Building data transformations in PySpark
  • Data Ingestion using Spark
  • Data Streaming using Kafka

Advance Intelligence Group

Data Engineer

Dec 2022 - Mar 2023
Responsibilities:

  • Data Ingestion using Google Cloud Functions and REST APIs
  • Data Pipeline monitoring with Google Cloud Composer and Cloud Scheduler
  • Data Visualization using Google Looker Studio
  • Data Warehousing with Google BigQuery and dbt

Indra Sistemas

Senior Analyst

July 2021 - Nov 2022
Responsibilities:

  • Apache Spark for Data Engineering
  • Maintenance of ETL Pipelines in Apache Airflow
  • SQL Development using Hive

Xurpas Enterprise, Inc.

Junior Software Developer

Sept 2019 - June 2021
Responsibilities:

  • Apache Spark for Data Engineering
  • API Development
  • Data Quality Testing
  • Designing Database Models and Schema
  • Web Application Development

Acudeen Technologies

Data Engineering Intern

Dec 2018 - Mar 2019
Responsibilities:

  • Designed the data warehouse schema for invoices, for use by the data analysts.
  • Developed and deployed the ETL pipeline for the Acudeen platform.

SYKES Asia Inc.

Data Analytics Intern

Aug 2018 - Nov 2018
Responsibilities:

  • Data Analysis
  • Data Annotation
  • Social Media Analytics

Projects

Animal Type Classifier using Scikit-Learn

The project predicts an animal's class from characteristics such as hair, feathers, backbone, and fins. The classifier was built with two machine learning algorithms: Logistic Regression and Decision Tree.

  • Python
  • Pandas
  • Git
  • Scikit-Learn
  • Jupyter Notebook



Python imports and dataset

Preparing and Training the data

Final dataset (with prediction) and data visualization

Calm Flight: Online Flight and Hotel Reservation System

Class Project for Web Applications Development 1

Calm Flight is a reservation system for domestic flights within the Philippines. Both customers and employees can log in. The customer login records all flight transactions, while the admin login is used to set flight destinations and handle hotel reservation transactions.

  • PHP
  • MySQL
  • HTML & CSS
  • JavaScript



Home Page

Flight Search Results

Registration

Login

COVID Data Pipeline

This project contains an end-to-end data pipeline for processing the Johns Hopkins University COVID-19 dataset. The pipeline includes a backfill covering January 2020 to March 2023 and consists of three main components: a data loader, a transformer, and a data exporter.
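As a rough illustration of how a backfill over that date range could drive the three components, here is a minimal sketch. It assumes monthly partitions. The function names (`month_starts`, `load_data`, `transform_data`, `export_data`, `backfill`) and the placeholder record are hypothetical and are not taken from the project.

```python
from datetime import date


def month_starts(start, end):
    """Yield the first day of each month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def load_data(partition):
    """Hypothetical loader: a real one would fetch that month's source data."""
    return [{"month": partition.isoformat(), "confirmed": "10"}]


def transform_data(records):
    """Cast the raw string counts to integers."""
    return [{**r, "confirmed": int(r["confirmed"])} for r in records]


def export_data(records, sink):
    """Hypothetical exporter: appends to a list instead of a warehouse table."""
    sink.extend(records)


def backfill(start, end):
    """Run loader -> transformer -> exporter once per monthly partition."""
    sink = []
    for partition in month_starts(start, end):
        export_data(transform_data(load_data(partition)), sink)
    return sink


if __name__ == "__main__":
    rows = backfill(date(2020, 1, 1), date(2023, 3, 1))
    print(len(rows), rows[0]["month"], rows[-1]["month"])
```

Keeping each partition independent means a failed month can be re-run on its own without repeating the whole backfill.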




Data Pipeline Architecture

DigiWiz: an Open-source Learning Platform

Class Project for Information and Software Assurance and Security

The United Nations found that around 265 million children are out of school and that approximately 22% of them are of primary school age. People, regardless of age, race, or gender, have the right to education. To ensure that everyone has access to education, the United Nations established quality education as one of its Sustainable Development Goals, a blueprint for achieving a sustainable future for all. The goal aims to provide inclusive, quality education for all and to promote lifelong learning.

DigiWiz, an open-source learning platform, was created to help support the United Nations’ goal for quality education. To evaluate the effectiveness of the system, 10 primary school students and 4 college students were asked to test it. Based on the respondents’ assessments, the majority agreed that the system met the criteria for functionality, performance, reliability, supportability/security, and usability.

The project documentation is provided below.

  • Python
  • Django
  • SQLite
  • HTML & CSS
  • JavaScript
  • ChartJS
  • Git



Home Page

Courses Page

Course Details Page

Admin Dashboard Page

ETL Pipeline for Acudeen Technologies

Project for Internship 2

The project extracts data from various database services, transforms it into a specific format, and loads it into SQL Server.
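The extract, transform, and load steps can be sketched as below. This is only an illustration, not the project's code: it uses Python's built-in `sqlite3` as a stand-in for both the source database and SQL Server (the project itself lists Pandas and SQLAlchemy). The table names, columns, and cleaning rules are hypothetical.

```python
import sqlite3


def make_demo_source():
    """Build an in-memory source database with a few hypothetical raw rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_invoices (id INTEGER, customer TEXT, amount TEXT)")
    conn.executemany(
        "INSERT INTO raw_invoices VALUES (?, ?, ?)",
        [(1, "  acme corp ", "120.50"), (2, "globex", None), (3, "initech", "75")],
    )
    conn.commit()
    return conn


def extract(conn):
    """Read raw invoice rows from the source database."""
    return conn.execute("SELECT id, customer, amount FROM raw_invoices").fetchall()


def transform(rows):
    """Normalize customer names, cast amounts, and drop rows with no amount."""
    return [
        (row_id, customer.strip().title(), round(float(amount), 2))
        for row_id, customer, amount in rows
        if amount is not None
    ]


def load(conn, rows):
    """Write cleaned rows to the target table, replacing existing ids."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO invoices VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(source, target):
    """Run the three steps in order and return the number of rows loaded."""
    rows = transform(extract(source))
    load(target, rows)
    return len(rows)


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    print(run_etl(make_demo_source(), target))
    print(target.execute("SELECT * FROM invoices ORDER BY id").fetchall())
```

Because the load step replaces rows by primary key, re-running the pipeline does not create duplicates.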

  • Python
  • Docker
  • Pandas
  • Git
  • SQLAlchemy
  • SQL Server



Online Ordering and Sales Management System with Customer Transaction Behavioral Analysis for TECHNOHOLICS

Capstone Project in partial fulfillment of the requirements for the Degree of Bachelor of Science in Information Technology with specialization in Service Management and Business Analytics.

The project is an e-commerce application that helps TECHNOHOLICS monitor transactions and forecast sales so the business can develop better marketing strategies based on customers’ purchase data. Using the Apriori algorithm, the system generates the frequent item sets for each customer and derives association rules from them to recommend what the customer might purchase next.
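For readers unfamiliar with Apriori, the sketch below shows the general technique in plain Python: count item sets level by level, keep those that meet a minimum support count, then derive rules that meet a minimum confidence. It is not the project's implementation (which was built in PHP). The sample baskets, function names, and thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical purchase baskets, one list per transaction.
TRANSACTIONS = [
    ["laptop", "mouse", "keyboard"],
    ["laptop", "mouse"],
    ["laptop", "keyboard"],
    ["mouse", "keyboard"],
    ["laptop", "mouse", "headset"],
]


def frequent_itemsets(transactions, min_count):
    """Return {itemset: count} for every itemset in at least min_count baskets."""
    baskets = [frozenset(t) for t in transactions]
    # Level 1 candidates: every individual item.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        # Count how many baskets contain each candidate.
        level = {}
        for candidate in candidates:
            count = sum(1 for basket in baskets if candidate <= basket)
            if count >= min_count:
                level[candidate] = count
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) tuples from frequent itemsets."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for subset in combinations(itemset, size):
                antecedent = frozenset(subset)
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


if __name__ == "__main__":
    itemsets = frequent_itemsets(TRANSACTIONS, min_count=2)
    for antecedent, consequent, confidence in association_rules(itemsets, 0.7):
        print(sorted(antecedent), "->", sorted(consequent), confidence)
```

A rule such as "laptop -> mouse" can then be surfaced as a recommendation when a customer's cart contains the antecedent items.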

  • PHP
  • CodeIgniter
  • MySQL
  • HTML & CSS
  • JavaScript
  • AJAX
  • ChartJS
  • Git



Home Page

Product Details Page

Shopping Cart Page

Admin Dashboard Page

Sales Forecast

Feedback Page

Payroll Management System

Class Project for Web Applications Development 2

  • PHP
  • CodeIgniter
  • MySQL
  • HTML & CSS
  • JavaScript
  • Git



Login Page

Payroll Data

Employee Data

About Page

Simple Chatbot using Flask and TensorFlow

The project's objective is to answer general queries from the user. Using a deep neural network built in TensorFlow, the chatbot learns from and interprets the text that the user enters.

  • Python
  • Flask
  • TensorFlow
  • NumPy



Social Media Analytics and Data Annotation

Project for Internship 1

The social media analytics project analyzes the feedback that people give the clients through social media and personal blogs.
The data annotation project, meanwhile, prepares data for analysis by the company's data analysts.

  • Microsoft Office



Student Record Management System

Class Project for Database Management 2

Student Record System is a management information system that educational institutions use to manage student data. Professors use it to record data for all students, and its main value is the ability to report student grades. A second benefit, particularly with automated systems, is efficiency in processing and exchanging student records among schools. When student records are added to an overall management information system that includes information on staff, materials, and budgeting for the school or school district, more management activities can be accomplished and efficiency improves. Student record systems thus play a key role in the overall functioning of the education system; more importantly, they increase a school's ability to meet the needs of students.

This project is a Java application that connects to an Oracle database.

  • Oracle
  • Git
  • Java
  • NetBeans



Entity Relationship Diagram

Twitter Sentiment Analysis

Class Project for Analytics Application

Social media has become an integral part of life, and the rise of different social networking platforms is inevitable. One example is Twitter, a free social networking site that allows people to share their thoughts and opinions through tweets. With millions of people tweeting every day, it is clear that behind those tweets are emotions that users express.

The software Orange was used to classify tweets based on Ekman’s six universal emotions and to identify whether the tweets are positive, negative, or neutral. A total of 1,000 tweets were gathered using the Twitter API via its content search query. Before the fetched data was analyzed, the tweets were preprocessed with the Preprocess Text widget to remove unnecessary words and punctuation marks that commonly appeared in a word cloud created beforehand. Sentiment analysis was used to determine and predict the emotions behind each user's tweets: the VADER method was used in the Sentiment Analysis widget, and Ekman’s emotion classifier was used in the Tweet Profiler widget. To visualize the results, a box plot, a distribution chart, and a heat map were used.

For the complete project overview, the documentation is provided below.

  • Orange Data Mining



Orange is an open-source data visualization, machine learning and data mining toolkit.

Project's Overall Workspace

Preprocessed data visualized using the Word Cloud widget

The results gathered from the analysis were visualized in a Box Plot

Certifications and Licenses

Apache Spark for Java Developers

Udemy

March 2021

Getting Grounded on Analytics

Coursebank

January 2021

Modernizing Data Lakes and Data Warehouses with GCP

Coursera

August 2020

Leveraging Unstructured Data with Cloud Dataproc on Google Cloud Platform

Coursera

January 2020

Google Cloud Platform Big Data and Machine Learning Fundamentals

Coursera

December 2019

Microsoft Virtual Academy: Introduction to Programming with Python

Microsoft

September 2018

MTA: Database Fundamentals - Certified 2018

Microsoft

July 2018

Seminars and Trainings

NodeJS Training

Xurpas, Inc.

February 2020

Python Summit 2020

FEU Tech JPCS

February 2020

ACM Next 2019

FEU Tech ACM

June 2019

AI Pilipinas Meetup #7

Senti Techlabs & Microsoft

February 2019

AI Pilipinas Meetup #6: ML + IoT, GAN, TensorFlow 2.0

Senti Techlabs

January 2019

Recommendations

I worked with Chris when he was an intern at Acudeen. Chris was knowledgeable in Python, and he used that in crafting and provisioning the ETL project at Acudeen. He is open to learning new technologies and libraries within Python and is curious about JavaScript as well. He is easy to work with and gets along with colleagues pretty well.

John Chamver Puno

worked with Chris John in different groups

I've had the opportunity to be on the same project with him numerous times, during which I was able to see his different skills and his good attitude toward the group. Skills like keen data analysis and superb database management are only some examples of why I believe Chris would be a perfect fit as a Data Engineer.

Rex Christian Baldonado

worked with Chris John in the same group

He has vast knowledge of different programming languages and has the motivation to learn more languages relevant to modern industry. I was a member of his group in one of our projects, and he showed not just great programming skills but also leadership skills.

Kervin Rollan

worked with Chris John in the same group