Big Data

Smarter data solutions built on best practice, combining the strengths of data lakes, data marts and data warehouses for the flow, integration, processing, preparation and analysis of data to deliver value-driven insights.

DATA FLOW

PySpark development within a Lambda- or Kappa-style architecture, supporting both event-based streaming and batch processing. Technologies include Kafka, Spark Streaming and IoT data sources. ACID transactions with Delta Lake (delta.io) and Apache Hudi.
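As a rough illustration of the Lambda idea, the sketch below keeps a batch view (recomputed from the full event history) and a speed-layer view (updated per event), and merges the two at query time. It uses plain Python rather than PySpark or Kafka so it runs anywhere; the names `batch_view`, `SpeedLayer` and `query`, and the sensor keys, are hypothetical and chosen only for this example.

```python
from collections import Counter
from typing import Iterable, Tuple

# An event is (key, increment), e.g. ("sensor-a", 1). Hypothetical shape.
Event = Tuple[str, int]


def batch_view(master_events: Iterable[Event]) -> Counter:
    """Batch layer: recompute totals from the full, immutable event history."""
    view: Counter = Counter()
    for key, increment in master_events:
        view[key] += increment
    return view


class SpeedLayer:
    """Speed layer: incrementally track events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        key, increment = event
        self.view[key] += increment

    def reset(self) -> None:
        # Called once a new batch view has absorbed these events.
        self.view.clear()


def query(key: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the batch and real-time views for one key."""
    return batch[key] + speed.view[key]


batch = batch_view([("sensor-a", 2), ("sensor-b", 1), ("sensor-a", 3)])
speed = SpeedLayer()
speed.on_event(("sensor-a", 1))
print(query("sensor-a", batch, speed))
```

A Kappa-style design would drop the separate batch path and rebuild state by replaying the event log through the same streaming code.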

DISCOVERY & GOVERNANCE

Metadata management, data lineage and schemas. Data masking, plus column- and row-level security.
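To make masking and column- and row-level security concrete, here is a minimal sketch in plain Python. It assumes a per-role policy that hides some columns, masks others with a salted hash, and filters rows with a predicate. The `Policy` class, `apply_policy` function, salt value and sample columns are all hypothetical; in practice these rules would normally be enforced by the platform (for example the lake or warehouse service) rather than in application code.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, str]


def mask_value(value: str, salt: str = "demo-salt") -> str:
    """Replace a value with a short, stable, non-reversible token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


@dataclass
class Policy:
    """Hypothetical access policy for one role."""
    masked_columns: Set[str] = field(default_factory=set)   # returned as tokens
    hidden_columns: Set[str] = field(default_factory=set)   # not returned at all
    row_filter: Optional[Callable[[Row], bool]] = None      # rows the role may see


def apply_policy(rows: List[Row], policy: Policy) -> List[Row]:
    visible: List[Row] = []
    for row in rows:
        # Row-level security: skip rows the role is not allowed to see.
        if policy.row_filter is not None and not policy.row_filter(row):
            continue
        out: Row = {}
        for column, value in row.items():
            # Column-level security: drop hidden columns entirely.
            if column in policy.hidden_columns:
                continue
            # Masking: keep the column but replace its value.
            out[column] = mask_value(value) if column in policy.masked_columns else value
        visible.append(out)
    return visible


rows = [
    {"region": "EU", "email": "a@example.com", "salary": "100"},
    {"region": "US", "email": "b@example.com", "salary": "200"},
]
eu_analyst = Policy(
    masked_columns={"email"},
    hidden_columns={"salary"},
    row_filter=lambda r: r["region"] == "EU",
)
result = apply_policy(rows, eu_analyst)
print(result)
```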

DATA QUALITY & STEWARDSHIP

Data preparation, stewardship and data quality dimensions (e.g. completeness, accuracy) using open-source tools (e.g. Deequ), custom frameworks, AWS Glue DataBrew and Talend.
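As a small illustration of what a data quality dimension measures, the sketch below computes completeness (the share of records with a non-empty value) and a rule pass rate used here as a simple stand-in for accuracy. It is plain Python, not the Deequ API; the function names, the sample rows and the email rule are assumptions made for the example.

```python
from typing import Callable, List, Mapping, Optional

Record = Mapping[str, Optional[str]]


def completeness(records: List[Record], column: str) -> float:
    """Fraction of records where the column is present and non-empty."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if r.get(column) not in (None, ""))
    return filled / len(records)


def rule_pass_rate(records: List[Record], column: str,
                   rule: Callable[[str], bool]) -> float:
    """Fraction of non-empty values that satisfy a validation rule."""
    values = [r[column] for r in records if r.get(column) not in (None, "")]
    if not values:
        return 1.0
    return sum(1 for v in values if rule(v)) / len(values)


rows = [
    {"id": "1", "email": "a@example.com"},
    {"id": "2", "email": None},
    {"id": "3", "email": "not-an-email"},
    {"id": "4", "email": ""},
]
email_completeness = completeness(rows, "email")
email_validity = rule_pass_rate(rows, "email", lambda v: "@" in v)
print(email_completeness, email_validity)
```

Metrics like these are typically tracked per column over time and compared against agreed thresholds.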

DEVOPS

Full release management and versioning with source control, containers and documentation of the environment setup, including the business-as-usual (BAU) steps needed to maintain the environment.

DATA LAKE

Separation of data and compute with schema-on-read services. Lake Formation, Glue, NiFi, Talend, Hive, Presto, Athena, HBase, S3. Storage in Parquet or ORC.

DATA MINING & REPORTING

OLAP (Online Analytical Processing) based data marts, warehouses and NoSQL solutions. Redshift and Apache Kylin. OLTP (Online Transaction Processing) using Aurora. QuickSight for reporting. SQL Analytics.

SECURITY

AWS Cognito, Google Auth, JSON Web Tokens (JWT), SAML and LDAP.

AI/ML

Modelling, natural language processing, and AI and ML services built on AWS platform components, using tools such as SageMaker, Comprehend, Jupyter/Zeppelin and DataBrew.
