Data Management With Ruby

talk 29 connections

A talk by Sergey Sergyenko at wroclove.rb 2022. It defines data management as a holistic discipline, broader than database management, ETL, data analytics or data science, that spans the structure, architecture, shaping, protection and transformation of data. It surveys the variety of data-related professions (data engineer, data tester, data architect, data analyst, etc.) and cites a report listing Ruby among the top 5 required languages for data roles. The argument: data work is about writing code faster, not running faster code, and roughly 80% of data work is scraping and preparing data, which Ruby suits perfectly.

The talk argues that today's Ruby engineers already wear the data-engineer hat whenever they do migrations, bulk inserts or normalization, and that data preparation often falls into a 'Terra incognita' between backend, devops, QA and data roles. It recommends learning ETL and SQL, mastering n+1 queries and indexes, and understanding the difference between tools such as destroy_all and delete_all. Core warnings: don't be greedy with data, delete data you no longer need, avoid 'data-dictated development' (let data serve the app, not the reverse), and treat compliance and security seriously from day one.

Case study: a HIPAA-compliant healthcare project where the team inherited PII-laden data from another vendor, couldn't uniquely identify users without consent, was locked into a restrictive compliant hosting provider, and couldn't use BI, analytics or third-party services such as New Relic or Power BI. After rejecting a redesign from scratch, they picked data obfuscation (over encryption and tokenization) because it preserves the shape and realism of the data.

Faker generates fake data but doesn't obfuscate existing production data, so the team built Grazer, a Ruby gem that scans models, emits per-model rule configs, applies Faker-backed strategies that preserve uniqueness and regional relationships (e.g. ZIP-code-consistent addresses), and exports the obfuscated data as SQL INSERT statements (full or sliced), so the data never leaves the server as a dump. A background job tracks the delta to keep the obfuscated data consistent as the real data changes.
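The obfuscation flow described above can be sketched in plain Ruby. This is a minimal illustration, not Grazer's actual API, which the summary does not document: every name here (`REGIONS`, `pick`, `obfuscate`, `to_insert`) is hypothetical, and small hard-coded lists stand in for Faker so the sketch runs with the standard library only. It assumes each row has a unique `id`, and it shows three properties the talk highlights: fake values keep a realistic shape, unique columns stay unique, and ZIP, city and state stay consistent with each other.

```ruby
require "digest"

# Hypothetical lookup of (city, state) per ZIP prefix. A real tool would
# draw these from Faker or a reference dataset.
REGIONS = {
  "100" => ["New York", "NY"],
  "606" => ["Chicago", "IL"],
  "941" => ["San Francisco", "CA"]
}.freeze

FIRST_NAMES = %w[Alex Dana Jamie Morgan Riley Taylor].freeze
LAST_NAMES  = %w[Baker Cole Diaz Evans Frost Grant].freeze

# Deterministic pick: the same seed always maps to the same fake value,
# so repeated runs produce the same obfuscated row.
def pick(list, seed)
  list[Digest::SHA256.hexdigest(seed).to_i(16) % list.size]
end

# Replaces every sensitive field; only the primary key is carried over.
def obfuscate(row)
  seed = "patient-#{row[:id]}"
  prefix = pick(REGIONS.keys, seed + "zip")
  city, state = REGIONS.fetch(prefix)
  {
    id: row[:id],                            # keys stay intact so joins still work
    name: "#{pick(FIRST_NAMES, seed + 'first')} #{pick(LAST_NAMES, seed + 'last')}",
    email: "user#{row[:id]}@example.test",   # unique because id is unique
    zip: prefix + format("%02d", row[:id] % 100),
    city: city,                              # always matches the ZIP prefix
    state: state
  }
end

# Minimal SQL literal quoting for this sketch (numbers and strings only).
def sql_quote(value)
  value.is_a?(Numeric) ? value.to_s : "'#{value.to_s.gsub("'", "''")}'"
end

def to_insert(table, row)
  values = row.values.map { |v| sql_quote(v) }.join(", ")
  "INSERT INTO #{table} (#{row.keys.join(', ')}) VALUES (#{values});"
end

patients = [
  { id: 1, name: "Real Person", email: "real@clinic.example", zip: "60614", city: "Chicago", state: "IL" },
  { id: 2, name: "Other Person", email: "other@clinic.example", zip: "94110", city: "San Francisco", state: "CA" }
]

puts patients.map { |row| to_insert("patients", obfuscate(row)) }
```

Seeding each fake value from the row's id is a design choice of this sketch, not something the talk attributes to Grazer. It makes re-running the obfuscation after a data change reproduce the same fake values for unchanged rows, which is one simple way to keep an obfuscated copy stable over time.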

type

talk

slide_count

talk Data Management With Ruby

about

Ruby tool

Makes the case that Ruby is a first-class language for data-management work.

talk Data Management With Ruby

about

Faker tool

Faker is surveyed as the go-to fake-data gem and the backbone of Grazer's obfuscation strategies.

talk Data Management With Ruby

about

Grazer project

The talk introduces Grazer as the team's custom obfuscation tool.

talk Data Management With Ruby

about

Data Obfuscation concept

Obfuscation is the central technique chosen for the HIPAA case study.

talk Data Management With Ruby

about

Data Masking concept

Data masking is picked over encryption and tokenization.

talk Data Management With Ruby

about

Data Tokenization concept

Tokenization is presented and rejected as an obfuscation technique.

talk Data Management With Ruby

about

HIPAA concept

Case study centers on a HIPAA-compliant healthcare application.

talk Data Management With Ruby

about

Personally Identifiable Information concept

Inherited PII drove the entire obfuscation effort.

talk Data Management With Ruby

about

ETL concept

Discusses ETL as part of the Ruby-for-data skill set.

talk Data Management With Ruby

about

Data-Dictated Development concept

Coined as an anti-pattern to avoid.

talk Data Management With Ruby

about

N+1 Queries concept

Listed as a must-know topic for Ruby data engineers.
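To make the n+1 pattern concrete, here is a toy illustration in plain Ruby. `FakeDb` and the helper methods are hypothetical stand-ins that only count queries; they are not ActiveRecord. In a Rails application the batched variant corresponds to eager loading with `includes` or `preload`.

```ruby
# Toy in-memory "database" that counts how many queries it receives.
class FakeDb
  attr_reader :query_count

  def initialize(patients, visits)
    @patients = patients
    @visits = visits
    @query_count = 0
  end

  def all_patients
    @query_count += 1
    @patients
  end

  # One query per parent row.
  def visits_for(patient_id)
    @query_count += 1
    @visits.select { |v| v[:patient_id] == patient_id }
  end

  # One query for many parents, like WHERE patient_id IN (...).
  def visits_for_all(patient_ids)
    @query_count += 1
    @visits.select { |v| patient_ids.include?(v[:patient_id]) }
           .group_by { |v| v[:patient_id] }
  end
end

def build_db
  patients = (1..50).map { |i| { id: i } }
  visits = patients.flat_map { |p| [{ patient_id: p[:id] }, { patient_id: p[:id] }] }
  FakeDb.new(patients, visits)
end

# N+1: one query for the list, then one more for every row in it.
def visit_counts_n_plus_one(db)
  db.all_patients.to_h { |p| [p[:id], db.visits_for(p[:id]).size] }
end

# Batched: two queries in total, regardless of how many rows there are.
def visit_counts_batched(db)
  patients = db.all_patients
  grouped = db.visits_for_all(patients.map { |p| p[:id] })
  patients.to_h { |p| [p[:id], grouped.fetch(p[:id], []).size] }
end

slow = build_db
fast = build_db
visit_counts_n_plus_one(slow)
visit_counts_batched(fast)
puts "N+1 queries: #{slow.query_count}, batched queries: #{fast.query_count}"
# prints "N+1 queries: 51, batched queries: 2"
```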

talk Data Management With Ruby

about

Database Indexes concept

Discussed as a must-know topic, with a warning against the naïve 'indexes everywhere' advice.

talk Data Management With Ruby

about

destroy_all vs delete_all concept

Used as an example of tooling Ruby data engineers should understand.
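The difference can be shown with a toy model in plain Ruby. `ToyTable` is a hypothetical stand-in, not ActiveRecord, but it mirrors the documented behavior: `destroy_all` loads each record and runs its callbacks before deleting it, while `delete_all` issues a single DELETE and skips callbacks.

```ruby
# Toy stand-in for a model's table; it counts SQL-like statements
# and records callback activity in an audit log.
class ToyTable
  attr_reader :rows, :audit_log, :statements

  def initialize(rows)
    @rows = rows.dup
    @audit_log = []
    @statements = 0
  end

  # Mirrors destroy_all: load every record, run its callback, delete one by one.
  def destroy_all
    @statements += 1 # the SELECT that loads the records
    @rows.dup.each do |row|
      @audit_log << "destroyed #{row[:id]}" # stand-in for a destroy callback
      @rows.delete(row)
      @statements += 1 # one DELETE per record
    end
  end

  # Mirrors delete_all: a single DELETE, no records loaded, no callbacks.
  def delete_all
    @statements += 1
    @rows.clear
  end
end

def sample_rows
  (1..100).map { |i| { id: i } }
end

careful = ToyTable.new(sample_rows)
careful.destroy_all
puts "destroy_all: #{careful.statements} statements, #{careful.audit_log.size} callbacks"
# prints "destroy_all: 101 statements, 100 callbacks"

fast = ToyTable.new(sample_rows)
fast.delete_all
puts "delete_all: #{fast.statements} statements, #{fast.audit_log.size} callbacks"
# prints "delete_all: 1 statements, 0 callbacks"
```

For bulk cleanup, `delete_all` is far cheaper, but anything that relies on callbacks (audit trails, dependent associations, cache invalidation) is silently skipped, which is why the choice matters for data work.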

talk Data Management With Ruby

about

Heroku tool

Mentioned as the staging target enabled by Grazer-obfuscated data.

talk Data Management With Ruby

about

New Relic tool

Cited as a third-party service that is hard to connect under compliant hosting, and as a place where excessive data exposure leaked PII.

talk Data Management With Ruby

about

Power BI tool

The client wanted Power BI for analytics, motivating obfuscated datasets.

question Can Grazer reference other fields for derived values like encrypted passwords?

asked_at

Data Management With Ruby talk

Asked during Q&A.

question How to obfuscate fields that drive app logic (e.g. patient age)?

asked_at

Data Management With Ruby talk

Asked during Q&A.

question ActiveRecord encryption vs data obfuscation for PII

asked_at

Data Management With Ruby talk

Asked during Q&A.

question Performance of Faker on large databases during obfuscation

asked_at

Data Management With Ruby talk

Asked during Q&A.

question Obfuscation and re-identification via record counts / outliers

asked_at

Data Management With Ruby talk

Asked during Q&A.

question Does randomizing structural fields break analytics?

asked_at

Data Management With Ruby talk

Asked during Q&A.

person Sergey Sergyenko

authored

Data Management With Ruby talk

Sergey Sergyenko delivered this talk at wroclove.rb 2022.

takeaway Ruby as a Data Management Language

from_talk

Data Management With Ruby talk

Takeaway extracted from the talk.

takeaway Don't Be Greedy With Data

from_talk

Data Management With Ruby talk

Takeaway extracted from the talk.

takeaway Avoid Data-Dictated Development

from_talk

Data Management With Ruby talk

Takeaway extracted from the talk.

takeaway Take Compliance Seriously From Day One

from_talk

Data Management With Ruby talk

Takeaway extracted from the talk.

takeaway Don't Store Sensitive Data Yourself

from_talk

Data Management With Ruby talk

Takeaway extracted from Q&A on ActiveRecord encryption.

talk Data Management With Ruby

presented_at

wroclove.rb 2022 event

Talk delivered at wroclove.rb 2022.

Provenance

Created: 2026-04-17 16:17 seed
Last updated in: Data Management With Ruby — Sergey Sergyenko at wroclove.... 2026-04-17 21:52
Read by: 28 extractions