
Data Management With Ruby

talk 29 connections

Sergey Sergyenko's wroclove.rb 2022 talk. Defines data management as a holistic discipline broader than database management, ETL, data analytics or data science, spanning the structure, architecture, shaping, protection and transformation of data. Surveys the variety of data-related professions (data engineer, data tester, data architect, data analyst, etc.) and cites a report listing Ruby among the top five languages required for data roles. The argument: data work is about writing code faster, not running faster code, and roughly 80% of it is scraping and preparing data, a task Ruby suits perfectly.

Argues that today's Ruby engineers still wear the data-engineer hat when doing migrations, bulk inserts or normalization, and that data preparation often falls into a 'Terra incognita' between backend, DevOps, QA and data roles. Recommends learning ETL and SQL, mastering N+1 queries and indexes, and understanding the difference between tools such as destroy_all and delete_all. Core warnings: don't be greedy with data, delete data you don't need, avoid 'data-dictated development' (let data serve the app, not the reverse), and treat compliance and security seriously from day one.

Case study: a HIPAA-compliant healthcare project in which the team inherited PII-laden data from another vendor. They could not uniquely identify users without their consent, were locked into a restrictive compliant hosting provider, and could not use BI, analytics or third-party services such as New Relic or Power BI. After rejecting a redesign from scratch, they chose data obfuscation over encryption and tokenization because obfuscation preserves the shape and realism of the data.

Faker generates fake data but does not obfuscate existing production data. The team therefore built Grazer, a Ruby gem that scans models, emits per-model rule configs, and applies Faker-backed strategies that preserve uniqueness and regional relationships (e.g. ZIP-code-consistent addresses). It extracts the obfuscated data as SQL INSERT statements (full or sliced), so the data never leaves the server as a dump. A background job tracks the delta to keep the obfuscated data consistent as the real data changes.
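The following is a minimal sketch of the obfuscate-then-extract idea described above, written in plain Ruby using only the standard library. It is not Grazer's API. The small word lists stand in for Faker. REGIONS, hashed_index, obfuscate_row, sql_literal, to_insert, the secret and the patients table are all hypothetical names chosen for illustration.

Each fake value is picked deterministically from a SHA-256 hash of the real value combined with a server-side secret. As a result, the same input always maps to the same output. Emails are built from the primary key so they stay unique. The first three ZIP digits and the matching city and state are kept, so addresses remain regionally consistent.

```ruby
require "digest"

# Hypothetical stand-ins for Faker output, kept small so the sketch needs only stdlib.
FIRST_NAMES = %w[Alice Bruno Carla Dmitri Elena Farid Greta Hugo].freeze
LAST_NAMES  = %w[Novak Silva Kowalski Tanaka Moreau Okafor Berg Ruiz].freeze

# Hypothetical lookup: 3-digit ZIP prefix => [city, state], keeps addresses regionally consistent.
REGIONS = {
  "100" => ["New York", "NY"],
  "606" => ["Chicago", "IL"],
  "941" => ["San Francisco", "CA"]
}.freeze

# Deterministic index: same value + secret + salt always yields the same index.
def hashed_index(value, secret, salt, modulus)
  Digest::SHA256.hexdigest("#{secret}:#{salt}:#{value}").to_i(16) % modulus
end

def obfuscate_row(row, secret)
  prefix = row[:zip][0, 3]
  city, state = REGIONS.fetch(prefix) # raises KeyError for prefixes the table does not cover
  {
    id: row[:id],
    first_name: FIRST_NAMES[hashed_index(row[:first_name], secret, "first", FIRST_NAMES.size)],
    last_name: LAST_NAMES[hashed_index(row[:last_name], secret, "last", LAST_NAMES.size)],
    email: "user#{row[:id]}@example.com", # derived from the primary key, so it stays unique
    zip: prefix + format("%02d", hashed_index(row[:zip], secret, "zip", 100)),
    city: city,
    state: state
  }
end

# Render a value as a SQL literal: integers bare, strings single-quoted with '' escaping.
def sql_literal(value)
  value.is_a?(Integer) ? value.to_s : "'#{value.to_s.gsub("'", "''")}'"
end

def to_insert(table, row)
  "INSERT INTO #{table} (#{row.keys.join(', ')}) VALUES (#{row.values.map { |v| sql_literal(v) }.join(', ')});"
end

secret = "server-side-secret" # hypothetical; would live only on the production server
rows = [
  { id: 1, first_name: "John", last_name: "O'Brien", email: "john@real.org", zip: "10027" },
  { id: 2, first_name: "Jane", last_name: "Doe", email: "jane@real.org", zip: "60614" }
]

# A "sliced" extract is just a subset of rows; a full extract would use all of them.
statements = rows.first(2).map { |row| to_insert("patients", obfuscate_row(row, secret)) }
puts statements
```

Because the mapping is keyed by a secret that stays on the server, running the same extraction again produces identical fake values. This property is what a delta-tracking job would rely on to keep repeated extracts consistent.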

type: talk
slide_count: 30
talk Data Management With Ruby
about
Ruby tool
Makes the case that Ruby is a first-class language for data-management work.
talk Data Management With Ruby
about
Faker tool
Faker is surveyed as the go-to fake-data gem and the backbone of Grazer's obfuscation strategies.
talk Data Management With Ruby
about
Grazer project
The talk introduces Grazer as the team's custom obfuscation tool.
talk Data Management With Ruby
about
Obfuscation is the central technique chosen for the HIPAA case study.
talk Data Management With Ruby
about
Data Masking concept
Data masking is picked over encryption and tokenization.
talk Data Management With Ruby
about
Tokenization is presented and rejected as an obfuscation technique.
talk Data Management With Ruby
about
HIPAA concept
Case study centers on a HIPAA-compliant healthcare application.
talk Data Management With Ruby
about
Inherited PII drove the entire obfuscation effort.
talk Data Management With Ruby
about
ETL concept
Discusses ETL as part of the Ruby-for-data skill set.
talk Data Management With Ruby
about
Coined as an anti-pattern to avoid.
talk Data Management With Ruby
about
N+1 Queries concept
Listed as a must-know topic for Ruby data engineers.
talk Data Management With Ruby
about
Discussed as a must-know topic, with a warning against the naïve 'indexes everywhere' advice.
talk Data Management With Ruby
about
Used as an example of tooling Ruby data engineers should understand.
talk Data Management With Ruby
about
Heroku tool
Mentioned as the staging target enabled by Grazer-obfuscated data.
talk Data Management With Ruby
about
New Relic tool
Cited as a third-party service that was hard to connect under compliant hosting, and through which excessive data exposure leaked PII.
talk Data Management With Ruby
about
Power BI tool
The client wanted Power BI for analytics, motivating obfuscated datasets.
Asked during Q&A.
asked_at
Data Management With Ruby talk
Asked during Q&A.
asked_at
Data Management With Ruby talk
Asked during Q&A.
asked_at
Data Management With Ruby talk
Asked during Q&A.
asked_at
Data Management With Ruby talk
Asked during Q&A.
asked_at
Data Management With Ruby talk
Asked during Q&A.
authored
Data Management With Ruby talk
Sergey Sergyenko delivered this talk at wroclove.rb 2022.
from_talk
Data Management With Ruby talk
Takeaway extracted from the talk.
from_talk
Data Management With Ruby talk
Takeaway extracted from the talk.
from_talk
Data Management With Ruby talk
Takeaway extracted from the talk.
from_talk
Data Management With Ruby talk
Takeaway extracted from the talk.
from_talk
Data Management With Ruby talk
Takeaway extracted from Q&A on ActiveRecord encryption.
talk Data Management With Ruby
presented_at
Talk delivered at wroclove.rb 2022.

Provenance

Created
2026-04-17 16:17 seed
Read by
28 extractions