Data Engineering Is 95% Software Engineering

Data engineering is much closer to software engineering than most people think — if you’re doing it right.

There’s a mystique around data engineering that I think does more harm than good. In practice, 95% of the work is straightforward software engineering: building pipelines, writing tests, managing infrastructure, deploying services. The remaining 5% is domain-specific knowledge about data formats, query optimisation, and the peculiarities of whichever analytics stack you’re working with.

If you’re a software engineer with curiosity about the data domain, you’re already 95% of the way there. The skills you take for granted — clean code, good testing, CI/CD, infrastructure as code — are exactly what’s missing from most data engineering teams. That remaining 5% you’ll pick up quickly.
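To make that concrete, here is a minimal sketch of what "good testing" can look like for a single pipeline step. Everything in it is made up for illustration: the `daily_revenue` function, the order fields, and the rule that rows without an amount are skipped are assumptions, not taken from any real system. The point is only that a transformation written as a plain function can be tested like any other code.

```python
from datetime import date


def daily_revenue(orders):
    """Sum order amounts per day, skipping rows with no amount.

    Hypothetical pipeline step: `orders` is a list of dicts with
    `placed_on` (a date) and `amount` (a number or None).
    """
    totals = {}
    for order in orders:
        if order["amount"] is None:
            continue  # assumed rule: a missing amount is bad input, not zero
        day = order["placed_on"]
        totals[day] = totals.get(day, 0) + order["amount"]
    return totals


def test_daily_revenue_skips_missing_amounts():
    # Small, hand-written input instead of a production extract.
    orders = [
        {"placed_on": date(2024, 1, 1), "amount": 20},
        {"placed_on": date(2024, 1, 1), "amount": None},
        {"placed_on": date(2024, 1, 2), "amount": 5},
    ]
    assert daily_revenue(orders) == {
        date(2024, 1, 1): 20,
        date(2024, 1, 2): 5,
    }
```

A test runner such as pytest would pick up the `test_` function automatically, so the same check can run in CI on every change, exactly as it would for application code.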

The real problem isn’t that data engineering is hard. It’s that too many people treat it as a separate discipline entirely, and end up ignoring the software engineering fundamentals that would make their data systems actually reliable.