Strategies to Edit Production Data

By Julie Qiu

Elevator Pitch

Editing data in a production database is sometimes necessary. However, mistakes in running these queries can be disastrous. In this talk, you will learn strategies for making edits to a production database with examples from a Python stack that increase safety and auditability.

Description

At some point, we all find ourselves at a SQL prompt making edits to the production database. We know it’s a bad practice and we always intend to put in place safer infrastructure before we need to do it again — what does a better system actually look like?

This talk progresses through 5 strategies for teams using a Python stack to do SQL writes against a database, to achieve increasing safety and auditability:

(1) Develop a process for raw SQL edits (2) Run scripts locally (3) Run scripts on an existing server (4) Use a task runner (5) Build a Script Runner service

We’ll talk about the pros and cons of each strategy and help you determine which one is right for your specific needs.

By the end of this talk you’ll be ready to start upgrading your infrastructure for making changes to your production database safely!

Notes

Audience Level

Anyone working in a stack that involves a database.

Previous Experience

  • Speaker at DevOpsDays 2018 (http://bit.ly/julieqiu-devopsdaysnyc2018)
  • Speaker at Elastic{ON} 2018 (http://bit.ly/julie-elasticon-2018)
  • Speaker at PyCaribbean2018 (Feb 2018)
  • Speaker at Data Day Mexico (https://www.youtube.com/watch?v=I1-72RtWZsM)
  • Tutorial Presenter at PyCon 2018 (https://www.youtube.com/watch?v=6_P_h2bDwYs)
  • Speaker at PyCon 2018 (https://www.youtube.com/watch?v=43NiFtbNg_s)
  • Speaker at DjancoCon Europe 2018 (May 2018)
  • Speaker at Lead Developer London (June 2018)

  • Featured in Built In NYC: http://bit.ly/julieqiu-builtinnyc
  • Gave a lighting talk at Write/Speak/Code Meetup 2017
  • Gave a lighting talk at Write/Speak/Code Conference 2017
  • Regularly present at engineering weekly meetings (~40 people)

Past Presentations

http://bit.ly/julieqiu-slides

Outline (Deck: https://speakerdeck.com/julieqiu/strategies-to-edit-production-data)

[1 minute] Intro

  • Who am I
  • What I do

[2 minutes] Why We Run Manual SQL Writes

  • Lack of a UI to make changes
  • Edge cases
  • Speed and efficiency

Strategy #1 [4 minutes]: Develop a process for raw SQL edits

How this works:

If you need to execute a SQL write query against the production database: 1. Add a row to the Manual SQL Queries Google Sheet 2. A reviewer signs off on your query. 3. Run your query against the production database (inside a transaction!)

Pros:

  • No new features had to be built.

Cons:

  • Mistakes could still easily be made.
  • Papertrail is recorded manually.
  • Long queries will (and if not should) timeout.

Choose This Strategy If:

  • You need a quick system to increase safety and auditability immediately
  • This strategy will probably not go away at your average startup
  • We still use this today because it is easy and flexible
    • Introducing the process makes it slightly more painful, which encourages people to build better tools and use them

Strategy #2 [5 minutes]: Run scripts locally

Write scripts & connect to the remote database from your local machine

How this works:

  1. Write a migration script using SQLAlchemy or your preferred ORM
  2. Set up an SSH tunnel to the remote database using database connection strings
  3. Run your query against the production database from your laptop

Pros:

  • Easy to iterate and make changes to the script
  • Easy to manipulate the outputs of the script
  • Easy to import functions from other scripts (since the entire repository was accessible from the developer’s machine)

Cons

  • Changes could be made to the script and run on any environment without code review
  • Logs are only available on the user’s machine
  • Scripts will stop running when the user turns off their computer

Choose This Strategy If:

  • You want to execute database edits with somewhat complex logic
  • You don’t have the infrastructure in place to run scripts on a server
  • Your scripts don’t take a long time to run

Strategy #3 [4 minutes]: Run scripts on an existing server

Deploying scripts to an application and running them there

How this works:

  1. Write a migration script using SQLAlchemy or your preferred ORM
  2. Deploy the application code to the server
  3. SSH into the application server and run the script inside a session

Pros

  • Can write and run tests before running the scripts
  • All scripts being run on a remote environment will have gone through a code review
  • Ability to run long scripts (inside a session)

Cons

  • Can be annoying to make small changes depending on the deployment process
  • Lack of logging of what scripts have been run
  • Potential for scripts to affect CPU load on your server

Choose This Strategy If:

  • You want to execute database edits with somewhat complex logic
  • You need to run scripts that take a long time
  • You want to make sure your scripts are code reviewed
  • You don’t have a specific server dedicated to running scripts and don’t have time to set it up
  • Your scripts won’t overload the server that you are using

Strategy #4 [4 minutes]: Use a Task Runner

How this works (How we implemented this strategy):

  1. Setup a new Jenkins project that runs scripts
  2. Write a script that could run on this Jenkins project
  3. Run the script on Jenkins under this project:
    • To run a script, you would go to the project: (1) Click “Build with Parameters (2) Type in the filepath of the script you want to run (3) Select the arguments you want to use (4) Click “Build”

Pros

  • Detailed audit logs
  • All scripts will have gone through a code review
  • Don’t need to SSH inside a server to run scripts (minimizes room for error)

Cons

  • Annoying to make changes to a script, especially when they are minor (requires code review process, deploy for each change, etc.)
  • Difficult to manipulate the outputs of the scripts (i.e. cannot directly pipe the outputs into a file)

Choose This Strategy If:

  • You want to execute database edits with somewhat complex logic
  • You need to run scripts that take a long time
  • You want to make sure your scripts are code reviewed
  • You care about logging and having an audit trail
  • You already use Jenkins (or another task runner) and have time to set up a task to run scripts

Strategy #5 [5 minutes]: Build a Script Runner Service

Use the tools you use to build application servers to develop a script runner application

How this works:

  • A different server is set up for each environment to run scripts
  • Each server has access to credentials in Puppet, which are common to other EC2 instances in that environment
  • User runs scripts from a UI in Jenkins
  • Steps for running the script is similar to the steps in Strategy #4
  • The user no longer needs to manually input command line args (since these configs are now available as environment variables on the script runner server)
  • To run the script, you would go to the project: (1) Click “Build with Parameters (2) Type in the filepath of the script you want to run (3) Select the environment (4) Click “Build”

Pros:

  • Same as above, but removes the need to manually select params

Cons:

  • Initial infrastructure setup is more cumbersome

Choose This Strategy If:

  • You want to execute database edits with somewhat complex logic
  • You need to run scripts that take a long time
  • You want to make sure your scripts are code reviewed
  • You care about logging and having an audit trail
  • You want the configs setup for different environments to be consistent with the configs you use for your applications
  • You care about usability for your developers
  • You want to parallelize and scale your script runner application (i.e. have multiple AWS EC2 instances)
  • You have the time and resources to set a script runner service

Conclusions [5 minutes]

Make something useable

  • Engineers are also humans
    • Important to build your strategy with the end users (other engineers) in mind
    • The new strategy also needs to provide speed and usability in addition to safety and audability

Invest the effort because it is worth the cost

  • As engineers, we are often making trade-offs between development speed and safety
    • We often prioritize speed, but it is important not forget to invest in the safety part
    • As opposed to thinking that something is “not ideal” but keep doing it anyways, spend some time to think about the impact on our users if we make better decisions
  • Doing the right thing upfront could also cost very little time, relative to the hours of developer work and business cost incurred if something unexpected happens