John Lekberg


NYC Lead in Tap Water

Created 2019-07-14.

Lead Levels (mg/L) in NYC tap water collected after a 1-2 minute flush

I used New York City's "Free Residential at-the-tap Lead and Copper" dataset, which records lead and copper levels in tap water. The EPA has determined that there is no safe amount of lead in drinking water. Copper is different: humans need trace amounts of it for good health. I'm not sure how to assess safe copper levels, so I stuck to analyzing lead levels.

Lead levels were measured at three points in time: at first draw, after a 1-2 minute flush, and after a 5 minute flush.

So I first plotted the lead readings by flush time (see "Lead Readings over Time"):

ggplot(water_lead, aes(Flush.Time, Lead.Reading)) +
    geom_point()

Based on that graphic, I decided to analyze only the 1-2 minute flush. The 5 minute flush reading is missing for 14,521 kits. The first draw readings vary a lot, and I think much of that variability comes from the faucet not having been used recently. I think the 1-2 minute flush gives a more consistent and representative lead reading.
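One way to check both points (missing readings and spread) without a plot is to summarize the long-format table per flush time. The sketch below uses a small made-up data frame, toy_lead, as a stand-in for water_lead; its values are illustrative and not taken from the dataset. It also assumes that missing readings come through read.csv as NA.

```r
library(dplyr)

# Toy stand-in for water_lead; these readings are made up for illustration.
toy_lead <- data.frame(
  Flush.Time = rep(c("First Draw", "1-2 Minute Flush", "5 Minute Flush"), each = 4),
  Lead.Reading = c(0.000, 0.002, 0.010, 0.150,  # first draw: wide spread
                   0.000, 0.001, 0.001, 0.004,  # 1-2 minute flush
                   0.000, NA,    NA,    0.001)  # 5 minute flush: gaps
)

# Per flush time: count missing readings and describe the spread of the rest.
flush_summary <- toy_lead %>%
  group_by(Flush.Time) %>%
  summarise(
    n_missing      = sum(is.na(Lead.Reading)),
    median_reading = median(Lead.Reading, na.rm = TRUE),
    iqr_reading    = IQR(Lead.Reading, na.rm = TRUE),
    max_reading    = max(Lead.Reading, na.rm = TRUE)
  )

print(flush_summary)
```

Swapping toy_lead for water_lead should give the same kind of table for the real data, provided the column names match.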

The main graphic shows that the vast majority of lead levels measured in tap water are low, although the number of outliers (kits that measured abnormally high lead levels) increases over time.
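The outlier trend could also be put into numbers by counting high readings per year. This is only a sketch under several assumptions: the toy_water data frame is made up, Date.Collected is assumed to be a MM/DD/YYYY string, and the cutoff for a "high" reading is set to 0.015 mg/L (the EPA action level for lead), which is my choice and not something taken from the analysis above.

```r
library(dplyr)

# Toy stand-in for the water table; dates and readings are made up.
toy_water <- data.frame(
  Date.Collected = c("01/15/2016", "06/02/2016", "03/10/2017",
                     "09/21/2017", "02/05/2018", "11/30/2018"),
  Lead.1.2.Minute.Flush..mg.L. = c(0.000, 0.001, 0.002, 0.020, 0.001, 0.040)
)

# Assumption: readings above 0.015 mg/L count as "high".
high_cutoff <- 0.015

high_by_year <- toy_water %>%
  mutate(
    # Assumption: dates are stored as MM/DD/YYYY strings.
    Date.Collected = as.Date(Date.Collected, format = "%m/%d/%Y"),
    Year = as.integer(format(Date.Collected, "%Y")),
    High = Lead.1.2.Minute.Flush..mg.L. > high_cutoff
  ) %>%
  group_by(Year) %>%
  summarise(
    n_kits     = n(),        # kits collected that year
    n_high     = sum(High),  # kits above the cutoff
    share_high = mean(High)  # fraction of kits above the cutoff
  )

print(high_by_year)
```

Comparing n_high with share_high helps separate "more outliers" from "more kits collected", since a rising count can simply reflect more samples per year.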


Lead Readings over Time

Code used to create "NYC Lead in Tap Water"


code.r - Main code used to generate the graphic.
library(magrittr)
library(dplyr)
library(tidyr)
library(ggplot2)

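# Load the kit data and drop the copper columns; only lead is analyzed.
water <- read.csv("~/Downloads/Free_Residential_at-the-tap_Lead_and_Copper_Data.csv") %>%
  select(-starts_with("Copper."))

# Main graphic: 1-2 minute flush lead readings by collection date.
ggplot(water, aes(Date.Collected, Lead.1.2.Minute.Flush..mg.L.)) + geom_point()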

# -- To make "Lead Readings over Time" graphic

# Reshape to long format: one row per kit and flush time.
water_lead <- water %>%
  gather(Flush.Time, Lead.Reading, starts_with("Lead.")) %>%
  # Replace the raw column names with readable labels.
  mutate(Flush.Time = recode(Flush.Time,
                             "Lead.First_Draw..mg.L." = "First Draw",
                             "Lead.1.2.Minute.Flush..mg.L." = "1-2 Minute Flush",
                             "Lead.5.Minute.Flush..mg.L." = "5 Minute Flush")) %>%
  # Order the labels by sampling sequence so the x-axis reads left to right.
  mutate(
    Flush.Time = ordered(
      Flush.Time,
      levels = c("First Draw", "1-2 Minute Flush", "5 Minute Flush")
    )
  )

ggplot(water_lead, aes(Flush.Time, Lead.Reading)) + geom_point()