Optimizing tea: An N=4 experiment

dynomight * * experiment

Tea is a little-known beverage, consumed for flavor or sometimes for conjectured effects as a stimulant. It's made by submerging the leaves of C. sinensis in hot water. But how hot should the water be?

To resolve this, I brewed the same tea at four different temperatures, brought them all to a uniform serving temperature, and then had four subjects rate them along four dimensions.

Subjects

Subject A is an experienced tea drinker, exclusively of black tea w/ lots of milk and sugar.

Subject B is also an experienced tea drinker, mostly of black tea w/ lots of milk and sugar. In recent years, Subject B has been pressured by Subject D to try other teas. Subject B likes fancy black tea and claims to like fancy oolong, but will not drink green tea.

Subject C is similar to Subject A.

Subject D likes all kinds of tea, derives a large fraction of their joy in life from tea, and is the world's preeminent existential angst + science blogger.

Tea and brewing

For a tea that was as "normal" as possible, I used pyramidal bags of PG Tips tea (Lipton Teas and Infusions, Trafford Park Rd., Trafford Park, Stretford, Manchester M17 1NH, UK).

I brewed it according to the instructions on the box, by submerging one bag in 250 ml of water for 2.5 minutes. I did four brews with water at temperatures ranging from 79°C to 100°C (174.2°F to 212°F). To keep the temperature roughly constant while brewing, I did it in a Pyrex measuring cup (Corning Inc., 1 Riverfront Plaza, Corning, New York, 14831, USA) sitting in a pan of hot water on the stove.

After brewing, I poured the tea into four identical mugs with the brew temperature written on the bottom with a Sharpie Pro marker (Newell Brands, 5 Concourse Pkwy Atlanta, GA 30328, USA). Readers interested in replicating this experiment may note that those written temperatures still persist on the mugs today, three months later. The cups were dark red, making it impossible to see any difference in the teas.

After brewing, I put all the mugs in a pan of hot water until they converged to 80°C, so they were served at the same temperature.

Serving

I shuffled the mugs and placed them on a table in a random order. I then asked the subjects to taste from each mug and rate the teas for:

  • "Aroma"
  • "Flavor"
  • "Strength"
  • "Goodness"

Each rating was to be on a 1-5 scale, with 1=bad and 5=good.

Subjects A, B, and C had no knowledge of how the different teas were brewed. Subject D was aware, but was blinded as to which tea was in which mug.
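For anyone replicating the blinding without a Sharpie, the shuffle can be sketched in a few lines of Python. This is only an illustration: the function name `blind_serving_order` and the neutral "mug N" labels are hypothetical, and of the four temperatures only the 79 and 100 endpoints come from this post (86 and 93 are placeholders).

```python
import random


def blind_serving_order(brew_temps_c, rng=None):
    """Return (labels, key): neutral mug labels in serving order, plus
    a hidden key mapping each label to its brew temperature."""
    if rng is None:
        rng = random.Random()
    shuffled = list(brew_temps_c)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    labels = [f"mug {i + 1}" for i in range(len(shuffled))]
    key = dict(zip(labels, shuffled))
    return labels, key


# Only 79 and 100 come from the post; 86 and 93 are placeholder temperatures.
labels, key = blind_serving_order([79, 86, 93, 100], rng=random.Random(0))
print(labels)  # what the tasters see
# `key` plays the role of the bottom of the mug: keep it hidden until
# all ratings are in.
```

Calling the function again before the second (sugared) round gives a fresh order, which matches the re-shuffle described below.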

During taste evaluation, Subjects A and C remorselessly pestered Subject D with questions about how a tea's strength can be "good" or "bad". Subject D rejected these questions on the grounds that "good" cannot be meaningfully reduced to other words and urged Subjects A and C to review Wittgenstein's concept of meaning as use, etc. Subject B questioned the value of these discussions.

After ratings were complete, I poured tea out of all the cups until 100 ml remained in each, added around 1 gram (1/4 tsp) of sugar, and heated them back up to 80°C. I then re-shuffled the cups and presented them for a second round of ratings.

Results

For a single summary, I somewhat arbitrarily combined the four ratings into a "quality" score, defined as

(Quality) = 0.1 x (Aroma) + 0.3 x (Flavor) + 0.1 x (Strength) + 0.5 x (Goodness).
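As a sanity check on the formula, here is a minimal Python sketch of the weighted sum. The names `WEIGHTS` and `quality` are hypothetical, and the example ratings are made up rather than taken from the experiment.

```python
# Weights from the quality formula above. They sum to 1, so the
# quality score stays on the same 1-5 scale as the ratings.
WEIGHTS = {"aroma": 0.1, "flavor": 0.3, "strength": 0.1, "goodness": 0.5}


def quality(ratings):
    """Combine the four 1-5 ratings into a single quality score."""
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


# Made-up ratings for illustration only (not data from the experiment).
example = {"aroma": 3, "flavor": 4, "strength": 2, "goodness": 5}
print(round(quality(example), 2))
```

Since "Goodness" carries half the weight, a one-point change in it moves quality by 0.5, as much as a one-point change in all three of the other ratings combined.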

Here is the data for Subject A, along with a linear fit for quality as a function of brewing temperature. Broadly speaking, A liked everything, but showed weak evidence of any trend.

And here is the same for Subject B, who apparently hated everything.

Here is the same for Subject C, who liked everything, but showed very weak evidence of any trend.

And here is the same for Subject D. This shows extremely strong evidence of a negative trend. But, again, while blinded to the order, this subject was aware of the brewing protocol.

Finally, here are the results combining data from all subjects. This shows a mild trend, driven mostly by Subject D.
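The linear fits above can be reproduced with ordinary least squares. The sketch below is one plausible way to do it, not necessarily how the original plots were made. The function name `linear_fit` is hypothetical, the scores are made up, and only the 79 and 100 endpoints of the temperature list come from this post.

```python
def linear_fit(temps, scores):
    """Ordinary least-squares fit of scores = intercept + slope * temps."""
    n = len(temps)
    if n != len(scores) or n < 2:
        raise ValueError("need at least two (temperature, score) pairs")
    mean_t = sum(temps) / n
    mean_s = sum(scores) / n
    # Slope = covariance(temps, scores) / variance(temps).
    sxx = sum((t - mean_t) ** 2 for t in temps)
    if sxx == 0:
        raise ValueError("temperatures must not all be equal")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(temps, scores))
    slope = sxy / sxx
    intercept = mean_s - slope * mean_t
    return slope, intercept


# Made-up numbers for illustration only (not the experiment's data).
# Only 79 and 100 come from the post; 86 and 93 are placeholders.
temps_c = [79, 86, 93, 100]
scores = [4.0, 3.5, 3.0, 2.5]
slope, intercept = linear_fit(temps_c, scores)
print(f"slope = {slope:.3f} quality points per degree C")
```

For a combined fit, one option is to pass the concatenated (temperature, quality) pairs from all four subjects to the same function. Whether the combined plot pooled the data this way is an assumption.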

Thoughts

  1. This experiment provides very weak evidence that you might be brewing your tea too hot. Mostly, it just proves that Subject D thinks lower-middle tier black tea tastes better when brewed cooler. I already knew that.

  2. There are a lot of other dimensions to explore, such as the type of tea, the brew time, the amount of tea, and the serving temperature. I think that ideally, I'd randomize all those dimensions, gather a large sample, and then fit some kind of regression.

  3. Creating dozens of different brews and then serving them all blinded at different serving temperatures sounds like way too much work. Maybe there's an easier way to go about this? Can someone build me a robot?

  4. If you thirst to see Subject C's raw aroma scores or whatever, you can download the data or click on one of the entries in this table:

    Subject  Aroma  Flavor  Strength  Goodness  Quality
    A        x      x       x         x         x
    B        x      x       x         x         x
    C        x      x       x         x         x
    D        x      x       x         x         x
    All      x      x       x         x         x

  5. Subject D was really good at this; why can't everyone be like Subject D?
