2 posts tagged with "Logging"

From Debug to Panic - Comparing Log Levels Across Top Languages

April 24, 2023 · 6 min read

Jeff Dwyer

Prefab Founder & Engineer

Log Levels DropDown

Log levels exist to help developers filter through application logs and focus on relevant information. But what levels are available in each language? I recently looked into this because one of the features of Prefab is that it lets you change your log levels on the fly. To build this, we needed to offer users a dropdown of log levels to choose from. But which levels to show?

You might think that providing a list of levels to choose from should be straightforward. We're all trying to achieve the same thing with logging, so I'm sure there's a nice standard solution, right? Well, no. In this blog post, we'll take a lighthearted look at the log levels available in popular languages and frameworks, comparing their similarities and differences, and pointing out the quirks that make them unique.

Log Levels by Language

A Tour of Log Levels:

Java Ever-popular, Java has a clear-cut set of log levels: Trace, Debug, Info, Warn, Error, and Fatal. Straight to the point, Java keeps things simple and functional... as long as you are you using a Simple Logging Facade for Java (SLF4J). Java likes a Facade Pattern.
Java Util Logging If you make the newbie mistake of trying to use the built-in java.util.logging you're in for a surprise. This package brings us one of our most esoteric set of levels: Finest, Finer, Fine, Config, Info, Warn, and Severe. A fine dining experience of log levels, you can smell the overthinking from a few decades away. Presumably this came about from some deep thinking that Fineness is the right way to describe the levels, whereas Debug describes the action you might take upon them. 20 years on, we're still not sure what to do with Finest logs.
Go A language known for its simplicity and efficiency, offers a slightly extended set of log levels: Trace, Debug, Info, Warn, Error, Fatal and Panic. This is the same as the Java set, but we've added Panic. This is because Go wants to make it clear that it's born of real world concerns. Sometimes your can't simply die and have that be the end of it. When you're running the Internet like Google is, sometimes you need to Panic. If you are using Kubernetes, logging at Panic will also send a Google Wave notification to Larry Page.
Elixir Speaking of real-world reliability, Elixir spices things up with a unique set of log levels: Debug, Info, Notice, Warn, Error, Critical, Alert, and Emergency. This is Elixir’s way of reminding you that it runs on top of Erlang and was achieving 6 nines of uptime when you were still writing “Hello World”. It has thought more about error handling that you ever will, which is why logging at Emergency actually increases America’s DEFCON by 1.
Node.js The popular JavaScript runtime sticks to it's philosophy of: the first step in doing X is choosing between 6 different basically equivalent NPM packages. LogLevel keeps it real with Trace, Debug, Info, Warn, Error but there's no reason not to get Silly and go with NPMLog which offers Silly, Verbose, Info, HTTP, Warn & Error. I didn't add Silly to the matrix below because that's just plain silly.
Ruby Known for its elegance, Ruby has a slightly different set of log levels: Debug, Info, Warn, Error, and Fatal. They're the same as Java, but with no Trace. This is because Ruby wants you to enjoy yourself and if it's come to Trace things have already gone too far and you should take a break and meditate upon the problem.
Python Python, the versatile language, keeps it simple with just five log levels: Debug, Info, Warn, Error, and Critical. You’ll note that this is the same as Ruby, but with Critical instead of Fatal. Python is a battlefield medic and can't be calling things dead if it's possible there's a chance they could be revived.
Php/Laravel Laravel, the popular PHP framework, can’t be contained to the big 5. It offers Debug, Info, Notice, Warn, Error, Critical, Alert, and Emergency. Sometimes Php gets a bad rap, so it wants you to know it’s prepared for any situation that might arise. What's the difference between Info and Notice? Why is Alert more serious than Critical? Php knows, but it's not telling.
C# A powerful language from Microsoft, C# opts for a set of log levels similar to its peers: Trace, Debug, Info, Warn, Error, and (like Python) Critical instead of Fatal. It's a solid, no-nonsense approach. Microsoft's version of Java needed to differentiate itself somehow, but didn't have any ideas of its own so stole one from Python.
Rust A systems programming language, Rust has a minimalistic set of log levels: Trace, Debug, Info, Warn, and Error. Rust knows that keeping things simple is key. It doesn't require anything more serious than Error because nothing can go wrong in properly written Rust code.
Kotlin This modern and expressive language JVM language, keeps its log levels consistent with its Java roots: Trace, Debug, Info, Warn, and Error. It's clear that Kotlin values simplicity and familiarity. It removed Fatal because it's not a fan of drama.
Swift Swift, the language for iOS development, has a concise set of log levels: Trace, Debug, Info, Warn, Error, and Critical. Swift ensures that developers can easily manage logs while building apps for Apple devices. Swift wants to make sure that your Fart Sound iPhone can properly log at Critical when notifications are disabled.
Scala No stranger to thinking differently, Scala takes a hardline approach to logging, only allowing us: Debug, Info and Warn. You might've expected the superset of all possible Log Levels for Scala, since Scala generally has the superset of all possible programming idea, but at some point enough is enough and they landed on the simple union of all log levels. And frankly, maybe that's all you need.

Conclusion:

After examining the log levels across various languages and frameworks, we were able to pick a core set of log levels that should work well with Prefab. The common log levels we identified are Trace, Debug, Info, Warn, and Error. These levels cover the essential needs of developers, allowing them to effectively debug their applications and handle critical notices.

The good news is that no language decided to place Error before Warn or Info before Debug. This means that a general ordering of log levels can be established and client specific mappings can be created to paper over the differences. These core log levels are universal enough to provide a solid foundation for Prefab's on-the-fly log level changing feature.

Log Levels Are Useless, Or Are They…

April 18, 2023 · 7 min read

Jeff Dwyer

Prefab Founder & Engineer

What should you log? How much should you log? What level should you log at? It's a question that's been asked for ages and there's no shortage of opinions on the subject.

Thom Shutt has an interesting post here that I think does a good job of describing the “bear case” for logging. Part of his recommendations are:

Only log errors
Throw away warn, info and debug levels

If that's surprising to hear, his reasoning for this recommendation is pretty solid:

When was the last time you ran Production at anything other than Info? Is changing the level a quick process that everyone on the team knows how to do?

The expected answer to his question is of course, that most of us aren't changing log levels very often, and that matches my obvervations. So it's a good point, if you aren’t able to to easily change the system and get out debug level… these lines really are just noise and you might as well delete them.

The Assumption

The underlying assumption here, is that changing the log level is a pain in the butt and takes a while. I can’t see your face, but I’m guessing you’re nodding. Most places I’ve worked, changing the log level means:

Finding and changing some xml or yaml file
Commit, push
Wait for PR approval
Build and deploy to staging then production
$Profit
Do it all again to revert your change

And depending on the language and framework, you may only be able to change the log level for the entire application at once.

Let’s be wildly optimistic and say this takes you 30 minutes from commit to production and assume we set the right logger in our first try. Even if that’s the case, we’ve still lost an hour of our day to maybe get the debug information we were hoping for.

So in this world, throwing away log levels and only doing errors makes a lot of sense.

A Better Way

So is this assumption true? Are log levels fundamentally hard to change at runtime? No, they're not.

Or at least they don't need to be anymore.

Here’s what changing a log level looks like using Prefab. Click, change, done.

Change Log Levels

And that’s it. The log levels update in realtime for whatever package or class you select. The logger is a hierarchical tree, so we set the level wherever it's convenient and everything below it will inherit that level.

That's pretty amazing, but the next trick takes it to a whole other level. Prefab also lets you target logs so that you only get debug for a particular user. To do this you just select "Targeted" log level add criteria like you would for a feature flag and instantly you can see debug logs for just a single user.

Target log level

What This Means

So, if the assumption we laid out above is false, the next question is

How does this change what we should log?

If the inability to change log levels leads us to a mindset of only logging errors. What happens when we can change log levels in seconds? What happens when we can only log for a given user, team or transaction?

Mean Time To Recovery (MTTR)

You're probably familiar with the DORA metrics so I won't go deep here. But here's a quick definition of MTTR from CodeClimate

The Mean Time to Recovery (MTTR) measures the time it takes to restore a system to its usual functionality. For elite teams, this looks like being able to recover in under an hour, whereas for many teams, this is more likely to be under a day.

If you're looking to meaningfully improve MTTR, it's hard to think of something more useful than the ability to instantly turn on debug level logging for a single user across your apps. The pager comes in, the bug tracking system explodes. You've got great metrics in your observation tool, but... why is this happening? What data is causing your system to start malfunctioning. Why shouldn't we give our on-call engineers the ability to instantly see the details of what's really going on and start to debug?

How Does it all work, or "Dynamic Configuration Brings Good Things to Life"

The enabling technology here is Dynamic Configuration. People fall in two camps on this: "haven't heard of it" and "can't live without it", but for the most part, this tech is something that has been the province of big tech (Twitter, Netflix, Robinhood, Amplitude). So, if you're not familiar with it, let's introduce the concept.

Almost all apps use some form of static configuration, this is when our apps read a property file or an environment variable. This looks like: Properties.get("http.timeout") or ENV["DATABASE_PORT"].

Dynamic configuration works just the same way DynamicConfig.get("http.timeout") but of course the value returned is... dynamic. We can set a new value a UI somewhere and each time we evaluate, we’ll get the latest version. A proper dynamic configuration system focusses on extremely high reliability and each lookup should be as fast as static configuration. At a high level the implementation ends up being a local Hashmap for each running instance and something that listens to a change stream that updates the map.

So how does dynamic configuration help us with logs?

Well, it turns out that your logger libarry also has a Hashmap of Logger -> LogLevel for each class that can log. Each time we call logger.info the logging library is going to look up the current log level to decide whether to log.

Long story short, if we store these Logger -> LogLevel mapping in our config system and update the logger on changes, we suddenly have the ability to change from warn to debug in milliseconds instead of hours.

Adoption

Learning to use this tool most effectively is a cultural change for an organization and one that will make some people uncomfortable at first. But it's a change that will pay dividends and the great news is that you can use it out of the box with your existing logging immediately.

Let's take an example. If you saw this in a PR, would you tell the engineer to remove it? A lot of people would. It looks like an artifact of the development process. It was surely useful, for local testing, but if we don't use debug in production it's just noise and should be removed.

def charge(user, amount)
  balance = get_balance(user)
  min_balance = get_min_balance(user)
  logger.debug("#{balance > min_balance ? "Will" : "Will Not"} Bill amount #{amount} because #{balance} > #{min_balance} for User #{user}")
  if balance > min_balance
    charge_user(user, amount)
  else
end

But this reasoning all gets stood on it's head if we can now instantly turn it on for a single problematic user for an hour while we debug to improve our MTTR.

As your team gains comfort with the ability to laser target logging, you'll eventually find that this promotes an even stronger practice around operational thinking. Now when you’re reading a PR, you can consider “if this goes sideways, what kind of information would I like to have at my fingertips” and use that to prepare operationally.

Conclusion

How we think about logging has been fundamentally constrained because it's always been hard to change the levels. But it's time to break past this and re-explore the potential of logging.

It's easy to try. You don't need to change your log aggregator. You don't need to change any of your logging code. It's not expensive. These log libraries are implemented as simple filters on top of what your already have. Prefab never see the contents of the logs themselves, it just changes the log level.

If you're looking to improve your MTTR, I'd encourage you to give Prefab a try. It works perfectly with your existing logging solution and is very easy to test on a local branch to get a feel for how it works. If you'd like to see a video it's just 2 minutes. Better yet, signup for Prefab and give it a try. You'll be changing levels in realtime in minutes.

Discussion on Hacker News

A Tour of Log Levels:​

Conclusion:​

The Assumption​

A Better Way​

What This Means​

Mean Time To Recovery (MTTR)​

How Does it all work, or "Dynamic Configuration Brings Good Things to Life"​

Adoption​