Monday, 24 June 2019

Debugging code - Identifying the bug



Julia Evans is a very smart woman in IT who creates very nice, funny and insightful comics, which she calls 'zines', on Linux and coding topics.

This morning I read a question she tweeted that triggered me:


Last week I realized I have already been in IT for 25 years, ever since my forced membership of a famous Dutch shooting club ended after 9 months (the kind of shooting club where you get free clothing, survival courses and, in my case, also a truck driving license. Which other club offers that?).

Anyway, over the years I discovered that, despite all the smart people I got to know and work with, this part of our work isn't obvious. In very many cases people seem to 'just do something'. No offense, but for developers it's often frustrating and just not fun to work on bugs or problems. And administrators who are confronted with a problem are often 'too busy with other stuff.' So they try something, don't find the cause, and at a later moment try something else. So, when I get involved, I ask the obvious questions and in most cases I try out the same thing myself. Even though I do believe them, I want, I need, to see the behavior with my own eyes.

By the way, I'm always reluctant to call it a bug. A bug is only a bug when you have reproduced it and, based on a common interpretation, the developer, the tester (if he found the issue) and the functional/solution designer have come to a consensus that the code does not do what it is supposed to. The functional specs are interpreted by both the tester and the developer, and in a certain way also by the designer. It might be that the tester finds an anomaly, but that it is either a misinterpretation on his part or a problem with the wording of the specs. There are cases where the coder is right. But, of course, your program can also work with an unexpected logic.


But, back to Julia's tweets: they triggered me, so I jotted down some thoughts that came to mind and that form the basis of my search for issues.

To me it starts with identifying a case where it goes wrong and, equally important, a similar case where it goes right. And, as far as possible, creating a unit test for both. Since my work is mostly done on message processing platforms (Oracle SOA Suite, BPM Suite, Service Bus), I love it when a tester can hand me the triggering message of the case and the response messages involved. I can then add them to my unit test set in SoapUI/ReadyAPI.
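As a minimal sketch of that pairing outside SoapUI/ReadyAPI, the idea could look like this with Python's unittest. Everything here is an assumption for illustration: `process_message` is a hypothetical stand-in for the service under test, with a deliberately planted bug, and the messages are made up.

```python
import unittest


def process_message(message: dict) -> dict:
    """Hypothetical stand-in for the service under test.

    Planted bug for illustration: a missing 'currency' is not defaulted.
    """
    amount = message["amount"]
    currency = message.get("currency")  # bug: should fall back to "EUR"
    return {"total": amount, "currency": currency}


class ProcessMessageTest(unittest.TestCase):
    def test_similar_case_that_goes_right(self):
        # The message for which the behavior is known to be correct.
        response = process_message({"amount": 10, "currency": "EUR"})
        self.assertEqual(response, {"total": 10, "currency": "EUR"})

    @unittest.expectedFailure  # remove this marker once the issue is fixed
    def test_case_that_goes_wrong(self):
        # The triggering message as handed over by the tester.
        response = process_message({"amount": 10})
        self.assertEqual(response, {"total": 10, "currency": "EUR"})


def run_suite() -> unittest.TestResult:
    """Run both cases and return the result object for inspection."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(ProcessMessageTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping the working case next to the failing one shows at a glance whether a later change fixes the issue without breaking what already worked.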
 

Then I add instrumentation (log lines, etc.) at key positions, to identify up to which point the code is executed and which lines aren't reached. SOA Suite produces a flow trace of the execution. But often the expressions used are quite complex one-liners. I then split those up into several separate assignments to 'in between' variables. In Java, JavaScript, etc., I do not like complex one-liners either. I prefer several variables for 'in between' values, and assignments with short expressions. That helps with line-by-line debugging.
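To illustrate the splitting of a one-liner, here is a small Python sketch. The order structure and the field names (`lines`, `qty`, `price`, `discount`, `vat`) are made up for the example; they do not come from any real system.

```python
import logging

logger = logging.getLogger("order_total")


def total_one_liner(order: dict) -> float:
    # Hard to debug: when this fails or is wrong, which part is to blame?
    return round(sum(l["qty"] * l["price"] for l in order["lines"]) * (1 - order.get("discount", 0)) * (1 + order["vat"]), 2)


def total_step_by_step(order: dict) -> float:
    # Same calculation, but every 'in between' value gets its own variable
    # and its own log line, so a debugger or the log shows where it derails.
    line_totals = [line["qty"] * line["price"] for line in order["lines"]]
    subtotal = sum(line_totals)
    logger.debug("line_totals=%r subtotal=%r", line_totals, subtotal)

    discount_factor = 1 - order.get("discount", 0)
    discounted = subtotal * discount_factor
    logger.debug("discount_factor=%r discounted=%r", discount_factor, discounted)

    vat_factor = 1 + order["vat"]
    total = round(discounted * vat_factor, 2)
    logger.debug("vat_factor=%r total=%r", vat_factor, total)
    return total
```

Both functions are meant to return the same value; the second one just gives you more places to put a breakpoint or a log line.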

Next, I iteratively narrow the gap between the last point I can conclude the code reaches and the first point I find is not reached, until the statement or point of execution that fails can be identified.
In the log lines I add the values of the key variables involved.
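A rough sketch of such checkpoints in Python, using the standard logging module. The `handle` function, the message layout and the small in-memory handler are all assumptions made up for this illustration.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("checkpoints")
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)


def handle(message: dict) -> str:
    """Hypothetical processing step, instrumented with checkpoints."""
    logger.debug("checkpoint 1: received message=%r", message)
    customer = message["customer"]
    logger.debug("checkpoint 2: customer=%r", customer)
    country = customer["address"]["country"]  # fails if 'address' is missing
    logger.debug("checkpoint 3: country=%r", country)
    return country.upper()


def trace(message: dict) -> list:
    """Run handle() and return the checkpoints that were reached."""
    handler.messages.clear()
    try:
        handle(message)
    except Exception as exc:
        logger.debug("failed with %s: %s", type(exc).__name__, exc)
    return list(handler.messages)
```

For a message without an address, the trace stops after checkpoint 2, so the failing statement must be between checkpoints 2 and 3. Adding more checkpoints in that gap narrows it further.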


In very rare, very difficult cases, I sometimes break down the code and cut away everything that is not touched, until I get a minimal working Mickey Mouse (or in Dutch: Jip en Janneke) case. From there I build it up again, testing iteratively in very small steps, until it breaks.

Also, very important: for difficult problems, I document very meticulously what I have done and concluded. My slogan here is: 'Deduction, my dear Watson!' When you have a problem, you can quickly come up with some potential causes and tests to check them. A unit test for a potential cause can go two ways: it can confirm or disprove the suspicion. Both outcomes have consequences for the follow-up. Disproving a potential cause can strike through other potential paths as well.

Confirming it, on the other hand, requires additional steps to narrow things down. I see it as a decision tree to follow.
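One way to picture such a decision tree is as a small data structure. This is only a sketch: the `Hypothesis` class and the example causes in the usage note are hypothetical, and striking through all untested sub-causes when their parent is disproved is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hypothesis:
    description: str
    outcome: Optional[bool] = None  # None = untested, True = confirmed, False = disproved
    conclusion: str = ""
    children: List["Hypothesis"] = field(default_factory=list)

    def add(self, description: str) -> "Hypothesis":
        """Register a more specific potential cause below this one."""
        child = Hypothesis(description)
        self.children.append(child)
        return child

    def confirm(self, conclusion: str) -> None:
        """Confirmed: the children are the follow-up steps to narrow down."""
        self.outcome = True
        self.conclusion = conclusion

    def disprove(self, conclusion: str) -> None:
        """Disproved: strike through the untested paths below it as well."""
        self.outcome = False
        self.conclusion = conclusion
        for child in self.children:
            if child.outcome is None:
                child.disprove("ruled out: parent cause disproved")

    def open_paths(self) -> List[str]:
        """Descriptions of everything in this subtree that is still untested."""
        paths = [self.description] if self.outcome is None else []
        for child in self.children:
            paths.extend(child.open_paths())
        return paths
```

For example, with a root issue "response is empty for some orders" and two potential causes below it, disproving the first cause also rules out its sub-causes, and `open_paths()` then lists only what is left to check.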

What I have found through the years is that structurally documenting the steps taken, with their conclusions and follow-ups, is not at all a given. But in many cases I found it important, especially when working in a task force, or when I got hired to get involved in a case. In those cases the customer that hired you has the right to have something in hand that represents what he paid for.
I once was involved in a case that turned out to be a database bug. So I could not help the customer solve it. But they were very pleased with the structured method I used to check out what could be the problem. And for those administrators and developers who have to do this as a side job, besides their regular things: please do yourself a favor and document. I found Google Docs very useful for this.


Oh, and by the way: I work with BPEL, BPMN, Oracle Service Bus, Java, PL/SQL, XSLT, XQuery, Python/Jython/WLST, sometimes JavaScript, you name it. And actually, my way of structured code or systems analysis comes down to the same procedures, regardless of technology.


