Summary of “97 things every software architect should know”

 

The essay “97 things every software architect should know”  discuss what it means to be a software architecture. It is a good source for ideas on how to improve your skills as an architect or what skills you need to develop to become one.

In this post I summarize the chapters I found interesting. Currently, only some chapters from the range 65-97 are summarized.

65. Your system is legacy, design for it. – by Dave Anderson.

  • you should wish your system will become legacy systems as this indicates that your system bring value to stay in production.
  • design for legacy:
    • Clarity – code should be clear to support teams
    • Testability – is your system easy to verify?

    • Traceability – can Ernie the Emergency bug fixer who has never seen the code before jump into production, diagnose a fault and put in a fix?

66. If there is only one solution, get a second opinion – by Timothy High

  • architect should consider several solution before finding a solution and evaluate their different trade offs.
  • if you can think of only one solution goes consults more experienced people.
  • Watch out from patterns you already know that might block you from finding more solutions – if you are good with a hammer everything might look like a nail to you 🙂

67. Understand the impact of change – by Doug Crawford

  • A good architect reduces complexity, design practical abstractions

  • The great architect understands the impact of change

  • changes includes changes in :
    • requirements
    • interfaces
    • abstractions
    • team members and structure

70. “Perfect” is the Enemy of “Good Enough” – by Greg Nyberg.

  • Software designers, and architects in particular, tend to evaluate solutions by how elegant and optimum they are for a given problem

  • My advice: Don’t give in to the temptation to make your design, or your implementation, perfect! Aim for “good enough” and stop when you’ve achieved it

  • What exactly is “good enough,”

    • do not impact system functionality,

    • do not impact system maintainability

    • do not impact performance in any meaningful way.

    • The architecture and design hangs together.

    • The implementation works

    • Code is clear, concise, and well-documented.

    • Could it be better? Sure, but it is good enough, so stop. Declare victory and move on to the next task.

  • Remember that application development is not a beauty contest

73. The Business Vs. The Angry Architect – By Chad LaVigne

  1. you need to listen what other people say to you and not concentrate only on what you have to say. you already know what you are going to say – so you might learn something new from listen to others. This is especially true for architects that are experienced and might think they know what is the next step for any situation.
  2. The business is your employer – you need to serve it. The business domain experts are your major partners – listen to what they have to say. if you disagree with them that is fine. learn to leave with disagreements. if there are too many and you can’t contain that – look for another business to work for and contribute your experience to its success.

75. Before anything, an architect is a developer – By Mike Brown

An architect of a project should also write code of the project. This will have several benefits:

  1. Help the architect to estimate how much time coding tasks will take his developer to complete.
  2. Its fun
  3. Keep him in sync with latest technologies updates
  4. His developers will appreciate him more as they will see his high quality code and that he take part of the developing effort.

77. Stable problems get high quality solutions – By Sam Gardiner

  1. An architect should be able to look at a problem and separate it to chunk of problems that are cohesive and decoupled.
  2. The splitting to chunks should be driven to create chunks that are stable i.e., chunks the do not change even if other chunks are changed.
  3. When a problem is stable it cause its design to be stable, when the design is stable – the implementation can be made stable with high quality.
  4. An architect should develop internal sense for separation that is similar to a good sense of direction that some people have.

78. It Takes Diligence – By Brian Hart

An architect is expected to have ingenuity in his work but less common characteristic is of being diligent.

Effective architects are using daily and weekly check lists to remind them do the things that are necessary but harder for them to do like tracking project status.

They use commitments as for example:

  • keeping project’s schedule and budget
  • complete tasks that are not fun for them
  • commitment to process/methodology
  • accepting overall responsibility of the project

Getting better in what you do requires diligence.

79. Take responsibility for your decisions – By Yi Zhou

Here is how you can become a responsible architect:

  1. each major decision need to be well documented, traceable and communicated to engineers and other stockholders so they can review and have the chance to object it.
  2. review your past architectual decision. Identify decision that remain valid and those that do not – figure out how you could avoid bad decisions. learn from your mistakes.
  3. enforce your decisions
  4. delegate decisions to other who are expert in a domain that you are not proficient. you are not know-it-all-genius.

80. Don’t Be a Problem Solver – By Eben Hewitt

An architect has the reflex to try and solved a given problem instead of asking themselves if this problem is a real problem? can it be avoided?

He should not accept requirements as solid. If removing some of them can have a more sustain design then he should offer that.

81. Choose your weapons carefully, relinquish them reluctantly – Chad LaVigne

  1. Selecting the technologies we use to attack problems with is a large part of the software architect‘s job.
  2. architect have a toolbox of technologies they are experienced with, They know the technologies limitations. They know how to solve problems when arise. They are able to estimate project effort accurately.
  3. technologies improved over time – you need to replace old technologies when new ones proved to be significantly better from the one you know.
  4. update technologies in your product if there is an obvious improvements to the business

88. Don’t Be Clever – By Eben Hewitt

  1. Good qualities an architect should have:
    1. General intelligence
    2. resourcefulness
    3. thoughtfulness
    4. a breadth and depth of knowledge
    5. affinity for precision
  2. Write simple solutions not clever ones. The disadvantages of clever solutions:
    1. They are not simple
    2. Hard to fix
    3. Hard to replace, they stick for ever

90. Find and retain passionate problem solvers- by Chad LaVigne

    1. Putting together a team of outstanding developers is one of the most important things you can do to ensure the success of a software project.

    2. Building a team
      1. asking someone to explain their approach to diagnosing a performance problem gives you great insight into their methods for problem solving.

      2. ask what they would change given the chance to start their most recent project anew.

      3. Good developers are passionate about their work. Asking them about past experience will bring out that passion and tell you what correct answers to technical trivia questions cannot.

    3. keep the team together

      1. Finding great developers is difficult. Letting people know they are valued is not. Don‘t miss simple chances to build morale and boost productivity.

      2. Keep criticism constructive and don‘t require that every solution look like it came from you.

      3. go the extra mile to keep it together

92. Pay down your technical debt – By Burk Hufnagel

  1. A fix of bug in a production system might be needed ASAP. This means that it will probably be “quick and dirty” fix.
  2. such fixes have hidden cost:
    1. system instability
    2. increase maintenance cost
    3. lack of proper design
    4. lack of documentation
    5. lack of tests
  3. Once the fix is in production, have the developers go back and fix it properly so that it can be included in the next scheduled release.

 

95. The Importance of Consommé – By Eben Hewitt

  1. A consommé is an extremely clarified broth, usually made with beef or veal, served as a delicate soup. It is made by repeated, simple, fine-grained straining.

  2. Software architecture requires a continual refinement of thought, a repeated straining of ideas until we have determined the essence of each requirement in the system.

97. Great software is not built, it is grown – by Bill de hÓra

  1. resist trying to design a large complete system

  2. Have a grand vision, but not a grand design.

  3. starting with a small running system, a working subset of the intended architecture – the simplest thing that could possibly work.

  4. it can teach us much about the architecture that a large system, or worse, a collection of architectural documents never can:

    • It will require a smaller team, which will reduce the cost of coordinating the project.

    • Its properties will be easier to observe.

    • It will be easier to deploy.

    • It will teach you and your team at the earliest possible moment what does and does not work.

    • It will tell you where the system will not evolve easily

    • Perhaps most important, it will be comprehensible and tangible to its stakeholders from the beginning, allowing them to grow into the overall design as well.

Advertisements

RSYNC design – a method for replicating updates

Background

rsync is an algorithm for efficient remote update of files over low bandwidth network link.

Definitions

  • S (source) – the process at the end that has access to the latest version of a file.
  • T (target) – the process at the end that has access to a version of a file that is older than the version S has.

Requirements

  • Sync changes between target file T that has older version of a source file S.
  • Minimize the network traffic for the synchronization.

Implementation

  1. T (the target) divides the file into blocks with fixed size L.
  2. For each block, T calculates a signature that is fast to calculate but not unique.
  3. For each block, T calculates a signature that is slow to calculate but unique.
  4. T sends S a list of pairs with fast and slow signatures for each block.
  5. S starts checking its first block.
  6. T created a temporary empty file X that will replace its current file with the version from S side.
  7. S calculate the fast signature for the block,  if it is not equal to any of T fast signatures then S sends the first byte to T and T append this byte to file X. if the fast signature is equal, then S calculate the slow signature of the block and compare it to the matching slow signature. if the two signatures are equals then S sends T the id of the block. T append the block from its local copy via block id. if the slow  signature is not equal then S will send the first byte to T.
  8. The next block S will check will be the one starting in the next byte if the previous block did not match or the one starting in an offset of block size if the previous block matched. S may need to calculate the fast signature to every possible block  (i.e. blocks generated from the previous block by shifting one byte at a time, starting from the first block and ending in the last) – so rsync is using a method similar to Adler checksum that can be calculated efficiently in iterations based on previous block signature plus small number of steps.
  9. repeat steps 7-8 until S reach its end.

rsync1

rsync2

rsync3

rsync4

Reference:

  1. Andrew Tridgell,  “Efficient Algorithms for Sorting and Synchronization” , https://rsync.samba.org/~tridge/phd_thesis.pdf

Architecture of ZAB – ZooKeeper Atomic Broadcast protocol

Background

ZooKeeper support clients reading and updating key values pairs with high availability. High availability is achieved by replicating the data to multiple nodes and let clients read from any node. Critical to the design of Zookeeper is the observation that each state change is incremental with respect to the previous state, so there is an implicit dependence on the order of the state changes. Zookeeper Atomic Broadcast (ZAB) is the protocol under the hood that drives ZooKeeper replication order guarantee. It also handles electing a leader and recovery of failing leaders and nodes. This post is about ZAB.

Definitions

  • leader and followers-  in ZooKeeper cluster, one of the nodes has a  leader role and the rest have followers roles. The leader is responsible for accepting all incoming state changes from the clients and replicate them to itself and to the followers. read requests are load balanced between all followers and leader.
  • transactions –  client state changes that a leader propagates to its followers.
  • ‘e’ – the epoch of a leader. epoch is an integer that is generated by a leader when he start to lead and should be larger than epoch’s of previous leaders.
  • ‘c’ – a sequence number that is generated by the leader, starting at 0 and increasing. it is used together with an epoch to order the incoming clients state changes.
  • ‘F.history’ – follower’s history queue. used for committing incoming transactions in the order they arrived.
  • outstanding transactions – the set of transactions in the F.History that have sequence number smaller than current COMMIT sequence number.

ZAB Requirements

  1. Replication guarantees
    1. Reliable delivery – If a transaction, M, is committed by one server, it will be eventually committed by all servers.
    2. Total order – If transaction A is committed before transaction B by one server, A will be committed before B by all servers. If A and B are committed messages, either A will be committed before B or B will be committed before A.
    3. Causal order – If a transaction B is sent after a transaction A has been committed by the sender of B, A must be ordered before B. If a sender sends C after sending B, C must be ordered after B.
  2. Transactions are replicated as long as majority (quorum) of nodes are up.
  3. When nodes fail and later restarted – it should catch up the transactions that were replicated during the time it was down.

ZAB Implementation

  • clients read from any of the ZooKeeper nodes.
  • clients write state changes to any of the ZooKeeper nodes and this state changes are forward to the leader node.
  • ZooKeeper uses a variation of  two-phase-commit protocol for replicating transactions to followers. When a leader receive a change update from a client it generate a transaction with sequel number c and the leader’s epoch e (see definitions) and send the transaction to all followers. a follower adds the transaction to its history queue and send ACK to the leader. When a leader receives ACK’s from a quorum it send the the quorum COMMIT for that transaction. a follower that accept COMMIT will commit this transaction unless c is higher than any sequence number in its history queue. It will wait for receiving COMMIT’s for all its earlier transactions (outstanding transactions) before commiting.

Screen Shot 2015-11-20 at 1.56.51 PM

picture taken from reference [4]

  • Upon leader crashes, nodes execute a recovery protocol both to agree upon a common consistent state before resuming regular operation and to establish a new leader to broadcast state changes.
  • To exercise the leader role, a node must have the support of a quorum of nodes. As nodes can crash and recover, there can be over time multiple leaders and in fact the same nodes may exercise the node role multiple times.
  • node’s life cycle: each node executes one iteration of this protocol at a time, and at any time, a process may drop the current iteration and start a new one by proceeding to Phase 0.
    • Phase 0 –  prospective leader election
    • Phase 1 – discovery
    • Phase 2 – synchronization
    • Phase 3 – broadcast
  • Phases 1 and 2 are important for bringing the ensemble to a mutually consistent state, specially when recovering from crashes.
  • Phase 1 – Discovery
    In this phase, followers communicate with their prospective leader, so that the leader gathers information about the most recent transactions that its followers accepted. The purpose of this phase is to discover the most updated sequence of accepted transactions among a quorum, and to establish a new epoch so that previous leaders cannot commit new proposals. Because quorum of the followers have all changes accepted by the previous leader- then it is promised that at least one of the followers in current quorum has in its history queue all the changes accepted by previous leader which means that the new leader will have them as well. Phase 1 exact algorithm available here.
  • Phase 2 – Synchronization
    The Synchronization phase concludes the recovery part of the protocol, synchronizing the replicas in the cluster using the leader’s updated history from the discovery phase. The leader communicates with the followers, proposing transactions from its history. Followers acknowledge the proposals if their own history is behind the leader’s history. When the leader sees acknowledgements from a quorum, it issues a commit message to them. At that point, the leader is said to be established, and not anymore prospective. Phase 2 exact algorithm available here.
  • Phase 3 – Broadcast
    If no crashes occur, peers stay in this phase indefinitely, performing broadcast of transactions as soon as a ZooKeeper client issues a write request.  Phase 3 exact algorithm available here.
  • To detect failures, Zab employs periodic heartbeat messages between followers and their leaders. If a leader does not receive heartbeats from a quorum of followers within a given timeout, it abandons its leadership and shifts to state election and Phase 0. A follower also goes to Leader Election Phase if it does not receive heartbeats from its leader within a timeout.

References:

  1. Flavio P. Junqueira, Benjamin C. Reed, and Marco Serafini, “Zab: High-performance broadcast for primary-backup systems” 
  2. Andr´e Medeiros, “ZooKeeper’s atomic broadcast protocol: Theory and practice”
  3. ZooKeeper Apache documentation
  4. Benjamin Reed,Flavio P. Junqueira, A simple totally ordered broadcast protocol”