Semantic Hashing

Semantic hashing is a method that maps documents to a code, e.g., a 32-bit memory address, so that documents with semantically close content are mapped to close addresses. This method can be used to implement an information retrieval (IR) system where the query is a document and the search results contain documents with similar content (semantics). The method was published by R. Salakhutdinov and G. Hinton in this paper.

Indexing is implemented in the following manner: a document is mapped to a word-count vector, and this vector is then passed through an RBM autoencoder and encoded to a 32-bit address.
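
The two ends of this indexing pipeline can be sketched in a few lines of Python. This is only an illustration: the vocabulary, the helper names `word_count_vector` and `code_to_address`, and the 0.5 threshold are assumptions made for the example, and the trained encoder in the middle is left out (the activations passed in are made up).

```python
from collections import Counter

def word_count_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def code_to_address(code_activations, threshold=0.5):
    """Pack thresholded code-layer activations into one integer address.

    The threshold value is an assumption made for this sketch.
    """
    address = 0
    for activation in code_activations:
        bit = 1 if activation > threshold else 0
        address = (address << 1) | bit
    return address

# Hypothetical 4-word vocabulary; a real system would use thousands of words.
vocabulary = ["neural", "network", "hashing", "retrieval"]
vector = word_count_vector("Neural network hashing with a neural encoder", vocabulary)

# A trained encoder would map `vector` to code-layer activations.
# The 4 values below are made up to show the packing step only;
# a 32-bit address would use 32 code units.
address = code_to_address([0.9, 0.1, 0.8, 0.7])
```

With 32 code units instead of 4, the same packing step yields the 32-bit address described above.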

For searching, the query string is treated as a document, i.e., its word-count vector is passed through the encoder to get its matching address. The search result is then the documents stored at the query address together with the documents stored at close addresses. For example, close addresses can be those that differ from the query address in up to 4 bits (that is, addresses within a Hamming distance of 4).
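
The lookup of close addresses can be sketched as follows. This is a minimal illustration, assuming the index is a plain dictionary from address to a list of document ids; the names `addresses_within` and `search` are made up for the example.

```python
from itertools import combinations

def addresses_within(address, max_distance, n_bits=32):
    """Yield every n_bits-wide address within max_distance bit flips of address."""
    for distance in range(max_distance + 1):
        for positions in combinations(range(n_bits), distance):
            neighbor = address
            for position in positions:
                neighbor ^= 1 << position  # flip one bit
            yield neighbor

def search(index, query_address, max_distance=4, n_bits=32):
    """Collect documents stored at the query address and at nearby addresses."""
    results = []
    for neighbor in addresses_within(query_address, max_distance, n_bits):
        results.extend(index.get(neighbor, []))
    return results

# Hypothetical index: address -> list of document ids (4-bit addresses for brevity).
index = {0b1011: ["doc_a"], 0b1111: ["doc_b"], 0b0100: ["doc_c"]}
hits = search(index, 0b1011, max_distance=1, n_bits=4)
```

For a 32-bit address and a Hamming distance of up to 4, this probes 1 + 32 + 496 + 4,960 + 35,960 = 41,449 addresses, a number that does not depend on the size of the corpus.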

One of the common methods in IR systems today is the tf-idf indexing technique. Each document is indexed so that each term in the corpus points to a list of the documents that contain this term. Basic searching is done by looking up the list of documents that matches each term in the search query and intersecting those lists, leaving the documents that contain all the terms in the query. The disadvantage of this method is that search time is affected by the number of terms in the query. In contrast, semantic hashing gets the list of relevant documents in a single lookup and so is not affected by query size.
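
The term-to-documents lookup and the intersection step can be sketched like this. It is a simplified illustration: the tf-idf weighting itself is left out, whitespace tokenization is an assumption, and the function names are made up for the example.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_all_terms(index, query):
    """Return the ids of the documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        # One extra lookup and intersection per query term.
        result &= index.get(term, set())
    return result

# Hypothetical three-document corpus.
documents = {
    1: "deep neural network",
    2: "neural hashing",
    3: "deep hashing network",
}
index = build_inverted_index(documents)
matches = search_all_terms(index, "deep network")
```

The loop over `terms` is where the dependence on query length comes from: each additional term costs another list lookup and another intersection.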

Another method in IR systems is latent semantic analysis (LSA). Documents are mapped to word-count vectors, and the dimension of the vectors is then reduced using SVD. Search is done by mapping the query document to a word-count vector, reducing its dimension, and measuring the angle between the query vector and the vectors of all the corpus documents. The disadvantage of this method is that search time depends linearly on the size of the corpus. In contrast, semantic hashing is only affected by the size of the document lists, i.e., a larger corpus means more collisions in the memory-address mapping and longer document lists. The size of the lists does not increase linearly with the corpus, since the documents are spread across the memory addresses.
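
A small NumPy sketch of the LSA search described above. The toy count matrix, the choice of k = 2 dimensions, and the function name `cosine_similarities` are assumptions made for the example; the angle is measured through its cosine.

```python
import numpy as np

# Toy word-count matrix: rows are documents, columns are vocabulary words.
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

k = 2  # number of dimensions kept after the reduction (an arbitrary choice here)
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
projection = Vt[:k].T              # maps word-count vectors to k dimensions
doc_vectors = counts @ projection  # reduced vectors of all corpus documents

def cosine_similarities(query_counts):
    """Cosine of the angle between the reduced query and every reduced document."""
    q = query_counts @ projection
    norms = np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    return doc_vectors @ q / norms

# The query uses only the first two vocabulary words.
scores = cosine_similarities(np.array([1.0, 1.0, 0.0, 0.0]))
```

In this toy corpus the first two documents score close to 1 and the last two close to 0. Note that `doc_vectors @ q` touches every document in the corpus, which is the linear dependence on corpus size mentioned above.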

Below are results from the paper, where it can be seen that the search quality of this technique is similar to tf-idf (which is considered state of the art for IR systems). The axes are the recall and precision values of the systems. The tests included a comparison of latent semantic analysis (LSA) and semantic hashing (graph on the left), and a comparison of LSA, tf-idf, and semantic hashing followed by tf-idf filtering (graph on the right). The right graph shows that semantic hashing with tf-idf filtering came close to tf-idf, which means that semantic hashing returns documents similar to what the tf-idf method would return. This basically shows that semantic hashing is working 🙂

Screen Shot 2017-06-24 at 8.45.58 PM





Introduction to Restricted Boltzmann Machine (RBM)

Restricted Boltzmann Machines (RBMs) are building blocks for a certain type of neural network introduced by G. E. Hinton.

In a paper published in Science, Hinton describes how to use neural networks to reduce the dimensionality of data using an autoencoder. Below is a sketch of a standard autoencoder, where data enters at layer X on the left and the code for the data is produced at layer Z. The network is trained by adding a decoder network from the code layer onward, calculating the error as the difference between the reconstruction at layer X’ and the original data at layer X, and then using backpropagation to update the network weights.

Screen Shot 2017-06-16 at 2.36.59 PM

One of the challenges of training a deep autoencoder is that unless the network’s weights are initialized close to a good solution, the training will fail and the encoder will not work. Hinton suggested a pre-training phase in which each pair of adjacent layers, up to the code layer, is trained separately from the other layers. Once the first two layers are pre-trained, we move on to the next pair, where the last layer of the previous pair becomes the first layer of the next one. The graph below shows that training a 7-layer network fails without pre-training, compared to a network that was pre-trained.

Screen Shot 2017-06-16 at 2.10.35 PM

An RBM is a two-layer network with the following constraints:

  • The layer on the left is called the visible layer and the one on the right is called the hidden layer.
  • The connections between the two layers are symmetrical.
  • There are no connections between nodes within the same layer.
  • There are bias weights for both visible and hidden units.

Screen Shot 2017-06-16 at 2.17.02 PM

Pre-training consists of the following steps (taken from Wikipedia):

Screen Shot 2017-06-16 at 2.24.30 PM

Comments on the pre-train procedure:

  • ‘v’ is the visible vector, ‘h’ is the hidden layer vector, ‘a’ is the bias vector of the visible layer and ‘b’ is the bias vector of the hidden layer.
  • in step 1, the hidden layer contains only 1 or 0 values. It is calculated in a stochastic process by generating a random number in the range (0,1): if the activation probability of h_i is greater than this random value then h_i=1, otherwise h_i=0. The same goes for step 3.
  • outer product is the matrix that is generated from multiplying the column vector v and the row vector h.
  • epsilon is the learning rate
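
The procedure and the comments above can be put together in a short NumPy sketch. It assumes that the steps in the screenshot are the single-step contrastive divergence (CD-1) update as described on Wikipedia; the function names, the layer sizes, and the learning rate value are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_binary(probabilities):
    """Set a unit to 1 when its probability exceeds a uniform random number."""
    return (probabilities > rng.random(probabilities.shape)).astype(float)

def cd1_update(v, W, a, b, epsilon=0.1):
    """One pre-training update for a single training vector v.

    W has shape (n_visible, n_hidden); a and b are the visible and hidden biases.
    """
    # Step 1: sample the hidden vector h from the visible vector v.
    h = sample_binary(sigmoid(b + v @ W))
    # Step 2: positive gradient, the outer product of v and h.
    positive = np.outer(v, h)
    # Step 3: sample a reconstruction v' from h, then resample h' from v'.
    v_recon = sample_binary(sigmoid(a + h @ W.T))
    h_recon = sample_binary(sigmoid(b + v_recon @ W))
    # Step 4: negative gradient, the outer product of v' and h'.
    negative = np.outer(v_recon, h_recon)
    # Steps 5 and 6: update the weights and the biases, scaled by epsilon.
    W = W + epsilon * (positive - negative)
    a = a + epsilon * (v - v_recon)
    b = b + epsilon * (h - h_recon)
    return W, a, b

# Hypothetical sizes: 6 visible units and 3 hidden units.
n_visible, n_hidden = 6, 3
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
a = np.zeros(n_visible)
b = np.zeros(n_hidden)
v = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])
W, a, b = cd1_update(v, W, a, b)
```

In practice this update is applied to every training vector (or to mini-batches) over several passes through the data.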

Each pair of layers in the encoder is pre-trained on the training data several times (each pass is called an epoch). When the pre-training of a pair is complete, its hidden layer becomes the visible layer of the next pair, which is pre-trained together with the next layer of the encoder in a greedy way.

When all layers up to the code layer are pre-trained, the network is unrolled, which means that the transposes of the weight matrices are used as the weights of the decoder network. This is described in the following sketch:

Screen Shot 2017-06-16 at 2.43.15 PM
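
The unrolling step can also be written as a few lines of NumPy. This is a simplified sketch: biases are left out, every layer uses a sigmoid, the weights are random stand-ins for pre-trained ones, and the layer sizes and function names are made up for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unroll(encoder_weights):
    """Build decoder weights from the transposed encoder weights, in reverse order."""
    # Copies are used so that fine-tuning can later change the two halves independently.
    return [W.T.copy() for W in reversed(encoder_weights)]

def forward(x, weights):
    """Pass a vector through a stack of layers (biases omitted in this sketch)."""
    for W in weights:
        x = sigmoid(x @ W)
    return x

rng = np.random.default_rng(0)
layer_sizes = [8, 4, 2]  # hypothetical: 8 inputs, 4 hidden units, 2 code units
# Stand-ins for the pre-trained RBM weight matrices.
encoder_weights = [
    0.1 * rng.standard_normal((n_in, n_out))
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
decoder_weights = unroll(encoder_weights)

x = rng.random(8)
code = forward(x, encoder_weights)               # encoding: bottom layer up to the code layer
reconstruction = forward(code, decoder_weights)  # decoding: code layer back to the input size
```

After unrolling, the encoder and decoder together form the full autoencoder that is trained further with backpropagation.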

Now the fine-tuning phase starts, which is standard unsupervised autoencoder training via the backpropagation algorithm.

We are ready to encode! We use the neural net from the bottom layer up to the code layer and pass our data through it for encoding.

Here are some results of encoding and then decoding using RBMs, compared to PCA, taken from Hinton’s paper:

Figure A 

  • Architecture: (28X28)-400-200-100-50-25-6
  • Training: 20,000 images
  • Testing: 10,000 new images.

Screen Shot 2017-06-16 at 2.48.27 PM

Figure B

  • Architecture: 784-1000-500-250-30.
  • Training: 60,000 images
  • Testing: 10,000 new images.

Screen Shot 2017-06-16 at 2.49.28 PM

Figure C

Screen Shot 2017-06-16 at 2.49.55 PM

Figure showing visual clustering of RBM compared to PCA:

Screen Shot 2017-06-16 at 2.50.08 PM




I recently read Cal Newport’s new book, titled “DEEP WORK: RULES FOR FOCUSED SUCCESS IN A DISTRACTED WORLD”, and found it interesting and most useful for my own career. It opens by explaining the value of working deep, which basically means working with high concentration on things that are important to your career. The book continues with how hard it is to work in deep mode and then offers practical techniques for achieving it. If you are a self-aware person who likes to improve their productivity, then I recommend you buy this book.


A proven scientific fact is that people fight desires all day long. You should expect to be bombarded throughout the day with the desire to do anything but work deeply. Another fact is that a person has a finite amount of willpower that becomes depleted as it is used. You have 4 hours at most per day for deep work. Rituals help you harness your limited willpower to work deep. Do not rely on your good intentions to work deep; you will probably fail. You need rituals.

Some effective rituals

  1. Schedule every minute of your day: at the beginning of each workday, plan every hour of your day. I use Google Calendar to schedule an appointment for each task I plan. Quantify the depth of every activity by asking yourself: how long would it take to train a smart recent college graduate with no specialized training in my field to complete this task? If the answer is less than 3 months, then the task is shallow and you do not want to do it. Reduce your shallow work; this will leave you with more energy to work deep. Aim at small, important goals (you know that you need goals, right?).
  2. Where you’ll work: have a special location where you work deep, e.g., a library with a good atmosphere or a pleasant coffee shop. Sometimes, make a grand gesture: go somewhere special to work deep, e.g., work from a hotel in another city for a couple of days, or book a round-trip flight to Japan and work during the flight. You can also practice ‘productive meditation’, which means using the time spent walking your dog or doing another monotonous activity to think about a specific deep question. This requires carefully planning what the question is and avoiding looping over the same thoughts.
  3. How long you’ll work: limit the deep work time. Concentrating is hard, and your brain will do whatever it can to stop you. I use the Pomodoro technique.
  4. How you’ll work once you start to work: ban internet use and cell phone activity. Act on the lead measures: there are two types of metrics. Lag measures are, e.g., publishing 6 papers in a year or learning the design of 10 systems in 2 months. Lead measures are, e.g., the number of hours you spend in deep work. Lead measures help you accomplish short-term tasks that contribute to the lag measures.
  5. At the end of the day: have an end to your working day. Do not continue to work from home at night. At the end of the day, do a shutdown ritual, which includes documenting all the new tasks you collected during the day, quickly skimming all open tasks, planning your next day, and saying to yourself something like “shutdown complete!”. No more thinking about work after that. Downtime is important because:
    1. Downtime helps you have insights.
    2. Downtime helps recharge the energy needed to work deeply.
    3. You should complete the 4 hours of deep work at the office. The work you can do at night will not be deep, so there is no point in doing it.
    4. Limiting your time necessitates more careful thinking about your organizational habits.
  6. At the end of the week: plan next week’s tasks, focusing on the tasks that are important. Review the tasks you completed this week and check whether they are close to your plan from the week before. If they are not, understand what happened.


  1. Deep Work: Professional activities performed in a state of distraction-free concentration that push your cognitive capabilities to their limit. These efforts create new value, improve your skill, and are hard to replicate.
  2. The Deep Work Hypothesis: The ability to perform deep work is becoming increasingly rare at exactly the same time it is becoming increasingly valuable in our economy. As a consequence, the few who cultivate this skill, and then make it the core of their working life, will thrive.
  3. Shallow Work: Noncognitively demanding, logistical-style tasks, often performed while distracted. These efforts tend not to create much new value in the world and are easy to replicate.