The Noisy Channel

 

Retiring a Great Interview Problem

August 8th, 2011 · 105 Comments · General

Interviewing software engineers is hard. Jeff Atwood bemoans how difficult it is to find candidates who can write code. The tech press sporadically publishes “best” interview questions that make me cringe — though I love the IKEA question. Startups like Codility and Interview Street see this challenge as an opportunity, offering hiring managers the prospect of outsourcing their coding interviews. Meanwhile, Diego Basch and others are urging us to stop subjecting candidates to whiteboard coding exercises.

I don’t have a silver bullet to offer. I agree that IQ tests and gotcha questions are a terrible way to assess software engineering candidates. At best, they test only one desirable attribute; at worst, they are a crapshoot as to whether a candidate has seen a similar problem or stumbles into the key insight. Coding questions are a much better tool for assessing people whose day job will be coding, but conventional interviews — whether by phone or in person — are a suboptimal way to test coding strength. Also, it’s not clear whether a coding question should assess problem-solving, pure translation of a solution into working code, or both.

In the face of all of these challenges, I came up with an interview problem that has served me and others well for a few years at Endeca, Google, and LinkedIn. It is with a heavy heart that I retire it, for reasons I’ll discuss at the end of the post. But first let me describe the problem and explain why it has been so effective.

The Problem

I call it the “word break” problem and describe it as follows:

Given an input string and a dictionary of words,
segment the input string into a space-separated
sequence of dictionary words if possible. For
example, if the input string is "applepie" and
dictionary contains a standard set of English words,
then we would return the string "apple pie" as output.

Note that I’ve deliberately left some aspects of this problem vague or underspecified, giving the candidate an opportunity to flesh them out. Here are examples of questions a candidate might ask, and how I would answer them:

Q: What if the input string is already a word in the
   dictionary?
A: A single word is a special case of a space-separated
   sequence of words.

Q: Should I only consider segmentations into two words?
A: No, but start with that case if it's easier.

Q: What if the input string cannot be segmented into a
   sequence of words in the dictionary?
A: Then return null or something equivalent.

Q: What about stemming, spelling correction, etc.?
A: Just segment the exact input string into a sequence
   of exact words in the dictionary.

Q: What if there are multiple valid segmentations?
A: Just return any valid segmentation if there is one.

Q: I'm thinking of implementing the dictionary as a
   trie, suffix treeFibonacci heap, ...
A: You don't need to implement the dictionary. Just
   assume access to a reasonable implementation.

Q: What operations does the dictionary support?
A: Exact string lookup. That's all you need.

Q: How big is the dictionary?
A: Assume it's much bigger than the input string,
   but that it fits in memory.

Seeing how a candidate negotiates these details is instructive: it offers you a sense of the candidate’s communication skills and attention to detail, not to mention the candidate’s basic understanding of data structures and algorithms.

A FizzBuzz Solution

Enough with the problem specification and on to the solution. Some candidates start with the simplified version of the problem that only considers segmentations into two words. I consider this a FizzBuzz problem, and I expect any competent software engineer to produce the equivalent of the following in their programming language of choice. I’ll use Java in my example solutions.

String SegmentString(String input, Set<String> dict) {
  int len = input.length();
  for (int i = 1; i < len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      if (dict.contains(suffix)) {
        return prefix + " " + suffix;
      }
    }
  }
  return null;
}

I have interviewed candidates who could not produce the above — including candidates who had passed a technical phone screen at Google. As Jeff Atwood says, FizzBuzz problems are a great way to keep interviewers from wasting their time interviewing programmers who can’t program.

A General Solution

Of course, the more interesting problem is the general case, where the input string may be segmented into any number of dictionary words. There are a number of ways to approach this problem, but the most straightforward is recursive backtracking. Here is a typical solution that builds on the previous one:

String SegmentString(String input, Set<String> dict) {
  if (dict.contains(input)) return input;
  int len = input.length();
  for (int i = 1; i < len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      String segSuffix = SegmentString(suffix, dict);
      if (segSuffix != null) {
        return prefix + " " + segSuffix;
      }
    }
  }
  return null;
}

Many candidates for software engineering positions cannot come up with the above or an equivalent (e.g., a solution that uses an explicit stack) in half an hour. I’m sure that many of them are competent and productive. But I would not hire them to work on information retrieval or machine learning problems, especially at a company that delivers search functionality on a massive scale.

Analyzing the Running Time

But wait, there’s more! When a candidate does arrive at a solution like the above, I ask for an big O analysis of its worst-case running time as a function of n, the length of the input string. I’ve heard candidates respond with everything from O(n) to O(n!).

I typically offer the following hint:

Consider a pathological dictionary containing the words
"a", "aa", "aaa", ..., i.e., words composed solely of
the letter 'a'. What happens when the input string is a
sequence of n-1 'a's followed by a 'b'?

Hopefully the candidate can figure out that the recursive backtracking solution will explore every possible segmentation of this input string, which reduces the analysis to determine the number of possible segmentations. I leave it as an exercise to the reader (with this hint) to determine that this number is O(2n).

An Efficient Solution

If a candidate gets this far, I ask if it is possible to do better than O(2n). Most candidates realize this is a loaded question, and strong ones recognize the opportunity to apply dynamic programming or memoization. Here is a solution using memoization:

Map<String, String> memoized;

String SegmentString(String input, Set<String> dict) {
  if (dict.contains(input)) return input;
  if (memoized.containsKey(input) {
    return memoized.get(input);
  }
  int len = input.length();
  for (int i = 1; i < len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      String segSuffix = SegmentString(suffix, dict);
      if (segSuffix != null) {
        memoized.put(input, prefix + " " + segSuffix);
        return prefix + " " + segSuffix;

}
}
memoized.put(input, null);
return null;
}
Again the candidate should be able to perform the worst-case analysis. The key insight is that SegmentString is only called on suffixes of the original input string, and that there are only O(n) suffixes. I leave as an exercise to the reader to determine that the worst-case running time of the memoized solution above is O(n2), assuming that the substring operation only requires constant time (a discussion which itself makes for an interesting tangent).

Why I Love This Problem

There are lots of reasons I love this problem. I’ll enumerate a few:

  • It is a real problem that came up in the couse of developing production software. I developed Endeca’s original implementation for rewriting search queries, and this problem came up in the context of spelling correction and thesaurus expansion.
  • It does not require any specialized knowledge — just strings, sets, maps, recursion, and a simple application of dynamic programming / memoization. Basics that are covered in a first- or second-year undergraduate course in computer science.
  • The code is non-trivial but compact enough to use under the tight conditions of a 45-minute interview, whether in person or over the phone using a tool like Collabedit.
  • The problem is challenging, but it isn’t a gotcha problem. Rather, it requires a methodical analysis of the problem and the application of basic computer science tools.
  • The candidate’s performance on the problem isn’t binary. The worst candidates don’t even manage to implement the fizzbuzz solution in 45 minutes. The best implement a memoized solution in 10 minutes, allowing you to make the problem even more interesting, e.g., asking how they would handle a dictionary too large to fit in main memory. Most candidates perform somewhere in the middle.

Happy Retirement

Unfortunately, all good things come to an end. I recently discovered that a candidate posted this problem on Glassdoor. The solution posted there hardly goes into the level of detail I’ve provided in this post, but I decided that a problem this good deserved to retire in style.

It’s hard to come up with good interview problems, and it’s also hard to keep secrets. The secret may be to keep fewer secrets. An ideal interview question is one for which advance knowledge has limited value. I’m working with my colleagues on such an approach. Naturally, I’ll share more if and when we deploy it.

In the mean time, I hope that everyone who experienced the word break problem appreciated it as a worthy test of their skills. No problem is perfect, nor can performance on a single interview question ever be a perfect predictor of how well a candidate will perform as an engineer. Still, this one was pretty good, and I know that a bunch of us will miss it.

105 responses so far ↓

  • 1 rp1 // Aug 8, 2011 at 2:59 am

    Why is this not just n lookups? Start with one character. If that fails, look up the first two, etc. When a lookup succeeds, insert a space and set the string to start at the next unexamined character.

    Thus “aaaaa” becomes “a a a a a”. Works great with a trie.

  • 2 facepalm // Aug 8, 2011 at 3:26 am

    rp1, you’d fail the interview :-)

    the author already explained why in the article.

  • 3 binarysolo // Aug 8, 2011 at 3:40 am

    rp1 – You need to consider cases where a word is composed of a valid word base with a suffix that is not a standalone word.

    Say, valid word + an invalid word, such as: “shorter” –> = short + er = (valid) + (invalid).

    Recursion just makes the most sense as the author wrote; reducing the problem by accounting for the string from the back. As he points out, bad case is when you have tons of viable character combinations that remain workable with prior chars in combinations, and one persistent incompatibility that forces the exploration of the entire space of solutions .

    Er, I’m not articulating it well; just read his answer which explains it a lot better. :-)

  • 4 Grant Husbands // Aug 8, 2011 at 3:43 am

    @rp1, you’re failing to think about backtracking; imagine the input is “aaaaab”.

  • 5 Seems easy // Aug 8, 2011 at 3:44 am

    rp1’s solution seems perfectly fine to me. The author did not explain why I couldn’t use it.

    If you think differently explain why.

  • 6 Rick Williams // Aug 8, 2011 at 3:46 am

    You make “interview candidates” sign NDAs in order to interview. That’s amazing. Mind sharing the text of the NDA, I’d like to see what it covers. Thanks in advance.

  • 7 binarysolo // Aug 8, 2011 at 3:46 am

    Incidentally I’m kinda surprised this is considered a worthy-enough interview prob. I’m not a programmer by trade, though I enjoyed the CS106x+107 sequence at a li’l school near Daniel’s location and would think that a freshman with some amount of CS thinking/experience would trivially solve this.

  • 8 zui // Aug 8, 2011 at 3:54 am

    the main issue with such interviews is that 50% of the candidates have troubles answering those under stress no matter how simple.

    i know its an issue hard to solve.

    interviewers don’t care about finding the proper candidate. they care that a candidate passed tests so can’t be blamed if the candidate does not perform well enough at work.

    interviewers care of a level of assurance, if you prefer. hey it’s not as bad as it sounds, the candidate indeed is going to be likely to fit the position.

    will he really fit, will he like it and is he actually good at solving *new* problems or just solving classic interview quizzes ? Who cares thats a risk the interviewer is willing to take.

    A true interviewer, that is, someone who’s main care is that the candidate is going to be helping out the company is not going to give all these stupid questions the same way. A true interviewer, while he has a lot of questions and steps prepared will not serve them to the candidate. he will learn to know the candidate in a short amount of time and ask the right questions, ones which may not even have been prepared before.

    a true interviewer sees the challenge in every candidate he interviews, and not just the opposite (where the candidate is challenged by the interviewer’s questions)

    a true interviewer disregard the crap (hello best questions list) and focus on what matters.

    finally, new talents in the it world are HARD to come by so there’s a lot of competition to recruit them. well, let me tell you this straight, there’s many talents which are just discarded by the review/interview process that are here for you to pick from.

    thats how we find gems that everyone elses passed upon. then people wonder why my team always outperforms their.

  • 9 tzs // Aug 8, 2011 at 4:32 am

    Interesting problem, which I don’t think I’ve seen before. Here’s the O(n^2) solution I’d have come up with if confronted with this thing.

    Build a directed graph consisting of vertices labeled 0, 1, 2, …, n, where n is the length of the string, where there is an edge from k to j if and only if there is a dictionary word of length j-k starting at position k in the string.

    This can be done in O(n^2).

    Solutions to the problem then correspond to paths through the graph from 0 to n. Use something like Dijkstra’s algorithm to find a minimal path in O(n^2), which corresponds to a solution that uses the smallest number of dictionary words to exactly cover the string.

  • 10 Benjash // Aug 8, 2011 at 4:53 am

    Im not a fully blown coder by any means.

    But, I was thinking a solution, would be to break the string into consonant clusters (syllables maybe?) and group them into weighted groups. Or some algorithm measuring lexical units within the string.

    some database of word clusters like:

    String = applepieisgreat

    3wordcluster = “app”, “eat”, “ple”, “ond” etc …
    2wordcluster = “ap”, “le”, “pi”, “is” etc

    Then build up the words
    app + le / ap + ple / ple + pie / ple + pi / pie + is / etc

    Then working back from the biggest clusters run look ups against the dictionary see if its a word.
    Then it would be a simple jigsaw puzzle of what words fit in the string.

  • 11 Bob // Aug 8, 2011 at 4:54 am

    @Seems easy

    Let’s assume that this is the dictionary:

    this
    text
    is
    short
    shorter

    Now let’s apply rp1’s solution to the problem. It will walk along the string until it finds a match and apply that match to the output. It produces this:

    this text is short er

    “short” is obviously a match for the first 5 characters of “shorter”. Since the solution analyses those 5 characters first and finds a match, it happily accepts the word “short”. It now has the string “er” to analyse. “er” isn’t in the dictionary, so we’ll assume it just gets tagged on the end instead of discarded.

    The correct behaviour here would be to start backtracking. If we have “er” as an unmatched string, it is possible that it connects with something we’ve matched previously. So, let’s try “ter”. Nope, no match. Now “rter”. Nope, no match. However, when we eventually try “shorter”, we’ve got a match. With backtracking, we’ve found a solution that works for everything in the string:

    “this text is shorter”

    Clearly this isn’t O(n) as we have to re-examine components of the string multiple times. In fact, rp1’s algorithm is O(n^2), not O(n) as he suggestst.

  • 12 rcf // Aug 8, 2011 at 5:02 am

    @ Seems easy

    Suppose the words are a, aa, aaa, ab
    Suppose the input is aaab.
    The proper segmentation is aa+ab, but if you do it rp1s way you’d try a+? and fail to find a segmentation. If you also started with the longer string aaa you’d still fail.
    In the worst case scenario there’s a tiny chance you’d get the right answer greedily

  • 13 Tom White // Aug 8, 2011 at 5:17 am

    Coder interviews are “rocket science”. How to detect the best programmers is a hard problem.

    But, this coding test is for a programmer who is going to be hired to write a string-processing library (or similar). That might not be your goal.

    There is also a pretty high degree of quirks in this problem. Most programmers come from a background of processing whole lines, whole sentences, whole words, etc. This seems to be asking the programmer to make one dictionary lookup *per letter*. Perhaps too much of a brain shift for a high-pressure interview situation.

    Then, you think you speed it up by creating a O(n) cache to hold words you have seen when the O(log n) dictionary has all the words in memory (and the virtual memory pages that get loaded wouldn’t get paged out to disk during the one millisecond this runs). Your cache can only really help if it is the case that the program, the cache, and the input string are quite small and fit in the fastest level of processor cache.

    This article and exercise confirms my experience that most interviewers are not good enough programmers themselves and unbiased enough to recognize the great programmers when they pass by. But, that’s OK, the HR department has usually already rejected them due to some “problem with their resume”.

    You are correct that a coding problem is absolutely necessary. It needs to be maximally understandable so as to overcome the candidate’s anxiety. It needs to start easy and work up via interactive discussion to show the programmer’s actual level of competence. The automated websites for this look interesting but can only do a first level of checking for basic programming ability.

    We all need to step outside ourselves and understand what makes someone a good programmer: smart, quick learner, and good to work with.

  • 14 rcf // Aug 8, 2011 at 5:53 am

    @Tom White

    The cache has nothing to do with speeding up lookups or the size of the dictionary. If you have a string of length L than the standard backtracking solution will in the worst-case do 2^L dictionary lookups. By keeping track of which substrings are possible to segment we can reduce this to ~L lookups.

  • 15 Ian Wright // Aug 8, 2011 at 5:57 am

    As to someone who hasn’t coded since high school, I’m curious as to what legal actions you take against sites like Glassdoor that publish information bared in NDAs. I assume you would need to know who actually provided the information to the site to bring any sort of penlites against them. At the end of the day you’re just closing the stable door after the horse has already bolted.

  • 16 rocketsurgeon // Aug 8, 2011 at 7:34 am

    There’s a bug in the general solution, it doesn’t work as written. Maybe your interview questions should include a section on testing? :)

  • 17 Retric // Aug 8, 2011 at 7:43 am

    Reading this I can’t help but your falling into the java trap of trying to make a ridiculously generic solution which I would consider a danger sign but plenty of people love to see.

    Anyway, assuming a real world input and real world dictionary you can try plenty of things that break down for a dictionary that includes four hundred A’s but are actually valid solutions. Also, if you want a fast real world solution then sticking with a pure lookup dictionary would slow things down. EX: Being able to toss 5 letters into a lookup table that says the longest and shortest words that start with those letters would save a lot of time. Basically, ‘xxyzy’ = null, null saves you from looking up x to the maximum word length in your dictionary. Secondly sanitizing inputs is going to be both safer and faster. Granted anything you are coding in 45 minutes is going to be a long way from production ready code.

    PS: Even with backtracking you can code an O(n) solution for vary large input sets. Just keep track of the K best output sequences that are aka N, N-1,N-2,…N-K. (K being the lengh of the longest word in your dictionary.)

  • 18 jeremy // Aug 8, 2011 at 8:03 am

    Many candidates for software engineering positions cannot come up with the above or an equivalent (e.g., a solution that uses an explicit stack) in half an hour. I’m sure that many of them are competent and productive. But I would not hire them to work on information retrieval or machine learning problems, especially at a company that delivers search functionality on a massive scale.

    I have a comment about the relationship of this question to “massive scale” information retrieval and machine learning problems. Naturally, no one wants a O(2^n) solution. But even when the solution is O(n^2), if the size of your input n is 10 billion (aka massive scale, web scale, big data), even an elegant, memoized algorithm isn’t tractable anyway, correct?

    So while I indeed like this question, because it has a “progressive reveal” in levels of thinking, does someone working on web scale (massive) data really ever need to implement the highest level of thinking? Or is it that you just want a programmer who is aware of the issue, even if she or he never has to use it?

  • 19 The “word break” problem | en_GB@blog // Aug 8, 2011 at 8:09 am

    […] Retiring a Great Interview Problem (via Retiring a Great Interview Problem), this is the best I could get in less than 30 […]

  • 20 betacoder // Aug 8, 2011 at 8:15 am

    Trivial but noted: The simplified solution also has a off-by-one error.

  • 21 Daniel Tunkelang // Aug 8, 2011 at 8:21 am

    To those who noted the off-by-one error: thanks, it’s fixed now.

  • 22 Daniel Tunkelang // Aug 8, 2011 at 8:23 am

    @binarysolo

    I also used to think this problem was too easy. Experience proved otherwise. In fact, I found at Google that performance on this problem correlated strongly to overall performance in the interview process, albeit with a limited sample size.

  • 23 rcf // Aug 8, 2011 at 8:25 am

    @Retric congratulations, you’ve reinvented dynamic programming with your O(n) solution :-) That’d be a perfect solution.
    Also, if your map is stored in sorted order (like in C++) a binary search will find you the longest word that fits the remainder of the string. (To also efficiently find all of the words that fit will require something extra.)

  • 24 Daniel Tunkelang // Aug 8, 2011 at 8:27 am

    @Rick Williams

    It’s pretty common for companies to make candidates sign NDAs. I’ve learned about confidential information from several companies I interviewed with, typically as part of the sell of exciting things I could work on.

    But forget the legalities, which are unenforceable in practice. Instead think about how you’d like yourself and other engineers to be assessed. Part of the point of interview problems is to make hiring about more than a person’s resume. It’s not clear who gains when interview problems are retired because they’ve been disclosed. Still, I recognize practically that it is impossible to maintain secrets once enough people know them. Hence my comment at the end of the post.

  • 25 jdavid.net // Aug 8, 2011 at 8:29 am

    Questions like this are grand if you are looking for a ‘Search Engineer’ but if you are looking for a front end developer or someone that understands how to design a REST api, or something else usefull this is a bogus question.

    All to often I have seen interviewers use a one question fits all approach to interviewing. I have been out of college for a decade and have significant work experience in producing web pages and frameworks that respond to a browser or app in a specific way.

    Questions like these filter me out because I have not been doing college papers or playing around in a programming languages class, but rather have been building real tools.

    Questions like how would you design your own web framework might be a better question. Why is jQuery popular how does it differ from other js frameworks.

    Can you write a delegated click handler? How does it differ from an assigned click handler?

    Please ask a question relative to an applicant’s role/ experience.

    There are plenty of questions out there that are easy for college grads to answer, but hard to experienced programmers and vice verse.

  • 26 The Word Break Problem « Plus 1 Lab // Aug 8, 2011 at 8:42 am

    […] an interesting post on the Noisy Channel blog describing what author Daniel Tunkelang calls the word break problem.  I didn’t find this post interesting because the problem is a good interview question, I […]

  • 27 Daniel Tunkelang // Aug 8, 2011 at 9:00 am

    @zui

    No interview process is perfect. It’s tempting to only hire people you’ve worked with — indeed, I’d be inclined to so that as a founder. But that doesn’t scale, and it also biases the process towards people who are already well connected. Another approach is to focus almost entirely on the resume, but that has its own problems, from resume inflation to again favoring those who started with advantages.

    I’m by no means perfect but I think I give candidates as fair a shot as I know how. I hired a candidate with no college degree and a high-school equivalency — she performed phenomenally as a software engineer and is now a managing director at Goldman. I try to put nervous candidates at ease. But there’s no getting around that an interview is an assessment process, and not everyone does well when they are being assessed.

    I’m curious to hear the details of how you or your company interview candidates. Everyone benefits if we can make this process better.

  • 28 Geek Reading August 8, 2011 | Regular Geek // Aug 8, 2011 at 9:02 am

    […] Retiring a Great Interview Problem […]

  • 29 Daniel Tunkelang // Aug 8, 2011 at 9:03 am

    @Retric

    Is it really so generic? I’ve simplified the problem a little, but it’s pretty close to a real problem I had to solve for implementing search features that would be deployed in a very broad range of domains, including for part-number search. I had the opportunity to see domain-specific heuristics break down.

  • 30 Daniel Tunkelang // Aug 8, 2011 at 9:08 am

    @jeremy

    To clarify, the n here is the size of the input string, rather than the number of users or documents. But algorithms like these run in a high-traffic environment, with requests being processed concurrently. A feature whose cost blows up exponentially with a bad input can take a site down. Granted, there are other ways to guard against such failures (e.g., time-outs), it’s good to use algorithms that don’t have such blow up. And even better to understand the behavior of algorithms before rather than after those algorithms make it to production. :-)

  • 31 Grant Husbands // Aug 8, 2011 at 9:09 am

    @rcf, @Retric: You don’t need to invent new structures for the dictionary; lots of variants of trie/NFA/DFA will do just fine. However, the question explicitly disallows changing the dictionary and its API. I’m sure Daniel is aware that O(N) solutions exist if the dictionary is improved, but interview questions are for exploring the abilities of a candidate rather than finding the best possible solution.

  • 32 jeremy // Aug 8, 2011 at 9:33 am

    Right, I understand that this example problem is a small input string. Perhaps what I was trying to ask is how indicative of a real world massive scale machine learning or information retrieval problem this example problem really is.

    Agreed about being able to recognize that an algorithm might have quadratic or exponential blowup. But again, I’m asking about how realistically often one has to implement solutions that need dynamic programming when doing information retrieval machine learning on a web scale. I am assuming that you’re not just talking about parsing the input. I assume when you say massive scale information retrieval and machine learning, you’re working on algorithms to extract patterns from the data, to find relationships, to (set based) extraction of related entities. For example.

    And in that case, the size of the input isn’t 100 or 1000. But millions. Billions. So again, how often is memoization necessary, in practice?

    (And recognizing that something is going to be NP complete, or has a quadratic-time DP solution, or a quadratic time approximation, is different than being able to code that solution, during a half hour interview. So might it perhaps be better to test one’s ability to recognize say 7 or 8 different problems as to their potential complexity, rather than having a candidate write code for a single example?)

  • 33 Daniel Tunkelang // Aug 8, 2011 at 9:50 am

    @jeremy

    Different problems test different skills. I’ve never used this question as the only determinant in assessing a candidate, but I’ve found that it provides more bits of information than most.

  • 34 Retric // Aug 8, 2011 at 10:01 am

    @Grant Husbands I was not suggesting you needed to recreate a dictionary. However, the ideal program for a human language dictionary where the longest word is 32ish digits vs. some input text is very different than something you would use if you had 200 different ~100,000 digit DNA sequences in your dictionary. Conceder with a long enough input string iterating over the full dictionary and creating a quick index could easily reduce runtimes from years to seconds. And with a short enough input string any optimizations are basically pointless. EX: ‘cat’.

  • 35 Earl // Aug 8, 2011 at 10:01 am

    Daniel,

    I think you have a boundary problem in your example — the substring function in java takes (to my mind) weird arguments. You have to call substring until the endIndex / right argument is equal to the string length.
    eg:

    String test = “0123456”;
    puts(“%s length %d”, test, test.length());

    for (int i=1; i %s”, 1, i, test.substring(0, i));

    produces

    prefix [1, 1] -> 0
    prefix [1, 2] -> 01
    prefix [1, 3] -> 012
    prefix [1, 4] -> 0123
    prefix [1, 5] -> 01234
    prefix [1, 6] -> 012345
    prefix [1, 7] -> 0123456

  • 36 Earl // Aug 8, 2011 at 10:05 am

    Blech, html

    http://codepad.org/Ald1l9vb

  • 37 Retric // Aug 8, 2011 at 10:11 am

    PS: By index I mean find size of the longest word, and or do other preprocessing.

  • 38 Daniel Tunkelang // Aug 8, 2011 at 10:15 am

    @Earl

    Are you sure? Change the for loop to

    for (int i=1; i < = test.length(); i++)
    puts("prefix [%d, %d] -> %s”, 0, i, test.substring(0, i));
    }

    produces

    prefix [0, 1] -> 0
    prefix [0, 2] -> 01
    prefix [0, 3] -> 012
    prefix [0, 4] -> 0123
    prefix [0, 5] -> 01234
    prefix [0, 6] -> 012345
    prefix [0, 7] -> 0123456

  • 39 Grant Husbands // Aug 8, 2011 at 10:19 am

    @Retric: Your suggested 5-letter lookup was essentially a change to the dictionary API; it is to that that I was referring. For any lack of clarity on my part, I apologise. Anyway, there are plenty of ways of preprocessing the dictionary, and I mentioned common ones, but none of them fit the problem description, which explicitly disallows such preprocessing, making this debate irrelevant.

  • 40 John H // Aug 8, 2011 at 10:33 am

    Why are you retiring it? Because it’s out on the web?

    I don’t think having knowledge of the interview question ahead of time necessarily precludes its usefulness. Being able to deliver the solution quickly and efficiently is still a valuable assessment of skill. Also, assuming the candidate has to answer 4-6 diverse problems over the course of the day, it’s still a pretty good screen to have them ‘reproduce memorized answers’ (assuming they knew them ahead of time). This is further mitigated by having a couple of problems you can switch between, now the candidate would need to have 25-50 questions memorized.

    Honestly, having the solution to 50 interview questions to CS problems memorized is pretty good. On top of that, very few candidates do the research needed to find the problems ahead of time.

    PS, Thanks for the outline, I’m going to use this question for my candidates in the future.

  • 41 Charles Scalfani // Aug 8, 2011 at 2:40 pm

    I have always hated interview questions, which is why I don’t use them. I’d rather solve a problem WITH the candidate or if I’m being interviewed then with the interviewer. I want real world problems that have NOT been pre-solved. I want to have a design discussion with them and talk through the design and implementation issues. Coding is trivial after that.

    I would rather have the candidate bring in an example of code that is non-proprietary and something they’re particularly proud of. Then I review the code with them like I would if they worked for me.

    I’d much rather have an interview as a pseudo-working session. Then I can see how it would be to actually work with that person. There’s no better way to see how someone thinks than making them work WITH you; not jump through your artificial hoops.

  • 42 binarysolo // Aug 8, 2011 at 3:24 pm

    @Daniel

    Well, I still dunno if this is a great interview problem per se, but it sure is a great conversation starter given the very *accessible* nature that even us laymen who don’t have much programming can access the question and think of reasonable implementations. :-)

    I’m not familiar enough with the programming world, but I’d imagine that the value of clever thinking and efficient structural thinking >> some technical, nuanced aspect of some computer language which is what this problem fields out.

  • 43 jeremy // Aug 8, 2011 at 5:16 pm

    Different problems test different skills. I’ve never used this question as the only determinant in assessing a candidate, but I’ve found that it provides more bits of information than most.

    Again, yes.. but you specifically said that this specific question didn’t just test a candidate’s ability to think generally, computer-scientifically. But a candidate’s ability to think specifically in terms of massive scale, machine learning and information retrieval.

    And it’s that specific connection — between DP and massive scale in the context of ML and IR — that I’m struggling to understand, rather than the broader question of whether a candidate is a good computer scientist.

    It’s just that.. oh, nevermind. I’ll take it offline.

  • 44 Rick Williams // Aug 8, 2011 at 7:06 pm

    Thank you for the response about NDAs. I agree completely that it is unprofessional to publish secret interview questions after an interview.

    But that’s different from the NDA issue. I’ve been on dozens of interviews and have never been asked to sign an NDA for interviewing, nor have I required one of anyone I am interviewing. NDAs are something clients sign before being informed of proprietary business trade secret information. This is done only when absolutely necessary since it is much better simply not to reveal trade secrets to outside parties in the first place. It’s quite bizarre to hear of them being required for interviews and I don’t really believe that this is a common practice. If it is common in some segment of the field, then it is an ill advised practice.

  • 45 Golam Kawsar // Aug 8, 2011 at 7:35 pm

    Great interview question, but in my experience as a programmer, I have not seen many programmers who can cook up dynamic programming solutions to problems like these, let alone during the stress of an interview. Even recognizing this as a dynamic programming question will be hard for many.

    Thanks for such a nice post Daniel. Enjoyed reading it a lot!

  • 46 Debnath // Aug 8, 2011 at 8:13 pm

    The general solution will terminate if it finds the word in the dictionary, should it still not continue? I mean, there could be a word like “endgame”, for the lack of a better example (or backtracking), which might be a part of the dictionary, but are also individual words…I can understand they may not be popular in IR though since they are usually an abbreviation of the sub-words, and the sub words don’t really make sense independently, but given the problem definition…

  • 47 A great interview Problem | Phanikumar // Aug 8, 2011 at 10:20 pm

    […] problem with the code and runtime of the algorithm mentioned in the post. http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/ This entry was posted in Code by phani. Bookmark the […]

  • 48 Daniel Tunkelang // Aug 8, 2011 at 11:37 pm

    @John H

    I hope this question serves you well. It’s time for me to move on from it, and I thought the best way to retire this problem was to do so in a way that others would learn from it.

  • 49 Daniel Tunkelang // Aug 8, 2011 at 11:42 pm

    @Charles Scalfani

    We do have collaborative problem solving and product design discussions as part of the interview process. But I also want to see how a candidate writes code. Reviewing code they’ve written before is an option, but it’s tricky — especially for a candidate that has not written non-proprietary code in a long time.

  • 50 Daniel Tunkelang // Aug 8, 2011 at 11:44 pm

    @Debnath

    The termination condition is part of the problem statement. The problem could be changed to require outputting all valid segmentations, but there could be an exponential number of them. We could also require the “best” segmentation, which is an interesting design question as to what constitutes “best”.

  • 51 On “Retiring a Great Interview Problem” « Will.Whim // Aug 9, 2011 at 1:49 am

    […] Tunkaleng wrote an interesting blog post, “Retiring a Great Interview Problem” on an interview problem that he has, in the past, posed to interviewees, but which he has […]

  • 52 Will Fitzgerald // Aug 9, 2011 at 1:51 am

    I wrote a response to this, “On ‘Retiring a Great Interview Problem'” at http://t.co/uXQf4oh.

  • 53 Daniel Tunkelang // Aug 9, 2011 at 6:41 am

    Will, I read your response. A short one here: assessment of candidates on a problem like this isn’t binary. Rather, the point is to get as holistic a picture as is possible in the time constraints of how well a candidate solves an algorithmic problem and implements it. It’s not a perfect tool, but no tool is. I’m always on the lookout for better ones — don’t forget that I’m retiring this problem.

    And you allude to a candidate’s nervousness under interview conditions. Part of the interviewer’s job is to put the candidate at ease. That isn’t unique to questions that involve coding, and it doesn’t always work out. No interviewer or hiring process is perfect.

    Finally, while I share your concerns about using whiteboard coding in interviews (see my link to Diego’s post), I disagree that it discriminates against older candidates. That very hypothesis strikes me as ageist, at least in the absence of supporting data.

  • 54 Patrick Tilland // Aug 9, 2011 at 11:06 am

    As rocketsurgeon mentioned, there is one bug and also one typo in the code. There is a parenthesis missing in this line:

    if (dict.contains(input) return input;

    And the loop condition never reaches the next to last character in the input string:

    for (int i = 1; i < len – 1; i++)

  • 55 Daniel Tunkelang // Aug 9, 2011 at 11:14 am

    @rocketsurgeon
    @Patrick Tilland

    Bugs fixed. Thanks guys!

  • 56 Sonic Charmer // Aug 10, 2011 at 3:48 am

    I won’t write this out in Java/whatever syntax as I am too lazy, also not a programmer so not interesting in memorizing syntax of this or that language, but as stated I am lazy so I would want a ‘good’ segmentation, not just any. (I don’t want a lot of “a’s” if there’s an “aaaa” available). So I would check the longest-length word in the dictionary, say that length is k (of course cap this at input-string length and discard all dictionary words greater than this – I’ll assume the dictionary has easy capability both of this max-length-check and to ‘discard’/ignore all >k), then starting from i=1 search all (i,i+k-1) substrings of the input unless/till I find a match, if none reduce k (discard more dictionary!) & try again, if so parse out & into ‘match’, prefix and suffix (as applicable) and recurse onto both of the latter. Details too boring/obvious to spec out.

    Personally I dislike cutesy ‘interview questions’ and am instinctively distrustful of the filter they implicitly apply to candidates. When I’m interviewing someone I do what is known as ‘talk to’ the person, one might even say I ‘have a conversation’ with them. That may/may not be better but my way at least I don’t think someone could slide through the interview filter by memorizing some stuff off websites.

  • 57 Stavros Macrakis // Aug 10, 2011 at 7:59 am

    Many skills go into a good software engineer, and different interviewers will probe different skills. This problem emphasizes algorithmic thinking and coding — and I ask a similar interview question myself. But algorithmic coding is becoming surprisingly uncommon in many environments because the non-trivial algorithms are wrapped in libraries. Of course, there are places where understanding all this is crucial — someone has to write those libraries, and some people really do have Big Problems that the libraries don’t address — but it doesn’t seem to be a central skill that all programmers need to master.

    Is this good or bad? Consider A.N. Whitehead: “Civilization advances by extending the number of important operations which we can perform without thinking about them.”

  • 58 Stavros Macrakis // Aug 10, 2011 at 8:13 am

    One thing I’ve learned about interview questions is that you have to lead up to them in steps if you want to determine where a candidate’s understanding trails off — which may be very soon.

    For example, many candidates claim on their resumes that they know SQL. I used to ask such candidates how they would determine if person A was an ancestor of person B given a table of parent-child relations. This requires the (advanced) SQL feature of recursive queries (and I’d actually be happy if they could explain why it couldn’t be done in SQL, as it can’t in basic SQL). Now, I ask the question in stages:

    * In SQL, how would you represent parent-child relations?
    * How would you find X’s parents?
    * How would you find X’s grandparents?
    * How would you find all of X’s ancestors?
    * What if you wanted to do all this high-volume genealogy Web site — would you change your table design or queries? or use some technology other than SQL?

    I was shocked to discover that many candidates who listed SQL on their resumes couldn’t do *any* of this, and many required considerable coaching to do it. One candidate didn’t even remember that SQL queries start with SELECT — I would have forgiven this if he’d had conceptual understanding but had just forgotten the keyword, but he had zero conceptual understanding as well.

    All this to say that you can’t really trust the self-reporting on a resume and you’ve got to probe to understand what the candidate actually knows.

  • 59 Daniel Tunkelang // Aug 10, 2011 at 8:42 am

    @Sonic Charmer

    Your sketch is actually a reasonable start towards what I’d expect of a candidate. In fact, I think I could persuade you that your assumptions would have to be general enough to at least solve the interview problem as a special case where the min word length is one and the max is large enough to be the length of the input string. Please bear in mind that this “cutesy” problem is a simplification of one I had to solve to deliver software that has been deployed to hundreds of millions of people!

    @Stavros

    You’re right that not all software engineers needs to have strong command of algorithms. But I do require that strength of the folks I hire, given the problems that my team solves. Same applied at Google and Endeca. And yes, self-reporting on a resume is always subject to the maxim of “trust but verify”.

  • 60 Word Breaks « Programming Praxis // Aug 12, 2011 at 2:03 am

    […] Tunkelang posted this interview question to his blog: Given an input string and a dictionary of words, segment the input string into a space-separated […]

  • 61 Daniel Tunkelang // Aug 12, 2011 at 6:24 am

    @Programming Praxis

    A solution in Scheme. Nice!

  • 62 Ben Mabey // Aug 14, 2011 at 4:16 pm

    Thanks for sharing this problem! I did a Clojure and Ruby solution and discussed the differences in lazy (as in lazy lists) and non-lazy solutions: http://benmabey.com/2011/08/14/word-break-in-clojure-and-ruby-and-laziness-in-ruby.html

  • 63 Daniel Tunkelang // Aug 14, 2011 at 8:32 pm

    Ben, thank you! I’m honored to have inspired such an insightful and elegant post.

  • 64 Conducting a Remote Technical Interview | Hiring Tech Talent // Aug 15, 2011 at 9:19 am

    […] hacked, as the wise candidate can research typical questions ahead of time. Daniel Tunkelang has a great post on this, where he found that one of his best questions was posted on […]

  • 65 Raymond Moore // Aug 17, 2011 at 3:21 pm

    Excellent article \ topic. We are currently recruiting for a half dozen C++ \ Web UI positions and for many companies it is extremely difficult to determine whom is talking the talk and whom can write VERY clean code and think logically when faced with difficult programming requests. As an example a Sales Director will hand a candidate for a sales position a phone and say “make this call and pitch them” just so you can see what they have, this is partly what interviewing has become like in the IT world.

  • 66 Daniel Tunkelang // Aug 17, 2011 at 6:07 pm

    Raymond, thanks. The problem with all interviewing — and with interviewing software engineers in particular — is that it’s hard to extract a reliable signal under interview conditions.

    Ultimately the best solution may be to change the interview conditions. But the approach has to be both effective and efficient. It’s a great research problem — and I’ll let readers here know what my colleagues and I come up with.

    Of course, I’d love to hear what others are doing.

  • 67 State of Technology #21 « Dr Data's Blog // Aug 18, 2011 at 10:55 pm

    […] – How to retire a great Interview problem – “word break” problem described as […]

  • 68 Anatoly Karp // Aug 21, 2011 at 12:20 am

    As a side note, there is a discussion of a slightly more general problem in Peter Norvig’s excellent chapter from O’Reilly book “Beautiful Data” – http://norvig.com/ngrams/ch14.pdf (see section “Word Segmentation”). His remark that the memoized solution can be thought of as the Viterbi algorithm from the HMM theory is nicely illuminating (and of course obvious upon a moment’s thought).

  • 69 Daniel Tunkelang // Aug 21, 2011 at 9:31 am

    Indeed. Peter emailed me his “naive” solution — unfortunately, I don’t think he’s on the market. :-)

  • 70 Attention CMU Students! // Sep 7, 2011 at 7:04 pm

    […] and Wednesday. And of course LinkedIn will be conducting on-campus interviews: those will take place all day on Thursday, September […]

  • 71 William // Nov 4, 2011 at 10:57 pm

    Hi Daniel:
    how could you change your code so that it can find all valid segmentation for the whole string?
    E.X. string “aaa”, dict{‘a’, ‘aa’} to be segmentation{“a a a”, ‘a aa’, “aa a”}

  • 72 Daniel Tunkelang // Nov 5, 2011 at 1:44 pm

    William, interesting question. For starters, the number of valid segmentations may be exponential in the length — in fact, that will be the case for a string of n a’s if every sequence of a’s is a dictionary word. Could still use memoization / dynamic programming to avoid repeating work, but storing sets of sequences rather than a single one.

  • 73 Roberto Lupi // Dec 17, 2011 at 2:41 pm

    It can be done in O(n), n=length of the string, with some pretty relaxed assumption given the nature of the problem: we just need a preprocessing step on the dictionary.

    The idea is to build a set of rolling hashing function using the Rabin-Karp algorithm, for each word length in the dictionary, and the corresponding hash value for each word.

    To segment the string, we loop over it once, updating the rolling hash values (for each length) and if we find a match in our set of hash values from the dictionary, we have a potential match. We still have to check the actual dictionary to confirm the match, avoiding false positives.

    This design has the added advantage that the dictionary can be larger than what can fit into memory. We only need to store the hash values for each word in memory.

  • 74 Daniel Tunkelang // Dec 17, 2011 at 3:01 pm

    Roberto, one person I presented the problems to did suggested an approach along these lines: since dictionary membership is a regular language, just build a finite state machine. I was impressed by the ingenuity, but I then enforced the constraint that the dictionary only supported a constant-time membership test.

    By the way, I’ve since moved on to less exciting coding problems that require less ingenuity and are more a test of basics (though not quite as elementary as fizzbuzz). I’m still surprised at how many candidates with strong resumes fail at these.

  • 75 Roberto Lupi // Dec 18, 2011 at 12:27 pm

    Maybe stronger candidates tend to overcomplicate problems, instead of solving them in the simplest way they search for a clever one and get lost.

    “[T]he stupider one is, the closer one is to reality. The stupid one is, the clearer one. Stupidity is brief and artless, while intelligence wriggles and hides itself. Intelligence is a knave, but stupidity is honest and straight forward.” — Dostoevsky (The Brothers Karamazov)

  • 76 Daniel Tunkelang // Dec 18, 2011 at 12:33 pm

    Some strong candidates assume that an easy solution must be too naive and therefore wrong. For that reason, it’s important to set expectations at the beginning. If a problem is basic, I tell the candidate as much — which also helps avoid a candidate feeling insulted or worrying that the bar is too low. And for all problems, I urge candidates to come up with a working solution before optimizing it.

  • 77 Alex // Jan 10, 2012 at 11:23 am

    At first, it sounds depressing. I’m writing compilers and OSes, among other things, but don’t think I’d pass this interview.

    However, the more I read the more I realize I’m a different kind of programmer than who is sought here. I don’t cite CLR by heart, solve real problems in real manner (i.e., including asking others, not to mention using books, Internet etc.) and generally not good at “coding”. I guess my skills aren’t very marketable, in this approach. Might be a better choice not to program for somebody else.

  • 78 Daniel Tunkelang // Jan 10, 2012 at 9:12 pm

    Alex, I’ve actually switched to using problems that are a bit less CLR-ish. I still think it’s reasonable to expect someone to be able to real problems like this one, though I realize it’s unnatural to solve problems under interview conditions.

    An alternative approach would be to put less emphasis on the accuracy of the interviewing process and treat the first few months as a trial period. Unfortunately, that’s not the cultural norm, so instead we try to squeeze all the risk out of the hiring process.

    Anyway, if you’re able to write a compiler or OS, you should have no problem finding work you’re great at and enjoy.

  • 79 dbt // Jan 11, 2012 at 2:50 pm

    Alex, I have heard that lament before. I work at a place that uses similarly algorithmic questions (and other questions too — algorithms aren’t everything, but they’re important) and I sometimes hear my coworkers lament that they couldn’t get hired with the standards we have today. Which is, of course, nonsense. :)

    What’s important to realize about these questions is that in an hour, you have time to make an attempt at an answer, get feedback, and improve your solution. It is a conversation, and not just a blank whiteboard, an interviewer in the corner tapping a ruler on the table every time you make a mistake, and a disappointing early trip home.

  • 80 Daniel Tunkelang // Jan 11, 2012 at 3:27 pm

    Tapping a ruler on the table? More like rapping you on the knuckles! OK, the nuns didn’t really do that to us. :-)

    Seriously, dbt is right. Good interviews are a conversation. Otherwise, it would be better to make them non-interactive tests.

  • 81 Don’t write on the whiteboard – The Princeton Entrepreneurship Club // Jan 28, 2012 at 2:33 pm

    […] me to write tests for my code, find corner cases. He then asked me 3 other problems. They were Dan Tunkelang type problems. He ran out of problems and there were 15 minutes left. “Normally there’s not enough […]

  • 82 Hiring: you are doing it wrong | @abahgat's blog // Feb 22, 2012 at 3:27 am

    […] is a challenge. A lot has been written about the process itself and its quirks, ranging from programming puzzles to whiteboard interviews. However, there are still a few details that are often overlooked by […]

  • 83 Strata 2012: Big Data is Bigger than Ever // Mar 2, 2012 at 12:58 am

    […] minutes extended into three hours of conversation about everything from normalized KL divergence to interview problems — and segued into a reception with specialty big-data cocktails. By the time I got back to my […]

  • 84 man2code // Mar 18, 2012 at 1:18 am

    if (segSuffix != null) {
    memoized.put(input, prefix + ” ” + segSuffix);
    return prefix + ” ” + segSuffix;
    }

    this if should be added with else as mention below, as to show words fetched before non-dictionary word ::

    if (segSuffix != null) {
    memoized.put(input, prefix + ” ” + segSuffix);
    return prefix + ” ” + segSuffix;
    }else{
    return prefix ;
    }

  • 85 Jp Bida // Jun 28, 2012 at 11:21 am

    Would using human cycles with captcha’s get you closer to O(n)? Or does big O analysis only apply when we actually understand the details of the algorithm?

  • 86 netfootprint // Jul 1, 2012 at 1:41 am

    I was thinking about the same problem. Quite surprised to see the same thing appear here and asked for interviews :).
    shouldn’t “aaaaab” be
    “aaaaa” + b ? if a, aa, aaa, aaaa,aaaaa are in dict ?
    Step 1: check for “aaaaab”, not in dictionary
    Step 2: check for “aaaaa”, found in dictionary
    word segments are “aaaaa” + “b”
    I wonder why the need all combinations in a “real-world” problem!

  • 87 Algorithms: What is the most efficient algorithm to separateconnectedwords? assuming all the constituent words are in the vocabulary (and also assume for simplicity that there aren't any spelling mistakes) - Quora // Jul 3, 2012 at 4:23 am

    […] is a thorough discussion of this problem as an interview question by Daniel Tunkelang: [1][1] http://thenoisychannel.com/2011/…Comment Loading… • Post • 4:23am  Add […]

  • 88 Rob // Aug 27, 2012 at 8:30 pm

    I love the problem…but I can’t help to notice that the interviewer who has used this question to filter many fine candidates over the years can’t even write out a working solution without bugs given unlimited time, years of experience asking others this question and do so in the context of writing an in-depth analysis teaching the unwashed masses about how great an interview problem this is. Can it be such a great question if you can’t even get it right?

    The biggest problem with questions like this is that this type of programming is a highly perishable skill. I have played the guitar for 10 years but lately have spent more time singing acoustically – trying to fit my fingers to a Bach piece that used to come to me written on the back of my eyelids is now impossible.

    Now if I were playing Bach every day it would be a different story. Same with this problem. You are only going to hire the guy who happened to solve several problems like this last week because they ran into some issue where it was relavent.

  • 89 Daniel Tunkelang // Aug 27, 2012 at 8:41 pm

    Criticism noted. But bear in mind that the question isn’t a binary filter. It’s a test of algorithmic thinking and even of working out the problem requirements.

    I’m curious what you mean by “this type of programming” being highly perishable. If you mean solving a basic algorithmic problem that comes up in the course of real work, then I strongly object. As I said in the post, the problem isn’t from a textbook — it’s a simplified version of a real issue that has come up for me and others in the course of writing production software.

    But I grant that people don’t always have opportunities to use dynamic programming and perhaps even recursion. With that in mind, I’ve switched to interview problems that are less reliant on these. But I still expect the people I hire to be able to apply these fundamental computer science techniques with confidence.

    Of course there’s a risk with this and any interview problem of overfitting to someone’s recent experience. That’s why it’s good to use a diverse set of interview questions. If someone aces the interviews because she solved all of those problems last week, I’ll take my chances and hire her!

    Finally, if you have suggestions about how to interview more effectively, I’m all ears.

  • 90 Tapori // Sep 2, 2012 at 5:23 pm

    One problem with this is that good programmers tend to avoid recursion (no data to support it though).

    So while you are expecting a 10 min recursive version, the good programmer is trying to bring out a non recursive version and might fail. (Of course, non recursive version for this can be done in an interview).

    btw, this is a texbook exercise problem.

    I believe Sedgewick’s book (or perhaps CLR) has it.

  • 91 Daniel Tunkelang // Sep 2, 2012 at 5:30 pm

    I concede that a lot of good programmers may instinctively avoid recursion, although in this case I’d say that makes them worse programmers. The whole point of learning a set of software engineering tools is to apply the right one to the right problem, and this problem is mostly naturally defined and solved recursively.

    As for it being a textbook exercise, good to know. As I said in the original post, I first encountered this problem in the course of writing production software.

  • 92 Tapori // Sep 2, 2012 at 7:21 pm

    The problem (pun intended) is that the problem is still incompletely defined. For instance, we have no idea about the expected target hardware. Does it have limited memory? Limited stack space? Does the language we are to use support recursion? etc Ok, maybe the last one (or any of them) is not relevant these days, but you get my point.

    Basically we have no idea what we need to try and optimize for. Granted, a good candidate might and probably should try and clarify that, but in the face of ambiguity, good programmers follow “(instinctive) good practices” which they gained through experience etc.

    For instance, you will use quadratic memory and linear stack in the recursive version. A good programmer might instinctively try avoid the cost of the stack.

    But, if the goal is to optimize the time to write the code, then a recursive version will be faster (and I suppose is an implicit goal in the interviews, but almost never the case in critical production code).

    So calling them worse programmers for not using recursion is not right, IMO.

    btw, you seem to have ignored the cost of looking up the memoized structure. Since you are looking up the string your recursive version is cubic, inspite of the substring and dictionary lookup being O(1). Of course, that can be avoided if we represent the string for lookup by the end point indexes (i,j), rather than the string itself.

    Sorry for the long post.

  • 93 Daniel Tunkelang // Sep 2, 2012 at 8:15 pm

    No need to apologize — I appreciate the discussion!

    And several of the points you’ve raised have come up when I’ve used this problem in interviews. I’ve seen candidates implement a stack-based approach without recursion. I’ve also had discussions about nuances of scale and performance, including whether the limited memory requires the dictionary to be stored out of code (a great motivation for using a Bloom filter) and whether the cost of creating a read-only copy of a substring is constant (i.e., represented by the end-point indexes) or linear in the length of the substring. (because the substring is actually created as a string).

    So you’re right that not all good programmers will jump to recursion — though I do think that is the simplest path for most. And in an interview I urge candidates to start with the simplest solution that works. That’s not only a good idea during an interview, but a good idea in practice, to avoid premature optimization.

    Regardless of the choice of interview problem, the interviewer has to be competent and flexible. I’m sure a interviewer can butcher an interview with even the best problem. But some interview problems are better than others. And I strongly favor interview problems that are based on real problem, don’t require specialized knowledge, and provide candidates options to succeed without depending entirely on the candidate arriving at a single insight.

  • 94 Tapori // Sep 2, 2012 at 10:16 pm

    Completely agree with the comment about competent and flexible interviewers.

  • 95 Job Interviews: What is your favourite interview questionn for a software engineer? - Quora // Sep 3, 2012 at 12:07 pm

    […] is your favourite interview questionn for a software engineer?This blog post has a nice question – http://thenoisychannel.com/2011/… . What are your favourite interviewing questions?   Add […]

  • 96 mm // Sep 22, 2012 at 7:41 pm

    Due to this question I had interview with whitepages.com and this is why I couldn’t get the job… Great answer, hopefully I won’t see this question again in my interviews. LOL

  • 97 Quora // Sep 29, 2012 at 10:37 am

    Why do topmost tech companies give more priority to algorithms during the recruitment process?…

    Not all top tech companies. :-) At LinkedIn, we put a heavy emphasis on the ability to think through the problems we work on. For example, if someone claims expertise in machine learning, we ask them to apply it to one of our recommendation problems. A…

  • 98 techguy // Oct 16, 2012 at 11:51 pm

    I have a question about the memoized solution. I understand the advantage of saving the results of a dead end computation, where the code reads:

    memoized.put(input, null);

    But I don’t understand the advantage to memoizing here:

    memoized.put(input, prefix + ” ” + segSuffix);

    Since segSuffix has already been found to be not null, that means we have reached the end of the input string somewhere deeper in the call stack, and are now just unwinding back to the top? Maybe I’ve missed something, it’s late at night for me, but I can’t see it any other way right now. Thanks.

  • 99 Daniel Tunkelang // Oct 17, 2012 at 10:06 pm

    I think you’re right — we don’t need to memoize the non-null values. I’ve amended the code accordingly.

  • 100 CIKM 2012: Notes from a Conference in Paradise // Nov 12, 2012 at 7:00 am

    […] sessions of the conference. There was a talk on query segmentation, a topic responsible for my most popular blog post. Also a great talk on identifying good abandonment, a problem I’ve been interesting ever […]

  • 101 Thought this was cool: CIKM 2012: Notes from a Conference in Paradise « CWYAlpha // Nov 13, 2012 at 7:11 am

    […] sessions of the conference. There was a talk on query segmentation, a topic responsible for my most popular blog post. Also a great talk on identifying good abandonment, a problem I’ve been interesting ever since […]

  • 102 Dirk Gorissen // Jan 10, 2013 at 4:41 am

    As an aside, as far as I can see, as given the given code will fail to give a complete solution in cases like this:

    dict = [“the”,”big”,”cat”]
    string = “thebiggercat” or “thebigfoocat”

    Also, if you call it with the string “cats” it will return null instead of cat.

  • 103 Daniel Tunkelang // Jan 10, 2013 at 7:25 am

    Dirk, that is how the problem is set up.

    From the post:

    Q: What about stemming, spelling correction, etc.?

    A: Just segment the exact input string into a sequence of exact words in the dictionary.

    Of course you can generalize the problem to make it more interesting, especially if a candiate solves the original problem with time to spare.

  • 104 Dirk Gorissen // Jan 10, 2013 at 7:36 am

    Indeed, missed that, sorry. Thanks for a great post btw.

  • 105 Daniel Tunkelang // Jan 10, 2013 at 7:38 am

    My pleasure, glad you enjoyed it!

Clicky Web Analytics