Categories
General

Retiring a Great Interview Problem

Interviewing software engineers is hard. Jeff Atwood bemoans how difficult it is to find candidates who can write code. The tech press sporadically publishes “best” interview questions that make me cringe — though I love the IKEA question. Startups like Codility and Interview Street see this challenge as an opportunity, offering hiring managers the prospect of outsourcing their coding interviews. Meanwhile, Diego Basch and others are urging us to stop subjecting candidates to whiteboard coding exercises.

I don’t have a silver bullet to offer. I agree that IQ tests and gotcha questions are a terrible way to assess software engineering candidates. At best, they test only one desirable attribute; at worst, they are a crapshoot as to whether a candidate has seen a similar problem or stumbles into the key insight. Coding questions are a much better tool for assessing people whose day job will be coding, but conventional interviews — whether by phone or in person — are a suboptimal way to test coding strength. Also, it’s not clear whether a coding question should assess problem-solving, pure translation of a solution into working code, or both.

In the face of all of these challenges, I came up with an interview problem that has served me and others well for a few years at Endeca, Google, and LinkedIn. It is with a heavy heart that I retire it, for reasons I’ll discuss at the end of the post. But first let me describe the problem and explain why it has been so effective.

The Problem

I call it the “word break” problem and describe it as follows:

Given an input string and a dictionary of words,
segment the input string into a space-separated
sequence of dictionary words if possible. For
example, if the input string is "applepie" and
dictionary contains a standard set of English words,
then we would return the string "apple pie" as output.

Note that I’ve deliberately left some aspects of this problem vague or underspecified, giving the candidate an opportunity to flesh them out. Here are examples of questions a candidate might ask, and how I would answer them:

Q: What if the input string is already a word in the
   dictionary?
A: A single word is a special case of a space-separated
   sequence of words.

Q: Should I only consider segmentations into two words?
A: No, but start with that case if it's easier.

Q: What if the input string cannot be segmented into a
   sequence of words in the dictionary?
A: Then return null or something equivalent.

Q: What about stemming, spelling correction, etc.?
A: Just segment the exact input string into a sequence
   of exact words in the dictionary.

Q: What if there are multiple valid segmentations?
A: Just return any valid segmentation if there is one.

Q: I'm thinking of implementing the dictionary as a
   trie, suffix treeFibonacci heap, ...
A: You don't need to implement the dictionary. Just
   assume access to a reasonable implementation.

Q: What operations does the dictionary support?
A: Exact string lookup. That's all you need.

Q: How big is the dictionary?
A: Assume it's much bigger than the input string,
   but that it fits in memory.

Seeing how a candidate negotiates these details is instructive: it offers you a sense of the candidate’s communication skills and attention to detail, not to mention the candidate’s basic understanding of data structures and algorithms.

A FizzBuzz Solution

Enough with the problem specification and on to the solution. Some candidates start with the simplified version of the problem that only considers segmentations into two words. I consider this a FizzBuzz problem, and I expect any competent software engineer to produce the equivalent of the following in their programming language of choice. I’ll use Java in my example solutions.

String SegmentString(String input, Set<String> dict) {
  int len = input.length();
  for (int i = 1; i < len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      if (dict.contains(suffix)) {
        return prefix + " " + suffix;
      }
    }
  }
  return null;
}

I have interviewed candidates who could not produce the above — including candidates who had passed a technical phone screen at Google. As Jeff Atwood says, FizzBuzz problems are a great way to keep interviewers from wasting their time interviewing programmers who can’t program.

A General Solution

Of course, the more interesting problem is the general case, where the input string may be segmented into any number of dictionary words. There are a number of ways to approach this problem, but the most straightforward is recursive backtracking. Here is a typical solution that builds on the previous one:

String SegmentString(String input, Set<String> dict) {
  if (dict.contains(input)) return input;
  int len = input.length();
  for (int i = 1; i < len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      String segSuffix = SegmentString(suffix, dict);
      if (segSuffix != null) {
        return prefix + " " + segSuffix;
      }
    }
  }
  return null;
}

Many candidates for software engineering positions cannot come up with the above or an equivalent (e.g., a solution that uses an explicit stack) in half an hour. I’m sure that many of them are competent and productive. But I would not hire them to work on information retrieval or machine learning problems, especially at a company that delivers search functionality on a massive scale.

Analyzing the Running Time

But wait, there’s more! When a candidate does arrive at a solution like the above, I ask for an big O analysis of its worst-case running time as a function of n, the length of the input string. I’ve heard candidates respond with everything from O(n) to O(n!).

I typically offer the following hint:

Consider a pathological dictionary containing the words
"a", "aa", "aaa", ..., i.e., words composed solely of
the letter 'a'. What happens when the input string is a
sequence of n-1 'a's followed by a 'b'?

Hopefully the candidate can figure out that the recursive backtracking solution will explore every possible segmentation of this input string, which reduces the analysis to determine the number of possible segmentations. I leave it as an exercise to the reader (with this hint) to determine that this number is O(2n).

An Efficient Solution

If a candidate gets this far, I ask if it is possible to do better than O(2n). Most candidates realize this is a loaded question, and strong ones recognize the opportunity to apply dynamic programming or memoization. Here is a solution using memoization:

Map<String, String> memoized;

String SegmentString(String input, Set<String> dict) {
  if (dict.contains(input)) return input;
  if (memoized.containsKey(input) {
    return memoized.get(input);
  }
  int len = input.length();
  for (int i = 1; i < len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      String segSuffix = SegmentString(suffix, dict);
      if (segSuffix != null) {
        return prefix + " " + segSuffix;
      }
  }
  memoized.put(input, null);
  return null;
}

Again the candidate should be able to perform the worst-case analysis. The key insight is that SegmentString is only called on suffixes of the original input string, and that there are only O(n) suffixes. I leave as an exercise to the reader to determine that the worst-case running time of the memoized solution above is O(n2), assuming that the substring operation only requires constant time (a discussion which itself makes for an interesting tangent).

Why I Love This Problem

There are lots of reasons I love this problem. I’ll enumerate a few:

  • It is a real problem that came up in the course of developing production software. I developed Endeca’s original implementation for rewriting search queries, and this problem came up in the context of spelling correction and thesaurus expansion.
  • It does not require any specialized knowledge — just strings, sets, maps, recursion, and a simple application of dynamic programming / memoization. Basics that are covered in a first- or second-year undergraduate course in computer science.
  • The code is non-trivial but compact enough to use under the tight conditions of a 45-minute interview, whether in person or over the phone using a tool like Collabedit.
  • The problem is challenging, but it isn’t a gotcha problem. Rather, it requires a methodical analysis of the problem and the application of basic computer science tools.
  • The candidate’s performance on the problem isn’t binary. The worst candidates don’t even manage to implement the fizzbuzz solution in 45 minutes. The best implement a memoized solution in 10 minutes, allowing you to make the problem even more interesting, e.g., asking how they would handle a dictionary too large to fit in main memory. Most candidates perform somewhere in the middle.

Happy Retirement

Unfortunately, all good things come to an end. I recently discovered that a candidate posted this problem on Glassdoor. The solution posted there hardly goes into the level of detail I’ve provided in this post, but I decided that a problem this good deserved to retire in style.

It’s hard to come up with good interview problems, and it’s also hard to keep secrets. The secret may be to keep fewer secrets. An ideal interview question is one for which advance knowledge has limited value. I’m working with my colleagues on such an approach. Naturally, I’ll share more if and when we deploy it.

In the meantime, I hope that everyone who experienced the word break problem appreciated it as a worthy test of their skills. No problem is perfect, nor can performance on a single interview question ever be a perfect predictor of how well a candidate will perform as an engineer. Still, this one was pretty good, and I know that a bunch of us will miss it.

ps. Check out this post by Behdad Esfahbod that does the problem justice! He also notes that looking up a string in a dictionary isn’t O(1) but has a cost proportional to the string length, which motivates exploring a trie rather than a dictionary to store the strings.

By Daniel Tunkelang

High-Class Consultant.

105 replies on “Retiring a Great Interview Problem”

Why is this not just n lookups? Start with one character. If that fails, look up the first two, etc. When a lookup succeeds, insert a space and set the string to start at the next unexamined character.

Thus “aaaaa” becomes “a a a a a”. Works great with a trie.

Like

rp1 – You need to consider cases where a word is composed of a valid word base with a suffix that is not a standalone word.

Say, valid word + an invalid word, such as: “shorter” –> = short + er = (valid) + (invalid).

Recursion just makes the most sense as the author wrote; reducing the problem by accounting for the string from the back. As he points out, bad case is when you have tons of viable character combinations that remain workable with prior chars in combinations, and one persistent incompatibility that forces the exploration of the entire space of solutions .

Er, I’m not articulating it well; just read his answer which explains it a lot better. 🙂

Like

rp1’s solution seems perfectly fine to me. The author did not explain why I couldn’t use it.

If you think differently explain why.

Like

You make “interview candidates” sign NDAs in order to interview. That’s amazing. Mind sharing the text of the NDA, I’d like to see what it covers. Thanks in advance.

Like

Incidentally I’m kinda surprised this is considered a worthy-enough interview prob. I’m not a programmer by trade, though I enjoyed the CS106x+107 sequence at a li’l school near Daniel’s location and would think that a freshman with some amount of CS thinking/experience would trivially solve this.

Like

the main issue with such interviews is that 50% of the candidates have troubles answering those under stress no matter how simple.

i know its an issue hard to solve.

interviewers don’t care about finding the proper candidate. they care that a candidate passed tests so can’t be blamed if the candidate does not perform well enough at work.

interviewers care of a level of assurance, if you prefer. hey it’s not as bad as it sounds, the candidate indeed is going to be likely to fit the position.

will he really fit, will he like it and is he actually good at solving *new* problems or just solving classic interview quizzes ? Who cares thats a risk the interviewer is willing to take.

A true interviewer, that is, someone who’s main care is that the candidate is going to be helping out the company is not going to give all these stupid questions the same way. A true interviewer, while he has a lot of questions and steps prepared will not serve them to the candidate. he will learn to know the candidate in a short amount of time and ask the right questions, ones which may not even have been prepared before.

a true interviewer sees the challenge in every candidate he interviews, and not just the opposite (where the candidate is challenged by the interviewer’s questions)

a true interviewer disregard the crap (hello best questions list) and focus on what matters.

finally, new talents in the it world are HARD to come by so there’s a lot of competition to recruit them. well, let me tell you this straight, there’s many talents which are just discarded by the review/interview process that are here for you to pick from.

thats how we find gems that everyone elses passed upon. then people wonder why my team always outperforms their.

Like

Interesting problem, which I don’t think I’ve seen before. Here’s the O(n^2) solution I’d have come up with if confronted with this thing.

Build a directed graph consisting of vertices labeled 0, 1, 2, …, n, where n is the length of the string, where there is an edge from k to j if and only if there is a dictionary word of length j-k starting at position k in the string.

This can be done in O(n^2).

Solutions to the problem then correspond to paths through the graph from 0 to n. Use something like Dijkstra’s algorithm to find a minimal path in O(n^2), which corresponds to a solution that uses the smallest number of dictionary words to exactly cover the string.

Like

Im not a fully blown coder by any means.

But, I was thinking a solution, would be to break the string into consonant clusters (syllables maybe?) and group them into weighted groups. Or some algorithm measuring lexical units within the string.

some database of word clusters like:

String = applepieisgreat

3wordcluster = “app”, “eat”, “ple”, “ond” etc …
2wordcluster = “ap”, “le”, “pi”, “is” etc

Then build up the words
app + le / ap + ple / ple + pie / ple + pi / pie + is / etc

Then working back from the biggest clusters run look ups against the dictionary see if its a word.
Then it would be a simple jigsaw puzzle of what words fit in the string.

Like

@Seems easy

Let’s assume that this is the dictionary:

this
text
is
short
shorter

Now let’s apply rp1’s solution to the problem. It will walk along the string until it finds a match and apply that match to the output. It produces this:

this text is short er

“short” is obviously a match for the first 5 characters of “shorter”. Since the solution analyses those 5 characters first and finds a match, it happily accepts the word “short”. It now has the string “er” to analyse. “er” isn’t in the dictionary, so we’ll assume it just gets tagged on the end instead of discarded.

The correct behaviour here would be to start backtracking. If we have “er” as an unmatched string, it is possible that it connects with something we’ve matched previously. So, let’s try “ter”. Nope, no match. Now “rter”. Nope, no match. However, when we eventually try “shorter”, we’ve got a match. With backtracking, we’ve found a solution that works for everything in the string:

“this text is shorter”

Clearly this isn’t O(n) as we have to re-examine components of the string multiple times. In fact, rp1’s algorithm is O(n^2), not O(n) as he suggestst.

Like

@ Seems easy

Suppose the words are a, aa, aaa, ab
Suppose the input is aaab.
The proper segmentation is aa+ab, but if you do it rp1s way you’d try a+? and fail to find a segmentation. If you also started with the longer string aaa you’d still fail.
In the worst case scenario there’s a tiny chance you’d get the right answer greedily

Like

Coder interviews are “rocket science”. How to detect the best programmers is a hard problem.

But, this coding test is for a programmer who is going to be hired to write a string-processing library (or similar). That might not be your goal.

There is also a pretty high degree of quirks in this problem. Most programmers come from a background of processing whole lines, whole sentences, whole words, etc. This seems to be asking the programmer to make one dictionary lookup *per letter*. Perhaps too much of a brain shift for a high-pressure interview situation.

Then, you think you speed it up by creating a O(n) cache to hold words you have seen when the O(log n) dictionary has all the words in memory (and the virtual memory pages that get loaded wouldn’t get paged out to disk during the one millisecond this runs). Your cache can only really help if it is the case that the program, the cache, and the input string are quite small and fit in the fastest level of processor cache.

This article and exercise confirms my experience that most interviewers are not good enough programmers themselves and unbiased enough to recognize the great programmers when they pass by. But, that’s OK, the HR department has usually already rejected them due to some “problem with their resume”.

You are correct that a coding problem is absolutely necessary. It needs to be maximally understandable so as to overcome the candidate’s anxiety. It needs to start easy and work up via interactive discussion to show the programmer’s actual level of competence. The automated websites for this look interesting but can only do a first level of checking for basic programming ability.

We all need to step outside ourselves and understand what makes someone a good programmer: smart, quick learner, and good to work with.

Like

@Tom White

The cache has nothing to do with speeding up lookups or the size of the dictionary. If you have a string of length L than the standard backtracking solution will in the worst-case do 2^L dictionary lookups. By keeping track of which substrings are possible to segment we can reduce this to ~L lookups.

Like

As to someone who hasn’t coded since high school, I’m curious as to what legal actions you take against sites like Glassdoor that publish information bared in NDAs. I assume you would need to know who actually provided the information to the site to bring any sort of penlites against them. At the end of the day you’re just closing the stable door after the horse has already bolted.

Like

There’s a bug in the general solution, it doesn’t work as written. Maybe your interview questions should include a section on testing? 🙂

Like

Reading this I can’t help but your falling into the java trap of trying to make a ridiculously generic solution which I would consider a danger sign but plenty of people love to see.

Anyway, assuming a real world input and real world dictionary you can try plenty of things that break down for a dictionary that includes four hundred A’s but are actually valid solutions. Also, if you want a fast real world solution then sticking with a pure lookup dictionary would slow things down. EX: Being able to toss 5 letters into a lookup table that says the longest and shortest words that start with those letters would save a lot of time. Basically, ‘xxyzy’ = null, null saves you from looking up x to the maximum word length in your dictionary. Secondly sanitizing inputs is going to be both safer and faster. Granted anything you are coding in 45 minutes is going to be a long way from production ready code.

PS: Even with backtracking you can code an O(n) solution for vary large input sets. Just keep track of the K best output sequences that are aka N, N-1,N-2,…N-K. (K being the lengh of the longest word in your dictionary.)

Like

Many candidates for software engineering positions cannot come up with the above or an equivalent (e.g., a solution that uses an explicit stack) in half an hour. I’m sure that many of them are competent and productive. But I would not hire them to work on information retrieval or machine learning problems, especially at a company that delivers search functionality on a massive scale.

I have a comment about the relationship of this question to “massive scale” information retrieval and machine learning problems. Naturally, no one wants a O(2^n) solution. But even when the solution is O(n^2), if the size of your input n is 10 billion (aka massive scale, web scale, big data), even an elegant, memoized algorithm isn’t tractable anyway, correct?

So while I indeed like this question, because it has a “progressive reveal” in levels of thinking, does someone working on web scale (massive) data really ever need to implement the highest level of thinking? Or is it that you just want a programmer who is aware of the issue, even if she or he never has to use it?

Like

@binarysolo

I also used to think this problem was too easy. Experience proved otherwise. In fact, I found at Google that performance on this problem correlated strongly to overall performance in the interview process, albeit with a limited sample size.

Like

@Retric congratulations, you’ve reinvented dynamic programming with your O(n) solution 🙂 That’d be a perfect solution.
Also, if your map is stored in sorted order (like in C++) a binary search will find you the longest word that fits the remainder of the string. (To also efficiently find all of the words that fit will require something extra.)

Like

@Rick Williams

It’s pretty common for companies to make candidates sign NDAs. I’ve learned about confidential information from several companies I interviewed with, typically as part of the sell of exciting things I could work on.

But forget the legalities, which are unenforceable in practice. Instead think about how you’d like yourself and other engineers to be assessed. Part of the point of interview problems is to make hiring about more than a person’s resume. It’s not clear who gains when interview problems are retired because they’ve been disclosed. Still, I recognize practically that it is impossible to maintain secrets once enough people know them. Hence my comment at the end of the post.

Like

Questions like this are grand if you are looking for a ‘Search Engineer’ but if you are looking for a front end developer or someone that understands how to design a REST api, or something else usefull this is a bogus question.

All to often I have seen interviewers use a one question fits all approach to interviewing. I have been out of college for a decade and have significant work experience in producing web pages and frameworks that respond to a browser or app in a specific way.

Questions like these filter me out because I have not been doing college papers or playing around in a programming languages class, but rather have been building real tools.

Questions like how would you design your own web framework might be a better question. Why is jQuery popular how does it differ from other js frameworks.

Can you write a delegated click handler? How does it differ from an assigned click handler?

Please ask a question relative to an applicant’s role/ experience.

There are plenty of questions out there that are easy for college grads to answer, but hard to experienced programmers and vice verse.

Like

@zui

No interview process is perfect. It’s tempting to only hire people you’ve worked with — indeed, I’d be inclined to so that as a founder. But that doesn’t scale, and it also biases the process towards people who are already well connected. Another approach is to focus almost entirely on the resume, but that has its own problems, from resume inflation to again favoring those who started with advantages.

I’m by no means perfect but I think I give candidates as fair a shot as I know how. I hired a candidate with no college degree and a high-school equivalency — she performed phenomenally as a software engineer and is now a managing director at Goldman. I try to put nervous candidates at ease. But there’s no getting around that an interview is an assessment process, and not everyone does well when they are being assessed.

I’m curious to hear the details of how you or your company interview candidates. Everyone benefits if we can make this process better.

Like

@Retric

Is it really so generic? I’ve simplified the problem a little, but it’s pretty close to a real problem I had to solve for implementing search features that would be deployed in a very broad range of domains, including for part-number search. I had the opportunity to see domain-specific heuristics break down.

Like

@jeremy

To clarify, the n here is the size of the input string, rather than the number of users or documents. But algorithms like these run in a high-traffic environment, with requests being processed concurrently. A feature whose cost blows up exponentially with a bad input can take a site down. Granted, there are other ways to guard against such failures (e.g., time-outs), it’s good to use algorithms that don’t have such blow up. And even better to understand the behavior of algorithms before rather than after those algorithms make it to production. 🙂

Like

@rcf, @Retric: You don’t need to invent new structures for the dictionary; lots of variants of trie/NFA/DFA will do just fine. However, the question explicitly disallows changing the dictionary and its API. I’m sure Daniel is aware that O(N) solutions exist if the dictionary is improved, but interview questions are for exploring the abilities of a candidate rather than finding the best possible solution.

Like

Right, I understand that this example problem is a small input string. Perhaps what I was trying to ask is how indicative of a real world massive scale machine learning or information retrieval problem this example problem really is.

Agreed about being able to recognize that an algorithm might have quadratic or exponential blowup. But again, I’m asking about how realistically often one has to implement solutions that need dynamic programming when doing information retrieval machine learning on a web scale. I am assuming that you’re not just talking about parsing the input. I assume when you say massive scale information retrieval and machine learning, you’re working on algorithms to extract patterns from the data, to find relationships, to (set based) extraction of related entities. For example.

And in that case, the size of the input isn’t 100 or 1000. But millions. Billions. So again, how often is memoization necessary, in practice?

(And recognizing that something is going to be NP complete, or has a quadratic-time DP solution, or a quadratic time approximation, is different than being able to code that solution, during a half hour interview. So might it perhaps be better to test one’s ability to recognize say 7 or 8 different problems as to their potential complexity, rather than having a candidate write code for a single example?)

Like

@jeremy

Different problems test different skills. I’ve never used this question as the only determinant in assessing a candidate, but I’ve found that it provides more bits of information than most.

Like

@Grant Husbands I was not suggesting you needed to recreate a dictionary. However, the ideal program for a human language dictionary where the longest word is 32ish digits vs. some input text is very different than something you would use if you had 200 different ~100,000 digit DNA sequences in your dictionary. Conceder with a long enough input string iterating over the full dictionary and creating a quick index could easily reduce runtimes from years to seconds. And with a short enough input string any optimizations are basically pointless. EX: ‘cat’.

Like

Daniel,

I think you have a boundary problem in your example — the substring function in java takes (to my mind) weird arguments. You have to call substring until the endIndex / right argument is equal to the string length.
eg:

String test = “0123456”;
puts(“%s length %d”, test, test.length());

for (int i=1; i %s”, 1, i, test.substring(0, i));

produces

prefix [1, 1] -> 0
prefix [1, 2] -> 01
prefix [1, 3] -> 012
prefix [1, 4] -> 0123
prefix [1, 5] -> 01234
prefix [1, 6] -> 012345
prefix [1, 7] -> 0123456

Like

@Earl

Are you sure? Change the for loop to

for (int i=1; i %s”, 0, i, test.substring(0, i));
}

produces

prefix [0, 1] -> 0
prefix [0, 2] -> 01
prefix [0, 3] -> 012
prefix [0, 4] -> 0123
prefix [0, 5] -> 01234
prefix [0, 6] -> 012345
prefix [0, 7] -> 0123456

Like

@Retric: Your suggested 5-letter lookup was essentially a change to the dictionary API; it is to that that I was referring. For any lack of clarity on my part, I apologise. Anyway, there are plenty of ways of preprocessing the dictionary, and I mentioned common ones, but none of them fit the problem description, which explicitly disallows such preprocessing, making this debate irrelevant.

Like

Why are you retiring it? Because it’s out on the web?

I don’t think having knowledge of the interview question ahead of time necessarily precludes its usefulness. Being able to deliver the solution quickly and efficiently is still a valuable assessment of skill. Also, assuming the candidate has to answer 4-6 diverse problems over the course of the day, it’s still a pretty good screen to have them ‘reproduce memorized answers’ (assuming they knew them ahead of time). This is further mitigated by having a couple of problems you can switch between, now the candidate would need to have 25-50 questions memorized.

Honestly, having the solution to 50 interview questions to CS problems memorized is pretty good. On top of that, very few candidates do the research needed to find the problems ahead of time.

PS, Thanks for the outline, I’m going to use this question for my candidates in the future.

Like

I have always hated interview questions, which is why I don’t use them. I’d rather solve a problem WITH the candidate or if I’m being interviewed then with the interviewer. I want real world problems that have NOT been pre-solved. I want to have a design discussion with them and talk through the design and implementation issues. Coding is trivial after that.

I would rather have the candidate bring in an example of code that is non-proprietary and something they’re particularly proud of. Then I review the code with them like I would if they worked for me.

I’d much rather have an interview as a pseudo-working session. Then I can see how it would be to actually work with that person. There’s no better way to see how someone thinks than making them work WITH you; not jump through your artificial hoops.

Like

@Daniel

Well, I still dunno if this is a great interview problem per se, but it sure is a great conversation starter given the very *accessible* nature that even us laymen who don’t have much programming can access the question and think of reasonable implementations. 🙂

I’m not familiar enough with the programming world, but I’d imagine that the value of clever thinking and efficient structural thinking >> some technical, nuanced aspect of some computer language which is what this problem fields out.

Like

Different problems test different skills. I’ve never used this question as the only determinant in assessing a candidate, but I’ve found that it provides more bits of information than most.

Again, yes.. but you specifically said that this specific question didn’t just test a candidate’s ability to think generally, computer-scientifically. But a candidate’s ability to think specifically in terms of massive scale, machine learning and information retrieval.

And it’s that specific connection — between DP and massive scale in the context of ML and IR — that I’m struggling to understand, rather than the broader question of whether a candidate is a good computer scientist.

It’s just that.. oh, nevermind. I’ll take it offline.

Like

Thank you for the response about NDAs. I agree completely that it is unprofessional to publish secret interview questions after an interview.

But that’s different from the NDA issue. I’ve been on dozens of interviews and have never been asked to sign an NDA for interviewing, nor have I required one of anyone I am interviewing. NDAs are something clients sign before being informed of proprietary business trade secret information. This is done only when absolutely necessary since it is much better simply not to reveal trade secrets to outside parties in the first place. It’s quite bizarre to hear of them being required for interviews and I don’t really believe that this is a common practice. If it is common in some segment of the field, then it is an ill advised practice.

Like

Great interview question, but in my experience as a programmer, I have not seen many programmers who can cook up dynamic programming solutions to problems like these, let alone during the stress of an interview. Even recognizing this as a dynamic programming question will be hard for many.

Thanks for such a nice post Daniel. Enjoyed reading it a lot!

Like

The general solution will terminate if it finds the word in the dictionary, should it still not continue? I mean, there could be a word like “endgame”, for the lack of a better example (or backtracking), which might be a part of the dictionary, but are also individual words…I can understand they may not be popular in IR though since they are usually an abbreviation of the sub-words, and the sub words don’t really make sense independently, but given the problem definition…

Like

@John H

I hope this question serves you well. It’s time for me to move on from it, and I thought the best way to retire this problem was to do so in a way that others would learn from it.

Like

@Charles Scalfani

We do have collaborative problem solving and product design discussions as part of the interview process. But I also want to see how a candidate writes code. Reviewing code they’ve written before is an option, but it’s tricky — especially for a candidate that has not written non-proprietary code in a long time.

Like

@Debnath

The termination condition is part of the problem statement. The problem could be changed to require outputting all valid segmentations, but there could be an exponential number of them. We could also require the “best” segmentation, which is an interesting design question as to what constitutes “best”.

Like

Comments are closed.