My Cloud Notebook: Anagram Cracker

Tuesday, March 15, 2011

Anagram Cracker

/******************************************************************************
 * This function is a code snippet. It can be freely used and distributed.
 * Author: irfan[dot]hamid[at]gmail[at]com
 *
 * This function takes two ASCII strings in C syntax (null-terminated
 * character arrays) and determines whether they are anagrams or not.
 *
 * Implementation notes:
 * The function contains a histogram that stores the frequency of each 
 * character it encounters in the first string. For each character of the 
 * second string, it decrements the corresponding entry in the histogram. If 
 * before every decrement, the value of the histogram bucket reaches zero, 
 * then the two strings are not anagrams, as there is some character that has
 * more occurrences in the second string than in the first string.
 *
 * If either of the two strings contains a non-alphabetic character, the
 * anagram property is considered to be violated.
 *
 * Remarks:
 * It is an efficient implementation because:
 *   o It is in-place, the original strings are not copied, no alloc or copy
 *   o Does not clobber the original strings
 *   o It is a linear time implementation
 *
 * Returns:
 * True if the strings are anagrams, false otherwise.
******************************************************************************/

bool is_anagram (char w1[], char w2[])
{
 unsigned int i, sz;

 /* The histogram */
 int freqtbl[26];

 /* Sanity check */
 if ((sz = strlen(w1)) != strlen(w2))
  return false;

 /* Initialize the histogram */
 bzero(freqtbl, 26*sizeof(int));

 /* Read the first string, incrementing the corresponding histogram entry */
 for (i = 0; i < sz; i++) {
  if (w1[i] >= 'A' && w1[i] <= 'Z')
   freqtbl[w1[i]-'A']++;
  else if (w1[i] >= 'a' && w1[i] <= 'z')
   freqtbl[w1[i]-'a']++;
  else
   return false;
 }

 /* Read the second string, decrementing the corresponding histogram entry */
 for (i = 0; i < sz; i++) {
  if (w2[i] >= 'A' && w2[i] <= 'Z') {
   if (freqtbl[w2[i]-'A'] == 0)
    return false;
   freqtbl[w2[i]-'A']--;
  } else if (w2[i] >= 'a' && w2[i] <= 'z') {
   if (freqtbl[w2[i]-'a'] == 0)
    return false;
   freqtbl[w2[i]-'a']--;
  } else {
   return false;
  }
 }

 return true;
}

http://shashank7s.blogspot.com/2011/02/anagram-cracker.html

2 comments:

AnonymousMarch 27, 2011 at 12:12 AM
http://www.mytechinterviews.com/anagrams-in-array-of-strings
ReplyDelete
Replies
rajeevprasannaMarch 27, 2011 at 12:27 AM
Different version:

Given an English word in the form of a string, how can you quickly find all valid
anagrams for that string (all valid rearrangements of the letters that form valid English words)? You are allowed to pre‐compute whatever you want to and store whatever you optionally pre‐compute on disk.

Answer:We want to use a hash table! If your interviewer really hates hash tables(which they sometimes do for some reason), you can use a tree instead.But let's assume you can use a hash table. Then for the pre‐computing step, go through each word in the dictionary, sort the letters of the word in alphabetical order (so "hacking" would become "acghikn") and add the sorted letters as a key in the table and the original word as one of the values in a list of values for that key.For example, the entry for "opst" would be the list ["opts", "post", "stop","pots","tops","spot"]. Then,whenever you get a string, you simply sort the letters of the string and look up the value in the hash table.  The running time is O(nlogn) for sorting the string (which is relatively small) and approximately O(1) for the lookup in the hash table.
ReplyDelete
Replies

Add comment

Popular Posts

Tuesday, March 15, 2011

Anagram Cracker

2 comments: