Table of Contents
Given a word pat and a text txt. Return the count of the occurrences of anagrams of the word in the text.
Example 1:
Input: txt = "forxxorfxdofr", pat = "for" Output: 3 Explanation: for, orf and ofr appears in the txt, hence answer is 3.
Example 2:
Input: txt = "aabaabaa", pat = "aaba" Output: 4 Explanation: aaba is present 4 times in txt.
Constraints:
1 <= |pat| <= |txt| <= 105
Both strings contain lowercase English letters.
Introduction: Understanding Anagrams
An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, using all the original letters exactly once. Anagrams are often used in word puzzles and games. For example, the word “listen” is an anagram of “silent”.
In this blog post, we’ll explore how to count occurrences of anagrams within a string. This is a common problem in computer science, especially when dealing with text analysis or pattern matching. Whether you’re preparing for a coding interview or working on a project, this problem has real-world applications. Let’s dive into how to efficiently solve it.
How to Count Occurrences of Anagrams in a String
The problem is simple: given a string and a pattern, we want to find how many times the pattern appears as an anagram within the string.
Example: Given the string “cbaebabacd” and the pattern “abc”, the function should return 2 since there are two anagrams of “abc”: “cba” and “bac”.
To count occurrences of anagrams, we need to use a sliding window approach, which helps in optimizing our solution. The idea is to slide a window of the same length as the pattern across the string and check if the window contains the same characters as the pattern.
Brute Force Method for Counting Anagram Occurrences
The brute-force approach involves generating all possible substrings of the same length as the pattern and checking if they are anagrams of the given pattern. This method is straightforward but inefficient. Let’s go through the steps:
- Generate all substrings of the string that are the same length as the given pattern.
- Check if each substring is an anagram of the pattern by comparing the characters (or sorting them).
- Count the matches where the substring is an anagram of the pattern.
Time Complexity:
- For a string of length N and a pattern of length M, generating all substrings will take O(N – M + 1) iterations.
- Each comparison (checking if two substrings are anagrams) takes O(M log M) (since we are sorting the characters).
- Therefore, the overall time complexity of the brute-force method is O((N – M + 1) * M log M), which can be inefficient for large strings.
Here is the C++ implementation of the brute-force approach:
#include <iostream>
#include <algorithm>
using namespace std;
int countAnagramOccurrencesBruteForce(string str, string pattern) {
int count = 0;
int n = str.length();
int m = pattern.length();
// Sort the pattern for comparison
sort(pattern.begin(), pattern.end());
// Generate all substrings of length m and check if they are anagrams
for (int i = 0; i <= n - m; i++) {
string substring = str.substr(i, m);
sort(substring.begin(), substring.end());
if (substring == pattern) {
count++;
}
}
return count;
}
int main() {
string str = "cbaebabacd";
string pattern = "abc";
cout << "Occurrences of anagrams (Brute Force): " << countAnagramOccurrencesBruteForce(str, pattern) << endl;
return 0;
}
Brute Force Dry Run for Anagram Count
Iteration | i (Starting Index) | Substring (from i to i+2) | Sorted Substring | Sorted Word (“for”) | Is Anagram? | Count |
---|---|---|---|---|---|---|
1 | 0 | "for" | "for" | "for" | Yes | 1 |
2 | 1 | "orx" | "orx" | "for" | No | 1 |
3 | 2 | "rxx" | "xrx" | "for" | No | 1 |
4 | 3 | "xxo" | "oxx" | "for" | No | 1 |
5 | 4 | "xor" | "orx" | "for" | No | 1 |
6 | 5 | "orf" | "for" | "for" | Yes | 2 |
7 | 6 | "rfx" | "frx" | "for" | No | 2 |
8 | 7 | "fxd" | "dfx" | "for" | No | 2 |
9 | 8 | "xdo" | "dox" | "for" | No | 2 |
10 | 9 | "dof" | "dfo" | "for" | No | 2 |
11 | 10 | "ofr" | "for" | "for" | Yes | 3 |
Optimal Method for Counting Anagram Occurrences
The optimal method avoids sorting and instead uses the sliding window technique to reduce the complexity. Here’s the approach:
- Use a sliding window of the same size as the pattern to traverse through the string.
- Use a frequency map (hash map) to track the characters in both the pattern and the current window.
- As the window slides, update the frequency map by adding the new character and removing the old one.
- Compare the frequency map of the window with that of the pattern.
- If they match, it means the current window contains an anagram of the pattern.
Time Complexity:
- Initializing the frequency map for the pattern takes O(M) time, where M is the length of the pattern.
- Sliding the window takes O(N) time, where N is the length of the string.
- Checking the frequency map at each step takes constant time, O(1).
- Therefore, the overall time complexity is O(N + M), which is significantly more efficient than the brute-force approach.
Here is the C++ implementation of the optimal approach using the sliding window technique:
#include <iostream>
#include <unordered_map>
using namespace std;
int countAnagramOccurrencesOptimal(string str, string pattern) {
unordered_map<char, int> patternMap, windowMap;
int count = 0;
// Initialize the pattern map with frequency of characters
for (char c : pattern) {
patternMap[c]++;
}
int windowStart = 0;
// Slide the window across the string
for (int windowEnd = 0; windowEnd < str.length(); windowEnd++) {
char endChar = str[windowEnd];
windowMap[endChar]++;
// If window size exceeds pattern size, shrink the window
if (windowEnd - windowStart + 1 > pattern.length()) {
char startChar = str[windowStart];
windowMap[startChar]--;
windowStart++;
}
// If window matches pattern map, it's an anagram
if (windowMap == patternMap) {
count++;
}
}
return count;
}
int main() {
string str = "cbaebabacd";
string pattern = "abc";
cout << "Occurrences of anagrams (Optimal): " << countAnagramOccurrencesOptimal(str, pattern) << endl;
return 0;
}
Comparison: Brute Force vs. Optimal Approach
Approach | Time Complexity | Space Complexity | Explanation |
---|---|---|---|
Brute Force | O((N – M + 1) * M log M) | O(M) (for sorting) | Generates all substrings and sorts each substring. |
Optimal (Sliding Window) | O(N + M) | O(M) (for frequency map) | Efficiently slides the window and compares frequency maps. |
Why the Optimal Method is Better:
- Efficiency: The brute-force method involves sorting each substring, which is slower as the string grows. The sliding window method avoids sorting and operates in linear time.
- Scalability: For larger inputs, the optimal method will handle much bigger strings and patterns efficiently, whereas the brute-force approach will become too slow for larger datasets.
Conclusion
In this blog post, we have examined two approaches to count occurrences of anagrams in a string: the brute-force method and the optimal sliding window approach. While the brute-force method is easier to implement, it is inefficient for large inputs. The optimal method, using the sliding window technique, is significantly faster and more scalable.
For applications where performance is critical, such as text processing or data analysis, the optimal approach should be preferred.
Let me know if you have any questions or need further clarifications on the topic!