Data Structures and Algorithms in Python for Beginners
Categories
The article will teach you the basics of data structures and algorithms in Python. Arrays, lists, dictionaries, tuples, sets, and queues are all there and more.
Have you ever wondered how apps like Google Maps find the shortest path so fast? It all starts with data structures and algorithms. They are fundamental for efficient computing, helping programmers solve problems with optimized solutions. These solutions take the form of many services that you use, probably not even thinking about it: Google Maps, Google Search, Instagram, Facebook, Amazon, Netflix, and so on.
In this article, I will introduce you to data structures and algorithms in Python.
What are Data Structures and Algorithms?
Data structure refers to a specific data format used for organizing and storing data. Data structures differ by data values, relationships between them, and the operations that can be performed.
Algorithms are step-by-step procedures or formulas for completing a certain task. As you can imagine, different algorithms use different data structures to perform those tasks.
Types of Data Structures in Python
I will talk about these fundamental Python data structure types in the following sections.
1. Arrays/Lists
To make it clear from the start, lists are dynamic arrays. That’s why they both belong to the same ‘family’ of data structures, but there are still some differences between them.
Arrays are homogenous data structures, meaning that they store only elements of the same data type. These data structures come from the array Python module. Bear in mind that arrays from that module are dynamic, i.e., their size can be changed. If you’re using arrays from NumPy, they are fixed, i.e., you can’t change their size after creating them.
Lists in Python are also dynamic data structures that behave like arrays but with added flexibility.
Here’s a table outlining the basic differences between the two data structures.
Despite their differences, both data types are ideal for storing ordered collections of items.
Visual Representation
Arrays and lists can be visualized like this.
Basic Operations
In the table below, you can find some basic array operations that will be useful when you start writing algorithms.
Regarding the lists, the same operations are presented in the table below.
Hands-On Example
Now, we can apply all that to solve a practical problem. Take, for example, this interview question by Twitter (X).
Link to the question: https://platform.stratascratch.com/algorithms/10431-maximum-number-in-array
The task is to find the largest number in an array.
In the solution, the defined find_largest_number(arr) function assigns the variable largest to the first element in the array; that’s because we need a starting point for comparison.
Then, the for loop is written. It iterates over each number in the array and checks if it’s larger than the current value of the largest variable. If it is, we’ve found the newest largest number. After the iteration is finished, the loop returns the latest largest variable, i.e., the largest number overall.
def find_largest_number(arr):
largest = arr[0]
for num in arr:
if num > largest:
largest = num
return largest
Here’s the output.
Expected Output
Official output:
12Official output:
-3Official output:
10000002. Dictionaries
Dictionaries are data structures that store key-value pairs, which is perfect for storing ordered collections of items.
Visual Representation
To understand it better, here’s a visual representation of a Python dictionary.
Basic Operations
The table below gives you examples of basic operations on Python dictionaries.
Hands-On Example
This question by Meta is similar to the previous one, only this time we need to find the largest value in a dictionary. In addition, we need to output the value’s key and index, assuming all values are unique.
Find the maximum value, its key, and index from a dictionary. Assume all values are unique.
Link to the question: https://platform.stratascratch.com/algorithms/10380-discovering-max-value-in-dictionary
In the function we define, we use max(dictionary.values()) to find the maximum value. Easy, right? The rest of the solution is also easy. In the next code line, we again use the max() function, but this time we specify get. That way, we fetch the index of the largest value in the dictionary.
As the next step, we convert the dictionary’s keys into a list and find the maximum value’s index. Finally, we show the required output.
def find_max_value(dictionary):
max_value = max(dictionary.values())
max_key = max(dictionary, key=dictionary.get)
max_index = list(dictionary.keys()).index(max_key)
return max_value, max_key, max_index
Here’s the output.
Expected Output
Official output:
[10, "b", 1]Official output:
[150, "peach", 3]Official output:
[9, "giraffe", 3]3. Tuples
A tuple is a collection of elements separated by a comma. It is similar to a list, but unlike the list, it is immutable.
In Python, you can differentiate them by the type of a bracket: tuples are defined using parentheses (round brackets), while lists use square brackets for that.
Tuples are most commonly used to store related data, e.g., information about a person.
Visual Representation
Visually, tuples can be represented in the same way as lists.
Basic Operations
Here’s an overview of basic tuple operations. Because tuples are immutable, you can’t directly update, add, or remove an element. A workaround includes converting to a list, make changes, and then convert them back to a tuple.
Hands-On Example
Since tuples can be understood as immutable lists, we’ll solve this question by Apple, which uses lists as inputs.
Link to the question: https://platform.stratascratch.com/algorithms/10413-efficient-median-finder
The question requires us to write a function that finds a list median. We’ll pretend that we’re doing the same thing but for a tuple.
The first step in the defined function is to sort the input. We do that using the sorted() function. However, when the tuple is passed in its parentheses, it doesn’t modify the tuple directly. Instead, it converts it to a list internally for sorting and returns the result as a new list. So, to convert it back to a tuple, we need to use the tuple() function.
Then, we use len() to find the number of elements in the tuple. In the if statement, we use the modulo operator (%) to check if the tuple has an even number of elements. If it has, the median is calculated as the mean of two middle elements.
If the number of elements is not even, then the median is the middle element.
def find_median(tpl):
sorted_tuple = tuple(sorted(tpl))
n = len(sorted_tuple)
if n % 2 == 0:
median = (sorted_tuple[n//2 - 1] + sorted_tuple[n//2]) / 2
else:
median = sorted_tuple[n//2]
return median
Here’s the output.
Expected Output
Official output:
0Official output:
3.5Official output:
-54. Sets
Sets are data structures that are comprised of unordered, unique elements. A set in itself is mutable, but it must consist of immutable data types, such as numbers, strings, booleans, tuples, or bytes.
The implementation of sets in Python is based on a hash table, which allows a quick lookup. That’s why they are commonly used when removing duplicates or testing membership.
Visual Representation
The sets in Python can be visualized like this.
Basic Operations
The examples of the basic operations on Python sets are given below. Sets are unordered data structures, so they don’t allow updating specific elements directly. Updating an element in a set requires a workaround that involves removing the old element and adding a new one.
Hands-On Example
Here’s one more interview question by Meta. We need to find common words in any given two sentences and return them sorted alphabetically.
Find common words in any given two sentences and return them sorted alphabetically.
Link to the question: https://platform.stratascratch.com/algorithms/10373-common-words-in-two-sentences
To solve this problem, we will import Python’s string module, which offers many tools for handling strings.
The defined find_common_words(input) function takes an input parameter, expected to be a list containing two sentences (input[0] and input[1]).
Next, we define the clean_and_split(sentence) function. It splits the sentence into individual words based on whitespace. It then removes punctuation from each word, and converts them to lowercase. Finally, it uses set() to create a set, which automatically eliminates duplicates.
The clean_and_split(sentence) function is applied to both sentences, with words1 and words2 being sets that contain unique, cleaned words from the two sentences.
We can now use the intersection() method to find words that are present in both sets.
Finally, we use list() to convert the set intersection to a list and sorted() to return a sorted list of common words.
import string
def find_common_words(input):
sentence1 = input[0]
sentence2 = input[1]
def clean_and_split(sentence):
return set(word.strip(string.punctuation).lower() for word in sentence.split())
words1 = clean_and_split(sentence1)
words2 = clean_and_split(sentence2)
common_words = words1.intersection(words2)
return sorted(list(common_words))
Here’s the output.
Expected Output
Official output:
["day", "is", "learn", "to", "today"]Official output:
["code", "engaging", "learning", "to"]Official output:
["favorite", "the", "toy"]5. Stacks and Queues
Stacks and queues are linear data structures that differentiate by how the data is added and removed.
Stacks follow the Last-In-First-Out (LIFO) method, which means the last added element will be the first to be removed to access data.
On the other hand, the First-In-First-Out (FIFO) method used in queues will first remove the element that was added first.
In Python, stacks and queues are typically implemented using lists.
Visual Representation
Visually, stacks can be represented like this.
As for queues, here’s a visualization to help you understand them more easily.
Basic Operations
The basic operations in stacks are given below.
You can see the queue operations in the table below.
Hands-On Example
A good example of using stacks to solve practical problems is this interview question by Citadel and Spotify.
Check if an expression is balanced. An expression is considered flat if it has adequately closed brackets (Brackets, parenthesis, greater-than sign, less-than sign).
Link to the question: https://platform.stratascratch.com/algorithms/10371-balanced-expression-checker
Here, we need to check whether an expression is balanced. This means it has adequately closed brackets, parentheses, a greater-than sign, and a less-than sign.
In the solution, we define the is_balanced(expression) function that will return True if the expression is balanced and False if it’s not. We first initialize an empty stack. Next, we define two lists, with one containing an opening and the other containing closing brackets and signs.
The next step is to iterate through an input string using the for loop. If char is an opening bracket/sign, it is pushed onto the stack using the append() function.
Then, we handle the closing bracket/sign: the loop checks if the stack is empty. If it is, there’s no corresponding bracket/sign, so it returns False. If the stack is not empty, the topmost bracket/sign is removed from the stack using the pop() function. The indices of the popped opening bracket/sign and the current closing bracket/sign are compared; if they don’t match, the brackets/signs are misaligned, and the function returns False.
After looping, the function checks if the stack is empty. If yes, all opening brackets/signs had matching closing brackets/signs, so the expression is balanced. If not, the expression is unbalanced.
def is_balanced(expression):
stack = []
opening_brackets = ['(', '[', '{', '<']
closing_brackets = [')', ']', '}', '>']
for char in expression:
if char in opening_brackets:
stack.append(char)
elif char in closing_brackets:
if not stack:
return False
opening_bracket = stack.pop()
if opening_brackets.index(opening_bracket) != closing_brackets.index(char):
return False
return len(stack) == 0
Here’s the code output.
Expected Output
Official output:
trueOfficial output:
trueOfficial output:
trueKey Algorithms in Python
There are so many algorithms and algorithm types in Python that several articles wouldn’t be enough to cover them all. So, we won’t even attempt that, but it doesn’t mean we can’t cover the fundamental algorithms you’ll use very often in Python. Here they are.
We’ll go through several important algorithms in each category and show you a practical example. (On a side note, you’ve actually written several algorithms in the previous examples. You haven’t even noticed it! After reading about the algorithms below, as a practice, go back to data structure examples and try to recognize to which category each solution belongs.)
Quite often, it’s difficult to categorize an algorithm into a single category. That’s why you’ll see some of the searching and sorting algorithms also mentioned as recursion algorithms.
1. Searching Algorithms
Searching algorithms are used to find the positions or existence of a specific element in a data structure.
I’ll give short descriptions and use cases for some of those algorithms. At the end of this section, you’ll find a table overviewing each algorithm's characteristics.
1. Linear Search Algorithm
- Description: Sequentially checks each element in the collection.
- Use case: Searching for an element in small or unsorted datasets, e.g., a specific product in an unordered shopping list.
2. Binary Search Algorithm
- Description: Divides the search space into halves.
- Use case: Locating elements in large, sorted datasets, e.g., a name in a sorted phone book or a word in a dictionary.
3. Ternary Search Algorithm
- Description: Similar to binary search but divides the search range into three parts instead of two.
- Use case: Optimizing unimodal functions or finding a specific value in a sorted, unimodal dataset, e.g., finding the optimal temperature for a chemical reaction in a lab experiment or searching for a peak in stock prices within a defined range.
4. Jump Search Algorithm
- Description: Skips blocks of elements and performs a linear search within the identified block.
- Use case: A faster alternative to linear search on large, sorted datasets, e.g., finding a page number in a sorted index of a book where random access, such as flipping multiple pages, is available.
5. Interpolation Search Algorithm
- Description: Estimates the position of the target based on its value relative to the array range.
- Use case: Searching in uniformly distributed, sorted datasets, e.g., finding a specific temperature in a sorted sensor dataset or locating a name in a sorted phone book where entries are uniformly distributed.
6. Exponential Search Algorithm
- Description: Quickly finds a range containing the target and performs a binary search within the range.
- Use case: Searching in sorted datasets when the size is unbounded or unknown, e.g., finding a specific timestamp in an unbounded log file or searching in online streaming data.
7. Deep-First Search Algorithm
- Description: Explores as far as possible along each graph branch before backtracking.
- Use case: Exploring all possible paths in graphs or trees, e.g., solving a maze, finding all dependencies in a software project, or exploring file system directories.
8. Breadth-First Search Algorithm
- Description: Explores all neighbors of a graph node before moving to the next level.
- Use case: Finding the shortest path in unweighted graphs or traversing levels in trees, e.g., the shortest path in social networks (e.g., degree of separation on LinkedIn), or crawling websites for indexing.
9. Brute Force Search Algorithm
- Description: Checks every element in the dataset sequentially to find all possible solutions to a problem.
- Use case: Exhaustively searching for a solution when no heuristics or efficient algorithms are available, e.g., solving word search puzzles, decrypting passwords via exhaustive key attempts, or matching DNA sequences.
In the table below, we’re using the Big O notation to show each algorithm’s time complexity. For detailed explanations of each notation’s meaning, refer to the What is Big O Notation? article, where you can also find a cheat sheet for quick reference.
Hands-On Example
To see how you can implement a searching algorithm, we’ll solve this question by Goldman Sachs – it implements the brute force algorithm.
Given a number as an input, find the following smallest palindrome number. For simplicity, consider the upper limit of your search to be 1 million.
Link to the question: https://platform.stratascratch.com/algorithms/10405-next-smallest-palindrome
To implement the algorithm, we define the next_palindrome(n) function. In the algorithm, while True means the loop should run indefinitely until it explicitly encounters a break command or a return statement within the loop that meets the condition. The function increments the current number by 1, ensuring that the search starts from the next integer following n.
The if statement checks if the number is a palindrome by checking if it stays the same after enabling backward indexing to reverse the number. If it does, the number is a palindrome.
Then, return n returns the number that is palindrome, ending the loop and the function execution.
def next_palindrome(n):
while True:
n += 1
if str(n) == str(n)[::-1]:
return n
Here’s the output.
Expected Output
Official output:
131Official output:
1000001Official output:
1244212. Sorting Algorithms
Sorting algorithms are used to arrange the elements of iterable (e.g., list, array, etc.) in a specific order, usually ascending or descending.
Here are some of the more popular sorting algorithms.
- Bubble Sort Algorithm
- Description: Compares adjacent elements and swaps them if needed.
- Use case: Used for educational purposes to teach sorting concepts, e.g., in sorting small lists in embedded systems where computational simplicity is necessary.
- Selection Sort Algorithm
- Description: Finds the smallest/largest element and places it in its correct position.
- Use case: Used for sorting small datasets, especially when simplicity is valued, e.g., arranging a leaderboard in a simple video game.
- Insertion Sort Algorithm
- Description: Builds the sorted array one element at a time by comparing and inserting elements into their correct position.
- Use case: Often used in hybrid algorithms (e.g., Timsort) for sorting smaller subarrays, e.g., sorting playing cards in your hand or organizing a small list of names or scores.
- Merge Sort Algorithm
- Description: Divides the array into halves, sorts them, and merges them.
- Use case: Used in external sorting (datasets that don’t fit in memory), e.g., sorting large datasets in databases or sorting log files by timestamp.
- Quick Sort Algorithm
- Description: Picks a pivot and partitions the array around it.
- Use case: General-purpose sorting for large datasets when average-case performance is critical, e.g., sorting files by size in file management systems or arranging search results in search engines.
- Heap Sort Algorithm
- Description: Utilizes a heap data structure for efficient sorting.
- Use case: Used in priority queue implementations, e.g., sorting task priorities in operating systems or scheduling jobs in a processor.
- Timsort Algorithm
- Description: Optimized for real-world data with varying patterns.
- Use case: Used in Python’s built-in methods sorted() and .sort() use it, e.g., for sorting a large dataset with many nearly sorted segments, such as customer records.
Here’s an overview of the algorithms and their time complexity.
Hands-On Example
Here’s an interview question by Goldman Sachs. It wants you to find the median of two sorted arrays.
Link to the question: https://platform.stratascratch.com/algorithms/10393-median-in-sorted-arrays/official-solution
To solve the question, we’ll implement an algorithm that uses Python’s built-in sorted() function, which uses the Timsort algorithm under the hood, a hybrid sorting algorithm that combines merge sort and insert sort.
The solution defines the find_median_sorted_arrays(input) function. In the function, we first extract two array inputs.
After that, the two arrays are merged into one sorted array.
We then find the length of the merged array and apply the median calculation to it. The calculation checks whether the array has an even or odd number of elements. In the first case, the median is the average of two middle elements. In the second case, the median is equal to the middle element.
def find_median_sorted_arrays(input):
nums1 = input["nums1"]
nums2 = input["nums2"]
merged = sorted(nums1 + nums2)
n = len(merged)
if n % 2 == 0:
return (merged[n//2 - 1] + merged[n//2]) / 2
else:
return merged[n//2]
Here’s the output.
Expected Output
Official output:
16Official output:
3Official output:
03. Recursive Algorithms
Recursive algorithms employ a recursive technique where a function solves a problem by calling itself with smaller or simpler inputs until it reaches a base case. This is useful for problems that can be broken down into similar subproblems, e.g., mathematical computations, divide-and-conquer strategies, and tree or graph traversals.
Here are some recursive algorithms you’ll commonly use in Python.
1. Factorial Algorithm
- Description: Computes the product of all positive integers up to 𝑛.
- Use case: Combinatorics (combinations and permutations), statistics and probability, game theory (possible game configurations and moves),
2. Fibonacci Algorithm
- Description: Generates the Fibonacci sequence.
- Use case: Dynamic programming and algorithm design (e.g., optimization techniques), financial analysis (technical analysis of stocks and cryptocurrencies), nature and biology (natural patterns, e.g., leaves arrangements), computer graphics (creating fractals and procedural patterns), data structures (Fibonacci heap structures)
3. Search Algorithm
- Description: Searches a sorted list by dividing the search space in half
- Use case: Database indexing and searching for the specific records, optimization (finding roots of equations or minimizing functions)
4. Merge Sort Algorithm
- Description: Recursively splits and merges sorted subarrays.
- Use case: External sorting (e.g., sorting data on disks), sorting datasets with secondary keys, statistical analysis (inversion count)
5. Quick Sort Algorithm
- Description: Sorts by selecting a pivot and partitioning the array around it.
- Use case: Sorting datasets that fit in memory, partitioning large datasets when optimizing queries.
6. Tower of Hanoi Algorithm
- Description: Moves 𝑛 disks from one peg to another, following specific rules.
- Use case: Algorithm design and recursion (teaches recursion and problem decomposition), network optimization (models data movement in distributed systems), robotics and AI (planning algorithms to move items between locations)
7. DFS Algorithm
- Description: Explores all paths deeply before backtracking.
- Use case: Pathfinding (in mazes, games, navigation systems), graph algorithms (detecting cycles, connected components, topological ordering), AI (solving puzzles, exploring decision trees), web crawlers (crawling deep links in websites)
8. Permutations Algorithm
- Description: Generates all arrangements of a list or string
- Use case: Optimization problems (traveling salesman and scheduling problems), combinatorial testing (tests all possible input orders for software validation), cryptography (generates all keys or sequences in brute-force attacks)
9. Combinations Algorithm
- Description: Generates subsets where the order of elements does not matter
- Use case: Used in problems involving selection, e.g., creating teams and selecting items.
10. Tree Traversal (Inorder, Preorder, Postorder)
- Description: Visits all the nodes in a tree in a specific order: left subtree -> root -> right subtree (Inorder Traversal), root -> left subtree -> right subtree (Preorder Traversal), left subtree -> right subtree -> root (Postorder Traversal)
- Use case:
- Inorder Traversal: Retrieves nodes in sorted order for a binary search tree, e.g., product IDs in an e-commerce system.
- Preorder Traversal: Creates a copy of the tree-based structure (e.g., a directory structure or a decision tree) or serializes it, ensuring that it can be reconstructed in the exact same shape when deserialized.
- Postorder Traversal: Deletes or processes a tree in a bottom-up manner, e.g., when deallocating memory for a large binary tree structure or processing a dependency graph where each task depends on its children.
Here’s an overview of these algorithms.
Hands-On Example
We’ll solve this interview question by Goldman Sachs.
A person has to climb the stairs. They can climb either 1 or 2 steps at each step. Use recursion to count the number of different ways a person can climb steps.
Link to the question: https://platform.stratascratch.com/algorithms/10408-recursive-stair-climbing
The algorithm we’ll implement here can be categorized as a recursive divide-and-conquer algorithm with some similarities to the Fibonacci sequence. The algorithm should count the number of different ways a person can climb steps, provided they can climb one or two steps.
We define the count_ways(n) function that will do the task. In the if statement, we define the n <= 1 base case, which means a single step or no steps left to climb. This is a recursion-stopping condition, which prevents the recursion from running indefinitely.
Then, we define a recursion part of the algorithm, where it calls itself. It breaks down the problem into two parts. The first part is that, for any number of n steps, you can climb by taking one step (n-1) from there. The second part indicates climbing by taking two steps (n-2).
def count_ways(n):
if n <= 1:
return 1
else:
return count_ways(n-1) + count_ways(n-2)
Here’s the algorithm output.
Expected Output
Official output:
21Official output:
89Official output:
987Best Practices and Tips for Beginners
As a beginner, it’s important that you follow certain best practices to become proficient in Python data structures and algorithms. Here’s some advice from us that will help you cultivate good habits.
1. Understand the Basics:
- Master fundamental Python data structures, e.g., by reading articles explaining them, like this article.
- Familiarize yourself with the time and space complexity of operations using Big-O notation.
2. Write Clean and Modular Code
- When writing more complex algorithms, break them down into smaller functions, with each handling a specific task. This will make your code reusable and easier to debug.
- Write comments in your code to explain your thought process.
3. Practice Algorithm Writing
- Use platforms like StrataScratch, LeetCode, HackerRank, and Codewars to solve algorithm challenges of various difficulty levels.
4. Debug and Optimize
- Use Python’s pdb module to debug the code or simple print statements to understand your algorithm’s flow.
- Optimize your algorithm for better efficiency and test for edge cases.
Conclusion
If you master the concepts we covered in this article, you’ll be well on the way to mastering Python data structures and algorithms. By understanding the theory and practice behind them, you’ll be better able to apply algorithms in real-world problem-solving.
A very important step in achieving that is actually writing algorithms. Our algorithm interview questions are ideal for that, as they require problem understanding, technical knowledge, and coding skills to solve them. As a side effect, you become versed in solving data structure interview questions, a rather helpful skill if you want a job that involves working with algorithms and structuring data.