Pandas Interview Questions

This document provides a curated list of Pandas interview questions commonly asked in technical interviews for Data Science, Data Analysis, Machine Learning, and Python Developer roles. It covers topics ranging from fundamental concepts to advanced data manipulation techniques, including "brutally difficult" questions for senior roles.

The list is updated frequently and is currently the most exhaustive collection of the types of questions being asked.

S.No.	Question Title	Practice Links	Companies Asking	Difficulty	Topics
1	What is Pandas and why is it used?	Pandas Docs	Google, Amazon, Meta, Netflix	Easy	Basics, Introduction
2	Difference between Series and DataFrame	GeeksforGeeks	Google, Amazon, Meta, Microsoft	Easy	Data Structures
3	How to create a DataFrame from dictionary?	Pandas Docs	Amazon, Google, Flipkart	Easy	DataFrame Creation
4	Difference between loc and iloc	Stack Overflow	Google, Amazon, Meta, Apple, Netflix	Easy	Indexing, Selection
5	How to read CSV, Excel, JSON files?	Pandas Docs	Most Tech Companies	Easy	Data I/O
6	How to handle missing values (NaN)?	Real Python	Google, Amazon, Meta, Netflix, Apple	Medium	Missing Data, fillna, dropna
7	Difference between dropna() and fillna()	Pandas Docs	Amazon, Google, Microsoft	Easy	Missing Data
8	Explain GroupBy in Pandas	Real Python	Google, Amazon, Meta, Netflix, Apple	Medium	GroupBy, Aggregation
9	How to merge two DataFrames?	Pandas Docs	Google, Amazon, Meta, Microsoft	Medium	Merging, Joining
10	Difference between merge(), join(), concat()	Stack Overflow	Google, Amazon, Meta	Medium	Merging, Joining, Concatenation
11	How to apply a function to DataFrame?	Pandas Docs	Google, Amazon, Meta, Netflix	Medium	apply, applymap, map
12	Difference between apply(), map(), applymap()	GeeksforGeeks	Google, Amazon, Microsoft	Medium	Data Transformation
13	How to rename columns in DataFrame?	Pandas Docs	Most Tech Companies	Easy	Column Operations
14	How to sort DataFrame by column values?	Pandas Docs	Most Tech Companies	Easy	Sorting
15	How to filter rows based on conditions?	Pandas Docs	Google, Amazon, Meta, Netflix	Easy	Filtering, Boolean Indexing
16	How to remove duplicate rows?	Pandas Docs	Amazon, Google, Microsoft	Easy	Data Cleaning, Deduplication
17	How to change data types of columns?	Pandas Docs	Most Tech Companies	Easy	Data Types
18	What is the difference between copy() and view?	Stack Overflow	Google, Amazon, Meta	Medium	Memory Management
19	Explain pivot tables in Pandas	Pandas Docs	Amazon, Google, Microsoft, Netflix	Medium	Pivot Tables, Reshaping
20	Difference between pivot() and pivot_table()	Stack Overflow	Google, Amazon, Meta	Medium	Reshaping
21	How to handle datetime data in Pandas?	Pandas Docs	Google, Amazon, Netflix, Meta	Medium	DateTime, Time Series
22	How to create a date range?	Pandas Docs	Amazon, Netflix, Google	Easy	DateTime
23	What is MultiIndex (Hierarchical Indexing)?	Pandas Docs	Google, Amazon, Meta	Hard	MultiIndex, Hierarchical Data
24	How to reset and set index?	Pandas Docs	Most Tech Companies	Easy	Indexing
25	How to perform rolling window calculations?	Pandas Docs	Google, Amazon, Netflix, Apple	Medium	Rolling Windows, Time Series
26	How to calculate moving averages?	GeeksforGeeks	Google, Amazon, Netflix, Apple	Medium	Rolling Windows, Finance
27	How to perform resampling on time series?	Pandas Docs	Google, Amazon, Netflix	Medium	Resampling, Time Series
28	Difference between transform() and apply()	Stack Overflow	Google, Amazon, Meta	Hard	GroupBy, Data Transformation
29	How to create bins with cut() and qcut()?	Pandas Docs	Google, Amazon, Meta	Medium	Discretization, Binning
30	How to handle categorical data?	Pandas Docs	Google, Amazon, Meta, Netflix	Medium	Categorical Data, Memory
31	How to one-hot encode categorical data?	Pandas Docs	Google, Amazon, Meta, Microsoft	Easy	Feature Engineering, ML
32	How to read data from SQL database?	Pandas Docs	Amazon, Google, Microsoft	Medium	Database I/O
33	How to export DataFrame to various formats?	Pandas Docs	Most Tech Companies	Easy	Data Export
34	How to handle large datasets efficiently?	Towards Data Science	Google, Amazon, Netflix, Meta	Hard	Performance, Memory Optimization
35	What is Categorical dtype and when to use it?	Pandas Docs	Google, Amazon, Meta	Medium	Data Types, Memory Optimization
36	How to optimize memory usage in Pandas?	Medium	Google, Amazon, Netflix	Hard	Memory Optimization
37	Difference between inplace=True and returning copy	Stack Overflow	Most Tech Companies	Easy	DataFrame Modification
38	How to use query() method for filtering?	Pandas Docs	Google, Amazon, Meta	Easy	Filtering, Query
39	How to work with string data (str accessor)?	Pandas Docs	Google, Amazon, Meta, Netflix	Medium	String Operations
40	How to use str accessor methods?	Pandas Docs	Amazon, Google, Microsoft	Medium	String Operations
41	How to split and expand string columns?	GeeksforGeeks	Amazon, Google, Meta	Medium	String Operations, Data Cleaning
42	How to use melt() for unpivoting data?	Pandas Docs	Google, Amazon, Meta	Medium	Reshaping, Unpivoting
43	How to use stack() and unstack()?	Pandas Docs	Google, Amazon, Meta	Medium	Reshaping, MultiIndex
44	How to cross-tabulate with crosstab()?	Pandas Docs	Google, Amazon, Meta	Medium	Cross Tabulation, Analysis
45	How to calculate correlations?	Pandas Docs	Google, Amazon, Meta, Netflix	Easy	Statistical Analysis
46	How to calculate descriptive statistics?	Pandas Docs	Most Tech Companies	Easy	Statistical Analysis
47	How to use agg() for multiple aggregations?	Pandas Docs	Google, Amazon, Meta, Netflix	Medium	Aggregation
48	How to use named aggregations?	Pandas Docs	Google, Amazon, Meta	Medium	GroupBy, Named Aggregation
49	How to handle timezone-aware datetime?	Pandas Docs	Google, Amazon, Netflix	Medium	DateTime, Timezones
50	How to interpolate missing values?	Pandas Docs	Google, Amazon, Netflix	Medium	Missing Data, Interpolation
51	How to forward fill and backward fill?	Pandas Docs	Amazon, Netflix, Google	Easy	Missing Data, Time Series
52	How to use where() and mask() methods?	Pandas Docs	Google, Amazon, Meta	Medium	Conditional Operations
53	How to clip values in DataFrame?	Pandas Docs	Amazon, Google, Meta	Easy	Data Transformation
54	How to rank values in Pandas?	Pandas Docs	Google, Amazon, Meta, Netflix	Easy	Ranking
55	How to calculate percentage change?	Pandas Docs	Google, Amazon, Netflix, Apple	Easy	Time Series, Finance
56	How to shift and lag data?	Pandas Docs	Google, Amazon, Netflix	Easy	Time Series, Lag Features
57	How to calculate cumulative statistics?	Pandas Docs	Google, Amazon, Meta, Netflix	Easy	Cumulative Operations
58	How to use explode() for list columns?	Pandas Docs	Google, Amazon, Meta	Medium	List Operations, Data Preprocessing
59	How to sample data from DataFrame?	Pandas Docs	Google, Amazon, Meta, Netflix	Easy	Sampling
60	How to detect and handle outliers?	Towards Data Science	Google, Amazon, Meta, Netflix	Medium	Outlier Detection, Data Cleaning
61	How to normalize/standardize data?	GeeksforGeeks	Google, Amazon, Meta, Microsoft	Medium	Feature Engineering, ML
62	How to use eval() for efficient operations?	Pandas Docs	Google, Amazon, Meta	Hard	Performance Optimization
63	How to perform element-wise operations?	Pandas Docs	Most Tech Companies	Easy	Arithmetic Operations
64	Why are vectorized operations faster than loops?	Real Python	Google, Amazon, Meta	Medium	Performance, Vectorization
65	How to profile Pandas code performance?	Pandas Docs	Google, Amazon, Netflix	Hard	Performance Profiling
66	How to use pipe() for method chaining?	Pandas Docs	Google, Amazon, Meta	Medium	Method Chaining
67	How to handle SettingWithCopyWarning?	Pandas Docs	Google, Amazon, Meta, Microsoft	Medium	Common Errors, Debugging
68	How to compare two DataFrames?	Pandas Docs	Amazon, Google, Microsoft	Medium	Data Comparison, Validation
69	How to combine DataFrames with different schemas?	Stack Overflow	Google, Amazon, Meta	Medium	Merging, Schema Alignment
70	How to create conditional columns?	GeeksforGeeks	Most Tech Companies	Easy	Data Transformation
71	How to use np.where() with Pandas?	Real Python	Google, Amazon, Meta, Netflix	Easy	Conditional Operations
72	How to use np.select() for multiple conditions?	Stack Overflow	Google, Amazon, Meta	Medium	Conditional Operations
73	How to count value frequencies?	Pandas Docs	Most Tech Companies	Easy	Data Exploration
74	How to find unique values and nunique()?	Pandas Docs	Most Tech Companies	Easy	Data Exploration
75	How to check for null values?	Pandas Docs	Most Tech Companies	Easy	Missing Data
76	How to use any() and all() methods?	Pandas Docs	Google, Amazon, Meta	Easy	Boolean Operations
77	How to select specific columns?	Pandas Docs	Most Tech Companies	Easy	Column Selection
78	How to drop columns or rows?	Pandas Docs	Most Tech Companies	Easy	Data Cleaning
79	How to use assign() for creating new columns?	Pandas Docs	Google, Amazon, Meta	Easy	Column Creation
80	How to use idxmax() and idxmin()?	Pandas Docs	Google, Amazon, Meta, Netflix	Easy	Indexing
81	Why is iterating over rows slow?	Stack Overflow	Google, Amazon, Meta	Medium	Performance
82	How to use iterrows() and itertuples()?	Pandas Docs	Amazon, Google, Microsoft	Easy	Iteration
83	How to vectorize custom functions?	Real Python	Google, Amazon, Meta	Hard	Performance Optimization
84	How to use Pandas with NumPy?	Pandas Docs	Google, Amazon, Meta, Netflix	Easy	NumPy Integration
85	How to flatten hierarchical index?	Stack Overflow	Google, Amazon, Meta	Medium	MultiIndex
86	How to group by multiple columns?	Pandas Docs	Most Tech Companies	Easy	GroupBy
87	How to filter groups after GroupBy?	Pandas Docs	Google, Amazon, Meta	Medium	GroupBy, Filtering
88	How to get first/last n rows per group?	Stack Overflow	Google, Amazon, Meta, Netflix	Medium	GroupBy
89	How to handle JSON with nested structures?	Pandas Docs	Amazon, Google, Meta	Medium	JSON Processing
90	How to read/write Parquet files?	Pandas Docs	Google, Amazon, Netflix, Meta	Easy	File I/O, Big Data
91	Difference between Parquet, CSV, and Feather	Towards Data Science	Google, Amazon, Netflix	Medium	File Formats, Performance
92	How to use chunksize for large files?	Pandas Docs	Google, Amazon, Netflix, Meta	Medium	Large Data Processing
93	How to use nsmallest() and nlargest()?	Pandas Docs	Google, Amazon, Meta	Easy	Selection
94	How to calculate weighted average?	Stack Overflow	Google, Amazon, Netflix, Apple	Medium	Aggregation, Finance
95	How to perform window functions like SQL?	Pandas Docs	Google, Amazon, Meta, Netflix	Medium	Window Functions
96	How to join on nearest key (asof join)?	Pandas Docs	Google, Amazon, Netflix, Apple	Hard	Joining, Time Series
97	How to use combine_first() for data merging?	Pandas Docs	Amazon, Google, Microsoft	Medium	Merging
98	How to create period indices?	Pandas Docs	Google, Amazon, Netflix	Medium	Time Series
99	How to use Timedelta for time differences?	Pandas Docs	Google, Amazon, Netflix	Easy	DateTime
100	How to set display options globally?	Pandas Docs	Most Tech Companies	Easy	Display Options
101	What is method chaining and when to use it?	Tom Augspurger Blog	Google, Amazon, Meta	Medium	Method Chaining, Clean Code
102	How to calculate month-over-month change?	StrataScratch	Google, Amazon, Meta, Netflix	Medium	Time Series, Analytics
103	How to find customers with highest orders?	DataLemur	Amazon, Google, Meta, Netflix	Medium	GroupBy, Aggregation
104	[HARD] How to calculate retention metrics efficiently?	StrataScratch	Meta, Netflix, Amazon, Google	Hard	Cohort Analysis, Time Series
105	[HARD] How to implement A/B test analysis?	Towards Data Science	Meta, Google, Netflix, Amazon	Hard	Statistical Analysis, Testing
106	[HARD] How to optimize memory with `category` types?	Pandas Docs	Google, Amazon, Netflix	Hard	Memory Optimization
107	[HARD] How to implement cohort analysis?	Towards Data Science	Meta, Netflix, Amazon, Google	Hard	Cohort Analysis
108	[HARD] How to calculate funnel drop-off rates?	StrataScratch	Meta, Google, Amazon, Netflix	Hard	Funnel Analysis, Analytics
109	[HARD] How to implement custom testing using `assert_frame_equal`?	Pandas Docs	Google, Amazon, Microsoft	Hard	Testing, Quality
110	[HARD] How to handle sparse data structures?	Pandas Docs	Google, Amazon, Netflix	Hard	Sparse Data, Memory
111	[HARD] How to use Numba/JIT with Pandas?	Pandas Docs	Google, Amazon, Hedge Funds	Hard	Performance
112	[HARD] How to implement custom accessors?	Pandas Docs	Google, Amazon, Meta	Hard	Extending Pandas
113	[HARD] How to use Swifter for parallel processing?	Swifter Docs	Google, Amazon, Uber	Hard	Parallelism
114	[HARD] Explain Pandas Block Manager structure	Pandas Wiki	Google, Amazon, Meta	Hard	Internals
115	[HARD] How does Copy-on-Write (CoW) work in Pandas 2.0+?	Pandas Docs	Google, Meta, Microsoft	Hard	Internals, Performance
116	[HARD] How to use PyArrow backend for performance?	Pandas Docs	Google, Amazon, Databricks	Hard	Performance, Arrow
117	[HARD] How to implement custom index types?	Pandas Docs	Google, Amazon	Hard	Extending Pandas
118	[HARD] How to optimize MultiIndex slicing performance?	Pandas Docs	Google, Amazon, Hedge Funds	Hard	Optimization
119	[HARD] `groupby().transform()` internal mechanics vs `apply()`	Pandas Docs	Google, Amazon, Meta	Hard	Deep Dive
120	[HARD] How to implement rolling window with `raw=True`?	Pandas Docs	Google, Amazon, Hedge Funds	Hard	Optimization
121	[HARD] How to extend Pandas with custom plotting backends?	Pandas Docs	Google, Amazon	Hard	Extending Pandas
122	[HARD] How to handle time series offset aliases?	Pandas Docs	Google, Amazon, Hedge Funds	Hard	Time Series
123	[HARD] How to use Dask DataFrames for out-of-core computing?	Dask Docs	Google, Amazon, Netflix	Hard	Big Data
124	[HARD] How to optimize chained assignment performance?	Pandas Docs	Google, Amazon, Meta	Hard	Optimization
125	[HARD] How are nullable integers/floats implemented?	Pandas Docs	Google, Amazon, Microsoft	Hard	Internals
126	[HARD] How to use Cython with Pandas?	Pandas Docs	Google, Amazon, HFT Firms	Hard	Performance
127	[HARD] Comparison of Parquet vs Feather vs ORC	Apache Arrow	Google, Amazon, Netflix	Hard	Systems

Code Examples

1. Memory Optimization

import pandas as pd
import numpy as np

# Typical large dataframe creation
df = pd.DataFrame({
    'category': np.random.choice(['A', 'B', 'C'], size=1000000),
    'value': np.random.randn(1000000)
})

# Memory usage before optimization
print(df.memory_usage(deep=True).sum() / 1024**2, "MB")

# Optimize by converting object to category
df['category'] = df['category'].astype('category')

# Memory usage after optimization
print(df.memory_usage(deep=True).sum() / 1024**2, "MB")
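
Numeric columns can often be shrunk as well. The following is a minimal sketch, assuming the values fit in smaller dtypes; the frame and its column names (`n_items`, `score`) are made up for illustration. It uses `pd.to_numeric` with the `downcast` argument.

```python
import numpy as np
import pandas as pd

# Small frame with default 64-bit numeric columns (names are illustrative)
df_num = pd.DataFrame({
    "n_items": np.arange(1000, dtype="int64"),
    "score": np.arange(1000) * 0.5,
})
before = df_num.memory_usage(deep=True).sum()

# Downcast each column to the smallest dtype that can hold its values
df_small = df_num.assign(
    n_items=pd.to_numeric(df_num["n_items"], downcast="unsigned"),
    score=pd.to_numeric(df_num["score"], downcast="float"),
)
after = df_small.memory_usage(deep=True).sum()

print(df_small.dtypes)
print(f"{before} bytes -> {after} bytes")
```

Downcasting floats trades precision for memory, so it is worth checking that the reduced precision is acceptable before applying it to real data.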

2. Method Chaining for Clean Code

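import pandas as pd

# Instead of multiple intermediate variables, chain the steps in one expression.
# 'data.csv' and its column names are placeholders.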
df = (
    pd.read_csv('data.csv')
    .query('status == "active"')
    .assign(
        year=lambda x: pd.to_datetime(x['date']).dt.year,
        total_cost=lambda x: x['price'] * x['quantity']
    )
    .groupby(['year', 'region'])
    .agg(total_revenue=('total_cost', 'sum'))
    .reset_index()
    .sort_values('total_revenue', ascending=False)
)
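
Custom steps can join a chain through `pipe()`. The sketch below is one possible shape, using a small in-memory frame in place of the CSV; the helpers `add_total_cost` and `top_n` are hypothetical names introduced only for this example.

```python
import pandas as pd

# Illustrative in-memory data standing in for the CSV used above
orders = pd.DataFrame({
    "status": ["active", "active", "inactive", "active"],
    "region": ["EU", "EU", "US", "US"],
    "price": [10.0, 20.0, 5.0, 9.0],
    "quantity": [2, 1, 4, 5],
})

def add_total_cost(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a total_cost column (hypothetical helper)."""
    return df.assign(total_cost=df["price"] * df["quantity"])

def top_n(df: pd.DataFrame, column: str, n: int) -> pd.DataFrame:
    """Return the n rows with the largest values in `column` (hypothetical helper)."""
    return df.nlargest(n, column)

result = (
    orders
    .query('status == "active"')
    .pipe(add_total_cost)
    .groupby("region", as_index=False)
    .agg(total_revenue=("total_cost", "sum"))
    .pipe(top_n, column="total_revenue", n=1)
)
print(result)
```

Keeping each step a function that takes and returns a DataFrame makes the steps easy to test on their own.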

3. Parallel Processing with Swifter

import pandas as pd
import swifter

df = pd.DataFrame({'text': ['some text'] * 100000})

def heavy_processing(text):
    # Simulate heavy work
    return text.upper()[::-1]

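# swifter chooses a strategy automatically: a vectorized call when possible,
# otherwise a parallel Dask apply or a plain pandas apply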
df['processed'] = df['text'].swifter.apply(heavy_processing)
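
Before reaching for a parallel library, it is usually worth checking whether a built-in accessor covers the same work. A minimal sketch with the same toy transformation, written with the `.str` accessor; whether this is faster in practice depends on the dtype and pandas version, so it should be measured.

```python
import pandas as pd

df_text = pd.DataFrame({"text": ["some text", "Pandas"]})

def heavy_processing(text):
    # Same toy transformation as above: upper-case, then reverse
    return text.upper()[::-1]

# Element-by-element version with plain apply
applied = df_text["text"].apply(heavy_processing)

# Same result expressed with the built-in .str accessor
vectorized = df_text["text"].str.upper().str[::-1]

print(vectorized.tolist())
```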

Questions asked in Google interview

How would you optimize a Pandas operation running slowly on a large dataset?
Explain the difference between merge() and join()
Write code to calculate rolling averages with different window sizes
How would you handle a DataFrame with 100 million rows?
Explain memory optimization techniques
Write code to perform complex GroupBy with multiple aggregations
Explain the internal data structure of DataFrame
How would you implement feature engineering pipelines?
Write code to calculate year-over-year growth
Explain vectorized operations and their importance
How to handle SettingWithCopyWarning?
Write code to perform window functions similar to SQL
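
As an illustration of one prompt from this list, rolling averages with different window sizes might be sketched as follows. The series, the column name `close`, and the window sizes are made up for the example.

```python
import pandas as pd

# Illustrative daily series; the column name and values are made up
prices = pd.DataFrame(
    {"close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]},
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

# One rolling-mean column per window size
windows = [2, 3]
for w in windows:
    prices[f"ma_{w}"] = prices["close"].rolling(window=w).mean()

# min_periods=1 yields a value from the first row instead of NaN
prices["ma_3_partial"] = prices["close"].rolling(window=3, min_periods=1).mean()

print(prices)
```

By default the first `window - 1` rows are NaN because the window is not yet full; `min_periods` relaxes that requirement.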

Questions asked in Amazon interview

Write code to merge multiple DataFrames with different schemas
How would you calculate year-over-year growth?
Explain how to handle time series data with irregular intervals
Write code to identify and remove duplicate records
How would you implement a moving average crossover strategy?
Explain the difference between transform() and apply()
Write code to pivot data for sales analysis
How would you handle categorical variables with high cardinality?
Explain how to optimize for memory efficiency
Write code to perform cohort analysis
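
For the transform() versus apply() prompt above, a small sketch can make the difference in output shape concrete. The data and column names are illustrative only.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["EU", "EU", "US", "US", "US"],
    "amount": [10, 30, 10, 20, 60],
})

grouped = sales.groupby("region")["amount"]

# transform returns one value per original row, aligned to the input index,
# so the result can be assigned straight back as a column
sales["region_mean"] = grouped.transform("mean")
sales["share_of_region"] = sales["amount"] / grouped.transform("sum")

# apply passes each group to the function and accepts any return shape;
# here each group reduces to one number, so the result has one row per group
amount_range = grouped.apply(lambda s: s.max() - s.min())

print(sales)
print(amount_range)
```

A common rule of thumb is to use `transform()` when the result must line up with the original rows and `agg()` or `apply()` when each group collapses to a summary.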

Questions asked in Meta interview

Write code to analyze user engagement data
How would you calculate conversion funnels?
Explain how to handle large-scale data processing
Write code to resample time series data
How would you implement A/B testing analysis?
Explain method chaining and its benefits
Write code to calculate retention metrics
How would you handle hierarchical data structures?
Explain vectorization benefits over loops
Write code to analyze network data
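
The conversion funnel prompt above could be approached roughly as shown here. This is a sketch under stated assumptions: the event log, the step names, and the funnel order are all invented for the example, and each user is counted once per step.

```python
import pandas as pd

# Illustrative event log: one row per (user, funnel step) reached
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 4, 4, 4],
    "step": ["visit", "signup", "purchase",
             "visit", "signup",
             "visit",
             "visit", "signup", "purchase"],
})

# The funnel order is an assumption; adjust it to the real product steps
funnel_order = ["visit", "signup", "purchase"]

funnel = (
    events.groupby("step")["user_id"]
    .nunique()
    .reindex(funnel_order)
    .to_frame("users")
)
# Share of users kept from the previous step, and overall from the first step
funnel["step_conversion"] = funnel["users"] / funnel["users"].shift(1)
funnel["overall_conversion"] = funnel["users"] / funnel["users"].iloc[0]
funnel["drop_off"] = 1 - funnel["step_conversion"]

print(funnel)
```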

Questions asked in Microsoft interview

Explain the SettingWithCopyWarning and how to avoid it
Write code to perform window functions similar to SQL
How would you handle timezone conversions?
Explain the difference between views and copies
Write code to implement custom aggregation functions
How would you optimize Pandas for production?
Explain multi-level indexing use cases
Write code to compare two DataFrames
How would you handle missing data in time series?
Explain eval() and query() methods
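
For the SettingWithCopyWarning prompt above, the usual advice is to assign through a single `.loc` call and to take an explicit copy when a subset should be modified independently. A minimal sketch with made-up data:

```python
import pandas as pd

df_scores = pd.DataFrame({"name": ["a", "b", "c"], "score": [40, 75, 90]})

# Chained indexing such as df_scores[df_scores["score"] > 50]["score"] = 0
# may write to a temporary object; a single .loc call targets the original frame
df_scores.loc[df_scores["score"] > 50, "score"] = 0

# When a filtered subset should be changed independently, take an explicit copy
low = df_scores[df_scores["score"] == 0].copy()
low["flag"] = True

print(df_scores)
print(low)
```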

Questions asked in Netflix interview

Write code to analyze viewing patterns and user behavior
How would you calculate streaming quality metrics?
Explain how to handle messy data from multiple sources
Write code to implement collaborative filtering preprocessing
How would you analyze content performance across regions?
Explain time series decomposition
Write code to calculate customer lifetime value
How would you handle data for recommendation systems?
Explain rolling window calculations for real-time analytics
Write code to analyze A/B test results
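
One possible starting point for the A/B test prompt above is a per-group summary followed by a two-proportion z-test. The data below is synthetic, and the pooled z-test is only one common choice, assumed here for illustration; a real analysis would also consider sample size planning and the chosen significance level.

```python
import math
import pandas as pd

# Synthetic per-user results; group labels and outcomes are made up
ab = pd.DataFrame({
    "group": ["control"] * 1000 + ["treatment"] * 1000,
    "converted": [1] * 100 + [0] * 900 + [1] * 130 + [0] * 870,
})

summary = ab.groupby("group")["converted"].agg(
    users="count", conversions="sum", rate="mean"
)

p_c, p_t = summary.loc["control", "rate"], summary.loc["treatment", "rate"]
n_c, n_t = summary.loc["control", "users"], summary.loc["treatment", "users"]

# Two-proportion z-test with a pooled standard error (an assumed choice)
p_pool = summary["conversions"].sum() / summary["users"].sum()
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
z = (p_t - p_c) / se
# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))

lift = (p_t - p_c) / p_c
print(summary)
print(f"lift={lift:.1%}, z={z:.2f}, p={p_value:.4f}")
```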

Questions asked in Apple interview

Write code to perform data validation on imported data
How would you implement data quality checks?
Explain how to handle multi-format data imports
Write code to analyze product performance metrics
How would you implement data anonymization?
Explain best practices for production Pandas code
Write code to create automated data reports
How would you handle data versioning?
Explain memory management for large DataFrames
Write code to implement time-based partitioning
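
The data validation prompts above can be sketched as a function that collects problems rather than failing on the first one. The required columns and the rules in `validate` are hypothetical and would come from the actual data contract.

```python
import pandas as pd

def validate(df: pd.DataFrame) -> list[str]:
    """Return a list of readable problems (hypothetical rules for illustration)."""
    problems = []
    required = {"order_id", "amount", "order_date"}
    missing = required - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems  # the remaining checks need these columns
    if df["order_id"].duplicated().any():
        problems.append("duplicate order_id values")
    if df["amount"].isna().any():
        problems.append("null amounts")
    if (df["amount"] < 0).any():
        problems.append("negative amounts")
    # errors="coerce" turns unparseable dates into NaT so they can be counted
    bad_dates = pd.to_datetime(df["order_date"], errors="coerce").isna().sum()
    if bad_dates:
        problems.append(f"{bad_dates} unparseable order_date values")
    return problems

imported = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "amount": [9.5, None, -3.0, 20.0],
    "order_date": ["2024-01-05", "2024-01-06", "not a date", "2024-01-08"],
})
problems = validate(imported)
print(problems)
```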

Questions asked in Flipkart interview

Write code to analyze e-commerce transaction data
How would you calculate GMV metrics?
Explain handling high-cardinality categorical data
Write code to analyze customer purchase patterns
How would you implement product recommendation preprocessing?
Explain data aggregation for dashboard analytics
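
For the GMV prompt above, a minimal sketch follows. It assumes GMV is taken as price times quantity summed over orders, before returns, cancellations, or discounts; the order data and column names are invented.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-10"]
    ),
    "category": ["electronics", "fashion", "electronics", "fashion"],
    "price": [500.0, 40.0, 300.0, 60.0],
    "quantity": [1, 3, 2, 1],
})

# Assumed definition: GMV of an order line is price * quantity
orders["gmv"] = orders["price"] * orders["quantity"]

# GMV per month and category
monthly_gmv = (
    orders
    .assign(month=orders["order_date"].dt.to_period("M"))
    .pivot_table(index="month", columns="category", values="gmv",
                 aggfunc="sum", fill_value=0)
)

# Month-over-month growth of total GMV
total = monthly_gmv.sum(axis=1)
mom_growth = total.pct_change()

print(monthly_gmv)
print(mom_growth)
```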

Questions asked in LinkedIn interview

Write code to analyze professional network connections
How would you calculate engagement metrics for posts?
Explain how to handle user activity data
Write code to implement skill-based matching
How would you analyze job posting performance?
Explain data preprocessing for NLP tasks
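
The skill-based matching prompt above could be sketched with `explode()` and `merge()`. The candidates, jobs, and the coverage score (fraction of a job's required skills that a candidate has) are assumptions made for this example, not a standard definition.

```python
import pandas as pd

candidates = pd.DataFrame({
    "candidate": ["ana", "ben"],
    "skills": [["python", "sql", "pandas"], ["java", "sql"]],
})
jobs = pd.DataFrame({
    "job": ["data analyst", "backend dev"],
    "required": [["sql", "pandas"], ["java", "sql", "docker"]],
})

# One row per (candidate, skill) and per (job, required skill)
cand_long = candidates.explode("skills").rename(columns={"skills": "skill"})
jobs_long = jobs.explode("required").rename(columns={"required": "skill"})

# Number of shared skills per (candidate, job) pair
matches = (
    cand_long.merge(jobs_long, on="skill")
    .groupby(["candidate", "job"])["skill"]
    .nunique()
    .rename("matched")
    .reset_index()
)

# Fraction of each job's required skills that the candidate covers
required_counts = (
    jobs_long.groupby("job")["skill"].nunique().rename("required").reset_index()
)
matches = matches.merge(required_counts, on="job")
matches["coverage"] = matches["matched"] / matches["required"]
matches = matches.sort_values(["job", "coverage"], ascending=[True, False])

print(matches)
```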