
Pandas Interview Questions

This document provides a curated list of Pandas interview questions commonly asked in technical interviews for Data Science, Data Analysis, Machine Learning, and Python Developer roles. It ranges from fundamental concepts to advanced data manipulation techniques and includes rigorous, "brutally difficult" questions for senior roles.

This list is updated frequently; at present it is the most exhaustive list of the types of questions being asked.


| Sno | Question Title | Practice Links | Companies Asking | Difficulty | Topics |
|---|---|---|---|---|---|
| 1 | What is Pandas and why is it used? | Pandas Docs | Google, Amazon, Meta, Netflix | Easy | Basics, Introduction |
| 2 | Difference between Series and DataFrame | GeeksforGeeks | Google, Amazon, Meta, Microsoft | Easy | Data Structures |
| 3 | How to create a DataFrame from dictionary? | Pandas Docs | Amazon, Google, Flipkart | Easy | DataFrame Creation |
| 4 | Difference between loc and iloc | Stack Overflow | Google, Amazon, Meta, Apple, Netflix | Easy | Indexing, Selection |
| 5 | How to read CSV, Excel, JSON files? | Pandas Docs | Most Tech Companies | Easy | Data I/O |
| 6 | How to handle missing values (NaN)? | Real Python | Google, Amazon, Meta, Netflix, Apple | Medium | Missing Data, fillna, dropna |
| 7 | Difference between dropna() and fillna() | Pandas Docs | Amazon, Google, Microsoft | Easy | Missing Data |
| 8 | Explain GroupBy in Pandas | Real Python | Google, Amazon, Meta, Netflix, Apple | Medium | GroupBy, Aggregation |
| 9 | How to merge two DataFrames? | Pandas Docs | Google, Amazon, Meta, Microsoft | Medium | Merging, Joining |
| 10 | Difference between merge(), join(), concat() | Stack Overflow | Google, Amazon, Meta | Medium | Merging, Joining, Concatenation |
| 11 | How to apply a function to DataFrame? | Pandas Docs | Google, Amazon, Meta, Netflix | Medium | apply, applymap, map |
| 12 | Difference between apply(), map(), applymap() | GeeksforGeeks | Google, Amazon, Microsoft | Medium | Data Transformation |
| 13 | How to rename columns in DataFrame? | Pandas Docs | Most Tech Companies | Easy | Column Operations |
| 14 | How to sort DataFrame by column values? | Pandas Docs | Most Tech Companies | Easy | Sorting |
| 15 | How to filter rows based on conditions? | Pandas Docs | Google, Amazon, Meta, Netflix | Easy | Filtering, Boolean Indexing |
| 16 | How to remove duplicate rows? | Pandas Docs | Amazon, Google, Microsoft | Easy | Data Cleaning, Deduplication |
| 17 | How to change data types of columns? | Pandas Docs | Most Tech Companies | Easy | Data Types |
| 18 | What is the difference between copy() and view? | Stack Overflow | Google, Amazon, Meta | Medium | Memory Management |
| 19 | Explain pivot tables in Pandas | Pandas Docs | Amazon, Google, Microsoft, Netflix | Medium | Pivot Tables, Reshaping |
| 20 | Difference between pivot() and pivot_table() | Stack Overflow | Google, Amazon, Meta | Medium | Reshaping |
| 21 | How to handle datetime data in Pandas? | Pandas Docs | Google, Amazon, Netflix, Meta | Medium | DateTime, Time Series |
| 22 | How to create a date range? | Pandas Docs | Amazon, Netflix, Google | Easy | DateTime |
| 23 | What is MultiIndex (Hierarchical Indexing)? | Pandas Docs | Google, Amazon, Meta | Hard | MultiIndex, Hierarchical Data |
| 24 | How to reset and set index? | Pandas Docs | Most Tech Companies | Easy | Indexing |
| 25 | How to perform rolling window calculations? | Pandas Docs | Google, Amazon, Netflix, Apple | Medium | Rolling Windows, Time Series |
| 26 | How to calculate moving averages? | GeeksforGeeks | Google, Amazon, Netflix, Apple | Medium | Rolling Windows, Finance |
| 27 | How to perform resampling on time series? | Pandas Docs | Google, Amazon, Netflix | Medium | Resampling, Time Series |
| 28 | Difference between transform() and apply() | Stack Overflow | Google, Amazon, Meta | Hard | GroupBy, Data Transformation |
| 29 | How to create bins with cut() and qcut()? | Pandas Docs | Google, Amazon, Meta | Medium | Discretization, Binning |
| 30 | How to handle categorical data? | Pandas Docs | Google, Amazon, Meta, Netflix | Medium | Categorical Data, Memory |
| 31 | How to one-hot encode categorical data? | Pandas Docs | Google, Amazon, Meta, Microsoft | Easy | Feature Engineering, ML |
| 32 | How to read data from SQL database? | Pandas Docs | Amazon, Google, Microsoft | Medium | Database I/O |
| 33 | How to export DataFrame to various formats? | Pandas Docs | Most Tech Companies | Easy | Data Export |
| 34 | How to handle large datasets efficiently? | Towards Data Science | Google, Amazon, Netflix, Meta | Hard | Performance, Memory Optimization |
| 35 | What is Categorical dtype and when to use it? | Pandas Docs | Google, Amazon, Meta | Medium | Data Types, Memory Optimization |
| 36 | How to optimize memory usage in Pandas? | Medium | Google, Amazon, Netflix | Hard | Memory Optimization |
| 37 | Difference between inplace=True and returning copy | Stack Overflow | Most Tech Companies | Easy | DataFrame Modification |
| 38 | How to use query() method for filtering? | Pandas Docs | Google, Amazon, Meta | Easy | Filtering, Query |
| 39 | How to work with string data (str accessor)? | Pandas Docs | Google, Amazon, Meta, Netflix | Medium | String Operations |
| 40 | How to use str accessor methods? | Pandas Docs | Amazon, Google, Microsoft | Medium | String Operations |
| 41 | How to split and expand string columns? | GeeksforGeeks | Amazon, Google, Meta | Medium | String Operations, Data Cleaning |
| 42 | How to use melt() for unpivoting data? | Pandas Docs | Google, Amazon, Meta | Medium | Reshaping, Unpivoting |
| 43 | How to use stack() and unstack()? | Pandas Docs | Google, Amazon, Meta | Medium | Reshaping, MultiIndex |
| 44 | How to cross-tabulate with crosstab()? | Pandas Docs | Google, Amazon, Meta | Medium | Cross Tabulation, Analysis |
| 45 | How to calculate correlations? | Pandas Docs | Google, Amazon, Meta, Netflix | Easy | Statistical Analysis |
| 46 | How to calculate descriptive statistics? | Pandas Docs | Most Tech Companies | Easy | Statistical Analysis |
| 47 | How to use agg() for multiple aggregations? | Pandas Docs | Google, Amazon, Meta, Netflix | Medium | Aggregation |
| 48 | How to use named aggregations? | Pandas Docs | Google, Amazon, Meta | Medium | GroupBy, Named Aggregation |
| 49 | How to handle timezone-aware datetime? | Pandas Docs | Google, Amazon, Netflix | Medium | DateTime, Timezones |
| 50 | How to interpolate missing values? | Pandas Docs | Google, Amazon, Netflix | Medium | Missing Data, Interpolation |
| 51 | How to forward fill and backward fill? | Pandas Docs | Amazon, Netflix, Google | Easy | Missing Data, Time Series |
| 52 | How to use where() and mask() methods? | Pandas Docs | Google, Amazon, Meta | Medium | Conditional Operations |
| 53 | How to clip values in DataFrame? | Pandas Docs | Amazon, Google, Meta | Easy | Data Transformation |
| 54 | How to rank values in Pandas? | Pandas Docs | Google, Amazon, Meta, Netflix | Easy | Ranking |
| 55 | How to calculate percentage change? | Pandas Docs | Google, Amazon, Netflix, Apple | Easy | Time Series, Finance |
| 56 | How to shift and lag data? | Pandas Docs | Google, Amazon, Netflix | Easy | Time Series, Lag Features |
| 57 | How to calculate cumulative statistics? | Pandas Docs | Google, Amazon, Meta, Netflix | Easy | Cumulative Operations |
| 58 | How to use explode() for list columns? | Pandas Docs | Google, Amazon, Meta | Medium | List Operations, Data Preprocessing |
| 59 | How to sample data from DataFrame? | Pandas Docs | Google, Amazon, Meta, Netflix | Easy | Sampling |
| 60 | How to detect and handle outliers? | Towards Data Science | Google, Amazon, Meta, Netflix | Medium | Outlier Detection, Data Cleaning |
| 61 | How to normalize/standardize data? | GeeksforGeeks | Google, Amazon, Meta, Microsoft | Medium | Feature Engineering, ML |
| 62 | How to use eval() for efficient operations? | Pandas Docs | Google, Amazon, Meta | Hard | Performance Optimization |
| 63 | How to perform element-wise operations? | Pandas Docs | Most Tech Companies | Easy | Arithmetic Operations |
| 64 | Why vectorized operations are faster than loops? | Real Python | Google, Amazon, Meta | Medium | Performance, Vectorization |
| 65 | How to profile Pandas code performance? | Pandas Docs | Google, Amazon, Netflix | Hard | Performance Profiling |
| 66 | How to use pipe() for method chaining? | Pandas Docs | Google, Amazon, Meta | Medium | Method Chaining |
| 67 | How to handle SettingWithCopyWarning? | Pandas Docs | Google, Amazon, Meta, Microsoft | Medium | Common Errors, Debugging |
| 68 | How to compare two DataFrames? | Pandas Docs | Amazon, Google, Microsoft | Medium | Data Comparison, Validation |
| 69 | How to combine DataFrames with different schemas? | Stack Overflow | Google, Amazon, Meta | Medium | Merging, Schema Alignment |
| 70 | How to create conditional columns? | GeeksforGeeks | Most Tech Companies | Easy | Data Transformation |
| 71 | How to use np.where() with Pandas? | Real Python | Google, Amazon, Meta, Netflix | Easy | Conditional Operations |
| 72 | How to use np.select() for multiple conditions? | Stack Overflow | Google, Amazon, Meta | Medium | Conditional Operations |
| 73 | How to count value frequencies? | Pandas Docs | Most Tech Companies | Easy | Data Exploration |
| 74 | How to find unique values and nunique()? | Pandas Docs | Most Tech Companies | Easy | Data Exploration |
| 75 | How to check for null values? | Pandas Docs | Most Tech Companies | Easy | Missing Data |
| 76 | How to use any() and all() methods? | Pandas Docs | Google, Amazon, Meta | Easy | Boolean Operations |
| 77 | How to select specific columns? | Pandas Docs | Most Tech Companies | Easy | Column Selection |
| 78 | How to drop columns or rows? | Pandas Docs | Most Tech Companies | Easy | Data Cleaning |
| 79 | How to use assign() for creating new columns? | Pandas Docs | Google, Amazon, Meta | Easy | Column Creation |
| 80 | How to use idxmax() and idxmin()? | Pandas Docs | Google, Amazon, Meta, Netflix | Easy | Indexing |
| 81 | Why is iterating over rows slow? | Stack Overflow | Google, Amazon, Meta | Medium | Performance |
| 82 | How to use iterrows() and itertuples()? | Pandas Docs | Amazon, Google, Microsoft | Easy | Iteration |
| 83 | How to vectorize custom functions? | Real Python | Google, Amazon, Meta | Hard | Performance Optimization |
| 84 | How to use Pandas with NumPy? | Pandas Docs | Google, Amazon, Meta, Netflix | Easy | NumPy Integration |
| 85 | How to flatten hierarchical index? | Stack Overflow | Google, Amazon, Meta | Medium | MultiIndex |
| 86 | How to group by multiple columns? | Pandas Docs | Most Tech Companies | Easy | GroupBy |
| 87 | How to filter groups after GroupBy? | Pandas Docs | Google, Amazon, Meta | Medium | GroupBy, Filtering |
| 88 | How to get first/last n rows per group? | Stack Overflow | Google, Amazon, Meta, Netflix | Medium | GroupBy |
| 89 | How to handle JSON with nested structures? | Pandas Docs | Amazon, Google, Meta | Medium | JSON Processing |
| 90 | How to read/write Parquet files? | Pandas Docs | Google, Amazon, Netflix, Meta | Easy | File I/O, Big Data |
| 91 | Difference between Parquet, CSV, and Feather | Towards Data Science | Google, Amazon, Netflix | Medium | File Formats, Performance |
| 92 | How to use chunksize for large files? | Pandas Docs | Google, Amazon, Netflix, Meta | Medium | Large Data Processing |
| 93 | How to use nsmallest() and nlargest()? | Pandas Docs | Google, Amazon, Meta | Easy | Selection |
| 94 | How to calculate weighted average? | Stack Overflow | Google, Amazon, Netflix, Apple | Medium | Aggregation, Finance |
| 95 | How to perform window functions like SQL? | Pandas Docs | Google, Amazon, Meta, Netflix | Medium | Window Functions |
| 96 | How to join on nearest key (asof join)? | Pandas Docs | Google, Amazon, Netflix, Apple | Hard | Joining, Time Series |
| 97 | How to use combine_first() for data merging? | Pandas Docs | Amazon, Google, Microsoft | Medium | Merging |
| 98 | How to create period indices? | Pandas Docs | Google, Amazon, Netflix | Medium | Time Series |
| 99 | How to use Timedelta for time differences? | Pandas Docs | Google, Amazon, Netflix | Easy | DateTime |
| 100 | How to set display options globally? | Pandas Docs | Most Tech Companies | Easy | Display Options |
| 101 | What is method chaining and when to use it? | Tom Augspurger Blog | Google, Amazon, Meta | Medium | Method Chaining, Clean Code |
| 102 | How to calculate month-over-month change? | StrataScratch | Google, Amazon, Meta, Netflix | Medium | Time Series, Analytics |
| 103 | How to find customers with highest orders? | DataLemur | Amazon, Google, Meta, Netflix | Medium | GroupBy, Aggregation |
| 104 | [HARD] How to calculate retention metrics efficiently? | StrataScratch | Meta, Netflix, Amazon, Google | Hard | Cohort Analysis, Time Series |
| 105 | [HARD] How to implement A/B test analysis? | Towards Data Science | Meta, Google, Netflix, Amazon | Hard | Statistical Analysis, Testing |
| 106 | [HARD] How to optimize memory with category types? | Pandas Docs | Google, Amazon, Netflix | Hard | Memory Optimization |
| 107 | [HARD] How to implement cohort analysis? | Towards Data Science | Meta, Netflix, Amazon, Google | Hard | Cohort Analysis |
| 108 | [HARD] How to calculate funnel drop-off rates? | StrataScratch | Meta, Google, Amazon, Netflix | Hard | Funnel Analysis, Analytics |
| 109 | [HARD] How to implement custom testing using assert_frame_equal? | Pandas Docs | Google, Amazon, Microsoft | Hard | Testing, Quality |
| 110 | [HARD] How to handle sparse data structures? | Pandas Docs | Google, Amazon, Netflix | Hard | Sparse Data, Memory |
| 111 | [HARD] How to use Numba/JIT with Pandas? | Pandas Docs | Google, Amazon, Hedge Funds | Hard | Performance |
| 112 | [HARD] How to implement custom accessors? | Pandas Docs | Google, Amazon, Meta | Hard | Extending Pandas |
| 113 | [HARD] How to use Swifter for parallel processing? | Swifter Docs | Google, Amazon, Uber | Hard | Parallelism |
| 114 | [HARD] Explain Pandas Block Manager structure | Pandas Wiki | Google, Amazon, Meta | Hard | Internals |
| 115 | [HARD] How Copy-on-Write (CoW) works in Pandas 2.0+? | Pandas Docs | Google, Meta, Microsoft | Hard | Internals, Performance |
| 116 | [HARD] How to use PyArrow backend for performance? | Pandas Docs | Google, Amazon, Databricks | Hard | Performance, Arrow |
| 117 | [HARD] How to implement custom index types? | Pandas Docs | Google, Amazon | Hard | Extending Pandas |
| 118 | [HARD] How to optimize MultiIndex slicing performance? | Pandas Docs | Google, Amazon, Hedge Funds | Hard | Optimization |
| 119 | [HARD] groupby().transform() internal mechanics vs apply() | Pandas Docs | Google, Amazon, Meta | Hard | Deep Dive |
| 120 | [HARD] How to implement rolling window with raw=True? | Pandas Docs | Google, Amazon, Hedge Funds | Hard | Optimization |
| 121 | [HARD] How to extend Pandas with custom plotting backends? | Pandas Docs | Google, Amazon | Hard | Extending Pandas |
| 122 | [HARD] How to handle time series offset aliases? | Pandas Docs | Google, Amazon, Hedge Funds | Hard | Time Series |
| 123 | [HARD] How to use Dask DataFrames for out-of-core computing? | Dask Docs | Google, Amazon, Netflix | Hard | Big Data |
| 124 | [HARD] How to optimize chained assignment performance? | Pandas Docs | Google, Amazon, Meta | Hard | Optimization |
| 125 | [HARD] Nullable integers/floats implementation? | Pandas Docs | Google, Amazon, Microsoft | Hard | Internals |
| 126 | [HARD] How to use Cython with Pandas? | Pandas Docs | Google, Amazon, HFT Firms | Hard | Performance |
| 127 | [HARD] Comparison of Parquet vs Feather vs ORC? | Apache Arrow | Google, Amazon, Netflix | Hard | Systems |

Code Examples

1. Memory Optimization

import pandas as pd
import numpy as np

# Typical large dataframe creation
df = pd.DataFrame({
    'category': np.random.choice(['A', 'B', 'C'], size=1000000),
    'value': np.random.randn(1000000)
})

# Memory usage before optimization
print(df.memory_usage(deep=True).sum() / 1024**2, "MB")

# Optimize by converting the low-cardinality string column to category
df['category'] = df['category'].astype('category')

# Memory usage after optimization
print(df.memory_usage(deep=True).sum() / 1024**2, "MB")
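
Category conversion handles repeated strings; numeric columns can often be shrunk as well. The following is a minimal sketch of downcasting with `pd.to_numeric`, assuming the values fit in the smaller types. The column names `count` and `score` are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical numeric columns: small non-negative integers and floats in [0, 1)
df = pd.DataFrame({
    "count": rng.integers(0, 100, size=100_000),  # stored as int64
    "score": rng.random(100_000),                 # stored as float64
})

before = df.memory_usage(deep=True).sum()

# Ask pandas for the smallest dtype that can still hold the values
df["count"] = pd.to_numeric(df["count"], downcast="unsigned")
df["score"] = pd.to_numeric(df["score"], downcast="float")

after = df.memory_usage(deep=True).sum()
print(df.dtypes)
print(f"{before / 1024**2:.2f} MB -> {after / 1024**2:.2f} MB")
```

Downcasting floats trades precision for memory, so it is worth checking that the reduced precision is acceptable before applying it to real data.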

2. Method Chaining for Clean Code

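import pandas as pd

# Instead of multiple intermediate variables, chain the steps.
# Assumes data.csv has status, date, price, quantity, and region columns.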
df = (
    pd.read_csv('data.csv')
    .query('status == "active"')
    .assign(
        year=lambda x: pd.to_datetime(x['date']).dt.year,
        total_cost=lambda x: x['price'] * x['quantity']
    )
    .groupby(['year', 'region'])
    .agg(total_revenue=('total_cost', 'sum'))
    .reset_index()
    .sort_values('total_revenue', ascending=False)
)
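
A chain like this can also call your own functions through `pipe()`. The sketch below runs the same kind of pipeline on a small in-memory frame so it works without a file; the data and the helper `add_total_cost` are hypothetical.

```python
import pandas as pd

def add_total_cost(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with a total_cost column."""
    return frame.assign(total_cost=frame["price"] * frame["quantity"])

# Made-up stand-in for the contents of a CSV file
orders = pd.DataFrame({
    "status": ["active", "inactive", "active", "active"],
    "date": ["2023-01-15", "2023-02-01", "2024-03-10", "2024-05-20"],
    "region": ["EU", "EU", "US", "US"],
    "price": [10.0, 20.0, 5.0, 7.5],
    "quantity": [2, 1, 4, 2],
})

summary = (
    orders
    .query('status == "active"')
    .assign(year=lambda x: pd.to_datetime(x["date"]).dt.year)
    .pipe(add_total_cost)                      # custom step inside the chain
    .groupby(["year", "region"], as_index=False)
    .agg(total_revenue=("total_cost", "sum"))
    .sort_values("total_revenue", ascending=False)
)
print(summary)
```

Keeping each custom step as a function that takes and returns a DataFrame makes the steps easy to test on their own.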

3. Parallel Processing with Swifter

import pandas as pd
import swifter

df = pd.DataFrame({'text': ['some text'] * 100000})

def heavy_processing(text):
    # Simulate heavy work
    return text.upper()[::-1]

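# swifter tries a vectorized call first, then chooses between a Dask-parallel
# apply and a plain pandas apply. For string data like this it may stay on plain
# pandas apply unless allow_dask_on_strings() is enabled.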
df['processed'] = df['text'].swifter.apply(heavy_processing)

Questions asked in Google interview

  • How would you optimize a Pandas operation that runs slowly on a large dataset?
  • Explain the difference between merge() and join()
  • Write code to calculate rolling averages with different window sizes
  • How would you handle a DataFrame with 100 million rows?
  • Explain memory optimization techniques
  • Write code to perform complex GroupBy with multiple aggregations
  • Explain the internal data structure of DataFrame
  • How would you implement feature engineering pipelines?
  • Write code to calculate year-over-year growth
  • Explain vectorized operations and their importance
  • How to handle SettingWithCopyWarning?
  • Write code to perform window functions similar to SQL
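
As one possible answer to the rolling-average question above, here is a minimal sketch that computes moving averages for several window sizes at once. The price series and the window sizes are made up.

```python
import pandas as pd

# Hypothetical daily price series
prices = pd.Series(
    [10.0, 12.0, 11.0, 13.0, 15.0, 14.0],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
    name="price",
)

windows = [2, 3]

# One column per window size; the first (w - 1) rows of each column are NaN
rolling = pd.DataFrame(
    {f"ma_{w}": prices.rolling(window=w).mean() for w in windows}
)
print(rolling)
```

Passing `min_periods=1` to `rolling()` would fill the leading rows with partial-window averages instead of NaN, which may or may not be what an analysis needs.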

Questions asked in Amazon interview

  • Write code to merge multiple DataFrames with different schemas
  • How would you calculate year-over-year growth?
  • Explain how to handle time series data with irregular intervals
  • Write code to identify and remove duplicate records
  • How would you implement a moving average crossover strategy?
  • Explain the difference between transform() and apply()
  • Write code to pivot data for sales analysis
  • How would you handle categorical variables with high cardinality?
  • Explain how to optimize for memory efficiency
  • Write code to perform cohort analysis
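
For the `transform()` versus `apply()` question above, a small sketch of the usual distinction: `transform()` returns a result aligned with the original rows, while an aggregating `apply()` returns one value per group. The frame is made up.

```python
import pandas as pd

# Hypothetical scores per team
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [1.0, 3.0, 2.0, 4.0, 6.0],
})

grouped = df.groupby("team")["score"]

# transform: same length as df, so it can be assigned back as a column
group_mean = grouped.transform("mean")
df["demeaned"] = df["score"] - group_mean

# apply with a reducing function: one row per group
score_range = grouped.apply(lambda s: s.max() - s.min())

print(df)
print(score_range)
```

`apply()` is the more flexible of the two because the function may return a scalar, a Series, or a DataFrame, and that flexibility is the usual reason it tends to be slower than `transform()` with a built-in aggregation name.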

Questions asked in Meta interview

  • Write code to analyze user engagement data
  • How would you calculate conversion funnels?
  • Explain how to handle large-scale data processing
  • Write code to resample time series data
  • How would you implement A/B testing analysis?
  • Explain method chaining and its benefits
  • Write code to calculate retention metrics
  • How would you handle hierarchical data structures?
  • Explain vectorization benefits over loops
  • Write code to analyze network data
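
The conversion-funnel question above can be approached by counting distinct users per step. This is a minimal sketch on a made-up event log; the step names and their order are assumptions.

```python
import pandas as pd

# Hypothetical event log: one row per (user, funnel step reached)
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 4, 4, 4],
    "step": ["visit", "signup", "purchase",
             "visit", "signup",
             "visit",
             "visit", "signup", "purchase"],
})

funnel_order = ["visit", "signup", "purchase"]

# Distinct users per step, arranged in funnel order
users_per_step = (
    events.groupby("step")["user_id"].nunique().reindex(funnel_order)
)

funnel = pd.DataFrame({"users": users_per_step})
funnel["conversion_from_prev"] = funnel["users"] / funnel["users"].shift(1)
funnel["drop_off"] = 1 - funnel["conversion_from_prev"]
print(funnel)
```

This version does not enforce that a user passed through earlier steps first; a stricter funnel would filter each step to users who also appear in the previous one.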

Questions asked in Microsoft interview

  • Explain the SettingWithCopyWarning and how to avoid it
  • Write code to perform window functions similar to SQL
  • How would you handle timezone conversions?
  • Explain the difference between views and copies
  • Write code to implement custom aggregation functions
  • How would you optimize Pandas for production?
  • Explain multi-level indexing use cases
  • Write code to compare two DataFrames
  • How would you handle missing data in time series?
  • Explain eval() and query() methods
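
For the SQL-style window function question above, grouped `cumsum()`, `shift()`, and `rank()` cover the most common cases. A minimal sketch on made-up sales data, with the rough SQL equivalent noted in each comment:

```python
import pandas as pd

# Hypothetical monthly revenue per region
sales = pd.DataFrame({
    "region": ["EU", "EU", "EU", "US", "US"],
    "month": [1, 2, 3, 1, 2],
    "revenue": [100, 150, 120, 200, 180],
})

# Order rows within each partition before using order-dependent functions
sales = sales.sort_values(["region", "month"])
by_region = sales.groupby("region")["revenue"]

# SUM(revenue) OVER (PARTITION BY region ORDER BY month)
sales["running_total"] = by_region.cumsum()

# LAG(revenue, 1) OVER (PARTITION BY region ORDER BY month)
sales["prev_revenue"] = by_region.shift(1)

# DENSE_RANK() OVER (PARTITION BY region ORDER BY revenue DESC)
sales["rank_in_region"] = by_region.rank(method="dense", ascending=False).astype(int)

print(sales)
```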

Questions asked in Netflix interview

  • Write code to analyze viewing patterns and user behavior
  • How would you calculate streaming quality metrics?
  • Explain how to handle messy data from multiple sources
  • Write code to implement collaborative filtering preprocessing
  • How would you analyze content performance across regions?
  • Explain time series decomposition
  • Write code to calculate customer lifetime value
  • How would you handle data for recommendation systems?
  • Explain rolling window calculations for real-time analytics
  • Write code to analyze A/B test results
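
For the A/B test question above, one common approach is to summarize conversions per variant and run a two-proportion z-test. The sketch below uses made-up data and computes the test by hand with the standard library; it is an illustration, not a full experiment analysis.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical experiment log: one row per user, converted is 1 or 0
ab = pd.DataFrame({
    "variant": np.repeat(["control", "treatment"], 1000),
    "converted": np.concatenate([
        np.repeat([1, 0], [100, 900]),  # control: 100 of 1000 convert
        np.repeat([1, 0], [130, 870]),  # treatment: 130 of 1000 convert
    ]),
})

summary = ab.groupby("variant")["converted"].agg(
    users="count", conversions="sum", rate="mean"
)

c, t = summary.loc["control"], summary.loc["treatment"]

# Two-proportion z-test with a pooled standard error
p_pool = (c["conversions"] + t["conversions"]) / (c["users"] + t["users"])
se = math.sqrt(p_pool * (1 - p_pool) * (1 / c["users"] + 1 / t["users"]))
z = (t["rate"] - c["rate"]) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided

print(summary)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

In practice a statistics library would usually replace the hand-written test, and the sample size would be fixed before the experiment starts.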

Questions asked in Apple interview

  • Write code to perform data validation on imported data
  • How would you implement data quality checks?
  • Explain how to handle multi-format data imports
  • Write code to analyze product performance metrics
  • How would you implement data anonymization?
  • Explain best practices for production Pandas code
  • Write code to create automated data reports
  • How would you handle data versioning?
  • Explain memory management for large DataFrames
  • Write code to implement time-based partitioning
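
The data validation and quality-check questions above can be answered with a small report function. This is a sketch only; the function name `quality_report`, its parameters, and the sample data are hypothetical.

```python
import pandas as pd

def quality_report(frame: pd.DataFrame, key: list, non_negative: list) -> dict:
    """Hypothetical helper: collect a few basic data quality counts."""
    return {
        # Missing values per column
        "missing_per_column": frame.isna().sum().to_dict(),
        # Rows whose key repeats an earlier row
        "duplicate_keys": int(frame.duplicated(subset=key).sum()),
        # Values below zero in columns that should be non-negative
        "negative_values": {
            col: int((frame[col] < 0).sum()) for col in non_negative
        },
    }

# Made-up imported data with one duplicate key, one negative, one missing value
imported = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "amount": [10.0, -5.0, 7.0, None],
})

report = quality_report(imported, key=["order_id"], non_negative=["amount"])
print(report)
```

A report like this can be turned into hard checks by asserting that each count is zero before the data moves further down a pipeline.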

Questions asked in Flipkart interview

  • Write code to analyze e-commerce transaction data
  • How would you calculate GMV metrics?
  • Explain handling high-cardinality categorical data
  • Write code to analyze customer purchase patterns
  • How would you implement product recommendation preprocessing?
  • Explain data aggregation for dashboard analytics
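
For the GMV question above, gross merchandise value is usually taken as price times quantity summed over orders. A minimal sketch of monthly GMV per category, assuming that definition and made-up order data:

```python
import pandas as pd

# Hypothetical order lines
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-14"]
    ),
    "category": ["phones", "books", "phones", "phones"],
    "price": [300.0, 15.0, 280.0, 320.0],
    "quantity": [1, 2, 1, 2],
})

gmv = (
    orders
    .assign(
        gmv=orders["price"] * orders["quantity"],
        month=orders["order_date"].dt.strftime("%Y-%m"),
    )
    .groupby(["month", "category"], as_index=False)["gmv"]
    .sum()
)
print(gmv)
```

Whether cancelled or returned orders count toward GMV varies by business, so a real calculation would filter on order status first.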

Questions asked in LinkedIn interview

  • Write code to analyze professional network connections
  • How would you calculate engagement metrics for posts?
  • Explain how to handle user activity data
  • Write code to implement skill-based matching
  • How would you analyze job posting performance?
  • Explain data preprocessing for NLP tasks
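
For the network connections question above, a simple starting point is counting connections per user from an edge list. A minimal sketch, assuming undirected connections stored one row per pair; the column names are made up.

```python
import pandas as pd

# Hypothetical undirected connections: one row per connected pair
edges = pd.DataFrame({
    "user_a": [1, 1, 2, 3],
    "user_b": [2, 3, 3, 4],
})

# Each edge counts once for both of its endpoints
degree = (
    pd.concat([edges["user_a"], edges["user_b"]])
    .value_counts()
    .sort_index()
)
print(degree)
```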

Additional Resources