Automated collection of public addresses of centralized exchanges (CEX)
A system has been developed for the automatic collection, merging, and deduplication of publicly tagged cryptocurrency wallet addresses belonging to centralized exchanges (CEX) on the Ethereum, Arbitrum, Optimism, Base, zkSync, and Polygon networks. The system retrieves data from official APIs and open blockchain explorers and builds a clean, structured database of addresses for further analytics.
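The merging step can be sketched roughly as follows. The sketch assumes that every source yields rows of (chain, address, exchange, source) and that addresses are compared case-insensitively. All six networks use 20-byte "0x" hex addresses. The names `normalize_address` and `merge_sources` and the row layout are hypothetical and do not come from the original system.

```python
import re
from collections import defaultdict
from typing import Optional

# EVM-style address: "0x" followed by 40 hex characters (20 bytes).
_ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")


def normalize_address(raw: str) -> Optional[str]:
    """Return the lowercase form of a valid EVM address, or None if it is malformed."""
    candidate = raw.strip()
    if not _ADDRESS_RE.fullmatch(candidate):
        return None
    # Lowercasing drops EIP-55 checksum casing, so the same address reported
    # with different capitalisation by two sources compares equal.
    return candidate.lower()


def merge_sources(records):
    """Merge (chain, address, exchange, source) rows into one entry per (chain, address).

    Returns {(chain, address): {"exchanges": set, "sources": set}}.
    Rows with malformed addresses are skipped.
    """
    merged = defaultdict(lambda: {"exchanges": set(), "sources": set()})
    for chain, address, exchange, source in records:
        addr = normalize_address(address)
        if addr is None:
            continue
        key = (chain.strip().lower(), addr)
        merged[key]["exchanges"].add(exchange.strip())
        merged[key]["sources"].add(source)
    return dict(merged)
```

Keying on (chain, address) rather than on the address alone keeps the same address on different networks as separate entries.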
Results:
• Over 2.4 million rows of data collected from open sources.
• After filtering and deduplication, 1.42 million unique centralized exchange addresses remain.
• Data is exported in .csv and .xlsx formats for further use in analytical systems.
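The export step might look like the sketch below. It writes rows to CSV with the standard `csv` module and splits rows into worksheet-sized chunks for .xlsx output. The column layout `FIELDS` and the function names `export_csv` and `split_for_excel` are hypothetical.

```python
import csv
import io
from collections.abc import Iterable, Mapping
from typing import TextIO

# Hypothetical column layout for the exported file.
FIELDS = ["chain", "address", "exchange", "tag"]

# Maximum number of rows on one .xlsx worksheet (header row included).
EXCEL_MAX_ROWS = 1_048_576


def export_csv(rows: Iterable[Mapping[str, str]], out: TextIO) -> int:
    """Write rows as CSV (header first) to an open text stream; return the data-row count."""
    # extrasaction="ignore" silently drops keys that are not listed in FIELDS.
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def split_for_excel(rows, max_rows: int = EXCEL_MAX_ROWS):
    """Split rows into chunks that each fit on one worksheet, leaving one row for the header."""
    if max_rows < 2:
        raise ValueError("max_rows must leave room for a header and at least one row")
    size = max_rows - 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


if __name__ == "__main__":
    buf = io.StringIO()
    export_csv([{"chain": "ethereum", "address": "0xabc", "exchange": "Binance", "tag": "Deposit"}], buf)
    print(buf.getvalue())
```

The split matters at this scale. A single .xlsx worksheet holds at most 1,048,576 rows, so 1.42 million addresses need at least two sheets or files. Each chunk could then be written to its own worksheet, for example with pandas `DataFrame.to_excel`, which requires an Excel writer engine such as openpyxl.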
Technologies:
Python (asyncio, aiohttp), Dune API, CSV/Excel aggregation, AWK, Pandas, automatic recovery from API rate limits, logging, and large-volume data processing.
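The rate-limit recovery and parallel fetching listed above can be sketched with asyncio alone. The sketch assumes that the APIs signal throttling with HTTP 429. `RateLimited`, `fetch_json`, `with_backoff`, and `gather_limited` are hypothetical names. `fetch_json` expects an `aiohttp.ClientSession` and relies only on its standard `get`, `status`, `raise_for_status()`, and `json()` calls. Exponential backoff with jitter is one common retry policy, not necessarily the one the original system uses.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable, Iterable
from typing import Any, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Raised when an API answers with HTTP 429 (Too Many Requests)."""


async def fetch_json(session: Any, url: str) -> Any:
    """GET one JSON page through an aiohttp.ClientSession; raise RateLimited on HTTP 429."""
    async with session.get(url) as resp:
        if resp.status == 429:
            raise RateLimited(url)
        resp.raise_for_status()  # other 4xx/5xx errors are raised, not retried
        return await resp.json()


async def with_backoff(
    call: Callable[[], Awaitable[T]], retries: int = 5, base_delay: float = 1.0
) -> T:
    """Await call(), retrying on RateLimited with exponentially growing, jittered delays."""
    for attempt in range(retries):
        try:
            return await call()
        except RateLimited:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            delay = base_delay * 2 ** attempt
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("retries must be at least 1")


async def gather_limited(
    calls: Iterable[Callable[[], Awaitable[T]]], limit: int = 8, base_delay: float = 1.0
) -> list[T]:
    """Run coroutine factories concurrently, at most `limit` at once; results keep input order."""
    sem = asyncio.Semaphore(limit)

    async def run(call: Callable[[], Awaitable[T]]) -> T:
        # The slot stays held while backing off, which also slows the overall request rate.
        async with sem:
            return await with_backoff(call, base_delay=base_delay)

    return list(await asyncio.gather(*(run(c) for c in calls)))
```

Open a session with `async with aiohttp.ClientSession() as session:`. Inside that block, a list of page URLs could then be fetched with `await gather_limited([lambda u=u: fetch_json(session, u) for u in urls])`. The `u=u` default binds each URL when its lambda is created.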
Implementation features:
• Automated data collection from multiple sources.
• Parallel processing of large volumes (2+ million rows).
• Algorithmic deduplication and normalization of tags (Deposit, Custody, SmartWallet).
• Output ready for integration with analytical or graph systems.
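Tag normalization and deduplication could look roughly like the following. The sketch assumes that each raw row is (chain, address, label) and that free-text labels are mapped to the three canonical tags by keyword. The keyword lists, the tag priority (Deposit over Custody over SmartWallet), and the names `TAG_RULES`, `normalize_tag`, and `dedupe` are illustrative assumptions, not the original rules.

```python
from typing import Optional

# Hypothetical keyword rules mapping raw explorer/Dune labels to canonical tags.
# Rules are checked in order; the first matching rule wins.
TAG_RULES = [
    ("Deposit", ("deposit",)),
    ("Custody", ("custody", "cold wallet", "hot wallet")),
    ("SmartWallet", ("smart wallet", "smartwallet")),
]

# Lower number = higher priority when one address carries several tags.
_PRIORITY = {tag: i for i, (tag, _) in enumerate(TAG_RULES)}


def normalize_tag(raw_label: str) -> Optional[str]:
    """Map a free-text label to Deposit, Custody, or SmartWallet; None if nothing matches."""
    label = raw_label.lower()
    for tag, keywords in TAG_RULES:
        if any(keyword in label for keyword in keywords):
            return tag
    return None


def dedupe(rows) -> dict[tuple[str, str], str]:
    """Keep one tag per (chain, address), preferring the highest-priority tag seen.

    Rows whose label matches no rule are dropped.
    """
    best: dict[tuple[str, str], str] = {}
    for chain, address, raw_label in rows:
        tag = normalize_tag(raw_label)
        if tag is None:
            continue
        key = (chain.strip().lower(), address.strip().lower())
        current = best.get(key)
        if current is None or _PRIORITY[tag] < _PRIORITY[current]:
            best[key] = tag
    return best
```

Resolving conflicts by a fixed priority makes the result independent of the order in which sources are read. A row-order rule such as "first seen wins" would not have that property.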