Web-Scraping-Project---HKTVmall
Category: Hardware Design
Development tool: Jupyter Notebook
File size: 4487KB
Downloads: 0
Upload date: 2021-09-10 11:15:40
Uploader: sh-1993
Description: no intro
(Project to scrape the Electronics category from HKTVmall, then clean and visualize the data)
File list:
Web Scraping (HKTV Mall).pdf (4901367, 2021-09-10)
code (0, 2021-09-10)
code\Charts.ipynb (675168, 2021-09-10)
code\Ebook and Tablet Scraping.ipynb (46741, 2021-09-10)
code\Final Combined-Copy1.ipynb (83733, 2021-09-10)
code\Final Combined.ipynb (87176, 2021-09-10)
code\Headphones Scraping.ipynb (97904, 2021-09-10)
code\Laptop Scraping.ipynb (50728, 2021-09-10)
code\Mobile Phone Scraping.ipynb (118070, 2021-09-10)
code\Smart Watch Scraping.ipynb (88138, 2021-09-10)
data (0, 2021-09-10)
data\.DS_Store (6148, 2021-09-10)
data\Headphone Data.csv (112799, 2021-09-10)
data\Laptop Data.csv (54545, 2021-09-10)
data\Mobile Phone Data.csv (137723, 2021-09-10)
data\Smart Watch.csv (88688, 2021-09-10)
data\Tablet and Ebook Data.csv (44488, 2021-09-10)
# Web-Scraping-Project---HKTVmall
## Purpose
To collect and analyze data from HKTVmall with Python.
## Steps Taken
- Scrape data from the top 5 electronics categories on HKTVmall using BeautifulSoup and Selenium.
- Combine and clean the data using pandas, NumPy, and Python's `re` module.
- Visualize cleaned data with Matplotlib and Seaborn.
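
The cleaning step above could look roughly like the sketch below. It uses only the standard-library `re` module; the notebooks' actual logic is not shown in this README, so the field names (`name`, `price`), the sample strings, and the helpers `parse_price` and `clean_record` are all hypothetical.

```python
import re

# Matches the first number in a string, allowing thousands separators
# and an optional decimal part, e.g. "1,299.00".
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text):
    """Return the first number found in a price string as a float, or None."""
    match = _PRICE_RE.search(text or "")
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def clean_record(raw):
    """Normalize one scraped record (a dict of strings).

    The keys "name" and "price" are assumed for illustration only.
    """
    return {
        # Collapse runs of whitespace and trim both ends.
        "name": " ".join((raw.get("name") or "").split()),
        "price": parse_price(raw.get("price")),
    }


if __name__ == "__main__":
    print(clean_record({"name": "  Apple   iPhone 12 ", "price": "$ 6,799.00"}))
```

Records cleaned this way can then be loaded into a pandas `DataFrame` for the combining and plotting steps.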
## Challenges
- HKTVmall has a pop-up ad (with an attribute called "CrazyAd") that appears at random and blocks navigation to the next page until it is closed.
- Chinese characters were mixed into the product names and brands.
- Some products didn't have any sales data or were sold out.
- Sellers were free to choose the category labels for their products, which led to incorrect classification.
- There was no standard format for naming products and brands.
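
Two of the challenges above, the pop-up ad and the mixed-in Chinese characters, can be handled along the lines of the sketch below. It is not taken from the project's notebooks. The CSS selector in `POPUP_CLOSE_SELECTOR` is a guess based only on the "CrazyAd" name and would need to be checked against the live page, and `dismiss_popup` and `strip_cjk` are hypothetical helper names. The `driver` argument is expected to be a Selenium 4 WebDriver.

```python
import re

# Hypothetical selector: the README only says the ad carries an attribute
# called "CrazyAd"; the real markup may differ.
POPUP_CLOSE_SELECTOR = "[class*='CrazyAd'] .close"

# Common CJK ideograph ranges (Unified Ideographs and Extension A).
_CJK_RE = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]+")


def dismiss_popup(driver, selector=POPUP_CLOSE_SELECTOR):
    """Click the pop-up's close control if it is visible.

    Returns True if something was clicked, False otherwise.
    """
    # find_elements returns an empty list when nothing matches, so no
    # exception handling is needed for the "no pop-up" case.
    # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
    for button in driver.find_elements("css selector", selector):
        if button.is_displayed():
            button.click()
            return True
    return False


def strip_cjk(text):
    """Remove Chinese characters and collapse the leftover whitespace."""
    return " ".join(_CJK_RE.sub(" ", text).split())
```

Calling `dismiss_popup(driver)` just before each click on the next-page control is one simple way to keep pagination from stalling. Note that `strip_cjk` discards information, so it may be worth keeping the original name in a separate column.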
## Future Improvements
- Explore ways to deal with incorrect categories.
- Explore the HKTVmall API to get more in-depth data if the focus is on data analysis.
- Expand the scope of the project to include other data that was left out due to time constraints.