Web-Scraping-Project---HKTVmall

Category: Hardware Design
Development tool: Jupyter Notebook
File size: 4487KB
Downloads: 0
Upload date: 2021-09-10 11:15:40
Uploader: sh-1993
Description: Project to scrape the Electronics category from HKTVmall, then clean and visualize the data.

File list:
Web Scraping (HKTV Mall).pdf (4901367, 2021-09-10)
code (0, 2021-09-10)
code\Charts.ipynb (675168, 2021-09-10)
code\Ebook and Tablet Scraping.ipynb (46741, 2021-09-10)
code\Final Combined-Copy1.ipynb (83733, 2021-09-10)
code\Final Combined.ipynb (87176, 2021-09-10)
code\Headphones Scraping.ipynb (97904, 2021-09-10)
code\Laptop Scraping.ipynb (50728, 2021-09-10)
code\Mobile Phone Scraping.ipynb (118070, 2021-09-10)
code\Smart Watch Scraping.ipynb (88138, 2021-09-10)
data (0, 2021-09-10)
data\.DS_Store (6148, 2021-09-10)
data\Headphone Data.csv (112799, 2021-09-10)
data\Laptop Data.csv (54545, 2021-09-10)
data\Mobile Phone Data.csv (137723, 2021-09-10)
data\Smart Watch.csv (88688, 2021-09-10)
data\Tablet and Ebook Data.csv (44488, 2021-09-10)

# Web-Scraping-Project---HKTVmall

## Purpose

To collect and analyze data from HKTVmall with Python.

## Steps Taken

- Collect/scrape data with BeautifulSoup and Selenium from the top 5 electronics categories on HKTVmall (headphones, laptops, mobile phones, smart watches, and tablets/e-books).
- Combine and clean the data using Pandas, NumPy, and `re`.
- Visualize the cleaned data with Matplotlib and Seaborn.

## Challenges

- HKTVmall shows a pop-up ad (with an attribute called "CrazyAd") that appears at random and blocks advancing to the next page until it is closed.
- Chinese characters were mixed into the product names and brands.
- Some products had no sales data or were sold out.
- Sellers were free to assign categories to their own products, which led to incorrect classifications.
- There was no standard format for naming products and brands.

## Future Improvements

- Explore ways to deal with incorrectly categorized products.
- Explore the HKTVmall API to get more in-depth data if the focus shifts to data analysis.
- Expand the scope of the project to include other data that was left out due to time constraints.
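The cleaning challenges listed above (Chinese characters mixed into product names and brands, products with no sales data, and inconsistent brand naming) can be handled with the standard-library `re` module. The block below is a minimal sketch, not the project's notebook code: the helper names `strip_cjk`, `parse_sales`, and `brand_key` are hypothetical, the Unicode ranges are an assumption (CJK Unified Ideographs U+4E00 to U+9FFF plus CJK punctuation U+3000 to U+303F, which misses rarer extension blocks), and the raw sales-text format is assumed rather than taken from HKTVmall's pages.

```python
import re
from typing import Optional

# Assumed ranges: CJK punctuation (e.g. 【】) and CJK Unified Ideographs.
# Characters from CJK extension blocks are NOT removed by this pattern.
CJK_PATTERN = re.compile(r"[\u3000-\u303f\u4e00-\u9fff]+")


def strip_cjk(text: str) -> str:
    """Remove Chinese characters/punctuation, then collapse leftover whitespace."""
    without_cjk = CJK_PATTERN.sub(" ", text)
    return re.sub(r"\s+", " ", without_cjk).strip()


def parse_sales(raw: Optional[str]) -> Optional[int]:
    """Return the first number in a sales string, or None when there is none.

    Assumption: sales text contains at most one count, possibly with
    thousands separators (e.g. "1,234 sold"). Missing or sold-out entries
    become None so they can be treated as NaN later in Pandas.
    """
    if not raw:
        return None
    match = re.search(r"\d[\d,]*", raw)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


def brand_key(raw: str) -> str:
    """Build a grouping key so differently written brand names compare equal."""
    return strip_cjk(raw).casefold()


if __name__ == "__main__":
    # Made-up example rows for illustration only, not scraped data.
    rows = [
        {"name": "Apple 蘋果 iPhone 13【行貨】", "brand": "Apple 蘋果", "sales": "1,234 sold"},
        {"name": "APPLE iPhone 13", "brand": "APPLE", "sales": "Sold out"},
    ]
    for row in rows:
        print(strip_cjk(row["name"]), brand_key(row["brand"]), parse_sales(row["sales"]))
```

Returning `None` instead of `0` for missing sales keeps "no data" distinct from "zero sold", which matters when averaging or ranking products afterwards.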
