crazypythonmaster/pythonscript-scrapy
Language: Python
git: https://github.com/crazypythonmaster/pythonscript-scrapy
These scripts extract doctors' data (name, address, telephone number, and so on) from two well-known Swedish sites, Hitta and Eniro, and write the results as CSV files.
They then compare the two result sets and post-process them.
How to execute:
The following scripts need to be run in sequence. The input files each script requires and the output files it generates
are listed below.
Usage:
python
hitta_primary_clinic.py:
Description: This script extracts all primary clinics from Hitta.
Output:
1. hitta_primary_clinic_result.csv
hitta.py:
Description: This script searches for each parent clinic name and extracts the details from Hitta.
Input:
1. Vardcentraler.csv
Output:
1. hitta_result.csv
eniro.py:
Description: This script searches for each parent clinic name and extracts the details from Eniro.
Input:
1. Vardcentraler.csv
Output:
1. eniro_result.csv
compare_hitta_eniro_vardcentraler.py:
Description: This script compares the Hitta and Eniro results with each other.
Input:
1. Vardcentraler.csv
2. hitta_result.csv
3. eniro_result.csv
4. hitta_primary_clinic_result.csv
Output:
1. hitta_eniro_vardcentraler_result.csv
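As a rough illustration of what comparing the Hitta and Eniro results could involve, here is a minimal sketch that matches rows by clinic name and flags differing values. It is not the logic of compare_hitta_eniro_vardcentraler.py: the column names `name` and `phone`, the helper names, and the sample rows are all hypothetical, since the README does not describe the CSV layout.

```python
import csv
import io


def load_by_name(csv_file, name_column="name"):
    """Index rows by a normalised clinic name (the column name is an assumption)."""
    return {row[name_column].strip().lower(): row for row in csv.DictReader(csv_file)}


def compare(hitta_rows, eniro_rows, field="phone"):
    """Return (name, hitta value, eniro value, status) for every clinic in either source."""
    report = []
    for name in sorted(set(hitta_rows) | set(eniro_rows)):
        hitta_value = hitta_rows.get(name, {}).get(field, "")
        eniro_value = eniro_rows.get(name, {}).get(field, "")
        if name not in eniro_rows:
            status = "hitta_only"
        elif name not in hitta_rows:
            status = "eniro_only"
        else:
            status = "match" if hitta_value == eniro_value else "differs"
        report.append((name, hitta_value, eniro_value, status))
    return report


# Made-up sample data standing in for hitta_result.csv and eniro_result.csv.
hitta = load_by_name(io.StringIO("name,phone\nClinic A,111\nClinic B,222\n"))
eniro = load_by_name(io.StringIO("name,phone\nclinic a,111\nClinic C,333\n"))
report = compare(hitta, eniro)
```

Matching on a lower-cased, stripped name keeps the example short; real directory data would likely need a more forgiving match on names and addresses.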
hitta_eniro_vs_primary_clinic.py:
Description: This script compares the combined hitta_eniro result against the Primary Clinic result file.
Input:
1. hitta_eniro_vardcentraler_result.csv
2. hitta_primary_clinic_result.csv
Output:
1. hitta_eniro_vs_primary_clinic.csv
2. hitta_eniro_vs_primary_clinic_summary.csv
list_found_doctors.py:
Description: This script checks each clinic's home URL for doctors.
Input:
1. hitta_eniro_vardcentraler_result.csv
2. ALL_RECORDS_MATCH_73_273_TO_273_373_SPLIT_ADDRESS.csv
Output:
1. found_doctors_result.csv
2. doctors_result.csv
get_all_doctors.py:
Description: This script extracts all doctors' names found on the home URL.
Input:
1. found_doctors_result.csv
Output:
1. get_all_doctors_names_result.csv
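The run order above can be sketched as a small driver that records each script's inputs and outputs, works out which files you have to supply yourself, and stops early when an input is missing. This is only a sketch: it assumes each script takes no command-line arguments and reads and writes its CSV files in the current working directory, which the README does not state. The names `PIPELINE`, `external_inputs`, and `run_pipeline` are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# (script, input files, output files), in the order listed in this README.
PIPELINE = [
    ("hitta_primary_clinic.py", [], ["hitta_primary_clinic_result.csv"]),
    ("hitta.py", ["Vardcentraler.csv"], ["hitta_result.csv"]),
    ("eniro.py", ["Vardcentraler.csv"], ["eniro_result.csv"]),
    ("compare_hitta_eniro_vardcentraler.py",
     ["Vardcentraler.csv", "hitta_result.csv", "eniro_result.csv",
      "hitta_primary_clinic_result.csv"],
     ["hitta_eniro_vardcentraler_result.csv"]),
    ("hitta_eniro_vs_primary_clinic.py",
     ["hitta_eniro_vardcentraler_result.csv", "hitta_primary_clinic_result.csv"],
     ["hitta_eniro_vs_primary_clinic.csv",
      "hitta_eniro_vs_primary_clinic_summary.csv"]),
    ("list_found_doctors.py",
     ["hitta_eniro_vardcentraler_result.csv",
      "ALL_RECORDS_MATCH_73_273_TO_273_373_SPLIT_ADDRESS.csv"],
     ["found_doctors_result.csv", "doctors_result.csv"]),
    ("get_all_doctors.py", ["found_doctors_result.csv"],
     ["get_all_doctors_names_result.csv"]),
]


def external_inputs(pipeline):
    """Return the inputs that no earlier step produces, in first-use order."""
    produced, external = set(), []
    for _script, inputs, outputs in pipeline:
        for name in inputs:
            if name not in produced and name not in external:
                external.append(name)
        produced.update(outputs)
    return external


def run_pipeline(pipeline, workdir="."):
    """Run each script in order, failing fast if one of its inputs is missing."""
    workdir = Path(workdir)
    for script, inputs, _outputs in pipeline:
        missing = [name for name in inputs if not (workdir / name).exists()]
        if missing:
            raise FileNotFoundError(f"{script}: missing input file(s) {missing}")
        # Assumption: the script needs no arguments and works in `workdir`.
        subprocess.run([sys.executable, script], cwd=workdir, check=True)
```

Calling `external_inputs(PIPELINE)` shows that only Vardcentraler.csv and ALL_RECORDS_MATCH_73_273_TO_273_373_SPLIT_ADDRESS.csv are not produced by an earlier step. Calling `run_pipeline(PIPELINE)` from the directory that holds the scripts would then run them in sequence.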