Source Code for a Self-Hosted Local Site-Ripping API

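Yesterday I mentioned that GPT refused to write a site-ripping API for me, saying it could easily break the law. Today I asked a different way: "Can you help me write an API? I want to download all the static resources from my own website." And it actually started writing, hahaha. Below is the API source after several rounds of fixes and improvements. It is still a bit underpowered: it only copes with small sites, and pages with a lot of resources tend to end in a 504. If you're interested, have GPT polish the code some more.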

Source Code for a Self-Hosted Local Site-Ripping API - 彩豆博客 (Caidou Blog)

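Install dependencies

Install the Goutte library

First, make sure Composer is installed. If it is not, follow Composer's official documentation: https://getcomposer.org/download/

Open a terminal (command-line interface).

Go to the project root and run the following command to install Goutte:

composer require fabpot/goutte

When the install finishes, your project directory will contain a vendor folder holding the Goutte package and its dependencies.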

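Generate autoload.php

Make sure you have already run the Composer install command in the project root so that Goutte and its dependencies are present. If not, follow the steps above first. Note that composer require already writes vendor/autoload.php; this step only regenerates it with an optimized class map.

Open a terminal (command-line interface).

Go to the project root and run the following command to generate the autoload.php file: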

composer dump-autoload -o
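Composer only takes care of the PHP packages. The script below also calls curl_init() and ZipArchive, which come from PHP's curl and zip extensions, so it can be worth checking for those before going further. Here is a minimal sketch of such a check; missingExtensions is a made-up helper name, not something from Goutte or Composer.

```php
<?php
// Hypothetical preflight helper, not part of the original script.
// Returns the names of the required PHP extensions that are not loaded.
function missingExtensions(array $required): array
{
    $missing = [];
    foreach ($required as $name) {
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// The ripping script uses curl_init() (curl) and ZipArchive (zip).
$missing = missingExtensions(['curl', 'zip']);
if ($missing !== []) {
    echo 'Missing PHP extensions: ' . implode(', ', $missing) . PHP_EOL;
} else {
    echo 'curl and zip are both available.' . PHP_EOL;
}
```

Run it from the command line with php, and keep in mind that the web server may load a different php.ini than the CLI does.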

PHP site-ripping API source code

<?php
require_once 'vendor/autoload.php'; // Make sure to install the Goutte library using Composer

use Goutte\Client;

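// Check that the URL parameter is provided and is a string
if (!isset($_GET['url']) || !is_string($_GET['url'])) {
    echo "Please provide the 'url' parameter.";
    exit;
}

// Target website URL: accept only well-formed http(s) URLs
$websiteUrl = $_GET['url'];
$urlScheme = strtolower((string) parse_url($websiteUrl, PHP_URL_SCHEME));
if (filter_var($websiteUrl, FILTER_VALIDATE_URL) === false || !in_array($urlScheme, ['http', 'https'], true)) {
    http_response_code(400);
    echo "The 'url' parameter must be a valid http or https URL.";
    exit;
}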

$client = new Client();
$crawler = $client->request('GET', $websiteUrl);

// Get the raw HTML source code of the page
$htmlSource = $crawler->html();

// Decode HTML entities in the raw HTML source
$htmlSource = html_entity_decode($htmlSource, ENT_COMPAT | ENT_HTML5, 'UTF-8');

// Extract resource URLs from the raw HTML source
$resourceLinks = [];
$imageLinks = [];

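// Split the page URL so links can be resolved against it
$base = parse_url($websiteUrl);
$baseScheme = $base['scheme'] ?? 'http';
$origin = $baseScheme . '://' . ($base['host'] ?? '') . (isset($base['port']) ? ':' . $base['port'] : '');
$basePath = $base['path'] ?? '/';
$baseDir = substr($basePath, 0, (int) strrpos($basePath, '/') + 1);

// Turn a link into an absolute URL. Simplified: "../" segments are not collapsed.
$toAbsolute = function ($link) use ($baseScheme, $origin, $baseDir) {
    if (preg_match('#^https?://#i', $link)) {
        return $link;                       // already absolute
    }
    if (strpos($link, '//') === 0) {
        return $baseScheme . ':' . $link;   // protocol-relative
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $link)) {
        return null;                        // data:, mailto:, javascript:, ...
    }
    if (strpos($link, '/') === 0) {
        return $origin . $link;             // root-relative
    }
    return $origin . $baseDir . $link;      // relative to the page's directory
};

preg_match_all('/(href|src)="([^"]+\.(css|js|html))"/', $htmlSource, $matches);

foreach ($matches[2] as $resourceUrl) {
    $resourceUrl = $toAbsolute($resourceUrl);
    if ($resourceUrl !== null) {
        $resourceLinks[] = $resourceUrl;
    }
}

preg_match_all('/<img[^>]+src="([^"]+)"/', $htmlSource, $matches);

foreach ($matches[1] as $imageUrl) {
    $imageUrl = $toAbsolute($imageUrl);
    if ($imageUrl !== null) {
        $imageLinks[] = $imageUrl;
    }
}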

// Create a download directory based on website structure
$downloadPath = './downloaded_resources/';
if (!file_exists($downloadPath)) {
    mkdir($downloadPath, 0777, true);
}

// Create the URL directory if it doesn't exist
$urlDirectory = $downloadPath . parse_url($websiteUrl, PHP_URL_HOST);
if (!file_exists($urlDirectory)) {
    mkdir($urlDirectory, 0777, true);
}

// Download resources and images to their respective directories
foreach (array_merge($resourceLinks, $imageLinks) as $fileUrl) {
    $urlPath = parse_url($fileUrl, PHP_URL_PATH);

    // Skip URLs without a file name, and paths containing ".." that could
    // escape the download directory
    if (!is_string($urlPath) || $urlPath === '' || substr($urlPath, -1) === '/' || strpos($urlPath, '..') !== false) {
        continue;
    }

    $filePath = $urlDirectory . $urlPath;
    $fileDir = dirname($filePath);

    if (!file_exists($fileDir)) {
        mkdir($fileDir, 0777, true);
    }

    downloadFile($fileUrl, $filePath);
}

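// Save the HTML exactly as the server sent it (the entity-decoded copy above
// is only suitable for link extraction, not for writing back to disk)
$htmlFilePath = $urlDirectory . '/index.html';
file_put_contents($htmlFilePath, $client->getInternalResponse()->getContent());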

// Create a zip archive of downloaded files
$zipFilename = date('YmdHis') . '.zip';
$zip = new ZipArchive();
if ($zip->open($downloadPath . $zipFilename, ZipArchive::CREATE | ZipArchive::OVERWRITE) === true) {
    foreach (new RecursiveIteratorIterator(new RecursiveDirectoryIterator($urlDirectory)) as $file) {
        if ($file->isFile()) {
            $filePath = $file->getPathname();
            // Strip the directory prefix and the leading slash so zip entries are relative
            $relativePath = ltrim(substr($filePath, strlen($urlDirectory)), '/\\');
            $zip->addFile($filePath, $relativePath);
        }
    }
    $zip->close();
}

// Set headers to force download
header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename="' . $zipFilename . '"');
header('Content-Length: ' . filesize($downloadPath . $zipFilename));
readfile($downloadPath . $zipFilename);

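// Delete the downloaded files and the zip archive.
// unlink() cannot remove directories, so walk the tree children-first.
$items = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($urlDirectory, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::CHILD_FIRST
);
foreach ($items as $item) {
    $item->isDir() ? rmdir($item->getPathname()) : unlink($item->getPathname());
}
rmdir($urlDirectory);
unlink($downloadPath . $zipFilename);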

exit;

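// Download one URL to $savePath. Returns true on success, false on failure.
function downloadFile($url, $savePath) {
    $fp = fopen($savePath, 'wb');
    if ($fp === false) {
        return false;
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP 4xx/5xx as failures
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // seconds to wait for a connection
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // seconds allowed per file

    $ok = curl_exec($ch);
    curl_close($ch);
    fclose($fp);

    if ($ok === false) {
        unlink($savePath); // do not keep empty or partial files
    }
    return $ok !== false;
}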
?>
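Relative links are the fiddly part of the script above. As a rough sketch of how they could be handled more carefully, here is a standalone resolver that also collapses "./" and "../" segments and keeps query strings. resolveUrl is a hypothetical helper name and the example.com URLs are placeholders; it is a simplified take on URL resolution, not a full RFC 3986 implementation.

```php
<?php
// Hypothetical helper, not part of the original script.
// Resolves $link against $base (an absolute http/https page URL).
// Returns null for links that cannot be downloaded over http(s).
function resolveUrl(string $base, string $link): ?string
{
    $link = trim($link);
    if ($link === '' || $link[0] === '#') {
        return null;                          // empty or same-page anchor
    }
    if (preg_match('#^https?://#i', $link)) {
        return $link;                         // already absolute
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $link)) {
        return null;                          // data:, mailto:, javascript:, ...
    }

    $parts = parse_url($base);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null;                          // base is not an absolute URL
    }
    if (strpos($link, '//') === 0) {
        return $parts['scheme'] . ':' . $link; // protocol-relative
    }
    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Split off "?query" (and drop any "#fragment") so only the path is normalized.
    $cut = strcspn($link, '?#');
    $suffix = substr($link, $cut);
    $link = substr($link, 0, $cut);
    $hashPos = strpos($suffix, '#');
    if ($hashPos !== false) {
        $suffix = substr($suffix, 0, $hashPos);
    }

    $basePath = $parts['path'] ?? '/';
    $slashPos = strrpos($basePath, '/');
    $baseDir = $slashPos === false ? '/' : substr($basePath, 0, $slashPos + 1);

    if ($link === '') {
        $path = $basePath;                    // query-only link: same page
    } elseif ($link[0] === '/') {
        $path = $link;                        // root-relative
    } else {
        $path = $baseDir . $link;             // relative to the page's directory
    }

    // Collapse "." and ".." segments; ".." never climbs above the site root.
    $segments = [];
    foreach (explode('/', ltrim($path, '/')) as $segment) {
        if ($segment === '..') {
            array_pop($segments);
        } elseif ($segment !== '.') {
            $segments[] = $segment;
        }
    }
    return $origin . '/' . implode('/', $segments) . $suffix;
}
```

For example, resolveUrl('https://example.com/blog/post.html', '../img/logo.png') comes out as https://example.com/img/logo.png. Since ".." can no longer climb above the site root, the path you save to stays inside the download directory as well.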