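How I Batch-Removed the Activity Links from My Juejin Articles

This article is taking part in "Python Theme Month"; see the activity link for details.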

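Preface

As the screenshot shows, a fellow Juejin user wanted to strip the activity links out of their articles once the event was over.

It is an interesting request, and I suspect plenty of people share it; after all, we all like our articles to stay clean (Juejin folks, please don't hit me). So the goal of today's article is to have a program do this tedious work for us.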

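Beyond meeting the request itself, I hope this article gives you something from the list below.

- If you want to design a CMS, I hope you come away understanding how a Juejin article makes its way to publication, step by step.
- If you are a beginner, I hope my line of analysis gives you a glimpse of a programmer's plain, unglamorous daily work.
- If you want to learn Python but don't know what to use it for, I hope you see Python's strengths in scraping and automation.
- If you are a seasoned backend developer, I hope you will send me feedback; your suggestions are the best way for me to improve.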

Technical Dependencies

Approach

Juejin's article publishing flow is roughly as follows:

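1. Drafts box
- A new article is saved to the drafts box by default
- Updating an article updates the draft's content directly
2. Articles
- A draft is published, the operations team reviews it, and the article becomes public
- Editing a published article sends the update straight back to the drafts box, and the cycle repeats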
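For readers who want to model this in their own CMS, the flow above can be sketched as a tiny state machine. This is only an illustration of the list, not Juejin's actual implementation; the state names, the action names, and the `next_state` helper are all hypothetical.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"          # sitting in the drafts box
    IN_REVIEW = "in_review"  # submitted, waiting for the operations review
    PUBLISHED = "published"  # publicly visible


# Allowed transitions, following the flow described above (hypothetical names).
TRANSITIONS = {
    (State.DRAFT, "update"): State.DRAFT,
    (State.DRAFT, "publish"): State.IN_REVIEW,
    (State.IN_REVIEW, "approve"): State.PUBLISHED,
    (State.PUBLISHED, "edit"): State.DRAFT,
}


def next_state(state, action):
    """Return the state reached by `action`, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} an article in state {state.value}")
```

Walking `update`, `publish`, `approve`, `edit` from `State.DRAFT` ends back in `State.DRAFT`, which is the "cycle repeats" behaviour the list describes.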

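So for this request, the work we need to do is:

- Fetch the article's details
- Update the article, removing the activity information
- Publish the article again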

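1. Publishing an article

Watching the page in action shows that Juejin does two things: it updates the article, then publishes it.

Updating is unremarkable: it updates fields such as the article's title, content, and tags.

When publishing, note that if the article is bound to a column, the corresponding column ID must be submitted as well.

So to publish an article we need its draft_id and the updated article content.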
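As a rough sketch of what the publish request carries, the helper below builds the request body from a draft ID and the list of bound columns. The field names (`draft_id`, `sync_to_org`, `column_ids`, `column_id`) are the ones used by the code later in this article; the function name and the sample IDs are made up for illustration.

```python
def build_publish_payload(draft_id, columns=None):
    """Build the JSON body for the publish call.

    `columns` is the list of column dicts bound to the draft; entries
    without a "column_id" are ignored.
    """
    column_ids = [c["column_id"] for c in (columns or []) if c.get("column_id")]
    return {
        "draft_id": draft_id,
        "sync_to_org": False,
        "column_ids": column_ids,
    }


# Hypothetical IDs, for illustration only.
payload = build_publish_payload("draft-123", [{"column_id": "col-1"}, {}])
```

An article with no column simply ends up with an empty `column_ids` list.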

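2. Draft detail

This endpoint returns the article's full data, including the columns it is bound to. The response is too large to show here.

All we need from it is the draft's details.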

3. Article detail

This endpoint returns the draft_id that corresponds to a given article ID, so the remaining problem is how to get the article IDs.
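A minimal sketch of pulling the draft_id out of an article-detail response follows. The nesting `data` → `article_info` → `draft_id` mirrors how the script later in this article reads it; the helper name and the sample response are hypothetical.

```python
def extract_draft_id(detail_response):
    """Return the draft_id from an article-detail response, or None if absent."""
    data = detail_response.get("data") or {}
    article_info = data.get("article_info") or {}
    return article_info.get("draft_id") or None


# A made-up response with only the fields this sketch cares about.
sample = {"err_no": 0, "data": {"article_info": {"article_id": "a1", "draft_id": "d1"}}}
draft_id = extract_draft_id(sample)
```

Returning None instead of raising lets the caller skip articles whose draft cannot be found.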

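4. Article list

The screenshot shows the endpoint for fetching the Juejin article list. The returned data is as follows:

By requesting the article list we can obtain the article IDs.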
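The list is paginated with a cursor, so collecting every ID means following the cursor until the server reports no more pages. Here is a small sketch of that loop, using the `data`, `cursor`, `has_more`, and `article_id` fields that the script later in this article reads. `fetch_page` is a stand-in for the real request, and the fake pages are invented for the example.

```python
def iter_article_ids(fetch_page, start_cursor="0"):
    """Yield article IDs page by page until has_more is false.

    `fetch_page(cursor)` must return a dict with "data", "cursor"
    and "has_more" keys.
    """
    cursor = start_cursor
    while True:
        page = fetch_page(cursor)
        for item in page.get("data") or []:
            yield item["article_id"]
        if not page.get("has_more"):
            break
        cursor = page.get("cursor")


# Two fake pages keyed by cursor, standing in for real responses.
FAKE_PAGES = {
    "0": {"data": [{"article_id": "a1"}, {"article_id": "a2"}], "cursor": "2", "has_more": True},
    "2": {"data": [{"article_id": "a3"}], "cursor": "3", "has_more": False},
}

ids = list(iter_article_ids(FAKE_PAGES.__getitem__))  # ["a1", "a2", "a3"]
```

Passing the fetch function in keeps the paging logic testable without touching the network.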

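To sum up, what we need to do is:

- Get all article IDs
- Use each article ID to get the corresponding draft ID
- Fetch the draft's details and update them
- Publish the updated article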

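Implementation

The code builds on 掘金自动发布文章 (automatically publishing articles to Juejin).

1. Wrapping the endpoints

The endpoints we need to call are all the ones identified in the Approach section.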

import requests
from requests import cookies

class Juejin(object):

    # Publish an article (from a draft)
    publish_url = "https://api.juejin.cn/content_api/v1/article/publish"

    # List the drafts in the drafts box
    article_draft_url = "https://api.juejin.cn/content_api/v1/article_draft/query_list"

    # Draft detail
    article_draft_detail_url = "https://api.juejin.cn/content_api/v1/article_draft/detail"

    # Update a draft
    article_draft_update_url = "https://api.juejin.cn/content_api/v1/article_draft/update"

    # Article detail
    article_detail_url = "https://api.juejin.cn/content_api/v1/article/detail"

    # Article list
    article_list_url = "https://api.juejin.cn/content_api/v1/article/query_list"

    # Current user's information
    user_url = "https://api.juejin.cn/user_api/v1/user/get"


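    def __init__(self, driver_cookies=None, cookie_obj=None):
        self.session = requests.session()
        # NOTE: the original method body was cut off after `if driver_cookies:`.
        # What follows is an assumed reconstruction: driver_cookies is taken to
        # be a list of dicts with "name", "value" and "domain" keys (the shape
        # that Selenium's get_cookies() returns).
        if driver_cookies:
            for item in driver_cookies:
                self.session.cookies.set(
                    item.get("name"), item.get("value"), domain=item.get("domain", "")
                )
        # cookie_obj is a ready-made cookie, e.g. from requests.cookies.create_cookie()
        if cookie_obj:
            self.session.cookies.set_cookie(cookie_obj)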

    def push_draft_last_one(self):
        article_draft = self.get_draft().get("data", [])
        if not article_draft:
            raise Exception("The article draft is empty")
        draft_id = article_draft[0].get("id")

        result = self.draft_publish(draft_id)
        print(result)
        if result.get("err_no", "") != 0:
            err_msg = result.get("err_msg", "")
            raise Exception(f"Juejin push article error, error message is {err_msg} ")
        return result.get("data", {})

    def request(self, *args, **kwargs):

        response = self.session.request(*args, **kwargs)
        if response.status_code != 200:
            raise Exception("Request error")
        return response.json()

    def get_user(self):
        return self.request("get", self.user_url)

    def get_article_list(self, user_id, cursor="0"):
        data = {
            "user_id": user_id,
            "sort_type": 2,
            "cursor": cursor
        }
        return self.request("post", self.article_list_url, json=data)

    def get_draft(self):
        return self.request('post', self.article_draft_url)

    def get_draft_detail(self, draft_id):
        return self.request("post", self.article_draft_detail_url, json={"draft_id": draft_id})

    def get_article_detail(self, article_id):
        return self.request("post", self.article_detail_url, json={"article_id": article_id})

    def draft_update(self, article_info):
        return self.request('post', self.article_draft_update_url, json=article_info)

    def draft_publish(self, draft_id, column_ids=None):

        if column_ids is None:
            column_ids = []

        json = {
            "draft_id": draft_id,
            "sync_to_org": False,
            "column_ids": column_ids
        }
        result = self.request('post', self.publish_url, json=json)
        return result

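2. Running the publish task

The script's logic is below. There is nothing difficult here; it simply implements the steps from the analysis in reverse order. A few things to note:

- Do not change the activity dates arbitrarily: the later logic uses them to filter articles and to decide when the script stops.
- If the activity links in your articles are the official ones, nothing needs changing; if your links come in several formats, configure the regular expressions yourself.
- The script sleeps briefly between requests, in case Juejin rate-limits these endpoints.
- To keep things simple the script does not implement login; copy the cookie from your browser instead. The screenshot below shows where to find it.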
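Before pointing the regular expression at real articles, it is worth checking it against a sample string. The sketch below uses the same activity sentence and link as the script that follows, folded into one pattern with an optional trailing newline. The helper name and the sample text are made up, and other link formats would need their own patterns.

```python
import re

# The official activity line, with an optional trailing newline so the
# line disappears whether or not it is followed by a line break.
ACTIVITY_LINE = re.compile(
    r"这是我参与更文挑战的第\d*天，活动详情查看： "
    r"\[更文挑战\]\(https://juejin\.cn/post/6967194882926444557\)\n?"
)


def strip_activity_links(markdown):
    """Remove every occurrence of the activity line from a Markdown string."""
    return ACTIVITY_LINE.sub("", markdown)


sample = (
    "这是我参与更文挑战的第3天，活动详情查看： "
    "[更文挑战](https://juejin.cn/post/6967194882926444557)\n"
    "# Title\nBody"
)
cleaned = strip_activity_links(sample)  # "# Title\nBody"
```

Text that does not contain the activity line passes through unchanged, so running the script twice on the same article is harmless.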

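import re
import time

import requests

# Assumes the Juejin class above is defined in (or imported into) this module.


def update_and_republish():

    # Activity time window
    act_start_datetime = "2021-06-02 00:00:00"
    act_end_datetime = "2021-06-30 23:59:59"
    
    # Regular expressions matching the activity link (with and without a trailing newline)
    pattern1 = re.compile(r"这是我参与更文挑战的第\d*天，活动详情查看： \[更文挑战\]\(https\://juejin\.cn/post/6967194882926444557\)\n")
    pattern2 = re.compile(r"这是我参与更文挑战的第\d*天，活动详情查看： \[更文挑战\]\(https\://juejin\.cn/post/6967194882926444557\)")

    # Fill in your own session id (copied from the browser cookie)
    session_id = ""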
    
    cookie = requests.cookies.create_cookie(
        domain=".juejin.cn",
        name="sessionid",
        value=session_id
    )
    juejin = Juejin(cookie_obj=cookie)

    user_id = juejin.get_user().get("data", {}).get("user_id")
    start_flag = True
    cursor = "0"
    has_more = True

    act_start_time = time.mktime(time.strptime(act_start_datetime, '%Y-%m-%d %H:%M:%S'))
    act_end_time = time.mktime(time.strptime(act_end_datetime, '%Y-%m-%d %H:%M:%S'))

    patterns = [pattern1, pattern2]

    # Fetch one page of the article list and advance the cursor
    def art_info():
        nonlocal cursor, has_more
        response = juejin.get_article_list(user_id, cursor)
        time.sleep(1)
        has_more = response.get("has_more")
        cursor = response.get("cursor")
        return response.get("data")
    
    # Strip the activity link, update the draft, and republish the article
    def do_update_and_republish(article_id):
        # Uncomment to limit the run to a single article while testing
        # if article_id != '6969119163293892639':
        #     return
        draft_id = juejin.get_article_detail(article_id).get("data", {}).get("article_info", {}).get("draft_id")
        if not draft_id:
            return False
        data = juejin.get_draft_detail(draft_id).get("data", {})
        article_draft = data.get("article_draft")
        # Guard against a missing or null "columns" field
        columns = data.get("columns") or []
        column_ids = [column.get("column_id") for column in columns]

        def mark_content_replace(mark_content):
            for pattern in patterns:
                mark_content = re.sub(pattern, "", mark_content)
            return mark_content

        article = {
            "brief_content": article_draft.get("brief_content"),
            "category_id": article_draft.get("category_id"),
            "cover_image": article_draft.get("cover_image"),
            "edit_type": article_draft.get("edit_type"),
            "html_content": article_draft.get("html_content"),
            "is_english": article_draft.get("is_english"),
            "is_gfw": article_draft.get("is_gfw"),
            "link_url": article_draft.get("link_url"),
            "mark_content": mark_content_replace(article_draft.get("mark_content")),
            "tag_ids": [str(tag_id) for tag_id in article_draft.get("tag_ids")],
            "title": article_draft.get("title"),
            "id": article_draft.get("id"),
        }

        print(article)
        # Dry run by default: uncomment the two calls below to actually
        # update the draft and then republish it.
        # juejin.draft_update(article)
        time.sleep(1)
        # juejin.draft_publish(draft_id, column_ids)
        time.sleep(1)

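    # Main dispatch: filter by the activity window, then process each article
    def do(data):
        nonlocal start_flag
        for art in data or []:
            ctime = int(art.get("article_info", {}).get("ctime") or 0)
            if ctime and act_end_time < ctime:
                # Created after the activity ended: skip it
                continue
            elif ctime and act_start_time > ctime:
                # Created before the activity started: this logic assumes the
                # list is ordered newest first, so nothing further can match
                start_flag = False
                break

            a_id = art.get("article_id")
            do_update_and_republish(a_id)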

    while start_flag and has_more:
        do(art_info())
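The skip/stop decision in the script's dispatch function is easy to get backwards, so here it is isolated as a small sketch. It assumes, as the script's `break` does, that the article list is ordered newest first. The helper names and the sample dates are made up, and `time.mktime` interprets the strings in the local time zone, just like the script.

```python
import time

FMT = "%Y-%m-%d %H:%M:%S"


def to_epoch(text):
    """Convert a local-time 'YYYY-mm-dd HH:MM:SS' string to epoch seconds."""
    return time.mktime(time.strptime(text, FMT))


def classify(ctime, start, end):
    """Decide what to do with an article created at `ctime` (epoch seconds)."""
    if ctime > end:
        return "skip"     # newer than the activity: keep scanning
    if ctime < start:
        return "stop"     # older than the activity: nothing further can match
    return "process"      # inside the activity window


start = to_epoch("2021-06-02 00:00:00")
end = to_epoch("2021-06-30 23:59:59")
decisions = [
    classify(to_epoch(t), start, end)
    for t in ("2021-07-05 10:00:00", "2021-06-15 10:00:00", "2021-05-20 10:00:00")
]
# decisions == ["skip", "process", "stop"]
```

Keeping the decision in a pure function makes it possible to check the window boundaries without calling any Juejin endpoint.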