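Building a Command-Line Tool: A Complete Walkthrough

0 Preparation before the project

1. Define the goals

- Scrape novels from the command line
- Batch-replace file contents from the command line
- Provide intranet penetration (NAT traversal)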

2. Project name

EasyTool

Commands:

et novel

et proxy server

et proxy client

et create

3. Project directory structure

1. Project initialization

mkdir code
cd code

We use the third-party library cobra to build the command-line tool.

Install the dependency:

go get -u github.com/spf13/cobra

Initialize the project with the cobra generator.

The project is named easytool, abbreviated as et:

cobra init --pkg-name et 

Run it:

go run main.go

If you see the following error:

main.go:18:8: package et/cmd is not in GOROOT (c:\go\src\et\cmd)

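the directory is not yet a Go module, so the import path et/cmd cannot be resolved. Initialize the module by running: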

go mod init et

2. Set up the welcome banner

Create the file global/welcome.go in the project root:

package global

var Welcome = ` _______  _________   
|\  ___ \|\___   ___\ 
\ \   __/\|___ \  \_| 
 \ \  \_|/__  \ \  \  
  \ \  \_|\ \  \ \  \ 
   \ \_______\  \ \__\
    \|_______|   \|__|

`


Modify root.go:

import "et/global"
//...
var rootCmd = &cobra.Command{
	Use:   "et",
	Short: "An example command-line application",
	Long:  global.Welcome,
	// Uncomment the following line if your bare application
	// has an action associated with it:
	// Run: func(cmd *cobra.Command, args []string) { },
}


3. Scraping novels

1. Add the `et novel` command

Use the generator that cobra provides:

cobra add novel

Modify novel.go:


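import (
	"fmt"

	"et/pkg/novel"

	"github.com/spf13/cobra"
)

// novelCmd represents the novel command
var novelCmd = &cobra.Command{
	Use:   "novel",
	Short: "Novel scraper",
	Long: `Usage:
1. Generate the configuration file with: et novel config
2. Fill in the details of the novel site to scrape
3. Start scraping with: et novel run

`,
	Run: func(cmd *cobra.Command, args []string) {
		// The first positional argument selects the action.
		if len(args) == 0 {
			fmt.Println("missing argument")
			return
		}
		switch args[0] {
		case "config":
			novel.CreateJson()
		case "run":
			n := novel.NewNovel()
			n.Run()
		default:
			fmt.Println("invalid command")
		}
	},
}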

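2. Generate the configuration file

Implement the et novel config command.

Approach: write a prepared configuration template to novel.json in the current directory.

Create pkg/novel/novel.go.

First, prepare the fields of the configuration file: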

var json = `{
    "host":"",
    "url":"",
    "chapter":"",
    "novel":"",
    "is_fix":false
}`


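host is the site's domain.

url is the novel's chapter-list page.

chapter is the selector for the chapter links.

novel is the selector for the novel content.

is_fix controls whether host is prepended to each link. Many sites omit the domain in their links, so the hrefs are incomplete; this option joins the two strings for us.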
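
The is_fix option joins host and href by plain string concatenation. As an alternative sketch (not what the project does), the standard library's net/url can resolve a link against a base URL and leaves already-absolute links untouched. The function name resolveLink and the example.com addresses are placeholders chosen for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLink turns a possibly relative href into an absolute URL.
// host plays the role of the "host" field; resolveLink is a hypothetical helper.
func resolveLink(host, href string) (string, error) {
	base, err := url.Parse(host)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	// ResolveReference keeps absolute links as they are and
	// resolves relative ones against base.
	return base.ResolveReference(ref).String(), nil
}

func main() {
	for _, href := range []string{"/book/1.html", "https://example.com/book/2.html"} {
		abs, err := resolveLink("https://example.com", href)
		if err != nil {
			fmt.Println("bad link:", err)
			continue
		}
		fmt.Println(abs)
	}
}
```

With this approach a single code path would handle both sites that use relative links and sites that use absolute ones.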

pkg/novel/novel.go

package novel

import (
	"os"

	"github.com/spf13/cobra"
)

var json = `{
    "host":"",
    "url":"",
    "chapter":"",
    "novel":"",
    "is_fix":false
}`

// CreateJson writes the configuration template to ./novel.json.
func CreateJson() {
	// os.WriteFile (Go 1.16+) creates or truncates the file itself, so a
	// separate os.Create call is unnecessary and would leak a file handle.
	err := os.WriteFile("./novel.json", []byte(json), 0644)
	cobra.CheckErr(err)
}

3. Read the configuration file

Use viper to read the settings from the JSON file.

Install viper:

go get github.com/spf13/viper

Create a function that loads the configuration:

// Additional import needed in pkg/novel/novel.go: github.com/spf13/viper

type Novel struct {
	Host    string `mapstructure:"host"`
	Url     string `mapstructure:"url"`
	Chapter string `mapstructure:"chapter"`
	Novel   string `mapstructure:"novel"`
	IsFix   bool   `mapstructure:"is_fix"`
}

// NewNovel loads novel.json from the working directory into a Novel.
func NewNovel() Novel {
	v := viper.New()         // named v so the viper package is not shadowed
	v.SetConfigName("novel") // name of config file (without extension)
	v.SetConfigType("json")  // REQUIRED if the config file does not have the extension in the name
	v.AddConfigPath(".")     // look for the config in the working directory
	err := v.ReadInConfig()  // find and read the config file
	cobra.CheckErr(err)
	var config Novel
	cobra.CheckErr(v.Unmarshal(&config)) // report values that cannot be decoded
	return config
}
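
To see how the tags map JSON keys onto struct fields without pulling in viper, here is a comparison sketch that decodes the same file layout with the standard library's encoding/json. This is a swapped-in technique for illustration, not the project's approach; the Config type, parseConfig, and the sample values are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config has the same fields as Novel, but with json tags because
// encoding/json (not viper) does the decoding here.
type Config struct {
	Host    string `json:"host"`
	URL     string `json:"url"`
	Chapter string `json:"chapter"`
	Novel   string `json:"novel"`
	IsFix   bool   `json:"is_fix"`
}

// parseConfig decodes the raw bytes of a novel.json file.
func parseConfig(data []byte) (Config, error) {
	var c Config
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	// Placeholder values; a real file would describe the target site.
	raw := []byte(`{"host":"https://example.com","url":"https://example.com/book/",` +
		`"chapter":"#list a","novel":"#content","is_fix":true}`)
	c, err := parseConfig(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("%+v\n", c)
}
```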

4. Implement the scraping logic

After reading the configuration, fetch the pages with the net/http library.
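
A minimal sketch of the fetch step on its own, assuming a 10-second timeout and treating any non-200 response as an error. Both choices, the fetch helper and the example.com address are illustrative assumptions rather than part of the project.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// client has a timeout so a stalled server cannot block the program forever.
var client = &http.Client{Timeout: 10 * time.Second}

// fetch downloads url and returns the response body as a string.
func fetch(url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: unexpected status %s", url, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	html, err := fetch("https://example.com/")
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println(len(html), "bytes fetched")
}
```

Returning the error instead of exiting lets the caller decide whether one failed chapter should stop the whole run.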

Once a page has been fetched, use goquery to extract its content. Install the library first:

go get github.com/PuerkitoBio/goquery

Using goquery

Extract the content with CSS selectors:

// Additional imports needed: fmt, net/http, os, sync, github.com/PuerkitoBio/goquery

type ChapterNode struct {
	Url  string
	Name string
}

func (n *Novel) Run() {
	// MkdirAll does not fail if the directory already exists.
	cobra.CheckErr(os.MkdirAll("./novel", 0755))
	content := make(chan ChapterNode)
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			n.SaveContent(content)
		}()
	}
	doc, err := Request(n.Url)
	cobra.CheckErr(err)
	// Find the chapter links
	doc.Find(n.Chapter).Each(func(i int, s *goquery.Selection) {
		// For each link found, get the chapter title and address
		chapter := s.Text()
		link, _ := s.Attr("href")
		if n.IsFix {
			link = n.Host + link
		}
		fmt.Printf("chapter: %s  url: %v \n", chapter, link)
		content <- ChapterNode{Url: link, Name: chapter}
	})
	// Close the channel so the workers' range loops end, then wait for them;
	// otherwise the program could exit before the last chapters are saved.
	close(content)
	wg.Wait()
}

func Request(url string) (*goquery.Document, error) {
	resp, err := http.Get(url)
	cobra.CheckErr(err)
	defer resp.Body.Close()
	// Load the HTML document
	return goquery.NewDocumentFromReader(resp.Body)
}
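
The fan-out used by Run (one producer, many workers reading from a shared channel) can be tried in isolation. The sketch below uses only the standard library; processAll and the chapter names are hypothetical. It shows the two steps that let the producer know when all work is finished: closing the channel and waiting on a sync.WaitGroup.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll fans jobs out to the given number of worker goroutines
// and returns how many jobs were handled.
func processAll(jobs []string, workers int, handle func(string)) int {
	ch := make(chan string)
	var wg sync.WaitGroup
	var mu sync.Mutex
	done := 0

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range ch { // the loop ends once ch is closed
				handle(job)
				mu.Lock()
				done++
				mu.Unlock()
			}
		}()
	}
	for _, j := range jobs {
		ch <- j
	}
	close(ch) // signal that no more jobs are coming
	wg.Wait() // block until every worker has returned
	return done
}

func main() {
	chapters := []string{"chapter-1", "chapter-2", "chapter-3"}
	n := processAll(chapters, 2, func(name string) {
		fmt.Println("saving", name)
	})
	fmt.Println("handled", n, "chapters")
}
```

The order of the "saving" lines can differ between runs, which is why this pattern suits one-file-per-chapter storage.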


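5. Storing the novel

Text storage: a single file, or one file per chapter (the latter resembles one record per row in a database).

There are two ways to store the text. One is to write everything into a single file, but that does not combine well with channels: chapters must be appended in order, while concurrent workers finish in an arbitrary order. To try out channel + goroutine, this project instead stores each chapter of the novel in its own file: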

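// Additional imports needed: fmt, os, time

func (n *Novel) SaveContent(c chan ChapterNode) {
	for node := range c {
		doc, err := Request(node.Url)
		if err != nil {
			fmt.Printf("failed to fetch %s: %v\n", node.Url, err)
			continue
		}
		content := doc.Find(n.Novel).Text()
		// os.WriteFile (Go 1.16+) creates the file itself; no os.Create is needed.
		path := "./novel/" + node.Name + ".txt"
		if err := os.WriteFile(path, []byte(content), 0644); err != nil {
			fmt.Printf("failed to save %s: %v\n", path, err)
		}
		time.Sleep(1 * time.Second) // pause between requests to go easy on the site
	}
}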
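
Chapter titles are used directly as file names, so a title containing "/" or "?" can make the write fail, and the files carry no ordering information. A minimal sketch of one possible remedy, assuming the chapter's position in the list is passed along with it (for example via a hypothetical Index field on ChapterNode, filled from the i argument of the Each callback). The helper chapterFileName is not part of the project.

```go
package main

import (
	"fmt"
	"strings"
)

// chapterFileName builds a file name such as "0007_Title.txt".
// The zero-padded index keeps files sorted in chapter order, and
// characters that are unsafe in file names are replaced with "_".
func chapterFileName(index int, title string) string {
	replacer := strings.NewReplacer(
		"/", "_", "\\", "_", ":", "_", "*", "_", "?", "_",
		"\"", "_", "<", "_", ">", "_", "|", "_",
	)
	safe := replacer.Replace(strings.TrimSpace(title))
	return fmt.Sprintf("%04d_%s.txt", index, safe)
}

func main() {
	fmt.Println(chapterFileName(7, " Chapter 7: What/Why? "))
	// prints "0007_Chapter 7_ What_Why_.txt"
}
```

Because the names sort in chapter order, the per-chapter files could later be concatenated into a single file if that format is preferred.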
