”工欲善其事,必先利其器。“—孔子《论语.录灵公》
首页 > 编程 > 是时候离开了吗?重建的时间到了!制作推特

是时候离开了吗?重建的时间到了!制作推特

发布于2024-09-01
浏览:363

The most critical features of a new social network for users fed up with Musk and Twitter, are as follows;

  • Import Twitter's archive.zip file
  • Easy as possible to sign up
  • Similar if not identical user features

Less critical but definitely helpful features of the platform;

  • Ethically monetised and moderated
  • Make use of AI to help identify problematic content
  • Blue tick with the use of Onfido or SMART identity services

In this post, we'll focus on the first feature. Importing Twitter's archive.zip file.

The file

Twitter haven't made your data all that easy to obtain. It's great that they give you access to it (legally, they have to). The format is crap.

It actually comes as a mini web archive and all your data is stuck in JavaScript files. It is more of a web app than convenient storage of data.

When you open up the Your archive.html file you get something like this;

Time to Leave? Time to Rebuild! Making Twitter

Note: I made the descision pretty early on to build using Next.js for the site, Go and GraphQL for the backend.

So, what do you do when your data isn't structured data?

Well, you parse it.

Creating a basic Go script

Head on over to the official docs on how to get started with Go, and set up your project directory.

We're going to hack this process together. It seems one of the most important features to attract people who feel too attached to TwitterX.

First step is to create a main.go file. In this file we'll GO (hah) and do some STUFF;

  • os.Args: This is a slice that holds command-line arguments.
  • os.Args[0] is the program's name, and os.Args[1] is the first argument passed to the program.
  • Argument Check: The function checks if at least one argument is provided. If not, it prints a message asking for a path.
  • run function: This function simply prints the path passed to it, for now.
package main

import (
    "fmt"
    "os"
)

func run(path string) {
    fmt.Println("Path:", path)
}

func main() {
    if len(os.Args) 



At every step, we'll run the file like so;

go run main.go twitter.zip

If you don't have a Twitter archive export, create a simple manifest.js file and give it the following JavaScript.

window.__THAR_CONFIG = {
  "userInfo" : {
    "accountId" : "1234567890",
    "userName" : "lukeocodes",
    "displayName" : "Luke ✨"
  },
};

Compress that into your twitter.zip file that we'll use throughout.

Read a Zip file

The next step is to read the contents of the zip file. We want to do this as efficiently as possible, and reduce time data is extracted on the disk.

There are many files in the zip that don't need to be extracted, too.

We'll edit the main.go file;

  • Opening the ZIP file: The zip.OpenReader() function is used to open the ZIP file specified by path.
  • Iterating through the files: The function loops over each file in the ZIP archive using r.File, which is a slice of zip.File. The Name property of each file is printed.
package main

import (
    "archive/zip"
    "fmt"
    "log"
    "os"
)

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    fmt.Println("Files in the zip archive:")
    for _, f := range r.File {
        fmt.Println(f.Name)
    }
}

func main() {
    // Example usage
    if len(os.Args) 



JS only! We're hunting structured data

This archive file is seriously unhelpful. We want to check for just .js files, and only in the /data directory.

  • Opening the ZIP file: The ZIP file is opened using zip.OpenReader().
  • Checking the /data directory: The program iterates through the files in the ZIP archive. It uses strings.HasPrefix(f.Name, "data/") to check if the file resides in the /data directory.
  • Finding .js files: The program also checks if the file has a .js extension using filepath.Ext(f.Name).
  • Reading and printing contents: If a .js file is found in the /data directory, the program reads and prints its contents.
package main

import (
    "archive/zip"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
    "strings"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := ioutil.ReadAll(rc) // deprecated? :/ 
    if err != nil {
        log.Fatal(err)
    }

    // Print the contents
    fmt.Printf("Contents of %s:\n", file.Name)
    fmt.Println(string(contents))
}

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    fmt.Println("JavaScript files in the zip archive:")
    for _, f := range r.File {
        // Use filepath.Ext to check the file extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file so we don't end up printing a gazillion lines when testing
        }
    }
}

func main() {
    // Example usage
    if len(os.Args) 



Parse the JS! We want that data

We've found the structured data. Now we need to parse it. The good news is there are existing packages for using JavaScript inside Go. We'll be using goja.

If you're on this section, familiar with Goja, and you've seen the output of the file, you may see we're going to have errors in our future.

Install goja:

go get github.com/dop251/goja

Now we're going to edit the main.go file to do the following;

  • Parsing with goja: The goja.New() function creates a new JavaScript runtime, and vm.RunString(processedContents) runs the processed JavaScript code within that runtime.
  • Handle errors in parsing
package main

import (
    "archive/zip"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
    "strings"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := ioutil.ReadAll(rc) // deprecated? :/ 
    if err != nil {
        log.Fatal(err)
    }

    // Parse the JavaScript file using goja
    vm := goja.New()
    _, err = vm.RunString(contents)
    if err != nil {
        log.Fatalf("Error parsing JS file: %v", err)
    }

    fmt.Printf("Parsed JavaScript file: %s\n", file.Name)
}

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    fmt.Println("JavaScript files in the zip archive:")
    for _, f := range r.File {
        // Use filepath.Ext to check the file extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file so we don't end up printing a gazillion lines when testing
        }
    }
}

func main() {
    // Example usage
    if len(os.Args) 



SUPRISE. window is not defined might be a familiar error. Basically goja runs an EMCA runtime. window is browser context and sadly unavailable.

ACTUALLY Parse the JS

I went through a few issues at this point. Including not being able to return data because it's a top level JS file.

Long story short, we need to modify the contents of the files before loading them into the runtime.

Let's modify the main.go file;

  • reConfig: A regex that matches any assignment of the form window.someVariable = { and replaces it with var data = {.
  • reArray: A regex that matches any assignment of the form window.someObject.someArray = [ and replaces it with var data = [
  • Extracting data: Running the script, we use vm.Get("data") to retrieve the value of the data variable from the JavaScript context.
package main

import (
    "archive/zip"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
    "regexp"
    "strings"

    "github.com/dop251/goja"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := ioutil.ReadAll(rc)
    if err != nil {
        log.Fatal(err)
    }

    // Regular expressions to replace specific patterns
    reConfig := regexp.MustCompile(`window\.\w \s*=\s*{`)
    reArray := regexp.MustCompile(`window\.\w \.\w \.\w \s*=\s*\[`)

    // Replace patterns in the content
    processedContents := reConfig.ReplaceAllStringFunc(string(contents), func(s string) string {
        return "var data = {"
    })
    processedContents = reArray.ReplaceAllStringFunc(processedContents, func(s string) string {
        return "var data = ["
    })

    // Parse the JavaScript file using goja
    vm := goja.New()
    _, err = vm.RunString(processedContents)
    if err != nil {
        log.Fatalf("Error parsing JS file: %v", err)
    }

    // Retrieve the value of the 'data' variable from the JavaScript context
    value := vm.Get("data")
    if value == nil {
        log.Fatalf("No data variable found in the JS file")
    }

    // Output the parsed data
    fmt.Printf("Processed JavaScript file: %s\n", file.Name)
    fmt.Printf("Data extracted: %v\n", value.Export())
}

func run(path string) {
    // Open the zip file
    r, err := zip.OpenReader(path)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    for _, f := range r.File {
        // Check if the file is in the /data directory and has a .js extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file so we don't end up printing a gazillion lines when testing
        }
    }
}

func main() {
    // Example usage
    if len(os.Args) 



Hurrah. Assuming I didn't muck up the copypaste into this post, you should now see a rather ugly print of the struct data from Go.

JSON would be nice

Edit the main.go file to marshall the JSON output.

  • Use value.Export() to get the data from the struct
  • Use json.MarshallIndent() for pretty printed JSON (use json.Marshall if you want to minify the output).
package main

import (
    "archive/zip"
    "encoding/json"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
    "regexp"
    "strings"

    "github.com/dop251/goja"
)

func readFile(file *zip.File) {
    // Open the file inside the zip
    rc, err := file.Open()
    if err != nil {
        log.Fatal(err)
    }
    defer rc.Close()

    // Read the contents of the file
    contents, err := ioutil.ReadAll(rc) // deprecated :/
    if err != nil {
        log.Fatal(err)
    }

    // Regular expressions to replace specific patterns
    reConfig := regexp.MustCompile(`window\.\w \s*=\s*{`)
    reArray := regexp.MustCompile(`window\.\w \.\w \.\w \s*=\s*\[`)

    // Replace patterns in the content
    processedContents := reConfig.ReplaceAllStringFunc(string(contents), func(s string) string {
        return "var data = {"
    })
    processedContents = reArray.ReplaceAllStringFunc(processedContents, func(s string) string {
        return "var data = ["
    })

    // Parse the JavaScript file using goja
    vm := goja.New()
    _, err = vm.RunString(processedContents)
    if err != nil {
        log.Fatalf("Error parsing JS file: %v", err)
    }

    // Retrieve the value of the 'data' variable from the JavaScript context
    value := vm.Get("data")
    if value == nil {
        log.Fatalf("No data variable found in the JS file")
    }

    // Convert the data to a Go-native type
    data := value.Export()

    // Marshal the Go-native type to JSON
    jsonData, err := json.MarshalIndent(data, "", "  ")
    if err != nil {
        log.Fatalf("Error marshalling data to JSON: %v", err)
    }

    // Output the JSON data
    fmt.Println(string(jsonData))
}

func run(zipFilePath string) {
    // Open the zip file
    r, err := zip.OpenReader(zipFilePath)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    // Iterate through the files in the zip archive
    for _, f := range r.File {
        // Check if the file is in the /data directory and has a .js extension
        if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
            readFile(f)
            return // Exit after processing the first .js file
        }
    }
}

func main() {
    // Example usage
    if len(os.Args) 



That's it!

go run main.go twitter.zip
}
  "userInfo": {
    "accountId": "1234567890",
    "displayName": "Luke ✨",
    "userName": "lukeocodes"
  }
}

Open source

I'll be open sourcing a lot of this work so that others who want to parse the data from the archive, can store it how they like.

版本声明 本文转载于:https://dev.to/lukeocodes/time-to-leave-time-to-rebuild-making-twitter20-4jgc?1如有侵犯,请联系[email protected]删除
最新教程 更多>
  • 在UTF8 MySQL表中正确将Latin1字符转换为UTF8的方法
    在UTF8 MySQL表中正确将Latin1字符转换为UTF8的方法
    在UTF8表中将latin1字符转换为utf8 ,您遇到了一个问题,其中含义的字符(例如,“jáuòiñe”)在utf8 table tabled tablesset中被extect(例如,“致电。为了解决此问题,您正在尝试使用“ mb_convert_encoding”和“ iconv”转换受...
    编程 发布于2025-06-07
  • 如何为PostgreSQL中的每个唯一标识符有效地检索最后一行?
    如何为PostgreSQL中的每个唯一标识符有效地检索最后一行?
    postgresql:为每个唯一标识符在postgresql中提取最后一行,您可能需要遇到与数据集合中每个不同标识的信息相关的信息。考虑以下数据:[ 1 2014-02-01 kjkj 在数据集中的每个唯一ID中检索最后一行的信息,您可以在操作员上使用Postgres的有效效率: id dat...
    编程 发布于2025-06-07
  • Python中嵌套函数与闭包的区别是什么
    Python中嵌套函数与闭包的区别是什么
    嵌套函数与python 在python中的嵌套函数不被考虑闭合,因为它们不符合以下要求:不访问局部范围scliables to incling scliables在封装范围外执行范围的局部范围。 make_printer(msg): DEF打印机(): 打印(味精) ...
    编程 发布于2025-06-07
  • 找到最大计数时,如何解决mySQL中的“组函数\”错误的“无效使用”?
    找到最大计数时,如何解决mySQL中的“组函数\”错误的“无效使用”?
    如何在mySQL中使用mySql 检索最大计数,您可能会遇到一个问题,您可能会在尝试使用以下命令:理解错误正确找到由名称列分组的值的最大计数,请使用以下修改后的查询: 计数(*)为c 来自EMP1 按名称组 c desc订购 限制1 查询说明 select语句提取名称列和每个名称...
    编程 发布于2025-06-07
  • 如何解决由于Android的内容安全策略而拒绝加载脚本... \”错误?
    如何解决由于Android的内容安全策略而拒绝加载脚本... \”错误?
    Unveiling the Mystery: Content Security Policy Directive ErrorsEncountering the enigmatic error "Refused to load the script..." when deployi...
    编程 发布于2025-06-07
  • Spark DataFrame添加常量列的妙招
    Spark DataFrame添加常量列的妙招
    在Spark Dataframe ,将常数列添加到Spark DataFrame,该列具有适用于所有行的任意值的Spark DataFrame,可以通过多种方式实现。使用文字值(SPARK 1.3)在尝试提供直接值时,用于此问题时,旨在为此目的的column方法可能会导致错误。 df.withCo...
    编程 发布于2025-06-07
  • 如何在php中使用卷发发送原始帖子请求?
    如何在php中使用卷发发送原始帖子请求?
    如何使用php 创建请求来发送原始帖子请求,开始使用curl_init()开始初始化curl session。然后,配置以下选项: curlopt_url:请求 [要发送的原始数据指定内容类型,为原始的帖子请求指定身体的内容类型很重要。在这种情况下,它是文本/平原。要执行此操作,请使用包含以下标头...
    编程 发布于2025-06-07
  • 用户本地时间格式及时区偏移显示指南
    用户本地时间格式及时区偏移显示指南
    在用户的语言环境格式中显示日期/时间,并使用时间偏移在向最终用户展示日期和时间时,以其localzone and格式显示它们至关重要。这确保了不同地理位置的清晰度和无缝用户体验。以下是使用JavaScript实现此目的的方法。方法:推荐方法是处理客户端的Javascript中的日期/时间格式化和时...
    编程 发布于2025-06-07
  • Java中假唤醒真的会发生吗?
    Java中假唤醒真的会发生吗?
    在Java中的浪费唤醒:真实性或神话?在Java同步中伪装唤醒的概念已经是讨论的主题。尽管存在这种行为的潜力,但问题仍然存在:它们实际上是在实践中发生的吗? Linux的唤醒机制根据Wikipedia关于伪造唤醒的文章,linux实现了pthread_cond_wait()功能的Linux实现,利用...
    编程 发布于2025-06-07
  • 为什么使用Firefox后退按钮时JavaScript执行停止?
    为什么使用Firefox后退按钮时JavaScript执行停止?
    导航历史记录问题:JavaScript使用Firefox Back Back 此行为是由浏览器缓存JavaScript资源引起的。要解决此问题并确保在后续页面访问中执行脚本,Firefox用户应设置一个空功能。 警报'); }; alert('inline Alert')...
    编程 发布于2025-06-07
  • FastAPI自定义404页面创建指南
    FastAPI自定义404页面创建指南
    response = await call_next(request) if response.status_code == 404: return RedirectResponse("https://fastapi.tiangolo.com") else: ...
    编程 发布于2025-06-07
  • 编译器报错“usr/bin/ld: cannot find -l”解决方法
    编译器报错“usr/bin/ld: cannot find -l”解决方法
    错误:“ usr/bin/ld:找不到-l “ 此错误表明链接器在链接您的可执行文件时无法找到指定的库。为了解决此问题,我们将深入研究如何指定库路径并将链接引导到正确位置的详细信息。添加库搜索路径的一个可能的原因是,此错误是您的makefile中缺少库搜索路径。要解决它,您可以在链接器命令中添加...
    编程 发布于2025-06-07
  • 如何使用PHP从XML文件中有效地检索属性值?
    如何使用PHP从XML文件中有效地检索属性值?
    从php $xml = simplexml_load_file($file); foreach ($xml->Var[0]->attributes() as $attributeName => $attributeValue) { echo $attributeName,...
    编程 发布于2025-06-07
  • 对象拟合:IE和Edge中的封面失败,如何修复?
    对象拟合:IE和Edge中的封面失败,如何修复?
    To resolve this issue, we employ a clever CSS solution that solves the problem:position: absolute;top: 50%;left: 50%;transform: translate(-50%, -50%)...
    编程 发布于2025-06-07
  • 如何使用FormData()处理多个文件上传?
    如何使用FormData()处理多个文件上传?
    )处理多个文件输入时,通常需要处理多个文件上传时,通常是必要的。 The fd.append("fileToUpload[]", files[x]); method can be used for this purpose, allowing you to send multi...
    编程 发布于2025-06-07

免责声明: 提供的所有资源部分来自互联网,如果有侵犯您的版权或其他权益,请说明详细缘由并提供版权或权益证明然后发到邮箱:[email protected] 我们会第一时间内为您处理。

Copyright© 2022 湘ICP备2022001581号-3