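```python
import pandas as pd
# One row per article; start/end are character offsets of each link's anchor text (end exclusive)
df = pd.read_parquet("/data/blog/2021-07-28-wikipedia-link-recognition/enwiki-20210701-pages-articles1.xml-p1p41242.gz.parquet")
df
```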
| | title | text | link | start | end |
---|---|---|---|---|---|
0 | Anarchism | Anarchism is a political philosophy and moveme... | [political philosophy, Political movement, aut... | [15, 40, 70, 127, 179, 264, 317, 344, 362, 392... | [35, 48, 79, 136, 184, 272, 336, 355, 383, 410... |
1 | Autism | Autism is a developmental disorder characteriz... | [developmental disorder, Regressive autism, de... | [12, 308, 375, 461, 473, 562, 588, 612, 621, 6... | [34, 318, 399, 468, 494, 569, 601, 619, 631, 6... |
2 | Albedo | sunlight relative to various surface conditio... | [sunlight, diffuse reflection, sunlight, solar... | [1, 117, 139, 172, 239, 397, 417, 820, 865, 14... | [9, 135, 154, 187, 249, 406, 427, 839, 876, 14... |
3 | A | A, or a, is the first letter and the first vow... | [Letter (alphabet), vowel letter, English alph... | [22, 43, 63, 95, 144, 168, 203, 224, 258, 598,... | [28, 55, 86, 119, 145, 171, 223, 229, 267, 609... |
4 | Alabama | Alabama () is a state in the Southeastern regi... | [Southeastern United States, United States, Te... | [29, 56, 83, 107, 128, 144, 177, 217, 246, 272... | [41, 69, 92, 114, 135, 158, 188, 237, 264, 283... |
... | ... | ... | ... | ... | ... |
21073 | Heuristic routing | Heuristic routing is a system used to describe... | [network topology, Heuristic, Routing, telecom... | [90, 114, 212, 325, 357, 435, 1843, 1848, 2037... | [106, 123, 219, 351, 374, 444, 1847, 1853, 204... |
21074 | Hierarchical routing | Hierarchical routing is a method of routing in... | [routing, network address, Transmission Contro... | [36, 86, 103, 133, 152, 228, 254, 276, 340, 35... | [43, 96, 132, 150, 158, 235, 261, 280, 348, 36... |
21075 | High-performance equipment | High-performance equipment describes telecommu... | [telecommunications, electromagnetic interfere... | [37, 249, 309] | [55, 277, 316] |
21076 | Hop | A hop is a type of jump. Hop or hops may also ... | [Jumping, Hop (film), Hop! Channel, House of P... | [19, 58, 84, 122, 167, 217, 274, 405, 432, 512... | [23, 68, 96, 136, 176, 225, 286, 429, 444, 522... |
21077 | Horn | Horn most often refers to: *Horn (acoustic), a... | [Horn (acoustic), Horn (instrument), Horn (ana... | [28, 102, 179, 340, 495, 514, 530, 543, 560, 6... | [43, 119, 193, 347, 511, 526, 540, 557, 566, 6... |
21078 rows × 5 columns
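Reading the offsets against the text suggests that `start` and `end` mark each link's anchor text inside `text` (end exclusive), while `link` holds the target article, which need not match the anchor. In the Alabama row, for example, offsets 29 to 41 cover "Southeastern" while the target is "Southeastern United States". The sketch below pairs each link with its anchor text under that reading of the columns. It builds a hypothetical two-row `toy` frame from the visible prefixes of the Anarchism and Alabama rows so it runs without the parquet file; `anchor_spans` is an illustrative helper, not part of the original post, and exploding several columns at once needs pandas 1.3 or later.

```python
import pandas as pd

# Hypothetical toy rows copied from the visible prefixes of two articles above.
# Assumption: start/end are character offsets into `text`, with `end` exclusive.
toy = pd.DataFrame(
    {
        "title": ["Anarchism", "Alabama"],
        "text": [
            "Anarchism is a political philosophy",
            "Alabama () is a state in the Southeastern",
        ],
        "link": [["political philosophy"], ["Southeastern United States"]],
        "start": [[15], [29]],
        "end": [[35], [41]],
    }
)


def anchor_spans(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per link, with the anchor text sliced out of the article."""
    # Explode the parallel lists together (pandas >= 1.3); this raises a
    # ValueError if a row's link/start/end lists differ in length.
    links = frame.explode(["link", "start", "end"]).reset_index(drop=True)
    # Slice each anchor out of its article text using the (start, end) offsets.
    links["anchor"] = [
        text[int(s):int(e)]
        for text, s, e in zip(links["text"], links["start"], links["end"])
    ]
    return links[["title", "link", "start", "end", "anchor"]]


print(anchor_spans(toy))
```

On the full data, `anchor_spans(df)` would give one row per link. Because the multi-column `explode` refuses rows whose lists have mismatched lengths, the same call also works as a quick consistency check on the `link`, `start` and `end` columns.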