warc
Here are 67 public repositories matching this topic...
Since the data elements in the left Metadata panel in a Collection Index are searchable via the search bar above the links, it would be helpful to add something like the following to the user guide:
Search Collection Index
- Find all links from a session in a Collection by typing session: (session + colon, no spaces)
- After typing session: enter the session ID (can be foun
The compression is unwanted, e.g. when I'm scraping onto a drive with filesystem compression, or when I want to apply a stronger compression algorithm after I'm done scraping.
The README states:
Both Service Worker and Custom Element APIs are new and only supported in modern web browsers (e.g., Chrome > v67, Firefox > v63).
Since ipwb's inception, support for these two technologies has become largely ubiquitous in all major browsers except for Safari/Custom Elements and some mobile browsers, per CanIUse. I think this mes
The About/Help menu does not have a tie back into this repo for users to submit bugs encountered. This might be useful, as otherwise there is no reference to the repo in the application, just an e-mail address.
warcio uses a default Content-Type value for WARC records of application/warc-record. This MIME type is not documented or specified anywhere; the WARC spec only mentions application/warc as the MIME type for WARC files and application/warc-fields for warcinfo and metadata records (though it is ambiguous on whether that is required or
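The distinction raised in this issue can be sketched as a small lookup. This is a minimal illustration that assumes only what the issue text states: application/warc for WARC files, application/warc-fields for warcinfo and metadata records, and application/warc-record as the undocumented default. The function name and the None fallback are hypothetical and are not part of warcio's API.

```python
from typing import Optional

# MIME types named in the issue text.
WARC_FILE_MIME = "application/warc"               # WARC files as a whole
WARC_FIELDS_MIME = "application/warc-fields"      # warcinfo and metadata records
UNDOCUMENTED_DEFAULT = "application/warc-record"  # the default the issue questions


def suggested_content_type(record_type: str) -> Optional[str]:
    """Hypothetical helper: return the Content-Type the issue says the spec
    mentions for this record type, or None when the issue names none."""
    if record_type in ("warcinfo", "metadata"):
        return WARC_FIELDS_MIME
    # For other record types the issue does not name a documented value,
    # so this sketch leaves the choice to the caller.
    return None
```

Returning None rather than a made-up default keeps the sketch from implying a documented value where the issue says there is none.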
After pull request #170, compilation of the C++ document aligner fails with the following error:
[ 44%] Linking CXX executable bin/ngram_test
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/Scrt1.o: in function `_start':
(.text+0x24): undefined reference to `main'
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/ngram_test.dir/build.make:163: bin/ngram_t
The current behavior needs to be documented. Per the WARC/1.1 spec, there is no documented "right way" but identifying the current approaches would be useful and help to guide how it might be done.
Documentation needed
Looks like a great project, but I'm having trouble changing the Extensions.java file in order to find different extensions. I couldn't find any documentation on how to do this, and when I change the file and re-run ./gradlew check, it fails most of the time.
Please add documentation for how to alter the Extensions file successfully, and also how to run the code against the oldindex
Add a command-line option that allows a) replacing the default click settings (click.yaml) and b) adding more of them at runtime.
--click-data=click.yaml
--click-match="^example\.com"
--click-selector="div.foo span.bar"
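One way the requested options could be parsed is sketched below with Python's argparse. This is an illustration only: the flag names come from the request above, but the function names, the None defaults, and the shape of the resulting rule are assumptions rather than the project's actual implementation.

```python
import argparse
import re
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the three proposed click options."""
    parser = argparse.ArgumentParser(description="Sketch of the proposed click options")
    # Replaces the default click settings file when given (assumed: None means "use the default").
    parser.add_argument("--click-data", default=None, help="path to a click settings YAML file")
    # An extra rule added at runtime: URL pattern plus CSS selector.
    parser.add_argument("--click-match", default=None, help="regex matched against the page URL")
    parser.add_argument("--click-selector", default=None, help="CSS selector of elements to click")
    return parser


def extra_click_rule(args: argparse.Namespace) -> Optional[dict]:
    """Build one runtime rule, or None unless both match and selector are given."""
    if args.click_match is None or args.click_selector is None:
        return None
    return {"match": re.compile(args.click_match), "selector": args.click_selector}


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    print(parsed.click_data, extra_click_rule(parsed))
```

Requiring both --click-match and --click-selector before producing a rule is a design choice of this sketch; the request itself does not say how a lone flag should be handled.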

We should clean up some of the more problematic documentation issues.
CHANGES.md file? Can we also pull in @kris-sigur blogs 1, [2](https://kris-sigur