PHP Classes

basset-ir: Retrieve, transform and process text documents

Ratings: Not enough user ratings
Unique user downloads: Total 173, this week 0
Download rankings: All time 8,807, this week 62 (up)
Version: basset-ir 2.52
License: GNU Lesser Genera...
PHP version: 7.1
Categories: Algorithms, PHP 5, Statistics, Text p...
Description 

This package can retrieve, transform and process text documents.

It can take one or more text documents, optionally read from files, and process their contents in several ways to transform them. Currently it can:

- Extract text features
- Normalize text
- Evaluate text similarity
- Perform statistical calculations
- Split the text into tokens
- Generate stems from the words in the text
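To make the tokenization, normalization and similarity items above concrete, here is a minimal sketch in Python. It is not Basset's PHP API: the function names are hypothetical, normalization is assumed to be plain lowercasing with punctuation treated as a separator, and similarity is the cosine of raw term-frequency vectors, one common choice among several.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    # Normalization here is only case folding; any non-alphanumeric
    # character (spaces, hyphens, punctuation) acts as a separator.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of a and b."""
    tf_a, tf_b = Counter(tokenize(a)), Counter(tokenize(b))
    # Only terms shared by both texts contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(tokenize("Heat-transfer in Supersonic flow!"))
    # ['heat', 'transfer', 'in', 'supersonic', 'flow']
    print(round(cosine_similarity("supersonic flow", "flow in supersonic jets"), 3))
    # 0.707
```

A real pipeline would usually also drop stopwords and stem the tokens before building the vectors.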

Innovation Award
PHP Programming Innovation award winner, April 2018
Prize: One big elePHPant Plush Mascot

Text processing operations are very useful for applications that need to take text written by humans and make sense of it.

This package can perform several types of text processing operations that can be useful for many types of PHP applications.

Manuel Lemos
Author: Jericko Tejido
Innovation award: nominee 1x, winner 1x

Documentation

Basset

Basset is a full-text Information Retrieval (IR) library for PHP. It collects developments from the field of IR, ported to PHP for research purposes.

Basset provides several ways of searching the documents in a collection (ad-hoc retrieval), applying advanced and experimental IR algorithms and techniques drawn from research studies and conferences, most notably:

  1. TREC
  2. SIGIR
  3. ECIR
  4. ACM
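As an illustration of what ad-hoc retrieval means in practice, the sketch below ranks a small collection against a query with Okapi BM25, a standard ranking function from TREC-era research. This page does not say which models Basset ships or how they are called, so treat this as a generic Python example with hypothetical names, not Basset's API. It uses the common defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: dict[str, str],
              k1: float = 1.2, b: float = 0.75) -> list[tuple[str, float]]:
    """Rank docs (id -> text) for the query with Okapi BM25, best match first."""
    if not docs:
        return []
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    query_terms = set(tokenize(query))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent from this document contributes nothing
            # The "1 +" inside the log keeps the IDF non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are damped.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "supersonic flow over a wedge",
        "d2": "heat transfer in laminar boundary layers",
        "d3": "supersonic boundary layer flow",
    }
    for doc_id, score in bm25_rank("supersonic boundary layer", docs):
        print(doc_id, round(score, 3))
```

Note that without stemming, "layers" in d2 does not match the query term "layer"; this is one reason IR pipelines stem before indexing.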

Documentation

You can read about it here

Using the Cranfield Collection and the sample.php file

The Cranfield Collection was the pioneering test collection for validating the effectiveness of information retrieval systems.

I've included the Cranfield Collection of 1,400 abstracts as an XML file, which you can parse into separate files.
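One way to split such a file into one text file per document is sketched below in Python. The XML layout is an assumption: `<doc>` elements with `<docno>`, `<title>` and `<text>` children are hypothetical tag names, not taken from the bundled file, so adjust them to the real structure. The bundled tests/sample.php is the supported way to do this parsing in Basset.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def split_collection(xml_path: str, out_dir: str) -> list[str]:
    """Write each <doc> element to its own <docno>.txt file; return the paths.

    ASSUMPTION: the tag names (doc, docno, title, text) are hypothetical.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for doc in ET.parse(xml_path).getroot().iter("doc"):
        docno = doc.findtext("docno", default="").strip()
        if not docno:
            continue  # skip entries without an identifier
        title = doc.findtext("title", default="").strip()
        body = doc.findtext("text", default="").strip()
        path = os.path.join(out_dir, f"{docno}.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(f"{title}\n\n{body}\n")
        written.append(path)
    return written


if __name__ == "__main__":
    # Tiny sample in the assumed layout, written to a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "cran.xml")
        with open(sample, "w", encoding="utf-8") as fh:
            fh.write("<collection><doc><docno>1</docno>"
                     "<title>experimental investigation</title>"
                     "<text>an experimental study of a wing</text></doc></collection>")
        print(split_collection(sample, os.path.join(tmp, "docs")))
```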

The test file tests/sample.php can be run right away to parse the collection and search for a single test query. Customize it as needed.

See Cranfield/cranfield-collection/cranqrel for Glasgow's qrels (relevance judgments) for the collection.
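Qrels are what turn a ranked result list into an effectiveness score. The Python sketch below shows precision at k and average precision against a qrels file. It assumes each cranqrel line reads `query_id doc_id relevance`, separated by whitespace, and it treats every listed pair as relevant, ignoring graded relevance codes. Both are simplifying assumptions, and the function names are hypothetical rather than part of Basset.

```python
from collections import defaultdict


def load_qrels(lines) -> dict[str, set[str]]:
    """Map query id -> set of relevant doc ids.

    ASSUMPTION: 'query_id doc_id relevance' per line; every listed pair
    counts as relevant (graded relevance codes are ignored).
    """
    qrels = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            qrels[parts[0]].add(parts[1])
    return dict(qrels)


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k results that are relevant (k must be >= 1)."""
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Sum of precision@i at each rank i holding a relevant doc, divided by
    the total number of relevant docs (unretrieved ones contribute 0)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


if __name__ == "__main__":
    qrels = load_qrels(["1 184 2", "1 29 2", "2 12 3"])
    ranked = ["184", "7", "29"]  # a system's ranking for query 1
    print(precision_at_k(ranked, qrels["1"], 2))  # 0.5
    print(round(average_precision(ranked, qrels["1"]), 4))  # 0.8333
```

Averaging `average_precision` over all queries gives mean average precision (MAP), a usual single-number summary for a test collection like Cranfield.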

I've also included the SMART system's stopword list for standardization (see stopwords/stopwords.txt).
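Stopword removal with such a list is a simple set-membership filter. The Python sketch below assumes the file holds one word per line, an assumption about stopwords/stopwords.txt rather than something this page states. The helper names are hypothetical.

```python
def load_stopwords(lines) -> set[str]:
    """Build a stopword set, assuming one word per line; blank lines are skipped."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Keep only the tokens that are not stopwords, preserving order."""
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    stop = load_stopwords(["a", "the", "of"])  # stand-in for the file's lines
    print(remove_stopwords(["the", "flow", "of", "a", "jet"], stop))  # ['flow', 'jet']
```

Because `load_stopwords` accepts any iterable of lines, an open file handle for stopwords/stopwords.txt can be passed to it directly.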


Files (200)
File               Role   Description
config/                   1 file
Cranfield/                1 file, 1 directory
src/                      1 directory
stopwords/                1 file
tests/                    3 files, 1 directory
.travis.yml        Data   Auxiliary data
autoload.php       Aux.   Auxiliary script
composer.json      Data   Auxiliary data
LICENSE            Lic.   License text
README.markdown    Doc.   Documentation

The PHP Classes site has supported package installation with the Composer tool since 2013, as you can verify on its instructions page.