Welcome to the MacNN Forums.

How big is Spotlights index?
Addicted to MacNN
Join Date: Apr 2001
Location: The bottom of Cloud City
May 19, 2005, 01:44 PM
 
Just curious. What is it called and how can I find it to see how big it might be?

"Ahhhhhhhhhhhhhhhh"
     
Grizzled Veteran
Join Date: May 2001
Location: Ca
May 19, 2005, 02:34 PM
 
I was wondering that as well.
Anyone?
With some loud music + a friend to chat nearby you can get alot done. - but jezz, I'd avoid it if I had the choice---- If only real people came with Alpha Channels.......:)
AIM:xflaer
deinterlaced.com
     
Senior User
Join Date: Nov 1999
Location: Milkyway Galaxy
May 19, 2005, 03:07 PM
 
Simple:
Check out the hidden folder .Spotlight-V100/ at the root of each of your volumes. Inside that folder you'll find these files, which constitute the index:

.journalHistoryLog
ContentIndex.db
store.db
.store.db
_rules.plist

P.S. You need to be root to access the .Spotlight-V100/ directory, not just an admin.

---
I Know It All
     
Posting Junkie
Join Date: Dec 2000
May 19, 2005, 05:00 PM
 
You can see how large your Spotlight index is by typing this:

sudo du -sh /.Spotlight-V100/

Mine is 369 MB.
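
For anyone who would rather script this than run du, here is a minimal Python sketch of the same idea: walk a directory and add up the file sizes. The function names `dir_size_bytes` and `human` are made up for this example. Note that it sums apparent file sizes, whereas du reports disk blocks in use, so the two numbers may differ somewhat.

```python
import os

def dir_size_bytes(root):
    """Add up the sizes of all non-directory entries under root (symlinks are not followed)."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                # The entry vanished or is unreadable; skip it.
                continue
    return total

def human(nbytes):
    """Format a byte count using 1024-based units, roughly like du -h."""
    for unit in ("B", "KB", "MB", "GB"):
        if nbytes < 1024 or unit == "GB":
            return f"{nbytes:.1f} {unit}"
        nbytes /= 1024
```

Calling `human(dir_size_bytes("/.Spotlight-V100"))` would only give a meaningful figure when run as root, for the same reason the du command needs sudo.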

Ticking sound coming from a .pkg package? Don't let the .bom go off! Inspect it first with Pacifist. Macworld - five mice!
     
Clinically Insane
Join Date: Nov 1999
May 19, 2005, 05:07 PM
 
I would imagine that it depends largely upon how many files you have accessible to you. The size is unlikely to matter as much as the number, so a big drive with only a few files, such as one used for capturing video, would still have a smaller index than a small drive with many files, such as the drive you boot from.
You are in Soviet Russia. It is dark. Grue is likely to be eaten by YOU!
     
Mac Elite
Join Date: Nov 2001
May 19, 2005, 06:33 PM
 
Originally Posted by Millennium
I would imagine that it depends largely upon how many files you have accessible to you. The size is unlikely to matter as much as the number, so a big drive with only a few files, such as one used for capturing video, would still have a smaller index than a small drive with many files, such as the drive you boot from.
This is going to be highly dependent on the file types. That's because Spotlight will do full text searches on "text" files (plain text, Word, PDF, every email, etc.), so if you have 10,000 1 MB text documents, you're going to have a HUGE file.

Larger files (videos, photos, etc.) that aren't text will just have the metadata indexed, so they won't add as much to the index.
     
Clinically Insane
Join Date: Nov 1999
May 19, 2005, 06:52 PM
 
Originally Posted by CatOne
This is going to be highly dependent on the file types. That's because Spotlight will do full text searches on "text" files (plain text, Word, PDF, every email, etc.), so if you have 10,000 1 MB text documents, you're going to have a HUGE file.
The Project Gutenberg translation of Les Miserables, one of the longest single volumes around, is only three megs, and that file is pure ASCII. No one has that many 1-MB text files, if only because such files are few and far between outside of server logs. Also, it is unlikely that Spotlight actually stores the full text in its index. Instead, it ought to go out to the file itself, both to ensure that it has the most recent data and to save space. The slight performance overhead involved in opening the file is outweighed by the potentially massive space (not to mention privacy and security) overhead of storing the file's contents in another location.

Most "text" files actually have very small amounts of actual text when compared to the size of the file. Word files are the worst when it comes to formatting bloat, but PDFs aren't exactly lean and mean either.
You are in Soviet Russia. It is dark. Grue is likely to be eaten by YOU!
     
Clinically Insane
Join Date: Oct 2000
Location: Los Angeles
May 19, 2005, 07:02 PM
 
Originally Posted by Millennium
The Project Gutenberg translation of Les Miserables, one of the longest single volumes around, is only three megs, and that file is pure ASCII. No one has that many 1-MB text files, if only because such files are few and far between outside of server logs. Also, it is unlikely that Spotlight actually stores the full text in its index. Instead, it ought to go out to the file itself, both to ensure that it has the most recent data and to save space. The slight performance overhead involved in opening the file is outweighed by the potentially massive space (not to mention privacy and security) overhead of storing the file's contents in another location.

Most "text" files actually have very small amounts of actual text when compared to the size of the file. Word files are the worst when it comes to formatting bloat, but PDFs aren't exactly lean and mean either.
Yeah, PDFs will definitely be larger than any native word processing file, including Word, since the format encapsulates a variety of information in order to display the file as it was seen in its native form. However, I don't see how that extra data would increase the size of the Spotlight index for that file type. As others have said, Spotlight is simply indexing text.

"The natural progress of things is for liberty to yield and government to gain ground." TJ
     
Fresh-Faced Recruit
Join Date: Mar 2005
Location: Uk
May 19, 2005, 09:36 PM
 
Originally Posted by Millennium
Also, it is unlikely that Spotlight actually stores the full text in its index. Instead, it ought to go out to the file itself, both to ensure that it has the most recent data and to save space. The slight performance overhead involved in opening the file is outweighed by the potentially massive space (not to mention privacy and security) overhead of storing the file's contents in another location.
The performance overhead in opening the file and reading it to determine its contents isn't slight but massive.
Isn't the whole point of Spotlight that almost all the info it needs about a file is stored in the database, allowing fast searches?
Most text files people make are going to contain a small number of words repeated frequently. If the database can get away with storing a reference to a word and its frequency, most documents will take up a lot less space in the database.
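
To illustrate the word-plus-frequency idea, here is a rough Python sketch. It is only a guess at the general technique, not a description of what Spotlight actually stores, and `term_frequencies` and the sample sentence are invented for the example.

```python
import re
from collections import Counter

def term_frequencies(text):
    """Map each distinct lowercase word in text to the number of times it occurs."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

# Hypothetical sample text: repeated words are stored once, with a count.
sample = "the index stores the word once and the count once"
freqs = term_frequencies(sample)
total_words = sum(freqs.values())   # every word occurrence in the text
distinct_words = len(freqs)         # entries the index would need to keep
```

The more often words repeat, the wider the gap between `total_words` and `distinct_words`, which is why such a representation can be much smaller than the text itself.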
     
Addicted to MacNN
Join Date: Apr 2001
Location: The bottom of Cloud City
May 20, 2005, 12:36 AM
 
I mostly have Photoshop files. Anywho, mine is 148M.

"Ahhhhhhhhhhhhhhhh"
     
Mac Elite
Join Date: Jan 2000
Location: Seattle, WA, King
May 20, 2005, 01:41 AM
 
Spotlight will only index the first few hundred KB of files, IIRC.
     
Clinically Insane
Join Date: Oct 2000
Location: Los Angeles
May 20, 2005, 01:45 AM
 
Originally Posted by stinch
The performance overhead in opening the file and reading it to determine its contents isn't slight but massive.
Isn't the whole point of Spotlight that almost all the info it needs about a file is stored in the database, allowing fast searches?
Most text files people make are going to contain a small number of words repeated frequently. If the database can get away with storing a reference to a word and its frequency, most documents will take up a lot less space in the database.
You're partially missing the point - it's an index rather than a database.

"The natural progress of things is for liberty to yield and government to gain ground." TJ
     
Posting Junkie
Join Date: Dec 2000
May 20, 2005, 02:21 AM
 
Originally Posted by Big Mac
You're partially missing the point - it's an index rather than a database.
Actually, the files within the invisible /.Spotlight-V100 directory have the extension .db. To me, that stands for "database."

Ticking sound coming from a .pkg package? Don't let the .bom go off! Inspect it first with Pacifist. Macworld - five mice!
     
Professional Poster
Join Date: Jun 2001
Location: Northwest Ohio
May 20, 2005, 07:38 AM
 
Originally Posted by CharlesS
Actually, the files within the invisible /.Spotlight-V100 directory have the extension .db. To me, that stands for "database."
It's an index stored in a database.
     
Mac Elite
Join Date: Nov 2001
May 20, 2005, 11:01 AM
 
Originally Posted by Millennium
The Project Gutenberg translation of Les Miserables, one of the longest single volumes around, is only three megs, and that file is pure ASCII. No one has that many 1-MB text files, if only because such files are few and far between outside of server logs. Also, it is unlikely that Spotlight actually stores the full text in its index. Instead, it ought to go out to the file itself, both to ensure that it has the most recent data and to save space. The slight performance overhead involved in opening the file is outweighed by the potentially massive space (not to mention privacy and security) overhead of storing the file's contents in another location.

Most "text" files actually have very small amounts of actual text when compared to the size of the file. Word files are the worst when it comes to formatting bloat, but PDFs aren't exactly lean and mean either.
Millennium, you're a software developer, right? Do you have Apple's developer documentation library on your machine? It's > 450 MB of PDFs. And much/most of that is text (sure, you can get a 120 page manual on the //vx if you want, and 5 copies of iMac technical specifications).

Also, I have > 100 MB of emails. And that's pretty much 100% text, because the attachments are in a separate folder. Many of those emails are > 80 KB in size, because people don't properly trim the emails on replies and they get long response chains. Gotta love sales reps.

Spotlight doesn't store the FULL text in its index (I'm sure it cuts out ifs, ands, and buts). But it really does have to index many of the words in there... how else could you find "Pythagoras" in a PDF, without putting it in an index somewhere? Are you going to hash each word in some way?
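
To make that concrete, here is a small Python sketch of an inverted index that drops common words and maps every remaining word to the files containing it. This is the textbook technique, not a claim about Spotlight's internals; the stop-word list, the file names, and the functions `build_index` and `search` are all invented for the example.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption, not Spotlight's actual list).
STOP_WORDS = {"a", "an", "and", "but", "if", "in", "of", "the", "to"}

def build_index(docs):
    """Build a mapping of word -> set of document names, skipping stop words."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(name)
    return index

def search(index, word):
    """Return the sorted names of documents containing word (case-insensitive)."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical documents standing in for extracted PDF and email text.
docs = {
    "geometry.pdf": "The theorem of Pythagoras",
    "reply.eml": "If the theorem holds, and only if",
}
index = build_index(docs)
```

With this index, `search(index, "Pythagoras")` finds geometry.pdf without reopening any file, while a stop word like "the" returns nothing because it was never stored.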
     
Mac Elite
Join Date: Nov 2001
May 20, 2005, 11:03 AM
 
[turbo:/$] sudo du -sk .Spotlight-V100/
236716 .Spotlight-V100/
     
   